Krylov Subspace Methods for Solving Large Unsymmetric Linear Systems
Author: Y. Saad
Source: Mathematics of Computation, Vol. 37, No. 155 (Jul., 1981), pp. 105-126.
Published by: American Mathematical Society
By Y. Saad*
Abstract. Some algorithms based upon a projection process onto the Krylov subspace
K_m = span(r_0, A r_0, ..., A^{m-1} r_0) are developed, generalizing the method of conjugate
gradients to unsymmetric systems. These methods are extensions of Arnoldi's algorithm for
solving eigenvalue problems. The convergence is analyzed in terms of the distance of the
solution to the subspace K_m, and some error bounds are established showing, in particular, a
similarity with the conjugate gradient method (for symmetric matrices) when the eigenvalues
are real. Several numerical experiments are described and discussed.
1. Introduction. Few efficient iterative methods have been developed for treating
large nonsymmetric linear systems. Some methods amount to solving the normal
equations A^H A x = A^H b associated with the system Ax = b, or with some other
system derived by a preconditioning technique.
This, unfortunately, is sensitive to the conditioning of A^H A, which is in general
much worse than that of A. Techniques using Chebyshev iteration [12] do not
suffer from this drawback but require the computation of some eigenvalues of A.
A powerful method for solving symmetric linear systems is provided by the
conjugate gradient algorithm. This method achieves a projection process onto the
Krylov subspace K_m = span(r_0, A r_0, ..., A^{m-1} r_0), where r_0 is the initial residual
vector. Although the process should theoretically produce the exact solution in at
most N steps, it is well known that a satisfactory accuracy is often achieved for
values of m far less than N [15]. Concus and Golub [5] have proposed a generalization
of the conjugate gradient method which is based upon the splitting of A into
its symmetric and skew-symmetric parts.
The purpose of the present paper is to generalize the conjugate gradient method
regarded as a projection process onto the Krylov subspace K_m. We shall say of a
method realizing such a process that it belongs to the class of Krylov subspace
methods. It will be seen that these methods can be efficient for solving large
nonsymmetric systems.
The next section describes the Krylov subspace methods from a theoretical point
of view. In Section 3 some algorithms are proposed. They are essentially the
extensions of the Arnoldi-like methods for solving large eigenvalue problems
described in [18]. Section 4 deals with the convergence of the Krylov subspace
methods. Finally, some numerical experiments are described in Section 5.
PROPOSITION 2.1. Let γ_m = ||π_m A (I − π_m)||. Then the residual of π_m x* for problem
(2.5) satisfies

(2.6)   ||b − A_m π_m x*|| ≤ γ_m ||(I − π_m) x*||.

Proof.
b − A_m π_m x* = b − π_m A π_m x* = b − π_m A [x* − (I − π_m) x*] = π_m A (I − π_m) x*,
the last equality holding because π_m A x* = π_m b = b (recall that b = r_0 belongs to K_m). □
As a consequence, we can state the next corollary, which gives a bound for
||x* − x^{(m)}||.

COROLLARY 2.1. Let γ_m be defined as above and let κ_m be the norm of the inverse of
A_m. Then the error x* − x^{(m)} satisfies

||x* − x^{(m)}|| ≤ (1 + γ_m κ_m) ||(I − π_m) x*||.

Proof. By Proposition 2.1 and the fact that x^{(m)} − π_m x* = A_m^{-1}(b − A_m π_m x*),
we get ||x^{(m)} − π_m x*|| ≤ κ_m γ_m ||(I − π_m) x*||, and the result follows by writing
x* − x^{(m)} = (x* − π_m x*) + (π_m x* − x^{(m)}). □
We shall assume in what follows that

(2.11)   dim(K_m) = m.

If V_m ≡ [v_1, ..., v_m] is any basis of K_m, then the approximation z^{(m)} ∈ K_m
can be expressed as z^{(m)} = V_m y^{(m)}, where y^{(m)} is the solution of the m × m system

(V_m^T A V_m) y^{(m)} = V_m^T r_0.
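For illustration, the following minimal sketch (Python with NumPy; the function name, the orthonormalization of the basis, and the random test problem are illustrative choices, not part of the paper) forms a basis of K_m and solves the projected system above.

    import numpy as np

    def krylov_galerkin(A, b, x0, m):
        """Galerkin (orthogonal projection) approximation from the Krylov subspace
        K_m = span(r0, A r0, ..., A^{m-1} r0).  Illustrative only: the raw power
        sequence used to span K_m is numerically poor for large m."""
        r0 = b - A @ x0
        V = np.zeros((len(b), m))
        w = r0.copy()
        for j in range(m):           # columns of V span K_m
            V[:, j] = w
            w = A @ w
        V, _ = np.linalg.qr(V)       # any basis of K_m may be used
        # Galerkin condition: V^T (r0 - A V y) = 0, i.e. (V^T A V) y = V^T r0
        y = np.linalg.solve(V.T @ A @ V, V.T @ r0)
        return x0 + V @ y

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        N = 50
        A = np.eye(N) + 0.1 * rng.standard_normal((N, N))   # unsymmetric test matrix
        b = rng.standard_normal(N)
        x = krylov_galerkin(A, b, np.zeros(N), m=20)
        print("residual norm:", np.linalg.norm(b - A @ x))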
3. Practical Methods. Some algorithms based upon the Krylov subspace methods
described above will now be presented. We first propose an adaptation of Arnoldi's
method [1], [18] to the solution of systems of linear equations. The algorithm
constructs an orthonormal basis V_m = [v_1, ..., v_m] of K_m such that V_m^T A V_m has
Hessenberg form. An iterative version of this method is also given so as to avoid
the storage of too large arrays in memory. Then another class of algorithms is
derived from the incomplete orthogonalization method described in [18].
3.1. The Method of Arnoldi. Arnoldi's algorithm builds an orthonormal basis
v_1, ..., v_m of K_m = span[r_0, A r_0, ..., A^{m-1} r_0] by the recurrence

h_{k+1,k} v_{k+1} = A v_k − Σ_{i=1}^{k} h_{ik} v_i,   h_{ik} = (A v_k, v_i),   k = 1, ..., m,

starting with v_1 = r_0/||r_0||. Writing H_m for the m × m Hessenberg matrix of the
coefficients h_{ik}, the approximate solution is x^{(m)} = x_0 + V_m y^{(m)}, where
H_m y^{(m)} = β e_1 and β = ||r_0||, and its residual satisfies

(3.6)   b − A x^{(m)} = −h_{m+1,m} (e_m^T y^{(m)}) v_{m+1},

which can be derived from the algorithm and from equality (2.8).
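The construction just described (referred to below as Algorithm 3.1) can be sketched as follows in Python/NumPy; the identifiers are illustrative and exact breakdown (h_{k+1,k} = 0) is handled only by an early return.

    import numpy as np

    def arnoldi(A, r0, m):
        """Build an orthonormal basis v_1, ..., v_{m+1} of the Krylov subspace and
        the (m+1) x m Hessenberg matrix of coefficients h_{ik} = (A v_k, v_i)."""
        n = len(r0)
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        beta = np.linalg.norm(r0)
        V[:, 0] = r0 / beta
        for k in range(m):
            w = A @ V[:, k]
            for i in range(k + 1):              # orthogonalize against all previous v_i
                H[i, k] = w @ V[:, i]
                w -= H[i, k] * V[:, i]
            H[k + 1, k] = np.linalg.norm(w)
            if H[k + 1, k] == 0.0:              # exact breakdown: the subspace is invariant
                return V[:, :k + 1], H[:k + 1, :k], beta
            V[:, k + 1] = w / H[k + 1, k]
        return V, H, beta

The leading m x m block of H is the matrix H_m of the text, and the residual norm of x_0 + V_m y^{(m)} is available as h_{m+1,m} |e_m^T y^{(m)}| without forming the residual vector.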
An interesting practical method would be to generate the vectors v_k and the
matrices H_k, k = 1, 2, ..., m, ..., to compute periodically the estimate
h_{m+1,m} |e_m^T y^{(m)}| of the norm of the residual, and to stop as soon as this is small
enough. As was suggested in [15] for the symmetric case, there are various ways of
updating |e_m^T y^{(m)}| without actually computing the vector y^{(m)}. Let us give a few
indications about the problem of computing the estimate |e_m^T y^{(m)}|, since it will
appear at several points in the paper. Parlett [15] suggests utilizing a recurrence
relation proposed by Paige and Saunders [14], which is based upon the LQ
factorization of H_m.
An alternative estimate is provided by Gaussian elimination with partial pivoting on the matrix H_j. The
factorization of H_j can easily be updated by using the information from the
previous step. Supposing that no pivoting has been necessary for steps 1 through
j − 1, and writing the LU factorization of H_j as H_j = L_j U_j, it is easily seen that
ρ_j = h_{j+1,j} |e_j^T y^{(j)}| is simply

ρ_j = β ∏_{i=1}^{j} ( h_{i+1,i} / |u_{ii}| ),

where the u_{ii}, i = 1, ..., j, are the successive pivots. More generally, it can be
shown that when no pivoting has been necessary at steps i, i ∈ I, where I ⊂
{1, 2, ..., j − 1}, then ρ_j admits an analogous expression.
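As a concrete check of the pivot-product expression displayed above (our notation; the elimination is assumed to proceed without pivoting), the following sketch compares it with the direct evaluation of h_{j+1,j} |e_j^T y^{(j)}| on a random Hessenberg matrix.

    import numpy as np

    def residual_estimates(H, beta):
        """Given an (m+1) x m Hessenberg matrix H and beta = ||r_0||, return the
        estimates rho_j = h_{j+1,j} |e_j^T y^{(j)}|, j = 1..m, computed
        (a) by solving H_j y = beta e_1 directly, and
        (b) from the successive pivots of a pivot-free LU factorization."""
        m = H.shape[1]
        direct, from_pivots = [], []
        for j in range(1, m + 1):
            e1 = np.zeros(j); e1[0] = beta
            y = np.linalg.solve(H[:j, :j], e1)
            direct.append(abs(H[j, j - 1]) * abs(y[-1]))
        U = H[:m, :m].astype(float)          # work copy for the elimination
        prod = beta
        for j in range(m):
            prod *= abs(H[j + 1, j]) / abs(U[j, j])   # multiply by h_{j+1,j} / |u_jj|
            from_pivots.append(prod)
            if j + 1 < m:                    # eliminate the single subdiagonal entry
                mult = U[j + 1, j] / U[j, j]
                U[j + 1, j:] -= mult * U[j, j:]
        return np.array(direct), np.array(from_pivots)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        m = 8
        H = np.triu(rng.standard_normal((m + 1, m)), -1)   # upper Hessenberg pattern
        d, p = residual_estimates(H, beta=1.0)
        print(np.max(np.abs(d - p)))          # agreement up to rounding when no pivoting is needed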
Algorithm 3.2 (Iterative Arnoldi).
1. Start: Choose m and x_0; compute r_0 := b − A x_0.
2. For s := 0, 1, 2, ..., do:
   * v_1 := r_s/(β := ||r_s||); compute v_2, ..., v_m and H_m by Algorithm 3.1;
   * solve the system H_m y = β e_1;
   * z_{s+1} := V_m y;
   * x_{s+1} := x_s + z_{s+1};
   * r_{s+1} := r_s − A z_{s+1};
   * if h_{m+1,m} |e_m^T y| < ε, stop; else continue.
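A minimal sketch of this restarted scheme (Python/NumPy; the names, tolerance, and test problem are illustrative, and breakdown of the Arnoldi process is not handled) follows.

    import numpy as np

    def arnoldi_basis(A, v1, m):
        """Arnoldi with full orthogonalization: V is n x (m+1), H is (m+1) x m."""
        n = len(v1)
        V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
        V[:, 0] = v1
        for k in range(m):
            w = A @ V[:, k]
            for i in range(k + 1):
                H[i, k] = w @ V[:, i]
                w -= H[i, k] * V[:, i]
            H[k + 1, k] = np.linalg.norm(w)     # assumed nonzero in this sketch
            V[:, k + 1] = w / H[k + 1, k]
        return V, H

    def iterative_arnoldi(A, b, x0, m, tol=1e-10, max_restarts=50):
        """Outline of Algorithm 3.2: restart the m-step Arnoldi/Galerkin process
        until the estimate h_{m+1,m} |e_m^T y| of the residual norm falls below tol."""
        x, r = x0.copy(), b - A @ x0
        for _ in range(max_restarts):
            beta = np.linalg.norm(r)
            V, H = arnoldi_basis(A, r / beta, m)
            e1 = np.zeros(m); e1[0] = beta
            y = np.linalg.solve(H[:m, :m], e1)
            x += V[:, :m] @ y
            r = b - A @ x
            if H[m, m - 1] * abs(y[-1]) < tol:  # residual-norm estimate
                break
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        N = 100
        A = 2 * np.eye(N) + 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)
        b = rng.standard_normal(N)
        x = iterative_arnoldi(A, b, np.zeros(N), m=10)
        print("true residual:", np.linalg.norm(b - A @ x))

The work per cycle is dominated by the m matrix-by-vector products and the O(m^2 n) operations of the orthogonalization, which motivates the incomplete orthogonalization discussed next.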
3.3. Incomplete Orthogonalization Methods.
3.3.1. The construction of the vectors v_1, ..., v_m by Algorithm 3.1 amounts to
orthogonalizing the vectors A v_k against all previous vectors v_1, ..., v_k. This is
costly, and some numerical observations suggest orthogonalizing A v_k against only the
p most recent vectors v_k, v_{k−1}, ..., v_{k−p+1} rather than against all of them; see [18].
The system produced is such that (v_i, v_j) = δ_ij for i, j satisfying |i − j| ≤ p.
Algorithm 3.3.
1. Choose p and m such that p ≤ m; compute r_0 := b − A x_0 and v_1 := r_0/||r_0||.
2. For j := 1, 2, ..., m do:
   i_0 := max(1, j − p + 1);
   w := A v_j − Σ_{i=i_0}^{j} h_{ij} v_i, with
   (3.7)   h_{ij} := (A v_j, v_i),
   (3.8)   v_{j+1} := w/(h_{j+1,j} := ||w||).
Under the assumption (2.11), this algorithm will not stop before the mth step and
will produce a system of vectors v_1, ..., v_m that is locally orthogonal, together with a
banded Hessenberg matrix H_m (h_{ij} = 0 for j − i ≥ p) whose nonzero elements are
computed from (3.7) and (3.8). The generalized Lanczos approximation z^{(m)} must
satisfy the equations

H_m y^{(m)} − (V_m^T V_m)^{-1} V_m^T r_0 = 0.

Observing that (V_m^T V_m)^{-1} V_m^T r_0 = β e_1, where β = ||r_0||, we obtain the system

(3.11)   H_m y^{(m)} − β e_1 = 0.
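The following sketch (Python/NumPy, illustrative naming) builds the locally orthogonal basis of Algorithm 3.3 and forms the approximation defined by (3.11), returning the cheap residual-norm estimate as well.

    import numpy as np

    def iom_approximation(A, b, x0, m, p):
        """Incomplete orthogonalization (Algorithm 3.3) followed by the solve (3.11):
        each A v_j is orthogonalized only against the most recent basis vectors."""
        n = len(b)
        r0 = b - A @ x0
        beta = np.linalg.norm(r0)
        V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
        V[:, 0] = r0 / beta
        for j in range(m):
            w = A @ V[:, j]
            for i in range(max(0, j - p + 1), j + 1):   # local orthogonalization only
                H[i, j] = w @ V[:, i]
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m); e1[0] = beta
        y = np.linalg.solve(H[:m, :m], e1)              # system (3.11)
        estimate = H[m, m - 1] * abs(y[-1])             # estimate of ||r_0 - A z^{(m)}||
        return x0 + V[:, :m] @ y, estimate

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        N = 200
        A = np.eye(N) + 0.3 * rng.standard_normal((N, N)) / np.sqrt(N)
        b = rng.standard_normal(N)
        x, est = iom_approximation(A, b, np.zeros(N), m=30, p=2)
        print(est, np.linalg.norm(b - A @ x))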
When the basis V_m departs significantly from orthonormality, the approximation can be
improved by a corrective step. Let s_m denote the corrective column (see Subsection 5.4
for its computation); the corrected coefficient vector is

(3.12)   ŷ_m = y_m − α H_m^{-1} s_m,

which leads to the following corrective procedure (Algorithm 3.4):
   y_m := β H_m^{-1} e_1,
   x := H_m^{-1} s_m,
   α := e_m^T y_m / (1 + e_m^T x),
   ŷ_m := y_m − α x,
   x^{(m)} := x_0 + V_m ŷ_m.
We shall now give some additional practical details.
1. If necessary, the vectors v_1, v_2, ..., v_m may be stored in auxiliary memory,
one by one, as soon as they are computed. Only the p vectors v_j, v_{j−1}, ..., v_{j−p+1}
need be kept in main memory for efficiency.
2. The residual norm of the corrected approximation ẑ^{(m)} = V_m ŷ_m can be estimated
by h_{m+1,m} |e_m^T ŷ_m|, since

(3.15)   ||r_0 − A ẑ^{(m)}|| ≤ h_{m+1,m} |e_m^T ŷ_m| + |α| ||V_m s_m||.

It is remarkable that, by (3.6), the term h_{m+1,m} e_m^T y^{(m)} is equal to
||r_0 − A z^{(m)}||, except for the sign, and hence it is available at almost no extra cost.
Moreover, if the system {v_1, ..., v_{m+1}} is nearly orthonormal, then V_m^T v_{m+1}, and
hence s_m, is small in general. This shows that, in general, the second term on the right-hand side of
(3.15) can be neglected (in comparison with the first) as long as V_{m+1} remains nearly
orthonormal. This fact is confirmed by the experiments, and it is observed that the
residual norms behave in the same manner as the residual norms obtained for the
incomplete orthogonalization method applied to the eigenvalue problem; see [18,
Section 4.2].
The residual norms ||r_0 − A z^{(m)}|| decrease rapidly until a certain step and then
start oscillating and decreasing more slowly. This suggests restarting immediately
after a residual norm is found to be larger than the previous one. Here, again, the formula (3.6)
remains very useful for estimating the residual norm. This leads to the following
algorithm.
Algorithm 3.5. Incomplete Orthogonalization Without Correction.
1. Start: choose p and m; x := x_0; r := b − A x_0.
2. Iterate: compute v_1, ..., v_m and H_m by Algorithm 3.3, starting with v_1 := r/(β := ||r||); then
   z^{(m)} := β V_m H_m^{-1} e_1,
   x := x + z^{(m)},
   r := r − A z^{(m)},
restarting as soon as the residual norm estimate h_{j+1,j} |e_j^T y^{(j)}| increases, and stopping when it is small enough.
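A minimal sketch of this strategy (Python/NumPy; the names, tolerance, and test problem are illustrative, and the per-step estimate is recomputed by a small Hessenberg solve rather than updated recursively as discussed above) is given below.

    import numpy as np

    def iom_restarted(A, b, x0, m, p, tol=1e-10, max_restarts=50):
        """Incomplete orthogonalization without correction, in outline: restart as soon
        as the residual-norm estimate stops decreasing, or after m steps."""
        x = x0.copy()
        r = b - A @ x
        for _ in range(max_restarts):
            beta = np.linalg.norm(r)
            n = len(b)
            V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
            V[:, 0] = r / beta
            prev_est = np.inf
            for j in range(m):
                w = A @ V[:, j]
                for i in range(max(0, j - p + 1), j + 1):   # local orthogonalization
                    H[i, j] = w @ V[:, i]
                    w -= H[i, j] * V[:, i]
                H[j + 1, j] = np.linalg.norm(w)              # breakdown not handled
                V[:, j + 1] = w / H[j + 1, j]
                e1 = np.zeros(j + 1); e1[0] = beta
                y = np.linalg.solve(H[:j + 1, :j + 1], e1)
                est = H[j + 1, j] * abs(y[-1])               # estimate of ||r - A z^{(j)}||
                if est > prev_est:                           # estimate grows: restart
                    break
                prev_est = est
            x += V[:, :j + 1] @ y
            r = b - A @ x
            if np.linalg.norm(r) < tol:
                break
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        N = 300
        A = np.eye(N) + 0.2 * rng.standard_normal((N, N)) / np.sqrt(N)
        b = rng.standard_normal(N)
        x = iom_restarted(A, b, np.zeros(N), m=40, p=2)
        print(np.linalg.norm(b - A @ x))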
The numerical experiments (Section 5) will reveal that this last algorithm is to be
preferred to the iterative Arnoldi algorithm and to the incomplete orthogonaliza-
tion method with correction. Surprisingly, it is often the case that no restart is
necessary, even for matrices that are not nearly symmetric.
We shall conclude this section with a remark concerning the application of
preconditioning techniques to the algorithms described above. Suppose that we can
find a matrix M for which linear systems are easily solvable and such that M^{-1}A is
closer to the identity than A. In this case it is advantageous, in general, to replace
the system Ax = b by the new system M^{-1}Ax = M^{-1}b before applying one of the
previous methods. There are two reasons for this. The first is that the rate of
convergence for the second system will, in general, be higher than that for the first,
because the spectrum will be included in a disk with center one and with small
radius, and the next section will show that in that case the smaller the radius, the
higher the rate of convergence. The second is that M^{-1}A, which is close to the
identity matrix, is clearly close to a symmetric matrix (the identity), so that the
application of incomplete orthogonalization without correction is most effective; cf.
Subsection 5.5.
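As a minimal illustration of this transformation (Python/NumPy; the diagonal choice of M below is only a stand-in for a genuine preconditioner such as an incomplete factorization), the system can be rewritten as follows before any of the previous sketches is applied.

    import numpy as np

    # Left preconditioning with M = diag(A): a crude but trivially invertible choice.
    rng = np.random.default_rng(5)
    N = 100
    A = np.diag(np.linspace(1.0, 10.0, N)) + 0.05 * rng.standard_normal((N, N))
    b = rng.standard_normal(N)

    d = np.diag(A).copy()
    A_prec = A / d[:, None]          # M^{-1} A
    b_prec = b / d                   # M^{-1} b
    # The eigenvalues of A_prec cluster around 1, which improves the convergence rate
    # of the Krylov subspace methods applied to (A_prec, b_prec).
    print("eigenvalue spread of A:      ", np.ptp(np.linalg.eigvals(A).real))
    print("eigenvalue spread of M^-1 A: ", np.ptp(np.linalg.eigvals(A_prec).real))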
PROPOSITION 4.1. The distance ||(I − π_m) z*|| between z* and the Krylov subspace
K_m satisfies

(4.1)   ||(I − π_m) z*|| ≤ min_{p ∈ P_m, p(0)=1} ||p(A) z*||,

where P_m denotes the set of polynomials of degree not exceeding m.
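The reasoning behind this bound can be summarized in one line, written here with the convention that z* = A^{-1} r_0 denotes the exact correction sought in K_m:

    % For p of degree at most m with p(0) = 1, write p(\lambda) = 1 - \lambda q(\lambda),
    % where q has degree at most m - 1.  Then q(A) r_0 = q(A) A z^* belongs to K_m, so
    \[
      \|(I - \pi_m) z^*\| \;=\; \min_{u \in K_m} \|z^* - u\|
      \;\le\; \|z^* - q(A) r_0\| \;=\; \|p(A) z^*\| ,
    \]
    % and minimizing over all admissible p gives (4.1).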
In order to obtain an upper bound for (4.1), we shall assume that A admits N
eigenvectors φ_1, φ_2, ..., φ_N of norm one, associated with the eigenvalues
λ_1, ..., λ_N. Then the solution z* can be expressed as

z* = Σ_{i=1}^{N} a_i φ_i.
THEOREM 4.1. Set α = Σ_{i=1}^{N} |a_i|, where the a_i are the components of the solution z*
in the eigenbasis of A, and let

ε^{(m)} = min_{p ∈ P_m, p(0)=1} max_{j=1,...,N} |p(λ_j)|.

Then ||(I − π_m) z*|| ≤ α ε^{(m)}.

Proof. For any polynomial p,

||p(A) z*|| = || Σ_{i=1}^{N} a_i p(λ_i) φ_i || ≤ Σ_{i=1}^{N} |a_i| |p(λ_i)| ≤ α max_{j=1,...,N} |p(λ_j)|.

Therefore, for any polynomial of degree not exceeding m such that p(0) = 1, we
have ||(I − π_m) z*|| ≤ α max_{j=1,...,N} |p(λ_j)| by (4.1), and the result follows by
minimizing over all such polynomials. □
THEOREM 4.2. Let m ≤ N − 1. Then there exist m + 1 eigenvalues, which we label
λ_1, ..., λ_{m+1} without loss of generality, for which ε^{(m)} admits an explicit expression
(4.6) in terms of the λ_k alone.

The result does not specify which are the eigenvalues λ_1, ..., λ_{m+1}, but it still
gives an interesting indication. If the origin is well separated from the spectrum,
then ε^{(m)} is likely to be very small. Indeed, if λ_1 is, for example, the eigenvalue
closest to zero among those eigenvalues involved in the theorem, then, in general,
we shall have |λ_k| > |λ_1 − λ_k|, k = 1, ..., N, as seen in Figure 1. Therefore the product

∏_{k=2}^{m+1} |λ_k| / |λ_k − λ_1|

will be large, and it is seen from (4.6) that ε^{(m)} will be small. There are particular distributions of
the eigenvalues for which ε^{(m)} is known exactly (for m = N − 1). But, in general, the
result (4.6) is not useful for giving an estimate of the rate of convergence. Upper
bounds for ε^{(m)} must be established for that purpose.
FIGURE 1. The spectrum in the complex plane (axes Re(λ), Im(λ)), well separated from the origin.
4.3. Bounds for ε^{(m)}. In the real case one usually obtains bounds for ε^{(m)} by
majorizing the discrete norm max_{j=1,...,N} |p(λ_j)| by the continuous norm max_{λ ∈ I} |p(λ)|,
where I is an interval (or the union of two intervals) containing the eigenvalues λ_j
and not zero.
In the complex case, however, one encounters the difficulty of choosing an
adequate continuum containing all the eigenvalues and not zero. Infinitely many
choices are possible but, except for some particular shapes such as circles,
ellipses, ..., there is no simple expression for the minimax quantity

min_{p ∈ P_m, p(0)=1} max_{z ∈ E} |p(z)|,

where E denotes the chosen continuum.
THEOREM 4.3. Suppose that all the eigenvalues of A are real and positive, and let
λ_min and λ_max be the smallest and the largest of them. Then

(4.7)   ||(I − π_m) z*|| ≤ α / T_m(γ),

where α is as before, γ = (λ_max + λ_min)/(λ_max − λ_min), and where T_m is the
Chebyshev polynomial of degree m of the first kind.
This result is an immediate application of a well-known bound for (4.4) when the
λ_i are real [2]. It is also possible to establish some results when the eigenvalues are
known to lie in two or more intervals; see [2], [10].
Inequality (4.7) shows that, when the spectrum is real and positive, the generalized
Lanczos method converges at least as fast as the classical Chebyshev-type bound for the
symmetric case indicates. When the eigenvalues are complex but contained in an ellipse
with center c on the real axis, foci c + e and c − e, and major semiaxis a, the change of
variable z' = (c − z)/e reduces the relevant minimax problem to

(4.8)   min_{p ∈ P_m, p(c/e)=1} max_{z' ∈ E'} |p(z')|,

where the domain E' is bounded by the ellipse centered at the origin with eccentricity
one and major semiaxis a/e. It was shown by Clayton [4] that the above minimax
is realized by the polynomial T_m(z')/T_m(c/e).
FIGURE 2. An ellipse with focal distance e and major semiaxis a (axes Re(z), Im(z)).
THEOREM 4.4. Assume that the eigenvalues of A lie within an ellipse with center c
on the real axis, foci c + e and c − e, and with major semiaxis a. Suppose that the origin
is not inside this ellipse. Then ε^{(m)} is bounded as in (4.12) below.

Consider the polynomial p(z) = T_m((c − z)/e)/T_m(c/e), which has degree m and satisfies
p(0) = 1. By the maximum principle, the maximum on the right-hand side is realized for z'
belonging to the boundary ∂E' of the ellipse E' centered at the origin and having
major semiaxis a/e and eccentricity one. Thus, (4.2) becomes

(4.12)   ε^{(m)} ≤ T_m(a/e) / |T_m(c/e)|,

a quantity which behaves like [ (a + √(a² − e²)) / (|c| + √(c² − e²)) ]^m for large m.
When the eigenvalues are all real, the ellipse degenerates to the interval
[λ_1, λ_N], and we shall have e = a = (λ_N − λ_1)/2 and c = (λ_1 + λ_N)/2, so that the ratio
(|c| + √(c² − e²)) / (a + √(a² − e²)) becomes γ + √(γ² − 1) with γ = (λ_N + λ_1)/(λ_N − λ_1).
This means that the result (4.12) is consistent with that of Theorem 4.3 when the spectrum lies on the real line.
Consider now the family of all ellipses having center c and major semiaxis a, and
let the eccentricity e decrease from a to zero. Then the ellipse will pass from the
interval (c − a, c + a) to the circle with center c and radius a. It is easily seen
that the rate of convergence indicated by (4.12) will decrease, the convergence factor
increasing from a/(|c| + √(c² − a²)) for the interval to a/|c| for the circle, which is the
subject of the next theorem.
THEOREM 4.5. Suppose that there exists a disk D(c, a), with center c and radius a,
that contains all the eigenvalues of A and not the origin. Then

(4.14)   ε^{(m)} ≤ (a/|c|)^m.

Proof. Consider the particular polynomial p(z) = [(c − z)/c]^m; p has degree m
and satisfies p(0) = 1. Hence, by (2.13),

ε^{(m)} ≤ max_{j=1,...,N} |(c − λ_j)/c|^m ≤ (a/|c|)^m. □
FIGURE 3. The disk D(c, a) in the (Re(z), Im(z)) plane.
The coefficient a/|c| in (4.14) is smaller than one, and one can even choose an
"optimal" circle for which a/|c| is the least. The optimal center ĉ should minimize
max_{j=1,...,N} |(c − λ_j)/c| over all complex c, c ≠ 0, and the optimal radius â is
simply max_{j=1,...,N} |ĉ − λ_j|. The inequality (4.14) is the best bound possible for ε^{(m)}
that can be obtained by replacing the discrete set {λ_1, ..., λ_N} by the disk D(c, a)
in the formula (2.13). This is due to the next theorem, proved by Zarantonello in
[22].
THEOREM 4.6. The polynomial ((c − z)/c)^m is the polynomial of degree m having
least uniform norm over the disk D(c, a) when a < |c|. Furthermore,

min_{p ∈ P_m, p(0)=1} max_{z ∈ D(c,a)} |p(z)| = (a/|c|)^m.
5. Numerical Experiments.
5.1. The first experiment deals with a block-diagonal matrix A whose diagonal consists of the 2 × 2 blocks

D_k = [  d_k   e_k ]
      [ −e_k   d_k ],   k = 1, 2, ..., n.

The d_k and e_k are chosen in such a way that the eigenvalues λ_k = d_k ± i e_k of A lie
on the ellipse having center c = 1 and major semiaxis a = 0.8. The eccentricity e
varies from e = 0 to e = 0.8. The real parts d_k of the eigenvalues are uniformly
distributed on the interval [c − a, c + a]; in other words,

d_k = 0.2 + (k − 1) · 1.6/(n − 1),   k = 1, ..., n.
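A sketch of how such a test matrix can be assembled is given below (Python/NumPy; n = 40 is suggested by the 40 double eigenvalues mentioned next, and the formula for e_k is our assumption, chosen so that d_k ± i e_k lie exactly on the stated ellipse).

    import numpy as np

    def ellipse_spectrum_matrix(n=40, c=1.0, a=0.8, ecc=0.4):
        """Block-diagonal matrix with 2x2 blocks [[d, e], [-e, d]] whose eigenvalues
        d_k +/- i e_k lie on the ellipse of center c, major semiaxis a and
        focal distance ecc; the d_k are equally spaced in [c-a, c+a]."""
        d = c - a + 2 * a * np.arange(n) / (n - 1)
        minor = np.sqrt(a**2 - ecc**2)                    # minor semiaxis of the ellipse
        e = minor * np.sqrt(np.maximum(0.0, 1.0 - ((d - c) / a) ** 2))
        A = np.zeros((2 * n, 2 * n))
        for k in range(n):
            A[2*k:2*k+2, 2*k:2*k+2] = [[d[k], e[k]], [-e[k], d[k]]]
        return A

    if __name__ == "__main__":
        A = ellipse_spectrum_matrix()
        print(np.linalg.eigvals(A)[:4])   # eigenvalues d_k +/- i e_k on the ellipse

When ecc = a the minor semiaxis vanishes, so the spectrum collapses onto the real interval [c − a, c + a], as described in the next remark.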
TABLE 1
Note that in passing from e = 0.79 to e = 0.80 the spectrum of the matrix A
becomes purely real and consists of 40 double eigenvalues, which explains the jump
in the actual rate of convergence.
The values ρ_act and ρ_est of Table 1 are plotted in Figure 4.
FIGURE 4. Actual and estimated rates of convergence versus eccentricity.
5.2. The second experiment deals with a block-tridiagonal matrix of the form

A = tridiag(−I, B, −I),

where B is a tridiagonal matrix with off-diagonal entries a = −1 + δ and b = −1 − δ.
FIGURE 5. Residual norms (log scale) versus number of steps.
In this example the cost of computing Av dominates all the other costs at each step, but this will not
always be the case. Figure 5 also shows that, when the matrix-by-vector multiplication
is costly, it may be advantageous to choose m as large as possible.
5.3. In the previous example the matrix treated is nearly symmetric, so the
use of the incomplete orthogonalization method without correction is more suitable.
Taking p = 2, and starting with the same initial vector as in the experiment of
5.2, yielded a rapidly decreasing sequence of residual norm estimates. No restart
was necessary, and convergence occurred after 90 steps with a residual norm equal
to 4.6 × 10^{-11}. Clearly, the amount of work required here is far less than that
required by either of the methods compared in 5.2.
5.4. We shall now compare the incomplete orthogonalization methods with and
without corrective step on the 100 × 100 block-tridiagonal matrix A of Subsection
5.2, obtained by taking δ = 0.2. In a first test an iterative method based upon the
incomplete orthogonalization algorithm with correction (Algorithm 3.4) was tried.
As soon as the estimate β h_{m+1,m}|e_m^T y_m| of the residual norm stops decreasing, or
when the number of steps reaches the maximum number of steps allowed, m_max = 40,
the algorithm is halted, a corrective step is taken, and the algorithm is either
stopped (if the residual norm is small enough) or restarted. For the present
example, the algorithm halted first at m = 20 and gave a residual norm of 1.8.
After the correction step, the residual norm dropped to 6.2 × 10^{-3}. In the
second iteration the algorithm halted at m = m_max = 40 and gave the residual
norms 9.6 × 10^{-5} before the correction and 1.14 × 10^{-6} after.
It is important to mention that, here, the corrective steps necessitate the use of
the bidiagonalization algorithm to compute the corrective column s_m, which is
usually very expensive.
The results obtained with the incomplete orthogonalization method without
correction are by far superior from the point of view of run times. Algorithm
3.5 was first tested with p = 2. At the first iteration the residual norms decreased
from 7.6 to 1.8 at the 15th step, and then a restart was made. At the second iteration
the residual norms kept decreasing rapidly, down to 2.1 × 10^{-6} at the 60th step. The test
with p = 4 yielded a steadily decreasing sequence of residual norm estimates, and
therefore no restart was necessary. The final residual norm obtained at
m = 60 was 7.88 × 10^{-7}.
5.5. Finally, we shall describe an experiment on a more difficult example
considered in [19]. The runs reported below have been made on a CDC CYBER
175 computer using a word of 60 bits and a mantissa of 48 bits (single precision).
The problem Ax = b treated has dimension N = 1000, and the nonzero part of A
consists of 7 diagonals. (The nonzero elements of the first row and first column of A are
A_{1,1}, A_{1,2}, A_{1,10}, A_{1,11}, A_{2,1}, A_{10,1}, A_{11,1}.) The problem originated from a
reservoir simulation problem.
FIGURE 6. Residual norms (log scale) versus iterations.
Two runs have been made with Algorithm 3.5, the first with p = 2 and the
second with p = 4. The same preconditioning matrix M = LU as above has been
used. Figure 6 shows the evolution of the residual norms ||M^{-1}Ax^{(k)} − M^{-1}b|| and
confirms the remarks ending Section 3. In either case, no restart was necessary.
REFERENCES
1. W. E. ARNOLDI, "The principle of minimized iterations in the solution of the matrix eigenvalue problem," Quart. Appl. Math., v. 9, 1951, pp. 17-29.
2. O. AXELSSON, Solution of Linear Systems of Equations: Iterative Methods, Lecture Notes in Math., vol. 572 (V. A. Barker, Ed.), Springer-Verlag, Berlin and New York, 1977, pp. 1-51.
3. Å. BJÖRCK & T. ELFVING, "Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations," BIT, v. 19, 1979, pp. 145-163.
4. A. CLAYTON, Further Results on Polynomials Having Least Maximum Modulus Over an Ellipse in the Complex Plane, UKAEA Report AEEW-7348, 1963.
5. P. CONCUS & G. H. GOLUB, A Generalized Conjugate Gradient Method for Non-Symmetric Systems of Linear Equations, Report STAN-CS-75-535, Computer Science Dept., Stanford University, 1976.
6. D. K. FADDEEV & V. N. FADDEEVA, Computational Methods of Linear Algebra, Freeman, San Francisco, Calif., 1963.
7. A. S. HOUSEHOLDER, The Theory of Matrices in Numerical Analysis, Blaisdell, New York, 1964.
8. M. A. KRASNOSELSKII ET AL., Approximate Solutions of Operator Equations, Wolters-Noordhoff, Groningen, 1972.
9. C. LANCZOS, "Solution of systems of linear equations by minimized iterations," J. Res. Nat. Bur. Standards, v. 49, 1952, pp. 33-53.
10. V. I. LEBEDEV, "Iterative methods for solution of operator equations with their spectrum on several intervals," Z. Vycisl. Mat. i Mat. Fiz., v. 9, 1969, pp. 1247-1252.
11. G. G. LORENTZ, Approximation of Functions, Holt, Rinehart & Winston, New York, 1966.
12. T. A. MANTEUFFEL, An Iterative Method for Solving Nonsymmetric Linear Systems With Dynamic Estimation of Parameters, Report UIUCDCS-R-75-758, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign; Ph.D. thesis, 1975.
13. C. C. PAIGE, "Bidiagonalization of matrices and solution of linear equations," SIAM J. Numer. Anal., v. 11, 1974, pp. 197-209.
14. C. C. PAIGE & M. A. SAUNDERS, "Solution of sparse indefinite systems of linear equations," SIAM J. Numer. Anal., v. 12, 1975, pp. 617-629.
15. B. N. PARLETT, "A new look at the Lanczos algorithm for solving symmetric systems of linear equations," Linear Algebra Appl., v. 29, 1980, pp. 323-346.
16. J. K. REID, "On the method of conjugate gradients for the solution of large sparse systems of linear equations," in Large Sparse Sets of Linear Equations (J. K. Reid, Ed.), Academic Press, New York, 1971.
17. T. J. RIVLIN, The Chebyshev Polynomials, Wiley, New York, 1976.
18. Y. SAAD, "Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices," Linear Algebra Appl., v. 34, 1980, pp. 269-295.
19. P. E. SAYLOR, Richardson's Iteration With Dynamic Parameters and the SIP Approximate Factorization for the Solution of the Pressure Equation, Society of Petroleum Engineers of AIME Fifth Symposium on Reservoir Simulation, Denver, Colorado, 1979, SPE 7688.
20. G. W. STEWART, Introduction to Matrix Computations, Academic Press, New York, 1973.
21. H. L. STONE, "Iterative solution of implicit approximations of multidimensional partial differential equations," SIAM J. Numer. Anal., v. 5, 1968, pp. 530-558.
22. R. S. VARGA, "A comparison of the successive overrelaxation method and semi-iterative methods using Chebyshev polynomials," J. Soc. Indust. Appl. Math., v. 5, 1957, pp. 39-46.
23. H. E. WRIGLEY, "Accelerating the Jacobi method for solving simultaneous equations by Chebyshev extrapolation when the eigenvalues of the iteration matrix are complex," Comput. J., v. 6, 1963, pp. 169-176.