COMPACT NUMERICAL
METHODS
FOR COMPUTERS
linear algebra and
function minimisation
Second Edition
J C NASH
ISBN 0-85274-318-1
ISBN 0-85274-319-X (pbk)
ISBN 0-7503-0036-1 (5¼" IBM disc)
ISBN 0-7503-0043-4 (3½" IBM disc)
1. A STARTING POINT 1
1.1. Purpose and scope 1
1.2. Machine characteristics 3
1.3. Sources of programs 9
1.4. Programming languages used and structured programming 11
1.5. Choice of algorithms 13
1.6. A method for expressing algorithms 15
1.7. General notation 17
1.8. Software engineering issues 17
6. LINEAR EQUATIONS-A DIRECT APPROACH 72
6.1. Introduction 72
6.2. Gauss elimination 72
6.3. Variations on the theme of Gauss elimination 80
6.4. Complex systems of equations 82
6.5. Methods for special matrices 83
APPENDICES 253
1. Nine test matrices 253
2. List of algorithms 255
3. List of examples 256
4. Files on the software diskette 258
BIBLIOGRAPHY 263
INDEX 271
PREFACE TO THE SECOND EDITION
The first edition of this book was written between 1975 and 1977. It may come as a
surprise that the material is still remarkably useful and applicable in the solution of
numerical problems on computers. This is perhaps due to the interest of researchers
in the development of quite complicated computational methods which require
considerable computing power for their execution. More modest techniques have
received less time and effort from investigators. However, it has also been the case that
the algorithms presented in the first edition have proven to be reliable yet simple.
The need for simple, compact numerical methods continues, even as software
packages appear which relieve the user of the task of programming. Indeed, such
methods are needed to implement these packages. They are also important when
users want to perform a numerical task within their own programs.
The most obvious difference between this edition and its predecessor is that the
algorithms are presented in Turbo Pascal, to be precise, in a form which will operate
under Turbo Pascal 3.01a. I decided to use this form of presentation for the following
reasons:
Section 1.6 and appendix 4 give some details about the codes and especially the
driver and support routines which provide examples of use.
The realization of this edition was not totally an individual effort. My research
work, of which this book represents a product, is supported in part by grants from
the Natural Sciences and Engineering Research Council of Canada. The Mathema-
tics Department of the University of Queensland and the Applied Mathematics
Division of the New Zealand Department of Scientific and Industrial Research
provided generous hospitality during my 1987-88 sabbatical year, during which a
great part of the code revision was accomplished. Thanks are due to Mary Walker-
Smith for reading early versions of the codes, to Maureen Clarke of IOP Publishing
Ltd for reminders and encouragement, and to the Faculty of Administration of the
University of Ottawa for use of a laser printer to prepare the program codes. Mary
Nash has been a colleague and partner for two decades, and her contribution to this
project in many readings, edits, and innumerable other tasks has been a large one.
In any work on computation, there are bound to be errors, or at least program
structures which operate in unusual ways in certain computing environments. I
encourage users to report to me any such observations so that the methods may be
improved.
J. C. Nash
Ottawa, 12 June 1989
PREFACE TO THE FIRST EDITION
(ii) in the United States, the members of the Applied Mathematics Division of the
Argonne National Laboratory who have taken such an interest in the algorithms,
and Stephen Nash who has pointed out a number of errors and faults; and
(iii) in Canada, the members of the Economics Branch of Agriculture Canada for
presenting me with such interesting problems to solve, Kevin Price for careful and
detailed criticism, Bob Henderson for trying out most of the algorithms, Richard
Wang for pointing out several errors in chapter 8, John Johns for trying (and
finding errors in) eigenvalue algorithms, and not least Mary Nash for a host of
corrections and improvements to the book as a whole.
It is a pleasure to acknowledge the very important roles of Neville Goodman
and Geoff Amor of Adam Hilger Ltd in the realisation of this book.
J. C. Nash
Ottawa, 22 December 1977
Chapter 1
A STARTING POINT
q+r>q (1.8)
will be found. On a machine which truncates, r is then the radix R. However, if
the machine rounds in some fashion, the condition (1.8) may be satisfied for r < R.
Nevertheless, the representations of q and (q + r) will differ by R. In the example,
doubling will produce q = 16 384 which will be represented as
0·1638 * 1E5
so q + r is represented as
0·1639 * 1E5
for some r ≤ 10. Then subtraction of these gives
0·0001 * 1E5 = 10.
Unfortunately, it is possible to foresee situations where this will not work.
Suppose that q = 99 990, then we have
0·9999 * 1E5 + 10 = 0·1000 * 1E6
and
0·1000 * 1E6–0·9999 * 1E5 = R'.
But if the second number in this subtraction is first transformed to
0·0999 * 1E6
then R' is assigned the value 100. Successive doubling should not, unless the
machine arithmetic is extremely unusual, give q this close to the upper bound of
(1.6).
Suppose that R has been found and that it is greater than two. Then if the
representation of q + (R – 1) is greater than that of q, the machine we are using
rounds, otherwise it chops or truncates the results of arithmetic operations.
The number of radix digits t is now easily found as the smallest integer such
that
R^t + 1
is represented identically to R^t. Thus the machine precision is given as
eps = R^(1-t) = R^(-(t-1)). (1.9)
In the example, R = 10, t = 4, so
R^(-3) = 0·001.
Thus
1 + 0·001 = 1·001 > 1
but 1 + 0·0009 is, on a machine which truncates, represented as 1.
In all of the previous discussion concerning the computation of the machine
precision it is important that the representation of numbers be that in the
memory, not in the working registers where extra digits may be carried. On a
Hewlett-Packard 9830, for instance, it was necessary when determining the
so-called ‘split precision’ to store numbers specifically in array elements to force
the appropriate truncation.
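To make the above process concrete, the following fragment is a minimal sketch, in the Turbo Pascal dialect used throughout this book, of a program which attempts to discover the radix and the number of radix digits, and hence the machine precision. It is illustrative only and is not the calceps function employed by the algorithms later in the book; as just noted, extended-precision working registers may upset such tests on some systems, so results should be checked with values forced into ordinary memory.

program findeps;
{ Illustrative sketch only: estimate the radix R, the number of radix
  digits t, and the machine precision eps = R^(1-t) of equation (1.9). }
var
  q, r, radix, Rt, eps : real;
  t : integer;
begin
  q := 1.0;                            { find q so large that q+1 is not held exactly }
  while ((q + 1.0) - q) = 1.0 do q := 2.0 * q;
  r := 1.0;                            { smallest r for which q+r differs from q }
  while (q + r) = q do r := r + 1.0;
  radix := r;                          { on a truncating machine this is the radix R }
  t := 1; Rt := radix;                 { find t, the number of radix digits }
  while ((Rt + 1.0) - Rt) = 1.0 do
  begin
    t := t + 1; Rt := Rt * radix
  end;
  eps := radix / Rt;                   { eps = R^(1-t), equation (1.9) }
  writeln('radix = ', radix:4:0, '  digits = ', t, '  eps = ', eps)
end.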
The above discussion has assumed a model of floating-point arithmetic which may
be termed an additive form in that powers of the radix are added together and the
entire sum multiplied by some power of the radix (the exponent) to provide the final
quantity representing the desired real number. This representation may or may not
be exact. For example, the fraction 1/10 cannot be exactly represented in additive binary
(radix 2) floating-point arithmetic. While there are other models of floating-point
arithmetic, the additive form is the most common, and is used in the IEEE binary
and radix-free floating-point arithmetic standards. (The March 1981 issue of IEEE
Computer magazine, volume 14, number 3, pages 51-86, contains a lucid description of
the binary standard and its motivations.)
If we are concerned with having absolute upper and lower bounds on computed
quantities, interval arithmetic is possible, but not commonly supported by program-
ming languages (e.g. Pascal SC (Kulisch 1987)). Despite the obvious importance of
assured bounds on results, the perceived costs of using interval arithmetic have
largely prevented its widespread use.
The development of standards for floating-point arithmetic has the great benefit
that results of similar calculations on different machinery should be the same.
Furthermore, manufacturers have been prompted to develop hardware implemen-
tations of these standards, notably the Intel 80x87 family and the Motorola 68881
of circuit devices. Hewlett-Packard implemented a decimal version of the IEEE 854
standard in their HP-71B calculator.
Despite such developments, there continues to be much confusion and misinfor-
mation concerning floating-point arithmetic. Because an additive decimal form of
arithmetic can represent fractions such as 1/10 exactly, and in general avoid input-
output conversion errors, developers of software products using such arithmetic
(usually in binary coded decimal or BCD form) have been known to claim that it has
'no round-off error', which is patently false. I personally prefer decimal arithmetic, in
that data entered into a calculation can generally be represented exactly, so that a
display of the stored raw data reproduces the input familiar to the user. Nevertheless,
the differences between good implementations of floating-point arithmetic, whether
binary or decimal, are rarely substantive.
While the subject of machine arithmetic is still warm, note that the mean of two
numbers may be calculated to be smaller or greater than either! An example in
four-figure decimal arithmetic will serve as an illustration of this.
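One such illustration, with values of my own choosing rather than those of the original example, is sketched below; the function chop4 simulates the chopping of each positive result to four decimal digits.

program meanout;
{ Sketch only: in four-digit chopped decimal arithmetic the computed mean
  (a+b)/2 can lie outside the interval [a, b].  The sample values a = 5007
  and b = 5009 are my own choice for illustration. }
var
  a, b, amean : real;
function chop4(x : real) : real;
{ simulate chopping of a result x >= 1 to four decimal digits }
var
  e : real;
begin
  e := 1.0;
  while x >= 10000.0 * e do e := 10.0 * e;  { scale so x/e has four digits before the point }
  chop4 := trunc(x / e) * e                 { discard digits beyond the fourth }
end;
begin
  a := 5007.0; b := 5009.0;
  amean := chop4(chop4(a + b) / 2.0);
  writeln('a = ', a:6:0, '  b = ', b:6:0, '  computed mean = ', amean:6:0)
end.

In exact arithmetic the mean of 5007 and 5009 is 5008, but the sum 10016 chops to 10010, so the computed mean is 5005, which is smaller than either number.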
SEND INDEX
to the pseudo-user NETLIB at node ANL-MCS on the ARPA network (Dongarra and
Grosse 1987). The software itself may be obtained by a similar mechanism.
Suppliers such as the Numerical Algorithms Group (NAG), International Math-
ematical and Statistical Libraries (IMSL), C Abaci, and others, have packages
designed for users of various computers and compilers, but provide linkable object
code rather than the FORTRAN source. C Abaci, in particular, allows users of the
Scientific Desk to also operate the software within what is termed a ‘problem solving
environment’ which avoids the need for programming.
For languages other than FORTRAN, less software is available. Several collections of
programs and procedures have been published as books, some with accompanying
diskettes, but once again, the background and true authorship may be lacking. The
number of truly awful examples of badly chosen, badly coded algorithms is alarming,
and my own list of these too long to include here.
Several sources I consider worth looking at are the following.
Maindonald (1984)
—A fairly comprehensive collection of programs in BASIC (for a Digital Equip-
ment Corporation VAX computer) is presented, covering linear estimation,
statistical distributions and pseudo-random numbers.
Nash and Walker-Smith (1987)
—Source codes in BASIC are given for six nonlinear minimisation methods and a
large selection of examples. The algorithms correspond, by and large, to those
presented later in this book.
LEQBO5 (Nash 1984b, 1985)
—This single ‘program’ module (actually there are three starting points for
execution) was conceived as a joke to show how small a linear algebra package
could be made. In just over 300 lines of BASIC is the capability to solve linear
equations, linear least squares, matrix inverse and generalised inverse, sym-
metric matrix eigenproblem and nonlinear least squares problems. The joke
back-fired in that the capability of this program, which ran on the Sinclair ZX81
computer among other machines, is quite respectable.
Kahaner, Moler and Nash (1989)
—This numerical analysis textbook includes FORTRAN codes which illustrate the
material presented. The authors have taken pains to choose this software for
quality. The user must, however, learn how to invoke the programs, as there is
no user interface to assist in problem specification and input.
Press et al (1986) Numerical Recipes
—This is an ambitious collection of methods with wide capability. Codes are
offered in FORTRAN, Pascal, and C. However, it appears to have been only
superficially tested and the examples presented are quite simple. It has been
heavily advertised.
Many other products exist and more are appearing every month. Finding out
about them requires effort, the waste of which can sometimes be avoided by using
modern online search tools. Sadly, more effort is required to determine the quality of
the software, often after money has been spent.
Finally on sources of software, readers should be aware of the Association for
Computing Machinery (ACM) Transactions on Mathematical Software which pub-
lishes research papers and reports algorithms. The algorithms themselves are avail-
able after a delay of approximately 1 year on NETLIB and are published in full in the
Collected Algorithms of the ACM. Unfortunately, many are now quite large pro-
grams, and the Transactions on Mathematical Software (TOMS) usually only
publishes a summary of the codes, which is insufficient to produce a working
program. Moreover, the programs are generally in FORTRAN.
Other journals which publish algorithms in some form or other are Applied
Statistics (Journal of the Royal Statistical Society, Part C), the Society for Industrial
and Applied Mathematics (SIAM) journals on Numerical Analysis and on Scientific
and Statistical Computing, the Computer Journal (of the British Computer Society),
as well as some of the specialist journals in computational statistics, physics,
chemistry and engineering. Occasionally magazines, such as Byte or PC Magazine,
include articles with interesting programs for scientific or mathematical problems.
These may be of very variable quality depending on the authorship, but some
exceptionally good material has appeared in magazines, which sometimes offer the
codes in machine-readable form, such as the Byte Information Exchange (BIX) and
disk ordering service. The reader has, however, to be assiduous in verifying the
quality of the programs.
2.1. INTRODUCTION
A great many practical problems in the scientific and engineering world give rise
to models or descriptions of reality which involve matrices. In consequence, a very
large proportion of the literature of numerical mathematics is devoted to the
solution of various matrix equations. In the following sections, the major formal
problems in numerical linear algebra will be introduced. Some examples are
included to show how these problems may arise directly in practice. However, the
formal problems will in most cases occur as steps in larger, more difficult
computations. In fact, the algorithms of numerical linear algebra are the
keystones of numerical methods for solving real problems.
Matrix computations have become a large area for mathematical and compu-
tational research. Textbooks on this subject, such as Stewart (1973) and Strang
(1976), offer a foundation useful for understanding the uses and manipulations of
matrices and vectors. More advanced works detail the theorems and algorithms for
particular situations. An important collection of well-referenced material is Golub
and Van Loan (1983). Kahaner, Moler and Nash (1989) contains a very readable
treatment of numerical linear algebra.
x1a1 + x2a2 + . . . + xnan = b (2.3)
The n vectors aj are linearly dependent if some combination of them with coefficients cj, not all zero, gives
c1a1 + c2a2 + . . . + cnan = 0 (2.5)
where 0 is the null vector having all components zero. If the vectors aj are now
assembled to make the matrix A and are linearly independent, then it is always
possible to find an x such that (2.2) is satisfied. Other ways of stating the
condition that the columns of A are linearly independent are that A has full rank
or
rank(A) = n (2.6)
or that A is non-singular.
If only k < n of the vectors are linearly independent, then
rank(A) = k (2.7)
and A is singular. In general (2.2) cannot be solved if A is singular, though
consistent systems of equations exist where b belongs to the space spanned by
{aj: j = 1, 2, . . . , n}.
In practice, it is useful to separate linear-equation problems into two categories.
(The same classification will, in fact, apply to all problems involving matrices.)
(i) The matrix A is of modest order with probably few zero elements (dense).
(ii) The matrix A is of high order and has most of its elements zero (sparse).
The methods presented in this monograph for large matrices do not specifically
require sparsity. The question which must be answered when computing on a small
machine is, ‘Does the matrix fit in the memory available?’
Example 2.1. Mass-spectrograph calibration
To illustrate a use for the solution of a system of linear equations, consider the
determination of the composition of a mixture of four hydrocarbons using a mass
spectrograph. Four lines will be needed in the spectrum. At these lines the
intensity for the sample will be bi, i = 1, 2, 3, 4. To calibrate the instrument,
intensities Aij for the ith line using a pure sample of the j th hydrocarbon are
measured. Assuming additive line intensities, the composition of the mixture is
then given by the solution x of
Ax = b.
Example 2.2. Ordinary differential equations: a two-point boundary-value problem
Large sparse sets of linear equations arise in the numerical solution of differential
equations. Fröberg (1965, p 256) considers the differential equation
y'' + y/(1 + x^2) = 7x (2.8)
with the boundary conditions
y = 0 for x = 0 (2.9)
y = 2 for x = 1. (2.10)
To solve this problem numerically, Fröberg replaces the continuum in x on the
interval [0, 1] with a set of (n + 1) points, that is, the step size on the grid is
h = 1/n. The second derivative is therefore replaced by the second difference at
point j
(yj+1 – 2yj + yj-1)/h^2. (2.11)
The differential equation (2.8) is therefore approximated by a set of linear
equations of which the jth is
(yj+1 – 2yj + yj-1)/h^2 + yj/(1 + xj^2) = 7xj (2.12)
or, multiplying through by h^2 and putting xj = jh,
yj+1 + [h^2/(1 + j^2h^2) – 2]yj + yj-1 = 7jh^3. (2.13)
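The following fragment is an illustrative sketch of how such a system might be assembled; the array names and the choice n = 10 are mine and do not come from Fröberg.

program bvpsetup;
{ Sketch only: assemble the finite-difference equations (2.13) for the
  boundary-value problem (2.8)-(2.10), moving the boundary values y(0) = 0
  and y(1) = 2 to the right-hand side. }
const
  n = 10;
  nm1 = 9; { n - 1 interior grid points }
var
  A : array[1..nm1, 1..nm1] of real;
  b : array[1..nm1] of real;
  h, xj : real;
  i, j : integer;
begin
  h := 1.0 / n;
  for i := 1 to nm1 do
    for j := 1 to nm1 do A[i, j] := 0.0;
  for j := 1 to nm1 do
  begin
    xj := j * h;
    A[j, j] := h * h / (1.0 + xj * xj) - 2.0;   { coefficient of y(j) }
    if j > 1 then A[j, j - 1] := 1.0;           { coefficient of y(j-1) }
    if j < nm1 then A[j, j + 1] := 1.0;         { coefficient of y(j+1) }
    b[j] := 7.0 * xj * h * h                    { right-hand side 7*xj*h^2 }
  end;
  b[1] := b[1] - 0.0;       { boundary value y(0) = 0 contributes nothing }
  b[nm1] := b[nm1] - 2.0;   { boundary value y(1) = 2 }
  writeln('Tridiagonal system of order ', nm1, ' assembled.')
end.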
A+ A = (2.35)
but in this case x is not defined uniquely since it can contain arbitrary components
from the orthogonal complement of the space spanned by the columns of A. That
is, we have
x = A+ b + (1 n – A + A) g (2.36)
where g is any vector of order n.
The normal equations (2.22) must still be satisfied. Thus in the full-rank case, it
is straightforward to identify
A+ = (ATA)^(-1)AT. (2.37)
In the rank-deficient case, the normal equations (2.22) imply by substitution of
(2.36) that
ATAx = ATAA+b + (ATA – ATAA+A)g (2.38)
     = ATb.
If
AT AA+ = AT (2.39)
then equation (2.38) is obviously true. By requiring A+ to satisfy
AA + A = A (2.40)
and
(AA+)T = AA+ (2.41)
this can indeed be made to happen. The proposed solution (2.36) is therefore a
least-squares solution under the conditions (2.40) and (2.41) on A+. In order that
(2.36) gives the minimum-length least-squares solution, it is necessary that xT x be
minimal also. But from equation (2.36) we find
xTx = bT(A+)TA+b + gT(1 – A+A)T(1 – A+A)g + 2gT(1 – A+A)TA+b (2.42)
which can be seen to have a minimum at
g =0 (2.43)
if
(1 – A+A)T
is the annihilator of A+b, thus ensuring that the two contributions (that is, from b
and g) to x T x are orthogonal. This requirement imposes on A + the further
conditions
A+ AA + = A+ (2.44)
(A+ A)T = A+ A. (2.45)
The four conditions (2.40), (2.41), (2.44) and (2.45) were proposed by Penrose
(1955). The conditions are not, however, the route by which A + is computed.
Here attention has been focused on one generalised inverse, called the Moore-
Penrose inverse. It is possible to relax some of the four conditions and arrive at
other types of generalised inverse. However, these will require other conditions to
be applied if they are to be specified uniquely. For instance, it is possible to
consider any matrix which satisfies (2.40) and (2.41) as a generalised inverse of A
since it provides, via (2.33), a least-squares solution to equation (2.14). However,
in the rank-deficient case, (2.36) allows arbitrary components from the null space
of A to be added to this least-squares solution, so that the two-condition general-
ised inverse is specified incompletely.
Over the years a number of methods have been suggested to calculate ‘generalised
inverses’. Having encountered some examples of dubious design, coding or appli-
cations of such methods, I strongly recommend testing computed generalised inverse
matrices to ascertain the extent to which conditions (2.40), (2.41), (2.44) and (2.45)
are satisfied (Nash and Wang 1986).
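The fragment below is a minimal sketch of such a test, presented in the style of the procedures in this book but not taken from them. Given arrays A (m by n) and Aplus (n by m) it reports the largest deviations from conditions (2.40) and (2.44); the symmetry conditions (2.41) and (2.45) can be checked in the same way. The bound mx, the type rmat and all names are illustrative assumptions, not declarations from the book's support files.

const
  mx = 10;
type
  rmat = array[1..mx, 1..mx] of real;

procedure penrosetest(m, n : integer; var A, Aplus : rmat);
{ Sketch of a test of a computed generalised inverse Aplus of A. }
var
  i, j, k, jj : integer;
  s, t, dev40, dev44 : real;
begin
  dev40 := 0.0; dev44 := 0.0;
  for i := 1 to m do
    for j := 1 to n do
    begin
      t := 0.0;                    { element (i,j) of A (A+ A) }
      for k := 1 to n do
      begin
        s := 0.0;                  { element (k,j) of A+ A }
        for jj := 1 to m do s := s + Aplus[k, jj] * A[jj, j];
        t := t + A[i, k] * s
      end;
      if abs(t - A[i, j]) > dev40 then dev40 := abs(t - A[i, j])
    end;
  for i := 1 to n do
    for j := 1 to m do
    begin
      t := 0.0;                    { element (i,j) of A+ (A A+) }
      for k := 1 to m do
      begin
        s := 0.0;                  { element (k,j) of A A+ }
        for jj := 1 to n do s := s + A[k, jj] * Aplus[jj, j];
        t := t + Aplus[i, k] * s
      end;
      if abs(t - Aplus[i, j]) > dev44 then dev44 := abs(t - Aplus[i, j])
    end;
  writeln('largest deviation from condition (2.40), A A+ A = A   : ', dev40);
  writeln('largest deviation from condition (2.44), A+ A A+ = A+ : ', dev44)
end;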
The matrix A+ = VS+UT (2.56), where the elements of the diagonal matrix S+ are
S+ii = 1/Sii for Sii ≠ 0
S+ii = 0 for Sii = 0 (2.57)
satisfies the four conditions (2.40), (2.41), (2.44) and (2.45), that is
AA+A = USVT VS+UT USVT
     = USS+SVT (2.58)
     = USVT = A
(AA+)T = (USVT VS+UT)T = (USS+UT)T = USS+UT = AA+ (2.59)
A+AA+ = VS+UT USVT VS+UT
      = VS+SS+UT = VS+UT = A+ (2.60)
and
(A+A)T = (VS+UT USVT)T = (VS+SVT)T
       = VS+SVT = A+A. (2.61)
Several of the above relationships depend on the diagonal nature of S and S+ and
on the fact that diagonal matrices commute under multiplication.
(fi, fj) = δij = 1 for i = j
         = 0 for i ≠ j (2.69)
3.1. INTRODUCTION
This chapter presents an algorithm for accomplishing the powerful and versatile
singular-value decomposition. This allows the solution of a number of problems to
be realised in a way which permits instabilities to be identified at the same time.
This is a general strategy I like to incorporate into my programs as much as
possible since I find succinct diagnostic information invaluable when users raise
questions about computed answers; users do not in general raise too many idle
questions! They may, however, expect the computer and my programs to produce
reliable results from very suspect data, and the information these programs
generate together with a solution can often give an idea of how trustworthy are
the results. This is why the singular values are useful. In particular, the appear-
ance of singular values differing greatly in magnitude implies that our data are
nearly collinear. Collinearity introduces numerical problems simply because small
changes in the data give large changes in the results. For example, consider the
following two-dimensional vectors:
A = (1, 0)T
B = (1, 0·1)T
C = (0·95, 0·1)T.
A = (1, 0)T
D = (0, 1)T
E = (0, 0·95)T
The quantities Si may, as yet, be either positive or negative, since only their
square is defined by equation (3.2). They will henceforth be taken arbitrarily as
positive and will be called singular values of the matrix A. The vectors
uj = bj/Sj (3.5)
which can be computed when none of the Sj is zero, are unit orthogonal vectors.
Collecting these vectors into a real m by n matrix, and the singular values into a
diagonal n by n matrix, it is possible to write
B = US (3.6)
where
UT U = 1 n (3.7)
is a unit matrix of order n.
In the case that some of the Sj are zero, equations (3.1) and (3.2) are still valid,
but the columns of U corresponding to zero singular values must now be
constructed such that they are orthogonal to the columns of U computed via
equation (3.5) and to each other. Thus equations (3.6) and (3.7) are also satisfied.
An alternative approach is to set the columns of U corresponding to zero singular
values to null vectors. By choosing the first k of the singular values to be the
non-zero ones, which is always possible by simple permutations within the matrix
V, this causes the matrix UT U to be a unit matrix of order k augmented to order n
with zeros. This will be written
UTU = ( 1k  0 )
      ( 0   0 ) (3.8)
While not part of the commonly used definition of the svd, it is useful to require
the singular values to be sorted, so that
S11 ≥ S22 ≥ S33 ≥ . . . ≥ Skk ≥ . . . ≥ Snn.
The decomposition can then be written as a sum of outer products
A = u1S11v1T + u2S22v2T + . . . + unSnnvnT (2.53a)
giving a sequence of approximations Ã1, Ã2, . . . , Ãn, formed from the leading terms, with Ãn = A. The products
ujSjjvjT
can be referred to as singular planes, and the partial sums (in order of decreasing
singular values) are partial svds (Nash and Shlien 1987).
A combination of (3.1) and (3.6) gives
AV = US (3.9)
or, using (3.3), the orthogonality of V,
A = USVT (2.53)
which expresses the svd of A.
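As a small illustration of this construction, the sketch below extracts the singular values and the columns of U from an array holding B = AV with mutually orthogonal columns, setting the columns of U corresponding to negligible singular values to null vectors (the second approach just described). The type declarations, the tolerance and all names are my own assumptions and the fragment is not part of algorithm 1.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;
  rvec = array[1..mx] of real;

procedure extractUS(m, n : integer; var B, U : rmat; var S : rvec);
{ B holds the column-orthogonal matrix B = AV; on return S[j] is the jth
  singular value and column j of U is bj/Sj, or a null column when Sj is
  negligible. }
const
  tiny = 1.0e-30;
var
  i, j : integer;
  ss : real;
begin
  for j := 1 to n do
  begin
    ss := 0.0;
    for i := 1 to m do ss := ss + sqr(B[i, j]);
    S[j] := sqrt(ss);                          { equation (3.5): Sj is the norm of bj }
    for i := 1 to m do
      if S[j] > tiny then U[i, j] := B[i, j] / S[j]
      else U[i, j] := 0.0                      { null column for a zero singular value }
  end
end;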
The preceding discussion is conditional on the existence and computability of a
suitable matrix V. The next section shows how this task may be accomplished.
The matrix V is generated as a product of simpler matrices
V = V(1)V(2) . . . V(z) (3.10)
where z is some index not necessarily related to the dimensions m and n of A, the
matrix being decomposed. The matrices used in this product will be plane
rotations. If V (k) is a rotation of angle φ in the ij plane, then all elements of V( k )
will be the same as those in a unit matrix of order n except for
V(k)ii = V(k)jj = cos φ      V(k)ji = –V(k)ij = sin φ. (3.11)
Thus V (k) affects only two columns of any matrix it multiplies from the right.
These columns will be labelled x and y. Consider the effect of a single rotation
involving these two columns
(X  Y) = (x  y) ( cos φ   –sin φ )
                ( sin φ    cos φ ) (3.12)
Thus we have
X = x cos φ + y sin φ
(3.13)
Y = –x sin φ + y cos φ.
If the resulting vectors X and Y are to be orthogonal, then
XTY = 0 = –(xTx – yTy) sin φ cos φ + xTy(cos^2φ – sin^2φ). (3.14)
There is a variety of choices for the angle φ, or more correctly for the sine and
cosine of this angle, which satisfy (3.14). Some of these are mentioned by
Hestenes (1958), Chartres (1962) and Nash (1975). However, it is convenient if
the rotation can order the columns of the orthogonalised matrix B by length, so
that the singular values are in decreasing order of size and those which are zero
(or infinitesimal) are found in the lower right-hand corner of the matrix S as in
equation (3.8). Therefore, a further condition on the rotation is that
XT X – xT x > 0. (3.15)
For convenience, the columns of the product matrix
(3.16)
(3.17)
sgn(p) = 1 for p > 0
       = –1 for p < 0. (3.27)
Note that having two forms for the calculation of the functions of the angle of
rotation permits the subtraction of nearly equal numbers to be avoided. As the
matrix nears orthogonality p will become small, so that q and v are bound to have
nearly equal magnitudes.
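A single orthogonalising rotation of two columns can be sketched as follows. This is not algorithm 1 itself; it simply illustrates the two-form calculation of the sine and cosine discussed above, using the quantities p, q and r of the text with v = sqrt(4p^2 + (q – r)^2), and it satisfies the ordering condition (3.15). The type rmat, the bound mx and the procedure name are illustrative assumptions; no guard against overflow of the squared quantities is included.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;

procedure colrotate(m, j, k : integer; var W : rmat);
{ Orthogonalise columns j and k of the m by n working array W. }
var
  i : integer;
  p, q, r, d, v, c, s, temp : real;
begin
  p := 0.0; q := 0.0; r := 0.0;
  for i := 1 to m do
  begin
    p := p + W[i, j] * W[i, k];        { inner product xTy }
    q := q + sqr(W[i, j]);             { squared norm xTx }
    r := r + sqr(W[i, k])              { squared norm yTy }
  end;
  if p <> 0.0 then                     { columns already orthogonal when p = 0 }
  begin
    d := q - r;
    v := sqrt(4.0 * p * p + d * d);
    if d >= 0.0 then
    begin                              { form avoiding cancellation when q >= r }
      c := sqrt((v + d) / (2.0 * v)); s := p / (v * c)
    end
    else
    begin                              { alternative form for q < r }
      s := sqrt((v - d) / (2.0 * v)); if p < 0.0 then s := -s;
      c := p / (v * s)
    end;
    for i := 1 to m do                 { apply the rotation (3.13) to the two columns }
    begin
      temp := W[i, j];
      W[i, j] := c * temp + s * W[i, k];
      W[i, k] := -s * temp + c * W[i, k]
    end
  end
end;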
In the first edition of this book, I chose to perform the computed rotation only
when q > r, and to use the alternative rotation (3.28)
when q < r. This effects an interchange of the columns of the current matrix A.
However, I now believe that it is more efficient to perform the rotations as defined in
the code presented. The rotations (3.28) were used to force nearly null columns of the
final working matrix to the right-hand side of the storage array. This will occur when
the original matrix A suffers from linear dependencies between the columns (that is,
is rank deficient). In such cases, the rightmost columns of the working matrix
eventually reflect the lack of information in the data in directions corresponding to
the null space of the matrix A. The current methods cannot do much about this lack
of information, and it is not sensible to continue computations on these columns. In
the current implementation of the method (Nash and Shlien 1987), we prefer to
ignore columns at the right of the working matrix which become smaller than a
specified tolerance. This has a side effect of speeding the calculations significantly
when rank deficient matrices are encountered.
YTY – yTy = –(v – q)/2 (3.35)
to compute the updated column norms after each rotation. There is a danger that
nearly equal magnitudes may be subtracted, with the resultant column norm having a
large relative error. However, if the application requires information from the largest
singular values and vectors, this approach offers some saving of effort. The changes
needed are:
(1) an initial loop to compute the Z[i], that is, the sum of squares of the elements
of each column of the original matrix A;
(2) the addition of two statements to the end of the main svd loop on k, which,
if a rotation has been performed, update the column norms Z[j] and Z[k] via
formulae (3.34) and (3.35). Note that in the present algorithm the quantities
needed for these calculations have not been preserved. Alternatively, add at
the end of STEP 8 (after the rotation) the statements
Z[j] := q; Z[k] := r;
Hilbert segment:
Column orthogonality of U
Largest inner product is 5, 5 = -1.44016460160157E-006
Largest inner product is 3, 3 = 5.27355936696949E-016
Singular values
1.27515004411E+000 4.97081651063E-001 1.30419686491E-001 2.55816892287E-002
1.27515004411E+000 4.97081651063E-001 1.30419686491E-001 2.55816892259E-002
3.60194233367E-003
3.60194103682E-003
S+ii = 1/Sii for Sii > q
S+ii = 0 for Sii ≤ q (3.37)
where q is some tolerance set by the user. The use of the symbol q for the tolerance
is not coincidental. The previous employment of this symbol in computing the
rotation parameters and the norm of the orthogonalised columns of the resulting
matrix is finished, and it can be re-used.
Permitting S + to depend on a user-defined tolerance places upon him/her the
responsibility for deciding the degree of linear dependence in his/her data. In an
economic modelling situation, for instance, columns of U corresponding to small
singular values are almost certain to be largely determined by errors or noise in
the original data. On the other hand, the same columns when derived from the
tracking of a satellite may contain very significant information about orbit
perturbations. Therefore, it is not only difficult to provide an automatic definition
for S +, it is inappropriate. Furthermore, the matrix B = US contains the principal
components (Kendall and Stewart 1958-66, vol 3, p 286). By appropriate
choices of q in equation (3.37), the solutions x corresponding to only a few of the
dominant principal components can be computed. Furthermore, at this stage in
the calculation UT b should already have been computed and saved, so that only a
simple matrix-vector multiplication is involved in finding each of the solutions.
Another way to look at this is to consider the least-squares problem
Bw ≈ b (3.38)
where B is the matrix having orthogonal columns and is given in equations (3.1)
and (3.6). Thus the normal equations corresponding to (3.38) are
BTBw = S2w = BTb. (3.39)
But S2 is diagonal so that the solutions are easily obtained as
w = S^(-2)BTb (3.40)
and substitution of (3.6) gives
w = S^(-1)UTb. (3.41)
Should the problem be singular, then
w = S+UTb (3.42)
can be used. Now note that because
BVT = A (3.43)
from (3.1), the solution w allows x to be computed via
x = Vw . (3.44)
The coefficients w are important as the solution of the least-squares problem in
terms of the orthogonal combinations of the original variables called the principal
components. The normalised components are contained in U. It is easy to
rearrange the residual sum of squares so that
rTr = (b – Ax)T(b – Ax) = (b – Bw)T(b – Bw) = bTb – bTBw (3.45)
by virtue of the normal equations (3.39). However, substituting (3.37) in (3.42)
and noting the ordering of S, it is obvious that if
Sk+1,k+1 < q (3.46)
is the first singular value less than or equal to the tolerance, then
wi = 0 for i > k. (3.47)
The components corresponding to small singular values are thus dropped from the
solution. But it is these components which are the least accurately determined
since they arise as differences. Furthermore, from (3.6) and (3.45)
rTr = bTb – bTUSS+UTb
    = bTb – (u1Tb)^2 – (u2Tb)^2 – . . . – (ukTb)^2 (3.48)
where the limit of the sum in (3.48) is k, the number of principal components
which are included. Thus inclusion of another component cannot increase the
residual sum of squares. However, if a component with a very small singular value
is introduced, it will contribute a very large amount to the corresponding element
of w, and x will acquire large elements also. From (3.48), however, it is the
interaction between the normalised component uj and b which determines how
much a given component reduces the sum of squares. A least-squares problem
will therefore be ill conditioned if b is best approximated by a column of U which
is associated with a small singular value and thus may be computed inaccurately.
On the other hand, if the components corresponding to ‘large’ singular values
are the ones which are responsible for reducing the sum of squares, then the
problem has a solution which can be safely computed by leaving out the
components which make the elements of w and x large without appreciably
reducing the sum of squares. Unless the unwanted components have no part in
reducing the sum of squares, that is unless
uiT b = 0 for i > k (3.49)
under the same condition (3.46) for k, then solutions which omit these components
are not properly termed least-squares solutions but principal-components solutions.
In many least-squares problems, poorly determined components will not arise,
all singular values being of approximately the same magnitude. As a rule of
thumb for my clients, I suggest they look very carefully at their data, and in
particular the matrix A, if the ratio of the largest singular value to the smallest
exceeds 1000. Such a distribution of singular values suggests that the columns of A
are not truly independent and, regardless of the conditioning of the problem as
discussed above, one may wish to redefine the problem by leaving out certain
variables (columns of A) from the set used to approximate b.
Algorithm 2. Least-squares solution via singular-value decomposition
procedure svdlss(nRow, nCol: integer; {order of problem}
W : wmatrix; {working array with decomposition}
Y : rvector; {right hand side vector}
Z : rvector; {squares of singular values}
A : rmatrix; {coefficient matrix (for residuals)}
var Bvec: rvector); {solution vector}
{alg02.pas ==
least squares solution via singular value decomposition.
On entry, W must have the working matrix resulting from the operation of
NashSVD on a real matrix A in alg1.pas. Z will have the squares of the
singular values. Y will have the vector to be approximated. Bvec will be
the vector of parameters (estimates) returned. Note that A could be
omitted if residuals were not wanted. However, the user would then lose
the ability to interact with the problem by changing the tolerance q.
Because this uses a slightly different decomposition from that in the
first edition of Compact Numerical Methods, the step numbers are not
given.
Copyright 1988 J. C. Nash
}
var
i, j, k : integer;
q, s : real;
In the above code the residual sum of squares is computed in the separate procedure resids.pas.
In alg02.pas, I have not included step numbers because the present code is quite different from the
original algorithm.
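The following fragment is a minimal sketch of the central computation carried out by such a least-squares solver: given explicit arrays for U, the singular values S and V, it forms the principal-components solution x = VS+UTb of equations (3.42) and (3.44) for a tolerance q. It is not alg02.pas, which works with the decomposition left in the working array by NashSVD; the types, the bound mx and all names below are illustrative assumptions.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;
  rvec = array[1..mx] of real;

procedure svdsolve(m, n : integer; var U, V : rmat; var S, b, x : rvec; q : real);
{ Least-squares (principal-components) solution from an explicit svd. }
var
  i, j : integer;
  t : real;
  w : rvec;
begin
  for j := 1 to n do
  begin
    t := 0.0;
    for i := 1 to m do t := t + U[i, j] * b[i];   { t = ujT b }
    if S[j] > q then w[j] := t / S[j]             { w = S+ UT b, equation (3.42) }
    else w[j] := 0.0                              { component dropped, equation (3.47) }
  end;
  for i := 1 to n do
  begin
    t := 0.0;
    for j := 1 to n do t := t + V[i, j] * w[j];   { x = Vw, equation (3.44) }
    x[i] := t
  end
end;

With q = 0 all components are retained and the ordinary least-squares solution results; increasing q drops the components associated with the smallest singular values.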
Example 3.1. The generalised inverse of a rectangular matrix via the singular-value
decomposition
Given the matrices U, V and S of the singular-value decomposition (2.53), then by
the product
A+ = VS+ UT (2.56)
the generalised (Moore-Penrose) inverse can be computed directly. Consider the
matrix
and
The generalised inverse using the definition (2.57) of S+ is then (to six figures)
in place of
In the above solutions and products, all figures printed by the HP 9830 have been
given rather than the six-figure approximations used earlier in the example.
(3.50)
(3.52)
R2 and the corrected R2 of (3.52) provide measures of the goodness of fit of our model which are not
dependent on the scale of the data.
Using the last four columns of table 3.1 together with a column of ones for the
matrix A in algorithm 2, with the first column of the table as the dependent
variable b, a Data General NOVA operating in 23-bit binary floating-point
arithmetic computes the singular values:
The ratio of the smallest of these to the largest is only very slightly larger than the
machine precision, 2^(-22), and we may therefore expect that a great number of
extremely different models may give very similar degrees of approximation to the
data. Solutions (a), (b), (c) and (d) in table 3.2 therefore present the solutions
corresponding to all, four, three and two principal components, respectively. Note
that these have 8, 9, 10 and 11 degrees of freedom because we estimate the
coefficients of the principal components, then transform these to give solutions in
terms of our original variables. The solution given by only three principal
components is almost as good as that for all components, that is, a conventional
least-squares solution. However, the coefficients in solutions (a), (b) and (c) are
very different.
Neither the algorithms in this book nor those anywhere else can make a clear
and final statement as to which solution is ‘best’. Here questions of statistical
significance will not be addressed, though they would probably enter into consi-
deration if we were trying to identify and estimate a model intended for use in
TABLE 3.2. Solutions for various principal-component regressions using the data in table 3.1.
Tolerance for zero    R2    xconstant    xnitrogen    xphosphate    xpotash    xpetroleum
The values in parentheses below each R2 are the corrected statistic given by formula (3.52).
some analysis or prediction. To underline the difficulty of this task, merely
consider the alternative model
income = x1 + x3 (phosphate) (3.53)
for which the singular values are computed as 1471·19 and 0·87188, again quite
collinear. The solutions are (e) and (f) in table 3.2 and the values of R2 speak for
themselves.
A sample driver program DR0102.PAS is included on the program diskette.
Appendix 4 describes the sample driver programs and supporting procedures and
functions.
Chapter 4
4.1. INTRODUCTION
The previous chapter used plane rotations multiplying a matrix from the right to
orthogonalise its columns. By the essential symmetry of the singular-value decom-
position, there is nothing to stop us multiplying a matrix by plane rotations from
the left to achieve an orthogonalisation of its rows. The amount of work involved
is of order m^2n operations per sweep compared to mn^2 for the columnwise
orthogonalisation (A is m by n), and as there are normally more rows than
columns it may seem unprofitable to do this. However, by a judicious combination
of row orthogonalisation with Givens’ reduction, an algorithm can be devised
which will handle a theoretically unlimited number of rows.
A plane rotation applied to two row vectors z and y replaces them by
Z = cz + sy
Y = –sz + cy (4.1)
where
c = cos φ s = sin φ (4.2)
and φ is the angle of rotation. If Y1 is to be zero, then
–sz1 + cy1 = 0 (4.3)
so that the angle of rotation in this case is given by
tan φ = s/c = y1/z1. (4.4)
This is a simpler angle calculation than that of §3.3 for the orthogonalisation
process, since it involves only one square root per rotation instead of two. That is,
if
p = (z1^2 + y1^2)^(1/2) (4.5)
then we have
c = z1/p (4.6)
and
s = y1/p. (4.7)
It is possible, in fact, to perform such transformations with no square roots at
all (Gentleman 1973, Hammarling 1974, Golub and Van Loan 1983) but no way has
so far come to light for incorporating similar ideas into the orthogonalising rotation
of §3.3. Also, it now appears that the extra overhead required in avoiding the square
root offsets the expected gain in efficiency, and early reports of gains in speed now
appear to be due principally to better coding practices in the square-root-free
programs compared to their conventional counterparts.
The Givens’ transformations are assembled in algorithm 3 to triangularise a real
m by n matrix A. Note that the ordering of the rotations is crucial, since an
element set to zero by one rotation must not be made non-zero by another.
Several orderings are possible; algorithm 3 acts column by column, so that
rotations placing zeros in column k act on zeros in columns 1, 2, . . . , (k - 1) and
leave these elements unchanged. Algorithm 3 leaves the matrix A triangular, that
is
A [i,j] = 0 for i > j (4.8)
which will be denoted R. The matrix Q contains the transformations, so that the
original m by n matrix is
A = QR. (4.9)
In words, this procedure simply zeros the last (m – 1) elements of column 1,
then the last (m – 2) elements of column 2, . . . , and finally the last ( m – n )
elements of column n.
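The column-by-column sweep just described can be sketched as follows for the case m ≥ n, without the accumulation of Q; the procedure name, the type rmat and the bound mx are illustrative assumptions and the fragment is not algorithm 3 itself.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;

procedure givensR(m, n : integer; var A : rmat);
{ Column-by-column Givens' triangularisation: on completion rows 1 to n of A
  contain R and the elements below the diagonal are zero.  The rotations are
  not saved, so Q is not available from this sketch. }
var
  i, j, k : integer;
  c, s, p, t : real;
begin
  for j := 1 to n do
    for k := j + 1 to m do
    begin
      c := A[j, j]; s := A[k, j];        { z1 and y1 of equation (4.3) }
      if s <> 0.0 then
      begin
        p := sqrt(c * c + s * s);        { equation (4.5) }
        c := c / p; s := s / p;          { equations (4.6) and (4.7) }
        for i := j to n do               { rotate rows j and k }
        begin
          t := A[j, i];
          A[j, i] := c * t + s * A[k, i];
          A[k, i] := -s * t + c * A[k, i]
        end
      end
    end
end;

In practice any right-hand-side columns would simply be appended to A and included in the inner loop over i, and a scaling of c and s before the square root, as in the listing of algorithm 4 later in this chapter, guards against overflow or underflow.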
Since the objective in considering the Givens’ reduction was to avoid storing a
large matrix, it may seem like a step backwards to discuss an algorithm which
introduces an m by m matrix Q. However, this matrix is not needed for the
solution of least-squares problems except in its product Q T b with the right-hand
side vector b. Furthermore, the ordering of the rotations can be altered so that
they act on one row at a time, requiring only storage for this one row and for the
resulting triangular n by n matrix which will again be denoted R, that is
(4.10)
(4.11)
Thus the zeros below R multiply the last (m – n) elements of
QTb = ( d1 )
      ( d2 ) (4.12)
where d1 is of order n and d2 of order (m – n). Thus
ATAx = RTRx
     = RTd1 + 0d2 = ATb. (4.13)
These equations are satisfied regardless of the values in the vector d2 by
solutions x to the triangular system
Rx = d1. (4.14)
This system is trivial to solve if there are no zero elements on the diagonal of R.
Such zero elements imply that the columns of the original matrix are not linearly
independent. Even if no zero or ‘small’ element appears on the diagonal, the
original data may be linearly dependent and the solution x to (4.14) in some way
‘unstable’.
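The back-substitution for (4.14) is straightforward; the sketch below assumes no zero diagonal elements and uses illustrative type declarations and names of my own.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;
  rvec = array[1..mx] of real;

procedure backsolve(n : integer; var R : rmat; var d, x : rvec);
{ Solve the triangular system R x = d1 of equation (4.14). }
var
  i, j : integer;
  t : real;
begin
  for i := n downto 1 do
  begin
    t := d[i];
    for j := i + 1 to n do t := t - R[i, j] * x[j];
    x[i] := t / R[i, i]    { fails if R[i,i] = 0, i.e. dependent columns }
  end
end;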
After the Givens’ procedure is complete, the array A contains the triangular factor R in rows 1
to min(nRow, nCol). If nRow is less than nCol, then the right-hand (nCol – nRow) columns of array A contain the
transformed columns of the original matrix so that the product Q*R = A, in which R is now
trapezoidal. The decomposition can be used together with a back-substitution algorithm such as
algorithm 6 to solve systems of linear equations.
The order in which non-zero elements of the working array are transformed to zero is not
unique. In particular, it may be important in some applications to zero elements row by row
instead of column by column. The file alg03a.pas on the software disk presents such a row-wise
variant. Appendix 4 documents driver programs DR03.PAS and DR03A.PAS which illustrate
how the two Givens’ reduction procedures may be used.
Example 4.1. The operation of Givens’ reduction
The following output of a Data General ECLIPSE operating in six hexadecimal
digit arithmetic shows the effect of Givens’ reduction on a rectangular matrix. At
each stage of the loops of steps 1 and 2 of algorithm 3 the entire Q and A matrices
are printed so that the changes are easily seen. The loop parameters j and k as
well as the matrix elements c = A[j,j] and s = A[k,j] are printed also. In this
example, the normalisation at step 3 of the reduction is not necessary, and the
sine and cosine of the angle of rotation could have been determined directly from
the formulae (4.5), (4.6) and (4.7).
The matrix chosen for this example has only rank 2. Thus the last row of the
FINAL A MATRIX is essentially null. In fact, small diagonal elements of the
triangular matrix R will imply that the matrix A is ‘nearly’ rank-deficient.
However, the absence of small diagonal elements in R, that is, in the final array A,
does not indicate that the original A is of full rank. Note that the recombination of
the factors Q and R gives back the original matrix apart from very small errors
which are of the order of the machine precision multiplied by the magnitude of
the elements in question.
*RUN
TEST GIVENS - GIFT - ALG 3 DEC: 12 77
SIZE -- M= ? 3 N= ? 4
MTIN - INPUT M BY N MATRIX
ROW 1 : ? 1 ? 2 ? 3 ? 4
ROW 2 : ? 5 ? 6 ? 7 ? 8
ROW 3 : ? 9 ? 10 ? 11 ? 12
ORIGINAL A MATRIX
ROW 1 : 1 2 3 4
ROW 2 : 5 6 7 8
ROW 3 : 9 10 11 12
GIVENS TRIANGULARIZATION DEC: 12 77
Q MATRIX
ROW 1 : 1 0 0
ROW 2 : 0 1 0
ROW 3 : 0 0 1
J= 1 K= 2 A[J,J]= 1 A[K,J]= 5
A MATRIX
ROW 1 : 5.09902 6.27572 7.45242 8.62912
ROW 2 : -1.19209E-07 -.784466 -1.56893 -2.3534
ROW 3 : 9 10 11 12
Q MATRIX
ROW 1 : .196116 -.980581 0
ROW 2 : .980581 .196116 0
ROW 3 : 0 0 1
J = 1 K= 3 A[J,J]= 5.09902 A[K,J]= 9
A MATRIX
ROW 1 : 10.3441 11.7942 13.2443 14.6944
ROW 2 : -1.19209E-07 -.784466 -1.56893 -2.3534
ROW 3 : 0 -.530862 -1.06172 -1.59258
Q MATRIX
ROW 1 : 9.66738E-02 -.980581 -.170634
ROW 2 : .483369 .196116 -.853168
ROW 3 : .870063 0 .492941
J= 2 K= 3 A[J,J]=-.784466 A[K,J]=-.530862
FINAL A MATRIX
ROW 1 : 10.3441 11.7942 13.2443 14.6944
ROW 2 : 9.87278E-08 .947208 1.89441 2.84162
ROW 3 : -6.68109E-08 0 -9.53674E-07 -1.90735E-06
FINAL Q MATRIX
ROW 1 : 9.66738E-02 .907738 -.40825
ROW 2 : .483369 .315737 .816498
ROW 3 : .870063 .276269 -.408249
RECOMBINATION
ROW 1 : 1 2.00001 3.00001 4.00001
ROW 2 : 5.00001 6.00002 7.00002 8.00002
ROW 3 : 9.00001 10 11 12
4.3. EXTENSION TO A SINGULAR-VALUE DECOMPOSITION
P T Q T A = PT (4.18)
or
(4.19)
(4.20)
where k is the number of singular values larger than some pre-assigned tolerance
for zero. Since in the solution of least-squares problems these rows always act
only in products with S or S+, this presents no great difficulty to programming an
algorithm using the above Givens’ reduction/row orthogonalisation method.
rTr = (b – Ax)T(b – Ax)
    = bTb – bTAx – xTATb + xTATAx. (4.21)
By using the normal equations (2.22) the last two terms of this expression cancel
leaving
rTr = bTb – bTAx. (4.22)
If least-squares problems with large numbers of observations are being solved
via the normal equations, expression (4.22) is commonly used to compute the
residual sum of squares by accumulating bT b, AT A and AT b with a single pass
through the data. In this case, however, (4.22) almost always involves the
subtraction of nearly equal numbers. For instance, when it is possible to approxi-
mate b very closely with Ax, then nearly all the digits in bTb will be cancelled by
those in bTAx, leaving a value for rTr with very few correct digits.
For the method using rotations, on the other hand, we have
(4.23)
and
(4.24)
(4.26)
where f1 is of order (n – k) and f 2 of order k. Now equation (4.24) will have the
form
(4.27)
by application of equation (4.16) and the condition that Sk+l, Sk+2, . . . , Sn, are all
‘zero’. Thus, using
(4.28)
and (4.22) with (4.27) and (4.23), the residual sum of squares in the rank-deficient
case is
(4.29)
From a practical point of view (4.29) is very convenient, since the computation of
the residual sum of squares is now clearly linked to those singular values which
are chosen to be effectively zero by the user of the method. The calculation is
once again as a sum of squared terms, so there are no difficulties of digit
cancellation.
The vector
(4.30)
{alg04.pas ==
Givens’ reduction, singular value decomposition and least squares
solution.
In this program, which is designed to use a very small working array yet
solve least squares problems with large numbers of observations, we do not
explicitly calculate the U matrix of the singular value decomposition.
One could save the rotations and carefully combine them to produce the U
matrix. However, this algorithm uses plane rotations not only to zero
elements in the data in the Givens’ reduction and to orthogonalize rows of
the work array in the svd portion of the code, but also to move the data
into place from the (n+1)st row of the working array into which the data
is read. These movements, i.e. of the data for observation number nobs,
would normally move the data to row number nobs of the original
matrix A to be decomposed. However, it is possible, as in the array given
by data file ex04.cnm
3 1 <--- there are 3 columns in the matrix A
and 1 right hand side
-999 <--- end of data flag
1 2 3 1 <--- the last column is the RHS vector
2 4 7 1
2 2 2 1
5 3 1 1
-999 0 0 0 <--- end of data row
that this movement does not take place. This is because we use a complete
cycle of Givens’ rotations using the diagonal elements W[i,j], j := 1 to
n, of the work array to zero the first n elements of row nobs of the
(implicit) matrix A. In the example, row 1 is rotated from row 4 to row 1
since W is originally null. Observation 2 is loaded into row 4 of W, but
the first Givens’ rotation for this observation will zero the first TWO
elements because they are the same scalar multiple of the corresponding
elements of observation 1. Since W[2,2] is zero, as is W[4,2], the second
Givens’ rotation for observation 2 is omitted, whereas we should move the
data to row 2 of W. Instead, the third and last Givens’ rotation for
observation 2 zeros element W[4,3] and moves the data to row 3 of W.
In the least squares problem such permutations are irrelevant to the final
solution or sum of squared residuals. However, we do not want the
rotations which are only used to move data to be incorporated into U.
Unfortunately, as shown in the example above, the exact form in which such
rotations arise is not easy to predict. Therefore, we do not recommend
that this algorithm be used via the rotations to compute the svd unless
the process is restructured as in Algorithm 3. Note that in any event a
large data array is needed.
The main working matrix W must be n+1 by n+nRHS in size.
Copyright 1988 J. C. Nash
}
var
count, EstRowRank, i, j, k, m, slimit, sweep, tcol : integer;
bb, c, e2, eps, p, q, r, s, tol, trss, vt : real;
enddata : boolean;
endflag : real;
procedure rotnsub; {to allow for rotations using variables local
to Givsvd. c and s are cosine and sine of
angle of rotation.}
var
i: integer;
r: real;
begin
for i := m to tcol do {Note: starts at column m, not column 1.}
begin
r := W[j,i];
W[j,i] := r*c+s*W[k,i];
W[k,i] := -r*s+c*W[k,i];
end;
end; {rotnsub}
begin {Givsvd}
writeln(‘alg04.pas -- Givens’,chr(39),
‘reduction, svd, and least squares solution’);
Write(‘Order of ls problem and no. of right hand sides:’);
readln(infile,n,nRHS); {STEP 0}
if infname<>‘con’ then writeln(n,‘ ’,nRHS);
write(‘Enter a number to indicate end of data’);
readln(infile,endflag);
if infname<>‘con’ then writeln(endflag);
tcol := n+nRHS; {total columns in the work matrix}
k := n+1; {current row of interest in the work array during Givens’ phase}
for i := 1 to n do
for j := 1 to tcol do
W[i,j] := 0.0; {initialize the work array}
for i := 1 to nRHS do rss[i] := 0.0; {initialize the residual sums of squares}
{Note that other quantities, in particular the means and the total
sums of squares will have to be calculated separately if other
statistics are desired.}
eps := calceps; {the machine precision}
tol := n*n*eps*eps; {a tolerance for zero}
nobs := 0; {initially there are no observations}
{STEP 1 -- start of Givens’ reduction}
enddata := false; {set TRUE when there is no more data. Initially FALSE.}
while (not enddata) do
begin {main loop for data acquisition and Givens’ reduction}
getobsn(n, nRHS, W, k, endflag, enddata); {STEP 2}
if (not enddata) then
begin {We have data, so can proceed.} {STEP 3}
nobs := nobs+1; {to count the number of observations}
write(‘Obsn’,nobs,‘ ’);
for j := 1 to (n+nRHS) do
begin
write(W[k,j]:10:5,‘ ’);
if (7 * (j div 7) = j) and (j<n+nRHS) then writeln;
end;
writeln;
for j := 1 to (n+nRHS) do
begin {write to console file}
end;
for j := 1 to n do {loop over the rows of the work array to
move information into the triangular part of the
Givens’ reduction} {STEP 4}
begin
m := j; s := W[k,j]; c := W[j,j]; {select elements in rotation}
bb := abs(c); if abs(s)>bb then bb := abs(s);
if bb>0.0 then
begin {can proceed with rotation as at least one non-zero element}
c := c/bb; s := s/bb; p := sqrt(c*c+s*s); {STEP 7}
s := s/p; {sin of angle of rotation}
if abs(s)>=tol then
begin {not a very small angle} {STEP8}
c := c/p; {cosine of angle of rotation}
rotnsub; {to perform the rotation}
end; {if abs(s)>=tol}
end; {if bb>0.0}
end; {main loop on j for Givens’ reduction of one observation} {STEP 9}
{STEP 10 -- accumulate the residual sums of squares}
write(‘Uncorrelated residual(s):’);
for j := 1 to nRHS do
begin
rss[j] := rss[j]+sqr(W[k,n+j]); write(W[k,n+j]:10,‘ ’);
if (7 * (j div 7) = j) and (j < nRHS) then
begin
writeln;
end;
end;
writeln;
{NOTE: use of sqr function which is NOT sqrt.}
end; {if (not enddata)}
end; {while (not enddata)}
{This is the end of the Givens’ reduction part of the program.
The residual sums of squares are now in place. We could find the
least squares solution by back-substitution if the problem is of full
rank. However, to determine the approximate rank, we will continue
with a row-orthogonalisation.}
{STEP 11} {Beginning of svd portion of program.}
m := 1; {Starting column for the rotation subprogram}
slimit := n div 4; if slimit<6 then slimit := 6; {STEP 12}
{This sets slimit, a limit on the number of sweeps allowed.
A suggested limit is max([n/4], 6).}
sweep := 0; {initialize sweep counter}
e2 := 10.0*n*eps*eps; {a tolerance for very small numbers}
tol := eps*0.1; {a convergence tolerance}
EstRowRank := n; {current estimate of rank};
repeat
count := 0; {to initialize the count of rotations performed}
for j := 1 to (EstRowRank-1) do {STEP 13}
begin {STEP 14}
for k := (j+1) to EstRowRank do
begin {STEP 15}
p := 0.0; q := 0.0; r := 0.0;
for i := 1 to n do
begin
p := p+W[j,i]*W[k,i]; q := q+sqr(W[j,i]); r := r+sqr(W[k,i]);
end; {accumulation loop}
svs[j] := q; svs[k] := r;
ÃTÃ = ATA + yyT (4.31)
where the tilde is used to indicate matrices which have been updated. Deletion of
y T now requires the subtraction
ÃTÃ = ATA – yyT. (4.32)
SE(bi) = (σ^2 [(ATA)^(-1)]ii)^(1/2) (4.33)
where σ 2 is an estimate of the variance of data about the fitted model calculated by
dividing the sum of squared residuals by the number of degrees of freedom (nRow –
nCol) = (nRow – n). The sum of squared residuals has already been computed in
algorithm 4, and has been adjusted for rank deficiency within the solution phase of
the code.
The diagonal elements of the inverse of the sum of squares and cross-products
matrix may seem to pose a bigger task. However, the singular-value decomposition
leads easily to the expression
(ATA)^(-1) = VS+S+VT. (4.34)
In particular, diagonal elements of the inverse of the sum of squares and cross-
products matrix are
[(ATA)^(-1)]ii = (Vi1/S11)^2 + (Vi2/S22)^2 + . . . + (Vin/Snn)^2 (4.35)
Thus, the relevant information for the standard errors is obtained by quite simple
row sums over the V matrix from a singular-value decomposition. When the original
A matrix is rank deficient, and we decide (via the tolerance for zero used to select
‘non-zero’ singular values) that the rank is r, the summation above reduces to
[(ATA)^(-1)]ii = (Vi1/S11)^2 + (Vi2/S22)^2 + . . . + (Vir/Srr)^2. (4.36)
However, the meaning of a standard error in the rank-deficient case requires careful
consideration, since the standard error will increase very sharply as small singular
values are included in the summation given in (4.36). I usually refer to the dispersion
measures computed via equations (4.33) through (4.36) for rank r < n cases as
‘standard errors under the condition that the rank is 5 (or whatever value r currently
has)’. More discussion of these issues is presented in Searle (1971) under the topic
‘estimable functions’, and in various sections of Belsley, Kuh and Welsch (1980).
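A sketch of the computation implied by equations (4.33) to (4.36) follows; sigma2 is the estimated variance of the residuals, r the rank decided upon, and the types, bound and names are illustrative assumptions rather than part of algorithm 4.

const
  mx = 20;
type
  rmat = array[1..mx, 1..mx] of real;
  rvec = array[1..mx] of real;

procedure svdstderr(n, r : integer; sigma2 : real; var V : rmat; var S : rvec; var se : rvec);
{ Standard errors from the svd: se[i] = sqrt(sigma2 * sum of sqr(V[i,j]/S[j])),
  the sum running over the r singular values accepted as non-zero. }
var
  i, j : integer;
  t : real;
begin
  for i := 1 to n do
  begin
    t := 0.0;
    for j := 1 to r do t := t + sqr(V[i, j] / S[j]);   { row sum over V, eq. (4.36) }
    se[i] := sqrt(sigma2 * t)                          { eq. (4.33) }
  end
end;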
Chapter 5
where āj is the mean of the jth column of the m by n matrix A. Furthermore, the
right-hand side of the nth normal equation is
(5.9)
This permits xn to be eliminated by using the nth normal equation
(5.10)
or
(5.11)
When this expression is substituted into the normal equations, the kth equation
(note carefully the bars above the symbols) becomes
(5.12)
But since
(5.13)
and
(5.14)
equation (5.12) becomes
(5.15)
(5.17)
which has mean = 1003·5 so that
(5.18)
which is singular since the first two columns or rows are identical. If we use
deviations from means (and drop the constant column) a singular matrix still
results. For instance, on a Data General NOVA minicomputer using a 23-bit
binary mantissa (between six and seven decimal digits), the A matrix using
deviation from mean data printed by the machine is
which is singular.
However, by means of the singular-value decomposition given by algorithm 1,
the same machine computes the singular values of A (not A') as
2·17533, 1·12603 and 1E–5.
Since the ratio of the smallest to the largest of the singular values is only slightly
larger than the machine precision (2^(-22) ≈ 2·38419E-7), it is reasonable to pre-
sume that the tolerance q in the equation (3.37) should be set to some value
between 1E-5 and 1·12603. This leads to a computed least-squares solution
with
rTr = 1·68956E–4.
(In exact arithmetic it is not possible for the sum of squares with q= 0 to exceed
that for a larger tolerance.)
When using the singular-value decomposition one could choose to work with
deviations from means or to scale the data in some way, perhaps using columns
which are deviations from means scaled to have unit variance. This will then
prevent ‘large’ data from swamping ‘small’ data. Scaling of equations has proved a
difficult and somewhat subjective issue in the literature (see, for instance, Dahl-
quist and Björck 1974, p 181ff).
Despite these cautions, I have found the solutions to least-squares problems
obtained by the singular-value decomposition approach to be remarkably resilient
to the omission of scaling and the subtraction of means.
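For readers who nevertheless wish to experiment with such scalings, a minimal sketch of centring the columns and scaling them to unit variance follows; the procedure name and array type are illustrative assumptions and this is not one of the numbered algorithms.

{Sketch: replace each column of the m by n data array A by deviations from
 the column mean, scaled to have unit variance.
 Assumes: type rmatrix = array[1..50,1..20] of real; }
procedure stdcols(m, n: integer; var A: rmatrix);
var i, j: integer; mean, ss: real;
begin
  for j := 1 to n do
  begin
    mean := 0.0;
    for i := 1 to m do mean := mean + A[i,j];
    mean := mean / m;
    ss := 0.0;
    for i := 1 to m do
    begin
      A[i,j] := A[i,j] - mean; {deviation from the column mean}
      ss := ss + sqr(A[i,j]);
    end;
    if ss > 0.0 then {a constant column cannot be scaled}
      for i := 1 to m do A[i,j] := A[i,j] / sqrt(ss / (m - 1));
  end;
end;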
As a final example of the importance of using decomposition methods for
least-squares problems, consider the data (Nash and Lefkovitch 1976)
This is a regression through the origin and can be shown to have the exact solution
with a zero residual sum of squares. If we wish to use a method which only scans
the data once, that is, explicit residuals are not computed, then solution of the
normal equations allows the residual sum of squares to be computed via
r^T r = b^T b – b^T A x. (5.23)
Alternatively, algorithm 4 can be used to form the sum of squares by means
of the uncorrelated residuals (4.30).
The following solutions were found using a Hewlett-Packard 9830 desk cal-
culator (machine precision equal to 1E-11, but all arrays in the examples stored
in split precision equal to 1E–5):
(i) Conventional regression performed by using the Choleski decomposition
(§7.1) to solve the normal equations gave
(a) for α = 8
and r^T r = 4·22E–4
(b) for α = 64
and r^T r = 0·046709.
(ii) Algorithm 4 gave
(a) for α = 8
and r^T r = 0
(b) for α = 64
and r^T r = 0.
Since the first edition of this book appeared, several authors have considered
problems associated with the formation of the sum of squares and cross-products
matrix, in particular the question of collinearity. See, for example, Nash (1979b) and
Stewart (1987).
Chapter 6
6.1. INTRODUCTION
So far we have been concerned with solving linear least-squares problems. Now
the usually simpler problem of linear equations will be considered. Note that a
program designed to solve least-squares problems will give solutions to linear
equations. The residual sum of squares must be zero if the equations are
consistent. While this is a useful way to attack sets of equations which are
suspected to possess singular coefficient matrices, since the singular-value decom-
position permits such to be identified, in general the computational cost will be
too high. Therefore this chapter will examine a direct approach to solving systems
of linear equations. This is a variant of the elimination method taught to students
in secondary schools, but the advent of automatic computation has changed only
its form, showing its substance to be solid.
These equations follow directly from equations (2.2) and the supposed upper- or
right-triangular structure of A. The Gauss elimination scheme uses this idea to
find solutions to simultaneous linear equations by constructing the triangular form
Rx = f (6.4)
from the original equations.
Note that each of the equations (2.2), that is
Σ_{j=1}^{n} A_{ij} x_j = b_i        for i = 1, 2, . . . , n (6.5)
= A_{ik} – m_{i1} A_{1k} (6.8)
and
(6.9)
But
(6.10)
so that we have eliminated all but the first element of column 1 of A . This process
can now be repeated with new equations 2, 3, . . . , n to eliminate all but the first
two elements of column 2. The element A12 is unchanged because equation 1 is
not a participant in this set of eliminations. By performing (n - 1) such sets of
eliminations we arrive at an upper-triangular matrix R. This procedure can be
thought of as an ordered sequence of multiplications by elementary matrices. The
elementary matrix which eliminates Aij will be denoted Mij and is defined by
M_{ij} = 1_n – m_{ij} E_{ij} (6.11)
where
m_{ij} = A_{ij}/A_{jj} (6.12)
(the elements in A are all current, not original, values) and where E_{ij} is the matrix
having 1 in the position ij and zeros elsewhere, that is
(E_{ij})_{rs} = δ_{ir} δ_{js} (6.13)
which uses the Kronecker delta, δ_{ir} = 1 for i = r and δ_{ir} = 0 otherwise. The effect
of M_{ij} when pre-multiplying a matrix A is to replace the ith row with the
difference between the ith row and mij times the jth row, that is, if
A' = M i jA (6.14)
then
A'_{rk} = A_{rk}        for r ≠ i (6.15)
A'_{ik} = A_{ik} – m_{ij} A_{jk} (6.16)
with k = 1, 2, . . . , n. Since Ajk = 0 for k < j, for computational purposes one need
only use k = j, ( j+ 1), . . . , n. Thus
(6.17)
= L^{-1} A (6.18)
gives the triangular matrix in question. The choice of symbol
(6.19)
The solutions to the triangular system(s) of equations and hence to the original equations
(2.2) are contained in columns n + 1, n + 2, . . . , n+p, of the working array.
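A minimal sketch of the elimination and back-substitution just described is given below. The pivoting safeguards discussed in this chapter are omitted and the array type is an assumption, so the fragment is illustrative only and is not the text of the algorithms themselves.

{Sketch: Gauss elimination and back-substitution on a working array A of n
 rows and (n+p) columns, the p right-hand sides being stored in columns
 n+1, ..., n+p and overwritten by the solutions. No pivoting is performed.
 Assumes: type rmatrix = array[1..20,1..30] of real; }
procedure gesimple(n, p: integer; var A: rmatrix);
var i, j, k: integer; m, s: real;
begin
  for j := 1 to n - 1 do {eliminate below the diagonal of column j}
    for i := j + 1 to n do
    begin
      m := A[i,j] / A[j,j]; {the multiplier of (6.12)}
      for k := j to n + p do A[i,k] := A[i,k] - m * A[j,k];
    end;
  for k := n + 1 to n + p do {back-substitution for each right-hand side}
    for i := n downto 1 do
    begin
      s := A[i,k];
      for j := i + 1 to n do s := s - A[i,j] * A[j,k];
      A[i,k] := s / A[i,i];
    end;
end;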
Example 6.1. The use of linear equations and linear least-squares problems
Organisations which publish statistics frequently use indices to summarise the
change in some set of measurable quantities. Already in example 3.2 we have
used indices of the use of various chemicals in agriculture and an index for farm
income. The consumer price index, and the Dow Jones and Financial Times
indices provide other examples. Such indices are computed by dividing the
average value of the quantity for period t by the average for some base period
t = 0 which is usually given the index value 100. Thus, if the quantity is called P,
then
(6.28)
where
(6.29)
given n classes or types of quantity P, of which the jth has value Ptj in period t
and is assigned weight Wj in the average. Note that it is assumed that the
weighting Wj is independent of the period, that is, of time. However, the
weightings or ‘shopping basket’ may in fact change from time to time to reflect
changing patterns of product composition, industrial processes causing pollution,
stocks or securities in a portfolio, or consumer spending.
Substitution of (6.29) into (6.28) gives
Finally, letting
K = 100 / Σ_{j=1}^{n} W_j P_{0j} (6.30)
gives
I_t = Σ_{j=1}^{n} K W_j P_{tj}. (6.31)
Thus, if n periods of data I_t, P_{tj}, j = 1, . . . , n, are available, we can compute the
weightings KW_j. Hence, by assuming
Σ_{j=1}^{n} W_j = 1 (6.32)
that is, that the weights are fractional contributions of each component, we can
find the value of K and each of the Wj. This involves no more nor less than the
solution of a set of linear equations. The work of solving these is, of course,
unnecessary if the person who computes the index publishes his set of weights-as
indeed is the case for several indices published in the Monthly Digest of Statistics†.
Unfortunately, many workers do not deem this a useful or courteous practice
towards their colleagues, and I have on two occasions had to attempt to discover
the weightings. In both cases it was not possible to find a consistent set of weights
over more than n periods, indicating that these were being adjusted over time.
This created some difficulties for my colleagues who brought me the problems,
since they were being asked to use current price data to generate a provisional
estimate of a price index considerably in advance of the publication of the indices
by the agency which normally performed the task. Without the weights, or even
approximate values from the latest period for which they were available, it was
not possible to construct such estimates. In one case the calculation was to have
used various proposed oil price levels to ascertain an index of agricultural costs.
When it proved impossible to construct a set of consistent weights, it was
necessary to try to track down the author of the earlier index values.
As an example of such calculations, consider the set of prices shown in table 6.1
and two indices I1 and I2 calculated from them. I1 is computed using proportions
0·4, 0·1, 0·3 and 0·2 respectively of P1, P2, P3 and P4. I2 uses the same weights
except for the last two periods where the values 0·35, 0·15, 0·4 and 0·1 are used.
Suppose now that these weights are unknown. Then the data for the first four
periods give a set of four equations (6.31) which can be solved to give
KW =
using Gauss elimination (Data General NOVA, 23-bit binary mantissa). Applying
the normalisation (6.32) gives
W =
If these weights are used to generate index numbers for the last three periods, the
values I1 will be essentially reproduced, and we would detect a change in the
weighting pattern if the values I2 were expected.
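The normalisation (6.32) is easily applied to the computed products KW_j, as in the small sketch below (the names are hypothetical and chosen only for illustration).

{Sketch: given the solution kw[1..n] = K*W of equations (6.31), recover K
 and the fractional weights W under the normalisation (6.32).
 Assumes: type rvector = array[1..20] of real; }
procedure unscale(n: integer; var kw, w: rvector; var K: real);
var j: integer;
begin
  K := 0.0;
  for j := 1 to n do K := K + kw[j]; {since the W[j] sum to one, K is the sum of the K*W[j]}
  for j := 1 to n do w[j] := kw[j] / K;
end;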
An alternative method is to use a least-squares formulation, since if the set of
weights is consistent, the residual sum of squares will be zero. Note that there is
no constant term (column of ones) in the equations. Again on the NOVA in
23-bit arithmetic, I1 gives
with a residual sum of squares (using KW) over the seven periods of 4·15777E–7.
The same calculation with I2 gives a residual sum of squares of 241·112, showing
that there is not a consistent set of weights. It is, of course, possible to find a
consistent set of weights even though index numbers have been computed using a
varying set; for instance, if our price data had two elements identical in one
period, any pair of weights for these prices whose sum was fixed would generate
the same index number.
(6.46)
(6.47)
and
(6.48)
This is how complex systems of linear equations can be solved using real
arithmetic only. Unfortunately the repetition of the matrices Y and Z in (6.46)
means that for a set of equations of order n, 2n² storage locations are used
unnecessarily. However, the alternative is to recode algorithms 5 and 6 to take
account of the complex arithmetic in (6.43). Bowdler et al (1966) give ALGOL
procedures to perform the Crout variant of the elimination for such systems of
equations, unfortunately again requiring double-length accumulation.
(7.3)
Note that the summation runs only from 1 to the minimum of i and j due to the
triangular nature of L. Thus we have
(7.4)
so that
L_{11} = (A_{11})^{1/2}. (7.5)
Furthermore
A_{i1} = L_{i1} L_{11} (7.6)
so that we obtain
L_{i1} = A_{i1}/L_{11}. (7.7)
Consider now the mth column of L which is defined for i > m by
L_{mm} = (A_{mm} – Σ_{k=1}^{m–1} L_{mk}²)^{1/2}
L_{im} = (A_{im} – Σ_{k=1}^{m–1} L_{ik} L_{mk}) / L_{mm}        for i > m. (7.8)
(7.13)
(7.16)
which reduces to
A_{ii} – c^T c > 0. (7.17)
But a comparison of this with (7.8) shows that it implies the square of each
diagonal element of L is positive, so that all the elements of L are real providing A
is positive definite. Furthermore, an analysis similar to that used in (7.10), (7.11)
and (7.12) demands that
(7.18)
(Again, the diagonal elements must be chosen to be positive in the decomposi-
tion.) Equations (7.17) and (7.18) give bounds to the size of the subdiagonal
elements of L, which suggests the algorithm is stable. A much more complete
analysis which confirms this conjecture is given by Wilkinson (1961) who shows
the matrix LLT as computed is always close in some norm to A.
Once the Choleski decomposition has been performed, linear equations
Ax = LL^T x = b (7.19)
can be solved by a combination of a forward- and a back-substitution, that is
Lv = b (7.20)
followed by
Rx = L^T x = v (7.21)
where we have used R to emphasise the fact that L^T is upper-triangular. In a
computer program, b, v and x can all occupy the same storage vector, that is, v
overwrites b, and x overwrites v. The solution of (7.20) is termed forward-
substitution because the triangular structure defines the elements v j in the order
1, 2, . . . , n, that is
v_1 = b_1/L_{11} (7.22)
and
v_j = (b_j – Σ_{k=1}^{j–1} L_{jk} v_k) / L_{jj}        for j = 2, 3, . . . , n. (7.23)
Likewise, the solution elements xj of (7.21) are obtained in the backward order n,
(n – 1), . . . , 1 from
x_n = v_n/L_{nn} (7.24)
x_j = (v_j – Σ_{k=j+1}^{n} L_{kj} x_k) / L_{jj}        for j = (n – 1), . . . , 1. (7.25)
(7.26)
(7.32)
which is another way of stating the requirement of consistency for the equations.
For a specific example, consider that L_{mm} = 0 as above. Thus, in the forward-
substitution v_m is always multiplied by zero and could have arbitrary value, except
that in the back-substitution the mth row of L^T is null. Denoting this by the vector
(7.33)
it is easily seen that
(7.34)
so that vm must be zero or the equation (7.34) is not satisfied. From (7.30) and
(7.31) one has
Lv = LL^T x = b = Ax. (7.35)
Since xm is arbitrary by virtue of (7.34), the value chosen will influence the
values of all xi, i < m, so some standard choice here is useful when, for instance,
an implementation of the algorithm is being tested. I have always chosen to set
xm = 0 (as below in step 14 of algorithm 8).
The importance of the above ideas is that the solution of linear least-squares
problems by the normal equations
B^T B x = B^T y (7.36)
provides a set of consistent linear equations with a symmetric non-negative
definite matrix A = B^T B, that is
x^T A x = x^T B^T B x = (Bx)^T (Bx) ≥ 0. (7.37)
{alg07.pas ==
Choleski decomposition of symmetric positive definite matrix stored in
compact row-order form. a[i*(i-1)/2+j] = A[i,j] }
for j := 1 to n do {STEP 1}
begin {STEP 2}
q := j*(j+1) div 2; {index of the diagonal element of row j}
if j>1 then {STEP 3}
begin {prepare for the subtraction in Eqn. (7.8). This is not needed
for the first column of the matrix.}
for i := j to n do {STEP 4}
begin
m := (i*(i-1) div 2)+j; s := a[m];
for k := 1 to (j-1) do s := s-a[m-k]*a[q-k];
a[m] := s;
end; {loop on i}
end; {of STEP 4}
if a[q]<=0.0 then {STEP 5}
begin {matrix singular}
singmat := true;
a[q] := 0.0; {since we shall assume matrix is non-negative definite}
end;
s := sqrt(a[q]); {STEP 7}
for i := j to n do {STEP 8}
begin
m := (i*(i-1) div 2)+j;
if s=0.0 then a[m] := 0 {to zero column elements in singular case}
else a[m] := a[m]/s; {to perform the scaling}
end; {loop on i}
end; {loop on j -- end-loop is STEP 9}
end; {alg07.pas == Choleski decomposition choldcmp}
This completes the decomposition. The lower-triangular factor L is left in the vector a in
row-wise storage mode.
{alg08.pas ==
Choleski back substitution for the solution of consistent sets of
linear equations with symmetric coefficient matrices.
Note that this algorithm will solve consistent sets of equations whose coefficient matrices are
symmetric and non-negative definite. It will not detect the cases where the equations are not
consistent or the matrix of coefficients is indefinite.
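A sketch of the forward- and back-substitution of equations (7.20)-(7.25), using the same compact row-order storage as algorithm 7, might take the following form. It is an illustration only, not the text of algorithm 8, and the type declarations are assumptions.

{Sketch: solve L*v = b then (L transposed)*x = v, where the Choleski factor
 L is stored row-wise in a[i*(i-1) div 2 + j] = L[i,j] as left by algorithm 7.
 The right-hand side arrives in x and is overwritten by the solution. A zero
 diagonal element (rank-deficient case) causes the corresponding solution
 element to be set to zero, as discussed in the text.
 Assumes: type smatvec = array[1..210] of real; rvector = array[1..20] of real; }
procedure cholback(n: integer; var a: smatvec; var x: rvector);
var i, j, q: integer; s: real;
begin
  for i := 1 to n do {forward-substitution, (7.22) and (7.23)}
  begin
    s := x[i];
    for j := 1 to i - 1 do s := s - a[(i*(i-1) div 2) + j] * x[j];
    q := i*(i+1) div 2; {diagonal element L[i,i]}
    if a[q] = 0.0 then x[i] := 0.0 else x[i] := s / a[q];
  end;
  for i := n downto 1 do {back-substitution, (7.24) and (7.25)}
  begin
    s := x[i];
    for j := i + 1 to n do s := s - a[(j*(j-1) div 2) + i] * x[j];
    q := i*(i+1) div 2;
    if a[q] = 0.0 then x[i] := 0.0 else x[i] := s / a[q];
  end;
end;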
A_{ij} = min(i, j) – 2        for i ≠ j
A_{ij} = i        for i = j.
On a Data General NOVA operating in arithmetic having six hexadecimal digits
(that is, a machine precision of 16^{-5}) the correct decomposition was observed for
Moler matrices of order 5, 10, 15, 20 and 40. Thus for order 5, the Moler matrix
has a lower triangle
1
-1 2
-1 0 3
-1 0 1 4
-1 0 1 2 5
and the computed Choleski factor has the lower triangle
1
-1 1
-1 -1 1
-1 -1 -1 1
-1 -1 -1 -1 1.
Using the data in example 3.2, it is straightforward to form the matrix B from the
last four columns of table 3.1 together with a column of ones. The lower triangle
of B^T B is then (via a Hewlett-Packard 9830 in 12 decimal digit arithmetic)
18926823
6359705    2164379
10985647   3734131   6445437
3344971    1166559   2008683    659226
14709      5147      8859       2926      13
Note that, despite the warnings of chapter 5, means have not been subtracted, since
the program is designed to perform least-squares computations when a constant
(column of ones) is not included. This is usually called regression through the
origin in a statistical context. The Choleski factor L is computed as
4350·496868
1461·834175 165·5893864
2525·147663 258·3731371 48·05831416
768·8710282 257·2450797 14·66763457 40·90441964
3·380993125 1·235276666 0·048499519 0·194896363 0·051383414
Using the right-hand side
5937938
2046485
B^T y = 3526413
1130177
5003
the forward- and back-substitution algorithm 8 computes a solution
-0·046192435
1·019386565
x = -0·159822924
-0·290376225
207·7826146
This is to be compared with solution (a) of table 3.2 or the first solution of
example 4.2 (which is on pp 62 and 63), which shows that the various methods all
give essentially the same solution under the assumption that none of the singular
values is zero. This is despite the fact that precautions such as subtracting means
have been ignored. This is one of the most annoying aspects of numerical
computation-the foolhardy often get the right answer! To underline, let us use
the above data (that is B^T B and B^T y) in the Gauss elimination method, algorithms
5 and 6. If a Data General NOVA operating in 23-bit binary arithmetic is used,
the largest integer which can be represented exactly is
2²³ – 1 = 8388607
so that the original matrix of coefficients cannot be represented exactly. However,
the solution found by this method, which ignores the symmetry of B^T B, is
-4·62306E-2
1·01966
x = -0·159942
-0·288716
207·426
While this is not as close as solution (a) of table 3.2 to the solutions computed in
comparatively double-length arithmetic on the Hewlett-Packard 9830, it retains
the character of these solutions and would probably be adequate for many
practitioners. The real advantage of caution in computation is not, in my opinion,
that one gets better answers but that the answers obtained are known not to be
unnecessarily in error.
Chapter 8
x_1 = (b_1 – Σ_{j=2}^{n} A_{1j} x_j) / A_{11} (8.1)
The substitution of this into the other equations gives a new set of (n - 1)
equations in (n - 1) unknowns which we shall write
A'x' = b' (8.2)
in which the indices will run from 2 to n. In fact x' will consist of the last (n - 1)
elements of x. By means of (8.1) it is simple to show that
b'_k = b_k – A_{k1} b_1 / A_{11} (8.3)
and
A'_{kj} = A_{kj} – A_{k1} A_{1j} / A_{11} (8.4)
for k, j = 2, . . . , n. Notice that if b is included as the (n + 1)th column of A, (8.4) is
the only formula needed, though j must now run to (n + 1).
We now have a set of equations of order (n – 1), and can continue the process
until only a set of equations of order 1 remains. Then, using the trivial solution of
this, a set of substitutions gives the desired solution to the original equations. This
is entirely equivalent to Gauss elimination (without pivoting) and back-substitution,
and all the arithmetic is the same.
Consider, however, that the second substitution is made not only of x2 into the
remaining (n – 2) equations, but also into the formula (8.1) for x1. Then the final
order-1 equations yield all the xj at once. From the viewpoint of elimination it
corresponds to eliminating upper-triangular elements in the matrix R in the
system
Rx = f (6.4)
then dividing through by diagonal elements. This leaves
1_n x = f′ (8.5)
(8.1b)
(8.3a)
for k = 1, 2, . . . , n with k ≠ i. To determine the inverse of a matrix, we could solve
the linear-equation problems for the successive columns of 1_n. But now all
columns e_j for j > i will be unaltered by (8.1b) and (8.3a). At the ith stage of the
reduction, e_i can be substituted on column i of the matrix by storing the pivot A_{ii},
substituting the value of 1 in this diagonal position, then performing the division
implied in (8.1a). Row i of the working matrix now contains the multipliers
Ã_{ij} = (A_{ij}/A_{ii}). By performing (8.4a) row-wise, each value A_{ki} can be saved, a
zero substituted from e_i, and the elements of A_{kj}, j = 1, 2, . . . , n, computed.
This process yields a very compact algorithm for inverting a matrix in the
working storage needed to store only a single matrix. Alas, the lack of pivoting
may be disastrous. The algorithm will not work, for instance, on the matrix
0 1
1 0
which is its own inverse. Pivoting causes complications. In this example, inter-
changing rows to obtain a non-zero pivot implies that the columns of the resulting
inverse are also interchanged.
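The substitution process, without any pivoting and hence subject to the failure just illustrated, can be sketched as follows; the procedure and type names are assumptions made for the example.

{Sketch: Gauss-Jordan inversion in place, without pivoting. The inverse
 overwrites A; the method fails if a zero pivot is encountered.
 Assumes: type rmatrix = array[1..20,1..20] of real; }
procedure gjinv(n: integer; var A: rmatrix);
var i, j, k: integer; p, t: real;
begin
  for i := 1 to n do
  begin
    p := A[i,i]; {save the pivot}
    A[i,i] := 1.0; {substitute the unit element of column i of 1n}
    for j := 1 to n do A[i,j] := A[i,j] / p; {row i now holds the multipliers}
    for k := 1 to n do
      if k <> i then
      begin
        t := A[k,i]; A[k,i] := 0.0; {save A[k,i] and substitute the zero of e[i]}
        for j := 1 to n do A[k,j] := A[k,j] - t * A[i,j];
      end;
  end;
end;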
The extra work involved in column interchanges which result from partial
pivoting is avoided if the matrix is symmetric and positive definite-this special
case is treated in detail in the next section. In addition, in this case complete
pivoting becomes diagonal pivoting, which does not disorder the inverse. There-
fore algorithms such as that discussed above are widely used to perform stepwise
regression, where the pivots are chosen according to various criteria other than
error growth. Typically, since the pivot represents a new independent variable
entering the regression, we may choose the variable which most reduces the
residual sum of squares at the current stage. The particular combination of (say) m
out of n independent variables chosen by such a forward selection rule is not
necessarily the combination of m variables of the n available which gives the
smallest residual sum of squares. Furthermore, the use of a sum-of-squares and
cross-products matrix is subject to all the criticisms of such approaches to
least-squares problems outlined in chapter 5.
As an illustration, consider the problem given in example 7.2. A Data General
ECLIPSE operating in six hexadecimal digit arithmetic gave a solution
-4·64529E-2
1·02137
x= -0·160467
-0·285955
206·734
when the pivots were chosen according to the residual sum-of-squares reduction
criterion. The average relative difference of these solution elements from those of
solution (α ) of table 3.2 is 0·79%. Complete (diagonal) pivoting for the largest
element in the remaining submatrix of the Gauss-Jordan working array gave a
solution with an average relative difference (root mean square) of 0·41%. There
are, of course, 120 pivot permutations, and the differences measured for each
solution ranged from 0·10% to 0·79%. Thus pivot ordering does not appear to be
a serious difficulty in this example.
The operations of the Gauss-Jordan algorithm are also of utility in the solution
of linear and quadratic programming problems as well as in methods derived from
such techniques (for example, minimum absolute deviation fitting). Unfortunately,
these topics, while extremely interesting, will not be discussed further in this
monograph.
8.2. THE GAUSS-JORDAN ALGORITHM FOR THE INVERSE
OF A SYMMETRIC POSITIVE DEFINITE MATRIX
Bauer and Reinsch (in Wilkinson and Reinsch 1971, p 45) present a very compact
algorithm for inverting a positive definite symmetric matrix in situ, that is,
overwriting itself. The principal advantages of this algorithm are as follows.
(i) No pivoting is required. This is a consequence of positive definiteness and
symmetry. Peters and Wilkinson (1975) state that this is ‘well known’, but I
believe the full analysis is as yet unpublished.
(ii) Only a triangular portion of the matrix need be stored due to symmetry,
though a working vector of length n, where n is the order of the matrix, is needed.
The algorithm is simply the substitution procedure outlined above. The
modifications which are possible due to symmetry and positive definiteness,
however, cause the computational steps to look completely different.
Consider an intermediate situation in which the first k of the elements x and b
have been exchanged in solving
Ax = b (8.6)
by the Gauss-Jordan algorithm. At this stage the matrix of coefficients will have
the form
W X
Y Z (8.7)
(8.8)
Thus by setting xj = 0, for j = (k + 1), (k + 2), . . . , n, in both (8.6) and (8.8) the
required association of W and the leading k by k block of A^{-1} is established.
Likewise, by setting b_j = 0, for j = 1, . . . , k, in (8.6) and (8.8), Z is the inverse of
the corresponding block of A^{-1}. (Note that W and Z are symmetric because A and
A^{-1} are symmetric.)
From these results, W and Z are both positive definite for all k since A is
positive definite. This means that the diagonal elements needed as pivots in the
Gauss-Jordan steps are always positive. (This follows directly from the definition
of positive definiteness for A, that is, that x^T A x > 0 for all x ≠ 0.)
In order to avoid retaining elements above or below the diagonal in a program,
we need one more result. This is that
Y = –X^T (8.9)
(8.16)
where we use the identity
(8.17)
since these elements belong to a submatrix Z which is symmetric in accord with
the earlier discussion.
It remains to establish that
for j = (k+1), . . . , n (8.18)
but this follows immediately from equations (8.11) and (8.12) and the symmetry of
the submatrix Z. This completes the induction.
There is one more trick needed to make the Bauer-Reinsch algorithm ex-
tremely compact. This is a sequential cyclic re-ordering of the rows and columns
of A so that the arithmetic is always performed with k = 1. This re-numeration
relabels (j + 1) as j for j = 1, 2, . . . , (n - 1) and relabels 1 as n. Letting
(8.19)
this gives a new Gauss-Jordan step
(8.20)
(8.21)
(8.22)
(8.23)
for i, j = 2, . . . , n.
A difficulty in this is that the quantities A_{1j}/p have to be stored or they will
be overwritten by A_{i–1,j–1} during the work. This is overcome by using a working
vector to store the needed quantities temporarily.
Because of the re-numeration we also have the matrix of coefficients in the
form
(8.24)
This completes the inversion, the original matrix having been overwritten by its inverse.
NEW NEW
LOAD ENHBRT LOAD ENHBRT
LOAD ENHMT4 LOAD ENHMT5
RUN RUN
ENHBRG AUG 19 75 ENHBRG AUG 19 75
BAUER REINSCH BAUER REINSCH
ORDER? 5 ORDER? 5
FRANK MATRIX MOLER MATRIX
1 1
1 2 -1 2
1 2 3 -1 0 3
1 2 3 4 -1 0 1 4
1 2 3 4 5 -1 0 1 2 5
INVERSE INVERSE
ROW 1 ROW 1
2 86
ROW 2 ROW 2
-1 2 43 22
ROW 3 ROW 3
0 -1 2 22 11 6
ROW 4 ROW 4
-1 2 12 6 3 2
ROW 5 ROW 5
0 0 0 -1 1 8 4 2 1 1
INVERSE OF INVERSE INVERSE OF INVERSE
ROW 1 ROW 1
1 .999994
ROW 2 ROW 2
1 2 -1 2
ROW 3 ROW 3
.999999 2 3 -.999987 0 2.99997
ROW 4 ROW 4
.999999 2 3 4 -.999989 0 .999976 3.99998
ROW 5 ROW 5
1 2 3 4 5 -.999999 0 .999978 1.99998 4.99998
Chapter 9
9.1. INTRODUCTION
The next three chapters are concerned with the solution of algebraic eigen-
value problems
A x = ex (2.62)
and
Ax = eBx. (2.63)
The treatment of these problems will be highly selective, and only methods which
I feel are particularly suitable for small computers will be considered. The reader
interested in other methods is referred to Wilkinson (1965) and Wilkinson and
Reinsch (1971) for an introduction to the very large literature on methods for the
algebraic eigenproblem. Here I have concentrated on providing methods which
are reliable and easy to implement for matrices which can be stored in the
computer main memory, except for a short discussion in chapter 19 of two methods
suitable for sparse matrices. The methods have also been chosen to fit in with
ideas already developed so the reader should not feel on too unfamiliar ground.
Thus it is possible, even likely, that better methods exist for most applications and
any user should be aware of this. Section 10.5 discusses some of the possible
methods for real symmetric matrices.
(9.3)
(9.5)
(9.6)
But since
|e_j/e_1| < 1 (9.7)
unless j = 1 (the case of degenerate eigenvalues is treated below), the coefficients
of φ_j, j ≠ 1, eventually become very small. The ultimate rate of convergence is
given by
r = |e_2/e_1| (9.8)
where e_2 is the eigenvalue having second largest magnitude. By working with the
matrix
A′ = A – k 1_n (9.9)
this rate of convergence can be improved if some estimates of e 1 and e2 are
known. Even if such information is not at hand, ad hoc shifts may be observed to
improve convergence and can be used to advantage. Furthermore, shifts permit
(i) the selection of the most positive eigenvalue or the most negative eigenvalue
and, in particular,
(ii) evasion of difficulties when these two eigenvalues are equal in magnitude.
Degenerate eigenvalues present no difficulty to the power method except that it
now converges to a vector in the subspace spanned by all eigenvectors corres-
ponding to e1 . Specific symmetry or other requirements on the eigenvector must
be imposed separately.
In the above discussion the possibility that a_1 = 0 in the expansion of x_1 has
been conveniently ignored, that is, some component of x_1 in the direction of φ_1 is
assumed to exist. The usual advice to users is, ‘Don’t worry, rounding errors will
eventually introduce a component in the right direction’. However, if the matrix
A has elements which can be represented exactly within the machine, that is, if A
can be scaled so that all elements are integers small enough to fit in one machine
word, it is quite likely that rounding errors in the ‘right direction’ will not occur.
Certainly such matrices arise often enough to warrant caution in choosing a
starting vector. Acton (1970) and Ralston (1965) discuss the power method in
more detail.
The power method is a simple, yet highly effective, tool for finding the extreme
eigensolutions of a matrix. However, by applying it with the inverse of the shifted
matrix A' (9.9) an algorithm is obtained which permits all distinct eigensolutions
to be determined. The iteration does not, of course, use an explicit inverse, but
solves the linear equations
A′y_i = x_i (9.10a)
then normalises the solution by
x_{i+1} = y_i/||y_i||. (9.10b)
Note that the solution of a set of simultaneous linear equations must be found at
each iteration.
While the power method is only applicable to the matrix eigenproblem (2.62),
inverse iteration is useful for solving the generalised eigenproblem (2.63) using
A' = A – kB (9.11)
in place of (9.9). The iteration scheme is now
A′y_i = Bx_i (9.12a)
x_{i+1} = y_i/||y_i||. (9.12b)
Once again, the purpose of the normalisation of y in (9.1b), (9.10b) and (9.12b) is
simply to prevent overflow in subsequent calculations (9.1a), (9.10a) or (9.12a).
The end use of the eigenvector must determine the way in which it is standard-
ised. In particular, for the generalised eigenproblem (2.63), it is likely that x
should be normalised so that
x^T Bx = 1. (9.13)
Such a calculation is quite tedious at each iteration and should not be performed
until convergence has been obtained, since a much simpler norm will suffice, for
instance the infinity norm
||y||_∞ = max_j |y_j| (9.14)
where y_j is the jth element of y. On convergence of the algorithm, the eigenvalue
is
e = k + x_j/y_j (9.15)
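A minimal sketch of inverse iteration as in (9.10) and (9.14)-(9.15) is given below. The procedure solve is hypothetical and must be supplied, for instance from the linear-equation methods of chapters 6 to 8; in practice a convergence test on successive eigenvalue estimates would replace the fixed iteration count used here.

{Sketch: inverse iteration with shift k for the matrix eigenproblem (2.62).
 solve(n, As, y, x) is a hypothetical routine returning y such that
 (A - k*1n) y = x, where As holds the shifted matrix.
 Assumes: type rvector = array[1..20] of real;
          rmatrix = array[1..20,1..20] of real; }
procedure invit(n: integer; var As: rmatrix; k: real;
                var x: rvector; itlimit: integer; var e: real);
var i, it, jm: integer; big: real; y: rvector;
begin
  for it := 1 to itlimit do
  begin
    solve(n, As, y, x); {y := (A - k*1n)^(-1) x  -- hypothetical call}
    big := 0.0; jm := 1;
    for i := 1 to n do
      if abs(y[i]) > big then begin big := abs(y[i]); jm := i end;
    e := k + x[jm] / y[jm]; {eigenvalue estimate, equation (9.15)}
    for i := 1 to n do x[i] := y[i] / big; {normalise by the infinity norm (9.14)}
  end;
end;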
(9.16)
or
(9.17)
Therefore
(9.18)
(9.27)
The calculation can be arranged so that u is not needed, that is, so that x and y are the
only working vectors needed.
RUN
ENHCMG - COMEIG AT SEPT 3 74
ORDER? 3
ELEMENT ( 1 , 1 );REAL=? 1 IMAGINARY? 2
ELEMENT ( 1 , 2 );REAL=? 3 IMAGINARY? 4
ELEMENT ( 1 , 3 );REAL=? 21 IMAGINARY? 22
ELEMENT ( 2 , 1 );REAL=? 43 IMAGINARY? 44
ELEMENT ( 2 , 2 );REAL=? 13 IMAGINARY? 14
ELEMENT ( 2 , 3 );REAL=? 15 IMAGINARY? 16
ELEMENT ( 3 , 1 );REAL=? 5 IMAGINARY? 6
ELEMENT ( 3 , 2 );REAL=? 7 IMAGINARY? 8
ELEMENT ( 3 , 3 );REAL=? 25 IMAGINARY? 26
TAU= 194 AT ITN 1
TAU= 99.7552 AT ITN 2
TAU= 64.3109 AT ITN 3
TAU= 25.0133 AT ITN 4
TAU= 7.45953 AT ITN 5
TAU= .507665 AT ITN 6
TAU= 6.23797E-4 AT ITN 7
TAU= 1.05392E-7 AT ITN 8
EIGENSOLUTIONS
RAW VECTOR 1
( .371175 ,-.114606 )
( .873341 ,-.29618 )
( .541304 ,-.178142 )
EIGENVALUE 1 =( 39.7761 , 42.9951 )
VECTOR
( .42108 , 1.15757E-2 )
( 1 , 5.96046E-8 )
( .617916 , 5.57855E-3 )
RESIDUALS
( 2.2918E-4 , 2.34604E-4 )
( 5.16415E-4 , 5.11169E-4 )
( 3.70204E-4 , 3.77655E-4 )
RAW VECTOR 2
(-9.52917E-2 ,-.491205 )
( 1.19177 , .98026 )
(-.342159 ,-9.71221E-2 )
EIGENVALUE 2 =( 6.7008 ,-7.87591 )
VECTOR
(-.249902 ,-.206613 )
( 1 , 1.19209E-7 )
(-.211227 , 9.22453E-2 )
RESIDUALS
(-3.8147E-5 , 3.8147E-6 )
( 7.55787E-5 ,-7.48634E-5 )
(-1.52588E-5 , 2.57492E-5 )
RAW VECTOR 3
( .408368 , .229301 )
(-.547153 ,-1.39186 )
(-4.06002E-2 , .347927 )
EIGENVALUE 3 =(-7.47744 , 6.88024 )
VECTOR
(-.242592 , .198032 )
( 1 , 0 )
(-.206582 ,-.110379 )
RESIDUALS
( 5.24521E-6 ,-4.00543E-5 )
(-7.9155E-5 , 7.82013E-5 )
( 2.81334E-5 ,-1.04904E-5 )
Chapter 10
(10.8)
thus implying that all the e_j must be positive or else a vector w = X^T y could be
devised such that w_j = 1, w_i = 0 for i ≠ j corresponding to the non-positive
eigenvalue, thus violating the condition (7.9) for definiteness. Hereafter, E and S
will be ordered so that
S_i > S_{i+1} > 0 (10.9)
e_i > e_{i+1}. (10.10)
This enables S and E to be used interchangeably once their equivalence has been
demonstrated.
Now consider pre-multiplying equation (10.1) by A. Thus we obtain
A²X = AAX = AXE = XEE = XE² (10.11)
while from symmetry (10.7) and the decomposition (2.53)
A²V = A^T A V = VS². (10.12)
Since (10.11) and (10.12) are both eigenvalue equations for A², S² and E² are
identical to within ordering, and since all e_i are positive, the orderings (10.9) and
(10.10) imply
S = E. (10.13)
Now it is necessary to show that
AV = VS. (10.14)
From (10.1), letting Q = X T V, we obtain
AV = XEX^T V = XEQ = XSQ. (10.15)
However, from (10.11) and (10.12), we get
QS² = S²Q. (10.16)
Explicit analysis of the elements of equation (10.16) shows that (a) if S_{ii} ≠ S_{jj}, then
Qij = 0, and (b) the commutation
QS = SQ (10.17)
is true even in the degenerate eigenvalue case; thus,
AV = XSQ = XQS = XX^T VS = VS. (10.18)
The corresponding result for U is shown in the same fashion.
(10.21)
(10.22)
If E > 0, the matrix A is positive definite; otherwise a shift equal to E will make
it so. Thus we can define
h = 0        for E > ε (10.23a)
h = –(|E| + ε^{1/2}) = E – ε^{1/2}        for E < ε (10.23b)
to ensure a positive definite matrix A' results from the shift (10.19). The machine
precision ε is used simply to take care of those situations, such as a matrix with a
null row (and column), where the lower bound E is in fact a small eigenvalue.
Unfortunately, the accuracy of eigensolutions computed via this procedure is
sensitive to the shift. For instance, the largest residual element R, that is, the
element of largest magnitude in the matrix
AX – XE (10.24)
and the largest inner product P, that is, the off-diagonal element of largest
magnitude in the matrix
X^T X – 1_n (10.25)
for the order-10 Ding Dong matrix (see appendix 1) are: for h = –3.57509,
R = 5·36442E–6 and P = 1·24425E–6 while for h = –10·7238, R = 1·49012E–5
and P = 2·16812E–6. These figures were computed on a Data General NOVA
(23-bit binary arithmetic) using single-length arithmetic throughout as no ex-
tended precision was available. The latter shift was obtained using
for E > 0 (10.26a)
for E < 0. (10.26 b)
In general, in a test employing all nine test matrices from appendix 1 of order 4
and order 10, the shift defined by formulae (10.23) gave smaller residuals and
inner products than the shift (10.26). The eigenvalues used in the above examples
were computed via the Rayleigh quotient
e_j = x_j^T A x_j / (x_j^T x_j) (10.27)
rather than the singular value, that is, equation (10.20). In the tests mentioned
above, eigenvalues computed via the Rayleigh quotient gave smaller residuals
than those found merely by adding on the shift. This is hardly surprising if the
nature of the problem is considered. Suppose that the true eigenvectors are φi ,
i = 1, 2, . . . , n. Let us add a component cw to φ_j, where w is some normalised
combination of the φ_i, i ≠ j, and c measures the size of the component (error); the
normalised approximation to the eigenvector is then
x_j = (1 + c²)^{-1/2} (φ_j + cw). (10.28)
The norm of the deviation (xj – φ j ) is found, using the binomial expansion and
ignoring terms in c4 and higher powers relative to those in c2, to be approximately
equal to c. The Rayleigh quotient corresponding to the vector given by (10.28) is
Q_j = (E_{jj} + c² w^T A w)/(1 + c²) (10.29)
since φ_j^T A w is zero by virtue of the orthogonality of the eigenvectors. The
deviation of Qj from the eigenvalue is
TABLE 10.1. Maximum absolute residual element R and maximum absolute inner product P between
normalised eigenvectors for eigensolutions of order n = 10 real symmetric matrices. All programs in
BASIC on a Data General NOVA. Machine precision = 2^{-22}.
(The maximum absolute residual was 3·8147E–6, the maximum inner product
4·4226E–7.) The last two principal moments of inertia are the same or
degenerate. Thus any linear combination of v2 and v 3 will give a new vector
where the V(k) are the plane rotations introduced in §3.3. The limit of the
sequence is a diagonal matrix under some conditions on the angles of rotation.
Each rotation is chosen to set one off-diagonal element of the matrix A (k) to zero.
In general an element made zero by one rotation will be made non-zero by
another so that a series of sweeps through the off-diagonal elements are needed to
reduce the matrix to diagonal form. Note that the rotations in equation (10.32)
preserve symmetry, so that there are n(n -1)/2 rotations in one sweep if A is of
order n.
Consider now the effect of a single rotation, equation (3.11), in the ij plane.
Then for m ≠ i, j
(10.33)
(10.34)
while
(10.35)
(10.36)
(10.37)
By allowing
(10.38)
and
(10.39)
the angle calculation defined by equations (3.22)-(3.27) will cause the transformed
element in the (i, j) position to be zero. By letting
(10.40)
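The structure of a single sweep may be sketched as below. This is a generic illustration using the conventional choice of rotation angle; it is not the text of the Jacobi code given in this chapter. V must be set to the identity before the first sweep, and sweeps are repeated until the off-diagonal elements are negligible, after which the eigenvalues lie on the diagonal of A and the eigenvectors are the columns of V.

{Sketch: one cyclic sweep of Jacobi rotations applied to the symmetric
 matrix A, with the rotations accumulated in V.
 Assumes: type rmatrix = array[1..20,1..20] of real; }
procedure jacsweep(n: integer; var A, V: rmatrix);
var i, j, k: integer; theta, t, c, s, g, h: real;
begin
  for i := 1 to n - 1 do
    for j := i + 1 to n do
      if A[i,j] <> 0.0 then
      begin
        theta := (A[j,j] - A[i,i]) / (2.0 * A[i,j]);
        if theta >= 0.0 then t := 1.0 / (theta + sqrt(1.0 + theta * theta))
        else t := -1.0 / (sqrt(1.0 + theta * theta) - theta);
        c := 1.0 / sqrt(1.0 + t * t); s := t * c; {cosine and sine of the angle}
        for k := 1 to n do {act on columns i and j}
        begin
          g := A[k,i]; h := A[k,j];
          A[k,i] := c * g - s * h; A[k,j] := s * g + c * h;
        end;
        for k := 1 to n do {act on rows i and j; A[i,j] is now set to zero}
        begin
          g := A[i,k]; h := A[j,k];
          A[i,k] := c * g - s * h; A[j,k] := s * g + c * h;
        end;
        for k := 1 to n do {accumulate the product of the rotations}
        begin
          g := V[k,i]; h := V[k,j];
          V[k,i] := c * g - s * h; V[k,j] := s * g + c * h;
        end;
      end;
end;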
(11.21)
which therefore requires 12n² matrix elements to take the expanded matrices and
resulting eigenvectors. Nash (1974) shows how it may be solved using only
4n² + 4n matrix elements.
(11.23)
(11.24)
we note that the oscillator which has coefficients k_2 = 1, k_4 = 0 in its potential has
exact solutions which are polynomials multiplied by
exp(–0·5x²).
Therefore, the basis functions which will be used here are
f_j(x) = N x^{j–1} exp(–αx²) (11.25)
where N is a normalising constant.
The approximation sought will be limited to n terms. Note now that
Hf_j(x) = N exp(–αx²)[–(j – 1)(j – 2)x^{j–3} + 2α(2j – 1)x^{j–1}
+ (k_2 – 4α²)x^{j+1} + k_4 x^{j+3}]. (11.26)
The minimisation of the Rayleigh quotient with respect to the coefficients cj gives
the eigenproblem
Ac = eBc (11.27)
where
(11.28)
and
(11.29)
These integrals can be decomposed to give expressions which involve only the
integrals
I_m = N² ∫ x^m exp(–2αx²) dx = 0        for m odd
= 1        for m = 0. (11.30)
The normalising constant N² has been chosen to cancel some awkward constants in
the integrals (see, for instance, Pierce and Foster 1956, p 68).
Because of the properties of the integrals (11.30) the eigenvalue problem
(11.27) reduces to two smaller ones for the even and the odd functions. If we set a
parity indicator w equal to zero for the even case and one for the odd case,
we can substitute
j – 1 = 2(q – 1) + w (11.31a)
i – 1 = 2(p – 1) + w (11.31b)
where p and q will be the new indices for the matrices A and B running from 1 to
n′ = n/2 (assuming n even). Thus the matrix elements are
Ã_{pq} = –(j – 1)(j – 2)I_s + 2α(2j – 1)I_{s+2} + (k_2 – 4α²)I_{s+4} + k_4 I_{s+6} (11.32)
and
B̃_{pq} = I_{s+2} (11.33)
where
s = i + j – 4 = 2(p + q – 3 + w)
and j is given by (11.31a). The tilde is used to indicate the re-numeration of A
and B.
The integrals (11.30) are easily computed recursively.
STEP DESCRIPTION
0 Enter s, α. Note s is even.
1 Let v = 1.
2 If s<0, stop. I_s is in v. For s<0 this value is always multiplied by 0.
3 For k = 1 to s/2.
Let v = v * (2 * k- 1) * 0·25/ α.
End loop on k.
4 End integral. I_s is returned in v.
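In Pascal the recursion may be written as the small function below (an illustrative sketch; the name is not taken from the text).

{Sketch: the integrals I(s) of (11.30) for even s, computed by the recursive
 scheme in the steps above. alpha is the exponent parameter of (11.25). }
function moment(s: integer; alpha: real): real;
var k: integer; v: real;
begin
  v := 1.0; {I(0) = 1 with the normalisation chosen}
  if s >= 2 then
    for k := 1 to s div 2 do
      v := v * (2 * k - 1) * 0.25 / alpha;
  moment := v; {for s < 0 the value returned is always multiplied by zero}
end;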
As an example, consider the exactly solvable problem using n' = 2, and α = 0·5
for w = 0 (even parity). Then the eigenproblem has
with solutions
e = 1        c = (1, 0)^T
and
e = 5        c = 2^{-1/2}(–1, 2)^T.
The same oscillator (α = 0·5) with w = 1 and n′ = 10 should also have exact
solutions. However, the matrix elements range from 0·5 to 3·2E+17 and the
solutions are almost all poor approximations when found by algorithm 15.
Likewise, while the problem defined by n′ = 5, w = 0, α = 2, k_2 = 0, k_4 = 1 is
solved quite easily to give the smallest eigenvalue e_1 = 1·06051 with eigenvector
(12.2)
By using the shorthand of vector notation, the nonlinear least-squares problem
is written: minimise
S(b) = f^T f = f^T(b, Y) f(b, Y) (12.4)
with respect to the parameters b. Once again, K is the number of variables, M is
the number of data points and n is the number of parameters.
Every nonlinear least-squares problem as defined above is an unconstrained
minimisation problem, though the converse is not true. In later sections methods
will be presented which aim to minimise S(b) where S is any function of the
parameters. Some ways of handling constraints will also be mentioned. Unfortu-
nately, the mathematical programming problem, in which a minimum is sought for
a function subject to many constraints, will only be touched upon briefly since
very little progress has been made to date in developing algorithms with minimal
storage requirements.
There is also a close relationship between the nonlinear least-squares problem
and the problem of finding solutions of systems of nonlinear equations. A system
of nonlinear equations
f (b,Y) = 0 (12.5)
having n = M (number of parameters equal to the number of equations) can be
approached as the nonlinear least-squares problem: minimise
S = f^T f (12.6)
with respect to b. For M greater than n, solutions can be sought in the least-
squares sense; from this viewpoint the problems are then indistinguishable. The
minimum in (12.6) should be found with S = 0 if the system of equations has a
solution. Conversely, the derivatives
∂S/∂b_j        j = 1, 2, . . . , n (12.7)
for an unconstrained minimisation problem, and in particular a least-squares
problem, should be zero at the minimum of the function S( b), so that these
problems may be solved by a method for nonlinear equations, though local
maxima and saddle points of the function will also have zero derivatives and are
acceptable solutions of the nonlinear equations. In fact, very little research has
been done on the general minimisation or nonlinear-equation problem where
either all solutions or extrema are required or a global minimum is to be found.
The minimisation problem when n = 1 is of particular interest as a subproblem
in some of the methods to be discussed. Because it has only one parameter it is
usually termed the linear search problem. The comparable nonlinear-equation
problem is usually called root-finding. For the case that f ( b) is a polynomial of
degree (K – 1), that is
(12.8)
the problem has a particularly large literature (see, for instance, Jenkins and
Traub 1975).
Example 12.1. Function minimisation-optimal operation of a public lottery
Perry and Soland (1975) discuss the problem of deciding values for the main
variables under the control of the organisers of a public lottery. These are p, the
price per ticket; v, the value of the first prize; w, the total value of all other prizes;
and t, the time interval between draws. If N is the number of tickets sold, the
expected cost of a single draw is
K1 + K2 N
that is, a fixed amount plus some cost per ticket sold. In addition, a fixed cost per
time K3 is assumed. The number of tickets sold is assumed to obey a Cobb-
Douglas-type production function
where F is a scale factor (to avoid confusion the notation has been changed from
that of Perry and Soland). There are a number of assumptions that have not been
stated in this summary, but from the information given it is fairly easy to see that
each draw will generate a revenue
R = Np – (v + w + K_1 + K_2N + K_3t).
Thus the revenue per unit time is
R/t = –S = [(p – K_2)N – (v + w + K_1)]/t – K_3.
Therefore, maximum revenue per unit time is found by minimising S( b) where
i Yil
1 5·308
2 7·24
3 9·638
4 12·866
5 17·069
6 23·192
7 31·443
8 38·558
9 50·156
10 62·948
11 75·995
12 91·972
where Z and β are constants. In this simple example, the equations reduce to
or
so that
However, in general, the system will involve more than one commodity and will
not offer a simple analytic solution.
Example 12.4. Root-finding
In the economic analysis of capital projects, a measure of return on investment
that is commonly used is the internal rate of return r. This is the rate of interest
applicable over the life of the project which causes the net present value of the
project at the time of the first investment to be zero. Let y_{1i} be the net revenue of
the project, that is, revenue or income minus loss or investment, in the ith time
period. This has a present value at the first time period of
y_{1i}/(1 + 0·01r)^{i–1}
where r is the interest rate in per cent per period. Thus the total present value at
the beginning of the first time period is
Σ_{i=1}^{K} y_{1i}/(1 + 0·01r)^{i–1}
where K is the number of time periods in the life of the project. By setting
b = 1/(1 + 0·01r)
this problem is identified as a polynomial root-finding problem (12.8).
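As a sketch, the net present value may be evaluated at a trial rate by Horner's rule in the variable b, and a root-finding method applied to the resulting function of r; the names used below are assumptions made for illustration.

{Sketch: net present value of the revenues y[1..K] at interest rate r (in
 percent per period). A root of npv(r) = 0 is the internal rate of return.
 Assumes: type rvector = array[1..50] of real; }
function npv(K: integer; var y: rvector; r: real): real;
var i: integer; b, s: real;
begin
  b := 1.0 / (1.0 + 0.01 * r);
  s := y[K];
  for i := K - 1 downto 1 do s := s * b + y[i]; {Horner evaluation of the polynomial}
  npv := s;
end;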
Example 12.5. Minimum of a function of one variable
Suppose we wish to buy some relatively expensive item, say a car, a house or a
new computer. The present era being afflicted by inflation at some rate r, we will
pay a price
P(1 + r)^t
at some time t after the present. We can save at s dollars (pounds) per unit time,
and when we buy we can borrow money at an overall cost of (F – 1) dollars per
dollar borrowed, that is, we must pay back F dollars for every dollar borrowed. F
can be evaluated given the interest rate R and number of periods N of a loan as
F = NR(1 + R)^N/[(1 + R)^N – 1].
Then, to minimise the total cost of our purchase, we must minimise
S(t) = ts + [P(1 + r)^t – ts]F
= ts(1 – F) + FP(1 + r)^t.
This has an analytic solution
t = ln{(F – 1)s/[FP ln(1 + r)]}/ln(1 + r).
However, it is easy to construct examples for which analytic solutions are harder
to obtain, for instance by changing inflation rate r with time.
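A sketch of these formulae in Pascal follows (the function names are hypothetical, and powers are formed with exp and ln since Pascal has no power operator).

{Sketch: the loan factor F, the cost function S(t) of example 12.5 and its
 analytic minimiser. P is the present price, r the inflation rate per period,
 s the saving rate, RR the loan interest rate and N the number of loan periods. }
function loanfactor(RR: real; N: integer): real;
begin
  loanfactor := N * RR * exp(N * ln(1.0 + RR)) / (exp(N * ln(1.0 + RR)) - 1.0);
end;

function cost(t, P, r, s, F: real): real;
begin
  cost := t * s * (1.0 - F) + F * P * exp(t * ln(1.0 + r));
end;

function tbest(P, r, s, F: real): real;
begin
  tbest := ln((F - 1.0) * s / (F * P * ln(1.0 + r))) / ln(1.0 + r);
end;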
Chapter 13
ONE-DIMENSIONAL PROBLEMS
13.1. INTRODUCTION
One-dimensional problems are important less in their own right than as a part of
larger problems. ‘Minimisation’ along a line is a part of both the conjugate
gradients and variable metric methods for solution of general function minimisa-
tion problems, though in this book the search for a minimum will only proceed
until a satisfactory new point has been found. Alternatively a linear search is
useful when only one parameter is varied in a complicated function, for instance
when trying to discover the behaviour of some model of a system to changes in
one of the controls. Roots of functions of one variable are less commonly needed
as a part of larger algorithms. They arise in attempts to minimise functions by
setting derivatives to zero. This means maxima and saddle points are also found,
so I do not recommend this approach in normal circumstances. Roots of polyno-
mials are another problem which I normally avoid, as some clients have a nasty
habit of trying to solve eigenproblems by means of the characteristic equation.
The polynomial root-finding problem is very often inherently unstable in that very
small changes in the polynomial coefficients give rise to large changes in the roots.
Furthermore, this situation is easily worsened by ill chosen solution methods. The
only genuine polynomial root-finding problem I have encountered in practice is
the internal rate of return (example 12.4). However, accountants and economists
have very good ideas about where they would like the root to be found, so I have
not tried to develop general methods for finding all the roots of a polynomial, for
instance by methods such as those discussed by Jenkins and Traub (1975). Some
experiments I have performed with S G Nash (unpublished) on the use of matrix
eigenvalue algorithms applied to the companion matrices of polynomials were
not very encouraging as to accuracy or speed, even though we had expected such
methods to be slow.
S_j = S(b_j) (13.7)
for brevity,
S_1 ≤ S_0 (13.8)
and
S_1 ≤ S_2. (13.9)
Excluding the exceptional cases that the function is flat or otherwise perverse, so
that at least one of the conditions (13.8) or (13.9) is an inequality, the interpola-
ting parabola will have its minimum between b 0 and b 2. Note now that we can
measure all distances from b1, so that equations (13.5) can be rewritten
(13.10)
where
x_j = b_j – b_1        for j = 0, 1, 2. (13.11)
Equations (13.10) can then be solved by elimination to give
(13.12)
and
(13.13)
(Note that the denominators differ only in their signs.) Hence the minimum of the
parabola is found at
b = b_1 + ½ [(S_0 – S_1)x_2² – (S_2 – S_1)x_0²] / [(S_0 – S_1)x_2 – (S_2 – S_1)x_0]. (13.14)
The success-failure algorithm always leaves the step length equal to x2. The
length x0 can be recovered if the steps from some initial point to the previous two
evaluation points are saved. One of these points will be b 1; the other is taken as
b0. The expression on the right-hand side of equation (13.14) can be evaluated in
a number of ways. In the algorithm below, both numerator and denominator have
been multiplied by -1.
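The interpolation step may be sketched as the small function below; this is an illustration of equations (13.11)-(13.14), not the code of algorithm 17 itself.

{Sketch: parabolic inverse interpolation. Given three points b0, b1, b2 with
 function values S0, S1, S2 satisfying (13.8) and (13.9), the position of the
 minimum of the interpolating parabola is returned. }
function parstep(b0, b1, b2, S0, S1, S2: real): real;
var x0, x2, num, den: real;
begin
  x0 := b0 - b1; x2 := b2 - b1; {distances measured from b1, as in (13.11)}
  num := (S0 - S1) * x2 * x2 - (S2 - S1) * x0 * x0;
  den := (S0 - S1) * x2 - (S2 - S1) * x0;
  parstep := b1 + 0.5 * num / den; {minimum of the parabola, (13.14)}
end;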
To find the minimum of a function of one parameter, several cycles of
success-failure and parabolic inverse interpolation are usually needed. Note that
algorithm 17 recognises that some functions are not computable at certain points
b. (This feature has been left out of the program FMIN given by Forsythe et al
(1977), and caused some failures of that program to minimise fairly simple
functions in tests run by B Henderson and the author, though this comment
reflects differences in design philosophy rather than weaknesses in FMIN.) Al-
gorithm 17 continues to try to reduce the value of the computed function until
(b+h) is not different from b in the machine arithmetic. This avoids the
requirement for machine-dependent tolerances, but may cause the algorithm to
execute indefinitely in environments where arithmetic is performed in extended-
precision accumulators if a storage of (b+h) is not forced to shorten the number
of digits carried.
In tests which I have run with B Henderson, algorithm 17 has always been
more efficient in terms of both time and number of function evaluations than a
linear search procedure based on that in algorithm 22. The reasons for retaining
the simpler approach in algorithm 22 were as follows.
(i) A true minimisation along the line requires repeated cycles of success-
failure/inverse interpolation. In algorithm 22 only one such cycle is used as part of
a larger conjugate gradients minimisation of a function of several parameters.
Therefore, it is important that the inverse interpolation not be performed until at
least some progress has been made in reducing the function value, and the
procedure used insists that at least one ‘success’ be observed before interpolation
is attempted.
(ii) While one-dimensional trials and preliminary tests of algorithm 17-like cycles
in conjugate gradients minimisation of a function of several parameters showed
some efficiency gains were possible with this method, it was not possible to carry
out the extensive set of comparisons presented in chapter 18 for the function
minimisation algorithms due to the demise of the Data General NOVA; the
replacement ECLIPSE uses a different arithmetic and operating system. In view
of the reasonable performance of algorithm 22, I decided to keep it in the
collection of algorithms. On the basis of our experiences with the problem of
minimising a function of one parameter, however, algorithm 17 has been chosen
for linear search problems. A FORTRAN version of this algorithm performed
competitively with the program FMIN due to Brent as given in Forsythe et al (1977)
when several tests were timed on an IBM 370/168.
The choice of the step adjustment factors Al and A2 to enlarge the step length
or to reduce it and change its sign can be important in that poor choices will
obviously cause the success-failure process to be inefficient. Systematic optimisa-
tion of these two parameters over the class of one-dimensional functions which
may have to be minimised is not feasible, and one is left with the rather
unsatisfactory situation of having to make a judgement from experience. Dixon
(1972) mentions the choices (2,-0·25) and (3,-0·5). In the present application,
however, where the success-failure steps are followed by inverse interpolation, I
have found the set (1·5,-0·25) to be slightly more efficient, though this may
merely reflect problems I have been required to solve.
NOVA ECLIPSE
For both sets, the endpoints are u=0, v =50 and number of points
is n = 5.
Simple grid search was applied to this function on Data General NOVA and
ECLIPSE computers operating in 23-bit binary and six-digit hexadecimal arithmetic,
respectively. The table at the bottom of the previous page gives the results of this
exercise. Note the difference in the computed function values!
An extended grid search (on the ECLIPSE) uses 26 function evaluations to
localise the minimum to within a tolerance of 0·1.
NEW
*ENTER#SGRID#
*RUN
SGRID NOV 23 77
3 5 1978 16 44 31
ENTER SEARCH INTERVAL ENDPOINTS
AND TOLERANCE OF ANSWER’S PRECISION
? 10 ? 30 ? 1
ENTER THE NUMBER OF GRID DIVISIONS
F( 14 )= 22180.5
F( 18 )= 22169.2
F( 22 )= 22195.9
F( 26 )= 22262.1
THE MINIMUM LIES IN THE INTERVAL [ 14 , 22 ]
F( 15.6 )= 22171.6
F( 17.2 )= 22168.6
F( 18.8 )= 22171.6
F( 20.4 )= 22180.7
THE MINIMUM LIES IN THE INTERVAL [ 15.6 , 18.8 ]
F( 16.24 )= 22169.7
F( 16.88 )= 22168.7
F( 17.52 )= 22168.7
F( 18.16 )= 22169.7
THE MINIMUM LIES IN THE INTERVAL [ 16.24 , 17.52 ]
F( 16.496 )= 22169.2
F( 16.752 )= 22168.8
F( 17.008 )= 22168.7
F( 17.264 )= 22168.6
THE MINIMUM LIES IN THE INTERVAL [ 17.008 , 17.52 ]
18 FUNCTION EVALUATIONS
NEW TOLERANCE ? .1
F( 17.1104 )= 22168.6
F( 17.2128 )= 22168.6
F( 17.3152 )= 22168.6
F( 17.4176 )= 22168.6
THE MINIMUM LIES IN THE INTERVAL [ 17.1104 , 17.3152 ]
F( 17.1513 )= 22168.6
F( 17.1923 )= 22168.6
F( 17.2332 )= 22168.6
F( 17.2742 )= 22168.6
THE MINIMUM LIES IN THE INTERVAL [ 17.1923 , 17.2742 ]
26 FUNCTION EVALUATIONS
NEW TOLERANCE ? -1
STOP AT 0420
*
Algorithm 17 requires a starting point and a step length. The ECLIPSE gives
*
*RUN
NEWMIN JULY 7 77
STARTING VALUE= ? 10 STEP ? 5
F( 10 )= 22228.5
F( 15 )= 22174.2
SUCCESS
F( 22.5)= 22202.2
PARAMIN STEP= 2.15087
F( 17.1509 )= 22168.6
NEW K4=-1.78772
F( 15.3631 )= 22172.6
FAILURE
F( 17.5978 )= 22168.8
PARAMIN STEP= 7.44882E-02
F( 17.2253 )= 22168.6
NEW K4=-.018622
F( 17.2067 )= 22168.6
SUCCESS
F( 17.1788 )= 22168.6
PARAMIN STEP=-4.65551E-03
F( 17.2021 )= 22168.6
PARAMIN FAILS
NEW K4= 4.65551E-03
F( 17.2114 )= 22168.6
FAILURE
F( 17.2055 )= 22168.6
PARAMIN FAILS
NEW K4= 0
MIN AT 17.2067 = 22168.6
12 FN EVALS
STOP AT 0060
*
The effect of step length choice is possibly important. Therefore, consider the
following applications of algorithm 17 using a starting value of t = 10 (step length,
minimum found, number of function evaluations):
1    17·2264    13
5    17·2067    12
10   17·2314    10
20   17·1774    11
The differences in the minima are due to the flatness of this particular function,
which may cause difficulties in deciding when the minimum has been located. By
way of comparison, a linear search based on the success-failure/inverse interpola-
tion sequence in algorithm 22 found the following minima starting from t = 10.
1 17·2063 23
5 17·2207 23
10 17·2388 21
20 17·2531 24
A cubic inverse interpolation algorithm requiring both the function and deriva-
tive to be computed, but which employs a convergence test based solely on the
change in the parameter, used considerably more effort to locate a minimum from
t = 10.
1 17·2083 38+38
5 17·2082 23+23
10 17·2082 36+36
20 17·2081 38+38
Most of the work in this algorithm is done near the minimum, since the region of
the minimum is located very quickly.
If we can be confident of the accuracy of the derivative calculation, then a
root-finder is a very effective way to locate a minimum of this function. However,
we should check that the point found is not a maximum or saddle. Algorithm 18
gives
*
*RUN200
ROOTFINDER
U= ? 10 V= ? 30
BISECTION EVERY ? 5
TOLERANCE ? 0
F( 10 )=-16.4537 F( 30 )= 32.0994
FP ITN 1 U= 10 V= 30 F( 16.7776 )=-1.01735
FP ITN 2 U= 16.7776 V= 30 F( 17.1838 )=-0.582123
FP ITN 3 U= 17.1838 V= 30 F( 17.207 )=-.00361633
FP ITN 4 U= 17.2307 V= 30 F( 17.2084 )=-2.28882E-04
FP ITN 5 U= 17.2084 V= 30 F( 17.2085 )=-3.05176E-05
BI ITN 6 U= 17.2085 V= 30 F( 23.6042 )= 15.5647
FP CONVERGED
ROOT: F( 17.2085 )=-3.05176E-05
STOP AT 0340
*
Unless otherwise stated all of the above results were obtained using a Data
General ECLIPSE operating in six hexadecimal digit arithmetic.
It may appear that this treatment is biased against using derivative information.
For instance, the cubic inverse interpolation uses a convergence test which does
not take it into account at all. The reason for this is that in a single-precision
environment (with a short word length) it is difficult to evaluate the projection of
a gradient along a line since inner-product calculations cannot be made in
extended precision. However, if the linear search is part of a larger algorithm to
minimise a function for several parameters, derivative information must usually
be computed in this way. The function values may still be well determined, but
inaccuracy in the derivative has, in my experience, appeared to upset the
performance of either the inverse interpolation or the convergence test.
The algorithm above is able to halt if a point is encountered where the function is not computable.
In later minimisation codes we will be able to continue the search for a minimum by presuming
such points are not local minima. However, in the present context of one-dimensional root-
finding, we prefer to require the user to provide an interval in which at least one root exists and
upon which the function is defined. The driver program DR1618.PAS on the software diskette is
intended to allow users to approximately localise roots of functions using grid search, followed by
a call to algorithm 18 to refine the position of a suspected root.
(13.31)
In fact using the grid search procedure (algorithm 16) we find the values in table
13.1 given t=0·5, z=100, s=100 and w=0·99. (The results in the table have
been computed on a Data General NOVA having 23-bit binary arithmetic.)
The consequences of such behaviour can be quite serious for the False Position
algorithm. This is because the linear approximation used is not valid, and a typical
step using u = 0, v = 1 gives
b=1·00001/200=5·00005E-3.
Since the root is near 0·473533, the progress is painfully slow and the method
requires 143 iterations to converge. Bisection, on the other hand, converges in 24
iterations (nbis=1 in the algorithm above). For nbis=2, 25 iterations are
required, while for nbis=5, which is the suggested value, 41 iterations are
needed. This may indicate that bisection should be a permanent strategy. How-
ever, the function (13.28) can be smoothed considerably by setting w=0·2 and
s=1, for which the root is found near 0·297268. In this case the number of
iterations needed is again 24 for nbis=1 (it is a function only of the number of
bits in the machine arithmetic), 6 for nbis=5 and also 6 if nbis is set to a large
number so no bisections are performed. Figure 13.1 shows plots of the two
functions obtained on a Hewlett-Packard 9830 calculator.
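To make the arithmetic of such a step concrete, the following short Pascal fragment carries out a single False Position step and interval update. It is a minimal sketch only, not algorithm 18 (which also forces a bisection every nbis iterations), and the function fn used here is a hypothetical stand-in rather than the function (13.28) discussed above.

program fpstep;
var u, v, b, fu, fv, fb: real;

function fn(x: real): real;
begin
  fn := exp(x) - 3.0   { a hypothetical test function, not (13.28) }
end;

begin
  u := 0.0; v := 2.0;                        { interval known to bracket a root }
  fu := fn(u); fv := fn(v);
  b := (u * fv - v * fu) / (fv - fu);        { False Position (linear interpolation) point }
  fb := fn(b);
  if fb * fu < 0.0 then v := b else u := b;  { keep the pair which still brackets the root }
  writeln('new interval [', u, ',', v, ']   f(b) = ', fb)
end.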
TABLE 13.1. Values found in example 13.2.
b f(b)
0 -1·00001
0·1 -1·00001
0·2 -1·00001
0·3 -1·00001
0·4 -1·00001
0·41 -1·00001
0·42 -0·999987
0·43 -0·999844
0·44 -0·998783
0·45 -0·990939
0·46 -0·932944
0·47 -0·505471
0·48 2·5972
0·49 22·8404
0·5 98·9994
0·6 199
0·7 199
0·8 199
0·9 199
1·0 199
early in the simulation period than if they occur at the end. Therefore. it is likely
that any sensible simulation. will use root-finding to solve (13.32) for p for a
variety of sets of arrest figures n. In particular, a pseudo-random-number
generator can be used to provide such sets of numbers chosen from some
distribution or other. The function is then computed via one of the two recurrence
relations
f_{i+1}(p) = f_i(p)(1 + r_e) + mp(1 + 0·5r_e) - n_i b    for f_i(p) > 0   (13.33)
or
f_{i+1}(p) = f_i(p)(1 + r_b) + mp(1 + 0·5r_e) - n_i b    for f_i(p) < 0.  (13.34)
Note that our shrewd criminals invest their premium money to increase the fund.
The rate 0·5r e is used to take account of the continuous collection of premium
payments over a period.
To give a specific example consider the following parameters: benefit b = 1200,
membership m = 2000, interest rates r_e = 0·08 and r_b = 0·15, initial fund f_0 = 0
and after 10 periods f_10 = 0 (a non-profit scheme!). The root-finding algorithm is
then applied using u=0, v=2000. Three sets of arrest figures were used to
simulate the operation of the scheme. The results are given in table 13.2. The
arrests are drawn from a uniform distribution on (0,400).
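For readers who wish to reproduce such a simulation, the following Pascal sketch evaluates the fund after T = 10 periods for a trial premium p by the recurrences (13.33) and (13.34), using the parameter values just given; a root-finder such as algorithm 18 would then be applied to drive this value to zero. The arrest figures used here are a fixed hypothetical set rather than pseudo-random drawings, and this is not the FORTRAN program mentioned below.

program fundsim;
const T = 10;
      b  = 1200.0;    { benefit per arrest }
      m  = 2000.0;    { membership }
      re = 0.08;      { earning rate r_e }
      rb = 0.15;      { borrowing rate r_b }
var narr: array[1..T] of real;
    f, p: real;
    i: integer;
begin
  for i := 1 to T do narr[i] := 200.0;       { hypothetical arrest figures }
  p := 120.0;                                { trial premium }
  f := 0.0;                                  { initial fund f_0 = 0 }
  for i := 1 to T do
    if f > 0.0 then
      f := f * (1.0 + re) + m * p * (1.0 + 0.5 * re) - narr[i] * b   { (13.33) }
    else
      f := f * (1.0 + rb) + m * p * (1.0 + 0.5 * re) - narr[i] * b;  { (13.34) }
  writeln('fund after ', T, ' periods = ', f)
end.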
The last entry in each column is an approximation based on no interest paid or earned in the fund
management. Thus
approximate premium = total arrests * b/(mT)
= total arrests * 0·06.
These examples were run in FORTRAN on an IBM 370/168.
Chapter 14
(14.2)
The operation of reflection then reflects b_H through b_C using a reflection factor α,
that is
b_R = b_C + α(b_C - b_H)
    = (1 + α)b_C - αb_H.   (14.3)
If S(b R) is less than S(b L) a new lowest point has been found, and the simplex can
be expanded by extending the line (bR -b C ) to give the point
b_E = b_C + γ(b_R - b_C)
    = γb_R + (1 - γ)b_C   (14.4)
where γ, the expansion factor, is greater than unity or else (14.4) represents a
contraction. If S(bE)<S(b R) then b H is replaced by b E and the procedure
repeated by finding a new highest point and a new centroid of n points b C .
Otherwise b R is the new lowest point and it replaces b H .
In the case where b R is not a new lowest point, but is less than b N, the
next-to-highest point, that is
S( b L )<S(b R ) <S(b N ) (14.5)
b H is replaced by b R and the procedure repeated. In the remaining situation, we
have S(b R) at least as great as S(b N) and should reduce the simplex.
There are two possibilities. (a) If
S (b N )<S(b R )<S(b H ) (14.6)
then the reduction is made by replacing b H by b R and finding a new vertex
between bC and b R (now b H). This is a reduction on the side of the reflection
(‘low’ side). (b) If
S(bR )>S(b H ) (14.7)
the reduction is made by finding a new vertex between b C and b H (‘high’ side).
In either of the above cases the reduction is controlled by a factor β between 0
and 1. Since case (a) above replaces b H by bR the same formula applies for the
new point bS (‘S’ denotes that the simplex is smaller) in both cases. b H is used to
denote both bR and b H since in case (a) b R has become the new highest point in
the simplex
b_S = b_C + β(b_H - b_C) = βb_H + (1 - β)b_C.   (14.8)
The new point b S then replaces the current b H, which in case (a) is, in fact, b R ,
unless
S(b S)>min(S(b H ),S( b R ) ) . (14.9)
The replacement of b H by b R in case (a) will, in an implementation, mean that
this minimum has already been saved with its associated point. When (14.9) is
satisfied a reduction has given a point higher than S( b N), so a general contraction
of the simplex about the lowest point so far, b_L, is suggested. That is
b_i = (b_i + b_L)/2   (14.10)
for all i ≠ L. In exact arithmetic, (14.10) is acceptable for all points, and the
author has in some implementations omitted the test for i = L. Some caution is
warranted, however, since some machines can form a mean of two numbers which
is not between those two numbers. Hence, the point b L may be altered in the
operations of formula (14.10).
Different contraction factors β and β' may be used in (14.8) and (14.10). In
practice these, as well as α and γ, can be chosen to try to improve the rate of
convergence of this procedure either for a specific class or for a wide range of
problems. Following Nelder and Mead (1965), I have found the strategy
α = 1   γ = 2   β' = β = 0·5   (14.11)
to be effective. It should be noted, however, that the choice of these values is
based on limited testing. In fact, every aspect of this procedure has been evolved
heuristically based on a largely intuitive conception of the minimisation problem.
As such there will always be functions which cannot be minimised by this method
because they do not conform to the idealisation. Despite this, the algorithm is
surprisingly robust and, if permitted to continue long enough, almost always finds
the minimum.
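The three trial points of an iteration are cheap to form. The short Pascal sketch below evaluates the reflection (14.3) together with the expansion and reduction points for the strategy (14.11); the expansion and reduction formulae shown are the standard forms assumed for (14.4) and (14.8). It is an illustration only, not algorithm 19.

program nmpoints;
const n = 2;
      alpha = 1.0;  gamma = 2.0;  beta = 0.5;    { strategy (14.11) }
type rvector = array[1..n] of real;
var bC, bH, bR, bE, bS: rvector;
    j: integer;
begin
  bC[1] := 0.0; bC[2] := 0.0;        { centroid of the points other than bH }
  bH[1] := 1.0; bH[2] := 2.0;        { current highest point }
  for j := 1 to n do
  begin
    bR[j] := (1.0 + alpha) * bC[j] - alpha * bH[j];   { reflection (14.3) }
    bE[j] := gamma * bR[j] + (1.0 - gamma) * bC[j];   { expansion of the reflected point }
    bS[j] := beta * bH[j] + (1.0 - beta) * bC[j]      { reduction towards the centroid }
  end;
  writeln('bR = (', bR[1]:6:2, ',', bR[2]:6:2, ')');
  writeln('bE = (', bE[1]:6:2, ',', bE[2]:6:2, ')');
  writeln('bS = (', bS[1]:6:2, ',', bS[2]:6:2, ')')
end.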
The thorniest question concerning minimisation algorithms must, therefore, be
addressed: when has the minimum been found? Nelder and Mead suggest using
the ‘standard error’ of the function values
test = [Σ_{i=1}^{n+1} (S(b_i) - S̄)²/n]^½   (14.12)
where
S̄ = Σ_{i=1}^{n+1} S(b_i)/(n + 1).   (14.13)
The procedure is taken to have converged when the test value falls below some
preassigned tolerance. In the statistical applications which interested Nelder and
Mead, this approach is reasonable. However, the author has found this criterion
to cause premature termination of the procedure on problems with fairly flat areas
on the function surface. In a statistical context one might wish to stop if such a
region were encountered, but presuming the minimum is sought, it seems logical
to use the simpler test for equality between S(b L) and S(b H), that is, a test for
equal height of all points in the simplex.
An additional concern on machines with low-precision arithmetic is that it is
possible for a general contraction (14.10) not to reduce the simplex size. There-
fore, it is advisable to compute some measure of the simplex size during the
contraction to ensure a decrease in the simplex size, as there is no point in
continuing if the contraction has not been effective. A very simple measure is the
sum
(14.14)
where
(14.15)
Finally, it is still possible to converge at a point which is not the minimum. If,
for instance, the (n+1) points of the simplex are all in one plane (which is a line
in two dimensions), the simplex can only move in (n-1) directions in the
n-dimensional space and may not be able to proceed towards the minimum.
O’Neill (1971), in a FORTRAN implementation of the Nelder-Mead ideas,
tests the function value at either side of the supposed minimum along each
of the parameter axes. If any function value is found lower than the current
supposed minimum, then the procedure is restarted.
The author has found the axial search to be useful in several cases in avoiding
false convergence. For instance, in a set of 74 tests, six failures of the procedure
were observed. This figure would have been 11 failures without the restart facility.
14.2. POSSIBLE MODIFICATIONS OF THE NELDER-MEAD
ALGORITHM
Besides choices for α, β, β' and γ other than (14.11), there are many minor
variations on the basic theme of Nelder and Mead. The author has examined
several of these, mainly using the Rosenbrock (1960) test function of two
parameters
S(b_1, b_2) = 100(b_2 - b_1²)² + (1 - b_1)²   (14.16)
starting at the point (-1·2, 1).
(i) The function value S(b C ) can be computed at each iteration. If S(b C )<S(b L ),
b L is replaced by b C . The rest of the procedure is unaffected by this change, which
is effectively a contraction of the simplex. If there are more than two parameters,
the computation of bC can be repeated. In cases where the minimum lies within
the current simplex, this modification is likely to permit rapid progress towards
the minimum. Since, however, the simplex moves by means of reflection and
expansion, the extra function evaluation is often unnecessary, and in tests run by
the author the cost of this evaluation outweighed the benefit.
(ii) In the case that S(bR ) <S( b L) the simplex is normally expanded by extension
along the line (b R -b C ). If b R is replaced by b E, the formulae contained in the first
two lines of equation (14.4) permit the expansion to be repeated. This modifica-
tion suffers the same disadvantages as the previous one; the advantages of the
repeated extension are not great enough-in fact do not occur often enough-to
offset the cost of additional function evaluations.
(iii) Instead of movement of the simplex by reflection of b H through bC , one could
consider extensions along the line (b L -b C ), that is, from the ‘low’ vertex of the
simplex. Simple drawings of the two-dimensional case show that this tends to
stretch the simplex so that the points become coplanar, forcing restarts. Indeed,
a test of this idea produced precisely this behaviour.
(iv) For some sets of parameters b, the function may not be computable, or a
constraint may be violated (if constraints are included in the problem). In such
cases, a very large value may be returned for the function to prevent motion in
the direction of forbidden points. Box (1965) has enlarged on this idea in his
Complex Method which uses more than (n+1) points in an attempt to prevent all
the points collapsing onto the constraint.
(v) The portion of the algorithm for which modifications remain to be suggested
is the starting (and restarting) of the procedure. Until now, little mention has been
made of the manner in which the original simplex should be generated. Nelder
and Mead (1965) performed a variety of tests using initial simplexes generated by
equal step lengths along the parameter axes and various ‘arrangements of the
initial simplex.’ The exact meaning of this is not specified. They found the rate of
convergence to be influenced by the step length chosen to generate an initial
simplex. O’Neill (1971) in his FORTRAN implementation permits the step along
each parameter axis to be specified separately, which permits differences in the
scale of the parameters to be accommodated by the program. On restarting, these
steps are reduced by a factor of 1000. General rules on how step lengths should
be chosen are unfortunately difficult to state. Quite obviously any starting step
should appreciably alter the function. In many ways this is an (n+1)-fold
repetition of the necessity of good initial estimates for the parameters as in §12.2.
More recently other workers have tried to improve upon the Nelder-Mead
strategies, for example Craig et al (1980). A parallel computer version reported by
Virginia Torczon seems to hold promise for the solution of problems in relatively
large numbers of parameters. Here we have been content to stay close to the original
Nelder-Mead procedure, though we have simplified the method for ranking the
vertices of the polytope, in particular the selection of the point b N .
The following output was produced with driver program DR1920, but has been
edited for brevity. The final parameters are slightly different from those given in the
first edition, where algorithm 19 was run in much lower precision. Use of the final
parameters from the first edition (1·1104, -0·387185) as starting parameters for the
present code gives an apparent minimum.
Minimum function value found = 2.5734305415E-24
At parameters
B[1]= 1.1060491080E+00
B[2]= -3.7996531780E-01
using analytic derivatives. The starting point b1=-1·2, b 2=1 was used.
# ITNS= 1 # EVALNS= 1 FUNCTION= 0.24199860E+02
# ITNS= 2 # EVALNS= 6 FUNCTION= 0.20226822E+02
# ITNS= 3 # EVALNS= 9 FUNCTION= 0.86069937E+01
# ITNS= 4 # EVALNS= 14 FUNCTION= 0.31230078E+01
# ITNS= 5 # EVALNS= 16 FUNCTION= 0.28306570E+01
# ITNS= 6 # EVALNS= 21 FUNCTION= 0.26346817E+01
# ITNS= 7 # EVALNS= 23 FUNCTION= 0.20069408E+01
# ITNS= 8 # EVALNS= 24 FUNCTION= 0.18900719E+01
# ITNS= 9 # EVALNS= 25 FUNCTION= 0.15198193E+01
# ITNS= 10 # EVALNS= 26 FUNCTION= 0.13677282E+01
# ITNS= 11 # EVALNS= 27 FUNCTION= 0.10138159E+01
# ITNS= 12 # EVALNS= 28 FUNCTION= 0.85555243E+00
# ITNS= 13 # EVALNS= 29 FUNCTION= 0.72980821E+00
# ITNS= 14 # EVALNS= 30 FUNCTION= 0.56827205E+00
# ITNS= 15 # EVALNS= 32 FUNCTION= 0.51492560E+00
# ITNS= 16 # EVALNS= 33 FUNCTION= 0.44735157E+00
# ITNS= 17 # EVALNS= 34 FUNCTION= 0.32320732E+00
# ITNS= 18 # EVALNS= 35 FUNCTION= 0.25737345E+00
# ITNS= 19 # EVALNS= 37 FUNCTION= 0.20997590E+00
# ITNS= 20 # EVALNS= 38 FUNCTION= 0.17693651E+00
# ITNS= 21 # EVALNS= 39 FUNCTION= 0.12203962E+00
# ITNS= 22 # EVALNS= 40 FUNCTION= 0.74170172E-01
# ITNS= 23 # EVALNS= 41 FUNCTION= 0.39149582E-01
# ITNS= 24 # EVALNS= 43 FUNCTION= 0.31218585E-01
# ITNS= 25 # EVALNS= 44 FUNCTION= 0.25947951E-01
# ITNS= 26 # EVALNS= 45 FUNCTION= 0.12625925E-01
# ITNS= 27 # EVALNS= 46 FUNCTION= 0.78500621E-02
# ITNS= 28 # EVALNS= 47 FUNCTION= 0.45955069E-02
# ITNS= 29 # EVALNS= 48 FUNCTION= 0.15429037E-02
# ITNS= 30 # EVALNS= 49 FUNCTION= 0.62955730E-03
# ITNS= 31 # EVALNS= 50 FUNCTION= 0.82553088E-04
# ITNS= 32 # EVALNS= 51 FUNCTION= 0.54429529E-05
# ITNS= 33 # EVALNS= 52 FUNCTION= 0.57958061E-07
# ITNS= 34 # EVALNS= 53 FUNCTION= 0.44057202E-10
# ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
# ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
B( 1)= 0.10000000E+01
B( 2)= 0.10000000E+01
# ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
Chapter 16
such that
for j<i. (16.3)
This is achieved by applying to both sides of equation (16.2), giving
(16.4)
by substitution of the condition (16.3) and the assumed conjugacy of the tj , j = 1 ,
2, . . . , (i-1). Note that the denominator of (16.4) cannot be zero if H is positive
definite and tj is not null.
Now if q i is chosen to be the negative gradient
q_i = -g_i   (16.5)
and is substituted from (15.18), then we have
(16.6)
Moreover, if accurate line searches have been performed at each of the ( i -1)
previous steps, then the function S (still the quadratic form (15.11)) has been
minimised on a hyperplane spanned by the search directions tj , j=1, 2, . . . ,
(i-1), and g_i is orthogonal to each of these directions. Therefore, we have
z_ij = 0   for j < (i - 1)   (16.7)
(16.8)
Alternatively, using
(16.9)
where the bj are the parameters which should give T=0 when summed as shown
with the weights w. Given wj, j=1, 2, . . . , n, T can easily be made zero since
there are (n-1) degrees of freedom in b. However, some degree of confidence
must be placed in the published figures, which we shall call pj , j=1, 2, . . . , n.
Thus, we wish to limit each b_j so that
|b_j - p_j| < d_j   for j = 1, 2, . . . , n
where dj is some tolerance. Further, we shall try to make b close to p by
minimising the function
The factor 100 is arbitrary. Note that this is in fact a linear least-squares problem,
subject to the constraints above. However, the conjugate gradients method is
quite well suited to the particular problem in 23 parameters which was presented,
since it can easily incorporate the tolerances dj by declaring the function to be ‘not
computable’ if the constraints are violated. (In this example they do not in fact
appear to come into play.) The output below was produced on a Data General
ECLIPSE operating in six hexadecimal digit arithmetic. Variable 1 is used to hold
the values p, variable 2 to hold the tolerances d and variable 3 to hold the weights
w. The number of data points is reported to be 24 and a zero has been appended
to each of the above-mentioned variables to accommodate the particular way in
which the objective function was computed. The starting values for the parame-
ters b are the obvious choices, that is, the reported values p. The minimisation
program and data for this problem used less than 4000 bytes (characters) of
memory.
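The 'not computable' device is easily arranged. The Pascal sketch below shows the idea for a much smaller hypothetical data set; the weighted objective actually used (with its factor of 100) is not reproduced here, and a simple sum of squared deviations plus the squared total T stands in for it.

program notcomp;
const n = 3;                 { a small illustrative size, not the 23 parameters above }
var p, d, w, b: array[1..n] of real;
    S, T: real;
    computable: boolean;
    j: integer;
begin
  { hypothetical published values, tolerances and weights }
  p[1] := 10.0;  p[2] := -4.0;  p[3] := -6.0;
  d[1] := 1.0;   d[2] := 0.5;   d[3] := 0.5;
  w[1] := 1.0;   w[2] := 2.0;   w[3] := 1.0;
  for j := 1 to n do b[j] := p[j];           { start from the published values }
  b[1] := 10.4;                              { a trial adjustment }
  computable := true;
  S := 0.0;  T := 0.0;
  for j := 1 to n do
  begin
    if abs(b[j] - p[j]) >= d[j] then computable := false;   { tolerance d violated }
    S := S + sqr(b[j] - p[j]);               { stand-in measure of closeness to p }
    T := T + w[j] * b[j]                     { the total which should be zero }
  end;
  S := S + sqr(T);
  if computable then writeln('S = ', S)
                else writeln('function declared not computable at this b')
end.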
*
* NEW
* ENTER"JJJRUN"
* RUN
12 7 1978 9 15 2
NCG JULY 26 77
CG + SUCCESS FAILURE
DATA FILE NAME ? D16. 12
# OF VARIABLES 3
# DATA POINTS 24
# OF PARAMETERS 23
ENTER VARIABLES
VARIABLE 1 - COMMENT - THE PUBLISHED VALUES P
167.85 .895 167.85 .895 -99.69
167.85 .895 -74.33 167.85 .895
-4.8 -1.03 -1 3.42 -65.155
-.73 -.12 -20.85 1.2 -2.85
31.6 -20.66 -8.55 0
VARIABLE 2 - COMMENT -THE TOLERANCES D
65.2 5.5E-02 65.2 5.5E-02 20
65.2 5.5E-02 19.9 65.2 5.5E-02
1.6 .36 .34 1.5 10.185
.51 .26 9.57 .27 .56
14.7 3.9 4.8 0
VARIABLE 3 - COMMENT -THE WEIGHTS W
1 1309.67 1 1388.87 1
1 1377.69 1 1 1251.02
15 119.197 215 29.776 15
806.229 1260 23.62 2761 2075
29.776 33.4 51.58 0
ENTER STARTING VALUES FOR PARAMETERS
B( 1 )= 167.85
B( 2 )= .895
B( 3 )= 167.85
B( 4 )= .895
B( 5 )=-99.69
B( 6 )= 167.85
B( 7 )= .895
B( 8 )=-74.33
B( 9 )= 167.85
B( 10 )= .895
B( 11 )=-4.8
B( 12 )=-1.03
B( 13 )=-1
B( 14 )= 3.42
B( 15 )=-65.155
B( 16 )=-.73
B( 17 )=-.12
B( 18 )=-20.85
B( 19 )= 1.2
B( 20 )=-2.85
B( 21 )= 31.6
B( 22 )=-20.66
B( 23 )=-8.55
STEPSIZE= 1
0 1 772798
1 21 5.76721E-04
2 31 5.76718E-04
3 31 5.76718E-04
4 42 5.76716E-04
5 42 5.76716E-04
6 45 5.76713E-04
7 45 5.76713E-04
8 48 5.76711E-04
9 48 5.76711E-04
CONVERGED TO 5.76711E-04 # ITNS= 10 # EVALS= 50
# EFES= 290
B( 1 )= 167.85 G( 1 )= .148611
B( 2 )= .900395 G( 2 )= 194.637
B( 3 )= 167.85 G( 3 )= .148611
B( 4 )= .900721 G( 4 )= 206.407
B( 5 )=-99.69 G( 5 )= .148626
B( 6 )= 167.85 G( 6 )= .145611
B( 7 )= .900675 G( 7 )= 204.746
B( 8 )=-74.33 G( 8 )= .148626
B( 9 )= 167.85 G( 9 )= .148611
B( 10 )= .900153 G( 10 )= 185.92
B( 11 )=-4.79994 G( 11 )= 2.22923
B( 12 )=-1.02951 G( 12 )= 17.7145
B( 13 )=-.999114 G( 13 )= 31.9523
B( 14 )= 3.42012 G( 14 )= 4.42516
B( 15 )=-65.1549 G( 15 )= 2.22924
B( 16 )=-.726679 G( 16 )= 119.818
B( 17 )=-.114509 G( 17 )= 157.255
B( 18 )=-20.8499 G( 18 )= 3.5103
B( 19 )= 1.21137 G( 19 )= 410.326
B( 20 )=-2.84145 G( 20 )= 308.376
B( 21 )= 31.6001 G( 21 )= 4.42516
B( 22 )=-20.6598 G( 22 )= 4.36376
B( 23 )=-8.54979 G( 23 )= 7.66536
STOP AT 0911
*SIZE
USED: 3626 BYTES
LEFT: 5760 BYTES
*
0 1 772741
1 21 5.59179E-4
2 22 5.59179E-4
CONVERGED TO 5.59179E-4 # ITNS= 3 # EVALS= 29
In the above output, the quantities printed are the number of iterations
(gradient evaluations), the number of function evaluations and the lowest function
value found so far. The sensitivity of the gradient and the convergence pattern to
relatively small changes in arithmetic is, in my experience, quite common for
algorithms of this type.
Chapter 17
17.1. INTRODUCTION
The mathematical problem to be considered here is that of minimising
S(x) = Σ_{i=1}^{m} f_i²(x)   (17.1)
with respect to the parameters xj, j=1, 2, . . . , n (collected for convenience as the
vector x), where at least one of the functions fi(x) is nonlinear in x. Note that by
collecting the m functions fi(x), i=1, 2, . . . , m, as a vector f, we get
S(x )=f Tf . (17.2)
The minimisation of a nonlinear sum-of-squares function is a sufficiently wide-
spread activity to have developed special methods for its solution. The principal
reason for this is that it arises whenever a least-squares criterion is used to fit a
nonlinear model to data. For instance, let yi represent the weight of some
laboratory animal at week i after birth and suppose that it is desired to model this
by some function of the week number i, which will be denoted y( i, x), where x is
the set of parameters which will be varied to fit the model to the data. If the
criterion of fit is that the sum of squared deviations from the data is to be
minimised (least squares) then the objective is to minimise (17.1) where
f i(x) =y(i,x) -y i (17.3)
or, in the case that confidence weightings are available for each data point,
f i(x)=[y(i,x) -y i]wi (17.4)
where wi, i=1, 2, . . . , m, are the weightings. As a particular example of a growth
function, consider the three-parameter logistic function (Oliver 1964)
y(i, x) = y(i, x_1, x_2, x_3) = x_1/[1 + exp(x_2 + ix_3)].   (17.5)
Note that the form of the residuals chosen in (17.3) and (17.4) is the negative
of the usual ‘actual minus fitted’ used in most of the statistical literature. The
reason for this is to make the derivatives of fi (x) coincide with those of y(i,x).
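For illustration, the following Pascal sketch evaluates the residuals (17.3) for the logistic model (17.5) and accumulates the sum of squares (17.1); the data values and trial parameters are hypothetical and serve only to make the fragment complete.

program logres;
const m = 5;
var y, f: array[1..m] of real;
    x: array[1..3] of real;
    S: real;
    i: integer;
begin
  y[1] := 5.3;  y[2] := 7.2;  y[3] := 9.6;  y[4] := 12.9;  y[5] := 17.1;
  x[1] := 200.0;  x[2] := 4.0;  x[3] := -0.3;     { trial parameters x1, x2, x3 }
  S := 0.0;
  for i := 1 to m do
  begin
    f[i] := x[1] / (1.0 + exp(x[2] + i * x[3])) - y[i];  { model (17.5) minus data, as in (17.3) }
    S := S + sqr(f[i])
  end;
  writeln('S(x) = ', S)
end.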
The minimisation of S(x) could, of course, be approached by an algorithm for
the minimisation of a general function of n variables. Bard (1970) suggests that
this is not as efficient as methods which recognise the sum-of-squares form of
S(x), though more recently Biggs (1975) and McKeown (1974) have found
contrary results. In the paragraphs below, algorithms will be described which take
explicit note of the sum-of-squares form of S(x), since these are relatively simple
and, as building blocks, use algorithms for linear least-squares computations
which have already been discussed in earlier chapters.
17.2. TWO METHODS
Almost immediately two possible routes to minimising S( x) suggest themselves.
The Cauchy steepest descents method
Find the gradient 2v(x) of S(x) and step downhill along it. (The reason for the
factor 2 will become apparent shortly.) Suppose that t represents the step length
along the gradient, then for some t we have
S ( x- tv ) <S ( x) (17.6)
except at a local minimum or a saddle point. The steepest descents method
replaces x by (x-tv) and repeats the process from the new point. The iteration is
continued until a t cannot be found for which (17.6) is satisfied. The method,
which was suggested by Cauchy (1848), is then taken to have converged. It can be
shown always to converge if S(x) is convex, that is, if
S(cx_1 + (1 - c)x_2) < cS(x_1) + (1 - c)S(x_2)   (17.7)
for 0<c<1. Even for non-convex functions which are bounded from below, the
steepest descents method will find a local minimum or saddle point. All the
preceding results are, of course, subject to the provision that the function and
gradient are computed exactly (an almost impossible requirement). In practice,
however, convergence is so slow as to disqualify the method of steepest descents
on its own as a candidate for minimising functions.
Often the cause of this slowness of convergence is the tendency of the method
to take pairs of steps which are virtually opposites, and which are both essentially
perpendicular to the direction in which the minimum is to be found. In a
two-parameter example we may think of a narrow valley with the minimum
somewhere along its length. Suppose our starting point is somewhere on the side
of the valley but not near this minimum. The gradient will be such that the
direction of steepest descent is towards the floor of the valley. However, the step
taken can easily traverse the valley. The situation is then similar to the original
one, and it is possible to step back across the valley almost to our starting point
with only a very slight motion along the valley toward the solution point. One can
picture the process as following a path similar to that which would be followed by
a marble or ball-bearing rolling over the valley-shaped surface.
To illustrate the slow convergence, a modified version of steepest descents was
programmed in BASIC on a Data General NOVA minicomputer having machine
precision 2-22. The modification consisted of step doubling if a step is successful.
The step length is divided by 4 if a step is unsuccessful. This reduction in step size
is repeated until either a smaller sum of squares is found or the step is so small
that none of the parameters change. As a test problem, consider the Rosenbrock
banana-shaped valley:
S(b_1, b_2) = 100(b_2 - b_1²)² + (1 - b_1)²
starting with
S(-1·2, 1) = 24·1999
(as evaluated). The steepest descents program above required 232 computations
of the derivative and 2248 evaluations of S(x) to find
S(1·00144, 1·0029) = 2·1 × 10⁻⁶.
The program was restarted with this point and stopped manually after 468
derivative and 4027 sum-of-squares computations, where
S(1·00084, 1·00168) = 7·1 × 10⁻⁷.
By comparison, the Marquardt method to be described below requires 24
derivative and 32 sum-of-squares evaluations to reach
S(1, 1) = 1·4 × 10⁻¹⁴.
(There are some rounding errors in the display of x1, x2 or in the computation of
S(x), since S(1,1)=0 is the solution to the Rosenbrock problem.)
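The program used for these figures was in BASIC on the NOVA; the Pascal sketch below reproduces the same step-doubling and step-quartering strategy for the Rosenbrock function so that the reader may experiment. It is a sketch, not one of the book's numbered algorithms, and the evaluation counts obtained will depend on the arithmetic of the machine used.

program stdesc;
const n = 2;
type rvector = array[1..n] of real;
var b, g, t: rvector;
    S, Snew, step: real;
    j, nf: integer;
    changed, success: boolean;

function fn(var x: rvector): real;
begin
  fn := 100.0 * sqr(x[2] - sqr(x[1])) + sqr(1.0 - x[1])
end;

procedure grad(var x: rvector; var gr: rvector);
begin   { analytic gradient of the Rosenbrock function }
  gr[1] := -400.0 * x[1] * (x[2] - sqr(x[1])) - 2.0 * (1.0 - x[1]);
  gr[2] := 200.0 * (x[2] - sqr(x[1]))
end;

begin
  b[1] := -1.2;  b[2] := 1.0;
  S := fn(b);  nf := 1;  step := 1.0;  changed := true;
  while changed do
  begin
    grad(b, g);
    success := false;
    repeat                                   { shrink the step until success or no change }
      changed := false;
      for j := 1 to n do
      begin
        t[j] := b[j] - step * g[j];
        if t[j] <> b[j] then changed := true
      end;
      if changed then
      begin
        Snew := fn(t);  nf := nf + 1;
        if Snew < S then success := true
                    else step := step / 4.0  { failure: reduce the step }
      end
    until success or (not changed);
    if success then
    begin
      for j := 1 to n do b[j] := t[j];
      S := Snew;
      step := 2.0 * step                     { success: double the step }
    end
  end;
  writeln('S(', b[1]:8:5, ',', b[2]:8:5, ') = ', S, '  after ', nf, ' evaluations of S')
end.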
The Gauss-Newton method
At the minimum the gradient v(x) must be null. The functions v_j(x), j = 1,
2, . . . , n, provide a set of n nonlinear functions in n unknowns x such that
v (x)= 0 (17.8)
the solution of which is a stationary point of the function S(x), that is, a local
maximum or minimum or a saddle point. The particular form (17.1) or (17.2) of
S(x) gives gradient components
(17.9)
which reduces to
(17.10)
or
v = J Tf (17.11)
by defining the Jacobian matrix J by
J_ij = ∂f_i(x)/∂x_j.   (17.12)
Some approximation must now be made to simplify the equations (17.8).
Consider the Taylor expansion of vj (x) about x
v_j(x + q) = v_j(x) + Σ_{k=1}^{n} (∂v_j/∂x_k)q_k + (terms involving q²)   (17.13)
If the terms in q² (that is, those involving q_kq_j for k, j = 1, 2, . . . , n) are assumed
to be negligible and v j (x+q) is taken as zero because it is assumed to be the
solution, then
Σ_{k=1}^{n} (∂v_j/∂x_k)q_k = -v_j(x)   (17.14)
for each j=1, 2, . . . , n. From (17.10) and (17.12), therefore, we have
(17.15)
LEFT-OVERS
18.1. INTRODUCTION
This chapter is entitled ‘left-overs’ because each of the topics-approximation
of derivatives, constrained optimisation and comparison of minimisation
algorithms-has not so far been covered, though none is quite large enough in
the current treatment to stand as a chapter on its own. Certainly a lot more could
be said on each, and I am acutely aware that my knowledge (and particularly my
experience) is insufficient to allow me to say it. As far as I am aware, very little
work has been done on the development of compact methods for the mathemati-
cal programming problem, that is, constrained minimisation with many con-
straints. This is a line of research which surely has benefits for large machines, but
it is also one of the most difficult to pursue due to the nature of the problem. The
results of my own work comparing minimisation algorithms are to my knowledge
the only study of such methods which has been made on a small computer. With
the cautions I have given about results derived from experiments with a single
system, the conclusions made in §18.4 are undeniably frail, though they are for
the most part very similar to those of other workers who have used larger
computers.
where ej is the jth column of the unit matrix of order n (b is presumed to have n
elements). For explanatory purposes, the case n=1 will be used. In place of the
limit (18.3), it is possible to use the forward difference
D = [S(b + h) - S(b)]/h   (18.4)
for some value of h.
Consider the possible sources of error in using D.
(i) For h small, the discrete nature of the representation of numbers in the
computer causes severe inaccuracies in the calculation of D. The function S is
continuous; its representation is not. In fact it will be a series of steps. Therefore,
h cannot be allowed to be small. Another way to think of this is that since most of
the digits of b are the same as those of (b+h), any function S which is not varying
rapidly will have similar values at b and (b+h), so that the expression (18.4)
implies a degree of digit cancellation causing D to be determined inaccurately.
(ii) For h large, the line joining the points (b , S(b)) and (b+h, S(b+h)) is no
longer tangential to the curve at the former. Thus expression (18.4) is in error due
to the nonlinearity of the function. Even worse, for some functions there may be
a discontinuity between b and (b+h). Checks for such situations are expensive of
both human and computer time. The penalty for ignoring them is unfortunately
more serious.
As a compromise between these extremes, I suggest letting
h_j = √ε (|b_j| + √ε)   (18.5)
where ε is the machine precision. The parameter has once more been given a
subscript to show that the step taken along each parameter axis will in general be
different. The value for h given by (18.5) has the pleasing property that it cannot
become smaller than the machine precision even if bj is zero. Neither can it fail to
change at least the right-most half of the digits in bj since it is scaled by the
magnitude of the parameter. Table 18.1 shows some typical results for this
step-length choice compared with some other values.
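A Pascal fragment implementing (18.4) with the step (18.5) is given below. It first estimates the machine precision in the usual way, then approximates the derivative of exp at b = 0·001; on a machine with ε = 16⁻⁵ the step produced is the value 1·93024E-6 mentioned in note (ii) below, but on other machines both h and the approximation will differ. It is an illustration only, not one of the numbered algorithms.

program fwdtest;
var b, h, eps, D: real;

function fn(x: real): real;
begin
  fn := exp(x)
end;

begin
  eps := 1.0;                               { estimate the machine precision }
  while 1.0 + eps > 1.0 do eps := eps / 2.0;
  eps := 2.0 * eps;
  b := 0.001;
  h := sqrt(eps) * (abs(b) + sqrt(eps));    { step choice (18.5) }
  D := (fn(b + h) - fn(b)) / h;             { forward difference (18.4) }
  writeln('h = ', h, '   D = ', D, '   analytic derivative = ', exp(b))
end.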
Some points to note in table 18.1 are:
(i) The effect of the discontinuity in the tangent function in the computations for
b=1 and b=1·57 (near π/2). The less severely affected calculations for b=
-1·57 suggest that in some cases the backward difference
D = [S(b) - S(b - h)]/h   (18.6)
may be preferred.
(ii) In approximating the derivative of exp(0·001) using h=1·93024E-6 as in
equation (18.5), the system used for the calculations printed identical values for
exp(b) and exp(b+h) even though the internal representations were different
TABLE 18.1. Derivative approximations computed by formula (18.4) on a Data General NOVA. (Extended BASIC system. Machine precision = 16-5.)
h b = 0·001 b=1 b = 100 b = –10 b = 0·001 b=l b = 100 b=0 b=l b = l·57 b = –1.57
1 6·90876 0·693147 0·95064E–3 7·80101E–5 1·72 4·67077 4·6188E43 1·55741 –3·74245 –1256·61 1255·32
0·0625 66·4166 0·969993 9·99451E–3 4·68483E–5 1·03294 2·80502 2·7737H343 1·0013 3·80045 –20354·4 19843
3·90625E–3 407·171 0·998032 1·00098E–2 4·54858E–5 1·00293 2·72363 2·69068E43 1 3·44702 –403844 267094
2·44141E–4 894·754 0·999756 1·17188E–2 4·52995E–5 1 2·71875 2·67435E43 0·999999 3·42578 2·2655E6 1·2074E6
1·52588E–5 992·437 1 0·0625 4·3869E–5 1 2·75 1·81193E43 0·999999 3·5 1·57835E6 1·59459E6
9·53674E–7 1000 1 0 3·05176E–5 1 2 0 0·999999† 5 1·24006E6 2·47322E6
1·93024E–6 999·012† 0·988142†
9·77516E–4 0·999512† 2·72† 3·43317†
9·76572E–2 9·9999E–3† 2·82359E43†
9·76658E–3 4·56229E–5†
1·53416E–3 –1·70216E6† 539026†
Analytic
derivative‡ 1000 1 0·01 4·54E–5 1.001 2.71828 2·68805E43 1 3.42552 1·57744E6 1·57744E6
(18.13)
It should be noted that while the constraint has obviously disappeared, these
substitutions tend to complicate the resulting unconstrained minimisation by
introducing local minima and multiple solutions. Furthermore, in many cases
suitable substitutions are very difficult to construct.
Penalty functions
The basis of the penalty function approach is simply to add to the function S( b )
some positive quantity reflecting the degree to which a constraint is violated. Such
penalties may be global, that is, added in everywhere, or partial, that is, only
added in where the constraint is violated. While there are many possibilities, here
we will consider only two very simple choices. These are, for equality constraints,
the penalty functions
(18.14)
and for inequalities, the partial penalty
(18.15)
where H is the Heaviside function
H(x) = 1   for x > 0
H(x) = 0   for x < 0.   (18.16)
The quantities wj, j=1, 2, . . . , m, and Wk , k=1, 2, . . . , q, are weights which
have to be assigned to each of the constraints. The function
(18.17)
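A Pascal sketch of the idea, using a hypothetical problem (minimise b1² + b2² subject to the equality b1 + b2 = 1 and the inequality b1 ≥ 0) rather than the forms (18.14)-(18.17) themselves, is given below. The equality penalty is added everywhere, while the inequality penalty is added only where the constraint is violated.

program pendemo;
type rvector = array[1..2] of real;
var b: rvector;

function S(var b: rvector): real;
begin
  S := sqr(b[1]) + sqr(b[2])
end;

function penfun(var b: rvector; weq, wineq: real): real;
var phi, ceq, cin: real;
begin
  phi := S(b);
  ceq := b[1] + b[2] - 1.0;            { equality constraint residual }
  phi := phi + weq * sqr(ceq);         { global penalty, added everywhere }
  cin := b[1];                         { inequality b[1] >= 0 }
  if cin < 0.0 then
    phi := phi + wineq * sqr(cin);     { partial penalty, added only if violated }
  penfun := phi
end;

begin
  b[1] := -0.2;  b[2] := 0.4;
  writeln('penalised function value = ', penfun(b, 100.0, 100.0))
end.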
(18.21)
subject to
b_3b_6 = b_4b_5.   (18.22)
The data for this problem are given in table 18.2. The decision that must now
be made is which variable is to be eliminated via (18.22); for instance, b_6 can be
found as
b_6 = b_4b_5/b_3.   (18.23)
The behaviour of the Marquardt algorithm 23 on each of the four unconstrained
minimisation problems which can result from elimination in this fashion is shown
in table 18.3. Numerical approximation of the Jacobian elements was used to save
some effort in running these comparisons. Note that in two cases the algorithm
has failed to reach the minimum. The starting point for the iterations was b_j = 1,
j = 1, 2, . . . , 6, in every case, and these failures are most likely due to the large
differences in scale between the variables. Certainly, this poor scaling is respons-
ible for the failure of the variable metric and conjugate gradients algorithms when
the problem is solved by eliminating b 6. (Analytic derivatives were used in these
cases.)
The penalty function approach avoids the necessity of choosing which parame-
ter is eliminated. The lower half of table 18.3 presents the results of computations
with the Marquardt-like algorithm 23. Similar results were obtained using the
Nelder-Mead and variable metric algorithms, but the conjugate gradients method
failed to converge to the true minimum. Note that as the penalty weighting w is
increased the minimum function value increases. This will always be the case if a
constraint is active, since enforcement of the constraint pushes the solution ‘up
the hill’.
Usually the penalty method will involve more computational effort than the
elimination approach (a) because there are more parameters in the resulting
unconstrained problem, in our example six instead of five, and (b) because the
unconstrained problem must be solved for each increase of the weighting w.
Furthermore, the ultimate de-scaling of the problem as w is made large may cause
slow convergence of the iterative algorithms.
TABLE 18.2. Data for the problem of Z Hassan specified by (18.21) and (18.22). Column j below
gives the observations y_ij, for rows i = 1, 2, . . . , m, for m = 26.
1 2 3 4 5 6
In order to see how the penalty function approach works for inequality
constraints where there is no corresponding elimination, consider the following
problem (Dixon 1972, p 92): minimise
(18.24)
subject to
3b1+4b2 <6 (18.25)
and
-b1+4 b2 <2. (18.26)
The constraints were weighted equally in (18.15) and added to (18.24). The
resulting function was minimised using the Nelder-Mead algorithm starting from
b 1= b2=0 with a step of 0·01 to generate the initial simplex. The results are
presented in table 18.4 together with the alternative method of assigning the
TABLE 18.3. Solutions found for Z Hassan problem via Marquardt-type algorithm using numerical
approximation of the Jacobian and elimination of one parameter via equation (18.22). The values in
italics are for the eliminated parameter. All calculations run in BASIC on a Data General NOVA in
23-bit binary arithmetic.
b1   b2   b3   b4   b5   b6   Eliminated   Sum of squares†
† Figures in brackets below each sum of squares denote total number of equivalent function
evaluations (= (n+1) *(number of Jacobian calculations) + (number of function calculations)) to con-
vergence.
function a very large value whenever one or more of the constraints is violated. In
this last approach it has been reported that the simplex may tend to ‘collapse’ or
‘flatten’ against the constraint. Swann discusses some of the devices used to
counteract this tendency in the book edited by Gill and Murray (1974). Dixon
(1972, chap 6) gives a discussion of various methods for constrained optimisation
with a particular mention of some of the convergence properties of the penalty
function techniques with respect to the weighting factors w and W.
TABLE 18.4. Results of solution of problem (18.24)-( 18.26) by the Nelder-Mead
algorithm.
(18.31)
The average of w/n over all problems, on the other hand, is approximately
(18.32)
Hence the ratio
(18.33)
gives
a = 1/(2r - 1)   (18.34)
as an estimate of the degree of the relationship between work and problem order,
n, of a given method. The limited extent of the tests and the approximation of
sums by integrals, however, mean that the results of such an analysis are no more
than a guide to the behaviour of algorithms. The results of the tests are presented
in table 18.5.
The conclusions which may be drawn from the table are loosely as follows.
(i) The Marquardt algorithm 23 is generally the most reliable and efficient.
Particularly if problems having ‘large’ residuals, which cause the Gauss-Newton
approximation (17.16) to be invalid, are solved by other methods or by increasing
the parameter phi in algorithm 23, it is extremely efficient, as might be expected
since it takes advantage of the sum-of-squares form.
(ii) The Marquardt algorithm using a numerical approximation (18.4) for the
Jacobian is even more efficient than its analytic-derivative counterpart on those
problems it can solve. It is less reliable, of course, than algorithms using analytic
derivatives. Note, however, that in terms of the number of parameters determined
successfully, only the variable metric algorithm and the Marquardt algorithm are
more effective.
(iii) The Nelder-Mead algorithm is the most reliable of the derivative-free
methods in terms of number of problems solved successfully. However, it is also
one of the least efficient, depending on the choice of measure w1 or w0, though in
some ways this is due to the very strict convergence criterion and the use of the
axial search procedure. Unfortunately, without the axial search, the number of
problems giving rise to ‘failures’ of the algorithm is 11 instead of four, so I cannot
recommend a loosening of the convergence criteria except when the properties of
the function to be minimised are extremely well known.
(iv) The conjugate gradients algorithm, because of its short code length and low
working-space requirements, should be considered whenever the number of
parameters to be minimised is large, especially if the derivatives are inexpensive
to compute. The reliability and efficiency of conjugate gradients are lower than
those measured for variable metric and Marquardt methods. However, this study,
by using equivalent function evaluations and by ignoring the overhead imposed by
each of the methods, is biased quite heavily against conjugate gradients, and I
would echo Fletcher’s comment (in Murray 1972, p 82) that ‘the algorithm is
extremely reliable and well worth trying’.
As a further aid to the comparison of the algorithms, this chapter is concluded
with three examples to illustrate their behaviour.
Example 18.1. Optimal operation of a public lottery
In example 12.1 a function minimisation problem has been described which arises
in an attempt to operate a lottery in a way which maximises revenue per unit
TABLE 18.5. Comparison of algorithm performance as measured by equivalent function evaluations (efe’s).
Algorithm:
19+20  Nelder-Mead
21     Variable metric
21     Variable metric with numerically approximated gradient
22     Conjugate gradients
22     Conjugate gradients with numerically approximated gradient
23     Marquardt
23     Marquardt with numerically approximated Jacobian
23     Marquardt, omitting problem 34†
† Problem 34 of the set of 79 has been designed to have residuals f so that the second derivatives of these residuals cannot be dropped from
equation (17.15) to make the Gauss–Newton approximation. The failure of the approximation in this case is reflected in the very slow (12000 efe’s)
convergence of algorithm 23.
‡ On Data General NOVA (23-bit mantissa).
time. Perry and Soland (1975) derive analytic solutions to this problem, but both
to check their results and determine how difficult the problem might be if such a
solution were not possible, the following data were run through the Nelder-Mead
simplex algorithm 19 on a Data General NOVA operating in 23-bit binary
arithmetic.
K1 = 3·82821   K2 = 0·416   K3 = 5·24263   F = 8·78602
α = 0·23047   β = 0·12   γ = 0·648   δ = 1·116.
Starting at the suggested values bT=(7, 4, 300, 1621), S( b)=-77·1569, the
algorithm took 187 function evaluations to find S(b*)=-77·1602 at (b*)T=
(6·99741, 3·99607, 300·004, 1621·11). The very slow convergence here is cause
for some concern, since the start and finish points are very close together.
From b T=(1, 1, 300, 1621), S(b)=707·155, the Nelder-Mead procedure took
888 function evaluations to b*=(6·97865, 3·99625, 296·117, 1619·92)T and
S(b*)=-77·1578 where it was stopped manually. However, S was less than -77
after only 54 evaluations, so once again convergence appears very slow near the
minimum. Finally, from b^T = (1, 1, 1, 1), S(b) = 5·93981, the algorithm converged
to S(b*)=-77·1078 in 736 evaluations with b*=(11·1905, 3·99003, 481·054,
2593·67)T. In other words, if the model of the lottery operation, in particular the
production function for the number of tickets sold, is valid, there is an alternative
solution which ‘maximises’ revenue per unit time. There may, in fact, be several
alternatives.
If we attempt the minimisation of S(b) using the variable metric algorithm 21
and analytic derivatives, we obtain the following results.
(The efe’s are equivalent function evaluations; see §18.4 for an explanation.) In
case (b), the price per ticket (second parameter) is clearly exorbitant and the
duration of the draw (first parameter) over a year and a half. The first prize (third
parameter, measured in units 1000 times as large as the price per ticket) is
relatively small. Worse, the revenue (-S) per unit time is negative! Yet the
derivatives with respect to each parameter at this solution are small. An addi-
tional fact to be noted is that the algorithm was not able to function normally, that
is, at each step algorithm 21 attempts to update an iteration matrix. However,
under certain conditions described at the end of §15.3, it is inadvisable to do this
and the method reverts to steepest descent. In case (b) above, this occurred in 23
of the 25 attempts to perform the update, indicating that the problem is very far
from being well approximated by a quadratic form. This is hardly surprising. The
matrix of second partial derivatives of S is certain to depend very much on the
parameters due to the fractional powers (a, β, γ, δ) which appear. Thus it is
unlikely to be ‘approximately constant’ in most regions of the parameter space as
required of the Hessian in §15.2. This behaviour is repeated in the early iterations
of case (c) above.
In conclusion, then, this problem presents several of the main difficulties which
may arise in function minimisation:
(i) it is highly nonlinear;
(ii) there are alternative optima; and
(iii) there is a possible scaling instability in that parameters 3 and 4 (v and w)
take values in the range 200-2000, whereas parameters 1 and 2 (t and p) are in
the range 1-10.
These are problems which affect the choice and outcome of minimisation proce-
dures. The discussion leaves unanswered all questions concerning the reliability of
the model or the difficulty of incorporating other parameters, for instance to take
account of advertising or competition, which will undoubtedly cause the function
to be more difficult to minimise.
Example 18.2. Market equilibrium and the nonlinear equations that result
In example 12.3 the reconciliation of the market equations for supply
q = Kp^α
and demand
q = Zp^(-β)
has given rise to a pair of nonlinear equations. It has been my experience that
such systems are less common than minimisation problems, unless the latter are
solved by zeroing the partial derivatives simultaneously, a practice which gener-
ally makes work and sometimes trouble. One’s clients have to be trained to
present a problem in its crude form. Therefore, I have not given any special
method in this monograph for simultaneous nonlinear equations, which can be
written
f (b) =0 (12.5)
preferring to solve them via the minimisation of
f Tf = S(b) (12.4)
which is a nonlinear least-squares problem. This does have a drawback, however,
in that the problem has in some sense been ‘squared’, and criticisms of the same
kind as have been made in chapter 5 against the formation of the sum-of-squares
and cross-products matrix for linear least-squares problems can be made against
solving nonlinear equations as nonlinear least-squares problems. Nonetheless, a
compact nonlinear-equation code will have to await the time and energy for its
development. For the present problem we can create the residuals
f_1 = q - Kp^α
f_2 = ln(q) - ln(Z) + β ln(p).
The second residual is the likely form in which the demand function would be
estimated. To obtain a concrete and familiar form, substitute
q = b_1   p = b_2   K = 1
α = 1·5   β = 1·2   Z = exp(2)
so that
f_2 = ln(b_1) - 2 + 1·2 ln(b_2).
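The corresponding residual calculation is only a few lines of Pascal, as the sketch below shows for trial values of q and p; p must be positive since p^α is formed via exp and ln. Minimising f1² + f2² by any of algorithms 19, 21, 22 or 23 then solves the pair of equations.

program market;
var b1, b2, f1, f2, S: real;
begin
  b1 := 5.0;  b2 := 2.0;                 { trial values: q = b1, p = b2 }
  f1 := b1 - exp(1.5 * ln(b2));          { f1 = q - K p^alpha with K = 1, alpha = 1.5 }
  f2 := ln(b1) - 2.0 + 1.2 * ln(b2);     { f2 = ln(q) - ln(Z) + beta ln(p), Z = exp(2) }
  S := sqr(f1) + sqr(f2);
  writeln('f1 = ', f1, '   f2 = ', f2, '   S = ', S)
end.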
Now minimising the sum of squares
To solve this by means of algorithms 19, 21, 22 and 23, the residuals
19.1. INTRODUCTION
This monograph concludes by applying the conjugate gradients method, de-
veloped in chapter 16 for the minimisation of nonlinear functions, to linear equations,
linear least-squares and algebraic eigenvalue problems. The methods suggested
may not be the most efficient or effective of their type, since this subject area has
not attracted a great deal of careful research. In fact much of the work which has
been performed on the sparse algebraic eigenvalue problem has been carried out
by those scientists and engineers in search of solutions. Stewart (1976) has
prepared an extensive bibliography on the large, sparse, generalised symmetric
matrix eigenvalue problem in which it is unfortunately difficult to find many
reports that do more than describe a method. Thorough or even perfunctory
testing is often omitted and convergence is rarely demonstrated, let alone proved.
The work of Professor Axel Ruhe and his co-workers at Umea is a notable
exception to this generality. Partly, the lack of testing is due to the sheer size of
the matrices that may be involved in real problems and the cost of finding
eigensolutions.
The linear equations and least-squares problems have enjoyed a more diligent
study. A number of studies have been made of the conjugate gradients method
for linear-equation systems with positive definite coefficient matrices, of which
one is that of Reid (1971). Related methods have been developed in particular by
Paige and Saunders (1975) who relate the conjugate gradients methods to the
Lanczos algorithm for the algebraic eigenproblem. The Lanczos algorithm has
been left out of this work because I feel it to be a tool best used by someone
prepared to tolerate its quirks. This sentiment accords with the statement of
Kahan and Parlett (1976): ‘The urge to write a universal Lanczos program should
be resisted, at least until the process is better understood.’ However, in the hands
of an expert, it is a very powerful method for finding the eigenvalues of a large
symmetric matrix. For indefinite coefficient matrices, however, I would expect the
Paige-Saunders method to be preferred, by virtue of its design. In preparing the first
edition of this book, I experimented briefly with some FORTRAN codes for several
methods for iterative solution of linear equations and least-squares problems, finding
no clear advantage for any one approach, though I did not focus on indefinite
matrices. Therefore, the treatment which follows will stay with conjugate gradients,
which has the advantage of introducing no fundamentally new ideas.
It must be pointed out that the general-purpose minimisation algorithm 22 does
not perform very well on linear least-squares or Rayleigh quotient minimisations.
In some tests run by S G Nash and myself, the inexact line searches led to very
slow convergence in a number of cases, even though early progress may have
been rapid (Nash and Nash 1977).
The above algorithm requires five working vectors to store the problem and intermediate results
as well as the solution. This is exclusive of any space needed to store or generate the coefficient
matrix.
Thus, since y_0 is fixed as zero by (2.9), if the value y_1 is known, all the points of
the solution can be computed. But (2.10) requires
y_{n+1} = 2   (19.15)
thus we can consider the difference
f(y_1) = y_{n+1} - 2   (19.16)
to generate a root-finding problem. This process is called a shooting method since
we aim at the value of y_{n+1} desired by choosing y_1. Table 19.1 compares the three
methods suggested for n=4, 10 and 50. The main comparison is between the
values found for the deviation from the true solution
y(x) = x + x³   (19.17)
or
y(x_j) = jh(1 + j²h²).   (19.18)
It is to be noted that the use of the conjugate gradients method with the normal
equations (19.13) is unsuccessful, since we have unfortunately increased the ill
conditioning of the equations in the manner discussed in chapter 5. The other two
methods offer comparable accuracy, but the shooting method, applying algorithm
18 to find the root of equation (19.16) starting with the interval [0,0·5], is
somewhat simpler and faster. In fact, it could probably be solved using a trial-and-
error method to find the root on a pocket calculator.
Example 19.2. Surveying-data fitting
The output below illustrates the solution of a linear least-squares problem of the
type described in example 2.4. No weighting of the observations is employed here,
though in practice one would probably weight each height-difference observation
by some factor inversely related to the distance of the observation position from
the points whose height difference is measured. The problem given here was
generated by artificially perturbing the differences between the heights b=
(0, 100, 121, 96)T. The quantities G printed are the residuals of the normal
equations.
RUN
SURVEYING LEAST SQUARES
# OF POINTS? 4
# OF OBSERVATIONS? 5
HEIGHT DIFF BETWEEN? 1 AND? 2=? -99.99
HEIGHT DIFF BETWEEN? 2 AND? 3=? -21.03
HEIGHT DIFF BETWEEN? 3 AND? 4=? 24.98
HEIGHT DIFF BETWEEN? 1 AND? 3=? -121.02
HEIGHT DIFF BETWEEN? 2 AND? 4=? 3.99
B( 1 )=-79.2575 G= 2.61738E-6
B( 2 )= 20.7375 G=-2.26933E-6
B( 3 )= 41.7575 G=-6.05617E-6
B( 4 )= 16.7625 G=-5.73596E-6
DIFF( 1 )=-99.995
DIFF( 2 )=-21.02
DIFF( 3 )= 24.995
DIFF( 4 )=-121.015
DIFF( 5 )= 3.97501
# MATRIX PRODUCTS= 4
HEIGHT FROM B(1)=0
2 99.995
3 121.015
4 96.02
The software diskette contains the data file EX24LSl.CNM which, used with the driver
DR24LS.PAS, will execute this example.
Variance of Variance
m= Matrix Height Perturbation Perturbation computed height reduction
n n(n-1)/2 products scale S1 scale S2 variance? differences? factor
Consider two symmetric matrices A and B where B is also positive definite. The
Rayleigh quotient defined by
R = x^T Ax / x^T Bx   (19.28)
then takes on its stationary values (that is, the values at which the partial
derivatives with respect to the components of x are zero) at the eigensolutions of
Ax=e Bx. (2.63)
In particular, the maximum and minimum values of R are the extreme eigen-
values of the problem (2.63). This is easily seen by expanding
(19.29)
where φ_j is the jth eigenvector corresponding to the eigenvalue e_j. Then we have
(19.30)
If
e_1 > e_2 > . . . > e_n   (19.31)
then the minimum value of R is e_n and occurs when x is proportional to φ_n. The
maximum value is e_1. Alternatively, this value can be obtained via minimisation of
-R. Furthermore, if B is the identity, then minimising the Rayleigh quotient
R' = x^T(A - k1_n)²x / x^T x   (19.32)
will give the eigensolution having its eigenvalue closest to k.
While any of the general methods for minimising a function may be applied to
this problem, concern for storage requirements suggests the use of the conjugate
gradients procedure. Unfortunately, the general-purpose algorithm 22 may con-
verge only very slowly. This is due (a) to the inaccuracy of the linear search, and
(b) to loss of conjugacy between the search directions t j , j=1, 2, . . . , n. Both these
problems are exacerbated by the fact that the Rayleigh quotient is homogeneous
of degree zero, which means that the Rayleigh quotient takes the same value for
any vector Cx, where C is some non-zero constant. This causes the Hessian of the
Rayleigh quotient to be singular, thus violating the conditions normally required
for the conjugate gradients algorithm. Bradbury and Fletcher (1966) address this
difficulty by setting to unity the element of largest magnitude in the current
eigenvector approximation and adjusting the other elements accordingly. This
adjustment is made at each iteration. However, Geradin (1971) has tackled the
problem more directly, that is, by examining the Hessian itself and attempting to
construct search directions which are mutually conjugate with respect to it. This
treatment, though surely not the last word on the subject, is essentially repeated
here. The implementation details are my own.
Firstly, consider the linear search subproblem, which in the current case can
be solved analytically. It is desired to minimise
R = (x + kt)^T A(x + kt) / (x + kt)^T B(x + kt)   (19.33)
with respect to k. For convenience this will be rewritten
R = N(k)/D(k)   (19.34)
with N and D used to denote the numerator and denominator, respectively.
Differentiating with respect to k gives
dR/dk = 0 = (D dN/dk - N dD/dk)/D².   (19.35)
Because of the positive definiteness of B, D can never be zero unless
x + kt = 0 . (19.36)
Therefore, ignoring this last possibility, we set the numerator of expression
(19.35) to zero to obtain the quadratic equation
uk² + vk + w = 0   (19.37)
where
u = (t^T At)(x^T Bt) - (x^T At)(t^T Bt)   (19.38)
v = (t^T At)(x^T Bx) - (x^T Ax)(t^T Bt)   (19.39)
w = (x^T At)(x^T Bx) - (x^T Ax)(x^T Bt).   (19.40)
Note that by symmetry
x^T At = t^T Ax   (19.41)
and
x^T Bt = t^T Bx.   (19.42)
Therefore, only six inner products are needed in the evaluation of u, v and w.
These are
(x^T Ax), (x^T At) and (t^T At)
and
(x^T Bx), (x^T Bt) and (t^T Bt).
The quadratic equation (19.37) has two roots, of which only one will correspond to
a minimum. Since
y(k) = 0·5D²(dR/dk) = uk² + vk + w   (19.43)
we get
(19.44)
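The sketch below assembles u, v and w from the six inner products for a small fixed example (the Moler matrix of order 3 as A and the identity as B, so that B is certainly positive definite) and picks whichever root of (19.37) gives the smaller Rayleigh quotient. It is an illustration of the line search only, not algorithm 25, and it does not guard against u = 0 or a negative discriminant.

program rqline;
const n = 3;
type rvec = array[1..n] of real;
     rmat = array[1..n, 1..n] of real;
var A, B: rmat;
    x, t, Ax, At, Bx, Bt: rvec;
    i, j: integer;
    xAx, xAt, tAt, xBx, xBt, tBt: real;
    u, v, w, disc, k1, k2: real;

procedure matvec(var M: rmat; var p, q: rvec);
var i, j: integer;
begin
  for i := 1 to n do
  begin
    q[i] := 0.0;
    for j := 1 to n do q[i] := q[i] + M[i, j] * p[j]
  end
end;

function rq(k: real): real;     { Rayleigh quotient at x + k t from the stored products }
begin
  rq := (xAx + 2.0 * k * xAt + k * k * tAt) / (xBx + 2.0 * k * xBt + k * k * tBt)
end;

begin
  for i := 1 to n do                        { Moler matrix as A, identity as B }
    for j := 1 to n do
    begin
      if i = j then A[i, j] := i
      else if i < j then A[i, j] := i - 2.0
      else A[i, j] := j - 2.0;
      if i = j then B[i, j] := 1.0 else B[i, j] := 0.0
    end;
  for i := 1 to n do begin x[i] := 1.0; t[i] := 0.0 end;
  t[1] := 1.0;                              { a trial search direction }
  matvec(A, x, Ax);  matvec(A, t, At);
  matvec(B, x, Bx);  matvec(B, t, Bt);
  xAx := 0.0; xAt := 0.0; tAt := 0.0; xBx := 0.0; xBt := 0.0; tBt := 0.0;
  for i := 1 to n do
  begin
    xAx := xAx + x[i] * Ax[i];  xAt := xAt + x[i] * At[i];  tAt := tAt + t[i] * At[i];
    xBx := xBx + x[i] * Bx[i];  xBt := xBt + x[i] * Bt[i];  tBt := tBt + t[i] * Bt[i]
  end;
  u := tAt * xBt - xAt * tBt;               { (19.38) }
  v := tAt * xBx - xAx * tBt;               { (19.39) }
  w := xAt * xBx - xAx * xBt;               { (19.40) }
  disc := sqrt(v * v - 4.0 * u * w);
  k1 := (-v + disc) / (2.0 * u);
  k2 := (-v - disc) / (2.0 * u);
  if rq(k1) < rq(k2) then writeln('k = ', k1, '   R = ', rq(k1))
                     else writeln('k = ', k2, '   R = ', rq(k2))
end.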
Example 19.3. Conjugate gradients for inverse iteration and Rayleigh quotient
minimisation
Table 19.3 presents approximations to the minimal and maximal eigensolutions
of the order-10 matrix eigenproblem (2.63) having as A the Moler matrix and as
B the Frank matrix (appendix 1). The following notes apply to the table.
(i) The maximal (largest eigenvalue) eigensolution is computed using (- A) instead
of A in algorithm 25.
(ii) Algorithm 15 computes all eigensolutions for the problem. The maximum
absolute residual quoted is computed in my program over all these solutions, not
simply for the eigenvalue and eigenvector given.
(iii) It was necessary to halt algorithm 10 manually for the case involving a shift
of 8·8. This is discussed briefly in §9.3 (p 109).
(iv) The three iterative algorithms were started with an initial vector of ones.
TABLE 19.3. (a) Minimal and (b) maximal eigensolutions of Ax = eBx for A = Moler matrix, B = Frank
matrix (order 10).
(v) Different measures of convergence and different tolerances have been used in
the computations, which were all performed on a Data General NOVA in
23-bit binary arithmetic. That these measures are different is due to the various
operating characteristics of the programs involved.
Example 19.4. Negative definite property of Fröberg’s matrix
In example 19.1 the coefficient matrix arising in the linear equations ‘turns out to
be negative definite’. In practice, to determine this property the eigenvalues of the
matrix could be computed. Algorithm 25 is quite convenient in this respect, since
a matrix A having a positive minimal eigenvalue is positive definite. Conversely,
if the smallest eigenvalue of (-A) is positive, A is negative definite. The minimum
eigenvalues of Fröberg coefficient matrices of various orders were therefore
computed. (The matrices were multiplied by -1.)
4 0·350144 5 1·80074E-13
10 7·44406E-2 11 2·08522E-10
50 3·48733E-3 26 1·9187E-10
100 8·89398E-4 49 7·23679E-9
In order to test programs for the algebraic eigenproblem and linear equations, it is
useful to have a set of easily generated matrices whose properties are known. The
following nine real symmetric matrices can be used for this purpose.
Hilbert segment of order n
A_{ij} = 1/(i + j - 1).
This matrix is notorious for its logarithmically distributed eigenvalues. While it
can be shown in theory to be positive definite, in practice it is so ill conditioned
that most eigenvalue or linear-equation algorithms fail for some value of n<20.
Ding Dong matrix
A_{ij} = 0·5/(n - i - j + 1·5).
The name and matrix were invented by Dr F N Ris of IBM, Thomas J Watson
Research Centre, while he and the author were both students at Oxford. This
Cauchy matrix has few trailing zeros in any elements, so is always represented
inexactly in the machine. However, it is very stable under inversion by elimination
methods. Its eigenvalues have the property of clustering near ±π/2.
Moler matrix
A_{ii} = i
A_{ij} = min(i, j) - 2 for i ≠ j.
Professor Cleve Moler devised this simple matrix. It has the very simple Choleski
decomposition given in example 7.1, so is positive definite. Nevertheless, it has
one small eigenvalue and often upsets elimination methods for solving linear-
equation systems.
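A procedure such as MATRIXIN.PAS (appendix 4) generates these matrices directly from their
definitions. To show how little code is needed, the fragment below is an illustrative sketch
only, not the MATRIXIN.PAS code; the procedure name moler and the type rmatrix (assumed to be
declared elsewhere as a square array of reals) are my own choices here.

procedure moler(n : integer; var A : rmatrix);
{ Sketch only: generate the order-n Moler matrix,
  A[i, i] = i and A[i, j] = min(i, j) - 2 for i <> j. }
var
  i, j : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      if i = j then
        A[i, j] := i
      else if i < j then
        A[i, j] := i - 2
      else
        A[i, j] := j - 2
end;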
Frank matrix
A_{ij} = min(i, j).
A reasonably well behaved matrix.
Bordered matrix
A_{ii} = 1
A_{in} = A_{ni} = 2^{1-i} for i ≠ n
A_{ij} = 0 otherwise.
The matrix has (n-2) eigenvalues at 1. Wilkinson (1965, pp 94-7) gives some
discussion of this property. The high degree of degeneracy and the form of the
‘border’ were designed to give difficulties to a specialised algorithm for matrices of
this form in which I have been interested from time to time.
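Standard Pascal has no exponentiation operator, so in generating this matrix the border
elements 2^{1-i} are conveniently built up by repeated halving. The fragment below is again a
sketch only (not the MATRIXIN.PAS code), with the name bordered and the type rmatrix assumed as
before.

procedure bordered(n : integer; var A : rmatrix);
{ Sketch only: generate the order-n bordered matrix with unit
  diagonal and last row/column elements 2^(1 - i). }
var
  i, j : integer;
  p : real;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i, j] := 0.0;
  p := 1.0;                     { p holds 2^(1 - i) }
  for i := 1 to n - 1 do
  begin
    A[i, i] := 1.0;
    A[i, n] := p;
    A[n, i] := p;
    p := 0.5 * p
  end;
  A[n, n] := 1.0
end;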
Diagonal matrix
A_{ii} = i
A_{ij} = 0 for i ≠ j.
This matrix permits solutions to eigenvalue and linear-equation problems to be
computed trivially. It is included in this set because I have known several
programs to fail to run correctly when confronted with it. Sometimes programs
are unable to solve trivial problems because their designers feel they are ‘too
easy.’ Note that the ordering is ‘wrong’ for algorithms 13 and 14.
Wilkinson W+ matrix
A_{ii} = [n/2] + 1 - min(i, n - i + 1) for i = 1, 2, . . . , n
A_{i,i+1} = A_{i+1,i} = 1 for i = 1, 2, . . . , (n - 1)
A_{ij} = 0 for |j - i| > 1
where [b] is the largest integer less than or equal to b. The W+ matrix (Wilkinson
1965, p 308) is normally given odd order. This tridiagonal matrix then has
several pairs of close eigenvalues despite the fact that no superdiagonal element is
small. Wilkinson points out that the separation between the two largest eigen-
values is of the order of (n!)-2 so that the power method will be unable to
separate them unless n is very small.
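As a final illustration of this kind of generator (a sketch only, not the MATRIXIN.PAS code,
with the name wplus and the type rmatrix assumed as before), the W+ matrix may be produced as
follows; the integer division n div 2 supplies [n/2].

procedure wplus(n : integer; var A : rmatrix);
{ Sketch only: generate the order-n Wilkinson W+ matrix, tridiagonal
  with unit off-diagonal elements and diagonal elements
  [n/2] + 1 - min(i, n - i + 1). }
var
  i, j : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i, j] := 0.0;
  for i := 1 to n do
    if i <= n - i + 1 then
      A[i, i] := n div 2 + 1 - i
    else
      A[i, i] := n div 2 + 1 - (n - i + 1);
  for i := 1 to n - 1 do
  begin
    A[i, i + 1] := 1.0;
    A[i + 1, i] := 1.0
  end
end;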
Wilkinson W- matrix
A_{ii} = [n/2] + 1 - i for i = 1, 2, . . . , n
A_{i,i+1} = A_{i+1,i} = 1 for i = 1, 2, . . . , (n - 1)
A_{ij} = 0 for |j - i| > 1
where [b] is the largest integer less than or equal to b. For odd order, this matrix
has eigenvalues which are pairs of equal magnitude but opposite sign. The
magnitudes of these are very close to some of those of the corresponding W+
matrix.
Ones
A_{ij} = 1 for all i, j.
This matrix is singular. It has only rank one, that is, (n-1) zero eigenvalues.
The matrices described here may all be generated by the Pascal procedure
MATRIXIN.PAS, which is on the software diskette. This procedure also allows for
keyboard entry of matrices.
Appendix 2
LIST OF ALGORITHMS
Appendix 3
LIST OF EXAMPLES
Appendix 4
FILES ON THE SOFTWARE DISKETTE
The files on the diskette fall into several categories. For the new user of the diskette,
we strongly recommend looking at the file
README.CNM
which contains notes of any errors or additions to material in either the book or the
diskette. This can be displayed by issuing a command
TYPE (drive:)README.CNM
where drive: is the disk drive specification for the location of the README.CNM
file. The file may also be printed, or viewed with a text editor.
The algorithms (without comments) are in the files which follow. Only
ALG03A.PAS has not appeared on the pages of the book.
ALG01.PAS
ALG02.PAS
ALG03.PAS
ALG03A.PAS
ALG04.PAS
ALG05.PAS
ALG06.PAS
ALG07.PAS
ALG08.PAS
ALG09.PAS
ALG10.PAS
ALG11.PAS
ALG12.PAS
ALG13.PAS
ALG14.PAS
ALG15.PAS
ALG16.PAS
ALG17.PAS
ALG18.PAS
ALG19.PAS
ALG20.PAS
ALG21.PAS
ALG22.PAS
ALG23.PAS
ALG24.PAS
ALG25.PAS
ALG26.PAS
ALG27.PAS
The following files provide driver and support codes for running examples of use of the algorithms.
CALCEPS.PAS ---to compute the machine precision for the Turbo Pascal computing
environment in which the program is compiled (a minimal sketch of
the idea appears at the end of this list)
CONSTYPE.DEF ---a set of constant and type specifications common to the codes
CUBEFN.PAS ---a cubic test function of one variable with minimum at 0.81650
FNMIN.PAS ---a main program to run function minimisation procedures
GENEVRES.PAS ---residuals of a generalised eigenvalue problem
GETOBSN.PAS ---a procedure to read a single observation for several variables
(one row of a data matrix)
HTANFN.PAS ---the hyperbolic tangent, example 13.2
JJACF.PAS ---Jaffrelot’s autocorrelation problem, example 14.1
MATCOPY.PAS ---to copy a matrix
MATMUL.PAS ---to multiply two matrices
MATRIXIN.PAS ---to create or read in matrices
PSVDRES.PAS ---to print singular-value decomposition results
QUADFN.PAS ---real valued test function of x for [1D] minimisation and root-
finding
RAYQUO.PAS ---to compute the Rayleigh quotient for a generalised eigenvalue
problem
RESIDS.PAS ---to compute residuals for linear equations and least-squares
problems
ROSEN.PAS ---to set up and compute function and derivative information for
the Rosenbrock banana-shaped valley test problem
SPENDFN.PAS ---the expenditure example, illustrated in example 12.5 and
example 13.1
STARTUP.PAS ---code to read the names of and open console image and/or
console control files for driver programs. This common code
segment is not a complete procedure, so cannot be included in
Turbo Pascal 5.0 programs.
SVDTST.PAS ---to compute various tests of a singular-value decomposition
TDSTAMP.PAS ---to provide a time and date stamp for output (files). This code
makes calls to the operating system and is useful only for MS-
DOS computing environments. In Turbo Pascal 5.0, there are
utility functions which avoid the DOS call.
VECTORIN.PAS ---to create or read in a vector
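The idea behind CALCEPS.PAS can be sketched in a few lines. The program below is an
illustration only, not the CALCEPS.PAS code, and the name epsdemo is arbitrary; it estimates
the machine precision by halving a trial quantity until its addition to 1 no longer alters the
stored result.

program epsdemo;
{ Illustrative sketch only: halve eps until (1 + eps), once stored,
  can no longer be distinguished from 1, then double it back to the
  last distinguishable value. }
var
  eps, test : real;
begin
  eps := 1.0;
  repeat
    eps := eps / 2.0;
    test := 1.0 + eps
  until test <= 1.0;
  eps := 2.0 * eps;
  writeln('approximate machine precision = ', eps)
end.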
The following files provide control information and data to the driver programs.
Their names can be provided in response to the question posed by the driver program.
Be sure to include the filename extension (.CNM). The nomenclature follows that for
the DR*.PAS files. In some cases additional examples have been provided. For these
files a brief description is provided in the following list of control files.
EX0102.CNM
EX03.CNM
EX03A.CNM
EX04.CNM
EX0506.CNM
EX0506S.CNM --- a set of equations with a singular coefficient matrix
EX0708.CNM
EX09.CNM
EX10.CNM
EX13.CNM
EX14.CNM
EX15.CNM
EX1617.CNM
EX1618.CNM
EX19.CNM
EX1920.CNM
EX1920J.CNM --- data for the Jaffrelot problem (JJACF.PAS), example 14.1
EX21.CNM
EX22.CNM
EX23.CNM
EX24II.CNM
EX24LE.CNM
EX24LS.CNM
EX24LS1.CNM --- data for example 19.2
EX25.CNM
EX26.CNM
EX26A.CNM
EX27J.CNM --- data for the Jaffrelot problem (JJACF.PAS), example 14.1.
EX27R.CNM --- console control file for the regular test problem, the Rosen-
brock test function (ROSEN.PAS)
If the driver programs have been loaded and compiled to saved executable (.COM)
files, then we can execute these programs by typing their names, e.g. DR0102. The
user must then enter command information from the keyboard. This is not difficult,
but it is sometimes useful to be able to issue such commands from a file. Such a
BATch command file (.BAT extension) is commonly used in MS-DOS systems. In the
driver programs we have included compiler directives to make this even easier to use
by allowing command input to come from a file. A batch file EXAMPLE.BAT which
could run drivers for algorithms 1 through 6 would have the form
rem EXAMPLE.BAT
rem runs Nash Algorithms 1 through 6 automatically
DR0102<DR0102X.
DR03A<DR03AX.
DR03<DR03X.
DR04<DR04X.
DR0506<DR0506X.
The files which end in an ‘X.’ contain information to control the drivers; in fact, they
contain the names of the EX*.CNM control files. This facility is provided to allow
for very rapid testing of all the codes at once (the technical term for this is ‘regression
testing’). Note that console image files having names of the form OUT0102 are
created, which correspond in form to the driver names, i.e. DR0102.PAS. The
command line files present on the disk are:
Users may wish to note that there are a number of deficiencies with version 3.01a of
Turbo Pascal. I have experienced some difficulty in halting programs with the
Control-C or Control-Break keystrokes, in particular when the program is waiting
for input. In some instances, attempts to halt the program seem to interfere with the
files on disk, and the ‘working’ algorithm file has been over-written! On some
occasions, the leftmost characters entered from the keyboard are erased by READ
instructions. From the point of view of a software developer, the absence of a facility
to compile under command of a BATch command file is a nuisance. Despite these
faults, the system is relatively easy to use. Many of the faults of Turbo Pascal 3.01a
have been addressed in later versions of the product. We anticipate that a diskette of
the present codes adapted for version 5.0 of Turbo Pascal will be available about the
time the book is published. Turbo Pascal 5.0 is, however, a much ‘larger’ system in
terms of memory requirements.
BIBLIOGRAPHY
ABRAMOWITZ M and STEGUN I A 1965 Handbook of Mathematical Functions with Formulas, Graphs and
Mathematical Tables (New York: Dover)
ACTON F S 1970 Numerical Methods that Work (New York: Harper and Row)
BARD Y 1967 Nonlinear Parameter Estimation and Programming (New York: IBM New York Scientific
Center)
---1970 Comparison of gradient methods for the solution of nonlinear parameter estimation problems
SIAM J. Numer. Anal. 7 157-86
--1974 Nonlinear Parameter Estimation (New York/London: Academic)
BATES D M and WATTS D G 1980 Relative curvature measures of nonlinearity J. R. Stat. Soc. B 42 1-25
---1981a A relative offset orthogonality convergence criterion for nonlinear least squares Technometrics
23 179-83
---1988 Nonlinear Least Squares (New York: Wiley)
BAUER F L and REINSCH C 1971 Inversion of positive definite matrices by the Gauss-Jordan method in
linear algebra Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlin:
Springer) contribution I/3 (1971)
BEALE E M L 1972 A derivation of conjugate gradients Numerical Methods for Nonlinear Optimization ed.
F A Lootsma (London: Academic)
BELSLEY D A, KUH E and WELSCH R E 1980 Regression Diagnostics: Identifying Influential Data and
Sources of Collinearity (New York/Toronto: Wiley)
BIGGS M C 1975 Some recent matrix updating methods for minimising sums of squared terms Hatfield
Polytechnic, Numerical Optimization Centre, Technical Report 67
BOOKER T H 1985 Singular value decomposition using a Jacobi algorithm with an unbounded angle of
rotation PhD Thesis (Washington, DC: The American University)
BOWDLER H J, MARTIN R S, PETERS G and WILKINSON J H 1966 Solution of real and complex systems of
linear equations Numer. Math. 8 217-34; also in Linear Algebra, Handbook for Automatic Computation
vol 2, eds J H Wilkinson and C Reinsch (Berlin: Springer) contribution I/7 (1971)
BOX G E P 1957 Evolutionary operation: a method for increasing industrial productivity Appl. Stat. 6
81-101
BOX M J 1965 A new method of constrained optimization and a comparison with other methods Comput.
J. 8 42-52
BOX M J, DAVIES D and SWANN W H 1971 Techniques d’optimisation non linéaire, Monographie No 5 (Paris:
Entreprise Moderne d’Édition) Original English edition (London: Oliver and Boyd)
BRADBURY W W and FLETCHER R 1966 New iterative methods for solution of the eigenproblem Numer.
Math. 9 259-67
BREMMERMANN H 1970 A method of unconstrained global optimization Math. Biosci. 9 1-15
BRENT R P 1973 Algorithms for Minimization Without Derivatives (Englewood Cliffs, NJ: Prentice-Hall)
BROWN K M and GEARHART W B 1971 Deflation techniques for the calculation of further solutions of
nonlinear systems Numer. Math. 16 334-42
BROYDEN C G 1970a The convergence of a class of double-rank minimization algorithms, pt 1 J. Inst.
Maths Applics 6 76-90
---1970b The convergence of a class of double-rank minimization algorithms, pt 2 J. Inst. Maths Applics
6 222-31
---1972 Quasi-Newton methods Numerical methods for Unconstrained Optimization ed. W Murray
(London: Academic) pp 87-106
BUNCH J R and NIELSEN C P 1978 Updating the singular value decomposition Numerische Mathematik 31
111-28
BUNCH J R and ROSE D J (eds) 1976 Sparse Matrix Computation (New York: Academic)
BUSINGER P A 1970 Updating a singular value decomposition (ALGOL programming contribution, No 26)
BIT 10 376-85
CACEI M S and CACHERIS W P 1984 Fitting curves to data (the Simplex algorithm is the answer) Byte 9
340-62
CAUCHY A 1848 Méthode générale pour la résolution des systèmes d’équations simultanées C. R. Acad.
Sci., Paris 27 536-8
CHAMBERS J M 1969 A computer system for fitting models to data Appl. Stat. 18 249-63
---1971 Regression updating J. Am. Stat. Assoc. 66 744-8
---1973 Fitting nonlinear models: numerical techniques Biometrika 60 1-13
CHARTRES B A 1962 Adaptation of the Jacobi methods for a computer with magnetic tape backing store
Comput. J. 5 51-60
CODY W J and WAITE W 1980 Software Manual for the Elementary Functions (Englewood Cliffs, NJ:
Prentice Hall)
CONN A R 1985 Nonlinear programming: exact penalty functions and projection techniques for non-
smooth functions Boggs, Byrd and Schnabel pp 3-25
COONEN J T 1984 Contributions to a proposed standard for binary floating-point arithmetic PhD
Dissertation University of California, Berkeley
CRAIG R J and EVANS J W c. 1980 A comparison of Nelder-Mead type simplex search procedures Technical
Report No 146 (Lexington, KY: Dept of Statistics, Univ. of Kentucky)
CRAIG R J, EVANS J W and ALLEN D M 1980 The simplex-search in non-linear estimation Technical Report
No 155 (Lexington, KY: Dept of Statistics, Univ. of Kentucky)
CURRY H B 1944 The method of steepest descent for non-linear minimization problems Q. Appl. Math. 2
258-61
DAHLQUIST G and BJÖRCK A 1974 Numerical Methods (translated by N Anderson) (Englewood Cliffs, NJ:
Prentice-Hall)
DANTZIG G B 1979 Comments on Khachian’s algorithm for linear programming Technical Report No
SOL 79-22 (Stanford, CA: Systems Optimization Laboratory, Stanford Univ.)
DAVIDON W C 1959 Variable metric method for minimization Physics and Mathematics, AEC Research
and Development Report No ANL-5990 (Lemont, IL: Argonne National Laboratory)
---1976 New least-square algorithms J. Optim. Theory Applic. 18 187-97
---1977 Fast least squares algorithms Am. J. Phys. 45 260-2
DEMBO R S, EISENSTAT S C and STEIHAUG T 1982 Inexact Newton methods SIAM J. Numer. Anal. 19
400-8
DEMBO R S and STEIHAUG T 1983 Truncated-Newton algorithms for large-scale unconstrained optimiza-
tion Math. Prog. 26 190-212
DENNIS J E Jr, GAY D M and WELSCH R E 1981 An adaptive nonlinear least-squares algorithm ACM
Trans. Math. Softw. 7 348-68
DENNIS J E Jr and SCHNABEL R B 1983 Numerical Methods for Unconstrained Optimization and Nonlinear
Equations (Englewood Cliffs, NJ: Prentice-Hall)
DIXON L C W 1972 Nonlinear Optimisation (London: The English Universities Press)
DIXON L C W and SZEGÖ G P (eds) 1975 Toward Global Optimization (Amsterdam/Oxford: North-
Holland and New York: American Elsevier)
---(eds) 1978 Toward Global Optimization 2 (Amsterdam/Oxford: North-Holland and New York:
American Elsevier)
DONALDSON J R and SCHNABEL R B 1987 Computational experience with confidence regions and
confidence intervals for nonlinear least squares Technometrics 29 67-82
DONGARRA J J and GROSSE E 1987 Distribution of software by electronic mail Commun. ACM 30 403-7
DRAPER N R and SMITH H 1981 Applied Regression Analysis 2nd edn (New York/Toronto: Wiley)
EASON E D and FENTON R G 1972 Testing and evaluation of numerical methods for design
optimization Report No lJTME-TP7204 (Toronto, Ont.: Dept of Mechanical Engineering, Univ. of
Toronto)
---1973 A comparison of numerical optimization methods for engineering design Trans. ASME J. Eng.
Ind. paper 73-DET-17, pp 1-5
EVANS D J (ed.) 1974 Software for Numerical Mathematics (London: Academic)
EVANS J W and CRAIG R J 1979 Function minimization using a modified Nelder-Mead simplex search
procedure Technical Report No 144 (Lexington, KY: Dept of Statistics, Univ. of Kentucky)
FIACCO A V and MCCORMICK G P 1964 Computational algorithm for the sequential unconstrained
minimization technique for nonlinear programming Mgmt Sci. 10 601-17
---1966 Extensions of SUMT for nonlinear programming: equality constraints and extrapolation Mgmt
Sci. 12 816-28
FINKBEINER D T 1966 Introduction to Matrices and Linear Transformations (San Francisco: Freeman)
FLETCHER R 1969 Optimization Proceedings of a Symposium of the Institute of Mathematics and its
Applications, Univ. of Keele, 1968 (London: Academic)
---1970 A new approach to variable metric algorithms Comput. J. 13 317-22
---1971 A modified Marquardt subroutine for nonlinear least squares Report No AERE-R 6799
(Harwell, UK: Mathematics Branch, Theoretical Physics Division, Atomic Energy Research Establish-
ment)
---1972 A FORTRAN subroutine for minimization by the method of conjugate gradients Report No
AERE-R 7073 (Harwell, UK: Theoretical Physics Division, Atomic Energy Research Establishment)
---1980a Practical Methods of Optimization vol 1: Unconstrained Optimization (New York/Toronto:
Wiley)
---1980b Practical Methods of Optimization vol 2: Constrained Optimization (New York/Toronto:
Wiley)
FLETCHER R and POWELL M J D 1963 A rapidly convergent descent method for minimization Comput. J. 6
163-8
FLETCHER R and REEVES C M 1964 Function minimization by conjugate gradients Comput. J. 7 149-54
FORD B and HALL G 1974 The generalized eigenvalue problem in quantum chemistry Comput. Phys.
Commun. 8 337-48
FORSYTHE G E and HENRICI P 1960 The cyclic Jacobi method for computing the principal values of a
complex matrix Trans. Am. Math. Soc. 94 1-23
FORSYTHE G E, MALCOLM M A and MOLER C E 1977 Computer Methods for Mathematical Computations
(Englewood Cliffs, NJ: Prentice-Hall)
FRIED I 1972 Optimal gradient minimization scheme for finite element eigenproblems J . Sound Vib. 20
333-42
FRÖBERG C 1965 Introduction to Numerical Analysis (Reading, Mass: Addison-Wesley) 2nd edn, 1969
GALLANT A R 1975 Nonlinear regression Am. Stat. 29 74-81
GASS S I 1964 Linear Programming 2nd edn (New York/Toronto: McGraw-Hill)
GAUSS K F 1809 Theoria Motus Corporum Coelestium Werke Bd. 7 240-54
GAY D M 1983 Remark on algorithm 573 (NL2SOL: an adaptive nonlinear least squares algorithm) ACM
Trans. Math. Softw. 9 139
GENTLEMAN W M 1973 Least squares computations by Givens’ transformations without square roots J.
Inst. Maths Applics 12 329-36
GENTLEMAN W M and MAROVICH S B 1974 More on algorithms that reveal properties of floating point
arithmetic units Commun. ACM 17 276-7
GERADIN M 1971 The computational efficiency of a new minimization algorithm for eigenvalue analysis J.
Sound Vib. 19 319-31
GILL P E and MURRAY W (eds) 1974 Numerical Methods for Constrained Optimization (London:
Academic)
---1978 Algorithms for the solution of the nonlinear least squares problem SIAM J. Numer. Anal. 15
977-92
GILL P E, MURRAY W and WRIGHT M H 1981 Practical Optimization (London: Academic)
GOLUB G H and PEREYRA V 1973 The differentiation of pseudo-inverses and nonlinear least squares
problems whose variables separate SIAM J. Numer. Anal. 10 413-32
GOLUB G H and STYAN G P H 1973 Numerical computations for univariate linear models J. Stat. Comput.
Simul. 2 253-74
GOLUB G H and VAN LOAN C F 1983 Matrix Computations (Baltimore, MD: Johns Hopkins University
Press)
GREGORY R T and KARNEY D L 1969 Matrices for Testing Computational Algorithms (New York: Wiley
Interscience)
HADLEY G 1962 Linear Programming (Reading, MA: Addison-Wesley)
HAMMARLING S 1974 A note on modifications to the Givens’ plane rotation J. Inst. Maths Applics 13
215-18
HARTLEY H O 1948 The estimation of nonlinear parameters by ‘internal least squares’ Biometrika 35 32-45
----1961 The modified Gauss-Newton method for the fitting of non-linear regression functions by least
squares Technometrics 3 269-80
HARTLEY H O and BOOKER A 1965 Nonlinear least squares estimation Ann. Math. Stat. 36 638-50
HEALY M J R 1968 Triangular decomposition of a symmetric matrix (algorithm AS6) Appl. Stat. 17 195-7
HENRICI P 1964 Elements of Numerical Analysis (New York: Wiley)
HESTENES M R 1958 Inversion of matrices by biorthogonalization and related results J. Soc. Ind. Appl.
Math. 5 51-90
---1975 Pseudoinverses and conjugate gradients Commun. ACM 18 40-3
HESTENES M R and STIEFEL E 1952 Methods of conjugate gradients for solving linear systems J. Res. Nat.
Bur. Stand. 49 409-36
HILLSTROM K E 1976 A simulation test approach to the evaluation and comparison of unconstrained
nonlinear optimization algorithms Argonne National Laboratory Report ANL-76-20
HOCK W and SCHITTKOWSKI K 1981 Test examples for nonlinear programming codes Lecture Notes in
Economics and Mathematical Systems 187 (Berlin: Springer)
HOLT J N and FLETCHER R 1979 An algorithm for constrained nonlinear least squares J. Inst. Maths
Applics 23 449-63
HOOKE R and JEEVES T A 1961 ‘Direct Search’ solution of numerical and statistical problems J. ACM 8
212-29
JACOBI C G J 1846 Über ein leichtes Verfahren, die in der Theorie der Säkularstörungen vorkommenden
Gleichungen numerisch aufzulösen Crelle’s J. 30 51-94
JACOBY S L S, KOWALIK J S and PIZZO J T 1972 Iterative Methods for Nonlinear Optimization Problems
(Englewood Cliffs, NJ: Prentice-Hall)
JENKINS M A and TRAUB J F 1975 Principles for testing polynomial zero-finding programs ACM Trans.
Math. Softw. 1 26-34
JONES A 1970 Spiral a new algorithm for non-linear parameter estimation using least squares Comput. J.
13 301-8
KAHANER D, MOLER C and NASH S G 1989 Numerical Methods and Software (Englewood Cliffs, NJ:
Prentice Hall)
KAHANER D and PARLETT B N 1976 How far should you go with the Lanczos process? Sparse Matrix
Computations eds J R Bunch and D J Rose (New York: Academic) pp 131-44
KAISER H F 1972 The JK method: a procedure for finding the eigenvectors and eigenvalues of a real
symmetric matrix Comput. J. 15 271-3
KARMARKAR N 1984 A new polynomial time algorithm for linear programming Combinatorica 4 373-95
KARPINSKI R 1985 PARANOIA: a floating-point benchmark Byte 10(2) 223-35 (February)
KAUFMAN L 1975 A variable projection method for solving separable nonlinear least squares problems
BIT 15 49-57
KENDALL M G 1973 Time-series (London: Griffin)
KENDALL M G and STUART A 1958-66 The Advanced Theory of Statistics vols 1-3 (London: Griffin)
KENNEDY W J Jr and GENTLE J E 1980 Statistical Computing (New York: Marcel Dekker)
KERNIGHAN B W and PLAUGER P J 1974 The Elements of Programming Style (New York: McGraw-Hill)
KIRKPATRICK S, GELATT C D Jr and VECCHI M P 1983 Optimization by simulated annealing Science 220
(4598) 671-80
KOWALIK J and OSBORNE M R 1968 Methods for Unconstrained Optimization Problems (New York:
American Elsevier)
KUESTER J L and MIZE J H 1973 Optimization Techniques with FORTRAN (New York/London/Toronto:
McGraw-Hill)
KULISCH U 1987 Pascal SC: A Pascal extension for scientific computation (Stuttgart: B G Teubner and
Chichester: Wiley)
LANCZOS C 1956 Applied Analysis (Englewood Cliffs, NJ: Prentice Hall)
LAWSON C L and HANSON R J 1974 Solving Least Squares Problems (Englewood Cliffs, NJ: Prentice Hall)
LEVENBERG K 1944 A method for the solution of certain non-linear problems in least squares Q. Appl.
Math. 2 164-8
LOOTSMA F A (ed.) 1972 Numerical Methods for Non-Linear Optimization (London/New York: Academic)
MAINDONALD J H 1984 Statistical Computation (New York: Wiley)
MALCOLM M A 1972 Algorithms to reveal properties of floating-point arithmetic Commun. ACM 15
949-51
MARQUARDT D W 1963 An algorithm for least-squares estimation of nonlinear parameters J. SIAM 11
431-41
---1970 Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation
Technometrics 12 591-612
MCKEOWN J J 1973 A comparison of methods for solving nonlinear parameter estimation problems
Identification & System Parameter Estimation, Proc. 3rd IFAC Symp. ed. P Eykhoff (The Hague: Delft)
pp 12-15
-- 1974 Specialised versus general purpose algorithms for minimising functions that are sums of squared
terms Hatfield Polytechnic, Numerical Optimization Centre Technical Report No 50, Issue 2
MEYER R R and ROTH P M 1972 Modified damped least squares: an algorithm for non-linear estimation J.
Inst. Math. Applic. 9 218-33
MOLER C M and VAN LOAN C F 1978 Nineteen dubious ways to compute the exponential of a matrix
SIAM Rev. 20 801-36
MORÉ J J, GARBOW B S and HILLSTROM K E 1981 Testing unconstrained optimization software ACM
Trans. Math. Softw. 7 17-41
MOSTOW G D and SAMPSON J H 1969 Linear Algebra (New York: McGraw-Hill)
MURRAY W (ed.) 1972 Numerical Methods for Unconstrained Optimization (London: Academic)
NASH J C 1974 The Hermitian matrix eigenproblem Hx = eSx using compact array storage Comput.
Phys. Commun. 8 85-94
---1975 A one-sided transformation method for the singular value decomposition and algebraic
eigenproblem Comput. J. 18 74-6
---1976 An Annotated Bibliography on Methods for Nonlinear Least Squares Problems Including Test
Problems (microfiche) (Ottawa: Nash Information Services)
---1977 Minimizing a nonlinear sum of squares function on a small computer J. Inst. Maths Applics 19
231-7
---1979a Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation
(Bristol: Hilger and New York: Halsted)
---1979b Accuracy of least squares computer programs: another reminder: comment Am. J. Ag. Econ.
61 703-9
---1980 Problèmes mathématiques soulevés par les modèles économiques Can. J. Ag. Econ. 28 51-7
---1981 Nonlinear estimation using a microcomputer Computer Science and Statistics: Proceedings of
the 13th Symposium on the Interface ed. W F Eddy (New York: Springer) pp 363-6
---1984a Effective Scientific Problem Solving with Small Computers (Reston, VA: Reston Publishing) (all
rights now held by J C Nash)
---1984b LEQB05: User Guide - A Very Small Linear Algorithm Package (Ottawa, Ont.: Nash
Information Services Inc.)
---1985 Design and implementation of a very small linear algebra program package Commun. ACM 28
89-94
---1986a Review: IMSL MATH/PC-LIBRARY Am. Stat. 40 301-3
---1986b Review: IMSL STAT/PC-LIBRARY Am. Stat. 40 303-6
---1986c Microcomputers, standards, and engineering calculations Proc. 5th Canadian Conf. Engineer-
ing Education, Univ. of Western Ontario, May 12-13, 1986 pp 302-16
NASH J C and LEFKOVITCH L P 1976 Principal components and regression by singular value decomposition
on a small computer Appl. Stat. 25 210-16
---1977 Programs for Sequentially Updated Principal Components and Regression by Singular Value
Decomposition (Ottawa: Nash Information Services)
NASH J C and NASH S G 1977 Conjugate gradient methods for solving algebraic eigenproblems Proc.
Symp. Minicomputers and Large Scale Computation, Montreal ed. P Lykos (New York: American
Chemical Society) pp 24-32
---1988 Compact algorithms for function minimisation Asia-Pacific J. Op. Res. 5 173-92
NASH J C and SHLIEN S 1987 Simple algorithms for the partial singular value decomposition Comput. J. 30
268-75
NASH J C and TEETER N J 1975 Building models: an example from the Canadian dairy industry Can. Farm.
Econ. 10 17-24
NASH J C and WALKER-SMITH M 1986 Using compact and portable function minimization codes in
forecasting applications INFOR 24 158-68
-- 1987 Nonlinear Parameter Estimation, an Integrated System in Basic (New York: Marcel Dekker)
NASH J C and WANG R L C 1986 Algorithm 645 Subroutines for testing programs that compute the
generalized inverse of a matrix ACM Trans. Math. Softw. 12 274-7
NASH S G 1982 Truncated-Newton methods Report No STAN-CS-82-906 (Stanford, CA: Dept of
Computer Science, Stanford Univ.)
---1983 Truncated-Newton methods for large-scale function minimization Applications of Nonlinear
Programming to Optimization and Control ed. H E Rauch (Oxford: Pergamon) pp 91-100
---1984 Newton-type minimization via the Lanczos method SIAM J. Numer. Anal. 21 770-88
---1985a Preconditioning of truncated-Newton methods SIAM J. Sci. Stat. Comp. 6 599-616
---1985b Solving nonlinear programming problems using truncated-Newton techniques Boggs, Byrd
and Schnabel pp 119-36
NASH S G and RUST B 1986 Regression problems with bounded residuals Technical Report No 478
(Baltimore, MD: Dept of Mathematical Sciences, The Johns Hopkins University)
NELDER J A and MEAD R 1965 A simplex method for function minimization Comput. J. 7 308-13
NEWING R A and CUNNINGHAM J 1967 Quantum Mechanics (Edinburgh: Oliver and Boyd)
OLIVER F R 1964 Methods of estimating the logistic growth function Appl. Stat. 13 57-66
---1966 Aspects of maximum likelihood estimation of the logistic growth function JASA 61 697-705
OLSSON D M and NELSON L S 1975 The Nelder-Mead simplex procedure for function minimization
Technometrics 17 45-51; Letters to the Editor 3934
O’NEILL R 1971 Algorithm AS 47: function minimization using a simplex procedure Appl. Stat. 20 338-45
OSBORNE M R 1972 Some aspects of nonlinear least squares calculations Numerical Methods for Nonlinear
Optimization ed. F A Lootsma (London: Academic) pp 171-89
PAIGE C C and SAUNDERS M A 1975 Solution of sparse indefinite systems of linear equations SIAM J.
Numer. Anal. 12 617-29
PAULING L and WILSON E B 1935 Introduction to Quantum Mechanics with Applications to Chemistry (New
York: McGraw-Hill)
PENROSE R 1955 A generalized inverse for matrices Proc. Camb. Phil. Soc. 51 406-13
PERRY A and SOLAND R M 1975 Optimal operation of a public lottery Mgmt. Sci. 22 461-9
PETERS G and WILKINSON J H 1971 The calculation of specified eigenvectors by inverse iteration Linear
Algebra, Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlin:
Springer) pp 418-39
---1975 On the stability of Gauss-Jordan elimination with pivoting Commun. ACM 18 20-4
PIERCE B O and FOSTER R M 1956 A Short Table of Integrals 4th edn (New York: Blaisdell)
POLAK E and RIBIERE G 1969 Note sur la convergence de méthodes de directions conjuguées Rev. Fr. Inf.
Rech. Oper. 3 35-43
POWELL M J D 1962 An iterative method for stationary values of a function of several variables Comput. J.
5 147-51
---1964 An efficient method for finding the minimum of a function of several variables without
calculating derivatives Comput. J. 7 155-62
---1975a Some convergence properties of the conjugate gradient method CSS Report No 23 (Harwell,
UK: Computer Science and Systems Division, Atomic Energy Research Establishment)
----1975b Restart procedures for the conjugate gradient method CSS Report No 24 (Harwell, UK:
Computer Science and Systems Division, Atomic Energy Research Establishment)
---1981 Nonlinear Optimization (London: Academic)
PRESS W H, FLANNERY B P, TEUKOLSKY S A and VETTERLING W T (1986/88) Numerical Recipes (in
Fortran/Pascal/C), the Art of Scientific Computing (Cambridge, UK: Cambridge University Press)
RALSTON A 1965 A First Course in Numerical Analysis (New York: McGraw-Hill)
RATKOWSKY D A 1983 Nonlinear Regression Modelling (New York: Marcel-Dekker)
REID J K 1971 Large Sparse Sets of Linear Equations (London: Academic)
RHEINBOLDT W C 1974 Methods for Solving Systems of Nonlinear Equations (Philadelphia: SIAM)
RICE J 1983 Numerical Methods Software and Analysis (New York: McGraw-Hill)
RILEY D D 1988 Structured programming: sixteen years later J. Pascal, Ada and Modula-2 7 42-8
ROSENBROCK H H 1960 An automatic method for finding the greatest or least value of a function Comput.
J. 3 175-84
ROSS G J S 1971 The efficient use of function minimization in non-linear maximum-likelihood estimation
Appl. Stat. 19 205-21
---1975 Simple non-linear modelling for the general user Warsaw: 40th Session of the International
Statistical Institute 1-9 September 1975, ISI/BS Invited Paper 81 pp 1-8
RUHE A and WEDIN P-A 1980 Algorithms for separable nonlinear least squares problems SIAM Rev. 22
318-36
RUHE A and WIBERG T 1972 The method of conjugate gradients used in inverse iteration BIT 12 543-54
RUTISHAUSER H 1966 The Jacobi method for real symmetric matrices Numer. Math. 9 1-10; also in Linear
Algebra, Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlin:
Springer) pp 202-11 (1971)
SARGENT R W H and SEBASTIAN D J 1972 Numerical experience with algorithms for unconstrained
minimisation Numerical Methods for Nonlinear Optimization ed. F A Lootsma (London: Academic) pp
445-68
SCHNABEL R B, KOONTZ J E and WEISS B E 1985 A modular system of algorithms for unconstrained
minimization ACM Trans. Math. Softw. 11 419-40
SCHWARZ H R, RUTISHAUSER H and STIEFEL E 1973 Numerical Analysis of Symmetric Matrices
(Englewood Cliffs, NJ: Prentice-Hall)
SEARLE S R 1971 Linear Models (New York: Wiley)
SHANNO D F 1970 Conditioning of quasi-Newton methods for function minimization Math. Comput. 24
647-56
SHEARER J M and WOLFE M A 1985 Alglib, a simple symbol-manipulation package Commun. ACM 28
820-5
SMITH F R Jr and SHANNO D F 1971 An improved Marquardt procedure for nonlinear regressions
Technometrics 13 63-74
SORENSON H W 1969 Comparison of some conjugate direction procedures for function minimization J.
Franklin Inst. 288 421-41
SPANG H A 1962 A review of minimization techniques for nonlinear functions SIAM Rev. 4 343-65
SPENDLEY W 1969 Nonlinear least squares fitting using a modified Simplex minimization method Fletcher
pp 259-70
SPENDLEY W, HEXT G R and HIMSWORTH F R 1962 Sequential application of simplex designs in
optimization and evolutionary operation Technometrics 4 441-61
STEWART G W 1973 Introduction to Matrix Computations (New York: Academic)
---1976 A bibliographical tour of the large, sparse generalized eigenvalue problem Sparse Matrix
Computations eds J R Bunch and D J Rose (New York: Academic) pp 113-30
---1987 Collinearity and least squares regression Stat. Sci. 2 68-100
STRANG G 1976 Linear Algebra and its Applications (New York: Academic)
SWANN W H 1974 Direct search methods Numerical Methods for Unconstrained Optimization ed. W
Murray (London/New York: Academic)
SYNGE J L and GRIFFITH B A 1959 Principles of Mechanics 3rd edn (New York: McGraw-Hill)
TOINT PH L 1987 On large scale nonlinear least squares calculations SIAM J. Sci. Stat. Comput. 8 416-35
VARGA R S 1962 Matrix Iterative Analysis (Englewood Cliffs, NJ: Prentice-Hall)
WILKINSON J H 1961 Error analysis of direct methods of matrix inversion J. ACM 8 281-330
---1963 Rounding Errors in Algebraic Processes (London: HMSO)
---1965 The Algebraic Eigenvalue Problem (Oxford: Clarendon)
WILKINSON J H and REINSCH C (eds) 1971 Linear Algebra, Handbook for Automatic Computation vol 2
(Berlin: Springer)
WOLFE M A 1978 Numerical Methods for Unconstrained Optimization, an Introduction (Wokingham, MA:
Van Nostrand-Reinhold)
YOURDON E 1975 Techniques of Program Structure and Design (Englewood Cliffs, NJ: Prentice-Hall)
ZAMBARDINO R A 1974 Solutions of systems of linear equations with partial pivoting and reduced storage
requirements Comput. J. 17 377-8
INDEX
Complex matrix, Deletion of observations in least-squares, 64
eigensolutions of, 1 10 Delta,
Complex systems of linear equations, 82 Kronecker, 3 1, 73, 119
Components, Dense matrix, 20, 23
principal, 40, 46 Derivative evaluation count, 217
Computability of a function, 153 Derivatives of a function, 149, 187, 210
Computations, approximation by differences, 21, 217
statistical, 66 in minimisation, 143
Computer, De-scaling,
small, 3 of nonlinear least-squares problem, 223
Conjugacy of search directions, 186, 188, 197, of nonlinear minimisation, 231
244,245 Descent methods for function minimisation, 186
Conjugate gradients, 153, 186, 197, 223, 228, 232, Diagonal matrix, 254
233 Diagonalisation of a real symmetric matrix, 126
in linear algebra, 234 Difference,
Constrained optimisation, 3, 218, 221 replacement of derivative, 21
Constraints, 143 Differential equations,
equality, 221 ordinary, 20
independent, 221 Digit cancellation, 55
inequality, 221 Ding Dong matrix, 122, 253
Contraction of simplex, 168, 170 Direct method for linear equations, 72
Convergence, Direct search methods for function minimisation
criteria for, 5, 15 182
of inverse iteration, 105 Dixon, L. C., 154, 182, 223, 225
of Nelder-Mead search, 180 Doolittle method, 75, 80
of power method, 103 Double precision, 9, 14, 81, 83, 91
Convergence test, 159, 171, 180, 242 Dow Jones index, 77
for inverse iteration, 108
Convex function, 208 E (notation), 17
Corrected R2 statistic, 45 Eason, E. D., 182
Cost of computations, 1, 3 Eberlein, P., 110, 117
Cox, M., 133 ECLIPSE, 52, 96, 128, 153, 156, 159
Cross-products matrix, 49, 66 Effect of Jacobi rotations, 126
Crout method, 75, 80 Eigenproblem,
for complex equations, 83 generalised, 104
Cubic interpolation, 15 1 total or complete, 119
Cubic inverse interpolation, 159 Eigenproblem of a real symmetric matrix,
Cubic-parabola problem, 232 comparison of methods, 133
Cunningham, J., 138,141 Eigensolutions, 28, 31
Cycle or sweep, 35, 49 by singular-value decomposition, 123
Cyclic Jacobi algorithm, 127 of a complex matrix, 117
Cyclic re-ordering, 98 of a real symmetric matrix, 119
Eigenvalue, 28, 135
Dahlquist, G., 70, 75, 80, 81, 197 degenerate, 103
Data General computers, see NOVA or Eigenvalue approximation in inverse iteration,
ECLIPSE 108
Data points, 142 Eigenvalue decomposition of matrix, 135
Davies, 182 Eigenvalue problem,
Davies, Swann and Campey method, 182 matrix or algebraic, 102
Decomposition, Eigenvector, 28, 135
Choleski, 27 Elementary matrices, 73
of a matrix, 26, 49 Elementary operations on matrices, 73
Definiteness of a matrix, 22 Elimination method for linear equations, 72
Degenerate eigenvalues, 120, 125 Elimination of constraints, 22 1
Degrees of freedom, 46 choice in, 223
Equations, Geradin, M., 244, 246
linear, 19, 20, 51 Gerschgorin bound, 136
Equilibration of matrix, 80 Gerschgorin’s theorem, 121
Equivalent function evaluations (efe’s), 227 Gill, P. E., 221, 225
Euclidean norm, 22 Givens’ reduction, 15, 49, 51, 63, 83
Examples, and singular-value decomposition,
list of, 256 implementation, 54
Execution time, 227 for inverse iteration, 105, 109
Expenditure minimisation, 156 of a real rectangular matrix, 51
Exponents of decimal numbers, 17 operation of, 52
Expression of algorithms, 15 singular-value decomposition and least-squares
Extended precision, 14 solution, 56
Extension of simplex, 168, 169, 172 Givens’ tridiagonalisation, 133
Extrapolation, 151 Global minimum, 146
Golub, G. H., 56
GOTO instructions, 12
False Position, 161
Gradient, 186, 188, 197, 208, 226
Fenton, R. G., 182
computed, 226
Financial Times index, 77
of nonlinear sum of squares, 209
Finkbeiner, D. T., 87
of Rayleigh quotient, 245
Fletcher, R., 190, 192, 198, 199, 215, 228, 244
Gradient calculation in conjugate gradients for
Fletcher-Reeves formula, 199
linear equations, 235
FMIN linear search program, 153
Gradient components,
Ford B., 135
‘large’ computed values of, 206
Formulae,
Gram-Schmidt orthogonalisation, 197
Gauss-Jordan, 98
Gregory, R. T., 117
Forsythe, G. E., 127, 153
Grid search, 149, 156, 160
FORTRAN, 10, 56, 63
Griffith, B. A., 125
Forward difference, 2 19
Guard digits, 7
Forward-substitution, 86, 136
Foster, R. M., 139
Frank matrix, 250,253 Hall, G., 135
Fried, I., 246 Hamiltonian operator, 28, 138
Fröberg, C., 21, 127, 238, 251 Hammarling, S., 50
Full-rank case, 23, 66 Hanson, R. J., 64
Function evaluation count, 157, 164, 209, 217, Hartley, H. O., 210, 211
227, 232 Harwell subroutine library, 215
Function minimisation, 142, 207 Hassan, Z., 223
Functions, Healy, M. J. R., 88, 90
penalty, 222 Heaviside function, 222
Hemstitching of function minimisation method,
Galle, 131 186, 208
Gauss elimination, 72, 79, 82, 93 Henderson, B., 153
for inverse iteration, 105, 109 Henrici, P., 127, 162
variations, 80 Hermitian matrix, 137
with partial pivoting, 75 Hessian, 189, 197, 231
Gauss-Jordan reduction, 82, 93 for Rayleigh quotient, 244
Gauss-Newton method, 209, 211, 228 matrix, 187
Gearhart, W. B., 146, 232 Hestenes, M. R., 33, 134, 235, 241
Generalised eigenvalue problem, 135, 234, 242 Heuristic method, 168, 171
Generalised inverse, 44, 66 Hewlett-Packard,
2 and 4 condition, 26 computers, see HP9830
of a matrix, 24 pocket calculators, 5
Generalised matrix eigenvalue problem, 28, 104 Hilbert segment, 108, 253
Gentleman, W. M., 50 Hillstrom, K. E., 227
Homogeneity of a function, 244 Jacobi, C.G. J., 126. 127, 131
Hooke and Jeeves method, 182 jacobi (ALGOL procedure), 128, 133
Householder tridiagonalisation, 133 Jacobi algorithm, 126, 136,250
HP9830, 44, 56, 62, 70, 90, 92, 131, 164 cyclic, 127
organisation of, 128
Jacobi rotations,
effect of, 126
IBM 370, 120 Jacobian, 211, 217, 232
IBM 370/168, 56, 128, 167, 196, 239 matrix, 209
I11 conditioning of least-squares problem, 42 Jaffrelot, J. J., 204
Implicit interchanges for pivoting, 81 Jeeves, 185
IMSL, 10 Jenkins, M. A., 143, 148
Indefinite systems of linear equations. 241 Jones, A., 215
Independence,
linear, 20
Index array, 82 Kahan, W., 234
Index numbers, 23, 77 Kaiser, H. F., 134
Infeasible problems, 221 Karney, D. L., 117
Infinity norm, 104 Kendall, M. G., 40, 180
Information loss, 67 Kernighan, B. W., 12
Initial values for parameters, 146 Kowalik, J., 85, 142, 186
Inner product, 28, 245 Kronecker delta, 3 173, 119
Insurance premium calculation. 165
Interchange, LLTdecomposition, 84
implicit, 81 Lagrange multipliers, 221
row and column, 95 Lanczos method for eigenvalue problems. 234
Internal rate of return, 145 Lawson, C. L., 64
International Mathematical and Statistical Least-squares, 23, 50, 54, 77
Libraries, 10 linear, 21
Interpolating parabola, 152 via normal equations, 92
Interpolation, via singular-value decomposition, 40, 42
formulae for differentiation, 218 Least-squares computations,
linear, 161 example, 45
Interpreter for computer programming language, Least-squares solution, 22
91 Lefkovitch, L. P., 56, 63, 70
Interval, Levenberg, K., 211
closed, 17 Leverrier, 131
for linear search, 148 Linear algebra, 19
for root-finding, 160 Linear approximation of nonlinear function. 187
open, 17 Linear combination, 29
Inverse, Linear dependence, 34
generalised, 44 Linear equations, 19, 20, 72, 77, 93, 234, 235
of a matrix, 24 as a least-squares problem, 23
of a symmetric positive definite matrix, 97 complex, 82
of triangular matrices, 74 consistent, 87
Inverse interpolation, 151 Linear independence, 20, 25
Inverse iteration, 104, 140 Linear least-squares, 21, 77, 207, 234, 235
behaviour of, 108 Linear relationship, 23
by conjugate gradients, 241,249 Linear search, 143, 146, 148, 156, 159, 188, 189,
Inverse linear interpolation, 161 192, 198, 199, 235, 244
Inverse matrix, 95 acceptable point strategy, 190
Iteration limit, 109 List of algorithms. 255
Iteration matrix, 188 List of examples, 256
initialisation, 191 Local maxima, 143. 146, 149
Iterative improvement of linear-equation Local minima, 146, 208
solutions, 81 Logistic growth function, 144, 216
Loss of information in least-squares Matrix iteration methods for function
computations, 23, 67 minimisation, 187
Lottery, Matrix product count, 250
optimal operation of, 144, 228 Matrix transpose, 22
LU decomposition, 74 Maxima, 143
Maximal and minimal eigensolutions, 243
McKeown, J. J., 207
Machine arithmetic, 6 Mead, R., 168, 170
Machine precision, 6, 46, 70, 105, 219 Mean of two numbers, 8
Magnetic roots, 232 Measure of work in function minimisation, 227
Magnetic zeros, 147 Method of substitution, 93
Malcolm, M. A., 6 Minima of functions, 142
Mantissa, 6 Minimum-length least-squares solution, 22, 25
Market equilibrium, Model,
nonlinear equations, 231 linear, 23
Marquardt, D. W., 211, 212 nonlinear, 207
Marquardt algorithm, 209, 223, 228, 232, 233 of regional hog supply, 204
Mass-spectrograph calibration, 20 Modular programming, 12
Mathematical programming, 3, 13 Moler, C., 250, 253
Mathematical software, 11 Moler matrix, 127, 250, 253
Matrix, 19 Choleski decomposition of, 91
coefficient, 20.23 Moments of inertia, 125
complex, 110 Moore-Penrose inverse, 26, 44
cross-products, 66 Mostow, G. D., 74
dense, 20, 23 Multiplicity of eigenvalues, 120
diagonal, 26, 3 1 Murray. W., 221, 225, 228
elementary, 73
Frank, 100 NAG, 10, 215
generalised inverse of, 24 Nash, J. C., 33, 56, 63, 70, 110, 134, 137, 196, 211,
Hermitian, 110 215, 226, 235
inverse, 24, 95 Nash, S. G., 82, 148, 235
Moler, 100 Negative definite matrix, 238
non-negative definite, 22, 86 Nelder, J. A., 168, 170
non-symmetric, 110 NelderMead search, 168, 197, 223, 228, 230, 233
null, 52 modifications, 172
orthogonal, 26, 31, 50 Neptune (planet), 131
positive definite, 22 Newing, R. A., 138, 141
rank of, 20 Newton-Raphson iteration, 210
real symmetric, 31, 119 Newton’s method, 161, 188, 210
rectangular, 24, 44 for more than one parameter, 187
semidefinite, 22 Non-diagonal character,
singular, 20 measure of, 126
sparse, 20, 21, 23 Nonlinear equations, 142, 143, 144, 186, 231
special, 83 Nonlinear least-squares, 142, 144, 207, 231
symmetric, 23, 28 Nonlinear model of demand equations, 223
symmetric positive definite, 83, 84, 93 Non-negative definite matrix, 22
triangular, 26, 50, 52, 72, 74 Non-singular matrix, 20
unit, 29, 32 Norm, 17, 21, 66, 243
unitary, 27 Euclidean, 22
Matrix decomposition, of vector, 104
triangular, 74 Normal equations, 22, 25, 41, 50, 55, 66, 92, 239
Matrix eigenvalue problem, 28, 135 as consistent set, 88
generalised, 104, 148 Normalisation, 28, 52
Matrix eigenvalues for polynomial roots, 148 of eigenvectors, 108, 119
Matrix form of linear equations, 19 of vector to prevent overflow, 104
Matrix inverse for linear equations, 24 to prevent overflow, 103
Normalising constant, 139 Plauger, P. J., 12
Notation, 17 Plot or graph of function, 151
NOVA, 5, 46, 69, 79, 90, 91, 93, 100, 108, 109, Polak, E., 198, 199
117, 122, 123, 125, 127, 141, 153, 156, 164, Polak-Ribiere formula, 199
199, 206, 208, 220, 225,226, 229, 230,232, Polynomial roots, 143, 145
241,250 Positive definite iteration matrix, 192
Null vector, 20 Positive definite matrix, 22, 120, 188, 197, 211,
Numerical Algorithms Group, 10 235, 241, 243
Numerical approximation of derivatives, 2 17, Positive definite symmetric matrix, 83
218, 223, 228 inverse of, 24
Numerical differentiation, 218 Powell. M. J. D., 185, 199
Power method for dominant matrix
Objective function, 205, 207 eigensolution, 102
Oliver, F. R., 144, 207 Precision,
One-dimensional problems, 148 double, 9, 14
O’Neill, R., 171, 178 extended, 9, 14
One-sided transformation, 136 machine, 5, 46, 70
Ones matrix, 254 Price, K., 90
Operations, Principal axes of a cube, 125
arithmetic, 5 Principal components, 41, 46
Optimisation, 142 Principal moments of inertia, 125
constrained, 3 Product of triangular matrices, 74
Ordering of eigenvalues, 127, 134 Program,
Ordinary differential equations, 20 choice, 14
Orthogonal vectors, 25, 32 coding, 14
Orthogonalisation, compactness, 12
by plane rotations, 32 maintenance, 14
of matrix rows, 49, 54 readability, 12
Orthogonality, reliability, 14
of eigenvectors of real symmetric matrix, 119 testing, 14
of search directions, 198 Programming,
of vectors, 26 mathematical, 13
Osborne, M. R., 85, 142, 186, 226 structured, 12
Programming language, 11, 15
Programs,
Paige, C. C., 234 manufacturers’, 9
Parabolic interpolation, 151 sources of, 9
Parabolic inverse interpolation, 152, 199, 210 Pseudo-random numbers, 147, 166, 240
formulae, 153
Parameters, 142
Parlett, B. N., 234 QR algorithm, 133
Partial penalty function, 222 QR decomposition, 26, 49, 50, 64
Partial pivoting, 75 Quadratic equation, 85, 244
Pascal, 12 Quadratic form, 22, 89, 190, 198, 235
Pauling, L., 28 Quadratic or parabolic approximation, 15 1
Penalty functions, 222, 223 Quadratic termination, 188, 199, 236
Penrose, R., 26 Quantum mechanics, 28
Penrose conditions for generalised inverse, 26 Quasi-Newton methods, 187
Permutations or interchanges, 75
Perry, A., 144, 230
Peters, G., 105 R2 statistic, 45, 63
Pierce, B. O., 139 Radix, 7
Pivoting, 75, 93, 95, 97 Ralston, A., 95, 104, 121, 127, 218
Plane rotation, 32, 49, 54, 126 Rank, 20
formulae, 34 Rank-deficient case, 24, 25, 55
Rayleigh quotient, 122, 123, 138, 200, 234, 242, Saunders, M. A., 234
244 Scaling,
minimisation, 250 of Gauss -Newton method, 211
minimisation by conjugate gradients, 243 of linear equations, 80
Rayleigh-Ritz method, 138 Schwarz, H. R., 127
Readability of programs, 12 Search,
Real symmetric matrix, 119 along a line, 143, 148
Reconciliation of published statistics, 204 directions, 192, 197
Recurrence relation, 166, 198, 235, 246 Sebastian, D. J., 190
Reduction, Secant algorithm, 162
of simplex, 168, 170 Seidel, L., 131
to tridiagonal form, 133 sgn (Signum function), 34
Reeves, C. M., 198, 199 Shanno, D. F., 190
References, 263 Shift of matrix eigenvalues, 103, 121, 136, 242
Reflection of simplex, 168, 169, 172 Shooting method, 239
Regression, 92 Short word-length arithmetic, 159, 191
stepwise, 96 Signum function, 34
Reid, J. K., 234 Simplex, 168
Reinsch, C., 13, 83, 97, 102, 110, 133, 137, 251 size, 171
Reliability, 14 Simulation of insurance scheme, 165
Re-numeration, 98 Simultaneous equations,
Re-ordering, 99 linear, 19
Residual, 21, 45, 250 nonlinear, 142, 144
uncorrelated, 56, 70 Single precision, 134, 159
weighted, 24 Singular least-squares problem, 240
Residuals, 142, 144, 207 Singular matrix, 20
for complex eigensolutions, 117 Singular-value decomposition, 26, 30, 31, 54, 66,
for eigensolutions, 125, 128 69, 81, 119
sign of, 142, 207 algorithm, 36
Residual sum of squares, 55, 79 alternative implementation, 38
computation of, 43 updating of, 63
Residual vector, 242 Singular values, 30, 31, 33, 54, 55
Restart, ordering of, 33
of conjugate gradients for linear equations, 236 ratio of, 42
of conjugate gradients minimisation, 199 Small computer, 3
of Nelder-Mead search, 171 Software,
Ribiere, G., 198, 199 mathematical, 10
Ris, F. N., 253 Soland, R. M., 144, 230
Root-finding, 143, 145, 148, 159, 160,239 Solution,
Roots, least-squares, 22
of equations, 142 minimum-length least-squares, 22
of quadratic equation, 245 Sorenson, H. W., 198
Rosenbrock, H. H., 151, 182, 196, 208, 209 Sparse matrix, 20, 23, 102, 234
Rounding, 7 Spendley, W., 168
Row, ‘Square-root-free Givens’ reduction, 50
orthogonalisation, 49, 54 Standardisation of complex eigenvector, 111
permutations, 75 Starting points, 146
Ruhe, A., 234, 242 Starting vector,
Rutishauser, H., 127, 134 power method, 104
Statistical computations, 66
Steepest descent, 186, 199, 208, 209, 211
Saddle points, 143, 146, 159, 208 Stegun, I. A., 4
Safety check (iteration limit), 128 Step adjustment in success-failure algorithm, 154
Sampson, J. H., 74 Step length, 178, 187, 197, 200, 242
Sargent, R. W. H., 190 Step-length choice, 158
Step length for derivative approximation, 219 Truncation, 7
Stepwise regression, 96 Two-point boundary value problem, 238
Stewart, G. W., 40, 234
Structured programming, 12
Styan, G. P. H., 56 Unconstrained minimisation. 142
Substitution for constraints, 221 Uncorrelated residuals, 56, 70
Success-failure, Uniform distribution. 167
algorithm, 151, 153 Unimodal function, 149
search, 152 Unit matrix, 29
Success in function minimisation, 226 Univac 1108, 56, 120
Sum of squares, 22, 23, 39,42, 55, 79 Updating,
and cross products, 66 formula, 190
nonlinear, 207 of approximate Hessian, 189, 192
total, 45
Surveying-data fitting, 24, 240 V-shaped triple of points, 152
Swann, 182, 225 Values,
Sweep or cycle. 35, 49, 126 singular, see Singular values
Symmetric matrix, 135, 243 Varga, R. S., 83
Symmetry. Variable metric.
use in eigensolution program, 134 algorithms, 198
Synge, J. L., 125 methods, 186, 187, 223, 228, 233
System errors, 4 Variables. 142
Variance computation in floating-point
Taylor serves, 190, 209 arithmetic. 67
Tektronix 4051, 156 Variance of results from ‘known’ values, 241
Test matrices, 253 Variation method, 28
Test problems, 226 Vector. 19, 30
Time series, 180 null, 20. 32
Tolerance, 5, 15, 35, 40, 54 residual. 21
for acceptable point search, 190
for conjugate gradients least-squares, 240 Weighting.
for deviation of parameters from target, 204 for nonlinear least-squares, 207
for inverse iteration by conjugate gradients, of constraints, 222
243 in index numbers. 77
Total sum of squares, 45 Wiberg, T., 242
Transactions on Mathematical Software. 11 Wilkinson, J. H., 13, 28, 75, 83. 86, 97, 102, 105,
Transposition, 22 110, 119, 127, 133, 137, 251, 253, 254
Traub, J. F., 143, 148 W+matrix, 254
Trial function, 28 W- matrix, 108, 254
Triangle inequality, 22 Wilson, E. B., 28
Triangular decomposition, 74
Triangular matrix, 72
Triangular system, Yourdon. E., 12
of equations, 72
of linear equations, 51 Zambardino, R. A., 13
Tridiagonal matrix, 251