Cuyt, Rall - 1985 - Computational Implementation of The Multivariate Halley Method For Solving Nonlinear Systems of Equations
A. A. M. CUYT, University of Antwerp, and L. B. RALL, University of Wisconsin-Madison
Cubically convergent iterative methods for the solution of nonlinear systems of equations, such as
the multivariate Halley method, require first and second partial derivatives of the functions
comprising the system. Automatic differentiation is used to automate the Halley method, using the
data type HESSIAN and routines for the required operators and functions. A Pascal-SC program is
given which implements this method in a single-step iteration mode. The program is applied to two
nonlinear systems, and the results are compared with Newton's method.
Categories and Subject Descriptors: G.1.5 [Numerical Analysis]: Roots of Nonlinear Equations -
iterative methods, systems of equations; G.1.m [Numerical Analysis]: Miscellaneous
General Terms: Languages
Additional Key Words and Phrases: Automatic differentiation, cubic convergence, Halley method,
Pascal-SC, type HESSIAN
Research was sponsored by the Belgian National Fund for Scientific Research (NFWO), and in part
by the U.S. Army under Contract No. DAAG29-80-C-0041.
Authors’ addresses: A. A. M. Cuyt, Department of Mathematics, University of Antwerp UIA,
B-2610 Wilrijk, Belgium; L. B. Rall, Mathematics Research Center, University of Wisconsin-
Madison, Madison WI 53706.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
© 1985 ACM 0098-3500/85/0300-0020 $00.75
ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985, Pages 20-36.
with the matrix A = (a_ij) and the vector b = (b_1, b_2, ..., b_n) given, then the
system (1.1) is said to be linear. This important special case is now fairly well
understood in both theory and computational practice. Otherwise, (1.1) is a
nonlinear system, and the situation is quite different from the linear case with
respect to both theory and practice. Most of the methods for nonlinear systems
investigated to date [14, 15] involve some form of iteration, and many also involve
approximation of the nonlinear system by a linear system during the various
steps of the solution process, as in the case of Newton's method and its
many variants [14, 15]. It has been observed that some solution procedures work
better than others on a given problem, so that in the absence of a clear-cut
criterion for choosing the optimal method, it is advisable to have several choices
available in the form of computer programs that are easy to use.
It will be assumed that the operator f corresponding to the system (1.1) has
first and second Fréchet derivatives f′, f″ on its domain D ⊂ R^n [15]. In this
case, the first Fréchet derivative of f at x is represented by the Jacobian matrix

f′(x) = (∂f_i/∂x_j), i, j = 1, 2, ..., n,    (1.4)

and the second by the Hessian operator

f″(x) = (∂²f_i/∂x_j ∂x_k), i, j, k = 1, 2, ..., n    (1.5)
[15]. Necessary values of the derivatives appearing in (1.4) and (1.5) will be
obtained by automatic differentiation [17], so that the user need only supply
expressions or subroutines for the n functions f_i(x_1, x_2, ..., x_n) appearing in
(1.1). This avoids both the labor of providing code for derivatives and the
inaccuracy of numerical differentiation. In [20], it was shown how to automate
the calculation of the Jacobian matrix (1.4) needed in Newton's method by the
use of type GRADIENT. Here, type HESSIAN [18] will be used to evaluate both
(1.4) and (1.5), which are required for the computational implementation of a
cubically convergent iterative procedure, the multivariate Halley method due to
Cuyt [2, 3, 4].
x^(ν+1) = x^ν + (a^ν)² / (a^ν + ½ b^ν),    (2.1)
where

a^ν = -f′(x^ν)^(-1) f(x^ν)    (the Newton correction),

and

b^ν = f′(x^ν)^(-1) f″(x^ν) a^ν a^ν,    ν = 0, 1, 2, ... .
In the actual computation, the Jacobian matrix f′(x^ν) is not inverted. Rather,
the linear system

f′(x^ν) a^ν = -f(x^ν)    (2.2)

is solved for a^ν, following which the linear system

f′(x^ν) b^ν = f″(x^ν) a^ν a^ν    (2.3)

is then solved for b^ν. Since the systems (2.2) and (2.3) have the same coefficient
matrix, the decomposition of the Jacobian matrix f′(x^ν) used to solve (2.2) can
also be used to solve (2.3), resulting in a saving of effort.
An outline of the computational effort for one step of Halley's method is thus
(1) evaluation of f(x^ν), f′(x^ν), f″(x^ν);
(2) solution of (2.2) for a^ν;
(3) evaluation of f″(x^ν) a^ν a^ν;
(4) solution of (2.3) for b^ν;
(5) calculation of the Halley correction c^ν = (a^ν)²/(a^ν + ½ b^ν);
(6) addition of the Halley correction to x^ν to obtain x^(ν+1).
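Steps (1)-(6) can be sketched in a few lines of Python for the two-equation exponential system solved in Section 8. This is only an illustration, not the paper's Pascal-SC program: the derivatives are coded by hand rather than by automatic differentiation, and Cramer's rule stands in for the interval solver LGLP; all function names here are choices made for compactness.

```python
from math import exp, log

def f(x):
    # the exponential test system; both components vanish at x = (ln 10, 0)
    x1, x2 = x
    return [exp(-x1 + x2) - 0.1, exp(-x1 - x2) - 0.1]

def jacobian(x):
    x1, x2 = x
    e1, e2 = exp(-x1 + x2), exp(-x1 - x2)
    return [[-e1, e1], [-e2, -e2]]

def hess_quad(x, a):
    # components of the bilinear form f''(x)aa, i.e. a^T (Hf_i) a
    x1, x2 = x
    a1, a2 = a
    return [exp(-x1 + x2) * (a1 - a2) ** 2, exp(-x1 - x2) * (a1 + a2) ** 2]

def solve2(m, r):
    # Cramer's rule for the 2x2 linear system m u = r
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det]

def halley_step(x):
    j = jacobian(x)
    a = solve2(j, [-v for v in f(x)])       # (2.2): Newton correction a
    b = solve2(j, hess_quad(x, a))          # (2.3): second-order term b
    c = [0.0 if (ai == 0 and bi == 0) else ai * ai / (ai + 0.5 * bi)
         for ai, bi in zip(a, b)]           # Halley correction, with 0/0 := 0
    return [xi + ci for xi, ci in zip(x, c)]

x = [4.3, 2.0]                              # initial approximation (8.1)
for _ in range(6):
    x = halley_step(x)                      # converges toward (ln 10, 0)
```

The componentwise quotient with the convention 0/0 := 0 mirrors the operator (7.2) discussed later; the iterates reproduce the behavior tabulated in Appendix D.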
This sequence of operations is more elaborate than required for Newton's
method [15, 20], which requires only the evaluation of f(x^ν), f′(x^ν), the solution
of (2.2) for a^ν, and finally the addition of a^ν to x^ν to obtain x^(ν+1). However, in
favorable cases, the rate of convergence of Halley's method is cubic, whereas
Newton's method converges quadratically. Thus the greater effort required for
each step of Halley's method could be offset if fewer steps are required to obtain
the accuracy desired. Two steps of Newton's method can be combined to yield a
method with biquadratic convergence. However, this requires the solution of (2.2)
with different coefficient matrices f′(x^ν) and f′(x^(ν+1)) and right sides -f(x^ν) and
-f(x^(ν+1)).
For computational implementation, it is convenient to consider the steps of
Halley's method to consist of a procedure for evaluation (Step 1), which will
depend on the specific system being solved, and a procedure for iteration (Steps
2-6), which will have the same form for all systems. The operations of componentwise
multiplication and division of vectors will also have to be provided in
addition to the standard vector operations. These are simple to define, since for
a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) ∈ R^n, one has

ab = (a_1 b_1, a_2 b_2, ..., a_n b_n),    a/b = (a_1/b_1, a_2/b_2, ..., a_n/b_n).    (2.4)
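In a language with list comprehensions, the componentwise product and quotient (2.4) take only a line each. This Python fragment is an illustration, not the paper's Pascal-SC; it also anticipates the convention 0/0 := 0 adopted for the quotient in Section 7.

```python
def cmul(a, b):
    # componentwise product ab = (a1*b1, ..., an*bn)
    return [ai * bi for ai, bi in zip(a, b)]

def cdiv(a, b):
    # componentwise quotient a/b, with the indeterminate form 0/0 set to 0
    return [0.0 if ai == 0 and bi == 0 else ai / bi for ai, bi in zip(a, b)]
```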
Writing F = (f, f′) to represent an element of this new type of data, the next
step is to define the corresponding arithmetic operations to implement the rules
for differentiation in a computable form. For example, for G = (g, g′), addition
and multiplication are defined by

F + G = (f + g, f′ + g′),
F*G = (f*g, f*g′ + g*f′),    (3.2)

respectively. Similarly, functions such as the sine function can be represented in
the form

GSIN(F) = (sin(f), cos(f)*f′).    (3.3)
The independent variable x_i is represented by the GRADIENT variable X[i] =
(x_i, e_i), where e_i is the ith unit vector, and the evaluation of a GRADIENT
expression will automatically yield both the value of the function f(x) and its
gradient vector ∇f(x) at the given value of x. Thus the programmer need only
supply code for the evaluation of a function to get also its derivative, once the
standard set of GRADIENT operators and functions [20] is available.
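The arithmetic (3.2) and the composition rule (3.3) are straightforward to express in any language with operator overloading. The following Python sketch mirrors the (value, gradient) pairs of type GRADIENT; the class and function names are illustrative only and do not correspond to the Pascal-SC code.

```python
from math import sin, cos

class Gradient:
    """A (value, gradient) pair implementing the rules (3.2)-(3.3)."""
    def __init__(self, f, df):
        self.f, self.df = f, list(df)
    def __add__(self, other):
        # (f + g, f' + g')
        return Gradient(self.f + other.f,
                        [a + b for a, b in zip(self.df, other.df)])
    def __mul__(self, other):
        # (f*g, f*g' + g*f')
        return Gradient(self.f * other.f,
                        [self.f * b + other.f * a
                         for a, b in zip(self.df, other.df)])

def gsin(g):
    # (sin(f), cos(f)*f'), as in (3.3)
    return Gradient(sin(g.f), [cos(g.f) * d for d in g.df])

# independent variables x1 = 2, x2 = 3 as (value, unit-vector) pairs
x1 = Gradient(2.0, [1.0, 0.0])
x2 = Gradient(3.0, [0.0, 1.0])
y = x1 * x2 + gsin(x1)        # y = x1*x2 + sin(x1)
```

Evaluating the expression produces both the value x1*x2 + sin(x1) and its gradient (x2 + cos(x1), x1), with no hand-coded derivatives.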
For the present purpose, second derivatives are needed, and so type GRADIENT
is extended to type HESSIAN, a datum of which is the triple F = (f, f′,
f″) = (f(x), ∇f(x), Hf(x)) [18], where Hf(x) is the Hessian matrix

Hf(x) = (∂²f/∂x_i ∂x_j), i, j = 1, 2, ..., n,    (3.4)
and

SIN(F) = (sin(f), cos(f)*f′, cos(f)*f″ - sin(f)*f′*f′^T).    (3.6)
The HESSIAN variables X[i] corresponding to the independent variables x_i
are X[i] = (x_i, e_i, O), where e_i is the ith unit vector, and O denotes the n × n zero
matrix. Thus evaluation of expressions of type HESSIAN yields the value of the
second derivative f″(x) as well as the values of the function f(x) and its first
derivative f′(x). Although the formulations of HESSIAN operators and standard
functions are more complicated than those for type GRADIENT [20], programming
them is no real challenge, and this needs to be done only one time. Once
available, these subroutines shift the burden of differentiation from the programmer
to the computer, which is as it should be.
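As an indication of what such a formulation involves, here is a Python sketch of the HESSIAN product rule, (fg)″ = f g″ + f′ g′^T + g′ f′^T + g f″, together with the (x_i, e_i, O) representation of an independent variable. The class and helper names are chosen here for illustration; they are not the Pascal-SC identifiers.

```python
class Hessian:
    """A (value, gradient, Hessian-matrix) triple with the product rule."""
    def __init__(self, f, df, d2f):
        self.f, self.df, self.d2f = f, df, d2f
    def __mul__(self, other):
        n = len(self.df)
        # (fg)' = f*g' + g*f'
        df = [self.f * other.df[i] + other.f * self.df[i] for i in range(n)]
        # (fg)'' = f*g'' + g*f'' + f'*g'^T + g'*f'^T
        d2f = [[self.f * other.d2f[i][j] + other.f * self.d2f[i][j]
                + self.df[i] * other.df[j] + self.df[j] * other.df[i]
                for j in range(n)] for i in range(n)]
        return Hessian(self.f * other.f, df, d2f)

def var(i, value, n):
    # independent variable x_i represented as (x_i, e_i, O)
    e = [1.0 if j == i else 0.0 for j in range(n)]
    zero = [[0.0] * n for _ in range(n)]
    return Hessian(value, e, zero)

x1 = var(0, 2.0, 2)
x2 = var(1, 5.0, 2)
y = x1 * x1 * x2      # y = x1^2 * x2 at (2, 5)
```

At (x1, x2) = (2, 5) this yields the value 20, the gradient (2*x1*x2, x1^2) = (20, 4), and the Hessian matrix ((2*x2, 2*x1), (2*x1, 0)) = ((10, 4), (4, 0)), all from the product rule alone.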
In order to calculate the Jacobian matrix (1.4) and Hessian operator (1.5) of a
vector-valued function f(x) = (f_1(x), f_2(x), ..., f_n(x)), each real-valued component
function f_i(x) is defined to be of type HESSIAN. In this case, the ith row of the
Jacobian matrix f′(x) is given by the gradient vector ∇f_i(x) of the ith component
function, and the Hessian matrix Hf_i(x) of the ith component function will be
the ith "panel" of the Hessian operator f″(x).
[19]. Thus, as the result of a subroutine for computation of f(x) as the HESSIAN
variable F, one has the value of f(x) together with the rows of (1.4) and the panels of (1.5).
The power operator ** and various standard functions are also available for type
HESSIAN [18]. Typical examples of HESSIAN operators can be found in the
evaluation routine given in Appendix C.
In order to represent the vector x = (x_1, x_2, ..., x_n) of independent variables
and the vector-valued function f(x) = (f_1(x), f_2(x), ..., f_n(x)), with components
of type HESSIAN, it is convenient to introduce the data type HESSVAR, defined
by

TYPE HESSVAR = ARRAY [DIMTYPE] OF HESSIAN;    (4.5)

In this way, it is possible to code systems of equations (1.1) in a form that follows
ordinary mathematical notation. For example, the simple system

exp(-x_1 + x_2) - 0.1 = 0,
exp(-x_1 - x_2) - 0.1 = 0,    (4.6)

investigated by Cuyt and Van der Cruyssen [2, 4] requires the following HESSIAN
operators and functions:
OPERATOR - (H: HESSIAN) RES: HESSIAN;
OPERATOR - (HA, HB: HESSIAN) RES: HESSIAN;
OPERATOR - (H: HESSIAN; R: REAL) RES: HESSIAN;    (4.7)
OPERATOR + (HA, HB: HESSIAN) RES: HESSIAN;
FUNCTION HEXP(H: HESSIAN): HESSIAN;
The subroutines for these have to appear in the heading of the procedure
HESSEVAL(VAR X, F: HESSVAR) for the evaluation of f(x) corresponding to
(4.6) (see Appendix C). The evaluation of f and its first and second derivatives is
then carried out by the statements

F[1] := HEXP(-X[1] + X[2]) - 0.1;
F[2] := HEXP(-X[1] - X[2]) - 0.1;    (4.8)
after which the evaluation of f and its derivatives takes place by means
of the statements

F[1] := 16 * (X[1] ** 4) + 16 * (X[2] ** 4) + (X[3] ** 4) - 16;    (4.11)
F[2] := X[1] ** 2 + X[2] ** 2 + X[3] ** 2 - 3;
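The fixed point printed in Appendix D can be checked directly against the component functions listed above. This Python fragment, an independent check rather than part of the program, substitutes the iteration-5 values:

```python
# iteration-5 values from Appendix D
x1, x2, x3 = 0.877965760274, 0.676756970518, 1.33085541162

# residuals of the two component functions shown in (4.11)
f1 = 16 * x1 ** 4 + 16 * x2 ** 4 + x3 ** 4 - 16
f2 = x1 ** 2 + x2 ** 2 + x3 ** 2 - 3
# both residuals are negligibly small
```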
The meaning of DIM is the same as before; if one wishes to solve a smaller
system, the parameter AKDIM can be used to set the number of rows and
columns of the coefficient matrix A and components of the right-side vector B
that enter into the computation. More significantly, instead of returning a
floating-point RVECTOR X as an approximate solution of the linear system

AX = B,    (5.1)

LGLP returns an interval vector (IVECTOR) Y, which, if proper, is guaranteed
to contain the exact solution x of (5.1) [10, 21, 22]. Furthermore, successful
completion of LGLP guarantees that the floating-point matrix A is nonsingular
[21, 22]. If A is singular or extremely badly conditioned, LGLP will return an
improper interval vector Y with all components equal to the improper interval
[+1, -1] [22].
For the first system to be solved, one sets NRS = FALSE, and then subsequently
NRS = TRUE for each new right side. The results from the first solution needed
later are stored as the real matrix R and interval matrix MB.
After solution of the linear systems (2.2) or (2.3), the interval vector Y has to
be checked and converted to a real vector, before the computation can be
continued. This is done by the function MID given in Appendix B.
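The conversion can be sketched in a few lines. Assuming intervals are represented as (lower, upper) pairs, a midpoint routine in the spirit of MID looks as follows; this is a Python illustration, not the Appendix B code:

```python
def mid(y):
    # midpoint vector of an interval vector y = [(lo1, hi1), ..., (lon, hin)]
    return [(lo + hi) / 2.0 for lo, hi in y]

# a proper interval vector enclosing the solution of a linear system
y = [(0.999999, 1.000001), (-2.000001, -1.999999)]
m = mid(y)   # real vector of interval midpoints
```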
A = (a_ij)    (6.3)

Once the matrix F is formed by computing the vectors (4.4), the vector (6.5)
can be formed. The componentwise quotient in (2.4) is implemented by the operator
OPERATOR / (VA, VB: RVECTOR) RES: RVECTOR;
VAR I: DIMTYPE; U: RVECTOR;
BEGIN
FOR I := 1 TO DIM DO
IF (VA[I] = 0) AND (VB[I] = 0) THEN U[I] := 0
ELSE U[I] := VA[I]/VB[I];    (7.2)
RES := U
END;
In (6.2), the indeterminate form 0/0 is assigned the value 0, by continuity of the
Halley approximation. The calculation of the Halley correction also requires the
standard Pascal-SC operators

OPERATOR * (A: REAL; B: RVECTOR) RES: RVECTOR;
OPERATOR + (A, B: RVECTOR) RES: RVECTOR;

for multiplication of vectors by real numbers, and addition of vectors [22]. With
these and the componentwise operators (7.1) and (7.2), the Halley correction can
be evaluated by a statement of the form

CN := (AN * AN)/(AN + 0.5 * BN);    (7.3)

where, of course, AN = a^ν, BN = b^ν, CN = c^ν. The current value of X is then
updated by the statement

FOR I := 1 TO DIM DO X[I].F := X[I].F + CN[I];    (7.4)
The steps required for a Halley iteration are collected in the form of the Pascal-
SC procedure given in Appendix B. Together with the procedure

HESSEVAL(VAR X, F: HESSVAR);

for the evaluation of the function f(x) corresponding to the system of equations
(1.1), a program for the iterative solution of (1.1) by Halley's method can be
constructed easily. A simple program of this type is given in Appendix A, which
presents the results of each iteration to the user, who can then decide whether
to iterate further, stop the iteration, or start over with another initial vector.
In the program of Appendix A, the compiler directive

$USES LGL, DIM = #;

brings in the necessary type declarations, sets the constant DIM to the dimension
of the system specified by the user [13], and refers the compiler to the external
library LGLLIB containing the linear equation-solving and matrix inversion
routines [13]. The $INCLUDE directives bring in the source code for the method
being used and for evaluation of the systems, which are in the external files
HALLEY.SRC and HESSEVAL.SRC, respectively [13]. In general, the programmer
need only supply the file HESSEVAL.SRC for evaluation of the system
being solved, and modify the source code of the program ITERATE to set the
dimension and give the name of the method being used in the heading of the
output. The only place where modifications are necessary is indicated by "#" in
the source code file ITERATE.SRC.
8. NUMERICAL RESULTS
The method described in this paper was applied to the systems (4.8) and (4.9),
and the results were compared with those obtained by Newton's method [20].
The initial approximations for the system (4.8) were

x_1 = 4.3, x_2 = 2.0,    (8.1)

[3, 4], and the initial approximations for (4.9) were

x_1 = 1.0, x_2 = 1.0, x_3 = 1.0,    (8.2)

[20]. For the system (4.8), Newton's method required 55 iterations to reduce the
residual to 0 to 12 decimal places, whereas Halley's method required only five
iterations. On the other hand, for (4.9), the corresponding numbers were eight
iterations for Newton's method, and five for Halley's method, a result that is
more favorable to Newton's method. The results are given in detail in Appendix D.
The methodology presented in this paper can also be used to automate other
higher order methods for the solution of systems of equations, such as Chebyshev's
method and the method of tangent hyperbolas [14, 15].
C := 'R'; WHILE C = 'R' DO

BEGIN (* ITERATION *)
BEGIN
HALLEY(X, F); (* ITERATION STEP *)
K := K + 1; (* INCREASE ITERATION COUNTER *)
WRITELN; WRITELN('RESULTS OF ITERATION ', K:3);
END; (* ITERATION STEP *)

(* HALLEY METHOD *)
VAR I: DIMTYPE;
JACF: RMATRIX; (* THE JACOBIAN MATRIX *)
AN: RVECTOR; (* THE NEWTON CORRECTION *)
BN: RVECTOR;
CN: RVECTOR; (* THE HALLEY CORRECTION *)
A: RMATRIX;
B: RVECTOR; (* USED BY LGLPR *)
R: RMATRIX; (* " *)
Y: IVECTOR; (* " *)
MB: IMATRIX; (* " *)

LGLPR(DIM, DIM, JACF, B, FALSE, R, MB, Y);
AN := MID(Y); (* NEWTON CORRECTION *)
X[1] = 4.30000000000E+00    F[1] = 2.58843723000E-04
X[2] = 2.00000000000E+00    F[2] = -9.81636952230E-02
RESULTS OF ITERATION 1
X[1] = 3.33615528246E+00    F[1] = 2.40511813000E-04
X[2] = 1.03597241993E+00    F[2] = -8.73756488903E-02
RESULTS OF ITERATION 2
X[1] = 2.56081800937E+00    F[1] = 1.44792584000E-04
X[2] = 2.59679794981E-01    F[2] = -4.04237220044E-02
RESULTS OF ITERATION 3
X[1] = 2.30817563469E+00    F[1] = 9.32479500000E-06
X[2] = 5.68378530500E-03    F[2] = -1.12110099570E-03
RESULTS OF ITERATION 4
X[1] = 2.30258515118E+00    F[1] = 3.02000000000E-10
X[2] = 6.12055700000E-08    F[2] = -1.19396000000E-08
RESULTS OF ITERATION 5
X[1] = 2.30258509299E+00    F[1] = 0.00000000000E+00
X[2] = -2.43356140000E-12   F[2] = 0.00000000000E+00
X[1] = 1.00000000000E+00    F[1] = 1.70000000000E+01
X[2] = 1.00000000000E+00    F[2] = 0.00000000000E+00
X[3] = 1.00000000000E+00    F[3] = 0.00000000000E+00
RESULTS OF ITERATION 1
X[1] = 8.91118701964E-01    F[1] = 9.37521623100E-01
X[2] = 7.05429347548E-01    F[2] = -9.44922445000E-03
X[3] = 1.30339083879E+00    F[3] = 2.20137281800E-03
RESULTS OF ITERATION 2
X[1] = 8.77982528233E-01    F[1] = 1.03685690000E-03
X[2] = 6.76786689302E-01    F[2] = -9.09324000000E-06
X[3] = 1.33082582033E+00    F[3] = 9.05738500000E-06
RESULTS OF ITERATION 3
X[1] = 8.77965760274E-01    F[1] = 0.00000000000E+00
X[2] = 6.76756970516E-01    F[2] = 0.00000000000E+00
X[3] = 1.33085541162E+00    F[3] = 2.00000000000E-12
RESULTS OF ITERATION 4
X[1] = 8.77965760274E-01    F[1] = 0.00000000000E+00
X[2] = 6.76756970517E-01    F[2] = 0.00000000000E+00
X[3] = 1.33085541162E+00    F[3] = 1.00000000000E-12
RESULTS OF ITERATION 5
X[1] = 8.77965760274E-01    F[1] = 0.00000000000E+00
X[2] = 6.76756970518E-01    F[2] = 0.00000000000E+00
X[3] = 1.33085541162E+00    F[3] = 0.00000000000E+00
REFERENCES
1. BOHLENDER, G., GRUNER, K., KAUCHER, E., KLATTE, R., KRAMER, W., KULISCH, U. W., RUMP,
S. M., ULLRICH, C., WOLFF VON GUDENBERG, J., AND MIRANKER, W. L. PASCAL-SC: A
PASCAL for contemporary scientific computation. Res. Rep. RC 9009, IBM Thomas J. Watson
Research Center, Yorktown Heights, N.Y., 1981.
2. CUYT, A. A. M. Abstract Padé approximants for operators: Theory and applications. Lecture
Notes in Mathematics, vol. 1065, Springer-Verlag, New York, 1984.
3. CUYT, A. A. M. Numerical stability of the Halley-iteration for the solution of a system of
nonlinear equations. Math. Comput. 38 (1982), 171-179.
4. CUYT, A. A. M., AND VAN DER CRUYSSEN, P. Abstract Padé approximants for the solution of
a system of nonlinear equations. Rep. 80-17, University of Antwerp UIA, Antwerp, Belgium,
1980.
5. GRAY, J. H., AND RALL, L. B. NEWTON: A general purpose program for solving nonlinear
systems. In Proceedings of the 1967 Army Numerical Analysis Conference. U. S. Army Research
Office, Durham, N.C., 1967, pp. 11-59.
6. KEDEM, G. Automatic differentiation of computer programs. ACM Trans. Math. Softw. 6, 2
(June 1980), 150-165.
7. KURA, D., AND RALL, L. B. A UNIVAC 1108 program for obtaining rigorous error estimates for
approximate solutions of systems of equations. Tech. Summary Rep. 1168, Mathematics Research
Center, University of Wisconsin-Madison, 1972.
8. KULISCH, U. A new arithmetic for scientific computation. In A New Approach to Scientific
Computation, U. Kulisch and W. L. Miranker, Eds. Academic Press, New York, 1983, pp. 1-26.
9. KULISCH, U., AND MIRANKER, W. L. Computer Arithmetic in Theory and Practice. Academic
Press, New York, 1981.
10. KULISCH, U., AND MIRANKER, W. L., Eds. A New Approach to Scientific Computation. Academic
Press, New York, 1983.
11. MOORE, R. E. Interval Analysis. Prentice-Hall, Englewood Cliffs, N. J., 1966.
12. MOORE, R. E. Techniques and Applications of Interval Analysis, vol. 2, SIAM Studies in Applied
Mathematics. SIAM, Philadelphia, Pa., 1979.
13. NEAGA, M. Pascal-SC Language Description and Programming Guide (German). Department
of Computer Science, University of Kaiserslautern, Kaiserslautern, W. Germany, 1982.
14. ORTEGA, J. M., AND RHEINBOLDT, W. C. Iterative Solution of Nonlinear Equations in Several
Variables. Academic Press, New York, 1970.
15. RALL, L. B. Computational Solution of Nonlinear Operator Equations. Krieger, Huntington,
N. Y., 1979.
16. RALL, L. B. Applications of software for automatic differentiation in numerical computation.
Computing, Suppl. 2 (1980), 141-156.
17. RALL, L. B. Automatic Differentiation: Techniques and Applications, Lecture Notes in Computer
Science, vol. 120. Springer-Verlag, Berlin, Heidelberg, New York, 1981.
18. RALL, L. B. Differentiation and generation of Taylor coefficients in PASCAL-SC. In A New
Approach to Scientific Computation, U. W. Kulisch and W. L. Miranker, Eds. Academic Press,
New York, 1983, pp. 291-309.
19. RALL, L. B. Representations of intervals and optimal error bounds. Math. Comput. 41, 163
(1983), 219-227.
20. RALL, L. B. Differentiation in Pascal-SC: Type GRADIENT. ACM Trans. Math. Softw. 10, 2
(June 1984), 161-184.
21. RUMP, S. Solving algebraic problems with high accuracy. In A New Approach to Scientific
Computation, U. W. Kulisch and W. L. Miranker, Eds. Academic Press, New York, 1983, pp.
53-120.
22. WOLFF VON GUDENBERG, J. Complete Arithmetic of the PASCAL-SC Computer: User Hand-
book (German). Institute for Applied Mathematics, University of Karlsruhe, Karlsruhe, W.
Germany, 1981.