SYSTEMS OPTIMIZATION LABORATORY
DEPARTMENT OF OPERATIONS RESEARCH
STANFORD UNIVERSITY
STANFORD, CALIFORNIA 94305-4022
Research and reproduction of this report were partially supported by the National Science Foundation
Grants DDM-9204208 and DDM-9204547; the Department of Energy Grant DE-FG03-92ER25117; and the
Office of Naval Research Grant N00014-90-J-1242.
Research partially supported by the Göran Gustafsson Foundation and the Swedish National Board for
Technical Development.
This paper is simultaneously issued as Report TRITA-MAT-1993-9, Department of Mathematics, Royal
Institute of Technology; Report LMS 93-2, Department of Mathematics, University of California at San
Diego; and Report SOL 93-1, Department of Operations Research, Stanford University. It supersedes
part of Report SOL 89-12, "A modified Newton method for unconstrained minimization," Department of
Operations Research, Stanford University, 1989.
Any opinions, findings, and conclusions or recommendations expressed in this publication are those of
the author(s) and do NOT necessarily reflect the views of the above sponsors.
Also issued as Operations Research Department Technical Report 93-2. Reproduction in whole or in part
is permitted for any purpose of the United States Government. This document has been approved for
public release and sale; its distribution is unlimited.
COMPUTING MODIFIED NEWTON DIRECTIONS USING
A PARTIAL CHOLESKY FACTORIZATION

Anders Forsgren, Philip E. Gill and Walter Murray

†Department of Mathematics
University of California at San Diego, La Jolla, California 92093-0112, USA
Abstract
The effectiveness of Newton's method for finding an unconstrained mini-
mizer of a strictly convex twice continuously differentiable function has prompted
the proposal of various modified Newton methods for the nonconvex case.
Linesearch modified Newton methods utilize a linear combination of a de-
scent direction and a direction of negative curvature. If these directions are
sufficient in a certain sense, and a suitable linesearch is used, the resulting
method will generate limit points that satisfy the second-order necessary con-
ditions for optimality.
We propose an efficient method for computing a descent direction and a
direction of negative curvature that is based on a partial Cholesky factoriza-
tion of the Hessian. This factorization not only gives theoretically satisfactory
directions, but also requires only a partial pivoting strategy, i.e., the equivalent
of only two rows of the Schur complement need be examined at each step.
Keywords: Unconstrained minimization, modified Newton method, descent
direction, negative curvature, Cholesky factorization
*Research partially supported by the G6ran Gustafsson Fouindation and lhe Swedish National
Board for Technical Development.
4 Research supported by the Department of Energy Contract DF-FG03-92ER25117, the Na-
tional Science Foundation Grants DDM-9204208, DDM-92001547, and the Oiir-of Ntva;l lResearch
Grant N00014-90-J-1242.
1. Introduction
gkᵀsk → 0  implies  gk → 0 and sk → 0,   (1.1a)

and

dkᵀHkdk → 0  implies  min{λmin(Hk), 0} → 0 and dk → 0,   (1.1b)
then every limit point of the resulting sequence {xk}k≥0 will satisfy the second-order
necessary conditions for optimality.
It has been observed in practice that the number of iterates at which the Hessian
is positive definite is large compared to the total number of iterations. Since
linesearch methods revert to Newton's method when the Hessian is sufficiently pos-
itive definite, it would seem sensible to use a modified Newton method based on
the most efficient method for solving a symmetric positive-definite system. This is
the motivation for the modified Cholesky factorization proposed by Gill and Mur-
ray [GM74]. However, it has been shown by Moré and Sorensen [MS79] that this
factorization may not give directions of negative curvature that are sufficient in the
sense of (1.1b). This paper is motivated by the need for an algorithm with the
efficiency and simplicity of the Cholesky factorization, but with the guarantee of
convergence when used with a suitable linesearch. It is shown in Section 3 that a
partial Cholesky factorization can give search directions that are sufficient in the
sense of (1.1).
To simplify the notation, we will drop the subscript k when referring to the
quantities gk, Hk, sk and dk at a specific iteration. Unless otherwise stated, ‖·‖
refers to the vector two-norm or its induced matrix norm. The vector ej denotes
the j-th unit vector, whose dimension is determined by the context.
2. The partial Cholesky factorization

The partial Cholesky factorization of H takes the form

PᵀHP = [ H11  H12 ; H12ᵀ  H22 ] = [ L11  0 ; L21  I ] [ B1  0 ; 0  B2 ] [ L11ᵀ  L21ᵀ ; 0  I ],   (2.1)

where L11 is unit lower triangular and B1 is a positive-definite diagonal matrix.
The submatrix H11 is positive definite, and H11 = L11B1L11ᵀ is its usual Cholesky
factorization obtained using diagonal pivoting. The factorization may be written
briefly as H = LBLᵀ, where L is a row-permuted lower-triangular matrix with unit
diagonal. We will use n2 to denote the size of B2, so that n1 + n2 = n. A pseudocode
version of the partial Cholesky algorithm is given in Algorithm 2.1.
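The steps above can be sketched in Python. The acceptance rule shown (take the largest diagonal of the current Schur complement and accept it only if it is positive and at least ν times the largest off-diagonal magnitude in its row) is our reading of the pivoting strategy described in this section; the function name, default threshold, and return convention are our own.

```python
def partial_cholesky(H, nu=0.75):
    """Sketch of a partial Cholesky factorization P^T H P = L B L^T.

    The largest diagonal of the current Schur complement is accepted as
    a pivot while it is positive and at least nu times the largest
    off-diagonal magnitude in its row; otherwise the factorization
    stops, leaving an indefinite trailing Schur complement B2.
    Returns (n1, perm, L, B1, B2): B1 is the positive diagonal of the
    factored block, B2 the (n - n1) x (n - n1) Schur complement, and
    perm encodes P (column j of P is the unit vector e_perm[j]).
    """
    n = len(H)
    A = [row[:] for row in H]                 # working copy, updated in place
    perm = list(range(n))
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    k = 0
    while k < n:
        # candidate pivot: largest diagonal of the current Schur complement
        t = max(range(k, n), key=lambda i: A[i][i])
        row_max = max((abs(A[t][j]) for j in range(k, n) if j != t), default=0.0)
        if A[t][t] <= 0.0 or A[t][t] < nu * row_max:
            break                             # pivot unacceptable: stop early
        if t != k:                            # symmetric row/column interchange
            A[k], A[t] = A[t], A[k]
            for row in A:
                row[k], row[t] = row[t], row[k]
            perm[k], perm[t] = perm[t], perm[k]
            for j in range(k):                # carry previously computed multipliers
                L[k][j], L[t][j] = L[t][j], L[k][j]
        for i in range(k + 1, n):             # eliminate column k (Schur update)
            m = A[i][k] / A[k][k]
            L[i][k] = m
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
        k += 1
    B1 = [A[i][i] for i in range(k)]
    B2 = [row[k:] for row in A[k:]]
    return k, perm, L, B1, B2
```

For a sufficiently positive-definite H the loop runs to completion, n1 = n, B2 is empty, and the result is an ordinary Cholesky-type factorization with diagonal pivoting.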
The curvature along any direction d computed from the partial Cholesky factor-
ization is related to the magnitude of the smallest eigenvalue of the Schur comple-
ment B2. The following lemma relates the smallest eigenvalue of B2 to the smallest
eigenvalue of H.
Lemma 2.1. Let H be a symmetric n × n matrix with at least one negative eigen-
value. Let the partial Cholesky factorization of H be denoted by H = LBLᵀ, where
PᵀHP is partitioned as in (2.1). Then

λmin(B2) ≤ λmin(H) ≤ λmin(B2)/‖Y‖²,   where   Y = P [ −L11⁻ᵀL21ᵀ ; I ].   (2.3)

Proof. The inequality λmin(B2) ≤ λmin(H) can be established using the identity H = LBLᵀ.
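The inequality λmin(B2) ≤ λmin(H) can be illustrated numerically on a small hand-checked example (ours, not from the paper): for the indefinite 3 × 3 matrix below, a partial Cholesky run with diagonal pivoting factors rows 0 and 2 and stops with the 1 × 1 Schur complement B2 = [−4.5], so the lemma predicts that H + 4.5I is positive semidefinite. An attempted plain Cholesky factorization serves as the certificate:

```python
def is_positive_definite(A, tol=1e-12):
    """Attempt a plain Cholesky factorization; success certifies A > 0."""
    n = len(A)
    C = [row[:] for row in A]
    for k in range(n):
        if C[k][k] <= tol:
            return False                       # pivot not positive: not PD
        C[k][k] = C[k][k] ** 0.5
        for i in range(k + 1, n):
            C[i][k] /= C[k][k]
        for i in range(k + 1, n):              # update trailing lower triangle
            for j in range(k + 1, i + 1):
                C[i][j] -= C[i][k] * C[j][k]
    return True

# H is indefinite; diagonal-pivoted partial Cholesky eliminates rows 0 and 2
# and leaves the 1 x 1 Schur complement B2 = [-4.5], so Lemma 2.1 predicts
# lambda_min(H) >= -4.5, i.e. H + 4.5 I is positive (semi)definite.
H = [[4.0, 2.0, 0.0], [2.0, -3.0, 1.0], [0.0, 1.0, 2.0]]
shifted = [[H[i][j] + (4.5 if i == j else 0.0) for j in range(3)] for i in range(3)]
assert not is_positive_definite(H)             # H itself is indefinite
assert is_positive_definite(shifted)           # H + 4.5 I is positive definite
```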
Note that the matrix Y of (2.3) consists of the last n2 columns of L⁻ᵀ. Our analysis
requires bounds on the norms of Y, L and L⁻¹, which are provided by the following
lemma given by Higham [Hig90].
Lemma 2.2. Let H be factorized using the partial Cholesky factorization described
in Algorithm 2.1. If PᵀHP is partitioned as in (2.1), then

(a) ‖Y‖² ≤ (3ν² + 2(4^{n1} − 1))/(3ν²);

(b) ‖Y‖ ≥ 1;

(c) ‖L‖ ≤ √n · 2^{n1};

(d) ‖L⁻¹‖ ≤ √2 · 2^{n1}.
Then,

−gᵀs ≥ ‖g‖² / (n 4^{n1} λmax(B1))   and   ‖s‖ ≤ (2 · 4^{n1} / λmin(B1)) ‖g‖.

Proof. From the definition of s in (2.5) we have

s = −L⁻ᵀ B̄⁻¹ L⁻¹ g.   (2.6)

Premultiplying (2.6) by gᵀ gives

−gᵀs = gᵀ L⁻ᵀ B̄⁻¹ L⁻¹ g ≥ ‖g‖² / (‖L‖² λmax(B1)),
and the required lower bound on −gᵀs follows from part (c) of Lemma 2.2. To
obtain the bound on ‖s‖ we derive the inequality ‖s‖ ≤ ‖L⁻¹‖² ‖g‖ / λmin(B1) by
taking norms of both sides of (2.6), substituting for L from (2.2) and using norm
inequalities. The required upper bound follows from part (d) of Lemma 2.2. ∎
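Computationally, s is obtained from (2.6) by a forward solve, a diagonal scaling, and a back solve. The sketch below uses a small hand-checked 3 × 3 factorization and replaces the indefinite block B2 by the identity when forming the diagonal scaling matrix B̄; this particular modification is our illustrative choice, since the precise definition (2.5) is not reproduced here. Whatever positive diagonal is used, the construction guarantees gᵀs < 0.

```python
# Factorization data for H = [[4,2,0],[2,-3,1],[0,1,2]] with pivot order
# perm = [0, 2, 1] (hand-checked): Ltilde is unit lower triangular and
# Bbar replaces the indefinite Schur complement B2 = [-4.5] by 1.0.
perm = [0, 2, 1]
Ltilde = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.5, 0.5, 1.0]]
Bbar = [4.0, 2.0, 1.0]                   # diag(B1) with B2 replaced by I
g = [1.0, 2.0, 3.0]                      # an arbitrary gradient

ghat = [g[p] for p in perm]              # permuted right-hand side P^T g
n = len(g)
y = [0.0] * n                            # forward solve:  Ltilde y = ghat
for i in range(n):
    y[i] = ghat[i] - sum(Ltilde[i][j] * y[j] for j in range(i))
z = [y[i] / Bbar[i] for i in range(n)]   # diagonal solve: Bbar z = y
shat = [0.0] * n                         # back solve:     Ltilde^T shat = z
for i in reversed(range(n)):
    shat[i] = z[i] - sum(Ltilde[j][i] * shat[j] for j in range(i + 1, n))
s = [0.0] * n
for i in range(n):                       # undo the permutation, negate
    s[perm[i]] = -shat[i]

slope = sum(g[i] * s[i] for i in range(n))
assert slope < 0                         # s is a descent direction
```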
and

dᵀHd / dᵀd ≤ ( 3ν²(1 − ν) / (n2(3ν² + 2(4^{n1} − 1))) ) λmin(H).
Proof. If n1 = n, then λmin(H) ≥ 0, and the lemma holds from the definition d = 0.
For the remainder of the proof, assume that n1 < n.
First, it is necessary to show that γ ≤ νρ, where ρ = max{ max_{i,j>n1} |bij|, 0 }.
If the factorization terminates with γ = 0, the inequality γ ≤ νρ is trivially satis-
fied. If the factorization terminates with γ > 0, there exists an index t (t > n1)
such that btt = γ. Since γ must be an unacceptable pivot, we can infer that
γ ≤ ν max_{j≠t, j>n1} |bjt|. Consequently, if n1 < n, it must hold that γ ≤ νρ.
Let d1 and v1 denote the first n1 components of Pd and v respectively. Similarly,
let d2 and v2 denote the last n2 components of Pd and v. The definitions of d and
v imply that ‖v1‖ = 0, ‖v2‖ = 1, and d2 = √ρ v2. Therefore,
where the last inequality follows from Lemma 2.2. Combining (2.7) and (2.8) yields

ρ ≥ −(1/n2) λmin(B2) ≥ −(1/n2) λmin(H).   (2.12)
dᵀHd / dᵀd ≤ −( 3ν²(1 − ν) / (3ν² + 2(4^{n1} − 1)) ) ρ,   (2.13)

so that, with (2.12),

dᵀHd / dᵀd ≤ ( 3ν²(1 − ν) / (n2(3ν² + 2(4^{n1} − 1))) ) λmin(H),

as required.
Since, by definition, λmin(H) ≤ dᵀHd/dᵀd, the left-most inequality of (2.13) gives
an upper bound on ρ, which in conjunction with (2.9) and (2.12) gives the bounds
on dᵀd as

−(1/n2) λmin(H) ≤ dᵀd ≤ −(1/(1 − ν)) (1 + 2(4^{n1} − 1)/(3ν²)) λmin(H).  ∎
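A direction of negative curvature can be assembled exactly as in the proof: pick a unit vector v2 in the coordinates of B2, scale it by √ρ, and map it back through Y = P[−L11⁻ᵀL21ᵀ ; I]. The numbers below come from a small hand-checked 3 × 3 example (ours, not the paper's); by construction the resulting curvature equals ρ v2ᵀB2v2, which is strictly negative here.

```python
# Hand-checked partial Cholesky data for the indefinite matrix H below:
# perm = [0, 2, 1], L11 = I, L21 = [0.5, 0.5], B2 = [[-4.5]].
H = [[4.0, 2.0, 0.0], [2.0, -3.0, 1.0], [0.0, 1.0, 2.0]]
perm = [0, 2, 1]
L21 = [0.5, 0.5]
B2 = [[-4.5]]

rho = abs(B2[0][0])                # magnitude of the most negative entry of B2
v2 = [1.0]                         # unit vector in the B2 coordinates
# dhat = sqrt(rho) * (-L11^{-T} L21^T v2 ; v2); here L11 = I
scale = rho ** 0.5
dhat = [-scale * L21[0] * v2[0], -scale * L21[1] * v2[0], scale * v2[0]]
d = [0.0] * 3
for i in range(3):                 # undo the permutation: d = P dhat
    d[perm[i]] = dhat[i]

Hd = [sum(H[i][j] * d[j] for j in range(3)) for i in range(3)]
curv = sum(d[i] * Hd[i] for i in range(3))
assert curv < 0                    # d^T H d = rho * v2^T B2 v2 < 0
```

The curvature ratio dᵀHd/dᵀd here is −3, while λmin(B2) = −4.5 is a lower bound on λmin(H), illustrating the two-sided relation of Lemma 2.1.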
This lemma gives a relation between the curvature along d and the smallest
eigenvalue of H, which is the "best possible" curvature. The bound is exponential
in n1, but the computational experiments discussed below imply that the bound is
unlikely to be tight in practice. However, as in Higham [Hig90], we observe that
there do exist matrices for which the bound is "almost" tight. For given n (n ≥ 3) and θ,
define L(θ) and B(θ) as
L(θ) = [ 1 ; −cos θ  1 ; −cos θ  −cos θ  1 ; ⋮  ⋱ ; −cos θ  ⋯  −cos θ  1 ]   and

B(θ) = diag( sin²θ, sin⁴θ, …, sin^{2(n−2)}θ ) ⊕ [ 0  −1 ; −1  0 ],

i.e., every subdiagonal entry of the unit lower-triangular matrix L(θ) is −cos θ, and
the trailing 2 × 2 block of the otherwise diagonal matrix B(θ) is indefinite.
The computational experiments were performed on randomly generated symmetric
matrices of order 50. Each H was defined as QΛQᵀ, with Q a random orthogonal
matrix and Λ a random diagonal matrix with at least one negative element. The
matrix Q was obtained from the QR factorization of a 50 × 50 matrix whose ele-
ments were taken from an independent normal distribution with zero mean and unit
variance. The elements of Λ were taken from an independent uniform distribution in
the interval [−25, 25]. Directions of negative curvature were computed with ν-values
√ε, 0.05, 0.10, …, 0.95, and 1 − √ε, where ε denotes the machine precision. A new
random matrix was generated for each factorization, giving a total of 1500 matrices
for each value of ν. Figure 2.1 gives the outcome of the computational experiment.
The three lines depict the maximum, mean, and minimum values of the ratio r of
dᵀHd/dᵀd to λmin(H). Each "+" represents the value of r for a particular value of
the parameter ν.
[Figure 2.1. Maximum, mean, and minimum values of the ratio r = (dᵀHd/dᵀd)/λmin(H) for each value of ν.]
The bound on r given by Lemma 2.4 is approximately maximized for ν = 2/3. If,
for n = 50, this optimal value gives n1 = 49, the theoretical bound is approximately
7 × 10⁻³¹. This should be compared with the computed values of r, which never
fell below 0.05 when ν was larger than 0.5. The minimum value of r attained a
maximum of 0.0809 for ν = 0.9. Based on these results, we would recommend a
value of ν in the range (0.5, 0.95). Note that the larger the value of ν, the smaller
the value of n1 and, consequently, the smaller the amount of computation.
3. Theoretical results
The partial Cholesky factorization can be used as the basis for a descent method for
minimizing a twice-continuously differentiable function f: ℝⁿ → ℝ. This method
defines a sequence {xk}k≥0 of improving estimates of a local minimizer.
Let x0 be any starting point such that the level set {x | f(x) ≤ f(x0)} is compact.
Let {sk} and {dk} be bounded sequences such that each sk is a descent direction
that satisfies (1.1a) and each dk is a direction of negative curvature that satisfies
(1.1b). Moré and Sorensen [MS79] show that with an appropriate linesearch, certain
linear combinations of sk and dk define xk+1 so that every limit point of {xk}k≥0
will satisfy the second-order necessary conditions for optimality, i.e., at every limit
point x̄, ∇f(x̄) is zero and ∇²f(x̄) is positive semidefinite. The main result of this
paper, that the search directions obtained using the partial Cholesky factorization
are sufficient in the sense of Moré and Sorensen [MS79], is stated in the following
theorem.
Proof. Since {xk} lies in a compact region, the smoothness of f implies that {‖gk‖}
and {‖Hk‖} are bounded.
With the existence of c1 and c2, and the boundedness of ‖gk‖, Lemma 2.3 implies
that {sk} is a bounded sequence, and gkᵀsk → 0 implies gk → 0 and sk → 0, as
required.
Lemma 2.4 and the boundedness of ‖Hk‖ imply that {dk} is a bounded sequence,
and dkᵀHkdk → 0 implies dk → 0 and min{λmin(Hk), 0} → 0, as required. ∎
If ∇²f(xk) is sufficiently positive definite, all pivots will be acceptable and the
partial Cholesky factorization will terminate with n1 = n. This implies that if
{xk}k≥0 has a limit point x̄ at which ∇²f(x̄) is sufficiently positive definite, then the
iterates will be identical to those of Newton's method for k sufficiently large.
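The combination of sk and dk can be sketched with the curvilinear path x(α) = x + α²s + αd used by Moré and Sorensen [MS79], here with a deliberately simplified sufficient-decrease test in place of their full linesearch conditions; the test function, starting point, and constants below are illustrative assumptions, not from the paper.

```python
def curvilinear_search(f, x, s, d, decrease, alpha=1.0, shrink=0.5, max_tries=30):
    """Backtrack along x(alpha) = x + alpha^2 * s + alpha * d until the
    simplified sufficient-decrease test
        f(x(alpha)) <= f(x) + 0.1 * alpha^2 * decrease
    holds, where decrease < 0 summarizes the available descent/curvature."""
    fx = f(x)
    for _ in range(max_tries):
        trial = [x[i] + alpha * alpha * s[i] + alpha * d[i] for i in range(len(x))]
        if f(trial) <= fx + 0.1 * alpha * alpha * decrease:
            return trial, alpha
        alpha *= shrink
    return x, 0.0

# Saddle point of f(x, y) = (x^2 - 1)^2 + y^2 at the origin: the gradient
# vanishes (so s = 0) but the Hessian diag(-4, 2) is indefinite, and
# d = (1, 0) is a direction of negative curvature with d^T H d = -4.
f = lambda v: (v[0] ** 2 - 1.0) ** 2 + v[1] ** 2
x_new, alpha = curvilinear_search(f, [0.0, 0.0], [0.0, 0.0], [1.0, 0.0], decrease=-4.0)
assert f(x_new) < f([0.0, 0.0])    # the step escapes the saddle point
```

Note that a pure Newton or steepest-descent step would stall at this point; it is the negative-curvature term αd that produces the decrease.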
4. Discussion
The partial Cholesky factorization may be implemented in other ways. For example,
the calculation of the matrix H11 can be made independent of the calculation of the
descent direction sk. Once a direction of negative curvature has been defined, a
descent direction can be calculated by forming the modified Cholesky factorization
of B2 (see, e.g., Gill and Murray [GM74], Schnabel and Eskow [SE90]).
The algorithm of Section 2.2 requires the examination of the diagonals and a sin-
gle row of the Schur complement at each step. Alternative strategies can be devised
in which the complete Schur complement is examined under certain exceptional cir-
cumstances. For example, if a pivot is small, the pivot acceptance criterion could be
strengthened so that a pivot is acceptable if, in addition to the requirements of Algo-
rithm 2.1, it is larger in absolute value than νbmax, where bmax is either the diagonal
of largest magnitude in the Schur complement or the element of largest magnitude
in the full Schur complement. Each of these modifications gives an algorithm with
identical theoretical properties, but a potentially smaller value of n1. However, this
potential improvement is at the expense of an increase in the number of compar-
isons during the factorization. The pivot criterion that requires the examination of
the full Schur complement would cope successfully with the "pathological" H(θ) of
Section 2.2, since the factorization would terminate after one step for θ sufficiently
small.
5. Summary
We have shown how a partial Cholesky factorization can be used to define search
directions suitable for a linesearch-based modified Newton method. The resulting
directions are sufficient in the sense that it is possible to generate a sequence {xk}k≥0
with limit points having a zero gradient and a positive-semidefinite Hessian.
To our knowledge, this is the first triangular factorization that not only gives
theoretically satisfactory directions, but also requires only a partial pivoting strat-
egy, i.e., the equivalent of only two rows of the Schur complement need be examined
at each step.
Acknowledgement
We thank the Numerical Algorithms Group, Oxford, for providing the computing
facilities that enabled work on this paper to be completed.
A. Eigenvalues of H(0)
Lemma A.1. Let the n × n matrices L(θ) and B(θ) be defined as in Section 2 for
θ = 0 and n ≥ 3. Define H(0) = L(0)B(0)L(0)ᵀ. Then λ = −½(√(n² + 2n − 7) −
n + 1) is the smallest eigenvalue of H(0), and −1 ≤ λ < −1 + 4/(n + 1).
H(0) = [ −1  1  1  ⋯  1  1  1
         −1  1  1  ⋯  1  1  1
          ⋮                ⋮
         −1  1  1  ⋯  1  1  1
         −1  1  1  ⋯  1  1  0
         −1  1  1  ⋯  1  0  1 ].
Since B(0) has one negative eigenvalue and L(0) is nonsingular, Sylvester's law
of inertia implies that H(0) has one negative eigenvalue (see, e.g., Golub and Van
Loan [GV89, page 416]). Consequently, since λ is negative for n ≥ 3, it is enough
to show that it is an eigenvalue.
Assume that v = (1  −1  −1  ⋯  −1  a  a)ᵀ is an eigenvector of H(0) for some
scalar a. Then, if v is an eigenvector, there must exist a λ such that

H(0)v = λv.   (A.1)

It is straightforward to show that for n ≥ 3, (A.1) has a negative solution λ given by

λ = −(√(n² + 2n − 7) − n + 1)/2   and   a = (√(n² + 2n − 7) + n − 3)/4.
The upper and lower bounds on λ follow from the sequence of inequalities
(Note that the lower bound can also be obtained directly from Lemma 2.1.) ∎
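The closed-form expression for λ and the bounds of Lemma A.1 are easy to verify numerically:

```python
import math

# Check the bounds -1 <= lambda < -1 + 4/(n + 1) of Lemma A.1 for a range of n.
for n in range(3, 201):
    lam = -(math.sqrt(n * n + 2 * n - 7) - n + 1) / 2.0
    assert -1.0 <= lam < -1.0 + 4.0 / (n + 1)

# For n = 3 the formula reduces to lambda = 1 - sqrt(2).
lam3 = -(math.sqrt(3 * 3 + 2 * 3 - 7) - 3 + 1) / 2.0
assert abs(lam3 - (1.0 - math.sqrt(2.0))) < 1e-12
```

As the loop confirms, λ approaches −1 from above as n grows, consistent with the upper bound −1 + 4/(n + 1).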
References
[CP90] E. Casas and C. Pola. An algorithm for indefinite quadratic programming based on a
partial Cholesky factorization. Preprint. Universidad de Cantabria, Santander, Spain,
1990.
[DS89] J. E. Dennis, Jr. and R. B. Schnabel. A view of unconstrained optimization. In G. L.
Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, editors, Handbooks in Operations
Research and Management Science, volume 1: Optimization, chapter 1, pages 1-72. North-
Holland, Amsterdam, New York, Oxford and Tokyo, 1989.
[FF77] R. Fletcher and T. L. Freeman. A modified Newton method for minimization. J. Opti-
mization Theory and Applications, 23, 357-372, 1977.
[FGM91] A. Forsgren, P. E. Gill, and W. Murray. On the identification of local minimizers in
inertia-controlling methods for quadratic programming. SIAM J. on Matrix Analysis
and Applications, 12, 730-746, 1991.
[GM74] P. E. Gill and W. Murray. Newton-type methods for unconstrained and linearly con-
strained optimization. Mathematical Programming, 7, 311-350, 1974.
[Gol80] D. Goldfarb. Curvilinear path steplength algorithms for minimization which use direc-
tions of negative curvature. Mathematical Programming, 18, 31-40, 1980.
[GV89] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University
Press, Baltimore, Maryland, second edition, 1989.
[Hig90] N. J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In
M. G. Cox and S. Hammarling, editors, Reliable Numerical Computation, pages 161-185.
Oxford University Press, 1990.
[KD79] S. Kaniel and A. Dax. A modified Newton's method for unconstrained minimization.
SIAM J. on Numerical Analysis, 16, 324-331, 1979.
[McC77] G. P. McCormick. A modification of Armijo's step-size rule for negative curvature. Math-
ematical Programming, 13, 111-115, 1977.
[MP78] H. Mukai and E. Polak. A second-order method for unconstrained optimization. J.
Optimization Theory and Applications, 26, 501-513, 1978.
[MS79] J. J. Moré and D. C. Sorensen. On the use of directions of negative curvature in a
modified Newton method. Mathematical Programming, 16, 1-20, 1979.
[OR70] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several
Variables. Academic Press, New York, 1970.
[SE90] R. B. Schnabel and E. Eskow. A new modified Cholesky factorization. SIAM J. on
Scientific and Statistical Computing, 11, 1136-1158, 1990.
[SSB85] G. A. Shultz, R. B. Schnabel, and R. H. Byrd. A family of trust-region based algorithms
for unconstrained minimization with strong global convergence properties. SIAM J. on
Numerical Analysis, 22, 47-67, 1985.