Systems Optimization Laboratory

Computing Modified Newton Directions Using
A Partial Cholesky Factorization

by
Anders Forsgren, Philip E. Gill and Walter Murray

TECHNICAL REPORT SOL-93-1
March 1993

Department of Operations Research
Stanford University
Stanford, CA 94305
Research and reproduction of this report were partially supported by the National Science Foundation
Grants DDM-9204208, DDM-9204547; the Department of Energy Grant DE-FG03-92ER25117 and the
Office of Naval Research Grant N00014-90-J-1242.

Research partially supported by the Göran Gustafsson Foundation and the Swedish National Board for
Technical Development.
This paper is simultaneously issued as Report TRITA-MAT-1993-9, Department of Mathematics, Royal
Institute of Technology; Report LMS 93-2, Department of Mathematics, University of California at San
Diego; and Report SOL 93-1, Department of Operations Research, Stanford University. It supersedes
part of Report SOL 89-12 "A modified Newton method for unconstrained minimization", Department of
Operations Research, Stanford University, 1989.
Any opinions, findings, and conclusions or recommendations expressed in this publication are those of
the author(s) and do NOT necessarily reflect the views of the above sponsors.
Also issued as Operations Research Department Technical Report 93-2. Reproduction in whole or in part
is permitted for any purposes of the United States Government. This document has been approved for
public release and sale; its distribution is unlimited.

COMPUTING MODIFIED NEWTON DIRECTIONS USING
A PARTIAL CHOLESKY FACTORIZATION

Anders FORSGREN*, Philip E. GILL† and Walter MURRAY‡

*Optimization and Systems Theory, Department of Mathematics
Royal Institute of Technology, S-100 44 Stockholm, Sweden

†Department of Mathematics
University of California at San Diego, La Jolla, California 92093-0112, USA

‡Systems Optimization Laboratory, Department of Operations Research
Stanford University, Stanford, California 94305-4022, USA

Technical Report SOL 93-1§

March 1993

Abstract
The effectiveness of Newton's method for finding an unconstrained mini-
mizer of a strictly convex twice continuously differentiable function has prompted
the proposal of various modified Newton methods for the nonconvex case.
Linesearch modified Newton methods utilize a linear combination of a de-
scent direction and a direction of negative curvature. If these directions are
sufficient in a certain sense, and a suitable linesearch is used, the resulting
method will generate limit points that satisfy the second-order necessary con-
ditions for optimality.
We propose an efficient method for computing a descent direction and a
direction of negative curvature that is based on a partial Cholesky factoriza-
tion of the Hessian. This factorization not only gives theoretically satisfactory
directions, but also requires only a partial pivoting strategy, i.e., the equivalent
of only two rows of the Schur complement need be examined at each step.
Keywords: Unconstrained minimization, modified Newton method, descent
direction, negative curvature, Cholesky factorization

*Research partially supported by the Göran Gustafsson Foundation and the Swedish National
Board for Technical Development.
‡Research supported by the Department of Energy Contract DE-FG03-92ER25117, the National
Science Foundation Grants DDM-9204208, DDM-9204547, and the Office of Naval Research
Grant N00014-90-J-1242.
§This paper is simultaneously issued as Report TRITA-MAT-1993-9, Department of Mathematics,
Royal Institute of Technology; Report LMS 93-2, Department of Mathematics, University
of California at San Diego; and Report SOL 93-1, Department of Operations Research, Stanford
University. It supersedes part of Report SOL 89-12 "A modified Newton method for unconstrained
minimization", Department of Operations Research, Stanford University, 1989.

1. Introduction

We consider the unconstrained minimization of a twice continuously differentiable
function $f : \mathbb{R}^n \to \mathbb{R}$. If $f$ is strictly convex, the excellent local convergence
properties of Newton's method make it one of the most effective methods for minimization
(see, e.g., Ortega and Rheinboldt [OR70]).
In the non-convex case, various modified Newton methods have been proposed
that ensure convergence from an arbitrary starting point. Here we focus on the
class of linesearch modified Newton methods (for a complete discussion of modified
Newton methods and their relative merits, see, e.g., Shultz et al. [SSB85], Dennis
and Schnabel [DS89]). Linesearch modified Newton methods generate a sequence
$\{x_k\}_{k=0}^{\infty}$ of improving estimates of a local minimizer. At iteration $k$, a linesearch is
performed along a path formed from a linear combination of two directions $s_k$ and
$d_k$, where either $s_k$ or $d_k$ can be zero. The directions $s_k$ and $d_k$ are chosen such
that $g_k^T s_k \le 0$ and $d_k^T H_k d_k \le 0$, where $g_k$ and $H_k$ denote the gradient $\nabla f(x)$ and
Hessian $\nabla^2 f(x)$ evaluated at $x_k$. (Implicitly, we also assume the condition $g_k^T d_k \le 0$,
which can be imposed with a trivial sign change of $d_k$.) Each nonzero $s_k$ satisfies
$g_k^T s_k < 0$ and is known as a descent direction. Each nonzero $d_k$ satisfies $d_k^T H_k d_k < 0$
and is known as a direction of negative curvature. If $d_k$ is nonzero, $H_k$ must have
at least one negative eigenvalue. (Henceforth we will sacrifice precision for the sake
of brevity and refer to the sequences $\{s_k\}$ and $\{d_k\}$ as sequences of "descent
directions" and "directions of negative curvature".) Linesearch methods of this type
have been proposed by Gill and Murray [GM74], Fletcher and Freeman [FF77],
McCormick [McC77], Mukai and Polak [MP78], Kaniel and Dax [KD79], and
Goldfarb [Gol80].
Moré and Sorensen [MS79] have shown that if: (i) a modified Newton method is
used in conjunction with a suitable linesearch; and (ii) the directions $s_k$ and $d_k$ are
sufficient in the sense that the sequences $\{s_k\}$ and $\{d_k\}$ are bounded and satisfy

$$g_k^T s_k \to 0 \implies g_k \to 0 \ \text{and}\ s_k \to 0, \tag{1.1a}$$
and
$$d_k^T H_k d_k \to 0 \implies \min\{\lambda_{\min}(H_k), 0\} \to 0 \ \text{and}\ d_k \to 0, \tag{1.1b}$$
then every limit point of the resulting sequence $\{x_k\}_{k=0}^{\infty}$ will satisfy the second-order
necessary conditions for optimality.
It has been observed in practice that the number of iterates at which the Hessian
is positive definite is large compared to the total number of iterations. Since
linesearch methods revert to Newton's method when the Hessian is sufficiently
positive definite, it would seem sensible to use a modified Newton method based on
the most efficient method for solving a symmetric positive-definite system. This is
the motivation for the modified Cholesky factorization proposed by Gill and
Murray [GM74]. However, it has been shown by Moré and Sorensen [MS79] that this
factorization may not give directions of negative curvature that are sufficient in the
sense of (1.1b). This paper is motivated by the need for an algorithm with the
efficiency and simplicity of the Cholesky factorization, but with the guarantee of
convergence when used with a suitable linesearch. It is shown in Section 3 that a
partial Cholesky factorization can give search directions that are sufficient in the
sense of (1.1).
To simplify the notation, we will drop the subscript $k$ when referring to the
quantities $g_k$, $H_k$, $s_k$ and $d_k$ at a specific iteration. Unless otherwise stated, $\|\cdot\|$
refers to the vector two-norm or its induced matrix norm. The vector $e_j$ denotes
the $j$-th unit vector whose dimension is determined by the context.

2. The partial Cholesky factorization

The partial Cholesky factorization of $H$ is a variant of the standard Cholesky
factorization with diagonal pivoting. The algorithm is stated in outer-product form,
where the Schur complement associated with the unfactorized part of $H$ is
updated explicitly at each step (see, e.g., Golub and Van Loan [GV89, page 143] and
Higham [Hig90]).
At each step, the largest diagonal is selected as pivot and is used to eliminate a
row and column from the Schur complement. The algorithm continues until either
all the matrix has been factorized or the pivot is considered unacceptable. The final
factors are therefore uniquely determined by the rule used to accept the pivot (i.e.,
the rule used to terminate the elimination). Termination is controlled by a
preassigned scalar parameter $\nu$ ($0 < \nu < 1$). A pivot is acceptable if it is both positive
and larger in absolute value than $\nu$ times the off-diagonal of largest magnitude in
the pivot row and column. At each step, the determination of an acceptable pivot
requires the examination of the diagonals and a single row of the Schur
complement. (For a similar scheme in the context of quadratic programming, see Casas
and Pola [CP90].)
It will be shown below that once a pivot is deemed unacceptable (and hence
the factorization is terminated), a suitable direction of negative curvature can be
determined from the elements of the remaining Schur complement.
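The pivot selection and termination rule just described can be sketched in Python. This is an illustrative transcription under stated assumptions, not the authors' code (the report's own listing is Algorithm 2.1, in Matlab): the function name `partial_cholesky`, the default `nu=0.9`, the list-of-lists matrix representation, and the choice to return the factors in permuted coordinates are all choices made for this example.

```python
def partial_cholesky(H, nu=0.9):
    """Sketch of a partial Cholesky factorization with diagonal pivoting.

    Returns (L, B, perm, n1) in permuted coordinates, i.e. with
    Hp[i][j] = H[perm[i]][perm[j]] the factors satisfy Hp = L B L^T,
    where L is unit lower triangular and B = diag(B1, B2).
    """
    n = len(H)
    perm = list(range(n))
    B = [[float(x) for x in row] for row in H]   # working copy (Schur complement)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    n1 = 0
    for k in range(n):
        # Candidate pivot: the largest remaining diagonal.
        r = max(range(k, n), key=lambda i: B[i][i])
        pivot = B[r][r]
        # Largest off-diagonal magnitude in the candidate's row.
        mu = max((abs(B[r][j]) for j in range(k, n) if j != r), default=0.0)
        if not (pivot > 0 and pivot > nu * mu):
            break                                # unacceptable pivot: terminate
        # Symmetric interchange of positions k and r.
        perm[k], perm[r] = perm[r], perm[k]
        B[k], B[r] = B[r], B[k]
        for row in B:
            row[k], row[r] = row[r], row[k]
        for j in range(k):                       # already-computed columns of L
            L[k][j], L[r][j] = L[r][j], L[k][j]
        # Eliminate row and column k from the Schur complement.
        for i in range(k + 1, n):
            L[i][k] = B[i][k] / pivot
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                B[i][j] -= L[i][k] * B[k][j]
        for i in range(k + 1, n):
            B[i][k] = 0.0
            B[k][i] = 0.0
        n1 = k + 1
    return L, B, perm, n1
```

On termination, `B[n1:][n1:]` holds the remaining Schur complement, which is the only part of the matrix needed to build a direction of negative curvature.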
Let $P$ denote the permutation matrix representing the symmetric interchanges
performed during the factorization. If $n_1$ denotes the number of steps needed before
termination, the factorization implicitly identifies a leading $n_1 \times n_1$ positive-definite
submatrix of the permuted matrix $P^T H P$. In terms of a partition $H_{11}$, $H_{12}$, $H_{21}$
and $H_{22}$ of $P^T H P$ (with $H_{21} = H_{12}^T$ by symmetry), we have

$$\begin{pmatrix} H_{11} & H_{12} \\ H_{12}^T & H_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}
\begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}
\begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & I \end{pmatrix}, \tag{2.1}$$

where $L_{11}$ is unit lower triangular and $B_1$ is a positive-definite diagonal matrix.

The submatrix $H_{11}$ is positive definite, and $H_{11} = L_{11} B_1 L_{11}^T$ is its usual Cholesky
factorization obtained using diagonal pivoting. The factorization may be written
briefly as $H = LBL^T$, where $L$ is a row-permuted lower-triangular matrix with

$$L = P \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}. \tag{2.2}$$

We will use $n_2$ to denote the size of $B_2$, so that $n_1 + n_2 = n$. A "pseudo-Matlab"
version of the partial Cholesky algorithm is given in Algorithm 2.1.
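As a short worked step, multiplying out the block factors in (2.1) makes explicit why $B_2$ is the Schur complement of $H_{11}$ in $P^T H P$. The derivation below uses only the blocks defined in (2.1); it is a sketch added for illustration, not part of the original report.

```latex
\begin{aligned}
H_{11} &= L_{11} B_1 L_{11}^T, \\
H_{12} &= L_{11} B_1 L_{21}^T, \\
H_{22} &= L_{21} B_1 L_{21}^T + B_2
\quad\Longrightarrow\quad
B_2 = H_{22} - L_{21} B_1 L_{21}^T = H_{22} - H_{12}^T H_{11}^{-1} H_{12}.
\end{aligned}
```

The last equality follows by substituting the first two identities into $H_{12}^T H_{11}^{-1} H_{12}$.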

The curvature along any direction $d$ computed from the partial Cholesky
factorization is related to the magnitude of the smallest eigenvalue of the Schur
complement $B_2$. The following lemma relates the smallest eigenvalue of $B_2$ to the smallest
eigenvalue of $H$.

Lemma 2.1. Let $H$ be a symmetric $n \times n$ matrix with at least one negative
eigenvalue. Let the partial Cholesky factorization of $H$ be denoted by $H = LBL^T$, where
$P^T H P$ is partitioned as in (2.1). Then

$$\lambda_{\min}(B_2) \le \lambda_{\min}(H) \quad\text{and}\quad B_2 = Y^T H Y,$$

where
$$Y = P \begin{pmatrix} -H_{11}^{-1} H_{12} \\ I \end{pmatrix}
= P \begin{pmatrix} -L_{11}^{-T} L_{21}^T \\ I \end{pmatrix}. \tag{2.3}$$
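Proof. The inequality $\lambda_{\min}(B_2) \le \lambda_{\min}(H)$ can be established using the identity

$$\begin{pmatrix} 0 & 0 \\ 0 & B_2 \end{pmatrix}
= P^T H P - \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} B_1
\begin{pmatrix} L_{11}^T & L_{21}^T \end{pmatrix}, \tag{2.4}$$

which is a rearrangement of the factorization (2.1). The eigenvalues of $H$ and
$P^T H P$ are identical. Moreover, the positive-definiteness of $B_1$ implies that the
second term on the right-hand side of (2.4) is positive semidefinite. Since the
eigenvalues of $P^T H P$ cannot increase on subtraction of a positive semidefinite
matrix, it must follow that $\min\{0, \lambda_{\min}(B_2)\} \le \lambda_{\min}(H)$ (see e.g., Golub and Van
Loan [GV89, page 411]). From the assumption $\lambda_{\min}(H) < 0$, we conclude that
$\lambda_{\min}(B_2) \le \lambda_{\min}(H)$, as required.
To show that the matrix $Y$ (2.3) is well defined, it is sufficient to verify that
$H_{11}^{-1} H_{12} = L_{11}^{-T} L_{21}^T$. This is an immediate consequence of multiplying the partitioned
right-hand-side matrix from (2.1) to obtain $H_{11} = L_{11} B_1 L_{11}^T$ and $H_{12} = L_{11} B_1 L_{21}^T$.
Finally, the identity $Y^T H Y = B_2$ may be verified by expressing $L^{-1} H L^{-T} = B$
in the partitioned form

$$\begin{pmatrix} L_{11}^{-1} & 0 \\ -L_{21} L_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} H_{11} & H_{12} \\ H_{12}^T & H_{22} \end{pmatrix}
\begin{pmatrix} L_{11}^{-T} & -L_{11}^{-T} L_{21}^T \\ 0 & I \end{pmatrix}
= \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix},$$

from which the result follows. ∎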

Note that the matrix $Y$ (2.3) consists of the last $n_2$ columns of $L^{-T}$. Our analysis
requires bounds on the norms of $Y$, $L$ and $L^{-1}$, which are provided by the following
lemma given by Higham [Hig90].

Lemma 2.2. Let $H$ be factorized using the partial Cholesky factorization described
in Algorithm 2.1. If $P^T H P$ is partitioned as in (2.1), then

(a) $\|L_{11}^{-T} L_{21}^T\|_2 \le \dfrac{1}{\nu} \sqrt{\tfrac{1}{3} n_2 (4^{n_1} - 1)}$;

(b) $\|L_{11}^{-T} L_{21}^T e_i\|_2 \le \dfrac{1}{\nu} \sqrt{\tfrac{1}{3} (4^{n_1} - 1)}$;

(c) $\|L\| \le \dfrac{n}{\nu}$;

(d) $\|L^{-1}\| \le \dfrac{n\, 2^{n_1 - 1}}{\nu}$.

Proof. Part (a) follows immediately from Lemma 9.4 of Higham [Hig90] and the
fact that the elements of $L_{21}$ are bounded in absolute value by $1/\nu$. Part (b) is
a consequence of part (a), since $L_{21}^T e_i$ is an $n_1$-vector whose elements are bounded
in absolute value by $1/\nu$. Part (c) follows from the fact that all elements of $L$ are
bounded by $1/\nu$ in absolute value. Similarly, part (d) is a consequence of the fact
that all elements of $L^{-1}$ are bounded by $2^{n_1 - 1}/\nu$ (see Higham [Hig90] for details).
∎
2.1. Computation of the descent direction


We now discuss the application of the partial Cholesky factorization to the
calculation of a descent direction $s_k$ satisfying (1.1a). Let $\bar{B}$ be any positive-definite
modification of $B$, i.e., $\bar{B}$ is a positive-definite matrix with $\|B - \bar{B}\|$ "small" and
$\bar{B} = B$ when $B$ is sufficiently positive definite. There are many choices for $\bar{B}$; for
example, consider the block-diagonal matrix $\bar{B} = \mathrm{diag}(B_1, I)$, where $I$ is the identity
matrix of order $n_2$. With this definition, when $n_1 = n$ and $H$ is sufficiently positive
definite, $\bar{B} = B_1$ and $s$ satisfies the usual Newton equations $Hs = -g$.

Lemma 2.3. Let $H$ be factorized using the partial Cholesky factorization described
in Algorithm 2.1 and assume that $P^T H P$ is partitioned as in (2.1). Let $\bar{B}$ be a
positive-definite modification of $B$, and let $s$ satisfy

$$L \bar{B} L^T s = -g. \tag{2.5}$$

Then,
$$-g^T s \ge \frac{\nu^2 \|g\|^2}{n^2 \lambda_{\max}(\bar{B})}
\quad\text{and}\quad
\|s\| \le \frac{n^2\, 4^{n_1 - 1} \|g\|}{\nu^2 \lambda_{\min}(\bar{B})}.$$

Proof. From the definition of $s$ in (2.5) we have

$$s = -L^{-T} \bar{B}^{-1} L^{-1} g. \tag{2.6}$$

Premultiplying (2.6) by $g^T$ gives
$$-g^T s = g^T L^{-T} \bar{B}^{-1} L^{-1} g \ge \frac{g^T g}{\|L\|^2 \lambda_{\max}(\bar{B})},$$
and the required lower bound on $-g^T s$ follows from part (c) of Lemma 2.2. To
obtain the bound on $\|s\|$ we derive the inequality $\|s\| \le \|L^{-1}\|^2 \|\bar{B}^{-1}\| \|g\|$, by
taking norms of both sides of (2.6), substituting for $L$ from (2.2) and using norm
inequalities. The required upper bound follows from part (d) of Lemma 2.2. ∎
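The solve in (2.5) with the choice $\bar{B} = \mathrm{diag}(B_1, I)$ amounts to a forward substitution, a diagonal scaling and a back substitution. The following Python sketch illustrates this under stated assumptions: the factors are supplied in permuted coordinates (unit lower-triangular `L`, block-diagonal `B`, permutation `perm`, block size `n1`), and the function name `descent_direction` and the example data are hypothetical. The example factors are those of a 4 x 4 matrix with first row (1, -1, -1, -1), ones elsewhere, and zeros in the (3,4) and (4,3) positions, for which the factorization stops after one step.

```python
def descent_direction(L, B, perm, n1, g):
    """Solve L Bbar L^T s = -g with Bbar = diag(B1, I), in permuted coordinates."""
    n = len(g)
    # Right-hand side -P^T g.
    y = [-g[perm[i]] for i in range(n)]
    # Forward substitution with the unit lower-triangular L.
    for i in range(n):
        for j in range(i):
            y[i] -= L[i][j] * y[j]
    # Scale by Bbar^{-1}: B1 is diagonal, the trailing block of Bbar is I.
    z = [y[i] / B[i][i] if i < n1 else y[i] for i in range(n)]
    # Back substitution with L^T.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            z[i] -= L[j][i] * z[j]
    # Undo the permutation.
    s = [0.0] * n
    for i in range(n):
        s[perm[i]] = z[i]
    return s


# Example factors (assumed given): n1 = 1, B1 = (1), B2 = [[0,0,0],[0,0,-1],[0,-1,0]].
L_ex = [[1.0, 0.0, 0.0, 0.0],
        [-1.0, 1.0, 0.0, 0.0],
        [-1.0, 0.0, 1.0, 0.0],
        [-1.0, 0.0, 0.0, 1.0]]
B_ex = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, -1.0],
        [0.0, 0.0, -1.0, 0.0]]
perm_ex = [0, 1, 2, 3]
g_ex = [1.0, 2.0, 0.0, -1.0]

s_ex = descent_direction(L_ex, B_ex, perm_ex, 1, g_ex)
slope = sum(gi * si for gi, si in zip(g_ex, s_ex))   # g^T s, negative for a descent direction
```

Because $L \bar{B} L^T$ is positive definite, the computed `slope` is negative whenever `g_ex` is nonzero, in line with the lower bound on $-g^T s$ in Lemma 2.3.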

2.2. Computation of the direction of negative curvature


The formula for $d$ is derived from a method for computing directions of negative
curvature in quadratic programming (see Forsgren et al. [FGM91]). The approach
is based on the observation that, in the positive-definite case, the Newton direction
is a minimizer of a quadratic model with gradient $g$ and Hessian $H$. In particular,
the Newton direction can be found by a quadratic programming algorithm that
minimizes the model function while successively releasing variables from temporarily
fixed values. This analogy can be extended to the indefinite case, where the variables
corresponding to $H_{22}$ are temporarily fixed at their current values, and a direction
of negative curvature is defined by releasing either one or two of the fixed variables.
This scheme corresponds to using a direction of negative curvature that is a multiple
of either $y_i$ or $y_i \pm y_j$, where $y_i$ and $y_j$ denote columns $i$ and $j$ of the matrix $Y$ (2.3).
The following lemma shows how the indices $i$ and $j$ are determined from the elements
of $B_2 = Y^T H Y$.

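Lemma 2.4. On termination of the partial Cholesky factorization with diagonal
pivoting, let $P^T H P$ be partitioned as in (2.1). If $n_1 = n$, define $d = 0$. Otherwise,
if $n_1 < n$, define $d$ as follows. Given $\rho = \max_{i > n_1,\, j > n_1} |b_{ij}|$ and any pair of indices
$q$ ($q > n_1$) and $r$ ($r > n_1$) such that $|b_{qr}| = \rho$, let $d$ be the solution of

$$L^T d = \sqrt{\rho}\, v, \quad\text{where}\quad
v = \begin{cases} e_q & \text{if } q = r, \\
\frac{1}{\sqrt{2}} \bigl(e_q - \operatorname{sgn}(b_{qr})\, e_r\bigr) & \text{otherwise.} \end{cases}$$

Then, if $\lambda_{\min}(H) \ge 0$, then $d = 0$. Otherwise, if $\lambda_{\min}(H) < 0$, then

$$-\frac{1}{n_2} \lambda_{\min}(H) \le d^T d \le
-\frac{1}{1 - \nu} \left(1 + \frac{2(4^{n_1} - 1)}{3\nu^2}\right)^{2} \lambda_{\min}(H)$$

and
$$\frac{d^T H d}{d^T d} \le
\frac{3\nu^2 (1 - \nu)}{n_2 \bigl(3\nu^2 + 2(4^{n_1} - 1)\bigr)}\, \lambda_{\min}(H).$$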

Proof. If $n_1 = n$, then $\lambda_{\min}(H) > 0$, and the lemma holds from the definition $d = 0$.
For the remainder of the proof, assume that $n_1 < n$.
First, it is necessary to show that $\gamma \le \nu\rho$, where $\gamma = \max\{\max_{i > n_1} b_{ii},\, 0\}$.
If the factorization terminates with $\gamma = 0$, the inequality $\gamma \le \nu\rho$ is trivially
satisfied. If the factorization terminates with $\gamma > 0$, there exists an index $t$ ($t > n_1$)
such that $b_{tt} = \gamma$. Since $\gamma$ must be an unacceptable pivot, we can infer that
$\gamma \le \nu \max_{j \ne t,\, j > n_1} |b_{tj}|$. Consequently, if $n_1 < n$, it must hold that $\gamma \le \nu\rho$.
Let $d_1$ and $v_1$ denote the first $n_1$ components of $P^T d$ and $v$ respectively. Similarly,
let $d_2$ and $v_2$ denote the last $n_2$ components of $P^T d$ and $v$. The definitions of $d$ and
$v$ imply that $\|v_1\| = 0$, $\|v_2\| = 1$, and $d_2 = \sqrt{\rho}\, v_2$. Therefore,

$$d^T d = d_1^T d_1 + d_2^T d_2 \ge \rho\, v_2^T v_2 = \rho. \tag{2.7}$$

Similarly, the definition of $d$ and (2.2) imply that

$$d^T d \le \bigl(1 + \|L_{11}^{-T} L_{21}^T v_2\|^2\bigr) \rho
\le \left(1 + \frac{2(4^{n_1} - 1)}{3\nu^2}\right) \rho, \tag{2.8}$$

where the last inequality follows from Lemma 2.2. Combining (2.7) and (2.8) yields

$$\rho \le d^T d \le \left(1 + \frac{2(4^{n_1} - 1)}{3\nu^2}\right) \rho. \tag{2.9}$$

Consider the case $\rho = 0$, which is equivalent to $H$ being positive semidefinite
and singular with $\lambda_{\min}(H) = 0$. In this case, (2.9) implies $d = 0$, as required.
Now assume that $\rho > 0$. First, if $q = r$, then $|b_{qq}| = \rho$. Since $b_{qq} \le \gamma \le \nu\rho < \rho$,
it must hold that $b_{qq} = -\rho$, and from the definition of $d$ we obtain the bound

$$d^T H d = \rho\, b_{qq} \le -(1 - \nu)\rho^2. \tag{2.10}$$

Alternatively, if $q \ne r$, then the definition of $d$ yields

$$d^T H d = \frac{\rho}{2} \bigl(b_{qq} + b_{rr} - 2|b_{qr}|\bigr) \le \rho(\gamma - \rho) \le -(1 - \nu)\rho^2, \tag{2.11}$$

where the inequalities follow from the conditions $b_{qq} \le \gamma$, $b_{rr} \le \gamma$ and $\gamma \le \nu\rho$.
Since the magnitude of every element in $B_2$ is bounded by $\rho$, the Gershgorin
circle theorem and Lemma 2.1 imply

$$\rho \ge -\frac{1}{n_2} \lambda_{\min}(B_2) \ge -\frac{1}{n_2} \lambda_{\min}(H). \tag{2.12}$$

Combining (2.9), (2.10), (2.11) and (2.12) we obtain

$$\frac{d^T H d}{d^T d} \le -\frac{3\nu^2 (1 - \nu)}{3\nu^2 + 2(4^{n_1} - 1)}\, \rho
\le \frac{3\nu^2 (1 - \nu)}{n_2 \bigl(3\nu^2 + 2(4^{n_1} - 1)\bigr)}\, \lambda_{\min}(H), \tag{2.13}$$

as required.
Since, by definition, $\lambda_{\min}(H) \le d^T H d / d^T d$, the left-most inequality of (2.13) gives
an upper bound on $\rho$, which in conjunction with (2.9) and (2.12) gives the bound
on $d^T d$ as

$$-\frac{1}{n_2} \lambda_{\min}(H) \le d^T d \le
-\frac{1}{1 - \nu} \left(1 + \frac{2(4^{n_1} - 1)}{3\nu^2}\right)^{2} \lambda_{\min}(H). \qquad ∎$$

This lemma gives a relation between the curvature along $d$ and the smallest
eigenvalue of $H$, which is the "best possible" curvature. The bound is exponential
in $n_1$, but the computational experiments discussed below imply that the bound is
unlikely to be tight in practice. However, as in Higham [Hig90], we observe that
there do exist matrices whose bound is "almost" tight. For given $n$ ($n \ge 3$) and $\theta$,
define $L(\theta)$ and $B(\theta)$ as

$$L(\theta) = \begin{pmatrix}
1 \\
-\cos\theta & 1 \\
-\cos\theta & -\cos\theta & 1 \\
\vdots & \vdots & & \ddots \\
-\cos\theta & -\cos\theta & \cdots & -\cos\theta & 1 \\
-\cos\theta & -\cos\theta & \cdots & -\cos\theta & -\cos\theta & 1 \\
-\cos\theta & -\cos\theta & \cdots & -\cos\theta & -\cos\theta & 0 & 1
\end{pmatrix}$$
and
$$B(\theta) = \begin{pmatrix}
1 \\
& \sin^2\theta \\
& & \sin^4\theta \\
& & & \ddots \\
& & & & \sin^{2(n-3)}\theta \\
& & & & & 0 & -1 \\
& & & & & -1 & 0
\end{pmatrix}.$$

Define $H(\theta) = L(\theta) B(\theta) L(\theta)^T$. If $\theta = 0$, it is shown in Lemma A.1 of Appendix A
that $\lambda_{\min}(H(0)) = -\tfrac{1}{2}\bigl(\sqrt{n^2 + 2n - 7} - n + 1\bigr)$, where

$$-1 \le \lambda_{\min}(H(0)) < -1 + \frac{4}{n + 1}.$$

If $\theta = 0$, the partial Cholesky factorization with diagonal pivoting gives $n_1 = 1$.
If $d(0)$ denotes the direction of negative curvature associated with $H(0)$, we obtain
$$\frac{d(0)^T H(0)\, d(0)}{d(0)^T d(0)} = -\frac{1}{3}, \tag{2.14}$$
and $d(0)$ is a satisfactory direction of negative curvature. However, if $\theta$ is nonzero, it
follows from the analysis of Higham [Hig90] that the partial Cholesky factorization
with diagonal pivoting will define $L(\theta)$ and $B(\theta)$ as factors with $n_1 = n - 2$ for all
$\theta \ne 0$. Moreover,
$$\lim_{\theta \to 0} \frac{d(\theta)^T H(\theta)\, d(\theta)}{d(\theta)^T d(\theta)} = -\frac{3}{1 + 2 \cdot 4^{n-2}},$$
and for $\theta$ near zero, the curvature along $d(\theta)$ is close to the worst possible value
predicted by Lemma 2.4 (see Higham [Hig90] for the details). This "pathological"
example arises because the principal submatrix of order $n - 2$ of $H(\theta)$ is positive
definite but arbitrarily close to being singular so that $\|H_{11}^{-1} H_{12}\|$ (or equivalently
$\|L_{11}^{-T} L_{21}^T\|$) is very large. This is reflected in arbitrarily small pivot elements.
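The construction of $d$ in Lemma 2.4 can be checked numerically on the $\theta = 0$ example. The sketch below is illustrative only: it assumes the factors are supplied in permuted coordinates (unit lower-triangular `L`, block-diagonal `B`, permutation `perm`, block size `n1`), the names `negative_curvature_direction` and `curvature_ratio` are hypothetical, and the factors of the 4 x 4 instance of $H(0)$ are entered by hand rather than computed.

```python
import math


def negative_curvature_direction(L, B, perm, n1):
    """Build d from the Schur complement B2 = B[n1:, n1:], following Lemma 2.4."""
    n = len(L)
    # rho is the largest magnitude in B2; (q, r) is a position attaining it.
    rho, q, r = 0.0, n1, n1
    for i in range(n1, n):
        for j in range(i, n):
            if abs(B[i][j]) > rho:
                rho, q, r = abs(B[i][j]), i, j
    if n1 == n or rho == 0.0:
        return [0.0] * n
    v = [0.0] * n
    if q == r:
        v[q] = 1.0
    else:
        sign = 1.0 if B[q][r] >= 0 else -1.0
        v[q] = 1.0 / math.sqrt(2.0)
        v[r] = -sign / math.sqrt(2.0)
    # Back substitution for L^T dp = sqrt(rho) v (L is unit lower triangular).
    dp = [math.sqrt(rho) * vi for vi in v]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            dp[i] -= L[j][i] * dp[j]
    # Undo the permutation.
    d = [0.0] * n
    for i in range(n):
        d[perm[i]] = dp[i]
    return d


def curvature_ratio(H, d):
    """Return d^T H d / d^T d."""
    n = len(d)
    Hd = [sum(H[i][j] * d[j] for j in range(n)) for i in range(n)]
    return sum(di * hi for di, hi in zip(d, Hd)) / sum(di * di for di in d)


# H(0) for n = 4 and its hand-entered factors (n1 = 1, identity permutation).
H0 = [[1.0, -1.0, -1.0, -1.0],
      [-1.0, 1.0, 1.0, 1.0],
      [-1.0, 1.0, 1.0, 0.0],
      [-1.0, 1.0, 0.0, 1.0]]
L0 = [[1.0, 0.0, 0.0, 0.0],
      [-1.0, 1.0, 0.0, 0.0],
      [-1.0, 0.0, 1.0, 0.0],
      [-1.0, 0.0, 0.0, 1.0]]
B0 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, -1.0],
      [0.0, 0.0, -1.0, 0.0]]

d0 = negative_curvature_direction(L0, B0, [0, 1, 2, 3], 1)
ratio0 = curvature_ratio(H0, d0)
```

For this instance $d^T d = 3$ and $d^T H d = -1$, so `ratio0` agrees with the value $-1/3$ in (2.14) up to rounding.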
A numerical experiment was devised to investigate if the bound of Lemma 2.4
is likely to be sharp for an arbitrary indefinite matrix. Matlab 4.0 was used to
generate directions of negative curvature for a large set of random indefinite symmetric
matrices of order 50. Each $H$ was defined as $Q \Lambda Q^T$, with $Q$ a random orthogonal
matrix and $\Lambda$ a random diagonal matrix with at least one negative element. The
matrix $Q$ was obtained from the QR-factorization of a $50 \times 50$ matrix whose
elements were taken from an independent normal distribution with zero mean and unit
variance. The elements of $\Lambda$ were taken from an independent uniform distribution in
the interval $[-25, 25]$. Directions of negative curvature were computed with $\nu$-values
$\sqrt{\epsilon}$, 0.05, 0.10, ..., 0.95, and $1 - \sqrt{\epsilon}$, where $\epsilon$ denotes the machine precision. A new
random matrix was generated for each factorization, giving a total of 1500 matrices
for each value of $\nu$. Figure 2.1 gives the outcome of the computational experiment.
The three lines depict the maximum, mean, and minimum values of the ratio $r$ of
$d^T H d / d^T d$ to $\lambda_{\min}(H)$. Each "+" represents the value of $r$ for a particular value of
the parameter $\nu$.

[Plot not reproduced: maximum, mean, and minimum values of r against ν, for ν from 0 to 1.]

Figure 2.1: Curvature ratio r as function of ν.

The bound on $r$ given by Lemma 2.4 is approximately maximized for $\nu = 2/3$. If,
for $n = 50$, this optimal value gives $n_1 = 49$, the theoretical bound is approximately
$7 \times 10^{-31}$. This should be compared with the computed values of $r$, which never
fell below 0.05 when $\nu$ was larger than 0.5. The minimum value of $r$ attained a
maximum of 0.0809 for $\nu = 0.9$. Based on these results, we would recommend a
value of $\nu$ in the range (0.5, 0.95). Note that the larger the value of $\nu$, the smaller
the value of $n_1$ and consequently, the smaller the amount of computation.
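The quoted figure of roughly $7 \times 10^{-31}$ can be reproduced by evaluating the factor multiplying $\lambda_{\min}(H)$ in the curvature bound of Lemma 2.4, namely $3\nu^2(1-\nu) / \bigl(n_2(3\nu^2 + 2(4^{n_1}-1))\bigr)$. The short Python sketch below does this arithmetic; the function name `curvature_ratio_bound` is a label chosen for this example.

```python
def curvature_ratio_bound(nu, n1, n2):
    """Lower bound on r = (d^T H d / d^T d) / lambda_min(H) implied by Lemma 2.4."""
    return 3.0 * nu**2 * (1.0 - nu) / (n2 * (3.0 * nu**2 + 2.0 * (4**n1 - 1)))


# n = 50 with n1 = 49 accepted pivots leaves a Schur complement of order n2 = 1.
bound = curvature_ratio_bound(2.0 / 3.0, 49, 1)
```

Since the denominator is dominated by $2 \cdot 4^{n_1}$, the bound behaves like a multiple of $\nu^2(1-\nu)$ for fixed $n_1$, which is largest at $\nu = 2/3$.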

3. Theoretical results

The partial Cholesky factorization can be used as the basis for a descent method for
minimizing a twice-continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$. This method
defines a sequence $\{x_k\}_{k=0}^{\infty}$ of improving estimates of a local minimizer.
Let $x_0$ be any starting point such that the level set $\{x \mid f(x) \le f(x_0)\}$ is compact.
Let $\{s_k\}$ and $\{d_k\}$ be bounded sequences such that each $s_k$ is a descent direction
that satisfies (1.1a) and each $d_k$ is a direction of negative curvature that satisfies
(1.1b). Moré and Sorensen [MS79] show that with an appropriate linesearch, certain
linear combinations of $s_k$ and $d_k$ define $x_{k+1}$ so that every limit point of $\{x_k\}_{k=0}^{\infty}$
will satisfy the second-order necessary conditions for optimality, i.e., at every limit
point $\bar{x}$, $\nabla f(\bar{x})$ is zero and $\nabla^2 f(\bar{x})$ is positive semidefinite. The main result of this
paper, that the search directions obtained using the partial Cholesky factorization
are sufficient in the sense of Moré and Sorensen [MS79], is stated in the following
theorem.

Theorem 3.1. Let $\{x_k\}_{k=0}^{\infty}$ be a sequence of iterates contained in a compact region
of $\mathbb{R}^n$, and assume that $f : \mathbb{R}^n \to \mathbb{R}$ is a twice-continuously differentiable function.
For each $k$, define $g_k = \nabla f(x_k)$ and $H_k = \nabla^2 f(x_k)$, and let $H_k = L_k B_k L_k^T$ be the
partial Cholesky factorization of $H_k$ as described in Algorithm 2.1. Given positive
constants $c_1$ and $c_2$ ($c_1 < c_2$), let $s_k$ be defined from Lemma 2.3 with the additional
requirement that $c_1 \le \lambda_{\min}(\bar{B}_k) \le \lambda_{\max}(\bar{B}_k) \le c_2$. Finally, let $d_k$ be defined from
Lemma 2.4. Then, $\{s_k\}$ and $\{d_k\}$ are bounded sequences such that

$$g_k^T s_k \to 0 \implies g_k \to 0 \ \text{and}\ s_k \to 0$$

and
$$d_k^T H_k d_k \to 0 \implies \min\{\lambda_{\min}(H_k), 0\} \to 0 \ \text{and}\ d_k \to 0.$$

Proof. Since $\{x_k\}$ lies in a compact region, the smoothness of $f$ implies that $\{\|g_k\|\}$
and $\{\|H_k\|\}$ are bounded.
With the existence of $c_1$ and $c_2$, and the boundedness of $\|g_k\|$, Lemma 2.3 implies
that $\{s_k\}$ is a bounded sequence, and $g_k^T s_k \to 0$ implies $g_k \to 0$ and $s_k \to 0$, as
required.
Lemma 2.4 and the boundedness of $\|H_k\|$ imply that $\{d_k\}$ is a bounded sequence,
and $d_k^T H_k d_k \to 0$ implies $d_k \to 0$ and $\min\{\lambda_{\min}(H_k), 0\} \to 0$, as required. ∎

If $\nabla^2 f(x_k)$ is sufficiently positive definite, all pivots will be acceptable and the
partial Cholesky factorization will terminate with $n_1 = n$. This implies that if
$\{x_k\}_{k=0}^{\infty}$ has a limit point $\bar{x}$ at which $\nabla^2 f(\bar{x})$ is sufficiently positive definite, then the
iterates will be identical to those of Newton's method for $k$ sufficiently large.
4. Discussion

The partial Cholesky factorization may be implemented in other ways. For example,
the calculation of the matrix HI, can be made independent of the calculation of the
descent direction Sk. Once a direction of negative curvature has been defined, a
descent direction can be calculated by forming the modified Cholesky factorization
of B 2 (see, e.g., Gill and Murray [GM74], Schnabel and Eskow [SE90]).
The algorithm of Section 2.2 requires the examination of the diagonals and a
single row of the Schur complement at each step. Alternative strategies can be devised
in which the complete Schur complement is examined under certain exceptional
circumstances. For example, if a pivot is small, the pivot acceptance criterion could be
strengthened so that a pivot is acceptable if, in addition to the requirements of
Algorithm 2.1, it is larger in absolute value than $\nu b_{\max}$, where $b_{\max}$ is either the diagonal
of largest magnitude in the Schur complement or the element of largest magnitude
in the full Schur complement. Each of these modifications gives an algorithm with
identical theoretical properties, but a potentially smaller value of $n_1$. However, this
potential improvement is at the expense of an increase in the number of
comparisons during the factorization. The pivot criterion that requires the examination of
the full Schur complement would cope successfully with the "pathological" $H(\theta)$ of
Section 2.2 since the factorization would terminate after one step for $\theta$ sufficiently
small.
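A strengthened acceptance test of this kind might look as follows in Python. This is a sketch under stated assumptions rather than a prescription from the report: the function name `pivot_acceptable` and its arguments are hypothetical, and the extra test is applied whenever the caller supplies `b_max`, because the report does not specify a threshold for when a pivot counts as "small".

```python
def pivot_acceptable(pivot, mu_row, nu, b_max=None):
    """Pivot test of Algorithm 2.1, optionally strengthened with nu * b_max.

    pivot  : candidate diagonal element of the Schur complement
    mu_row : largest off-diagonal magnitude in the pivot row
    nu     : tolerance with 0 < nu < 1
    b_max  : optional largest magnitude (diagonal or full Schur complement);
             when given, the pivot must also exceed nu * b_max in magnitude
    """
    ok = pivot > 0 and pivot > nu * mu_row
    if ok and b_max is not None:
        ok = abs(pivot) > nu * b_max
    return ok
```

With `b_max` taken over the full Schur complement, a tiny positive pivot sitting next to an element of magnitude one is rejected, which is the situation that arises in the second step of the $H(\theta)$ example for small $\theta$.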

5. Summary

We have shown how a partial Cholesky factorization can be used to define search
directions suitable for a linesearch-based modified Newton method. The resulting
directions are sufficient in the sense that it is possible to generate a sequence $\{x_k\}_{k=0}^{\infty}$
with limit points having a zero gradient and a positive-semidefinite Hessian.
To our knowledge, this is the first triangular factorization that not only gives
theoretically satisfactory directions, but also requires only a partial pivoting
strategy, i.e., the equivalent of only two rows of the Schur complement need be examined
at each step.

Acknowledgement

We thank the Numerical Algorithms Group, Oxford, for providing the computing
facilities that enabled work on this paper to be completed.

A. Eigenvalues of H(0)

Lemma A.1. Let the $n \times n$ matrices $L(\theta)$ and $B(\theta)$ be defined as in Section 2 for
$\theta = 0$ and $n \ge 3$. Define $H(0) = L(0) B(0) L(0)^T$. Then $\lambda = -\tfrac{1}{2}\bigl(\sqrt{n^2 + 2n - 7} - n + 1\bigr)$
is the smallest eigenvalue of $H(0)$, and $-1 \le \lambda < -1 + 4/(n + 1)$.

Proof. It is straightforward to verify that

$$H(0) = \begin{pmatrix}
1 & -1 & -1 & \cdots & -1 & -1 & -1 \\
-1 & 1 & 1 & \cdots & 1 & 1 & 1 \\
-1 & 1 & 1 & \cdots & 1 & 1 & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
-1 & 1 & 1 & \cdots & 1 & 1 & 1 \\
-1 & 1 & 1 & \cdots & 1 & 1 & 0 \\
-1 & 1 & 1 & \cdots & 1 & 0 & 1
\end{pmatrix}.$$

Since $B(0)$ has one negative eigenvalue and $L(0)$ is nonsingular, Sylvester's law
of inertia implies that $H(0)$ has one negative eigenvalue (see e.g., Golub and Van
Loan [GV89, page 416]). Consequently, since $\lambda$ is negative for $n \ge 3$, it is enough
to show that it is an eigenvalue.
Assume that $v = (1\ \ {-1}\ \ {-1}\ \cdots\ {-1}\ \ \alpha\ \ \alpha)^T$ is an eigenvector of $H(0)$ for some
scalar $\alpha$. Then, if $v$ is an eigenvector, there must exist a $\lambda$ such that

$$n - 2 - 2\alpha = \lambda \quad\text{and} \tag{A.1a}$$

$$-n + 2 + \alpha = \lambda\alpha. \tag{A.1b}$$

It is straightforward to show that for $n \ge 3$, (A.1) has a negative solution $\lambda$ given
by
$$\lambda = -\frac{\sqrt{n^2 + 2n - 7} - n + 1}{2}
\quad\text{and}\quad
\alpha = \frac{\sqrt{n^2 + 2n - 7} + n - 3}{4}.$$
The upper and lower bounds on $\lambda$ follow from the sequence of inequalities

$$n + 1 \ge \sqrt{(n + 1)^2 - 8} = (n + 1)\sqrt{1 - \frac{8}{(n + 1)^2}} \ge n + 1 - \frac{8}{n + 1}.$$

(Note that the lower bound can also be obtained directly from Lemma 2.1.) ∎
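The eigenpair of Lemma A.1 is easy to confirm numerically. The Python sketch below builds $H(0)$ entry by entry, evaluates the closed forms for $\lambda$ and $\alpha$, and measures the residual $\|H(0)v - \lambda v\|_\infty$; the helper names `build_H0`, `lemma_a1_values` and `eigen_residual` are hypothetical labels chosen for this example.

```python
import math


def build_H0(n):
    """H(0): ones everywhere, -1 in the first row and column off the diagonal,
    and zeros in the (n-1, n) and (n, n-1) positions."""
    H = [[1.0] * n for _ in range(n)]
    for j in range(1, n):
        H[0][j] = -1.0
        H[j][0] = -1.0
    H[n - 2][n - 1] = 0.0
    H[n - 1][n - 2] = 0.0
    return H


def lemma_a1_values(n):
    """Closed forms for the eigenvalue lambda and the scalar alpha."""
    root = math.sqrt(n * n + 2 * n - 7)
    return -0.5 * (root - n + 1), (root + n - 3) / 4.0


def eigen_residual(n):
    """Infinity norm of H(0) v - lambda v for v = (1, -1, ..., -1, alpha, alpha)."""
    H = build_H0(n)
    lam, alpha = lemma_a1_values(n)
    v = [1.0] + [-1.0] * (n - 3) + [alpha, alpha]
    Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(Hv[i] - lam * v[i]) for i in range(n))
```

Calling `eigen_residual(n)` for a few values of `n` returns values at rounding level, and `lemma_a1_values(n)[0]` stays within the interval $[-1, -1 + 4/(n+1))$.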

References
[CP90] E. Casas and C. Pola. An algorithm for indefinite quadratic programming based on a
partial Cholesky factorization. Preprint. Universidad de Cantabria, Santander, Spain,
1990.
[DS89] J. E. Dennis, Jr. and R. B. Schnabel. A view of unconstrained optimization. In G. L.
Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, editors, Handbooks in Operations
Research and Management Science, volume 1. Optimization, chapter I, pages 1-72. North
Holland, Amsterdam, New York, Oxford and Tokyo, 1989.
[FF77] R. Fletcher and T. L. Freeman. A modified Newton method for minimization. J.
Optimization Theory and Applications, 23, 357-372, 1977.
[FGM91] A. Forsgren, P. E. Gill, and W. Murray. On the identification of local minimizers in
inertia-controlling methods for quadratic programming. SIAM J. on Matrix Analysis
and Applications, 12, 730-746, 1991.
[GM74] P. E. Gill and W. Murray. Newton-type methods for unconstrained and linearly
constrained optimization. Mathematical Programming, 7, 311-350, 1974.
[Gol80] D. Goldfarb. Curvilinear path steplength algorithms for minimization which use
directions of negative curvature. Mathematical Programming, 18, 31-40, 1980.
[GV89] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University
Press, Baltimore, Maryland, second edition, 1989.
[Hig90] N. J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In
M. G. Cox and S. Hammarling, editors, Reliable Numerical Computation, pages 161-185.
Oxford University Press, 1990.
[KD79] S. Kaniel and A. Dax. A modified Newton's method for unconstrained minimization.
SIAM J. on Numerical Analysis, 16, 324-331, 1979.
[McC77] G. P. McCormick. A modification of Armijo's step-size rule for negative curvature.
Mathematical Programming, 13, 111-115, 1977.
[MP78] H. Mukai and E. Polak. A second-order method for unconstrained optimization. J.
Optimization Theory and Applications, 26, 501-513, 1978.
[MS79] J. J. Moré and D. C. Sorensen. On the use of directions of negative curvature in a
modified Newton method. Mathematical Programming, 16, 1-20, 1979.
[OR70] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several
Variables. Academic Press, New York, 1970.
[SE90] R. B. Schnabel and E. Eskow. A new modified Cholesky factorization. SIAM J. on
Scientific and Statistical Computing, 11, 1136-1158, 1990.
[SSB85] G. A. Shultz, R. B. Schnabel, and R. H. Byrd. A family of trust-region based algorithms
for unconstrained minimization with strong global convergence properties. SIAM J. on
Numerical Analysis, 22, 47-67, 1985.

Algorithm 2.1. An algorithm for the partial Cholesky factorization

%PARTCHOL Partial Cholesky factorization routine for a real symmetric
%         matrix H.
%         [L,B,perm,n1] = partchol(H)
%         forms a permutation perm, a unit lower-triangular matrix
%         L(perm,:) and a block diagonal matrix B such that L*B*L' = H
%         using the partial Cholesky factorization with diagonal pivoting.
%         The size of the positive-definite principal submatrix obtained
%         in the factorization is denoted by n1.
function [L,B,perm,n1] = partchol(H)
n = length(H);
perm = 1:n;
B = H;                  % B is overwritten by the block-diagonal factor
L = zeros(n);
nu = 0.9;               % pivot tolerance; the report only requires 0 < nu < 1
                        % (0.9 is an illustrative choice, not fixed by the report)
k = 1;
n1 = 0;
while k <= n
   % Largest remaining diagonal; r is its index in 1:n.
   [beta_r,r] = max([zeros(1,k-1) diag(B(k:n,k:n))']);
   if k < n
      % Largest off-diagonal magnitude in row r of the Schur complement.
      mu_r = max(abs(B(r,[1:r-1 r+1:n])));
   else
      mu_r = 0;
   end
   if beta_r > 0 && beta_r > nu*mu_r
      % Acceptable pivot: interchange, then eliminate row and column k.
      n1 = k;
      perm([k r]) = perm([r k]);
      B([k r],:) = B([r k],:);
      B(:,[k r]) = B(:,[r k]);
      L(perm(k:n),k) = B(k:n,k)/B(k,k);
      if k < n
         B(k+1:n,k+1:n) = B(k+1:n,k+1:n)-L(perm(k+1:n),k)*B(k,k+1:n);
         B(k+1:n,k) = zeros(n-k,1);
         B(k,k+1:n) = zeros(1,n-k);
      end
      k = k+1;
   else
      % Unacceptable pivot: complete L with an identity block and stop.
      L(perm(k:n),k:n) = eye(n-k+1);
      k = n+1;
   end
end
