Continuous Optimization - Vaithilingam Jeyakumar, Alexander Rubinov
VOLUME 99
Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.
Donald W. Hearn
University of Florida, U.S.A.
CONTINUOUS OPTIMIZATION
Current Trends and Modern Applications
Edited by
VAITHILINGAM JEYAKUMAR
University of New South Wales, Sydney, Australia
ALEXANDER RUBINOV
University of Ballarat, Ballarat, Australia
Springer
Contents
Preface XIII
List of Contributors XV
Part I Surveys
7 Acknowledgement 289
References 289
Generalized Lagrange Multipliers for Nonconvex Directionally
Differentiable Programs
Nguyen Dinh, Gue Myung Lee, Le Anh Tuan 293
1 Introduction and Preliminaries 293
2 Generalized Lagrange Multipliers 296
2.1 Necessary conditions for optimality 296
2.2 Sufficient condition for optimality 301
3 Special Cases and Applications 304
3.1 Problems with convexlike directional derivatives 304
3.2 Composite nonsmooth programming with Gateaux
differentiability 305
3.3 Quasidifferentiable problems 309
4 Directionally Differentiable Problems with DSL-approximates 314
References 317
Slice Convergence of Sums of Convex functions in Banach
Spaces and Saddle Point Convergence
Robert Wenczel, Andrew Eberhard 321
1 Introduction 321
2 Preliminaries 323
3 A Sum Theorem for Slice Convergence 327
4 Saddle-point Convergence in Fenchel Duality 336
References 341
Topical Functions and their Properties in a Class of Ordered
Banach Spaces
Hossein Mohebi 343
1 Introduction 343
2 Preliminaries 344
3 Plus-Minkowski gauge and plus-weak Pareto point for a downward
set 347
4 φ-subdifferential of a topical function 349
5 Fenchel-Moreau conjugates with respect to φ 353
6 Conjugate of type Lau with respect to φ 357
References 360
8 Conclusions 430
8.1 Optimization 430
8.2 Clustering 431
References 433
Analysis of a Practical Control Policy for Water Storage in
Two Connected Dams
Phil Howlett, Julia Piantadosi, Charles Pearce 435
1 Introduction 435
2 Problem formulation 436
3 Intuitive calculation of the invariant probability 438
4 Existence of the inverse matrices 440
5 Probabilistic analysis 441
6 The expected long-term overflow 445
7 Extension of the fundamental ideas 445
References 450
Preface
Surveys
Linear Semi-infinite Optimization: Recent
Advances
Miguel A. Goberna
1 Introduction
Linear semi-infinite optimization (LSIO) deals with linear optimization problems such that either the set of variables or the set of constraints (but not both) is infinite. In particular, LSIO deals with problems of the form

(P)   min c'x   s.t.   a_t'x ≥ b_t,  t ∈ T,

where T is an infinite index set, c ∈ ℝ^n, a : T → ℝ^n, and b : T → ℝ; such problems are called primal. The Haar dual problem of (P) is

(D)   max Σ_{t∈T} λ_t b_t   s.t.   Σ_{t∈T} λ_t a_t = c,   λ ∈ ℝ_+^{(T)},

where ℝ_+^{(T)} denotes the positive cone in the space of generalized finite sequences ℝ^{(T)} (the linear space of all functions λ : T → ℝ such that λ_t = 0 for all t ∈ T except maybe for a finite number of indices). Other dual LSIO problems can be associated with (P) in particular cases; e.g., if T is a compact Hausdorff topological space and a and b are continuous functions, then the continuous dual problem of (P) is

(D_C)   max ∫_T b_t dμ(t)   s.t.   ∫_T a_t dμ(t) = c,   μ ∈ C_+'(T),

where C_+'(T) represents the cone of nonnegative regular Borel measures on T (ℝ_+^{(T)} can be seen as the subset of C_+'(T) formed by the nonnegative atomic measures). The value of each of these dual problems is less than or equal to the value of (P), and equality holds under certain conditions involving either the properties of the constraint system σ = {a_t'x ≥ b_t, t ∈ T} or some relationship between c and σ. Replacing the linear functions in (P) by convex functions we obtain a convex semi-infinite optimization (CSIO) problem. Many results and methods for ordinary linear optimization (LO) have been extended to LSIO, usually assuming that the linear semi-infinite system (LSIS) σ satisfies certain properties. In the same way, LSIO theory and methods have been extended to CSIO and even to nonlinear semi-infinite optimization (NLSIO).
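As a self-contained illustration of these dual values (our own one-variable example, not taken from the survey), the following instance has zero duality gap but an unsolvable Haar dual:

```latex
% (P): minimize x subject to x >= 1 - 1/t for every t in T = N.
% The feasible set is F = [1, +infty), so v(P) = 1, attained at x* = 1.
\[
(P)\quad \min_{x\in\mathbb{R}} x
\quad\text{s.t.}\quad x \ge 1 - \tfrac{1}{t},\ t \in \mathbb{N}.
\]
% Haar dual: with a_t = 1 and b_t = 1 - 1/t, the constraint is
% sum_t lambda_t = 1 over nonnegative generalized finite sequences.
\[
(D)\quad \max_{\lambda\in\mathbb{R}_+^{(\mathbb{N})}}
\sum_{t\in\mathbb{N}}\lambda_t\Bigl(1-\tfrac{1}{t}\Bigr)
\quad\text{s.t.}\quad \sum_{t\in\mathbb{N}}\lambda_t = 1.
\]
% Every feasible lambda has finite support, so the dual objective is
% strictly less than 1, while concentrating lambda on a single large t
% gives values arbitrarily close to 1.  Hence v(D) = v(P) = 1 but (D)
% is not solvable: zero duality gap does not imply dual attainment
% without a constraint qualification on sigma.
```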
We denote by F, F* and v(P) the feasible set, the optimal set and the value of (P), respectively (the same notation will be used for NLSIO problems). The boundary and the set of extreme points of F will be denoted by B and E, respectively. We also represent by Λ, Λ* and v(D) the corresponding objects of (D). We also denote by F the solution set of σ. For the convex analysis concepts we adopt a standard notation (as in [GL98]).
At least three reasons justify the interest of the optimization community in LSIO. First, its many real-life and modeling applications. Second, it provides nontrivial but still tractable optimization problems on which more general theories and methods can be checked. Finally, LSIO can be seen as a theoretical model for large-scale LO problems.
Section 2 deals with LSIS theory, i.e., with existence theorems (characterizations of F ≠ ∅) and with the properties of the main families of LSISs in the LSIO context. The main purpose of this section is to establish a theoretical frame for the next sections.
Section 3 surveys recent applications of LSIO in a variety of fields. In fact, LSIO models arise naturally in different contexts, providing theoretical tools for a better understanding of scientific and social phenomena. On the other hand, LSIO methods can be a useful tool for the numerical solution of difficult
K := cone{ (a_t, b_t), t ∈ T; (0_n, −1) }.

The reference cone of σ, cl K, characterizes the consistency of σ (by the condition (0_n, 1) ∉ cl K) as well as the halfspaces containing its solution set.
The statement of two basic theorems and the sketch of the main numerical
approaches will show the crucial role played by the above families of LSISs, as
constraint qualifications, in LSIO theory and methods (see [GL98] for more
details).
Duality theorem: if σ is FM and F ≠ ∅ ≠ Λ, then v(D) = v(P) and (D) is solvable.
Optimality theorem: if x ∈ F satisfies the KKT condition c ∈ A(x), then x ∈ F*, and the converse is true if σ is LFM.
Discretization methods generate sequences of points in ℝ^n converging to a point of F* by solving suitable LO problems, e.g., sequences of optimal solutions of the subproblems of (P) which are obtained by replacing T with a sequence of grids. The classical cutting plane approach consists of replacing in (P) the index set T with a finite subset which is formed from the previous one according to certain aggregation and elimination rules. The central cutting plane methods start each step with a polytope containing a sublevel set of (P), calculate a certain "centre" of this polytope by solving a suitable LO problem, and then update the polytope by aggregating to its defining system either a feasibility cut (if the centre is infeasible) or an objective cut (otherwise). In order to prove the convergence of any discretization method it is necessary to assume the continuity of σ. The main difficulties with these methods are undesirable jamming (unless (P) has a strongly unique optimal solution) and the increasing size of the auxiliary LO problems (unless efficient elimination rules are implemented).
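A minimal sketch of the cutting-plane discretization scheme on a toy LSIO instance (our own example, not one from the survey; SciPy's `linprog` is assumed to be available): the unit disk is written as infinitely many linear constraints indexed by t ∈ [0, 2π], and at each step an index of a most violated constraint is appended to the grid.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane_lsio(tol=1e-6, max_iter=100):
    """Cutting-plane discretization for the toy LSIO problem

        max x1 + x2   s.t.   x1*cos(t) + x2*sin(t) <= 1  for all t in [0, 2*pi],

    whose feasible set is the unit disk, so the optimal value is sqrt(2).
    Each step solves the LP over the current finite grid T and then
    appends an index of a most violated constraint (if any)."""
    c = np.array([-1.0, -1.0])                    # linprog minimizes, so use -c
    T = [0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi]    # initial finite grid
    for _ in range(max_iter):
        A = np.array([[np.cos(t), np.sin(t)] for t in T])
        res = linprog(c, A_ub=A, b_ub=np.ones(len(T)),
                      bounds=[(None, None), (None, None)])
        x = res.x
        # sup_t (x1*cos t + x2*sin t - 1) = ||x|| - 1, attained at atan2(x2, x1)
        if np.hypot(x[0], x[1]) - 1.0 <= tol:
            break
        T.append(float(np.arctan2(x[1], x[0])))
    return x, x[0] + x[1]
```

On this instance the iterates approach (1/√2, 1/√2); since each LP is an outer approximation, its value decreases toward √2 from above, illustrating both the convergence and the growing size of the auxiliary LO problems mentioned above.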
Reduction methods replace (P) with a nonlinear system of equations (and possibly some inequalities) to be solved by means of a quasi-Newton method. The optimality theorem is the basis of such an approach, so it requires σ to be LFM. Moreover, some smoothness conditions are required, e.g., σ analytic. These methods have a good local behavior provided they start sufficiently close to an optimal solution.
Two-phase methods combine a discretization method (1st phase) and
a reduction method (2nd phase). No theoretical result supports the decision
to go from phase 1 to phase 2.
Feasible directions (or descent) methods generate a feasible direction at the current iterate by solving a certain LO problem, the next iterate being the result of performing a line search in this direction. The auxiliary LO problem is well defined provided that σ is smooth enough, e.g., analytic.
Purification methods provide finite sequences of feasible points with decreasing values of the objective functional and of the dimension of the corresponding smallest faces containing them, in such a way that the last iterate is an extreme point of either F or Λ (but not necessarily an optimal solution). This approach can only be applied to (P) if the extreme points of F are characterized, i.e., if σ is analytic or LOP.
Hybrid methods (improperly called the LSIO simplex method in [AL89]) alternate purification steps (when the current iterate is not an extreme point of F) and descent steps (otherwise).
Simplex methods can be defined for both problems, (P) and (D), and they generate sequences of linked edges of the corresponding feasible set (either F or Λ) in such a way that the objective functional improves on the successive edges under a nondegeneracy assumption. The starting extreme point can be calculated by means of a purification method. Until 2001 the only available simplex method for LSIO problems was conceived for (D), and its convergence status is dubious (recall that the simplex method in [GG83] can be seen as an extension of the classical exchange method for polynomial approximation problems, proposed by Remes in 1934).
Now let us consider the following question: what is the family of solution sets of each class of LSISs?
The answer is almost trivial for continuous, FM and LFM systems. In fact, if

T_1 := { (a, b) ∈ ℝ^{n+1} : a'x ≥ b for all x ∈ F },
T_2 := { t ∈ T_1 : ‖t‖ ≤ 1 },

and

σ_i := { a'x ≥ b, (a, b) ∈ T_i },  i = 1, 2,

F                                                    deg F
{x ∈ ℝ^n : c_i'x ≥ d_i, i = 1, …, p} (minimal)       max{0, 2p − 3}
convex hull of an ellipse                            4
convex hull of a parabola                            4
convex hull of a branch of a hyperbola               2
3 Applications
As with the classical applications of LSIO described in Chapters 1 and 2 of [GL98] and in [Gus01b], the new applications can be classified following different criteria, such as the kind of LSIO problem to be analyzed or solved ((P), (D), (D_C), etc.), the class of constraint system of (P) (continuous, FM, etc.), or the presentation or not of numerical experiments (real or modeling applications, respectively).
Economics
During the 1980s different authors formulated and solved risk decision problems as primal LSIO problems without using this name (in fact they solved some examples by means of naive numerical approaches). In the same vein, [KM01], instead of using the classical stochastic-processes approach to financial mathematics, reformulates and solves dynamic interest rate models as primal LSIO problems where σ is analytic and FM. The chosen numerical approach is two-phase.
LSIO models arise naturally in inexact LO, when feasibility under any possible perturbation of the nominal problem is required. Thus, the robust counterpart of min_x c'x subject to Ax ≥ b, where (c, A, b) ∈ U ⊂ ℝ^n × ℝ^{m×n} × ℝ^m, is formulated in [BN02] as the LSIO problem min_{t,x} t subject to t ≥ c'x, Ax ≥ b for all (c, A, b) ∈ U; the computational tractability of this problem is discussed (in Section 2) for different uncertainty sets U in a real application (the antenna design problem). On the other hand, [AG01] provides strong duality theorems for inexact LO problems of the form min_x max_{c∈C} c'x subject to Ax ∈ B for all A ∈ 𝒜 and x ∈ ℝ^n_+, where C and B are given nonempty convex sets and 𝒜 is a given family of matrices. If Ax ∈ B can be expressed as A(t)x = b(t), t ∈ T, then this problem admits a continuous dual LSIO formulation.
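For a finite uncertainty set U the robust counterpart above collapses to an ordinary LP; the following sketch (our own invented data, with SciPy's `linprog` assumed) implements the epigraph reformulation min t s.t. t ≥ c_k'x, A_k x ≥ b_k for each scenario k. [BN02] treats much richer (e.g., ellipsoidal) sets U, for which the semi-infinite structure is essential.

```python
import numpy as np
from scipy.optimize import linprog

def robust_lp(costs, mats, rhs):
    """Epigraph reformulation of  min_x max_k c_k'x  s.t.  A_k x >= b_k  for
    every scenario k of a finite uncertainty set: minimize t over z = (t, x)
    subject to  c_k'x - t <= 0  and  A_k x >= b_k.  (x >= 0 for simplicity.)"""
    n = len(costs[0])
    rows, ub = [], []
    for ck, Ak, bk in zip(costs, mats, rhs):
        rows.append(np.concatenate(([-1.0], np.asarray(ck, float))))
        ub.append(0.0)                                   # c_k'x - t <= 0
        for a_row, beta in zip(np.atleast_2d(Ak), np.atleast_1d(bk)):
            rows.append(np.concatenate(([0.0], -np.asarray(a_row, float))))
            ub.append(-float(beta))                      # -A_k x <= -b_k
    obj = np.zeros(n + 1)
    obj[0] = 1.0                                         # minimize t
    bounds = [(None, None)] + [(0, None)] * n            # t free, x >= 0
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(ub), bounds=bounds)
    return res.x[1:], res.fun
```

With cost scenarios c_1 = (1, 2), c_2 = (2, 1) and the shared constraint x_1 + x_2 ≥ 1, the worst-case optimum is t = 1.5 at x = (0.5, 0.5).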
LSIO also applies to fuzzy systems and optimization. The continuous LSIO (and NLSIO) problems arising in [HF00] are solved with a cutting-plane method. In all the numerical examples reported in [LV01a], the constraint system of the LSIO reformulation is the union of analytic systems (with or without box constraints); all the numerical examples are solved with a hybrid method.
Semidefinite programming
Many authors have analyzed the connections between LSIO and semidefinite programming (see [VB98, Fay02] and references therein, some of them solving SDP problems by means of the standard LSIO methods). In [KZ01] the LSIO duality theory has been used in order to obtain duality theorems for SDP problems. [KK00] and [KGUY02] show that a special class of dual SDP problems can be solved efficiently by means of its reformulation as a continuous LSIO problem, which is solved by a cutting-plane discretization method. This idea is also the basis of [KM03], where it is shown that, if the LSIO reformulation of the dual SDP problem has finite value and an FM constraint system, then there exists a low-size discretization with the same value. Numerical experiments show that large-scale SDP problems which cannot be handled by means of the typical interior point methods (e.g., with more than 3000 dual variables) can be solved by applying an ad hoc discretization method which exploits the structure of the problem.
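The SDP-as-LSIO idea can be sketched on a toy 2×2 instance (our own example, with SciPy's `linprog` and NumPy's `eigh` assumed): the matrix constraint A(y) ⪰ 0 becomes the infinite linear system x'A(y)x ≥ 0 over unit vectors x, and eigenvectors of the most negative eigenvalue play the role of the most violated indices in a cutting-plane discretization.

```python
import numpy as np
from scipy.optimize import linprog

def sdp_by_eigenvector_cuts(tol=1e-8, max_iter=50):
    """Toy SDP  max y  s.t.  A(y) = [[1, y], [y, 1]] psd  (optimum y* = 1),
    treated semi-infinitely:  x'A(y)x >= 0 for every unit vector x, i.e.
    (-2*x1*x2)*y <= x1^2 + x2^2.  Eigenvectors of the most negative
    eigenvalue of A(y) supply the cutting planes."""
    cut_rows, cut_rhs = [], []
    y = 0.0
    for _ in range(max_iter):
        res = linprog([-1.0],
                      A_ub=np.array(cut_rows).reshape(-1, 1) if cut_rows else None,
                      b_ub=np.array(cut_rhs) if cut_rhs else None,
                      bounds=[(-10.0, 10.0)])       # box keeps the LP bounded
        y = res.x[0]
        w, V = np.linalg.eigh(np.array([[1.0, y], [y, 1.0]]))
        if w[0] >= -tol:                            # A(y) is (nearly) psd
            return y
        x = V[:, 0]                                 # most violated "index"
        cut_rows.append(-2.0 * x[0] * x[1])
        cut_rhs.append(x[0] ** 2 + x[1] ** 2)
    return y
```

Here the first LP returns y = 10, a single eigenvector cut reduces the feasible interval to y ≤ 1, and the method stops at the true optimum; larger instances would simply accumulate more cuts.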
4 Numerical methods
In the previous section we have seen that most of the LSIO problems arising in practical applications in recent years have been solved by means of new methods (usually variations of already known ones). Two possible reasons for this phenomenon are the lack of available codes (commercial or not) for large classes of LSIO problems and the computational inefficiency of the known methods (which may fail to exploit the structure of the particular problems). We now review the specific literature on LSIO methods.
[Bet04] and [WFL01] propose two new central cutting plane methods, taking as centre of the current polytope the centre of the largest ball inscribed in the polytope and its analytic centre, respectively. [FLW01] proposes a cutting-plane method for solving LSIO and quadratic CSIO problems (an extension of this method to infinite-dimensional LSIO can be found in [WFL01]). Several relaxation techniques and their combinations are proposed and discussed. The method in [Bet04], which reports numerical experiments, is an accelerated version of the (Elzinga-Moore) cutting-plane Algorithm 11.4.2 in [GL98] for LSIO problems with continuous σ, whereas [WFL01] requires the analyticity of σ. A Kelley cutting-plane algorithm has been proposed in [KGUY02] for a particular class of LSIO problems (the reformulations of dual SDP problems); an extension of this method to SIO problems with nonlinear objective and linear constraints has been proposed in [KKT03].
A reduction approach for LSIO (and CSIO) problems has been proposed in [ILT00], where σ is assumed to be continuous and FM. The idea is to reduce Wolfe's dual problem to a small number of ordinary nonlinear optimization problems. The method performs well on a famous test example. This method has been extended to quadratic SIO in [LTW04].
[AGL01] proposes a simplex method (and a reduced gradient method) for LSIO problems such that σ is LOP. These methods are the only ones which could be applied to LSIO problems with a countable set of constraints. The proof of convergence is an open problem.
[LSV00] proposes two hybrid methods for LSIO problems such that σ is a finite union of analytic systems with box constraints. Numerical experiments are reported.
[KM02] considers LSIO problems in which σ is continuous, satisfies the Slater condition, and the components of a ∈ C(T) are linearly independent. Such problems are reformulated as a linear approximation problem and then solved by means of a classical method of Pólya. Convergence proofs are given.
[Kos01] provides a conceptual path-following algorithm for the parametric LSIO problem arising in optimal control, consisting of replacing T in (P) with an interval T(τ) := [0, τ], where τ ranges over a certain interval. The constraint system of the parametric problem is assumed to be continuous and FM for each τ. An illustrative example is given.
Finally, let us observe that LSIO problems could also be solved by means of numerical methods initially conceived for more general models, such as CSIO ([Abb01, TKVB02, ZNF00]), NLSIO ([ZR03, VFG03, GP01, Gus01a]) and GSIO ([Sti01, SS03, Web03] and references therein). The comparison of the particular versions for LSIO problems of these methods with the specific LSIO methods is still to be made.
5 Perturbation analysis
In this section we consider any arbitrary perturbation of the nominal data π = (a, b, c) which preserves n and T (the constraint system of π is σ). π is bounded if v(P) ≠ −∞, and it has bounded data if a and b are bounded functions. The parameter space is Π := (ℝ^n × ℝ)^T × ℝ^n, endowed with the pseudometric of uniform convergence

d(π_1, π) := max{ ‖c_1 − c‖, sup_{t∈T} ‖(a_t^1, b_t^1) − (a_t, b_t)‖ }.

0_{n+1} ∉ cl conv{ (a_t, b_t), t ∈ T }

(a useful condition involving the data).
The characterization of the usc property of the feasible set mapping at π ∈ Π_c in [CLP02a] requires some additional notation. Let K̄ be the characteristic cone of

σ̄ := { a'x ≥ b, (a, b) ∈ conv{(a_t, b_t), t ∈ T} },

cone( K̄ ∪ {(0_n, −1)} ) = cone( cl K ∪ {(0_n, −1)} ).
The stability of the feasible set has been analyzed from the point of view of the dual problem ([GLT01]), for the primal problem with equations and constraint set ([AG05]), and for the primal problem in CSIO ([LV01b] and [GLT02]).
Stability of the boundary of the feasible set
Given n e lie such that F ^W^, then we have ([GLV03], [GLT05]):
B closed at TT
(4)
• If TT G iTc, then
diTTMHsi) = d(On+lMH) .
• If TT G (clils) n (intilc) and Z~ := conv{at,^ G T; - c } , then
Error bounds
The residual function of π is

r(x, π) := sup_{t∈T} (b_t − a_t'x)_+,

and we consider constants β > 0 such that

d(x, F) / r(x, π) ≤ β  for all x ∈ ℝ^n\F.

If there exists such a β, then the condition number of π is

0 < τ(π) := sup_{x∈ℝ^n\F} d(x, F) / r(x, π) < +∞.
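A one-dimensional illustration of these notions (our own example, not taken from the survey):

```latex
% Take n = 1 and the system  sigma = { t x >= t - 1,  t in (0,1] }.
% Since (t-1)/t = 1 - 1/t <= 0 with supremum 0 at t = 1, F = [0, +infty).
\[
r(x,\pi) \;=\; \sup_{t\in(0,1]} \bigl(t - 1 - t x\bigr)_+ .
\]
% For x < 0 the map t -> t(1-x) - 1 is increasing, so the supremum is
% attained at t = 1 and equals -x; for x >= 0 the residual vanishes.
\[
x < 0:\quad r(x,\pi) = -x = d(x,F)
\qquad\Longrightarrow\qquad
\tau(\pi) \;=\; \sup_{x<0}\frac{d(x,F)}{r(x,\pi)} \;=\; 1 .
\]
```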
The following statements hold for any π with bounded data ([Hu00]):
• Assume that F is bounded and π ∈ int Π_c, and let ρ, x̂ and ε > 0 be such that ‖x‖ ≤ ρ for all x ∈ F and a_t'x̂ ≥ b_t + ε for all t ∈ T. Let 0 < γ < 1. Then, if

d(π_1, π) < εγ/2,

we have

τ(π_1) ≤ 2ρ ε^{-1} (1 + γ)/(1 − γ).

• Assume that F is unbounded and π ∈ int Π_c, and let u and η > 0 be such that a_t'u ≥ η for all t ∈ T, with ‖u‖ = 1. Let 0 < δ < η. Then, if d(π_1, π) < δ, we have

τ(π_1) ≤ (η − δ)^{-1}.
Acknowledgement
This work was supported by DGES of Spain and FEDER, Grant BFM2002-04114-C02-01.
References
[Abb01] Abbe, L.: Two logarithmic barrier methods for convex semi-infinite problems. In [GL01], 169-195 (2001)
[AG01] Amaya, J., Gomez, J.A.: Strong duality for inexact linear programming problems. Optimization, 49, 243-369 (2001)
[AG05] Amaya, J., Goberna, M.A.: Stability of the feasible set of linear systems with an exact constraint set. Math. Meth. Oper. Res., to appear (2005)
[AGLOl] Anderson, E.J., Goberna, M.A., Lopez, M.A.: Simplex-like trajectories
on quasi-polyhedral convex sets. Mathematics of Oper. Res., 26, 147-162
(2001)
[AL89] Anderson, E.J., Lewis, A.S.: An extension of the simplex algorithm for semi-infinite linear programming. Math. Programming (Ser. A), 44, 247-269 (1989)
[BN02] Ben-Tal, A., Nemirovski, A.: Robust optimization - methodology and applications. Math. Programming (Ser. B), 92, 453-480 (2002)
[Bet04] Betro, B.: An accelerated central cutting plane algorithm for linear semi-infinite programming. Math. Programming (Ser. A), 101, 479-495 (2004)
[BG00] Betro, B., Guglielmi, A.: Methods for global prior robustness under gener-
alized moment conditions. In: Rios, D., Ruggeri, F. (ed) Robust Bayesian
Analysis, 273-293. Springer, N.Y. (2000)
[BS00] Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Prob-
lems. Springer Verlag, New York, N.Y. (2000)
[CLOP03] Canovas, M.J., Lopez, M.A., Ortega, E.-M., Parra, J.: Upper semicon-
tinuity of closed-convex-valued multifunctions. Math. Meth. Oper. Res.,
57, 409-425 (2003)
[CLP02a] Canovas, M.J., Lopez, M.A., Parra, J.: Upper semicontinuity of the fea-
sible set mapping for linear inequality systems. Set-Valued Analysis, 10,
361-378 (2002)
[CLP02b] Canovas, M.J., Lopez, M.A., Parra, J.: Stability in the discretization of a
parametric semi-infinite convex inequality system. Mathematics of Oper.
Res., 27, 755-774 (2002)
[CLP05] Canovas, M.J., Lopez, M.A., Parra, J.: Stability of linear inequality systems in a parametric setting. J. Optim. Theory Appl., to appear (2005)
[CLPTOl] Canovas, M.J., Lopez, M.A., Parra, J., Todorov, M.I.: Solving strategies
and well-posedness in linear semi-infinite programming. Annals of Oper.
Res., 101, 171-190 (2001)
[CLPT04] Canovas, M.J., Lopez, M.A., Parra, J., Toledo, F.J.: Distance to ill-posedness and consistency value of linear semi-infinite inequality systems. Math. Programming (Ser. A), published online: 29/12/2004 (2004)
[DCNN01] Dahl, M., Claesson, I., Nordebo, S., Nordholm, S.: Chebyshev optimization of circular arrays. In: Yang, X. et al (ed) Optimization Methods and Applications, 309-319. Kluwer, Dordrecht (2001)
[Dal01] Dall'Aglio: On some applications of LSIP to probability and statistics. In [GL01], 237-254 (2001)
[FLW01] Fang, S.-Ch., Lin, Ch.-J., Wu, S.Y.: Solving quadratic semi-infinite pro-
gramming problems by using relaxed cutting-plane scheme. J. Comput.
Appl. Math., 129, 89-104 (2001)
[Fay02] Faybusovich, L.: On Nesterov's approach to semi-infinite programming.
Acta Appl. Math., 74, 195-215 (2002)
[Gau01] Gauvin, J.: Formulae for the sensitivity analysis of linear programming
problems. In Lassonde, M. (ed): Approximation, Optimization and Math-
ematical Economics, 117-120. Physica-Verlag, Berlin (2001)
[GLV03] Gaya, V.E., Lopez, M. A., Vera de Serio, V.: Stability in convex semi-
infinite programming and rates of convergence of optimal solutions of
discretized finite subproblems. Optimization, 52, 693-713 (2003)
[GG83] Glashoff, K., Gustafson, S.-A.: Linear Optimization and Approximation.
Springer Verlag, Berlin (1983)
[GHT05a] Goberna, M.A., Hernandez, L., Todorov, M.I.: On linear inequality sys-
tems with smooth coefficients. J. Optim. Theory Appl., 124, 363-386
(2005)
[GHT05b] Goberna, M.A., Hernandez, L., Todorov, M.I.: Separating the solution
sets of analytical and polynomial systems. Top, to appear (2005)
[GGGT05] Goberna, M.A., Gomez, S., Guerra, F., Todorov, M.I.: Sensitivity analy-
sis in linear semi-infinite programming: perturbing cost and right-hand-
side coefficients. Eur. J. Oper. Res., to appear (2005)
[GJD05] Goberna, M.A., Jeyakumar, V., Dinh, N.: Dual characterizations of set
containments with strict inequalities. J. Global Optim., to appear (2005)
[GJM03] Goberna, M.A., Jornet, V., Molina, M.D.: Saturation in linear optimiza-
tion. J. Optim. Theory Appl., 117, 327-348 (2003)
[GJM05] Goberna, M.A., Jornet, V., Molina, M.D.: Uniform saturation. Top, to
appear (2005)
[GJR01] Goberna, M.A., Jornet, V., Rodriguez, M.: Directional end of a convex set: Theory and applications. J. Optim. Theory Appl., 110, 389-411 (2001)
[GJR02] Goberna, M.A., Jornet, V., Rodriguez, M.: On the characterization of
some families of closed convex sets. Contributions to Algebra and Geom-
etry, 43, 153-169 (2002)
[GJR03] Goberna, M.A., Jornet, V., Rodriguez, M.: On linear systems containing
strict inequalities. Linear Algebra Appl, 360, 151-171 (2003)
[GLV03] Goberna, M.A., Larriqueta, M., Vera de Serio, V.: On the stability of the boundary of the feasible set in linear optimization. Set-Valued Analysis, 11, 203-223 (2003)
[GLV05] Goberna, M.A., Larriqueta, M., Vera de Serio, V.: On the stability of
the extreme point set in linear optimization. SIAM J. Optim., to appear
(2005)
[GL98] Goberna, M.A., Lopez, M.A.: Linear Semi-Infinite Optimization, Wiley,
Chichester, England (1998)
[GL01] Goberna, M.A., Lopez, M.A. (ed): Semi-Infinite Programming: Recent
Advances. Kluwer, Dordrecht (2001)
[JJNS01] Jess, A., Jongen, H.Th., Neralic, L., Stein, O.: A semi-infinite program-
ming model in data envelopment analysis. Optimization, 49, 369-385
(2001)
[Jey03] Jeyakumar, V.: Characterizing set containments involving infinite convex
constraints and reverse-convex constraints. SIAM J. Optim., 13, 947-959
(2003)
[JOW05] Jeyakumar, V., Ormerod, J., Womersley, R.S.: Knowledge-based semi-
definite linear programming classifiers. Optimization Methods and Soft-
ware, to appear (2005)
[JS00] Juhnke, F., Sarges, O.: Minimal spherical shells and linear semi-infinite
optimization. Contributions to Algebra and Geometry, 41, 93-105 (2000)
[KH98] Klatte, D., Henrion, R.: Regularity and stability in nonlinear semi-infinite
optimization. In: Reemtsen, R., Rückmann, J. (ed) Semi-infinite Program-
ming. Kluwer, Dordrecht, 69-102 (1998)
[KGUY02] Konno, H., Gotoh, J., Uno, T., Yuki, A.: A cutting plane algorithm
for semidefinite programming with applications to failure discriminant
analysis. J. Comput. and Appl. Math., 146, 141-154 (2002)
[KKT03] Konno, H., Kawadai, N., Tuy, H.: Cutting-plane algorithms for nonlinear
semi-definite programming problems with applications. J. Global Optim.,
25, 141-155 (2003)
[KK00] Konno, H., Kobayashi, H.: Failure discrimination and rating of enterprises
by semi-definite programming, Asia-Pacific Financial Markets, 7, 261-273
(2000)
[KM01] Kortanek, K.O., Medvedev, V.G.: Building and Using Dynamic Interest
Rate Models. Wiley, Chichester (2001)
[KZ01] Kortanek, K.O., Zhang, Q.: Perfect duality in semi-infinite and semidefinite programming. Math. Programming (Ser. A), 91, 127-144 (2001)
[KM02] Kosmol, P., Müller-Wichards, D.: Homotopic methods for semi-infinite optimization. J. Contemp. Math. Anal., 36, 31-48 (2002)
[Kos01] Kostyukova, O.I.: An algorithm constructing solutions for a family of lin-
ear semi-infinite problems. J. Optim. Theory Appl., 110, 585-609 (2001)
[KM03] Krishnan, K., Mitchell, J.E.: Semi-infinite linear programming approaches
to semidefinite programming problems. In: Pardalos, P., (ed) Novel Ap-
proaches to Hard Discrete Optimization, 121-140. American Mathemat-
ical Society, Providence, RI (2003)
[LSV00] Leon, T., Sanmatias, S., Vercher, E.: On the numerical treatment of lin-
early constrained semi-infinite optimization problems. Eur. J. Oper. Res.,
121, 78-91 (2000)
[LV01a] Leon, T., Vercher, E.: Optimization under uncertainty and linear semi-infinite programming: A survey. In [GL01], 327-348 (2001)
[LTW04] Liu, Y., Teo, K.L., Wu, S.Y.: A new quadratic semi-infinite programming
algorithm based on dual parametrization. J. Global Optim., 29, 401-413
(2004)
[LV01b] Lopez, M.A., Vera de Serio, V.: Stability of the feasible set mapping in convex semi-infinite programming. In [GL01], 101-120 (2001)
[MM03] Marinacci, M., Montrucchio, L.: Subcalculus for set functions and cores
of TU games. J. Mathematical Economics, 39, 1-25 (2003)
[MM00] Mira, J.A., Mora, G.: Stability of linear inequality systems measured by the Hausdorff metric. Set-Valued Analysis, 8, 253-266 (2000)
[NY02] Ng, K.F., Yang, W.H.: Error bounds for abstract linear inequality sys-
tems. SIAM J. Optim., 13, 24-43 (2002)
[NNCN01] Nordholm, S., Nordberg, J., Claesson, I., Nordebo, S.: Beamforming and
interference cancellation for capacity gain in mobile networks. Annals of
Oper. Res., 98, 235-253 (2001)
[NS01] Noubiap, R.F., Seidel, W.: An algorithm for calculating Gamma-minimax
decision rules under generalized moment conditions. Ann. Stat., 29, 1094-
1116 (2001)
[Pen01] Penot, J.-P.: Genericity of well-posedness, perturbations and smooth vari-
ational principles. Set-Valued Analysis, 9, 131-157 (2001)
[RDB02] Ratsch, G., Demiriz, A., Bennett, K.P.: Sparse regression ensembles in in-
finite and finite hypothesis spaces. Machine Learning, 48, 189-218 (2002)
[Rub00a] Rubio, J.E.: The optimal control of nonlinear diffusion equations with
rough initial data. J. Franklin Inst., 337, 673-690 (2000)
[Rub00b] Rubio, J.E.: Optimal control problems with unbounded constraint sets.
Optimization, 48, 191-210 (2000)
[RS01] Rückmann, J.-J., Stein, O.: On linear and linearized generalized semi-
infinite optimization problems. Annals Oper. Res., 101, 191-208 (2001)
[SAP00] Sabharwal, A., Avidor, D., Potter, L.: Sector beam synthesis for cellular systems using phased antenna arrays. IEEE Trans. on Vehicular Tech., 49, 1784-1792 (2000)
[SLTT01] Sanchez-Soriano, J., Llorca, N., Tijs, S., Timmer, J.: Semi-infinite assignment and transportation games. In [GL01], 349-363 (2001)
[Sha01] Shapiro, A.: On duality theory of conic linear problems. In [GL01], 135-
165 (2001)
[Sha04] Shapiro, A.: On duality theory of convex semi-infinite programming. Tech.
Report, School of Industrial and Systems Engineering, Georgia Institute
of Technology, Atlanta, GA (2004)
[SIF01] Slupphaug, O., Imsland, L., Foss, A.: Uncertainty modelling and robust
output feedback control of nonlinear discrete systems: A mathematical
programming approach. Int. J. Robust Nonlinear Control, 10, 1129-1152
(2000) [also in: Modeling, Identification and Control, 22, 29-52 (2001)]
[SS03] Stein, O., Still, G.: Solving semi-infinite optimization problems with in-
terior point techniques. SIAM J. Control Optim., 42, 769-788 (2003)
[Sti01] Still, G.: Discretization in semi-infinite programming: The rate of convergence. Math. Programming (Ser. A), 91, 53-69 (2001)
[TKVB02] Tichatschke, R., Kaplan, A., Voetmann, T., Bohm, M.: Numerical treat-
ment of an asset price model with non-stochastic uncertainty. Top, 10,
1-50 (2002)
[TTLS01] Tijs, S., Timmer, J., Llorca, N., Sanchez-Soriano, J.: In [GL01], 365-386 (2001)
[VB98] Vandenberghe, L., Boyd, S.: Connections between semi-infinite and semi-
definite programming. In Reemtsen, R., Rückmann, J. (ed) Semi-Infinite
Programming, 277-294. Kluwer, Dordrecht (1998)
[VFG03] Vaz, I., Fernandes, E., Gomes, P.: A sequential quadratic programming
with a dual parametrization approach to nonlinear semi-infinite program-
ming. Top, 11, 109-130 (2003)
[Web03] Weber, G.-W.: Generalized Semi-Infinite Optimization and Related Top-
ics. Heldermann Verlag, Lemgo, Germany (2003)
[WFL01] Wu, S.Y., Fang, S.-Ch., Lin, Ch.-J.: Analytic center based cutting plane method for linear semi-infinite programming. In [GL01], 221-233 (2001)
[ZR03] Zakovic, S., Rustem, B.: Semi-infinite programming and applications to
minimax problems. Annals Oper. Res., 124, 81-110 (2003)
[ZNF00] Zavriev, S.K., Novikova, N.M., Fedosova, A.V.: Stochastic algorithm for solving convex semi-infinite programming problems with equality and inequality constraints (Russian, English). Mosc. Univ. Comput. Math. Cybern., 2000, 44-52 (2000)
Some Theoretical Aspects of Newton's Method
for Constrained Best Interpolation
Hou-Duo Qi
1 Introduction
The convex best interpolation problem is defined as follows:

min { ‖f''‖_2 : f(t_i) = y_i, i = 1, …, n+2, f'' ≥ 0, f ∈ W^{2,2}[a,b] },    (1)

where a = t_1 < t_2 < ⋯ < t_{n+2} = b and y_i, i = 1, …, n+2, are given numbers, ‖·‖_2 is the Lebesgue L²[a,b] norm, and W^{2,2}[a,b] denotes the Sobolev space of functions with absolutely continuous first derivatives and second derivatives in L²[a,b], equipped with the norm given by the sum of the L²[a,b] norms of the function, its first, and its second derivatives.
Using an integration by parts technique, Favard [Fav40] and, more generally, de Boor [deB78] showed that this problem has an equivalent reformulation as follows:

u* = ( Σ_{i=1}^{n} λ_i* B_i )_+,    (3)

where r_+ := max{0, r}, the B_i are the B-splines on the knots t_i, t_{i+1}, t_{i+2}, and the {λ_i*} satisfy the following interpolation condition:

∫_a^b B_j(t) ( Σ_{i=1}^{n} λ_i* B_i(t) )_+ dt = d_j,  j = 1, …, n,    (4)

with d_j the second divided difference of the data at t_j, t_{j+1}, t_{j+2}. Once we have the solution u*, the function required by (1) can be obtained by f'' = u*. This representation result was obtained first by Hornung [Hor80] and subsequently extended to a much broader circle of problems in [AE87, DK89, IP84, IMS86, MSSW85, MU88]. We briefly discuss below both theoretically and numerically important progress on those problems.
Theoretically, prior to [MU88] by Micchelli and Utreras, most research centered on the problem (1) and its slight relaxations, such as requiring f'' to be bounded below or above; see [IP84, MSSW85, IMS86, AE87, DK89]. After [MU88] the main focus has been on the degree to which solution characterizations like (3) and (4) can be extended to a more general problem posed in Hilbert spaces:
   min { (1/2)‖x − x⁰‖² : x ∈ C and Ax = b },     (5)

where C is a closed convex subset of a Hilbert space X, A : X → IR^n is a bounded linear operator, and b ∈ IR^n. The strong CHIP (conical hull intersection property) holds for the constraints in (5) if and only if the unique solution x* has the following representation:

   x* = P_C(x⁰ + A*λ*),     (7)

where P_C denotes the projection onto the closed convex set C (closedness and convexity guarantee the existence of P_C), A* is the adjoint of A, and λ* ∈ IR^n satisfies the following nonlinear nonsmooth equation:

   A P_C(x⁰ + A*λ*) = b.     (8)

To see that (7) and (8) recover (3) and (4), it is enough to use the fact that for C = {u ∈ L²[a, b] : u ≥ 0} the projection is the truncation P_C(x) = x_+.
If the strong CHIP does not hold we still have a similar characterization in which P_C is replaced by P_{C_b}, where C_b is an extremal face of C satisfying some properties [Deu01]. However, it is often hard to get enough information to make the calculation of P_{C_b} possible, except in some particular cases. Hence, we mainly focus on the case where the strong CHIP holds. We will see that the assumption d_i > 0, i = 1, …, n, for problem (1) is a sufficient condition for the strong CHIP and, much more than that, it ensures the quadratic convergence of Newton's method.
Numerically, problem (1) has been well studied [IP84, IMS86, AE87, MU88, DK89, DQQ01, DQQ03]. As demonstrated in [IMS86] and verified on several other occasions [AE87, DK89], Newton's method is the most efficient compared to many other global methods for solving the equation (4). We postpone the description of Newton's method to the end of Section 3; instead we list some difficulties in designing algorithms for (4) and (8). First of all, the equation (4) is generally nonsmooth. The nonsmoothness was a major barrier for Andersson and Elfving [AE87] in establishing the convergence of Newton's method (they had to assume that the equation is smooth near the solution — the simple case — in order that the classical convergence result of Newton's method applies). Second, as noticed in both [IMS86] and [AE87], in the simple (i.e., smooth) case the method presented in [IMS86, AE87] becomes the classical Newton method. More justification is needed to consolidate the name and the use of Newton's method when the equation is nonsmooth. To do this, we appeal to the theory of the generalized Newton method developed by Kummer [Kum88] and Qi and Sun [QS93] for nonsmooth equations. This was done in [DQQ01, DQQ03]. We will review this theory in Section 3. Third, Newton's method has so far been developed only for the conical case, i.e., C is a cone. It remains unknown what form the Newton method takes even for the polyhedral case (i.e., C is the intersection of finitely many halfspaces). We will tackle these difficulties for problem (5).
The problem (5) can also be studied via a very different approach developed by Borwein and Lewis [BL92] for partially finite convex programming problems:

   min { f(x) : x ∈ C and Ax = b },     (9)
   min { f(x) : x ∈ C and Ax ≥ b },     (10)

where f : X → IR is convex and the bounded linear operator A : X → IR^n is represented by elements x_1, …, x_n ∈ X:

   Ax = (⟨x_1, x⟩, …, ⟨x_n, x⟩),  ∀x ∈ X.

Defining

   H_i := { x ∈ X : ⟨x_i, x⟩ = b_i },  i = 1, …, n,

the interpolation problem (5) takes the following form:

   min { (1/2)‖x⁰ − x‖² : x ∈ K := C ∩ (∩_{j=1}^n H_j) }.     (11)
Recall that for any convex set D ⊂ X, the (negative) polar of D, denoted by D°, is defined by

   D° := { u ∈ X : ⟨u, x⟩ ≤ 0 for all x ∈ D }.

For any x ∈ C ∩ (∩_{j=1}^n H_j) the inclusion

   (C − x)° + (∩_{j=1}^n H_j − x)° ⊆ ( C ∩ (∩_{j=1}^n H_j) − x )°

always holds; the collection {C, ∩_{j=1}^n H_j} is said to have the strong CHIP if the reverse inclusion holds at every such x. Hence, the strong CHIP is actually assuming the other direction. The importance of the strong CHIP lies in the following solution characterization of the problem (11).
Theorem 1. ([DLW97, Theorem 3.2] and [Deu01, Theorem 10.13]) The set {C, ∩_j H_j} has the strong CHIP if and only if for every x⁰ ∈ X there exists λ* ∈ IR^n such that the optimal solution x* = P_K(x⁰) has the representation:

   x* = P_C(x⁰ + A*λ*),
   A P_C(x⁰ + A*λ*) = b.
We remark that in general the strong CHIP of the sets {C, H_1, …, H_n} implies the strong CHIP of the sets {C, ∩_j H_j}. The following lemma gives a condition that ensures their equivalence.

Lemma 1. ([Deu01, Lemma 10.11]) Suppose that X is a Hilbert space and {C_0, C_1, …, C_m} is a collection of closed convex subsets such that {C_1, …, C_m} has the strong CHIP. Then the following statements are equivalent:
(i) {C_0, C_1, …, C_m} has the strong CHIP;
(ii) {C_0, ∩_{i=1}^m C_i} has the strong CHIP.
Obviously, in this case (9) has a unique solution since f(x) = (1/2)‖x − x⁰‖² is strongly convex. For y ∈ X we calculate

   f*(y) = sup_{x∈X} { ⟨x, y⟩ − (1/2)‖x − x⁰‖² }
         = sup_{x∈X} { ⟨x, y + x⁰⟩ − (1/2)‖x‖² − (1/2)‖x⁰‖² }
         = (1/2)‖y + x⁰‖² − (1/2)‖x⁰‖².
   A P_C(x⁰ + A*λ*) = b.

Following Theorem 1 we see that the sets {C, A⁻¹(b)} have the strong CHIP. In fact, the qualification (14) is exactly the condition b ∈ ri(AC) by Proposition 1, except that (14) needs the a priori assumption qri C ≠ ∅.
However, for the problem (10), where

   K = C ∩ { x : Ax ≥ b },

the condition b ∈ ri(AC) is not suitable, as it might happen that b ∉ AC. It turns out that the strong CHIP again plays an essential role in this case. Let

   H_j := { x : ⟨a_j, x⟩ ≥ b_j }.
Theorem 4. ([DLW99, Theorem 3.2]) The sets {C, ∩_j H_j} have the strong CHIP if and only if the optimal solution x* = P_K(x⁰) of (10) has the following representation:

   x* = P_C(x⁰ + A*λ*),     (16)

where λ* is any solution of the nonlinear complementarity problem:

   λ ≥ 0,  w := A P_C(x⁰ + A*λ) − b ≥ 0,  λᵀw = 0.     (17)
The following question was raised in [DLW99]: is the constraint qualification (14) a sufficient condition for the strong CHIP of {C, ∩_j H_j}? We give an affirmative answer in the next result.

Theorem 5. If

   qri C ∩ (∩_{j=1}^n H_j) ≠ ∅,     (18)

then the sets {C, ∩_j H_j} have the strong CHIP.
Proof. Suppose (18) is in place. It follows from Theorem 3 with f(x) = (1/2)‖x − x⁰‖² that there exists an optimal solution λ* to the problem (13). (15) says that
   F(x + h) − F(x) − Vh = o(‖h‖),  ∀ V ∈ ∂F(x + h), as h → 0.     (22)

Furthermore, if

   F(x + h) − F(x) − Vh = O(‖h‖²),  ∀ V ∈ ∂F(x + h), as h → 0,     (23)

then F is said to be strongly semismooth at x.
The use of Theorem 6 relies on the availability of the following three elements: (a) availability of an element in ∂F(x) near the solution x*; (b) regularity of F at x*; and (c) (strong) semismoothness of F at x*. We illustrate below how the first can be easily calculated for the convex best interpolation problem and leave the other two tasks to the next section.
Illustration for the convex best interpolation problem. It follows from (3) and (4) that the solution of the convex best interpolation problem can be obtained by solving the following equation:

   F(λ) = d,     (24)

where d = (d_1, …, d_n)ᵀ and each component of F is given by

   F_j(λ) = ∫_a^b ( Σ_{ℓ=1}^n λ_ℓ B_ℓ(t) )_+ B_j(t) dt,  j = 1, …, n.     (25)
Irvine, Marin, and Smith [IMS86] developed Newton's method for (24):

   λ⁺ = λ − (M(λ))⁻¹ (F(λ) − d),     (26)

where λ and λ⁺ denote respectively the old and the new iterate, and M(λ) ∈ IR^{n×n} is given by

   M(λ)_{jℓ} = ∫_a^b ( Σ_{i=1}^n λ_i B_i(t) )⁰_+ B_j(t) B_ℓ(t) dt,

and

   r⁰_+ = 1 if r > 0,  r⁰_+ = 0 if r ≤ 0.
Let e denote the vector of all ones in IR^n; then it is easy to see that the directional derivative of F at λ along the direction e is

   F′(λ; e) = M(λ)e.

Moreover, if F is differentiable at λ then F′(λ) = M(λ). For these reasons, the iteration (26) was called Newton's method and, based on extensive numerical experiments, was observed to be quadratically convergent in [IMS86]. Independently of [IMS86], partial theoretical results on the convergence of (26) were established by Andersson and Elfving [AE87]. A complete convergence analysis was given by Dontchev, Qi, and Qi [DQQ01, DQQ03] by casting (26) as a particular instance of (21). The convergence analysis verifies exactly the availability of the three elements discussed above, in particular M(λ) ∈ ∂F(λ). We will present in the next section the corresponding procedure for the constrained interpolation problem in Hilbert space.
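As a purely illustrative sketch of the iteration (26), the loop below evaluates F and M on a grid and performs the update λ⁺ = λ − M(λ)⁻¹(F(λ) − d). Hat functions again stand in for the B-splines, and the data d are manufactured from a known λ, so this is a sanity check of the scheme rather than the authors' implementation:

```python
import numpy as np

def hats(t, n):
    # n piecewise-linear hat functions on [0, 1], stand-ins for B_1, ..., B_n.
    k = np.linspace(0.0, 1.0, n + 2)
    B = np.zeros((n, t.size))
    for i in range(n):
        l, m, r = k[i], k[i + 1], k[i + 2]
        B[i] = np.where((t >= l) & (t <= m), (t - l) / (m - l),
               np.where((t > m) & (t <= r), (r - t) / (r - m), 0.0))
    return B

def newton_ims(d, B, w, iters=30, tol=1e-12):
    # lam+ = lam - M(lam)^{-1} (F(lam) - d), cf. (26), with
    # F_j = ∫ (Σ lam_i B_i)_+ B_j dt and M_jl = ∫ [Σ lam_i B_i > 0] B_j B_l dt.
    lam = np.ones(len(d))
    for _ in range(iters):
        s = B.T @ lam                        # Σ lam_i B_i on the grid
        Fval = B @ (np.maximum(s, 0.0) * w)
        if np.linalg.norm(Fval - d) < tol:
            break
        M = B @ (((s > 0).astype(float) * w)[:, None] * B.T)
        lam = lam - np.linalg.solve(M, Fval - d)
    return lam

t = np.linspace(0.0, 1.0, 2001)
B = hats(t, 3)
w = np.full(t.size, t[1] - t[0])             # crude quadrature weights
lam_true = np.array([1.0, 0.5, 2.0])
d = B @ (np.maximum(B.T @ lam_true, 0.0) * w)
lam = newton_ims(d, B, w)
print(np.linalg.norm(lam - lam_true))
```

Because the manufactured λ is nonnegative, the iteration stays in the smooth regime and converges in one step; it is exactly the nonsmooth regime, where the indicator in M(λ) switches, that the generalized-Newton theory reviewed below is needed for.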
(conceptual, at least) Newton's method for the nonsmooth equation (8). However, efficient implementation of Newton's method relies on the assumption that there is an efficient way to calculate the generalized Jacobian of A P_C(x). The most interesting case from this point of view is when C is a closed convex cone (i.e., the conical case [BL92]), which covers many problems including (1). We recall our setting below:

   A P_C(x⁰ + A*λ) = b,

that is, with F(λ) := A P_C(x⁰ + A*λ), the equation

   F(λ) − b = 0.     (27)
One of the several difficulties with the Newton method (29) is to select an appropriate matrix V(λ) from ∂F(λ), which is well defined as F is Lipschitz continuous under Assumption 1 stated later. We will also see that the following choice satisfies all the requirements:

   V(λ)_{jℓ} := ∫_{T(λ)} a_j(t) a_ℓ(t) dt,  where T(λ) := { t ∈ [a, b] : x⁰(t) + Σ_{i=1}^n λ_i a_i(t) > 0 }.

That is, V(λ) is positive semidefinite for an arbitrary choice of λ ∈ IR^n. We need an assumption to make it positive definite. Let the support of a_ℓ be Ω_ℓ, and suppose

   ∪_{ℓ=1}^n J(Ω_ℓ) = {1, …, n}.
Thus we have

   F(λ) = V(λ)λ + ( ∫_{T(λ)} x⁰(t) a_j(t) dt )_{j=1}^n,

and the Newton iteration can be written as

   V(λ)λ⁺ = b − ( ∫_{T(λ)} x⁰(t) a_j(t) dt )_{j=1}^n.     (32)

A very interesting case is when x⁰ = 0, which implies that no function evaluations are required to implement Newton's method; i.e., (32) takes the form

   V(λ)λ⁺ = b.
Other choices of V(λ) are also possible, as ∂F(λ) usually contains infinitely many elements. For example,

   V̄(λ)_{jℓ} := ∫_a^b ( x⁰(t) + Σ_{i=1}^n λ_i a_i(t) )⁰ a_j(t) a_ℓ(t) dt,  where r⁰ := 1 if r ≥ 0 and r⁰ := 0 if r < 0.

It is easy to see that pᵀV̄(λ)p ≥ pᵀV(λ)p for any p ∈ IR^n. This means that V̄(λ) "increases the positivity" of V(λ) in the sense that V̄(λ) − V(λ) is positive semidefinite. The argument leading to (32) also applies to V̄(λ). We will show below that both V(λ) and V̄(λ) are contained in ∂F(λ).
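The two candidates differ only through the indicator used on the set where x⁰ + Σ_ℓ λ_ℓ a_ℓ vanishes, which is exactly why V̄(λ) − V(λ) is positive semidefinite. A quick numerical check of that claim follows; the grid functions a_ℓ and x⁰ below are arbitrary illustrative choices, not the chapter's data:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
h = t[1] - t[0]
a = np.array([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(3 * np.pi * t)])
x0 = 0.25 - np.abs(t - 0.5)                # changes sign, so the indicators matter
lam = np.array([0.3, -0.2, 0.1])
s = x0 + a.T @ lam                         # x0 + Σ lam_l a_l on the grid

V    = a @ ((((s > 0).astype(float)) * h)[:, None] * a.T)    # indicator r > 0
Vbar = a @ ((((s >= 0).astype(float)) * h)[:, None] * a.T)   # indicator r >= 0

gap = np.linalg.eigvalsh(Vbar - V).min()
print(gap)   # >= 0 up to roundoff: Vbar - V is positive semidefinite
```

Both matrices are Gram-type matrices A diag(w) Aᵀ with nonnegative weights, so each is positive semidefinite, and the weight vector of V̄ dominates that of V entrywise.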
Due to Assumption 1, T(λ) contains closed intervals in [a, b] and possibly isolated points. For j = 1, …, n, define

and

   y := ( x⁰ + Σ_{ℓ=1}^n λ_ℓ a_ℓ ) χ_{T(λ)},
   C := { x ∈ L²(T(λ)) : x ≥ 0 }.

Hence,

   lim_{k→∞} ∇F⁻(λ + λ_k) = 0 ∈ ∂F⁻(λ).   □
We then have

Corollary 1. For any λ ∈ IR^n, V̄(λ) ∈ ∂F(λ).

Proof. It follows from Lemma 3 that 0 ∈ ∂F⁻(λ) and from (35) that V̄(λ) = ∇F⁺(λ). The relation (33) then implies V̄(λ) ∈ ∂F(λ). Since λ is arbitrary, we are done.   □
We need another assumption for our regularity result.

Assumption 3. b_ℓ ≠ 0 for all ℓ = 1, …, n.

Lemma 4. Suppose Assumptions 1, 2 and 3 hold, and let λ* be the solution of (27). Then every element of ∂F(λ*) is positive definite.

Proof. We have proved that

   ∂F(λ*) = ∂F⁻(λ*) + ∇F⁺(λ*) = ∂F⁻(λ*) + V̄(λ*),

and every element in ∂F⁻(λ*) is positive semidefinite. It is enough to prove that ∇F⁺(λ*) is positive definite. We recall that at the solution F(λ*) = b, i.e.,

   ∫_a^b ( x⁰ + Σ_{ℓ=1}^n λ*_ℓ a_ℓ )_+(t) a_j(t) dt = b_j,  j = 1, …, n.

Assumption 3 implies that (x⁰ + Σ_{ℓ=1}^n λ*_ℓ a_ℓ)_+ does not vanish identically on the support of each a_j. Then Lemma 2 implies that ∇F⁺(λ*) = V̄(λ*) is positive definite.   □
Illustration for problem (2). An essential assumption for problem (2) is that the second divided differences are positive, i.e., d_i > 0 for all i = 1, …, n. Hence Assumption 3 is automatically valid. It is easy to see that Assumptions 1 and 2 are also satisfied for B-splines. It follows from the above argument that the Newton method (26) is well defined near the solution. However, to prove its convergence we need the semismoothness property of F, which is addressed below.
4.3 Semismoothness
As we saw in Theorem 6, the property of semismoothness plays an important role in the convergence analysis of the nonsmooth Newton method (21). In our application it involves functions of the following type:

   Φ(λ) := ∫_0^1 φ(λ, t) dt,

with φ locally Lipschitz, for which the Clarke directional derivative satisfies

   Φ°(λ; λ̄ − λ) ≤ ∫_0^1 φ°((λ, t); (λ̄ − λ, 0)) dt.
If in (39) we replace φ°((λ̄, t); (λ − λ̄, 0)) by −φ°((λ̄, t); (−λ + λ̄, 0)) and follow an argument that is almost identical to the subsequent development, we obtain the counterpart condition

   ∫_0^1 e((λ, t), (λ̄, t)) dt ≤ ε ‖λ − λ̄‖.

For each λ ∈ IR^n the mapping t ↦ e((λ, t), (λ̄, t)) is measurable, hence the set

   { t : e((λ, t), (λ̄, t)) < ε ‖λ − λ̄‖ }
is also measurable. Thus A(δ), an intersection of measurable sets, is itself measurable. Obviously, A(δ) ⊆ A(δ′) if δ > δ′. And for fixed t ∈ [0, 1], semismoothness gives, via Lemma 5, that

   e((λ, t), (λ̄, t)) / ‖λ − λ̄‖ → 0  as 0 ≠ λ − λ̄ → 0,

i.e., t ∈ A(δ) for all small enough δ > 0.
Let Ω(δ) := [0, 1] \ A(δ). The properties of A(δ) yield (a) measurability of Ω(δ); (b) Ω(δ) ⊇ Ω(δ′) if δ > δ′; and (c) for each t and all small enough δ > 0, t ∉ Ω(δ). In particular, ∩_{δ>0} Ω(δ) = ∅ and it follows that the measure of Ω(δ), meas(Ω(δ)), converges to 0 as δ → 0+.

Let L be the Lipschitz constant of φ in a neighborhood of (λ̄, t), so that for each λ near λ̄,

   e((λ, t), (λ̄, t)) ≤ |φ(λ, t) − φ(λ̄, t)| + |φ°((λ̄, t); (λ − λ̄, 0))| ≤ 2L ‖(λ − λ̄, 0)‖ = 2L ‖λ − λ̄‖,

using the 2-norm. To sum up,
   K = C ∩ { x : Ax ≥ b }.

Under the strong CHIP assumption, we have the solution characterization (16) and (17), which we restate below for easy reference.
The complementarity system (17) can be recast as a nonsmooth equation by means of the Fischer–Burmeister function

   φ_FB(a, b) := a + b − √(a² + b²),

which vanishes if and only if a ≥ 0, b ≥ 0 and ab = 0; the square φ_FB² is continuously differentiable, though φ_FB is not differentiable. Define

   W(λ, w) := ( F(λ) − b − w, φ_FB(λ_1, w_1), …, φ_FB(λ_n, w_n) )ᵀ,

so that (17) is equivalent to the nonsmooth equation

   W(λ, w) = 0,     (40)

and

   ∂W(λ, w) ⊆ { [ V(λ)  −I ; D(λ, w)  E(λ, w) ] : V(λ) ∈ ∂F(λ); D(λ, w), E(λ, w) satisfy (42) and (43) }.     (41)

D(λ, w) and E(λ, w) are diagonal matrices whose ℓth diagonal elements are given by

   D_ℓ(λ, w) := 1 − λ_ℓ / ‖(λ_ℓ, w_ℓ)‖,  E_ℓ(λ, w) := 1 − w_ℓ / ‖(λ_ℓ, w_ℓ)‖     (42)

if (λ_ℓ, w_ℓ) ≠ 0, and by

   D_ℓ(λ, w) = 1 − ξ_ℓ,  E_ℓ(λ, w) = 1 − ρ_ℓ,  for any (ξ_ℓ, ρ_ℓ) ∈ IR² such that ‖(ξ_ℓ, ρ_ℓ)‖ ≤ 1     (43)

if (λ_ℓ, w_ℓ) = 0.
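For concreteness, here is a hedged sketch of the two ingredients just described: the Fischer–Burmeister function and the diagonal entries (42)–(43). At a kink (λ_ℓ, w_ℓ) = (0, 0) the sketch simply picks the admissible pair (ξ_ℓ, ρ_ℓ) = (0, 0); the function names are assumptions of the sketch:

```python
import numpy as np

def phi_fb(a, b):
    # Fischer-Burmeister function: phi(a, b) = a + b - sqrt(a^2 + b^2);
    # it vanishes exactly when a >= 0, b >= 0 and a*b = 0.
    return a + b - np.hypot(a, b)

def de_blocks(lam, w):
    # Diagonal entries of D and E from (42); at (lam_l, w_l) = (0, 0) any
    # (xi_l, rho_l) with norm <= 1 is admissible by (43) -- we take (0, 0).
    r = np.hypot(lam, w)
    safe = r > 0
    frac_l = np.divide(lam, r, out=np.zeros_like(r), where=safe)
    frac_w = np.divide(w,  r, out=np.zeros_like(r), where=safe)
    return 1.0 - frac_l, 1.0 - frac_w

D, E = de_blocks(np.array([3.0, 0.0]), np.array([4.0, 0.0]))
print(D, E)   # D = [0.4, 1.0], E = [0.2, 1.0]
```

The second component exercises the kink case (43): with (λ_ℓ, w_ℓ) = (0, 0), the chosen element of the generalized gradient gives D_ℓ = E_ℓ = 1.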
Lemma 6. Suppose every element V(λ) in ∂F(λ) is positive definite. Then every element of ∂W(λ, w) is nonsingular.

Proof. Let M(λ, w) be an element of the set on the right-hand side of (41) and let (y, z) ∈ IR^{2n} be such that M(λ, w)(y, z) = 0. Then there exist V(λ) ∈ ∂F(λ) and D(λ, w), E(λ, w) satisfying (42) and (43) such that

   V(λ)y − z = 0  and  D(λ, w)y + E(λ, w)z = 0.

Since V(λ) is nonsingular, it follows that

   ( D V(λ)⁻¹ + E ) z = 0.

It is well known from NCP theory [DFK96, Theorem 21] that the matrix (D V⁻¹ + E) is nonsingular because V⁻¹ is positive definite according to the assumption. Hence z = 0, implying y = 0. This establishes the nonsingularity of all elements of ∂W(λ, w).   □
Newton's method for (40) can be developed as follows:

   (λ⁺, w⁺) − (λ, w) = −M⁻¹ W(λ, w),  M ∈ ∂W(λ, w).     (44)

We have proved that each F_j is semismooth (Corollary 2). Using the facts that a composite of semismooth functions is semismooth and that the Fischer–Burmeister function is strongly semismooth, we conclude that W is a semismooth function. Suppose (λ*, w*) is a solution of (40).

Assumption 4. b_ℓ > 0 for all ℓ = 1, …, n.

Lemma 7. Suppose Assumptions 1, 2 and 4 hold. Then every element in ∂W(λ*, w*) is nonsingular.
Proof. We note that at the solution it holds that

   A P_C(x⁰ + A*λ*) = b + w*.

Since w*_ℓ ≥ 0, we see that b_ℓ + w*_ℓ > 0. Following the proof of Lemma 4 we can prove that each element V in ∂F(λ*) is positive definite, and hence each element of ∂W(λ*, w*) is nonsingular by Lemma 6.   □
All the preparations are in place for the use of Theorem 6 to state the superlinear convergence of the method (44). The proof is similar to that of Theorem 7.

Theorem 8. Suppose Assumptions 1, 2 and 4 hold. Then the Newton method (44) is superlinearly convergent provided that the initial point (λ⁰, w⁰) is sufficiently close to (λ*, w*).
4.5 Globalization
The function

   f(λ) := (1/2) ∫_a^b ( ( x⁰ + Σ_{ℓ=1}^n λ_ℓ a_ℓ )(t) )_+² dt − Σ_{ℓ=1}^n λ_ℓ b_ℓ

serves this purpose because

   ∇f(λ) = F(λ) − b.
Since f is convex, ‖∇f(λ)‖ = ‖F(λ) − b‖ can be used to monitor the convergence of global methods. We present below a global method which globalizes the method (29) and has been shown to be extremely efficient for the convex best interpolation problem (1).
(S.3) (Line search) Choose m_k as the smallest nonnegative integer m satisfying

   f(λᵏ + ρᵐ sᵏ) − f(λᵏ) ≤ α ρᵐ ∇f(λᵏ)ᵀ sᵏ.     (46)

(S.4) (Update) Set λᵏ⁺¹ = λᵏ + ρ^{m_k} sᵏ, k := k + 1, and return to step (S.1).
Since V(λ) is positive semidefinite, the matrix (V(λ) + εI) is positive definite for ε > 0. Hence the linear equation (45) is well defined and the direction sᵏ is a descent direction for the objective function f. The global convergence analysis for Algorithm 1 is standard and can be found in [DQQ03].
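A minimal sketch of the globalization scheme just described — regularized Newton direction from (V(λ) + εI)s = −∇f(λ) as in (45), Armijo backtracking as in (46) — run on a small strongly convex quadratic. The test function, the parameter values ρ, α, ε, and the name `globalized_newton` are all assumptions of the sketch, not Algorithm 1 itself:

```python
import numpy as np

def globalized_newton(f, grad, V, x, rho=0.5, alpha=1e-4, eps=1e-8,
                      tol=1e-10, iters=100):
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # monitor ||grad f||, as in the text
            break
        # (S.2): regularized Newton direction, cf. (45)
        s = np.linalg.solve(V(x) + eps * np.eye(len(x)), -g)
        # (S.3): Armijo line search, cf. (46)
        m = 0
        while f(x + rho**m * s) - f(x) > alpha * rho**m * (g @ s) and m < 50:
            m += 1
        # (S.4): update
        x = x + rho**m * s
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])       # SPD stand-in for V(lam)
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x = globalized_newton(f, grad, lambda x: A, np.array([5.0, -5.0]))
print(x)   # close to the minimizer A^{-1} b = [0.2, 0.4]
```

The ε-regularization plays the role noted above: it keeps the linear system solvable even when V(λ) is only positive semidefinite, at the cost of a slight bias that vanishes as the gradient does.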
A globalized version of the method (44) can be developed as well, but with some notable differences. In this case, the objective function f(λ, w) is given by
5 Open Problems

It is obvious from Section 2 and Section 4 that there is a big gap between the theoretical results and the Newton-type algorithms for constrained interpolation problems. For example, the solution characterizations appearing in Theorems 1, 3, and 4 are for general convex sets (i.e., C is a closed convex set), whereas the Newton method developed so far covers only the particular, yet most important, case where C is the cone of positive functions. This is due to the fact that the projection is an essential ingredient when solving the interpolation problem, and that the projection onto the cone of positive functions is easy to calculate.
There are many problems associated with projections onto other convex sets, including cones. We discuss only two of them, which we think are the most interesting and the most likely to be (at least partly) solved by the techniques developed in this paper. The first one is the case where C is a closed polyhedral set in X, i.e.,

   C := { x ∈ X : ⟨c_i, x⟩ ≤ r_i, i = 1, …, m },

where c_i ∈ X and r_i ∈ IR. We note that cones are not necessarily polyhedral. It follows from [Deu01, Examples 10.7 and 10.9] that the sets {C, ∩H_j} and {C, ∩H̄_j} both have the strong CHIP. Hence the solution characterization theorems are applicable to the polyhedral case. Questions related to P_C include differentiability, directional differentiability, generalized Jacobians and
Here φ and ψ are given piecewise linear functions (or, more generally, lower and upper semicontinuous functions, respectively) such that

The constraint is
Acknowledgement

The author would like to thank Daniel Ralph for his constructive comments on the topic and especially for kindly allowing his material [Ral02] on the semismoothness of integral functions to be included in this survey (i.e., Sec. 4.3). It would also be interesting to see how his approach can be extended to cover the strongly semismooth case.

The work was done while the author was with the School of Mathematics, The University of New South Wales, Australia, and was supported by the Australian Research Council.
References
[AE87] Andersson, L.-E., Elfving, T.: An algorithm for constrained interpolation.
SIAM J. Sci. Statist. Comput., 8, 1012-1025 (1987)
[Fav40] Favard, J.: Sur l'interpolation. J. Math. Pures Appl., 19, 281-306 (1940)
[Fis92] Fischer, A.: A special Newton-type optimization method. Optimization,
24, 269-284 (1992)
[GT90] Gowda, M.S., Teboulle, M.: A comparison of constraint qualifications in infinite-dimensional convex programming. SIAM J. Control Optim., 28, 925-935 (1990)
[Hor80] Hornung, U.: Interpolation by smooth functions under restriction on the
derivatives. J. Approx. Theory, 28, 227-237 (1980)
[IP84] Iliev, G., Pollul, W.: Convex interpolation by functions with minimal Lp norm (1 < p < ∞) of the kth derivative. Mathematics and mathematical education (Sunny Beach, 1984), 31-42, Bulg. Akad. Nauk, Sofia (1984)
[IMS86] Irvine, L.D., Marin, S.P., Smith, P.W.: Constrained interpolation and
smoothing. Constr. Approx., 2, 129-151 (1986)
[Jey92] Jeyakumar, V.: Infinite-dimensional convex programming with applications to constrained approximation. J. Optim. Theory Appl., 75, 569-586 (1992)
[JL98] Jeyakumar, V., Luc, D.T.: Approximate Jacobian matrices for nonsmooth continuous maps and C¹-optimization. SIAM J. Control Optim., 36, 1815-1832 (1998)
[JW92] Jeyakumar, V., Wolkowicz, H.: Generalizations of Slater's constraint qualification for infinite convex programs. Math. Program., 57, 85-101 (1992)
[JQ95] Jiang, H., Qi, L.: Local uniqueness and Newton-type methods for nonsmooth variational inequalities. J. Math. Analysis and Appl., 196, 314-331 (1995)
[KK02] Klatte D., Kummer, B.: Nonsmooth equations in optimization. Regular-
ity, calculus, methods and applications. Nonconvex Optimization and its
Applications, 60. Kluwer Academic Publishers, Dordrecht (2002)
[Kum88] Kummer, B.: Newton's method for nondifferentiable functions. Advances in mathematical optimization, 114-125, Math. Res., 45, Akademie-Verlag, Berlin (1988)
[LJ02] Li, C., Jin, X.Q.: Nonlinearly constrained best approximation in Hilbert spaces: the strong CHIP and the basic constraint qualification. SIAM J. Optim., 13, 228-239 (2002)
[LN02] Li, C., Ng, K.F.: On best approximation by nonconvex sets and perturbation of nonconvex inequality systems in Hilbert spaces. SIAM J. Optim., 13, 726-744 (2002)
[LN03] Li, C., Ng, K.F.: Constraint qualification, the strong CHIP, and best approximation with convex constraints in Banach spaces. SIAM J. Optim., 14, 584-607 (2003)
[MSSW85] Micchelli, C.A., Smith, P.W., Swetits, J., Ward, J.D.: Constrained Lp approximation. Constr. Approx., 1, 93-102 (1985)
[MU88] Micchelli, C.A., Utreras, F.I.: Smoothing and interpolation in a convex subset of a Hilbert space. SIAM J. Sci. Statist. Comput., 9, 728-747 (1988)
[Mif77] Mifflin, R.: Semismoothness and semiconvex functions in constrained optimization. SIAM J. Control Optim., 15, 959-972 (1977)
[QS93] Qi, L., Sun, J.: A nonsmooth version of Newton's method. Math. Pro-
gram., 58, 353-367 (1993)
[Ral02] Ralph, D.: Personal communication. May. (2002)
[SQ99] Sun, D., Qi, L.: On NCP-functions. Comput. Optim. Appl., 13, 201-220
(1999)
[Xu99] Xu, H.: Set-valued approximations and Newton's methods. Math. Pro-
gram., 84, 401-420 (1999)
Optimization Methods in Direct and Inverse
Scattering
Summary. In many Direct and Inverse Scattering problems one has to use a
parameter-fitting procedure, because analytical inversion procedures are often not
available. In this paper a variety of such methods is presented with a discussion of
theoretical and computational issues.
The problem of finding small subsurface inclusions from surface scattering data
is stated and investigated. This Inverse Scattering problem is reduced to an opti-
mization problem, and solved by the Hybrid Stochastic-Deterministic minimization
algorithm. A similar approach is used to determine layers in a particle from the
scattering data.
The Inverse potential scattering problem is described and its solution based on
a parameter fitting procedure is presented for the case of spherically symmetric
potentials and fixed-energy phase shifts as the scattering data. The central feature
of the minimization algorithm here is the Stability Index Method. This general
approach estimates the size of the minimizing sets, and gives a practically useful
stopping criterion for global minimization algorithms.
The 3D inverse scattering problem with fixed-energy data is discussed. Its solution by Ramm's method is described. The cases of exact and noisy discrete data are considered. Error estimates for the inversion algorithm are given in both cases of exact and noisy data. A comparison of Ramm's inversion method with the inversion based on the Dirichlet-to-Neumann map is given, and it is shown that there are many more numerical difficulties in the latter method than in Ramm's method.
An Obstacle Direct Scattering problem is treated by a novel Modified Rayleigh Conjecture (MRC) method. The MRC's performance compares favorably with the well-known Boundary Integral Equation Method, based on the properties of the single- and double-layer potentials. A special minimization procedure allows one to inexpensively compute scattered fields for 2D and 3D obstacles having smooth as well as nonsmooth surfaces.
A new Support Function Method (SFM) is used for Inverse Obstacle Scattering
problems. The SFM can work with limited data. It can also be used for Inverse
52 A.G. Ramm, S. Gutman
scattering problems with an unknown scattering condition on the obstacle's boundary (e.g. soft or hard scattering). Another method for Inverse scattering problems, the Linear
Sampling Method (LSM), is analyzed. Theoretical and computational difficulties in
using this method are pointed out.
1 Introduction
Suppose that an acoustic or electromagnetic wave encounters an inhomo-
geneity and, as a consequence, gets scattered. The problem of finding the
scattered wave assuming the knowledge of the inhomogeneity (penetrable or
not) is the Direct Scattering problem. An impenetrable inhomogeneity is also
called an obstacle. On the other hand, if the scattered wave is known at
some points outside an inhomogeneity, then we are faced with the Inverse
Scattering problem, the goal of which is to identify this inhomogeneity, see
[CCM00, CK92, Ram86, Ram92b, Ram94a, Ram05a, Ram05b].
Among the variety of methods available to handle such problems, few provide a mathematically justified algorithm. In many cases one has to use a parameter-fitting procedure, especially for inverse scattering problems, because analytical inversion procedures are often not available. An important part of such a procedure is an efficient global optimization method, see [Flo00, FP01, HPT95, HT93, PRT00, Rub00].
The general scheme for parameter-fitting procedures is simple: one has a relation B(q) = A, where B is some operator, q is an unknown function, and A is the data. In inverse scattering problems q is an unknown potential, and A is the known scattering amplitude. If q is sought in a finite-parametric family of functions, then q = q(x, p), where p = (p_1, …, p_n) is a parameter. The parameter is found by solving a global minimization problem: Φ[B(q(x, p)) − A] = min, where Φ is some positive functional and q ∈ Q, where Q is an admissible set of q. In practice the above problem often has many local minimizers, and the global minimizer is not necessarily unique. In [Ram92b, Ram94b] some functionals Φ are constructed which have a unique global minimizer, namely the solution to the inverse scattering problem, and the global minimum is zero.
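As a toy illustration of this general scheme — where the forward map B, the parametrization q(x, p) = p_1 e^{−p_2 x}, the admissible box Q, and the brute-force grid search are all assumptions of the sketch, not of the chapter — the parameter is recovered by globally minimizing Φ[B(q(·, p)) − A] over Q:

```python
import numpy as np

xgrid = np.linspace(0.0, 1.0, 50)

def B(p):
    # Forward map: samples of q(x, p) = p1 * exp(-p2 * x) on a grid.
    return p[0] * np.exp(-p[1] * xgrid)

p_true = np.array([2.0, 3.0])
A = B(p_true)                                # "data" manufactured from p_true

def Phi(p):
    # Positive functional Phi[B(q(x, p)) - A]: squared 2-norm of the residual.
    return float(np.sum((B(p) - A) ** 2))

# Global minimization by brute-force grid search over Q = [0, 4] x [0, 5].
p1s, p2s = np.linspace(0.0, 4.0, 81), np.linspace(0.0, 5.0, 101)
best = min(((a, b) for a in p1s for b in p2s), key=Phi)
print(best)   # recovers p_true = (2.0, 3.0)
```

Here the data are exact and the true parameter lies on the search grid, so the global minimum is zero; the text's caveats — many local minimizers, non-uniqueness, and instability under data noise — are exactly what this idealized setting hides.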
Moreover, as a rule, the data A are known with some error. Thus A_δ is known such that ‖A − A_δ‖ ≤ δ. There are no stability estimates which would show how the global minimizer q(x, p_opt) is perturbed when the data A are replaced by the perturbed data A_δ. In fact, one can easily construct examples showing that, in general, there is no stability of the global minimizer with respect to small errors in the data.
For these reasons there is no guarantee that parameter-fitting procedures will yield a solution to the inverse problem with a guaranteed accuracy. However, the overwhelming majority of practitioners use parameter-fitting procedures. In dozens of published papers the results obtained by various parameter-fitting procedures look quite good. The explanation in most of the cases is simple: the authors know the answer beforehand, and it is usually not difficult to parametrize the unknown function so that the exact solution is well approximated by a function from a finite-parametric family; since the authors know the exact answer a priori, they may choose numerically the values of the parameters which yield a good approximation of the exact solution. When can one rely on the results obtained by parameter-fitting procedures? Unfortunately, there is no rigorous and complete answer to this question, but some recommendations are given in Section 4.
In this paper the authors present their recent results, which are based on specially designed parameter-fitting procedures. Before describing them, let us mention that usually in a numerical solution of an inverse scattering problem one uses a regularization procedure, e.g. variational regularization, spectral cut-off, iterative regularization, the DSM (dynamical systems method), quasi-solutions, etc.; see e.g. [Ram04a, Ram05a]. This general theoretical framework is well established in the theory of ill-posed problems, of which the inverse scattering problems represent an important class. This framework is needed to achieve a stable method for assigning a solution to an ill-posed problem, usually set in an infinite-dimensional space. The goal of this paper is to present optimization algorithms already in a finite-dimensional setting of a Direct or Inverse scattering problem.
In Section 2 the problem of finding small subsurface inclusions from surface scattering data is investigated ([Ram97, Ram00a, Ram05a, Ram05b]). This (geophysical) Inverse Scattering problem is reduced to an optimization problem. This problem is solved by the Hybrid Stochastic-Deterministic minimization algorithm ([GR00]). It is based on genetic minimization algorithm ideas for its random (stochastic) part, and on a derivative-free deterministic method for the local minimization part.
In Section 3 a similar approach is used to determine layers in a particle
subjected to acoustic or electromagnetic waves. The global minimization al-
gorithm uses Rinnooy Kan and Timmer's Multilevel Single-Linkage Method
for its stochastic part.
In Section 4 we discuss an Inverse potential scattering problem appear-
ing in a quantum mechanical description of particle scattering experiments.
The central feature of the minimization algorithm here is the Stability Index
Method ([GRS02]). This general approach estimates the size of the minimizing
sets, and gives a practically useful stopping criterion for global minimization
algorithms.
In Section 5 Ramm's method for solving the 3D inverse scattering problem with fixed-energy data is presented following [Ram04d]; see also [Ram02a, Ram05a]. The cases of exact and noisy discrete data are considered. Error estimates for the inversion algorithm are given in both cases of exact and noisy data. A comparison of Ramm's inversion method with the inversion based on the Dirichlet-to-Neumann map is given, and it is shown that there are many more numerical difficulties in the latter method than in Ramm's method.
The proposed method for solving the (IP) consists of finding the global minimizer of the function (6). This minimizer (ẑ_1, …, ẑ_M, v̂_1, …, v̂_M) gives the estimates of the positions z_m of the small inhomogeneities and their intensities v_m. See [Ram97] and [Ram00a] for a justification of this approach.

The function Φ depends on M unknown points z_m and M unknown parameters v_m, 1 ≤ m ≤ M. The number M of the small inhomogeneities is also unknown, and its determination is a part of the minimization problem.
The box is located above the earth surface for computational convenience. Then, given the locations of the points z_1, z_2, …, z_M, the minimum of Φ with respect to the intensities v_1, v_2, …, v_M can be found by minimizing the resulting quadratic function in (6) over the region satisfying (8). This can be done using the normal equations for (6) and projecting the resulting point back onto the region defined by (8). Denote the result of this minimization by Φ̂, that is
   M_orig < M,     (11)

   z_2 = (−1, 0.3, 0.580),
The plot shows multiple local minima and almost flat regions.
A direct application of a gradient type method to such a function would
result in finding a local minimum, which may or may not be the sought global
one. In the example above, such a method would usually be trapped in a lo-
cal minimum located at r = —2, r = —1.4, r = —0.6, r = 0.2 or r = 0.9,
58 A.G. Ramm, S. Gutman
and the desired global minimum at r = 1.6 would be found only for a suffi-
ciently close initial guess 1.4 < r < 1.9. Various global minimization methods
are known (see below), but we found that an efficient way to accomplish
the minimization task for this Inverse Problem was to design a new method
(HSD) combining both the stochastic and the deterministic approach to the
global minimization. Deterministic minimization algorithms with or without
the gradient computation, such as the conjugate gradient methods, are known
to be efficient (see [Bre73, DS83, Jac77, Pol71]), and [RubOO]. However, the
initial guess should be chosen sufficiently close to the sought minimum. Also
such algorithms tend to be trapped at a local minimum, which is not nec-
essarily close to a global one. A new deterministic method is proposed in
[BP96] and [BPR97], which is quite efficient according to [BPR97]. On the
other hand, various stochastic minimization algorithms, e.g. the simulated
annealing method [KGV83, Kir84], are more likely to find a global minimum,
but their convergence can be very slow. We have tried a variety of minimiza-
tion algorithms to find an acceptable minimum of Φ. Among them were the
Levenberg-Marquardt Method, Conjugate Gradients, Downhill Simplex, and
Simulated Annealing Method. None of them produced consistent satisfactory
results.
Among minimization methods combining random and deterministic searches
we mention Deep's method [DE94] and a variety of clustering methods
[RT87a], [RT87b]. An application of these methods to the particle identifi-
cation using light scattering is described in [ZUB98]. The clustering methods
are quite robust (that is, they consistently find global extrema) but, usually,
require a significant computational effort. One such method is described in
the next section on the identification of layers in a multilayer particle. The
HSD method is a combination of a reduced sample random search method
with certain ideas from Genetic Algorithms (see e.g. [HH98]). It is very effi-
cient and seems especially well suited for low dimensional global minimization.
Further research is envisioned to study its properties in more detail, and its
applicability to other problems.
The steps of the Hybrid Stochastic-Deterministic (HSD) method are outlined below. Let us call a collection of M points (the inclusions' centers) {z1, z2, ..., zM}, zi ∈ B, a configuration Z. Then the minimization problem (10) is the minimization of the objective function Φ over the set of all configurations.
For clarity, let P0 = 1, εs = 0.5, εi = 0.25, εd = 0.1 be the same values as the ones used in the numerical computations in the next section.
Generate a random configuration Z. Compute the best fit intensities vi corresponding to this configuration. If vi > vmax, then let vi := vmax. If vi < 0, then let vi := 0. If Φ(Z) < P0 εs, then this configuration is a preliminary candidate for the initial guess of a deterministic minimization method (Step 1).
Drop the points zi ∈ Z such that vi < vmax εi. That is, the inclusions with small intensities are eliminated (Step 2).
If two points zk, zj ∈ Z are too close to each other, then replace them with one point of a combined intensity (Step 3).
After completing steps 2 and 3 we would be left with N ≤ M points z1, z2, ..., zN (after a re-indexing) of the original configuration Z. Use this reduced configuration Zred as the starting point for the deterministic constrained minimization in the 3N-dimensional space (Step 4). Let the resulting minimizer be Zred = (ẑ1, ..., ẑN). If the value of the objective function Φ(Zred) < ε, then we are done: Zred is the sought configuration containing N inclusions. If Φ(Zred) ≥ ε, then the iterations should continue.
To continue the iteration, randomly generate M — N points in B (Step
5). Add them to the reduced configuration Zred. Now we have a new full
configuration Z, and the iteration process can continue (Step 1).
This entire iterative process is repeated nmax times, and the best configuration is declared to represent the sought inclusions.
Let P0, nmax, vmax, εs, εi, εd, and ε be positive numbers. Let a positive integer M be larger than the expected number of inclusions. Let N = 0.
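The HSD loop above can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the objective phi, the box B = [0,1]³, the hidden inclusion centers, and the greedy local search standing in for the deterministic constrained minimizer are all invented; Step 5 is simplified to a full restart, and Step 3 merges close points by simply dropping near-duplicates.

```python
import random

random.seed(1)
TRUE = [(0.2, 0.3, 0.4), (0.7, 0.6, 0.5)]      # hidden centers (toy "data")
M, P0, VMAX = 6, 1.0, 1.0
EPS_S, EPS_I, EPS_D, EPS, NMAX = 0.5, 0.25, 0.1, 1e-3, 30

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def phi(Z):
    # toy misfit: each hidden center should lie near some trial point
    return sum(min(dist(t, z) for z in Z) for t in TRUE) / len(TRUE)

def best_intensities(Z):
    # stand-in for the quadratic intensity fit, clipped to [0, VMAX]
    return [min(max(VMAX - min(dist(t, z) for t in TRUE), 0.0), VMAX)
            for z in Z]

def local_minimize(Z, steps=200):
    # greedy perturbation descent, a stand-in for the deterministic
    # constrained minimizer of Step 4
    Z = [list(z) for z in Z]
    for _ in range(steps):
        j, c = random.randrange(len(Z)), random.randrange(3)
        old, f0 = Z[j][c], phi(Z)
        Z[j][c] = min(max(old + random.uniform(-0.05, 0.05), 0.0), 1.0)
        if phi(Z) > f0:
            Z[j][c] = old          # reject non-improving moves
    return [tuple(z) for z in Z]

best, best_val = None, float("inf")
for _ in range(NMAX):
    Z = [tuple(random.random() for _ in range(3)) for _ in range(M)]
    v = best_intensities(Z)                               # Step 1: clipped fit
    Z = [z for z, vi in zip(Z, v) if vi >= VMAX * EPS_I]  # Step 2: drop weak
    Z = [z for i, z in enumerate(Z)                       # Step 3: drop close
         if all(dist(z, Z[k]) > EPS_D for k in range(i))]
    if not Z or phi(Z) >= P0 * EPS_S:                     # Step 1 screening
        continue
    Z = local_minimize(Z)                                 # Step 4
    if phi(Z) < best_val:
        best, best_val = Z, phi(Z)
    if best_val < EPS:
        break
```

The sketch keeps the structure of the method: cheap random screening, elimination of weak or duplicated inclusions, and a local descent only from promising reduced configurations.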
In both cases we searched for the same 6 inhomogeneities with the coordinates x1, x2, x3 and the intensities v shown in Table 1.
Parameter M was set to 16, thus the only information on the number
of inhomogeneities given to the algorithm was that their number does not
exceed 16. This number was chosen to keep the computational time within
reasonable limits. Still another consideration for the number M is the aim of the algorithm to find the most influential inclusions, rather than all inclusions, which is usually impossible in the presence of noise and with the limited amount of data.
Experiment 1. In this case we used 12 sources and 21 detectors, all on the surface x3 = 0. The sources were positioned at {(−1.667 + 0.667i, −0.5 + 1.0j, 0), i = 0, 1, ..., 5, j = 0, 1}, that is, 6 each along the two lines x2 = −0.5 and x2 = 0.5. The detectors were positioned at {(−2 + 0.667i, −1.0 + 1.0j, 0), i = 0, 1, ..., 6, j = 0, 1, 2}, that is, seven detectors along each of the three lines
Fig. 2. Inclusions and identified objects for subsurface particle identification, Experiment 1, δ = 0.00. The x3 coordinate is not shown.
deepest and the weakest are lost. This can be expected, since their influence
on the cost functional is becoming comparable with the background noise in
the data.
In summary, the proposed method for the identification of small inclusions
can be used in geophysics, medicine and technology. It can be useful in the
development of new approaches to ultrasound mammography. It can also be
used for localization of holes and cracks in metals and other materials, as
well as for finding mines from surface measurements of acoustic pressure and
possibly in other problems of interest in various applications.
The HSD minimization method is a specially designed low-dimensional
minimization method, which is well suited for many inverse type problems.
The problems do not necessarily have to be within the range of applicability
of the Born approximation. It is highly desirable to apply the HSD method to practical problems and to compare its performance with other methods.
and Sm = {x ∈ ℝ³ : |x| = rm} for 0 = r0 < r1 < ··· < rN < R. Suppose that a multilayered scatterer in D has a constant refractive index nm in the region Dm, m = 1, 2, ..., N. If the scatterer is illuminated by a plane harmonic wave then, after the time dependency is eliminated, the total field u(x) = u0(x) + us(x) satisfies the Helmholtz equation
Fig. 3. Inclusions and identified objects for subsurface particle identification, Experiment 2, δ = 0.00. The x3 coordinate is not shown.
where u0(x) = e^{ik0 α·x} is the incident field and α is the unit vector in the direction of propagation. The scattered field us is required to satisfy the radiation condition at infinity, see [Ram86].
Let km = k0 nm. We consider the following transmission problem under the assumption that the fields um and their normal derivatives are continuous across the boundaries Sm, m = 1, 2, ..., N.
In fact, the choice of the boundary conditions on the boundaries Sm depends on the physical model under consideration. The above model may or may not be adequate for electromagnetic or acoustic scattering, since
the model may require additional parameters (such as the mass density and
the compressibility) to be accounted for. However, the basic computational
approach remains the same. For more details on transmission problems, in-
cluding the questions on the existence and the uniqueness of the solutions, see
[ARS98, EJP57, RPYOO].
The Inverse Problem to be solved is:
IPS: Given u(x) for all x ∈ S = {x : |x| = R} at a fixed k0 > 0, find the number N of the layers, the location of the layers, and their refractive indices nm, m = 1, 2, ..., N in (14).
Here the IPS stands for a Single frequency Inverse Problem. Numerical ex-
perience shows that there are some practical difficulties in the successful res-
olution of the IPS even when no noise is present, see [GutOl]. While there are
some results on the uniqueness for the IPS (see [ARS98, RPYOO]), assuming
that the refractive indices are known, and only the layers are to be identified,
the stability estimates are few, see [Ram94c, Ram94d, Ram02a]. The identi-
fication is successful, however, if the scatterer is subjected to a probe with
plane waves of several frequencies. Thus we state the Multifrequency Inverse
Problem:
IPM: Given u_p(x) for all x ∈ S = {x : |x| = R} at a finite number P of wave numbers k_p > 0, find the number N of the layers, the location of the layers, and their refractive indices nm, m = 1, 2, ..., N in (14).
If the refractive indices nm are sufficiently close to 1, then we say that the
scattering is weak. In this case the scattering is described by the Born ap-
proximation, and there are methods for the solution of the above Inverse
Problems. See [CM90], [Ram86] and [Ram94a] for further details. In particu-
lar, the Born inversion is an ill-posed problem even if the Born approximation
is very accurate, see [Ram90], or [Ram92b]. When the assumption of the Born
approximation is not appropriate, one matches the given observations to a set
of solutions for the Direct Problem. Since our interest is in the solution of the
IPS and IPM in the non-Born region of scattering, we choose to follow the
best fit to data approach. This approach is used widely in a variety of applied
problems, see e. g. [Bie97].
Note that, by the assumption, the scatterer has rotational symmetry.
Thus we only need to know the data for one direction of the incident plane
wave. For this reason we fix a = (1,0) in (13) and define the (complex)
functions
g^(p)(θ),  0 ≤ θ ≤ 2π,  p = 1, 2, ..., P,   (15)
to be the observations measured on the surface S of the ball D for a finite set
of free space wave numbers k_p.
Fix a positive integer M. Given a configuration
Q = (r1, r2, ..., rM, n1, n2, ..., nM)   (16)
we solve the Direct Problem (13)-(14) (for each free space wave number k_p) with the layers Dm = {x ∈ ℝ³ : rm−1 < |x| < rm}, m = 1, 2, ..., M, and the corresponding refractive indices nm, where r0 = 0. Let
||g||² = Σ_i |g(θi)|².   (18)
Define
Φ(r1, r2, ..., rM, n1, n2, ..., nM) = (1/P) Σ_{p=1}^{P} ||g^(p) − g_c^(p)||² / ||g^(p)||²,   (19)
Fig. 4. Best fit profile for the configurations qt; multiple frequencies P = {3.0, 6.5, 10.0}.
MSLM
dj = ((dj^r)² + (dj^n)²)^{1/2}.
4. Order the sample points in H^(j) so that Φ(Qi) ≤ Φ(Qi+1), i = 1, ..., μ. For each value of i, start the local minimization from Qi, unless there exists an index k < i such that ||Qk − Qi|| < dj. Ascertain if the result is a known local minimum.
5. Let K be the number of local minimizations performed, and W be the number of different local minima found. Let
West = W(K − 1)/(K − W − 2).
The algorithm is terminated if
scattering data, which in this case consist of the fixed-energy phase shifts. In
[Ram96, Ram02a, Ram04d, Ram05a] the three-dimensional inverse scattering
problem with fixed-energy data is treated.
Let q(x), x ∈ ℝ³, be a real-valued potential with compact support. Let R > 0 be a number such that q(x) = 0 for |x| > R. We also assume that q ∈ L²(BR), BR = {x : |x| ≤ R, x ∈ ℝ³}. Let S² be the unit sphere, and α ∈ S². For a given energy k > 0 the scattering solution ψ(x, α) is defined as the solution of
lim_{r→∞} ∫_{|x|=r} |∂v/∂r − ikv|² ds = 0.   (26)
It can be shown that
ψ(x, α) = ψ0 + A(α′, α, k) e^{ikr}/r + o(1/r)  as r → ∞,  α′ = x/r,  r := |x|.   (27)
The function A(α′, α, k) is called the scattering amplitude, α and α′ are the directions of the incident and scattered waves, and k² is the energy, see [New82, Ram94a].
For spherically symmetric scatterers q(x) = q(r) the scattering amplitude satisfies A(α′, α, k) = A(α′ · α, k). The converse is established in [Ram91]. Following [RS99], the scattering amplitude for q = q(r) can be written as
A(α′, α, k) = Σ_{l=0}^{∞} Al(k) Σ_{m=−l}^{l} Ylm(α′) Ȳlm(α),   (28)
where Yim are the spherical harmonics, normalized in L^(5^), and the bar
denotes the complex conjugate.
The fixed-energy phase shifts −π < δl ≤ π (δl = δ(l, k), k > 0 is fixed) are related to Al(k) (see, e.g., [RS99]) by the formula:
and [RG01, RS99, RS00] present new numerical methods for solving this problem. In [Ram02d] (see also [Ram04b, Ram05a]) it is proved that the R. Newton-P. Sabatier method for solving the inverse scattering problem with the fixed-energy phase shifts as the data (see [CS89, New82]) is fundamentally wrong in the sense that its foundation is wrong. In [Ram02c] a counterexample is given to a uniqueness theorem claimed in a modification of R. Newton's inversion scheme.
Phase shifts for a spherically symmetric potential can be computed by a
variety of methods, e.g. by a variable phase method described in [Cal67]. The
computation involves solving a nonlinear ODE for each phase shift. However,
if the potential is compactly supported and piecewise-constant, then a much
simpler method described in [ARS99] and [GRS02] can be used. We refer the
reader to these papers for details.
Let q0(r) be a spherically symmetric piecewise-constant potential, {δ(k, l)}_{l=1}^{N} be the set of its phase shifts for a fixed k > 0 and a sufficiently large N. Let q(r) be another potential, and let {δ̂(k, l)}_{l=1}^{N} be the set of its phase shifts. The best fit to data function Φ(q, k) is defined by
Φ(q, k) = Σ_{l=1}^{N} |δ(k, l) − δ̂(k, l)|² / Σ_{l=1}^{N} |δ(k, l)|²,   (30)
The phase shifts are known to decay rapidly with l, see [RAI98]. Thus, for sufficiently large N, the function Φ is practically the same as the one which would use all the shifts in (30). The inverse problem of the reconstruction of the potential from its fixed-energy phase shifts is reduced to the minimization of the objective function Φ over an appropriate admissible set.
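Under one reading of formula (30), a normalized least-squares misfit of the first N phase shifts, the objective can be sketched as follows (the numerical shift values below are invented for illustration):

```python
# Sketch of the best-fit-to-data function (30): relative l2 misfit
# between the measured phase shifts of q0 and those of a trial q.

def best_fit(delta0, delta):
    # delta0: phase shifts of the target potential q0 at fixed k
    # delta:  phase shifts of a trial potential q at the same k
    num = sum((a - b) ** 2 for a, b in zip(delta0, delta))
    den = sum(a ** 2 for a in delta0)
    return num / den

delta0 = [1.2, 0.6, 0.25, 0.08, 0.02]   # rapidly decaying with l (toy values)
delta = [1.1, 0.55, 0.3, 0.07, 0.02]
mismatch = best_fit(delta0, delta)
```

Since the shifts decay rapidly with l, truncating the sums at a moderate N changes the objective very little, which is exactly the point made in the text.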
problems with large stability indices have distinct minimizers with practi-
cally the same values of the objective function. If no additional information
is known, one has an uncertainty of the minimizer's choice. The stability in-
dex provides a quantitative measure of this uncertainty or instability of the
minimization.
If De < η, where η is an a priori chosen threshold, then one can solve the global minimization problem stably. The above general scheme does not discuss in detail possible algorithms for computing the Stability Index.
One idea to construct such an algorithm is to iteratively estimate stability indices of the minimization problem and, based on this information, to conclude whether the method has achieved a stable minimum.
One such algorithm is the Iterative Reduced Random Search (IRRS) method, which uses the Stability Index for its stopping criterion. Let a batch H of L trial points be randomly generated in the admissible set Aadm. Let γ be a certain fixed fraction, e.g., γ = 0.01. Let Smin be the subset of H containing the points {pi} with the smallest γL values of the objective function Φ in H. We call Smin the minimizing set. If all the minimizers in Smin are close to each other, then the objective function Φ is not flat near the global minimum. That is, the method identifies the minimum consistently. Let || · || be a norm in the admissible set.
Let
ε = max_j Φ(pj) − min_j Φ(pj)
and
De = diam(Smin) = max{||pi − pj|| : pi, pj ∈ Smin}.   (33)
Then De can be considered an estimate of the Stability Index of the minimization problem. The Stability Index reflects the size of the minimizing sets. Accordingly, it is used as a self-contained stopping criterion for an iterative minimization procedure. The identification is considered to be stable if the Stability Index De < η, for an a priori chosen η > 0. Otherwise, another batch of L trial points is generated, and the process is repeated. We used β = 1.1, as described below, in the stopping criterion to determine if subsequent iterations do not produce a meaningful reduction of the objective function.
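One IRRS batch and the stability index estimate (33) can be sketched as follows. The objective, the two-dimensional admissible set, the Euclidean norm, and the threshold η are toy choices for illustration, not the setting of the chapter:

```python
import random

# Sketch of one IRRS batch: generate L trial points, take the gamma*L
# best ones as the minimizing set S_min, and estimate the Stability
# Index as its diameter, as in (33).
random.seed(7)
L_TRIALS, GAMMA, ETA = 5000, 0.01, 0.2

def objective(p):
    # toy objective with a single sharp global minimum at (0.5, 0.5)
    return (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2

batch = [(random.random(), random.random()) for _ in range(L_TRIALS)]
s_min = sorted(batch, key=objective)[: int(GAMMA * L_TRIALS)]
d_e = max(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
          for p in s_min for q in s_min)       # diameter of S_min, eq. (33)
stable = d_e < ETA                             # a priori threshold eta
```

For a sharp, well-separated minimum the best trial points cluster, so d_e is small and the identification is declared stable; for a flat or multi-valley objective the minimizing set spreads out and d_e stays large.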
More precisely
We used β = 1.1, ε = 0.02 and jmax = 30. The choice of these and other parameters (L = 5000, γ = 0.01, ν = 0.16) is dictated by their meaning in the
δh(k, l) = δ(k, l)(1 + (0.5 − z)h),
where z is a random variable uniformly distributed on [0, 1].
The distance d(p1(r), p2(r)) for potentials in step 5 of the IRRS algorithm was computed as
d(p1(r), p2(r)) = ||p1(r) − p2(r)||,
where the norm is the L2-norm in ℝ¹.
The results of the identification algorithm (the Stability Indices) for dif-
ferent iterations of the IRRS algorithm are shown in Tables 6-8.
For example, Table 8 shows that for k = 2.5, h = 0.00 the Stability Index has reached the value 0.013621 after 2 iterations. According to the stopping criterion for IRRS, the program was terminated with the conclusion that the identification was stable. In this case the potential identified by the program was
and
p2(r) = −9.999565 for 0 ≤ r < 7.987208,  −1.236253 for 7.987208 ≤ r < 8.102628,  0.0 for r ≥ 8.102628,
with Φ(p1) = 0.0992806 and Φ(p2) = 0.0997561. One may conclude from this
example that the threshold e = 0.02 is too tight and can be relaxed, if the
above uncertainty is acceptable.
Finally, we studied the dependence of the Stability Index on the dimension of the admissible set Aadm, see (34). This dimension is equal to 2M, where M is the assumed number of layers in the potential. More precisely, M = 3, for example, means that the search is conducted in the class of potentials having 3 or fewer layers. The experiments were conducted for the identification of the original potential q2(r) with k = 2.0 and no noise present in the data.
The results are shown in Table 9. Since the potential q2 consists of only one
layer, the smallest Stability Indices are obtained for M = 1. They gradually
increase with M. Note that the algorithm conducts the global search using random variables, so the actual values of the indices are different in every run. Still, the results show successful identification (in this case) for the
entire range of the a priori chosen parameter M. This agrees with the theoret-
ical consideration according to which the Stability Index corresponding to an
ill-posed problem in an infinite-dimensional space should be large. Reducing the original ill-posed problem to one in a space of much lower dimension regularizes the original problem.
Table 9. Stability Indices for q2(r) identification for different values of M.
Iteration M =1 M =2 M =3 M =4
1 0.472661 1.068993 1.139720 1.453076
2 0.000000 0.400304 0.733490 1.453076
3 0.000426 0.125855 0.899401
4 0.125855 0.846117
5 0.033173 0.941282
6 0.033173 0.655669
7 0.033123 0.655669
8 0.000324 0.948816
9 0.025433
10 0.025433
11 0.012586
The results we describe in this Section are taken from [Ram94a] and [Ram02a].
Assume q ∈ Q := Qa ∩ L∞(ℝ³), where Qa := {q : q(x) = q̄(x), q(x) ∈ L²(Ba), q(x) = 0 if |x| > a}, Ba := {x : |x| ≤ a}. Let A(α′, α) be the corresponding scattering amplitude at a fixed energy k²; k = 1 is taken without loss of generality. One has:
A(α′, α) = Σ_{l=0}^{∞} Al(α) Yl(α′),   (37)
where S² is the unit sphere in ℝ³, Yl(α′) = Yl,m(α′), −l ≤ m ≤ l, are the normalized spherical harmonics; summation over m is understood in (37) and in (44) below. Define the following algebraic variety:
M := {θ : θ ∈ ℂ³, θ · θ = 1},  θ · w := Σ_{j=1}^{3} θj wj.   (38)
This variety is non-compact, intersects ℝ³ over S², and, given any ξ ∈ ℝ³, there exist (many) θ, θ′ ∈ M such that
satisfy (39) for any complex numbers c1 and c2 satisfying the last equation (40) and such that |c1|² + |c2|² → ∞. There are infinitely many such c1, c2 ∈ ℂ. Consider a subset M′ ⊂ M consisting of the vectors θ = (sin ϑ cos φ, sin ϑ sin φ, cos ϑ), where ϑ and φ run through the whole complex plane. Clearly θ ∈ M, but M′ is a proper subset of M. Indeed, any θ ∈ M with θ3 ≠ ±1 is an element of M′. If θ3 = ±1, then cos ϑ = ±1, so sin ϑ = 0 and one gets θ = (0, 0, ±1) ∈ M′. However, there are vectors θ = (θ1, θ2, 1) ∈ M which do not belong to M′. Such vectors one obtains choosing θ1, θ2 ∈ ℂ such that θ1² + θ2² = 0. There are infinitely many such vectors. The same is true for vectors (θ1, θ2, −1). Note that in (39) one can replace M by M′ for any ξ ∈ ℝ³, ξ ≠ 2e3.
Let us state two estimates proved in [Ram94a]:
sup_{α∈S²} |Al(α)| ≤ c (ae/(2l))^l,   (41)
where c > 0 is a constant depending on the norm ||q||_{L²(Ba)}, and
|Yl(θ)| ≤ e^{r|Im θ|} / (√(4π) |jl(r)|),  ∀r > 0,  θ ∈ M′,   (42)
where
jl(r) := √(π/(2r)) Jl+1/2(r) = (1/(2l)) (er/(2l))^l [1 + o(1)]  as l → ∞,   (43)
and Jl(r) is the Bessel function regular at r = 0. Note that Yl(α′), defined above, admits a natural analytic continuation from S² to M by taking ϑ and φ to be arbitrary complex numbers. The resulting θ′ ∈ M′ ⊂ M.
The series (37) converges absolutely and uniformly on the sets S² × Mc, where Mc is any compact subset of M.
Fix any numbers a1 and b such that a < a1 < b. Let || · || denote the L²(a1 ≤ |x| ≤ b)-norm. If |x| > a, then the scattering solution is given analytically:
hl(r) := e^{i(l+1)π/2} √(π/(2r)) H_{l+1/2}^{(1)}(r),
where H^{(1)}(r) is the Hankel function, and the normalizing factor is chosen so that hl(r) = e^{ir}/r [1 + o(1)] as r → ∞. Define
where the infimum is taken over all ν ∈ L²(S²), and (39) holds.
It is proved in [Ram94a] that
||ρ(x, ν)|| ≤ 2d(θ),   (48)
where in place of the factor 2 in (48) one could put any fixed constant greater than 1.
b) Any such ν(α, θ) generates an estimate of q̂(ξ) with the error O(1/|ξ|), |ξ| → ∞. This estimate is calculated by the formula
The norm of q in the above Theorem can be any norm such that the set {q : ||q|| ≤ const} is a compact set in L²(Ba).
In [Ram94a, Ram02a] an inversion algorithm is formulated also for noisy
data, and the error estimate for this algorithm is obtained. Let us describe
these results.
Assume that the scattering data are given with some error: a function Aδ(α′, α) is given such that
uδ(x, α) := e^{iα·x} + Σ_{l=0}^{N(δ)} Aδl(α) Yl(α′) hl(r),   (55)
the supremum is taken over θ ∈ M and ν ∈ L²(S²) under the constraint (60). By c we denote various positive constants.
Given ξ ∈ ℝ³ one can always find θ and θ′ such that (39) holds. We prove that ϑ(δ) → ∞; more precisely:
Let the pair θ(δ) and νδ(α, θ) be any approximate solution to problem (59)-(60) in the sense that
\0m > ^ . (62)
Calculate
q̂δ := −4π ∫ Aδ(θ′, α) νδ(α, θ) dα.   (63)
In [Ram94a] estimates (50) and (64) were formulated with the supremum taken over an arbitrarily large but fixed ball of radius ξ0. Here these estimates are improved: ξ0 = ∞. The key point is: the constant c > 0 in the estimate (47) does not depend on θ.
Remark. In [Ram96] (see also [Ram92a, Ram02a]) an analysis of the approach to the ISP, based on the recovery of the DN (Dirichlet-to-Neumann) map from the fixed-energy scattering data, is given. This approach is discussed below.
The basic numerical difficulty of the approach described in Theorems 1 and 2 comes from solving problem (46) for exact data, and problem (59)-(60) for noisy data. Solving (46) amounts to finding a global minimizer of a quadratic form of the variables cl, if one takes ν in (45) as a linear combination of the spherical harmonics: ν = Σ_{l=0}^{L} cl Yl(α). If one uses the necessary condition for a minimizer of a quadratic form, that is, a linear system, then the matrix of this system is ill-conditioned for large L. This causes the main difficulty in the numerical solution of (46). On the other hand, there are methods for global minimization of quadratic functionals, based on the gradient descent, which may be more efficient than using the above necessary condition.
Let
w = ∫ g(x, s) σ(s) ds,   (69)
where σ is some function, which we find below, and g is the Green function (resolvent kernel) of the Schroedinger operator, satisfying the radiation condition at infinity. Then
wN+ = wN− + σ,   (70)
where N is the outer normal to S, so N is directed along the radius-vector. We require w = f on S. Then w is given by (68) in the exterior of S, and
r An r
where u is the scattering solution,
From (68), (72) and (73) one gets an equation for finding σ ([Ram96], eq. (23); see also [Ram94a], p. 199):
i^ = a\-iyTM'^^i'jM)Sw+Ainhv{a% (75)
are the Fourier coefficients of the scattering amplitude. Problems (74) and (75) are very ill-posed (see [Ram96] for details).
This approach faces many difficulties:
1) The construction of the DN map from the scattering data is a very ill-posed problem.
2) The construction of the potential from the DN map is a very difficult problem numerically, because one has to solve a Fredholm-type integral equation (equation (66)) whose kernel contains G, defined in (67). This G is a tempered distribution, and it is very difficult to compute it.
3) One has to calculate a limit of an integral whose integrand grows ex-
ponentially to infinity if a factor in the integrand is not known exactly. The
solution of equation (66) is one of the factors in the integrand. It cannot be
known exactly in practice because it cannot be calculated with arbitrary ac-
curacy even if the scattering data are known exactly. Therefore the limit in
formula (65) cannot be calculated accurately.
No error estimates are obtained for this approach.
In contrast, in Ramm's method, there is no need to compute G, to solve
equation (66), to calculate the DN map from the scattering data, and to
compute the limit (65). The basic difficulty in Ramm's inversion method for
exact data is to minimize the quadratic form (46), and for noisy data to
solve the optimization problem (59)-(60). The error estimates are obtained for Ramm's method.
In this section we present a novel numerical method for Direct Obstacle Scat-
tering Problems based on the Modified Rayleigh Conjecture (MRC). The basic
u = u0 + A(α′, α) e^{ikr}/r + o(1/r),  r := |x| → ∞,  α′ := x/r.   (77)
Here u0 := e^{ikα·x} is the incident field, v := u − u0 is the scattered field, A(α′, α) is called the scattering amplitude, its k-dependence is not shown, k > 0 is the wavenumber. Denote
over c ∈ ℂ^N, where c = {clj}. That is, the total field u = g(x) + v is desired to be as close to zero as possible at the boundary S, to satisfy the required
Φ(c) = ||g + Σ_{j=1}^{J} Σ_{l=0}^{L} clj ψl(·, xj)||_{L²(S)}
over c ∈ ℂ^N, where c = {clj}.
Let the minimal value of Φ be r_min.
c) Let
vε(x) := vε(x) + Σ_{j=1}^{J} Σ_{l=0}^{L} clj ψl(x, xj),  x ∈ D′.
3. Stopping criterion.
a) If r_min < ε, then stop.
b) If r_min > ε and n ≠ Nmax, let
g(x) := g(x) + Σ_{j=1}^{J} Σ_{l=0}^{L} clj ψl(x, xj),  x ∈ S
Let a ball BR := {x : |x| ≤ R} contain the obstacle D. In the region r > R the solution to (76)-(77) is:
u(x, α) = e^{ikα·x} + Σ_{l=0}^{∞} Al(α) ψl,  ψl := Yl(α′) hl(kr),  r > R,  α′ = x/r,   (81)
where the sum includes the summation with respect to m, −l ≤ m ≤ l, and Al(α) are defined in (78).
The Rayleigh conjecture (RC) is: the series (81) converges up to the bound-
ary S (originally RC dealt with periodic structures, gratings). This conjecture
is false for many obstacles, but is true for some ([Bar71, Mil73, Ram86]). For
example, if n = 2 and D is an ellipse, then the series analogous to (81) converges in the region r > a, where 2a is the distance between the foci of the ellipse [Bar71]. In the engineering literature there are numerical algorithms,
based on the Rayleigh conjecture. Our aim is to give a formulation of a Mod-
ified Rayleigh Conjecture (MRC) which holds for any Lipschitz obstacle and
can be used in numerical solution of the direct and inverse scattering problems
(see [Ram02b]). We discuss the Dirichlet condition, but a similar argument is applicable to the Neumann boundary condition, corresponding to acoustically hard obstacles.
Fix ε > 0, an arbitrarily small number.
Lemma 1. There exist L = L(ε) and cl = cl(ε) such that (82) holds.
Theorem 3. For an arbitrarily small ε > 0 there exist L(ε) and cl(ε), 0 ≤ l ≤ L(ε), such that (82), (84) and (86) hold.
with respect to cl. Analogs of Lemmas 1-3 are valid and their proofs are
essentially the same.
See [Ram04c] for an extension of these results to scattering by periodic
structures.
method with the BIEM, in spite of the fact that the numerical implementation of the MRC method in [GR02b] is considerably less efficient than the one presented in this paper.
A numerical implementation of the Random Multi-point MRC method
follows the same outline as for the Multi-point MRC, which was described in
[GR02b]. Of course, in the 2D case, instead of (79) one has
ψl(x, xj) = H_l^{(1)}(k|x − xj|) e^{ilθj},
where (x − xj)/|x − xj| = e^{iθj}.
For a numerical implementation choose M nodes {tm} on the surface S of the obstacle D. After the interior points xj, j = 1, 2, ..., J are chosen, form N vectors
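The least-squares fit over boundary samples can be sketched as follows. The setup is a toy stand-in, not the paper's: a circle of radius 2 as S, two interior "sources", and a single outgoing wave e^{ikr}/r per source instead of the Hankel-function basis (79).

```python
import math, cmath

# Sketch: sample the incident field g and the basis functions psi at the
# boundary nodes, then choose coefficients c minimizing ||g + Psi c||.
k = 1.0
nodes = [(2.0 * math.cos(2 * math.pi * m / 16),
          2.0 * math.sin(2 * math.pi * m / 16)) for m in range(16)]
sources = [(0.5, 0.0), (-0.5, 0.0)]          # interior points x_j (toy)

def g(x):                                    # incident wave e^{ik a.x}, a=(1,0)
    return cmath.exp(1j * k * x[0])

def psi(x, xj):                              # toy outgoing wave from x_j
    r = math.hypot(x[0] - xj[0], x[1] - xj[1])
    return cmath.exp(1j * k * r) / r

A = [[psi(x, s) for s in sources] for x in nodes]
gv = [g(x) for x in nodes]

def dot(u, v):                               # Hermitian inner product
    return sum(a.conjugate() * b for a, b in zip(u, v))

col = [[A[m][j] for m in range(len(nodes))] for j in range(2)]
# normal equations (A^H A) c = -A^H g, solved by Cramer's rule (2 x 2)
M11, M12 = dot(col[0], col[0]), dot(col[0], col[1])
M21, M22 = dot(col[1], col[0]), dot(col[1], col[1])
r1, r2 = -dot(col[0], gv), -dot(col[1], gv)
det = M11 * M22 - M12 * M21
c = [(r1 * M22 - M12 * r2) / det, (M11 * r2 - M21 * r1) / det]
# residual ||g + A c|| over the nodes; cannot exceed ||g|| (the c = 0 value)
residual = math.sqrt(sum(abs(gv[m] + A[m][0] * c[0] + A[m][1] * c[1]) ** 2
                         for m in range(len(nodes))))
norm_g = math.sqrt(sum(abs(v) ** 2 for v in gv))
```

The normalized residual residual/||g|| is the quantity reported in the tables below; the iterative Random Multi-point MRC keeps regenerating interior points and accumulating such fits until the residual drops below the prescribed tolerance.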
Table 10. Normalized residuals attained in the numerical experiments for 2D ob-
stacles, ||uo|| = 1.
Experiment J k a
I 4 1.0 (1.0,0.0) 0.000201 0.0001
4 1.0 (0.0,1.0) 0.000357 0.0001
4 5.0 (1.0,0.0) 0.001309 0.0001
4 5.0 (0.0,1.0) 0.007228 0.0001
II 16 1.0 (1.0,0.0) 0.003555 0.0001
16 1.0 (0.0,1.0) 0.002169 0.0001
16 5.0 (1.0,0.0) 0.009673 0.0001
16 5.0 (0.0,1.0) 0.007291 0.0001
III 16 1.0 (1.0,0.0) 0.008281 0.0001
16 1.0 (0.0,1.0) 0.007523 0.0001
16 5.0 (1.0,0.0) 0.021571 0.0001
16 5.0 (0.0,1.0) 0.024360 0.0001
IV 32 1.0 (1.0,0.0) 0.006610 0.0001
32 1.0 (0.0,1.0) 0.006785 0.0001
32 5.0 (1.0,0.0) 0.034027 0.0001
32 5.0 (0.0,1.0) 0.040129 0.0001
Table 11. Normalized residuals attained in the numerical experiments for 3D ob-
stacles, ||uo|| = 1.
Experiment  k  αi  r_min  Niter  run time
I To 00002 i 1 sec
5.0 0.001 700 7min
II 1.0 (1) 0.001 800 16 min
1.0 (2) 0.001 200 4 min
5.0 (1) 0.0035 2000 40 min
5.0 (2) 0.002 2000 40 min
III 1.0 (1) 0.001 3600 37 min
1.0 (2) 0.001 3000 31 min
5.0 (1) 0.0026 5000 53 min
5.0 (2) 0.001 5000 53 min
In the last experiment the run time could be reduced by taking a smaller
value for J. For example, the choice of J = 8 reduced the running time to about 6-10 minutes.
Numerical experiments show that the minimization results depend on the
choice of such parameters as J, Wmin, and L. They also depend on the choice
of the interior points Xj. It is possible that further versions of the MRC could
be made more efficient by finding a more efficient rule for their placement.
Numerical experiments in [GR02b] showed that the efficiency of the minimiza-
tion greatly depended on the deterministic placement of the interior points,
with better results obtained for these points placed sufficiently close to the
boundary S of the obstacle D, but not very close to it. The current choice
of a random placement of the interior points xj reduced the variance in the obtained results, and eliminated the need to provide a justified algorithm for
their placement. The random choice of these points distributes them in the
entire interior of the obstacle, rather than in a subset of it.
6.4 Conclusions.
For a 3D obstacle Rayleigh's hypothesis (conjecture) says that the acoustic field u in the exterior of the obstacle D is given by a series convergent up to the boundary of D:
u(x, α) = e^{ikα·x} + Σ_{l=0}^{∞} Al(α) Yl(α′) hl(kr).
While this conjecture (RC) is false for many obstacles, it has been modified
in [Ram02b] to obtain a valid representation for the solution of (76)-(77).
This representation (Theorem 3) is called the Modified Rayleigh Conjecture
(MRC), and is, in fact, not a conjecture, but a Theorem.
Can one use this approach to obtain solutions to various scattering prob-
lems? A straightforward numerical implementation of the MRC may fail, but,
as we show here, it can be efficiently implemented and allows one to obtain
accurate numerical solutions to obstacle scattering problems.
The Random Multi-point MRC algorithm was successfully applied to var-
ious 2D and 3D obstacle scattering problems. This algorithm is a significant
improvement over the previous MRC implementation described in [GR02b]. The
improvement is achieved by allowing the required minimizations to be done
iteratively, while the previous methods were limited by the problem size con-
straints. In [GR02b], such an MRC method was presented, and it compared favorably to the Boundary Integral Equation Method.
The Random Multi-point MRC has an additional attractive feature: it can easily treat obstacles with complicated geometry (e.g., edges and corners). Unlike the BIEM, it is easily modified to treat different obstacle shapes.
Further research on MRC algorithms is being conducted. It is hoped that the MRC in its various implementations can emerge as a valuable and efficient alternative to more established methods.
The Inverse Scattering Problem consists of finding the obstacle D from the Scattering Amplitude, or similar observed data. The Support Function Method (SFM) was originally developed in a 3-D setting in [Ram70], see also [Ram86, pp. 94-99]. It is used to approximately locate the obstacle D.
The method is derived using a high-frequency approximation to the scattered
field for smooth, strictly convex obstacles. It turns out that this inexpensive
method also provides a good localization of obstacles in the resonance region
of frequencies. If the obstacle is not convex, then the SFM yields its convex
hull.
One can restate the SFM in a 2-D setting as follows (see [GR03]). Let
D ⊂ ℝ² be a smooth and strictly convex obstacle with boundary Γ. Let
ν(y) be the unique outward unit normal vector to Γ at y ∈ Γ. Fix an incident
direction α ∈ S¹. Then the boundary Γ can be decomposed into the two parts
Γ₊ = {y ∈ Γ : ν(y)·α ≤ 0}  and  Γ₋ = {y ∈ Γ : ν(y)·α > 0},
which are, correspondingly, the illuminated and the shadowed parts of the
boundary for the chosen incident direction α.
Given α ∈ S¹, its specular point s₀(α) ∈ Γ is defined from the conditions
s₀(α)·α = min_{s∈Γ} s·α (93)
and
ν(s₀(α)) = −α. (95)
The Support Function d(α) is defined by d(α) = s₀(α)·α.
Thus |d(α)| is the distance from the origin to the unique tangent line to
Γ perpendicular to the incident vector α. Since the obstacle D is assumed
to be convex,
D = ⋂_{α∈S¹} {x ∈ ℝ² : x·α ≥ d(α)}. (97)
The boundary Γ of D is smooth, hence so is the Support Function. The
knowledge of this function allows one to reconstruct the boundary Γ using
the following procedure.
96 A.G. Ramm, S. Gutman
Parametrize unit vectors l ∈ S¹ by l(t) = (cos t, sin t), 0 ≤ t < 2π, and
define
p(t) = d(l(t)), 0 ≤ t < 2π. (98)
Equation (94) and the definition of the Support Function give
x₁ cos t + x₂ sin t = p(t). (99)
Since Γ is the envelope of its tangent lines, its equation can be found from
(99) and
−x₁ sin t + x₂ cos t = p′(t). (100)
Therefore the parametric equations of the boundary Γ are
x₁(t) = p(t) cos t − p′(t) sin t,  x₂(t) = p(t) sin t + p′(t) cos t. (101)
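The reconstruction procedure above is easy to sketch numerically. The following minimal illustration (the helper names are ours, not from the text) uses the circle of radius R centered at the origin, for which d(l) = −R in every direction l (cf. Table 13), so p(t) = −R, p′(t) = 0, and the envelope formulas should trace the circle itself:

```python
import math

def reconstruct_boundary(p, dp, num_points=8):
    # Parametric formulas: x1 = p(t) cos t - p'(t) sin t,
    #                      x2 = p(t) sin t + p'(t) cos t.
    pts = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        x1 = p(t) * math.cos(t) - dp(t) * math.sin(t)
        x2 = p(t) * math.sin(t) + dp(t) * math.cos(t)
        pts.append((x1, x2))
    return pts

# Circle of radius R centered at the origin: d(l) = min_{s in Gamma} s.l = -R,
# so p(t) = -R and p'(t) = 0; the recovered points lie on the circle itself.
R = 1.0
pts = reconstruct_boundary(lambda t: -R, lambda t: 0.0)
```

For a general convex obstacle one would tabulate p(t) from measured support-function values and approximate p′(t) by finite differences.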
So, the question is how to construct the Support Function d(l), l ∈ S¹, from
the knowledge of the Scattering Amplitude. In 2-D the Scattering Amplitude
is related to the total field u = u₀ + v by a boundary integral over Γ
(formula (102)). In the case of the "soft" boundary condition (i.e. the
pressure field satisfies the Dirichlet boundary condition u = 0), the
Kirchhoff (high frequency) approximation gives an explicit expression for
∂u/∂ν on the illuminated part Γ₊ of the boundary (formula (104)), and
consequently a representation of A(α′, α) as a single oscillatory integral
over the arc-length parameter of Γ (formula (106)).
Let ζ₀ ∈ [0, L] be such that s₀ = y(ζ₀) is the specular point of the unit
vector l, where
l = (α − α′)/|α − α′|.
Then ν(s₀) = −l, and d(l) = y(ζ₀)·l. Let
φ(ζ) := (α − α′)·y(ζ).
Then φ(ζ) = l·y(ζ)|α − α′|. Since ν(s₀) and y′(ζ₀) are orthogonal, the
stationary phase method gives
∫₀^L f(ζ) e^{ikφ(ζ)} dζ = f(ζ₀) exp( ikφ(ζ₀) + (iπ/4) sign φ″(ζ₀) ) √(2π/(k|φ″(ζ₀)|)) [1 + O(1/k)] (108)
as k → ∞.
By the definition of the curvature, κ(ζ₀) = |y″(ζ₀)|. Therefore, from the
collinearity of y″(ζ₀) and l, |φ″(ζ₀)| = |α − α′| κ(ζ₀). Finally, the strict
convexity of D, and the definition of φ(ζ), imply that ζ₀ is the unique point
of minimum of φ on [0, L], and
φ″(ζ₀)/|φ″(ζ₀)| = 1. (109)
Using (108)-(109), expression (106) becomes the asymptotic formula (110):
up to an explicit factor, A(α′, α) is proportional to
e^{ik(α−α′)·y(ζ₀)} / √(|α − α′| κ(ζ₀)) · [1 + O(1/k)],  k → ∞. (110)
This formula can be used for an approximate recovery of the curvature and the
support function (modulo 2π/(k|α − α′|)) of the obstacle, provided one knows
that the total field satisfies the Dirichlet boundary condition. The
uncertainty in the support function determination can be remedied by using
different combinations of vectors α and α′, as described in the numerical
results section.
Since it is also of interest to localize the obstacle in the case when the
boundary condition is not a priori known, one can modify the SFM as shown
in [RG04] and obtain an analogous asymptotic formula, with an additional
phase shift γ₀ depending on the impedance h, for obstacles satisfying the
impedance boundary condition
∂u/∂N + hu = 0
along the boundary Γ of the sought obstacle.
Now one can recover the Support Function d(l) from (113), and hence the
location of the obstacle.
where α′ = x/|x| = e^{iθ′}, and α = e^{iθ}. The vectors α and α′ are defined
by their polar angles shown in Table 12.
Table 12 shows that only vectors α close to the vector l are suitable for the
Scattering Amplitude approximation. This shows the practical importance of
the backscattering data. Any single combination of vectors α and α′
representing l is not sufficient to uniquely determine the Support Function
d(l) from (112), because of the phase uncertainty. However, one can remedy
this by using more than one pair of vectors α and α′, as follows.
Let l ∈ S¹ be fixed. Let
Table 12. Ratios of the approximate and the exact Scattering Amplitudes
A_a(α′, α)/A(α′, α) for l = (1.0, 0.0).
k = 1.0 | k = 5.0
Fig. 5. Identified (dotted line) and the original (solid line) obstacle D for k = 1.0.
Fig. 6. Identified points and the original obstacle D (solid line); k = 1.0.
where, given t, the vectors α and α′ are chosen as above, and the phase
function ψ(t), √2 ≤ t ≤ 2, is continuous. Similarly, let A_a(t), ψ_a(t) be the
approximate scattering amplitude and its phase defined by formula (113).
If the approximation (113) were exact for any α ∈ R(l), then the phase would
depend linearly on t,
ψ(t) ≈ C₁t + C₂,
and the value of the Support Function d(l) could be recovered from the slope
C₁ by formula (117). A formula for the boundary impedance h can also be
obtained.
However, the formula for h did not work well numerically. It could only
determine whether or not the boundary conditions were of the Dirichlet type.
Table 13 shows that the algorithm based on (117) was successful in the
identification of the circle of radius 1.0 centered at the origin for various
values of h, with no a priori assumptions on the boundary conditions. For
this circle the Support Function d(l) = −1.0 for any direction l.
Table 13. Identified values of the Support Function for the circle of radius 1.0 at
k = 3.0.
h      Identified d(l)   Actual d(l)
0.01 -0.9006 -1.00
0.10 -0.9191 -1.00
0.50 -1.0072 -1.00
1.00 -1.0730 -1.00
2.00 -0.9305 -1.00
5.00 -1.3479 -1.00
10.00 -1.1693 -1.00
100.00 -1.0801 -1.00
If
(Δ + k²)w = 0 in D′,  w|_S = h, (121)
and w satisfies the radiation condition, then ([Ram86]) one has
The following claim follows easily from the results in [Ram86], [Ram92b] (cf.
[Kir98]):
Claim: f := e^{−ikα′·z} ∈ R(B) if and only if z ∈ D.
Proof. If e^{−ikα′·z} = Bh, then Lemma 1 and (126) imply
‖∫_{S¹} A(α′, α) g(α) dα − e^{−ikα′·z}‖ < ε. (128)
In the discretized setting one seeks g from
Fg = f, (129)
where F is the discretized far-field operator and f_n, n = 1, …, N, are the
coefficients of e^{−ikα′·z} in the chosen basis, and computes
‖g‖² = Σ_{n=1}^N |f_n|²/s_n², (130)
with s_n the singular values of F. The Kirsch method replaces (129) by
(F*F)^{1/4} g = f, (131)
which leads to
‖g‖² = Σ_{n=1}^N |f_n|²/s_n. (132)
A detailed numerical comparison of the two LSMs and the linearized tomographic
inverse scattering is given in [BLW01].
The conclusions of [BLW01], as well as of our own numerical experiments,
are that the method of Kirsch (131) gives a comparable, but somewhat better,
identification than (129). The identification deteriorates significantly if the
scattering amplitude is available only for a limited aperture, or if the data
are corrupted by noise. Also, it is the points with the smallest values of
‖g‖ that are the best in locating the inclusion, and not the largest ones, as
required by the theory in [CK96, Kir98]. In Figures 7 and 8 the implementation
of the Colton-Kirsch LSM (130) is denoted by gnck, and that of the Kirsch
method (132) by gnk. The figures show a contour plot of the logarithm of ‖g‖.
In all the cases the original obstacle was the circle of radius 1.0 centered
at the point (10.0, 15.0). A similar circular obstacle identified by the
Support Function Method (SFM) is discussed in Section 10. Note that the actual
radius of the circle is 1.0, but it cannot be seen from the LSM identification.
The LSM does not require any knowledge of the boundary conditions on the
obstacle. The use of the SFM for unknown boundary conditions is discussed in
the previous section. The LSM identification was performed for the scattering
amplitude of the circle computed analytically, with no noise added. In all the
experiments the value of the parameter N was chosen to be 128.
References
[ARS99] Airapetyan, R., Ramm, A.G., Smirnova, A.: Example of two different
potentials which have practically the same fixed-energy phase shifts. Phys.
Lett. A, 254, N3-4, 141-148 (1999)
[Apa97] Apagyi, B. et al (eds): Inverse and algebraic quantum scattering theory.
Springer, Berlin (1997)
[ARS98] Athanasiadis, C., Ramm, A.G., Stratis, I.G.: Inverse Acoustic Scattering
by a Layered Obstacle. In: Ramm, A. (ed) Inverse Problems, Tomography,
and Image Processing. Plenum Press, New York, 1-8 (1998)
[Bar71] Barantsev, R.: Concerning the Rayleigh hypothesis in the problem of scat-
tering from finite bodies of arbitrary shapes. Vestnik Leningrad Univ.,
Math., Mech., Astron., 7, 56-62 (1971)
[BP96] Barhen, J., Protopopescu, V.: Generalized TRUST algorithm for global
optimization. In: Floudas C. (ed) State of The Art in Global Optimization.
Kluwer, Dordrecht (1996)
[BPR97] Barhen, J., Protopopescu, V., Reister, D.: TRUST: A deterministic algo-
rithm for global optimization. Science, 276, 1094-1097 (1997)
[Bie97] Biegler, L.T. (ed): Large-scale Optimization With Applications. In: IMA
volumes in Mathematics and Its Applications, 92-94. Springer-Verlag,
New York (1997)
[BR87] Boender, C.G.E., Rinnooy Kan, A.H.G.: Bayesian stopping rules for mul-
tistart global optimization methods. Math. Program., 37, 59-80 (1987)
[Bom97] Bomze, I.M. (ed): Developments in Global Optimization. Kluwer Acad-
emic Publ., Dordrecht (1997)
[BLW01] Brandfass, M., Lanterman, A.D., Warnick, K.F.: A comparison of the
Colton-Kirsch inverse scattering methods with linearized tomographic in-
verse scattering. Inverse Problems, 17, 1797-1816 (2001)
[Bre73] Brent, R.P.: Algorithms for Minimization without Derivatives. Prentice-Hall,
Englewood Cliffs, NJ (1973)
[Cal67] Calogero, F.: Variable Phase Approach to Potential Scattering. Academic
Press, New York and London (1967)
[CS89] Chadan, K., Sabatier, P.: Inverse Problems in Quantum Scattering The-
ory. Springer, New York (1989)
[CCM00] Colton, D., Coyle, J., Monk, P.: Recent developments in inverse acoustic
scattering theory. SIAM Rev., 42, 369-414 (2000)
[CK96] Colton, D., Kirsch, A.: A simple method for solving inverse scattering
problems in the resonance region. Inverse Problems 12, 383-393 (1996)
[CK92] Colton, D., Kress, R.: Inverse Acoustic and Electromagnetic Scattering
Theory. Springer-Verlag, New York (1992)
[CM90] Colton, D., Monk, P.: The Inverse Scattering Problem for acoustic waves
in an Inhomogeneous Medium. In: Colton D., Ewing R., Rundell W. (eds)
Inverse Problems in Partial Differential Equations. SIAM Publ. Philadel-
phia, 73-84 (1990)
[CT70] Cox, J., Thompson, K.: Note on the uniqueness of the solution of an
equation of interest in the inverse scattering problem. J. Math. Phys., 11,
815-817 (1970)
[DE94] Deep, K., Evans, D.J.: A parallel random search global optimization
method. Technical Report 882, Computer Studies, Loughborough Uni-
versity of Technology (1994)
[DS83] Dennis, J.E., Schnabel, R.B.: Numerical methods for unconstrained op-
timization and nonlinear equations. Prentice-Hall, Englewood Cliffs, NJ
(1983)
[DJ93] Dixon, L.C.W., Jha, M.: Parallel algorithms for global optimization. J.
Opt. Theor. Appl., 79, 385-395 (1993)
[EJP57] Ewing, W.M., Jardetzky, W.S., Press, F.: Elastic Waves in Layered Media.
McGraw-Hill, New York (1957)
[Mil73] Millar, R.: The Rayleigh hypothesis and a related least-squares solution to
the scattering problems for periodic surfaces and other scatterers. Radio
Sci., 8, 785-796 (1973)
[New82] Newton R.: Scattering Theory of Waves and Particles. Springer, New York
(1982)
[PRT00] Pardalos, P.M., Romeijn, H.E., Tuy, H.: Recent developments and trends
in global optimization. J. Comput. Appl. Math., 124, 209-228 (2000)
[Pol71] Polak, E.: Computational methods in optimization. Academic Press, New
York (1971)
[PTVF92] Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numeri-
cal Recipes in FORTRAN, Second Edition. Cambridge University Press
(1992)
[Ram70] Ramm, A.G.: Reconstruction of the shape of a reflecting body from the
scattering amplitude. Radiofisika, 13, 727-732 (1970)
[Ram82] Ramm, A.G.: Iterative methods for calculating the static fields and wave
scattering by small bodies. Springer-Verlag, New York, NY (1982)
[Ram86] Ramm A.G.: Scattering by Obstacles. D. Reidel Publishing, Dordrecht,
Holland (1986)
[Ram88] Ramm, A.G.: Recovery of the potential from fixed energy scattering data.
Inverse Problems, 4, 877-886 (1988)
[Ram90] Ramm, A.G.: Is the Born approximation good for solving the inverse prob-
lem when the potential is small? J. Math. Anal. Appl., 147, 480-485
(1990)
[Ram91] Ramm, A.G.: Symmetry properties for scattering amplitudes and appli-
cations to inverse problems. J. Math. Anal. Appl., 156, 333-340 (1991)
[Ram92a] Ramm, A.G.: Stability of the inversion of 3D fixed-frequency data. J.
Math. Anal. Appl., 169, 329-349 (1992)
[Ram92b] Ramm, A.G.: Multidimensional Inverse Scattering Problems. Long-
man/Wiley, New York (1992)
[Ram94a] Ramm, A.G.: Multidimensional Inverse Scattering Problems. Mir,
Moscow (1994) (expanded Russian edition of [Ram92b])
[Ram94b] Ramm, A.G.: Numerical method for solving inverse scattering problems.
Doklady of Russian Acad. of Sci., 337, 20-22 (1994)
[Ram94c] Ramm, A.G.: Stability of the solution to inverse obstacle scattering prob-
lem. J. Inverse and Ill-Posed Problems, 2, 269-275 (1994)
[Ram94d] Ramm, A.G.: Stability estimates for obstacle scattering. J. Math. Anal.
Appl., 188, 743-751 (1994)
[Ram96] Ramm, A.G.: Finding potential from the fixed-energy scattering data via
D-N map. J. of Inverse and Ill-Posed Problems, 4, 145-152 (1996)
[Ram97] Ramm, A.G.: A method for finding small inhomogeneities from surface
data. Math. Sci. Research Hot-Line, 1, 10 , 40-42 (1997)
[Ram00a] Ramm, A.G.: Finding small inhomogeneities from scattering data. Jour.
of Inverse and Ill-Posed Problems, 8, 1-6 (2000)
[Ram00b] Ramm, A.G.: Property C for ODE and applications to inverse problems.
In: Operator Theory and Its Applications, Amer. Math. Soc., Fields In-
stitute Communications, Providence, RI, 25, 15-75 (2000)
[Ram02a] Ramm, A.G.: Stability of the solutions to 3D inverse scattering problems.
Milan Journ. of Math., 70, 97-161 (2002)
[Ram02b] Ramm, A.G.: Modified Rayleigh Conjecture and applications. J. Phys.
A: Math. Gen., 35, 357-361 (2002)
1 Introduction
In real life we constantly have to make decisions under uncertainty and, more-
over, we would like to make such decisions in a reasonably optimal way. Then,
for a specified objective function F(x, ξ), depending on a decision vector
x ∈ ℝⁿ and a vector ξ ∈ ℝᵈ of uncertain parameters, we are faced with the
problem of optimizing (say minimizing) F(x, ξ) over x varying in a permissible
(feasible) set X ⊂ ℝⁿ. Of course, such an optimization problem is not well
defined, since our objective depends on an unknown value of ξ. A way of dealing
with this is to optimize the objective on average. That is, it is assumed that
ξ is a random vector¹, with known probability distribution P having support
Ξ ⊂ ℝᵈ, and the following optimization problem is formulated
¹ Sometimes, in the sequel, ξ denotes a random vector and sometimes its particular
realization (numerical value). Which of these two meanings is used will be
clear from the context.
112 A. Shapiro, A. Nemirovski
Min_{x∈X} { f(x) := E_P[F(x, ξ)] }. (1)
We assume throughout the paper that the considered expectations are well de-
fined; e.g., F(x, ·) is measurable and P-integrable.
In particular, the above formulation can be applied to the two-stage stochastic
programming problem with recourse, pioneered by Beale [Bea55] and Dantzig
[Dan55]. That is, an optimization problem is divided into two stages. At the
first stage one has to make a decision on the basis of some available informa-
tion. At the second stage, after a realization of the uncertain data becomes
known, an optimal second stage decision is made. Such a stochastic program-
ming problem can be written in the form (1), with F(x, ξ) being the optimal
value of the second stage problem.
It should be noted that in the formulation (1) all uncertainties are con-
centrated in the objective function, while the feasible set X is supposed to
be known (deterministic). Quite often the feasible set itself is defined by con-
straints which depend on uncertain parameters. In some cases one can rea-
sonably formulate such problems in the form (1) by introducing penalties for
possible infeasibilities. Alternatively, one can try to optimize the objective
subject to satisfying the constraints for all values of the unknown parameters
in a chosen (uncertainty) region. This is the approach of robust optimization
(cf., Ben-Tal and Nemirovski [BN01]). Satisfying the constraints for all
possible realizations of the random data may be too conservative and, more
reasonably, one may try to satisfy the constraints with a high (close to one)
probability. This leads to the chance, or probabilistic, constraints
formulation, which goes back to Charnes and Cooper [CC59].
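As a small illustration of the chance-constraint idea (our own toy example, not from the text): a one-dimensional constraint of the form Prob{ξ ≤ x} ≥ 1 − α can be checked, for a given x, by Monte Carlo sampling, and the smallest feasible x is then approximately the (1 − α)-quantile of ξ. The demand distribution below is an arbitrary choice for the sketch:

```python
import random

random.seed(0)

def chance_satisfied(x, alpha, samples):
    # Empirical estimate of Prob{xi <= x}, compared with the level 1 - alpha.
    hits = sum(1 for xi in samples if xi <= x)
    return hits / len(samples) >= 1 - alpha

# Hypothetical constraint: the decision x must cover a random demand
# xi ~ N(10, 2) with probability at least 0.95.
samples = [random.gauss(10.0, 2.0) for _ in range(20000)]

# Smallest grid point satisfying the chance constraint: approximately the
# 95% quantile of the demand distribution, 10 + 1.645 * 2, i.e. about 13.3.
x_feasible = min(x / 10.0 for x in range(0, 300)
                 if chance_satisfied(x / 10.0, 0.05, samples))
```

With many constraints or multivariate ξ the same sampling idea applies, but the feasibility check is no longer a simple quantile computation.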
There are several natural questions which arise with respect to formulation (1).
(i) How do we know the probability distribution P? In some cases one has his-
torical data which can be used to obtain a reasonably accurate estimate
of the corresponding probability distribution. However, this happens in
rather specific situations, and often the probability distribution either can-
not be accurately estimated or changes with time. Even worse, in many
cases one deals with scenarios (i.e., possible realizations of the random
data) with the associated probabilities assigned by a subjective judgment.
(ii) Why, at the first stage, do we optimize the expected value of the second
stage optimization problem? If the optimization procedure is repeated
many times, with the same probability distribution of the data, then it
could be argued, by employing the Law of Large Numbers, that this gives
an optimal decision on average. However, if in the process, because of the
variability of the data, one loses all one's capital, it does not help that the
decisions were optimal on average.
(iii) How difficult is it to solve the stochastic programming problem (1)? Eval-
uation of the expected value function f(x) involves calculation of the cor-
responding multivariate integrals. Only in rather specific cases can it be
On Complexity of Stochastic Programming Problems 113
given x, even for unimodal distributions and F(x, ·) := −1_S(·), where 1_S(·)
is the indicator function of a symmetric convex set S. This result was first
established by Barmish and Lagoa [BL97], where it was called the "Uniformity
Principle".
Question (ii) also has a long history. One can optimize a weighted sum
of the expected value and a term representing the variability of the second
stage objective function. For example, we can try to minimize
where c ∈ [0, 1].
It turns out that ρ(Z) is a coherent risk measure if and only if it can be
represented in the form ρ(Z) = sup_{P∈𝔓} E_P[Z], where 𝔓 is a set of probabil-
ity measures. In different frameworks this dual representation was derived in
[ADEH99, FS02, RUZ02, RS04a]. Therefore, the min-max problem (2) and
the problem of minimization of a coherent risk measure of F(x, ξ) are, in fact,
equivalent. We may refer to [ADEHK03, ER05, RieOS, RS04b] for extensions
of this approach to a multi-stage setting.
If the number of scenarios K is not "too large", then the above linear pro-
gramming problem (5) can be solved accurately in a reasonable time. However,
even a crude discretization of the probability distribution of ξ typically results
in an exponential growth of the number of scenarios with an increase in the
number d of random parameters. Suppose, for example, that the components of
the random vector ξ are mutually independently distributed, each having a small
number r of possible realizations. Then the size of the corresponding input
data grows linearly in d (and r), while the number of scenarios K = r^d grows
exponentially. Yet in some cases problem (5) can be solved numerically in a
reasonable time. For example, suppose that the matrices T and W are constant
(deterministic) and only h is random and, moreover, Q(x, ξ) decomposes into
the sum Q(x, ξ) = Q₁(x₁, h₁) + … + Qₙ(xₙ, hₙ). This happens in the case of
the so-called simple recourse.
² It is said that the recourse is relatively complete if for every x ∈ X and every
possible realization of the random data, the second stage problem is feasible.
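The exponential scenario growth described above is easy to see numerically: with r possible realizations per component and d independent components, the input size grows like r·d while K = r^d explodes (the values of r and d below are purely illustrative):

```python
# Scenario count K = r**d for d independent components, each with r
# realizations, versus the input size, which grows only linearly in d (and r).
r = 3  # realizations per component (illustrative)
for d in (2, 5, 10, 20):
    K = r ** d
    input_size = r * d  # one probability table of size r per component
    print(f"d={d:2d}  input size={input_size:3d}  scenarios K={K}")
```

Already at d = 20 the scenario count exceeds 3 billion, while the input data remain tiny.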
On Complexity of Stochastic Programming Problems 117
average
f̂_N(x) := (1/N) Σ_{j=1}^N F(x, ξ^j). (6)
Consequently, we approximate the true problem (1) by the problem:
Min_{x∈X} f̂_N(x). (7)
We refer to (7) as the Sample Average Approximation (SAA) problem. The
optimal value v̂_N and the set Ŝ_N of optimal solutions of the SAA problem
(7) provide estimates of their true counterparts of problem (1). It should
be noted that once the sample is generated, f̂_N(x) becomes a deterministic
function, and problem (7) becomes a stochastic programming problem with
N scenarios ξ¹, …, ξ^N taken with equal probabilities 1/N. It should also be
mentioned that the SAA method is not an algorithm. One still has to solve the
obtained problem (7) by employing an appropriate (deterministic) algorithm.
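A minimal sketch of the SAA recipe (our own toy example, not from the text): take F(x, ξ) to be a newsvendor-type cost, generate the sample once, and then minimize the now-deterministic function f̂_N by any deterministic method, here a simple grid search:

```python
import random

random.seed(1)
c, p = 1.0, 3.0  # hypothetical unit cost and unit price

def F(x, xi):
    # Per-scenario cost of ordering x when the demand turns out to be xi.
    return c * x - p * min(x, xi)

# Generate the sample once; f_N is then a deterministic function of x.
sample = [random.uniform(0.0, 10.0) for _ in range(5000)]

def f_N(x):
    return sum(F(x, xi) for xi in sample) / len(sample)

# Solve the SAA problem by a deterministic method (here: a grid search).
grid = [i / 10.0 for i in range(0, 101)]
x_saa = min(grid, key=f_N)
```

For uniform demand on [0, 10] the true minimizer satisfies Prob{ξ ≤ x} = (p − c)/p = 2/3, i.e. x ≈ 6.67, and the SAA minimizer lands close to it.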
By the Law of Large Numbers we have that f̂_N(x) converges (pointwise
in x) w.p.1 to f(x) as N tends to infinity. Therefore it is reasonable to ex-
pect v̂_N and Ŝ_N to converge to their counterparts of the true problem
(1) with probability one (w.p.1) as N tends to infinity. And, indeed, such
convergence can be proved under mild regularity conditions. However, for a
fixed x ∈ X, convergence of f̂_N(x) to f(x) is notoriously slow. By the Central
Limit Theorem it is of order O_p(N^{−1/2}). The rate of convergence can be im-
proved, sometimes significantly, by variance reduction methods. However, by
using Monte Carlo (Quasi-Monte Carlo) techniques one cannot evaluate the
expected value f(x) very accurately.
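The O_p(N^{−1/2}) rate is easy to observe empirically: the root-mean-square error of a sample mean shrinks by a factor of about 10 when the sample size grows by a factor of 100. A small experiment (illustrative, with a uniform integrand whose mean 1/2 is known exactly):

```python
import random

random.seed(2)

def mc_rmse(N, reps=200):
    # Root-mean-square error of the sample mean of N uniform(0,1) draws
    # around the true mean 1/2, estimated over 'reps' repetitions.
    sq_errs = [(sum(random.random() for _ in range(N)) / N - 0.5) ** 2
               for _ in range(reps)]
    return (sum(sq_errs) / reps) ** 0.5

e1, e2 = mc_rmse(100), mc_rmse(10000)
# A 100-fold larger sample buys only one extra decimal digit: ratio ~ 10.
ratio = e1 / e2
```

This is exactly why high-accuracy evaluation of f(x) by plain Monte Carlo is impractical.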
The following analysis is based on exponential bounds of the Large Devi-
ations (LD) theory (see, e.g., [DZ98] for a general discussion of LD theory).
Denote by S^ε and Ŝ_N^ε the sets of ε-optimal solutions of the true and SAA
problems, respectively, i.e., x ∈ S^ε iff x ∈ X and f(x) ≤ inf_{x∈X} f(x) + ε.
Choose accuracy constants ε > 0 and 0 ≤ δ < ε, and a significance level
α ∈ (0, 1). Suppose for the moment that the set X is finite, although its
cardinality |X| can be very large. Then by using Cramér's LD theorem it is not
difficult to show that the sample size
N ≥ (1/η(ε, δ)) log(|X|/α) (8)
guarantees solving the true problem with accuracy ε with probability at least
1 − α.
The number η(ε, δ) in the estimate (8) is defined as follows. Consider a
mapping u : X \ S^ε → X such that f(u(x)) ≤ f(x) − ε for all x ∈ X \ S^ε.
Such mappings do exist, although they are not unique. For example, any mapping
u : X \ S^ε → S satisfies this condition. The choice of such a mapping gives a
certain flexibility to the corresponding estimate of the sample size. For x ∈ X,
consider the random variable
Y_x := F(u(x), ξ) − F(x, ξ),
its moment generating function M_x(t) := E[e^{tY_x}], and the LD rate function³
I_x(z) := sup_{t∈ℝ} { tz − log M_x(t) };
then η(ε, δ) := min_{x∈X\S^ε} I_x(−δ).
Because of (9), and since δ < ε, the number I_x(−δ) is positive provided that
the probability distribution of Y_x is not "too bad". Specifically, if we assume
that the moment generating function M_x(t) of Y_x is finite valued for all t
in a neighborhood of 0, then the random variable Y_x has finite moments,
I_x(μ_x) = I′_x(μ_x) = 0, and I″_x(μ_x) = 1/σ_x², where μ_x := E[Y_x] and
σ_x² := Var[Y_x]. Consequently, I_x(−δ) can be approximated, by using a second
order Taylor expansion, as follows:
I_x(−δ) ≈ (δ + μ_x)²/(2σ_x²) ≥ (ε − δ)²/(2σ_x²).
This suggests that one can expect the constant η(ε, δ) to be of order (ε − δ)².
And, indeed, this can be ensured by various conditions. Consider the following
condition.
(A1) There exists a constant σ > 0 such that for any x′, x ∈ X, the moment
generating function M*(t) of F(x′, ξ) − F(x, ξ) − E[F(x′, ξ) − F(x, ξ)] sat-
isfies:
M*(t) ≤ exp(σ²t²/2), ∀t ∈ ℝ. (11)
Note that the random variable F(x′, ξ) − F(x, ξ) − E[F(x′, ξ) − F(x, ξ)] has
zero mean. Moreover, if it has a normal distribution, with variance σ², then
³ That is, I_x(·) is the conjugate of the function log M_x(·) in the sense of convex
analysis.
its moment generating function is equal to the right hand side of (11). Con-
dition (11) means that the tail probabilities Prob(|F(x′, ξ) − F(x, ξ)| > t) are
bounded from above by O(1) exp(−O(1)t²/σ²). This condition certainly holds if
the distribution of the considered random variable has a bounded support.
For x′ = u(x), the random variable F(x′, ξ) − F(x, ξ) coincides with Y_x, and
hence (11) implies that M_x(t) ≤ exp(μ_x t + σ²t²/2). It follows that
η(ε, δ) ≥ (δ + μ_x)²/(2σ²) ≥ (ε − δ)²/(2σ²), (13)
and hence, under assumption (A1), the estimate (8) can be written as
N ≥ (2σ²/(ε − δ)²) log(|X|/α). (14)
where ψ(t) is a convex even function with ψ(0) = 0. Then log M_x(t) ≤ μ_x t +
ψ(t), and hence I_x(z) ≥ ψ*(z − μ_x), where ψ* is the conjugate of the function
ψ. It follows then that the above bounds hold with z²/(2σ²) replaced by ψ*(z).
For example, instead of assuming that the bound (11) holds for all t ∈ ℝ,
we can assume that it holds for all t in a finite interval [−a, a], where a >
0 is a given constant. That is, we can take ψ(t) := σ²t²/2 if |t| ≤ a, and
ψ(t) := +∞ otherwise. In that case ψ*(z) = z²/(2σ²) for |z| ≤ aσ², and
ψ*(z) = a|z| − a²σ²/2 for |z| > aσ².
D := sup_{x′,x∈X} ‖x′ − x‖.
Then for r > 0 we can construct a set X_r ⊂ X such that for any x ∈ X
there is x′ ∈ X_r satisfying ‖x − x′‖ ≤ r, and |X_r| = O(1)(D/r)ⁿ. Suppose
that condition (A1) holds. Then, by (14), for ε′ > δ we can estimate the
corresponding sample size required to solve the reduced optimization problem,
obtained by replacing X with X_r; this gives the estimate (17). Taking
r := [(ε − δ)/(2L)]^{1/γ},
we obtain that with probability at least 1 − α/2 the point x̂_N is an ε′-optimal
solution of the reduced SAA problem with ε′ := (ε + δ)/2. Moreover, by taking
a sample size satisfying (17), we obtain that x̂_N is an ε′-optimal solution of
the reduced expected value problem with probability at least 1 − α/2. It follows
that x̂_N is an ε″-optimal solution of the SAA problem (1) with probability at
least 1 − α and ε″ = ε′ + Lr^γ ≤ ε. We obtain the following estimate
will discuss this in the next section. In typical applications (e.g., in the convex
case) the constant γ = 1, in which case condition (18) means that F(·, ξ) is
Lipschitz continuous on X with constant κ(ξ). However, there are also some
applications where γ could be less than 1 (cf., [Sha05a]).
We obtain the following basic positive result.
Theorem 1. Suppose that assumptions (A1) and (A2) hold and X has a finite
diameter D. Then for ε > 0, 0 ≤ δ < ε, and sample size N satisfying (22), we
are guaranteed that any δ-optimal solution of the SAA problem is an ε-optimal
solution of the true problem with probability at least 1 − α.
and hence η(ε, δ) ≥ O(1)(ε − δ)²/C². Consequently, the bound (8) for the
sample size which is required to solve the true problem with accuracy ε > 0
and probability at least 1 − α, by solving the SAA problem with accuracy
δ := ε/2, takes the form
N ≥ O(1)(C/ε)² log(|X|/α). (24)
The estimate (24) can also be derived by using Hoeffding's inequality instead
of Cramér's LD bound.
In particular, if we assume that γ = 1 and κ(ξ) = L for all ξ ∈ Ξ, i.e.,
F(·, ξ) is Lipschitz continuous on X with constant L independent of ξ ∈ Ξ,
then we can take C := DL and remove the corresponding log(2/α) term in the
right hand side of (22). By taking, further, δ := ε/2, we obtain in that case
the following estimate of the sample size:
N ≥ O(1)(DL/ε)² [ n log(DL/ε) + log(2/α) ]. (25)
Theorem 2. Suppose that X has a finite diameter D and condition (18) holds
with γ = 1 and κ(ξ) = L for all ξ ∈ Ξ. Then with sample size N satisfying
(25) we are guaranteed that every (ε/2)-optimal solution of the SAA problem
is an ε-optimal solution of the true problem with probability at least 1 − α.
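The estimate (25) can be turned into a rough sample-size calculator, provided one plugs in some value for the unspecified O(1) factor; the constant `const` and the parameter values below are assumptions for illustration only:

```python
import math

def saa_sample_size(D, L, eps, n, alpha, const=1.0):
    # Estimate (25): N >= const * (D*L/eps)**2 * (n*log(D*L/eps) + log(2/alpha)),
    # with 'const' standing in for the unspecified O(1) factor.
    q = D * L / eps
    return math.ceil(const * q * q * (n * math.log(q) + math.log(2.0 / alpha)))

# Halving the accuracy eps roughly quadruples the required sample size,
# while the dependence on the reliability 1 - alpha is only logarithmic.
N1 = saa_sample_size(D=10.0, L=5.0, eps=1.0, n=20, alpha=0.01)
N2 = saa_sample_size(D=10.0, L=5.0, eps=0.5, n=20, alpha=0.01)
```

Note the quadratic blow-up in DL/ε and only linear growth in the dimension n, which is the point of the comparison with (26) made below.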
Min_{x∈X} f(x). (CP)
In this framework, for every ε > 0 one can find an ε-solution to (CP) by
an algorithm which requires at most a number of calls to the Separation and
First Order oracles given by the bound (26), with a call accompanied by
O(n²) arithmetic operations to process the oracle's answer.
When comparing the bounds (25) and (26), our first observation is that both
of them depend polynomially on the design dimension n of the problem, which
is nice. What does make a difference between these bounds is their dependence
^^ In our context, Theorem 3 allows one to handle the most general "black box"
situation - no assumptions on F(·, ξ) and X except for convexity and computability.
When F(·, ξ) possesses an appropriate analytic structure, the complexity of solving
the SAA problem can be reduced by using a solver adjusted to this structure.
situation considered in this statement, the bound (25) is the best possible (up
to a logarithmic term) as far as the dependence on D, L and ε is concerned.
To make our presentation self-contained, we explain here the
"laws of Statistics" which underlie the above conclusions. First,
an algorithm A capable of solving, within accuracy ε and reliability 0.9,
every one of the problems (SP±1), given an N-element sample drawn
from the corresponding distribution, indeed implies a "0.9-reliable"
procedure which decides, based on the same sample, what the dis-
tribution is; this procedure accepts hypothesis I, stating that the sample
is drawn from distribution P₁, if and only if the approximate solu-
tion generated by A is in [0, D/2]; if this is not the case, the procedure
accepts hypothesis II: "the sample is drawn from P₋₁". Note that if
the first of the hypotheses is true and the outlined procedure accepts
the second one, then the approximate solution produced by A is not an
ε-solution to (SP₁), so that the probability p^I of accepting the second hy-
pothesis when the first one is true is ≤ 1 − 0.9 = 0.1. Similarly, the probability
p^II for the procedure to accept the first hypothesis when the second
one is true is ≤ 0.1. The announced lower bound on N is given by the
following observation: Consider a decision rule which, given on input
a sequence ξ^N of N independent realizations of ξ, known in advance
to be drawn either from the distribution P₁ or from the distribution
P₋₁, decides which one of these two options takes place, and let p^I,
p^II be the associated probabilities of wrong decisions. Then
denoting by K the Kullback-Leibler distance between the distributions of the
sample ξ^N under the two hypotheses, one has
p^I log(p^I/(1 − p^II)) + (1 − p^I) log((1 − p^I)/p^II) ≤ K,
and similarly with the roles of p^I and p^II interchanged.
For every p ∈ (0, 1/2), the minimum of the left hand side in the latter
inequality over p^I, p^II ∈ (0, p] is achieved when p^I = p^II = p and is equal
to p log(p/(1 − p)) + (1 − p) log((1 − p)/p) ≥ 4(p − 1/2)². Thus,
K ≥ 4(p − 1/2)². (28)
On the other hand, taking into account the product structure of P±1^N,
we have
K = N [ P₁(−L) log(P₁(−L)/P₋₁(−L)) + P₁(L) log(P₁(L)/P₋₁(L)) ].
The concluding quantity is ≤ O(1)Nν², provided that ν ≤ 0.1. Com-
bining this observation and (28), we arrive at (27).
with only the second-stage right hand side vector h = h(ξ) being random.
To see that the generic problem of checking whether (29) is feasible for a
given x is NP-hard, consider the case when the constraints Tx + Wy ≥
h(ξ) read y ≤ 0, y + x ≥ h(ξ), where x, y ∈ ℝ, and
p(Q) := max{ ξᵀQξ : ξ ∈ [−1, 1]^d }.
in finite /(x)), not speaking about minimizing over these x's. As it was men-
tioned above, the standard way to avoid, to some extent, this difficulty is to
pass to a penahzed problem. For example, we can replace the second stage
problem (4) with the penalized version:
where e is vector of ones and r > > 1 plays the role of the penalty coefficient.
With this penalization, the second stage problem becomes always feasible. At
the same time, one can hope that with large enough penalty coefficient r, the
first-stage optimal solution will lead to "nearly always nearly feasible" second-
stage problems, provided that the original problem is feasible. Unfortunately,
in the situation where one cannot tolerate a second-stage infeasibility larger
than τ arising with probability bigger than α (here α and τ are given
thresholds), the penalty parameter r should be of order of (ατ)^{-1}. In the
"high reliability" case α << 1 we end up with problem (30) which contains
large coefficients, which can lead to a large value of the Lipschitz constant L_r of
the optimal value function F_r(x, ξ) of the penalized second stage problem. As
a result, quite moderate accuracy requirements (like ε being of order of 5% of
the optimal value of the true problem) can result in the necessity to solve (30)
within a very high relative accuracy ν = ε/(D L_r), with all the
unpleasant consequences of this necessity.
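To make the penalization concrete, here is a minimal Python sketch (function names and the uniform distribution of h(ξ) are illustrative assumptions, not from the text) of the toy second stage y ≤ 0, y + x ≥ h(ξ) mentioned earlier: the penalized recourse cost pays r times the constraint violation, and its expectation can be estimated by sample averaging.

```python
import random

def penalized_recourse(x, h, r):
    # Toy second stage from the text: constraints y <= 0, y + x >= h.
    # It is feasible iff x >= h (take y = 0); the penalized version
    # instead pays r * max(0, h - x) for the violation.
    return r * max(0.0, h - x)

def saa_estimate(x, r, n_samples, seed=0):
    # Sample-average estimate of E[F_r(x, xi)], with h(xi) uniform on
    # [0, 1] (an illustrative choice of distribution).
    rng = random.Random(seed)
    return sum(penalized_recourse(x, rng.random(), r)
               for _ in range(n_samples)) / n_samples
```

Note how the Lipschitz constant of the penalized cost in x is exactly r, which is the source of the accuracy difficulty discussed above.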
and minimize under these restrictions the expected value of a given cost func-
tion f(x, y_1, ..., y_T). Note that even in the case when the functions g_i do not
depend on ξ, the left hand sides of the constraints (31) are functions of ξ,
since all y_t are so, and that the interpretation of (31) is that these functional
constraints should be satisfied with probability one.
In the sequel, we focus on the case of linear multi-stage problems
and specifying the conditional, given ξ_[T−1], expected cost of the last-stage
problem:

F_{T−1}(x, y_1, ..., y_{T−1}, ξ_[T−1]) := E_{|ξ_[T−1]} [ Min_{y_T} { F_T(x, y_1, ..., y_{T−1}, y_T, ξ_[T]) :
    A_0(ξ_[T]) x + A_1(ξ_[T]) y_1 + ... + A_T(ξ_[T]) y_T ≥ b(ξ_[T]) } ],

where E_{|ξ_[T−1]} is the conditional, given ξ_[T−1], expectation. Observe that (32)
is equivalent to the (T − 1)-stage problem:

where P^{T−1} is the distribution of ξ_[T−1]. Now we can iterate this construction,
ending up with the problem

Min_x [ F_0(x) ].
It can be easily seen that under the assumption of complete recourse, plus
mild boundedness assumptions, all functions F_t(x, y_1, ..., y_t, ξ_[t]) are Lipschitz
continuous in the x, y-arguments.
To the best of our knowledge, the complexity status of problem (32), even in
the case of complete and fixed recourse and known in advance easy-to-describe
distribution P, remains unknown (cf. [DS03]).
On Complexity of Stochastic Programming Problems 131
The "common wisdom" says that since both two-stage and multi-stage
problems are of the same generic form (1), with the integrand convex in x, and
both are processed numerically by generating a sample of scenarios and solving
the resulting "scenario counterpart" of the problem of interest, there should
not be much difference between the two-stage and the multi-stage case, provided
that in both cases one uses the same number of scenarios. This "reasoning", however,
completely ignores a crucial point: in order to solve the generated SAA
problems efficiently, the integrand F should be efficiently computable at every
pair (x, ξ). This is indeed the case for a two-stage problem, since there F(x, ξ)
is the optimal value in an explicit Linear Programming problem and as such
can be computed in polynomial time. In contrast to this, the integrand F
produced by the outlined scheme, as applied to a multi-stage problem, is
not easy to compute. For example, in a 3-stage problem this integrand is the
optimal value in a 2-stage stochastic problem, so that its computation at a
point is a much more computationally involved task than the similar task in the
two-stage case. Moreover, in order to get just consistent estimates in an SAA
type procedure (not talking about rate of convergence) one needs to employ a
conditional sampling which typically results in an exponential growth of the
number of generated scenarios with increase of the number T of stages (cf.,
[Sha05a]).
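The exponential growth of conditional sampling is easy to quantify: with a fixed branching factor per stage, the scenario tree has branching^(T−1) leaves. A trivial sketch (names illustrative):

```python
def scenario_tree_size(branches_per_stage, n_stages):
    # Conditional sampling builds a scenario tree: each of the
    # n_stages - 1 recourse stages branches into `branches_per_stage`
    # children, so the total number of generated scenarios is
    # branches_per_stage ** (n_stages - 1): exponential in the
    # number of stages.
    return branches_per_stage ** (n_stages - 1)
```

For instance, 100 samples per stage give 100 scenarios for a two-stage problem but 10^8 for a five-stage one.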
Analysis demonstrates that for an algorithm of the SAA type, the total
number of scenarios needed to solve a T-stage problem (32) with complete
recourse would grow, as ε diminishes, as ε^{-O(T)}, so that the computational
effort blows up exponentially as the number of stages grows (cf. [Sha05b]).
Equivalently, for sampling-based algorithms with a given number of sce-
narios, existing theoretical quality guarantees deteriorate dramatically as the
number of stages grows. Of course, nobody told us that sampling-type algo-
rithms are the only way to handle stochastic problems, so that the outlined
reasoning does not pretend to justify "severe computational intractability" of
multi-stage problems. Our goal is more modest: we only argue that the fact
that a large sample of scenarios was used when solving a particular stochastic
program does not say much about the quality of the resulting solution:
in the two-stage case, there are good reasons to believe that this quality is
reasonable, while in the 5-stage case the quality may be disastrously bad.
We have described one source of severe difficulty arising when solving
multi-stage stochastic problems - the dramatic growth, with the number
of stages, of the complexity of evaluating the integrand F in representation
(1) of the problem. We are about to demonstrate that even when this difficulty
does not arise, a multi-stage problem still may be very difficult. To this end,
consider the following story: at time t = 0, one has $1, and should decide how
to distribute this money between stocks and a bank account. When investing
an amount of money x into stocks, the value u_t of the portfolio at time t will be

u_t = x ∏_{s=1}^t ρ_s(ξ_[s]),

where the returns ρ_t(ξ_[t]) = ρ_t(ξ_1, ..., ξ_t) ≥ 0 are known functions of the
underlying random parameters. The amount of money 1 − x put into the bank
account reaches at time t the value v_t = ρ^t (1 − x), where ρ > 0 is a given constant.
The goal is to maximize the total expected wealth E[u_T + v_T] at a given time T.
The problem can be written as a simple-looking T-stage stochastic problem
of the form (32):
Min E_P [ u_T(ξ_[T]) + v_T(ξ_[T]) ]    (33)
s.t. 0 ≤ x ≤ 1    (C_0)
     u_1(ξ_[1]) = ρ_1(ξ_[1]) x,  v_1(ξ_[1]) = ρ (1 − x)    (C_1)
     u_2(ξ_[2]) = ρ_2(ξ_[2]) u_1(ξ_[1]),  v_2(ξ_[2]) = ρ v_1(ξ_[1])    (C_2)
     ...
where y(·) = (u_t(·), v_t(·))_{t=1}^T. Now let us specify the structure and the distri-
bution of ξ as follows: a realization of ξ is a permutation ξ = (ξ_1, ..., ξ_T) of the T
elements 1, ..., T, and P is the uniform distribution on the set of all T! possible
permutations. Further, let us specify the returns as follows: the returns are
given by a T × T matrix A with 0-1 elements, and

ρ_t(ξ_1, ..., ξ_t) := κ A_{t ξ_t},   κ := (T!)^{1/T}.
(Note that by Stirling's formula κ = (T/e)(1 + o(1)) as T → ∞.) We end up
with a simple-looking instance of (32) with complete recourse and given in
advance "easy-to-describe" discrete distribution P; when represented in the
form of (1), our problem becomes

Min_{x ∈ [0,1]} { f(x) := E_P F(x, ξ) },   F(x, ξ) = ρ^T (1 − x) + x ∏_{t=1}^T ( κ A_{t ξ_t} ),   (34)
so that F indeed is easy to compute. Thus, problem (33) looks nice - complete
recourse, simple and known in advance distribution, no large data entries,
easy-to-compute F in representation (1). At the same time the problem is
disastrously difficult. Indeed, from (34) it is clear that f(x) = ρ^T (1 − x) +
x per(A), where per(A) is the permanent of A:

per(A) = Σ_ξ ∏_{t=1}^T A_{t ξ_t},

the sum being taken over all permutations ξ of 1, ..., T. Since f is affine in x,
minimizing it over [0, 1] amounts to deciding whether
per(A) > ρ^T. Thus, our simple-looking T-stage problem is, essentially, the
problem of computing the permanent of a T × T matrix with 0-1 entries. The
problem of computing the permanent of a T x T matrix with 0-1 entries. The
latter problem is known to be really difficult. First of all, it is NP-hard, [Val79].
Further, there are strong theoretical reasons to doubt that the permanent can
be efficiently approximated within a given relative accuracy ε, provided that
ε > 0 can be arbitrarily small [DLMV88]. The best algorithm known to us
capable of computing the permanent of a T × T 0-1 matrix within relative accuracy ε
has running time as large as ε^{-2} exp{O(1) T^{1/2} log^2(T)} (cf. [JV96]), while the
best efficient algorithm known to us for approximating the permanent has relative
error as large as c^T with a certain fixed c > 1, see [LSW00]. Thus, simple-looking
multi-stage stochastic problems can indeed be extremely difficult...
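The permanent can be computed exactly by Ryser's inclusion-exclusion formula; its 2^T term count illustrates why exact computation is hopeless already for moderate T. A minimal sketch:

```python
from itertools import combinations

def permanent(A):
    # Ryser's inclusion-exclusion formula for the permanent:
    # per(A) = (-1)^n * sum over nonempty column subsets S of
    #          (-1)^|S| * prod_i ( sum_{j in S} A[i][j] ).
    # Running time O(2^n * n^2): exponential in n, in line with the
    # #P-hardness cited in the text.
    n = len(A)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total
```

For the all-ones T × T matrix this returns T!, matching the upper bound on the products in (34).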
A reader could argue that in fact we deal with a two-stage problem (34)
rather than with a multi-stage one, so that the outlined difficulties have noth-
ing to do with our initial multi-stage setting. Our counter-argument is that
the two-stage problem (34) honestly says about itself that it is very difficult:
with moderate ρ and T, the data in (34) can be astronomically large (look at
the coefficient ρ^T of (1 − x) or at the products ∏_{t=1}^T (κ A_{t ξ_t}), which can be as
large as κ^T = T!), and so is the Lipschitz constant of F. In contrast to this,
the structure and the data in (33) look completely normal. Of course, it is
immediate to recognize that this "nice image" is just a disguise, and in fact
we are dealing with a disastrously difficult problem. Imagine, however, that
we add to (33) a number of redundant variables and constraints; how could
your favorite algorithm (or you, for that matter) recognize in the resulting
messy problem that solving it numerically is, at least at the present level of
our knowledge, a completely hopeless endeavor?
p̂_N(x) := (1/N) Σ_{i=1}^N 1_{{g(x, ξ^i) ≤ 0}} ≥ 1 − θε,   (36)
There is another generic case when the feasible set given by a chance
constraint is convex. This is the case when the constraint can be rep-
resented in the form (x, ξ) ∈ Q, where Q is a closed and convex set,
and the distribution P of the random vector ξ ∈ R^d is logarithmically
quasi-concave, meaning that

P(λA + (1 − λ)B) ≥ min [ P(A), P(B) ]

for all λ ∈ [0, 1] and all closed and convex sets A, B ⊂ R^d (cf. Prekopa [Pre95]). Ex-
amples include uniform distributions on closed and bounded convex
domains, the normal distribution, and every distribution on R^d with den-
sity f(ξ) with respect to the Lebesgue measure such that the function
f^{−1/d}(ξ) is convex. The related result (due to Prekopa [Pre95]) is that
in the situation in question, the set {x : P({ξ : (x, ξ) ∈ Q}) ≥ α} is
closed and convex for every α. This result can be applied, e.g., to
two-stage stochastic programs with chance constraints of the form

Min_{x ∈ X} ⟨c, x⟩  s.t.  Prob{ ∃y ∈ Y : Tx + Wy ≥ ξ } ≥ 1 − ε,

where X, Y are closed convex sets and T, W are fixed matrices. Here
the chance constraint indeed is of the form Prob{ (x, ξ) ∈ Q } ≥ 1 − ε,
where

Q = { (x, ξ) : ∃y ∈ Y : Tx + Wy ≥ ξ }.

The set Q clearly is convex; under mild additional assumptions, it is
also closed. Thus, the feasible set of the chance constraint in question
is convex, provided that the distribution of ξ is logarithmically quasi-
concave.
Note that the outlined convexity results are applicable only to
chance constraints coming from scalar or vector inequalities where
the only term affected by uncertainty is the right hand side, not the
coefficients at the variables. For example, nothing similar is known for
the chance constraint

Prob{ ⟨a* + ξ, x⟩ ≤ 0 } ≥ 1 − ε,

except for the already mentioned case of a normally distributed vector ξ.
Aside from the few special cases we have mentioned, chance constraint (35) "as it
is" seems to be too difficult for efficient numerical processing, and what we
can try to do is to replace it with its "tractable approximation". For the
time being, there exist two approaches to building such an approximation:
"deterministic" and "scenario".
ψ(x) ≤ 0,   (38)

which is a "safe computationally tractable" approximation of (35), with the
latter notion defined as follows:
1. "Safety" means that the validity of (38) is a sufficient condition for the
validity of (35);
2. "Tractability" means that (38) is an explicitly given convex constraint.
Just to give an example, consider a randomly perturbed linear constraint, that
is, assume that

e.g., ξ_i can have a distribution supported on the interval [−1, 1], or ξ_i can have
normal distribution N(0, 2^{−1/2}), i = 1, ..., d. In this case, applying standard
results on probabilities of large deviations for sums of "light tail" independent
random variables with zero means, one can easily verify that when ε ∈ (0, 1)
and Ω(ε) = O(1)√(log(1/ε)) with a properly chosen absolute constant O(1), then
the validity of the convex constraint

is a sufficient condition for the validity of (35). (Note that under our assump-
tions MM^T is an upper bound on the covariance matrix of ξ, and compare
with (37).)
The simple result we have just described is rather attractive. First, it does
not require a detailed knowledge of the distribution of ξ. Second, the approx-
imation, although more complicated than the linear constraint we started
with, still is pretty simple; modern convex optimization techniques can routinely
process to high accuracy problems with thousands of decision variables and
thousands of constraints of the form (39). Third, the approximation is "not
too conservative" - the safety parameter Ω(ε) grows pretty slowly as ε → 0
and is only by a moderate constant factor larger than the safety parameter
in the case of Gaussian noise, where our approximation is not conservative at
all.
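The large-deviation bound behind Ω(ε) can be checked empirically. The sketch below uses the Hoeffding-style choice Ω(ε) = √(2 log(1/ε)) - an assumption on our part, since the text only specifies Ω(ε) = O(1)√(log(1/ε)) - which is valid for independent zero-mean disturbances ξ_i supported on [−1, 1].

```python
import math
import random

def safety_factor(eps):
    # Hoeffding-style choice (an assumption; the text only gives
    # Omega(eps) = O(1) * sqrt(log(1/eps))): for zero-mean xi_i in [-1, 1],
    # P( sum_i w_i xi_i > Omega * ||w||_2 ) <= exp(-Omega^2 / 2) <= eps.
    return math.sqrt(2.0 * math.log(1.0 / eps))

def violation_frequency(w, eps, n_trials=20000, seed=1):
    # Monte Carlo check that the safety factor keeps the violation
    # probability of the perturbed linear constraint below eps.
    rng = random.Random(seed)
    omega = safety_factor(eps)
    norm = math.sqrt(sum(wi * wi for wi in w))
    bad = sum(1 for _ in range(n_trials)
              if sum(wi * rng.uniform(-1.0, 1.0) for wi in w) > omega * norm)
    return bad / n_trials
```

In practice the observed violation frequency is far below ε, illustrating the "moderate constant factor" of conservatism mentioned above.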
Recently, "not too conservative" computationally tractable safe approxi-
mations were built (see [Nem03]) for chance versions of well-structured non-
linear convex constraints with nice analytic structure, specifically, for affinely
perturbed least squares constraints and affinely perturbed Linear Matrix
Inequality constraints

[A_0 + Σ_{j=1}^n x_j A_j] + Σ_i ξ_i [A_0^i + Σ_{j=1}^n x_j A_j^i] ⪰ 0
(the A's are symmetric matrices; A ⪰ 0 means that A is symmetric positive semi-
definite). In both cases, the ξ_i are independent scalar disturbances with zero mean
and "of order of 1". However, the outlined approach, however promising we
believe it is, seemingly works for a very restricted family of "well-structured"
functions g(x, ξ), and even in these cases requires a lot of highly nontrivial
"tailoring" to the particular structure in question. Consider, for example, the
case of the chance constraint associated with a two-stage linear stochastic problem:

where e is the vector of ones. Note that here g(x, ξ) is convex in x, and g(x, ξ) ≤ 0
if and only if the second-stage problem

is feasible (cf. (30)). Thus, the chance constraint requires from x that the
second stage problem be feasible with probability at least 1 − ε. Even in the
case of simple recourse (T, W independent of ξ) the chance constraint in
question seems to be by far too difficult to admit a safe tractable deterministic
approximation.
Scenario approximation.
How large should the sample size N be in order for the optimal solution
x_N of (43) to be feasible for (42) with probability at least 1 − ε?
The difference between the latter question and the former one is that now we
do not require all points feasible for (41) to satisfy (35); we require this
property to be possessed by a specific point x_N we are interested in.
As was discovered in [CC05, CC04], question (Q) admits a nice "uni-
versal" answer. Namely, under extremely mild assumptions it turns out that
whenever ε, δ ∈ (0, 1/2) and
It should be added that the outlined "crude" scenario approach is not completely
satisfactory even when ε is not too small. Indeed, assume that your problem has
n = 100 variables and you are ready to take 10% chances (ε = δ = 0.1). To this
end, you use the scenario approach with the smallest N allowed by (44), that is,
N = 9835. What should be the actual probability ε′ for a fixed point x to violate
One could be surprised by the fact that we treat as acceptable the SAA
method with complexity proportional to ε^{-2}, ε being the required tol-
erance in terms of the objective, and are dissatisfied with the scenario ap-
proach, where the sample size is merely inversely proportional to the tolerance
ε. To explain our point, think whether you would agree (a) to use a portfolio
management policy with the average profit by at most 0.5% less than the
"ideal" - the optimal - one, and (b) to board an airliner which may crash
during the flight with probability 0.5% (or 0.05%).
When handling hard chance constraints - those with really small ε -
we would like to have sample sizes polynomial in both log(1/ε) and
log(1/δ) rather than polynomial in 1/ε and log(1/δ). We are about to
explain that under favorable circumstances, such a possibility does exist; it is
given by combining the scenario approach with a kind of importance sampling. To
proceed, assume that the constraint g(x, ξ) ≤ 0 underlying (35) is of a specific
structure as follows: there exists a closed convex set K ⊂ R^m and an affine
mapping x ↦ A[ξ]x + b[ξ] : R^n → R^m depending on ξ as on a parameter such
that

g(x, ξ) ≤ 0 ⟺ A[ξ]x + b[ξ] ∈ K.   (45)

Moreover, let us assume that the affine mapping in question is affinely para-
meterized by ξ, that is, both A[ξ] and b[ξ] depend affinely on ξ. Finally, we
may assume without loss of generality that ξ has zero mean.
Note that K is a convex polyhedral (and thus closed) set. Now, it is clear
from (40) that g(x, ξ) ≤ 0 if and only if h(ξ) − T(ξ)x ∈ K. It follows that
when passing from the uncertain parameter ξ to the new uncertain parameter
ζ := [h(ξ), T(ξ)] − E{[h(ξ), T(ξ)]} and updating accordingly the underlying
distribution, we arrive at the situation described in (45).
Under our assumptions, the vector A[ξ]x + b[ξ] is affine in ξ, and thus can be
represented as α[x]ξ + β[x], where α[x], β[x] are affine in x. It follows that

g(x, ξ) ≤ 0 ⟺ A[ξ]x + b[ξ] ∈ K ⟺ ξ ∈ K_x := { u : α[x]u + β[x] ∈ K }.   (46)

Note that the set K_x is closed and convex along with K. Now, numer-
ous important distributions Π on R^d with zero mean (multivariate normal,
the constraint g(x, ξ) ≤ 0 in order to be feasible for (43) with probability 0.9?
The answer is: ε′ should be as small as about 10^{-5}. Thus, when applied with small ε,
the crude scenario approach becomes impractical, while in the case of "large" ε
it seems to be too conservative.
140 A. Shapiro, A. Nemirovski
with fixed recourse, where the cost coefficients c_t and the matrices A_t, t ≥ 1,
are not affected by uncertainty, as reflected in the notation. Besides this,
in what follows we assume that the data affected by the uncertainty (that
is, c_0(ξ), A_0(ξ), b(ξ)) are affine functions of ξ; as we remember from the
previous section, this "assumption" is in fact a convention on how we use
words: nobody forbids us to treat as the actual "random parameter" the
collection (c_0(ξ), A_0(ξ), b(ξ)) rather than ξ itself.
As we have explained, a multistage problem (even much better struc-
tured than (49)) is, generically, "severely computationally intractable". We
are about to propose a radical way to reduce the complexity of the problem,
specifically, to pass from arbitrary decision rules y_t(·) to affine ones:

where x_t, X_t are our new - deterministic! - variables (a vector and a matrix
of appropriate sizes), and Q_t ξ, Q_t being a given deterministic matrix, is the
"portion" of uncertainty which is revealed at time t and thus can be used to
make the decision y_t.
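Reading (50) as y_t(ξ) = x_t + X_t Q_t ξ (a plausible reconstruction; the display is garbled in this copy, and all names below are illustrative), an affine decision rule is trivial to evaluate once the deterministic variables x_t, X_t are chosen:

```python
def affine_rule(x_t, X_t, Q_t, xi):
    # y_t = x_t + X_t @ (Q_t @ xi): the decision depends only on the
    # portion Q_t @ xi of the uncertainty revealed by time t.
    revealed = [sum(q * z for q, z in zip(row, xi)) for row in Q_t]
    return [x + sum(a * r for a, r in zip(row, revealed))
            for x, row in zip(x_t, X_t)]
```

With Q_t masking the not-yet-observed components of ξ to zero, non-anticipativity holds by construction.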
Now let us look at the problem we end up with. When substituting linear
decision rules (50) into the constraint of (49), the constraint takes the form

The left hand side of the system of inequalities in the latter Prob{·} is affine
in ξ; thus, the constraint in question says exactly that the system should be
satisfied for all ξ from the support Ξ of the distribution P of ξ. Since the left
hand side of the system is affine in ξ, the latter requirement is equivalent to
the system being valid for all ξ ∈ Z, where Z is the closed convex hull of Ξ.
Thus, the constraint of (49) is nothing but the semi-infinite system of linear
inequalities
given by the data M,N,p. Then the semi-infinite system (51) is equiv-
alent to a finite system S of linear inequalities:
The sizes of S (that is, the row and the column sizes of A, B) are
polynomial in the sizes of the matrices AQ, AI,...,AT, M, N, and the
data A,B,q of S are readily given by the data of (51) and M, N, p
(that is, given the latter data, one can build S in polynomial time).
In fact, [BN98] asserts much more than stated by (!), namely, that (51) is
computationally tractable whenever Z is so. We, however, intend to stay
within the grasp of Linear Programming, and to this end (!) is exactly what
we need.
Example: interval uncertainty. Assume that Z is a box; without loss
of generality, we may assume that Z = {ξ : −1 ≤ ξ_i ≤ 1, i = 1, ..., d}.
Since A_0(ξ), b(ξ) are affine in ξ, (51) can be rewritten equivalently as
the semi-infinite problem

s_0^j[X] + Σ_{i=1}^d s_i^j[X] ξ_i ≤ 0  ∀ξ ∈ Z, j = 1, ..., J,   (52)

where X stands for the collection {x, {x_t, X_t}_{t=1}^T} of design variables
in (51), and the s_i^j[X] are affine functions of X readily given by the data
of (51). With our Z, the semi-infinite system (52) is clearly equivalent
to the system of constraints
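For (52) with Z = [−1, 1]^d, the worst-case value of the affine left hand side is s_0^j[X] + Σ_i |s_i^j[X]|, since the maximum over the box is attained by setting each ξ_i to the sign of its coefficient. A small sketch verifying this against brute-force vertex enumeration (function names are illustrative):

```python
from itertools import product

def robust_lhs(s0, s):
    # Worst case of s0 + sum_i s_i * xi_i over the box xi in [-1, 1]^d:
    # attained at xi_i = sign(s_i), giving s0 + sum_i |s_i|.
    return s0 + sum(abs(si) for si in s)

def brute_force_lhs(s0, s):
    # Check by enumerating the 2^d vertices of the box.
    return max(s0 + sum(si * xi for si, xi in zip(s, v))
               for v in product((-1.0, 1.0), repeat=len(s)))
```

This is exactly why the semi-infinite system collapses to finitely many explicit convex (indeed, linear, after standard splitting of the absolute values) constraints.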
Remark 2. The only reason for restricting ourselves to affine decision rules
stems from the desire to end up with a computationally tractable problem.
We do not pretend that affine decision rules approximate well the optimal
Remark 6. Passing from arbitrary decision rules to affine ones seems to reduce
dramatically the flexibility of our decision-making and thus the expected
results. Note, however, that the numerical results for inventory management
models reported in [BGGN04, BGNV04] demonstrate that affinity may well be
not as severe a restriction as one could expect it to be. In any case, we believe
that when processing multi-stage problems, affine decision rules make a good
and easy-to-implement starting point, and that it hardly makes sense to look
for more sophisticated (and by far more computationally demanding) decision
policies, unless there exists a clear indication of "severe non-optimality" of the
affine rules.
References
[ADEH99] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of
risk. Mathematical Finance, 9, 203-228 (1999)
[ADEHK03] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. Ku, H.: Coherent
multiperiod risk measurement, Manuscript, ETH Zurich (2003)
[BL97] Barmish, B.R., Lagoa, C.M.: The uniform distribution: a rigorous justifi-
cation for the use in robustness analysis. Math. Control, Signals, Systems,
10, 203-222 (1997)
[Bea55] Beale, E.M.L.: On minimizing a convex function subject to linear inequal-
ities. Journal of the Royal Statistical Society, Series B, 17, 173-184 (1955)
[BN98] Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Mathematics
of Operations Research, 23 (1998)
[BNOl] Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization.
SIAM, Philadelphia (2001)
[BGGN04] Ben-Tal, A., Goryashko, A., Guslitzer, E., Nemirovski, A.: Adjustable
robust solutions of uncertain linear programs. Mathematical Program-
ming, 99, 351-376 (2004)
[BGNV04] Ben-Tal, A., Golany, B., Nemirovski, A., Vial, J.-Ph.: Retailer-supplier
flexible commitments contracts: A robust optimization approach. Submit-
ted to Manufacturing & Service Operations Management (2004)
[CC05] Calafiore G., Campi, M.C.: Uncertain convex programs: Randomized so-
lutions and confidence levels. Mathematical Programming, 102, 25-46
(2005)
In fact, in this case the semi-infinite system (51) can become NP-hard already
with Z as simple as a box [BGGN04].
Nonlinear Optimization in Modeling Environments

Janos D. Pinter
1 Introduction
Nonlinearity is literally ubiquitous in the development of natural objects, for-
mations and processes, including also living organisms of all scales. Conse-
quently, nonlinear descriptive models - and modeling paradigms even beyond
a straightforward (analytical) function-based description - are of relevance in
many areas of the sciences, engineering, and economics. For example, [BM68,
Ric73, EW75, Man83, Mur83, Cas90, HJ91, Sch91, BSS93, Ste95, Gro96,
PSX96, Pin96a, Ari99, Ber99, Ger99, Laf00, PW00, CZ01, EHL01, Jac01,
Sch02, TS02, Wol02, Diw03, Zab03, Neu04b, HL05, KP05, Pin05a, Pin05b] -
as well as many other authors - present discussions and an extensive repertoire
of examples to illustrate this point.
Decision-making (optimization) models that incorporate such a nonlinear
system description frequently lead to complex models that (may or prov-
ably do) have multiple - local and global - optima. The objective of global
optimization (GO) is to find the "absolutely best" solution of nonlinear opti-
mization (NLO) models under such circumstances.
The most important (currently available) GO model types and solution
approaches are discussed in the Handbook of Global Optimization volumes,
edited by Horst and Pardalos [HP95], and by Pardalos and Romeijn [PR02].
As of 2004, over a hundred textbooks and a growing number of informative
web sites are devoted to this emerging subject.
We shall consider a general GO model form defined by the following in-
gredients:
Note that in (2) all vector inequalities are meant component-wise (l, u
are n-vectors and the zero denotes an m-vector). Let us also remark that the
set of the additional constraints g could be empty, thereby leading to - of-
ten much simpler, although still potentially multi-extremal - box-constrained
models. Finally, note that formally more general optimization models (that
include also = and ≥ constraint relations and/or explicit lower bounds on
the constraint function values) can be simply reduced to the canonical model
form (1)-(2). The canonical model itself is already very general: in fact, it triv-
ially includes linear programming and convex nonlinear programming models
(under corresponding additional specifications). Furthermore, it also includes
the entire class of pure and mixed integer programming problems, since all
(bounded) integer variables can be represented by a corresponding set of bi-
nary variables; and every binary variable y ∈ {0,1} can be equivalently rep-
resented by its continuous extension y ∈ [0,1] and the non-convex constraint
y(1 − y) ≤ 0. Of course, we do not claim that the above approach is best - or
even suitable - for "all" optimization models: however, it certainly shows the
generality of the CGO modeling framework.
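The reduction just described can be checked directly: y ∈ [0, 1] combined with y(1 − y) ≤ 0 leaves only y ∈ {0, 1} feasible. A minimal sketch (a small tolerance handles floating point; the function name is illustrative):

```python
def binary_via_continuous(y, tol=1e-9):
    # y in [0, 1] together with the non-convex constraint y*(1 - y) <= 0
    # is feasible only at y = 0 or y = 1: this is how bounded integer
    # variables reduce to continuous ones within the CGO model.
    return 0.0 <= y <= 1.0 and y * (1.0 - y) <= tol
```

Note that the product y(1 − y) is positive everywhere in the open interval (0, 1), which is exactly what makes the constraint non-convex.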
Let us observe next that the above stated "minimal" analytical assump-
tions already guarantee that the optimal solution set X* in the CGO model
is non-empty. This key existence result directly follows by the classical theo-
rem of Weierstrass (which states the existence of the minimizer point (set) of a
continuous function over a non-empty compact set). For reasons of numerical
tractability, the following additional requirements are also often postulated:
• D is a full-dimensional subset ("body") in R^n;
• the set of globally optimal solutions to (1)-(2) is at most countable;
• f and g (the latter component-wise) are Lipschitz-continuous functions on
[l, u].
Note that the first two of these requirements support the development and
(easier) implementation of globally convergent algorithmic search procedures.
Specifically, the first assumption - i.e., the fact that D is the closure of its
non-empty interior - makes algorithmic search possible within the set D.
This requirement also implies that, e.g., nonlinear equality constraints need to
be directly incorporated into the objective function, as discussed in [Pin96a],
Chapter 4.1.
With respect to the second assumption, let us note that in most well-
posed practical problems the set of global optimizers consists only of a single
point, or at most of several points. However, in full generality, GO models may
have even manifold solution sets: in such cases, software implementations will
typically find a single solution, or several of them. (There are theoretically
straightforward iterative ways to provide a sequence of global solutions.)
The third assumption is a sufficient condition for estimating f* on the basis
of a finite set of feasible search points. (Recall that the real-valued function
h is Lipschitz-continuous on its domain of definition D ⊂ R^n, if |h(x_1) −
h(x_2)| ≤ L‖x_1 − x_2‖ holds for all pairs x_1 ∈ D, x_2 ∈ D; here L = L(D, h) is
a suitable Lipschitz-constant of h on the set D; the inequality above directly
supports lower bound estimates on sets of finite size.) We emphasize that
the factual knowledge of the smallest suitable Lipschitz-constant - for each
model function - is not required, and in practice such information is typically
unavailable indeed.
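The Lipschitz inequality indeed yields bounds from finitely many evaluations: at any x, h(x) ≥ max_i [h(x_i) − L‖x − x_i‖], so minimizing this cone envelope bounds the global minimum from below. A one-dimensional sketch (the grid evaluation only approximates the envelope's minimum; all names are illustrative):

```python
def lipschitz_envelope_min(samples, L, a, b, grid=1000):
    # Each sample (x_i, h_i) gives the bound h(x) >= h_i - L*|x - x_i|;
    # the pointwise max of these cones minorizes h on [a, b]. Minimizing
    # the envelope (approximated here on a uniform grid) lower-bounds
    # the global minimum of h.
    best = float("inf")
    for k in range(grid + 1):
        x = a + (b - a) * k / grid
        envelope = max(h - L * abs(x - xi) for xi, h in samples)
        best = min(best, envelope)
    return best
```

This is the mechanism exploited by classical Lipschitz global optimization methods (Piyavskii-Shubert type algorithms), which refine the envelope adaptively instead of using a fixed grid.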
Let us remark here that, e.g., models defined by continuously differentiable
functions f and g certainly belong to the CGO or even to the Lipschitz model
class. In fact, even such "minimal" smooth structure is not essential, since,
e.g., "saw-tooth" like functions are also Lipschitz-continuous. This comment
also implies that CGO indeed covers a very general class of optimization
models. As a consequence of this generality, the CGO model class includes also
many extremely difficult instances. To perceive this difficulty, one can think of
model-instances that would require "the finding of the lowest valley across a
range of islands" (since the feasible set may well be disconnected), based on an
[Figures 1 and 2]
Needless to say, not all - and especially not all practically motivated - CGO
models are as difficult as indicated by Figures 1 and 2. At the same time, we do
not always have the possibility to directly inspect and estimate the difficulty of
an optimization model, and perhaps unexpected complexity can be met under
such circumstances. An important case in point is when the software user
(client) has a confidential or otherwise visibly complex model that needs to
be analyzed and solved. The model itself can be presented to the solver engine
as an object code, dynamic link library (dll), or even as an executable program:
in such situations, direct model inspection is simply not an option. In many
other cases, the evaluation of the optimization model functions may require
the numerical solution of a system of differential equations, the evaluation of
special functions or integrals, the execution of a complex system of program
code, stochastic simulation, even some physical experiments, and so on.
Traditional numerical optimization methods - discussed in most topical
textbooks such as, e.g., [BSS93, Ber99, CZ01] - search only for local optima.
This approach is based on the tacit assumption that a "sufficiently good" ini-
tial solution (that is located in the region of attraction of the "true" solution)
is immediately available. Both Fig. 1 and Fig. 2 suggest that this may not
always be a realistic assumption... Models with less "dramatic" difficulty, but
in (perhaps much) higher dimensions also imply the need for global optimiza-
tion. For instance, in advanced engineering design, models with hundreds or
thousands of variables and constraints are analyzed. In similar cases to those
very simple example that illustrates this point, one can think of a pure ran-
dom search mechanism applied in the interval l ≤ x ≤ u to solve the CGO model:
this will eventually converge, if the "basin of attraction" of the (say, unique)
global optimizer x* has a positive volume. In addition, stochastic sampling
methods can also be directly combined with search steps of other - various
global and efficient local - search strategies, and the overall global convergence
of such strategies will be still maintained. The theoretical background of sto-
chastic "hybrid" algorithms is discussed by [Pin96a]. The underlying general
convergence theory of such combined methods allows for a broad range of im-
plementations. In particular, a hybrid optimization program system supports
the flexible usage of a selection of component solvers: one can execute a fully
automatic global or local search based optimization run, can combine solvers,
and can also design various interactive runs.
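The pure random search scheme mentioned above can be sketched as follows (a minimal illustration, not any particular package's implementation):

```python
import random

def pure_random_search(f, l, u, n, seed=0):
    # Uniform sampling on the box [l, u], keeping the incumbent best
    # point. The best value converges to the global minimum whenever
    # the optimizer's basin of attraction has positive volume.
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n):
        x = [rng.uniform(li, ui) for li, ui in zip(l, u)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

In a hybrid scheme, the incumbent returned here would be handed to a robust local solver for refinement, as discussed in the text.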
Obviously, there remains a significant issue regarding the (typically un-
foreseeable best) "switching point" from strategy to strategy: this is, however,
unavoidable when choosing between theoretical rigor and numerical efficiency.
(Even local nonlinear solvers would need, in theory, an infinite iterative pro-
cedure to converge, except in idealized special cases.) For example, in the
stochastic search framework outlined above, it would suffice to find just one
sample point in the "region of attraction" of the (unique) global solution x*,
and then that solution estimate could be refined by a suitably robust and
efficient local solver. Of course, the region of attraction of x* (e.g., its shape
and relative size) is rarely known, and one needs to rely on computationally
expensive estimates of the model structure (again, the reader is referred, e.g.,
to the review of [BR95]). Another important numerical aspect is that one loses
the deterministic (lower) bound guarantees when applying a stochastic search
procedure: instead, suitable statistical estimation methods can be applied,
consult [Pin96a] and topical references therein. Again, the implementation of
such methodology is far from trivial.
To summarize the discussion, there are good reasons to apply various
search methods and heuristic global-to-local search "switching points" with a
reasonable expectation of numerical success. Namely,
listed above are not necessarily disjoint: e.g., someone can be an expert re-
searcher and software developer in a certain professional area, with a perhaps
more modest optimization expertise. The pros and cons of the individual
software products - in terms of ease of model prototyping, detailed code de-
velopment and maintenance, optimization model processing tools, availability
of solvers and other auxiliary tools, program execution speed, overall level of
system integration, quality of related documentation and support - make such
systems more or less attractive for the user groups listed.
It is also worth mentioning at this point that - especially in the context
of nonlinear modeling and optimization - it can be a salient idea to tackle
challenging problems by making use of several modeling systems and solver
tools, if available. In general, dense NLO model formulations are far less easy
to "standardize" than linear or even mixed integer linear models, since one
typically needs an explicit, specific formula to describe a particular model
function. Such formulae are relatively straightforward to transfer from one
modeling system into another: some of the systems listed above even have such built-in converter capabilities, and their syntaxes are typically quite similar (whether it is x**2 or x^2, sin(x) or Sin[x], bernoulli(n,x) or BernoulliB[n,x], and so on).
In subsequent sections we shall summarize the principal features of sev-
eral current nonlinear optimization software implementations that have been
developed with quite diverse user groups in mind. The range of products re-
viewed in this work includes the following:
• LGO Solver System with a Text I/O Interface
• LGO Integrated Development Environment
• LGO Solver Engine for Excel
• MathOptimizer Professional (LGO Solver Engine for Mathematica)
• Maple Global Optimization Toolbox (LGO Solver Engine for Maple).
We will also present relatively small, but non-trivial test problems to il-
lustrate some of the key functionality of these implementations.
Note that all software products discussed are professionally developed and
supported, and that they are commercially available. For this reason - and
also in line with the objectives of this paper - some of the algorithmic tech-
nical details are only briefly mentioned. Additional technical information is
available upon request; please consult also the publicly available references,
including the software documentation and topical web sites.
In order to keep the length of this article within reasonable bounds, further
product implementations not discussed here are
• LGO Solver Engine for GAMS
• LGO Solver Engine for MPL
• TOMLAB/LGO for MATLAB
• MathOptimizer for Mathematica.
With respect to these products, consult e.g. the references [Pin02a, PK03,
KP04b, KP05, PHGE04, PK05].
The Lipschitz Global Optimizer (LGO) software has been developed and used
for more than a decade (as of 2004). Detailed technical descriptions and user
documentation have appeared elsewhere: consult, for instance, [Pin96a, Pin97,
PinOla, Pin04], and the software review [BSOO]. Let us also remark here that
LGO was chosen to illustrate global optimization software (in connection with
a demo version of the MPL modeling language) in the well-received textbook
[HL05].
Since LGO serves as the core of most current implementations (with the
exception of one product), we will provide its somewhat more detailed de-
scription, followed by concise summaries of the other platform-specific imple-
mentations.
In accordance with the approach advocated in Section 2, LGO is based on
a seamless combination of a suite of global and local scope nonlinear solvers.
Currently, LGO includes the following solver options:
• adaptive partition and search (branch-and-bound) based global search
(BB)
• adaptive global random search (single-start) (GARS)
• adaptive global random search (multi-start) (MS)
• constrained local search (generalized reduced gradient method) (LS).
The global search methodology was discussed briefly in Section 2; the well-
known GRG method is discussed in numerous textbooks, consult e.g. [EHLOl].
Note that in all three global search modes the model functions are aggregated
by an exact penalty function. By contrast, in the local search phase all model
functions are considered and treated individually. Note also that the global
search phases are equipped with stochastic sampling procedures that support
the usage of statistical bound estimation methods.
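The exact penalty aggregation used in the global search phases can be sketched generically as follows; the penalty weight rho and the one-dimensional example model are our own illustrative choices:

```python
def exact_penalty(obj, constraints, rho):
    # Aggregate the objective and the constraint violations (constraints
    # written in the form g(x) <= 0) into a single merit function
    # F(x) = obj(x) + rho * sum_i max(0, g_i(x)). For a sufficiently large
    # finite rho, minimizers of F are feasible minimizers of the model.
    def F(x):
        violation = sum(max(0.0, g(x)) for g in constraints)
        return obj(x) + rho * violation
    return F

# Hypothetical model: minimize (x - 3)^2 subject to x <= 2 (i.e. x - 2 <= 0).
# The constrained optimum sits on the boundary at x = 2 with value 1.
F = exact_penalty(lambda x: (x - 3.0) ** 2, [lambda x: x - 2.0], rho=10.0)
```

The global search phases can then minimize the single function F instead of handling each constraint separately.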
All LGO search algorithms are derivative-free: specifically, in the local
search phase central differences are used to approximate gradients. This choice
reflects again our objective to handle (also) models with merely computable,
continuous functions, including "black box" systems.
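A generic central-difference gradient approximation of the kind mentioned can be sketched as follows (the step size h is an illustrative choice, not LGO's internal setting):

```python
def central_diff_gradient(obj, x, h=1e-5):
    # Approximate the i-th partial derivative by the central difference
    # (obj(x + h e_i) - obj(x - h e_i)) / (2 h); the truncation error is
    # O(h^2), versus O(h) for a one-sided (forward) difference.
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((obj(xp) - obj(xm)) / (2.0 * h))
    return grad

# Example: the exact gradient of x1^2 + 3*x1*x2 at (1, 2) is (8, 3).
g = central_diff_gradient(lambda v: v[0] ** 2 + 3.0 * v[0] * v[1], [1.0, 2.0])
```

Only function values are needed, which is what makes the approach applicable to "black box" models.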
The compiler-based LGO solver suite is used as an option linked to various
modeling environments. In its core text I/O based version, the application-
specific LGO executable program (that includes a driver file and the model
function file) reads an input text file that contains all remaining application
information (model name, variable and constraint names, variable bounds
and nominal values, and constraint types), as well as a few key solver options
Nonlinear Optimization in Modeling Environments 157
(global solver type, precision settings, resource and time limits). Upon com-
pleting the LGO run, a summary and a detailed report file are available. As
can be expected, this LGO version has the lowest demands for hardware; it
also runs fastest, and it can be directly embedded into vertical and proprietary
user applications.
Both files are slightly edited for the present purposes. Note also that in the
simplest usage mode, the driver file contains only a single statement that calls
LGO: therefore we skip the display of that file. (Additional pre- and post-
solver manipulations can also be inserted in the driver file: this can be useful
in various customized applications.)
Model Descriptors
LGO Model ModelName
1 Number of Variables
1 Number of Constraints
Variable names Lower Bounds Nominal Values Upper Bounds
X -10. 0. 10.
ObjFct ! Objective Function Name
Constraint Names and Constraint Types (0 for ==, -1 for <=)
Constraint1 -1
Summary result file
4.3 LGO solver engine for Excel users
Solver Platform: the latter is fully compatible with the standard Excel Solver,
but it has enhanced algorithmic capabilities and features.
LGO for Excel, in addition to continuous global and local capabilities,
also provides basic support for handling integer variables: this feature has
been implemented - as a generic option for all advanced solver engines - by
Frontline Systems.
The LGO solver options available are essentially based on the stand-alone
"silent" version of the software, with some modifications and added features.
The LGO Solver Options dialog, shown by Fig. 3, allows the user to control
solver choices and several other settings.
Fig. 3. Excel/ LGO solver engine: solver options and parameters dialog
By assumption, the vector variable x belongs to the box region [0,10]^9. The numerical values of the constants p_{fk}, f = 1,...,5, k = 1,...,4, are listed in the paper of Ratschek and Rokne [RR93], and will not be repeated here. (Note that, in order to make the model functions more readable, several constants are simply aggregated in the above formulae, when compared to that paper.)
To solve the ECD model rigorously, Ratschek and Rokne applied a com-
bination of interval arithmetic, subdivision and branch-and-bound strategies.
They concluded that the rigorous solution was extremely costly (billions of
model function evaluations were needed), in order to arrive at a guaranteed
interval (i.e., embedding box) estimate that is component-wise within at least
10^-4 precision of the postulated approximate solution:
x* = (0.9, 0.45, 1.0, 2.0, 8.0, 8.0, 5.0, 1.0, 2.0).
Obviously, by taking e.g. the Euclidean norm of the overall error in the
model equations, the problem of finding the solution can be formulated as a
global optimization problem. This model has been set up in a demo spread-
sheet, and then solved by the Excel LGO solver engine. The numerical solution
found by LGO - directly imported from the answer report - is shown below:
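The norm-minimization reformulation can be sketched generically; since the ECD constants are not reproduced here, a small hypothetical 2x2 system stands in for the actual model equations:

```python
import math

def residual_norm(equations, x):
    # Euclidean norm of the overall error in the model equations F(x) = 0:
    # every root of the system is a global minimizer with optimal value 0,
    # so root finding becomes a global optimization problem.
    return math.sqrt(sum(F(x) ** 2 for F in equations))

# Hypothetical 2x2 stand-in system with the root (1, 2):
#   x1 + x2 - 3 = 0  and  x1 * x2 - 2 = 0.
eqs = [lambda x: x[0] + x[1] - 3.0, lambda x: x[0] * x[1] - 2.0]
```

A global solver applied to this residual norm over a box returns an approximate root, exactly as the ECD model is treated in the spreadsheet example below.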
Microsoft Excel 10.0 Answer Report
Worksheet: [CircuitDesign_9^9.XLS] Model
Report Created: 12/16/2004
12:39:29 AM
Result: Solver found a solution. All constraints and
optimality conditions are satisfied.
Adjustable Cells
Cell Name Original Value Final Value
$D$10 x_l 1 0.900000409
$D$11 x_2 2 0.450000021
$D$12 x_3 3 1.000000331
$D$13 x_4 4 2.000001476
$D$14 x_5 5 7.999999956
$D$15 x_6 6 7.999998226
$D$16 x_7 7 4.999999941
$D$17 x_8 8 1.000000001
$D$18 x_9 9 1.999999812
The error of the solution found is within 10^-5 of the verified solution, for each component. The numerical solution of the ECD model in Excel takes less than 5 seconds on a personal computer (Intel Pentium 4, 2.4 GHz processor, 512 MB RAM). Let us note that we have solved this model also using core
512 Mb RAM). Let us note that we have solved this model also using core
LGO implementations with various C and Fortran compilers, with essentially
identical success (in about a second or less). Although this finding should not
lead per se to overly optimistic claims, it certainly shows the robustness and
efficiency of LGO in solving this particular (non-trivial) example.
Let us also remark that we have attempted to solve instances of the same
circle packing problem applying the built-in Mathematica function NMinimize
for nonlinear (global) optimization, but - using it in all of its default solver
modes - it could not find a solution of acceptable quality already for the
Hence, the solution found by the Maple GOT (using default precision settings)
is accurate to 15 digits.
It is probably just as noteworthy that one can find a reasonably good
solution even in a much larger variable range, with the same solution eff'ort:
> GlobalSolve(f, x1=-100..100, x2=-100..100, evaluationlimit=100000,
noimprovementlimit=100000);
[-3.06433688275856530, [x1 = -0.233457978266705634e-1,
x2 = .774154819772443825]]
The corresponding GOT runtimes are a little more than one second in
both cases. (Note that all such runtimes are approximate, and may vary a
bit even between consecutive test runs, depending on the machine's actual
runtime environment).
One of the advantages of using ISTCs is that one can visualize models and
verify their perceived difficulty. Fig. 5 is based on using the Maple Optimiza-
tion Plotter dialog, a feature that can be used in conjunction with the GOT:
it shows the box-constrained Trefethen model [Tre02] in the range [-3,3]^;
observe also the location of the optimal solution (green dot).
Fig. 5. Problem 4 in [Tre02] solved and visualized using the Maple GOT
5 Further Applications
For over a decade, LGO has been applied in a variety of professional, as well as
academic research and educational contexts (in some 20 countries, as of 2004).
In recent years, LGO has been used to solve models in up to a few thousand
variables and constraints. The software seems to be particularly well-suited
to analyze and solve complex, sophisticated applications in advanced engi-
neering, biotechnology, econometrics, financial modeling, process industries,
medical studies, and in various other areas of scientific modeling.
Without aiming at completeness, let us refer to some recent (published)
applications and case studies that are related to the following areas:
• model calibration ([PinOSa])
• potential energy models in computational chemistry ([PinOO, PinOlb]),
([SSPOl])
• laser design ([IPC03])
• cancer therapy planning ([TKLPL03])
• combined finite element modeling and optimization in sonar equipment
design ([PP03])
• configuration analysis and design ([KP04b]).
Note additionally that some of the LGO software users develop other
advanced (but confidential) applications. Articles and numerical examples,
specifically related to various LGO implementations are available from the
author upon request. The forthcoming volumes ([KP05]; [Pin05a, Pin05b])
also discuss a large variety of GO applications, with extensive further refer-
ences.
6 Conclusions
In this paper, a review of several nonlinear optimization software products
has been presented. Following the introduction of the LGO solver suite, we
have provided a brief review of several currently available implementations for
use with compiler platforms, spreadsheets, optimization modeling languages,
and ISTCs. It is our objective to add customized functionality to the existing
products, and to develop further implementations, in order to meet the needs
of a broad range of users.
Global optimization is and will remain a field of extreme numerical diffi-
culty, not only when considering "all possible" GO models, but also in prac-
tical attempts to handle complex, sizeable problems in an acceptable time-
frame. Therefore the discussion advocates a practically motivated approach
that combines rigorous global optimization strategies with efficient local search
methodology, in integrated, flexible solver suites. The illustrative - yet non-
trivial - application examples and the numerical results show the practical
merits of such an approach.
Acknowledgements
First of all, I wish to thank my developer partners and colleagues for their
cooperation and many useful discussions, quality software, documentation,
and technical support. These partners include AMPL LLC, Frontline Systems,
the GAMS Development Corporation, Dr. Frank J. Kampas, Lahey Computer
Systems, LINDO Systems, Maplesoft, Maximal Software, Paragon Decision
Technology, TOMLAB AB, and Wolfram Research.
Several application examples reviewed or cited in this paper are based on
cooperation with colleagues: all such cooperation is gratefully acknowledged
and is reflected by the references.
In addition to professional contributions and in-kind support offered by
developer partners, the work summarized and reviewed in this paper has re-
ceived financial support from the following organizations: DRDC Atlantic Re-
gion, Canada (Contract W7707-01-0746), the Dutch Technology Foundation
(STW Grant CWI55.3638), the Hungarian Scientific Research Fund (OTKA
Grant T 034350), Maplesoft, the National Research Council of Canada (NRC
IRAP Project 362093), the University of Ballarat, Australia; the University
of Kuopio, Finland; and the University of Tilburg, Netherlands.
References
[Ari99] Aris, R.: Mathematical Modeling: A Chemical Engineer's Perspective. Aca-
demic Press, San Diego, CA (1999)
[BSS93] Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: The-
ory and Algorithms. Wiley, New York (1993)
[BSOO] Benson, H.P., Sun, E. LGO - Versatile tool for global optimization. In:
OR/MS Today, 27, 52-55 (2000)
[Ber99] Bertsekas, D.P.: Nonlinear Programming (2nd Edition). Athena Scientific,
Cambridge, MA (1999)
[BR95] Boender, C.G.E., Romeijn, H.E. Stochastic methods. In: Horst and Parda-
los (eds) Handbook of Global Optimization. Volume 1, pp. 829-869 (1995)
[BLWW04] Bornemann, P., Laurie, D., Wagon, S., Waldvogel, J.: The SIAM 100-
Digit Challenge. A Study in High-Accuracy Numerical Computing. SIAM,
Philadelphia, PA (2004)
[BM68] Bracken, J. and McCormick, G.P.: Selected Applications of Nonlinear Pro-
gramming. Wiley, New York (1968)
[BKM88] Brooke, A., Kendrick, D. and Meeraus, A.: GAMS: A User's Guide. The
Scientific Press, Redwood City, CA. (Revised versions are available from
the GAMS Corporation.) See also http://www.gams.com (1988)
[Cas90] Casti, J.L.: Searching for Certainty. Morrow & Co., New York (1990)
[Cog03] Cogan, B. How to get the best out of optimization software. In: Scientific
Computing World, 71, 67-68 (2003)
[CK99] Corliss, G.F., Kearfott, R.B. Rigorous global search: industrial applications.
In: Csendes, T. (ed) Developments in Reliable Computing, 1-16. Kluwer
Academic Publishers, Boston/Dordrecht/London (1999)
[CFO01] Coullard, C., Fourer, R., Owen, J.H. (eds): Annals of Operations Research,
104, Special Issue on Modeling Languages and Systems. Kluwer Academic
Publishers, Boston/Dordrecht/London (2001)
[CZOl] Chong, E.K.P., Zak, S.H.: An Introduction to Optimization (2nd Edition).
Wiley, New York (2001)
[Diw03] Diwekar, U.: Introduction to Applied Optimization. Kluwer Academic Pub-
lishers, Boston/Dordrecht/London (2003)
[EHLOl] Edgar, T.F., Himmelblau, D.M., Lasdon, L.S. Optimization of Chemical
Processes (2nd Edition). McGraw-Hill, New York (2001)
[EW75] Eigen, M. and Winkler, R.: Das Spiel. Piper & Co., München (1975)
[Fou04] Fourer, R.: Nonlinear Programming Frequently Asked Questions. Op-
timization Technology Center of Northwestern University and Ar-
gonne National Laboratory, http://www-unix.mcs.anl.gov/otc/Guide/faq/
nonlinear-programming-faq.html (2004)
[FGK93] Fourer, R., Gay, D.M., Kernighan, B.W.: AMPL - A Modeling Lan-
guage for Mathematical Programming. The Scientific Press, Redwood
City, CA (Reprinted by Boyd and Fraser, Danvers, MA, 1996. See also
http://www.ampl.com) (1993)
[FSOl] Frontline Systems: Premium Solver Platform - Solver Engines. User Guide.
Frontline Systems, Inc. Incline Village, NV (See http://www.solver.com,
and http://www.solver.com/xlslgoeng.htm) (2001)
[Ger99] Gershenfeld, N.: The Nature of Mathematical Modeling. Cambridge Uni-
versity Press, Cambridge (1999)
[Gro96] Grossmann, I.E. (ed): Global Optimization in Engineering Design. Kluwer
Academic Publishers, Boston/Dordrecht/London (1996)
[HJ91] Hansen, P.E. and J0rgensen, S.E. (eds): Introduction to Environmental
Management. Elsevier, Amsterdam (1991)
[HL05] Hillier, F.J. and Lieberman, G.J. Introduction to Operations Research. (8th
Edition.) McGraw-Hill, New York (2005)
[HP95] Horst, R., Pardalos, P.M. (eds): Handbook of Global Optimization (Volume
1). Kluwer Academic Publishers, Boston/Dordrecht/London (1995)
[HT96] Horst, R., Tuy, H.: Global Optimization - Deterministic Approaches (3rd
Edition). Springer-Verlag, Berlin / Heidelberg / New York (1996)
[104] ILOG: ILOG OPL Studio and Solver Suite, http://www.ilog.com (2004)
[IPC03] Isenor, G., Pinter, J.D., Cada, M.: A global optimization approach to laser
design. Optimization and Engineering 4, 177-196 (2003)
[Jac01] Jacob, C.: Illustrating Evolutionary Computation with Mathematica. Mor-
gan Kaufmann Publishers, San Francisco (2001)
[Kal04] Kallrath, J. (ed): Modeling Languages in Mathematical Optimization.
Kluwer Academic Publishers, Boston/Dordrecht/London (2004)
[KP04a] Kampas, F.J., Pinter, J.D.: Generalized circle packings: model formula-
tions and numerical results. Proceedings of the International Mathematica
Symposium (Banff, AB, Canada, August 2004)
[KP04b] Kampas, F.J., Pinter, J.D.: Configuration analysis and design by using
optimization tools in Mathematica. The Mathematica Journal (to appear)
(2004)
[KP05] Kampas, F.J., Pinter, J.D.: Advanced Optimization: Scientific, Engineering,
and Economic Applications with Mathematica Examples. Elsevier, Amster-
dam (to appear) (2005)
[Kea96] Kearfott, R.B.: Rigorous Global Search: Continuous Problems. Kluwer Aca-
demic Publishers, Boston/Dordrecht/London (1996)
[LafOO] Lafe, O.: Cellular Automata Transforms. Kluwer Academic Publishers,
Boston / Dordrecht / London (2000)
[LCS02] Lahey Computer Systems. Fortran 90 User's Guide. Lahey Computer Sys-
tems, Inc., Incline Village, NV. http://www.lahey.com (2002)
[LS96] LINDO Systems. Solver Suite. LINDO Systems, Inc., Chicago, IL.
http://www.lindo.com (1996)
[Man83] Mandelbrot, B.B.: The Fractal Geometry of Nature. Freeman & Co., New
York (1983)
[M04a] Maplesoft. Maple. (Current version: 9.5.) Maplesoft, Inc., Waterloo, ON.
http://www.maplesoft.com (2004)
[M04b] Maplesoft. Global Optimization Toolbox. Maplesoft, Inc. Waterloo, ON.
http://www.maplesoft.com (2004)
[MM95] Maros, I., Mitra, G. (eds): Annals of Operations Research, 58, Applied
Mathematical Programming and Modeling II (APMOD 93) J.C. Baltzer
AG, Science Publishers, Basel (1995)
[MMS97] Maros, I., Mitra, G., Sciomachen, A. (eds): Annals of Operations Re-
search, 81, Applied Mathematical Programming and Modeling III (AP-
MOD 95). J.C. Baltzer AG, Science Publishers, Basel (1997)
[MS04] Mittelmann, H.D., Spellucci, P. Decision Tree for Optimization Software.
http://plato.la.asu.edu/guide.html (2004)
[MS02] Maximal Software. MPL Modeling System. Maximal Software, Inc. Arling-
ton, VA. http://www.maximal-usa.com (2002)
[Mur83] Murray, J.D.: Mathematical Biology. Springer-Verlag, Berlin (1983)
[Neu04a] Neumaier, A.: Global Optimization. http://www.mat.univie.ac.at/~neum/glopt.html (2004)
[Neu04b] Neumaier, A.: Complete search in continuous global optimization and con-
straint satisfaction. In: Iserles, A. (ed) Acta Numerica 2004. Cambridge
University Press, Cambridge (2004b)
[PWOO] Papalambros, P.Y., Wilde, D.J.: Principles of Optimal Design. Cambridge
University Press, Cambridge (2000)
[PDT04] Paragon Decision Technology: AIMMS (Current version 3.5).
Paragon Decision Technology BV, Haarlem, The Netherlands. See
http://www.aimms.com (2004)
[PSX96] Pardalos, P.M., Shalloway, D. and Xue, G.: Global minimization of noncon-
vex energy functions: molecular conformation and protein folding. In: DI-
MACS Series, 23, American Mathematical Society, Providence, RI (1996)
[PR02] Pardalos, P.M., Romeijn, H.E. (eds): Handbook of Global Optimization.
Volume 2. Kluwer Academic Publishers, Boston/Dordrecht/London (2002)
[Pin96a] Pinter, J.D.: Global Optimization in Action. Kluwer Academic Publishers,
Boston / Dordrecht / London (1996)
[PP03] Pinter, J.D., Purcell, C.J.: Optimization of finite element models with
MathOptimizer and ModelMaker. Lecture presented at the 2003 Mathe-
matica Developer Conference, Champaign, IL (2003) (Extended abstract is
available upon request, and also from http://www.library.com)
[RR93] Ratschek, H., Rokne, J.: Experiments using interval analysis for solving a
circuit design problem. Journal of Global Optimization 3, 501-518 (1993)
[RR95] Ratschek, H., Rokne, J.: Interval methods. In: Horst and Pardalos (eds)
Handbook of Global Optimization. Volume 1, 751-828 (1995)
[Ric73] Rich, L.G.: Environmental Systems Engineering. McGraw-Hill, Tokyo
(1973).
[Sch02] Schittkowski, K.: Numerical Data Fitting in Dynamical Systems. Kluwer
Academic Publishers, Boston/Dordrecht/London (2002)
[Sch91] Schroeder, M.: Fractals, Chaos, Power Laws. Freeman & Co., New York
(1991)
[Ste95] Stewart, I.: Nature's Numbers. Basic Books / Harper and Collins, New
York (1995)
[SSPOl] Stortelder, W.J.H., de Swart, J.J.B., Pinter, J.D.: Finding elliptic Fekete
point sets: two numerical solution approaches. Journal of Computational
and Applied Mathematics, 130, 205-216 (2001)
[TS02] Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization
in Continuous and Mixed-integer Nonlinear Programming. Kluwer Acad-
emic Publishers, Boston/Dordrecht/London (2002)
[TKLPL03] Tervo, J., Kolmonen, P., Lyyra-Laitinen, T., Pinter, J.D., and Lahtinen,
T. An optimization-based approach to the multiple static delivery technique
in radiation therapy. Annals of Operations Research, 119, 205-227 (2003)
[TO04] TOMLAB Optimization. TOMLAB. TOMLAB Optimization AB,
Vasteras, Sweden (2004) (See http://www.tomlab.biz)
[Tre02] Trefethen, L.N.: The hundred-dollar, hundred-digit challenge problems.
SIAM News, Issue 1, p. 3 (2002)
[TM04] The MathWorks: MATLAB. (Current version: 6.5) The MathWorks, Inc.,
Natick, MA (2004) (See http://www.mathworks.com)
[VMMOO] Vladimirou, H., Maros, I., Mitra, G. (eds): Annals of Operations Re-
search, 99, Applied Mathematical Programming and Modeling IV (AP-
MOD 98) J.C. Baltzer AG, Science Publishers, Basel, Switzerland (2000)
[Wol02] Wolfram, S.: A New Kind of Science. Wolfram Media, Champaign, IL, and
Cambridge University Press, Cambridge (2002)
[Wol03] Wolfram, S.: The Mathematica Book. (Fourth Edition) Wolfram Media,
Champaign, IL, and Cambridge University Press, Cambridge (2003)
[WR04] Wolfram Research: Mathematica (Current version: 5.1). Wolfram Research,
Inc., Champaign, IL (2004) (See http://www.wolfram.com)
[Zab03] Zabinsky, Z.B.: Stochastic Adaptive Search for Global Optimization.
Kluwer Academic Publishers, Boston/Dordrecht/London (2003)
Supervised Data Classification via Max-min
Separability
Summary. The problem of discriminating between the elements of two finite sets
of points in n-dimensional space is a fundamental problem in supervised data classification.
In practice, it is unlikely for the two sets to be linearly separable. In this paper we
consider the problem of separating two finite sets of points by means of piecewise
linear functions. We prove that if these two sets are disjoint then they can be
separated by a piecewise linear function, and we formulate the problem of finding the
latter function as an optimization problem with an objective function containing
a max-min of linear functions. The differential properties of the objective function are
studied and an algorithm for its minimization is developed. We present the results
of numerical experiments with real world data sets. These results demonstrate the
effectiveness of the proposed algorithm for separating two finite sets of points. They
also demonstrate the effectiveness of an algorithm based on the concept of max-min
separability for solving supervised data classification problems.
1 Introduction
Supervised data classification is an important area in data mining. It has
many applications in science, engineering, medicine etc. The aim of supervised
data classification is to establish rules for the classification of some
observations assuming that the classes of data are known. To find these
rules, known training subsets of the given classes are used. During the
last decades many algorithms have been proposed and studied to solve supervised
data classification problems. One of the promising approaches to
these problems is based on mathematical programming techniques. This approach
has gained a great deal of attention over the last years, see, for example,
[AG02, Bag05, BRSY01, BRY00, BRY02, BB97, BB96, BM92, BM00,
BFM99, Bur98, CM95, Man94, Man97, Tho02, Vap95].
176 A.M. Bagirov, J. Ugon
2 Preliminaries
In this section we present a brief review of the concepts of linear, bilinear and
polyhedral separability.
where
$$f(x,y) = \frac{1}{m}\sum_{i=1}^{m}\max\left(0,\ \langle x, a^i\rangle - y + 1\right) + \frac{1}{p}\sum_{j=1}^{p}\max\left(0,\ -\langle x, b^j\rangle + y + 1\right)$$
is an error function. Here $\langle\cdot,\cdot\rangle$ stands for the scalar product in $\mathbb{R}^n$. The authors
describe an algorithm for solving problem (1). They show that the problem
(1) is equivalent to the following linear program:
minimize $\ \frac{1}{m}\sum_{i=1}^{m} t_i + \frac{1}{p}\sum_{j=1}^{p} z_j$
subject to
$$t_i \ge \langle x, a^i\rangle - y + 1, \quad i = 1,\dots,m,$$
$$z_j \ge -\langle x, b^j\rangle + y + 1, \quad j = 1,\dots,p,$$
$$t \ge 0, \quad z \ge 0,$$
where $t_i$ is nonnegative and represents the error for the point $a^i \in A$, and $z_j$ is nonnegative and represents the error for the point $b^j \in B$.
The sets A and B are linearly separable if and only if $f^* = f(x^*, y^*) = 0$, where $(x^*, y^*)$ is the solution to problem (1). It is proved that the trivial solution $x = 0$ cannot occur.
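The averaged error function f(x, y) above is straightforward to evaluate; the following sketch (with our own toy data) shows that it vanishes exactly for a hyperplane separating the sets with unit margin, and equals 2 at the trivial point x = 0:

```python
def linsep_error(x, y, A, B):
    # Averaged error of the hyperplane {z : <x, z> = y} for the sets A, B:
    # it is zero exactly when <x, a> - y <= -1 for all a in A and
    # <x, b> - y >= 1 for all b in B, i.e. when the sets are linearly
    # separable with unit margin.
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    ea = sum(max(0.0, dot(x, a) - y + 1.0) for a in A) / len(A)
    eb = sum(max(0.0, -dot(x, b) + y + 1.0) for b in B) / len(B)
    return ea + eb

A = [(0.0, 0.0), (1.0, 0.0)]  # points below the line x2 = 2
B = [(0.0, 4.0), (1.0, 4.0)]  # points above it
err = linsep_error((0.0, 1.0), 2.0, A, B)  # the separating line x2 = 2
```

Note that at x = 0 the error is 1 + 1 = 2 for any y, consistent with the remark that the trivial solution cannot occur.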
$$\langle x^l, a^j\rangle - y_l < 0, \quad l = 1, 2,$$
and for any $k = 1,\dots,p$ there exists $l \in \{1,2\}$ such that
$$\langle x^l, b^k\rangle - y_l > 0.$$
2. For any $k = 1,\dots,p$
$$\langle x^l, b^k\rangle - y_l < 0, \quad l = 1, 2,$$
and for any $j = 1,\dots,m$ there exists $l \in \{1,2\}$ such that
$$\langle x^l, a^j\rangle - y_l > 0.$$
3. For any $j = 1,\dots,m$ either
$$\langle x^l, a^j\rangle - y_l < 0, \quad l = 1, 2,$$
or
$$\langle -x^l, a^j\rangle + y_l < 0, \quad l = 1, 2,$$
and for any $k = 1,\dots,p$ either
$$\langle x^1, b^k\rangle - y_1 < 0, \quad \langle -x^2, b^k\rangle + y_2 < 0,$$
or
$$\langle -x^1, b^k\rangle + y_1 < 0, \quad \langle x^2, b^k\rangle - y_2 > 0.$$
Supervised Data Classification via Max-min Separability 179
2. For any $k = 1,\dots,p$
$$\max\{\langle x^l, b^k\rangle - y_l\} < 0$$
$$\max\left[\min\{\langle x^1, b^k\rangle - y_1,\ -\langle x^2, b^k\rangle + y_2\},\ \min\{-\langle x^1, b^k\rangle + y_1,\ \langle x^2, b^k\rangle - y_2\}\right] > 0.$$
$$\langle x^i, a^j\rangle - y_i < 0,$$
$$\langle x^i, b^k\rangle - y_i > 0,$$
It is proved in [AG02] that the sets A and B are h-polyhedrally separable, for
some $h < p$, if and only if
$$\mathrm{co}\,A \cap B = \emptyset.$$
Figure 1 presents one example of polyhedral separability.
The problem of polyhedral separability of the sets A and B is reduced to
the following problem:
3 Max-min separability
In many practical applications two sets are not linearly, bilinearly or polyhedrally separable. Figure 2 presents one such case. In this case the two sets are separable by a more complicated piecewise linear function.
Fig. 2. Two sets separable only by a piecewise linear function
3.1 Definition and properties
Remark 1. It follows from Definition 3 that if the sets A and B are max-min separable then $\varphi(a) < 0$ for any $a \in A$ and $\varphi(b) > 0$ for any $b \in B$, where the function $\varphi$ is defined by (3). Thus the sets A and B can be separated by a function represented as a max-min of linear functions. Therefore this kind of separability is called max-min separability.
Remark 2. Linear and polyhedral separability can be considered as particular cases of max-min separability. If $I = \{1\}$ and $J_1 = \{1\}$ then we have linear separability, and if $I = \{1,\dots,h\}$ and $J_i = \{i\}$, $i \in I$, we obtain h-polyhedral separability.
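The separating function defined by (3), a maximum over the groups J_i of minima of linear functions, can be sketched as follows (the hyperplanes and the partition are our own toy choices):

```python
def phi(z, hyperplanes, partition):
    # Max-min separating function: phi(z) = max over groups J_i of
    # min over j in J_i of (<x^j, z> - y_j). Points with phi(z) < 0 lie
    # on the A-side of the separator, points with phi(z) > 0 on the B-side.
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    return max(
        min(dot(hyperplanes[j][0], z) - hyperplanes[j][1] for j in J_i)
        for J_i in partition
    )

# Two hyperplanes x1 = 1 and x2 = 1, grouped into a single set J_1:
# their min carves out the quadrant {x1 < 1, x2 < 1} as the A-side.
# Singleton groups J_i = {i} would instead give polyhedral separability,
# and a single hyperplane would give linear separability (Remark 2).
hps = [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0)]
inside = phi((0.0, 0.0), hps, [[0, 1]])   # min(-1, -1) = -1 < 0
outside = phi((2.0, 3.0), hps, [[0, 1]])  # min( 1,  2) =  1 > 0
```

With several groups in the partition, the outer max makes the A-side a union of such polyhedral pieces, which is what yields general piecewise linear separators.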
Remark 3. Bilinear separability can also be considered as a particular case of max-min separability. It follows from Definition 2 that the bilinear separability of two sets A and B coincides with one of the following cases:
1. The sets A and B are 2-polyhedrally separable and $\mathrm{co}\,A \cap B = \emptyset$;
2. The sets A and B are 2-polyhedrally separable and $\mathrm{co}\,B \cap A = \emptyset$;
3. The sets A and B are max-min separable with the following hyperplanes:
and
$$\min_{b\in B}\max_{i\in I}\min_{j\in J_i}\{\langle x^j, b\rangle - y_j\} = \delta_2.$$
We consider the new set of hyperplanes $\{\bar x^j, \bar y_j\}$ with $\bar x^j \in \mathbb{R}^n$, $\bar y_j \in \mathbb{R}$, $j \in J$, defined as follows:
$$\bar x^j = x^j/\delta, \quad j \in J,$$
$$\bar y_j = y_j/\delta, \quad j \in J.$$
Then it follows from (4) and (5) that
Proof. Since a max-min of linear functions is a piecewise linear function, the necessity is straightforward.
Sufficiency. It is known that any piecewise linear function can be represented as a max-min of linear functions of the form (3) (see [BKS95]). Then we get that there exists a max-min of linear functions that separates the sets A and B, which in turn means that these sets are max-min separable. □
Remark 4. It follows from Proposition 2 that the notions of max-min and piecewise linear separability are equivalent.
$$A = \bigcup_{i=1}^{q} A_i$$
$$\langle x^i(b), b\rangle - y_i(b) > 0,$$
and
$$\min_{i=1,\dots,q}\{\langle x^i(b), a\rangle - y_i(b)\} < 0, \quad \forall a \in A \qquad (8)$$
$$\{x^i(b^j), y_i(b^j)\}, \quad i = 1,\dots,q, \ j = 1,\dots,p.$$
In the next proposition we show that in most cases the number of hyperplanes necessary for the max-min separation of the sets A and B is limited.
and
$$\mathrm{co}\,A_i \cap \mathrm{co}\,B_j = \emptyset \quad \text{for all } i = 1,\dots,q, \ j = 1,\dots,d. \qquad (9)$$
Then the number of hyperplanes necessary for the separation of the sets A and
B is at most $q \cdot d$.
and
$$\min\{\langle x^{ij}, a\rangle - y_{ij}\} < 0, \quad \forall a \in A. \qquad (11)$$
$$H = \{h^1,\dots,h^{qd}\},$$
$$J = \{J_1,\dots,J_d\}, \quad J_k = \{(k-1)q+1,\dots,kq\}, \quad k = 1,\dots,d.$$
It follows from (10) and (11) that for all k ∈ I and a ∈ A
that is, the sets A and B are max-min separable with at most q · d hyperplanes.
□
Remark 5. The only cases where the number of hyperplanes necessary is large
are those when the sets A_i and B_j contain a very small number of points. This
situation appears only in the particular case where the distribution of the
points is like a "chessboard".
   max {0, min_{i∈I} max_{j∈J_i} {−⟨x^j, b⟩ + y_j + 1}}.     (13)
Proof. Necessity. Assume that the sets A and B are max-min separable. Then
it follows from Proposition 1 that there exists a set of hyperplanes {x^j, y_j},
j ∈ J, and a partition J = {J_1, ..., J_r} of the set J such that

   min_{j∈J_t} {⟨x^j, b⟩ − y_j} ≥ 1.     (16)
Consequently we have
Proposition 6. (See [Bag05].) Assume that the sets A and B are max-min
separable with a set of hyperplanes {x^j, y_j}, j ∈ J = {1, ..., l}, and a
partition J = {J_1, ..., J_r} of the set J. Then
1) x^j = 0, j ∈ J, cannot be an optimal solution;
2) if
(a) for any t ∈ I there exists at least one b ∈ B such that
(b) there exists J̄ = {J̄_1, ..., J̄_r} such that J̄_t ⊂ J_t, ∀t ∈ I, J̄_t is nonempty
at least for one t ∈ I and x^j = 0 for any j ∈ J̄_t, t ∈ I,
then the sets A and B are max-min separable with the set of hyperplanes
{x^j, y_j}, j ∈ J⁰, and the partition J̃ = {J̃_1, ..., J̃_r} of the set J⁰, where

   J̃_t = J_t \ J̄_t,  t ∈ I,   and   J⁰ = ⋃_{i=1}^r J̃_i.
Proof. 1) Since the sets A and B are max-min separable, we get from Propo-
sition 5 that f(x, y) = 0. If x^j = 0, j ∈ J, then it follows from (14) that for
any y ∈ ℝ^l

   f(0, y) = (1/m) Σ_{k=1}^m max {0, max_{i∈I} min_{j∈J_i} {−y_j + 1}}
           + (1/p) Σ_{t=1}^p max {0, min_{i∈I} max_{j∈J_i} {y_j + 1}}.
We denote

   R = max_{i∈I} min_{j∈J_i} {−y_j}.

Then we have

   min_{i∈I} max_{j∈J_i} y_j = −max_{i∈I} min_{j∈J_i} {−y_j} = −R.

Thus

   f(0, y) = max [0, R + 1] + max [0, −R + 1].

It is clear that

   max [0, R + 1] + max [0, −R + 1] = { −R + 1   if R < −1,
                                        2        if −1 ≤ R ≤ 1,
                                        R + 1    if R > 1.

In all cases f(0, y) ≥ 2 > 0, which contradicts f(x, y) = 0; hence x^j = 0,
j ∈ J, cannot be an optimal solution.
Ī = {i ∈ I : J̄_i ≠ ∅},

   (1/p) Σ_{t=1}^p max {0, min_{i∈Ī} max_{j∈J̄_i} {−⟨x^j, b^t⟩ + y_j + 1}}
Since the function / is nonnegative we obtain
190 A.M. Bagirov, J. Ugon
It follows from (17) and (19) that for any i ∈ Ī there exists a point b ∈ B
such that

   max_{j∈J_i} {−⟨x^j, b⟩ + y_j + 1} ≤ 0.     (20)
If i ∈ Ī then we have

   0 ≥ max_{j∈J_i} {−⟨x^j, b⟩ + y_j + 1}
     = max { max_{j∈J̃_i} {−⟨x^j, b⟩ + y_j + 1}, max_{j∈J̄_i} {y_j + 1} }

and

   max_{j∈J̄_i} {y_j + 1} ≤ 0.     (22)
If i ∈ I \ Ī then from (20) we obtain
Thus we get that for all i ∈ Ī the inequality (22) is true. (22) can be rewritten
as follows:

   max_{j∈J̄_i} y_j ≤ −1,  ∀i ∈ Ī.     (23)
Consequently for any i ∈ Ī

   0 ≥ min_{j∈J_i} {⟨x^j, a⟩ − y_j + 1}
     = min { min_{j∈J̃_i} {⟨x^j, a⟩ − y_j + 1}, min_{j∈J̄_i} {−y_j + 1} },

   min_{j∈J̃_i} {⟨x^j, a⟩ − y_j + 1} ≤ 0.     (26)
If i ∈ I \ Ī then it follows from (25) that

   min_{j∈J_i} {−y_j + 1} ≤ 0

   min_{j∈J̃_i} {⟨x^j, a⟩ − y_j + 1} ≤ 0.     (27)
From (26) and (27) we can conclude that for any i ∈ I and a ∈ A
It follows from (19) that for any b ∈ B there exists at least one i ∈ I

   max_{j∈J_i} {−⟨x^j, b⟩ + y_j + 1}
     = max { max_{j∈J̃_i} {−⟨x^j, b⟩ + y_j + 1}, max_{j∈J̄_i} {y_j + 1} },

we get that for any b ∈ B there exists at least one i ∈ I such that
Thus it follows from (28) and (29) that the sets A and B are max-min sepa-
rable with the set of hyperplanes {x^j, y_j}, j ∈ J⁰, and the partition J̃ of the
set J⁰. □
f(x, y) = f_1(x, y) + f_2(x, y)

and

   f_1(x, y) = (1/m) Σ_{k=1}^m max {0, max_{i∈I} min_{j∈J_i} {⟨x^j, a^k⟩ − y_j + 1}},     (31)

   f_2(x, y) = (1/p) Σ_{t=1}^p max {0, min_{i∈I} max_{j∈J_i} {−⟨x^j, b^t⟩ + y_j + 1}}.     (32)
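As a concrete illustration, the averaged error function f = f_1 + f_2 of (31)-(32) can be evaluated directly from its definition over finite point sets. The sketch below is our own minimal pure-Python version; the data layout (hyperplanes stored as (x, y) pairs grouped into one list per set J_i) and the function names are illustrative assumptions, not from the chapter.

```python
# Minimal sketch of the error function (31)-(32). Hyperplanes are given
# as (x, y) pairs grouped into one list per set J_i of the partition.
# Names and data layout are illustrative, not from the chapter.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxmin_error(groups, A, B):
    """f(x, y) = f1 + f2 for finite point sets A and B."""
    f1 = sum(max(0.0, max(min(dot(x, a) - y + 1.0 for (x, y) in Ji)
                          for Ji in groups))
             for a in A) / len(A)
    f2 = sum(max(0.0, min(max(-dot(x, b) + y + 1.0 for (x, y) in Ji)
                          for Ji in groups))
             for b in B) / len(B)
    return f1 + f2
```

For example, a single hyperplane ⟨x, t⟩ = y with x = (1,), y = 1 separates A = {0} and B = {2} on the line with zero error, consistent with the role of f as a separation error.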
   ∂φ(x) = co {v ∈ ℝⁿ : ∃ (x^k ∈ D(φ), x^k → x, k → +∞) : v = lim_{k→+∞} ∇φ(x^k)},

where D(φ) denotes the set where φ is differentiable and co denotes the convex
hull of a set.
The function φ is differentiable at the point x ∈ ℝⁿ with respect to the
direction g ∈ ℝⁿ if the limit

   φ'(x, g) = lim_{α→+0} [φ(x + αg) − φ(x)] / α

exists. The number φ'(x, g) is said to be the derivative of the function φ with
respect to the direction g ∈ ℝⁿ at the point x.
The Clarke upper derivative φ⁰(x, g) of the function φ at the point x with
respect to the direction g ∈ ℝⁿ is defined as follows:

   φ⁰(x, g) = lim sup_{y→x, α→+0} [φ(y + αg) − φ(y)] / α.

It should be noted that the Clarke upper derivative always exists for locally
Lipschitz continuous functions. The function φ is said to be Clarke regular at
the point x ∈ ℝⁿ if

   φ'(x, g) = φ⁰(x, g)
for all g ∈ ℝⁿ. For Clarke regular functions there exists a calculus (see [Cla83,
DR95]). However, in general, for non-regular functions such a calculus does
not exist.
The function φ is called semismooth at x ∈ ℝⁿ if it is locally Lipschitz
continuous at x and for every g ∈ ℝⁿ the limit

   lim_{v∈∂φ(x+tg'), g'→g, t→+0} ⟨v, g⟩

exists.
   φ(x) = max {min{3x_1 + x_2, 2x_1 + 3x_2}, min{x_1 + 2x_2, 4x_1 + 4x_2}}.

   ∂φ(x) = co {(3,1), (2,3), (1,2), (4,4)}.

Then the Clarke upper derivative φ⁰(x, g¹) of the function φ at the point
x = (0, 0) with respect to the direction g¹ = (0, 1) is

   φ⁰(x, g¹) = max {⟨v, g¹⟩ : v ∈ ∂φ(x)} = 4.

However, the directional derivative of this function with respect to the direc-
tion g¹ = (0, 1) is φ'(x, g¹) = 2, that is, φ'(x, g¹) < φ⁰(x, g¹). Thus the
function φ is not Clarke regular.
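The gap between φ' and φ⁰ in this example is easy to confirm numerically; the following sketch (names ours) approximates the directional derivative by a one-sided difference quotient and computes the Clarke upper derivative as the maximum of ⟨v, g⟩ over the four generators of the subdifferential.

```python
# Numerical check of the example above: phi'((0,0); (0,1)) = 2, while
# the Clarke upper derivative at the same point and direction is 4.

def phi(x1, x2):
    return max(min(3*x1 + x2, 2*x1 + 3*x2),
               min(x1 + 2*x2, 4*x1 + 4*x2))

def dir_deriv(f, x, g, t=1e-7):
    # one-sided difference approximation of the directional derivative
    return (f(x[0] + t*g[0], x[1] + t*g[1]) - f(x[0], x[1])) / t

# <v, (0,1)> over the generators (3,1), (2,3), (1,2), (4,4) of the
# subdifferential at the origin:
clarke_upper = max(1, 3, 2, 4)
```

Since 2 < 4, the numerical experiment reproduces the failure of Clarke regularity claimed in the text.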
Here S_1 is the unit sphere, G is the set of vertices of the unit hypercube
in ℝⁿ and P is the set of univariate positive infinitesimal functions.
We define operators H_i^j : ℝⁿ → ℝⁿ for i = 1, ..., n, j = 0, ..., n by the
formula
   j = 1, ..., n,  j ≠ i,
Now let us return to the objective function f of the problem (30). This function
depends on (n + 1)l variables, where l is the number of hyperplanes. The
function f_1 contains max-min functions φ_{ik}
where

   ψ_{ijk}(x, y) = ⟨x^j, a^k⟩ − y_j + 1,  j ∈ J_i,  i ∈ I.

We can see that for every k = 1, ..., m, each pair of variables {x^j, y_j} appears
in only one function ψ_{ijk}.
For a given i = 1, ..., (n + 1)l we set

   q_i = ⌊(i − 1)/(n + 1)⌋ + 1,   d_i = i − (q_i − 1)(n + 1),

where ⌊u⌋ stands for the floor of a number u. We define by X the vector of
all variables {x^j, y_j}, j = 1, ..., l:
as in Remark 7. It follows from (36) that the points X_i^{i−1} and X_i^i differ
by one coordinate only. This coordinate appears in only one linear function
ψ_{iq_i k}. It follows from the definition of the operator H_i that X_i^i = X_i^{i−1}
and thus this observation is also true for X_i^{i+1}. Then we get
Moreover, the function ψ_{iq_i k} can be calculated at the point X_i^i using the
value of this function at the point X_{i−1}^i, i > 1:
   ‖w^s‖ ≤ δ,     (39)

   ‖v^s‖ = min {‖v‖ : v ∈ D_v(x^k)}.

Furthermore, either ‖v^s‖ ≤ δ_s, or for the search direction g^s = −v^s/‖v^s‖
Step 4. If

   ‖v^s‖ ≤ δ_k     (42)

then set x^{k+1} = x^s, k = k + 1 and go to Step 2. Otherwise go to Step 5.
Step 5. Construct the next iterate x^{s+1} = x^s + α_s g^s, where α_s is defined
as follows
For the point x⁰ ∈ ℝⁿ we consider the set M(x⁰) = {x ∈ ℝⁿ : φ(x) ≤ φ(x⁰)}.

Theorem 1. Assume that the set M(x⁰) is bounded for starting points x⁰ ∈
ℝⁿ. Then every accumulation point of {x^k} belongs to the set X⁰ = {x ∈
ℝⁿ : 0 ∈ ∂φ(x)}.
These two sets define the following four sets (see Figure 5):
1. A⁰ ∩ (ℝⁿ \ B⁰)
2. (ℝⁿ \ A⁰) ∩ B⁰
3. A⁰ ∩ B⁰
4. (ℝⁿ \ A⁰) ∩ (ℝⁿ \ B⁰)
If a new observation a belongs to the first set, we classify it in class i; if it
belongs to the second set, we classify it as not in class i. If this point belongs
to the third or fourth set, then if φ_i(a) < min_{j=1,...,d, j≠i} φ_j(a) we
classify it in class i, otherwise we classify it as not in class i.
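In the ambiguous cases the rule reduces to comparing the separating-function values of the candidate classes. A minimal sketch of that final tie-break (the φ_i here stand for the per-class max-min separating functions; names and tie-breaking by lowest index are our own illustrative choices):

```python
# Assign observation a to the class whose separating function value is
# smallest; this is the phi_i(a) < min_{j != i} phi_j(a) rule above,
# with ties broken by the lowest class index (an arbitrary choice).

def classify(a, phis):
    vals = [phi(a) for phi in phis]
    return min(range(len(vals)), key=vals.__getitem__)
```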
In order to evaluate the classification algorithm we use two performance
measures. First we present the average accuracy (a_2c in Tables 3 and 4) for
well-classified points in two-class classification (when one particular class is
separated from all others), and then the multi-class classification accuracy
(a_mc in Tables 3 and 4) as described above. The first accuracy is an indication
of separation quality and the second one is an indication of multi-class
classification quality.
Table 1. Results of numerical experiments with small and middle size datasets

Database    m/p/n       Linear     Polyhedral        Max-min
            accuracy    h   accuracy    r x j   accuracy
WBCD        239/444/9   97.36    7   98.98    5x2   100
Heart       137/160/13  84.19   10  100       2x5   100
Ionosphere  126/225/34  93.73    4   97.44    2x2   100
Votes       168/267/16  96.80    5  100       2x3   100
WBCP        46/148/32   76.80    4  100       3x2   100
Diabetes    268/500/8   76.95   12   80.60   15x2    90.10
Liver       145/200/6   68.41   12   74.20    6x5    89.86
Datasets
The datasets used are the Shuttle control, the Letter recognition, the Land-
sat satellite image, the Pen-based recognition of handwritten digits and the
Page blocks classification databases. Table 2 presents some characteristics of
these databases. More detailed information can be found in [MA92]. It should
be noted that all attributes in these datasets are continuous.
accuracy for both training and test phases. Results on training sets show
that this algorithm provides a high quality of separation between two sets. In
our experiments we used only large-scale datasets. Results on these datasets
show that a few hyperplanes are sufficient to separate efficiently sets with large
numbers of points. Since we use a derivative-free method to solve problem (30)
the number of objective function evaluations is a significant characteristic for
estimation of the complexity of the max-min separability algorithm. Results
presented in Tables 3 and 4 confirm that the proposed algorithm is effective
for solving classification problems on large-scale databases.
Acknowledgements
This research was supported by the Australian Research Council.
References
[AG02] Astorino, A., Gaudioso, M.: Polyhedral separability through successive
LP. Journal of Optimization Theory and Applications, 112, 265-293
(2002)
Gleb Beliakov
Summary. The theory of abstract convexity provides us with the necessary tools
for building accurate one-sided approximations of functions. Cutting angle methods
have recently emerged as a tool for global optimization of families of abstract convex
functions. Their applicability has subsequently been extended to other problems,
such as scattered data interpolation. This paper reviews three different applications
of cutting angle methods, namely global optimization, generation of nonuniform
random variates and multivariate interpolation.
1 Introduction
The theory of abstract convexity [RubOO] provides the necessary tools for
building accurate lower and upper approximations of various classes of func-
tions. Such approximations arise from a generalization of the following clas-
sical result: each convex function is the upper envelop of its affine minor ants
[Roc70]. In abstract convex analysis the requirement of linearity of the mino-
rants is dropped, and abstract convex functions are represented as the upper
envelops of some simple minor ants, or support functions, which are not nec-
essarily affine. Depending on the choice of the support functions, one obtains
different flavours of abstract convex analysis.
By using a subset of support functions, one obtains an approximation
of an abstract convex function from below. Such one-sided approximation,
or underestimate, can be very useful in various applications. For instance,
in optimization, the global minimum of the underestimate provides a lower
bound on the global minimum of the objective function. One can find the
global minimum of the objective function as the limiting point of the sequence
210 G. Beliakov
We start with the classical case of affine support functions [Roc70, Rub00].
Example 1. Let the set H denote the set of all affine functions
the Lipschitz constant of g in l_1-norm. Thus the underestimate (6) can also
be used to approximate Lipschitz functions on the unit simplex.
Function (6) has a very irregular shape, illustrated in Figs. 2 and 3, which is
the reason why it is often called the saw-tooth underestimate (or saw-tooth
cover) of f.
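In the univariate case the saw-tooth cover can be written down explicitly: each support function is a downward "tooth" f(x_k) − C|x − x_k|, and the underestimate is their upper envelope. A minimal sketch, with our own names and assuming a known Lipschitz constant C:

```python
# Saw-tooth underestimate: H_K(x) = max_k [ f(x_k) - C * |x - x_k| ].
# For f with Lipschitz constant <= C, H_K(x) <= f(x) everywhere, with
# equality at the sample points x_k.

def sawtooth(xs, fs, C):
    def H(x):
        return max(fk - C * abs(x - xk) for xk, fk in zip(xs, fs))
    return H
```

For instance, with samples f(0) = f(1) = 0 and C = 1, the envelope interpolates the data at 0 and 1 and dips to −0.5 midway between them, showing the characteristic teeth.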
Example 3. [Rub00]. Let the set H be the set of functions of the form
Example 4. [Bel05]. Let d_P be a simplicial distance function, and let the set
H be the set of functions of the form
Since d_P can also be written as (4), we can use the following underestimate
of a Lipschitz f
Fig. 5. The hypograph of the function H^ in (8) in the case of two variables.
Fig. 6. The Voronoi diagram of a set of sites, and its dual Delaunay triangulation.
There are multiple extensions of the Voronoi diagram, notably those based
on the generalization of the distance function [OBSC00, BSTY98]. One such
generalization is called the additively weighted Voronoi diagram, in which
case each site has an associated weight w_k.

Definition 10. Let {x^k}_{k=1}^K, x^k ∈ ℝⁿ, be the set of sites, and let
w ∈ ℝ^K be the vector of weights. The set

   Vor(x^k, w) = {x ∈ ℝⁿ : w_k + ‖x − x^k‖ ≤ w_j + ‖x − x^j‖, ∀j ≠ k}

is called an Additively Weighted Voronoi cell. The collection of such cells is
called an Additively Weighted Voronoi diagram.
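Membership in an additively weighted cell is a direct transcription of Definition 10; a small sketch (names ours):

```python
import math

# x lies in the additively weighted Voronoi cell of site k iff
# w_k + ||x - x_k|| <= w_j + ||x - x_j|| for every other site j.

def in_aw_cell(x, k, sites, w):
    d = math.dist  # Euclidean distance
    return all(w[k] + d(x, sites[k]) <= w[j] + d(x, sites[j])
               for j in range(len(sites)) if j != k)
```

With all weights zero this reduces to the ordinary nearest-site test of the classical Voronoi diagram.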
Voronoi diagrams and their duals, Delaunay (pre-)triangulations, are very
popular in multivariate scattered data interpolation, e.g., Sibson's natural
neighbour interpolation [Sib81].
Let us show how Voronoi diagrams are related to the underestimates (7), (8).
First consider the special case f ≡ 1. For the function H^K in (7), and for
each k = 1, ..., K, define the set

   S^k = {x ∈ ℝⁿ : h^k(x) ≥ h^j(x), ∀j ≠ k}.

It is easy to show that the sets S^k coincide with the Voronoi cells Vor(x^k).
Indeed, h^k(x) ≥ h^j(x) implies 1 − C‖x − x^k‖ ≥ 1 − C‖x − x^j‖, and then
‖x − x^k‖ ≤ ‖x − x^j‖. Furthermore, if we now take H^K in (8), the sets S^k
coincide with Voronoi cells in the distance d_P.
Let us now take an arbitrary Lipschitz f and (7). Consider an additively
weighted Voronoi diagram with weights w_k given as w_k = −f(x^k)/C. It is
not difficult to show that the Voronoi cells Vor(x^k, w) can be written as

   Vor(x^k, w) = {x ∈ ℝⁿ : h^k(x) ≥ h^j(x), ∀j ≠ k}.

The last equation is also valid for other distance functions, and in particular
for d_P and h^k in (8).
Applications of Cutting Angle methods 217
min f(x)     (9)
s.t. x ∈ D.

Below we present the generalized cutting plane method, of which the cutting
angle method (CAM) is a particular instance, following [Rub00, ARG99, BR00].
The principle of this method is to replace the original global optimization
problem with a sequence of relaxed problems

min H^K(x)     (10)
s.t. x ∈ D.

The relaxed problems (10) must be solved at every iteration of the algorithm,
and as such their solution must be efficient. In the case of convex f we obtain
Kelley's cutting plane method. In this case the relaxed problem can be solved
using linear programming techniques.
For Lipschitz and IPH functions, the relaxed problems are very challeng-
ing. In the univariate case, the above algorithm is known as the Pijavski-
Shubert method [HJ95, Pij72, Shu72, SS00], and many of its variations are
available. However, its multivariate generalizations, like Mladineo's method
[Mla86], did not succeed for more than 2-3 variables because of significant
computational challenges [HP95, HPT00].
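For intuition, here is a minimal univariate Pijavski-Shubert sketch. It maintains the saw-tooth lower envelope and always evaluates f at the envelope's current global minimizer, which in one dimension has a closed form as the intersection of two neighbouring teeth. This is our own illustrative implementation, not code from the paper.

```python
# Univariate Pijavski-Shubert method (sketch). Between consecutive
# evaluated points (x0, f0) and (x1, f1), the saw-tooth envelope attains
# its minimum at the intersection of the two teeth:
#   x = (x0 + x1)/2 + (f0 - f1)/(2C),
#   value = (f0 + f1)/2 - C (x1 - x0)/2.

def pijavski(f, a, b, C, iters=60):
    pts = [(a, f(a)), (b, f(b))]
    for _ in range(iters):
        ps = sorted(pts)
        best_x, best_h = None, float("inf")
        for (x0, f0), (x1, f1) in zip(ps, ps[1:]):
            x = 0.5 * (x0 + x1) + (f0 - f1) / (2.0 * C)
            h = 0.5 * (f0 + f1) - 0.5 * C * (x1 - x0)
            if h < best_h:
                best_x, best_h = x, h
        pts.append((best_x, f(best_x)))  # evaluate at the envelope min
    return min(pts, key=lambda p: p[1])  # best (x*, f*) found
```

The requirement is that C is at least the Lipschitz constant of f on [a, b]; then the envelope minimum is a valid lower bound at every iteration.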
To solve the relaxed problem (10) with H^K given by (6), (7) or (8), one
has to enumerate all local minimizers of the saw-tooth underestimate. The
number of these minimizers grows exponentially with the dimension n, and
until recently this task was impractical. Below we review a new method for
enumerating local minimizers of H^K, as published in the series of papers
[BR00, BR01, BB02, Bel03].
   L = (l^{k_1}, l^{k_2}, ..., l^{k_n})     (12)

The following result is proven in [BR00]: every local minimizer x* of H^K in
ri S corresponds to a combination L satisfying two conditions

(I) ∀i, j ∈ I, j ≠ i : l_i^{k_i} > l_i^{k_j}
(II) ∀v ∈ K \ L, ∃i ∈ I : l_i^{k_i} < v_i

where K = {l^1, l^2, ..., l^K} is the set of all support vectors. Further, the
actual local minima are found from L using

   d = H^K(x*) = (Trace(L))^{−1},     (13)
   x*(L) = d · diag(L).
Condition (I) implies that the diagonal elements dominate their respective
columns, and condition (II) implies that the diagonal of L does not dominate
any other support vector v. Thus we obtain a combinatorial problem of enu-
merating all combinations L that satisfy conditions (I) and (II).
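For small K these combinations can be enumerated by brute force directly from conditions (I) and (II); the sketch below does that for the simplex case, pairing n support vectors with the n coordinates. This is our own illustrative code (the tree-based method of [BB02, Bel03] replaces this exhaustive scan with an incremental one).

```python
from itertools import permutations

# Brute-force enumeration of combinations L = (l^{k_1}, ..., l^{k_n}):
# (I)  each "diagonal" element l^{k_i}_i strictly dominates its column;
# (II) no support vector outside L dominates the whole diagonal.

def minimizer_combinations(vectors):
    n = len(vectors[0])
    found = []
    for combo in permutations(range(len(vectors)), n):
        L = [vectors[k] for k in combo]
        if not all(L[i][i] > L[j][i]
                   for i in range(n) for j in range(n) if j != i):
            continue  # fails condition (I)
        diag = [L[i][i] for i in range(n)]
        rest = [v for k, v in enumerate(vectors) if k not in combo]
        if all(any(diag[i] < v[i] for i in range(n)) for v in rest):
            found.append(combo)  # satisfies (II) as well
    return found
```

Each surviving combination then yields the local minimum d = 1/Trace(L) and the minimizer x* = d · diag(L), as in (13).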
It is infeasible to enumerate all such combinations directly for large K.
Fortunately there is no need to do so. It was shown in [BB02, Bel03, Bel04]
that the required combinations can be put into a tree structure. The leaves of
the tree correspond to the local minimizers of H^K, whereas the intermediate
nodes correspond to the minimizers of H^n, H^{n+1}, ..., H^{K−1}. Such a tree
is illustrated in Fig. 7. The use of the tree structure makes the algorithm very
efficient numerically (as processing of queries using trees requires logarithmic
time in the number of nodes).
Fig. 7. The tree of combinations of support vectors L that satisfy conditions (I)
and (II) and define local minima of H^.
Fig. 8. Sets A{L) on which the saw-tooth underestimate has unique local minimum.
Two such sets are shown. Black circles denote points x'^.
min α     (16)
s.t. ∀i ∈ I : α − l_i^{k_i} x_i ≥ 0,
     x ∈ A(L) ∩ D,

and recall that the set A(L) is an intersection of halfspaces (14) and D is a
polytope. The details are given in [Bel04].
Consider now functions (8), illustrated in Figs. 4 and 5. In this case we can
use a similar enumeration technique. Define the support vectors
if = ^ - 4 . (17)
Form ordered combinations of n + 1 support vectors L (12). We have the
following result [Bel05]: every local minimizer of H^K corresponds to a combi-
nation L that satisfies conditions (I) and (II) above, and the actual minima
are found from

   d = H^K(x*) = (Trace(L) + 1) / C̄,     (18)

where C̄ = Σ_{i∈I} C_i.
The sets A(L), on which each local minimum is unique, are characterized
by

   ∀i, j ∈ {1, ..., n + 1}, i ≠ j : C_j(x* − x^{k_j}) ≤ C_i(x* − x^{k_i}).     (19)
We performed extensive testing of various versions of CAM on test and real-
life problems [BB02, Bel03, Bel04, BTMRB03, LBB03]. In this section,
to indicate the performance of the algorithm, we present a selection of results
of numerical experiments. We took the following test optimization problems.

Test Problem 1 (Six-hump camel back function)

   f(x) = 4x_1² − 2.1x_1⁴ + x_1⁶/3 + x_1x_2 − 4x_2² + 4x_2⁴,
   −2 ≤ x_i ≤ 2, i = 1, 2.

Test Problem 2

   0 ≤ x_i ≤ 10, i = 1, 2.

Parameters a_i and c_i are given in [HPT00], p. 262.
Test Problem 3 [HJ95]

   f(x) = sin(x_1) sin(x_1x_2) sin(x_1x_2x_3),
   0 ≤ x_i ≤ 4.

Test Problem 4 (Griewank's function)

   f(x) = 1 + (1/4000) Σ_{i=1}^n x_i² − Π_{i=1}^n cos(x_i/√i),
   −50 ≤ x_i ≤ 50.
3.5 Applications
Various versions of CAM have been applied to solving real-life practical prob-
lems. In [BRY01, BRY02] the authors successfully used CAM in problems of
supervised classification. In particular, they applied CAM to automatic clas-
sification of medical diagnoses. In [BRS03] the same authors extended the use
of CAM to unsupervised classification problems.
CAM has also been applied as a tool to find parameters of a function in uni-
variate and multivariate nonlinear approximation. [Bel03] applies CAM to
optimize the position of knots in univariate spline approximation, whereas in
[Bel02] CAM was used to fit aggregation operators to empirical data.
Recently we applied CAM to the molecular structure prediction problem
[Neu97, Flo00, LBB03]. This is a very challenging problem in computational
chemistry, which consists in predicting the geometry of a molecule by mini-
mizing its potential energy as a function of atomic coordinates. We chose the
benchmark problem of unsolvated met-enkephalin [Flo00, LBB03]. As inde-
pendent variables we used the 24 dihedral angles of this pentapeptide, and,
following [Flo00], 10 of the dihedral angles (the backbone) were used as global
variables in ECAM, while the rest were treated as local variables (i.e., each
function evaluation involved a local optimization problem with respect to the
dihedral angles treated as local variables). This objective function (the po-
tential energy) involves on the order of 10^11 local minima. The problem is
very challenging because of the existence of several strong local minima which
trap local descent algorithms. For instance, all reported multistart local search
algorithms failed to identify the global minimum [Flo00].
Previously we reported that a combination of CAM with local search al-
gorithms allowed us to locate the global minimum of the potential energy
function in 120,000 iterations of CAM, which took 4740 seconds (79 min) on
a cluster of 36 DEC Alpha workstations (1 MHz processors) [LBB03].
Using ECAM and the same hardware and software configuration, the global
minimum was found in 80,000 iterations, which took 50 min on the cluster of
36 DEC Alpha workstations.
There are two main approaches to generating random numbers from ar-
bitrary distributions. The inversion method relies on knowledge of the inverse
P^{−1}(y) of the cumulative distribution P(x). If this inverse is given explicitly,
then one generates uniformly distributed random numbers Z and transforms
them to X using X = P^{−1}(Z). This approach is very useful when distrib-
utions are simple enough to find P^{−1} analytically; however, in the case of
more complicated distributions, P^{−1} may not be available, and one has to
invert P numerically by solving the equation Z = P(X) for X, e.g., using
bisection or Newton's method. Given the slowness of the numerical solution,
this method becomes very inefficient. This method cannot be used for multi-
variate densities.
The second approach, the so-called acceptance/rejection method, relies on effi-
cient generation of random numbers from another distribution, whose density
h(x), multiplied by a suitable positive constant, dominates the density p(x) of
the required distribution: ∀x ∈ Dom[p] : p(x) ≤ g(x) = ch(x). The function
g(x) is often called the hat function of the distribution with density p. In this
case we need two independent random variates, a random number X with
density h(x) and a uniform random number Z on [0,1]. If Zg(X) ≤ p(X),
then X is accepted (and returned by the generator); otherwise X is rejected,
and the process repeats until some X is accepted.
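The acceptance/rejection loop itself is only a few lines; here is a generic sketch (names ours) for a target density p dominated by a hat function g = c·h:

```python
import random

# Generic acceptance/rejection sampler: sample_h draws X with density h,
# and hat(x) = c*h(x) must dominate p(x) on the support of p.

def accept_reject(p, sample_h, hat, rng=random):
    while True:
        x = sample_h()
        if rng.random() * hat(x) <= p(x):
            return x  # accepted
        # otherwise rejected: draw again
```

For example, p(x) = 2x on [0, 1] is dominated by the constant hat g ≡ 2 (uniform h, c = 2); the expected number of trials per accepted sample equals c, which is why a tight hat matters.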
The acceptance/rejection approach does not rely on the analytic form of
the distribution or its inversion. However, its effectiveness depends on how
accurately p is approximated from above by the hat function. The less accurate
the approximation, the greater the chance of rejection (and hence the ineffi-
ciency of the algorithm). A number of important inequalities relating densities
of various distributions are presented in [Dev86]. These inequalities allow one
to choose an appropriate hat function for a given p.
The acceptance/rejection approach generalizes well to multivariate distri-
butions. In fact, this method does not change at all if X is a random vector
rather than a random number. The challenge lies in efficient construction
of the hat function for a multivariate density p(x), and in finding an efficient
way to sample from the distribution defined by this hat function. With in-
creasing dimension, the need for a tight upper approximation to p becomes
more important, as the number of wasted calculations when X is rejected
increases.
Subdivision of the domain of p is frequently used in universal random
number generators [Hor95, LH98, LH01]. If little information about p is avail-
able (i.e., no analytical form), a piecewise constant (or piecewise linear) hat
function can be used. It is constructed by taking values of p at a number of
points (Fig. 9). For instance, some methods use concavity of p to guarantee
that such an approximation overestimates p, whereas in [Hor95, LH98, LH01]
log-concavity or T-concavity is exploited. A function is called log-concave
(or T-concave for a monotone continuous function T) if the transformed den-
sity p̃ = ln(p) (or p̃ = T(p)) is concave. In [ES98] the authors rely on detecting
the inflection points of p in their construction of the hat function.
Fig. 9. A piecewise constant upper semicontinuous hat function (thick solid
line) that approximates a monotone density p.
However, regardless of the way the hat function is obtained at this pre-
processing step, the random numbers are always generated in a similar fash-
ion. First an interval is chosen using a universal discrete generator (e.g., the
alias method [Dev86, Wal74]). Then a random variate X is generated that has
a multiple of the hat function on this interval as its density. Then X is ei-
ther accepted or rejected (according to whether Zg(X) ≤ p(X) for a uniform
random variate Z on [0,1]). In case of rejection we have to restart from the
first step. The intervals are chosen with probabilities proportional to the area
under g on each interval.
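The discrete interval choice can be done in O(1) time per draw with Walker's alias method mentioned above; a compact sketch of the standard construction (our own implementation):

```python
import random

# Walker's alias method [Wal74, Dev86]: O(n) table construction,
# O(1) sampling from the discrete distribution given by `probs`.

def build_alias(probs):
    n = len(probs)
    scaled = [p * n for p in probs]
    cut, alias = [1.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        cut[s], alias[s] = scaled[s], l  # donate the deficit of cell s
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return cut, alias

def alias_draw(cut, alias, rng=random):
    i = rng.randrange(len(cut))
    return i if rng.random() < cut[i] else alias[i]
```

Each cell holds at most two outcomes, so one uniform index and one uniform comparison suffice per draw, independent of the number of intervals.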
It is clear that the form of the hat function g on each interval of the
subdivision need not be the same. While constant or linear functions can be
used for some intervals, on intervals where p has a vertical asymptote, or
on infinite intervals (for the tails of the distributions), other forms are more
appropriate (e.g., multiples of Pareto or Cauchy tails). It is also clear that
the multivariate case can be treated in exactly the same way, by partitioning
the domain into small regions. For T-concave distributions such a method is
described in [LH98].
Hence, efficient universal generators of non-uniform random numbers or
random vectors can be built in a standardized fashion, by partitioning the
domain of p, and constructing a piecewise continuous hat function. The prob-
lem is how to build an accurate upper approximation that can serve as a hat
function. In this section we review the methods of building the hat functions
based on one-sided approximations discussed earlier in this paper.
Fig. 10. A piecewise linear hat function g built using the saw-tooth overestimate
in the univariate case.
Fig. 11. A piecewise constant hat function g built using the saw-tooth overestimate
in the univariate case. The value gk is chosen as the absolute maximum of the
saw-tooth overestimate on each Dk-
for the alias method) and longer tables in the alias method, but not in longer
generation time once preprocessing has been finalized.
One variation of this method is to use shorter tables (i.e., fewer subintervals),
but to improve the lower overestimate of the maximum of p on each subinter-
val. Previously we assumed that such a lower overestimate is the maximum of
   H(x) = d(L),  if x ∈ A(L),
generator. We can use the alias method [Dev86, Wal74] for this purpose. The
second step requires additional processing, as generation of random variates
on a polytope requires its triangulation.
Generation of random variates uniformly distributed in a simplex is rela-
tively easy using sorting or uniform spacings [Dev86],p.214. The way to gen-
erate uniform random variates on a general polytope A{L) is to subdivide
it into simplices, the procedure known as triangulation. Further, it is easy
to compute the volume of a polytope given its triangulation. Hence we will
triangulate every polytope A{L) as part of the preprocessing.
For our purposes any triangulation of the polytope is suitable, and we
used the revised Cohen and Hickey triangulation as described in [BEF00].
This triangulation method requires the vertex representation of the polytope
A(L), whereas A(L) is given as the set of inequalities (14). The calculation of
the vertex representation of A(L) can be done using the Double Description
method [FP96, MRT53]. The software package CDD, which implements the
Double Description method, is available from [Fuk05]. The software package
Vinci, available from [Eng05], can be used for the revised Cohen and Hickey
triangulation.
Once the triangulation of the sets A(L) is done, the volume of each simplex
needs to be computed and multiplied by the value of the hat function g on
it. The volume computation is performed by taking the determinant of an
n × n matrix of vertex coordinates [BEF00]. The vertices and volumes (times
the value of g) of the simplices that partition the domain of p are stored for
the random vector generator.
Summarizing this section: given an arbitrary Lipschitz density p on S, we
can find an underestimate H^K of the auxiliary function f = −p + C, and a
partition of S into polytopes A(L), such that on each A(L) the local minimum
of H^K, d(L) in (13), is the greatest lower bound on f. This lower bound is
tight, i.e., one can find a Lipschitz function such that min_{x∈A(L)} f(x) = d,
for instance f = H^K itself. Based on H^K, we define the hat function as
g = −H + C, where H(x) = d if x ∈ A(L). Then we subdivide each polytope
A(L) into simplices to facilitate generation of random variates, and compute
the volume of each simplex for the discrete random variate generator.
Let us now detail some of the steps required to build a universal random
vector generator using the hat function described in the previous section. The
algorithm consists of two parts, preprocessing and generation. First, given
the set of values p(x^k), k = 1, ..., K, we build the saw-tooth underestimate
of the IPH function f = −p + C. Points x^k can be given a priori, or can be
determined by the algorithm itself; for instance each x^k, k = n + 1, ... can
be chosen as a global minimizer of the function H^{k−1}, i.e., at the teeth of
the saw-tooth underestimate at the current iteration. The first n points are
always chosen as the vertices of S.
Preprocessing
1 Choose a constant C > p_max + 2M.
2 Build the saw-tooth underestimate H^K of the function f = −p + C using
K points x^k within the domain of p, by using the algorithm from [BB02,
Bel03]. Except for the first n points, the x^k are chosen automatically by the
algorithm.
3 For each local minimum of H^K compute the polytope A(L) using (14).
4 Convert each A(L) to the vertex representation using the Double Descrip-
tion method from [FP96, MRT53] and find its triangulation.
5 For each simplex S_i from the triangulation of A(L) find its volume and
multiply it by P(S_i) = C − d(L).
6 Store the list of all simplices as the list of vertices and computed values P
and VP(S_i) = Volume(S_i) × P(S_i).
7 Create two tables for the alias method using the values VP as the vector
of probabilities.
Random vector generation
1 Using the alias method randomly choose a simplex S_i.
2 Generate a random vector R uniformly distributed in the unit simplex S
([Dev86], p. 214, via either sorting or uniform spacings).
3 Compute the vector X = Σ_{j=1}^{n+1} R_j S_i^j, where S_i^j is the j-th vertex
of the chosen simplex S_i ([Dev86], p. 568).
4 Generate an independent uniform random number Z in [0,1].
5 If ZP(S_i) < p(X) then return X, otherwise go to Step 1.
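Steps 2-3 (a uniform point in the chosen simplex) can be written compactly with uniform spacings; the sketch below (our names) turns the gaps between sorted uniforms into barycentric weights and forms the convex combination of the vertices:

```python
import random

# Uniform random point in a simplex: sort n uniforms, use the n+1 gaps
# ("uniform spacings", [Dev86] p. 214) as barycentric weights, then take
# the convex combination of the n+1 vertices (Step 3 above).

def uniform_in_simplex(vertices, rng=random):
    n = len(vertices) - 1
    u = sorted(rng.random() for _ in range(n))
    w = [b - a for a, b in zip([0.0] + u, u + [1.0])]  # sums to 1
    dim = len(vertices[0])
    return [sum(wj * v[d] for wj, v in zip(w, vertices))
            for d in range(dim)]
```

Because the weights are nonnegative and sum to one, every generated point lies inside the simplex, as Step 5's rejection test requires.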
Fig. 12. Multimodal density p used to generate random vectors in ℝ². p in this
example is a mixture of five normal distributions. The algorithm uses exclusively
numerical values of p and its Lipschitz constant.

Fig. 13. Density p used to generate random vectors in ℝ², given by p(x, y) =
k exp(−(y − x²)² − ...). This density is not log-concave.
Covariance matrices were all diagonal. For the reference, the time to gener-
ate one uniform random number was 0.271 x 10~^ sec. The Ranlux lagged
Fibonacci generator with the period 10^^^ was used for uniform random num-
bers [Lue94].
Table 2. The number of local minima of H^K. Function f_1 was used in the
calculations.

K       n = 1   n = 3    n = 5     n = 7     n = 9
1000      999    4699    13495     24810     31217
2000     1999    9631    28210     50526     74132
4000     3999   20435   104117    177358    187973
8000     7999   42031   270328    527995    886249
15000   14999   81301   532387   1093040   1956075
20000   19999  109587   738888   1605995   2661807
25000   24999  137770   993812   3861070   6175083
30000   29999  167251  1234810   6340898  10521070
is available. For instance, it may be known a priori that the function must
be monotone, convex, positive, symmetric, unimodal, etc. These conditions
determine additional constraints on the approximant, which may find explicit
representation in terms of the parameters that are fitted to the data. In spline
approximation this problem has been thoroughly studied (see [Die95, KM97,
Kva00, Bel00]), and such constraints as monotonicity or convexity usually
translate into restrictions on spline coefficients.
More recently, the concept of shape preserving interpolation and approxi-
mation has been extended to include other known a priori restrictions on the
approximant, such as generalized convexity, unimodality, possessing peaks or
discontinuities, Lipschitz property, associativity [KM97, Bel03]. These restric-
tions require new problem formulations leading to new specific methods of
approximation.
In this section we consider interpolation of scattered multivariate data which restricts the Lipschitz constant of the interpolant. The Lipschitz condition ensures reasonable bounds on the interpolated values of the function, which is sometimes hard to achieve in nonlinear interpolation. As we shall see, preservation of the Lipschitz condition implies strict bounds on the difference between the interpolant and the function it models in the Chebyshev max-norm, so
    g(x^k) = y^k,  k = 1, …, K,
such that
    g(x^k) = f(x^k) = y^k,  k = 1, …, K.
238 G. Beliakov
The lower approximation directly follows from (7), whereas the upper approximation is built from the lower approximation of an auxiliary function f̃ = −f, cf. Eq. (20).
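These two envelopes can be illustrated with a minimal sketch. The constructions (7) and (20) themselves involve the simplicial distance; as a stand-in, plain max-norm Lipschitz cones are used below, which give the tightest lower and upper interpolants consistent with the data, and their average is the best uniform approximation g of Fig. 15.

```python
def lipschitz_envelopes(data, M):
    # data: list of (x_k, y_k) pairs with x_k a coordinate tuple;
    # M: the Lipschitz constant in the Chebyshev max-norm.
    def dist(a, b):
        return max(abs(ai - bi) for ai, bi in zip(a, b))
    def lower(x):
        # Tightest function below all data-consistent Lipschitz functions.
        return max(y - M * dist(x, xk) for xk, y in data)
    def upper(x):
        # Tightest function above them.
        return min(y + M * dist(x, xk) for xk, y in data)
    def best(x):
        # Best uniform approximation: midway between the envelopes.
        return 0.5 * (lower(x) + upper(x))
    return lower, upper, best
```

Both envelopes interpolate the data, and their gap bounds the uniform error of any Lipschitz function consistent with the data.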
Fig. 15. The lower and upper approximations of a Lipschitz function f, and the best uniform approximation g.
norm in (7). However, the method based on the simplicial distance (8) is very efficient numerically. In this case we can represent H^{lower} through the list of its local minimizers. We have an efficient method of enumerating local minimizers of (8), described in Section 3.3. This representation is useful when a value of H^{lower} is needed for an arbitrary x ∈ X. It allows one to compute the maximum in (8) using only a limited subset of {1, 2, …, K}, which makes the algorithm competitive with alternative methods (like Sibson's interpolation [Sib81]).
To obtain the overestimate H^{upper} we proceed as earlier: define an auxiliary function f̃ = −f, for which we build the underestimate (8), and then take
    H^{upper} = −H̃^{lower}.
Fig. 17. Uniform approximation of the test function 1 using 20000 data points.
Fig. 19. Uniform approximation of the test function 2 using 80000 data points.
100000 times at random points to gather statistics, and the average time is reported. Further, the maximum and mean errors of approximation are reported. The root mean squared error is computed as
    RMSE = √( (1/N) Σ_{j=1}^{N} (g(x^j) − f(x^j))² ),
where N is the number of test points x^j not used in the construction of the interpolant. All computations were performed on a Pentium-IV PC, 1.2 GHz, 512 MB RAM, Visual C++ (version 6) compiler.
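The error statistic above is the usual root mean squared error over held-out test points; a minimal sketch (the function names are illustrative):

```python
import math

def rmse(f, g, test_points):
    # Root mean squared error of the interpolant g against the target f,
    # computed over N test points not used to build g.
    n = len(test_points)
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in test_points) / n)
```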
6 Conclusion
The theory of abstract convexity provides us with the necessary tools for build-
ing guaranteed tight one-sided approximations of various classes of functions.
Such approximations find applications in many areas, such as global opti-
mization, statistical simulation and approximation. In this paper we reviewed
methods of building lower (upper) approximations of convex, log-convex, IPH
and Lipschitz functions, which commonly arise in practice.
We presented an overview of three important applications of one-sided ap-
proximations: global optimization, random variate generation and scattered
data interpolation. In all three applications we used essentially the same con-
struction, in which the lower (or upper) approximation was represented by
means of the list of its local minima (maxima). We also described a fast com-
binatorial algorithm for identification of these local minima. Each of the pre-
sented applications also requires a number of specific techniques to make use
of this general construction. This paper addresses this issue and presents the
details of the algorithms used in each case, and also illustrates the performance
of the algorithms using numerical experiments, and practical applications.
References
[Alf89] Alfeld, P.: Scattered data interpolation in three or more variables. In
Schumaker, L.L., Lyche, T. (eds) Mathematical Methods in Computer
Aided Geometric Design, 1-34. Academic Press, New York (1989)
[ARG99] Andramonov, M., Rubinov, A., Glover, B.: Cutting angle methods in
global optimization. Applied Mathematics Letters, 12, 95-100 (1999)
[Aur91] Aurenhammer, F.: Voronoi diagrams - a survey of a fundamental data
structure. ACM Computing Surveys, 23, 345-405 (1991)
[BROO] Bagirov, A., Rubinov, A.: Global minimization of increasing positively
homogeneous function over the unit simplex. Annals of Operations Re-
search, 98, 171-187 (2000)
[BROl] Bagirov, A., Rubinov, A.: Modified versions of the cutting angle method.
In: Hadjisavvas, N., Pardalos, P.M., (eds) Convex Analysis and Global
optimization, Nonconvex optimization and its applications, 54, 245-268.
Kluwer, Dordrecht (2001)
[BRS03] Bagirov, A., Rubinov, A.M., Soukhoroukova, N.V., Yearwood, J.L.: Unsupervised and supervised data classification via nonsmooth and global optimization. TOP (formerly Trabajos de Investigación Operativa), 11, 1-93 (2003)
[BRYOl] Bagirov, A., Rubinov, A.M., Yearwood, J.L.: Using global optimization to
improve classification for medical diagnosis. Topics in Health Information
Management, 22, 65-74 (2001)
[BRY02] Bagirov, A., Rubinov, A.M., Yearwood, J.L.: A global optimization ap-
proach to classification. Optimization and Engineering, 3, 129-155 (2002)
[BB02] Batten, L.M., Beliakov, G.: Fast algorithm for the cutting angle method of
global optimization. Journal of Global Optimization, 24, 149-161 (2002)
[BelOO] Beliakov, G.: Shape preserving approximation using least squares splines.
Approximation theory and applications, 16, 80-98 (2000)
[Bel02] Beliakov, G.: Approximation of membership functions and aggregation
operators using splines. In Bouchon-Meunier, B., Gutierrez-Rios, Mag-
dalena, L., and Yager, R. (eds) Technologies for Constructing Intelligent
Systems, 2, 159-172. Springer, Berlin (2002)
[Bel03] Beliakov, G.: Geometry and combinatorics of the cutting angle method.
Optimization, 52, 379-394 (2003)
[Bel03] Beliakov, G: How to build aggregation operators from data? Int. J. Intel-
ligent Systems, 18, 903-923, (2003)
[Bel04] Beliakov, G.: The cutting angle method - a tool for constrained global
optimization. Optimization Methods and Software, 19, 137-151 (2004)
[Bel03] Beliakov, G.: Least squares splines with free knots: global optimization approach. Applied Mathematics and Computation, 149, 783-798 (2004)
[Bel05] Beliakov, G: Extended cutting angle method of constrained global op-
timization. In: Caccetta, L. (eds) Optimization in Industry (in press).
Kluwer, Dordrecht (2005)
[BTM01] Beliakov, G., Ting, K.-M., Murshed, M.: Efficient serial and parallel implementation of the cutting angle global optimization technique. In: 5th International Conference on Optimization: Techniques and Applications, 1, 80-87, Hong Kong (2001)
[BTMRB03] Beliakov, G., Ting, K.M., Murshed, M., Rubinov, A., Bertoli, M.: Efficient serial and parallel implementations of the cutting angle method. In: Di Pillo, G. (ed) High Performance Algorithms and Software for Nonlinear Optimization, 57-74. Kluwer Academic Publishers (2003)
[BSTY98] Boissonnat, J.-D., Sharir, M., Tagansky, B., Yvinec, M.: Voronoi dia-
grams in higher dimensions under certain polyhedral distance functions.
Discrete and Comput. Geometry, 19, 485-519 (1998)
[BEF00] Büeler, B., Enge, A., Fukuda, K.: Exact volume computation for convex polytopes: a practical study. In: Kalai, G., Ziegler, G.M. (eds) Polytopes - Combinatorics and Computation, 131-154. Birkhäuser, Basel (2000)
[Coo95] Cooper, D.A.: Learning Lipschitz functions. Int. J. of Computer Mathe-
matics, 59, 15-26 (1995)
[Dag88] Dagpunar, J.: Principles of Random Variate Generation. Clarendon Press,
Oxford (1988)
[LBB03] Lim, K.F., Beliakov, G., Batten, L.M.: A new method for locating the
global optimum: Application of the cutting angle method to molecular
structure prediction. In: Proceedings of the 3rd International Confer-
ence on Computational Science, 4, 1040-1049. Springer-Verlag, Heidel-
berg (2003)
[LBB03] Lim, K.F., Beliakov, G., Batten, L.M.: Predicting molecular structures: Application of the cutting angle method. Physical Chemistry Chemical Physics, 5, 3884-3890 (2003)
[LS02] Locatelli, M., Schoen, F.: Fast global optimization of difficult Lennard-Jones clusters. Computational Optimization and Applications, 21, 55-70 (2002)
[Lue94] Luescher, M.: A portable high-quality random number generator for lat-
tice field theory calculations. Computer Physics Communications, 79,
100-110 (1994)
[Mla86] Mladineo, R.: An algorithm for finding the global maximum of a multi-
modal, multivariate function. Math. Prog., 34, 188-200 (1986)
[MRT53] Motzkin, T.S., Raiffa, H., Thompson, G.L., Thrall, R.M.: The double description method. In: Kuhn, H.W., Tucker, A.W. (eds) Contributions to the Theory of Games, 2. Princeton University Press, Princeton, NJ (1953)
[Neu97] Neumaier, A.: Molecular modeling of proteins and mathematical predic-
tion of protein structure. SIAM Review, 39, 407-460 (1997)
[OBSCOO] Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations:
Concepts and Applications of Voronoi Diagrams (2nd edition). John Wi-
ley, Chichester (2000)
[Pij72] Pijavski, S.A.: An algorithm for finding the absolute extremum of a function. USSR Comput. Math. and Math. Phys., 12, 57-67 (1972)
[Pin96] Pinter, J.: Global Optimization in Action: Continuous and Lipschitz
Optimization-algorithms, implementations, and applications. Nonconvex
optimization and its applications, 6. Kluwer Academic Publishers, Dor-
drecht/Boston (1996)
[Roc70] Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton
(1970)
[RubOO] Rubinov, A.M.: Abstract Convexity and Global Optimization. Noncon-
vex optimization and its applications, 44. Kluwer Academic Publishers,
Dordrecht/Boston (2000)
[Ser03] Sergeyev, Y.D.: Finding the minimal root of an equation: applications
and algorithms based on Lipschitz condition. In Pinter, J. (ed) Global
Optimization - Selected Case Studies. Kluwer Academic Publishers (2003)
[Shu72] Shubert, B.: A sequential method seeking the global maximum of a func-
tion. SIAM J. Numer. Anal., 9, 379-388 (1972)
[Sib81] Sibson, R.: A brief description of natural neighbor interpolation. In: Barnett, V. (ed) Interpreting Multivariate Data, 21-36. John Wiley, Chichester (1981)
[SL97] Sio, K.C., Lee, C.K.: Estimation of the Lipschitz norm with neural net-
works. Neural Processing Letters, 6, 99-108 (1997)
[SSOO] Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-convex
Constraints: Sequential and Parallel Algorithms. Nonconvex optimization
and its applications, 45. Kluwer Academic, Dordrecht/London (2000)
[Wal74] Walker, A.J.: New fast method for generating discrete random numbers
with arbitrary frequency distributions. Electron. Lett., 10, 127-128 (1974)
[WZ96] Wood, G.R., Zhang, B.P.: Estimation of the Lipschitz constant of a func-
tion. J. Global Optim., 8, 91-103 (1996)
[ZKS02] Zabinsky, Z.B., Kristinsdottir, B.P., Smith, R.L.: Optimal estimation
of univariate black box Lipschitz functions with upper and lower error
bounds. Int. J. of Computers and Operations Research (2002)
Part II
1 Introduction
Concave minimization techniques play an important role in other fields of global optimization. Large classes of optimization problems can be transformed into equivalent concave minimization problems. Concave minimization can be applied in a large number of fields. For instance, many problems from such fields as economics, telecommunications, transportation, computer design and finance can be formulated as concave minimization problems. More applications of concave minimization can be found in [HT93, PR87]. Concave minimization problems are NP-hard, even in most special cases. For instance,
252 A. Chinchuluun et al.
[PS88] has shown that minimizing a concave quadratic function over a very simple polyhedron such as a hypercube is an NP-hard problem. More complete surveys of the complexity of these and other problems can be found in [Par93]. The general concave minimization problem can be written as follows:
    min f(x)
    s.t. x ∈ D,
Then
A Numerical Method for Concave Programming Problems 253
Clearly,
    ‖y − u‖ > 0    (3)
holds because u ∉ E_{f(z)}(f). Moreover, this y can be considered as a solution of the following convex minimization problem:
    min g(y) = (1/2)‖y − u‖²
    s.t. f(z) − f(y) ≤ 0.
The corresponding optimality conditions are
    ∇g(y) − λ∇f(y) = 0,
    λ(f(z) − f(y)) = 0,    (4)
    f(z) − f(y) ≤ 0,  λ ≥ 0.
If λ = 0, then we have ∇g(y) = y − u = 0, which contradicts (3). Thus, λ > 0 in (4). Then we obtain
    y − u − λ∇f(y) = 0,  λ > 0,
    f(y) = f(z).
From this we conclude that (u − y)ᵀ∇f(y) < 0, which contradicts (2). This last contradiction implies that the assumption that z is not a solution of Problem (1) must be false. This completes the proof. □
This theorem tells us that we need to find a pair x, y ∈ ℝⁿ with y ∈ E_{f(z)}(f) such that (x − y)ᵀ∇f(y) < 0 in order to conclude that the point z ∈ D is not a solution to Problem (1). The following example illustrates the use of this property.
Example 1.
    min f(x) = … / (1 − x₁ − x₂)
    s.t. 0.6 ≤ x₁ ≤ 7,
         0.6 ≤ x₂ ≤ 2.
We can easily show that f is a quasiconcave function over the constraint set. The gradient of the function is found as follows.
Note that the optimality condition (2) for Problem (5) requires checking the linear programming problem
    min (x − y)ᵀ∇f(y)
    s.t. x ∈ D
for every y ∈ E_{f(z)}(f). This is a hard problem. Thus, we need to find an appropriate approximation set so that one can check the optimality condition at a finite number of points.
The following lemmas show that finding a point on the level set of f(x) is theoretically possible.
Lemma 1. Let h ∈ ℝⁿ, let z ∈ D not be a global maximizer of f(x) over ℝⁿ, let x* be an optimal solution of the problem
    max f(x)
    s.t. x ∈ ℝⁿ,
and let the set of all optimal solutions of this problem be bounded. Then there exists a unique positive number α such that x* + αh ∈ E_{f(z)}(f).
Proof. We first prove that there exists a positive number α such that x* + αh ∈ E_{f(z)}(f). Suppose conversely that there is no number which satisfies the above condition; i.e., f(x* + αh) > f(z) holds for all α > 0. Note that hyp(f) = {(x, r) ∈ ℝⁿ⁺¹ : r ≤ f(x)} is a convex set since f is a concave function. For α > 0, we obtain (x* + αh, f(z)) ∈ hyp(f). Next we show that (h, 0) is a direction of hyp(f). Suppose conversely that there exist a vector y ∈ hyp(f) and a positive scalar β such that y + β(h, 0) ∈ ℝⁿ⁺¹ \ hyp(f). Since ℝⁿ⁺¹ \ hyp(f) is an open set, there exists a scalar β̄ that satisfies the following conditions:
    …
We can conclude that x* + αh is also a global maximizer of f for all α > 0 because x* is a global maximizer of f over ℝⁿ. This contradicts the assumption
in the lemma. Now we prove the uniqueness property. Assume that there are two positive scalars α₁ and α₂ such that x* + αᵢh ∈ E_{f(z)}(f), i = 1, 2. Without loss of generality, we can assume that 0 < α₁ < α₂. By concavity of f, we have
    f(x* + α₁h) = f((1 − α₁/α₂)x* + (α₁/α₂)(x* + α₂h))
                ≥ (1 − α₁/α₂)f(x*) + (α₁/α₂)f(z) > f(z).
Note that f(x*) > f(z) for any z ∈ D. By concavity of f, we have
    …
From the last inequality and the assumption hᵀ∇f(z) < 0, we can conclude that
    …
Since f is a concave function, for all α > 0, we have
    …
For α = −(f(x*) − f(z))/hᵀ∇f(z) > 0, we get
    0 < f(x* + αh) − f(z) ≤ 0.
This gives a contradiction. □
Example 2. Consider the quadratic concave minimization problem
    min f(x) = (1/2)xᵀCx + dᵀx
    s.t. x ∈ D,
where
    hᵀCh < 0
for all h ≠ 0. Let us solve the equation f(x* + αh) = f(z) with respect to α:
    f(x*) + αhᵀ(Cx* + d) + (1/2)α²hᵀCh = f(z),
or, since Cx* + d = ∇f(x*) = 0 at the unconstrained maximizer x*,
    α = √( 2(f(z) − f(x*)) / hᵀCh ).
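The closed form for the quadratic case can be checked numerically. The sketch below assumes the reconstructed objective f(x) = ½xᵀCx + dᵀx and that x* is the unconstrained maximizer (so that Cx* + d = 0); all names are illustrative.

```python
import math

def quadratic_alpha(C, d, x_star, h, fz):
    # f(x) = 0.5 * x^T C x + d^T x, concave (h^T C h < 0 for h != 0).
    # At the maximizer x*, Cx* + d = 0, so f(x* + a h) = f(z) reduces to
    # f(x*) + 0.5 a^2 h^T C h = f(z), giving the closed form below.
    def f(x):
        n = len(x)
        quad = sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))
        return 0.5 * quad + sum(d[i] * x[i] for i in range(n))
    n = len(h)
    hCh = sum(h[i] * C[i][j] * h[j] for i in range(n) for j in range(n))
    return math.sqrt(2.0 * (fz - f(x_star)) / hCh)
```

For f(x) = −‖x‖² (C = −2I, d = 0, x* = 0) and level value f(z) = −1, the root along h = (1, 0) is α = 1, since f(x* + αh) = −α².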
Constructing Points on the Level Set
As we have seen in Example 2, the number α can be found analytically in the quadratic case. In the general case this analytical formula is not always available, but Lemmas 1 and 2 give us an opportunity to find a point on the level set using numerical methods. For this purpose, let us introduce the following function of one variable on ℝ₊:
    φ(t) = f(x* + th) − f(z).    (8)
The above lemmas state that this function has a unique root in ℝ₊. Our goal is to find the root of this function, and we can use standard numerical methods for this problem, such as fixed-point iteration, Newton's method, bracketing methods and so on. For the bracketing methods, we can find initial guesses a and b such that φ(a) > 0 and φ(b) < 0 as follows:
1. Choose a step size p > 0.
2. Determine φ(qp), q = 1, 2, …, q₀.
3. Set a = (q₀ − 1)p and b = q₀p,
where q₀ is the smallest positive integer such that φ(q₀p) < 0. Moreover, the bisection method can be stated in the following form.
over the Bisection method can be stated in the following form.
1. Determine ip at the midpoint ^ ^ of the interval [a, 6].
2. If ip{^) > 0, then the root is in the interval [ ^ , 6 ] . Otherwise, the root
is in the interval [a, ^J^]. The length of the interval containing the root
is reduced by a factor of ^.
3. Repeat the procedure until a prescribed precision is attained.
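The bracketing and bisection steps above can be combined into one short routine for φ(t) = f(x* + th) − f(z); the default step size and tolerance are illustrative choices.

```python
def find_level_point(f, x_star, h, fz, p=0.5, tol=1e-10):
    # phi(t) = f(x* + t h) - f(z) has a unique positive root (Lemma 1).
    phi = lambda t: f([xs + t * hi for xs, hi in zip(x_star, h)]) - fz
    # Bracketing: step forward until phi stops being positive.
    q = 1
    while phi(q * p) > 0:
        q += 1
    a, b = (q - 1) * p, q * p
    # Bisection: halve the bracket until the prescribed precision.
    while b - a > tol:
        m = 0.5 * (a + b)
        if phi(m) > 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)
```

For f(x) = −x₁² with x* = 0 and level value f(z) = −4, the routine returns the root t = 2 of φ(t) = 4 − t².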
    (v^i)ᵀ∇f(y^i) = min_{x ∈ D} xᵀ∇f(y^i),    (12)
where the αᵢ's are positive numbers, the h^j's are orthogonal vectors such that h^j = −h^{n+j} for j = 1, …, n, and x* is a solution to the problem
    max f(x)
    s.t. x ∈ ℝⁿ.
    min xᵀ∇f(y^i)
    s.t. x ∈ D.
This contradicts the previous inequality. Thus 0 < α_j < 1. Now we are ready to prove the lemma.
Consider the point y^j = x* + α_j(u^j − x*) in B_m^k. From the concavity of f and the above observations, it follows that
Remark 2. If we use Selection (14), it is easy to see that the lemma is still true when α_j and ᾱ_j are approximate roots of the functions ψ₁(t) = f(x* + th^j) − f(z) and ψ₂(t) = f(x* + t(u^j − x*)) − f(z), respectively.
In analogy with θ_m for A_m^k, introduce θ̄_m for the set B_m^k as follows:
    θ_m = min_{i=1,…,m} (u^i − y^i)ᵀ∇f(y^i),
    θ̄_m = min_{i=1,…,m} (v^i − y^i)ᵀ∇f(y^i).
    min xᵀ∇f(y^i)
    s.t. x ∈ D
to obtain a solution u^i, i = 1, 2, …, 2n.
Step 6. If θ_{2n} < 0 then x^{k+1} := u^s, k := k + 1 and return to Step 2. Otherwise, z^k is an approximate global minimizer and terminate.
Proof. We show that if θ_{2n} < 0 holds for all k, then z^k converges to a global minimizer of Problem (5) in a finite number of steps. In fact, take a j ∈ {1, 2, …, 2n} such that y^j ∈ A^k_{2n} and u^j ∈ D satisfy
    …
Since u^j is a starting point for finding a local solution z^{k+1}, finally, it can be deduced that
    f(z^{k+1}) < f(z^k) for all k = 0, 1, 2, ….
By the assumption, the number of local minimizers z^k is finite, and this sequence reaches a global minimizer in a finite number of steps or stops at an approximate local solution. This completes the proof. □
where
Iteration 1.
An initial feasible solution is x¹ = (0, 0)ᵀ. Note that this vertex is a local solution to the problem. In this case, a local search method cannot improve the current approximate solution. The current best objective function value is 0. There does not exist a global maximizer of the function f(x) over ℝ². Thus, we consider a global maximizer of the function over the constraint set; therefore, it can be used for constructing an approximation set. The maximizer is x* = (2.555, 1.444)ᵀ. The trivial approximation set can be constructed easily by solving quadratic equations:
    y¹ = (7.432, 1.444)ᵀ,  y² = (2.555, 3.452)ᵀ,
    y³ = (0.345, 1.444)ᵀ,  y⁴ = (2.555, 0.102)ᵀ.
Solving linear programming problems, we find the following vectors.
Moreover, θ₄ = −0.563 and the initial feasible point for the next iteration is x² = (0.75, 2.0)ᵀ.
Iteration 2.
The new feasible solution is x² = (0.75, 2.0)ᵀ. The local search method cannot improve this solution. The current objective function value is −1.0625. The trivial approximation set for the level set E₋₁.₀₆₂₅(f) is
    y¹ = (7.579, 1.444)ᵀ,  y² = (2.555, 3.530)ᵀ,
    y³ = (0.199, 1.444)ᵀ,  y⁴ = (2.555, 0.025)ᵀ.
Solving linear programming problems, we have
    u¹ = (3.0, 0.5)ᵀ,  u² = (0.75, 2.0)ᵀ,
    u³ = (0.75, 2.0)ᵀ,  u⁴ = (1.0, 0.0)ᵀ,
which is the same as we found at Iteration 1. Therefore, θ₄ = 0.313 at the vertex u² = (0.75, 2.0)ᵀ. The algorithm terminates at this iteration and the approximate global solution is (0.75, 2.0)ᵀ. Note that this is a global solution to the problem.
    min xᵀ∇f(y^i)
    s.t. x ∈ D,
    i = 1, 2, …, 2n.
Step 6. If θ_{2n} < 0 then x^{k+1} := u^s, k := k + 1 and return to Step 2. Otherwise go to the next step.
Step 7. Construct a second-order approximation set B^k_{2n} at z^k by (15).
Step 8. For each y^i ∈ B^k_{2n}, i = 1, 2, …, 2n, solve the problem
    min xᵀ∇f(y^i)
    s.t. x ∈ D
where
    C = ( … ; −0.5  −4 ),  A = ( … ),  b = (20, 19, 3)ᵀ.
Since the constraint set D is a polytope, we can use the algorithm without a
local search method.
Iteration 1.
Let us choose x¹ = (1.0, 0.0)ᵀ as an initial feasible solution. The current objective function value is −2.0. The global maximizer of the function f(x) over ℝ² is x* = (0.0, 0.0)ᵀ. The trivial approximation set can be computed as we have seen in Example 2, and the vectors are
Iteration 2.
The current best feasible solution is x² = (4.0, 0.0)ᵀ and the objective function value is −32. The trivial approximation set for the level set E₋₃₂(f) is
Iteration 3.
The current approximate feasible solution is x³ = (3.25, 3.0)ᵀ and the objective function value at this point is −44. The trivial approximation set for the level set E₋₄₄(f) is
    x ∈ D
Step 10. If θ̄_{2n} < 0 then x^{k+1} := v^s, k := k + 1 and return to Step 2. Otherwise go to the next step.
Step 11. Construct an orthogonal approximation set C^k_{2n} at z^k by (16).
Step 12. For each y^i ∈ C^k_{2n}, i = 1, 2, …, 2n, solve the problem
    min xᵀ∇f(y^i)
    s.t. x ∈ D.
Step 13. Find the number s ∈ {1, 2, …, 2n} such that
    …
Step 14. If θ̂_{2n} < 0 then x^{k+1} := w^s, k := k + 1 and return to Step 2. Otherwise z^k is an approximate global minimizer and terminate the algorithm.
where
    A = ( … ),  b = (4, 90, 102, 121, 192, 270)ᵀ.
Since the constraint set D is a polytope, we can use the algorithm without a
local search method.
Iteration 1.
Iteration 2.
x² = (20.0, 2.0)ᵀ. The current objective function value is −404.0. In this case, the trivial approximation set cannot improve the current approximate solution, i.e., θ₄ = 4.01 at the vertex u¹ = (20.0, 2.0)ᵀ. Introducing the improved approximation set, we get θ̄₄ = −4.00 at the vertex v¹ = (19.5, 8.0)ᵀ.
Iteration 3.
The current objective function value is −444.25 at x³ = (19.5, 8.0)ᵀ. Constructing the trivial and the improved approximation sets cannot improve the current approximate solution, i.e., θ₄ = 45.41 at the vertex u¹ = (20.0, 2.0)ᵀ and θ̄₄ = 37.01 at the vertex v¹ = (19.5, 8.0)ᵀ. Next, we introduce the rotation matrix
    …
Using this rotation matrix, the following new orthogonal approximation set is found:
    y¹ = (15.508, −15.508)ᵀ,  y² = (15.508, 15.508)ᵀ,
    y³ = (−15.508, 15.508)ᵀ,  y⁴ = (−15.508, −15.508)ᵀ,
    w¹ = (20.0, 2.0)ᵀ,  w² = (15.0, 16.0)ᵀ,
    w³ = (0.0, 18.0)ᵀ,  w⁴ = (−2.0, −2.0)ᵀ.
Iteration 4.
The current approximate solution is x⁴ = (15.0, 16.0)ᵀ. The objective function value at this point is −481. We can check that the algorithm stops at this iteration. Thus, (15.0, 16.0)ᵀ is the approximate global optimal solution to the problem.
Note that this solution is the global optimal solution to the problem. The points x² = (20.0, 2.0)ᵀ and x³ = (19.5, 8.0)ᵀ which we found during the algorithm are local solutions to the problem.
5 Numerical Examples
In this section, we present four examples which are solved by the proposed algorithms.
Problem 1.
The global solutions to these problems are obtained by Algorithm 1 and the
computational results are shown in Table 1.
    min f(x) = … / (1 + a − (aᵀx)²) + ln(1 + a − (aᵀx)²)    (19)
    s.t. Ax ≤ b,
         xᵢ ≥ −1,  i = 1, 2, …, n.
Let v be the n-vector whose entries are all equal to −1. Then, whenever the linear programming problem max{aᵀx : Ax ≤ b, xᵢ ≥ −1, i = 1, 2, …, n} has an optimal solution w, the concave quadratic minimization problem has a global solution u ∈ {v, w}. Moreover, u is also a global solution of Problem (19) [Tho94]. This solution can be found using Algorithm 1 without a local search method for Problem (19), and the computational results are shown in Table 2.
Problem 4.
Algorithm 2 can be used for the above problem without a local search method.
Table 1 shows the computational results for the problem.
The numerical experiments were conducted using MATLAB 6.1 on a PC with an Intel Pentium 4 CPU 2.20 GHz processor and 512 MB of memory. The primal-dual interior-point method [Meh92] and the active set method [Dan55], which is a variation of the simplex method, were implemented by calling the subroutine linprog.m from MATLAB 6.1, depending on the size of the problem. For Problem (17), the subspace trust-region method [CL96] based on the interior-reflective Newton method and the active set method [GMW81], which is a projection method, were implemented as local search methods by calling the subroutine quadprog.m from MATLAB 6.1.
6 Conclusions
In this paper, we developed three algorithms for concave programming problems based on a global optimality condition. Under some conditions, the convergence of the algorithms has been established. For implementation purposes, three kinds of approximation sets are introduced and it is shown that some numerical methods are available to construct the approximation sets. At each iteration, it is required to solve 2n linear programming problems with the same constraints as the initial problem. Some existing test problems were solved by the proposed algorithms, and the computational results have shown that the algorithms are efficient and easy to use in computing a solution.
References
[Ber95] Bertsekas, D.P.: Nonlinear programming. Athena Scientific, Belmont,
Mass. (1995)
[CL96] Coleman, T.F., Li, Y.: A reflective Newton method for minimizing a
quadratic function subject to bounds on some of the variables. SIAM
Journal on Optimization, 6, 1040-1058 (1996)
[Dan55] Dantzig, G.B., Orden, A., Wolfe, P.: The generalized simplex method for minimizing a linear form under linear inequality constraints. Pacific Journal of Mathematics, 5, 183-195 (1955)
[Die94] Dietrich, H.: Global optimization conditions for certain nonconvex minimization problems. Journal of Global Optimization, 5, 359-370 (1994)
[Enk96] Enkhbat, R.: An algorithm for maximizing a convex function over a simple
set. Journal of Global Optimization, 8, 379-391 (1996)
[GMW81] Gill, P.E., Murray, W., Wright, M.H.: Practical Optimization. Academic
Press, London, UK (1981)
[Hir89] Hiriart-Urruty, J.-B.: From convex optimization to nonconvex optimization. In: Nonsmooth Optimization and Related Topics, 219-239. Plenum (1989)
[HT93] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches (sec-
ond edition). Springer Verlag, Heidelberg (1993)
¹ Department of Mathematics
Shanghai University
Shanghai 200444, P.R. China
xlsun@staff.shu.edu.cn
² College of Mathematics and Information Science
Guangxi University
Nanning, Guangxi 530004, P.R. China
ljl123@gxu.edu.cn
³ Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong, P.R. China
dli@se.cuhk.edu.hk
1 Introduction
Global optimization has been one of the important yet challenging research areas in optimization. It appears very difficult, if not impossible, to design an efficient method for finding global optimal solutions to general global optimization problems. Over the last four decades, much attention has been drawn to the investigation of specially structured global optimization problems. In particular, concave minimization problems have been studied extensively. Various algorithms including extreme point ranking methods, cutting plane methods and outer approximation methods have been developed for concave minimization problems (see e.g. [Ben96, HT93, RP87] and a bibliographical survey in [PR86]). Monotone optimization problems, as an important class of specially structured global optimization problems, have also been studied in recent years by many researchers (see e.g. [LSBG01, RTM01, SML01, Tuy00, TL00]).
The monotone optimization problem can be posed in the following form:
276 X. Sun et al.
where f and all gᵢ's are increasing functions on [l, u], with l = (l₁, l₂, …, lₙ)ᵀ and u = (u₁, u₂, …, uₙ)ᵀ. Note that the functions f and gᵢ's are not necessarily convex or separable. Due to the monotonicity of f and the gᵢ's, optimal solutions of (P) always lie on the boundary of the feasible region. It is easy to see that the problem of maximizing a decreasing function subject to decreasing constraints can be reduced to problem (P). Since there may exist multiple local optimal solutions on the boundary, problem (P) is a specially structured global optimization problem. In real-world applications, the monotonicity often arises naturally from certain inherent structure of the problem under consideration. For example, in resource allocation problems ([IK88]), the profit or return increases as the assigned amount of resource increases. In reliability networks, the overall reliability of the system and the weight, volume and cost increase as the reliability of the subsystems increases ([Tza80]). Partial or total monotone properties are also encountered in globally optimal design problems ([HJL89]).
The purpose of this survey paper is to summarize the recent progress on convexification methods for monotone optimization problems. In Section 2, we discuss convexification schemes for monotone functions. In Section 3 we first establish the equivalence between problem (P) and its convexified problem. An outer approximation method for the transformed convex maximization problem is then described. The polyblock outer approximation method is presented in Section 4. In Section 5, a hybrid method that combines partition, convexification and local search is described. Finally, concluding remarks with some suggestions for further studies are given in Section 6.
There are many specific mappings that satisfy condition (5). In particular, consider the following two functions:
Note that f(x) is a nonconvex and strictly increasing function. The plot of f(x) is shown in Figure 1. We have f(x) = (x − 2)³ + 2 and f″(x) = 6(x − 2) ≥ −6 for x ∈ [1, 3]. Take t(y) = (1/p) ln(1 − 1/y) in (2). By Corollary 1, p₁ = −(−6)/2 = 3. So any p ≥ 3 guarantees the convexity of f_t(y) on Y* = [−1/(e^p − 1), −1/(e^{3p} − 1)]. Figure 2 shows the convexified function f_t(y) with p = 3. In practice, p can be chosen much smaller than the bound defined in Corollary 1.
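The transformation pair can be probed numerically. Both forms below, y(x) = −1/(e^{px} − 1) and its inverse t(y) = (1/p) ln(1 − 1/y), are assumptions reconstructed from the garbled original; the helper checks midpoint convexity of a function on an interval by brute force.

```python
import math

def domain_transform(p):
    # Assumed forms: y(x) = -1/(exp(p*x) - 1) maps x > 0 into a negative
    # interval, and t(y) = (1/p)*ln(1 - 1/y) is its inverse.
    y_of = lambda x: -1.0 / (math.exp(p * x) - 1.0)
    t = lambda y: (1.0 / p) * math.log(1.0 - 1.0 / y)
    return y_of, t

def midpoint_convex(h, a, b, samples=20):
    # Crude numerical check of midpoint convexity of h on [a, b].
    pts = [a + i * (b - a) / samples for i in range(samples + 1)]
    return all(h(0.5 * (s + r)) <= 0.5 * (h(s) + h(r)) + 1e-9
               for s in pts for r in pts)
```

For p = 3 the pair satisfies t(y(x)) = x, and the inverse map t itself is convex on the negative interval [y(1), y(3)].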
(iii) t(y) = (t₁(y₁), …, tₙ(yₙ))ᵀ and the tⱼ, j = 1, …, n, are twice differentiable and strictly monotone convex functions satisfying
    …
Then f_t(y) defined in (2) is a convex function on any convex subset of Y*.
Note that convex functions, C¹ functions, and pointwise maxima or minima of C¹ functions are semismooth. Furthermore, certain composite semismooth functions are also semismooth (see [Muf77]).
The idea of convexifying a nonconvex function via both domain transformation and range transformation can be traced back to the 1950s. Convex (or concave) transformable functions were introduced in [Fen51]. Let f be defined on a convex subset C ⊆ ℝⁿ. f is said to be convex range transformable or F-convex if there exists a continuous strictly increasing function F such
The following theorem establishes the equivalence between the monotone optimization problem (P) and the transformed problem (15).
Theorem 3. ([SML01])
(i) y* ∈ Y* is a global optimal solution to problem (15) if and only if x* = t(y*) is a global optimal solution to problem (P).
(ii) If t⁻¹ exists and both t and t⁻¹ are continuous mappings, then y* ∈ Y* is a local optimal solution to problem (15) if and only if x* = t(y*) is a local optimal solution to problem (P).
It is clear that the global optimal solution x* is not on the boundary of the convex hull of the nonconvex feasible region S. Take t to be the convexification transformation (6) with p = 2. The convexified feasible region is shown in Figure 4. Set ε = 10⁻⁴. The outer approximation procedure finds an approximate global optimal solution y* = (−0.21642, −0.19934) of (15) after 17 iterations, generating 36 vertices. The point y* corresponds to x* = (3.45290, 3.58899), an approximate optimal solution to Example 1 with f(x*) = 3.857736887.
(Figure 3: the nonconvex feasible region S, with the global solution x* and the local solutions x_loc.)
(Figure 4: the convexified feasible region and the approximate global solution y*.)
Step 1. Compute
V_{k+1} = (V_k \ {z^k}) ∪ {z^{k,1}, …, z^{k,n}}.
Let V̄_{k+1} be the set of the remaining vertices after removing all improper
vertices in V_{k+1}.
Step 4. Set k := k + 1, return to Step 1.
(Figure: the vertices z⁰, z¹, …, z^k generated by the outer approximation procedure.)
5 A hybrid method
dral vertices can be limited and controlled. Moreover, as the domain shrinks
during the branch-and-bound process, the convexity can be achieved with a
smaller parameter, thus avoiding ill-conditioning in the transformed
subproblems.
Lemma 1. ([SL04]) Let a < β. Denote A = [a, β], B = [a, γ) and C = (γ, β].
Then A \ (B ∪ C) can be partitioned into 2n − 2 subboxes.
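For n = 2 the lemma gives 2n − 2 = 2 subboxes. The sketch below (with illustrative corner points, not taken from [SL04]) removes B = [a, γ) and C = (γ, β] from A = [a, β] and checks that the two remaining boxes [γ₁, β₁] × [a₂, γ₂] and [a₁, γ₁] × [γ₂, β₂] account for exactly the leftover volume.

```python
# n = 2 instance of the partition in Lemma 1 (corner points chosen for illustration)
a, beta, gamma = (0.0, 0.0), (4.0, 4.0), (1.0, 3.0)

def vol(lo, hi):
    # area of the box [lo, hi] in the plane
    return (hi[0] - lo[0]) * (hi[1] - lo[1])

# the two subboxes partitioning A \ (B ∪ C), up to a set of measure zero
box1 = ((gamma[0], a[1]), (beta[0], gamma[1]))  # [γ1, β1] x [a2, γ2]
box2 = ((a[0], gamma[1]), (gamma[0], beta[1]))  # [a1, γ1] x [γ2, β2]

leftover = vol(a, beta) - vol(a, gamma) - vol(gamma, beta)
print(leftover, vol(*box1) + vol(*box2))  # the two volumes agree
```

This is the step that keeps the number of polyhedral vertices under control in the branch-and-bound scheme: each box that survives the partition is processed as a separate, smaller subproblem.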
6 Conclusions
We have summarized in this paper basic ideas and results on convexification
methods for monotone optimization. Applying convexification to a monotone
optimization problem results in a concave minimization problem that can be
solved by the polyhedral outer approximation method. The polyblock approx-
imation method can also be viewed as an outer approximation method where
Convexification and Monotone Optimization 289
polyblocks are used to approximate the feasible region and upper bounds are
computed by ranking the extreme points of the polyblock. Integrating the
promising features of convexification schemes and the polyblock approxima-
tion method, the newly proposed branch-and-bound framework that combines
partition, convexification and local search is promising from the computational
point of view.
Among the many interesting topics for future research, we mention the
following three areas:
(i) D.I. functions (differences of two increasing functions) constitute a large
class of nonconvex functions in global optimization (e.g., polynomials). Direct application of the convexification method to problems with D.I. functions
gives rise to a D.C. (difference of convex functions) optimization
problem ([HT99]). It is of great interest to study efficient convexification
methods for different types of D.I. programming problems and to develop efficient global optimization methods for the transformed D.C. problems.
(ii) Many real-world optimization models may only have partial monotonic-
ity. For example, the function is monotone with respect to some variables and
nonmonotone with respect to other variables, or the function is a sum of a
monotone function and a nonmonotone function. In global optimal design
problems ([HJL89]), partial monotonicity properties are often inherent in ob-
jective and constraint functions. How to exploit the partial monotonicity by
certain convexification scheme is an interesting topic for future study.
(iii) Many computational issues of the outer approximation method still
need to be further investigated. The major computation burden in the outer
approximation method is the computation and storage of the vertices of the
polyhedron containing the feasible region. Vertex elimination technique could
be a possible remedy for preventing a rapid increase of the number of vertices
of the outer approximation polyhedron.
7 Acknowledgement
This research was supported by the National Natural Science Foundation of
China under Grants 10271073 and 10261001, and the Research Grants Council
of Hong Kong under Grant CUHK 4214/OlE.
References
[Ben77] Ben-Tal, A.: On generalized means and generalized convexity. Journal of
Optimization Theory and Applications, 21, 1-13 (1977)
[Ben96] Benson, H.P.: Deterministic algorithm for constrained concave minimiza-
tion: A unified critical survey. Naval Research Logistics, 43, 765-795
(1996)
[Cha85] Chaney, R.W.: On second derivatives for nonsmooth functions. Nonlinear
Analysis: Theory, Methods and Applications, 9, 1189-1209 (1985)
[CHJ91] Chen, P.C., Hansen, P., Jaumard, B.: On-line and off-line vertex enumeration by adjacency lists. Operations Research Letters, 10, 403-409 (1991)
[Fen51] Fenchel, W.: Convex cones, sets and functions, mimeographed lecture
notes. Technical report, Princeton University, NJ, 1951
[HJL89] Hansen, P., Jaumard, B., Lu, S.H.: Some further results on monotonicity
in globally optimal design. Journal of Mechanisms, Transmissions, and
Automation Design, 111, 345-352 (1989)
[HofSl] Hoffman, K.L.A.: A method for globally minimizing concave functions
over convex set. Mathematical Programming, 20, 22-32 (1981)
[Hor84] Horst, R.: On the convexification of nonlinear programming problems: An
applications-oriented survey. European Journal of Operational Research,
15, 382-392, (1984)
[HT99] Horst, R., Thoai, N.V.: D.C. programming: Overview. Journal of Optimization Theory and Applications, 103, 1-43 (1999)
[HT93] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches.
Springer-Verlag, Heidelberg (1993)
[HV88] Horst, R., Vries, J.D.: On finding new vertices and redundant constraints
in cutting plane algorithms for global optimization. Operations Research
Letters, 7, 85-90, (1988)
[IK88] Ibaraki, T., Katoh, N.: Resource Allocation Problems: Algorithmic Ap-
proaches. MIT Press, Cambridge, Mass. (1988)
[Li95] Li, D.: Zero duality gap for a class of nonconvex optimization problems.
Journal of Optimization Theory and Applications, 85, 309-324 (1995)
[Li96] Li, D.: Convexification of noninferior frontier. Journal of Optimization
Theory and Applications, 88, 177-196 (1996)
[LSOl] Li, D., Sun X.L.: Convexification and existence of saddle point in a p-th
power reformulation for nonconvex constrained optimization. Journal of
Nonlinear Analysis: Theory and Methods (Series A), 47, 5611-5622 (2001)
[LSBGOl] Li, D., Sun, X.L., Biswal, M.P., Gao, F.: Convexification, concavifica-
tion and monotonization in global optimization. Annals of Operations
Research, 105, 213-226 (2001)
[LSM05] Li, D., Sun, X.L., McKinnon, K.: An exact solution method for reliability
optimization in complex systems. Annals of Operations Research, 133,
129-148 (2005)
[LWLYZ05] Li, D., Wu, Z.Y., Lee, H.W.J., Yang, X.M., Zhang, L.S.: Hidden convex
minimization. Journal of Global Optimization, 31, 211-233 (2005)
[Muf77] Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization, 15, 959-972 (1977)
[PR86] Pardalos, P.M., Rosen, J.B.: Methods for global concave minimization: a
bibliographic survey. SIAM Review, 28, 367-379 (1986)
[RP87] Rosen, J.B., Pardalos, P.M.: Constrained Global Optimization: Algorithms and Applications. Springer-Verlag (1987)
[RTMOl] Rubinov, A., Tuy, H., Mays, H.: An algorithm for monotonic global opti-
mization problems. Optimization, 49, 205-221 (2001)
[SL04] Sun, X.L., Li, J.L.: A new branch-and-bound method for monotone opti-
mization problems. Technical report. Department of Mathematics, Shang-
hai University (2004)
[SLL04] Sun, X.L., Luo, H.Z., Li, D.: Convexification of nonsmooth monotone func-
tions. Technical report. Department of Mathematics, Shanghai University,
(2004)
[SMLOl] Sun, X.L., McKinnon, K.I.M., Li, D.: A convexification method for a class
of global optimization problems with applications to reliability optimiza-
tion. Journal of Global Optimization, 2 1 , 185-199 (2001)
[TuyOO] Tuy, H.: Monotonic optimization: problems and solution approaches.
SIAM Journal on Optimization, 11, 464-494 (2000)
[TLOO] Tuy, H., Luc, L.T.: A new approach to optimization under monotonic
constraint. Journal of Global Optimization, 18, 1-15 (2000)
[TzaSO] Tzafestas, S.G.: Optimization of system reliability: A survey of problems
and techniques. International Journal of Systems Science, 11, 455-486
(1980)
[WBZ05] Wu, Z.Y., Bai, F.S., Zhang, L.S.: Convexification and concavification for
a general class of global optimization problems. Journal of Global Opti-
mization, 3 1 , 45-60 (2005)
Generalized Lagrange Multipliers for
Nonconvex Directionally Differentiable
Programs
^ Department of Mathematics-Informatics
Ho Chi Minh City University of Pedagogy
280 An Duong Vuong St., District 5, HCM city, Vietnam
ndinh@hcmup.edu.vn
^ Division of Mathematical Sciences
Pukyong National University
599-1, Daeyeon-3Dong, Nam-Gu, Pusan 608-737, Korea
gmlee@pknu.ac.kr
^ Ninh Thuan College of Pedagogy
Ninh Thuan, Vietnam
latucin02@yahoo.com
min f(x)
subject to x ∈ C, g_i(x) ≤ 0, i = 1, 2, …, m.
It is worth noting that the conclusion of Lemma 1 still holds without the
assumption on the lower semicontinuity of g′(x₀, ·) if X is finite dimensional.
This was established in [War91, Lemma 2.6].
In this section we are concerned with the problem (P) where f, g_i : X → ℝ ∪ {+∞},
i ∈ I := {1, 2, …, m}. Let S be the feasible set of (P), that is, S := C ∩ {x ∈
X | g_i(x) ≤ 0, i = 1, 2, …, m}. Let also x₀ ∈ S and I(x₀) := {i ∈ I | g_i(x₀) =
0}. We assume in the following that all the functions f and g_i, i ∈ I, are
directionally differentiable at x₀. It is not assumed that the functions f′(x₀, ·)
and g_i′(x₀, ·), i ∈ I(x₀), are convex.
We begin with a necessary condition of Fritz John type whose proof is
quite elementary. Note that no extra assumptions are needed here beyond the
directional differentiability of f, g_i, and the continuity of g_i (at x₀) for
i ∉ I(x₀). The same condition (for feasible directions from x₀ and
X = ℝⁿ) was recently proved in [CraOO].
λ₀ f′(x₀, r) + Σ_{i∈I(x₀)} λ_i g_i′(x₀, r) ≥ 0
Definition 4. [MW90] The Problem (P) is called (CQ2) regular at x₀ if there
exists x̄ ∈ cone(C − x₀) such that
It suffices to prove that for each r ∈ cone(C − x₀), λ₀(r) ≠ 0. Assume on the
contrary that there is r̄ ∈ cone(C − x₀) with λ₀(r̄) = 0. We will prove that in
this case it is possible to replace the multiplier λ(r̄) by some other λ̄(r̄) with
λ̄₀(r̄) ≠ 0 such that (6) holds at r = r̄ with λ̄(r̄) instead of λ(r̄).
Since x₀ is a local minimizer of (P), the following system of variable ξ ∈ X
is inconsistent:
(i) Suppose that (c) holds, i.e., (P) is (CQ2) regular and g_i′(x₀, ·) is l.s.c.
for all i ∈ I(x₀). Let φ(·), φ_i(·) be upper approximates of f and g_i, i ∈ I(x₀),
at x₀ in the direction r̄, respectively. By Lemma 1 there exist h, h_i, upper
approximates of f, g_i, i ∈ I(x₀) at x₀ (respectively), satisfying for all x ∈ X,
Therefore, if λ₀ = 0 then
Σ_{i∈I(x₀)} λ_i h_i(x) ≥ 0, ∀x ∈ cone(C − x₀). (9)
On the other hand, since h_i ≤ g_i′(x₀, ·), the (CQ2) regularity gives
Σ_{i∈I(x₀)} λ_i h_i(x̄) ≤ Σ_{i∈I(x₀)} λ_i g_i′(x₀, x̄) < 0,
which contradicts (9). Hence, λ₀ ≠ 0 (and we can take λ₀ = 1). With x = r̄,
(8) gives
h(r̄) + Σ_{i∈I(x₀)} λ_i h_i(r̄) ≥ 0.
Since h(r̄) ≤ φ(r̄) = f′(x₀, r̄), h_i(r̄) ≤ φ_i(r̄) = g_i′(x₀, r̄), and λ_i ≥ 0 for all
i ∈ I(x₀), we arrive at
f′(x₀, r̄) + Σ_{i∈I(x₀)} λ_i g_i′(x₀, r̄) ≥ 0.
Take λ̄_i(r̄) = λ_i for i ∈ I(x₀), λ̄_i(r̄) = 0 for all i ∉ I(x₀), and λ̄(r̄) =
(λ̄_i(r̄))_{i∈I}. It is obvious that λ̄(r̄) satisfies the condition (DKT) at r = r̄. The
proof is complete in this case.
(ii) The proof for the case where (b) holds is the same as in the previous case, using Lemma 2.6 in [War91] instead of Lemma 1 (see the remark
following Lemma 1).
(iii) The proof for the case where (a) holds is quite similar to that of (c).
Take φ, φ_i to be the upper approximates of f and g_i, i ∈ I(x₀) (respectively)
at x₀ in the direction r̄ that exist by (CQ1). The inconsistency of (7) implies the
inconsistency of the following system:
x ∈ cone(C − x₀), φ(x) < 0, φ_i(x) < 0, i ∈ I(x₀).
Then we get (8) with h replaced by φ and h_i replaced by φ_i, i ∈ I(x₀).
If λ₀ = 0 then
Σ_{i∈I(x₀)} λ_i φ_i(x) ≥ 0, ∀x ∈ cone(C − x₀).
This is impossible since by (CQ1), φ_i(x̄) < 0 for all i ∈ I(x₀) and λ_i ≥ 0
(i ∈ I(x₀)), not all zero. Hence λ₀ ≠ 0. The rest is the same as in (i). The
proof is complete. □
Generalized Lagrange Multipliers 301
The relations between (CQ1), (CQ2) and the other regularity conditions,
as well as the relation between (DKT) and some other Kuhn-Tucker conditions,
will be discussed at the end of Section 3 in the context of quasidifferentiable
programs.
f′(x₀, x) ≤ φ(x), ∀x ∈ X,
g_i′(x₀, x) ≤ φ_i(x), ∀x ∈ X, ∀i ∈ I(x₀).
The Problem (P) is called invex at x₀ on S with respect to φ(·), φ_i(·), i ∈ I(x₀),
if there exists a function η : S → cone(C − x₀) such that the following holds:
f(x) − f(x₀) ≥ φ(η(x)), ∀x ∈ S,
g_i(x) − g_i(x₀) ≥ φ_i(η(x)), ∀x ∈ S, ∀i ∈ I(x₀).
If (P) is invex (at x₀ on S) with respect to f′(x₀, ·), g_i′(x₀, ·), i ∈ I(x₀), then
it is called simply invex (the most important case).
Note that if f, g_i are differentiable at x₀ then the invexity of (P) (with respect
to f′(x₀, ·), g_i′(x₀, ·), i ∈ I(x₀)) is exactly the one which appeared in [Han81,
Cra81]. In Definition 5, if in addition f, g_i are locally Lipschitz at x₀ and
if we take φ(·) = f°(x₀, ·), φ_i(·) = g_i°(x₀, ·), i ∈ I(x₀), then we come back
to the definition of invexity appearing in [YS93, BRS83]. This also relates to
the cone-invexity for locally Lipschitz functions, which was defined in [Cra86].
The following result was established in [CraOO] concerning feasible directions
and for X = ℝⁿ. Its proof is almost the same as in [CraOO, DT03] and so it
will be omitted.
Corollary 1. For the problem (P), let x₀ be a feasible point and let φ, φ_i
be upper approximates of f, g_i, i ∈ I, at x₀, respectively. Suppose that g_i is
continuous at x₀ for all i ∉ I(x₀).
(i) If x₀ is a local minimizer of (P) then there exist λ₀ ≥ 0, λ_i ≥ 0, i ∈ I,
not all zero, such that
Moreover, if there exists x̄ ∈ S such that φ_i(x̄) < 0 for all i ∈ I(x₀) then λ₀ ≠
0 (and hence, one can take λ₀ = 1). That is, there exists λ = (λ_i)_{i∈I} ∈ ℝ₊^m
such that
(ii) Conversely, if x₀ satisfies (10) (for some upper approximates φ, φ_i of
f, g_i on cone(C − x₀), respectively, and some λ ∈ ℝ₊^m) and if (P) is invex
at x₀ on S := C ∩ {x ∈ X | g_i(x) ≤ 0, i = 1, 2, …, m} with respect to φ, φ_i,
i ∈ I(x₀), then x₀ is a global minimizer of (P).
in the previous subsection (Theorem 2). However, for smooth problems (i.e.,
f, g_i are differentiable), or convex, or locally Lipschitz problems, condition
(10) collapses to the standard optimality conditions. For instance, if f and g_i
are convex then f′(x₀, ·), g_i′(x₀, ·), i ∈ I(x₀), are convex and hence, by taking
φ(·) = f′(x₀, ·), φ_i(·) = g_i′(x₀, ·), i ∈ I(x₀), (10) is none other than
f′(x₀, x) + Σ_{i∈I(x₀)} λ_i g_i′(x₀, x) ≥ 0, ∀x ∈ cone(C − x₀)
(provided that there is x̄ ∈ cone(C − x₀) satisfying g_i′(x₀, x̄) < 0 for all
i ∈ I(x₀)). Note also that by the separation theorem, this inequality is equivalent
to
0 ∈ ∂f(x₀) + Σ_{i∈I(x₀)} λ_i ∂g_i(x₀) + N_C(x₀)
where N_C(x₀) stands for the normal cone of C at x₀ in the sense of convex
analysis.
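For a smooth convex instance this inclusion can be verified directly. The sketch below uses a hypothetical one-dimensional problem (not from the text): min f(x) = x² subject to g(x) = 1 − x ≤ 0 with C = ℝ, so N_C(x₀) = {0}; at x₀ = 1 the multiplier λ = 2 makes the stationarity sum vanish.

```python
# verify 0 ∈ ∂f(x0) + λ ∂g(x0) + N_C(x0) for a smooth convex toy problem
def fprime(x):
    # f(x) = x^2
    return 2.0 * x

def gprime(x):
    # g(x) = 1 - x, active at x0 = 1
    return -1.0

x0, lam = 1.0, 2.0
stationarity = fprime(x0) + lam * gprime(x0)  # N_C(x0) = {0} since C = R
print(stationarity)       # 0.0: the inclusion holds
print(lam * (1.0 - x0))   # 0.0: complementarity λ g(x0) = 0
```

In the nonsmooth convex case the same check runs over subgradient selections rather than single gradients, which is exactly what the inclusion above expresses.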
Example 1. Consider the following problem (P1)
min f(x)
subject to g(x) ≤ 0, x = (x₁, x₂) ∈ C
where
C := co{(0, 0), (−1, −1), (−1, 1)}
and the functions f, g : ℝ² → ℝ are defined by
Let D be a subset of X. Let Φ = (φ₁, φ₂, …, φ_m) : D → ℝ^m. Recall that
the map Φ is called convexlike (subconvexlike, resp.) if Φ(D) + ℝ₊^m is convex
(Φ(D) + int ℝ₊^m is convex, resp.). It is called generalized subconvexlike if
cone Φ(D) + int ℝ₊^m is convex (see [HK82, Jey85, Sac02]).
It is well known that Gordan's alternative theorem still holds with convexlike, subconvexlike, or generalized subconvexlike functions instead of convex
ones (see [Jey85, Sac02] for more extensions). Namely, if Φ = (φ₁, φ₂, …, φ_m) :
D → ℝ^m is generalized subconvexlike (convexlike, subconvexlike) on D then
exactly one of the following assertions holds:
(i) ∃x ∈ D such that φ_i(x) < 0, i = 1, 2, …, m;
(ii) ∃λ = (λ₁, λ₂, …, λ_m) ∈ ℝ₊^m, λ ≠ 0, such that Σ_{i=1}^m λ_i φ_i(x) ≥ 0, ∀x ∈
D.
The existence of λ₀ ≥ 0, λ_i ≥ 0, i ∈ I(x₀), not all zero, satisfying the
conclusion of the theorem now follows from Gordan's theorem for generalized
subconvexlike systems (setting λ_i = 0 for i ∉ I(x₀)). The rest is obvious. □
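For a linear map Φ(x) = Ax (which is convexlike) the alternative can be seen concretely. In the sketch below (an illustrative matrix A, not from the text) no x makes all components of Ax negative, and the certificate λ = (1, 1, 1) satisfies Aᵀλ = 0, so λᵀΦ(x) = 0 ≥ 0 for every x, i.e., assertion (ii) holds.

```python
import random

# Gordan's alternative for Φ(x) = A x with rows (1,0), (0,1), (-1,-1):
# A x < 0 would need x1 < 0, x2 < 0 and x1 + x2 > 0, which is impossible,
# so assertion (ii) must hold with some λ ≥ 0, λ ≠ 0.
A = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
lam = (1.0, 1.0, 1.0)  # candidate multiplier, λ ≥ 0 and λ ≠ 0

# A^T λ = 0 means λ^T (A x) = 0 ≥ 0 for every x
At_lam = tuple(sum(lam[i] * A[i][j] for i in range(3)) for j in range(2))
print(At_lam)  # (0.0, 0.0)

# spot-check that assertion (i) fails on a sample of directions
random.seed(0)
found = any(all(A[i][0] * x1 + A[i][1] * x2 < 0 for i in range(3))
            for x1, x2 in [(random.uniform(-1, 1), random.uniform(-1, 1))
                           for _ in range(1000)])
print(found)  # False: no sampled x gives A x < 0
```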
Let x₀ be a feasible point of (P) and let D be the set of all feasible directions
of (P) from x₀. Set
M := {(f′(x₀, d), (g_i′(x₀, d))_{i∈I(x₀)}) | d ∈ D}.
We now apply Corollary 1 to derive an optimality condition (with constant
Lagrange multipliers) for (P), which was established recently in [CraOO].
f′(x₀, d) + Σ_{i∈I(x₀)} λ_i g_i′(x₀, d) ≥ 0.
Proof. By the definition of the upper Dini derivative, there exists (λₙ) ⊂ ℝ₊,
λₙ → 0, such that
Ψ(d) := max_{v ∈ ∂f(F(a))} ⟨v, F′(a)d⟩;
then Ψ : X → ℝ is a l.s.c. sublinear function (finite valued). Moreover,
(f ∘ F)⁺(a, d) ≤ Ψ(d) for all d ∈ X.
This means that Ψ is an upper approximate of f ∘ F at a (see the remark that
follows Corollary 1).
We are now in a position to give a necessary condition for optimality for
(CP).
λ_i f_i(F_i(x₀)) = 0, ∀i ∈ I,
where F_i′(x₀)* is the adjoint operator of F_i′(x₀).
Proof. We first notice that if x₀ is a solution of (CP) then the following system
has no solution d ∈ X:
Let Ψ_i(d) := max_{v_i ∈ ∂f_i(F_i(x₀))} ⟨v_i, F_i′(x₀)d⟩. Then (f_i ∘ F_i)⁺(x₀, d) ≤ Ψ_i(d) for all
d ∈ cone(C − x₀). It follows from Corollary 1 that there exist λ₀, λ_i ≥ 0, i ∈
I(x₀), not all zero, such that
This is equivalent to
0 ∈ λ₀ ∂Ψ₀(0) + Σ_{i∈I(x₀)} λ_i ∂Ψ_i(0) + N_C(x₀). (15)
∂Ψ_i(0) = F_i′(x₀)*[∂f_i(F_i(x₀))].
It follows from the last equality and (15) that there exist v_i ∈ ∂f_i(F_i(x₀)), i ∈
I ∪ {0}, such that
308 N. Dinh et al.
Theorem 6. Assume that all the conditions in Theorem 5 hold. Assume further that the regularity condition holds: there is d₀ ∈ cone(C − x₀) satisfying
Ψ_i(d₀) < 0 for all i ∈ I(x₀). If x₀ is a solution of (CP) then there exist
λ_i ≥ 0, i ∈ I, and v_i ∈ ∂f_i(F_i(x₀)), i ∈ I₀, such that
[F₀′(x₀)*v₀ + Σ_{i=1}^m λ_i F_i′(x₀)*v_i](x − x₀) ≥ 0, ∀x ∈ C,
λ_i f_i(F_i(x₀)) = 0, ∀i ∈ I.
Proof. The proof is the same as that of Theorem 5. Note that if the regularity
condition in the statement of the theorem holds then λ₀ ≠ 0 in (14). □
It is worth noting that the same conditions as in Theorems 5-6 were established in [Jey91] under the additional assumption that the maps F_i, i ∈ I₀,
are locally Lipschitz.
The following example illustrates the significance of Theorems 5, 6.
Let
Ψ^ξ(x) := max_{v ∈ ∂f(x₀)} ⟨x, v⟩ + ⟨x, v̄⟩. (18)
It is easy to see that Ψ^ξ(·) is sublinear, l.s.c., f′(x₀, x) ≤ Ψ^ξ(x) for all x ∈ X,
and f′(x₀, ξ) = Ψ^ξ(ξ), which proves Ψ^ξ(·) to be an upper approximate of f in
the direction ξ. □
Consider the problem (P) defined in Section 1. Let S be the feasible set of
(P) and XQ e S. We are now ready to get necessary and sufficient optimality
conditions for (P).
Theorem 7. (Necessary condition) For the problem (P), assume that f, g_i,
i ∈ I = {1, 2, …, m}, are quasidifferentiable at x₀ and g_i is continuous at
x₀ for all i ∉ I(x₀). If x₀ is a minimizer of (P) then
Proof. It follows from Lemma 3 that the functions f and g_i possess upper
approximates at x₀ in any direction ξ ∈ X. The conclusion now follows from
Theorem 2. □
Theorem 8. (Sufficient condition) For the problem (P), assume that f, g_i,
i ∈ I = {1, 2, …, m}, are quasidifferentiable at x₀ and g_i is continuous at x₀
for all i ∉ I(x₀). Assume further that x₀ is a directional Kuhn-Tucker point
of (P). If (P) is invex at x₀ on the feasible set S then x₀ is a global solution
of (P).
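As a toy illustration (a hypothetical instance, not one of the paper's examples), take f(x) = |x|, which is quasidifferentiable with f′(0, d) = |d|, and g(x) = x − 1, inactive at x₀ = 0. The directional Kuhn-Tucker condition at x₀ then reduces to f′(0, d) ≥ 0 for every direction d, which the sketch checks on a grid; since f is convex, (P) is invex at x₀, and x₀ is a global solution.

```python
def fdir(d):
    # f(x) = |x|: the directional derivative at x0 = 0 is |d|
    return abs(d)

# g(x) = x - 1 is inactive at x0 = 0 (g(0) = -1 < 0), so λ(d) = 0 works and
# the directional Kuhn-Tucker condition reduces to f'(0, d) >= 0 for all d
directions = [k / 10.0 for k in range(-50, 51)]
ok = all(fdir(d) >= 0.0 for d in directions)
print(ok)  # True: x0 = 0 satisfies the directional Kuhn-Tucker condition
```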
It should be noted that both the necessary and sufficient optimality conditions for (P) established in Theorems 7, 8 do not depend on any specific
choice of quasidifferentials of f and g_i, i ∈ I(x₀).
The regularity conditions are of special interest in quasidifferentiable optimization. The above (CQ2) condition was introduced in [MW90]. It is much
preferred since it does not depend on any specific choice of the quasidifferentials
(see [LRW91]). In order to relate our results to
others, we take a quick look at some other regularity conditions that appeared in the literature; for the sake of simplicity we consider the case
where C = X.
(CQ3) For every choice of w_i ∈ ∂̄g_i(x₀), i ∈ I(x₀): 0 ∉ co ⋃_{i∈I(x₀)} (∂g_i(x₀) + w_i).
The (CQ3) condition was used in [SacOO] and [LRW91] while (RC) was
introduced in [War91], both for the case where X = ℝⁿ.
It was proved in [DT03] that in the finite dimensional case (CQ3) is equivalent to (RC). By Lemma 3, it is clear that (RC) implies (CQ1).
On the other hand, it was proved in [LRW91] that (RC) implies (CQ2)
when X = ℝⁿ. However, the proof (given in [LRW91]) goes through without
any change for the case where X is a real Banach space. Briefly, the following
scheme holds for quasidifferentiable problems:
(CQ3) ⇔ (RC) ⇒ (CQ1), (CQ2).
The conclusion in Theorem 7 (also Theorem 8) was established in [DT03]
for the quasidifferentiable Problem (P) when C = X = ℝⁿ and under (CQ3)
(or, the same, (RC)).
Due to the previous observation, Theorem 7 still holds if (RC) is assumed
instead of (i) or (ii).
As mentioned above, the quasidifferentiable problems with inequality con-
straints of the form (P) have been studied by many authors. Various types of
Kuhn-Tucker conditions were proposed to be necessary optimality conditions
for (P) (under various assumptions and regularity conditions). A typical such
condition is as follows:
point of (P). This conclusion still holds (without any change in the proof)
when X is a Banach space.
The following example shows that the two notions of Kuhn-Tucker
point and directional Kuhn-Tucker point do not coincide, and that even
for a simple nonconvex problem the generalized Lagrange multiplier cannot be
chosen to be a constant function. It also shows that one can use the directional
Kuhn-Tucker condition to search for a minimizer.
min f(x)
subject to g(x) ≤ 0, x = (x₁, x₂) ∈ C ⊂ ℝ²,
f(x) := x₂,
g(x) := x₁ + (x₁² + x₂²)^{1/2} − x₂, if x₂ ≥ 0,
g(x) := x₁ + (x₁² + x₂²)^{1/2}, if x₂ < 0.
Let x₀ = (0, 0) ∈ ℝ².
(a) Consider first the case where C := co{(0, 0), (0, 1), (1, −1)}.
(i) It is clear that S = C ∩ {x ∈ ℝ² | g(x) ≤ 0} = co{(0, 0), (0, 1)} ⊂
cone(C − x₀) = cone C, where S is the feasible set of (P3). It is also easy
to check that x₀ is a directional Kuhn-Tucker point of (P3). The generalized
Lagrange multiplier λ : cone C → ℝ₊ can be chosen as follows (r = (r₁, r₂) ∈
cone C):
f′(x₀, r) + λ(r) g′(x₀, r) ≥ 0. (21)
On the other hand, since f′(x₀, r) = r₂, g′(x₀, r) = g(r), f(x₀) = g(x₀) =
0, it is easy to see that (P3) is invex at x₀ with η : S → cone C, η(x) = x.
Consequently, x₀ is a minimizer of (P3) due to Theorem 3.
(ii) For (P3), the generalized Lagrange multiplier λ : cone C → ℝ₊ cannot be chosen to be a constant function. In fact, (21) is equivalent to
This shows that for r = (r₁, r₂) ∈ cone C with r₂ < 0 (then g(r) > 0), λ(r)
satisfies (22) if and only if λ(r) ∈ [−r₂/g(r), +∞). So the multiplier λ(r) = −r₂/g(r),
which is chosen in (20), is the smallest possible number such that (22) holds.
−r₂ₙ/g(rₙ) = 1/(r₁ₙ + √(1 + r₁ₙ²)) = √(1 + r₁ₙ²) − r₁ₙ → +∞ as n → +∞.
(b) The case where C = ℝ². The Problem (P3) with C = ℝ² was considered in [War91, Example 3.2], [LRW91, Example 3] and [DT03, Example
3.9]. It was proved in [LRW91] that x₀ is not a Kuhn-Tucker point of (P3).
But it is shown in [DT03] that x₀ is a directional Kuhn-Tucker point of (P3).
Moreover, similar observations as in case (a) ((i) and (ii)) still hold. We
now show another feature of the directional Kuhn-Tucker condition.
It is possible to search for candidates for minimizers of (P3) by using
the directional Kuhn-Tucker condition. Note that a point x is a directional
Kuhn-Tucker point of (P3) if and only if for each r = (r₁, r₂) ∈ ℝ² the
following system (linear in the variable λ) has at least one solution λ:
f′(x, r) + λ g′(x, r) ≥ 0,
λ ≥ 0, (23)
λ g(x) = 0.
points of (P3). This happens since (P3) does not satisfy the regularity conditions
stated in Theorem 2. This means that even for non-regular problems the
directional Kuhn-Tucker condition can be used to find solutions satisfying
this condition (if any).
The proof of Theorem 10 is the same as that of Theorem 3 with φ, φ_i playing
the role of f′(x₀, ·), g_i′(x₀, ·), i ∈ I(x₀), respectively.
Proof (of Theorem 9). We follow almost the same argument as in the proof
of Theorem 2 under the assumption (b).
Fix r ∈ X. Since x₀ is a minimizer of (P), the following system of variable
ξ ∈ X is inconsistent:
Take φ^r, φ_i^r, i ∈ I(x₀), to be the functions with the property (26) and with
ξ = r. Lemma 1 then ensures the existence of h and h_i which are upper
approximates of f and g_i, i ∈ I(x₀), respectively, such that for all x ∈ X,
h(x) ≤ min{φ^r(x), f′(x₀, x)},
h_i(x) ≤ min{φ_i^r(x), g_i′(x₀, x)}, ∀i ∈ I(x₀). (29)
It follows from the inconsistency of (28) and the definition of upper approximate functions that there exist λ₀, λ_i ≥ 0, i ∈ I(x₀), not all zero, such that
λ₀ h(x) + Σ_{i∈I(x₀)} λ_i h_i(x) ≥ 0, ∀x ∈ X.
We now show the relation between our results and the results in [Sha86].
In [Sha86] the author considered a problem with equality and inequality constraints, but here we ignore the equality constraints. In [Sha86], the author
considered the Problem (P) with C = X = ℝⁿ, and f and g_i, i ∈ I, locally
Lipschitz at the point x₀ ∈ S (S is the feasible set of (P)). The upper Dini directional derivative of a (locally Lipschitz) function g at x₀ is denoted by g⁺(x₀, ·).
The upper DSL-approximate of a locally Lipschitz g was defined as in Definition 7 with g′(x₀, x) replaced by g⁺(x₀, x) in (25). Suppose that φ, φ_i are
upper DSL-approximates of f, g_i, i ∈ I (respectively) at x₀. It was established
in [Sha86] that under the so-called "nondegeneracy condition" (regularity condition) with respect to φ_i, i ∈ I(x₀):
Note that in (31) the inclusion holds for the quasidifferentials of the upper
DSL-approximates of f and g_i instead of those of f and g_i themselves as in
(19). Note also that (31) can be found in [MW90] (as a special case) where it
was proved under the (CQ2) regularity condition. The relation between the necessary
optimality conditions (31) and (27) is established below.
Acknowledgement
The authors would like to thank the referees whose comments improved the
paper. The work of the first author was supported partly by the project "Rough
Analysis - Theory and Applications", Institute of Mathematics, Vietnam
Academy of Science and Technology, Vietnam, and by the APEC Postdoctoral Fellowship from the KOSEF, Korea. The second author was supported
by the Brain Korea 21 Project in 2003.
References
[BRS83] Brandao, A.J.V., Rojas-Medar, M.A., Silva, G.N.: Invex nonsmooth alternative theorems and applications. Optimization, 48, 230-253 (2000)
318 N. Dinh et al.
[Cla83] Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York
(1983)
[Cra81] Craven, B.D.: Invex functions and constrained local minima. Bull. Austral.
Math. Soc., 24, 357-366 (1981)
[Cra86] Craven, B.D.: Nondifferentiable optimization by smooth approximations.
Optimization, 17, 3-17 (1986)
[CraOO] Craven, B.D.: Lagrange Multipliers for Nonconvex Optimization. Progress
in Optimization. Kluwer Academic Publishers (2000)
[DJ97] Demyanov, V.F., Jeyakumar, V.: Hunting for a smaller convex subdiffer-
ential. J. Global Optimization, 10, 305-326 (1997)
[DPR86] Demyanov, V.F., Polyakova, L.N., Rubinov, A.M.: Nonsmoothness and
quasidifferentiability. Mathematical Programming Study 29, 1-19 (1986)
[DR80] Demyanov, V.F., Rubinov, A.M.: On quasidifferentiable functionals. Dokl.
Acad. Sci. USSR, 250, 21-25 (1980) (in Russian)
[DT03] Dinh, N., Tuan, L.A.: Directional Kuhn-Tucker conditions and duality for
quasidifferentiable programs. Acta Mathematica Vietnamica, 28, 17-38
(2003)
[DV81] Demyanov, V.F., Vasiliev, L.V.: Nondifferentiable Optimization. Nauka,
Moscow (1981) (in Russian)
[EL87] Eppler, K., Luderer, B.: The Lagrange principle and quasidifferential calculus. Wiss. Z. Techn. Univ. Karl-Marx-Stadt., 29, 187-192 (1987)
[GaoOO] Gao, Y.: Demyanov difference of two sets and optimality conditions of
Lagrange multiplier type for constrained quasidifferentiable optimization.
Journal of Optimization Theory and Applications, 104, 177-194 (2000)
[G192] Glover, B.M.: On quasidifferentiable functions and non-differentiable pro-
gramming. Optimization, 24, 253-268 (1992)
[Han81] Hanson, M.A.: On sufficiency of the Kuhn-Tucker conditions. J. Math.
Anal. Appl. 80, 545-550 (1981)
[HK82] Hayashi, M., Komiya, H.: Perfect duality for convexlike programs. Journal
of Optimization Theory and Applications, 38, 179-189 (1982)
[IT79] Ioffe, A.D., Tikhomirov, V.M.: Theory of Extremal Problems. North-Holland, Amsterdam (1979)
[Jey85] Jeyakumar, V.: Convexlike alternative theorems and mathematical pro-
gramming. Optimization, 16, 643-652 (1985)
[Jey91] Jeyakumar, V.: Composite nonsmooth programming with Gateaux differentiability. SIAM J. Optimization, 1, 30-41 (1991)
[LRW91] Luderer, B., Rosiger, R., Wurker, U.: On necessary minimum conditions
in quasidifferential calculus: independence of the specific choice of quasid-
ifferentials. Optimization, 22, 643-660 (1991)
[Man94] Mangasarian, O.L.: Nonlinear Programming. SIAM, Philadelphia (1994)
[MW90] Merkovsky, R.R., Ward, D.E.: Upper DSL approximates and nonsmooth
optimization. Optimization, 21, 163-177 (1990)
[SacOO] Sach, P.H.: Martin's results for quasidifferentiable programs (Draft) (2000)
[Sac02] Sach, P.H.: Nonconvex alternative theorems and multiobjective optimiza-
tion. Proceedings of the Korea-Vietnam Joint seminar: Mathematical Op-
timization Theory and Applications. November 30 - December 2, 2002.
Pusan, Korea (2002)
[SKLOO] Sach, P.H., Kim, D.S., Lee, G.M.: Invexity as a necessary optimality condi-
tion in nonsmooth programs. Preprint 2000/30, Institute of Mathematics,
Hanoi (2000)
[SLK03] Sach, P.H., Lee, G.M., Kim, D.S.: Infine functions, nonsmooth alternative
theorems and vector optimization problems. J. Global Optimization, 27,
51-81 (2003)
[Sha84] Shapiro, A.: On optimality conditions in quasidifferentiable optimization.
SIAM J. Control and Optimization, 22, 610-617 (1984)
[Sha86] Shapiro, A.: Quasidifferential calculus and first-order optimality conditions
in nonsmooth optimization. Mathematical Programming Study, 29, 56-68
(1986)
[War91] Ward, D.E.: A constraint qualification in quasidifferentiable programming.
Optimization, 22, 661-668 (1991)
[YS93] Yen, N.D., Sach, P.H.: On locally Lipschitz vector-valued Invex functions.
Bull. Austral. Math. Soc, 47, 259-271 (1993)
Slice Convergence of Sums of Convex Functions
in Banach Spaces and Saddle Point
Convergence
Department of Mathematics
Royal Melbourne University of Technology
Melbourne, VIC 3001, Australia
robert.wenczel@rmit.edu.au, andy.eb@rmit.edu.au
Summary. In this note we provide various conditions under which the slice convergence of f_v → f and g_v → g implies that of f_v + g_v to f + g, where {f_v}_{v∈W} and
{g_v}_{v∈W} are parametrized families of closed, proper, convex functions in a general
Banach space X. This 'sum theorem' complements a result found in [EWOO] for
the epidistance convergence of sums. It also provides an alternative approach to the
derivation of some of the results recently proved in [Zal03] for slice convergence in
the case when the spaces are Banach spaces. We apply these results to the problem
of convergence of saddle points associated with Fenchel duality of slice convergent
families of functions.
1 Introduction
In this paper we provide alternative proofs of some recent results of Zalinescu
[Zal03]. Some hold for the case when the underlying spaces are general Banach
spaces and others only require the spaces to be normed linear. The paper
[Zal03] was originally motivated by [WE99] and extended the results of that
paper to the context of normed spaces and to the convergence of marginal or
perturbation functions (rather than just sums of convex functions). In this
paper we clarify to what degree we are able to deduce such results from the
work of [EWOO, WE99], by either modifications of the proofs of [WE99] or
short deductions using the methods of [EWOO, WE99].
322 R. Wenczel, A. Eberhard
The first results give conditions under which slice convergence of a sum
{f_v + g_v}_{v∈W} follows from the slice convergence of the two parametrized families {f_v}_{v∈W} and {g_v}_{v∈W}. This result has a counterpart for epi-distance
convergence, which was proved by the authors in [EWOO], and we refer to such
results as sum theorems. We show that in the particular case of Banach spaces
the corresponding result for slice convergence follows easily from the work in
[WE99], and moreover so do the corresponding results for the so-called marginal or perturbation functions used to study duality of convex optimization
problems, which are studied in [Zal03]. Such results only hold under certain
conditions which we will refer to as qualification assumptions, due to their similarity (and connections) to constraint qualifications in convex optimization
problems. The approach here is more aligned with that of [AR96], where the
sum theorem is the primary point of departure.
The marginal or perturbation function is given by $h(y) := \inf_{x \in X} F(x,y)$, from which the primal (convex) problem corresponds to $h(0)$ and the dual problem corresponds to $-h^{**}(0) = \inf_{y^* \in Y^*} F^*(0,y^*) = \inf_{y^* \in Y^*} h^*(y^*)$. This leads to the consideration of the dual perturbation function $k(x^*) := \inf_{y^* \in Y^*} F^*(x^*,y^*)$ (see [Roc74, ET99]) and the consideration of the closedness and properness of $h(y)$ at $y = 0$. Letting $F, F_i \in \Gamma(X \times Y)$ ($i \in I$), then as a framework for the study of stability of optimization problems one may study the variational convergence of $\{F_i(\cdot,0)\}_{i \in I}$ to $F(\cdot,0)$ and $\{F_i^*(0,\cdot)\}_{i \in I}$ to $F^*(0,\cdot)$ (see for example [AR96, Zal03]). Clearly this analysis is greatly facilitated when the variational convergence under consideration is generated by a topology for which the Fenchel conjugate is bi-continuous. Thus typically the so-called slice and epi-distance topologies are considered, as we will also do in this paper. Once this is enforced, the generality of this formulation allows one to obtain the sum theorem alluded to at the beginning of this introduction, as well as many other stability results with respect to other operations on convex functions and sets (which preserve convexity). In this way the study of perturbation functions appears to be more general than the study of any one single operation (say, addition) of convex functions. Indeed this is only partly true, in that when all spaces considered are Banach and the constraint qualification is imposed on the primal functions, we will show that the slice stability of the perturbation function follows easily from sum theorems.
When the qualification assumption is placed on the dual function, we are able to deduce the main result in this direction of [Zal03] in a straightforward manner when all spaces are only normed (possibly not complete) linear spaces. It is also possible to treat the upper and lower slice (respectively, epi-distance) convergences separately, as is done in [Zal03, Pen93, Pen02] and in part in [WE99]. There is an economy of statement gained by avoiding this, and it also avoids reworking results in previously published papers. Consequently we will not do so in this paper.
Convex-concave bivariate functions are related to convex bivariate functions through partial conjugation (i.e. conjugation with respect to one of the variables). In this context we are led to the introduction of equivalence classes
2 Preliminaries
In this section we draw together a number of results and definitions. This is done to make the development self-contained. A reader conversant with set-convergence notions and the infimal convolution need only read the first part of this section, only returning to consult results and definitions as needed. A useful reference for much of the material of this section is [Bee93].
We will let $\mathcal{C}(X)$ stand for the class of all nonempty closed convex subsets of a normed space $X$ and $\mathcal{CB}(X)$ the closed bounded convex sets. Place $d(a,B) = \inf\{\|a-b\| \mid b \in B\}$, and $B_\rho = \{x \in X \mid \|x\| \le \rho\}$. Corresponding balls in the dual space $X^*$ will be denoted $B_\rho^*$. The indicator function of a set $A$ will be denoted $\delta_A$, and $\delta^*(A,\cdot)$ shall denote the support function. We will use u.s.c. to denote upper-semicontinuity and l.s.c. to denote lower-semicontinuity. Recall that a function $f : X \to \overline{\mathbb{R}}$ is called closed, proper convex on $X$ if and only if $f$ is convex, l.s.c., is never $-\infty$, and not identically $+\infty$. The class of all closed proper convex functions on $X$ is denoted by $\Gamma(X)$, and $\Gamma^*(X^*)$ denotes the class of all weak* closed proper convex functions on $X^*$. We shall use the notation $\overline{A}$ for the closure of a set $A$ in a topological space $(Z,\tau)$ and, to emphasise the topology, we may write $\overline{A}^\tau$. For $x \in Z$, $\mathcal{N}_\tau(x)$ denotes the collection of all $\tau$-neighborhoods of $x$. For a function $f : Z \to \overline{\mathbb{R}}$, the epigraph of $f$, denoted $\operatorname{epi} f$, is the set $\{(x,\alpha) \in Z \times \mathbb{R} \mid f(x) \le \alpha\}$, and the strict epigraph $\operatorname{epi}_s f$ is the set $\{(x,\alpha) \in Z \times \mathbb{R} \mid f(x) < \alpha\}$. The domain, denoted $\operatorname{dom} f$, is the set $\{x \in Z \mid f(x) < +\infty\}$. The (sub-)level set $\{x \in Z \mid f(x) \le \alpha\}$ (where $\alpha \ge \inf_Z f$) will be given the abbreviation $\{f \le \alpha\}$. Any product $X \times Y$ of normed spaces will always be understood to be endowed with the box norm $\|(x,y)\| = \max\{\|x\|, \|y\|\}$; any balls in such product spaces will always be with respect to the box norm. The natural projections from $X \times Y$ to $X$ or $Y$ will be denoted by $P_X$ and $P_Y$ respectively. We also will assume the following convention for products $Z \times \mathbb{R}$ where $(Z,\tau)$ is topological: we assume the product topology, where $\mathbb{R}$ has the usual topology, and for any subset
Remark 2. For metrizable $X$, the above definitions can be shown to have the equivalent forms:
1.
$\limsup_{v \to w} F(v) = \{x \in X \mid \exists \text{ a net } v_\beta \to w \text{ and } x_\beta \in F(v_\beta) \text{ with } x_\beta \to x\} = \{x \in X \mid \liminf_{v \to w} d(x, F(v)) = 0\}$
$\liminf_{v \to w} F(v) = \{x \in X \mid \forall \text{ nets } v_\beta \to w, \ \exists x_\beta \to x \text{ with } x_\beta \in F(v_\beta) \text{ eventually}\} = \{x \in X \mid \limsup_{v \to w} d(x, F(v)) = 0\}$
Also $\operatorname{dom}(f \,\square\, g) = \operatorname{dom} f + \operatorname{dom} g$; $\operatorname{epi}(f \,\square\, g) \supseteq \operatorname{epi} f + \operatorname{epi} g$, and
$(f \,\square\, g)^* = f^* + g^*$
where $f^*(x^*) = \sup_{x \in X}(\langle x, x^* \rangle - f(x))$ is the Young-Fenchel conjugate of $f$.
Lower semi-continuity of the epi-graphical multi-function $v \mapsto \operatorname{epi}_s(f_v \,\square\, g_v)$ may be deduced from that of its components using the following lemma, a proof of which may be found in [WE99].
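As a concrete illustration of the identity $(f \,\square\, g)^* = f^* + g^*$, the following sketch (our addition, not part of the text; the one-dimensional quadratics and the grid discretization are assumptions made purely for illustration) approximates the infimal convolution and the conjugates on a grid:

```python
import numpy as np

# Numerical illustration (not from the text) of (f [] g)* = f* + g*
# for f(x) = x^2/2 and g(x) = (x - 1)^2/2 on the real line.
# Analytically: (f [] g)(x) = (x - 1)^2/4 and (f [] g)*(s) = s + s^2.
x = np.linspace(-5.0, 5.0, 2001)
f = lambda t: 0.5 * t ** 2
g = lambda t: 0.5 * (t - 1.0) ** 2

def conjugate(vals, grid, slopes):
    # Young-Fenchel conjugate: sup over the grid of s*x - vals(x)
    return np.array([np.max(s * grid - vals) for s in slopes])

def inf_conv(f_fun, g_fun, grid):
    # (f [] g)(x) = inf_y [f(y) + g(x - y)], evaluated on the grid
    return np.array([np.min(f_fun(grid) + g_fun(xi - grid)) for xi in grid])

slopes = np.linspace(-1.0, 1.0, 21)
h = inf_conv(f, g, x)
lhs = conjugate(h, x, slopes)                                  # (f [] g)*
rhs = conjugate(f(x), x, slopes) + conjugate(g(x), x, slopes)  # f* + g*

assert np.max(np.abs(lhs - rhs)) < 1e-2
assert np.max(np.abs(lhs - (slopes + slopes ** 2))) < 1e-2
```

The grid is kept wide enough that the suprema defining the conjugates are attained in its interior for the chosen slopes.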
These definitions and relations have natural counterparts for nets $\{f_v\}_{v \in I}$ of functions, the epi-limits being taken with respect to the underlying topology $\tau$.
Proof. We prove the last assertion only; the rest can be found in the cited reference. Let $\{a_n - b_n\} \subseteq A - B$ be a bounded sequence, and let $\lambda_n \ge 0$ be such that $\sum_{n=1}^\infty \lambda_n = 1$. Then $\{a_n\} \subseteq A$ and $\{b_n\} \subseteq B$ are both bounded, so $\sum_{n=1}^\infty \lambda_n a_n \in A$ and $\sum_{n=1}^\infty \lambda_n b_n \in B$ (both series convergent). Thus $\sum_{n=1}^\infty \lambda_n a_n - \sum_{n=1}^\infty \lambda_n b_n \in A - B$. $\square$
Definition 8. Following [Att86], define for $K \in \mathbb{R}$, and for functions $f_v$, $g_v$ ($v \in W$),
Definition 9.
The following lemma from [WE99] provides a criterion for the inf-convolution of conjugate functionals to be weak* lower semicontinuous.
Lemma 2. ([WE99, Lem. 4-^]) Let $f_v$ and $g_v$ be in $\Gamma(X)$ for a Banach space $X$, such that $H_K(X^*, v)$ is bounded for each $K \in \mathbb{R}$. Then $f_v^* \,\square\, g_v^* \in \Gamma^*(X^*)$.
The next lemma is elementary, and its proof will be omitted.
Lemma 3. Let $f_v$ be in $\Gamma(X)$, with $f_v$ slice converging to $f_w$, and $x_v \to x_w$ in norm, as $v \to w$. Then $f_v(x_v + \cdot)$ slice converges to $f_w(x_w + \cdot)$.
The following three lemmas provide bounds that will be of use in the next theorem.
since $\|x^* + y^*\|_{X^*} \le K$ and $y \in \operatorname{dom} g_v \subseteq X_v$ with $\|y\| \le \rho$. This yields that $\|x^*\|_{X^*} \le \delta^{-1}(K(1+\rho) + 2\rho)$, from the arbitrariness of $\xi \in B_\delta \cap X_v$. Also, $\|y^*\|_{X^*} \le \|y^* + x^*\|_{X^*} + \|x^*\|_{X^*} \le K + \|x^*\|_{X^*}$, thus giving a uniform bound on $H_K(X^*, v)$ for all $v$. $\square$
Proof. Supposing the contrary, there are nets $v_\beta \to w$, $(x_{1,\beta}^*, x_{2,\beta}^*) \in H_K(X^*, v_\beta) \cap B_\gamma^*$ with $\lim_\beta g_{v_\beta}^*(x_{2,\beta}^*) = +\infty$. It then follows that $\lim_\beta f_{v_\beta}^*(x_{1,\beta}^*) = -\infty$, and since $\|x_{1,\beta}^*\| \le \gamma$ for all $\beta$, we have contradicted the statement of Lemma 5. $\square$
Before proving the first of our main theorems we make the following important observation for later reference.
Proof. From the assumptions it follows that $\operatorname{dom} f_v \cap \operatorname{dom} g_v$ is nonempty, and where $x_v$ is any member of $\operatorname{dom} f_v \cap \operatorname{dom} g_v$. Both $\{f_v \le \rho\} \cap B_\rho - x_v$ and $\{g_v \le \rho\} \cap B_\rho - x_v$ are bounded, ideally convex [Hol75] subsets of the Banach space $X_v$. Hence, by Proposition 2, $\{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho$ is also ideally convex in $X_v$ and has the same interior (in $X_v$) as does its $X_v$-closure. Thus we obtain (5). $\square$
$K \ge \alpha_\beta \ge f_{v_\beta}^*(y_\beta^*) + g_{v_\beta}^*(x_\beta^* - y_\beta^*) = f_{v_\beta}^*(x_{1,\beta}^*) + g_{v_\beta}^*(x_{2,\beta}^*)$ (so $(x_{1,\beta}^*, x_{2,\beta}^*) \in H_K(X^*, v_\beta)$)
for all $\beta$. We need some uniform bound on the $g_{v_\beta}^*(x_{2,\beta}^*)$. These follow from Lemma 5 (lower bounds) and Lemma 6 (upper bounds), the latter since $(x_{1,\beta}^*, x_{2,\beta}^*) \in H_K(X^*, v_\beta) \cap B^*$ and $v_\beta \to w$. Thus, the $g_{v_\beta}^*(x_{2,\beta}^*)$ are eventually uniformly bounded in $\beta$, and
This completes the proof for the case where $X_v \supseteq \operatorname{dom} f_v \cup \operatorname{dom} g_v$ for all $v$. For the general case, let $\rho > \inf_X f_w$. Then $v \mapsto \{f_v \le \rho\}$ is norm-l.s.c. at $w$, since (see [Bee93]) $\{f_v \le \rho\}$ slice converges to $\{f_w \le \rho\}$ as $v \to w$. Thus on choosing some $x_w \in \{f_w \le \rho\}$, we have some $x_v \in \{f_v \le \rho\}$ with $x_v$ strongly convergent to $x_w$, as $v \to w$. Place $\tilde{f}_v := f_v(x_v + \cdot)$ and $\tilde{g}_v := g_v(x_v + \cdot)$. By Lemma 3, $\tilde{f}_v$ and $\tilde{g}_v$ slice converge to $\tilde{f}_w$ and $\tilde{g}_w$ respectively. Also, $0 \in \operatorname{dom} \tilde{f}_w$, whence $X_v$ contains both $\operatorname{dom} \tilde{f}_v$ and $\operatorname{dom} \tilde{g}_v$, with $X_v = \operatorname{span}(\operatorname{dom} \tilde{f}_v - \operatorname{dom} \tilde{g}_v)$. The form of the conditions in the theorem statement is not altered by passing from $f_v$, $g_v$ to $\tilde{f}_v$, $\tilde{g}_v$, the only change being an increase in the value of $\rho$ in the interiority condition. Thus we obtain the slice convergence $\tilde{f}_v + \tilde{g}_v \to \tilde{f}_w + \tilde{g}_w$. Translating the sum by $-x_v$, Lemma 3 yields the convergence
It is well known (see, for instance, [AR96]) that results for sums, such as Theorem 1, imply convergence results for restrictions $F(\cdot,0)$ of bivariate functions on product spaces $X \times Y$ (just apply a sum theorem to the combination $F + \delta_{X \times \{0\}}$) and that such results may be used to extend sum theorems to include an operator, that is, yield convergence of functions of the form $f + g \circ T$ where $T : X \to Y$ is a bounded linear operator. As discussed in [Zal03], convergence theorems for $F(\cdot,0)$ may be used to derive theorems not only for sums, but also for other combinations of functions, such as $\max(f, g \circ T)$, and so, in a sense, results for sums are equivalent to results for sums with operator and equivalent to results on restrictions of bivariate functions. Thus, it is a matter of taste, or the intended application, that will dictate the choice of primary form to be considered.
We now use Theorem 1 to obtain a convergence theorem for restrictions of functions on product (Banach) spaces (cf. [Zal03, Prop. 13] for the normed-space version).
$B_\delta^Y \cap Y_v \subseteq P_Y(\{F_v \le \rho\} \cap B_\rho^{X \times Y}),$
$B_\delta^X \times B_\delta^Y \cap Z_v = B_\delta^X \times (B_\delta^Y \cap Y_v) \subseteq X \times (B_\delta^Y \cap Y_v) \subseteq X \times P_Y(\{F_v \le \rho\} \cap B_\rho^{X \times Y}) = \{F_v \le \rho\} \cap B_\rho^{X \times Y} - \{G_v \le \rho\}$
for all $v \in V \setminus \{w\}$. Moreover, since $(F_v^* \,\square\, G_v^*)(x^*, y^*) = h_v(x^*)$ for $x^* \in X^*$, $y^* \in Y^*$, we see that properness of $\bar{h} = h$ implies that $F_w^* \,\square\, G_w^* = \overline{F_w^* \,\square\, G_w^*}$ and is proper. Thus, the conditions of Theorem 1 hold, from which follows the slice convergence of $F_v + \delta_{X \times \{0\}}$ to $F_w + \delta_{X \times \{0\}}$ in $\Gamma(X \times Y)$, which in turn implies that $F_v(\cdot,0) \to F_w(\cdot,0)$ in $\Gamma(X)$. $\square$
Corollary 2. Let $X$ and $Y$ be Banach spaces. Let $f_v \to f_w$ and $g_v \to g_w$ under slice convergence in $\Gamma(X)$ and $\Gamma(Y)$ respectively, and let $T_v : X \to Y$ be continuous linear operators with $T_v \to T_w$ in operator norm. Assume that there exist a neighborhood $V$ of $w$, and $\delta > 0$, $\rho > 0$ such that
$\forall v \in V \setminus \{w\} \quad B_\delta \cap Y_v \subseteq T_v(\{f_v \le \rho\} \cap B_\rho) - \{g_v \le \rho\} \quad (6)$
$(f + g \circ T)^*(x^*).$
Alternatively, we may deduce the above by using [Zal03, Lemmas 15, 16] with
$F(x,y) := f_w(x) + g_w(T_w x + y)$
In [Zal03] a number of qualification conditions are framed in the dual spaces. We consider some related results next.
Proposition 3. Let $X$ be a normed linear space and let $\{f_v\}_{v \in W}$ and $\{g_v\}_{v \in W}$ be slice convergent families in $\Gamma(X)$, convergent to $f_w$ and $g_w$ respectively, with $f_v \,\square\, g_v$ proper for all $v$. Suppose in addition that for $F_v(x,y) := f_v(y) + g_v(x - y)$ we have
Then we have $x_\alpha \in \{f_{v_\alpha} \,\square\, g_{v_\alpha} < \eta_\alpha\} \cap B_\rho$, implying (by (7)) the existence of $\|y_\alpha\| \le \rho$ with $F_{v_\alpha}(x_\alpha, y_\alpha) < \eta_\alpha$. As noted earlier we always have $\{F_v\}_{v \in W}$ slice convergent to $F_w$. Also note that a simple calculation shows
Thus (8) and the (upper) slice convergence of $\{F_v\}_{v \in W}$ (recalling that $F_w^*(x^*, 0) < \eta$) implies
and also that the norm- and weak*-closures of $h_v$ coincide. Then $h_v$ (dual) slice converges to $\bar{h}_w = h_w$ and $\{F_v(\cdot,0)\}_{v \in W}$ slice converges to $F_w(\cdot,0)$.
This last result could be used to deduce the next result, but instead we prefer to use a direct argument along the lines of the argument in Theorem 1.
$(x^*, \alpha) \in bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* \,\square\, g_v^*$
Then there are nets $v_\beta \to w$, $(x_\beta^*, \alpha_\beta) \to^{w^*} (x^*, \alpha)$, and $\rho > 0$ with $(x_\beta^*, \alpha_\beta) \in B_\rho^* \cap \operatorname{epi} f_{v_\beta}^* \,\square\, g_{v_\beta}^*$ for all $\beta$. By use of (10) we obtain a bounded net $\|y_\beta^*\| \le \rho$ such that $\alpha_\beta \ge F_{v_\beta}^*(x_\beta^*, y_\beta^*) = f_{v_\beta}^*(x_\beta^* - y_\beta^*) + g_{v_\beta}^*(y_\beta^*)$, and we may now argue as in the final part of the proof of [WE99, Theorem 4.3] to deduce that $(x^*, \alpha) \in \operatorname{epi} f_w^* \,\square\, g_w^*$. $\square$
Proof. As $0 \in \operatorname{sqri}(P_Y \operatorname{dom} F)$ we have $\operatorname{cone}(P_Y \operatorname{dom} F) = \operatorname{span}(P_Y \operatorname{dom} F)$. Place $G_n = G := \delta_{X \times \{0\}}$. We apply [EW00, Thm 4.9] to $F_n$, $F$, $G_n$, $G$. Place $Z_n = \operatorname{span}(\operatorname{dom} F_n - \operatorname{dom} G_n)$ and $Z_0 = \operatorname{span}(\operatorname{dom} F - \operatorname{dom} G)$. Since $\operatorname{dom} F - \operatorname{dom} G = X \times P_Y \operatorname{dom} F$, we have $Z_0 = X \times Y_0$ and $Z_n = X \times Y_n$, with
$\operatorname{cone}(\operatorname{dom} F - \operatorname{dom} G) = \operatorname{cone}(X \times P_Y \operatorname{dom} F) = X \times Y_0$
therefore being closed in $X \times Y$. Also $Z_0$ has closed complement $Z_0^c := \{0\} \times Y_0^c$, and
$Z_n \cap Z_0^c = (X \times Y_n) \cap (\{0\} \times Y_0^c) = \{0\} \times (Y_n \cap Y_0^c) = \{0\}.$
Hence the hypotheses of [EW00, Thm 4.9] are satisfied, yielding
Definition 10. Suppose that $(X,\tau)$ and $(Y,\sigma)$ are two topological spaces and $\{K^n : X \times Y \to \overline{\mathbb{R}}, \ n \in \mathbb{N}\}$ is a sequence of bivariate functions. Define:
Definition 11. Suppose that $(X,\tau)$ and $(Y,\sigma)$ are two topological spaces and $\{K^n : X \times Y \to \overline{\mathbb{R}}, \ n \in \mathbb{N}\}$ is a sequence of bivariate functions.
1. We say that they epi/hypo-converge in the extended sense to a function $K : X \times Y \to \overline{\mathbb{R}}$ if
where $\operatorname{cl}_x$ denotes the extended lower closure with respect to $x$ (and therefore w.r.t. $\tau$) for fixed $y$ and $\overline{\operatorname{cl}}_y$ denotes the extended upper closure with respect to $y$ (and therefore w.r.t. $\sigma$) for fixed $x$. Note that by definition, $\overline{\operatorname{cl}} f := -\operatorname{cl}(-f)$.
2. A point $(\bar{x}, \bar{y})$ is a saddle-point of a bivariate function $K : X \times Y \to \overline{\mathbb{R}}$ if for all $(x,y) \in X \times Y$ we have $K(\bar{x}, y) \le K(\bar{x}, \bar{y}) \le K(x, \bar{y})$.
The interest in this kind of convergence stems from the following result (see [AAW88, Thm 2.4]).
The next result from [AAW88] uses sequential forms of the epi-limit functions, as per the following
Definition 12. [AAW88, p. 541] Let $(X,\tau)$ be topological, $f_n : X \to \overline{\mathbb{R}}$. Then
It can be shown that these reduce to the usual (topologically defined) forms if $(X,\tau)$ is first-countable, and that the above infima are achieved. We will need these alternate forms, for generally weak topologies on normed spaces are not first-countable.
Definition 13. Let $(X,\tau)$ and $(X^*,\tau^*)$ be topological vector spaces. We shall say they are paired if there is a bilinear map $\langle \cdot,\cdot \rangle : X \times X^* \to \mathbb{R}$ such that the maps $x^* \mapsto \langle \cdot, x^* \rangle$ and $x \mapsto \langle x, \cdot \rangle$ are (algebraic) isomorphisms such that $X^* \cong (X,\tau)^*$ and $X \cong (X^*,\tau^*)^*$ respectively.
It is readily checked that if $(X,\tau)$ and $(X^*,\tau^*)$ are paired, and so are $(Y,\sigma)$ and $(Y^*,\sigma^*)$, then $(X \times Y, \tau \times \sigma)$ is paired with $(X^* \times Y^*, \tau^* \times \sigma^*)$, with the pairing
$\langle (x,y), (x^*,y^*) \rangle = \langle x, x^* \rangle + \langle y, y^* \rangle,$
and similarly for other combinations of product spaces.
For any convex-concave saddle function $K : X \times Y^* \to \overline{\mathbb{R}}$, that is, where $K$ is convex in the first argument and concave in the second, we may associate a convex and a concave parent. These play a fundamental role in convex duality (see [Roc74]). These are defined respectively as:
where
$\overline{K}(x, y^*) = \sup_{x^* \in X^*} [G(x^*, y^*) + \langle x, x^* \rangle]$
$\underline{K}(x, y^*) = \inf_{y \in Y} [F(x, y) - \langle y, y^* \rangle].$
Our focus will be on Fenchel duality where, given the primal problem $\inf_X f + g$, we form $F(x,y) := f(x) + g(x + y)$, so that $G(x^*, y^*) = -f^*(x^* - y^*) - g^*(y^*)$ and the Fenchel dual takes the form $\sup_{y^* \in X^*} G(0, y^*) = \sup_{y^* \in X^*} -f^*(-y^*) - g^*(y^*)$ (cf. (12) below). Also, any $K \in [\underline{K}, \overline{K}]$ is a suitable saddle function for the Fenchel primal/dual pair and we shall use $K := \underline{K}$ in what follows.
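The absence of a duality gap in this Fenchel scheme can be illustrated on a one-dimensional toy instance (an added sketch, not from the text; the particular quadratics are our assumption): for $f(x) = x^2/2$ and $g(x) = (x-1)^2/2$ one has $f^*(s) = s^2/2$ and $g^*(s) = s + s^2/2$, and both the primal value $\inf_x f(x) + g(x)$ and the dual value $\sup_{y^*} -f^*(-y^*) - g^*(y^*)$ equal $1/4$:

```python
import numpy as np

# Fenchel duality check for f(x) = x^2/2, g(x) = (x - 1)^2/2:
#   primal: inf_x f(x) + g(x)         -> attained at x = 1/2, value 1/4
#   dual:   sup_s -f*(-s) - g*(s)     with f*(s) = s^2/2, g*(s) = s + s^2/2
x = np.linspace(-3.0, 3.0, 100001)
primal = np.min(0.5 * x ** 2 + 0.5 * (x - 1.0) ** 2)

s = np.linspace(-3.0, 3.0, 100001)
dual = np.max(-(0.5 * s ** 2) - (s + 0.5 * s ** 2))  # = sup_s (-s^2 - s)

assert abs(primal - 0.25) < 1e-6
assert abs(dual - 0.25) < 1e-6
```

The dual maximizer $s = -1/2$ plays the role of the dual variable $y^*$ in the saddle-point results below.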
The following result is taken from [AAW88] and requires no additional assumption.
1.
$(\tau \times \sigma)\text{-seq-e-ls}_{n \to \infty} F^n \le F$ on $X \times Y$
implies $\operatorname{cl}_x(e_\tau/h_{\sigma^*}\text{-ls}\, K^n) \le \underline{K}$.
2.
with respect to the strong topology on $X$ and the weak* topology on $X^*$.
where $s$ and $s^*$ stand for the respective norm topologies on $X$ and $X^*$. Now apply Proposition 6. $\square$
and similarly for $\varphi_n$, where for any function $\psi$, $\check{\psi}(x) := \psi(-x)$. Note that $\operatorname{dom} \varphi = \operatorname{dom} g - \operatorname{dom} f$ and similarly for $\varphi_n$. The operation $\psi \mapsto \check{\psi}$ commutes with conjugation and with slice limits, the verification of this being an elementary exercise. From [Roc74] we have the following: calling $\inf_X(f + g)$ the primal problem, and $\inf_X(f_n + g_n)$ the approximate problems, then $-\varphi^*$ and $-\varphi_n^*$ are the associated dual objective functionals, and:
and similarly for $\varphi_n$ and the saddle-points $(x_n, y_n^*)$ of $K^n$. On taking conjugates of $\varphi_n$ we obtain
Then if $(x_n, y_n^*)$ are saddle-points of $K^n$ for each $n$ and the $x_n$ have a strong limit $x$, and the saddle-values are bounded below, then $K$ has a saddle-point $(x, y^*)$ that is a $(s \times w^*)$-limit of saddle-points $(x_n, (y_n^*)|_{M_n})$ of a subsequence of the $K^n$, with $K(x, y^*)$ the limit of the corresponding saddle-function values. (Here '$s$' stands for the norm topology on $X$ and $(y_n^*)|_{M_n}$ denotes any norm-preserving extension (via the Hahn-Banach Theorem, for example) to $X^*$ of the restriction of $y_n^*$ to $M_n$.)
Proof. The proof follows from Propositions 7 and 5, on showing that the $(y_n^*)|_{M_n}$ are norm-bounded in $X^*$, so that weak*-convergent subsequences are available and are the required dual variables.
Since the sublevel-sets of $f_n$ are themselves slice convergent [Bee93], there are $x_n \in \operatorname{dom} f_n$ converging to some $x \in \operatorname{dom} f$. Place $\tilde{f}_n(\cdot) := f_n(x_n + \cdot)$, $\tilde{g}_n(\cdot) := g_n(x_n + \cdot)$, with analogous definitions for $\tilde{f}$ and $\tilde{g}$ as translates by $x$. Then $0 \in \operatorname{dom} \tilde{f}_n$, implying that $\operatorname{dom} \tilde{f}_n \cap \operatorname{dom} \tilde{g}_n \subseteq M_n$.
Let $\tilde{\varphi}_n$ be the value function corresponding to $\tilde{f}_n$ and $\tilde{g}_n$ via (11). Similarly, denote the corresponding saddle function by $\tilde{K}^n$. Then we immediately observe that $\tilde{\varphi}_n = \varphi_n$, from which follows that
since $(\bar{x}_n, \bar{y}_n^*)$ are an optimal pair for the primal and dual problems if and only if $(\bar{x}_n - x_n, \bar{y}_n^*)$ are optimal for the problems based on the translated functions $\tilde{f}_n$, $\tilde{g}_n$. Evidently the optimal values are not affected by this translation, so we also obtain that $\tilde{K}^n(\bar{x}_n - x_n, \bar{y}_n^*) = K^n(\bar{x}_n, \bar{y}_n^*)$. Hence the saddle-values of $\tilde{K}^n$ are also bounded below. As $M_n$ contains both $\operatorname{dom} \tilde{f}_n$ and $\operatorname{dom} \tilde{g}_n$ (recall this follows from $0 \in \operatorname{dom} \tilde{f}_n$), we obtain
References
[Att84] Attouch, H.: Variational Convergence for Functions and Operators. Applicable Mathematics Series, Pitman, London (1984)
[Att86] Attouch, H., Brezis, H.: Duality for the sum of convex functions in general Banach spaces. In: Barroso, J. (ed) Aspects of Mathematics and its Applications, 125-133. Elsevier Sc. Publ. (1986)
[AR96] Aze, D., Rahmouni, A.: On primal-dual stability in convex optimization. Journal of Convex Analysis, 3, 309-327 (1996)
[AAW88] Aze, D., Attouch, H., Wets, R.J.-B.: Convergence of convex-concave saddle functions: applications to convex programming and mechanics. Ann. Inst. Henri Poincare, 5, 537-572 (1988)
[AP90] Aze, D., Penot, J.-P.: Operations on convergent families of sets and functions. Optimization, 21, 521-534 (1990)
[Bee92] Beer, G.: The slice topology: a viable alternative to Mosco convergence in non-reflexive spaces. Nonlinear Analysis: Theory, Methods and Applications, 19, 271-290 (1992)
[Bee93] Beer, G.: Topologies on closed and closed convex sets. Mathematics and its Applications, 268, Kluwer Acad. Publ. (1993)
[BL92] Borwein, J.M., Lewis, A.S.: Partially-finite convex programming. Mathematical Programming, 57, 15-83 (1992)
[EW00] Eberhard, A., Wenczel, R.: Epi-distance convergence of parametrised sums of convex functions in non-reflexive spaces. J. Conv. Anal., 7, 47-71 (2000)
[ET99] Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM Classics in Applied Mathematics, 28 (1999)
[Hol75] Holmes, R.B.: Geometric Functional Analysis and its Applications. Springer-Verlag Graduate Texts in Mathematics 24 (1975)
[Pen93] Penot, J.-P.: Preservation of persistence and stability under intersection and operations. J. Optim. Theory & Appl., 79, 525-561 (1993)
[Pen02] Penot, J.-P., Zalinescu, C.: Continuity of usual operations and variational convergence. Personal communication, 30/04/02 (2002)
[RW84] Rockafellar, R.T., Wets, R.J.-B.: Variational systems, an introduction. In: Salinetti, G. (ed) Multifunctions and Integrands. Springer-Verlag Lecture Notes in Mathematics, 1091, 1-54 (1984)
[Roc70] Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
[Roc74] Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM Publ. (1974)
[WE99] Wenczel, R.B., Eberhard, A.C.: Slice convergence of parametrised sums of convex functions in nonreflexive spaces. Bull. Aust. Math. Soc., 60, 429-458 (1999)
[Zal03] Zalinescu, C.: Slice convergence for some classes of convex functions. J. Nonlinear and Convex Analysis, 4, (2003)
Topical Functions and their Properties in a Class of Ordered Banach Spaces
Hossein Mohebi
Department of Mathematics
Shahid Bahonar University of Kerman
Kerman, Iran
hmohebi@mail.uk.ac.ir;
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353, Australia
h.mohebi@ballarat.edu.au
1 Introduction
A function $f : \mathbb{R}^n \to \mathbb{R}$ is called topical if this function is increasing ($x \ge y \Longrightarrow f(x) \ge f(y)$) and plus-homogeneous ($f(x + \lambda \mathbf{1}) = f(x) + \lambda$ for all $x \in \mathbb{R}^n$ and all $\lambda \in \mathbb{R}$), where $\mathbf{1}$ is the vector of the corresponding dimension with all coordinates equal to one. These functions are studied in [GG98, Gun98, Gun99, GK95, RS01, Sin02] and they have many applications in various parts of applied mathematics (see [Gun98, Gun99]).
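A standard finite dimensional instance is the max-plus function $f(x) = \max_i (x_i + a_i)$, which is both increasing and plus-homogeneous. The sketch below (our illustration, not part of the text; the random test points are an assumption) checks the two defining properties numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)

def f(x):
    # Max-plus function: a topical function on R^5
    return np.max(x + a)

for _ in range(100):
    x = rng.normal(size=5)
    y = x - np.abs(rng.normal(size=5))   # y <= x componentwise
    lam = rng.normal()
    assert f(y) <= f(x) + 1e-12                     # increasing
    assert abs(f(x + lam) - (f(x) + lam)) < 1e-12   # plus-homogeneous
```

Adding the scalar `lam` to the array broadcasts it over all coordinates, i.e. it implements $x + \lambda \mathbf{1}$.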
In this paper we study topical functions $f : X \to \overline{\mathbb{R}}$ defined on an ordered Banach space $X$. We show that the topical functions $f : X \to \overline{\mathbb{R}}$ are characterized by the fact that the Fenchel-Moreau conjugate function and the conjugate function of type Lau admit a very simple explicit description. Most of these results have been obtained by A. Rubinov and I. Singer in the finite dimensional case (see [RS01, Sin02]). In this paper, we obtain these results in ordered Banach spaces without using the concepts of lattice theory.
The structure of the paper is as follows. In Section 2, we recall the main definitions and prove some results related to downward sets and topical functions. We also show that a topical function is abstract convex. Characterizations of plus-weak Pareto points for a closed downward set are investigated in Section 3. In Section 4, we study the subdifferential of a topical function and we present the characterizations of plus-weak Pareto points of a closed downward set in terms of separation from outside points. In Section 5, we give characterizations of a topical function in terms of its Fenchel-Moreau conjugate and biconjugate with respect to a certain set of elementary functions. In Section 6, we first give characterizations of topical functions in terms of the conjugate of type Lau. Next, we show that for topical functions, the conjugate of type Lau and the Fenchel-Moreau conjugate coincide.
2 Preliminaries
Let $X$ be a Banach space with the norm $\|\cdot\|$ and let $C$ be a closed convex cone in $X$ such that $C \cap (-C) = \{0\}$ and $\operatorname{int} C \ne \emptyset$. We assume that $X$ is equipped with the order relation $\ge$ generated by $C$: $x \ge y$ if and only if $x - y \in C$ ($x, y \in X$). Moreover, we assume that $C$ is a normal cone. Recall that a cone $C$ is called normal if there exists a constant $m > 0$ such that $\|x\| \le m\|y\|$ whenever $0 \le x \le y$ and $x, y \in X$. Let $\mathbf{1} \in \operatorname{int} C$ and let
It is well known and easy to check that $B$ can be considered as the unit ball of a certain norm $\|\cdot\|_1$, which is equivalent to the initial norm $\|\cdot\|$. Assume without loss of generality that $\|\cdot\| = \|\cdot\|_1$.
We study in this paper topical functions and downward sets. Recall (see [Sin87]) that a subset $W$ of $X$ is said to be downward if $w \in W$ and $x \in X$ with $x \le w$ imply $x \in W$. A function $f : X \to \overline{\mathbb{R}} := [-\infty, +\infty]$ is called topical if this function is increasing ($x \ge y \Longrightarrow f(x) \ge f(y)$) and plus-homogeneous ($f(x + \lambda \mathbf{1}) = f(x) + \lambda$ for all $x \in X$ and all $\lambda \in \mathbb{R}$). The definition of a topical function in the finite dimensional case can be found in [RS01].
For any subset $W$ of $X$, we shall denote by $\operatorname{int} W$, $\operatorname{cl} W$, and $\operatorname{bd} W$ the interior, the closure and the boundary of $W$, respectively.
For a non-empty subset $W$ of $X$ and $x \in X$, define
It follows from (1) that the set $\{\lambda \in \mathbb{R} : \lambda \mathbf{1} \le x + y\}$ is non-empty and bounded from above (by $\|x + y\|$). Clearly this set is closed. It follows from the definition of $\varphi$ that the function $\varphi$ enjoys the following properties:
$-\infty < \varphi(x,y) \le \|x + y\|$ for each $x, y \in X$ (4)
it follows that
$y - \|x - y\|\mathbf{1} \le x \le y + \|x - y\|\mathbf{1}.$
and hence
$|f(x) - f(y)| \le \|x - y\|. \quad (9)$
Thus, $f$ is Lipschitz continuous. $\square$
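In $\mathbb{R}^n$ ordered by the cone $\mathbb{R}^n_+$ with $\mathbf{1} = (1,\dots,1)$, the norm whose unit ball is the order interval $[-\mathbf{1}, \mathbf{1}]$ is the max norm, and (9) then says that a finite topical function is nonexpansive in that norm. A numerical spot-check for a max-plus function $f(x) = \max_i(x_i + a_i)$ (our illustration, not from the text; the choice of function is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=4)
f = lambda x: np.max(x + a)          # a topical function on R^4

for _ in range(1000):
    x = rng.normal(size=4)
    y = rng.normal(size=4)
    # |f(x) - f(y)| <= ||x - y||_inf: f is 1-Lipschitz in the max norm
    assert abs(f(x) - f(y)) <= np.max(np.abs(x - y)) + 1e-12
```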
Proof. 1) Suppose that there exists $x \in X$ such that $f(x) = +\infty$, and let $y \in X$ be arbitrary. Let $\lambda = \varphi(-x, y)$, where $\varphi$ is the function defined by (3). Then by (4) we have $\lambda \in \mathbb{R}$. In view of (5), it follows that $\lambda \mathbf{1} \le y - x$, and so $x + \lambda \mathbf{1} \le y$. Since $f$ is a topical function, we conclude that $f(x) + \lambda \le f(y)$. This implies that $f(y) = +\infty$.
2) Assume that there exists $x \in X$ such that $f(x) = -\infty$, and let $y \in X$ be arbitrary. Let $\lambda = \varphi(x, -y)$, where $\varphi$ is the function defined by (3). Then by (4) we have $\lambda \in \mathbb{R}$. In view of (5), it follows that $\lambda \mathbf{1} \le x - y$, and so $y + \lambda \mathbf{1} \le x$. Since $f$ is a topical function, we conclude that $f(y) \le f(x) - \lambda$. This implies that $f(y) = -\infty$, which completes the proof. $\square$
It follows from Proposition 2 that, for any topical function $f : X \to \overline{\mathbb{R}}$, either $\operatorname{dom} f = X$ or $f \equiv +\infty$, where $\operatorname{dom} f := \{x \in X : f(x) < +\infty\}$.
In the following we denote by $X_\varphi$ the set of all functions $\varphi_l$ ($l \in X$) defined by (8). That is:
$X_\varphi = \{\varphi_l := \varphi(\cdot, l) : l \in X\}. \quad (10)$
$f(x) = \sup_{y \in X} \varphi_{f(y)\mathbf{1} - y}(x) \quad (x \in X),$
The following proposition has been proved in the finite dimensional case (see [RS01]). However, the same proof is valid in the case under consideration.
Finally, let $l_0 = f(y)\mathbf{1} - y$. Since $\varphi(y, \cdot)$ is a topical function and (7) holds, it follows that
$\varphi_{l_0}(y) = \varphi(y, l_0) = \varphi(y, f(y)\mathbf{1} - y) = f(y) + \varphi(y, -y) = f(y).$
Indeed, if $x \in \{x \in X : p_W(x) < 0\}$, then there exists $\lambda < 0$ such that $x \in \lambda \mathbf{1} + W$. Since $x \le x - \lambda \mathbf{1}$, $x - \lambda \mathbf{1} \in W$ and $W$ is a downward set, it follows that $x \in W$. Also, note that if $W$ is a closed downward subset of $X$, then
$W = \{x \in X : p_W(x) \le 0\}.$
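As an illustration of the identity $W = \{x : p_W(x) \le 0\}$ for the gauge $p_W(x) = \inf\{\lambda \in \mathbb{R} : x \in \lambda \mathbf{1} + W\}$, take the closed downward set $W = \{(w_1, w_2) \in \mathbb{R}^2 : \min\{w_1, w_2\} \le 1\}$ (the set used in Example 1 below). Here $x \in \lambda \mathbf{1} + W$ iff $\min_i x_i - \lambda \le 1$, so $p_W(x) = \min_i x_i - 1$. A quick numerical sketch (our addition, not from the text):

```python
import numpy as np

# Downward set W = {(w1, w2) : min(w1, w2) <= 1} in R^2.
# For this W:  x in lam*1 + W  iff  min(x) - lam <= 1,  hence
# p_W(x) = inf{lam : x in lam*1 + W} = min(x) - 1.
in_W = lambda w: min(w) <= 1.0
p_W = lambda x: min(x) - 1.0

rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.uniform(-4.0, 4.0, size=2)
    # W coincides with the sublevel set {p_W <= 0}
    assert in_W(x) == (p_W(x) <= 0.0)
    # and x always lies in p_W(x)*1 + W (the infimum is attained)
    assert min(x - p_W(x)) <= 1.0 + 1e-9
```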
Proof. Since $w$ is a plus-weak Pareto point of $W$, it follows from Lemma 3.1 that $p_W(w) = 0$.
1) $\Longrightarrow$ 2). Suppose that 1) holds. Then, by Definition 3 and Remark 1, we have
$\varphi(y, l) \le p_W(y) \le 0 \quad \forall y \in W$
and $\varphi(w, l) = \varphi_l(w) = p_W(w) = 0$. Hence, $\sup_{y \in W} \varphi(y, l) \le 0 = \varphi(w, l)$.
2) $\Longrightarrow$ 1). Assume that 2) holds. Let $y \in X$ and $x = y - p_W(y)\mathbf{1}$. Since, by Proposition 3, $p_W$ is a topical function, it follows that $p_W(x) = 0$. In view of Remark 1, we have $x \in W$. Thus, by hypothesis, $\varphi(x, l) \le 0$. This implies that $\varphi_l(y) \le p_W(y)$ for all $y \in X$. Also, we have $\varphi_l(w) := \varphi(w, l) = 0 = p_W(w)$. Hence, by Definition 3, $l \in \partial_{X_\varphi} p_W(w)$, which completes the proof. $\square$
$\lambda_0 \mathbf{1} + w_0 \in W$, and hence by hypothesis, $\varphi(\lambda_0 \mathbf{1} + w_0, l) \le 0$. This implies, since $\varphi(\cdot, l)$ is a topical function, that
This is a contradiction. $\square$
The following example shows that a plus-weak Pareto point of a closed downward set $W$ need not separate $W$ and the ball $B(x_0, r_0)$.
Example 1. Let $X = \mathbb{R}^2$ with the maximum norm $\|x\| = \max_{1 \le i \le 2} |x_i|$ and let
$W = \{(w_1, w_2) \in \mathbb{R}^2 : \min\{w_1, w_2\} \le 1\},$
$x_0 = (2,2) \in X \setminus W$ and $w_0 = (1,3)$. It is clear that $C$ is a closed convex normal cone in $X$, $W$ is a closed downward subset of $X$ and $w_0 \in \operatorname{bd} W$. Also, we have $\mathbf{1} = (1,1) \in \operatorname{int} C$. We have $d(x_0, W) = 1 = \|x_0 - g_0\|$, where $g_0 = (1,1)$ is the least element of the set $P_W(x_0)$. Since $w_0 \in \operatorname{bd} W$, it follows from Lemma 2 that $w_0$ is a plus-weak Pareto point of $W$, and we have also $r_0 := \|x_0 - w_0\| = 1 = d(x_0, W)$.
Now, let $l = -w_0$ and $w = (w_1, w_2) \in W$ be arbitrary. Then we have
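The metric claims of Example 1 can be spot-checked numerically (an added sketch, not part of the text; the grid sampling of $W$ is an assumption):

```python
import numpy as np

# Example 1: X = R^2 with the max norm, W = {min(w1, w2) <= 1},
# x0 = (2, 2), w0 = (1, 3), g0 = (1, 1).
x0 = np.array([2.0, 2.0])
w0 = np.array([1.0, 3.0])
g0 = np.array([1.0, 1.0])

sup_norm = lambda z: np.max(np.abs(z))
assert sup_norm(x0 - w0) == 1.0       # r0 = ||x0 - w0|| = 1
assert sup_norm(x0 - g0) == 1.0       # ||x0 - g0|| = 1

# Estimate d(x0, W) by sampling W on a grid (step 0.25, exact in binary).
t = np.linspace(-5.0, 5.0, 41)
W1, W2 = np.meshgrid(t, t)
in_W = np.minimum(W1, W2) <= 1.0
dist = np.maximum(np.abs(W1 - x0[0]), np.abs(W2 - x0[1]))
assert abs(dist[in_W].min() - 1.0) < 1e-12   # d(x0, W) = 1
```

The minimum is attained on the grid at $(1, 2)$, consistent with $d(x_0, W) = 1$.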
Proof. Let $g_0 = x_0 - r\mathbf{1}$ be the least element of the set $P_W(x_0)$. It is clear that $g_0 \le x_0$. Now, assume if possible that $w_0 \notin P_W(x_0)$. Then $r < r_0$. Choose $\lambda \in \mathbb{R}$ such that $1 - r_0 r^{-1} < \lambda < 0$, and let $w = \lambda x_0 + (1 - \lambda)g_0$. Since $g_0 \le x_0$, it follows that $w - g_0 = \lambda(x_0 - g_0) \le 0$, and so $w \le g_0$. Since $W$ is a downward set and $g_0 \in W$, we conclude that $w \in W$. Also, we have
$\|x_0 - w\| = \|x_0 - \lambda x_0 - (1 - \lambda)g_0\| = (1 - \lambda)\|x_0 - g_0\| = (1 - \lambda)r < r_0,$
It is not difficult to show that the function $\psi_y$ is topical and Lipschitz continuous, and consequently $\psi$ is continuous (see Proposition 1 and its corollaries).
It is not difficult to show that for any function $f : X \to \overline{\mathbb{R}}$, $f^{L(\varphi)}$ is a topical function, and hence we conclude that $f$ is a topical function, which completes the proof. $\square$
The proof of the following theorem is similar to that in the finite dimensional case (see [RS01]).
Theorem 7. Let $f : X \to \overline{\mathbb{R}}$ be a plus-homogeneous function and $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function such that $\theta(\cdot, y)$ ($y \in X$) is a topical function. Then
$f^{L(\theta)}(y) = \sup_{x \in X, \ f(x) = 0} \theta(x,y) = \sup_{x \in S_0(f)} \theta(x,y) \quad (y \in X).$
Proposition 5. Let $\varphi$ and $\psi$ be the functions defined by (3) and (21), respectively. Let $f : X \to \overline{\mathbb{R}}$ be a plus-homogeneous function. Then the following assertions are true:
1) We have
2) We have
Proof. 1). It is easy to check that $f^{L(\psi)}$ and $f^{L(\varphi)}$ are topical functions. Since $\psi$ and $\varphi$ are symmetric coupling functions, it follows from Remark 3 that $c(\psi) = c(\psi)'$ and $c(\varphi) = c(\varphi)'$. Therefore, by Theorem 7 and Corollary 5, we conclude that
and
and
$\Delta_\theta(G) := \{w \in W : \theta(g, w) \le 0, \ \forall g \in G\} \quad (G \subseteq V),$
which will also be called the conjugate of type Lau with respect to $\theta$, and denoted by $f^{L(\theta)}$; we have
Remark 4. We recall (see [Sin87]) that if $V$ and $W$ are sets, then for any duality $\Delta : 2^V \to 2^W$ and any function $f : V \to \overline{\mathbb{R}}$, the lower level set $S_\lambda(f^{L(\Delta)})$ ($\lambda \in \mathbb{R}$) has the following form:
$S_\lambda(f^{L(\Delta)}) = \bigcap_{y \in V, \ f(y) \le -\lambda} \Delta(\{y\}).$
Corollary 6. Let $\varphi$ and $\psi$ be the functions defined by (3) and (21), respectively. Let $f : X \to \overline{\mathbb{R}}$ be any function such that
$y \in X, \quad x + y \in C$
We recall (see [Sin02, Lemma 3.3]) that if $V$ is a set and $\theta : V \times V \to \overline{\mathbb{R}}$ is a symmetric coupling function, then the conjugate of type Lau $L(\theta) : \overline{\mathbb{R}}^V \to \overline{\mathbb{R}}^V$ of (29) is self-dual, that is, $L(\theta) = L(\theta)'$. Also, if $V$ and $W$ are sets and $\Delta : 2^V \to 2^W$ is any duality, then the biconjugate of type Lau of a function $f : V \to \overline{\mathbb{R}}$ with respect to $\Delta$ is the function $f^{L(\Delta)L(\Delta)'} : V \to \overline{\mathbb{R}}$ defined by $f^{L(\Delta)L(\Delta)'} := (f^{L(\Delta)})^{L(\Delta)'}$ (see [Sin87]). In particular, for the function $\varphi$ defined by (3) and the $\varphi$-duality $\Delta_\varphi : 2^X \to 2^X$ of (30), we have $f^{L(\varphi)L(\varphi)'} = (f^{L(\varphi)})^{L(\varphi)'}$, where $f : X \to \overline{\mathbb{R}}$ is a function.
Theorem 10. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$ the following assertions are equivalent:
1) We have
Now, let $y \in X$ be fixed and $x \in X$ be such that $x \ge -y$. Since by Proposition 6, $f^{L(\varphi)}$ is an increasing function, it follows that $-f^{L(\varphi)}(x) \le -f^{L(\varphi)}(-y)$, and so in view of (35) and (31) and the fact that $\operatorname{cl}(\operatorname{int} C) = C$, we get
$x \in X, \quad x \ge -y$
Theorem 11. Let $\varphi$ be the function defined by (3). Then for any topical function $f : X \to \overline{\mathbb{R}}$, we have
References
[GG98] Gaubert, S., Gunawardena, J.: A non-linear hierarchy for discrete event
dynamical systems. Proc. 4th Workshop on discrete event systems, Cal-
giari. Technical Report HPL-BRIMS-98-20, Hewlett-Packard Labs. (1998)
[Gun98] Gunawardena, J.: An introduction to idempotency. Cambridge University
Press, Cambridge (1998)
[Gun99] Gunawardena, J.: Prom max-plus algebra to non-expansive mappings: a
non-linear theory for discrete event systems. Theoretical Computer Sci-
ence, Technical Report HPL-BRIMS-99-07, Hewlett-Packard Labs. (1999)
[GK95] Gunawardena, J., Keane, M.: On the existence of cycle times for some non-
expansive maps. Technical Report HPL-BRIMS-95-003, Hewlett-Packard
Labs. (1995)
[MRS02] Martinez-Legaz, J.-E., Rubinov, A.M., Singer, L: Downward sets and their
separation and approximation properties. Journal of Global Optimization,
23, 111-137 (2002)
[MR05] Mohebi, H., Rubinov, A.M.: Best approximation by downward sets with
applications. Journal of Analysis in Theory and Applications, (to appear)
(2005)
[Rub00] Rubinov, A.M.: Abstract Convexity and Global Optimization. Kluwer
Academic Publishers, Boston/Dordrecht/London (2000)
Topical Functions and Their Properties 361
[RSOl] Rubinov, A.M., Singer, I.: Topical and sub-topical functions, downward
sets and abstract convexity. Optimization, 50, 307-351 (2001)
[Sin74] Singer, I.: The theory of best approximation and functional analysis. Regional Conference Series in Applied Mathematics, 13 (1974)
[Sin87] Singer, I.: Abstract Convex Analysis. Wiley-Interscience, New York (1987)
[Sin02] Singer, I.: Further application of the additive min-type coupling function.
Optimization, 51, 471-485 (2002)
Part III
Applications
Dynamical Systems Described by Relational
Elasticities with Applications
Summary. In this paper we describe a new method for modelling dynamical sys-
tems assuming that the information about the system is presented in the form of
a data set. The main idea is to describe the relationships between two variables as
influences of the changes of one variable on another. The approach introduced was
examined in data classification and global optimization problems.
1 Introduction
In [Mam94] a new approach for mathematical modeling of dynamical systems
was introduced. This approach was further developed in [Mam01a]-[MYA04]
and has been applied to solving many problems, including data classifica-
tion and global optimization. This paper gives a systematic survey of this approach.
The approach is based on a non-functional relationship between two variables which describes the influence of the change (increase or decrease) of one variable on the change of the other variable. It can be considered as a certain analog of the elasticity used in the literature (see, for example, [Int71]). We shall refer to this relationship between variables as relational elasticity (fuzzy derivative in [Mam94, Mam01b, MY01]).
In [MM02] the notion of influence (of one state on another state) as a
measure of the non-local contribution of a state to the value function at other
states was defined. Conditional probability functions were used in this defini-
tion, but the idea behind this notion is close to the notion of influence used in
[Mam94]. The calculations undertaken in [Mam01a, MY01] have shown that
366 M.A. Mammadov et al.
this definition of the influence provides better results than the use of conditional probability.
As mentioned in [MM02] the notion of influence is also closely related to
dual variables (or shadow prices in economics) for some problems (see, for
example, [Gor99]).
We now describe some situations where the notion of relational elasticity can be applied. Classical mathematical analysis, which is based on the notion of functional dependence, is suitable for examining many situations where the influence of one variable on another can be explicitly described. Probability theory is used in situations where such a dependence is not clear. However, this theory does not cover many real-world situations. Indeed, probability can be used for examining situations which repeat (or can be repeated) many times. Attempts to use probability theory in uncertain situations which cannot be repeated many times may lead to great errors.
We consider here only real-valued variables (some generalizations to vector-valued variables are also possible; however, we do not consider them in the current paper). One of the main properties of a real-valued variable is monotonicity. We define the notion of influence by the increase or decrease of one variable on the increase or decrease of the other. We can consider the change of a variable as a result of the activity of some unknown forces. In many instances our approach can be used for finding the resulting state without an explicit description of the forces. Although the forces are unknown, this approach allows us to predict their action and, as a result, to predict the behavior of the system and/or give a correct forecast. In this paper we attempt to describe the forces acting on the system through the influences between variables and to describe dynamical systems generated by these forces.
The suggested approach to describing relationships between variables has been successfully applied to data classification problems (see [Mam01a]-[MY01] and references therein). In this paper we concentrate only on some applications of dynamical systems generated by this approach, and on trajectories of these systems.
In Section 5, we examine the dynamical systems approach to data clas-
sification by introducing a simple classification algorithm. Using dynamical
system ideas (trajectories) makes the results obtained by such a simple algorithm comparable with those obtained by other algorithms designed for data classification. The main idea behind this algorithm is
close to some methods used in Nonlinear Support Vector Machines (see, for
example, [Bur98]) where the domain is mapped to another space using some
nonlinear (mainly, quadratic) mappings. In our case the transformation of the
domain is made using the forces acting at each point of the domain.
The main application of this dynamical systems approach is to global opti-
mization problems. In Section 6, we describe a global optimization algorithm
based on this approach. The algorithm uses a new global search mechanism
based on dynamical systems generated by the given objective function. The
Dynamical Systems with Applications 367
results, obtained for many test examples and some difficult practical problems
([Mam04, MYA04]), have shown the efficiency of this global search mechanism.
points found so far) we need to develop special techniques for computing the
functions ξᵢ(x, y) (see Section 3).
Therefore, the functions ξᵢ, i = 1, 2, 3, 4, completely describe the influence of the variable y on x in terms of changes. We will call it the relational elasticity between the two variables and denote it by dx/dy.
Let ξ(x, y) = (ξ₁(x, y), ξ₂(x, y), ξ₃(x, y), ξ₄(x, y)). So we have dx/dy = ξ(x, y), where ξ₁(x, y), ξ₂(x, y), ξ₃(x, y) and ξ₄(x, y) are non-negative valued functions.
By analogy we define dy/dx as the influence of x on y. Let dy/dx = η(x, y), where η = (η₁, η₂, η₃, η₄), and η₁ = d(x↑ y↑), η₂ = d(x↑ y↓), η₃ = d(x↓ y↓), η₄ = d(x↓ y↑).
Thus, the relationship between variables x and y will be described in the
following form:
dx/dy = ξ(x, y), dy/dx = η(x, y). (1)
The examples of relationships presented below show that the system (1) covers quite a large range of relations, including those that cannot be described by functions (or even set-valued mappings).
1. A homotone relationship. Assume that ξ₁(x, y) ≫ ξ₂(x, y), ξ₃(x, y) ≫ ξ₄(x, y) and η₁(x, y) ≫ η₂(x, y), η₃(x, y) ≫ η₄(x, y).
This case can be considered as a homotone relationship, because the in-
fluence of the increase (or decrease) of one variable on another is, mainly,
directed in the same direction: increase (or decrease).
2. An antitone relationship. Assume that ξ₁(x, y) ≪ ξ₂(x, y), ξ₃(x, y) ≪ ξ₄(x, y) and η₁(x, y) ≪ η₂(x, y), η₃(x, y) ≪ η₄(x, y).
This case can be considered as an antitone relationship, because the in-
fluence of the increase (or decrease) of one variable on another is, mainly,
directed in the inverse direction: decrease (or increase).
3. Assume that the influence of y on x is such that dx/dy = (a, a, a, a), where a > 0. In this case the variable x may increase or decrease with the
same degree and these changes do not depend on y. We can say that the
influence of y on x is quite indefinite.
4. Let dx/dy = (a, 0,0, a), (a > 0). In contrast to case 3, in this case the
influence of y on x is quite definite; every change in y increases x.
5. Let dx/dy = (a, 0, b, 0), where a, b > 0 and a ≠ b. This is a special case (known as hysteresis) of the homotone relationship considered above, where as y increases x increases strongly, and when y decreases x decreases less strongly. If such a relationship is valid at all points (x, y) then the dependence between these variables cannot be described by mappings like y = y(x) or x = x(y).
More complicated relationships arise when all the components in dx/dy
are not zero. This is the case that we have when dealing with real problems
where the information about the systems is given in the form of some datasets.
For the global approach the numbers M₁, M₁₁, M₁₂, M₂, M₁₃, M₁₄ stand for the number of points (xᵐ, yᵐ) satisfying xᵐ > x⁰; xᵐ > x⁰ and yᵐ > y⁰; xᵐ > x⁰ and yᵐ < y⁰; xᵐ < x⁰; xᵐ < x⁰ and yᵐ < y⁰; xᵐ < x⁰ and yᵐ > y⁰, respectively. In the local approach we use xᵐ ∈ (x⁰, x⁰ + ε) and xᵐ ∈ (x⁰ − ε, x⁰) instead of xᵐ > x⁰ and xᵐ < x⁰.
Note that according to Remark 3.1 we could define the changes of the variable y by taking any small number δ > 0. For instance, we could take yᵐ > y⁰ + δ instead of yᵐ > y⁰.
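The global-approach counting can be sketched as follows. The function and variable names are ours, and the final ratio normalisation is an assumption (the precise formulas of Section 3 are not reproduced in this excerpt):

```python
# Sketch of the "global approach" counts for a reference point (x0, y0):
# M1 counts observations with x^m > x0; M11/M12 split them by the sign
# of the accompanying change in y; M2, M13, M14 do the same for x^m < x0.
# Names and the ratio normalisation below are ours, not the paper's.

def global_counts(data, x0, y0):
    M1 = M11 = M12 = M2 = M13 = M14 = 0
    for xm, ym in data:
        if xm > x0:
            M1 += 1
            if ym > y0:
                M11 += 1      # x up, y up
            elif ym < y0:
                M12 += 1      # x up, y down
        elif xm < x0:
            M2 += 1
            if ym < y0:
                M13 += 1      # x down, y down
            elif ym > y0:
                M14 += 1      # x down, y up
    return M1, M11, M12, M2, M13, M14

def elasticity(data, x0, y0):
    """Turn the counts into the four components of dy/dx = (eta1..eta4)."""
    M1, M11, M12, M2, M13, M14 = global_counts(data, x0, y0)
    eta1 = M11 / M1 if M1 else 0.0    # d(x up,   y up)
    eta2 = M12 / M1 if M1 else 0.0    # d(x up,   y down)
    eta3 = M13 / M2 if M2 else 0.0    # d(x down, y down)
    eta4 = M14 / M2 if M2 else 0.0    # d(x down, y up)
    return eta1, eta2, eta3, eta4
```

The local approach would only change the membership tests to the windows (x⁰, x⁰ + ε) and (x⁰ − ε, x⁰).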
where X⁺ = {m : Δxᵢᵐ > 0}; X⁺⁺ = {m : Δxᵢᵐ > 0, Δfᵐ > 0}; X⁻ = {m : Δxᵢᵐ < 0}; X⁻⁺ = {m : Δxᵢᵐ < 0, Δfᵐ > 0}; F⁺ = {m : Δfᵐ > 0}; F⁺⁺ = {m : Δfᵐ > 0, Δxᵢᵐ > 0}; F⁺⁻ = {m : Δfᵐ > 0, Δxᵢᵐ < 0}.
The coefficients aᵢᵐ = (|Δxᵢᵐ| / ‖xᵐ − x⁰‖)² are used to indicate the contribution of the coordinate i in the change ‖xᵐ − x⁰‖. Clearly, a₁ᵐ + ... + aₙᵐ = 1 for all m.
4 Dynamical systems
In this section we present some notions introduced in [Mam94] which have
been used for studying the changes in the system.
Consider a system which consists of two variables x and y, and assume that
at every point (x, y) the relationship between them is presented by relational
elasticities (1); that is:
dx/dy = ξ(x, y), dy/dx = η(x, y).
In this case we say that a Dynamical System is given. Here we study the
changes of these variables using only the information obtained from relational
elasticities. In this way the notion of forces introduced below will play an
important role.
Definition 1. At a given point (x, y): the quantities F(x↑) = η₁ξ₁ + η₂ξ₄ and F(x↓) = η₃ξ₃ + η₄ξ₂ are called the forces acting from y on the increase and decrease of x, respectively; the quantity F(x) = F(x↑) + F(x↓) is called the force acting from y on x. By analogy, the forces F(y), F(y↑), F(y↓) acting from x on y are defined: F(y) = F(y↑) + F(y↓), F(y↑) = ξ₁η₁ + ξ₂η₄, F(y↓) = ξ₃η₃ + ξ₄η₂.
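Definition 1 can be sketched directly in code. The pairing of components below (η₁ξ₁ + η₂ξ₄ for the increase of x, and so on) follows our reading of the partly garbled display, so treat it as an assumption:

```python
# Forces of Definition 1 from the elasticity tuples xi = dx/dy and
# eta = dy/dx. The component pairing is our reading of the partly
# garbled source display (each term couples an influence with its
# feedback through the other variable).

def forces(xi, eta):
    xi1, xi2, xi3, xi4 = xi
    eta1, eta2, eta3, eta4 = eta
    F_x_up   = eta1 * xi1 + eta2 * xi4   # force from y on the increase of x
    F_x_down = eta3 * xi3 + eta4 * xi2   # force from y on the decrease of x
    F_y_up   = xi1 * eta1 + xi2 * eta4   # force from x on the increase of y
    F_y_down = xi3 * eta3 + xi4 * eta2   # force from x on the decrease of y
    return F_x_up + F_x_down, F_y_up + F_y_down
```

For the gravitational configuration dx/dy = (ξ₁, 0, 0, ξ₄), dy/dx = (0, η₂, η₃, 0) the two totals coincide, in line with the Newton's-third-law analogy discussed in the text.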
The main sense of this definition, for example for F(x↑), becomes clear from the expression
This proposition states that the size of the force on x equals the size of the
force on y. It can be considered as a generalization of Newton's Third Law of
Motion. To explain this statement, and, also, the reasonableness of Definition
1, we consider one example from Mechanics.
Assume that there are two particles placed on a line, and x, y are their coordinates. Let x < y. Then, in terms of gravitational influences, we would have

dx/dy = (ξ₁, 0, 0, ξ₄), dy/dx = (0, η₂, η₃, 0),

where ξ₁, ξ₄, η₂, η₃ > 0. Then, from Definition 1 it follows that
η₁ξ₁ = 0 (7)
η₃ξ₃ = 0 (8)
η₂ξ₂ = 0 (9)
η₄ξ₄ = 0 (10)
Consider two cases.
1). Let ξ₂ = 0. In this case we have
0 (i) . 3 ^ 1
(10)
6 =1 m ^^
(3) (4)- 771-1 -^ (6).
6 =0 ^ 6 =1
2). Let η₁ = 0. In this case we have
ξ₂ = 1 ⇒ ξ₃ = 0
dx/dy = (ξ₁(x, y), 0, ξ₃(x, y), 0), dy/dx = (η₁(x, y), 0, η₃(x, y), 0).
In this case we can say that there are no internal forces creating the changes
in the system. Changes in the system may arise only as a result of outside
forces.
x(t + 1) = x(t) + α · Sign(Δ(t)), (11)

where

Sign(a) = 1 if a > 0; 0 if a = 0; −1 if a < 0.

In the second method we set

x(t + 1) = x(t) + α · Δ(t). (12)

The difference between these formulae is that in (12) the variables are changed with different steps along the direction Δ(t), whilst in (11) all the variables are changed with the same step α > 0.
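The two update rules can be sketched as follows; the direction Δ(t) is supplied by the caller, and all names are illustrative:

```python
# The two update rules: rule (11) moves every coordinate by the same
# fixed step alpha in the componentwise sign of the direction delta,
# while rule (12) moves each coordinate proportionally to delta.

def sign(a):
    return 1 if a > 0 else (-1 if a < 0 else 0)

def step_sign(x, delta, alpha):
    """Rule (11): same step alpha for every coordinate."""
    return [xi + alpha * sign(di) for xi, di in zip(x, delta)]

def step_proportional(x, delta, alpha):
    """Rule (12): steps proportional to the components of delta."""
    return [xi + alpha * di for xi, di in zip(x, delta)]
```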
Consider an example.
Example 1. Consider a domain D = {(a, b) : a ∈ [0, 10], b ∈ [1, 10]}. Assume that the field of forces in the domain D is defined by the data {(x, y)} presented in Table 1. Using this data, we can calculate the forces acting at each
point (x, y) ∈ D and then calculate trajectories of system (1). First we calculate the values of the relational elasticities dy/dx and dx/dy by the local approach, taking ε = 1.1 (see Section 3). Then we generate trajectories taking α = (0.5)ᵏ and different initial points. We consider two cases: k = 0 and k ≥ 1.
Therefore, in this case, the domain D can be divided into two parts, as can the data presented in Table 1. Clearly, if k → ∞ then P₁ → {(4, 4)}, P₂ → {(3, 4)} in the Hausdorff metric.
We observe that there are three sets P₁, P₂, P₃ for k = 0 and two sets P₁, P₂ for k ≥ 1, which are the limit cycles for all trajectories. This means that the turnpike property holds for this example (see [MR73]). Thus, the idea of describing dynamical systems in the form (1) and the study of trajectories of this system can be used in different problems. In the next section, we check this approach on data classification problems. As a domain D we take the heart disease and liver disorder databases.
attributes. Here we treat all the attributes uniformly, i.e., we simply consider m levels for each attribute. Intervals related to these levels are defined only by using the training set and, therefore, the scaled values of the features of an observation depend on the training set.
{ 1 if xⱼ < a′ⱼ;

where η > 0 is a given tolerance. Since the set A consists of vectors with integer coordinates, we take η = 1/3 in the calculations below.
Clearly we cannot expect a good performance from such a simple algorithm (see the results presented in Table 2 for T = 0), but by considering
trajectories starting from test points, we can increase the accuracy of classifi-
cation. The results obtained in this way are even comparable with the results
obtained by other classification algorithms (see Table 3).
We define the field of forces in ℝⁿ using the set A which contains all training examples from both classes. At a given point x = (x₁, ..., xₙ) the relational elasticities are calculated for each pair of features (i, j) by the global approach (see Section 3). Let F(xⱼ → xᵢ ↓) and F(xⱼ → xᵢ ↑) be the forces acting from the feature j to decrease and increase, respectively, the feature i at the point x. Then the resulting force on the feature i is defined as the sum of all these forces; that is,

F(xᵢ) = Σ_{j≠i} [F(xⱼ → xᵢ ↓) + F(xⱼ → xᵢ ↑)]. (13)
Then, given a new (test) point x*, we calculate (as in Example 1) a trajectory x(t) (t = 0, 1, 2, ..., T) starting from this point. We use a step α = 0.25. To decrease the influence of circulating effects we transform the trajectory x(t) to x̄(t) by taking the middle points of the last 5 steps; that is,

x̄(t) = (x(t−4) + x(t−3) + x(t−2) + x(t−1) + x(t)) / 5 (t ≥ 4).
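The smoothing step can be sketched as a trailing average. The start-up convention for t < 4 (average of all points so far) is our assumption, since the displayed formula is truncated in the source:

```python
# Smooth a trajectory by averaging the last `window` points at each t,
# damping circulating effects. For t < window - 1 we average all points
# seen so far -- an assumption, as the source formula is truncated.

def smooth(traj, window=5):
    out = []
    for t in range(len(traj)):
        lo = max(0, t - window + 1)
        pts = traj[lo:t + 1]
        # componentwise mean of the selected points
        out.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
    return out
```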
Table 2. Accuracy for test set for the heart disease and liver-disorder databases
with 10-fold cross validation obtained by Algorithm F
T 0 2 4 6 8 10 12 14 16 18 20
Heart 80.0 80.0 80.3 80.7 81.0 81.4 81.4 81.7 81.7 81.7 82.1
Liver 60.6 60.3 63.8 63.8 67.1 67.6 68.5 69.7 69.7 69.4 70.9
T 22 24 26 28 30 32 34 36 38 40 42
Heart 82.1 82.4 82.8 82.4 82.4 82.4 82.1 82.8 82.4 83.1 83.1
Liver 70.3 70.9 70.9 70.6 70.3 70.0 70.0 71.8 71.2 70.6 70.6
Table 3. Results for the heart disease and liver-disorder databases with 10-fold
cross validation obtained by other methods
Heart Liver
Algorithm Ptr Pts Ptr Pts
HMM 87.5 82.8 72.2 66.6
PMM 91.4 82.2 74.9 68.4
RLP 84.5 83.5 69.0 66.9
SVM ‖·‖₁ 85.3 84.6 67.8 64.0
SVM ‖·‖∞ 85.8 82.5 68.7 64.6
SVM ‖·‖₂² 84.7 75.9 60.2 61.0
Now, given a box Bₖ, we describe the procedure of finding a good solution.
Let

x*(0) = argmin{f(x) : x ∈ A(0)}.

3. The set A(0) together with the values of the objective function allows us to generate a dynamical system. Our aim in this step is to find some "good" point x*(1) and add it to the set A(0).
Let t = 0 and let the point x*(t) be the "best" point in the set A(t).
The main part of the algorithm is to determine a direction, say F(t), at the point x*(t), which can provide a better solution x*(t + 1). We can consider F(t) as a global descent direction. For this aim, using the set A(t), we calculate the forces acting on f↓ at the point x*(t) from each variable i ∈ {1, ..., n}. We set F(t) = (F₁(t), ..., Fₙ(t)), where the components Fᵢ(t) = F(i → f ↑) are calculated at the point x*(t) (see Definition 1). Then we define a point x(t + 1) by formula (12); that is, we consider the vector −F(t) as a descent direction and set
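The step just described can be sketched as follows. The crude surrogate used below for the direction F(t) is our own simplification for illustration; the paper builds F(t) from relational elasticities instead:

```python
# One iteration of the global-descent idea: keep a pool A of evaluated
# points, take the best one, estimate a direction F, and step along -F
# as in rule (12). The direction estimate here (worse points "push"
# F toward themselves, so -F points away from them) is a placeholder
# for the relational-elasticity construction of the paper.

def agop_step(f, pool, alpha=0.25):
    best = min(pool, key=f)
    n = len(best)
    F = [0.0] * n
    for p in pool:
        if p == best:
            continue
        df = f(p) - f(best)          # nonnegative, since best is minimal
        for i in range(n):
            F[i] += df * (p[i] - best[i])
    cand = tuple(b - alpha * Fi for b, Fi in zip(best, F))
    # accept the candidate only if it improves on the current best
    return cand if f(cand) < f(best) else best
```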
There are many different methods and algorithms developed for global op-
timization problems (see, for example, [MPV01, PR02, Pin95] and references
therein). Here, we mention some of them and note some aspects.
The algorithm AGOP takes into account some relatively "poor" points for further consideration. This is what many other methods do, such as Simulated Annealing ([Glo97, Loc02]), Genetic Algorithms ([Smi02]) and Taboo Search ([CK02, Glo97]). The choice of a descent (good) direction is the main part of each algorithm. Instead of using a stochastic search (as in the algorithms mentioned), AGOP uses the formula (16), where the direction F(t) is defined by relational elasticities.
Note that the algorithm AGOP has quite different settings and motivations compared with the methods that use so-called "dynamical search" (see [PWZ02] and references therein). Our search method has some ideas in common with the heuristic method which attempts to estimate the "overall" convexity characteristics of the objective function ([DPR97]). That method does not work well when the postulated quadratic model is unsuitable. The advantage of our approach is that we do not use any approximate underestimations (including convex underestimations).
The methods that we use in this paper are quite different from the homotopy and trajectory methods ([Die95, For95]), which attempt to visit (enumerate) all stationary points (local optima) of the objective function and, therefore, cannot be fast for high-dimensional problems. The algorithm AGOP attempts to jump over local minimum points, trying to find "deeper" points that need not be local minima.
Numerical experiments have been carried out on a Pentium III PC with an 800 MHz processor. We use the following notation:
n - the number of variables;
f_min - the minimum value obtained;
f_best - the global minimum or the best known result;
t (sec) - the CPU time in seconds;
N_f - the number of function evaluations.
We used 24 well-known test problems (the list of test problems can be found at [Mam04]). The results obtained by the algorithms AGOP(F) and AGOP(D) are presented in Table 4. We observe that the version AGOP(F) is more stable
in finding global minima in all cases, while the version AGOP(D) has failed in two cases (for the Rastrigin function). In Table 5, we present the elapsed times and the numbers of function evaluations, obtained by AGOP(F), for functions with a large number of variables.
The results obtained have shown the efficiency of the algorithm. For instance, for some of the test examples (where the number of variables could be chosen arbitrarily), the number of variables was increased up to 3000, and the processing time was between 2 minutes (for the Rastrigin and Ackley functions) and 12 minutes (for the Michalewicz function). We could not find comparable results in the literature. For instance, in [LL05] (Genetic Algorithms), the problems for the Rastrigin, Griewank and Ackley functions are solved for up to 1000 variables only, with the numbers of function evaluations [337570, 574561], 563350 and [548306, 686614], respectively (3-digit accuracy was the goal to be achieved). In our case, we have the numbers of function evaluations 174176, 174124 and 185904, respectively (see Table 4), with the complete global search.
Table 5. Elapsed times and the number of function evaluations for AGOP(F)

Function      n     f_Best    f_min       t (sec)   N_f
Ackleys       1000  0         0.000459    21.23     185904
Ackleys       3000  0         0.000516    145.67    530154
Griewank      1000  0         4.248·10^-  42.74     174124
Griewank      3000  0         4.431·10^-  367.09    555123
Levy Nr.1     1000  0         1.433·10^-  22.07     163724
Levy Nr.1     3000  0         3.875·10^-  201.06    463924
Levy Nr.2     1000  0         1.434·10^-  46.75     165724
Levy Nr.2     3000  0         1.292·10^-  380.01    463724
Levy Nr.3     1000  -11.5044  -11.395     24.68     182522
Levy Nr.3     3000  -11.5044  -11.395     174.62    573514
Michalewicz   1000  N/A       -957.0770   68.08     257265
Michalewicz   3000  N/A       -2859.124   715.60    955907
Rastrigin     1000  0         1.440·10^-  20.69     174176
Rastrigin     3000  0         2.159·10^-  1162.07   5091251
References
[BM92] Bennett, K.P., Mangasarian, O.L.: Robust linear programming discrim-
ination of two linearly inseparable sets. Optimization Methods and Software, 1, 23-34 (1992)
[BM98] Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimiza-
tion and support vector machines. In: Shavlik, J. (ed) Machine Learning
Proceedings of the Fifteenth International Conference (ICML'98), 82-90. Morgan Kaufmann, San Francisco, California (1998)
[Bur98] Burges, J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167 (1998)
(http://svm.research.bell-labs.com/SVMdoc.html)
[CM95] Chen, C., Mangasarian, O.L.: Hybrid misclassification minimization.
Mathematical Programming Technical Report, 95-05, University of Wis-
consin (1995)
[CK02] Cvijovic, D., Klinowski, J.: Taboo search: an approach to the multiple-
minima problem for continuous functions. In: Pardalos, P., Romeijn, H.
(eds) Handbook of Global Optimization, 2, Kluwer Academic Publishers
(2002)
[Die95] Diener, I.: Trajectory methods in global optimization. In: Horst, R.,
Pardalos, P. (eds) Handbook of Global Optimization, Kluwer Academic
Publishers (1995)
[DPR97] Dill, K.A., Phillips, A.T., Rosen, J.M.: Molecular structure prediction by
global optimization. In: Bomze, I.M. et al (eds) Developments in Global
Optimization, Kluwer Academic Publishers (1997)
[DKS95] Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised dis-
cretization of continuous features. ICML-95 (1995)
[For95] Forster, W.: Homotopy methods. In: Horst, R., Pardalos, P. (eds) Hand-
book of Global Optimization, Kluwer Academic Publishers (1995)
384 M.A. Mammadov et al.
[Glo97] Glover, F., Laguna, M.: Taboo search. Kluwer Academic Publishers (1997)
[Gor99] Gordon, G.J.: Approximate solutions to Markov decision processes.
Ph.D. Thesis, CS department, Carnegie Mellon University, Pittsburgh,
PA (1999)
[Int71] Intriligator, M.D.: Mathematical Optimization and Economic Theory.
Prentice-Hall, Englewood Cliffs (1971)
[LL05] Lazauskas, L.: http://solon.cma.univie.ac.at/~neum/glopt/results/ga.html
- Some Genetic Algorithms Results (collected by Leo Lazauskas) (2005)
[KLT03] Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search:
new perspectives on some classical and modern methods. SIAM Review,
45, 385-482 (2003)
[Loc02] Locatelli, M.: Simulated annealing algorithms for continuous global opti-
mization. In: Pardalos, P., Romeijn, H. (eds) Handbook of Global Opti-
mization, 2, Kluwer Academic Publishers (2002)
[MR73] Makarov, V.L., Rubinov, A.M.: Mathematical theory of economic dynam-
ics and equilibria, Nauka, Moscow (1973) (English trans.: Springer-Verlag,
New York, 1977)
[Mam94] Mamedov, M.A.: Fuzzy derivative and dynamic systems. In: Proc. of the
Intern. Conf. On Appl. of Fuzzy systems, ICAFS-94, Tabriz (Iran), Oct.
17-19, 122-126 (1994)
[Mam01a] Mammadov, M.A.: Sequential separation of sets with a given accuracy
and its applications to data classification. In: Proc. of the 16th National
Conference of the Australian Society for Operations Research in conjunction
with Optimization Day, 23-27 Sep., McLarens on the Lake Resort, South
Australia (2001)
[Mam01b] Mammadov, M.A.: Fuzzy derivative and its applications to data classification. The 10th IEEE International Conference on Fuzzy Systems,
Melbourne, 2-5 Dec. (2001)
[Mam04] Mammadov, M.A.: A new global optimization algorithm based
on dynamical systems approach. In: Rubinov, A., Sniedovich, M.
(eds) Proceedings of The Sixth International Conference on Optimization: Techniques and Applications (ICOTA6), University of Ballarat,
Australia, Dec. 2004, Article index number 198 (94th article); Also
in: Research Report 04/04, University of Ballarat, Australia (2004)
(http://www.ballarat.edu.au/ard/itms/publications/researchPapers.shtml)
[MRY01] Mammadov, M.A., Rubinov, A.M., Yearwood, J.: Sequential separation
of sets and its applications to data classification. In: Proc. of the Post-gr.
ADFA Conf. On Computer Science, 14 Jul. 2001, Canberra, Australia,
75-80 (2001)
[MSY04] Mammadov, M.A., Saunders, G., Yearwood, J.: A fuzzy derivative ap-
proach to classification of outcomes from the ADRAC database. Interna-
tional Transactions in Operational Research, 11, 169-179 (2004)
[MY01] Mammadov, M.A., Yearwood, J.: An induction algorithm with selec-
tion significance based on a fuzzy derivative. In: Abraham, A., Koeppen,
M. (eds) Hybrid Information Systems, 223-235. Physica-Verlag, Springer
(2001)
[MYA04] Mammadov, M.A., Yearwood, J., Aliyeva, L.: Multi-label classification and drug-reaction associations using global optimization techniques. In: Rubinov, A., Sniedovich, M. (eds) Proceedings of The Sixth
Impulsive Control of a Sequence of Rumour Processes

¹ School of Mathematics
The University of Adelaide
Adelaide, SA 5005, Australia
[email protected], [email protected]
² School of Mathematics and Statistics
University of South Australia
Mawson Lakes, SA 5095, Australia; Departamento de Sistemas e Computação
Universidade Federal do Rio de Janeiro
Rio de Janeiro, Brazil
[email protected]
1 Introduction
Stochastic rumour models were introduced by Daley and Kendall [DK65], who
considered a single initial spreader introducing a rumour into a closed pop-
ulation. Initially the remainder of the population do not know the rumour
and as such are termed ignorants. The members of the population meet one
another with uniform mixing. A spreader-ignorant interaction converts the
ignorant into a spreader. When two spreaders interact, they stop spreading
388 C. Pearce et al.
di/dt = −is, (1)
ds/dt = −s(1 − 2i). (2)

Eliminating t between (1) and (2) and integrating, the proportion ζ of ignorants remaining when spreading ceases (s = 0) satisfies, in the classical case,

ζ e^{−2ζ} = e^{−2}. (3)

This is the equation used by Daley and Kendall to determine that in their classical case ζ ≈ 0.2031878.
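The classical value can be recovered numerically. Assuming the standard Daley-Kendall conservation relation ζe^{−2ζ} = e^{−2} (equivalently 2ζ − ln ζ = 2), bisection on (0, 1/2), where x e^{−2x} is strictly increasing, gives ζ:

```python
import math

# Recover the classical Daley-Kendall value: the final proportion of
# ignorants zeta solves zeta * exp(-2*zeta) = exp(-2). On (0, 1/2) the
# map x -> x*exp(-2x) is strictly increasing, so bisection converges
# to the unique root there.

def final_ignorants():
    f = lambda x: x * math.exp(-2 * x) - math.exp(-2)
    lo, hi = 1e-9, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(final_ignorants())  # approximately 0.2031878
```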
It is interesting to look at the case α → 0, in other words when there are almost no initial ignorants in the population. For this purpose we introduce a new variable

η = ζ/α.
Impulsive Control of a Sequence of Rumour Processes 391
η → 1/e ≈ 0.368.
Thus even when there is a small initial proportion of ignorants and a large
initial proportion of spreaders, about 36.8% of the ignorant population never
hear the rumour. This result is given in [BP04].
We shall make repeated use of the following theorem, which plays the role of a basis result for subsequent inductive arguments. Here we are examining the variation of ζ with respect to one of α, β, γ subject to (4), with another of α, β, γ being fixed.
This is [BP04, Theorem 3], except that the statements there corresponding to (a) and (b) are for α < 1/2 and α > 1/2 respectively. The extensions to include α = 1/2 follow trivially from the continuity of ζ as a function of α.
3 Scenario 1

We now address a compound rumour process in which n > 1 broadcasts are made under Scenario 1. We shall show that the final proportion of the population never hearing a rumour is minimised when and only when the second and subsequent broadcasts are made at the successive epochs at which s = 0 occurs. We refer to this procedure as control policy S. It is convenient to
consider separately the cases 0 < α < 1/2 and α ≥ 1/2. Throughout this and the following two sections, ξ denotes the final proportion of the population hearing none of the sequence of rumours.
Theorem 3. Suppose (4) holds with 0 < α < 1/2, that Scenario 1 applies and n > 1 broadcasts are made. Then
(a) ξ is minimised if and only if the control policy S is adopted;
(b) for β fixed, ξ is a strictly increasing function of α under control policy S.
Proof. Let T be an optimal control policy, with successive broadcasts occurring at times τ₁ < τ₂ < ... < τₙ. We denote the proportion of ignorants in the population at τₖ by iₖ (k = 1, ..., n), so that i₁ = α. Since i is strictly decreasing during the course of each rumour and is continuous at a broadcast epoch, we have from applying Theorem 1 to each broadcast in turn that
all the inequalities being strict unless two consecutive broadcasts are simultaneous.
Suppose, if possible, that s > 0 at time τₙ − 0. Imagine the broadcast about to be made at this epoch were postponed and s allowed to decrease to zero before that broadcast is made. Denote by ξ′ the corresponding final proportion of ignorants in the population. Since i decreases strictly with time, the final broadcast would then occur when the proportion of ignorants had a value

iₙ > i′ₙ > ξ.
Applying Theorem 2(a) again, to the last two broadcasts, gives that iₙ is a strictly increasing function of iₙ₋₁ and that ξ is strictly increasing in iₙ. Hence ξ is strictly increasing in iₙ₋₁.
If n = 2, we have nothing left to prove, so suppose n > 2. We shall derive the desired results by backward induction on the broadcast labels. We suppose that for some k with 2 ≤ k ≤ n we have
(i) s = 0 at time τⱼ − 0 for j = k, k + 1, ..., n;
(ii) ξ is a strictly increasing function of iₖ₋₁.
To establish the inductive step, we need to show that s = 0 at τₖ₋₁ − 0 and that ξ is a strictly increasing function of iₖ₋₂. The previous paragraph provides a basis k = n for the backward induction.
If s > 0 at τₖ₋₁ − 0, then we may envisage again modifying the system, allowing s to reduce to zero before making broadcast k − 1. This entails that,
For the counterpart result for α ≥ 1/2, it will be convenient to extend the notation of Theorem 2 and use ζ(i) to denote the final proportion of ignorants when a single rumour beginning with state (i, β, 1 − i − β) has run its course.
Theorem 4. Suppose (4) holds with α ≥ 1/2, that Scenario 1 applies and n > 1 broadcasts are made. Then
(a) ξ is minimised if and only if the control policy S is adopted;
(b) for fixed β, ξ is a strictly decreasing function of α under control policy S.
Proof. First suppose that iₙ > 1/2. By Theorem 1 and (6), this necessitates that s > 0 at time τ₂ − 0. If we withheld broadcast 2 until s = 0 occurred, the proportion i₃ of ignorants at that epoch would then satisfy
The second inequality will be strict unless s = 0 at time τₖ₊₁ − 0. This leads to

i₃ = ζ(i₂) < ζ(iₖ₊₁) ≤ iₖ₊₂ < 1/2,

and proceeding recursively we obtain
Thus we have ξ′ < ξ, again contradicting the optimality of T. Hence we must have k = 1, and so
ln(ξ/α) = 2(ξ − α) − nβ,

which may be rewritten as

ξ e^{−2ξ} = α e^{−(2α+nβ)} (10)

and

ξ e^{−2ξ} = iₖ e^{−(β+2iₖ)}. (12)
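Relation (10) can be solved numerically for ξ. The form used below, ξe^{−2ξ} = αe^{−(2α+nβ)}, is our reconstruction of the partly garbled display; it is equivalent to nβ + 2(α − ξ) + ln ξ − ln α = 0, i.e. to (13):

```python
import math

# Solve xi * exp(-2*xi) = alpha * exp(-(2*alpha + n*beta)) for the
# final proportion xi in (0, 1/2) by bisection (the left side is
# strictly increasing there). This form of (10) is our reconstruction,
# equivalent to n*beta + 2*(alpha - xi) + log(xi/alpha) = 0.

def final_proportion(alpha, beta, n):
    rhs = alpha * math.exp(-(2 * alpha + n * beta))
    f = lambda x: x * math.exp(-2 * x) - rhs
    lo, hi = 1e-12, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The returned value can be checked directly against the logarithmic form of the same relation.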
Consider the limiting case β → 0 and γ → 0, which gives the classical Daley-Kendall limit of a rumour started by a single individual. Since iₖ < 1/2 for 2 ≤ k ≤ n and ξ < 1/2, we have by Lemma 1 that in fact
and subsequent broadcast epochs does not change the system physically. This cannot occur for β > 0, which shows that when the initial broadcast is to a perceptible proportion of the population, as with the mass media, the effects are qualitatively different from those in the situation of a single initial spreader.
The behaviour of iₖ with n = 5 broadcasts is depicted in Figure 1(a) with the traditional choice γ = 0. In generating the graphs, Equation (11) has been solved with initial conditions β = 0, 0.2, 0.4, 0.6, 0.8, 1. The figure illustrates Remark 2.
4 Monotonicity of ξ
In this section we examine the dependence of ξ on the initial conditions for
Scenario 1. Equation (10) can be expressed as
nβ + 2(α − ξ) + ln ξ − ln α = 0.   (13)
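Equation (13) also yields ξ numerically: its left-hand side, viewed as a function of ξ, is strictly increasing on (0, 1/2) and changes sign there, so bisection applies. A minimal sketch (our own illustration, not part of the paper; the function name is an assumption):

```python
import math

def final_ignorants(alpha, beta, n, tol=1e-12):
    """Solve equation (13): n*beta + 2*(alpha - y) + ln(y) - ln(alpha) = 0
    for y in (0, min(alpha, 1/2)) by bisection.

    g(y) below is strictly increasing for y < 1/2, tends to -inf as y -> 0,
    and is positive at min(alpha, 1/2), so the root is unique."""
    g = lambda y: n * beta + 2 * (alpha - y) + math.log(y) - math.log(alpha)
    lo, hi = 1e-300, min(alpha, 0.5)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `final_ignorants(0.4, 0.3, 2)` returns the unique root of (13) in (0, 0.4); increasing β decreases the root, in line with Theorem 5(b).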
A single broadcast may be regarded as an instantiation of Scenario 1 with
n = 1. The outcome is independent of the control policy. This enables us to
derive the following extension of Theorem 2 to n ≥ 1 broadcasts, ξ taking
the role of ζ. We examine the variation of ξ with respect to one of α, β, γ
subject to (4) and one of α, β, γ being fixed. For example, if β is fixed then we
can consider the variation of ξ with respect to α subject to the constraint
α + γ = 1 − β supplied by (4). For clarity we adopt the notation (dξ/dα)_β for
the derivative of ξ with respect to α for fixed β subject to α + γ = 1 − β. We
use corresponding notation for the other possibilities arising with permutation
of α, β, γ.
Theorem 5. Suppose (4) holds with n ≥ 1. Then under Scenario 1 we have
the following.
(a) For β fixed, ξ is strictly increasing in α for α < 1/2 and strictly decreasing
in α for α > 1/2.
(b) For α fixed, ξ is strictly decreasing in β.
(c) For γ fixed, ξ is strictly increasing in α.
Proof. The case n = 1 is covered by Theorem 2, so we may assume that n ≥ 2.
Also Part (a) is simply a restatement of Theorem 3(b) and Theorem 4(b).
For parts (b) and (c), we use the fact that ξ < 1/2. Implicit differentiation
of (13) yields
(dξ/dβ)_α = −nξ/(1 − 2ξ)
and
(dξ/dα)_γ = ξ[1 + (n − 2)α] / [α(1 − 2ξ)]
for any n ≥ 1, which yield (b) and (c) respectively. □
396 C. Pearce et al.
[Figure 1(a). The proportions i_k against k for n = 5 broadcasts with γ = 0 and β = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Corollary 1. For any n ≥ 1, we have ξ̄ := sup ξ = 1/2. This occurs in the
limiting case α = γ → 1/2 with β → 0.
Proof. From Theorem 5(c) we have for fixed γ > 0 that ξ̄ is approached in
the limit α = 1 − γ with β = 0. By Theorem 5(a), we have in the limit β = 0
that ξ̄ arises from α = 1/2. This gives the second part of the corollary.
From (13), ξ̄ satisfies
1 − 2x + ln(2x) = 0.
It is shown in Corollary 4 of [BP04] that this equation has the unique positive
solution x = 1/2. The first part follows. □
Remark 4. In the case α → 0 of a vanishingly small proportion of initial igno-
rants, we have by (15) that
η = e^{−nβ}.   (16)
Thus the ratio of the final proportion of ignorants to those at the beginning de-
cays exponentially at a rate equal to the product of the number n of broadcasts
and the proportion β of initial spreaders. Two subcases are of interest.
(i) The case β → 0 represents a finite number of spreaders in an infinite
population. Almost all of the initial population consists of stiflers, that is,
γ → 1, and we have η = 1. No matter how many broadcasts are made,
the proportion of ignorants remains unchanged.
(ii) In the case β → 1 almost all of the initial population consists of
spreaders, and we obtain η = e^{−n}.
Consider Equation (16) again. For 0 < β < 1, as well as for β → 1, we have
that η → 0 as n → ∞.
The behaviour of Θ_k for the standard case γ = 0 is illustrated in Fig-
ure 1(b), for which we solve (14) with various initial conditions for 5 broad-
casts. This brings out the variation with β more dramatically. The graph
illustrates in particular Remark 4(ii). The curves pass through (1, 1), since
i₁ = α implies Θ₁ = 1.
Remark 5. Given initial proportions α of ignorants and β of subscribers, with
0 < β < 1 or with β → 1, the required number n of broadcasts to achieve a
target proportion η or less of ignorants can be obtained through (15) as
n ≥ −[ln(η) + 2α(1 − η)]/β.
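Rearranging (13) at the target ratio η = ξ/α shows that nβ ≥ −[ln η + 2α(1 − η)] suffices, and the ratio decreases as n grows, so the smallest admissible integer n is immediate to compute. A one-line sketch (our own; the function name is an assumption):

```python
import math

def broadcasts_needed(alpha, beta, eta):
    """Smallest integer n with final/initial ignorant ratio at most eta,
    from n*beta >= -(ln(eta) + 2*alpha*(1 - eta)); compare equation (13)."""
    return math.ceil(-(math.log(eta) + 2 * alpha * (1 - eta)) / beta)
```

In the limit α → 0 this reduces to n ≥ −ln(η)/β, matching (16).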
This necessitates
n = 2(1 − Θ_{n+1})   (19)
and so from (18) that
n(1 − γ) + ln Θ_{n+1} = 0.   (20)
Clearly (19) and (20) are together also sufficient for there to be a point of
concurrence.
Elimination of n between (19) and (20) provides
Denote by η₀ the value of ζ for a (single) rumour in the limit β → 0 and the
same fixed value of γ as in the repeated rumour. We have
From (21) and (22) we can identify Θ_{n+1} = η₀ and (19) then yields n =
2(1 − η₀). We thus have a common point of intersection (3 − 2η₀, η₀). In
particular, for the traditional choice γ = 0, we have η₀ ≈ 0.203 and the
common point is approximately (2.594, 0.203), a point very close to the cluster
of points in Figure 1(b).
5 Convexity of ξ
We now address second-order monotonicity properties of ξ as a function of
α, β, γ in Scenario 1. The properties derived are new for n = 1 as well as
for n ≥ 2. First we establish two results, of some interest in their own right,
which will be useful in the sequel.
Theorem 7. Suppose (4) holds with n ≥ 1 and Scenario 1 applies. For 0 <
x < 1 and ω > 0 define
h(x, ω) := ω + 2(2x − 1) + ln(1 − x) − ln x.
Then
(a) h(x, ω) = 0 defines a unique
x = φ(ω) ∈ (1/2, 1);
(b) φ is strictly increasing in ω;
(c) ξ > 1 − α ⟺ α > φ(nβ) and ξ < 1 − α ⟺ α < φ(nβ).
Proof. We have
∂h/∂x = −(1 − 2x)²/[x(1 − x)] ≤ 0,
with equality if and only if x = 1/2, so h(·, ω) is strictly decreasing on (0, 1).
Also h(1/2, ω) = ω > 0 and h(x, ω) → −∞ as x → 1−. Part (a) follows.
The relation h(x, ω) = 0 may be written as
−ω = 2(2x − 1) + ln(1 − x) − ln x.
Part (b) is an immediate consequence, since the right-hand side is a strictly
decreasing function of x on (0, 1).
Since h is strictly decreasing in x, we deduce from (a) that
h(x, ω) > 0 for x < φ(ω) and h(x, ω) < 0 for x > φ(ω).   (23)
For y ∈ (0, 1) put
g(α, ω, y) := ω + 2(α − y) + ln y − ln α.
We have readily that ∂g/∂y is positive for y < 1/2 and negative for y > 1/2, so
g is strictly increasing in y for y < 1/2 and strictly decreasing in y for y > 1/2.
Also g → ω > 0 as y → α and g → −∞ as y → 0, whence g(α, nβ, ξ) = 0
defines a unique ξ ∈ (0, α ∧ 1/2). We have
ξ ≷ 1 − α according as g(α, nβ, 1 − α) ≷ 0.
But g(α, nβ, 1 − α) = h(α, nβ). Part (c) now follows immediately from (23).
D
which yields (a). Similarly, the sign of the corresponding expression is the
opposite of that of 1 − (α + ξ). By Theorem 7(c), the expression in
brackets is thus negative if α < φ(nβ) and positive if α > φ(nβ), whence part
(b).
Also, implicit differentiation of (13) twice with respect to α yields (24),
and a single differentiation gives (25).
By Theorem 5(c), the right-hand side of (25) is positive for n ≥ 2, so the
right-hand side of (24) must be positive and therefore so also is the left-hand
side, whence we have the first part of (c).
To complete the proof, we wish to show that for n = 1 the right-hand side
of (25) is positive for γ > 1 − ln 2 and negative for γ < 1 − ln 2. Since
(dξ/dα)_γ = ξ(1 − α)/[α(1 − 2ξ)] for n = 1,
6 Scenario 2
Theorem 9. Suppose (4) holds and n ≥ 1 broadcasts are made under Sce-
nario 2. Then
(a) ξ is minimised if and only if control policy S is adopted;
(b) for fixed γ, ξ is a strictly increasing function of α under control policy S.
Proof. The argument closely parallels that of Theorem 3. The proof follows
verbatim down to (7). We continue by noting that in either the original or
modified system r = γ at time τ_k + 0. By Theorem 2(c), (7) implies ξ′ < ξ,
contradicting the optimality of control policy T. Hence we must have s = 0
at time τ_k − 0. The rest of the proof follows the corresponding argument in
Theorem 3 but with Theorem 2(c) invoked in place of Theorem 2(a). □
Since i_k, ξ ∈ (0, 1/2) for 1 ≤ k ≤ n, Lemma 1 yields that (26) determines
ξ uniquely and sequentially from i₁ = α.
Theorem 10. Suppose (4) holds and Scenario 2 applies with n ≥ 1. Then we
have the following.
(a) For α fixed, ξ is strictly decreasing in β.
(b) For γ fixed, ξ is strictly increasing in α.
[Figure 2(a). The proportions i_k against k under Scenario 2, with curves for β = 0 through β = 1.]
di_k/dβ < 0 for 2 ≤ k ≤ n + 1.
Implicit differentiation for k = 2 gives
(1/i₂ − 2) di₂/dβ = −1,
supplying a basis. Implicit differentiation for general k gives
(1/i_k − 2) di_k/dβ = (1/i_{k−1} − 2) di_{k−1}/dβ − 1,
from which, since i_k < 1/2 for each k, we derive the inductive step and
complete the proof. □
η = Θ_{n+1} and Θ_k = Θ_{k−1} e^{−β},
which in turn give
η = e^{−nβ}.
This equation is the same as that obtained in Remark 4 made for Scenario 1.
The rest of the discussion given in Remark 4 also holds for Scenario 2.
Figure 2(b) illustrates the above remark for α + β → 1. As with Figure 1(b),
Figure 2(b) shows more dramatically the dependence on β: for a given initial
value α, we have for each k > 1 that Θ_k increases with β, the relative and
absolute effects both being less marked with increasing k.
7 Comparison of Scenarios
We now compare the eventual proportions ξ and ξ* respectively of the pop-
ulation never hearing a rumour when n broadcasts are made under control
policy S with Scenarios 1 and 2. For clarity we use the superscript * to distin-
guish quantities pertaining to Scenario 2 from the corresponding quantities
for Scenario 1.
Theorem 11. Suppose (4) holds and that a sequence of n broadcasts is made
under control policy S. Then
(a) if n ≥ 2, we have
Proof. From (11), (12) (under Scenario 1) and (26) (under Scenario 2), ξ may
be regarded as i_{n+1} and ξ* as i*_{n+1}, so it suffices to establish Part (a). This
we do by forward induction on k.
Suppose that for some k ≥ 2 we have
i*_{k−1} < i_{k−1},   (30)
so that
Acknowledgement
Yalcin Kaya acknowledges support by a fellowship from CAPES, Ministry
of Education, Brazil (Grant No. 0138-11/04), for his visit to the Department of
Systems and Computing at the Federal University of Rio de Janeiro, during
which part of this research was carried out.
References
[AH98] Aspnes, J., Hurwood, W.: Spreading rumours rapidly despite an adversary.
Journal of Algorithms, 26, 386-411 (1998)
[Bar72] Barbour, A.D.: The principle of the diffusion of arbitrary constants. J.
Appl. Probab., 9, 519-541 (1972)
[BKP05] Belen, S., Kaya, C.Y., Pearce, C.E.M.: Impulsive control of rumours with
two broadcasts. ANZIAM J. (to appear) (2005)
[BP04] Belen, S., Pearce, C.E.M.: Rumours with general initial conditions.
ANZIAM J., 45, 393-400 (2004)
[Bla85] Blaquiere, A.: Impulsive optimal control with finite or infinite time horizon.
J. Optimiz. Theory Applic., 46, 431-439 (1985)
[Bom03] Bommel, J.V.: Rumors. Journal of Finance, 58, 1499-1521 (2003)
[CHJK96] Corless, R.M., Hare, D.E.G., Jeffrey, D.J., Knuth, D.E.: On the Lambert
W function. Advances in Computational Mathematics, 5, 329-359 (1996)
[DK65] Daley, D.J., Kendall, D.G.: Stochastic rumours. J. Inst. Math. Applic, 1,
42-55 (1965)
[DP03] Dickinson, R.E., Pearce, C.E.M.: Rumours, epidemics and processes of
mass action: synthesis and analysis. Mathematical and Computer Mod-
elling, 38, 1157-1167 (2003)
[DMC01] Donavan, D.T., Mowen, J.C., Chakraborty, C.: Urban legends: diffusion
processes and the exchange of resources. Journal of Consumer Marketing,
18, 521-533 (2001)
[FPRU90] Feige, U., Peleg, D., Raghavan, P., Upfal, E.: Randomized broadcast in
networks. Random Structures and Algorithms, 1, 447-460 (1990)
[Fro00] Frost, C.: Tales on the internet: making it up as you go along. ASLIB
Proc., 52, 5-10 (2000)
[Gan00] Gani, J.: The Maki-Thompson rumour model: a detailed analysis. Envi-
ronmental Modelling and Software, 15, 721-725 (2000)
[MT73] Maki, D.P., Thompson, M.: Mathematical Models and Applications.
Prentice-Hall, Englewood Cliffs (1973)
[OT77] Osei, G.K., Thompson, J.W.: The supersession of one rumour by another.
J. Appl. Probab., 14, 127-134 (1977)
[Pea00] Pearce, C.E.M.: The exact solution of the general stochastic rumour. Math.
and Comp. Modelling, 31, 289-298 (2000)
[Pit90] Pittel, B.: On a Daley-Kendall model of random rumours. J. Appl.
Probab., 27, 14-27 (1990)
[RZ88] Rempala, R., Zabczyk, J.: On the maximum principle for deterministic
impulse control problems. J. Optim. Theory Appl., 59, 281-288 (1988)
[Sud85] Sudbury, A.: The proportion of the population never hearing a rumour. J.
Appl. Probab., 22, 443-446 (1985)
[Wat87] Watson, R.: On the size of a rumour. Stoch. Proc. Applic., 27, 141-149
(1987)
[Zan01] Zanette, D.H.: Critical behaviour of propagation on small-world networks.
Physical Review E, 64, 050901(R), 4 pages (2001)
Minimization of the Sum of Minima of Convex
Functions and Its Application to Clustering
1 Introduction
In this paper we introduce and study a class of sum-min functions. This class
ℱ consists of functions of the form (1) below.
Consider the finite-dimensional vector space IR^n. Let A ⊂ IR^n be a finite
set and let k be a positive integer. Consider a function F defined on (IR^n)^k
by
F(x₁, . . . , x_k) = Σ_{a∈A} min(φ₁(x₁, a), φ₂(x₂, a), . . . , φ_k(x_k, a)),   (1)
where each x_i ↦ φ_i(x_i, a) is a convex function. Then F enjoys the following prop-
erties:
erties:
1. F is quasidifferentiable ([DR95]). Moreover, F is DC (the difference of
convex functions). Indeed, we have (see for example [DR95], p. 108):
F(x) = f₁(x) − f₂(x),   x = (x₁, . . . , x_k),
where
f₁(x) = Σ_{a∈A} Σ_{i=1}^{k} φ_i(x_i, a),   f₂(x) = Σ_{a∈A} max_{1≤j≤k} Σ_{i≠j} φ_i(x_i, a).
Both f₁ and f₂ are convex functions. The pair DF(x) = (∂f₁(x), −∂f₂(x)) is
a quasidifferential [DR95] of F at a point x. Here ∂f stands for the convex
subdifferential of a convex function f.
2. Since F is DC, it follows that this function is locally Lipschitz.
3. Since F is DC, it follows that this function is semismooth.
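The DC decomposition rests on the elementary identity min_i c_i = Σ_i c_i − max_j Σ_{i≠j} c_i. A minimal sketch checking it for the sum-min function (1) (our own code, not the authors'; a single φ is used for every i, as in the cluster function case):

```python
# Our own illustration of the sum-min function (1) and its DC split F = f1 - f2.
def F(xs, A, phi):
    # sum over data points of the minimum over centres
    return sum(min(phi(x, a) for x in xs) for a in A)

def f1(xs, A, phi):
    # convex part: sum of all terms
    return sum(sum(phi(x, a) for x in xs) for a in A)

def f2(xs, A, phi):
    # convex part subtracted: max over leave-one-out sums
    return sum(max(sum(phi(x, a) for i, x in enumerate(xs) if i != j)
                   for j in range(len(xs))) for a in A)
```

With phi(x, a) = |x − a| in one dimension, F is the cluster function and F = f1 − f2 holds identically.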
3 Examples
We now give some examples of functions belonging to the class ℱ. In all the
examples, datasets are denoted as finite sets A ⊂ IR^n, that is, as sets of n-
dimensional points (also called observations).
412 A. Rubinov et al.
Assume that a finite set A ⊂ IR^n consists of k clusters. Let X = {x₁, . . . , x_k} ⊂
IR^n. Consider the distance d(X, a) = min(‖x₁ − a‖, . . . , ‖x_k − a‖) between
the set X and a point (observation) a ∈ A. (It is assumed that IR^n is equipped
with a norm ‖·‖.) The deviation of X from A is the quantity d(X, A) =
Σ_{a∈A} d(X, a). Let X = {x₁, . . . , x_k} be a solution to the problem:
is called a cluster function. This function has the form (1) with φ_i(x, a) =
‖x − a‖ for each a ∈ A and i = 1, . . . , k. The cluster function was examined
in [BRY02]. Some numerical methods for its minimization were suggested in
[BRY02].
The cluster function has a saw-tooth form and the number of teeth dras-
tically increases as the number of addends in (2) increases. This leads to an
increase in the number of shallow local minima and saddle points. If the norm
‖·‖ is polyhedral, say ‖·‖ = ‖·‖₁, then the cluster function is piecewise
linear with a very large number of different linear pieces. The restriction of
the cluster function to a one-dimensional line has the form of a saw with a
huge number of teeth of different sizes but of the same slope.
Let (m_a)_{a∈A} be a family of positive numbers. The function
is called a generalized cluster function. Clearly C_k has the form (1). The
structure of this function is similar to the structure of the cluster function;
however, different teeth of the generalized cluster function can have different
slopes.
Clusters constructed according to centres obtained as a result of the clus-
ter function minimization are called centre-based clusters.
Σ_{a∈A} d(a, H) = Σ_{a∈A} min_i |[l^i, a] − c_i|.
The function
[Figure: a point set consisting of two clusters.]
Clearly this set consists of two clusters; the centres of these clusters (points
x₁ and x₂) can be found by the minimization of the cluster function. The
skeleton of this set hardly depends on the number k of hyperplanes (straight
lines). For each k this skeleton cannot give a clear presentation of the structure
of the set.
Minimization of the Sum of Minima of Convex Functions 415
F(x₁, . . . , x_k) = Σ_{a∈A} min(φ₁(x₁, a), φ₂(x₂, a), . . . , φ_k(x_k, a)),
x_i ∈ IR^n,  i = 1, . . . , k,
where A ⊂ IR^n is a finite set. This function depends on n × k variables. In
real-world applications n × k is a large enough number and the set A contains
some hundreds or thousands of points. In such a case the function F has a huge
number of shallow local minimizers that are very close to each other. The
minimization of such functions is a challenging problem.
In this paper we consider both local and global minimization of sum-min
functions from ℱ. First we discuss possible local techniques for the minimiza-
tion.
The calculation of even one of the Clarke subgradients and/or a quasidiffer-
ential of the function (1) is a difficult task, so methods of nonsmooth optimization
based on subgradient information (quasidifferential information) at each iter-
ation are not effective for the minimization of F. It seems that derivative-free
methods are more effective for this purpose.
For the local minimization of functions (1) we propose to use the so-called
discrete gradient (DG) method, which was introduced and studied by Adil
Bagirov (see, for example, [Bag99]). A discrete gradient is a certain finite dif-
ference approximating the Clarke subgradient or a quasidifferential. In contrast
with many other finite differences, the discrete gradient is defined with respect
to a given direction. This leads to a good enough approximation of Clarke sub-
gradients (quasidifferentials). DG calculates discrete gradients step-by-step; if
the current point in hand is not an approximate stationary point, then af-
ter a finite number of iterations the algorithm calculates a descent direction.
Armijo's method is used in DG for a line search.
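To fix ideas, here is a crude derivative-free descent on a one-dimensional cluster function, combining forward differences with Armijo backtracking. This is our own simplified illustration only; it is not Bagirov's discrete gradient method, which builds its finite differences along chosen directions and treats the nonsmoothness far more carefully.

```python
def cluster_fn(x, A):
    # cluster function for one-dimensional data: x holds the k centres
    return sum(min(abs(xi - a) for xi in x) for a in A)

def fd_descent(x, A, h=1e-6, step0=1.0, iters=100):
    """Crude finite-difference descent with Armijo backtracking (illustration only)."""
    x = list(x)
    for _ in range(iters):
        f0 = cluster_fn(x, A)
        # forward-difference estimate of a (generalised) gradient
        g = [(cluster_fn([xj + (h if j == i else 0.0)
                          for j, xj in enumerate(x)], A) - f0) / h
             for i in range(len(x))]
        t = step0
        while t > 1e-10:
            trial = [xj - t * gj for xj, gj in zip(x, g)]
            if cluster_fn(trial, A) < f0 - 1e-4 * t * sum(gj * gj for gj in g):
                x = trial
                break
            t *= 0.5
        else:
            return x          # no descent step found: stop
    return x
```

On well-separated data this quickly moves the centres into the clusters, but like any local method it can stall at a shallow local minimum, which is exactly the difficulty discussed above.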
The calculation of discrete gradients is much easier if the number of ad-
dends in (1) is not very large. The decrease of the number of addends also
leads to a drastic diminishing of the number of shallow local minima. Since
the number of addends is equal to the number of points in the dataset, we
conclude that the results of the application of DG for minimization of (1)
significantly depend on the size of the set A.
The discrete gradient method is a local method, which may terminate in
a local minimum. In order to ascertain the quality of the solution reached, it
is necessary to apply global methods. Here we call a method global if it does
not get trapped at stationary points and can leave local minima for a better
solution.
Various combinations of local and global techniques have recently
been studied (see, for example, [HF02, YLT04]).
We use a combination of the DG and the cutting angle method (DG+CAM)
in our experiments. We call this method the hybrid global method.
These two techniques (DG and DG+CAM) have been included in a new
optimization software (CIAO-GO) created recently at the Centre for Infor-
matics and Applied Optimization (CIAO) at the University of Ballarat; see
[CIA05] for more information. This version of the CIAO-GO software (Centre
for Informatics and Applied Optimization-Global Optimization) allows one to
use four different solvers:
1. DG,
2. DG multi start,
3. DG+CAM,
4. DG+CAM multi start.
Working with this software, users have to input
• an objective function (for minimization),
• an initial point for optimization,
• upper and lower bounds for variables,
• constraints and a penalty constant (in the case of constrained optimiza-
tion); constraints can be represented as equalities and inequalities,
• maximal running time,
• maximal number of iterations.
The "multi start" option in CIAO-GO means that the program starts from the
initial point chosen by the user and also generates 4 additional random initial
points. The final result is the best result obtained. The additional initial points
are generated by CIAO-GO from the corresponding feasible region (or close
to the feasible region).
As a global optimization technique we also use the General Algebraic Mod-
eling System (GAMS); see [GAM05] for more information. We use the Lip-
schitz global optimizer (LGO) solver [LGO05] from Pinter Consulting Services
[Pin05].
Consider a set A ⊂ IR^n that contains N points. Choose ε > 0. Then choose
a random vector b¹ ∈ A and consider the subset A_{b¹} = {a ∈ A : ‖a − b¹‖ < ε}
of the set A. Take randomly a point b² ∈ A₁ = A \ A_{b¹}. Let A_{b²} = {a ∈ A₁ :
‖a − b²‖ < ε} and A₂ = A₁ \ A_{b²}. If the set A_{j−1} is known, take randomly
b^j ∈ A_{j−1}, define the set A_{b^j} as {a ∈ A_{j−1} : ‖a − b^j‖ < ε} and define the set A_j
as A_{j−1} \ A_{b^j}. The result of the described procedure is the set B = {b^j},
which is a subset of the original dataset A. The vector b^j is a representative
of the whole group of vectors removed at step j.
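The reduction procedure can be sketched in a few lines (our own code; the l1 norm, the fixed seed and the function name are our choices):

```python
import random

def reduce_dataset(A, eps, seed=0):
    """Pick a random remaining point b, remove every point within eps of it
    (in the l1 norm), and keep b with weight m_j = size of the removed group."""
    rng = random.Random(seed)
    remaining = list(A)
    B, weights = [], []
    while remaining:
        b = remaining[rng.randrange(len(remaining))]
        group = [a for a in remaining
                 if sum(abs(ai - bi) for ai, bi in zip(a, b)) < eps]
        remaining = [a for a in remaining if a not in group]
        B.append(b)
        weights.append(len(group))
    return B, weights
```

The representatives B and the weights m_j are exactly the data needed to form the generalized cluster function on the reduced dataset.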
If m_j is the cardinality of A_{b^j} then the generalized cluster function corre-
sponding to B is
C_k(x¹, . . . , x^k) = Σ_j m_j min(‖x¹ − b^j‖, . . . , ‖x^k − b^j‖).
Most methods of local optimization are very sensitive to the choice of an initial
point. In this section we suggest a choice of initial points which can be used
for the minimization of cluster functions and generalized cluster functions.
Consider a set A ⊂ IR^n that contains N points. Assume that we want to
find k clusters in A. In this case an initial point is a vector x ∈ IR^{n×k}. The
structure of the problem under consideration leads to different approaches to
the choice of initial points. We suggest the following four approaches.
k-meansL1 initial point The k-meansL1 method is a version of the well-
known k-means method (see, for example, [MST94]), where ‖·‖₁ is used
instead of ‖·‖₂. (We use ‖·‖₁ in numerical experiments; this is the reason
for considering k-meansL1 instead of k-means.) We use the following
procedure in order to sort N observations into k clusters:
1. Take any k observations as the centres of the first k clusters.
2. Assign the remaining N − k observations to one of the k clusters on the
basis of the shortest distance (in the sense of the ‖·‖₁ norm) between an
observation and the mean of the cluster.
3. After each observation has been assigned to one of the k clusters, the
means are recomputed (updated).
Stopping criterion: there is no observation which moves from one cluster to
another.
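Steps 1-3 can be sketched as follows (our own code; taking the first k observations as initial centres and the function name are our choices):

```python
def kmeans_l1(A, k, iters=100):
    """k-meansL1 sketch: assign by l1 distance to the current means, then
    recompute the means; stop when no centre changes (no observation moves)."""
    centres = [list(a) for a in A[:k]]                 # step 1: k observations
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for a in A:                                     # step 2: l1 nearest centre
            d = [sum(abs(ai - ci) for ai, ci in zip(a, c)) for c in centres]
            clusters[d.index(min(d))].append(a)
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
               for cl, c in zip(clusters, centres)]    # step 3: recompute means
        if new == centres:                              # stopping criterion
            return centres
        centres = new
    return centres
```

As the text notes, the outcome depends on which k observations seed the procedure.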
Note that the results of this procedure depend on the choice of the initial
observations.
We apply this algorithm to the original dataset A and then the resulting point
x ∈ IR^{n×k} is considered as an initial point for minimization of the generalized
cluster function generated by the dataset B.
Uniform initial point The application of optimization methods to clustering
requires a certain amount of data processing. In particular, a scaling procedure should
be applied. In our experiments we convert a given dataset to a dataset with
mean value 1 for each feature (coordinate). In such a case we can choose
the point x = (1, 1, . . . , 1) ∈ IR^{n×k} as the initial one. We shall call it the uniform
initial point.
Ordered initial point Recall that m_j indicates the cardinality of the set
of points A_{b^j} ⊂ A, which are represented by a point b^j ∈ B. It is natural
to consider the collection of the heaviest k points as an initial vector for the
minimization of the generalized cluster function C. To formalize this, we rearrange
the points so that the numbers m_j, j = 1, . . . , N_B decrease, and take the first
k points from this rearranged dataset. Thus, in order to construct an initial
point we choose the k observations with the largest values of the weights m_j from
the dataset B.
6.1 Datasets
We carried out numerical experiments with two well-known test datasets (see
[MST94]):
• Letters dataset (20000 observations, 26 classes, 16 features). This dataset
consists of samples of 26 capital letters, printed in different fonts; 20 differ-
ent fonts were considered and the location of the samples was distributed
randomly within the dataset.
• Pendigits dataset (10992 observations, 10 classes, 16 features). This dataset
was created by collecting 250 samples from 44 writers. These writers were
asked to write 250 digits in random order inside boxes of 500 by 500 tablet
pixel resolution.
Both the Letters and Pendigits datasets have been used for testing different
methods of supervised classification (see [MST94] for details). Since we use
these datasets only for the construction of generalized cluster functions, we consider
them as datasets with unknown classes.
We are looking for three and four clusters in both the Letters and Pendigits
datasets. The dimension of the optimization problems is equal to 48 in the case of 3
clusters and 64 in the case of 4 clusters. We consider two small sub-databases
of the Letters dataset (Let1, 353 points, approximately 2% of the original
dataset; and Let2, 810 points, approximately 4% of the original dataset) and
two small subsets of the Pendigits dataset (Pen1, 216 points, approximately
2% of the original dataset; and Pen2, 426 points, approximately 4% of the
original dataset).
We apply local techniques (the discrete gradient method) and global tech-
niques (a combination of the discrete gradient and cutting angle methods,
and the LGO solver) to minimize the generalized cluster function. Then we need
to estimate the results obtained. We can use different approaches for this es-
timation. One of them is based on a comparison of the values of the cluster function
C_k constructed with respect to the centres obtained in the original dataset
A and with respect to the centres obtained in its small sub-dataset B. We
compare the cluster function values, started from different initial points, in
original datasets and their approximations.
We use the following procedure.
Let A be an original dataset and B be its small sub-dataset. First, the
centres of clusters in B are found by an optimization technique. Then
we evaluate the cluster function values in A using the obtained points as the
centres of clusters in A. Using this approach we can find out how the results
of the minimization depend on initial points and how far we can go in the
process of dataset reduction.
In our research we use 4 types of initial points, described in Section 5.2.
These initial points have been carefully chosen, and the results obtained start-
ing from them are better than the results obtained starting from
random initial points. Therefore, we present the results obtained for these 4
types of initial points rather than the results obtained starting from random
initial points generated, for example, by the "multi start" option.
Local optimization
First of all we have to point out that we have two groups of initial points:
• Group 1: Uniform initial point and k-meansL1 initial point,
• Group 2: Ordered initial point and Uniform-ordered initial point.
Initial points from Group 1 are the same for an original dataset and for all
its reduced versions. Initial points from Group 2 are constructed according
to their weights. Points in original datasets all have the same weight, which is
equal to 1.
Remark 3. Because the weights can vary for different reductions of the dataset,
the Ordered initial points for Let1 and Let2 do not necessarily coincide. The
same is true for the Uniform-ordered initial points. The same observation
applies to the Pendigits dataset and its reduced versions Pen1 and Pen2.
Our next step is to compare results obtained starting from different initial
points in the original datasets and in their approximations. In our experi-
ments we use two different kinds of function: the cluster function and the
generalized cluster function. Values of the cluster function and the general-
ized cluster function are the same for original datasets because each point has
the same weight, which is equal to 1. In the case of reduced datasets we carry out
our numerical experiments in the corresponding approximations of the original
datasets and calculate two different values: the cluster function value and the
generalized cluster function value. The cluster function value is the value of the
cluster function calculated in the corresponding original dataset according to
the centres found in the reduced dataset. The generalized cluster function
value is the value of the generalized cluster function calculated in the reduced
dataset according to the centres found in the same reduced dataset. Normally
a cluster function value (calculated according to the centres found in reduced
datasets) is larger than a generalized cluster function value calculated accord-
ing to the same centres and the corresponding weights, because the optimization
techniques have actually been applied to minimize the generalized cluster function in
the corresponding reduced dataset. In Tables 1-2 we present the results of
our numerical experiments obtained for DG and DG+CAM starting from the
Uniform initial point.
It is also very important to remember that a better result in a reduced
dataset is not necessarily better for the original one. For example, in the case of
the Pen1 dataset, 3 clusters and the Uniform initial point, the generalized cluster
function value is lower for DG+CAM than for DG; however, the cluster function
value is lower for DG than for DG+CAM. We observe the same situation in some
other examples.
Table 1. Cluster function and generalized cluster function: DG, Uniform initial point

                          3 clusters                      4 clusters
Dataset      Size   Cluster fn   Generalized        Cluster fn   Generalized
                    value        cluster fn value   value        cluster fn value
Pen1          216   6.4225       5.5547             5.7962       4.8362
Pen2          426   6.3844       5.8132             5.7725       5.0931
Pendigits   10992   6.3426       6.3426             5.7218       5.7218
Let1          353   4.3059       3.3859             4.1200       3.1611
Let2          810   4.2826       3.7065             4.0906       3.5040
Letters     20000   4.2494       4.2494             4.0695       4.0695
Table 2. Cluster function and generalized cluster function: DG+CAM, Uniform initial point

                          3 clusters                      4 clusters
Dataset      Size   Cluster fn   Generalized        Cluster fn   Generalized
                    value        cluster fn value   value        cluster fn value
Pen1          216   6.4254       5.5546             5.7943       4.8353
Pen2          426   6.3843       5.8131             5.7718       5.0931
Pendigits   10992   6.3426       6.3426             5.7218       5.7218
Let1          353   4.3059       3.3859             4.1208       3.1600
Let2          810   4.2828       3.7061             4.0909       3.5020
Letters     20000   4.2494       4.2494             4.0695       4.0695
First we present the results obtained by the LGO solver (global optimization).
We use the Uniform initial point. The results are in Table 9.
In almost all the cases (except Pendigits, 3 clusters) the results for reduced
datasets are better than for original datasets. This means that the cluster func-
tion is too complicated for the solver as an objective function, and it is more
efficient to use generalized cluster functions generated on reduced datasets. It
is beneficial to use reduced datasets in the case of the LGO solver from two
points of view:
1. computations with reduced datasets allow one to reach a better minimizer;
2. computational time is significantly less for reduced datasets than for orig-
inal datasets.
It is also obvious that the software failed to reach a global minimum. We sug-
gest that this is because the LGO solver has been developed for a broad class of
optimization problems. However, the solvers included in CIAO-GO are more
efficient for minimization of the sum of minima of convex functions, especially
if the number of components in the sums is large.
Remark 5. The LGO solver was not used in the experiments on skeletons.
7 Skeletons
7.1 Introduction
The problem has been solved for different sets of points selected from 3 dif-
ferent well-known datasets: the Heart disease database (13 features, 2 classes:
160 observations from the first class and 197 observations from the
second class), the Diabetes database (8 features, 2 classes: 500 observations
from the first class and 268 from the second class) and
the Australian credit cards database (14 features, 2 classes: 383 observations
from the first class and 307 from the second class); see
also [MST94] and references therein. Each of these datasets was first submitted
to the feature selection method described in [BRY02].
The different examples show that although the hybrid method sometimes does
not improve on the result obtained with the discrete gradient method, in other
cases the result obtained is much better than when the discrete gradient method
is used alone. However, the computation times it requires are much greater
than those of the discrete gradient method on its own. After feature selection
(see [BRY02]) the diabetes dataset has 3 features, which allows us to plot some
of the results obtained during the computations.
We can observe that the hybrid method does not necessarily give an optimal
solution: even with the hybrid method the initial point is very important.
Figure 3, however, confirms that the solutions obtained are usually very good
and represent the set of points correctly. The set of points studied here is
Minimization of the Sum of Minima of Convex Functions 427
composed of one large mass of points, with some other points spread around it.
It is interesting to note that the hyperplanes intersect around the same place,
where the large mass is situated, and take different directions so as to be as
close as possible to the spread points.
Figure 4 shows the complexity of the diabetes dataset.
We are looking for three and four clusters in both the Letters and Pendigits
datasets. The dimension of the optimization problems is equal to 51 in the case of
In this subsection we present the results obtained for the skeleton function.
Our goal is to find the centres in the original datasets; we therefore do not
present the generalized skeleton function values. Tables 12 and 13 present the
values of the skeleton function evaluated on the corresponding original datasets
(Pendigits and Letters respectively), for the skeletons obtained as optimization
results in the datasets listed in the first column of the tables. We use two
different optimization methods, DG and DG+CAM, and two different types of
initial points: "single start" (DG or DG+CAM) and "multi start" (DGMULT
or DG+CAMMULT).
The most important conclusion from these results is that, in the case of the
skeleton function, the best optimization results (the lowest values of the skeleton
function) were reached in the experiments with the original datasets. This means
that the proposed cleaning procedure is not as efficient for the skeleton function
as it is for the clustering function. However, in the case of the clustering function
the initial points for the optimization methods were chosen after some preliminary
study. It may happen that an efficient choice of initial points leads to better
optimization results for both kinds of datasets, original and reduced.
Another set of numerical experiments has been carried out on both objective
functions. Although of little interest from the point of view of optimization
itself, in the authors' opinion it may shed some more light on the clustering
part.
The objective functions (2) and (7) have been minimized using two different
methods: the discrete gradient method described above, and a hybrid method
combining the DG method with the well-known simulated annealing method.
This hybrid method is described in detail in [BZ03].
The basic idea of the hybrid method is to alternate the descent method, to
obtain a local minimum, with the simulated annealing method, to escape this
minimum. This drastically reduces the dependence of the local method on the
initial point, and ensures that the method reaches a "good" minimum.
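The alternation just described can be sketched generically. The coordinate-descent stand-in, the cooling schedule and the toy objective below are illustrative assumptions, not the actual discrete gradient method of [BZ03]:

```python
import math
import random

def local_descent(f, x, step=0.1, tol=1e-6, max_iter=1000):
    # Crude coordinate descent, standing in for the discrete gradient method.
    x = x[:]
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = x[:]
                y[i] += d
                if f(y) < f(x) - tol:
                    x = y
                    improved = True
        if not improved:
            step /= 2          # refine once no coordinate move helps
            if step < tol:
                break
    return x

def sa_escape(f, x, temp=1.0, cooling=0.95, steps=200, seed=0):
    # Simulated annealing phase: random perturbations, Metropolis acceptance.
    rng = random.Random(seed)
    best = x[:]
    for _ in range(steps):
        y = [xi + rng.gauss(0.0, temp) for xi in x]
        delta = f(y) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            x = y
        if f(x) < f(best):
            best = x[:]
        temp *= cooling
    return best

def hybrid(f, x0, rounds=5):
    # Alternate descent (find a local minimum) with annealing (escape it).
    x = x0[:]
    for _ in range(rounds):
        x = local_descent(f, x)
        x = sa_escape(f, x)
    return local_descent(f, x)

# Toy objective: a minimum of two convex quadratics (one shallow local min).
f = lambda x: min((x[0] - 3.0) ** 2, (x[0] + 1.0) ** 2 + 0.5)
```

Starting the hybrid from a remote point such as x = 10 lands in the deep minimum near x = 3 on this toy function.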
Numerical experiments were carried out on the Pendigits and Letters datasets
for the generalized cluster function using dataset approximations of different
sizes. The results show that the hybrid method reached values broadly
comparable with the other methods, although the algorithm had to escape up
to 50 local minima. This can be explained by the large number of local minima
of the objective function, each close to one another.
The skeleton function was minimized for the Heart Disease and Diabetes
datasets, and the same behaviour can be observed. As the results of these
experiments did not support any major conclusions, they are not shown here.
Numerical experiments have shown that while considerably faster than the
simulated annealing method, the hybrid method is still fairly slow to converge.
8 Conclusions
8.1 Optimization
This type of problem appears quite often in the area of data analysis,
and two examples have been solved.
The generalized cluster function has been minimized for two datasets, using
three different methods: the LGO global optimization software included in
GAMS, the discrete gradient method, and a combination of this method with
the cutting angle method.
The last two methods have been started both from carefully selected initial
points and from a random initial point.
The LGO software failed most of the time to reach even a good solution,
because the objective function has a very complex structure. The method was
limited in time; given an unlimited amount of time it might eventually have
reached the global solution.
Similarly, the local methods failed to reach the solution when started from
a random point. The reason is the large number of local minima of the
objective function, which prevents local methods from reaching a good solution.
However, the discrete gradient method reached a good solution in all examples
for at least one of the initial points, and the combination reached a good
solution for all of the initial points.
This shows that for such functions, with a complex structure and many local
minima, most global methods will fail, whereas well-chosen initial points will
lead to a deep local minimum. Because local methods are much faster than
global ones, it is more advantageous to start a local method from a set of
carefully chosen initial points in order to reach a global minimum.
The combination of the discrete gradient and cutting angle methods appears
to be a good alternative, as it is not very dependent on the initial point
while reaching a good solution in a limited time.
The second set of experiments was carried out on the hyperplanes function.
Since this function has been less studied in the literature, it is harder to
draw definite conclusions. However, the experiments show very clearly that the
local methods once again depend strongly on the initial point. Unfortunately,
it is harder to devise a good initial point for this objective function.
8.2 Clustering
From the clustering point of view, two different similarity functions have been
minimized. The first is a variation of the widely studied cluster function, in
which the points are weighted. The second is a variation of the Bradley-
Mangasarian function, in which distances from the hyperplanes are used instead
of their squares.
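The two objectives can be written down directly. The exact normalizations used in the chapter are not reproduced here, so the forms below (weighted nearest-centre distances, and plain point-to-hyperplane distances) are assumptions:

```python
import math

def weighted_cluster(points, weights, centres):
    # Weighted cluster function (form assumed): each data point contributes
    # its weight times the distance to the nearest centre.
    return sum(w * min(math.dist(a, c) for c in centres)
               for a, w in zip(points, weights))

def hyperplane_fit(points, planes):
    # Bradley-Mangasarian-style objective with plain (not squared) distances;
    # each plane is a pair (w, gamma) representing {x : w . x = gamma}.
    total = 0.0
    for a in points:
        total += min(
            abs(sum(wi * ai for wi, ai in zip(w, a)) - gamma)
            / math.sqrt(sum(wi * wi for wi in w))
            for w, gamma in planes)
    return total
```

Both functions are sums of minima of convex functions of the centre (respectively plane) variables, which is exactly the structure discussed in the optimization part of the chapter.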
A method for reducing the size of the dataset, ε-cleaning, has been devised
and applied. Different values of ε lead to datasets of different sizes.
Numerical experiments have been carried out for different values of ε,
leading to very small (2% and 4%) datasets.
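The precise ε-cleaning procedure is given elsewhere in the chapter; as an illustration, a greedy one-pass variant, in which each retained point absorbs (and weights) all later points within ε, might look as follows:

```python
import math

def epsilon_clean(points, eps):
    # Greedy sketch of epsilon-cleaning (exact procedure assumed): a point
    # within eps of an already-kept representative only adds to that
    # representative's weight; otherwise it becomes a new representative.
    reps, weights = [], []
    for a in points:
        for i, r in enumerate(reps):
            if math.dist(a, r) <= eps:
                weights[i] += 1
                break
        else:
            reps.append(a)
            weights.append(1)
    return reps, weights
```

Larger values of ε merge more points into each representative, producing smaller weighted datasets on which the weighted cluster function can then be minimized.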
For the generalized cluster function this method proves very successful: even
for very small datasets, the function value obtained is very satisfactory. When
the problem was solved using the global method LGO, the results obtained for
the reduced dataset were almost always better than those obtained for the
original dataset. The reason is that the larger the dataset, the larger the
number of local minima of the objective function. When the dataset is reduced,
what is lost in measurement quality is gained through the strong simplification
of the function. Because each point in the reduced dataset already acts as a
centre for its neighbourhood, minimizing the generalized cluster function
amounts to grouping these "mini" clusters into larger clusters.
It has to be noted that there is no monotone correspondence between the
values of the generalized cluster function for the reduced and the original
datasets. It may happen that a given solution is better than another for the
reduced dataset and worse for the original. Thus we cannot conclude that the
solution can always be reached via the reduced dataset. However, the
experiments show that the solution found for the reduced dataset is always good.
For the skeleton function, however, the method is not so successful. Although
this has to be taken with caution, since the initial points for this function
could not be devised as carefully as for the cluster function, such behaviour
is to be expected: the reduced dataset is effectively a set of cluster centres.
The skeleton approach is based on the assumption that the clusters in the
dataset can be represented by hyperplanes, while the cluster approach assumes
that the clusters are represented by centres.
The experiments show the significance of the choice of initial point for
reaching good clusters. While random points did not allow any method to reach
a good solution, all initial points selected according to the structure of the
dataset led the combination DG-CAM to the solution.
Since we are able to provide good initial points for the cluster function but
not for the skeleton function, we would recommend the centre approach unless
the structure of the dataset is known to correspond to some skeletons.
Finally, the comparison between the results obtained by the two different
methods must be qualified: since the experiments have shown the importance of
initial points, it is difficult to draw definitive conclusions from the results
obtained for the skeleton approach.
However, there seems to be a relationship between the classes and the
clusters obtained by both approaches, some classes being almost absent from
certain clusters. Further investigations should be carried out in this direction,
and classification processes based on these approaches could be proposed.
Acknowledgements
The authors are very thankful to Dr. Adil Bagirov for his valuable comments.
References
[Bag99] Bagirov, A.M.: Derivative-free methods for unconstrained nonsmooth op-
timization and its numerical analysis. Investigacao Operacional, 19, 75-93
(1999)
[BRSY03] Bagirov, A.M., Rubinov, A.M., Soukhoroukova, N., Yearwood, J.: Unsupervised
and supervised data classification via nonsmooth and global optimization.
Sociedad de Estadistica e Investigacion Operativa, Top, 11, 1-93 (2003)
[BRY02] Bagirov, A.M., Rubinov, A.M., Yearwood, J.: A global optimization approach
to classification. Optimization and Engineering, 3, 129-155 (2002)
[BRZ05a] Bagirov, A., Rubinov, A., Zhang, J.: Local optimization method with global
multidimensional search for descent. Journal of Global Optimization (accepted)
(http://www.optimization-online.org/DB_FILE/2004/01/808.pdf)
[BRZ05b] Bagirov, A., Rubinov, A., Zhang, J.: A new multidimensional descent method
for global optimization. Computational Optimization and Applications
(submitted) (2005)
[BZ03] Bagirov, A.M., Zhang, J.: Hybrid simulated annealing method and discrete
gradient method for global optimization. In: Proceedings of Industrial
Mathematics Symposium, Perth (2003)
[BBM03] Beliakov, G., Bagirov, A., Monsalve, J.E.: Parallelization of the discrete
gradient method of non-smooth optimization and its applications. In: Proceedings
of the 3rd International Conference on Computational Science. Springer-Verlag,
Heidelberg, 3, 592-601 (2003)
[BM00] Bradley, P.S., Mangasarian, O.L.: k-Plane clustering. Journal of Global
Optimization, 16, 23-32 (2000)
[BLM02] Brimberg, J., Love, R.F., Mehrez, A.: Location/allocation of queuing
facilities in continuous space using minsum and minmax criteria. In: Pardalos, P.,
Migdalas, A., Burkard, R. (eds) Combinatorial and Global Optimization. World
Scientific (2002)
[DR95] Demyanov, V., Rubinov, A.: Constructive Nonsmooth Analysis. Peter
Lang (1995)
[GRZ05] Ghosh, R., Rubinov, A.M., Zhang, J.: Optimisation approach for clustering
datasets with weights. Optimization Methods and Software, 20 (2005)
[HF02] Hedar, A.-R., Fukushima, M.: Hybrid simulated annealing and direct
search method for nonlinear unconstrained global optimization. Optimiza-
tion Methods and Software, 17, 891-912 (2002)
[JMF99] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM
Computing Surveys, 31, 264-323 (1999)
[Kel99] Kelley, C.T.: Detection and remediation of stagnation in the Nelder-Mead
algorithm using a sufficient decrease condition. SIAM J. Optimization,
10, 43-55 (1999)
[MST94] Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds): Machine Learning,
Neural and Statistical Classification. Ellis Horwood Series in Artificial
Intelligence, London (1994)
[SU05] Soukhoroukova, N., Ugon, J.: A new algorithm to find a shape of a finite
set of points. Proceedings of Conference on Industrial Optimization, Perth,
Australia (Submitted) (2005)
[YLT04] Yiu, K.F.C., Liu, Y., Teo, K.L.: A hybrid descent method for global
optimization. Journal of Global Optimization, 28, 229-238 (2004)
[GAM05] http://www.gams.com/
[LGO05] http://www.gams.com/solvers/lgo.pdf
[Pin05] http://www.dal.ca/ jdpinter/
[CIA05] http://www.ciao-go.com.au/index.php
Analysis of a Practical Control Policy for
Water Storage in Two Connected Dams
1 Introduction
The mathematical literature on storage dams, now half a century old, developed
largely from the seminal work of Moran [Mor54, Mor59] and his school
(see, for example, [Gan69, Yeo74, Yeo75]). Moran was motivated by specific
practical problems faced by the Snowy Mountain Authority in Australia in the
1950s. Our present study is likewise motivated by a specific practical problem
at Mawson Lakes in South Australia relating to a pair of dams in tandem.
The mathematical analysis of dams has proved technically more difficult
than that of their discrete counterpart, queues. In order to deal with the
complexity of a tandem system, we treat a discretised version of the problem
and adopt the matrix-analytic methodology of Neuts and his school
(see [LR99, Neu89] for a modern exposition). The Neuts' methodology is well
436 P. Howlett et al.
suited for handling processes with a bivariate state space, here the contents
of the two dams.
A further new feature of this study is the incorporation of control. For
recent work on control in the context of a dam, see [Abd03] and the references
therein. The present article is preliminary and raises issues of both practical
and theoretical interest.
In Section 2 we formulate the problem in matrix-analytic terms and in
Section 3 provide a heuristic for the determination of an invariant probability
measure for the process. This depends on the existence of certain matrix
inverses. Section 4 sketches a purely algebraic procedure for establishing the
existence of these inverses. In Section 5 we show how this can be simplified and
systematised using a probabilistic analysis based on modern machinery of the
matrix-analytic approach. In Section 6 we describe briefly how these results
enable us to determine expected long-term overflow, which is needed for the
analysis of control procedures. We conclude in Section 7 with a discussion of
extensions of the ideas presented in the earlier sections.
2 Problem formulation
We assume a discrete state model and let the first and second components of
the state vector denote respectively the number of units of water in the first
and second dams at time t. We assume a stochastic intake to the capture dam,
where p_r denotes the probability that r units of water will enter the dam on
any given day, and a regular demand from the supply dam of 1 unit per day. To
begin we assume that p_r > 0 for all r = 0, 1, 2, . . . and we will also assume
that these probabilities do not depend on time. The first assumption is
reasonable in practice, but the latter assumption is certainly not reasonable
over an extended period of time. We revise these assumptions later in the
paper.
We consider a class of practical pumping policies where the pumping decision
depends only on the contents of the first dam. Choose an integer m ∈ [1, h]
and pump m units from the capture dam to the supply dam each day when
the capture dam contains at least m units. For an intake r there are two basic
transition patterns
• (z_1, 0) → (ζ_1, 0)
• (z_1, z_2) → (ζ_1, z_2 − 1)
where ζ_1 = min{z_1 + r, h}, for z_1 < m, and two basic transition patterns
• (z_1, 0) → (ζ_1, m)
• (z_1, z_2) → (ζ_1, ζ_2)
where ζ_1 = min{z_1 − m + r, h} and ζ_2 = min{z_2 − 1 + m, k}, for
z_1 ≥ m. These transitions have probability p_r. The variable m is the control
variable for a class of practical control policies, but in this paper we assume
m is fixed and suppress any notational dependence on m.
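The transition rules above translate directly into a one-day update. The following sketch simulates the controlled system; the intake distribution used in `simulate` is an illustrative assumption:

```python
import random

def step(z1, z2, r, m, h, k):
    # One day: intake r into the capture dam (capacity h), pump m units when
    # the capture dam holds at least m, unit demand on the supply dam (cap k).
    if z1 < m:                       # no pumping
        c1 = min(z1 + r, h)
        c2 = 0 if z2 == 0 else z2 - 1
    else:                            # pump m units
        c1 = min(z1 - m + r, h)
        c2 = m if z2 == 0 else min(z2 - 1 + m, k)
    return c1, c2

def simulate(days, m, h, k, probs, seed=1):
    # Illustrative daily intake distribution: P(intake = r) = probs[r].
    rng = random.Random(seed)
    z1 = z2 = 0
    for _ in range(days):
        r = rng.choices(range(len(probs)), probs)[0]
        z1, z2 = step(z1, z2, r, m, h, k)
    return z1, z2
```

Note the special case z_2 = 0 in the pumping branch: when the supply dam is empty the daily demand cannot be met, so it ends the day with exactly m units, matching the transition (z_1, 0) → (ζ_1, m).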
We now set up a suitable Markov chain to describe the process. In terms
of matrix-analytic machinery, it turns out to be more convenient to use the
ordered pair (z_2, z_1) for the state of the process rather than the seemingly
more natural (z_1, z_2). This we do for the remainder of the article. We now
order the states as
(0,0), . . . , (0,h), (1,0), . . . , (1,h), . . . , (k,0), . . . , (k,h).
The transition matrix P ∈ M^{(k+1)(h+1) × (k+1)(h+1)} is a block matrix,
with block rows and columns indexed by the levels 0, 1, . . . , m, . . . , k,
whose (h+1) × (h+1) blocks are either A, B or 0,
where A and B ∈ M^{(h+1) × (h+1)} have the block forms

A = [ A11  A12 ]        B = [  0    0  ]
    [  0    0  ],           [ B21  B22 ],

where

A11 = [ p_0  p_1  · · ·  p_{m-2}  p_{m-1} ]
      [  0   p_0  · · ·  p_{m-3}  p_{m-2} ]
      [  ·    ·              ·       ·    ]
      [  0    0   · · ·   p_0      p_1    ]
      [  0    0   · · ·    0       p_0    ]

and

A12 = [ p_m      p_{m+1}  · · ·  p_{h-1}  p_h       ]
      [ p_{m-1}  p_m      · · ·  p_{h-2}  p_{h-1}   ]
      [   ·        ·                ·       ·       ]
      [ p_1      p_2      · · ·  p_{h-m}  p_{h-m+1} ]

and

B21 = [ p_0  p_1  · · ·  p_{m-2}  p_{m-1} ]
      [  0   p_0  · · ·  p_{m-3}  p_{m-2} ]
      [  ·    ·              ·       ·    ]
      [  0    0   · · ·   p_0      p_1    ]
      [  0    0   · · ·    0       p_0    ]
      [  0    0   · · ·    0        0     ]
      [  ·    ·              ·       ·    ]
      [  0    0   · · ·    0        0     ]

and

B22 = [ p_m      p_{m+1}  · · ·  p_{h-1}  p_h     ]
      [ p_{m-1}  p_m      · · ·  p_{h-2}  p_{h-1} ]
      [   ·        ·                ·       ·     ]
      [  0        0       · · ·  p_m              ]
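The blocks can be assembled programmatically. In the sketch below the placement of the tail mass in the last column is an assumption, chosen so that A · 1 + B · 1 = 1 as used in the proof of Lemma 1 (a truncation-at-h convention):

```python
def build_AB(probs, m, h):
    # Build the (h+1)x(h+1) blocks: A governs days without pumping (rows
    # z1 < m), B governs days with pumping (rows z1 >= m).  The last column
    # collects the tail mass pbar(n) = sum of p_r for r >= n so that
    # A.1 + B.1 = 1 (assumed truncation convention).
    def p(r):
        return probs[r] if 0 <= r < len(probs) else 0.0
    def pbar(n):
        return sum(probs[max(n, 0):])
    A = [[0.0] * (h + 1) for _ in range(h + 1)]
    B = [[0.0] * (h + 1) for _ in range(h + 1)]
    for i in range(h + 1):
        for j in range(h + 1):
            if i < m:
                A[i][j] = pbar(h - i) if j == h else p(j - i)
            else:
                B[i][j] = pbar(h - i + m) if j == h else p(j - i + m)
    return A, B
```

For i < m row i of A reproduces the Toeplitz pattern of A11 and A12 above, and for i ≥ m row i of B reproduces B21 and B22.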
The vector π_k is a scalar multiple of the invariant probability measure for the
transition matrix V_k. We conclude that the invariant probability measure π
for the transition matrix P is unique if and only if the associated invariant
probability measure
π̂_k := π_k/(π_k · 1)
for the transition matrix V_k is uniquely defined. We have established the
following rudimentary result.
Theorem 1. If the sequence of matrices {V_i}
is well defined by the formulae (7)-(12) then there exists an invariant measure
π for the transition matrix P. The measure is not necessarily unique.
A11 · 1 < 1.
It follows that (I − A11)^{-1} is well defined and hence
(I − A)^{-1} = [ (I − A11)^{-1}   (I − A11)^{-1} A12 ]
               [       0                  I          ]
is also well defined. It is necessary to begin with an elementary but important
result. This result, and other later results in this section, have already been
established by Piantadosi [Pia04] but for convenience we present details of the
more elementary proofs to indicate our general method of argument.
Lemma 1. If p_r > 0 for all r = 0, 1, . . . then
(I − A)^{-1} B · 1 = 1  and  A^{m−1}(I − A)^{-1} B · 1 = A^{m−1} · 1 < 1,
and the matrix V_m = A[I − A^{m−1}(I − A)^{-1}B]^{-1} is well defined.
Proof. Note that A · 1 + B · 1 = 1 implies B · 1 = (I − A) · 1 and hence
(I − A)^{-1} B · 1 = 1.
Now
A^{m−1}(I − A)^{-1} B · 1 = A^{m−1} · 1 = [ A11^{m−2}[A11 · 1 + A12 · 1] ]  < 1.
                                          [              0              ]
Hence V_m = A[I − A^{m−1}(I − A)^{-1}B]^{-1} is well defined. □
Lemma 2. Σ_{i=1} · · · = 1, i.e. B · 1 = 1.
Proof. For details of the rather long and difficult proof we refer the reader to
Piantadosi [Pia04], where the notation dictates that the identities are described
and established in two parts as the (JP) identities of the first and second kind.
The complexity of these identities is masked in the current paper by notational
sophistication. □
5 Probabilistic analysis
In practice the matrix P can be expected to be irreducible. First we establish
the following simple sufficient condition for this to be the case.
Theorem 2. Suppose A, B have the forms displayed above and that k > m.
If
(i) m > 1 and
(ii) p_0, p_1, . . . , p_{h−1}, p_h > 0,
then the matrix P is irreducible.
Proof. We use the notation P_{(i,j),(r,s)} to refer to the element of the matrix
P describing the transition from state (i,j) to state (r,s), and we write
A = [a_{j,s}] and B = [b_{j,s}] to denote the individual elements of A and B.
To prove irreducibility, it suffices to show that, for any state (i,j), there is
a path of positive probability from state (k,h) to state (i,j) and a path of
positive probability from state (i,j) to state (k,h).
The former may be seen as follows. For i = k with h − m < j ≤ h, the
relevant one-step transition probability is positive by (ii), so there is a path
consisting of a single step. For i = k with 0 ≤ j ≤ h − m, there exists a
positive integer ℓ such that h − m < j + ℓ(m + 1) ≤ h.
One path of positive probability from (k,h) to (k,j) consists of the consecutive
steps
Finally, for i < k, one such path is obtained by passing from (k,h) to (k,0)
as above and then proceeding
We now consider passage from (i,j) to (k,h). For j = 0, (i,j) has one-step
access to (0,h) (if i = 0) or to (i − 1, h) (if i > 0), while for j > 0, (i,j) has
one-step access to (i + m, h) (if i = 0), to (i + m − 1, h) (if 0 < i < k − m + 1)
or to (k,h) (if k − m + 1 ≤ i ≤ k). Putting these results together shows that
each state (i,j) has a path of positive probability to (k,h).
By the results of the two previous paragraphs, the chain is irreducible. □
Next we derive invertibility results for some key (h + 1) × (h + 1) matrices.
While this can be effected purely in terms of matrix arguments, a shorter
derivation is available employing probabilistic arguments, based on successive
censorings of a Markov chain.
Theorem 3. Suppose conditions (i) and (ii) of Theorem 2 apply. Then there
exists a sequence {V_i}_{0≤i≤k} of (h + 1) × (h + 1) matrices defined by
equations (7), (8), (9), (10), (11) and (12). The matrices V_0, . . . , V_{k−1}
are invertible.
Proof. It suffices to show that the formulae (7), (8), (9) hold and that for
k ≥ m + 1 the formula (10) is valid. Let C_0 be a Markov chain of the same form
as P but with k replaced by K > 2k. By Theorem 2, C_0 is irreducible and
finite and so positive recurrent. Denote by C_i (1 ≤ i ≤ k) the Markov chain
formed by censoring out levels 0, 1, . . . , i − 1, that is, observing C_0 only when
it is in the levels i, i + 1, . . . , K. The chain C_i must also be irreducible and
positive recurrent. For 0 ≤ i ≤ k, denote by P_i the one-step transition matrix
of C_i and by Q_i its leading block. Then Q_i is the sub-stochastic one-step
transition matrix of a Markov chain D_i whose states form level i of C_0. Since
C_i is recurrent, the states of D_i must all be transient and so Σ_{n=0}^{∞} Q_i^n
is finite. Hence I − Q_i is invertible for 0 ≤ i ≤ k.
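The censoring construction used in the proof can be illustrated numerically. The sketch below computes the transition matrix of a chain watched only on a subset of states via the standard stochastic-complement formula S = P_KK + P_KT (I − P_TT)^{-1} P_TK; the example matrix is illustrative, not one of the C_i above:

```python
def mat_mul(X, Y):
    # Plain list-of-lists matrix product.
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def censor(P, keep):
    # Transition matrix of the chain observed only on the states in `keep`.
    # (I - P_TT)^{-1} is obtained from the Neumann series, valid because the
    # censored-out block P_TT is strictly substochastic here.
    drop = [i for i in range(len(P)) if i not in keep]
    PKK = [[P[i][j] for j in keep] for i in keep]
    PKT = [[P[i][j] for j in drop] for i in keep]
    PTT = [[P[i][j] for j in drop] for i in drop]
    PTK = [[P[i][j] for j in keep] for i in drop]
    n = len(drop)
    inv = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in inv]
    for _ in range(500):                     # sum of PTT^n, n = 0, 1, ...
        term = mat_mul(term, PTT)
        inv = [[inv[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    corr = mat_mul(mat_mul(PKT, inv), PTK)
    return [[PKK[i][j] + corr[i][j] for j in range(len(keep))]
            for i in range(len(keep))]
```

The censored matrix is again stochastic: probability that would have flowed through the removed states is routed through the (I − P_TT)^{-1} factor back into the kept states.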
We shall show that the matrices
V_i = A(I − Q_i)^{-1}   (0 ≤ i < k)
satisfy the required formulae, using (I − Q_i)^{-1} = Σ_{n=0}^{∞} Q_i^n.
Thus enumerating all paths yields
which provides (9). From similar enumerations of paths, the leading row of
P_{m+1} may be derived, giving
Q_{m+1} = W_{m,2} B.
Suppose these hold for some i satisfying m < i < k. Since the two leading
rows of Pi are
we have
Theorem 4. Suppose that conditions (i) and (ii) of Theorem 2 apply. Then
the probability vectors π_i satisfy the recurrence relations (6) and π_k is the
invariant measure of the matrix V_k defined by (12). The measure π is unique.
Proof. For i = 0, (6) follows from (1) and (7). For 0 < i < m, (6) is
immediate from (2) and (8). These two parts combine to provide (6).
The expected long-term overflow is
E = Σ_{i=0}^{k} Σ_{j=0}^{h} Σ_{r=0}^{∞} f[(i,j)|r] π_{i,j} p_r,
where π_{i,j} is the invariant probability of state (i,j) at level i and phase j and
f[(i,j)|r] is the overflow from state (i,j) when r units of stormwater enter the
system. Note that we have ignored pumping cost and other costs which are
likely to be factors in a real system.
Theorem 5. If p_0 > 0 and p_+ > 0 then there is at least one finite cycle with
non-zero invariant probability that includes all levels 0, 1, . . . , k of the second
dam. All states have access to this cycle in finite time with positive probability
and hence are either transient with invariant probability zero or else are part
of a single maximal cycle.
Proof (Outline). In this paper we have tried to look beyond a simply algebraic
view. For this reason we suggest an alternative proof. Let p_0 = δ > 0.
If p_+ > 0 then there is some r > m with p_r = ε > 0. Our argument here
assumes r > m. Choose p so that 0 < h − pm < m, choose s so that
(s + 1)r − (p + s)m > 0 and s(m − 1) + 1 > k, and t so that t > p + k, and
consider the elementary cycle
where the invariant probability π_i for level i is non-zero for all i = 0, . . . , k.
All transient states have zero probability and all states in the cycle have
non-zero probability. □
Observe that by adding together the separate equations (1), (2), (3), (4)
and (5) for the vectors π_0, . . . , π_k we obtain the equation
p S = p,  where  p = π_0 + · · · + π_k  and  S = A + B.
Indeed a little thought shows us that S is the transition matrix for the phase
j of the state vector. By analysing these transitions we can shed some light
on the structure of the full irreducible cycle for the original system.
We have another interesting result.
Theorem 6. If p_0 = δ > 0 and p_r = ε > 0 for some r > m and if
gcd(m, r) = 1 then for every phase j = 0, 1, 2, . . . , h we can find
non-negative integers p = p(j) and q = q(j) such that
pr − qm = j.
Proof (Outline). We suppose only that p_0 > 0 and p_r > 0 for some r > m.
In the following phase transition diagram we suppose that
r − m < m,  2r − 3m < m,
and note that the following phase transitions are possible for j with non-zero
probability:
0 → [0 ∪ r]
This result means that the unique irreducible cycle for the (i,j) chain
generated by P, which already includes all possible levels i ∈ [0, k], also
includes all possible phases j ∈ [0, h], although not necessarily all states (i,j).
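Theorem 6 can be checked numerically for small m and r. The search routine below is a hypothetical helper, not part of the paper; it looks for the non-negative integers p, q with pr − qm = j:

```python
import math

def phase_representation(m, r, j, limit=1000):
    # Search non-negative p, q with p*r - q*m == j.  Theorem 6 guarantees a
    # solution whenever gcd(m, r) = 1.
    for p in range(limit):
        q, rem = divmod(p * r - j, m)
        if rem == 0 and q >= 0:
            return p, q
    return None
```

For example, with m = 2 and r = 5 (coprime) every phase j is reachable, while with m = 2 and r = 4 the odd phases are not, since pr − qm is then always even.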
Φ_{[t]}(x([t])) = x([t]),
with
x^(t+1) = x^(t) P([t])
for each t = 1, 2, . . . , should satisfy
x^(t) → x([t])
as t → ∞. Because the contraction operates in the same structural way for
every value of [t] we expect that convergence will occur quite seamlessly. This
is demonstrated in the following simple example. There is no reason to expect
the convergence to be slower in the case of a product of a larger number of
matrices.
Example 1. Let [t] = (t − 1) mod 2 + 1 with R(1) = P(1)P(2) and R(2) =
P(2)P([3]) = P(2)P(1), where
P([t]) = [ A([t])    0      B([t])    0    ]
         [ A([t])    0      B([t])    0    ]
         [   0     A([t])     0     B([t]) ]
         [   0       0      A([t])  B([t]) ]
and
A(2) = [ 0.45 0.27 0.13 0.15 ]      B(2) = [ 0    0    0    0    ]
       [ 0    0.45 0.27 0.28 ]             [ 0    0    0    0    ]
       [ 0    0    0    0    ]             [ 0.45 0.27 0.13 0.15 ]
       [ 0    0    0    0    ]             [ 0    0.45 0.27 0.28 ]
Using MATLAB we calculate
p(1) = (0.2, 0.4, 0.2, 0.2)
and so we set
x^(1) = (1/4)(p(1), p(1), p(1), p(1))
      = (.0500, .1000, .0500, .0500, .0500, .1000, .0500, .0500,
         .0500, .1000, .0500, .0500, .0500, .1000, .0500, .0500)
and calculate
x^(2) = (.0500, .1250, .0625, .0625, .0250, .0625, .0312, .0312,
         .0750, .1375, .0687, .0687, .0500, .0750, .0375, .0375)
x^(3) = (.0338, .1046, .0604, .0638, .0338, .0821, .0469, .0498,
         .0647, .1148, .0643, .0688, .0478, .0765, .0425, .0457)
x^(4) = (.0338, .1103, .0551, .0551, .0323, .0735, .0368, .0368,
         .0775, .1338, .0669, .0669, .0534, .0839, .0420, .0420)
Thus we have
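The alternating iteration x^(t+1) = x^(t) P([t]) is easy to reproduce. Since the matrices P(1) and P(2) of the example are not fully shown here, the sketch below uses small illustrative stochastic matrices in their place:

```python
def step_dist(x, P):
    # One step of x <- x P for a probability row vector x.
    return [sum(x[i] * P[i][j] for i in range(len(x)))
            for j in range(len(P[0]))]

def alternate(x, P1, P2, cycles=300):
    # x^(t+1) = x^(t) P([t]) with [t] alternating 1, 2, 1, 2, ...
    for _ in range(cycles):
        x = step_dist(x, P1)
        x = step_dist(x, P2)
    return x

# Illustrative stochastic matrices (assumed; not the P(1), P(2) above).
P1 = [[0.9, 0.1], [0.2, 0.8]]
P2 = [[0.5, 0.5], [0.4, 0.6]]
```

After enough cycles the iterate observed at even times is a fixed point of the product R(1) = P(1)P(2), which is the convergence behaviour the example illustrates.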
References
[Abd03] Abdel-Hameed, M.: Optimal control of dams using P^M_{λ,τ} policies
and penalty cost. Mathematical and Computer Modelling, 38, 1119-1123
(2003)
[Gan69] Gani, J.: Recent advances in storage and flooding theory. Advances in
Applied Probability, 1, 90-110 (1969)
[KT65] Karlin, S., Taylor, H.M.: A First Course in Stochastic Processes. Wiley
and Sons, New York (1965)
[LR99] Latouche, G., Ramaswami, V.: Introduction to Matrix Analytic Methods
in Stochastic Modeling. SIAM (1999)
[Mor54] Moran, P.A.P.: A probability theory of dams and storage systems.
Australian Journal of Applied Science, 5, 116-124 (1954)
[Mor59] Moran, P.A.P.: The Theory of Storage. Wiley and Sons, New York (1959)
[Neu89] Neuts, M.F.: Structured Stochastic Matrices of M/G/1 Type and Their
Applications. Marcel Dekker, Inc. (1989)
[Pia04] Piantadosi, J.: Optimal Policies for Management of Urban Stormwater.
PhD Thesis, University of South Australia (2004)
[Yeo74] Yeo, G.F.: A finite dam with exponential variable release. Journal of
Applied Probability, 11, 122-133 (1974)
[Yeo75] Yeo, G.F.: A finite dam with variable release rate. Journal of Applied
Probability, 12, 205-211 (1975)