
Continuous Optimization - Vaithilingam Jeyakumar, Alexander Rubinov


CONTINUOUS OPTIMIZATION

Current Trends and Modern Applications


Applied Optimization

VOLUME 99

Series Editors:

Panos M. Pardalos
University of Florida, U.S.A.

Donald W. Hearn
University of Florida, U.S.A.
CONTINUOUS OPTIMIZATION
Current Trends and Modern Applications

Edited by

VAITHILINGAM JEYAKUMAR
University of New South Wales, Sydney, Australia

ALEXANDER RUBINOV
University of Ballarat, Ballarat, Australia

Springer
Library of Congress Cataloging-in-Publication Data

Continuous optimization : current trends and modern applications / edited by
Vaithilingam Jeyakumar, Alexander Rubinov.
p. cm. (Applied optimization ; v. 99)
Includes bibliographical references.
ISBN-13: 978-0-387-26769-2 (acid-free paper)
ISBN-10: 0-387-26769-7 (acid-free paper)
ISBN-13: 978-0-387-26771-5 (ebook)
ISBN-10: 0-387-26771-9 (ebook)
1. Functions, Continuous. 2. Programming (Mathematics). 3. Mathematical models.
I. Jeyakumar, Vaithilingam. II. Rubinov, Aleksandr Moiseevich. III. Series.

QA331 .C657 2005


515'.222—dc22
2005049900

AMS Subject Classifications: 65Kxx, 90B, 90Cxx, 62H30

© 2005 Springer Science+Business Media, Inc.


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not
identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to
proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1 SPIN 11399797

springeronline.com
Contents

Preface XIII

List of Contributors XV

Part I Surveys

Linear Semi-infinite Optimization: Recent Advances


Miguel A. Goberna 3
1 Introduction 3
2 Linear semi-infinite systems 5
3 Applications 8
4 Numerical methods 11
5 Perturbation analysis 13
References 17
Some Theoretical Aspects of Newton's Method for
Constrained Best Interpolation
Hou-Duo Qi 23
1 Introduction 23
2 Constrained Interpolation in Hilbert Space 26
3 Nonsmooth Functions and Equations 31
4 Newton's Method and Convergence Analysis 33
4.1 Newton's Method 33
4.2 Splitting and Regularity 36
4.3 Semismoothness 39
4.4 Application to Inequality Constraints 42
4.5 Globalization 44
5 Open Problems 45
References 46
Optimization Methods in Direct and Inverse Scattering


Alexander G. Ramm, Semion Gutman 51
1 Introduction 52
2 Identification of small subsurface inclusions 54
2.1 Problem description 54
2.2 Hybrid Stochastic-Deterministic Method(HSD) 56
2.3 Description of the HSD Method 59
2.4 Numerical results 60
3 Identification of layers in multilayer particles 63
3.1 Problem Description 63
3.2 Best Fit Profiles and Local Minimization Methods 65
3.3 Global Minimization Methods 68
4 Potential scattering and the Stability Index method 70
4.1 Problem description 70
4.2 Stability Index Minimization Method 72
4.3 Numerical Results 75
5 Inverse scattering problem with fixed-energy data 80
5.1 Problem description 80
5.2 Ramm's inversion method for exact data 80
5.3 Discussion of the inversion method which uses the DN map 84
6 Obstacle scattering by the Modified Rayleigh Conjecture (MRC)
method 86
6.1 Problem description 86
6.2 Direct scattering problems and the Rayleigh conjecture 89
6.3 Numerical Experiments 90
6.4 Conclusions 94
7 Support Function Method for inverse obstacle scattering problems 95
7.1 Support Function Method (SFM) 95
7.2 Numerical results for the Support Function Method 98
8 Analysis of a Linear Sampling method 102
References 105
On Complexity of Stochastic Programming Problems
Alexander Shapiro, Arkadi Nemirovski 111
1 Introduction 111
2 Complexity of two-stage stochastic programs 114
3 What is easy and what is difficult in stochastic programming? 122
3.1 What is difficult in the two-stage case? 128
3.2 Complexity of multi-stage stochastic problems 129
4 Some novel approaches 133
4.1 Tractable approximations of chance constraints 133
4.2 Multistage Stochastic Programming in linear decision rules 140
References 144
Nonlinear Optimization in Modeling Environments: Software


Implementations for Compilers, Spreadsheets, Modeling
Languages, and Integrated Computing Systems
János D. Pintér 147
1 Introduction 147
2 A solver suite approach to practical global optimization 152
3 Modeling systems and user demands 154
4 Software implementation examples 156
4.1 LGO solver system with a text I/O interface 156
4.2 LGO integrated development environment 157
4.3 LGO solver engine for Excel users 159
4.4 MathOptimizer Professional 162
4.5 Maple Global Optimization Toolbox 165
5 Further Applications 168
6 Conclusions 168
References 169
Supervised Data Classification via Max-min Separability
Adil M. Bagirov, Julien Ugon 175
1 Introduction 175
2 Preliminaries 177
2.1 Linear separability 177
2.2 Bilinear separability 178
2.3 Polyhedral separability 179
3 Max-min separability 180
3.1 Definition and properties 181
3.2 Error function 187
4 Minimization of the error function 191
4.1 Statement of problem 192
4.2 Differential properties of the objective function 193
4.3 Discrete gradient method 195
5 Results of numerical experiments 200
5.1 Supervised data classification via max-min separability 200
5.2 Results on small and middle size datasets 201
5.3 Results on larger datasets 203
6 Conclusions and further work 204
References 205
A Review of Applications of the Cutting Angle Methods
Gleb Beliakov 209
1 Introduction 209
2 Support functions and lower approximations 210
2.1 Basic definitions 210
2.2 Choices of support functions 212
2.3 Relation to Voronoi diagrams 215
3 Optimization: the Cutting Angle method 217


3.1 Problem formulation 217
3.2 The Cutting Angle algorithm 218
3.3 Enumeration of local minima 219
3.4 Numerical experiments 222
3.5 Applications 223
4 Random variate generation: acceptance/ rejection 224
4.1 Problem formulation 224
4.2 Log-concave densities 226
4.3 Univariate Lipschitz densities 227
4.4 Lipschitz densities in $\mathbb{R}^n$ 230
4.5 Description of the algorithm 231
4.6 Numerical experiments 233
5 Scattered data interpolation: Lipschitz approximation 235
5.1 Problem formulation 235
5.2 Best uniform approximation 237
5.3 Description of the algorithm 238
5.4 Numerical experiments 240
6 Conclusion 244
References 244

Part II Theory and Numerical Methods

A Numerical Method for Concave Programming Problems


Altannar Chinchuluun, Enkhbat Rentsen, Panos M. Pardalos 251
1 Introduction 251
2 Global Optimality Condition 252
3 Approximation Techniques of the Level Set 254
4 Algorithms and their Convergence 262
5 Numerical Examples 270
6 Conclusions 272
References 272
Convexification and Monotone Optimization
Xiaoling Sun, Jianling Li, Duan Li 275
1 Introduction 275
2 Monotonicity and convexity 276
3 Monotone optimization and concave minimization 281
3.1 Equivalence to concave minimization 281
3.2 Outer approximation algorithm for concave minimization
problems 281
4 Polyblock outer approximation method 283
5 A hybrid method 286
6 Conclusions 288
7 Acknowledgement 289
References 289
Generalized Lagrange Multipliers for Nonconvex Directionally
Differentiable Programs
Nguyen Dinh, Gue Myung Lee, Le Anh Tuan 293
1 Introduction and Preliminaries 293
2 Generalized Lagrange Multipliers 296
2.1 Necessary conditions for optimality 296
2.2 Sufficient condition for optimality 301
3 Special Cases and Applications 304
3.1 Problems with convexlike directional derivatives 304
3.2 Composite nonsmooth programming with Gateaux
differentiability 305
3.3 Quasidifferentiable problems 309
4 Directionally Differentiable Problems with DSL-approximates 314
References 317
Slice Convergence of Sums of Convex functions in Banach
Spaces and Saddle Point Convergence
Robert Wenczel, Andrew Eberhard 321
1 Introduction 321
2 Preliminaries 323
3 A Sum Theorem for Slice Convergence 327
4 Saddle-point Convergence in Fenchel Duality 336
References 341
Topical Functions and their Properties in a Class of Ordered
Banach Spaces
Hossein Mohebi 343
1 Introduction 343
2 Preliminaries 344
3 Plus-Minkowski gauge and plus-weak Pareto point for a downward
set 347
4 $X_\varphi$-subdifferential of a topical function 349
5 Fenchel-Moreau conjugates with respect to $\varphi$ 353
6 Conjugate of type Lau with respect to $\psi$ 357
References 360

Part III Applications

Dynamical Systems Described by Relational Elasticities with


Applications
Musa Mammadov, Alexander Rubinov, John Yearwood 365
1 Introduction 365
2 Relationship between two variables: relational elasticity 367


3 Some examples for calculating relational elasticities 369
4 Dynamical systems 370
5 Classification Algorithm based on a dynamical systems approach 374
6 Algorithm for global optimization 377
7 Results of numerical experiments 380
8 Conclusions and future work 381
References 383
Impulsive Control of a Sequence of Rumour Processes
Charles Pearce, Yalcin Kaya, Selma Belen 387
1 Introduction 387
2 Single-Rumour Process and Preliminaries 389
3 Scenario 1 391
4 Monotonicity of ^ 395
5 Convexity of ^ 399
6 Scenario 2 402
7 Comparison of Scenarios 405
References 406
Minimization of the Sum of Minima of Convex Functions and
Its Application to Clustering
Alexander Rubinov, Nadejda Soukhoroukova, Julien Ugon 409
1 Introduction 409
2 A class of sum-min functions 410
2.1 Functions represented as the sum of minima of convex functions 410
2.2 Some properties of functions belonging to $\mathcal{F}$ 411
3 Examples 411
3.1 Cluster functions and generalized cluster functions 412
3.2 Bradley-Mangasarian approximation of a finite set 412
3.3 Skeleton of a finite set of points 413
3.4 Illustrative examples 414
4 Minimization of sum-min functions belonging to class $\mathcal{F}$ 415
5 Minimization of generalized cluster function 417
5.1 Construction of generalized cluster functions 417
5.2 Initial points 418
6 Numerical experiments with generalized cluster function 419
6.1 Datasets 419
6.2 Numerical experiments: description 419
6.3 Results of numerical experiments 420
7 Skeletons 424
7.1 Introduction 424
7.2 Numerical experiments: description 427
7.3 Numerical experiments: results 429
7.4 Other experiments 430
8 Conclusions 430
8.1 Optimization 430
8.2 Clustering 431
References 433
Analysis of a Practical Control Policy for Water Storage in
Two Connected Dams
Phil Howlett, Julia Piantadosi, Charles Pearce 435
1 Introduction 435
2 Problem formulation 436
3 Intuitive calculation of the invariant probability 438
4 Existence of the inverse matrices 440
5 Probabilistic analysis 441
6 The expected long-term overflow 445
7 Extension of the fundamental ideas 445
References 450
Preface

Continuous optimization is the study of problems in which we wish to optimize
(either maximize or minimize) a continuous function (usually of several
variables) often subject to a collection of restrictions on these variables. It has
its foundation in the development of calculus by Newton and Leibniz in the
17th century. Nowadays, continuous optimization problems are widespread in
the mathematical modelling of real world systems for a very broad range of
applications.
Solution methods for large multivariable constrained continuous optimiza-
tion problems using computers began with the work of Dantzig in the late
1940s on the simplex method for linear programming problems. Recent re-
search in continuous optimization has produced a variety of theoretical devel-
opments, solution methods and new areas of applications. It is impossible to
give a full account of the current trends and modern applications of contin-
uous optimization. It is our intention to present a number of topics in order
to show the spectrum of current research activities and the development of
numerical methods and applications.
The collection of 16 refereed papers in this book covers a diverse number
of topics and provides a good picture of recent research in continuous opti-
mization. The first part of the book presents substantive survey articles in
a number of important topic areas of continuous optimization. Most of the
papers in the second part present results on the theoretical aspects as well as
numerical methods of continuous optimization. The papers in the third part
are mainly concerned with applications of continuous optimization.
We feel that this book will be an additional valuable source of informa-
tion to faculty, students, and researchers who use continuous optimization to
model and solve problems. We would like to take the opportunity to thank
the authors of the papers, the anonymous referees and the colleagues who
have made direct or indirect contributions in the process of writing this book.
Finally, we wish to thank Fusheng Bai for preparing the camera-ready version
of this book and John Martindale and Robert Saley for their assistance in
producing this book.

Sydney and Ballarat, April 2005
Vaithilingam Jeyakumar
Alexander Rubinov
List of Contributors

Adil M. Bagirov
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353
Australia
a.bagirov@ballarat.edu.au

Selma Belen
School of Mathematics
The University of Adelaide
Adelaide, SA, 5005
Australia
sbelen@ankara.baskent.edu.tr

Gleb Beliakov
School of Information Technology
Deakin University
221 Burwood Hwy, Burwood, 3125
Australia
gleb@deakin.edu.au

Altannar Chinchuluun
Department of Industrial and Systems Engineering
University of Florida
303 Weil Hall, Gainesville, FL, 32611
USA
altannar@ufl.edu

Nguyen Dinh
Department of Mathematics-Informatics
Ho Chi Minh City University of Pedagogy
280 An Duong Vuong St., District 5, HCM city
Vietnam
ndinh@hcmup.edu.vn

Andrew Eberhard
Department of Mathematics
Royal Melbourne University of Technology
Melbourne, 3001
Australia
andy.eb@rmit.edu.au

Miguel A. Goberna
Dep. de Estadistica e Investigacion Operativa
Universidad de Alicante
Spain
mgoberna@ua.es

Semion Gutman
Department of Mathematics
University of Oklahoma
Norman, OK 73019
USA
sgutman@ou.edu

Phil Howlett
Centre for Industrial and Applied Mathematics
University of South Australia
Mawson Lakes, SA 5095
Australia
phil.howlett@unisa.edu.au

Yalcin Kaya
School of Mathematics and Statistics
University of South Australia
Mawson Lakes, SA, 5095
Australia;
Departamento de Sistemas e Computação
Universidade Federal do Rio de Janeiro
Rio de Janeiro
Brazil
Yalcin.Kaya@unisa.edu.au

Duan Li
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong
P.R. China
dli@se.cuhk.edu.hk

Jianling Li
Department of Mathematics
Shanghai University
Shanghai 200436
P.R. China;
College of Mathematics and Information Science
Guangxi University
Nanning, Guangxi 530004
P.R. China
[email protected]

Gue Myung Lee
Division of Mathematical Sciences
Pukyong National University
599 - 1, Daeyeon-3Dong, Nam-Gu, Pusan 608 - 737
Korea
gmlee@pknu.ac.kr

Musa Mammadov
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353
Australia
m.mammadov@ballarat.edu.au

Hossein Mohebi
Department of Mathematics
Shahid Bahonar University of Kerman
Kerman
Iran
hmohebi@mail.uk.ac.ir

Arkadi Nemirovski
Technion - Israel Institute of Technology
Haifa 32000
Israel
nemirovs@ie.technion.ac.il

Panos M. Pardalos
Department of Industrial and Systems Engineering
University of Florida
303 Weil Hall, Gainesville, FL, 32611, USA
pardalos@ufl.edu

Charles Pearce
School of Mathematics
The University of Adelaide
Adelaide, SA 5005
Australia
cpearce@maths.adelaide.edu.au

Julia Piantadosi
Centre for Industrial and Applied Mathematics
University of South Australia
Mawson Lakes, SA, 5095
Australia
Julia.piantadosi@unisa.edu.au

Janos D. Pinter
Pinter Consulting Services, Inc.
129 Glenforest Drive, Halifax, NS, B3M 1J2
Canada
jdpinter@hfx.eastlink.ca

Hou-Duo Qi
School of Mathematics
The University of Southampton, Highfield
Southampton SO17 1BJ
Great Britain
hdqi@soton.ac.uk

Alexander G. Ramm
Department of Mathematics
Kansas State University
Manhattan, Kansas 66506-2602
USA
ramm@math.ksu.edu

Enkhbat Rentsen
Department of Mathematical Modeling
School of Mathematics and Computer Science
National University of Mongolia
Ulaanbaatar
Mongolia
[email protected]

Alexander Rubinov
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353
Australia
a.rubinov@ballarat.edu.au

Alexander Shapiro
Georgia Institute of Technology
Atlanta, Georgia 30332-0205
USA
ashapiro@isye.gatech.edu

Nadejda Soukhoroukova
CIAO, School of Information Technology and Mathematical Sciences,
University of Ballarat
Ballarat, VIC 3353
Australia
n.soukhoroukova@ballarat.edu.au

Xiaoling Sun
Department of Mathematics
Shanghai University
Shanghai 200444
P. R. China
xlsun@staff.shu.edu.cn

Le Anh Tuan
Ninh Thuan College of Pedagogy
Ninh Thuan
Vietnam
latuan02@yahoo.com

Julien Ugon
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353
Australia
[email protected]

Robert Wenczel
Department of Mathematics
Royal Melbourne University of Technology
Melbourne, VIC 3001
Australia
robert.wenczel@rmit.edu.au

John Yearwood
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353
Australia
j.yearwood@ballarat.edu.au

Part I

Surveys
Linear Semi-infinite Optimization: Recent
Advances

Miguel A. Goberna

Dep. de Estadistica e Investigacion Operativa
Universidad de Alicante
Spain
mgoberna@ua.es

Summary. Linear semi-infinite optimization (LSIO) deals with linear optimization
problems in which either the dimension of the decision space or the number of
constraints (but not both) is infinite. This paper overviews the works on LSIO published
after 2000 with the purpose of identifying the most active research fields, the main
trends in applications, and the more challenging open problems. After a brief in-
troduction to the basic concepts in LSIO, the paper surveys LSIO models arising
in mathematical economics, game theory, probability and statistics. It also reviews
outstanding real applications of LSIO in semidefinite programming, telecommunica-
tions and control problems, in which numerical experiments are reported. In almost
all these applications, the LSIO problems have been solved by means of ad hoc
numerical methods, and this suggests that either the standard LSIO numerical ap-
proaches are not well-known or they do not satisfy the users' requirements. From the
theoretical point of view, the research during this period has been mainly focused on
the stability analysis of different objects associated with the primal problem (only
the feasible set in the case of the dual). Sensitivity analysis in LSIO remains an open
problem.

2000 MR Subject Classification. Primary: 90C34, 90C05; Secondary: 15A39, 49K40.

Key words: semi-infinite optimization, linear inequality systems

1 Introduction
Linear semi-infinite optimization (LSIO) deals with linear optimization
problems such that either the set of variables or the set of constraints (but not
both) is infinite. In particular, LSIO deals with problems of the form

$$(P) \quad \operatorname{Inf}\ c'x \quad \text{s.t.} \quad a_t'x \ge b_t \ \text{ for all } t \in T,$$

where $T$ is an infinite index set, $c \in \mathbb{R}^n$, $a : T \to \mathbb{R}^n$, and
$b : T \to \mathbb{R}$. Such problems are called primal. Haar's dual problem of $(P)$ is

$$(D) \quad \operatorname{Sup} \sum_{t \in T} \lambda_t b_t \quad \text{s.t.} \quad \sum_{t \in T} \lambda_t a_t = c, \ \lambda \in \mathbb{R}_+^{(T)},$$

where $\mathbb{R}_+^{(T)}$ denotes the positive cone in the space of generalized finite
sequences $\mathbb{R}^{(T)}$ (the linear space of all the functions $\lambda : T \to \mathbb{R}$
such that $\lambda_t = 0$ for all $t \in T$ except maybe for a finite number of indices).
Other dual LSIO problems can be associated with $(P)$ in particular cases, e.g., if $T$
is a compact Hausdorff topological space and $a$ and $b$ are continuous functions, then
the continuous dual problem of $(P)$ is

$$(D_0) \quad \operatorname{Sup} \int_T b_t \, \mu(dt) \quad \text{s.t.} \quad \int_T a_t \, \mu(dt) = c, \ \mu \in C_+'(T),$$

where $C_+'(T)$ represents the cone of nonnegative regular Borel measures on $T$
($\mathbb{R}_+^{(T)}$ can be seen as the subset of $C_+'(T)$ formed by the nonnegative atomic
measures). The value of all these dual problems is less than or equal to the value
of $(P)$, and equality holds under certain conditions involving either the
properties of the constraint system $\sigma = \{a_t'x \ge b_t, \ t \in T\}$ or some relationship
between $c$ and $\sigma$. Replacing the linear functions in $(P)$ by convex functions we
obtain a convex semi-infinite optimization (CSIO) problem. Many results and
methods for ordinary linear optimization (LO) have been extended to LSIO,
usually assuming that the linear semi-infinite system (LSIS) $\sigma$ satisfies certain
properties. In the same way, LSIO theory and methods have been extended
to CSIO and even to nonlinear semi-infinite optimization (NLSIO).
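As a small numerical illustration of the pair $(P)$, $(D)$, the sketch below checks weak duality on a toy instance that is not taken from the text: $T = [0, 2\pi]$, $a_t = (-\cos t, -\sin t)'$, $b_t = -1$ and $c = (1, 1)'$, so that the feasible set is the closed unit disc. All names (`a`, `b`, `dual_feasible`, and so on) are assumptions made for the example; an element of $\mathbb{R}_+^{(T)}$ is stored as a dictionary holding its finitely many nonzero weights.

```python
import math

C = (1.0, 1.0)  # objective vector c of the illustrative instance


def a(t):
    """Coefficient vector a_t of the toy system (an assumption, not from the text)."""
    return (-math.cos(t), -math.sin(t))


def b(t):
    """Right-hand side b_t of the toy system."""
    return -1.0


def primal_feasible(x, tol=1e-9):
    # For this instance min_t (a_t'x - b_t) = 1 - ||x||, so feasibility
    # over the infinite index set reduces to ||x|| <= 1.
    return math.hypot(x[0], x[1]) <= 1.0 + tol


def primal_value(x):
    return C[0] * x[0] + C[1] * x[1]


def dual_feasible(lam, tol=1e-9):
    """lam maps finitely many indices t to weights lambda_t (zero elsewhere)."""
    if any(w < 0.0 for w in lam.values()):
        return False
    combo = [sum(w * a(t)[i] for t, w in lam.items()) for i in range(2)]
    return all(abs(combo[i] - C[i]) <= tol for i in range(2))


def dual_value(lam):
    return sum(w * b(t) for t, w in lam.items())


# Two dual feasible generalized finite sequences and one primal feasible point.
lam_two_atoms = {math.pi: 1.0, 1.5 * math.pi: 1.0}
lam_one_atom = {1.25 * math.pi: math.sqrt(2.0)}
x_feasible = (-0.6, -0.8)
```

In this instance every dual feasible value stays below every primal feasible value, as weak duality requires, and the one-atom multiplier attains the primal optimal value $-\sqrt{2}$.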
We denote by $F$, $F^*$ and $v(P)$ the feasible set, the optimal set and the value
of $(P)$, respectively (the same notation will be used for NLSIO problems). The
boundary and the set of extreme points of $F$ will be denoted by $B$ and $E$,
respectively. We also represent with $\Lambda$, $\Lambda^*$ and $v(D)$ the corresponding objects
of $(D)$. We also denote by $F$ the solution set of $\sigma$. For the convex analysis
concepts we adopt a standard notation (as in [GL98]).
At least three reasons justify the interest of the optimization community
in LSIO. First, for its many real life and modeling applications. Second, for
providing nontrivial but still tractable optimization problems on which it is
possible to check more general theories and methods. Finally, LSIO can be
seen as a theoretical model for large scale LO problems.
Section 2 deals with LSIS theory, i.e., with existence theorems (i.e.,
characterizations of $F \ne \emptyset$) and the properties of the main families of LSISs in the
LSIO context. The main purpose of this section is to establish a theoretical
frame for the next sections.
Section 3 surveys recent applications of LSIO in a variety of fields. In fact,
LSIO models arise naturally in different contexts, providing theoretical tools
for a better understanding of scientific and social phenomena. On the other
hand, LSIO methods can be a useful tool for the numerical solution of difficult
problems. We shall consider, in particular, the connection between LSIO and
semidefinite programming (SDP).
Section 4 reviews the latest contributions to LSIO numerical methods. We
shall also mention some CSIO methods insofar as they can be applied, in
particular, to linear problems.
Finally, Section 5 deals with the perturbation analysis of LSIO problems.
In fact, in many applications, due to either measurement errors or rounding
errors occurring during the computation process, the nominal data (represented
by the triple $(a, b, c)$) are replaced in practice by approximate data. Stability
results make it possible to check whether small perturbations of the data preserve
desirable properties of the main objects (such as the nonemptiness of $F$, $F^*$, $\Lambda$ and $\Lambda^*$,
or the boundedness of $v(P)$ and $v(D)$) and, in the affirmative case, to
know whether small perturbations provoke small variations of these objects.
Sensitivity results inform about the variation of the value of the perturbed
primal and dual problems. Sections 2 and 5 can be seen as updating the
last survey paper on LSIO theory ([GL02]), although for the sake of brevity
certain topics are not considered here, e.g., excess of information phenomena
in LSIO ([GJR01, GJM03, GJM05]), duality in LSIO ([KZ01, Sha01, Sha04]),
inexact LSIO ([GBA05]), etc.

2 Linear semi-infinite systems


Most of the information on $\sigma$ is captured by its characteristic cone,

$$K = \operatorname{cone}\left\{ \binom{a_t}{b_t}, \ t \in T; \ \binom{0_n}{-1} \right\}.$$

The reference cone of $\sigma$, $\operatorname{cl} K$, characterizes the consistency of $\sigma$ (by the
condition $\binom{0_n}{1} \notin \operatorname{cl} K$) as well as the halfspaces containing its solution set,
$F$ (if it is nonempty): $a'x \ge b$ is a consequence of $\sigma$ if and only if $\binom{a}{b} \in \operatorname{cl} K$
(nonhomogeneous Farkas Lemma). Thus, if $F_1$ and $F_2$ ($K_1$ and $K_2$) are the
solution sets (the characteristic cones, respectively) of the consistent systems
$\sigma_1$ and $\sigma_2$, then $F_1 \subset F_2$ if and only if $\operatorname{cl} K_2 \subset \operatorname{cl} K_1$ (this characterization of
set containment is useful in large scale knowledge-based data classification,
see [Jey03] and [GJD05]). All these results have been extended from LSISs to
linear systems containing strict inequalities ([GJR03, GR05]) and to convex
systems possibly containing strict inequalities ([GJD05]). On the other hand,
since $F_1 = F_2$ if and only if $\operatorname{cl} K_2 = \operatorname{cl} K_1$, there exists a one-to-one
correspondence between closed convex sets in $\mathbb{R}^n$ and closed convex cones in $\mathbb{R}^{n+1}$
containing $\binom{0_n}{-1}$ (the reference cone of their corresponding linear
representations). Thus many families of closed convex sets have been characterized by
means of the corresponding properties of their reference cones
([GJR02]). If the index set in $\sigma$ depends on the variable $x$, as happens in
generalized semi-infinite optimization (GSIO), $F$ may be nonclosed and even
nonconnected ([RS01]).
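The nonhomogeneous Farkas Lemma can be checked by hand on a deliberately simple finite system. The system below is an illustrative assumption, not one taken from the text; its characteristic cone is finitely generated and therefore closed.

```latex
% Illustrative system in R^2 (an assumption): sigma = { x_1 >= 0, x_2 >= 0 }.
\[
K = \operatorname{cone}\left\{
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
\right\} = \operatorname{cl} K .
\]
% Consistency: (0, 0, 1)' is not in K, since the third coordinate of any
% element of K is nonpositive.
% The inequality x_1 + x_2 >= -1 is a consequence of sigma:
\[
\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}
= 1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix} \in K .
\]
% The inequality x_1 >= 1 is not a consequence of sigma: (1, 0, 1)' would
% need a negative multiple of (0, 0, -1)', so it is not in K; indeed x = 0
% solves sigma and violates x_1 >= 1.
```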
Let us recall the definition of the main classes of consistent LSIS (which
are analyzed in Chapter 5 of [GL98]).
$\sigma$ is said to be continuous (analytic, polynomial) if $T$ is a compact
Hausdorff space (a compact interval, respectively) and the coefficients are
continuous (analytic, polynomial, respectively) on $T$. Obviously,

$$\sigma \text{ polynomial} \Rightarrow \sigma \text{ analytic} \Rightarrow \sigma \text{ continuous}.$$

In order to define the remaining three classes of LSISs we associate with
$x \in F$ two convex cones. The cone of feasible directions at $x$ is

$$D(F; x) = \{ d \in \mathbb{R}^n \mid \exists \theta > 0, \ x + \theta d \in F \}$$

and the active cone at $x$ is

$$A(x) := \operatorname{cone} \{ a_t \mid a_t'x = b_t, \ t \in T \}$$

(less restrictive definitions of active cone are discussed in [GLT03b] and
[GLT03c]).
$\sigma$ is Farkas-Minkowski (FM) if every consequence of $\sigma$ is a consequence of a
finite subsystem (i.e., $K$ is closed). $\sigma$ is locally polyhedral (LOP) if
$D(F; x) = A(x)^{\circ}$ for all $x \in F$. Finally, $\sigma$ is locally Farkas-Minkowski (LFM) if every
consequence of $\sigma$ binding at a certain point of $F$ is a consequence of a finite
subsystem (i.e., $D(F; x)^{\circ} = A(x)$ for all $x \in F$). We have

$$\sigma \text{ continuous \& Slater c.q.} \Rightarrow \sigma \text{ FM} \Rightarrow \sigma \text{ LFM} \Leftarrow \sigma \text{ LOP}.$$

The statement of two basic theorems and the sketch of the main numerical
approaches will show the crucial role played by the above families of LSISs, as
constraint qualifications, in LSIO theory and methods (see [GL98] for more
details).
Duality theorem: if $\sigma$ is FM and $F \ne \emptyset \ne \Lambda$, then $v(D) = v(P)$ and $(D)$
is solvable.
Optimality theorem: if $x \in F$ satisfies the KKT condition $c \in A(x)$,
then $x \in F^*$, and the converse is true if $\sigma$ is LFM.
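Both theorems can be verified by hand on a toy continuous system. The instance below (the closed unit disc in $\mathbb{R}^2$ with $c = (1, 1)'$) is an illustrative assumption rather than an example from the text; it satisfies the Slater condition at the origin, so by the chain of implications above it is FM.

```latex
% Illustrative data (assumptions): T = [0, 2\pi], a_t = (-\cos t, -\sin t)',
% b_t = -1, c = (1, 1)'. Then F = { x : \|x\| \le 1 }, and x = 0 is a Slater
% point because a_t' 0 = 0 > -1 = b_t for all t.
\[
x^* = -\tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad v(P) = c'x^* = -\sqrt{2}.
\]
% The only active index at x^* is t^* = 5\pi/4, with
\[
a_{t^*} = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad A(x^*) = \operatorname{cone}\{ a_{t^*} \},
\qquad c = \sqrt{2}\, a_{t^*} \in A(x^*),
\]
% so the KKT condition holds and x^* \in F^*. Taking \lambda_{t^*} = \sqrt{2}
% and \lambda_t = 0 otherwise gives a feasible point of (D) with value
\[
\sum_{t \in T} \lambda_t b_t = \sqrt{2} \cdot (-1) = -\sqrt{2} = v(P),
\]
% so v(D) = v(P) and (D) is solvable, as the duality theorem predicts.
```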
Discretization methods generate sequences of points in $\mathbb{R}^n$ converging
to a point of $F^*$ by solving suitable LO problems, e.g., sequences of optimal
solutions of the subproblems of $(P)$ which are obtained by replacing $T$ with
a sequence of grids. The classical cutting plane approach consists of replacing
in $(P)$ the index set $T$ with a finite subset which is formed from the previous
one according to certain aggregation and elimination rules. The central cutting
plane methods start each step with a polytope containing a sublevel set of $(P)$,
calculate a certain "centre" of this polytope by solving a suitable LO problem,
and then update the polytope by adding to its defining system either
a feasibility cut (if the centre is infeasible) or an objective cut (otherwise). In
order to prove the convergence of any discretization method it is necessary
to assume the continuity of $\sigma$. The main difficulties with these methods are
undesirable jamming (unless $(P)$ has a strongly unique optimal solution) and
the increasing size of the auxiliary LO problems (unless efficient elimination
rules are implemented).
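The grid idea can be sketched on a toy instance. The code below is a minimal illustration under stated assumptions, not a method from the surveyed literature: it uses the hypothetical data $a_t = (-\cos t, -\sin t)'$, $b_t = -1$, $T = [0, 2\pi]$, $c = (1, 1)'$ (feasible set: the unit disc, optimal value $-\sqrt{2}$), replaces $T$ with a uniform grid, and solves each finite subproblem by brute-force vertex enumeration, which is only sensible in $\mathbb{R}^2$ and only when the subproblem is bounded.

```python
import math
from itertools import combinations


def solve_lp_2d(c, constraints, tol=1e-9):
    """Minimise c'x subject to a'x >= b for every (a, b) in constraints.

    Brute-force vertex enumeration in the plane; assumes the LP is bounded,
    so that an optimal solution is attained at a vertex.
    """
    best_val, best_x = math.inf, None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:  # parallel boundary lines: no vertex
            continue
        # Cramer's rule for a1'x = b1, a2'x = b2
        x = ((b1 * a2[1] - b2 * a1[1]) / det,
             (a1[0] * b2 - a2[0] * b1) / det)
        if all(a[0] * x[0] + a[1] * x[1] >= b - tol for a, b in constraints):
            val = c[0] * x[0] + c[1] * x[1]
            if val < best_val:
                best_val, best_x = val, x
    return best_val, best_x


def discretized_value(n_grid, c=(1.0, 1.0)):
    """Value of the subproblem of (P) obtained from a uniform grid on T."""
    grid = [2.0 * math.pi * k / n_grid for k in range(n_grid)]
    constraints = [((-math.cos(t), -math.sin(t)), -1.0) for t in grid]
    return solve_lp_2d(c, constraints)[0]


v_coarse = discretized_value(6)   # circumscribed hexagon
v_fine = discretized_value(60)    # circumscribed 60-gon
```

Each grid subproblem is a relaxation of $(P)$, so its value is a lower bound on $v(P)$; in this instance the bound tightens as the grid is refined. The enumeration grows cubically with the grid size, which is a crude reminder of why elimination rules matter in practice.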
Reduction methods replace $(P)$ with a nonlinear system of equations
(and possibly some inequalities) to be solved by means of a quasi-Newton
method. The optimality theorem is the basis of such an approach, so that it
requires $\sigma$ to be LFM. Moreover, some smoothness conditions are required,
e.g., $\sigma$ to be analytic. These methods have a good local behavior provided
they start sufficiently close to an optimal solution.
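A minimal sketch of the reduction idea follows, under stated assumptions. It uses the hypothetical instance $a_t = (-\cos t, -\sin t)'$, $b_t = -1$, $c = (1, 1)'$ and assumes a single active index at the optimum, so the unknowns are $(x_1, x_2, t, \lambda)$. The system collects the KKT condition $c = \lambda a_t$, the activity condition $a_t'x = b_t$, and stationarity of $t \mapsto a_t'x - b_t$. A plain Newton iteration with an exact Jacobian is swapped in for the quasi-Newton method mentioned above, purely to keep the example short.

```python
import math

C = (1.0, 1.0)  # objective vector c of the illustrative instance


def residual(z):
    """Reduced system for the toy instance; z = (x1, x2, t, lam)."""
    x1, x2, t, lam = z
    ct, st = math.cos(t), math.sin(t)
    return [
        C[0] + lam * ct,           # c - lam * a_t = 0, first component
        C[1] + lam * st,           # c - lam * a_t = 0, second component
        -ct * x1 - st * x2 + 1.0,  # a_t'x - b_t = 0 (index t is active)
        st * x1 - ct * x2,         # d/dt (a_t'x - b_t) = 0
    ]


def jacobian(z):
    """Exact Jacobian of residual with respect to (x1, x2, t, lam)."""
    x1, x2, t, lam = z
    ct, st = math.cos(t), math.sin(t)
    return [
        [0.0, 0.0, -lam * st, ct],
        [0.0, 0.0, lam * ct, st],
        [-ct, -st, st * x1 - ct * x2, 0.0],
        [st, -ct, ct * x1 + st * x2, 0.0],
    ]


def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def newton(z, iters=20, tol=1e-12):
    """Plain Newton iteration; only locally convergent."""
    for _ in range(iters):
        F = residual(z)
        if max(abs(v) for v in F) < tol:
            break
        step = solve_linear(jacobian(z), [-v for v in F])
        z = [zi + si for zi, si in zip(z, step)]
    return z


# Start close to the solution, as the local nature of the method requires.
x1, x2, t_star, lam_star = newton([-0.7, -0.7, 3.9, 1.4])
```

The starting point is chosen near the known solution on purpose: in a two-phase scheme, such a point would be supplied by a discretization phase.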
Two-phase methods combine a discretization method (1st phase) and
a reduction method (2nd phase). No theoretical result supports the decision
to go from phase 1 to phase 2.
Feasible directions (or descent) methods generate a feasible direction
at the current iterate by solving a certain LO problem, the next iterate being
the result of performing a line search in this direction. The auxiliary LO
problem is well defined assuming that $\sigma$ is smooth enough, e.g., analytic.
Purification methods provide finite sequences of feasible points with
decreasing values of the objective functional and of the dimension of the
corresponding smallest faces containing them, in such a way that the last iterate
is an extreme point of either $F$ or $\Lambda$ (but not necessarily an optimal
solution). This approach can only be applied to $(P)$ if the extreme points of $F$
are characterized, i.e., if $\sigma$ is analytic or LOP.
Hybrid methods (improperly called LSIO simplex method in [AL89])
alternate purification steps (when the current iterate is not an extreme point
of $F$) and descent steps (otherwise).
Simplex methods can be defined for both problems, $(P)$ and $(D)$, and
they generate sequences of linked edges of the corresponding feasible set
(either $F$ or $\Lambda$) in such a way that the objective functional improves on the
successive edges under a nondegeneracy assumption. The starting extreme
point can be calculated by means of a purification method. Until 2001 the
only available simplex method for LSIO problems was conceived for $(D)$ and
its convergence status is dubious (recall that the simplex method in [GG83]
can be seen as an extension of the classical exchange method for polynomial
approximation problems, proposed by Remes in 1934).
Now let us consider the following question: which is the family of solution
sets for each class of LSISs?
The answer is almost trivial for continuous, FM and LFM systems. In fact,
if

$$T_1 := \left\{ \binom{a}{b} \in \mathbb{R}^{n+1} \ \middle| \ a'x \ge b \ \ \forall x \in F \right\}, \qquad T_2 := \{ t \in T_1 \mid \|t\| \le 1 \},$$

and

$$\sigma_i := \left\{ a'x \ge b, \ \binom{a}{b} \in T_i \right\}, \quad i = 1, 2,$$

it is easy to show that $\sigma_1$ and $\sigma_2$ are FM (and so LFM) and continuous
representations of $F$, respectively. It is also known that $F$ admits an LOP
representation if and only if $F$ is quasipolyhedral (i.e., the non-empty intersections
of $F$ with polytopes are polytopes).
The problem remains open for analytic and polynomial LSISs. In fact, all
we know is that the two families of closed convex sets are different ([GHT05b])
and a list of necessary (sufficient) conditions for $F$ to admit analytic
(polynomial) representations. In more detail, it has been shown ([JP04]) that $F$ does
not admit an analytic representation if either $F$ is a quasipolyhedral nonpolyhedral
set or $F \subset \mathbb{R}^n$, with $n \ge 3$, is smooth (i.e., there exists a unique
supporting halfspace at each boundary point of $F$) and the dimension of the lineality
space of $F$ is less than $n - 2$ (e.g., the closed balls in $\mathbb{R}^n$, $n \ge 3$). Among the
sets which admit polynomial representation, let us mention the polyhedral
convex sets and the plane conic sections, for which it is possible to determine
$\deg F$, defined as the minimum of $\deg \sigma := \max\{\deg b; \ \deg a_i, \ i = 1, \ldots, n\}$
(where $a_i$ denotes the $i$th component of $a$) over all polynomial representations
$\sigma$ of $F$ ([GHT05a]):

$F$                                                                        $\deg F$
$\{x \in \mathbb{R}^n \mid c_i'x \ge d_i, \ i = 1, \ldots, p\}$ (minimal)    $\max\{0, 2p - 3\}$
convex hull of an ellipse                                                  4
convex hull of a parabola                                                  4
convex hull of a branch of hyperbola                                       2

3 Applications
Like the classical applications of LSIO described in Chapters 1 and 2 of [GL98]
and in [Gus01b], the new applications can be classified following different
criteria, such as the kind of LSIO problem to be analyzed or solved ((P), (D),
($D_0$), etc.), the class of constraint system of (P) (continuous, FM, etc.) or the
presence or absence of numerical experiments (real or modeling applications,
respectively).
Economics
During the 80s different authors formulated and solved risk decision problems
as primal LSIO problems without using this name (in fact they solved
some examples by means of naive numerical approaches). In the same vein
[KM01], instead of using the classical stochastic processes approach to financial
mathematics, reformulates and solves dynamic interest rate models as
primal LSIO problems where $\sigma$ is analytic and FM. The chosen numerical
approach is two-phase.
Two recent applications in economic theory involve LSIO models where
the FM property plays a crucial role. The continuous assignment problem of
mathematical economics has been formulated in [GOZ02] as a linear optimiza-
tion problem over locally convex topological spaces. The discussion involves a
certain dual pair of LSIO problems. On the other hand, informational asym-
metries generate adverse selection and moral hazard problems. The characteri-
zation of economies under asymmetric information (e.g., competitive markets)
is a challenging problem. [Jer03] has characterized efficient allocations in this
environment by means of LSIO duality theory.
Game Theory
Semi-infinite games arise in those situations (quite frequent in economics) in
which one of the players has infinitely many pure strategies whereas the other
one only has finitely many alternatives. None of the three reviewed papers
reports numerical experiments.
[MM03] deals with transferable utility games, which play a central role in
cooperative game theory. The computation of the linear core is formulated as a
primal LSIO problem.
A semi-infinite transportation problem consists of maximizing the profit
from the transportation of a certain good from a finite number of suppliers to
an infinite number of customers. [SLTT01] uses LSIO duality theory in order
to show that the underlying optimization problems have no duality gap and
that the core of the game is nonempty. The same authors have considered, in
[TTLS01], linear production situations with an infinite number of production
techniques. In this context, an LSIO problem arises, giving rise to primal and
dual games.
Geometry
Different geometrical problems can be formulated and solved with LSIO
theory and methods. For instance, the separation and the strong separation of
pairs of subsets of a normed space are formulated this way in [GLW01], whereas
[JS00] provides a characterization of the minimal shell of a convex body based
upon LSIO duality theory (let us recall that the spherical shell of a convex
body C with center $x \in C$ is the difference between the smallest closed ball
centered at x containing C and the interior of the greatest closed ball centered
at x contained in C).
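The definition of the spherical shell recalled above can be illustrated numerically. The following Python sketch computes the two radii of the shell of a convex polygon for a given center; the restriction to polygons in the plane, the function name shell_radii and the sample data are assumptions made only for illustration, and the sketch does not implement the LSIO-based characterization of [JS00].

```python
import math

def shell_radii(vertices, x):
    """Outer and inner radii of the spherical shell of a convex polygon centered at x.

    vertices: list of polygon vertices given in boundary order.
    x: a point of the polygon (the center of the shell).
    The outer radius is the largest distance from x to a vertex; the inner
    radius is the smallest distance from x to a line supporting an edge.
    """
    outer = max(math.dist(v, x) for v in vertices)
    inner = float("inf")
    for p, q in zip(vertices, vertices[1:] + vertices[:1]):
        ex, ey = q[0] - p[0], q[1] - p[1]
        # distance from x to the line through the edge pq
        dist = abs(ex * (x[1] - p[1]) - ey * (x[0] - p[0])) / math.hypot(ex, ey)
        inner = min(inner, dist)
    return outer, inner

# Hypothetical data: the square [-1, 1]^2 with two different centers.
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
print(shell_radii(square, (0.0, 0.0)))
print(shell_radii(square, (0.5, 0.0)))
```

Moving the center away from the origin enlarges the outer radius and shrinks the inner one, which is why the choice of the center matters when looking for a minimal shell.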
Probability and Statistics
[Dal01] has analyzed the connections between subjective probability theory,
maximum likelihood estimation and risk theory with LSIO duality theory
and methods. Nevertheless the most promising application field in statistics is
Bayesian robustness. Two central problems in this field consist of optimizing
posterior functionals over a generalized moment class and calculating minimax
decision rules under generalized moment conditions. The first problem
has been reformulated as a dual LSIO problem in [BG00]. Concerning the second
problem, the corresponding decision rules are obtained by minimizing the
maximum of the integrals of the risk function with respect to a given family of
distributions on a certain space of parameters. Assuming the compactness of
this space, [NS01] proposes a convergence test consisting of solving a certain
LSIO problem with continuous constraint system. The authors use duality
theory and a discretization algorithm.
Machine Learning
A central problem in machine learning consists of generating a sequence of
functions (hypotheses) from a given set of functions which are producible by
a base learning algorithm. When this set is infinite, the mentioned problem
has been reformulated in [RDB02] as an LSIO one. Certain data classification
problems are solved by formulating them as linear SDP problems
([JOW05]), so that they can be reformulated and solved as LSIO problems.
Data envelopment analysis
Data Envelopment Analysis (DEA) deals with the comparison of the ef-
ficiency of a set of decision making units (e.g., firms, factories, branches or
schools) or technologies in order to obtain certain outputs from the available
inputs. In the case of a finite set of items to be compared, the efficiency ratios
are usually calculated by solving suitable LO problems. In the case of chemi-
cal processes which are controlled by means of certain parameters (pressure,
temperature, concentrations, etc.) which range on given intervals, the corresponding
models can be formulated either as LSIO or as bilevel optimization
problems. Both approaches are compared in [JJNS01], where a numerical example
is provided.
Telecommunication networks
At least three of the techniques for optimizing the capacity of telecommu-
nication systems require the solution of suitable LSIO problems.
In [NNCN01] the capacity of mobile networks is improved by filtering the
signal through a beamforming structure. The optimal design of this structure
is formulated as an analytic LSIO problem. Numerical results are obtained by
means of a hybrid method. The same numerical approach is used in [DCNN01]
for the design of narrow-band antennas. Finally, [SAP00] proposes to increase
the capacity of cellular systems by means of cell sectorization. A certain technical
difficulty arising in this approach can be overcome by solving an associated
LSIO problem with continuous $\sigma$. Numerical results are provided by means of
a discretization procedure.
Control problems
Certain optimal control problems have been formulated as continuous dual
LSIO problems. This was done in [Rub00a] for an optimal boundary control
problem corresponding to a certain nonlinear diffusion equation with a
"rough" initial condition, and in [Rub00b] for two kinds of optimal control
problems with unbounded control sets.
On the other hand, in [SIF01] the robust control of certain nonlinear systems
with uncertain parameters is obtained by solving a set of continuous
primal LSIO problems. Numerical experiments with a discretization procedure
are reported.
Optimization under uncertainty
LSIO models arise naturally in inexact LO, when feasibility under any
possible perturbation of the nominal problem is required. Thus, the robust
counterpart of $\min_x c'x$ subject to $Ax \ge b$, where $(c, A, b) \in U \subset \mathbb{R}^n \times \mathbb{R}^{m \times n} \times \mathbb{R}^m$,
is formulated in [BN02] as the LSIO problem $\min_{t,x} t$ subject to $t \ge c'x$,
$Ax \ge b$ $\forall (c, A, b) \in U$; the computational tractability of this problem is
discussed (in Section 2) for different uncertainty sets $U$ in a real application
(the antenna design problem). On the other hand, [AG01] provides strong
duality theorems for inexact LO problems of the form $\min_x \max_{c \in C} c'x$ subject
to $Ax \in B$ $\forall A \in \mathcal{A}$ and $x \in \mathbb{R}^n_+$, where $C$ and $B$ are given nonempty convex
sets and $\mathcal{A}$ is a given family of matrices. If $Ax \in B$ can be expressed as
$A(t)x = b(t)$, $t \in T$, then this problem admits a continuous dual LSIO
formulation.
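As a hedged illustration of the robust counterpart of an inexact LO problem, the following Python sketch checks whether a point x satisfies $Ax \ge b$ for every scenario when the uncertainty set is a box of intervals around the entries of A and b. The box structure, the function name robust_feasible and the data are assumptions chosen for simplicity; [BN02] deals with more general uncertainty sets.

```python
def robust_feasible(x, A_lo, A_hi, b_hi, tol=1e-12):
    """Check A x >= b for every A with A_lo <= A <= A_hi and every b <= b_hi.

    For a box uncertainty set the worst case of each row is attained
    coordinate-wise: the smallest value of a_ij * x_j over the interval
    [A_lo[i][j], A_hi[i][j]] is min(A_lo[i][j] * x_j, A_hi[i][j] * x_j),
    and the hardest right-hand side is the upper bound b_hi[i].
    """
    for row_lo, row_hi, b in zip(A_lo, A_hi, b_hi):
        worst = sum(min(lo * xj, hi * xj) for lo, hi, xj in zip(row_lo, row_hi, x))
        if worst < b - tol:
            return False
    return True

# Hypothetical data: nominal constraint x1 + x2 >= 1 with 10% coefficient error.
A_lo = [[0.9, 0.9]]
A_hi = [[1.1, 1.1]]
b_hi = [1.0]
print(robust_feasible([0.5, 0.5], A_lo, A_hi, b_hi))  # feasible only for the nominal data
print(robust_feasible([0.6, 0.6], A_lo, A_hi, b_hi))  # feasible for every scenario in the box
```

In this special case the infinitely many constraints indexed by the box collapse to one worst-case inequality per row, which is the simplest instance of the tractability question mentioned above.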
LSIO also applies to fuzzy systems and optimization. The continuous LSIO
(and NLSIO) problems arising in [HF00] are solved with a cutting-plane
method. In all the numerical examples reported in [LV01a], the constraint
system of the LSIO reformulation is the union of analytic systems (with or
without box constraints); all the numerical examples are solved with a hybrid
method.
Semidefinite programming
Many authors have analyzed the connections between LSIO and semidef-
inite programming (see [VB98, Fay02], and references therein, some of them
solving SDP problems by means of the standard LSIO methods). In [KZ01]
the LSIO duality theory has been used in order to obtain duality theorems
for SDP problems. [KK00] and [KGUY02] show that a special class of dual
SDP problems can be solved efficiently by means of its reformulation as a
continuous LSIO problem which is solved by a cutting-plane discretization
method. This idea is also the basis of [KM03], where it is shown that, if the
LSIO reformulation of the dual SDP problem has finite value and an FM constraint
system, then there exists a low size discretization with the same value.
Numerical experiments show that large scale SDP problems which cannot be
handled by means of the typical interior point methods (e.g., with more than
3000 dual variables) can be solved applying an ad hoc discretization method
which exploits the structure of the problem.
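The link between SDP and LSIO rests on a standard fact: a symmetric matrix X is positive semidefinite if and only if $d'Xd \ge 0$ for all unit vectors d, so one semidefinite constraint is a continuum of linear constraints on the entries of X. The following Python sketch illustrates this for 2x2 matrices by comparing a discretized version of the semi-infinite constraint with the smallest eigenvalue. The 2x2 setting, the uniform grid and the function names are assumptions for illustration; this is not the discretization scheme of the cited papers.

```python
import math

def min_quadratic_form(X, grid_size=3600):
    """Minimum of d' X d over unit vectors d = (cos s, sin s) on a uniform grid.

    Each grid point contributes one linear inequality in the entries of X,
    so 'minimum >= 0' is a finite discretization of the semi-infinite system.
    """
    (x11, x12), (_, x22) = X
    best = float("inf")
    for k in range(grid_size):
        s = math.pi * k / grid_size          # d and -d give the same value
        c, v = math.cos(s), math.sin(s)
        best = min(best, x11 * c * c + 2.0 * x12 * c * v + x22 * v * v)
    return best

def min_eigenvalue_2x2(X):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    (x11, x12), (_, x22) = X
    mean = 0.5 * (x11 + x22)
    radius = math.hypot(0.5 * (x11 - x22), x12)
    return mean - radius

X_psd = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues 1 and 3
X_indef = [[1.0, 2.0], [2.0, 1.0]]    # eigenvalues -1 and 3
for X in (X_psd, X_indef):
    print(min_quadratic_form(X), min_eigenvalue_2x2(X))
```

The grid minimum can never fall below the smallest eigenvalue, so a finite discretization is a relaxation of the semidefinite constraint; cutting-plane schemes refine it only where needed.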

4 Numerical methods
In the previous section we have seen that most of the LSIO problems arising
in practical applications in recent years have been solved by means of new
methods (usually variations of already known ones). Two possible reasons
for this phenomenon are the lack of available codes for large classes of LSIO
problems (commercial or not) and the computational inefficiency of the known
methods (which could fail to exploit the structure of the particular problems).
Now we review the specific literature on LSIO methods.
[Bet04] and [WFL01] propose two new central cutting plane methods, taking
as center of the current polytope the center of the greatest ball inscribed in
the polytope and its analytic center, respectively. [FLW01] proposes a cutting-plane
method for solving LSIO and quadratic CSIO problems (an extension
of this method to infinite dimensional LSIO can be found in [WFL01]). Several
relaxation techniques and their combinations are proposed and discussed.
The method in [Bet04], which reports numerical experiments, is an accelerated
version of the cutting-plane (Elzinga-Moore) Algorithm 11.4.2 in [GL98] for
LSIO problems with continuous $\sigma$, whereas [WFL01] requires the analyticity
of $\sigma$. A Kelley cutting-plane algorithm has been proposed in [KGUY02] for a
particular class of LSIO problems (the reformulations of dual SDP problems);
an extension of this method to SIO problems with nonlinear objective and
linear constraints has been proposed in [KKT03].
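To fix ideas, a generic Kelley-type cutting-plane loop for a primal LSIO problem can be sketched as follows. This is a minimal Python sketch under strong assumptions, not the algorithm of any of the cited papers: the problem has two variables, the index set is replaced by a finite grid, the finite subproblems are solved by brute-force vertex enumeration, and the test problem (the unit disc written as $x_1 \cos t + x_2 \sin t \ge -1$) as well as all function names are hypothetical.

```python
import math
from itertools import combinations

def solve_lp_2d(c, cons, tol=1e-9):
    """Minimize c'x over {x in R^2 : a'x >= b for (a, b) in cons} by vertex enumeration.

    Brute force: only meant for tiny, bounded problems having at least one vertex.
    """
    best_x, best_val = None, float("inf")
    for (a1, b1), (a2, b2) in combinations(cons, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue                                  # parallel boundary lines
        x = ((b1 * a2[1] - b2 * a1[1]) / det,         # Cramer's rule
             (a1[0] * b2 - a2[0] * b1) / det)
        if all(a[0] * x[0] + a[1] * x[1] >= b - tol for a, b in cons):
            val = c[0] * x[0] + c[1] * x[1]
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val

def cutting_plane(c, a, b, grid, start, eps=1e-8, max_iter=100):
    """Solve the current finite subproblem, then add the most violated
    constraint a(t)'x >= b(t) found on the grid; stop when none is violated."""
    active = list(start)
    for _ in range(max_iter):
        x, val = solve_lp_2d(c, [(a(t), b(t)) for t in active])
        slack = lambda t: a(t)[0] * x[0] + a(t)[1] * x[1] - b(t)
        t_worst = min(grid, key=slack)
        if slack(t_worst) >= -eps:
            break
        active.append(t_worst)
    return x, val, len(active)

# Hypothetical test problem: minimize x1 + x2 over the unit disc.
N = 360
grid = [2.0 * math.pi * k / N for k in range(N)]
a = lambda t: (math.cos(t), math.sin(t))
b = lambda t: -1.0
start = [grid[0], grid[N // 4], grid[N // 2], grid[3 * N // 4]]   # a bounding box
x_opt, v_opt, n_cuts = cutting_plane((1.0, 1.0), a, b, grid, start)
print(x_opt, v_opt, n_cuts)
```

Only a handful of the 360 grid constraints are ever generated, which is the point of cutting-plane schemes compared with solving the full discretized problem at once.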
A reduction approach for LSIO (and CSIO) problems has been proposed
in [ILT00], where $\sigma$ is assumed to be continuous and FM. The idea is to
reduce Wolfe's dual problem to a small number of ordinary nonlinear
optimization problems. The method performs well on a famous test example.
This method has been extended to quadratic SIO in [LTW04].
[AGL01] proposes a simplex method (and a reduced gradient method) for
LSIO problems such that $\sigma$ is LOP. These are the only methods which could
be applied to LSIO problems with a countable set of constraints. The proof
of the convergence is an open problem.
[LSV00] proposes two hybrid methods for LSIO problems such that $\sigma$ is a
finite union of analytic systems with box constraints. Numerical experiments
are reported.
[KM02] considers LSIO problems in which $\sigma$ is continuous, satisfies the
Slater condition and the components $a_i \in C(T)$ of $a$ are linearly independent.
Such problems are reformulated as linear approximation problems,
and then they are solved by means of a classical method of Polya. Convergence
proofs are given.
[Kos01] provides a conceptual path-following algorithm for the parametric
LSIO problem arising in optimal control consisting of replacing T in (P) with
an interval $T(r) := [0, r]$, where $r$ ranges on a certain interval. The constraint
system of the parametric problem is assumed to be continuous and FM for
each $r$. An illustrative example is given.
Finally, let us observe that LSIO problems could also be solved by means
of numerical methods initially conceived for more general models, such as CSIO
([Abb01, TKVB02, ZNF00]), NLSIO ([ZR03, VFG03, GP01, Gus01a]) and
GSIO ([Sti01, SS03, Web03] and references therein). The comparison of the
particular versions for LSIO problems of these methods with the specific LSIO
methods is still to be made.

5 Perturbation analysis
In this section we consider arbitrary perturbations of the nominal
data $\pi = (a, b, c)$ which preserve $n$ and $T$ (the constraint system of $\pi$ is $\sigma$).
$\pi$ is bounded if $v(P) \neq -\infty$ and it has bounded data if $a$ and $b$ are bounded
functions. The parameter space is $\Pi := (\mathbb{R}^n \times \mathbb{R})^T \times \mathbb{R}^n$, endowed with the
pseudometric of the uniform convergence:

$$d(\pi_1, \pi) := \max\left\{ \|c^1 - c\|, \ \sup_{t \in T} \left\| \binom{a_t^1}{b_t^1} - \binom{a_t}{b_t} \right\| \right\},$$

where $\pi_1 = (a^1, b^1, c^1)$ denotes a perturbed data set. The associated problems
are $(P_1)$ and $(D_1)$. The sets of consistent (bounded, solvable) perturbed problems
are denoted by $\Pi_c$ ($\Pi_b$, $\Pi_s$, respectively). Obviously, $\Pi_s \subset \Pi_b \subset \Pi_c \subset \Pi$.
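The pseudometric of the uniform convergence can be evaluated numerically once T is replaced by a finite grid. The following Python sketch does so under explicit assumptions: the norm is taken to be Euclidean, the supremum over T is approximated by a maximum over grid points, and the function name data_distance and the sample data are hypothetical.

```python
import math

def data_distance(pi1, pi, grid):
    """Approximate pseudometric between two data sets (a, b, c).

    Each data set is a triple (a, b, c) with a: t -> tuple in R^n,
    b: t -> float and c: tuple in R^n. The supremum over T is replaced by a
    maximum over 'grid' (exact only when T is the grid itself), and the
    Euclidean norm is an assumption of this sketch.
    """
    a1, b1, c1 = pi1
    a0, b0, c0 = pi
    cost_gap = math.dist(c1, c0)
    row_gap = max(math.dist(a1(t) + (b1(t),), a0(t) + (b0(t),)) for t in grid)
    return max(cost_gap, row_gap)

# Hypothetical data: unit disc system, perturbed right-hand side and cost vector.
a = lambda t: (math.cos(t), math.sin(t))
nominal = (a, lambda t: -1.0, (1.0, 1.0))
perturbed = (a, lambda t: -1.1, (1.0, 1.05))
grid = [2.0 * math.pi * k / 100 for k in range(100)]
dist = data_distance(perturbed, nominal, grid)
print(dist)
```

Here the right-hand side moves by 0.1 at every index and the cost vector by 0.05, so the larger of the two perturbations determines the distance.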
From the primal side, we consider the following set-valued mappings:
$\mathcal{F}(\pi_1) := F_1$, $\mathcal{B}(\pi_1) := B_1$, $\mathcal{E}(\pi_1) := E_1$ and $\mathcal{F}^*(\pi_1) := F_1^*$, where $F_1$,
$B_1$, $E_1$ and $F_1^*$ denote the feasible set of $\pi_1$, its boundary, its set of extreme
points and the optimal set of $\pi_1$, respectively. The upper and lower semicontinuity
(usc and lsc) of these mappings are implicitly understood in the sense of
Berge (almost no stability analysis has been made with other semicontinuity
concepts). The value function is $\vartheta(\pi_1) := v(P_1)$. Similar mappings can be
considered for the dual problem. Some results in this section are new even for
LO (i.e., $|T| < \infty$).
Stability of the feasible set
It is easy to prove that $\mathcal{F}$ is closed everywhere whereas the lsc and the usc
properties are satisfied or not at a given $\pi \in \Pi_c$ depending on the data $a$ and
$b$.
Chapter 6 of [GL98] provides many conditions which are equivalent to the
lsc property of $\mathcal{F}$ at $\pi \in \Pi_c$, e.g., $\pi \in \operatorname{int} \Pi_c$, existence of a strong Slater point
$\bar{x}$ (i.e., $a_t'\bar{x} \ge b_t + \varepsilon$ for all $t \in T$, with $\varepsilon > 0$), or

$$0_{n+1} \notin \operatorname{cl} \operatorname{conv} \left\{ \binom{a_t}{b_t},\ t \in T \right\}$$

(a useful condition involving the data).
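The strong Slater condition is easy to test numerically for a candidate point. The following Python sketch computes the smallest slack of the constraints on a finite grid of indices; the grid approximation of the infimum over T, the function name slater_margin and the unit disc example are assumptions for illustration.

```python
import math

def slater_margin(x, a, b, grid):
    """Smallest slack a(t)'x - b(t) over the grid.

    A strictly positive value certifies (on the grid) that x is a strong
    Slater point with that margin; zero or negative values do not.
    """
    return min(sum(ai * xi for ai, xi in zip(a(t), x)) - b(t) for t in grid)

# Hypothetical data: unit disc written as x1*cos(t) + x2*sin(t) >= -1.
N = 360
grid = [2.0 * math.pi * k / N for k in range(N)]
a = lambda t: (math.cos(t), math.sin(t))
b = lambda t: -1.0
print(slater_margin((0.0, 0.0), a, b, grid))   # interior point, positive margin
print(slater_margin((1.0, 0.0), a, b, grid))   # boundary point, no positive margin
```

The center of the disc is a strong Slater point, whereas a boundary point has an active constraint and therefore no positive margin.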
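The characterization of the usc property of $\mathcal{F}$ at $\pi \in \Pi_c$ in [CLP02a]
requires some additional notation. Let $K^R$ be the characteristic cone of

$$\sigma^R := \left\{ a'x \ge b,\ \binom{a}{b} \in \left( \operatorname{conv} \left\{ \binom{a_t}{b_t},\ t \in T \right\} \right)_\infty \right\},$$

where $X_\infty := \{ \lim_k \mu_k x^k \mid \{x^k\} \subset X,\ \{\mu_k\} \downarrow 0 \}$. If F is bounded, then $\mathcal{F}$ is
usc at $\pi$. Otherwise two cases are possible:
If F contains at least one line, then $\mathcal{F}$ is usc at $\pi$ if and only if $K^R = \operatorname{cl} K$.
Otherwise, if $w$ is the sum of a certain basis of $\mathbb{R}^n$ contained in $\{a_t, t \in T\}$,
then $\mathcal{F}$ is usc at $\pi$ if and only if there exists $\beta \in \mathbb{R}$ such that

$$\operatorname{cone} \left( K^R \cup \left\{ \binom{w}{\beta} \right\} \right) = \operatorname{cone} \left( \operatorname{cl} K \cup \left\{ \binom{w}{\beta} \right\} \right).$$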

The stability of the feasible set has been analyzed from the point of view
of the dual problem ([GLT01]), for the primal problem with equations and
constraint set ([AG05]) and for the primal problem in CSIO ([LV01b] and
[GLT02]).
Stability of the boundary of the feasible set
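Given $\pi \in \Pi_c$ such that $F \neq \mathbb{R}^n$, then we have ([GLV03], [GLT05]):

$\mathcal{F}$ lsc at $\pi$  $\Longleftrightarrow$  $\mathcal{B}$ lsc at $\pi$

$\mathcal{B}$ closed at $\pi$

$\mathcal{F}$ usc at $\pi$  $\Longleftarrow$  $\mathcal{B}$ usc at $\pi$

Remarks: (1) the converse holds if $\dim F = n$; (2) the converse statement
holds if F is bounded.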
Stability of the extreme points set
The following concept is the key of the analysis carried out in [GLV05]: $\pi$
is nondegenerate if $|\{t \in T \mid a_t'x = b_t\}| < n$ for all $x \in B \setminus E$.
Let $\pi_H = (a, 0, c)$. If $|T| \ge n$, $E \neq \emptyset$, and $|F| > 1$ (the most difficult
case), then we have:

$\mathcal{F}$ lsc at $\pi$  $\Longleftrightarrow$  $\mathcal{E}$ lsc at $\pi$

(4)

$\mathcal{E}$ closed at $\pi$  $\Longrightarrow$  $\pi$ nondeg.

(2) | | (3)
(5)

$\mathcal{E}$ usc at $\pi$  $\Longrightarrow$  $\pi$ & $\pi_H$ nondeg.

Remarks: (1) if F is strictly convex; (2) if F is bounded; (3) if $\{a_t, t \in T\}$
is bounded; (4) if $\mathcal{F}$ is lsc at $\pi$; the converse holds if $|T| < \infty$; (5) the converse
statement holds if $|T| < \infty$.
Stability of the optimal set
In Chapter 10 of [GL98] it is proved that, if $\pi \in \Pi_s$, then the following
statements hold:
• $\mathcal{F}^*$ is closed at $\pi$ $\Longleftrightarrow$ either $\mathcal{F}$ is lsc at $\pi$ or $F = F^*$.
• $\mathcal{F}^*$ is lsc at $\pi$ $\Longleftrightarrow$ $\mathcal{F}$ is lsc at $\pi$ and $|F^*| = 1$ (uniqueness).
• If $\mathcal{F}^*$ is usc at $\pi$, then $\mathcal{F}^*$ is closed at $\pi$ (and the converse is true if $F^*$ is
bounded).
The following generic result on $\Pi_s$ has been proved in [GLT03a]: almost
every (in a topological sense) solvable LSIO problem with bounded data has a
strongly unique solution. Results on the stability of $\mathcal{F}^*$ in CSIO can be found
in [GLV03] and [GLT02].

Stability of the value and well-posedness


The following definition of well-posedness is orientated towards the stability
of $\vartheta$. $\{x^r\} \subset \mathbb{R}^n$ is an asymptotically minimizing sequence for $\pi \in \Pi_c$
associated with $\{\pi_r\} \subset \Pi_b$ if $x^r \in F_r$ for all $r$, $\lim_r \pi_r = \pi$, and
$\lim_r [(c^r)'x^r - v(P_r)] = 0$. In particular, $\pi \in \Pi_s$ is Hadamard well-posed
(Hwp) if for every $x^* \in F^*$ and for every $\{\pi_r\} \subset \Pi_b$ such that $\lim_r \pi_r = \pi$
there exists an asymptotically minimizing sequence converging to $x^*$. The
following statements are proved in Chapter 10 of [GL98]:
• If $F^* \neq \emptyset$ and bounded, then $\vartheta$ is lsc at $\pi$. The converse statement holds
if $\pi \in \Pi_b$.
• $\vartheta$ is usc at $\pi$ $\Longleftrightarrow$ $\mathcal{F}$ is lsc at $\pi$.
• If $\pi$ is Hwp, then $\vartheta|_{\Pi_b}$ is continuous.
• If $F^*$ is bounded, $\pi$ is Hwp $\Longleftrightarrow$ either $\mathcal{F}$ is lsc at $\pi$ or $|F| = 1$.
• If $F^*$ is unbounded and $\pi$ is Hwp, then $\mathcal{F}$ is lsc at $\pi$.
A similar analysis has been made in [CLPT01] with other Hwp concepts.
Extensions to CSIO can be found in [GLV03]. A generic result on Hwp problems
in quadratic SIO can be found in [ILR01]. The connection between genericity
and Hwp properties is discussed in [Pen01].
Distance to ill-posedness
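There exist different concepts of ill-posedness in LSIO: $\operatorname{bd} \Pi_c$ is the set of
ill-posed problems in the feasibility sense, $\operatorname{bd} \Pi_{si}$ (where $\Pi_{si}$ denotes the set of
problems which have a finite inconsistent subproblem) is the set of generalized
ill-posed problems in the feasibility sense, and $\operatorname{bd} \Pi_s = \operatorname{bd} \Pi_b$ is the set of
ill-posed problems in the optimality sense. The following formulae ([CLPT04])
replace the calculation of distances in $\Pi$ with the calculation of distances in $\mathbb{R}^{n+1}$
involving the so-called hypographic set

$$H := \operatorname{conv} \left\{ \binom{a_t}{b_t},\ t \in T \right\} + \operatorname{cone} \left\{ \binom{0_n}{-1} \right\}.$$

• If $\pi \in \Pi_c$, then

$$d(\pi, \operatorname{bd} \Pi_{si}) = d(0_{n+1}, \operatorname{bd} H).$$

• If $\pi \in (\operatorname{cl} \Pi_s) \cap (\operatorname{int} \Pi_c)$ and $Z^- := \operatorname{conv}\{a_t, t \in T; -c\}$, then

$$d(\pi, \operatorname{bd} \Pi_s) = \min\{ d(0_{n+1}, \operatorname{bd} H),\ d(0_n, \operatorname{bd} Z^-) \}.$$

• If $\pi \in (\operatorname{cl} \Pi_s) \cap (\operatorname{bd} \Pi_c)$ and $Z^+ := \operatorname{conv}\{a_t, t \in T; c\}$, then

$$d(\pi, \operatorname{bd} \Pi_s) \ge \min\{ d(0_{n+1}, \operatorname{bd} H),\ d(0_n, \operatorname{bd} Z^+) \}.$$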

Error bounds
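The residual function of $\pi$ is

$$r(x, \pi) := \sup_{t \in T} \left( b_t - a_t'x \right)_+,$$

where $\alpha_+ := \max\{\alpha, 0\}$. Obviously, $x \in F \Longleftrightarrow r(x, \pi) = 0$. $0 \le \beta < +\infty$ is a
global error bound for $\pi \in \Pi_c$ if

$$\frac{d(x, F)}{r(x, \pi)} \le \beta \quad \forall x \in \mathbb{R}^n \setminus F.$$

If there exists such a $\beta$, then the condition number of $\pi$ is

$$0 \le \tau(\pi) := \sup_{x \in \mathbb{R}^n \setminus F} \frac{d(x, F)}{r(x, \pi)} < +\infty.$$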
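The residual function and the ratio defining the condition number can be explored numerically on a simple example. The following Python sketch uses the unit disc, written as $x_1 \cos t + x_2 \sin t \ge -1$, for which both the distance to F and the residual equal $(\|x\| - 1)_+$. The grid approximation of the supremum over T, the sampled points and the function name residual are assumptions for illustration; sampling can only give a lower estimate of the condition number.

```python
import math

def residual(x, a, b, grid):
    """r(x, pi) = sup_t (b_t - a_t'x)_+, with the supremum taken over a finite grid."""
    return max(max(b(t) - sum(ai * xi for ai, xi in zip(a(t), x)), 0.0) for t in grid)

# Unit disc: a_t = (cos t, sin t), b_t = -1, so F = {x : ||x|| <= 1} and
# d(x, F) = (||x|| - 1)_+ in the Euclidean norm.
N = 720
grid = [2.0 * math.pi * k / N for k in range(N)]
a = lambda t: (math.cos(t), math.sin(t))
b = lambda t: -1.0

ratios = []
for k in range(0, N, 30):                 # sample directions taken from the grid
    for radius in (1.5, 2.0, 10.0):       # points outside F
        x = (radius * math.cos(grid[k]), radius * math.sin(grid[k]))
        dist_to_F = radius - 1.0
        ratios.append(dist_to_F / residual(x, a, b, grid))
print(max(ratios))
```

For this particular system the residual coincides with the distance to F, so every sampled ratio is numerically equal to one.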
The following statements hold for any $\pi$ with bounded data ([Hu00]):
• Assume that F is bounded and $\pi \in \operatorname{int} \Pi_c$, and let $\rho$, $x^0$ and $\varepsilon > 0$ be such
that $\|x\| \le \rho$ $\forall x \in F$ and $a_t'x^0 \ge b_t + \varepsilon$ $\forall t \in T$. Let $0 \le \gamma < 1$. Then, if

£771 2
c/(7ri,7r) <

we have
r(7ri) < 2 p 5 " ^
1+ 7
(1-7)'
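• Assume that F is unbounded and $\pi_H \in \operatorname{int} \Pi_c$, and let $u$ and $\eta > 0$ be such
that $a_t'u \ge \eta$ $\forall t \in T$, $\|u\| = 1$. Let $0 \le \delta < n^{-1/2} \eta$. Then, if $d(\pi_1, \pi) < \delta$, we
have

$$\tau(\pi_1) \le \left( \eta - \delta n^{1/2} \right)^{-1}.$$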

Improved error bounds for arbitrary $\pi$ can be found in [CLPT04]. There
exist extensions to CSIO ([Gug00]) and to abstract LSIO ([NY02]).
Sensitivity analysis
The basic problem in sensitivity analysis is to evaluate the impact on the
primal and the dual value functions of small perturbations of the data. In the
case of perturbations of c, an approximate answer can be obtained from the
subdifferentials of these functions (see Chapter 8 in [GL98]). [GGGT05] extends
from LO to LSIO the exact formulae in [Gau01] for both value functions
under perturbations of $c$ and $b$ (separately). This is done by determining
neighborhoods of $c$ ($b$), or at least segments emanating from $c$ ($b$, respectively),
on which the corresponding value function is linear (i.e., finite, convex and
concave).
Other perspectives
In the parametric setting the perturbed data depend on a certain parameter
$\theta \in \Theta$ (space of parameters), i.e., are expressed as $\pi(\theta) = (a(\theta), b(\theta), c(\theta))$,
with $T$ fixed or not, and the nominal problem is $\pi(\bar{\theta})$. The stability of $\mathcal{F}$ in
this context has been studied in [MM00, CLP05], where the stability of $\vartheta$
and $\mathcal{F}^*$ has also been analyzed. Results on the stability of $\mathcal{F}$ in CSIO in a
parametric setting can be found in [CLP02b, CLOP03].
For more information on perturbation analysis in more general contexts
the reader is referred to [KH98, BS00] and references therein.

Acknowledgement
This work was supported by DGES of Spain and FEDER, Grant BFM2002-04114-C02-01.

References
[Abb01] Abbe, L.: Two logarithmic barrier methods for convex semi-infinite problems.
In [GL01], 169-195 (2001)
[AG01] Amaya, J., J.A. Gomez: Strong duality for inexact linear programming
problems. Optimization, 49, 243-369 (2001)
[AG05] Amaya, J., M.A. Goberna: Stability of the feasible set of linear systems
with an exact constraints set. Math. Meth. Oper. Res., to appear (2005)
[AGL01] Anderson, E.J., Goberna, M.A., Lopez, M.A.: Simplex-like trajectories
on quasi-polyhedral convex sets. Mathematics of Oper. Res., 26, 147-162
(2001)
[AL89] Anderson, E.J., Lewis, A.S.: An extension of the simplex algorithm for
semi-infinite linear programming. Math. Programming (Ser. A), 44, 247-269
(1989)
[BN02] Ben-Tal, A., Nemirovski, A.: Robust optimization - methodology and
applications. Math. Programming (Ser. B), 92, 453-480 (2002)
[Bet04] Betro, B.: An accelerated central cutting plane algorithm for linear semi-
infinite linear programming. Math. Programming (Ser. A), 101, 479-495
(2004)
[BG00] Betro, B., Guglielmi, A.: Methods for global prior robustness under generalized
moment conditions. In: Rios, D., Ruggeri, F. (ed) Robust Bayesian
Analysis, 273-293. Springer, N.Y. (2000)
[BS00] Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems.
Springer Verlag, New York, N.Y. (2000)
[CLOP03] Canovas, M.J., Lopez, M.A., Ortega, E.-M., Parra, J.: Upper semicon-
tinuity of closed-convex-valued multifunctions. Math. Meth. Oper. Res.,
57, 409-425 (2003)
[CLP02a] Canovas, M.J., Lopez, M.A., Parra, J.: Upper semicontinuity of the fea-
sible set mapping for linear inequality systems. Set-Valued Analysis, 10,
361-378 (2002)
[CLP02b] Canovas, M.J., Lopez, M.A., Parra, J.: Stability in the discretization of a
parametric semi-infinite convex inequality system. Mathematics of Oper.
Res., 27, 755-774 (2002)
[CLP05] Canovas, M.J., Lopez, M.A., Parra, J.: Stability of linear inequality
systems in a parametric setting. J. Optim. Theory Appl., to appear (2005)
[CLPT01] Canovas, M.J., Lopez, M.A., Parra, J., Todorov, M.I.: Solving strategies
and well-posedness in linear semi-infinite programming. Annals of Oper.
Res., 101, 171-190 (2001)
[CLPT04] Canovas, M.J., Lopez, M.A., Parra, J., Toledo, F.J.: Distance to ill-posedness
and consistency value of linear semi-infinite inequality systems.
Math. Programming (Ser. A), Published online: 29/12/2004 (2004)
[DCNN01] Dahl, M., Claesson, I., Nordebo, S., Nordholm, S.: Chebyshev optimization
of circular arrays. In: Yang, X. et al (ed): Optimization Methods and
Applications, 309-319. Kluwer, Dordrecht (2001)
[Dal01] Dall'Aglio: On some applications of LSIP to probability and statistics. In
[GL01], 237-254 (2001)
[FLW01] Fang, S.-Ch., Lin, Ch.-J., Wu, S.Y.: Solving quadratic semi-infinite programming
problems by using relaxed cutting-plane scheme. J. Comput.
Appl. Math., 129, 89-104 (2001)
[Fay02] Faybusovich, L.: On Nesterov's approach to semi-infinite programming.
Acta Appl. Math., 74, 195-215 (2002)
[Gau01] Gauvin, J.: Formulae for the sensitivity analysis of linear programming
problems. In Lassonde, M. (ed): Approximation, Optimization and Math-
ematical Economics, 117-120. Physica-Verlag, Berlin (2001)
[GLV03] Gaya, V.E., Lopez, M. A., Vera de Serio, V.: Stability in convex semi-
infinite programming and rates of convergence of optimal solutions of
discretized finite subproblems. Optimization, 52, 693-713 (2003)
[GG83] Glashoff, K., Gustafson, S.-A.: Linear Optimization and Approximation.
Springer Verlag, Berlin (1983)
[GHT05a] Goberna, M.A., Hernandez, L., Todorov, M.I.: On linear inequality sys-
tems with smooth coefficients. J. Optim. Theory Appl., 124, 363-386
(2005)
[GHT05b] Goberna, M.A., Hernandez, L., Todorov, M.I.: Separating the solution
sets of analytical and polynomial systems. Top, to appear (2005)
[GGGT05] Goberna, M.A., Gomez, S., Guerra, F., Todorov, M.I.: Sensitivity analy-
sis in linear semi-infinite programming: perturbing cost and right-hand-
side coefficients. Eur. J. Oper. Res., to appear (2005)
[GJD05] Goberna, M.A., Jeyakumar, V., Dinh, N.: Dual characterizations of set
containments with strict inequalities. J. Global Optim., to appear (2005)
[GJM03] Goberna, M.A., Jornet, V., Molina, M.D.: Saturation in linear optimiza-
tion. J. Optim. Theory Appl., 117, 327-348 (2003)
[GJM05] Goberna, M.A., Jornet, V., Molina, M.D.: Uniform saturation. Top, to
appear (2005)
[GJR01] Goberna, M.A., Jornet, V., Rodriguez, M.: Directional end of a convex set:
Theory and applications. J. Optim. Theory Appl., 110, 389-411 (2001)
[GJR02] Goberna, M.A., Jornet, V., Rodriguez, M.: On the characterization of
some families of closed convex sets. Contributions to Algebra and Geom-
etry, 43, 153-169 (2002)
[GJR03] Goberna, M.A., Jornet, V., Rodriguez, M.: On linear systems containing
strict inequalities. Linear Algebra Appl., 360, 151-171 (2003)
[GLV03] Goberna, M.A., Larriqueta, M., Vera de Serio, V.: On the stability of the
boundary of the feasible set in linear optimization. Set-Valued Analysis,
11, 203-223 (2003)
[GLV05] Goberna, M.A., Larriqueta, M., Vera de Serio, V.: On the stability of
the extreme point set in linear optimization. SIAM J. Optim., to appear
(2005)
[GL98] Goberna, M.A., Lopez, M.A.: Linear Semi-Infinite Optimization, Wiley,
Chichester, England (1998)
[GL01] Goberna, M.A., Lopez, M.A. (ed): Semi-Infinite Programming: Recent
Advances. Kluwer, Dordrecht (2001)
[GL02] Goberna, M.A., Lopez, M.A.: Linear semi-infinite optimization theory:
an updated survey. Eur. J. Oper. Res., 143, 390-415 (2002)
[GLT01] Goberna, M.A., Lopez, M.A., Todorov, M.I.: On the stability of the feasible
set in linear optimization. Set-Valued Analysis, 9, 75-99 (2001)
[GLT03a] Goberna, M.A., Lopez, M.A., Todorov, M.I.: A generic result in linear
semi-infinite optimization. Applied Mathematics and Optimization, 48,
181-193 (2003)
[GLT03b] Goberna, M.A., Lopez, M.A., Todorov, M.I.: A sup-function approach to
linear semi-infinite optimization. Journal of Mathematical Sciences, 116,
3359-3368 (2003)
[GLT03c] Goberna, M.A., Lopez, M.A., Todorov, M.I.: Extended active constraints
in linear optimization with applications. SIAM J. Optim., 14, 608-619
(2003)
[GLT05] Goberna, M.A., Lopez, M.A., Todorov, M.I.: On the stability of closed-convex-valued
mappings and the associated boundaries. J. Math. Anal.
Appl., to appear (2005)
[GLW01] Goberna, M.A., Lopez, M.A., Wu, S.Y.: Separation by hyperplanes: a
linear semi-infinite programming approach. In [GL01], 255-269 (2001)
[GR05] Goberna, M.A., Rodriguez, M.: Analyzing linear systems containing strict
inequalities via evenly convex hulls. Eur. J. Oper. Res., to appear (2005)
[GBA05] Gomez, J.A., Bosch, P.J., Amaya, J.: Duality for inexact semi-infinite
linear programming. Optimization, 54, 1-25 (2005)
[GLT02] Gomez, S., Lancho, A., Todorov, M.I.: Stability in convex semi-infinite
optimization. C. R. Acad. Bulg. Sci., 55, 23-26 (2002)
[GOZ02] Gretsky, N.E., Ostroy, J.M., Zame, W.R.: Subdifferentiability and the
duality gap. Positivity, 6, 261-264 (2002)
[GP01] Guarino Lo Bianco, C., Piazzi, A.: A hybrid algorithm for infinitely constrained
optimization. Int. J. Syst. Sci., 32, 91-102 (2001)
[Gug00] Gugat, M.: Error bounds for infinite systems of convex inequalities without
Slater's condition. Math. Programming (Ser. B), 88, 255-275 (2000)
[Gus01a] Gustafson, S.A.: Semi-infinite programming: Approximation methods. In
Floudas, C.A., Pardalos, P.M. (ed) Encyclopedia of Optimization Vol. 5,
96-100. Kluwer, Dordrecht (2001)
[Gus01b] Gustafson, S.A.: Semi-infinite programming: Methods for linear problems.
In Floudas, C.A., Pardalos, P.M. (ed) Encyclopedia of Optimization Vol.
5, 107-112. Kluwer, Dordrecht (2001)
[Hu00] Hu, H.: Perturbation analysis of global error bounds for systems of linear
inequalities. Math. Programming (Ser. B), 88, 277-284 (2000)
[HF00] Hu, C. F., Fang, S.-C.: Solving a System of Infinitely Many Fuzzy Inequalities
with Piecewise Linear Membership Functions. Comput. Math.
Appl., 40, 721-733 (2000)
[ILR01] Ioffe, A.D., Lucchetti, R.E., Revalski, J.P.: A variational principle for
problems with functional constraints. SIAM J. Optim., 12, 461-478
(2001)
[ILT00] Ito, S., Liu, Y., Teo, K.L.: A dual parametrization method for convex
semi-infinite programming. Annals of Oper. Res., 98, 189-213 (2000)
[JP04] Jaume, D., Puente, R.: Representability of convex sets by analytical linear
inequality systems. Linear Algebra Appl., 380, 135-150 (2004)
[Jer03] Jerez, B.: A dual characterization of incentive efficiency. J. Econ. Theory,
112, 1-34 (2003)
[JJNS01] Jess, A., Jongen, H.Th., Neralic, L., Stein, O.: A semi-infinite programming
model in data envelopment analysis. Optimization, 49, 369-385
(2001)
[Jey03] Jeyakumar, V.: Characterizing set containments involving infinite convex
constraints and reverse-convex constraints. SIAM J. Optim., 13, 947-959
(2003)
[JOW05] Jeyakumar, V., Ormerod, J., Womersley, R.S.: Knowledge-based semi-definite
linear programming classifiers. Optimization Methods and Software,
to appear (2005)
[JS00] Juhnke, F., Sarges, O.: Minimal spherical shells and linear semi-infinite
optimization. Contributions to Algebra and Geometry, 41, 93-105 (2000)
[KH98] Klatte, D., Henrion, R.: Regularity and stability in nonlinear semi-infinite
optimization. In: Reemtsen, R., Rückmann, J. (ed) Semi-infinite Programming.
Kluwer, Dordrecht, 69-102 (1998)
[KGUY02] Konno, H., Gotho, J., Uno, T., Yuki, A.: A cutting plane algorithm
for semidefinite programming with applications to failure discriminant
analysis. J. Comput. and Appl. Math., 146, 141-154 (2002)
[KKT03] Konno, H., Kawadai, N. , Tuy, H.: Cutting-plane algorithms for nonlinear
semi-definite programming problems with applications. J. Global Optim.,
25, 141-155 (2003)
[KK00] Konno, H., Kobayashi, H.: Failure discrimination and rating of enterprises
by semi-definite programming. Asia-Pacific Financial Markets, 7, 261-273
(2000)
[KM01] Kortanek, K.O., Medvedev, V.G.: Building and Using Dynamic Interest
Rate Models. Wiley, Chichester (2001)
[KZ01] Kortanek, K.O., Zhang, Q.: Perfect duality in semi-infinite and semidefinite
programming. Math. Programming (Ser. A), 91, 127-144 (2001)
[KM02] Kosmol, P., Müller-Wichards, D.: Homotopic methods for semi-infinite
optimization. J. Contemp. Math. Anal., 36, 31-48 (2002)
[Kos01] Kostyukova, O.I.: An algorithm constructing solutions for a family of linear
semi-infinite problems. J. Optim. Theory Appl., 110, 585-609 (2001)
[KM03] Krishnan, K., Mitchell, J.E.: Semi-infinite linear programming approaches
to semidefinite programming problems. In: Pardalos, P. (ed) Novel Approaches
to Hard Discrete Optimization, 121-140. American Mathematical
Society, Providence, RI (2003)
[LSVOO] Leon, T., Sanmatias, S., Vercher, E.: On the numerical treatment of lin-
early constrained semi-infinite optimization problems. Eur. J. Oper. Res.,
121, 78-91 (2000)
[LVOla] Leon, T., Vercher, E.: Optimization under uncertainty and linear semi-
infinite programming: A survey. In [GLOl], 327-348 (2001)
[LTW04] Liu, Y., Teo, K.L., Wu, S.Y.: A new quadratic semi-infinite programming
algorithm based on dual parametrization. J. Global Optim., 29, 401-413
(2004)
[LVOlb] Lopez, M. A., Vera de Serio, V.: Stability of the feasible set mapping in
convex semi-infinite programming, in [GLOl], 101-120 (2001)
[MM03] Marinacci, M., Montrucchio, L.: Subcalculus for set functions and cores
of TU games. J. Mathematical Economics, 39, 1-25 (2003)
[MMOO] Mir a, J. A., Mora, G.: Stability of linear inequality systems measured by
the HausdorflP metric. Set-Valued Analysis, 8, 253-266 (2000)
Linear Semi-infinite Optimization: Recent Advances 21

[NY02] Ng, K.F., Yang, W.H.: Error bounds for abstract linear inequality sys-
tems. SIAM J. Optim., 13, 24-43 (2002)
[NNCNOl] Nordholm, S., Nordberg, J., Claesson, L, Nordebo, S.: Beamforming and
interference cancellation for capacity gain in mobile networks. Annals of
Oper. Res., 98, 235-253 (2001)
[NSOl] Noubiap, R.F., Seidel, W.: An algorithm for calculating Gamma-minimax
decision rules under generalized moment conditions. Ann. Stat., 29, 1094-
1116 (2001)
[PenOl] Penot, J.-P.: Genericity of well-posedness, perturbations and smooth vari-
ational principles. Set-Valued Analysis, 9, 131-157 (2001)
[RDB02] Ratsch, G., Demiriz, A., Bennet, K.P.: Sparse regression ensembles in in-
finite and finite hypothesis spaces. Machine Learning, 48, 189-218 (2002)
[RubOOa] Rubio, J.E.: The optimal control of nonlinear diffusion equations with
rough initial data. J. Franklin Inst., 337, 673-690 (2000)
[RubOOb] Rubio, J.E.: Optimal control problems with unbounded constraint sets.
Optimization, 48, 191-210 (2000)
[RSOl] Riickmann, J.-J., Stein, O.: On linear and linearized generalized semi-
infinite optimization problems. Annals Oper. Res., 101, 191-208 (2001)
[SAPOO] Sabharwal, A., Avidor, D., Potter, L.: Sector beam synthesis for cellular
systems using phased antenna arrays. IEEE Trans, on Vehicular Tech.,
49, 1784-1792 (2000)
[SLTTOl] Sanchez-Soriano, J., Llorca, N., Tijs, S., Timmer, J.: Semi-infinite assign-
ment and transportation games. In [GLOl], 349-363 (2001)
[ShaOl] Shapiro, A.: On duality theory of conic linear problems. In [GLOl], 135-
165 (2001)
[Sha04] Shapiro, A.: On duality theory of convex semi-infinite programming. Tech.
Report, School of Industrial and Systems Engineering, Georgia Institute
of Technology, Atlanta, GE (2004)
[SIFOl] Slupphaug, O., Imsland, L., Foss, A.: Uncertainty modelling and robust
output feedback control of nonlinear discrete systems: A mathematical
programming approach. Int. J. Robust Nonlinear Control, 10, 1129-1152
(2000) [also in: Modeling, Identification and Control, 22, 29-52 (2001)]
[SS03] Stein, O., Still, G.: Solving semi-infinite optimization problems with in-
terior point techniques. SIAM J. Control Optim., 42, 769-788 (2003)
[StiOl] Still, G.: Discretization in semi-infinite programming: The rate of conver-
gence. Math. Programming (Ser. A), 9 1 , 53-69 (2001)
[TKVB02] Tichatschke, R., Kaplan, A., Voetmann, T., Bohm, M.: Numerical treat-
ment of an asset price model with non-stochastic uncertainty. Top, 10,
1-50 (2002)
[TTLSOl] Tijs, J., Timmer, S., Llorca, N., Sanchez-Soriano, J.: In [GLOl], 365-386
(2001)
[VB98] Vandenberghe, L., Boyd, S.: Connections between semi-infinite and semi-
definite programming. In Reemtsen, R., Riickmann, J. (ed) Semi-Infinite
Programming, 277-294. Kluwer, Dordrecht (1998)
[VFG03] Vaz, I., Fernandes, E., Gomes, P.: A sequential quadratic programming
with a dual parametrization approach to nonlinear semi-infinite program-
ming. Top 11,109-130 (2003)
[Web03] Weber, G.-W.: Generalized Semi-Infinite Optimization and Related Top-
ics. Heldermann Verlag, Lemgo, Germany (2003)
22 M.A. Goberna

[WFLOl] Wu, S.Y., Fang, S.-Ch., Lin, Ch.-J.: Analytic center based cutting plane
method for linear semi-infinite programming. In [GLOl], 221-233 (2001)
[ZR03] Zakovic, S., Rustem, B.: Semi-infinite programming and applications to
minimax problems. Annals Oper. Res., 124, 81-110 (2003)
[ZNFOO] Zavriev, S.K., Novikova, N.M., Fedosova, A.V.: Stochastic algorithm for
solving convex semi-infinite programming problems with equality and in-
equality constraints (Russian, Enghsh). Mosc. Univ. Comput. Math. Cy-
bern., 2000, 44-52 (2000)
Some Theoretical Aspects of Newton's Method
for Constrained Best Interpolation

Hou-Duo Qi

School of Mathematics, The University of Southampton
Highfield, Southampton SO17 1BJ, Great Britain
hdqi@soton.ac.uk

Summary. The paper contains new results as well as surveys on recent
developments on the constrained best interpolation problem, and in particular
on the convex best interpolation problem. Issues addressed include theoretical
reduction of the problem to a system of nonsmooth equations, nonsmooth analysis
of those equations and development of Newton's method, convergence analysis and
globalization. We frequently use the convex best interpolation to illustrate the
seemingly complex theory. Important techniques such as splitting are introduced
and interesting links between approaches from approximation and optimization
are also established. Open problems related to polyhedral constraints and strips
may be tackled by the tools introduced and developed in this paper.

2000 MR Subject Classification. 49M45, 90C25, 90C33

1 Introduction

The convex best interpolation problem is defined as follows:

\[
\begin{array}{ll}
\text{minimize}   & \|f''\|_2 \\
\text{subject to} & f(t_i) = y_i, \quad i = 1, 2, \ldots, n+2, \\
                  & f \text{ is convex on } [a,b], \quad f \in W^{2,2}[a,b],
\end{array}
\tag{1}
\]

where $a = t_1 < t_2 < \cdots < t_{n+2} = b$ and $y_i$, $i = 1, \ldots, n+2$, are given
numbers, $\|\cdot\|_2$ is the Lebesgue $L^2[a,b]$ norm, and $W^{2,2}[a,b]$ denotes the
Sobolev space of functions with absolutely continuous first derivatives and
second derivatives in $L^2[a,b]$, equipped with the norm being the sum of the
$L^2[a,b]$ norms of the function, its first, and its second derivatives.
Using an integration by parts technique, Favard [Fav40] and, more generally,
de Boor [deB78] showed that this problem has an equivalent reformulation
as follows:
\[
\min \{ \|u\|_2 \mid u \in L^2[a,b],\ u \ge 0,\ \langle u, x^i \rangle = d_i,\ i = 1, \ldots, n \},
\tag{2}
\]

where the functions $x^i \in L^2[a,b]$ and the numbers $d_i$ can be expressed in
terms of the original data $\{t_i, y_i\}$ (in fact, $x^i = B_i(t)$, the B-spline of order
2 defined by the given data, and $\{d_i\}$ are the second divided differences of
$\{(t_i, y_i)\}_{i=1}^{n+2}$). Under the assumption $d_i > 0$, $i = 1, \ldots, n$, the optimal
solution $u^*$ of (2) has the form

\[
u^*(t) = \Big( \sum_{i=1}^{n} \lambda_i^* B_i(t) \Big)_+
\tag{3}
\]

where $r_+ := \max\{0, r\}$ and $\{\lambda_i^*\}$ satisfy the following interpolation condition:

\[
\int_a^b \Big( \sum_{\ell=1}^{n} \lambda_\ell^* B_\ell(t) \Big)_+ B_i(t)\,dt = d_i, \quad i = 1, \ldots, n.
\tag{4}
\]

Once we have the solution $u^*$, the function required by (1) can be obtained from
$f'' = u^*$. This representation result was obtained first by Hornung [Hor80] and
subsequently extended to a much broader circle of problems in [AE87, DK89,
IP84, IMS86, MSSW85, MU88]. We briefly discuss below the important theoretical
and numerical progress on those problems.
Theoretically, prior to [MU88] by Micchelli and Utreras, most research
centered on the problem (1) and its slight relaxations, such as requiring $f''$
to be bounded below or above; see [IP84, MSSW85, IMS86, AE87, DK89]. After
[MU88] the main focus is on to what degree the solution characterization like
(3) and (4) can be extended to a more general problem proposed in Hilbert
spaces:
\[
\min \Big\{ \tfrac{1}{2} \|x - x^0\|^2 \ \Big|\ x \in C \text{ and } Ax = b \Big\}
\tag{5}
\]

where $C \subset X$ is a closed convex set in a Hilbert space $X$, $A : X \to \mathbb{R}^n$ is a
bounded linear operator, and $b \in \mathbb{R}^n$. It is easy to see that if we let

\[
X = L^2[a,b], \quad C = \{x \in X \mid x \ge 0\}, \quad
Ax = (\langle B_1, x \rangle, \ldots, \langle B_n, x \rangle), \quad x^0 = 0, \quad b = d
\tag{6}
\]

then (5) becomes (2). The abstract interpolation problem (5), initially studied
in [MU88], was extensively studied in a series of papers by Chui, Deutsch, and
Ward [CDW90, CDW92], Deutsch, Ubhaya, Ward, and Xu [DUWX96], and
Deutsch, Li, and Ward [DLW97]. For the complete treatment of this problem
in the spirit of those papers, see the recent book by Deutsch [Deu01].
Among the major developments in those papers is an important concept
called the strong CHIP [DLW97], which is a refinement of the property
CHIP [CDW90] (Conical Hull Intersection Property). More studies on the
strong CHIP, CHIP and other properties can be found in the two recent
papers [BBL99, BBT00]. Roughly speaking, the importance of the strong
CHIP lies in the following characterization result: The strong CHIP holds
for the constraints in (5) if and only if the unique solution $x^*$ has the following
representation:
\[
x^* = P_C(x^0 + A^* \lambda^*),
\tag{7}
\]
where $P_C$ denotes the projection onto the closed convex set $C$ (the closedness and
convexity guarantee the existence of $P_C$), $A^*$ is the adjoint of $A$, and
$\lambda^* \in \mathbb{R}^n$ satisfies the following nonlinear nonsmooth equation:

\[
A P_C(x^0 + A^* \lambda) = b.
\tag{8}
\]

To see that (7) and (8) recover (3) and (4) it is enough to use the fact:

\[
P_C(x) = x_+ \quad \text{where } C = \{x \in L^2[a,b] \mid x \ge 0\}.
\]
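
The identity $P_C(x) = x_+$ can be checked numerically on a grid. The following is a minimal sketch, assuming a uniform discretization of $[0,1]$ and a Riemann-sum inner product; the sample function `x` and the candidate elements of $C$ are hypothetical choices made only for illustration. It verifies the standard projection characterization $\langle x - P_C(x), c - P_C(x) \rangle \le 0$ for a few $c \in C$.

```python
import numpy as np

def project_nonneg(x):
    """Pointwise positive part x_+ = max(x, 0): projection onto {x >= 0}."""
    return np.maximum(x, 0.0)

def l2_inner(u, v, h):
    """Discrete L^2 inner product on a uniform grid with spacing h (Riemann sum)."""
    return h * float(np.dot(u, v))

# Hypothetical test data: a sign-changing function sampled on [0, 1].
t = np.linspace(0.0, 1.0, 201)
h = t[1] - t[0]
x = np.sin(6.0 * t) - 0.3
p = project_nonneg(x)

# Projection characterization: <x - p, c - p> <= 0 for every c in C.
candidates = [np.zeros_like(t), np.ones_like(t), t**2, np.abs(np.cos(3.0 * t))]
gaps = [l2_inner(x - p, c - p, h) for c in candidates]
```

Each entry of `gaps` is nonpositive because $x - x_+$ vanishes where $x \ge 0$ and is negative where $x < 0$, while $c - x_+ = c \ge 0$ there.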

If the strong CHIP does not hold we still have a similar characterization in
which $P_C$ is replaced by $P_{C_b}$, where $C_b$ is an extremal face of $C$ satisfying
some properties [Deu01]. However, it is often hard to get enough information
to make the calculation of $P_{C_b}$ possible, except in some particular cases. Hence,
we mainly focus on the case where the strong CHIP holds. We will see that the
assumption $d_i > 0$, $i = 1, \ldots, n$, for problem (1) is a sufficient condition for the
strong CHIP and, much more than that, it ensures the quadratic convergence
of Newton's method.
Numerically, problem (1) has been well studied [IP84, IMS86, AE87,
MU88, DK89, DQQ01, DQQ03]. As demonstrated in [IMS86] and verified
on several other occasions [AE87, DK89], the Newton method is the most
efficient compared to many other global methods for solving the equation (4). We
delay the description of the Newton method to the end of Section 3; instead,
we list some difficulties in designing algorithms for (4) and (8). First of all, the
equation (4) is generally nonsmooth. The nonsmoothness was a major barrier
for Andersson and Elfving [AE87] in establishing the convergence of Newton's
method (they had to assume that the equation is smooth near the solution
(the simple case) in order that the classical convergence result of Newton's
method applies). Second, as noticed in both [IMS86] and [AE87], in the
simple (i.e., smooth) case, the method presented in [IMS86, AE87] becomes
the classical Newton method. More justification is needed to consolidate the
name and the use of Newton's method when the equation is nonsmooth. To
do this, we appeal to the theory of the generalized Newton method developed
by Kummer [Kum88] and Qi and Sun [QS93] for nonsmooth equations. This
was done in [DQQ01, DQQ03]. We will review this theory in Section 3. Third,
Newton's method is only developed for the conical case, i.e., $C$ is a cone. It is
not yet known in what form the Newton method appears even for the polyhedral
case (i.e., $C$ is the intersection of finitely many halfspaces). We will tackle those
difficulties for the problem (5).
The problem (5) can also be studied via a very different approach developed
by Borwein and Lewis [BL92] for partially finite convex programming
problems:

\[
\inf \{ f(x) \mid Ax \in b + Q,\ x \in C \},
\tag{9}
\]

where $C \subset X$ is a closed convex set, $X$ is a topological vector space,
$A : X \to \mathbb{R}^n$ is a bounded linear operator, $b \in \mathbb{R}^n$, $Q$ is a polyhedral set in
$\mathbb{R}^n$, and $f : X \to (-\infty, \infty]$ is convex. If $f(x) = \tfrac{1}{2}\|x^0 - x\|^2$ and $Q = \{0\}$,
then (9) becomes (5). Under the constraint qualification that there is a feasible point
which is in the quasi-relative interior of $C$, the problem (9) can be solved
by its Fenchel-Rockafellar dual problem. We will see in the next section that
this approach also leads to the solution characterization (7) and (8). See, e.g.,
[GT90, Jey92, JW92] for further development of the Borwein-Lewis approach.
An interesting aspect of (9) is when $Q = \mathbb{R}^n_+$, the nonnegative orthant of
$\mathbb{R}^n$. This yields the following approximation problem:

\[
\min \Big\{ \tfrac{1}{2} \|x^0 - x\|^2 \ \Big|\ Ax \ge b,\ x \in C \Big\}.
\tag{10}
\]

This problem was systematically studied by Deutsch, Li and Ward in [DLW97],
proving that the strong CHIP again plays an important role but the sufficient
condition ensuring the strong CHIP takes a very different form from that (i.e.,
$b \in \operatorname{ri} AC$) for (5). We will prove in Section 2 that the constraint qualification
of Borwein and Lewis also implies the strong CHIP. Nonlinear convex and
nonconvex extensions of (10) can be found in [LJ02, LN02, LN03].
The paper is organized as follows: The next section contains some necessary
background material. In particular, we review the approach initiated
by Micchelli and Utreras [MU88], all the way to the advent of the strong
CHIP and its consequences. We then review the approach of Borwein and
Lewis [BL92] and state its implications by establishing the fact that the
nonemptiness of the quasi-relative interior of the feasible set implies the strong
CHIP. In Section 3, we review the theory of Newton's method for nonsmooth
equations, laying down the basis for the analysis of the Newton method for
(5), which is conducted in Section 4. In the last section, we discuss some
extensions to other problems such as interpolation in a strip. Throughout the
paper we use the convex best interpolation problem (1) and (4) as an example
to illustrate the seemingly complex theory.

2 Constrained Interpolation in Hilbert Space

Since $X$ is a Hilbert space, the bounded linear operator $A : X \to \mathbb{R}^n$ has the
following representation: there exist $x_1, \ldots, x_n \in X$ such that

\[
Ax = (\langle x_1, x \rangle, \ldots, \langle x_n, x \rangle), \quad \forall x \in X.
\]

Defining
\[
H_i := \{x \in X \mid \langle x_i, x \rangle = b_i\}, \quad i = 1, \ldots, n,
\]
the interpolation problem (5) takes the following form:

\[
\min \Big\{ \tfrac{1}{2} \|x^0 - x\|^2 \ \Big|\ x \in K := C \cap \big( \cap_{j=1}^{n} H_j \big) \Big\}.
\tag{11}
\]

Recall that for any convex set $D \subset X$, the (negative) polar of $D$, denoted by
$D^\circ$, is defined by

\[
D^\circ := \{y \in X \mid \langle y, x \rangle \le 0,\ \forall x \in D\}.
\]

The well-known strong CHIP is now defined as follows.

Definition 1. [Deu01, Definition 10.2] A collection of closed convex sets
$\{C_1, C_2, \ldots, C_m\}$ in $X$, which has a nonempty intersection, is said to have
the strong conical hull intersection property, or the strong CHIP, if
\[
\big( \cap_1^m C_i - x \big)^\circ = \sum_1^m (C_i - x)^\circ \quad \forall x \in \cap_1^m C_i.
\]

The concept of the strong CHIP is a refinement of CHIP [DLW97], which
requires
\[
\big( \cap_1^m C_i - x \big)^\circ = \overline{\sum_1^m (C_i - x)^\circ} \quad \forall x \in \cap_1^m C_i,
\tag{12}
\]

where $\overline{C}$ denotes the closure of $C$. It is worth mentioning that one direction
of (12) is automatic, that is,

\[
\big( \cap_1^m C_i - x \big)^\circ \supset \sum_1^m (C_i - x)^\circ \quad \forall x \in \cap_1^m C_i.
\]

Hence, the strong CHIP is actually assuming the other direction. The importance
of the strong CHIP lies in the following solution characterization of the
problem (11).
Theorem 1. [DLW97, Theorem 3.2] and [Deu01, Theorem 10.13] The set
$\{C, \cap_1^n H_j\}$ has the strong CHIP if and only if for every $x^0 \in X$ there exists
$\lambda^* \in \mathbb{R}^n$ such that the optimal solution $x^* = P_K(x^0)$ has the representation:
\[
x^* = P_C(x^0 + A^* \lambda^*)
\]
and $\lambda^*$ satisfies the interpolation equation
\[
A P_C(x^0 + A^* \lambda) = b.
\]

We remark that in general the strong CHIP of the sets $\{C, H_1, \ldots, H_n\}$
implies the strong CHIP of the sets $\{C, \cap_1^n H_j\}$. The following lemma gives a
condition that ensures their equivalence.
Lemma 1. [Deu01, Lemma 10.11] Suppose that $X$ is a Hilbert space and
$\{C_0, C_1, \ldots, C_m\}$ is a collection of closed convex subsets such that
$\{C_1, \ldots, C_m\}$ has the strong CHIP. Then the following statements are equivalent:
(i) $\{C_0, C_1, \ldots, C_m\}$ has the strong CHIP.
(ii) $\{C_0, \cap_1^m C_j\}$ has the strong CHIP.
Since each $H_j$ is a hyperplane, $\{H_1, \ldots, H_n\}$ has the strong CHIP [Deu01,
Example 10.9]. It follows from Lemma 1 that the strong CHIP of
$\{C, H_1, \ldots, H_n\}$ is equivalent to that of $\{C, A^{-1}(b)\}$. However, it is often
difficult to know if $\{C, A^{-1}(b)\}$ has the strong CHIP. Fortunately, there are
easy-to-verify sufficient conditions for this property. Given a convex subset
$D \subset \mathbb{R}^n$, let $\operatorname{ri} D$ denote the relative interior of $D$. Note that
$\operatorname{ri} D \ne \emptyset$ if $D \ne \emptyset$.
Theorem 2. [Deu01, Theorem 10.32] and [DLW97, Theorem 3.12] If
$b \in \operatorname{ri} AC$, then $\{C, A^{-1}(b)\}$ has the strong CHIP.
Theorem 2 also follows from the approach of Borwein and Lewis [BL92].
The concept of the quasi-relative interior of convex sets plays an important role in
this approach. We assume temporarily that $X$ is a locally convex topological
vector space. Let $X^*$ denote the dual space of $X$ (if $X$ is a Hilbert space then
$X^* = X$) and let $N_C(\bar{x}) \subset X^*$ denote the normal cone to $C$ at $\bar{x} \in C$, i.e.,
\[
N_C(\bar{x}) := \{y \in X^* \mid \langle y, x - \bar{x} \rangle \le 0,\ \forall x \in C\}.
\]
The most useful properties of the quasi-relative interior are contained in
the following
Proposition 1. [BL92] Suppose $C \subset X$ is convex. Then
(i) If $X$ is finite-dimensional then $\operatorname{qri} C = \operatorname{ri} C$.
(ii) Let $x \in C$. Then $x \in \operatorname{qri} C$ if and only if $N_C(x)$ is a subspace of $X^*$.
(iii) Let $A : X \to \mathbb{R}^n$ be a bounded linear map. If $\operatorname{qri} C \ne \emptyset$ then
$A(\operatorname{qri} C) = \operatorname{ri} AC$.
We note that (ii) serves as a definition of the quasi-relative interior of convex
sets. One can find several other interesting properties of the quasi-relative
interior in [BL92]. Although in the finite-dimensional case the quasi-relative interior
becomes the classical relative interior, it is a genuinely new concept in
infinite-dimensional cases. To see this, let $X = L^p[0,1]$ ($p \ge 1$) and
$C := \{x \in X \mid x \ge 0 \text{ a.e.}\}$. Since $C$ reproduces $X$ (i.e., $X = C - C$),
$\operatorname{ri} C = \emptyset$; however, $\operatorname{qri} C = \{x \in X \mid x > 0 \text{ a.e.}\}$.
One of the basic results in [BL92] is
Theorem 3. [BL92, Corollary 4.8] Let the assumptions on problem (9) hold.
Consider its dual problem
\[
\max \{ -(f + \delta(\cdot|C))^*(A^* \lambda) + b^T \lambda \mid \lambda \in Q^+ \}.
\tag{13}
\]
If the following constraint qualification is satisfied
\[
\text{there exists an } x \in \operatorname{qri} C \text{ which is feasible for (9),}
\tag{14}
\]
then the values of (9) and (13) are equal with attainment in (13). Suppose
further that $(f + \delta(\cdot|C))$ is closed. If $\lambda^*$ is optimal for the dual and
$(f + \delta(\cdot|C))^*$ is differentiable at $A^* \lambda^*$ with Gateaux derivative $x^* \in X$,
then $x^*$ is optimal for (9) and furthermore the unique optimal solution.

Here $\delta(\cdot|C)$ is the indicator function of $C$ and the superscript $*$ on a function
denotes its Fenchel conjugate. In (13),
$Q^+ := \{y \in \mathbb{R}^n \mid \langle y, x \rangle \ge 0,\ \forall x \in Q\}$. We now apply Theorem 3
to problem (11), i.e., we let

\[
f(x) = \tfrac{1}{2} \|x^0 - x\|^2, \quad Q = \{0\} \ \text{ so that } \ Q^+ = \mathbb{R}^n.
\]

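Obviously, in this case (9) has a unique solution since $f(x)$ is strongly convex.
For $y \in X^*$ we calculate

\[
\begin{aligned}
(f + \delta(\cdot|C))^*(y)
&= \sup_{x \in X} \{ \langle y, x \rangle - f(x) - \delta(x|C) \} \\
&= \sup_{x \in C} \Big\{ \langle x, y + x^0 \rangle - \tfrac{1}{2}\|x\|^2 - \tfrac{1}{2}\|x^0\|^2 \Big\} \\
&= \sup_{x \in C} \Big\{ \tfrac{1}{2}\|y + x^0\|^2 - \tfrac{1}{2}\|y + x^0 - x\|^2 - \tfrac{1}{2}\|x^0\|^2 \Big\} \\
&= \tfrac{1}{2}\|y + x^0\|^2 - \tfrac{1}{2}\|y + x^0 - P_C(y + x^0)\|^2 - \tfrac{1}{2}\|x^0\|^2.
\end{aligned}
\tag{15}
\]

It is well known (see, e.g., [MU88, Theorem 3.2]) that the right side of (15) is
Gateaux differentiable with

\[
\nabla (f + \delta(\cdot|C))^*(y) = P_C(y + x^0).
\]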
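
The gradient formula can be sanity-checked in a finite-dimensional analogue. The sketch below is only an illustration under stated assumptions: $X$ is replaced by $\mathbb{R}^3$, $C$ by the nonnegative orthant (so $P_C$ is the componentwise positive part), and the vectors `x0` and `y` are hypothetical values chosen away from the kinks. It compares a central finite difference of the right side of (15) with $P_C(y + x^0)$.

```python
import numpy as np

def conjugate(y, x0):
    """Right-hand side of (15) with C = nonnegative orthant of R^n."""
    z = y + x0
    pz = np.maximum(z, 0.0)                      # P_C(y + x0)
    return 0.5 * z @ z - 0.5 * (z - pz) @ (z - pz) - 0.5 * x0 @ x0

def central_gradient(g, y, eps=1e-6):
    """Central finite-difference approximation of the gradient of g at y."""
    grad = np.zeros_like(y)
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = eps
        grad[i] = (g(y + e) - g(y - e)) / (2.0 * eps)
    return grad

x0 = np.array([0.3, 0.4, -1.5])                  # hypothetical data
y = np.array([1.0, -2.0, 0.5])
numeric = central_gradient(lambda v: conjugate(v, x0), y)
exact = np.maximum(y + x0, 0.0)                  # P_C(y + x0)
```

Here `y + x0` has one positive and two negative components, so `exact` keeps the first entry and zeroes the others; `numeric` agrees up to finite-difference error.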


Returning to (13), which is an unconstrained convex optimization problem,
we know that the optimal solution $\lambda^*$ to (13) satisfies

\[
A P_C(x^0 + A^* \lambda) = b
\]

and the optimal solution to (9) is

\[
x^* = P_C(x^0 + A^* \lambda^*).
\]

Following Theorem 1 we see that the sets $\{C, A^{-1}(b)\}$ have the strong CHIP. In
fact, the qualification (14) is exactly the condition $b \in \operatorname{ri}(AC)$ by Proposition
1, except that (14) needs the a priori assumption $\operatorname{qri} C \ne \emptyset$.
However, for the problem (10), where

\[
K = C \cap \{x \mid Ax \ge b\},
\]
the condition $b \in \operatorname{ri} AC$ is not suitable as it might happen that $b \notin AC$. It
turns out that the strong CHIP again plays an essential role in this case. Let

\[
\mathcal{H}_j := \{x \mid \langle a_j, x \rangle \ge b_j\}.
\]
Theorem 4. [DLW99, Theorem 3.2] The sets $\{C, \cap_1^n \mathcal{H}_j\}$ have the strong CHIP
if and only if the optimal solution of (10), $x^* = P_K(x^0)$, has the following
representation:
\[
x^* = P_C(x^0 + A^* \lambda^*),
\tag{16}
\]
where $\lambda^*$ is any solution of the nonlinear complementarity problem:
\[
\lambda \ge 0, \quad w := A P_C(x^0 + A^* \lambda) - b \ge 0, \quad \lambda^T w = 0.
\tag{17}
\]
The question was raised in [DLW99] whether the constraint qualification
(14) is a sufficient condition for the strong CHIP of $\{C, \cap_1^n \mathcal{H}_j\}$. We
give an affirmative answer in the next result.
Theorem 5. If
\[
\operatorname{qri} C \cap \big( \cap_1^n \mathcal{H}_j \big) \ne \emptyset,
\tag{18}
\]
then the sets $\{C, \cap_1^n \mathcal{H}_j\}$ have the strong CHIP.

Proof. Suppose (18) holds. It follows from Theorem 3 with
$f(x) = \tfrac{1}{2}\|x - x^0\|^2$ that there exists an optimal solution $\lambda^*$ to the problem (13).
(15) says that

\[
(f + \delta(\cdot|C))^*(y) = \tfrac{1}{2}\|y + x^0\|^2 - \tfrac{1}{2}\|y + x^0 - P_C(y + x^0)\|^2 - \tfrac{1}{2}\|x^0\|^2
\]

and it is Gateaux differentiable and convex [MU88, Lemma 3.1]. Changing the
sign of the objective and dropping the constant term $\tfrac{1}{2}\|x^0\|^2$, (13) becomes

\[
\min \Big\{ \tfrac{1}{2}\|A^* \lambda + x^0\|^2 - \tfrac{1}{2}\|A^* \lambda + x^0 - P_C(A^* \lambda + x^0)\|^2 - b^T \lambda \ \Big|\ \lambda \ge 0 \Big\}.
\]

It is a finite-dimensional convex optimization problem and the optimal solution
is attained. Hence, the optimal solution $\lambda^*$ is exactly a solution of (17)
and the optimal solution of (10) is $x^* = P_C(x^0 + A^* \lambda^*)$. It then follows from
the characterization in Theorem 4 that the sets $\{C, \cap_1^n \mathcal{H}_j\}$ have the strong
CHIP. □
Illustration to problem (2). We recall the problem (2) and the setting in
(6). From the fact [BL92, Lemma 7.17]

\[
\Big\{ \Big( \int_a^b B_i x\,dt \Big)_{i=1}^{n} \ \Big|\ x > 0 \text{ a.e.},\ x \in L^2[a,b] \Big\}
= \{r \in \mathbb{R}^n \mid r_i > 0,\ i = 1, \ldots, n\}
\]

and the fact $\operatorname{qri} C = \{x \in L^2[a,b] \mid x > 0 \text{ a.e.}\}$, we have

\[
A \operatorname{qri} C = \operatorname{ri} AC = \operatorname{int} AC = \{r \in \mathbb{R}^n \mid r_i > 0,\ i = 1, \ldots, n\}.
\]
It follows from Theorem 2 or Theorem 3 that the solution to (2) is given by
(3) and (4), under the assumption that $d_i > 0$ for all $i$. Moreover, we will see
that this assumption implies the uniqueness of the solution $\lambda^*$, and eventually
guarantees the quadratic convergence of the Newton method.
3 Nonsmooth Functions and Equations

As is well known, if $F : \mathbb{R}^n \to \mathbb{R}^n$ is smooth, the classical Newton method for
finding a solution $x^*$ of the equation $F(x) = 0$ takes the following form:

\[
x^{k+1} = x^k - (F'(x^k))^{-1} F(x^k)
\tag{19}
\]

where $F'$ is the Jacobian of $F$. If $F'(x^*)$ is nonsingular then (19) is well defined
near the solution $x^*$ and is quadratically convergent. However, as we have seen in
the previous sections, we encounter nonsmooth equations. There is therefore a
need to develop Newton's method for nonsmooth equations, which is presented
below.
Now we suppose that $F : \mathbb{R}^n \to \mathbb{R}^n$ is only locally Lipschitz and we want
to find a solution of the equation
\[
F(x) = 0.
\tag{20}
\]
Since $F$ is differentiable almost everywhere according to Rademacher's
theorem, the Bouligand differential of $F$ at $x$, denoted by $\partial_B F(x)$, is defined
by
\[
\partial_B F(x) := \Big\{ V \ \Big|\ V = \lim_{x^i \to x} F'(x^i),\ F \text{ is differentiable at } x^i \Big\}.
\]
In other words, $\partial_B F(x)$ is the set of all limits of sequences $\{F'(x^i)\}$ where
$F'$ exists at $x^i$ and $x^i \to x$. The generalized Jacobian of Clarke [Cla83] is then
the convex hull of $\partial_B F(x)$, i.e.,
\[
\partial F(x) = \operatorname{co} \partial_B F(x).
\]
The basic properties of $\partial F$ are included in the following result.
Proposition 2. [Cla83, Proposition 2.6.2]
(a) $\partial F(x)$ is a nonempty convex compact subset of $\mathbb{R}^{n \times n}$.
(b) $\partial F$ is closed at $x$; that is, if $x^i \to x$, $M_i \in \partial F(x^i)$, $M_i \to M$, then
$M \in \partial F(x)$.
(c) $\partial F$ is upper semicontinuous at $x$.
Having the object $\partial F$, the nonsmooth version of Newton's method for
the solution of (20) can be described as follows (see, e.g., [Kum88, QS93]):
\[
x^{k+1} = x^k - V_k^{-1} F(x^k), \quad V_k \in \partial F(x^k).
\tag{21}
\]
We note that a different choice of $V_k$ results in a different sequence $\{x^k\}$. Hence,
it is more accurate to say that (21) defines a class of Newton-type methods
rather than a single method. It is always arguable which element in $\partial F(x^k)$
is the most suitable in defining (21). We will say more about the choice with
regard to the convex best interpolation problem. We also note that there
are other ways of defining a nonsmooth Newton method, essentially using
definitions different from $\partial F(x)$ but serving the same objective as $\partial F$; see, e.g.,
[JL98, Xu99, KK02].
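
As a minimal sketch of iteration (21), consider the scalar piecewise-linear function $F(x) = x + 2\max\{x, 0\} - 3$, a hypothetical example chosen only for illustration (it is not taken from the references). Its generalized Jacobian is $\{1\}$ for $x < 0$, $\{3\}$ for $x > 0$, and the interval $[1, 3]$ at $x = 0$; the code picks one admissible element at each iterate.

```python
def F(x):
    # Hypothetical piecewise-linear test function: F(x) = x + 2*max(x, 0) - 3.
    return x + 2.0 * max(x, 0.0) - 3.0

def jacobian_element(x):
    # One element of the generalized Jacobian: 3 for x > 0, 1 for x < 0.
    # At the kink x = 0 any value in [1, 3] is admissible; 1 is chosen here.
    return 3.0 if x > 0.0 else 1.0

def nonsmooth_newton(x, tol=1e-12, max_iter=20):
    """Iteration (21) in one dimension; returns the final point and all iterates."""
    history = [x]
    for _ in range(max_iter):
        if abs(F(x)) <= tol:
            break
        x = x - F(x) / jacobian_element(x)
        history.append(x)
    return x, history

root, history = nonsmooth_newton(-5.0)
```

Starting from $x^0 = -5$ the iterates are $-5, 3, 1$, and $F(1) = 0$. Because $F$ is piecewise linear, the method terminates as soon as an iterate lands on the linear piece containing the root.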
Definition 2. We say that $F$ is regular at $x$ if each element in $\partial F(x)$ is
nonsingular.

If $F$ is regular at $x^*$ it follows from the upper semicontinuity of $\partial F$ at $x^*$
(Prop. 2) that $F$ is regular near $x^*$, and consequently, (21) is well defined
near $x^*$. In contrast to the smooth case, regularity at $x^*$ alone is no longer a
sufficient condition for the convergence of the method (21). It turns out that
its convergence also relies on another important property of $F$, named
semismoothness.

Definition 3. [QS93] We say that $F$ is semismooth at $x$ if the following
conditions hold:
(i) $F$ is directionally differentiable at $x$, and
(ii) it holds

\[
F(x + h) - F(x) - Vh = o(\|h\|) \quad \forall V \in \partial F(x + h) \text{ and } h \in \mathbb{R}^n.
\tag{22}
\]

Furthermore, if

\[
F(x + h) - F(x) - Vh = O(\|h\|^2) \quad \forall V \in \partial F(x + h) \text{ and } h \in \mathbb{R}^n,
\tag{23}
\]

$F$ is said to be strongly semismooth at $x$. If $F$ is (strongly) semismooth everywhere,
we simply say that $F$ is (strongly) semismooth.
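
A short worked example may help fix these definitions. It uses the scalar positive-part function, which is the nonsmooth ingredient of (4); the computation below is a standard illustration written out here, not a statement quoted from the references.

```latex
% Worked example: F(x) = x_+ = \max\{0, x\} on \mathbb{R}.
\[
F'(x) = \begin{cases} 1, & x > 0, \\ 0, & x < 0, \end{cases}
\qquad
\partial_B F(0) = \{0, 1\},
\qquad
\partial F(0) = \operatorname{co}\{0, 1\} = [0, 1].
\]
% Semismoothness at x = 0: for h \ne 0 the set \partial F(h) is a singleton,
% V = 1 if h > 0 and V = 0 if h < 0, so
\[
F(0 + h) - F(0) - V h = h_+ - V h = 0 \quad \text{for all } V \in \partial F(h),\ h \ne 0.
\]
% Hence (22) and (23) both hold: F is strongly semismooth at 0.
% F is not regular at 0 in the sense of Definition 2, because 0 \in \partial F(0).
```

The example also shows that semismoothness and regularity are independent requirements: the positive-part function satisfies (23) at the origin while failing Definition 2 there.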

The property of semismoothness, as introduced by Mifflin [Mif77] for
functionals and scalar-valued functions and further extended by Qi and Sun [QS93]
to vector-valued functions, is of particular interest due to the key role it plays
in the superlinear convergence of the nonsmooth Newton method (21). It is
worth mentioning that in a largely ignored paper [Kum88] by Kummer, the
relation (22), put in a very general form in [Kum88], was revealed
to be essential for the convergence of a class of Newton-type methods, which
is essentially the same as (21). Nevertheless, Qi and Sun's work [QS93] makes
it more accessible to and much easier to use by many researchers (see, e.g., the
book [FP03] by Facchinei and Pang). The importance of semismoothness
can be seen from the following convergence result for (21).

Theorem 6. [QS93, Theorem 3.2] Let $x^*$ be a solution of the equation
$F(x) = 0$ and let $F$ be a locally Lipschitz function which is semismooth at
$x^*$. Assume that $F$ is regular at $x^*$. Then every sequence generated by the
method (21) is superlinearly convergent to $x^*$ provided that the starting point
$x^0$ is sufficiently close to $x^*$. Furthermore, if $F$ is strongly semismooth at $x^*$,
then the convergence rate is quadratic.

The use of Theorem 6 relies on the availability of the following three
elements: (a) availability of an element in $\partial F(x)$ near the solution $x^*$, (b)
regularity of $F$ at $x^*$, and (c) (strong) semismoothness of $F$ at $x^*$. We illustrate
below how the first can be easily calculated for the convex best interpolation
problem and leave the other two tasks to the next section.
Illustration to the convex best interpolation problem. It follows from
(3) and (4) that the solution of the convex best interpolation problem can be
obtained by solving the following equation:
\[
F(\lambda) = d,
\tag{24}
\]
where $d = (d_1, \ldots, d_n)^T$ and each component of $F$ is given by

\[
F_j(\lambda) = \int_a^b \Big( \sum_{\ell=1}^{n} \lambda_\ell B_\ell \Big)_+ B_j(t)\,dt, \quad j = 1, \ldots, n.
\tag{25}
\]

Irvine, Marin, and Smith [IMS86] developed Newton's method for (24):
\[
\lambda_+ = \lambda - (M(\lambda))^{-1} (F(\lambda) - d),
\tag{26}
\]
where $\lambda$ and $\lambda_+$ denote respectively the old and the new iterate, and
$M(\lambda) \in \mathbb{R}^{n \times n}$ is given by

\[
(M(\lambda))_{ij} = \int_a^b \Big( \sum_{\ell=1}^{n} \lambda_\ell B_\ell \Big)_+^0 B_i(t) B_j(t)\,dt,
\]

and
\[
(r)_+^0 = \begin{cases} 1 & \text{if } r > 0, \\ 0 & \text{if } r \le 0. \end{cases}
\]
Let $e$ denote the element of all ones in $\mathbb{R}^n$; then it is easy to see that the
directional derivative of $F$ at $\lambda$ along the direction $e$ is
\[
F'(\lambda; e) = M(\lambda) e.
\]
Moreover, if $F$ is differentiable at $\lambda$ then $F'(\lambda) = M(\lambda)$. For those reasons,
the iteration (26) was called Newton's method and, based on extensive
numerical experiments, was observed to converge quadratically in [IMS86].
Independently of [IMS86], partial theoretical results on the convergence of (26) were
established by Andersson and Elfving [AE87]. A complete convergence analysis
was established by Dontchev, Qi, and Qi [DQQ01, DQQ03] by casting (26)
as a particular instance of (21). The convergence analysis procedure verifies
exactly the availability of the three elements discussed above, in particular,
$M(\lambda) \in \partial F(\lambda)$. We will present in the next section the procedure for the
constrained interpolation problem in Hilbert space.
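
Iteration (26) can be sketched numerically as follows. This is an illustration under several assumptions that are not in the text: the $B_\ell$ are taken to be unnormalized piecewise-linear hat functions on five equally spaced knots, the integrals in (25) and in $M(\lambda)$ are approximated by the trapezoid rule on a fine grid, and the right-hand side `d` is synthetic, generated from a chosen sign-changing `lam_true` rather than from divided differences of real data. The helper names (`hat_basis`, `F_and_M`, `newton`) are hypothetical.

```python
import numpy as np

def hat_basis(knots, t):
    """Rows: piecewise-linear hat functions peaking at the interior knots."""
    return np.array([np.interp(t, knots[i - 1:i + 2], [0.0, 1.0, 0.0])
                     for i in range(1, len(knots) - 1)])

def F_and_M(lam, B, w):
    """Quadrature versions of F in (25) and of the matrix M(lambda)."""
    s = lam @ B                                  # sum_l lam_l B_l(t) on the grid
    F = B @ (np.maximum(s, 0.0) * w)             # F_j = int (s)_+ B_j dt
    M = (B * ((s > 0.0) * w)) @ B.T              # M_ij = int (s)_+^0 B_i B_j dt
    return F, M

def newton(B, w, d, lam, tol=1e-10, max_iter=50):
    """Iteration (26); returns the final iterate and the number of steps taken."""
    for k in range(max_iter):
        F, M = F_and_M(lam, B, w)
        if np.linalg.norm(F - d) <= tol:
            return lam, k
        lam = lam - np.linalg.solve(M, F - d)
    raise RuntimeError("Newton iteration did not converge")

knots = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.linspace(knots[0], knots[-1], 4001)
w = np.full(t.size, t[1] - t[0])
w[[0, -1]] *= 0.5                                # trapezoid weights
B = hat_basis(knots, t)

lam_true = np.array([1.0, -1.0, 1.0])            # synthetic, sign-changing
d, _ = F_and_M(lam_true, B, w)                   # synthetic right-hand side, d > 0
lam, iterations = newton(B, w, d, np.ones(3))
```

In this example the sum $\sum_\ell \lambda_\ell B_\ell$ changes sign at the solution, so $F$ is genuinely nonsmooth there, yet the iteration started from the all-ones vector recovers `lam_true`. Since $(r)_+ = (r)_+^0\, r$, one has $F(\lambda) = M(\lambda)\lambda$, and each step amounts to solving $M(\lambda)\lambda_+ = d$.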

4 Newton's Method and Convergence Analysis

4.1 Newton's Method

We first note that all results in Section 2 assume no other requirements for the
set $C$ except being convex and closed. Consequently, we are able to develop
(conceptually, at least) Newton's method for the nonsmooth equation (8).
However, efficient implementation of Newton's method relies on the assumption
that there is an efficient way to calculate the generalized Jacobian of $A P_C(x)$.
The most interesting case in view of this consideration is when $C$ is a closed
convex cone (i.e., the conical case [BL92]), which covers many problems including
(1). We recall our setting below:

\[
X = L^2[a,b], \quad C = \{x \in X \mid x \ge 0\}, \quad
Ax = (\langle a_1, x \rangle, \ldots, \langle a_n, x \rangle), \quad b \in \mathbb{R}^n,
\]

where $a_\ell \in X$, $\ell = 1, \ldots, n$ (in fact we may assume that $X = L^p[a,b]$, in which
case $a_\ell \in L^q[a,b]$ where $1/p + 1/q = 1$). This setting simplifies our description.
We want to develop Newton's method for the equation:

\[
A P_C(x^0 + A^* \lambda) = b.
\]

Taking into account the fact that $P_C(x) = x_+$, we let

\[
F(\lambda) - b = 0
\tag{27}
\]

where each component of $F : \mathbb{R}^n \to \mathbb{R}^n$ is given by

\[
F_j(\lambda) := \Big\langle a_j, \Big( x^0 + \sum_{\ell=1}^{n} a_\ell \lambda_\ell \Big)_+ \Big\rangle.
\tag{28}
\]

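We propose a nonsmooth Newton method (in the spirit of Section 3) for the
nonsmooth equation (27) as follows:

\[
V(\lambda)(\lambda_+ - \lambda) = b - F(\lambda), \quad V(\lambda) \in \partial F(\lambda).
\tag{29}
\]

One of several difficulties with the Newton method (29) is to select an
appropriate matrix $V(\lambda)$ from $\partial F(\lambda)$, which is well defined as $F$ is Lipschitz
continuous under Assumption 1 stated later. We will also see that the following
choice satisfies all the requirements:

\[
(V(\lambda))_{ij} := \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 a_i a_j\,dt.
\tag{30}
\]

We note that for $\beta \in \mathbb{R}^n$

\[
\beta^T V(\lambda) \beta = \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 \Big( \sum_{\ell=1}^{n} \beta_\ell a_\ell \Big)^2 dt \ge 0.
\tag{31}
\]

That is, $V(\lambda)$ is positive semidefinite for an arbitrary choice of $\lambda \in \mathbb{R}^n$. We need
an assumption to make it positive definite. Let the support of $a_\ell$ be

\[
\operatorname{supp}(a_\ell) := \{t \in [a,b] \mid a_\ell(t) \ne 0\}.
\]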
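Assumption 1. Each $a_\ell$ is continuous, and any subset of functions

\[
\{a_\ell,\ \ell \in \mathcal{I} \subseteq \{1, \ldots, n\} \mid \operatorname{supp}(a_i) \cap \operatorname{supp}(a_j) \ne \emptyset \text{ for any pair } i, j \in \mathcal{I}\}
\]

is linearly independent on $\cup_{\ell \in \mathcal{I}} \operatorname{supp}(a_\ell)$. Moreover,

\[
\cup_{\ell=1}^{n} \operatorname{supp}(a_\ell) = [a,b].
\]

This assumption is not restrictive. Typical choices of $a_\ell$ are $\{a_\ell = t^\ell\}$ or
$\{a_\ell = B_\ell\}$. With Assumption 1 we have the following result.
Lemma 2. Suppose Assumption 1 holds. $V(\lambda)$ is positive definite if and only
if $(x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell)_+$ does not vanish identically on the supporting set of each
$a_\ell$, $\ell = 1, \ldots, n$.
Proof. Suppose that $(x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell)_+$ is nonzero on each $\operatorname{supp}(a_\ell)$. Due
to the continuity of $(x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell)$ and $a_\ell$, there exists a Borel set
$\Omega_\ell \subseteq \operatorname{supp}(a_\ell)$ such that $(x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell)_+^0 = 1$ for all $t \in \Omega_\ell$ and the
measure of $\Omega_\ell$ is not zero. Let

\[
\mathcal{I}(\Omega_\ell) := \{j \mid \operatorname{supp}(a_j) \cap \Omega_\ell \ne \emptyset\}.
\]

Since $\{a_j \mid j \in \mathcal{I}(\Omega_\ell)\}$ are linearly independent, $\beta^T V(\lambda) \beta = 0$ implies $\beta_j = 0$
for all $j \in \mathcal{I}(\Omega_\ell)$. We also note that

\[
\cup_{\ell=1}^{n} \mathcal{I}(\Omega_\ell) = \{1, \ldots, n\}.
\]

We see that $\beta_j = 0$ for all $j = 1, \ldots, n$ if $\beta^T V(\lambda) \beta = 0$. Hence, (31) yields
the positive definiteness of $V(\lambda)$. The converse follows from the observation
that if $(x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell)_+ \equiv 0$ on $\operatorname{supp}(a_\ell)$ for some $\ell$ then $\beta^T V(\lambda) \beta = 0$ for
$\beta \in \mathbb{R}^n$ with $\beta_\ell = 1$ and $\beta_j = 0$ for $j \ne \ell$. □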
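Due to the special structure of $F(\lambda)$, Newton's method (29) can be
simplified by noticing that

\[
\begin{aligned}
F_j(\lambda)
&= \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+ a_j\,dt \\
&= \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big) a_j\,dt \\
&= \sum_{\ell=1}^{n} \lambda_\ell (V(\lambda))_{j\ell} + \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 a_j x^0\,dt.
\end{aligned}
\]

Thus we have

\[
F(\lambda) = V(\lambda) \lambda + A \Big( \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 x^0 \Big).
\]

Recalling (29) we have

\[
V(\lambda) \lambda_+ = b - A \Big( \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_+^0 x^0 \Big).
\tag{32}
\]

A very interesting case is when $x^0 = 0$, which implies that no function
evaluations are required to implement Newton's method, i.e., (32) takes the form
\[
V(\lambda) \lambda_+ = b.
\]
Other choices of $V(\lambda)$ are also possible as $\partial F(\lambda)$ usually contains infinitely
many elements. For example,

\[
(\widetilde{V}(\lambda))_{ij} := \int_a^b \Big( x^0 + \sum_{\ell=1}^{n} \lambda_\ell a_\ell \Big)_{\oplus} a_i a_j\,dt,
\quad \text{and} \quad
(r)_{\oplus} := \begin{cases} 1 & \text{if } r \ge 0, \\ 0 & \text{if } r < 0. \end{cases}
\]

It is easy to see that $\beta^T \widetilde{V}(\lambda) \beta \ge \beta^T V(\lambda) \beta$ for any $\beta \in \mathbb{R}^n$. This means
that $\widetilde{V}(\lambda)$ "increases the positivity" of $V(\lambda)$ in the sense that $\widetilde{V}(\lambda) - V(\lambda)$ is
positive semidefinite. The argument leading to (32) also applies to $\widetilde{V}(\lambda)$. We
will show below that both $V(\lambda)$ and $\widetilde{V}(\lambda)$ are contained in $\partial F(\lambda)$.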

4.2 Splitting and Regularity

We now introduce a splitting technique that decomposes the (nonsmooth) function $F$ into two parts, namely $F^+$ and $F^-$, such that $F^+$ is continuously differentiable at the given point and $F^-$ is necessarily nonsmooth nearby. This technique facilitates the arguments leading to the conclusion that $V(\lambda)$ belongs to $\partial F(\lambda)$ and paves the way for studying the regularity of $F$ at the solution. For the moment, we let $\bar\lambda$ be our reference point. Let

$$T(\bar\lambda) := \Big\{t \in [a,b] \;\Big|\; x^0 + \sum_{\ell=1}^n \bar\lambda_\ell a_\ell = 0\Big\}, \qquad \bar{T}(\bar\lambda) := [a,b] \setminus T(\bar\lambda).$$

Due to Assumption (1), $T(\bar\lambda)$ contains closed intervals in $[a,b]$, and possibly isolated points. For $j = 1,\dots,n$, define

$$F_j^+(\lambda) := \int_{\bar{T}(\bar\lambda)} \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+ a_j\,dt,$$

$$F_j^-(\lambda) := \int_{T(\bar\lambda)} \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+ a_j\,dt,$$

and

$$F^+(\lambda) := (F_1^+(\lambda),\dots,F_n^+(\lambda))^T, \qquad F^-(\lambda) := (F_1^-(\lambda),\dots,F_n^-(\lambda))^T.$$



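It is easy to see that

$$F(\lambda) = F^+(\lambda) + F^-(\lambda).$$

It is elementary to see that the vector-valued function $F^+$ is continuously differentiable in a neighborhood $\mathcal{N}(\bar\lambda)$ of $\bar\lambda$. Then from the definition of the generalized Jacobian we obtain that for any $\lambda \in \mathcal{N}(\bar\lambda)$,

$$\partial F(\lambda) = \nabla F^+(\lambda) + \partial F^-(\lambda), \tag{33}$$

where $\nabla F^+(\lambda)$ denotes the usual Jacobian of $F^+$ at $\lambda$. More precisely,

$$(\nabla F^+(\bar\lambda))_{ij} = \int_{\bar{T}(\bar\lambda)} \Big(x^0 + \sum_{\ell=1}^n \bar\lambda_\ell a_\ell\Big)_+^0 a_i a_j\,dt. \tag{34}$$

Since

$$x^0 + \sum_{\ell=1}^n \bar\lambda_\ell a_\ell = 0 \quad\text{for all } t \in T(\bar\lambda),$$

(34) can be written as

$$(\nabla F^+(\bar\lambda))_{ij} = \int_a^b \Big(x^0 + \sum_{\ell=1}^n \bar\lambda_\ell a_\ell\Big)_+^0 a_i a_j\,dt = (V(\bar\lambda))_{ij}. \tag{35}$$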
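Regarding $F^-$, we need the following assumption:

Assumption 2. There exists a sequence $\{\lambda^k\}$ in $\mathbb{R}^n$ converging to zero such that the sum $\sum_{\ell=1}^n \lambda_\ell^k a_\ell$ is negative on $[a,b]$ for all $\lambda^k$.
This assumption also holds if each of $a_\ell$ is nonnegative or nonpositive.
Lemma 3. For any $\bar\lambda \in \mathbb{R}^n$ every element in $\partial F^-(\bar\lambda)$ is positive semidefinite. Moreover, if Assumption 2 holds then the zero matrix belongs to $\partial F^-(\bar\lambda)$.
Proof. We denote

$$y := \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+ \chi_{T(\bar\lambda)},$$

where $\chi_{T(\bar\lambda)}$ is the characteristic function of the set $T(\bar\lambda)$. In terms of $y$, $F^-$ can be written as $F^-(\lambda) = Ay$. Since $T(\bar\lambda)$ consists of only closed intervals, without loss of generality we assume $T(\bar\lambda)$ is a closed interval. Let

$$C := \{x \in L^2(T(\bar\lambda)) \mid x \ge 0\}.$$

Then we have $L^2[a,b] \subset L^2(T(\bar\lambda))$ since $T(\bar\lambda) \subset [a,b]$. Define

$$\theta(\lambda) := \frac{1}{2}\int_{T(\bar\lambda)} \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+^2 dt = \frac{1}{2}\int_{T(\bar\lambda)} \Big(P_C\Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)\Big)^2 dt.$$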

According to [MU88, Lemma 2.1], $\theta(\lambda)$ is continuously Gateaux differentiable and convex. Moreover,

$$\nabla\theta(\lambda) = Ay = F^-(\lambda).$$

Therefore, any matrix in the generalized Jacobian of the gradient mapping (which is required to be Lipschitz continuous) of a convex function must be positive semidefinite, see, for example, [JQ95, Proposition 2.3]. Now we prove the second part. Suppose Assumption 2 holds for the sequence $\{\lambda^k\}$ which converges to zero. Then $F^-(\bar\lambda + \lambda^k)$ is differentiable because

$$x^0 + \sum_{\ell=1}^n (\bar\lambda + \lambda^k)_\ell a_\ell < 0 \quad\text{for all } t \in T(\bar\lambda) \text{ and } k > 0.$$

Hence,

$$\lim_{k\to\infty} \nabla F^-(\bar\lambda + \lambda^k) = 0 \in \partial F^-(\bar\lambda).$$

□
We then have
Corollary 1. For any $\lambda \in \mathbb{R}^n$, $V(\lambda) \in \partial F(\lambda)$.
Proof. It follows from Lemma 3 that $0 \in \partial F^-(\bar\lambda)$ and from (35) that $V(\bar\lambda) = \nabla F^+(\bar\lambda)$. The relation (33) then implies $V(\bar\lambda) \in \partial F(\bar\lambda)$. Since $\bar\lambda$ is arbitrary we are done. □
We need another assumption for our regularity result.
Assumption 3. $b_\ell \neq 0$ for all $\ell = 1,\dots,n$.
Lemma 4. Suppose Assumptions (1), (2) and (3) hold and let $\lambda^*$ be the solution of (27). Then every element of $\partial F(\lambda^*)$ is positive definite.
Proof. We have proved that

$$\partial F(\lambda^*) = \partial F^-(\lambda^*) + \nabla F^+(\lambda^*) = \partial F^-(\lambda^*) + V(\lambda^*)$$

and every element in $\partial F^-(\lambda^*)$ is positive semidefinite. It is enough to prove $\nabla F^+(\lambda^*)$ is positive definite. We recall that at the solution

$$b_i = F_i(\lambda^*) = \int_a^b \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell^* a_\ell\Big)_+ a_i\,dt, \quad \forall i = 1,\dots,n.$$

The assumption (3) implies that $(x^0 + \sum_{\ell=1}^n \lambda_\ell^* a_\ell)_+$ does not vanish identically on the support of each $a_\ell$. Then Lemma 2 implies that $\nabla F^+(\lambda^*) = V(\lambda^*)$ is positive definite. □
Illustration to problem (2). An essential assumption for problem (2) is that the second divided difference is positive, i.e., $d_i > 0$ for all $i = 1,\dots,n$. Hence, Assumption (3) is automatically valid. It is easy to see that Assumptions (1) and (2) are also satisfied for B-splines. It follows from the above argument that the Newton method (26) is well defined near the solution. However, to prove its convergence we need the semismoothness property of $F$, which is addressed below.

4.3 Semismoothness
As we see from Theorem 6, the property of semismoothness plays an important role in the convergence analysis of the nonsmooth Newton's method (21). In our application it involves functions of the following type:

$$\Phi(\lambda) := \int_0^1 \phi(\lambda,t)\,dt \tag{36}$$

where $\phi : \mathbb{R}^n \times [0,1] \to \mathbb{R}$ is a locally Lipschitz mapping. The following development is due to D. Ralph [Ral02] and relies on a characterization of semismoothness using the Clarke generalized directional derivative.
Definition 4. [Cla83] Suppose $\psi : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz. The generalized directional derivative of $\psi$, evaluated at $\lambda$ in the direction $h$, is given by

$$\psi^\circ(\lambda;h) := \limsup_{\lambda' \to \lambda,\; s \downarrow 0} \frac{\psi(\lambda' + sh) - \psi(\lambda')}{s}.$$

The difference quotient whose upper limit is being taken is bounded above in light of the Lipschitz condition, so $\psi^\circ(\lambda;h)$ is a well-defined finite quantity. An important property of $\psi^\circ$ is that for any $h$,

$$\psi^\circ(\lambda;h) = \max\{\langle \xi, h\rangle \mid \xi \in \partial\psi(\lambda)\}. \tag{37}$$
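A one-dimensional example, added here for illustration and not taken from [Ral02], makes (37) concrete. Take $\psi(\lambda) = |\lambda|$ on $\mathbb{R}$ and evaluate at $\lambda = 0$; the dummy variable $\lambda'$ and the step $s$ are notation assumed for this example.

```latex
\psi^{\circ}(0;h)
  = \limsup_{\lambda' \to 0,\; s \downarrow 0}
    \frac{|\lambda' + s h| - |\lambda'|}{s}
  = |h|,
\qquad
\partial\psi(0) = [-1,1],
\qquad
\max\{\xi h \mid \xi \in [-1,1]\} = |h| .
```

The same value $|h|$ is obtained for $\psi = -|\cdot|$, whose ordinary directional derivative at $0$ is $-|h|$, so the generalized directional derivative bounds the one-sided derivative from above but need not coincide with it.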
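We now have the following characterization of semismoothness.
Lemma 5. [Ral02] A locally Lipschitz function $\psi : \mathbb{R}^n \to \mathbb{R}$ is semismooth at $\bar\lambda$ if and only if $\psi$ is directionally differentiable and

$$\begin{aligned}
\psi(\bar\lambda) + \psi^\circ(\lambda; \lambda - \bar\lambda) - \psi(\lambda) &\le o(\|\lambda - \bar\lambda\|), \text{ and}\\
\psi(\bar\lambda) - \psi^\circ(\lambda; -\lambda + \bar\lambda) - \psi(\lambda) &\ge o(\|\lambda - \bar\lambda\|).
\end{aligned} \tag{38}$$

The equivalence remains valid if the inequalities are replaced by equalities.
Proof. Noticing that (37) implies $-\psi^\circ(\lambda; -h) = \min_{\xi \in \partial\psi(\lambda)} h^T\xi$, the conditions in (38) are equivalent to

$$\psi(\bar\lambda) + [-\psi^\circ(\lambda; -\lambda + \bar\lambda),\ \psi^\circ(\lambda; \lambda - \bar\lambda)] - \psi(\lambda) = o(\|\lambda - \bar\lambda\|).$$

Combining with the directional differentiability of $\psi$, this set-valued equation clearly implies the semismoothness of $\psi$ at $\bar\lambda$ because for any $\xi \in \partial\psi(\lambda)$, we have

$$\xi^T(\lambda - \bar\lambda) \in [-\psi^\circ(\lambda; -\lambda + \bar\lambda),\ \psi^\circ(\lambda; \lambda - \bar\lambda)].$$

Conversely, if $\psi$ is semismooth at $\bar\lambda$ then for any $\lambda$ we take an element $\xi \in \partial\psi(\lambda)$ (respectively) to obtain

$$\psi^\circ(\lambda; \lambda - \bar\lambda) = \xi^T(\lambda - \bar\lambda) \quad (\text{respectively } -\psi^\circ(\lambda; -\lambda + \bar\lambda) = \xi^T(-\bar\lambda + \lambda)).$$

The existence of such $\xi$ follows from compactness of $\partial\psi(\lambda)$. Then the required inequalities follow from the semismoothness of $\psi$ at $\bar\lambda$. □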

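Now we have our major result concerning the function in (36).

Proposition 3. [Ral02] Let $\phi : \mathbb{R}^n \times [0,1] \to \mathbb{R}$. Suppose for every $t \in [0,1]$, $\phi(\cdot,t)$ is semismooth at $\bar\lambda \in \mathbb{R}^n$. Then $\Phi$ defined in (36) is also semismooth at $\bar\lambda$.
Proof. The directional differentiability of $\Phi$ follows from the first part of [DQQ01, Proposition 3.1]. Now we use Lemma 5 to prove the semismoothness of $\Phi$. To this purpose it is enough to establish the following relation:

$$\int_0^1 \big(\phi(\bar\lambda,t) + \phi^\circ((\lambda,t); (\lambda - \bar\lambda, 0)) - \phi(\lambda,t)\big)\,dt = o(\|\lambda - \bar\lambda\|). \tag{39}$$

This implies

$$\Phi(\bar\lambda) + \Phi^\circ(\lambda; \lambda - \bar\lambda) - \Phi(\lambda) \le o(\|\lambda - \bar\lambda\|)$$

because first principles give

$$\Phi^\circ(\lambda; \lambda - \bar\lambda) \le \int_0^1 \phi^\circ((\lambda,t); (\lambda - \bar\lambda, 0))\,dt.$$

If in (39) we replace $\phi^\circ((\lambda,t); (\lambda - \bar\lambda, 0))$ by $-\phi^\circ((\lambda,t); (-\lambda + \bar\lambda, 0))$ and follow an argument that is almost identical to the subsequent development, we obtain the counterpart condition

$$\Phi(\bar\lambda) - \Phi^\circ(\lambda; -\lambda + \bar\lambda) - \Phi(\lambda) \ge o(\|\lambda - \bar\lambda\|)$$

and the proof is completed by Lemma 5.
Now let $U$ be the closed unit ball in $\mathbb{R}^n$ and

$$e(\cdot, y) := \phi(y) + \phi^\circ(\cdot\,; \cdot - y) - \phi(\cdot), \quad y \in \mathbb{R}^n \times [0,1].$$

Let $\epsilon > 0$; we will find $\delta > 0$ such that if $\lambda \in \bar\lambda + \delta U$ then

$$\int_0^1 e((\lambda,t),(\bar\lambda,t))\,dt \le \epsilon\|\lambda - \bar\lambda\|.$$

Since $\epsilon$ can be made arbitrarily small, verifying the existence of $\delta$ is equivalent to verifying (39).
For any $\delta > 0$ let

$$\Delta(\delta) := \Big\{t \in [0,1] \;\Big|\; e((\lambda,t),(\bar\lambda,t)) \le \frac{\epsilon}{2}\|\lambda - \bar\lambda\|,\ \forall \lambda \in \bar\lambda + \delta U\Big\}.$$

For each $\lambda \in \mathbb{R}^n$ the mapping $t \mapsto e((\lambda,t),(\bar\lambda,t))$ is measurable, hence the set

$$\Big\{t \;\Big|\; e((\lambda,t),(\bar\lambda,t)) \le \frac{\epsilon}{2}\|\lambda - \bar\lambda\|\Big\}$$

is also measurable. Thus, $\Delta(\delta)$, the intersection of measurable sets, is itself measurable. Obviously, $\Delta(\delta) \subset \Delta(\delta')$ if $\delta > \delta'$. And for fixed $t \in [0,1]$, semismoothness gives, via Lemma 5, that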

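$$\frac{e((\lambda,t),(\bar\lambda,t))}{\|\lambda - \bar\lambda\|} \to 0 \quad\text{as } 0 \neq \lambda - \bar\lambda \to 0,$$

i.e., for all small enough $\delta > 0$, $t \in \Delta(\delta)$.
Let $\Omega(\delta) := [0,1] \setminus \Delta(\delta)$. The properties of $\Delta(\delta)$ yield (a) measurability of $\Omega(\delta)$, (b) $\Omega(\delta) \supseteq \Omega(\delta')$ if $\delta > \delta'$, and (c) for each $t$ and all small enough $\delta > 0$, $t \notin \Omega(\delta)$. In particular, $\cap_{\delta > 0}\Omega(\delta) = \emptyset$ and it follows that the measure of $\Omega(\delta)$, $\operatorname{meas}(\Omega(\delta))$, converges to 0 as $\delta \to 0+$.
Let $L$ be the Lipschitz constant of $\phi$ in a neighborhood of $(\bar\lambda, 0)$, so that for each $\lambda$ near $\bar\lambda$,

$$e((\lambda,t),(\bar\lambda,t)) \le |\phi(\lambda,t) - \phi(\bar\lambda,t)| + |\phi^\circ((\lambda,t); (\lambda - \bar\lambda, 0))| \le 2L\|(\lambda - \bar\lambda, 0)\| = 2L\|\lambda - \bar\lambda\|$$

using the 2-norm. To sum up,

$$\begin{aligned}
\int_0^1 e((\lambda,t),(\bar\lambda,t))\,dt &= \Big(\int_{\Omega(\delta)} + \int_{\Delta(\delta)}\Big) e((\lambda,t),(\bar\lambda,t))\,dt\\
&\le (2L\|\lambda - \bar\lambda\|)\operatorname{meas}(\Omega(\delta)) + (\|\lambda - \bar\lambda\|\epsilon/2)\operatorname{meas}(\Delta(\delta))\\
&\le \|\lambda - \bar\lambda\|\,(2L\operatorname{meas}(\Omega(\delta)) + \epsilon/2).
\end{aligned}$$

Choose $\delta > 0$ small enough such that $\operatorname{meas}(\Omega(\delta)) < \epsilon/(4L)$, and we are done. □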

Corollary 2. Under Assumption 1, the functions $F_j$ defined in (28) are each semismooth.
Proof. For each $t \in [a,b]$, the mapping $\phi_j : \mathbb{R}^n \to \mathbb{R}$ given by

$$\phi_j(\lambda,t) := \Big(x^0(t) + \sum_{\ell=1}^n \lambda_\ell a_\ell(t)\Big)_+ a_j(t)$$

is piecewise linear with respect to $\lambda$, and hence is semismooth. Then Proposition 3 implies that each $F_j$ defined in (28) is semismooth since $F_j(\lambda) = \int_a^b \phi_j(\lambda,t)\,dt$. □
Now we are ready to use Theorem 6 of Qi and Sun [QS93] to establish the superlinear convergence of the Newton method (29) for the equation (27).
Theorem 7. Suppose that Assumptions (1), (2) and (3) hold. Then Newton's method (29) for (27) is superlinearly convergent provided that the initial point $\lambda^0$ is close enough to the unique solution $\lambda^*$.
Proof. Three major elements for the use of Theorem 6 have been established: (i) $V(\lambda) \in \partial F(\lambda)$ for any $\lambda \in \mathbb{R}^n$ (see Corollary 1), (ii) $F$ is regular at $\lambda^*$ (see Lemma 4), and (iii) $F$ is semismooth since each $F_j$ is semismooth (see Corollary 2). The result follows from the direct application of Theorem 6 to the equation (27). □

Illustration to (26). The superlinear convergence of the method (26) is a direct consequence of Theorem 7 because all the assumptions for Theorem 7 are satisfied for the convex best interpolation problem (1). This recovers the main result in [DQQ01]. Refinement of some results in [DQQ01] by taking into account the special structure of the B-splines leads to the quadratic convergence analysis conducted in [DQQ03].

4.4 Application to Inequality Constraints

Now we consider the approximation problem given by inequality constraints:

$$K = C \cap \{x \mid Ax \le b\}.$$

Under the strong CHIP assumption, we have solution characterization (16) and (17), which we restate below for easy reference.

$$\lambda \ge 0, \quad w := AP_C(x^0 + A^*\lambda) - b \ge 0, \quad \lambda^T w = 0. \tag{40}$$

Again for computational consideration we assume that $C$ is the cone of positive functions so that $P_C(x) = x_+$. Below we design Newton's method for (40) and study when it is superlinearly convergent. To do this, we use the well-known Fischer-Burmeister NCP function, widely studied in nonlinear complementarity problems [Fis92, SQ99], to reformulate (40) as a system of semismooth equations.
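Recall that the Fischer-Burmeister function is given by

$$\phi_{FB}(a,b) := a + b - \sqrt{a^2 + b^2}.$$

Two important properties of $\phi_{FB}$ are

$$\phi_{FB}(a,b) = 0 \iff a \ge 0,\ b \ge 0,\ ab = 0$$

and the square $\phi_{FB}^2$ is continuously differentiable, though $\phi_{FB}$ is not differentiable. Define

$$\Phi_{FB}(\lambda,w) := \begin{pmatrix} \phi_{FB}(\lambda_1,w_1)\\ \vdots\\ \phi_{FB}(\lambda_n,w_n) \end{pmatrix}$$

and

$$W(\lambda,w) := \begin{pmatrix} F(\lambda) - b - w\\ \Phi_{FB}(\lambda,w) \end{pmatrix}, \qquad F(\lambda) := AP_C(x^0 + A^*\lambda).$$

Then it is easy to see that (40) is equivalent to the nonsmooth equation

$$W(\lambda,w) = 0.$$

Since $W$ is locally Lipschitz, direct calculation gives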



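$$\partial W(\lambda,w) \subseteq \left\{ \begin{pmatrix} V(\lambda) & -I\\ D(\lambda,w) & E(\lambda,w) \end{pmatrix} \;\middle|\; V(\lambda) \in \partial F(\lambda),\ D(\lambda,w), E(\lambda,w) \text{ satisfy (42) and (43)} \right\}. \tag{41}$$

$D(\lambda,w)$ and $E(\lambda,w)$ are diagonal matrices whose $\ell$th diagonal element is given by

$$D_\ell(\lambda,w) := 1 - \frac{\lambda_\ell}{\|(\lambda_\ell,w_\ell)\|}, \qquad E_\ell(\lambda,w) := 1 - \frac{w_\ell}{\|(\lambda_\ell,w_\ell)\|} \tag{42}$$

if $(\lambda_\ell,w_\ell) \neq 0$ and by

$$D_\ell(\lambda,w) = 1 - \xi_\ell, \quad E_\ell(\lambda,w) = 1 - \rho_\ell, \quad \forall (\xi_\ell,\rho_\ell) \in \mathbb{R}^2 \text{ such that } \|(\xi_\ell,\rho_\ell)\| \le 1 \tag{43}$$

if $(\lambda_\ell,w_\ell) = 0$.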
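The Fischer-Burmeister function and the diagonal entries described in (42) and (43) can be sketched in a few lines of Python. This is only an illustration: the function names `phi_fb` and `fb_diagonals` are invented here, and the default choice $\xi_\ell = \rho_\ell = 0$ at a degenerate pair is one admissible selection among many, not a recommendation from the text.

```python
import math


def phi_fb(a: float, b: float) -> float:
    """Fischer-Burmeister function a + b - sqrt(a^2 + b^2)."""
    return a + b - math.hypot(a, b)


def fb_diagonals(lam, w, xi=0.0, rho=0.0):
    """Diagonal entries (D_l, E_l) following (42) and (43).

    At a degenerate pair (lam_l, w_l) = (0, 0), any (xi, rho) in the closed
    unit disc is admissible; the defaults are an arbitrary choice.
    """
    if math.hypot(xi, rho) > 1.0:
        raise ValueError("(xi, rho) must satisfy ||(xi, rho)|| <= 1")
    D, E = [], []
    for l, v in zip(lam, w):
        r = math.hypot(l, v)
        if r > 0.0:                      # case (42)
            D.append(1.0 - l / r)
            E.append(1.0 - v / r)
        else:                            # case (43)
            D.append(1.0 - xi)
            E.append(1.0 - rho)
    return D, E


# Complementary pairs are zeros of phi_fb; the last two pairs are not.
print(phi_fb(0.0, 3.0), phi_fb(2.0, 0.0), phi_fb(1.0, 1.0), phi_fb(-1.0, 0.0))
D, E = fb_diagonals([3.0, 0.0], [4.0, 0.0])
print(D, E)
```

Both diagonal entries always lie in $[0, 2]$, and they cannot vanish simultaneously when $(\lambda_\ell, w_\ell) \neq 0$.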
Lemma 6. Suppose every element $V(\lambda)$ in $\partial F(\lambda)$ is positive definite. Then every element of $\partial W(\lambda,w)$ is nonsingular.
Proof. Let $M(\lambda,w)$ be an element of the right side set in (41) and let $(y,z) \in \mathbb{R}^{2n}$ be such that $M(y,z) = 0$. Then there exist $V(\lambda) \in \partial F(\lambda)$ and $D(\lambda,w)$ and $E(\lambda,w)$ satisfying (42) and (43) such that

$$V(\lambda)y - z = 0 \quad\text{and}\quad D(\lambda,w)y + E(\lambda,w)z = 0.$$

Since $V(\lambda)$ is nonsingular, it yields that

$$(DV^{-1} + E)z = 0.$$

It is well known from the NCP theory [DFK96, Theorem 21] that the matrix $(DV^{-1} + E)$ is nonsingular because $V^{-1}$ is positive definite according to the assumption. Hence, $z = 0$, implying $y = 0$. This establishes the nonsingularity of all elements in $\partial W(\lambda,w)$. □
Newton's method for (40) can be developed as follows:

$$(\lambda^+, w^+) - (\lambda,w) = -M^{-1}W(\lambda,w), \quad M \in \partial W(\lambda,w). \tag{44}$$

We have proved that each $F_j$ is semismooth (Corollary 2). Using the fact that the composite of semismooth functions is semismooth and the Fischer-Burmeister function is strongly semismooth, we know that $W$ is a semismooth function. Suppose $(\lambda^*,w^*)$ is a solution of (40).
Assumption 4. Each $b_\ell > 0$ for $\ell = 1,\dots,n$.
Lemma 7. Suppose Assumptions (1), (2) and (4) hold. Then every element in $\partial W(\lambda^*,w^*)$ is nonsingular.
Proof. We note that at the solution it holds

$$AP_C(x^0 + A^*\lambda^*) = b + w^*.$$

Since $w_\ell^* \ge 0$, we see that $b_\ell + w_\ell^* > 0$. Following the proof of Lemma 4 we can prove that each element $V$ in $\partial F(\lambda^*)$ is positive definite, and hence each element of $\partial W(\lambda^*,w^*)$ is nonsingular by Lemma 6. □

All preparation is ready for the use of Theorem 6 to state the superlinear convergence of the method (44). The proof is similar to that of Theorem 7.

Theorem 8. Suppose Assumptions (1), (2) and (4) hold. Then the Newton method (44) is superlinearly convergent provided that the initial point $(\lambda^0,w^0)$ is sufficiently close to $(\lambda^*,w^*)$.

We remark that quadratic convergence is also possible if we could establish the strong semismoothness of $W$ at $(\lambda^*,w^*)$. A sufficient condition for this property is that each $F_j$ is strongly semismooth since the Fischer-Burmeister function is automatically strongly semismooth.

4.5 Globalization

In the previous subsections, Newton's method is developed for nonsmooth equations arising from constrained interpolation and approximation problems. It is locally superlinearly convergent under reasonable conditions. It is also worth mentioning a globalization scheme that makes the Newton method globally convergent.
The first issue to be resolved is that we need an objective function for the respective problems. Natural choices for objective functions are briefly described below with an outline of an algorithmic scheme, but without global convergence analysis. It is easy to see (following the discussion in [MU88, DQQ01]) that the function $f$ given by

$$f(\lambda) := \frac{1}{2}\int_a^b \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+^2 dt - \sum_{\ell=1}^n \lambda_\ell b_\ell$$

serves this purpose because

$$\nabla f(\lambda) = F(\lambda) - b.$$

Since $f$ is convex, $\|\nabla f(\lambda)\| = \|F(\lambda) - b\|$ can be used to monitor the convergence of global methods. We present below a global method, which globalizes the method (29) and has been shown to be extremely efficient for the convex best interpolation problem (1).

Algorithm 1. (Damped Newton method)

(S.0) (Initialization) Choose $\lambda^0 \in \mathbb{R}^n$, $\rho \in (0,1)$, $\sigma \in (0,1/2)$, and tolerance tol > 0. $k := 0$.
(S.1) (Termination criterion) If $\epsilon_k = \|F(\lambda^k) - b\| \le$ tol then stop. Otherwise, go to (S.2).
(S.2) (Direction generation) Let $s^k$ be a solution of the following linear system

$$(V(\lambda^k) + \epsilon_k I)s = -\nabla f(\lambda^k). \tag{45}$$

(S.3) (Line search) Choose $m_k$ as the smallest nonnegative integer $m$ satisfying

$$f(\lambda^k + \rho^m s^k) - f(\lambda^k) \le \sigma\rho^m \nabla f(\lambda^k)^T s^k. \tag{46}$$

(S.4) (Update) Set $\lambda^{k+1} = \lambda^k + \rho^{m_k} s^k$, $k := k + 1$, return to step (S.1).

Since $V(\lambda)$ is positive semidefinite, the matrix $(V(\lambda) + \epsilon I)$ is positive definite for $\epsilon > 0$. Hence the linear equation (45) is well defined and the direction $s^k$ is a descent direction for the objective function $f$. The global convergence analysis for Algorithm 1 is standard and can be found in [DQQ03].
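The steps of Algorithm 1 can be sketched in plain Python on a small discretized problem. Everything concrete below is an assumption made for illustration: the interval $[0,1]$, $x^0 = 0$, the two basis functions $a_1(t) = 1$ and $a_2(t) = t - 0.5$, the midpoint quadrature, the parameter values, and the helper names `solve`, `u`, `F`, `V`, `f` and `damped_newton`. It is not the implementation used in [DQQ03].

```python
import math

N = 400                                   # quadrature cells on [0, 1] (assumed)
H = 1.0 / N
GRID = [(k + 0.5) * H for k in range(N)]  # midpoint rule nodes
BASIS = [lambda t: 1.0, lambda t: t - 0.5]  # assumed a_1, a_2; x0 = 0


def solve(M, r):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(r)
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, n):
            m = A[i][c] / A[c][c]
            for j in range(c, n + 1):
                A[i][j] -= m * A[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def u(lam, t):
    return sum(l * a(t) for l, a in zip(lam, BASIS))


def F(lam):
    return [H * sum(max(u(lam, t), 0.0) * a(t) for t in GRID) for a in BASIS]


def V(lam):
    return [[H * sum(ai(t) * aj(t) for t in GRID if u(lam, t) > 0.0)
             for aj in BASIS] for ai in BASIS]


def f(lam, b):
    quad = 0.5 * H * sum(max(u(lam, t), 0.0) ** 2 for t in GRID)
    return quad - sum(l * bl for l, bl in zip(lam, b))


def damped_newton(lam, b, rho=0.5, sigma=1e-4, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        g = [Fj - bj for Fj, bj in zip(F(lam), b)]       # gradient of f
        eps = math.sqrt(sum(x * x for x in g))           # (S.1)
        if eps <= tol:
            break
        M = V(lam)
        for i in range(len(lam)):
            M[i][i] += eps                               # V + eps_k * I
        s = solve(M, [-x for x in g])                    # (S.2)
        slope = sum(x * y for x, y in zip(g, s))
        f0, step = f(lam, b), 1.0
        for _ in range(50):                              # (S.3) Armijo backtracking
            trial = [l + step * d for l, d in zip(lam, s)]
            if f(trial, b) - f0 <= sigma * step * slope:
                break
            step *= rho
        lam = trial                                      # (S.4)
    return lam


b = F([0.2, 1.0])                 # right-hand side built from a known multiplier
lam = damped_newton([1.0, 0.0], b)
print(lam)
```

The right-hand side is generated from a known multiplier so that the computed iterate can be compared against it; the regularization $\epsilon_k I$ keeps the linear system solvable even when $V(\lambda^k)$ is singular.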
A globalized version of the method (44) can be developed as well, but with some notable differences. In this case, the objective function $f(\lambda,w)$ is given by

$$f(\lambda,w) := \frac{1}{2}\int_a^b \Big(x^0 + \sum_{\ell=1}^n \lambda_\ell a_\ell\Big)_+^2 dt - \sum_{\ell=1}^n \lambda_\ell (b + w)_\ell + \|\Phi_{FB}(\lambda,w)\|^2.$$

This function is also continuously differentiable, but not convex because $\|\Phi_{FB}(\lambda,w)\|^2$ is not convex although continuously differentiable. We also note that the gradient of $f(\lambda,w)$ is not $W(\lambda,w)$ any more. A global method based on $f$ can be developed by following the scheme in [DFK96].

5 Open Problems
It is obvious from Section 2 and Section 4 that there is a big gap between theoretical results and Newton-type algorithms for constrained interpolation problems. For example, the solution characterizations appearing in Theorems 1, 3, and 4 are for general convex sets (i.e., $C$ is a closed convex set); however, the Newton method has so far been well developed only for the particular, yet most important, case in which $C$ is the cone of positive functions. This is due to the fact that the projection is an essential ingredient when solving the interpolation problem, and that the projection onto the cone of positive functions is easy to calculate.
There are many problems that are associated with the projections onto other convex sets including cones. We only discuss two of them which we think are most interesting and likely to be (at least partly) solved by the techniques developed in this paper. The first one is the case that $C$ is a closed polyhedral set in $X$, i.e.,

$$C := \{x \in X \mid \langle c_i, x\rangle \le r_i,\ i = 1,\dots,m\}$$

where $c_i \in X$ and $r_i \in \mathbb{R}$. We note that cones are not necessarily polyhedral. It follows from [Deu01, Examples 10.7 and 10.9] that the sets {C.nHj} and {C^nHj} both have strong CHIP. Hence the solution characterization theorems are applicable to the polyhedral case. Questions related to $P_C$ include differentiability, directional differentiability, generalized Jacobian and

semismoothness of the mapping $AP_C$, and most importantly how to design Newton's method for this case.
The second is the problem of interpolating a finite set of points with a curve constrained to lie between two piecewise linear splines (with knots at the abscissae of the given points). The objective is to minimize the 2-norm of the second derivative of the interpolant. Let $(t_i, y_i)$ be given data points in $\mathbb{R}^2$ with

$$t_0 < t_1 < \dots < t_n, \qquad \phi(t_i) < y_i < \psi(t_i) \quad\text{for } i = 1,\dots,n.$$

Here $\phi$ and $\psi$ are given piecewise linear functions (or more generally lower and upper semicontinuous functions, respectively) such that

$$\inf_t\,(\psi(t) - \phi(t)) > 0.$$

The constraint is

$$C := \{x \in W^{2,2}[t_0,t_n] \mid \phi(t) \le x(t) \le \psi(t)\}$$

and

$$H := \{x \in W^{2,2}[t_0,t_n] \mid x(t_i) = y_i,\ i = 1,\dots,n\}.$$

This problem can be reformulated as a constrained interpolation problem from a convex set in a certain Hilbert space [Don93, AE95]. Questions similar to those for the first problem remain unsolved for this interpolation problem from a strip.

Acknowledgement
The author would like to thank Daniel Ralph for his constructive comments on the topic and especially for his kind offer to include his material [Ral02] on semismoothness of integral functions in this survey (i.e., Sec. 4.3). It is also interesting to see how his approach can be extended to cover the strongly semismooth case.
The work was done while the author was with the School of Mathematics, The University of New South Wales, Australia, and was supported by the Australian Research Council.

References
[AE87] Andersson, L.-E., Elfving, T.: An algorithm for constrained interpolation.
SIAM J. Sci. Statist. Comput., 8, 1012-1025 (1987)

[AE95] Andersson, L.-E., Elfving, T.: Best constrained approximation in Hilbert space and interpolation by cubic splines subject to obstacles. SIAM J. Sci. Comput., 16, 1209-1232 (1995)
[BBT00] Bauschke, H.H., Borwein, J.M., Tseng, P.: Bounded linear regularity, strong CHIP, and CHIP are distinct properties. J. Convex Anal., 7, 395-412 (2000)
[BBL99] Bauschke, H.H., Borwein, J.M., Li, W.: Strong conical hull intersection property, bounded linear regularity, Jameson's property (G), and error bounds in convex optimization. Math. Program., 86, 135-160 (1999)
[BL92] Borwein, J., Lewis, A.S.: Partially finite convex programming I: Quasi
relative interiors and duality theory. Math. Program. 57, 15-48 (1992)
[Cla83] Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley & Sons,
New York (1983)
[CDW90] Chui, C.K., Deutsch, F., Ward, J.D.: Constrained best approximation in
Hilbert space. Constr. Approx., 6, 35-64 (1990)
[CDW92] Chui, C.K., Deutsch, F., Ward, J.D.: Constrained best approximation in Hilbert space II. J. Approx. Theory, 71, 213-238 (1992)
[deB78] de Boor, C.: A Practical Guide to Splines. Springer-Verlag, New York (1978)
[DFK96] De Luca, T., Facchinei, F., Kanzow, C.: A semismooth equation approach to the solution of nonlinear complementarity problems. Math. Program., 75, 407-439 (1996)
[Deu01] Deutsch, F.: Best approximation in inner product spaces. CMS Books in Mathematics 7. Springer-Verlag, New York (2001)
[DLW97] Deutsch, F., Li, W., Ward, J.D.: A dual approach to constrained inter-
polation from a convex subset of Hilbert space. J. Approx. Theory, 90,
385-414 (1997)
[DLW99] Deutsch, F., Li, W., Ward, J.D.: Best approximation from the intersection of a closed convex set and a polyhedron in Hilbert space, weak Slater conditions, and the strong conical hull intersection property. SIAM J. Optim., 10, 252-268 (1999)
[DUWX96] Deutsch, F., Ubhaya, V.A., Ward, J.D., Xu, Y.: Constrained best ap-
proximation in Hilbert space. III. Applications to n-convex functions.
Constr. Approx., 12, 361-384 (1996)
[Don93] Dontchev, A.L.: Best interpolation in a strip. J. Approx. Theory, 73, 334-342 (1993)
[DK89] Dontchev, A.L., Kalchev, B.D.: Duality and well-posedness in convex interpolation. Numer. Funct. Anal. and Optim., 10, 673-689 (1989)
[DK96] Dontchev, A.L., Kolmanovsky, I.: Best interpolation in a strip. II. Reduction to unconstrained convex optimization. Comput. Optim. Appl., 5, 233-251 (1996)
[DQQ01] Dontchev, A.L., Qi, H.-D., Qi, L.: Convergence of Newton's method for convex best interpolation. Numer. Math., 87, 435-456 (2001)
[DQQ03] Dontchev, A.L., Qi, H.-D., Qi, L.: Quadratic convergence of Newton's
method for convex interpolation and smoothing. Constr. Approx., 19,
123-143 (2003)
[DQQY02] Dontchev, A.L., Qi, H.-D., Qi, L., Yin, H.: A Newton method for shape-
preserving spline interpolation. SIAM J. Optim., 13, 588-602 (2002)
[FP03] Facchinei, F., Pang, J.-S.: Finite-dimensional variational inequalities and
complementarity problems. Vol. I & II. Springer-Verlag, New York (2003)

[Fav40] Favard, J.: Sur l'interpolation. J. Math. Pures Appl., 19, 281-306 (1940)
[Fis92] Fischer, A.: A special Newton-type optimization method. Optimization, 24, 269-284 (1992)
[GT90] Gowda, M.S., Teboulle, M.: A comparison of constraint qualifications in infinite-dimensional convex programming. SIAM J. Control Optim., 28, 925-935 (1990)
[Hor80] Hornung, U.: Interpolation by smooth functions under restriction on the
derivatives. J. Approx. Theory, 28, 227-237 (1980)
[IP84] Iliev, G., Pollul, W.: Convex interpolation by functions with minimal $L_p$ norm ($1 < p < \infty$) of the $k$th derivative. Mathematics and mathematical education (Sunny Beach, 1984), 31-42, Bulg. Akad. Nauk, Sofia (1984)
[IMS86] Irvine, L.D., Marin, S.P., Smith, P.W.: Constrained interpolation and smoothing. Constr. Approx., 2, 129-151 (1986)
[Jey92] Jeyakumar, V.: Infinite-dimensional convex programming with applications to constrained approximation. J. Optim. Theory Appl., 75, 569-586 (1992)
[JL98] Jeyakumar, V., Luc, D.T.: Approximate Jacobian matrices for nonsmooth continuous maps and $C^1$-optimization. SIAM J. Control Optim., 36, 1815-1832 (1998)
[JW92] V. Jeyakumar, H. Wolkowicz: Generalizations of Slater's constraint qual-
ification for infinite convex programs. Math. Program., 57, 85-101 (1992)
[JQ95] Jiang, H., Qi, L.: Local uniqueness and Newton-type methods for nonsmooth variational inequalities. J. Math. Analysis and Appl., 196, 314-331 (1995)
[KK02] Klatte, D., Kummer, B.: Nonsmooth equations in optimization. Regularity, calculus, methods and applications. Nonconvex Optimization and its Applications, 60. Kluwer Academic Publishers, Dordrecht (2002)
[Kum88] B. Kummer: Newton's method for nondifferentiable functions. Advances
in mathematical optimization, 114-125, Math. Res., 45, Akademie-Verlag,
Berlin (1988)
[LJ02] Li, C., Jin, X.Q.: Nonlinearly constrained best approximation in Hilbert spaces: the strong CHIP and the basic constraint qualification. SIAM J. Optim., 13, 228-239 (2002)
[LN02] Li, C., Ng, K.F.: On best approximation by nonconvex sets and perturbation of nonconvex inequality systems in Hilbert spaces. SIAM J. Optim., 13, 726-744 (2002)
[LN03] Li, C., Ng, K.F.: Constraint qualification, the strong CHIP, and best approximation with convex constraints in Banach spaces. SIAM J. Optim., 14, 584-607 (2003)
[MSSW85] Micchelli, C.A., Smith, P.W., Swetits, J., Ward, J.D.: Constrained $L_p$ approximation. Constr. Approx., 1, 93-102 (1985)
[MU88] Micchelli, C.A., Utreras, F.I.: Smoothing and interpolation in a convex subset of a Hilbert space. SIAM J. Sci. Statist. Comput., 9, 728-747 (1988)
[Mif77] Mifflin, R.: Semismoothness and semiconvex functions in constrained optimization. SIAM J. Control Optim., 15, 959-972 (1977)
[QS93] Qi, L., Sun, J.: A nonsmooth version of Newton's method. Math. Program., 58, 353-367 (1993)
[Ral02] Ralph, D.: Personal communication. May (2002)

[SQ99] Sun, D., Qi, L.: On NCP-functions. Comput. Optim. Appl., 13, 201-220
(1999)
[Xu99] Xu, H.: Set-valued approximations and Newton's methods. Math. Pro-
gram., 84, 401-420 (1999)
Optimization Methods in Direct and Inverse
Scattering

Alexander G. Ramm¹ and Semion Gutman²

¹ Department of Mathematics, Kansas State University
Manhattan, Kansas 66506-2602, USA
ramm@math.ksu.edu
² Department of Mathematics, University of Oklahoma
Norman OK 73019, USA
sgutman@ou.edu
Summary. In many Direct and Inverse Scattering problems one has to use a
parameter-fitting procedure, because analytical inversion procedures are often not
available. In this paper a variety of such methods is presented with a discussion of
theoretical and computational issues.
The problem of finding small subsurface inclusions from surface scattering data
is stated and investigated. This Inverse Scattering problem is reduced to an opti-
mization problem, and solved by the Hybrid Stochastic-Deterministic minimization
algorithm. A similar approach is used to determine layers in a particle from the
scattering data.
The Inverse potential scattering problem is described and its solution based on
a parameter fitting procedure is presented for the case of spherically symmetric
potentials and fixed-energy phase shifts as the scattering data. The central feature
of the minimization algorithm here is the Stability Index Method. This general
approach estimates the size of the minimizing sets, and gives a practically useful
stopping criterion for global minimization algorithms.
The 3D inverse scattering problem with fixed-energy data is discussed. Its solution by Ramm's method is described. The cases of exact and noisy discrete data are considered. Error estimates for the inversion algorithm are given in both cases of exact and noisy data. A comparison of Ramm's inversion method with the inversion based on the Dirichlet-to-Neumann map is given and it is shown that there are many more numerical difficulties in the latter method than in Ramm's method.
An Obstacle Direct Scattering problem is treated by a novel Modified Rayleigh
Conjecture (MRC) method. MRC's performance is compared favorably to the well
known Boundary Integral Equation Method, based on the properties of the single
and double-layer potentials. A special minimization procedure allows one to inex-
pensively compute scattered fields for 2D and 3D obstacles having smooth as well
as nonsmooth surfaces.
A new Support Function Method (SFM) is used for Inverse Obstacle Scattering
problems. The SFM can work with limited data. It can also be used for Inverse

scattering problems with unknown scattering conditions on its boundary (e.g. soft,
or hard scattering). Another method for Inverse scattering problems, the Linear
Sampling Method (LSM), is analyzed. Theoretical and computational difficulties in
using this method are pointed out.

1 Introduction
Suppose that an acoustic or electromagnetic wave encounters an inhomo-
geneity and, as a consequence, gets scattered. The problem of finding the
scattered wave assuming the knowledge of the inhomogeneity (penetrable or
not) is the Direct Scattering problem. An impenetrable inhomogeneity is also
called an obstacle. On the other hand, if the scattered wave is known at
some points outside an inhomogeneity, then we are faced with the Inverse
Scattering problem, the goal of which is to identify this inhomogeneity, see
[CCM00, CK92, Ram86, Ram92b, Ram94a, Ram05a, Ram05b].
Among a variety of methods available to handle such problems few pro-
vide a mathematically justified algorithm. In many cases one has to use a
parameter-fitting procedure, especially for inverse scattering problems, be-
cause the analytical inversion procedures are often not available. An impor-
tant part of such a procedure is an efficient global optimization method, see
[Flo00, FP01, HPT95, HT93, PRT00, Rub00].
The general scheme for parameter-fitting procedures is simple: one has a relation $B(q) = A$, where $B$ is some operator, $q$ is an unknown function, and $A$ is the data. In inverse scattering problems $q$ is an unknown potential, and $A$ is the known scattering amplitude. If $q$ is sought in a finite-parametric family of functions, then $q = q(x,p)$, where $p = (p_1,\dots,p_n)$ is a parameter. The parameter is found by solving a global minimization problem: $\Phi[B(q(x,p)) - A] = \min$, where $\Phi$ is some positive functional, and $q \in Q$, where $Q$ is an admissible set of $q$. In practice the above problem often has many local minimizers, and the global minimizer is not necessarily unique. In [Ram92b, Ram94b] some functionals $\Phi$ are constructed which have a unique global minimizer, namely, the solution to the inverse scattering problem, and the global minimum is zero.
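The general scheme can be illustrated with a deliberately small Python sketch. It is not a scattering computation: the forward map `forward` (samples of $p_1 e^{-p_2 x}$), the sample points, the sum-of-squares functional `misfit`, and the multistart compass search are all assumptions chosen to keep the example self-contained, and they are not the minimization algorithms developed by the authors.

```python
import math
import random

X = [0.2 * i for i in range(11)]     # sample points (assumed)
P_TRUE = [2.0, 1.5]                  # parameters used to synthesize the data


def forward(p):
    """Toy stand-in for B(q(., p)): samples of p[0] * exp(-p[1] * x)."""
    return [p[0] * math.exp(-p[1] * x) for x in X]


DATA = forward(P_TRUE)               # noise-free data A


def misfit(p):
    """Phi[B(q(., p)) - A], here a sum of squared residuals."""
    return sum((m - a) ** 2 for m, a in zip(forward(p), DATA))


def compass_search(phi, p, step=0.5, tol=1e-8, max_iter=50000):
    """Derivative-free local search along the coordinate directions."""
    best = phi(p)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(p)):
            for d in (step, -step):
                q = list(p)
                q[i] += d
                val = phi(q)
                if val < best:
                    p, best, improved = q, val, True
        if not improved:
            step *= 0.5              # refine the mesh when stuck
    return p, best


def multistart(phi, box, n_starts=8, seed=0):
    """Run the local search from random starts and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        p0 = [rng.uniform(lo, hi) for lo, hi in box]
        results.append(compass_search(phi, p0))
    return min(results, key=lambda r: r[1])


p_opt, phi_opt = multistart(misfit, box=[(0.0, 5.0), (0.0, 5.0)])
print(p_opt, phi_opt)
```

With noise-free data the global minimum of the misfit is zero, which mirrors the situation described above; with perturbed data the minimum value is no longer zero and the minimizer may move.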
Moreover, as a rule, the data $A$ is known with some error. Thus $A_\delta$ is known, such that $\|A - A_\delta\| \le \delta$. There are no stability estimates which would show how the global minimizer $q(x, p_{opt})$ is perturbed when the data $A$ are replaced by the perturbed data $A_\delta$. In fact, one can easily construct examples showing that there is no stability of the global minimizer with respect to small errors in the data, in general.
For these reasons there is no guarantee that the parameter-fitting procedures would yield a solution to the inverse problem with a guaranteed accuracy. However, the overwhelming majority of practitioners use parameter-fitting procedures. In dozens of published papers the results obtained by various parameter-fitting procedures look quite good. The explanation, in most cases, is simple: the authors know the answer beforehand, and it is usually

not difficult to parametrize the unknown function so that the exact solution is
well approximated by a function from a finite-parametric family, and since the
authors know a priori the exact answer, they may choose numerically the val-
ues of the parameters which yield a good approximation of the exact solution.
When can one rely on the results obtained by parameter-fitting procedures?
Unfortunately, there is no rigorous and complete answer to this question, but
some recommendations are given in Section 4-
In this paper the authors present their recent results which are based on
specially designed parameter-fitting procedures. Before describing them, let us
mention that usually in a numerical solution of an inverse scattering problem
one uses a regularization procedure, e.g. a variational regularization, spectral
cut-off, iterative regularization, DSM (the dynamical systems method),
quasi-solutions, etc.; see e.g. [Ram04a, Ram05a]. This general theoretical
framework is well established in the theory of ill-posed problems, of which the
inverse scattering problems represent an important class. This framework is
needed to achieve a stable method for assigning a solution to an ill-posed
problem, usually set in an infinite dimensional space. The goal of this paper is
to present optimization algorithms already in a finite dimensional setting of a
Direct or Inverse scattering problem.
In Section 2 the problem of finding small subsurface inclusions from surface
scattering data is investigated ([Ram97, Ram00a, Ram05a, Ram05b]).
This (geophysical) Inverse Scattering problem is reduced to an optimization
problem. This problem is solved by the Hybrid Stochastic-Deterministic
minimization algorithm ([GR00]). Its random (stochastic) part is based on ideas
from genetic minimization algorithms, and its local minimization part uses a
deterministic minimization without derivatives.
In Section 3 a similar approach is used to determine layers in a particle
subjected to acoustic or electromagnetic waves. The global minimization al-
gorithm uses Rinnooy Kan and Timmer's Multilevel Single-Linkage Method
for its stochastic part.
In Section 4 we discuss an Inverse potential scattering problem appear-
ing in a quantum mechanical description of particle scattering experiments.
The central feature of the minimization algorithm here is the Stability Index
Method ([GRS02]). This general approach estimates the size of the minimizing
sets, and gives a practically useful stopping criterion for global minimization
algorithms.
In Section 5 Ramm's method for solving the 3D inverse scattering problem
with fixed-energy data is presented following [Ram04d], see also [Ram02a,
Ram05a]. The cases of exact and noisy discrete data are considered. Error
estimates for the inversion algorithm are given in both cases. A comparison of
Ramm's inversion method with the inversion based on the Dirichlet-to-Neumann
map is given, and it is shown that there are many more numerical difficulties
in the latter method than in Ramm's method.
In Section 6 an Obstacle Direct Scattering problem is treated by a novel
Modified Rayleigh Conjecture (MRC) method. It was introduced in [Ram02b]
and applied in [GR02b, GR05, Ram04c, Ram05b]. MRC's performance compares
favorably with the well-known Boundary Integral Equation Method, which is
based on the properties of the single and double-layer potentials. A special
minimization procedure allows us to inexpensively compute scattered fields for
several 2D and 3D obstacles having smooth as well as nonsmooth surfaces.
In Section 7 a new Support Function Method (SFM) is used to determine
the location of an obstacle (cf. [GR03, Ram70, Ram86]). Unlike other methods,
the SFM can work with limited data. It can also be used for Inverse scattering
problems with unknown scattering conditions on the obstacle boundary (e.g. soft
or hard obstacles).
Finally, in Section 8, we present an analysis of another popular method for
Inverse scattering problems, the Linear Sampling Method (LSM), and show
that both theoretically and computationally the method fails in many aspects.
This section is based on the paper [RG05].

2 Identification of small subsurface inclusions


2.1 Problem description

In many applications it is desirable to find small inhomogeneities from surface
scattering data. For example, such a problem arises in ultrasound mammography,
where small inhomogeneities are cancer cells. Other examples include the
problem of finding small holes and cracks in metals and other materials, or
mine detection. The scattering theory for small scatterers originated in the
classical works of Lord Rayleigh (1871). Rayleigh understood that the basic
contribution to the scattered field in the far-field zone comes from the dipole
radiation, but did not give methods for calculating this radiation. Analytical
formulas for calculating the polarizability tensors for homogeneous bodies of
arbitrary shapes were derived in [Ram86] (see also references therein). These
formulas allow one to calculate the S-matrix for scattering of acoustic and
electromagnetic waves by small bodies of arbitrary shapes with arbitrary
accuracy. Inverse scattering problems for small bodies are considered in
[Ram82] and [Ram94a]. In [Ram97] and [Ram00a] the problem of identification of
small subsurface inhomogeneities from surface data was posed and its possible
applications were discussed.
In the context of a geophysical problem, let $y \in \mathbb{R}^3$ be a point source of
monochromatic acoustic waves on the surface of the earth. Let $u(x, y, k)$ be
the acoustic pressure at a point $x \in \mathbb{R}^3$, and let $k > 0$ be the wavenumber.
The governing equation for the acoustic wave propagation is:

$$[\nabla^2 + k^2 + k^2 v(x)]\, u = -\delta(x - y) \quad \text{in } \mathbb{R}^3, \tag{1}$$

where $x = (x_1, x_2, x_3)$, $v(x)$ is the inhomogeneity in the velocity profile, and
$u(x, y, k)$ satisfies the radiation condition at infinity, i.e. it decays sufficiently
fast as $|x| \to \infty$.
Let us assume that $v(x)$ is a bounded function vanishing outside of the
domain $D = \bigcup_{m=1}^{M} D_m$, which is the union of $M$ small nonintersecting domains
$D_m$, all of them located in the lower half-space $\mathbb{R}^3_- = \{x : x_3 < 0\}$.
Smallness is understood in the sense $k\rho \ll 1$, where
$\rho := \frac{1}{2}\max_{1 \le m \le M}\{\operatorname{diam} D_m\}$, and $\operatorname{diam} D$ is the diameter of the
domain $D$. Practically, $k\rho \ll 1$ means that $k\rho < 0.1$. In some cases
$k\rho < 0.2$ is sufficient for obtaining acceptable numerical results. The
background velocity in (1) equals 1, but we can consider the case of a fairly
general background velocity [Ram94a].
Denote by $z_m$ and $v_m$ the position of the center of gravity of $D_m$ and the
total intensity of the $m$-th inhomogeneity, $v_m := \int_{D_m} v(x)\,dx$. Assume that
$v_m \neq 0$. Let $P$ be the surface of the earth:

$$P := \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}. \tag{2}$$

The inverse problem to be solved is:

IP: Given $u(x, y, k)$ for all source-detector pairs $(x, y)$ on $P$ at a fixed
$k > 0$, find the number $M$ of small inhomogeneities, the positions $z_m$ of the
inhomogeneities, and their intensities $v_m$.
Practically, one assumes that a fixed wavenumber $k > 0$ and $J$ source-detector
pairs $(x_j, y_j)$, $j = 1, 2, \ldots, J$, on $P$ are known together with the
acoustic pressure measurements $u(x_j, y_j, k)$. Let

$$g(x, y, k) := \frac{\exp(ik|x - y|)}{4\pi |x - y|}, \tag{3}$$

$$G_j(z) := G(x_j, y_j, z) := g(x_j, z, k)\, g(y_j, z, k), \quad x_j, y_j \in P,\ z \in \mathbb{R}^3_-, \tag{4}$$

$$f_j := \frac{u(x_j, y_j, k) - g(x_j, y_j, k)}{k^2}, \tag{5}$$

and

$$\Phi(z_1, \ldots, z_M, v_1, \ldots, v_M) := \sum_{j=1}^{J} \Big| f_j - \sum_{m=1}^{M} G_j(z_m)\, v_m \Big|^2. \tag{6}$$

The proposed method for solving the (IP) consists of finding the global
minimizer of function (6). This minimizer $(z_1, \ldots, z_M, v_1, \ldots, v_M)$ gives the
estimates of the positions $z_m$ of the small inhomogeneities and their intensities
$v_m$. See [Ram97] and [Ram00a] for a justification of this approach.
The function $\Phi$ depends on $M$ unknown points $z_m \in \mathbb{R}^3_-$, and $M$ unknown
parameters $v_m$, $1 \le m \le M$. The number $M$ of the small inhomogeneities is
also unknown, and its determination is a part of the minimization problem.
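
To make the structure of the objective (6) concrete, the following is a minimal Python sketch of formulas (3)-(6). The function names (`green`, `objective`, `synthetic_data`) are ours and purely illustrative, and the synthetic data are generated by the same single-scattering model that (6) fits, which is an assumption made only so that the example is self-checking; it is not the authors' code.

```python
import cmath
import math

def green(x, y, k):
    """Free-space Green's function g(x, y, k) = exp(ik|x-y|) / (4*pi*|x-y|), eq. (3)."""
    r = math.dist(x, y)
    return cmath.exp(1j * k * r) / (4.0 * math.pi * r)

def objective(z, v, pairs, f, k):
    """Phi(z_1..z_M, v_1..v_M) of eq. (6).

    z     : list of M candidate positions (3-tuples)
    v     : list of M candidate intensities
    pairs : list of J source-detector pairs (x_j, y_j) on the surface P
    f     : list of J data values f_j from eq. (5)
    """
    total = 0.0
    for (xj, yj), fj in zip(pairs, f):
        # G_j(z_m) = g(x_j, z_m, k) * g(y_j, z_m, k), eq. (4)
        model = sum(green(xj, zm, k) * green(yj, zm, k) * vm
                    for zm, vm in zip(z, v))
        total += abs(fj - model) ** 2
    return total

def synthetic_data(z_true, v_true, pairs, k):
    """Noise-free f_j produced by the same model (an assumption for illustration)."""
    return [sum(green(xj, zm, k) * green(yj, zm, k) * vm
                for zm, vm in zip(z_true, v_true))
            for xj, yj in pairs]
```

With data generated this way, `objective` vanishes at the true configuration and is positive when an inclusion is displaced, which is the property the global minimization relies on.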

2.2 Hybrid Stochastic-Deterministic Method (HSD)

Let the inhomogeneities be located within the box

$$B = \{(x_1, x_2, x_3) : -a \le x_1 \le a,\ -b \le x_2 \le b,\ 0 \le x_3 \le c\}, \tag{7}$$

and their intensities satisfy

$$0 \le v_m \le v_{max}. \tag{8}$$

The box is located above the earth surface for computational convenience.
Then, given the location of the points $z_1, z_2, \ldots, z_M$, the minimum of $\Phi$
with respect to the intensities $v_1, v_2, \ldots, v_M$ can be found by minimizing the
resulting quadratic function in (6) over the region satisfying (8). This can be
done using normal equations for (6) and projecting the resulting point back
onto the region defined by (8). Denote the result of this minimization by
$\tilde{\Phi}$, that is

$$\tilde{\Phi}(z_1, z_2, \ldots, z_M) = \min\{\Phi(z_1, \ldots, z_M, v_1, \ldots, v_M) : 0 \le v_m \le v_{max},\ 1 \le m \le M\}. \tag{9}$$

Now the original minimization problem for $\Phi(z_1, z_2, \ldots, z_M, v_1, v_2, \ldots, v_M)$
is reduced to the $3M$-dimensional constrained minimization for $\tilde{\Phi}(z_1, z_2, \ldots, z_M)$:

$$\tilde{\Phi}(z_1, z_2, \ldots, z_M) = \min, \quad z_m \in B,\ 1 \le m \le M. \tag{10}$$
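
The inner minimization over the intensities in (9) can be sketched as follows. This is a minimal illustration under our own assumptions: the matrix `G` holds the values $G_j(z_m)$ for fixed positions, the unknown intensities are real, and the function name `best_fit_intensities` is hypothetical. The coordinate-wise clamping mirrors the projection described in the text; it is not claimed to be the exact constrained minimizer in every case.

```python
def best_fit_intensities(G, f, v_max):
    """Least-squares intensities for fixed positions, clamped to [0, v_max].

    G : J x M matrix (list of rows) with complex entries G_j(z_m)
    f : list of J complex data values f_j
    """
    J, M = len(G), len(G[0])
    # Normal equations for real unknowns: Re(G^H G) v = Re(G^H f).
    A = [[sum((G[j][a].conjugate() * G[j][b]).real for j in range(J))
          for b in range(M)] for a in range(M)]
    rhs = [sum((G[j][a].conjugate() * f[j]).real for j in range(J))
           for a in range(M)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [r] for row, r in zip(A, rhs)]
    for col in range(M):
        piv = max(range(col, M), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, M):
            factor = aug[i][col] / aug[col][col]
            for c in range(col, M + 1):
                aug[i][c] -= factor * aug[col][c]
    # Back substitution.
    v = [0.0] * M
    for i in reversed(range(M)):
        s = aug[i][M] - sum(aug[i][c] * v[c] for c in range(i + 1, M))
        v[i] = s / aug[i][i]
    # Projection onto the box 0 <= v_m <= v_max of (8), coordinate by coordinate.
    return [min(max(vi, 0.0), v_max) for vi in v]
```

The sketch assumes the normal matrix is nonsingular; two coincident points $z_m$ would make it singular, which is one practical reason for merging points that are too close before this step.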

Note that the dependency of $\tilde{\Phi}$ on its $3M$ variables (the coordinates of the
points $z_m$) is highly nonlinear. In particular, this dependency is complicated by
the computation of the minimum in (9) and the consequent projection onto the
admissible set $B$. Thus, an analytical computation of the gradient of $\tilde{\Phi}$ is not
computationally efficient. Accordingly, Powell's quadratic minimization
method was used to find local minima. This method uses a special procedure
to numerically approximate the gradient, and it can be shown to exhibit the
same type of quadratic convergence as conjugate gradient type methods (see
[Bre73]).
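
To illustrate what a local minimization without derivatives looks like, here is a deliberately simplified Python sketch. It is not Powell's method: it only performs cyclic line minimizations along the coordinate axes using a golden-section search, and it omits the direction-set updates that give Powell's method its fast convergence (see [Bre73] for the actual algorithm). The names `golden_section` and `coordinate_search` and all tolerances are our own assumptions.

```python
import math

def golden_section(phi, lo, hi, tol=1e-8):
    """Minimize a one-dimensional function phi on [lo, hi] without derivatives."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = phi(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def coordinate_search(func, x0, bounds, sweeps=50, tol=1e-10):
    """Cyclic line minimization along the coordinate axes inside a box."""
    x = list(x0)
    best = func(x)
    for _ in range(sweeps):
        previous = best
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(t, i=i):
                return func(x[:i] + [t] + x[i + 1:])
            t_best = golden_section(along_axis, lo, hi)
            value = along_axis(t_best)
            if value < best:
                x[i], best = t_best, value
        if previous - best < tol:
            break
    return x, best
```

Like any local method, this sketch converges to whichever minimum is reachable from the starting point, which is exactly why the stochastic part of the algorithm is needed.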
In addition, the exact number of the original inhomogeneities $M_{orig}$ is
unknown, and its estimate is a part of the inverse problem. In the HSD
algorithm described below this task is accomplished by taking the initial number
$M$ sufficiently large, so that

$$M_{orig} \le M, \tag{11}$$

which, presumably, can be estimated from physical considerations. After all,
our goal is to find only the strongest inclusions, since the weak ones cannot be
distinguished from background noise. The Reduction Procedure (see below)
allows the algorithm to seek the minimum of $\tilde{\Phi}$ in lower dimensional subsets
of the admissible set $B$, thus finding the estimated number of inclusions $M$.
Still another difficulty in the minimization is a large number of local minima
of $\tilde{\Phi}$. This phenomenon is well known for objective functions arising in various
inverse problems, and we illustrate this point in Figure 1.

Fig. 1. Objective function $\tilde{\Phi}(z_r, z_2, z_3, z_4, z_5, z_6)$, $-2 \le r \le 2$

For example, let $M_{orig} = 6$, and let the coordinates of the inclusions and
their intensities $(z_1, \ldots, z_6, v_1, \ldots, v_6)$ be as in Table 1. Figure 1 shows the
values of the function $\tilde{\Phi}(z_r, z_2, z_3, z_4, z_5, z_6)$, where

$$z_r = (r, 0, 0.520), \quad -2 \le r \le 2,$$

and

$$z_2 = (-1, 0.3, 0.580).$$

The plot shows multiple local minima and almost flat regions.
A direct application of a gradient type method to such a function would
result in finding a local minimum, which may or may not be the sought global
one. In the example above, such a method would usually be trapped in a local
minimum located at $r = -2$, $r = -1.4$, $r = -0.6$, $r = 0.2$ or $r = 0.9$,
and the desired global minimum at $r = 1.6$ would be found only for a
sufficiently close initial guess $1.4 < r < 1.9$. Various global minimization methods
are known (see below), but we found that an efficient way to accomplish
the minimization task for this Inverse Problem was to design a new method
(HSD) combining both the stochastic and the deterministic approach to the
global minimization. Deterministic minimization algorithms with or without
the gradient computation, such as the conjugate gradient methods, are known
to be efficient (see [Bre73, DS83, Jac77, Pol71] and [Rub00]). However, the
initial guess should be chosen sufficiently close to the sought minimum. Also,
such algorithms tend to be trapped at a local minimum, which is not
necessarily close to a global one. A new deterministic method is proposed in
[BP96] and [BPR97], which is quite efficient according to [BPR97]. On the
other hand, various stochastic minimization algorithms, e.g. the simulated
annealing method [KGV83, Kir84], are more likely to find a global minimum,
but their convergence can be very slow. We have tried a variety of minimization
algorithms to find an acceptable minimum of $\tilde{\Phi}$. Among them were the
Levenberg-Marquardt Method, Conjugate Gradients, Downhill Simplex, and
Simulated Annealing Method. None of them produced consistently satisfactory
results.
Among minimization methods combining random and deterministic searches
we mention Deep's method [DE94] and a variety of clustering methods
[RT87a], [RT87b]. An application of these methods to the particle
identification using light scattering is described in [ZUB98]. The clustering methods
are quite robust (that is, they consistently find global extrema) but usually
require a significant computational effort. One such method is described in
the next section on the identification of layers in a multilayer particle. The
HSD method is a combination of a reduced sample random search method
with certain ideas from Genetic Algorithms (see e.g. [HH98]). It is very
efficient and seems especially well suited for low dimensional global minimization.
Further research is envisioned to study its properties in more detail, and its
applicability to other problems.
The steps of the Hybrid Stochastic-Deterministic (HSD) method are
outlined below. Let us call a collection of $M$ points (inclusion centers)
$\{z_1, z_2, \ldots, z_M\}$, $z_i \in B$, a configuration $Z$. Then the minimization problem
(10) is the minimization of the objective function $\tilde{\Phi}$ over the set of all
configurations.
For clarity, let $P_0 = 1$, $\epsilon_s = 0.5$, $\epsilon_i = 0.25$, $\epsilon_d = 0.1$ be the same values
as the ones used in the numerical computations in Section 2.4.
Generate a random configuration $Z$. Compute the best fit intensities $v_i$
corresponding to this configuration. If $v_i > v_{max}$, then let $v_i := v_{max}$. If
$v_i < 0$, then let $v_i := 0$. If $\tilde{\Phi}(Z) < P_0 \epsilon_s$, then this configuration is a preliminary
candidate for the initial guess of a deterministic minimization method (Step 1).
Drop the points $z_i \in Z$ such that $v_i < v_{max}\epsilon_i$. That is, the inclusions with
small intensities are eliminated (Step 2).
If two points $z_k, z_j \in Z$ are too close to each other, then replace them with
one point of a combined intensity (Step 3).
After completing steps 2 and 3 we would be left with $N < M$ points
$z_1, z_2, \ldots, z_N$ (after a re-indexing) of the original configuration $Z$. Use this
reduced configuration $Z_{red}$ as the starting point for the deterministic constrained
minimization in the $3N$-dimensional space (Step 4). Let the resulting
minimizer be $\tilde{Z}_{red} = (\tilde{z}_1, \ldots, \tilde{z}_N)$. If the value of the objective function
satisfies $\tilde{\Phi}(\tilde{Z}_{red}) < \epsilon$, then we are done: $\tilde{Z}_{red}$ is the sought configuration
containing $N$ inclusions. If $\tilde{\Phi}(\tilde{Z}_{red}) \ge \epsilon$, then the iterations should continue.
To continue the iteration, randomly generate $M - N$ points in $B$ (Step
5). Add them to the reduced configuration $\tilde{Z}_{red}$. Now we have a new full
configuration $Z$, and the iteration process can continue (Step 1).
This entire iterative process is repeated $n_{max}$ times, and the best
configuration is declared to represent the sought inclusions.

2.3 Description of the HSD Method

Let $P_0$, $T_{max}$, $n_{max}$, $\epsilon_s$, $\epsilon_i$, $\epsilon_d$, and $\epsilon$ be positive numbers. Let a positive
integer $M$ be larger than the expected number of inclusions. Let $N = 0$.

1. Randomly generate $M - N$ additional points $z_{N+1}, \ldots, z_M \in B$ to obtain
   a full configuration $Z = (z_1, \ldots, z_M)$. Find the best fit intensities $v_i$,
   $i = 1, 2, \ldots, M$. If $v_i > v_{max}$, then let $v_i := v_{max}$. If $v_i < 0$, then let $v_i := 0$.
   Compute $P_s = \tilde{\Phi}(z_1, z_2, \ldots, z_M)$. If $P_s < P_0 \epsilon_s$ then go to step 2, otherwise
   repeat step 1.
2. Drop all the points with the intensities $v_i$ satisfying $v_i < v_{max}\epsilon_i$. Now
   only $N < M$ points $z_1, z_2, \ldots, z_N$ (re-indexed) remain in the configuration
   $Z$.
3. If any two points $z_m$, $z_n$ in the above configuration satisfy $|z_m - z_n| < \epsilon_d D$,
   where $D = \operatorname{diam}(B)$, then eliminate point $z_n$, change the intensity of point
   $z_m$ to $v_m + v_n$, and assign $N := N - 1$. This step is repeated until no further
   reduction in $N$ is possible. Call the resulting reduced configuration with
   $N$ points $Z_{red}$.
4. Run a constrained deterministic minimization of $\tilde{\Phi}$ in $3N$ variables, with
   the initial guess $Z_{red}$. Let the minimizer be $\tilde{Z}_{red} = (\tilde{z}_1, \ldots, \tilde{z}_N)$. If
   $P = \tilde{\Phi}(\tilde{z}_1, \ldots, \tilde{z}_N) < \epsilon$, then save this configuration, and go to step 6,
   otherwise let $P_0 = P$, and proceed to the next step 5.
5. Keep intact the $N$ points $\tilde{z}_1, \ldots, \tilde{z}_N$. If the number of random configurations
   has exceeded $T_{max}$ (the maximum number of random tries), then save the
   configuration and go to step 6, otherwise go to step 1, and use these $N$
   points there.
6. Repeat steps 1 through 5 $n_{max}$ times.
7. Find the configuration among the above $n_{max}$ ones which gives the smallest
   value to $\tilde{\Phi}$. This is the best fit.
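
Steps 2 and 3 of the algorithm (the reduction of a configuration) can be sketched in Python as follows. The function name `reduce_configuration` and its argument layout are our own choices for illustration; the thresholds $v_{max}\epsilon_i$ and $\epsilon_d D$ are the ones stated in the steps above.

```python
import math

def reduce_configuration(points, intensities, v_max, eps_i, eps_d, diam):
    """Steps 2 and 3: drop weak inclusions, then merge inclusions that are too close."""
    # Step 2: keep only points whose intensity is at least v_max * eps_i.
    kept = [(p, v) for p, v in zip(points, intensities) if v >= v_max * eps_i]
    # Step 3: repeatedly merge any pair closer than eps_d * diam.
    merged = True
    while merged:
        merged = False
        for m in range(len(kept)):
            for n in range(m + 1, len(kept)):
                if math.dist(kept[m][0], kept[n][0]) < eps_d * diam:
                    # Point n is eliminated; its intensity is added to point m.
                    kept[m] = (kept[m][0], kept[m][1] + kept[n][1])
                    del kept[n]
                    merged = True
                    break
            if merged:
                break
    return [p for p, _ in kept], [v for _, v in kept]
```

For the box used in the numerical experiments ($a = 2$, $b = 1$, $c = 1$) the diameter is $\sqrt{21}$, so with $\epsilon_d = 0.1$ two points closer than about 0.46 are merged into one.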

Powell's minimization method (see [Bre73] for a detailed description)
was used for the deterministic part, since this method does not need gradient
computations, and it converges quadratically near quadratically shaped
minima. Also, in step 1, an idea from the Genetic Algorithm approach [HH98] is
implemented by keeping only the strongest representatives of the population,
and allowing a mutation for the rest.

2.4 Numerical results

The algorithm was tested on a variety of configurations. Here we present the
results of just two typical numerical experiments illustrating the performance
of the method. In both experiments the box $B$ is taken to be

$$B = \{(x_1, x_2, x_3) : -a \le x_1 \le a,\ -b \le x_2 \le b,\ 0 \le x_3 \le c\},$$

with $a = 2$, $b = 1$, $c = 1$. The wavenumber is $k = 5$, and the effective intensities
$v_m$ are in the range from 0 to 2. The values of the parameters were chosen as
follows:

$$P_0 = 1, \quad T_{max} = 1000, \quad \epsilon_s = 0.5, \quad \epsilon_i = 0.25, \quad \epsilon_d = 0.1, \quad \epsilon = 10^{-5}.$$

In both cases we searched for the same 6 inhomogeneities with the coordinates
$x_1, x_2, x_3$ and the intensities $v$ shown in Table 1.

Table 1. Actual inclusions.

Inclusion      x1       x2       x3       v
    1        1.640   -0.510    0.520    1.200
    2       -1.430   -0.500    0.580    0.500
    3        1.220    0.570    0.370    0.700
    4        1.410    0.230    0.740    0.610
    5       -0.220    0.470    0.270    0.700
    6       -1.410    0.230    0.174    0.600

Parameter $M$ was set to 16, thus the only information on the number
of inhomogeneities given to the algorithm was that their number does not
exceed 16. This number was chosen to keep the computational time within
reasonable limits. Still another consideration for the number $M$ is the aim of
the algorithm to find the presence of the most influential inclusions, rather
than all inclusions, which is usually impossible in the presence of noise and
with the limited amount of data.
Experiment 1. In this case we used 12 sources and 21 detectors, all on
the surface $x_3 = 0$. The sources were positioned at
$\{(-1.667 + 0.667i,\ -0.5 + 1.0j,\ 0),\ i = 0, 1, \ldots, 5,\ j = 0, 1\}$, that is, 6 each
along the two lines $x_2 = -0.5$ and $x_2 = 0.5$. The detectors were positioned at
$\{(-2 + 0.667i,\ -1.0 + 1.0j,\ 0),\ i = 0, 1, \ldots, 6,\ j = 0, 1, 2\}$, that is, seven
detectors along each of the three lines $x_2 = -1$, $x_2 = 0$ and $x_2 = 1$. This
corresponds to a mammography search, where the detectors and the sources are
placed above the search area. The results for noise level $\delta = 0.00$ are shown
in Figure 2 and Table 2. The results for noise level $\delta = 0.05$ are shown in
Table 3.
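
The source and detector layout of Experiment 1 can be generated with a few lines of Python. This is only a sketch of the geometry stated above; in particular, pairing every source with every detector is our assumption, since the text does not spell out how the $J$ source-detector pairs are formed.

```python
def experiment1_layout():
    """Source and detector positions of Experiment 1, all on the surface x3 = 0."""
    sources = [(-1.667 + 0.667 * i, -0.5 + 1.0 * j, 0.0)
               for j in range(2) for i in range(6)]
    detectors = [(-2.0 + 0.667 * i, -1.0 + 1.0 * j, 0.0)
                 for j in range(3) for i in range(7)]
    return sources, detectors

sources, detectors = experiment1_layout()
# Assumption: every source is combined with every detector.
pairs = [(x, y) for x in sources for y in detectors]
```

Under this pairing assumption the layout yields $12 \times 21 = 252$ source-detector pairs.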

Table 2. Experiment 1. Identified inclusions, no noise, $\delta = 0.00$.

   x1       x2       x3        v
  1.640   -0.510    0.520   1.20000
 -1.430   -0.500    0.580   0.50000
  1.220    0.570    0.370   0.70000
  1.410    0.230    0.740   0.61000
 -0.220    0.470    0.270   0.70000
 -1.410    0.230    0.174   0.60000

Table 3. Experiment 1. Identified inclusions, $\delta = 0.05$.

   x1       x2       x3        v
  1.645   -0.507    0.525   1.24243
  1.215    0.609    0.376   0.67626
 -0.216    0.465    0.275   0.69180
 -1.395    0.248    0.177   0.60747

Experiment 2. In this case we used 8 sources and 22 detectors, all on
the surface $x_3 = 0$. The sources were positioned at
$\{(-1.75 + 0.5i,\ 1.5,\ 0),\ i = 0, 1, \ldots, 7\}$, that is, all 8 along the line
$x_2 = 1.5$. The detectors were positioned at
$\{(-2 + 0.4i,\ 1.0 + 1.0j,\ 0),\ i = 0, 1, \ldots, 10,\ j = 0, 1\}$, that is, eleven
detectors along each of the two lines $x_2 = 1$ and $x_2 = 2$. This corresponds to
a mine search, where the detectors and the sources must be placed outside of
the searched ground. The results of the identification for noise level $\delta = 0.00$
in the data are shown in Figure 3 and Table 4. The results for noise level
$\delta = 0.05$ are shown in Table 5.

Table 4. Experiment 2. Identified inclusions, no noise, $\delta = 0.00$.

   x1       x2       x3        v
  1.656   -0.409    0.857   1.75451
 -1.476   -0.475    0.620   0.48823
  1.209    0.605    0.382   0.60886
 -0.225    0.469    0.266   0.69805
 -1.406    0.228    0.159   0.59372

Fig. 2. Inclusions and identified objects for subsurface particle identification,
Experiment 1, $\delta = 0.00$. The $x_3$ coordinate is not shown. (Legend: sources,
detectors, inclusions, identified objects.)

Table 5. Experiment 2. Identified inclusions, $\delta = 0.05$.

   x1       x2       x3        v
  1.575   -0.523    0.735   1.40827
 -1.628   -0.447    0.229   1.46256
  1.197    0.785    0.578   0.53266
 -0.221    0.460    0.231   0.67803

In general, the execution times were less than 2 minutes on a 333 MHz
PC. As can be seen from the results, the method achieves a perfect
identification in Experiment 1 when no noise is present. The identification
deteriorates in the presence of noise, as well as when the sources and detectors
are not located directly above the search area. Still, the inclusions with the
highest intensity and the ones closest to the surface are identified, while the
deepest and the weakest are lost. This can be expected, since their influence
on the cost functional becomes comparable with the background noise in
the data.
In summary, the proposed method for the identification of small inclusions
can be used in geophysics, medicine and technology. It can be useful in the
development of new approaches to ultrasound mammography. It can also be
used for localization of holes and cracks in metals and other materials, as
well as for finding mines from surface measurements of acoustic pressure, and
possibly in other problems of interest in various applications.
The HSD minimization method is a specially designed low-dimensional
minimization method, which is well suited for many inverse type problems.
The problems do not necessarily have to be within the range of applicability
of the Born approximation. It is highly desirable to apply the HSD method to
practical problems and to compare its performance with other methods.

3 Identification of layers in multilayer particles


3.1 Problem Description

Many practical problems require an identification of the internal structure of
an object given some measurements on its surface. In this section we study
such an identification for a multilayered particle illuminated by acoustic or
electromagnetic plane waves. Thus the problem discussed here is an inverse
scattering problem. A similar problem for the particle identification from the
light scattering data is studied in [ZUB98]. Our approach is to reduce the
inverse problem to a best-fit-to-data multidimensional minimization.
Let $D \subset \mathbb{R}^2$ be the disk of radius $R > 0$,

$$D_m = \{x \in \mathbb{R}^2 : r_{m-1} < |x| < r_m\}, \quad m = 1, 2, \ldots, N, \tag{12}$$

and $S_m = \{x \in \mathbb{R}^2 : |x| = r_m\}$ for $0 = r_0 < r_1 < \cdots < r_N < R$. Suppose that
a multilayered scatterer in $D$ has a constant refractive index $n_m$ in the region
$D_m$, $m = 1, 2, \ldots, N$. If the scatterer is illuminated by a plane harmonic
wave then, after the time dependency is eliminated, the total field
$u(x) = u_0(x) + u_s(x)$ satisfies the Helmholtz equation

$$\Delta u + k_0^2 u = 0, \quad |x| > r_N, \tag{13}$$

where $u_0(x) = e^{i k_0 x \cdot \alpha}$ is the incident field and $\alpha$ is the unit vector in the
direction of propagation. The scattered field $u_s$ is required to satisfy the
radiation condition at infinity, see [Ram86].

Fig. 3. Inclusions and identified objects for subsurface particle identification,
Experiment 2, $\delta = 0.00$. The $x_3$ coordinate is not shown. (Legend: sources,
detectors, inclusions, identified objects.)

Let $k_m^2 = k_0^2 n_m$. We consider the following transmission problem

$$\Delta u_m + k_m^2 u_m = 0, \quad x \in D_m, \tag{14}$$

under the assumption that the fields $u_m$ and their normal derivatives are
continuous across the boundaries $S_m$, $m = 1, 2, \ldots, N$.
In fact, the choice of the boundary conditions on the boundaries $S_m$
depends on the physical model under consideration. The above model may
or may not be adequate for an electromagnetic or acoustic scattering, since
the model may require additional parameters (such as the mass density and
the compressibility) to be accounted for. However, the basic computational
approach remains the same. For more details on transmission problems,
including the questions on the existence and the uniqueness of the solutions, see
[ARS98, EJP57, RPY00].
The Inverse Problem to be solved is:
IPS: Given $u(x)$ for all $x \in S = \{x : |x| = R\}$ at a fixed $k_0 > 0$, find the
number $N$ of the layers, the location of the layers, and their refractive indices
$n_m$, $m = 1, 2, \ldots, N$, in (14).
Here IPS stands for a Single frequency Inverse Problem. Numerical
experience shows that there are some practical difficulties in the successful
resolution of the IPS even when no noise is present, see [Gut01]. While there are
some results on the uniqueness for the IPS (see [ARS98, RPY00]), assuming
that the refractive indices are known, and only the layers are to be identified,
the stability estimates are few, see [Ram94c, Ram94d, Ram02a]. The
identification is successful, however, if the scatterer is subjected to a probe with
plane waves of several frequencies. Thus we state the Multifrequency Inverse
Problem:
IPM: Given $u^{(p)}(x)$ for all $x \in S = \{x : |x| = R\}$ at a finite number $P$ of
wave numbers $k_0^{(p)} > 0$, find the number $N$ of the layers, the location of the
layers, and their refractive indices $n_m$, $m = 1, 2, \ldots, N$, in (14).

3.2 Best Fit Profiles and Local Minimization Methods

If the refractive indices $n_m$ are sufficiently close to 1, then we say that the
scattering is weak. In this case the scattering is described by the Born
approximation, and there are methods for the solution of the above Inverse
Problems. See [CM90], [Ram86] and [Ram94a] for further details. In particular,
the Born inversion is an ill-posed problem even if the Born approximation
is very accurate, see [Ram90] or [Ram92b]. When the assumption of the Born
approximation is not appropriate, one matches the given observations to a set
of solutions for the Direct Problem. Since our interest is in the solution of the
IPS and IPM in the non-Born region of scattering, we choose to follow the
best fit to data approach. This approach is used widely in a variety of applied
problems, see e.g. [Bie97].
Note that, by assumption, the scatterer has rotational symmetry.
Thus we only need to know the data for one direction of the incident plane
wave. For this reason we fix $\alpha = (1, 0)$ in (13) and define the (complex)
functions

$$g^{(p)}(\theta), \quad 0 \le \theta < 2\pi, \quad p = 1, 2, \ldots, P, \tag{15}$$

to be the observations measured on the surface $S$ of the ball $D$ for a finite set
of free space wave numbers $k_0^{(p)}$.
Fix a positive integer $M$. Given a configuration

$$Q = (r_1, r_2, \ldots, r_M, n_1, n_2, \ldots, n_M) \tag{16}$$

we solve the Direct Problem (13)-(14) (for each free space wave number $k_0^{(p)}$)
with the layers $D_m = \{x \in \mathbb{R}^2 : r_{m-1} < |x| < r_m\}$, $m = 1, 2, \ldots, M$, and
the corresponding refractive indices $n_m$, where $r_0 = 0$. Let

$$w^{(p)}(\theta) = u^{(p)}(x)\big|_{x \in S}. \tag{17}$$

Fix a set of angles $\Theta = (\theta_1, \theta_2, \ldots, \theta_L)$ and let

$$\|w\|_2 = \Big(\sum_{l=1}^{L} |w(\theta_l)|^2\Big)^{1/2}. \tag{18}$$

Define

$$\Phi(r_1, r_2, \ldots, r_M, n_1, n_2, \ldots, n_M) = \frac{1}{P} \sum_{p=1}^{P} \frac{\|w^{(p)} - g^{(p)}\|_2^2}{\|g^{(p)}\|_2^2}, \tag{19}$$

where the same set $\Theta$ is used for $g^{(p)}$ as for $w^{(p)}$.
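
The best fit to data functional (19) is straightforward to evaluate once the computed fields $w^{(p)}$ and the observations $g^{(p)}$ are sampled at the same angles. The Python sketch below assumes they are given as lists of complex samples, one list per wave number; the names `norm2` and `best_fit` are ours, and the solution of the Direct Problem that produces $w^{(p)}$ is outside the scope of this sketch.

```python
import math

def norm2(w):
    """Discrete l2 norm over the sampled angles, eq. (18)."""
    return math.sqrt(sum(abs(value) ** 2 for value in w))

def best_fit(w_by_freq, g_by_freq):
    """Phi of eq. (19): mean relative squared misfit over the P wave numbers."""
    P = len(g_by_freq)
    total = 0.0
    for w, g in zip(w_by_freq, g_by_freq):
        diff = [wi - gi for wi, gi in zip(w, g)]
        total += norm2(diff) ** 2 / norm2(g) ** 2
    return total / P
```

Because each term is normalized by $\|g^{(p)}\|_2^2$, every frequency contributes on the same relative scale: a perfect fit gives 0, and the trivial guess $w^{(p)} \equiv 0$ gives exactly 1.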


We solve the IPM by minimizing the above best fit to data functional $\Phi$
over an appropriate set of admissible parameters $A_{adm} \subset \mathbb{R}^{2M}$.
It is reasonable to assume that the underlying physical problem gives some
estimate for the bounds $n_{low}$ and $n_{high}$ of the refractive indices $n_m$ as well as
for the bound $M$ of the expected number of layers $N$. Thus,

$$A_{adm} \subset \{(r_1, r_2, \ldots, r_M, n_1, n_2, \ldots, n_M) : 0 \le r_i \le R,\ n_{low} \le n_m \le n_{high}\}. \tag{20}$$

Note that the admissible configurations must also satisfy

$$r_1 \le r_2 \le r_3 \le \cdots \le r_M. \tag{21}$$

It is well known that a multidimensional minimization is a difficult problem,
unless the objective function is "well behaved". The most important
quality of such a cooperative function is the presence of just a few local
minima. Unfortunately, this is decidedly not the case in many applied problems,
and, in particular, for the problem under consideration.
To illustrate this point further, let $P$ be the set of three free space wave
numbers $k_0^{(p)}$ chosen to be

$$P = \{3.0,\ 6.5,\ 10.0\}. \tag{22}$$


Fig. 4. Best fit profile for the configurations $q_t$; multiple frequencies
$P = \{3.0,\ 6.5,\ 10.0\}$.

Figure 4 shows the profile of the functional $\Phi$ as a function of the variable
$t$, $0.1 \le t \le 0.6$, in the configurations $q_t$ with

$$n(x) = \begin{cases} 0.49, & 0 \le |x| < t, \\ 9.0, & t \le |x| < 0.6, \\ 1.0, & 0.6 \le |x| \le 1.0. \end{cases}$$

Thus the objective function $\Phi$ has many local minima even along this
arbitrarily chosen one dimensional cross-section of the admissible set. There
are sharp peaks and large gradients. Consequently, the gradient based methods
(see [Bre73, DS83, Fle81, Hes80, Jac77, Pol71]) would not be successful for
a significant portion of this region. It is also appropriate to notice that the
dependency of $\Phi$ on its arguments is highly nonlinear. Thus, the gradient
computations have to be done numerically, which makes them computationally
expensive. More importantly, the gradient based minimization methods (as
expected) perform poorly for these problems.
These complications are avoided by considering conjugate gradient type
algorithms which do not require the knowledge of the derivatives at all, for
example Powell's method. Further refinements in the deterministic phase
of the minimization algorithm are needed to achieve more consistent
performance. They include special line minimization and Reduction procedures
similar to the ones discussed in the previous section on the identification of
underground inclusions. We skip the details and refer the reader to [Gut01].
In summary, the entire Local Minimization Method (LMM) consists of
the following:

Local Minimization Method (LMM)

1. Let your starting configuration be $Q_0 = (r_1, r_2, \ldots, r_M, n_1, n_2, \ldots, n_M)$.
2. Apply the Reduction Procedure to $Q_0$, and obtain a reduced configuration
   $Q_0^r$ containing $M^r$ layers.
3. Apply the Basic Minimization Method in $A_{adm} \cap \mathbb{R}^{2M^r}$ with the starting
   point $Q_0^r$, and obtain a configuration $Q_1$.
4. Apply the Reduction Procedure to $Q_1$, and obtain a final reduced
   configuration $Q_1^r$.

3.3 Global Minimization Methods

Given an initial configuration $Q_0$, a local minimization method finds a local
minimum near $Q_0$. On the other hand, global minimization methods
explore the entire admissible set to find a global minimum of the objective
function. While the local minimization is usually deterministic, the
majority of the global methods are probabilistic in their nature. There is a
great interest and activity in the development of efficient global minimization
methods, see e.g. [Bie97], [Bom97]. Among them are the simulated annealing
method ([KGV83], [Kir84]), various genetic algorithms [HH98], the interval
method, the TRUST method ([BP96], [BPR97]), etc. As we have already
mentioned, the best fit to data functional $\Phi$ has many narrow local minima.
In this situation it is exceedingly unlikely to find the minima points by
chance alone. Thus our special interest is in the minimization methods which
combine a global search with a local minimization. In [GR00] we developed
such a method (the Hybrid Stochastic-Deterministic Method), and applied it
to the identification of small subsurface particles from a set of surface
measurements, see Sections 2.2-2.4. The HSD method could be classified as
a variation of a genetic algorithm with a local search with reduction. In this
paper we consider the performance of two algorithms: Deep's Method, and
Rinnooy Kan and Timmer's Multilevel Single-Linkage Method. Both combine
a global and a local search to determine a global minimum. Recently these
methods have been applied to a similar problem of the identification of
particles from their light scattering characteristics in [ZUB98]. Unlike [ZUB98],
our experience shows that Deep's method has failed consistently for the type
of problems we are considering. See [DE94] and [ZUB98] for more details on
Deep's Method.

Multilevel Single-Linkage Method (MSLM)

Rinnooy Kan and Timmer in [RT87a, RT87b] give a detailed description of this algorithm. Zakovic et al. in [ZUB98] describe in detail their experience of applying it to an inverse light scattering problem. They also discuss different stopping criteria for the MSLM. Thus, we only give here a shortened and informal description of this method and of its algorithm.
In a pure Random Search method a batch $H$ of $L$ trial points is generated in $A_{adm}$ using a uniformly distributed random variable. Then a local search is started from each of these $L$ points. A local minimum with the smallest value of $\Phi$ is declared to be the global one.
A refinement of the Random Search is the Reduced Sample Random Search method. Here we use only a certain fixed fraction $\gamma < 1$ of the original batch of $L$ points to proceed with the local searches. This reduced sample $H_{red}$ of $\gamma L$ points is chosen to contain the points with the smallest $\gamma L$ values of $\Phi$ among the original batch. The local searches are started from the points in this reduced sample.
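The selection of the reduced sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the one-dimensional `toy_objective` are hypothetical stand-ins, not the best fit functional of the text.

```python
import random

def reduced_sample(objective, sample_point, batch_size, gamma):
    """Draw batch_size trial points and keep the gamma-fraction
    with the smallest objective values (sorted, best first)."""
    batch = [sample_point() for _ in range(batch_size)]
    batch.sort(key=objective)
    keep = max(1, round(gamma * batch_size))
    return batch[:keep]

# Hypothetical stand-in for the objective: a 1-D function with two local minima.
def toy_objective(x):
    return (x * x - 1.0) ** 2 + 0.1 * x

rng = random.Random(0)
reduced = reduced_sample(toy_objective, lambda: rng.uniform(-2.0, 2.0), 1000, 0.05)
```

Local searches would then be started only from the points in `reduced`, which is where the saving over the pure Random Search comes from.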
Since the local searches dominate the computational costs, we would like
to initiate them only when it is truly necessary. Given a critical distance d
we define a cluster to be a group of points located within the distance d of
each other. Intuitively, a local search started from the points within a cluster
should result in the same local minimum, and, therefore, should be initiated
only once in each cluster.
Having tried all the points in the reduced sample, we have information on the number of local searches performed and the number of local minima found. This information and the critical distance $d$ can be used to determine a statistical level of confidence that all the local minima have been found. The algorithm is terminated (a stopping criterion is satisfied) if an a priori level of confidence is reached.
If, however, the stopping criterion is not satisfied, we perform another iteration of the MSLM by generating another batch of $L$ trial points. Then it is combined with the previously generated batches to obtain an enlarged batch $H^j$ of $jL$ points (at iteration $j$), which leads to a reduced sample $H^j_{red}$ of $\gamma jL$ points. According to MSLM the critical distance $d$ is reduced to $d_j$ (note that $d_j \to 0$ as $j \to \infty$, since we want to find a minimizer), a local minimization is attempted once within each cluster, the information on the number of local minimizations performed and the local minima found is used to determine if the algorithm should be terminated, etc.
The following is an adaptation of the MSLM method to the inverse scattering problem presented in Section 3.1. The LMM local minimization method introduced in the previous Section is used here to perform local searches.

MSLM

(at iteration $j$).

1. Generate another batch of $L$ trial points (configurations) from a random uniform distribution in $A_{adm}$. Combine it with the previously generated batches to obtain an enlarged batch $H^j$ of $jL$ points.
2. Reduce $H^j$ to the reduced sample $H^j_{red}$ of $\gamma jL$ points, by selecting the points with the smallest $\gamma jL$ values of $\Phi$ in $H^j$.
3. Calculate the critical distance $d_j$ by

$$d_j = \sqrt{(d_j^r)^2 + (d_j^n)^2}.$$

4. Order the sample points in $H^j_{red}$ so that $\Phi(Q_i) \le \Phi(Q_{i+1})$, $i = 1, \dots, \gamma jL$. For each value of $i$, start the local minimization from $Q_i$, unless there exists an index $k < i$, such that $\|Q_k - Q_i\| < d_j$. Ascertain if the result is a known local minimum.
5. Let $K$ be the number of local minimizations performed, and $W$ be the number of different local minima found. Let

$$W_{tot} = \frac{W(K-1)}{K-W-2}.$$

The algorithm is terminated if

$$W_{tot} < W + 0.5. \qquad (23)$$

Here $\Gamma$ is the gamma function, and $\sigma$ is a fixed constant.
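Steps 4 and 5 can be illustrated by the following Python sketch. It is not the authors' code. The function names are hypothetical, and the estimate $W_{tot} = W(K-1)/(K-W-2)$ is assumed here in the form commonly used in the multilevel single-linkage literature.

```python
import math

def mslm_starts(points, values, d_crit):
    """Indices of the sample points from which a local search is started.

    points: list of coordinate tuples; values: objective value at each point.
    A point is skipped if some point with a smaller objective value lies
    within the critical distance d_crit of it (step 4).
    """
    order = sorted(range(len(points)), key=lambda i: values[i])
    starts = []
    for rank, i in enumerate(order):
        near_better = any(
            math.dist(points[i], points[k]) < d_crit for k in order[:rank]
        )
        if not near_better:
            starts.append(i)
    return starts

def should_stop(num_searches, num_minima):
    """Stopping rule (23): estimated total number of minima W_tot < W + 0.5."""
    K, W = num_searches, num_minima
    if K - W - 2 <= 0:
        return False  # the estimate is undefined, so keep sampling
    w_tot = W * (K - 1) / (K - W - 2)
    return w_tot < W + 0.5
```

For example, with three points of which two are closer than `d_crit`, only the better of the two close points and the isolated point start a local search.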
A related algorithm (the Mode Analysis) is based on a subdivision of the admissible set into smaller volumes associated with local minima. This algorithm is also discussed in [RT87a, RT87b]. From the numerical studies presented there, the authors deduce their preference for the MSLM.
The presented MSLM algorithm was successful in the identification of various 2D layered particles, see [Gut01] for details.

4 Potential scattering and the Stability Index method.


4.1 Problem description

Potential scattering problems are important in quantum mechanics, where they appear in the context of scattering of particles bombarding an atomic nucleus. One is interested in reconstructing the scattering potential from the results of a scattering experiment. The examples in Section 4 deal with finding a spherically symmetric ($q = q(r)$, $r = |x|$) potential from the fixed-energy scattering data, which in this case consist of the fixed-energy phase shifts. In [Ram96, Ram02a, Ram04d, Ram05a] the three-dimensional inverse scattering problem with fixed-energy data is treated.
Let $q(x)$, $x \in \mathbb{R}^3$, be a real-valued potential with compact support. Let $R > 0$ be a number such that $q(x) = 0$ for $|x| > R$. We also assume that $q \in L^2(B_R)$, $B_R = \{x : |x| \le R,\ x \in \mathbb{R}^3\}$. Let $S^2$ be the unit sphere, and $\alpha \in S^2$. For a given $k > 0$ the scattering solution $\psi(x, \alpha)$ is defined as the solution of

$$\Delta\psi + k^2\psi - q(x)\psi = 0, \quad x \in \mathbb{R}^3, \qquad (24)$$

satisfying the following asymptotic condition at infinity:

$$\psi = \psi_0 + v, \quad \psi_0 := e^{ik\alpha\cdot x}, \quad \alpha \in S^2, \qquad (25)$$

$$\lim_{r\to\infty} \int_{|x|=r} \left| \frac{\partial v}{\partial r} - ikv \right|^2 ds = 0. \qquad (26)$$

It can be shown that

$$\psi(x, \alpha) = \psi_0 + A(\alpha', \alpha, k)\frac{e^{ikr}}{r} + o\left(\frac{1}{r}\right), \quad \text{as } r \to \infty, \quad \frac{x}{r} = \alpha', \quad r := |x|. \qquad (27)$$

The function $A(\alpha', \alpha, k)$ is called the scattering amplitude, $\alpha$ and $\alpha'$ are the directions of the incident and scattered waves, and $k^2$ is the energy, see [New82, Ram94a].
For spherically symmetric scatterers $q(x) = q(r)$ the scattering amplitude satisfies $A(\alpha', \alpha, k) = A(\alpha' \cdot \alpha, k)$. The converse is established in [Ram91]. Following [RS99], the scattering amplitude for $q = q(r)$ can be written as

$$A(\alpha', \alpha, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_l(k) Y_{lm}(\alpha') \overline{Y_{lm}(\alpha)}, \qquad (28)$$

where $Y_{lm}$ are the spherical harmonics, normalized in $L^2(S^2)$, and the bar denotes the complex conjugate.
The fixed-energy phase shifts $-\pi < \delta_l < \pi$ ($\delta_l = \delta(l, k)$, $k > 0$ is fixed) are related to $A_l(k)$ (see e.g., [RS99]) by the formula:

$$A_l(k) = \frac{4\pi}{k} e^{i\delta_l} \sin(\delta_l). \qquad (29)$$

Several parameter-fitting procedures were proposed for calculating the potentials from the fixed-energy phase shifts (by Fiedeldey, Lipperheide, Hooshyar and Razavy, Ioannides and Mackintosh, Newton, Sabatier, May and Scheid, Ramm and others). These works are referenced and their results are described in [CS89, New82]. Recent works [Gut00, Gut01, GR00, GR02a] and [RG01, RS99, RS00] present new numerical methods for solving this problem. In [Ram02d] (also see [Ram04b, Ram05a]) it is proved that the R. Newton-P. Sabatier method for solving the inverse scattering problem with the fixed-energy phase shifts as the data (see [CS89, New82]) is fundamentally wrong in the sense that its foundation is wrong. In [Ram02c] a counterexample is given to a uniqueness theorem claimed in a modification of R. Newton's inversion scheme.
Phase shifts for a spherically symmetric potential can be computed by a variety of methods, e.g. by a variable phase method described in [Cal67]. The computation involves solving a nonlinear ODE for each phase shift. However, if the potential is compactly supported and piecewise-constant, then a much simpler method described in [ARS99] and [GRS02] can be used. We refer the reader to these papers for details.
Let $q_0(r)$ be a spherically symmetric piecewise-constant potential, $\{\tilde\delta(k, l)\}_{l=1}^{N}$ be the set of its phase shifts for a fixed $k > 0$ and a sufficiently large $N$. Let $q(r)$ be another potential, and let $\{\delta(k, l)\}_{l=1}^{N}$ be the set of its phase shifts.
The best fit to data function $\Phi(q, k)$ is defined by

$$\Phi(q, k) = \frac{\sum_{l=1}^{N} |\delta(k, l) - \tilde\delta(k, l)|^2}{\sum_{l=1}^{N} |\tilde\delta(k, l)|^2}. \qquad (30)$$

The phase shifts are known to decay rapidly with $l$, see [RAI98]. Thus, for sufficiently large $N$, the function $\Phi$ is practically the same as the one which would use all the shifts in (30). The inverse problem of the reconstruction of the potential from its fixed-energy phase shifts is reduced to the minimization of the objective function $\Phi$ over an appropriate admissible set.
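As an illustration, a best fit function of this kind can be evaluated as follows. The sketch assumes that (30) is a relative squared misfit between the trial and the target phase shifts, truncated to the first $N$ shifts. The function name `best_fit` is hypothetical, and the phase shifts themselves are taken as given inputs.

```python
def best_fit(shifts, target_shifts):
    """Relative squared misfit between trial and target phase shifts.

    Assumed form: sum |delta - target|^2 / sum |target|^2 over the
    N shifts supplied (the truncated sum discussed in the text).
    """
    if len(shifts) != len(target_shifts):
        raise ValueError("phase-shift lists must have the same length")
    num = sum(abs(d - t) ** 2 for d, t in zip(shifts, target_shifts))
    den = sum(abs(t) ** 2 for t in target_shifts)
    return num / den
```

The value is zero when the trial shifts reproduce the data exactly, and equals one when all trial shifts vanish.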

4.2 Stability Index Minimization Method

Let the minimization problem be

$$\min\{\Phi(q) : q \in A_{adm}\}. \qquad (31)$$

Let $q_0$ be its global minimizer. Typically, the structure of the objective function $\Phi$ is quite complicated: this function may have many local minima. Moreover, the objective function in a neighborhood of minima can be nearly flat, resulting in large minimizing sets defined by

$$S_\epsilon = \{q \in A_{adm} : \Phi(q) < \Phi(q_0) + \epsilon\} \qquad (32)$$

for an $\epsilon > 0$.
Given an $\epsilon > 0$, let $D_\epsilon$ be the diameter of the minimizing set $S_\epsilon$, which we call the Stability Index $D_\epsilon$ of the minimization problem (31). Its usage is explained below.
One would expect to obtain stable identification for minimization problems with small (relative to the admissible set) stability indices. Minimization problems with large stability indices have distinct minimizers with practically the same values of the objective function. If no additional information is known, one has an uncertainty of the minimizer's choice. The stability index provides a quantitative measure of this uncertainty or instability of the minimization.
If $D_\epsilon < \eta$, where $\eta$ is an a priori chosen threshold, then one can solve the global minimization problem stably. The above general scheme does not specify possible algorithms for computing the Stability Index.
One idea for constructing such an algorithm is to iteratively estimate stability indices of the minimization problem, and, based on this information, to conclude whether the method has achieved a stable minimum.
One such algorithm is an Iterative Reduced Random Search (IRRS) method, which uses the Stability Index for its stopping criterion. Let a batch $H$ of $L$ trial points be randomly generated in the admissible set $A_{adm}$. Let $\gamma$ be a certain fixed fraction, e.g., $\gamma = 0.01$. Let $S_{min}$ be the subset of $H$ containing points $\{p_i\}$ with the smallest $\gamma L$ values of the objective function $\Phi$ in $H$. We call $S_{min}$ the minimizing set. If all the minimizers in $S_{min}$ are close to each other, then the objective function $\Phi$ is not flat near the global minimum. That is, the method identifies the minimum consistently. Let $\|\cdot\|$ be a norm in the admissible set.
Let

$$\epsilon = \max_{p_j \in S_{min}} \Phi(p_j) - \min_{p_j \in S_{min}} \Phi(p_j)$$

and

$$D_\epsilon = \mathrm{diam}(S_{min}) = \max\{\|p_i - p_j\| : p_i, p_j \in S_{min}\}. \qquad (33)$$

Then the quantity defined in (33) can be considered an estimate for the Stability Index $D_\epsilon$ of the minimization problem. The Stability Index reflects the size of the minimizing sets. Accordingly, it is used as a self-contained stopping criterion for an iterative minimization procedure. The identification is considered to be stable if the Stability Index $D_\epsilon < \eta$, for an a priori chosen $\eta > 0$. Otherwise, another batch of $L$ trial points is generated, and the process is repeated. We used $\beta = 1.1$, as described below in the stopping criterion, to determine if subsequent iterations do not produce a meaningful reduction of the objective function.
More precisely:

Iterative Reduced Random Search (IRRS)

(at the $j$-th iteration).

Fix $0 < \gamma < 1$, $\beta > 1$, $\eta > 0$ and $N_{max}$.
1. Generate another batch $H^j$ of $L$ trial points in $A_{adm}$ using a random distribution.
2. Reduce $H^j$ to the reduced sample $H^j_{min}$ of $\gamma L$ points by selecting the points in $H^j$ with the smallest $\gamma L$ values of $\Phi$.
3. Combine $H^j_{min}$ with $S^{j-1}_{min}$ obtained at the previous iteration. Let $S^j_{min}$ be the set of $\gamma L$ points from $H^j_{min} \cup S^{j-1}_{min}$ with the smallest values of $\Phi$. (Use $H^1_{min}$ for $j = 1$.)
4. Compute the Stability Index (diameter) $D^j$ of $S^j_{min}$ by $D^j = \max\{\|p_i - p_k\| : p_i, p_k \in S^j_{min}\}$.
5. Stopping criterion.
Let $p \in S^j_{min}$ be the point with the smallest value of $\Phi$ in $S^j_{min}$ (the global minimizer).
If $D^j < \eta$, then stop. The global minimizer is $p$. The minimization is stable.
If $D^j > \eta$ and $\Phi(q) < \beta\Phi(p)$ for all $q \in S^j_{min}$, then stop. The minimization is unstable. The Stability Index $D^j$ is the measure of the instability of the minimization.
Otherwise, return to step 1 and do another iteration, unless the maximum number of iterations $N_{max}$ is exceeded.
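The IRRS loop can be sketched as follows. This is a minimal illustration of the purely stochastic version, without the local minimization discussed next. The function names, the default parameter values and the two-dimensional toy objective are assumptions made for the example and are not taken from the authors' program.

```python
import math
import random

def diameter(points):
    """Largest pairwise Euclidean distance in a finite set of points."""
    return max((math.dist(p, q) for p in points for q in points), default=0.0)

def irrs(objective, sample_point, L=2000, gamma=0.01, beta=1.1, eta=0.02, n_max=30):
    """Iterative Reduced Random Search with the Stability Index stopping rule."""
    keep = max(2, round(gamma * L))
    s_min = []
    for j in range(1, n_max + 1):
        batch = sorted((sample_point() for _ in range(L)), key=objective)  # steps 1-2
        s_min = sorted(s_min + batch[:keep], key=objective)[:keep]         # step 3
        index = diameter(s_min)                                            # step 4
        best = s_min[0]
        if index < eta:                                                    # step 5
            return best, index, "stable", j
        if all(objective(q) < beta * objective(best) for q in s_min):
            return best, index, "unstable", j
    return best, index, "max iterations", j

# Toy problem (hypothetical): a single well at (0.3, -0.2) in the square [-1, 1]^2.
rng = random.Random(1)
well = (0.3, -0.2)
toy = lambda p: (p[0] - well[0]) ** 2 + (p[1] - well[1]) ** 2
best, index, status, iterations = irrs(
    toy, lambda: (rng.uniform(-1, 1), rng.uniform(-1, 1)), eta=0.1
)
```

A coarse threshold `eta=0.1` is used in the toy run because a pure random search shrinks the minimizing set only slowly, which is one motivation for combining the search with a local minimization.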
One can make the stopping criterion more meaningful by computing a normalized stability index. This can be achieved by dividing $D^j$ by a fixed normalization constant, such as the diameter of the entire admissible set $A_{adm}$. To improve the performance of the algorithm in specific problems we found it useful to modify IRRS by combining the stochastic (global) search with a deterministic local minimization. Such a Hybrid Stochastic-Deterministic (HSD) approach has proved to be successful for a variety of problems in inverse quantum scattering (see [Gut01, GRS02, RG01]) as well as in other applications (see [Gut00, GR00]). A somewhat different implementation of the Stability Index Method is described in [GR02a].
We seek the potentials $q(r)$ in the class of piecewise-constant, spherically symmetric real-valued functions. Let the admissible set be

$$A_{adm} \subset \{(r_1, r_2, \dots, r_M, q_1, q_2, \dots, q_M) : 0 \le r_i \le R,\ q_{low} \le q_m \le q_{high}\}, \qquad (34)$$

where the bounds $q_{low}$ and $q_{high}$ for the potentials, as well as the bound $M$ on the expected number of layers, are assumed to be known.
A configuration $(r_1, r_2, \dots, r_M, q_1, q_2, \dots, q_M)$ corresponds to the potential

$$q(r) = q_m \quad \text{for } r_{m-1} \le r < r_m, \quad 1 \le m \le M, \qquad (35)$$

where $r_0 = 0$ and $q(r) = 0$ for $r \ge r_M = R$.
Note that the admissible configurations must also satisfy

$$r_1 \le r_2 \le r_3 \le \dots \le r_M. \qquad (36)$$
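The correspondence (35) between a configuration and a potential can be made concrete with a short sketch. The function name and the tuple layout `(r_1, ..., r_M, q_1, ..., q_M)` are assumptions made for the illustration.

```python
def potential(config, r):
    """Evaluate the piecewise-constant potential q(r) of (35).

    config = (r_1, ..., r_M, q_1, ..., q_M), with r_0 = 0 and q = 0 beyond r_M.
    """
    M = len(config) // 2
    radii, values = config[:M], config[M:]
    if any(a > b for a, b in zip(radii, radii[1:])):
        raise ValueError("radii must be nondecreasing, see (36)")
    for r_m, q_m in zip(radii, values):
        if r < r_m:
            return q_m
    return 0.0

# A two-layer configuration (M = 2): q = -10 on [0, 8) and q = 0 on [8, 10).
config = (8.0, 10.0, -10.0, 0.0)
```

With this layout, `potential(config, 5.0)` returns the value of the first layer, and any radius at or beyond the last interface returns zero.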

We used $\beta = 1.1$, $\epsilon = 0.02$ and $j_{max} = 30$. The choice of these and other parameters ($L = 5000$, $\gamma = 0.01$, $\nu = 0.16$) is dictated by their meaning in the algorithm and the comparative performance of the program at their different values. As usual, some adjustment of the parameters, stopping criteria, etc., is needed to achieve the optimal performance of the algorithm. The deterministic part of the IRRS algorithm was based on Powell's minimization method, one-dimensional minimization, and a Reduction procedure similar to the ones described in Section 3, see [GRS02] for details.

4.3 Numerical Results

We studied the performance of the algorithm for 3 different potentials $q_i(r)$, $i = 1, 2, 3$, chosen from physical considerations.
The potential $q_3(r) = -10$ for $0 \le r \le 8.0$ and $q_3 = 0$ for $r > 8.0$ and a wave number $k = 1$ constitute a typical example for elastic scattering of neutral particles in nuclear and atomic physics. In nuclear physics one measures the length in units of fm $= 10^{-15}$ m, the quantity $q_3$ in units of 1/fm$^2$, and the wave number in units of 1/fm. The physical potential and incident energy are given by $V(r) = \frac{\hbar^2}{2\mu} q_3(r)$ and $E = \frac{\hbar^2 k^2}{2\mu}$, respectively; here $\hbar := \frac{h}{2\pi}$, $h = 6.625 \cdot 10^{-27}$ erg·s is the Planck constant, $\hbar c = 197.32$ MeV·fm, $c = 3 \cdot 10^{8}$ m/sec is the velocity of light, and $\mu$ is the mass of a neutron. By choosing the mass $\mu$ to be equal to the mass of a neutron, $\mu = 939.6$ MeV/$c^2$, the potential and energy have the values of $V(r) = -207.2$ MeV for $0 \le r \le 8.0$ fm and $E(k = 1/\mathrm{fm}) = 20.72$ MeV. In atomic physics one uses atomic units with the Bohr radius $a_0 = 0.529 \cdot 10^{-10}$ m as the unit of length. Here, $r$, $k$ and $q_3$ are measured in units of $a_0$, $1/a_0$ and $1/a_0^2$, respectively. By assuming a scattering of an electron with mass $m_0 = 0.511$ MeV/$c^2$, we obtain the potential and energy as follows: $V(r) = -136$ eV for $0 \le r \le 8a_0 = 4.23 \cdot 10^{-10}$ m and $E(k = 1/a_0) = 13.6$ eV. These numbers give motivation for the choice of examples applicable in nuclear and atomic physics.
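The nuclear-physics numbers quoted above can be checked with a few lines of arithmetic, using $\hbar^2/(2\mu) = (\hbar c)^2/(2\mu c^2)$. The sketch below uses only the constants given in the text; the function name is hypothetical.

```python
HBAR_C = 197.32        # MeV*fm, value quoted in the text
NEUTRON_MASS = 939.6   # MeV/c^2, value quoted in the text

def to_mev(q, k, mass=NEUTRON_MASS):
    """Convert q (in 1/fm^2) and k (in 1/fm) to V and E in MeV."""
    factor = HBAR_C ** 2 / (2.0 * mass)   # hbar^2 / (2*mu) in MeV*fm^2
    return factor * q, factor * k ** 2

V, E = to_mev(-10.0, 1.0)
```

Rounded to four significant digits, `V` and `E` agree with the values of $-207.2$ MeV and $20.72$ MeV stated above.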
The method used here deals with finite-range (compactly supported) po-
tentials. One can use this method for potentials with the Coulomb tail or
other potentials of interest in physics, which are not of finite range. This is
done by using the phase shifts transformation method which allows one to
transform the phase shifts corresponding to a potential, not of finite range,
whose behavior is known for r > a, where a is some radius, into the phase
shifts corresponding to a potential of finite range a (see [Apa97], p.156).
In practice the differential cross section is measured at various angles, and from it the fixed-energy phase shifts are calculated by a parameter-fitting procedure. Therefore, we plan in future work to generalize the stability index method to the case when the original data are the values of the differential cross section, rather than the phase shifts.
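For the physical reasons discussed above, we choose the following three potentials:

$$q_1(r) = \begin{cases} -2/3, & 0 \le r < 8.0, \\ 0.0, & r \ge 8.0, \end{cases}$$

$$q_2(r) = \begin{cases} -4.0, & 0 \le r < 8.0, \\ 0.0, & r \ge 8.0, \end{cases}$$

$$q_3(r) = \begin{cases} -10.0, & 0 \le r < 8.0, \\ 0.0, & r \ge 8.0. \end{cases}$$
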
In each case the following values of the parameters have been used. The radius $R$ of the support of each $q_i$ was chosen to be $R = 10.0$. The admissible set $A_{adm}$ (34) was defined with $M = 2$. The Reduced Random Search parameters: $L = 5000$, $\gamma = 0.01$, $\nu = 0.16$, $\epsilon = 0.02$, $\beta = 1.10$, $j_{max} = 30$. The value $\epsilon_r = 0.1$ was used in the Reduction Procedure during the local minimization phase. The initial configurations were generated using a random number generator with seeds determined by the system time. A typical run time was about 10 minutes on a 333 MHz PC, depending on the number of iterations in IRRS. The number $N$ of the shifts used in (30) for the formation of the objective function $\Phi(q)$ was 31 for all the wave numbers. It can be seen that the shifts for the potential $q_3$ decay rapidly for $k = 1$, but they remain large for $k = 4$. The lower and upper bounds for the potentials, $q_{low} = -20.0$ and $q_{high} = 0.0$, used in the definition of the admissible set $A_{adm}$ were chosen to reflect a priori information about the potentials.
The identification was attempted with 3 different noise levels $h$. The levels are $h = 0.00$ (no noise), $h = 0.01$ and $h = 0.1$. More precisely, the noisy phase shifts $\delta_h(k, l)$ were obtained from the exact phase shifts $\delta(k, l)$ by the formula

$$\delta_h(k, l) = \delta(k, l)\,(1 + (0.5 - z) \cdot h),$$

where $z$ is a random variable uniformly distributed on $[0, 1]$.
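The noise model is a multiplicative perturbation of each phase shift, which can be sketched as follows. The function name and the sample shift values are hypothetical, and a fresh random number is assumed to be drawn for each shift.

```python
import random

def add_noise(shifts, h, rng=random):
    """Return delta_h = delta * (1 + (0.5 - z) * h), with z uniform on [0, 1)."""
    return [d * (1.0 + (0.5 - rng.random()) * h) for d in shifts]

exact = [1.0, -0.5, 0.25]                     # illustrative values only
noisy = add_noise(exact, 0.1, random.Random(0))
```

With this model the relative error of each shift never exceeds $h/2$, so the level $h = 0.1$ perturbs every shift by at most 5% in either direction.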
The distance $d(p_1(r), p_2(r))$ for potentials in step 5 of the IRRS algorithm was computed as

$$d(p_1(r), p_2(r)) = \|p_1(r) - p_2(r)\|,$$

where the norm is the $L_2$-norm in $\mathbb{R}^3$.
The results of the identification algorithm (the Stability Indices) for different iterations of the IRRS algorithm are shown in Tables 6-8.
For example, Table 8 shows that for $k = 2.5$, $h = 0.00$ the Stability Index has reached the value 0.013621 after 2 iterations. According to the Stopping criterion for IRRS, the program has been terminated with the conclusion that the identification was stable. In this case the potential identified by the program was

$$p(r) = \begin{cases} -10.000024, & 0 \le r < 7.999994, \\ 0.0, & r \ge 7.999994, \end{cases}$$

which is very close to the original potential

Table 6. Stability Indices for $q_1(r)$ identification at different noise levels $h$.

   k    Iteration   h = 0.00    h = 0.01    h = 0.10
  1.00      1       1.256985    0.592597    1.953778
            2       0.538440    0.133685    0.799142
            3       0.538253    0.007360    0.596742
            4       0.014616       -        0.123247
            5          -           -        0.015899
  2.00      1       0.000000    0.020204    0.009607
  2.50      1       0.000000    0.014553    0.046275
  3.00      1       0.000000    0.000501    0.096444
  4.00      1       0.000000    0.022935    0.027214

$$q_3(r) = \begin{cases} -10.0, & 0 \le r < 8.0, \\ 0.0, & r \ge 8.0. \end{cases}$$
On the other hand, when the phase shifts of $q_3(r)$ were corrupted by a 10% noise ($k = 2.5$, $h = 0.10$), the program was terminated (according to the Stopping criterion) after 4 iterations with the Stability Index at 0.079241. Since the Stability Index is greater than the a priori chosen threshold of $\epsilon = 0.02$, the conclusion is that the identification is unstable. A closer look into this situation reveals that the values of the objective function $\Phi(p_i)$, $p_i \in S_{min}$ (there are 8 elements in $S_{min}$) are between 0.0992806 and 0.100320. Since we chose $\beta = 1.1$, the values are within the required 10% of each other. The actual potentials for which the normalized distance is equal to the Stability Index 0.079241 are

$$p_1(r) = \begin{cases} -9.997164, & 0 \le r < 7.932678, \\ -7.487082, & 7.932678 \le r < 8.025500, \\ 0.0, & r \ge 8.025500, \end{cases}$$

and

$$p_2(r) = \begin{cases} -9.999565, & 0 \le r < 7.987208, \\ -1.236253, & 7.987208 \le r < 8.102628, \\ 0.0, & r \ge 8.102628, \end{cases}$$

with $\Phi(p_1) = 0.0992806$ and $\Phi(p_2) = 0.0997561$. One may conclude from this example that the threshold $\epsilon = 0.02$ is too tight and can be relaxed, if the above uncertainty is acceptable.
Finally, we studied the dependence of the Stability Index on the dimension of the admissible set $A_{adm}$, see (34). This dimension is equal to $2M$, where $M$ is the assumed number of layers in the potential. More precisely, $M = 3$, for example, means that the search is conducted in the class of potentials having 3 or fewer layers. The experiments were conducted for the identification of the original potential $q_2(r)$ with $k = 2.0$ and no noise present in the data.

Table 7. Stability Indices for $q_2(r)$ identification at different noise levels $h$.


k Iteration h = 0.00 h = 0.01 h = 0.10
1.00 1 0.774376 0.598471 0.108902
2 0.773718 1.027345 0.023206
3 0.026492 0.025593 0.023206
4 0.020522 0.029533 0.024081
5 0.020524 0.029533 0.024081
6 0.000745 0.029533
7 0.029533
8 0.029533
9 0.029533
10 0.029533
11 0.029619
12 0.025816
13 0.025816
14 0.008901
2.00 1 0.863796 0.799356 0.981239
2 0.861842 0.799356 0.029445
3 0.008653 0.000993 0.029445
4 0.029445
5 0.026513
6 0.026513
7 0.024881
2.50 1 1.848910 1.632298 0.894087
2 1.197131 1.632298 0.507953
3 0.580361 1.183455 0.025454
4 0.030516 0.528979
5 0.016195 0.032661
3.00 1 1.844702 1.849016 1.708201
2 1.649700 1.782775 1.512821
3 1.456026 1.782775 1.412345
4 1.410253 1.457020 1.156964
5 0.624358 0.961263 1.156964
6 0.692080 0.961263 0.902681
7 0.692080 0.961263 0.902681
8 0.345804 0.291611 0.902474
9 0.345804 0.286390 0.159221
10 0.345804 0.260693 0.154829
11 0.043845 0.260693 0.154829
12 0.043845 0.260693 0.135537
13 0.043845 0.260693 0.135537
14 0.043845 0.260693 0.135537
15 0.042080 0.157024 0.107548
16 0.042080 0.157024
17 0.042080 0.157024
18 0.000429 0.157024
19 0.022988
4.00 1 0.000000 0.000674 0.050705

Table 8. Stability Indices for $q_3(r)$ identification at different noise levels $h$.


k Iteration h = 0.00 h = 0.01 h = 0.10
1.00 1 0.564168 0.594314 0.764340
2 0.024441 0.028558 0.081888
3 0.024441 0.014468 0.050755
4 0.024684
5 0.024684
6 0.005800
2.00 1 0.684053 1.450148 0.485783
2 0.423283 0.792431 0.078716
3 0.006291 0.457650 0.078716
4 0.023157 0.078716
5 0.078716
6 0.078716
7 0.078716
8 0.078716
9 0.078716
10 0.078716
11 0.078716
2.50 1 0.126528 0.993192 0.996519
2 0.013621 0.105537 0.855049
3 0.033694 0.849123
4 0.026811 0.079241
3.00 1 0.962483 1.541714 0.731315
2 0.222880 0.164744 0.731315
3 0.158809 0.021775 0.072009
4 0.021366
5 0.021366
6 0.001416
4.00 1 1.714951 1.413549 0.788434
2 0.033024 0.075503 0.024482
3 0.018250 0.029385
4 0.029421
5 0.029421
6 0.015946

The results are shown in Table 9. Since the potential $q_2$ consists of only one layer, the smallest Stability Indices are obtained for $M = 1$. They gradually increase with $M$. Note that the algorithm conducts the global search using random variables, so the actual values of the indices are different in every run. Still the results show the successful identification (in this case) for the entire range of the a priori chosen parameter $M$. This agrees with the theoretical consideration according to which the Stability Index corresponding to an ill-posed problem in an infinite-dimensional space should be large. Reducing the original ill-posed problem to one in a space of much lower dimension regularizes the original problem.

Table 9. Stability Indices for $q_2(r)$ identification for different values of $M$.

Iteration    M = 1       M = 2       M = 3       M = 4
    1       0.472661    1.068993    1.139720    1.453076
    2       0.000000    0.400304    0.733490    1.453076
    3          -        0.000426    0.125855    0.899401
    4          -           -        0.125855    0.846117
    5          -           -        0.033173    0.941282
    6          -           -        0.033173    0.655669
    7          -           -        0.033123    0.655669
    8          -           -        0.000324    0.948816
    9          -           -           -        0.025433
   10          -           -           -        0.025433
   11          -           -           -        0.012586

5 Inverse scattering problem with fixed-energy data.


5.1 Problem description

In this Section we continue the discussion of inverse potential scattering with a presentation of Ramm's method for solving the inverse scattering problem with fixed-energy data, see [Ram04d]. The method is applicable to both exact and noisy data. Error estimates for this method are also given. An inversion method using the Dirichlet-to-Neumann (DN) map is discussed, the difficulties of its numerical implementation are pointed out and compared with the difficulties of the implementation of Ramm's inversion method. See the previous Section on potential scattering for the problem setup.

5.2 Ramm's inversion method for exact data

The results we describe in this Section are taken from [Ram94a] and [Ram02a]. Assume $q \in Q := Q_a \cap L^\infty(\mathbb{R}^3)$, where $Q_a := \{q : q(x) = \overline{q(x)},\ q(x) \in L^2(B_a),\ q(x) = 0 \text{ if } |x| \ge a\}$, $B_a := \{x : |x| \le a\}$. Let $A(\alpha', \alpha)$ be the corresponding scattering amplitude at a fixed energy $k^2$; $k = 1$ is taken without loss of generality. One has:

$$A(\alpha', \alpha) = \sum_{\ell=0}^{\infty} A_\ell(\alpha) Y_\ell(\alpha'), \quad A_\ell(\alpha) := \int_{S^2} A(\alpha', \alpha) \overline{Y_\ell(\alpha')}\, d\alpha', \qquad (37)$$

where $S^2$ is the unit sphere in $\mathbb{R}^3$, $Y_\ell(\alpha') = Y_{\ell,m}(\alpha')$, $-\ell \le m \le \ell$, are the normalized spherical harmonics, summation over $m$ is understood in (37) and in (44) below. Define the following algebraic variety:

$$M := \{\theta : \theta \in \mathbb{C}^3,\ \theta \cdot \theta = 1\}, \quad \theta \cdot w := \sum_{j=1}^{3} \theta_j w_j. \qquad (38)$$
This variety is non-compact, intersects $\mathbb{R}^3$ over $S^2$, and, given any $\xi \in \mathbb{R}^3$, there exist (many) $\theta, \theta' \in M$ such that

$$\theta' - \theta = \xi, \quad |\theta| \to \infty, \quad \theta, \theta' \in M. \qquad (39)$$

In particular, if one chooses the coordinate system in which $\xi = te_3$, $t > 0$, $e_3$ is the unit vector along the $x_3$-axis, then the vectors

$$\theta' = \frac{t}{2} e_3 + \zeta_2 e_2 + \zeta_1 e_1, \quad \theta = -\frac{t}{2} e_3 + \zeta_2 e_2 + \zeta_1 e_1, \quad \zeta_1^2 + \zeta_2^2 = 1 - \frac{t^2}{4}, \qquad (40)$$

satisfy (39) for any complex numbers $\zeta_1$ and $\zeta_2$ satisfying the last equation (40) and such that $|\zeta_1|^2 + |\zeta_2|^2 \to \infty$. There are infinitely many such $\zeta_1, \zeta_2 \in \mathbb{C}$. Consider a subset $M' \subset M$ consisting of the vectors $\theta = (\sin\vartheta\cos\varphi, \sin\vartheta\sin\varphi, \cos\vartheta)$, where $\vartheta$ and $\varphi$ run through the whole complex plane. Clearly $\theta \in M$, but $M'$ is a proper subset of $M$. Indeed, any $\theta \in M$ with $\theta_3 \ne \pm 1$ is an element of $M'$. If $\theta_3 = \pm 1$, then $\cos\vartheta = \pm 1$, so $\sin\vartheta = 0$ and one gets $\theta = (0, 0, \pm 1) \in M'$. However, there are vectors $\theta = (\theta_1, \theta_2, 1) \in M$ which do not belong to $M'$. One obtains such vectors by choosing $\theta_1, \theta_2 \in \mathbb{C}$ such that $\theta_1^2 + \theta_2^2 = 0$. There are infinitely many such vectors. The same is true for vectors $(\theta_1, \theta_2, -1)$. Note that in (39) one can replace $M$ by $M'$ for any $\xi \in \mathbb{R}^3$, $\xi \ne 2e_3$.
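The construction (40) is easy to check numerically. The sketch below is an illustration with hypothetical function names: it builds $\theta$ and $\theta'$ for $\xi = te_3$ from a chosen complex $\zeta_1$, taking $\zeta_2$ as a square root of $1 - t^2/4 - \zeta_1^2$. Note that the product defining $M$ is bilinear, with no complex conjugation.

```python
import cmath

def theta_pair(t, zeta1):
    """Return (theta, theta_prime) from (40) for xi = t*e_3 and a complex zeta1."""
    zeta2 = cmath.sqrt(1.0 - t * t / 4.0 - zeta1 * zeta1)
    theta = (zeta1, zeta2, -t / 2.0)
    theta_prime = (zeta1, zeta2, t / 2.0)
    return theta, theta_prime

def bilinear(u, w):
    """Bilinear (not Hermitian) product used in the definition (38) of M."""
    return sum(a * b for a, b in zip(u, w))

def euclidean_norm(u):
    """Usual norm |theta| of a complex vector."""
    return sum(abs(a) ** 2 for a in u) ** 0.5

theta, theta_prime = theta_pair(3.0, 50j)
```

Taking $\zeta_1$ purely imaginary and large in modulus makes $|\theta|$ large while $\theta \cdot \theta = 1$ and $\theta' - \theta = \xi$ continue to hold, which is exactly what (39) requires.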
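Let us state two estimates proved in [Ram94a]:

$$\max_{\alpha \in S^2} |A_\ell(\alpha)| \le c \left(\frac{a}{\ell}\right)^{1/2} \left(\frac{ae}{2\ell}\right)^{\ell+1}, \qquad (41)$$

where $c > 0$ is a constant depending on the norm $\|q\|_{L^2(B_a)}$, and

$$|Y_\ell(\theta)| \le \frac{1}{\sqrt{4\pi}} \frac{e^{r|\mathrm{Im}\,\theta|}}{|j_\ell(r)|}, \quad \forall r > 0, \quad \theta \in M', \qquad (42)$$

where

$$j_\ell(r) := \left(\frac{\pi}{2r}\right)^{1/2} J_{\ell+\frac{1}{2}}(r) = \frac{1}{2\sqrt{2}} \frac{1}{\ell} \left(\frac{er}{2\ell}\right)^{\ell} [1 + o(1)] \quad \text{as } \ell \to \infty, \qquad (43)$$

and $J_\ell(r)$ is the Bessel function regular at $r = 0$. Note that $Y_\ell(\alpha')$, defined above, admits a natural analytic continuation from $S^2$ to $M$ by taking $\vartheta$ and $\varphi$ to be arbitrary complex numbers. The resulting $\theta' \in M' \subset M$.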
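The large-order asymptotics of the spherical Bessel function can be checked numerically. The following sketch (hypothetical function names, standard library only) evaluates $j_\ell(r)$ from its power series and compares it with the leading term $\frac{1}{2\sqrt{2}\,\ell}\left(\frac{er}{2\ell}\right)^{\ell}$, which is assumed to be the intended reading of the asymptotic formula above.

```python
import math

def spherical_jn(ell, r, terms=60):
    """Spherical Bessel function j_ell(r) from its power series.

    j_ell(r) = r^ell * sum_k (-r^2/2)^k / (k! * (2*ell + 2*k + 1)!!);
    adequate for moderate r.
    """
    term = 1.0
    for n in range(1, ell + 1):
        term *= r / (2 * n + 1)              # builds r**ell / (2*ell + 1)!!
    total = 0.0
    for k in range(terms):
        total += term
        term *= -(r * r / 2.0) / ((k + 1) * (2 * ell + 2 * k + 3))
    return total

def asymptotic_jn(ell, r):
    """Leading term (1 / (2*sqrt(2)*ell)) * (e*r / (2*ell))**ell, via logarithms."""
    log_value = ell * (1.0 + math.log(r) - math.log(2.0 * ell))
    log_value -= math.log(2.0 * math.sqrt(2.0) * ell)
    return math.exp(log_value)

ratio = spherical_jn(100, 1.0) / asymptotic_jn(100, 1.0)
```

For $\ell = 100$ and $r = 1$ the ratio is within about one percent of 1, consistent with a relative correction of order $1/\ell$.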
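The series (37) converges absolutely and uniformly on the sets $S^2 \times M_c$, where $M_c$ is any compact subset of $M$.
Fix any numbers $a_1$ and $b$, such that $a < a_1 < b$. Let $\|\cdot\|$ denote the $L^2(a_1 \le |x| \le b)$-norm. If $|x| \ge a$, then the scattering solution is given analytically:

$$u(x, \alpha) = e^{i\alpha\cdot x} + \sum_{\ell=0}^{\infty} A_\ell(\alpha) Y_\ell(\alpha') h_\ell(r), \quad r := |x| > a, \quad \alpha' := \frac{x}{r}, \qquad (44)$$

where $A_\ell(\alpha)$ and $Y_\ell(\alpha')$ are defined above,

$$h_\ell(r) := e^{i\frac{\pi}{2}(\ell+1)} \sqrt{\frac{\pi}{2r}}\, H^{(1)}_{\ell+\frac{1}{2}}(r),$$

$H^{(1)}_\ell(r)$ is the Hankel function, and the normalizing factor is chosen so that $h_\ell(r) = \frac{e^{ir}}{r}[1 + o(1)]$ as $r \to \infty$. Define

$$\rho(x) := \rho(x; \nu) := e^{-i\theta\cdot x} \int_{S^2} u(x, \alpha)\nu(\alpha, \theta)\, d\alpha - 1, \quad \nu \in L^2(S^2). \qquad (45)$$
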

Consider the minimization problem

$$\|\rho\| = \inf := d(\theta), \qquad (46)$$

where the infimum is taken over all $\nu \in L^2(S^2)$, and (39) holds.
It is proved in [Ram94a] that

$$d(\theta) \le c|\theta|^{-1} \quad \text{if } \theta \in M, \quad |\theta| \gg 1. \qquad (47)$$

The symbol $|\theta| \gg 1$ means that $|\theta|$ is sufficiently large. The constant $c > 0$ in (47) depends on the norm $\|q\|_{L^2(B_a)}$ but not on the potential $q(x)$ itself.
An algorithm for computing a function $\nu(\alpha, \theta)$, which can be used for inversion of the exact, fixed-energy, three-dimensional scattering data, is as follows:
a) Find an approximate solution to (46) in the sense

$$\|\rho(x, \nu)\| < 2d(\theta), \qquad (48)$$

where in place of the factor 2 in (48) one could put any fixed constant greater than 1.
b) Any such $\nu(\alpha, \theta)$ generates an estimate of $\tilde q(\xi)$ with the error $O\left(\frac{1}{|\theta|}\right)$, $|\theta| \to \infty$. This estimate is calculated by the formula

$$\hat q := -4\pi \int_{S^2} A(\theta', \alpha)\nu(\alpha, \theta)\, d\alpha, \qquad (49)$$

where $\nu(\alpha, \theta) \in L^2(S^2)$ is any function satisfying (48).

Our basic result is:

Theorem 1. Let (39) and (48) hold. Then

$$\sup_{\xi \in \mathbb{R}^3} |\hat q - \tilde q(\xi)| \le \frac{c}{|\theta|}, \quad |\theta| \to \infty. \qquad (50)$$

The constant $c > 0$ in (50) depends on a norm of $q$, but not on a particular $q$.

The norm of $q$ in the above Theorem can be any norm such that the set $\{q : \|q\| \le \mathrm{const}\}$ is a compact set in $L^\infty(B_a)$.
In [Ram94a, Ram02a] an inversion algorithm is formulated also for noisy data, and the error estimate for this algorithm is obtained. Let us describe these results.
Assume that the scattering data are given with some error: a function $A_\delta(\alpha', \alpha)$ is given such that

$$\sup_{\alpha', \alpha \in S^2} |A(\alpha', \alpha) - A_\delta(\alpha', \alpha)| \le \delta. \qquad (51)$$

We emphasize that $A_\delta(\alpha', \alpha)$ is not necessarily a scattering amplitude corresponding to some potential; it is an arbitrary function in $L^\infty(S^2 \times S^2)$ satisfying (51). It is assumed that the unknown function $A(\alpha', \alpha)$ is the scattering amplitude corresponding to a $q \in Q$.
The problem is: Find an algorithm for calculating $\hat q_\delta$ such that

$$\sup_{\xi \in \mathbb{R}^3} |\hat q_\delta - \tilde q(\xi)| \le \eta(\delta), \quad \eta(\delta) \to 0 \text{ as } \delta \to 0, \qquad (52)$$

and estimate the rate at which $\eta(\delta)$ tends to zero.

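An algorithm for inversion of noisy data will now be described.
Let

$$N(\delta) := \left[\frac{|\ln\delta|}{\ln|\ln\delta|}\right], \qquad (53)$$

where $[x]$ is the integer nearest to $x > 0$,

$$\hat A_\delta(\theta', \alpha) := \sum_{\ell=0}^{N(\delta)} A_{\delta\ell}(\alpha) Y_\ell(\theta'), \quad A_{\delta\ell}(\alpha) := \int_{S^2} A_\delta(\alpha', \alpha) \overline{Y_\ell(\alpha')}\, d\alpha', \qquad (54)$$

$$u_\delta(x, \alpha) := e^{i\alpha\cdot x} + \sum_{\ell=0}^{N(\delta)} A_{\delta\ell}(\alpha) Y_\ell(\alpha') h_\ell(r), \qquad (55)$$

$$\rho_\delta(x; \nu) := e^{-i\theta\cdot x} \int_{S^2} u_\delta(x, \alpha)\nu(\alpha)\, d\alpha - 1, \quad \theta \in M, \qquad (56)$$

$$\mu(\delta) := e^{-\gamma N(\delta)}, \quad \gamma = \ln\frac{a_1}{a} > 0, \qquad (57)$$

$$a(\nu) := \|\nu\|_{L^2(S^2)}, \quad \kappa := |\mathrm{Im}\,\theta|. \qquad (58)$$

Consider the variational problem with constraints:

$$|\theta| = \sup := \vartheta(\delta), \qquad (59)$$

$$|\theta| \left[\|\rho_\delta(\nu)\| + a(\nu) e^{\kappa b} \mu(\delta)\right] \le c, \quad \theta \in M, \quad |\theta| = \sup := \vartheta(\delta), \qquad (60)$$

the norm is defined above (44), and it is assumed that (39) holds, where $\xi \in \mathbb{R}^3$ is an arbitrary fixed vector, $c > 0$ is a sufficiently large constant, and the supremum is taken over $\theta \in M$ and $\nu \in L^2(S^2)$ under the constraint (60). By $c$ we denote various positive constants.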
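To get a feeling for how slowly the truncation order grows as the noise level decreases, one can tabulate $N(\delta)$ and $\mu(\delta)$. The sketch below assumes the reading $N(\delta) = [\,|\ln\delta| / \ln|\ln\delta|\,]$ of (53) and $\mu(\delta) = e^{-\gamma N(\delta)}$ with $\gamma = \ln(a_1/a)$ of (57). The function names are hypothetical.

```python
import math

def truncation_order(delta):
    """N(delta): nearest integer to |ln delta| / ln|ln delta|.

    Requires 0 < delta < 1/e so that ln|ln delta| is positive.
    """
    log_delta = abs(math.log(delta))
    return round(log_delta / math.log(log_delta))

def mu(delta, a, a1):
    """mu(delta) = exp(-gamma * N(delta)) with gamma = ln(a1 / a) > 0."""
    return math.exp(-math.log(a1 / a) * truncation_order(delta))

orders = [truncation_order(d) for d in (1e-3, 1e-6, 1e-10)]
```

Under this reading, even a noise level of $10^{-10}$ admits only a handful of harmonics in the truncated sums (54) and (55), which reflects the severe ill-posedness of the problem.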
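Given $\xi \in \mathbb{R}^3$ one can always find $\theta$ and $\theta'$ such that (39) holds. We prove that $\vartheta(\delta) \to \infty$, more precisely:

Let the pair $\theta(\delta)$ and $\nu_\delta(\alpha, \theta)$ be any approximate solution to problem (59)-(60) in the sense that

$$|\theta(\delta)| \ge \frac{\vartheta(\delta)}{2}. \qquad (62)$$

Calculate

$$\hat q_\delta := -4\pi \int_{S^2} \hat A_\delta(\theta', \alpha)\nu_\delta(\alpha, \theta)\, d\alpha. \qquad (63)$$

Theorem 2. If (39) and (62) hold, then

$$\sup_{\xi \in \mathbb{R}^3} |\hat q_\delta - \tilde q(\xi)| \le c\, \frac{(\ln|\ln\delta|)^2}{|\ln\delta|} \quad \text{as } \delta \to 0, \qquad (64)$$

where $c > 0$ is a constant depending on a norm of $q$.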

In [Ram94a] estimates (50) and (64) were formulated with the supremum
taken over an arbitrary large but fixed ball of radius ^o- Here these estimates
are improved: ^o = oo- The key point is: the constant c > 0 in the estimate
(47) does not depend on 6.
Remark. In [Ram96] (see also [Ram92a, Ram02a]) an analysis of the
approach to ISP, based on the recovery of the DN (Dirichle-to-Neumann)
map from the fixed-energy scattering data, is given. This approach is discussed
below.
The basic numerical difficulty of the approach described in Theorems 1 and
2 comes from solving problems (46) for exact data, and problem (59)-(60) for
noisy data. Solving (46) amounts to finding a global minimizer of a quadratic form of the variables $c_\ell$, if one takes $\nu$ in (45) as a linear combination of the spherical harmonics: $\nu = \sum_{\ell=0}^{L} c_\ell Y_\ell(\alpha)$. If one uses the necessary condition for a minimizer of a quadratic form, that is, a linear system, then the matrix of this system is ill-conditioned for large $L$. This causes the main difficulty
in the numerical solution of (46). On the other hand, there are methods for
global minimization of the quadratic functionals, based on the gradient de-
scent, which may be more efficient than using the above necessary condition.

5.3 Discussion of the inversion method which uses the DN map

In [Ram96] the following inversion method is discussed:


$$\tilde q(\xi) = \lim_{|\theta|\to\infty}\int_S \exp(-i\theta'\cdot s)(\Lambda - \Lambda_0)\psi\,ds, \tag{65}$$
where (39) is assumed, $\Lambda$ is the Dirichlet-to-Neumann (DN) map, $\psi$ is found from the equation:

$$\psi(s) = \psi_0(s) - \int_S G(s-t)B\psi\,dt, \qquad B := \Lambda - \Lambda_0, \qquad \psi_0(s) := e^{i\theta\cdot s}, \tag{66}$$
and G is defined by the formula:

The DN map is constructed from the fixed-energy scattering data $A(\alpha',\alpha)$ by the method of [Ram96] (see also [Ram94a]).
Namely, given $A(\alpha',\alpha)$ for all $\alpha',\alpha\in S^2$, one finds $\Lambda$ using the following steps.
Let $f\in H^{3/2}(S)$ be given, $S$ is a sphere of radius $a$ centered at the origin, $f_\ell$ are its Fourier coefficients in the basis of the spherical harmonics,

$$w = \sum_{\ell=0}^{\infty} f_\ell Y_\ell(x^0)\frac{h_\ell(r)}{h_\ell(a)}, \qquad r\ge a, \qquad x^0 := \frac{x}{r}, \qquad r := |x|. \tag{68}$$

Let
$$w = \int_S g(x,s)\sigma(s)\,ds, \tag{69}$$

where $\sigma$ is some function, which we find below, and $g$ is the Green function (resolvent kernel) of the Schroedinger operator, satisfying the radiation condition at infinity. Then
$$w_N^- = w_N^+ + \sigma, \tag{70}$$
where $N$ is the outer normal to $S$, so $N$ is directed along the radius-vector. We require $w = f$ on $S$. Then $w$ is given by (68) in the exterior of $S$, and

$$w_N^+ = \sum_{\ell=0}^{\infty} f_\ell Y_\ell(x^0)\frac{h_\ell'(a)}{h_\ell(a)}. \tag{71}$$

By formulas (70) and (71), finding $\Lambda$ is equivalent to finding $\sigma$. By (69), asymptotics of $w$ as $r := |x|\to\infty$, $x/|x| := x^0$, is (cf [Ram94a], p.67):

$$w = \frac{e^{ir}}{r}\,\frac{1}{4\pi}\int_S u(y,-x^0)\sigma(y)\,dy + o\!\left(\frac{1}{r}\right), \tag{72}$$
where $u$ is the scattering solution,

$$u(y,-x^0) = e^{-ix^0\cdot y} + \sum_{\ell=0}^{\infty} A_\ell(-x^0)Y_\ell(y^0)h_\ell(|y|). \tag{73}$$
From (68), (72) and (73) one gets an equation for finding $\sigma$ ([Ram96], eq. (23), see also [Ram94a], p. 199):

$$\frac{f_\ell}{h_\ell(a)} = \frac{1}{4\pi}\int_S ds\,\sigma(s)\,\big(u(s,-\beta),\,Y_\ell(\beta)\big)_{L^2(S^2)}, \tag{74}$$

which can be written as a linear system:

i^ = a\-iyTM'^^i'jM)Sw+Ainhv{a% (75)

for the Fourier coefficients $\sigma_\ell$ of $\sigma$. The coefficients

are the Fourier coefficients of the scattering amplitude. Problems (74) and
(75) are very ill-posed (see [Ram96] for details).
This approach faces many difficulties:
1) The construction of the DN map from the scattering data is a very
ill-posed problem,
2) The construction of the potential from the DN map is a very difficult problem numerically, because one has to solve a Fredholm-type integral equation (equation (66)) whose kernel contains $G$, defined in (67). This $G$ is a tempered distribution, and it is very difficult to compute it,
3) One has to calculate a limit of an integral whose integrand grows ex-
ponentially to infinity if a factor in the integrand is not known exactly. The
solution of equation (66) is one of the factors in the integrand. It cannot be
known exactly in practice because it cannot be calculated with arbitrary ac-
curacy even if the scattering data are known exactly. Therefore the limit in
formula (65) cannot be calculated accurately.
No error estimates are obtained for this approach.
In contrast, in Ramm's method, there is no need to compute G, to solve
equation (66), to calculate the DN map from the scattering data, and to
compute the limit (65). The basic difficulty in Ramm's inversion method for
exact data is to minimize the quadratic form (46), and for noisy data to
solve optimization problem (59)-(60). Error estimates are obtained for Ramm's method.

6 Obstacle scattering by the Modified Rayleigh Conjecture (MRC) method.
6.1 Problem description

In this section we present a novel numerical method for Direct Obstacle Scat-
tering Problems based on the Modified Rayleigh Conjecture (MRC). The basic
theoretical foundation of the method was developed in [Ram02b]. The MRC
has the appeal of an easy implementation for obstacles of complicated geome-
try, e.g. having edges and corners. A special version of the MRC method was
used in [GR05] to compute the scattered field for 3D obstacles. In our numer-
ical experiments the method has shown itself to be a competitive alternative
to the BIEM (boundary integral equations method), see [GR02b]. Also, unlike
the BIEM, one can apply the algorithm to different obstacles with very little
additional effort.
We formulate the obstacle scattering problem in a 3D setting with the
Dirichlet boundary condition, but the discussed method can also be used for
the Neumann and Robin boundary conditions.
Consider a bounded domain $D\subset\mathbb{R}^3$, with a boundary $S$ which is assumed to be Lipschitz continuous. Denote the exterior domain by $D' = \mathbb{R}^3\setminus D$. Let $\alpha, \alpha'\in S^2$ be unit vectors, and $S^2$ be the unit sphere in $\mathbb{R}^3$.
The acoustic wave scattering problem by a soft obstacle $D$ consists in finding the (unique) solution to the problem (76)-(77):

$$(\nabla^2 + k^2)u = 0 \ \text{ in } D', \qquad u = 0 \ \text{ on } S, \tag{76}$$

$$u = u_0 + A(\alpha',\alpha)\frac{e^{ikr}}{r} + o\!\left(\frac{1}{r}\right), \qquad r := |x|\to\infty, \qquad \alpha' := \frac{x}{r}. \tag{77}$$

Here $u_0 := e^{ik\alpha\cdot x}$ is the incident field, $v := u - u_0$ is the scattered field, $A(\alpha',\alpha)$ is called the scattering amplitude, its $k$-dependence is not shown, $k > 0$ is the wavenumber. Denote

$$A_\ell(\alpha) := \int_{S^2} A(\alpha',\alpha)\overline{Y_\ell(\alpha')}\,d\alpha', \tag{78}$$

where $Y_\ell(\alpha)$ are the orthonormal spherical harmonics, $Y_\ell = Y_{\ell m}$, $-\ell\le m\le\ell$. Let $h_\ell(r)$ be the spherical Hankel functions, normalized so that $h_\ell(r)\sim e^{ir}/r$ as $r\to+\infty$.
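
The normalization of $h_\ell$ can be checked directly for the lowest orders. The sketch below assumes that the normalized functions are $i^{\ell+1}h_\ell^{(1)}$, with $h_\ell^{(1)}$ the standard spherical Hankel functions of the first kind; this is one convention consistent with the stated asymptotics $e^{ir}/r$, and the function names are illustrative.

```python
import cmath

def h0(r):
    """Normalized order-0 function: i * h_0^(1)(r), which equals e^{ir}/r exactly."""
    return cmath.exp(1j * r) / r

def h1(r):
    """Normalized order-1 function: i^2 * h_1^(1)(r), with h_1^(1)(r) = -e^{ir}(r + i)/r^2."""
    return cmath.exp(1j * r) * (r + 1j) / r**2

def ratio_to_leading_term(h, r):
    """h(r) divided by e^{ir}/r; tends to 1 as r grows if h is normalized as assumed."""
    return h(r) * r * cmath.exp(-1j * r)

for r in (10.0, 100.0, 1000.0):
    print(r, abs(ratio_to_leading_term(h1, r) - 1.0))
```

The deviation for order 1 is exactly $1/r$ in modulus, so the leading term is a poor approximation near the obstacle and the full functions must be used there.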
Informally, the Random Multi-point MRC algorithm can be described as
follows.
Fix a $J > 0$. Let $x_j$, $j = 1,2,\dots,J$ be a batch of points randomly chosen inside the obstacle $D$. For $x\in D'$, let

$$\alpha' = \frac{x - x_j}{|x - x_j|}, \qquad \psi_\ell(x,x_j) = Y_\ell(\alpha')h_\ell(k|x - x_j|). \tag{79}$$

Let $g(x) = u_0(x)$, $x\in S$, and minimize the discrepancy

$$\Phi(c) = \Big\|g(x) + \sum_{j=1}^{J}\sum_{\ell=0}^{L} c_{\ell,j}\psi_\ell(x,x_j)\Big\|_{L^2(S)}, \tag{80}$$

over $c\in\mathbb{C}^N$, where $c = \{c_{\ell,j}\}$. That is, the total field $u = g(x) + v$ is desired to be as close to zero as possible at the boundary $S$, to satisfy the required condition for the soft scattering. If the resulting residual $r^{min} = \min\Phi$ is smaller than the prescribed tolerance $\epsilon$, then the procedure is finished, and the sought scattered field is

$$v_\epsilon(x) = \sum_{j=1}^{J}\sum_{\ell=0}^{L} c_{\ell,j}\psi_\ell(x,x_j), \qquad x\in D',$$

(see Lemma 1 below).


If, on the other hand, the residual $r^{min} > \epsilon$, then we continue by trying to improve on the already obtained fit in (80). Adjust the field on the boundary by letting $g(x) := g(x) + v_\epsilon(x)$, $x\in S$. Create another batch of $J$ points randomly chosen in the interior of $D$, and minimize (80) with this new $g(x)$. Continue with the iterations until the required tolerance $\epsilon$ on the boundary $S$ is attained, at the same time keeping track of the changing field $v_\epsilon$.
Note that the minimization in (80) is always done over the same number of points $J$. However, the points $x_j$ are sought to be different in each iteration to assure that the minimal values of $\Phi$ are decreasing in subsequent iterations. Thus, computationally, the size of the minimization problem remains the same. This is the new feature of the Random Multi-point MRC method, which allows it to solve scattering problems untreatable by previously developed MRC methods, see [GR02b].
Here is the precise description of the algorithm.
Random Multi-point MRC.
For $x_j\in D$, and $\ell\ge 0$ functions $\psi_\ell(x,x_j)$ are defined as in (79).
1. Initialization. Fix $\epsilon > 0$, $L\ge 0$, $J > 0$, $N_{max} > 0$. Let $n = 0$, $v_\epsilon = 0$ and $g(x) = u_0(x)$, $x\in S$.
2. Iteration.
a) Let $n := n + 1$. Randomly choose $J$ points $x_j\in D$, $j = 1,2,\dots,J$.
b) Minimize
$$\Phi(c) = \Big\|g(x) + \sum_{j=1}^{J}\sum_{\ell=0}^{L} c_{\ell,j}\psi_\ell(x,x_j)\Big\|_{L^2(S)}$$
over $c\in\mathbb{C}^N$, where $c = \{c_{\ell,j}\}$. Let the minimal value of $\Phi$ be $r^{min}$.
c) Let
$$v_\epsilon(x) := v_\epsilon(x) + \sum_{j=1}^{J}\sum_{\ell=0}^{L} c_{\ell,j}\psi_\ell(x,x_j), \qquad x\in D'.$$
3. Stopping criterion.
a) If $r^{min}\le\epsilon$, then stop.
b) If $r^{min} > \epsilon$, and $n\ne N_{max}$, let
$$g(x) := g(x) + \sum_{j=1}^{J}\sum_{\ell=0}^{L} c_{\ell,j}\psi_\ell(x,x_j), \qquad x\in S,$$
and repeat the iterative step (2).
c) If $r^{min} > \epsilon$, and $n = N_{max}$, then the procedure failed.
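
The iteration can be sketched in a few lines for the simplest setting. The code below is only an illustration, not the authors' implementation: it assumes the obstacle is the unit ball, takes $L = 0$ so that each basis function reduces to a point source $e^{ik|x-x_j|}/|x-x_j|$ (the constant $Y_0$ is absorbed into the coefficient), draws boundary nodes at random, and uses the `rcond` argument of `numpy.linalg.lstsq` as a relative singular-value cut-off. All names and default values are hypothetical.

```python
import numpy as np

def random_points_in_ball(n, radius, rng):
    """n points drawn uniformly from the ball |x| < radius."""
    d = rng.normal(size=(n, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return d * radius * rng.random((n, 1)) ** (1.0 / 3.0)

def random_multipoint_mrc(k, alpha, nodes, J=20, eps=1e-3, n_max=50,
                          w_min=1e-8, seed=0):
    """Random Multi-point MRC sketch with L = 0 for the unit ball.

    Returns the normalized boundary residuals (one per iteration) and the
    list of (interior points, coefficients) pairs that define v_eps.
    """
    rng = np.random.default_rng(seed)
    g = np.exp(1j * k * (nodes @ alpha))          # step 1: g = u0 on S
    residuals, sources = [], []
    for _ in range(n_max):
        x = random_points_in_ball(J, 0.9, rng)    # step 2a: random interior points
        r = np.linalg.norm(nodes[:, None, :] - x[None, :, :], axis=2)
        A = np.exp(1j * k * r) / r                # point sources sampled at the nodes
        # step 2b: minimize ||g + A c||; rcond drops small singular values
        # (relative to the largest one)
        c = np.linalg.lstsq(A, -g, rcond=w_min)[0]
        sources.append((x, c))                    # step 2c: terms added to v_eps
        g = g + A @ c                             # step 3b: updated boundary field
        residuals.append(np.sqrt(np.mean(np.abs(g) ** 2)))
        if residuals[-1] <= eps:                  # step 3a: tolerance reached
            break
    return residuals, sources

rng = np.random.default_rng(1)
nodes = rng.normal(size=(300, 3))
nodes /= np.linalg.norm(nodes, axis=1, keepdims=True)
res, sources = random_multipoint_mrc(1.0, np.array([0.0, 0.0, 1.0]), nodes)
print(len(res), res[-1])
```

Because $c = 0$ is always admissible in step 2b, the residual cannot increase from one iteration to the next, while the size of each least-squares problem stays fixed at $M\times J$.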

6.2 Direct scattering problems and the Rayleigh conjecture.

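Let a ball $B_R := \{x : |x|\le R\}$ contain the obstacle $D$. In the region $r > R$ the solution to (76)-(77) is:

$$u(x,\alpha) = e^{ik\alpha\cdot x} + \sum_{\ell=0}^{\infty} A_\ell(\alpha)\psi_\ell, \qquad \psi_\ell := Y_\ell(\alpha')h_\ell(kr), \qquad r > R, \qquad \alpha' = \frac{x}{r}, \tag{81}$$
where the sum includes the summation with respect to $m$, $-\ell\le m\le\ell$, and $A_\ell(\alpha)$ are defined in (78).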
The Rayleigh conjecture (RC) is: the series (81) converges up to the bound-
ary S (originally RC dealt with periodic structures, gratings). This conjecture
is false for many obstacles, but is true for some ([Bar71, Mil73, Ram86]). For
example, if $n = 2$ and $D$ is an ellipse, then the series analogous to (81) converges in the region $r > a$, where $2a$ is the distance between the foci of the
ellipse [Bar71]. In the engineering literature there are numerical algorithms,
based on the Rayleigh conjecture. Our aim is to give a formulation of a Mod-
ified Rayleigh Conjecture (MRC) which holds for any Lipschitz obstacle and
can be used in numerical solution of the direct and inverse scattering problems
(see [Ram02b]). We discuss the Dirichlet condition but similar argument is
applicable to the Neumann boundary condition, corresponding to acoustically
hard obstacles.
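Fix $\epsilon > 0$, an arbitrary small number.
Lemma 1. There exist $L = L(\epsilon)$ and $c_\ell = c_\ell(\epsilon)$ such that

$$\Big\|u_0 + \sum_{\ell=0}^{L(\epsilon)} c_\ell(\epsilon)\psi_\ell\Big\|_{L^2(S)} \le \epsilon. \tag{82}$$

If (82) and the boundary condition (76) hold, then

$$\|v_\epsilon - v\|_{L^2(S)} \le \epsilon, \qquad v_\epsilon := \sum_{\ell=0}^{L(\epsilon)} c_\ell(\epsilon)\psi_\ell. \tag{83}$$

Lemma 2. If (83) holds then

$$|||v_\epsilon - v||| = O(\epsilon), \qquad \epsilon\to 0, \tag{84}$$

where $|||\cdot||| := \|\cdot\|_{H^m_{loc}(D')} + \|\cdot\|_{L^2(D';(1+|x|)^{-\gamma})}$, $\gamma > 1$, $m > 0$ is an arbitrary integer, $H^m$ is the Sobolev space, and $v_\epsilon$, $v$ in (84) are functions defined in $D'$.
In particular, (84) implies

$$\|v_\epsilon - v\|_{L^2(S_R)} = O(\epsilon), \qquad \epsilon\to 0, \tag{85}$$

where $S_R$ is the sphere centered at the origin with radius $R$.

Lemma 3. One has:

$$c_\ell(\epsilon)\to A_\ell(\alpha), \qquad \forall\ell, \qquad \epsilon\to 0. \tag{86}$$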
The Modified Rayleigh Conjecture (MRC) is formulated as a theorem, which follows from the above three lemmas:

Theorem 3. For an arbitrary small $\epsilon > 0$ there exist $L(\epsilon)$ and $c_\ell(\epsilon)$, $0\le\ell\le L(\epsilon)$, such that (82), (84) and (86) hold.

See [Ram02b] for a proof of the above statements.

The difference between RC and MRC is: (83) does not hold if one replaces $v_\epsilon$ by $\sum_{\ell=0}^{L} A_\ell(\alpha)\psi_\ell$, and lets $L\to\infty$ (instead of letting $\epsilon\to 0$). Indeed, the series $\sum_{\ell=0}^{\infty} A_\ell(\alpha)\psi_\ell$ diverges at some points of the boundary for many obstacles. Note also that the coefficients in (83) depend on $\epsilon$, so (83) is not a partial sum of a series.
For the Neumann boundary condition one minimizes

$$\left\|\frac{\partial\big[u_0 + \sum_{\ell=0}^{L} c_\ell\psi_\ell\big]}{\partial N}\right\|_{L^2(S)}$$

with respect to $c_\ell$. Analogs of Lemmas 1-3 are valid and their proofs are
essentially the same.
See [Ram04c] for an extension of these results to scattering by periodic
structures.

6.3 Numerical Experiments.

In this section we describe numerical results obtained by the Random Multi-point MRC method for 2D and 3D obstacles. We also compare the 2D results to the ones obtained by our earlier method introduced in [GR02b]. The method that we used previously can be described as a Multi-point MRC. Its difference from the Random Multi-point MRC method is twofold: It is just the first iteration of the Random method, and the interior points $x_j$, $j = 1,2,\dots,J$ were chosen deterministically, by an ad hoc method according to the geometry of the obstacle $D$. The number of points $J$ was limited by the size of the resulting numerical minimization problem, so the accuracy of the scattering solution (i.e. the residual $r^{min}$) could not be made small for many obstacles. The method was not capable of treating 3D obstacles. These limitations were removed by using the Random Multi-point MRC method. As we mentioned previously, [GR02b] contains a favorable comparison of the Multi-point MRC method with the BIEM, in spite of the fact that the numerical implementation of the MRC method in [GR02b] is considerably less efficient than the one presented in this paper.
A numerical implementation of the Random Multi-point MRC method
follows the same outline as for the Multi-point MRC, which was described in
[GR02b]. Of course, in a 2D case, instead of (79) one has

$$\psi_\ell(x,x_j) = H_\ell^{(1)}(k|x - x_j|)\,e^{i\ell\theta_j},$$
where $(x - x_j)/|x - x_j| = e^{i\theta_j}$.
For a numerical implementation choose $M$ nodes $\{t_m\}$ on the surface $S$ of the obstacle $D$. After the interior points $x_j$, $j = 1,2,\dots,J$ are chosen, form $N$ vectors

$$a^{(n)} = \{\psi_\ell(t_m,x_j)\}_{m=1}^{M},$$

$n = 1,2,\dots,N$ of length $M$. Note that $N = (2L+1)J$ for a 2D case, and $N = (L+1)^2J$ for a 3D case. It is convenient to normalize the norm in $\mathbb{R}^M$ by
$$\|b\|^2 = \frac{1}{M}\sum_{m=1}^{M}|b_m|^2, \qquad b = (b_1,b_2,\dots,b_M).$$
Then $\|u_0\| = 1$.
Now let $b = \{g(t_m)\}_{m=1}^{M}$, in the Random Multi-point MRC (see section 1), and minimize
$$\Phi(c) = \|b + Ac\|, \tag{87}$$
for $c\in\mathbb{C}^N$, where $A$ is the matrix containing vectors $a^{(n)}$, $n = 1,2,\dots,N$ as its columns.
We used the Singular Value Decomposition (SVD) method (see e.g. [PTVF92]) to minimize (87). Small singular values $s_n < w_{min}$ of the matrix $A$ are used to identify and delete linearly dependent or almost linearly dependent combinations of vectors $a^{(n)}$. This spectral cut-off makes the minimization process stable, see the details in [GR02b].
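
A minimal sketch of such a spectral cut-off, assuming an absolute threshold `w_min` on the singular values and the normalized norm $\|b\|^2 = \frac{1}{M}\sum|b_m|^2$; the function name is illustrative and the details of [GR02b] may differ.

```python
import numpy as np

def truncated_svd_solve(A, b, w_min):
    """Minimize ||b + A c|| using only singular values larger than w_min.

    Returns the coefficient vector c and the normalized residual
    sqrt((1/M) * sum |b + A c|^2).
    """
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    keep = s > w_min                               # spectral cut-off
    # c = -A^+ b restricted to the retained singular triplets
    coeff = (U[:, keep].conj().T @ (-b)) / s[keep]
    c = Vh[keep].conj().T @ coeff
    residual = np.sqrt(np.mean(np.abs(b + A @ c) ** 2))
    return c, residual

# Two identical columns: the second singular value is numerically zero and is dropped.
A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
b = -np.array([2.0, 2.0, 0.0])
c, residual = truncated_svd_solve(A, b, w_min=1e-8)
print(c, residual)
```

Dropping the near-zero singular value returns the minimum-norm coefficients instead of dividing by a quantity at round-off level, which is what keeps the minimization stable.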
Let $r^{min}$ be the residual, i.e. the minimal value of $\Phi(c)$ attained after $N_{max}$ iterations of the Random Multi-point MRC method (or when it is stopped). For a comparison, let $r^{min}_{old}$ be the residual obtained in [GR02b] by an earlier method.
We conducted 2D numerical experiments for four obstacles: two ellipses
of different eccentricity, a kite, and a triangle. The $M = 720$ nodes $t_m$ were uniformly distributed on the interval $[0, 2\pi]$, used to parametrize the boundary $S$. Each case was tested for wave numbers $k = 1.0$ and $k = 5.0$. Each obstacle was subjected to incident waves corresponding to $\alpha = (1.0, 0.0)$ and $\alpha = (0.0, 1.0)$.
The results for the Random Multi-point MRC with $J = 1$ are shown in Table 10, in the last column $r^{min}$. In every experiment the target residual $\epsilon = 0.0001$ was obtained in under 6000 iterations, in about 2 minutes run time on a 2.8 MHz PC.
In [GR02b], we conducted numerical experiments for the same four 2D obstacles by a Multi-point MRC, as described in the beginning of this section. The interior points $x_j$ were chosen differently in each experiment. Their choice is indicated in the description of each 2D experiment. The column $J$ shows the number of these interior points. Values $L = 5$ and $M = 720$ were used in all the experiments. These results are shown in Table 10, column $r^{min}_{old}$.
Thus, the Random Multi-point MRC method achieved a significant im-
provement over the earlier Multi-point MRC.

Table 10. Normalized residuals attained in the numerical experiments for 2D obstacles, $\|u_0\| = 1$.

Experiment   J    k     alpha        r^min_old   r^min
I            4    1.0   (1.0,0.0)    0.000201    0.0001
             4    1.0   (0.0,1.0)    0.000357    0.0001
             4    5.0   (1.0,0.0)    0.001309    0.0001
             4    5.0   (0.0,1.0)    0.007228    0.0001
II           16   1.0   (1.0,0.0)    0.003555    0.0001
             16   1.0   (0.0,1.0)    0.002169    0.0001
             16   5.0   (1.0,0.0)    0.009673    0.0001
             16   5.0   (0.0,1.0)    0.007291    0.0001
III          16   1.0   (1.0,0.0)    0.008281    0.0001
             16   1.0   (0.0,1.0)    0.007523    0.0001
             16   5.0   (1.0,0.0)    0.021571    0.0001
             16   5.0   (0.0,1.0)    0.024360    0.0001
IV           32   1.0   (1.0,0.0)    0.006610    0.0001
             32   1.0   (0.0,1.0)    0.006785    0.0001
             32   5.0   (1.0,0.0)    0.034027    0.0001
             32   5.0   (0.0,1.0)    0.040129    0.0001

Experiment 2D-I. The boundary $S$ is an ellipse described by

$$r(t) = (2.0\cos t,\ \sin t), \qquad 0\le t < 2\pi. \tag{88}$$

The Multi-point MRC used $J = 4$ interior points $x_j = 0.7\,r\!\left(\frac{\pi(j-1)}{2}\right)$, $j = 1,\dots,4$. Run time was 2 seconds.
Experiment 2D-II. The kite-shaped boundary $S$ (see [CK92], Section 3.5) is described by

$$r(t) = (-0.65 + \cos t + 0.65\cos 2t,\ 1.5\sin t), \qquad 0\le t < 2\pi. \tag{89}$$

The Multi-point MRC used $J = 16$ interior points $x_j = 0.9\,r\!\left(\frac{\pi(j-1)}{8}\right)$, $j = 1,\dots,16$. Run time was 33 seconds.
Experiment 2D-III. The boundary $S$ is the triangle with vertices $(-1.0, 0.0)$ and $(1.0, \pm 1.0)$. The Multi-point MRC used the interior points $x_j = 0.9\,r\!\left(\frac{\pi(j-1)}{8}\right)$, $j = 1,\dots,16$. Run time was about 30 seconds.

Experiment 2D-IV. The boundary $S$ is an ellipse described by

$$r(t) = (0.1\cos t,\ \sin t), \qquad 0\le t < 2\pi. \tag{90}$$

The Multi-point MRC used $J = 32$ interior points $x_j = 0.95\,r\!\left(\frac{\pi(j-1)}{16}\right)$, $j = 1,\dots,32$. Run time was about 140 seconds.
The 3D numerical experiments were conducted for 3 obstacles: a sphere,
a cube, and an ellipsoid. We used the Random Multi-point MRC with L =
0, Wmin = 10~^^, and J = 80. The number M of the points on the boundary
S is indicated in the description of the obstacles. The scattered field for each
obstacle was computed for two incoming directions $\alpha_i = (\theta,\phi)$, $i = 1,2$, where $\phi$ was the polar angle. The first unit vector $\alpha_1$ is denoted by (1) in Table 11, $\alpha_1 = (0.0, \pi/2)$. The second one is denoted by (2), $\alpha_2 = (\pi/2, \pi/4)$. A typical number of iterations $N_{iter}$ and the run time on a 2.8 MHz PC are also shown in Table 11. For example, in experiment I with $k = 5.0$ it took about 700 iterations of the Random Multi-point MRC method to achieve the target residual $r^{min} = 0.001$ in 7 minutes.
Experiment 3D-I. The boundary $S$ is the sphere of radius 1, with $M = 450$.
Experiment 3D-II. The boundary $S$ is the surface of the cube $[-1,1]^3$ with $M = 1350$.
Experiment 3D-III. The boundary $S$ is the surface of the ellipsoid $x^2/16 + y^2 + z^2 = 1$ with $M = 450$.

Table 11. Normalized residuals attained in the numerical experiments for 3D obstacles, $\|u_0\| = 1$.

Experiment   k     alpha_i   r^min    N_iter   run time
I            1.0             0.0002   1        1 sec
             5.0             0.001    700      7 min
II           1.0   (1)       0.001    800      16 min
             1.0   (2)       0.001    200      4 min
             5.0   (1)       0.0035   2000     40 min
             5.0   (2)       0.002    2000     40 min
III          1.0   (1)       0.001    3600     37 min
             1.0   (2)       0.001    3000     31 min
             5.0   (1)       0.0026   5000     53 min
             5.0   (2)       0.001    5000     53 min

In the last experiment the run time could be reduced by taking a smaller value for $J$. For example, the choice of $J = 8$ reduced the running time to about 6-10 minutes.
Numerical experiments show that the minimization results depend on the choice of such parameters as $J$, $w_{min}$, and $L$. They also depend on the choice of the interior points $x_j$. It is possible that further versions of the MRC could be made more efficient by finding a more efficient rule for their placement. Numerical experiments in [GR02b] showed that the efficiency of the minimization greatly depended on the deterministic placement of the interior points, with better results obtained for these points placed sufficiently close to the boundary $S$ of the obstacle $D$, but not very close to it. The current choice of a random placement of the interior points $x_j$ reduced the variance in the obtained results, and eliminated the need to provide a justified algorithm for their placement. The random choice of these points distributes them in the entire interior of the obstacle, rather than in a subset of it.

6.4 Conclusions.

For a 3D obstacle, Rayleigh's hypothesis (conjecture) says that the acoustic field $u$ in the exterior of the obstacle $D$ is given by the series convergent up to the boundary of $D$:

$$u(x,\alpha) = e^{ik\alpha\cdot x} + \sum_{\ell=0}^{\infty} A_\ell(\alpha)\psi_\ell, \qquad \psi_\ell := Y_\ell(\alpha')h_\ell(kr), \qquad \alpha' = \frac{x}{r}. \tag{91}$$

While this conjecture (RC) is false for many obstacles, it has been modified
in [Ram02b] to obtain a valid representation for the solution of (76)-(77).
This representation (Theorem 3) is called the Modified Rayleigh Conjecture
(MRC), and is, in fact, not a conjecture, but a Theorem.
Can one use this approach to obtain solutions to various scattering prob-
lems? A straightforward numerical implementation of the MRC may fail, but,
as we show here, it can be efficiently implemented and allows one to obtain
accurate numerical solutions to obstacle scattering problems.
The Random Multi-point MRC algorithm was successfully applied to various 2D and 3D obstacle scattering problems. This algorithm is a significant improvement over the previous MRC implementation described in [GR02b]. The improvement is achieved by allowing the required minimizations to be done iteratively, while the previous methods were limited by the problem size constraints. In [GR02b], such an MRC method was presented, and it compared favorably to the Boundary Integral Equation Method.
The Random Multi-point MRC has an additional attractive feature, that it can easily treat obstacles with complicated geometry (e.g. edges and corners). Unlike the BIEM, it is easily modified to treat different obstacle shapes.
Further research on MRC algorithms is being conducted. It is hoped that the MRC in its various implementations can emerge as a valuable and efficient alternative to more established methods.

7 Support Function Method for inverse obstacle scattering problems.
7.1 Support Function Method (SFM)

The Inverse Scattering Problem consists of finding the obstacle $D$ from the Scattering Amplitude, or similarly observed data. The Support Function Method (SFM) was originally developed in a 3-D setting in [Ram70], see also [Ram86, pp. 94-99]. It is used to approximately locate the obstacle $D$. The method is derived using a high-frequency approximation to the scattered field for smooth, strictly convex obstacles. It turns out that this inexpensive method also provides a good localization of obstacles in the resonance region of frequencies. If the obstacle is not convex, then the SFM yields its convex hull.
One can restate the SFM in a 2-D setting as follows (see [GR03]). Let $D\subset\mathbb{R}^2$ be a smooth and strictly convex obstacle with the boundary $\Gamma$. Let $\nu(y)$ be the unique outward unit normal vector to $\Gamma$ at $y\in\Gamma$. Fix an incident direction $\alpha\in S^1$. Then the boundary $\Gamma$ can be decomposed into the following two parts:

$$\Gamma_+ = \{y\in\Gamma : \nu(y)\cdot\alpha < 0\}, \quad \text{and} \quad \Gamma_- = \{y\in\Gamma : \nu(y)\cdot\alpha\ge 0\}, \tag{92}$$

which are, correspondingly, the illuminated and the shadowed parts of the
boundary for the chosen incident direction a.
Given $\alpha\in S^1$, its specular point $s_0(\alpha)\in\Gamma_+$ is defined from the condition:
$$s_0(\alpha)\cdot\alpha = \min_{s\in\Gamma_+} s\cdot\alpha. \tag{93}$$

Note that the equation of the tangent line to $\Gamma_+$ at $s_0$ is

$$\langle x_1, x_2\rangle\cdot\alpha = s_0(\alpha)\cdot\alpha, \tag{94}$$

and
$$\nu(s_0(\alpha)) = -\alpha. \tag{95}$$
The Support function $d(\alpha)$ is defined by

$$d(\alpha) = s_0(\alpha)\cdot\alpha. \tag{96}$$

Thus $|d(\alpha)|$ is the distance from the origin to the unique tangent line to $\Gamma_+$ perpendicular to the incident vector $\alpha$. Since the obstacle $D$ is assumed to be convex
$$D = \bigcap_{\alpha\in S^1}\{x\in\mathbb{R}^2 : x\cdot\alpha\ge d(\alpha)\}. \tag{97}$$
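
With finitely many directions, the intersection of half-planes $\{x : x\cdot\alpha\ge d(\alpha)\}$ becomes a convex polygon that contains the obstacle. The sketch below tests membership in that polygon; the circle of radius 1 centered at $(6, 2)$ and the 16 directions are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def inside_support_polygon(x, directions, d):
    """True if x . alpha >= d(alpha) for every sampled direction alpha."""
    return bool(np.all(directions @ np.asarray(x, dtype=float) >= d - 1e-12))

angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
center = np.array([6.0, 2.0])
# For a unit circle about `center`, min over the boundary of s . alpha is center . alpha - 1.
d = directions @ center - 1.0

print(inside_support_polygon(center, directions, d))        # centre of the circle
print(inside_support_polygon([0.0, 0.0], directions, d))    # the origin
```

The polygon is slightly larger than the obstacle between sampled directions, so the test is a localization rather than an exact reconstruction.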
The boundary $\Gamma$ of $D$ is smooth, hence so is the Support Function. The knowledge of this function allows one to reconstruct the boundary $\Gamma$ using the following procedure.
Parametrize unit vectors $\mathbf{l}\in S^1$ by $\mathbf{l}(t) = (\cos t, \sin t)$, $0\le t < 2\pi$ and define
$$p(t) = d(\mathbf{l}(t)), \qquad 0\le t < 2\pi. \tag{98}$$
Equation (94) and the definition of the Support Function give

$$x_1\cos t + x_2\sin t = p(t). \tag{99}$$

Since $\Gamma$ is the envelope of its tangent lines, its equation can be found from (99) and
$$-x_1\sin t + x_2\cos t = p'(t). \tag{100}$$
Therefore the parametric equations of the boundary $\Gamma$ are

$$x_1(t) = p(t)\cos t - p'(t)\sin t, \qquad x_2(t) = p(t)\sin t + p'(t)\cos t. \tag{101}$$
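
The envelope formulas can be checked numerically on an example where the support function is known in closed form. The sketch below assumes a circle of radius 1 centered at $(6, 2)$, for which $p(t) = 6\cos t + 2\sin t - 1$, and approximates $p'(t)$ by a periodic central difference; the function name and grid size are illustrative.

```python
import numpy as np

def boundary_from_support(p, t):
    """Boundary points from samples p(t) on a uniform periodic grid t.

    Uses x1 = p cos t - p' sin t, x2 = p sin t + p' cos t, with p'
    approximated by a periodic central difference.
    """
    h = t[1] - t[0]
    dp = (np.roll(p, -1) - np.roll(p, 1)) / (2.0 * h)
    x1 = p * np.cos(t) - dp * np.sin(t)
    x2 = p * np.sin(t) + dp * np.cos(t)
    return np.column_stack([x1, x2])

t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
center = np.array([6.0, 2.0])
p = center[0] * np.cos(t) + center[1] * np.sin(t) - 1.0
points = boundary_from_support(p, t)
radii = np.linalg.norm(points - center, axis=1)
print(radii.min(), radii.max())
```

With exact derivatives the recovered point is $(6 - \cos t,\ 2 - \sin t)$, which lies on the circle; the finite-difference error only shifts points along the tangent, so the radii stay very close to 1.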

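So, the question is how to construct the Support function $d(\mathbf{l})$, $\mathbf{l}\in S^1$ from the knowledge of the Scattering Amplitude. In 2-D the Scattering Amplitude is related to the total field $u = u_0 + v$ by

$$A(\alpha',\alpha) = -\frac{e^{i\pi/4}}{\sqrt{8\pi k}}\int_\Gamma \frac{\partial u}{\partial\nu(y)}\,e^{-ik\alpha'\cdot y}\,ds(y). \tag{102}$$
In the case of the "soft" boundary condition (i.e. the pressure field satisfies the Dirichlet boundary condition $u = 0$) the Kirchhoff (high frequency) approximation gives

$$\frac{\partial u}{\partial\nu} = 2\frac{\partial u_0}{\partial\nu} \tag{103}$$

on the illuminated part $\Gamma_+$ of the boundary $\Gamma$, and

$$\frac{\partial u}{\partial\nu} = 0 \tag{104}$$

on the shadowed part $\Gamma_-$. Therefore, in this approximation,

$$A(\alpha',\alpha) = -\frac{2ik\,e^{i\pi/4}}{\sqrt{8\pi k}}\int_{\Gamma_+}\alpha\cdot\nu(y)\,e^{ik(\alpha-\alpha')\cdot y}\,ds(y). \tag{105}$$

Let $L$ be the length of $\Gamma_+$, and $y = y(\zeta)$, $0\le\zeta\le L$ be its arc length parametrization. Then

$$A(\alpha',\alpha) = -\frac{i\sqrt{k}\,e^{i\pi/4}}{\sqrt{2\pi}}\int_0^L \alpha\cdot\nu(y(\zeta))\,e^{ik(\alpha-\alpha')\cdot y(\zeta)}\,d\zeta. \tag{106}$$
Let $\zeta_0\in[0,L]$ be such that $s_0 = y(\zeta_0)$ is the specular point of the unit vector $\mathbf{l}$, where

$$\mathbf{l} = \frac{\alpha-\alpha'}{|\alpha-\alpha'|}. \tag{107}$$
Then $\nu(s_0) = -\mathbf{l}$, and $d(\mathbf{l}) = y(\zeta_0)\cdot\mathbf{l}$. Let

$$\varphi(\zeta) = (\alpha-\alpha')\cdot y(\zeta).$$

Then $\varphi(\zeta) = \mathbf{l}\cdot y(\zeta)\,|\alpha-\alpha'|$. Since $\nu(s_0)$ and $y'(\zeta_0)$ are orthogonal, one has

$$\varphi'(\zeta_0) = \mathbf{l}\cdot y'(\zeta_0)\,|\alpha-\alpha'| = 0.$$

Therefore, due to the strict convexity of $D$, $\zeta_0$ is also the unique non-degenerate stationary point of $\varphi(\zeta)$ on the interval $[0,L]$, that is $\varphi'(\zeta_0) = 0$, and $\varphi''(\zeta_0)\ne 0$.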
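According to the Stationary Phase method

$$\int_0^L f(\zeta)e^{ik\varphi(\zeta)}\,d\zeta = f(\zeta_0)\exp\!\left[ik\varphi(\zeta_0) + \frac{i\pi}{4}\frac{\varphi''(\zeta_0)}{|\varphi''(\zeta_0)|}\right]\sqrt{\frac{2\pi}{k|\varphi''(\zeta_0)|}}\left[1 + O\!\left(\frac{1}{k}\right)\right], \tag{108}$$
as $k\to\infty$.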
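
The leading term of the stationary phase formula can be compared with direct quadrature on a simple test integral. The amplitude $f(\zeta) = e^{-4(\zeta-1)^2}$ and phase $\varphi(\zeta) = (\zeta-1)^2$ on $[0,2]$ below are hypothetical choices with a single stationary point $\zeta_0 = 1$; they are not taken from the scattering problem.

```python
import numpy as np

def stationary_phase(f0, phi0, d2phi0, k):
    """Leading term for one non-degenerate stationary point:
    f0 * exp(i k phi0 + i (pi/4) sign(phi'')) * sqrt(2 pi / (k |phi''|))."""
    return (f0 * np.exp(1j * k * phi0 + 0.25j * np.pi * np.sign(d2phi0))
            * np.sqrt(2.0 * np.pi / (k * abs(d2phi0))))

def trapezoid(y, h):
    """Composite trapezoid rule on a uniform grid with spacing h."""
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

k = 200.0
zeta = np.linspace(0.0, 2.0, 200001)
f = np.exp(-4.0 * (zeta - 1.0) ** 2)      # hypothetical smooth amplitude, f(1) = 1
phi = (zeta - 1.0) ** 2                   # hypothetical phase, phi(1) = 0, phi''(1) = 2
numeric = trapezoid(f * np.exp(1j * k * phi), zeta[1] - zeta[0])
leading = stationary_phase(1.0, 0.0, 2.0, k)
rel_err = abs(numeric - leading) / abs(leading)
print(rel_err)
```

The relative difference is of order $1/k$, in line with the correction factor in the formula; the amplitude is chosen small at the endpoints so that boundary contributions do not dominate.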
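By the definition of the curvature $\kappa(\zeta_0) = |y''(\zeta_0)|$. Therefore, from the collinearity of $y''(\zeta_0)$ and $\mathbf{l}$, $|\varphi''(\zeta_0)| = |\alpha-\alpha'|\kappa(\zeta_0)$. Finally, the strict convexity of $D$, and the definition of $\varphi(\zeta)$, imply that $\zeta_0$ is the unique point of minimum of $\varphi$ on $[0,L]$, and

$$\frac{\varphi''(\zeta_0)}{|\varphi''(\zeta_0)|} = 1. \tag{109}$$
Using (108)-(109), expression (106) becomes:

$$A(\alpha',\alpha) = -\frac{\mathbf{l}\cdot\alpha}{\sqrt{|\alpha-\alpha'|\kappa(\zeta_0)}}\,e^{ik(\alpha-\alpha')\cdot y(\zeta_0)}\left[1 + O\!\left(\frac{1}{k}\right)\right], \qquad k\to\infty. \tag{110}$$

At the specular point one has $\mathbf{l}\cdot\alpha' = -\mathbf{l}\cdot\alpha$. By the definition $\alpha-\alpha' = \mathbf{l}|\alpha-\alpha'|$. Hence $\mathbf{l}\cdot(\alpha-\alpha') = |\alpha-\alpha'|$ and $2\,\mathbf{l}\cdot\alpha = |\alpha-\alpha'|$. These equalities and $d(\mathbf{l}) = y(\zeta_0)\cdot\mathbf{l}$ give

$$A(\alpha',\alpha) = -\frac{1}{2}\sqrt{\frac{|\alpha-\alpha'|}{\kappa(\zeta_0)}}\,e^{ik|\alpha-\alpha'|d(\mathbf{l})}\left[1 + O\!\left(\frac{1}{k}\right)\right], \qquad k\to\infty. \tag{111}$$

Thus, the approximation

$$A(\alpha',\alpha)\approx -\frac{1}{2}\sqrt{\frac{|\alpha-\alpha'|}{\kappa(\zeta_0)}}\,e^{ik|\alpha-\alpha'|d(\mathbf{l})} \tag{112}$$

can be used for an approximate recovery of the curvature and the support function (modulo $2\pi/k|\alpha-\alpha'|$) of the obstacle, provided one knows that the total field satisfies the Dirichlet boundary condition. The uncertainty in the support function determination can be remedied by using different combinations of vectors $\alpha$ and $\alpha'$ as described in the numerical results section.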
Since it is also of interest to localize the obstacle in the case when the boundary condition is not a priori known, one can modify the SFM as shown in [RG04], and obtain

$$A(\alpha',\alpha)\approx\frac{1}{2}\sqrt{\frac{|\alpha-\alpha'|}{\kappa(\zeta_0)}}\,e^{i(k|\alpha-\alpha'|d(\mathbf{l}) - 2\gamma_0 + \pi)}, \tag{113}$$

where
$$\gamma_0 = \arctan\frac{k}{h},$$
and
$$\frac{\partial u}{\partial n} + hu = 0$$
along the boundary $\Gamma$ of the sought obstacle.
Now one can recover the Support Function $d(\mathbf{l})$ from (113), and the location of the obstacle.

7.2 Numerical results for the Support Function Method.

In the first numerical experiment the obstacle is the circle

$$D = \{(x_1,x_2)\in\mathbb{R}^2 : (x_1 - 6)^2 + (x_2 - 2)^2 = 1\}. \tag{114}$$

It is reconstructed using the Support Function Method for two frequencies in the resonance region: $k = 1.0$, and $k = 5.0$. Table 12 shows how well the approximation (112) is satisfied for various pairs of vectors $\alpha$ and $\alpha'$ all representing the same vector $\mathbf{l} = (1.0, 0.0)$ according to (107). The Table shows the ratios of the approximate Scattering Amplitude $A_a(\alpha',\alpha)$ defined as the right hand side of the equation (112) to the exact Scattering Amplitude $A(\alpha',\alpha)$. Note that for a sphere of radius $a$, centered at $x_0\in\mathbb{R}^2$, one has

where $\alpha' = x/|x| = e^{i\theta}$, and $\alpha = e^{i\beta}$. Vectors $\alpha$ and $\alpha'$ are defined by their polar angles shown in Table 12.
Table 12 shows that only vectors $\alpha$ close to the vector $\mathbf{l}$ are suitable for the Scattering Amplitude approximation. This shows the practical importance of the backscattering data. Any single combination of vectors $\alpha$ and $\alpha'$ representing $\mathbf{l}$ is not sufficient to uniquely determine the Support Function $d(\mathbf{l})$ from (112) because of the phase uncertainty. However, one can remedy this by using more than one pair of vectors $\alpha$ and $\alpha'$ as follows.
Let $\mathbf{l}\in S^1$ be fixed. Let

$$R(\mathbf{l}) = \{\alpha\in S^1 : |\alpha\cdot\mathbf{l}| > 1/\sqrt{2}\}.$$


Table 12. Ratios of the approximate and the exact Scattering Amplitudes $A_a(\alpha',\alpha)/A(\alpha',\alpha)$ for $\mathbf{l} = (1.0, 0.0)$.

alpha'      alpha       k = 1.0               k = 5.0
pi          0           0.88473 - 0.17487i    0.98859 - 0.05846i
23pi/24     pi/24       0.88272 - 0.17696i    0.98739 - 0.06006i
22pi/24     2pi/24      0.87602 - 0.18422i    0.98446 - 0.06459i
21pi/24     3pi/24      0.86182 - 0.19927i    0.97977 - 0.07432i
20pi/24     4pi/24      0.83290 - 0.22411i    0.96701 - 0.08873i
19pi/24     5pi/24      0.77723 - 0.25410i    0.95311 - 0.10321i
18pi/24     6pi/24      0.68675 - 0.27130i    0.92330 - 0.14195i
17pi/24     7pi/24      0.57311 - 0.25360i    0.86457 - 0.14959i
16pi/24     8pi/24      0.46201 - 0.19894i    0.81794 - 0.22900i
15pi/24     9pi/24      0.36677 - 0.12600i    0.61444 - 0.19014i
14pi/24     10pi/24     0.28169 - 0.05449i    0.57681 - 0.31075i
13pi/24     11pi/24     0.19019 + 0.00075i    0.14989 - 0.09479i
12pi/24     12pi/24     0.00000 + 0.00000i    0.00000 + 0.00000i

Define $\Psi : \mathbb{R}\to\mathbb{R}_+$ by

$$\Psi(t) = \left\|\frac{A(\alpha',\alpha)}{|A(\alpha',\alpha)|} + e^{ik|\alpha-\alpha'|t}\right\|^2_{L^2(R(\mathbf{l}))},$$
where $\alpha' = \alpha'(\alpha)$ is defined by $\mathbf{l}$ and $\alpha$ according to (107), and the integration is done over $\alpha\in R(\mathbf{l})$.
If the approximation (112) were exact for any $\alpha\in R(\mathbf{l})$, then the value of $\Psi(d(\mathbf{l}))$ would be zero. This justifies the use of the minimizer $t_0\in\mathbb{R}$ of the function $\Psi(t)$ as an approximate value of the Support Function $d(\mathbf{l})$. If the Support Function is known for sufficiently many directions $\mathbf{l}\in S^1$, the obstacle can be localized using (97) or (101). The results of such a localization for $k = 1.0$ together with the original obstacle $D$ are shown in Figure 5. For $k = 5.0$ the identified obstacle is not shown, since it is practically the same as $D$. The only a priori assumption on $D$ was that it was located inside the circle of radius 20 with the center in the origin. The Support Function was computed for 16 vectors $\mathbf{l}$ uniformly distributed in $S^1$. The program run takes about 80 seconds on a 333 MHz PC.
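
A minimal sketch of this minimization on synthetic data, assuming the functional has the form $\Psi(t) = \|A/|A| + e^{ik|\alpha-\alpha'|t}\|^2$ and replacing the integral over $R(\mathbf{l})$ by a mean over uniformly spaced values of $s = |\alpha-\alpha'|$ in $[\sqrt{2}, 2]$. The data are generated from the approximation (112) with hypothetical values $d = -1$, $\kappa = 1$, and the search grid reflects the a priori bound $|d|\le 20$.

```python
import numpy as np

def support_value(A, s, k, t_grid):
    """Grid minimizer of Psi(t) = mean over samples of |A/|A| + exp(i k s t)|^2."""
    unit = A / np.abs(A)
    psi = np.mean(np.abs(unit[None, :] + np.exp(1j * k * np.outer(t_grid, s))) ** 2,
                  axis=1)
    return t_grid[np.argmin(psi)], psi

k, d_true, kappa = 1.0, -1.0, 1.0                 # hypothetical test values
s = np.linspace(np.sqrt(2.0), 2.0, 25)            # samples of |alpha - alpha'|
A = -0.5 * np.sqrt(s / kappa) * np.exp(1j * k * s * d_true)   # synthetic amplitude
t_grid = np.linspace(-20.0, 20.0, 4001)
d_est, psi = support_value(A, s, k, t_grid)
print(d_est)
```

A single value of $s$ would leave $d$ determined only modulo $2\pi/(ks)$; combining several values removes the spurious zeros of $\Psi$, which is the remedy described in the text.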
In another numerical experiment we used $k = 1.0$ and a kite-shaped obstacle. Its boundary is described by

$$r(t) = (5.35 + \cos t + 0.65\cos 2t,\ 2.0 + 1.5\sin t), \qquad 0\le t < 2\pi. \tag{116}$$


Fig. 5. Identified (dotted line), and the original (solid line) obstacle $D$ for $k = 1.0$.

Numerical experiments using the boundary integral equation method (BIEM) for the direct scattering problem for this obstacle centered in the origin are described in [CK92, Section 3.5]. Again, the Dirichlet boundary conditions were assumed. We computed the scattering amplitude for 120 directions $\alpha$ using the MRC method with about 25% performance improvement over the BIEM, see [GR02b].
The Support Function Method (SFM) was used to identify the obstacle $D$ from the synthetic scattering amplitude with no noise added. The only a priori assumption on $D$ was that it was located inside the circle of radius 20 with the center in the origin. The Support Function was computed for 40 vectors $\mathbf{l}$ uniformly distributed in $S^1$ in about 10 seconds on a 333 MHz PC. The results of the identification are shown in Figure 6. The original obstacle is the solid line. The points were identified according to (101). As expected, the method recovers the convex part of the boundary $\Gamma$, and fails for the concave part. The same experiment but with $k = 5.0$ achieves a perfect identification of the convex part of the boundary. In each case the convex part of the obstacle was successfully localized. Further improvements in the obstacle localization using the MRC method are suggested in [Ram02b], and in the next section.
Fig. 6. Identified points and the original obstacle $D$ (solid line); $k = 1.0$.

For the identification of obstacles with unknown boundary conditions let

$$A(t) := A(\alpha',\alpha) = |A(t)|\,e^{i\psi(t)},$$

where, given $t$, the vectors $\alpha$ and $\alpha'$ are chosen as above, and the phase function $\psi(t)$, $\sqrt{2} < t < 2$ is continuous. Similarly, let $A_a(t)$, $\psi_a(t)$ be the approximate scattering amplitude and its phase defined by formula (113).
If the approximation (113) were exact for any $\alpha\in R(\mathbf{l})$, then the value of

$$|\psi_a(t) - kt\,d(\mathbf{l}) + 2\gamma_0 - \pi|$$

would be a multiple of $2\pi$.


This justifies the following algorithm for the determination of the Support Function $d(\mathbf{l})$:
Use a linear regression to find the approximation

$$\psi(t)\approx C_1t + C_2$$

on the interval $\sqrt{2} < t < 2$. Then

$$d(\mathbf{l}) = \frac{C_1}{k}. \tag{117}$$

Also

2
However, the formula for h did not work well numerically. It could only deter-
mine if the boundary conditions were or were not of the Dirichlet type. Table
13 shows that the algorithm based on (117) was successful in the identification
of the circle of radius 1.0 centered in the origin for various values of h with no
a priori assumptions on the boundary conditions. For this circle the Support
Function $d(l) = -1.0$ for any direction $l$.
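The regression step behind (117) can be sketched in a few lines. This is only an illustration under stated assumptions: the phase samples are synthetic and exactly linear, the intercept value `c2_true` is hypothetical, and the names `fit_line` and `support_function_from_phase` are not from the original text. With measured data the phase would first have to be made continuous on the interval, as required above.

```python
import math


def fit_line(ts, ys):
    """Ordinary least-squares fit ys ~ c1 * ts + c2; returns (c1, c2)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    c1 = sxy / sxx
    c2 = y_mean - c1 * t_mean
    return c1, c2


def support_function_from_phase(ts, phases, k):
    """Estimate d(l) = C1 / k from samples of a continuous phase, as in (117)."""
    c1, _ = fit_line(ts, phases)
    return c1 / k


# Synthetic test case (assumed values): unit circle, so d(l) = -1, at k = 3.
k = 3.0
d_true = -1.0
c2_true = 0.4  # hypothetical intercept, chosen only for the illustration
n_pts = 50
lo, hi = math.sqrt(2.0), 2.0
ts = [lo + (hi - lo) * (i + 0.5) / n_pts for i in range(n_pts)]
phases = [k * d_true * t + c2_true for t in ts]

d_est = support_function_from_phase(ts, phases, k)
```

Only the slope enters the estimate of $d(l)$, which is consistent with the observation above that the identification of $d(l)$ works without a priori knowledge of the boundary condition.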
Table 13. Identified values of the Support Function for the circle of radius 1.0 at
k = 3.0.

       h     Identified d(l)    Actual d(l)
    0.01         -0.9006           -1.00
    0.10         -0.9191           -1.00
    0.50         -1.0072           -1.00
    1.00         -1.0730           -1.00
    2.00         -0.9305           -1.00
    5.00         -1.3479           -1.00
   10.00         -1.1693           -1.00
  100.00         -1.0801           -1.00

8 Analysis of a Linear Sampling Method


During the last decade many papers were published in which the obstacle
identification methods were based on a numerical verification of the inclusion
of some function $f := f(\alpha, z)$, $z \in \mathbb{R}^3$, $\alpha \in S^2$, in the range $R(B)$ of a certain
operator $B$. Examples of such methods include [CCM00, CK96, Kir98].
However, one can show that the methods proposed in the above papers have
essential difficulties, see [RG05]. Although it is true that $f \notin R(B)$ when
$z \notin D$, it turns out that in any neighborhood of $f$ there are elements from
$R(B)$. Also, although $f \in R(B)$ when $z \in D$, there are elements in every
neighborhood of $f$ which do not belong to $R(B)$ even if $z \in D$. Therefore it is
quite difficult to construct a stable numerical method for the identification of
$D$ based on the verification of the inclusions $f \notin R(B)$ and $f \in R(B)$. Some
published numerical results were intended to show that the method based on
the above idea works practically, but it is not clear how these conclusions were
obtained.
Let us introduce some notation: $N(B)$ and $R(B)$ are, respectively, the
null-space and the range of a linear operator $B$; $D \subset \mathbb{R}^3$ is a bounded domain
(obstacle) with a smooth boundary $S$; $D' = \mathbb{R}^3 \setminus D$; $u_0 = e^{ik\alpha\cdot x}$; $k = \mathrm{const} > 0$;
$\alpha \in S^2$ is a unit vector; $N$ is the unit normal to $S$ pointing into $D'$;
$g = g(x,y,k) := g(|x-y|) := \frac{e^{ik|x-y|}}{4\pi|x-y|}$; $f := e^{-ik\alpha'\cdot z}$, where $z \in \mathbb{R}^3$ and
$\alpha' \in S^2$, $\alpha' := x r^{-1}$, $r = |x|$; and $u = u(x,\alpha,k)$ is the scattering solution:

$$(\Delta + k^2)u = 0 \ \text{in } D', \quad u|_S = 0, \qquad (118)$$

$$u = u_0 + v, \quad v = A(\alpha',\alpha,k)\,e^{ikr}r^{-1} + o(r^{-1}) \ \text{as } r \to \infty, \qquad (119)$$

where $A := A(\alpha',\alpha,k)$ is called the scattering amplitude, corresponding to
the obstacle $D$ and the Dirichlet boundary condition. Let $G = G(x,y,k)$ be
the resolvent kernel of the Dirichlet Laplacian in $D'$:

$$(\Delta + k^2)G = -\delta(x-y) \ \text{in } D', \quad G|_S = 0, \qquad (120)$$

and $G$ satisfies the outgoing radiation condition.


If

$$(\Delta + k^2)w = 0 \ \text{in } D', \quad w|_S = h, \qquad (121)$$

and $w$ satisfies the radiation condition, then ([Ram86]) one has

$$w(x) = \int_S G_N(x,s)h(s)\,ds, \quad w = A(\alpha',k)\,e^{ikr}r^{-1} + o(r^{-1}), \qquad (122)$$

as $r \to \infty$, and $x r^{-1} = \alpha'$. We write $A(\alpha')$ for $A(\alpha',k)$, and

$$A(\alpha') := Bh := \frac{1}{4\pi}\int_S u_N(s,-\alpha')h(s)\,ds, \qquad (123)$$

as follows from Ramm's lemma:
Lemma 1. ([Ram86, p. 46]) One has:

$$G(x,y,k) = g(r)\,u(y,-\alpha',k) + o(r^{-1}) \ \text{as } r = |x| \to \infty, \quad x r^{-1} = \alpha', \qquad (124)$$

where $u$ is the scattering solution of (118)-(119).
One can write the scattering amplitude as:

$$A(\alpha',\alpha,k) = -\frac{1}{4\pi}\int_S u_N(s,-\alpha')\,e^{ik\alpha\cdot s}\,ds. \qquad (125)$$

The following claim follows easily from the results in [Ram86], [Ram92b] (cf.
[Kir98]):
Claim: $f := e^{-ik\alpha'\cdot z} \in R(B)$ if and only if $z \in D$.
Proof. If $e^{-ik\alpha'\cdot z} = Bh$, then Lemma 1 and (123) imply

$$g(y,z) = \int_S G_N(s,y)h\,ds \quad \text{for } |y| > |z|.$$

Thus $z \in D$, because otherwise one gets a contradiction: $\lim_{y\to z} g(y,z) = \infty$ if
$z \in D'$, while $\lim_{y\to z}\int_S G_N(s,y)h\,ds < \infty$ if $z \in D'$. Conversely, if $z \in D$, then
Green's formula yields $g(y,z) = \int_S G_N(s,y)g(s,z)\,ds$. Taking $|y| \to \infty$, $y/|y| = \alpha'$,
and using Lemma 1, one gets $e^{-ik\alpha'\cdot z} = Bh$, where $h = g(s,z)$. The claim
is proved. $\Box$
Consider $B : L^2(S) \to L^2(S^2)$, and $A : L^2(S^2) \to L^2(S^2)$, where $B$ is
defined in (123) and $Aq := \int_{S^2} A(\alpha',\alpha)q(\alpha)\,d\alpha$. Then one proves (see [RG05]):
Theorem 1. The ranges $R(B)$ and $R(A)$ are dense in $L^2(S^2)$.
Remark 1. In [CK96] the 2D inverse obstacle scattering problem is considered.
It is proposed to solve the equation (1.9) in [CK96]:

$$\int_{S^1} A(\alpha,\beta)\gamma\,d\beta = e^{-ik\alpha\cdot z}, \qquad (126)$$

where $A$ is the scattering amplitude at a fixed $k > 0$, $S^1$ is the unit circle,
$\alpha \in S^1$, and $z$ is a point in $\mathbb{R}^2$. If $\gamma = \gamma(\beta,z)$ is found, the boundary $S$ of the
obstacle is to be found by finding those $z$ for which $\|\gamma\| := \|\gamma(\beta,z)\|_{L^2(S^1)}$ is
maximal. Assuming that $k^2$ is not a Dirichlet or Neumann eigenvalue of the
Laplacian in $D$, and that $D$ is a smooth, bounded, simply connected domain, the
authors state Theorem 2.1 [CK96, p. 386], which says that for every $\epsilon > 0$
there exists a function $\gamma \in L^2(S^1)$, such that

$$\lim_{z \to S}\|\gamma(\beta,z)\| = \infty, \qquad (127)$$

and (see [CK96, p. 386]),

$$\Big\|\int_{S^1} A(\alpha,\beta)\gamma\,d\beta - e^{-ik\alpha\cdot z}\Big\| < \epsilon. \qquad (128)$$
There are several questions concerning the proposed method.


First, equation (126), in general, is not solvable. The authors propose to
solve it approximately, by a regularization method. The regularization method
applies for stable solution of solvable ill-posed equations (with exact or noisy
data). If equation (126) is not solvable, it is not clear what numerical "solu-
tion" one seeks by a regularization method.
Secondly, since the kernel of the integral operator in (126) is smooth, one
can always find, for any $z \in \mathbb{R}^2$, infinitely many $\gamma$ with arbitrarily large $\|\gamma\|$,
such that (128) holds. Therefore it is not clear how and why, using (127), one
can find $S$ numerically by the proposed method.
A numerical implementation of the Linear Sampling Method (LSM) suggested
in [CK96] consists of solving a discretized version of (126),

$$Fg = f, \qquad (129)$$

where $F = \{A(\alpha_i,\beta_j)\}$, $i = 1,\dots,N$, $j = 1,\dots,N$, is a square matrix formed
by the measurements of the scattering amplitude for $N$ incoming and $N$
outgoing directions. In 2-D the vector $f$ is formed by

$$f_n = \frac{e^{i\pi/4}}{\sqrt{8\pi k}}\,e^{-ik\alpha_n\cdot z}, \quad n = 1,\dots,N,$$

see [BLW01] for details.
Denote the Singular Value Decomposition of the far field operator by $F = USV^H$.
Let $s_n$ be the singular values of $F$, $\rho = U^H f$, and $\mu = V^H f$. Then the
norm of the sought function $g$ is given by

$$\|\gamma\|^2 = \sum_{n=1}^{N} \frac{|\rho_n|^2}{s_n^2}. \qquad (130)$$

A different LSM is suggested by A. Kirsch in [Kir98]. In it one solves

$$(F^*F)^{1/4} g = f \qquad (131)$$

instead of (129). The corresponding expression for the norm of $\gamma$ is

$$\|\gamma\|^2 = \sum_{n=1}^{N} \frac{|\mu_n|^2}{s_n}. \qquad (132)$$

A detailed numerical comparison of the two LSMs and the linearized tomographic
inverse scattering is given in [BLW01].
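The two indicator norms (130) and (132) are easy to evaluate once the SVD of the discretized far field matrix is available. The sketch below is illustrative only: the matrix `F` and the vector `f` are random stand-ins for measured data, the function name `lsm_norms_squared` is hypothetical, and NumPy is assumed to be installed. No regularization is applied, so it is meaningful only for a well-conditioned `F`.

```python
import numpy as np


def lsm_norms_squared(F, f):
    """Return (norm^2 from (130), norm^2 from (132)) for a square matrix F."""
    # numpy returns F = U @ diag(s) @ Vh, where Vh is already V^H.
    U, s, Vh = np.linalg.svd(F)
    rho = U.conj().T @ f  # rho = U^H f
    mu = Vh @ f           # mu = V^H f
    colton_kirsch = float(np.sum(np.abs(rho) ** 2 / s ** 2))  # (130)
    kirsch = float(np.sum(np.abs(mu) ** 2 / s))               # (132)
    return colton_kirsch, kirsch


# Random, well-conditioned stand-in for the far field matrix (an assumption).
rng = np.random.default_rng(0)
N = 8
F = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)) + 12.0 * np.eye(N)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

ck_norm_sq, kirsch_norm_sq = lsm_norms_squared(F, f)
```

In the LSM these quantities would be recomputed for each sampling point $z$, since only the right-hand side $f$ depends on $z$; the SVD of $F$ can therefore be computed once and reused.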
The conclusions of [BLW01], as well as of our own numerical experiments,
are that the method of Kirsch (131) gives a better, though comparable, identification
than (129). The identification deteriorates significantly if the
scattering amplitude is available only for a limited aperture, or if the data are
corrupted by noise. Also, the points with the smallest values of $\|\gamma\|$ are
the best in locating the inclusion, and not the largest ones, as required by
the theory in [CK96, Kir98]. In Figures 7 and 8 the implementation of the
Colton-Kirsch LSM (130) is denoted by gnck, and of the Kirsch method (132)
by gnk. The Figures show a contour plot of the logarithm of $\|\gamma\|$. In all the
cases the original obstacle was the circle of radius 1.0 centered at the point
(10.0, 15.0). A similar circular obstacle that was identified by the Support
Function Method (SFM) is discussed in Section 10. Note that the actual radius
of the circle is 1.0, but it cannot be seen from the LSM identification.
The LSM does not require any knowledge of the boundary conditions on the
obstacle. The use of the SFM for unknown boundary conditions is discussed in
the previous section. The LSM identification was performed for the scattering
amplitude of the circle computed analytically with no noise added. In all the
experiments the value for the parameter $N$ was chosen to be 128.

References
[ARS99] Airapetyan, R., Ramm, A.G., Smirnova, A.: Example of two different
potentials which have practically the same fixed-energy phase shifts. Phys.
Lett. A, 254, N3-4, 141-148(1999).
[Apa97] Apagyi, B. et al (eds): Inverse and algebraic quantum scattering theory.
Springer, Berlin (1997)
[ARS98] Athanasiadis, C, Ramm A.G., Stratis I.G.: Inverse Acoustic Scattering
by a Layered Obstacle. In: Ramm A. (ed) Inverse Problems, Tomography,
and Image Processing. Plenum Press, New York, 1-8 (1998)
[Bar71] Barantsev, R.: Concerning the Rayleigh hypothesis in the problem of
scattering from finite bodies of arbitrary shapes. Vestnik Leningrad Univ.,
Math., Mech., Astron., 7, 56-62 (1971)
[BP96] Barhen, J., Protopopescu, V.: Generalized TRUST algorithm for global
optimization. In: Floudas C. (ed) State of The Art in Global Optimization.
Kluwer, Dordrecht (1996)
[BPR97] Barhen, J., Protopopescu, V., Reister, D.: TRUST: A deterministic algo-
rithm for global optimization. Science, 276, 1094-1097 (1997)
Fig. 7. Identification of a circle at k = 1.0. Panels: gnk (k = 1.0, noise = 0.00) and gnck (k = 1.0, noise = 0.00).

[Bie97] Biegler, L.T. (ed): Large-scale Optimization With Applications. In: IMA
volumes in Mathematics and Its Applications, 92-94. Springer-Verlag,
New York (1997)
[BR87] Boender, C.G.E., Rinnooy Kan, A.H.G.: Bayesian stopping rules for mul-
tistart global optimization methods. Math. Program., 37, 59-80 (1987)
[Bom97] Bomze, I.M. (ed): Developments in Global Optimization. Kluwer Academic
Publ., Dordrecht (1997)
[BLW01] Brandfass, M., Lanterman A.D., Warnick K.F.: A comparison of the
Colton-Kirsch inverse scattering methods with linearized tomographic inverse
scattering. Inverse Problems, 17, 1797-1816 (2001)
[Bre73] Brent, P.: Algorithms for minimization without derivatives. Prentice-Hall,
Englewood Cliffs, NJ (1973)
[Cal67] Calogero, P.: Variable Phase Approach to Potential Scattering. Academic
Press, New York and London (1967)
[CS89] Chadan, K., Sabatier, P.: Inverse Problems in Quantum Scattering The-
ory. Springer, New York (1989)
[CCM00] Colton, D., Coyle, J., Monk, P.: Recent developments in inverse acoustic
scattering theory. SIAM Rev., 42, 369-414 (2000)
[CK96] Colton, D., Kirsch, A.: A simple method for solving inverse scattering
problems in the resonance region. Inverse Problems 12, 383-393 (1996)
Fig. 8. Identification of a circle at k = 5.0. Panels: gnk (k = 5.0, noise = 0.00) and gnck (k = 5.0, noise = 0.00).

[CK92] Colton, D., Kress, R.: Inverse Acoustic and Electromagnetic Scattering
Theory. Springer-Verlag, New York (1992)
[CM90] Colton, D., Monk, P.: The Inverse Scattering Problem for acoustic waves
in an Inhomogeneous Medium. In: Colton D., Ewing R., Rundell W. (eds)
Inverse Problems in Partial Differential Equations. SIAM Publ. Philadel-
phia, 73-84 (1990)
[CT70] Cox, J., Thompson, K.: Note on the uniqueness of the solution of an
equation of interest in the inverse scattering problem. J. Math. Phys., 11,
815-817 (1970)
[DE94] Deep, K., Evans, D.J.: A parallel random search global optimization
method. Technical Report 882, Computer Studies, Loughborough Uni-
versity of Technology (1994)
[DS83] Dennis, J.E., Schnabel, R.B.: Numerical methods for unconstrained op-
timization and nonlinear equations. Prentice-Hall, Englewood Cliffs, NJ
(1983)
[DJ93] Dixon, L.C.W., Jha, M.: Parallel algorithms for global optimization. J.
Opt. Theor. Appl., 79, 385-395 (1993)
[EJP57] Ewing, W.M, Jardetzky, W.S., Press, P.: Elastic Waves in Layered Media.
McGraw-Hill, New York (1957)
[Fle81] Fletcher, R.: Practical methods of optimization (Second edition). John
Wiley & Sons, New York (1981)
[Flo00] Floudas, C.A.: Deterministic Global Optimization-Theory, Methods and
Applications. In: Nonconvex Optimization and Its Applications, 37,
Kluwer Academic Publishers, Dordrecht (2000)
[FP01] Floudas, C.A., Pardalos, P.M.: Encyclopedia of Optimization. Kluwer
Academic Publishers, Dordrecht (2001)
[Gut00] Gutman, S.: Identification of multilayered particles from scattering data
by a clustering method. J. Comp. Phys., 163, 529-546 (2000)
[Gut01] Gutman, S.: Identification of piecewise-constant potentials by fixed-energy
shifts. Appl. Math. Optim., 44, 49-65 (2001)
[GR00] Gutman, S., Ramm, A.G.: Application of the Hybrid Stochastic-deterministic
Minimization method to a surface data inverse scattering
problem. In: Ramm A.G., Shivakumar P.N., Strauss A.V. (eds) Operator
Theory and Its Applications, Amer. Math. Soc., Fields Institute Communications,
25, 293-304 (2000)
[GR02a] Gutman, S., Ramm, A.G.: Stable identification of piecewise-constant
potentials from fixed-energy phase shifts. Jour. of Inverse and Ill-Posed
Problems, 10, 345-360 (2002)
[GR02b] Gutman, S., Ramm, A.G.: Numerical Implementation of the MRC Method
for obstacle Scattering Problems. J. Phys. A: Math. Gen., 35, 8065-8074
(2002)
[GR03] Gutman, S., Ramm, A.G.: Support Function Method for Inverse Scatter-
ing Problems. In: Wirgin A. (ed) Acoustics, Mechanics and Related Topics
of Mathematical Analysis. World Scientific, New Jersey, 178-184 (2003)
[GR05] Gutman, S., Ramm, A.G.: Modified Rayleigh Conjecture method for Mul-
tidimensional Obstacle Scattering problems. Numerical Funct. Anal and
Optim., 26 (2005)
[GRS02] Gutman, S., Ramm, A.G., Scheid, W.: Inverse scattering by the stability
index method. Jour. of Inverse and Ill-Posed Problems, 10, 487-502 (2002)
[HH98] Haupt, R.L., Haupt, S.E.: Practical genetic algorithms. John Wiley and
Sons Inc., New York (1998)
[Hes80] Hestenes, M.: Conjugate direction methods in optimization. In: Applica-
tions of mathematics 12. Springer-Verlag, New York (1980)
[HPT95] Horst, R., Pardalos, P.M., Thoai, N.V.: Introduction to Global Optimiza-
tion. Kluwer Academic Publishers, Dordrecht (1995)
[HT93] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, sec-
ond edition. Springer, Heidelberg (1993)
[Hu95] Hu, F.Q.: A spectral boundary integral equation method for the 2D
Helmholtz equation. J. Comp. Phys., 120, 340-347 (1995)
[Jac77] Jacobs, D.A.H. (ed): The State of the Art in Numerical Analysis. Academic
Press, London (1977)
[Kir84] Kirkpatrick, S.: Optimization by simulated annealing: quantitative stud-
ies. Journal of Statistical Physics, 34, 975-986 (1984)
[KGV83] Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by Simulated
Annealing. Science, 220, 671-680 (1983)
[Kir96] Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Prob-
lems. Springer-Verlag, New York (1996)
[Kir98] Kirsch, A.: Characterization of the shape of a scattering obstacle using
the spectral data for far field operator. Inverse Probl., 14, 1489-1512 (1998)

[Mil73] Millar, R.: The Rayleigh hypothesis and a related least-squares solution to
the scattering problems for periodic surfaces and other scatterers. Radio
Sci., 8, 785-796 (1973)
[New82] Newton R.: Scattering Theory of Waves and Particles. Springer, New York
(1982)
[PRT00] Pardalos, P.M., Romeijn, H.E., Tuy, H.: Recent developments and trends
in global optimization. J. Comput. Appl. Math., 124, 209-228 (2000)
[Pol71] Polak, E.: Computational methods in optimization. Academic Press, New
York (1971)
[PTVF92] Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical
Recipes in FORTRAN, Second Edition. Cambridge University Press
(1992)
[Ram70] Ramm, A.G.: Reconstruction of the shape of a reflecting body from the
scattering amplitude. Radiofisika, 13, 727-732 (1970)
[Ram82] Ramm A.G.: Iterative methods for calculating the static fields and wave
scattering by small bodies. Springer-Verlag. New York, NY (1982)
[Ram86] Ramm A.G.: Scattering by Obstacles. D. Reidel Publishing, Dordrecht,
Holland (1986)
[Ram88] Ramm, A.G.: Recovery of the potential from fixed energy scattering data.
Inverse Problems, 4, 877-886 (1988)
[Ram90] Ramm, A.G.: Is the Born approximation good for solving the inverse prob-
lem when the potential is small? J. Math. Anal. Appl., 147, 480-485
(1990)
[Ram91] Ramm, A.G.: Symmetry properties for scattering amplitudes and appli-
cations to inverse problems. J. Math. Anal. Appl., 156, 333-340 (1991)
[Ram92a] Ramm, A.G.: Stability of the inversion of 3D fixed-frequency data. J.
Math. Anal. Appl., 169, 329-349 (1992)
[Ram92b] Ramm, A.G.: Multidimensional Inverse Scattering Problems. Long-
man/Wiley, New York (1992)
[Ram94a] Ramm, A.G.: Multidimensional Inverse Scattering Problems. Mir,
Moscow (1994) (expanded Russian edition of [Ram92b])
[Ram94b] Ramm, A.G.: Numerical method for solving inverse scattering problems.
Doklady of Russian Acad, of Sci., 337, 20-22 (1994)
[Ram94c] Ramm, A.G.: Stability of the solution to inverse obstacle scattering
problem. J. Inverse and Ill-Posed Problems, 2, 269-275 (1994)
[Ram94d] Ramm, A.G.: Stability estimates for obstacle scattering. J. Math. Anal.
Appl., 188, 743-751 (1994)
[Ram96] Ramm, A.G.: Finding potential from the fixed-energy scattering data via
D-N map. J. of Inverse and Ill-Posed Problems, 4, 145-152 (1996)
[Ram97] Ramm, A.G.: A method for finding small inhomogeneities from surface
data. Math. Sci. Research Hot-Line, 1, 10 , 40-42 (1997)
[Ram00a] Ramm A.G.: Finding small inhomogeneities from scattering data. Jour.
of inverse and ill-posed problems, 8, 1-6 (2000)
[Ram00b] Ramm, A.G.: Property C for ODE and applications to inverse problems.
In: Operator Theory and Its Applications, Amer. Math. Soc., Fields Institute
Communications, Providence, RI, 25, 15-75 (2000)
[Ram02a] Ramm, A.G.: Stability of the solutions to 3D inverse scattering problems.
Milan Journ of Math 70, 97-161 (2002)
[Ram02b] Ramm, A.G.: Modified Rayleigh Conjecture and applications. J. Phys.
A: Math. Gen., 35, 357-361 (2002)
[Ram02c] Ramm, A.G.: A counterexample to the uniqueness result of Cox and
Thompson. Appl. Anal., 81, 833-836 (2002)
[Ram02d] Ramm, A.G.: Analysis of the Newton-Sabatier scheme for inverting
fixed-energy phase shifts. Appl. Anal., 81, 965-975 (2002)
[Ram04a] Ramm, A.G.: Dynamical systems method for solving operator equations.
Communic. in Nonlinear Science and Numer. Simulation, 9, 383-402
(2004)
[Ram04b] Ramm, A.G.: One-dimensional inverse scattering and spectral problems.
Cubo a Mathem. Journ., 6, 313-426 (2004)
[Ram04c] Ramm, A.G., Gutman, S.: Modified Rayleigh Conjecture for scattering
by periodic structures. Intern. Jour. of Appl. Math. Sci., 1, 55-66 (2004)
[Ram04d] Ramm, A.G.: Inverse scattering with fixed-energy data. Jour. of Indones.
Math. Soc., 10, 53-62 (2004)
[Ram05a] Ramm, A.G.: Inverse Problems. Springer, New York (2005)
[Ram05b] Ramm, A.G.: Wave Scattering by Small Bodies of Arbitrary Shapes.
World Sci. Publishers, Singapore (2005)
[RAI98] Ramm, A.G., Arredondo, J.H., Izquierdo, B.G.: Formula for the radius of
the support of the potential in terms of the scattering data. Jour. Phys.
A, 31, 39-44 (1998)
[RG01] Ramm, A.G., Gutman, S.: Piecewise-constant positive potentials with
practically the same fixed-energy phase shifts. Appl. Anal., 78, 207-217
(2001)
[RG04] Ramm, A.G., Gutman, S.: Numerical solution of obstacle scattering
problems. Internat. Jour. of Appl. Math. and Mech., 1, 71-102 (2004)
[RG05] Ramm, A.G., Gutman, S.: Analysis of a linear sampling method for
identification of obstacles. Acta Math. Appl. Sinica, 21, 1-6 (2005)
[RPY00] Ramm, A.G., Pang, P., Yan, G.: A uniqueness result for the inverse
transmission problem. Internat. Jour. of Appl. Math., 2, 625-634 (2000)
[RS99] Ramm, A.G., Scheid, W.: An approximate method for solving inverse
scattering problems with fixed-energy data. Jour. of Inverse and Ill-Posed
Problems, 7, 561-571 (1999)
[RS00] Ramm, A.G., Smirnova, A.: A numerical method for solving the inverse
scattering problem with fixed-energy phase shifts. Jour. of Inverse and
Ill-Posed Problems, 3, 307-322 (2000)
[RT87a] Rinnooy Kan, A.H.G., Timmer, G.T.: Stochastic global optimization
methods, part I: clustering methods. Math. Program., 39, 27-56 (1987)
[RT87b] Rinnooy Kan, A.H.G., Timmer, G.T.: Stochastic global optimization
methods, part II: multi level methods. Math. Prog., 39, 57-78 (1987)
[Rub00] Rubinov, A.M.: Abstract Convexity and Global Optimization. Kluwer
Acad. Publ., Dordrecht (2000)
[Sch90] Schuster, G.T.: A fast exact numerical solution for the acoustic response
of concentric cylinders with penetrable interfaces. J. Acoust. Soc. Am.,
87, 495-502 (1990)
[ZUB98] Zakovic, S., Ulanowski, Z., Bartholomew-Biggs, M.C.: Application of
global optimization to particle identification using light scattering. Inverse
Problems, 14, 1053-1067 (1998)
On Complexity of Stochastic Programming
Problems

Alexander Shapiro^1 and Arkadi Nemirovski^2

^1 Georgia Institute of Technology
Atlanta, Georgia 30332-0205, USA
ashapiro@isye.gatech.edu
^2 Technion - Israel Institute of Technology
Haifa 32000, Israel
nemirovs@ie.technion.ac.il

Summary. The main focus of this paper is a discussion of the complexity of stochastic
programming problems. We argue that two-stage (linear) stochastic programming
problems with recourse can be solved with a reasonable accuracy by using Monte
Carlo sampling techniques, while multi-stage stochastic programs, in general, are
intractable. We also discuss complexity of chance constrained problems and multi-
stage stochastic programs with linear decision rules.

Key words: stochastic programming, complete recourse, chance constraints,
Monte Carlo sampling, SAA method, large deviations bounds, convex programming,
multi-stage stochastic programming.

1 Introduction
In real life we constantly have to make decisions under uncertainty and, moreover,
we would like to make such decisions in a reasonably optimal way. Then
for a specified objective function $F(x,\xi)$, depending on decision vector $x \in \mathbb{R}^n$
and vector $\xi \in \mathbb{R}^d$ of uncertain parameters, we are faced with the problem of
optimizing (say minimizing) $F(x,\xi)$ over $x$ varying in a permissible (feasible)
set $X \subset \mathbb{R}^n$. Of course, such an optimization problem is not well defined since
our objective depends on an unknown value of $\xi$. A way of dealing with this is
to optimize the objective on average. That is, it is assumed that $\xi$ is a random
vector^3, with known probability distribution $P$ having support $\Xi \subset \mathbb{R}^d$, and
the following optimization problem is formulated

$$\min_{x \in X}\ \big\{ f(x) := \mathbb{E}_P[F(x,\xi)] \big\}. \qquad (1)$$

^3 Sometimes, in the sequel, $\xi$ denotes a random vector and sometimes its particular
realization (numerical value). Which one of these two meanings is used will be
clear from the context.

We assume throughout the paper that considered expectations are well de-
fined, e.g., F(x, •) is measurable and P-integrable.
In particular, the above formulation can be applied to two-stage stochastic
programming problem with recourse, pioneered by Beale [Bea55] and Dantzig
[Dan55]. That is, an optimization problem is divided into two stages. At the
first stage one has to make a decision on the basis of some available informa-
tion. At the second stage, after a realization of the uncertain data becomes
known, an optimal second stage decision is made. Such stochastic program-
ming problem can be written in the form (1) with F(x,^) being the optimal
value of the second stage problem.
It should be noted that in the formulation (1) all uncertainties are con-
centrated in the objective function while the feasible set X is supposed to
be known (deterministic). Quite often the feasible set itself is defined by con-
straints which depend on uncertain parameters. In some cases one can rea-
sonably formulate such problems in the form (1) by introducing penalties for
possible infeasibilities. Alternatively one can try to optimize the objective sub-
ject to satisfying constraints for all values of unknown parameters in a chosen
(uncertain) region. This is the approach of robust optimization (cf. Ben-Tal
and Nemirovski [BN01]). Satisfying the constraints for all possible realizations
of random data may be too conservative and, more reasonably, one may try
to satisfy the constraints with a high (close to one) probability. This leads to
the chance, or probabilistic, constraints formulation, which goes back to
Charnes and Cooper [CC59].
There are several natural questions which arise with respect to formulation (1):

(i) How do we know the probability distribution P? In some cases one has his-
torical data which can be used to obtain a reasonably accurate estimate
of the corresponding probability distribution. However, this happens in
rather specific situations and often the probability distribution either can-
not be accurately estimated or changes with time. Even worse, in many
cases one deals with scenarios (i.e., possible realizations of the random
data) with the associated probabilities assigned by a subjective judgment.
(ii) Why, at the first stage, do we optimize the expected value of the second
stage optimization problem? If the optimization procedure is repeated
many times, with the same probability distribution of the data, then it
could be argued by employing the Law of Large Numbers that this gives
an optimal decision on average. However, if in the process, because of the
variability of the data, one loses all one's capital, it does not help that the
decisions were optimal on average.
(iii) How difficult is it to solve the stochastic programming problem (1)? Evaluation
of the expected value function $f(x)$ involves calculation of the corresponding
multivariate integrals. Only in rather specific cases can it be
done analytically. Therefore, typically, one employs a finite discretization
of the random data which allows one to write the expectation in the form of a
summation. Note, however, that if the random vector $\xi$ has $d$ elements, each
with just 3 possible realizations independent of each other, then the total
number of scenarios is $3^d$, i.e., the number of scenarios grows exponentially
fast with the dimension $d$ of the data vector.
(iv) Finally, what can be said about multi-stage stochastic programming, when
decisions are made in several stages based on available information at the
time of making the sequential decisions?
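The scenario count in question (iii) can be made concrete with a small experiment. The sketch below is only an illustration: the objective `F`, the three-point distribution on {-1, 0, 1} and the sample size are assumptions chosen for the example, not taken from the paper. It evaluates the expectation once by enumerating all $3^d$ scenarios and once by plain Monte Carlo sampling.

```python
import itertools
import random

VALUES = (-1.0, 0.0, 1.0)  # assumed: three equally likely realizations per component


def F(x, xi):
    """Hypothetical objective: squared gap between the total outcome and x."""
    return (sum(xi) - x) ** 2


def exact_expectation(x, d):
    """Enumerate all 3**d equally likely scenarios; cost grows exponentially in d."""
    total = 0.0
    n_scenarios = 0
    for xi in itertools.product(VALUES, repeat=d):
        total += F(x, xi)
        n_scenarios += 1
    return total / n_scenarios, n_scenarios


def monte_carlo_expectation(x, d, n_samples, seed=0):
    """Sample-average estimate; cost grows only linearly in d and n_samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi = [rng.choice(VALUES) for _ in range(d)]
        total += F(x, xi)
    return total / n_samples


d = 8
exact, n_scenarios = exact_expectation(1.0, d)
estimate = monte_carlo_expectation(1.0, d, n_samples=20000)
```

For $d = 8$ the enumeration already visits 6561 scenarios, and each additional component triples that number, whereas the sampling estimate keeps a fixed budget of evaluations at the price of a statistical error.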
It turns out that there is a close relation between questions (i) and (ii).
As far as question (i) is concerned, one can approach it from the following
point of view. Suppose that a plausible family $\mathfrak{P}$ of probability distributions
of the random data vector $\xi$ can be identified. Consequently, the
"worst-case-distribution" minimax problem

$$\min_{x \in X}\ \Big\{ f(x) := \sup_{P \in \mathfrak{P}} \mathbb{E}_P[F(x,\xi)] \Big\} \qquad (2)$$

is formulated. The worst-case approach to decision analysis, of course, is not
new. It was also discussed extensively in the stochastic programming literature
(e.g., [Dup79, Dup87, EGN85, Gai91, SK00, Zac66]).
Again we are facing the question of how to choose the set $\mathfrak{P}$ of possible
distributions. Traditionally this problem is approached by assuming knowledge
of certain moments of the involved random parameters. This leads to
the so-called Problem of Moments, where the set $\mathfrak{P}$ is formed by probability
measures $P$ satisfying moment constraints $\mathbb{E}_P[\psi_i(\xi)] = b_i$, $i = 1,\dots,m$ (see,
e.g., [Lan87]). In that case the extreme (worst case) distributions are measures
with a finite support of at most $m + 1$ points.
On the other hand, it often happens in applications that one is given a
deterministic value $\mu$ of the uncertain data vector $\xi$ and does not have an idea
what a corresponding distribution may be. For example, $\xi$ could represent an
uncertain demand and $\mu$ is viewed as its mean vector given by a forecast. It is
well recognized now that solving a corresponding optimization problem for the
deterministic value $\xi = \mu$ may give a poor solution from a robustness point of
view. It is natural then to introduce random perturbations to the deterministic
vector $\mu$ and to solve the obtained stochastic program. For instance, one can
assume that components $\xi_i$ of the uncertain data vector are independent and
have a certain type (say, log-normal if $\xi_i$ should be nonnegative) distribution
with means $\mu_i$ and standard deviations $\sigma_i$ which are defined within a certain
percentage of $\mu_i$, $i = 1,\dots,d$. Often this quickly stabilizes optimal solutions
of the corresponding stochastic programs irrespective of the underlying distribution
(cf. [SAGS05]). Furthermore, we can approach this setup from the
minimax point of view by considering a worst distribution supported on, say,
a box region around vector $\mu$. If, moreover, we consider unimodal type families
of distributions, then the worst case distribution is uniform (cf. [Sha04]). For a
given $x$, even unimodal distributions and $F(x,\cdot) := -\mathbf{1}_S(\cdot)$, where $\mathbf{1}_S(\cdot)$ is the
indicator function of a symmetric convex set $S$, this result was first established
by Barmish and Lagoa [BL97], where it was called the "Uniformity Principle".

Question (ii) also has a long history. One can optimize a weighted sum
of the expected value and a term representing variability of the second stage
objective function. For example, we can try to minimize

$$f(x) := \mathbb{E}[F(x,\xi)] + c\,\mathrm{Var}[F(x,\xi)], \qquad (3)$$

where $c \ge 0$ is a chosen constant. This approach goes back to Markowitz
[Mar52]. The additional (variance) term in (3) can be viewed as a risk measure
of the second stage (optimal) outcome. It could be noted, however, that adding
the variance term may destroy convexity of the function $f(\cdot)$ even if $F(\cdot,\xi)$
is convex for all realizations of $\xi$ (cf. [TA04]). An axiomatic approach to
a mathematical theory of risk measures was suggested recently by Artzner
et al. [ADEH99]. That is, the value of a random variable $Z$ is measured by a
function $\rho(Z)$ satisfying certain axioms. An example of such a function $\rho(Z)$,
called a coherent risk measure, is the mean-semideviation

$$\rho(Z) := \mathbb{E}[Z] + c\,\Big(\mathbb{E}\big[[Z - \mathbb{E}[Z]]_+^2\big]\Big)^{1/2},$$

where $c \in [0,1]$.
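As a small numerical illustration of the two criteria just introduced, the sketch below evaluates the mean-variance objective (3) and the mean-semideviation $\rho(Z)$ on a finite sample of equally likely outcomes. The sample `costs` and the weight 0.5 are made-up values, and the function names are hypothetical.

```python
import math


def mean(zs):
    return sum(zs) / len(zs)


def mean_variance(zs, c):
    """E[Z] + c * Var[Z], as in (3), for equally likely outcomes zs."""
    m = mean(zs)
    var = sum((z - m) ** 2 for z in zs) / len(zs)
    return m + c * var


def mean_semideviation(zs, c):
    """E[Z] + c * sqrt(E[(Z - E[Z])_+^2]); only outcomes above the mean are penalized."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    m = mean(zs)
    upper = sum(max(z - m, 0.0) ** 2 for z in zs) / len(zs)
    return m + c * math.sqrt(upper)


# Made-up cost sample: two cheap outcomes and one expensive one.
costs = [0.0, 0.0, 4.0]
rho = mean_semideviation(costs, 0.5)
mv = mean_variance(costs, 0.5)
```

Unlike the variance term, the semideviation term ignores outcomes below the mean, so in a cost-minimization setting it charges only for unfavorable deviations.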
It turns out that $\rho(Z)$ is a coherent risk measure if and only if it can be
represented in the form $\rho(Z) = \sup_{P \in \mathfrak{P}} \mathbb{E}_P[Z]$, where $\mathfrak{P}$ is a set of probability
measures. In different frameworks this dual representation was derived in
[ADEH99, FS02, RUZ02, RS04a]. Therefore, the min-max problem (2) and
the problem of minimization of a coherent risk measure of $F(x,\xi)$ are in fact
equivalent. We may refer to [ADEHK03, ER05, Rie03, RS04b] for extensions
of this approach to a multi-stage setting.

2 Complexity of two-stage stochastic programs

In this section we discuss question (iii) mentioned in the introduction, that
is, how difficult it is to solve a stochastic program. Problem (1) is a problem of
minimizing a deterministic, implicitly given objective $f(x)$. We should expect
that this problem is at least as difficult as minimizing $f(x)$, $x \in X$, in the case
where $f(x)$ is given explicitly, say by a "closed form analytic expression", or,
more generally, by an "oracle" capable of computing the values and the derivatives
of $f(x)$ at every given point. As far as problems of minimization of $f(x)$, $x \in X$,
with explicitly given objective are concerned, the "solvable case" is known:
this is the Convex Programming case. That is, $X$ is a closed convex set and
$f : X \to \mathbb{R}$ is a convex function. It is known that generic Convex Programming
problems satisfying mild computability and boundedness assumptions can be
solved in polynomial time. In contrast to this, typical nonconvex problems
turn out to be NP-hard^4. It follows that when speaking about conditions
under which the stochastic program (1) is efficiently solvable, it makes sense
to assume that $X$ is a closed convex set, and $f(\cdot)$ is convex on $X$. We gain
from a technical point of view (and do not lose much from a practical viewpoint) by
assuming $X$ to be bounded. These assumptions, plus mild technical conditions,
would be sufficient to make (1) easy, if $f(x)$ were given explicitly. However, in
Stochastic Programming it makes no sense to assume that we can compute
efficiently the expectation in (1), thus arriving at an explicit representation
of $f(x)$. Were it the case, there would be no necessity to treat (1) as a
stochastic program.
We argue now that stochastic programming problems of the form (1) can
be solved reasonably efficiently by using Monte Carlo sampling techniques
provided that the probability distribution of the random data is not "too bad"
and certain general conditions are met. In this respect we should explain what
we mean by "solving" stochastic programming problems. Let us consider,
for example, two-stage linear stochastic programming problems with recourse.
Such problems can be written in the form (1) with^5

$$X := \{x : Ax = b,\ x \ge 0\} \quad \text{and} \quad F(x,\xi) := \langle c,x\rangle + Q(x,\xi),$$

where $Q(x,\xi)$ is the optimal value of the second stage problem:

$$\min_{y \ge 0}\ \langle q,y\rangle \quad \text{subject to } Tx + Wy \ge h. \qquad (4)$$

Here $T$ and $W$ are matrices of an appropriate order and $\xi \in \mathbb{R}^d$ is a vector whose elements are composed from elements of vectors $q$ and $h$ and matrices $T$ and $W$ which, in a considered problem, are assumed to be random. If we assume that the random data vector has a finite number of realizations (scenarios) $\xi_k = (q_k, W_k, T_k, h_k)$ with respective probabilities $p_k$, $k = 1,\dots,K$, then the obtained two-stage problem can be written as one large linear programming problem:
$$\begin{array}{ll}
\operatorname*{Min}\limits_{x,\,y_1,\dots,y_K} & \langle c, x\rangle + \sum_{k=1}^{K} p_k \langle q_k, y_k\rangle \\
\text{s.t.} & Ax = b,\ \ T_k x + W_k y_k \ge h_k,\ \ k = 1,\dots,K, \\
 & x \ge 0,\ \ y_k \ge 0,\ \ k = 1,\dots,K.
\end{array} \tag{5}$$
$^4$ It is beyond the scope of this paper to give a detailed explanation of what "polynomial time solvability" and "NP-hardness" mean. Informally speaking, the former property of a problem $P$ means that $P$ is "easy to solve": it admits a computationally efficient solution algorithm. NP-hardness of $P$ means that no efficient solution algorithms for $P$ are known, and there are strong theoretical reasons to believe that they do not exist. For formal treatment of these issues in Continuous Optimization, see, e.g., [BN01, Chapter 5].
We should also stress that a claim "such and such problem is difficult" relates to a generic problem in question and does not imply that the problem has no solvable particular cases.
$^5$ By $\langle x, y\rangle$ we denote the standard scalar product of two vectors $x, y \in \mathbb{R}^n$.
If the number of scenarios $K$ is not "too large", then the above linear programming problem (5) can be solved accurately in a reasonable time. However, even a crude discretization of the probability distribution of $\xi$ typically results in an exponential growth of the number of scenarios with increase of the number $d$ of random parameters. Suppose, for example, that the components of the random vector $\xi$ are mutually independent, each having a small number $r$ of possible realizations. Then the size of the corresponding input data grows linearly in $d$ (and $r$), while the number of scenarios $K = r^d$ grows exponentially. Yet in some cases problem (5) can be solved numerically in a reasonable time. For example, suppose that matrices $T$ and $W$ are constant (deterministic) and only $h$ is random and, moreover, $Q(x,\xi)$ decomposes into the sum $Q(x,\xi) = Q_1(x_1,h_1) + \dots + Q_n(x_n,h_n)$. This happens in the case of the so-called simple recourse with
$$Q_i(x_i,h_i) = q_i^+ [x_i - h_i]_+ + q_i^- [h_i - x_i]_+, \quad i = 1,\dots,n,$$
where $q_i^+$ and $q_i^-$ are some positive numbers. Then $\mathbb{E}[Q(x,\xi)] = \mathbb{E}[Q_1(x_1,h_1)] + \dots + \mathbb{E}[Q_n(x_n,h_n)]$, i.e., calculation of the multidimensional expectation is reduced to calculations of one dimensional expectations. Of course, the above is a rather specific case and in a general situation there is no hope to solve problem (5) accurately (say with machine precision) even for moderate values of $r$ and $d$ (cf. [DS03]).
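The reduction to one dimensional expectations can be illustrated numerically. The following Python sketch is not part of the original text. It assumes, purely for illustration, that all components $h_i$ share the same $r$ possible values and probabilities and that $q_i^+$, $q_i^-$ do not depend on $i$; all function names are hypothetical. It compares brute-force enumeration of all $r^n$ joint scenarios with the sum of $n$ one dimensional expectations.

```python
import itertools

def simple_recourse(x_i, h_i, q_plus, q_minus):
    """Q_i(x_i, h_i) = q_plus * [x_i - h_i]_+ + q_minus * [h_i - x_i]_+."""
    return q_plus * max(x_i - h_i, 0.0) + q_minus * max(h_i - x_i, 0.0)

def expected_recourse_enumeration(x, values, probs, q_plus, q_minus):
    """E[Q(x, h)] by enumerating all r**n joint scenarios (exponential in n)."""
    n = len(x)
    total, n_scenarios = 0.0, 0
    for idx in itertools.product(range(len(values)), repeat=n):
        prob, cost = 1.0, 0.0
        for i, k in enumerate(idx):
            prob *= probs[k]  # independence of the components h_i
            cost += simple_recourse(x[i], values[k], q_plus, q_minus)
        total += prob * cost
        n_scenarios += 1
    return total, n_scenarios

def expected_recourse_separable(x, values, probs, q_plus, q_minus):
    """E[Q(x, h)] as a sum of n one dimensional expectations (n * r terms)."""
    return sum(
        sum(p * simple_recourse(x_i, h, q_plus, q_minus)
            for h, p in zip(values, probs))
        for x_i in x
    )

# Illustrative data (assumed): n = 4 components, r = 3 realizations each.
x = [0.5, 1.0, 1.5, 2.0]
values = [0.0, 1.0, 2.0]
probs = [0.2, 0.5, 0.3]
by_enumeration, n_scenarios = expected_recourse_enumeration(x, values, probs, 1.0, 3.0)
by_separation = expected_recourse_separable(x, values, probs, 1.0, 3.0)
```

Both routes give the same expectation, but the enumeration visits $3^4 = 81$ scenarios while the separable computation evaluates only $4 \cdot 3 = 12$ terms.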
It should be said at this point that from a practical point of view it typically does not make sense to try to solve a stochastic programming problem with high precision. A numerical error resulting from an inaccurate estimation of the involved probability distributions, modeling errors, etc., can be far bigger than such an optimization error. We argue now that two-stage stochastic problems can be solved efficiently with a reasonable accuracy provided that the following conditions are met:
(a) The feasible set $X$ is fixed (deterministic).
(b) For all $x \in X$ and $\xi \in \Xi$ the objective function $F(x,\xi)$ is real valued.
(c) The considered stochastic programming problem can be solved efficiently (by a deterministic algorithm) if the number of scenarios is not "too large".
When applied to two-stage stochastic programming, the above conditions (a) and (b) mean that the recourse is relatively complete$^6$ and the second stage problem is bounded from below. The above condition (c) certainly holds in the case of two-stage linear stochastic programming with recourse.
$^6$ It is said that the recourse is relatively complete if for every $x \in X$ and every possible realization of random data, the second stage problem is feasible.

In order to proceed let us consider the following Monte Carlo sampling approach. Suppose that we can generate an iid (independent identically distributed) random sample $\xi^1,\dots,\xi^N$ of $N$ realizations of the considered random vector. Then we can estimate the expected value function $f(x)$ by the sample average$^7$
$$\hat f_N(x) := \frac{1}{N}\sum_{j=1}^{N} F(x,\xi^j). \tag{6}$$
Consequently, we approximate the true problem (1) by the problem:
$$\operatorname*{Min}_{x \in X}\ \hat f_N(x). \tag{7}$$
We refer to (7) as the Sample Average Approximation (SAA) problem. The optimal value $\hat v_N$ and the set $\hat S_N$ of optimal solutions of the SAA problem (7) provide estimates of their true counterparts of problem (1). It should be noted that once the sample is generated, $\hat f_N(x)$ becomes a deterministic function and problem (7) becomes a stochastic programming problem with $N$ scenarios $\xi^1,\dots,\xi^N$ taken with equal probabilities $1/N$. It also should be mentioned that the SAA method is not an algorithm. One still has to solve the obtained problem (7) by employing an appropriate (deterministic) algorithm.
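The construction of the SAA problem can be sketched in a few lines of Python. The example below is an illustration under assumed data, not the authors' experiment: the integrand is a one dimensional simple recourse cost with assumed coefficients 1 and 3, $\xi$ is assumed uniform on $[0,1]$, the feasible set is replaced by a finite grid, and the "deterministic algorithm" is plain enumeration of that grid. All names are hypothetical.

```python
import random

def F(x, xi, q_plus=1.0, q_minus=3.0):
    """Assumed integrand: q_plus * [x - xi]_+ + q_minus * [xi - x]_+."""
    return q_plus * max(x - xi, 0.0) + q_minus * max(xi - x, 0.0)

def saa_objective(x, sample):
    """Sample average (6): the mean of F(x, xi^j) over the generated sample."""
    return sum(F(x, xi) for xi in sample) / len(sample)

def solve_saa(candidates, sample):
    """Solve the SAA problem (7) over a finite candidate set by enumeration."""
    return min(candidates, key=lambda x: saa_objective(x, sample))

rng = random.Random(0)                        # fixed seed for reproducibility
sample = [rng.random() for _ in range(2000)]  # iid sample xi^1, ..., xi^N
candidates = [i / 100 for i in range(101)]    # grid standing in for X = [0, 1]

x_hat = solve_saa(candidates, sample)         # SAA optimal solution
v_hat = saa_objective(x_hat, sample)          # SAA optimal value
```

For this assumed model the true objective is $f(x) = x^2/2 + 3(1-x)^2/2$, minimized at $x = 0.75$ with optimal value $0.375$, so `x_hat` and `v_hat` can be compared against known true counterparts.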
By the Law of Large Numbers we have that $\hat f_N(x)$ converges (pointwise in $x$) w.p.1 to $f(x)$ as $N$ tends to infinity. Therefore it is reasonable to expect $\hat v_N$ and $\hat S_N$ to converge to their counterparts of the true problem (1) with probability one (w.p.1) as $N$ tends to infinity. And, indeed, such convergence can be proved under mild regularity conditions. However, for a fixed $x \in X$, convergence of $\hat f_N(x)$ to $f(x)$ is notoriously slow. By the Central Limit Theorem it is of order $O_p(N^{-1/2})$. The rate of convergence can be improved, sometimes significantly, by variance reduction methods. However, by using Monte Carlo (Quasi-Monte Carlo) techniques one cannot evaluate the expected value $f(x)$ very accurately.
The following analysis is based on exponential bounds of the Large Deviations (LD) theory (see, e.g., [DZ98] for a general discussion of LD theory). Denote by $S^\varepsilon$ and $\hat S_N^\varepsilon$ the sets of $\varepsilon$-optimal solutions of the true and SAA problems, respectively, i.e., $x \in S^\varepsilon$ iff $x \in X$ and $f(x) \le \inf_{x' \in X} f(x') + \varepsilon$. Choose accuracy constants $\varepsilon > 0$ and $0 \le \delta < \varepsilon$, and significance level $\alpha \in (0,1)$. Suppose for the moment that the set $X$ is finite, although its cardinality $|X|$ can be very large. Then by using Cramér's LD theorem it is not difficult to show that the sample size
$$N \ge \frac{1}{\eta(\varepsilon,\delta)} \log\left(\frac{|X|}{\alpha}\right) \tag{8}$$
guarantees that the probability of the event $\{\hat S_N^\delta \subset S^\varepsilon\}$ is at least $1 - \alpha$ (see [KSH01], [Sha03b, section 3.1]). That is, for any $N$ bigger than the right hand side of (8) we are guaranteed that any $\delta$-optimal solution of the corresponding SAA problem provides an $\varepsilon$-optimal solution of the true problem with probability at least $1 - \alpha$; in other words, solving the SAA problem with accuracy $\delta$ guarantees solving the true problem with accuracy $\varepsilon$ with probability at least $1 - \alpha$.
$^7$ In order to simplify notation we only write in the subscript the sample size $N$, while actually $\hat f_N(\cdot)$ depends on the generated sample, and in that sense is random.
The number $\eta(\varepsilon,\delta)$ in the estimate (8) is defined as follows. Consider a mapping $u : X \setminus S^\varepsilon \to X$ such that $f(u(x)) \le f(x) - \varepsilon$ for all $x \in X \setminus S^\varepsilon$. Such mappings do exist, although they are not unique. For example, any mapping $u : X \setminus S^\varepsilon \to S$ satisfies this condition. Choice of such a mapping gives a certain flexibility to the corresponding estimate of the sample size. For $x \in X$, consider the random variable
$$Y_x := F(u(x),\xi) - F(x,\xi),$$
its moment generating function $M_x(t) := \mathbb{E}\left[e^{tY_x}\right]$ and the LD rate function$^8$
$$I_x(z) := \sup_{t \in \mathbb{R}} \{tz - \log M_x(t)\}.$$
Note that, by construction of the mapping $u(x)$, the inequality
$$\mu_x := \mathbb{E}[Y_x] = f(u(x)) - f(x) \le -\varepsilon \tag{9}$$
holds for all $x \in X \setminus S^\varepsilon$. Finally, we define
$$\eta(\varepsilon,\delta) := \min_{x \in X \setminus S^\varepsilon} I_x(-\delta). \tag{10}$$

Because of (9) and since $\delta < \varepsilon$, the number $I_x(-\delta)$ is positive provided that the probability distribution of $Y_x$ is not "too bad". Specifically, if we assume that the moment generating function $M_x(t)$ of $Y_x$ is finite valued for all $t$ in a neighborhood of 0, then the random variable $Y_x$ has finite moments and $I_x(\mu_x) = I_x'(\mu_x) = 0$, and $I_x''(\mu_x) = 1/\sigma_x^2$, where $\sigma_x^2 := \mathrm{Var}[Y_x]$. Consequently, $I_x(-\delta)$ can be approximated, by using the second order Taylor expansion, as follows
$$I_x(-\delta) \approx \frac{(-\delta - \mu_x)^2}{2\sigma_x^2} \ge \frac{(\varepsilon - \delta)^2}{2\sigma_x^2}.$$
This suggests that one can expect the constant $\eta(\varepsilon,\delta)$ to be of order of $(\varepsilon - \delta)^2$. And, indeed, this can be ensured by various conditions. Consider the following condition.
(A1) There exists a constant $\sigma > 0$ such that for any $x', x \in X$, the moment generating function $M^*(t)$ of $F(x',\xi) - F(x,\xi) - \mathbb{E}[F(x',\xi) - F(x,\xi)]$ satisfies:
$$M^*(t) \le \exp\left(\tfrac{1}{2}\sigma^2 t^2\right), \quad \forall t \in \mathbb{R}. \tag{11}$$
Note that the random variable $F(x',\xi) - F(x,\xi) - \mathbb{E}[F(x',\xi) - F(x,\xi)]$ has zero mean. Moreover, if it has a normal distribution with variance $\sigma^2$, then its moment generating function is equal to the right hand side of (11). Condition (11) means that tail probabilities $\mathrm{Prob}(|F(x',\xi) - F(x,\xi)| > t)$ are bounded from above$^9$ by $O(1)\exp\left(-\frac{t^2}{2\sigma^2}\right)$. This condition certainly holds if the distribution of the considered random variable has a bounded support. For $x' = u(x)$, the random variable $F(x',\xi) - F(x,\xi)$ coincides with $Y_x$, and hence (11) implies that $M_x(t) \le \exp(\mu_x t + \sigma^2 t^2/2)$. It follows that
$$I_x(z) \ge \sup_{t \in \mathbb{R}} \left\{zt - \mu_x t - \sigma^2 t^2/2\right\} = \frac{(z - \mu_x)^2}{2\sigma^2}, \tag{12}$$
and hence for any $\varepsilon > 0$ and $\delta \in [0,\varepsilon)$:
$$\eta(\varepsilon,\delta) \ge \frac{(-\delta - \mu_x)^2}{2\sigma^2} \ge \frac{(\varepsilon - \delta)^2}{2\sigma^2}. \tag{13}$$
It follows that, under assumption (A1), the estimate (8) can be written as
$$N \ge \frac{2\sigma^2}{(\varepsilon - \delta)^2} \log\left(\frac{|X|}{\alpha}\right). \tag{14}$$
$^8$ That is, $I_x(\cdot)$ is the conjugate of the function $\log M_x(\cdot)$ in the sense of convex analysis.
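To get a feeling for the finite-set estimate $N \ge \frac{2\sigma^2}{(\varepsilon-\delta)^2}\log(|X|/\alpha)$, the following Python sketch evaluates its right hand side. It is only an illustration: the function name and the numerical inputs are assumptions, and the value of $\sigma$ in assumption (A1) is rarely known in practice.

```python
import math

def sample_size_finite(sigma, eps, delta, card_X, alpha):
    """Smallest integer N with N >= 2*sigma^2/(eps-delta)^2 * log(|X|/alpha)."""
    if not (0 <= delta < eps):
        raise ValueError("need 0 <= delta < eps")
    if not (0 < alpha < 1):
        raise ValueError("need 0 < alpha < 1")
    return math.ceil(2.0 * sigma**2 / (eps - delta) ** 2 * math.log(card_X / alpha))

# Assumed inputs: sigma = 1, eps = 0.1, delta = 0.05, alpha = 0.01.
n_million = sample_size_finite(1.0, 0.1, 0.05, 10**6, 0.01)    # |X| = 10^6
n_trillion = sample_size_finite(1.0, 0.1, 0.05, 10**12, 0.01)  # |X| = 10^12
```

Enlarging $|X|$ by a factor of $10^6$ less than doubles the required sample size, which reflects the logarithmic dependence of the estimate on the cardinality of $X$.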

Remark 1. Condition (11) can be replaced by a more general condition
$$M^*(t) \le \exp(\psi(t)), \quad \forall t \in \mathbb{R}, \tag{15}$$
where $\psi(t)$ is a convex even function with $\psi(0) = 0$. Then $\log M_x(t) \le \mu_x t + \psi(t)$ and hence $I_x(z) \ge \psi^*(z - \mu_x)$, where $\psi^*$ is the conjugate of the function $\psi$. It follows then that
$$\eta(\varepsilon,\delta) \ge \psi^*(-\delta - \mu_x) \ge \psi^*(\varepsilon - \delta). \tag{16}$$
For example, instead of assuming that the bound (11) holds for all $t \in \mathbb{R}$, we can assume that it holds for all $t$ in a finite interval $[-a,a]$, where $a > 0$ is a given constant. That is, we can take $\psi(t) := \frac{1}{2}\sigma^2 t^2$ if $|t| \le a$, and $\psi(t) := +\infty$ otherwise. In that case $\psi^*(z) = z^2/(2\sigma^2)$ for $|z| \le a\sigma^2$, and $\psi^*(z) = a|z| - \frac{1}{2}a^2\sigma^2$ for $|z| > a\sigma^2$.

Now let $X$ be a bounded, not necessarily finite, subset of $\mathbb{R}^n$ of diameter
$$D := \sup_{x',x \in X} \|x' - x\|.$$
Then for $r > 0$ we can construct a set $X_r \subset X$ such that for any $x \in X$ there is $x' \in X_r$ satisfying $\|x - x'\| \le r$, and $|X_r| = O(1)(D/r)^n$. Suppose that condition (A1) holds. Then by (14), for $\varepsilon' > \delta$, we can estimate the corresponding sample size required to solve the reduced optimization problem, obtained by replacing $X$ with $X_r$, as
$$N \ge \frac{2\sigma^2}{(\varepsilon' - \delta)^2}\left[n(\log D - \log r) + \log\left(O(1)/\alpha\right)\right]. \tag{17}$$
$^9$ By $O(1)$ we denote generic absolute constants.

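Suppose, further, that there exists a (measurable) function $\kappa : \Xi \to \mathbb{R}_+$ and $\gamma > 0$ such that
$$|F(x',\xi) - F(x,\xi)| \le \kappa(\xi)\|x' - x\|^\gamma \tag{18}$$
holds for all $x', x \in X$ and all $\xi \in \Xi$. It follows by (18) that
$$|\hat f_N(x') - \hat f_N(x)| \le N^{-1}\sum_{j=1}^{N} |F(x',\xi^j) - F(x,\xi^j)| \le \hat\kappa_N \|x' - x\|^\gamma, \tag{19}$$
where $\hat\kappa_N := N^{-1}\sum_{j=1}^{N} \kappa(\xi^j)$.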


Let us assume, further, the following:
(A2) The moment generating function $M_\kappa(t) := \mathbb{E}\left[e^{t\kappa(\xi)}\right]$ of $\kappa(\xi)$ is finite valued for all $t$ in a neighborhood of 0.
It follows then that the expectation $L := \mathbb{E}[\kappa(\xi)]$ is finite, and moreover, by Cramér's LD Theorem, that for any $L' > L$ there exists a positive constant $\beta = \beta(L')$ such that
$$P(\hat\kappa_N > L') \le e^{-N\beta}. \tag{20}$$
Let $\hat x_N$ be a $\delta$-optimal solution of the SAA problem and $\bar x_N \in X_r$ be a point such that $\|\hat x_N - \bar x_N\| \le r$. Let us take $N \ge \beta^{-1}\log(2/\alpha)$, so that by (20) we have that
$$\mathrm{Prob}(\hat\kappa_N > L') \le \alpha/2. \tag{21}$$
Then with probability at least $1 - \alpha/2$, the point $\bar x_N$ is a $(\delta + L'r^\gamma)$-optimal solution of the reduced SAA problem. Setting
$$r := [(\varepsilon - \delta)/(2L')]^{1/\gamma},$$
we obtain that with probability at least $1 - \alpha/2$, the point $\bar x_N$ is an $\varepsilon'$-optimal solution of the reduced SAA problem with $\varepsilon' := (\varepsilon + \delta)/2$. Moreover, by taking a sample size satisfying (17), we obtain that $\bar x_N$ is an $\varepsilon'$-optimal solution of the reduced expected value problem with probability at least $1 - \alpha/2$. It follows that $\hat x_N$ is an $\varepsilon''$-optimal solution of the true problem (1) with probability at least $1 - \alpha$ and $\varepsilon'' = \varepsilon' + Lr^\gamma \le \varepsilon$. We obtain the following estimate

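$$N \ge \frac{8\sigma^2}{(\varepsilon - \delta)^2}\left[n\left(\log D + \gamma^{-1}\log\frac{2L'}{\varepsilon - \delta}\right) + \log\left(\frac{O(1)}{\alpha}\right)\right] \bigvee \left[\beta^{-1}\log(2/\alpha)\right] \tag{22}$$
for the sample size (cf. [Sha03b, section 3.2]).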
The above result is quite general and does not involve the assumption of convexity. Estimate (22) of the sample size contains various constants and is too conservative for practical applications. However, in a sense, it gives an estimate of complexity of two-stage stochastic programming problems. We will discuss this in the next section. In typical applications (e.g., in the convex case) the constant $\gamma = 1$, in which case condition (18) means that $F(\cdot,\xi)$ is Lipschitz continuous on $X$ with constant $\kappa(\xi)$. However, there are also some applications where $\gamma$ could be less than 1 (cf. [Sha05a]).
We obtain the following basic positive result.

Theorem 1. Suppose that assumptions (A1) and (A2) hold and $X$ has a finite diameter $D$. Then for $\varepsilon > 0$, $0 \le \delta < \varepsilon$ and sample size $N$ satisfying (22), we are guaranteed that any $\delta$-optimal solution of the SAA problem is an $\varepsilon$-optimal solution of the true problem with probability at least $1 - \alpha$.

Let us also consider the following simplified variant of Theorem 1. Suppose that:
(A3) There is a positive constant $C$ such that $|F(x',\xi) - F(x,\xi)| \le C$ for all $x', x \in X$ and $\xi \in \Xi$.
Under assumption (A3) we have that for any $\varepsilon > 0$ and $\delta \in [0,\varepsilon]$:
$$I_x(-\delta) \ge O(1)\frac{(\varepsilon - \delta)^2}{C^2}, \quad \text{for all } x \in X \setminus S^\varepsilon,\ \xi \in \Xi, \tag{23}$$
and hence $\eta(\varepsilon,\delta) \ge O(1)(\varepsilon - \delta)^2/C^2$. Consequently, the bound (8) for the sample size which is required to solve the true problem with accuracy $\varepsilon > 0$ and probability at least $1 - \alpha$, by solving the SAA problem with accuracy $\delta := \varepsilon/2$, takes the form
$$N \ge O(1)\left(\frac{C}{\varepsilon}\right)^2 \log\left(\frac{|X|}{\alpha}\right). \tag{24}$$
The estimate (24) can also be derived by using Hoeffding's inequality$^{10}$ instead of Cramér's LD bound.
In particular, if we assume that $\gamma = 1$ and $\kappa(\xi) = L$ for all $\xi \in \Xi$, i.e., $F(\cdot,\xi)$ is Lipschitz continuous on $X$ with constant $L$ independent of $\xi \in \Xi$, then we can take $C = DL$ and remove the term $\beta^{-1}\log(2/\alpha)$ in the right hand side of (22). By taking, further, $\delta := \varepsilon/2$ we obtain in that case the following estimate of the sample size
$$N \ge O(1)\left(\frac{DL}{\varepsilon}\right)^2\left[n\log\left(\frac{DL}{\varepsilon}\right) + \log\left(\frac{O(1)}{\alpha}\right)\right]. \tag{25}$$

$^{10}$ Recall that Hoeffding's inequality states that if $Z_1,\dots,Z_N$ is an iid random sample from a distribution supported on a bounded interval $[a,b]$, then for any $t > 0$,
$$\mathrm{Prob}(\bar Z - \mu \ge t) \le \exp\left(-\frac{2t^2 N}{(b-a)^2}\right),$$
where $\bar Z$ is the sample average and $\mu = \mathbb{E}[Z_i]$.

We can write the following simplified version of Theorem 1.

Theorem 2. Suppose that $X$ has a finite diameter $D$ and condition (18) holds with $\gamma = 1$ and $\kappa(\xi) = L$ for all $\xi \in \Xi$. Then with sample size $N$ satisfying (25) we are guaranteed that every $(\varepsilon/2)$-optimal solution of the SAA problem is an $\varepsilon$-optimal solution of the true problem with probability at least $1 - \alpha$.
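The shape of the sample size estimate behind Theorem 2 can be explored numerically. The Python sketch below evaluates an expression of the form $c_1 (DL/\varepsilon)^2 [\,n \log(DL/\varepsilon) + \log(c_2/\alpha)\,]$. The text leaves the absolute constants unspecified, so the parameters `c_outer` and `c_inner` (both set to 1 here) are placeholders assumed for illustration only; the resulting numbers show how the bound scales and are not usable sample sizes.

```python
import math

def sample_size_lipschitz(n, D, L, eps, alpha, c_outer=1.0, c_inner=1.0):
    """Evaluate c_outer * (DL/eps)^2 * [n*log(DL/eps) + log(c_inner/alpha)].

    c_outer and c_inner stand in for the unspecified O(1) constants
    (assumed values); the formula is meaningful for DL/eps > 1.
    """
    rel = D * L / eps
    if rel <= 1.0:
        raise ValueError("expected D*L/eps > 1")
    return math.ceil(c_outer * rel**2 * (n * math.log(rel) + math.log(c_inner / alpha)))

# Assumed data: n = 10, D = L = 1.
n_coarse = sample_size_lipschitz(10, 1.0, 1.0, 0.1, 0.01)      # eps = 0.1
n_fine = sample_size_lipschitz(10, 1.0, 1.0, 0.01, 0.01)       # eps = 0.01
n_reliable = sample_size_lipschitz(10, 1.0, 1.0, 0.1, 1e-12)   # alpha = 1e-12
```

With these placeholder constants, tightening $\varepsilon$ by a factor of 10 multiplies the estimate by more than 100, whereas lowering $\alpha$ from $10^{-2}$ to $10^{-12}$ less than doubles it.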

In the next section we compare complexity estimates implied by the bound (25) with complexity of "deterministic" convex programming.

3 What is easy and what is difficult in stochastic programming?
Since, generically, nonconvex problems are difficult already in the deterministic case, when discussing the question of what is easy and what is not in Stochastic Programming, it makes sense to restrict ourselves to convex problems (1). Thus, in the sequel it is assumed by default that $X$ is a closed and bounded convex set, and $f : X \to \mathbb{R}$ is convex. These assumptions, plus mild technical conditions, would be sufficient to make (1) easy, provided that $f(x)$ were given explicitly, but the latter is not what we assume in SP. What we usually (and everywhere below) do assume in SP is that:
(i) The function $F(x,\xi)$ is given explicitly, so that we can compute efficiently its value (and perhaps the derivatives in $x$) at every given pair $(x,\xi) \in X \times \Xi$.
(ii) We have access to a mechanism which is capable of sampling from the distribution $P$, that is, we can generate a sample $\xi^1, \xi^2, \dots$ of independent realizations of $\xi$.
For the sake of the discussion to follow we assume in this section that we are under the premise of Theorem 2 and that problem (1) is convex. To proceed, let us compare the complexity bound given by Theorem 2 with a typical result on the "black box" complexity of the usual (deterministic) Convex Programming.

Theorem 3. Consider a convex problem
$$\operatorname*{Min}_{x \in X}\ f(x), \tag{CP}$$
where $X \subset \mathbb{R}^n$ is a closed convex set which is contained in a ball of diameter $D$ centered at the origin and contains a ball of given diameter $d > 0$, and $f : X \to \mathbb{R}$ is convex and Lipschitz continuous with constant $L$. Assume that $X$ is given by a Separation Oracle which, given on input a point $x \in \mathbb{R}^n$, reports whether $x \in X$, and if it is not the case, returns $e \in \mathbb{R}^n$ which separates $x$ and $X$, i.e., such that $\langle e, x\rangle > \max_{y \in X}\langle e, y\rangle$. Assume, further, that $f$ is given by a First Order oracle which, given on input $x \in X$, returns on output the value $f(x)$ and a subgradient $\nabla f(x)$, $\|\nabla f(x)\|_2 \le L$, of $f$ at $x$.
In this framework, for every $\varepsilon > 0$ one can find an $\varepsilon$-solution to (CP) by an algorithm which requires at most
$$M = O(1)\,n^2\left[\log\left(\frac{DL}{\varepsilon}\right) + \log\left(\frac{D}{d}\right)\right] \tag{26}$$
calls to the Separation and First Order oracles, with a call accompanied by $O(n^2)$ arithmetic operations to process the oracle's answer.

In our context, the role of Theorem 3 is twofold. First, it can be viewed as a necessary follow-up to Theorem 2, which reduces solving (1) to solving the corresponding SAA problem and says nothing on how difficult the latter task is. This question is answered by Theorem 3 in the convex case$^{11}$. However, the main role of Theorem 3 in our context is the one of a benchmark for the SP complexity results. Let us use this benchmark to evaluate the result stated in Theorem 2.

Observation 1. In contrast to Theorem 3, Theorem 2 provides us with no more than probabilistic quality guarantees. That is, the random approximate solution to (1) implied by the outlined SAA approach, being an $\varepsilon$-solution to (1) with probability $1 - \alpha$, can be very bad with the remaining probability $\alpha$. In our "black box" informational environment (the distribution of $\xi$ is not given in advance, all we have is an access to a black box generating independent realizations of $\xi$), this "shortcoming" is unavoidable. Note, however, that the sample size $N$ as given by (22) is "nearly independent" of $\alpha$, i.e., to reduce unreliability from $10^{-2}$ to $10^{-12}$ requires at most a 6-fold increase in the sample size. Note that unreliability as small as $10^{-12}$ is, for all practical purposes, the same as 100% reliability.
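The 6-fold figure can be checked by a short computation. As a simplifying reading (an assumption made only for this check), write the dependence of the sample size bound on $\alpha$ as $N(\alpha) = c\,[A + \log(1/\alpha)]$, where $c > 0$ and $A \ge 0$ are shorthand for the quantities that do not depend on $\alpha$. Then:

```latex
\frac{N(10^{-12})}{N(10^{-2})}
  = \frac{A + 12\log 10}{A + 2\log 10}
  \le \frac{12\log 10}{2\log 10}
  = 6 ,
\qquad \text{since } A \ge 0 .
```

The term $\beta^{-1}\log(2/\alpha)$ behaves in the same way, with $A = \log 2$.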

Observation 2. To proceed with our comparison, it makes sense to measure the complexity of the SAA method merely by the number of scenarios $N$ required to get an $\varepsilon$-solution with probability at least $1 - \alpha$, and to measure the complexity of deterministic convex optimization as presented in Theorem 3 by the number $M$ of oracle calls required to get an $\varepsilon$-solution. The rationale behind this is that a "very large" $N$ definitely makes the SAA method impractical, while with a "moderate" $N$ the method becomes practical, provided that $F(\cdot,\cdot)$ and $X$ are not too complicated, and similarly for $M$ in the context of Theorem 3.

When comparing bounds (25) and (26), our first observation is that both of them depend polynomially on the design dimension $n$ of the problem, which is nice. What does make a difference between these bounds is their dependence on the required accuracy $\varepsilon$, or, better to say, on the relative accuracy$^{12}$ $\nu := \varepsilon/(DL)$. In contrast to bound (26), which is polynomial in $\log(1/\nu)$, bound (25) is polynomial (specifically, quadratic) in $1/\nu$. In reality this means that the SAA method could solve in a reasonable time, to a moderate relative accuracy like $\nu = 10\%$ or even $\nu = 1\%$, stochastic problems involving an astronomically large, or even infinite, number of scenarios. This was verified in a number of numerical experiments (e.g., [LSW05, MMW99, SAGS05, VAKNS03]). On the other hand, in general, the SAA method does not allow one to solve even simple-looking problems to high relative accuracy$^{13}$: according to (25), the estimated sample size $N$ required to achieve $\nu = 10^{-3}$ ($\nu = 10^{-5}$) is at least of order of millions (respectively, tens of billions). In sharp contrast to this, bound (26) says that in the deterministic case, relative accuracy $\nu = 10^{-5}$ is just by factor 5 "more costly" than $\nu = 0.1$.
$^{11}$ In our context, Theorem 3 allows one to handle the most general "black box" situation: no assumptions on $F(\cdot,\xi)$ and $X$ except for convexity and computability. When $F(\cdot,\xi)$ possesses appropriate analytic structure, the complexity of solving the SAA problem can be reduced by using a solver adjusted to this structure.
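The orders of magnitude quoted above follow from the dominant accuracy factors alone: $(1/\nu)^2$ for the sampling bound and $\log(1/\nu)$ for the deterministic bound. The Python sketch below (hypothetical helper names; absolute constants, the dimension $n$ and the logarithmic terms of the sampling bound are deliberately ignored) reproduces them.

```python
import math

def saa_accuracy_factor(nu):
    """Dominant accuracy dependence of the sampling bound: (1/nu)^2."""
    return (1.0 / nu) ** 2

def cp_accuracy_factor(nu):
    """Dominant accuracy dependence of the deterministic bound: log(1/nu)."""
    return math.log(1.0 / nu)

millions = saa_accuracy_factor(1e-3)          # about 10^6
tens_of_billions = saa_accuracy_factor(1e-5)  # about 10^10
cost_ratio = cp_accuracy_factor(1e-5) / cp_accuracy_factor(0.1)  # about 5
```

Going from $\nu = 0.1$ to $\nu = 10^{-5}$ multiplies the sampling factor by $10^8$, but the deterministic factor only by 5.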
It should be stressed that in our general setting the outlined phenomenon is not a shortcoming of the SAA method: it is unavoidable. Indeed, given positive constants $L$, $D$ and $\varepsilon$ such that $\nu = \varepsilon/(LD) \le 0.1$, consider the pair of stochastic problems:
$$\operatorname*{Min}_{x \in [0,D]}\ \left\{f_\chi(x) := \mathbb{E}_{P_\chi}[\xi x]\right\} \tag{SP$_\chi$}$$
indexed by $\chi = \pm 1$, and with distribution $P_\chi$ of $\xi$ supported on the two-point set $\{-L; L\}$ on the axis. Specifically, $P_1$ assigns the mass $1/2 - 4\nu$ to the point $-L$ and the mass $1/2 + 4\nu$ to the point $L$, while $P_{-1}$ assigns to the same points $-L$, $L$ the masses $1/2 + 4\nu$, $1/2 - 4\nu$, respectively. Of course, $f_1(x) = 8\nu L x = 8\varepsilon D^{-1}x$, $f_{-1}(x) = -8\varepsilon D^{-1}x$, the solution to (SP$_1$) is $x_1 = 0$, while the solution to (SP$_{-1}$) is $x_{-1} = D$. Note, however, that the situation is that trivial only when we know in advance which distribution $P_\chi$ we deal with. If it is not the case and all we can see is a sample of $N$ independent realizations of $\xi$, the situation changes dramatically: an algorithm capable of solving with accuracy $\varepsilon$ and reliability $1 - \alpha = 0.9$ every one of the problems (SP$_{\pm 1}$) using a sample of size $N$ would, as a byproduct, imply a procedure which, given the sample, decides, with the same reliability, which one of the two possible distributions $P_{\pm 1}$ underlies the sample. The laws of Statistics say that such a reliable identification of the underlying distribution is possible only when $N \ge O(1)\nu^{-2}$ (compare with bound (25)). Note that both stochastic problems in question satisfy all the assumptions in Theorem 2, so that in the situation considered in this statement the bound (25) is the best possible (up to a logarithmic term) as far as the dependence on $D$, $L$ and $\varepsilon$ is concerned.
$^{12}$ Recall that, under the assumptions of Theorem 2, $DL$ gives an upper bound on the variation of the objective on the feasible domain. While using bound (22) we can take $\nu := \varepsilon/\sigma$. Passing from $\varepsilon$ to $\nu$ means quantifying inaccuracies as fractions of the variation, which is quite natural.
$^{13}$ It is possible to solve the true problem (1) by the SAA method with high (machine) accuracy in some specific situations, for example, in some cases of linear two-stage stochastic programming with a finite (although very large) number of scenarios, see [SH00, SHK02].
To make our presentation self-contained, we explain here what are the "laws of Statistics" which underlie the above conclusions. First, an algorithm $\mathcal{A}$ capable of solving within accuracy $\varepsilon$ and reliability 0.9 every one of the problems (SP$_{\pm 1}$), given an $N$-element sample drawn from the corresponding distribution, indeed implies a "0.9-reliable" procedure which decides, based on the same sample, what is the distribution; this procedure accepts hypothesis I stating that the sample is drawn from distribution $P_1$ if and only if the approximate solution generated by $\mathcal{A}$ is in $[0, D/2]$; if it is not the case, the procedure accepts hypothesis II "the sample is drawn from $P_{-1}$". Note that if the first of the hypotheses is true and the outlined procedure accepts the second one, the approximate solution produced by $\mathcal{A}$ is not an $\varepsilon$-solution to (SP$_1$), so that the probability $p^{I}$ to accept the second hypothesis when the first is true is $\le 1 - 0.9 = 0.1$. Similarly, the probability $p^{II}$ for the procedure to accept the first hypothesis when the second is true is $\le 0.1$. The announced lower bound on $N$ is given by the following observation: Consider a decision rule which, given on input a sequence $\xi^N$ of $N$ independent realizations of $\xi$ known in advance to be drawn either from the distribution $P_1$, or from the distribution $P_{-1}$, decides which one of these two options takes place, and let $p^{I}$, $p^{II}$ be the associated probabilities of wrong decisions. Then
$$\max[p^{I}, p^{II}] \le 0.1 \ \text{ implies that } \ N \ge O(1)\nu^{-2}, \tag{27}$$
where $O(1)$ is a positive absolute constant.


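Indeed, a candidate decision rule can be identified with a subset $S$ of the set $\{-L; L\}^N$ of all possible samples; this set is comprised of all realizations $\xi^N$ resulting, via the decision rule in question, in acceptance of hypothesis I. Let $P_1^N$, $P_{-1}^N$ be the distributions of $\xi^N$ corresponding to hypotheses I, II. We clearly have
$$p^{I} = P_1^N(\xi^N \notin S), \qquad p^{II} = P_{-1}^N(\xi^N \in S).$$
Now consider the Kullback distance from $P_1^N$ to $P_{-1}^N$:
$$\mathcal{K} = \sum_{\xi^N} P_1^N(\xi^N)\log\frac{P_1^N(\xi^N)}{P_{-1}^N(\xi^N)}.$$
The function $p\log\frac{p}{q}$ of two positive variables $p, q$ is jointly convex; denoting by $\bar S$ the complement of $S$ in the set of all possible samples and by $|A|$ the cardinality of a finite set $A$, it follows that
$$\sum_{\xi^N \in S} P_1^N(\xi^N)\log\frac{P_1^N(\xi^N)}{P_{-1}^N(\xi^N)} \ge |S|\left[\frac{P_1^N(S)}{|S|}\log\frac{P_1^N(S)/|S|}{P_{-1}^N(S)/|S|}\right],$$
whence
$$\sum_{\xi^N \in S} P_1^N(\xi^N)\log\frac{P_1^N(\xi^N)}{P_{-1}^N(\xi^N)} \ge (1 - p^{I})\log\frac{1 - p^{I}}{p^{II}},$$
and similarly
$$\sum_{\xi^N \in \bar S} P_1^N(\xi^N)\log\frac{P_1^N(\xi^N)}{P_{-1}^N(\xi^N)} \ge p^{I}\log\frac{p^{I}}{1 - p^{II}},$$
whence
$$(1 - p^{I})\log\frac{1 - p^{I}}{p^{II}} + p^{I}\log\frac{p^{I}}{1 - p^{II}} \le \mathcal{K}.$$
For every $p \in (0, 1/2)$, the minimum of the left hand side in the latter inequality in $p^{I}, p^{II} \in (0, p]$ is achieved when $p^{I} = p^{II} = p$ and is equal to $p\log\frac{p}{1-p} + (1 - p)\log\frac{1-p}{p} \ge 4(p - 1/2)^2$. Thus,
$$p := \max[p^{I}, p^{II}] \le 1/2 \ \text{ implies that } \ \mathcal{K} \ge (2p - 1)^2. \tag{28}$$
On the other hand, taking into account the product structure of $P_{\pm 1}^N$, we have
$$\mathcal{K} = N\left[P_1(-L)\log\frac{P_1(-L)}{P_{-1}(-L)} + P_1(L)\log\frac{P_1(L)}{P_{-1}(L)}\right] = 8\nu N\log\frac{1 + 8\nu}{1 - 8\nu}.$$
The concluding quantity is $\le O(1)N\nu^2$, provided that $\nu \le 0.1$. Combining this observation and (28), we arrive at (27).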
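The last two steps of this argument can be checked numerically. The Python sketch below (function names are hypothetical) computes the Kullback distance per observation between the two-point distributions with masses $1/2 \mp 4\nu$ and $1/2 \pm 4\nu$, and the smallest $N$ compatible with the requirement that $N$ times this distance be at least $(2p-1)^2$. The numerical constants 128 and 176 in the comments are computed here for illustration and do not appear in the text.

```python
import math

def kl_two_point(nu):
    """Kullback distance per observation between P_1 and P_{-1}.

    P_1 has masses (1/2 - 4*nu, 1/2 + 4*nu) on (-L, L); P_{-1} swaps them.
    Requires 0 < nu < 1/8 so that both masses are positive.
    """
    lo, hi = 0.5 - 4.0 * nu, 0.5 + 4.0 * nu
    return lo * math.log(lo / hi) + hi * math.log(hi / lo)

def min_sample_size(nu, p=0.1):
    """Smallest N with N * kl_two_point(nu) >= (2p - 1)^2."""
    return math.ceil((2.0 * p - 1.0) ** 2 / kl_two_point(nu))

# For 0 < nu <= 0.1 the distance stays between 128*nu^2 and 176*nu^2,
# so the smallest admissible N grows like nu^(-2).
n_at_1_percent = min_sample_size(0.01)
n_at_01_percent = min_sample_size(0.001)
```

Reducing $\nu$ by a factor of 10 raises the smallest admissible $N$ by a factor of about 100, in line with the $\nu^{-2}$ lower bound.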

Observation 3. One can argue that the phenomenon discussed in Observation 2 is not too dangerous from the practical viewpoint. In reality, especially in an "uncertain one" treated in stochastic models, relative accuracy like 1% or 5% is more than satisfactory. This indeed is true in numerous applications, which, in our opinion, is the intrinsic reason for Stochastic Programming to be of significant practical value. At the same time, there are some unpleasant exceptions; the most disturbing, from the applied viewpoint, is the one related to problems without relatively complete recourse. This is the issue we consider next.
The above analysis, summarized in Theorem 2, implicitly depends on the assumptions (i) and (ii) formulated in the beginning of this section (which are parallel to the assumptions (a)-(c) specified in the previous section). When applied to two-stage stochastic programming with recourse these assumptions imply that the recourse is relatively complete, i.e., for every $x \in X$ and every possible realization of $\xi$, the second stage problem is feasible. If, on the other hand, for some $x \in X$ and $\xi \in \Xi$ the second stage problem is infeasible, we can formally set the value $F(x,\xi)$ of the second stage problem to be $+\infty$. In order to avoid such infinite penalizations and to restore the applicability of Theorem 2 one can introduce a finite penalty for infeasibility. In some cases this can reasonably solve the problem. However, in some situations the infeasibility may result in a catastrophic event. In that case the penalty could be huge. Translated into the sample size bounds considered in the previous section, this means huge variances in the estimate (22) or a huge Lipschitz constant in (25), which makes these estimates useless. In a sense, in such a situation "nothing works".
It is NP-hard even to check whether a given first-stage decision $x \in X$ leads to a feasible, with probability 1, second-stage problem, even in the case when the second-stage problem is as simple as
$$\operatorname*{Min}_{y}\ \langle q, y\rangle \ \ \text{subject to}\ \ Tx + Wy \ge h, \tag{29}$$
with only the second-stage right hand side vector $h = h(\xi)$ being random.
To see that a generic problem of checking whether (29) is feasible for a given $x$ is NP-hard, consider the case when the constraints $Tx + Wy \ge h(\xi)$ read $y \le 0$, $y + x \ge h(\xi)$, where $x, y \in \mathbb{R}$,
$$h(\xi) = \langle \xi, Q\xi\rangle,$$
$Q = [Q_{ij}]$ is a given $d \times d$ symmetric matrix, and $\xi = (\xi_1,\dots,\xi_d)$ is uniformly distributed in $[-1,1]^d$. Here $x$ results in a feasible, with probability 1, second stage problem if and only if $x \ge \rho(Q)$, where
$$\rho(Q) := \max\left\{\langle \xi, Q\xi\rangle : \xi \in [-1,1]^d\right\}.$$
It is well known that, given $x$ and $Q$, it is NP-hard to distinguish between the cases of $x < \rho(Q)$ and $x > 1.01\,\rho(Q)$. This NP-hard problem is, of course, not more difficult than deciding whether $x \ge \rho(Q)$. Note that replacing in the above example the uniform distribution on $[-1,1]^d$ with the uniform distribution on the discrete set, of cardinality $2^d$, of $d$-dimensional vectors with entries $\pm 1$, we end up with an equally difficult problem.
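For the discrete variant of this example (entries of $\xi$ equal to $\pm 1$), the feasibility check can be written down directly, which makes the source of difficulty visible: the only obvious exact method enumerates all $2^d$ sign vectors. The Python sketch below is an illustration with hypothetical function names and an assumed test matrix; it is a brute-force check, not an efficient algorithm.

```python
import itertools

def quad_form(Q, xi):
    """<xi, Q xi> for a square matrix Q given as a list of rows."""
    d = len(xi)
    return sum(Q[i][j] * xi[i] * xi[j] for i in range(d) for j in range(d))

def rho_sign_vectors(Q):
    """Maximum of <xi, Q xi> over all 2**d vectors xi with entries +1 or -1."""
    d = len(Q)
    return max(quad_form(Q, xi) for xi in itertools.product((-1, 1), repeat=d))

def first_stage_feasible(x, Q):
    """True iff y <= 0, y + x >= <xi, Q xi> is solvable for every sign vector xi."""
    return x >= rho_sign_vectors(Q)

# Assumed 3 x 3 symmetric test matrix.
Q = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
rho = rho_sign_vectors(Q)
```

For this matrix $\langle \xi, Q\xi\rangle = 6 - 2\xi_1\xi_2 - 2\xi_2\xi_3$, so the maximum equals 10 and is attained at $\xi = (1,-1,1)$. The running time of the enumeration doubles with every additional component of $\xi$.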
Thus, if a two-stage (linear) problem has no relatively complete recourse (which in many applications is a rule rather than an exception), it is, in general, NP-hard just to find a feasible first-stage solution $x$ (one which results in finite $f(x)$), not to speak of minimizing over these $x$'s. As was mentioned above, the standard way to avoid, to some extent, this difficulty is to pass to a penalized problem. For example, we can replace the second stage problem (4) with the penalized version:
$$\operatorname*{Min}_{y \ge 0,\ z \ge 0}\ \langle q, y\rangle + rz \ \ \text{subject to}\ \ Tx + Wy \ge h - ze, \tag{30}$$
where $e$ is the vector of ones and $r \gg 1$ plays the role of the penalty coefficient. With this penalization, the second stage problem becomes always feasible. At the same time, one can hope that with a large enough penalty coefficient $r$, the first-stage optimal solution will lead to "nearly always nearly feasible" second-stage problems, provided that the original problem is feasible. Unfortunately, in the situation where one cannot tolerate arising, with probability bigger than $\alpha$, a second-stage infeasibility $z$ bigger than $\tau$ (here $\alpha$ and $\tau$ are given thresholds), the penalty parameter $r$ should be of order of $(\alpha\tau)^{-1}$. In the "high reliability" case $\alpha \ll 1$ we end up with problem (30) which contains large coefficients, which can lead to a large value of the Lipschitz constant $L_r$ of the optimal value function $F_r(\cdot,\xi)$ of the penalized second stage problem. As a result, quite moderate accuracy requirements (like $\varepsilon$ being of order of 5% of the optimal value of the true problem) can result in the necessity to solve (30)
within a pretty high relative accuracy u = e/{DLr) like 10~^ or less, with all
unpleasant consequences of this necessity.

3.1 What is difficult in the two-stage case?

We already know a partial answer to this question: generically, under the premise of Theorem 2 it is difficult to solve problem (1) (even a convex one) to a high relative accuracy $\nu = \varepsilon/(DL)$. Note, however, that the statistical arguments demonstrating that this difficulty lies in the nature of the problem work only for the black-box setting of (1) considered so far, that is, only in the case when the distribution $P$ of $\xi$ is not known in advance, and all we have at our disposal is a black box generating realizations of $\xi$. With a "good description" of $P$ available, the results could be quite different, as is clear when looking at problems (SP$_{\pm 1}$): with the underlying distributions given in advance, the problems become trivial. Note that in reality stochastic models are usually equipped with known in advance and easy-to-describe distributions, like Gaussian, or Bernoulli, or uniform on $[-1,1]^d$. Thus, it might happen that our conclusion "it is difficult to solve (1) to high accuracy" is an artifact coming from the black-box model we used, and we could overcome this difficulty by using more advanced solution techniques based on utilizing a given in advance and "simple" description of $P$. Unfortunately, this virtual possibility does not exist in reality. Specifically, it is shown in [DS03] that indeed it is difficult to solve to high accuracy already two-stage linear stochastic programs with complete recourse and easy-to-describe discrete distributions.
On Complexity of Stochastic Programming Problems 129

Another difficulty, which we have already discussed, is the case of two-stage
linear problems without complete recourse or, more generally, convex
problems (1) with only partially defined integrand $F(x,\xi)$. As we have seen,
this difficulty arises already when looking for feasible first-stage solutions with
a simple distribution $P$ known in advance.

3.2 Complexity of multi-stage stochastic problems

In a multi-stage stochastic programming setting, the random data $\xi$ is partitioned
into $T \ge 2$ blocks $\xi_t$, $t = 1, \dots, T$, i.e., $\xi_t$ is viewed as a (discrete time) random
process, and the decisions are made at time instants $0, 1, \dots, T$. At time $t$ the
decision maker already knows the realizations $\xi_\tau$, $\tau \le t$, of the process up to
time $t$, while realizations of the "future" blocks are still unknown. The goal is
to find the first-stage decisions $x$ (which should not depend on $\xi$) and decision
rules $y_t = y_t(\xi_{[t]})$, which are functions of $\xi_{[t]} := (\xi_1, \dots, \xi_t)$, $t = 1, \dots, T$, which
satisfy a given set of constraints

$$ g_i(\xi, x, y_1, \dots, y_T) \le 0, \quad i = 1, \dots, I, \tag{31} $$

and minimize under these restrictions the expected value of a given cost
function $f(x, y_1, \dots, y_T)$. Note that even in the case when the functions $g_i$ do not
depend on $\xi$, the left hand sides of the constraints (31) are functions of $\xi$,
since all $y_t$ are so, and that the interpretation of (31) is that these functional
constraints should be satisfied with probability one.
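In the sequel, we focus on the case of linear multi-stage problems

$$
\begin{array}{lll}
\mathrm{Min}_{x,\,y(\cdot)} & \mathbf{E}_P\Big\{\langle c_0, x\rangle + \sum_{t=1}^{T}\langle c_t(\xi_{[t]}), y_t(\xi_{[t]})\rangle\Big\} & \\
\text{s.t.} & A_0^0 x \ge b^0 & (C_0) \\
 & A_0^1(\xi_{[1]})x + A_1^1(\xi_{[1]})y_1(\xi_{[1]}) \ge b^1(\xi_{[1]}) & (C_1) \\
 & A_0^2(\xi_{[2]})x + A_1^2(\xi_{[2]})y_1(\xi_{[1]}) + A_2^2(\xi_{[2]})y_2(\xi_{[2]}) \ge b^2(\xi_{[2]}) & (C_2) \\
 & \qquad\vdots & \\
 & A_0^T(\xi_{[T]})x + A_1^T(\xi_{[T]})y_1(\xi_{[1]}) + \dots + A_T^T(\xi_{[T]})y_T(\xi_{[T]}) \ge b^T(\xi_{[T]}) & (C_T)
\end{array}
\tag{32}
$$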
where $y(\cdot) = (y_1(\cdot), \dots, y_T(\cdot))$ and the constraints $(C_1), \dots, (C_T)$ should be
satisfied with probability one. Problems (32) are called problems with complete
recourse if, for every instant $t$ and whatever decisions $x, y_1, \dots, y_{t-1}$ were made at
preceding instants, the system of constraints $(C_t)$ (treated as a system of
linear inequalities in the variable $y_t$) is feasible for almost all realizations of $\xi$.
The major focus of theoretical research is on multi-stage problems even
simpler than (32), specifically, on problems with fixed recourse, where the matrices
$A_t^t = A_t^t(\xi_{[t]})$, $t = 1, \dots, T$, are assumed to be deterministic (independent of $\xi$).
We argue that multi-stage problems, even linear ones of the form (32) with
complete recourse, generically are computationally intractable already when
medium-accuracy solutions are sought. (Of course, this does not mean
that some specific cases of multi-stage stochastic programming
130 A. Shapiro, A. Nemirovski

problems cannot be solved efficiently.) Note that this claim is a belief
rather than a statement which we can rigorously prove. It is not even a formal
statement which can be true or false since, in particular, we do not specify
what "medium accuracy" means¹⁴. What we are trying to say is that we
believe that in the multi-stage case (with $T$ treated as a varying parameter, and
not as a once and forever fixed entity), even "moderately positive" results like the
one stated in Theorem 2 are impossible. We are about to explain the
reasons for our belief.
Practitioners often do not pay attention to the dramatic difference between
the two-stage and the multi-stage case. It is argued that in both cases the problem
of interest can be written in the form of (1), with an appropriately defined
integrand $F$. Specifically, in the case of the linear two-stage problem with relatively
complete recourse, we have $F(x,\xi) = \langle c, x\rangle + Q(x,\xi)$, where $Q(x,\xi)$ is
the optimal value of the second-stage problem (4). In the case of problem (32)
with complete recourse, $F(x,\xi)$ is given by a recurrence as follows. We start
with setting

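$$
F_T(x, y_1, \dots, y_T, \xi_{[T]}) = \langle c_0, x\rangle + \langle c_1(\xi_{[1]}), y_1\rangle + \dots + \langle c_{T-1}(\xi_{[T-1]}), y_{T-1}\rangle + \langle c_T(\xi_{[T]}), y_T\rangle
$$

and specifying the conditional, given $\xi_{[T-1]}$, expected cost of the last-stage
problem:

$$
F_{T-1}(x, y_1, \dots, y_{T-1}, \xi_{[T-1]}) := \mathbf{E}_{|\xi_{[T-1]}}\Big\{\mathrm{Min}_{y_T}\big\{F_T(x, y_1, \dots, y_{T-1}, y_T, \xi_{[T]}) :
A_0^T(\xi_{[T]})x + A_1^T(\xi_{[T]})y_1 + \dots + A_T^T(\xi_{[T]})y_T \ge b^T(\xi_{[T]})\big\}\Big\},
$$

where $\mathbf{E}_{|\xi_{[T-1]}}$ is the conditional, given $\xi_{[T-1]}$, expectation. Observe that (32)
is equivalent to the $(T-1)$-stage problem: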

$$
\begin{array}{ll}
\mathrm{Min}_{x,\,\{y_t(\cdot)\}_{t=1}^{T-1}} & \mathbf{E}_{P^{T-1}}\big\{F_{T-1}(x, y_1, \dots, y_{T-1}, \xi_{[T-1]})\big\} \\
\text{s.t.} & x, y_1(\cdot), \dots, y_{T-1}(\cdot) \text{ satisfy } (C_0), (C_1), \dots, (C_{T-1}) \text{ w.p.1},
\end{array}
\tag{$P_{T-1}$}
$$

where $P^{T-1}$ is the distribution of $\xi_{[T-1]}$. Now we can iterate this construction,
ending up with the problem

$$ \mathrm{Min}\,[F_0(x)]. $$

It can be easily seen that under the assumption of complete recourse, plus
mild boundedness assumptions, all functions $F_t(x, y_1, \dots, y_t, \xi_{[t]})$ are Lipschitz
continuous in the $x$, $y$-arguments.
¹⁴ To the best of our knowledge, the complexity status of problem (32), even in
the case of complete and fixed recourse and known in advance easy-to-describe
distribution $P$, remains unknown (cf. [DS03]).

The "common wisdom" says that since both two-stage and multi-stage
problems are of the same generic form (1), with the integrand convex in $x$, and
both are processed numerically by generating a sample of scenarios and solving
the resulting "scenario counterpart" of the problem of interest, there should
not be much difference between the two-stage and the multi-stage case, provided that in
both cases one uses the same number of scenarios. This "reasoning", however,
completely ignores a crucial point: in order to solve the generated SAA
problems efficiently, the integrand $F$ should be efficiently computable at every
pair $(x, \xi)$. This is indeed the case for a two-stage problem, since there $F(x, \xi)$
is the optimal value of an explicit Linear Programming problem and as such
can be computed in polynomial time. In contrast to this, the integrand $F$
produced by the outlined scheme, as applied to a multi-stage problem, is
not easy to compute. For example, in a 3-stage problem this integrand is the
optimal value of a 2-stage stochastic problem, so that its computation at a
point is a much more computationally involved task than the similar task in the
two-stage case. Moreover, in order to get just consistent estimates in an SAA
type procedure (not to mention the rate of convergence) one needs to employ
conditional sampling, which typically results in an exponential growth of the
number of generated scenarios as the number $T$ of stages increases (cf.
[Sha05a]).
Analysis demonstrates that for an algorithm of the SAA type, the total
number of scenarios needed to solve the $T$-stage problem (32) with complete
recourse would grow, as $\varepsilon$ diminishes, as $\varepsilon^{-2T}$, so that the computational
effort blows up exponentially as the number of stages grows¹⁵ (cf. [Sha05b]).
Equivalently, for a sampling-based algorithm with a given number of
scenarios, existing theoretical quality guarantees deteriorate dramatically as the
number of stages grows. Of course, nobody told us that sampling-type
algorithms are the only way to handle stochastic problems, so that the outlined
reasoning does not pretend to justify "severe computational intractability" of
multi-stage problems. Our goal is more modest: we only argue that the fact
that when solving a particular stochastic program a sample of 10^ scenar-
ios was used does not say much about the quality of the resulting solution:
in the two-stage case, there are good reasons to believe that this quality is
reasonable, while in the 5-stage the quality may be disastrously bad.
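
To get a feeling for the growth rates discussed above, here is a minimal sketch. It assumes the scenario count scales like $\varepsilon^{-2T}$ (our reading of the estimate quoted in the text, consistent with the $\varepsilon^{-2}$ rate of SAA in the two-stage case $T = 1$) and that conditional sampling uses the same number of branches at every node. The function names and the constant factor are illustrative assumptions, not part of the original analysis.

```python
def saa_scenarios(eps: float, T: int, const: float = 1.0) -> float:
    """Order-of-magnitude scenario count const * eps**(-2*T) for horizon T."""
    return const * eps ** (-2 * T)


def tree_leaves(branching: int, T: int) -> int:
    """Scenarios (leaves) in a conditional-sampling tree with a fixed
    number of branches per node and T random stages."""
    return branching ** T


if __name__ == "__main__":
    for T in (1, 2, 4):
        # T = 1 is the two-stage case, T = 4 is the 5-stage case.
        print(T, f"{saa_scenarios(0.1, T):.1e}", tree_leaves(100, T))
```

Even with 10% accuracy and only 100 branches per node, the 5-stage case already calls for on the order of $10^8$ scenarios under these assumptions.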

¹⁵ Note that in the considered framework, $T = 1$ corresponds to two-stage
programming, $T = 2$ corresponds to 3-stage programming, and so on.

We have described one source of severe difficulty arising when solving
multi-stage stochastic problems: dramatic growth, with the increase of the number
of stages, in the complexity of evaluating the integrand $F$ in representation
(1) of the problem. We are about to demonstrate that even when this difficulty
does not arise, a multi-stage problem still may be very difficult. To this end,
consider the following story: at time $t = 0$, one has one dollar, and should decide how
to distribute this money between stocks and a bank account. When investing
an amount of money $x$ into stocks, the value $U_t$ of the portfolio at time $t$ will be
given by the chain of $t$ relations

$$ U_1 = \rho_1(\xi_{[1]})\,x, \quad U_2 = \rho_2(\xi_{[2]})\,U_1, \quad \dots, \quad U_t = \rho_t(\xi_{[t]})\,U_{t-1}, $$

where the returns $\rho_t(\xi_{[t]}) = \rho_t(\xi_1, \dots, \xi_t) \ge 0$ are known functions of the
underlying random parameters. The amount of money $1 - x$ put into the bank account
reaches at time $t$ the value $V_t = \rho^t(1 - x)$, where $\rho > 0$ is a given constant. The
goal is to maximize the total expected wealth $\mathbf{E}[U_T + V_T]$ at a given time $T$.
The problem can be written as a simple-looking $T$-stage stochastic problem
of the form (32):

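$$
\begin{array}{lll}
\mathrm{Min}_{x,\,y(\cdot)} & \mathbf{E}_P\big[u_T(\xi_{[T]}) + v_T(\xi_{[T]})\big] & \\
\text{s.t.} & 0 \le x \le 1 & (C_0) \\
 & u_1(\xi_{[1]}) = \rho_1(\xi_{[1]})\,x, \quad v_1(\xi_{[1]}) = \rho(1 - x) & (C_1) \\
 & u_2(\xi_{[2]}) = \rho_2(\xi_{[2]})\,u_1(\xi_{[1]}), \quad v_2(\xi_{[2]}) = \rho\,v_1(\xi_{[1]}) & (C_2) \\
 & \qquad\vdots & \\
 & u_T(\xi_{[T]}) = \rho_T(\xi_{[T]})\,u_{T-1}(\xi_{[T-1]}), \quad v_T(\xi_{[T]}) = \rho\,v_{T-1}(\xi_{[T-1]}) & (C_T)
\end{array}
\tag{33}
$$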

where $y(\cdot) = (u_t(\cdot), v_t(\cdot))_{t=1}^{T}$. Now let us specify the structure and the
distribution of $\xi$ as follows: a realization of $\xi$ is a permutation $\xi = (\xi_1, \dots, \xi_T)$ of $T$
elements $1, \dots, T$, and $P$ is the uniform distribution on the set of all $T!$ possible
permutations. Further, let us specify the returns as follows: the returns are
given by a $T \times T$ matrix $A$ with 0-1 elements, and

$$ \rho_t(\xi_1, \dots, \xi_t) := \kappa A_{t\xi_t}, \qquad \kappa := (T!)^{1/T}. $$

(Note that by Stirling's formula $\kappa = (T/e)(1 + o(1))$ as $T \to \infty$.) We end up
with a simple-looking instance of (32) with complete recourse and given in
advance "easy-to-describe" discrete distribution $P$; when represented in the
form of (1), our problem becomes

$$ \mathrm{Min}_{x \in [0,1]} \big\{ f(x) := \mathbf{E}_P F(x, \xi) \big\}, \qquad F(x, \xi) = \rho^T(1 - x) + x \prod_{t=1}^{T} (\kappa A_{t\xi_t}), \tag{34} $$

so that $F$ indeed is easy to compute. Thus, problem (33) looks nice: complete
recourse, simple and known in advance distribution, no large data entries,
easy-to-compute $F$ in representation (1). At the same time the problem is
disastrously difficult. Indeed, from (34) it is clear that $f(x) = \rho^T(1 - x) +
x\,\mathrm{per}(A)$, where $\mathrm{per}(A)$ is the permanent of $A$:

$$ \mathrm{per}(A) = \sum_{\xi} \prod_{t=1}^{T} A_{t\xi_t} $$

(the summation is taken over all permutations $\xi$ of $T$ elements $1, \dots, T$). Now,
the solution to (34) is either $x = 1$ or $x = 0$, depending on whether or not
$\mathrm{per}(A) \ge \rho^T$. Thus, our simple-looking $T$-stage problem is, essentially, the
problem of computing the permanent of a $T \times T$ matrix with 0-1 entries. The
latter problem is known to be really difficult. First of all, it is NP-hard [Val79].
Further, there are strong theoretical reasons to doubt that the permanent can
be efficiently approximated within a given relative accuracy $\varepsilon$, provided that
$\varepsilon > 0$ can be arbitrarily small [DLMV88]. The best algorithm known to us
capable of computing the permanent of a $T \times T$ 0-1 matrix within relative accuracy $\varepsilon$
has running time as large as $\varepsilon^{-2}\exp\{O(1)T^{1/2}\log^2(T)\}$ (cf. [JV96]), while the
best efficient algorithm known to us for approximating the permanent has relative
error as large as $c^T$ with a certain fixed $c > 1$, see [LSW00]. Thus, simple-looking
multi-stage stochastic problems can indeed be extremely difficult...
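
The reduction to the permanent can be checked numerically on a tiny instance. The sketch below is only an illustration under our own assumptions: the function names are hypothetical, the $3 \times 3$ matrix is made up, and the brute-force enumeration over all $T!$ permutations is exactly the kind of computation that becomes hopeless as $T$ grows. It verifies that averaging $\prod_t \kappa A_{t\xi_t}$ over a uniformly distributed permutation $\xi$ gives $\mathrm{per}(A)$, and picks the endpoint that maximizes the expected wealth $f(x) = \rho^T(1 - x) + x\,\mathrm{per}(A)$.

```python
from itertools import permutations
from math import factorial, prod


def permanent(A):
    """Permanent of a square matrix by brute force over all T! permutations."""
    T = len(A)
    return sum(prod(A[t][p[t]] for t in range(T))
               for p in permutations(range(T)))


def expected_stock_value(A):
    """Average of prod_t(kappa * A[t][xi_t]) over xi uniform on permutations,
    with kappa = (T!)**(1/T), so that kappa**T = T!."""
    T = len(A)
    kappa = factorial(T) ** (1.0 / T)
    perms = list(permutations(range(T)))
    total = sum(prod(kappa * A[t][p[t]] for t in range(T)) for p in perms)
    return total / len(perms)


def best_first_stage(A, rho):
    """f(x) = rho**T * (1 - x) + x * per(A) is linear in x on [0, 1], so the
    expected wealth is maximized at an endpoint."""
    T = len(A)
    return 1.0 if permanent(A) >= rho ** T else 0.0


# Hypothetical 0-1 return pattern for T = 3.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
```

For this matrix only two of the six permutations contribute, so the whole decision hinges on comparing $\mathrm{per}(A) = 2$ with $\rho^3$.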
A reader could argue that in fact we deal with a two-stage problem (34)
rather than with a multi-stage one, so that the outlined difficulties have
nothing to do with our initial multi-stage setting. Our counter-argument is that
the two-stage problem (34) honestly says about itself that it is very difficult:
with moderate $\rho$ and $T$, the data in (34) can be astronomically large (look at
the coefficient $\rho^T$ of $(1 - x)$ or at the products $\prod_{t=1}^{T}(\kappa A_{t\xi_t})$, which can be as
large as $\kappa^T = T!$), and so is the Lipschitz constant of $F$. In contrast to this,
the structure and the data in (33) look completely normal. Of course, it is
immediate to recognize that this "nice image" is just a disguise, and in fact
we are dealing with a disastrously difficult problem. Imagine, however, that
we add to (33) a number of redundant variables and constraints; how could
your favorite algorithm (or you, for that matter) recognize in the resulting
messy problem that solving it numerically is, at least at the present level of
our knowledge, a completely hopeless endeavor?

4 Some novel approaches


Here we outline some novel approaches to treating uncertainty which per-
haps can cope, to some extent, with intrinsic difficulties arising in two-stage
problems without complete recourse and in multi-stage problems.

4.1 Tractable approximations of chance constraints

As was already mentioned, a natural way to handle two-stage stochastic
problems without complete recourse is to impose chance constraints, that is,
to require that the probability of insolvability of the second-stage problem is at
most $\varepsilon \ll 1$ instead of being 0. The rationale behind this idea is twofold: first,
from the practical viewpoint, "highly unlikely" events are not too dangerous:
why should we bother about a marginal chance, like 10~^, for the second stage
to be infeasible, given that the level of various inaccuracies in our model, es-
pecially in its probabilistic data, usually is by orders of magnitude larger than
10"^? Not speaking of the fact that 5 days a week we take worse chances in
the morning traffic. Second, while it might be very difficult to check whether
a given first-stage solution results in a feasible, with probability 1,
second-stage problem, it seems to be possible to check whether this probability is at
least $1 - \varepsilon$ by applying Monte-Carlo simulation. Note that chance constraints
arise naturally not only in the context of two-stage problems without
complete recourse, but in the much more general situation of solving a constrained
optimization problem with the data affected by stochastic uncertainty. Thus,
it makes sense to pose the question of how one could process numerically a chance
constraint

$$ \phi(x) := \mathrm{Prob}\{g(x, \xi) \le 0\} \ge 1 - \varepsilon, \tag{35} $$

where $x$ is the decision vector, $\xi$ is the random disturbance with, say, known
distribution, and $\varepsilon \ll 1$ is a given tolerance.
The concept of chance constraints originates from [CC59] and is one of the
oldest concepts in Operations Research. Unfortunately, at its age of nearly 50
years, this concept still cannot be treated as practical. The first reason is
that typically it is extremely difficult to verify exactly whether this constraint
is satisfied at a given point. This problem is difficult already in the case of a
single linear constraint $g(x,\xi) := \langle a^* + \xi, x\rangle$ with perturbations $\xi$ uniformly
distributed in a box. Another severe problem is that usually constraint (35),
even with a very simple, say bi-affine in $x$ and in $\xi$, function $g(x, \xi)$ and a
simple-looking distribution of $\xi$ (like uniform in a box), defines a nonconvex feasible
set in the space of decision variables, which makes problematic the subsequent
optimization over this set of even pretty simple (just linear) objectives.
The difficulty we have just outlined rules out the idea to approximate (35)
by a "sample version" of this constraint, that is, by

$$ \phi_N(x) := \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}_{\{g(x,\xi^j) \le 0\}} \ge 1 - \theta\varepsilon, \tag{36} $$

where $\xi^1, \dots, \xi^N$ is a sample of $N$ independent realizations of $\xi$, $\mathbf{1}_{\{g(x,\xi^j) \le 0\}}$
is the indicator function¹⁶ of the event $\{g(x,\xi^j) \le 0\}$, and $\theta < 1$ is fixed
(say, $\theta = 0.99$). When $N \gg \varepsilon^{-1}$, the validity of (36) at a point $x$ implies,
with probability close to 1, the validity of (35), so that (36) can be thought
of as a "computable approximation" of (35). Unfortunately, the left hand
side in (35) is, generically, a nonconvex (and even discontinuous) function
of $x$, so that we have no way to optimize under this constraint.
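
A two-scenario toy example makes the nonconvexity of a sampled chance constraint concrete. The sketch below is an illustration only: the bi-affine function $g(x,\xi) = 1 - \xi x$, the sample and the level 0.5 are all hypothetical choices of ours. It evaluates the empirical frequency of the event $\{g(x,\xi^j) \le 0\}$ and shows that the points $x = -2$ and $x = 2$ pass the frequency test while their midpoint $x = 0$ does not.

```python
def phi_N(x, sample, g):
    """Empirical frequency of the event {g(x, xi) <= 0} over the sample."""
    return sum(1 for xi in sample if g(x, xi) <= 0) / len(sample)


def g(x, xi):
    # Hypothetical bi-affine constraint function: affine in x for fixed xi,
    # and affine in xi for fixed x.
    return 1.0 - xi * x


sample = [1.0, -1.0]   # two made-up scenarios
level = 0.5            # hypothetical required frequency

# The set {x : phi_N(x) >= level} is (-inf, -1] U [1, +inf): not convex,
# and phi_N jumps from 0 to 0.5 at x = -1 and x = 1.
feasible = {x: phi_N(x, sample, g) >= level for x in (-2.0, 0.0, 2.0)}
```

Here each scenario alone defines a half-line, and asking for "at least half of the scenarios" yields the union of the two half-lines rather than their intersection.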
To the best of our knowledge, the only generic case where both these severe
difficulties disappear is the case of a linear constraint $\langle a^* + \xi, x\rangle \le 0$ with
normally distributed data $\xi \sim \mathcal{N}(0, \Sigma)$. In this case, (35) is equivalent to the
convex deterministic constraint

$$ \langle a^*, x\rangle + \Omega(\varepsilon)\sqrt{\langle x, \Sigma x\rangle} \le 0, \tag{37} $$

where the "safety parameter" $\Omega(\varepsilon) = \sqrt{2\log(1/\varepsilon)}\,(1 + o(1))$, $\varepsilon \to 0$, is readily
given by $\varepsilon$ (which we assume to be $< 1/2$).
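
For Gaussian data the safety parameter can be computed exactly, since $\langle a^* + \xi, x\rangle$ is a scalar normal variable with mean $\langle a^*, x\rangle$ and variance $\langle x, \Sigma x\rangle$, so the chance constraint holds exactly when the $(1-\varepsilon)$-quantile of this variable is nonpositive. The following sketch uses the standard normal quantile from Python's standard library; the function names and the numerical data are our own illustrative assumptions.

```python
from math import log, sqrt
from statistics import NormalDist


def omega_exact(eps):
    """(1 - eps)-quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - eps)


def omega_asymptotic(eps):
    """Leading term sqrt(2 log(1/eps)) of the safety parameter as eps -> 0."""
    return sqrt(2.0 * log(1.0 / eps))


def gaussian_chance_lhs(a_star, Sigma, x, eps):
    """<a*, x> + Omega(eps) * sqrt(<x, Sigma x>); the chance constraint
    Prob{<a* + xi, x> <= 0} >= 1 - eps holds iff this value is <= 0."""
    n = len(x)
    mean = sum(a_star[i] * x[i] for i in range(n))
    var = sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))
    return mean + omega_exact(eps) * sqrt(var)
```

For moderate tolerances the exact quantile is noticeably smaller than the asymptotic expression, e.g. about 2.33 versus 3.03 at $\varepsilon = 0.01$, which is what the factor $(1 + o(1))$ accounts for.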
¹⁶ $\mathbf{1}_A = 1$ if the event $A$ happens, and $\mathbf{1}_A = 0$ otherwise.

    There is another generic case when the feasible set given by a chance
    constraint is convex. This is the case when the constraint can be
    represented in the form $(x, \xi) \in Q$, where $Q$ is a closed and convex set,
    and the distribution $P$ of the random vector $\xi \in \mathbb{R}^d$ is logarithmically
    quasi-concave, meaning that
        $$ P(\lambda A + (1 - \lambda)B) \ge \min[P(A), P(B)] $$
    for all closed and convex sets $A, B \subset \mathbb{R}^d$ and all $\lambda \in [0,1]$ (cf. Prekopa
    [Pre95]). Examples include uniform distributions on closed and bounded convex
    domains, the normal distribution and every distribution on $\mathbb{R}^d$ with
    density $f(\xi)$ with respect to the Lebesgue measure such that the function
    $f^{-1/d}(\xi)$ is convex. The related result (due to Prekopa [Pre95]) is that
    in the situation in question, the set $\{x : P(\{\xi : (x,\xi) \in Q\}) \ge \alpha\}$ is
    closed and convex for every $\alpha$. This result can be applied, e.g., to
    two-stage stochastic programs with chance constraints of the form
        $$ \mathrm{Min}_{x \in X}\ \langle c, x\rangle \quad \text{s.t.} \quad \mathrm{Prob}\{\exists y \in Y : Tx + Wy \ge \xi\} \ge 1 - \varepsilon, $$
    where $X$, $Y$ are closed convex sets and $T$, $W$ are fixed matrices. Here
    the chance constraint indeed is of the form $\mathrm{Prob}\{(x, \xi) \in Q\} \ge 1 - \varepsilon$,
    where
        $$ Q = \{(x,\xi) : \exists y \in Y : Tx + Wy \ge \xi\}. $$
    The set $Q$ clearly is convex; under mild additional assumptions, it is
    also closed. Thus, the feasible set of the chance constraint in question
    is convex, provided that the distribution of $\xi$ is logarithmically
    quasi-concave.
    Note that the outlined convexity results are applicable only to the
    chance constraints coming from scalar or vector inequalities where
    the only term affected by uncertainty is the right hand side, not the
    coefficients at the variables. For example, nothing similar is known for
    the chance constraint
        $$ \mathrm{Prob}\{\langle a^* + \xi, x\rangle \le 0\} \ge 1 - \varepsilon, $$
    except for the already mentioned case of a normally distributed vector $\xi$.
Aside from the few special cases we have mentioned, chance constraint (35) "as it
is" seems to be too difficult for efficient numerical processing, and what we
can try to do is to replace it with its "tractable approximation". For the
time being, there exist two approaches to building such an approximation:
"deterministic" and "scenario".

Tractable deterministic approximations of chance constraints.


With this approach, one replaces (35) with a properly chosen deterministic
constraint

$$ \psi(x) \le 0, \tag{38} $$

which is a "safe computationally tractable" approximation of (35), with the
latter notion defined as follows:
1. "Safety" means that the validity of (38) is a sufficient condition for the
validity of (35);
2. "Tractability" means that (38) is an explicitly given convex constraint.
Just to give an example, consider a randomly perturbed linear constraint, that
is, assume that

$$ g(x, \xi) = \langle a^* + M\xi, x\rangle, $$

where the deterministic vector $a^*$ is the "nominal data", $M$ is a given
deterministic matrix and $\xi = (\xi_1, \dots, \xi_d)$ is a tuple of $d$ independent scalar random
variables with zero mean and "of order of 1":

$$ \mathbf{E}[\exp\{\xi_i^2\}] \le \exp\{1\}, \quad i = 1, \dots, d, $$

e.g., ^i can have a distribution supported on the interval [—1,1], or ^i can have
normal distribution A/'(0, 2~^/^), = l,...,o!. In this case, applying standard
results on probabilities of large deviations for sums of "light tail" independent
random variables with zero means, one can easily verify that when $\varepsilon \in (0,1)$
and $\Omega(\varepsilon) = O(1)\sqrt{\log(1/\varepsilon)}$ with a properly chosen absolute constant $O(1)$, then
the validity of the convex constraint

$$ \langle a^*, x\rangle + \Omega(\varepsilon)\sqrt{\langle x, MM^T x\rangle} \le 0 \tag{39} $$

is a sufficient condition for the validity of (35). (Note that under our
assumptions $MM^T$ is an upper bound on the covariance matrix of $M\xi$, and compare
with (37).)
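
The following sketch evaluates the left hand side of a constraint of the form (39) and compares it with the probability-one requirement in the special case of disturbances supported on $[-1,1]$. Everything specific here is an assumption made for illustration: the function names, the data, and in particular the choice $\Omega = \sqrt{2\ln(1/\varepsilon)}$, which is what Hoeffding's inequality gives for independent zero-mean coordinates in $[-1,1]$ and is only one admissible value of the unspecified absolute constant.

```python
from math import log, sqrt


def mt_times_x(M, x):
    """The vector M^T x for an n-by-d matrix M given as a list of rows."""
    n, d = len(M), len(M[0])
    return [sum(M[i][k] * x[i] for i in range(n)) for k in range(d)]


def safe_lhs(a_star, M, x, omega):
    """<a*, x> + omega * sqrt(<x, M M^T x>), using sqrt(<x, M M^T x>) = ||M^T x||_2."""
    nominal = sum(a * xi for a, xi in zip(a_star, x))
    return nominal + omega * sqrt(sum(v * v for v in mt_times_x(M, x)))


def worst_case_lhs(a_star, M, x):
    """Maximum of <a* + M xi, x> over xi in the box [-1, 1]^d,
    which equals <a*, x> + ||M^T x||_1."""
    nominal = sum(a * xi for a, xi in zip(a_star, x))
    return nominal + sum(abs(v) for v in mt_times_x(M, x))


def omega_box(eps):
    """Hoeffding-based safety parameter for coordinates supported on [-1, 1]."""
    return sqrt(2.0 * log(1.0 / eps))


# Hypothetical data: d = n = 100, M = identity, x = vector of ones.
d = 100
M = [[1.0 if i == k else 0.0 for k in range(d)] for i in range(d)]
a_star = [-0.5] * d
x = [1.0] * d
```

With these numbers the point $x$ satisfies the safe constraint at $\varepsilon = 0.01$ but violates the worst-case one: the 2-norm term grows like $\sqrt{d}$, while the worst case over the box grows like $d$.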
The simple result we have just described is rather attractive. First, it does
not require detailed knowledge of the distribution of $\xi$. Second, the
approximation, although more complicated than the linear constraint we start
with, still is pretty simple; modern convex optimization techniques can routinely
process to high accuracy problems with thousands of decision variables and
thousands of constraints of the form (39). Third, the approximation is "not
too conservative": the safety parameter $\Omega(\varepsilon)$ grows pretty slowly as $\varepsilon \to 0$
and is only by a moderate constant factor larger than the safety parameter
in the case of Gaussian noise, where our approximation is not conservative at
all.
Recently, "not too conservative" computationally tractable safe approxi-
mations were built (see [Nem03]) for chance versions of well-structured non-
linear convex constraints with nice analytic structure, specifically, for affinely
perturbed least squares constraints

[A, + J2 ^^^i]^ -[^- + J2 ^i^i] 2



and Linear Matrix Inequality constraints

$$ \Big[A_0^0 + \sum_i \xi_i A_i^0\Big] + \sum_{j=1}^{m} x_j \Big[A_0^j + \sum_i \xi_i A_i^j\Big] \succeq 0 $$

($A_i^j$ are symmetric matrices, $A \succeq 0$ means that $A$ is symmetric positive
semidefinite). In both cases, $\xi_i$ are independent scalar disturbances with zero mean
and "of order of 1". However, the outlined approach, however promising we
believe it is, seemingly works only for a very restricted family of "well-structured"
functions $g(x,\xi)$, and even in these cases requires a lot of highly nontrivial
"tailoring" to the particular structure in question. Consider, for example, the
case of the chance constraint associated with a two-stage linear stochastic problem:

$$ g(x, \xi) := \mathrm{Min}_{z,y}\,\{z : T(\xi)x + W(\xi)y \ge h(\xi) - ze,\ z \ge 0\}, \tag{40} $$

where $e$ is the vector of ones. Note that here $g(x,\xi)$ is convex in $x$, and $g(x,\xi) \le 0$
if and only if the second-stage problem

$$ \mathrm{Min}_{y}\ \langle q(\xi), y\rangle \quad \text{s.t.} \quad T(\xi)x + W(\xi)y \ge h(\xi) $$

is feasible (cf. (30)). Thus, the chance constraint requires $x$ to result in
a feasible, with probability at least $1 - \varepsilon$, second-stage problem. Even in the
case of simple recourse ($T$, $W$ are independent of $\xi$) the chance constraint in
question seems to be far too difficult to admit a safe tractable deterministic
approximation.

Scenario approximation.

In contrast to the "highly specialized and heavily restricted" approach we
have just considered, the scenario-based approach is completely universal. We
just generate a sample $\xi^1, \dots, \xi^N$ of $N$ "scenarios" - independent realizations
of the random disturbance $\xi$ - and approximate (35) by the random system
of inequalities

$$ g(x, \xi^j) \le 0, \quad j = 1, \dots, N. \tag{41} $$

Extremely nice features of this approach are its generality and computational
tractability: whenever $g(x,\xi)$ is convex in $x$ and efficiently computable (as
is the case, e.g., with the function (40)), (41) becomes a system of explicitly
given convex constraints and as such can be efficiently processed numerically,
provided that the number of scenarios $N$ is not prohibitively large. The
question, of course, is how large the sample should be in order to ensure, with
reliability close to 1, that every feasible solution to (41) satisfies the chance
constraint (35). This question is far from easy, and we do not intend to
discuss the relevant nice and deep results known from the literature, since in fact
we are more interested in a slightly different question, namely, as follows:

(Q) Assume we are given a convex optimization problem

$$ \mathrm{Min}_{x}\ f(x) \quad \text{s.t.} \quad g(x, \xi) \le 0 \tag{42} $$

(all $f$, $g$ are convex in $x$) with $\xi$ being a random vector with a known
distribution, and, given tolerance $\varepsilon > 0$, replace this problem with its
"scenario counterpart"

$$ \mathrm{Min}_{x}\ f(x) \quad \text{s.t.} \quad g(x, \xi^j) \le 0, \quad j = 1, \dots, N. \tag{43} $$

How large should the sample size $N$ be in order for the optimal solution
$x_N$ of (43) to be feasible for (42) with probability at least $1 - \varepsilon$?
The difference between the latter question and the former one is that now we
do not require all points feasible for (41) to satisfy (35); we require this
property to be possessed by the specific point $x_N$ we are interested in.
As was discovered in [CC05, CC04], question (Q) admits a nice
"universal" answer. Namely, under extremely mild assumptions it turns out that
whenever $\varepsilon, \delta \in (0, 1/2)$ and

$$ N \ge \frac{2n}{\varepsilon}\ln\Big(\frac{12}{\varepsilon}\Big) + \frac{2}{\varepsilon}\ln\Big(\frac{2}{\delta}\Big) + 2n, \tag{44} $$

where $n$ is the dimension of $x$,
the probability of "bad sampling" which results in $x_N$ not satisfying (35) is less
than or equal to $\delta$. Note that this result, which heavily utilizes the convexity of
(42), is completely "distribution-free": it is independent of any assumptions
on the distribution of $\xi$ and requires no knowledge of this distribution.
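
As a quick numerical check, the sketch below evaluates the sample-size bound, reading (44) as $N \ge \frac{2n}{\varepsilon}\ln\frac{12}{\varepsilon} + \frac{2}{\varepsilon}\ln\frac{2}{\delta} + 2n$ with $n$ the number of decision variables. This reading is our reconstruction; it reproduces the value $N = 9835$ that the text quotes for $n = 100$ and $\varepsilon = \delta = 0.1$. The function name is hypothetical.

```python
from math import ceil, log


def scenario_sample_size(n, eps, delta):
    """Smallest integer N with
    N >= (2n/eps) ln(12/eps) + (2/eps) ln(2/delta) + 2n."""
    bound = 2 * n / eps * log(12 / eps) + 2 / eps * log(2 / delta) + 2 * n
    return ceil(bound)


if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, scenario_sample_size(100, eps, 0.1))
```

The dependence on $\delta$ is only logarithmic, but the bound grows like $1/\varepsilon$ (up to a logarithmic factor), which is what makes very small tolerances expensive.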
All this being said, there is a serious problem with the scenario approach
as presented so far - it becomes impractical when the required value of e
is really small, like 10~^ or 10~^. Indeed, for those e relation (44) results in
unrealistically large samples. Note that pretty small values of $\varepsilon$ are completely
reasonable when speaking about a "hard" constraint $g(x,\xi) \le 0$, that is, one
whose violation has very severe or even catastrophic consequences, like a heavy
jam in a communication network or a blackout caused by malfunctioning of a
power supply network, not to mention exploding nuclear power plants or
airliners falling from the sky. In a sense, in the context of chance constraints
hard restrictions and the implied pretty small values of $\varepsilon$ seem to be the rule rather
than the exception. Indeed, "soft" constraints - those with $\varepsilon$ like 1% or 0.1% -
can be eliminated altogether by augmenting the objective with appropriate
penalties¹⁷.

¹⁷ It should be added that the outlined "crude" scenario approach is not completely
satisfactory even when $\varepsilon$ is not too small. Indeed, assume that your problem has
$n = 100$ variables and you are ready to take 10% chances ($\varepsilon = \delta = 0.1$). To this
end, you use the scenario approach with the smallest $N$ allowed by (44), that is,
$N = 9835$. What should be the actual probability $\varepsilon'$ for a fixed point $x$ to violate

    One could be surprised by the fact that we treat as acceptable the SAA
    method with the complexity proportional to $\varepsilon^{-2}$, $\varepsilon$ being the required
    tolerance in terms of the objective, and are dissatisfied with the scenario
    approach where the sample size is merely inversely proportional to the tolerance
    $\varepsilon$. To explain our point, think whether you would agree (a) to use a portfolio
    management policy with the average profit at most 0.5% less than the
    "ideal" (the optimal) one, and (b) to board an airliner which may crash
    during the flight with probability 0.5% (or 0.05%).

W h e n handling hard chance constraints - those with really small e, like 10~^
or less - we would like to have sample sizes polynomial in both $\log(1/\varepsilon)$ and
$\log(1/\delta)$ rather than polynomial in $1/\varepsilon$ and $\log(1/\delta)$. We are about to
explain that under favorable circumstances, such a possibility does exist; it is
given by combining the scenario approach with a kind of importance sampling. To
proceed, assume that the constraint $g(x,\xi) \le 0$ underlying (35) is of a specific
structure as follows: there exists a closed convex set $K \subset \mathbb{R}^m$ and an affine
mapping $x \mapsto A[\xi]x + b[\xi] : \mathbb{R}^n \to \mathbb{R}^m$ depending on $\xi$ as on a parameter such
that

$$ g(x, \xi) \le 0 \iff A[\xi]x + b[\xi] \in K. \tag{45} $$

Moreover, let us assume that the affine mapping in question is affinely
parameterized by $\xi$, that is, both $A[\xi]$ and $b[\xi]$ depend affinely on $\xi$. Finally, we
may assume without loss of generality that $\xi$ has zero mean.

As an instructive example, consider the feasibility constraint associated with
the second-stage problem, that is, the constraint $g(x,\xi) \le 0$ with $g(x,\xi)$
given by (40). Assuming fixed recourse, that is, $W(\xi) = W$ being
independent of $\xi$, let us set

$$ K := \{u : \exists y \text{ such that } u \le Wy\}. $$

Note that $K$ is a convex polyhedral (and thus closed) set. Now, it is clear
from (40) that $g(x,\xi) \le 0$ if and only if $h(\xi) - T(\xi)x \in K$. It follows that
when passing from the uncertain parameter $\xi$ to the new uncertain parameter
$\hat\xi = [h(\xi), T(\xi)] - \mathbf{E}\{[h(\xi), T(\xi)]\}$ and updating accordingly the underlying
distribution, we arrive at the situation described in (45).

Under our assumptions, the vector $A[\xi]x + b[\xi]$ is affine in $\xi$, and thus can be
represented as $\alpha[x]\xi + \beta[x]$, where $\alpha[x]$, $\beta[x]$ are affine in $x$. It follows that

$$ g(x, \xi) \le 0 \iff A[\xi]x + b[\xi] \in K \iff \xi \in K_x := \{u : \alpha[x]u + \beta[x] \in K\}. \tag{46} $$

Note that the set $K_x$ is closed and convex along with $K$. Now,
numerous important distributions $\Pi$ on $\mathbb{R}^d$ with zero mean (multivariate normal,
uniform on a multidimensional box, etc.) possess a kind of "concentration
property" as follows: if $Q$ is a closed convex set in $\mathbb{R}^d$ and $\Pi(Q) \ge c$,
where $c < 1$ is a characteristic constant of $\Pi$, then the probability of the
event $\Omega^{-1}\eta \in Q$, $\eta \sim \Pi$, rapidly approaches 1 as $\Omega > 1$ grows, namely,
$\Pi(\{\eta : \Omega^{-1}\eta \notin Q\}) \le C^{-1}\exp\{-C\Omega^2\}$, where $C$ is another characteristic
constant of $\Pi$. For example, in the case of a multivariate normal distribution
$\Pi$ with zero mean, $\Pi(Q) \ge 0.8$ implies, for a closed convex set $Q$, that
$\Pi(\{\eta : \eta/\Omega \notin Q\}) \le \exp\{-\Omega^2/3\}$.

¹⁷ (continued) the constraint $g(x,\xi) \le 0$ in order to be feasible for (43) with probability 0.9?
The answer is: $\varepsilon'$ should be as small as $10^{-5}$. Thus, when applied with small $\varepsilon$,
the crude scenario approach becomes impractical, while in the case of "large" $\varepsilon$
it seems to be too conservative.

Now assume that we are in the situation of (46) and that the distribution
of $\xi$ possesses the outlined concentration property. Let us choose somehow a
safety parameter $\Omega > 1$, and consider the scenario counterpart of (35), where
the disturbances are drawn from the distribution of $\Omega\xi$ rather than from the
distribution of $\xi$:

$$ g(x, \Omega\xi^t) \le 0, \quad t = 1, \dots, N, \tag{47} $$

where $\xi^t \sim P$ are independent. Specifying $N$ as

$$ O(1)(1 - c)^{-1}\log(1/\delta) \tag{48} $$

with an appropriate absolute constant $O(1)$, observe that if a fixed $x$ satisfies
(47), then it is "highly likely" that $\mathrm{Prob}\{g(x, \Omega\xi) \le 0\} \ge c$; specifically, in
the case of $\mathrm{Prob}\{g(x, \Omega\xi) \le 0\} < c$, the probability to get a realization of $N$
disturbances (with $N$ given by (48)) which results in (47) is at most $\delta$. Thus,
when a given $x$ turns out to satisfy (47), then, up to probability of "bad
sampling" as small as $\delta$, we have $\mathrm{Prob}\{g(x, \Omega\xi) \le 0\} = \mathrm{Prob}\{\Omega\xi \in K_x\} \ge c$.
In the latter case, due to the concentration property of the distribution
$\Pi$ of $\eta = \Omega\xi$ (induced by the similar property of the distribution $P$ of $\xi$), we
have $\mathrm{Prob}\{g(x, \xi) > 0\} = \mathrm{Prob}\{\xi \notin K_x\} \le C^{-1}\exp\{-C\Omega^2\}$. When $\Omega =
\sqrt{C^{-1}\log(C^{-1}\varepsilon^{-1})}$, the latter probability is $\le \varepsilon$, that is, $x$ satisfies the chance
constraint (35). For example, in the case when $P$ is a multivariate normal
distribution with zero mean and $\varepsilon$ in (35) is as small as $10^{-12}$, the above rule
results in $\Omega = 9.1$. Thus, when $\xi \sim \mathcal{N}(0, \Sigma)$, $N$ is given by (48) and $\Omega = 9.1$,
a fixed point $x$ which satisfies (47) is, up to probability of "bad sampling" at
most $\delta$, feasible for the chance constraint (35) with $\varepsilon = 10^{-12}$.
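
The two numbers that drive this construction are easy to compute for a fixed point $x$. The sketch below is an illustration under stated assumptions: it takes the Gaussian concentration bound in the form $\exp\{-\Omega^2/3\}$ with threshold $c = 0.8$ (our reading of the garbled formula, which reproduces the quoted $\Omega = 9.1$ at $\varepsilon = 10^{-12}$), and it uses the elementary fact that if an event has probability below $c$, then $N$ independent draws all land in it with probability below $c^N$. The function names are hypothetical.

```python
from math import ceil, log, sqrt


def amplification(eps):
    """Omega solving exp(-Omega**2 / 3) = eps (assumed Gaussian bound)."""
    return sqrt(3.0 * log(1.0 / eps))


def sample_size(c, delta):
    """Smallest N with c**N <= delta. Since ln(1/c) >= 1 - c, this N never
    exceeds ceil(ln(1/delta) / (1 - c)), in line with a bound of the form (48)."""
    return ceil(log(1.0 / delta) / log(1.0 / c))


if __name__ == "__main__":
    print(round(amplification(1e-12), 2), sample_size(0.8, 1e-3))
```

In contrast to the crude scenario approach, the tolerance $\varepsilon$ now enters only through $\Omega$, which grows like $\sqrt{\log(1/\varepsilon)}$, while the number of scenarios depends only on $c$ and $\delta$.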
The outlined idea - to apply the scenario approach with moderately
amplified disturbances rather than with the "true" ones - under favorable
circumstances allows one to approximate chance constraints via samples of size $N$ which
is polynomial in the "sizes" of the problem (the dimensions of $x$, $\xi$ and $K$)
and the logarithms of $1/\varepsilon$, $1/\delta$, and thus allows one to handle efficiently constraints
(35) with really small tolerances $\varepsilon$. For a detailed presentation and analysis of
this approach, see [NS05].

4.2 Multistage Stochastic Programming in linear decision rules


Consider a linear multi-stage stochastic program

$$ \mathrm{Min}_{x,\,y(\cdot)}\ \mathbf{E}_P\Big\{\langle c_0(\xi), x\rangle + \sum_{t=1}^{T}\langle c_t, y_t(\xi)\rangle\Big\} \quad \text{s.t.} \quad \mathrm{Prob}\Big\{A_0(\xi)x + \sum_{t=1}^{T} A_t y_t(\xi) - h(\xi) \ge 0\Big\} = 1 \tag{49} $$

with fixed recourse, where the cost coefficients $c_t$ and the matrices $A_t$, $t \ge 1$,
are not affected by uncertainty, as reflected in the notation. Besides this,
in what follows we assume that the data affected by the uncertainty (that
is, $c_0(\xi)$, $A_0(\xi)$, $h(\xi)$) are affine functions of $\xi$; as we remember from the
previous section, this "assumption" is in fact a convention on how we use
words: nobody forbids us to treat as the actual "random parameter" the
collection $(c_0(\xi), A_0(\xi), h(\xi))$ rather than $\xi$ itself.
As we have explained, a multistage problem (even one much better structured
than (49)) is, generically, "severely computationally intractable". We
are about to propose a radical way to reduce the complexity of the problem,
specifically, to pass from arbitrary decision rules $y_t(\cdot)$ to affine ones:

$$y_t(\xi) = x_t^0 + X_t Q_t \xi, \tag{50}$$

where $x_t^0$, $X_t$ are our new (deterministic!) variables (a vector and a matrix
of appropriate sizes), and $Q_t\xi$, $Q_t$ being a given deterministic matrix, is the
"portion" of uncertainty which is revealed at time $t$ and thus can be used to
make the decision $y_t$.
Now let us look at the problem we end up with. When substituting the linear
decision rules (50) into the constraint of (49), the constraint takes the form

$$\mathrm{Prob}\Big\{ A_0(\xi)x + \sum_{t=1}^{T} \big[A_t x_t^0 + A_t X_t Q_t \xi\big] - b(\xi) \ge 0 \Big\} = 1.$$

The left hand side of the system of inequalities in the latter $\mathrm{Prob}\{\cdot\}$ is affine
in $\xi$; thus, the constraint in question says exactly that the system should be
satisfied for all $\xi$ from the support $\Xi$ of the distribution $P$ of $\xi$. Since the left
hand side of the system is affine in $\xi$, the latter requirement is equivalent to
the system being valid for all $\xi \in Z$, where $Z$ is the closed convex hull of $\Xi$.
Thus, the constraint of (49) is nothing but the semi-infinite system of linear
inequalities

$$A_0(\xi)x + \sum_{t=1}^{T} \big[A_t x_t^0 + A_t X_t Q_t \xi\big] - b(\xi) \ge 0 \quad \forall \xi \in Z \tag{51}$$

in variables $w = \{x, \{x_t^0, X_t\}_{t=1}^T\}$. Besides this, the coefficients of the
semi-infinite inequalities in (51) depend affinely on $\xi$. Now let us use the following
known fact (see [BN98]):
(!) Assume that $Z$ is a polyhedral set

$$Z = \{\xi : \exists \eta \text{ such that } M\xi + N\eta + p \ge 0\},$$

given by the data $M, N, p$. Then the semi-infinite system (51) is equivalent
to a finite system $\mathcal{S}$ of linear inequalities:

$$w \text{ satisfies (51)} \iff \exists u : Aw + Bu + q \ge 0.$$

The sizes of $\mathcal{S}$ (that is, the row and the column sizes of $A$, $B$) are
polynomial in the sizes of the matrices $A_0, A_1, \ldots, A_T, M, N$, and the
data $A, B, q$ of $\mathcal{S}$ are readily given by the data of (51) and $M, N, p$
(that is, given the latter data, one can build $\mathcal{S}$ in polynomial time).

Footnote to (50): In the notation of (32), $Q_t\xi = \xi_{[t]} = (\xi_1, \ldots, \xi_t)$.
In fact, [BN98] asserts much more than stated by (!), namely, that (51) is
computationally tractable whenever Z is so. We, however, intend to stay
within the grasp of Linear Programming, and to this end (!) is exactly what
we need.
Example: interval uncertainty. Assume that $Z$ is a box; without loss
of generality, we may assume that $Z = \{\xi : -1 \le \xi_i \le 1,\ i = 1, \ldots, d\}$.
Since $A_0(\xi)$, $b(\xi)$ are affine in $\xi$, (51) can be rewritten equivalently as
the semi-infinite problem

$$s_0^j[X] + \sum_{i=1}^{d} \xi_i s_i^j[X] \le 0 \quad \forall \xi \in Z,\ j = 1, \ldots, J, \tag{52}$$

where $X$ stands for the collection $\{x, \{x_t^0, X_t\}_{t=1}^T\}$ of design variables
in (51), and $s_i^j[X]$ are affine functions of $X$ readily given by the data
of (51). With our $Z$, the semi-infinite system (52) is clearly equivalent
to the system of constraints

$$s_0^j[X] + \sum_{i=1}^{d} \big|s_i^j[X]\big| \le 0, \quad j = 1, \ldots, J,$$

that is, to an explicit system of convex constraints (which can be
further straightforwardly converted to a system of linear inequalities).
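
The equivalence used in the box example rests on one elementary fact: the maximum of $s_0 + \sum_i \xi_i s_i$ over $-1 \le \xi_i \le 1$ equals $s_0 + \sum_i |s_i|$. The following Python sketch checks this numerically for a single constraint; the coefficient values are hypothetical and stand for the numbers $s_i^j[X]$ at some fixed $X$.

```python
from itertools import product


def worst_case_over_box(s0, s):
    """Closed form of max over |xi_i| <= 1 of s0 + sum_i xi_i * s_i."""
    return s0 + sum(abs(si) for si in s)


def worst_case_by_vertices(s0, s):
    """Brute force: an affine function on a box attains its maximum at a vertex."""
    return max(
        s0 + sum(x * si for x, si in zip(xi, s))
        for xi in product((-1.0, 1.0), repeat=len(s))
    )


# Hypothetical coefficient values for one constraint at a fixed X.
s0, s = -4.0, [1.5, -2.0, 0.25]
print(worst_case_over_box(s0, s), worst_case_by_vertices(s0, s))
```

Both functions return the same number, so requiring the closed-form value to be nonpositive is the same as requiring the constraint for every $\xi$ in the box. The brute-force version enumerates $2^d$ vertices and is shown only as a cross-check.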
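By the outlined analysis, when restricted to affine decision rules, (49) becomes
an explicit deterministic linear program

$$\min_{w = \{x, \{x_t^0, X_t\}_{t=1}^T\},\; u} \big\{ \langle c, w\rangle : Aw + Bu + q \ge 0 \big\}, \qquad \langle c, w\rangle \equiv E\Big\{ \langle c_0(\xi), x\rangle + \sum_{t=1}^{T} \langle c_t, [x_t^0 + X_t Q_t \xi]\rangle \Big\}, \tag{53}$$

in variables $w = \{x, \{x_t^0, X_t\}_{t=1}^T\}$ and $u$.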


Several remarks are in order.

Remark 2. The only reason for restricting ourselves to affine decision rules
stems from the desire to end up with a computationally tractable problem.
We do not pretend that affine decision rules approximate the optimal
ones well: whether this is so depends on the problem, and we usually have
no possibility to understand how good in this respect a particular problem
we should solve is. The rationale behind restricting to affine decision rules is the
belief that in actual applications it is better to pose a modest and achievable
goal rather than an ambitious goal which we do not know how to achieve.
Remark 3. To some extent, what is affine and what is not is a matter of how
we use words. Assume, e.g., that one wants to pass from affine decision rules
to quadratic ones. This is exactly the same as keeping the rules affine and
adding to the entries of $\xi$ their pairwise products, and similarly for more
complicated families of decision rules. Statement (!) explains what the "limits
of sophistication in the decision rules" we can achieve are: when representing a
sophisticated decision rule as an affine one, the uncertainty vector $\xi$ being properly
extended, we need the convex hull of the support of this extended vector to
be computationally tractable. In principle, this might fail already
for "genuinely affine" decision rules; however, in typical applications the
distribution $P$ of the "actual" uncertainty $\xi$ is simple enough, so that $\mathrm{Conv}(\mathrm{supp}\,P)$
is computationally tractable. However, with $P$ as simple as a uniform
distribution on a box, the "quadratic extension" $\xi \mapsto (\xi, \{\xi_i\xi_j\}_{i,j})$ of $\xi$ results in a
random vector with a distribution too complicated, as far as our needs are
concerned. Thus, the limitations of affine decision rules are in fact limitations
of our possibility to describe efficiently convex hulls of supports of nonlinear
transformations of $\xi$.
Remark 4. One could bet that the idea of multi-stage decision making under
uncertainty via linear decision rules is as old as the corresponding optimization
model. It seems, however, that this idea remained completely forgotten for a
long time; at least, we do not know who should be credited with it. Linear
decision rules in optimization under uncertainty were recently "resurrected" in
[BGGN04] in the framework of Robust Optimization. Our exposition follows
the methodology developed in [BGGN04], with the only minor exception that
in Robust Optimization one aims at minimizing the worst-case value of an
uncertainty-affected objective under the restriction that a candidate solution
remains feasible whatever the realization of the uncertainty-affected constraints,
while here we intend to optimize, under the same restriction, the expected
value of the objective.
Remark 5. We have assumed that (49) has fixed recourse; the role of this
assumption was to ensure affinity of the constraints in (51) in $\xi$, which in turn
made it possible to use (!) in order to end up with the tractable reformulation
(53) of the problem of interest. In the case when the recourse is not fixed,
that is, the matrices $A_t$, $t \ge 1$, in (49) depend affinely on $\xi$, the situation
becomes much more complicated: the left hand sides of the inequalities in (51)
become quadratic in $\xi$, which makes (!) inapplicable. It turns out, however,
that under not too restrictive assumptions the problem of optimizing under
the constraints (51), although NP-hard, admits tractable approximations of
reasonable quality [BGGN04].

Footnote to Remark 2: In this respect, it is very instructive to look at Control, where the idea of linear
feedback dominates theoretical research and, to some extent, applications. Aside
from a handful of simple particular cases, there are no reasons to believe that "the
abilities" of linear feedback are as good as those of a general nonlinear feedback.
However, the Control community realized long ago that a bird in the hand is worth
two in the bush: it is much better to restrict ourselves to something which
we indeed can analyze and process numerically. We believe this is an instructive
example for the optimization community.

Remark 6. Passing from arbitrary decision rules to affine ones seems to reduce
dramatically the flexibility of our decision-making and thus the expected
results. Note, however, that the numerical results for inventory management
models reported in [BGGN04, BGNV04] demonstrate that affinity may well be
not as severe a restriction as one could expect it to be. In any case, we believe
that when processing multi-stage problems, affine decision rules make a good
and easy-to-implement starting point, and that it hardly makes sense to look
for more sophisticated (and by far more computationally demanding) decision
policies, unless there exists a clear indication of "severe non-optimality" of the
affine rules.

References
[ADEH99] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of
risk. Mathematical Finance, 9, 203-228 (1999)
[ADEHK03] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D., Ku, H.: Coherent
multiperiod risk measurement. Manuscript, ETH Zurich (2003)
[BL97] Barmish, B.R., Lagoa, C.M.: The uniform distribution: a rigorous
justification for the use in robustness analysis. Math. Control, Signals, Systems,
10, 203-222 (1997)
[Bea55] Beale, E.M.L.: On minimizing a convex function subject to linear inequal-
ities. Journal of the Royal Statistical Society, Series B, 17, 173-184 (1955)
[BN98] Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Mathematics
of Operations Research, 23 (1998)
[BN01] Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization.
SIAM, Philadelphia (2001)
[BGGN04] Ben-Tal, A., Goryashko, A., Guslitzer, E., Nemirovski, A.: Adjustable
robust solutions of uncertain linear Programs. Mathematical Program-
ming, 99, 351-376 (2004)
[BGNV04] Ben-Tal, A., Golany, B., Nemirovski, A., Vial, J.-Ph.: Retailer-supplier
flexible commitments contracts: A robust optimization approach. Submitted
to Manufacturing & Service Operations Management (2004)
[CC05] Calafiore, G., Campi, M.C.: Uncertain convex programs: Randomized
solutions and confidence levels. Mathematical Programming, 102, 25-46
(2005)

Footnote to Remark 5: In fact, when the recourse is not fixed, the semi-infinite system (51) can
become NP-hard already with $Z$ as simple as a box [BGGN04].

[CC04] Calafiore, G., Campi, M.C.: Decision making in an uncertain environment:
the scenario-based optimization approach. Working paper (2004)
[CC59] Charnes, A., Cooper, W.W.: Chance-constrained programming.
Management Science, 6, 73-79 (1959)
[DLMV88] Dagum, P., Luby, L., Mihail, M., Vazirani, U.: Polytopes, Permanents,
and Graphs with Large Factors. Proc. 27th IEEE Symp. on Foundations
of Comput. Sci. (1988)
[Dan55] Dantzig, G.B.: Linear programming under uncertainty. Management Sci-
ence, 1, 197-206 (1955)
[DZ98] Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications.
Springer-Verlag, New York, NY (1998)
[Dup79] Dupacova, J.: Minimax stochastic programs with nonseparable penalties.
In: Optimization techniques (Proc. Ninth IFIP Conf., Warsaw, 1979),
Part 1, 22 of Lecture Notes in Control and Information Sci., 157-163.
Springer, Berlin (1980)
[Dup87] Dupacova, J.: The minimax approach to stochastic programming and an
illustrative application. Stochastics, 20, 73-88 (1987)
[DS03] Dyer, M., Stougie, L.: Computational complexity of stochastic program-
ming problems. SPOR-Report 2003-20, Dept. of Mathematics and Com-
puter Sci., Eindhoven Technical Univ., Eindhoven (2003)
[ER05] Eichhorn, A., Romisch, W.: Polyhedral risk measures in stochastic
programming. SIAM J. Optimization, to appear (2005)
[EGN85] Ermoliev, Y., Gaivoronski, A., Nedeva, C.: Stochastic optimization
problems with partially known distribution functions. SIAM Journal on
Control and Optimization, 23, 697-716 (1985)
[FS02] Follmer, H., Schied, A.: Convex measures of risk and trading constraints.
Finance and Stochastics, 6, 429-447 (2002)
[KSH01] Kleywegt, A.J., Shapiro, A., Homem-De-Mello, T.: The sample average
approximation method for stochastic discrete optimization. SIAM Journal
of Optimization, 12, 479-502 (2001)
[Gai91] Gaivoronski, A.A.: A numerical method for solving stochastic
programming problems with moment constraints on a distribution function.
Annals of Operations Research, 31, 347-370 (1991)
[JV96] Jerrum, M., Vazirani, U.: A mildly exponential approximation algorithm
for the permanent. Algorithmica, 16, 392-401 (1996)
[LSW05] Linderoth, J., Shapiro, A., Wright, S.: The empirical behavior of sampling
methods for stochastic programming. Annals of Operations Research, to
appear (2005)
[LSW00] Linial, N., Samorodnitsky, A., Wigderson, A.: A deterministic strongly
polynomial algorithm for matrix scaling and approximate permanents.
Combinatorica, 20, 531-544 (2000)
[MMW99] Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding tech-
niques for determining solution quality in stochastic programs. Opera-
tions Research Letters, 24, 47-56 (1999)
[Mar52] Markowitz, H.M.: Portfolio selection. Journal of Finance, 7, 77-91 (1952)
[Lan87] Landau, H.J. (ed): Moments in mathematics. Proc. Sympos. Appl. Math.,
37. Amer. Math. Soc., Providence, RI (1987)
[Nem03] Nemirovski, A.: On tractable approximations of randomly perturbed con-
vex constraints - Proceedings of the 42nd IEEE Conference on Decision
and Control. Maui, Hawaii USA, December 2003, 2419-2422 (2003)
[NS05] Nemirovski, A., Shapiro, A.: Scenario approximations of chance
constraints. In: Calafiore, G., Dabbene, F. (eds) Probabilistic and
Randomized Methods for Design under Uncertainty. Springer, Berlin (2005)
[Pre95] Prekopa, A.: Stochastic Programming. Kluwer, Dordrecht, Boston (1995)
[Rie03] Riedel, F.: Dynamic coherent risk measures. Working Paper 03004, De-
partment of Economics, Stanford University (2003)
[RUZ02] Rockafellar, R.T., Uryasev, S., Zabarankin, M.: Deviation measures in
risk analysis and optimization, Research Report 2002-7, Department of
Industrial and Systems Engineering, University of Florida (2002)
[RS04a] Ruszczyński, A., Shapiro, A.: Optimization of convex risk functions.
E-print available at: http://www.optimization-online.org (2004)
[RS04b] Ruszczyński, A., Shapiro, A.: Conditional risk mappings. E-print available
at: http://www.optimization-online.org (2004)
[SAGS05] Santoso, T., Ahmed, S., Goetschalckx, M., Shapiro, A.: A stochastic pro-
gramming approach for supply chain network design under uncertainty.
European Journal of Operational Research, 167, 96-115 (2005)
[SH00] Shapiro, A., Homem-de-Mello, T.: On rate of convergence of Monte Carlo
approximations of stochastic programs. SIAM Journal on Optimization,
11, 70-86 (2000)
[SK00] Shapiro, A., Kleywegt, A.: Minimax analysis of stochastic programs.
Optimization Methods and Software, 17, 523-542 (2002)
[SHK02] Shapiro, A., Homem de Mello, T., Kim, J.C.: Conditioning of stochastic
programs. Mathematical Programming, 94, 1-19 (2002)
[Sha03a] Shapiro, A.: Inference of statistical bounds for multistage stochastic pro-
gramming problems. Mathematical Methods of Operations Research. 58,
57-68 (2003)
[Sha03b] Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro,
A. (eds) Stochastic Programming, volume 10 of Handbooks in Operations
Research and Management Science. North-Holland (2003)
[Sha04] Shapiro, A.: Worst-case distribution analysis of stochastic programs.
E-print available at: http://www.optimization-online.org (2004)
[Sha05a] Shapiro, A.: Stochastic programming with equilibrium constraints.
Journal of Optimization Theory and Applications (to appear). E-print
available at: http://www.optimization-online.org (2005)
[Sha05b] Shapiro, A.: On complexity of multistage stochastic programs.
Operations Research Letters (to appear). E-print available at:
http://www.optimization-online.org (2005)
[TA04] Takriti, S., Ahmed, S.: On robust optimization of two-stage systems.
Mathematical Programming, 99, 109-126 (2004)
[VAKNS03] Verweij, B., Ahmed, S., Kleywegt, A.J., Nemhauser, G., Shapiro, A.:
The sample average approximation method applied to stochastic routing
problems: a computational study. Computational Optimization and
Applications, 24, 289-333 (2003)
[Val79] Valiant, L.G.: The complexity of computing the permanent. Theoretical
Computer Science, 8, 189-201 (1979)
[Zac66] Zackova, J.: On minimax solutions of stochastic linear programming
problems. Cas. Pest. Mat., 91, 423-430 (1966)
Nonlinear Optimization in Modeling Environments
Software Implementations for Compilers, Spreadsheets,
Modeling Languages, and Integrated Computing Systems

Janos D. Pinter

Pinter Consulting Services, Inc.
129 Glenforest Drive, Halifax, NS, Canada B3M 1J2
jdpinter@hfx.eastlink.ca
http://www.pinterconsulting.com

Summary. We present a review of several professional software products that serve
to analyze and solve nonlinear (global and local) optimization problems across a
variety of hardware and software environments. The product versions discussed have
been implemented for compiler platforms, spreadsheets, algebraic (optimization)
modeling languages, and integrated scientific-technical computing systems. The
discussion highlights some of the key advantages of these implementations. Test
examples, well-known numerical challenges and client applications illustrate the usage
of the current software versions.

Keywords: nonlinear (convex and global) optimization; LGO solver suite
and its implementations; compiler platforms, spreadsheets, optimization
modeling languages, scientific-technical computing systems; illustrative
applications and case studies.

2000 MR Subject Classification. 65K30, 90C05, 90C31.

1 Introduction
Nonlinearity is literally ubiquitous in the development of natural objects,
formations and processes, including also living organisms of all scales.
Consequently, nonlinear descriptive models, and modeling paradigms even beyond
a straightforward (analytical) function-based description, are of relevance in
many areas of the sciences, engineering, and economics. For example, [BM68,
Ric73, EW75, Man83, Mur83, Cas90, HJ91, Sch91, BSS93, Ste95, Gro96,
PSX96, Pin96a, Ari99, Ber99, Ger99, Laf00, PW00, CZ01, EHL01, Jac01,
Sch02, TS02, Wol02, Diw03, Zab03, Neu04b, HL05, KP05, Pin05a, Pin05b],
as well as many other authors, present discussions and an extensive repertoire
of examples to illustrate this point.
Decision-making (optimization) models that incorporate such a nonlinear
system description frequently lead to complex models that (may or provably
do) have multiple local and global optima. The objective of global
optimization (GO) is to find the "absolutely best" solution of nonlinear
optimization (NLO) models under such circumstances.
The most important (currently available) GO model types and solution
approaches are discussed in the Handbook of Global Optimization volumes,
edited by Horst and Pardalos [HP95], and by Pardalos and Romeijn [PR02].
As of 2004, over a hundred textbooks and a growing number of informative
web sites are devoted to this emerging subject.
We shall consider a general GO model form defined by the following
ingredients:

• $x$ decision vector, an element of the real Euclidean $n$-space $\mathbb{R}^n$;
• $f(x)$ continuous objective function, $f: \mathbb{R}^n \to \mathbb{R}^1$;
• $D$ non-empty set of admissible decisions, a proper subset of $\mathbb{R}^n$.
The feasible set $D$ is defined by
• $l, u$ explicit, finite vector bounds of $x$ (a "box" in $\mathbb{R}^n$);
• $g(x)$ $m$-vector of continuous constraint functions, $g: \mathbb{R}^n \to \mathbb{R}^m$.
Applying the notation introduced above, the continuous global optimization
(CGO) model is stated as

$$\min f(x) \quad \text{s.t. } x \text{ belongs to} \tag{1}$$

$$D = \{x : l \le x \le u,\ g(x) \le 0\}. \tag{2}$$

Note that in (2) all vector inequalities are meant component-wise ($l$, $u$
are $n$-vectors and the zero denotes an $m$-vector). Let us also remark that the
set of the additional constraints $g$ could be empty, thereby leading to
box-constrained models, which are often much simpler although still potentially
multi-extremal. Finally, note that formally more general optimization models (that
include also $=$ and $\ge$ constraint relations and/or explicit lower bounds on
the constraint function values) can be simply reduced to the canonical model
form (1)-(2). The canonical model itself is already very general: in fact, it
trivially includes linear programming and convex nonlinear programming models
(under corresponding additional specifications). Furthermore, it also includes
the entire class of pure and mixed integer programming problems, since all
(bounded) integer variables can be represented by a corresponding set of
binary variables; and every binary variable $y \in \{0,1\}$ can be equivalently
represented by its continuous extension $y \in [0,1]$ and the non-convex constraint
$y(1-y) \le 0$. Of course, we do not claim that the above approach is best, or
even suitable, for "all" optimization models: however, it certainly shows the
generality of the CGO modeling framework.
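
The binary-variable reformulation mentioned above can be checked directly. The short Python sketch below is only an illustration with hypothetical helper names: on the interval $[0,1]$ the product $y(1-y)$ is nonnegative, so the constraint $y(1-y) \le 0$ leaves exactly the two end points feasible.

```python
def binary_gap(y):
    """Left-hand side of the non-convex constraint y * (1 - y) <= 0."""
    return y * (1.0 - y)


def feasible(y):
    """Feasibility in the continuous reformulation: 0 <= y <= 1 and y(1 - y) <= 0."""
    return 0.0 <= y <= 1.0 and binary_gap(y) <= 0.0


grid = [i / 10 for i in range(11)]
print([y for y in grid if feasible(y)])
```

The feasible set of the continuous model is therefore disconnected, which is one concrete way in which integrality shows up as non-convexity in the CGO framework.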
Let us observe next that the above stated "minimal" analytical assumptions
already guarantee that the optimal solution set $X^*$ in the CGO model
is non-empty. This key existence result directly follows from the classical
theorem of Weierstrass (that states the existence of the minimizer point (set) of a
continuous function over a non-empty compact set). For reasons of numerical
tractability, the following additional requirements are also often postulated:
• $D$ is a full-dimensional subset ("body") in $\mathbb{R}^n$;
• the set of globally optimal solutions to (1)-(2) is at most countable;
• $f$ and $g$ (the latter component-wise) are Lipschitz-continuous functions on
$[l, u]$.
Note that the first two of these requirements support the development and
(easier) implementation of globally convergent algorithmic search procedures.
Specifically, the first assumption, i.e., the fact that $D$ is the closure of its
non-empty interior, makes algorithmic search possible within the set $D$.
This requirement also implies that e.g., nonlinear equality constraints need to
be directly incorporated into the objective function as discussed in [Pin96a],
Chapter 4.1.
With respect to the second assumption, let us note that in most well-
posed practical problems the set of global optimizers consists only of a single
point, or at most of several points. However, in full generality, GO models may
have even manifold solution sets: in such cases, software implementations will
typically find a single solution, or several of them. (There are theoretically
straightforward iterative ways to provide a sequence of global solutions.)
The third assumption is a sufficient condition for estimating $f^*$ on the basis
of a finite set of feasible search points. (Recall that the real-valued function
$h$ is Lipschitz-continuous on its domain of definition $D \subset \mathbb{R}^n$, if
$|h(x_1) - h(x_2)| \le L\|x_1 - x_2\|$ holds for all pairs $x_1 \in D$, $x_2 \in D$; here $L = L(D,h)$ is
a suitable Lipschitz-constant of $h$ on the set $D$: the inequality above directly
supports lower bound estimates on sets of finite size.) We emphasize that
the factual knowledge of the smallest suitable Lipschitz-constant, for each
model function, is not required, and in practice such information is typically
unavailable indeed.
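
How the Lipschitz inequality supports lower bound estimates from finitely many points can be sketched in one dimension. The Python fragment below is an illustration under stated assumptions, not the procedure of any particular solver: the function name `lipschitz_lower_bound` is hypothetical, and the example uses $h(x) = \sin x$ on $[0, 2\pi]$ with the valid Lipschitz overestimate $L = 1$.

```python
import math


def lipschitz_lower_bound(xs, hs, L, a, b):
    """Lower bound on min h over [a, b] from samples (xs, hs).

    Assumes |h(x) - h(y)| <= L * |x - y| on [a, b], so that h lies above
    every "cone" h(x_i) - L * |x - x_i|.
    """
    pts = sorted(zip(xs, hs))
    # End pieces: only one neighbouring sample bounds h near a and near b.
    bound = min(pts[0][1] - L * (pts[0][0] - a),
                pts[-1][1] - L * (b - pts[-1][0]))
    for (x1, h1), (x2, h2) in zip(pts, pts[1:]):
        # Lowest point of the envelope between two neighbouring samples.
        bound = min(bound, 0.5 * (h1 + h2) - 0.5 * L * (x2 - x1))
    return bound


xs = [2.0 * math.pi * k / 8 for k in range(9)]
hs = [math.sin(x) for x in xs]
lb = lipschitz_lower_bound(xs, hs, L=1.0, a=0.0, b=2.0 * math.pi)
ub = min(hs)  # the best sampled value is an upper bound on the minimum
print(lb, ub)
```

The true minimum lies between the two printed numbers. Using a larger, more "carefree" value of `L` keeps the bound valid but makes it weaker.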
Let us remark here that e.g., models defined by continuously differentiable
functions $f$ and $g$ certainly belong to the CGO or even to the Lipschitz model
class. In fact, even such "minimal" smooth structure is not essential, since
e.g., "saw-tooth" like functions are also Lipschitz-continuous. This comment
also implies that CGO indeed covers a very general class of optimization
models. As a consequence of this generality, the CGO model class includes also
many extremely difficult instances. To perceive this difficulty, one can think of
model-instances that would require "the finding of the lowest valley across a
range of islands" (since the feasible set may well be disconnected), based on an
intelligent (adaptive, automatic), but otherwise completely "blind" sampling
procedure...
For illustration, a merely one-dimensional, box-constrained model is shown
in Fig. 1. This is a frequently used classical GO test problem, due to Shubert:
it is defined as

$$\min \sum_{k=1}^{5} k \sin(k + (k+1)x), \quad -10 \le x \le 10.$$

Fig. 1. One-dimensional, box-constrained CGO model
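
To get a feeling for the multi-extremality of this model, one can simply tabulate the objective on a fine grid and count the grid points that are lower than both neighbours. The sketch below does this in Python; the grid size and the helper names are arbitrary choices made for illustration, and a uniform grid is of course not a recommended solution method.

```python
import math


def shubert(x):
    """Objective of the one-dimensional test model: sum of k*sin(k + (k+1)x), k = 1..5."""
    return sum(k * math.sin(k + (k + 1) * x) for k in range(1, 6))


def grid_local_minima(f, a, b, n):
    """Grid points in (a, b) whose value is below both neighbouring grid values."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    fs = [f(x) for x in xs]
    return [(xs[i], fs[i]) for i in range(1, n)
            if fs[i] < fs[i - 1] and fs[i] < fs[i + 1]]


minima = grid_local_minima(shubert, -10.0, 10.0, 20000)
best = min(minima, key=lambda p: p[1])
print("number of grid local minima:", len(minima))
print("best grid point and value:", best)
```

A local solver started at an arbitrary point would typically stop at one of these local minima, and only some of them attain the best value found on the grid.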

Model complexity may (and frequently will) increase dramatically, already
in (very) low dimensions. For example, both the amplitude and the
frequency of the trigonometric components in the model of Figure 1 could be
increased arbitrarily, leading to more and more difficult problems.
Furthermore, increasing dimensionality per se can lead to a tremendous
(theoretically exponential) increase of model complexity (e.g., in terms
of the number of local/global solutions, for a given type of multi-extremal
models). To illustrate this point, consider the merely two-dimensional,
box-constrained, yet visibly challenging objective function shown in Fig. 2 below.
The model is based on Problem 4 of the Hundred-Dollar, Hundred-Digit
Challenge Problems [Tre02], and it is stated as

$$\min \frac{x^2 + y^2}{4} + \exp(\sin(50x)) - \sin(10(x+y)) + \sin(60\exp(y)) + \sin(70\sin(x)) + \sin(\sin(80y)),$$
$$-3 \le x \le 3, \quad -3 \le y \le 3.$$

Fig. 2. Two-dimensional, box-constrained CGO model

Needless to say, not all - and especially not all practically motivated - CGO
models are as difficult as indicated by Figures 1 and 2. At the same time, we do
not always have the possibility to directly inspect and estimate the difficulty of
an optimization model, and perhaps unexpected complexity can be met under
such circumstances. An important case in point is when the software user
(client) has a confidential or otherwise visibly complex model that needs to
be analyzed and solved. The model itself can be presented to the solver engine
as object code, a dynamic link library (DLL), or even as an executable program:
in such situations, direct model inspection is simply not an option. In many
other cases, the evaluation of the optimization model functions may require
the numerical solution of a system of differential equations, the evaluation of
special functions or integrals, the execution of a complex system of program
code, stochastic simulation, even some physical experiments, and so on.
Traditional numerical optimization methods, discussed in most topical
textbooks such as e.g. [BSS93, Ber99, CZ01], search only for local optima.
This approach is based on the tacit assumption that a "sufficiently good"
initial solution (that is located in the region of attraction of the "true" solution)
is immediately available. Both Fig. 1 and Fig. 2 suggest that this may not
always be a realistic assumption... Models with less "dramatic" difficulty, but
in (perhaps much) higher dimensions also imply the need for global
optimization. For instance, in advanced engineering design, models with hundreds or
thousands of variables and constraints are analyzed. In similar cases to those
mentioned above, even an approximately completed, but genuine global
(exhaustive) search strategy may (and typically will) yield better results than
the most sophisticated local search approach when "started from the wrong
valley"...

2 A solver suite approach to practical global optimization
The general development philosophy followed by the software implementa-
tions discussed here is based on the seamless combination of rigorous (i.e.,
theoretically convergent) global and efficient local search strategies.
As is well-known ([HT96, Pin96a]), the existence of valid overestimates
of the actual (smallest possible) Lipschitz-constants, for $f$ and for each
component of $g$ in the model (1)-(2), is sufficient to guarantee the global convergence
of suitably defined adaptive partition algorithms. In other words, the
application of a proper branch-and-bound search strategy (that exploits the Lipschitz
information referred to above) generates a sequence of sample points that
converges exactly to the (unique) global solution $X^* = \{x^*\}$ of the model instance
considered. If the model has a finite or countable number of global solutions,
then - theoretically, and under very general conditions - sub-sequences of
search points are generated that respectively converge to the points of X*.
For further details related to the theoretical background, including also a de-
tailed discussion of algorithm implementation aspects, consult [Pin96a] and
references therein.
In numerical practice, deterministically guaranteed global convergence
means that after a finite number of search steps - i.e., sample points and
corresponding function evaluations - one has an incumbent solution (with a
corresponding upper bound of the typically unknown optimum value), as well
as a verified lower bound estimate. Furthermore, the "gap" between these es-
timates converges to zero, as the number of generated search points tends to
infinity. For instance, interval arithmetic based approaches follow this avenue:
consult, e.g., [RR95, Kea96, Neu04b]; [CK99] review a number of successful
applications of rigorous search methods.
The essential difficulty of applying such rigorous approaches to "all" GO
models is that their computational demand typically grows at an exponential
pace with the size of the models considered. For example, the Lipschitz infor-
mation referred to above is often not precise enough: "carefree" overestimates
of the best possible (smallest) Lipschitz-constant lead to a search procedure
that will, in effect, be close in efficiency to a passive uniform grid search. For
this reason, in a practical GO context, other search strategies also need to be
considered.
It is well-known that properly constructed stochastic search algorithms
also possess general theoretical global convergence properties (with
probability 1): consult, for instance, the review of [BR95], or [Pin96a]. For a
very simple example that illustrates this point, one can think of a pure
random search mechanism applied in the interval $l \le x \le u$ to solve the CGO model:
this will eventually converge, if the "basin of attraction" of the (say, unique)
global optimizer $x^*$ has a positive volume. In addition, stochastic sampling
methods can also be directly combined with search steps of other (various
global and efficient local) search strategies, and the overall global convergence
of such strategies will still be maintained. The theoretical background of
stochastic "hybrid" algorithms is discussed by [Pin96a]. The underlying general
convergence theory of such combined methods allows for a broad range of
implementations. In particular, a hybrid optimization program system supports
the flexible usage of a selection of component solvers: one can execute a fully
automatic global or local search based optimization run, can combine solvers,
and can also design various interactive runs.
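
The combination of a stochastic global phase with a local refinement phase can be sketched as follows. This Python fragment is a toy illustration only and does not reproduce any component of the solver suite discussed in this chapter: the function names, the sample size, the step-halving local search and its tolerances are all assumptions chosen for brevity, and the objective is the one-dimensional test function of Fig. 1.

```python
import math
import random


def shubert(x):
    """Objective of the one-dimensional test model shown in Fig. 1."""
    return sum(k * math.sin(k + (k + 1) * x) for k in range(1, 6))


def pure_random_search(f, lo, hi, n_samples, rng):
    """Global phase: best of n_samples uniformly distributed points in [lo, hi]."""
    return min((rng.uniform(lo, hi) for _ in range(n_samples)), key=f)


def local_refine(f, x, lo, hi, step=0.1, tol=1e-8):
    """Local phase: derivative-free descent with step halving, kept inside [lo, hi]."""
    fx = f(x)
    while step > tol:
        improved = False
        for cand in (max(lo, x - step), min(hi, x + step)):
            fc = f(cand)
            if fc < fx:
                x, fx, improved = cand, fc, True
                break
        if not improved:
            step *= 0.5
    return x, fx


rng = random.Random(1)
x0 = pure_random_search(shubert, -10.0, 10.0, 200, rng)
x1, f1 = local_refine(shubert, x0, -10.0, 10.0)
print("global phase:", x0, shubert(x0))
print("after local refinement:", x1, f1)
```

The global phase only needs to land somewhere in the region of attraction of a good solution; the local phase then improves that estimate. When to switch between the two phases is a heuristic decision, here simply fixed by the sample size.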
Obviously, there remains a significant issue regarding the (typically
unforeseeable best) "switching point" from strategy to strategy: this is, however,
unavoidable when choosing between theoretical rigor and numerical efficiency.
(Even local nonlinear solvers would need, in theory, an infinite iterative pro-
cedure to converge, except in idealized special cases.) For example, in the
stochastic search framework outlined above, it would suffice to find just one
sample point in the "region of attraction" of the (unique) global solution x*,
and then that solution estimate could be refined by a suitably robust and
efficient local solver. Of course, the region of attraction of x* (e.g., its shape
and relative size) is rarely known, and one needs to rely on computationally
expensive estimates of the model structure (again, the reader is referred, e.g.,
to the review of [BR95]). Another important numerical aspect is that one loses
the deterministic (lower) bound guarantees when applying a stochastic search
procedure: instead, suitable statistical estimation methods can be applied,
consult [Pin96a] and topical references therein. Again, the implementation of
such methodology is far from trivial.
To summarize the discussion, there are good reasons to apply various
search methods and heuristic global-to-local search "switching points" with a
reasonable expectation of numerical success. Namely,

• one needs to apply proper global search methods to generate an initial
good "coverage" of the search space;
• it is also advantageous to apply quality local search that enables the fast
improvement of solution estimates generated by a preceding global search
phase;
• using several - global or local - search methods based on different theoret-
ical strategies, one has a better chance to find quality solutions in difficult
models (or ideally, confirm the solution by comparing the results of several
solver runs);
• one can always place more or less emphasis on rigorous search vs. efficiency,
by selecting the appropriate solver combination, and by allocating search
effort (time, function evaluations);
• we often have a priori knowledge regarding good quality solutions, based on
practical, model-specific knowledge (for example, one can think of solving
systems of equations: here a global solution that "nearly" satisfies the
system can be deemed as a sufficiently good point from which local search
can be directly started);
• practical circumstance and resource limitations may (will) dictate the use
of additional numerical stopping and switching rules that can be flexibly
built into the software implementation.
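The points listed above can be illustrated by a small global-to-local sketch. It is a sketch under stated assumptions, not the LGO algorithm: the global phase is plain random sampling, the local phase is a simple compass (coordinate pattern) search, and all function and parameter names are hypothetical. The two stopping rules (an evaluation budget and a limit on samples without improvement) are stand-ins for the kind of numerical switching rules mentioned in the last item.

```python
import math
import random

def global_phase(f, bounds, max_evals, max_no_improve, rng):
    """Random sampling with two stopping rules: an evaluation budget and a
    limit on consecutive samples that bring no improvement."""
    best_x, best_f, stalled, evals = None, math.inf, 0, 0
    while evals < max_evals and stalled < max_no_improve:
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        evals += 1
        if fx < best_f:
            best_x, best_f, stalled = x, fx, 0
        else:
            stalled += 1
    return best_x, best_f, evals

def local_phase(f, x, fx, bounds, step=0.5, tol=1e-8):
    """Compass search: try +/- step along each coordinate, halve the step
    whenever no trial point improves the current estimate."""
    x = list(x)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = min(hi, max(lo, x[i] + delta))
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5
    return x, fx

def hybrid_search(f, bounds, max_evals=2000, max_no_improve=400, seed=0):
    """Switch from the global phase to local refinement once a rule fires."""
    rng = random.Random(seed)
    x0, f0, _ = global_phase(f, bounds, max_evals, max_no_improve, rng)
    return local_phase(f, x0, f0, bounds)
```

For example, `hybrid_search(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [(-10.0, 10.0), (-10.0, 10.0)])` refines the best sample towards the point (1, -2). Shifting effort between the two phases is done here simply by changing `max_evals` and `max_no_improve`.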
Based on the design philosophy outlined - that has been further confirmed
and dictated by practical user demands - we have been developing for over
a decade nonlinear optimization software implementations that are based on
global and local solver combinations. The currently available software prod-
ucts will be briefly discussed below with illustrative examples; further related
work is in progress.

3 Modeling systems and user demands


Due to advances in modeling, optimization methods and computer technology,
there has been a rapidly growing interest in modeling languages
and environments. Consult, for example, the topical Annals of Operations
Research volumes [MM95, MMS97, VMM00, CF01], and the volume [Kal04].
Additional useful information can be found, for example, at the web sites
[Fou04, MS04, Neu04a].
Prominent examples of widely used modeling systems that are focused on
optimization include AIMMS ([PDT04]), AMPL ([FGK93]), GAMS ([BKM88]),
the Excel Premium Solver Platform ([FS01]), ILOG ([I04]), the LINDO Solver
Suite ([LS96]), MPL ([MS02]), and TOMLAB ([TO04]). (Please note that the
literature references cited may not always reflect the current status of the
modeling systems listed: for the latest information, contact the developers
and/or visit their website.)
In addition, there also exists a large variety of core compiler platform-based
solver systems with some built-in model development functionality: in
principle, these all can be linked to the modeling languages listed above. At
the other end of the spectrum, there is also significant development related
to fully integrated scientific and technical computing (ISTC) systems such as
Maple ([M04a]), Mathematica ([WR04]), and MATLAB ([TM04]). The ISTCs
also incorporate a growing range of optimization-related functionality, supple-
mented by application products.
The modeling environments listed above are aimed at meeting the needs
and demands of a broad range of clients. Major client groups include educa-
tional users (instructors and students); research scientists, engineers, econo-
mists, and consultants (possibly, but not necessarily equipped with an in-
depth optimization related background); optimization experts, vertical appli-
cation developers, and other "power users". Obviously, the user categories
listed above are not necessarily disjoint: e.g., someone can be an expert re-
searcher and software developer in a certain professional area, with a perhaps
more modest optimization expertise. The pros and cons of the individual
software products - in terms of ease of model prototyping, detailed code de-
velopment and maintenance, optimization model processing tools, availability
of solvers and other auxiliary tools, program execution speed, overall level of
system integration, quality of related documentation and support - make such
systems more or less attractive for the user groups listed.
It is also worth mentioning at this point that - especially in the context
of nonlinear modeling and optimization - it can be a sound idea to tackle
challenging problems by making use of several modeling systems and solver
tools, if available. In general, dense NLO model formulations are far less easy
to "standardize" than linear or even mixed integer linear models, since one
typically needs an explicit, specific formula to describe a particular model
function. Such formulae are relatively straightforward to transfer from one
modeling system into another: some of the systems listed above even have such
built-in converter capabilities, and their syntaxes are typically quite similar
(whether it is x**2 or x^2, sin(x) or Sin[x], bernoulli(n,x) or BernoulliB[n,x],
and so on).
In subsequent sections we shall summarize the principal features of sev-
eral current nonlinear optimization software implementations that have been
developed with quite diverse user groups in mind. The range of products re-
viewed in this work includes the following:
• LGO Solver System with a Text I/O Interface
• LGO Integrated Development Environment
• LGO Solver Engine for Excel
• MathOptimizer Professional (LGO Solver Engine for Mathematica)
• Maple Global Optimization Toolbox (LGO Solver Engine for Maple).
We will also present relatively small, but non-trivial test problems to il-
lustrate some of the key functionality of these implementations.
Note that all software products discussed are professionally developed and
supported, and that they are commercially available. For this reason - and
also in line with the objectives of this paper - some of the algorithmic tech-
nical details are only briefly mentioned. Additional technical information is
available upon request; please consult also the publicly available references,
including the software documentation and topical web sites.
In order to keep the length of this article within reasonable bounds, further
product implementations not discussed here are
• LGO Solver Engine for GAMS
• LGO Solver Engine for MPL
• TOMLAB/LGO for MATLAB
• MathOptimizer for Mathematica.
With respect to these products, consult e.g. the references [Pin02a, PK03,
KP04b, KP05, PHGE04, PK05].

4 Software implementation examples

4.1 LGO solver system with a text I/O interface

The Lipschitz Global Optimizer (LGO) software has been developed and used
for more than a decade (as of 2004). Detailed technical descriptions and user
documentation have appeared elsewhere: consult, for instance, [Pin96a, Pin97,
Pin01a, Pin04], and the software review [BS00]. Let us also remark here that
LGO was chosen to illustrate global optimization software (in connection with
a demo version of the MPL modeling language) in the well-received textbook
[HL05].
Since LGO serves as the core of most current implementations (with the
exception of one product), we will provide its somewhat more detailed de-
scription, followed by concise summaries of the other platform-specific imple-
mentations.
In accordance with the approach advocated in Section 2, LGO is based on
a seamless combination of a suite of global and local scope nonlinear solvers.
Currently, LGO includes the following solver options:
• adaptive partition and search (branch-and-bound) based global search
(BB)
• adaptive global random search (single-start) (GARS)
• adaptive global random search (multi-start) (MS)
• constrained local search (generalized reduced gradient method) (LS).
The global search methodology was discussed briefly in Section 2; the well-known
GRG method is discussed in numerous textbooks, consult e.g. [EHL01].
Note that in all three global search modes the model functions are aggregated
by an exact penalty function. By contrast, in the local search phase all model
functions are considered and treated individually. Note also that the global
search phases are equipped with stochastic sampling procedures that support
the usage of statistical bound estimation methods.
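The aggregation of model functions by a penalty function can be sketched as follows. The specific penalty form used in LGO is not stated here, so the l1-type penalty below is an assumption (it is one common exact penalty), and the function names, the multiplier `mu` and the toy model are hypothetical.

```python
def l1_penalty(f, eq_constraints, ineq_constraints, mu):
    """Build the merit function f(x) + mu * (sum |h_j(x)| + sum max(0, g_i(x))),
    where equalities are h_j(x) = 0 and inequalities are g_i(x) <= 0."""
    def merit(x):
        violation = sum(abs(h(x)) for h in eq_constraints)
        violation += sum(max(0.0, g(x)) for g in ineq_constraints)
        return f(x) + mu * violation
    return merit

# Toy model (hypothetical): minimize x^2 subject to x >= 1, i.e. 1 - x <= 0.
merit = l1_penalty(lambda x: x * x, [], [lambda x: 1.0 - x], mu=10.0)

# Crude scan: the unconstrained minimizer of the merit function is the
# constrained minimizer x = 1, since mu exceeds the multiplier value 2.
grid = [0.01 * i for i in range(-300, 301)]
x_best = min(grid, key=merit)
```

The point of an exact penalty is visible in the toy model: for a sufficiently large but finite multiplier, minimizing the single merit function recovers the constrained solution itself, not merely an approximation of it.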
All LGO search algorithms are derivative-free: specifically, in the local
search phase central differences are used to approximate gradients. This choice
reflects again our objective to handle (also) models with merely computable,
continuous functions, including "black box" systems.
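A central difference gradient approximation of the kind mentioned above can be sketched as follows. The step size `h` and the function name are assumptions made for illustration; the step selection used inside LGO is not described here.

```python
def central_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with (f(x + h e_i) - f(x - h e_i)) / (2 h)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

# Example: f(x) = x0^2 + 3 x0 x1 has the exact gradient (2 x0 + 3 x1, 3 x0).
example = central_difference_gradient(lambda x: x[0] ** 2 + 3.0 * x[0] * x[1], [1.0, 2.0])
```

Each gradient estimate costs two function evaluations per variable, and only function values are required, which is what makes the approach applicable to "black box" models.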
The compiler-based LGO solver suite is used as an option linked to various
modeling environments. In its core text I/O based version, the application-
specific LGO executable program (that includes a driver file and the model
function file) reads an input text file that contains all remaining application
information (model name, variable and constraint names, variable bounds
and nominal values, and constraint types), as well as a few key solver options
(global solver type, precision settings, resource and time limits). Upon
completing the LGO run, a summary and a detailed report file are available. As
can be expected, this LGO version has the lowest hardware demands; it
also runs fastest, and it can be directly embedded into vertical and proprietary
user applications.

4.2 LGO integrated development environment

LGO can also be equipped - as a readily available implementation option -
with a simple, but functional and user-friendly MS Windows interface. This
enhanced version is referred to as the LGO Integrated Development Environment
(IDE). The LGO IDE provides a menu that supports model development,
compilation, linking, execution, and the inspection of results. To this
end, a text editor of the user's choice is used: examples are the freely
downloadable ConTEXT and PFE editors. Note here that even the
simple Notepad Windows accessory - or the more sophisticated and still free
Metapad text editor - would do. The IDE also includes external program call
options and two concise help files: the latter discuss global optimization basics
and the main application development steps when using LGO.
As already noted, this LGO implementation is compiler-based: user models
can be connected to LGO using one of several programming languages on
personal computers and workstations. Currently supported platforms include
essentially all professional Fortran 77/90/95 and C compilers and some others:
prominent examples are Borland C/C++ and Delphi; Compaq/Digital Visual
Fortran; Lahey Fortran 77/90/95; Microsoft Visual Basic and C/C++; and
Salford Fortran 77/95. Other customized versions can also be made available
upon request, especially since the vendors of development environments often
expand the list of compatible platforms.
This LGO software implementation (in both versions discussed above)
fully supports communication with sophisticated user models, including en-
tirely closed or confidential "black box" systems. These LGO versions are
particularly advantageous in application areas, where program execution (so-
lution) speed is a major concern: in the GO context, many projects fall into
this category. The added features of the LGO IDE can also greatly assist in
educational and research (prototyping) projects.
LGO deliveries are accompanied by an approximately 60-page User Guide.
In addition to installation and technical notes, this document provides a brief
introduction to GO; describes LGO and its solvers; discusses the model de-
velopment procedure, including modeling and solution tips; and reviews a list
of applications. The appendices provide examples of the user (main, model,
and input parameter) files, as well as of the resulting output files; connectivity
issues and workstation implementations are also discussed.
For a simple illustration, we display below the LGO model function file
(in C format), and the input parameter file that correspond to a small, but
not quite trivial GO model (this is a constrained extension of Shubert's model
discussed earlier):

$$\min \sum_{k=1}^{5} k \sin(k + (k+1)x)$$

$$\text{s.t.} \quad x^2 + 3x + \sin(x) \le 6, \qquad -10 \le x \le 10.$$

Both files are slightly edited for the present purposes. Note also that in the
simplest usage mode, the driver file contains only a single statement that calls
LGO: therefore we skip the display of that file. (Additional pre- and post-
solver manipulations can also be inserted in the driver file: this can be useful
in various customized applications.)

Model function file


#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* Objective value goes to fox[0]; constraint values (g(x) <= 0 form) go to gox[]. */
int _stdcall USER_FCT(double x[], double fox[1], double gox[])
{
    /* Objective: sum over k = 1,...,5 of k*sin(k + (k+1)*x) */
    fox[0] = sin(1. + 2.*x[0]) + 2.*sin(2. + 3.*x[0]) + 3.*sin(3. + 4.*x[0])
           + 4.*sin(4. + 5.*x[0]) + 5.*sin(5. + 6.*x[0]);
    /* Constraint: x^2 + 3x + sin(x) - 6 <= 0 */
    gox[0] = -6. + pow(x[0], 2.) + sin(x[0]) + 3.*x[0];
    return 0;
}

Input parameter file

Model Descriptors
LGO Model            ModelName
1                    Number of Variables
1                    Number of Constraints
Variable names   Lower Bounds   Nominal Values   Upper Bounds
X                -10.           0.               10.
ObjFct               ! Objective Function Name
Constraint Names and Constraint Types (0 for ==, -1 for <=)
Constraint1          -1

! SOLVER OPTIONS AND PARAMETERS
1                    ! Operational modes 0: LS; 1: BB+LS; 2: GARS+LS; 3: MS+LS
2000                 ! Maximal no. of fct evals in global search phase
400                  ! Maximal no. of fct evals in global search w/o improvement
                     Constraint penalty multiplier
-1000000.            Target objective function value in global search phase
-1000000.            Target objective function value in local search phase
0.000001             Merit function precision improvement threshold in local search phase
0.000001             Constraint violation tolerance in local search phase
0.000001             Kuhn-Tucker local optimality conditions tolerance in local search phase
0                    Built-in random number generator seed value
300                  Program execution time limit (seconds)

Summary result file

LGO Solver Results Summary

Model name: LGO Model
Total number of function evaluations               997
Objective function: ObjFct              -14.8379500257
Solution vector components
1  X                                     -1.1140996879
Constraint function values at optimum estimate
1  Constraint1                           -8.9985950759
Solver status indicator value   4   TERMINATED BY SOLVER
Model status indicator value    1   GLOBALLY OPTIMAL SOLUTION FOUND
LGO solver system execution time (seconds)   0.01
For additional runtime information, please consult the LGO.OUT file.
LGO application run completed.
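The values in the summary file can be checked independently by evaluating the model functions at the reported solution. The sketch below re-implements the objective and the constraint from the model function file in Python; the crude grid scan at the end is an added illustration (not part of LGO) and should land close to the reported solution.

```python
import math

def objective(x):
    # Same expression as fox[0] in the model function file.
    return sum(k * math.sin(k + (k + 1) * x) for k in range(1, 6))

def constraint(x):
    # Same expression as gox[0]; feasible points satisfy constraint(x) <= 0.
    return x * x + 3.0 * x + math.sin(x) - 6.0

x_reported = -1.1140996879
f_reported = objective(x_reported)    # compare with -14.8379500257
g_reported = constraint(x_reported)   # compare with -8.9985950759

# Crude check: scan a uniform grid on [-10, 10] and keep the best feasible point.
grid = [-10.0 + 0.001 * i for i in range(20001)]
feasible = [x for x in grid if constraint(x) <= 0.0]
x_grid = min(feasible, key=objective)
f_grid = objective(x_grid)

if __name__ == "__main__":
    print(f_reported, g_reported, x_grid, f_grid)
```

Note that the objective is periodic in x, while the constraint restricts the search to a feasible interval shorter than one period; this is what singles out one solution within the bounds.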

4.3 LGO solver engine for Excel users

The LGO global solver engine for Microsoft Excel has been developed in
cooperation with Frontline Systems [FS01]. For details on the Excel Solver
and the currently available advanced engine options visit Frontline's web site
(www.solver.com). The site contains useful information, including for instance,
tutorial material, modeling tips, and various spreadsheet examples. The User
Guide provides a brief introduction to all current solver engines; discusses the
diagnosis of solver results, solver options and reports; and it also contains
a section on Solver VBA functions. Note that this information can also be
invoked through Excel's on-line help system. In this implementation, LGO
is a field-installable Solver Engine that seamlessly connects to the Premium
Solver Platform: the latter is fully compatible with the standard Excel Solver,
but it has enhanced algorithmic capabilities and features.
LGO for Excel, in addition to continuous global and local capabilities,
also provides basic support for handling integer variables: this feature has
been implemented - as a generic option for all advanced solver engines - by
Frontline Systems.
The LGO solver options available are essentially based on the stand-alone
"silent" version of the software, with some modifications and added features.
The LGO Solver Options dialog, shown by Fig. 3, allows the user to control
solver choices and several other settings.

Fig. 3. Excel/LGO solver engine: solver options and parameters dialog

[Figure: screenshot of the LGO Global Solver options dialog. Recoverable fields:
Max Time, Iterations, Precision, Convergence, Global Convergence, Global Phase
Cutoff, Global Phase Iterations, Global Phase Iterations w/o Improvement, Local
Phase Cutoff, Random Seed. LGO Search Options: Local Search from Nominal
Solution, Global Branch & Bound, Global Adaptive Random Search, Global
Multistart Search. Check boxes: Show Iteration Results, Use Automatic Scaling,
Assume Non-Negative, Bypass Solver Reports.]

To illustrate the usage of the Excel/LGO implementation, we shall present
and solve the Electrical Circuit Design (ECD) test problem. The ECD model
has been extensively studied in the global optimization literature, as a well-
known computational challenge: see, e.g., [RR93], with detailed historical
notes and further references.
In the ECD problem, a bipolar transistor is modeled by an electrical cir-
cuit: this model leads to the following square system of nonlinear equations

$$a_k(x) = 0, \quad k = 1,\dots,4; \qquad b_k(x) = 0, \quad k = 1,\dots,4; \qquad c(x) = 0.$$

The individual equations are defined as follows:

$$a_k(x) = (1 - x_1 x_2)\, x_3 \{\exp[x_5 (g_{1k} - g_{3k} x_7 - g_{5k} x_8)] - 1\} - g_{5k} + g_{4k} x_2,$$

$$b_k(x) = (1 - x_1 x_2)\, x_4 \{\exp[x_6 (g_{1k} - g_{2k} - g_{3k} x_7 + g_{4k} x_9)] - 1\} - g_{5k} x_1 + g_{4k}, \qquad k = 1,\dots,4;$$

$$c(x) = x_1 x_3 - x_2 x_4.$$

By assumption, the vector variable x belongs to the box region $[0,10]^9$. The
numerical values of the constants $g_{ik}$, $i = 1,\dots,5$, $k = 1,\dots,4$ are listed in the
paper of Ratschek and Rokne [RR93], and will not be repeated here. (Note
that, in order to make the model functions more readable, several constants
are simply aggregated in the above formulae, when compared to that paper.)
To solve the ECD model rigorously, Ratschek and Rokne applied a com-
bination of interval arithmetic, subdivision and branch-and-bound strategies.
They concluded that the rigorous solution was extremely costly (billions of
model function evaluations were needed), in order to arrive at a guaranteed
interval (i.e., embedding box) estimate that is component-wise within at least
$10^{-4}$ precision of the postulated approximate solution:

x* = (0.9, 0.45, 1.0, 2.0, 8.0, 8.0, 5.0, 1.0, 2.0).
Obviously, by taking e.g. the Euclidean norm of the overall error in the
model equations, the problem of finding the solution can be formulated as a
global optimization problem. This model has been set up in a demo spread-
sheet, and then solved by the Excel LGO solver engine. The numerical solution
found by LGO - directly imported from the answer report - is shown below:
Microsoft Excel 10.0 Answer Report
Worksheet: [CircuitDesign_9_9.XLS]Model
Report Created: 12/16/2004 12:39:29 AM
Result: Solver found a solution. All constraints and optimality conditions are satisfied.
Engine: LGO Global Solver

Target Cell (Min)
Cell     Name        Original Value   Final Value
$B$21    objective   767671534.2      9.02001E-11

Adjustable Cells
Cell     Name   Original Value   Final Value
$D$10    x_1    1                0.900000409
$D$11    x_2    2                0.450000021
$D$12    x_3    3                1.000000331
$D$13    x_4    4                2.000001476
$D$14    x_5    5                7.999999956
$D$15    x_6    6                7.999998226
$D$16    x_7    7                4.999999941
$D$17    x_8    8                1.000000001
$D$18    x_9    9                1.999999812
The error of the solution found is of the order of $10^{-6}$ or smaller, relative
to the verified solution, for each component. The numerical solution of the ECD
model in Excel takes less
than 5 seconds on a personal computer (Intel Pentium 4, 2.4 GHz processor,
512 MB RAM). Let us note that we have solved this model also using core
LGO implementations with various C and Fortran compilers, with essentially
identical success (in about a second or less). Although this finding should not
lead per se to overly optimistic claims, it certainly shows the robustness and
efficiency of LGO in solving this particular (non-trivial) example.
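The reformulation used in this example, i.e. minimizing the Euclidean norm of the equation residuals, can be sketched as follows. Since the constants of the a_k and b_k equations are not repeated in this chapter, the sketch includes only the last equation c(x); the helper names are hypothetical, and the two test points are the postulated solution and the LGO result from the answer report.

```python
import math

def residual_norm(equations, x):
    """Euclidean norm of the residual vector (e_1(x), ..., e_m(x)); a point
    solves the system exactly when this merit value is zero."""
    return math.sqrt(sum(eq(x) ** 2 for eq in equations))

def c(x):
    # Last ECD equation, c(x) = x1*x3 - x2*x4, with x_1..x_9 stored at indices 0..8.
    return x[0] * x[2] - x[1] * x[3]

# a_k and b_k would be appended here once the constants from [RR93] are supplied.
equations = [c]

x_postulated = (0.9, 0.45, 1.0, 2.0, 8.0, 8.0, 5.0, 1.0, 2.0)
x_lgo = (0.900000409, 0.450000021, 1.000000331, 2.000001476, 7.999999956,
         7.999998226, 4.999999941, 1.000000001, 1.999999812)
x_start = (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)

if __name__ == "__main__":
    for label, point in (("postulated", x_postulated), ("LGO", x_lgo), ("start", x_start)):
        print(label, residual_norm(equations, point))
```

Since the merit function is bounded below by zero and attains zero exactly at solutions of the system, a value close to zero is a directly usable quality indicator for a candidate solution.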

4.4 MathOptimizer Professional

Mathematica is an integrated environment for scientific and technical
computing. This ISTC system supports functional, rule-based and procedural
programming styles. Mathematica also offers advanced multimedia (graphics,
image processing, animation, sound generation) tools, and it can be used to
produce publication-quality documentation. For further information, consult
the key reference [Wol03]; the website www.wolfram.com also provides detailed
information regarding the range of other products and services related to
Mathematica.
MathOptimizer Professional ([PK03]) combines the model development
power of Mathematica with the robust performance and efficiency of the LGO
solver suite. To this end, the general-purpose interface MathLink is used that
supports communication between Mathematica and external programs. The
functionality of MathOptimizer Professional is summarized by the following
stages (note that all steps are fully automatic, except - obviously - the first
one):
• model formulation in Mathematica
• translation of the Mathematica optimization model into C or Fortran code,
to generate the LGO model function file
• generation of the LGO input parameter file
• compilation of the C or Fortran model code into object code or dynamic
link library (dll): this step makes use of a corresponding compiler
• call to the LGO solver engine: the latter is typically provided as object
code or an executable program that is now linked together with the model
function object or dll file
• numerical solution and report generation by LGO
• report of LGO results back to the calling Mathematica notebook.
A "side-benefit" of using MathOptimizer Professional is that the Math-
ematica models formulated are automatically translated into C or Fortran
format: this feature can be put to good use in a variety of contexts. (For ex-
ample, the LGO model function and input parameter file examples shown in
Section 4.2 were generated automatically.)
Let us also remark that the approach outlined supports "only" the solu-
tion of models defined in Mathematica that can be directly converted into C
or Fortran program code. Of course, this model category still allows the han-
dling of a broad range of optimization problems. The approximately 150-page
MathOptimizer Professional manual is a "live" (notebook) document that can
be directly invoked through Mathematica's on-line help system. In addition
to basic usage description, the User Guide also discusses a large number of
simple and more challenging test problems, and several realistic application
examples in detail.
As an illustrative example, we will present the solution of a new - and
rather difficult - object packing model: we wish to find (numerically) the
"best" non-overlapping arrangement of a set of non-uniform size circles in
an embedding circle. Notice that this is not a standard model type (unlike
uniform circle packings that have been studied for decades, yet still only spe-
cial cases are solved to guaranteed optimality). Our approach can be directly
generalized to find essentially arbitrary object arrangements.
The best packing is defined here by a combination of two criteria: the
radius of the circumscribed circle, and the average pair-wise distance between
the centers of the embedded circles. The relative weight of the two objective
function components can be selected as a model-instance parameter.
Detailed numerical results are reported in [KP04a], for circles defined by
the sequence of radii $r_i = i^{-1/2}$, $i = 1,\dots,N$, up to $N = 40$-circle
configurations. Observe that the required (pair-wise) non-overlapping
arrangement leads to $N(N-1)/2$ non-convex constraints, in addition to $2N + 1$ bound
constraints on the circle center and circumscribed radius decision variables.
Hence, in the 40-circle example, LGO solves this model with 780 non-convex
constraints: the corresponding runtime is about 3.5 hours on a P4 1.6
GHz personal computer.
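The pair-wise non-overlap conditions can be written down compactly. The sketch below is an illustration only: the constraint form (squared distances, written as g <= 0) and the function name are assumptions, and the model actually passed to LGO may be formulated differently.

```python
from itertools import combinations

def overlap_constraints(centers, radii):
    """Return g_ij = (r_i + r_j)^2 - d_ij^2 for all pairs i < j, where d_ij is
    the distance between centers i and j; the arrangement is non-overlapping
    when every g_ij <= 0. Each g_ij is non-convex in the center coordinates."""
    values = []
    for i, j in combinations(range(len(radii)), 2):
        dx = centers[i][0] - centers[j][0]
        dy = centers[i][1] - centers[j][1]
        values.append((radii[i] + radii[j]) ** 2 - (dx * dx + dy * dy))
    return values

# Two unit circles: touching at distance 2 (feasible), overlapping at distance 1.
touching = overlap_constraints([(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0])
overlapping = overlap_constraints([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])

# The number of constraints grows quadratically: 40 circles give 40*39/2 pairs.
n_pairs_40 = len(overlap_constraints([(0.0, 0.0)] * 40, [1.0] * 40))
```

The quadratic growth in the number of non-convex constraints, rather than the number of decision variables, is what drives the computational effort in such packing models.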
As an illustration, the configuration found for the case of N = 20 circles
is displayed in Fig. 4. In this example, equal consideration (weight) is given
to minimizing the radius of the circumscribed circle and the average distance
between the circle centers. As the picture shows, the circumscribed radius
is about 2.2: in fact, the numerical value found is ~2.1874712123. Detailed
results appeared and will appear in [KP04a] and [KP05], respectively.

Fig. 4. An illustrative non-uniform circle packing result for N = 20 circles with
radii $r_i$, $i = 1,\dots,N$

Let us also remark that we have attempted to solve instances of the same
circle packing problem applying the built-in Mathematica function NMinimize
for nonlinear (global) optimization, but - using it in all of its default solver
modes - it could not find a solution of acceptable quality even for the
case N = 5. Again, this is just a numerical observation, as opposed to an
"all-purpose" conclusion, to illustrate the quality of the LGO solver suite. We
have also conducted detailed numerical studies that provide a more systematic
comparison of global solvers available for use with Mathematica: these results
will appear in [KP05].
Finally, let us mention that MathOptimizer Professional is included in a
recent peer review of optimization capabilities using Mathematica ([Cog03]).

4.5 Maple Global Optimization Toolbox

The integrated computing environment Maple [M04a] enables the development
of sophisticated interactive documents that seamlessly combine technical
description, calculations, simple and advanced computing, and visualization.
Maple includes an extensive mathematical library: its more than 3,500 built-in
functions cover virtually all research areas in the scientific and technical dis-
ciplines. Maple also incorporates numerous supporting features and enhance-
ments such as e.g. detailed on-line documentation, a built-in mathematical
dictionary with definitions for more than 5000 mathematical terms, debug-
ging tools, automated (ANSI C, Fortran 77, Java, Visual Basic and MATLAB)
code generation, and document production (including HTML, MathML, TeX,
and RTF converters). All these capabilities accelerate and expand the scope
of optimization model development and solution.
To emphasize the key features pertaining to advanced systems modeling
and optimization, a concise listing of these capabilities is provided below.
Maple
• supports rapid prototyping and model development
• scales well to modeling large, complex problems
• offers context-specific "point and click" (essentially syntax-free) opera-
tions, including various "Assistants" (these are windows and dialogs that
help to execute various tasks)
• has an extensive set of built-in mathematical and computational functions
• has comprehensive symbolic calculation capabilities
• supports advanced computations with arbitrary numeric precision
• is fully programmable, thus extendable by adding new functionality
• has sophisticated visualization and animation tools
• supports the development of GUIs (by using "Maplets")
• supports advanced technical documentation, desktop publishing, and pre-
sentation
• provides links to external software products.
Maple is portable across all major hardware platforms and operating sys-
tems (Windows, Macintosh, Linux, and Unix versions). Without going into
further details that are outside of the scope of the present discussion, we refer
to the web site www.maplesoft.com that provides a wealth of further topical
information and product demos.
The core of the recently released Global Optimization Toolbox (GOT) is
a customized implementation of the LGO solver suite for Maple [M04b]. To
this end, LGO was auto-translated into C code, and then fully integrated
with Maple. The advantage of this approach is that, in principle, the GOT
can handle all of the (thousands of) functions that are defined in Maple, including
their further extensions.
As an illustrative example, let us revisit Problem 4 posted by Trefethen
[Tre02]; recall Fig. 2 from Section 1. We can easily set up this model in Maple:

> f := exp(sin(50*x1)) + sin(60*exp(x2)) + sin(70*sin(x1))
  + sin(sin(80*x2)) - sin(10*(x1+x2)) + (x1^2+x2^2)/4;

f := exp(sin(50 x1)) + sin(60 exp(x2)) + sin(70 sin(x1)) + sin(sin(80 x2))
     - sin(10 x1 + 10 x2) + 1/4 x1^2 + 1/4 x2^2

Now using the bounds [-3,3] for both variables, and applying the Global
Optimization Toolbox we receive the numerical solution:

> GlobalSolve(f, x1=-3..3, x2=-3..3, evaluationlimit=100000,
  noimprovementlimit=100000);

[-3.30686864747523535, [x1 = -0.0244030794174338178,
x2 = 0.210612427162285371]]

We can compare the optimum estimate found to the corresponding 40-digit
precision value as stated at the website
http://web.comlab.ox.ac.uk/oucl/work/nick.trefethen/hundred.html
(of Trefethen). The website provides the 40-digit numerical optimum value

-3.306868647 4752372800 7611377089 8515657166...

Hence, the solution found by the Maple GOT (using default precision settings)
is accurate to 15 digits.
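The comparison above can be reproduced outside Maple by evaluating the objective at the reported solution in ordinary double precision. The sketch below is an independent check in Python, not part of the GOT; the variable names are chosen for this example, and the reference value is the leading part of the 40-digit optimum quoted above.

```python
import math

def f(x1, x2):
    # Objective of Problem 4 in [Tre02], as set up in the Maple session above.
    return (math.exp(math.sin(50.0 * x1)) + math.sin(60.0 * math.exp(x2))
            + math.sin(70.0 * math.sin(x1)) + math.sin(math.sin(80.0 * x2))
            - math.sin(10.0 * (x1 + x2)) + (x1 ** 2 + x2 ** 2) / 4.0)

# Solution reported by GlobalSolve on the box [-3, 3] x [-3, 3].
x1_got, x2_got = -0.0244030794174338178, 0.210612427162285371
value = f(x1_got, x2_got)

# Leading digits of the 40-digit reference optimum.
reference = -3.30686864747523728
difference = abs(value - reference)

if __name__ == "__main__":
    print(value, difference)
```

Because the gradient vanishes at the optimum, small rounding errors in the solution vector have only a second-order effect on the objective value, so the double precision evaluation agrees closely with the reference.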
It is probably just as noteworthy that one can find a reasonably good
solution even in a much larger variable range, with the same solution effort:

> GlobalSolve(f, x1=-100..100, x2=-100..100, evaluationlimit=100000,
  noimprovementlimit=100000);

[-3.06433688275856530, [x1 = -0.233457978266705634e-1,
x2 = .774154819772443825]]

A partial explanation is that the shape of the objective function f is close
to quadratic, at least "from a distance". Note at the same time that the built-in
Maple local solver produces much inferior results on the larger region (and
it also misses the global optimum when using the variable bounds [-3,3], as
can be expected):

> Minimize(f, x1=-100..100, x2=-100..100);

[-.713074709310511201, [x1 = -0.223022309405313465e-1,
x2 = -0.472762143202519123e-2]]

The corresponding GOT runtimes are a little more than one second in
both cases. (Note that all such runtimes are approximate, and may vary a
bit even between consecutive test runs, depending on the machine's actual
runtime environment).
One of the advantages of using ISTCs is that one can visualize models and
verify their perceived difficulty. Fig. 5 is based on using the Maple Optimization
Plotter dialog, a feature that can be used in conjunction with the GOT:
it shows the box-constrained Trefethen model [Tre02] in the range $[-3,3]^2$;
observe also the location of the optimal solution (green dot).

Fig. 5. Problem 4 in [Tre02] solved and visualized using the Maple GOT


5 Further Applications
For over a decade, LGO has been applied in a variety of professional, as well as
academic research and educational contexts (in some 20 countries, as of 2004).
In recent years, LGO has been used to solve models in up to a few thousand
variables and constraints. The software seems to be particularly well-suited
to analyze and solve complex, sophisticated applications in advanced engi-
neering, biotechnology, econometrics, financial modeling, process industries,
medical studies, and in various other areas of scientific modeling.
Without aiming at completeness, let us refer to some recent (published)
applications and case studies that are related to the following areas:
• model calibration ([Pin03a])
• potential energy models in computational chemistry ([Pin00, Pin01b],
[SSP01])
• laser design ([IPC03])
• cancer therapy planning ([TKLPL03])
• combined finite element modeling and optimization in sonar equipment
design ([PP03])
• configuration analysis and design ([KP04b]).
Note additionally that some LGO software users develop other advanced
(but confidential) applications. Articles and numerical examples specifically
related to various LGO implementations are available from the author upon
request. The forthcoming volumes ([KP05]; [Pin05a, Pin05b]) also discuss a
large variety of GO applications, with extensive further references.

6 Conclusions
In this paper, a review of several nonlinear optimization software products
has been presented. Following the introduction of the LGO solver suite, we
have provided a brief review of several currently available implementations for
use with compiler platforms, spreadsheets, optimization modeling languages,
and ISTCs. It is our objective to add customized functionality to the existing
products, and to develop further implementations, in order to meet the needs
of a broad range of users.
Global optimization is and will remain a field of extreme numerical diffi-
culty, not only when considering "all possible" GO models, but also in prac-
tical attempts to handle complex, sizeable problems in an acceptable time-
frame. Therefore the discussion advocates a practically motivated approach
that combines rigorous global optimization strategies with efficient local search
methodology, in integrated, flexible solver suites. The illustrative - yet non-
trivial - application examples and the numerical results show the practical
merits of such an approach.
We welcome suggestions regarding future development directions, as well as
test problems, challenges, and prospective application areas.

Acknowledgements
First of all, I wish to thank my developer partners and colleagues for their
cooperation and many useful discussions, quality software, documentation,
and technical support. These partners include AMPL LLC, Frontline Systems,
the GAMS Development Corporation, Dr. Frank J. Kampas, Lahey Computer
Systems, LINDO Systems, Maplesoft, Maximal Software, Paragon Decision
Technology, TOMLAB AB, and Wolfram Research.
Several application examples reviewed or cited in this paper are based on
cooperation with colleagues: all such cooperation is gratefully acknowledged
and is reflected by the references.
In addition to professional contributions and in-kind support offered by
developer partners, the work summarized and reviewed in this paper has
received financial support from the following organizations: DRDC Atlantic
Region, Canada (Contract W7707-01-0746); the Dutch Technology Foundation
(STW Grant CWI55.3638); the Hungarian Scientific Research Fund (OTKA
Grant T 034350); Maplesoft; the National Research Council of Canada (NRC
IRAP Project 362093); the University of Ballarat, Australia; the University
of Kuopio, Finland; and the University of Tilburg, Netherlands.

References
[Ari99] Aris, R.: Mathematical Modeling: A Chemical Engineers Perspective. Aca-
demic Press, San Diego, CA (1999)
[BSS93] Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming:
Theory and Algorithms. Wiley, New York (1993)
[BS00] Benson, H.P., Sun, E.: LGO - Versatile tool for global optimization. In:
OR/MS Today, 27, 52-55 (2000)
[Ber99] Bertsekas, D.P.: Nonlinear Programming (2nd Edition). Athena Scientific,
Cambridge, MA (1999)
[BR95] Boender, C.G.E., Romeijn, H.E. Stochastic methods. In: Horst and Parda-
los (eds) Handbook of Global Optimization. Volume 1, pp. 829-869 (1995)
[BLWW04] Bornemann, P., Laurie, D., Wagon, S., Waldvogel, J.: The SIAM 100-
Digit Challenge. A Study in High-Accuracy Numerical Computing. SIAM,
Philadelphia, PA (2004)
[BM68] Bracken, J. and McCormick, G.P.: Selected Applications of Nonlinear Pro-
gramming. Wiley, New York (1968)
[BKM88] Brooke, A., Kendrick, D. and Meeraus, A.: GAMS: A User's Guide. The
Scientific Press, Redwood City, CA. (Revised versions are available from
the GAMS Corporation.) See also http://www.gams.com (1988)

[Cas90] Casti, J.L.: Searching for Certainty. Morrow & Co., New York (1990)
[Cog03] Cogan, B.: How to get the best out of optimization software. In: Scientific
Computing World, 71, 67-68 (2003)
[CK99] Corliss, G.F., Kearfott, R.B. Rigorous global search: industrial applications.
In: Csendes, T. (ed) Developments in Reliable Computing, 1-16. Kluwer
Academic Publishers, Boston/Dordrecht/London (1999)
[CFO01] Coullard, C., Fourer, R., Owen, J.H. (eds): Annals of Operations Research,
104, Special Issue on Modeling Languages and Systems. Kluwer Academic
Publishers, Boston/Dordrecht/London (2001)
[CZ01] Chong, E.K.P., Zak, S.H.: An Introduction to Optimization (2nd Edition).
Wiley, New York (2001)
[Diw03] Diwekar, U.: Introduction to Applied Optimization. Kluwer Academic Pub-
lishers, Boston/Dordrecht/London (2003)
[EHL01] Edgar, T.F., Himmelblau, D.M., Lasdon, L.S.: Optimization of Chemical
Processes (2nd Edition). McGraw-Hill, New York (2001)
[EW75] Eigen, M. and Winkler, R.: Das Spiel. Piper & Co., München (1975)
[Fou04] Fourer, R.: Nonlinear Programming Frequently Asked Questions.
Optimization Technology Center of Northwestern University and Argonne
National Laboratory,
http://www-unix.mcs.anl.gov/otc/Guide/faq/nonlinear-programming-faq.html
(2004)
[FGK93] Fourer, R., Gay, D.M., Kernighan, B.W.: AMPL - A Modeling Language
for Mathematical Programming. The Scientific Press, Redwood
City, CA (Reprinted by Boyd and Fraser, Danvers, MA, 1996. See also
http://www.ampl.com) (1993)
[FS01] Frontline Systems: Premium Solver Platform - Solver Engines. User Guide.
Frontline Systems, Inc. Incline Village, NV (See http://www.solver.com,
and http://www.solver.com/xlslgoeng.htm) (2001)
[Ger99] Gershenfeld, N.: The Nature of Mathematical Modeling. Cambridge Uni-
versity Press, Cambridge (1999)
[Gro96] Grossmann, I.E. (ed): Global Optimization in Engineering Design. Kluwer
Academic Publishers, Boston/Dordrecht/London (1996)
[HJ91] Hansen, P.E. and J0rgensen, S.E. (eds): Introduction to Environmental
Management. Elsevier, Amsterdam (1991)
[HL05] Hillier, F.J. and Lieberman, G.J. Introduction to Operations Research. (8th
Edition.) McGraw-Hill, New York (2005)
[HP95] Horst, R., Pardalos, P.M. (eds): Handbook of Global Optimization (Volume
1). Kluwer Academic Publishers, Boston/Dordrecht/London (1995)
[HT96] Horst, R., Tuy, H.: Global Optimization - Deterministic Approaches (3rd
Edition). Springer-Verlag, Berlin / Heidelberg / New York (1996)
[I04] ILOG: ILOG OPL Studio and Solver Suite, http://www.ilog.com (2004)
[IPC03] Isenor, G., Pinter, J.D., Cada, M.: A global optimization approach to laser
design. Optimization and Engineering 4, 177-196 (2003)
[Jac01] Jacob, C.: Illustrating Evolutionary Computation with Mathematica.
Morgan Kaufmann Publishers, San Francisco (2001)
[Kal04] Kallrath, J. (ed): Modeling Languages in Mathematical Optimization.
Kluwer Academic Publishers, Boston/Dordrecht/London (2004)
[KP04a] Kampas, F.J., Pinter, J.D.: Generalized circle packings: model formula-
tions and numerical results. Proceedings of the International Mathematica
Symposium (Banff, AB, Canada, August 2004)

[KP04b] Kampas, F.J., Pinter, J.D.: Configuration analysis and design by using
optimization tools in Mathematica. The Mathematica Journal (to appear)
(2004)
[KP05] Kampas, F.J., Pinter, J.D.: Advanced Optimization: Scientific, Engineering,
and Economic Applications with Mathematica Examples. Elsevier, Amster-
dam (to appear) (2005)
[Kea96] Kearfott, R.B.: Rigorous Global Search: Continuous Problems. Kluwer Aca-
demic Publishers, Boston/Dordrecht/London (1996)
[Laf00] Lafe, O.: Cellular Automata Transforms. Kluwer Academic Publishers,
Boston / Dordrecht / London (2000)
[LCS02] Lahey Computer Systems. Fortran 90 User's Guide. Lahey Computer
Systems, Inc., Incline Village, http://www.lahey.com (2002)
[LS96] LINDO Systems. Solver Suite. LINDO Systems, Inc., Chicago, IL.
http://www.lindo.com (1996)
[Man83] Mandelbrot, B.B.: The Fractal Geometry of Nature. Freeman & Co., New
York (1983)
[M04a] Maplesoft. Maple. (Current version: 9.5.) Maplesoft, Inc., Waterloo, ON.
http://www.maplesoft.com (2004)
[M04b] Maplesoft. Global Optimization Toolbox. Maplesoft, Inc. Waterloo, ON.
http://www.maplesoft.com (2004)
[MM95] Maros, I., Mitra, G. (eds): Annals of Operations Research, 58, Applied
Mathematical Programming and Modeling II (APMOD 93) J.C. Baltzer
AG, Science Publishers, Basel (1995)
[MMS97] Maros, I., Mitra, G., Sciomachen, A. (eds): Annals of Operations
Research, 81, Applied Mathematical Programming and Modeling III
(APMOD 95). J.C. Baltzer AG, Science Publishers, Basel (1997)
[MS04] Mittelmann, H.D., Spellucci, P. Decision Tree for Optimization Software.
http://plato.la.asu.edu/guide.html (2004)
[MS02] Maximal Software. MPL Modeling System. Maximal Software, Inc.
Arlington, VA. http://www.maximal-usa.com (2002)
[Mur83] Murray, J.D.: Mathematical Biology. Springer-Verlag, Berlin (1983)
[Neu04a] Neumaier, A.: Global Optimization,
http://www.mat.univie.ac.at/~neum/glopt.html (2004)
[Neu04b] Neumaier, A.: Complete search in continuous global optimization and
constraint satisfaction. In: Iserles, A. (ed) Acta Numerica 2004. Cambridge
University Press, Cambridge (2004)
[PW00] Papalambros, P.Y., Wilde, D.J.: Principles of Optimal Design. Cambridge
University Press, Cambridge (2000)
[PDT04] Paragon Decision Technology: AIMMS (Current version 3.5).
Paragon Decision Technology BV, Haarlem, The Netherlands. See
http://www.aimms.com (2004)
[PSX96] Pardalos, P.M., Shalloway, D. and Xue, G.: Global minimization of noncon-
vex energy functions: molecular conformation and protein folding. In: DI-
MACS Series, 23, American Mathematical Society, Providence, RI (1996)
[PR02] Pardalos, P.M., Romeijn, H.E. (eds): Handbook of Global Optimization.
Volume 2. Kluwer Academic Publishers, Boston/Dordrecht/London (2002)
[Pin96a] Pinter, J.D.: Global Optimization in Action. Kluwer Academic Publishers,
Boston / Dordrecht / London (1996)

[Pin96b] Pinter, J.D.: Continuous global optimization software: A brief
review. Optima, 52, 1-8 (1996) (Web version is available at:
http://plato.la.asu.edu/gom.html)
[Pin97] Pinter, J.D.: LGO - A Program System for Continuous and Lipschitz Opti-
mization. In: Bomze, I.M., Csendes, T., Horst, R. and Pardalos, P.M. (eds)
Developments in Global Optimization, 183-197. Kluwer Academic Publish-
ers, Boston/Dordrecht/London (1997)
[Pin00] Pinter, J.D.: Extremal energy models and global optimization. In:
Laguna, M., Gonzalez-Velarde, J-L. (eds) Computing Tools for Modeling,
Optimization and Simulation, 145-160. Kluwer Academic Publishers,
Boston/Dordrecht/London (2000)
[Pin01a] Pinter, J.D.: Computational Global Optimization in Nonlinear Systems.
Lionheart Publishing Inc., Atlanta, GA (2001)
[Pin01b] Pinter, J.D.: Globally optimized spherical point arrangements: model
variants and illustrative results. Annals of Operations Research 104, 213-230
(2001)
[Pin02a] Pinter, J.D.: MathOptimizer - An Advanced Modeling and Optimization
System for Mathematica Users. User Guide. Pinter Consulting
Services, Inc., Halifax, NS (2002) (For a summary, see also
http://www.wolfram.com/products/applications/mathoptimizer/)
[Pin02b] Pinter, J.D.: Global optimization: software, test problems, and applica-
tions. In: Pardalos and Romeijn (eds) Handbook of Global Optimization.
Volume 2, 515-569 (2002)
[Pin03a] Pinter, J.D.: Globally optimized calibration of nonlinear models:
techniques, software, and applications. Optimization Methods and Software,
18, 335-355 (2003)
[Pin03b] Pinter, J.D.: GAMS/LGO nonlinear solver suite: key features, usage,
and numerical performance. Submitted for publication. Downloadable at
http://www.gams.com/solvers/lgo (2003)
[Pin04] Pinter, J.D.: LGO - A Model Development System for Continuous
Global Optimization. Users Guide. (Current revision.) Pinter
Consulting Services, Inc., Halifax, NS (2004) (For a summary, see
http://www.pinterconsulting.com)
[Pin05a] Pinter, J.D.: Applied Nonlinear Optimization in Modeling Environments.
CRC Press, Boca Raton, FL (2005) (To appear)
[Pin05b] Pinter, J.D. (ed): Global Optimization - Selected Case Studies. Springer
Science + Business Media, New York (2005) (To appear)
[PHGE04] Pinter, J.D., Holmstrom, K., Goran, A.O., Edvall, M.M.: User's Guide
for TOMLAB/LGO. TOMLAB Optimization AB, Vasteras, Sweden (2004)
(See http://www.tomlab.biz)
[PK03] Pinter, J.D., Kampas, F.J.: MathOptimizer Professional - An Advanced
Modeling and Optimization System for Mathematica Users
with an External Solver Link. User Guide. Pinter Consulting Services,
Inc., Halifax, NS, Canada (2003) (For a summary, see also
http://www.wolfram.com/products/applications/mathoptpro/)
[PK05] Pinter, J.D., Kampas, F.J.: Model development and optimization with
Mathematica. In: Golden, B., Raghavan, S., Wasil, E. (eds) The Next Wave
in Computing, Optimization, and Decision Technologies, 285-302. Springer
Science + Business Media, New York (2005)

[PP03] Pinter, J.D., Purcell, C.J.: Optimization of finite element models with
MathOptimizer and ModelMaker. Lecture presented at the 2003 Mathematica
Developer Conference, Champaign, IL (2003) (Extended abstract is
available upon request, and also from http://www.library.com)
[RR93] Ratschek, H., Rokne, J.: Experiments using interval analysis for solving a
circuit design problem. Journal of Global Optimization 3, 501-518 (1993)
[RR95] Ratschek, H., Rokne, J.: Interval methods. In: Horst and Pardalos (eds)
Handbook of Global Optimization. Volume 1, 751-828 (1995)
[Ric73] Rich, L.G.: Environmental Systems Engineering. McGraw-Hill, Tokyo
(1973).
[Sch02] Schittkowski, K.: Numerical Data Fitting in Dynamical Systems. Kluwer
Academic Publishers, Boston/Dordrecht/London (2002)
[Sch91] Schroeder, M.: Fractals, Chaos, Power Laws. Freeman & Co., New York
(1991)
[Ste95] Stewart, I.: Nature's Numbers. Basic Books / Harper and Collins, New
York (1995)
[SSP01] Stortelder, W.J.H., de Swart, J.J.B., Pinter, J.D.: Finding elliptic Fekete
point sets: two numerical solution approaches. Journal of Computational
and Applied Mathematics, 130, 205-216 (2001)
[TS02] Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization
in Continuous and Mixed-integer Nonlinear Programming. Kluwer Acad-
emic Publishers, Boston/Dordrecht/London (2002)
[TKLPL03] Tervo, J., Kolmonen, P., Lyyra-Laitinen, T., Pinter, J.D., and Lahtinen,
T. An optimization-based approach to the multiple static delivery technique
in radiation therapy. Annals of Operations Research, 119, 205-227 (2003)
[TO04] TOMLAB Optimization. TOMLAB. TOMLAB Optimization AB,
Vasteras, Sweden (2004) (See http://www.tomlab.biz)
[Tre02] Trefethen, L.N.: The hundred-dollar, hundred-digit challenge problems.
SIAM News, Issue 1, p. 3 (2002)
[TM04] The MathWorks: MATLAB. (Current version: 6.5) The MathWorks, Inc.,
Natick, MA (2004) (See http://www.mathworks.com)
[VMM00] Vladimirou, H., Maros, I., Mitra, G. (eds): Annals of Operations
Research, 99, Applied Mathematical Programming and Modeling IV
(APMOD 98). J.C. Baltzer AG, Science Publishers, Basel, Switzerland (2000)
[Wol02] Wolfram, S.: A New Kind of Science. Wolfram Media, Champaign, IL, and
Cambridge University Press, Cambridge (2002)
[Wol03] Wolfram, S.: The Mathematica Book. (Fourth Edition) Wolfram Media,
Champaign, IL, and Cambridge University Press, Cambridge (2003)
[WR04] Wolfram Research: Mathematica (Current version: 5.1). Wolfram Research,
Inc., Champaign, IL (2004) (See http://www.wolfram.com)
[Zab03] Zabinsky, Z.B.: Stochastic Adaptive Search for Global Optimization.
Kluwer Academic Publishers, Boston/Dordrecht/London (2003)
Supervised Data Classification via Max-min
Separability

Adil M. Bagirov and Julien Ugon

CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
VIC 3353, Australia
a.bagirov@ballarat.edu.au, j.ugon@ballarat.edu.au

Summary. The problem of discriminating between the elements of two finite sets
of points in n-dimensional space is fundamental in supervised data classification.
In practice, it is unlikely for the two sets to be linearly separable. In this paper we
consider the problem of separating two finite sets of points by means of piecewise
linear functions. We prove that if these two sets are disjoint then they can be
separated by a piecewise linear function, and we formulate the problem of finding
the latter function as an optimization problem with an objective function containing
max-min of linear functions. The differential properties of the objective function are
studied and an algorithm for its minimization is developed. We present the results
of numerical experiments with real world data sets. These results demonstrate the
effectiveness of the proposed algorithm for separating two finite sets of points. They
also demonstrate the effectiveness of an algorithm based on the concept of max-min
separability for solving supervised data classification problems.

Key words: Supervised data classification, separability, nonconvex optimization,
nonsmooth optimization.

1 Introduction
Supervised data classification is an important area in data mining. It has
many applications in science, engineering, medicine, etc. The aim of supervised
data classification is to establish rules for the classification of some
observations assuming that the classes of data are known. To find these
rules, known training subsets of the given classes are used. During the
last decades many algorithms have been proposed and studied to solve
supervised data classification problems. One of the promising approaches to
these problems is based on mathematical programming techniques. This
approach has gained a great deal of attention in recent years; see, for example,
[AG02, Bag05, BRSY01, BRY00, BRY02, BB97, BB96, BM92, BM00,
BFM99, Bur98, CM95, Man94, Man97, Tho02, Vap95].
There are different approaches for solving supervised data classification
problems based on mathematical programming techniques. In one of them,
the classification problem is reduced to the problem of separating two finite
sets of points $A$ and $B$ in $n$-dimensional space. If
$\operatorname{co} A \cap \operatorname{co} B = \emptyset$ then these two sets are
linearly separable and there exists a hyperplane which separates them.
Linear programming techniques can be used to construct such a hyperplane.
If the convex hulls of $A$ and $B$ intersect then linear programming techniques
can be applied to obtain a hyperplane which minimizes some misclassification
measure. Algorithms based on such an approach are developed in [BB96,
BM92, CM95, Man94].
The paper [BM93] develops the concept of bilinear separability, where two
sets are separated using two hyperplanes. The problem of finding these
hyperplanes is reduced to a certain bilinear programming problem. The paper
[BM93] presents an algorithm for solving the latter problem.
In the paper [AG02] the concept of polyhedral separability was introduced.
In that paper the case when $\operatorname{co} A \cap B = \emptyset$ was
considered. The set $A$ is approximated by a polyhedral set. It is proved that
the sets $A$ and $B$ are $h$-polyhedrally separable for some $h \le |B|$, where
$|B|$ is the cardinality of the set $B$. Thus in this case the sets $A$ and $B$
can be separated by a certain piecewise linear function. The authors introduce
an error function which is a nonconvex piecewise linear function. An algorithm
for minimizing this function is proposed. The problem of the calculation of the
descent direction in this algorithm is reduced to a certain linear programming
problem.
The paper [Bag05] introduces the notion of max-min separability where
two sets are separated by a piecewise linear function. Since any piecewise
linear function can be represented as a max-min of linear functions we call it
max-min separability. This approach can be considered as a generalization of
the linear, bilinear and polyhedral separabilities.
The problem of max-min separability is reduced to a certain nonsmooth,
nonconvex optimization problem. The objective function in this problem is
represented as a sum of functions containing max-min of linear functions and
it is locally Lipschitz continuous. However, this function is not Clarke regular
and the calculation of its subgradient is a difficult task. Therefore methods of
nonsmooth optimization based on subgradient information are not appropriate
for solving max-min separability problems.
In this paper we develop an algorithm for solving max-min separability
problems which uses only values of the objective function. This algorithm
calculates a descent direction by evaluating the so-called discrete gradient of the
objective function. The form of the objective function allows us to significantly
reduce the number of its evaluations during the computation of the discrete
gradient. This is very important because each evaluation of the objective
function for large data sets is expensive.
We carried out some numerical experiments using large scale data sets.
We present their results and discuss them.
The structure of this paper is as follows. Section 2 provides some
preliminaries. In Section 3 the definition and some results related to the max-min
separability are given. An algorithm for solving max-min separability
problems is discussed in Section 4. Results of numerical experiments are presented
in Section 5. Section 6 concludes the paper.

2 Preliminaries
In this section we present a brief review of the concepts of linear, bilinear and
polyhedral separability.

2.1 Linear separability

Let $A$ and $B$ be given sets containing $m$ and $p$ $n$-dimensional vectors,
respectively:
$$A = \{a^1, \ldots, a^m\}, \quad a^i \in \mathbb{R}^n, \quad i = 1, \ldots, m,$$
$$B = \{b^1, \ldots, b^p\}, \quad b^j \in \mathbb{R}^n, \quad j = 1, \ldots, p.$$
The sets $A$ and $B$ are linearly separable if there exists a hyperplane $\{x, y\}$,
with $x \in \mathbb{R}^n$, $y \in \mathbb{R}^1$ such that
1) for any $j = 1, \ldots, m$
$$\langle x, a^j \rangle - y < 0,$$
2) for any $k = 1, \ldots, p$
$$\langle x, b^k \rangle - y > 0.$$
The sets $A$ and $B$ are linearly separable if and only if
$\operatorname{co} A \cap \operatorname{co} B = \emptyset$.
In practice, it is unlikely for the two sets to be linearly separable. Therefore
it is important to find a hyperplane which minimizes some misclassification
cost. In the paper [BM92] the problem of finding this hyperplane is formulated
as the following optimization problem:

$$\text{minimize } f(x,y) \quad \text{subject to } (x,y) \in \mathbb{R}^{n+1} \qquad (1)$$

where

$$f(x,y) = \frac{1}{m} \sum_{i=1}^{m} \max\left(0, \langle x, a^i \rangle - y + 1\right) + \frac{1}{p} \sum_{j=1}^{p} \max\left(0, -\langle x, b^j \rangle + y + 1\right)$$

is an error function. Here $\langle \cdot, \cdot \rangle$ stands for the scalar product
in $\mathbb{R}^n$. The authors describe an algorithm for solving problem (1). They
show that problem (1) is equivalent to the following linear program:

$$\text{minimize } \frac{1}{m} \sum_{i=1}^{m} t_i + \frac{1}{p} \sum_{j=1}^{p} z_j$$

subject to
$$t_i \ge \langle x, a^i \rangle - y + 1, \quad i = 1, \ldots, m,$$
$$z_j \ge -\langle x, b^j \rangle + y + 1, \quad j = 1, \ldots, p,$$
$$t \ge 0, \quad z \ge 0,$$
where $t_i$ is nonnegative and represents the error for the point $a^i \in A$ and $z_j$
is nonnegative and represents the error for the point $b^j \in B$.
The sets $A$ and $B$ are linearly separable if and only if $f^* = f(x^*, y^*) = 0$,
where $(x^*, y^*)$ is the solution to problem (1). It is proved that the trivial
solution $x = 0$ cannot occur.
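To make the error function of problem (1) concrete, the following Python sketch evaluates $f(x, y)$ for a given hyperplane. The helper names `dot` and `linear_error` and the toy data are hypothetical and introduced only for illustration; the sketch evaluates the error and does not solve the linear program of [BM92].

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_error(x, y, A, B):
    """Error function f(x, y) of problem (1) for the hyperplane {x, y}."""
    # Average violation of <x, a> - y <= -1 over the points of A.
    err_a = sum(max(0.0, dot(x, a) - y + 1.0) for a in A) / len(A)
    # Average violation of <x, b> - y >= 1 over the points of B.
    err_b = sum(max(0.0, -dot(x, b) + y + 1.0) for b in B) / len(B)
    return err_a + err_b

# Hypothetical toy data: A lies to the left of B along the first coordinate.
A = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.5)]
B = [(3.0, 0.0), (3.0, 1.0), (4.0, 0.5)]

good = linear_error((1.0, 0.0), 2.0, A, B)   # hyperplane z1 = 2 separates A and B
poor = linear_error((1.0, 0.0), 0.5, A, B)   # hyperplane z1 = 0.5 cuts through A
print(good, poor)
```

For a fixed hyperplane, the smallest feasible values of $t_i$ and $z_j$ in the linear program coincide with the individual `max(0, ...)` terms, which is why the two formulations have the same optimal value.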

2.2 Bilinear separability

The concept of bilinear separability was introduced in [BM93]. In this
approach two sets are separated using two hyperplanes. We again assume that
$A$ and $B$ are given sets containing $m$ and $p$ $n$-dimensional vectors, respectively.
Definition 1. (see [BM93]). The sets $A$ and $B$ are bilinearly separable if and
only if there exist two hyperplanes $(x^1, y_1)$ and $(x^2, y_2)$ such that at least one
of the following conditions holds:
1. For any $j = 1, \ldots, m$
$$\langle x^l, a^j \rangle - y_l < 0, \quad l = 1, 2,$$
and for any $k = 1, \ldots, p$ there exists $l \in \{1, 2\}$ such that
$$\langle x^l, b^k \rangle - y_l > 0.$$
2. For any $k = 1, \ldots, p$
$$\langle x^l, b^k \rangle - y_l < 0, \quad l = 1, 2,$$
and for any $j = 1, \ldots, m$ there exists $l \in \{1, 2\}$ such that
$$\langle x^l, a^j \rangle - y_l > 0.$$
3. For any $j = 1, \ldots, m$ either
$$\langle x^l, a^j \rangle - y_l < 0, \quad l = 1, 2,$$
or
$$\langle -x^l, a^j \rangle + y_l < 0, \quad l = 1, 2,$$
and for any $k = 1, \ldots, p$ either
$$\langle x^1, b^k \rangle - y_1 < 0, \quad \langle -x^2, b^k \rangle + y_2 < 0$$
or
$$\langle -x^1, b^k \rangle + y_1 < 0, \quad \langle x^2, b^k \rangle - y_2 < 0.$$
We reformulate Definition 1 using max and min statements.

Definition 2. The sets $A$ and $B$ are bilinearly separable if and only if there exist
two hyperplanes $(x^1, y_1)$ and $(x^2, y_2)$ such that at least one of the following
conditions holds:
1. For any $j = 1, \ldots, m$
$$\max_{l=1,2} \{\langle x^l, a^j \rangle - y_l\} < 0$$
and for any $k = 1, \ldots, p$
$$\max_{l=1,2} \{\langle x^l, b^k \rangle - y_l\} > 0.$$
2. For any $k = 1, \ldots, p$
$$\max_{l=1,2} \{\langle x^l, b^k \rangle - y_l\} < 0$$
and for any $j = 1, \ldots, m$
$$\max_{l=1,2} \{\langle x^l, a^j \rangle - y_l\} > 0.$$
3. For any $j = 1, \ldots, m$,
$$\max\left[\min\{\langle x^1, a^j \rangle - y_1, -\langle x^2, a^j \rangle + y_2\}, \min\{-\langle x^1, a^j \rangle + y_1, \langle x^2, a^j \rangle - y_2\}\right] < 0$$
and for any $k = 1, \ldots, p$,
$$\max\left[\min\{\langle x^1, b^k \rangle - y_1, -\langle x^2, b^k \rangle + y_2\}, \min\{-\langle x^1, b^k \rangle + y_1, \langle x^2, b^k \rangle - y_2\}\right] > 0.$$

The problem of bilinear separability is reduced to a certain bilinear
programming problem and the paper [BM93] presents an algorithm for its solution.
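Condition 3 of Definition 2 covers configurations of the "exclusive or" type, which no single hyperplane can separate. The following Python sketch evaluates the max-min expression of that condition on hypothetical toy data; the function name `bilinear_value` and the chosen hyperplanes are assumptions made for illustration and do not come from [BM93].

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def bilinear_value(x1, y1, x2, y2, z):
    """Max-min expression from condition 3 of Definition 2 at the point z."""
    u1 = dot(x1, z) - y1
    u2 = dot(x2, z) - y2
    return max(min(u1, -u2), min(-u1, u2))

# Hypothetical XOR-type data: A occupies two opposite quadrants, B the other two.
A = [(1.0, 1.0), (-1.0, -1.0)]
B = [(1.0, -1.0), (-1.0, 1.0)]

# Two hyperplanes: the coordinate axes z1 = 0 and z2 = 0.
x1, y1 = (1.0, 0.0), 0.0
x2, y2 = (0.0, 1.0), 0.0

values_a = [bilinear_value(x1, y1, x2, y2, a) for a in A]
values_b = [bilinear_value(x1, y1, x2, y2, b) for b in B]
print(values_a, values_b)
```

The expression is negative exactly when both hyperplane functions have the same strict sign at the point, and positive when their signs differ, which matches the two alternatives listed in condition 3 of Definition 1.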

2.3 Polyhedral separability

The concept of $h$-polyhedral separability was developed in [AG02]. The sets
$A$ and $B$ are $h$-polyhedrally separable if there exists a set of $h$ hyperplanes
$\{x^i, y_i\}$, with
$$x^i \in \mathbb{R}^n, \quad y_i \in \mathbb{R}^1, \quad i = 1, \ldots, h$$
such that
1) for any $j = 1, \ldots, m$ and $i = 1, \ldots, h$
$$\langle x^i, a^j \rangle - y_i < 0,$$
2) for any $k = 1, \ldots, p$ there exists at least one $i \in \{1, \ldots, h\}$ such that
$$\langle x^i, b^k \rangle - y_i > 0.$$

It is proved in [AG02] that the sets $A$ and $B$ are $h$-polyhedrally separable, for
some $h \le p$, if and only if
$$\operatorname{co} A \cap B = \emptyset.$$
Figure 1 presents one example of polyhedral separability.
The problem of polyhedral separability of the sets $A$ and $B$ is reduced to
the following problem:

$$\text{minimize } f(x,y) \quad \text{subject to } (x,y) \in \mathbb{R}^{(n+1) \times h} \qquad (2)$$

where

$$f(x,y) = \frac{1}{m} \sum_{j=1}^{m} \max\left[0, \max_{1 \le i \le h} \{\langle x^i, a^j \rangle - y_i + 1\}\right] + \frac{1}{p} \sum_{k=1}^{p} \max\left[0, \min_{1 \le i \le h} \{-\langle x^i, b^k \rangle + y_i + 1\}\right]$$

is an error function. Note that this function is a nonconvex piecewise linear
function. It is proved that $x^i = 0$, $i = 1, \ldots, h$ cannot be the optimal solution.
Let $\{\bar{x}^i, \bar{y}_i\}$, $i = 1, \ldots, h$ be a global solution to problem (2). The sets $A$
and $B$ are $h$-polyhedrally separable if and only if $f(\bar{x}, \bar{y}) = 0$. If there exists a
nonempty set $\bar{I} \subset \{1, \ldots, h\}$ such that $\bar{x}^i = 0$, $i \in \bar{I}$, then the sets $A$ and $B$ are
$(h - |\bar{I}|)$-polyhedrally separable. In [AG02] an algorithm for solving problem
(2) is developed. The calculation of the descent direction at each iteration of
this algorithm is reduced to a certain linear programming problem.
The advantage of this technique is that it does not restrict the search
to only a convex polyhedron, and thus allows both the sets A and B to be
nonconvex. One disadvantage, however, is that it only considers the sets sep-
arately.
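The error function of problem (2) can be evaluated directly from its definition. The Python sketch below does so for hypothetical toy data in which the points of $B$ surround the points of $A$; the names `dot` and `polyhedral_error` are illustrative assumptions, and the sketch only evaluates the error for given hyperplanes rather than implementing the algorithm of [AG02].

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def polyhedral_error(X, Y, A, B):
    """Error function of problem (2); X holds the normals x^i, Y the offsets y_i."""
    # Points of A must satisfy all h inequalities <x^i, a> - y_i <= -1.
    err_a = sum(
        max(0.0, max(dot(x, a) - y + 1.0 for x, y in zip(X, Y))) for a in A
    ) / len(A)
    # Points of B must satisfy at least one inequality <x^i, b> - y_i >= 1.
    err_b = sum(
        max(0.0, min(-dot(x, b) + y + 1.0 for x, y in zip(X, Y))) for b in B
    ) / len(B)
    return err_a + err_b

# Hypothetical toy data: B surrounds A, so no single hyperplane separates them.
A = [(0.0, 0.0), (0.4, -0.3)]
B = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]

# Four hyperplanes forming a box around A (h = 4).
X4 = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
Y4 = [1.5, 1.5, 1.5, 1.5]

err_box = polyhedral_error(X4, Y4, A, B)           # all points correctly placed
err_one = polyhedral_error(X4[:1], Y4[:1], A, B)   # a single hyperplane (h = 1)
print(err_box, err_one)
```

With $h = 1$ the function reduces to the error function of problem (1), so the second value also shows how much a single hyperplane loses on this data.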

3 Max-min separability
In many practical applications two sets are not linearly, bilinearly or
polyhedrally separable. Figure 2 presents one such case. In this case two sets are
separable with a more complicated piecewise linear function.

In this section we describe the concept of max-min separability and introduce
an error function (see [Bag05]).

Fig. 1. Polyhedral separability.

Fig. 2. The sets A and B are separated by a piecewise linear function.

3.1 Definition and properties

Let $H = \{h_1, \ldots, h_l\}$, where $h_j = \{x^j, y_j\}$, $j = 1, \ldots, l$, with
$x^j \in \mathbb{R}^n$, $y_j \in \mathbb{R}^1$, be a finite set of hyperplanes. Let
$J = \{1, \ldots, l\}$. Consider any partition of this set $J^r = \{J_1, \ldots, J_r\}$ such that

$$J_k \ne \emptyset, \quad k = 1, \ldots, r, \qquad J_k \cap J_j = \emptyset, \quad k \ne j, \qquad \bigcup_{k=1}^{r} J_k = J.$$

Let $I = \{1, \ldots, r\}$. A particular partition $J^r = \{J_1, \ldots, J_r\}$ of the set $J$
defines the following max-min-type function:
$$\varphi(z) = \max_{i \in I} \min_{j \in J_i} \{\langle x^j, z \rangle - y_j\}, \quad z \in \mathbb{R}^n. \qquad (3)$$

In Figure 3 two sets are max-min separable.


Let $A, B \subset \mathbb{R}^n$ be given disjoint sets, that is $A \cap B = \emptyset$.
Definition 3. The sets $A$ and $B$ are max-min separable if there exist a finite
number of hyperplanes $\{x^j, y_j\}$ with $x^j \in \mathbb{R}^n$, $y_j \in \mathbb{R}^1$,
$j \in J = \{1, \ldots, l\}$ and a partition $J^r = \{J_1, \ldots, J_r\}$ of the set $J$ such that
1) for all $i \in I$ and $a \in A$
$$\min_{j \in J_i} \{\langle x^j, a \rangle - y_j\} < 0;$$
2) for any $b \in B$ there exists at least one $i \in I$ such that
$$\min_{j \in J_i} \{\langle x^j, b \rangle - y_j\} > 0.$$
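The max-min function (3) and the test of Definition 3 translate directly into code. The following Python sketch is a minimal illustration under stated assumptions: the names `phi` and `is_max_min_separated`, the toy data, and the hyperplanes are all hypothetical, and indices are 0-based in the code while the text uses 1-based indices.

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(z, hyperplanes, partition):
    """Function (3): max over blocks J_i of the min over j in J_i of <x^j, z> - y_j."""
    return max(
        min(dot(hyperplanes[j][0], z) - hyperplanes[j][1] for j in block)
        for block in partition
    )

def is_max_min_separated(A, B, hyperplanes, partition):
    """Definition 3 expressed through phi: phi < 0 on A and phi > 0 on B."""
    return (all(phi(a, hyperplanes, partition) < 0 for a in A)
            and all(phi(b, hyperplanes, partition) > 0 for b in B))

# Hypothetical toy data; the convex hulls of A and B share the point (0, 1),
# so the two sets are not linearly separable.
A = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
B = [(2.0, 2.0), (-2.0, 0.0), (3.0, 1.5)]

# Hyperplanes as pairs (x^j, y_j).
hyperplanes = [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((-1.0, 0.0), 1.0)]
partition = [[0, 1], [2]]   # J_1 = {1, 2}, J_2 = {3} in the notation of the text

separated = is_max_min_separated(A, B, hyperplanes, partition)
print(separated)
```

Conditions 1) and 2) of Definition 3 are checked through the sign of `phi` because "the minimum is negative for every block" is the same as "the maximum over blocks is negative", and "the minimum is positive for some block" is the same as "the maximum over blocks is positive".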

Remark 1. It follows from Definition 3 that if the sets $A$ and $B$ are max-min
separable then $\varphi(a) < 0$ for any $a \in A$ and $\varphi(b) > 0$ for any $b \in B$, where the
function $\varphi$ is defined by (3). Thus the sets $A$ and $B$ can be separated by a
function represented as a max-min of linear functions. Therefore this kind of
separability is called max-min separability.
Remark 2. Linear and polyhedral separability can be considered as particular
cases of max-min separability. If $I = \{1\}$ and $J_1 = \{1\}$ then we have
linear separability, and if $I = \{1, \ldots, h\}$ and $J_i = \{i\}$, $i \in I$, we obtain
$h$-polyhedral separability.
Remark 3. Bilinear separability can also be considered as a particular case of
max-min separability. It follows from Definition 2 that the bilinear separability
of two sets $A$ and $B$ coincides with one of the following cases:
1. The sets $A$ and $B$ are 2-polyhedrally separable and $\operatorname{co} A \cap B = \emptyset$;
2. The sets $A$ and $B$ are 2-polyhedrally separable and $\operatorname{co} B \cap A = \emptyset$;
3. The sets $A$ and $B$ are max-min separable with the following hyperplanes:

$$\{(x^1, y_1), (-x^1, -y_1), (x^2, y_2), (-x^2, -y_2)\}.$$

In this case $I = \{1, 2\}$ and $J_1 = \{1, 4\}$, $J_2 = \{2, 3\}$. Thus bilinearly
separable sets are also max-min separable.
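The special cases in Remarks 2 and 3 differ only in the choice of partition. The Python sketch below evaluates the max-min function (3) for the three partitions named in the remarks; all data, hyperplanes, and helper names are hypothetical, and the code uses 0-based indices where the text uses 1-based ones.

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(z, hyperplanes, partition):
    """Function (3): max over blocks of the min of <x^j, z> - y_j within a block."""
    return max(
        min(dot(hyperplanes[j][0], z) - hyperplanes[j][1] for j in block)
        for block in partition
    )

# Linear separability: I = {1}, J_1 = {1}; phi is a single linear function.
linear = [((1.0, 0.0), 2.0)]
lin_values = (phi((0.0, 0.0), linear, [[0]]), phi((3.0, 0.0), linear, [[0]]))

# h-polyhedral separability: J_i = {i}; phi is a maximum of h linear functions.
box = [((1.0, 0.0), 1.5), ((-1.0, 0.0), 1.5), ((0.0, 1.0), 1.5), ((0.0, -1.0), 1.5)]
singletons = [[0], [1], [2], [3]]
poly_values = (phi((0.0, 0.0), box, singletons), phi((0.0, -3.0), box, singletons))

# Bilinear separability (case 3 of Remark 3): hyperplanes
# (x^1, y_1), (-x^1, -y_1), (x^2, y_2), (-x^2, -y_2) with J_1 = {1, 4}, J_2 = {2, 3}.
four = [((1.0, 0.0), 0.0), ((-1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((0.0, -1.0), 0.0)]
pairs = [[0, 3], [1, 2]]
bil_values = (phi((1.0, 1.0), four, pairs), phi((1.0, -1.0), four, pairs))

print(lin_values, poly_values, bil_values)
```

In each pair the first point plays the role of an element of $A$ (negative value) and the second the role of an element of $B$ (positive value).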

Fig. 3. Max-min separability.

Proposition 1. (see [Bag05]). The sets $A$ and $B$ are max-min separable if and only if there exists a set of hyperplanes $\{x^j, y_j\}$ with $x^j \in \mathbb{R}^n$, $y_j \in \mathbb{R}^1$, $j \in J$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$ such that
1) for any $i \in I$ and $a \in A$
$$\min_{j \in J_i} \{\langle x^j, a\rangle - y_j\} \le -1;$$
2) for any $b \in B$ there exists at least one $i \in I$ such that
$$\min_{j \in J_i} \{\langle x^j, b\rangle - y_j\} \ge 1.$$

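Proof. Sufficiency is straightforward.

Necessity. Since $A$ and $B$ are max-min separable there exists a set of hyperplanes $\{x^j, y_j\}$ with $x^j \in \mathbb{R}^n$, $y_j \in \mathbb{R}^1$, $j \in J$, a partition $J^r$ of the set $J$ and numbers $\delta_1 > 0$, $\delta_2 > 0$ such that
$$\max_{a \in A} \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j\} = -\delta_1$$
and
$$\min_{b \in B} \max_{i \in I} \min_{j \in J_i} \{\langle x^j, b\rangle - y_j\} = \delta_2.$$
We put $\delta = \min\{\delta_1, \delta_2\} > 0$. Then we have
$$\max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j\} \le -\delta, \quad \forall a \in A, \qquad (4)$$
$$\max_{i \in I} \min_{j \in J_i} \{\langle x^j, b\rangle - y_j\} \ge \delta, \quad \forall b \in B. \qquad (5)$$
We consider the new set of hyperplanes $\{\bar x^j, \bar y_j\}$ with $\bar x^j \in \mathbb{R}^n$, $\bar y_j \in \mathbb{R}^1$, $j \in J$, defined as follows:
$$\bar x^j = x^j / \delta, \quad \bar y_j = y_j / \delta, \quad j \in J.$$
Then it follows from (4) and (5) that
$$\max_{i \in I} \min_{j \in J_i} \{\langle \bar x^j, a\rangle - \bar y_j\} \le -1, \quad \forall a \in A,$$
$$\max_{i \in I} \min_{j \in J_i} \{\langle \bar x^j, b\rangle - \bar y_j\} \ge 1, \quad \forall b \in B,$$
which completes the proof. □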

Proposition 2. (see [Bag05]). The sets $A$ and $B$ are max-min separable if and only if there exists a piecewise linear function separating them.

Proof. Since a max-min of linear functions is a piecewise linear function, the necessity is straightforward.
Sufficiency. It is known that any piecewise linear function can be represented as a max-min of linear functions of the form (3) (see [BKS95]). Then there exists a max-min of linear functions that separates the sets $A$ and $B$, which in turn means that these sets are max-min separable. □

Remark 4. It follows from Proposition 2 that the notions of max-min and piecewise linear separability are equivalent.

Proposition 3. (see [Bag05]). Assume that the set $A$ can be represented as a union of sets $A_i$, $i = 1,\ldots,q$:
$$A = \bigcup_{i=1}^{q} A_i$$
and for any $i = 1,\ldots,q$
$$B \cap \operatorname{co} A_i = \emptyset. \qquad (6)$$
Then the sets $A$ and $B$ are max-min separable.

Proof. It follows from (6) that $b \notin \operatorname{co} A_i$ for all $b \in B$ and $i \in \{1,\ldots,q\}$. Then, for each $b \in B$ and $i \in \{1,\ldots,q\}$ there exists a hyperplane $\{x^i(b), y_i(b)\}$ separating $b$ from the set $\operatorname{co} A_i$, that is
$$\langle x^i(b), b\rangle - y_i(b) > 0,$$
$$\langle x^i(b), a\rangle - y_i(b) < 0, \quad \forall a \in \operatorname{co} A_i, \ i = 1,\ldots,q.$$
Then we have
$$\min_{i=1,\ldots,q} \{\langle x^i(b), b\rangle - y_i(b)\} > 0$$
and
$$\min_{i=1,\ldots,q} \{\langle x^i(b), a\rangle - y_i(b)\} < 0, \quad \forall a \in A.$$
Thus we obtain that for any $b^j \in B$, $j = 1,\ldots,p$ there exists a set of $q$ hyperplanes $\{x^i(b^j), y_i(b^j)\}$, $i = 1,\ldots,q$ such that
$$\min_{i=1,\ldots,q} \{\langle x^i(b^j), b^j\rangle - y_i(b^j)\} > 0 \qquad (7)$$
and
$$\min_{i=1,\ldots,q} \{\langle x^i(b^j), a\rangle - y_i(b^j)\} < 0, \quad \forall a \in A. \qquad (8)$$
Consequently we have $pq$ hyperplanes
$$\{x^i(b^j), y_i(b^j)\}, \quad i = 1,\ldots,q, \ j = 1,\ldots,p.$$
The set of these hyperplanes can be rewritten as follows:
$$H = \{h_1,\ldots,h_l\}, \quad h_{i+(j-1)q} = \{x^i(b^j), y_i(b^j)\}, \quad i = 1,\ldots,q, \ j = 1,\ldots,p, \ l = pq.$$
Let $J = \{1,\ldots,l\}$, $I = \{1,\ldots,p\}$ and
$$x^{i+(j-1)q} = x^i(b^j), \quad y_{i+(j-1)q} = y_i(b^j), \quad i = 1,\ldots,q, \ j = 1,\ldots,p.$$
Consider the following partition of the set $J$:
$$J^p = \{J_1,\ldots,J_p\}, \quad J_k = \{(k-1)q+1,\ldots,kq\}, \quad k = 1,\ldots,p.$$
It follows from (7) and (8) that for all $k \in I$ and $a \in A$
$$\min_{j \in J_k} \{\langle x^j, a\rangle - y_j\} < 0$$
and for any $b \in B$ there exists at least one $k \in I$ such that
$$\min_{j \in J_k} \{\langle x^j, b\rangle - y_j\} > 0,$$
which means that the sets $A$ and $B$ are max-min separable. □


Corollary 1. (see [Bag05]). The sets $A$ and $B$ are max-min separable if and only if they are disjoint: $A \cap B = \emptyset$.
Proof. Necessity is straightforward.
Sufficiency. The set $A$ can be represented as a union of its own points. Since the sets $A$ and $B$ are disjoint the condition (6) is satisfied. Then the proof of the corollary follows from Proposition 3. □
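The construction used in the proofs of Proposition 3 and Corollary 1 can be sketched in code for finite sets. The sketch below assumes that each $A_i$ is a single point, and it picks one particular separating hyperplane for each pair (the perpendicular bisector of the segment joining $a$ and $b$); that choice, the function names and the 0-based indices are assumptions made for illustration only.

```python
import numpy as np

def pointwise_separator(A, B):
    """Build p*q hyperplanes and the partition J_1, ..., J_p for finite disjoint A, B.

    For each b in B (one group J_k) and each a in A we use the bisector hyperplane
    <b - a, x> = <b - a, (a + b) / 2>, which gives a negative value at a and a
    positive value at b.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    normals, offsets, partition = [], [], []
    for b in B:
        group = []
        for a in A:
            normal = b - a
            group.append(len(normals))
            normals.append(normal)
            offsets.append(normal @ (a + b) / 2.0)
        partition.append(group)
    return np.array(normals), np.array(offsets), partition

def max_min_value(X, y, partition, point):
    """max over groups of min over the group of <x^j, point> - y_j."""
    values = X @ point - y
    return max(min(values[j] for j in group) for group in partition)

# "Chessboard" data that no single hyperplane can separate.
A = [(0.0, 0.0), (1.0, 1.0)]
B = [(0.0, 1.0), (1.0, 0.0)]
X, y, partition = pointwise_separator(A, B)
phi_A = [max_min_value(X, y, partition, np.array(a)) for a in A]
phi_B = [max_min_value(X, y, partition, np.array(b)) for b in B]
```

This uses $p \cdot q$ hyperplanes, which is the worst case; Proposition 4 below explains why far fewer are usually enough.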
In the next proposition we show that in most cases the number of hyperplanes necessary for the max-min separation of the sets $A$ and $B$ is limited.

Proposition 4. (see [Bag05]). Assume that the set $A$ can be represented as a union of sets $A_i$, $i = 1,\ldots,q$ and the set $B$ as a union of sets $B_j$, $j = 1,\ldots,d$ such that
$$A = \bigcup_{i=1}^{q} A_i, \quad B = \bigcup_{j=1}^{d} B_j$$
and
$$\operatorname{co} A_i \cap \operatorname{co} B_j = \emptyset \quad \text{for all } i = 1,\ldots,q, \ j = 1,\ldots,d. \qquad (9)$$
Then the number of hyperplanes necessary for the separation of the sets $A$ and $B$ is at most $q \cdot d$.

Proof. Let $i \in \{1,\ldots,q\}$ and $j \in \{1,\ldots,d\}$ be any fixed indices. Since $\operatorname{co} A_i \cap \operatorname{co} B_j = \emptyset$ there exists a hyperplane $\{x^{ij}, y_{ij}\}$ with $x^{ij} \in \mathbb{R}^n$, $y_{ij} \in \mathbb{R}^1$ such that
$$\langle x^{ij}, a\rangle - y_{ij} < 0 \quad \forall a \in \operatorname{co} A_i$$
and
$$\langle x^{ij}, b\rangle - y_{ij} > 0 \quad \forall b \in \operatorname{co} B_j.$$
Consequently for any $j \in \{1,\ldots,d\}$ there exists a set of hyperplanes $\{x^{ij}, y_{ij}\}$, $i = 1,\ldots,q$ such that
$$\min_{i=1,\ldots,q} \{\langle x^{ij}, b\rangle - y_{ij}\} > 0, \quad \forall b \in B_j \qquad (10)$$
and
$$\min_{i=1,\ldots,q} \{\langle x^{ij}, a\rangle - y_{ij}\} < 0, \quad \forall a \in A. \qquad (11)$$
Thus we get a system of $l = dq$ hyperplanes:
$$H = \{h_1,\ldots,h_l\}$$
where $h_{i+(j-1)q} = \{x^{ij}, y_{ij}\}$, $i = 1,\ldots,q$, $j = 1,\ldots,d$. Let $J = \{1,\ldots,l\}$, $I = \{1,\ldots,d\}$ and
$$x^{i+(j-1)q} = x^{ij}, \quad y_{i+(j-1)q} = y_{ij}, \quad i = 1,\ldots,q, \ j = 1,\ldots,d.$$
Consider the following partition of the set $J$:
$$J^d = \{J_1,\ldots,J_d\}, \quad J_k = \{(k-1)q+1,\ldots,kq\}, \quad k = 1,\ldots,d.$$
It follows from (10) and (11) that for all $k \in I$ and $a \in A$
$$\min_{j \in J_k} \{\langle x^j, a\rangle - y_j\} < 0$$
and for any $b \in B$ there exists at least one $k \in I$ such that
$$\min_{j \in J_k} \{\langle x^j, b\rangle - y_j\} > 0,$$
that is, the sets $A$ and $B$ are max-min separable with at most $q \cdot d$ hyperplanes. □

Remark 5. The number of hyperplanes necessary is large only when the sets $A_i$ and $B_j$ contain a very small number of points. This situation appears only in the particular case where the distribution of the points is like a "chessboard".

3.2 Error function

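Given any set of hyperplanes $\{x^j, y_j\}$, $j \in J = \{1,\ldots,l\}$ with $x^j \in \mathbb{R}^n$, $y_j \in \mathbb{R}^1$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$, we say that a point $a \in A$ is well separated from the set $B$ if the following condition is satisfied:
$$\max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j\} + 1 \le 0.$$
Then we can define the separation error for a point $a \in A$ as follows:
$$\max\Big[0, \ \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j + 1\}\Big]. \qquad (12)$$
Analogously, a point $b \in B$ is said to be well separated from the set $A$ if the following condition is satisfied:
$$\min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j\} + 1 \le 0.$$
Then the separation error for a point $b \in B$ can be written as
$$\max\Big[0, \ \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\}\Big]. \qquad (13)$$
Thus, an averaged error function can be defined as
$$f(x,y) = \frac{1}{m} \sum_{k=1}^{m} \max\Big[0, \ \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a^k\rangle - y_j + 1\}\Big] + \frac{1}{p} \sum_{t=1}^{p} \max\Big[0, \ \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b^t\rangle + y_j + 1\}\Big] \qquad (14)$$
where $x = (x^1,\ldots,x^l) \in \mathbb{R}^{l \times n}$, $y = (y_1,\ldots,y_l) \in \mathbb{R}^l$. It is clear that $f(x,y) \ge 0$ for all $x \in \mathbb{R}^{l \times n}$ and $y \in \mathbb{R}^l$.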
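The averaged error function (14) is straightforward to evaluate with array operations. The sketch below is one possible vectorised form; the function name `error_function`, the array layout (rows of `X` are the normals $x^j$) and the 0-based index groups are assumptions chosen for illustration.

```python
import numpy as np

def error_function(X, y, partition, A, B):
    """Averaged separation error (14) for hyperplanes (X, y) and a partition of their indices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # score_a[k, j] = <x^j, a^k> - y_j + 1 and score_b[t, j] = -<x^j, b^t> + y_j + 1
    score_a = A @ X.T - y + 1.0
    score_b = -(B @ X.T) + y + 1.0
    # max over groups of the within-group min (points of A), and the mirrored min-max (points of B)
    phi1 = np.max([score_a[:, group].min(axis=1) for group in partition], axis=0)
    phi2 = np.min([score_b[:, group].max(axis=1) for group in partition], axis=0)
    return np.maximum(0.0, phi1).mean() + np.maximum(0.0, phi2).mean()

X = np.array([[1.0, 0.0], [0.0, 1.0]])
partition = [[0, 1]]
A = [(-1.0, 2.0), (2.0, -1.0), (-1.0, -1.0)]
B = [(1.0, 1.0), (2.0, 3.0)]
f_separating = error_function(X, np.zeros(2), partition, A, B)
f_zero_normals = error_function(np.zeros((2, 2)), np.zeros(2), partition, A, B)
```

For this toy data the first value is zero, because every point is well separated, while setting all normals to zero gives an error of 2.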
Proposition 5. (see [Bag05]). The sets $A$ and $B$ are max-min separable if and only if there exists a set of hyperplanes $\{x^j, y_j\}$, $j \in J = \{1,\ldots,l\}$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$ such that $f(x,y) = 0$.

Proof. Necessity. Assume that the sets $A$ and $B$ are max-min separable. Then it follows from Proposition 1 that there exists a set of hyperplanes $\{x^j, y_j\}$, $j \in J$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$ such that
$$\min_{j \in J_i} \{\langle x^j, a\rangle - y_j\} \le -1, \quad \forall a \in A, \ i \in I = \{1,\ldots,r\} \qquad (15)$$
and for any $b \in B$ there exists at least one $t \in I$ such that
$$\min_{j \in J_t} \{\langle x^j, b\rangle - y_j\} \ge 1. \qquad (16)$$
Consequently we have
$$\max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0, \quad \forall a \in A,$$
$$\min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0, \quad \forall b \in B.$$
Then from the definition of the error function we obtain that $f(x,y) = 0$.
Sufficiency. Assume that there exist a set of hyperplanes $\{x^j, y_j\}$, $j \in J = \{1,\ldots,l\}$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$ such that $f(x,y) = 0$. Then from the definition of the error function $f$ we immediately get that the inequalities (15) and (16) are satisfied, that is, the sets $A$ and $B$ are max-min separable. □

Proposition 6. (see [Bag05]). Assume that the sets $A$ and $B$ are max-min separable with a set of hyperplanes $\{x^j, y_j\}$, $j \in J = \{1,\ldots,l\}$ and a partition $J^r = \{J_1,\ldots,J_r\}$ of the set $J$. Then
1) $x^j = 0$, $j \in J$ cannot be an optimal solution;
2) if
(a) for any $t \in I$ there exists at least one $b \in B$ such that
$$\max_{j \in J_t} \{-\langle x^j, b\rangle + y_j + 1\} = \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\}, \qquad (17)$$
(b) there exists $\tilde J = \{\tilde J_1,\ldots,\tilde J_r\}$ such that $\tilde J_t \subset J_t$, $\forall t \in I$, $\tilde J_t$ is nonempty at least for one $t \in I$ and $x^j = 0$ for any $j \in \tilde J_t$, $t \in I$,
then the sets $A$ and $B$ are max-min separable with a set of hyperplanes $\{x^j, y_j\}$, $j \in J^0$ and a partition $\bar J = \{\bar J_1,\ldots,\bar J_r\}$ of the set $J^0$, where
$$\bar J_t = J_t \setminus \tilde J_t, \ t \in I \quad \text{and} \quad J^0 = \bigcup_{i=1}^{r} \bar J_i.$$
Proof. 1) Since the sets $A$ and $B$ are max-min separable we get from Proposition 5 that $f(x,y) = 0$. If $x^j = 0$, $j \in J$ then it follows from (14) that for any $y \in \mathbb{R}^l$
$$f(0,y) = \frac{1}{m} \sum_{k=1}^{m} \max\Big[0, \ \max_{i \in I} \min_{j \in J_i} \{-y_j + 1\}\Big] + \frac{1}{p} \sum_{t=1}^{p} \max\Big[0, \ \min_{i \in I} \max_{j \in J_i} \{y_j + 1\}\Big].$$
We denote
$$R = \max_{i \in I} \min_{j \in J_i} \{-y_j\}.$$
Then we have
$$\min_{i \in I} \max_{j \in J_i} y_j = -\max_{i \in I} \min_{j \in J_i} \{-y_j\} = -R.$$
Thus
$$f(0,y) = \max[0, R+1] + \max[0, -R+1].$$
It is clear that
$$\max[0, R+1] + \max[0, -R+1] = \begin{cases} -R+1 & \text{if } R \le -1, \\ 2 & \text{if } -1 < R < 1, \\ R+1 & \text{if } R \ge 1. \end{cases}$$
Thus for any $y \in \mathbb{R}^l$
$$f(0,y) \ge 2.$$
On the other hand $f(x,y) = 0$ for the optimal solution $(x,y)$, that is, $x^j = 0$, $j \in J$ cannot be the optimal solution.
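2) Consider the following sets, where $\tilde J_i \subset J_i$ are the index sets from condition (b) and $\bar J_i = J_i \setminus \tilde J_i$:
$$I^1 = \{i \in I : \bar J_i \ne \emptyset\}, \quad I^2 = \{i \in I : \tilde J_i \ne \emptyset\}, \quad I^3 = I^1 \cap I^2.$$
It is clear that $\tilde J_i = \emptyset$ for any $i \in I^1 \setminus I^3$ and $\bar J_i = \emptyset$ for any $i \in I^2 \setminus I^3$.
It follows from the definition of the error function that
$$0 = f(x,y) = \frac{1}{m} \sum_{k=1}^{m} \max\Big[0, \ \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a^k\rangle - y_j + 1\}\Big] + \frac{1}{p} \sum_{t=1}^{p} \max\Big[0, \ \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b^t\rangle + y_j + 1\}\Big].$$
Since the function $f$ is nonnegative we obtain
$$\max_{i \in I} \min_{j \in J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0, \quad \forall a \in A, \qquad (18)$$
$$\min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0, \quad \forall b \in B. \qquad (19)$$
It follows from (17) and (19) that for any $i \in I^2$ there exists a point $b \in B$ such that
$$\max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0. \qquad (20)$$
If $i \in I^3 \subset I^2$ then we have
$$0 \ge \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} = \max\Big\{ \max_{j \in \bar J_i} \{-\langle x^j, b\rangle + y_j + 1\}, \ \max_{j \in \tilde J_i} \{y_j + 1\} \Big\}$$
which means that
$$\max_{j \in \bar J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0 \qquad (21)$$
and
$$\max_{j \in \tilde J_i} \{y_j + 1\} \le 0. \qquad (22)$$
If $i \in I^2 \setminus I^3$ then from (20) we obtain
$$0 \ge \max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} = \max_{j \in \tilde J_i} \{y_j + 1\}.$$
Thus we get that for all $i \in I^2$ the inequality (22) is true. (22) can be rewritten as follows:
$$\max_{j \in \tilde J_i} y_j \le -1, \quad \forall i \in I^2. \qquad (23)$$
Consequently for any $i \in I^2$
$$\min_{j \in \tilde J_i} \{-y_j + 1\} = -\max_{j \in \tilde J_i} y_j + 1 \ge 2. \qquad (24)$$
It follows from (18) that for any $i \in I$ and $a \in A$
$$\min_{j \in J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0. \qquad (25)$$
Then for any $i \in I^3$ we have
$$0 \ge \min_{j \in J_i} \{\langle x^j, a\rangle - y_j + 1\} = \min\Big\{ \min_{j \in \bar J_i} \{\langle x^j, a\rangle - y_j + 1\}, \ \min_{j \in \tilde J_i} \{-y_j + 1\} \Big\}.$$
Taking into account (24) we get that for any $i \in I^3$ and $a \in A$
$$\min_{j \in \bar J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0. \qquad (26)$$
If $i \in I^2 \setminus I^3$ then it follows from (25) that
$$\min_{j \in \tilde J_i} \{-y_j + 1\} \le 0$$
which contradicts (24). Thus we obtain that $I^2 \setminus I^3 \ne \emptyset$ cannot occur, $I^2 \subset I^1$ and $I^3 = I^2$. It is clear that $\bar J_i = J_i$ for any $i \in I^1 \setminus I^3$. Then it follows from (18) that for any $i \in I^1 \setminus I^3$ and $a \in A$
$$\min_{j \in \bar J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0. \qquad (27)$$
From (26) and (27) we can conclude that for any $i \in I$ and $a \in A$
$$\min_{j \in \bar J_i} \{\langle x^j, a\rangle - y_j + 1\} \le 0. \qquad (28)$$
It follows from (19) that for any $b \in B$ there exists at least one $i \in I$ such that
$$\max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0.$$
Then from the expression
$$\max_{j \in J_i} \{-\langle x^j, b\rangle + y_j + 1\} = \max\Big\{ \max_{j \in \bar J_i} \{-\langle x^j, b\rangle + y_j + 1\}, \ \max_{j \in \tilde J_i} \{y_j + 1\} \Big\}$$
we get that for any $b \in B$ there exists at least one $i \in I$ such that
$$\max_{j \in \bar J_i} \{-\langle x^j, b\rangle + y_j + 1\} \le 0. \qquad (29)$$
Thus it follows from (28) and (29) that the sets $A$ and $B$ are max-min separable with the set of hyperplanes $\{x^j, y_j\}$, $j \in J^0$ and a partition $\bar J$ of the set $J^0$. □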

Remark 6. In most cases, if a given set of hyperplanes with a particular partition separates the sets $A$ and $B$, then there are other sets of hyperplanes with the same partition which will also separate the sets $A$ and $B$ (see Figure 4). The error function (14) is nonconvex and, if the sets $A$ and $B$ are max-min separable, then the global minimum of this function is $f(x^*, y^*) = 0$ and the global minimizer is not unique.

4 Minimization of the error function


In this section we discuss an algorithm for minimization of the error function.
Fig. 4. Max-min separability.

4.1 Statement of problem

The problem of max-min separability is reduced to the following mathematical programming problem:
$$\text{minimize } f(x,y) \quad \text{subject to } (x,y) \in \mathbb{R}^{(n+1) \times l} \qquad (30)$$
where the objective function $f$ has the following form:
$$f(x,y) = f_1(x,y) + f_2(x,y)$$
and
$$f_1(x,y) = \frac{1}{m} \sum_{k=1}^{m} \max\Big[0, \ \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a^k\rangle - y_j + 1\}\Big], \qquad (31)$$
$$f_2(x,y) = \frac{1}{p} \sum_{t=1}^{p} \max\Big[0, \ \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b^t\rangle + y_j + 1\}\Big]. \qquad (32)$$
The problem (30) is a global optimization problem. However, the number of variables in this problem is large and global optimization methods cannot be directly applied to solve it. Therefore we will discuss algorithms for finding local minima of the function $f$.
The function $f_1$ contains the following max-min functions:
$$\varphi_{1k}(x,y) = \max_{i \in I} \min_{j \in J_i} \{\langle x^j, a^k\rangle - y_j + 1\}, \quad k = 1,\ldots,m$$
and the function $f_2$ contains the following min-max functions:
$$\varphi_{2t}(x,y) = \min_{i \in I} \max_{j \in J_i} \{-\langle x^j, b^t\rangle + y_j + 1\}, \quad t = 1,\ldots,p.$$

4.2 Differential properties of the objective function

Both functions $f_1$ and $f_2$ are nonsmooth, nonconvex and piecewise linear. These functions contain max-min-type functions. The functions $f_1$ and $f_2$ and, consequently, the function $f$ are locally Lipschitz continuous. We recall some definitions from nonsmooth analysis.
We consider a locally Lipschitz function $\varphi$ defined on $\mathbb{R}^n$. This function is differentiable almost everywhere and one can define for it a Clarke subdifferential (see [Cla83]) by
$$\partial\varphi(x) = \operatorname{co}\Big\{v \in \mathbb{R}^n : \exists (x^k \in D(\varphi), \ x^k \to x, \ k \to +\infty) : v = \lim_{k \to +\infty} \nabla\varphi(x^k)\Big\},$$
where $D(\varphi)$ denotes the set where $\varphi$ is differentiable and $\operatorname{co}$ denotes the convex hull of a set.
The function $\varphi$ is differentiable at the point $x \in \mathbb{R}^n$ with respect to the direction $g \in \mathbb{R}^n$ if the limit
$$\varphi'(x,g) = \lim_{\alpha \to +0} \frac{\varphi(x + \alpha g) - \varphi(x)}{\alpha}$$
exists. The number $\varphi'(x,g)$ is said to be the derivative of the function $\varphi$ with respect to the direction $g \in \mathbb{R}^n$ at the point $x$.
The Clarke upper derivative $\varphi^0(x,g)$ of the function $\varphi$ at the point $x$ with respect to the direction $g \in \mathbb{R}^n$ is defined as follows:
$$\varphi^0(x,g) = \limsup_{y \to x, \ \alpha \to +0} \frac{\varphi(y + \alpha g) - \varphi(y)}{\alpha}.$$
The following is true (see [Cla83]):
$$\varphi^0(x,g) = \max\{\langle v, g\rangle : v \in \partial\varphi(x)\}.$$
It should be noted that the Clarke upper derivative always exists for locally Lipschitz continuous functions. The function $\varphi$ is said to be Clarke regular at the point $x \in \mathbb{R}^n$ if
$$\varphi'(x,g) = \varphi^0(x,g)$$
for all $g \in \mathbb{R}^n$. For Clarke regular functions there exists a calculus (see [Cla83, DR95]). However, in general such a calculus does not exist for non-regular functions.
The function $\varphi$ is called semismooth at $x \in \mathbb{R}^n$ if it is locally Lipschitz continuous at $x$ and for every $g \in \mathbb{R}^n$ the limit
$$\lim_{v \in \partial\varphi(x + t g'), \ g' \to g, \ t \to +0} \langle v, g\rangle$$
exists (see [Mif77]).


Let us return to the objective function $f$ of problem (30). Since this function is locally Lipschitz continuous it is Clarke subdifferentiable.
Proposition 7. The function $f$ is semismooth.
Proof. The sum, the maximum and the minimum of semismooth functions are semismooth (see [Mif77]). A linear function, as a smooth function, is semismooth. Thus the function $f$, which is the sum of functions represented as the maximum of 0 and a max-min of linear functions, is semismooth. □

The properties of max-min-type functions were studied, for example, in [DDM02, Pol97]. Max-min-type functions in general are not Clarke regular.

Example 1. Consider the function
$$\varphi(x) = \max\{\min\{3x_1 + x_2, \ 2x_1 + 3x_2\}, \ \min\{x_1 + 2x_2, \ 4x_1 + 4x_2\}\}.$$
The Clarke subdifferential of this function at the point $x = (0,0)$ is
$$\partial\varphi(x) = \operatorname{co}\{(3,1), (2,3), (1,2), (4,4)\}.$$
Then the Clarke upper derivative $\varphi^0(x, g^0)$ of the function $\varphi$ at the point $x = (0,0)$ with respect to the direction $g^0 = (0,1)$ is
$$\varphi^0(x, g^0) = \max\{\langle v, g^0\rangle : v \in \partial\varphi(x)\} = 4.$$
However, the directional derivative of this function with respect to the direction $g^0 = (0,1)$ is $\varphi'(x, g^0) = 2$, that is, $\varphi'(x, g^0) < \varphi^0(x, g^0)$. Thus the function $\varphi$ is not Clarke regular.
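The two numbers in Example 1 can be checked numerically. The sketch below takes the subdifferential stated in the example as given; because $\varphi$ is positively homogeneous, the one-sided difference quotient at the origin equals the directional derivative for any step $t > 0$. The variable names are illustrative.

```python
import numpy as np

def phi(x):
    """The max-min function of Example 1."""
    x1, x2 = x
    return max(min(3 * x1 + x2, 2 * x1 + 3 * x2),
               min(x1 + 2 * x2, 4 * x1 + 4 * x2))

g0 = np.array([0.0, 1.0])
t = 0.5
# One-sided difference quotient at the origin along g0.
directional = (phi(t * g0) - phi(np.zeros(2))) / t
# Support function of the stated subdifferential co{(3,1),(2,3),(1,2),(4,4)} in direction g0.
vertices = np.array([(3.0, 1.0), (2.0, 3.0), (1.0, 2.0), (4.0, 4.0)])
clarke_upper = float(np.max(vertices @ g0))
```

The gap between `directional` and `clarke_upper` is exactly the failure of Clarke regularity noted in the example.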

Since the function $f$ contains max-min of linear functions, it is not Clarke regular except in the case of linear separability. Therefore, subgradients of the function $f$ cannot be calculated using subgradients of the involved max-min-type functions. We can conclude that the calculation of the subgradients of the function $f$ is a very difficult task, and therefore methods of nonsmooth optimization that require a subgradient evaluation at each iteration, including the bundle method and its variations ([HL93, Kiw85, MN92]), cannot be effective.
In the paper [KP98] optimization problems with twice continuously differentiable objective functions and max-min constraints were considered, and these problems were converted to problems with smooth objective and constraint functions. However, this approach cannot be applied to the problem (30), because the function $f$ contains not only max-min-type functions but also min-max-type functions.
Since the evaluation of subgradients of the function $f$ is difficult, direct search methods of optimization seem to be the best option for solving problem (30). Among such methods we mention here two widely used ones: Powell's method (see [Pow02]), which is based on a quadratic approximation of the objective function, and Nelder-Mead's simplex method [NM65]. As was mentioned in [Pow02], Powell's method performs well when the number of variables is less than 20. For the simplex method this number is even smaller. Moreover, both methods are effective when the objective function is smooth. However, in the max-min separability problem the number of variables is $n_v = (n+1) \times l$, where $n$ is the dimension of the sets $A$ and $B$ (ranging from 5 to thousands in real-world datasets) and $l$ is the number of separating hyperplanes. In many cases the number $n_v$ is greater than 20. Furthermore, the objective function in this problem is a quite complicated nonsmooth function.
In this paper we use the discrete gradient method to solve the problem (30). The description of this method can be found in [Bag99a, Bag99b] (see also [Bag02]). The discrete gradient method can be considered as a version of the bundle method ([HL93, Kiw85, MN92]), where subgradients of the objective function are replaced by its discrete gradients.
The discrete gradient method uses only values of the objective function. It should be noted that the calculation of the objective function in the problem (30) can be expensive. We will show that the structure of this function allows one to significantly reduce the number of full objective function evaluations required by the discrete gradient method.

4.3 Discrete gradient method

In this subsection we will briefly describe the discrete gradient method. We start with the definition of the discrete gradient.

Definition of the discrete gradient

Let $f$ be a locally Lipschitz continuous function defined on $\mathbb{R}^n$. Let
$$S_1 = \{g \in \mathbb{R}^n : \|g\| = 1\},$$
$$G = \{e \in \mathbb{R}^n : e = (e_1,\ldots,e_n), \ |e_j| = 1, \ j = 1,\ldots,n\},$$
$$P = \{z(\lambda) : z(\lambda) \in \mathbb{R}^1, \ z(\lambda) > 0, \ \lambda > 0, \ \lambda^{-1} z(\lambda) \to 0, \ \lambda \to 0\},$$
$$I(g,\alpha) = \{i \in \{1,\ldots,n\} : |g_i| \ge \alpha\},$$
where $\alpha \in (0, n^{-1/2}]$ is a fixed number.

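Here $S_1$ is the unit sphere, $G$ is the set of vertices of the unit hypercube in $\mathbb{R}^n$ and $P$ is the set of univariate positive infinitesimal functions.
We define operators $H_i^j : \mathbb{R}^n \to \mathbb{R}^n$ for $i = 1,\ldots,n$, $j = 0,\ldots,n$ by the formula
$$H_i^j g = \begin{cases} (g_1,\ldots,g_j,0,\ldots,0) & \text{if } j < i, \\ (g_1,\ldots,g_{i-1},0,g_{i+1},\ldots,g_j,0,\ldots,0) & \text{if } j \ge i. \end{cases} \qquad (33)$$
We can see that
$$H_i^j g - H_i^{j-1} g = \begin{cases} (0,\ldots,0,g_j,0,\ldots,0) & \text{if } j = 1,\ldots,n, \ j \ne i, \\ 0 & \text{if } j = i. \end{cases} \qquad (34)$$
Let $e(\beta) = (\beta e_1, \beta^2 e_2, \ldots, \beta^n e_n)$, where $\beta \in (0,1]$. For $x \in \mathbb{R}^n$ we consider vectors
$$x_i^j \equiv x_i^j(g,e,z,\lambda,\beta) = x + \lambda g - z(\lambda) H_i^j e(\beta), \qquad (35)$$
where $g \in S_1$, $e \in G$, $i \in I(g,\alpha)$, $z \in P$, $\lambda > 0$, $j = 0,\ldots,n$, $j \ne i$.
It follows from (34) that
$$x_i^{j-1} - x_i^j = \begin{cases} (0,\ldots,0,z(\lambda) e_j(\beta),0,\ldots,0) & \text{if } j = 1,\ldots,n, \ j \ne i, \\ 0 & \text{if } j = i. \end{cases} \qquad (36)$$
It is clear that $H_i^0 g = 0$ and $x_i^0(g,e,z,\lambda,\beta) = x + \lambda g$ for all $i \in I(g,\alpha)$.
Definition 4. (see [BG95]) The discrete gradient of the function $f$ at the point $x \in \mathbb{R}^n$ is the vector $\Gamma^i(x,g,e,z,\lambda,\beta) = (\Gamma_1^i,\ldots,\Gamma_n^i) \in \mathbb{R}^n$, $g \in S_1$, $i \in I(g,\alpha)$, with the following coordinates:
$$\Gamma_j^i = [z(\lambda) e_j(\beta)]^{-1} \big[f(x_i^{j-1}(g,e,z,\lambda,\beta)) - f(x_i^j(g,e,z,\lambda,\beta))\big], \quad j = 1,\ldots,n, \ j \ne i,$$
$$\Gamma_i^i = (\lambda g_i)^{-1} \Big[f(x_i^n(g,e,z,\lambda,\beta)) - f(x) - \sum_{j=1, j \ne i}^{n} \Gamma_j^i \big(\lambda g_j - z(\lambda) e_j(\beta)\big)\Big].$$
A more detailed description of the discrete gradient and examples can be found in [Bag99b].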
Remark 7. It follows from Definition 4 that for the calculation of the discrete gradient $\Gamma^i(x,g,e,z,\lambda,\beta)$, $i \in I(g,\alpha)$ we define a sequence of points
$$x_i^0, \ldots, x_i^{i-1}, x_i^{i+1}, \ldots, x_i^n.$$
For the calculation of the discrete gradient it is sufficient to evaluate the function $f$ at each point of this sequence.

Remark 8. The discrete gradient is defined with respect to a given direction $g \in S_1$. We can see that for the calculation of one discrete gradient we have to calculate $(n+1)$ values of the function $f$: at the point $x$ and at the points $x_i^j(g,e,z,\lambda,\beta)$, $j = 0,\ldots,n$, $j \ne i$. For the calculation of the next discrete gradient at the same point with respect to any other direction $g^1 \in S_1$ we have to calculate this function $n$ times, because we have already calculated $f$ at the point $x$.
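Definition 4 and Remarks 7 and 8 translate into a short routine. The following is a minimal sketch under stated assumptions: the function name `discrete_gradient` is hypothetical, the index `i` is 0-based, and `z` is passed as the already evaluated number $z(\lambda)$ rather than as a function.

```python
import numpy as np

def discrete_gradient(f, x, g, e, i, z, lam, beta):
    """Discrete gradient of f at x in direction g (i is 0-based, with |g[i]| >= alpha)."""
    n = len(x)
    e_beta = e * beta ** np.arange(1, n + 1)   # e(beta) = (beta e_1, ..., beta^n e_n)
    gamma = np.zeros(n)
    point = x + lam * g                        # first point of the sequence in Remark 7
    f_prev = f(point)
    for j in range(n):
        if j == i:
            continue                           # the i-th coordinate is never perturbed
        point = point.copy()
        point[j] -= z * e_beta[j]              # step to the next point, see (36)
        f_curr = f(point)
        gamma[j] = (f_prev - f_curr) / (z * e_beta[j])
        f_prev = f_curr
    others = np.arange(n) != i
    correction = gamma[others] @ (lam * g[others] - z * e_beta[others])
    gamma[i] = (f_prev - f(x) - correction) / (lam * g[i])
    return gamma

x = np.array([0.1, -0.2, 0.0])
g = np.array([0.6, 0.0, 0.8])                  # unit direction
e = np.ones(3)
c = np.array([1.0, -2.0, 3.0])
gamma_linear = discrete_gradient(lambda v: c @ v, x, g, e, 2, 1e-3, 0.1, 1.0)
gamma_abs = discrete_gradient(lambda v: np.abs(v).sum(), x, g, e, 0, 1e-3, 0.1, 0.5)
```

The routine evaluates `f` at `x` and at the $n$ points of the sequence, which matches the count of $n+1$ evaluations in Remark 8. For a linear function it recovers the gradient, and by construction the result satisfies $f(x + \lambda g) - f(x) = \lambda \langle \Gamma^i, g\rangle$.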

Calculation of the discrete gradients of the objective function (30)

Now let us return to the objective function $f$ of the problem (30). This function depends on $(n+1)l$ variables where $l$ is the number of hyperplanes. The function $f_1$ contains max-min functions $\varphi_{1k}$
$$\varphi_{1k}(x,y) = \max_{i \in I} \min_{j \in J_i} \psi_{ijk}(x,y), \quad k = 1,\ldots,m$$
where
$$\psi_{ijk}(x,y) = \langle x^j, a^k\rangle - y_j + 1, \quad j \in J_i, \ i \in I.$$
We can see that for every $k = 1,\ldots,m$, each pair of variables $\{x^j, y_j\}$ appears in only one function $\psi_{ijk}$.
For a given $i = 1,\ldots,(n+1)l$ we set
$$q_i = \Big\lfloor \frac{i-1}{n+1} \Big\rfloor + 1, \quad d_i = i - (q_i - 1)(n+1)$$
where $\lfloor u \rfloor$ stands for the floor of a number $u$. We denote by $X$ the vector of all variables $\{x^j, y_j\}$, $j = 1,\ldots,l$:
$$X = (X_1, X_2, \ldots, X_{(n+1)l})$$
where
$$X_i = \begin{cases} x^{q_i}_{d_i} & \text{if } 1 \le d_i \le n, \\ y_{q_i} & \text{if } d_i = n+1. \end{cases}$$
We use the vector of variables $X$ to define a sequence
$$X_t^0, \ldots, X_t^{t-1}, X_t^{t+1}, \ldots, X_t^{(n+1)l}, \quad t \in I(g,\alpha), \ g \in \mathbb{R}^{(n+1)l},$$
as in Remark 7. It follows from (36) that the points $X_t^{i-1}$ and $X_t^i$ differ by one coordinate only. This coordinate appears in only one linear function $\psi_{i q_i k}$. It follows from the definition of the operator $H_i^j$ that $X_t^t = X_t^{t-1}$, and thus this observation is also true for $X_t^{t+1}$. Then all linear functions other than $\psi_{i q_i k}$ take the same values at the points $X_t^{i-1}$ and $X_t^i$. Moreover the function $\psi_{i q_i k}$ can be calculated at the point $X_t^i$ using the value of this function at the point $X_t^{i-1}$, $i \ge 1$:
$$\psi_{i q_i k}(X_t^i) = \begin{cases} \psi_{i q_i k}(X_t^{i-1}) - z(\lambda) a^k_{d_i} e_i(\beta) & \text{if } 1 \le d_i \le n, \\ \psi_{i q_i k}(X_t^{i-1}) + z(\lambda) e_i(\beta) & \text{if } d_i = n+1. \end{cases} \qquad (37)$$
In order to calculate the function $f_1$ at the point $X_t^i$, $i \ge 1$, first we have to calculate the values of the functions $\psi_{i q_i k}$ for all $a^k \in A$, $k = 1,\ldots,m$ using (37). Then we update $f_1$ using these values and the values of all other linear functions at the point $X_t^{i-1}$ according to (31). Thus we have to apply a full calculation of the function $f_1$ using the formula (31) only at the point $X_t^0 = X + \lambda g$.
Since the function $f_2$ has a similar structure to $f_1$ we can calculate it in the same manner using a formula similar to (37).
Thus for the calculation of each discrete gradient we have to apply a full calculation of the objective function $f$ only at the point $X_t^0 = X + \lambda g$, and this function can be updated at the points $X_t^i$, $i \ge 1$ using a simplified scheme.
We can conclude that for the calculation of the discrete gradient at a point $X$ with respect to the direction $g \in S_1$ we calculate the function $f$ at two points: $X$ and $X_t^0 = X + \lambda g$. For the calculation of another discrete gradient at the same point $X$ with respect to any other direction $g^1 \in S_1$ we calculate the function $f$ only at the point $X + \lambda g^1$.
Since the number of variables $(n+1)l$ in the problem (30) can be large, this algorithm allows one to significantly reduce the number of objective function evaluations during the calculation of a discrete gradient.
On the other hand the function $f_1$ contains max-min-type functions and their computation can be simplified using an algorithm proposed in [Evt72]. The function $f_2$ contains min-max-type functions and a similar algorithm can be used for their calculation.
Results of numerical experiments show that the use of these algorithms allows one to significantly accelerate the computation of the objective function $f$ and its discrete gradients.
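The index bookkeeping behind $q_i$, $d_i$ and the update rule (37) can be sketched as follows. The helper names `locate` and `updated_psi` are hypothetical, and `step` stands for the number $z(\lambda) e_i(\beta)$ by which the $i$-th coordinate of $X$ is decreased.

```python
def locate(i, n):
    """Map a 1-based coordinate index i of X to (q_i, d_i): hyperplane number and position."""
    q = (i - 1) // (n + 1) + 1
    d = i - (q - 1) * (n + 1)
    return q, d

def updated_psi(psi_prev, a, d, n, step):
    """Value of <x^q, a> - y_q + 1 after coordinate d of hyperplane q is decreased by step."""
    if d <= n:
        return psi_prev - step * a[d - 1]   # a component of the normal x^q changed
    return psi_prev + step                  # the offset y_q changed

# Hyperplane x = (1, 2), y = 0.5 and point a = (3, 4): psi = 3 + 8 - 0.5 + 1 = 11.5
a = (3.0, 4.0)
psi = 11.5
psi_after_x2 = updated_psi(psi, a, 2, 2, 0.1)   # x becomes (1, 1.9)
psi_after_y = updated_psi(psi, a, 3, 2, 0.1)    # y becomes 0.4
```

Each update costs a single multiplication per data point, compared with a full inner product of length $n$ when $\psi$ is recomputed from scratch.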

Discrete gradient method

We consider the following unconstrained minimization problem:
$$\text{minimize } \varphi(x) \quad \text{subject to } x \in \mathbb{R}^n \qquad (38)$$
where the function $\varphi$ is assumed to be semismooth. We consider the following algorithm for solving this problem. An important step in this algorithm is the calculation of a descent direction of the objective function $\varphi$. So first, we describe an algorithm for the computation of this descent direction.
Let $z \in P$, $\lambda > 0$, $\beta \in (0,1]$, the number $c \in (0,1)$ and a small enough number $\delta > 0$ be given.

Algorithm 1. An algorithm for the computation of the descent direction.

Step 1. Choose any $g^1 \in S_1$, $e \in G$, $i \in I(g^1,\alpha)$ and compute a discrete gradient $v^1 = \Gamma^i(x,g^1,e,z,\lambda,\beta)$. Set $D_1(x) = \{v^1\}$ and $k = 1$.
Step 2. Calculate the vector $w^k$ such that $\|w^k\| = \min\{\|w\| : w \in D_k(x)\}$. If
$$\|w^k\| \le \delta, \qquad (39)$$
then stop. Otherwise go to Step 3.
Step 3. Calculate the search direction by $g^{k+1} = -\|w^k\|^{-1} w^k$.
Step 4. If
$$\varphi(x + \lambda g^{k+1}) - \varphi(x) \le -c\lambda\|w^k\|, \qquad (40)$$
then stop. Otherwise go to Step 5.
Step 5. Calculate a discrete gradient
$$v^{k+1} = \Gamma^i(x,g^{k+1},e,z,\lambda,\beta), \quad i \in I(g^{k+1},\alpha),$$
construct the set $D_{k+1}(x) = \operatorname{co}\{D_k(x) \cup \{v^{k+1}\}\}$, set $k = k+1$ and go to Step 2.

Algorithm 1 contains some steps which deserve explanation. In Step 1 we calculate the first discrete gradient. The distance between the convex hull of all calculated discrete gradients and the origin is calculated in Step 2. If this distance is less than the tolerance $\delta > 0$ then we accept the point $x$ as an approximate stationary point (Step 2), otherwise we calculate another search direction in Step 3. In Step 4 we check whether this direction is a descent direction. If it is, we stop and the descent direction has been calculated, otherwise we calculate another discrete gradient with respect to this direction in Step 5 and add it to the set $D_k(x)$.
It is proved that Algorithm 1 is terminating (see [Bag99a, Bag99b]).
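Step 2 of Algorithm 1 requires the point of minimal norm in the convex hull of the discrete gradients computed so far. The text does not say how this subproblem is solved; the sketch below substitutes a simple Frank-Wolfe (Gilbert-type) iteration with exact line search, which is an assumption made for illustration and not necessarily the solver used by the authors. The names `min_norm_point` and `search_direction` are hypothetical.

```python
import numpy as np

def min_norm_point(vectors, tol=1e-10, max_iter=1000):
    """Approximate the point of minimal Euclidean norm in co{vectors} by Frank-Wolfe steps."""
    V = np.asarray(vectors, dtype=float)
    w = V[0].copy()
    for _ in range(max_iter):
        v = V[np.argmin(V @ w)]            # vertex minimising <w, v>
        gap = w @ w - w @ v                # optimality gap, zero at the minimiser
        if gap <= tol:
            break
        d = v - w
        t = min(1.0, gap / (d @ d))        # exact line search on the segment [w, v]
        w = w + t * d
    return w

def search_direction(w):
    """Step 3 of Algorithm 1: the normalised negative of the minimal-norm element."""
    return -w / np.linalg.norm(w)

w_two = min_norm_point([(1.0, 0.0), (0.0, 1.0)])
w_far = min_norm_point([(1.0, 1.0), (2.0, 3.0)])
g_next = search_direction(w_two)
```

Frank-Wolfe iterations can converge slowly when the origin lies close to the hull, so a dedicated quadratic programming routine may be preferable in practice.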

Let numbers $c_1 \in (0,1)$, $c_2 \in (0,c_1]$ be given.

Algorithm 2. Discrete gradient method

Step 1. Choose any starting point $x^0 \in \mathbb{R}^n$ and set $k = 0$.
Step 2. Set $s = 0$ and $x_s^k = x^k$.
Step 3. Apply Algorithm 1 for the calculation of the descent direction at $x = x_s^k$, $\delta = \delta_k$, $z = z_k$, $\lambda = \lambda_k$, $\beta = \beta_k$, $c = c_1$. This algorithm terminates after a finite number of iterations $m > 0$. As a result we get the set $D_m(x_s^k)$ and an element $v_s^k$ such that
$$\|v_s^k\| = \min\{\|v\| : v \in D_m(x_s^k)\}.$$
Furthermore either $\|v_s^k\| \le \delta_k$ or for the search direction $g_s^k = -\|v_s^k\|^{-1} v_s^k$
$$\varphi(x_s^k + \lambda_k g_s^k) - \varphi(x_s^k) \le -c_1 \lambda_k \|v_s^k\|. \qquad (41)$$
Step 4. If
$$\|v_s^k\| \le \delta_k \qquad (42)$$
then set $x^{k+1} = x_s^k$, $k = k+1$ and go to Step 2. Otherwise go to Step 5.
Step 5. Construct the following iteration $x_{s+1}^k = x_s^k + \sigma_s g_s^k$, where $\sigma_s$ is defined as follows:
$$\sigma_s = \arg\max\{\sigma \ge 0 : \varphi(x_s^k + \sigma g_s^k) - \varphi(x_s^k) \le -c_2 \sigma \|v_s^k\|\}.$$
Step 6. Set $s = s+1$ and go to Step 3.

For the point $x^0 \in \mathbb{R}^n$ we consider the set $M(x^0) = \{x \in \mathbb{R}^n : \varphi(x) \le \varphi(x^0)\}$.

Theorem 1. Assume that the set $M(x^0)$ is bounded for starting points $x^0 \in \mathbb{R}^n$. Then every accumulation point of $\{x^k\}$ belongs to the set $X^0 = \{x \in \mathbb{R}^n : 0 \in \partial\varphi(x)\}$.

Since the objective function in problem (30) is semismooth the discrete gradient method can be applied to solve it. Discrete gradients in Step 5 of Algorithm 1 can be calculated using the simplified scheme described above.

5 Results of numerical experiments


We applied the max-min separation to solve supervised data classification
problems on some real-world datasets. In this section we present results of
numerical experiments. Our algorithm has been implemented in Lahey Fortran
95 on a Pentium 4 1.7 GHz.

5.1 Supervised data classification via max-min separability

We are given a dataset $A$ containing a finite number of points in $\mathbb{R}^n$. This dataset contains $d$ disjoint subsets $A_1,\ldots,A_d$ where $A_i$ represents a training set for the class $i$. The aim of supervised data classification is to establish rules for the classification of some new observations using these training subsets of the classes. This problem is reduced to $d$ set separation problems.
Each of these problems consists in separating one class from the rest of the dataset. To separate the class $i$ from all others, we separate the sets $A_i$ and $\bigcup_{j=1, j \ne i}^{d} A_j$ with a piecewise linear function by solving problem (30).
One of the important questions in supervised data classification is the estimation of performance measures. Different performance measures are discussed in [Tho02]. When the dataset contains two classes the classification problem can be reduced to only one separation problem, therefore the classification rules are straightforward. We consider that the separation function obtained from the training set separates the two classes.
When the dataset contains more than two classes we have more than one separation function. In our case for each class $i$ of the dataset $A$ we have one piecewise linear function $\varphi_i$ separating the training set $A_i$ from all other training points $\bigcup_{j \ne i} A_j$. We approximate the training set $A_i$ using the following set
$$\bar A_i = \{a \in \mathbb{R}^n : \varphi_i(a) < 0\}.$$
Thus we get the sets $\bar A_1,\ldots,\bar A_d$ which approximate the training sets $A_1,\ldots,A_d$, respectively. Then for each $i \in \{1,\ldots,d\}$ we can consider the following two sets:
$$\bar A_i \quad \text{and} \quad \bigcup_{j=1, j \ne i}^{d} \bar A_j.$$
These two sets define the following four sets (see Figure 5):
1. $\bar A_i \cap \big(\mathbb{R}^n \setminus \bigcup_{j \ne i} \bar A_j\big)$
2. $\big(\mathbb{R}^n \setminus \bar A_i\big) \cap \bigcup_{j \ne i} \bar A_j$
3. $\bar A_i \cap \bigcup_{j \ne i} \bar A_j$
4. $\big(\mathbb{R}^n \setminus \bar A_i\big) \cap \big(\mathbb{R}^n \setminus \bigcup_{j \ne i} \bar A_j\big)$
If a new observation $a$ belongs to the first set we classify it in class $i$; if it belongs to the second set we classify it as not in class $i$. If this point belongs to the third or fourth set, then we classify it in class $i$ if $\varphi_i(a) < \min_{j=1,\ldots,d, \ j \ne i} \varphi_j(a)$, and as not in class $i$ otherwise.
In order to evaluate the classification algorithm we use two performance measures. First we present the average accuracy ($a_{2c}$ in Tables 3 and 4) for well-classified points in two-class classification (when one particular class is separated from all others) and the multi-class classification accuracy ($a_{mc}$ in Tables 3 and 4) as described above. The first accuracy is an indication of separation quality and the second one is an indication of multi-class classification quality.
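The decision rule described above can be written compactly once the values $\varphi_1(a),\ldots,\varphi_d(a)$ of the separating functions at a new observation are available. The following is a minimal sketch; the function name `assign_to_class` and the 0-based class index are assumptions made for illustration.

```python
def assign_to_class(phi_values, i):
    """Return True if the observation is classified in class i (0-based index).

    phi_values holds the values of the d separating functions at the observation;
    a negative value means the point lies in the approximating set of that class.
    """
    own = phi_values[i]
    others = [v for j, v in enumerate(phi_values) if j != i]
    in_own = own < 0
    in_others = any(v < 0 for v in others)
    if in_own and not in_others:
        return True                 # first set: only class i claims the point
    if in_others and not in_own:
        return False                # second set: only other classes claim it
    return own < min(others)        # third and fourth sets: compare function values

clear_case = assign_to_class([-1.0, 2.0, 3.0], 0)
overlap_case = assign_to_class([-1.0, -2.0, 3.0], 1)
unclaimed_case = assign_to_class([1.0, 2.0, 3.0], 0)
```

In the overlapping and unclaimed cases the rule falls back to the smallest function value, so exactly one class answers True whenever that minimum is unique.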

5.2 Results on small and middle size datasets

In this subsection we present results of numerical experiments with some


small and middle size datasets in order to demonstrate the separation ability
of the proposed algorithm. The datasets used are the Wisconsin Breast Can-
cer Diagnosis (WBCD), the Wisconsin Breast Cancer Prognosis (WBCP),
the Cleveland Heart Disease (Heart), the Pima Indians Diabetes (Diabetes),
the BUPA Liver Disorders (Liver), the United States Congressional Voting
Records (Votes) and the Ionosphere. All datasets contain 2 classes. The de-
scription of these datasets can be found in [MA92].
We take entire datasets and check their polyhedral or max-min separability
considering various number of hyper planes. Results of numerical experiments
202 A.M. Bagirov, J. Ugon

Fig. 5. Multi-class classification by a max-min separation

are presented in Table 1. We use the following notation: m - is the number


of instances in the first class, p - is the number of instances in the second
class, n - number of attributes, h number of hyperplanes used for polyhedral
separability, r is the cardinality of the set / and j is the cardinality of the sets
Ji, i e I in the max-min separability. The sets Ji contain the same number
of indices for alH G / . In our experiments we restrict r to 15 and j to 5. The
accuracy is defined as the ratio between the number of well-classified points
of both A and B and the total number of points in the dataset.

Table 1. Results of numerical experiments with small and middle size datasets

Database     m/p/n        Linear     Polyhedral        Max-min
                          accuracy   h    accuracy     r x j   accuracy
WBCD         239/444/9    97.36      7    98.98        5x2     100
Heart        137/160/13   84.19      10   100          2x5     100
Ionosphere   126/225/34   93.73      4    97.44        2x2     100
Votes        168/267/16   96.80      5    100          2x3     100
WBCP         46/148/32    76.80      4    100          3x2     100
Diabetes     268/500/8    76.95      12   80.60        15x2    90.10
Liver        145/200/6    68.41      12   74.20        6x5     89.86

From the results presented in Table 1 we can conclude that in none of
the datasets are the classes linearly separable. Classes in Heart, Votes and
WBCP are polyhedrally separable and in WBCD they are "almost" polyhedrally
separable. We considered different values for h in the Diabetes and Liver
datasets and present the best results. These results show that classes in these
datasets are unlikely to be polyhedrally separable. Classes in WBCD, Heart,
Ionosphere, Votes and WBCP are max-min separable with the presented number of
hyperplanes, whereas classes in the Diabetes and Liver datasets are likely to
be max-min separable only with quite a large number of hyperplanes. On the
other hand, results for these datasets show that the use of max-min
separability allows one to achieve significantly better separation.

5.3 Results on larger datasets

Datasets

The datasets used are the Shuttle control, the Letter recognition, the Landsat
satellite image, the Pen-based recognition of handwritten and the Page
blocks classification databases. Table 2 presents some characteristics of these
databases. More detailed information can be found in [MA92]. It should be
noted that all attributes in these datasets are continuous.

Table 2. Large datasets

Database                                (train, test)    No. of       No. of
                                                         attributes   classes
Shuttle control                         (43500, 14500)   9            7
Letter recognition                      (15000, 5000)    16           26
Landsat satellite image                 (4435, 2000)     36           6
Pen-based recognition of handwritten    (7494, 3498)     16           10
Page blocks                             (4000, 1473)     10           5

Results and discussion

We took the zero vector $x^0 = 0$ as a starting point for solving each
separation problem (30). At each iteration of the discrete gradient method the
line search is carried out by approximating the objective function with a
univariate piecewise linear function (see [Bag99a]). In each separation
problem (30) all $J_i$, $i \in I$, have the same cardinality.
Results of numerical experiments are presented in Tables 3 and 4. In these
tables fct eval, DG eval and CPU time show respectively the average number
of objective function evaluations, discrete gradient evaluations and CPU time
required to solve an optimization problem. CPU time is presented in seconds.
From the results presented in these tables we can see that the use of
the max-min separability algorithm allows one to achieve a high classification
accuracy for both training and test phases. Results on training sets show
that this algorithm provides a high quality of separation between two sets. In
our experiments we used only large-scale datasets. Results on these datasets
show that a few hyperplanes are sufficient to separate efficiently sets with large
numbers of points. Since we use a derivative-free method to solve problem (30),
the number of objective function evaluations is a significant characteristic for
estimating the complexity of the max-min separability algorithm. Results
presented in Tables 3 and 4 confirm that the proposed algorithm is effective
for solving classification problems on large-scale databases.

Table 3. Results of numerical experiments with Shuttle control, Letter recognition
and Landsat satellite image datasets

            Training       Test
|I| |J_i|   a_2c  a_mc     a_2c  a_mc    fct eval  DG eval  CPU time
Shuttle control dataset
1 1 94.63 87.84 94.66 87.86 265 268 54.44
2 1 97.26 97.58 97.08 97.49 396 399 145.12
3 1 97.04 99.36 96.87 99.21 379 379 211.23
4 1 97.35 99.50 97.19 99.35 402 405 310.54
2 2 99.86 99.57 99.86 99.39 391 394 281.92
3 2 99.48 99.92 99.43 99.86 636 639 825.99
4 2 99.84 99.76 99.82 99.70 447 450 810.58
Letter recognition dataset
1 1 92.51 66.89 92.32 66.00 280 284 17.57
2 1 96.83 79.86 95.24 79.36 568 572 60.98
3 1 98.34 85.73 95.94 84.82 573 575 93.72
4 1 99.08 89.32 96.36 86.86 665 667 158.29
2 2 98.12 86.89 96.20 84.56 683 686 143.07
3 2 98.97 91.46 96.32 89.12 634 635 366.16
3 3 99.52 93.73 96.16 90.32 511 511 436.37
Landsat satellite image dataset
1 1 93.12 86.00 91.30 83.45 298 301 4.62
2 1 96.73 88.12 94.40 85.65 549 552 19.12
3 1 97.54 89.80 94.80 87.00 618 621 37.37
4 1 97.81 91.14 94.35 87.45 656 659 61.64
2 2 97.56 90.85 94.25 87.10 606 609 48.83
3 2 98.02 90.98 94.60 86.70 712 715 116.86
4 2 98.47 93.33 94.80 86.70 533 536 137.07

Table 4. Results of numerical experiments with Pen-based recognition of handwritten
and Page blocks datasets

            Training       Test
|I| |J_i|   a_2c  a_mc     a_2c  a_mc    fct eval  DG eval  CPU time
Pen-based recognition of handwritten dataset
1 1 97.54 94.93 93.68 89.94 385 388 6.97
2 1 99.45 98.91 96.05 95.37 582 585 19.71
3 1 99.91 99.65 96.51 96.54 865 868 48.19
4 1 99.97 99.79 96.23 97.11 841 844 70.21
2 2 99.91 99.69 96.68 96.31 888 890 63.94
3 2 99.97 99.88 97.37 97.40 727 730 124.91
4 2 99.99 99.89 97.06 97.28 733 736 191.71
Page blocks dataset
1 1 93.48 92.60 81.87 82.48 623 626 2.93
2 1 93.88 93.48 80.52 85.61 369 372 3.59
3 1 95.38 94.20 87.24 86.69 550 553 9.65
4 1 95.68 94.88 85.81 87.44 822 825 22.09
2 2 95.55 94.33 88.53 86.97 505 508 11.51
3 2 96.55 95.68 89.34 88.46 779 782 40.71
4 2 96.45 95.40 87.71 86.08 682 685 54.60

6 Conclusions and further work

In this paper we have developed the concept of max-min separability. If
finite point sets A and B are disjoint then they can be separated by a certain
piecewise linear function presented as a max-min of linear functions. We have
proposed an algorithm to find this piecewise linear function by minimizing an
error function.
This algorithm has been applied to solve data classification problems in
some large-scale datasets. Results from numerical experiments show the
effectiveness of this algorithm.
However, the number of hyperplanes needed to separate the two sets has
to be known. In further research some methods to find this number
automatically will be introduced. Problem (30) is a global optimization problem
on which we use a local optimization method. Therefore it is crucial to find
a good initial point in order to reduce computational cost and to improve the
solution. These questions are the subject of our further research.

Acknowledgements
This research was supported by the Australian Research Council.

References
[AG02] Astorino, A., Gaudioso, M.: Polyhedral separability through successive
LP. Journal of Optimization Theory and Applications, 112, 265-293
(2002)
[BG95] Bagirov, A.M., Gasanov, A.A.: A method of approximating a
quasidifferential. Russian Journal of Computational Mathematics and
Mathematical Physics, 35, 403-409 (1995)
[Bag99a] Bagirov, A.M.: Derivative-free methods for unconstrained nonsmooth op-
timization and its numerical analysis. Investigacao Operacional, 19, 75-93
(1999)
[Bag99b] Bagirov, A.M.: Minimization methods for one class of nonsmooth
functions and calculation of semi-equilibrium prices. In: Eberhard, A. et al.
(eds) Progress in Optimization: Contribution from Australasia, 147-175.
Kluwer Academic Publishers (1999)
[Bag02] Bagirov, A.M.: A method for minimization of quasidifferentiable functions.
Optimization Methods and Software, 17, 31-60 (2002)
[Bag05] Bagirov, A.M.: Max-min separability. Optimization Methods and
Software, 20, 271-290 (2005)
[BRSY01] Bagirov, A.M., Rubinov, A., Soukhoroukova, N., Yearwood, J.:
Unsupervised and supervised data classification via nonsmooth and global
optimization. TOP, 11, 1-93, Sociedad de Estadistica Operativa, Madrid,
Spain, June 2003 (2003)
[BRY00] Bagirov, A.M., Rubinov, A.M., Yearwood, J.: Using global optimization
to improve classification for medical diagnosis and prognosis. Topics in
Health Information Management, 22, 65-74 (2001)
[BRY02] Bagirov, A.M., Rubinov, A.M., Yearwood, J.: A global optimization ap-
proach to classification. Optimization and Engineering, 3, 129-155 (2002)
[BKS95] Bartels, S.G., Kuntz, L., Scholtes, S.: Continuous selections of linear
functions and nonsmooth critical point theory. Nonlinear Analysis, TMA, 24,
385-407 (1995)
[BB97] Bennett, K.P., Blue, J.: A support vector machine approach to decision
trees. Mathematics Report 97-100, Rensselaer Polytechnic Institute, Troy,
New York (1997)
[BB96] Bennett, K.P., Bredensteiner, E.J.: A parametric optimization method for
machine learning. INFORMS Journal on Computing, 9, 311-318 (1997)
[BM92] Bennett, K.P., Mangasarian, O.L.: Robust linear programming
discrimination of two linearly inseparable sets. Optimization Methods and
Software, 1, 23-34 (1992)
[BM93] Bennett, K.P., Mangasarian, O.L.: Bilinear separation of two sets in
n-space. Computational Optimization and Applications, 2, 207-227 (1993)
[BM00] Bradley, P.S., Mangasarian, O.L.: Massive data discrimination via linear
support vector machines. Optimization Methods and Software, 13, 1-10
(2000)
[BFM99] Bradley, P.S., Fayyad, U.M., Mangasarian, O.L.: Data mining: overview
and optimization opportunities. INFORMS Journal on Computing, 11,
217-238 (1999)
[Bur98] Burges, C.J.C.: A tutorial on support vector machines for pattern recog-
nition. Data Mining and Knowledge Discovery, 2, 121-167 (1998)
[CM95] Chen, C., Mangasarian, O.L.: Hybrid misclassification minimization.
Mathematical Programming Technical Report, 95-05, University of
Wisconsin (1995)
[Cla83] Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience,
New York (1983)
[DDM02] Demyanov, A.V., Demyanov, V.F., Malozemov, V.N.: Minmaxmin problems
revisited. Optimization Methods and Software, 17, 783-804 (2002)
[DR95] Demyanov, V.F., Rubinov, A.M.: Constructive Nonsmooth Analysis.
Peter Lang, Frankfurt am Main (1995)
[Evt72] Evtushenko, Yu.G.: A numerical method for finding best guaranteed
estimates. USSR Journal of Computational Mathematics and Mathematical
Physics, 12, 109-128 (1972)
[HL93] Hiriart-Urruty, J.-B., Lemarechal, C.: Convex Analysis and Minimization
Algorithms, Vol. 1 and Vol. 2. Springer Verlag, Berlin, Heidelberg, New
York (1993)
[Kiw85] Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization.
Lecture Notes in Mathematics, 1133, Springer Verlag, Berlin (1985)
[Pol97] Polak, E.: Optimization: Algorithms and Consistent Approximations.
Springer Verlag, New York (1997)
[KP98] Kirjner-Neto, C., Polak, E.: On the conversion of optimization problems
with max-min constraints to standard optimization problems. SIAM J.
Optimization, 8, 887-915 (1998)
[MN92] Makela, M.M., Neittaanmaki, P.: Nonsmooth Optimization. World Scien-
tific, Singapore (1992)
[Man94] Mangasarian, O.L.: Misclassification minimization. Journal of Global Op-
timization, 5, 309-323 (1994)
[Man97] Mangasarian, O.L.: Mathematical programming in data mining. Data
Mining and Knowledge Discovery, 1, 183-201 (1997)
[Mif77] Mifflin, R.: Semismooth and semiconvex functions in constrained opti-
mization. SIAM Journal on Control and Optimization, 15, 957-972 (1977)
[MA92] Murphy, P.M., Aha, D.W.: UCI repository of machine learning
databases. Technical report, Department of Information
and Computer Science, University of California, Irvine (1992)
(www.ics.uci.edu/mlearn/MLRepository.html)
[NM65] Nelder, J.A., Mead, R.: A simplex method for function minimization.
Comput. J., 7, 308-313 (1965)
[Pow02] Powell, M.J.D.: UOBYQA: unconstrained optimization by quadratic
approximation. Mathematical Programming, Series B, 92, 555-582 (2002)
[Tho02] Joachims, T.: Learning to Classify Text Using Support Vector Machines.
Kluwer Academic Publishers, Dordrecht (2002)
[Vap95] Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New
York (1995)
A Review of Applications of the Cutting Angle
Methods

Gleb Beliakov

School of Information Technology
Deakin University
221 Burwood Hwy, Burwood, 3125, Australia
gleb@deakin.edu.au

Summary. The theory of abstract convexity provides us with the necessary tools
for building accurate one-sided approximations of functions. Cutting angle methods
have recently emerged as a tool for global optimization of families of abstract convex
functions. Their applicability has subsequently been extended to other problems,
such as scattered data interpolation. This paper reviews three different applications
of cutting angle methods, namely global optimization, generation of nonuniform
random variates and multivariate interpolation.

Key words: Global optimization, Abstract convexity, Cutting angle method,
Random variate generation, Uniform approximation.

1 Introduction
The theory of abstract convexity [Rub00] provides the necessary tools for
building accurate lower and upper approximations of various classes of
functions. Such approximations arise from a generalization of the following
classical result: each convex function is the upper envelope of its affine
minorants [Roc70]. In abstract convex analysis the requirement of linearity of
the minorants is dropped, and abstract convex functions are represented as the
upper envelopes of some simple minorants, or support functions, which are not
necessarily affine. Depending on the choice of the support functions, one obtains
different flavours of abstract convex analysis.
By using a subset of support functions, one obtains an approximation
of an abstract convex function from below. Such a one-sided approximation,
or underestimate, can be very useful in various applications. For instance,
in optimization, the global minimum of the underestimate provides a lower
bound on the global minimum of the objective function. One can find the
global minimum of the objective function as the limiting point of the sequence
of global minima of underestimates. This is the principle of the cutting angle
method of global optimization [ARG99, BR00, Rub00], reviewed in section 3.
This paper discusses two other applications of one-sided approximations.
The second application is the generation of random variates from a given
distribution using the acceptance/rejection approach. Non-uniform random variate
generation is an important task in statistical simulation. The method of
acceptance/rejection consists in approximating the required probability density
from above, using a simpler function, called the hat function. Then the
random variates are generated using a multiple of the hat function as the density,
and these random variates are either accepted or rejected based on the value
of an independent uniform random number. In section 4 we discuss this
approach in detail, and show how one-sided approximation (from above) can be
used to build suitable hat functions.
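The acceptance/rejection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the hat-function construction of section 4: the target density $6x(1-x)$ on $[0,1]$, the constant hat $1.5$ and all function names are hypothetical choices made for the example.

```python
import random

def rejection_sample(density, hat, sample_from_hat, rng):
    """Draw one variate from `density`, assuming hat(x) >= density(x) everywhere."""
    while True:
        x = sample_from_hat(rng)       # candidate drawn from the normalized hat
        u = rng.random()               # independent uniform number on [0, 1)
        if u * hat(x) <= density(x):   # accept with probability density(x) / hat(x)
            return x

def density(x):
    # Illustrative target: the density 6x(1 - x) on [0, 1], whose maximum is 1.5.
    return 6.0 * x * (1.0 - x)

def hat(x):
    # Constant hat function lying above the density on [0, 1].
    return 1.5

def sample_from_hat(rng):
    # A constant hat on [0, 1] normalizes to the uniform density.
    return rng.random()

def draw(count, seed=0):
    rng = random.Random(seed)
    return [rejection_sample(density, hat, sample_from_hat, rng) for _ in range(count)]
```

The tighter the hat function lies above the density, the fewer candidates are rejected, which is why accurate approximations from above matter here.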
The last application comes from the field of scattered data interpolation.
Here we combine the upper and lower approximations of a function known
to us through a set of its values, and obtain an accurate interpolant which, as
we show, solves the best uniform approximation problem.

2 Support functions and lower approximations


2.1 Basic definitions

We will use the following notations.

- $\mathbb{R}^n_+$ denotes the cone of vectors with non-negative components
  $\{x \in \mathbb{R}^n : x_i \ge 0, i = 1, \dots, n\}$;
- $\mathbb{R}^n_{++}$ denotes the cone of vectors with strictly positive components
  $\{x \in \mathbb{R}^n : x_i > 0, i = 1, \dots, n\}$;
- $\mathbb{R}_{+\infty}$ denotes $(-\infty, +\infty]$;
- $S$ denotes the unit simplex $S = \{x \in \mathbb{R}^n : x_i \ge 0, \sum_{i=1}^n x_i = 1\}$;
- $\mathrm{ri}\, S$ is the relative interior of $S$, $\mathrm{ri}\, S = \{x \in \mathbb{R}^n : x_i > 0, \sum_{i=1}^n x_i = 1\}$;
- Index set $I = \{1, 2, \dots, n\}$;
- $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$;
- $x^k \in S$ denotes the k-th vector of some sequence $\{x^k\}_{k=1}^K$;
- Vector inequality $x \ge y$ denotes dominance $x_i \ge y_i, \forall i \in I$.

Definition 1. The function $f : X \to \mathbb{R}$ is called Lipschitz-continuous in X, if
there exists a number M such that $\forall x, y \in X : |f(x) - f(y)| \le M \|x - y\|$. The smallest
such number is called the Lipschitz constant of f in the norm $\|\cdot\|$.
Definition 2. A function $f : \mathbb{R}^n_+ \to \mathbb{R}$ is called IPH (increasing positively
homogeneous of degree one) if
$\forall x, y \in \mathbb{R}^n_+, x \ge y \Rightarrow f(x) \ge f(y)$; $\forall x \in \mathbb{R}^n_+, \forall \lambda \in \mathbb{R}_{++} : f(\lambda x) = \lambda f(x)$.
Let X be some set, and let H be a nonempty set of functions
$h : X \to V \subset [-\infty, +\infty]$. We have the following definitions [Rub00].

Definition 3. A function f is abstract convex with respect to the set of
functions H (or H-convex) if there exists $U \subset H$:
$f(x) = \sup\{h(x) : h \in U\}, \forall x \in X$.
Definition 4. The set U of H-minorants of f is called the support set of f
with respect to the set of functions H:
$\mathrm{supp}(f, H) = \{h \in H, h(x) \le f(x) \ \forall x \in X\}$.
Definition 5. H-subgradient of f at x is a function
$h \in H : f(y) \ge h(y) - (h(x) - f(x)), \forall y \in X$.
The set of all H-subgradients of f at x is called H-subdifferential
$\partial_H f(x) = \{h \in H : \forall y \in X, f(y) \ge h(y) - (h(x) - f(x))\}$.
Definition 6. The set $\partial^*_H f(x)$ at x is defined as
$\partial^*_H f(x) = \{h \in \mathrm{supp}(f, H) : h(x) = f(x)\}$.
Proposition 1. [Rub00], p. 10. If the set H is closed under vertical shifts, i.e.,
$(h \in H, c \in \mathbb{R})$ implies $h - c \in H$, then $\partial^*_H f(x) = \partial_H f(x)$.
Definition 7. Polyhedral distance.
Let P be a finite convex polyhedron in $\mathbb{R}^n$ defined by the intersection of r
half spaces, containing the origin in its interior (Example 7.2 from [DR95])

\[ P = \bigcap_{i=1}^{r} \{x : x \cdot h_i \le 1\}, \tag{1} \]

where $h_i \in \mathbb{R}^n$ are the directional vectors. The polyhedral distance is

\[ d_P(x, y) = \max\{(x - y) \cdot h_i : 1 \le i \le r\}. \]

As a special case consider the distance defined by a simplex centered at 0.
Definition 8. Simplicial distance.
Let P be a simplex defined as the intersection of n + 1 halfspaces (1),
defined by the vectors

\[ h_1 = (-v_1, 0, 0, \dots), \quad h_2 = (0, -v_2, 0, \dots), \quad \dots, \quad h_{n+1} = (v_{n+1}, \dots, v_{n+1}), \tag{2} \]

$v_i > 0$. The simplicial distance is

\[ d_P(x, y) = \max\Big\{ \max_{i=1,\dots,n} v_i (y_i - x_i), \; v_{n+1} \sum_{i=1}^{n} (x_i - y_i) \Big\}. \tag{3} \]

Let us now for the purposes of convenience introduce a slack variable
$x_{n+1} = 1 - \sum_{i=1}^{n} x_i$. With the help of the new coordinate, and using
$\sum_{i=1}^{n} (x_i - y_i) = 1 - \sum_{i=1}^{n} y_i - (1 - \sum_{i=1}^{n} x_i) = y_{n+1} - x_{n+1}$,
we can write (3) in a more symmetric form

\[ d_P(x, y) = \max_{i=1,\dots,n+1} v_i (y_i - x_i). \tag{4} \]
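As a quick numerical check of the equivalence of (3) and (4), the following sketch evaluates the simplicial distance both ways. The function names and the sample vectors are hypothetical; `x` and `y` hold the n free coordinates and `v` holds the n + 1 positive constants.

```python
import numpy as np

def simplicial_distance_3(x, y, v):
    """Formula (3): n coordinate terms plus the term built from the sum."""
    x, y, v = (np.asarray(a, dtype=float) for a in (x, y, v))
    return max(np.max(v[:-1] * (y - x)), v[-1] * np.sum(x - y))

def simplicial_distance_4(x, y, v):
    """Formula (4): the same distance after appending the slack coordinate."""
    x, y, v = (np.asarray(a, dtype=float) for a in (x, y, v))
    x_full = np.append(x, 1.0 - np.sum(x))   # x_{n+1} = 1 - sum of x_i
    y_full = np.append(y, 1.0 - np.sum(y))   # y_{n+1} = 1 - sum of y_i
    return np.max(v * (y_full - x_full))
```

For example, with `x = [0.2, 0.3]`, `y = [0.5, 0.1]` and `v = [1, 2, 3]` both forms give the same value, attained at the first coordinate.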

2.2 Choices of support functions

We start with the classical case of affine support functions [Roc70, Rub00].
Example 1. Let the set H denote the set of all affine functions

\[ H = \{h : h(x) = a \cdot x + b, \; x, a \in \mathbb{R}^n, b \in \mathbb{R}\}. \]

A function $f : \mathbb{R}^n \to \mathbb{R}_{+\infty}$ is H-convex if and only if f is a lower
semicontinuous convex function.
As a consequence of this result, we can approximate convex lower
semicontinuous functions from below using a finite subset of functions from
$\mathrm{supp}(f, H)$. For instance, suppose we know a number of values of the function f
at points $x^k$, $k = 1, \dots, K$. Then the pointwise maximum of the support
functions $h^k$

\[ H^K(x) = \max_{k=1,\dots,K} h^k(x) = \max_{k=1,\dots,K} \big( f(x^k) + A^k \cdot (x - x^k) \big) \tag{5} \]

is a lower approximation, or underestimate, of f. Here $A^k$ denotes a subgradient of
f at $x^k$. The function $H^K$ is a piecewise linear convex function, illustrated on
Fig. 1.
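The underestimate (5) can be sketched directly. In the sketch below the helper name `cutting_plane_underestimate` is hypothetical, and gradients are used as the subgradients $A^k$, which is valid when f is differentiable.

```python
import numpy as np

def cutting_plane_underestimate(f, grad, points):
    """Build H^K of (5) from values and (sub)gradients of f at the given points."""
    points = [np.atleast_1d(np.asarray(p, dtype=float)) for p in points]
    values = [f(p) for p in points]
    grads = [np.atleast_1d(grad(p)) for p in points]

    def H(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # Pointwise maximum of the affine support functions h^k.
        return max(fk + gk @ (x - pk) for fk, gk, pk in zip(values, grads, points))

    return H
```

For the convex function $f(x) = x^2$ with points $-1, 0, 1$, the resulting $H^K$ touches f at those points and lies below it elsewhere.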

Fig. 1. The graph of the function $H^K$ in (5).

Example 2. [Rub00]. Let the set H be the set of min-type functions

\[ H = \{h : h(x) = \min_{i \in I} a_i x_i, \; a \in \mathbb{R}^n_+, x \in \mathbb{R}^n_+\}. \]

A function $f : \mathbb{R}^n_+ \to \mathbb{R}_+$ is H-convex if and only if f is IPH.

As a consequence, we can approximate IPH functions from below using
pointwise maxima of subsets of its support functions,

\[ H^K(x) = \max_{k=1,\dots,K} h^k(x) = \max_{k=1,\dots,K} \min_{i \in I} a_i^k x_i, \tag{6} \]

where $a_i^k = f(x^k)/x_i^k$ if $x_i^k > 0$ and 0 otherwise.

Fig. 2. Saw-tooth underestimate of f in CAM using functions (6).

Further, it is shown in [Rub00] that IPH functions are closely related to
Lipschitz functions, in the sense that every Lipschitz function g defined on
the unit simplex S can be transformed to a restriction of an IPH function f to S
using an additive constant: $f = g + C$, where $C \ge -\min_{x \in S} g(x) + 2M$, where M is
the Lipschitz constant of g in the $l_1$-norm. Thus the underestimate (6) can also
be used to approximate Lipschitz functions on the unit simplex.
Function (6) has a very irregular shape, illustrated on Figs. 2, 3, which is
why it is often called the saw-tooth underestimate (or saw-tooth cover) of f.
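A minimal sketch of the saw-tooth underestimate (6) follows. To keep it short it assumes that all sample points have strictly positive components, so the "0 otherwise" branch of $a_i^k$ never arises; the names `sawtooth_iph` and `f_linear` are hypothetical, and the linear function with non-negative coefficients is just one convenient IPH example.

```python
import numpy as np

def sawtooth_iph(f, points):
    """Build H^K of (6) for an IPH function f from strictly positive points."""
    points = [np.asarray(p, dtype=float) for p in points]
    if any(np.any(p <= 0.0) for p in points):
        raise ValueError("this sketch assumes strictly positive sample points")
    slopes = [f(p) / p for p in points]      # a^k_i = f(x^k) / x^k_i

    def H(x):
        x = np.asarray(x, dtype=float)
        # Maximum over k of the min-type support functions h^k.
        return max(np.min(a * x) for a in slopes)

    return H

def f_linear(x):
    # Illustrative IPH function: linear with non-negative coefficients.
    return float(x[0] + 2.0 * x[1])
```

Each $h^k$ touches f at $x^k$ and lies below f elsewhere, because $x \ge t x^k$ with $t = \min_i x_i / x_i^k$ implies $f(x) \ge t f(x^k)$ for an IPH function.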

Fig. 3. The hypograph of the function $H^K$ in (6).

Example 3. [Rub00]. Let the set H be the set of functions of the form

\[ H = \{h : h(x) = a - C \|x - b\|, \; x, b \in \mathbb{R}^n, a \in \mathbb{R}, C \in \mathbb{R}_+\}. \]

Then $f : \mathbb{R}^n \to \mathbb{R}_{+\infty}$ is H-convex if and only if f is a lower semicontinuous
function. The H-subdifferential of f is not empty if f is Lipschitz.

As a consequence, we can approximate Lipschitz functions from below
using underestimates of the form

\[ H^K(x) = \max_{k=1,\dots,K} h^k(x) = \max_{k=1,\dots,K} \big( f(x^k) - C \|x - x^k\| \big), \tag{7} \]

where $C \ge M$, and M is the Lipschitz constant of f in the norm $\|\cdot\|$.

Example 4. [Bel05]. Let $d_P$ be a simplicial distance function, and let the set
H be the set of functions of the form

\[ H = \{h : h(x) = a - C d_P(x, b), \; x, b \in \mathbb{R}^n, a \in \mathbb{R}, C \in \mathbb{R}_+\}. \]

Then $f : \mathbb{R}^n \to \mathbb{R}_{+\infty}$ is H-convex if and only if f is a lower semicontinuous
function. The H-subdifferential of f is not empty if f is Lipschitz.

Since $d_P$ can also be written as (4), we can use the following underestimate
of a Lipschitz f

\[ H^K(x) = \max_{k=1,\dots,K} \big( f(x^k) - C d_P(x, x^k) \big) = \max_{k=1,\dots,K} \min_{i=1,\dots,n+1} \big( f(x^k) - C_i (x_i^k - x_i) \big), \tag{8} \]

where $C_i = C v_i$, and C satisfies $C d_P(x, y) \ge M \|x - y\|$, where M is the
Lipschitz constant of f in the norm $\|\cdot\|$ [Bel05]. We recall that here we use
a slack variable, as in (4), and the components of $x \in \mathbb{R}^{n+1}$ are restricted by
$\sum_i x_i = 1$. The shape of $H^K$ is illustrated on Figs. 4, 5, and it is also called
the saw-tooth underestimate.

Fig. 4. Univariate saw-tooth underestimate of f using functions (8).

Fig. 5. The hypograph of the function $H^K$ in (8) in the case of two variables.

2.3 Relation to Voronoi diagrams

Consider a set of points $\{x^k\}_{k=1}^K$, $x^k \in \mathbb{R}^n$, called sites.

Definition 9. The set

\[ Vor(x^k) = \{x \in \mathbb{R}^n : \|x - x^k\| \le \|x - x^j\|, \forall j \ne k\} \]

is called the Voronoi cell of $x^k$.
One can choose any norm, or in fact any distance function $d_P$ in this
definition. The collection of Voronoi cells for all sites $x^k$, $k = 1, \dots, K$, is
called the Voronoi diagram of the data set. The Voronoi diagram is one of the
most fundamental data structures of a data set, with a long history [Aur91,
OBSC00, BSTY98]. An example is presented on Fig. 6.

Fig. 6. The Voronoi diagram of a set of sites, and its dual Delaunay triangulation.

There are multiple extensions of the Voronoi diagram, notably those based
on the generalization of the distance function [OBSC00, BSTY98]. One such
generalization is called the additively weighted Voronoi diagram, in which case
each site has an associated weight $w_k$.
Definition 10. Let $\{x^k\}_{k=1}^K$, $x^k \in \mathbb{R}^n$, be the set of sites, and $w \in \mathbb{R}^K$ be the
vector of weights. The set
\[ Vor(x^k, w) = \{x \in \mathbb{R}^n : w_k + \|x - x^k\| \le w_j + \|x - x^j\|, \forall j \ne k\} \]
is called the Additively Weighted Voronoi cell. The collection of such cells is called
the Additively Weighted Voronoi diagram.
Voronoi diagrams and their duals, Delaunay (pre-)triangulations, are very
popular in multivariate scattered data interpolation, e.g., Sibson's natural
neighbour interpolation [Sib81].
Let us show how Voronoi diagrams are related to underestimates (7), (8).
First consider the special case $f = 1$. For the function $H^K$ in (7), and for each
$k = 1, \dots, K$ define the set
\[ S^k = \{x \in \mathbb{R}^n : h^k(x) \ge h^j(x), \forall j \ne k\}. \]
It is easy to show that the sets $S^k$ coincide with the Voronoi cells $Vor(x^k)$.
Indeed, $h^k(x) \ge h^j(x)$ implies $1 - C\|x - x^k\| \ge 1 - C\|x - x^j\|$, and then
$\|x - x^k\| \le \|x - x^j\|$. Furthermore, if we now take $H^K$ in (8), the sets $S^k$
coincide with Voronoi cells in distance $d_P$.
Let us now take an arbitrary Lipschitz f and (7). Consider an additively
weighted Voronoi diagram with weights $w_k$ given as $w_k = -f(x^k)/C$. It is not
difficult to show that Voronoi cells $Vor(x^k, w)$ can be written as
\[ Vor(x^k, w) = \{x \in \mathbb{R}^n : h^k(x) \ge h^j(x), \forall j \ne k\}. \]
The last equation is also valid for other distance functions, and in particular
$d_P$ and $h^k$ in (8).
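The correspondence between the support functions of (7) and additively weighted Voronoi cells can be checked numerically. The sketch below assumes the Euclidean norm and weights $w_k = -f(x^k)/C$, which follow from rearranging $h^k(x) \ge h^j(x)$; the function names and the sample data are hypothetical.

```python
import numpy as np

def cells_from_underestimate(queries, sites, f_values, C):
    """Index k of the support function h^k(x) = f(x^k) - C||x - x^k|| that is largest at x."""
    dist = np.linalg.norm(queries[:, None, :] - sites[None, :, :], axis=2)
    return np.argmax(f_values[None, :] - C * dist, axis=1)

def cells_from_weighted_voronoi(queries, sites, f_values, C):
    """Index k of the additively weighted Voronoi cell containing x, with w_k = -f(x^k)/C."""
    dist = np.linalg.norm(queries[:, None, :] - sites[None, :, :], axis=2)
    weights = -f_values / C
    return np.argmin(weights[None, :] + dist, axis=1)

sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
f_values = np.array([0.0, 0.5, 0.2])          # illustrative values f(x^k)
queries = np.random.default_rng(0).random((200, 2))
```

Both functions assign each query point to the same site, and with constant `f_values` the assignment reduces to the ordinary nearest-site rule.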

This interesting relation of saw-tooth underestimates and Voronoi diagrams
has two implications. Firstly, we can use existing results on the
computational complexity of Voronoi diagrams to estimate the number of "teeth"
of the saw-tooth underestimate, i.e., the number of local minimizers of $H^K$.
These minimizers correspond to the vertices of the Voronoi diagram. It is
known that the number of vertices of the Voronoi diagram grows as $O(K^{\lceil n/2 \rceil})$
in any simplicial distance function or $l_\infty$-metric [BSTY98], where $\lceil a \rceil$ denotes
the smallest integer greater than or equal to a. Thus we obtain an estimate on
the number of local minimizers of $H^K$.
Secondly, we can apply the methods of enumerating local minima of $H^K$
discussed in the next section as a tool for building Voronoi diagrams, and in
particular weighted Voronoi diagrams, as well as their dual Delaunay
triangulations.

3 Optimization: the Cutting Angle method


3.1 Problem formulation

We consider the following global optimization problem. Let f be an H-convex
function on some compact set $D \subset \mathbb{R}^n$. We solve

\[ \min f(x) \quad \text{s.t. } x \in D. \tag{9} \]

Depending on the set H we obtain different classes of abstract convex
functions. Consider the following instances of Problem (9). In the case of
H being the set of affine functions, f is convex and possesses a unique
local minimum. While there are many alternative efficient methods of local
minimization, we consider below the cutting plane method of Kelley [Kel60],
as other instances of Problem (9) essentially rely on the same approach.
If H is the set of min-functions as in Example 2, f is IPH. The class of
IPH functions is quite broad, and includes the following functions on $\mathbb{R}^n_+$ or
$\mathbb{R}^n_{++}$:

1. $f(x) = a^T x$, $a_i \ge 0$;
2. $f(x) = \|x\|_p$, $p > 0$;
3. $f(x) = \prod_{j \in J} x_j^{t_j}$, $J \subset I = \{1, \dots, n\}$, $t_j > 0$, $\sum_{j \in J} t_j = 1$;
4. $f(x) = \sqrt{[Ax, x]}$, where A is a matrix with nonnegative entries
and $[\cdot, \cdot]$ is the usual inner product in $\mathbb{R}^n$.

In addition, since Lipschitz functions on S, modified with a suitable
constant, can be seen as restrictions of IPH functions, we can effectively solve
Lipschitz optimization problems on S or its subsets.

If the set H is chosen as in Examples 3 and 4, f is lower semicontinuous,
and if we require the subdifferential to be non-empty, then f is Lipschitz.
Lipschitz functions appear very frequently in applications [HP95, HPT00,
HJ95, Neu97, Pin96]. The difficulty of the optimization problem in
this case is that the objective function f may possess a huge number of local
minimizers (in some instances 10^° — 10^° [FloOO, LS02], which are impossible
to enumerate (and hence find the global minimum) using local optimization
methods.
Lipschitz properties of / allow one to put accurate bounds on the value of
the global minimum on D and also on parts of D. Those parts of the domain on
which the lower bound is too high are automatically excluded, the technique
known as fathoming. This way a largely reduced subset of D will eventually
be searched for the global minimum, and the majority of local minima of /
can be avoided.

3.2 The Cutting Angle algorithm

Below we present the generalized cutting plane method, of which the cutting angle
method (CAM) is a particular instance, following [Rub00, ARG99, BR00]. The
principle of this method is to replace the original global optimization problem
with a sequence of relaxed problems

\[ \min H^K(x) \quad \text{s.t. } x \in D, \tag{10} \]

$K = 1, 2, \dots$. The sequence of solutions to the relaxed problems converges to
the global minimum of f under very mild assumptions [Rub00].

Generalized Cutting Plane Algorithm

Step 0. (Initialisation)
  0.1 Set $K = 0$.
  0.2 Choose an arbitrary initial point $x^0 \in D$.
Step 1. (Calculate H-subdifferential)
  1.1 Calculate $h^K \in \partial^*_H f(x^K)$.
  1.2 Define $H^K(x) = \max_{k=0,\dots,K} h^k(x)$, for all $x \in D$.
Step 2. (Minimize $H^K$)
  2.1 Solve Problem (10). Let $x^*$ be its solution.
  2.2 Set $K = K + 1$, $x^K = x^*$.
Step 3. (Stopping criterion)
  3.1 If $K < K_{max}$ and $f_{best} - H^K(x^K) > \varepsilon$ go to Step 1.

The relaxed problems (10) must be solved at every iteration of the algorithm,
and as such their solution must be efficient. In the case of convex f we obtain
Kelley's cutting plane method. In this case the relaxed problem can be solved
using linear programming techniques.
For Lipschitz and IPH functions, the relaxed problems are very challenging.
In the univariate case, the above algorithm is known as the Pijavski-Shubert
method [HJ95, Pij72, Shu72, SS00], and many variations of it are available.
However its multivariate generalizations, like Mladineo's method [Mla86], did
not succeed for more than 2-3 variables because of significant computational
challenges [HP95, HPT00].
To solve the relaxed Problem (10) with $H^K$ given by (6), (7) or (8), one
has to enumerate all local minimizers of the saw-tooth underestimate. The
number of these minimizers grows exponentially with the dimension n, and
until recently this task was impractical. Below we review a new method for
enumerating local minimizers of $H^K$, as published in the series of papers
[BR00, BR01, BB02, Bel03].

3.3 Enumeration of local minima

We are concerned with enumerating all local minimizers of the function $H^K$
(6) on S or $D \subset S$, where D is a polytope. This function is illustrated on
Figs. 2, 3. For convenience, let us introduce the support vectors $l^k \in \mathbb{R}^n \cup \infty$

\[ l_i^k = \frac{x_i^k}{f(x^k)}, \text{ if } x_i^k > 0, \text{ or } \infty \text{ otherwise.} \tag{11} \]

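At the K-th iteration of the algorithm we have K support vectors. Consider
ordered combinations of n support vectors, $L = \{l^{k_1}, l^{k_2}, \dots, l^{k_n}\}$, which we
can visualize as an $n \times n$ matrix whose rows are given by the participating support
vectors

\[ L = \begin{pmatrix} l_1^{k_1} & l_2^{k_1} & \dots & l_n^{k_1} \\ l_1^{k_2} & l_2^{k_2} & \dots & l_n^{k_2} \\ \vdots & \vdots & \ddots & \vdots \\ l_1^{k_n} & l_2^{k_n} & \dots & l_n^{k_n} \end{pmatrix}. \tag{12} \]

The following result is proven in [BR00]: every local minimizer $x^*$ of $H^K$ in
ri S corresponds to a combination L satisfying two conditions
(I) $\forall i, j \in I, i \ne j : l_i^{k_i} > l_i^{k_j}$;
(II) $\forall v \in \mathcal{K} \setminus L, \exists i \in I : l_i^{k_i} \le v_i$,
where $\mathcal{K} = \{l^1, l^2, \dots, l^K\}$ is the set of all support vectors. Further, the actual
local minima are found from L using

\[ d = H^K(x^*) = \mathrm{Trace}(L)^{-1}, \qquad x^*(L) = d \, \mathrm{diag}(L). \tag{13} \]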

Condition (I) implies that the diagonal elements dominate their respective
columns, and condition (II) implies that the diagonal of L does not dominate
any other support vector v. Thus we obtain a combinatorial problem of enu-
merating all combinations L that satisfy conditions (I) and (II).
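The two conditions and formula (13) translate into a short check. The sketch below is an illustration only: it reads condition (I) as a strict inequality on each column and condition (II) with a non-strict inequality, which is an assumption about the garbled source, and the function names are hypothetical.

```python
import numpy as np

def satisfies_conditions(L, others):
    """Check conditions (I) and (II) for a combination L (rows are support vectors)."""
    L = np.asarray(L, dtype=float)
    diag = np.diag(L)
    n = L.shape[0]
    # (I): each diagonal element strictly dominates the rest of its column.
    for i in range(n):
        for j in range(n):
            if i != j and not diag[i] > L[j, i]:
                return False
    # (II): the diagonal must not dominate any other support vector v,
    # i.e. for every v some component satisfies diag_i <= v_i.
    for v in others:
        if not np.any(diag <= np.asarray(v, dtype=float)):
            return False
    return True

def local_minimum(L):
    """Value d and location x* of the local minimum given by formula (13)."""
    L = np.asarray(L, dtype=float)
    d = 1.0 / np.trace(L)
    return d, d * np.diag(L)
```

For instance, the matrix with rows `(0.5, 0.1)` and `(0.2, 0.5)` satisfies (I) and yields `d = 1` and `x* = (0.5, 0.5)`, a point of the unit simplex.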
It is infeasible to enumerate all such combinations directly for large K.
Fortunately there is no need to do so. It was shown in [BB02, Bel03, Bel04]
that the required combinations can be put into a tree structure. The leaves of
the tree correspond to the local minimizers of $H^K$, whereas the intermediate
nodes correspond to the minimizers of $H^n, H^{n+1}, \dots, H^{K-1}$. Such a tree is
illustrated on Fig. 7. The use of the tree structure makes the algorithm very
efficient numerically (as processing queries using trees requires time
logarithmic in the number of nodes).

Fig. 7. The tree of combinations of support vectors L that satisfy conditions (I)
and (II) and define local minima of $H^K$.

To enumerate local minimizers in a polytope $D \subset S$ one proceeds as
follows. Using the enumeration technique from [BB02, Bel03], find all local
minimizers on ri S. Each such minimizer has an associated set A(L) on which
it is unique. The set A(L) is characterized by [Bel03, Bel04]

Fig. 8. Sets A(L) on which the saw-tooth underestimate has a unique local minimum.
Two such sets are shown. Black circles denote points $x^k$.

3 ^ 3 ' ' /'~ T ' ^^ '

XiXj' > XjX^\ ij e lyi < j . (14)

The sets A(L) form a nonintersecting partition of S. They are illustrated on
Fig. 8.
For each local minimizer $x^*$ on ri S we can have three situations: a) $x^* \in D$,
in which case we just record it, b) $x^* \notin D$ and $A(L) \cap D = \emptyset$, in which case
we discard $x^*$, and c) $A(L) \cap D \ne \emptyset$, in which case we look for a constrained
minimum on the boundary of D. This can be done by solving an optimization
problem

$$\begin{aligned} \min_x\ & \max_{i \in I}\ l_i^{k_i} x_i \\ \text{s.t. } & x \in A(L) \cap D, \end{aligned} \tag{15}$$

which is subsequently transformed into a linear programming problem. To do
this, introduce an auxiliary variable $a = \max_{i \in I} l_i^{k_i} x_i$, and write (15) as

$$\begin{aligned} \min\ & a \\ \text{s.t. } & \forall i \in I:\ a - l_i^{k_i} x_i \ge 0, \\ & x \in A(L) \cap D, \end{aligned} \tag{16}$$

and recall that the set $A(L)$ is an intersection of halfspaces (14) and $D$ is a
polytope. The details are given in [Bel04].
Consider now functions (8), illustrated in Figs. 4 and 5. In this case we can use
a similar enumeration technique. Define the support vectors

$$l_i^k = \frac{f(x^k)}{C_i} - x_i^k. \tag{17}$$
Form ordered combinations of $n + 1$ support vectors $L$ (12). We have the
following result [Bel05]: every local minimizer of $H^K$ corresponds to a
combination $L$ that satisfies conditions (I) and (II) above, and the actual minima
are found from

$$d = H^K(x^*) = \frac{\mathrm{Trace}(L) + 1}{C}, \tag{18}$$

where $C = \sum_{i \in I} \frac{1}{C_i}$.
The sets A{L), on which each local minimum is unique, are characterized
by
$$\forall i, j \in \{1, \ldots, n+1\},\ i \neq j:\quad C_j (x_j - x_j^{k_j}) < C_i (x_i - x_i^{k_j}). \tag{19}$$

3.4 Numerical experiments

We performed extensive testing of various versions of CAM on test and real life
problems [BB02, Bel03, Bel04, BTMRB03, LBB03, LBB03]. In this section,
to indicate the performance of the algorithm, we present a selection of results
of numerical experiments. We took the following test optimization problems.
Test Problem 1 (Six-hump camel back function)

$$f(x) = \left(4 - 2.1 x_1^2 + \frac{x_1^4}{3}\right) x_1^2 + x_1 x_2 + 4 (x_2^2 - 1) x_2^2,$$

$$-2 \le x_i \le 2, \quad i = 1, 2.$$

Test Problem 2 ([HPT00],p.261)


10
10 ^

$0 \le x_i \le 10$, $i = 1, 2$.
Parameters $a^i$ and $c_i$ are given in [HPT00], p.262.
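Test Problem 3 [HJ95]

$$f(x) = \sin(x_1)\sin(x_1 x_2)\sin(x_1 x_2 x_3), \qquad 0 \le x_i \le 4.$$

Test Problem 4 (Griewank's function)

$$f(x) = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1, \qquad -50 \le x_i \le 50.$$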
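For readers who wish to reproduce the comparison, the following minimal Python
sketch evaluates Test Problems 1, 3 and 4. It assumes the standard textbook
forms of the six-hump camel back and Griewank functions; the function names are
illustrative and are not taken from the CAM implementation. Test Problem 2 is
omitted because its parameters are tabulated in [HPT00].

```python
import math


def six_hump_camel(x):
    """Test Problem 1 (assumed standard form), -2 <= x_i <= 2."""
    x1, x2 = x
    return ((4.0 - 2.1 * x1**2 + x1**4 / 3.0) * x1**2
            + x1 * x2
            + 4.0 * (x2**2 - 1.0) * x2**2)


def sine_product(x):
    """Test Problem 3: product of three nested sines."""
    x1, x2, x3 = x
    return math.sin(x1) * math.sin(x1 * x2) * math.sin(x1 * x2 * x3)


def griewank(x):
    """Test Problem 4 (assumed standard form) in any dimension n = len(x)."""
    quadratic = sum(xi * xi for xi in x) / 4000.0
    product = math.prod(math.cos(xi / math.sqrt(i))
                        for i, xi in enumerate(x, start=1))
    return quadratic - product + 1.0
```

Evaluating `six_hump_camel` near the point (0.0898, -0.7126) gives approximately
-1.0316, in agreement with the last column of Table 1.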

In Table 1 we compare the performance of CAM, which uses underestimate
(6), and Extended CAM (ECAM), which uses underestimate (8). For functions
1-3, ECAM was able to compute the same lower bound on the global minimum
using fewer function evaluations (and significantly less time) than CAM. For
function 4, we ran both the CAM and ECAM algorithms for the same number of
iterations (function evaluations), and compared the values of the lower bound
on the global minimum. It appears from Table 1 that ECAM consistently
produces better results than CAM. This is not surprising, as all test problems
involve Lipschitz functions. Approximation (6) used in CAM is more suitable
for IPH functions, and the conversion of Lipschitz objective functions to IPH
functions resulted in a somewhat less efficient algorithm than ECAM.

3.5 Applications

Various versions of CAM have been applied to real life practical problems.
In [BRY01, BRY02] the authors successfully used CAM in problems of
supervised classification. In particular they applied CAM to automatic
classification of medical diagnoses. In [BRS03] the same authors extended the use
of CAM to unsupervised classification problems.
CAM has been applied as a tool to find parameters of a function in uni-
variate and multivariate nonlinear approximation. [Bel03] applies CAM to op-
timize position of knots in univariate spline approximation, whereas in [Bel02]
CAM was used to fit aggregation operators to empirical data.
Recently we applied CAM to the molecular structure prediction problem
[Neu97, Flo00, LBB03]. This is a very challenging problem in computational
chemistry, which consists in predicting the geometry of a molecule by
minimizing its potential energy as a function of atomic coordinates. We chose the
benchmark problem of unsolvated met-enkephalin [Flo00, LBB03]. As
independent variables we used the 24 dihedral angles of this pentapeptide, and
following [Flo00], 10 of the dihedral angles (the backbone) were used as global
variables in ECAM, while the rest were treated as local variables (i.e., each
function evaluation involved a local optimization problem with respect to the
dihedral angles treated as local variables). This objective function (the
potential energy) has on the order of $10^{11}$ local minima. The problem is
very challenging because of the existence of several strong local minima which
trap local descent algorithms. For instance, all reported multistart local search
algorithms failed to identify the global minimum [Flo00].
Previously we reported that a combination of CAM with local search al-
gorithms allowed us to locate the global minimum of the potential energy
function in 120,000 iterations of CAM, which took 4740 seconds (79 min) on
a cluster of 36 DEC Alpha workstations (1 MHz processors) [LBB03, LBB03].
Using ECAM and the same hardware and software configuration the global
minimum was found in 80,000 iterations, which took 50 min on the cluster of
36 DEC Alpha workstations.

It is worth noting that CAM can be efficiently parallelized to take
advantage of the distributed memory architecture of computer clusters. Various
branches of the tree of local minima are stored on different processors, and
are processed independently of each other. This allows one to use the combined
RAM of many processors. Our experiments with parallelization of CAM are
described in [BTMRB03, BTM01].

Table 1. Comparison of performance of CAM and Extended CAM on a set of test
problems. CPU time is measured on a Pentium 4 1.2 GHz PC with 512 MB RAM, under
Windows XP. The algorithms were implemented in C++ (Visual C++ 6
compiler). The values in the last column are the global minima of the functions,
found by a local descent algorithm starting from the approximate minimum found
by CAM/ECAM.

Problem    n   Iterations   CPU     Upper bound   Lower     Solution improved
                            (sec)   f_best        bound     by local method
1 (CAM)    2   30000        3.12    -1.0302       -1.07     -1.03163
1 (ECAM)   2   10000        1.31    -1.0316       -1.07     -1.03163
2 (CAM)    2   10000        1.10    -2.1452       -2.152    -2.14520
2 (ECAM)   2   10000        1.03    -2.1452       -2.148    -2.14520
3 (CAM)    3   40000        21.5    -0.999        -1.09     -1
3 (ECAM)   3   10000        2.7     -0.9998       -1.10     -1
4 (CAM)    2   10000        0.99    0.0022        -0.61     0
4 (ECAM)   2   10000        1.30    0.000012      -0.06     0
4 (CAM)    3   40000        21.1    0.0071        -0.41     0
4 (ECAM)   3   40000        17.2    0.0058        -0.138    0
4 (CAM)    4   60000        380     0.00          -1.02     0
4 (ECAM)   4   60000        231     0.00          -0.91     0
4 (CAM)    5   90000        523     0.00          -1.18     0
4 (ECAM)   5   90000        460     0.00          -0.51     0

4 Random variate generation: acceptance/rejection

4.1 Problem formulation

Efficient non-uniform random number generators are important in many
applications, such as Markov Chain sampling. Many specialized algorithms for a
variety of standard distributions are available; however, more recently so-called
black box methods have attracted substantial attention [HLD04]. These
methods are applicable to a large class of distributions, but require a setup stage
and are generally less efficient than the specialized methods. The monographs
[Dag88, Dev86, HLD04] present a wide range of methods used in this area.

There are two main approaches for generating random numbers from
arbitrary distributions. The inversion method relies on knowledge of the inverse
$P^{-1}(y)$ of the cumulative distribution function $P(x)$. If this inverse is given
explicitly, then one generates uniformly distributed random numbers $Z$ and
transforms them to $X$ using $X = P^{-1}(Z)$. This approach is very useful when
distributions are simple enough to find $P^{-1}$ analytically; however, in the case of
more complicated distributions, $P^{-1}$ may not be available, and one has to invert
$P$ numerically by solving the equation $Z = P(X)$ for $X$, e.g., using bisection or
Newton's method. Given the slowness of numerical solution, this method
becomes very inefficient. This method cannot be used for multivariate densities.
The second approach, the so-called acceptance/rejection method, relies on
efficient generation of random numbers from another distribution, whose density
$h(x)$, multiplied by a suitable positive constant, dominates the density $p(x)$ of
the required distribution: $\forall x \in \mathrm{Dom}[p]: p(x) \le g(x) = c\,h(x)$. The function
$g(x)$ is often called the hat function of the distribution with density $p$. In this
case we need two independent random variates, a random number $X$ with
density $h(x)$ and a uniform random number $Z$ on $[0,1]$. If $Z g(X) \le p(X)$,
then $X$ is accepted (and returned by the generator), otherwise $X$ is rejected,
and the process repeats until some $X$ is accepted.
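The accept/reject loop described above can be sketched in a few lines of Python.
This is an illustration only: the target density $p(x) = 6x(1-x)$ on $[0,1]$, the
constant hat $g = 1.5$ and all function names are assumptions chosen for the
example, not part of the original text.

```python
import random


def accept_reject(p, sample_h, g, rng):
    """Return one variate with density proportional to p, given a hat g >= p.

    sample_h(rng) must return a variate X whose density is h = g / c.
    """
    while True:
        x = sample_h(rng)       # candidate X with density h
        z = rng.random()        # independent uniform Z on [0, 1)
        if z * g(x) <= p(x):    # acceptance test Z g(X) <= p(X)
            return x


def p(x):
    """Illustrative target density on [0, 1] (Beta(2, 2))."""
    return 6.0 * x * (1.0 - x)


def g(x):
    """Constant hat: max p = 1.5, so g = c h with c = 1.5 and h uniform."""
    return 1.5


rng = random.Random(0)
samples = [accept_reject(p, lambda r: r.random(), g, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

With this hat the expected acceptance ratio is $1/c = 2/3$; a tighter hat would
raise it, which is the motivation for the constructions that follow.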
The acceptance/rejection approach does not rely on the analytic form of
the distribution or its inversion. However, its effectiveness depends on how
accurately $p$ is approximated from above by the hat function. The less accurate
the approximation, the greater the chance of rejection (and hence the
inefficiency of the algorithm). A number of important inequalities relating densities
of various distributions are presented in [Dev86]. These inequalities allow one
to choose an appropriate hat function for a given $p$.
The acceptance/rejection approach generalizes well to multivariate
distributions. In fact, this method does not change at all if $X$ is a random vector
rather than a random number. The challenge lies in the efficient construction
of the hat function for a multivariate density $p(x)$, and in finding an efficient
way to sample from the distribution defined by this hat function. With
increasing dimension, the need for a tight upper approximation to $p$ becomes
more important, as the amount of computation wasted on each rejected $X$
increases.
Subdivision of the domain of $p$ is frequently used in universal random
number generators [Hor95, LH98, LH01]. If little information about $p$ is
available (i.e., no analytical form), a piecewise constant (or piecewise linear) hat
function can be used. It is constructed by taking values of $p$ at a number of
points (Fig. 9). For instance, some methods use concavity of $p$ to guarantee
that such an approximation overestimates $p$, whereas in [Hor95, LH98, LH01]
the log-concavity or T-concavity is exploited. A density $p$ is called log-concave
(or T-concave for a monotone continuous function $T$), if the transformed
density $\tilde p = \ln(p)$ (or $\tilde p = T(p)$) is concave. In [ES98] the authors rely on detecting
the inflection points of $p$ in their construction of the hat function.

Fig. 9. A piecewise constant upper semicontinuous hat function (thick solid line)
that approximates a monotone density $p$.

However, regardless of the way the hat function is obtained at this
preprocessing step, the random numbers are always generated in a similar
fashion. First the interval is chosen using a universal discrete generator (e.g., using
the alias method [Dev86, Wal74]). Then a random variate $X$ is generated that has
a multiple of the hat function on this interval as its density. Then $X$ is
either accepted or rejected (according to whether $Z g(X) \le p(X)$ for a uniform
random variate $Z$ on $[0,1]$). In case of rejection we have to restart from the
first step. The intervals are chosen with probabilities proportional to the area
under $g$ on each interval.
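The discrete selection step can be carried out in constant time per draw with
the alias method. The sketch below uses Vose's variant of Walker's table
construction; the names `build_alias` and `draw_alias` are illustrative, and the
weights stand for the areas under $g$ on the intervals.

```python
import random


def build_alias(weights):
    """Build the two alias tables (prob, alias) for the given positive weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]   # average scaled weight is 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l        # column s is topped up by l
        scaled[l] -= 1.0 - scaled[s]            # l donated the missing mass
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias                          # leftovers keep prob 1


def draw_alias(prob, alias, rng):
    """Draw one index: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


areas = [1.0, 2.0, 3.0, 4.0]
prob, alias = build_alias(areas)
index = draw_alias(prob, alias, random.Random(0))
```

Table construction takes time linear in the number of intervals, which is
negligible next to the cost of building the hat function itself.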
It is clear that the form of the hat function $g$ on each interval of the
subdivision need not be the same. While constant or linear functions can be
used for some intervals, on intervals where $p$ has a vertical asymptote, or
on infinite intervals (for the tails of the distributions) other forms are more
appropriate (e.g., multiples of Pareto or Cauchy tails). It is also clear that
the multivariate case can be treated in exactly the same way, by partitioning
the domain into small regions. For T-concave distributions such a method is
described in [LH98].
Hence, efficient universal generators of non-uniform random numbers or
random vectors can be built in a standardized fashion, by partitioning the
domain of p, and constructing a piecewise continuous hat function. The prob-
lem is how to build an accurate upper approximation that can serve as a hat
function. In this section we review the methods of building the hat functions
based on one-sided approximations discussed earlier in this paper.

4.2 Log-concave densities

The envelope representation of convex functions and the one-sided
approximation of type (5) have been used to construct hat functions of univariate
log-concave densities for some time [Dev86, HLD04]. In [LH98] the authors
described the transformed density rejection approach applicable to multivariate
T-concave distributions. Consider a continuous strictly increasing function $T$.
A density $p$ is called T-concave if the transformed density $\tilde p = T(p)$ is concave.
A typical example is $T := \ln$, in which case $p$ is called log-concave.
Let us define a convex function $f = -T(p)$. We shall build an underestimate
$H^K$ of $f$ using Eq. (5), and then change its sign to obtain the overestimate
$g^K = -H^K$. After this, the hat function of $p$ is computed as $g = T^{-1}(g^K)$.
In the univariate case, generation of random numbers using a multiple of
the hat function $g = T^{-1}(g^K)$ as the density is quite simple. Firstly, one
calculates the intersections of the linear segments of the function
$H^K(x) = \max_k \left(f(x^k) + A^k (x - x^k)\right)$, which gives a partition of the domain
of $p$ into subintervals. $H^K$ is linear on each subinterval, and since $T$ is given,
generation of random numbers on each subinterval using $g$ as the hat function
is easily achieved by inversion [Dev86, HLD04]. The choice of the subinterval
is performed using a discrete random number generator.
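As a concrete illustration of this construction for $T = \ln$, the sketch below
builds the tangent underestimate of $f = -\ln p$ and the resulting hat
$g = \exp(-H^K)$. The unnormalized standard normal density, the choice of
nodes and the name `tangent_hat` are assumptions made for the example; the
derivative of $f$ plays the role of the slope $A^k$.

```python
import math


def tangent_hat(f, df, nodes):
    """Hat g = exp(-H^K), where H^K(x) = max_k (f(x^k) + f'(x^k) (x - x^k))."""
    def hat(x):
        h = max(f(xk) + df(xk) * (x - xk) for xk in nodes)
        return math.exp(-h)
    return hat


def p(x):
    """Illustrative log-concave density (unnormalized standard normal)."""
    return math.exp(-0.5 * x * x)


def f(x):
    """Convex function f = -ln p."""
    return 0.5 * x * x


def df(x):
    """Derivative of f, used as the slope of the tangent at each node."""
    return x


nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
hat = tangent_hat(f, df, nodes)
```

Because each tangent lies below the convex function $f$, the hat dominates $p$
everywhere and touches it at the nodes; on each subinterval it is an exponential
of a linear function, which is why inversion is straightforward there.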
The multivariate case proceeds in a similar fashion, but with a more
complicated generation step. The authors of [LH98] use the piecewise linear function
$H^K$ (5) to build the hat function $g = T^{-1}(-H^K)$. Then they determine the
partition of the domain of $p$ into a set of convex polyhedra (bounded or
unbounded), so that on each polyhedron $H^K$ is linear. Then the authors use
the sweep-plane algorithm to generate random vectors on each polyhedron.
As earlier, the choice of the polyhedron is performed using a discrete random
number generator. The programming library UNURAN implements several
universal random variate generation algorithms for T-concave densities [HLD04].

4.3 Univariate Lipschitz densities

In this section we consider univariate Lipschitz-continuous densities $p$ on a
compact set. As we mentioned earlier, infinite domains can be treated by
splitting them into a compact and a semi-infinite interval (say, $[0, a]$, $[a, \infty)$).
The hat function of the tail of the distribution on $[a, \infty)$ can be a multiple
of the Pareto heavy-tail density, $g(x) = c/x^{\alpha+1}$, and will not be treated here.
We are interested in the compact subdomain $[0, a]$ (or $[a, b]$ for generality).
Let us subdivide the interval $[a, b]$ into a finite number of subintervals

$$[a, b] = \bigcup_{k=1,\ldots,K-1} [x^k, x^{k+1}],$$

whose interiors do not intersect: $(x^k, x^{k+1}) \cap (x^j, x^{j+1}) = \emptyset$ if $j \neq k$.


Lipschitz continuity can be exploited in order to put upper and lower
bounds on the values of $p$ on any subinterval $[x^k, x^{k+1}]$, given its values at the
ends $p_k = p(x^k)$, $p_{k+1} = p(x^{k+1})$, namely

$$\max\{p_k - M|x^k - x|,\ p_{k+1} - M|x^{k+1} - x|\} \le p(x) \le \min\{p_k + M|x^k - x|,\ p_{k+1} + M|x^{k+1} - x|\}, \quad x \in [x^k, x^{k+1}].$$

As earlier, $M$ denotes the Lipschitz constant of $p$. By having $K$ values of
$p$ on $[a, b]$ we can build the saw-tooth overestimate of $p$, which we can use as
the hat function

$$g^K(x) = \min_k \left(p_k + M|x^k - x|\right). \tag{20}$$

One can recognize Eq. (7), in which we use $f = -p$ and $H^K = -g^K$, as we are
interested in the upper, rather than lower, approximation.
The use of saw-tooth overestimates as hat functions in the acceptance/rejection
approach was described in [Dev86], p.348. The process of building the
saw-tooth overestimate of $p$ can be organized very efficiently (in $O(K \log K)$
operations), and the points $x^k$ can be chosen either randomly on $[a, b]$, or,
which is more efficient, by one of the schemes described in [HJ95,
SS00]. For example, in the Pijavski-Shubert algorithm [Pij72], given a set of $K$
function values $p_k$, $k = 1, \ldots, K$, one chooses the $(K+1)$-st point at the global
maximum of the function $g^K(x)$ in (20). The global maximum of (20) is found
by sorting all local maxima (the teeth of the saw-tooth cover). This way
the saw-tooth overestimate tends to be closer to $p$, which reduces the chance
of rejection.
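A minimal sketch of the saw-tooth overestimate (20) with Pijavski-Shubert point
selection is given below. For brevity it rescans all teeth at every iteration,
which costs $O(K^2)$ overall; the $O(K \log K)$ organization mentioned above
would keep the teeth in a priority queue. The density $p(x) = 1 + 0.5\sin(4\pi x)$
on $[0,1]$ and the overestimated Lipschitz constant $M = 6.5$ are assumptions
chosen for illustration.

```python
import bisect
import math


def sawtooth(xs, ps, M):
    """Saw-tooth overestimate g^K(x) = min_k (p_k + M |x^k - x|) of Eq. (20)."""
    def g(x):
        return min(pk + M * abs(xk - x) for xk, pk in zip(xs, ps))
    return g


def teeth(xs, ps, M):
    """(value, location) of the local maximum of g^K on each [x^k, x^{k+1}]."""
    out = []
    for k in range(len(xs) - 1):
        xi = 0.5 * (xs[k] + xs[k + 1]) + (ps[k + 1] - ps[k]) / (2.0 * M)
        val = 0.5 * (ps[k] + ps[k + 1]) + 0.5 * M * (xs[k + 1] - xs[k])
        out.append((val, xi))
    return out


def pijavski_shubert(p, a, b, M, K):
    """Choose K nodes on [a, b]; each new node is placed at the highest tooth."""
    xs, ps = [a, b], [p(a), p(b)]
    while len(xs) < K:
        _, xi = max(teeth(xs, ps, M))
        i = bisect.bisect(xs, xi)      # keep the node list sorted
        xs.insert(i, xi)
        ps.insert(i, p(xi))
    return xs, ps


def p(x):
    """Illustrative bimodal Lipschitz density on [0, 1] (|p'| <= 2 pi)."""
    return 1.0 + 0.5 * math.sin(4.0 * math.pi * x)


M = 6.5                                # safe overestimate of 2 pi
xs, ps = pijavski_shubert(p, 0.0, 1.0, M, 50)
g = sawtooth(xs, ps, M)
```

The closed-form tooth location follows from equating the two adjacent cones,
$p_k + M(x - x^k) = p_{k+1} + M(x^{k+1} - x)$.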
There are two ways to proceed with building the hat function after the
saw-tooth overestimate is built. Firstly, we can use a constant hat function
$g(x) = \max_{t \in [x^k, x^{k+1}]} g^K(t)$, $x \in [x^k, x^{k+1}]$, on every subinterval
$[x^k, x^{k+1}]$, $k = 1, \ldots, K-1$.
Secondly, we can use the saw-tooth overestimate itself as the hat function,
$g(x) = g^K(x)$, in which case we need to divide $[a, b]$ into twice as many
subintervals $[x^k, \xi^k]$, $[\xi^k, x^{k+1}]$, $k = 1, \ldots, K-1$, where $\xi^k$ is the local
maximizer of $g^K$, $\xi^k = \arg\max_{x \in [x^k, x^{k+1}]} g^K(x)$. On each subinterval the hat
function is linear, and the random variate $X$ with (a multiple of) such density,
as required by the acceptance/rejection method, is generated using inversion
(Fig. 10).
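Inversion on a single linear piece has a closed form. The following sketch,
with the illustrative name `sample_linear_piece`, maps a uniform number $u$ to a
variate whose density is proportional to the linear function through
$(x_0, y_0)$ and $(x_1, y_1)$; it assumes $y_0, y_1 \ge 0$ and $x_0 < x_1$.

```python
import math


def sample_linear_piece(x0, y0, x1, y1, u):
    """Invert the CDF of a density proportional to a linear function.

    Solves y0 t + slope t^2 / 2 = u * area for t = x - x0, written in a form
    that stays stable when the slope is zero or close to zero.
    """
    area = 0.5 * (y0 + y1) * (x1 - x0)
    slope = (y1 - y0) / (x1 - x0)
    disc = y0 * y0 + 2.0 * slope * u * area
    denom = y0 + math.sqrt(max(disc, 0.0))
    if denom == 0.0:                 # only possible when y0 = 0 and u = 0
        return x0
    return x0 + 2.0 * u * area / denom


x = sample_linear_piece(0.0, 1.0, 2.0, 3.0, 0.5)
```

In a generator, $u$ would be a fresh uniform random number and the piece would
first be selected with probability proportional to its area.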
It is worth noting that the described approach is applicable to multimodal
distributions (as opposed to T-concave distributions in [Hor95, LH01]).
However, this method requires knowledge of the Lipschitz constant $M$ of $p$, which
is a crucial piece of information. If unknown, the Lipschitz constant can be
safely overestimated, at the price of a less accurate upper approximation. In
references [WZ96, SL97, SS00, Ser03] various methods of estimating Lipschitz
constants are developed. These methods are based only on the ability to
compute the values of $p$, not on its analytic formula. On the other hand, the value
of $M$ can sometimes follow from theoretical considerations.
Using saw-tooth overestimates as hat functions requires more function
values $K$ than methods applicable to T-concave distributions, which translates
into a longer preprocessing step (building the saw-tooth overestimate and the
tables for the alias method) and longer tables in the alias method, but not into a
longer generation time once preprocessing has been completed.

Fig. 10. A piecewise linear hat function $g$ built using the saw-tooth overestimate
in the univariate case.

Fig. 11. A piecewise constant hat function $g$ built using the saw-tooth overestimate
in the univariate case. The value $g_k$ is chosen as the absolute maximum of the
saw-tooth overestimate on each $D_k$.
One variation of this method is to use shorter tables (i.e., fewer subintervals),
but to improve the overestimate of the maximum of $p$ on each subinterval.
Previously we assumed that this overestimate is the maximum of
$g^K$ on $[x^k, x^{k+1}]$. It is possible to improve this value by subdividing
these subintervals in search of the global maximum of $p$ on them, without
recording the finer partition. This can be done by applying the Pijavski-Shubert
algorithm on each $[x^k, x^{k+1}]$, and then taking as the hat function the
piecewise constant function $g$ whose values are given by the tightened overestimates
of the global maximum of $p$ on each $[x^k, x^{k+1}]$ (Fig. 11).

4.4 Lipschitz densities in $\mathbb{R}^n$

Consider generation of random vectors $X$ with density $p$ on a compact subset
$A \subset \mathbb{R}^n$ using the acceptance/rejection approach. We shall use the unit simplex
$S$ as the set $A$, but it is not difficult to modify this method for subsets of $S$,
like polytopes $D \subset S$.
We consider a Lipschitz continuous density p on S. Treatment of the tails
is outside the scope of this paper. Our goal is to build a partition of S into
simple polytopes (e.g., simplices) on which we shall (narrowly) overestimate
p with a constant function. This piecewise constant upper approximation will
be our hat function in the acceptance/ rejection approach.
Because Lipschitz functions on $S$ can be seen as restrictions of a suitable
IPH function (see discussion after Example 2), we will use the underestimate
(6) in our computations. Let us define an auxiliary IPH function $f = -p + C$,
with $C > \max_{x \in S} p(x) + 2M$. Using the values of $f$ at $x^k$, $k = 1, \ldots, K$,
build the underestimate $H^K$ (6). At this stage, we can take the function
$g^K = -H^K + C$ as the overestimate of $p$, and use it in the acceptance/rejection
algorithm. However, this is extremely inconvenient, because it is hard
to build a random variate generator which uses such a complicated $g^K$ as the
density.
Instead, we will use a simpler piecewise constant hat function. We know
that the function $H^K$ is piecewise linear, and possesses a number of local
minima, which can be identified from combinations of support vectors (12) using
Eq. (13). We further know that on sets $A(L)$ characterized by (14), each local
minimum is unique, and these sets form a partition of $S$. Define the following
piecewise constant underestimate of $f$

$$H(x) = d(L), \quad \text{if } x \in A(L),$$

where $L$ is the combination of support vectors which identifies the minimizer
$x^*$, and $A(L)$ is the set (14) on which it is unique. Now we take $g = -H + C$
as the hat function.
We now need an efficient method of generating random variates with a
multiple of the hat function as the density. In our case the hat function is
piecewise constant, which means that we can generate random variates in two
steps: 1) randomly choose an element of the partition $A(L)$, with probability
proportional to the volume of $A(L)$ times the value of the hat function on it,
$C - d(L)$; 2) generate $X$ uniformly distributed on $A(L)$. The first step requires
an efficient discrete random variate generator. We can use the alias method
[Dev86, Wal74] for this purpose. The second step requires additional
processing, as generation of random variates on a polytope requires its triangulation.
Generation of random variates uniformly distributed in a simplex is rela-
tively easy using sorting or uniform spacings [Dev86],p.214. The way to gen-
erate uniform random variates on a general polytope A{L) is to subdivide
it into simplices, the procedure known as triangulation. Further, it is easy
to compute the volume of a polytope given its triangulation. Hence we will
triangulate every polytope A{L) as part of the preprocessing.
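Uniform generation in a simplex via uniform spacings can be sketched as follows.
The function names are illustrative. The spacings between sorted uniform numbers
give barycentric weights that are uniformly distributed on the unit simplex, and
these weights are then applied to the vertices of the target simplex.

```python
import random


def uniform_in_unit_simplex(n, rng):
    """Weights R_1..R_n >= 0 with sum 1, uniform on the unit simplex."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    points = [0.0] + cuts + [1.0]
    return [points[j + 1] - points[j] for j in range(n)]   # uniform spacings


def uniform_in_simplex(vertices, rng):
    """Point X = sum_j R_j v_j, uniform in the simplex with given vertices."""
    r = uniform_in_unit_simplex(len(vertices), rng)
    dim = len(vertices[0])
    return [sum(r[j] * vertices[j][i] for j in range(len(vertices)))
            for i in range(dim)]


rng = random.Random(0)
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
points = [uniform_in_simplex(triangle, rng) for _ in range(20000)]
```

For the triangle used here the sample mean should approach the centroid
$(1/3, 1/3)$.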
For our purposes any triangulation of the polytope is suitable, and we
used the revised Cohen and Hickey triangulation as described in [BEF00].
This triangulation method requires the vertex representation of the polytope
$A(L)$, whereas it is given as the set of inequalities (14). The calculation of the
vertex representation of $A(L)$ can be done using the Double Description method
[FP96, MRT53]. The software package CDD, which implements the Double
Description method, is available from [Fuk05]. The software package Vinci,
available from [Eng05], can be used for the revised Cohen and Hickey
triangulation.
Once the triangulation of the sets $A(L)$ is done, the volume of each simplex
needs to be computed and multiplied by the value of the hat function $g$ on
it. The volume computation is performed by taking the determinant of an
$n \times n$ matrix of vertex coordinates [BEF00]. The vertices and volumes (times
the value of $g$) of the simplices that partition the domain of $p$ are stored for
the random vector generator.
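The volume computation can be sketched as below, under the assumption that each
simplex has $n$ vertices lying in the unit simplex of $\mathbb{R}^n$ (the hyperplane
$\sum_i x_i = 1$). In that case the pyramid with apex at the origin has volume
$|\det V|/n!$ and height $1/\sqrt{n}$, so the $(n-1)$-dimensional volume of the
simplex is $|\det V|\sqrt{n}/(n-1)!$. The function names are illustrative, and
only the relative values matter for the discrete generator.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det                      # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def simplex_volume(vertices):
    """(n-1)-volume of a simplex with n vertices on the plane sum(x) = 1."""
    n = len(vertices)
    return abs(determinant(vertices)) * math.sqrt(n) / math.factorial(n - 1)


def simplex_weight(vertices, hat_value):
    """Unnormalized selection probability: volume times the hat value."""
    return simplex_volume(vertices) * hat_value
```

For example, the unit simplex in $\mathbb{R}^3$ is a triangle with side $\sqrt{2}$, and
the formula gives its area $\sqrt{3}/2$.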
Summarizing this section, given an arbitrary Lipschitz density $p$ on $S$, we
can find an underestimate $H^K$ of an auxiliary function $f = -p + C$, and a
partition of $S$ into polytopes $A(L)$, such that on each $A(L)$, the local minimum
of $H^K$, $d(L)$ in (13), is the greatest lower bound on $f$. This lower bound is
tight, i.e., one can find a Lipschitz function such that $\min_{x \in A(L)} f(x) = d$,
for instance $f = H^K$ itself. Based on $H^K$, we define the hat function as
$g = -H + C$, where $H(x) = d$ if $x \in A(L)$. Then we subdivide each polytope
$A(L)$ into simplices to facilitate generation of random variates, and compute
the volume of each simplex for the discrete random variate generator.

4.5 Description of the algorithm

Let us now detail some of the steps required to build a universal random
vector generator using the hat function described in the previous section. The
algorithm consists of two parts, preprocessing and generation. First, given
the set of values $p(x^k)$, $k = 1, \ldots, K$, we build the saw-tooth underestimate
of an IPH function $f = -p + C$. Points $x^k$ can be given a priori, or can be
determined by the algorithm itself; for instance each $x^k$, $k = n+1, \ldots$ can
be chosen as a global minimizer of the function $H^{k-1}$, i.e., at the teeth of
the saw-tooth underestimate at the current iteration. The first $n$ points are
always chosen as the vertices of $S$.

We build the saw-tooth underestimate (6) by enumerating its local
minimizers using the combinatorial technique presented in section 3.3. Based
on these local minimizers, we partition the domain $S$ into polytopes $A(L)$,
and then further into simplices. On each $A(L)$ the hat function is defined by
$g = -d + C$. We complete the preprocessing part by computing the volume
of each simplex $S_i$ in the partition.
The generation part now works as usual: 1) randomly choose a simplex $S_i$
of the partition of $S$ with probability proportional to the
volume of $S_i$ times the value of $g$ on it; for this we use the alias method. 2)
generate a random vector $X$ uniformly distributed in the chosen simplex, see
[Dev86]. 3) generate an independent random number $Z$, uniformly distributed
in $[0,1]$; if $Z g(X) \le p(X)$ then accept $X$, otherwise reject $X$ and return to
step 1).
The overall algorithm to generate random vectors with density $p$ follows.

Acceptance/rejection Algorithm for Lipschitz densities

Requires: density $p$ (not necessarily an analytic expression), its Lipschitz
constant $M$ in the $l_1$-norm (or its overestimate) and $p_{\max} = \max_{x \in S} p(x)$.
The number of points $K$ as a control parameter.

Preprocessing
1 Choose constant $C > p_{\max} + 2M$.
2 Build the saw-tooth underestimate $H^K$ of the function $f = -p + C$ using
$K$ points $x^k$ within the domain of $p$, by using the algorithm from [BB02,
Bel03]. Except for the first $n$ points, $x^k$ are chosen automatically by the
algorithm.
3 For each local minimum of $H^K$ compute the polytope $A(L)$ using (14).
4 Convert each $A(L)$ to the vertex representation using the Double
Description method from [FP96, MRT53] and find its triangulation.
5 For each simplex $S_i$ from the triangulation of $A(L)$ find its volume and
multiply it by $P(S_i) = C - d(L)$.
6 Store the list of all simplices as the list of vertices and computed values $P$
and $VP(S_i) = \mathrm{Volume}(S_i) \times P(S_i)$.
7 Create two tables for the alias method using the values $VP$ as the vector
of probabilities.
Random vector generation
1 Using the alias method randomly choose simplex $S_i$.
2 Generate random vector $R$ uniformly distributed in the unit simplex $S$
([Dev86], p.214, via either sorting or uniform spacings).
3 Compute vector $X = \sum_{j=1}^{n} R_j S_i^j$, where $S_i^j$ is the $j$-th vertex of the chosen
simplex $S_i$ ([Dev86], p.568).
4 Generate an independent uniform random number $Z$ in $[0,1]$.
5 If $Z P(S_i) \le p(X)$ then return $X$, otherwise go to Step 1.
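The generation part of the algorithm can be sketched on a toy example. The
following Python code is an illustration under several assumptions: the density
$p(x) = x_1 + 2x_2 + 3x_3$ on the unit simplex in $\mathbb{R}^3$ is invented for the
example, the partition is a hand-made split of $S$ into three triangles of equal
volume, the hat value on each triangle is taken as the maximum of $p$ at its
vertices (standing in for $C - d(L)$), and `random.Random.choices` stands in for
the alias method.

```python
import random


def p(x):
    """Illustrative unnormalized density on the unit simplex in R^3."""
    return x[0] + 2.0 * x[1] + 3.0 * x[2]


E = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
CENTROID = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)

# Toy partition of S into three simplices S_i sharing the centroid.
simplices = [(E[0], E[1], CENTROID), (E[1], E[2], CENTROID), (E[0], E[2], CENTROID)]
volume = [1.0, 1.0, 1.0]                                  # equal relative volumes
hat = [max(p(v) for v in s) for s in simplices]           # P(S_i): p is linear
vp = [vol * h for vol, h in zip(volume, hat)]             # VP(S_i)


def generate(rng):
    """One random vector with density proportional to p (Steps 1 to 5)."""
    while True:
        i = rng.choices(range(len(simplices)), weights=vp)[0]      # Step 1
        cuts = sorted(rng.random() for _ in range(2))              # Step 2
        r = (cuts[0], cuts[1] - cuts[0], 1.0 - cuts[1])
        x = [sum(r[j] * simplices[i][j][k] for j in range(3))      # Step 3
             for k in range(3)]
        z = rng.random()                                           # Step 4
        if z * hat[i] <= p(x):                                     # Step 5
            return x


rng = random.Random(1)
sample = [generate(rng) for _ in range(1000)]
```

For this density the exact mean of $x_3$ is $0.375$, which gives a simple check
on the output of the generator.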

The generation step clearly requires $n + 1$ random numbers (either uniform
or exponential, see [Dev86], p.214), and the calculations take $O(n^2)$ operations,
because of computing the $n$ components of $X$ in the sum. Bucket sort is
assumed to take on average $O(n)$ operations ([Dev86], p.216). The probability of
rejection depends on how accurate the computed upper approximation to $p$ is,
which in turn depends on its Lipschitz constant and the number of points $K$.
The latter value is the control parameter of the algorithm: the more points
are used, the better the approximation, but the longer the preprocessing step,
which is dominated by building the saw-tooth underestimate and the triangulation.
The number of simplices in the partition of the domain of p is difficult to
calculate a priori, but Table 2 provides some indicative values.

Fig. 12. Multimodal density $p$ used to generate random vectors in $\mathbb{R}^2$. In this
example $p$ is a mixture of five normal distributions. The algorithm uses exclusively
numerical values of $p$ and its Lipschitz constant.

4.6 Numerical experiments

We tested the acceptance/rejection method for Lipschitz densities on some
multivariate multimodal distributions, such as a mixture of several normal
distributions with different weights $a_i$, means $\mu_i$ and covariance matrices $\Sigma_i$. One
such distribution is plotted in Fig. 12 for the case of two variables. Of course, one
can easily generate random variates from such a mixture using alternative
methods (e.g., the composition method, if the parameters $a_i$, $\mu_i$, $\Sigma_i$ are known).
However, none of this information was available to the algorithm, which relies
only on the ability to compute the value of $p$ at a given point (plus its
Lipschitz properties). Figs. 13 and 14 depict graphs of other densities used for testing.
Sampling from these non-standard densities is a much more challenging
problem than sampling from a mixture of normal distributions, yet the described
algorithm accomplishes this task with the same efficiency.

Fig. 13. Density p used to generate random vectors in B?^ given by p{x^y)
kexp{—{y — x^Y ~ ^ V^ )• This density is not log-concave.

Fig. 14. Density p used to generate random vectors in B? is given by p(r)


(|r| - \f X e x p ( - l ^ ^ ± | ^ ) , where r = {x,y).

Table 3 presents timings of the preprocessing and generation steps for various
$n$ and $K$ for one such $p$, taken as

$$p = \sum_{i=1}^{5} a_i\,\mathrm{Norm}(\mu_i, \Sigma_i).$$

Covariance matrices were all diagonal. For the reference, the time to gener-
ate one uniform random number was 0.271 x 10~^ sec. The Ranlux lagged
Fibonacci generator with the period 10^^^ was used for uniform random num-
bers [Lue94].

Table 2. The number of local minima of $H^K$. Function $f_1$ was used in the
calculations.

K        n = 1    n = 3     n = 5      n = 7      n = 9
1000     999      4699      13495      24810      31217
2000     1999     9631      28210      50526      74132
4000     3999     20435     104117     177358     187973
8000     7999     42031     270328     527995     886249
15000    14999    81301     532387     1093040    1956075
20000    19999    109587    738888     1605995    2661807
25000    24999    137770    993812     3861070    6175083
30000    29999    167251    1234810    6340898    10521070

Table 3 clearly shows that as $K$ increases, the upper approximation
becomes tighter, and the acceptance ratio improves. However, this is at the
cost of a rapidly growing number of simplices in the subdivision of the domain
of $p$, and thus at the cost of increased preprocessing time, especially for $n > 3$.

5 Scattered data interpolation: Lipschitz approximation


5.1 Problem formulation
Multivariate data interpolation and approximation is a very common problem
in many branches of science. Sometimes this problem is referred to as regres-
sion, estimation, data fitting, learning of functions and other names. There is
a great number of techniques developed for various instances of this problem,
such as polynomial regression, spline interpolation and smoothing, wavelets,
nearest neighbour search, Sibson interpolation, MARS (multivariate adaptive
regression splines), machine learning techniques (e.g., decision trees), neural
networks, radial basis functions, etc. For an overview the reader is referred to
[Alf89].
Table 3. Performance of the acceptance/rejection method as a function of dimension
n and the number of points K. Preprocessing step includes building the saw-tooth
underestimate and triangulation. Generation time is the average time to generate
one random n-vector. Acceptance ratio is the criterion of efficiency.

n   K       Number of    Saw-tooth under-    Triangulation   Generation time   Acceptance
            simplices    estimate time (s)   time (s)        (s x 10^-?)       ratio
2   300     1276         0.05                0.27            9.28              0.241
    1000    4424         0.18                0.73            6.11              0.36
    2000    8972         0.33                1.39            5.24              0.44
    4000    18078        0.56                2.82            4.82              0.53
    8000    36369        1.11                5.78            4.31              0.61
    16000   73208        2.29                11.70           4.08              0.69
3   300     16166        0.39                2.20            23.8              0.13
    1000    60080        1.08                8.38            18.0              0.18
    2000    124300       1.98                17.32           15.3              0.21
    4000    259428       3.74                35.22           12.9              0.26
    8000    530237       7.24                69.28           11.2              0.31
4   300     333522       3.18                30.58           63.5              0.06
    1000    1399372      11.06               116.6           58.2              0.09
    2000    3087003      22.51               268.6           50.1              0.11
5   50      509560       1.39                41.12           29950             0.00012
    100     1904996      4.80                102.3           27130             0.0002
    200     5378880      14.4                370.8           21411             0.000281

Shape preserving approximation refers to the approximation problem in
which, in addition to the data, other information about the function in question
is available. For instance, it may be known a priori that the function must
be monotone, convex, positive, symmetric, unimodal, etc. These conditions
determine additional constraints on the approximant, which may find explicit
representation in terms of the parameters that are fitted to the data. In spline
approximation, this problem has been thoroughly studied (see [Die95, KM97,
Kva00, Bel00]), and such constraints as monotonicity or convexity usually
translate into restrictions on spline coefficients.
More recently, the concept of shape preserving interpolation and approxi-
mation has been extended to include other known a priori restrictions on the
approximant, such as generalized convexity, unimodality, possessing peaks or
discontinuities, Lipschitz property, associativity [KM97, Bel03]. These restric-
tions require new problem formulations leading to new specific methods of
approximation.
In this section we consider interpolation of scattered multivariate data
in which the Lipschitz constant of the interpolant is restricted. The Lipschitz
condition ensures reasonable bounds on the interpolated values of the function,
which is sometimes hard to achieve in nonlinear interpolation. As we shall see,
preservation of the Lipschitz condition implies strict bounds on the difference
between the interpolant and the function it models in the Chebyshev max-norm, so
that Lipschitz interpolation guarantees the performance of the interpolant
in the worst case scenario, whereas other methods target the average
performance. In this sense, Lipschitz approximation translates into reliable learning
of functions [Coo95].
As the interpolant, we will use a combination of the lower and upper
approximations of a Lipschitz function $f$ defined by (7) or (8). We will show that such an
interpolant is not a matter of arbitrary choice, but arises as the solution to
the best uniform approximation problem, as formulated in the next section.
On the other hand, the obtained solution is a piecewise continuous function
(piecewise linear in case of (8), i.e., a linear spline). Splines possess many
desirable features, such as stability and speed of evaluation, local behaviour,
ability to model functions of virtually any shape, and so on [Die95]. We also
obtain continuous dependence of the interpolant on the data, which is
frequently hard to achieve [Alf89].

5.2 Best uniform approximation

Assume that we are given a data set $\{(x^k, y^k)\}_{k=1}^{K}$, $x^k \in \mathbb{R}^n$, $y^k \in \mathbb{R}$. We also
assume that $y^k$ are the values of some function $f$, $f(x^k) = y^k$, which is unknown
to us and which we want to approximate with $g$, $g \approx f$. Thus we look for an
interpolant $g : \mathbb{R}^n \to \mathbb{R}$, such that

\[ g(x^k) = y^k, \quad k = 1, \ldots, K. \]

It is known (e.g., see [GW59]) that it is impossible to give finite bounds
on the values $f(x)$, $x \neq x^k$, $k = 1, \ldots, K$, in terms of the data set, if the only
additional information is that $f$ is an element of a linear space $V$, no matter
how restricted the space $V$ is in terms of conditions of continuity, smoothness,
analyticity, etc. Therefore it is meaningless to speak about the goodness of
approximation without a reference to some nonlinear constraint on $V$.
We shall work in the space of continuous functions with the supremum
norm, i.e., $V = C(X)$, $X \subset \mathbb{R}^n$. We shall assume that $f$ is bounded and
Lipschitz continuous, with the Lipschitz constant $M$ in the norm $\|\cdot\|$. We
denote the class of functions whose smallest Lipschitz constant is equal to or
smaller than $M$ by $Lip(M)$. We can use any norm, or any distance function
$d_P$. Our goal is to find an interpolant $g$ that approximates $f$ well at the
points $x$ distinct from the data, given that $f \in Lip(M)$. That is, we solve the
following problem.
Given the data set as above, find an optimal interpolating function
$g_M : \mathbb{R}^n \to \mathbb{R}$,

\[ g_M = \arg\inf_{g} \sup_{f \in Lip(M)} \|f - g\|_{C(X)} \tag{21} \]

such that

\[ g(x^k) = f(x^k) = y^k, \quad k = 1, \ldots, K. \]
Golomb and Weinberger [GW59] have considered the problem of
approximation in linear spaces subject to finite bounds on some nonlinear functional
in a very general setting. Let $V$ be a linear vector space, $u$ an element of
$V$, and $F(u), F_1(u), \ldots, F_K(u)$ linear functionals on this space. Given the
values of the functionals $F_k(u)$, $k = 1, \ldots, K$, the goal is to approximate $F(u)$,
subject to $u$ being restricted to some subset $S \subset V$. The subset $S$ is defined
by means of a non-negative nonlinear positively homogeneous and continuous
functional $p(u)$: $S = \{u \in V : p(u) \le r\}$. Thus the unknown function $u$ is
known to lie in the intersection of the set $S$ and the plane of all $v \in V$ defined by
$F_k(v) = f_k$, $k = 1, \ldots, K$.
Consider the set $\sigma$ of values the functional $F(v)$ assumes as $v$ ranges over
this intersection. Under certain conditions on $p$ (namely, the triangular
inequality), $\sigma$ is a closed interval, and the best approximation problem has a
solution $u$ that corresponds to the midpoint of this interval, while the error
bounds on $F(u)$ are easily computed as the half-length of $\sigma$.
In our case, $V$ is the space of continuous functions $C(X)$, the functionals
$F, F_k$, $k = 1, \ldots, K$, are defined as the values $u(x), u(x^k)$, and $p(u)$ is the
Lipschitz seminorm, i.e.,

\[ \forall v \in V : \quad p(v) = \inf\{ M : |v(x) - v(z)| \le M \|x - z\| \ \ \forall x, z \}. \]

For every $x \in X$, denote the interval $\sigma$ by $[H^{lower}(x), H^{upper}(x)]$, where
$H^{lower}$, $H^{upper}$ are respectively the lower and the upper bounds on $u$. Then
the solution to the best uniform approximation problem (21) is given by

\[ u(x) = \tfrac{1}{2}\left[ H^{lower}(x) + H^{upper}(x) \right]. \]

To build a constructive interpolation algorithm, we need a suitable
representation for the functions $H^{lower}$, $H^{upper}$. This representation is given by Eqs.
(7) or (8), and involves only the values of $f(x^k)$ and its Lipschitz constant $M$.
We already used this representation to build both the lower and the upper
approximations of $f$. We now combine the two approximations.

5.3 Description of the algorithm

First we describe the Lipschitz approximation algorithm in the univariate case for
the purposes of illustration, and then we proceed to the general multivariate
case. Given the data set $\{(x^k, y^k)\}_{k=1}^{K}$, $x^k, y^k \in \mathbb{R}$, and the Lipschitz constant
$M$ of $f$, define the lower and upper approximations

\[ H^{lower}(x) = \max_k \left( y^k - M|x - x^k| \right), \qquad H^{upper}(x) = \min_k \left( y^k + M|x - x^k| \right). \]

The lower approximation directly follows from (7), whereas the upper
approximation is built from the lower approximation of the auxiliary function
$-f$, cf. Eq. (20).
Fig. 15. The lower and upper approximations of a Lipschitz function $f$, and the
best uniform approximation $g$.

Both approximations are piecewise linear functions, illustrated on Fig.
15. Their calculation for any $x$ can be performed very efficiently in $O(\log K)$
operations by locating the interval $x \in [x^k, x^{k+1})$, assuming that $x^k$ are sorted
in increasing order. Under our assumption that $f \in Lip(M)$,

\[ \forall x \in X : \quad H^{lower}(x) \le f(x) \le H^{upper}(x), \]

and the bounds are tight. The optimal interpolant is then

\[ g(x) = \tfrac{1}{2}\left( H^{lower}(x) + H^{upper}(x) \right). \]

Such an interpolant was considered in [Coo95, ZKS02]; the authors used it
as a tool for reliable learning of Lipschitz functions. It possesses a number of
desirable features listed below.
(1) $g$ is a piecewise linear continuous function.
(2) $g$ has Lipschitz constant $M$, i.e., $g \in Lip(M)$.
(3) $g$ reproduces constant and linear functions.
(4) $g$ preserves the range of the data: $\min\{y^k\} \le g \le \max\{y^k\}$.
(5) $g$ preserves monotonicity of the data: if for all $k$, $x^k < x^{k+1}$ implies
$y^k \le y^{k+1}$, then $g(x) \le g(z)$ $\forall x, z : x \le z$.
(6) $g$ continuously depends on $x^k$ and $y^k$.
(7) The tight bound on the largest error of approximation is computed
as $C = M \max_x \min_k |x - x^k|$. That is, $\forall f \in Lip(M)$ with $f(x^k) = y^k$,
$\max_x |f(x) - g(x)| \le C$, and this bound is achieved, e.g., when $f = H^{lower}$
or $f = H^{upper}$.
(8) $g$ is a minimum of the functional $F(g) = \int |g'(x)|\,dx$.
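
The univariate construction can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `make_lipschitz_interpolant_1d` is hypothetical, and the sketch assumes the data are consistent with $Lip(M)$, in which case the maximum and minimum over $k$ are attained at the data points adjacent to $x$, so a binary search suffices.

```python
import bisect


def make_lipschitz_interpolant_1d(xs, ys, M):
    """Build H_lower, H_upper and g = (H_lower + H_upper) / 2 for 1-D data.

    Assumption: the data are consistent with Lip(M), i.e.
    |ys[i] - ys[j]| <= M * |xs[i] - xs[j]| for all i, j.
    """
    pairs = sorted(zip(xs, ys))            # sort the data by abscissa
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]

    def neighbours(x):
        # Binary search for the interval containing x: O(log K).
        i = bisect.bisect_left(sx, x)
        return {max(i - 1, 0), min(i, len(sx) - 1)}

    def h_lower(x):
        # max_k (y^k - M|x - x^k|), restricted to the adjacent data points
        return max(sy[k] - M * abs(x - sx[k]) for k in neighbours(x))

    def h_upper(x):
        # min_k (y^k + M|x - x^k|), restricted to the adjacent data points
        return min(sy[k] + M * abs(x - sx[k]) for k in neighbours(x))

    def g(x):
        # Midpoint of the interval [H_lower(x), H_upper(x)]
        return 0.5 * (h_lower(x) + h_upper(x))

    return h_lower, h_upper, g
```

For example, sampling $\sin x$ (Lipschitz constant 1) on a grid and calling `g` at an intermediate point returns a value whose distance from $\sin x$ does not exceed $M$ times the distance to the nearest data point, in line with property (7).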
Now consider the multivariate case. We use the underestimate (7) (and
the respective overestimate) as the functions $H^{lower}$, $H^{upper}$. We can use any
norm in (7). However, the method based on the simplicial distance (8) is very
efficient numerically. In this case we can represent $H^{lower}$ through the list
of its local minimizers. We have an efficient method of enumerating local
minimizers of (8), described in Section 3.3. This representation is useful when
a value of $H^{lower}$ is needed for an arbitrary $x \in X$. It allows one to compute the
maximum in (8) using only a limited subset of $\{1, 2, \ldots, K\}$, which makes the
algorithm competitive with alternative methods (like Sibson's interpolation
[Sib81]).
To obtain the overestimate $H^{upper}$ we proceed as earlier: define the auxiliary
function $-f$, for which we build the underestimate (8), and then take
$H^{upper}$ to be the negative of this underestimate.
Like its univariate counterpart, the multivariate interpolant

\[ g(x) = \tfrac{1}{2}\left( H^{lower}(x) + H^{upper}(x) \right) \]

also possesses a number of desirable features. It provides uniform
approximation to $f$, preserves its range, preserves the Lipschitz constant of $f$, and
provides a local approximation scheme (i.e., values of $g$ depend only on the
nearest data points). Furthermore, $g$ depends continuously on the data. The
latter property is very desirable [Alf89], but only a few multivariate
interpolants possess this property. For instance, none of the schemata based on
triangulation of the domain of $f$ has this property.
However, the most important feature of the interpolant $g$ is that it provides
the best approximation of $f$ in the worst case scenario: no matter how "bad"
the Lipschitz function $f$ that generated the input data was, or how
inconveniently these data are distributed, $g$ is the best approximation of $f$ based on
the available data. Thus our method translates into reliable approximation of
$f$: even in the worst case the error bounds are guaranteed.
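
As an illustration of the multivariate case, the following Python sketch evaluates the interpolant by brute force. It assumes that the underestimate (7) has the form $\max_k (y^k - M\|x - x^k\|)$, as its univariate instance above suggests, and it uses the Euclidean norm. It scans all $K$ data points for each evaluation and does not use the list of local minimizers of (8) on which the efficient algorithm relies. The function name is hypothetical.

```python
import math


def make_lipschitz_interpolant(points, values, M):
    """Brute-force multivariate Lipschitz interpolant (Euclidean norm).

    points: sequence of K data sites, each a sequence of n floats
    values: sequence of K data values y^k
    M:      Lipschitz constant assumed for the unknown function
    """

    def bounds(x):
        # H_lower(x) = max_k (y^k - M ||x - x^k||)
        h_lower = max(v - M * math.dist(x, p) for p, v in zip(points, values))
        # H_upper(x) = min_k (y^k + M ||x - x^k||)
        h_upper = min(v + M * math.dist(x, p) for p, v in zip(points, values))
        return h_lower, h_upper

    def g(x):
        # Midpoint of the lower and upper approximations
        h_lower, h_upper = bounds(x)
        return 0.5 * (h_lower + h_upper)

    return bounds, g
```

Each call costs $O(Kn)$ operations, so the sketch is suitable only for checking the properties listed above (interpolation, two-sided bounds, range preservation) on small data sets.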

5.4 Numerical experiments

To illustrate the performance of the interpolant $g$ we approximate the
following Lipschitz functions.
Test function 1

\[ f(x) = \sin x_1 \sin x_2 + 0.05(\sin 5x_1 \sin 6x_2)^2, \quad x \in [0,1]^2. \]

Test function 2

\[ f(x) = \sin 5x_1 \sin 2x_2 + 0.2(\sin 20x_1 \sin 20x_2)^2, \quad x \in [0,3]^2. \]

Test function 3

\[ f(x) = \prod_{i=1}^{n} \sin 2x_i, \quad x \in [0,3]^n. \]
Table 4. Performance of the algorithm for test function 1 as a function of the
number of data points.

K       Preprocessing   Evaluation time   Max error   Root mean
        time (s)        (s x 10^-?)                   squared error
10000   1.021           0.293             0.173       0.025
20000   2.393           0.28              0.116       0.018
40000   5.578           0.39              0.0923      0.013
80000   12.878          0.45              0.069       0.0090

Table 5. Performance of the algorithm for test function 2 as a function of the
number of data points.

K       Preprocessing   Evaluation time   Max error   Root mean
        time (s)        (s x 10^-?)                   squared error
10000   1.021           0.25              0.34        0.045
20000   2.43            0.28              0.18        0.031
40000   5.83            0.34              0.15        0.021
80000   12.9            0.40              0.021       0.013

Table 6. Performance of the algorithm for test function 3 as a function of the
number of data points and dimension.

n   K       Preprocessing   Evaluation time   Max error   Root mean
            time (s)        (s x 10^-?)                   squared error
3   1000    0.17            0.72              0.63        0.14
    10000   2.8             1.43              0.32        0.063
    20000   6.67            1.66              0.27        0.050
    40000   15.69           1.85              0.18        0.038
    80000   35.57           2.09              0.17        0.031
4   1000    0.78            4.42              1.01        0.19
    5000    7.29            8.91              0.72        0.13
    10000   18.2            11.0              0.61        0.113
    20000   45.3            20.8              0.33        0.08
    40000   110.0           15.7              0.29        0.076
5   1000    4.66            29.84             1.04        0.19
    5000    54.08           69.06             0.83        0.14
    10000   211.8           98.40             0.62        0.121

The approximations of test functions 1 and 2 are plotted on Figs. 16-19.
Tables 4-6 provide quantitative information about the quality of fit and the
speed of evaluation.

Fig. 16. Test function 1.

Fig. 17. Uniform approximation of the test function 1 using 20000 data points.

Fig. 18. Test function 2.

Fig. 19. Uniform approximation of the test function 2 using 80000 data points.

There are two steps of the algorithm that need benchmarking. The first
step of building the interpolant $g$ is called preprocessing, and the second step
is the evaluation of $g$ for an arbitrary $x$. The evaluation step was performed $N =
100000$ times at random points to gather statistics, and the average time
is reported. Further, the maximum and mean errors of approximation are
reported. The root mean squared error is computed as

\[ RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( f(x^i) - g(x^i) \right)^2 }, \]

where $N$ is the number of test points $x^i$ not used in the construction of the
interpolant. All computations were performed on a Pentium-IV PC, 1.2 GHz,
512 MB RAM, Visual C++ (version 6) compiler.
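
The error statistics reported in Tables 4-6 can be gathered along the following lines. This is a minimal Python sketch, not the benchmarking code used for the experiments: it assumes that the test points are drawn uniformly from a box $[lo, hi]^n$, takes the root mean squared error to be the usual square root of the mean squared difference, and uses hypothetical names for the function and its parameters.

```python
import math
import random


def approximation_errors(f, g, n_dim, lo, hi, n_test=1000, seed=0):
    """Return (max error, root mean squared error) of g against f.

    The n_test test points are drawn uniformly from [lo, hi]^n_dim
    (an assumption of this sketch) and are independent of the data
    used to build g.
    """
    rng = random.Random(seed)
    max_err = 0.0
    sq_sum = 0.0
    for _ in range(n_test):
        x = [rng.uniform(lo, hi) for _ in range(n_dim)]
        err = abs(f(x) - g(x))
        max_err = max(max_err, err)      # worst-case error over test points
        sq_sum += err * err              # accumulate squared errors
    return max_err, math.sqrt(sq_sum / n_test)
```

The root mean squared error never exceeds the maximum error, which is consistent with the last two columns of the tables.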

6 Conclusion
The theory of abstract convexity provides us with the necessary tools for build-
ing guaranteed tight one-sided approximations of various classes of functions.
Such approximations find applications in many areas, such as global opti-
mization, statistical simulation and approximation. In this paper we reviewed
methods of building lower (upper) approximations of convex, log-convex, IPH
and Lipschitz functions, which commonly arise in practice.
We presented an overview of three important applications of one-sided ap-
proximations: global optimization, random variate generation and scattered
data interpolation. In all three applications we used essentially the same con-
struction, in which the lower (or upper) approximation was represented by
means of the list of its local minima (maxima). We also described a fast com-
binatorial algorithm for identification of these local minima. Each of the pre-
sented applications also requires a number of specific techniques to make use
of this general construction. This paper addresses this issue and presents the
details of the algorithms used in each case, and also illustrates the performance
of the algorithms using numerical experiments, and practical applications.

References
[Alf89] Alfeld, P.: Scattered data interpolation in three or more variables. In
Schumaker, L.L., Lyche, T. (eds) Mathematical Methods in Computer
Aided Geometric Design, 1-34. Academic Press, New York (1989)
[ARG99] Andramonov, M., Rubinov, A., Glover, B.: Cutting angle methods in
global optimization. Applied Mathematics Letters, 12, 95-100 (1999)
[Aur91] Aurenhammer, F.: Voronoi diagrams - a survey of a fundamental data
structure. ACM Computing Surveys, 23, 345-405 (1991)
[BR00] Bagirov, A., Rubinov, A.: Global minimization of increasing positively
homogeneous function over the unit simplex. Annals of Operations
Research, 98, 171-187 (2000)
[BR01] Bagirov, A., Rubinov, A.: Modified versions of the cutting angle method.
In: Hadjisavvas, N., Pardalos, P.M., (eds) Convex Analysis and Global
optimization, Nonconvex optimization and its applications, 54, 245-268.
Kluwer, Dordrecht (2001)
[BRS03] Bagirov, A., Rubinov, A.M., Soukhoroukova, N.V., Yearwood, J.L.:
Unsupervised and supervised data classification via nonsmooth and global
optimization. TOP (formerly Trabajos de Investigación Operativa), 11, 1-93
(2003)
[BRY01] Bagirov, A., Rubinov, A.M., Yearwood, J.L.: Using global optimization to
improve classification for medical diagnosis. Topics in Health Information
Management, 22, 65-74 (2001)
[BRY02] Bagirov, A., Rubinov, A.M., Yearwood, J.L.: A global optimization ap-
proach to classification. Optimization and Engineering, 3, 129-155 (2002)
[BB02] Batten, L.M., Beliakov, G.: Fast algorithm for the cutting angle method of
global optimization. Journal of Global Optimization, 24, 149-161 (2002)
[Bel00] Beliakov, G.: Shape preserving approximation using least squares splines.
Approximation theory and applications, 16, 80-98 (2000)
[Bel02] Beliakov, G.: Approximation of membership functions and aggregation
operators using splines. In Bouchon-Meunier, B., Gutierrez-Rios, Mag-
dalena, L., and Yager, R. (eds) Technologies for Constructing Intelligent
Systems, 2, 159-172. Springer, Berlin (2002)
[Bel03] Beliakov, G.: Geometry and combinatorics of the cutting angle method.
Optimization, 52, 379-394 (2003)
[Bel03] Beliakov, G: How to build aggregation operators from data? Int. J. Intel-
ligent Systems, 18, 903-923, (2003)
[Bel04] Beliakov, G.: The cutting angle method - a tool for constrained global
optimization. Optimization Methods and Software, 19, 137-151 (2004)
[Bel03] Beliakov, G.: Least squares splines with free knots: global optimization
approach. Applied Mathematics and Computation, 149, 783-798 (2004)
[Bel05] Beliakov, G: Extended cutting angle method of constrained global op-
timization. In: Caccetta, L. (eds) Optimization in Industry (in press).
Kluwer, Dordrecht (2005)
[BTM01] Beliakov, G., Ting, K.-M., Murshed, M.: Efficient serial and parallel
implementation of the cutting angle global optimization technique. In: 5th
International Conference on Optimization: Techniques and Applications,
1, 80-87, Hong Kong (2001)
[BTMRB03] Beliakov, G., Ting, K.M., Murshed, M., Rubinov, A., Bertoli, M.:
Efficient serial and parallel implementations of the cutting angle method. In:
Di Pillo, G. (ed) High Performance Algorithms and Software for Nonlinear
Optimization, 57-74. Kluwer Academic Publishers (2003)
[BSTY98] Boissonnat, J.-D., Sharir, M., Tagansky, B., Yvinec, M.: Voronoi dia-
grams in higher dimensions under certain polyhedral distance functions.
Discrete and Comput. Geometry, 19, 485-519 (1998)
[BEF00] Büeler, B., Enge, A., Fukuda, K.: Exact volume computation for convex
polytopes: a practical study. In: Kalai, G., Ziegler, G.M. (eds) Polytopes
- Combinatorics and Computation, 131-154. Birkhäuser, Basel (2000)
[Coo95] Cooper, D.A.: Learning Lipschitz functions. Int. J. of Computer Mathe-
matics, 59, 15-26 (1995)
[Dag88] Dagpunar, J.: Principles of Random Variate Generation. Clarendon Press,
Oxford (1988)
[DR95] Demyanov, V.F., Rubinov, A.M.: Constructive Nonsmooth Analysis.
Peter Lang, Frankfurt am Main (1995)
[Dev86] Devroye, L.: Non-uniform Random Variate Generation. Springer Verlag,
New York (1986)
[Die95] Dierckx, P.: Curve and Surface Fitting with Splines. Clarendon press,
Oxford (1995)
[Eng05] Enge, A.: http://www.lix.polytechnique.fr/labo/andreas.enge/volumen.html
(2005)
[ES98] Evans, M., Swartz, T.: Random variable generation using concavity prop-
erties of the transformed densities. J. of Computational and Graphical
Statistics, 7, 514-528 (1998)
[Flo00] Floudas, C.A.: Deterministic Global Optimization: Theory, Methods, and
Applications. Nonconvex optimization and its applications, 37. Kluwer
Academic Publishers, Dordrecht/London, (2000)
[Fuk05] Fukuda, K.: http://www.cs.mcgill.ca/~fukuda/soft/cdd_home/cdd.html
(2005)
[FP96] Fukuda, K., Prodon, A.: Double description method revisited. In: Deza,
M., Euler, R., Manoussakis, I. (eds) Combinatorics and Computer Science,
91-111. Springer-Verlag, Heidelberg (1996)
[GW59] Golomb, M., Weinberger, H.F.: Optimal approximation and error bounds.
In: Langer, R.E. (ed) On Numerical Approximation, 117-190. The Univ.
of Wisconsin Press, Madison (1959)
[HJ95] Hansen, P., Jaumard, B.: Lipschitz optimization. In: Horst, R, Pardalos,
P. (eds) Handbook of Global Optimization, 407-493. Kluwer, Dordrecht
(1995)
[Hor95] Hormann, W.: A rejection technique for sampling from t-concave
distributions. ACM Transactions on Mathematical Software, 21, 182-193 (1995)
[HLD04] Hormann, W., Leydold, J., Derflinger, G.: Automatic Nonuniform Ran-
dom Variate Generation. Springer, Berlin (2004)
[HPT00] Horst, R., Pardalos, P., Thoai, N.: Introduction to Global Optimization
(2nd edition). Kluwer Academic Publishers, Dordrecht (2000)
[HP95] Horst, R., Pardalos, P.M.: Handbook of Global Optimization. Noncon-
vex optimization and its applications, 2. Kluwer Academic Publishers,
Dordrecht/Boston (1995)
[Kel60] Kelley, J.E.: The cutting-plane method for solving convex programs. J. of
SIAM, 8, 703-712 (1960)
[KM97] Kocic, L.M., Milovanovic, G.V.: Shape-preserving approximations by
polynomials and splines. Computer and Mathematics with Applications,
33, 59-97 (1997)
[Kva00] Kvasov, B.: Methods of Shape Preserving Spline Approximation. World
Scientific, Singapore (2000)
[LH98] Leydold, J., Hormann, W.: A sweep-plane algorithm for generating ran-
dom tuples in simple polytopes. Mathematics of Computation, 67, 1617-
1635 (1998)
[LH01] Leydold, J., Hormann, W.: Universal algorithms as an alternative for
generating non-uniform continuous random variates. In Schueler, G.I.,
Spanos, P.D. (eds) Monte Carlo Simulation, 177-183. A. A. Balkema,
Rotterdam (2001)
[LBB03] Lim, K.F., Beliakov, G., Batten, L.M.: A new method for locating the
global optimum: Application of the cutting angle method to molecular
structure prediction. In: Proceedings of the 3rd International Confer-
ence on Computational Science, 4, 1040-1049. Springer-Verlag, Heidel-
berg (2003)
[LBB03] Lim, K.F., Beliakov, G., Batten, L.M.: Predicting molecular structures:
Application of the cutting angle method. Physical Chemistry Chemical
Physics, 5, 3884-3890 (2003)
[LS02] Locatelli, M., Schoen, F.: Fast global optimization of difficult
Lennard-Jones clusters. Computational Optimization and Applications, 21, 55-70
(2002)
[Lue94] Luescher, M.: A portable high-quality random number generator for lat-
tice field theory calculations. Computer Physics Communications, 79,
100-110 (1994)
[Mla86] Mladineo, R.: An algorithm for finding the global maximum of a multi-
modal, multivariate function. Math. Prog., 34, 188-200 (1986)
[MRT53] Motzkin, T.S., Raiffa, H., Thompson, G.L., Thrall, R.M.: The double
description method. In: Kuhn, H.W., Tucker, A.W. (eds) Contribution to
Theory of Games, 2. Princeton University Press, Princeton, RI (1953)
[Neu97] Neumaier, A.: Molecular modeling of proteins and mathematical predic-
tion of protein structure. SIAM Review, 39, 407-460 (1997)
[OBSC00] Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations:
Concepts and Applications of Voronoi Diagrams (2nd edition). John Wi-
ley, Chichester (2000)
[Pij72] Pijavski, S.A.: An algorithm for finding the absolute extremum of a func-
tion. USSR Comput. Math, and Math. Phys., 2, 57-67 (1972)
[Pin96] Pinter, J.: Global Optimization in Action: Continuous and Lipschitz
Optimization-algorithms, implementations, and applications. Nonconvex
optimization and its applications, 6. Kluwer Academic Publishers, Dor-
drecht/Boston (1996)
[Roc70] Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton
(1970)
[Rub00] Rubinov, A.M.: Abstract Convexity and Global Optimization.
Nonconvex optimization and its applications, 44. Kluwer Academic Publishers,
Dordrecht/Boston (2000)
[Ser03] Sergeyev, Y.D.: Finding the minimal root of an equation: applications
and algorithms based on Lipschitz condition. In Pinter, J. (ed) Global
Optimization - Selected Case Studies. Kluwer Academic Publishers (2003)
[Shu72] Shubert, B.: A sequential method seeking the global maximum of a func-
tion. SIAM J. Numer. Anal, 9, 379-388 (1972)
[Sib81] Sibson, R.: A brief description of natural neighbor interpolation. In:
Barnett, V. (ed) Interpreting Multivariate Data, 21-36. John Wiley,
Chichester (1981)
[SL97] Sio, K.C., Lee, C.K.: Estimation of the Lipschitz norm with neural net-
works. Neural Processing Letters, 6, 99-108 (1997)
[SS00] Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-convex
Constraints: Sequential and Parallel Algorithms. Nonconvex optimization
and its applications, 45. Kluwer Academic, Dordrecht/London (2000)
[Wal74] Walker, A.J.: New fast method for generating discrete random numbers
with arbitrary frequency distributions. Electron. Lett., 10, 127-128 (1974)
[WZ96] Wood, G.R., Zhang, B.P.: Estimation of the Lipschitz constant of a func-
tion. J. Global Optim., 8, 91-103 (1996)
[ZKS02] Zabinsky, Z.B., Kristinsdottir, B.P., Smith, R.L.: Optimal estimation
of univariate black box Lipschitz functions with upper and lower error
bounds. Int. J. of Computers and Operations Research (2002)
Part II

Theory and Numerical Methods


A Numerical Method for Concave
Programming Problems

Altannar Chinchuluun^1, Enkhbat Rentsen^2, and Panos M. Pardalos^1

^1 Department of Industrial and Systems Engineering
University of Florida
303 Weil Hall, Gainesville, FL, 32611, USA
altannar@ufl.edu, pardalos@ufl.edu
^2 Department of Mathematical Modeling
School of Mathematics and Computer Science
National University of Mongolia
Ulaanbaatar, Mongolia
renkhbat@ses.edu.mn

Summary. Concave programming problems constitute one of the most important
and fundamental classes of problems in global optimization. Concave minimization
problems have a diverse range of direct and indirect applications. Moreover, concave
minimization problems are well known to be NP-hard. In this paper, we present
three closely related algorithms for concave minimization problems.
Each iteration of the algorithms requires solving linear programming problems with
the same constraints as the initial problem and applying a local search method.
Furthermore, a convergence result is given. From this result,
we see that the local search method is not necessarily required, but
some conditions must hold on the constraints.

Keywords: Approximation Set; Trivial Approximation Set; Improved
Approximation Set; General Orthogonal Approximation Set; Level Set; Concave
Programming; Quasiconcave Function; Global Optimization

1 Introduction
Concave minimization techniques play an important role in other fields of
global optimization. Large classes of optimization problems can be
transformed into equivalent concave minimization problems. Concave minimization
can be applied in a large number of fields. For instance, many problems
from such fields as economics, telecommunications, transportation, computer
design and finance can be formulated as concave minimization problems. More
applications of concave minimization can be found in [HT93, PR87]. Concave
minimization problems are NP-hard, even in most special cases. For instance,
[PS88] has shown that minimizing a concave quadratic function over a very
simple polyhedron such as a hypercube is an NP-hard problem. More
complete surveys of the complexity of these and other problems can be found in
[Par93]. The general concave minimization problem can be written as follows:

\[ \min f(x) \quad \text{s.t. } x \in D, \]

where $f$ is a concave function and $D$ is a convex set. Concave minimization
problems generally possess many local solutions that are not global.
Moreover, we know that the global minimum of the above problem is attained at
a vertex of $D$ when $D$ is a polytope. Many deterministic and stochastic
approaches have been proposed for the local and global solutions to the concave
minimization problem. There are three fundamental algorithmic approaches.
The first approach is the enumerative method and it can be used only when
$D$ is a polyhedron. The other two approaches are the successive
approximation approach and the branch and bound approach. These approaches can be
found in most global optimization books [HT93, HP95, HPT01].
In this paper, we present a numerical method to solve the concave
minimization problem with specific constraints. The basic idea of the method is to find
an approximate solution to the problem by solving linear programming
problems with the same constraints as the initial problem. The paper is organized
as follows: In Section 2, an optimality condition for the quasiconcave min-
imization problem is presented. In Section 3, the concept of approximation
techniques and an approximation set, which are helpful to construct the al-
gorithms, are introduced. In Section 4, three global optimization algorithms,
which are based on the global optimality condition for the concave quadratic
problem, are presented and their convergence properties are established.

2 Global Optimality Condition


Consider the quasiconcave minimization problem

\[ \min f(x) \quad \text{s.t. } x \in D, \tag{1} \]

where $f : \mathbb{R}^n \to \mathbb{R}$ is a quasiconcave and differentiable function and $D$
is a convex set in $\mathbb{R}^n$. Then the following theorem generalizes the result in
Strekalovsky [Str98, SE90].
Theorem 1. Let $z$ be a solution of Problem (1), and let

\[ E_c(f) = \{ y \in \mathbb{R}^n \mid f(y) = c \}. \]

Then

\[ (x - y)^T \nabla f(y) \ge 0 \quad \text{for all } y \in E_{f(z)}(f) \text{ and } x \in D. \tag{2} \]

If, in addition, $\nabla f(y) \neq 0$ holds for all $y \in E_{f(z)}(f)$, then condition (2) is
sufficient for $z \in D$ being a solution to Problem (1).
Proof. Necessity. Suppose that $z$ is a global minimizer of problem (1) and let
$y \in E_{f(z)}(f)$ and $x \in D$. Then we have $f(x) \ge f(y)$. Since the function $f$ is
quasiconcave, it follows that

\[ f(\alpha x + (1 - \alpha) y) \ge \min\{f(x), f(y)\} = f(y) \quad \text{for all } \alpha \in [0,1]. \]

By Taylor's formula, there is a neighborhood of the point $y$ on which:

\[ f(y + \alpha(x - y)) - f(y) = \alpha \left( (x - y)^T \nabla f(y) + \frac{o(\alpha \|x - y\|)}{\alpha} \right) \ge 0, \quad \alpha > 0. \]

Note that $\lim_{\alpha \to 0^+} \frac{o(\alpha \|x - y\|)}{\alpha} = 0$. This implies that $(x - y)^T \nabla f(y) \ge 0$.


Sufficiency. Conversely, suppose that $z$ is not a solution to problem (1); i.e.,
there exists a $u \in D$ such that $f(u) < f(z)$. By the definition of quasiconcave
function, $U_{f(z)}(f) = \{x \in \mathbb{R}^n \mid f(x) \ge f(z)\}$ is a closed and convex set. Note
that $\mathrm{int}\, U_{f(z)}(f) \neq \emptyset$ according to the assumption in the theorem. Denote the
projection of $u$ on $U_{f(z)}(f)$ by $y$. It satisfies

\[ \|y - u\| = \min_{x \in U_{f(z)}(f)} \|x - u\|. \]

Clearly,

\[ \|y - u\| > 0 \tag{3} \]

holds because $u \notin U_{f(z)}(f)$. Moreover, this $y$ can be considered as a solution
of the following convex minimization problem:

\[ \min g(x) = \tfrac{1}{2}\|x - u\|^2 \quad \text{s.t. } x \in U_{f(z)}(f). \]

Since $\mathrm{int}\, U_{f(z)}(f) \neq \emptyset$ and this set is convex, Slater's constraint qualification
condition holds. Under this condition, $y$ is a solution to the above problem if
and only if there exists a Lagrange multiplier $\lambda$ such that $(y, \lambda)$ is a solution to
the following mixed nonlinear complementarity problem:

\[ \begin{aligned} \nabla g(y) - \lambda \nabla f(y) &= 0, \\ \lambda (f(z) - f(y)) &= 0, \\ f(z) - f(y) \le 0, \quad \lambda &\ge 0. \end{aligned} \tag{4} \]

If $\lambda = 0$, then we have $\nabla g(y) = y - u = 0$, which contradicts (3). Thus, $\lambda > 0$
in (4). Then we obtain

\[ y - u - \lambda \nabla f(y) = 0, \quad \lambda > 0, \quad f(y) = f(z). \]

From this we conclude that $(u - y)^T \nabla f(y) = -\lambda \|\nabla f(y)\|^2 < 0$, which contradicts (2). This last
contradiction implies that the assumption that $z$ is not a solution of Problem
(1) must be false. This completes the proof. $\square$

This theorem tells us that we need to find a pair $x, y \in \mathbb{R}^n$ such that

\[ (x - y)^T \nabla f(y) < 0, \quad f(y) = f(z'), \quad x \in D \]

in order to conclude that the point $z' \in D$ is not a solution to Problem (1).
The following example illustrates the use of this property.
Example 1.

\[ \begin{aligned} \min\ & f(x) = \frac{x_1^2 + x_2^2}{1 - x_1 - x_2} \\ \text{s.t. } & 0.6 \le x_1 \le 7, \\ & 0.6 \le x_2 \le 2. \end{aligned} \]

We can easily show that $f$ is a quasiconcave function over the constraint set.
The gradient of the function is found as follows.

\[ \nabla f(x) = \left( \frac{x_2^2 - 2x_1x_2 - x_1^2 + 2x_1}{(1 - x_1 - x_2)^2},\ \frac{x_1^2 - 2x_1x_2 - x_2^2 + 2x_2}{(1 - x_1 - x_2)^2} \right)^T \]

Now we want to check whether the feasible point $x^0 = (0.6, 0.6)^T$, which is clearly
a local minimizer of the problem, is a global minimizer or not. Consider the
pair $u = (5, 2)^T$ and $y = (3, 3)^T$ satisfying $f(y) = f(x^0) = -3.6$. We have
$(u - y)^T \nabla f(y) = -\frac{12}{25} < 0$ and it follows that $x^0$ is not a global solution. In
fact, we can show that the global solution is $x^* = (7, 0.6)^T$.
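
The numbers in Example 1 can be checked with a few lines of Python. This is only a sketch under stated assumptions: the objective is taken to be $f(x) = (x_1^2 + x_2^2)/(1 - x_1 - x_2)$, which is the function consistent with the printed gradient and with the value $f(x^0) = -3.6$, and the candidate global solution is found by comparing the four vertices of the box, relying on the fact quoted in the Introduction that the minimum is attained at a vertex of a polytope.

```python
def f(x1, x2):
    # Objective of Example 1 (assumed form, consistent with the gradient)
    return (x1 ** 2 + x2 ** 2) / (1.0 - x1 - x2)


def grad_f(x1, x2):
    # Gradient of f as given in Example 1
    d = (1.0 - x1 - x2) ** 2
    return ((x2 ** 2 - 2 * x1 * x2 - x1 ** 2 + 2 * x1) / d,
            (x1 ** 2 - 2 * x1 * x2 - x2 ** 2 + 2 * x2) / d)


x0 = (0.6, 0.6)   # feasible point under test
y = (3.0, 3.0)    # point on the level set f(y) = f(x0)
u = (5.0, 2.0)    # feasible point violating condition (2)

gy = grad_f(*y)
# (u - y)^T grad f(y); a negative value shows x0 is not a global solution
slope = (u[0] - y[0]) * gy[0] + (u[1] - y[1]) * gy[1]

# Compare the objective at the four vertices of the box constraint set
vertices = [(a, b) for a in (0.6, 7.0) for b in (0.6, 2.0)]
best_vertex = min(vertices, key=lambda v: f(*v))
```

Here `slope` evaluates to approximately $-0.48 = -12/25$, and `best_vertex` is the vertex $(7, 0.6)$.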

3 Approximation Techniques of the Level Set

For further discussion, we will consider only the concave case of Problem
(1), which is

\[ \min f(x) \quad \text{s.t. } x \in D, \tag{5} \]

where $f$ is a concave and differentiable function and $D$ is a convex compact
set in $\mathbb{R}^n$.
Definition 1. The set $E_{f(z)}(f)$ defined by

\[ E_{f(z)}(f) = \{ y \in \mathbb{R}^n \mid f(y) = f(z) \} \]

is called the level set of $f$ at $z$.
Note that the optimality condition (2) for Problem (5) requires solving
the linear programming problem

\[ \min\ (x - y)^T \nabla f(y) \quad \text{s.t. } x \in D \]

for every $y \in E_{f(z)}(f)$. This is a hard problem. Thus, we need to find an
appropriate approximation set so that one could check the optimality condition
at a finite number of points.
The following lemmas show that finding a point at the level set of f{x) is
theoretically possible.
Lemma 1. Let $h \in \mathbb{R}^n$, let $z \in D$ be a point which is not a global maximizer of $f(x)$ over
$\mathbb{R}^n$, and let $x^*$ be an optimal solution of the problem

$$\max\ f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n,$$

and let the set of all optimal solutions of this problem be bounded. Then there
exists a unique positive number $\alpha$ such that $x^* + \alpha h \in E_{f(z)}(f)$.
Proof. We first prove that there exists a positive number $\alpha$ such that $x^* + \alpha h \in
E_{f(z)}(f)$. Suppose conversely that there is no number which satisfies
the above condition; i.e., $f(x^* + \alpha h) > f(z)$ holds for all $\alpha > 0$. Note that
$\mathrm{hyp}(f) = \{(x, r) \in \mathbb{R}^{n+1} : r \le f(x)\}$ is a convex set since $f$ is a concave
function. For $\alpha > 0$, we obtain $(x^* + \alpha h, f(z)) \in \mathrm{hyp}(f)$. Next we show that
$(h, 0)$ is a direction of $\mathrm{hyp}(f)$. Suppose conversely that there exist a vector
$y \in \mathrm{hyp}(f)$ and a positive scalar $\beta$ such that $y + \beta(h, 0) \in \mathbb{R}^{n+1} \setminus \mathrm{hyp}(f)$. Since
$\mathbb{R}^{n+1} \setminus \mathrm{hyp}(f)$ is an open set, there exists a scalar $\mu$ that satisfies the following
conditions:

$$\mu (x^*, f(z)) + (1 - \mu)(y + \beta(h, 0)) \in \mathbb{R}^{n+1} \setminus \mathrm{hyp}(f), \qquad 0 < \mu < 1. \tag{6}$$

On the other hand, we can show that $\mu (x^*, f(z)) + (1 - \mu)(y + \beta(h, 0))$ lies on
the line segment joining some two points of $\mathrm{hyp}(f)$. For the points $(x^*, f(z)) +
\frac{(1-\mu)\beta}{\mu}(h, 0)$ and $y$, the following equation holds.

$$\mu \left( (x^*, f(z)) + \frac{(1-\mu)\beta}{\mu}(h, 0) \right) + (1 - \mu) y = \mu (x^*, f(z)) + (1 - \mu)(y + \beta(h, 0))$$

By convexity of $\mathrm{hyp}(f)$, we have $\mu (x^*, f(z)) + (1 - \mu)(y + \beta(h, 0)) \in \mathrm{hyp}(f)$.
This contradicts (6); hence, $(h, 0)$ is a direction of $\mathrm{hyp}(f)$. Since $(x^*, f(x^*)) \in
\mathrm{hyp}(f)$, the following statement is true.

$$(x^*, f(x^*)) + \alpha (h, 0) \in \mathrm{hyp}(f) \quad \text{for all } \alpha > 0$$

We can conclude that $x^* + \alpha h$ is also a global maximizer of $f$ for all $\alpha > 0$
because $x^*$ is a global maximizer of $f$ over $\mathbb{R}^n$. This contradicts the assumption
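in the lemma. Now, we prove the uniqueness property. Assume that there are
two positive scalars $\alpha_1$ and $\alpha_2$ such that $x^* + \alpha_i h \in E_{f(z)}(f)$, $i = 1, 2$. Without
loss of generality, we can assume that $0 < \alpha_1 \le \alpha_2$. By concavity of $f$, we
have

$$f(z) = f(x^* + \alpha_1 h) = f\left( \left(1 - \frac{\alpha_1}{\alpha_2}\right) x^* + \frac{\alpha_1}{\alpha_2}(x^* + \alpha_2 h) \right) \ge \left(1 - \frac{\alpha_1}{\alpha_2}\right) f(x^*) + \frac{\alpha_1}{\alpha_2} f(z) \ge f(z).$$

Since $f(x^*) > f(z)$, this chain of inequalities is valid only if $\alpha_1 = \alpha_2$. □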


Under some condition, it is possible to compute a point on the level set.
This is shown by the following statement.
Lemma 2. Let a point $z \in D$ and a vector $h \in \mathbb{R}^n$ satisfy $h^T \nabla f(z) < 0$ and
let $x^*$ be a global maximizer of $f$ over $D$. Then there exists a unique positive
number $\alpha$ such that $x^* + \alpha h \in E_{f(z)}(f)$.
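Proof. Suppose conversely that

$$f(z) < f(x^* + \alpha h) \quad \text{for all } \alpha > 0.$$

Note that $f(x^*) \ge f(z)$ for any $z \in D$. By concavity of $f$, we have

$$(x^* - z)^T \nabla f(z) \ge 0.$$

From the last inequality and the assumption $h^T \nabla f(z) < 0$, we can conclude that

$$-\frac{(x^* - z)^T \nabla f(z)}{h^T \nabla f(z)} \ge 0.$$

Since $f$ is a concave function, for all $\alpha > 0$, we have

$$0 < f(x^* + \alpha h) - f(z) \le (x^* + \alpha h - z)^T \nabla f(z) = (x^* - z)^T \nabla f(z) + \alpha h^T \nabla f(z).$$

For $\alpha = -\frac{(x^* - z)^T \nabla f(z)}{h^T \nabla f(z)} > 0$, we get

$$0 < f(x^* + \alpha h) - f(z) \le 0.$$

This gives a contradiction. □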
Example 2. Consider the quadratic concave minimization problem

$$\min\ f(x) = \tfrac{1}{2} x^T C x + d^T x \quad \text{s.t.} \quad x \in D, \tag{7}$$
where $D$ is a convex set in $\mathbb{R}^n$, $d \in \mathbb{R}^n$ and $C$ is a symmetric negative definite
$n \times n$ matrix.
Since $C$ is negative definite, we have

$$h^T C h < 0$$

for all $h \ne 0$. Let us solve the equation $f(x^* + \alpha h) = f(z)$ with respect to $\alpha$:

$$\tfrac{1}{2}(x^* + \alpha h)^T C (x^* + \alpha h) + d^T (x^* + \alpha h) = f(z)$$

or

$$f(x^*) + \alpha h^T (C x^* + d) + \tfrac{1}{2} \alpha^2 h^T C h = f(z).$$

Note that $x^*$ satisfies $C x^* + d = 0$. Using this fact, we have

$$\alpha = \sqrt{\frac{2\,(f(z) - f(x^*))}{h^T C h}}.$$
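
The closed form of Example 2 is easy to try out. The following is a minimal sketch, assuming the objective $f(x) = \tfrac12 x^T C x + d^T x$ with $C x^* + d = 0$; the matrix, the vectors and the helper names (`matvec`, `level_step`) are made-up illustration data, not taken from the chapter.

```python
import math

def matvec(C, x):
    return [sum(cij * xj for cij, xj in zip(row, x)) for row in C]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def f(C, d, x):
    # f(x) = 1/2 x^T C x + d^T x
    return 0.5 * dot(x, matvec(C, x)) + dot(d, x)

def level_step(C, d, x_star, z, h):
    """Positive alpha with f(x* + alpha h) = f(z), assuming C x* + d = 0."""
    num = 2.0 * (f(C, d, z) - f(C, d, x_star))  # non-positive
    den = dot(h, matvec(C, h))                  # negative for h != 0
    return math.sqrt(num / den)

# Hypothetical data: C negative definite, x* solves C x + d = 0.
C = [[-2.0, 0.0], [0.0, -4.0]]
d = [2.0, 4.0]
x_star = [1.0, 1.0]
z = [0.0, 0.0]
h = [1.0, 0.0]

alpha = level_step(C, d, x_star, z, h)
y = [xs + alpha * hi for xs, hi in zip(x_star, h)]
print(alpha, f(C, d, y), f(C, d, z))  # f(y) agrees with f(z) up to rounding
```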
Constructing Points on the Level Set
As we have seen in Example 2, the number $\alpha$ can be found analytically in the
quadratic case. In the general case, such an analytical formula is not always
available, but Lemmas 1 and 2 allow us to find a point on the level
set using numerical methods. For this purpose, let us introduce the following
function of one variable on $\mathbb{R}_+$.

$$\psi(t) = f(x^* + t h) - f(z). \tag{8}$$

The above lemmas state that this function has a unique root in $\mathbb{R}_+$. Our goal
is to find the root of this function, and we can use numerical methods
such as the fixed point method, Newton's method,
bracketing methods and so on. For the bracketing methods, initial guesses $a$ and $b$
such that $\psi(a) > 0$ and $\psi(b) < 0$ can be found as follows:
1. Choose a step size $p > 0$.
2. Determine $\psi(qp)$, $q = 1, 2, \ldots, q_0$.
3. Set $a = (q_0 - 1)p$, $b = q_0 p$,
where $q_0$ is the smallest positive integer such that $\psi(q_0 p) < 0$. Moreover,
the bisection method can be stated in the following form.
1. Determine $\psi$ at the midpoint $\frac{a+b}{2}$ of the interval $[a, b]$.
2. If $\psi(\frac{a+b}{2}) > 0$, then the root is in the interval $[\frac{a+b}{2}, b]$. Otherwise, the root
is in the interval $[a, \frac{a+b}{2}]$. The length of the interval containing the root
is reduced by a factor of 2.
3. Repeat the procedure until a prescribed precision is attained.
Finally, we choose a number $\alpha$ as an approximate root of the function $\psi(t)$
such that

$$\psi(\alpha) \le 0, \tag{9}$$

i.e., $\alpha$ lies on the right hand side of the exact root. We will see that this
selection helps us when we construct algorithms for Problem (1) in the next section.
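
The bracketing and bisection steps described above can be sketched as follows. This is an illustration under stated assumptions: the test function $f(x) = -\|x\|^2$, the points `x_star`, `z`, `h` and the function names are hypothetical, and the bracketing loop stops as soon as $\psi(qp) \le 0$ rather than strictly negative. The bisection returns the right end of the final interval, so that the returned value satisfies $\psi(\alpha) \le 0$.

```python
def bracket(psi, p, max_steps=100000):
    """Return (a, b) with psi(a) > 0 >= psi(b), stepping by p from t = 0."""
    for q in range(1, max_steps + 1):
        if psi(q * p) <= 0:
            return (q - 1) * p, q * p
    raise ValueError("no sign change found")

def bisect_right(psi, a, b, tol=1e-10):
    """Bisection keeping psi(a) > 0 and psi(b) <= 0; returns the right end."""
    while b - a > tol:
        mid = 0.5 * (a + b)
        if psi(mid) > 0:
            a = mid      # root lies in [mid, b]
        else:
            b = mid      # root lies in [a, mid]
    return b

# Hypothetical test data: f(x) = -||x||^2 with maximizer x* = 0.
def f(x):
    return -(x[0] ** 2 + x[1] ** 2)

x_star, z, h = (0.0, 0.0), (3.0, 4.0), (1.0, 0.0)

def psi(t):
    return f((x_star[0] + t * h[0], x_star[1] + t * h[1])) - f(z)

a, b = bracket(psi, 0.7)
alpha = bisect_right(psi, a, b)
print(a, b, alpha)  # alpha is close to the exact root 5
```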

In order to check the optimality condition at a finite number of points of
the level set, it is necessary to introduce a notion of an approximation set.
Definition 2. The set $A_z^m$ defined for a given integer $m$ by

$$A_z^m = \{\, y^1, y^2, \ldots, y^m \mid y^i \in E_{f(z)}(f),\ i = 1, 2, \ldots, m \,\} \tag{10}$$

is called an approximation set to the level set $E_{f(z)}(f)$ at $z$.
Since we can construct a point on the level set, an approximation set can
be constructed in the same way. Assume that $A_z^m$ is given. Then for each $y^i \in
A_z^m$, $i = 1, 2, \ldots, m$, solve the auxiliary problem

$$\min\ x^T \nabla f(y^i) \quad \text{s.t.} \quad x \in D. \tag{11}$$

Let $u^i$, $i = 1, 2, \ldots, m$, be the solutions of those problems, which always exist
due to the compactness of $D$:

$$(u^i)^T \nabla f(y^i) = \min_{x \in D} x^T \nabla f(y^i). \tag{12}$$

Let us define $\theta_m$ as follows:

$$\theta_m = \min_{i = 1, 2, \ldots, m} (u^i - y^i)^T \nabla f(y^i). \tag{13}$$

The sets $A_z^m$ and the values $\theta_m$ have the following properties.


Lemma 3. If there is a point $y^i \in A_z^m$ for $z \in D$ such that $(u^i - y^i)^T \nabla f(y^i) <
0$, where $u^i \in D$ satisfies $(u^i)^T \nabla f(y^i) = \min_{x \in D} x^T \nabla f(y^i)$, then

$$f(u^i) < f(z)$$

holds.
Proof. By the definition of $u^i$, we have

$$\min_{x \in D} (x - y^i)^T \nabla f(y^i) = (u^i - y^i)^T \nabla f(y^i).$$

Since $f$ is concave,

$$f(u) - f(v) \le (u - v)^T \nabla f(v)$$

holds for all $u, v \in \mathbb{R}^n$. Therefore, the assumption in the lemma implies that

$$f(u^i) - f(z) = f(u^i) - f(y^i) \le (u^i - y^i)^T \nabla f(y^i) < 0.$$
□
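
To make the quantities in (11)-(13) and Lemma 3 concrete, here is a small sketch in which $D$ is a box, so that each auxiliary linear program is solved coordinate by coordinate. The function $f(x) = -\|x\|^2$, the box $[0,3] \times [0,2]$, the point $z$ and all helper names are assumptions chosen for illustration; they do not come from the chapter.

```python
def f(x):
    return -sum(xi * xi for xi in x)

def grad_f(x):
    return [-2.0 * xi for xi in x]

def lp_over_box(c, lower, upper):
    """Minimize c^T x over a box: take the lower bound where c_i > 0,
    the upper bound otherwise (ties are broken towards the upper bound)."""
    return [lo if ci > 0 else up for ci, lo, up in zip(c, lower, upper)]

def theta(points, lower, upper):
    """theta_m from (13) together with the minimizing u^i."""
    best_val, best_u = None, None
    for y in points:
        g = grad_f(y)
        u = lp_over_box(g, lower, upper)                  # problem (11)
        val = sum((ui - yi) * gi for ui, yi, gi in zip(u, y, g))
        if best_val is None or val < best_val:
            best_val, best_u = val, u
    return best_val, best_u

lower, upper = [0.0, 0.0], [3.0, 2.0]
z = [1.0, 0.0]
# Four points with f(y) = f(z) = -1 (hypothetical approximation set).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

theta_m, u = theta(A, lower, upper)
print(theta_m, u, f(u), f(z))  # theta_m < 0, hence f(u) < f(z) by Lemma 3
```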
Trivial Approximation Set

Consider the following set of vectors:

$$A_z^{2n} = \{\, y^1, y^2, \ldots, y^{2n} \mid y^j = x^* + \alpha_j l^j \in E_{f(z)}(f),\ j = 1, 2, \ldots, 2n \,\}, \tag{14}$$

where the $\alpha_j$ are positive numbers, the $l^j$ are orthogonal vectors such that $l^j =
-l^{n+j}$ for $j = 1, \ldots, n$ and $x^*$ is a solution to the problem

$$\max\ f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n.$$

Without loss of generality we can assume that $l^j$ is the $j$th unit (coordinate)
vector and there exists some $\alpha_j$ such that $y^j \in E_{f(z)}(f)$ (if this number does
not exist, we just eliminate $y^j$ from the set); therefore $A_z^{2n}$ is an approximation
set to the level set $E_{f(z)}(f)$ at the point $z$. When $z$ is not a global maximizer
of $f$ over $\mathbb{R}^n$, clearly, $A_z^{2n}$ is a nonempty set and it contains at least $n$ points.
Definition 3. The approximation set constructed according to (14) is called
the trivial approximation set.
Second order Approximation Set
In order to improve the approximation set, it is helpful to define another
approximation set based on the previous approximation set. Assume that we
have an approximation set $A_z^m$. We can construct another approximation set
$B_z^m$ based on this approximation set as follows.

$$B_z^m = \{\, \bar y^1, \bar y^2, \ldots, \bar y^m \mid \bar y^i = x^* + \bar\alpha_i (u^i - x^*) \in E_{f(z)}(f),\ i = 1, 2, \ldots, m \,\}, \tag{15}$$

where $x^*$ is a solution to the problem

$$\max\ f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n$$

and $u^i$ is a solution to the problem

$$\min\ x^T \nabla f(y^i) \quad \text{s.t.} \quad x \in D.$$

The use of $x^*$ is justified by the relationship between $A_z^m$ and $B_z^m$ in the
following lemma.
Lemma 4. Let $f(z) \ne f(x^*)$. If $\theta_m < 0$ then there exist a $j \in \{1, 2, \ldots, m\}$
and $v \in D$ such that $\bar y^j \in B_z^m$ satisfies $(v - \bar y^j)^T \nabla f(\bar y^j) < 0$.
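Proof. According to (13), there exists a $j \in \{1, 2, \ldots, m\}$ such that

$$\theta_m = (u^j - y^j)^T \nabla f(y^j) = \min_{i = 1, 2, \ldots, m} (u^i - y^i)^T \nabla f(y^i) < 0,$$

where $u^j$ satisfies $(u^j)^T \nabla f(y^j) = \min_{x \in D} x^T \nabla f(y^j)$. Then

$$0 > (u^j - y^j)^T \nabla f(y^j) = (u^j - y^j + x^* - x^*)^T \nabla f(y^j) = (u^j - x^*)^T \nabla f(y^j) + (x^* - y^j)^T \nabla f(y^j)$$

or

$$(u^j - x^*)^T \nabla f(y^j) < (y^j - x^*)^T \nabla f(y^j).$$

Using the concavity of $f$, we can show that the right hand side of the last
inequality is negative as follows:

$$(y^j - x^*)^T \nabla f(y^j) \le f(y^j) - f(x^*) < 0.$$

Since $(u^j - x^*)^T \nabla f(y^j) < 0$ from the last two inequalities, according to Lemma
2, there exists a unique positive number $\bar\alpha_j$ such that $\bar y^j = x^* + \bar\alpha_j (u^j - x^*) \in
E_{f(y^j)}(f) = E_{f(z)}(f)$. Clearly $\bar y^j \in B_z^m$. Now, we will show that $\bar\alpha_j < 1$.
Conversely, suppose that $\bar\alpha_j \ge 1$, or $0 < \frac{1}{\bar\alpha_j} \le 1$. As we have seen in Lemma
3, we can write

$$f(u^j) < f(z).$$

Since $x^*$ is the global maximizer of $f$ over $\mathbb{R}^n$, we have

$$f(u^j) < f(z) = f(x^* + \bar\alpha_j (u^j - x^*)) < f(x^*).$$

On the other hand, by concavity of $f$,

$$\frac{1}{\bar\alpha_j} f(x^* + \bar\alpha_j (u^j - x^*)) + \left(1 - \frac{1}{\bar\alpha_j}\right) f(x^*) \le f\left( \frac{1}{\bar\alpha_j} (x^* + \bar\alpha_j (u^j - x^*)) + \left(1 - \frac{1}{\bar\alpha_j}\right) x^* \right) = f(u^j).$$

This contradicts the previous inequality. Thus $0 < \bar\alpha_j < 1$. Now we are ready
to prove the lemma.
Consider the point $\bar y^j = x^* + \bar\alpha_j (u^j - x^*)$ in $B_z^m$. From the concavity of $f$ and
the above observations, it follows that

$$(u^j - \bar y^j)^T \nabla f(\bar y^j) = (1 - \bar\alpha_j)(u^j - x^*)^T \nabla f(x^* + \bar\alpha_j (u^j - x^*))$$

$$= \frac{1 - \bar\alpha_j}{\bar\alpha_j}\, \bar\alpha_j (u^j - x^*)^T \nabla f(x^* + \bar\alpha_j (u^j - x^*))$$

$$= \frac{1 - \bar\alpha_j}{\bar\alpha_j} (x^* + \bar\alpha_j (u^j - x^*) - x^*)^T \nabla f(x^* + \bar\alpha_j (u^j - x^*))$$

$$\le \frac{1 - \bar\alpha_j}{\bar\alpha_j} \left( f(x^* + \bar\alpha_j (u^j - x^*)) - f(x^*) \right) < 0.$$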
Now, if we take the point $v = u^j \in D$, then we have $(v - \bar y^j)^T \nabla f(\bar y^j) < 0$, and
the assertion is proven. □
Remark 1. Note that $\theta_m \ge 0$ does not always imply

$$\min_{y \in E_{f(z)}(f)}\ \min_{x \in D}\ (x - y)^T \nabla f(y) \ge 0.$$

Remark 2. If we use Selection (14), it is easy to see that the lemma is still
true when $\alpha_j$ and $\bar\alpha_j$ are approximate roots of the functions $\psi_1(t) = f(x^* +
t l^j) - f(z)$ and $\psi_2(t) = f(x^* + t(u^j - x^*)) - f(z)$, respectively.
In analogy with $\theta_m$ for $A_z^m$, introduce $\bar\theta_m$ for the set $B_z^m$ as follows.

$$\bar\theta_m = \min_{i = 1, 2, \ldots, m} (v^i - \bar y^i)^T \nabla f(\bar y^i),$$

where $v^i$ is defined by $(v^i)^T \nabla f(\bar y^i) = \min_{x \in D} x^T \nabla f(\bar y^i)$.

Definition 4. The approximation set constructed according to (15) is called
the second order approximation set to the level set $E_{f(z)}(f)$ at the point $z$.

Orthogonal Approximation Set


Another way to construct an approximation set is to extract an approximation
set from the trivial approximation set using a rotation. Consider the
coordinate vectors $l^j$, $j = 1, \ldots, n$, and $l^{n+j}$ such that $l^{n+j} = -l^j$, $j = 1, \ldots, n$, and
a rotation matrix $R$. Let us define the following vectors and set of vectors.

$$C_z^{2n} = \{\, \tilde y^1, \tilde y^2, \ldots, \tilde y^{2n} \mid \tilde y^i = x^* + \beta_i q^i,\ i = 1, 2, \ldots, 2n \,\}, \tag{16}$$

where $x^*$ is a solution to the problem

$$\max\ f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n$$

and $q^j = R\, l^j$, $j = 1, \ldots, 2n$.
Without loss of generality, we can assume that there exist positive numbers
$\beta_j$ such that $x^* + \beta_j q^j \in E_{f(z)}(f)$; therefore, $C_z^{2n}$ is an approximation set
to the level set $E_{f(z)}(f)$ at the point $z$. Also we can introduce $\tilde\theta_m$ as follows:

$$\tilde\theta_m = \min_{i = 1, 2, \ldots, m} (w^i - \tilde y^i)^T \nabla f(\tilde y^i),$$

where $w^i$ is defined by $(w^i)^T \nabla f(\tilde y^i) = \min_{x \in D} x^T \nabla f(\tilde y^i)$.
Definition 5. The approximation set constructed according to (16) is called
the orthogonal approximation set to the level set $E_{f(z)}(f)$.
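
As an illustration of (16) in the plane, the sketch below rotates the four coordinate directions and scales them onto a level set. It assumes $f(x) = -\|x\|^2$ with $x^* = 0$, for which the scaling factor is simply $\beta = \sqrt{-f(z)}$ for every direction; the point `z`, the rotation angle and the function name `orthogonal_set` are hypothetical choices.

```python
import math

def f(x):
    return -(x[0] ** 2 + x[1] ** 2)

def orthogonal_set(z, angle):
    """Points x* + beta * R l^j on the level set of f at z (x* = 0, n = 2)."""
    c, s = math.cos(angle), math.sin(angle)
    coords = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    beta = math.sqrt(-f(z))       # valid because f(beta * q) = -beta^2
    points = []
    for l in coords:
        q = (c * l[0] - s * l[1], s * l[0] + c * l[1])   # q = R l
        points.append((beta * q[0], beta * q[1]))
    return points

z = (15.0, 16.0)                  # hypothetical current point
points = orthogonal_set(z, -math.pi / 4)
for y in points:
    print(y, f(y))                # every point has the level value f(z)
```

For a general concave $f$ the factors $\beta_j$ differ from direction to direction and would have to be computed by a root-finding method.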
4 Algorithms and their Convergence


In this section, we discuss three algorithms for solving Problem (5), based on the
observations of Section 3. We begin by explaining the main idea of our
methods. The idea of the algorithms is to check, by solving linear programming
problems, whether $\theta_m$, which is defined in (13), is negative or nonnegative.
If it is negative, a new improved solution can be found
according to Lemma 3; otherwise, the algorithm terminates. When the new
improved solution is found, we use one of the existing local search methods
to get faster convergence. Also, we assume here that Problem (5) has finitely many
stationary points on the constraint set $D$ in order to ensure convergence of
the algorithms. The first algorithm uses only the trivial approximation set,
the second algorithm uses a combination of the trivial and the second
order approximation sets, and the last algorithm uses a combination of the
three approximation sets defined in Section 3. The basic algorithm can be
summarized as follows:

Algorithm 1. INPUT: A concave differentiable function $f$, a convex compact set
$D$ and $x^*$, a global maximizer of $f$.
OUTPUT: A global solution $x$ to Problem (5).
Step 1. Choose a point $x^0 \in D$. Set $k = 0$.
Step 2. Find a local minimizer $z^k \in D$ using one of the existing methods starting
with an initial approximation point $x^k$.
Step 3. Construct the trivial approximation set $A_{z^k}^{2n}$ at $z^k$.
Step 4. For each $y^i \in A_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $u^i$, i.e.,

$$(u^i)^T \nabla f(y^i) = \min_{x \in D} x^T \nabla f(y^i).$$

Step 5. Find the number $j \in \{1, 2, \ldots, 2n\}$ such that

$$\theta_{2n}^k = (u^j - y^j)^T \nabla f(y^j) = \min_{i = 1, 2, \ldots, 2n} (u^i - y^i)^T \nabla f(y^i).$$

Step 6. If $\theta_{2n}^k < 0$ then $x^{k+1} := u^j$, $k := k + 1$ and return to Step 2. Otherwise, $z^k$
is an approximate global minimizer; terminate.

The convergence of Algorithm 1 is given by the following statement.


Theorem 2. The sequence $\{z^k,\ k = 0, 1, \ldots\}$ generated by Algorithm 1 converges
to a solution of Problem (5) in a finite number of steps or finds an
approximate solution as a local solution to the problem.
Proof. We show that if $\theta_{2n}^k < 0$ holds for all $k$, then $z^k$ converges to a global
minimizer of Problem (5) in a finite number of steps. In fact, take a $j \in
\{1, 2, \ldots, 2n\}$ such that $y^j \in A_{z^k}^{2n}$ and $u^j \in D$ satisfy

$$(u^j - y^j)^T \nabla f(y^j) < 0.$$

According to Lemma 3, we have

$$f(u^j) < f(z^k).$$

We show that this inequality holds even when $\alpha_j$ is an approximate root of
the function $\psi(t) = f(x^* + t l^j) - f(z^k)$. From Selection (9), we can conclude
that

$$f(y^j) \le f(z^k).$$

Then it follows that

$$f(u^j) - f(z^k) \le f(u^j) - f(y^j) \le (u^j - y^j)^T \nabla f(y^j) < 0.$$

Since $u^j$ is a starting point for finding a local solution $z^{k+1}$, finally, it can be
deduced that

$$f(z^{k+1}) < f(z^k) \quad \text{for all } k = 0, 1, 2, \ldots.$$

By the assumption, the number of local minimizers $z^k$ is finite, and this
sequence reaches a global minimizer in a finite number of steps or stops at an
approximate local solution. This completes the proof. □

Remark 3. When $D$ is a polytope, we can use Algorithm 1 without a local
search method since every auxiliary problem finds a vertex of the set $D$ and
the number of vertices is finite.

Example 3. [HPT01]. To illustrate Algorithm 1, let us consider the following
example.

$$\min\ f(x) = \tfrac{1}{2} x^T C x + d^T x \quad \text{s.t.} \quad Ax \le b, \quad x \ge 0,$$

where

Iteration 1.
An initial feasible solution is $x^0 = (0, 0)^T$. Note that this vertex is a local
solution to the problem. In this case, a local search method cannot affect
the current approximate solution. The current best objective function value
is 0. There does not exist a global maximizer of the function $f(x)$ over $\mathbb{R}^2$.
Thus, we consider a global maximizer of the function over the constraint set,
which can be used for constructing an approximation set. The maximizer
is $x^* = (2.555, 1.444)^T$. The trivial approximation set can be constructed
easily by solving quadratic equations.

$y^1 = (7.432, 1.444)^T$, $y^2 = (2.555, 3.452)^T$,
$y^3 = (0.345, 1.444)^T$, $y^4 = (2.555, 0.102)^T$.

Solving linear programming problems, we find the following vectors.

$u^1 = (3.0, 0.5)^T$, $u^2 = (0.75, 2.0)^T$, $u^3 = (0.75, 2.0)^T$, $u^4 = (1.0, 0.0)^T$

Moreover, $\theta_4 = -0.563$ and the initial feasible point for the next iteration is
$x^1 = (0.75, 2.0)^T$.

Iteration 2.
The new feasible solution is $x^1 = (0.75, 2.0)^T$. The local search method cannot
improve this solution. The current objective function value is $-1.0625$. The
trivial approximation set to the level set $E_{-1.0625}(f)$ is

$y^1 = (7.579, 1.444)^T$, $y^2 = (2.555, 3.530)^T$,
$y^3 = (0.199, 1.444)^T$, $y^4 = (2.555, 0.025)^T$.

Solving linear programming problems, we have

$u^1 = (3.0, 0.5)^T$, $u^2 = (0.75, 2.0)^T$,
$u^3 = (0.75, 2.0)^T$, $u^4 = (1.0, 0.0)^T$,

which are the same as those found at Iteration 1. Here $\theta_4 = 0.313$ at the vertex
$(0.75, 2.0)^T$. The algorithm terminates at this iteration and the global
approximate solution is $(0.75, 2.0)^T$. Note that this is a global solution to the
problem.

Unfortunately, Algorithm 1 cannot always guarantee a global optimal
solution. In this case, we can extend Algorithm 1 using the improved
approximation set. The next algorithm is outlined as follows:

Algorithm 2. INPUT: A concave differentiable function $f$, a convex compact
set $D$ and $x^*$, a global maximizer of $f$.
OUTPUT: A global solution $x$ to Problem (5).
Step 1. Choose a point $x^0 \in D$. Set $k = 0$.
Step 2. Find a local minimizer $z^k \in D$ using one of the existing methods
starting with an initial approximation point $x^k$.
Step 3. Construct a trivial approximation set $A_{z^k}^{2n}$ at $z^k$.
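Step 4. For each $y^i \in A_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $u^i$, i.e.,

$$(u^i)^T \nabla f(y^i) = \min_{x \in D} x^T \nabla f(y^i).$$

Step 5. Find the number $j \in \{1, 2, \ldots, 2n\}$ such that

$$\theta_{2n}^k = (u^j - y^j)^T \nabla f(y^j) = \min_{i = 1, 2, \ldots, 2n} (u^i - y^i)^T \nabla f(y^i).$$

Step 6. If $\theta_{2n}^k < 0$ then $x^{k+1} := u^j$, $k := k + 1$ and return to Step 2. Otherwise
go to the next step.
Step 7. Construct a second order approximation set $B_{z^k}^{2n}$ at $z^k$ by (15).
Step 8. For each $\bar y^i \in B_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(\bar y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $v^i$, i.e.,

$$(v^i)^T \nabla f(\bar y^i) = \min_{x \in D} x^T \nabla f(\bar y^i).$$

Step 9. Find the number $s \in \{1, 2, \ldots, 2n\}$ such that

$$\bar\theta_{2n}^k = (v^s - \bar y^s)^T \nabla f(\bar y^s) = \min_{i = 1, 2, \ldots, 2n} (v^i - \bar y^i)^T \nabla f(\bar y^i).$$

Step 10. If $\bar\theta_{2n}^k < 0$ then $x^{k+1} := v^s$, $k := k + 1$ and return to Step 2. Otherwise,
$z^k$ is an approximate global minimizer; terminate the algorithm.
The convergence of this algorithm is the same as in Theorem 2.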
Theorem 3. The sequence $\{z^k,\ k = 0, 1, \ldots\}$ generated by Algorithm 2 converges
to a solution of Problem (5) in a finite number of steps or finds an
approximate solution as a local solution to the problem.
Remark 4. When $D$ is a polytope, Algorithm 2 can be used without a local
search method.
Example 4. To illustrate Algorithm 2, let us consider the following concave
quadratic programming problem.

$$\min\ f(x) = \tfrac{1}{2} x^T C x \quad \text{s.t.} \quad Ax \le b, \quad x \ge 0,$$
where
C = { - ' - ' f ) , Ar= 't'], 6- = (20,19,3).
^~~ ^-0.5 - 4 y ' "" ~ \^1 2 1
Since the constraint set D is a polytope, we can use the algorithm without a
local search method.

Iteration 1.
Let us choose $x^0 = (1.0, 0.0)^T$ as an initial feasible solution. The current
objective function value is $-2.0$. The global maximizer of the function $f(x)$ over
$\mathbb{R}^2$ is $x^* = (0.0, 0.0)^T$. The trivial approximation set can be computed as we
have seen in Example 2, and the vectors are

$y^1 = (1.0, 0.0)^T$, $y^2 = (0.0, 1.0)^T$, $y^3 = (-1.0, 0.0)^T$, $y^4 = (0.0, -1.0)^T$.

Solving linear programming problems, the following vertices of the polytope
are found.

$u^1 = (4.0, 0.0)^T$, $u^2 = (3.25, 3.0)^T$, $u^3 = (0.0, 0.0)^T$, $u^4 = (0.0, 0.0)^T$

Therefore, $\theta_4 = -12$ at the vertex $u^1 = (4.0, 0.0)^T$.

Iteration 2.
The current best feasible solution is $x^1 = (4.0, 0.0)^T$ and the objective function
value is $-32$. The trivial approximation set to the level set $E_{-32}(f)$ is

$y^1 = (4.0, 0.0)^T$, $y^2 = (0.0, 4.0)^T$, $y^3 = (-4.0, 0.0)^T$, $y^4 = (0.0, -4.0)^T$.

The vectors $u^i$ are

$u^1 = (4.0, 0.0)^T$, $u^2 = (3.25, 3.0)^T$, $u^3 = (0.0, 0.0)^T$, $u^4 = (0.0, 0.0)^T$,

which are the same as those found at Iteration 1. Nevertheless, $\theta_4 = 0$ at the vertex
$u^1 = (4.0, 0.0)^T$; i.e., in this case, the trivial approximation set does not work.
Next, we construct the improved approximation set, which is derived from the
last two sets of vectors according to (15).

$\bar y^1 = (4.0, 0.0)^T$, $\bar y^2 = (2.772, 2.558)^T$, $\bar y^3 = (-4.0, 0.0)^T$, $\bar y^4 = (-2.772, -2.558)^T$.

Solving linear programming problems, we have

$v^1 = (4.0, 0.0)^T$, $v^2 = (3.25, 3.0)^T$, $v^3 = (0.0, 0.0)^T$, $v^4 = (0.0, 0.0)^T$.

Therefore, $\bar\theta_4 = -11.047$ at the vertex $v^2 = (3.25, 3.0)^T$.

Iteration 3.
The current approximate feasible solution is $x^2 = (3.25, 3.0)^T$ and the objective
function value at this point is $-44$. The trivial approximation set to the
level set $E_{-44}(f)$ is
$y^1 = (4.69, 0.0)^T$, $y^2 = (0.0, 4.69)^T$, $y^3 = (-4.69, 0.0)^T$, $y^4 = (0.0, -4.69)^T$.

The vectors $u^i$ are the same as those found at Iteration 2. Therefore, $\theta_4 = 12.953$ at
the vertex $u^1 = (4.0, 0.0)^T$. Thus, the current approximate solution did not
change. Also, for the improved approximation set to the level set $E_{-44}(f)$,
the following sets can be found:
$\bar y^1 = (4.69, 0.0)^T$, $\bar y^2 = (3.25, 3.0)^T$, $\bar y^3 = (-4.69, 0.0)^T$, $\bar y^4 = (-3.25, -3.0)^T$
and
$v^1 = (4.0, 0.0)^T$, $v^2 = (3.25, 3.0)^T$, $v^3 = (0.0, 0.0)^T$, $v^4 = (0.0, 0.0)^T$.
Therefore, $\bar\theta_4 = 0.0$ at the vertex $v^2 = (3.25, 3.0)^T$. Hence, the algorithm
terminates at this iteration, and the
global approximate solution is $(3.25, 3.0)^T$.
Note that this approximate solution is the global optimal solution and
$(4.0, 0.0)^T$ is a local solution to the problem.
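
The iterations of Example 4 can be retraced with a short script. The sketch below is not the authors' implementation: it assumes $f(x) = \tfrac12 x^T C x$ with $C = \begin{pmatrix} -4 & -0.5 \\ -0.5 & -4 \end{pmatrix}$ (which matches the objective values $-2$, $-32$ and $-44$ quoted above), it replaces the constraint system by a hypothetical vertex list chosen so that the linear programs return the points reported in the example, and it omits the local search, as permitted for a polytope. All function names are illustrative.

```python
import math

C = [[-4.0, -0.5], [-0.5, -4.0]]
VERTICES = [(0.0, 0.0), (4.0, 0.0), (3.25, 3.0), (0.0, 3.0)]  # assumed polytope
COORDS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
TOL = 1e-9

def grad(x):                       # gradient of 1/2 x^T C x is C x
    return (C[0][0] * x[0] + C[0][1] * x[1], C[1][0] * x[0] + C[1][1] * x[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def f(x):
    return 0.5 * dot(x, grad(x))

def on_level_set(h, level):
    """Point alpha*h (x* = 0) with f = level, or None if h = 0."""
    curv = dot(h, grad(h))         # h^T C h
    if curv == 0.0:
        return None
    alpha = math.sqrt(2.0 * level / curv)
    return (alpha * h[0], alpha * h[1])

def theta(points):
    """Return (theta, minimizing LP solution, all LP solutions)."""
    best, arg, sols = math.inf, None, []
    for y in points:
        g = grad(y)
        u = min(VERTICES, key=lambda v: dot(g, v))   # LP over the polytope
        sols.append(u)
        val = dot((u[0] - y[0], u[1] - y[1]), g)
        if val < best:
            best, arg = val, u
    return best, arg, sols

def algorithm2(x0):
    z, log = x0, []
    while True:
        level = f(z)
        th, u, us = theta([on_level_set(h, level) for h in COORDS])
        log.append(th)
        if th < -TOL:              # Step 6: improvement from the trivial set
            z = u
            continue
        second = [p for p in (on_level_set(ui, level) for ui in us) if p]
        th2, v, _ = theta(second)
        log.append(th2)
        if th2 < -TOL:             # Step 10: improvement from the set (15)
            z = v
            continue
        return z, log

z, log = algorithm2((1.0, 0.0))
print(z, f(z))
print([round(t, 3) for t in log])
```

Under these assumptions the recorded values of $\theta_4$ and $\bar\theta_4$ follow the same pattern as in the example: a decrease, a stall of the trivial set at $(4.0, 0.0)^T$, an escape through the second order set, and termination at $(3.25, 3.0)^T$.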
Algorithm 3. INPUT: A concave differentiable function $f$, a convex compact
set $D$ and $x^*$, a global maximizer of $f$.
OUTPUT: A global solution $x$ to Problem (5).
Step 1. Choose a point $x^0 \in D$. Set $k = 0$.
Step 2. Find a local minimizer $z^k \in D$ using one of the existing methods
starting with an initial approximation point $x^k$.
Step 3. Construct a trivial approximation set $A_{z^k}^{2n}$ at $z^k$.
Step 4. For each $y^i \in A_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $u^i$, i.e.,

$$(u^i)^T \nabla f(y^i) = \min_{x \in D} x^T \nabla f(y^i).$$

Step 5. Find the number $j \in \{1, 2, \ldots, 2n\}$ such that

$$\theta_{2n}^k = (u^j - y^j)^T \nabla f(y^j) = \min_{i = 1, 2, \ldots, 2n} (u^i - y^i)^T \nabla f(y^i).$$

Step 6. If $\theta_{2n}^k < 0$ then $x^{k+1} := u^j$, $k := k + 1$ and return to Step 2. Otherwise
go to the next step.
Step 7. Construct a second order approximation set $B_{z^k}^{2n}$ at $z^k$ by (15).
Step 8. For each $\bar y^i \in B_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(\bar y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $v^i$, i.e.,

$$(v^i)^T \nabla f(\bar y^i) = \min_{x \in D} x^T \nabla f(\bar y^i).$$
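Step 9. Find the number $s \in \{1, 2, \ldots, 2n\}$ such that

$$\bar\theta_{2n}^k = (v^s - \bar y^s)^T \nabla f(\bar y^s) = \min_{i = 1, 2, \ldots, 2n} (v^i - \bar y^i)^T \nabla f(\bar y^i).$$

Step 10. If $\bar\theta_{2n}^k < 0$ then $x^{k+1} := v^s$, $k := k + 1$ and return to Step 2. Otherwise
go to the next step.
Step 11. Construct an orthogonal approximation set $C_{z^k}^{2n}$ at $z^k$ by (16).
Step 12. For each $\tilde y^i \in C_{z^k}^{2n}$, $i = 1, 2, \ldots, 2n$, solve the problem

$$\min\ x^T \nabla f(\tilde y^i) \quad \text{s.t.} \quad x \in D$$

to obtain a solution $w^i$, i.e.,

$$(w^i)^T \nabla f(\tilde y^i) = \min_{x \in D} x^T \nabla f(\tilde y^i).$$

Step 13. Find the number $s \in \{1, 2, \ldots, 2n\}$ such that

$$\tilde\theta_{2n}^k = (w^s - \tilde y^s)^T \nabla f(\tilde y^s) = \min_{i = 1, 2, \ldots, 2n} (w^i - \tilde y^i)^T \nabla f(\tilde y^i).$$

Step 14. If $\tilde\theta_{2n}^k < 0$ then $x^{k+1} := w^s$, $k := k + 1$ and return to Step 2. Otherwise
$z^k$ is an approximate global minimizer; terminate the algorithm.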

The convergence of the algorithm is given by the following theorem, and
the proof is similar to the proof of Theorem 2.
Theorem 4. The sequence $\{z^k,\ k = 0, 1, \ldots\}$ generated by Algorithm 3 converges
to a solution of Problem (5) in a finite number of steps or finds an
approximate solution as a local solution to the problem.

Remark 5. When $D$ is a polytope, we can use Algorithm 3 without a local
search method.

Example 5. Consider the following problem to illustrate Algorithm 3.

$$\min\ f(x) = -\|x\|^2 \quad \text{s.t.} \quad Ax \le b,$$

where

^^=(:l"f-19oL^l'5)' ^^ = (4,90,102,121,192,270).

Since the constraint set D is a polytope, we can use the algorithm without a
local search method.

Iteration 1.
An initial feasible solution is $x^0 = (1.0, 0.0)^T$. The current objective function
value is $-1.0$. The global maximizer of the function $f(x)$ over $\mathbb{R}^2$ is
$x^* = (0.0, 0.0)^T$. We obtain $\theta_4 = -38$ at the vertex $u^1 = (20.0, 2.0)^T$. Therefore, this
vertex is the initial point of the next iteration.

Iteration 2.
$x^1 = (20.0, 2.0)^T$. The current objective function value is $-404.0$. In this
case, the trivial approximation set cannot improve the current
approximate solution, i.e., $\theta_4 = 4.01$ at the vertex $u^1 = (20.0, 2.0)^T$.
Introducing the improved approximation set, we get $\bar\theta_4 = -4.00$ at the vertex
$v^1 = (19.5, 8.0)^T$.

Iteration 3.
The current objective function value is $-444.25$ at $x^2 = (19.5, 8.0)^T$.
Constructing the trivial and the improved approximation sets cannot improve the
current approximate solution, i.e., $\theta_4 = 45.41$ at the vertex $u^1 = (20.0, 2.0)^T$
and $\bar\theta_4 = 37.01$ at the vertex $v^1 = (19.5, 8.0)^T$. Next, we introduce a rotation
matrix $R$. Using this rotation matrix, the following new orthogonal approximation set is
found.

$\tilde y^1 = (15.508, -15.508)^T$, $\tilde y^2 = (15.508, 15.508)^T$,
$\tilde y^3 = (-15.508, 15.508)^T$, $\tilde y^4 = (-15.508, -15.508)^T$.

The solutions of the corresponding linear programming problems are

$w^1 = (20.0, 2.0)^T$, $w^2 = (15.0, 16.0)^T$,
$w^3 = (0.0, 18.0)^T$, $w^4 = (-2.0, -2.0)^T$.

Since $\tilde\theta_4 = -35.54$ at the vertex $w^2 = (15.0, 16.0)^T$, according to Lemma 3, the
new approximate solution is $(15.0, 16.0)^T$.

Iteration 4.
The current approximate solution is $x^3 = (15.0, 16.0)^T$. The objective function
value at this point is $-481$. We can check that the algorithm stops at
this iteration. Thus, $(15.0, 16.0)^T$ is the approximate global optimal solution
to the problem.

Note that this solution is the global optimal solution to the problem. The
points $x^1 = (20.0, 2.0)^T$ and $x^2 = (19.5, 8.0)^T$, which we found during the
algorithm, are local solutions to the problem.
5 Numerical Examples
In this section, we present four examples which were solved with the proposed
algorithms.
Problem 1.

$$\min\ f(x) = -\sum_{i=1}^{n} (x_i + i)^2 \quad \text{s.t.} \quad 1 - i \le x_i \le 2i, \quad i = 1, 2, \ldots, n. \tag{17}$$

The global solutions to these problems are obtained by Algorithm 1 and the
computational results are shown in Table 1.

Table 1. Computational results for Problem (17) and Problem (20).

Problem   Dimension   Initial value   Global value   Computational time (sec.)
(17)      10          -10             -3465          0.090
(17)      50          -50             -386325        1.542
(17)      100         -100            -3045150       3.565
(17)      200         -200            -24180300      25.447
(17)      500         -500            -376125750     158.989
(17)      1000        -1000           -3004501500    694.889
(20)      10          -10             -1540          0.731
(20)      50          -50             -171700        17.315
(20)      100         -100            -1353400       45.470
(20)      200         -200            -10746800      143.707
(20)      500         -500            -167167000     826.769
(20)      1000        -1000           -1335334000    3253.549

Next, we consider the two test problems given in [Tho94].

Problem 2. Consider the following concave minimization problem

min fix) = --(a^x)2 (18)


s.t. $Ax \le b$,
$x_i \ge -1$, $i = 1, 2, \ldots, n$,

where $A$ is an $n \times n$ matrix with positive entries, and $a$ and $b$ are $n$-vectors
with positive entries. Let $u^1$ and $u^2$ be the optimal solutions to the linear
programming problems $\min\{a^T x : Ax \le b,\ x_i \ge -1,\ i = 1, 2, \ldots, n\}$ and
$\max\{a^T x : Ax \le b,\ x_i \ge -1,\ i = 1, 2, \ldots, n\}$, respectively. Then the above
problem has a global solution $u \in \{u^1, u^2\}$ [Tho94]. Algorithm 1 finds a global
solution to the problem for various dimensions without a local search method,
and the results are shown in Table 2.
Problem 3

min
mi-n
f{x) - , "" ~ ^^ ? , , + ln(l + a - {a^xf)
-P f nr'\ —
(19)
1 -h a — {oP^xY
s.t. $Ax \le b$,
$x_i \ge -1$, $i = 1, 2, \ldots, n$,

where $A$ is an $n \times n$ matrix with positive entries, $a$ and $b$ are $n$-vectors with
positive entries, and $\alpha$ is a real number such that $\alpha - (a^T x)^2 > 0$ for
all feasible points of the problem. Consider the following concave quadratic
programming problem.

$$\min\ g(x) = -(a^T x)^2 \quad \text{s.t.} \quad Ax \le b, \quad x_i \ge -1, \quad i = 1, 2, \ldots, n.$$

Let $v$ be the $n$-vector whose entries are all equal to $-1$. Then, whenever the
linear programming problem $\max\{a^T x : Ax \le b,\ x_i \ge -1,\ i = 1, 2, \ldots, n\}$
has an optimal solution $w$, the concave quadratic minimization problem has
a global solution $u \in \{v, w\}$. Moreover, $u$ is also a global solution of Problem
(19) [Tho94]. This solution can be found using Algorithm 1 without a local
search method for Problem (19) and the computational results are shown in
Table 2.

Problem 4.

$$\min\ f(x) = -\sum_{i=1}^{n} (x_i - i)^2 \quad \text{s.t.} \quad -i \le x_i \le i - 1, \quad i = 1, 2, \ldots, n. \tag{20}$$

Algorithm 2 can be used for the above problem without a local search method.
Table 1 shows the computational results for the problem.
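
The global values listed in Table 1 can be cross-checked by simple arithmetic. The sketch below assumes that Problems (17) and (20) read $f(x) = -\sum_i (x_i + i)^2$ with $1 - i \le x_i \le 2i$ and $f(x) = -\sum_i (x_i - i)^2$ with $-i \le x_i \le i - 1$, respectively. Since both objectives are separable and concave, each coordinate sits at the bound farther from the unconstrained maximizer, which gives the values $-9\sum i^2$ and $-4\sum i^2$; a brute-force enumeration of the box vertices for a small dimension is included as a sanity check. The function names are illustrative.

```python
from itertools import product

def sum_of_squares(n):
    return n * (n + 1) * (2 * n + 1) // 6

def global_value_17(n):
    return -9 * sum_of_squares(n)     # x_i = 2i gives (x_i + i)^2 = 9 i^2

def global_value_20(n):
    return -4 * sum_of_squares(n)     # x_i = -i gives (x_i - i)^2 = 4 i^2

def brute_force(n, shift, lower, upper):
    """Minimum of -sum (x_i + shift*i)^2 over all vertices of the box."""
    best = None
    for choice in product((0, 1), repeat=n):
        val = 0
        for i, c in zip(range(1, n + 1), choice):
            xi = upper(i) if c else lower(i)
            val -= (xi + shift * i) ** 2
        if best is None or val < best:
            best = val
    return best

for n in (10, 50, 100, 200, 500, 1000):
    print(n, global_value_17(n), global_value_20(n))

print(brute_force(6, +1, lambda i: 1 - i, lambda i: 2 * i), global_value_17(6))
print(brute_force(6, -1, lambda i: -i, lambda i: i - 1), global_value_20(6))
```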
The numerical experiments were conducted using MATLAB 6.1 on a PC with
an Intel Pentium 4 2.20 GHz processor and 512 MB of memory. The
primal-dual interior-point method [Meh92] and the active set method [Dan55],
which is a variation of the simplex method, were used by calling the
subroutine linprog.m from MATLAB 6.1, depending on the size of the problem. For
Problem (17), the subspace trust-region method [CL96], based on the
interior-reflective Newton method, and the active set method [GMW81], which is a
projection method, were used as local search methods by calling the
subroutine quadprog.m from MATLAB 6.1.
Table 2. Computational results for Problem (18) and Problem (19).

Problem   Constraint type      Dimension   Computational time (sec.)
(18)      randomly generated   10          0.551
(18)      randomly generated   50          16.283
(18)      randomly generated   100         126.502
(18)      randomly generated   200         1571.760
(19)      randomly generated   10          0.432
(19)      randomly generated   50          17.185
(19)      randomly generated   100         140.642
(19)      randomly generated   200         1906.512

6 Conclusions
In this paper, we developed three algorithms for concave programming problems
based on a global optimality condition. Under some conditions, the
convergence of the algorithms has been established. For implementation
purposes, three kinds of approximation sets are introduced, and it is shown
that some numerical methods are available to construct the approximation
sets. At each iteration, it is required to solve 2n linear programming problems
with the same constraints as the initial problem. Some existing test problems
were solved by the proposed algorithms, and the computational results have
shown that the algorithms are efficient and easy to use in computing a solution.

References
[Ber95] Bertsekas, D.P.: Nonlinear programming. Athena Scientific, Belmont,
Mass. (1995)
[CL96] Coleman, T.F., Li, Y.: A reflective Newton method for minimizing a
quadratic function subject to bounds on some of the variables. SIAM
Journal on Optimization, 6, 1040-1058 (1996)
[Dan55] Dantzig, G.B., Orden, A., Wolfe, P.: The generalized simplex method
for minimizing a linear form under linear inequality constraints. Pacific
Journal of Mathematics, 5, 183-195 (1955)
[Die94] Dietrich, H.: Global optimization conditions for certain nonconvex
minimization problems. Journal of Global Optimization, 5, 359-370 (1994)
[Enk96] Enkhbat, R.: An algorithm for maximizing a convex function over a simple
set. Journal of Global Optimization, 8, 379-391 (1996)
[GMW81] Gill, P.E., Murray, W., Wright, M.H.: Practical Optimization. Academic
Press, London, UK (1981)
[Hir89] Hiriart-Urruty, J.B.: From convex optimization to nonconvex
optimization. In: Nonsmooth Optimization and Related Topics, 219-239. Plenum
(1989)
[HT93] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches (sec-
ond edition). Springer Verlag, Heidelberg (1993)
[HP95] Horst, R., Pardalos, P.M. (eds): Handbook of Global Optimization.


Kluwer Academic, Netherlands (1995)
[HPT01] Horst, R., Pardalos, P.M., Thoai, N.V.: Introduction to Global
Optimization (second edition). Kluwer Academic, Netherlands (2001)
[Kha79] Khachiyan, L.: A polynomial algorithm in linear programming. Math.
Doklady, 20, 191-194 (1979)
[Meh92] Mehrotra, S.: On the implementation of a Primal-Dual Interior Point
Method, SIAM Journal on Optimization, 2, 575-601 (1992)
[Par93] Pardalos, P.M.: Complexity in Numerical Optimization. World Scientific
Publishing, River Edge, New Jersey (1993)
[PR87] Pardalos, P.M., Rosen, J.B.: Constrained Global Optimization: Algo-
rithms and Applications. Lecture Notes in Computer Science 268,
Springer-Verlag (1987)
[PR86] Pardalos, P.M., Rosen, J.B.: Methods for Global Concave Minimization:
A Bibliographic Survey. SIAM Review, 28, 367-379 (1986)
[PS88] Pardalos, P.M., Schnitger, G.: Checking local optimality in constrained
quadratic programming is NP-hard. Operations Research Letters, 7,
33-35 (1988)
[Roc70] Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton
(1970)
[Str98] Strekalovsky, A.S.: Global optimality conditions for nonconvex optimiza-
tion. Journal of Global Optimization, 12, 415-434 (1998)
[SE90] Strekalovsky, A.S., Enkhbat, R.: Global maximum of convex functions on
an arbitrary set. Dep.in VINITI, Irkutsk, No. 1063, 1-27 (1990)
[Tho94] Thoai, N.V.: On the construction of test problems for concave minimiza-
tion problem. Journal of Global Optimization, 5, 399-402 (1994)
Convexification and Monotone Optimization

Xiaoling Sun^1, Jianling Li^{1,2}, and Duan Li^3

^1 Department of Mathematics
Shanghai University
Shanghai 200444, P.R. China
xlsun@staff.shu.edu.cn
^2 College of Mathematics and Information Science
Guangxi University
Nanning, Guangxi 530004, P.R. China
ljl123@gxu.edu.cn
^3 Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong, P.R. China
dli@se.cuhk.edu.hk

Summary. Monotone maximization is a global optimization problem that maximizes an increasing function subject to increasing constraints. Because multiple local optimal solutions often exist, finding a global optimal solution of such a problem is computationally difficult. In this survey paper, we summarize global solution methods for the monotone optimization problem. In particular, we propose a unified framework for the recent progress on convexification methods for the monotone optimization problem. Suggestions for further research are also presented.

1 Introduction
Global optimization has been one of the important yet challenging research areas in optimization. It appears very difficult, if not impossible, to design an efficient method for finding global optimal solutions of general global optimization problems. Over the last four decades, much attention has been drawn to the investigation of specially structured global optimization problems. In particular, concave minimization problems have been studied extensively. Various algorithms, including extreme point ranking methods, cutting plane methods and outer approximation methods, have been developed for concave minimization problems (see e.g. [Ben96, HT93, RP87] and a bibliographical survey in [PR86]). Monotone optimization problems, as an important class of specially structured global optimization problems, have also been studied in recent years by many researchers (see e.g. [LSBG01, RTM01, SML01, Tuy00, TL00]).
The monotone optimization problem can be posed in the following form:

$$
\begin{array}{lll}
(P) & \max & f(x) \\
 & \text{s.t.} & g_i(x) \le b_i, \quad i = 1, \dots, m, \\
 & & x \in X = \{x \mid l_j \le x_j \le u_j,\ j = 1, \dots, n\},
\end{array}
$$

where $f$ and all $g_i$'s are increasing functions on $[l, u]$ with $l = (l_1, l_2, \dots, l_n)^T$ and $u = (u_1, u_2, \dots, u_n)^T$. Note that the functions $f$ and $g_i$ are not necessarily convex or separable. Due to the monotonicity of $f$ and the $g_i$'s, optimal solutions of (P) always lie on the boundary of the feasible region. It is easy to see that the problem of maximizing a decreasing function subject to decreasing constraints can be reduced to problem (P). Since there may exist multiple local optimal solutions on the boundary, problem (P) is a specially structured global optimization problem. In real-world applications, monotonicity often arises naturally from certain inherent structure of the problem under consideration. For example, in resource allocation problems ([IK88]), the profit or return increases as the assigned amount of resource increases. In reliability networks, the overall reliability of the system and the weight, volume and cost increase as the reliability of the subsystems increases ([Tza80]). Partial or total monotone properties are also encountered in globally optimal design problems ([HJL89]).
The purpose of this survey paper is to summarize the recent progress on convexification methods for monotone optimization problems. In Section 2, we discuss the convexification schemes for monotone functions. In Section 3, we first establish the equivalence between problem (P) and its convexified problem. An outer approximation method for the transformed convex maximization problem is then described. The polyblock outer approximation method is presented in Section 4. In Section 5, a hybrid method that combines partition, convexification and local search is described. Finally, concluding remarks with some suggestions for further studies are given in Section 6.

2 Monotonicity and convexity


Monotonicity and convexity are two closely related yet different properties of
a real function in classical convex analysis. One of the interesting questions
is whether or not a nonconvex monotone function can be transformed into
a convex function via certain variable transformations. Since linear transfor-
mation does not change the convexity of a real function, we have to appeal
to nonlinear transformation for converting a monotone function into a convex
function.
To motivate the convexification method for general monotone functions, let us consider a univariate function $f(x)$. Suppose that $f(x)$ is a strictly increasing function and $t(x)$ is a strictly monotone and convex function. Define a composite function $g(x) = f(t(x))$. If $f$ and $t$ are twice differentiable, then $g'(x) = f'(t(x))\,t'(x)$ and

$$ g''(x) = f''(t(x))\,[t'(x)]^2 + f'(t(x))\,t''(x). $$

Thus, $g(x)$ is a convex function if and only if

$$ f''(t(x))\,[t'(x)]^2 + f'(t(x))\,t''(x) \ge 0 \quad \text{for all } x. \tag{1} $$

Inequality (1) characterizes the condition for a nonlinear transformation $t$ to convexify a univariate twice differentiable increasing function via a variable transformation or domain transformation. We now turn to derive conditions for convexifying a multivariate monotone function. A function $f : D \to \mathbb{R}$ is said to be increasing (decreasing) on $D \subset \mathbb{R}^n$ if $f(x) \ge f(y)$ ($f(x) \le f(y)$) for any two vectors $x, y \in D$ whenever $x \ge y$. If the strict inequality holds, then $f$ is said to be strictly increasing (decreasing) on $D \subset \mathbb{R}^n$.
Let $\alpha, \beta \in \mathbb{R}^n$ with $0 < \alpha < \beta$. Denote $[\alpha, \beta] = \{x \in \mathbb{R}^n \mid \alpha \le x \le \beta\}$. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a one-to-one mapping. Define

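$$ f_t(y) = f(t(y)). \tag{2} $$

The domain of $f_t$ is $Y^t = t^{-1}(X)$. Define

$$ \sigma = \min\{ d^T \nabla^2 f(x)\, d \mid x \in [\alpha, \beta],\ \|d\|_2 = 1 \}, \tag{3} $$
$$ \mu = \min\left\{ \frac{\partial f}{\partial x_j} \;\middle|\; x \in [\alpha, \beta],\ j = 1, \dots, n \right\}. \tag{4} $$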

We have the following theorem on convexification.


Theorem 1. ([LSM05]) Assume that $f$ is a twice differentiable and strictly increasing function on $[\alpha, \beta]$ and $\mu > 0$. Suppose that $t(y) = (t_1(y_1), \dots, t_n(y_n))$ is a separable mapping, and each $t_j$ is twice differentiable and strictly monotonic. If $t$ satisfies the following condition:

$$ \frac{t_j''(y_j)}{[t_j'(y_j)]^2} \ge -\frac{\sigma}{\mu}, \quad \text{for } y_j \in Y_j^t = t_j^{-1}([\alpha_j, \beta_j]),\ j = 1, \dots, n, \tag{5} $$

then $f_t(y)$ is a convex function on any convex subset of $Y^t$.


Similarly, a strictly decreasing function can also be converted into a convex function via a variable transformation satisfying:

$$ \frac{t_j''(y_j)}{[t_j'(y_j)]^2} \le -\frac{\sigma}{\mu}, \quad \text{for } y_j \in Y_j^t,\ j = 1, \dots, n. $$

There are many specific mappings that satisfy condition (5). In particular, consider the following two functions:

$$ t_j(y_j) = \frac{1}{p} \ln\left(1 - \frac{1}{y_j}\right), \quad p > 0,\ j = 1, \dots, n, \tag{6} $$
$$ t_j(y_j) = y_j^{-1/p}, \quad p > 0,\ j = 1, \dots, n. \tag{7} $$

Corollary 1. Let $p_1 = \max\{0, -\sigma/\mu\}$ and $p_2 = \max\{0, -(\bar{\beta}\sigma)/\mu - 1\}$, where $\bar{\beta} = \max_{1 \le j \le n} \beta_j$. Then, the mapping $t$ with $t_j$ defined by (6) satisfies condition (5) when $p \ge p_1$, and the mapping $t$ with $t_j$ defined by (7) satisfies condition (5) when $p \ge p_2$.
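
As a quick check of Corollary 1, the ratio in condition (5) can be computed directly for the two mappings. The derivation below is only a sketch. It assumes $\sigma < 0$ (otherwise any $p > 0$ works), $\mu > 0$, and that the two mappings read $t_j(y_j) = \frac{1}{p}\ln(1 - 1/y_j)$ and $t_j(y_j) = y_j^{-1/p}$. The symbol $r_j$ and the notation $\bar{\beta} = \max_j \beta_j$ are used here for illustration.

```latex
% Mapping (6): t_j(y_j) = (1/p) ln(1 - 1/y_j).
% Since x_j = t_j(y_j) > 0, we have 1 - 1/y_j > 1, i.e. y_j < 0.
\begin{align*}
t_j'(y_j) &= \frac{1}{p\,y_j(y_j-1)}, \qquad
t_j''(y_j) = \frac{1-2y_j}{p\,y_j^2(y_j-1)^2}, \\
r_j(y_j) &:= \frac{t_j''(y_j)}{[t_j'(y_j)]^2} = p\,(1-2y_j) \;\ge\; p .
\end{align*}
% Hence r_j >= -sigma/mu as soon as p >= -sigma/mu, which is the threshold p_1.

% Mapping (7): t_j(y_j) = y_j^{-1/p}, so that y_j^{1/p} = 1/x_j.
\begin{align*}
t_j'(y_j) &= -\frac{1}{p}\,y_j^{-1/p-1}, \qquad
t_j''(y_j) = \frac{1}{p}\Bigl(\frac{1}{p}+1\Bigr)\,y_j^{-1/p-2}, \\
r_j(y_j) &= (1+p)\,y_j^{1/p} = \frac{1+p}{x_j} \;\ge\; \frac{1+p}{\bar{\beta}} .
\end{align*}
% Hence r_j >= -sigma/mu as soon as p >= -(\bar{\beta} sigma)/mu - 1, the threshold p_2.
```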
For illustration, let us consider a one-dimensional function:

$$ f(x) = (x-2)^3 + 2x, \quad x \in X = [1, 3]. $$

Note that $f(x)$ is a nonconvex and strictly increasing function. The plot of $f(x)$ is shown in Figure 1. We have $f'(x) = 3(x-2)^2 + 2 \ge 2$ and $f''(x) = 6(x-2) \ge -6$ for $x \in [1, 3]$. Take $t(y) = (1/p)\ln(1 - 1/y)$ in (2). By Corollary 1, $p_1 = -(-6)/2 = 3$. So, any $p \ge 3$ guarantees the convexity of $f_t(y)$ on $Y^t = [-1/(e^{p} - 1), -1/(e^{3p} - 1)]$. Figure 2 shows the convexified function $f_t(y)$ with $p = 3$. In practice, $p$ can be chosen much smaller than the bound defined in Corollary 1.
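
The one-dimensional example can be checked numerically. The following Python sketch is an illustration only: it assumes the mapping $t(y) = (1/p)\ln(1 - 1/y)$ with $p = 3$, and the helper `second_difference` is a hypothetical name introduced here. It compares the smallest central second difference of $f$ on $[1, 3]$ with that of $f_t$ on $Y^t$; a negative value signals nonconvexity, while a convex function has only nonnegative second differences.

```python
import math

P = 3.0  # convexification parameter; Corollary 1 gives p_1 = 3 for this example


def f(x):
    """Original nonconvex, strictly increasing function on [1, 3]."""
    return (x - 2.0) ** 3 + 2.0 * x


def t(y, p=P):
    """Domain transformation x = t(y) = (1/p) ln(1 - 1/y), defined for y < 0."""
    return math.log(1.0 - 1.0 / y) / p


def f_t(y, p=P):
    """Transformed function f_t(y) = f(t(y))."""
    return f(t(y, p))


def second_difference(func, a, b, n=2000):
    """Smallest central second difference of func on a uniform grid of [a, b]."""
    h = (b - a) / n
    return min(
        func(a + (i - 1) * h) - 2.0 * func(a + i * h) + func(a + (i + 1) * h)
        for i in range(1, n)
    )


# Y^t = [-1/(e^p - 1), -1/(e^{3p} - 1)] is the preimage of X = [1, 3] under t.
y_lo = -1.0 / (math.exp(P * 1.0) - 1.0)
y_hi = -1.0 / (math.exp(P * 3.0) - 1.0)

min_d2_f = second_difference(f, 1.0, 3.0)      # negative: f is not convex
min_d2_ft = second_difference(f_t, y_lo, y_hi)  # positive: f_t is convex on the grid
print("f:", min_d2_f, " f_t:", min_d2_ft)
```

A grid test of this kind cannot prove convexity, but it is a cheap sanity check when experimenting with smaller values of $p$.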

Fig. 1. The plot of $f(x)$.

Range transformation can also be incorporated into the convexification formulation (2) to enhance the convexification effect. Let $T$ be a strictly increasing and convex function on $\mathbb{R}$. Define

$$ f_{T,t}(y) = T(f(t(y))). \tag{8} $$

Certain conditions [SML01] can be derived for $f_{T,t}(y)$ to be a convex function on $Y^t$. Typical range transformation functions are $T(z) = e^{rz}$ and $T(z) = z^{p}$, where $r > 0$ and $p > 0$ are parameters. One advantage of using both range and domain transformations is to reduce possible ill-conditioning caused by the convexification process.

Fig. 2. The plot of $f_t(y)$.

Theorem 1 was generalized in [SLL04] to convexify a class of nonsmooth functions. Let $\partial f(x)$ denote the set of Clarke's generalized gradient of $f$ at $x$ and $\partial_w f(x)$ the set of Clarke's generalized gradient in direction $w$. Denote by $f''(x, v, w)$ Chaney's second-order derivative of $f$ at $x$ (cf. [Cha85]).
Theorem 2. ([SLL04]) Assume in (2) that
(i) $f$ is semismooth and regular on $X$.
(ii) $f$ is a strictly increasing function on $X$ and

$$ \min_{i = 1, \dots, n} \inf\{\xi_i \mid \xi = (\xi_1, \dots, \xi_n)^T \in \partial f(x),\ x \in X\} \ge \epsilon > 0, \tag{9} $$
$$ \sigma = \inf\{ f''(x, v, w) \mid x \in X,\ \|w\|_2 = 1,\ v \in \partial_w f(x) \} > -\infty. \tag{10} $$

(iii) $t(y) = (t_1(y_1), \dots, t_n(y_n))$ and $t_j$, $j = 1, \dots, n$, are twice differentiable and strictly monotone convex functions satisfying

$$ \frac{t_j''(y_j)}{[t_j'(y_j)]^2} \ge -\frac{\sigma}{\epsilon}, \quad \forall y_j \in Y_j^t,\ j = 1, \dots, n. \tag{11} $$

Then $f_t(y)$ defined in (2) is a convex function on any convex subset of $Y^t$.
Note that convex functions, $C^1$ functions, and the pointwise maximum or minimum of $C^1$ functions are semismooth. Furthermore, certain composite semismooth functions are also semismooth (see [Muf77]).
The idea of convexifying a nonconvex function via both domain transformation and range transformation can be traced back to the 1950s. Convex (or concave) transformable functions were introduced in [Fen51]. Let $f$ be defined on a convex subset $C \subset \mathbb{R}^n$. $f$ is said to be convex range transformable or $F$-convex if there exists a continuous strictly increasing function $F$ such that $F(f(x))$ is convex. The concept of domain transformation for general nonconvex functions was introduced in [Ben77]. $f$ is said to be $h$-convex if a continuous one-to-one mapping $h$ exists such that $f(h^{-1}(y))$ is convex on the domain $h(C)$. $f$ is said to be $(h, F)$-convex if $f(h^{-1}(y))$ is $F$-convex on $h(C)$. A special class of $h$-convex functions is the posynomial function defined as

$$ f(x) = \sum_{i=1}^{q} c_i \prod_{j=1}^{n} x_j^{a_{ij}}, $$

with $c_i > 0$, $a_{ij} \in \mathbb{R}$ and $q$ a positive integer. A simple convexification transformation is readily obtained by taking $x_j = e^{y_j}$, $j = 1, \dots, n$. Previous research work on convexification has led to results on transforming a nonconvex programming problem into a convex programming problem. In particular, geometric programming and fractional programming are two classes of nonconvex optimization problems that can be convexified. A survey of applications of $F$-convexity and $h$-convexity and convex approximation in nonlinear programming was given in [Hor84].
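
The posynomial case is easy to try out. The Python sketch below is an illustration under stated assumptions: the coefficients and exponents are hypothetical values chosen here, and `midpoint_gap` is a helper name introduced for this example. It shows a pair of points where the posynomial violates midpoint convexity in $x$, and then samples random pairs in $y$ after the substitution $x_j = e^{y_j}$, where each term becomes the exponential of a linear function and the sum is convex.

```python
import math
import random

# Hypothetical posynomial with q = 2 terms in n = 2 variables (illustration only).
COEFFS = [2.0, 1.0]                      # c_i > 0
EXPONENTS = [(0.5, -1.0), (-2.0, 1.5)]   # a_ij may be any real numbers


def posynomial(x):
    """f(x) = sum_i c_i * prod_j x_j ** a_ij for x > 0."""
    return sum(
        c * math.prod(xj ** a for xj, a in zip(x, row))
        for c, row in zip(COEFFS, EXPONENTS)
    )


def transformed(y):
    """f(e^{y_1}, ..., e^{y_n}), i.e. the posynomial after x_j = exp(y_j)."""
    return posynomial([math.exp(yj) for yj in y])


def midpoint_gap(func, u, v):
    """Average of the endpoint values minus the midpoint value (>= 0 if func is convex)."""
    mid = [(a + b) / 2.0 for a, b in zip(u, v)]
    return (func(u) + func(v)) / 2.0 - func(mid)


# In the original variables the function is not convex: this gap is negative.
gap_x = midpoint_gap(posynomial, [1.0, 1.0], [100.0, 1.0])

# In the transformed variables every sampled gap is nonnegative (up to rounding).
random.seed(0)
gaps_y = [
    midpoint_gap(
        transformed,
        [random.uniform(-2, 2), random.uniform(-2, 2)],
        [random.uniform(-2, 2), random.uniform(-2, 2)],
    )
    for _ in range(1000)
]
print(gap_x, min(gaps_y))
```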
Convexifying monotone functions was inspired by the success of convexifying a perturbation function in nonconvex optimization. Li [Li95] first introduced a $p$-th power method for convexifying the perturbation function of a nonconvex optimization problem (see also [LS01]). In [Li96], the $p$-th power method was applied to convexify the noninferior frontier in multi-objective programming. Two $p$-th power transformation schemes were proposed in [LSBG01] for convexifying monotone functions:

$$ f_p^1(y) = -[f(y^{1/p})]^{-p}, \quad p > 0, \tag{12} $$
$$ f_p^2(y) = [f(y^{1/p})]^{p}, \quad p > 0. \tag{13} $$

It is shown in [LSBG01] that if $f(x)$ is a strictly increasing function, then $f_p^1(y)$ is a concave function for sufficiently large $p$, and if $f(x)$ is a strictly decreasing function, then $f_p^2(y)$ is a convex function for sufficiently large $p$. Another class of convexification transformations is defined as follows:

$$ f_p(y) = T(p f(t(y))), \quad p > 0, \tag{14} $$

where $t$ is a one-to-one mapping without parameter. Conditions for convexifying $f$ via transformation (14) were derived in [SML01]. Obviously, (14) is a special case of the general formulation (8). A more general transformation that includes (12), (13) and (14) as special cases was proposed in [WBZ05]. The convexification method was used in [LWLYZ05] to identify a class of hidden convex optimization problems.

3 Monotone optimization and concave minimization


3.1 Equivalence to concave minimization

Given a mapping $t : \mathbb{R}^n \to \mathbb{R}^n$, we now consider the following transformed problem of (P):

$$
\begin{array}{ll}
\max & \phi(y) = f(t(y)) \\
\text{s.t.} & \psi_i(y) = g_i(t(y)) \le b_i, \quad i = 1, \dots, m, \\
 & y \in Y^t,
\end{array}
\tag{15}
$$

where $t : Y^t \to X$ is an onto mapping with $X = t(Y^t)$. Denote by $S$ and $S_t$ the feasible regions of problem (P) and problem (15), respectively, i.e.

$$ S = \{x \in X \mid g_i(x) \le b_i,\ i = 1, \dots, m\}, \tag{16} $$
$$ S_t = \{y \in Y^t \mid \psi_i(y) \le b_i,\ i = 1, \dots, m\}. \tag{17} $$

The following theorem establishes the equivalence between the monotone op-
timization problem (P) and the transformed problem (15).

Theorem 3. ([SML01])
(i) $y^* \in Y^t$ is a global optimal solution to problem (15) if and only if $x^* = t(y^*)$ is a global optimal solution to problem (P).
(ii) If $t^{-1}$ exists and both $t$ and $t^{-1}$ are continuous mappings, then $y^* \in Y^t$ is a local optimal solution to problem (15) if and only if $x^* = t(y^*)$ is a local optimal solution to problem (P).

Combining Theorem 3 with Theorems 1-2 implies that if $t$ in (15) is a one-to-one mapping satisfying the conditions in Theorem 1 or 2, then the monotone optimization problem (P) is equivalent to the convex maximization (or concave minimization) problem (15). In particular, when $t_j$ takes the form of (6) or (7) and the parameter $p$ is greater than a certain threshold value, the transformed problem (15) is a concave minimization problem.

3.2 Outer approximation algorithm for concave minimization problems

Concave minimization is a class of global optimization problems studied intensively in the literature. It is well known that a convex function always achieves its maximum over a bounded polyhedron at one of its vertices. Ranking the function values at all vertices of the polyhedron gives an optimal solution. For a convex maximization (or concave minimization) problem with a general convex feasible set, Hoffman [Hof81] proposed an outer approximation algorithm. The convex objective function is successively maximized on a sequence of polyhedra that enclose the feasible region. At each iteration the current enclosing polyhedron is refined by adding a cutting plane tangential to the feasible region at a boundary point. The algorithm generates a nonincreasing sequence of upper bounds for the optimal value of problem (15) and terminates when the current feasible solution is within a given tolerance of the optimal solution.
An outer approximation procedure for problem (15) can be described briefly as follows:

Algorithm 1 (Polyhedral Outer Approximation Method).

Step 1. Choose an initial polyhedron $P_0$ that contains $S_t$ with vertex set $V_0$ and set $k = 0$.
Step 2. Compute $v^k$ and $\phi^k$ such that $\phi^k = \phi(v^k) = \max_{v \in V_k} \phi(v)$, i.e., $v^k$ is the best vertex in the current enclosing polyhedron.
Step 3. Find a feasible point $y^k$ on the boundary of $S_t$. Let $i$ be such that $\psi_i(y^k) = b_i$. Form a new polyhedron $P_{k+1}$ by adding a cutting plane inequality: $\xi_k^T (y - y^k) \le 0$, where $\xi_k$ is a subgradient of the binding constraint $\psi_i$ at $y^k$.
Step 4. Calculate the vertex set $V_{k+1}$ of $P_{k+1}$. Set $k := k + 1$, return to Step 2.

It was shown in [Hof81] that the above method converges to a global optimal solution to problem (15). In implementation, the above procedure can be terminated when $\phi^k - \phi(y^k) \le \epsilon$, where $\epsilon > 0$ is a given tolerance. There are many ways to generate the feasible point $y^k$ in Step 3. A simple method proposed in [Hof81] is to find the (relative) boundary point of $S_t$ on the line connecting $v^k$ and a fixed (relative) interior point of $S_t$. Horst and Tuy [HT93] suggested projecting $v^k$ onto the boundary of $S_t$ and choosing $y^k$ to be the projected point. Finding the vertices of $P_{k+1}$ is the major computational burden in the outer approximation method. After adding a cutting plane $\{y \mid \xi_k^T (y - y^k) = 0\}$, the new vertices can be generated by computing the intersection point of each edge of $P_k$ with the cutting plane. Techniques for computing new vertices resulting from the intersection of a polyhedron with a cutting plane are discussed in [CHJ91, HV88].
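
The steps of the polyhedral outer approximation method can be sketched in two dimensions, where the enclosing polyhedron is a polygon and the new vertices are obtained by clipping it against the cutting plane. The Python code below is a minimal illustration, not the implementation of [Hof81]. The objective `phi`, the elliptical feasible set `psi(y) <= 1`, the interior point at the origin and the starting box are all assumptions chosen here so that the global maximum (value 9 at the point (2, 0)) is known in advance.

```python
def phi(y):
    """Convex objective to be maximized (illustrative choice)."""
    return (y[0] + 1.0) ** 2 + y[1] ** 2


def psi(y):
    """Convex constraint function; the feasible set is psi(y) <= 1 (an ellipse)."""
    return y[0] ** 2 / 4.0 + y[1] ** 2


def grad_psi(y):
    return (y[0] / 2.0, 2.0 * y[1])


def boundary_point(v, iters=60):
    """Bisection on the segment from the interior point (0, 0) to an infeasible v."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi((mid * v[0], mid * v[1])) <= 1.0:
            lo = mid
        else:
            hi = mid
    return (lo * v[0], lo * v[1])


def clip(polygon, xi, y0):
    """Keep the part of a convex polygon where xi^T (y - y0) <= 0."""
    def side(p):
        return xi[0] * (p[0] - y0[0]) + xi[1] * (p[1] - y0[1])

    result = []
    for a, b in zip(polygon, polygon[1:] + polygon[:1]):
        sa, sb = side(a), side(b)
        if sa <= 0.0:
            result.append(a)
        if (sa < 0.0 < sb) or (sb < 0.0 < sa):  # the edge crosses the cutting line
            s = sa / (sa - sb)
            result.append((a[0] + s * (b[0] - a[0]), a[1] + s * (b[1] - a[1])))
    return result


def outer_approximation(eps=1e-4, max_iter=500):
    # Step 1: initial polygon P_0 (a box) that contains the ellipse.
    polygon = [(-3.0, -3.0), (3.0, -3.0), (3.0, 3.0), (-3.0, 3.0)]
    for k in range(max_iter):
        v = max(polygon, key=phi)              # Step 2: best vertex
        if psi(v) <= 1.0:                      # a feasible best vertex is optimal
            return v, phi(v), k
        y = boundary_point(v)                  # Step 3: boundary point on [0, v]
        if phi(v) - phi(y) <= eps:             # upper bound minus lower bound
            return y, phi(y), k
        polygon = clip(polygon, grad_psi(y), y)  # Steps 3-4: cut, then new vertices
    raise RuntimeError("iteration limit reached")


y_star, value, iterations = outer_approximation()
print(y_star, value, iterations)
```

The objective also has a local, non-global maximum on the ellipse at (-2, 0) with value 1, which a purely local method could return. In higher dimensions the clipping step must be replaced by a proper vertex enumeration routine.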
Let us consider a small example to illustrate the convexification and outer
approximation method.
Example 1.

$$
\begin{array}{ll}
\max & f(x) = 4.5\,(1 - 0.40^{x_1 - 1})(1 - 0.40^{x_2 - 1}) + 0.2 \exp(x_1 + x_2 - 7) \\
\text{s.t.} & g_1(x) = 5 x_1 x_2 - 4 x_1 - 4.5 x_2 \le 32, \\
 & x \in X = \{x \mid 2 \le x_1 \le 6.2,\ 2 \le x_2 \le 6\}.
\end{array}
$$

It is clear that $f$ and $g_1$ are strictly increasing functions on $X$. The problem has three local optimal solutions: $x_{loc}^1 = (2.2692, 6)^T$ with $f(x_{loc}^1) = 3.7735$, $x_{loc}^2 = (3.4528, 3.5890)^T$ with $f(x_{loc}^2) = 3.857736$ and $x_{loc}^3 = (6.2, 2.1434)^T$ with $f(x_{loc}^3) = 3.6631$. Figure 3 shows the feasible region of the example. It is clear that the global optimal solution $x_{loc}^2$ is not on the boundary of the convex hull of the nonconvex feasible region $S$. Take $t$ to be the convexification transformation (6) with $p = 2$. The convexified feasible region is shown in Figure 4. Set $\epsilon = 10^{-4}$. The outer approximation procedure finds an approximate global optimal solution $y^* = (-0.21642, -0.19934)$ of (15) after 17 iterations and generating 36 vertices. The point $y^*$ corresponds to $x^* = (3.45290, 3.58899)$, an approximate optimal solution to Example 1 with $f(x^*) = 3.857736887$.
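
The reported local solutions of Example 1 can be checked by direct evaluation. The Python sketch below assumes that the objective and constraint read $f(x) = 4.5(1 - 0.40^{x_1-1})(1 - 0.40^{x_2-1}) + 0.2\exp(x_1 + x_2 - 7)$ and $g_1(x) = 5x_1x_2 - 4x_1 - 4.5x_2$. The dictionary keys are labels introduced here, and the coordinates are the four-decimal values quoted in the text, so small discrepancies in the last digits are expected.

```python
import math


def f(x1, x2):
    """Objective of Example 1 as read from the text."""
    return (4.5 * (1 - 0.40 ** (x1 - 1)) * (1 - 0.40 ** (x2 - 1))
            + 0.2 * math.exp(x1 + x2 - 7))


def g1(x1, x2):
    """Left-hand side of the constraint g1(x) <= 32."""
    return 5 * x1 * x2 - 4 * x1 - 4.5 * x2


# Local solutions as reported in the text (rounded coordinates).
reported = {
    "x_loc^1": (2.2692, 6.0),
    "x_loc^2": (3.4528, 3.5890),
    "x_loc^3": (6.2, 2.1434),
}

# For each point: objective value and constraint value (the constraint should be
# nearly active, i.e. close to 32, because the solutions lie on the boundary).
results = {name: (f(*x), g1(*x)) for name, x in reported.items()}
for name, (fx, gx) in results.items():
    print(f"{name}: f = {fx:.4f}, g1 = {gx:.4f}")
```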

Fig. 3. Feasible region of Example 1.

4 Polyblock outer approximation method


Polyblock approximation methods for monotone optimization were proposed in [RTM01, Tuy00, TL00]. A polyblock is a union of a finite number of boxes $[a, z]$, where the point $a$ is called the lower corner point and each point $z \in V$ is called a vertex of the polyblock, with $V$ being a finite set in $\mathbb{R}^n$. The polyblock outer approximation method is based on the following two key observations: (i) the feasible region $S$ of (P) can be approximated from outside by a polyblock, regardless of whether $S$ is convex or nonconvex, and (ii) any increasing function achieves its maximum on a polyblock at one of its vertices. These two properties are analogous to those of the polyhedral outer approximation in concave minimization. Recall that a convex set can be approximated from outside by a polyhedron and any convex function achieves its maximum on a polyhedron at one of its vertices.

Fig. 4. Convexified feasible region with $p = 2$.

A polyblock outer approximation method can be developed for monotone optimization by successively constructing polyblocks that cover the feasible region $S$. The algorithm first uses $[l, u]$ as the initial polyblock. At the $k$-th iteration, let $z^k$ be the vertex with the maximum objective function value among all the vertices of the enclosing polyblock. A boundary point $x^k$ of $S$ on the line connecting $l$ and $z^k$ is calculated. The polyblock approximation is refined by cutting the box $(x^k, z^k]$ from $[l, z^k]$. A set of $n$ new vertices is then generated by setting, in turn, one of the components equal to that of $x^k$ and the other components equal to those of $z^k$. The iteration process repeats until the difference between the upper bound (the maximum objective value of the vertices) and the lower bound (the objective value of the current best boundary point) is within a given tolerance. A vertex $z$ is called improper if there exists another vertex $w$ of the polyblock such that $z \le w$ with at least one component satisfying $z_i < w_i$. By the monotonicity of the problem, any improper vertex generated during the polyblock approximation process can be deleted.
Let $S_\epsilon = S \cap [l + \epsilon e, u]$, where $\epsilon > 0$ and $e = (1, \dots, 1)^T$. A feasible solution $x^*$ is said to be an $\epsilon$-optimal solution to (P) if $x^* \in \arg\max\{f(x) \mid x \in S_\epsilon\}$. A feasible solution $x^*$ is said to be an $(\epsilon, \eta)$-optimal solution to (P) if $f(x^*) \ge f^* - \eta$, where $f^*$ is the global maximum of $f(x)$ over $S_\epsilon$. It is easy to see that both $\epsilon$-optimal and $(\epsilon, \eta)$-optimal solutions can be regarded as approximate optimal solutions to (P).

Algorithm 2 (Polyblock Approximation Algorithm).

Step 0 (Initialization). Choose tolerance parameters $\epsilon > 0$ and $\eta > 0$. If $l$ is infeasible then (P) has no feasible solution, stop. If $u$ is feasible then $u$ is the optimal solution to (P), stop. Otherwise, set $\bar{x}^0 = l$, $V_1 = \{u\}$ and $k = 1$.
Step 1. Compute

$$ z^k \in \arg\max\{f(z) \mid z \in V_k,\ z \ge l + \epsilon e\}. $$

If $z^k \in S$, stop and $z^k$ is an $\epsilon$-optimal solution to (P).
Step 2. Compute a boundary point $x^k$ of $S$ on the line linking $l$ and $z^k$. Set $\bar{x}^k \in \arg\max\{f(x) \mid x \in \{\bar{x}^{k-1}, x^k\}\}$. If $f(\bar{x}^k) \ge f(z^k) - \eta$, stop and $\bar{x}^k$ is an $(\epsilon, \eta)$-approximate solution to problem (P).
Step 3. Compute the $n$ new vertices of the box $[x^k, z^k]$ that are adjacent to $z^k$:

$$ z^{k,i} = z^k - (z_i^k - x_i^k)\, e^i, \quad i = 1, \dots, n, $$

where $e^i$ is the $i$-th unit vector of $\mathbb{R}^n$. Set

$$ V_{k+1} = (V_k \setminus \{z^k\}) \cup \{z^{k,1}, \dots, z^{k,n}\}. $$

Then remove all improper vertices from $V_{k+1}$.
Step 4. Set $k := k + 1$, return to Step 1.

Remark 1. We note that in Algorithm 2 at most $n$ new vertices are added to the vertex set $V_k$ at each iteration. However, the number of vertices accumulated during the iteration process could be so large that storing all vertices is prohibitive from the computational point of view. To avoid such a storage problem, a restarting strategy can be adopted. Specifically, Step 4 can be replaced by the following two steps:
Step 4. If $|V_{k+1}| \le N$ ($N$ is the critical size of the vertex set), then set $k := k + 1$ and return to Step 1. Otherwise go to Step 5.
Step 5. Redefine $V_{k+1} = \{u - (u_i - \bar{x}_i^k)\, e^i,\ i = 1, \dots, n\}$. Set $k := k + 1$ and return to Step 1.

It was shown that Algorithm 2 either stops at an $\epsilon$-optimal solution in Step 1 or stops at an $(\epsilon, \eta)$-approximate solution after a finite number of iterations (see [RTM01]).
Figure 5 illustrates the first 3 iterations of the polyblock approximation for Example 1. Using the same accuracy $\epsilon = 10^{-4}$ as in Example 1, the method finds an $\epsilon$-approximate global solution $x_{best} = (3.4526, 3.5890)^T$ with $f(x_{best}) = 3.857736$ after 359 iterations and generating 718 vertices.
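
The polyblock iteration can be sketched compactly. The Python code below is a simplified reading of Algorithm 2 applied to Example 1, not the authors' implementation. It assumes the objective and constraint of Example 1 as read from the text, uses bisection for the boundary point, and only tests the new vertices for being improper. The tolerance `eta = 1e-3` and the iteration cap are illustrative choices, so the iteration count will differ from the figures quoted above.

```python
import math

L = (2.0, 2.0)   # lower corner l of the box X
U = (6.2, 6.0)   # upper corner u of the box X
B = 32.0         # right-hand side of the constraint


def f(x):
    return (4.5 * (1 - 0.40 ** (x[0] - 1)) * (1 - 0.40 ** (x[1] - 1))
            + 0.2 * math.exp(x[0] + x[1] - 7))


def g(x):
    return 5 * x[0] * x[1] - 4 * x[0] - 4.5 * x[1]


def feasible(x):
    return g(x) <= B


def boundary_point(z, iters=60):
    """Bisection for the last feasible point on the segment from l to an infeasible z."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(tuple(l + mid * (zi - l) for l, zi in zip(L, z))):
            lo = mid
        else:
            hi = mid
    return tuple(l + lo * (zi - l) for l, zi in zip(L, z))


def polyblock(eps=1e-4, eta=1e-3, max_iter=5000):
    vertices = [U]   # V_1 = {u}; u is infeasible for this example
    x_best = L       # l is feasible for this example
    for k in range(1, max_iter + 1):
        # Step 1: best vertex among those with z >= l + eps * e.
        candidates = [z for z in vertices
                      if all(zi >= li + eps for zi, li in zip(z, L))]
        if not candidates:
            return x_best, k
        z = max(candidates, key=f)
        if feasible(z):
            return z, k
        # Step 2: boundary point, lower bound update and stopping test.
        x = boundary_point(z)
        if f(x) > f(x_best):
            x_best = x
        if f(x_best) >= f(z) - eta:
            return x_best, k
        # Step 3: replace z by its n children and drop improper (dominated) children.
        n = len(z)
        children = [tuple(x[j] if j == i else z[j] for j in range(n)) for i in range(n)]
        rest = [v for v in vertices if v != z]
        proper = [c for c in children
                  if not any(all(cj <= wj for cj, wj in zip(c, w)) for w in rest)]
        vertices = rest + proper
    raise RuntimeError("iteration limit reached")


x_best, iterations = polyblock()
print(x_best, f(x_best), iterations)
```

Because every vertex dominates the part of the feasible region it covers, the largest vertex value is an upper bound on the optimum, and the returned point is feasible with an objective value within `eta` of that bound.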
Algorithm 2 can be extended to deal with monotone optimization problems with an additional reverse monotone constraint $h(x) \ge c$, where $h(x)$ is an increasing function (see [Tuy00, TL00]). Various applications of monotone optimization can be found in [Tuy00].

Fig. 5. Polyblock approximation for Example 1.

5 A hybrid method

Despite its relatively easy implementation, the polyblock approximation method may suffer from slow convergence due to the poor quality of its upper bounds, as witnessed in illustrative examples and computational experiments. The convexification method discussed in Sections 2 and 3, on the other hand, is essentially a polyhedral approximation method for solving the transformed concave minimization problem. Therefore, it may suffer from the rapid (exponential) increase in the number of polyhedral vertices generated by the outer approximation, as is the case for any polyhedral outer approximation method for concave minimization (see [Ben96, HT93]). Moreover, it is difficult to determine a suitable convexification parameter that controls the degree of convexity of the functions on a large domain. A large parameter may lead to an ill-conditioned transformed problem.
To overcome the computational difficulties of the convexification method, a hybrid method was developed in [SL04] to incorporate three basic strategies, namely partition, convexification and local search, into a branch-and-bound framework. The partition scheme is used to decompose the domain $X$ into a union of subboxes. The union of these subboxes forms a generalized polyblock that covers the boundary of the feasible region. Figure 6 illustrates the partition process for Example 1. To obtain a better upper bound on each subbox, the convexification method is used to construct a polyhedral outer approximation, thus enabling more efficient node fathoming and speeding up the convergence of the branch-and-bound process. A local search procedure is employed to improve the lower bound of the optimal value. Since only an approximate solution is needed in the upper bounding procedure, the number of polyhedral vertices can be limited and controlled. Moreover, as the domain shrinks during the branch-and-bound process, convexity can be achieved with a smaller parameter, thus avoiding ill-conditioning of the transformed subproblems.

Fig. 6. Partition process for Example 1.

Consider a subproblem of (P) obtained by replacing $X = [l, u]$ with a subbox $[\alpha, \beta] \subseteq X$:

$$
\begin{array}{lll}
(SP) & \max & f(x) \\
 & \text{s.t.} & g_i(x) \le b_i, \quad i = 1, \dots, m, \\
 & & x \in [\alpha, \beta].
\end{array}
$$

Let $x_b$ be the boundary point of $S$ on the line connecting $\alpha$ and $\beta$. By the monotonicity of $f$ and the $g_i$'s, there are no better feasible points than $x_b$ in $[\alpha, x_b)$ and there are no feasible points in $(x_b, \beta]$. Therefore, the two boxes $[\alpha, x_b)$ and $(x_b, \beta]$ can be removed from $[\alpha, \beta]$ without missing any optimal solution of (SP) (cf. Figure 6). The following lemma shows that the set of points left in $[\alpha, \beta]$ after removing $[\alpha, x_b)$ and $(x_b, \beta]$ can be partitioned into at most $2n - 2$ subboxes.

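Lemma 1. ([SL04]) Let $\alpha < \beta$. Denote $A = [\alpha, \beta]$, $B = [\alpha, \gamma)$ and $C = (\gamma, \beta]$. Then $A \setminus (B \cup C)$ can be partitioned into $2n - 2$ subboxes:

$$
\begin{aligned}
A \setminus (B \cup C) = {} & \left\{ \bigcup_{i=2}^{n} \left( \prod_{k=1}^{i-1} [\alpha_k, \gamma_k] \times [\gamma_i, \beta_i] \times \prod_{k=i+1}^{n} [\alpha_k, \beta_k] \right) \right\} \\
& \cup \left\{ \bigcup_{i=2}^{n} \left( \prod_{k=1}^{i-1} [\gamma_k, \beta_k] \times [\alpha_i, \gamma_i] \times \prod_{k=i+1}^{n} [\alpha_k, \beta_k] \right) \right\}.
\end{aligned}
\tag{18}
$$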
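
The box partition of Lemma 1 is straightforward to generate. The Python sketch below is an illustration that assumes the following reading of (18): for $i = 2, \dots, n$ one box takes $[\alpha_k, \gamma_k]$ for $k < i$, $[\gamma_i, \beta_i]$ in coordinate $i$ and $[\alpha_k, \beta_k]$ for $k > i$, and a second box swaps the roles of the lower and upper parts. The function names and the sample values of $\alpha$, $\beta$, $\gamma$ are hypothetical. A random sampling test then checks that a point of $A$ lies in some subbox exactly when it lies in neither $B$ nor $C$.

```python
import random


def partition(alpha, beta, gamma):
    """Return the 2n - 2 closed subboxes as lists of (lower, upper) intervals."""
    n = len(alpha)
    boxes = []
    for i in range(1, n):  # i = 2, ..., n in the 1-based notation of the lemma
        tail = [(alpha[k], beta[k]) for k in range(i + 1, n)]
        head_low = [(alpha[k], gamma[k]) for k in range(i)]
        head_high = [(gamma[k], beta[k]) for k in range(i)]
        boxes.append(head_low + [(gamma[i], beta[i])] + tail)
        boxes.append(head_high + [(alpha[i], gamma[i])] + tail)
    return boxes


def in_box(x, box):
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def in_removed_part(x, gamma):
    """True if x (assumed to be in A) lies in B = [alpha, gamma) or C = (gamma, beta]."""
    below = all(xi < gi for xi, gi in zip(x, gamma))
    above = all(xi > gi for xi, gi in zip(x, gamma))
    return below or above


# Hypothetical data for n = 4.
alpha = [0.0, 0.0, 0.0, 0.0]
beta = [1.0, 1.0, 1.0, 1.0]
gamma = [0.3, 0.5, 0.6, 0.2]
boxes = partition(alpha, beta, gamma)

random.seed(1)
samples = [[random.random() for _ in range(4)] for _ in range(20000)]
consistent = all(
    in_removed_part(x, gamma) != any(in_box(x, box) for box in boxes)
    for x in samples
)
print(len(boxes), consistent)
```

In the hybrid method, $\gamma$ plays the role of the boundary point on the diagonal of the current subbox, so each branching step produces at most $2n - 2$ children.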

Let $J = \{1, \dots, n\}$. The hybrid algorithm can be formally described as follows.

Algorithm 3 (Hybrid Algorithm for Monotone Optimization).

Step 0 (Initialization). Choose tolerance parameters $\epsilon > 0$ and $\eta > 0$. If $l$ is infeasible then problem (P) has no feasible solution, stop. If $u$ is feasible then $u$ is the optimal solution to (P), stop. Otherwise, set $x_{best} = l$, $f_{best} = f(x_{best})$, $f_u^1 = f(u)$, $\alpha^1 = l$, $\beta^1 = u$, $X^1 = \{[\alpha^1, \beta^1]\}$. Set $k = 1$.
Step 1 (Box Selection). Select the subbox $[\alpha^k, \beta^k] \in X^k$ with maximum upper bound $f_u^k$. Let $I^k = \{j \in J \mid \beta_j^k - \alpha_j^k \le \eta\}$ and $Q^k = J \setminus I^k$. If $Q^k = \emptyset$, stop, $x = \alpha^k$ is an $\eta$-optimal solution to problem (P).
Step 2 (Boundary Point). Set $X^k := X^k \setminus \{[\alpha^k, \beta^k]\}$. Find a boundary point $x^k$ of $S$ on the line connecting $\alpha^k$ and $\beta^k$.
Step 3 (Local Search). Starting from $x^k$, apply a local search procedure to find a local solution $x_{loc}^k$ of the subproblem on $[\alpha^k, \beta^k]$. If $f(x_{loc}^k) > f_{best}$, set $x_{best} = x_{loc}^k$, $f_{best} = f(x_{loc}^k)$.
Step 4 (Partition). Partition the set $R^k = [\alpha^k, \beta^k] \setminus ([\alpha^k, x^k) \cup (x^k, \beta^k])$ into $2|Q^k| - 2$ new subboxes using formulation (18). Let $X^{k+1}$ be the set of subboxes after adding the new subboxes to $X^k$. Remove all subboxes $[\gamma, \delta]$ in $X^{k+1}$ with $f(\delta) \le f_{best}$.
Step 5 (Upper Bounding). For each subbox $[\alpha, \beta]$ in $X^{k+1}$, apply the convexification and outer approximation (Algorithm 1) to find an upper bound $UB_{[\alpha, \beta]}$ of the objective function.
Step 6 (Fathoming). Remove all subboxes $[\alpha, \beta]$ in $X^{k+1}$ with $UB_{[\alpha, \beta]} \le f_{best}$. Let $f_u^{k+1}$ be the maximum upper bound of all the subboxes. If $f_u^{k+1} - f_{best} \le \epsilon$, then stop, $x_{best}$ is an $\epsilon$-optimal solution to (P). Otherwise, set $k := k + 1$, go to Step 1.

In the implementation of Step 5, the outer approximation iteration can be terminated upon finding a new vertex whose objective function value is less than or equal to the lower bound $f_{best}$. It was shown that Algorithm 3 either stops at an $\eta$-optimal solution in Step 1 or stops at an $\epsilon$-optimal solution in Step 6 within a finite number of iterations (see [SL04]). Details of computational considerations and extensive numerical results of Algorithm 3 were given in [SL04].

6 Conclusions
We have summarized in this paper basic ideas and results on convexification
methods for monotone optimization. Applying convexification to a monotone
optimization problem results in a concave minimization problem that can be
solved by the polyhedral outer approximation method. The polyblock approx-
imation method can also be viewed as an outer approximation method where
polyblocks are used to approximate the feasible region and upper bounds are
computed by ranking the extreme points of the polyblock. Integrating the
promising features of convexification schemes and the polyblock approxima-
tion method, the newly proposed branch-and-bound framework that combines
partition, convexification and local search is promising from the computational
point of view.
Among many interesting topics for the future research, we mention the
following three areas:
(i) D.I. functions (difference of two increasing functions) constitute a large class of nonconvex functions in global optimization (e.g., polynomials). Direct application of the convexification method to problems involving D.I. functions gives rise to a D.C. (difference of convex functions) optimization problem ([HT99]). It is of great interest to study efficient convexification methods for different types of D.I. programming problems and to develop efficient global optimization methods for the transformed D.C. problems.
(ii) Many real-world optimization models may only have partial monotonic-
ity. For example, the function is monotone with respect to some variables and
nonmonotone with respect to other variables, or the function is a sum of a
monotone function and a nonmonotone function. In global optimal design
problems ([HJL89]), partial monotonicity properties are often inherent in ob-
jective and constraint functions. How to exploit the partial monotonicity by
certain convexification scheme is an interesting topic for future study.
(iii) Many computational issues of the outer approximation method still
need to be further investigated. The major computation burden in the outer
approximation method is the computation and storage of the vertices of the
polyhedron containing the feasible region. Vertex elimination technique could
be a possible remedy for preventing a rapid increase of the number of vertices
of the outer approximation polyhedron.

7 Acknowledgement
This research was supported by the National Natural Science Foundation of
China under Grants 10271073 and 10261001, and the Research Grants Council
of Hong Kong under Grant CUHK 4214/01E.

References
[Ben77] Ben-Tal, A.: On generalized means and generalized convexity. Journal of
Optimization Theory and Applications, 21, 1-13 (1977)
[Ben96] Benson, H.P.: Deterministic algorithm for constrained concave minimiza-
tion: A unified critical survey. Naval Research Logistics, 43, 765-795
(1996)
[Cha85] Chaney, R.W.: On second derivatives for nonsmooth functions. Nonlinear Analysis: Theory, Methods and Applications, 9, 1189-1209 (1985)
[CHJ91] Chen, P.C., Hansen, P., Jaumard, B.: On-line and off-line vertex enumeration by adjacency lists. Operations Research Letters, 10, 403-409 (1991)
[Fen51] Fenchel, W.: Convex cones, sets and functions, mimeographed lecture
notes. Technical report, Princeton University, NJ, 1951
[HJL89] Hansen, P., Jaumard, B., Lu, S.H.: Some further results on monotonicity
in globally optimal design. Journal of Mechanisms, Transmissions, and
Automation Design, 111, 345-352 (1989)
[Hof81] Hoffman, K.L.A.: A method for globally minimizing concave functions over convex set. Mathematical Programming, 20, 22-32 (1981)
[Hor84] Horst, R.: On the convexification of nonlinear programming problems: An applications-oriented survey. European Journal of Operational Research, 15, 382-392 (1984)
[HT99] Horst, R., Thoai, N.V.: D.C. programming: Overview. Journal of Optimization Theory and Applications, 103, 1-43 (1999)
[HT93] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer-Verlag, Heidelberg (1993)
[HV88] Horst, R., Vries, J.D.: On finding new vertices and redundant constraints in cutting plane algorithms for global optimization. Operations Research Letters, 7, 85-90 (1988)
[IK88] Ibaraki, T., Katoh, N.: Resource Allocation Problems: Algorithmic Ap-
proaches. MIT Press, Cambridge, Mass. (1988)
[Li95] Li, D.: Zero duality gap for a class of nonconvex optimization problems.
Journal of Optimization Theory and Applications, 85, 309-324 (1995)
[Li96] Li, D.: Convexification of noninferior frontier. Journal of Optimization
Theory and Applications, 88, 177-196 (1996)
[LS01] Li, D., Sun, X.L.: Convexification and existence of saddle point in a p-th
power reformulation for nonconvex constrained optimization. Journal of
Nonlinear Analysis: Theory and Methods (Series A), 47, 5611-5622 (2001)
[LSBG01] Li, D., Sun, X.L., Biswal, M.P., Gao, F.: Convexification, concavifica-
tion and monotonization in global optimization. Annals of Operations
Research, 105, 213-226 (2001)
[LSM05] Li, D., Sun, X.L., McKinnon, K.: An exact solution method for reliability
optimization in complex systems. Annals of Operations Research, 133,
129-148 (2005)
[LWLYZ05] Li, D., Wu, Z.Y., Lee, H.W.J., Yang, X.M., Zhang, L.S.: Hidden convex
minimization. Journal of Global Optimization, 31, 211-233 (2005)
[Muf77] Mufflin, R.: Semismooth and semiconvex functions in constrained opti-
mization. SIAM Journal on Control and Optimization, 15, 959-972 (1977)
[PR86] Pardalos, P.M., Rosen, J.B.: Methods for global concave minimization: a
bibliographic survey. SIAM Review, 28, 367-379 (1986)
[RP87] Rosen, J.B., Pardalos, P.M.: Constrained Global Optimization: Algo-
rithms and Applications. Springer-Verlag (1987)
[RTM01] Rubinov, A., Tuy, H., Mays, H.: An algorithm for monotonic global opti-
mization problems. Optimization, 49, 205-221 (2001)
[SL04] Sun, X.L., Li, J.L.: A new branch-and-bound method for monotone opti-
mization problems. Technical report. Department of Mathematics, Shang-
hai University (2004)
[SLL04] Sun, X.L., Luo, H.Z., Li, D.: Convexification of nonsmooth monotone func-
tions. Technical report. Department of Mathematics, Shanghai University,
(2004)
[SML01] Sun, X.L., McKinnon, K.I.M., Li, D.: A convexification method for a class
of global optimization problems with applications to reliability optimiza-
tion. Journal of Global Optimization, 21, 185-199 (2001)
[Tuy00] Tuy, H.: Monotonic optimization: problems and solution approaches.
SIAM Journal on Optimization, 11, 464-494 (2000)
[TL00] Tuy, H., Luc, L.T.: A new approach to optimization under monotonic
constraint. Journal of Global Optimization, 18, 1-15 (2000)
[Tza80] Tzafestas, S.G.: Optimization of system reliability: A survey of problems
and techniques. International Journal of Systems Science, 11, 455-486
(1980)
[WBZ05] Wu, Z.Y., Bai, F.S., Zhang, L.S.: Convexification and concavification for
a general class of global optimization problems. Journal of Global Opti-
mization, 31, 45-60 (2005)
Generalized Lagrange Multipliers for
Nonconvex Directionally Differentiable
Programs

Nguyen Dinh^1, Gue Myung Lee^2, and Le Anh Tuan^3

^1 Department of Mathematics-Informatics
Ho Chi Minh City University of Pedagogy
280 An Duong Vuong St., District 5, HCM city, Vietnam
ndinh@hcmup.edu.vn
^2 Division of Mathematical Sciences
Pukyong National University
599-1, Daeyeon-3Dong, Nam-Gu, Pusan 608-737, Korea
gmlee@pknu.ac.kr
^3 Ninh Thuan College of Pedagogy
Ninh Thuan, Vietnam
latucin02@yahoo.com

Summary. A class of nonconvex optimization problems in which all the functions
involved are directionally differentiable is considered. Necessary optimality condi-
tions of Kuhn-Tucker type based on the directional derivatives are proved. Here
the Lagrange multipliers generally depend on the directions. It is shown that for
various concrete classes of problems (including classes of convex problems, locally
Lipschitz problems, and composite nonsmooth problems), the generalized Lagrange
multipliers collapse to the standard ones (i.e., the Lagrange multipliers are constants
as usual). Optimality conditions for quasidifferentiable problems are derived from
the main results. Optimality conditions for a class of problems in which all the
functions possess upper DSL-approximates are also derived from the framework.

2000 MR Subject Classification. Primary: 90C30; Secondary: 90C46, 49K27

Key words: Directional Kuhn-Tucker condition, quasidifferentiable functions,
regularity conditions, upper approximates, invexity, composite problems,
optimality conditions.

1 Introduction and Preliminaries


We consider the following mathematical programming problem (P):

\[
\begin{aligned}
&\min\ f(x)\\
&\text{subject to } x \in C,\quad g_i(x) \le 0,\ i = 1, 2, \dots, m,
\end{aligned}
\]

where $f, g_i : X \to \mathbb{R} \cup \{+\infty\}$, $i \in I := \{1, 2, \dots, m\}$, $X$ is a real Banach
space and $C$ is a closed convex subset of $X$.
In the case where the directional derivatives of the functions $f$ and $g_i$,
$i = 1, 2, \dots, m$, exist but are not convex functions of the directions, the standard
necessary condition for a feasible point $x_0$ to be a solution of (P), stating that
there exist $\lambda_i \ge 0$, $i = 0, 1, 2, \dots, m$, such that

\[ \lambda_0 f'(x_0, r) + \sum_{i=1}^{m} \lambda_i g_i'(x_0, r) \ge 0, \quad \forall r \in \mathrm{cone}(C - x_0), \tag{1} \]

\[ \lambda_i g_i(x_0) = 0, \quad \text{for all } i = 1, 2, \dots, m, \tag{2} \]

fails to hold (see [Cra00, DT03], and Examples 1 and 3). For this class of prob-
lems, an optimality condition based on directional derivatives (an extended
version of (1)-(2)), in which the Lagrange multipliers are functions of the
directions, was introduced recently by B.D. Craven in [Cra00]. Conditions of
this type were also established in [DT03] for quasidifferentiable problems.
In this paper we introduce a more general approach which applies to
larger classes of directionally differentiable problems. Concretely, we deal
with a class of problems where the functions involved are directionally
differentiable and possess upper approximates in each direction. A necessary
condition for optimality of Kuhn-Tucker form, where the Lagrange multipliers
are functions of the directions, is established. This condition is also sufficient
under an invexity-type hypothesis. It is shown that for various concrete classes
of problems (including classes of convex problems, locally Lipschitz problems,
and composite nonsmooth problems), the generalized Lagrange multipliers
collapse to the standard ones (i.e., the Lagrange multipliers are constants). As
an application, optimality conditions for quasidifferentiable problems are
derived from the main results. Optimality conditions for a class of problems in
which all the functions possess upper DSL-approximates (see [Sha86, MW90])
are also derived from the framework.
In Section 2, a necessary condition of Kuhn-Tucker type (called the "direc-
tional Kuhn-Tucker condition") based on directional derivatives is established.
Here the Lagrange multiplier $\lambda$ is a map of the directions,
$\lambda : \mathrm{cone}(C - x_0) \to \mathbb{R}^m_+$. It is called a "generalized Lagrange multiplier".
Under some generalized convexity condition, the condition is also sufficient for
optimality. Also, necessary optimality conditions (Fritz-John and Kuhn-Tucker
type conditions) associated with upper approximates of the objective and the
constraint functions are given. In Section 3 we examine some special cases where
the generalized Lagrange multipliers collapse to constants as usual. As
applications, in this section optimality conditions for a class of composite
nonsmooth problems with Gateaux differentiability and also for
quasidifferentiable problems are given. For the class of quasidifferentiable
problems, it is shown that the "directional Kuhn-Tucker condition" is weaker than
the well-known Kuhn-Tucker condition in the set inclusion form established
earlier in [War91, LRW91]. It should be noted that for this class of problems the
optimality conditions obtained in this section are based on the directional
derivatives only and hence do not depend on any specific choice of
quasidifferentials of the functions involved. This is one of the aspects of special
interest for this class of problems (see [LRW91, War91]). An example is given at
the end of this section to show that for quasidifferentiable problems, in general,
the Lagrange multipliers cannot be constants. Furthermore, it is shown by this
example that the candidates for minimizers can be sorted out by using the
directional Kuhn-Tucker condition. In the last section, Section 4, we show that
the framework introduced in Section 2 can be applied to some larger class of
problems. Concretely, it is shown that the framework is applicable to the class of
problems for which the functions involved possess upper DSL-approximates in the
sense of [Sha86, MW90] (that is, upper approximates that can be represented as a
difference of two sublinear functions). A necessary condition parallel to those
given in [Sha86, MW90] and a sufficient condition are proved. A relation between
these conditions and the one in [MW90, Sha86] is also established.
We close this section by recalling the notions of directional differentiability
and recession functions of extended real-valued functions.
Let $X$ be a real Banach space and $f : X \to \mathbb{R} \cup \{+\infty\}$. For $x_0, h \in X$, if
the limit

\[ f'(x_0, h) := \lim_{\lambda \to 0^+} \frac{f(x_0 + \lambda h) - f(x_0)}{\lambda} \]

exists and is finite then $f'(x_0, h)$ is called the directional derivative of $f$ at $x_0$
in the direction $h$. The function $f$ is called directionally differentiable at $x_0$ if
$f'(x_0, h)$ exists and is finite for any direction $h \in X$.
Note that if $f$ is directionally differentiable at $x_0$ then the directional
derivative $f'(x_0, \cdot)$ is positively homogeneous but in general it is not convex.
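
The remark above can be checked numerically. The following is a minimal sketch, assuming a hypothetical one-dimensional test function $f(x) = -|x|$ (not taken from the text) and approximating $f'(x_0, h)$ by a one-sided difference quotient with a small fixed step; the names `difference_quotient`, `homogeneous` and `convexity_violated` are illustrative only.

```python
def difference_quotient(f, x0, h, lam=1e-6):
    """One-sided quotient (f(x0 + lam*h) - f(x0)) / lam approximating f'(x0, h)."""
    return (f(x0 + lam * h) - f(x0)) / lam


def f(x):
    # Hypothetical test function, not taken from the text: f(x) = -|x| on the real line.
    return -abs(x)


x0 = 0.0
d_plus = difference_quotient(f, x0, 1.0)    # approximates f'(0, 1)
d_minus = difference_quotient(f, x0, -1.0)  # approximates f'(0, -1)
d_zero = difference_quotient(f, x0, 0.0)    # approximates f'(0, 0)
d_scaled = difference_quotient(f, x0, 3.0)  # approximates f'(0, 3)

# Positive homogeneity: f'(0, 3) should equal 3 * f'(0, 1).
homogeneous = abs(d_scaled - 3.0 * d_plus) < 1e-9

# Convexity would require f'(0, 0) <= (f'(0, 1) + f'(0, -1)) / 2; here it fails.
convexity_violated = d_zero > 0.5 * (d_plus + d_minus)

print(d_plus, d_minus, d_zero, homogeneous, convexity_violated)
```

For this particular function the directional derivative at the origin is $-|h|$, which is positively homogeneous and concave, so the midpoint inequality required by convexity is violated.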
Let $g : X \to \mathbb{R} \cup \{+\infty\}$ be directionally differentiable at $x_0$. The recession
function of $g'$ at $x_0$ is defined by

\[ (g')^{\infty}(x_0, y) := \sup_{d \in X} \bigl[ g'(x_0, d + y) - g'(x_0, d) \bigr]. \]

The notion of recession function has been widely used (see [War91, MW90]
and the references therein). It is worth noting that $(g')^{\infty}(x_0, \cdot)$ is a sublinear
function and $g'(x_0, \cdot) \le (g')^{\infty}(x_0, \cdot)$. Concerning the recession function, the
following lemma [MW90, Corollary 3.5] will be used in the next section.

Lemma 1. [MW90] Suppose that $g$ is directionally differentiable at $x_0$ and
$g'(x_0, \cdot)$ is lower semicontinuous (l.s.c.) on $X$. If $p(\cdot)$ is an upper approximate
of $g$ at $x_0$, then there exists an upper approximate $h$ of $g$ at $x_0$ such that

\[ h(x) \le \min\{p(x), (g')^{\infty}(x_0, x)\} \quad \text{for all } x \in X. \]

It is worth noting that the conclusion of Lemma 1 still holds without the
assumption on the lower semicontinuity of $g'(x_0, \cdot)$ if $X$ is finite dimensional.
This was established in [War91, Lemma 2.6].
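
As a small illustration of the recession function defined above, consider a worked example on $X = \mathbb{R}$ that is not part of the original text. Suppose the directional derivative is $g'(x_0, d) = -|d|$, as happens for the hypothetical choice $g(x) = -|x|$ at $x_0 = 0$:

```latex
\[
(g')^{\infty}(x_0, y)
  = \sup_{d \in \mathbb{R}} \bigl[ -|d + y| + |d| \bigr]
  = |y| ,
\]
% The supremum is attained at d = -y, and |d| - |d + y| <= |y|
% follows from the triangle inequality |d| <= |d + y| + |y|.
\[
g'(x_0, y) = -|y| \;\le\; |y| = (g')^{\infty}(x_0, y)
\quad \text{for all } y \in \mathbb{R}.
\]
```

In this example $(g')^{\infty}(x_0, \cdot)$ is sublinear even though $g'(x_0, \cdot)$ itself is concave rather than convex.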

2 Generalized Lagrange Multipliers

In this section we are concerned with Problem (P), where
$f, g_i : X \to \mathbb{R} \cup \{+\infty\}$, $i \in I := \{1, 2, \dots, m\}$. Let $S$ be the feasible set of (P),
that is, $S := C \cap \{x \in X \mid g_i(x) \le 0,\ i = 1, 2, \dots, m\}$. Let also $x_0 \in S$ and
$I(x_0) := \{i \in I \mid g_i(x_0) = 0\}$. We assume in the following that all the functions
$f$ and $g_i$, $i \in I$, are directionally differentiable at $x_0$. It is not assumed that the
functions $f'(x_0, \cdot)$ and $g_i'(x_0, \cdot)$, $i \in I(x_0)$, are convex.

2.1 Necessary conditions for optimality

We begin with a necessary condition of Fritz-John type whose proof is
quite elementary. Note that no extra assumptions are needed here beyond the
directional differentiability of $f$, $g_i$, and the continuity of $g_i$ (at $x_0$) for
$i \notin I(x_0)$. The same condition (for feasible directions from $x_0$ and
$X = \mathbb{R}^n$) was recently proved in [Cra00].

Theorem 1. Suppose that $f$ and $g_i$, $i \in I(x_0)$, are directionally differentiable
at $x_0$ and $g_i$ with $i \notin I(x_0)$ are continuous at $x_0$. If $x_0 \in S$ is a local minimizer
of (P) then for each $r \in \mathrm{cone}(C - x_0)$, there exists $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m+1}_+$,
$\lambda \ne 0$, such that the following conditions hold:

\[ \lambda_0 f'(x_0, r) + \sum_{i=1}^{m} \lambda_i g_i'(x_0, r) \ge 0, \tag{3} \]

\[ \lambda_i g_i(x_0) = 0, \quad \text{for all } i = 1, 2, \dots, m. \tag{4} \]


Proof. We first note that the conditions (3)-(4) are equivalent to

\[ \lambda_0 f'(x_0, r) + \sum_{i \in I(x_0)} \lambda_i g_i'(x_0, r) \ge 0 \]

and $\lambda_i = 0$ for $i \notin I(x_0)$.
Suppose that $x_0 \in S$ is a local minimizer of (P). Assume on the contrary
that there exists $\bar{r} \in \mathrm{cone}(C - x_0)$ such that for any $\lambda \in \mathbb{R}^{|I(x_0)|+1}_+$, $\lambda \ne 0$, one
has

\[ \lambda_0 f'(x_0, \bar{r}) + \sum_{i \in I(x_0)} \lambda_i g_i'(x_0, \bar{r}) < 0. \tag{5} \]

Then by the arbitrariness of $\lambda \in \mathbb{R}^{|I(x_0)|+1}_+$ we get from (5) that

\[ f'(x_0, \bar{r}) < 0, \qquad g_i'(x_0, \bar{r}) < 0, \quad i \in I(x_0). \]

It follows from the definition of directional derivatives and the continuity of
$g_i$, $i \notin I(x_0)$, that for sufficiently small $\mu > 0$,

\[ x_0 + \mu \bar{r} \in C, \qquad f(x_0 + \mu \bar{r}) < f(x_0), \qquad g_i(x_0 + \mu \bar{r}) < 0, \quad \forall i \in I. \]

This contradicts the fact that $x_0$ is a local minimizer of (P). □

It is worth noting that the multiplier $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m+1}_+$, $\lambda \ne 0$,
that exists in Theorem 1 depends on the direction $r \in \mathrm{cone}(C - x_0)$.
Precisely, the Lagrange multiplier $\lambda$ is a map of the direction $r \in \mathrm{cone}(C - x_0)$.
The conclusion of Theorem 1 can be expressed as follows:
There exists a map $\lambda(\cdot) : \mathrm{cone}(C - x_0) \to \mathbb{R}^{m+1}_+$, $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)$,
with nonzero values, satisfying

\[ \lambda_0(r) f'(x_0, r) + \sum_{i=1}^{m} \lambda_i(r) g_i'(x_0, r) \ge 0, \quad \forall r \in \mathrm{cone}(C - x_0), \]

\[ \lambda_i(r) g_i(x_0) = 0, \quad \text{for all } i = 1, 2, \dots, m \text{ and } r \in \mathrm{cone}(C - x_0). \]

The map $\lambda(\cdot) : \mathrm{cone}(C - x_0) \to \mathbb{R}^{m+1}_+$ is then called a generalized La-
grange multiplier (of Fritz-John type). It is shown in the next section that in
many cases (e.g., differentiable, convex, Lipschitz problems) $\lambda$ can be taken
to be a constant (constant map) as usual.
It is easy to see that if for some $r \in \mathrm{cone}(C - x_0)$, $g_i'(x_0, r) < 0$ for all
$i \in I(x_0)$, then $\lambda_0(r) \ne 0$. So if we want to have a condition of Kuhn-Tucker
type (which is of the most interest), such a condition would have to be satisfied
for all $r \in X$. But this seems quite strong in comparison with the well-
known ones (see [Man94]). In the following we will search for some weaker
ones that imply $\lambda_0(r) \ne 0$ for all $r \in \mathrm{cone}(C - x_0)$. Such conditions are often
known as regularity conditions.
Let $g : X \to \mathbb{R} \cup \{+\infty\}$ be directionally differentiable at $x_0 \in X$.

Definition 1. A lower semicontinuous sublinear function $\phi : X \to \mathbb{R}$ is
called an upper approximate of $g$ at $x_0$ if

\[ g'(x_0, x) \le \phi(x), \quad \text{for all } x \in X. \]

If this condition is satisfied for all $x \in D$, where $D$ is a cone in $X$, then we say
that $\phi$ is an upper approximate of $g$ at $x_0$ on $D$.

An upper approximate of a function $g$, if it exists, may not be unique. So in
general it may not give "good enough" information about the function $g$ near
$x_0$. We introduce another kind of upper approximate.

Definition 2. Let $\xi$ be a point of $X$. A function $\phi : X \to \mathbb{R}$ is called an upper
approximate of $g$ at $x_0$ in the direction $\xi \in X$ if $\phi$ is an upper approximate of
$g$ at $x_0$ and

\[ \phi(\xi) = g'(x_0, \xi). \]

Note that if $g$ is a proper convex function on $X$ then $g'(x_0, \cdot)$ is an upper
approximate of $g$ at $x_0$. Moreover, in this case $g'(x_0, \cdot)$ is also an upper ap-
proximate of $g$ at $x_0$ in any direction $\xi \in X$. If further, $g$ is locally Lipschitz at
$x_0$ then $g^{\circ}(x_0, \cdot)$ (the Clarke generalized derivative at $x_0$) is an upper approx-
imate of $g$ at $x_0$. Moreover, $g^{\circ}(x_0, \cdot)$ is an upper approximate of $g$ at $x_0$ in the
direction $\xi \in X$ if and only if $g^{\circ}(x_0, \xi) = g'(x_0, \xi)$.
Saying that a function $g$ possesses upper approximates at $x_0$ in every direction
$\xi \in X$ means that there exists a family of upper approximates $(\phi^{\xi}(\cdot))_{\xi \in X}$ of
$g$ at $x_0$ such that $\phi^{\xi}(\xi) = g'(x_0, \xi)$ for every $\xi \in X$. Such classes of functions
contain, for example, the class of convex functions, differentiable functions,
locally Lipschitz and regular functions (in the sense of Clarke), and the class
of quasidifferentiable functions in the sense of Demyanov and Rubinov (see
Section 3).
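
To make the notion of an upper approximate in a given direction concrete, here is a hypothetical one-dimensional example (not from the original text), with $g(x) = -|x|$ on $X = \mathbb{R}$ and $x_0 = 0$, so that $g'(x_0, x) = -|x|$ is not convex:

```latex
\[
\phi^{\xi}(x) :=
\begin{cases}
-x, & \xi \ge 0,\\
\phantom{-}x, & \xi < 0,
\end{cases}
\qquad x \in \mathbb{R}.
\]
% Each \phi^{\xi} is linear, hence l.s.c. and sublinear, and
\[
g'(x_0, x) = -|x| \le \phi^{\xi}(x) \ \ \text{for all } x \in \mathbb{R},
\qquad
\phi^{\xi}(\xi) = -|\xi| = g'(x_0, \xi).
\]
```

Under these assumptions $g$ possesses upper approximates at $x_0$ in every direction, whereas the constant function $\phi \equiv 0$ is an upper approximate of $g$ at $x_0$ that agrees with $g'(x_0, \cdot)$ only at $\xi = 0$.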
We now introduce a regularity condition for (P), which is a constraint
qualification of Slater type and involves upper approximates. Suppose that $g_i$,
$i \in I(x_0)$, possess upper approximates at $x_0$ in any direction $\xi \in X$.

Definition 3. The Problem (P) is called (CQ1) regular at $x_0$ if there exists
$\bar{x} \in \mathrm{cone}(C - x_0)$ such that for any direction $\xi \in X$ there are upper approxi-
mates $\Phi_i^{\xi}(\cdot)$ of $g_i$, $i \in I(x_0)$, at $x_0$ in the direction $\xi$ satisfying

\[ \Phi_i^{\xi}(\bar{x}) < 0 \quad \text{for all } i \in I(x_0). \]

Definition 4. [MW90] The Problem (P) is called (CQ2) regular at $x_0$ if there
exists $\bar{x} \in \mathrm{cone}(C - x_0)$ such that

\[ (g_i')^{\infty}(x_0, \bar{x}) < 0 \quad \text{for all } i \in I(x_0). \]


We are now able to establish a necessary optimality condition of Kuhn-
Tucker type for (P).

Theorem 2. (Directional Kuhn-Tucker condition) Suppose that $f, g_i$, $i \in I$,
are directionally differentiable at $x_0$ and possess upper approximates at $x_0$ in
any direction $\xi \in X$; $g_i$ is continuous at $x_0$ for all $i \notin I(x_0)$. If $x_0$ is a local
minimizer of (P) and if one of the following holds
(a) (P) is (CQ1) regular at $x_0$;
(b) $\dim X < +\infty$ and (P) is (CQ2) regular at $x_0$;
(c) (P) is (CQ2) regular at $x_0$ and $g_i'(x_0, \cdot)$ is l.s.c. for all $i \in I(x_0)$
then the following directional Kuhn-Tucker condition (DKT) holds:
(DKT) For each $r \in \mathrm{cone}(C - x_0)$, there exists $\lambda = (\lambda_i)_{i \in I} \in \mathbb{R}^m_+$ such
that

\[ f'(x_0, r) + \sum_{i \in I} \lambda_i g_i'(x_0, r) \ge 0, \qquad \lambda_i g_i(x_0) = 0, \quad \forall i \in I. \]

A point $x_0 \in S$ that satisfies (DKT) is called a directional Kuhn-Tucker
point of (P).

Proof. Suppose that $x_0$ is a local minimizer of (P). It follows from Theorem 1 that
for each $r \in \mathrm{cone}(C - x_0)$, there exists $\lambda(r) = (\lambda_0(r), \lambda_1(r), \dots, \lambda_m(r)) \ne 0$,
$\lambda_i(r) \ge 0$ for all $i \in I$, such that

\[ \lambda_0(r) f'(x_0, r) + \sum_{i \in I} \lambda_i(r) g_i'(x_0, r) \ge 0, \qquad \lambda_i(r) g_i(x_0) = 0, \quad \forall i \in I. \tag{6} \]

It suffices to prove that for each $r \in \mathrm{cone}(C - x_0)$, $\lambda_0(r) \ne 0$. Assume on the
contrary that there is $\bar{r} \in \mathrm{cone}(C - x_0)$ with $\lambda_0(\bar{r}) = 0$. We will prove that in
this case it is possible to replace the multiplier $\lambda(\bar{r})$ by some other $\tilde{\lambda}(\bar{r})$ with
$\tilde{\lambda}_0(\bar{r}) \ne 0$ such that (6) holds at $r = \bar{r}$ with $\tilde{\lambda}(\bar{r})$ instead of $\lambda(\bar{r})$.
Since $x_0$ is a local minimizer of (P), the following system of variable $\xi \in X$
is inconsistent:

\[ \xi \in \mathrm{cone}(C - x_0), \qquad f'(x_0, \xi) < 0, \qquad g_i'(x_0, \xi) < 0, \quad \forall i \in I(x_0). \tag{7} \]

(i) Suppose that (c) holds, i.e., (P) is (CQ2) regular and $g_i'(x_0, \cdot)$ is l.s.c.
for all $i \in I(x_0)$. Let $\Phi^{\bar{r}}(\cdot)$, $\Phi_i^{\bar{r}}(\cdot)$ be upper approximates of $f$ and $g_i$, $i \in I(x_0)$,
at $x_0$ in the direction $\bar{r}$, respectively. By Lemma 1 there exist $h$, $h_i$, upper
approximates of $f$, $g_i$, $i \in I(x_0)$, at $x_0$ (respectively), satisfying for all $x \in X$,

\[
\begin{cases}
h(x) \le \min\{\Phi^{\bar{r}}(x), (f')^{\infty}(x_0, x)\},\\
h_i(x) \le \min\{\Phi_i^{\bar{r}}(x), (g_i')^{\infty}(x_0, x)\}, \quad \forall i \in I(x_0).
\end{cases}
\]

Since (7) is inconsistent, the following system of convex functions is inconsis-
tent:

\[ x \in \mathrm{cone}(C - x_0), \qquad h(x) < 0, \qquad h_i(x) < 0, \quad i \in I(x_0). \]
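By Gordan's alternative theorem (see [Man94, HK82]), there exist $\lambda_0 \ge 0$,
$\lambda_i \ge 0$, $i \in I(x_0)$, not all zero, such that

\[ \lambda_0 h(x) + \sum_{i \in I(x_0)} \lambda_i h_i(x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0). \tag{8} \]

Therefore, if $\lambda_0 = 0$ then

\[ \sum_{i \in I(x_0)} \lambda_i h_i(x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0). \tag{9} \]

By the (CQ2) regularity condition, there is $\bar{x} \in \mathrm{cone}(C - x_0)$ such that
$(g_i')^{\infty}(x_0, \bar{x}) < 0$ for all $i \in I(x_0)$. This implies that (note that $\lambda_i \ge 0$ for all
$i \in I(x_0)$ and not all zero)

\[ \sum_{i \in I(x_0)} \lambda_i h_i(\bar{x}) \le \sum_{i \in I(x_0)} \lambda_i (g_i')^{\infty}(x_0, \bar{x}) < 0, \]

which contradicts (9). Hence, $\lambda_0 \ne 0$ (and we can take $\lambda_0 = 1$). With $x = \bar{r}$,
(8) gives

\[ h(\bar{r}) + \sum_{i \in I(x_0)} \lambda_i h_i(\bar{r}) \ge 0. \]

Since $h(\bar{r}) \le \Phi^{\bar{r}}(\bar{r}) = f'(x_0, \bar{r})$, $h_i(\bar{r}) \le \Phi_i^{\bar{r}}(\bar{r}) = g_i'(x_0, \bar{r})$, and $\lambda_i \ge 0$ for all
$i \in I(x_0)$, we arrive at

\[ f'(x_0, \bar{r}) + \sum_{i \in I(x_0)} \lambda_i g_i'(x_0, \bar{r}) \ge 0. \]

Take $\tilde{\lambda}_i(\bar{r}) = \lambda_i$ for $i \in I(x_0)$ and $\tilde{\lambda}_i(\bar{r}) = 0$ for all $i \notin I(x_0)$, and set
$\tilde{\lambda}(\bar{r}) = (\tilde{\lambda}_i(\bar{r}))_{i \in I}$. It is obvious that $\tilde{\lambda}(\bar{r})$ satisfies the condition (DKT) at
$r = \bar{r}$. The proof is complete in this case.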
(ii) The proof for the case where (b) holds is the same as in the previ-
ous case, using Lemma 2.6 in [War91] instead of Lemma 1 (see the remark
following Lemma 1).
(iii) The proof for the case where (a) holds is quite similar to that of (c).
Take $\Phi^{\bar{r}}$, $\Phi_i^{\bar{r}}$ to be the upper approximates of $f$ and $g_i$, $i \in I(x_0)$ (respectively),
at $x_0$ in the direction $\bar{r}$ that exist by (CQ1). The inconsistency of (7) implies the
inconsistency of the following system:

\[ x \in \mathrm{cone}(C - x_0), \qquad \Phi^{\bar{r}}(x) < 0, \qquad \Phi_i^{\bar{r}}(x) < 0, \quad i \in I(x_0). \]

Then we get (8) with $h$ replaced by $\Phi^{\bar{r}}$ and $h_i$ replaced by $\Phi_i^{\bar{r}}$, $i \in I(x_0)$.
If $\lambda_0 = 0$ then

\[ \sum_{i \in I(x_0)} \lambda_i \Phi_i^{\bar{r}}(x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0). \]

This is impossible since by (CQ1), $\Phi_i^{\bar{r}}(\bar{x}) < 0$ for all $i \in I(x_0)$ and $\lambda_i \ge 0$
($i \in I(x_0)$) are not all zero. Hence $\lambda_0 \ne 0$. The rest is the same as in (i). The
proof is complete. □

The relation between (CQ1), (CQ2) and the other regularity conditions,
as well as the relation between (DKT) and some other Kuhn-Tucker condition
will be discussed at the end of Section 3 in the context of quasidifferentiable
programs.

2.2 Sufficient condition for optimality

We now prove that the directional Kuhn-Tucker condition (in Theorem 2) is
also sufficient for optimality under an assumption on the invexity of (P). This
notion of generalized convexity has been widely used in smooth as well as
nonsmooth optimization problems (see [BRS83, Cra81, Cra86, Cra00, Han81,
Sac00, SKL00, SLK03, YS93], ...). Our definition of invexity is slightly dif-
ferent from the others.

Definition 5. Suppose that $\phi(\cdot)$, $\phi_i(\cdot)$, $i \in I(x_0)$, are positively homogeneous
functions defined on $X$ such that

\[ f'(x_0, x) \le \phi(x), \quad \forall x \in X, \]
\[ g_i'(x_0, x) \le \phi_i(x), \quad \forall x \in X,\ \forall i \in I(x_0). \]

The Problem (P) is called invex at $x_0$ on $S$ with respect to $\phi(\cdot)$, $\phi_i(\cdot)$, $i \in I(x_0)$,
if there exists a function $\eta : S \to \mathrm{cone}(C - x_0)$ such that the following holds:

\[ f(x) - f(x_0) \ge \phi(\eta(x)), \quad \forall x \in S, \]
\[ g_i(x) - g_i(x_0) \ge \phi_i(\eta(x)), \quad \forall x \in S,\ \forall i \in I(x_0). \]

If (P) is invex (at $x_0$ on $S$) with respect to $f'(x_0, \cdot)$, $g_i'(x_0, \cdot)$, $i \in I(x_0)$, then
it is called simply invex (the most important case).

Note that if $f$, $g_i$ are differentiable at $x_0$ then the invexity of (P) (with respect
to $f'(x_0, \cdot)$, $g_i'(x_0, \cdot)$, $i \in I(x_0)$) is exactly the one which appeared in [Han81,
Cra81]. In Definition 5, if in addition $f, g_i$ are locally Lipschitz at $x_0$ and
if we take $\phi(\cdot) = f^{\circ}(x_0, \cdot)$, $\phi_i(\cdot) = g_i^{\circ}(x_0, \cdot)$, $i \in I(x_0)$, then we come back
to the definition of invexity appearing in [YS93, BRS83]. This also relates to
the cone-invexity for locally Lipschitz functions, which was defined in [Cra86].
The following result was established in [Cra00] concerning feasible directions
and for $X = \mathbb{R}^n$. Its proof is almost the same as in [Cra00, DT03] and so it
will be omitted.

Theorem 3. (Sufficient condition for optimality) Let $f, g_i$, $i \in I$, be direc-
tionally differentiable at $x_0$. If $x_0$ is a directional Kuhn-Tucker point of (P)
and if (P) is invex at $x_0$ on $S$ then $x_0$ is a global minimizer of (P).

In view of Theorems 2 and 3, it is easy to obtain the following necessary
and sufficient optimality conditions with upper approximates:

Corollary 1. For the problem (P), let $x_0$ be a feasible point and let $\phi$, $\phi_i$
be upper approximates of $f$, $g_i$, $i \in I$, at $x_0$, respectively. Suppose that $g_i$ is
continuous at $x_0$ for all $i \notin I(x_0)$.
(i) If $x_0$ is a local minimizer of (P) then there exist $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \in I$,
not all zero, such that

\[ \lambda_0 \phi(x) + \sum_{i \in I} \lambda_i \phi_i(x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0); \qquad \lambda_i g_i(x_0) = 0, \quad \forall i \in I. \]

Moreover, if there exists $x \in S$ such that $\phi_i(x) < 0$ for all $i \in I(x_0)$ then
$\lambda_0 \ne 0$ (and hence, one can take $\lambda_0 = 1$). That is, there exists
$\lambda = (\lambda_i)_{i \in I} \in \mathbb{R}^m_+$ such that

\[ \phi(x) + \sum_{i \in I} \lambda_i \phi_i(x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0); \qquad \lambda_i g_i(x_0) = 0, \quad \forall i \in I. \tag{10} \]

(ii) Conversely, if $x_0$ satisfies (10) (for some upper approximates $\phi$, $\phi_i$ of
$f$, $g_i$ on $\mathrm{cone}(C - x_0)$, respectively, and some $\lambda \in \mathbb{R}^m_+$) and if (P) is invex
at $x_0$ on $S := C \cap \{x \in X \mid g_i(x) \le 0,\ i = 1, 2, \dots, m\}$ with respect to $\phi$, $\phi_i$,
$i \in I(x_0)$, then $x_0$ is a global minimizer of (P).

Proof. (i) Since $x_0$ is a solution of (P), the following system of variable $\xi \in X$:

\[ \xi \in \mathrm{cone}(C - x_0), \qquad f'(x_0, \xi) < 0, \qquad g_i'(x_0, \xi) < 0, \quad i \in I(x_0), \]

is inconsistent. By the definition of upper approximates, the following system
of convex functions is inconsistent:

\[ \xi \in \mathrm{cone}(C - x_0), \qquad \phi(\xi) < 0, \qquad \phi_i(\xi) < 0, \quad i \in I(x_0). \tag{11} \]

The rest of the proof is similar to those of Theorems 2 and 3. □

It is worth noting that the conclusion of Corollary 1 still holds if in the
definition of upper approximate (Definition 1) one replaces the directional
derivative $g'(x_0, d)$ by the upper Dini derivative $g^{+}(x_0, d)$ of $g$ at $x_0$ in the
direction $d \in X$, which is defined by

\[ g^{+}(x_0, d) := \limsup_{\lambda \to 0^+} \frac{g(x_0 + \lambda d) - g(x_0)}{\lambda}. \]

This can be seen by replacing (11) by the following:

\[ \xi \in \mathrm{cone}(C - x_0), \qquad f^{+}(x_0, \xi) < 0, \qquad g_i^{+}(x_0, \xi) < 0, \quad i \in I(x_0). \]

The Lagrange multipliers that exist in Corollary 1 are constants (indepen-
dent of the directions). The price for this is that (10) is based only on the
upper approximates of $f$ and $g_i$ at $x_0$ instead of $f'(x_0, \cdot)$, $g_i'(x_0, \cdot)$, $i \in I(x_0)$, as
in the previous subsection (Theorem 2). However, for smooth problems (i.e.,
$f$, $g_i$ are differentiable), or convex, or locally Lipschitz problems, condition
(10) collapses to the standard optimality conditions. For instance, if $f$ and $g_i$
are convex then $f'(x_0, \cdot)$, $g_i'(x_0, \cdot)$, $i \in I(x_0)$, are convex and hence, by taking
$\phi(\cdot) = f'(x_0, \cdot)$, $\phi_i(\cdot) = g_i'(x_0, \cdot)$, $i \in I(x_0)$, (10) is none other than

\[ f'(x_0, x) + \sum_{i \in I(x_0)} \lambda_i g_i'(x_0, x) \ge 0, \quad \forall x \in \mathrm{cone}(C - x_0) \]

(provided that there is $x \in \mathrm{cone}(C - x_0)$ satisfying $g_i'(x_0, x) < 0$ for all
$i \in I(x_0)$). Note also that by the separation theorem, this inequality is equivalent
to

\[ 0 \in \partial f(x_0) + \sum_{i \in I(x_0)} \lambda_i \partial g_i(x_0) + N_C(x_0), \]

where $N_C(x_0)$ stands for the normal cone of $C$ at $x_0$ in the sense of convex
analysis.
Example 1. Consider the following problem (P1)

\[
\begin{aligned}
&\min\ f(x)\\
&\text{subject to } g(x) \le 0,\quad x = (x_1, x_2) \in C,
\end{aligned}
\]

where

\[ C = \mathrm{co}\{(0,0), (-1,-1), (-1,1)\} \]

and the functions $f, g : \mathbb{R}^2 \to \mathbb{R}$ are defined by

\[ f(x) = -x_2 + \sqrt{|x_1^2 - x_2^2|}, \qquad g(x) = \bigl|\, |x_1| + x_2 \,\bigr|. \]

Observe that

\[ S = C \cap \{x \in \mathbb{R}^2 : g(x) \le 0\} = \mathrm{co}\{(0,0), (-1,-1)\} \subset \mathrm{cone}\, C. \]

Let $x_0 = (0,0)$. It is easy to see that $f(x_0) = g(x_0) = 0$, $f'(x_0, r) = f(r)$
and $g'(x_0, r) = g(r)$ for all $r = (r_1, r_2) \in \mathbb{R}^2$.
Set, for $r = (r_1, r_2) \in \mathrm{cone}(C - x_0)$,

\[
\lambda(r) =
\begin{cases}
0 & \text{if } -r_2 + \sqrt{|r_1^2 - r_2^2|} \ge 0,\\[4pt]
-\dfrac{f(r)}{g(r)} & \text{if } -r_2 + \sqrt{|r_1^2 - r_2^2|} < 0
\end{cases}
\]

(note that when $-r_2 + \sqrt{|r_1^2 - r_2^2|} < 0$, $g(r) \ne 0$). Then the following holds:

\[ f'(x_0, r) + \lambda(r) g'(x_0, r) \ge 0, \quad \forall r \in \mathrm{cone}(C - x_0). \]

This means that $x_0 = (0,0)$ is a directional Kuhn-Tucker point of (P1).
On the other hand, it is clear that for each $x \in S$ (feasible set),

\[ f(x) - f(x_0) = f'(x_0, x), \]
\[ g(x) - g(x_0) = g'(x_0, x), \]

which proves that (P1) is invex at $x_0$ (with $\eta : S \to \mathrm{cone}(C - x_0)$, $\eta(x) = x$).
Thus, $x_0$ is a minimizer of (P1).
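
The computations in Example 1 can be spot-checked numerically. The sketch below assumes the reading $f(x) = -x_2 + \sqrt{|x_1^2 - x_2^2|}$ and $g(x) = \bigl||x_1| + x_2\bigr|$ of the definitions above, and samples rays of $\mathrm{cone}(C - x_0)$ normalised to $r_1 = -1$; the sampling grid and the helper names `multiplier` and `dkt_residual` are illustrative choices, not part of the original example.

```python
import math


def f(r):
    r1, r2 = r
    return -r2 + math.sqrt(abs(r1 * r1 - r2 * r2))


def g(r):
    r1, r2 = r
    return abs(abs(r1) + r2)


def multiplier(r):
    """Direction-dependent multiplier lambda(r) as defined in Example 1."""
    fr = f(r)
    return 0.0 if fr >= 0.0 else -fr / g(r)


def dkt_residual(r):
    # At x0 = (0, 0) both f and g are positively homogeneous,
    # so f'(x0, r) = f(r) and g'(x0, r) = g(r).
    return f(r) + multiplier(r) * g(r)


# Rays of cone(C - x0) = {r : r1 <= 0, |r2| <= -r1}, normalised to r1 = -1.
directions = [(-1.0, -1.0 + k / 10.0) for k in range(21)]
residuals = [dkt_residual(r) for r in directions]
multipliers = [multiplier(r) for r in directions]

# Feasible points x = (-t, -t), 0 <= t <= 1, where f(x) - f(x0) = f(x) should be >= 0.
feasible_values = [f((-t / 10.0, -t / 10.0)) for t in range(11)]

print(min(residuals), min(feasible_values), sorted(set(round(m, 3) for m in multipliers)))
```

On these samples the left-hand side of the directional Kuhn-Tucker inequality stays nonnegative (up to rounding), the multiplier $\lambda(r)$ takes different values on different rays, and $f$ is nonnegative along the feasible segment.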

3 Special Cases and Applications


In this section we will show that for some special classes of problems, such as
composite nonsmooth problems with Gateaux differentiability or problems
where the directional derivatives are generalized subconvexlike, the general-
ized Lagrange multipliers can be chosen to be constants. The last part of this
section is devoted to an application to a class of quasidifferentiable problems.
Some examples are given to illustrate the significance of the results.

3.1 Problems with convexlike directional derivatives

Let $D$ be a subset of $X$. Let $\Phi = (\phi_1, \phi_2, \dots, \phi_m) : D \to \mathbb{R}^m$. Recall that
the map $\Phi$ is called convexlike (subconvexlike, resp.) if $\Phi(D) + \mathbb{R}^m_+$ is convex
($\Phi(D) + \mathrm{int}\,\mathbb{R}^m_+$ is convex, resp.). It is called generalized subconvexlike if
$\mathrm{cone}\,\Phi(D) + \mathrm{int}\,\mathbb{R}^m_+$ is convex (see [HK82, Jey85, Sac02]).
It is well known that Gordan's alternative theorem still holds with con-
vexlike, subconvexlike, or generalized subconvexlike functions instead of convex
ones (see [Jey85, Sac02] for more extensions). Namely, if
$\Phi = (\phi_1, \phi_2, \dots, \phi_m) : D \to \mathbb{R}^m$ is generalized subconvexlike (convexlike,
subconvexlike) on $D$ then exactly one of the following assertions holds:
(i) $\exists x \in D$ such that $\phi_i(x) < 0$, $i = 1, 2, \dots, m$,
(ii) $\exists \lambda = (\lambda_1, \lambda_2, \dots, \lambda_m) \in \mathbb{R}^m_+$, $\lambda \ne 0$, such that
$\sum_{i=1}^{m} \lambda_i \phi_i(x) \ge 0$, $\forall x \in D$.
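
A minimal linear illustration of this alternative may help. The two cases below are a hypothetical example with $D = \mathbb{R}$ and $m = 2$, not taken from the text; linear maps on a convex set have a convex image and are therefore convexlike.

```latex
% Case A: \phi_1(x) = x, \phi_2(x) = -x.
% (i) fails: x < 0 and -x < 0 cannot hold together.
% (ii) holds with \lambda = (1, 1):
\[
\lambda_1 \phi_1(x) + \lambda_2 \phi_2(x) = x - x = 0 \ge 0
\quad \forall x \in \mathbb{R}.
\]
% Case B: \phi_1(x) = x, \phi_2(x) = 2x.
% (i) holds with x = -1: \phi_1(-1) = -1 < 0 and \phi_2(-1) = -2 < 0.
% (ii) fails:
\[
(\lambda_1 + 2\lambda_2)\, x \ge 0 \ \ \forall x \in \mathbb{R}
\;\Longrightarrow\; \lambda_1 + 2\lambda_2 = 0
\;\Longrightarrow\; \lambda = 0 \ \text{ since } \lambda \in \mathbb{R}^2_+ .
\]
```

In each case exactly one of the assertions (i) and (ii) holds, as the theorem predicts.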

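Theorem 4. Suppose that $f$ and $g_i$, $i \in I(x_0)$, are directionally differentiable
at $x_0$ and that $g_i$ with $i \notin I(x_0)$ are continuous at $x_0$. Suppose further that the
map $\Phi : \mathrm{cone}(C - x_0) \to \mathbb{R}^{|I(x_0)|+1}$ defined by
$\Phi(\xi) = \bigl(f'(x_0, \xi), (g_i'(x_0, \xi))_{i \in I(x_0)}\bigr)$ is generalized subconvexlike. If
$x_0 \in S$ is a local minimizer of (P) then there exists
$\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m+1}_+$, $\lambda \ne 0$, such that the following
conditions hold:

\[ \lambda_0 f'(x_0, r) + \sum_{i=1}^{m} \lambda_i g_i'(x_0, r) \ge 0, \quad \forall r \in \mathrm{cone}(C - x_0), \]

\[ \lambda_i g_i(x_0) = 0, \quad \text{for all } i = 1, 2, \dots, m. \]

Moreover, if there exists $x \in \mathrm{cone}(C - x_0)$ such that $g_i'(x_0, x) < 0$ for all
$i \in I(x_0)$ then $\lambda_0 \ne 0$ (and hence, one can take $\lambda_0 = 1$).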
Proof. It is easy to see that the optimality of $x_0$ implies the inconsistency of
the following system of variable $\xi \in X$:

\[ \xi \in \mathrm{cone}(C - x_0), \qquad f'(x_0, \xi) < 0, \qquad g_i'(x_0, \xi) < 0, \quad i \in I(x_0). \]

The existence of $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \in I(x_0)$, not all zero, satisfying the
conclusion of the theorem now follows from Gordan's theorem for generalized
subconvexlike systems (setting $\lambda_i = 0$ for $i \notin I(x_0)$). The rest is obvious. □

Let $x_0$ be a feasible point of (P) and let $D$ be the set of all feasible directions
of (P) from $x_0$. Set

\[ M := \bigl\{ \bigl(f'(x_0, d), (g_i'(x_0, d))_{i \in I(x_0)}\bigr) \mid d \in D \bigr\}. \]

We now apply Corollary 1 to derive an optimality condition (with constant
Lagrange multipliers) for (P), which was established recently in [Cra00].

Corollary 2. [Cra00] Let $U$ be a closed convex cone contained in $M$. De-
note $q := (f, (g_i)_{i \in I(x_0)})$. Assume that some $d^*$ satisfies $q'(x_0, d^*) \in U$ and
$g_i'(x_0, d^*) < 0$ for all $i \in I(x_0)$. Then there exists $\lambda = (\lambda_i)_{i \in I(x_0)} \in \mathbb{R}^{|I(x_0)|}_+$,
dependent on $U$ but not on $d \in D$, such that for each
$d \in \tilde{D} := \{d \in D \mid \exists \eta \in U,\ q'(x_0, d) = \eta\}$,

\[ f'(x_0, d) + \sum_{i \in I(x_0)} \lambda_i g_i'(x_0, d) \ge 0. \]

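Proof. By definition, for each $d \in \tilde{D}$, $q'(x_0, d) \in U$. Define

\[ \Phi(d) := q'(x_0, d), \quad d \in \tilde{D}. \]

Then $\Phi(\tilde{D}) = U$ is a closed convex cone and hence $\Phi$ is convexlike. The
conclusion follows from Theorem 4 with $\tilde{D}$ playing the role of $\mathrm{cone}(C - x_0)$. □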

3.2 Composite nonsmooth programming with Gateaux differentiability

Let $X$, $Y$ be Banach spaces and $C$ be a closed convex subset of $X$. Consider
the composite problem (CP):

\[
\text{(CP)} \qquad
\begin{aligned}
&\text{Minimize } f_0(F_0(x))\\
&\text{subject to } x \in C,\quad f_i(F_i(x)) \le 0,\ i = 1, 2, \dots, m,
\end{aligned}
\]

where $F_i : X \to Y$ is Gateaux differentiable with Gateaux derivative $F_i'(\cdot)$
and $f_i : Y \to \mathbb{R}$ is locally Lipschitz, $i = 0, 1, \dots, m$.
Note that the Gateaux differentiability of a map $F : X \to Y$ at $a$ does
not necessarily imply the continuity of $F$ at $a$. The following simple example
[IT79, p. 24] shows this. Let $f(x, y) = 1$ if $x = y^2$ and $y \ne 0$, $f(x, y) = 0$
otherwise. Then $f$ is Gateaux differentiable at $(0,0)$ and $f'(0,0) = 0$ while $f$
is not continuous at $(0,0)$.
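
The behaviour of this example can be observed directly. The sketch below evaluates difference quotients of $f$ at the origin along a few directions and evaluates $f$ along the parabola $x = y^2$; the particular directions and step sizes are arbitrary illustrative choices, and a finite sample of course only suggests, rather than proves, the limiting behaviour.

```python
def f(x, y):
    """The function from the text: 1 on the parabola x = y^2 (y != 0), 0 elsewhere."""
    return 1.0 if (x == y * y and y != 0.0) else 0.0


def difference_quotient(d, t):
    """(f(t*d) - f(0, 0)) / t along the direction d = (d1, d2)."""
    return (f(t * d[0], t * d[1]) - f(0.0, 0.0)) / t


# Illustrative directions and step sizes (arbitrary choices, not from the text).
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 2.0), (-1.0, 3.0)]
steps = [2.0 ** -k for k in range(10, 31)]
quotients = [difference_quotient(d, t) for d in directions for t in steps]

# Points on the parabola approaching the origin, where f stays equal to 1.
parabola_values = [f(s * s, s) for s in steps]

print(max(abs(q) for q in quotients), min(parabola_values))
```

Each ray through the origin meets the parabola in at most one nonzero point, so for small steps every difference quotient vanishes, while $f$ remains equal to 1 along points of the parabola that converge to the origin.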
Now let $x_0 \in C \cap \{x \in X \mid f_i(F_i(x)) \le 0,\ i = 1, 2, \dots, m\}$,
$I = \{1, \dots, m\}$, $I_0 = \{0\} \cup I$ and let $I(x_0) = \{j \in I \mid f_j(F_j(x_0)) = 0\}$.
We shall use the notation $(f \circ F)^{+}(a, d)$ to indicate the upper Dini derivative
of $f \circ F$ at $a$ in the direction $d$, which is defined by

\[ (f \circ F)^{+}(a, d) := \limsup_{\lambda \to 0^+} \frac{f(F(a + \lambda d)) - f(F(a))}{\lambda}. \]

The following lemma is crucial for establishing optimality conditions for
(CP).
Lemma 2. Let $a \in X$. If $F : X \to Y$ is continuous and Gateaux differentiable
at $a$ and $f : Y \to \mathbb{R}$ is locally Lipschitz at $F(a)$ then for any $d \in X$, there
exists $v \in \partial f(F(a))$ such that

\[ (f \circ F)^{+}(a, d) = \langle v, F'(a) d \rangle. \]

Proof. By the definition of the upper Dini derivative, there exists
$(\lambda_n) \subset \mathbb{R}_+$, $\lambda_n \to 0$, such that

\[ (f \circ F)^{+}(a, d) = \lim_{n \to \infty} \frac{f(F(a + \lambda_n d)) - f(F(a))}{\lambda_n}. \tag{12} \]

Assume that $f$ is Lipschitz of rank $K$ on a convex open neighborhood $U$ of
$F(a)$. Note that $f$ is also locally Lipschitz at any point of $U$ with the same
rank $K$. Since $F$ is continuous at $a$, without loss of generality, we can assume
that for all $n$, $F(a + \lambda_n d) \in U$.
It follows from the mean-value theorem of Lebourg [Cla83, Theorem 2.3.7,
p. 41] that for each $n \in \mathbb{N}$, there exist $t_n \in (0,1)$, $v_n \in \partial f(z_n)$ such that

\[ f(F(a + \lambda_n d)) - f(F(a)) = \langle v_n, F(a + \lambda_n d) - F(a) \rangle, \tag{13} \]

where $z_n := F(a) + t_n (F(a + \lambda_n d) - F(a)) \in U$.
Note that $v_n \in Y^*$ and $\|v_n\| \le K$. Hence we can assume that $(v_n)_n$ weak*
converges to $v$. Note also that when $n \to \infty$, we have $z_n \to F(a)$, and it
follows from the weak*-closedness of $\partial f$ that $v \in \partial f(F(a))$. It now
follows from (12) and (13) that

\[ (f \circ F)^{+}(a, d) = \langle v, F'(a) d \rangle. \]

□

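Following Lemma 2, if we set

\[ \Psi(d) := \max_{v \in \partial f(F(a))} \langle v, F'(a) d \rangle \]

then $\Psi : X \to \mathbb{R}$ is a l.s.c. sublinear function (finite valued). Moreover,

\[ (f \circ F)^{+}(a, d) \le \Psi(d) \quad \text{for all } d \in X. \]

This means that $\Psi$ is an upper approximate of $f \circ F$ at $a$ (see the remark that
follows Corollary 1).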
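
The bound above can be illustrated on a hypothetical scalar instance, which is an assumption made here for illustration and does not come from the text: take $X = Y = \mathbb{R}$, $F(x) = \sin x$, $f(y) = |y|$ and $a = 0$, so that $F'(a) = 1$ and the Clarke subdifferential of $f$ at $F(a) = 0$ is the interval $[-1, 1]$. The sketch compares a single small-step difference quotient of $f \circ F$ with the maximum of $\langle v, F'(a) d \rangle$ over that interval.

```python
import math


def F(x):
    # Hypothetical smooth inner map (an assumption for illustration): F(x) = sin(x).
    return math.sin(x)


def f(y):
    # Hypothetical locally Lipschitz outer function: f(y) = |y|.
    return abs(y)


def psi(d, subdiff=(-1.0, 1.0), dF=1.0):
    """max over v in [lo, hi] of v * (F'(a) d); a linear function of v peaks at an endpoint."""
    lo, hi = subdiff
    return max(lo * dF * d, hi * dF * d)


def dini_quotient(a, d, lam=1e-6):
    """Difference quotient of f o F at a in direction d for one small step lam > 0."""
    return (f(F(a + lam * d)) - f(F(a))) / lam


a = 0.0
checks = [(d, dini_quotient(a, d), psi(d)) for d in (-2.0, -1.0, 0.5, 3.0)]
for d, q, p in checks:
    print(d, q, p)
```

In this instance the quotient and the maximum both approximate $|d|$, so the inequality holds with near equality; a single step size only approximates the limit superior, which here happens to be an ordinary limit.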
We are now in a position to give a necessary condition for optimality for
(CP).

Theorem 5. Assume that $F_i$ is continuous and Gateaux differentiable at a
feasible point $x_0$ of (CP) and $f_i$ is locally Lipschitz at $F_i(x_0)$, $i = 0, 1, \dots, m$.
If $x_0$ is a solution of (CP) then there exist $\lambda_0, \lambda_1, \dots, \lambda_m \ge 0$, not all zero,
and $v_i \in \partial f_i(F_i(x_0))$, $i \in I_0$, such that

\[ \Bigl[ \lambda_0 F_0'(x_0)^* v_0 + \sum_{i=1}^{m} \lambda_i F_i'(x_0)^* v_i \Bigr](x - x_0) \ge 0, \quad \forall x \in C, \]

\[ \lambda_i f_i(F_i(x_0)) = 0, \quad \forall i \in I, \]

where $F_i'(x_0)^*$ is the adjoint operator of $F_i'(x_0)$.
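Proof. We first notice that if $x_0$ is a solution of (CP) then the following system has no solution $d \in X$:

$$d \in \operatorname{cone}(C - x_0), \quad (f_i \circ F_i)^{+}(x_0, d) < 0, \quad i \in I(x_0) \cup \{0\}.$$

Let $\Phi_i(d) := \max_{v_i \in \partial f_i(F_i(x_0))} \langle v_i, F_i'(x_0)d \rangle$. Then $(f_i \circ F_i)^{+}(x_0, d) \le \Phi_i(d)$ for all $d \in \operatorname{cone}(C - x_0)$. It follows from Corollary 1 that there exist $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \in I(x_0)$, not all zero, such that

$$\lambda_0 \Phi_0(d) + \sum_{i \in I(x_0)} \lambda_i \Phi_i(d) \ge 0, \quad \forall d \in \operatorname{cone}(C - x_0). \qquad (14)$$

Since $x_0 \in C$, we have $0 \in \operatorname{cone}(C - x_0)$, and the above inequality means that $0$ is a minimizer of the convex problem

$$\text{Minimize } \Big[\lambda_0 \Phi_0(d) + \sum_{i \in I(x_0)} \lambda_i \Phi_i(d)\Big] \quad \text{subject to } d \in \operatorname{cone}(C - x_0).$$

This is equivalent to

$$0 \in \lambda_0 \partial \Phi_0(0) + \sum_{i \in I(x_0)} \lambda_i \partial \Phi_i(0) + N_C(x_0). \qquad (15)$$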

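Note that for each $d \in X$ and $i \in I(x_0) \cup \{0\}$,

$$\Phi_i(d) = \max_{v \in \partial f_i(F_i(x_0))} \langle F_i'(x_0)^* v, d \rangle = \max_{w \in B_i} \langle w, d \rangle = \sigma_{B_i}(d),$$

where $B_i := F_i'(x_0)^*[\partial f_i(F_i(x_0))]$ and $\sigma_{B_i}$ is the support function of $B_i$. It follows from [Cla83, Proposition 2.1.4, p. 29] that the set $F_i'(x_0)^*[\partial f_i(F_i(x_0))]$ is weak*-compact and we have

$$\partial \Phi_i(0) = F_i'(x_0)^*[\partial f_i(F_i(x_0))].$$

It follows from the last equality and (15) that there exist $v_i \in \partial f_i(F_i(x_0))$, $i \in I \cup \{0\}$, such that

$$\Big[\lambda_0 F_0'(x_0)^* v_0 + \sum_{i=1}^{m} \lambda_i F_i'(x_0)^* v_i\Big](x - x_0) \ge 0, \quad \forall x \in C.$$

The conclusion follows by setting $\lambda_i = 0$ if $i \notin I(x_0)$, $i \ne 0$. $\Box$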

We now give a necessary condition for (CP) in Kuhn-Tucker form.

Theorem 6. Assume that all the conditions in Theorem 5 hold. Assume further that the following regularity condition holds: there is $d_0 \in \operatorname{cone}(C - x_0)$ satisfying $\Phi_i(d_0) < 0$ for all $i \in I(x_0)$. If $x_0$ is a solution of (CP) then there exist $\lambda_i \ge 0$, $i \in I$, and $v_i \in \partial f_i(F_i(x_0))$, $i \in I_0$, such that

$$\Big[F_0'(x_0)^* v_0 + \sum_{i=1}^{m} \lambda_i F_i'(x_0)^* v_i\Big](x - x_0) \ge 0, \quad \forall x \in C,$$

$$\lambda_i f_i(F_i(x_0)) = 0, \quad \forall i \in I.$$

Proof. The proof is the same as that of Theorem 5. Note that if the regularity condition in the statement of the theorem holds then $\lambda_0 \ne 0$ in (14). $\Box$

It is worth noting that the same conditions as in Theorems 5-6 were established in [Jey91] under the additional assumption that the maps $F_i$, $i \in I_0$, are locally Lipschitz.
The following example illustrates the significance of Theorems 5 and 6.

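Example 2. Consider the following problem (P2)

$$\text{Minimize } f(F(x, y)) \quad \text{subject to } g(G(x, y)) \le 0,\ (x, y) \in \mathbb{R}^2,$$

where $f : \mathbb{R} \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$, $F : \mathbb{R}^2 \to \mathbb{R}$, $G : \mathbb{R}^2 \to \mathbb{R}$ are the functions defined by
$f(z) = z$, $g(z) = z$, $G(x, y) = x$,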
x^ if y = ^,
n^.2/)=|0 if 2/^0.
Note that $F$ is continuous at $x_0 = (0, 0)$ and Gateaux differentiable at this point with $F'(x_0) = (0, 0)$, but $F$ is not locally Lipschitz at $x_0 = (0, 0)$. It is easy to see that $x_0 = (0, 0)$ is a solution of (P2), and the necessary condition in Theorem 6 holds with $\lambda_0 = 1$, $\lambda_1 = 0$ (note also that for (P2) the regularity condition in Theorem 6 holds).

3.3 Quasidifferentiable problems

Quasidifferentiable functions are those whose directional derivatives can be represented as a difference of two sublinear functions. The class of these functions covers all differentiable functions, convex functions, DC-functions, and so on. It was introduced by V.F. Demyanov and A.M. Rubinov ([DR80]) in 1980. Since then optimization problems with quasidifferentiable data have been widely investigated by many authors (see [DJ97, DV81, DT03, EL87, Gao00, Gl92, LRW91, MW90, Sha84, Sha86, War91]; see also [DPR86] for a discussion on the place and the role of quasidifferentiable functions in nonsmooth optimization). Many optimality conditions were introduced. Most of them are based on the subdifferentials and superdifferentials of the quasidifferentiable functions involved.
In this section we apply the results obtained in Section 2 to quasidifferentiable programs. The relation between the directional Kuhn-Tucker condition and some other types of Kuhn-Tucker conditions that appeared in the literature is established. It is shown (by an example) that for quasidifferentiable problems (even in the finite dimensional case) the generalized Lagrange multipliers cannot be taken to be constants.
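Let $X$ be a real Banach space and $x_0 \in X$. A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is called quasidifferentiable at $x_0$ if $f$ is directionally differentiable at $x_0$ and if there are two weak* compact subsets $\underline{\partial} f(x_0)$, $\overline{\partial} f(x_0)$ of the topological dual $X^*$ of $X$ such that

$$f'(x_0, d) = \max_{v \in \underline{\partial} f(x_0)} \langle d, v \rangle + \min_{w \in \overline{\partial} f(x_0)} \langle d, w \rangle, \quad \forall d \in X. \qquad (16)$$

The pair of sets $Df(x_0) := [\underline{\partial} f(x_0), \overline{\partial} f(x_0)]$ is called the quasidifferential of $f$ at $x_0$, and $\underline{\partial} f(x_0)$, $\overline{\partial} f(x_0)$ are called the subdifferential and superdifferential of $f$ at $x_0$, respectively. Note that the quasidifferential $Df(x_0)$ of $f$ at $x_0$ is not uniquely defined (see [LRW91]). Note also that (16) can be written in the form

$$f'(x_0, d) = \max_{v \in \underline{\partial} f(x_0)} \langle d, v \rangle - \max_{w \in -\overline{\partial} f(x_0)} \langle d, w \rangle, \quad \forall d \in X, \qquad (17)$$

and hence $f'(x_0, \cdot)$ can be represented as a difference of two sublinear functions. In general, $f'(x_0, \cdot)$ is not convex.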
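Throughout this subsection, the following lemma plays a key role.

Lemma 3. If $f$ is quasidifferentiable at $x_0$ then for any direction $\xi \in X$, there is an upper approximate of $f$ at $x_0$ in the direction $\xi$.

Proof. Let $\xi \in X$. Since $\overline{\partial} f(x_0)$ is weak* compact, there exists $\bar{v} \in \overline{\partial} f(x_0)$ such that

$$\langle \xi, \bar{v} \rangle = \min_{v \in \overline{\partial} f(x_0)} \langle \xi, v \rangle.$$

Let

$$\Phi^{\xi}(x) := \max_{v \in \underline{\partial} f(x_0)} \langle x, v \rangle + \langle x, \bar{v} \rangle. \qquad (18)$$

It is easy to see that $\Phi^{\xi}(\cdot)$ is sublinear, l.s.c., $f'(x_0, x) \le \Phi^{\xi}(x)$ for all $x \in X$, and $f'(x_0, \xi) = \Phi^{\xi}(\xi)$, which proves $\Phi^{\xi}(\cdot)$ to be an upper approximate of $f$ in the direction $\xi$. $\Box$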
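The construction in the proof of Lemma 3 is easy to try out in $\mathbb{R}^2$ when the subdifferential and superdifferential are polytopes. The following sketch is an illustration only: the generating sets `SUB` and `SUPER`, the direction `xi` and the sample grid are hypothetical choices, not data from the text. Because a linear functional attains its maximum and minimum over a polytope at a vertex, it is enough to enumerate the vertices.

```python
def dot(x, v):
    return x[0] * v[0] + x[1] * v[1]

# Hypothetical quasidifferential in R^2, given by the vertices of two polytopes
SUB = [(1.0, 0.0), (-1.0, 0.0)]      # max over SUB gives |x1|
SUPER = [(0.0, 1.0), (0.0, -1.0)]    # min over SUPER gives -|x2|

def directional_derivative(x):
    # Formula (16): max over the subdifferential plus min over the superdifferential
    return max(dot(x, v) for v in SUB) + min(dot(x, w) for w in SUPER)

def upper_approximate(xi):
    # Formula (18): freeze a minimiser v_bar of <xi, .> over the superdifferential
    v_bar = min(SUPER, key=lambda w: dot(xi, w))
    return lambda x: max(dot(x, v) for v in SUB) + dot(x, v_bar)

xi = (1.0, 2.0)
phi_xi = upper_approximate(xi)
grid = [(i / 4.0, j / 4.0) for i in range(-8, 9) for j in range(-8, 9)]

# The sublinear function dominates the directional derivative everywhere ...
dominates = all(directional_derivative(x) <= phi_xi(x) + 1e-12 for x in grid)
# ... and coincides with it in the chosen direction xi
touches = abs(directional_derivative(xi) - phi_xi(xi)) < 1e-12
```

With these sets the directional derivative is $|x_1| - |x_2|$, which is not convex, while the function built for `xi = (1, 2)` is $|x_1| - x_2$, which is sublinear.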

Consider the problem (P) defined in Section 1. Let $S$ be the feasible set of (P) and $x_0 \in S$. We are now ready to give necessary and sufficient optimality conditions for (P).

Theorem 7. (Necessary condition) For the problem (P), assume that $f$, $g_i$, $i \in I = \{1, 2, 3, \dots, m\}$, are quasidifferentiable at $x_0$ and $g_i$ is continuous at $x_0$ for all $i \notin I(x_0)$. If $x_0$ is a minimizer of (P) then

$$\forall r \in \operatorname{cone}(C - x_0),\ \exists (\lambda_0, \lambda_1, \dots, \lambda_m) \in \mathbb{R}_{+}^{m+1} \setminus \{0\} \text{ satisfying}$$

$$\lambda_0 f'(x_0, r) + \sum_{i \in I} \lambda_i g_i'(x_0, r) \ge 0, \quad \lambda_i g_i(x_0) = 0,\ \forall i \in I.$$

Moreover, if one of the following conditions holds

(i) $\dim X < +\infty$ and (P) is (CQ2) regular at $x_0$,
(ii) (P) is (CQ2) regular at $x_0$ and $g_i'(x_0, \cdot)$ is l.s.c. for all $i \in I(x_0)$,
then $\lambda_0 \ne 0$ and hence we can take $\lambda_0 = 1$ (i.e., $x_0$ is a directional Kuhn-Tucker point of (P)).

Proof. It follows from Lemma 3 that the functions $f$ and $g_i$ possess upper approximates at $x_0$ in any direction $\xi \in X$. The conclusion now follows from Theorem 2. $\Box$

The following theorem is a direct consequence of Theorem 3 in Section 2.

Theorem 8. (Sufficient condition) For the problem (P), assume that $f$, $g_i$, $i \in I = \{1, 2, 3, \dots, m\}$, are quasidifferentiable at $x_0$ and $g_i$ is continuous at $x_0$ for all $i \notin I(x_0)$. Assume further that $x_0$ is a directional Kuhn-Tucker point of (P). If (P) is invex at $x_0$ on the feasible set $S$ then $x_0$ is a global solution of (P).

It should be noted that both the necessary and sufficient optimality conditions for (P) established in Theorems 7 and 8 do not depend on any specific choice of quasidifferentials of $f$ and $g_i$, $i \in I(x_0)$.
The regularity conditions are of special interest in quasidifferentiable optimization. The above (CQ2) condition was introduced in [MW90]. It is often preferred since it does not depend on any specific choice of the quasidifferentials (see [LRW91]). In order to relate our results to others, we take a quick look at some other regularity conditions that appeared in the literature and, for the sake of simplicity, we consider the case where $C = X$.
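(CQ3) For all $w_i \in \overline{\partial} g_i(x_0)$, $i \in I(x_0)$:

$$0 \notin \operatorname{co} \bigcup_{i \in I(x_0)} \big(\underline{\partial} g_i(x_0) + w_i\big).$$

(RC) There exists $x \in X$ such that

$$\max_{v_i \in \underline{\partial} g_i(x_0)} \langle x, v_i \rangle + \max_{w_i \in \overline{\partial} g_i(x_0)} \langle x, w_i \rangle < 0, \quad \forall i \in I(x_0).$$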

The (CQ3) condition was used in [Sac00] and [LRW91] while (RC) was introduced in [War91], both for the case where $X = \mathbb{R}^n$.
It was proved in [DT03] that in the finite dimensional case (CQ3) is equivalent to (RC). By Lemma 3, it is clear that (RC) implies (CQ1).
On the other hand, it was proved in [LRW91] that (RC) implies (CQ2) when $X = \mathbb{R}^n$. However, the proof (given in [LRW91]) goes through without any change for the case where $X$ is a real Banach space. Briefly, the following scheme holds for quasidifferentiable problems:

$$\text{(CQ3)} \overset{(X = \mathbb{R}^n)}{\Longleftrightarrow} \text{(RC)} \Longrightarrow \text{(CQ2)}, \qquad \text{(RC)} \Longrightarrow \text{(CQ1)}.$$

The conclusion in Theorem 7 (also Theorem 8) was established in [DT03] for the quasidifferentiable Problem (P) when $C = X = \mathbb{R}^n$ and under (CQ3) (or, equivalently, (RC)).
Due to the previous observation, Theorem 7 still holds if (RC) is assumed instead of (i) or (ii).
As mentioned above, the quasidifferentiable problems with inequality constraints of the form (P) have been studied by many authors. Various types of Kuhn-Tucker conditions were proposed as necessary optimality conditions for (P) (under various assumptions and regularity conditions). A typical such condition is as follows:

$$-\overline{\partial} f(x_0) \subset \bigcup_{\substack{w_i \in \overline{\partial} g_i(x_0) \\ i \in I(x_0)}} \Big[\underline{\partial} f(x_0) + \operatorname{cone} \bigcup_{i \in I(x_0)} \big(\underline{\partial} g_i(x_0) + w_i\big)\Big]. \qquad (19)$$

A point $x_0$ satisfying (19) is called a Kuhn-Tucker point of (P) (see [War91, LRW91]). The condition (19) was established in [War91] as a necessary condition for a point $x_0 \in S$ to be a minimizer for (P) when $C = X = \mathbb{R}^n$ (under some regularity condition). It was proved in [DT03] for $C = X = \mathbb{R}^n$ that if $x_0$ is a Kuhn-Tucker point of (P) then it is also a directional Kuhn-Tucker point of (P). This conclusion still holds (without any change in the proof) when $X$ is a Banach space.

The following example shows that the two notions of Kuhn-Tucker point and directional Kuhn-Tucker point do not coincide, and that even for a simple nonconvex problem the generalized Lagrange multiplier cannot be chosen to be a constant function. It also shows that one can use the directional Kuhn-Tucker condition to search for a minimizer.

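Example 3. Consider the following problem (P3)

$$\min f(x) \quad \text{subject to } g(x) \le 0,\ x = (x_1, x_2) \in C \subset \mathbb{R}^2,$$

where $f, g : \mathbb{R}^2 \to \mathbb{R}$ are the functions defined by

$$f(x) := x_2, \qquad g(x) := \begin{cases} x_1 + (x_1^2 + x_2^2)^{1/2} - x_2 & \text{if } x_2 \ge 0, \\ x_1 + (x_1^2 + x_2^2)^{1/2} & \text{if } x_2 < 0. \end{cases}$$

Let $x_0 = (0, 0) \in \mathbb{R}^2$.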

(a) Consider first the case where $C := \operatorname{co}\{(0, 0), (0, 1), (1, -1)\}$.
(i) It is clear that $S = C \cap \{x \in \mathbb{R}^2 \mid g(x) \le 0\} = \operatorname{co}\{(0, 0), (0, 1)\} \subset \operatorname{cone}(C - x_0) = \operatorname{cone} C$, where $S$ is the feasible set of (P3). It is also easy to check that $x_0$ is a directional Kuhn-Tucker point of (P3). The generalized Lagrange multiplier $\lambda : \operatorname{cone} C \to \mathbb{R}_{+}$ can be chosen as follows ($r = (r_1, r_2) \in \operatorname{cone} C$):

Equivalently, the following inequality holds for all $r = (r_1, r_2) \in \operatorname{cone} C$:

$$f'(x_0, r) + \lambda(r) g'(x_0, r) \ge 0. \qquad (21)$$

On the other hand, since $f'(x_0, r) = r_2$, $g'(x_0, r) = g(r)$, $f(x_0) = g(x_0) = 0$, it is easy to see that (P3) is invex at $x_0$ with $\eta : S \to \operatorname{cone} C$, $\eta(x) = x$. Consequently, $x_0$ is a minimizer of (P3) due to Theorem 3.
(ii) For (P3), the generalized Lagrange multiplier $\lambda : \operatorname{cone} C \to \mathbb{R}_{+}$ cannot be chosen to be a constant function. In fact, (21) is equivalent to

$$r_2 + \lambda(r) g(r) \ge 0. \qquad (22)$$

This shows that for $r = (r_1, r_2) \in \operatorname{cone} C$ with $r_2 < 0$ (then $g(r) > 0$), $\lambda(r)$ satisfies (22) if and only if $\lambda(r) \in [-r_2 / g(r), +\infty)$. So the multiplier $\lambda(r) = -r_2 / g(r)$ which is chosen in (20) is the smallest possible number such that (22) holds.
We now take a sequence of directions $(r_n)_n \subset \operatorname{cone} C$ with $r_n = (r_{1n}, r_{2n})$, $r_{2n} = -1$ for all $n \in \mathbb{N}$ and $r_{1n} \to -\infty$ as $n \to +\infty$. Then

$$-\frac{r_{2n}}{g(r_n)} = \frac{1}{r_{1n} + \sqrt{1 + r_{1n}^2}} = \sqrt{1 + r_{1n}^2} - r_{1n} \to +\infty \quad \text{as } n \to +\infty.$$
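The growth of the smallest admissible multiplier along directions $r = (r_1, -1)$ can be tabulated directly. The sketch below assumes that the constraint function of Example 3 is $g(x) = x_1 + (x_1^2 + x_2^2)^{1/2} - x_2$ for $x_2 \ge 0$ and $g(x) = x_1 + (x_1^2 + x_2^2)^{1/2}$ for $x_2 < 0$; the helper name `smallest_multiplier` and the sample values of $r_1$ are illustrative choices. It evaluates the ratio only and does not check membership of the directions in any particular cone.

```python
import math

def g(r):
    # Assumed constraint function of Example 3; it is positively homogeneous,
    # so its directional derivative at the origin in direction r equals g(r).
    r1, r2 = r
    norm = math.hypot(r1, r2)
    return r1 + norm - r2 if r2 >= 0 else r1 + norm

def smallest_multiplier(r):
    # Smallest lambda >= 0 with r2 + lambda * g(r) >= 0, valid when r2 < 0
    return -r[1] / g(r)

# Directions (r1, -1) with r1 = -1, -10, -100, -1000
values = [smallest_multiplier((-float(10 ** k), -1.0)) for k in range(4)]
```

Each entry agrees with $\sqrt{1 + r_1^2} - r_1$, so no single constant can serve as a multiplier for all of these directions at once.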

(b) The case where $C = \mathbb{R}^2$. Problem (P3) with $C = \mathbb{R}^2$ was considered in [War91, Example 3.2], [LRW91, Example 3] and [DT03, Example 3.9]. It was proved in [LRW91] that $x_0$ is not a Kuhn-Tucker point of (P3). But it is shown in [DT03] that $x_0$ is a directional Kuhn-Tucker point of (P3). Moreover, similar observations as in case (a) ((i) and (ii)) still hold. We now show another feature of the directional Kuhn-Tucker condition.
It is possible to search for candidates for minimizers of (P3) by using the directional Kuhn-Tucker condition. Note that a point $x$ is a directional Kuhn-Tucker point of (P3) if and only if for each $r = (r_1, r_2) \in \mathbb{R}^2$ the following system (linear in the variable $\lambda$) has at least one solution $\lambda$:

$$f'(x, r) + \lambda g'(x, r) \ge 0, \quad \lambda \ge 0, \quad \lambda g(x) = 0. \qquad (23)$$
Note also that $g(x) = 0$ iff $x_1 = 0$, $x_2 \ge 0$ or $x_1 \le 0$, $x_2 = 0$. We consider various possibilities.
($\alpha$) If $x = (x_1, x_2) \in \mathbb{R}^2$ is such that $g(x) \ne 0$ then $\lambda$ must be zero and the first inequality in (23) becomes $r_2 \ge 0$ ($\lambda = 0$). This cannot hold for all $r = (r_1, r_2) \in \mathbb{R}^2$.
($\beta$) If $x = (x_1, x_2) \in \mathbb{R}^2$ with $x_1 = 0$, $x_2 > 0$ then $g(x) = 0$. Some elementary calculation gives $g'(x, r) = r_1$, $f'(x, r) = r_2$. The system (23) becomes

$$r_2 + \lambda r_1 \ge 0, \quad \lambda \ge 0,$$

which has no solution $\lambda$ when $r_1 < 0$ and $r_2 < 0$.
($\gamma$) If $x = (x_1, x_2) \in \mathbb{R}^2$ with $x_1 < 0$, $x_2 = 0$ then $g(x) = 0$. Take $r = (r_1, r_2) \in \mathbb{R}^2$ with $r_2 < 0$; then we get $f'(x, r) = r_2$ and $g'(x, r) = 0$. In this case, (23) is equivalent to

$$r_2 + \lambda \cdot 0 \ge 0, \quad \lambda \ge 0,$$

which has no solution.
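Cases ($\beta$) and ($\gamma$) can be spot-checked numerically. The sketch below is a rough illustration and not the authors' procedure: it assumes the form of $g$ used in the previous cases ($g(x) = x_1 + (x_1^2 + x_2^2)^{1/2} - x_2$ for $x_2 \ge 0$ and $g(x) = x_1 + (x_1^2 + x_2^2)^{1/2}$ for $x_2 < 0$), approximates directional derivatives by a one-sided difference quotient with a hypothetical step `t`, and tries only a finite, moderate grid of multipliers, because the approximate value of $g'(x, r)$ in case ($\gamma$) is a tiny positive number rather than exactly zero.

```python
import math

def f(x):
    return x[1]

def g(x):
    # Assumed constraint function of Example 3
    x1, x2 = x
    norm = math.hypot(x1, x2)
    return x1 + norm - x2 if x2 >= 0 else x1 + norm

def dir_deriv(h, x, r, t=1e-6):
    # One-sided difference quotient approximating h'(x, r)
    return (h((x[0] + t * r[0], x[1] + t * r[1])) - h(x)) / t

def system_23_solvable(x, r, lambdas):
    # Looks for a trial lambda >= 0 with f'(x, r) + lambda * g'(x, r) >= 0.
    # Assumes g(x) = 0, so the condition lambda * g(x) = 0 holds automatically.
    df, dg = dir_deriv(f, x, r), dir_deriv(g, x, r)
    return any(df + lam * dg >= 0 for lam in lambdas)

trial = [0.0] + [10.0 ** k for k in range(-3, 4)]
# Case (beta): x = (0, 1) with a direction having r1 < 0 and r2 < 0
case_beta = system_23_solvable((0.0, 1.0), (-1.0, -1.0), trial)
# Case (gamma): x = (-1, 0) with a direction having r2 < 0
case_gamma = system_23_solvable((-1.0, 0.0), (0.0, -1.0), trial)
```

Both flags come out false for these sample points, in line with the conclusion that such points fail the directional Kuhn-Tucker condition.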
Therefore, every point $x \in \mathbb{R}^2 \setminus \{(0, 0)\}$ fails to be a directional Kuhn-Tucker point of (P3). Since it is already known that $x_0 = (0, 0)$ is a directional Kuhn-Tucker point of (P3), it is a minimizer of (P3).
It is worth noting that in this case ($C = \mathbb{R}^2$), $x_0$ is not the unique solution of (P3). In fact, all points of the form $(x, 0)$ where $x \le 0$ are solutions of (P3). However, these points, except $x_0 = (0, 0)$, are not directional Kuhn-Tucker points of (P3). This happens since (P3) does not satisfy the regularity conditions stated in Theorem 2. This means that even for non-regular problems the directional Kuhn-Tucker condition can be used to find solutions satisfying this condition (if any).

4 Directionally Differentiable Problems with DSL-approximates

In this section we extend the framework to a larger class of problems, namely the class of problems for which the objective function and the functions appearing in the inequality constraints possess upper DSL-approximates (in the sense introduced in [Sha86], [MW90]) at the minimum point. Let $X$ be a Banach space.

Definition 6. [Sha86] A function $h : X \to \mathbb{R}$ is called a DSL-function if $h$ is a difference of two sublinear functions. That is, there exist $p, q : X \to \mathbb{R}$ which are sublinear and such that $h(x) = p(x) - q(x)$ for all $x \in X$.

Note that a DSL-function can be represented in the form

$$h(x) = \max_{a \in A} \langle x, a \rangle + \min_{b \in B} \langle x, b \rangle, \quad \forall x \in X, \qquad (24)$$

where $A$, $B$ are convex, compact subsets of $X^*$. Obviously, $h$ is quasidifferentiable at $0$ (the origin in $X$) and one can take $Dh(0) := [\underline{\partial} h(0), \overline{\partial} h(0)] = [A, B]$ (see the definition of $Dh(0)$ in Section 3.3, and note that $Dh(0)$ is not uniquely defined).
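A concrete instance of the representation (24) may help fix ideas. In the sketch below, which uses hypothetical data in $\mathbb{R}^2$, the sets $A$ and $B$ are segments described by their endpoints, and the resulting DSL-function is compared with the difference of the two sublinear functions $p(x) = |x_1|$ and $q(x) = |x_2|$; the last line records that this function is not convex.

```python
def dot(x, v):
    return sum(xi * vi for xi, vi in zip(x, v))

# Endpoints of two hypothetical compact convex sets (segments) A and B
A = [(1.0, 0.0), (-1.0, 0.0)]
B = [(0.0, 1.0), (0.0, -1.0)]

def h(x):
    # Representation (24): a linear functional attains its max/min over a
    # segment at an endpoint, so enumerating endpoints suffices
    return max(dot(x, a) for a in A) + min(dot(x, b) for b in B)

def p(x):
    return abs(x[0])   # sublinear

def q(x):
    return abs(x[1])   # sublinear

points = [(0.5, -2.0), (-3.0, 1.0), (0.0, 0.0), (2.0, 2.0)]
agrees = all(abs(h(x) - (p(x) - q(x))) < 1e-12 for x in points)

# Midpoint convexity fails between (0, 1) and (0, -1)
not_convex = h((0.0, 0.0)) > 0.5 * (h((0.0, 1.0)) + h((0.0, -1.0)))
```

Here `h` coincides with $p - q$ on the sample points, which matches Definition 6 for this choice of $A$ and $B$.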

Definition 7. [Sha86] Let $g : X \to \mathbb{R}$ be directionally differentiable at $x_0$. A function $\phi : X \to \mathbb{R}$ is said to be an upper DSL-approximate of $g$ at $x_0$ if $\phi$ is a DSL-function and if

$$g'(x_0, x) \le \phi(x), \quad \forall x \in X. \qquad (25)$$

Suppose now that $X$ is a real Banach space and $g : X \to \mathbb{R} \cup \{+\infty\}$ is directionally differentiable at $x_0$.
Consider Problem (P) in Section 1 with $C = X$. As usual, let $S$ be the feasible set of (P). Assume that $f$, $g_i$, $i \in I$, are directionally differentiable at $x_0 \in S$. Moreover, let $f$ and $g_i$ possess upper DSL-approximates $\phi$, $\phi_i$, $i \in I$, at $x_0$, respectively.
Note that each of $\phi$, $\phi_i$ has the form (24) and hence, for each $\xi \in X$, we can construct the functions $\Phi^{\xi}$, $\Phi_i^{\xi}$, $i \in I$, as in (18) (with $A$, $B$ in (24) playing the role of $\underline{\partial} f(x_0)$, $\overline{\partial} f(x_0)$). These functions are upper approximates of $f$ and $g_i$, respectively. Moreover,

$$\phi(\xi) = \Phi^{\xi}(\xi); \quad \phi(x) \le \Phi^{\xi}(x),\ \forall x \in X,$$
$$\phi_i(\xi) = \Phi_i^{\xi}(\xi); \quad \phi_i(x) \le \Phi_i^{\xi}(x),\ \forall x \in X,\ \forall i \in I. \qquad (26)$$

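Theorem 9. Assume that (P) is (CQ2) regular. If $x_0$ is a minimizer of (P) then

$$\forall r \in X\ \exists (\lambda_1, \dots, \lambda_m) \in \mathbb{R}_{+}^{m} \text{ satisfying } \phi(r) + \sum_{i \in I} \lambda_i \phi_i(r) \ge 0, \quad \lambda_i g_i(x_0) = 0,\ \forall i \in I. \qquad (27)$$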
Theorem 10. If $x_0 \in S$ satisfies (27) for some upper DSL-approximates $\phi$, $\phi_i$ of $f$ and $g_i$ (at $x_0$) and if (P) is invex with respect to $\phi$, $\phi_i$, $i \in I(x_0)$, then $x_0$ is a global minimizer of (P).

The proof of Theorem 10 is the same as that of Theorem 3 with $\phi$, $\phi_i$ playing the role of $f'(x_0, \cdot)$, $g_i'(x_0, \cdot)$, $i \in I(x_0)$, respectively.

Proof (of Theorem 9). We follow almost the same argument as in the proof of Theorem 2 under the assumption (b).
Fix $r \in X$. Since $x_0$ is a minimizer of (P), the following system in the variable $\xi \in X$ is inconsistent:

$$f'(x_0, \xi) < 0, \quad g_i'(x_0, \xi) < 0,\ \forall i \in I(x_0). \qquad (28)$$

Take $\Phi^{\xi}$, $\Phi_i^{\xi}$, $i \in I(x_0)$, to be the functions with the property (26) and with $\xi = r$. Lemma 1 then ensures the existence of $h$ and $h_i$ which are upper approximates of $f$ and $g_i$, $i \in I(x_0)$, respectively, such that for all $x \in X$,

$$h(x) \le \min\{\Phi^{\xi}(x), f^{+}(x_0, x)\}, \qquad h_i(x) \le \min\{\Phi_i^{\xi}(x), g_i^{+}(x_0, x)\},\ \forall i \in I(x_0). \qquad (29)$$

It follows from the inconsistency of (28) and the definition of upper approximate functions that the system

$$h(x) < 0, \quad h_i(x) < 0,\ i \in I(x_0)$$

is inconsistent. In turn, Gordan's theorem leads to the existence of $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \in I(x_0)$, not all zero, such that

$$\lambda_0 h(x) + \sum_{i \in I(x_0)} \lambda_i h_i(x) \ge 0, \quad \forall x \in X. \qquad (30)$$

If $\lambda_0 = 0$ then by (30), $\sum_{i \in I(x_0)} \lambda_i h_i(x) \ge 0$ for all $x \in X$. This is impossible because of (CQ2), (29) and the fact that $\lambda_i$, $i \in I(x_0)$, are nonnegative and not all zero. Therefore $\lambda_0 \ne 0$ (take $\lambda_0 = 1$). We get from (30) for $x = r$,

$$h(r) + \sum_{i \in I(x_0)} \lambda_i h_i(r) \ge 0.$$

Combining this, (29), and (26), we get

$$\phi(r) + \sum_{i \in I(x_0)} \lambda_i \phi_i(r) \ge 0.$$

Then (27) follows by setting $\lambda_i = 0$ for $i \notin I(x_0)$. $\Box$

We now show the relation between our results and the results in [Sha86]. In [Sha86] the author considered a problem with equality and inequality constraints, but here we ignore the equality constraints. In [Sha86], the author considered Problem (P) with $C = X = \mathbb{R}^n$, where $f$ and $g_i$, $i \in I$, are locally Lipschitz at the point $x_0 \in S$ ($S$ is the feasible set of (P)). The upper Dini directional derivative of a (locally Lipschitz) function $g$ at $x_0$ is denoted by $g^{+}(x_0, \cdot)$. The upper DSL-approximate of a locally Lipschitz $g$ was defined as in Definition 7 with $g'(x_0, x)$ replaced by $g^{+}(x_0, x)$ in (25). Suppose that $\phi$, $\phi_i$ are upper DSL-approximates of $f$, $g_i$, $i \in I$ (respectively) at $x_0$. It was established in [Sha86] that under the so-called "nondegeneracy condition" (regularity condition) with respect to $\phi_i$, $i \in I(x_0)$:

$$\operatorname{cl}\{y \mid \phi_i(y) < 0,\ \forall i \in I(x_0)\} = \{y \mid \phi_i(y) \le 0,\ \forall i \in I(x_0)\},$$

the following is necessary for $x_0$ to be a local minimizer of (P):

$$-\overline{\partial} \phi(0) \subset \bigcup_{\substack{w_i \in \overline{\partial} \phi_i(0) \\ i \in I(x_0)}} \Big[\underline{\partial} \phi(0) + \operatorname{cone} \bigcup_{i \in I(x_0)} \big(\underline{\partial} \phi_i(0) + w_i\big)\Big]. \qquad (31)$$

Note that in (31) the inclusion holds for the quasidifferentials of upper DSL-approximates of $f$ and $g_i$ instead of those of $f$ and $g_i$ themselves as in (19). Note also that (31) can be found in [MW90] (as a special case) where it was proved under the (CQ2) regularity condition. The relation between the necessary optimality conditions (31) and (27) is established below.

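Theorem 11. (31) implies (27).

Proof. We first note that

$$\operatorname{cone} \bigcup_{i \in I(x_0)} \big(\underline{\partial} \phi_i(0) + w_i\big) = \sum_{i \in I(x_0)} \operatorname{cone}\big(\underline{\partial} \phi_i(0) + w_i\big).$$

Hence, (31) can be rewritten in the form

$$-\overline{\partial} \phi(0) \subset \bigcup_{w_i \in \overline{\partial} \phi_i(0)} \Big[\underline{\partial} \phi(0) + \sum_{i \in I(x_0)} \operatorname{cone}\big(\underline{\partial} \phi_i(0) + w_i\big)\Big]. \qquad (32)$$

Suppose (32) holds and $r$ is an arbitrary point of $X$. Take

$$\bar{v} \in \operatorname{argmin}_{v \in \overline{\partial} \phi(0)} \langle r, v \rangle, \qquad \bar{v}_i \in \operatorname{argmin}_{v_i \in \overline{\partial} \phi_i(0)} \langle r, v_i \rangle,\ i \in I(x_0). \qquad (33)$$

Then (32) implies that

$$0 \in \underline{\partial} \phi(0) + \bar{v} + \sum_{i \in I(x_0)} \operatorname{cone}\big(\underline{\partial} \phi_i(0) + \bar{v}_i\big).$$

This ensures the existence of $a \in \underline{\partial} \phi(0)$, $b_i \in \underline{\partial} \phi_i(0)$, and $\lambda_i \ge 0$, $i \in I(x_0)$, such that

$$0 = a + \bar{v} + \sum_{i \in I(x_0)} \lambda_i (b_i + \bar{v}_i).$$

Combining this and (33) we get

$$\begin{aligned}
\phi(r) + \sum_{i \in I(x_0)} \lambda_i \phi_i(r)
&= \max_{v \in \underline{\partial} \phi(0)} \langle r, v \rangle + \min_{w \in \overline{\partial} \phi(0)} \langle r, w \rangle + \sum_{i \in I(x_0)} \lambda_i \Big[\max_{v_i \in \underline{\partial} \phi_i(0)} \langle r, v_i \rangle + \min_{w_i \in \overline{\partial} \phi_i(0)} \langle r, w_i \rangle\Big] \\
&= \max_{v \in \underline{\partial} \phi(0)} \langle r, v \rangle + \langle r, \bar{v} \rangle + \sum_{i \in I(x_0)} \lambda_i \Big[\max_{v_i \in \underline{\partial} \phi_i(0)} \langle r, v_i \rangle + \langle r, \bar{v}_i \rangle\Big] \\
&\ge \langle r, a \rangle + \langle r, \bar{v} \rangle + \sum_{i \in I(x_0)} \lambda_i \big[\langle r, b_i \rangle + \langle r, \bar{v}_i \rangle\big] \\
&= \Big\langle r,\ a + \bar{v} + \sum_{i \in I(x_0)} \lambda_i (b_i + \bar{v}_i) \Big\rangle = 0.
\end{aligned}$$

Set $\lambda_i = 0$ for $i \notin I(x_0)$. Then (27) holds since $r \in X$ is arbitrary. $\Box$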

Acknowledgement
The authors would like to thank the referees whose comments improved the
paper. Work of the first author was supported partly by the project "Rought
Analysis - Theory and Applications", Institute of Mathematics, Vietnam
Academy of Science and Technology, Vietnam, and by the APEC postdoc-
toral Fellowship from the KOSEF, Korea. The second author was supported
by the Brain Korea 21 Project in 2003.

References
[BRS83] Brandao, A.J.V., Rojas-Medar, M.A., Silva, G.N.: Invex nonsmooth alternative theorems and applications. Optimization, 48, 230-253 (2000)

[Cla83] Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York
(1983)
[Cra81] Craven, B.D.: Invex functions and constrained local minima. Bull. Austral. Math. Soc., 24, 357-366 (1981)
[Cra86] Craven, B.D.: Nondifferentiable optimization by smooth approximations.
Optimization, 17, 3-17 (1986)
[Cra00] Craven, B.D.: Lagrange Multipliers for Nonconvex Optimization. Progress in Optimization. Kluwer Academic Publishers (2000)
[DJ97] Demyanov, V.F., Jeyakumar, V.: Hunting for a smaller convex subdiffer-
ential. J. Global Optimization, 10, 305-326 (1997)
[DPR86] Demyanov, V.F., Polyakova, L.N., Rubinov, A.M.: Nonsmoothness and
quasidifferentiability. Mathematical Programming Study 29, 1-19 (1986)
[DR80] Demyanov, V.F., Rubinov, A.M.: On quasidifferentiable functionals. Dokl.
Acad. Sci. USSR, 250, 21-25 (1980) (in Russian)
[DT03] Dinh, N., Tuan, L.A.: Directional Kuhn-Tucker conditions and duality for quasidifferentiable programs. Acta Mathematica Vietnamica, 28, 17-38 (2003)
[DV81] Demyanov, V.F., Vasiliev, L.V.: Nondifferentiable optimization. Nauka, Moscow (1981) (in Russian)
[EL87] Eppler, K., Luderer, B.: The Lagrange principle and quasidifferential calculus. Wiss. Z. Techn. Univ. Karl-Marx-Stadt, 29, 187-192 (1987)
[Gao00] Gao, Y.: Demyanov difference of two sets and optimality conditions of Lagrange multiplier type for constrained quasidifferentiable optimization. Journal of Optimization Theory and Applications, 104, 177-194 (2000)
[Gl92] Glover, B.M.: On quasidifferentiable functions and non-differentiable programming. Optimization, 24, 253-268 (1992)
[Han81] Hanson, M.A.: On sufficiency of the Kuhn-Tucker conditions. J. Math.
Anal. Appl. 80, 545-550 (1981)
[HK82] Hayashi, M., Komiya, H.: Perfect duality for convexlike programs. Journal
of Optimization Theory and Applications, 38, 179-189 (1982)
[IT79] Ioffe, A.D., Tikhomirov, V.M.: Theory of extremal problems. North-Holland, Amsterdam (1979)
[Jey85] Jeyakumar, V.: Convexlike alternative theorems and mathematical pro-
gramming. Optimization, 16, 643-652 (1985)
[Jey91] Jeyakumar, V.: Composite nonsmooth programming with Gateaux differentiability. SIAM J. Optimization, 1, 30-41 (1991)
[LRW91] Luderer, B., Rosiger, R., Wurker, U.: On necessary minimum conditions in quasidifferential calculus: independence of the specific choice of quasidifferentials. Optimization, 22, 643-660 (1991)
[Man94] Mangasarian, O.L.: Nonlinear Programming. SIAM, Philadelphia (1994)
[MW90] Merkovsky, R.R., Ward, D.E.: Upper DSL approximates and nonsmooth
optimization. Optimization, 21, 163-177 (1990)
[Sac00] Sach, P.H.: Martin's results for quasidifferentiable programs (Draft) (2000)
[Sac02] Sach, P.H.: Nonconvex alternative theorems and multiobjective optimiza-
tion. Proceedings of the Korea-Vietnam Joint seminar: Mathematical Op-
timization Theory and Applications. November 30 - December 2, 2002.
Pusan, Korea (2002)
[SKL00] Sach, P.H., Kim, D.S., Lee, G.M.: Invexity as a necessary optimality condition in nonsmooth programs. Preprint 2000/30, Institute of Mathematics, Hanoi (2000)
[SLK03] Sach, P.H., Lee, G.M., Kim, D.S.: Infine functions, nonsmooth alternative theorems and vector optimization problems. J. Global Optimization, 27, 51-81 (2003)
[Sha84] Shapiro, A.: On optimality conditions in quasidifferentiable optimization. SIAM J. Control and Optimization, 22, 610-617 (1984)
[Sha86] Shapiro, A.: Quasidifferential calculus and first-order optimality conditions in nonsmooth optimization. Mathematical Programming Study, 29, 56-68 (1986)
[War91] Ward, D.E.: A constraint qualification in quasidifferentiable programming. Optimization, 22, 661-668 (1991)
[YS93] Yen, N.D., Sach, P.H.: On locally Lipschitz vector-valued invex functions. Bull. Austral. Math. Soc., 47, 259-271 (1993)
Slice Convergence of Sums of Convex functions
in Banach Spaces and Saddle Point
Convergence

Robert Wenczel and Andrew Eberhard

Department of Mathematics
Royal Melbourne University of Technology
Melbourne, VIC 3001, Australia
robert.wenczel@rmit.edu.au, andy.eb@rmit.edu.au

Summary. In this note we provide various conditions under which the slice convergence of $f_v \to f$ and $g_v \to g$ implies that of $f_v + g_v$ to $f + g$, where $\{f_v\}_{v \in W}$ and $\{g_v\}_{v \in W}$ are parametrized families of closed, proper, convex functions in a general Banach space $X$. This 'sum theorem' complements a result found in [EW00] for the epi-distance convergence of sums. It also provides an alternative approach to the derivation of some of the results recently proved in [Zal03] for slice convergence in the case when the spaces are Banach spaces. We apply these results to the problem of convergence of saddle points associated with Fenchel duality of slice convergent families of functions.

2000 MR Subject Classification. Primary 49J52, 47N10; Secondary 46A20, 52A27

Key words: slice convergence, Young-Fenchel duality

1 Introduction
In this paper we provide alternative proofs of some recent results of Zalinescu [Zal03]. Some hold for the case when the underlying spaces are general Banach spaces and others only require the spaces to be normed linear. The paper [Zal03] was originally motivated by [WE99] and extended the results of this paper to the context of normed spaces and to the convergence of marginal or perturbation functions (rather than just sums of convex functions). In this paper we clarify to what degree we are able to deduce such results from the work of [EW00, WE99] by either modifications of the proofs of [WE99] or short deductions using the methods of [EW00, WE99].

The first results give conditions under which slice convergence of a sum $\{f_v + g_v\}_{v \in W}$ follows from the slice convergence of the two parametrized families $\{f_v\}_{v \in W}$ and $\{g_v\}_{v \in W}$. This result has a counterpart for epi-distance convergence which was proved by the authors in [EW00] and we refer to such results as sum theorems. We show that in the particular case of Banach spaces the corresponding result for slice convergence follows easily from the work in [WE99], and moreover so do the corresponding results for the so-called marginal or perturbation functions used to study duality of convex optimization problems which are studied in [Zal03]. Such results only hold under certain conditions which we will refer to as qualification assumptions due to their similarity (and connections) to constraint qualifications in convex optimization problems. The approach here is more aligned with that of [AR96], where the sum theorem is the primary point of departure.
The marginal or perturbation function is given by $h(y) := \inf_{x \in X} F(x, y)$, from which the primal (convex) problem corresponds to $h(0)$ and the dual problem corresponds to $-h^{**}(0) = \inf_{y^* \in Y^*} F^*(0, y^*) = \inf_{y^* \in Y^*} h^*(y^*)$. This leads to the consideration of the dual perturbation function $k(x^*) := \inf_{y^* \in Y^*} F^*(x^*, y^*)$ (see [Roc74, ET99]) and the consideration of the closedness and properness of $h(y)$ at $y = 0$. Letting $F, F_i \in \Gamma(X \times Y)$ ($i \in I$), then as a framework for the study of stability of optimization problems one may study the variational convergence of $\{F_i(\cdot, 0)\}_{i \in I}$ to $F(\cdot, 0)$ and $\{F_i^*(0, \cdot)\}_{i \in I}$ to $F^*(0, \cdot)$ (see for example [AR96, Zal03]). Clearly this analysis is greatly
facilitated when the variational convergence under consideration is generated
by a topology for which the Fenchel conjugate is bi-continuous. Thus typically
the so-called slice and epi-distance topologies are usually considered as we
will also do in this paper. Once this is enforced the generality of this formula-
tion allows one to obtain the sum theorem alluded to in the beginning of this
introduction as well as many other stability results with respect to other op-
erations on convex functions and sets (which preserve convexity). In this way
the study of perturbation functions appears to be more general than the study
of any one single operation (say, addition) of convex functions. Indeed this is
only partly true in that when all spaces considered are Banach and the con-
straint qualification is imposed on the primal functions we will show that the
slice stability of the perturbation function follows easily from sum theorems.
When the qualification assumption is placed on the dual function we are able
to deduce the main result in this direction of [Zal03] in a straightforward man-
ner when all spaces are only normed (possibly not complete) linear spaces. It
is also possible to treat the upper and lower slice (respectively, epi-distance)
convergences separately as is done in [Zal03, Pen93, Pen02] and in part in
[WE99]. There is an economy of statement gained by avoiding this and it will
also avoid us reworking results in previously published papers. Consequently
we will not do so in this paper.
Convex-concave bivariate functions are related to convex bivariate functions through partial conjugation (i.e. conjugation with respect to one of the variables). In this context we are led to the introduction of equivalence classes of saddle-functions which are uniquely associated with concave or convex parents (depending on which variable is partially conjugated). Two bivariate
functions are said to belong to the same equivalence class if they have the
same convex and concave parents. Such members of the same equivalence
class not only have the same saddle-point but so do all linear perturbations
of these two functions. Thus when discussing the variational convergence of
saddle-functions one is necessarily led to the study of the convergence of the
equivalence class. We investigate saddle-point convergence of the associated
saddle function. This allows one to investigate the convergence of approximate
solutions of the perturbed Fenchel primal and dual optimization problems to
solutions of the limiting problem. It may be shown that one can quite gener-
ally deduce the existence of an accumulation point of the approximating dual
solutions.

2 Preliminaries
In this section we draw together a number of results and definitions. This is
done to make the development self-contained. A reader conversant with set-
convergence notions and the infimal convolution need only read the first part
of this section, only returning to consult results and definitions as needed. A
useful reference for much of the material of this section is [Bee93].
We will let $\mathcal{C}(X)$ stand for the class of all nonempty closed convex subsets of a normed space $X$ and $\mathcal{CB}(X)$ the closed bounded convex sets. Place $d(a, B) = \inf\{\|a - b\| \mid b \in B\}$, and $B_\rho = \{x \in X \mid \|x\| \le \rho\}$. Corresponding balls in the dual space $X^*$ will be denoted $B_\rho^*$. The indicator function of a set $A$ will be denoted $\delta_A$, and $S(A, \cdot)$ shall denote the support function. We will use u.s.c. to denote upper-semicontinuity and l.s.c. to denote lower-semicontinuity. Recall that a function $f : X \to \overline{\mathbb{R}}$ is called closed, proper convex on $X$ if and only if $f$ is convex, l.s.c., is never $-\infty$, and not identically $+\infty$. The class of all closed proper convex functions on $X$ is denoted by $\Gamma(X)$, and $\Gamma^*(X^*)$ denotes the class of all weak* closed proper convex functions on $X^*$. We shall use the notation $\overline{A}$ for the closure of a set $A$ in a topological space $(Z, \tau)$ and, to emphasise the topology, we may write $\overline{A}^{\tau}$. For $x \in Z$, $\mathcal{N}_\tau(x)$ denotes the collection of all $\tau$-neighborhoods of $x$. For a function $f : Z \to \overline{\mathbb{R}}$, the epigraph of $f$, denoted $\operatorname{epi} f$, is the set $\{(x, \alpha) \in Z \times \mathbb{R} \mid f(x) \le \alpha\}$, and the strict epigraph $\operatorname{epi}_s f$ is the set $\{(x, \alpha) \in Z \times \mathbb{R} \mid f(x) < \alpha\}$. The domain, denoted $\operatorname{dom} f$, is the set $\{x \in Z \mid f(x) < +\infty\}$. The (sub-)level set $\{x \in Z \mid f(x) \le \alpha\}$ (where $\alpha > \inf_Z f$) will be given the abbreviation $\{f \le \alpha\}$. Any product $X \times Y$ of normed spaces will always be understood to be endowed with the box norm $\|(x, y)\| = \max\{\|x\|, \|y\|\}$; any balls in such product spaces will always be with respect to the box norm. The natural projections from $X \times Y$ to $X$ or $Y$ will be denoted by $P_X$ and $P_Y$ respectively. We also will assume the following convention for products $Z \times \mathbb{R}$ where $(Z, \tau)$ is topological: we assume the product topology, where $\mathbb{R}$ has the usual topology, and for any subset $C \subseteq Z \times \mathbb{R}$, its closure in this topology is written as $\overline{C}^{\tau}$. If $f : (Z, \tau) \to \overline{\mathbb{R}}$, its $\tau$-l.s.c. hull, denoted $\overline{f}^{\tau}$, is defined by $\overline{f}^{\tau}(x) = \liminf_{x' \to x} f(x')$. The (extended) lower closure $\operatorname{cl}_\tau f$ is defined to coincide with $\overline{f}^{\tau}$ if the latter does not take the value $-\infty$ anywhere, and to be identically $-\infty$ otherwise.

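Definition 1. Let $F : W \to 2^X$ be a multifunction from topological spaces $W$ to $X$.

1. $\limsup_{v \to w} F(v) = \bigcap_{V \in \mathcal{N}(w)} \overline{\bigcup_{v \in V} F(v)}$.
2. $\liminf_{v \to w} F(v) = \bigcap_{\{B \subseteq W \mid w \in \overline{B}\}} \overline{\bigcup_{v \in B} F(v)}$.
3. $F(\cdot)$ is lower-semicontinuous at $w$ iff $F(w) \subseteq \liminf_{v \to w} F(v)$.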

Remark 1. It is easily seen that this notion of lower-semicontinuity is equivalent
to the classical formulation, namely: for any open set $U$ intersecting
$F(w)$ there is a neighborhood $V$ of $w$ for which $F(v) \cap U$ is nonempty for
every $v$ in $V$.

Remark 2. For metrizable $X$, the above definitions can be shown to have the
equivalent forms:

1. $\limsup_{v \to w} F(v) = \{x \in X \mid \exists$ a net $v_\beta \to w$ and $x_\beta \in F(v_\beta)$ with $x_\beta \to x\}$
   $= \{x \in X \mid \liminf_{v \to w} d(x, F(v)) = 0\}$

2. $\liminf_{v \to w} F(v) = \{x \in X \mid \forall$ nets $v_\beta \to w$, $\exists x_\beta \to x$ with $x_\beta \in F(v_\beta)$ eventually$\}$
   $= \{x \in X \mid \limsup_{v \to w} d(x, F(v)) = 0\}$

with obvious analogs for nets of sets.

Definition 2. Let $A$ be a convex set in a topological vector space and $x \in A$.
Then $\operatorname{cone} A := \bigcup_{\lambda > 0} \lambda A$ (the smallest convex cone containing $A$).

The infimal convolution plays a central role in our development.

Definition 3. Let $f$ and $g$ be closed convex functions on $X$ into the extended
reals. Then
$$(f \Box g)(x) := \inf_{y \in X} \big(f(y) + g(x - y)\big)$$
is called the inf-convolution.

It is well known that the strict epigraph of the inf-convolution is equal to
the set-addition of the strict epigraphs of the individual functions:

$$\operatorname{epi}_s (f \Box g) = \operatorname{epi}_s f + \operatorname{epi}_s g.$$

Also $\operatorname{dom}(f \Box g) = \operatorname{dom} f + \operatorname{dom} g$; $\operatorname{epi}(f \Box g) \supseteq \operatorname{epi} f + \operatorname{epi} g$, and

$$(f \Box g)^* = f^* + g^*$$
where $f^*(x^*) = \sup_{x \in X} (\langle x, x^* \rangle - f(x))$ is the Young-Fenchel conjugate of $f$.
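As a minimal numerical sketch of the identity $(f \Box g)^* = f^* + g^*$, assume $X = \mathbb{R}$, $f(x) = x^2/2$ and $g(x) = |x|$. These choices, the grids, and all helper names are illustrative assumptions and do not come from the text. In this case $f \Box g$ is the Huber function, while $f^*(s) + g^*(s) = s^2/2$ for $|s| \le 1$ and $+\infty$ otherwise. The infimum and supremum are replaced by a minimum and maximum over finite grids, so only the finite part $|s| \le 1$ is checked.

```python
def inf_convolution(f, g, x, grid):
    """Approximate (f [] g)(x) = inf_y [f(y) + g(x - y)] by a minimum over a finite grid."""
    return min(f(y) + g(x - y) for y in grid)


def conjugate(h, s, grid):
    """Approximate h*(s) = sup_x [s*x - h(x)] by a maximum over a finite grid."""
    return max(s * x - h(x) for x in grid)


def f(x):
    # Illustrative choice (assumption): f(x) = x^2 / 2.
    return 0.5 * x * x


def g(x):
    # Illustrative choice (assumption): g(x) = |x|.
    return abs(x)


def huber(x):
    # Closed form of f [] g for the two functions above.
    return 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5


y_grid = [i / 100 for i in range(-600, 601)]   # grid for the inner infimum
x_grid = [i / 10 for i in range(-30, 31)]      # points at which f [] g is evaluated
slopes = [i / 10 for i in range(-10, 11)]      # dual points with |s| <= 1

# Largest deviation between the grid inf-convolution and the Huber function.
conv_error = max(abs(inf_convolution(f, g, x, y_grid) - huber(x)) for x in x_grid)

# Largest deviation between (f [] g)*(s) and f*(s) + g*(s) = s^2/2 on |s| <= 1.
conj_error = max(abs(conjugate(huber, s, x_grid) - 0.5 * s * s) for s in slopes)
```

Both error values should be at the level of floating-point rounding, because the exact minimizers and maximizers happen to lie on the chosen grids.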
Lower semi-continuity of the epi-graphical multi-function $v \mapsto \operatorname{epi}_s (f_v \Box g_v)$
may be deduced from that of its components using the following lemma, a
proof of which may be found in [WE99].

Lemma 1. If $F_1(\cdot)$ and $F_2(\cdot)$ are multi-functions l.s.c. at $w$ then $F(v) :=
F_1(v) + F_2(v)$ is l.s.c. at $w$.

We conclude this section with a summary of variational limit notions used
in this paper. Let $X$ and $W$ be topological spaces, then for $x \in X$, $w \in W$,
and $\{f_v\}_{v \in W}$ a collection of $\overline{\mathbb{R}}$-valued functions on $X$, define the lower and
upper epi-limits by:

$$(\text{e-li}_{v \to w} f_v)(x) := \sup_{U \in \mathcal{N}(x)} \sup_{V \in \mathcal{N}(w)} \inf_{v \in V} \inf_{y \in U} f_v(y),$$
$$(\text{e-ls}_{v \to w} f_v)(x) := \sup_{U \in \mathcal{N}(x)} \inf_{V \in \mathcal{N}(w)} \sup_{v \in V} \inf_{y \in U} f_v(y).$$

It is well known [RW84] that these limits correspond to the Kuratowski(-Painlevé)
limit of the epi-graph multifunction in the sense that

$$\operatorname{epi}(\text{e-ls}_{v \to w} f_v) = \liminf_{v \to w} \operatorname{epi} f_v, \qquad
\operatorname{epi}(\text{e-li}_{v \to w} f_v) = \limsup_{v \to w} \operatorname{epi} f_v. \tag{1}$$

These definitions and relations have natural counterparts for nets $\{f_\beta\}_\beta$ of
functions.
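The difference between an epi-limit and a pointwise limit can be sketched on a toy sequence. Assume $X = \mathbb{R}$ and $f_n(x) = n\,|x - 1/n|$ (an illustrative choice, not from the text). Pointwise, $f_n(0) = 1$ for every $n$, yet the epi-limit at $0$ is $0$, since $f_n(1/n) = 0$ with $1/n \to 0$; away from $0$ the epi-limit is $+\infty$. The code below mimics the formula $\sup_U \liminf_n \inf_{y \in U} f_n(y)$, replacing the $\liminf$ by a minimum over a finite tail of indices and the neighborhoods $U$ by a few shrinking intervals. Both replacements are simplifying assumptions.

```python
def f(n, x):
    """f_n(x) = n * |x - 1/n|: a 'V' shape with its tip at 1/n (illustrative choice)."""
    return n * abs(x - 1.0 / n)


def inf_over_ball(n, x, r):
    """Exact value of inf{ f_n(y) : |y - x| <= r }, from the closed form of the 'V'."""
    return n * max(0.0, abs(x - 1.0 / n) - r)


def lower_epi_limit_estimate(x, radii, n_tail):
    """Finite stand-in for sup_U liminf_n inf_{y in U} f_n(y).

    The liminf is replaced by a minimum over a tail of indices and the
    neighborhoods U by intervals of the given radii around x.
    """
    return max(min(inf_over_ball(n, x, r) for n in n_tail) for r in radii)


radii = [10.0 ** (-k) for k in range(1, 4)]    # 0.1, 0.01, 0.001
n_tail = range(2000, 4000)                     # indices with 1/n below every radius

pointwise_at_zero = [f(n, 0.0) for n in n_tail]
epi_at_zero = lower_epi_limit_estimate(0.0, radii, n_tail)
epi_at_half = lower_epi_limit_estimate(0.5, radii, n_tail)
```

Here `epi_at_zero` is zero although every entry of `pointwise_at_zero` is (up to rounding) one, while `epi_at_half` is large and grows without bound as the tail is pushed further out.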

Definition 4. Let $\{f_v\}_{v \in W}$ be a family of functions and $\tau$ a topology on $X$.
We say that $\{f_v\}_{v \in W}$ is $\tau$-epi-u.s.c. at $w \in W$ if for all $x$ we have

$$(\tau\text{-e-ls}_{v \to w} f_v)(x) \le f_w(x)$$

and $\tau$-epi-l.s.c. if for all $x$

$$f_w(x) \le (\tau\text{-e-li}_{v \to w} f_v)(x)$$

where the epi-limits are taken with respect to the underlying topology $\tau$.

We will say that $\{f_v\}_{v \in W}$ is strongly epi-u.s.c. when $\tau$ corresponds to the
strong (norm) topology on $X$. In this case we will drop the reference to $\tau$. Thus
for an epi-u.s.c. family the epi-graphs of $f_v$ are lower Kuratowski-convergent
to $\operatorname{epi} f_w$ in the (strong) norm topology.

Definition 5. A family of functions $\{f_v\}_{v \in W}$ into $\overline{\mathbb{R}}$ is epi-convergent to a
function $f_w$ (as $v \to w$) if it is both epi-u.s.c. and epi-l.s.c. at $w$.

Since $\text{e-li}_{v \to w} f_v \le \text{e-ls}_{v \to w} f_v$ on $X$, the relation defining epi-convergence
is in fact an equality.

Definition 6. Let $\{f_v\}_{v \in W}$ be a family of functions on $X$ and $\{f_v^*\}_{v \in W}$ the
family of conjugate functions on $X^*$ (for a normed space $X$). We denote the
bounded-weak* upper epi-limit (as $v \to w$) of $\{f_v^*\}_{v \in W}$ by

$$bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* := \{(x^*, \alpha) \in X^* \times \mathbb{R} \mid \exists \text{ nets } v_\beta \to w,\ (y_\beta^*, \alpha_\beta) \in \operatorname{epi} f_{v_\beta}^*$$
$$\text{such that } \alpha_\beta \to \alpha;\ y_\beta^* \text{ norm bounded};\ y_\beta^* \to^{w^*} x^*\}.$$

The above closely resembles the limit-superior of epigraphs, relative to the
bounded-weak* topology on $X^*$ (hence the terminology). The bounded-weak*
topology is described in, for example, [Hol75]. For a family of sets $\{F(v)\}_{v \in W}$
we will also say that it is $bw^*$-upper-semicontinuous (at $w$) whenever $F(w) \supseteq
bw^*\text{-}\limsup_{v \to w} F(v)$.
Definition 7. [Bee92, Bee93] We say $\{f_v\}_{v \in W}$ in $\Gamma(X)$ is upper slice convergent
to $f_w \in \Gamma(X)$ (as $v \to w$) if whenever $v_\alpha \to w$ is a convergent
net and $\{x_\alpha\}$ a bounded net in $X$ we have for each $(y^*, \eta) \in \operatorname{epi}_s f_w^*$ that
$f_{v_\alpha}(x_\alpha) > \langle x_\alpha, y^* \rangle - \eta$ eventually. If we also have that $f_w \ge \text{e-ls}_{v \to w} f_v$, then $f_v$
is said to slice converge to $f_w$.
A dual slice convergence on $\Gamma^*(X^*)$ may be defined, which ensures the
bicontinuity of Fenchel conjugation. For our purposes, we work with an equivalent
definition of dual slice convergence, as contained in the proposition to
follow.
Again, analogous definitions follow for nets of functions. The following
characterization of slice convergence is essentially contained in [WE99, Cor.
3.6].
Proposition 1. For functions $f_v \in \Gamma(X)$, $f_v$ slice-converges to $f_w$ if and
only if
$$bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* \subseteq \operatorname{epi} f_w^* \subseteq s^*\text{-}\liminf_{v \to w} \operatorname{epi} f_v^*,$$

where $s^*$ denotes the norm topology on $X^*$.

Note that this result gives a characterisation of dual slice convergence for the
conjugate functions in $\Gamma^*(X^*)$.

From [Hol75] we have the following. Recall that a set $A$ in a topological
linear space $X$ is ideally convex if for any bounded sequence $\{x_n\} \subseteq A$ and
$\{\lambda_n\}$ of nonnegative numbers with $\sum_{n=1}^{\infty} \lambda_n = 1$, the series $\sum_{n=1}^{\infty} \lambda_n x_n$ either
converges to an element of $A$, or else does not converge at all. Open or closed
convex sets are ideally convex, as is any finite-dimensional convex set. In
particular, if $X$ is Banach, then such series always converge, and the definition
of ideal convexity only requires that $\sum_{n=1}^{\infty} \lambda_n x_n$ be in $A$. From [Hol75, Section
17E] we have
Proposition 2. For a Banach space $X$,
1. If $C \subseteq X$ is closed convex, it is ideally convex.
2. For ideally convex $C$, $\operatorname{int} \overline{C} = \operatorname{int} C$.
3. If $A$ and $B$ are ideally convex subsets of $X$, one of which is bounded, then
$A - B$ is ideally convex.

Proof. We prove the last assertion only; the rest can be found in the cited
reference. Let $\{a_n - b_n\} \subseteq A - B$ be a bounded sequence, let $\lambda_n \ge 0$ be
such that $\sum_{n=1}^{\infty} \lambda_n = 1$. Then $\{a_n\} \subseteq A$ and $\{b_n\} \subseteq B$ are both bounded, so
$\sum_{n=1}^{\infty} \lambda_n a_n \in A$ and $\sum_{n=1}^{\infty} \lambda_n b_n \in B$ (both convergent). Thus $\sum_{n=1}^{\infty} \lambda_n (a_n - b_n)
= \sum_{n=1}^{\infty} \lambda_n a_n - \sum_{n=1}^{\infty} \lambda_n b_n \in A - B$. □

3 A Sum Theorem for Slice Convergence

We will now discuss the passage of slice convergence through addition. Such
theorems will hereafter be referred to as sum theorems. A sum theorem for
slice convergence of $f_n + g_n$ (for convergent $f_n$, $g_n$) was proved in [WE99] under
the rather restrictive condition that the conjugates $g_n^*$ have domains uniformly
contained in a weak* locally compact cone. (This hypothesis arose from an
attempt to derive a sufficient condition that acts on only one of the summands,
whereas most such conditions are symmetric in both $f_n$ and $g_n$.) In
the normed-space context, [Zal03, Prop. 25 or Prop. 13] yields an extension
of the results of [WE99], using a constraint-qualification more in the spirit of
those usually appearing in sum theorems for variational convergences (for instance,
in [AP90, Pen93, EW00]). In this Section, we show that in the Banach
space context, the cited results of [Zal03] may also be derived using a slight
modification of arguments appearing in [WE99].

Definition 8. Following [Att86], define for $K \in \mathbb{R}$, and for functions $f_v$, $g_v$
($v \in W$),

$$H_K(X^*, v) := \{(x^*, y^*) \in X^* \times X^* \mid f_v^*(x^*) + g_v^*(y^*) \le K,\ \|x^* + y^*\| \le K\}.$$

We shall also need the related object in $X_v := \overline{\operatorname{span}}(\operatorname{dom} f_v - \operatorname{dom} g_v)$ given
by

Definition 9.

$$H_K(X_v^*, v) := \{(x^*, y^*) \in X_v^* \times X_v^* \mid (f_v|_{X_v})^*(x^*) + (g_v|_{X_v})^*(y^*) \le K,\ \|x^* + y^*\|_{X_v^*} \le K\}$$

where the conjugate functions are computed relative to the subspace $X_v$.

The following lemma from [WE99] provides a criterion for the inf-convolution
of conjugate functionals to be weak* lower semicontinuous.
Lemma 2. ([WE99, Lem. 4-^]) Let $f_v$ and $g_v$ be in $\Gamma(X)$ for a Banach space
$X$, such that $H_K(X^*, v)$ is bounded for each $K \in \mathbb{R}$. Then $f_v^* \Box g_v^* \in \Gamma^*(X^*)$.
The next lemma is elementary, and its proof will be omitted.
Lemma 3. Let $f_v$ be in $\Gamma(X)$, with $f_v$ slice converging to $f_w$, and $x_v \to x_w$
in norm, as $v \to w$. Then $f_v(x_v + \cdot)$ slice converges to $f_w(x_w + \cdot)$.
The following three lemmas provide bounds that will be of use in the next
theorem.

Lemma 4. Let $f_v$ and $g_v$ be proper closed convex $\overline{\mathbb{R}}$-valued functions with
$\operatorname{dom} f_v \subseteq X_v$ and $\operatorname{dom} g_v \subseteq X_v$ for all $v$ in some set $V$. If, additionally, for
some positive $\rho$, $\delta$,

$$(\forall v \in V)\quad B_\delta \cap X_v \subseteq \{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho \tag{2}$$

then for each $K > 0$,

$$\sup\{\|(x^*, y^*)\|_{X_v^* \times X_v^*} \mid (x^*, y^*) \in H_K(X_v^*, v),\ v \in V\} < +\infty.$$

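Proof. For $v \in V$, and $(x^*, y^*) \in H_K(X_v^*, v)$, the Fenchel Inequality gives
(since $f_v|_{X_v}$, $g_v|_{X_v}$ are in $\Gamma(X_v)$)

$$K \ge (f_v|_{X_v})^*(x^*) + (g_v|_{X_v})^*(y^*) \ge \langle x^*, x \rangle + \langle y^*, y \rangle - f_v(x) - g_v(y)$$

for any $x \in \operatorname{dom} f_v$, $y \in \operatorname{dom} g_v$ ($\subseteq X_v$).

Let $\xi \in X_v \cap B_\delta$. From (2), $\xi = x - y$ where $x, y \in B_\rho$, $f_v(x) \le \rho$, $g_v(y) \le \rho$
whence (noting that $x$ and $y$ are in $X_v$ also)

$$\langle x^*, \xi \rangle = \langle x^*, x \rangle + \langle y^*, y \rangle - \langle x^* + y^*, y \rangle \le K + f_v(x) + g_v(y) + \|x^* + y^*\|_{X_v^*} \|y\| \le K(1 + \rho) + 2\rho$$

since $\|x^* + y^*\|_{X_v^*} \le K$ and $y \in \operatorname{dom} g_v \subseteq X_v$ with $\|y\| \le \rho$. This yields
that $\|x^*\|_{X_v^*} \le \frac{1}{\delta}(K(1 + \rho) + 2\rho)$, from arbitrariness of $\xi \in B_\delta \cap X_v$. Also,
$\|y^*\|_{X_v^*} \le \|y^* + x^*\|_{X_v^*} + \|x^*\|_{X_v^*} \le K + \|x^*\|_{X_v^*}$ thus giving a uniform bound
on $H_K(X_v^*, v)$ for all $v$. □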

Lemma 5. ([WE99, Lem. 4.2]) Let $\{f_v\}_{v \in W}$ be a family of proper closed
convex extended-real-valued functions on a normed space $X$. Suppose that
$f_w \ge \text{e-ls}_{v \to w} f_v$ on $X$. Then for each $M > 0$,

$$(\exists V \in \mathcal{N}(w))(\exists \mu \in \mathbb{R})(\forall v \in V)(\forall \|x^*\| \le M)\ (f_v^*(x^*) \ge \mu). \tag{3}$$

Lemma 6. Let $f_v$, $g_v$ be proper closed convex functions in $\Gamma(X)$ ($v \in W$).
Suppose that $f_w \ge \text{e-ls}_{v \to w} f_v$ on $X$. Then for any fixed $K > 0$ and $\gamma > 0$,
there is a neighborhood $V$ of $w$ and a positive $\rho$ for which

$$(\forall v \in V)(\forall (x_1^*, x_2^*) \in H_K(X^*, v) \cap B_\gamma^*)\ (g_v^*(x_2^*) \le \rho).$$

Proof. Supposing the contrary, there are nets $v_\beta \to w$, $(x_{1\beta}^*, x_{2\beta}^*) \in H_K(X^*, v_\beta)
\cap B_\gamma^*$ with $\lim_\beta g_{v_\beta}^*(x_{2\beta}^*) = +\infty$. It then follows that $\lim_\beta f_{v_\beta}^*(x_{1\beta}^*) = -\infty$, and
since $\|x_{1\beta}^*\| \le \gamma$ for all $\beta$, we have contradicted the statement of Lemma 5. □

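Before proving the first of our main theorems we make the following important
observation for later reference.

Lemma 7. Let $X$ be a Banach space and $f_v$ and $g_v$ ($v \in W$) be in $\Gamma(X)$.
Assume that there exist $\delta > 0$, $\rho > 0$, $V$ a neighborhood of $w$ such that for all
$v \in V$ ($v \ne w$)

$$B_\delta \cap X_v \subseteq \{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho \tag{4}$$

where $X_v := \overline{\operatorname{span}}(\operatorname{dom} f_v - \operatorname{dom} g_v)$. Then for $v \ne w$ in $V$ we have

$$0 \in \operatorname{int}_{X_v}\big(\{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho\big). \tag{5}$$

Proof. From the assumptions it follows that $\operatorname{dom} f_v \cap \operatorname{dom} g_v$ is nonempty, and

$$(\operatorname{int} B_\delta) \cap X_v \subseteq B_\delta \cap X_v \subseteq \{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho
= (\{f_v \le \rho\} \cap B_\rho - x_v) - (\{g_v \le \rho\} \cap B_\rho - x_v)$$

where $x_v$ is any member of $\operatorname{dom} f_v \cap \operatorname{dom} g_v$. Both $\{f_v \le \rho\} \cap B_\rho - x_v$ and
$\{g_v \le \rho\} \cap B_\rho - x_v$ are bounded, ideally convex [Hol75] subsets of the Banach
space $X_v$. Hence, by Proposition 2, $\{f_v \le \rho\} \cap B_\rho - \{g_v \le \rho\} \cap B_\rho$ is also
ideally convex in $X_v$ and has the same interior (in $X_v$) as does its $X_v$-closure.
Thus we obtain (5). □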

Theorem 1. Let $X$ be a Banach space, let $f_v$ and $g_v$ ($v \in W$) be in $\Gamma(X)$,
with the slice convergence $f_v \to f_w$ and $g_v \to g_w$. Assume that there exist
$\delta > 0$, $\rho > 0$, $V$ a neighborhood of $w$ such that for all $v \in V$ ($v \ne w$) (4)
holds. Also, assume that $f_w^* \Box g_w^*$ is proper and weak* lower-semicontinuous.
Then $f_v + g_v$ slice converges to $f_w + g_w$.

Proof. We use as template the proof of [WE99, Thm. 4.3]. We temporarily
append the condition that $X_v$ contain both $\operatorname{dom} f_v$ and $\operatorname{dom} g_v$ for $v$
in $V$ (and shall remove this later). Observe immediately from (5) that
$\operatorname{cone}(\operatorname{dom} f_v - \operatorname{dom} g_v)$ coincides with the closed subspace $X_v$ so that $f_v^* \Box g_v^* \in
\Gamma^*(X^*)$ by [Att86, Thm. 1.1] (for $v \ne w$). Again, via the characterization
given by Proposition 1 we seek to prove that $f_v^* \Box g_v^*$ converges in the dual
slice topology to $f_w^* \Box g_w^*$. It is straightforward to deduce that $v \mapsto \operatorname{epi} f_v^* \Box g_v^*$
is strongly lower-semicontinuous at $v = w$ (see the opening paragraph of the
proof of [WE99, Thm. 4.3]). To complete the proof, we require that

$$bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* \Box g_v^* \subseteq \operatorname{epi} f_w^* \Box g_w^*.$$

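Let $(x^*, \alpha) \in bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* \Box g_v^*$. Then there are nets $v_\beta \to w$,
$(x_\beta^*, \alpha_\beta) \to^{w^*} (x^*, \alpha)$, and $K > 0$ with $(x_\beta^*, \alpha_\beta) \in B_K^* \cap \operatorname{epi}_s f_{v_\beta}^* \Box g_{v_\beta}^*$ for all $\beta$.
For such $\beta$, there is $y_\beta^* \in X^*$ for which $K > \alpha_\beta > f_{v_\beta}^*(y_\beta^*) + g_{v_\beta}^*(x_\beta^* - y_\beta^*)$. Place
$x_{1\beta}^* := \widetilde{y_\beta^*|_{X_{v_\beta}}}$, a norm-preserving extension of $y_\beta^*|_{X_{v_\beta}} \in X_{v_\beta}^*$ (obtained, say, by
the Hahn-Banach Theorem). Also, define $x_{2\beta}^* := x_\beta^* - x_{1\beta}^*$. Then $\|x_{1\beta}^* + x_{2\beta}^*\| =
\|x_\beta^*\| \le K$, and

$$K > \alpha_\beta > f_{v_\beta}^*(y_\beta^*) + g_{v_\beta}^*(x_\beta^* - y_\beta^*)$$
$$= f_{v_\beta}^*(x_{1\beta}^*) + g_{v_\beta}^*(x_{2\beta}^*) \quad (\text{so } (x_{1\beta}^*, x_{2\beta}^*) \in H_K(X^*, v_\beta))$$
$$= (f_{v_\beta}|_{X_{v_\beta}})^*(x_{1\beta}^*|_{X_{v_\beta}}) + (g_{v_\beta}|_{X_{v_\beta}})^*(x_{2\beta}^*|_{X_{v_\beta}})$$

(since $X_v$ contains $\operatorname{dom} f_v$ and $\operatorname{dom} g_v$, and on $X_{v_\beta}$ we have $x_{1\beta}^* = y_\beta^*$ and
$x_{2\beta}^* = x_\beta^* - y_\beta^*$). Thus $(x_{1\beta}^*, x_{2\beta}^*) \in H_K(X^*, v_\beta)$ and $(x_{1\beta}^*|_{X_{v_\beta}}, x_{2\beta}^*|_{X_{v_\beta}}) \in
H_K(X_{v_\beta}^*, v_\beta)$, the latter since $\|x_{1\beta}^*|_{X_{v_\beta}} + x_{2\beta}^*|_{X_{v_\beta}}\|_{X_{v_\beta}^*} \le \|x_{1\beta}^* + x_{2\beta}^*\| \le K$. It
now follows from Lemma 4 that $\|x_{1\beta}^*|_{X_{v_\beta}}\| \le \gamma'$ eventually in $\beta$ for some $\gamma' >
0$. Since $x_{1\beta}^* \in X^*$ is a norm-preserving extension of $x_{1\beta}^*|_{X_{v_\beta}} = y_\beta^*|_{X_{v_\beta}} \in X_{v_\beta}^*$,
we have $\|x_{1\beta}^*\| = \|x_{1\beta}^*|_{X_{v_\beta}}\|_{X_{v_\beta}^*} \le \gamma'$ for all $\beta$ so $\|x_{2\beta}^*\| \le \|x_\beta^*\| + \|x_{1\beta}^*\| \le
K + \gamma' =: \gamma$. Thus,

$$(x_\beta^*, \alpha_\beta) = (x_{1\beta}^*, \alpha_\beta - g_{v_\beta}^*(x_{2\beta}^*)) + (x_{2\beta}^*, g_{v_\beta}^*(x_{2\beta}^*))
\in \operatorname{epi} f_{v_\beta}^* \cap (B_\gamma^* \times \mathbb{R}) + \operatorname{epi} g_{v_\beta}^* \cap (B_\gamma^* \times \mathbb{R})$$

for all $\beta$. We need some uniform bound on the $g_{v_\beta}^*(x_{2\beta}^*)$. These follow from
Lemma 5 (lower bounds) and Lemma 6 (upper bounds), the latter since
$(x_{1\beta}^*, x_{2\beta}^*) \in H_K(X^*, v_\beta) \cap B_\gamma^*$ and $v_\beta \to w$. Thus, the $g_{v_\beta}^*(x_{2\beta}^*)$ are eventually
uniformly bounded in $\beta$, and

$$(x_\beta^*, \alpha_\beta) = (x_{1\beta}^*, \alpha_{1\beta}) + (x_{2\beta}^*, \alpha_{2\beta}) \in \operatorname{epi} f_{v_\beta}^* + \operatorname{epi} g_{v_\beta}^*$$

with the $x_{1\beta}^*$, $x_{2\beta}^*$, $\alpha_{1\beta}$, $\alpha_{2\beta}$ all uniformly bounded in $\beta$.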


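We may now argue as in the final paragraph of the proof of [WE99, Thm.
4.3] to conclude that

$$(x^*, \alpha) = w^*\text{-}\lim_\beta (x_\beta^*, \alpha_\beta) \quad \text{(on passing to subnets)}$$
$$= w^*\text{-}\lim_\beta (x_{1\beta}^*, \alpha_{1\beta}) + w^*\text{-}\lim_\beta (x_{2\beta}^*, \alpha_{2\beta})$$
$$\in bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* + bw^*\text{-}\limsup_{v \to w} \operatorname{epi} g_v^*$$
$$\subseteq \operatorname{epi} f_w^* + \operatorname{epi} g_w^* \quad \text{(from the slice convergence } f_v \to f_w,\ g_v \to g_w)$$
$$\subseteq \operatorname{epi} f_w^* \Box g_w^*.$$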

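This completes the proof for the case where $X_v \supseteq \operatorname{dom} f_v \cup \operatorname{dom} g_v$ for all $v$.
For the general case, let $\rho > \inf_X f_w$. Then, $v \mapsto \{f_v \le \rho\}$ is norm-l.s.c.
at $w$ since (see [Bee93]) $\{f_v \le \rho\}$ slice converges to $\{f_w \le \rho\}$ as $v \to w$.
Thus on choosing some $x_w \in \{f_w \le \rho\}$, we have some $x_v \in \{f_v \le \rho\}$ with
$x_v$ strongly convergent to $x_w$, as $v \to w$. Place $\tilde f_v := f_v(x_v + \cdot)$ and $\tilde g_v :=
g_v(x_v + \cdot)$. By Lemma 3, $\tilde f_v$ and $\tilde g_v$ slice converge to $\tilde f_w$ and $\tilde g_w$ respectively.
Also, $0 \in \operatorname{dom} \tilde f_v$, whence $X_v$ contains both $\operatorname{dom} \tilde f_v$ and $\operatorname{dom} \tilde g_v$, with $X_v =
\overline{\operatorname{span}}(\operatorname{dom} \tilde f_v - \operatorname{dom} \tilde g_v)$. The form of the conditions in the theorem statement
is not altered by passing from $f_v$, $g_v$ to $\tilde f_v$, $\tilde g_v$, the only change being an
increase in the value of $\rho$ in the interiority condition. Thus we obtain the slice
convergence $\tilde f_v + \tilde g_v \to \tilde f_w + \tilde g_w$. Translating the sum by $-x_v$, Lemma 3 yields
the convergence

$$f_v + g_v = (\tilde f_v + \tilde g_v)(\cdot - x_v) \to (\tilde f_w + \tilde g_w)(\cdot - x_w) = f_w + g_w. \qquad \Box$$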

It is well known (see, for instance, [AR96]) that results for sums, such as
Theorem 1, imply convergence results for restrictions $F(\cdot, 0)$ of bivariate functions
on product spaces $X \times Y$ (just apply a sum theorem to the combination
$F + \delta_{X \times \{0\}}$) and that such results may be used to extend sum theorems to
include an operator, that is, yield convergence of functions of the form $f + g \circ T$
where $T : X \to Y$ is a bounded linear operator. As discussed in [Zal03], convergence
theorems for $F(\cdot, 0)$ may be used to derive theorems not only for
sums, but also for other combinations of functions, such as $\max(f, g \circ T)$, and
so, in a sense, results for sums are equivalent to results for sums with operator
and equivalent to results on restrictions of bivariate functions. Thus, it is a
matter of taste, or the intended application, that will dictate the choice of
primary form to be considered.
We now use Theorem 1 to obtain a convergence theorem for restrictions of
functions on product (Banach) spaces (cf. [Zal03, Prop. 13] for the normed-space
version).

Corollary 1. Let $X$ and $Y$ be Banach spaces, let $F_v$ ($v \in W$) be in $\Gamma(X \times Y)$
with $F_v$ slice convergent to $F_w$. Assume that $0 \in P_Y(\operatorname{dom} F_v)$ for all $v$, and,
moreover, that there are $\delta > 0$, $\rho > 0$ and neighborhood $V$ of $w$ such that for
all $v \in V$ (with $v \ne w$)

$$B_\delta \cap Y_v \subseteq P_Y(\{F_v \le \rho\} \cap B_\rho^{X \times Y}),$$

where $Y_v := \overline{\operatorname{span}}(P_Y(\operatorname{dom} F_v)) \subseteq Y$, and that $h : X^* \to \overline{\mathbb{R}}$ given by
$h(x^*) = \inf_{y^* \in Y^*} F_w^*(x^*, y^*)$ satisfies $h = \overline{h}^{w^*}$. Then $F_v(\cdot, 0) \to F_w(\cdot, 0)$ in
$\Gamma(X)$.
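Proof. Note that since $h^* = F_w(\cdot, 0) \in \Gamma(X)$, it follows that $h^{**}$, and therefore
$\overline{h}^{w^*}$, is in $\Gamma^*(X^*)$. Place $G_v = G := \delta_{X \times \{0\}} \in \Gamma(X \times Y)$, where $\delta_{X \times \{0\}}$
denotes the indicator function of $X \times \{0\}$. We shall apply Theorem 1 to $F_v$ and
$G_v$, so we check its hypotheses.
Since $\{G_v \le \rho\} = X \times \{0\}$ for any $\rho > 0$, we have $\{F_v \le \rho\} \cap B_\rho^{X \times Y} - \{G_v \le
\rho\} = X \times P_Y(\{F_v \le \rho\} \cap B_\rho^{X \times Y})$ and $\operatorname{dom} F_v - \operatorname{dom} G_v = X \times P_Y(\operatorname{dom} F_v)$,
whence $Z_v := \overline{\operatorname{span}}(\operatorname{dom} F_v - \operatorname{dom} G_v) = X \times Y_v$, implying

$$B_\delta^{X \times Y} \cap Z_v = B_\delta^X \times (B_\delta \cap Y_v) \subseteq X \times (B_\delta \cap Y_v)
\subseteq X \times P_Y(\{F_v \le \rho\} \cap B_\rho^{X \times Y})
= \{F_v \le \rho\} \cap B_\rho^{X \times Y} - \{G_v \le \rho\}$$

for all $v \in V \setminus \{w\}$. Moreover, since $(F_w^* \Box G_w^*)(x^*, y^*) = h(x^*)$ for $x^* \in X^*$,
$y^* \in Y^*$, we see that properness of $h = \overline{h}^{w^*}$ implies that $F_w^* \Box G_w^* = \overline{F_w^* \Box G_w^*}^{w^*}$
and is proper. Thus, the conditions of Theorem 1 hold, from which follows
the slice convergence of $F_v + \delta_{X \times \{0\}}$ to $F_w + \delta_{X \times \{0\}}$ in $\Gamma(X \times Y)$, which in
turn implies that $F_v(\cdot, 0) \to F_w(\cdot, 0)$ in $\Gamma(X)$. □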

We can use Corollary 1 to obtain a version of Theorem 1 "with an operator".
We start with an elementary lemma whose proof will be omitted. (This
lemma is also a consequence of Lemmas 19, 20 of [Zal03].)
Lemma 8. Let $f_v \to f_w$ and $g_v \to g_w$ be slice convergent in $\Gamma(X)$ and $\Gamma(Y)$
respectively, and let $T_v \to T_w$ be a norm-convergent family of continuous
linear operators mapping $X$ into $Y$. Place $F_v(x, y) = f_v(x) + g_v(T_v x + y)$ for
$(x, y) \in X \times Y$. Then $F_v$ slice converges to $F_w$ in $\Gamma(X \times Y)$.
The next result now extends Theorem 1.

Corollary 2. Let $X$ and $Y$ be Banach spaces. Let $f_v \to f_w$ and $g_v \to g_w$
under slice convergence in $\Gamma(X)$ and $\Gamma(Y)$ respectively, and let $T_v : X \to Y$
be continuous linear operators with $T_v \to T_w$ in operator norm. Assume that
there exist a neighborhood $V$ of $w$, and $\delta > 0$, $\rho > 0$ such that

$$\forall v \in V \setminus \{w\}: \quad B_\delta \cap Y_v \subseteq T_v(\{f_v \le \rho\} \cap B_\rho) - \{g_v \le \rho\} \tag{6}$$

where $Y_v = \overline{\operatorname{span}}(\operatorname{dom} g_v - T_v \operatorname{dom} f_v)$. Assume further that $h : X^* \to \overline{\mathbb{R}}$
defined by $h(x^*) := \inf_{y^* \in Y^*} (f_w^*(x^* - T_w^* y^*) + g_w^*(y^*))$ satisfies $h = \overline{h}^{w^*}$.
Then $f_v + g_v \circ T_v$ slice converges to $f_w + g_w \circ T_w$.

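Proof. Place $F_v(x, y) := f_v(x) + g_v(T_v x + y)$. Then $F_v$ slice converges to $F_w$
by Lemma 8. It is easily seen that $Y_v = \overline{\operatorname{span}}(P_Y \operatorname{dom} F_v)$. If $y \in B_\delta \cap Y_v$,
then $y = y_1 - T_v x$ with $g_v(y_1) \le \rho$, $\|y_1\| \le \rho'$ (for some $\rho' > \rho$), $f_v(x) \le \rho$,
$\|x\| \le \rho$, so $\|(x, y)\| \le \rho + \rho'$, $F_v(x, y) = f_v(x) + g_v(y_1) \le \rho + \rho'$, thus yielding
that $y \in P_Y(\{F_v \le \rho + \rho'\} \cap B_{\rho + \rho'}^{X \times Y})$. Thus, there is a $\bar\rho > 0$ such that
$B_\delta \cap Y_v \subseteq P_Y(\{F_v \le \bar\rho\} \cap B_{\bar\rho}^{X \times Y})$ for all $v \in V \setminus \{w\}$. We may then apply
Corollary 1 to obtain $f_v + g_v \circ T_v = F_v(\cdot, 0) \to F_w(\cdot, 0) = f_w + g_w \circ T_w$. □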
Remark 3. The condition $0 \in \operatorname{sqri}(T_w \operatorname{dom} f_w - \operatorname{dom} g_w)$ can be shown to
be equivalent to assuming the condition in (6) to hold at $v = w$. If this
is assumed, then there follows, by a standard Fenchel duality result, that
$h = (f_w + g_w \circ T_w)^*$, so $h$ is weak*-closed and hence $h = \overline{h}^{w^*}$. Indeed, writing
$f$, $g$, $T$ for $f_w$, $g_w$, $T_w$,
$$h(x^*) = -\sup_{y^*} \big[-(f - x^*)^*(-T^* y^*) - g^*(y^*)\big]
= -\inf_{x} (f - x^* + g \circ T)(x) = \sup_{x} \big[\langle x, x^* \rangle - (f + g \circ T)(x)\big]
= (f + g \circ T)^*(x^*).$$
Alternately, we may deduce the above by using [Zal03, Lemmas 15, 16] with
$F(x, y) := f_w(x) + g_w(T_w x + y)$.
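The identity $h = (f + g \circ T)^*$ from Remark 3 can be checked numerically in a one-dimensional sketch. Assume $X = Y = \mathbb{R}$, $f(x) = x^2/2$, $g(y) = y^2/2$ and $T x = 2x$, so that $T^* = T$, $f^*(s) = g^*(s) = s^2/2$ and $(f + g \circ T)^*(s) = s^2 / (2(1 + T^2))$. These functions, the grid, and the helper names are illustrative assumptions, not part of the text. The infimum defining $h$ is approximated by a minimum over a finite grid.

```python
T = 2.0  # scalar operator x -> T*x (assumption); its adjoint is the same scalar


def f_conj(s):
    """Conjugate of f(x) = x^2 / 2."""
    return 0.5 * s * s


def g_conj(s):
    """Conjugate of g(y) = y^2 / 2."""
    return 0.5 * s * s


def h(x_star, y_grid):
    """Grid approximation of h(x*) = inf_{y*} [ f*(x* - T* y*) + g*(y*) ]."""
    return min(f_conj(x_star - T * y) + g_conj(y) for y in y_grid)


def sum_conjugate(x_star):
    """Closed form of (f + g o T)*(x*), since f + g o T = (1 + T^2) x^2 / 2."""
    return x_star * x_star / (2.0 * (1.0 + T * T))


y_grid = [i / 1000 for i in range(-5000, 5001)]
test_points = [-2.0, -1.0, 0.0, 1.0, 2.5]

# Largest discrepancy between the two sides of h = (f + g o T)* at the test points.
gap = max(abs(h(s, y_grid) - sum_conjugate(s)) for s in test_points)
```

The test points are chosen so that the exact minimizer $y^* = T x^* / (1 + T^2)$ lies on the grid, which keeps `gap` at rounding level.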
In [Zal03] a number of qualification conditions are framed in the dual
spaces. We consider some related results next.
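Proposition 3. Let $X$ be a normed linear space and $\{f_v\}_{v \in W}$ and $\{g_v\}_{v \in W}$ be slice
convergent families in $\Gamma(X)$ convergent to $f_w$ and $g_w$, respectively, with $f_v \Box g_v$
proper for all $v$. Suppose in addition that for $F_v(x, y) := f_v(y) + g_v(x - y)$ we
have

$$\forall \rho > 0,\ \exists \bar\rho > 0,\ \exists V_\rho \in \mathcal{N}(w),\ \forall v \in V_\rho,\ \forall s < \rho:$$
$$\{f_v \Box g_v < s\} \cap B_\rho \subseteq P_X(\{F_v < s\} \cap (X \times B_{\bar\rho})). \tag{7}$$

Then $\{f_v \Box g_v\}_{v \in W}$ is slice convergent to $f_w \Box g_w$ as $v \to w$.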


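Proof. It is straightforward to deduce that $v \mapsto \operatorname{epi} f_v \Box g_v$ is strongly lower-semicontinuous
at $v = w$ (see the opening paragraph of the proof of [WE99,
Thm. 4.3]). For the upper slice convergence take

$$\eta > (f_w \Box g_w)^*(x^*) = f_w^*(x^*) + g_w^*(x^*) \quad (\text{so } (x^*, 0, \eta) \in \operatorname{epi}_s F_w^*) \tag{8}$$

and $v_\alpha \to w$. Let $\{x_\alpha\}$ be bounded. Place $\rho = \sup_\alpha \{\|x^*\| \|x_\alpha\| - \eta, \|x_\alpha\|\}$. If
$f_{v_\alpha} \Box g_{v_\alpha}(x_\alpha) > \rho$ we immediately have

$$f_{v_\alpha} \Box g_{v_\alpha}(x_\alpha) > \langle x^*, x_\alpha \rangle - \eta,$$

whence, without losing generality we may assume $f_{v_\alpha} \Box g_{v_\alpha}(x_\alpha) < \rho$. By redefining
the index set for the above net if necessary, we may assert the existence
of a net $\varepsilon_\alpha > 0$ tending to zero, such that $s_\alpha := f_{v_\alpha} \Box g_{v_\alpha}(x_\alpha) + \varepsilon_\alpha < \rho$.
Then we have $x_\alpha \in \{f_{v_\alpha} \Box g_{v_\alpha} < s_\alpha\} \cap B_\rho$, implying (by (7)) the existence of
$\|y_\alpha\| \le \bar\rho$ with $F_{v_\alpha}(x_\alpha, y_\alpha) < s_\alpha$. As noted earlier we always have $\{F_v\}_{v \in W}$
slice convergent to $F_w$. Also note that a simple calculation shows

$$F_w^*(x^*, y^*) = f_w^*(x^* + y^*) + g_w^*(x^*).$$

Thus (8) and the (upper) slice convergence of $\{F_v\}_{v \in W}$ (recalling that
$F_w^*(x^*, 0) < \eta$) implies

$$f_{v_\alpha} \Box g_{v_\alpha}(x_\alpha) + \varepsilon_\alpha > F_{v_\alpha}(x_\alpha, y_\alpha) > \langle (x_\alpha, y_\alpha), (x^*, 0) \rangle - \eta = \langle x_\alpha, x^* \rangle - \eta \quad \text{eventually}.$$

Since $\varepsilon_\alpha \to 0$ we arrive at the desired conclusion. □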


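Now consider $f_v^* \Box g_v^*$ in the dual space $X^*$. We use the projection $P_{X^* \times \mathbb{R}} :
(x^*, y^*, \beta) \mapsto (x^*, \beta)$.
Lemma 9. Let $X$ and $Y$ be normed linear spaces and $\{F_v\}_{v \in W} \subseteq \Gamma(X \times Y)$,
with $0 \in P_Y \operatorname{dom} F_v$ for all $v$. Place $h_v(x^*) = \inf_{y^* \in Y^*} F_v^*(x^*, y^*)$ and assume
that $\{F_v\}_{v \in W}$ slice converges to $F_w$ along with
$$\forall \rho > 0,\ \exists \bar\rho > 0,\ \exists V_\rho \in \mathcal{N}(w),\ \forall v \in V_\rho:$$
$$\operatorname{epi}_s h_v^{**} \cap B_\rho^* \subseteq P_{X^* \times \mathbb{R}}(\operatorname{epi}_s F_v^* \cap (X^* \times B_{\bar\rho}^* \times \mathbb{R})) \tag{9}$$

and also that the norm- and weak*-closures of $h_w$ coincide. Then $h_v^{**}$ (dual)
slice converges to $h_w^{**} = \overline{h_w}$ and $\{F_v(\cdot, 0)\}_{v \in W}$ slice converges to $F_w(\cdot, 0)$.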

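Proof. First we show that the multi-function $v \mapsto \operatorname{epi} h_v^{**}$ is bounded-weak*
upper-semicontinuous. Let $v_\alpha \to w$ be taken so that $(x_{v_\alpha}^*, \beta_{v_\alpha}) \in
\operatorname{epi} h_{v_\alpha}^{**}$ weak* converge to $(x^*, \beta)$ and $\|(x_{v_\alpha}^*, \beta_{v_\alpha})\|$ is bounded. Then we have
$(x_{v_\alpha}^*, \beta_{v_\alpha} + \varepsilon_\alpha) \in \operatorname{epi}_s h_{v_\alpha}^{**}$ for any positive net $\varepsilon_\alpha \to 0$.
Then take $\rho = \max_\alpha \{\|(x_{v_\alpha}^*, \beta_{v_\alpha} + \varepsilon_\alpha)\|\}$ and apply (9) to deduce the existence
of $y_{v_\alpha}^* \in Y^*$ such that $\|y_{v_\alpha}^*\| \le \bar\rho$ and $(x_{v_\alpha}^*, y_{v_\alpha}^*, \beta_{v_\alpha} + \varepsilon_\alpha) \in \operatorname{epi}_s F_{v_\alpha}^*$.
Taking a weak* convergent subnet if necessary (and on reparametrizing) we
may assume that $(x_{v_\alpha}^*, y_{v_\alpha}^*, \beta_{v_\alpha} + \varepsilon_\alpha) \to^{w^*} (x^*, y^*, \beta)$. Since $\{F_v\}_{v \in W}$ is slice
convergent so is $\{F_v^*\}_{v \in W}$ and hence $\{\operatorname{epi} F_v^*\}_{v \in W}$ is bounded weak* upper
semi-continuous. Hence we have $(x^*, y^*, \beta) \in \operatorname{epi} F_w^*$ implying $(x^*, \beta) \in
P_{X^* \times \mathbb{R}}(\operatorname{epi} F_w^*) \subseteq \operatorname{epi} h_w^{**}$.
The slice convergence of $\{F_v^*\}_{v \in W}$ has been observed to follow from
that of $\{F_v\}_{v \in W}$. Thus $v \mapsto \operatorname{epi} F_v^*$ is strongly lower semi-continuous. Next
note that for any open set $O \subseteq X^* \times \mathbb{R}$ we have $P_{X^* \times \mathbb{R}}^{-1}(O) = O \times Y^*$
and so $P_{X^* \times \mathbb{R}}(\operatorname{epi} F_v^* \cap (O \times Y^*)) = P_{X^* \times \mathbb{R}}(\operatorname{epi} F_v^*) \cap O$. Hence $\{v \in W \mid
P_{X^* \times \mathbb{R}}(\operatorname{epi} F_v^*) \cap O \ne \emptyset\} = \{v \in W \mid P_{X^* \times \mathbb{R}}(\operatorname{epi} F_v^* \cap (O \times Y^*)) \ne \emptyset\}$ which
clearly coincides with the open set $\{v \in W \mid \operatorname{epi} F_v^* \cap (O \times Y^*) \ne \emptyset\}$ implying
strong lower semi-continuity.
Finally note that $h_w^{**} = (F_w(\cdot, 0))^*$ and hence slice convergence of
$\{F_v(\cdot, 0)\}_{v \in W}$
follows from the bicontinuity of Fenchel conjugation. □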

This last result could be used to deduce the next result but instead we
prefer to use a direct argument along the lines of argument in Theorem 1.

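Theorem 2. Let $X$ be a normed linear space, let $f_v$ and $g_v$ ($v \in W$) be
in $\Gamma(X)$, with the slice convergence $f_v \to f_w$ and $g_v \to g_w$ and $\operatorname{dom} f_v \cap
\operatorname{dom} g_v \ne \emptyset$ for all $v$. Also, assume that $f_w^* \Box g_w^*$ is proper and weak* lower-semicontinuous,
$F_v(x, y) := f_v(x) + g_v(x + y)$ and

$$\forall \rho > 0,\ \exists \bar\rho > 0,\ \exists V_\rho \in \mathcal{N}(w),\ \forall v \in V_\rho,\ \forall s < \rho:$$
$$\{f_v^* \Box g_v^* < s\} \cap B_\rho^* \subseteq \overline{P_{X^*}(\{F_v^* < s\} \cap (X^* \times B_{\bar\rho}^*))}^{w^*}. \tag{10}$$

Then $f_v + g_v$ slice converges to $f_w + g_w$.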

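Proof. As noted earlier, the strong lower-semicontinuity of $v \mapsto \operatorname{epi} f_v^* \Box g_v^*$
at $v = w$ follows straightforwardly. For the other half of the convergence, let

$$(x^*, \alpha) \in bw^*\text{-}\limsup_{v \to w} \operatorname{epi} f_v^* \Box g_v^*.$$

Then there are nets $v_\beta \to w$, $(x_\beta^*, \alpha_\beta) \to^{w^*} (x^*, \alpha)$, and $\rho > 0$ with $(x_\beta^*, \alpha_\beta) \in
B_\rho^* \cap \operatorname{epi}_s f_{v_\beta}^* \Box g_{v_\beta}^*$ for all $\beta$. By use of (10) we obtain a bounded net $\|y_\beta^*\| \le \bar\rho$
such that $\alpha_\beta > F_{v_\beta}^*(x_\beta^*, y_\beta^*) = f_{v_\beta}^*(x_\beta^* - y_\beta^*) + g_{v_\beta}^*(y_\beta^*)$ and we may now
argue as in the final part of the proof of [WE99, Theorem 4.3] to deduce that
$(x^*, \alpha) \in \operatorname{epi} f_w^* \Box g_w^*$. □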

We note that one could have framed a qualification assumption based on
the assumption that $f_v^* \Box g_v^* = \overline{f_v^* \Box g_v^*}^{w^*}$ for all $v \in W$ and the assumption of
(10) without the weak star closure on the right hand side. A similar proof as
above then obtains essentially [Zal03, Prop. 28].
We close this section with the observation that the argument of Corollary 1
also permits the deduction of epi-distance convergence results for perturbation
functions from those for sums. (See, for example, [EW00] for detail on epi-distance
convergence.)
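Proposition 4. Let $X$ and $Y$ be Banach, and $F_n$, $F$ in $\Gamma(X \times Y)$ with
$F_n \to F$ in epi-distance. Assume that $0 \in \operatorname{sqri}(P_Y \operatorname{dom} F)$, and that $Y_0 :=
\operatorname{span}(P_Y \operatorname{dom} F)$ has closed algebraic complement $Y_0'$ for which $Y_0' \cap Y_n = \{0\}$
eventually (where $Y_n := \operatorname{span}(P_Y \operatorname{dom} F_n)$). Then $F_n(\cdot, 0)$ epi-distance converges
to $F(\cdot, 0)$.

Proof. As $0 \in \operatorname{sqri}(P_Y \operatorname{dom} F)$ we have $\operatorname{cone}(P_Y \operatorname{dom} F) = \operatorname{span}(P_Y \operatorname{dom} F)$.
Place $G_n = G := \delta_{X \times \{0\}}$. We apply [EW00, Thm 4.9] to $F_n$, $F$, $G_n$, $G$.
Place $Z_n = \operatorname{span}(\operatorname{dom} F_n - \operatorname{dom} G_n)$ and $Z_0 = \operatorname{span}(\operatorname{dom} F - \operatorname{dom} G)$. Since
$\operatorname{dom} F - \operatorname{dom} G = X \times P_Y \operatorname{dom} F$, we have $Z_0 = X \times Y_0$ and $Z_n = X \times Y_n$,
with
$$\operatorname{cone}(\operatorname{dom} F - \operatorname{dom} G) = \operatorname{cone}(X \times P_Y \operatorname{dom} F) = X \times Y_0$$
therefore being closed in $X \times Y$. Also $Z_0$ has closed complement $Z_0' := \{0\} \times Y_0'$,
and
$$Z_n \cap Z_0' = (X \times Y_n) \cap (\{0\} \times Y_0') = \{0\} \times (Y_n \cap Y_0') = \{0\}.$$
Hence the hypotheses of [EW00, Thm 4.9] are satisfied, yielding the epi-distance convergence

$$F_n + \delta_{X \times \{0\}} \to F + \delta_{X \times \{0\}},$$

or equivalently, $F_n(\cdot, 0) \to F(\cdot, 0)$. □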

4 Saddle-point Convergence in Fenchel Duality


When discussing saddle point convergence we are necessarily led to the
study of equivalence classes of saddle-functions which are uniquely associated
with concave or convex parents (depending on which variable is partially
conjugated). We direct the reader to the excellent texts of Rockafellar
[Roc70, Roc74] for a detailed treatment of this phenomenon. The following is
taken from [AAW88] from which we adapt results and proofs.

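Definition 10. Suppose that $(X, \tau)$ and $(Y, \sigma)$ are two topological spaces and
$\{K^n : X \times Y \to \overline{\mathbb{R}},\ n \in \mathbb{N}\}$ is a sequence of bivariate functions. Define:

$$e_\tau/h_\sigma\text{-ls}\, K^n(x, y) = \sup_{y_n \to^{\sigma} y}\ \inf_{x_n \to^{\tau} x}\ \limsup_{n \to \infty} K^n(x_n, y_n)$$

$$h_\sigma/e_\tau\text{-li}\, K^n(x, y) = \inf_{x_n \to^{\tau} x}\ \sup_{y_n \to^{\sigma} y}\ \liminf_{n \to \infty} K^n(x_n, y_n).$$

Definition 11. Suppose that $(X, \tau)$ and $(Y, \sigma)$ are two topological spaces and
$\{K^n : X \times Y \to \overline{\mathbb{R}},\ n \in \mathbb{N}\}$ is a sequence of bivariate functions.
1. We say that they epi/hypo-converge in the extended sense to a function
$K : X \times Y \to \overline{\mathbb{R}}$ if

$$\underline{\operatorname{cl}}_x(e_\tau/h_\sigma\text{-ls}\, K^n) \le K \le \overline{\operatorname{cl}}^y(h_\sigma/e_\tau\text{-li}\, K^n)$$

where $\underline{\operatorname{cl}}_x$ denotes the extended lower closure with respect to $x$ (and therefore
w.r.t. $\tau$) for fixed $y$ and $\overline{\operatorname{cl}}^y$ denotes the extended upper closure with
respect to $y$ (and therefore w.r.t. $\sigma$) for fixed $x$. Note that by definition,
$\overline{\operatorname{cl}} f := -\underline{\operatorname{cl}}(-f)$.
2. A point $(\bar x, \bar y)$ is a saddle-point of a bivariate function $K : X \times Y \to \overline{\mathbb{R}}$ if
for all $(x, y) \in X \times Y$ we have $K(\bar x, y) \le K(\bar x, \bar y) \le K(x, \bar y)$.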

The interest in this kind of convergence stems from the following result
(see [AAW88, Thm 2.4]).

Proposition 5. Let us assume that $\{K^n, K : (X, \tau) \times (Y, \sigma) \to \overline{\mathbb{R}},\ n \in \mathbb{N}\}$
are such that they epi/hypo-converge in the extended sense. Assume also that
$(\bar x_k, \bar y_k^*)$ are saddle points of $K^{n_k}$ for all $k$ and $\{n_k\}$ is an increasing sequence
of integers, such that $\bar x_k \to^{\tau} \bar x$ and $\bar y_k^* \to^{\sigma} \bar y^*$. Then $(\bar x, \bar y^*)$ is a saddle point
of $K$ and
$$K(\bar x, \bar y^*) = \lim_{k \to \infty} K^{n_k}(\bar x_k, \bar y_k^*).$$

The next result from [AAW88] uses sequential forms of the epi-limit functions,
as per the following
Definition 12. [AAW88, p 541] Let $(X, \tau)$ be topological, $f_n : X \to \overline{\mathbb{R}}$. Then

$$(\tau\text{-seq-e-ls}_{n \to \infty} f_n)(x) := \inf_{x_n \to^{\tau} x}\ \limsup_{n \to \infty} f_n(x_n)$$

$$(\tau\text{-seq-e-li}_{n \to \infty} f_n)(x) := \inf_{x_n \to^{\tau} x}\ \liminf_{n \to \infty} f_n(x_n)$$

It can be shown that these reduce to the usual (topologically defined) forms
if $(X, \tau)$ is first-countable, and that the above infima are achieved. We will
need these alternate forms, for generally weak topologies on normed spaces
are not first-countable.
Definition 13. Let $(X, \tau)$ and $(X^*, \tau^*)$ be topological vector spaces. We shall
say they are paired if there is a bilinear map $\langle \cdot, \cdot \rangle : X \times X^* \to \mathbb{R}$ such that
the maps $x^* \mapsto \langle \cdot, x^* \rangle$ and $x \mapsto \langle x, \cdot \rangle$ are (algebraic) isomorphisms such that
$X^* \cong (X, \tau)^*$ and $X \cong (X^*, \tau^*)^*$ respectively.
It is readily checked that if $(X, \tau)$ and $(X^*, \tau^*)$ are paired, and so are
$(Y, \sigma)$ and $(Y^*, \sigma^*)$, then $(X \times Y, \tau \times \sigma)$ is paired with $(X^* \times Y^*, \tau^* \times \sigma^*)$,
with the pairing
$$\langle (x, y), (x^*, y^*) \rangle = \langle x, x^* \rangle + \langle y, y^* \rangle,$$
and similarly for other combinations of product spaces.
For any convex-concave saddle function $K : X \times Y^* \to \overline{\mathbb{R}}$, that is, where
$K$ is convex in the first argument and concave in the second, we may associate
a convex and concave parent. These play a fundamental role in convex duality
(see [Roc74]). These are defined respectively as:

$$F(x, y) = \sup_{y^* \in Y^*} [K(x, y^*) + \langle y, y^* \rangle]$$
$$G(x^*, y^*) = \inf_{x \in X} [K(x, y^*) - \langle x, x^* \rangle].$$
Subject to suitable closure properties on $K$, it follows that $G = -F^*$,
and that $K$ is a saddle function for the dual pair of optimization problems
$\inf_X F(\cdot, 0)$ and $\sup_{Y^*} G(0, \cdot)$. One may also proceed in reverse, and show that
for any closed convex function $F : X \times Y \to \overline{\mathbb{R}}$, if $G := -F^*$ relative to the
natural pairing of $X \times Y$ with $X^* \times Y^*$ (these yielding the primal objective
$F(\cdot, 0)$ and dual objective $G(0, \cdot)$), we have an interval of saddle functions, all
equivalent in the sense that they possess the same saddle points, given by

$$[\underline{K}, \overline{K}] := \{K : X \times Y^* \to \overline{\mathbb{R}} \mid K \text{ convex-concave},\ \underline{K} \le K \le \overline{K} \text{ on } X \times Y^*\},$$

where

$$\underline{K}(x, y^*) = \sup_{x^* \in X^*} [G(x^*, y^*) + \langle x, x^* \rangle]$$
$$\overline{K}(x, y^*) = \inf_{y \in Y} [F(x, y) - \langle y, y^* \rangle].$$

Our focus will be on the Fenchel duality, where given the primal problem
$\inf_X (f + g)$, we form $F(x, y) := f(x) + g(x + y)$, so that $G(x^*, y^*) =
-f^*(x^* - y^*) - g^*(y^*)$ and the Fenchel dual takes the form $\sup_{y^* \in X^*} G(0, y^*) =
\sup_{y^* \in X^*} \big(-f^*(-y^*) - g^*(y^*)\big)$ (cf. (12) below). Also, any $K \in [\underline{K}, \overline{K}]$ is a suitable
saddle function for the Fenchel primal/dual pair and we shall use $K := \overline{K}$
in what follows.
The following result is taken from [AAW88] and requires no additional
assumption.
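Proposition 6. Let $(X, \tau)$, $(X^*, \tau^*)$ and $(Y, \sigma)$, $(Y^*, \sigma^*)$ be paired topological
vector spaces, with the pairings sequentially continuous; let $\{F^n, F : X \times Y \to
\overline{\mathbb{R}},\ n \in \mathbb{N}\}$ be a family of bivariate $(\tau \times \sigma)$-closed convex functions. Then,
if $K^n$, $K$ are members of the corresponding equivalence classes of bivariate
convex-concave saddle functions,

1. $(\tau \times \sigma)\text{-seq-e-ls}_{n \to \infty} F^n \le F$ on $X \times Y$
   implies $\underline{\operatorname{cl}}_x(e_\tau/h_{\sigma^*}\text{-ls}\, K^n) \le \underline{K}$;

2. $(\tau^* \times \sigma^*)\text{-seq-e-ls}_{n \to \infty} (F^n)^* \le F^*$ on $X^* \times Y^*$
   implies $\overline{K} \le \overline{\operatorname{cl}}^{y^*}(h_{\sigma^*}/e_\tau\text{-li}\, K^n)$.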

Proposition 7. Suppose that $X$ is a Banach space and let

$$\{f_n, f\}_{n=1}^{\infty} \quad \text{and} \quad \{g_n, g\}_{n=1}^{\infty}$$

be two families of proper closed, convex extended-real-valued functions, with $f_n$ and $g_n$
slice-convergent to $f$ and $g$, respectively. Then

$$K^n(x, y^*) = \inf_{y \in X} [f_n(x) + g_n(x + y) - \langle y, y^* \rangle]$$

epi/hypo-converges (in the extended sense) to

$$K(x, y^*) = \inf_{y \in X} [f(x) + g(x + y) - \langle y, y^* \rangle]$$

with respect to the strong topology on $X$ and the weak* topology on $X^*$.

Proof. From the slice convergence of $f_n$ and $g_n$ it is an elementary exercise to
show that $F^n(x, y) := f_n(x) + g_n(x + y)$ is slice-convergent to $F(x, y) =
f(x) + g(x + y)$. From the bicontinuity of conjugation with respect to slice
convergence follows the dual slice convergence $(F^n)^* \to F^*$. From the resulting
strong epi-upper-semicontinuity for the $F^n$ and $(F^n)^*$ on $X \times X$ and $X^* \times X^*$
respectively,

$$F \ge (s \times s)\text{-e-ls}_{n \to \infty} F^n = (s \times s)\text{-seq-e-ls}_{n \to \infty} F^n \quad \text{and}$$
$$F^* \ge (s^* \times s^*)\text{-e-ls}_{n \to \infty} (F^n)^* = (s^* \times s^*)\text{-seq-e-ls}_{n \to \infty} (F^n)^*
\ge (w^* \times w^*)\text{-seq-e-ls}_{n \to \infty} (F^n)^*,$$

where $s$ and $s^*$ stand for the respective norm topologies on $X$ and $X^*$. Now
apply Proposition 6. □

We note the following for later reference. For $u \in X$, write

$$\varphi(u) := \inf_{x\in X}\{f(x) + g(x+u)\} = (\check f \,\square\, g)(u), \tag{11}$$

and similarly for $\varphi_n$, where for any function $\psi$, $\check\psi(x) := \psi(-x)$. Note that $\mathrm{dom}\,\varphi = \mathrm{dom}\, g - \mathrm{dom}\, f$ and similarly for $\varphi_n$. The operation $\psi \mapsto \check\psi$ commutes with conjugation and with slice limits, the verification of this being an elementary exercise. From [Roc74] we have the following: calling $\inf_X (f+g)$ the primal problem and $\inf_X (f_n + g_n)$ the approximate problems, $-\varphi^*$ and $-\varphi_n^*$ are the associated dual objective functionals, and:

$(\bar x, \bar y^*)$ is a saddle-point of $K$ iff

$$\varphi(0) = (f+g)(\bar x) = \inf_X (f+g) = \sup_{X^*}(-\varphi^*) = -\varphi^*(\bar y^*),$$

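and similarly for $\varphi_n$ and the saddle-points $(x_n, y_n^*)$ of $K^n$. On taking conjugates of $\varphi_n$ we obtain

$$\varphi_n^* = (\check f_n \,\square\, g_n)^* = (\check f_n)^* + g_n^* = (f_n^*)^{\vee} + g_n^*,$$

and so the dual problem becomes

$$\sup_{X^*}(-\varphi_n^*) = -(f_n^* \,\square\, g_n^*)(0) = -\inf_{y^*\in X^*}\big(f_n^*(-y^*) + g_n^*(y^*)\big). \tag{12}$$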
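The primal/dual pair in (12) can be checked numerically in a toy setting. The sketch below is only an illustration under stated assumptions: it takes $X = \mathbb{R}$, the hypothetical functions $f(x) = x^2/2$ and $g(x) = (x-1)^2/2$ (neither appears in the text), their closed-form conjugates, and a finite grid in place of the infimum and supremum.

```python
# Toy check of inf_X (f + g) = sup_{y*} { -f*(-y*) - g*(y*) } on X = R.
# f, g and the grid are illustrative assumptions, not taken from the text.

def f(x):
    return 0.5 * x * x

def g(x):
    return 0.5 * (x - 1.0) ** 2

def f_conj(y):
    # Conjugate of x^2/2 is y^2/2.
    return 0.5 * y * y

def g_conj(y):
    # Conjugate of (x-1)^2/2 is y^2/2 + y.
    return 0.5 * y * y + y

def grid(lo, hi, n):
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]

def primal_value(points):
    # Grid approximation of inf_X (f + g).
    return min(f(x) + g(x) for x in points)

def dual_value(points):
    # Grid approximation of sup_{y*} { -f*(-y*) - g*(y*) }.
    return max(-f_conj(-y) - g_conj(y) for y in points)

POINTS = grid(-3.0, 3.0, 6000)
PRIMAL = primal_value(POINTS)
DUAL = dual_value(POINTS)
```

For this particular pair both values agree (up to grid error) with $1/4$, attained at $x = 1/2$ and $y^* = -1/2$.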
The next result tackles the problem of finding convergent sequences of dual
variables. (Note that Proposition 5 makes no claim about such existence).
Corollary 3. Suppose that $X$ is a separable Banach space and that $\{f_n, f\}_{n=1}^{\infty}$ and $\{g_n, g\}_{n=1}^{\infty}$ are two families of proper closed convex extended-real-valued functions, with $f_n$ and $g_n$ slice-convergent to $f$ and $g$ respectively. Let $K^n$, $K$ be the associated saddle-functions as in Proposition 7. Assume also the following:

1. There exist $\delta > 0$, $\rho > 0$ such that for all large $n \in \mathbb{N}$,

$$B_\delta \cap M_n \subseteq \{f_n \le \rho\}\cap B_\rho - \{g_n \le \rho\}\cap B_\rho,$$

where $M_n := \mathrm{span}\,(\mathrm{dom} f_n - \mathrm{dom}\, g_n)$.

2. $f^* \,\square\, g^*$ is proper and w*-lsc.

Then if $(x_n, y_n^*)$ are saddle-points of $K^n$ for each $n$, the $x_n$ have a strong limit $\bar x$, and the saddle-values are bounded below, then $K$ has a saddle-point $(\bar x, \bar y^*)$ that is an $(s \times w^*)$-limit of saddle-points $(x_n, (y_n^*)|_{M_n})$ of a subsequence of the $K^n$, with $K(\bar x, \bar y^*)$ the limit of the corresponding saddle-function values. (Here $s$ stands for the norm topology on $X$, and $(y_n^*)|_{M_n}$ denotes any norm-preserving extension (via the Hahn-Banach Theorem, for example) to $X^*$ of the restriction of $y_n^*$ to $M_n$.)

Proof. The proof follows from Propositions 7 and 5, on showing that the $(y_n^*)|_{M_n}$ are norm-bounded in $X^*$, so that weak*-convergent subsequences are available and are the required dual variables.

Since the sublevel-sets of $f_n$ are themselves slice convergent [Bee93], there are $\hat x_n \in \mathrm{dom} f_n$ converging to some $\hat x \in \mathrm{dom} f$. Place $\tilde f_n(\cdot) := f_n(\hat x_n + \cdot)$, $\tilde g_n(\cdot) := g_n(\hat x_n + \cdot)$, with analogous definitions for $\tilde f$ and $\tilde g$ as translates by $\hat x$. Then $0 \in \mathrm{dom} \tilde f_n$, implying that $\mathrm{dom} \tilde f_n \cup \mathrm{dom}\, \tilde g_n \subseteq M_n$.

Let $\tilde\varphi_n$ be the value function corresponding to $\tilde f_n$ and $\tilde g_n$ via (11). Similarly, denote the corresponding saddle function by $\tilde K^n$. Then we immediately observe that $\tilde\varphi_n = \varphi_n$, from which follows that

$(x_n, y_n^*)$ is a saddlepoint of $K^n$ iff $(x_n - \hat x_n, y_n^*)$ is a saddlepoint of $\tilde K^n$,

since $(x_n, y_n^*)$ are an optimal pair for the primal and dual problems if and only if $(x_n - \hat x_n, y_n^*)$ are optimal for the problems based on the translated functions $\tilde f_n$, $\tilde g_n$. Evidently the optimal values are not affected by this translation, so we also obtain that $\tilde K^n(x_n - \hat x_n, y_n^*) = K^n(x_n, y_n^*)$. Hence the saddle-values of $\tilde K^n$ are also bounded below. As $M_n$ contains both $\mathrm{dom} \tilde f_n$ and $\mathrm{dom}\, \tilde g_n$ (recall this follows from $0 \in \mathrm{dom} \tilde f_n$), we obtain

$$\tilde K^n(x_n - \hat x_n, y_n^*) = \tilde K^n(x_n - \hat x_n, y_n^*|_{M_n}),$$

which follows from $\tilde\varphi_n^*(y_n^*) = \tilde f_n^*(-y_n^*) + \tilde g_n^*(y_n^*) = \tilde f_n^*(-y_n^*|_{M_n}) + \tilde g_n^*(y_n^*|_{M_n}) = \tilde\varphi_n^*(y_n^*|_{M_n})$, since $M_n$ contains the domains of $\tilde f_n$ and $\tilde g_n$. Letting $-\alpha \in \mathbb{R}$ be a lower bound for the saddle-values of $K^n$ (and hence of $\tilde K^n$), we have for all $n$ large that $(-y_n^*|_{M_n}, y_n^*|_{M_n}) \in H_\alpha(M^*, n)$ (where the latter set is defined relative to the translated functions $\tilde f_n$, $\tilde g_n$), since

$$(\tilde f_n|_{M_n})^*(-y_n^*|_{M_n}) + (\tilde g_n|_{M_n})^*(y_n^*|_{M_n}) = \tilde f_n^*(-y_n^*) + \tilde g_n^*(y_n^*) = -\tilde K^n(x_n - \hat x_n, y_n^*) \le \alpha.$$

By Lemma 4, the $\|y_n^*|_{M_n}\|$ are norm-bounded in $M_n^*$ for all large $n$. Then the sequence of norm-preserving extensions $z_n^* := y_n^*|_{M_n} \in X^*$ is also norm-bounded and hence has a weakly* convergent subsequence $z_n^* \to z^*$. For each $n$, $(x_n - \hat x_n, z_n^*)$ is a saddlepoint for $\tilde K^n$, so $(x_n, z_n^*)$ is one for $K^n$. By Propositions 7 and 5, $(\bar x, z^*)$ is a saddlepoint for $K$, with value the limit of the saddle-values along the sequence. □

References
[Att84] Attouch, H.: Variational Convergence for Functions and Operators. Applicable Mathematics Series, Pitman, London (1984)
[Att86] Attouch, H., Brezis, H.: Duality for the sum of convex functions in general Banach spaces. In: Barroso, J. (ed) Aspects of Mathematics and its Applications, 125-133. Elsevier Sc. Publ. (1986)
[AR96] Aze, D., Rahmouni, A.: On Primal-Dual stability in convex optimization.
Journal of Convex Analysis, 3, 309-327 (1996)
[AAW88] Aze, D., Attouch, H., Wets, R.J.-B.: Convergence of convex-concave saddle functions: applications to convex programming and mechanics. Ann. Inst. Henri Poincare, 5, 537-572 (1988)
[AP90] Aze, D., Penot, J.-P.: Operations on convergent families of sets and functions. Optimization, 21, 521-534 (1990)
[Bee92] Beer, G.: The slice topology: a viable alternative to Mosco convergence in non-reflexive spaces. Nonlinear Analysis: Theory, Methods and Applications, 19, 271-290 (1992)
[Bee93] Beer, G.: Topologies on closed and closed convex sets. Mathematics and its Applications, 268, Kluwer Acad. Publ. (1993)
[BL92] Borwein, J.M., Lewis, A.S.: Partially-finite convex programming. Mathe-
matical Programming, 57, 15-83 (1992)
[EW00] Eberhard, A., Wenczel, R.: Epi-distance convergence of parametrised sums of convex functions in non-reflexive spaces. J. Conv. Anal., 7, 47-71 (2000)
[ET99] Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM
Classics in Applied Mathematics, 28 (1999)
[Hol75] Holmes, R.B.: Geometric Functional Analysis and its Applications.
Springer-Verlag Graduate Texts in Mathematics 24 (1975)
[Pen93] Penot, J. -P.: Preservation of persistence and stability under intersection
and operations. J. Optim. Theory & Appl., 79, 525-561 (1993)
[Pen02] Penot, J.-P., Zalinescu, C.: Continuity of usual operations and variational convergence, personal communication, 30/04/02 (2002)
[RW84] Rockafellar, R.T., Wets, R.J.-B.: Variational systems, an introduction. In: Salinetti, G. (ed) Multifunctions and Integrands. Springer-Verlag Lecture Notes in Mathematics, 1091, 1-54 (1984)
[Roc70] Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
[Roc74] Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM publ. (1974)
[WE99] Wenczel, R.B., Eberhard, A.C.: Slice convergence of parametrised sums of convex functions in nonreflexive spaces. Bull. Aust. Math. Soc., 60, 429-458 (1999)
[Zal03] Zalinescu, C.: Slice convergence for some classes of convex functions. J. Nonlinear and Convex Analysis, 4 (2003)
Topical Functions and their Properties in a
Class of Ordered Banach Spaces

Hossein Mohebi

Department of Mathematics
Shahid Bahonar University of Kerman
Kerman, Iran
hmohebi@mail.uk.ac.ir;
CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353, Australia
h.mohebi@ballarat.edu.au

Summary. We study topical functions in a class of ordered Banach spaces and show that these functions are abstract convex with respect to a certain set of elementary functions and obtain an explicit formula for their subdifferential. We give characterizations of the Fenchel-Moreau conjugate and the conjugate of type Lau of topical functions. We also present necessary and sufficient conditions for plus-weak Pareto points of a closed downward set in terms of separation from outside points.

2000 MR Subject Classification. Primary: 26B25, 52A41; Secondary: 46B42

Key words: Topical function, Downward set, Fenchel-Moreau conjugation, Conjugation of type Lau, Plus-weak Pareto point, Subdifferential, Ordered Banach space

1 Introduction
A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is called topical if this function is increasing ($x \ge y \implies f(x) \ge f(y)$) and plus-homogeneous ($f(x + \lambda\mathbf{1}) = f(x) + \lambda$ for all $x \in \mathbb{R}^n$ and all $\lambda \in \mathbb{R}$), where $\mathbf{1}$ is the vector of the corresponding dimension with all coordinates equal to one. These functions are studied in [GG98, Gun98, Gun99, GK95, RS01, Sin02] and they have many applications in various parts of applied mathematics (see [Gun98, Gun99]).
In this paper we study topical functions $f : X \to \overline{\mathbb{R}}$ defined on an ordered Banach space $X$. We show that the topical functions $f : X \to \overline{\mathbb{R}}$
are characterized by the fact that the Fenchel-Moreau conjugate function and the conjugate function of type Lau admit a very simple explicit description. Most of these results have been obtained by A. Rubinov and I. Singer in the finite dimensional case (see [RS01, Sin02]). In this paper, we obtain these results in ordered Banach spaces without using the concepts of lattice theory.
The structure of the paper is as follows. In Section 2, we recall the main definitions and prove some results related to downward sets and topical functions. We also show that a topical function is abstract convex. Characterizations of plus-weak Pareto points for a closed downward set are investigated in Section 3. In Section 4, we study the subdifferential of a topical function and we present the characterizations of plus-weak Pareto points of a closed downward set in terms of separation from outside points. In Section 5, we give characterizations of a topical function in terms of its Fenchel-Moreau conjugate and biconjugate with respect to a certain set of elementary functions. In Section 6, we first give characterizations of topical functions in terms of the conjugate of type Lau. Next, we show that for topical functions, the conjugate of type Lau and the Fenchel-Moreau conjugate coincide.

2 Preliminaries
Let $X$ be a Banach space with the norm $\|\cdot\|$ and let $C$ be a closed convex cone in $X$ such that $C \cap (-C) = \{0\}$ and $\mathrm{int}\, C \ne \emptyset$. We assume that $X$ is equipped with the order relation $\ge$ generated by $C$: $x \ge y$ if and only if $x - y \in C$ ($x, y \in X$). Moreover, we assume that $C$ is a normal cone. Recall that a cone $C$ is called normal if there exists a constant $m > 0$ such that $\|x\| \le m\|y\|$ whenever $0 \le x \le y$, and $x, y \in X$. Let $\mathbf{1} \in \mathrm{int}\, C$ and let

$$B = \{x \in X : -\mathbf{1} \le x \le \mathbf{1}\}. \tag{1}$$

It is well known and easy to check that $B$ can be considered as the unit ball of a certain norm $\|\cdot\|_1$, which is equivalent to the initial norm $\|\cdot\|$. Assume without loss of generality that $\|\cdot\| = \|\cdot\|_1$.
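To make the setting concrete, here is a minimal sketch of the simplest instance of these assumptions. It assumes $X = \mathbb{R}^n$, $C = \mathbb{R}^n_+$ and $\mathbf{1} = (1,\dots,1)$ (an illustrative choice, not the general case treated in the text), where the norm with unit ball $B$ is the maximum norm. The function names are hypothetical.

```python
# Illustrative finite-dimensional instance: X = R^n, C = R^n_+, 1 = (1, ..., 1).

def leq(x, y):
    """Order generated by C = R^n_+: x <= y iff y - x has nonnegative coordinates."""
    return all(a <= b for a, b in zip(x, y))

def order_unit_norm(x):
    """inf{r >= 0 : -r*1 <= x <= r*1}; for C = R^n_+ this equals max_i |x_i|."""
    return max(abs(a) for a in x)

def in_B(x):
    """Membership in B = {x : -1 <= x <= 1} from (1)."""
    n = len(x)
    return leq([-1.0] * n, x) and leq(x, [1.0] * n)
```

In this instance `in_B(x)` holds exactly when `order_unit_norm(x) <= 1`, which is the statement that $B$ is the unit ball of $\|\cdot\|_1$.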
We study in this paper topical functions and downward sets. Recall (see [Sin87]) that a subset $W$ of $X$ is said to be downward if $w \in W$ and $x \in X$ with $x \le w$ imply $x \in W$. A function $f : X \to \overline{\mathbb{R}} := [-\infty, +\infty]$ is called topical if this function is increasing ($x \ge y \implies f(x) \ge f(y)$) and plus-homogeneous ($f(x + \lambda\mathbf{1}) = f(x) + \lambda$ for all $x \in X$ and all $\lambda \in \mathbb{R}$). The definition of a topical function in the finite dimensional case can be found in [RS01].
For any subset $W$ of $X$, we shall denote by $\mathrm{int}\, W$, $\mathrm{cl}\, W$, and $\mathrm{bd}\, W$ the interior, the closure and the boundary of $W$, respectively.
For a non-empty subset $W$ of $X$ and $x \in X$, define

$$d(x, W) = \inf_{w \in W} \|x - w\|.$$
Recall (see [Sin74]) that a point $w_0 \in W$ is called a best approximation for $x \in X$ if
$$\|x - w_0\| = d(x, W).$$
Let $W \subseteq X$. For $x \in X$, denote by $P_W(x)$ the set of all best approximations of $x$ in $W$:
$$P_W(x) = \{w \in W : \|x - w\| = d(x, W)\}.$$
It is well-known that $P_W(x)$ is a closed and bounded subset of $X$. If $x \notin W$ then $P_W(x)$ is located in the boundary of $W$.
For $x \in X$ and $r > 0$, by (1), we have

$$B(x, r) := \{y \in X : \|x - y\| \le r\} = \{y \in X : x - r\mathbf{1} \le y \le x + r\mathbf{1}\}. \tag{2}$$

Let $\varphi : X \times X \to \mathbb{R}$ be a function defined by

$$\varphi(x, y) := \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le x + y\} \quad \forall\, x, y \in X. \tag{3}$$

It follows from (1) that the set $\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le x + y\}$ is non-empty and bounded from above (by $\|x + y\|$). Clearly this set is closed. It follows from the definition of $\varphi$ that the function $\varphi$ enjoys the following properties:

$$-\infty < \varphi(x, y) \le \|x + y\| \quad \text{for each } x, y \in X; \tag{4}$$

$$\varphi(x, y)\mathbf{1} \le x + y \quad \text{for all } x, y \in X; \tag{5}$$

$$\varphi(x, y) = \varphi(y, x) \quad \text{for all } x, y \in X; \tag{6}$$

$$\varphi(x, -x) = \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le x - x = 0\} = 0 \quad \text{for all } x \in X. \tag{7}$$

For each $y \in X$, define the function $\varphi_y : X \to \mathbb{R}$ by

$$\varphi_y(x) := \varphi(x, y) \quad \forall\, x \in X. \tag{8}$$

The function $\varphi_y$ defined by (8) is topical (see [MR05]).
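As a hedged illustration of (3) and properties (4) to (7), consider again the assumed special case $X = \mathbb{R}^n$, $C = \mathbb{R}^n_+$, $\mathbf{1} = (1,\dots,1)$ with the maximum norm. There $\lambda\mathbf{1} \le x + y$ means $\lambda \le x_i + y_i$ for every $i$, so $\varphi(x,y) = \min_i (x_i + y_i)$. The helper names and the random sampling below are illustrative choices only.

```python
import random

def phi(x, y):
    """phi(x, y) = sup{lam : lam*1 <= x + y}; for C = R^n_+ this is min_i (x_i + y_i)."""
    return min(a + b for a, b in zip(x, y))

def max_norm(x):
    return max(abs(a) for a in x)

def check_phi_properties(trials=1000, n=3, seed=0):
    """Sample-based check of properties (4)-(7) for the finite-dimensional phi."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        s = [a + b for a, b in zip(x, y)]
        lam = phi(x, y)
        if not lam <= max_norm(s):              # (4)
            return False
        if not all(lam <= c for c in s):        # (5): phi(x, y)*1 <= x + y
            return False
        if lam != phi(y, x):                    # (6): symmetry
            return False
        if phi(x, [-a for a in x]) != 0:        # (7)
            return False
    return True
```

The same formula also makes the topical property of $\varphi_y$ visible: adding $\lambda\mathbf{1}$ to $x$ shifts every $x_i + y_i$ by $\lambda$.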


Let $S$ be a set and $L = \{h : S \to \overline{\mathbb{R}} : h \text{ is a function}\}$ be a set of functions. We recall (see [Rub00, Sin87]) that a function $f : S \to \overline{\mathbb{R}}$ is called abstract convex with respect to $L$, or, briefly, $L$-convex, if there exists a subset $L_0$ of $L$ such that
$$f(s) = \sup_{h \in L_0,\ h \le f} h(s) \quad (s \in S).$$

Proposition 1. Let $f : X \to \mathbb{R}$ be a topical function. Then $f$ is Lipschitz continuous.

Proof. Let $x, y \in X$ be arbitrary. Since by (2) we have

$$-\|x - y\|\mathbf{1} \le x - y \le \|x - y\|\mathbf{1},$$

it follows that
$$y - \|x - y\|\mathbf{1} \le x \le y + \|x - y\|\mathbf{1}.$$
Since by hypothesis $f$ is topical, we get

$$f(y) - \|x - y\| \le f(x) \le f(y) + \|x - y\|,$$

and hence
$$|f(x) - f(y)| \le \|x - y\|. \tag{9}$$
Thus, $f$ is Lipschitz continuous. □
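Inequality (9) says a finite topical function is Lipschitz with constant 1 for the norm induced by $B$. The sketch below checks this on random samples for one assumed example: $X = \mathbb{R}^3$ with the maximum norm and the hypothetical topical function $f(x) = \max_i (x_i + a_i)$, where the shifts $a_i$ are arbitrary illustrative constants.

```python
import random

SHIFTS = [0.3, -1.2, 2.0]  # hypothetical constants, chosen only for illustration

def f(x):
    """A topical function on R^3: increasing, and f(x + lam*1) = f(x) + lam."""
    return max(xi + ai for xi, ai in zip(x, SHIFTS))

def max_norm(x):
    return max(abs(a) for a in x)

def lipschitz_slack(x, y):
    """||x - y|| - |f(x) - f(y)|; inequality (9) says this is nonnegative."""
    diff = [a - b for a, b in zip(x, y)]
    return max_norm(diff) - abs(f(x) - f(y))

def worst_slack(trials=2000, seed=1):
    """Smallest slack observed over random pairs (x, y)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in SHIFTS]
        y = [rng.uniform(-10, 10) for _ in SHIFTS]
        worst = min(worst, lipschitz_slack(x, y))
    return worst
```

A nonnegative `worst_slack()` (up to rounding) is consistent with (9); it is evidence on a sample, not a proof.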

Corollary 1. The function $\varphi_y$ defined by (8) is Lipschitz continuous.

Proof. It follows from Proposition 1. □

Corollary 2. The function $\varphi$ defined by (3) is continuous.

Proof. It follows from (9). □


Proposition 2. Let $f : X \to \overline{\mathbb{R}}$ be a topical function. Then the following assertions are true:
1) If there exists $x \in X$ such that $f(x) = +\infty$, then $f \equiv +\infty$.
2) If there exists $x \in X$ such that $f(x) = -\infty$, then $f \equiv -\infty$.

Proof. 1) Suppose that there exists $x \in X$ such that $f(x) = +\infty$, and let $y \in X$ be arbitrary. Let $\lambda = \varphi(-x, y)$, where $\varphi$ is the function defined by (3). Then by (4) we have $\lambda \in \mathbb{R}$. In view of (5), it follows that $\lambda\mathbf{1} \le y - x$, and so $x + \lambda\mathbf{1} \le y$. Since $f$ is a topical function, we conclude that $f(x) + \lambda \le f(y)$. This implies that $f(y) = +\infty$.
2) Assume that there exists $x \in X$ such that $f(x) = -\infty$, and let $y \in X$ be arbitrary. Let $\lambda = \varphi(x, -y)$, where $\varphi$ is the function defined by (3). Then by (4) we have $\lambda \in \mathbb{R}$. In view of (5), it follows that $\lambda\mathbf{1} \le x - y$, and so $y + \lambda\mathbf{1} \le x$. Since $f$ is a topical function, we conclude that $f(y) \le f(x) - \lambda$. This implies that $f(y) = -\infty$, which completes the proof. □

It follows from Proposition 2 that, for any topical function $f : X \to \overline{\mathbb{R}}$, either $\mathrm{dom} f = X$ or $f \equiv +\infty$, where $\mathrm{dom} f := \{x \in X : f(x) < +\infty\}$.
In the following we denote by $X_\varphi$ the set of all functions $\varphi_l$ ($l \in X$) defined by (8). That is:
$$X_\varphi = \{\varphi_l := \varphi(\cdot, l) : l \in X\}. \tag{10}$$

Theorem 1. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$ the following assertions are equivalent:
1) $f$ is a topical function.
2) For each $y \in X$ there exists $l_y \in X$ such that

$$\varphi_{l_y}(x) \le f(x) \quad \forall\, x \in X, \quad \text{and} \quad \varphi_{l_y}(y) = f(y).$$

3) $f$ is $X_\varphi$-convex, where $X_\varphi$ is defined by (10).


Proof. 1) $\Longrightarrow$ 2). Suppose that $f$ is a topical function and let $y \in X$ be arbitrary. Define
$$l_y := f(y)\mathbf{1} - y \in X. \tag{11}$$
Now, let $x \in X$ be arbitrary and $\lambda := \varphi(x, -y)$. Then by (5) we have $\lambda\mathbf{1} \le x - y$, and so $y + \lambda\mathbf{1} \le x$. Using (11) and that $\varphi(x, \cdot)$ and $f$ are topical functions, we obtain

$$f(x) \ge f(y + \lambda\mathbf{1}) = f(y) + \lambda = f(y) + \varphi(x, -y) = \varphi(x, f(y)\mathbf{1} - y) = \varphi(x, l_y) = \varphi_{l_y}(x).$$

Also, by using (7) we have

$$\varphi_{l_y}(y) = \varphi(y, f(y)\mathbf{1} - y) = \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le y + f(y)\mathbf{1} - y\}$$

$$= \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le f(y)\mathbf{1}\} = \sup\{\alpha + f(y) \in \mathbb{R} : \alpha\mathbf{1} \le 0\}$$

$$= \sup\{\alpha \in \mathbb{R} : \alpha\mathbf{1} \le 0\} + f(y) = 0 + f(y) = f(y).$$

Hence, we have 2).
2) $\Longrightarrow$ 3). Assume that 2) holds. Then we have

$$f(x) = \sup_{y \in X} \varphi_{l_y}(x) \quad (x \in X),$$

and hence $f$ is $X_\varphi$-convex.

3) $\Longrightarrow$ 1). Assume that 3) holds. First, note that it is easy to check that every supremum of topical functions defined on $X$ is a topical function. Since every function $\varphi_l$ ($l \in X$) defined by (8) is topical, it follows from the hypothesis that $f$ is a topical function, which completes the proof. □
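The construction $l_y = f(y)\mathbf{1} - y$ in (11) can be tried out numerically. The following sketch assumes the finite-dimensional instance $X = \mathbb{R}^3$, $C = \mathbb{R}^3_+$ (so $\varphi(x,y) = \min_i(x_i + y_i)$) and uses the midrange as an illustrative finite topical function; both choices are assumptions made for the example.

```python
import random

def phi(x, y):
    """Finite-dimensional phi for C = R^n_+: min_i (x_i + y_i)."""
    return min(a + b for a, b in zip(x, y))

def f(x):
    """Illustrative topical function: the midrange (min + max) / 2."""
    return 0.5 * (min(x) + max(x))

def support_element(y):
    """l_y := f(y)*1 - y, as in (11)."""
    fy = f(y)
    return [fy - yi for yi in y]

def minorant(y, x):
    """phi_{l_y}(x) = phi(x, l_y): the elementary function supporting f at y."""
    return phi(x, support_element(y))

def envelope(x, sample):
    """Sup of phi_{l_y}(x) over a finite sample of y; a lower estimate of f(x)."""
    return max(minorant(y, x) for y in sample)

_rng = random.Random(2)
SAMPLE = [[_rng.uniform(-5, 5) for _ in range(3)] for _ in range(200)]
```

On the sample, each `minorant(y, x)` stays below `f(x)` and touches it at `x = y`, so `envelope(x, SAMPLE)` recovers `f(x)` whenever `x` belongs to the sample, mirroring assertions 2) and 3).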

Corollary 3. Every topical function $f : X \to \overline{\mathbb{R}}$ is lower semi-continuous.

3 Plus-Minkowski gauge and plus-weak Pareto point for a downward set

We start with the following definition, which is given in [MRS02], [RS01] for the finite dimensional case.

Definition 1. Let $W$ be a downward subset of $X$. The function $p_W : X \to \overline{\mathbb{R}}$ defined by
$$p_W(x) = \inf\{\lambda \in \mathbb{R} : x \in \lambda\mathbf{1} + W\} \quad (x \in X)$$
is called the plus-Minkowski gauge of the set $W$.

The following proposition has been proved in the finite dimensional case (see [RS01]). However, the same proof is valid in the case under consideration.

Proposition 3. Let $W$ be a downward subset of $X$. Then $p_W$ is a topical function.
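A small numerical sketch may help with the plus-Minkowski gauge. It assumes $X = \mathbb{R}^2$, $C = \mathbb{R}^2_+$ and the downward set $W = \{w : \min\{w_1, w_2\} \le 1\}$, for which $x \in \lambda\mathbf{1} + W$ holds exactly when $\lambda \ge \min_i x_i - 1$, so $p_W(x) = \min_i x_i - 1$. The bisection routine and its bracket `[-1e3, 1e3]` are illustrative assumptions.

```python
def in_W(w):
    """Membership in the downward set W = {w in R^2 : min(w) <= 1}."""
    return min(w) <= 1.0

def gauge_bisect(x, member, lo=-1e3, hi=1e3, iters=100):
    """Approximate p_W(x) = inf{lam : x - lam*1 in W} by bisection.

    Assumes W is downward (so membership of x - lam*1 is monotone in lam),
    that x - hi*1 lies in W and that x - lo*1 does not.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if member([xi - mid for xi in x]):
            hi = mid
        else:
            lo = mid
    return hi

def gauge_closed_form(x):
    """Closed form for this particular W: p_W(x) = min_i x_i - 1."""
    return min(x) - 1.0
```

The closed form shows directly that this $p_W$ is increasing and plus-homogeneous, in line with Proposition 3.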
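In the sequel, we give a definition of plus-weak Pareto points.
Definition 2. Let $W$ be a closed downward subset of $X$. A point $w \in W$ is called a plus-weak Pareto point of $W$ if $(\lambda\mathbf{1} + w) \notin W$ for all $0 < \lambda \in \mathbb{R}$.
Lemma 1. Let $W$ be a closed downward subset of $X$ and $w \in W$ be arbitrary. Then $w$ is a plus-weak Pareto point of $W$ if and only if $p_W(w) = 0$.
Proof. Let
$$D_w = \{\lambda \in \mathbb{R} : w \in \lambda\mathbf{1} + W\} \quad (w \in W).$$
Then we have
$w$ is a plus-weak Pareto point of $W$
$\iff (\lambda\mathbf{1} + w) \notin W \quad \forall\, \lambda > 0$
$\iff w \notin -\lambda\mathbf{1} + W \quad \forall\, \lambda > 0$
$\iff -\lambda \notin D_w \quad \forall\, \lambda > 0$
$\iff \lambda \notin D_w \quad \forall\, \lambda < 0$
$\iff \lambda \in D_w \quad \forall\, \lambda \ge 0$
$\iff p_W(w) = \inf D_w = 0.$
□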

Lemma 2. Let $W$ be a closed downward subset of $X$ and $w \in W$ be arbitrary. Then the following assertions are equivalent:
1) $w$ is a plus-weak Pareto point of $W$.
2) $w \in \mathrm{bd}\, W$.
Proof. 1) $\Longrightarrow$ 2). Assume 1) holds and, if possible, that $w \notin \mathrm{bd}\, W$. Then $w \in \mathrm{int}\, W$. It follows that there exists $\varepsilon > 0$ such that

$$V := \{x \in X : \|x - w\| \le \varepsilon\} \subseteq W.$$

This implies, by (2), that $w + \varepsilon\mathbf{1} \in W$. Hence $w$ is not a plus-weak Pareto point of $W$. This is a contradiction.
2) $\Longrightarrow$ 1). Suppose that 2) holds. We claim that $\lambda\mathbf{1} + w \notin W$ for all $\lambda > 0$. Assume if possible that there exists $\lambda_0 > 0$ such that $\lambda_0\mathbf{1} + w \in W$. Let

$$V = \{x \in X : \|x - w\| \le \lambda_0\}$$

be a neighbourhood of $w$. It follows from (2) that
$$V = \{x \in X : w - \lambda_0\mathbf{1} \le x \le w + \lambda_0\mathbf{1}\}.$$

Since $W$ is a downward set and $\lambda_0\mathbf{1} + w \in W$, we conclude that $V \subseteq W$. Hence, $w \in \mathrm{int}\, W$. This is a contradiction. Thus, the claim is true, and so $w$ is a plus-weak Pareto point of $W$, which completes the proof. □
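Proposition 4. Let $0 \ne x \in X$ and $R = \{\alpha\mathbf{1} + x : \alpha \ge 0\}$. Let $W$ be a closed downward subset of $X$. Then $|R \cap \mathrm{bd}\, W| \le 1$, where $|A|$ denotes the cardinality of the set $A$.
Proof. If $R \cap \mathrm{bd}\, W = \emptyset$, then $|R \cap \mathrm{bd}\, W| = 0 \le 1$. Now, suppose that $R \cap \mathrm{bd}\, W \ne \emptyset$. We may assume that $x \in R \cap \mathrm{bd}\, W$. Thus, $x \in \mathrm{bd}\, W \subseteq W$. It follows from Lemma 2 that $x$ is a plus-weak Pareto point of $W$, and so $\lambda\mathbf{1} + x \notin W$ for all $\lambda > 0$.
On the other hand, assume if possible that there exists $\lambda_0 < 0$ such that $\lambda_0\mathbf{1} + x \in \mathrm{bd}\, W$. Then, by Lemma 2, $\lambda_0\mathbf{1} + x$ is a plus-weak Pareto point of $W$. Hence for $-\lambda_0 > 0$, we have $[-\lambda_0\mathbf{1} + (\lambda_0\mathbf{1} + x)] \notin W$. That is, $x \notin W$. This is a contradiction. It follows that $\lambda\mathbf{1} + x \in \mathrm{bd}\, W$ only for $\lambda = 0$. Consequently, $R \cap \mathrm{bd}\, W = \{x\}$, and hence $|R \cap \mathrm{bd}\, W| = 1$. □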

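4 $X_\varphi$-subdifferential of a topical function

Definition 3. Let $f : X \to \overline{\mathbb{R}}$ be a topical function and $\varphi$ be the function defined by (3). Define the $X_\varphi$-subdifferential $\partial_{X_\varphi} f(y)$ of $f$ at a point $y \in X$ by

$$\partial_{X_\varphi} f(y) = \{l \in X : \varphi_l(x) \le f(x)\ \ \forall\, x \in X, \text{ and } \varphi_l(y) = f(y)\}, \tag{12}$$

where $X_\varphi$ is defined by (10).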

Lemma 3. Let $f : X \to \mathbb{R}$ be a topical function and let $y \in X$. Then

$$\partial_{X_\varphi} f(y) = \{l \in X : \varphi_l(y) \ge f(y), \text{ and } f(-l) = 0\}.$$

Hence, in particular, $(f(y)\mathbf{1} - y) \in \partial_{X_\varphi} f(y)$.

Proof. Let
$$D = \{l \in X : \varphi_l(y) \ge f(y), \text{ and } f(-l) = 0\}$$
and let $l \in \partial_{X_\varphi} f(y)$ be arbitrary. Then, by (12), we have $\varphi_l(y) \ge f(y)$. This implies, by (5), that $y + l \ge \varphi_l(y)\mathbf{1} \ge f(y)\mathbf{1}$, and so $y \ge f(y)\mathbf{1} - l$. Since $f$ is a topical function, it follows that $f(y) \ge f(y) + f(-l)$. Thus, $f(-l) \le 0$.
On the other hand, by (7), we have $f(-l) \ge \varphi_l(-l) := \varphi(-l, l) = 0$. Hence, $f(-l) = 0$. Therefore, $l \in D$. Conversely, assume $l \in D$ and if possible that there exists $x \in X$ such that $\varphi_l(x) > f(x)$. This implies that there exists $\lambda > 0$ such that $\varphi_l(x) \ge f(x) + \lambda$, and so by (5) we get $x \ge (f(x) + \lambda)\mathbf{1} - l$. Since $f$ is a topical function and $f(-l) = 0$, it follows that

$$f(x) \ge f(x) + \lambda + f(-l) = f(x) + \lambda.$$

This is a contradiction. Thus we conclude that $\varphi_l(x) \le f(x)$ for all $x \in X$, and hence, in particular, we have $\varphi_l(y) \le f(y)$. Consequently, since $l \in D$, we obtain $\varphi_l(y) = f(y)$, and so $l \in \partial_{X_\varphi} f(y)$.

Finally, let $l_0 = f(y)\mathbf{1} - y$. Since $\varphi(y, \cdot)$ is a topical function and (7) holds, it follows that

$$\varphi_{l_0}(y) = \varphi(y, l_0) = \varphi(y, f(y)\mathbf{1} - y) = f(y) + \varphi(y, -y) = f(y).$$

Also, we have $f(-l_0) = f(y) - f(y) = 0$. We conclude that $l_0 \in D = \partial_{X_\varphi} f(y)$, which completes the proof. □
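Lemma 3 turns membership in the subdifferential into two finite checks, which is easy to try in a toy case. The sketch assumes $X = \mathbb{R}^2$, $C = \mathbb{R}^2_+$, $\varphi(x,y) = \min_i(x_i + y_i)$ and the illustrative topical function $f(x) = \max_i x_i$; the tolerance parameter is an implementation convenience, not part of the lemma.

```python
def phi(x, y):
    """Finite-dimensional phi for C = R^n_+: min_i (x_i + y_i)."""
    return min(a + b for a, b in zip(x, y))

def f(x):
    """Illustrative topical function: the largest coordinate."""
    return max(x)

def in_subdifferential(l, y, tol=1e-12):
    """Membership test via Lemma 3: phi_l(y) >= f(y) and f(-l) = 0."""
    return phi(y, l) >= f(y) - tol and abs(f([-a for a in l])) <= tol

def canonical_element(y):
    """l_0 = f(y)*1 - y, which Lemma 3 places in the subdifferential at y."""
    fy = f(y)
    return [fy - yi for yi in y]
```

For `y = [1.0, 3.0]` the canonical element is `[2.0, 0.0]`; the element `[5.0, 0.0]` also passes the test, while `[0.0, 0.0]` fails because $\varphi_l(y) = 1 < 3 = f(y)$.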

Remark 1. If $W$ is a downward subset of $X$ and $p_W$ is its plus-Minkowski gauge function, then

$$\{x \in X : p_W(x) < 0\} \subseteq W \subseteq \{x \in X : p_W(x) \le 0\}.$$

Indeed, if $x \in \{x \in X : p_W(x) < 0\}$, then there exists $\lambda < 0$ such that $x \in \lambda\mathbf{1} + W$. Since $x \le x - \lambda\mathbf{1}$, $x - \lambda\mathbf{1} \in W$ and $W$ is a downward set, it follows that $x \in W$. Also, note that if $W$ is a closed downward subset of $X$, then
$$W = \{x \in X : p_W(x) \le 0\}.$$

Lemma 4. Let $W$ be a proper closed downward subset of $X$, $w \in W$ be a plus-weak Pareto point of $W$ and $l \in X$. Assume that $\varphi$ is the function defined by (3). Then the following assertions are equivalent:
1) $l \in \partial_{X_\varphi} p_W(w)$.
2) $\sup_{y \in W} \varphi(y, l) \le 0 = \varphi(w, l)$.

Proof. Since $w$ is a plus-weak Pareto point of $W$, it follows from Lemma 1 that $p_W(w) = 0$.
1) $\Longrightarrow$ 2). Suppose that 1) holds. Then, by Definition 3 and Remark 1, we have
$$\varphi(y, l) \le p_W(y) \le 0 \quad \forall\, y \in W$$
and $\varphi(w, l) = \varphi_l(w) = p_W(w) = 0$. Hence, $\sup_{y \in W} \varphi(y, l) \le 0 = \varphi(w, l)$.
2) $\Longrightarrow$ 1). Assume that 2) holds. Let $y \in X$ and $x = y - p_W(y)\mathbf{1}$. Since, by Proposition 3, $p_W$ is a topical function, it follows that $p_W(x) = 0$. In view of Remark 1, we have $x \in W$. Thus, by hypothesis, $\varphi(x, l) \le 0$. This implies that $\varphi_l(y) \le p_W(y)$ for all $y \in X$. Also, we have $\varphi_l(w) := \varphi(w, l) = 0 = p_W(w)$. Hence, by Definition 3, $l \in \partial_{X_\varphi} p_W(w)$, which completes the proof. □

Theorem 2. Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$, $w_0 \in W$ and $r_0 := \|x_0 - w_0\|$. If there exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0),$$

then $w_0$ is a plus-weak Pareto point of $W$.

Proof. Since $r_0 = \|x_0 - w_0\|$, we have $w_0 \in B(x_0, r_0)$. Also, we have $w_0 \in W$. It follows by hypothesis that $\varphi(w_0, l) = 0$. Now, assume if possible that $w_0$ is not a plus-weak Pareto point of $W$. Then there exists $\lambda_0 > 0$ such that $\lambda_0\mathbf{1} + w_0 \in W$, and hence by hypothesis, $\varphi(\lambda_0\mathbf{1} + w_0, l) \le 0$. This implies, since $\varphi(\cdot, l)$ is a topical function, that

$$0 \ge \varphi(\lambda_0\mathbf{1} + w_0, l) = \lambda_0 + \varphi(w_0, l) = \lambda_0 + 0 = \lambda_0.$$

This is a contradiction. □

Remark 2. If $W$ is a closed downward subset of $X$ and $x_0 \in X$, then the least element $g_0 = x_0 - r\mathbf{1}$ of the set $P_W(x_0)$ exists (see [MR05], Proposition 3.2), where $r = d(x_0, W)$.
Theorem 3. Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$ and $g_0 = x_0 - r_0\mathbf{1}$ be the least element of the set $P_W(x_0)$, where $r_0 = d(x_0, W)$. Then the following assertions are equivalent:
1) $g_0$ is a plus-weak Pareto point of $W$.
2) There exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0).$$

Proof. 1) $\Longrightarrow$ 2). Suppose that 1) holds. Let $l = -g_0$ and $y \in B(x_0, r_0)$ be arbitrary. Since $g_0 = x_0 - r_0\mathbf{1}$, it follows that $g_0$ is also the least element of $B(x_0, r_0)$. Hence, $g_0 \le y$, and so by (7) and that $\varphi(\cdot, l)$ is a topical function, we have
$$0 = \varphi(g_0, l) \le \varphi(y, l) \quad \forall\, y \in B(x_0, r_0). \tag{13}$$
On the other hand, by hypothesis $g_0$ is a plus-weak Pareto point of $W$. In view of Lemma 1, we have $p_W(g_0) = 0$. It follows from Lemma 3 that $l = -g_0 = p_W(g_0)\mathbf{1} - g_0 \in \partial_{X_\varphi} p_W(g_0)$. Thus, by Lemma 4, we have

$$\varphi(w, l) \le 0 \quad \forall\, w \in W. \tag{14}$$

Therefore, (13) and (14) imply 2).

2) $\Longrightarrow$ 1). Assume that 2) holds. Since $g_0 \in P_W(x_0)$ and $r_0 = d(x_0, W)$, it follows that $r_0 = \|x_0 - g_0\|$. Therefore, in view of Theorem 2, $g_0$ is a plus-weak Pareto point of $W$, which completes the proof. □
Corollary 4. Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$ and $g_0 = x_0 - r_0\mathbf{1}$ be the least element of the set $P_W(x_0)$, where $r_0 = d(x_0, W)$. Then there exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0).$$

Proof. Since $g_0 \in P_W(x_0)$ and $P_W(x_0) \subseteq \mathrm{bd}\, W$, we have $g_0 \in \mathrm{bd}\, W$, and so by Lemma 2, $g_0$ is a plus-weak Pareto point of $W$. Hence, by Theorem 3, there exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0),$$

and the proof is complete. □


The following example shows that not every plus-weak Pareto point $w_0$ of a closed downward set $W$ yields, through $l = -w_0$, a separation of $W$ and the ball $B(x_0, r_0)$.

Example 1. Let $X = \mathbb{R}^2$ with the maximum norm $\|x\| = \max_{1 \le i \le 2} |x_i|$ and

$$C = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 \ge 0,\ x_2 \ge 0\}.$$

Let
$$W = \{(w_1, w_2) \in \mathbb{R}^2 : \min\{w_1, w_2\} \le 1\},$$
$x_0 = (2, 2) \in X \setminus W$ and $w_0 = (1, 3)$. It is clear that $C$ is a closed convex normal cone in $X$, $W$ is a closed downward subset of $X$ and $w_0 \in \mathrm{bd}\, W$. Also, we have $\mathbf{1} = (1, 1) \in \mathrm{int}\, C$. We have $d(x_0, W) = 1 = \|x_0 - g_0\|$, where $g_0 = (1, 1)$ is the least element of the set $P_W(x_0)$. Since $w_0 \in \mathrm{bd}\, W$, it follows from Lemma 2 that $w_0$ is a plus-weak Pareto point of $W$, and we have also $r_0 := \|x_0 - w_0\| = 1 = d(x_0, W)$.
Now, let $l = -w_0$ and $w = (w_1, w_2) \in W$ be arbitrary. Then we have

$$\varphi(w, l) = \varphi(w, -w_0) = \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le w - w_0\} = \sup\{\lambda \in \mathbb{R} : \lambda \le \min\{w_1 - 1, w_2 - 3\}\} \le 0, \tag{15}$$

and

$$\varphi(x_0, l) = \varphi(x_0, -w_0) = \sup\{\lambda \in \mathbb{R} : \lambda\mathbf{1} \le x_0 - w_0\} = \sup\{\lambda \in \mathbb{R} : \lambda \le -1\} = -1 < 0. \tag{16}$$

Therefore, (15) and (16) show that $-w_0$ does not separate $W$ and $B(x_0, r_0)$.
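Example 1 is small enough to replay numerically. The sketch below uses the data of the example ($W$, $x_0$, $w_0$, $g_0$, $r_0 = 1$) and the finite-dimensional formula $\varphi(x,y) = \min_i(x_i + y_i)$; the random samples of $W$ and of the ball $B(x_0, r_0) = [1,3]^2$, and the helper names, are illustrative assumptions.

```python
import random

def phi(x, y):
    """Finite-dimensional phi for C = R^2_+: min_i (x_i + y_i)."""
    return min(a + b for a, b in zip(x, y))

def in_W(w):
    return min(w) <= 1.0

def neg(v):
    return [-a for a in v]

X0 = [2.0, 2.0]   # point outside W
W0 = [1.0, 3.0]   # plus-weak Pareto point from Example 1
G0 = [1.0, 1.0]   # least element of P_W(x0)

def separates(l, w_points, ball_points, tol=1e-12):
    """Check phi(w, l) <= 0 <= phi(y, l) on finite samples of W and B(x0, r0)."""
    return (all(phi(w, l) <= tol for w in w_points)
            and all(phi(y, l) >= -tol for y in ball_points))

_rng = random.Random(3)
W_POINTS = [w for w in ([_rng.uniform(-4, 4), _rng.uniform(-4, 4)]
                        for _ in range(2000)) if in_W(w)]
# In the max norm, B(x0, 1) is the square [1, 3] x [1, 3]; x0 itself is included.
BALL_POINTS = [[_rng.uniform(1, 3), _rng.uniform(1, 3)] for _ in range(2000)] + [X0]
```

On these samples `separates(neg(W0), ...)` fails at $x_0$ as in (16), whereas `separates(neg(G0), ...)` holds, in agreement with Theorem 3 applied to $g_0$.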

Theorem 4. Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$, $w_0 \in W$ and $r_0 = \|x_0 - w_0\|$. If there exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0), \tag{17}$$

then $w_0 \in P_W(x_0)$. Moreover, if (17) holds with $l = -w_0$, then $w_0 = \min P_W(x_0) := x_0 - r\mathbf{1}$, where $r = d(x_0, W)$.

Proof. Let $g_0 = x_0 - r\mathbf{1}$ be the least element of the set $P_W(x_0)$. It is clear that $g_0 \le x_0$. Now, assume if possible that $w_0 \notin P_W(x_0)$. Then, $r < r_0$. Choose $\lambda \in \mathbb{R}$ such that $1 - r_0 r^{-1} < \lambda < 0$, and let $w = \lambda x_0 + (1 - \lambda) g_0$. Since $g_0 \le x_0$, it follows that $w - g_0 = \lambda(x_0 - g_0) \le 0$, and so $w \le g_0$. Since $W$ is a downward set and $g_0 \in W$, we conclude that $w \in W$. Also, we have

$$\|x_0 - w\| = \|x_0 - \lambda x_0 - (1 - \lambda) g_0\| = (1 - \lambda)\|x_0 - g_0\| = (1 - \lambda) r < r_0,$$

and hence $w \in B(x_0, r_0)$. This implies by hypothesis that $\varphi(w, l) = 0$. But, on the other hand, since $\varphi(\cdot, l)$ is a topical function, we have

$$\varphi(w, l) = \varphi(g_0 + \lambda(x_0 - g_0), l) = \varphi(g_0 + r\lambda\mathbf{1}, l) = \varphi(g_0, l) + r\lambda \le 0 + r\lambda = r\lambda < 0.$$

This is a contradiction. Hence, $w_0 \in P_W(x_0)$.
Finally, suppose that (17) holds with $l = -w_0$. Then by the above $w_0 \in P_W(x_0)$, and so $r = r_0$. Thus, we have $g_0 \in B(x_0, r_0)$, and hence $0 \le \varphi(g_0, l) = \varphi(g_0, -w_0)$. In view of (5), we get $0 \le \varphi(g_0, -w_0)\mathbf{1} \le g_0 - w_0$. This implies that $w_0 \le g_0$. Since $g_0$ is the least element of the set $P_W(x_0)$, it follows that $w_0 = g_0$, which completes the proof. □

Theorem 5. Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$, $w_0 \in W$ and $r_0 = \|x_0 - w_0\|$. Then the following assertions are equivalent:
1) $w_0 \in P_W(x_0)$.
2) There exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0). \tag{18}$$

Proof. 1) $\Longrightarrow$ 2). Suppose that 1) holds and $r := d(x_0, W)$. Then $r = r_0$. Since, by Lemma 2, the least element $g_0 = x_0 - r_0\mathbf{1}$ of the set $P_W(x_0)$ is a plus-weak Pareto point of $W$, it follows from Theorem 3 that there exists $l \in X$ such that

$$\varphi(w, l) \le 0 \le \varphi(y, l) \quad \forall\, w \in W,\ y \in B(x_0, r_0).$$

The implication 2) $\Longrightarrow$ 1) follows from Theorem 4. □

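5 Fenchel-Moreau conjugates with respect to $\varphi$

Recall (see [Rub00, Sin87]) that if $V$ and $W$ are sets and $\theta : V \times W \to \overline{\mathbb{R}}$ is a coupling function, then for a function $f : V \to \overline{\mathbb{R}}$ the Fenchel-Moreau conjugate function of $f$ with respect to $\theta$ is the function $f^{c(\theta)} : W \to \overline{\mathbb{R}}$ defined by
$$f^{c(\theta)}(w) := \sup_{v \in V}\{\theta(v, w) - f(v)\} \quad (w \in W). \tag{19}$$
We point out that $(-\infty)^{c(\theta)} = +\infty$ and $(+\infty)^{c(\theta)} = -\infty$.
Also, we recall that the dual of any mapping $u : \overline{\mathbb{R}}^V \to \overline{\mathbb{R}}^W$ is the mapping $u' : \overline{\mathbb{R}}^W \to \overline{\mathbb{R}}^V$ defined by

$$h^{u'} := \inf_{f \in \overline{\mathbb{R}}^V,\ f^u \le h} f \quad (h \in \overline{\mathbb{R}}^W), \tag{20}$$

where for any mapping $u : \overline{\mathbb{R}}^V \to \overline{\mathbb{R}}^W$ and any $f \in \overline{\mathbb{R}}^V$ we write $f^u$ instead of $u(f)$, and for a set $A$, $\overline{\mathbb{R}}^A$ denotes the set of all functions $g : A \to \overline{\mathbb{R}}$.
In the sequel, we define the coupling function $\psi : X \times X \to \mathbb{R}$ by

$$\psi(x, y) := \inf\{\lambda \in \mathbb{R} : x + y \le \lambda\mathbf{1}\} \quad \forall\, x, y \in X. \tag{21}$$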


It follows from (1) that the set $\{\lambda \in \mathbb{R} : x + y \le \lambda\mathbf{1}\}$ is non-empty and bounded from below (by $-\|x + y\|$). Clearly this set is closed. It follows from the definition of $\psi$ that it enjoys the following properties:

$$-\|x + y\| \le \psi(x, y) < +\infty \quad \text{for each } x, y \in X; \tag{22}$$

$$x + y \le \psi(x, y)\mathbf{1} \quad \text{for all } x, y \in X; \tag{23}$$

$$\psi(x, y) = \psi(y, x) \quad \text{for all } x, y \in X; \tag{24}$$

$$\psi(x, -x) = \inf\{\lambda \in \mathbb{R} : 0 = x - x \le \lambda\mathbf{1}\} = 0 \quad \text{for all } x \in X. \tag{25}$$

For each $y \in X$, define the function $\psi_y : X \to \mathbb{R}$ by

$$\psi_y(x) := \psi(x, y) \quad \forall\, x \in X. \tag{26}$$

It is not difficult to show that the function $\psi_y$ is topical and Lipschitz continuous and consequently, $\psi$ is continuous (see Proposition 1 and its corollaries).
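For comparison with $\varphi$, here is a hedged sketch of $\psi$ from (21) in the assumed instance $X = \mathbb{R}^n$, $C = \mathbb{R}^n_+$ with the maximum norm, where $\psi(x,y) = \max_i(x_i + y_i)$. Besides (22) to (25), the check includes the identity $\psi(x,y) = -\varphi(-x,-y)$, which follows directly from definitions (3) and (21) but is an observation added here, not a statement quoted from the text.

```python
import random

def phi(x, y):
    """phi(x, y) = sup{lam : lam*1 <= x + y} = min_i (x_i + y_i) for C = R^n_+."""
    return min(a + b for a, b in zip(x, y))

def psi(x, y):
    """psi(x, y) = inf{lam : x + y <= lam*1} = max_i (x_i + y_i) for C = R^n_+."""
    return max(a + b for a, b in zip(x, y))

def max_norm(x):
    return max(abs(a) for a in x)

def check_psi_properties(trials=1000, n=3, seed=4):
    """Sample-based check of (22)-(25) and of psi(x, y) = -phi(-x, -y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        s = [a + b for a, b in zip(x, y)]
        val = psi(x, y)
        if val < -max_norm(s):                      # (22)
            return False
        if not all(c <= val for c in s):            # (23): x + y <= psi(x, y)*1
            return False
        if val != psi(y, x):                        # (24)
            return False
        if psi(x, [-a for a in x]) != 0:            # (25)
            return False
        if abs(val + phi([-a for a in x], [-b for b in y])) > 1e-12:
            return False
    return True
```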

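Definition 4. Let $W$ be a non-empty subset of $X$ and $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function. We define the plus-polar set of $W$ by

$$W^{\circ} = \{x \in X : \theta(x, w) \le 0,\ \forall\, w \in W\},$$

and the plus-bipolar set of $W$ by

$$W^{\circ\circ} = (W^{\circ})^{\circ}.$$

Clearly, $X^{\circ} = \emptyset$, and by definition, $\emptyset^{\circ} = X$.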


Theorem 6. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$, the following assertions are equivalent:
1) $f$ is topical.
2) We have
$$f^{c(\varphi)}(x) = -f(-x) \quad (x \in X).$$

Proof. 1) $\Longrightarrow$ 2). Assume that $f$ is a topical function. Let $x, y \in X$ be arbitrary. It follows from (5) that $\varphi(x, y)\mathbf{1} \le x + y$, and hence $x \ge \varphi(x, y)\mathbf{1} - y$. Since $f$ is a topical function, we conclude that

$$\varphi(x, y) - f(x) \le -f(-y) \quad (x, y \in X),$$

and so
$$f^{c(\varphi)}(y) = \sup_{x \in X}\{\varphi(x, y) - f(x)\} \le -f(-y) \quad (y \in X). \tag{27}$$

Also, by definition of the Fenchel-Moreau conjugate function of $f$ and (7), we have

$$f^{c(\varphi)}(y) = \sup_{x \in X}\{\varphi(x, y) - f(x)\} \ge \varphi(-y, y) - f(-y) = -f(-y) \quad (y \in X). \tag{28}$$

Hence (27) and (28) imply 2).
2) $\Longrightarrow$ 1). Suppose that 2) holds. Then we have
$$f(x) = -f^{c(\varphi)}(-x) \quad (x \in X).$$

It is not difficult to show that for any function $f : X \to \overline{\mathbb{R}}$, $f^{c(\varphi)}$ is a topical function, and hence we conclude that $f$ is a topical function, which completes the proof. □
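The two inequalities (27) and (28) can be observed on samples. This sketch assumes $X = \mathbb{R}^3$, $C = \mathbb{R}^3_+$, $\varphi(x,y) = \min_i(x_i + y_i)$ and the hypothetical topical function $f(x) = \max_i(x_i + a_i)$; a finite sample can only bound the supremum in (19) from below, so the code compares that lower bound with the value $-f(-y)$ claimed by Theorem 6.

```python
import random

SHIFTS = [0.5, -1.0, 0.0]  # hypothetical constants, for illustration only

def phi(x, y):
    return min(a + b for a, b in zip(x, y))

def f(x):
    """Illustrative topical function on R^3."""
    return max(xi + ai for xi, ai in zip(x, SHIFTS))

def neg(v):
    return [-a for a in v]

def conjugate_lower_bound(y, sample):
    """Max over a finite sample of phi(x, y) - f(x): a lower estimate of f^{c(phi)}(y)."""
    return max(phi(x, y) - f(x) for x in sample)

def claimed_conjugate(y):
    """-f(-y), the value Theorem 6 asserts for a topical f."""
    return -f(neg(y))

def random_points(count, seed):
    rng = random.Random(seed)
    return [[rng.uniform(-5, 5) for _ in SHIFTS] for _ in range(count)]
```

Including the point $x = -y$ in the sample makes the lower bound attain $-f(-y)$, which is exactly the step used in (28).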
The proof of the following theorem is similar to that in the finite dimensional case (see [RS01]).
Theorem 7. Let $f : X \to \overline{\mathbb{R}}$ be a plus-homogeneous function and $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function such that $\theta(\cdot, y)$ ($y \in X$) is a topical function. Then
$$f^{c(\theta)}(y) = \sup_{x \in X,\ f(x) = 0} \theta(x, y) = \sup_{x \in S_0(f)} \theta(x, y) \quad (y \in X).$$

Corollary 5. Let $f : X \to \overline{\mathbb{R}}$ be a plus-homogeneous function and $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function such that $\theta(\cdot, y)$ ($y \in X$) is a topical function. Then
$$S_0(f^{c(\theta)}) = S_0(f)^{\circ}.$$
Proof. The proof follows from Definition 4 and Theorem 7. □
Remark 3. We recall (see [Rub00]) that if $X$ is a set and $\theta : X \times X \to \overline{\mathbb{R}}$ is a coupling function such that

$$\theta(x, y) = \theta(y, x) \quad (x, y \in X),$$

that is, $\theta$ is symmetric, then the Fenchel-Moreau conjugate mapping $c(\theta) : \overline{\mathbb{R}}^X \to \overline{\mathbb{R}}^X$ of (19) is self-dual. That is, $c(\theta) = c(\theta)'$.
We recall (see [Rub00]) that if $V$ and $W$ are sets and $\theta : V \times W \to \overline{\mathbb{R}}$ is a coupling function, then for a function $f : V \to \overline{\mathbb{R}}$ the Fenchel-Moreau biconjugate function of $f$ with respect to $\theta$ is the function $f^{c(\theta)c(\theta)'} : V \to \overline{\mathbb{R}}$ defined by
$$f^{c(\theta)c(\theta)'}(v) := (f^{c(\theta)})^{c(\theta)'}(v) \quad (v \in V).$$
For the proof of the following theorem see [RS01] in the finite dimensional case. The same proof is valid in the case under consideration.
Theorem 8. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$, the following assertions are equivalent:
1) $f$ is topical.
2) We have
Proposition 5. Let $\varphi$ and $\psi$ be the functions defined by (3) and (21), respectively. Let $f : X \to \overline{\mathbb{R}}$ be a plus-homogeneous function. Then the following assertions are true:
1) We have

f<^)^W{x)= sup V^(x,2/)-r^^^"^^^'W (xeX).


yesoif)"

2) We have

f<^)<^y{x)= sup ^(x,y) = r W " ( ^ ) ' ( x ) {xeX).


yesoifr
3) We have

Proof. 1). It is easy to check that $f^{c(\psi)}$ and $f^{c(\varphi)}$ are topical functions. Since $\psi$ and $\varphi$ are symmetric coupling functions, it follows from Remark 3 that $c(\psi) = c(\psi)'$ and $c(\varphi) = c(\varphi)'$. Therefore, by Theorem 7 and Corollary 5, we conclude that

= sup i){x,y)= sup ip{x,y) {x e X),

and

= sup i){x,y)= sup ip{x,y) {x e X),


yeSoif^^^")) xe<So(/)«
which proves 1). The proof of statement 2) is similar to the proof of statement
1).
3). We apply Corollary 5 to the functions /'=('^), /<^('^) and / , it follows that

and

By a similar proof, we have

which completes the proof. D


6 Conjugate of type Lau with respect to ^


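Recall (see [Sin87]) that if $V$ and $W$ are sets and $\Delta : 2^V \to 2^W$ is any duality, then for a function $f : V \to \overline{\mathbb{R}}$ the conjugate of type Lau of $f$ with respect to $\Delta$ is the function $f^{L(\Delta)} : W \to \overline{\mathbb{R}}$ defined by
$$f^{L(\Delta)}(w) := -\inf_{v \in V,\ w \in W \setminus \Delta(\{v\})} f(v) \quad (w \in W). \tag{29}$$
If $\theta : V \times W \to \overline{\mathbb{R}}$ is a coupling function, then for the conjugate of type Lau $f^{L(\Delta_\theta)}$ with respect to the $\theta$-duality $\Delta_\theta : 2^V \to 2^W$ defined by
$$\Delta_\theta(G) := \{w \in W : \theta(g, w) \le 0,\ \forall g \in G\} \quad (G \subseteq V),$$
which will also be called the conjugate of type Lau with respect to $\theta$, and denoted by $f^{L(\theta)}$, we have
$$f^{L(\theta)}(w) = f^{L(\Delta_\theta)}(w) = -\inf_{v \in V,\ \theta(v,w) > 0} f(v) \quad (f : V \to \overline{\mathbb{R}},\ w \in W). \tag{30}$$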

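Remark 4. We recall (see [Sin87]) that if $V$ and $W$ are sets, then for any duality $\Delta : 2^V \to 2^W$ and any function $f : V \to \overline{\mathbb{R}}$, the lower level set $S_\lambda(f^{L(\Delta)})$ $(\lambda \in \mathbb{R})$ has the following form:
$$S_\lambda(f^{L(\Delta)}) = \bigcap_{v \in V,\ f(v) < -\lambda} \Delta(\{v\}).$$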
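The form of the level set in Remark 4 can be checked directly from definition (29). The short derivation below is only a sketch added for the reader's convenience; it uses nothing beyond (29) and is not quoted from [Sin87]. For $\lambda \in \mathbb{R}$ and $w \in W$:

```latex
\begin{align*}
w \in S_\lambda\bigl(f^{L(\Delta)}\bigr)
 &\iff -\inf_{v \in V,\; w \notin \Delta(\{v\})} f(v) \le \lambda \\
 &\iff f(v) \ge -\lambda \ \text{ for all } v \in V \text{ with } w \notin \Delta(\{v\}) \\
 &\iff w \in \Delta(\{v\}) \ \text{ for all } v \in V \text{ with } f(v) < -\lambda \\
 &\iff w \in \bigcap_{v \in V,\; f(v) < -\lambda} \Delta(\{v\}).
\end{align*}
```

When no $v$ satisfies $f(v) < -\lambda$, the intersection is taken over an empty family and equals $W$, which agrees with the first line because the infimum is then at least $-\lambda$.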

Remark 5. Note that since $C$ is a closed convex normal cone in $X$ and $\mathbf{1} \in \operatorname{int} C$, it is not difficult to show that
$$\varphi(x, y) > 0 \iff x + y \in \operatorname{int} C \quad (x, y \in X),$$
where $\varphi$ is the function defined by (3).

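Therefore, for the coupling function $\varphi : X \times X \to \overline{\mathbb{R}}$ defined by (3) and a function $f : X \to \overline{\mathbb{R}}$, it follows from (30) and Remark 5 that
$$f^{L(\varphi)}(y) = -\inf_{x \in X,\ \varphi(x,y) > 0} f(x) = -\inf_{x \in X,\ x + y \in \operatorname{int} C} f(x) \quad (y \in X). \tag{31}$$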
Proposition 6. Let $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function such that $\theta(x, \cdot)$ $(x \in X)$ is an increasing and lower semi-continuous function. Then for any function $f : X \to \overline{\mathbb{R}}$, the conjugate of type Lau $f^{L(\theta)} : X \to \overline{\mathbb{R}}$ is an increasing and lower semi-continuous function.

Proof. Let $y, z \in X$ and $y \le z$. Since $\theta(x, \cdot)$ $(x \in X)$ is an increasing function, it follows that
$$A := \{x \in X : \theta(x, y) > 0\} \subseteq B := \{x \in X : \theta(x, z) > 0\}.$$
This implies, by (30), that
$$f^{L(\theta)}(y) = -\inf_{x \in A} f(x) \le -\inf_{x \in B} f(x) = f^{L(\theta)}(z).$$
Hence, $f^{L(\theta)}$ is an increasing function.
Finally, it follows from Remark 4 that
$$S_\lambda(f^{L(\theta)}) = \bigcap_{x \in X,\ f(x) < -\lambda} \{y \in X : \theta(x, y) \le 0\} = \bigcap_{x \in X,\ f(x) < -\lambda} E_x \quad (\lambda \in \mathbb{R}),$$
where $E_x := \{y \in X : \theta(x, y) \le 0\}$ $(x \in X)$. Since $\theta(x, \cdot)$ $(x \in X)$ is lower semi-continuous, $E_x$ is a closed set in $X$, and hence $S_\lambda(f^{L(\theta)})$ is closed for each $\lambda \in \mathbb{R}$. Thus, $f^{L(\theta)}$ is lower semi-continuous, which completes the proof. □

Lemma 5. Let $\theta : X \times X \to \overline{\mathbb{R}}$ be a coupling function such that $\theta(x, \cdot)$ $(x \in X)$ is an increasing and lower semi-continuous function. Let $f : X \to \overline{\mathbb{R}}$ be any function such that
$$f^{L(\theta)}(x) = -f(-x) \quad (x \in X).$$
Then $f$ is increasing and upper semi-continuous.
Proof. This is an immediate consequence of Proposition 6 and the fact that the function $h(x) := -f(-x)$ $(x \in X)$ is topical whenever $f$ is a topical function. □

Corollary 6. Let $\varphi$ and $\psi$ be the functions defined by (3) and (21), respectively. Let $f : X \to \overline{\mathbb{R}}$ be any function such that
$$f^{L(\varphi)}(x) = -f(-x) \quad (x \in X),$$
or
$$f^{L(\psi)}(x) = -f(-x) \quad (x \in X).$$
Then $f$ is increasing and upper semi-continuous.

Theorem 9. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$, the following assertions are equivalent:
1) We have
$$f^{L(\varphi)}(x) = -f(-x) \quad (x \in X).$$
2) $f$ is increasing and upper semi-continuous.


Proof. The implication 1) $\Longrightarrow$ 2) follows from Corollary 6.
2) $\Longrightarrow$ 1). Assume that 2) holds. By (31) and that $\operatorname{cl}(\operatorname{int} C) = C$, we have
$$f^{L(\varphi)}(x) = -\inf_{y \in X,\ x + y \in \operatorname{int} C} f(y) = -\inf_{y \in X,\ x + y \in C} f(y) \quad (x \in X). \tag{32}$$
Now, let $x \in X$ be fixed and $y \in X$ be such that $x + y \in C$. Then $y \ge -x$. Since $f$ is increasing, we have $f(y) \ge f(-x)$, and so $-f(y) \le -f(-x)$. In view of (32), we get
$$f^{L(\varphi)}(x) = -\inf_{y \in X,\ x + y \in C} f(y) = \sup_{y \in X,\ x + y \in C} (-f(y)) \le -f(-x). \tag{33}$$
We also have that $f$ is upper semi-continuous. It follows that $-f$ is lower semi-continuous, and hence by (32), we obtain
$$f^{L(\varphi)}(x) = -\inf_{y \in X,\ x + y \in C} f(y) = \sup_{y \in X,\ x + y \in C} (-f(y)) = \sup_{y \in X,\ y \ge -x} (-f(y)) \ge -f(-x). \tag{34}$$
Therefore, (33) and (34) imply 1), which completes the proof. □

We recall (see [Sin02, Lemma 3.3]) that if $V$ is a set and $\theta : V \times V \to \overline{\mathbb{R}}$ is a symmetric coupling function, then the conjugate of type Lau $L(\theta) : \overline{\mathbb{R}}^V \to \overline{\mathbb{R}}^V$ of (29) is self-dual, that is, $L(\theta) = L(\theta)'$. Also, if $V$ and $W$ are sets and $\Delta : 2^V \to 2^W$ is any duality, then the biconjugate of type Lau of a function $f : V \to \overline{\mathbb{R}}$ with respect to $\Delta$ is the function $f^{L(\Delta)L(\Delta)'} : V \to \overline{\mathbb{R}}$ defined by $f^{L(\Delta)L(\Delta)'} := (f^{L(\Delta)})^{L(\Delta)'}$ (see [Sin87]). In particular, for the function $\varphi$ defined by (3) and the $\varphi$-duality $\Delta_\varphi : 2^X \to 2^X$ of (30), we have $f^{L(\varphi)L(\varphi)'} = (f^{L(\varphi)})^{L(\varphi)'}$, where $f : X \to \overline{\mathbb{R}}$ is a function.

Theorem 10. Let $\varphi$ be the function defined by (3). Then for a function $f : X \to \overline{\mathbb{R}}$ the following assertions are equivalent:
1) We have

2) f is increasing and lower semi-continuous.

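Proof. 1) $\Longrightarrow$ 2). Suppose that 1) holds. Since $\varphi$ is a symmetric coupling function, we have $L(\varphi) = L(\varphi)'$, and so $f = f^{L(\varphi)L(\varphi)'} = (f^{L(\varphi)})^{L(\varphi)}$. It follows from Proposition 6 with $\theta = \varphi$ that $f$ is increasing and lower semi-continuous.
2) $\Longrightarrow$ 1). Assume that 2) holds. Since $L(\varphi) = L(\varphi)'$, by (31) and that $\operatorname{cl}(\operatorname{int} C) = C$, we have
$$f^{L(\varphi)L(\varphi)'}(y) = (f^{L(\varphi)})^{L(\varphi)}(y) = -\inf_{x \in X,\ x + y \in \operatorname{int} C} f^{L(\varphi)}(x) = \sup_{x \in X,\ x + y \in \operatorname{int} C} (-f^{L(\varphi)}(x))$$
$$= \sup_{x \in X,\ x + y \in C} (-f^{L(\varphi)}(x)) = \sup_{x \in X,\ x \ge -y} (-f^{L(\varphi)}(x)) \quad (y \in X). \tag{35}$$
Now, let $y \in X$ be fixed and $x \in X$ be such that $x \ge -y$. Since by Proposition 6, $f^{L(\varphi)}$ is an increasing function, it follows that $-f^{L(\varphi)}(x) \le -f^{L(\varphi)}(-y)$, and so in view of (35) and (31) and that $\operatorname{cl}(\operatorname{int} C) = C$, we get
$$f^{L(\varphi)L(\varphi)'}(y) \le -f^{L(\varphi)}(-y) = \inf_{x \in X,\ x - y \in \operatorname{int} C} f(x) = \inf_{x \in X,\ x - y \in C} f(x) = \inf_{x \in X,\ x \ge y} f(x) \le f(y) \quad (y \in X). \tag{36}$$
On the other hand, by (35) and (31) and that $f$ is increasing and lower semi-continuous, we obtain
$$f^{L(\varphi)L(\varphi)'}(y) = \sup_{x \in X,\ x \ge -y} (-f^{L(\varphi)}(x)) = \sup_{x \in X,\ x \ge -y}\ \inf_{z \in X,\ z + x \in \operatorname{int} C} f(z) = \sup_{x \in X,\ x \ge -y}\ \inf_{z \in X,\ z + x \in C} f(z)$$
$$= \sup_{x \in X,\ x \ge -y}\ \inf_{z \in X,\ z \ge -x} f(z) \ge \sup_{x \in X,\ x \ge -y} f(-x) \ge f(y) \quad (y \in X). \tag{37}$$
Hence the result follows from (36) and (37), which completes the proof. □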

The proof of the following theorem is similar to that in the finite dimensional case (see [Sin02]).

Theorem 11. Let $\varphi$ be the function defined by (3). Then for any topical function $f : X \to \overline{\mathbb{R}}$, we have

/^(^)(x) = r(^)(x) (xex).

References
[GG98] Gaubert, S., Gunawardena, J.: A non-linear hierarchy for discrete event dynamical systems. Proc. 4th Workshop on Discrete Event Systems, Cagliari. Technical Report HPL-BRIMS-98-20, Hewlett-Packard Labs (1998)
[Gun98] Gunawardena, J.: An introduction to idempotency. Cambridge University Press, Cambridge (1998)
[Gun99] Gunawardena, J.: From max-plus algebra to non-expansive mappings: a non-linear theory for discrete event systems. Theoretical Computer Science, Technical Report HPL-BRIMS-99-07, Hewlett-Packard Labs (1999)
[GK95] Gunawardena, J., Keane, M.: On the existence of cycle times for some non-expansive maps. Technical Report HPL-BRIMS-95-003, Hewlett-Packard Labs (1995)
[MRS02] Martinez-Legaz, J.-E., Rubinov, A.M., Singer, I.: Downward sets and their separation and approximation properties. Journal of Global Optimization, 23, 111-137 (2002)
[MR05] Mohebi, H., Rubinov, A.M.: Best approximation by downward sets with applications. Journal of Analysis in Theory and Applications, (to appear) (2005)
[Rub00] Rubinov, A.M.: Abstract Convexity and Global Optimization. Kluwer Academic Publishers, Boston/Dordrecht/London (2000)
[RS01] Rubinov, A.M., Singer, I.: Topical and sub-topical functions, downward sets and abstract convexity. Optimization, 50, 307-351 (2001)
[Sin74] Singer, I.: The theory of best approximation and functional analysis. Regional Conference Series in Applied Mathematics, 13 (1974)
[Sin87] Singer, I.: Abstract Convex Analysis. Wiley-Interscience, New York (1987)
[Sin02] Singer, I.: Further application of the additive min-type coupling function. Optimization, 51, 471-485 (2002)
Part III

Applications
Dynamical Systems Described by Relational
Elasticities with Applications

Musa Mammadov, Alexander Rubinov, and John Yearwood

CIAO, School of Information Technology and Mathematical Sciences


University of Ballarat
Ballarat, VIC 3353, Australia
m.mammadov@ballarat.edu.au, a.rubinov@ballarat.edu.au,
j.yearwood@ballarat.edu.au

Summary. In this paper we describe a new method for modelling dynamical sys-
tems assuming that the information about the system is presented in the form of
a data set. The main idea is to describe the relationships between two variables as
influences of the changes of one variable on another. The approach introduced was
examined in data classification and global optimization problems.

Key words: Dynamical systems, elasticity, data classification, global optimization.

1 Introduction
In [Mam94] a new approach for mathematical modeling of dynamical systems was introduced. This approach was further developed in [Mam01a]-[MYA04] and has been applied to solving many problems, including data classification and global optimization. This paper gives a systematic survey of this approach.
The approach is based on a non-functional relationship between two variables which describes the influence of the change (increase or decrease) of one variable on the change of the other variable. It can be considered as a certain analog of elasticity used in the literature (see, for example, [Int71]). We shall refer to this relationship between variables as relational elasticity (fuzzy derivative, in [Mam94, Mam01b, MY01]).
In [MM02] the notion of influence (of one state on another state) was defined as a measure of the non-local contribution of a state to the value function at other states. Conditional probability functions were used in this definition, but the idea behind this notion is close to the notion of influence used in [Mam94]. The calculations undertaken ([Mam01a, MY01]) have shown that the definition of influence from [Mam94] provides better results than one based on conditional probability.
As mentioned in [MM02], the notion of influence is also closely related to dual variables (or shadow prices in economics) for some problems (see, for example, [Gor99]).
We now describe some situations where the notion of relational elasticity can be applied. Classical mathematical analysis, which is based on the notion of functional dependence, is suitable for examining many situations where the influence of one variable on another can be explicitly described. The theory of probabilities is used in situations where such a dependence is not clear. However, this theory does not cover many real-world situations. Indeed, probability can be used for examining situations which repeat (or can be repeated) many times. Attempts to use probability theory in uncertain situations which cannot be repeated many times may lead to great errors.
We consider here only real-valued variables (some generalizations to vector-valued variables are also possible; however, we do not consider them in the current paper). One of the main properties of a real-valued variable is monotonicity. We define the notion of influence by the increase or decrease of one variable on the increase or decrease of the other. We can consider the change of a variable as a result of the activity of some unknown forces. In many instances our approach can be used for finding the resulting state without an explicit description of the forces. Although the forces are unknown, this approach allows us to predict their action and, as a result, to predict the behavior of the system and/or give a correct forecast. In this paper we attempt to give some description of the forces acting on the system through the influences between variables and to describe dynamical systems generated by these forces.
The suggested approach to describing relationships between variables has been successfully applied to data classification problems (see [Mam01a]-[MY01], and references therein). In this paper we will concentrate only on some applications of dynamical systems generated by this approach, and on trajectories of these systems.
In Section 5, we examine the dynamical systems approach to data classification by introducing a simple classification algorithm. Using dynamical system ideas (trajectories) makes the results obtained by such a simple algorithm comparable with the results obtained by other algorithms designed for the purpose of data classification. The main idea behind this algorithm is close to some methods used in Nonlinear Support Vector Machines (see, for example, [Bur98]), where the domain is mapped to another space using some nonlinear (mainly, quadratic) mappings. In our case the transformation of the domain is made using the forces acting at each point of the domain.
The main application of this dynamical systems approach is to global optimization problems. In Section 6, we describe a global optimization algorithm based on this approach. The algorithm uses a new global search mechanism based on dynamical systems generated by the given objective function. The results obtained for many test examples and some difficult practical problems ([Mam04, MYA04]) have shown the efficiency of this global search mechanism.

2 Relationship between two variables: relational elasticity
Let us consider two objects and assume that the states of these objects can be described by the scalar variables $x$ and $y$. Increases and decreases of these variables indicate changes in the objects. The relationship between $x$ and $y$ will be defined by changes in both directions: increase and decrease.
We define the influence of $y$ on $x$ as follows. Consider, for instance, the event that $y$ increases. As a result of this event $x$ may either increase or decrease. To determine the influence we have to define the degree of these events. So we need to have the following expressions:
1) the degree of the increase of $x$ when $y$ increases;
2) the degree of the decrease of $x$ when $y$ increases.
Obviously, the expressions increase and decrease should be precisely defined in applications. For example, if we say that $y$ increases then we should determine: a) by how much? and b) during what time? These factors mainly depend on the problem under consideration and the nature of the variables. For example, if we consider an economic system and $y$ stands for the National Product, then we can take one year (or a month, etc.) as the time interval, and for the increase we can take the relative increase of $y$. In some applications we do not need to determine the time. We denote the events "$y$ increases" and "$y$ decreases" by $y\uparrow$ and $y\downarrow$, respectively.
The key point in expressions 1) and 2) is the degree. Of course the degree of these events depends on the initial state (point) $(x, y)$. For example, we can describe it by fuzzy sets on the plane $(x, y)$; that is, at every initial point $(x, y)$ the degree can be defined as a number in the interval $[0, 1]$. In general, we will assume that the degree is a function of $(x, y)$ with non-negative values.
We denote the degrees corresponding to 1) and 2) by $d(y\uparrow x\uparrow)$ and $d(y\uparrow x\downarrow)$, or by $\xi_1(x, y)$ and $\xi_2(x, y)$, respectively. We assume that the case $\xi_i(x, y) = 0$ corresponds to the lowest influence.
Similarly we can define the degree of decrease and increase of $x$ when $y$ decreases. They will be described by functions $\xi_3(x, y)$ and $\xi_4(x, y)$: $\xi_3 = d(y\downarrow x\downarrow)$, $\xi_4 = d(y\downarrow x\uparrow)$.
Note that in applications the functions $\xi_i(x, y)$ can be computed in quite different ways. For example, assume that there is a functional relation $y = f(x)$ and the directional derivative $f'_{\uparrow}(x)$ exists at the point $x$. In this case, we can define $\xi_1(x, y) = f'_{\uparrow}(x)$ and $\xi_2(x, y) = 0$ if $f'_{\uparrow}(x) > 0$, and $\xi_1(x, y) = 0$ and $\xi_2(x, y) = -f'_{\uparrow}(x)$ if $f'_{\uparrow}(x) < 0$. However, if the relation between variables is presented in the form of some finite set of observations (for example, in terms of applications to global optimization, it might be a set of some local minimum points found so far) we need to develop special techniques for computing the functions $\xi_i(x, y)$ (see Section 3).
Therefore, the functions $\xi_i$, $i = 1, 2, 3, 4$, completely describe the influence of the variable $y$ on $x$ in terms of changes. We will call it the relational elasticity between the two variables and denote it by $dx/dy$.
Let $\xi(x, y) = (\xi_1(x, y), \xi_2(x, y), \xi_3(x, y), \xi_4(x, y))$. So we have $dx/dy = \xi(x, y)$, where $\xi_1(x, y)$, $\xi_2(x, y)$, $\xi_3(x, y)$ and $\xi_4(x, y)$ are non-negative valued functions.
By analogy we define $dy/dx$ as an influence of $x$ on $y$. Let $dy/dx = \eta(x, y)$, where $\eta = (\eta_1, \eta_2, \eta_3, \eta_4)$, and $\eta_1 = d(x\uparrow y\uparrow)$, $\eta_2 = d(x\uparrow y\downarrow)$, $\eta_3 = d(x\downarrow y\downarrow)$, $\eta_4 = d(x\downarrow y\uparrow)$.
Thus, the relationship between variables $x$ and $y$ will be described in the following form:
$$dx/dy = \xi(x, y), \quad dy/dx = \eta(x, y). \tag{1}$$
The examples of relationships presented below show that the system (1) covers quite a large range of relations, including those that cannot be described by functions (or even set-valued mappings).
1. A homotone relationship. Assume that $\xi_1(x, y) \gg \xi_2(x, y)$, $\xi_3(x, y) \gg \xi_4(x, y)$ and $\eta_1(x, y) \gg \eta_2(x, y)$, $\eta_3(x, y) \gg \eta_4(x, y)$.
This case can be considered as a homotone relationship, because the influence of the increase (or decrease) of one variable on another is mainly directed in the same direction: increase (or decrease).
2. An antitone relationship. Assume that $\xi_1(x, y) \ll \xi_2(x, y)$, $\xi_3(x, y) \ll \xi_4(x, y)$ and $\eta_1(x, y) \ll \eta_2(x, y)$, $\eta_3(x, y) \ll \eta_4(x, y)$.
This case can be considered as an antitone relationship, because the influence of the increase (or decrease) of one variable on another is mainly directed in the inverse direction: decrease (or increase).
3. Assume that the influence of $y$ on $x$ is such that $dx/dy = (a, a, a, a)$, where $a > 0$. In this case the variable $x$ may increase or decrease with the same degree and these changes do not depend on $y$. We can say that the influence of $y$ on $x$ is quite indefinite.
4. Let $dx/dy = (a, 0, 0, a)$, $(a > 0)$. In contrast to case 3, in this case the influence of $y$ on $x$ is quite definite; every change in $y$ increases $x$.
5. Let $dx/dy = (a, 0, b, 0)$, where $a, b > 0$ and $a \gg b$. This is a special case (known as hysteresis) of the homotone relationship considered above, where $x$ increases strongly as $y$ increases, and decreases less strongly when $y$ decreases. If such a relationship is valid at all points $(x, y)$ then the dependence between these variables cannot be described by mappings like $y = y(x)$ or $x = x(y)$.
More complicated relationships arise when all the components in $dx/dy$ are non-zero. This is the case that we have when dealing with real problems where the information about the systems is given in the form of some datasets.

3 Some examples for calculating relational elasticities

In this section we give some examples to demonstrate the calculation of relational elasticities. Note that we can suggest quite different methods according to the problem under consideration. In this paper, we examine the introduced notions in the context of global optimization and data classification problems. Accordingly, we consider the case when the relationship between variables $x$ and $y$ is given in the form of a dataset and we present some formulae to calculate relational elasticities which will be used in the applications below.

M.1. Consider data $\{(x^m, y^m),\ m = 1, \dots, M\}$. To calculate relational elasticities we first have to define the events "increase" and "decrease". Here we suggest two techniques. For the sake of definiteness, we consider only the variable $x$.
Let $x^0$ be the initial point.
Global approach. If $x^m > x^0$ ($x^m < x^0$, respectively) we say that for the observation $m$ the variable $x$ increases (decreases, respectively).
Remark 3.1. In some cases it might be useful to define the increase (decrease, respectively) of $x$ for the observation $m$ by $x^m > x^0 + \delta$ ($x^m < x^0 - \delta$, respectively), where $\delta > 0$.
Local approach. Take any number $\varepsilon > 0$. If $x^m \in (x^0, x^0 + \varepsilon)$ ($x^m \in (x^0 - \varepsilon, x^0)$, respectively) we say that for the observation $m$ the variable $x$ increases (decreases, respectively).
Note that in the second case we follow the notion of the derivative in classical mathematics as a local notion.
Now we give two methods, related to the global and local approaches, for calculating a relational elasticity $dy/dx = (\eta_1, \eta_2, \eta_3, \eta_4)$ at the initial point $(x^0, y^0)$. We set
$$\eta_1 = M_{11}/(M_1 + 1), \quad \eta_2 = M_{12}/(M_1 + 1), \quad \eta_3 = M_{13}/(M_2 + 1), \quad \eta_4 = M_{14}/(M_2 + 1). \tag{2}$$
For the global approach the numbers $M_1, M_{11}, M_{12}, M_2, M_{13}, M_{14}$ stand for the number of points $(x^m, y^m)$ satisfying, respectively: $x^m > x^0$; $x^m > x^0$ and $y^m > y^0$; $x^m > x^0$ and $y^m < y^0$; $x^m < x^0$; $x^m < x^0$ and $y^m < y^0$; $x^m < x^0$ and $y^m > y^0$. In the local approach we use $x^m \in (x^0, x^0 + \varepsilon)$ and $x^m \in (x^0 - \varepsilon, x^0)$ instead of $x^m > x^0$ and $x^m < x^0$.
Note that according to Remark 3.1 we could define the changes of the variable $y$ by taking any small number $\delta > 0$. For instance, we could take $y^m > y^0 + \delta$ instead of $y^m > y^0$.
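The counting rule (2) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `relational_elasticity` and the convention that `eps=None` selects the global approach are assumptions made here, and the sample points are the data listed in Table 1 of Example 1.

```python
def relational_elasticity(data, x0, y0, eps=None):
    """Return (eta1, eta2, eta3, eta4) = dy/dx at (x0, y0) by counting, as in (2).

    eps=None selects the global approach; eps > 0 selects the local approach
    (both conventions are assumptions of this sketch).
    """
    if eps is None:
        up = [(x, y) for x, y in data if x > x0]              # x increases
        down = [(x, y) for x, y in data if x < x0]            # x decreases
    else:
        up = [(x, y) for x, y in data if x0 < x < x0 + eps]
        down = [(x, y) for x, y in data if x0 - eps < x < x0]
    m1, m2 = len(up), len(down)
    m11 = sum(1 for _, y in up if y > y0)     # x up,   y up
    m12 = sum(1 for _, y in up if y < y0)     # x up,   y down
    m13 = sum(1 for _, y in down if y < y0)   # x down, y down
    m14 = sum(1 for _, y in down if y > y0)   # x down, y up
    return (m11 / (m1 + 1), m12 / (m1 + 1), m13 / (m2 + 1), m14 / (m2 + 1))


# Data of Table 1 (Example 1), used here only as sample input.
xs = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
ys = [2, 4, 3, 4, 5, 3, 4, 2, 4, 2, 5, 3, 4, 4]
table1 = list(zip(xs, ys))

eta_global = relational_elasticity(table1, 3, 3)            # (4/8, 2/8, 1/6, 3/6)
eta_local = relational_elasticity(table1, 3, 3, eps=1.1)    # (1/3, 1/3, 0/4, 2/4)
```

The elasticity $dx/dy$ is obtained in the same way by swapping the roles of the two coordinates in `data`.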

M.2. Now we present a method for calculating relational elasticities which will be used in the applications to global optimization.
Consider an objective function $f(x) : \mathbb{R}^n \to \mathbb{R}$ and assume that the values of the function have been calculated at some points; that is, $f^m = f(x_1^m, x_2^m, \dots, x_n^m)$, $m = 1, \dots, M$. Therefore, we have data $A = \{(x_1^m, x_2^m, \dots, x_n^m, f^m) : m = 1, \dots, M\}$. We can refer to these points as "local" minimum points found so far. Let $x^0 = (x_1^0, x_2^0, \dots, x_n^0)$ be the "best" point among these; that is, $f^0 = f(x^0) \le f^m$ for all $m$.
We will consider the relations between $f$ and each particular variable, say $x_i$, at the initial point $x^0$. Clearly, in data $A$ the event $f\downarrow$ (that is, $f$ decreases) will not occur. Therefore, we set $d(x_i\uparrow f\downarrow) = 0$, $d(x_i\downarrow f\downarrow) = 0$, $d(f\downarrow x_i\uparrow) = 0$, $d(f\downarrow x_i\downarrow) = 0$. We need to calculate the values $d(x_i\uparrow f\uparrow)$, $d(x_i\downarrow f\uparrow)$, $d(f\uparrow x_i\uparrow)$, and $d(f\uparrow x_i\downarrow)$.
We denote by $\|\cdot\|$ the Euclidean distance and let $\Delta x_i^m = x_i^m - x_i^0$, $\Delta f^m = f(x^m) - f(x^0)$, $m = 1, \dots, M$. Then we set:
1 Afm 1 Afm

where $I^{+} = \{m : \Delta x_i^m > 0\}$; $I^{++} = \{m : \Delta x_i^m > 0,\ \Delta f^m > 0\}$; $I^{-} = \{m : \Delta x_i^m < 0\}$; $I^{-+} = \{m : \Delta x_i^m < 0,\ \Delta f^m > 0\}$; $I_f^{+} = \{m : \Delta f^m > 0\}$; $I_f^{++} = \{m : \Delta f^m > 0,\ \Delta x_i^m > 0\}$; $I_f^{+-} = \{m : \Delta f^m > 0,\ \Delta x_i^m < 0\}$.
The coefficients $\alpha_i^m = (|\Delta x_i^m| / \|x^m - x^0\|)^2$ are used to indicate the contribution of the coordinate $i$ to the change $\|x^m - x^0\|$. Clearly, $\alpha_1^m + \dots + \alpha_n^m = 1$ for all $m$.
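As a small check of the last remark, the coefficients $\alpha_i^m$ can be computed as follows. This sketch covers only the coefficients, not the elasticity formulas themselves; the function name `contribution_coefficients` is hypothetical.

```python
def contribution_coefficients(xm, x0):
    """Return [alpha_1, ..., alpha_n] with alpha_i = (|dx_i| / ||xm - x0||)^2."""
    diffs = [a - b for a, b in zip(xm, x0)]
    norm_sq = sum(d * d for d in diffs)      # squared Euclidean distance
    if norm_sq == 0:
        raise ValueError("xm must differ from x0")
    return [d * d / norm_sq for d in diffs]


alphas = contribution_coefficients((3.0, 4.0), (0.0, 0.0))   # 9/25 and 16/25
```

Because the Euclidean norm is used, the coefficients are the squared direction cosines of $x^m - x^0$, which is why they always sum to one.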

4 Dynamical systems
In this section we present some notions introduced in [Mam94] which have been used for studying the changes in the system.
Consider a system which consists of two variables $x$ and $y$, and assume that at every point $(x, y)$ the relationship between them is presented by relational elasticities (1); that is:
$$dx/dy = \xi(x, y), \quad dy/dx = \eta(x, y).$$
In this case we say that a Dynamical System is given. Here we study the changes of these variables using only the information obtained from relational elasticities. In this way the notion of forces introduced below will play an important role.
Definition 1. At a given point $(x, y)$, the quantities $F(x\uparrow) = \eta_1\xi_1 + \eta_2\xi_4$ and $F(x\downarrow) = \eta_3\xi_3 + \eta_4\xi_2$ are called the forces acting from $y$ on the increase and decrease of $x$, respectively; the quantity $F(x) = F(x\uparrow) + F(x\downarrow)$ is called the force acting from $y$ on $x$. By analogy, the forces $F(y)$, $F(y\uparrow)$, $F(y\downarrow)$ acting from $x$ on $y$ are defined: $F(y) = F(y\uparrow) + F(y\downarrow)$, $F(y\uparrow) = \xi_1\eta_1 + \xi_2\eta_4$, $F(y\downarrow) = \xi_3\eta_3 + \xi_4\eta_2$.

The main sense of this definition, for example for $F(x\uparrow)$, becomes clear from the expression
$$F(x\uparrow) = d(x\uparrow y\uparrow)\, d(y\uparrow x\uparrow) + d(x\uparrow y\downarrow)\, d(y\downarrow x\uparrow).$$
From Definition 1 we obtain
Proposition 1. At every point $(x, y)$ the forces $F(x)$ and $F(y)$ are equal: $F(x) = F(y)$.

This proposition states that the size of the force on $x$ equals the size of the force on $y$. It can be considered as a generalization of Newton's Third Law of Motion. To explain this statement, and also the reasonableness of Definition 1, we consider one example from Mechanics.
Assume that there are two particles, placed on a line, and $x$, $y$ are their coordinates. Let $x < y$. Then, in terms of gravitational influences, we would have
$$dx/dy = (\xi_1, 0, 0, \xi_4), \quad dy/dx = (0, \eta_2, \eta_3, 0),$$
where $\xi_1, \xi_4, \eta_2, \eta_3 > 0$. Then, from Definition 1 it follows that
$$F(x\downarrow) = 0, \quad F(y\uparrow) = 0, \quad \text{and} \quad F(x\uparrow) = F(y\downarrow) = \eta_2\xi_4 = d(x\uparrow y\downarrow)\, d(y\downarrow x\uparrow).$$
This is Newton's Third Law of Motion.
Now, we assume that the influences $d(x\uparrow y\downarrow)$ and $d(y\downarrow x\uparrow)$ are proportional to the masses $m_1$ and $m_2$ of the particles, and are inversely proportional to the distance $r = |x - y|$ between them; that is, $d(x\uparrow y\downarrow) = C_1 m_1 / r$ and $d(y\downarrow x\uparrow) = C_2 m_2 / r$. Then, from Definition 1 we have
$$F(x\uparrow) = F(y\downarrow) = C_1 C_2 \frac{m_1 m_2}{r^2}.$$
This is consistent with Newton's Law of Gravity.
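Definition 1 and the two-particle example can be checked numerically. The sketch below is illustrative only: the function name `forces`, the dictionary keys, and the particular masses, constants and values of $\xi_1$ and $\eta_3$ are assumptions chosen for the demonstration.

```python
def forces(xi, eta):
    """Forces of Definition 1 from xi = dx/dy and eta = dy/dx (4-tuples)."""
    xi1, xi2, xi3, xi4 = xi
    eta1, eta2, eta3, eta4 = eta
    return {
        "x_up": eta1 * xi1 + eta2 * xi4,     # F(x up)
        "x_down": eta3 * xi3 + eta4 * xi2,   # F(x down)
        "y_up": xi1 * eta1 + xi2 * eta4,     # F(y up)
        "y_down": xi3 * eta3 + xi4 * eta2,   # F(y down)
    }


# Two particles with x < y (illustrative numbers): masses 2 and 3, distance 4,
# constants C1 = C2 = 1, so d(x up, y down) = 2/4 and d(y down, x up) = 3/4.
m1, m2, r = 2.0, 3.0, 4.0
xi = (0.3, 0.0, 0.0, m2 / r)     # dx/dy = (xi1, 0, 0, xi4)
eta = (0.0, m1 / r, 0.7, 0.0)    # dy/dx = (0, eta2, eta3, 0)

f = forces(xi, eta)
total_x = f["x_up"] + f["x_down"]   # F(x)
total_y = f["y_up"] + f["y_down"]   # F(y), equal to F(x) by Proposition 1
```

Here `f["x_up"]` and `f["y_down"]` both equal $m_1 m_2 / r^2 = 0.375$, while the other two forces vanish, as in the text.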


The main characteristic of non-mechanical systems is that all the values $F(x\uparrow)$, $F(x\downarrow)$, $F(y\uparrow)$ and $F(y\downarrow)$ may be non-zero. This might be, in particular, a result of outside influences (say, some other variable $z$ has an influence on $x$ and $y$). This is the main factor that complicates the description (modelling) of relationships between variables and makes it difficult to study the changes in the system.
Let the inequality $F(x\uparrow) > F(x\downarrow)$ hold at the point $(x, y)$. In this case we can say that there are superfluous forces acting for the increase of the variable $x$. If $F(x\uparrow) = F(x\downarrow)$ then these forces are balanced. So we can introduce

Definition 2. The point $(x, y)$ is called a stationary point if
$$F(x\uparrow) = F(x\downarrow), \quad F(y\uparrow) = F(y\downarrow);$$
and an absolutely stationary point if
$$F(x\uparrow) = F(x\downarrow) = F(y\uparrow) = F(y\downarrow) = 0.$$

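Proposition 2. Assume that relational elasticities at the point $(x, y)$ are calculated such that
$$\xi_1 + \xi_2 = \xi_3 + \xi_4 = 1, \tag{3}$$
$$\eta_1 + \eta_2 = \eta_3 + \eta_4 = 1. \tag{4}$$
Then the point $(x, y)$ is an absolutely stationary point if and only if one of the following conditions holds:
$$\text{I.} \quad dx/dy = (1, 0, 1, 0), \quad dy/dx = (0, 1, 0, 1); \tag{5}$$
$$\text{II.} \quad dx/dy = (0, 1, 0, 1), \quad dy/dx = (1, 0, 1, 0). \tag{6}$$
Proof. From the conditions $F(x\uparrow) = F(x\downarrow) = F(y\uparrow) = F(y\downarrow) = 0$ we have
$$\eta_1\xi_1 = 0, \tag{7}$$
$$\eta_3\xi_3 = 0, \tag{8}$$
$$\eta_2\xi_4 = 0, \tag{9}$$
$$\eta_4\xi_2 = 0. \tag{10}$$
Consider two cases.
1). Let $\xi_1 = 0$. In this case we have
$$\xi_1 = 0 \overset{(3)}{\Longrightarrow} \xi_2 = 1 \overset{(10)}{\Longrightarrow} \eta_4 = 0 \overset{(4)}{\Longrightarrow} \eta_3 = 1 \overset{(8)}{\Longrightarrow} \xi_3 = 0 \overset{(3)}{\Longrightarrow} \xi_4 = 1 \overset{(9)}{\Longrightarrow} \eta_2 = 0 \overset{(4)}{\Longrightarrow} \eta_1 = 1 \Longrightarrow (6).$$
2). Let $\eta_1 = 0$. In this case we have
$$\eta_1 = 0 \overset{(4)}{\Longrightarrow} \eta_2 = 1 \overset{(9)}{\Longrightarrow} \xi_4 = 0 \overset{(3)}{\Longrightarrow} \xi_3 = 1 \overset{(8)}{\Longrightarrow} \eta_3 = 0 \overset{(4)}{\Longrightarrow} \eta_4 = 1 \overset{(10)}{\Longrightarrow} \xi_2 = 0 \overset{(3)}{\Longrightarrow} \xi_1 = 1 \Longrightarrow (5).$$
□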

This proposition shows that if $(x, y)$ is an absolutely stationary point then the influences of $x$ on $y$ and of $y$ on $x$ are inverse. In this case, the state $(x, y)$ cannot be changed without outside forces (say, the change can be generated as an influence of some other variable $z$ on $x$ and $y$).
It is not difficult to prove the following propositions.

Proposition 3. Assume that at the point $(x, y)$ there is a homotone relationship between $x$ and $y$ and
$$dx/dy = (\xi_1(x, y), 0, \xi_3(x, y), 0), \quad dy/dx = (\eta_1(x, y), 0, \eta_3(x, y), 0).$$
If $\xi_1(x, y)\eta_1(x, y) = \xi_3(x, y)\eta_3(x, y)$ then $(x, y)$ is a stationary point.

Proposition 4. Assume that at the point $(x, y)$ there is an antitone relationship between $x$ and $y$ and
$$dx/dy = (0, \xi_2(x, y), 0, \xi_4(x, y)), \quad dy/dx = (0, \eta_2(x, y), 0, \eta_4(x, y)).$$
If $\xi_2(x, y)\eta_4(x, y) = \xi_4(x, y)\eta_2(x, y)$ then $(x, y)$ is a stationary point.

In this case we can say that there are no internal forces creating changes in the system. Changes in the system may arise only as a result of outside forces.

4.1 Trajectories of the system (1)

In this section we study trajectories of the system (1). We define a trajectory $(x_t, y_t)$, $(t = 0, 1, 2, \dots)$, of the system (1) using the notion of forces acting between $x$ and $y$. At every point $(x, y)$ the forces $F(x\uparrow)$, $F(x\downarrow)$, $F(y\uparrow)$, $F(y\downarrow)$ are defined as in Definition 1.
Different methods can be used for the calculation of trajectories. We present here two methods which will be used in the applications below.
Consider a variable $\zeta$ and let $\Delta\zeta(t) = F(\zeta(t)\uparrow) - F(\zeta(t)\downarrow)$. In the first method we define a trajectory as follows:
$$\zeta(t + 1) = \zeta(t) + \alpha \cdot \operatorname{Sign}(\Delta\zeta(t)), \tag{11}$$
where
$$\operatorname{Sign}(a) = \begin{cases} 1 & \text{if } a > 0; \\ 0 & \text{if } a = 0; \\ -1 & \text{if } a < 0. \end{cases}$$
In the second method we set
$$\zeta(t + 1) = \zeta(t) + \alpha \cdot \Delta\zeta(t). \tag{12}$$
The difference between these formulae is that in (12) the variables are changed with different steps along the direction $\Delta\zeta(t)$, whilst in (11) all the variables are changed with the same step $\alpha > 0$.
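The two update rules (11) and (12) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the names `next_state` and `trajectory` are hypothetical, and the net force $\Delta\zeta$ is supplied by the caller as a function `net_force`, since in the paper it is computed from relational elasticities. The toy force used in the demonstration simply pulls each coordinate towards 4 and is not derived from any data set.

```python
def sign(a):
    """Sign(a) as defined after (11): 1, 0 or -1."""
    return (a > 0) - (a < 0)


def next_state(z, delta, alpha, rule="sign"):
    """One step of (11) (rule='sign') or (12) (rule='proportional')."""
    if rule == "sign":
        return tuple(c + alpha * sign(d) for c, d in zip(z, delta))
    if rule == "proportional":
        return tuple(c + alpha * d for c, d in zip(z, delta))
    raise ValueError("unknown rule: " + rule)


def trajectory(z0, net_force, alpha, steps, rule="sign"):
    """Iterate the chosen rule; net_force(z) returns F(up) - F(down) per coordinate."""
    path = [tuple(z0)]
    for _ in range(steps):
        current = path[-1]
        path.append(next_state(current, net_force(current), alpha, rule))
    return path


def toy_force(z):
    # Hypothetical net force: every coordinate is attracted to the value 4.
    return tuple(4.0 - c for c in z)


path_sign = trajectory((2.0, 2.0), toy_force, alpha=1.0, steps=3, rule="sign")
path_prop = trajectory((2.0, 2.0), toy_force, alpha=0.5, steps=2, rule="proportional")
```

With rule (11) every coordinate moves by exactly $\alpha$ per step until the net force vanishes, whereas with rule (12) the step shrinks as the net force becomes smaller.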
Consider an example.
Example 1. Consider a domain $D = \{(a, b) : a \in [0, 10],\ b \in [1, 10]\}$. Assume that the field of forces in the domain $D$ is defined by the data $\{(x, y)\}$ presented in Table 1. Using this data, we can calculate the forces acting at each point $(x, y) \in D$ and then we can calculate trajectories of system (1).

Table 1. Data used in Example 1

x | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7
y | 2 | 4 | 3 | 4 | 5 | 3 | 4 | 2 | 4 | 2 | 5 | 3 | 4 | 4

First we calculate the values of the relational elasticities $dy/dx$ and $dx/dy$ by the local approach taking $\varepsilon = 1.1$ (see Section 3). Then, we generate trajectories taking $\alpha = (0.5)^k$ and different initial points. We consider two cases: $k = 0$ and $k \ge 1$.

1. Let $k = 0$. Consider a trajectory $(x(t), y(t))$ starting from the initial point $(x(0), y(0)) = (2, 2)$. We have $(x(1), y(1)) = (3, 3)$ and $(x(2m), y(2m)) = (4, 4)$, $(x(2m + 1), y(2m + 1)) = (5, 3)$ for $m \ge 1$. Therefore, the set $P_1 = \{(4, 4), (5, 3)\}$ is a limit cycle of the trajectory $(x(t), y(t))$. Now consider other trajectories starting from different initial points $(a, b)$, $a, b \in \{0, 1, \dots, 10\}$. Each trajectory has one of the following three limit cycles: $P_1$, $P_2 = \{(2, 3), (3, 4)\}$, $P_3 = \{(5, 4), (4, 3)\}$. Thus, the domain $D$ is divided into 3 parts so that all trajectories starting from one of these parts have the same limit cycle.
2. Let $k \ge 1$. In this case we observe that each trajectory has one of the following limit cycles:
$$P_1^k = \{(4, 4),\ (4 - (0.5)^k,\ 4 - (0.5)^k)\}, \quad P_2^k = \{(3, 4),\ (3 - (0.5)^k,\ 4 - (0.5)^k)\}.$$
Therefore, in this case, the domain $D$ can be divided into two parts, as can the data presented in Table 1. Clearly, if $k \to \infty$ then $P_1^k \to \{(4, 4)\}$, $P_2^k \to \{(3, 4)\}$ in the Hausdorff metric.
We observe that there are three sets $P_1, P_2, P_3$ for $k = 0$ and two sets $P_1^k, P_2^k$ for $k \ge 1$, which are the limit cycles for all trajectories. This means that the turnpike property is true for this example (see [MR73]). Thus, the idea of describing dynamical systems in the form of (1) and the study of trajectories of this system can be used in different problems. In the next section, we check this approach in data classification problems. As a domain $D$ we take the heart disease and liver disorder databases.

5 Classification Algorithm based on a dynamical systems approach
Consider a database $A \subset \mathbb{R}^n$ which consists of two classes: $A^1$ and $A^2$. We denote by $J = \{1, 2, \dots, n\}$ the set of features.
The first stage of the data classification is the scaling phase. In this phase the data is considered to be measured on an $m$ level scale. We did not use any of the known methods (for example, [DKS95]) for discretizing continuous attributes. Here we treat all the attributes uniformly, i.e. we simply consider $m$ levels for each attribute. Intervals related to these levels are defined only by using the training set and, therefore, the scaled values of the features of an observation depend on the training set.

Scaling. Take any number $m \in \{1, 2, \dots\}$. First, for every feature $j \in J$ we calculate the maximum and minimum quantities among all points of the set $A = A^1 \cup A^2$. Let $a_j^2$ and $a_j^1$ be the maximum and minimum quantities, respectively. Then any observation $x = (x_1, \dots, x_n)$ is transformed into $y = (y_1, \dots, y_n)$ by the formula
$$y_j = \begin{cases} 1 & \text{if } x_j < a_j^1; \\ p & \text{if } x_j \in [a_j^1 + \alpha_j (p - 1),\ a_j^1 + \alpha_j p),\ p = 1, \dots, m; \\ m & \text{if } x_j \ge a_j^2, \end{cases}$$
where $\alpha_j = (a_j^2 - a_j^1)/m$.
As a result, all the observations $x = (x_1, \dots, x_n)$ are transformed into vectors $y = (y_1, \dots, y_n)$ with coordinates $y_j \in \{1, 2, 3, \dots, m\}$. Every new observation (test example) will also be scaled by this formula. After this scaling the database $A$ is transformed into a scaled set consisting of two scaled classes, which are the transformations of the classes $A^1$ and $A^2$, respectively.
This scaling is not linear and so the structure of the sets can be essentially changed after this scaling. This is why we use different numbers $m$ in the classification. Note that for small numbers $m$ the scaled classes may not be disjoint even if the original classes are disjoint. The minimal number $m$ for which the scaled classes are disjoint is $m = 19$ for the liver-disorder database and $m = 4$ for the heart disease database.
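The scaling formula can be sketched in Python as below. This is a minimal illustration under assumptions: the function names `fit_ranges`, `scale_value` and `scale` are hypothetical, and the small training set is invented for the demonstration.

```python
def fit_ranges(training):
    """Per-feature minima and maxima computed from the training observations only."""
    columns = list(zip(*training))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_value(x, lo, hi, m):
    """Map one feature value to a level in {1, ..., m} as in the scaling formula."""
    if x < lo:
        return 1
    if x >= hi:
        return m
    width = (hi - lo) / m                  # the step alpha_j of the formula
    level = int((x - lo) // width) + 1     # p with x in [lo + width*(p-1), lo + width*p)
    return min(level, m)                   # guard against floating-point round-off


def scale(observation, lows, highs, m):
    """Scale every coordinate of an observation (training or test)."""
    return tuple(scale_value(x, lo, hi, m)
                 for x, lo, hi in zip(observation, lows, highs))


# Invented training set with two features.
training = [(0.0, 10.0), (10.0, 30.0), (5.0, 20.0)]
lows, highs = fit_ranges(training)
scaled_mid = scale((5.0, 20.0), lows, highs, m=5)
scaled_out = scale((-3.0, 99.0), lows, highs, m=5)   # test point outside the training range
```

Test points outside the training range are clipped to level 1 or level $m$, which is how the formula handles values below the minimum or at or above the maximum.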
Therefore, we have a scaled (with m subdivisions) database A C R^ which
consists of two classes A^ and ^^. By a* = (a^, a2,..., a^) (i = 1,2) we denote
the centroid of the class A\ Let x*^ be a test point.
For classification we use a very simple method which consists of two ordered
rules. The point $x^{ts}$ is predicted to belong to the class $\mathcal{A}^i$ if:
First Rule: $\mathcal{R}_1(x^{ts}) = \mathcal{A}^i$; that is, $x^{ts} \approx a$ for some $a \in \mathcal{A}^i$ and $x^{ts} \not\approx b$
for all $b \in \mathcal{A}^j$, $j \ne i$; otherwise go to the second rule.
Second Rule: $\mathcal{R}_2(x^{ts}) = \mathcal{A}^i$; that is, $\|x^{ts} - a^i\| < \|x^{ts} - a^j\|$, $j \ne i$.
Here we use the Euclidean norm $\|\cdot\|$ in $\mathbb{R}^n$. The notations $x^{ts} \approx a$ and
$x^{ts} \not\approx b$ are used in the following sense:

$$\max_j |x^{ts}_j - a_j| < \eta \quad \text{and} \quad \max_j |x^{ts}_j - b_j| \ge \eta,$$

where $\eta > 0$ is a given tolerance. Since the set $\mathcal{A}$ consists of vectors with
integer coordinates we take $\eta = 1/3$ in the calculations below.
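The two ordered rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, classes are given as lists of scaled points, and a tie in the centroid distances is treated as "no decision" because the second rule uses a strict inequality.

```python
import math

ETA = 1 / 3  # tolerance used in the text for integer-valued scaled data


def near(x, a, eta=ETA):
    """x is 'approximately equal' to a: every coordinate differs by less than eta."""
    return max(abs(xj - aj) for xj, aj in zip(x, a)) < eta


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def rule_1(x, classes):
    """Index of the only class that contains a point near x, otherwise None."""
    hits = [i for i, pts in enumerate(classes) if any(near(x, a) for a in pts)]
    return hits[0] if len(hits) == 1 else None


def rule_2(x, classes):
    """Index of the class with the strictly nearest centroid, otherwise None."""
    dists = [math.dist(x, centroid(pts)) for pts in classes]
    best = min(dists)
    winners = [i for i, d in enumerate(dists) if d == best]
    return winners[0] if len(winners) == 1 else None


def classify(x, classes):
    """Apply the first rule; fall back to the second rule if it gives no answer."""
    label = rule_1(x, classes)
    return label if label is not None else rule_2(x, classes)
```

With integer coordinates and a tolerance of 1/3, the first rule amounts to asking whether the test point coincides with a training point of exactly one class.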
Clearly we cannot expect good performance from such a simple algorithm
(see the results presented in Table 2 for T = 0), but by considering
trajectories starting from test points, we can increase the accuracy of classifi-
cation. The results obtained in this way are even comparable with the results
obtained by other classification algorithms (see Table 3).
We define the field of forces in $\mathbb{R}^n$ using the set $\mathcal{A}$, which contains all
training examples from both classes. At a given point $x = (x_1, \ldots, x_n)$ the
relational elasticities are calculated for each pair of features $(i, j)$ by the global
approach (see Section 3). Let $F(x_j \to x_i\downarrow)$ and $F(x_j \to x_i\uparrow)$ be the forces
acting from the feature $j$ to decrease and increase, respectively, the feature $i$
at the point $x$. Then the resulting force on the feature $i$ is defined as the sum
of all these forces; that is,

$$F_i(x) = \sum_{j \ne i} \left[ F(x_j \to x_i\downarrow) + F(x_j \to x_i\uparrow) \right]. \qquad (13)$$

Then, given a new (test) point $x^{ts}$, we calculate (as in Example 1) a trajectory
$\tilde{x}(t)$ ($t = 0, 1, 2, \ldots, T$) started from this point. We use a step $\alpha = 0.25$.
To decrease the influence of circulating effects, we transform the trajectory
$\tilde{x}(t)$ to $x(t)$ by taking the middle point of the last 5 steps; that is,

$$
x(t) = \begin{cases}
\dfrac{\tilde{x}(0) + \tilde{x}(1) + \cdots + \tilde{x}(t)}{t + 1} & \text{if } t \le 4; \\[2ex]
\dfrac{\tilde{x}(t-4) + \tilde{x}(t-3) + \cdots + \tilde{x}(t)}{5} & \text{if } t > 4.
\end{cases}
$$
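The averaging of the last five trajectory points is a trailing mean, which can be sketched as below. The function name and the list-of-lists representation of the trajectory are assumptions made for the illustration.

```python
def smooth_trajectory(raw, window=5):
    """Replace each point by the mean of itself and up to window-1 predecessors."""
    smoothed = []
    for t in range(len(raw)):
        start = max(0, t - window + 1)   # t <= 4: start at 0; t > 4: start at t - 4
        chunk = raw[start:t + 1]
        n = len(chunk)
        smoothed.append([sum(col) / n for col in zip(*chunk)])
    return smoothed
```

For the first few steps fewer than five points exist, so the mean is taken over all points computed so far, which is the first branch of the formula.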

Table 2. Accuracy for test set for the heart disease and liver-disorder databases
with 10-fold cross validation obtained by Algorithm F
T 0 2 4 6 8 10 12 14 16 18 20
Heart 80.0 80.0 80.3 80.7 81.0 81.4 81.4 81.7 81.7 81.7 82.1
Liver 60.6 60.3 63.8 63.8 67.1 67.6 68.5 69.7 69.7 69.4 70.9
T 22 24 26 28 30 32 34 36 38 40 42
Heart 82.1 82.4 82.8 82.4 82.4 82.4 82.1 82.8 82.4 83.1 83.1
Liver 70.3 70.9 70.9 70.6 70.3 70.0 70.0 71.8 71.2 70.6 70.6

Classification Algorithm (F).


Step 1. Set $t = 0$.
Step 2. If $\mathcal{R}_1(x(t)) = \mathcal{A}^i$ then the example $x^{ts}$ is predicted to belong to
the class $\mathcal{A}^i$. Otherwise we set $t = t + 1$. If $t \le T$ go to Step 2, otherwise go
to Step 3.
Step 3. If $\mathcal{R}_2(x(T)) = \mathcal{A}^i$ then the example $x^{ts}$ is predicted to belong
to the class $\mathcal{A}^i$. Otherwise the program terminates and the test point $x^{ts}$ is
unclassified.
We apply this algorithm to the heart disease and liver disorder databases,
taking the consecutive scaling numbers $m = 20, 21, \ldots, 40$. We use 10-fold cross
validation.
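The control flow of Algorithm (F) can be sketched as follows. The two rules are passed in as callables that return a class label or `None`; this interface, and the function name, are assumptions made for the illustration.

```python
def algorithm_f(trajectory, rule_1, rule_2):
    """trajectory holds the smoothed points x(0), ..., x(T)."""
    for x_t in trajectory:              # Steps 1-2: t = 0, 1, ..., T
        label = rule_1(x_t)
        if label is not None:
            return label                # first rule fired somewhere on the trajectory
    return rule_2(trajectory[-1])       # Step 3: may be None, i.e. unclassified
```

The first rule is tried at every point of the trajectory, while the centroid rule is applied only once, at the final point x(T).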
Table 3. Results for the heart disease and liver-disorder databases with 10-fold
cross validation obtained by other methods
                        Heart            Liver
Algorithm            P_tr   P_ts     P_tr   P_ts
HMM                  87.5   82.8     72.2   66.6
PMM                  91.4   82.2     74.9   68.4
RLP                  84.5   83.5     69.0   66.9
SVM ||.||_1          85.3   84.6     67.8   64.0
SVM ||.||_inf        85.8   82.5     68.7   64.6
SVM ||.||_2          84.7   75.9     60.2   61.0

Note that in this application the choice of a combination of features is
very important. The combination of features should form, in some sense, a
minimal closed system in which the influences of the features on each other
contain complete information about the process under consideration (disease
in our case). For example, using two "similar" features can contribute noise
because of the summation in (13). In this paper we did not try to find an optimal
combination of features. Our aim is to find some combination of features for
which the summation in (13) does not create too much noise. For the heart disease
database we use all 13 features; for the liver disorder database good results
are obtained when we take just three features: the third, fourth and fifth. The
results obtained for the test set for different time periods T are presented in
Table 2. The accuracy for the training set is stable (100.0 for the heart disease
database and 99.7 for the liver disorder database), so we do not present
it in Table 2. The results show that when T increases, more test points
become closer to the centroid of their own class. As a result, the accuracy
of classification becomes sufficiently high. To give some idea about the level
of accuracy that could be achieved in these domains, in Table 3 we present
results obtained by other methods: HMM - Hybrid misclassification minimization
([CM95]), PMM - Parametric misclassification minimization ([Man94]),
RLP - Robust linear programming ([BM92]), SVM $\|\cdot\|_1$, SVM $\|\cdot\|_\infty$, SVM
$\|\cdot\|_2$ - Support vector machine algorithms with the 1-norm, $\infty$-norm and 2-norm
([BM98]).

6 Algorithm for global optimization

In this section we apply the approach described above to global optimization


problems. More detailed information about this application can be found in
[Mam04].
We consider the following unconstrained continuous optimization problem

$$\text{minimize } f(x) \qquad (14)$$

$$\text{s.t. } x \in \mathbb{R}^n, \quad a_i \le x_i \le b_i, \quad i = 1, \ldots, n. \qquad (15)$$


For convenience, we will use the symbols LocDD, LineSearch and
LocOpt defined below.
LocDD. Given a point $x$, we denote by $l = (l_1, \ldots, l_n) = LocDD(x)$ a
local descent direction from this point. It can be calculated in different ways.
In the calculations below, it is calculated as follows. Let $\varepsilon > 0$ be a given
small number. Take any coordinate $i \in \{1, \ldots, n\}$ and calculate the values of
the objective function $f$: let $a_0 = f(x_1, \ldots, x_n)$, $a_1 = f(x_1, \ldots, x_i - \varepsilon, \ldots, x_n)$,
$a_2 = f(x_1, \ldots, x_i + \varepsilon, \ldots, x_n)$. Then we set $l_i = 0$ if $a_1 \ge a_0$ and $a_2 \ge a_0$;
$l_i = a_0 - a_2$ if $a_1 \ge a_0$ and $a_2 < a_0$; $l_i = a_0 - a_1$ if $a_1 < a_0$ and $a_2 \ge a_0$.
If $a_1 < a_0$ and $a_2 < a_0$ then we set $l_i = a_0 - a_2$ if $a_0 - a_2 \ge a_0 - a_1$; and
$l_i = a_0 - a_1$ if $a_0 - a_2 < a_0 - a_1$.
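A coordinate-wise probe in the spirit of LocDD can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: the name `loc_dd` and the default value of `eps` are hypothetical, and the component is given a negative sign when the improvement comes from decreasing the coordinate, so that moving along x + t*l (t > 0) actually heads toward the lower value.

```python
def loc_dd(f, x, eps=1e-3):
    """Build a direction from f(x), f(x - eps*e_i) and f(x + eps*e_i) per coordinate."""
    a0 = f(x)
    direction = []
    for i in range(len(x)):
        lower = list(x)
        lower[i] -= eps
        upper = list(x)
        upper[i] += eps
        gain_down = a0 - f(lower)    # a0 - a1: improvement from decreasing x_i
        gain_up = a0 - f(upper)      # a0 - a2: improvement from increasing x_i
        if gain_down <= 0 and gain_up <= 0:
            direction.append(0.0)            # neither probe improves f
        elif gain_up >= gain_down:
            direction.append(gain_up)        # move in the +i direction
        else:
            direction.append(-gain_down)     # move in the -i direction (sign assumed)
    return direction
```

Only function values are used, so the probe also works for non-smooth objectives; it costs 2n + 1 evaluations per call.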
LineSearch. Given a point $x$ and a direction $l$, we denote by $LineSearch(l)$
the best point on the line $x + tl$, $t \ge 0$. In the calculations below, we apply
an inexact line search, taking $t = m\gamma$ ($m = 0, 1, \ldots$), where $\gamma > 0$ is some small
step.
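An inexact search over the grid t = m*gamma can be sketched as follows. The stopping rule (stop at the first grid point that does not improve the objective), the cap `max_steps` and the function name are assumptions of this sketch; the text only fixes the grid.

```python
def line_search(f, x, direction, gamma=0.01, max_steps=1000):
    """Walk along x + t*direction on the grid t = m*gamma and keep the best point."""
    best_x, best_f = list(x), f(x)
    for m in range(1, max_steps + 1):
        t = m * gamma
        candidate = [xi + t * li for xi, li in zip(x, direction)]
        fc = f(candidate)
        if fc >= best_f:        # no further improvement: stop (assumed rule)
            break
        best_x, best_f = candidate, fc
    return best_x
```

If the direction is zero, the first candidate equals x and the function returns the starting point unchanged.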
LocOpt. For the local minimization we could use different methods. In
this paper we apply a direct search method called local variations. This is an
efficient local optimization technique that does not explicitly use derivatives
and can be applied to non-smooth functions. A good survey of direct search
methods can be found in [KLT03].

The algorithm contains the following steps.


Step 1. Let $L$ be a given integer, and $k \in \{0, 1, 2, \ldots, L - 1\}$. For each $k$
we define the box

$$B_k = \{x \in \mathbb{R}^n : a_i^k \le x_i \le b_i^k, \ i = 1, \ldots, n\},$$

where $\delta_i = (b_i - a_i)/2L$ and $a_i^k = a_i + k\delta_i$, $b_i^k = b_i - k\delta_i$.

Step 2. For each box $B_k$, we find a minimum $x^k$, $k = 1, 2, \ldots, L - 1$.
Step 3. Let $x^* = \operatorname{argmin}\{f(x^k), \ k = 1, 2, \ldots, L - 1\}$. We refine the point
$x^*$ by local optimization and get the global solution $x_{\min} = LocOpt(x^*)$.
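The nested boxes of Step 1 shrink symmetrically toward the centre of the feasible region. A minimal sketch, with a hypothetical function name and boxes returned as (lower, upper) pairs:

```python
def nested_boxes(a, b, L):
    """Return [(a^k, b^k) for k = 0..L-1] with a_i^k = a_i + k*delta_i, b_i^k = b_i - k*delta_i."""
    delta = [(bi - ai) / (2 * L) for ai, bi in zip(a, b)]
    boxes = []
    for k in range(L):
        lower = [ai + k * d for ai, d in zip(a, delta)]
        upper = [bi - k * d for bi, d in zip(b, delta)]
        boxes.append((lower, upper))
    return boxes
```

Box 0 is the whole feasible region, and box L-1 still has positive width because k never reaches L.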

Now, given a box $B_k$, we describe the procedure of finding a good solution $x^k$.

1. To apply the methods of dynamical systems described above, we need
to have a corresponding dataset. In other words, we need to generate some
initial points and calculate the values of the objective function at these points.
Different methods can be used for the choice of initial points. In the algorithm
described here, we generate initial points from the vertices of the boxes $B_k$.
Let $A = \{x^1, \ldots, x^m\}$ be the set of initial points.

2. Given a point $x$ we find $x^* = LocOpt(LineSearch(LocDD(x)))$, which
means that
- we calculate the local descent direction $l = LocDD(x)$ at the point $x$;
- then we find the best point $y = LineSearch(l)$ on the line $x + tl$;
- and, finally, we refine the point $y$ by local optimization and get the point
$x^* = LocOpt(y)$.
We apply this procedure for each initial point from the set $A$ and obtain
the set $A(0) = \{x^{*,1}, \ldots, x^{*,m}\}$, where

$$x^{*,i} = LocOpt(LineSearch(LocDD(x^i))), \quad i = 1, \ldots, m.$$

Let
$$x^*(0) = \operatorname{argmin}\{f(x) : x \in A(0)\}.$$
3. The set $A(0)$ together with the values of the objective function allows
us to generate a dynamical system. Our aim in this step is to find some "good"
point $x^*(1)$ and add it to the set $A(0)$.
Let $t = 0$ and the point $x^*(t)$ be the "best" point in the set $A(t)$.
The main part of the algorithm is to determine a direction, say $F(t)$, at
the point $x^*(t)$, which can provide a better solution $x^*(t + 1)$. We can consider
$F(t)$ as a global descent direction. For this aim, using the set $A(t)$, we calculate
the forces acting on $f\downarrow$ at the point $x^*(t)$ from each variable $i \in \{1, \ldots, n\}$.
We set $F(t) = (F_1(t), \ldots, F_n(t))$, where the components $F_i(t) = F(i \to f\uparrow)$
are calculated at the point $x^*(t)$ (see Definition 1). Then we define a point
$x(t + 1)$ by formula (12); that is, we consider the vector $-F(t)$ as a descent
direction and set

$$x(t + 1) = x^*(t) - \alpha^*(t) F(t), \qquad (16)$$

where the step $\alpha^*(t)$ is calculated as

$$\alpha^*(t) = \arg\min \{ f(x^*(t) - \alpha F(t)) : \qquad (17)$$

$$\alpha \in \{(\alpha_1, \ldots, \alpha_n) : \alpha_i = \tfrac{l}{M}(b_i - a_i), \ l = 1, \ldots, M\} \}. \qquad (18)$$

Clearly $x(t + 1) \ne x^*(t)$. Then we calculate $x^\circ(t + 1) := LocOpt(x(t + 1))$,
and set $A(t + 1) = A(t) \cup \{x^\circ(t + 1)\}$. The next "good" point $x^*(t + 1)$ is
defined as the best point in the set $A(t + 1)$; that is, $x^*(t + 1) = x^\circ(t + 1)$ if
our search was successful ($f(x^\circ(t + 1)) < f(x^*(t))$), and $x^*(t + 1) = x^*(t)$
if it was not.
We continue this procedure and obtain a trajectory $x^*(t)$, $t = 1, 2, \ldots$,
starting from the initial point $x^*(0)$. The process is terminated at the point
$x^*(T)$ if either $F(T) = 0$ or $T > T^*$, where $T^*$ is an a priori given number. We
note that $F(t) = 0$ means that $x^*(t)$ is a stationary point.
Therefore, $x^k = x^*(T)$ is a minimum point for the box $B_k$.
In the calculations below we take $L = 10$, $M = 100$ and $T^* = 20$. It is
clear that the results obtained can be refined by choosing larger $L$, $M$, $T^*$.
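The step choice in (17)-(18) is a search over M candidate step vectors. A minimal sketch follows, assuming that the product of the step vector and F(t) is taken coordinate by coordinate and that candidates are not clipped to the box; the function name is hypothetical.

```python
def best_step(f, x_star, F, a, b, M=100):
    """Try alpha^l = (l/M)*(b - a), l = 1..M, and keep the candidate with the smallest f."""
    best_x, best_f = None, float("inf")
    for l in range(1, M + 1):
        candidate = [xs - (l / M) * (bi - ai) * Fi
                     for xs, Fi, ai, bi in zip(x_star, F, a, b)]
        fc = f(candidate)
        if fc < best_f:
            best_x, best_f = candidate, fc
    return best_x
```

Because l starts at 1, the returned point differs from x*(t) whenever F(t) is nonzero. One call costs M function evaluations, so with M = 100 and T* = 20 this search accounts for at most about 2000 evaluations per box, before the local refinements.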
We call this algorithm AGOP ([Mam04]). For the calculation of the direction
$F(t)$, we need to determine the influences $d(x_i\uparrow, f\uparrow)$, $d(x_i\downarrow, f\uparrow)$, $d(f\uparrow, x_i\uparrow)$
and $d(f\downarrow, x_i\downarrow)$. For this aim, we use the methods introduced in Section 3.
Therefore, we will consider two versions of the algorithm AGOP: the version
AGOP(F), which uses the method M.1 described in Section 3, and the version
AGOP(D), which uses the method M.2.

There are many different methods and algorithms developed for global
optimization problems (see, for example, [MPV01, PR02, Pin95] and references
therein). Here, we mention some of them and note some aspects.
The algorithm AGOP takes into account some relatively "poor" points for
further consideration. This is what many other methods do, such as Simulated
Annealing ([Glo97, Loc02]), Genetic Algorithms ([Smi02]) and Taboo Search
([CK02, Glo97]). The choice of a descent (good) direction is the main part
of each algorithm. Instead of using a stochastic search (as in the algorithms
mentioned), AGOP uses the formula (16), where the direction $F(t)$ is defined
by relational elasticities.
Note that the algorithm AGOP has quite different settings and motivations
compared with the methods that use so-called "dynamical search" (see
[PWZ02] and references therein). Our search method has some ideas in
common with the heuristic method which attempts to estimate the "overall"
convexity characteristics of the objective function ([DPR97]). That method
does not work well when the postulated quadratic model is unsuitable. The
advantage of our approach is that we do not use any approximate
underestimations (including convex underestimations).
The methods that we use in this paper are quite different from the homotopy
and trajectory methods ([Die95, For95]), which attempt to visit (enumerate)
all stationary points (local optima) of the objective function and,
therefore, cannot be fast for high dimensional problems. The algorithm AGOP
attempts to jump over local minimum points, trying to find "deeper" points
that do not need to be local minima.

7 Results of numerical experiments

Numerical experiments have been carried out on a Pentium III PC with an 800
MHz main processor. We use the following notation:
n - the number of variables;
f_min - the minimum value obtained;
f_best - the global minimum or the best known result;
t (sec) - the CPU time in seconds;
N_f - the number of function evaluations.
We used 24 well known test problems (the list of test problems can be found
in [Mam04]). The results obtained by algorithms AGOP(F) and AGOP(D)
are presented in Table 4. We observe that the version AGOP(F) is more stable
in finding global minima in all cases, whereas the version AGOP(D)
failed in two cases (for the Rastrigin function). In Table 5, we present the
elapsed times and the number of function evaluations for functions with a large
number of variables obtained by AGOP(F).
The results obtained show the efficiency of the algorithm. For instance,
for some of the test examples (where the number of variables could be
chosen arbitrarily), the number of variables is increased up to 3000, and the
processing time was between 2 (for Rastrigin and Ackley's functions) and 12
(for Michalewicz function) minutes. We could not find comparable results in
the literature. For instance, in [LL05] (Genetic Algorithms), the problems for
Rastrigin, Griewank and Ackley's functions are solved for up to 1000 variables
only, with the number of function evaluations [337570, 574561], 563350 and
[548306, 686614], respectively (3 digit accuracy was the goal to be achieved).
In our case, the numbers of function evaluations for 1000 variables are 174176,
174124 and 185904, respectively (see Table 5), with the complete global search.

8 Conclusions and future work


In this paper we developed a method to describe a relationship between two
variables based on the notion of relational elasticities. Some methods for cal-
culation of the relational elasticities are presented. We defined dynamical sys-
tems by using the relational elasticities and made some brief analysis of tra-
jectories of such dynamical systems with applications to data classification
and global optimization problems. The results obtained show that the rela-
tional elasticities can be considered a sound mathematical method to describe
a relationship between two variables.
One of the main problems for our future investigation is to study the
relationship between more than two variables. In this paper we simply used
either formula (13), where the forces acting on some variable are summed, or
the method M.2 described in Section 3. It would be very useful to define the
influence of a combination of variables on some other variable.
We introduced a global optimization algorithm that can be used to han-
dle functions with a large number of variables for solving continuous uncon-
strained optimization problems. The algorithm can be developed for solving
continuous constrained optimization problems where special penalty functions
and non-linear Lagrange-type functions (see [RY03]) are involved. In fact, the
methodology that we use can be adapted for discrete optimization, because
the determination of forces does not need a continuous state space. There-
fore, the development of algorithms for solving discrete (unconstrained and
constrained) optimization problems will be our future work.
Table 4. The results obtained by AGOP for non-convex continuously differentiable
functions

Function            n      f_best       AGOP(F)         AGOP(D)
Ackleys             2      0            0.000048        0.000048
Ackleys             1000   0            0.000459        0.000241
Ackleys             3000   0            0.000516        0.000495
Bohachevsky 1       2      0            0               0
Bohachevsky 2       2      0            0               0
Bohachevsky 3       2      0            5.5753·10^-?    3.1066·10^-?
Branin              2      0            1.5445·10^-?    1.5445·10^-?
Camel               2      -1.03163     -1.03162842     -1.03162844
Easom               2      -1           -0.9999998      -0.9999999
Golds. and Price    2      3            3.00000037      3.00000037
Griewank            2      0            7.38·10^-?      7.83·10^-?
Griewank            1000   0            4.248·10^-?     3.784·10^-?
Griewank            3000   0            4.431·10^-?     9.917·10^-?
Hansen              2      -176.5417    -176.54179      -176.54179
Hartman             3      -3.86278     -3.86278        -3.86278
Hartman             6      -3.32237     -3.322368       -3.3223678
Levy Nr.1           2      0            1.309·10^-?     1.309·10^-?
Levy Nr.1           1000   0            1.433·10^-?     1.433·10^-?
Levy Nr.1           3000   0            3.875·10^-?     3.875·10^-?
Levy Nr.2           2      0            6.618·10^-?     6.618·10^-?
Levy Nr.2           1000   0            1.434·10^-?     1.434·10^-?
Levy Nr.2           3000   0            1.292·10^-?     1.292·10^-?
Levy Nr.3           4      -11.5044     -11.5044        -11.5044
Levy Nr.3           1000   -11.5044     -11.395         -11.395
Levy Nr.3           3000   -11.5044     -11.395         -11.5044
Michalewicz         2      -1.8013      -1.8013         -1.8013
Michalewicz         5      -4.687       -4.6876581      -4.6876577
Michalewicz         10     -9.660       -9.6601482      -9.6601481
Michalewicz         1000   N/A          -957.0770       -964.1458
Michalewicz         3000   N/A          -2859.124       -2859.124
Rastrigin           2      0            1.016·10^-?     2.525·10^-?
Rastrigin           1000   0            1.440·10^-?     323.362
Rastrigin           3000   0            2.159·10^-?     1546.17
Schaffer Nr.1       2      0            0               0
Schaffer Nr.2       2      0            4.845·10^-?     4.845·10^-?
Shekel-5            4      -10.15320    -10.15319       -10.15319
Shekel-7            4      -10.40294    -10.40294       -10.40294
Shekel-10           4      -10.53641    -10.5364045     -10.5364045
Shubert Nr.1        2      -186.7309    -186.7309       -186.7309
Shubert Nr.2        2      -186.7309    -186.7309       -186.3406
Shubert Nr.3        2      -24.06250    -24.062498      -24.062498
Table 5. Elapsed times and the number of function evaluations for AGOP(F)

Function       n      f_best      f_min          t (sec)   N_f
Ackleys        1000   0           0.000459       21.23     185904
Ackleys        3000   0           0.000516       145.67    530154
Griewank       1000   0           4.248·10^-?    42.74     174124
Griewank       3000   0           4.431·10^-?    367.09    555123
Levy Nr.1      1000   0           1.433·10^-?    22.07     163724
Levy Nr.1      3000   0           3.875·10^-?    201.06    463924
Levy Nr.2      1000   0           1.434·10^-?    46.75     165724
Levy Nr.2      3000   0           1.292·10^-?    380.01    463724
Levy Nr.3      1000   -11.5044    -11.395        24.68     182522
Levy Nr.3      3000   -11.5044    -11.395        174.62    573514
Michalewicz    1000   N/A         -957.0770      68.08     257265
Michalewicz    3000   N/A         -2859.124      715.60    955907
Rastrigin      1000   0           1.440·10^-?    20.69     174176
Rastrigin      3000   0           2.159·10^-?    162.07    509125

References
[BM92] Bennett, K.P., Mangasarian, O.L.: Robust linear programming discrimination
of two linearly inseparable sets. Optimization Methods and Software,
1, 23-34 (1992)
[BM98] Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization
and support vector machines. In: Shavlik, J. (ed) Machine Learning
Proceedings of the Fifteenth International Conference (ICML'98), 82-90.
Morgan Kaufmann, San Francisco, California (1998)
[Bur98] Burges, C.J.C.: A tutorial on support vector machines for pattern
recognition. Data Mining and Knowledge Discovery, 2, 121-167 (1998)
(http://svm.research.bell-labs.com/SVMdoc.html)
[CM95] Chen, C., Mangasarian, O.L.: Hybrid misclassification minimization.
Mathematical Programming Technical Report, 95-05, University of
Wisconsin (1995)
[CK02] Cvijovic, D., Klinowski, J.: Taboo search: an approach to the multiple-minima
problem for continuous functions. In: Pardalos, P., Romeijn, H.
(eds) Handbook of Global Optimization, 2, Kluwer Academic Publishers
(2002)
[Die95] Diener, I.: Trajectory methods in global optimization. In: Horst, R.,
Pardalos, P. (eds) Handbook of Global Optimization, Kluwer Academic
Publishers (1995)
[DPR97] Dill, K.A., Phillips, A.T., Rosen, J.M.: Molecular structure prediction by
global optimization. In: Bomze, I.M. et al. (eds) Developments in Global
Optimization, Kluwer Academic Publishers (1997)
[DKS95] Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised
discretization of continuous features. ICML-95 (1995)
[For95] Forster, W.: Homotopy methods. In: Horst, R., Pardalos, P. (eds)
Handbook of Global Optimization, Kluwer Academic Publishers (1995)
[Glo97] Glover, F., Laguna, M.: Taboo search. Kluwer Academic Publishers (1997)
[Gor99] Gordon, G.J.: Approximate solutions to Markov decision processes.
Ph.D. Thesis, CS department, Carnegie Mellon University, Pittsburgh,
PA (1999)
[Int71] Intriligator, M.D.: Mathematical Optimization and Economic Theory,
Prentice-Hall, Englewood Cliffs (1971)
[LL05] Lazauskas, L.: http://solon.cma.univie.ac.at/~neum/glopt/results/ga.html
- Some Genetic Algorithms Results (collected by Leo Lazauskas) (2005)
[KLT03] Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search:
new perspectives on some classical and modern methods. SIAM Review,
45, 385-482 (2003)
[Loc02] Locatelli, M.: Simulated annealing algorithms for continuous global
optimization. In: Pardalos, P., Romeijn, H. (eds) Handbook of Global
Optimization, 2, Kluwer Academic Publishers (2002)
[MR73] Makarov, V.L., Rubinov, A.M.: Mathematical theory of economic dynamics
and equilibria, Nauka, Moscow (1973) (English trans.: Springer-Verlag,
New York, 1977)
[Mam94] Mamedov, M.A.: Fuzzy derivative and dynamic systems. In: Proc. of the
Intern. Conf. on Appl. of Fuzzy Systems, ICAFS-94, Tabriz (Iran), Oct.
17-19, 122-126 (1994)
[Mam01a] Mammadov, M.A.: Sequential separation of sets with a given accuracy
and its applications to data classification. In: Proc. of the 16-th National
Conference of the Australian Society for Operations Research in conjunction
with Optimization Day, 23-27 Sep., McLarens on the Lake Resort, South
Australia (2001)
[Mam01b] Mammadov, M.A.: Fuzzy derivative and its applications to data
classification. The 10-th IEEE International Conference on Fuzzy Systems,
Melbourne, 2-5 Dec. (2001)
[Mam04] Mammadov, M.A.: A new global optimization algorithm based
on dynamical systems approach. In: Rubinov, A., Sniedovich, M.
(eds) Proceedings of The Sixth International Conference on Optimization:
Techniques and Applications (ICOTA6), University of Ballarat,
Australia, Dec. 2004, Article index number 198 (94th article); Also
in: Research Report 04/04, University of Ballarat, Australia (2004)
(http://www.ballarat.edu.au/ard/itms/publications/researchPapers.shtml)
[MRY01] Mammadov, M.A., Rubinov, A.M., Yearwood, J.: Sequential separation
of sets and its applications to data classification. In: Proc. of the Post-gr.
ADFA Conf. on Computer Science, 14 Jul. 2001, Canberra, Australia,
75-80 (2001)
[MSY04] Mammadov, M.A., Saunders, G., Yearwood, J.: A fuzzy derivative approach
to classification of outcomes from the ADRAC database. International
Transactions in Operational Research, 11, 169-179 (2004)
[MY01] Mammadov, M.A., Yearwood, J.: An induction algorithm with selection
significance based on a fuzzy derivative. In: Abraham, A., Koeppen,
M. (eds) Hybrid Information Systems, 223-235. Physica-Verlag, Springer
(2001)
[MYA04] Mammadov, M.A., Yearwood, J., Aliyeva, L.: Multi label classification
and drug-reaction associations using global optimization techniques.
In: Rubinov, A., Sniedovich, M. (eds) Proceedings of The Sixth
International Conference on Optimization: Techniques and Applications
(ICOTA6), University of Ballarat, Australia, Dec. 2004, Article index
number 168 (76th article) (2004)
[Man94] Mangasarian, O.L.: Misclassification minimization. Journal of Global
Optimization, 5, 309-323 (1994)
[MPV01] Migdalas, A., Pardalos, P., Varbrand, P.: From Local to Global Optimization.
Nonconvex Optimization and Its Applications, 53, Kluwer Academic
Publishers (2001)
[MM02] Munos, R., Moore, A.: Variable resolution discretization in optimal
control. Machine Learning, 49, 291-323 (2002)
[PR02] Pardalos, P., Romeijn, H. (eds): Handbook of Global Optimization, 2,
Kluwer Academic Publishers (2002)
[Pin95] Pinter, J. (ed): Global Optimization in Action. Kluwer Academic
Publishers (1995)
[PWZ02] Pronzato, L., Wynn, H., Zhigljavsky, A.A.: An introduction to dynamical
search. In: Pardalos, P., Romeijn, H. (eds) Handbook of Global Optimization,
2, Kluwer Academic Publishers (2002)
[RY03] Rubinov, A.M., Yang, X.Q.: Lagrange-type Functions in Constrained
Non-convex Optimization. Applied Optimization, Volume 85. Kluwer
Academic Publishers (2003)
[Smi02] Smith, J.: Genetic algorithms. In: Pardalos, P., Romeijn, H. (eds)
Handbook of Global Optimization, 2, Kluwer Academic Publishers (2002)
Impulsive Control of a Sequence of Rumour
Processes

Charles Pearce¹, Yalcin Kaya², and Selma Belen¹

¹ School of Mathematics
The University of Adelaide
Adelaide, SA 5005, Australia
cpearce@maths.adelaide.edu.au, sbelen@ankara.baskent.edu.tr
² School of Mathematics and Statistics
University of South Australia
Mawson Lakes, SA 5095, Australia; Departamento de Sistemas e Computação
Universidade Federal do Rio de Janeiro
Rio de Janeiro, Brazil
Yalcin.Kaya@unisa.edu.au

Summary. In this paper we introduce an impulsive control model for a sequence of


rumour processes evolving in a given population. Each rumour process begins with a
broadcast, the recipients of which begin to spread that rumour. The recipients of the
first broadcast are termed the subscribers. The second and subsequent broadcasts
are either to the subscribers (Scenario 1) or to all individuals who have at any
time to date been spreaders (Scenario 2). The objective is to time the second and
subsequent broadcasts so as to minimise the final proportion of ignorants. It is shown
that with either scenario the optimal time for each broadcast after the first is when
the proportion of spreaders in the rumour process begun by the previous broadcast
reaches zero. Results are presented concerning dependence on initial conditions, as
well as graphical illustration of the controlled rumour processes under each scenario.

Keywords: rumours, information spread, impulsive optimal control

2000 MR Subject Classification. 60J75, 60J27, 91D99

1 Introduction
Stochastic rumour models were introduced by Daley and Kendall [DK65], who
considered a single initial spreader introducing a rumour into a closed pop-
ulation. Initially the remainder of the population do not know the rumour
and as such are termed ignorants. The members of the population meet one
another with uniform mixing. A spreader-ignorant interaction converts the
ignorant into a spreader. When two spreaders interact, they stop spreading
the rumour and become stiflers. A spreader-stifler interaction results in the
spreader becoming a stifler. Otherwise interactions leave the roles of individuals
unchanged. With a change of time scale, this model may be converted into
one in which those interactions effecting a change do so only with probability
p (0 < p < 1).
Daley and Kendall reported a striking result, later refined as follows: with
one initial spreader, the proportion of the population never to hear the rumour
converges almost surely to approximately 0.2031878 as the population size
tends to infinity. The constant is the solution of a certain transcendental
equation. The same constant arises with a variant stochastic model of Maki
and Thompson [MT73]. With both models the number of spreaders eventually
becomes zero and the process stops.
A rigorous treatment of this result requires surprisingly delicate analysis,
and much of the ensuing literature has been involved with technical questions
arising from this. See, for example, [Bar72, Gan00, Pit90, Sud85, Wat87].
Rumour models can be used to describe a number of phenomena, such
as the dissemination of information, disinformation or memes, and changes in
political persuasion and the stock market. They are therefore of some practical
significance. It is difficult to dispute that rumours have an important impact
on stock prices. A stochastic model of the so-called pervasive rumour phenomenon
in the stock markets can be found in [Bom03]. Reference [DMC01]
studies the customer behaviour and marketing implications of urban legends
and rumours. There have also been studies of rumour models over ordered networks
[FPRU90, Zan01]. References [AH98, DP03, OT77] deal with processes
with more than one rumour at any given time.
Many broad questions pertaining to stochastic rumours have still to be ad-
dressed, partly because the technical questions tend to be rather more difficult
than those for the related stochastic epidemic. It is remarkable that while the
time-dependent behaviour of the general stochastic epidemic was determined
in the 1960s, that of the Daley-Kendall and Maki-Thompson rumour models
was not elucidated until 2000 [Pea00]. The effect of varying from unity the
initial number of spreaders has also been investigated only recently (Belen and
Pearce [BP04]). The perhaps surprising result was discovered that even when
the proportion of the initial population who are spreaders tends to unity, the
proportion of the initial ignorants who never hear the rumour does not tend
to zero.
In an age of mass communication, it is natural to consider the initiation of
a rumour by means of television, radio or the internet (Frost [Fro00]). We may
use the term broadcast to refer to such an initiation. In a companion paper
[BKP05] a model with two broadcasts is envisaged. A control ingredient is
incorporated, the timing of the second broadcast.
This paper presents a generalisation of this model to a general number
n > 1 of broadcasts, with the intention of reducing the final proportion of
the population never hearing the rumour. The rumour process is started by a
broadcast to a subpopulation, the subscribers, who commence spreading the
rumour. We wish to determine when to effect subsequent broadcasts 2, 3, ..., n
so as to minimise the final proportion of ignorants in the population.
Two basic scenarios are considered. In the first, the recipients of each
broadcast are the fixed group of subscribers: a subscriber who had become a
stifler becomes activated again as a subscriber spreader. In the second, the
recipients of any subsequent broadcast are those individuals who have been
spreaders at any time during the rumour initiated by the immediately previous
broadcast.
To obtain some results without becoming too enmeshed in probabilistic
technicalities, we follow Daley and Kendall and, after an initial discrete de-
scription of the population, describe the process in the continuum limit cor-
responding to a total population tending to infinity. Exactly the same formu-
lation occurs in the continuum limit if one starts with the Maki-Thompson
formulation. The resultant differential equations with each scenario can be
expressed in state-space form, with the upward jump in spreaders at each
broadcast epoch constituting an impulsive control input. Since we are deal-
ing with an optimal control problem, a natural approach would be to employ
a Pontryagin-like maximum principle furnishing necessary conditions for an
extremum of an impulsive control system (see, for example, Blaquiere [Bla85]
and Rempala and Zabczyk [RZ88]). However, because of the tractability of the
dynamical system equations, we are able to solve the given impulsive control
problem without resorting to this theory.
In Section 2 we review the Daley-Kendall model and related results and
introduce two useful preliminary results. In Section 3 we solve the control
problem with Scenario 1 and in Sections 4 and 5 treat first- and second-order
monotonicity properties associated with the solution. In Section 6 we solve
the control problem for the somewhat more complicated Scenario 2. Also we
perform a corresponding analysis of the first-order monotonicity properties
for Scenario 2. Finally, in Section 7, we compare the two scenarios.

2 Single-Rumour Process and Preliminaries


The Daley-Kendall model considers a population of n individuals with three
subpopulations, ignorants, spreaders and stiflers. Denote the respective sizes
of these subpopulations by i, s and r. There are three kinds of interactions
which result in a change in the sizes of the subpopulations. The transitions
arising from these interactions along with their associated probabilities are as
tabulated. The other interactions do not result in any changes to the subpop-
ulations.
Interaction                  Transition                            Probability
$i \rightleftharpoons s$     $(i,s,r) \mapsto (i-1,\ s+1,\ r)$     $is\,d\tau + o(d\tau)$
$s \rightleftharpoons s$     $(i,s,r) \mapsto (i,\ s-2,\ r+2)$     $s(s-1)/2\,d\tau + o(d\tau)$
$s \rightleftharpoons r$     $(i,s,r) \mapsto (i,\ s-1,\ r+1)$     $sr\,d\tau + o(d\tau)$

We now adopt a continuum formulation appropriate for $n \to \infty$. Let $i(\tau)$,
$s(\tau)$, $r(\tau)$ denote respectively the proportions of ignorants, spreaders and
stiflers in the population at time $\tau \ge 0$. The evolution of the limiting form of
the model is prescribed by the deterministic dynamic equations

$$\frac{di}{d\tau} = -is, \tag{1}$$

$$\frac{ds}{d\tau} = -s(1-2i), \tag{2}$$

$$\frac{dr}{d\tau} = s(1-i), \tag{3}$$

with initial conditions

$$i(0) = \alpha > 0,\quad s(0) = \beta > 0 \quad\text{and}\quad r(0) = \gamma \ge 0 \quad\text{satisfying}\quad \alpha + \beta + \gamma = 1. \tag{4}$$

The dynamics and asymptotics of the continuum rumour process are
treated by Belen and Pearce [BP04]. Under (4), $i$ is a strictly decreasing
function of time during the course of a rumour, and we may reparametrise and
regard $i$ as the independent variable. Define the limiting value
$\zeta := \lim_{\tau \to \infty} i(\tau)$. For our present purpose, the pertinent
discussion of [BP04] may be summarised as follows.

Theorem 1. In the rumour process prescribed by (1)-(4),

(a) $i$ is strictly decreasing with time, with limiting value $\zeta$ satisfying

$$0 < \zeta < 1/2;$$

(b) $\zeta$ is the smallest positive solution to the transcendental equation

$$\frac{\zeta}{\alpha}\, e^{2(\alpha - \zeta)} = e^{-\beta}; \tag{5}$$

(c) $s$ is ultimately strictly decreasing to the limit 0.


The limiting case $\alpha \to 1$, $\beta \to 0$, $\gamma = 0$ is the classical situation treated
by Daley and Kendall. In this case (5) becomes

$$\zeta\, e^{2(1 - \zeta)} = 1.$$

This is the equation used by Daley and Kendall to determine that in their
classical case $\zeta \approx 0.2031878$.
It is interesting to look at the case when $\alpha \to 0$, in other words when
there are almost no initial ignorants in the population. For this purpose we
introduce a new variable

$$\theta(\tau) := \frac{i(\tau)}{\alpha},$$

the ratio of the proportion of ignorants at time $\tau$ to the initial proportion.
Note that $\theta(0) = 1$. We define also $\eta := \zeta/\alpha$, the limiting value of $\theta$ for $\tau \to \infty$.
Then (5) reads as

$$\eta\, e^{2\alpha(1 - \eta)} = e^{-\beta}.$$

For $\alpha \to 0$, this becomes

$$\eta = e^{-\beta}.$$

If $\beta \to 0$ too, that is, when there are almost no initial spreaders in the
population, we get $\eta = 1$, that is, the proportion of the initial ignorant population
remains unchanged. However, if $\beta \to 1$, then

$$\eta = 1/e \approx 0.368.$$

Thus even when there is a small initial proportion of ignorants and a large
initial proportion of spreaders, about 36.8% of the ignorant population never
hear the rumour. This result is given in [BP04].
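The two numerical values quoted above can be checked directly from (5). The following is a minimal sketch, not the authors' code: it rewrites (5) as $\zeta e^{-2\zeta} = \alpha e^{-(2\alpha+\beta)}$ and solves it by bisection on $(0, \min(\alpha, 1/2)]$, where the left-hand side is increasing. The function name `final_ignorants` and the iteration count are assumptions made for illustration.

```python
import math

def final_ignorants(alpha, beta, iters=200):
    """Smallest positive root zeta of (zeta/alpha)*exp(2*(alpha - zeta)) = exp(-beta).

    Hypothetical helper: uses the equivalent form
    zeta*exp(-2*zeta) = alpha*exp(-(2*alpha + beta)).
    """
    target = alpha * math.exp(-(2.0 * alpha + beta))
    lo, hi = 0.0, min(alpha, 0.5)  # x*exp(-2x) is increasing on [0, 1/2]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-2.0 * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Classical Daley-Kendall limit: alpha -> 1, beta -> 0.
zeta_classical = final_ignorants(1.0, 0.0)

# Almost no initial ignorants and almost all spreaders: eta = zeta/alpha is close to 1/e.
eta_small_alpha = final_ignorants(1e-6, 1.0 - 1e-6) / 1e-6

print(zeta_classical, eta_small_alpha)
```

Under these assumptions the first value agrees with the Daley-Kendall constant to the quoted digits, and the second approaches $1/e$ as $\alpha \to 0$.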
We shall make repeated use of the following theorem, which plays the role
of a basis result for subsequent inductive arguments. Here we are examining
the variation of $\zeta$ with respect to one of $\alpha$, $\beta$, $\gamma$ subject to (4), with another
of $\alpha$, $\beta$, $\gamma$ being fixed.

Theorem 2. Suppose (4) holds in a single-rumour process. Then we have the
following.
(a) For $\beta$ fixed, $\zeta$ is strictly increasing in $\alpha$ for $\alpha \le 1/2$.
(b) For $\beta$ fixed, $\zeta$ is strictly decreasing in $\alpha$ for $\alpha \ge 1/2$.
(c) For $\gamma$ fixed, $\zeta$ is strictly increasing in $\alpha$.
(d) For $\alpha$ fixed, $\zeta$ is strictly decreasing in $\beta$.

This is [BP04, Theorem 3], except that the statements there corresponding
to (a) and (b) are for $\alpha < 1/2$ and $\alpha > 1/2$ respectively. The extensions to
include $\alpha = 1/2$ follow trivially from the continuity of $\zeta$ as a function of $\alpha$.

It is also convenient to articulate the following lemma, the proof of which
is immediate.

Lemma 1. For $x \in [0, 1/2]$, the map $x \mapsto x e^{-2x}$ is strictly increasing.

3 Scenario 1
We now address a compound rumour process in which $n > 1$ broadcasts
are made under Scenario 1. We shall show that the final proportion of the
population never hearing a rumour is minimised when and only when the
second and subsequent broadcasts are made at the successive epochs at which
$s = 0$ occurs. We refer to this procedure as control policy $S$. It is convenient to
consider separately the cases $0 < \alpha \le 1/2$ and $\alpha > 1/2$. Throughout this and
the following two sections, $\xi$ denotes the final proportion of the population
hearing none of the sequence of rumours.
Theorem 3. Suppose (4) holds with $0 < \alpha \le 1/2$, that Scenario 1 applies
and $n > 1$ broadcasts are made. Then
(a) $\xi$ is minimised if and only if the control policy $S$ is adopted;
(b) for $\beta$ fixed, $\xi$ is a strictly increasing function of $\alpha$ under control policy $S$.

Proof. Let $T$ be an optimal control policy, with successive broadcasts occurring
at times $\tau_1 \le \tau_2 \le \dots \le \tau_n$. We denote the proportion of ignorants in
the population at $\tau_k$ by $i_k$ ($k = 1, \dots, n$), so that $i_1 = \alpha$. Since $i$ is strictly
decreasing during the course of each rumour and is continuous at a broadcast
epoch, we have from applying Theorem 1 to each broadcast in turn that

$$i_1 \ge i_2 \ge \dots \ge i_n > \xi > 0, \tag{6}$$

all the inequalities being strict unless two consecutive broadcasts are
simultaneous.
Suppose if possible that $s > 0$ at time $\tau_n - 0$. Imagine the broadcast about
to be made at this epoch were postponed and $s$ allowed to decrease to zero
before that broadcast is made. Denote by $\xi'$ the corresponding final proportion
of ignorants in the population. Since $i$ decreases strictly with time, the final
broadcast would then occur when the proportion of ignorants had a value

$$i_n' < i_n. \tag{7}$$

In both the original and modified systems we have that $s = \beta$ at $\tau_n + 0$.
By Theorem 2(a), (7) implies $\xi' < \xi$, contradicting the optimality of policy
$T$. Hence we must have $s = 0$ at $\tau_n - 0$ and so by Theorem 1 that

$$\tfrac12 > i_n > \xi.$$

Applying Theorem 2(a) again, to the last two broadcasts, gives that $i_n$ is
a strictly increasing function of $i_{n-1}$ and that $\xi$ is strictly increasing in $i_n$.
Hence $\xi$ is strictly increasing in $i_{n-1}$.
If $n = 2$, we have nothing left to prove, so suppose $n > 2$. We shall derive
the desired results by backward induction on the broadcast labels. We suppose
that for some $k$ with $2 < k \le n$ we have
(i) $s = 0$ at time $\tau_j - 0$ for $j = k, k+1, \dots, n$;
(ii) $\xi$ is a strictly increasing function of $i_{k-1}$.
To establish the inductive step, we need to show that $s = 0$ at $\tau_{k-1} - 0$
and that $\xi$ is a strictly increasing function of $i_{k-2}$. The previous paragraph
provides a basis $k = n$ for the backward induction.
If $s > 0$ at $\tau_{k-1} - 0$, then we may envisage again modifying the system,
allowing $s$ to reduce to zero before making broadcast $k - 1$. This entails that,
if there is a proportion $i_{k-1}'$ of ignorants in the population at the epoch of
that broadcast, then

$$0 < i_{k-1}' < i_{k-1}.$$

By (ii) this gives $\xi' < \xi$ and hence contradicts the optimality of $T$, so we
must have $s = 0$ at $\tau_{k-1} - 0$. Theorem 2(a) now yields that $i_{k-1}$ is a strictly
increasing function of $i_{k-2}$, so that by (ii) $\xi$ is a strictly increasing function
of $i_{k-2}$. Thus we have the inductive step and the theorem is proved. □

For the counterpart result for $\alpha > 1/2$, it will be convenient to extend the
notation of Theorem 2 and use $\zeta(i)$ to denote the final proportion of ignorants
when a single rumour beginning with state $(i, \beta, 1 - i - \beta)$ has run its course.

Theorem 4. Suppose (4) holds with $\alpha > 1/2$, that Scenario 1 applies and
$n > 1$ broadcasts are made. Then
(a) $\xi$ is minimised if and only if the control policy $S$ is adopted;
(b) for fixed $\beta$, $\xi$ is a strictly decreasing function of $\alpha$ under control policy $S$.

Proof. First suppose that $i_n \ge 1/2$. By Theorem 1 and (6), this necessitates
that $s > 0$ at time $\tau_2 - 0$. If we withheld broadcast 2 until $s = 0$ occurred,
the proportion $i_2'$ of ignorants at that epoch would then satisfy

$$i_2' = \zeta(i_1) \le \zeta(i_n) = \xi < 1/2.$$

The relations between consecutive pairs of terms in this continued inequality
are given by the definition of $\zeta$, Theorem 2(b), the definition of $\zeta$ again, and
Theorem 1 applied to broadcast $n$.
Hence policy $S$ would give rise to $\xi'$ satisfying

$$\xi' < i_n' \le i_2' \le \xi,$$

contradicting the optimality of $T$. Thus we must have $i_n < 1/2$ and so

$$i_1 \ge i_2 \ge \dots \ge i_k \ge 1/2 > i_{k+1} \ge \dots \ge i_n > \xi$$

for some $k$ with $1 \le k < n$.


Suppose if possible $k > 1$. Then arguing as above gives

$$i_2' = \zeta(i_1) \le \zeta(i_k) \le i_{k+1} < 1/2.$$

The second inequality will be strict unless $s = 0$ at time $\tau_{k+1} - 0$. This leads
to

$$i_3' = \zeta(i_2') \le \zeta(i_{k+1}) \le i_{k+2} < 1/2,$$

and proceeding recursively we obtain

$$i_{n-k+1}' \le i_n < 1/2$$

and so

$$\xi' < \zeta(i_{n-k+1}') \le \zeta(i_n) = \xi.$$

Thus we have $\xi' < \xi$, again contradicting the optimality of $T$. Hence we must
have $k = 1$, and so

$$i_1 > 1/2 > i_2 \ge i_3 \ge \dots \ge i_n > \xi.$$

Consider an optimally controlled rumour starting from state
$(i_2, \beta, 1 - i_2 - \beta)$. By Theorem 3(b), $\xi$ is a strictly increasing function of $i_2$. For $T$
to be optimal, we thus require that $i_2$ be determined by letting the initial
rumour run its full course, that is, that $s = 0$ at $\tau_2 - 0$. This yields Part (a).
Since $\alpha > 1/2$, Theorem 2(b) gives that, with control policy $S$, $i_2$ is a strictly
decreasing function of $\alpha$. Part (b) now follows from the fact that $\xi$ is a strictly
increasing function of $i_2$. □
Remark 1. For an optimal sequence of $n$ broadcasts under Scenario 1,
Theorems 1, 3 and 4 provide

$$\frac{i_k}{i_{k-1}}\, e^{2(i_{k-1} - i_k)} = e^{-\beta} \quad\text{for } 1 < k \le n \tag{8}$$

and

$$\frac{\xi}{i_n}\, e^{2(i_n - \xi)} = e^{-\beta}. \tag{9}$$

Multiplying these relations together yields

$$\frac{\xi}{\alpha}\, e^{2(\alpha - \xi)} = e^{-n\beta},$$

which may be rewritten as

$$\xi\, e^{-2\xi} = \alpha\, e^{-(2\alpha + n\beta)}. \tag{10}$$

By Lemma 1, the left-hand side is a strictly increasing function of $\xi$ for
$\xi \in [0, 1/2]$. Hence (10) determines $\xi$ uniquely.

Remark 2. Equations (8), (9) may be recast as

$$i_k\, e^{-2 i_k} = i_{k-1}\, e^{-(\beta + 2 i_{k-1})} \quad\text{for } 2 \le k \le n \tag{11}$$

and

$$\xi\, e^{-2\xi} = i_n\, e^{-(\beta + 2 i_n)}. \tag{12}$$

Consider the limiting case $\beta \to 0$ and $\gamma \to 0$, which gives the classical
Daley-Kendall limit of a rumour started by a single individual. Since $i_k \le 1/2$ for
$2 \le k \le n$ and $\xi < 1/2$, we have by Lemma 1 that in fact

$$i_k = \xi \quad\text{for } 2 \le k \le n.$$

If $\alpha \le 1/2$, then the above equality actually holds for $1 \le k \le n$. This is also
clear intuitively: in the limit $\beta \to 0$ the reactivation taking place at the second
and subsequent broadcast epochs does not change the system physically. This
cannot occur for $\beta > 0$, which shows that when the initial broadcast is to a
perceptible proportion of the population, as with the mass media, the effects are
qualitatively different from those in the situation of a single initial spreader.
The behaviour of $i_k$ with $n = 5$ broadcasts is depicted in Figure 1(a) with
the traditional choice $\gamma = 0$. In generating the graphs, Equation (11) has been
solved with initial conditions $\beta = 0, 0.2, 0.4, 0.6, 0.8, 1$. The figure illustrates
Remark 2.
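The recursion (11)-(12) and the closed form (10) can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than the procedure used for Figure 1: the helper names `invert_g` and `scenario1` are hypothetical, and each step inverts $x \mapsto x e^{-2x}$ by bisection on $(0, \min(i_{k-1}, 1/2)]$, which Lemma 1 justifies.

```python
import math

def invert_g(target, hi, iters=200):
    """Solve x*exp(-2x) = target for x in (0, hi], with hi <= 1/2 (Lemma 1)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-2.0 * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scenario1(alpha, beta, n):
    """Return [i_1, ..., i_n, xi] for n broadcasts under control policy S."""
    seq = [alpha]
    for _ in range(n):
        prev = seq[-1]
        # Right-hand side of (11) (and of (12) at the last step).
        target = prev * math.exp(-(beta + 2.0 * prev))
        seq.append(invert_g(target, min(prev, 0.5)))
    return seq

alpha, beta, n = 0.8, 0.2, 5          # gamma = 0, five broadcasts
seq = scenario1(alpha, beta, n)

# Closed form (10): xi*exp(-2*xi) = alpha*exp(-(2*alpha + n*beta)).
xi_closed = invert_g(alpha * math.exp(-(2.0 * alpha + n * beta)), 0.5)
print(seq, xi_closed)
```

The last entry of `seq` should agree with `xi_closed` up to rounding, since multiplying the $n$ step relations gives (10).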

4 Monotonicity of $\xi$
In this section we examine the dependence of $\xi$ on the initial conditions for
Scenario 1. Equation (10) can be expressed as

$$n\beta + 2(\alpha - \xi) + \ln \xi - \ln \alpha = 0. \tag{13}$$

A single broadcast may be regarded as an instantiation of Scenario 1 with
$n = 1$. The outcome is independent of the control policy. This enables us to
derive the following extension of Theorem 2 to $n > 1$ broadcasts, $\xi$ taking
the role of $\zeta$. We examine the variation of $\xi$ with respect to one of $\alpha$, $\beta$, $\gamma$
subject to (4) and one of $\alpha$, $\beta$, $\gamma$ being fixed. For example, if $\beta$ is fixed then we
can consider the variation of $\xi$ with respect to $\alpha$ subject to the constraint
$\alpha + \gamma = 1 - \beta$ supplied by (4). For clarity we adopt the notation $(\partial \xi/\partial \alpha)_\beta$ for
the derivative of $\xi$ with respect to $\alpha$ for fixed $\beta$ subject to $\alpha + \gamma = 1 - \beta$. We
use corresponding notation for the other possibilities arising with permutation
of $\alpha$, $\beta$, $\gamma$.

Theorem 5. Suppose (4) holds with $n \ge 1$. Then under Scenario 1 we have
the following.
(a) For $\beta$ fixed, $\xi$ is strictly increasing in $\alpha$ for $\alpha \le 1/2$ and strictly decreasing
in $\alpha$ for $\alpha \ge 1/2$.
(b) For $\alpha$ fixed, $\xi$ is strictly decreasing in $\beta$.
(c) For $\gamma$ fixed, $\xi$ is strictly increasing in $\alpha$.

Proof. The case $n = 1$ is covered by Theorem 2, so we may assume that $n \ge 2$.
Also Part (a) is simply a restatement of Theorem 3(b) and Theorem 4(b).
For parts (b) and (c), we use the fact that $\xi < 1/2$. Implicit differentiation
of (13) yields

$$\left(\frac{\partial \xi}{\partial \beta}\right)_{\alpha} = -\frac{n\,\xi}{1 - 2\xi} < 0$$

and

$$\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma} = \frac{\xi}{\alpha}\cdot\frac{1 + (n-2)\alpha}{1 - 2\xi} > 0$$

for any $n \ge 1$, which yield (b) and (c) respectively. □
Fig. 1. An illustration of Scenario 1 with $\alpha + \beta = 1$ and five broadcasts; panel (a)
plots $i_k$ and panel (b) plots $\theta_k$ against $k$, with curves ranging from $\beta \to 0$ to $\beta \to 1$.
In successive simulations $\beta$ is incremented by 0.2. For visual convenience, linear
interpolation has been made between values of $i_k$ (resp. $\theta_k$) for integral values of $k$.

The following result provides an extension of Corollary 4 of [BP04] to
$n > 1$.

Corollary 1. For any $n \ge 1$, we have $\bar\xi := \sup \xi = 1/2$. This occurs in the
limiting case $\alpha = \gamma \to 1/2$ with $\beta \to 0$.

Proof. From Theorem 5(c) we have for fixed $\gamma > 0$ that $\bar\xi$ is approached in
the limit $\alpha = 1 - \gamma$ with $\beta = 0$. By Theorem 5(a), we have in the limit $\beta = 0$
that $\bar\xi$ arises from $\alpha = 1/2$. This gives the second part of the corollary.
From (13), $\bar\xi$ satisfies

$$1 - 2x + \ln(2x) = 0.$$

It is shown in Corollary 4 of [BP04] that this equation has the unique positive
solution $x = 1/2$. The first part follows. □

Figure 1(a) provides a graphical illustration of Theorem 5(c) for $\gamma \to 0$.
For $\gamma = 0$, the initial state is given by a single parameter $\alpha = i_1 = 1 - \beta$.
Define $\theta_k = i_k/\alpha$ for $1 \le k \le n$ and $\eta = \theta_{n+1} = \xi/\alpha$. Then $\theta_1 = 1$ and

$$\theta_{k+1}\, e^{-2\alpha\theta_{k+1}} = e^{-(2\alpha + k\beta)}, \quad 1 \le k \le n - 1, \tag{14}$$

$$\eta\, e^{-2\alpha\eta} = e^{-(2\alpha + n\beta)}. \tag{15}$$

Remark 3. Put $w = -2\xi$. Then (15) gives

$$w\, e^{w} = -2\alpha\, e^{-(2\alpha + n\beta)},$$

the solution of which is given by the so-called Lambert $w$ function ([CHJK96,
BP04]). A direct application of the Lagrange series expression given in [BP04]
provides a series expansion of the solution.

Remark 4. In the case $\alpha \to 0$ of a vanishingly small proportion of initial
ignorants, we have by (15) that

$$\eta = e^{-n\beta}. \tag{16}$$

Thus the ratio of the final proportion of ignorants to that at the beginning
decays exponentially at a rate equal to the product of the number $n$ of broadcasts
and the proportion $\beta$ of initial spreaders. Two subcases are of interest.
(i) The case $\beta \to 0$ represents a finite number of spreaders in an infinite
population. Almost all of the initial population consists of stiflers, that is,
$\gamma \to 1$, and we have $\eta = 1$. No matter how many broadcasts are made,
the proportion of ignorants remains unchanged.
(ii) In the case $\beta \to 1$ almost all of the initial population consists of
spreaders, and we obtain $\eta = e^{-n}$.

Consider Equation (16) again. For $0 < \beta < 1$, as well as for $\beta \to 1$, we have
that $\eta \to 0$ as $n \to \infty$.
The behaviour of $\theta_k$ for the standard case $\gamma = 0$ is illustrated in
Figure 1(b), for which we solve (14) with various initial conditions for 5
broadcasts. This brings out the variation with $\beta$ more dramatically. The graph
illustrates in particular Remark 4(ii). The curves pass through $(1, 1)$, since
$i_1 = \alpha$ implies $\theta_1 = 1$.

Remark 5. Given initial proportions $\alpha$ of ignorants and $\beta$ of subscribers, with
$0 < \beta < 1$ or with $\beta \to 1$, the required number $n$ of broadcasts to achieve a
target proportion $\eta$ or less of ignorants can be obtained through (15) as

$$n \ge -\frac{1}{\beta}\left[\ln(\eta) + 2\alpha(1 - \eta)\right].$$

For example, consider the conventional case of $\gamma = 0$. Given 20% initial
spreaders ($\beta = 0.2$) in the infinite population, in order to reduce the initial
number of ignorants by 90% (that is, to reduce to a level where $\eta \le 0.1$) at least
five broadcasts are needed (see also Figure 1(b)). The same target is achieved
in three broadcasts if the initial spreaders comprise 60% of the population
($\beta = 0.6$).
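The bound in Remark 5 is easy to evaluate. The following sketch (the function name `min_broadcasts` is an assumption, not from the source) rounds the bound up to the next integer and reproduces the two worked cases with $\gamma = 0$, so that $\alpha = 1 - \beta$.

```python
import math

def min_broadcasts(alpha, beta, eta_target):
    """Smallest integer n with n >= -(ln(eta) + 2*alpha*(1 - eta)) / beta (Remark 5)."""
    bound = -(math.log(eta_target) + 2.0 * alpha * (1.0 - eta_target)) / beta
    return max(1, math.ceil(bound))

# gamma = 0: 20% initial spreaders, then 60% initial spreaders; target eta <= 0.1.
n_20 = min_broadcasts(alpha=0.8, beta=0.2, eta_target=0.1)
n_60 = min_broadcasts(alpha=0.4, beta=0.6, eta_target=0.1)
print(n_20, n_60)
```

The unrounded bounds are roughly 4.31 and 2.64, which is why five and three broadcasts respectively are needed.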
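For $n \ge 1$, Equation (15) can be rewritten as

$$n\beta + 2\alpha(1 - \eta) + \ln \eta = 0. \tag{17}$$

Theorem 6. Suppose $n \ge 1$ and (4) applies. Then under Scenario 1:
(a) for $\beta$ fixed, $\eta$ is strictly decreasing in $\alpha$;
(b) for $\alpha$ fixed, $\eta$ is strictly decreasing in $\beta$;
(c) for $\gamma$ fixed, $\eta$ is strictly decreasing in $\alpha$ for $n = 1$ and strictly increasing
in $\alpha$ for $n \ge 2$.

Proof. We use the facts that $\eta < 1/2$ and $\xi = \alpha\eta < 1/2$. Implicit
differentiation of (17) gives

$$\left(\frac{\partial \eta}{\partial \alpha}\right)_{\beta} = -\frac{2\eta(1 - \eta)}{1 - 2\alpha\eta} < 0,$$

$$\left(\frac{\partial \eta}{\partial \beta}\right)_{\alpha} = -\frac{n\,\eta}{1 - 2\alpha\eta} < 0,$$

which furnish (a) and (b) respectively.
Similarly

$$\left(\frac{\partial \eta}{\partial \alpha}\right)_{\gamma} = -\frac{\eta\,(2 - n - 2\eta)}{1 - 2\alpha\eta}.$$

For $n = 1$ the numerator on the right is positive and so $(\partial \eta/\partial \alpha)_\gamma < 0$. For
$n \ge 2$ the numerator is negative and $(\partial \eta/\partial \alpha)_\gamma > 0$. This completes the proof.
□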
A graphical illustration of Theorem 6(c) for $\gamma = 0$ is given in Figure 1(b).
Theorem 6(c) can be re-expressed as saying that, for fixed $\gamma$, $\eta = \theta_{n+1}$ is
increasing in $\beta$ for $n = 1$ and decreasing for $n > 1$. This is reflected in the
graphs almost having a point of concurrence between $k = 2$ and $k = 3$.
We may interpolate between integer values of $k$ by extending (13) to define
$\xi$ for nonintegral $n \ge 1$, rather than by employing linear interpolation. Doing
this yields exact concurrence of the interpolated curves. To see this, suppose
we write (13) as

$$n(1 - \alpha - \gamma) + 2\alpha(1 - \theta_{n+1}) + \ln \theta_{n+1} = 0. \tag{18}$$

For $\gamma \ge 0$ given, if this curve passes through a point $(n + 1, \theta_{n+1})$ independent
of $\alpha$ we must have

$$2\alpha(1 - \theta_{n+1}) - n\alpha = \text{constant}.$$

This necessitates

$$n = 2(1 - \theta_{n+1}) \tag{19}$$

and so from (18) that

$$n(1 - \gamma) + \ln \theta_{n+1} = 0. \tag{20}$$

Clearly (19) and (20) are together also sufficient for there to be a point of
concurrence.
Elimination of $n$ between (19) and (20) provides

$$2(1 - \theta_{n+1})(1 - \gamma) + \ln \theta_{n+1} = 0. \tag{21}$$

Denote by $\eta_0$ the value of $\eta = \zeta/\alpha$ for a (single) rumour in the limit $\beta \to 0$ and the
same fixed value of $\gamma$ as in the repeated rumour. We have

$$2(1 - \gamma)(1 - \eta_0) + \ln \eta_0 = 0. \tag{22}$$

From (21) and (22) we can identify $\theta_{n+1} = \eta_0$, and (19) then yields
$n = 2(1 - \eta_0)$. We thus have a common point of intersection $(3 - 2\eta_0, \eta_0)$. In
particular, for the traditional choice $\gamma = 0$, we have $\eta_0 \approx 0.203$ and the
common point is approximately $(2.594, 0.203)$, a point very close to the cluster
of points in Figure 1(b).
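The coordinates of the concurrence point can be recomputed from (22). This is a hedged sketch: the function name `eta0` is hypothetical, and the bisection is restricted to $\gamma < 1/2$, where (22) has a nontrivial root below $1/(2(1-\gamma))$ in addition to the trivial root $\eta_0 = 1$.

```python
import math

def eta0(gamma, iters=200):
    """Nontrivial root of 2*(1 - gamma)*(1 - eta) + ln(eta) = 0, assuming gamma < 1/2."""
    if not 0.0 <= gamma < 0.5:
        raise ValueError("this sketch only covers 0 <= gamma < 1/2")
    f = lambda eta: 2.0 * (1.0 - gamma) * (1.0 - eta) + math.log(eta)
    # f is increasing on (0, 1/(2*(1 - gamma))) and positive at the right end.
    lo, hi = 1e-12, 1.0 / (2.0 * (1.0 - gamma))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

e0 = eta0(0.0)
point = (3.0 - 2.0 * e0, e0)   # common point (n + 1, theta_{n+1}) with n = 2*(1 - eta0)
print(point)
```

For $\gamma = 0$ this recovers the Daley-Kendall constant as $\eta_0$ and a first coordinate of about 2.594.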

5 Convexity of $\xi$
We now address second-order monotonicity properties of $\xi$ as a function of
$\alpha$, $\beta$, $\gamma$ in Scenario 1. The properties derived are new for $n = 1$ as well as
for $n \ge 2$. First we establish two results, of some interest in their own right,
which will be useful in the sequel.

Theorem 7. Suppose (4) holds with $n \ge 1$ and Scenario 1 applies. For
$0 < x < 1$ and $\omega > 0$ define

$$h(x, \omega) := \omega + 2(2x - 1) + \ln(1 - x) - \ln x.$$

Then
(a) $h(x, \omega) = 0$ defines a unique $x = \phi(\omega) \in (1/2, 1)$;
(b) $\phi$ is strictly increasing in $\omega$;
(c) $\xi > 1 - \alpha \iff \alpha > \phi(n\beta)$ and $\xi < 1 - \alpha \iff \alpha < \phi(n\beta)$.

Proof. We have

$$\frac{\partial h}{\partial x} = -\frac{(1 - 2x)^2}{x(1 - x)} \le 0,$$

with equality if and only if $x = 1/2$, so $h(\cdot, \omega)$ is strictly decreasing on $(0, 1)$.
Also $h(1/2, \omega) = \omega > 0$ and $h(x, \omega) \to -\infty$ as $x \to 1-$. Part (a) follows.
The relation $h(x, \omega) = 0$ may be written as

$$-\omega = 2(2x - 1) + \ln(1 - x) - \ln x.$$

Part (b) is an immediate consequence, since the right-hand side is a strictly
decreasing function of $x$ on $(0, 1)$.
Since $h$ is strictly decreasing in $x$, we deduce from (a) that

$$h(x, \omega) > 0 \ \text{ for } x < \phi(\omega) \quad\text{and}\quad h(x, \omega) < 0 \ \text{ for } x > \phi(\omega). \tag{23}$$

For $y \in (0, 1)$ put

$$g(\alpha, \omega, y) := \omega + 2(\alpha - y) + \ln y - \ln \alpha.$$

We have readily that $\partial g/\partial y$ is positive for $y < 1/2$ and negative for $y > 1/2$, so
$g$ is strictly increasing in $y$ for $y < 1/2$ and strictly decreasing in $y$ for $y > 1/2$.
Also $g \to \omega > 0$ as $y \to \alpha$ and $g \to -\infty$ as $y \to 0$, whence $g(\alpha, n\beta, \xi) = 0$
defines a unique $\xi \in (0, \alpha \wedge 1/2)$. We have

$$\xi \gtreqless 1 - \alpha \quad\text{according as}\quad g(\alpha, n\beta, 1 - \alpha) \lesseqgtr 0.$$

But $g(\alpha, n\beta, 1 - \alpha) = h(\alpha, n\beta)$. Part (c) now follows immediately from (23).
□

Corollary 2. Under the conditions of the preceding theorem with $n = 1$,

$$\xi \gtreqless \alpha/2 \quad\text{according as}\quad \gamma \gtreqless 1 - \ln 2.$$

Proof. The argument of the theorem gives that

$$\xi \gtreqless \alpha/2 \quad\text{according as}\quad g(\alpha, \beta, \alpha/2) \lesseqgtr 0,$$

that is,

$$\xi \gtreqless \alpha/2 \quad\text{according as}\quad \beta + \alpha - \ln 2 \lesseqgtr 0.$$

The stated result follows from $\alpha + \beta + \gamma = 1$. □
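Theorem 8. Suppose (4) holds with $n \ge 1$ and Scenario 1 applies. Then
(a) for $\alpha$ fixed, $\xi$ is strictly convex in $\beta$;
(b) for $\beta$ fixed, $\xi$ is strictly concave in $\alpha$ for $\alpha \in (0, \phi(n\beta))$ and strictly convex
for $\alpha \in [\phi(n\beta), 1)$;
(c) for $\gamma$ fixed, $\xi$ is strictly convex in $\alpha$ if $n \ge 2$ or $n = 1$ and $\gamma > 1 - \ln 2$;
(d) for $\gamma$ fixed, $\xi$ is strictly concave in $\alpha$ if $n = 1$ and $\gamma < 1 - \ln 2$.

Proof. Implicit differentiation of (13) twice with respect to $\beta$ yields

$$\left(\frac{1}{\xi} - 2\right)\left(\frac{\partial^2 \xi}{\partial \beta^2}\right)_{\alpha} = \frac{1}{\xi^2}\left(\frac{\partial \xi}{\partial \beta}\right)_{\alpha}^{2} > 0,$$

which yields (a). Similarly

$$\left(\frac{1}{\xi} - 2\right)\left(\frac{\partial^2 \xi}{\partial \alpha^2}\right)_{\beta} = \frac{1}{\alpha^2}\left[\left(\frac{1 - 2\alpha}{1 - 2\xi}\right)^{2} - 1\right].$$

The expression in brackets has the same sign as

$$(1 - 2\alpha)^2 - (1 - 2\xi)^2 = -4(\alpha - \xi)\left[1 - (\alpha + \xi)\right],$$

that is, the opposite sign to $1 - (\alpha + \xi)$. By Theorem 7(c), the expression in
brackets is thus negative if $\alpha < \phi(n\beta)$ and positive if $\alpha > \phi(n\beta)$, whence part
(b).
Also by implicit differentiation of (13) twice with respect to $\alpha$,

$$\left(\frac{1}{\xi} - 2\right)\left(\frac{\partial^2 \xi}{\partial \alpha^2}\right)_{\gamma} = \left[\frac{1}{\xi}\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma} - \frac{1}{\alpha}\right]\left[\frac{1}{\xi}\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma} + \frac{1}{\alpha}\right], \tag{24}$$

and a single differentiation gives

$$\frac{1}{\xi}\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma} - \frac{1}{\alpha} = n - 2 + 2\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma}. \tag{25}$$

By Theorem 5(c), the right-hand side of (25) is positive for $n \ge 2$, so the
right-hand side of (24) must be positive and therefore so also the left-hand
side, whence we have the first part of (c).
To complete the proof, we wish to show that for $n = 1$ the right-hand side
of (25) is positive for $\gamma > 1 - \ln 2$ and negative for $\gamma < 1 - \ln 2$. Since, for $n = 1$,

$$\left(\frac{\partial \xi}{\partial \alpha}\right)_{\gamma} = \frac{\xi(1 - \alpha)}{\alpha(1 - 2\xi)},$$

the desired result is established by Corollary 2, completing the proof. □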

6 Scenario 2
Theorem 9. Suppose (4) holds and $n > 1$ broadcasts are made under
Scenario 2. Then
(a) $\xi$ is minimised if and only if control policy $S$ is adopted;
(b) for fixed $\gamma$, $\xi$ is a strictly increasing function of $\alpha$ under control policy $S$.

Proof. The argument closely parallels that of Theorem 3. The proof follows
verbatim down to (7). We continue by noting that in either the original or
modified system $r = \gamma$ at time $\tau_n + 0$. By Theorem 2(c), (7) implies $\xi' < \xi$,
contradicting the optimality of control policy $T$. Hence we must have $s = 0$
at time $\tau_n - 0$. The rest of the proof follows the corresponding argument in
Theorem 3 but with Theorem 2(c) invoked in place of Theorem 2(a). □

Remark 6. The determination of $\xi$ under Scenario 2 with control policy $S$ is
more involved than that under Scenario 1. For $1 \le k \le n$, set $\beta_k = s(\tau_k + 0)$.
Then $i_k + \beta_k = 1 - \gamma = \alpha + \beta$, so that Theorem 1 yields

$$\frac{i_k}{i_{k-1}}\, e^{2(i_{k-1} - i_k)} = e^{-(\alpha + \beta - i_{k-1})} \quad\text{for } 1 < k \le n + 1,$$

where we set $i_{n+1} := \xi$. We may recast this relation as

$$i_k\, e^{-2 i_k} = i_{k-1}\, e^{-(\alpha + \beta + i_{k-1})} \quad\text{for } 1 < k \le n + 1. \tag{26}$$

Since $i_k, \xi \in (0, 1/2)$ for $1 < k \le n$, Lemma 1 yields that (26) determines
$\xi$ uniquely and sequentially from $i_1 = \alpha$.

Figure 2(a), obtained by solving (26), depicts the behaviour of $i_k$ with
$n = 5$ for the standard case of $\gamma = 0$. The initial values $\beta = 0, 0.2, 0.4, 0.6, 0.8, 1$
have been used to generate the graphs.
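Recursion (26) can be solved step by step in the same way as the Scenario 1 recursion. The sketch below is illustrative only; `invert_g`, `scenario2` and `residual27` are hypothetical names, and the residual function simply evaluates the logarithmic form of (26), $\beta + \alpha + i_{k-1} - 2i_k + \ln i_k - \ln i_{k-1}$, which should vanish at each step.

```python
import math

def invert_g(target, hi, iters=200):
    """Solve x*exp(-2x) = target for x in (0, hi], with hi <= 1/2 (Lemma 1)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-2.0 * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scenario2(alpha, beta, n):
    """Return [i_1, ..., i_n, xi] from recursion (26) under control policy S."""
    seq = [alpha]
    for _ in range(n):
        prev = seq[-1]
        target = prev * math.exp(-(alpha + beta + prev))  # right-hand side of (26)
        seq.append(invert_g(target, min(prev, 0.5)))
    return seq

def residual27(alpha, beta, prev, cur):
    """Logarithmic form of (26); zero when (prev, cur) satisfy the recursion."""
    return beta + alpha + prev - 2.0 * cur + math.log(cur) - math.log(prev)

seq2 = scenario2(0.8, 0.2, 5)                       # gamma = 0, five broadcasts
eta_small = scenario2(1e-6, 0.5, 3)[-1] / 1e-6      # alpha -> 0 check
print(seq2, eta_small)
```

With a very small $\alpha$, the ratio $\xi/\alpha$ should be close to $e^{-n\beta}$, as in the Scenario 1 limit.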
As with Scenario 1, we examine the dependence of $\xi$ on the initial
conditions. Equation (26) can be rewritten as

$$\beta + \alpha + i_{k-1} - 2 i_k + \ln i_k - \ln i_{k-1} = 0, \quad 1 < k \le n + 1. \tag{27}$$

We now give the following result as a companion to Theorem 5. As before,
a single broadcast may be regarded as an instantiation of Scenario 2 with
$n = 1$.

Theorem 10. Suppose (4) holds and Scenario 2 applies with $n \ge 1$. Then we
have the following.
(a) For $\alpha$ fixed, $\xi$ is strictly decreasing in $\beta$.
(b) For $\gamma$ fixed, $\xi$ is strictly increasing in $\alpha$.
Fig. 2. An illustration of Scenario 2 with $\alpha + \beta = 1$ and five broadcasts; panel (a)
plots $i_k$ and panel (b) plots $\theta_k$ against $k$, with curves ranging from $\beta \to 0$ to $\beta \to 1$.
In successive simulations $\beta$ is incremented by 0.2. For visual convenience, linear
interpolation has been made between values of $i_k$ (resp. $\theta_k$) for integral values of $k$.

Proof. The case $n = 1$ is covered by Theorem 2, so we may assume that $n \ge 2$.
Part (b) is simply a restatement of Theorem 9(b).
To derive (a), we use an inductive proof to show that

$$\left(\frac{\partial i_k}{\partial \beta}\right)_{\alpha} < 0 \quad\text{for } 2 \le k \le n + 1.$$

Implicit differentiation of (27) for $k = 2$ provides

$$\left(\frac{\partial i_2}{\partial \beta}\right)_{\alpha}\left(\frac{1}{i_2} - 2\right) = -1,$$

supplying a basis. Implicit differentiation for general $k$ gives

$$\left(\frac{\partial i_k}{\partial \beta}\right)_{\alpha}\left(\frac{1}{i_k} - 2\right) = \left(\frac{\partial i_{k-1}}{\partial \beta}\right)_{\alpha}\left(\frac{1}{i_{k-1}} - 1\right) - 1,$$

from which we derive the inductive step and complete the proof. □

The following result provides an extension of Corollary 4 of [BP04] to
$n > 1$ for the context of Scenario 2.

Corollary 3. For any $n \ge 1$, we have $\bar\xi := \sup \xi = 1/2$. This occurs in the
limiting case $\alpha = \gamma \to 1/2$ with $\beta \to 0$.

Proof. With the limiting values of $\alpha$, $\beta$ and $\gamma$, (27) reads as

$$\tfrac12 + i_{k-1} - 2 i_k + \ln i_k - \ln i_{k-1} = 0 \quad\text{for } 1 < k \le n + 1.$$

We may now show by induction that $i_k = 1/2$ for $1 \le k \le n + 1$. The basis is
provided by $\alpha = 1/2$ and the inductive step by the uniqueness result cited in
the second part of the proof of Corollary 1. Since $\xi \le 1/2$, this completes the
proof. □

Using the notation introduced for Scenario 1, the recursive equation (26)
can be rewritten as

$$\eta\, e^{-2\alpha\eta} = \theta_n\, e^{-(\alpha + \beta + \alpha\theta_n)}, \tag{28}$$

$$\theta_k\, e^{-2\alpha\theta_k} = \theta_{k-1}\, e^{-(\alpha + \beta + \alpha\theta_{k-1})}, \quad 1 < k \le n, \tag{29}$$

where $\theta_1 = 1$.

Remark 7. In the case of almost no initial ignorants in the population, that is,
when $\alpha \to 0$, Equations (28), (29) reduce to

$$\eta = \theta_n\, e^{-\beta}, \qquad \theta_k = \theta_{k-1}\, e^{-\beta},$$

which in turn give

$$\eta = e^{-n\beta}.$$

This equation is the same as that obtained in Remark 4 made for Scenario 1.
The rest of the discussion given in Remark 4 also holds for Scenario 2.

Figure 2(b) illustrates the above remark for $\alpha + \beta \to 1$. As with Figure 1(b),
Figure 2(b) shows more dramatically the dependence on $\beta$: for a given initial
value $\alpha$, we have for each $k > 1$ that $\theta_k$ increases with $\beta$, the relative and
absolute effects being both less marked with increasing $k$.

Remark 8. The required number $n$ of broadcasts necessary to achieve a
target proportion $\epsilon$ or less of ignorants may be evaluated by solving (28)-(29)
recursively to obtain the smallest positive integer $n$ for which $\eta \le \epsilon$.

7 Comparison of Scenarios
We now compare the eventual proportions $\xi$ and $\xi^*$ respectively of the
population never hearing a rumour when $n$ broadcasts are made under control
policy $S$ with Scenarios 1 and 2. For clarity we use the superscript $*$ to
distinguish quantities pertaining to Scenario 2 from the corresponding quantities
for Scenario 1.

Theorem 11. Suppose (4) holds and that a sequence of $n$ broadcasts is made
under control policy $S$. Then
(a) if $n > 2$, we have

$$i_k^* < i_k \quad\text{for } 2 < k \le n;$$

(b) if $n \ge 2$, we have

$$\xi^* < \xi.$$

Proof. From (11), (12) (under Scenario 1) and (26) (under Scenario 2), $\xi$ may
be regarded as $i_{n+1}$ and $\xi^*$ as $i_{n+1}^*$, so it suffices to establish Part (a). This
we do by forward induction on $k$.
Suppose that for some $k > 2$ we have

$$i_{k-1}^* \le i_{k-1}. \tag{30}$$

A basis is provided by the trivial relation $i_2^* = i_2$. We have the defining
relations

$$i_k^*\, e^{-2 i_k^*} = i_{k-1}^*\, e^{-(\alpha + \beta + i_{k-1}^*)} \tag{31}$$

and

$$i_k\, e^{-2 i_k} = i_{k-1}\, e^{-(\beta + 2 i_{k-1})}. \tag{32}$$

The inequality

$$i_{k-1}^* < \alpha$$

may be rewritten as

$$\beta + 2 i_{k-1}^* < \alpha + \beta + i_{k-1}^*,$$

so that

$$e^{-(\alpha + \beta + i_{k-1}^*)} < e^{-(\beta + 2 i_{k-1}^*)}.$$

Hence we have using (31) that

$$i_k^*\, e^{-2 i_k^*} < i_{k-1}^*\, e^{-(\beta + 2 i_{k-1}^*)}.$$

Lemma 1 and (30) thus provide

$$i_k^*\, e^{-2 i_k^*} < i_{k-1}\, e^{-(\beta + 2 i_{k-1})}.$$

By (32) and a second application of Lemma 1 we deduce that $i_k^* < i_k$, the
desired inductive step. This completes the proof. □

Theorem 11 can be verified for the case of $\gamma = 0$ by comparing the graphs
in Figures 1(a) and 2(a).
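The comparison can also be checked numerically without the figures. In this sketch (names are hypothetical) the two scenarios differ only in the proportion of spreaders present just after each broadcast: $\beta$ under Scenario 1 and $\alpha + \beta - i_k$ under Scenario 2, as read off from the Scenario 1 recursion and from (26).

```python
import math

def invert_g(target, hi, iters=200):
    """Solve x*exp(-2x) = target for x in (0, hi], with hi <= 1/2 (Lemma 1)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-2.0 * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def broadcast_sequence(alpha, beta, n, scenario):
    """Return [i_1, ..., i_n, xi] under control policy S for Scenario 1 or 2."""
    seq = [alpha]
    for _ in range(n):
        prev = seq[-1]
        # Spreaders just after the broadcast: beta (Scenario 1) or alpha + beta - i_k (Scenario 2).
        spreaders = beta if scenario == 1 else alpha + beta - prev
        target = prev * math.exp(-(2.0 * prev + spreaders))
        seq.append(invert_g(target, min(prev, 0.5)))
    return seq

s1 = broadcast_sequence(0.8, 0.2, 5, scenario=1)
s2 = broadcast_sequence(0.8, 0.2, 5, scenario=2)
for k, (a, b) in enumerate(zip(s1, s2), start=1):
    print(k, a, b)
```

The first two entries coincide, because the first rumour is the same in both scenarios; from the third entry on, the Scenario 2 values are strictly smaller.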

Acknowledgement
Yalcin Kaya acknowledges support by a fellowship from CAPES, Ministry
of Education, Brazil (Grant No. 0138-11/04), for his visit to the Department of
Systems and Computing at the Federal University of Rio de Janeiro, during
which part of this research was carried out.

References
[AH98] Aspnes, J., Hurwood, W.: Spreading rumours rapidly despite an adversary.
Journal of Algorithms, 26, 386-411 (1998)
[Bar72] Barbour, A.D.: The principle of the diffusion of arbitrary constants. J.
Appl. Probab., 9, 519-541 (1972)
[BKP05] Belen, S., Kaya, C.Y., Pearce, C.E.M.: Impulsive control of rumours with
two broadcasts. ANZIAM J. (to appear) (2005)
[BP04] Belen, S., Pearce, C.E.M.: Rumours with general initial conditions.
ANZIAM J., 45, 393-400 (2004)
[Bla85] Blaquiere, A.: Impulsive optimal control with finite or infinite time horizon.
J. Optimiz. Theory Applic., 46, 431-439 (1985)
[Bom03] Bommel, J.V.: Rumors. Journal of Finance, 58, 1499-1521 (2003)
[CHJK96] Corless, R.M., Hare, D.E.G., Jeffrey, D.J., Knuth, D.E.: On the Lambert
W function. Advances in Computational Mathematics, 5, 329-359 (1996)
[DK65] Daley, D.J., Kendall, D.G.: Stochastic rumours. J. Inst. Math. Applic., 1,
42-55 (1965)
[DP03] Dickinson, R.E., Pearce, C.E.M.: Rumours, epidemics and processes of
mass action: synthesis and analysis. Mathematical and Computer Modelling,
38, 1157-1167 (2003)
[DMC01] Donavan, D.T., Mowen, J.C., Chakraborty, C.: Urban legends: diffusion
processes and the exchange of resources. Journal of Consumer Marketing,
18, 521-533 (2001)
[FPRU90] Feige, U., Peleg, D., Rhagavan, P., Upfal, E.: Randomized broadcast in
networks. Random Structures and Algorithms, 1, 447-460 (1990)
[Fro00] Frost, C.: Tales on the internet: making it up as you go along. ASLIB
Proc., 52, 5-10 (2000)
[Gan00] Gani, J.: The Maki-Thompson rumour model: a detailed analysis.
Environmental Modelling and Software, 15, 721-725 (2000)
[MT73] Maki, D.P., Thompson, M.: Mathematical Models and Applications.
Prentice-Hall, Englewood Cliffs (1973)
[OT77] Osei, G.K., Thompson, J.W.: The supersession of one rumour by another.
J. App. Prob., 14, 127-134 (1977)
[Pea00] Pearce, C.E.M.: The exact solution of the general stochastic rumour. Math.
and Comp. Modelling, 31, 289-298 (2000)
[Pit90] Pittel, B.: On a Daley-Kendall model of random rumours. J. Appl.
Probab., 27, 14-27 (1990)
[RZ88] Rempala, R., Zabczyk, J.: On the maximum principle for deterministic
impulse control problems. J. Optim. Theory Appl., 59, 281-288 (1988)
[Sud85] Sudbury, A.: The proportion of the population never hearing a rumour. J.
Appl. Probab., 22, 443-446 (1985)
[Wat87] Watson, R.: On the size of a rumour. Stoch. Proc. Applic., 27, 141-149
(1987)
[Zan01] Zanette, D.H.: Critical behaviour of propagation on small-world networks.
Physical Review E, 64, 050901(R), 4 pages (2001)
Minimization of the Sum of Minima of Convex
Functions and Its Application to Clustering

Alexander Rubinov, Nadejda Soukhoroukova, and Julien Ugon

CIAO, School of Information Technology and Mathematical Sciences
University of Ballarat
Ballarat, VIC 3353, Australia
a.rubinov@ballarat.edu.au, n.soukhoroukova@ballarat.edu.au,
jugon@students.ballarat.edu.au

Summary. We study functions that can be represented as the sum of minima of
convex functions. Minimization of such functions can be used for approximation of
finite sets and their clustering. We suggest using the local discrete gradient (DG)
method [Bag99] and the hybrid method combining the cutting angle method and
the discrete gradient method (DG+CAM) [BRZ05b] for the minimization of these
functions. We report and analyze the results of numerical experiments.

Key words: sum-min function, cluster function, skeleton, discrete gradient
method, cutting angle method

1 Introduction
In this paper we introduce and study a class of sum-min functions. This class
$\mathcal{F}$ consists of functions of the form

$$F(x_1, \dots, x_k) = \sum_{a \in A} \min\bigl(\varphi_1(x_1, a), \varphi_2(x_2, a), \dots, \varphi_k(x_k, a)\bigr),$$

where $A$ is a finite subset of a finite dimensional space and the function
$x \mapsto \varphi_i(x, a)$ is convex for each $i$ and $a \in A$. In particular, the cluster
function (see, for example, [BRY02]) and the Bradley-Mangasarian function [BM00]
belong to $\mathcal{F}$. We also introduce the notion of a skeleton of the set $A$, which
is a version of the Bradley-Mangasarian approximation of a finite set. The search
for skeletons can be carried out by a constrained minimization of a certain
function belonging to $\mathcal{F}$.
We point out some properties of functions $F \in \mathcal{F}$. In particular we show
that these functions are DC (difference of convex) functions.
Functions $F \in \mathcal{F}$ are nonsmooth and nonconvex. If the set $A$ is large
enough then these functions have a large number of shallow local minima.
Some functions $F \in \mathcal{F}$ (in particular, cluster functions) have a saw-tooth form.
The minimization of these functions is a challenging problem. We consider
both local and global minimization of functions $F \in \mathcal{F}$. We suggest using the
derivative-free discrete gradient (DG) method [Bag99] for local minimization
of these functions. For global minimization we use the hybrid method combining
DG and the cutting angle method (DG+CAM) [BRZ05a, BRZ05b] and the
commercial software GAMS (LGO solver); see [GAM05, LGO05] for more
information.
These methods were applied to the minimization of two types of functions
from $\mathcal{F}$: cluster functions $C_k$ (generalized cluster functions $\tilde{C}_k$) and skeleton
functions $L_k$ (generalized skeleton functions $\tilde{L}_k$). These functions are used for
finding clusters in datasets (unsupervised classification).
The notion of clustering is relatively flexible (see [JMF99, BRSY03] for
more information). The goal of clustering is to group points in a dataset in
a way that representatives of the same group (the same cluster) are similar
to each other. There are different notions of similarity. Very often it is
assumed that similar points have similar coordinates because each coordinate
represents measurements of the same characteristic. The functions $C_k$, $\tilde{C}_k$, $L_k$,
$\tilde{L}_k$ can be used to represent the dissimilarity of obtained systems of clusters.
Therefore, a clustering system which gives a minimum of a chosen dissimilarity
function is considered as a desired clustering system. Different dissimilarity
functions lead to different approaches to clustering, therefore different
clustering results can be obtained by the minimization of functions $F \in \mathcal{F}$.
We report results of numerical experiments and analyze these results.

2 A class of sum-min functions


2.1 Functions represented as the sum of minima of convex
functions

Consider finite dimensional vector spaces $\mathbb{R}^m$ and $\mathbb{R}^n$. Let $A \subset \mathbb{R}^m$ be a finite
set and let $k$ be a positive integer. Consider a function $F$ defined on $(\mathbb{R}^n)^k$
by

$$F(x_1,\dots,x_k) = \sum_{a\in A} \min\bigl(\varphi_1(x_1,a), \varphi_2(x_2,a), \dots, \varphi_k(x_k,a)\bigr), \tag{1}$$

where $x \mapsto \varphi_i(x,a)$ is a convex function defined on $\mathbb{R}^n$ ($i = 1,\dots,k$, $a \in A$).
We do not assume that this function is smooth. We denote the class of
functions of the form (1) by $\mathcal{F}$.
The search for some geometric characteristics of a finite set can be
accomplished by minimization (either unconstrained or constrained) of functions
from $\mathcal{F}$ (see, for example, [BRY02, BM00]). Location problems (see, for
example, [BLM02]) can also be reduced to the minimization of functions from $\mathcal{F}$.
Minimization of the Sum of Minima of Convex Functions 411

The minimization of a function $F \in \mathcal{F}$ is a min-sum-min problem. We can also
consider min-max-min problems with the objective function

$$\bar{F}(x_1,\dots,x_k) = \max_{a\in A} \min\bigl(\varphi_1(x_1,a), \varphi_2(x_2,a), \dots, \varphi_k(x_k,a)\bigr).$$

Using the sum-min function $F$ we take into account the contribution of each
point $a \in A$ to a characteristic of the set $A$, which is described by means of
the functions $\varphi_i(x,a)$. This is not true if we consider $\bar{F}$. From this point of view,
the minimization of sum-min functions is preferable for examination of many
characteristics of finite sets.

2.2 Some properties of functions belonging to $\mathcal{F}$

Let $F \in \mathcal{F}$, that is

$$F(x_1,\dots,x_k) = \sum_{a\in A} \min_{i=1,\dots,k} \varphi_i(x_i,a),$$

where $x \mapsto \varphi_i(x,a)$ is a convex function. Then $F$ enjoys the following
properties:
1. $F$ is quasidifferentiable ([DR95]). Moreover, $F$ is DC (the difference of
convex functions). Indeed, we have (see for example [DR95], p. 108):

$$F(x) = f_1(x) - f_2(x), \qquad x = (x_1,\dots,x_k),$$

where

$$f_1(x) = \sum_{a\in A}\sum_{i=1}^{k} \varphi_i(x_i,a), \qquad f_2(x) = \sum_{a\in A}\max_{i=1,\dots,k}\sum_{j\neq i} \varphi_j(x_j,a).$$

Both $f_1$ and $f_2$ are convex functions. The pair $DF(x) = (\partial f_1(x), -\partial f_2(x))$ is
a quasidifferential [DR95] of $F$ at a point $x$. Here $\partial f$ stands for the convex
subdifferential of a convex function $f$.
2. Since $F$ is DC, it follows that this function is locally Lipschitz.
3. Since $F$ is DC, it follows that this function is semi-smooth.

We can use quasidifferentials of a function $F \in \mathcal{F}$ for a local approximation
of this function near a point $x$. The Clarke subdifferential can also be used for local
approximation of $F$, since $F$ is locally Lipschitz.
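
The DC representation in property 1 can be checked numerically. The following is a minimal sketch, assuming $\varphi_i(x,a) = \|x - a\|_1$ (the cluster-function case) and a small made-up dataset; the helper names `sum_min`, `f1` and `f2` are illustrative only and do not come from the paper.

```python
def l1(x, a):
    """l1 distance between two points given as sequences of floats."""
    return sum(abs(xi - ai) for xi, ai in zip(x, a))

def sum_min(centres, data):
    """F(x) = sum over a of min_i phi_i(x_i, a), with phi_i(x, a) = ||x - a||_1."""
    return sum(min(l1(x, a) for x in centres) for a in data)

def f1(centres, data):
    """First convex part: sum over a of sum_i phi_i(x_i, a)."""
    return sum(sum(l1(x, a) for x in centres) for a in data)

def f2(centres, data):
    """Second convex part: sum over a of max_i sum_{j != i} phi_j(x_j, a)."""
    total = 0.0
    for a in data:
        phis = [l1(x, a) for x in centres]
        s = sum(phis)
        # Dropping the i-th term from the full sum gives sum_{j != i} phi_j.
        total += max(s - p for p in phis)
    return total

# Hypothetical data and centres, chosen only for illustration.
data = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0), (5.0, 3.5)]
centres = [(0.5, 0.0), (4.5, 4.0), (2.0, 2.0)]

lhs = sum_min(centres, data)
rhs = f1(centres, data) - f2(centres, data)
```

The identity holds pointwise because, for each $a$, the minimum of the $\varphi_i$ equals their sum minus the largest sum with one term left out.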

3 Examples
We now give some examples of functions belonging to the class $\mathcal{F}$. In all the
examples, datasets are denoted as finite sets $A \subset \mathbb{R}^n$, that is as sets of
$n$-dimensional points (also called observations).

3.1 Cluster functions and generalized cluster functions

Assume that a finite set $A \subset \mathbb{R}^n$ consists of $k$ clusters. Let $X = \{x_1,\dots,x_k\} \subset
(\mathbb{R}^n)^k$. Consider the distance $d(X,a) = \min(\|x_1 - a\|,\dots,\|x_k - a\|)$ between
the set $X$ and a point (observation) $a \in A$. (It is assumed that $\mathbb{R}^n$ is equipped
with a norm $\|\cdot\|$.) The deviation of $X$ from $A$ is the quantity $d(X,A) =
\sum_{a\in A} d(X,a)$. Let $\bar{X} = \{\bar{x}_1,\dots,\bar{x}_k\}$ be a solution to the problem:

$$\min_{x_1,\dots,x_k\in\mathbb{R}^n} \sum_{a\in A} \min\{\|x_1 - a\|,\dots,\|x_k - a\|\}.$$

Then $\bar{x}_1,\dots,\bar{x}_k$ can be considered as the centres of the required clusters. (It is
implicitly assumed that these are point-centred clusters.) If the cluster centres
are known, each point is assigned to the cluster with the nearest centre. Assume
that $N$ is the cardinality of the set $A$. The function

$$C_k(x_1,\dots,x_k) = \frac{1}{N}\, d(X,A) = \frac{1}{N}\sum_{a\in A} \min(\|x_1 - a\|,\dots,\|x_k - a\|) \tag{2}$$

is called a cluster function. This function has the form (1) with $\varphi_i(x,a) =
\|x - a\|$ for each $a \in A$ and $i = 1,\dots,k$. The cluster function was examined
in [BRY02]. Some numerical methods for its minimization were suggested in
[BRY02].
The cluster function has a saw-tooth form and the number of teeth
drastically increases as the number of addends in (2) increases. This leads to an
increase in the number of shallow local minima and saddle points. If the norm
$\|\cdot\|$ is a polyhedral one, say $\|\cdot\| = \|\cdot\|_1$, then the cluster function is piecewise
linear with a very large number of different linear pieces. The restriction of
the cluster function to a one-dimensional line has the form of a saw with a
huge number of teeth of different size but of the same slope.
Let $(m_a)_{a\in A}$ be a family of positive numbers. The function

$$\tilde{C}_k(x_1,\dots,x_k) = \frac{1}{N}\sum_{a\in A} m_a \min(\|x_1 - a\|,\dots,\|x_k - a\|) \tag{3}$$

is called a generalized cluster function. Clearly $\tilde{C}_k$ has the form (1). The
structure of this function is similar to the structure of the cluster function, however
different teeth of a generalized cluster function can have different slopes.
Clusters constructed according to centres obtained as a result of the
cluster function minimization are called centre-based clusters.
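
The cluster function (2), the generalized cluster function (3) and the nearest-centre assignment can be sketched as follows. This is only an illustration under stated assumptions: the norm is taken to be $\|\cdot\|_1$, the function and parameter names are hypothetical, and the normalising constant of (3) is passed explicitly as `normaliser`.

```python
def l1(x, a):
    """l1 distance between two points."""
    return sum(abs(xi - ai) for xi, ai in zip(x, a))

def cluster_function(centres, data):
    """Cluster function (2): average distance from each observation to its nearest centre."""
    return sum(min(l1(x, a) for x in centres) for a in data) / len(data)

def generalized_cluster_function(centres, data, weights, normaliser):
    """Generalized cluster function (3): weighted nearest-centre distances divided by N."""
    total = sum(m * min(l1(x, a) for x in centres) for a, m in zip(data, weights))
    return total / normaliser

def assign(centres, data):
    """Index of the nearest centre for every observation (centre-based clusters)."""
    return [min(range(len(centres)), key=lambda i: l1(centres[i], a)) for a in data]

# Hypothetical one-feature-dominant dataset with two obvious groups.
data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centres = [(0.5, 0.0), (10.5, 0.0)]

c_value = cluster_function(centres, data)
g_value = generalized_cluster_function(centres, data, [1, 1, 1, 1], len(data))
labels = assign(centres, data)
```

With all weights equal to 1 the two functions coincide, which is the situation of the original (unreduced) datasets.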

3.2 Bradley-Mangasarian approximation of a finite set

If a finite set $A$ consists of flat parts it can be approximated by a collection
of hyperplanes. This kind of approximation was suggested by P.S. Bradley
and O.L. Mangasarian [BM00]. Assume that we are looking for a collection
of $k$ hyperplanes $H_i = \{x : [l_i,x] = c_i\}$ approximating the set $A$. (Here $[l,x]$
stands for the inner product of vectors $l$ and $x$.) The following optimization
problem was considered in [BM00]:

$$\text{minimize } \sum_{a\in A} \min_{i=1,\dots,k} ([l_i,a] - c_i)^2 \quad \text{subject to } \|l_i\|_2 = 1,\ i = 1,\dots,k. \tag{4}$$

Here $\min_{i=1,\dots,k}([l_i,a] - c_i)^2$ is the square of the 2-norm distance between a point
$a$ and the nearest hyperplane from the given collection. The function

$$G((l_1,c_1),\dots,(l_k,c_k)) = \sum_{a\in A} \min_{i=1,\dots,k} ([l_i,a] - c_i)^2$$

can be represented in the form (1):

$$G((l_1,c_1),\dots,(l_k,c_k)) = \sum_{a\in A} \min_{i=1,\dots,k} \varphi((l_i,c_i),a),$$

where $\varphi((l,c),a) = ([l,a] - c)^2$.

3.3 Skeleton of a finite set of points

We now consider a version of the Bradley-Mangasarian definition, where the
distances to hyperplanes are used instead of the squares of these distances.
Assume that $\mathbb{R}^n$ is equipped with a norm $\|\cdot\|$. Let $A$ be a finite set of points.
Consider vectors $l_1,\dots,l_k$ with $\|l_i\|_* = \max_{\|x\|=1}[l_i,x] = 1$ and numbers $c_i$
($i = 1,\dots,k$). Let $H_i = \{x : [l_i,x] = c_i\}$ and $H = \bigcup_i H_i$. Then the distance
between the set $H_i$ and a point $a$ is $d(a,H_i) = |[l_i,a] - c_i|$ and the distance
between the set $H$ and $a$ is

$$d(a,H) = \min_i |[l_i,a] - c_i|. \tag{5}$$

The deviation of $H$ from $A$ is

$$\sum_{a\in A} d(a,H) = \sum_{a\in A} \min_i |[l_i,a] - c_i|.$$

The function

$$L_k((l_1,c_1),\dots,(l_k,c_k)) = \sum_{a\in A} \min_i |[l_i,a] - c_i| \tag{6}$$

is of the form (1). Consider the following constrained min-sum-min problem

$$\min \sum_{a\in A} \min_i |[l_i,a] - c_i| \quad \text{subject to } \|l_j\| = 1,\ c_j \in \mathbb{R}\ (j = 1,\dots,k). \tag{7}$$

A solution of this problem will be called a $k$-skeleton of the set $A$. The function
in (7) is called the skeleton function.
More precisely, a $k$-skeleton is the union of $k$ hyperplanes $\{x : [l_i,x] = c_i\}$,
where $((l_1,c_1),\dots,(l_k,c_k))$ is a solution of (7). If the skeletons are known,
each point is assigned to the cluster with the nearest skeleton. It is difficult
to find a global minimizer of (7), so sometimes we can consider the union of
hyperplanes that is formed by a local solution of (7) as a skeleton.
Clusters constructed according to skeletons obtained as a result of the
skeleton function minimization are called skeleton-based clusters.
The concept of the shape of a finite set of points was introduced and studied
in [SU05]. By definition, the shape is a minimal (in a certain sense) ellipsoid
which contains the given set. A technique to find an ellipsoidal shape is then
proposed in the same paper. In many instances the geometric characterization
of a set $A$ can be viewed as the intersection between its shape, describing its
external boundary, and its skeleton, describing its internal aspect.
A comparative study of the Bradley-Mangasarian approximation and skeletons
was undertaken in [GRZ05]. It was shown there that skeletons are quite
different from the Bradley-Mangasarian approximation, even for simple sets.
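
To make the difference between the two objectives concrete, the sketch below evaluates the skeleton function (6) and the Bradley-Mangasarian function of Subsection 3.2 on the same collection of lines in the plane. It is an illustration only: the Euclidean norm is assumed (so the normals have unit 2-norm), the dataset is made up, and all function names are hypothetical.

```python
def inner(l, a):
    """Inner product [l, a]."""
    return sum(li * ai for li, ai in zip(l, a))

def skeleton_function(planes, data):
    """L_k from (6): sum over a of min_i |[l_i, a] - c_i|; planes is a list of (l, c)."""
    return sum(min(abs(inner(l, a) - c) for l, c in planes) for a in data)

def bm_function(planes, data):
    """Bradley-Mangasarian objective: sum over a of min_i ([l_i, a] - c_i)^2."""
    return sum(min((inner(l, a) - c) ** 2 for l, c in planes) for a in data)

def assign_to_skeleton(planes, data):
    """Index of the nearest hyperplane for every observation (skeleton-based clusters)."""
    return [min(range(len(planes)), key=lambda i: abs(inner(planes[i][0], a) - planes[i][1]))
            for a in data]

# Two lines through the origin: {x : x_1 = 0} and {x : x_2 = 0}.
planes = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
# Four points lying on the lines and one point at distance 0.5 from the second line.
data = [(0.0, 1.0), (0.0, -2.0), (3.0, 0.0), (-1.0, 0.0), (2.0, 0.5)]

skeleton_value = skeleton_function(planes, data)
bm_value = bm_function(planes, data)
labels = assign_to_skeleton(planes, data)
```

Because the Bradley-Mangasarian objective squares the deviations, a point far from every hyperplane influences it much more strongly than it influences the skeleton function.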

3.4 Illustrative examples

We now give two illustrative examples.

Example 1. Consider the set depicted in Fig. 1.

Fig. 1. Clusters based on centres

Clearly this set consists of two clusters; the centres of these clusters (points
$x_1$ and $x_2$) can be found by the minimization of the cluster function. The
skeleton of this set hardly depends on the number $k$ of hyperplanes (straight
lines). For each $k$ this skeleton cannot give a clear picture of the structure
of the set.

Example 2. Consider now the set depicted in Fig. 2.

Fig. 2. Clusters based on skeletons

It is difficult to say how many point-centred clusters this set has. Its
description by means of such clusters cannot clarify its structure. At the same
time this structure can be described by the intersection of its skeleton,
consisting of three straight lines, and its shape. It does not make sense to consider
$k$-skeletons of the given set with $k > 3$.

4 Minimization of sum-min functions belonging to class $\mathcal{F}$
Consider the function $F$ defined by (1):

$$F(x_1,\dots,x_k) = \sum_{a\in A} \min\bigl(\varphi_1(x_1,a), \varphi_2(x_2,a), \dots, \varphi_k(x_k,a)\bigr), \qquad x_i \in \mathbb{R}^n,\ i = 1,\dots,k,$$

where $A \subset \mathbb{R}^m$ is a finite set. This function depends on $n \times k$ variables. In
real-world applications $n \times k$ is a fairly large number and the set $A$ contains
hundreds or thousands of points. In such a case the function $F$ has a huge
number of shallow local minimizers that are very close to each other. The
minimization of such functions is a challenging problem.
In this paper we consider both local and global minimization of sum-min
functions from $\mathcal{F}$. First we discuss possible local techniques for the
minimization.
The calculation of even one of the Clarke subgradients and/or a
quasidifferential of function (1) is a difficult task, so methods of nonsmooth optimization
based on subgradient information (quasidifferential information) at each
iteration are not effective for the minimization of $F$. It seems that derivative-free
methods are more effective for this purpose.
For the local minimization of functions (1) we propose to use the so-called
discrete gradient (DG) method, which was introduced and studied by Adil
Bagirov (see, for example, [Bag99]). A discrete gradient is a certain finite
difference approximating the Clarke subgradient or a quasidifferential. In contrast
with many other finite differences, the discrete gradient is defined with respect
to a given direction. This leads to a good enough approximation of Clarke
subgradients (quasidifferentials). DG calculates discrete gradients step by step; if
the current point is not an approximate stationary point then, after
a finite number of iterations, the algorithm calculates a descent direction.
Armijo's method is used in DG for a line search.
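
The internals of DG are not reproduced here, but the line-search step can be illustrated in isolation. The following is a generic Armijo-type backtracking sketch, not the implementation used in DG: it assumes that a descent direction and a positive estimate `slope` of the rate of decrease along it are already available, and the parameters `beta` and `c` are conventional illustrative choices.

```python
def armijo_step(f, x, direction, slope, step=1.0, beta=0.5, c=1e-4, max_reductions=50):
    """Armijo-type backtracking line search.

    Returns a step t such that f(x + t*d) <= f(x) - c * t * slope,
    or 0.0 if no such step is found after max_reductions reductions.
    """
    fx = f(x)
    t = step
    for _ in range(max_reductions):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx - c * t * slope:
            return t
        t *= beta  # shrink the step and try again
    return 0.0

# Nonsmooth test function: f(x) = |x_1| + |x_2|.
f = lambda x: abs(x[0]) + abs(x[1])
x = [1.0, 1.0]
direction = [-1.0, -1.0]   # f decreases at rate 2 along this direction near x
t = armijo_step(f, x, direction, slope=2.0, step=4.0)
x_new = [xi + t * di for xi, di in zip(x, direction)]
```

Starting from the deliberately long trial step 4.0, the search rejects the steps 4.0 and 2.0 (they overshoot the kink at the origin) and accepts 1.0.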
The calculation of discrete gradients is much easier if the number of
addends in (1) is not very large. Decreasing the number of addends also leads
to a drastic reduction in the number of shallow local minima. Since
the number of addends is equal to the number of points in the dataset, we
conclude that the results of the application of DG for minimization of (1)
significantly depend on the size of the set $A$.
The discrete gradient method is a local method, which may terminate in
a local minimum. In order to ascertain the quality of the solution reached, it
is necessary to apply global methods. Here we call a method global if it
does not get trapped at stationary points and can leave local minima
for a better solution.
Various combinations of local and global techniques have recently
been studied (see, for example, [HF02, YLT04]).
We use a combination of the DG and the cutting angle method (DG+CAM)
in our experiments. We call this method the hybrid global method.
These two techniques (DG and DG+CAM) have been included in a new
optimization software (CIAO-GO) created recently at the Centre for Infor-
matics and Applied Optimization (CIAO) at the University of Ballarat, see
[CIA05] for more information. This version of the CIAO-GO software (Centre
for Informatics and Applied Optimization-Global Optimization) allows one to
use four different solvers
1. DG,
2. DG multi start,
3. DG+CAM,
4. DG+CAM multi start.
Working with this software users have to input
• an objective function (for minimization),
• an initial point for optimization,
• upper and lower bounds for variables,
• constraints and a penalty constant (in the case of constrained optimiza-
tion), constraints can be represented as equalities and inequalities,
• maximal running time,
• maximal number of iterations.

"Multi start" option in CIAO-GO means that the program starts from the
initial point chosen by a user and also generates 4 additional random initial
points. The final result is the best obtained result. The additional initial points
are generated by CIAO-GO from the corresponding feasible region (or close
to the feasible region).
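
The multi-start logic described above can be sketched as follows. This is a hedged illustration, not CIAO-GO code: `coordinate_search` is a simple stand-in local solver (it is not the DG method), the box bounds play the role of the feasible region, and all names are hypothetical.

```python
import random

def coordinate_search(f, x0, step=1.0, tol=1e-6):
    """Stand-in derivative-free local solver (NOT the DG method): a basic compass search."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # refine the mesh when no neighbour is better
    return x, fx

def multi_start(f, x0, lower, upper, local_solver, extra_starts=4, seed=0):
    """Run the local solver from the user's point and from extra random points; keep the best."""
    rng = random.Random(seed)
    starts = [list(x0)]
    for _ in range(extra_starts):
        starts.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
    results = [local_solver(f, s) for s in starts]
    return min(results, key=lambda r: r[1])

# Sum-min test function: two centres on the real line, four observations.
observations = [0.0, 1.0, 10.0, 11.0]
f = lambda x: sum(min(abs(x[0] - a), abs(x[1] - a)) for a in observations)

x0 = [0.5, 0.5]
single_x, single_f = coordinate_search(f, x0)
best_x, best_f = multi_start(f, x0, [0.0, 0.0], [11.0, 11.0], coordinate_search)
```

Since the user's point is always among the starting points, the multi-start result can never be worse than the single-start result obtained with the same local solver.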
As a global optimization technique we use the General Algebraic Mod-
eling System (GAMS), see [GAM05] for more information. We use the Lip-
schitz global optimizer (LGO) solver [LGO05] from Pinter Consulting Services
[Pin05].

5 Minimization of generalized cluster function


In this section we discuss the application of DG, DG+CAM and the LGO solver to the
minimization of generalized cluster functions. We propose several approaches
for selecting initial points.

5.1 Construction of generalized cluster functions

Consider a set $A \subset \mathbb{R}^n$ that contains $N$ points. Choose $\varepsilon > 0$. Then choose
a random vector $b^1 \in A$ and consider the subset $A_{b^1} = \{a \in A : \|a - b^1\| < \varepsilon\}$
of the set $A$. Take randomly a point $b^2 \in A_1 = A \setminus A_{b^1}$. Let $A_{b^2} = \{a \in A_1 :
\|a - b^2\| < \varepsilon\}$ and $A_2 = A_1 \setminus A_{b^2}$. If the set $A_{j-1}$ is known, take randomly
$b^j \in A_{j-1}$, define the set $A_{b^j}$ as $\{a \in A_{j-1} : \|a - b^j\| < \varepsilon\}$ and define the set $A_j$
as $A_{j-1} \setminus A_{b^j}$. The result of the described procedure is the set $B = \{b^j\}_{j=1}^{N_B}$,
which is a subset of the original dataset $A$. The vector $b^j$ is a representative
for the whole group of vectors removed at step $j$.
If $m_j$ is the cardinality of $A_{b^j}$ then the generalized cluster function
corresponding to $B$,

$$\tilde{C}_k(x^1,\dots,x^k) = \frac{1}{N}\sum_{j} m_j \min(\|x^1 - b^j\|,\dots,\|x^k - b^j\|),$$

can be used for finding centres of clusters of the set $A$.
The size of the dataset $B$ obtained as the result of the described
procedure is the most important parameter, so we shall use this parameter for
the characterization of $B$.
It can be proved (see [BRSY03]) that this function does not differ by more
than $\varepsilon$ from the original cluster function.
Remark 1. We can use the same idea to construct the generalized skeleton
function.
Remark 2. Unfortunately, it is very difficult to know a priori the value of
$\varepsilon$ which allows one to remove a certain proportion of observations. In our
experiments we had to try several values of $\varepsilon$ before we found suitable ones.
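
The reduction procedure of this subsection can be sketched as below. The sketch assumes the $\|\cdot\|_1$ norm and a synthetic two-group dataset; `reduce_dataset` and the other names are hypothetical. It also compares the cluster function on $A$ with the generalized cluster function on $B$ at the same centres, which by the triangle inequality differ by less than $\varepsilon$.

```python
import random

def l1(x, a):
    """l1 distance between two points."""
    return sum(abs(xi - ai) for xi, ai in zip(x, a))

def reduce_dataset(data, eps, seed=0):
    """Pick random representatives b^j and remove all remaining points within eps of them.

    Returns the representatives and their weights m_j (sizes of the removed groups).
    """
    rng = random.Random(seed)
    remaining = list(data)
    reps, weights = [], []
    while remaining:
        b = rng.choice(remaining)
        group_size = sum(1 for a in remaining if l1(a, b) < eps)  # b itself is counted
        remaining = [a for a in remaining if l1(a, b) >= eps]
        reps.append(b)
        weights.append(group_size)
    return reps, weights

def cluster_function(centres, data):
    return sum(min(l1(x, a) for x in centres) for a in data) / len(data)

def generalized_cluster_function(centres, reps, weights, n_total):
    return sum(m * min(l1(x, b) for x in centres) for b, m in zip(reps, weights)) / n_total

# Synthetic dataset (an assumption for illustration): two Gaussian groups in the plane.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + \
       [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(100)]
eps = 0.5
reps, weights = reduce_dataset(data, eps)
centres = [(0.0, 0.0), (6.0, 6.0)]
gap = abs(cluster_function(centres, data)
          - generalized_cluster_function(centres, reps, weights, len(data)))
```

As Remark 2 notes, the number of representatives obtained for a given $\varepsilon$ is hard to predict, so in practice one would rerun `reduce_dataset` with several values of `eps`.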

5.2 Initial points

Most methods of local optimization are very sensitive to the choice of an initial
point. In this section we suggest a choice of initial points which can be used
for the minimization of cluster functions and generalized cluster functions.
Consider a set $A \subset \mathbb{R}^n$ that contains $N$ points. Assume that we want to
find $k$ clusters in $A$. In this case an initial point is a vector $x \in \mathbb{R}^{n\times k}$. The
structure of the problem under consideration leads to different approaches to
the choice of initial points. We suggest the following four approaches.
k-meansL1 initial point. The k-meansL1 method is a version of the
well-known k-means method (see, for example, [MST94]), where $\|\cdot\|_1$ is used
instead of $\|\cdot\|_2$. (We use $\|\cdot\|_1$ in the numerical experiments; this is the reason
for considering k-meansL1 instead of k-means.) We use the following
procedure in order to sort $N$ observations into $k$ clusters:
1. Take any $k$ observations as the centres of the first $k$ clusters.
2. Assign the remaining $N - k$ observations to one of the $k$ clusters on the
basis of the shortest distance (in the sense of the $\|\cdot\|_1$ norm) between an
observation and the mean of the cluster.
3. After each observation has been assigned to one of the $k$ clusters, the
means are recomputed (updated).
Stopping criterion: there is no observation which moves from one cluster to
another.
Note that the results of this procedure depend on the choice of the initial
observations.
We apply this algorithm to the original dataset $A$ and then the resulting point
$x \in \mathbb{R}^{n\times k}$ is considered as an initial point for the minimization of the generalized
cluster function generated by the dataset $B$.
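
The k-meansL1 procedure can be sketched as follows. This is a batch variant written under stated assumptions: the first $k$ observations are taken as the initial centres (the text allows any $k$ observations), an empty cluster keeps its previous centre, and `max_iter` is a safeguard that is not part of the description above.

```python
def l1(x, a):
    """l1 distance between two points."""
    return sum(abs(xi - ai) for xi, ai in zip(x, a))

def kmeans_l1(data, k, max_iter=100):
    """k-means with the l1 norm used for assignment and cluster means used as centres."""
    centres = [list(a) for a in data[:k]]          # step 1: first k observations
    labels = [None] * len(data)
    for _ in range(max_iter):
        # step 2: assign every observation to the cluster with the nearest mean
        new_labels = [min(range(k), key=lambda i: l1(centres[i], a)) for a in data]
        if new_labels == labels:                   # stopping criterion: nobody moved
            break
        labels = new_labels
        # step 3: recompute the means
        for i in range(k):
            members = [a for a, lab in zip(data, labels) if lab == i]
            if members:                            # keep the old centre if a cluster is empty
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels

# Hypothetical dataset with two well-separated groups.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = kmeans_l1(data, 2)

# The initial point for the optimization is the concatenation of the k centres.
initial_point = [v for centre in centres for v in centre]
```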
Uniform initial point. The application of optimization methods to clustering
requires a certain amount of data processing. In particular, a scaling procedure should
be applied. In our experiments we convert a given dataset to a dataset with
the mean value 1 for each feature (coordinate). In such a case we can choose
the point $x = (1,1,\dots,1) \in \mathbb{R}^{n\times k}$ as the initial one. We shall call it the uniform
initial point.
Ordered initial point. Recall that $m_j$ indicates the cardinality of the set
of points $A_{b^j} \subset A$ which are represented by a point $b^j \in B$. It is natural
to consider the collection of the heaviest $k$ points as an initial vector for the
minimization of the generalized cluster function $\tilde{C}_k$. To formalize this, we rearrange
the points so that the numbers $m_j$, $j = 1,\dots,N_B$, decrease and take the first
$k$ points from this rearranged dataset. Thus, in order to construct an initial
point we choose the $k$ observations with the largest weights $m_j$ from
the dataset $B$.
Uniform-ordered initial point. This initial point is a hybrid between the
Uniform and the Ordered initial points. It contains the heaviest $k - 1$
observations and the barycentre (each coordinate is 1).
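
The three weight-based and uniform constructions can be sketched together. The sketch assumes that scaling is done by dividing each feature by its mean (the text only states that the scaled features have mean 1, and this requires nonzero means); the function names and the small set of representatives are hypothetical.

```python
def scale_to_unit_mean(data):
    """Divide every feature by its mean so that each feature has mean value 1 (assumed scaling)."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [tuple(v / m for v, m in zip(a, means)) for a in data]

def uniform_point(n_features, k):
    """Uniform initial point: every coordinate of every centre equals 1."""
    return [1.0] * (n_features * k)

def ordered_point(reps, weights, k):
    """Ordered initial point: the k representatives with the largest weights m_j."""
    order = sorted(range(len(reps)), key=lambda j: weights[j], reverse=True)
    return [v for j in order[:k] for v in reps[j]]

def uniform_ordered_point(reps, weights, k):
    """Uniform-ordered initial point: the k - 1 heaviest representatives plus the barycentre."""
    n_features = len(reps[0])
    return ordered_point(reps, weights, k - 1) + [1.0] * n_features

# Hypothetical representatives b^j with weights m_j.
reps = [(1.0, 2.0), (0.5, 0.5), (1.5, 0.5)]
weights = [3, 10, 7]

scaled = scale_to_unit_mean([(1.0, 2.0), (3.0, 6.0)])
p_uniform = uniform_point(2, 2)
p_ordered = ordered_point(reps, weights, 2)
p_mixed = uniform_ordered_point(reps, weights, 2)
```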

6 Numerical experiments with generalized cluster function
For numerical experiments we use two types of datasets, namely the original
dataset A and a small dataset B obtained by the procedure described in
Subsection 5.1. We compare results obtained for B with the results obtained
for the entire original dataset A.

6.1 Datasets

We carried out numerical experiments with two well-known test datasets (see
[MST94]):
• Letters dataset (20000 observations, 26 classes, 16 features). This dataset
consists of samples of 26 capital letters, printed in different fonts; 20 differ-
ent fonts were considered and the location of the samples was distributed
randomly within the dataset.
• Pendigits dataset (10992 observations, 10 classes, 16 features). This dataset
was created by collecting 250 samples from 44 writers. These writers were
asked to write 250 digits in random order inside boxes of 500 by 500 tablet
pixel resolution.
Both the Letters and Pendigits datasets have been used for testing different
methods of supervised classification (see [MST94] for details). Since we use
these datasets only for the construction of generalized cluster functions, we consider
them as datasets with unknown classes.

6.2 Numerical experiments: description

We are looking for three and four clusters in both the Letters and Pendigits
datasets. The dimension of the optimization problems is equal to 48 in the case of 3
clusters and 64 in the case of 4 clusters. We consider two small subsets
of the Letters dataset (Let1, 353 points, approximately 2% of the original
dataset; and Let2, 810 points, approximately 4% of the original dataset) and
two small subsets of the Pendigits dataset (Pen1, 216 points, approximately
2% of the original dataset; and Pen2, 426 points, approximately 4% of the
original dataset).
We apply local techniques (discrete gradient method) and global
techniques (a combination of the discrete gradient and cutting angle methods,
and the LGO solver) to minimize the generalized cluster function. Then we need
to assess the results obtained. We can use different approaches for this
assessment. One of them is based on a comparison of values of the cluster function
$C_k$ constructed with respect to the centres obtained in the original dataset
$A$ and with respect to the centres obtained in its small sub-dataset $B$. We
compare the cluster function values obtained starting from different initial points in
the original datasets and their approximations.
We use the following procedure.
Let $A$ be an original dataset and $B$ be its small sub-dataset. First, the
centres of clusters in $B$ should be found by an optimization technique. Then
we evaluate the cluster function values in $A$ using the obtained points as the
centres of clusters in $A$. Using this approach we can find out how the results
of the minimization depend on initial points and how far we can go in the
process of dataset reduction.
In our research we use 4 types of initial points, described in Subsection 5.2.
These initial points have been carefully chosen and the results obtained
starting from these initial points are better than the results obtained starting from
random initial points. Therefore, we present the results obtained for these 4
types of initial points rather than the results obtained starting from random
initial points generated, for example, by the "multi start" option.

6.3 Results of numerical experiments

Local optimization

First of all we have to point out that we have two groups of initial points:
• Group 1: Uniform initial point and k-meansL1 initial point,
• Group 2: Ordered initial point and Uniform-ordered initial point.
Initial points from Group 1 are the same for an original dataset and for all
its reduced versions. Initial points from Group 2 are constructed according
to their weights. Points in original datasets have the same weights, which are
equal to 1.

Remark 3. Because the weights can vary for different reductions of the dataset,
the Ordered initial points for Let1 and Let2 do not necessarily coincide. The
same is true for the Uniform-ordered initial points. The same observation
applies to the Pendigits dataset and its reduced versions Pen1 and Pen2.

Our next step is to compare results obtained starting from different initial
points in the original datasets and in their approximations. In our
experiments we use two different kinds of function: the cluster function and the
generalized cluster function. Values of the cluster function and the
generalized cluster function are the same for original datasets because each point has
the same weight, which is equal to 1. In the case of reduced datasets we carry
out our numerical experiments in the corresponding approximations of the original
datasets and calculate two different values: the cluster function value and the
generalized cluster function value. The cluster function value is the value of the
cluster function calculated in the corresponding original dataset according to
the centres found in the reduced dataset. The generalized cluster function
value is the value of the generalized cluster function calculated in the reduced
dataset according to the centres found in the same reduced dataset. Normally
a cluster function value (calculated according to the centres found in reduced
datasets) is larger than a generalized cluster function value calculated
according to the same centres and the corresponding weights, because optimization
techniques have actually been applied to minimize the generalized cluster function in
the corresponding reduced dataset. In Tables 1-2 we present the results of
our numerical experiments obtained for DG and DG+CAM starting from the
Uniform initial point.
It is also very important to remember that a better result in a reduced
dataset is not necessarily better for the original one. For example, in the case of
the Pen1 dataset, 3 clusters, Uniform initial point, the generalized cluster function
value is lower for DG+CAM than for DG, however the cluster function value
is lower for DG than for DG+CAM. We observe the same situation in some
other examples.

Table 1. Cluster function and generalized cluster function: DG, Uniform initial
point

                          3 clusters                 4 clusters
Dataset      Size    Cluster   Generalized      Cluster   Generalized
                     function  cluster          function  cluster
                     value     function value   value     function value
Pen1          216    6.4225    5.5547           5.7962    4.8362
Pen2          426    6.3844    5.8132           5.7725    5.0931
Pendigits   10992    6.3426    6.3426           5.7218    5.7218
Let1          353    4.3059    3.3859           4.1200    3.1611
Let2          810    4.2826    3.7065           4.0906    3.5040
Letters     20000    4.2494    4.2494           4.0695    4.0695

Our actual goal is to find clusters in the original datasets, therefore it is
important to compare cluster function values calculated in original datasets
according to the obtained centres. Centres can be obtained from our numerical
experiments with both types of datasets: original datasets and reduced datasets.
This is one of the possible ways to test the efficiency of the proposed approach:
substitution of original datasets by their smaller approximations.
Tables 3-8 present cluster function values obtained in our numerical
experiments starting from the k-meansL1, Ordered and Uniform-ordered initial
points. We do not present the obtained generalized cluster function values because
this function cannot be used as a measure of the quality of clustering.

Table 2. Cluster function and generalized cluster function: DG+CAM, Uniform
initial point

                          3 clusters                 4 clusters
Dataset      Size    Cluster   Generalized      Cluster   Generalized
                     function  cluster          function  cluster
                     value     function value   value     function value
Pen1          216    6.4254    5.5546           5.7943    4.8353
Pen2          426    6.3843    5.8131           5.7718    5.0931
Pendigits   10992    6.3426    6.3426           5.7218    5.7218
Let1          353    4.3059    3.3859           4.1208    3.1600
Let2          810    4.2828    3.7061           4.0909    3.5020
Letters     20000    4.2494    4.2494           4.0695    4.0695

Recall that reduced datasets are approximations of the corresponding original
datasets. By decreasing the number of observations we reduce the complexity of
our optimization problems but obtain less precise approximations. Therefore,
our goal is to find some balance between the reduction of the complexity of
optimization problems and the quality of the obtained results. In some cases (mostly
for initial points from Group 2, see Remark 3 for more information) the results
obtained on larger approximations of original datasets (more precise
approximations) are worse than the results obtained on smaller approximations of
original datasets (less precise approximations). An example is given by Pen1 and Pen2
for initial points from Group 2 (3 and 4 clusters).

Table 3. Cluster function: DG, k-meansL1 initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4272       5.8063
Pen2          426    6.3840       5.7704
Pendigits   10992    6.3409       5.7217
Let1          353    4.3087       4.1241
Let2          810    4.2816       4.1013
Letters     20000    4.2495       4.0726

Remark 4. In the original datasets, it is not relevant to consider the Ordered
and Uniform-ordered initial points, because all the points have the same
weight.

Table 4. Cluster function: DG+CAM, k-meansL1 initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4278       5.8063
Pen2          426    6.3841       5.7723
Pendigits   10992    6.3409       5.7217
Let1          353    4.3087       4.1262
Let2          810    4.2824       4.1014
Letters     20000    4.2495       4.0726

Table 5. Cluster function: DG, Ordered initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4188       5.8226
Pen2          426    6.6534       5.9047
Let1          353    4.3228       4.2049
Let2          810    4.3843       4.1112

Table 6. Cluster function: DG+CAM, Ordered initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4171       5.8201
Pen2          426    6.6536       5.9047
Let1          353    4.3228       4.2045
Let2          810    4.3843       4.1107

Table 7. Cluster function: DG, Uniform-ordered initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4188       5.7921
Pen2          426    6.6514       5.8718
Let1          353    4.2910       4.1225
Let2          810    4.2828       4.1129

Table 8. Cluster function: DG+CAM, Uniform-ordered initial point

                     Cluster function value
Dataset      Size    3 clusters   4 clusters
Pen1          216    6.4171       5.7945
Pen2          426    6.6492       5.8715
Let1          353    4.2905       4.1233
Let2          810    4.2828       4.1130

Summarizing the results of the numerical experiments (cluster function,
local and hybrid global techniques, 4 special kinds of initial points) we can
draw the following conclusions:
1. DG and DG+CAM applied to the same datasets produce almost identical
results if the initial points are the same,
2. DG and DG+CAM applied to the same datasets starting from different
initial points (4 proposed initial points) produce very similar results in
most of the examples,
3. in some cases the results obtained on smaller approximations of original
datasets are better than the results obtained on larger approximations of
original datasets.

Global optimization: LGO solver

First we present the results obtained by the LGO solver (global optimization).
We use the Uniform initial point. The results are in Table 9.
In almost all the cases (except Pendigits 3 clusters) the results for reduced
datasets are better than for original datasets. It means that the cluster func-
tion is too complicate for the solver as an objective function and it is more
efficient to use generalized cluster functions generated on reduced datasets. It
is beneficial to use reduced datasets in the case of the LGO solver from two
points of view
1. computations with reduced datasets allow one to reach a better minimizer;
2. computational time is significantly less for reduced datasets than for orig-
inal datasets.
It is also obvious that the software failed to reach a global minimum. We sug-
gest that the LGO solver has been developed for a broad class of optimization
problems. However, the solvers included in CIAO-GO are more efiicient for
minimization of the sum of minima of convex functions, especially if the num-
ber of components in sums is large.
Remark 5. The LGO solver was not used in the experiments on skeletons.

7 Skeletons

7.1 Introduction

The problem of grouping (clustering) points by means of skeletons is not as
widely studied as cluster function based models. Therefore, we start with some
examples on moderately sized datasets (no more than 1000 observations). In
this subsection we formulate the problem of finding skeletons mathematically,
discuss applications of DG and DG+SA to finding skeletons with respect to
$\|\cdot\|_1$, and present the obtained results graphically (for examples with
no more than 3 features).

Table 9. Cluster function: LGO solver

Dataset     Size    Cluster function value   Cluster function value
                    (3 clusters)             (4 clusters)
Pen1        216     6.4370                   5.8029
Pen2        426     6.4122                   5.7800
Pendigits   10992   6.3426                   7.1859
Let1        353     4.3076                   4.1426
Let2        810     4.2829                   4.1191
Letters     20000   5.8638                   4.2064
The search for skeletons can be done by solving the constrained minimization
problem (7).
Both algorithms are designed for unconstrained problems, so we use a
penalty function in order to convert problem (7) into an unconstrained
minimization problem. The corresponding unconstrained problem has the form:

$$\min \sum_{q \in Q} \min_{i} \bigl|[l_i, a^q] - c_i\bigr| + R_p \sum_{i=1}^{k} \bigl|\,\|l_i\|_1 - 1\bigr|, \tag{8}$$

where $R_p$ is a penalty parameter.
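To make the structure of the penalized objective concrete, here is a minimal sketch in Python. The function name, the argument layout and the toy data are illustrative assumptions, not part of the original experiments; the sketch only evaluates an objective of the form (8) for given hyperplanes.

```python
def skeleton_penalty_objective(normals, offsets, points, penalty):
    """Evaluate sum_q min_i |[l_i, a^q] - c_i| + R_p * sum_i | ||l_i||_1 - 1 |.

    normals: list of normal vectors l_i, offsets: list of constants c_i,
    points: list of data points a^q, penalty: the penalty parameter R_p.
    All names are hypothetical; this is a sketch, not the authors' code.
    """
    fit = 0.0
    for a in points:
        # residual of point a with respect to its best-fitting hyperplane
        fit += min(
            abs(sum(lj * aj for lj, aj in zip(l, a)) - c)
            for l, c in zip(normals, offsets)
        )
    # penalty for violating the normalisation constraint ||l_i||_1 = 1
    violation = sum(abs(sum(abs(lj) for lj in l) - 1.0) for l in normals)
    return fit + penalty * violation


# toy data: every point lies on one of the two coordinate axes
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
normals = [(1.0, 0.0), (0.0, 1.0)]   # both satisfy ||l||_1 = 1
offsets = [0.0, 0.0]
value = skeleton_penalty_objective(normals, offsets, points, penalty=10.0)
```

With these toy inputs every point lies on one of the two hyperplanes and both normals are normalised, so both the fitting term and the penalty term vanish.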


Finally, the algorithms were applied starting from 3 different initial points,
and the best solution found was selected. The 3 different points used in the
example are:

Pi =

T(O,I...,I)
Ps

The problem has been solved for different sets of points, selected from 3 dif-
ferent well known datasets: the Heart disease database (13 features, 2 classes:
160 observations are from the first class and 197 observations are from the
second class), the Diabetes database (8 features, 2 classes: 500 observations
are from the first class and 268 observations are from the second class) and
the Australian credit cards database (14 features, 2 classes: 383 observations
are from the first class and 307 observations are from the second class), see
also [MST94] and references therein. Each of these datasets was submitted
first to the feature selection method described in [BRY02].

The value of the objective function was considerably decreased by both
methods. However, the discrete gradient method often gives a local solution
which is very close to the initial point, while the hybrid method gives a
solution which is further away and better. In the tables the distance considered
is the Euclidean distance between the solution obtained and the initial solution,
and the value considered is the value of the objective function at this solution.

Table 10. Australian credit card database with 2 hyperplanes skeletons

                          DG method                hybrid method
          Initial point   value       distance     value       distance
Class 1   1               22.9804     10.668       6.11298     7.98738
          2               25.5102     2.81543      13.2263     5.91397
          3               6.10334     4.40741      6.10334     4.40741
Class 2   1               0.473317    5.00549      0.473317    5.00549
          2               3.029       2.14784      0.222154    2.13944
          3               6.87897     6.06736      4.73828     6.74424
computation time          54 sec                   664 sec

Table 11. Diabetes database with 3 hyperplanes skeletons

                          DG method                hybrid method
          Initial point   value       distance     value       distance
Class 1   1               28.5856     6.78624      28.1024     6.79326
          2               39.3925     11.4668      28.2417     11.7711
          3               33.2006     3.09434      31.4624     2.31922
Class 2   1               22.2806     2.3755       22.2806     2.3755
          2               30.346      56.7222      19.5574     8.76914
          3               23.0529     1.61649      22.9495     1.76052
computation time          212 sec                  1521 sec

The examples show that although the hybrid method sometimes
does not improve on the result obtained with the discrete gradient method, in
other cases its result is much better. However, its computation times are much
greater than those of the discrete gradient method alone. The diabetes
dataset has 3 features after feature selection (see [BRY02]). This allows us to
plot some of the results obtained during the computations.
We can observe that the hybrid method does not necessarily give an optimal
solution. Even with the hybrid method the initial point is very important.
Figure 3, however, confirms that the solutions obtained are usually very good
and represent the set of points correctly. The set of points studied here
consists of a big mass of points and some other points spread around it. It
is interesting to note that the hyperplanes intersect around the same place,
where the big mass is situated, and take different directions so as to be as
close as possible to the spread points.
Figure 4 shows the complexity of the diabetes dataset.

Fig. 3. 2nd class for the diabetes database, with 2 hyperplanes

7.2 Numerical experiments: description

We are looking for three and four clusters in both the Letters and Pendigits
datasets. The dimension of the optimization problems is 51 in the case of
3 skeletons and 68 in the case of 4 skeletons. We use the same sub-datasets
as in Section 6 (Pen1, Pen2, Let1, Let2).

Fig. 4. Diabetes database, with 1 hyperplane per class
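The dimensions 51 and 68 are consistent with each skeleton being a hyperplane described by one normal vector and one offset. The short sketch below assumes 16 features per observation (an assumption about the datasets, not stated in this section) and a hypothetical helper name.

```python
def skeleton_problem_dimension(num_skeletons, num_features):
    # one normal vector (num_features entries) plus one offset per skeleton
    return num_skeletons * (num_features + 1)


# assumed: 16 features per observation in both datasets
dims = [skeleton_problem_dimension(k, 16) for k in (3, 4)]   # [51, 68]
```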
We apply local techniques (DG and DG+CAM) for minimization of the
generalized skeleton function. Then we use a procedure similar to the
one used for the cluster function to assess the obtained results. First, we
find skeletons in the original datasets (or in the reduced datasets). Then we
evaluate the skeleton function values in the original datasets using the
obtained skeletons.
For the skeleton function the problem of constructing a good initial point
has not been studied yet. Therefore, in our numerical experiments we choose
a feasible point as the initial point. We also use the "multi start" option to
compare results obtained starting from different initial points.
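The "multi start" idea can be sketched as follows. The local method below is a simple coordinate search used as a stand-in (it is not the discrete gradient method), and the toy objective, the starting points and all names are illustrative assumptions.

```python
import random


def local_search(f, x, step=0.5, tol=1e-6):
    """Stand-in derivative-free local method (coordinate search), not DG."""
    x = list(x)
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2.0   # refine the mesh only when no neighbour is better
    return x, fx


def multi_start(f, starts):
    """Run the local method from every start and keep the best result."""
    return min((local_search(f, s) for s in starts), key=lambda res: res[1])


def sum_of_minima(x):
    # toy objective: a sum of minima of convex functions with several local minima
    data = [0.0, 0.1, 4.0, 4.1, 4.2]
    return sum(min(abs(x[0] - a), 1.0) for a in data)


random.seed(0)
starts = [[random.uniform(-5.0, 10.0)] for _ in range(20)]
single = local_search(sum_of_minima, starts[0])   # "single start"
best = multi_start(sum_of_minima, starts)         # "multi start"
```

By construction the multi-start value can never be worse than the single-start value from the first point; how much better it is depends on how the starting points are spread.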

7.3 Numerical experiments: results

In this subsection we present the results obtained for the skeleton function.
Our goal is to find the centres in original datasets, therefore we do not present
the generalized skeleton function values. Table 12 and Table 13 present the
values of the skeleton function evaluated in the corresponding original datasets
(Pendigits and Letters respectively) according to the skeletons obtained as
optimization results reached in datasets from the first column of the tables. We
use two different optimization methods: DG and DG+CAM and two different
types of initial points: "single start" (DG or DG+CAM) and "multi start"
(DGMULT or DG+CAMMULT).

Table 12. Skeleton function: Pendigits

Number of   Dataset     Size    Skeleton function values
skeletons                       DG        DGMULT    DG+CAM    DG+CAMMULT
3           Pen1        216     2137.00   1287.58   1832.97   1320.00
            Pen2        426     735.00    735.47    735.47    735.47
            Pendigits   10992   567.20    567.20    567.20    566.55
4           Pen1        216     1223.16   1315.68   1194.65   1180.79
            Pen2        426     1360.16   946.74    1322.46   946.74
            Pendigits   10992   905.56    905.56    905.56    661.84

Table 13. Skeleton function: Letters

Number of   Dataset     Size    Skeleton function values
skeletons                       DG        DGMULT    DG+CAM    DG+CAMMULT
3           Let1        353     1548.30   1548.30   1545.58   1545.58
            Let2        810     2201.75   1475.77   2171.01   1608.14
            Letters     20000   1904.71   1904.71   1904.71   964.37
4           Let1        353     1566.69   1566.69   1531.99   1531.99
            Let2        810     2030.20   2030.20   1892.31   1892.31
            Letters     20000   964.37    850.14    850.14    850.14

The most important conclusion from the results is that in the case of the
skeleton function the best optimization results (the lowest values of the skeleton
function) have been reached in the experiments with the original datasets. This
means that the proposed cleaning procedure is not as efficient in the case of
the skeleton function as it is in the case of the cluster function. However, in the
case of the cluster function the initial points for the optimization methods
have been chosen after some preliminary study. It may be that an efficient
choice of initial points would lead to better optimization results for both kinds
of datasets: original and reduced.

Recall that (7) is a constrained optimization problem with equality
constraints. This problem is equivalent to the following constrained optimization
problem with inequality constraints:

$$\min \sum_{q \in Q} \min_{i} \bigl|[l_i, a^q] - c_i\bigr| \quad \text{subject to } \|l_j\| \ge 1,\ c_j \in \mathbb{R}\ (j = 1, \ldots, k). \tag{9}$$

In our numerical experiments we use both formulations (7) and (9). In
most of the experiments the results obtained for (7) are better than for (9), but
the computational time is much higher for (7) than for (9). It is recommended,
however, to use formulation (9) if, for example, experiments with (7)
produce empty skeletons.
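The equivalence of (7) and (9) can be checked by a scaling argument. The following is a sketch only, under the assumption that (7) has the same objective as (9) with the constraint $\|l_j\| = 1$ in place of $\|l_j\| \ge 1$ (formulation (7) is not restated in this section); $F$ denotes that common objective.

```latex
% Sketch: (7) and (9) have the same optimal value.
% Assumption: (7) is (9) with \|l_j\| = 1 instead of \|l_j\| \ge 1.
\begin{align*}
&\text{Let } (l_j, c_j)_{j=1}^{k} \text{ be feasible for (9) and set }
  \tilde l_j = l_j / \|l_j\|, \quad \tilde c_j = c_j / \|l_j\| . \\
&\text{Then } \|\tilde l_j\| = 1 \text{ and, for every point } a^q, \quad
  \bigl|[\tilde l_j, a^q] - \tilde c_j\bigr|
  = \frac{\bigl|[l_j, a^q] - c_j\bigr|}{\|l_j\|}
  \le \bigl|[l_j, a^q] - c_j\bigr| . \\
&\text{Hence } F(\tilde l, \tilde c) \le F(l, c), \text{ so the optimal value of (7) is at most that of (9).} \\
&\text{Conversely, every point feasible for (7) is feasible for (9), which gives the reverse inequality.}
\end{align*}
```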

7.4 Other experiments

Another set of numerical experiments has been carried out on both
objective functions. Although of little interest from the point of view of the
optimization itself, in the authors' opinion it may shed some more light on
the clustering part.
The objective functions (2) and (7) have been minimized using two different
methods: the discrete gradient method described above, and a hybrid of
the DG method and the well known simulated annealing method.
This hybrid method is described in detail in [BZ03].
The basic idea of the hybrid method is to alternate the descent method,
to obtain a local minimum, and the simulated annealing method, to escape this
minimum. This drastically reduces the dependency of the local method on the
initial point, and helps the method reach a "good" minimum.
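The alternation described above can be sketched as follows. This is a toy illustration under stated assumptions: the local descent is a simple one-dimensional search standing in for the DG method, the escape phase is a plain Metropolis walk with a geometric cooling schedule, and the objective, parameters and names are all hypothetical. It is not the algorithm of [BZ03].

```python
import math
import random


def descend(f, x, step=0.25, tol=1e-6):
    """Stand-in local descent on a 1-D function (not the DG method)."""
    fx = f(x)
    while step > tol:
        for y in (x + step, x - step):
            if f(y) < fx:
                x, fx = y, f(y)
                break
        else:
            step /= 2.0   # no better neighbour at this step size
    return x, fx


def anneal_escape(f, x, rng, temperature=1.0, moves=50, radius=2.0):
    """A few Metropolis moves that may accept uphill steps to leave a basin."""
    fx = f(x)
    for _ in range(moves):
        y = x + rng.uniform(-radius, radius)
        fy = f(y)
        if fy < fx or rng.random() < math.exp((fx - fy) / temperature):
            x, fx = y, fy
        temperature *= 0.95
    return x


def hybrid(f, x0, rounds=30, seed=0):
    """Alternate local descent and annealing-style escapes, keeping the best point."""
    rng = random.Random(seed)
    best_x, best_f = descend(f, x0)
    x = best_x
    for _ in range(rounds):
        x, fx = descend(f, anneal_escape(f, x, rng))
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def objective(x):
    # sum of minima of convex functions with several local minima
    return sum(min(abs(x - a), 1.5) for a in (0.0, 5.0, 5.2, 5.4))


local_only = descend(objective, -0.3)
hybrid_result = hybrid(objective, -0.3)
```

Because the hybrid loop starts from the same local descent and only ever keeps improvements, its result is never worse than the purely local one; the price is the much larger number of function evaluations.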
Numerical experiments were carried out on the Pendigits and Letters
datasets for the generalized cluster function using dataset approximations of
different sizes. The results have shown that the hybrid method reached
values comparable to those of the other methods, although the algorithm had to
leave up to 50 local minima. This can be explained by the large number of
local minima of the objective function, each close to one another.
The skeleton function was minimized for the Heart Disease and the Diabetes
datasets. The same behaviour can be observed. As these experiments did not
lead to any major conclusion, their results are not shown here.
Numerical experiments have shown that, while considerably faster than the
simulated annealing method, the hybrid method is still fairly slow to converge.

8 Conclusions
8.1 Optimization

In this paper, a particular type of optimization problem has been presented.
The objective function of these problems is the sum of minima of convex
functions. This type of problem appears quite often in the area of data analysis,
and two examples have been solved.
The generalized cluster function has been minimized for two datasets, using
three different methods: the LGO global optimization software included in
GAMS, the discrete gradient method, and a combination of this method
and the cutting angle method.
The last two methods have been started from carefully selected initial
points and from a random initial point.
The LGO software failed most of the time to reach even a good solution.
This is due to the fact that the objective function has a very complex structure.
This method was limited in time, and might have reached the global solution
had it been given an unlimited amount of time.
Similarly, the local methods failed to reach the solution when started from
a random point. The reason is the large number of local minima of the
objective function, which prevents local methods from reaching a good solution.
However, for all the examples, the discrete gradient method reached a good
solution for at least one of the initial points. The combination reached a good
solution for all of the initial points.
This shows that for such types of functions, which have a complex structure
and many local minima, most global methods will fail. However, well chosen
initial points will lead to a deep local minimum. Because the local methods
are much faster than global ones, it is more advantageous to start the local
method from a set of carefully chosen initial points to reach a global minimum.
The combination of the discrete gradient and cutting angle methods
appears to be a good alternative, as it is not very dependent on the initial
point and reaches a good solution in a limited time.
The second set of experiments was carried out on the hyperplanes
function. As this function has been less studied in the literature, it is harder to
draw definite conclusions. However, the experiments show very clearly that the
local methods once again strongly depend on the initial point. Unfortunately,
it is harder to devise a good initial point for this objective function.

8.2 Clustering

From the clustering point of view, two different similarity functions have been
minimized. The first one is a variation of the widely studied cluster function,
where the points are weighted. The second one is a variation of the
Bradley-Mangasarian function, where distances from the hyperplanes are taken
instead of their squares.
A method for reducing the size of the dataset, ε-cleaning, has been devised
and applied. Different values of ε lead to different sizes of datasets.
Numerical experiments have been carried out for different values of ε,
leading to very small (2% and 4%) datasets.

For the generalized cluster function, this method proves to be very
successful: even for very small datasets, the function value obtained is very
satisfactory. When the problem was solved using the global method LGO, the
results obtained for the reduced dataset were almost always better than those
obtained for the original dataset. The reason is that the larger the dataset, the
larger the number of local minima of the objective function. When the dataset is
reduced, what is lost in measurement quality is gained by the strong
simplification of the function. Because each point in the reduced dataset already
acts as a centre for its neighbourhood, minimizing the generalized cluster function
is equivalent to grouping these "mini" clusters into larger clusters.
It has to be noted that there is no monotone correspondence between
the value of the generalized cluster function for the reduced and the original
dataset. It may happen that a given solution is better than another one for
the reduced dataset, and worse for the original. Thus we cannot conclude that
the solution can be reached for the reduced dataset. However, the experiments
show that the solution found for the reduced dataset is always good.
For the skeleton function, however, this method is not so successful.
Although this has to be taken with caution, as the initial points for this
function could not be devised as carefully as for the cluster function, one can
expect such behaviour: the reduced dataset is actually a set of cluster
centres. The skeleton approach is based on the assumption that the clusters in
the dataset can be represented by hyperplanes, while the cluster approach
assumes that the clusters are represented by centres.
The experiments show the significance of the choice of the initial point for
reaching good clusters. While random points did not allow any method to reach
a good solution, all initial points selected according to the structure of the
dataset led the combination DG-CAM to the solution.
Since we are able to provide good initial points for the cluster function
but not for the skeleton function, we would recommend using the centre
approach unless the structure of the dataset is known to correspond to some
skeletons.
Finally, the comparison between the results obtained by the two different
methods has to be put into perspective: since the experiments have shown the
importance of initial points, it is difficult to draw definitive conclusions from
the results obtained for the skeleton approach.
However, there seems to be a relationship between the classes and the
clusters obtained by both approaches, some classes being almost absent from
certain clusters. Further investigations should be carried out in this direction,
and classification processes based on these approaches could be proposed.

Acknowledgements

The authors are very thankful to Dr. Adil Bagirov for his valuable comments.

References
[Bag99] Bagirov, A.M.: Derivative-free methods for unconstrained nonsmooth op-
timization and its numerical analysis. Investigacao Operacional, 19, 75-93
(1999)
[BRSY03] Bagirov, A.M., Rubinov, A.M., Soukhoroukova, N., Yearwood, J.: Unsu-
pervised and Supervised Data Classification Via Nonsmooth and Global
Optimization. Sociedad de Estadistica e Investigacion Operativa, Top, 11,
1-93 (2003)
[BRY02] Bagirov, A.M., Rubinov, A.M., Yearwood, J.: A global optimization ap-
proach to classification. Optimization and Engineering, 3, 129-155 (2002)
[BRZ05a] Bagirov, A., Rubinov, A., Zhang, J.: Local optimization
method with global multidimensional search for descent. Journal
of Global Optimization (accepted)
(http://www.optimization-online.org/DB_FILE/2004/01/808.pdf)
[BRZ05b] Bagirov, A., Rubinov, A., Zhang, J.: A new multidimensional descent
method for global optimization. Computational Optimization and Appli-
cations (Submitted) (2005)
[BZ03] Bagirov, A.M., Zhang, J.: Hybrid simulating annealing method and
discrete gradient method for global optimization. In: Proceedings of Indus-
trial Mathematics Symposium, Perth (2003)
[BBM03] Beliakov, G., Bagirov, A., Monsalve, J.E.: Parallelization of the discrete
gradient method of non-smooth optimization and its applications. In: Pro-
ceedings of the 3rd International Conference on Computational Science.
Springer-Verlag, Heidelberg, 3, 592-601 (2003)
[BM00] Bradley, P.S., Mangasarian, O.L.: k-Plane clustering. Journal of Global
Optimization, 16, 23-32 (2000)
[BLM02] Brimberg, J., Love, R.F., Mehrez, A.: Location/Allocation of queuing
facilities in continuous space using minsum and minmax criteria. In:
Pardalos, P., Migdalas, A., Burkard, R. (eds) Combinatorial and Global
Optimization. World Scientific (2002)
[DR95] Demyanov, V., Rubinov, A.: Constructive Nonsmooth Analysis. Peter
Lang (1995)
[GRZ05] Ghosh, R., Rubinov, A.M., Zhang, J.: Optimisation approach for cluster-
ing datasets with weights. Optimization Methods and Software, 20 (2005)
[HF02] Hedar, A.-R., Fukushima, M.: Hybrid simulated annealing and direct
search method for nonlinear unconstrained global optimization. Optimiza-
tion Methods and Software, 17, 891-912 (2002)
[JMF99] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM
Computing Surveys, 31, 264-323 (1999)
[Kel99] Kelly, C.T.: Detection and remediation of stagnation in the Nelder-Mead
algorithm using a sufficient decreasing condition. SIAM J. Optimization,
10, 43-55 (1999)
[MST94] Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds): Machine Learning,
Neural and Statistical Classification. Ellis Horwood Series in Artificial
Intelligence, London (1994)
[SU05] Soukhoroukova, N., Ugon, J.: A new algorithm to find a shape of a finite
set of points. Proceedings of Conference on Industrial Optimization, Perth,
Australia (Submitted) (2005)

[YLT04] Yiu, K.F.C., Liu, Y., Teo, K.L.: A hybrid descent method for global opti-
mization. Journal of Global Optimization, 28, 229-238 (2004)
[GAM05] http://www.gams.com/
[LGO05] http://www.gams.com/solvers/lgo.pdf
[Pin05] http://www.dal.ca/~jdpinter/
[CIA05] http://www.ciao-go.com.au/index.php
Analysis of a Practical Control Policy for
Water Storage in Two Connected Dams

Phil Howlett^1, Julia Piantadosi^1, and Charles Pearce^2

^1 Centre for Industrial and Applied Mathematics
University of South Australia
Mawson Lakes, SA 5095, Australia
phil.howlett@unisa.edu.au, julia.piantadosi@unisa.edu.au
^2 School of Mathematics
University of Adelaide
Adelaide, SA 5005, Australia
cpearce@maths.adelaide.edu.au

Summary. We consider the management of water storage in two connected dams.


The first dam is designed to capture stormwater generated by rainfall. Water is
pumped from the first dam to the second dam and is subsequently supplied to users.
There is no direct intake of stormwater to the second dam. We assume random
generation of rainfall according to a known probability distribution and wish to
find practical pumping policies from the capture dam to the supply dam in order to
minimise overflow. Within certain practical policy classes each specific policy defines
a large sparse transition matrix. We use matrix reduction methods to calculate the
invariant state probability vector and the expected overflow for each policy. We
explain why the problem is more difficult when the inflow probabilities are time
dependent and suggest an alternative procedure.

1 Introduction
The mathematical literature on storage dams, now half a century old,
developed largely from the seminal work of Moran [Mor54, Mor59] and his school
(see, for example, [Gan69, Yeo74, Yeo75]). Moran was motivated by specific
practical problems faced by the Snowy Mountain Authority in Australia in the
1950s. Our present study is likewise motivated by a specific practical problem
at Mawson Lakes in South Australia relating to a pair of dams in tandem.
The mathematical analysis of dams has proved technically more difficult
than that of their discrete counterpart, queues. In order to deal with the
complexity of a tandem system, we treat a discretised version of the
problem and adopt the matrix-analytic methodology of Neuts and his school
(see [LR99, Neu89] for a modern exposition). The Neuts methodology is well
suited for handling processes with a bivariate state space, here the contents
of the two dams.
A further new feature in this study is the incorporation of control. For
recent work on control in the context of a dam, see [Abd03] and the references
therein. The present article is preliminary and raises issues of both practical
and theoretical interest.
In Section 2 we formulate the problem in matrix-analytic terms and in
Section 3 provide an heuristic for the determination of an invariant probability
measure for the process. This depends on the existence of certain matrix
inverses. Section 4 sketches a purely algebraic procedure for establishing the
existence of these inverses. In Section 5 we show how this can be simplified and
systematised using a probabilistic analysis based on modern machinery of the
matrix-analytic approach. In Section 6 we describe briefly how these results
enable us to determine expected long-term overflow, which is needed for the
analysis of control procedures. We conclude in Section 7 with a discussion of
extensions of the ideas presented in the earlier sections.

2 Problem formulation
We assume a discrete state model and let the first and second components of

$$z = z(t) \in [0,h] \times [0,k] \subset \mathbb{Z}^2$$

denote respectively the number of units of water in the first and second dams
at time $t$. We assume a stochastic intake to the capture dam, where $p_r$ denotes
the probability that $r$ units of water will enter the dam on any given day,
and a regular demand from the supply dam of 1 unit per day. To begin we
assume that $p_r > 0$ for all $r = 0, 1, 2, \ldots$ and we will also assume that these
probabilities do not depend on time. The first assumption is reasonable
in practice but the latter assumption is certainly not reasonable
over an extended period of time. We revise these assumptions later in the
paper.
We consider a class of practical pumping policies where the pumping
decision depends only on the contents of the first dam. Choose an integer
$m \in [1,h]$ and pump $m$ units from the capture dam to the supply dam each day
when the capture dam contains at least $m$ units. For an intake $r$ there are two
basic transition patterns
• $(z_1, 0) \to (\zeta_1, 0)$
• $(z_1, z_2) \to (\zeta_1, z_2 - 1)$
where $\zeta_1 = \min\{[z_1 + r], h\}$ for $z_1 < m$, and two basic transition patterns
• $(z_1, 0) \to (\zeta_1, m)$
• $(z_1, z_2) \to (\zeta_1, \zeta_2)$
where $\zeta_1 = \min\{[z_1 - m + r], h\}$ and where $\zeta_2 = \min\{[z_2 - 1 + m], k\}$, for
$z_1 \ge m$. These transitions have probability $p_r$. The variable $m$ is the control
variable for a class of practical control policies but in this paper we assume
$m$ is fixed and suppress any notational dependence on $m$.
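The four transition patterns can be collected into a single update rule. The following is a minimal sketch; the function name and argument order are illustrative assumptions, and the state is written as $(z_1, z_2)$ as in the patterns above.

```python
def step(z1, z2, r, m, h, k):
    """One day of the policy: z1 = capture dam, z2 = supply dam, r = intake.

    Sketch of the four transition patterns; assumes 1 <= m <= min(h, k).
    """
    if z1 < m:
        # no pumping; the supply dam loses its unit of demand if it has one
        return min(z1 + r, h), max(z2 - 1, 0)
    if z2 == 0:
        # pumping m units into an empty supply dam
        return min(z1 - m + r, h), m
    # pumping m units while 1 unit is supplied to users; excess overflows
    return min(z1 - m + r, h), min(z2 - 1 + m, k)


# example with m = 2, h = 4, k = 5
next_state = step(3, 2, 1, m=2, h=4, k=5)
```

The `min(..., h)` and `min(..., k)` truncations are where overflow is lost, which is the quantity the policy analysis aims to minimise.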
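We now set up a suitable Markov chain to describe the process. In terms
of matrix-analytic machinery, it turns out to be more convenient to use the
ordered pair $(z_2, z_1)$ for the state of the process rather than the seemingly
more natural $(z_1, z_2)$. This we do for the remainder of the article. We now
order the states as

$$(0,0), \ldots, (0,h), (1,0), \ldots, (1,h), \ldots, (k,0), \ldots, (k,h).$$

The first component (that is, the content of dam 2) we refer to as the level of
the process and the second component (the content of dam 1) as the phase.
The one-step transition matrix

$$P \in \mathbb{R}^{(h+1)(k+1) \times (h+1)(k+1)}$$

then has a simple block structure. With block columns labelled
$0, 1, \ldots, m, m+1, \ldots, k-2, k-1, k$,

$$
P = \begin{bmatrix}
A & 0 & \cdots & B & 0 & \cdots & 0 & 0 & 0 \\
A & 0 & \cdots & B & 0 & \cdots & 0 & 0 & 0 \\
0 & A & \cdots & 0 & B & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & A & 0 & B \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & A & B
\end{bmatrix}
$$

that is, block row 0 has $A$ in block column 0 and $B$ in block column $m$, block
row $i$ ($1 \le i \le k$) has $A$ in block column $i-1$ and $B$ in block column
$\min\{i+m-1, k\}$, and all other blocks are zero, where
$A, B \in \mathbb{R}^{(h+1) \times (h+1)}$.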

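On the one hand we have

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & 0 \end{bmatrix}$$

where

$$
A_{11} = \begin{bmatrix}
p_0 & p_1 & \cdots & p_{m-2} & p_{m-1} \\
0 & p_0 & \cdots & p_{m-3} & p_{m-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & p_0 & p_1 \\
0 & 0 & \cdots & 0 & p_0
\end{bmatrix}
$$

and

$$
A_{12} = \begin{bmatrix}
p_m & p_{m+1} & \cdots & p_{h-1} & p_h^+ \\
p_{m-1} & p_m & \cdots & p_{h-2} & p_{h-1}^+ \\
\vdots & \vdots & & \vdots & \vdots \\
p_1 & p_2 & \cdots & p_{h-m} & p_{h-m+1}^+
\end{bmatrix}
$$

where we have defined $p_r^+ = p_r + p_{r+1} + \cdots$, and on the other hand

$$B = \begin{bmatrix} 0 & 0 \\ B_{21} & B_{22} \end{bmatrix}$$

where

$$
B_{21} = \begin{bmatrix}
p_0 & p_1 & \cdots & p_{m-2} & p_{m-1} \\
0 & p_0 & \cdots & p_{m-3} & p_{m-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & p_0 & p_1 \\
0 & 0 & \cdots & 0 & p_0 \\
0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0
\end{bmatrix}
$$

and

$$
B_{22} = \begin{bmatrix}
p_m & p_{m+1} & \cdots & p_{h-1} & p_h^+ \\
p_{m-1} & p_m & \cdots & p_{h-2} & p_{h-1}^+ \\
\vdots & \vdots & & \vdots & \vdots \\
p_0 & p_1 & \cdots & p_{h-m-1} & p_{h-m}^+ \\
0 & p_0 & \cdots & p_{h-m-2} & p_{h-m-1}^+ \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & p_{m-1} & p_m^+
\end{bmatrix}.
$$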
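The blocks $A$ and $B$ and the matrix $P$ can be assembled directly from the intake probabilities. The sketch below is illustrative only: the intake distribution is a hypothetical one with finite support, the helper names are assumptions, and the block positions follow the level transitions described in Section 2 (level 0 moves to 0 or $m$, level $i \ge 1$ moves to $i-1$ or $\min\{i+m-1, k\}$).

```python
def phase_row(start, p, h):
    """Distribution of min(start + r, h) when intake r has probability p[r]."""
    row = [0.0] * (h + 1)
    for r, pr in enumerate(p):
        row[min(start + r, h)] += pr   # mass beyond h accumulates in column h (p^+)
    return row


def build_blocks(p, m, h):
    zero = [0.0] * (h + 1)
    # rows with phase < m belong to A (no pumping), rows with phase >= m to B
    A = [phase_row(j, p, h) if j < m else list(zero) for j in range(h + 1)]
    B = [phase_row(j - m, p, h) if j >= m else list(zero) for j in range(h + 1)]
    return A, B


def build_P(p, m, h, k):
    A, B = build_blocks(p, m, h)
    n = h + 1
    P = [[0.0] * (n * (k + 1)) for _ in range(n * (k + 1))]
    for level in range(k + 1):
        down = max(level - 1, 0)                          # block column of A
        up = m if level == 0 else min(level + m - 1, k)   # block column of B
        for j in range(n):
            for s in range(n):
                P[level * n + j][down * n + s] += A[j][s]
                P[level * n + j][up * n + s] += B[j][s]
    return P


p = [0.4, 0.3, 0.2, 0.1]          # hypothetical intake distribution
A, B = build_blocks(p, m=2, h=3)
P = build_P(p, m=2, h=3, k=3)
row_sums = [sum(row) for row in P]   # every row of P should sum to 1
```

Because each phase row lies in exactly one of $A$ and $B$, the identity $A \cdot \mathbf{1} + B \cdot \mathbf{1} = \mathbf{1}$ holds by construction, which is the fact used in the proof of Lemma 1.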

3 Intuitive calculation of the invariant probability


We consider an intuitive calculation of the invariant probability measure $\pi$. If
we write

$$\pi = (\pi_0, \pi_1, \ldots, \pi_k)$$

then the equation $\pi = \pi P$ can be rewritten as a linear system


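\begin{align}
\pi_0 &= \pi_0 A + \pi_1 A \tag{1} \\
\pi_i &= \pi_{i+1} A \quad (1 \le i < m) \tag{2} \\
\pi_m &= \pi_0 B + \pi_1 B + \pi_{m+1} A \tag{3} \\
\pi_i &= \pi_{i-m+1} B + \pi_{i+1} A \quad (m < i < k) \tag{4} \\
\pi_k &= \pi_{k-m+1} B + \cdots + \pi_k B. \tag{5}
\end{align}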
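As a numerical sanity check of the block equations, one can compute an invariant vector for a small instance and verify (1), (2) and (3) directly. The sketch below uses hypothetical data and a damped power iteration (an assumption of this example, chosen because it has the same fixed points as $\pi = \pi P$); it is not the matrix reduction method of the paper.

```python
def phase_row(start, p, h):
    row = [0.0] * (h + 1)
    for r, pr in enumerate(p):
        row[min(start + r, h)] += pr
    return row


def vec_mat(v, M):
    return [sum(v[j] * M[j][s] for j in range(len(v))) for s in range(len(M[0]))]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


p, m, h, k = [0.4, 0.3, 0.2, 0.1], 2, 3, 4   # hypothetical data with k > m > 1
n = h + 1
A = [phase_row(j, p, h) if j < m else [0.0] * n for j in range(n)]
B = [phase_row(j - m, p, h) if j >= m else [0.0] * n for j in range(n)]


def apply_P(pi):
    """Return pi P block by block, using only the positions of A and B in P."""
    out = [[0.0] * n for _ in range(k + 1)]
    for level in range(k + 1):
        down = max(level - 1, 0)
        up = m if level == 0 else min(level + m - 1, k)
        out[down] = add(out[down], vec_mat(pi[level], A))
        out[up] = add(out[up], vec_mat(pi[level], B))
    return out


# damped power iteration pi <- (pi + pi P) / 2
pi = [[1.0 / (n * (k + 1))] * n for _ in range(k + 1)]
for _ in range(4000):
    pi = [[(a + b) / 2.0 for a, b in zip(u, v)] for u, v in zip(pi, apply_P(pi))]

# residuals of equations (1), (2) with i = 1, and (3)
residual_1 = max(abs(a - b) for a, b in zip(pi[0], vec_mat(add(pi[0], pi[1]), A)))
residual_2 = max(abs(a - b) for a, b in zip(pi[1], vec_mat(pi[2], A)))
residual_3 = max(
    abs(a - b)
    for a, b in zip(pi[m], add(vec_mat(add(pi[0], pi[1]), B), vec_mat(pi[m + 1], A)))
)
```

For larger state spaces a direct iteration of this kind becomes slow, which is one motivation for the level-by-level reduction developed in the rest of this section.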

We wish to know if this system has a unique solution. In a formal sense we
observe that the sequence of non-negative vectors

$$\{\pi_i\}_{0 \le i \le k}$$

satisfies the recurrence relations

$$\pi_i = \pi_{i+1} V_i \quad (0 \le i < k) \tag{6}$$

where the sequence of matrices

$$\{V_i\}_{0 \le i \le k}$$

is defined as follows. Let


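\begin{align}
V_0 &= A(I-A)^{-1} \tag{7} \\
V_i &= A \quad (0 < i < m) \tag{8} \\
V_m &= A[I - A^{m-1}(I-A)^{-1}B]^{-1} \tag{9} \\
V_i &= A[I - W_{i-1,\,i-m+1}B]^{-1} \quad (m < i < k) \tag{10}
\end{align}
where
$$W_{i,\ell} := V_i V_{i-1} \cdots V_\ell \quad (i \ge \ell) \tag{11}$$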
provided the required inverse matrices exist and let

$$V_k = \Bigl[ I + \sum_{\ell=1}^{m-1} W_{k-1,\,k-\ell} \Bigr] B. \tag{12}$$
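The formulae (7) and (9) can be read off from the balance equations by successive elimination. As a worked check, using only (1), (2) and (3) and assuming the inverses exist:

```latex
\begin{align*}
\text{From (1):}\quad & \pi_0 (I - A) = \pi_1 A
  \;\Longrightarrow\; \pi_0 = \pi_1 A (I-A)^{-1} = \pi_1 V_0 . \\
\text{From (2):}\quad & \pi_1 = \pi_2 A = \cdots = \pi_m A^{m-1} . \\
\text{Hence}\quad & \pi_0 + \pi_1 = \pi_1 \bigl[ A(I-A)^{-1} + I \bigr]
  = \pi_1 (I-A)^{-1} = \pi_m A^{m-1} (I-A)^{-1} . \\
\text{From (3):}\quad & \pi_m \bigl[ I - A^{m-1}(I-A)^{-1} B \bigr] = \pi_{m+1} A
  \;\Longrightarrow\; \pi_m = \pi_{m+1} A \bigl[ I - A^{m-1}(I-A)^{-1}B \bigr]^{-1}
  = \pi_{m+1} V_m .
\end{align*}
```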

The vector $\pi_k$ is a scalar multiple of the invariant probability measure for the
transition matrix $V_k$. We conclude that the invariant probability measure $\pi$
for the transition matrix $P$ is unique if and only if the associated invariant
probability measure
$$\hat{\pi}_k := \pi_k / (\pi_k \cdot \mathbf{1})$$
for the transition matrix $V_k$ is uniquely defined. We have established the
following rudimentary result.
Theorem 1. If the sequence of matrices

$$\{V_i\}_{0 \le i \le k}$$

is well defined by the formulae (7)-(12) then there exists an invariant measure
$\pi$ for the transition matrix $P$. The measure is not necessarily unique.

4 Existence of the inverse matrices


Provided $p_r > 0$ for all $r \le h$ the matrix $A_{11}$ is strictly sub-stochastic with

$$A_{11} \cdot \mathbf{1} = \begin{bmatrix} 1 - p_m^+ \\ \vdots \\ 1 - p_1^+ \end{bmatrix} < \mathbf{1}.$$

It follows that $(I - A_{11})^{-1}$ is well defined and hence

$$(I - A)^{-1} = \begin{bmatrix} (I - A_{11})^{-1} & (I - A_{11})^{-1} A_{12} \\ 0 & I \end{bmatrix}$$

is also well defined. It is necessary to begin with an elementary but important
result. This result, and other later results in this section, have already been
established by Piantadosi [Pia04] but for convenience we present details of the
more elementary proofs to indicate our general method of argument.
Lemma 1. If $p_r > 0$ for all $r = 0, 1, \ldots$ then

$$(I-A)^{-1}B \cdot \mathbf{1} = \mathbf{1} \quad \text{and} \quad A^{m-1}(I-A)^{-1}B \cdot \mathbf{1} \le A^{m-1} \cdot \mathbf{1}$$

and the matrix $V_m = A[I - A^{m-1}(I-A)^{-1}B]^{-1}$ is well defined.
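Proof. Note that $A \cdot \mathbf{1} + B \cdot \mathbf{1} = \mathbf{1}$ implies $B \cdot \mathbf{1} = (I - A) \cdot \mathbf{1}$ and hence
$(I-A)^{-1}B \cdot \mathbf{1} = \mathbf{1}$.
Now

$$
A^{m-1}(I-A)^{-1}B \cdot \mathbf{1} = A^{m-1} \cdot \mathbf{1}
= \begin{bmatrix} A_{11}^{m-2}[A_{11} \cdot \mathbf{1} + A_{12} \cdot \mathbf{1}] \\ 0 \end{bmatrix}
= \begin{bmatrix} A_{11}^{m-2} \cdot \mathbf{1} \\ 0 \end{bmatrix}
< \mathbf{1}.
$$

Hence $V_m = A[I - A^{m-1}(I-A)^{-1}B]^{-1}$ is well defined. □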
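The first identity of Lemma 1 is easy to check numerically on a small instance. The sketch below uses hypothetical data and evaluates $(I-A)^{-1}B \cdot \mathbf{1}$ through the series $\sum_{t \ge 0} A^t (B \cdot \mathbf{1})$ rather than an explicit inverse; the helper names are assumptions of this example.

```python
def phase_row(start, p, h):
    row = [0.0] * (h + 1)
    for r, pr in enumerate(p):
        row[min(start + r, h)] += pr
    return row


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


p, m, h = [0.4, 0.3, 0.2, 0.1], 2, 3   # hypothetical data
n = h + 1
A = [phase_row(j, p, h) if j < m else [0.0] * n for j in range(n)]
B = [phase_row(j - m, p, h) if j >= m else [0.0] * n for j in range(n)]

# (I - A)^{-1} B 1 = sum_{t >= 0} A^t (B 1); the terms decay geometrically
# here because A_11 is strictly sub-stochastic
term = mat_vec(B, [1.0] * n)
total = [0.0] * n
for _ in range(200):
    total = [s + t for s, t in zip(total, term)]
    term = mat_vec(A, term)
```

Each entry of `total` should agree with 1 up to rounding, in line with $(I-A)^{-1}B \cdot \mathbf{1} = \mathbf{1}$.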

To establish the existence of the remaining inverse matrices it is necessary


to establish some important identities.
Lemma 2. The (JP) identities
771—1

E^^
i=l
i-l, i-e B'l = l

are valid for $i = m+1, \ldots, k-1$ and hence the matrix
$V_i = A[I - W_{i-1,\,i-m+1}B]^{-1}$ is well defined.

Proof. For details of the rather long and difficult proof we refer the reader to
Piantadosi [Pia04], where the notation dictates that the identities are described
and established in two parts as the (JP) identities of the first and second kind.
The complexity of these identities is masked in the current paper by notational
sophistication. □

5 Probabilistic analysis
In practice the matrix P can be expected to be irreducible. First we establish
the following simple sufficient condition for this to be the case.

Theorem 2. Suppose $A$, $B$ have the forms displayed above and that $k > m$.
If
(i) $m > 1$ and
(ii) $p_0, p_1, \ldots, p_{h-1}, p_h^+ > 0$,
then the matrix $P$ is irreducible.
Proof. We use the notation $P_{(i,j),(r,s)}$ to refer to the element in the matrix
$P$ describing the transition from state $(i,j)$ to state $(r,s)$ and we write
$A = [a_{j,s}]$ and $B = [b_{j,s}]$ to denote the individual elements of $A$ and $B$. To prove
irreducibility, it suffices to show that, for any state $(i,j)$, there is a path of
positive probability from state $(k,h)$ to state $(i,j)$ and a path of positive
probability from state $(i,j)$ to state $(k,h)$.
The former may be seen as follows. For $i = k$ with $h - m \le j \le h$,

$$P_{(k,h),(k,j)} = b_{h,j} > 0$$

by (ii), so there is a path consisting of a single step. For $i = k$ with
$0 \le j < h - m$, there exists a positive integer $\ell$ such that
$h - m \le j + \ell(m+1) \le h$. One path of positive probability from $(k,h)$ to
$(k,j)$ consists of the consecutive steps

$$(k,h) \to (k, j + \ell(m+1)) \to (k, j + (\ell-1)(m+1)) \to \cdots \to (k,j).$$

Finally, for $i < k$, one such path is obtained by passing from $(k,h)$ to $(k,0)$
as above and then proceeding

$$(k,0) \to (k-1,0) \to \cdots \to (i+1,0) \to (i,j).$$

We now consider passage from $(i,j)$ to $(k,h)$. For $j = 0$, $(i,j)$ has one-step
access to $(0,h)$ (if $i = 0$) or to $(i-1,h)$ (if $i > 0$), while for $j > 0$, $(i,j)$ has
one-step access to $(i+m,h)$ (if $i = 0$), to $(i+m-1,h)$ (if $0 < i < k-m+1$)
or to $(k,h)$ (if $k-m+1 \le i \le k$). Putting these results together shows that
each state $(i,j)$ has a path of positive probability to $(k,h)$.
By the results of the two previous paragraphs, the chain is irreducible. □
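The strategy of the proof (every state is reachable from $(k,h)$, and $(k,h)$ is reachable from every state) translates directly into a reachability check on the transition graph. The following sketch, with hypothetical data and function names, tests irreducibility of a small instance in this way; states are written as (level, phase).

```python
from collections import deque


def successors(state, p, m, h, k):
    """States (level, phase) reachable in one step with positive probability."""
    level, phase = state
    result = set()
    for r, pr in enumerate(p):
        if pr <= 0.0:
            continue
        if phase < m:                       # block A: no pumping, level falls
            result.add((max(level - 1, 0), min(phase + r, h)))
        else:                               # block B: pump m units, level rises
            new_level = m if level == 0 else min(level + m - 1, k)
            result.add((new_level, min(phase - m + r, h)))
    return result


def reachable(start, edges):
    """Breadth-first search over the directed graph given by edges."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_irreducible(p, m, h, k):
    states = [(i, j) for i in range(k + 1) for j in range(h + 1)]
    forward = {s: successors(s, p, m, h, k) for s in states}
    backward = {s: set() for s in states}
    for s, targets in forward.items():
        for t in targets:
            backward[t].add(s)
    top = (k, h)
    everything = set(states)
    # irreducible iff (k, h) reaches every state and is reached from every state
    return reachable(top, forward) == everything and reachable(top, backward) == everything


irreducible = is_irreducible([0.4, 0.3, 0.2, 0.1], m=2, h=3, k=4)
```

A degenerate intake distribution such as `[1.0]` (no inflow at all) violates condition (ii), and the same check then reports that the chain is not irreducible.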

Next we derive invertibility results for some key $(h+1) \times (h+1)$ matrices.
While this can be effected purely in terms of matrix arguments, a shorter
derivation is available employing probabilistic arguments, based on successive
censorings of a Markov chain.

Theorem 3. Suppose conditions (i) and (ii) of Theorem 2 apply. Then there
exists a sequence $\{V_i\}_{0 \le i \le k}$ of $(h+1) \times (h+1)$ matrices defined by equations
(7), (8), (9), (10), (11) and (12). The matrices $V_0, \ldots, V_{k-1}$ are invertible.

Proof. It suffices to show that the formulae (7), (8), (9) hold and that for
$k > m+1$ the formula (10) is valid. Let $\mathcal{C}_0$ be a Markov chain of the same form
as $P$ but with $k$ replaced by $K > 2k$. By Theorem 2, $\mathcal{C}_0$ is irreducible and
finite and so positive recurrent. Denote by $\mathcal{C}_i$ ($1 \le i \le k$) the Markov chain
formed by censoring out levels $0, 1, \ldots, i-1$, that is, observing $\mathcal{C}_0$ only when
it is in the levels $i, i+1, \ldots, K$. The chain $\mathcal{C}_i$ must also be irreducible and
positive recurrent. For $0 \le i \le k$, denote by $P_i$ the one-step transition matrix
of $\mathcal{C}_i$ and by $Q_i$ its leading block. Then $Q_i$ is the sub-stochastic one-step
transition matrix of a Markov chain $\mathcal{D}_i$ whose states form level $i$ of $\mathcal{C}_0$. Since
$\mathcal{C}_i$ is recurrent, the states of $\mathcal{D}_i$ must all be transient and so
$\sum_{n=0}^{\infty} Q_i^n < \infty$. Hence $I - Q_i$ is invertible for $0 \le i \le k$.
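For reference, censoring has a simple matrix form. The following is the standard stochastic-complement identity, stated in generic notation; the block names $T$, $U$, $L$, $R$ are illustrative and are not the paper's symbols.

```latex
% Censoring out the first group of states of a finite Markov chain.
% T, U, L, R are generic blocks; (I - T)^{-1} exists when the chain cannot
% remain forever among the censored states, e.g. when it is irreducible.
\[
  P = \begin{bmatrix} T & U \\ L & R \end{bmatrix}
  \quad\Longrightarrow\quad
  P_{\text{censored}} = R + L\,(I - T)^{-1} U
  = R + L \Bigl( \sum_{n=0}^{\infty} T^{n} \Bigr) U .
\]
% Each term L T^n U collects the paths that leave the observed states,
% spend n further steps among the censored states, and then return.
```

When a single level $i$ is censored out, $T = Q_i$, which is how factors of the form $A(I - Q_i)^{-1}$ arise in the argument.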
We shall show that the matrices

$$V_i = A(I - Q_i)^{-1} \quad (0 \le i < k)$$

satisfy the conditions of the enunciation. Nonnegativity of $V_i$ is inherited from
that of $Q_i$. We have (7) and (8) immediately, since $Q_0 = A$ and we have easily
that $Q_i = 0$ for $0 < i < m$. We now address (9) and (10).
One-step transitions in $\mathcal{D}_m$ arise from paths of two types. In the first, the
process passes in sequence through levels $m, m-1, \ldots, 1$ and then returns
directly to level $m$. These give rise to a one-step transition matrix $A^{m-1}B$.
Paths of the second type are the same except that they spend one or more time
points in level 0 between occupying levels 1 and $m$. These give rise to a
one-step transition matrix

\[ \sum_{n=0}^{\infty} A^{m+n}B = A^{m}(I-A)^{-1}B. \]

Thus enumerating all paths yields

\[ Q_m = A^{m-1}B + A^{m}(I-A)^{-1}B = A^{m-1}(I-A)^{-1}B, \]

which provides (9). From similar enumerations of paths, the leading row of
$P_m$ may be derived to be

\[ \begin{pmatrix} Q_m & A^{m-2}B & A^{m-3}B & \cdots & AB & B & 0 & \cdots & 0 \end{pmatrix}. \]

The other rows of $P_m$ are given by rows $m+1, m+2, \ldots, K$ of $P_0$ restricted
to columns $m, m+1, \ldots, K$. The first two rows of $P_m$ are then

\[ \begin{pmatrix} Q_m & A^{m-2}B & A^{m-3}B & \cdots & AB & B & 0 & \cdots & 0 \\ A & 0 & 0 & \cdots & 0 & 0 & B & \cdots & 0 \end{pmatrix}, \]

from which we derive

\[ Q_{m+1} = A[I - Q_m]^{-1}A^{m-2}B = V_m V_{m-1} \cdots V_2 B \]

and that the leading row of $P_{m+1}$ is

\[ \begin{pmatrix} Q_{m+1} & V_mA^{m-3}B & V_mA^{m-4}B & \cdots & V_mAB & V_mB & B & 0 & \cdots & 0 \end{pmatrix}. \]

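Using the notation in equation (11) we can write

\[ Q_{m+1} = W_{m,2}B \]

and the leading row of $P_{m+1}$ may be expressed as

\[ \begin{pmatrix} Q_{m+1} & W_{m,3}B & W_{m,4}B & \cdots & W_{m,m}B & B & 0 & \cdots & 0 \end{pmatrix}. \]

We may use this as a basis ($i = m+1$) for an inductive proof that for
$m < i < k$

\[ Q_i = W_{i-1,\,i-m+1}B \]

and the leading row of $P_i$ is

\[ \begin{pmatrix} Q_i & W_{i-1,\,i-m+2}B & W_{i-1,\,i-m+3}B & \cdots & W_{i-1,\,i-1}B & B & 0 & \cdots & 0 \end{pmatrix}. \]

Suppose these hold for some $i$ satisfying $m < i < k$. Since the two leading
rows of $P_i$ are

\[ \begin{pmatrix} Q_i & W_{i-1,\,i-m+2}B & W_{i-1,\,i-m+3}B & \cdots & W_{i-1,\,i-1}B & B & 0 & \cdots & 0 \\ A & 0 & 0 & \cdots & 0 & 0 & B & \cdots & 0 \end{pmatrix}, \]

we have

\[ Q_{i+1} = A[I - Q_i]^{-1}W_{i-1,\,i-m+2}B = V_iW_{i-1,\,i-m+2}B = W_{i,\,i-m+2}B \]

and, since $V_iW_{i-1,\ell} = W_{i,\ell}$, that the leading row of $P_{i+1}$ is

\[ \begin{pmatrix} Q_{i+1} & W_{i,\,i-m+3}B & W_{i,\,i-m+4}B & \cdots & W_{i,\,i-1}B & W_{i,\,i}B & B & 0 & \cdots & 0 \end{pmatrix}, \]

providing the inductive step. □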
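The censoring construction used in this proof can be illustrated on a small chain. The sketch below is an illustration under stated assumptions, not the authors' computation: it uses the standard stochastic-complement formula for a censored chain, takes the blocks `A` and `B` and the block layout of `P` as read from Example 1 later in the text (so $h = 3$, $k = 3$, $m = 2$), and the helper name `censor` is hypothetical.

```python
import numpy as np

# Blocks as read from Example 1 for [t] = 1 (assumed reading of the printed arrays).
A = np.array([[0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25]])
Z = np.zeros((4, 4))
P = np.block([[A, Z, B, Z],
              [A, Z, B, Z],
              [Z, A, Z, B],
              [Z, Z, A, B]])

def censor(P, keep):
    """Transition matrix of the chain observed only on the states in `keep`."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(P.shape[0]), keep)
    P_kk = P[np.ix_(keep, keep)]
    P_kd = P[np.ix_(keep, drop)]
    P_dk = P[np.ix_(drop, keep)]
    P_dd = P[np.ix_(drop, drop)]
    # Excursions through the dropped states are summed by (I - P_dd)^{-1}.
    return P_kk + P_kd @ np.linalg.solve(np.eye(len(drop)) - P_dd, P_dk)

m = 2
P_m = censor(P, np.arange(m * 4, 16))   # censor out levels 0 and 1
Q_m = P_m[:4, :4]                       # leading block of the censored chain
closed_form = np.linalg.matrix_power(A, m - 1) @ np.linalg.inv(np.eye(4) - A) @ B
```

For this small case the leading block of the censored chain agrees with $A^{m-1}(I-A)^{-1}B$, and the censored matrix remains stochastic.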

Under assumptions (i) and (ii) of Theorem 2, we may now proceed to
the determination of the invariant measure $\pi = (\pi_0, \pi_1, \ldots, \pi_k)$ of the
block-entry discrete-time Markov chain $P$. The relation $\pi = \pi P$ yields the block
component equations (1), (2), (3), (4) and (5). The evaluation of $\pi$ may be
effected by the following.

Theorem 4. Suppose that conditions (i) and (ii) of Theorem 2 apply. Then
the probability vectors $\pi_i$ satisfy the recurrence relations (6) and $\pi_k$ is the
invariant measure of the matrix $V_k$ defined by (12). The measure $\pi$ is unique.

Proof. For $i = 0$, (6) follows from (1) and (7). For $0 < i < m$, (6) is
immediate from (2) and (8). These two parts combine to provide

\[ \pi_0 = \pi_m A^{m}(I-A)^{-1} \quad\text{and}\quad \pi_1 = \pi_m A^{m-1}, \]

so that (3) may be cast as

\[ \pi_m\,[I - A^{m-1}(I-A)^{-1}B] = \pi_{m+1}A. \]

Equation (6) for $i = m$ follows from (9).
We have now shown that (6) holds for $0 \le i \le m$, from which

\[ \pi_2 = \pi_{m+1}V_mV_{m-1}\cdots V_2. \]

Hence (4) with $i = m+1$ yields

\[ \pi_{m+1}\,[I - V_m \cdots V_2 B] = \pi_{m+2}A. \]

By (11), this is (6) for $i = m+1$, which supplies a basis for a derivation
of the remainder of the theorem by induction. For the inductive step, suppose
that (6) holds for $i = m+1, \ldots, q$ for some $q$ with $m < q < k$. Then from (4),

\[ \pi_{q+1} = \pi_{q+1}V_qV_{q-1}\cdots V_{q-m+2}B + \pi_{q+2}A. \]

By (11), this is simply (6) with $i = q+1$, and so we have established the
inductive step.
As a direct consequence we have

\[ \pi_i = \pi_k V_{k-1} \cdots V_i = \pi_k W_{k-1,\,i} \]

for $0 \le i < k$ so that (5) implies

\[ \pi_k = \pi_k \Bigl[\, I + \sum_{i=k-m+1}^{k-1} W_{k-1,\,i} \Bigr] B = \pi_k V_k, \]

by definition. Hence $\pi_k$ is an invariant measure of $V_k$. Any invariant measure
$\pi_k$ of $V_k$ induces via (6) a distinct invariant measure $\pi$ for $P$. Since the
irreducibility of $P$ guarantees it has a unique invariant measure (to a scale factor),
$\pi_k$ is the unique invariant measure of $V_k$ up to a scale factor. This completes the proof. □
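The block recursion of Theorem 4 can be tried on a small instance. The sketch below is a hedged illustration for the case $k = 3$, $m = 2$, $h = 3$, with `A`, `B` and the layout of `P` taken as read from Example 1. The matrices `V0` to `V3` are written out by hand from the block balance equations $\pi = \pi P$ of this small chain and are only assumed to coincide with formulae (7) to (12), which are not reproduced in this section. The helper `left_fixed_vector` is hypothetical.

```python
import numpy as np

A = np.array([[0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25]])
Z = np.zeros((4, 4))
I = np.eye(4)
P = np.block([[A, Z, B, Z],
              [A, Z, B, Z],
              [Z, A, Z, B],
              [Z, Z, A, B]])

def left_fixed_vector(M):
    """Least-squares solution of v M = v with entries summing to 1."""
    n = M.shape[0]
    system = np.vstack([M.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    v, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return v

inv_IA = np.linalg.inv(I - A)
V0 = A @ inv_IA                              # level 0: pi_0 = pi_1 V0
V1 = A                                       # level 1: pi_1 = pi_2 V1
V2 = A @ np.linalg.inv(I - A @ inv_IA @ B)   # level m = 2: pi_2 = pi_3 V2
V3 = (I + V2) @ B                            # top level: pi_3 = pi_3 V3

pi3 = left_fixed_vector(V3)
pi2 = pi3 @ V2
pi1 = pi2 @ V1
pi0 = pi1 @ V0
pi = np.concatenate([pi0, pi1, pi2, pi3])
pi /= pi.sum()                               # normalise to a probability vector

direct = left_fixed_vector(P)                # full 16-state computation
```

The point of the recursion is that only $(h+1)\times(h+1)$ matrices are inverted, while the direct computation works with the full $(k+1)(h+1)$-state chain. Both routes give the same vector here.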

6 The expected long-term overflow


Using the invariant probability measure $\pi$ we can calculate the expected overflow
of water from the system. Let $(i,j) \in [0,k] \times [0,h]$ denote the collection
of all possible states. The expected overflow is calculated by

\[ \sum_{i=0}^{k} \sum_{j=0}^{h} \Bigl[\, \sum_{r=0}^{\infty} f[(i,j) \mid r]\, p_r \Bigr] \pi_{ij} \]

where $\pi_{ij}$ is the invariant probability of state $(i,j)$ at level $i$ and phase $j$ and
$f[(i,j) \mid r]$ is the overflow from state $(i,j)$ when $r$ units of stormwater enter the
system. Note that we have ignored pumping cost and other costs which are
likely to be factors in a real system.
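The triple sum translates directly into code. The following is a minimal sketch with hypothetical inputs: the array `pi`, the truncated inflow distribution `p` and the rule `toy_overflow` are all assumed for illustration, since the actual overflow function $f$ is not specified in this section.

```python
import numpy as np

def expected_overflow(pi, p, overflow):
    """Sum over states (i, j) and inflows r of overflow(i, j, r) * p[r] * pi[i, j].

    pi[i, j]  : invariant probability of level i and phase j
    p[r]      : probability of r units of inflow (truncated to len(p) terms)
    overflow  : function (i, j, r) -> overflow from state (i, j) given inflow r
    """
    levels, phases = pi.shape
    total = 0.0
    for i in range(levels):
        for j in range(phases):
            inner = sum(overflow(i, j, r) * p[r] for r in range(len(p)))
            total += inner * pi[i, j]
    return total

# Hypothetical toy data: two levels, two phases, inflow of 0 or 1 unit.
k, h = 1, 1
pi = np.full((k + 1, h + 1), 0.25)
p = np.array([0.5, 0.5])

def toy_overflow(i, j, r):
    # Hypothetical rule: all inflow is lost when both dams are full.
    return float(r) if (i == k and j == h) else 0.0

value = expected_overflow(pi, p, toy_overflow)
```

In practice the infinite sum over $r$ has to be truncated at a point where the remaining tail of $p_r$ is negligible.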

7 Extension of the fundamental ideas


The assumption that $p_r > 0$ for all $r = 0, 1, \ldots$ is convenient and is usually
true in practice but many of the general results remain true with weaker
assumptions. Let us suppose that the system is balanced. That is, we assume
that the expected daily supply is equal to the daily demand. Thus we assume
that
\[ 0 \cdot p_0 + 1 \cdot p_1 + 2 \cdot p_2 + \cdots = 1. \]
Since
\[ p_0 + p_1 + p_2 + \cdots = 1 \]
it follows that the condition $p_0 = 0$ would imply that $p_1 = 1$ and $p_r = 0$ for
all $r \ge 2$. This condition is not particularly interesting and suggests that the
assumption $p_0 > 0$ is a reasonable assumption. If we assume also that $p_0 < 1$
then it is clear that there is some $r > 1$ such that $p_r > 0$.
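The two implications above can be made explicit in one line. The following worked step assumes the balance condition is read as $\sum_r r\,p_r = 1$ and combines it with the normalisation $\sum_r p_r = 1$.

```latex
\[
  0 \;=\; \sum_{r=0}^{\infty} r\,p_r \;-\; \sum_{r=0}^{\infty} p_r
    \;=\; -\,p_0 \;+\; \sum_{r=2}^{\infty} (r-1)\,p_r
  \qquad\Longrightarrow\qquad
  p_0 \;=\; \sum_{r=2}^{\infty} (r-1)\,p_r .
\]
```

Every term on the right is non-negative, so $p_0 = 0$ forces $p_r = 0$ for all $r \ge 2$ and hence $p_1 = 1$, while $p_0 > 0$ forces $p_r > 0$ for at least one $r \ge 2$.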
By using a purely algebraic approach Piantadosi [Pia04] effectively estab-
lished the following result.

Theorem 5. If $p_0 > 0$ and $p_+ > 0$ then there is at least one finite cycle with
non-zero invariant probability that includes all levels $0, 1, \ldots, k$ of the second
dam. All states have access to this cycle in finite time with finite probability
and hence are either transient with invariant probability zero or else are part
of a single maximal cycle.

Proof (Outline). In this paper we have tried to look beyond a purely algebraic
view. For this reason we suggest an alternative proof. Let $p_0 = \delta > 0$.
If $p_+ > 0$ then there is some $r > m$ with $p_r = \epsilon > 0$. Our argument here
assumes $r > m$. Choose $p$ so that $0 \le h - pm < m$ and choose $s$ so that
$(s+1)r - (p+s)m > 0$ and $s(m-1) + 1 > k$ and $t$ so that $t > p + k$ and
consider the elementary cycle

\[
\begin{aligned}
(0,\, h - pm) &\to (0,\, h - pm + r) \to (m,\, h - (p+1)m + 2r) \to (2m-1,\, h - (p+2)m + 3r) \to \cdots \\
&\to (k, h) \to \cdots \to (k, h) \to (k,\, h - m) \to \cdots \to (k,\, h - pm) \\
&\to (k-1,\, h - pm) \to \cdots \to (0,\, h - pm) \to \cdots \to (0,\, h - pm)
\end{aligned}
\]

for the state $(i,j)$ of the system. We have $s+1$ consecutive inputs of $r$
units followed by $t$ consecutive inputs of 0 units. The cycle has probability
$p_r^{s+1} p_0^{t} = \epsilon^{s+1}\delta^{t}$. It is obvious that the state $(k,h)$ is accessible in finite time
with finite probability from any initial state $(i,j)$. It follows that all states
are either transient or are part of a unique irreducible cycle. Of course the
irreducible cycle must include the elementary cycle. Hence there is a unique
invariant probability

\[ \pi = (\pi_0, \ldots, \pi_k) \]

where the invariant probability $\pi_i$ for level $i$ is non-zero for all $i = 0, \ldots, k$. All
transient states have zero probability and all states in the cycle have non-zero
probability. □

Observe that by adding together the separate equations (1), (2), (3), (4)
and (5) for the vectors $\pi_0, \ldots, \pi_k$ we obtain the equation

\[ (\pi_0 + \cdots + \pi_k)(A + B) = (\pi_0 + \cdots + \pi_k). \]

Therefore
\[ \rho = \pi_0 + \cdots + \pi_k \]

is an invariant probability for the stochastic matrix

\[ S = A + B. \]

Indeed a little thought shows us that $S$ is the transition matrix for the phase
$j$ of the state vector. By analysing these transitions we can shed some light
on the structure of the full irreducible cycle for the original system.
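The relation between the level blocks of the invariant measure and the phase chain can be checked numerically. The sketch below is an illustration only: it assumes the blocks `A`, `B` and the block layout of `P` as read from Example 1 for $[t] = 1$, and the helper `stationary` is hypothetical.

```python
import numpy as np

A = np.array([[0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.5, 0.25, 0.125, 0.125],
              [0.0, 0.5, 0.25, 0.25]])
Z = np.zeros((4, 4))
P = np.block([[A, Z, B, Z],
              [A, Z, B, Z],
              [Z, A, Z, B],
              [Z, Z, A, B]])

def stationary(M):
    """Least-squares solution of v M = v with entries summing to 1."""
    n = M.shape[0]
    system = np.vstack([M.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    v, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return v

pi = stationary(P)
rho = pi.reshape(4, 4).sum(axis=0)   # row i of the reshape is the block for level i
S = A + B                            # transition matrix of the phase alone
```

Summing the level blocks discards the level and keeps only the phase, which is why the result is invariant for $S$.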
We have another interesting result.

Theorem 6. If $p_0 = \delta > 0$ and $p_r = \epsilon > 0$ for some $r > m$ and if $\gcd(m, r) =
1$ then for every phase $j = 0, 1, 2, \ldots, h$ we can find non-negative integers
$p = p(j)$ and $q = q(j)$ such that

\[ pr - qm = j \]

and the chain with transition matrix $S = A + B$ is irreducible.

Proof (Outline). We suppose only that $p_0 > 0$ and $p_r > 0$ for some $r > m$.
In the following phase transition diagram we suppose that

\[ r - m < m, \qquad 2r - 3m < m, \]

and note that the following phase transitions are possible for $j$ with non-zero
probability.

\[
\begin{aligned}
0 &\to [\,0 \cup r\,] \\
r &\to [\,(r - m) \cup (2r - m)\,] \\
(r - m) &\to [\,(r - m) \cup (2r - m)\,] \\
(2r - m) &\to [\,(2r - 2m) \cup (3r - 2m)\,] \\
(2r - 2m) &\to [\,(2r - 3m) \cup (3r - 3m)\,] \\
(3r - 2m) &\to [\,(3r - 3m) \cup (4r - 3m)\,] \\
(2r - 3m) &\to [\,(2r - 3m) \cup (3r - 3m)\,]
\end{aligned}
\]

If $\gcd(m, r) = 1$ then it is clear by extending the above transition table that
every phase $j \in [0, h]$ is accessible in finite time with finite probability. □
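The arithmetic fact behind Theorem 6 is that, when $\gcd(m, r) = 1$, every phase $j$ can be written as $pr - qm$ with non-negative integers $p$ and $q$. A minimal sketch of a search for such a pair follows. The function name `phase_representation` and the values of `r`, `m` and `h` are hypothetical, chosen only for illustration.

```python
from math import gcd

def phase_representation(j, r, m):
    """Return the pair (p, q) with smallest p >= 0 and q >= 0 such that p*r - q*m == j."""
    if gcd(r, m) != 1:
        raise ValueError("r and m must be coprime")
    p = 0
    while True:
        excess = p * r - j
        # A valid q exists once p*r has reached j and the excess is a multiple of m.
        if excess >= 0 and excess % m == 0:
            return p, excess // m
        p += 1

r, m, h = 5, 3, 7   # hypothetical values with r > m and gcd(r, m) = 1
table = {j: phase_representation(j, r, m) for j in range(h + 1)}
```

The loop terminates because $r$ is invertible modulo $m$, so a suitable $p$ occurs in every block of $m$ consecutive integers. Here $p$ counts inputs of $r$ units and $q$ counts releases of $m$ units.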

This result means that the unique irreducible cycle for the $(i,j)$ chain
generated by $P$ which already includes all possible levels $i \in [0, k]$ also includes
all possible phases $j \in [0, h]$ although not necessarily all states $(i,j)$.

In practice the input probabilities are likely to depend on time. Because
there is a natural yearly cycle for rainfall we have used the notation $[t] =
(t - 1) \bmod 365 + 1$ and $p_r = p_r([t])$ for all $r = 0, 1, 2, \ldots$ and all $t \in \mathbb{N}$. The
transition from day $t$ to day $t + 1$ is described by a matrix $P = P([t])$ with the
same block structure as before but with elements that vary from day to day
throughout the year. The transition from day $t$ to day $t + 365$ is described by

\[ x(t + 365) = x(t)R([t]) \]

where the matrix $R([t])$ is defined by

\[ R([t]) = P([t])\,P([t+1]) \cdots P(365)\,P(1) \cdots P([t-1]). \]

In principle we can calculate an invariant probability $\pi([t])$ for each matrix
$R([t])$ and it is easy to show that successive invariant probabilities are related
by the equation
\[ \pi([t+1]) = \pi([t])P([t]). \]
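The relation between successive invariant probabilities can be checked on a period-2 instance. The sketch below is a hedged illustration using the data as read from Example 1, with the year shortened to two days so that $R(1) = P(1)P(2)$ and $R(2) = P(2)P(1)$. The helpers `make_blocks`, `block_matrix` and `stationary` are hypothetical.

```python
import numpy as np

def make_blocks(row0, row1):
    """Place the two non-zero rows in rows 0-1 of A and rows 2-3 of B."""
    A = np.zeros((4, 4))
    B = np.zeros((4, 4))
    A[0], A[1] = row0, row1
    B[2], B[3] = row0, row1
    return A, B

def block_matrix(A, B):
    Z = np.zeros_like(A)
    return np.block([[A, Z, B, Z],
                     [A, Z, B, Z],
                     [Z, A, Z, B],
                     [Z, Z, A, B]])

def stationary(M):
    """Least-squares solution of v M = v with entries summing to 1."""
    n = M.shape[0]
    system = np.vstack([M.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    v, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return v

P1 = block_matrix(*make_blocks([0.5, 0.25, 0.125, 0.125], [0.0, 0.5, 0.25, 0.25]))
P2 = block_matrix(*make_blocks([0.45, 0.27, 0.13, 0.15], [0.0, 0.45, 0.27, 0.28]))

R1 = P1 @ P2            # day 1 -> day 3, same point of the cycle as day 1
R2 = P2 @ P1            # day 2 -> day 4
pi1 = stationary(R1)    # invariant probability seen on days with [t] = 1
pi2 = stationary(R2)    # invariant probability seen on days with [t] = 2
```

Because $R([t])\,P([t]) = P([t])\,R([t+1])$, multiplying an invariant vector of $R([t])$ by $P([t])$ gives an invariant vector of $R([t+1])$.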
However, although all $P([t])$ have the same block structure this structure is
not preserved in the product matrix $R([t])$ and it is not clear that matrix
reduction methods can be used in the calculation of $\pi([t])$. It is obvious that
the invariant probabilities for the phase $j$ on day $[t]$ can be calculated from
\[ \rho([t]) = \rho([t])\,S([t])\,S([t+1]) \cdots S(365)\,S(1) \cdots S([t-1]) \]
where $S([t]) = A([t]) + B([t])$. Unfortunately knowledge of $\rho([t])$ does not
help us directly to calculate $\pi([t])$. In general terms the existence of a unique
invariant probability is associated with the idea of a contraction mapping.
Define

T={xeW \x = {xo,...,Xk) > 0 where x^- G R^"^^ andxo-l+-• •+XA;-1 = 1}.


For each $t = 1, 2, \ldots$ we suppose that the mapping $\varphi_{[t]} : T \mapsto T$ is defined by
\[ \varphi_{[t]}(x) = xP([t]) \]
for each $x \in T$. We have the following conjecture.
Conjecture 1. For each $t = 1, 2, \ldots$ let $p_0([t]) > 0$ and suppose that for some
$r = r([t]) > m$ with $\gcd(r, m) = 1$ we have $p_r([t]) > 0$. Then
[^^,]f-\T)Cint(T)
and there is a unique invariant measure $\pi([t])$ with

\[ \varphi_{[t]}(\pi([t])) = \pi([t]). \]

If this conjecture is true then the iteration given by
\[ x^{(1)} \in T \]
with
\[ x^{(t+1)} = x^{(t)}P([t]) \]
for each $t = 1, 2, \ldots$ should satisfy
\[ x^{(t)} \to x([t]) \]
as $t \to \infty$. Because the contraction operates in the same structural way for
every value of $[t]$ we expect that convergence will occur quite seamlessly. This
is demonstrated in the following simple example. There is no reason to expect
the convergence to be slower in the case where we have a product of a larger
number of matrices.
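Example 1. Let $[t] = (t - 1) \bmod 2 + 1$ with $R(1) = P(1)P(2)$ and $R(2) =
P(2)P([3]) = P(2)P(1)$ where

\[
P([t]) = \begin{pmatrix}
A([t]) & 0 & B([t]) & 0 \\
A([t]) & 0 & B([t]) & 0 \\
0 & A([t]) & 0 & B([t]) \\
0 & 0 & A([t]) & B([t])
\end{pmatrix}
\]

for each $[t] = 1, 2$ and

\[
A(1) = \begin{pmatrix}
0.5 & 0.25 & 0.125 & 0.125 \\
0 & 0.5 & 0.25 & 0.25 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\quad\text{and}\quad
B(1) = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0.5 & 0.25 & 0.125 & 0.125 \\
0 & 0.5 & 0.25 & 0.25
\end{pmatrix}
\]

and

\[
A(2) = \begin{pmatrix}
0.45 & 0.27 & 0.13 & 0.15 \\
0 & 0.45 & 0.27 & 0.28 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\quad\text{and}\quad
B(2) = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0.45 & 0.27 & 0.13 & 0.15 \\
0 & 0.45 & 0.27 & 0.28
\end{pmatrix}.
\]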
Using MATLAB we calculate

\[ \rho(1) = (0.2, 0.4, 0.2, 0.2) \]

and so we set

\[
\begin{aligned}
x^{(1)} &= \tfrac{1}{4}(\rho(1), \rho(1), \rho(1), \rho(1)) \\
&= (.0500, .1000, .0500, .0500, .0500, .1000, .0500, .0500, \\
&\qquad .0500, .1000, .0500, .0500, .0500, .1000, .0500, .0500)
\end{aligned}
\]

and calculate

\[
\begin{aligned}
x^{(2)} &= (.0500, .1250, .0625, .0625, .0250, .0625, .0312, .0312, \\
&\qquad .0750, .1375, .0687, .0687, .0500, .0750, .0375, .0375) \\
x^{(3)} &= (.0338, .1046, .0604, .0638, .0338, .0821, .0469, .0498, \\
&\qquad .0647, .1148, .0643, .0688, .0478, .0765, .0425, .0457) \\
x^{(4)} &= (.0338, .1103, .0551, .0551, .0323, .0735, .0368, .0368, \\
&\qquad .0775, .1338, .0669, .0669, .0534, .0839, .0420, .0420) \\
&\;\;\vdots \\
x^{(13)} &= (.0291, .0994, .0576, .0607, .0343, .0801, .0456, .0485, \\
&\qquad .0660, .1199, .0672, .0719, .0494, .0791, .0439, .0472) \\
x^{(14)} &= (.0317, .1056, .0528, .0528, .0330, .0764, .0382, .0382, \\
&\qquad .0763, .1323, .0661, .0661, .0556, .0874, .0437, .0437)
\end{aligned}
\]

Thus we have

\[
\begin{aligned}
x(1) &\approx (.0291, .0994, .0576, .0607, .0343, .0801, .0456, .0485, \\
&\qquad .0660, .1199, .0672, .0719, .0494, .0791, .0439, .0472) \\
x(2) &\approx (.0317, .1056, .0528, .0528, .0330, .0764, .0382, .0382, \\
&\qquad .0763, .1323, .0661, .0661, .0556, .0874, .0437, .0437).
\end{aligned}
\]
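The iteration of Example 1 is short enough to reproduce in a few lines. The sketch below is written in Python with NumPy rather than the MATLAB used by the authors. The matrices are entered as read from the printed arrays, and the helper names `make_blocks` and `block_matrix` and the number of steps are assumptions made for illustration.

```python
import numpy as np

def make_blocks(row0, row1):
    """Place the two non-zero rows in rows 0-1 of A and rows 2-3 of B."""
    A = np.zeros((4, 4))
    B = np.zeros((4, 4))
    A[0], A[1] = row0, row1
    B[2], B[3] = row0, row1
    return A, B

def block_matrix(A, B):
    Z = np.zeros_like(A)
    return np.block([[A, Z, B, Z],
                     [A, Z, B, Z],
                     [Z, A, Z, B],
                     [Z, Z, A, B]])

P1 = block_matrix(*make_blocks([0.5, 0.25, 0.125, 0.125], [0.0, 0.5, 0.25, 0.25]))
P2 = block_matrix(*make_blocks([0.45, 0.27, 0.13, 0.15], [0.0, 0.45, 0.27, 0.28]))

rho1 = np.array([0.2, 0.4, 0.2, 0.2])
x = np.tile(rho1, 4) / 4            # starting vector x^(1)
history = [x]                       # history[n] holds x^(n+1)
for t in range(1, 400):
    day = (t - 1) % 2 + 1           # [t] for a cycle of length 2
    x = x @ (P1 if day == 1 else P2)
    history.append(x)

odd_limit = history[-2]             # an iterate with odd index, for [t] = 1
even_limit = history[-1]            # an iterate with even index, for [t] = 2
```

The iterates with odd index and those with even index settle to two different vectors, one for each day of the two-day cycle, which is the behaviour reported in the example.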

References
[Abd03] Abdel-Hameed, M.: Optimal control of dams using $P^{M}_{\lambda,\tau}$ policies and
penalty cost. Mathematical and Computer Modelling, 38, 1119-1123
(2003)
[Gan69] Gani, J.: Recent advances in storage and flooding theory. Advances in
Applied Probability, 1, 90-110 (1969)
[KT65] Karlin, S., Taylor, H.M.: A First Course in Stochastic Processes. Wiley
and Sons, New York (1965)
[LR99] Latouche, G., Ramaswami, V.: Introduction to Matrix Analytic Methods
in Stochastic Modeling. SIAM (1999)
[Mor54] Moran, P.A.P.: A probability theory of dams and storage systems. Journal
of Applied Science, 5, 116-124 (1954)
[Mor59] Moran, P.A.P.: The Theory of Storage. Wiley and Sons, New York (1959)
[Neu89] Neuts, M.F.: Structured Stochastic Matrices of M/G/1 Type and Their
Applications. Marcel Dekker, Inc. (1989)
[Pia04] Piantadosi, J.: Optimal Policies for Management of Urban Stormwater.
PhD Thesis, University of South Australia (2004)
[Yeo74] Yeo, G.F.: A finite dam with exponential variable release. Journal of
Applied Probability, 11, 122-133 (1974)
[Yeo75] Yeo, G.F.: A finite dam with variable release rate. Journal of Applied
Probability, 12, 205-211 (1975)
