Implicit Functions and Solution Mappings. A. L. Dontchev, R. T. Rockafellar
Editorial Board
S. Axler
K.A. Ribet
Implicit Functions
and Solution Mappings
A View from Variational Analysis
With 12 Illustrations
Asen L. Dontchev
Mathematical Reviews
416 Fourth Street
Ann Arbor, MI 48107-8604
USA
[email protected]

R. Tyrrell Rockafellar
University of Washington
Department of Mathematics
PO Box 354350
Seattle, WA 98195-4350
USA
[email protected]
Mathematics Subject Classification (2000): 26B10, 47J07, 58C15, 49J53, 49K40, 90C31, 93C70
Springer
© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.
Preface

Setting up equations and solving them has long been so important that, in popular
imagination, it has virtually come to describe what mathematical analysis and its
applications are all about. A central issue in the subject is whether the solution to
an equation involving parameters may be viewed as a function of those parameters,
and if so, what properties that function might have. This is addressed by the classical
theory of implicit functions, which began with single real variables and progressed
through multiple variables to equations in infinite dimensions, such as equations
associated with integral and differential operators.
A major aim of the book is to lay out that celebrated theory in a broader way
than usual, bringing to light many of its lesser known variants, for instance where
standard assumptions of differentiability are relaxed. However, another major aim
is to explain how the same constellation of ideas, when articulated in a suitably
expanded framework, can deal successfully with many other problems than just
solving equations.
These days, forms of modeling have evolved beyond equations, in terms, for ex-
ample, of problems of minimizing or maximizing functions subject to constraints
which may include systems of inequalities. The question comes up of whether the
solution to such a problem may be expressed as a function of the problem's parameters, but differentiability no longer reigns. A function implicitly obtainable in this manner may only have one-sided derivatives of some sort, or merely exhibit Lipschitz continuity or something weaker. Mathematical models resting on equations are replaced by "variational inequality" models, which are further subsumed
by “generalized equation” models.
The key concept for working at this level of generality, but with advantages even
in the context of equations, is that of the set-valued solution mapping which as-
signs to each instance of the parameter element in the model all the corresponding
solutions, if any. The central question is whether a solution mapping can be local-
ized graphically in order to achieve single-valuedness and in that sense produce a
function, the desired implicit function.
In modern variational analysis, set-valued mappings are an accepted workhorse
in problem formulation and analysis, and many tools have been developed for
handling them. There are helpful extensions of continuity, differentiability, and re-
gularity of several types, together with powerful results about how they can be ap-
plied. A corresponding further aim of this book is to bring such ideas to wider
attention by demonstrating their aptness for the fundamental topic at hand.
In line with classical themes, we concentrate primarily on local properties of so-
lution mappings that can be captured metrically, rather than on results derived from
topological considerations or involving exotic spaces. In particular, we only briefly
discuss the Nash–Moser inverse function theorem. We keep to finite dimensions in
Chapters 1 to 4, but in Chapters 5 and 6 provide bridges to infinite dimensions.
Global implicit function theorems, including the classical Hadamard theorem, are
not discussed in the book.
In Chapter 1 we consider the implicit function paradigm in the classical case of
the solution mapping associated with a parameterized equation. We give two proofs
of the classical inverse function theorem and then derive two equivalent forms of it:
the implicit function theorem and the correction function theorem. Then we gradually relax the differentiability assumption in various ways and even completely exit from it, relying instead on Lipschitz continuity. We also discuss situations
in which an implicit function fails to exist as a graphical localization of the so-
lution mapping, but there nevertheless exists a function with desirable properties
serving locally as a selection of the set-valued solution mapping. This chapter does
not demand of the reader more than calculus and some linear algebra, and it could
therefore be used by both teachers and students in analysis courses.
Motivated by optimization problems and models of competitive equilibrium,
Chapter 2 moves into wider territory. The questions are essentially the same as in
the first chapter, namely, when a solution mapping can be localized to a function
with some continuity properties. But it is no longer an equation that is being solved.
Instead it is a condition called a generalized equation which captures a more com-
plicated dependence and covers, as a special case, variational inequality conditions
formulated in terms of the set-valued normal cone mapping associated with a con-
vex set. Although our prime focus here is variational models, the presentation is
self-contained and again could be handled by students and others without special
background. It provides an introduction to a subject of great applicability which is
hardly known to the mathematical community familiar with classical implicit func-
tions, perhaps because of inadequate accessibility.
In Chapter 3 we depart from insisting on localizations that yield implicit func-
tions and approach solution mappings from the angle of a “varying set.” We identify
continuity properties which support the paradigm of the implicit function theorem in
a set-valued sense. This chapter may be read independently from the first two. Chap-
ter 4 continues to view solution mappings from this angle but investigates substitutes
for classical differentiability. By utilizing concepts of generalized derivatives, we are
able to get implicit mapping theorems that reach far beyond the classical scope.
Chapter 5 takes a different direction. It presents extensions of the Banach open
mapping theorem which are shown to fit infinite-dimensionally into the paradigm of
the theory developed finite-dimensionally in Chapter 3. Some background in basic
functional analysis is required. Chapter 6 goes further down that road and illustrates
how some of the implicit function/mapping theorems from earlier in the book can
be used in the study of problems in numerical analysis.
This book is targeted at a broad audience of researchers, teachers and graduate
students, along with practitioners in mathematical sciences, engineering, economics
and beyond. In summary, it concerns one of the chief topics in all of analysis, his-
torically and now, an aid not only in theoretical developments but also in methods
for solving specific problems. It crosses through several disciplines such as real and
functional analysis, variational analysis, optimization, and numerical analysis, and
can be used in part as a graduate text as well as a reference. It starts with elemen-
tary results and with each chapter, step by step, opens wider horizons by increas-
ing the complexity of the problems and concepts that generate implicit function
phenomena.
Many exercises are included, most of them supplied with detailed guides. These
exercises complement and enrich the main results. The facts they encompass are
sometimes invoked in the subsequent sections.
Each chapter ends with a short commentary which indicates sources in the liter-
ature for the results presented (but is not a survey of all the related literature). The
commentaries to some of the chapters additionally provide historical overviews of
past developments.
Special thanks are owed to our readers Marius Durea, Shu Lu, Yoshiyuki Sekiguchi
and Hristo Sendov, who gave us valuable feedback on the entire manuscript,
and to Francisco J. Aragón Artacho, who besides reviewing most of the book
helped us masterfully with all the figures. During various stages of the writing we
also benefited from discussions with Aris Daniilidis, Darinka Dentcheva, Hélène
Frankowska, Michel Geoffroy, Alexander Ioffe, Stephen Robinson, Vladimir Veliov,
and Constantin Zălinescu. We are also grateful to Mary Anglin for her help with the
final copy-editing of the book.
The authors
Contents
Preface
Acknowledgements
References
Notation
Index
Chapter 1
Functions Defined Implicitly by Equations
[Figure: the graph of a function and its reflection, plotted against the x and y axes.]
A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Monographs in Mathematics, DOI 10.1007/978-0-387-87821-8_1, © Springer Science+Business Media, LLC 2009
Although the reflected graph in this figure is not, as a whole, the graph of a
function, it can be regarded as the graph of something more general, a “set-valued
mapping" in terminology which will be formalized shortly. The question then revolves around the extent to which a "graphical localization" of a set-valued mapping might be a function, and if so, what properties that function would possess. In the
case at hand, the reflected graph assigns two different x’s to y when y > 0, but no x
when y < 0, and just x = 0 when y = 0.
To formalize that framework for the general purposes of this chapter, we focus on set-valued mappings F from IRn to IRm, signaled by the notation

F : IRn →→ IRm,

by which we mean correspondences which assign to each x ∈ IRn one or more elements of IRm, or possibly none. The set of elements y ∈ IRm assigned by F to x is
denoted by F(x). However, instead of regarding F as going from IRn to a space of
subsets of IRm we identify as the graph of F the set
gph F = { (x, y) ∈ IRn × IRm | y ∈ F(x) }.
Every subset of IRn × IRm serves as gph F for a uniquely determined F : IRn →→ IRm,
so this concept is very broad indeed, but it opens up many possibilities.
When F assigns more than one element to x we say it is multi-valued at x, and
when it assigns no element at all, it is empty-valued at x. When it assigns exactly
one element y to x, it is single-valued at x, in which case we allow ourselves to write
F(x) = y instead of F(x) = {y} and thereby build a bridge to handling functions as
special cases of set-valued mappings.
Domains and ranges get flexible treatment in this way. For F : IRn → → IRm the
domain is the set
dom F = { x | F(x) ≠ ∅ },
while the range is
rge F = { y | y ∈ F(x) for some x },
so that dom F and rge F are the projections of gph F on IRn and IRm respectively.
Any subset of gph F can freely be regarded then as itself the graph of a set-valued
submapping which likewise projects to some domain in IRn and range in IRm .
The functions from IRn to IRm are identified in this context with the set-valued mappings F : IRn →→ IRm such that F is single-valued at every point of dom F. When F is a function, we can emphasize this by writing F : IRn → IRm, but the notation F : IRn →→ IRm doesn't preclude F from actually being a function. Usually, though, we use lower case letters for functions: f : IRn → IRm. Note that in this notation f can still
be empty-valued in places; it’s single-valued only on the subset dom f of IRn . Note
also that, although we employ “mapping” in a sense allowing for potential multi-
valuedness (as in a “set-valued mapping”), no multi-valuedness is ever involved
when we speak of a “function.”
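The definitions above can be made concrete in a few lines of code. The following sketch (our illustration, not part of the book; the helper names `values`, `dom`, `rge`, `inverse` are ours) stores a finite set-valued mapping as its graph and reads off F(x), dom F, rge F, and the inverse by projecting or flipping pairs:

```python
# Purely illustrative sketch: a finite set-valued mapping F : IR^n ->> IR^m
# stored as its graph gph F, a set of pairs (x, y).

def values(gph, x):
    """F(x): all y paired with x in the graph (possibly empty, possibly several)."""
    return {y for (u, y) in gph if u == x}

def dom(gph):
    """dom F: the projection of gph F on the first coordinate."""
    return {x for (x, _) in gph}

def rge(gph):
    """rge F: the projection of gph F on the second coordinate."""
    return {y for (_, y) in gph}

def inverse(gph):
    """gph F^{-1}: swap the coordinates of every pair; always well defined."""
    return {(y, x) for (x, y) in gph}

# F is multi-valued at 1, single-valued at 2, and empty-valued at 3.
gph_F = {(1, 1), (1, -1), (2, 4)}
print(values(gph_F, 1))           # the set {1, -1}
print(values(inverse(gph_F), 4))  # the set {2}
```

Note that `inverse` always succeeds: flipping a set of pairs yields another set of pairs, which echoes the point made in the text that every set-valued mapping has an inverse.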
A clear advantage of the framework of set-valued mappings over that of only functions is that every set-valued mapping F : IRn →→ IRm has an inverse, namely the set-valued mapping F⁻¹ : IRm →→ IRn defined by

F⁻¹(y) = { x | y ∈ F(x) }.

Throughout, |x| denotes the Euclidean norm of x ∈ IRn, namely

|x| = ⟨x, x⟩^{1/2} = ( ∑_{j=1}^{n} x_j² )^{1/2}.
We denote the closed unit ball IB1 (0) by IB. A neighborhood of x̄ is any set U such
that IBr (x̄) ⊂ U for some r > 0. We recall for future needs that the interior of a
set C ⊂ IRn consists of all points x such that C is a neighborhood of x, whereas
the closure of C consists of all points x such that the complement of C is not a
neighborhood of x; C is open if it coincides with its interior and closed if it coincides
with its closure. The interior and closure of C will be denoted by int C and cl C.
Graphical localization. For F : IRn →→ IRm and a pair (x̄, ȳ) ∈ gph F, a graphical localization of F at x̄ for ȳ is a set-valued mapping F̃ such that

gph F̃ = (U × V) ∩ gph F for some neighborhoods U of x̄ and V of ȳ,

so that

F̃ : x ↦ F(x) ∩ V when x ∈ U, and ∅ otherwise.

The inverse of F̃ then has

F̃⁻¹(y) = F⁻¹(y) ∩ U when y ∈ V, and ∅ otherwise.
Now consider, instead of inversion, a parameterized equation

(1)  f(p, x) = 0 for a function f : IRd × IRn → IRn,

in which p acts as a parameter. The question is no longer that of inverting f, but the
framework of set-valuedness is valuable nonetheless because it allows us to imme-
diately introduce the solution mapping
(2)  S : IRd →→ IRn with S(p) = { x | f(p, x) = 0 }.
We can then look at pairs ( p̄, x̄) in gph S and ask whether S has a single-valued
localization s around p̄ for x̄. Such a localization is exactly what constitutes an
implicit function coming out of the equation. The classical implicit function theorem
deduces the existence from certain assumptions on f . A review of the form of this
theorem will help in setting the stage for later developments because of the pattern
it provides. Again, some basic background needs to be recalled, and this is also an
opportunity to fix some additional notation and terminology for subsequent use.
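As a toy instance (our assumed example, not the book's), take f(p, x) = x² − p. The solution mapping S and a single-valued localization s around p̄ = 1 for x̄ = 1 can be written directly:

```python
import math

# Assumed example: f(p, x) = x^2 - p, so S(p) = {x | x^2 = p}.
def S(p):
    if p > 0:
        return {math.sqrt(p), -math.sqrt(p)}   # multi-valued
    if p == 0:
        return {0.0}                           # single-valued
    return set()                               # empty-valued for p < 0

# A single-valued localization of S around p̄ = 1 for x̄ = 1: near the
# pair (1, 1) only the positive branch of solutions survives.
def s(p):
    return math.sqrt(p)

print(S(4.0))   # the set {2.0, -2.0}
print(s(1.0))   # 1.0
```

Here s is exactly the implicit function the chapter is after: a localization of the set-valued S that is single-valued and, away from p = 0, differentiable.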
A function f : IRn → IR is upper semicontinuous at a point x̄ when x̄ ∈ int dom f and for every ε > 0 there exists δ > 0 for which

f(x) < f(x̄) + ε for every x ∈ IBδ(x̄).

If instead we have

f(x) > f(x̄) − ε for every x ∈ IBδ(x̄),

then f is said to be lower semicontinuous at x̄. Such upper and lower semicontinuity combine to continuity, meaning the existence for every ε > 0 of a δ > 0 for which

| f(x) − f(x̄)| < ε for every x ∈ IBδ(x̄).
This condition, in our norm notation, carries over to defining the continuity of a
vector-valued function f : IRn → IRm at a point x̄ ∈ int dom f . However, we also
speak more generally then of f being continuous at x̄ relative to a set D when
x̄ ∈ D ⊂ dom f and this last estimate holds for x ∈ D; in that case x̄ need not belong
to int dom f . When f is continuous relative to D at every point of D, we say it is
continuous on D. The graph gph f of a function f : IRn → IRm with closed domain
dom f that is continuous on D = dom f is a closed set in IRn × IRm .
A function f : IRn → IRm is Lipschitz continuous relative to a set D, or on a set D, if D ⊂ dom f and there is a constant κ ≥ 0 such that

| f(x′) − f(x)| ≤ κ |x′ − x| for every x′, x ∈ D.
The kernel of A is
ker A = { x | Ax = 0 }.
In the finite-dimensional setting, we distinguish in principle between a linear mapping and its matrix, but often use the same notation for both. A linear mapping A : IRn → IRm is represented then by a matrix A with m rows, n columns, and entries a_{ij}:

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.
The inverse A−1 of a linear mapping A : IRn → IRm always exists in the set-valued
sense, but it isn’t a linear mapping unless it is actually a function with all of IRm as its
domain, in which case A is said to be invertible. From linear algebra, of course, that
requires m = n and corresponds to the matrix A being nonsingular. More generally,
if m ≤ n and the rows of the matrix A are linearly independent, then the rank of
the matrix A is m and the mapping A is surjective. In terms of the transpose of A,
denoted by AT , the matrix AAT is in this case nonsingular. On the other hand, if
m ≥ n and the columns of A are linearly independent then AT A is nonsingular.
Both the identity mapping and its matrix will be denoted by I, regardless of
dimensionality. By default, |A| is the operator norm of A induced by the Euclidean
norm,

|A| = max_{|x| ≤ 1} |Ax|.
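For a quick numerical illustration (ours, not the book's): the operator norm induced by the Euclidean norm is the largest singular value of A, and it can be estimated by power iteration on AᵀA using only the standard library.

```python
# Sketch: estimate |A| = max over |x| <= 1 of |Ax| by power iteration
# on A^T A; the limit is the largest singular value of A.
import math

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def operator_norm(A, iters=500):
    # power iteration: x converges to a top right-singular vector of A
    x = [1.0] * len(A[0])
    for _ in range(iters):
        x = matvec(transpose(A), matvec(A, x))   # apply A^T A
        n = math.sqrt(sum(v * v for v in x))
        x = [v / n for v in x]
    y = matvec(A, x)
    return math.sqrt(sum(v * v for v in y))      # |Ax| at the maximizer

print(operator_norm([[3.0, 0.0], [0.0, 2.0]]))   # 3.0, the largest singular value
```

For a diagonal matrix the answer is simply the largest absolute diagonal entry, which makes the example easy to check by hand.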
A function f : IRn → IRm is differentiable at a point x̄ when x̄ ∈ int dom f and
there is a linear mapping A : IRn → IRm with the property that for every ε > 0 there
exists δ > 0 with
| f (x̄ + h) − f (x̄) − Ah| ≤ ε |h| for every h ∈ IRn with |h| < δ .
If such a mapping A exists at all, it is unique; it is denoted by D f (x̄) and called the
derivative of f at x̄. A function f : IRn → IRm is said to be twice differentiable at a
point x̄ ∈ int dom f when there is a bilinear mapping N : IRn × IRn → IRm with the
property that for every ε > 0 there exists δ > 0 with
| f (x̄ + h) − f (x̄) − D f (x̄)h − N(h, h)| ≤ ε |h|2 for every h ∈ IRn with |h| < δ .
If such a mapping N exists it is unique and is called the second derivative of f at x̄,
denoted by D2 f (x̄). Higher-order derivatives can be defined accordingly.
The m × n matrix that represents the derivative D f (x̄) is called the Jacobian of f
at x̄ and is denoted by ∇ f (x̄). In the notation x = (x1 , . . . , xn ) and f = ( f1 , . . . , fm ),
the components of ∇ f (x̄) are the partial derivatives of the component functions fi :
∇ f(x̄) = ( ∂ fi/∂ xj (x̄) )_{i, j = 1}^{m, n}.
With this notation and terminology in hand, let us return to the setting of implicit functions in equation (1), as traditionally addressed with tools of differentiability. Most calculus books present a result going back to Dini¹, who formulated and proved it in lecture notes of 1877/78; the cover of Dini's manuscript is displayed
above. The version typically seen in advanced texts is what we will refer to as the
classical implicit function theorem or Dini’s theorem. In those texts the set-valued
solution mapping S in (2) never enters the picture directly, but a brief statement in
that mode will help to show where we are headed in this book.
1 Ulisse Dini (1845–1918). Many thanks to Danielle Ritelli from the University of Bologna for a
copy of Dini’s manuscript.
The statement centers on a pair ( p̄, x̄) satisfying the equation (1), or equiva-
lently such that x̄ ∈ S( p̄). It makes two assumptions: f is continuously differen-
tiable around ( p̄, x̄) and the partial Jacobian ∇x f ( p̄, x̄) is nonsingular (requiring of
course that m = n). The conclusion then is that a single-valued localization s of S exists around p̄ for x̄ which moreover is continuously differentiable around p̄ with Jacobian given by the formula

∇s(p) = −∇x f(p, s(p))⁻¹ ∇p f(p, s(p)) for all p near p̄.
The Dini classical implicit function theorem and its variants will be taken up in detail in Section 1B after the development in Section 1A of an equivalent inverse function theorem. Later in Chapter 1 we gradually depart from the assumption of continuous differentiability of f to obtain far-reaching extensions of this classical
theorem. It will be illuminating, for instance, to reformulate the assumption about
the Jacobian ∇x f( p̄, x̄) as an assumption about the function

h(x) = f( p̄, x̄) + ∇x f( p̄, x̄)(x − x̄),

giving the partial linearization of f at ( p̄, x̄) with respect to x and having h(x̄) = 0.
The condition corresponding to the invertibility of ∇x f ( p̄, x̄) can be turned into the
condition that the inverse mapping h−1 , with x̄ ∈ h−1 (0), has a single-valued local-
ization around 0 for x̄. In this way the theme of single-valued localizations can be
carried forward even into realms where f might not be differentiable and h could
be some other kind of “local approximation” of f . We will be able to operate with
a broad implicit function paradigm, extending in later chapters to much more than
solving equations. It will deal with single-valued localizations s of solution map-
pings S to “generalized equations.” These localizations s, if not differentiable, will
at least have other key properties.
(a) | f(x′) − f(x) − ∇ f(x)(x′ − x)| ≤ ε |x′ − x| for every x′, x ∈ IBδ(x̄).
(b) | f(x′) − f(x) − ∇ f(x̄)(x′ − x)| ≤ ε |x′ − x| for every x′, x ∈ IBδ(x̄).
Theorem 1A.1 (classical inverse function theorem). Let f : IRn → IRn be contin-
uously differentiable in a neighborhood of a point x̄ and let ȳ := f (x̄). If ∇ f (x̄)
is nonsingular, then f −1 has a single-valued localization s around ȳ for x̄. More-
over, the function s is continuously differentiable in a neighborhood V of ȳ, and its Jacobian satisfies

(1)  ∇s(y) = ∇ f(s(y))⁻¹ for every y ∈ V.
Examples.
1) For the function f(x) = x² considered in the introduction, the inverse f⁻¹ is a set-valued mapping whose domain is [0, ∞). It has two single-valued localizations around any ȳ > 0 for x̄ ≠ 0, represented by either x(y) = √y if x̄ > 0 or x(y) = −√y if x̄ < 0. The inverse f⁻¹ has no single-valued localization around ȳ = 0 for x̄ = 0.

2) The inverse f⁻¹ of the function f(x) = x³ is single-valued everywhere; it is the function x(y) = ∛y. This inverse is not differentiable at 0, which fits with the observation that f′(0) = 0.
3) For a higher-dimensional illustration, we look at diagonal real matrices

A = \begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix}

and the function f : IR² → IR² which assigns to (λ_1, λ_2) the trace y_1 = λ_1 + λ_2 of A and the determinant y_2 = λ_1 λ_2 of A,

f(λ_1, λ_2) = \begin{pmatrix} λ_1 + λ_2 \\ λ_1 λ_2 \end{pmatrix}.
What can be said about the inverse of f ? The range of f consists of all y = (y1 , y2 )
such that 4y_2 ≤ y_1². The Jacobian

∇ f(λ_1, λ_2) = \begin{pmatrix} 1 & 1 \\ λ_2 & λ_1 \end{pmatrix}
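A numerical companion to this example (our sketch, not from the book): inverting f amounts to recovering (λ1, λ2) as the roots, in either order, of t² − y1 t + y2 = 0, which have real values exactly when 4y2 ≤ y1².

```python
import math

# f maps a diagonal pair to (trace, determinant).
def f(l1, l2):
    return (l1 + l2, l1 * l2)

# f^{-1}(y1, y2): the roots of t^2 - y1*t + y2 = 0 in either order,
# empty-valued when the discriminant y1^2 - 4*y2 is negative.
def f_inverse(y1, y2):
    disc = y1 * y1 - 4 * y2
    if disc < 0:
        return set()                        # (y1, y2) outside rge f
    r = math.sqrt(disc)
    t1, t2 = (y1 + r) / 2, (y1 - r) / 2
    return {(t1, t2), (t2, t1)}

print(f(1, 3))            # (4, 3)
print(f_inverse(4, 3))    # the pairs (3.0, 1.0) and (1.0, 3.0)
```

On the boundary 4y2 = y1² the two roots coincide and f⁻¹ becomes single-valued, which is exactly where the Jacobian above turns singular (λ1 = λ2).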
2 These two proofs are not really different, if we take into account that the contraction mapping
principle is proved by using a somewhat similar iterative procedure, see Section 5E.
analysis, namely Newton's iterative method³ for solving nonlinear equations. The
second, which we include for the sake of connections with later developments, uti-
lizes a nonconstructive, but very broad, fixed-point argument.
Proof I of Theorem 1A.1. First we introduce some constants. Let a > 0 be a scalar
so small that, by appeal to Fact 2 in the beginning of this section, the Jacobian
matrix ∇ f (x) is nonsingular for every x in IBa (x̄) and the function x → ∇ f (x)−1 is
continuous in IBa(x̄). Set

c = max_{x ∈ IBa(x̄)} |∇ f(x)⁻¹|.
Take a > 0 smaller if necessary to obtain, on the basis of the estimate (a) in Fact 1,
that

(2)  | f(x′) − f(x) − ∇ f(x)(x′ − x)| ≤ (1/2c) |x′ − x| for every x′, x ∈ IBa(x̄).
Let b = a/(16c). Let s be the localization of f −1 with respect to the neighborhoods
IBb (ȳ) and IBa (x̄):
(3)  gph s = ( IBb(ȳ) × IBa(x̄) ) ∩ gph f⁻¹.
We will show that s has the properties claimed. The argument is divided into three
steps.
S TEP 1: The localization s is nonempty-valued on IBb (ȳ) with x̄ ∈ s(ȳ), in par-
ticular.
The fact that x̄ ∈ s(ȳ) is immediate of course from (3), inasmuch as x̄ ∈ f −1 (ȳ).
Pick any y ∈ IBb(ȳ) and any x0 ∈ IBa/8(x̄). We will demonstrate that the iterative procedure

(4)  xk+1 = xk − ∇ f(xk)⁻¹( f(xk) − y) for k = 0, 1, . . .

generates a sequence of points xk which satisfy, for k = 1, 2, . . .,

(5a)  |xk − x̄| ≤ a

and
3 Isaac Newton (1643–1727). In 1669 Newton wrote his paper De Analysi per Equationes Numero
Terminorum Infinitas, where, among other things, he describes an iterative procedure for approx-
imating real roots of a polynomial equation of third degree. In 1690 Joseph Raphson proposed
a similar iterative procedure for solving more general polynomial equations and attributed it to
Newton. It was Thomas Simpson who in 1740 stated the method in today’s form (using Newton’s
fluxions) for an equation not necessarily polynomial, without making connections to the works of
Newton and Raphson; he also noted that the method can be used for solving optimization problems
by setting the gradient to zero.
(5b)  |xk − xk−1| ≤ a/2^{k+1}.
To initialize the induction, we establish (5a) and (5b) for k = 1. Since x0 ∈ IBa/8(x̄), the matrix ∇ f(x0) is indeed invertible, and (4) gives us x1. The equality in (4) for k = 0 can also be written as

x1 − x̄ = x0 − x̄ − ∇ f(x0)⁻¹( f(x0) − y),

obtaining

x1 − x̄ = ∇ f(x0)⁻¹( f(x̄) − f(x0) − ∇ f(x0)(x̄ − x0) + y − ȳ ).

Taking norms on both sides and utilizing (2) with x′ = x̄ and x = x0 we get

|x1 − x̄| ≤ |∇ f(x0)⁻¹| ( | f(x̄) − f(x0) − ∇ f(x0)(x̄ − x0)| + |y − ȳ| ) ≤ (c/2c)|x0 − x̄| + cb.

Inasmuch as |x0 − x̄| ≤ a/8, this yields

|x1 − x̄| ≤ a/16 + cb = a/8 ≤ a.
Hence (5a) holds for k = 1. Moreover, by the triangle inequality,

|x1 − x0| ≤ |x1 − x̄| + |x̄ − x0| ≤ a/8 + a/8 = a/4,
which is (5b) for k = 1.
Assume now that (5a) and (5b) hold for k = 1, 2, . . . , j. Then the matrix ∇ f (xk )
is nonsingular for all such k and the iteration (4) gives us for k = j the point x j+1 :
This gives (5a) for k = j + 1 and the induction step is complete. Thus, both (5a) and
(5b) hold for all k = 1, 2, . . ..
To verify that the sequence {xk} converges, we observe next from (5b) that, for any k and j satisfying k > j, we have

|xk − xj| ≤ ∑_{i=j}^{k−1} |xi+1 − xi| ≤ ∑_{i=j}^{∞} a/2^{i+2} ≤ a/2^{j+1}.
Hence, the sequence {xk } satisfies the Cauchy criterion, which is known to guaran-
tee that it is convergent.
Let x be the limit of this sequence. Clearly, from (5a), we have x ∈ IBa (x̄).
Through passing to the limit in (4), x must satisfy x = x − ∇ f (x)−1 ( f (x) − y), which
is equivalent to f (x) = y. Thus, we have proved that for any y ∈ IBb (ȳ) there exists
x ∈ IBa (x̄) such that x ∈ f −1 (y). In other words, the localization s of the inverse f −1
at ȳ for x̄ specified by (3) has nonempty values. In particular, IBb (ȳ) ⊂ dom f −1 .
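The iteration (4) at the heart of Step 1 is just Newton's method for solving f(x) = y. A one-dimensional sketch (our illustrative choice of f, not from the book):

```python
# Newton iteration x^{k+1} = x^k - f'(x^k)^{-1} (f(x^k) - y), started near x̄.
def newton_solve(f, df, y, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        x_new = x - (f(x) - y) / df(x)   # the 1-D version of iteration (4)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Solve f(x) = x^2 = y for y = 10 near x̄ = 3 (so ȳ = 9); the limit is the
# value s(y) of the single-valued localization of f^{-1} around ȳ for x̄.
s_y = newton_solve(lambda x: x * x, lambda x: 2 * x, y=10.0, x0=3.0)
print(s_y)   # approximately sqrt(10)
```

Because y = 10 is close to ȳ = 9 and x0 = 3 is close to x̄, the iterates stay in a ball around x̄ and converge, mirroring the role of the radii a and b in the proof.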
S TEP 2: The localization s is single-valued on IBb (ȳ).
Let y ∈ IBb(ȳ) and suppose x and x′ belong to s(y). Then x, x′ ∈ IBa(x̄) and also

x = −∇ f(x)⁻¹( f(x) − y − ∇ f(x)x ) and x′ = −∇ f(x)⁻¹( f(x′) − y − ∇ f(x)x′ ).

Consequently

x′ − x = −∇ f(x)⁻¹( f(x′) − f(x) − ∇ f(x)(x′ − x) ),

and then, taking norms on both sides and utilizing (2), we get |x′ − x| ≤ c · (1/2c)|x′ − x| = (1/2)|x′ − x|, which is possible only when x′ = x. Hence s is single-valued on IBb(ȳ).

S TEP 3: The localization s is Lipschitz continuous on IBb(ȳ) and continuously differentiable on int IBb(ȳ).

For y′, y ∈ IBb(ȳ) with x′ = s(y′) and x = s(y), the same identities as in Step 2 now give

x′ − x = −∇ f(x)⁻¹( f(x′) − f(x) − ∇ f(x)(x′ − x) − (y′ − y) ),

and taking norms and using (2) we obtain |x′ − x| ≤ (1/2)|x′ − x| + c|y′ − y|, hence

(7)  |s(y′) − s(y)| ≤ 2c |y′ − y| for every y′, y ∈ IBb(ȳ).

This estimate means that the localization s is Lipschitz continuous on IBb(ȳ).
Now take any ε > 0. Then, from (a) in Fact 1, there exists δ > 0 such that
(8)  | f(x′) − f(x) − ∇ f(x)(x′ − x)| ≤ (ε/2c²) |x′ − x| for every x′, x ∈ IBδ(x̄).
Choose y ∈ int IBb (ȳ); then there exists τ > 0 such that τ < δ /(2c) and y+ h ∈ IBb (ȳ)
for any h ∈ IRn with |h| ≤ τ. From the estimate (7) we get that

|s(y + h) − s(y)| ≤ 2c|h| ≤ 2cτ < δ whenever h ∈ IBτ(0).

We have

s(y + h) = −∇ f(s(y))⁻¹( f(s(y + h)) − (y + h) − ∇ f(s(y))s(y + h) )

and

s(y) = −∇ f(s(y))⁻¹( f(s(y)) − y − ∇ f(s(y))s(y) ),

and subtracting the second from the first, we obtain

s(y + h) − s(y) − ∇ f(s(y))⁻¹h = −∇ f(s(y))⁻¹( f(s(y + h)) − f(s(y)) − ∇ f(s(y))(s(y + h) − s(y)) ).

Once again taking norms on both sides, and using (7) and (8), we get

|s(y + h) − s(y) − ∇ f(s(y))⁻¹h| ≤ (cε/2c²) |s(y + h) − s(y)| ≤ ε|h| whenever h ∈ IBτ(0).
By definition, this says that the function s is differentiable at y and that its Jacobian
equals ∇ f(s(y))⁻¹, as claimed in (1). This Jacobian is continuous on int IBb(ȳ): this comes from the continuity of the mapping x ↦ ∇ f(x)⁻¹ on IBa(x̄), where the values of s lie, combined with the continuity of s on int IBb(ȳ) and the fact that a composition of continuous functions is continuous.
We can make a shortcut through Steps 1 and 2 of Proof I, arriving at the promised Proof II, if we employ a deeper result of analysis far beyond the framework so far, namely the contraction mapping principle. Although we work here in Euclidean spaces, we state this theorem in the framework of a complete metric space, as is standard in the literature. A more general version of this principle for set-valued mappings will be proved in Section 5E. The reader who wants to stick with Euclidean spaces may assume that X is a closed nonempty subset of IRn with metric ρ(x, y) = |x − y|.
Theorem 1A.2 (contraction mapping principle). Let X be a complete metric space with metric ρ. Consider a point x̄ ∈ X, a scalar a > 0, a constant λ ∈ [0, 1), and a function Φ : X → X such that
(a) ρ(Φ(x̄), x̄) ≤ a(1 − λ);
(b) ρ(Φ(x′), Φ(x)) ≤ λ ρ(x′, x) for every x′, x ∈ IBa(x̄).
Then there is a unique x ∈ IBa(x̄) satisfying x = Φ(x).
Exercise 1A.5. Prove that theorems 1A.2, 1A.3 and 1A.4 are equivalent.
Guide. Let 1A.2 be true and let Φ satisfy the assumptions in 1A.3 with some λ ∈
[0, 1). Choose x̄ ∈ X; then Φ (x̄) ∈ X. Let a > ρ (x̄, Φ (x̄))/(1 − λ ). Then (a) and (b)
are satisfied with this a and hence there exists a unique fixed point x of Φ in IBa (x̄).
The uniqueness of this fixed point in the whole X follows from the contraction
property. To prove the converse implication, first use (a) and (b) to obtain that Φ maps IBa(x̄) into itself and then use the fact that the closed ball IBa(x̄) equipped with metric ρ is a complete metric space. Another way to obtain the equivalence of 1A.2 and 1A.3 is to reformulate 1A.2 with a being possibly ∞.
Let 1A.3 be true and let Φ satisfy the assumptions (9) and (10) in 1A.4 with corresponding λ and μ. Then, by 1A.3, for every fixed p ∈ P the set { x ∈ X | x = Φ(p, x) } is a singleton; that is, the mapping ψ in (11) is a function with domain P. To complete the proof, choose p′, p ∈ P and the corresponding x′ = Φ(p′, x′), x = Φ(p, x), and use (9), (10) and the triangle inequality to obtain the desired estimate.
Proof II of Theorem 1A.1. Denote A = ∇ f (x̄) and let c := |A−1 |. There exists
a > 0 such that from the estimate (b) in Fact 1 (in the beginning of this section) we
have
(12)  | f(x′) − f(x) − ∇ f(x̄)(x′ − x)| ≤ (1/2c) |x′ − x| for every x′, x ∈ IBa(x̄).
Let b = a/(4c). The space IRn equipped with the Euclidean norm is a complete
metric space, so in this case X in Theorem 1A.2 is identified with IRn. Fix y ∈ IBb(ȳ) and consider the function

Φy : x ↦ x − A⁻¹( f(x) − y ) for x ∈ IRn.
We have

|Φy(x̄) − x̄| = | − A⁻¹(ȳ − y)| ≤ cb = ca/(4c) = a/4 < a(1 − 1/2),
hence condition (a) in the contraction mapping principle 1A.2 holds with the so
chosen a and λ = 1/2. Further, for any x, x′ ∈ IBa(x̄), from (12) we obtain that

|Φy(x) − Φy(x′)| = |x − x′ − A⁻¹( f(x) − f(x′))| ≤ |A⁻¹| | f(x) − f(x′) − A(x − x′)| ≤ c · (1/2c) |x − x′| = (1/2) |x − x′|.
Thus condition (b) in 1A.2 is satisfied with the same λ . Hence, there is a unique
x ∈ IBa (x̄) such that Φy (x) = x; that is equivalent to f (x) = y.
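Proof II's contraction can likewise be run numerically. In this sketch (our one-dimensional illustration, not from the book, with the scalar A playing the role of the fixed linearization ∇f(x̄)), iterating Φy drives x to the fixed point, which solves f(x) = y:

```python
# Fixed-point iteration for Phi_y(x) = x - A^{-1} (f(x) - y), with A = f'(x̄)
# held fixed throughout, as in Proof II.
def solve_by_contraction(f, A, y, x0, iters=200):
    x = x0
    for _ in range(iters):
        x = x - (f(x) - y) / A   # one application of Phi_y
    return x

# f(x) = x^2, x̄ = 3, A = f'(3) = 6; solve f(x) = 10 near x̄.
x = solve_by_contraction(lambda t: t * t, A=6.0, y=10.0, x0=3.0)
print(x)   # approximately sqrt(10)
```

Near x̄ the iteration map has slope 1 − f′(x)/A, which is small when x stays close to x̄; that is the contraction property (b) in action.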
Translated into our terminology, this tells us that f −1 has a single-valued local-
ization around ȳ for x̄ whose graph satisfies (3). The continuous differentiability is
argued once more through Step 3 of Proof I.
Exercise 1A.6. Prove Theorem 1A.1 by using, instead of iteration (4), the iteration

xk+1 = xk − ∇ f(x̄)⁻¹( f(xk) − y) for k = 0, 1, . . . ,

in which the Jacobian is kept fixed at the reference point x̄.
Guide. Follow the argument in Proof I with respective adjustments of the constants
involved.
In this and the following chapters we will derive the classical inverse function
theorem 1A.1 a number of times and in different ways from more general theorems
or utilizing other basic results. For instance, in Section 1F we will show how to
obtain 1A.1 from Brouwer’s invariance of domain theorem and in Section 4B we
will prove 1A.1 again with the help of the Ekeland variational principle.
There are many roads to be taken from here, by relaxing the assumptions in the
classical inverse function theorem, that lead to a variety of results. Some of them
are paved and easy to follow, others need more advanced techniques, and a few lead
to new territories which we will explore later in the book.
Theorem 1B.1 (Dini classical implicit function theorem). Let f : IRd × IRn → IRn
be continuously differentiable in a neighborhood of ( p̄, x̄) and such that f ( p̄, x̄) = 0,
and let the partial Jacobian of f with respect to x at ( p̄, x̄), namely ∇x f ( p̄, x̄), be
nonsingular. Then the solution mapping S defined in (1) has a single-valued local-
ization s around p̄ for x̄ which is continuously differentiable in a neighborhood Q of p̄ with Jacobian satisfying

(2)  ∇s(p) = −∇x f(p, s(p))⁻¹ ∇p f(p, s(p)) for every p ∈ Q.
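The Jacobian formula ∇s(p) = −∇x f(p, s(p))⁻¹ ∇p f(p, s(p)) can be checked numerically on a simple example (ours, not the book's): for f(p, x) = x³ + x − p we have ∂f/∂x = 3x² + 1 > 0 and ∂f/∂p = −1, so the formula predicts s′(p̄) = 1/(3x̄² + 1).

```python
# Assumed example: f(p, x) = x^3 + x - p, strictly increasing in x, so the
# solution mapping S(p) is single-valued everywhere.
def f(p, x):
    return x ** 3 + x - p

def s(p):
    # solve f(p, x) = 0 by bisection on a bracketing interval
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(p, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_bar = 2.0
x_bar = s(p_bar)                               # x̄ = 1 solves x^3 + x = 2
formula = -(-1.0) / (3 * x_bar ** 2 + 1)       # -(df/dx)^{-1} (df/dp) = 0.25
h = 1e-6
finite_diff = (s(p_bar + h) - s(p_bar - h)) / (2 * h)
print(formula, finite_diff)   # both approximately 0.25
```

The central finite difference of the numerically computed s agrees with the implicit-function formula, as the theorem guarantees.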
The classical inverse function theorem is the particular case of the classical im-
plicit function theorem in which f (p, x) = −p + f (x) (with a slight abuse of nota-
tion). However, it will also be seen now that the classical implicit function theorem
can be obtained from the classical inverse function theorem. For that, we first state
an easy-to-prove fact from linear algebra.
Consider the function ϕ : (p, x) ↦ (p, f(p, x)) acting from IRd × IRn to itself. The inverse of this function is defined by the solutions
of the equation

(3)  ϕ(p, x) = ( p, f(p, x) ) = ( y1, y2 ),
where the vector (y1 , y2 ) ∈ IRd × IRn is now the parameter and (p, x) is the depen-
dent variable. The nonsingularity of the partial Jacobian ∇x f ( p̄, x̄) implies through
Lemma 1B.2 that the Jacobian of the function ϕ in (3) at the point ( p̄, x̄), namely the block matrix

J( p̄, x̄) =  [ I    0
              ∇p f ( p̄, x̄)    ∇x f ( p̄, x̄) ]
is nonsingular as well. Then, according to the classical inverse function theorem
1A.1, the inverse ϕ −1 of the function in (3) has a single-valued localization (y1 , y2 ) → (q(y1 , y2 ), r(y1 , y2 )) around ( p̄, 0) for ( p̄, x̄)
which is continuously differentiable around ( p̄, 0). To develop formula (2), we note
that
q(y1 , y2 ) = y1 ,
f (y1 , r(y1 , y2 )) = y2 .
Differentiating the second equality with respect to y1 by using the chain rule, we get

(4)    ∇p f (y1 , r(y1 , y2 )) + ∇x f (y1 , r(y1 , y2 )) ∇y1 r(y1 , y2 ) = 0.
When (y1 , y2 ) is close to ( p̄, 0), the point (y1 , r(y1 , y2 )) is close to ( p̄, x̄) and then
∇x f (y1 , r(y1 , y2 )) is nonsingular (Fact 2 in Section 1A). Thus, solving (4) with re-
spect to ∇y1 r(y1 , y2 ) gives

∇y1 r(y1 , y2 ) = −∇x f (y1 , r(y1 , y2 ))−1 ∇p f (y1 , r(y1 , y2 )).
In particular, at points (y1 , y2 ) = (p, 0) close to ( p̄, 0) we have that the mapping
p → s(p) := r(p, 0) is a single-valued localization of the solution mapping S in
(1) around p̄ for x̄ which is continuously differentiable around p̄ and its derivative
satisfies (2).
Thus, the classical implicit function theorem, as stated above, is equivalent to the
classical inverse function theorem as stated in the preceding section. We now look
at yet another equivalent result.
Exercise 1B.4. Prove that the correction function theorem implies the inverse func-
tion theorem.
Guide. Let ȳ := f (x̄) and assume A := ∇ f (x̄) is nonsingular. In these terms the
correction function theorem 1B.3 claims that the mapping
20 1 Functions Defined Implicitly by Equations
Ξ : z → { ξ ∈ IRn | f (z + ξ ) = ȳ + A(z − x̄) } for z ∈ IRn
Theorem 1B.6 (Goursat’s implicit function theorem). For the solution mapping S
defined in (1), consider a pair ( p̄, x̄) with x̄ ∈ S( p̄). Assume that:
(a) f (p, x) is differentiable with respect to x in a neighborhood of the point ( p̄, x̄),
and both f (p, x) and ∇x f (p, x) depend continuously on (p, x) in this neighborhood;
(b) ∇x f ( p̄, x̄) is nonsingular.
Then S has a single-valued localization around p̄ for x̄ which is continuous at p̄.
We will prove a far-reaching generalization of this result in Section 2B, which we
supply with a detailed proof. In the following exercise we give a guide for a direct
proof.
then one has, as in the estimate (a) in Fact 1, that for every x′ , x ∈ IBa (x̄) and p ∈ IBq ( p̄)

(5)    | f (p, x′ ) − f (p, x) − ∇x f (p, x)(x′ − x)| ≤ (1/(2c)) |x′ − x|.
Then use the iteration
to obtain that S has a nonempty graphical localization s around p̄ for x̄. As in Step 2
in Proof I of 1A.1, show that s is single-valued. To show continuity at p̄, for x = s(p)
subtract the equalities

x = ∇x f ( p̄, x̄)−1 (∇x f ( p̄, x̄)x − f (p, x))

and

x̄ = ∇x f ( p̄, x̄)−1 (∇x f ( p̄, x̄)x̄ − f ( p̄, x̄)),

and after adding and subtracting terms, use (5).
Consider a polynomial p(x) = a0 + a1 x + · · · + an xn , where the coefficients a0 , . . . , an are real numbers. For each coefficient vector a =
(a0 , . . . , an ) ∈ IRn+1 let S(a) denote the set of all real zeros of p, so that S is a
mapping from IRn+1 to IR whose domain consists of the vectors a such that p has at
least one real zero. Let ā be a coefficient vector such that p has a simple real zero s̄; thus p(s̄) = 0 but p′ (s̄) ≠ 0. Prove that S has a smooth single-valued localization
around ā for s̄. Is such a statement correct when s̄ is a double zero?
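As a numerical illustration of our own (with a hypothetical quadratic, not taken from the text), consider p(x) = x² − a: the positive root s(a) = √a varies smoothly around ā = 1, where the zero is simple, but admits no Lipschitz localization around ā = 0, where the zero is double:

```python
import math

# For p(x) = x^2 - a, the positive root is s(a) = sqrt(a).  Difference
# quotients of s stay bounded near a simple zero but blow up at a double zero.
def root(a):
    return math.sqrt(a)

near_simple = [(root(1.0 + t) - root(1.0)) / t for t in (1e-2, 1e-4, 1e-6)]
near_double = [(root(0.0 + t) - root(0.0)) / t for t in (1e-2, 1e-4, 1e-6)]
print(near_simple)   # approaches s'(1) = 1/2
print(near_double)   # equals 1/sqrt(t): unbounded as t -> 0
```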
1C. Calmness
The calmness property (1) can alternatively be expressed in the form of the in-
clusion
f (x) ∈ f (x̄) + κ |x − x̄|IB for all x ∈ D ∩ dom f .
That expression connects with the generalization of the definition of calmness to
set-valued mappings, which we will discuss at length in Chapter 3.
Note that a function f which is calm at x̄ may have empty values at some points
x near x̄ when x̄ is on the boundary of dom f . If x̄ is an isolated point of D ∩ dom f ,
then trivially f is calm at x̄ relative to D with κ = 0.
We will mostly use a local version of the calmness property where the set D in
the condition (1) is a neighborhood of x̄; if such a neighborhood exists we simply
say that f is calm at x̄. Calmness of this kind can be identified with the finiteness of
the modulus which we proceed to define next.
Calmness modulus. For a function f : IRn → IRm and a point x̄ ∈ dom f , the calm-
ness modulus of f at x̄, denoted clm ( f ; x̄), is the infimum of the set of values κ ≥ 0
for which there exists a neighborhood D of x̄ such that (1) holds.
According to this, as long as x̄ is not an isolated point of dom f , the calmness
modulus satisfies
clm ( f ; x̄) = lim sup_{x ∈ dom f , x → x̄, x ≠ x̄}  | f (x) − f (x̄)| / |x − x̄| .
If x̄ is an isolated point we have clm ( f ; x̄) = 0. When f is not calm at x̄, from the definition we get clm ( f ; x̄) = ∞. In this way, f is calm at x̄ if and only if clm ( f ; x̄) < ∞.
Examples.
1) The function f (x) = x for x ≥ 0 is calm at any point of its domain [0, ∞), always with calmness modulus 1.
2) The function f (x) = √|x|, x ∈ IR, is not calm at zero but calm everywhere else.
3) The linear mapping A : x → Ax, where A is an m × n matrix, is calm at every
point x ∈ IRn and everywhere has the same modulus clm (A; x) = |A|.
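The moduli in examples like these can be estimated straight from the definition by difference quotients. A sketch of our own, using the test function f (x) = √|x| (our choice of illustration):

```python
import math

# Difference-quotient estimates of the calmness modulus clm(f; xbar) for the
# test function f(x) = sqrt(|x|) (our own choice of illustration).
def f(x):
    return math.sqrt(abs(x))

def quotients(xbar, steps=(1e-2, 1e-4, 1e-6)):
    return [abs(f(xbar + t) - f(xbar)) / t for t in steps]

at1 = quotients(1.0)   # bounded: f is calm at 1, with clm(f; 1) = 1/2
at0 = quotients(0.0)   # unbounded: f is not calm at 0
print(at1)
print(at0)
```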
Straight from the definition of the calmness modulus, we observe that
(i) clm ( f ; x̄) ≥ 0 for any x̄ ∈ dom f ;
(ii) clm (λ f ; x̄) = |λ | clm ( f ; x̄) for any λ ∈ IR and x̄ ∈ dom f ;
(iii) clm ( f + g; x̄) ≤ clm ( f ; x̄) + clm (g; x̄) for any x̄ ∈ dom f ∩ dom g.
These properties of the calmness modulus resemble those of a norm on a space of
functions f , but because clm( f ; x̄) = 0 does not imply f = 0, one could at most
contemplate a seminorm. However, even that falls short, since the modulus can take
on ∞, as can the functions themselves, which do not form a linear space because
they need not even have the same domain.
(b) clm ( f − g; x̄) = 0 ⇒ clm ( f ; x̄) = clm (g; x̄) whenever x̄ ∈ int(dom f ∩
dom g), but the converse is false.
With the concept of calmness in hand, we can interpret the differentiability of a
function f : IRn → IRm at a point x̄ ∈ int dom f as the existence of a linear mapping
A : IRn → IRm , represented by an m × n matrix, such that
(2) clm (e; x̄) = 0 for e(x) = f (x) − [ f (x̄) + A(x − x̄)].
According to properties (ii) and (iii) before 1C.1 there is at most one mapping A satisfying (2). Indeed, if A1 and A2 satisfy (2), then for the corresponding approximation error terms e1 (x) and e2 (x) we have e1 (x) − e2 (x) = (A2 − A1 )(x − x̄), and hence |A2 − A1 | = clm (e1 − e2 ; x̄) ≤ clm (e1 ; x̄) + clm (e2 ; x̄) = 0.
Thus, A is unique and the associated matrix has to be the Jacobian ∇ f (x̄). We conclude further from property (b) in 1C.1 that, when f is differentiable at x̄, clm ( f ; x̄) = |∇ f (x̄)|.
The following theorem complements Theorem 1A.1. It shows that the invertibil-
ity of the derivative is a necessary condition to obtain a calm single-valued localiza-
tion of the inverse.
Choose τ to satisfy 0 < τ < 1/κ . Then, since x̄ ∈ int dom f and f is differentiable
at x̄, there exists δ > 0 such that
(4) | f (x) − f (x̄) − ∇ f (x̄)(x − x̄)| ≤ τ |x − x̄| for all x ∈ IBδ (x̄).
If the matrix ∇ f (x̄) were singular, there would exist d ∈ IRn , |d| = 1, such that
∇ f (x̄)d = 0. Pursuing this possibility, let ε satisfy 0 < ε < min{a, b/τ , δ }. Then,
by applying (4) with x = x̄ + ε d, we get f (x̄ + ε d) ∈ IBb (ȳ). In terms of yε := f (x̄ +
ε d), we then have x̄ + ε d ∈ f −1 (yε ) ∩ IBa (x̄), hence s(yε ) = x̄ + ε d. The calmness
condition (3) then yields
1 = |d| = (1/ε )|x̄ + ε d − x̄| = (1/ε )|s(yε ) − x̄| ≤ (κ /ε )|yε − ȳ| = (κ /ε )| f (x̄ + ε d) − f (x̄)|.
Combining this with (4) and taking into account that ∇ f (x̄)d = 0, we arrive at 1 ≤
κτ |d| < 1 which is absurd. Hence ∇ f (x̄) is nonsingular.
Note that in the particular case of an affine function f (x) = Ax + b, where A is a
square matrix and b is a vector, calmness can be dropped from the set of assumptions
of Theorem 1C.2; the existence of a single-valued localization of f −1 around any
point is already equivalent to the nonsingularity of the Jacobian. This is not always
true even for polynomials. Indeed, the inverse of f (x) = x3 , x ∈ IR, has a single-
valued localization around the origin (which is not calm), but ∇ f (0) = 0.
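Numerically (a sketch of our own), the failure of calmness for the localization s(y) = y^{1/3} of the inverse of f (x) = x³ shows up as unbounded difference quotients at 0:

```python
import math

# s(y) = cube root of y is the single-valued localization of the inverse of
# f(x) = x^3 around the origin; its difference quotients at 0 grow without bound.
def s(y):
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

quots = [abs(s(y) - s(0.0)) / abs(y) for y in (1e-3, 1e-6, 1e-9)]
print(quots)  # grows without bound: clm(s; 0) = infinity
```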
The classical inverse function theorem 1A.1 combined with Theorem 1C.2 above
gives us
Theorem 1C.3 (symmetric inverse function theorem). Let f : IRn → IRn be contin-
uously differentiable around x̄. Then the following are equivalent:
(i) ∇ f (x̄) is nonsingular;
(ii) f −1 has a single-valued localization s around ȳ := f (x̄) for x̄ which is con-
tinuously differentiable around ȳ.
The formula for the Jacobian of the single-valued localization s of the inverse, namely ∇s(y) = ∇ f (s(y))−1 for y in a neighborhood of ȳ, comes from Theorem 1A.1.
Thus, ε d + ξ (x̄ + ε d) ∈ Ξ (x̄) for all small ε > 0. Since Ξ has a single-valued local-
ization around x̄ we get x̄ + ε d + ξ (x̄ + ε d) = x̄. Then
1 = |d| = (1/ε )|ξ (x̄ + ε d)|.
The right side of this equation goes to zero as ε → 0, and that produces a contradic-
tion.
Partial calmness. A function f : IRd × IRn → IRm is said to be calm with respect to x
at ( p̄, x̄) ∈ dom f when the function ϕ with values ϕ (x) = f ( p̄, x) is calm at x̄. Such
calmness is said to be uniform in p at ( p̄, x̄) when there exists a constant κ > 0 and neighborhoods Q of p̄ and U of x̄ such that actually

| f (p, x) − f (p, x̄)| ≤ κ |x − x̄| for all (p, x) ∈ (Q × U) ∩ dom f ,
provided that every neighborhood of ( p̄, x̄) contains points (p, x) ∈ dom f with x ≠ x̄.
Observe in this context that differentiability of f (p, x) with respect to x at ( p̄, x̄) ∈
int dom f is equivalent to the existence of a linear mapping A : IRn → IRm , the partial
derivative of f with respect to x at ( p̄, x̄), which satisfies
clm (e; x̄) = 0 for e(x) = f ( p̄, x) − [ f ( p̄, x̄) + A(x − x̄)],
and then A is the partial derivative Dx f ( p̄, x̄). In contrast, under the stronger condi-
tion that
clm x (e; ( p̄, x̄)) = 0 for e(p, x) = f (p, x) − [ f ( p̄, x̄) + A(x − x̄)],
we say f is differentiable with respect to x uniformly in p at ( p̄, x̄). This means that
for every ε > 0 there are neighborhoods Q of p̄ and U of x̄ such that

| f (p, x) − f ( p̄, x̄) − A(x − x̄)| ≤ ε |x − x̄| for all x ∈ U and p ∈ Q.
Exercise 1C.5 (joint calmness criterion). Let f : IRd × IRn → IRm be calm in x uni-
formly in p and calm in p, both at ( p̄, x̄). Show that f is calm at ( p̄, x̄).
the assumption that ∇ f (x̄) is nonsingular, argue in a manner parallel to the first part
of Step 3 of Proof I of Theorem 1A.1.
It is said to be Lipschitz continuous around x̄ when this holds for some neighborhood
D of x̄. We say further, in the case of an open set C, that f is locally Lipschitz
continuous on C if it is a Lipschitz continuous function around every point x of C.
Lipschitz modulus. For a function f : IRn → IRm and a point x̄ ∈ int dom f , the
Lipschitz modulus of f at x̄, denoted lip ( f ; x̄), is the infimum of the set of values of
κ for which there exists a neighborhood D of x̄ such that (1) holds. Equivalently,
(2)    lip ( f ; x̄) := lim sup_{x′ , x → x̄, x′ ≠ x}  | f (x′ ) − f (x)| / |x′ − x| .
Note that, by this definition, for the Lipschitz modulus we have lip ( f ; x̄) = ∞ precisely in the case where, for every κ > 0 and every neighborhood D of x̄, there are points x′ , x ∈ D violating (1). Thus, f is Lipschitz continuous around x̄ if and only if lip ( f ; x̄) < ∞.
A function f with lip ( f ; x̄) < ∞ is also called strictly continuous at x̄. For an open set
C, a function f is locally Lipschitz continuous on C exactly when lip ( f ; x) < ∞ for
every x ∈ C. Every continuously differentiable function on an open set C is locally
Lipschitz continuous on C.
Examples.
1) The function x → |x|, x ∈ IRn , is Lipschitz continuous everywhere with
lip (|x|; x) = 1; it is not differentiable at 0.
2) An affine function f : x → Ax + b, corresponding to a matrix A ∈ IRm×n and a
vector b ∈ IRm , has lip ( f ; x̄) = |A| for every x̄ ∈ IRn .
3) If f is continuously differentiable in a neighborhood of x̄, then lip ( f ; x̄) =
|∇ f (x̄)|.
Like the calmness modulus, the Lipschitz modulus has the properties of a semi-
norm, except in allowing for ∞:
(i) lip ( f ; x̄) ≥ 0 for any x̄ ∈ int dom f ;
(ii) lip (λ f ; x̄) = |λ | lip ( f ; x̄) for any λ ∈ IR and x̄ ∈ int dom f ;
(iii) lip ( f + g; x̄) ≤ lip ( f ; x̄) + lip(g; x̄) for any x̄ ∈ int dom f ∩ int dom g.
or in other words, if C contains for any pair of its points the entire line segment that
joins them. The most obvious convex set is the ball IB as well as its interior, while
the boundary of the ball is of course nonconvex.
Exercise 1D.2 (Lipschitz continuity on convex sets). Show that if C is a convex sub-
set of int dom f such that lip ( f ; x) ≤ κ for all x ∈ C, then f is Lipschitz continuous
relative to C with constant κ.
Guide. It is enough to demonstrate for an arbitrary choice of points x and x′ in C and ε > 0 that | f (x′ ) − f (x)| ≤ (κ + ε )|x′ − x|. Argue that the line segment joining x and x′ is a compact subset of int dom f which can be covered by finitely many balls on which f is Lipschitz continuous with constant κ + ε . Moreover these balls can be chosen in such a way that a finite sequence of points x0 , x1 , . . . , xr along the segment, starting with x0 = x and ending with xr = x′ , has each consecutive pair in one of them. Get the Lipschitz inequality for x and x′ from the Lipschitz inequalities for these pairs.
Distance and projection. For a point x ∈ IRn and a set C ⊂ IRn , the quantity

d(x,C) = dC (x) := inf_{y ∈ C} |x − y|

is called the distance from x to C. (Whether the notation dC (x) or d(x,C) is used is a matter of convenience in a given context.) Any point y of C which is closest to x in the sense of achieving this distance is called a projection of x on C. The set of such projections is denoted by PC (x). Thus,

PC (x) = { y ∈ C | |x − y| = dC (x) }.
[Figure: a point x, a set C, and the projection set PC (x).]
(c) For a nonempty, closed set C ⊂ IRn , the projection set PC (x) is nonempty,
closed and bounded for every x ∈ IRn .
Proof. For (a), we fix any x ∈ IRn and note that obviously dcl C (x) ≤ dC (x). This inequality can't be strict because for any ε > 0 we can find y ∈ cl C making |x − y| < dcl C (x) + ε but then also find y′ ∈ C with |y − y′ | < ε , in which case we have dC (x) ≤ |x − y′ | < dcl C (x) + 2ε . In particular, this argument reveals that dC (x) = 0 if and only if x ∈ cl C. Having demonstrated that dC (x) = dcl C (x), we may conclude that C = { x | dC (x) = 0 } if and only if C = cl C.
For (b), consider any points x′ and x along with any ε > 0. Take any point y ∈ C such that |x − y| ≤ dC (x) + ε . We have dC (x′ ) ≤ |x′ − y| ≤ |x′ − x| + |x − y| ≤ |x′ − x| + dC (x) + ε , and through the arbitrariness of ε therefore dC (x′ ) − dC (x) ≤ |x′ − x|. The same thing must hold with the roles of x′ and x reversed, so this demonstrates that dC is Lipschitz continuous with constant 1.
Let C be nonempty and closed. If x̄ ∈ int C, we have dC (x) = 0 for all x in a neighborhood of x̄ and consequently lip (dC ; x̄) = 0. Suppose now that x̄ ∉ int C. We will show that lip (dC ; x̄) ≥ 1, in which case equality must actually hold because we already know that dC is Lipschitz continuous on IRn with constant 1. According to the property of the Lipschitz modulus displayed in Exercise 1D.1(c), it is sufficient to consider x̄ ∉ C. Let x̃ ∈ PC (x̄). Then on the line segment from x̃ to x̄ the distance increases linearly, that is, dC (x̃ + τ (x̄ − x̃)) = τ d(x̄,C) for 0 ≤ τ ≤ 1 (prove!). Hence for the two points x′ = x̃ + τ ′ (x̄ − x̃) and x = x̃ + τ (x̄ − x̃) we have |dC (x′ ) − dC (x)| = |τ ′ − τ ||x̃ − x̄| = |x′ − x|. Note that x̄ can be approached by such pairs of points and hence lip (dC ; x̄) ≥ 1.
Turning now to (c), we again fix any x ∈ IRn and choose a sequence of points
yk ∈ C such that |x − yk | → dC (x) as k → ∞. This sequence is bounded and therefore
has an accumulation point y in C, inasmuch as C is closed. Since |x − yk | ≥ dC (x) for
all k, it follows that |x − y| = dC (x). Thus, y ∈ PC (x), so PC (x) is not empty. Since by
definition PC (x) is the intersection of C with the closed ball with center x and radius
dC (x), it’s clear that PC (x) is furthermore closed and bounded.
It has been seen in 1D.4(c) that for any nonempty closed set C ⊂ IRn the projection
mapping PC : IRn →→ C is nonempty-compact-valued, but when might it actually be
single-valued as well? The convexity of C is the additional property that yields this
conclusion, as will be shown in the following proposition4.
4 A set C such that PC is single-valued is called a Chebyshev set. A nonempty, closed, convex set
is always a Chebyshev set, and in IRn the converse is also true; for proofs of this fact see Borwein
and Lewis [2006] and Deutsch [2001]. The question of whether a Chebyshev set in an arbitrary
infinite-dimensional Hilbert space must be convex is still open.
Proof. We have PC (x) ≠ ∅ in view of 1D.4(c). Suppose ȳ ∈ PC (x̄). For any τ ∈ (0, 1), any y ∈ IRn and yτ = (1 − τ )ȳ + τ y we have the identity
and consequently
It follows that
|y1 − y0 | ≤ |x1 − x0 |.
Thus, PC is Lipschitz continuous with Lipschitz constant 1.
Projection mappings have many uses in numerical analysis and optimization.
Note that PC can fail to be differentiable on the boundary of C. As an example, when C is the set of nonpositive reals IR− one has

PC (x) = 0 for x ≥ 0,    PC (x) = x for x < 0,

and this function is not differentiable at x = 0.
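For this one-dimensional projection, a quick check (our own sketch) of the Lipschitz constant 1 and of the nondifferentiability at 0:

```python
# PC(x) = min(x, 0) projects onto C = (-inf, 0].  The difference quotients
# never exceed 1, while the one-sided derivatives at 0 disagree.
def proj(x):
    return min(x, 0.0)

pairs = [(-2.0, 3.0), (0.5, -0.5), (1e-3, -1e-3), (4.0, 5.0)]
ratios = [abs(proj(a) - proj(b)) / abs(a - b) for a, b in pairs]
print(ratios)  # every ratio is at most 1

left = (proj(-1e-8) - proj(0.0)) / (-1e-8)   # slope from the left: 1.0
right = (proj(1e-8) - proj(0.0)) / 1e-8      # slope from the right: 0.0
print(left, right)
```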
It is clear from the definitions of the calmness and Lipschitz moduli that we
always have
clm ( f ; x̄) ≤ lip ( f ; x̄).
This relation is illustrated in Fig. 1.4.
In the preceding section we showed how to characterize differentiability through
calmness. Now we introduce a sharper concept of derivative which is tied up with
the Lipschitz modulus.
Fig. 1.4 Plots of calm and Lipschitz continuous functions. On the left is the plot of the function f (x) = (−1)^{n+1} 9x + (−1)^n 2^{2n+1}/5^{n−2} for |x| ∈ [x_{n+1} , x_n ] with x_n = 4^{n−1}/5^{n−2} , n = 1, 2, . . . , for which clm ( f ; 0) < lip ( f ; 0) < ∞. On the right is the plot of the function f (x) = (−1)^{n+1} (6 + n)x + (−1)^n 210(5 + n)!/(6 + n)! for |x| ∈ [x_{n+1} , x_n ] with x_n = 210(4 + n)!/(6 + n)! , n = 1, 2, . . . , for which clm ( f ; 0) < lip ( f ; 0) = ∞.
In particular, in this case we have that clm (e; x̄) = 0 and hence f is differentiable at x̄ with A = D f (x̄), but strictness imposes a requirement on the difference e(x′ ) − e(x) also when x′ ≠ x̄ and x ≠ x̄. Specifically, it demands the existence for each ε > 0 of a neighborhood U of x̄ such that

| f (x′ ) − f (x) − ∇ f (x̄)(x′ − x)| ≤ ε |x′ − x| for all x′ , x ∈ U.
Fig. 1.5 Plots of functions differentiable at the origin. The function on the left is strictly differen-
tiable at the origin but not continuously differentiable. The function on the right is differentiable at
the origin but not strictly differentiable there.
The second of these examples has the interesting feature that, even though f (0) = 0 and f ′ (0) ≠ 0, no single-valued localization of f −1 exists around 0 for 0. In contrast, we will see in 1D.9 that strict differentiability would ensure the availability of such a localization.
(7)    | f (x2 ) − f (x1 ) − ∇ f (x̄)(x2 − x1 )| ≤ (ε /2) |x1 − x2 |.
Fix an x1 ∈ IBδ1 /2 (x̄). For this x1 there exists δ2 > 0 such that for any x′ ∈ IBδ2 (x1 ),

(8)    | f (x′ ) − f (x1 ) − ∇ f (x1 )(x′ − x1 )| ≤ (ε /2) |x′ − x1 |.

Make δ2 smaller if necessary so that IBδ2 (x1 ) ⊂ IBδ1 (x̄). By (7) with x2 replaced by x′ and by (8), we have
5 These two examples are from Nijenhuis [1974], where the introduction of strict differentiability
is attributed to Leach [1961]. By the way, Nijenhuis dedicated his paper to Carl Allendoerfer “for
not taking the implicit function theorem for granted.” In the book we follow this advice.
This implies
|∇ f (x1 ) − ∇ f (x̄)| ≤ ε .
Since x1 is arbitrarily chosen in IBδ1 /2 (x̄), we obtain that the Jacobian is continuous
at x̄.
For the opposite direction, use Fact 1 in the beginning of Section 1A.
uniformly in p around ( p̄, x̄) ∈ int dom f when there are neighborhoods Q of p̄ and U of x̄ along with a constant κ such that

| f (p, x′ ) − f (p, x)| ≤ κ |x′ − x| for all x′ , x ∈ U and p ∈ Q.
Accordingly, the partial uniform Lipschitz modulus with respect to x has the form

lip x ( f ; ( p̄, x̄)) := lim sup_{x, x′ → x̄, p → p̄, x′ ≠ x}  | f (p, x′ ) − f (p, x)| / |x′ − x| .
lip x (e; ( p̄, x̄)) = 0 for e(p, x) = f (p, x) − [ f ( p̄, x̄) + Dx f ( p̄, x̄)(x − x̄)],

or in other words, if for every ε > 0 there are neighborhoods Q of p̄ and U of x̄ such that

| f (p, x′ ) − f (p, x) − Dx f ( p̄, x̄)(x′ − x)| ≤ ε |x′ − x| for all x′ , x ∈ U and p ∈ Q.
Exercise 1D.11 (joint differentiability criterion). Let f : IRd × IRn → IRm be strictly
differentiable with respect to x uniformly in p and be differentiable with respect to
p, both at ( p̄, x̄). Prove that f is differentiable at ( p̄, x̄).
Exercise 1D.12 (joint strict differentiability criterion). Prove that f : IRd ×IRn → IRm
is strictly differentiable at ( p̄, x̄) if and only if it is strictly differentiable with respect
to x uniformly in p and strictly differentiable with respect to p uniformly in x, both
at ( p̄, x̄).
We state next the implicit function counterpart of Theorem 1D.9.
Proof. We apply Theorem 1D.9 in a manner parallel to the way that the classical
implicit function theorem 1B.1 was derived from the classical inverse function the-
orem 1A.1.
Prove that the solution mapping associated with this equation has a strictly differen-
tiable single-valued localization around 0 for 0.
Guide. The function g(p, x) = p f (x) − ∫_0^x f (pt) dt satisfies (∂ g/∂ x)(0, 0) = − f (0), which is nonzero by assumption. For any ε > 0 there exist open intervals Q and U centered at 0 such that for every p ∈ Q and x, x′ ∈ U we have

|g(p, x) − g(p, x′ ) − (∂ g/∂ x)(0, 0)(x − x′ )|
= |p( f (x) − f (x′ )) − ∫_{x′}^{x} f (pt) dt + f (0)(x − x′ )|
= |p( f (x) − f (x′ )) − ( f (px̃) − f (0))(x − x′ )|
≤ |p( f (x) − f (x′ ) − f (0)(x − x′ ))| + |p f (0)(x − x′ )| + |( f (px̃) − f (0))(x − x′ )| ≤ ε |x − x′ |,

where the mean value theorem guarantees that ∫_{x′}^{x} f (pt) dt = (x − x′ ) f (px̃) for some x̃ between x′ and x. Hence, g is strictly differentiable with respect to x uniformly in p at (0, 0). Prove in a similar way that g is strictly differentiable with respect to p uniformly in x at (0, 0). Then apply 1D.12 and 1D.13.
continuous. The price to pay is that the single-valued localization of the inverse that
is obtained might not be differentiable, but at least it will have a Lipschitz property.
The way to do that is found through notions of how a function f may be “approx-
imated” by another function h around a point x̄. Classical theory focuses on f being
differentiable at x̄ and approximated there by the function h giving its “linearization”
at x̄, namely h(x) = f (x̄) + ∇ f (x̄)(x − x̄). Differentiability corresponds to having
f (x) = h(x) + o(|x − x̄|) around x̄, which is the same as clm ( f − h; x̄) = 0, whereas
strict differentiability corresponds to the stronger requirement that lip ( f − h; x̄) = 0.
The key idea is that conditions like this, and others in a similar vein, can be applied
to f and h even when h is not a linearization dependent on the existence of ∇ f (x̄).
Assumptions on the nonsingularity of ∇ f (x̄), corresponding in the classical setting
to the invertibility of the linearization, might then be replaced by assumptions on
the invertibility of some other approximation h.
which can also be written as f (x) = h(x) + o(|x − x̄|). It is a strict first-order approximation if the stronger condition holds that lip ( f − h; x̄) = 0,
and moreover,
Exercise 1E.2 (strict approximations through composition). Let the function f sat-
isfy lip ( f ; x̄) < ∞ and let the function g have a strict first-order approximation q at
ȳ, where ȳ := f (x̄). Then q◦ f is a strict first-order approximation of g◦ f at x̄.
Guide. Mimic the proof of 1E.1.
For our purposes here, and in later chapters as well, first-order approximations
offer an appealing substitute for differentiability, but an even looser notion of ap-
proximation will still lead to important conclusions.
which can also be written as | f (x) − h(x)| ≤ μ |x − x̄| + o(|x − x̄|). It is a strict estimator if the stronger condition holds that lip ( f − h; x̄) ≤ μ .
This result is a particular case of the implicit function theorem 1E.13 presented
later in this section, which in turn follows from a more general result proven in
Chapter 2. The reader who does not want to wait for a proof until the next chapter
is encouraged to do the following exercise which is supplied with a detailed guide.
Without loss of generality, suppose that x̄ = 0 and ȳ = 0 and take a small enough
that the mapping
y → h−1 (y) ∩ aIB for y ∈ aIB
is a localization of σ that is Lipschitz continuous with constant λ and also the dif-
ference e = f − h is Lipschitz continuous on aIB with constant ν . Next we choose α
satisfying
0 < α < (1/4) a (1 − λ ν ) min{1, λ }
and let b := α /(4λ ). Pick any y ∈ bIB and any x0 ∈ (α /4)IB; this gives us

|y − e(x0 )| ≤ |y| + |e(x0 ) − e(0)| ≤ |y| + ν |x0 | ≤ α /(4λ ) + ν α /4 ≤ α /(2λ ).
In particular |y − e(x0 )| < a, so the point y − e(x0 ) lies in the region where σ is
Lipschitz continuous. Let x1 = σ (y − e(x0 )); then
|x1 | = |σ (y − e(x0 ))| = |σ (y − e(x0 )) − σ (0)| ≤ λ |y − e(x0 )| ≤ α /2,
so in particular x1 belongs to the ball aIB. Furthermore,
We also have
hence
lim_{ j,k→∞, k> j} |xk − x j | = 0.
Then the sequence {xk } is Cauchy and hence convergent. Let x be its limit. Since
all xk and all y − e(xk ) are in aIB, where both e and σ are continuous, we can pass to
the limit in the equation xk+1 = σ (y − e(xk )) as k → ∞, getting x = σ (y − e(x)), that is, f (x) = y, with

|x| ≤ λ b/(1 − λ ν ).
Thus, it is established that for every y ∈ bIB there exists x ∈ f −1 (y) with |x| ≤
λ b/(1 − λ ν ). In other words, we have shown the nonempty-valuedness of the lo-
calization of f −1 given by
s : y → f −1 (y) ∩ (λ b/(1 − λ ν )) IB for y ∈ bIB.
Next, demonstrate that this localization s is in fact single-valued and Lipschitz continuous. If for some y ∈ bIB we have two points x ≠ x′ , both of them in s(y), then subtracting x = σ (y − e(x)) from x′ = σ (y − e(x′ )) gives

0 < |x′ − x| = |σ (y − e(x′ )) − σ (y − e(x))| ≤ λ |e(x′ ) − e(x)| ≤ λ ν |x′ − x| < |x′ − x|,

which is absurd; hence s is single-valued. The same estimate applied with x = s(y) and x′ = s(y′ ) for y, y′ ∈ bIB gives |x′ − x| ≤ λ |y′ − y| + λ ν |x′ − x|, and hence s is Lipschitz continuous relative to bIB with constant λ /(1 − λ ν ). This
expression is continuous and increasing as a function of λ and ν , which are greater
than κ and μ but can be chosen arbitrarily close to them, hence the Lipschitz mod-
ulus of s at ȳ satisfies the desired inequality.
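The iteration x_{k+1} = σ (y − e(x_k)) at the heart of this argument can be tried out on a one-dimensional sample of our own choosing: f (x) = x + 0.1 sin x with the estimator h(x) = x, so that σ (y) = y, λ = 1 and ν = 0.1, with λν < 1:

```python
import math

# Solve f(x) = y by iterating x <- sigma(y - e(x)), where h(x) = x is an
# estimator of f(x) = x + 0.1*sin(x), e = f - h, and sigma = h^{-1}.
def f(x):
    return x + 0.1 * math.sin(x)

def e(x):
    return 0.1 * math.sin(x)

def sigma(y):
    return y

y = 0.3
x = 0.0
for _ in range(60):
    x = sigma(y - e(x))   # contraction with factor lambda*nu = 0.1
print(x, f(x))            # f(x) equals y up to machine precision
```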
0 for x = 0,
Fig. 1.6 Graphs of the function f in Example 1E.5 when α = 2 on the left and α = 0.5 on the
right.
Corollary 1E.6 (estimators with strict differentiability). For f : IRn → IRn with
x̄ ∈ int dom f and f (x̄) = ȳ, suppose there is a strict estimator h : IRn → IRn of f at x̄
with constant μ which is strictly differentiable at x̄ with nonsingular Jacobian ∇h(x̄)
satisfying μ |∇h(x̄)−1 | < 1. Then f −1 has a Lipschitz continuous single-valued lo-
calization s around ȳ for x̄ with
lip (s; ȳ) ≤ |∇h(x̄)−1 | / (1 − μ |∇h(x̄)−1 |).
An even more special application, to the case where both f and h are linear,
yields a well-known estimate for matrices.
Exercise 1E.8 (equivalent estimation rules for matrices). Prove that the following
two statements are equivalent to Corollary 1E.7:
(a) For any n × n matrix C with |C| < 1, the matrix I + C is nonsingular and
(1)    |(I + C)−1 | ≤ 1/(1 − |C|).
(b) For any n × n matrices A and B with A nonsingular and |BA−1 | < 1, the matrix
A + B is nonsingular and
|(A + B)−1 | ≤ |A−1 | / (1 − |BA−1 |).
Guide. Clearly, (b) implies Corollary 1E.7 which in turn implies (a) with A = I and
B = C. Let (a) hold and let the matrices A and B satisfy |BA−1 | < 1. Then, by (a) with
C = BA−1 we obtain that I + BA−1 is nonsingular, and hence A + B is nonsingular,
too. Using the equality A + B = (I + BA−1)A in (1) we have
|(A + B)−1 | = |A−1 (I + BA−1 )−1 | ≤ |A−1 | / (1 − |BA−1 |).
It turns out that this inequality is actually equality, another classical result in matrix
perturbation theory.
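Estimate (1) is easy to spot-check numerically. A sketch of our own, computing the spectral norm of a 2×2 matrix by hand from the eigenvalues of CᵀC:

```python
import math

def norm2(M):
    """Spectral norm of a 2x2 matrix M = ((a, b), (c, d))."""
    (a, b), (c, d) = M
    p, q, r = a*a + c*c, a*b + c*d, b*b + d*d   # entries of M^T M
    lam_max = (p + r + math.sqrt((p - r)**2 + 4*q*q)) / 2
    return math.sqrt(lam_max)

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a*d - b*c
    return ((d/det, -b/det), (-c/det, a/det))

C = ((0.2, 0.1), (0.0, 0.3))            # a sample matrix with |C| < 1
I_plus_C = ((1.2, 0.1), (0.0, 1.3))
lhs = norm2(inv2(I_plus_C))
rhs = 1.0 / (1.0 - norm2(C))
print(lhs, rhs)                         # lhs <= rhs, confirming (1)
```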
Theorem 1E.9 (radius theorem for matrix nonsingularity). For any nonsingular ma-
trix A,
inf { |B| : A + B is singular } = |A−1 |−1 .
Proof. It is sufficient to prove the inequality opposite to (2). Choose ȳ ∈ IRn with
|ȳ| = 1 and |A−1 ȳ| = |A−1 |. For x̄ = A−1 ȳ we have |x̄| = |A−1 |. The matrix
B = − ȳ x̄T / |x̄|2
satisfies

|B| = max_{|x|=1} |ȳ x̄T x| / |x̄|2 ≤ max_{|x|=1} |x̄T x| / |x̄|2 = |x̄T x̄| / |x̄|3 = 1/|x̄| = |A−1 |−1 ,

and since (A + B)x̄ = ȳ − ȳ (x̄T x̄)/|x̄|2 = ȳ − ȳ = 0 with x̄ ≠ 0, the matrix A + B is singular.
where A = ∇ f (x̄) and the infimum is taken over all linear mappings B : IRn → IRn .
Guide. Combine 1E.9 and 1D.9.
We will extend the facts in 1E.9 and 1E.10 to the much more general context of
set-valued mappings in Chapter 6.
In the case of Theorem 1E.3 with μ = 0, an actual equivalence emerges between
the invertibility of f and that of h, as captured by the following statement. The key
is the fact that first-order approximation is a symmetric relationship between two
functions.
and moreover σ is then a first-order approximation to s at ȳ: s(y) = σ (y)+ o(|y− ȳ|).
Proof. Applying Theorem 1E.3 with μ = 0 and κ = lip (σ ; ȳ), we see that f −1 has a
single-valued localization s around ȳ for x̄ with lip (s; ȳ) ≤ lip (σ ; ȳ). In these circum-
Let κ > lip (s; ȳ). From (3) there exists b > 0 such that

(5)    max { |σ (y) − σ (y′ )|, |s(y) − s(y′ )| } ≤ κ |y′ − y| for y, y′ ∈ IBb (ȳ).
Because e(x̄) = 0 and lip (e; x̄) = 0, we know that for any ε > 0 there exists a positive a > 0 for which

(6)    |e(x)| ≤ ε |x − x̄| for all x ∈ IBa (x̄).

Choose β with

0 < β ≤ min { a/κ , b/(1 + εκ ) }.
Then, for every y ∈ IBβ (ȳ) from (5) we have
|σ (y) − x̄| ≤ κβ ≤ a,
and
|y + e(σ (y)) − ȳ| ≤ |y − ȳ| + ε |σ (y) − x̄| ≤ β + εκβ ≤ b.
Hence, utilizing (4), (5) and (6), we obtain
Since ε can be arbitrarily small, we arrive at the equality clm (s − σ ; ȳ) = 0, and this
completes the proof.
Finally, we observe that these results make it possible to deduce a slightly sharper
version of the equivalence in Theorem 1D.9.
Partial first-order estimators and approximations. For f : IRd × IRn → IRm and a
point ( p̄, x̄) ∈ int dom f , a function h : IRn → IRm is said to be an estimator of f with
respect to x uniformly in p at ( p̄, x̄) with constant μ if h(x̄) = f ( p̄, x̄) and clm x (e; ( p̄, x̄)) ≤ μ for e(p, x) = f (p, x) − h(x).
Exercise 1E.14 (approximation criteria). Consider f : IRd × IRn → IRm and h : IRd ×
IRn → IRm with f ( p̄, x̄) = h( p̄, x̄), and the difference e(p, x) = f (p, x)− h(p, x). Prove
that
(a) If clm x (e; ( p̄, x̄)) = 0 and clm p (e; ( p̄, x̄)) = 0, then h is a first-order approximation to f at ( p̄, x̄).
(b) If lip x (e; ( p̄, x̄)) = 0 and clm p (e; ( p̄, x̄)) = 0, then h is a first-order approximation to f at ( p̄, x̄).
(c) If lip x (e; ( p̄, x̄)) = 0 and lip p (e; ( p̄, x̄)) = 0, then h is a strict first-order approximation to f at ( p̄, x̄).
Exercise 1E.16 (the zero function as an approximation). Let f : IRd × IRn → IRm satisfy lip x ( f ; ( p̄, x̄)) < ∞, and let u : IRd × IRm → IRk have a strict first-order approximation v with respect to y at ( p̄, ȳ), where ȳ := f ( p̄, x̄). Show that the zero function is a strict first-order approximation with respect to x at ( p̄, x̄) to the function (p, x) → u(p, f (p, x)) − v( f (p, x)).
As another illustration of applicability of Theorem 1E.3 beyond first-order ap-
proximations, we sketch now a proof of the quadratic convergence of Newton’s
method for solving equations, a method we used in Proof I of the classical inverse
function theorem 1A.1.
Consider the equation g(x) = 0 for a continuously differentiable function g :
IRn → IRn with a solution x̄ at which the Jacobian ∇g(x̄) is nonsingular. Newton’s
method consists in choosing a starting point x0 possibly close to x̄ and generating a
sequence of points x1 , x2 , . . . according to the rule

(7)    g(xk ) + ∇g(xk )(xk+1 − xk ) = 0 for k = 0, 1, . . . .
According to the classical inverse function theorem 1A.1, g−1 has a smooth single-
valued localization around 0 for x̄. Consider the function f (x) = ∇g(x0 )(x − x̄) for
which f (x̄) = g(x̄) = 0 and f (x) − g(x) = −g(x) + ∇g(x0 )(x − x̄). An easy calcula-
tion shows that the Lipschitz modulus of e = f − g at x̄ can be made arbitrarily small
by making x0 close to x̄. However, while this modulus is generally nonzero, it must be kept below |∇g(x̄)−1 |−1 , the reciprocal of the Lipschitz modulus of the single-valued localization of g−1 around 0 for x̄, if one wants to choose x0 as an arbitrary starting point from an open neigh-
borhood of x̄. Here Theorem 1E.3 comes into play with h = g and ȳ = 0, saying
that f −1 has a Lipschitz continuous single-valued localization s around 0 for x̄ with
Lipschitz constant, say, μ . (In the simple case considered this also follows directly
from the fact that if ∇g(x̄) is nonsingular at x̄, then ∇g(x) is likewise nonsingular for
all x in a neighborhood of x̄, see Fact 2 in Section 1A.) Hence, the Lipschitz constant
μ and the neighborhood V of 0 where s is Lipschitz continuous can be determined
before the choice of x0 , which is to be selected so that ∇g(x0 )(x0 − x̄) − g(x0 ) is in
V . Noting for the iteration (7) that x1 = s(∇g(x0 )(x0 − x̄) − g(x0 )) and x̄ = s(g(x̄)),
and using the smoothness of g, we obtain

|x1 − x̄| ≤ c|x0 − x̄|2

for a suitable constant c. This kind of argument works for any k, and in that way,
through induction, we obtain quadratic convergence for Newton’s iteration (7).
In Chapter 6 we will present, in a broader framework of generalized equations in
possibly infinite-dimensional spaces, a detailed proof of this quadratic convergence
of Newton’s method and study its stability properties.
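To make the quadratic convergence estimate concrete, here is a small numerical sketch in Python; the scalar equation g(x) = x² − 2 = 0 and the starting point are illustrative choices, not taken from the text.

```python
# Newton's method x_{k+1} = x_k - g'(x_k)^{-1} g(x_k) on the scalar
# equation g(x) = x^2 - 2 = 0, whose root is sqrt(2); the derivative
# g'(x) = 2x is nonzero at the root, as the theory requires.
import math

def newton(g, dg, x0, steps):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - g(x) / dg(x))
    return xs

xs = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=2.0, steps=5)
errs = [abs(x - math.sqrt(2.0)) for x in xs]

# Quadratic convergence: |x_{k+1} - xbar| <= c |x_k - xbar|^2, i.e. the
# ratios below stay bounded (here roughly between 0.25 and 0.36).
for e_next, e in zip(errs[1:], errs[:-1]):
    if e > 1e-10:
        print(e_next / e ** 2)
```

Doubling of correct digits at every step is visible already after three or four iterations.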
Theorem 1F.1 (Brouwer invariance of domain theorem). Let O ⊂ IRn be open and
for m ≤ n let f : O → IRm be continuous and such that f −1 is single-valued on f (O).
Then f (O) is open, f −1 is continuous on f (O), and m = n.
This topological result reveals that, for continuous functions f , the dimension of
the domain space has to agree with the dimension of the range space, if there is to be
any hope of an inverse function theorem claiming the existence of a single-valued
localization of f −1 . Of course, in the theorems already viewed for differentiable
functions f , the dimensions were forced to agree because of a rank condition on
the Jacobian matrix, but we see now that this limitation has a deeper source than a
matrix condition.
Brouwer’s invariance of domain theorem helps us to obtain the following char-
acterization of the existence of a Lipschitz continuous single-valued localization of
the inverse:
Theorem 1F.3 (inverse function theorem for local diffeomorphism). Let f : IRn →
IRn be continuously differentiable in a neighborhood of a point x̄ and let the Jacobian
∇ f (x̄) be nonsingular. Then for some open neighborhood U of x̄ there exists an open
neighborhood V of ȳ := f (x̄) and a continuously differentiable function s : V → U
which is one-to-one from V onto U and which satisfies s(y) = f −1 (y) ∩ U for all
y ∈ V . Moreover, the Jacobian of s is given by

∇s(y) = ∇ f (s(y))−1 for every y ∈ V.
Proof. First, we utilize a simple observation from linear algebra: for a nonsingular
n × n matrix A, one has |Ax| ≥ |x|/|A−1| for every x ∈ IRn . Thus, let c > 0 be such
that |∇ f (x̄)u| ≥ 2c|u| for every u ∈ IRn and choose a > 0 to have, on the basis of (b)
in Fact 1 of Section 1A, that

| f (x′ ) − f (x) − ∇ f (x̄)(x′ − x)| ≤ c|x′ − x| for every x′ , x ∈ IBa (x̄).

Using the triangle inequality, for any x′ , x ∈ IBa (x̄) we then have

| f (x′ ) − f (x)| ≥ |∇ f (x̄)(x′ − x)| − c|x′ − x| ≥ 2c|x′ − x| − c|x′ − x| = c|x′ − x|.
Exercise 1F.4 (implicit function version). Let f : IRd × IRn → IRn be continuously
differentiable in a neighborhood of ( p̄, x̄) and such that f ( p̄, x̄) = 0, and let ∇x f ( p̄, x̄)
be nonsingular. Then for some open neighborhood U of x̄ there exists an open neigh-
borhood Q of p̄ and a continuously differentiable function s : Q → U such that

{ (p, s(p)) | p ∈ Q } = { (p, x) | f (p, x) = 0 } ∩ (Q × U);

that is, s is a single-valued localization of the solution mapping S(p) = { x | f (p, x) = 0 }
with associated open neighborhoods Q for p̄ and U for x̄. Moreover, the Jacobian of s is given by

∇s(p) = −∇x f (p, s(p))−1 ∇ p f (p, s(p)) for every p ∈ Q.
Guide. Apply 1F.3 in the same way as 1A.1 is used in the proof of 1B.1.
Selections. Given a set-valued mapping F : IRn →
→ IRm and a set D ⊂ dom F , a func-
tion w : IRn → IRm is said to be a selection of F on D if dom w ⊃ D and w(x) ∈ F(x)
for every x ∈ D.
right inverse of the mapping corresponds to AT (AAT )−1 , while when m ≥ n, the left
inverse7 corresponds to (AT A)−1 AT . For m = n they coincide and equal the inverse.
In general, of course, whenever a mapping f is one-to-one from a set C to f (C), any
left inverse to f on C is also a right inverse, and vice versa, and the restriction of
such an inverse to f (C) is uniquely determined.
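These formulas are easy to verify numerically. Below is a minimal Python sketch with an illustrative full-rank 2×3 matrix A (not from the text), using exact rational arithmetic so the identities come out exactly.

```python
# Verify that R = A^T (A A^T)^{-1} is a right inverse of a full-rank
# 2x3 matrix A, i.e. that A R equals the 2x2 identity.  Plain-Python
# linear algebra over exact rationals; the matrix A is illustrative.
from fractions import Fraction as F

A = [[F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):  # inverse of a nonsingular 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

AAt = matmul(A, transpose(A))        # the 2x2 matrix A A^T, nonsingular
R = matmul(transpose(A), inv2(AAt))  # 3x2 right inverse A^T (A A^T)^{-1}
print(matmul(A, R))                  # equals the 2x2 identity matrix
```

For m = n the same formula collapses to the ordinary inverse, as the text notes.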
The following result can be viewed as an extension of the classical inverse func-
tion theorem 1A.1 for selections.
Theorem 1F.6 (inverse selections when m ≤ n). Let f : IRn → IRm , where m ≤ n,
be k times continuously differentiable in a neighborhood of x̄ and suppose that its
Jacobian ∇ f (x̄) is full rank m. Then, for ȳ = f (x̄), there exists a local selection s of
f −1 around ȳ for x̄ which is k times continuously differentiable in a neighborhood
V of ȳ and whose Jacobian satisfies

(2) ∇s(ȳ) = AT (AAT )−1 , where A = ∇ f (x̄).
Proof. There are various ways to prove this; here we apply the classical inverse
function theorem. Let A = ∇ f (x̄). Since A has rank m, the m × m matrix AAT is nonsingular. Then
the function ϕ : IRm → IRm defined by ϕ (u) = f (AT u) is k times continuously differ-
entiable in a neighborhood of the point ū := (AAT )−1 Ax̄, its Jacobian ∇ϕ (ū) = AAT
is nonsingular, and ϕ (ū) = ȳ. Then, from Theorem 1A.1 supplemented by Proposi-
tion 1B.5, it follows that ϕ −1 has a single-valued localization σ at ȳ for ū which is k
times continuously differentiable near ȳ with Jacobian ∇σ (ȳ) = (AAT )−1 . But then
the function s(y) = AT σ (y) satisfies s(ȳ) = x̄ and f (s(y)) = y for all y near ȳ and
is k times continuously differentiable near ȳ with Jacobian satisfying (2). Thus, s(y)
is a solution of the equation f (x) = y for y close to ȳ and x close to x̄, but perhaps
not the only solution there, as it would be in the classical inverse function theorem.
Therefore, s is a local selection of f −1 around ȳ for x̄ with the desired properties.
When m = n the Jacobian becomes nonsingular and the right inverse of A in (2)
is just A−1 . The uniqueness of the localization can be obtained much as in Step 2 of
Proof I of the classical theorem 1A.1.
Exercise 1F.7 (parameterization of solution sets). Let M = { x | f (x) = 0 } for a
function f : IRn → IRm , where n − m = d > 0. Let x̄ ∈ M be a point around which f
is k times continuously differentiable, and suppose that the Jacobian ∇ f (x̄) has full
rank m. Then for some open neighborhood U of x̄ there is an open neighborhood
O of the origin in IRd and a k times continuously differentiable function s : O → U
which is one-to-one from O onto M ∩U , such that the Jacobian ∇s(0) has full rank
d and s(0) = x̄.
7 The left inverse and the right inverse are particular cases of the Moore-Penrose pseudo-inverse
A+ of a matrix A. For more on this, including the singular-value decomposition, see Golub and
Van Loan [1996].
and apply 1F.6 (with a modification parallel to Proposition 1B.5) to the equation
f¯(p, x) = (0, 0), obtaining for the solution mapping of this equation a localization
s with f¯(p, s(p)) = (0, 0), i.e., Bs(p) = p + Bx̄ and f (s(p)) = 0. Show that this
function s has the properties claimed.
Exercise 1F.9 (implicit selections). Consider a function f : IRd × IRn → IRm , where
m ≤ n, along with the associated solution mapping
S : p → { x ∈ IRn | f (p, x) = 0 } for p ∈ IRd .
Let f ( p̄, x̄) = 0, so that x̄ ∈ S( p̄). Assume that f is strictly differentiable at ( p̄, x̄)
and suppose further that the partial Jacobian ∇x f ( p̄, x̄) is of full rank m. Then the
mapping S has a local selection s around p̄ for x̄ which is strictly differentiable at p̄
with Jacobian
(3) ∇s( p̄) = −AT (AAT )−1 ∇ p f ( p̄, x̄), where A = ∇x f ( p̄, x̄).
The verification of this claim relies on the following fixed point theorem, which
we state here without proof.
Theorem 1G.2 (Brouwer fixed point theorem). Let Q be a compact and convex set
in IRn , and let Φ : IRn → IRn be a function which is continuous on Q and maps Q into
itself. Then there exists a point x ∈ Q such that Φ (x) = x.
Proof of Theorem 1G.1. Without loss of generality, we can suppose that x̄ = 0 and
f (x̄) = 0. Let A := ∇ f (0) and choose a neighborhood U of 0 ∈ IRn . Take c ≥ |A−1 |.
Choose any α ∈ (0, c−1 ). From the assumed differentiability of f , there exists a > 0
such that x ∈ aIB implies | f (x) − Ax| ≤ α |x|. By making a smaller if necessary, we
can arrange that f is continuous in aIB and aIB ⊂ U. Let b = a(1 − cα )/c and pick
any y ∈ bIB. Consider the function
Φy : x → x − A−1 ( f (x) − y) for x ∈ aIB.
This function is of course continuous on the compact and convex set aIB. Further-
more, for any x ∈ aIB we have
|Φy (x)| = |x − A−1( f (x) − y)| = |A−1 (Ax − f (x) + y)| ≤ |A−1 |(|Ax − f (x)| + |y|)
≤ c|Ax − f (x)| + c|y| ≤ cα |x| + cb ≤ cα a + ca(1 − cα )/c = a,
so Φy maps aIB into itself. Then, by Brouwer’s fixed point theorem 1G.2, there
exists a point x ∈ aIB such that Φy (x) = x. Note that, in contrast to the contraction
mapping principle 1A.2, this point may not be unique in aIB. But Φy (x) = x if and
only if f (x) = y. For each y ∈ bIB, y ≠ 0, we pick one x ∈ aIB such that x = Φy (x);
then x ∈ f −1 (y). For y = 0 we take x = 0, which is clearly in f −1 (0). Denoting this
x by s(y), we deduce the existence of a local selection s : bIB → aIB of f −1 around 0
for 0, also having the property that for any neighborhood U of 0 there exists b > 0
such that s(y) ∈ U for y ∈ bIB, that is, s is continuous at 0.
Let s be a local selection of f −1 around 0 for 0 that is continuous at 0. Choose c,
α and a as in the beginning of the proof. Then there exists b′ > 0 with the property
that s(y) ∈ f −1 (y) ∩ aIB for every y ∈ b′ IB. This can be written as

f (s(y)) = y with |s(y)| ≤ a for every y ∈ b′ IB,

which gives

|s(y)| ≤ c|As(y)| ≤ c(|As(y) − f (s(y))| + | f (s(y))|) ≤ cα |s(y)| + c|y|,

that is,

(2) |s(y)| ≤ (c/(1 − cα )) |y| for all y ∈ b′ IB.
In particular, s is calm at 0. But we have even more. Choose any ε > 0. The differ-
entiability of f with ∇ f (0) = A furnishes the existence of τ ∈ (0, a) such that
(3) | f (x) − Ax| ≤ ((1 − cα )ε /c2 ) |x| whenever |x| ≤ τ .
Let δ = min{b′ , τ (1 − cα )/c}. Then δ ≤ b′ , so that on δ IB we have our local selec-
tion s of f −1 satisfying (2) and consequently

|s(y)| ≤ (c/(1 − cα ))δ ≤ (c/(1 − cα )) · ((1 − cα )τ /c) = τ when |y| ≤ δ .

Taking norms in the identity

s(y) − A−1 y = −A−1 ( f (s(y)) − As(y)),

and using (2) and (3), we obtain

|s(y) − A−1 y| ≤ c | f (s(y)) − As(y)| ≤ c ((1 − cα )ε /c2 ) |s(y)| ≤ ((1 − cα )ε /c) · (c/(1 − cα )) |y| = ε |y|.
Having demonstrated that for any ε > 0 there exists δ > 0 such that |s(y) − A−1 y| ≤
ε |y| when |y| ≤ δ , we conclude that (1) holds, as claimed.
In order to gain more insight into what Theorem 1G.1 does or does not say,
think about the case where the assumptions of the theorem hold and f −1 has a
localization at ȳ for x̄ that avoids multi-valuedness. This localization must actually
be single-valued around ȳ, coinciding in some neighborhood with the local selection
s in the theorem. Then we have a result which appears to be fully analogous to the
classical inverse function theorem, but its shortcoming is the need to guarantee that
a localization of f −1 without multi-valuedness does exist. That, in effect, is what
strict differentiability of f at x̄, in contrast to just ordinary differentiability, is able
to provide. An illustration of how inverse multi-valuedness can indeed come up
when the differentiability is not strict has already been encountered in Example
1E.5 with α ∈ (0, 1). Observe that in this example there are infinitely many (even
uncountably many) local selections of the inverse f −1 and, as the theorem says, each
is continuous and even differentiable at 0, but also each selection is discontinuous
at infinitely many points near zero.
We can now partially extend Theorem 1G.1 to the case when m ≤ n.
Comparing 1G.1 with 1G.3, we see that the equality m = n gives us not only the
existence of a local selection which is differentiable at ȳ but also that every local
selection which is continuous at ȳ, whose existence is assured also for m < n, is
differentiable at ȳ with the same Jacobian. Of course, if we assume in addition that
f is strictly differentiable, we obtain strict differentiability of s at ȳ. To get this last
result, however, we do not have to resort to Brouwer’s fixed point theorem 1G.2.
Theorem 1G.1 is in fact a special case of a still broader result in which f does
not need to be differentiable.
(4) |e(x)| ≤ α |x| for all x ∈ aIB, where e(x) = f (x) − h(x).
Let

(5) 0 < b ≤ min{ a(1 − cα )/c, γ /2 }.
Then, for any y ∈ bIB and x ∈ aIB,

(6) |y − e(x)| ≤ α a + b ≤ γ .
Now consider the function

(7) Φy : x → σ (y − e(x)) for x ∈ aIB.

This function is of course continuous on aIB; moreover, from (6), the Lipschitz con-
tinuity of σ on γ IB with constant c, the fact that σ (0) = 0, and the choice of b in (5),
we obtain

|Φy (x)| = |σ (y − e(x)) − σ (0)| ≤ c|y − e(x)| ≤ c(|y| + α |x|) ≤ cb + cα a ≤ a,

so Φy maps aIB into itself.
Hence, by Brouwer’s fixed point theorem 1G.2, there exists x ∈ aIB with x = σ (y −
e(x)). Then h(x) = h(σ (y − e(x))) = y − e(x), that is, f (x) = y. For each y ∈ bIB,
y ≠ 0 we pick one such fixed point x of the function Φy in (7) in aIB and call it s(y);
for y = 0 we set s(0) = 0 ∈ f −1 (0). The function s is a local selection of f −1 around
0 for 0 which is, moreover, continuous at 0, since for an arbitrary neighborhood U
of 0 we found b > 0 such that s(y) ∈ U whenever |y| ≤ b. Also, for any y ∈ bIB we
have

(8) s(y) = σ (y − e(s(y))).
From the continuity of s at 0 there exists b′ ∈ (0, b) such that |s(y)| ≤ a for all y ∈
b′ IB. For y ∈ b′ IB, we see from (4), (8), the Lipschitz continuity of σ with constant c
and the equality σ (0) = 0 that

(9) |s(y)| ≤ (c/(1 − α c)) |y| for all y ∈ b′ IB.

Next, choose any ε > 0. Then, by the assumptions on e, there exists τ ∈ (0, a) such
that

(10) |e(x)| ≤ ((1 − α c)ε /c2 ) |x| whenever |x| ≤ τ .
Finally, taking b′ > 0 smaller if necessary and using (9) and (10), for any y with
|y| ≤ b′ we obtain

|s(y) − σ (y)| = |σ (y − e(s(y))) − σ (y)| ≤ c |e(s(y))| ≤ c ((1 − α c)ε /c2 ) |s(y)|
≤ ((1 − α c)ε /c) · (c/(1 − α c)) |y| = ε |y|.

Since for any ε > 0 we found b′ > 0 for which this holds when |y| ≤ b′ , the proof is
complete.
Proof of Theorem 1G.3. Apply Theorem 1G.4 with h(x) = f (x̄) + A(x − x̄) and
σ (y) = AT (AAT )−1 y.
We state next as an exercise an implicit function counterpart of 1G.3.
Consider a function f : IRd × IRn → IRm with m ≤ n and the associated solution
mapping S : p → { x ∈ IRn | f (p, x) = 0 }. Let f ( p̄, x̄) = 0, so that x̄ ∈ S( p̄), and suppose f is continuous around ( p̄, x̄) and
differentiable at ( p̄, x̄). Assume further that ∇x f ( p̄, x̄) has full rank m. Prove that the
mapping S has a local selection s around p̄ for x̄ which is differentiable at p̄ with
Jacobian
∇s( p̄) = −AT (AAT )−1 ∇ p f ( p̄, x̄), where A = ∇x f ( p̄, x̄).
56 1 Functions Defined Implicitly by Equations
Openness. A function f : IRn → IRm is said to be open at x̄ if x̄ ∈ int dom f and for
every neighborhood U of x̄ the set f (U) is a neighborhood of f (x̄).
Thus, f is open at x̄ if for every open neighborhood U of x̄ there is an open
neighborhood V of ȳ = f (x̄) such that f −1 (y) ∩U ≠ ∅ for every y ∈ V . In particular
this corresponds to the localization of f −1 relative to V and U being nonempty-
valued on V , but goes further than referring just to a single such localization at ȳ
for x̄. It actually requires the existence of a nonempty-valued graphical localization
for every neighborhood U of x̄, no matter how small. From 1G.3 we obtain the
following basic result about openness:
Corollary 1G.6 (Jacobian criterion for openness). For a function f : IRn → IRm ,
where m ≤ n, suppose that f is continuous around x̄ and differentiable at x̄ with
∇ f (x̄) being of full rank m. Then f is open at x̄.
There is much more to say about openness of functions and set-valued mappings,
and we will explore this in detail in Chapters 3 and 5.
Commentary
Although functions given implicitly by equations had been considered earlier by
Descartes, Newton, Leibniz, Lagrange, Bernoulli, Euler, and others, Cauchy [1831]
is credited by historians as the first to state and rigorously prove an implicit
function theorem, for analytic functions, by using his calculus of residuals
and limits. As we mentioned in the preamble to this chapter, Dini [1877/78] gave
the form of the implicit function theorem for continuously differentiable functions
which is now used in most calculus books; in his proof he relied on the mean value
theorem. More about early history of the implicit function theorem can be found
in historical notes of the paper of Hurwicz and Richter [2003] and in the book of
Krantz and Parks [2002].
Proof I of the classical inverse function theorem, 1A.1, goes back to Goursat
[1903]8. Apparently not aware of Dini’s theorem and inspired by Picard’s suc-
cessive approximation method for proving solution existence of differential equa-
tions, Goursat stated an implicit function theorem under assumptions weaker than
in Dini’s theorem, and supplied it with a new path-breaking proof. With updated
notation, Goursat’s proof employs the iterative scheme

(1) xk+1 = xk − A−1 f (p, xk ) for k = 0, 1, . . . , where A = ∇x f ( p̄, x̄).
This scheme would correspond to Newton’s method for solving f (p, x) = 0 with
respect to x if A were replaced by Ak giving the partial derivative at (p, xk ) instead
of ( p̄, x̄). But Goursat proved anyway that for each p near enough to p̄ the sequence
{xk } is convergent to a unique point x(p) close to x̄, and furthermore that the func-
tion p → x(p) is continuous at p̄. Behind the scene, as in Proof I of Theorem 1A.1,
is the contraction mapping idea. An updated form of Goursat’s implicit function
theorem is given in Theorem 1B.6. In the functional analysis text by Kantorovich
and Akilov [1964], Goursat’s iteration is called a “modified Newton’s method.”
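The contrast between Goursat's frozen-derivative scheme and full Newton iteration can be sketched numerically; the cubic equation and reference point below are illustrative assumptions, not from the text.

```python
# Goursat's "modified Newton" iteration x_{k+1} = x_k - A^{-1} f(x_k),
# with the derivative A frozen at a reference point, versus the full
# Newton step, on the illustrative equation f(x) = x^3 - 2 = 0.
f = lambda x: x ** 3 - 2.0
df = lambda x: 3.0 * x ** 2
root = 2.0 ** (1.0 / 3.0)

def iterate(step, x0, k):
    x = x0
    for _ in range(k):
        x = step(x)
    return x

A = df(1.5)                                   # derivative frozen near the root
modified = iterate(lambda x: x - f(x) / A, 1.5, 30)
newton = iterate(lambda x: x - f(x) / df(x), 1.5, 6)

# Both converge, but the frozen-derivative scheme is only linearly
# convergent and needs many more steps for full accuracy.
print(abs(modified - root), abs(newton - root))
```

With the frozen derivative the error shrinks by a roughly constant factor per step, while Newton's iteration reaches machine precision in a handful of steps.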
The rich potential in this proof was seen by Lamson [1920], who used the it-
erations in (1) to generalize Goursat’s theorem to what are now known as Banach
spaces9 . Especially interesting for our point of view in the present book is the fact
that Lamson was motivated by an optimization problem, namely the problem of La-
grange in the calculus of variations with equality constraints, for which he proved a
Lagrange multiplier rule by way of his implicit function theorem.
Lamson’s work was extended in a significant way by Hildebrand and Graves
[1927], who also investigated differentiability properties of the implicit function.
They first stated a contraction mapping result (their Theorem 1), in which the only
difference with the statement of Theorem 1A.2 is the presence of a superfluous
parameter. The contraction mapping principle, as formulated in 1A.3, was published
five years earlier in Banach [1922] (with some easily fixed typos), but the idea
behind the contraction mapping was evidently known to Goursat, Picard and
probably even earlier.
8 http://www.numdam.org/.
9 The name “Banach spaces” for normed linear spaces that are complete was coined by Fréchet,
according to Hildebrand and Graves [1927]; we deal with Banach spaces in Chapter 5.
Hildebrand and Graves [1927] cited in their paper Banach’s work
[1922], but only in the context of the definition of a Banach space. Further, based on
their parameterized formulation of 1A.2, they established an implicit function the-
orem in the classical form of 1B.1 (their Theorem 4) for functions acting in linear
metric spaces. More intriguing, however, for its surprising foresight, is their Theo-
rem 3, called by the authors a “neighborhood theorem,” where they do not assume
differentiability; they say “only an approximate differential . . . is required.” In this,
they are far ahead of their time. (We will see a similar picture with Graves’ theo-
rem later in Section 5D.) Because of the importance of this result of Hildebrand and
Graves, we provide a statement of it here in finite dimensions with some adjustments
in terminology and notation.
Solutions mappings in the classical setting of the implicit function theorem concern
problems in the form of parameterized equations. The concept can go far beyond
that, however. In any situation where some kind of problem in x depends on a pa-
rameter p, there is the mapping S that assigns to each p the corresponding set of
solutions x. The same questions then arise about the extent to which a localization
of S around a pair ( p̄, x̄) in its graph yields a function s which might be continuous
or differentiable, and so forth.
This chapter moves into that much wider territory in replacing equation-solving
problems by more complicated problems termed “generalized equations.” Such
problems arise variationally in constrained optimization, models of equilibrium, and
many other areas. An important feature, in contrast to ordinary equations, is that
functions obtained implicitly from their solution mappings typically lack differen-
tiability, but often exhibit Lipschitz continuity and sometimes combine that with the
existence of one-sided directional derivatives.
The first task is to explain “generalized equations” and their special case, some-
what confusingly termed “variational inequality” problems, which arises from the
variational geometry of sets expressing constraints. Problems of optimization and
the Lagrange multiplier conditions characterizing their solutions provide key exam-
ples. Convexity of sets and functions enters as a valuable ingredient.
From that background, the chapter proceeds to Robinson’s implicit function
theorem for parameterized variational inequalities and several of its extensions. Sub-
sequent sections introduce concepts of ample parameterization and semidifferentia-
bility, building toward major results in 2E for variational inequalities over convex
sets that are polyhedral. A follow-up in 2F looks at a type of “monotonicity” and
its consequences, after which, in 2G, a number of applications in optimization are
worked out.
A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View 61
from Variational Analysis, Springer Monographs in Mathematics,
DOI 10.1007/978-0-387-87821-8 2, c Springer Science+Business Media, LLC 2009
62 2 Implicit Function Theorems for Variational Problems
Normal cones. For a convex set C ⊂ IRn and a point x ∈ C, a vector v is said to be
normal to C at x if ⟨v, x′ − x⟩ ≤ 0 for all x′ ∈ C. The set of all such vectors v is called
the normal cone to C at x and is denoted by NC (x). For x ∉ C, NC (x) is taken to be
the empty set. The normal cone mapping is thus defined as

NC : x → NC (x) when x ∈ C, ∅ otherwise.
The term cone refers to a set of vectors which contains 0 and contains with any
of its elements v all positive multiples of v. For each x ∈ C, the normal cone NC (x)
is indeed a cone in this sense. Moreover it is closed and convex. The normal cone
mapping NC : x → NC (x) has dom NC = C. When C is closed, gph NC is a closed
subset of IRn × IRn .
Variational inequalities. For a function f : IRn → IRn and a closed convex set C ⊂
dom f , the generalized equation

(2) f (x) + NC (x) ∋ 0, or equivalently − f (x) ∈ NC (x),

is called a variational inequality; by the definition of the normal cone, it says that
x ∈ C and

(3) ⟨ f (x), x′ − x⟩ ≥ 0 for all x′ ∈ C.

Normal vectors can be characterized through the projection mapping PC onto C:

(4) v ∈ NC (x) ⇐⇒ PC (x + v) = x.
Interestingly, this projection rule for normals means for the mappings NC and PC
that

(5) PC−1 = I + NC .

A consequence of (5) is that the variational inequality (2) can actually be written as
an equation, namely

(6) x − PC (x − f (x)) = 0.
It should be kept in mind, though, that this doesn’t translate the solving of variatio-
nal inequalities into the classical framework of solving nonlinear equations. There,
“linearizations” are essential, but PC often fails to be differentiable, so linearizations
generally aren’t available for the equation in (6), regardless of the degree of differ-
entiability of f . Other approaches can sometimes be brought in, however, depending
on the nature of the set C. Anyway, the characterization in (6) has the advantage of
leading quickly to a criterion for the existence of a solution to a variational inequal-
ity in a basic case.
Guide. The projection rule (4) provides an easy way of identifying the normal vec-
tors in these examples.
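As a numerical sanity check of the projection rule (4), here is a Python sketch for the illustrative case of a box C = [0, 1]ⁿ, where both the projection and the normal cone can be written down explicitly.

```python
# For the box C = [0,1]^n, the projection clips each coordinate, and
# the normal cone at x consists of vectors v with v_i <= 0 where
# x_i = 0, v_i >= 0 where x_i = 1, and v_i = 0 otherwise.  The rule
# v in N_C(x) <=> P_C(x + v) = x ties the two descriptions together.

def proj_box(z):                       # P_C for C = [0,1]^n
    return [min(1.0, max(0.0, zi)) for zi in z]

def in_normal_cone(x, v):              # direct description of N_C(x)
    return all((vi <= 0 if xi == 0.0 else vi >= 0 if xi == 1.0 else vi == 0.0)
               for xi, vi in zip(x, v))

x = [0.0, 1.0, 0.5]
for v in ([-2.0, 3.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.1]):
    projected = proj_box([xi + vi for xi, vi in zip(x, v)])
    assert in_normal_cone(x, v) == (projected == x)
```

The assertions confirm that membership in the normal cone and the fixed-point property of the projection agree in each case.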
The formula in 2A.2(c) comes up far more frequently than might be anticipated.
A variational inequality (2) in which C = IRn+ is called a complementarity problem;
one has
− f (x) ∈ NIRn+ (x) ⇐⇒ x ≥ 0, f (x) ≥ 0, x ⊥ f (x).
Here the common notation is adopted that a vector inequality like x ≥ 0 is to be
taken componentwise, and that x ⊥ y means ⟨x, y⟩ = 0. Many variational inequali-
ties can be recast, after some manipulation, as complementarity problems, and the
numerical methodology for solving such problems has therefore received especially
much attention.
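A minimal Python check of the complementarity conditions themselves; the affine map f below is an illustrative instance, not taken from the text.

```python
# Complementarity problem: find x with x >= 0, f(x) >= 0, x _|_ f(x)
# (inequalities componentwise, <x, f(x)> = 0).  For the illustrative
# affine map f(x) = (x1 - 1, x2 + 1), the point x = (1, 0) solves it:
# f(1, 0) = (0, 1) >= 0 and <x, f(x)> = 0.

def f(x):
    return [x[0] - 1.0, x[1] + 1.0]

def solves_complementarity(x, tol=1e-12):
    y = f(x)
    nonneg = all(xi >= -tol for xi in x) and all(yi >= -tol for yi in y)
    perp = abs(sum(xi * yi for xi, yi in zip(x, y))) <= tol
    return nonneg and perp

assert solves_complementarity([1.0, 0.0])
assert not solves_complementarity([0.0, 0.0])   # fails: f1(0, 0) = -1 < 0
```

Note how the checker mirrors the three conditions on the right of the display: nonnegativity of x, nonnegativity of f(x), and orthogonality.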
The orthogonality relation in 2A.2(a) extends to a “polarity” relation for cones
which has a major presence in our subject.
Proposition 2A.3 (polar cone). Let K be a closed, convex cone in IRn and let K ∗ be
its polar, defined by
(7) K ∗ = { y | ⟨x, y⟩ ≤ 0 for all x ∈ K }.
Then K ∗ is likewise a closed, convex cone, and its polar (K ∗ )∗ is in turn K . Further-
more, the normal vectors to K and K ∗ are related by

(8) NK (x) = { y ∈ K ∗ | y ⊥ x } for x ∈ K, and NK ∗ (y) = { x ∈ K | x ⊥ y } for y ∈ K ∗ .
Proof. First consider any x ∈ K and y ∈ NK (x). From the definition of normality
we have ⟨y, x′ − x⟩ ≤ 0 for all x′ ∈ K, so the maximum of ⟨y, x′ ⟩ over x′ ∈ K is
attained at x. Because K contains all positive multiples of each of its vectors, this
comes down to having ⟨y, x⟩ = 0 and ⟨y, x′ ⟩ ≤ 0 for all x′ ∈ K. Therefore NK (x) =
{ y ∈ K ∗ | y ⊥ x }.
It’s elementary that K ∗ is a cone which is closed and convex, with (K ∗ )∗ ⊃ K.
Consider any z ∉ K. Let x = PK (z) and y = z − x. Then y ≠ 0 and PK (x + y) = x, hence
y ∈ NK (x), so that y ∈ K ∗ and y ⊥ x. We have ⟨y, z⟩ = ⟨y, y⟩ > 0, which confirms that
z ∉ (K ∗ )∗ . Therefore (K ∗ )∗ = K. The formula for normals to K must hold then
equally for K ∗ through symmetry: NK ∗ (y) = { x ∈ K | x ⊥ y } for any y ∈ K ∗ . This
establishes the normality relations that have been claimed.
[Figure: the tangent cone TC (x) and the normal cone NC (x) to a convex set C at a point x.]
Polarity has a basic role in relating the normal vectors to a convex set to its
“tangent vectors.”
Tangent cones. For a set C ⊂ IRn (not necessarily convex) and a point x ∈ C, a
vector v is said to be tangent to C at x if
(1/τ k )(xk − x) → v for some xk → x with xk ∈ C, and some τ k ↘ 0.

The set of all such vectors v is called the tangent cone to C at x and is denoted by
TC (x). For x ∉ C, TC (x) is taken to be the empty set.
Although we will mainly be occupied with normal cones to convex sets at
present, tangent cones to convex sets and even nonconvex sets will be put to se-
rious use later in the book.
Exercise 2A.4. The tangent cone TC (x) to a closed, convex set C at a point x ∈ C is
the closed, convex cone that is polar to the normal cone NC (x): one has

(9) TC (x) = NC (x)∗ and NC (x) = TC (x)∗ .
Guide. The second of the equations (9) comes immediately from the definition of
NC (x), and the first is then obtained from Proposition 2A.3.
Variational inequalities are instrumental in capturing conditions for optimality in
problems of minimization or maximization and even “equilibrium” conditions such
as arise in games and models of conflict. To explain this motivation, it will be helpful
to be able to appeal to the convexity of functions, at least in part.
It is strictly convex if this holds with strict inequality for x ≠ x′ . It is strongly convex
with constant μ when μ > 0 and, for every x, x′ ∈ C,

g((1 − τ )x + τ x′ ) ≤ (1 − τ )g(x) + τ g(x′ ) − μτ (1 − τ )|x − x′ |2 for all τ ∈ (0, 1).
The greatest lower bound of the objective function g on C, namely infx∈C g(x), is
the optimal value in the problem, which may or may not be attained, however, and
could even be infinite. If it is attained at a point x̄, then x̄ is said to furnish a global
minimum, or just a minimum, and to be a globally optimal solution; the set of such
points is denoted as argminx∈C g(x). A point x ∈ C is said to furnish a local minimum
of g relative to C and to be a locally optimal solution when, at least, g(x) ≤ g(x′ ) for
every x′ ∈ C belonging to some neighborhood of x. A global or local maximum of g
corresponds to a global or local minimum of −g.
In the context of variational inequalities, the gradient mapping ∇g : IRn → IRn
associated with a differentiable function g : IRn → IR will be a focus of attention.
Observe that
∇2 g(x) = ∇ f (x) when f (x) = ∇g(x).
the variational inequality

(10) − ∇g(x) ∈ NC (x)

is necessary for x to furnish a local minimum. It is both necessary and sufficient for
a global minimum if g is convex.
Proof. Along with x ∈ C, consider any other point x′ ∈ C and the function ϕ (t) =
g(x + tw) with w = x′ − x. From convexity we have x + tw ∈ C for all t ∈ [0, 1]. If
a local minimum of g occurs at x relative to C, then ϕ must have a local minimum
at 0 relative to [0, 1], and consequently ϕ ′ (0) ≥ 0. But ϕ ′ (0) = ⟨∇g(x), w⟩. Hence
⟨∇g(x), x′ − x⟩ ≥ 0. This being true for arbitrary x′ ∈ C, we conclude through the
characterization of (2) in (3) that −∇g(x) ∈ NC (x).
In the other direction, if g is convex and −∇g(x) ∈ NC (x), we have for every x′ ∈ C
that ⟨∇g(x), x′ − x⟩ ≥ 0, but also g(x′ ) − g(x) ≥ ⟨∇g(x), x′ − x⟩ by the convexity
criterion in 2A.5(a). Hence g(x′ ) − g(x) ≥ 0 for all x′ ∈ C, and we have a global
minimum at x.
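A toy instance of this optimality condition in Python, with the illustrative choices g(x) = (x − 2)² and C = [0, 1]; here the condition singles out the right endpoint, where the constrained minimum sits.

```python
# Minimizing the convex g(x) = (x - 2)^2 over C = [0, 1].  The
# condition -g'(x) in N_C(x) holds at x = 1: g'(1) = -2, and at the
# right endpoint of [0, 1] the normal cone is [0, inf), containing 2.

def dg(x):
    return 2.0 * (x - 2.0)

def in_normal_cone_interval(x, v, lo=0.0, hi=1.0):
    if x == lo:
        return v <= 0.0
    if x == hi:
        return v >= 0.0
    return v == 0.0          # interior points have N_C(x) = {0}

# -g'(x) in N_C(x) holds only at the constrained minimizer x = 1:
stationary = [x for x in (0.0, 0.5, 1.0) if in_normal_cone_interval(x, -dg(x))]
print(stationary)            # [1.0]
```

Since g is convex, the theorem guarantees that this stationary point is the global minimizer over C.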
To illustrate the condition in Theorem 2A.6, we may use it to reconfirm the pro-
jection rule for normal vectors in (4), which can be stated equivalently as saying
that PC (z) = x if and only if z − x ∈ NC (x). Consider any nonempty, closed, convex
set C ⊂ IRn and any point z ∈ IRn . Let g(x) = (1/2)|x − z|2 , which has ∇g(x) = x − z and
∇2 g(x) ≡ I, implying strict convexity. The projection x = PC (z) is the solution to the
problem of minimizing g over C. The variational inequality (10) characterizes it by
the relation −(x − z) ∈ NC (x), which is exactly what was targeted.
According to Theorem 2A.6, minimizing a differentiable convex function g over
a closed, convex set C is equivalent to solving a type of variational inequality (2)
in which f is the gradient mapping ∇g. When C = IRn , so that we are dealing with
unconstrained minimization, this is equivalent to solving f (x) = 0 for f = ∇g. The
notion of a variational inequality thus makes it possible to pass from unconstrained
minimization to constrained minimization. Whether the problem is constrained or
unconstrained, there is no guarantee that the minimum will be attained at a unique
point (although nonuniqueness is impossible when g is strictly convex, at least), but
still, local uniqueness dominates the picture conceptually. For that reason, it does
make sense to be thinking of the task as one of “solving a generalized equation.”
When g is not convex, solving the variational inequality (2) is no longer equiva-
lent to minimization over C, but nevertheless it has a strong association with identi-
fying a local minimum. Anyway, there’s no need really to insist on a minimum. Just
as the equation ∇g(x) = 0 describes, in general, a “stationary point” of g (uncon-
strained), the variational inequality (10) can be viewed as describing a constrained
version of a stationary point, which could be of interest in itself.
The minimization rule in Theorem 2A.6 can be employed to deduce a rule for
determining normal vectors to intersections of convex sets, as in the second part of
the following proposition.
For C = C1 ∩C2 , the formula NC (x) = NC1 (x) + NC2 (x)
holds for any x ∈ C such that there is no v ≠ 0 with v ∈ NC1 (x) and −v ∈ NC2 (x). This
condition is fulfilled in particular for every x ∈ C if C1 ∩ int C2 ≠ ∅ or C2 ∩ int C1 ≠ ∅.
Proof. To prove (a), we note that, by definition, a vector v = (v1 , v2 ) belongs to
NC (x) if and only if, for every x′ = (x′1 , x′2 ) in C1 × C2 we have 0 ≥ ⟨v, x′ − x⟩ =
⟨v1 , x′1 − x1 ⟩ + ⟨v2 , x′2 − x2 ⟩. That’s the same as having ⟨v1 , x′1 − x1 ⟩ ≤ 0 for all x′1 ∈ C1
and ⟨v2 , x′2 − x2 ⟩ ≤ 0 for all x′2 ∈ C2 , or in other words, v1 ∈ NC1 (x1 ) and v2 ∈ NC2 (x2 ).
In proving (b), it is elementary that if v = v1 + v2 with v1 ∈ NC1 (x) and v2 ∈
NC2 (x), then for every x′ in C1 ∩C2 we have both ⟨v1 , x′ − x⟩ ≤ 0 and ⟨v2 , x′ − x⟩ ≤ 0,
so that ⟨v, x′ − x⟩ ≤ 0. Thus, NC (x) ⊃ NC1 (x) + NC2 (x).
The opposite inclusion takes more work to establish. Fix any x ∈ C and v ∈ NC (x).
As we know from (4), this corresponds to x being the projection of x + v on C, which
we can elaborate as follows: (x, x) is the unique solution to the problem of minimizing
|x1 − (x + v)|2 + |x2 − (x + v)|2 over all (x1 , x2 ) ∈ C1 × C2 with x1 = x2 . For each
k = 1, 2, . . ., consider the relaxed problem

(11) minimize |x1 − (x + v)|2 + |x2 − (x + v)|2 + k|x1 − x2 |2 over (x1 , x2 ) ∈ C1 × C2 .
The expression being minimized here is nonnegative and, as seen from the case of
x1 = x2 = x, has minimum no greater than 2|v|2 . It suffices therefore in the minimiza-
tion to consider only points x1 and x2 such that |x1 − (x+ v)|2 + |x2 − (x+ v)|2 ≤ 2|v|2
and k|x1 − x2 |2 ≤ 2|v|2 . For each k, therefore, the minimum in (11) is attained by
some (xk1 , xk2 ), and these pairs form a bounded sequence such that xk1 − xk2 → 0.
Any accumulation point of this sequence must be of the form (x̃, x̃) and satisfy
|x̃ − (x + v)|2 + |x̃ − (x + v)|2 ≤ 2|v|2 , or in other words |x̃ − (x + v)| ≤ |v|. But by the
projection rule (4), x is the unique closest point of C to x + v, the distance being |v|,
so this inequality implies x̃ = x. Therefore, (xk1 , xk2 ) → (x, x).
We investigate next the necessary condition for optimality in problem
(11). Invoking the formula in (a) for the normal cone to C1 × C2 at (xk1 , xk2 ), we see
that it requires
−2[xk1 − (x + v) + k(xk1 − xk2 )] ∈ NC1 (xk1 ),
−2[xk2 − (x + v) − k(xk1 − xk2 )] ∈ NC2 (xk2 ),
or equivalently, for wk = k(xk2 − xk1 ),
Two cases have to be analyzed now separately. In the first case, we suppose that
the sequence of vectors wk is bounded and therefore has an accumulation point w.
Let vk1 = v + (x − xk1 ) + wk and vk2 = v + (x − xk2 ) − wk , so that, through (4), we have
PC1 (xk1 + vk1 ) = xk1 and PC2 (xk2 + vk2 ) = xk2 . Since xk1 → x and xk2 → x, the sequences of
vectors vk1 and vk2 have accumulation points v1 = v + w and v2 = v − w which satisfy
v1 + v2 = 2v. By the continuity of the projection mappings coming from 1D.5, we
get PC1 (x + v1 ) = x and PC2 (x + v2 ) = x. By (6), these relations mean v1 ∈ NC1 (x) and
v2 ∈ NC2 (x) and hence 2v ∈ NC1 (x) + NC2 (x). Since the sum of cones is a cone, we
get v ∈ NC1 (x) + NC2 (x). Thus NC (x) ⊂ NC1 (x) + NC2 (x), and since we have already
shown the opposite inclusion, we have equality.
In the second case, we suppose that the sequence of vectors wk is unbounded. By
passing to a subsequence if necessary, we can reduce this to having 0 < |wk | → ∞
with wk /|wk | converging to some v̄ ≠ 0. Let

v̄k1 = [v + (x − xk1 ) + wk ]/|wk | and v̄k2 = [v + (x − xk2 ) − wk ]/|wk |.
Then v̄k1 → v̄ and v̄k2 → −v̄. By (12) we have v̄k1 ∈ NC1 (xk1 ) and v̄k2 ∈ NC2 (xk2 ), or
equivalently through (4), the projection relations PC1 (xk1 + v̄k1 ) = xk1 and PC2 (xk2 +
v̄k2 ) = xk2 . In the limit we get PC1 (x + v̄) = x and PC2 (x − v̄) = x, so that v̄ ∈ NC1 (x)
and −v̄ ∈ NC2 (x). This contradicts our assumption in (b), and we see thereby that
the second case is impossible.
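The intersection rule in part (b) can likewise be sketched numerically. Below we take the half-planes C1 = {x : x1 ≤ 0} and C2 = {x : x2 ≤ 0} in IR2 (the constraint qualification holds, since C1 meets int C2), so C1 ∩ C2 is the third quadrant; at the corner x = (0, 0) the sum NC1 (x) + NC2 (x) is the first quadrant. The helper names are ours.

```python
# Normal cone to an intersection: N_{C1∩C2}(x) = N_{C1}(x) + N_{C2}(x)
# under the constraint qualification, checked via the projection rule.

def proj_C(z):
    # Projection onto C1 ∩ C2 = {(x1, x2) : x1 ≤ 0, x2 ≤ 0} (componentwise).
    return (min(z[0], 0.0), min(z[1], 0.0))

def in_normal_cone(v, x, proj, tol=1e-12):
    # v ∈ N_C(x) iff P_C(x + v) = x, by the projection rule (4).
    px = proj((x[0] + v[0], x[1] + v[1]))
    return abs(px[0] - x[0]) <= tol and abs(px[1] - x[1]) <= tol

x = (0.0, 0.0)            # corner point: both constraints active
# N_{C1}(x) = {(t, 0) : t ≥ 0}, N_{C2}(x) = {(0, t) : t ≥ 0}; sum = first quadrant.
v = (2.0, 5.0)            # = (2, 0) + (0, 5), a sum of normals from the two sets
print(in_normal_cone(v, x, proj_C))             # True: v ∈ N_{C1∩C2}(x)
print(in_normal_cone((-1.0, 1.0), x, proj_C))   # False: not such a sum
```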
70 2 Implicit Function Theorems for Variational Problems
We turn now to minimization over sets C that might not be convex and are speci-
fied by systems of constraints which have to be handled with Lagrange multipliers.
This will lead us to other valuable examples of variational inequalities, after some
elaborations.
Theorem 2A.8 (Lagrange multiplier rule). Let X ⊂ IRn and D ⊂ IRm be nonempty,
closed, convex sets, and consider the problem
(13) minimize g0 (x) over C = { x ∈ X | g(x) ∈ D },
for g(x) = (g1 (x), . . . , gm (x)), where the functions gi : IRn → IR, i = 0, 1, . . . , m are
continuously differentiable. Let x be a point of C at which the following constraint
qualification condition is fulfilled:

(14) there is no y ≠ 0 with y ∈ ND (g(x)) and − y∇g(x) ∈ NX (x).

If g0 has a local minimum over C at x, then the following first-order optimality
condition holds:

(15) there exists y ∈ ND (g(x)) such that − [∇g0 (x) + y∇g(x)] ∈ NX (x).
Proof. Assume that a local minimum occurs at x. Let X ′ and D′ be compact, convex
sets which coincide with X and D in neighborhoods of x and g(x), respectively, and
are small enough that g0 (x′ ) ≥ g0 (x) for all x′ ∈ X ′ having g(x′ ) ∈ D′ . Consider the
auxiliary problem

(16) minimize g0 (x′ ) + (1/2)|x′ − x|2 over all (x′ , u′ ) ∈ X ′ × D′ with g(x′ ) − u′ = 0.
Obviously the unique solution to this is (x′ , u′ ) = (x, g(x)). Next, for k → ∞, consider
the following sequence of problems, which replace the equation in (16) by a penalty
expression:

(17) minimize g0 (x′ ) + (1/2)|x′ − x|2 + (k/2)|g(x′ ) − u′ |2 over all (x′ , u′ ) ∈ X ′ × D′ .
For each k let (xk , uk ) give the minimum in this relaxed problem (the minimum being
attained because the functions are continuous and the sets X ′ and D′ are compact).
The minimum value in (17) can’t be greater than the minimum value in (16), as seen
by taking (x′ , u′ ) to be the unique solution (x, g(x)) to (16). It’s apparent then that
the only possible cluster point of the bounded sequence {(xk , uk )}∞ k=1 as k → ∞ is
(x, g(x)). Thus, (xk , uk ) → (x, g(x)).
Next we apply the optimality condition in Theorem 2A.6 to problem (17) at its
solution (xk , uk ). We have NX ′ ×D′ (xk , uk ) = NX ′ (xk ) × ND′ (uk ), and on the other hand
NX ′ (xk ) = NX (xk ) and ND′ (uk ) = ND (uk ) by the choice of X ′ and D′ , at least when
k is sufficiently large. The variational inequality condition in Theorem 2A.6 comes
down in this way to
(18) −[ ∇g0 (xk ) + (xk − x) + k(g(xk ) − uk )∇g(xk ) ] ∈ NX (xk ),
     k(g(xk ) − uk ) ∈ ND (uk ).
(Here we use the fact that any positive multiple of a vector in NX (xk ) or ND (uk ) is
another such vector.) Two cases now arise, according to whether the sequence of
vectors k(g(xk ) − uk ) is bounded (case (A)) or unbounded (case (B)). In case (A),
passing to the limit in (18) along a subsequence for which k(g(xk ) − uk ) converges
to some y yields the relation in (15). In case (B), let yk = k(g(xk ) − uk )/|k(g(xk ) − uk )|;
dividing the first relation in (18) by |k(g(xk ) − uk )| gives

(19) −[ εk (∇g0 (xk ) + (xk − x)) + yk ∇g(xk ) ] ∈ NX (xk ), yk ∈ ND (uk ),

where εk = 1/|k(g(xk ) − uk )| → 0. By passing to a subsequence, we can arrange that
the sequence of vectors yk converges to some y, necessarily with |y| = 1. In this limit,
(19) turns into the relation in (14), which has been forbidden to hold for any y ≠ 0.
Hence case (B) is impossible under our assumptions, and we are left with the
conclusion (15) obtained from case (A).
In the first-order optimality condition (15), y is said to be a Lagrange multiplier
vector associated with x. More can be said about this condition by connecting it with
the Lagrangian function for problem (13), which is defined by

(20) L(x, y) = g0 (x) + y1 g1 (x) + · · · + ym gm (x) for y = (y1 , . . . , ym ).
Then, in terms of the Lagrangian L in (20), the condition on x and y in (15) can be
written in the form

(21) −∇x L(x, y) ∈ NX (x) and ∇y L(x, y) ∈ NY (y),

which, through the product rule in 2A.7(a), is the same as the variational inequality

(22) − f (x, y) ∈ NX×Y (x, y) for f (x, y) = (∇x L(x, y), −∇y L(x, y)).
As an illustration, consider the standard nonlinear programming problem

(23) minimize g0 (x) over all x ∈ IRn satisfying gi (x) ≤ 0 for i ∈ [1, s]
     and gi (x) = 0 for i ∈ [s + 1, m].

This problem corresponds in (13) to taking X = IRn and having D be the closed,
convex cone in IRm consisting of all u = (u1 , . . . , um ) such that ui ≤ 0 for i ∈ [1, s] but
ui = 0 for i ∈ [s + 1, m]. The polar cone Y = D∗ is Y = IRs+ × IRm−s . The optimality
condition in (15) can equally well be placed then in the Lagrangian framework in
(21), corresponding to the variational inequality (22). The requirements it imposes
on x and y come out as
(24) y ∈ IRs+ × IRm−s ,  gi (x) ≤ 0 for i ∈ [1, s] with yi = 0,
     gi (x) = 0 for all other i ∈ [1, m],
     ∇g0 (x) + y1 ∇g1 (x) + · · · + ym ∇gm (x) = 0.
These are the Karush–Kuhn–Tucker conditions for the nonlinear programming prob-
lem (23). According to Theorem 2A.8, the existence of y satisfying these conditions
with x is necessary for the local optimality of x under the constraint qualification
(14), which insists on the nonexistence of y ≠ 0 satisfying the same conditions with
the term ∇g0 (x) suppressed. The existence of y satisfying (24) is sufficient for the
global optimality of x by Theorem 2A.9 as long as L(x, y) is convex as a function of
x ∈ IRn for each fixed y ∈ IRs+ × IRm−s , which is equivalent to having g0 , g1 , . . . , gs
convex and gs+1 , . . . , gm affine.
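The conditions (24) can be checked numerically on a small instance of (23): minimize g0 (x) = x1² + x2² subject to g1 (x) = 1 − x1 − x2 ≤ 0 (so s = m = 1), whose minimizer x = (1/2, 1/2) and multiplier y1 = 1 are known in closed form. The helper names below are ours.

```python
# Verifying the Karush–Kuhn–Tucker conditions (24) for
# minimize x1^2 + x2^2  subject to  1 - x1 - x2 <= 0.

def grad_g0(x): return (2 * x[0], 2 * x[1])
def g1(x):      return 1 - x[0] - x[1]
def grad_g1(x): return (-1.0, -1.0)

def kkt_holds(x, y1, tol=1e-9):
    # Sign condition, feasibility, complementary slackness, and stationarity.
    sign_ok = y1 >= 0 and g1(x) <= tol
    comp_ok = abs(y1 * g1(x)) <= tol            # yi = 0 where gi(x) < 0
    d0, d1 = grad_g0(x), grad_g1(x)
    stat_ok = all(abs(d0[i] + y1 * d1(x)[i] if False else d0[i] + y1 * d1[i]) <= tol
                  for i in range(2))            # ∇g0(x) + y1*∇g1(x) = 0
    return sign_ok and comp_ok and stat_ok

print(kkt_holds((0.5, 0.5), 1.0))   # True at the minimizer with its multiplier
print(kkt_holds((1.0, 1.0), 0.0))   # False: feasible but not stationary
```

Since g0 and g1 are convex here, by the sufficiency remark above these conditions certify global optimality of (1/2, 1/2).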
Exercise 2A.10 (variational inequality for a saddle point). Let X ⊂ IRn and Y ⊂ IRm
be any nonempty, closed, convex sets, and let L be a C 1 function on IRn × IRm such
that L(·, y) is a convex function on X for each y ∈ Y , and L(x, ·) is a concave function
on Y for each x ∈ X . The variational inequality (22) is equivalent to having (x, y) be
a saddle point of L with respect to X × Y in the sense that

x ∈ X , y ∈ Y , and L(x, y′ ) ≤ L(x, y) ≤ L(x′ , y) for all x′ ∈ X and y′ ∈ Y .
Guide. Rely on the equivalence between (21) and (22), plus Theorem 2A.6.
A saddle point as defined in Exercise 2A.10 represents an equilibrium in the two-
person zero-sum game in which Player 1 chooses x ∈ X , Player 2 chooses y ∈ Y , and
then Player 1 pays the amount L(x, y) (possibly negative) to Player 2. Other kinds
of equilibrium can likewise be captured by other variational inequalities.
For example, in an N-person game there are players 1, . . . , N, with Player k hav-
ing a nonempty strategy set Xk . Each Player k chooses some xk ∈ Xk , and is then
obliged to pay—to an abstract entity (not necessarily another player)—an amount
which depends not only on xk but also on the choices of all the other players; this
amount can conveniently be denoted by Lk (xk , x−k ), where x−k stands for the vector
of strategies x j chosen by the players j ≠ k.
(The game is zero-sum if ∑Nk=1 Lk (xk , x−k ) = 0.) A choice of strategies xk ∈ Xk for
k = 1, . . . , N is said to furnish a Nash equilibrium if

Lk (xk , x−k ) ≤ Lk (x′k , x−k ) for every x′k ∈ Xk , k = 1, . . . , N.
When each Xk is also closed and convex and each Lk (·, x−k ) is continuously
differentiable, a Nash equilibrium must satisfy the variational inequality − f (x) ∈
NX (x) for X = X1 × · · · × XN and f (x) = (∇x1 L1 (x1 , x−1 ), . . . , ∇xN LN (xN , x−N )).
This necessary condition is sufficient for a Nash equilibrium if, in addition, the
functions Lk (·, x−k ) on IRnk are convex.
Guide. Make use of the product rule for normals in 2A.7(a) and the optimality
condition in Theorem 2A.6.
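A Nash equilibrium can also be computed as a fixed point of best responses. The quadratic game below, with L1 (x1 , x2 ) = x1² − x1 x2 , L2 (x1 , x2 ) = x2² − x1 x2 and X1 = X2 = IR, is our own illustrative choice; each Lk (·, x−k ) is convex, so the first-order condition is both necessary and sufficient.

```python
# A Nash equilibrium as a fixed point of best responses for a quadratic game.

def br1(x2): return x2 / 2.0   # argmin over x1 of L1(x1, x2):  2*x1 - x2 = 0
def br2(x1): return x1 / 2.0   # argmin over x2 of L2(x1, x2):  2*x2 - x1 = 0

x1, x2 = 3.0, -7.0
for _ in range(100):           # best-response iteration (a contraction here)
    x1, x2 = br1(x2), br2(x1)

# The variational-inequality form of equilibrium on X = R x R:
# both partial gradients vanish at (x1, x2).
g1 = 2 * x1 - x2               # ∇_{x1} L1 at the limit point
g2 = 2 * x2 - x1               # ∇_{x2} L2 at the limit point
print(abs(x1) < 1e-9, abs(x2) < 1e-9, abs(g1) < 1e-9, abs(g2) < 1e-9)
```

The iteration halves the strategy magnitudes every two rounds, so it converges geometrically to the unique equilibrium (0, 0).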
Finally, we look at a kind of generalized equation (1) that is not a variational
inequality (2), but nonetheless has importance in many situations: the constraint
system

(25) g(x) ∈ D for g(x) = (g1 (x), . . . , gm (x)),

which is (1) for f (x) = −(g1 (x), . . . , gm (x)), F(x) ≡ D. Here D is a subset of IRm ; the
format has been chosen to be that of the constraints in problem (13), or as a special
case, problem (23).
Although (25) would reduce to an equation, pure and simple, if D consists of
a single point, the applications envisioned for it lie mainly in situations where in-
equality constraints are involved, and there is little prospect or interest in a solution
being locally unique. In the study of generalized equations with parameters, to be
taken up next in 2B, our attention will at first be concentrated on issues parallel to
those in Chapter 1. Only later, in Chapter 3, will generalized equations like (25) be
brought in.
The example in (25) also brings a reminder about a feature of generalized equa-
tions which dropped out of sight in the discussion of the variational inequality case.
In (2), necessarily f had to go from IRn to IRn , whereas in (25), and in (1), f may go
from IRn to a space IRm of different dimension.
(2) S : p ↦ { x | f (p, x) + F(x) ∋ 0 } for p ∈ IRd .
The questions we will concentrate on answering, for now, are nevertheless the same
as in Chapter 1. To what extent might S be single-valued and possess various prop-
erties of continuity or some type of differentiability?
In a landmark paper, S. M. Robinson studied the solution mapping S in the
case of a parameterized variational inequality, where m = n and F is a normal cone
mapping NC : IRn →→ IRn :

(3) S(p) = { x | f (p, x) + NC (x) ∋ 0 }.
His results were, from the very beginning, stated in abstract spaces, and we will
come to that in Chapter 5. Here, we confine the exposition to Euclidean spaces, but
the presentation is tailored in such a way that, for readers who are familiar with some
basic functional analysis, the expansion of the framework from Euclidean spaces to
general Banach spaces is straightforward. The original formulation of Robinson’s
theorem, up to some rewording to fit this setting, is as follows.
Theorem 2B.1 (Robinson implicit function theorem). For the solution mapping S
to a parameterized variational inequality (3), consider a pair ( p̄, x̄) with x̄ ∈ S( p̄).
Assume that:
(a) f (p, x) is differentiable with respect to x in a neighborhood of the point ( p̄, x̄),
and both f (p, x) and ∇x f (p, x) depend continuously on (p, x) in this neighborhood;
(b) the inverse G−1 of the set-valued mapping G : IRn →→ IRn defined by

(4) G(x) = f ( p̄, x̄) + ∇x f ( p̄, x̄)(x − x̄) + NC (x), with G(x̄) ∋ 0,

has a Lipschitz continuous single-valued localization σ around 0 for x̄ with
lip (σ ; 0) ≤ κ .

Then S has a Lipschitz continuous single-valued localization s around p̄ for x̄.
In the special case where the variational inequality treated by Robinson’s the-
orem reduces to the equation f (p, x) = 0 (namely with C = IRn , so NC ≡ 0), the
invertibility condition on the mapping G in assumption (b) of Robinson’s theorem
comes down to the nonsingularity of the Jacobian ∇x f ( p̄, x̄) in the Dini classical
implicit function theorem 1B.1. But because of the absence of an assertion about
the differentiability of s, Theorem 2B.1 falls short of yielding all the conclusions of
that theorem. It could, though, be used as an intermediate step in a proof of Theo-
rem 1B.1, which we leave to the reader as an exercise.
Exercise 2B.4. Supply a proof of the classical implicit function theorem 1B.1 based
on Robinson’s theorem 2B.1.
Guide. In the case C = IRn , so NC ≡ 0, use the Lipschitz continuity of the single-
valued localization s following from Corollary 2B.3 to show that s is continuously
differentiable around p̄ when f is continuously differentiable near ( p̄, x̄).
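A one-dimensional sketch makes these remarks concrete. Take f (p, x) = x − p with C = [0, ∞), so the variational inequality f (p, x) + NC (x) ∋ 0 says exactly x = PC (p). The solution mapping s(p) = max(p, 0) is single-valued and Lipschitz continuous with constant 1 but not differentiable at p̄ = 0, consistent with Theorem 2B.1 asserting Lipschitz continuity of the localization without differentiability. The helper names are ours.

```python
# The VI  x - p + N_[0,∞)(x) ∋ 0  and its solution mapping s(p) = max(p, 0).

def s(p):
    return max(p, 0.0)             # P_[0,∞)(p), the unique solution for each p

def solves_vi(p, x, tol=1e-12):
    # 0 ∈ x - p + N_[0,∞)(x): either x > 0 and x = p, or x = 0 and p <= 0.
    if x > tol:
        return abs(x - p) <= tol
    return abs(x) <= tol and p <= tol

samples = [-2.0, -0.3, 0.0, 0.7, 5.0]
# s(p) solves the VI everywhere, and |s(p') - s(p)| <= |p' - p| (Lipschitz, constant 1):
print(all(solves_vi(p, s(p)) for p in samples))
print(all(abs(s(a) - s(b)) <= abs(a - b) + 1e-15 for a in samples for b in samples))
```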
The invertibility property in assumption (b) of 2B.1 is what Robinson called
“strong regularity” of the generalized equation (3). A related term, “strong metric
regularity,” will be employed in Chapter 3 for set-valued mappings in reference to
the existence of Lipschitz continuous single-valued localizations of their inverses.
In the extended version of Theorem 2B.1 which we present next, the differen-
tiability assumptions on f are replaced by assumptions about an estimator h for
f ( p̄, ·), which could in particular be a first-order approximation in the x argument.
This mode of generalization was initiated in 1E.3 and 1E.13 for equations, but now
we use it for a generalized equation (1). In contrast to Theorem 2B.1, which was
concerned with the case of a variational inequality (3), the mapping F : IRn → → IRm
need not be of the form NC and the dimensions n and m could in principle be different.
Remarkably, no direct assumptions need be made about F, but certain properties of
F will implicitly underlie the “invertibility” condition imposed jointly on F and the
estimator h.
Lemma 2B.6. Consider a function ϕ : IRd × IRn → IRm and a point ( p̄, x̄) ∈ int dom ϕ
and let the scalars ν ≥ 0, b ≥ 0, a > 0, and the set Q ⊂ IRd be such that p̄ ∈ Q and

(7) |ϕ (p, x′ ) − ϕ (p, x)| ≤ ν |x′ − x| for all x′ , x ∈ IBa (x̄) and p ∈ Q,
    |ϕ (p, x̄) − ϕ ( p̄, x̄)| ≤ b for all p ∈ Q.
Consider also a set-valued mapping M : IRm →→ IRn with (ȳ, x̄) ∈ gph M where ȳ :=
ϕ ( p̄, x̄), such that for each y ∈ IBν a+b (ȳ) the set M(y) ∩ IBa (x̄) consists of exactly
one point, denoted by r(y), and suppose that the function

(8) r : y ↦ M(y) ∩ IBa (x̄) for y ∈ IBν a+b (ȳ)

is Lipschitz continuous on IBν a+b (ȳ) with a Lipschitz constant λ . In addition, sup-
pose that
(a) λ ν < 1;
(b) λ ν a + λ b ≤ a.
Then for each p ∈ Q the set {x ∈ IBa (x̄) | x ∈ M(ϕ (p, x))} consists of exactly one
point, and the associated function

(9) s : p ↦ the unique x ∈ IBa (x̄) with x ∈ M(ϕ (p, x)), for p ∈ Q,

satisfies

(10) |s(p′ ) − s(p)| ≤ (λ /(1 − λ ν )) |ϕ (p′ , s(p)) − ϕ (p, s(p))| for all p′ , p ∈ Q.
Proof. Fix p ∈ Q and consider the function Φ p : IRn → IRn defined by

Φ p (x) = r(ϕ (p, x)).

First, note that for x ∈ IBa (x̄) from (7) one has |ȳ − ϕ (p, x)| ≤ b + ν a, thus, by (8),
IBa (x̄) ⊂ dom Φ p . Next, if x ∈ IBa (x̄), we have from the identity x̄ = r(ϕ ( p̄, x̄)), the
Lipschitz continuity of r, and conditions (7) and (b) that

|Φ p (x̄) − x̄| = |r(ϕ (p, x̄)) − r(ϕ ( p̄, x̄))| ≤ λ |ϕ (p, x̄) − ϕ ( p̄, x̄)| ≤ λ b ≤ a(1 − λ ν ).

Furthermore, for any x′ , x ∈ IBa (x̄),

|Φ p (x′ ) − Φ p (x)| = |r(ϕ (p, x′ )) − r(ϕ (p, x))| ≤ λ |ϕ (p, x′ ) − ϕ (p, x)| ≤ λ ν |x′ − x|,

that is, Φ p is Lipschitz continuous on IBa (x̄) with constant λ ν < 1, from condition
(a). We are in position then to apply the contraction mapping principle 1A.2 and to
conclude from it that Φ p has a unique fixed point in IBa (x̄).
Denoting that fixed point by s(p), and doing this for every p ∈ Q, we get
a function s : Q → IBa (x̄). But having x = Φ p (x) is equivalent to having x =
r(ϕ (p, x)), the unique point of M(ϕ (p, x)) ∩ IBa (x̄). Hence s is the function in (9).
Moreover, since s(p) = r(ϕ (p, s(p))), we have from the Lipschitz continuity of r
and (7) that, for any p′ , p ∈ Q,

|s(p′ ) − s(p)| ≤ λ |ϕ (p′ , s(p′ )) − ϕ (p, s(p))| ≤ λ ν |s(p′ ) − s(p)| + λ |ϕ (p′ , s(p)) − ϕ (p, s(p))|,

and (10) follows on rearranging, since λ ν < 1.
(11) λ /(1 − λ ν ) ≤ (κ + ε )/(1 − κ μ ),

as is possible under the assumption that κ μ < 1. Let a, b and c be positive numbers
such that

|σ (y) − σ (y′ )| ≤ λ |y′ − y| for y, y′ ∈ IBν a+b (0),
|e(p, x′ ) − e(p, x)| ≤ ν |x′ − x| for x, x′ ∈ IBa (x̄) and p ∈ IBc ( p̄),
3 Lemma 2B.6 can be stated in a complete metric space X and then it will be equivalent to the
standard formulation of the contraction mapping principle in Theorem 1A.2. There is no point,
of course, in giving a fairly complicated equivalent formulation of a classical result unless, as in
our case, this formulation would bring some insights and dramatically simplify the proofs of later
results.
we obtain that the solution mapping S in (2) has a single-valued localization s around
p̄ for x̄. Due to (11), the inequality in (6) holds for Q = IBc ( p̄). That estimate implies
the continuity of s at p̄, in particular.
From Theorem 2B.5 we obtain a generalization of Theorem 1E.13, the result in
Chapter 1 about implicit functions without differentiability, in which the function f
is replaced now by the sum f + F for an arbitrary set-valued mapping F. The next
statement, 2B.7, covers most of this generalization; the final part of 1E.13 (giving
special consequences when μ = 0) will be addressed in the follow-up statement,
2B.8.
For the case of 2B.7 with μ = 0, in which h is a partial first-order approximation
of f with respect to x at ( p̄, x̄), much more can be said about the single-valued
localization s. The details are presented in the next result, which extends the part of
1E.13 for this case, and with it, Corollaries 2B.2 and 2B.3. We see that, by adding
some relatively mild assumptions about the function f (while still allowing F to
be arbitrary!), we can develop a first-order approximation of the localized solution
mapping s in Theorem 2B.5. This opens the way to obtain differentiability proper-
ties of s, for example.
(c) If, along with (a), f has a first-order approximation r with respect to p at
( p̄, x̄), then, for Q as in (6), the function η : Q → IRn defined by

(16) η (p) = σ (−r(p) + r( p̄))

is a first-order approximation of s at p̄.
Proof. Let the constants a and c be as in the proof of Theorem 2B.5; then Q =
IBc ( p̄). Let U = IBa (x̄). For p ∈ Q, from (13) we have
along with x̄ = s( p̄) = σ (0). Let κ equal lip (σ ; 0) and consider for any ε > 0 the
estimate in (6) with μ = 0. Let p ∈ Q with p ≠ p̄, take p′ = p and p = p̄ in (6), and
divide both sides of (6) by |p − p̄|. Taking the limsup as p → p̄ and ε → 0 gives us (14).
Under the assumptions of (b), in a similar way, by letting p′ , p ∈ Q with p′ ≠ p in (6),
dividing both sides of (6) by |p′ − p| and passing to the limit, we obtain (15).
Observe that (15) follows directly from 2B.7.
Consider now any λ > clm (s; p̄) and ε > 0. Make the neighborhoods Q and U
smaller if necessary so that for all p ∈ Q and x ∈ U we have |s(p)− s( p̄)| ≤ λ |p − p̄|
and
and furthermore so that the points −e(p, x) and −r(p) + f ( p̄, x̄) are contained in a
neighborhood of 0 on which the function σ is Lipschitz continuous with Lipschitz
constant κ + ε = lip (σ ; 0) + ε . Then, for p ∈ Q, we get by way of (18), along with
the first inequality in (19) and the fact that e( p̄, x̄) = 0, the estimate that
Since ε can be arbitrarily small and also s( p̄) = x̄ = σ (0) = η ( p̄), the function η
defined in (16) is a first-order approximation of s at p̄.
Moving on to part (d) of the theorem, suppose that the assumptions in (b) and (c) are
satisfied and also σ (y) = x̄ + Ay. Again, choose any ε > 0 and further adjust the
neighborhoods Q of p̄ and U of x̄ so that
Since ε can be arbitrarily small, we see that the first-order approximation of s fur-
nished by η is strict, and the proof is complete.
Note that the assumption in part (d), that the localization σ of G−1 = (h + F)−1
around 0 for x̄ is affine, can be interpreted as a sort of differentiability condition on
G−1 at 0 with A giving the derivative mapping.
has a Lipschitz continuous single-valued localization σ around 0 for x̄. Then not
only do the conclusions of Theorem 2B.5 hold for a solution localization s, but also
there is a first-order approximation η to s at p̄ given by
η (p) = σ (−∇ p f ( p̄, x̄)(p − p̄)).
The second part of Corollary 2B.9 shows how the implicit function theorem for
equations as stated in Theorem 1D.13 is covered as a special case of Theorem 2B.7.
In the case of the generalized equation (1) where f (p, x) = g(x)− p for a function
g : IRn → IRm (d = m), so that
(22) S(p) = { x | p ∈ g(x) + F(x) } = (g + F)−1 (p),
the inverse function version of Theorem 2B.8 has the following symmetric form.
Theorem 2B.10 (inverse version). In the framework of the solution mapping (22),
consider any pair ( p̄, x̄) with x̄ ∈ S( p̄). Let h be any strict first-order approximation
to g at x̄. Then (g + F)−1 has a Lipschitz continuous single-valued localization s
around p̄ for x̄ if and only if (h + F)−1 has such a localization σ around p̄ for x̄, in
which case σ is a first-order approximation of s at p̄.
r(y) ∈ M(y) ∩ IBa (x̄) for y ∈ IBν a+b (ȳ) and r(ȳ) = x̄.

In addition, suppose now that the Lipschitz constant λ for the function r is such that
the conditions (a) and (b) in the statement of Lemma 2B.6 are fulfilled. Then for
each p ∈ Q the set { x ∈ IBa (x̄) | x ∈ M(ϕ (p, x)) } contains a point s(p) such that the
function p ↦ s(p) satisfies s( p̄) = x̄ and

(24) |s(p′ ) − s(p)| ≤ (λ /(1 − λ ν )) |ϕ (p′ , s(p)) − ϕ (p, s(p))| for all p′ , p ∈ Q.

Thus, the mapping N : p ↦ { x ∈ IBa (x̄) | x ∈ M(ϕ (p, x)) } has a local selection
s around p̄ for x̄ which satisfies (24). The difference from Lemma 2B.6 is that r is
now required only to be a local selection of the mapping M with specified properties,
and then we obtain a local selection s of N at p̄ for x̄. For the rest use the proofs of
Theorems 2B.5 and 2B.8.
and technical assumptions about differentiability have been greatly relaxed. Much
of the rest of this chapter will be concerned with working out the consequences in
situations where additional structure is available. Here, however, we reflect on the
ways that parameters enter the picture and the issue of whether there are “enough”
parameters, which emerges as essential in drawing good conclusions about solution
mappings.
The differences in parameterization between an inverse function theorem and an
implicit function theorem are part of a larger pattern which deserves, at this stage, a
closer look. Let’s start by considering a generalized equation without parameters,
(2) f : IRd × IRn → IRm having f ( p̄, x) ≡ g(x) for a particular p̄ ∈ IRd .
which we proceed to study around p̄ and a point x̄ ∈ S( p̄) for the presence of a nice
localization σ . Different parameterizations yield different solution mappings, which
may possess different properties according to the assumptions placed on f .
That’s the general framework, but the special kind of parameterization that corre-
sponds to the “inverse function” case has a fundamental role which is worth trying to
understand more fully. In that case, we simply have f (p, x) = g(x) − p in (2), so that
in (1) we are solving g(x) + F(x) p and the solution mapping is S = (g + F)−1 .
Interestingly, this kind of parameterization comes up even in obtaining “implicit
function” results through the way that approximations are utilized. Recall that in
Theorem 2B.5, for a function h which is “close” to f ( p̄, ·) near x̄, the mapping
(h + F)−1 having x̄ ∈ (h + F)−1 (0) is required to have a Lipschitz continuous single-
valued localization around 0 for x̄. Only then are we able to deduce that the solution
mapping S in (3) has a localization of such type at p̄ for x̄. In other words, the de-
sired conclusion about S is obtained from an assumption about a simpler solution
mapping in the “inverse function” category.
When S itself already belongs to that category, because f (p, x) = g(x) − p and
S = (g + F)−1 , another feature of the situation emerges. Then, as seen in Theorem
2B.10, the assumption made about (h + F)−1 is not merely sufficient for obtain-
ing the desired localization of S but actually necessary. This distinction was already
observed in the classical setting. In the “symmetric” version of the inverse func-
tion theorem in 1C.3, the invertibility of a linearized mapping is both necessary and
sufficient for the conclusion, whereas such invertibility acts only as a sufficient con-
dition in the implicit function theorem 1B.2 (even though that theorem and the basic
version of the inverse function theorem 1A.1 are equivalent to each other).
The reason why the rank condition in (4) can be interpreted as ensuring the richness
of the parameterization is that it can always be achieved through supplementary
parameters. Any parameterization function f having the specified strict differentia-
bility can be extended to a parameterization function f˜ with

(5) f˜(q, x) = f (p, x) − y for q = (p, y) ∈ IRd × IRm ,

which does satisfy the ampleness condition, since trivially rank ∇q f˜(q̄, x̄) = m. The
generalized equation being solved then has solution mapping

(6) S : (p, y) ↦ { x | f (p, x) + F(x) ∋ y }.
Lemma 2C.1. Let f : IRd × IRn → IRm with ( p̄, x̄) ∈ int dom f afford an ample pa-
rameterization of the generalized equation (1) at x̄. Suppose that f has a strict first-
order approximation h : IRn → IRm with respect to x uniformly in p at ( p̄, x̄). Then
the mapping
(7) Ψ : (x, y) ↦ { p | e(p, x) + y = 0 } for (x, y) ∈ IRn × IRm ,

where e(p, x) = f (p, x) − h(x), has a local selection ψ around (x̄, 0) for p̄ which
satisfies

(8a) lip x (ψ ; (x̄, 0)) = 0

and

(8b) lip y (ψ ; (x̄, 0)) ≤ c for c = |AT (AAT )−1 | with A = ∇ p f ( p̄, x̄).
Proof. Let A = ∇ p f ( p̄, x̄); then AAT is invertible. Without loss of generality, sup-
pose x̄ = 0, p̄ = 0, and f (0, 0) = 0; then h(0) = 0. Let c = |AT (AAT )−1 |. Let
0 < ε < 1/(2c) and choose a positive a such that for all x, x′ ∈ aIB and p, p′ ∈ aIB
we have

(9) |e(p, x) − e(p, x′ )| ≤ ε |x − x′ |

and

(10) | f (p, x) − f (p′ , x) − A(p − p′ )| ≤ ε |p − p′ |.
For b = a(1 − 2cε )/c, fix x ∈ aIB and y ∈ bIB, and consider the mapping

p ↦ AT (AAT )−1 (Ap − e(p, x) − y).

Through (9) and (10), keeping in mind that e(0, 0) = 0, we see that this mapping
takes aIB into itself and is a contraction there with constant cε < 1/2.
The contraction mapping principle 1A.2 then applies, and we obtain from it the
existence of a unique p ∈ aIB such that

(11) p = AT (AAT )−1 (Ap − e(p, x) − y).
We denote by ψ (x, y) the unique solution in aIB of this equation for x ∈ aIB and
y ∈ bIB. Multiplying both sides of (11) by A and simplifying, we get e(p, x) + y = 0.
This means that for each (x, y) ∈ aIB × bIB the equation e(p, x) + y = 0 has ψ (x, y)
as a solution. From (11), we know that

(12) ψ (x, y) = AT (AAT )−1 (Aψ (x, y) − e(ψ (x, y), x) − y) for (x, y) ∈ aIB × bIB.

Hence, using (9) and (10),

|ψ (x, y) − ψ (x′ , y)| ≤ cε |ψ (x, y) − ψ (x′ , y)| + cε |x − x′ |,

so that

|ψ (x, y) − ψ (x′ , y)| ≤ (cε /(1 − cε )) |x − x′ |.
Since ε can be arbitrarily small, we conclude that (8a) holds. Analogously, from
(12) and using again (9) and (10) we obtain
|ψ (x, y) − ψ (x, y′ )|
≤ c| f (ψ (x, y), x) + y − Aψ (x, y) − f (ψ (x, y′ ), x) − y′ + Aψ (x, y′ )|
≤ c|y − y′ | + cε |ψ (x, y) − ψ (x, y′ )|,
and then

|ψ (x, y) − ψ (x, y′ )| ≤ (c/(1 − cε )) |y′ − y|,

which gives us (8b).
We are now ready to present the first main result of this section:
We now apply Lemma 2B.6 with ϕ (p, x) = ψ (x, p) for the parameter p = y and
with M(p) = S(p), thereby obtaining that the mapping

IBc (0) ∋ y ↦ { x ∈ IBa (x̄) | x ∈ S(ψ (x, y)) }
Theorem 2C.3 (parametric robustness). Consider the generalized equation (1) un-
der the assumption that x̄ is a point where g is strictly differentiable. Let h(x) =
g(x̄) + ∇g(x̄)(x − x̄). Then the following statements are equivalent.
(a) (h + F)−1 has a Lipschitz continuous single-valued localization around 0
for x̄.
(b) For every parameterization (2) in which f is strictly differentiable at (x̄, p̄),
the mapping S in (3) has a Lipschitz continuous single-valued localization around p̄
for x̄.
Proof. The implication from (a) to (b) already follows from Theorem 2B.8. The
focus is on the reverse implication. This is valid because, among the parameteriza-
tions covered by (b), there will be some that are ample. For instance, one could pass
from a given one to an ample parameterization in the mode of (5). For the solution
mapping for such a parameterization, we have the implication from (a) to (b) in
Theorem 2C.2. That specializes to what we want.
A solution x̄ to the generalized equation (1) is said to be parametrically robust
when the far-reaching property in (b) of Theorem 2C.3 holds. In that terminology,
Theorem 2C.3 gives a criterion for parametric robustness.
4 Also called Bouligand differentiable or B-differentiable under the additional assumption of
Lipschitz continuity; see the book Facchinei and Pang [2003].
(3) | f ′ (x̄; u) − f ′ (x̄; v)| = lim t↘0 (1/t) | f (x̄ + tu) − f (x̄ + tv)| ≤ λ |u − v|.
≤ 2λ |uk /tk − ū| + |(1/tk )( f (x̄ + tk ū) − f (x̄)) − f ′ (x̄; ū)|,

where in the final inequality we invoke (3). Since uk is arbitrarily chosen, we con-
clude by passing to the limit as k → ∞ that for h(x) = f (x̄) + f ′ (x̄; x − x̄) we do have
clm ( f − h; x̄) = 0.
When the semiderivative D f (x̄) : IRn → IRm is linear, semidifferentiability turns
into differentiability, and strict semidifferentiability turns into strict differentiability.
The connections known between D f (x̄) and the calmness modulus and Lipschitz
modulus of f at x̄ under differentiability can be extended to semidifferentiability by
adopting the definition that

|D f (x̄)| = max |w|≤1 |D f (x̄)(w)|.

We then have clm (D f (x̄); 0) = |D f (x̄)| and consequently clm ( f ; x̄) = |D f (x̄)|,
which in the case of strict semidifferentiability becomes lip ( f ; x̄) = |D f (x̄)|. Thus
in particular, semidifferentiability of f at x̄ implies that clm ( f ; x̄) < ∞, while strict
semidifferentiability at x̄ implies that lip ( f ; x̄) < ∞.
from that the equivalence with having a first-order approximation as in the definition
of semidifferentiability.
Examples.
1) The function f (x) = e^{|x|} for x ∈ IR is not differentiable at 0, but it is semidif-
ferentiable there and its semiderivative is given by D f (0) : w ↦ |w|. This is actually
a case of strict semidifferentiability. Away from 0, f is of course continuously dif-
ferentiable (hence strictly differentiable).
2) The function f (x1 , x2 ) = min{x1 , x2 } on IR2 is continuously differentiable at
every point away from the line where x1 = x2 . On that line, f is strictly semidiffer-
entiable with
D f (x1 , x2 )(w1 , w2 ) = min{w1 , w2 }.
3) A function of the form f (x) = max{ f1 (x), f2 (x)}, with f1 and f2 continuously
differentiable from IRn to IR, is strictly differentiable at all points x where f1 (x) ≠
f2 (x) and semidifferentiable where f1 (x) = f2 (x), the semiderivative being given
there by
D f (x)(w) = max{D f1 (x)(w), D f2 (x)(w)}.
However, f might not be strictly semidifferentiable at such points; see Example
2D.5 below.
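The semiderivative formula in example 3) can be checked against one-sided difference quotients. We take f1 (x) = x and f2 (x) = −x, so f = |·| and f1 (0) = f2 (0); the prediction is D f (0)(w) = max{w, −w}. The helper names below are ours.

```python
# Difference-quotient check of D f(0)(w) = max{w, -w} for f(x) = max{x, -x} = |x|.

def f(x):
    return max(x, -x)

def semideriv(xbar, w, t=1e-8):
    # One-sided quotient (f(x̄ + t w) - f(x̄)) / t with small t > 0.
    return (f(xbar + t * w) - f(xbar)) / t

for w in (-3.0, -1.0, 0.5, 2.0):
    predicted = max(w, -w)
    print(abs(semideriv(0.0, w) - predicted) < 1e-6)   # True for each direction
```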
The semiderivative obeys standard calculus rules, such as semidifferentiation of
a sum, product and ratio, and, most importantly, the chain rule. We pose the verifi-
cation of these rules as exercises.
Guide. Apply Proposition 1E.1 and observe that a composition of positively homo-
geneous functions is positively homogeneous.
Example 2D.5. The functions f and g in Exercise 2D.4 cannot exchange places: the
composition of a strictly semidifferentiable function with a strictly differentiable
function is not always strictly semidifferentiable. For a counterexample, consider
the function f : IR2 → IR given by
As ε goes to 0 this ratio tends to 1, and therefore lip ( f − D f (0, 0); (0, 0)) ≥ 1.
Our aim now is to forge out of Theorem 2B.8 a result featuring semideriva-
tives. For this purpose, we note that if f (p, x) is (strictly) semidifferentiable at ( p̄, x̄)
jointly in its two arguments, it is also “partially (strictly) semidifferentiable” in these
arguments separately. In denoting the semiderivative of f ( p̄, ·) at x̄ by Dx f ( p̄, x̄) and
the semiderivative of f (·, x̄) at p̄ by D p f ( p̄, x̄), we have
Dx f ( p̄, x̄)(w) = D f ( p̄, x̄)(0, w), D p f ( p̄, x̄)(q) = D f ( p̄, x̄)(q, 0).
In contrast to the situation for differentiability, however, D f ( p̄, x̄)(q, w) isn’t neces-
sarily the sum of these two partial semiderivatives.
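The remark about the two partial semiderivatives can be illustrated with f (p, x) = min{p, x} at ( p̄, x̄) = (0, 0), which by example 2) is semidifferentiable there with D f (0, 0)(q, w) = min{q, w}. The helper names below are ours.

```python
# The joint semiderivative of f(p, x) = min{p, x} at (0, 0) versus its partials.

def Df(q, w):  return min(q, w)        # joint semiderivative D f(0,0)(q, w)
def Dp(q):     return min(q, 0.0)      # partial semiderivative D_p f(0,0)(q) = Df(q, 0)
def Dx(w):     return min(0.0, w)      # partial semiderivative D_x f(0,0)(w) = Df(0, w)

q, w = 1.0, 1.0
print(Df(q, w))          # 1.0
print(Dp(q) + Dx(w))     # 0.0 — the sum of partials differs from the joint value
# The partials are nonetheless the restrictions of the joint semiderivative:
print(Df(q, 0.0) == Dp(q) and Df(0.0, w) == Dx(w))   # True
```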
associated with a choice of F : IRn → → IRm and f : IRd × IRn → IRm such that f is
strictly semidifferentiable at ( p̄, x̄). Suppose that the inverse G−1 of the mapping
Proof. First, note that s( p̄) = σ (0) = x̄ and the function r in Theorem 2B.8 may be
chosen as r(p) = f ( p̄, x̄) + D p f ( p̄, x̄)(p − p̄). Then we have

|s(p) − s( p̄) − (Dσ (0) ◦ (−D p f ( p̄, x̄)))(p − p̄)| ≤ |s(p) − σ (−r(p) + r( p̄))|
+ |σ (−D p f ( p̄, x̄)(p − p̄)) − σ (0) − Dσ (0)(−D p f ( p̄, x̄)(p − p̄))|.
A sequence with the property that every subsequence contains a conver-
gent subsubsequence necessarily converges as a whole. Thus the sequence {Δ k }∞ k=1
converges with its limit equaling ∇ fi (x̄)·w for at least one i ∈ I(x̄). This confirms
semidifferentiability and establishes that the semiderivative is a selection from
{ ∇ fi (x̄)·w | i ∈ I(x̄) }.
The functions in the examples given after 2D.2 are not only semidifferentiable
but also piecewise smooth. Of course, a semidifferentiable function does not have to
be piecewise smooth, e.g., when it is a selection of infinitely many, but not finitely
many, smooth functions.
A more elaborate example of a piecewise smooth function is the projection map-
ping PC on a nonempty, convex and closed set C ⊂ IRn specified by finitely many
inequalities.
Specifically, for C = { x ∈ IRn | gi (x) ≤ 0 for i = 1, . . . , m } with convex functions gi
of class C 2 on IRn , let x̄ be a point of C at which the gradients ∇gi (x̄) associated
with the active constraints, i.e., the ones with gi (x̄) = 0, are linearly independent.
Then there is an open neighborhood O of x̄ such that the projection mapping PC is
piecewise smooth on O.
Guide. Since in a sufficiently small neighborhood of x̄ the inactive constraints re-
main inactive, one can assume without loss of generality that gi (x̄) = 0 for all
i = 1, . . . , m. Recall that because C is nonempty, closed and convex, PC is a Lipschitz
continuous function from IRn onto C (see 1D.5). For each u around x̄ the projection
PC (u) is the unique solution to the problem of minimizing ½|x − u|² in x subject to
gi (x) ≤ 0 for i = 1, . . . , m. The associated Lagrangian variational inequality (Theo-
rem 2A.9) tells us that when u belongs to a small enough neighborhood of x̄, the
point x solves the problem if and only if x is feasible and there is a subset J of the
index set {1, 2, . . ., m} and Lagrange multipliers yi ≥ 0, i ∈ J, such that
(6)    x + ∑i∈J yi ∇gi (x)T = u,    gi (x) = 0 for i ∈ J.
The linear independence of the active constraint gradients yields that the Lagrange
multiplier vector y is unique, hence it is zero for u = x = x̄. For
each fixed subset J of the index set {1, 2, . . . , m} the Jacobian of the function on the
left of (6) at (x̄, 0) is

    Q = [ In + ∑i∈J yi ∇2 gi (x̄)    ∇gJ (x̄)T
          ∇gJ (x̄)                    0       ],
2 Implicit Function Theorems for Variational Problems 95
where ∇gJ (x̄) = ( ∂ gi /∂ x j (x̄) ) i∈J, j∈{1,...,n} and In is the n × n identity matrix.
Since ∇gJ (x̄) has full rank, the matrix Q is nonsingular and then we can apply the
classical inverse function theorem (Theorem 1A.1) to the equation (6), obtaining
that its solution mapping u → (xJ (u), yJ (u)) has a smooth single-valued localization
around u = x̄ for (x, y) = (x̄, 0). There are finitely many subsets J of {1, . . . , m},
and for each u close to x̄ we have PC (u) = xJ (u) for some J. Thus, the projection
mapping PC is a selection of finitely many smooth functions.
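The active-set enumeration in the proof can be sketched numerically. The following is our illustration, not the book's (Python with numpy assumed), taking the constraint functions gi linear so that system (6) becomes a linear system for each candidate index set J:

```python
import numpy as np
from itertools import combinations

def project(u, A, c):
    """Projection of u onto C = {x : <a_i, x> <= c_i} by enumerating
    candidate active sets J and solving system (6) for each one."""
    m, n = A.shape
    best = None
    for k in range(m + 1):
        for J in map(list, combinations(range(m), k)):
            AJ = A[J]
            # System (6) with linear g_i:  x + AJ^T y = u,  AJ x = c_J.
            K = np.block([[np.eye(n), AJ.T],
                          [AJ, np.zeros((k, k))]])
            try:
                sol = np.linalg.solve(K, np.concatenate([u, c[J]]))
            except np.linalg.LinAlgError:
                continue
            x, y = sol[:n], sol[n:]
            # Keep x if it is feasible with nonnegative multipliers; every
            # accepted J yields the same x, the unique projection P_C(u).
            if np.all(A @ x <= c + 1e-9) and np.all(y >= -1e-9):
                best = x
    return best

A = np.array([[1.0, 0.0], [0.0, 1.0]])   # C = {x : x1 <= 1, x2 <= 1}
c = np.array([1.0, 1.0])
p1 = project(np.array([3.0, 0.5]), A, c)
p2 = project(np.array([3.0, 2.0]), A, c)
```

Each accepted index set J corresponds to one smooth piece xJ (u); the enumeration makes the "selection of finitely many smooth functions" structure explicit.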
Exercise 2D.10. For a set of the form C = { x ∈ IRn | Ax = b } with b ∈ IRm , if the
m × n matrix A has linearly independent rows, then the projection mapping is given by
Guide. The optimality condition (6) in this case leads to the system of equations
    [ I   AT ] [ PC (x) ]   [ x ]
    [ A   0  ] [   λ    ] = [ b ].
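As a numerical check (our sketch, assuming numpy; not part of the text), one can solve the block system from the guide directly and compare it with the closed form PC (x) = x − AT (AAT )−1 (Ax − b) obtained by eliminating λ:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 4
A = rng.standard_normal((m, n))   # rows independent with probability 1
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Block system from the guide:  [I A^T; A 0] (P_C(x), lambda) = (x, b).
K = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
p = np.linalg.solve(K, np.concatenate([x, b]))[:n]

# Eliminating lambda gives the closed form for projection onto {x : Ax = b}.
p_closed = x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)
```

Both routes produce the same point, which moreover satisfies Ap = b, confirming feasibility.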
where f : IRd × IRn → IRn and C is a nonempty, closed and convex subset of IRn . The
corresponding solution mapping S : IRd →→ IRn , with

(2)    S(p) = { x | f (p, x) + NC (x) ∋ 0 },
has already been the direct subject of Theorem 2B.1, the implicit function theorem
of Robinson. From there we moved on to broader results about solution mappings to
generalized equations, but now wish to summarize what those results mean back in
the variational inequality setting, and furthermore to explore special features which
emerge under additional assumptions on the set C.
(3)    G(x) = f ( p̄, x̄) + ∇x f ( p̄, x̄)(x − x̄) + NC (x), with G(x̄) ∋ 0,
Polyhedral convex sets. A set C in IRn is said to be polyhedral convex when it can
be expressed as the intersection of finitely many closed half-spaces and/or hyper-
planes.
Any such set must obviously be closed. The empty set ∅ and the whole space IRn are
regarded as polyhedral convex sets, in particular.
Polyhedral convex cones are characterized by having a representation (5) in
which αi = 0 for all i. A basic fact about polyhedral convex cones is that they can
equally well be represented in another way, which we recall next.
It is easy to see that the cone K ∗ that is polar to a cone K having a representation
of the kind in (6) consists of the vectors x satisfying ⟨bi , x⟩ ≤ 0 for i = 1, . . . , m. The
polar of a polyhedral convex cone having such an inequality representation must
therefore have the representation in (6), inasmuch as (K ∗ )∗ = K for any closed,
convex cone K. This fact leads to a special description of the tangent and normal
cones to a polyhedral convex set.
and
Proof. The formula (7) for TC (x) follows from (5) just by applying the definition
of the tangent cone in 2A. Then from (7) and the preceding facts about polyhedral
cones and polarity, utilizing also the relation in 2A(8), we obtain (8). The equality
(9) is deduced simply by comparing (5) and (7). To obtain (10), observe that I(x) ⊂
I(x̄) for x close to x̄ and then the inclusion follows from (7).
[Figure: a polyhedral convex set C with the tangent cone TC (x), the normal cone NC (x), and the critical cone KC (x, v) at a point x for a vector v ∈ NC (x).]
The normal cone mapping NC associated with a polyhedral convex set C has a
special property which will be central to our analysis. It revolves around the follow-
ing notion.
Critical cone. For a convex set C, any x ∈ C and any v ∈ NC (x), the critical cone to
C at x for v is
KC (x, v) = { w ∈ TC (x) | w ⊥ v }.
If C is polyhedral, then KC (x, v) is polyhedral as well, as seen immediately from
the representation in (7).
Lemma 2E.4 (reduction lemma). Let C be a polyhedral convex set in IRn , and let
The graphical geometry of the normal cone mapping NC around (x̄, v̄) reduces then
to the graphical geometry of the normal cone mapping NK around (0, 0), in the sense
that
Proof. Since we are only involved with local properties of C around one of its points
x̄, and C − x̄ agrees with the cone TC (x̄) around 0 by Theorem 2E.3, we can assume
without loss of generality that x̄ = 0 and C is a cone, and TC (x̄) = C. Then, in terms
of the polar cone C∗ (which likewise is polyhedral on the basis of Theorem 2E.2),
we have the characterization from 2A.3 that
In particular for our focus on the geometry of gph NC around (0, v̄), we have from
(12) that
(13)    NC∗ (v̄) = { w ∈ C | ⟨v̄, w⟩ = 0 } = K.
We know on the other hand from 2E.3 that U ∩ [C∗ − v̄] = U ∩ TC∗ (v̄) for some
neighborhood U of 0, where moreover TC∗ (v̄) is polar to NC∗ (v̄), hence equal to K ∗
by (13). Thus, there is a neighborhood O of (0, 0) such that
Our goal (in the context of x̄ = 0) is to show that (14) reduces to (11), at least
when the neighborhood O in (14) is chosen still smaller, if necessary. Because of
(15), this comes down to demonstrating that ⟨v̄, w⟩ = 0 in the circumstances of (14).
We can take C to be represented by
(16)    C = { w | ⟨bi , w⟩ ≤ 0 for i = 1, . . . , m },
The relations in (12) can be coordinated with these representations as follows. For
each index set I ⊂ {1, . . . , m}, consider the polyhedral convex cones
WI = { w ∈ C | ⟨bi , w⟩ = 0 for i ∈ I },    VI = { ∑i∈I yi bi | yi ≥ 0 },
with W∅ = C and V∅ = {0}. Then v ∈ NC (w) if and only if, for some I, one has w ∈ WI
and v ∈ VI . In other words, gph NC is the union of the finitely many polyhedral
convex cones GI = WI × VI in IRn × IRn .
Among these cones GI , we will only be concerned with the ones containing (0, v̄).
Let I be the collection of index sets I ⊂ {1, . . . , m} having that property. According
to (9) in 2E.3, there exists for each I ∈ I a neighborhood OI of (0, 0) such that
OI ∩ [GI − (0, v̄)] = OI ∩ TGI (0, v̄). Furthermore, TGI (0, v̄) = WI × TVI (v̄). This has
the crucial consequence that when v̄ + u ∈ NC (w) with (w, u) near enough to (0, 0),
we also have v̄ + τ u ∈ NC (w) for all τ ∈ [0, 1]. Since having v̄ + τ u ∈ NC (w) entails
having ⟨v̄ + τ u, w⟩ = 0 through (12), this implies that ⟨v̄, w⟩ = −τ ⟨u, w⟩ for all τ ∈
[0, 1]. Hence ⟨v̄, w⟩ = 0, as required. We merely have to shrink the neighborhood O
in (14) to lie within every OI for I ∈ I .
Example 2E.5. The nonnegative orthant IRn+ is a polyhedral convex cone in IRn ,
since it consists of the vectors x = (x1 , . . . , xn ) satisfying the linear inequalities x j ≥
0, j = 1, . . . , n. For v = (v1 , . . . , vn ), one has
Thus, whenever v ∈ NIRn+ (x) one has in terms of the index sets
J1 = { j | x j > 0, v j = 0 },
J2 = { j | x j = 0, v j = 0 },
J3 = { j | x j = 0, v j < 0 }
that the vectors w = (w1 , . . . , wn ) belonging to the critical cone to IRn+ at x for v are
characterized by
    w ∈ KIRn+ (x, v)  ⇐⇒  w j free for j ∈ J1 ,   w j ≥ 0 for j ∈ J2 ,   w j = 0 for j ∈ J3 .
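The case analysis in this example translates directly into a membership test. The sketch below is ours, not the book's (Python with numpy assumed; the tolerance handling is our choice):

```python
import numpy as np

def critical_cone_member(w, x, v, tol=1e-12):
    """Test w in K(x, v) for C = R^n_+ via the index sets J1, J2, J3."""
    # v must lie in the normal cone: v <= 0 componentwise with x_j v_j = 0.
    assert np.all(x >= 0) and np.all(v <= tol) and np.all(np.abs(x * v) <= tol)
    J2 = (x == 0) & (v == 0)     # w_j >= 0 required
    J3 = (x == 0) & (v < 0)      # w_j = 0 required; J1 coordinates are free
    return bool(np.all(w[J2] >= -tol) and np.all(np.abs(w[J3]) <= tol))

x = np.array([2.0, 0.0, 0.0])
v = np.array([0.0, 0.0, -1.0])   # J1 = {first}, J2 = {second}, J3 = {third}
```

For instance, (−5, 1, 0) belongs to the critical cone here, while (0, −1, 0) fails the J2 requirement and (0, 0, 0.5) fails the J3 requirement.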
In the developments ahead, we will make use of not only critical cones but also
certain subspaces.
Critical subspaces. The smallest linear subspace that includes the critical cone
KC (x, v) will be denoted by KC+ (x, v), whereas the smallest linear subspace that is
included in KC (x, v) will be denoted by KC− (x, v), the formulas being
(18)    KC+ (x, v) = KC (x, v) − KC (x, v) = { w − w′ | w, w′ ∈ KC (x, v) },
        KC− (x, v) = KC (x, v) ∩ [−KC (x, v)] = { w ∈ KC (x, v) | −w ∈ KC (x, v) }.
The formulas follow from the fact that KC (x, v) is already a convex cone. Obvi-
ously, KC (x, v) is itself a subspace if and only if KC+ (x, v) = KC− (x, v).
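Continuing Example 2E.5 (our sketch, not the book's; Python with numpy assumed, coordinates indexed from 0): for x = (2, 0, 0) and v = (0, 0, −1) the span KC+ frees the J2 coordinate while the lineality space KC− forces it to zero:

```python
import numpy as np

x = np.array([2.0, 0.0, 0.0])
v = np.array([0.0, 0.0, -1.0])   # v in N(x): J1 = {0}, J2 = {1}, J3 = {2}

def in_K(w):        # critical cone: w[0] free (J1), w[1] >= 0 (J2), w[2] = 0 (J3)
    return bool(w[1] >= 0 and w[2] == 0)

def in_K_plus(w):   # K - K: the J2 coordinate becomes free
    return bool(w[2] == 0)

def in_K_minus(w):  # K intersected with -K: the J2 coordinate is forced to zero
    return bool(w[1] == 0 and w[2] == 0)

w = np.array([1.0, -2.0, 0.0])   # not in K, but its negative is
```

Here w lies in KC+ without lying in KC or KC−, showing the strict inclusions KC− ⊂ KC ⊂ KC+ when the critical cone is not a subspace.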
Proof. According to reduction lemma 2E.4, we have, for (w, u) in some neighbor-
hood of (0, 0), that v̄ + u ∈ NC (x̄ + w) if and only if u ∈ NK (w). In the change of
notation from u to v = u + Aw, this means that, for (w, v) in a neighborhood of
(0, 0), we have v ∈ G(x̄ + w) if and only if v ∈ G0 (w). Thus, the existence of a
Lipschitz continuous single-valued localization σ of G−1 around 0 for x̄ as in (a)
corresponds to the existence of a Lipschitz continuous single-valued localization σ0
of G0−1 around 0 for 0; the relationship is given by σ (v) = x̄ + σ0 (v). But when-
ever v ∈ G0 (w) we have λ v ∈ G0 (λ w) for all λ > 0, i.e., the graph of G0 is a cone.
Therefore, when σ0 exists it can be scaled arbitrarily large and must correspond to
G0−1 being a single-valued mapping with all of IRn as its domain.
We claim next that when G0−1 is single-valued everywhere it is necessarily Lip-
schitz continuous. This comes out of the argument pursued in the proof of 2E.4 in
analyzing the graph of NC , which applies equally well to NK , inasmuch as K is a
polyhedral convex cone. Specifically, the graph of NK is the union of finitely many
polyhedral convex cones in IRn × IRn . The same also holds then for the graphs of G0
and G0−1 . It remains only to observe that if a single-valued mapping has its graph
composed of the union of finitely many polyhedral convex sets it has to be Lipschitz
continuous (prove or see 3D.6).
This leaves us with verifying that the condition in (20) is sufficient for G0−1 to be
single-valued with all of IRn as its domain. We note in preparation for this that
(21)    (K + )⊥ = K ∗ ∩ (−K ∗ ) = (K ∗ )− ,    (K − )⊥ = K ∗ − K ∗ = (K ∗ )+ .
imply. Again we utilize the fact that the graph of G0−1 is the union of finitely many
polyhedral convex cones in IRn × IRn . Under the mapping (v, w) → v, each of them
projects onto a cone in IRn ; the union of these cones is dom G0−1 . Since the image
of a polyhedral convex cone under a linear transformation is another polyhedral
convex cone, in consequence of 2E.2 (since the image of a cone generated by finitely
many vectors is another such cone), and polyhedral convex cones are closed sets
in particular, this ensures that dom G0−1 is closed. Then there is certain to exist a
point v0 ∈ dom G0−1 that is closest to ṽ; for all τ > 0 sufficiently small, we have
v0 + τ (ṽ − v0 ) ∉ dom G0−1 . For each of the polyhedral convex cones D in the finite
union making up dom G0−1 , if v0 ∈ D, then v0 must be the projection PD (ṽ), so
that ṽ − v0 must belong to ND (v0 ) (cf. relation (4) in 2A). It follows that, for some
neighborhood U of v0 , we have
Consider any w0 ∈ G0−1 (v0 ); this means v0 − Aw0 ∈ NK (w0 ). Let K0 be the critical
cone to K at w0 for v0 − Aw0 :

(23)    K0 = { w ∈ TK (w0 ) | w ⊥ (v0 − Aw0 ) }.
In the line of argument already pursued, the geometry of the graph of NK around
(w0 , v0 − Aw0 ) can be identified with that of the graph of NK0 around (0, 0). Equiv-
alently, the geometry of the graph of G0−1 = (A + NK )−1 around (v0 , w0 ) can be
identified with that of (A + NK0 )−1 around (0, 0); for (v′ , w′ ) near enough to (0, 0),
we have w0 + w′ ∈ G0−1 (v0 + v′ ) if and only if w′ ∈ (A + NK0 )−1 (v′ ). Because of
(22) holding for the neighborhood U of v0 , this implies that ⟨ṽ − v0 , v′ ⟩ ≤ 0 for all
v′ ∈ dom (A + NK0 )−1 close to 0. Thus,
The case of w′ = 0 has NK0 (w′ ) = K0∗ , so (24) implies in particular that ⟨ṽ − v0 , u⟩ ≤
0 for all u ∈ K0∗ , so that ṽ − v0 ∈ (K0∗ )∗ = K0 . On the other hand, since u = 0 is
always one of the elements of NK0 (w′ ), we must have from (24) that ⟨ṽ − v0 , Aw′ ⟩ ≤ 0
for all w′ ∈ K0 . Here ⟨ṽ − v0 , Aw′ ⟩ = ⟨AT (ṽ − v0 ), w′ ⟩ for all w′ ∈ K0 , so this means
AT (ṽ − v0 ) ∈ K0∗ . In summary, (24) requires, among other things, having
We observe now from the formula for K0 in (23) that K0 ⊂ TK (w0 ), where further-
more TK (w0 ) is the cone generated by the vectors w − w0 with w ∈ K and hence
lies in K − K. Therefore K0 ⊂ K + . On the other hand, because TK (w0 ) and NK (w0 )
are polar to each other by 2E.3, we have from (23) that K0 is polar to the cone
comprised by all differences v − τ (v0 − Aw0 ) with v ∈ NK (w0 ) and τ ≥ 0, which is
again polyhedral. That cone of differences must then in fact be K0∗ . Since we have
taken v0 and w0 to satisfy v0 − Aw0 ∈ NK (w0 ), and also NK (w0 ) ⊂ K ∗ , it follows that
(26)    w ∈ K + , AT w ⊥ K − , ⟨AT w, w⟩ ≤ 0  =⇒  w = 0.
(28)    AT11 w1 + AT21 w2 = 0,  ⟨w2 , AT12 w1 + AT22 w2 ⟩ ≤ 0  =⇒  w1 = 0, w2 = 0.
In particular, through the choice of w2 = 0, (27) insists that the only w1 with A11 w1 =
0 is w1 = 0. Thus, A11 must be nonsingular. Then the initial equation in (27) can be
solved for w1 , yielding w1 = −A11−1 A12 w2 , and this expression can be substituted
into the inequality, thereby reducing the condition to
AT22 − AT12 (AT11 )−1 AT21 = (A22 − A21 A11−1 A12 )T ,
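This transpose identity for the Schur complement is easy to confirm numerically (our sketch, assuming numpy; the diagonal shift making A11 nonsingular is our choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 3, 2
A11 = rng.standard_normal((n1, n1)) + 3 * np.eye(n1)   # nonsingular block
A12 = rng.standard_normal((n1, n2))
A21 = rng.standard_normal((n2, n1))
A22 = rng.standard_normal((n2, n2))

# Schur complement of A11 in the transposed block matrix ...
lhs = A22.T - A12.T @ np.linalg.inv(A11.T) @ A21.T
# ... equals the transpose of the Schur complement in the original.
rhs = (A22 - A21 @ np.linalg.inv(A11) @ A12).T
```

The agreement of `lhs` and `rhs` reflects that transposition commutes with inversion and with the complement formula.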
Example 2E.7.
(a) When the critical cone K in Theorem 2E.6 is a subspace, the condition in
(20) reduces to the nonsingularity of the linear transformation K ∋ w → PK (Aw),
where PK is the projection onto K.
(b) When the critical cone K in Theorem 2E.6 is pointed, in the sense that K ∩
(−K) = {0}, the condition in (20) reduces to the requirement that ⟨w, Aw⟩ > 0 for
all nonzero w ∈ K + .
Suppose that for each u ∈ IRn there is a unique solution w = s̄(u) to the auxiliary
variational inequality Aw − u + NK (w) ∋ 0, this being equivalent to saying that
(30) lip (s; p̄) ≤ lip (s̄; 0) · |∇ p f ( p̄, x̄)|, Ds( p̄)(q) = s̄(−∇ p f ( p̄, x̄)q).
Moreover, under the ample parameterization condition, rank ∇ p f ( p̄, x̄) = n, condi-
tion (29) is not only sufficient but also necessary for a Lipschitz continuous single-
valued localization of S around p̄ for x̄.
Proof. We merely have to combine the observation made before this theorem’s
statement with the statement of Theorem 2E.1. According to formula (4) in that
theorem for the first-order approximation η of s at p̄, we have η ( p̄ + q) − x̄ =
s̄(−∇ p f ( p̄, x̄)q). Because K is a cone, the mapping NK is positively homogeneous,
and the same is true then for A + NK and its inverse, which is s̄. Thus, the function
q → s̄(−∇ p f ( p̄, x̄)q) gives a first-order approximation to s( p̄ + q) − s( p̄) at q = 0
that is positively homogeneous. We conclude that s is semidifferentiable at p̄ with
this function furnishing its semiderivative, as indicated in (30).
As a special case of Example 2E.7(a), if C = IRn the result in Theorem 2E.8 re-
duces once more to a version of the classical implicit function theorem. Further in-
sights into solution mappings associated with variational inequalities will be gained
in Chapter 4.
Exercise 2E.9. Prove that the projection mapping PC associated with a polyhedral
convex set C is Lipschitz continuous and semidifferentiable everywhere, with its
semiderivative being given by
Guide. Use the relation between the projection mapping and the normal cone map-
ping given in formula 2A(4).
Additional facts about critical cones, which will be useful later, can be developed
from the special geometric structure of polyhedral convex sets.
Proposition 2E.10 (local behavior of critical cones and subspaces). Let C ⊂ IRn be
a polyhedral convex set, and let v̄ ∈ NC (x̄). Then the following properties hold:
(a) KC (x, v) ⊂ KC+ (x̄, v̄) for all (x, v) ∈ gph NC in some neighborhood of (x̄, v̄).
(b) KC (x, v) = KC+ (x̄, v̄) for some (x, v) ∈ gph NC in each neighborhood of (x̄, v̄).
Proof. By appealing to 2E.3 as in the proof of 2E.4, we can reduce to the case
where x̄ = 0 and C is a cone. Theorem 2E.2 then provides a representation in terms
of a collection of nonzero vectors b1 , . . . , bm , in which C consists of all linear combi-
nations y1 b1 + · · · + ym bm with coefficients yi ≥ 0, and the polar cone C∗ consists of
all v such that ⟨bi , v⟩ ≤ 0 for all i. We know from 2A.3 that, at any x ∈ C, the normal
cone NC (x) is formed by the vectors v ∈ C∗ such that ⟨x, v⟩ = 0, so that NC (x) is the
cone that is polar to the one comprised of all vectors w − λ x with w ∈ C and λ ≥ 0.
Since the latter cone is again polyhedral (in view of Theorem 2E.2), hence closed,
it must in turn be the cone polar to NC (x) and therefore equal to TC (x). Thus,
TC (x) = { y1 b1 + · · · + ym bm − λ x | yi ≥ 0, λ ≥ 0 }  for any x ∈ C.
v ∈ NC (x) ⇐⇒ x ∈ F(v).
Then too, for such x and v and the critical cone KC (x, v) = { w ∈ TC (x) | ⟨w, v⟩ = 0 },
we have

(32)    KC (x, v) = { w − λ x | w ∈ F(v), λ ≥ 0 },
and actually KC (x̄, v̄) = F(v̄) (inasmuch as x̄ = 0). In view of the fact, evident from
(31), that
I(v) ⊂ I(v̄) and F(v) ⊂ F(v̄) for all v near enough to v̄,
In that case KC (x, v) ⊂ F(v̄) − F(v̄) = KC (x̄, v̄) − KC (x̄, v̄) = KC+ (x̄, v̄), so (a) is valid.
To confirm (b), it will be enough now to demonstrate that, arbitrarily close to
x̄ = 0, we can find a vector x̃ for which KC (x̃, v̄) = F(v̄) − F(v̄). Here F(v̄) consists
by definition of all nonnegative linear combinations of the vectors bi with i ∈ I(v̄),
whereas F(v̄) − F(v̄) is the subspace consisting of all linear combinations. For arbi-
trary ε > 0, let x̃ = ỹ1 b1 + · · · + ỹm bm with ỹi = ε for i ∈ I(v̄) but ỹi = 0 for i ∉ I(v̄).
Then KC (x̃, v̄), equaling { w − λ x̃ | w ∈ F(v̄), λ ≥ 0 } by (32), consists of all linear
combinations of the vectors bi for i ∈ I(v̄) in which the coefficients have the form
yi − λ ε with yi ≥ 0 and λ ≥ 0. Can any given choice of coefficients ci for i ∈ I(v̄)
be obtained in this manner? Yes, by taking λ high enough that ci + λ ε ≥ 0 for all
i ∈ I(v̄) and then setting yi = ci + λ ε . This completes the argument.
Our attention shifts now from special properties of the set C in a variational inequal-
ity to special properties of the function f and their effect on solutions.
monotone if and only if ⟨w, Aw⟩ > 0 for all w ≠ 0, i.e., A is positive definite. These
terms make no requirement of symmetry on A. It may be recalled that any square
matrix A can be written as a sum As + Aa in which As is symmetric (A∗s = As ) and
Aa is antisymmetric (A∗a = −Aa ), namely with As = ½[A + A∗ ] and Aa = ½[A − A∗ ];
then ⟨w, Aw⟩ = ⟨w, As w⟩. The monotonicity of f (x) = a + Ax thus depends only on
the symmetric part As of A; the antisymmetric part Aa can be anything.
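The splitting, and the resulting indifference of the quadratic form to the antisymmetric part, can be checked directly (our sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
As = 0.5 * (A + A.T)      # symmetric part
Aa = 0.5 * (A - A.T)      # antisymmetric part
assert np.allclose(As + Aa, A)
assert np.allclose(As, As.T) and np.allclose(Aa, -Aa.T)

# <w, Aw> = <w, As w>: the quadratic form ignores the antisymmetric part.
for _ in range(5):
    w = rng.standard_normal(4)
    assert np.isclose(w @ A @ w, w @ As @ w)
    assert np.isclose(w @ Aa @ w, 0.0)
```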
For differentiable functions f that aren’t affine, monotonicity has a similar char-
acterization with respect to the Jacobian matrices ∇ f (x).
Exercise 2F.2 (monotonicity from derivatives). For a function f : IRn → IRn that is
continuously differentiable on an open convex set O ⊂ dom f , verify the following
facts.
(a) A necessary and sufficient condition for f to be monotone on O is the positive
semidefiniteness of ∇ f (x) for all x ∈ O.
(b) If ∇ f (x) is positive definite at every point x of a closed, bounded, convex set
C ⊂ O, then f is strongly monotone on C.
(c) If C is a convex subset of O such that ⟨∇ f (x)w, w⟩ ≥ 0 for every x ∈ C and
w ∈ C − C, then f is monotone on C.
(d) If C is a convex subset of O such that ⟨∇ f (x)w, w⟩ ≥ μ |w|² for every x ∈ C
and w ∈ C − C, where μ > 0, then f is strongly monotone on C with constant μ .
Guide. Derive this from the characterizations in 2F.1 by investigating the derivatives
of the function ϕ (τ ) introduced there. In proving (c), argue by way of the mean
value theorem that ⟨ f (x′ ) − f (x), x′ − x⟩ equals ⟨∇ f (x̃)(x′ − x), x′ − x⟩ for some point
x̃ on the line segment joining x with x′ .
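A small illustration of 2F.2(a) (our sketch, not the book's; Python with numpy assumed): an affine map with a nonsymmetric Jacobian whose symmetric part is positive semidefinite, hence monotone, confirmed on random samples:

```python
import numpy as np

# f(x) = (x0 + x1, -x0 + x1): Jacobian J is nonsymmetric but its
# symmetric part is the identity, so f is monotone on all of R^2.
def f(x):
    return np.array([x[0] + x[1], -x[0] + x[1]])

J = np.array([[1.0, 1.0], [-1.0, 1.0]])   # Jacobian, constant here
Js = 0.5 * (J + J.T)

# Direct check of monotonicity: <f(x') - f(x), x' - x> >= 0 on samples.
rng = np.random.default_rng(4)
for _ in range(100):
    x, xp = rng.standard_normal(2), rng.standard_normal(2)
    assert (f(xp) - f(x)) @ (xp - x) >= -1e-12
```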
We are ready now to develop some special results for variational inequalities
in which attention is devoted to the case when f is monotone. We work with the
basic perturbation scheme in which f (x) is replaced by f (x) − p for a parameter
vector p ∈ IRn . The solution mapping is then
(4)    S(p) = { x | p − f (x) ∈ NC (x) } = ( f + NC )−1 (p),
In parallel from the second inequality in (5), we have 0 ≤ ⟨ f (x(t)), x(t) − x1 ⟩. There-
fore
where the final expression equals ⟨ f (x(t)), x(t) − x̄⟩ = t⟨ f (x(t)), x̃ − x̄⟩, since x(t) −
x̄ = t[x̃ − x̄]. Thus, 0 ≤ ⟨ f (x(t)), x̃ − x̄⟩. Because x(t) → x̄ as t → 0, and f is contin-
uous, we conclude that ⟨ f (x̄), x̃ − x̄⟩ ≥ 0, as required.
such a result in 2A.1, but only for bounded sets C. The following result goes beyond
that boundedness restriction, without yet imposing any monotonicity assumption
on f . When combined with monotonicity, it will have particularly powerful conse-
quences.
Then the variational inequality (3) has a solution, and every solution x of (3) satisfies
|x − x̂| < ρ.
Proof. Any solution x to (3) would have ⟨ f (x), x − x̂⟩ ≤ 0 in particular, and then
necessarily |x − x̂| < ρ under (6). Hence it will suffice to show that (6) guarantees
the existence of at least one solution x to (3) with |x − x̂| < ρ .
Let Cρ = { x ∈ C | |x − x̂| ≤ ρ } and consider the modified variational inequal-
ity (3) in which C is replaced by Cρ . According to Theorem 2A.1, this modi-
fied variational inequality has a solution x̄. We have x̄ ∈ Cρ and − f (x̄) ∈ NCρ (x̄).
From 2A.7(b) we know that NCρ (x̄) = NC (x̄) + NB (x̄) for the ball B = IBρ (x̂) =
{ x | |x − x̂| ≤ ρ }. Thus,
Corollary 2F.6 (uniform local existence). Consider a function f : IRn → IRn and a
nonempty closed convex set C ⊂ dom f relative to which f is continuous (but not
necessarily monotone). Suppose there exist x̂ ∈ C, ρ > 0 and η > 0 such that
(8)    there is no x ∈ C with |x − x̂| ≥ ρ and ⟨ f (x), x − x̂⟩/|x − x̂| ≤ η .
Proof. The stronger assumption here ensures that assumption (6) of Theorem 2F.5
is fulfilled by the function fv (x) = f (x) − v for every v with |v| ≤ η . Since S(v) =
( fv + NC )−1 (0), this leads to the desired conclusion.
We can proceed now to take advantage of monotonicity of f on C through the
property in 2F.1 and the observation that
Then, for any vector w such that x̂ + τ w ∈ C for all τ ∈ (0, ∞), the expression ⟨ f (x̂ +
τ w), w⟩ is nondecreasing as a function of τ ∈ (0, ∞) and thus has a limit (possibly
∞) as τ → ∞.
Equivalently, in terms of τk = |xk − x̂| and wk = τk−1 (xk − x̂) we have ⟨ f (x̂ +
τk wk ), wk ⟩ ≤ ηk with |wk | = 1, x̂ + τk wk ∈ C and τk → ∞. Without loss of gener-
ality we can suppose that wk → w for a vector w again having |w| = 1. Then for
any τ > 0 and k high enough that τk ≥ τ , we have from the convexity of C that
x̂ + τ wk ∈ C and from the monotonicity of f that ⟨ f (x̂ + τ wk ), wk ⟩ ≤ ηk . On taking
the limit as k → ∞ and utilizing the closedness of C and the continuity of f , we get
x̂ + τ w ∈ C and ⟨ f (x̂ + τ w), w⟩ ≤ 0. This being true for any τ > 0, we see that w ∈ W
and the limit condition in (a) is violated. The validity of the claim in (a) is thereby
confirmed.
The condition in (b) not only implies the condition in (a) but also, by a slight
extension of the argument, guarantees that the criterion in Corollary 2F.6 holds for
every ε > 0.
Exercise 2F.8 (Jacobian criterion for existence and uniqueness). Let f : IRn → IRn
and C ⊂ IRn be such that f is continuously differentiable on C and monotone relative
to C. Fix x̂ ∈ C and let W consist of the vectors w with |w| = 1 such that x̂ + τ w ∈ C
for all τ ∈ (0, ∞). Suppose there exists μ > 0 such that ⟨∇ f (x)w, w⟩ ≥ μ for every
In the perspective of 2F.2(b), the result in 2F.8 seems to come close to invoking
strong monotonicity of f in the case where f is continuously differentiable. How-
ever, it only involves special vectors w, not every nonzero w ∈ IRn . For instance, in
the affine case where f (x) = Ax + b and C = IRn+ , the criterion obtained from 2F.8 by
choosing x̂ = 0 is simply that ⟨Aw, w⟩ > 0 for every w ∈ IRn+ with |w| = 1, whereas
strong monotonicity of f would require this for w in IRn , not just IRn+ . In fact, full
strong monotonicity has bigger implications than those in 2F.8.
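The gap between the orthant criterion and full strong monotonicity shows up already in a small instance (our sketch, assuming numpy): a symmetric A whose quadratic form is positive on the nonnegative orthant yet indefinite on IR2:

```python
import numpy as np

# <Aw, w> = w1^2 + w2^2 + 4 w1 w2: positive whenever w >= 0 and w != 0,
# since the cross term is then nonnegative, but negative at w = (1, -1).
A = np.array([[1.0, 2.0], [2.0, 1.0]])

rng = np.random.default_rng(5)
for _ in range(200):
    w = np.abs(rng.standard_normal(2))       # random point of the orthant
    if np.linalg.norm(w) > 1e-12:
        assert w @ A @ w > 0                 # criterion of 2F.8 holds

w = np.array([1.0, -1.0])                    # off the orthant: form is negative
```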
with the aim of drawing on the achievements in Section 2E in the presence of mono-
tonicity properties of f with respect to x.
Proof. We apply Theorem 2E.1, observing that its assumption (b) is satisfied on
the basis of Theorem 2F.9 and the criterion in 2F.2(d) for strong monotonicity with
constant μ . Theorem 2F.9 tells us moreover that the Lipschitz constant for the lo-
calization σ in Theorem 2E.1 is no more than μ −1 , and we then obtain (12) from
Theorem 2E.1.
Theorem 2F.10 can be compared to the result in Theorem 2E.8. That result re-
quires C to be polyhedral but allows (11) to be replaced by a weaker condition in
terms of the critical cone K = KC (x̄, v̄) for v̄ = − f ( p̄, x̄). Specifically, instead of ask-
ing the inequality in (11) to hold for all w ∈ C − C, one only asks it to hold for all
w ∈ K − K such that ∇x f ( p̄, x̄)w ⊥ K ∩ (−K). The polyhedral convexity leads in this
case to the further conclusion that the localization is semidifferentiable.
Several types of variational inequalities are closely connected with problems of op-
timization. These include the basic condition for minimization in Theorem 2A.6 and
the Lagrange condition in Theorem 2A.9, in particular. In this section we investigate
what the general results obtained for variational inequalities provide in such cases.
Recall from Theorem 2A.6 that in minimizing a continuously differentiable func-
tion g over a nonempty, closed, convex set C ⊂ IRn , the variational inequality
This gives a way to think about the first-order condition for a local minimum of
g in which the vectors w in (2) making the inequality hold as an equation can be
anticipated to have a special role. In fact, those vectors w comprise the critical cone
KC (x, −∇g(x)) to C at x with respect to the vector −∇g(x) in NC (x), as defined in
Section 2E:
(3)    KC (x, −∇g(x)) = { w ∈ TC (x) | ⟨∇g(x), w⟩ = 0 }.
When C is polyhedral, at least, this critical cone is able to serve in the expression of
second-order necessary and sufficient conditions for the minimization of g over C.
(5) t −2 [g(x̄ + tz) − g(x̄)] ≥ ε for all z ∈ Z ∩ TC (x̄) when t is sufficiently small.
The assumption in (b) entails (in terms of w = tz) the existence of ε > 0 such that
⟨z, ∇2 g(x̄)z⟩ > ε when z ∈ Z ∩ KC (x̄, v̄). This inequality also holds then for all z in
some open set containing Z ∩ KC (x̄, v̄). Let Z0 be the intersection of the complement
of that open set with Z ∩ TC (x̄). Since (1) is equivalent to (2), we have for z ∈ Z0
that ⟨∇g(x̄), z⟩ > 0. Because Z0 is compact, we actually have an η > 0 such that
⟨∇g(x̄), z⟩ > η for all z ∈ Z0 . We see then, in writing
provides for each p a first-order condition which x must satisfy if it furnishes a local
minimum, but only describes, in general, the stationary points x for the minimization
in (6). If C is polyhedral the question of a local minimum can be addressed through
the second-order conditions provided by Theorem 2G.1 relative to the critical cone
(8)    KC (x, −∇x g(p, x)) = { w ∈ TC (x) | w ⊥ ∇x g(p, x) }.
The basic object of interest to us for now, however, is the stationary point mapping
S : IRd →→ IRn defined by

(9)    S(p) = { x | ∇x g(p, x) + NC (x) ∋ 0 }.
With respect to a choice of p̄ and x̄ such that x̄ ∈ S( p̄), it will be useful to consider
alongside of (6) an auxiliary problem with parameter v ∈ IRn in which g( p̄, ·) is
essentially replaced by its second-order expansion at x̄:
The subtraction of ⟨v, w⟩ “tilts” ḡ, and is referred to therefore as a tilt perturbation.
When v = 0, ḡ itself is minimized.
For this auxiliary problem the basic first-order condition comes out to be the
parameterized variational inequality
(11)    ∇x g( p̄, x̄) + ∇2xx g( p̄, x̄)w − v + NW (w) ∋ 0, where NW (w) = NC (x̄ + w).
The stationary point mapping for the problem in (10) is accordingly the mapping
S̄ : IRn →→ IRn defined by

(12)    S̄(v) = { w | ∇x g( p̄, x̄) + ∇2xx g( p̄, x̄)w + NW (w) ∋ v }.
The points w ∈ S̄(v) are sure to furnish a minimum in (10) if, for instance, the
matrix ∇2xx g( p̄, x̄) is positive semidefinite, since that corresponds to the convexity of
the “tilted” function being minimized. For polyhedral C, Theorem 2G.1 could be
brought in for further analysis of a local minimum in (10). Note that 0 ∈ S̄(0).
On the other hand, (b) is necessary for S to have a Lipschitz continuous single-
valued localization around p̄ for x̄ when the n × d matrix ∇2xp g( p̄, x̄) has rank n.
Under the additional assumption that C is polyhedral, condition (b) is equivalent
to the condition that, for the critical cone K = KC (x̄, −∇x g( p̄, x̄)), the mapping
v → S̄0 (v) = { w | ∇2xx g( p̄, x̄)w + NK (w) ∋ v } is everywhere single-valued.
Moreover, a sufficient condition for this can be expressed in terms of the critical
subspaces KC+ (x̄, v̄) = KC (x̄, v̄) − KC (x̄, v̄) and KC− (x̄, v̄) = KC (x̄, v̄) ∩ [−KC (x̄, v̄)] for
v̄ = −∇x g( p̄, x̄), namely
(14)    ⟨w, ∇2xx g( p̄, x̄)w⟩ > 0 for every nonzero w ∈ KC+ (x̄, v̄)
        with ∇2xx g( p̄, x̄)w ⊥ KC− (x̄, v̄).
Then S has a localization s not only with the properties laid out in Theorem 2G.2,
but also with the property that, for every p in some neighborhood of p̄, the point
x = s(p) furnishes a strong local minimum in (6). Moreover, (15) is necessary for
the existence of a localization s with all these properties, when the n × d matrix
∇2xp g( p̄, x̄) has rank n.
Proof. Obviously (15) implies (14), which ensures according to Theorem 2G.2 that
S has a Lipschitz continuous single-valued localization s around p̄ for x̄. Applying
2E.10(a), we see then that

KC (x, −∇x g(p, x)) ⊂ KC+ (x̄, −∇x g( p̄, x̄)) = KC+ (x̄, v̄)
when x = s(p) and p is near enough to p̄. Since the matrix ∇2xx g(p, x) converges
to ∇2xx g( p̄, x̄) as (p, x) tends to ( p̄, x̄), it follows that ⟨w, ∇2xx g(p, x)w⟩ > 0 for all
nonzero w ∈ KC (x, −∇x g(p, x)) when x = s(p) and p is close enough to p̄. Since
having x = s(p) corresponds to having the first-order condition in (7), we conclude
from Theorem 2G.1 that x furnishes a strong local minimum in this case.
Arguing now toward the necessity of (15) under the rank condition on ∇2xp g( p̄, x̄),
we suppose S has a Lipschitz continuous single-valued localization s around p̄ for
x̄ such that x = s(p) gives a local minimum when p is close enough to p̄. For any
x ∈ C near x̄ and v ∈ NC (x), the rank condition gives us a p such that v = −∇x g(p, x);
this follows e.g. from 1F.6. Then x = s(p) and, because we have a local minimum,
it follows that ⟨w, ∇2xx g(p, x)w⟩ ≥ 0 for every nonzero w ∈ KC (x, v). We know from
2E.10(b) that KC (x, v) = KC+ (x̄, v̄) for choices of x and v arbitrarily close to (x̄, v̄),
where v̄ = −∇x g( p̄, x̄). Through the continuous dependence of ∇2xx g(p, x) on (p, x),
we therefore have

(16)    ⟨w, Aw⟩ ≥ 0 for all w ∈ KC+ (x̄, v̄), where A = ∇2xx g( p̄, x̄) is symmetric.
For this reason, we can only have ⟨w, Aw⟩ = 0 if Aw ⊥ KC+ (x̄, v̄), i.e., ⟨w′ , Aw⟩ = 0
for all w′ ∈ KC+ (x̄, v̄).
On the other hand, because the rank condition corresponds to the ample param-
eterization property, we know from Theorem 2E.8 that the existence of the single-
valued localization s requires for A and the critical cone K = KC (x̄, v̄) that the map-
ping (A + NK )−1 be single-valued. This would be impossible if there were a nonzero
w ∈ K such that Aw ⊥ KC+ (x̄, v̄), because we would then have ⟨w′ , Aw⟩ = 0 for all
w′ ∈ K in particular (since K ⊂ KC+ (x̄, v̄)), implying that −Aw ∈ NK (w). Then
(A + NK )−1 (0) would contain w along with 0, contrary to single-valuedness. Thus,
the inequality in (16) must be strict when w ≠ 0.
Next we provide a complementary, global result for the special case of a tilted
strongly convex function.
has a unique solution S(v), and the solution mapping S is Lipschitz continuous on
IRn (globally) with Lipschitz constant μ −1 .
Proof. Let gv denote the function being minimized in (17). Like g, this function
is continuously differentiable and strongly convex on C with constant μ ; we have
∇gv (x) = ∇g(x) − v. According to Theorem 2A.6, the condition ∇gv (x) + NC (x) ∋ 0,
or equivalently x ∈ (∇g + NC )−1 (v), is both necessary and sufficient for x to furnish
the minimum in (17). The strong convexity of g makes the mapping f = ∇g strongly
monotone on C with constant μ ; see 2F.3(a). The conclusion follows now by apply-
ing Theorem 2F.9 to this mapping f .
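For a concrete feel for the global Lipschitz estimate just proved, here is a small numeric sketch (our own illustration, not from the text): a separable strongly convex quadratic on a box, for which the solution of the tilted problem (17) can be written in closed form and the bound |S(v) − S(v′)| ≤ μ −1 |v − v′| checked directly.

```python
# Illustrative sketch (not from the text): g(x) = (1/2) sum q_i x_i^2 on the box
# C = [0,1]^n, with every q_i >= mu > 0, so g is strongly convex on C with
# constant mu = min(q).  The tilted problem (17) minimizes g(x) - <v, x> over C;
# by separability its unique solution is S(v)_i = clip(v_i / q_i, 0, 1).

def solve_tilted(q, v):
    """Unique minimizer of (1/2) sum q_i x_i^2 - <v, x> over the box [0,1]^n."""
    return [min(max(vi / qi, 0.0), 1.0) for qi, vi in zip(q, v)]

def norm(x):
    return sum(t * t for t in x) ** 0.5

q = [2.0, 3.0, 5.0]            # so mu = min(q) = 2
mu = min(q)
v1 = [1.0, -4.0, 2.5]
v2 = [0.3, 7.0, -1.0]
x1, x2 = solve_tilted(q, v1), solve_tilted(q, v2)
# global Lipschitz estimate: |S(v1) - S(v2)| <= mu^{-1} |v1 - v2|
gap = norm([a - b for a, b in zip(x1, x2)])
bound = norm([a - b for a, b in zip(v1, v2)]) / mu
```

Since each coordinate map v_i ↦ clip(v_i /q_i , 0, 1) is Lipschitz with constant 1/q_i ≤ 1/μ, the estimate holds here componentwise, which is why the check succeeds for any pair of tilts.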
Exercise 2G.5. In the setting of Theorem 2G.2, condition (b) is fulfilled in particular
if there exists μ > 0 such that
(18) ⟨w, ∇2xx g( p̄, x̄)w⟩ ≥ μ |w|2 for all w ∈ C − C,
and then lip (s; p̄) ≤ μ −1 . If C is polyhedral, the additional conclusion holds that,
for all p in some neighborhood of p̄, there is a strong local minimum in problem (6)
at the point x = s(p).
Guide. Apply Proposition 2G.4 to the function ḡ in the auxiliary minimization
problem (10). Get from this that s̄ coincides with S̄, which is single-valued and
Lipschitz continuous on IRn with Lipschitz constant μ −1 . In the polyhedral case,
also apply Theorem 2G.3, arguing that (18) entails (15).
Observe that because C − C is a convex set containing 0, the condition in (18)
holds for all w ∈ C − C if it holds for all w ∈ C − C with |w| sufficiently small.
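The homogeneity behind this observation can be spelled out; assuming (18) is a quadratic growth condition of the form ⟨w, Aw⟩ ≥ μ |w|2 with A = ∇2xx g( p̄, x̄) (an assumption about the elided display), both sides scale as t2:

```latex
% For w in C - C and t in (0,1], convexity of C - C together with 0 in C - C
% gives tw = (1-t)·0 + t·w in C - C.  Hence for arbitrary w in C - C one may
% pick t small enough that tw is "small" and observe
\langle tw,\, A(tw)\rangle \;\ge\; \mu\,|tw|^2
\quad\Longleftrightarrow\quad
t^2\,\langle w, Aw\rangle \;\ge\; t^2\mu\,|w|^2
\quad\Longleftrightarrow\quad
\langle w, Aw\rangle \;\ge\; \mu\,|w|^2 .
```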
We turn now to minimization over sets which need not be convex but are specified
by a system of constraints. A first-order necessary condition for a minimum in that
case was developed in a very general manner in Theorem 2A.8. Here, we restrict
ourselves to the most commonly treated problem of nonlinear programming, where
the format is to
(19) minimize g0 (x) over all x satisfying gi (x) ≤ 0 for i ∈ [1, s], gi (x) = 0 for i ∈ [s + 1, m].
In order to bring second-order conditions for optimality into the picture, we assume
that the functions g0 , g1 , . . . , gm are twice continuously differentiable on IRn .
The basic first-order condition in this case has been worked out in detail in Sec-
tion 2A as a consequence of Theorem 2A.8. It concerns the existence, relative to
x, of a multiplier vector y = (y1 , . . . , ym ) fulfilling the Karush–Kuhn–Tucker
conditions:
2 Implicit Function Theorems for Variational Problems 119
(20) y ∈ IRs+ × IRm−s , gi (x) ≤ 0 for i ∈ [1, s] with yi = 0, gi (x) = 0 for all other i ∈ [1, m],
∇g0 (x) + y1 ∇g1 (x) + · · · + ym ∇gm (x) = 0.
This existence is necessary for a local minimum at x as long as x satisfies the con-
straint qualification requiring that the same conditions, but with the term ∇g0 (x)
suppressed, can’t be satisfied with y ≠ 0. It is sufficient for a global minimum at x if
g0 , g1 , . . . , gs are convex and gs+1 , . . . , gm are affine. However, we now wish to take a
second-order approach to local sufficiency, rather than rely on convexity for global
sufficiency.
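The conditions (20) are easy to test numerically on a toy instance (our own illustration, not from the text): minimize g0 (x) = (x1 − 2)2 + x2 2 subject to the single inequality g1 (x) = x1 − 1 ≤ 0, for which x̄ = (1, 0) with multiplier ȳ1 = 2 satisfies (20).

```python
# Toy instance (illustrative, not from the text): n = 2, s = m = 1.
# minimize (x1 - 2)^2 + x2^2   subject to   x1 - 1 <= 0.

def kkt_residual(x, y):
    """Max violation of the conditions (20) at (x, y); 0 means (20) holds."""
    g1 = x[0] - 1.0                           # constraint value
    grad_g0 = (2.0 * (x[0] - 2.0), 2.0 * x[1])
    grad_g1 = (1.0, 0.0)
    viol = [
        max(-y[0], 0.0),                      # sign condition: y in IR^s_+
        max(g1, 0.0),                         # feasibility: g1(x) <= 0
        abs(y[0] * g1),                       # complementarity: y1 = 0 unless g1(x) = 0
        abs(grad_g0[0] + y[0] * grad_g1[0]),  # stationarity, first component
        abs(grad_g0[1] + y[0] * grad_g1[1]),  # stationarity, second component
    ]
    return max(viol)
```

At (x̄, ȳ) = ((1, 0), (2)) the residual is zero, while the unconstrained minimizer (2, 0) with y = 0 violates feasibility.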
The key for us will be the fact, coming from Theorem 2A.9, that (20) can be
identified in terms of the Lagrangian function
L(x, y) = g0 (x) + y1 g1 (x) + · · · + ym gm (x).
Because our principal goal is to illustrate the application of the results in the preced-
ing sections, rather than push consequences for optimization theory to the limit, we
will only deal with this variational inequality under an assumption of linear inde-
pendence for the gradients of the active constraints. A constraint in (20) is inactive
at x if it is an inequality constraint with gi (x) < 0; otherwise it is active at x.
(b) (sufficient condition) If a multiplier vector ȳ exists such that (x̄, ȳ) satisfies
the conditions in (20), or equivalently (22), and if (24) holds with strict inequality
when w ≠ 0, then x̄ furnishes a local minimum in (19). Indeed, it furnishes a strong
local minimum in the sense of there being an ε > 0 such that
(25) g0 (x) ≥ g0 (x̄) + (ε /2)|x − x̄|2 for all x near x̄ satisfying the constraints.
Proof. The linear independence of the gradients of the active constraints guaran-
tees, among other things, that x̄ satisfies the constraint qualification under which
(22) is necessary for local optimality.
In the case of a local minimum, as in (a), we do therefore have the variational
inequality (22) fulfilled by x̄ and some vector ȳ; and of course (22) holds by
assumption in (b). From this point on, therefore, we can concentrate on just the
second-order parts of (a) and (b) in the framework of having x̄ and ȳ satisfying (20).
In particular then, we have
(26) −∇g0 (x̄) = ȳ1 ∇g1 (x̄) + · · · + ȳm ∇gm (x̄),
where the multiplier vector ȳ is moreover uniquely determined by the linear inde-
pendence of the gradients of the active constraints and the stipulation in (20) that
inactive constraints get coefficient 0.
Actually, the inactive constraints play no role around x̄, so we can just as well
assume, for simplicity of exposition in our local analysis, that every constraint is
active at x̄: we have gi (x̄) = 0 for i = 1, . . . , m. Then, on the level of first-order
conditions, we just have the combination of (20), which corresponds to ∇x L(x̄, ȳ) =
0, and the requirement that ȳi ≥ 0 for i = 1, . . . , s. In this simplified context, let
(27) T = set of all w ∈ IRn satisfying ⟨∇gi (x̄), w⟩ ≤ 0 for i = 1, . . . , s and
⟨∇gi (x̄), w⟩ = 0 for i = s + 1, . . . , m,
The rest of our argument will rely heavily on the classical inverse function the-
orem, 1A.1. Our assumption that the vectors ∇gi (x̄) for i = 1, . . . , m are linearly
independent in IRn entails of course that m ≤ n. These vectors can be supplemented,
if necessary, by vectors ak for k = 1, . . . , n − m so as to form a basis for IRn . Then, by
setting gm+k (x) = ⟨ak , x − x̄⟩, we get functions gi for i = m + 1, . . . , n such that for
g : IRn → IRn with g(x) = (g1 (x), . . . , gm (x), gm+1 (x), . . . , gn (x))
we have g(x̄) = 0 and ∇g(x̄) nonsingular. We can view this as providing, at least
locally around x̄, a change of coordinates g(x) = u = (u1 , . . . , un ), x = s(u) (for a
localization s of g−1 around 0 for x̄) in which x̄ corresponds to 0 and the constraints
in (19) correspond to linear constraints
ui ≤ 0 for i = 1, . . . , s, ui = 0 for i = s + 1, . . . , m
and sufficient conditions in Theorem 2G.1 are applicable to this and entail having
−∇ f (0) belong to ND (0). It will be useful to let ỹ stand for (ȳ, 0, . . . , 0) ∈ IRn .
The inverse function theorem reveals that the Jacobian ∇s(0) is ∇g(x̄)−1 . We
have ∇ f (0) = ∇g0 (x̄)∇s(0) by the chain rule, and on the other hand −∇g0 (x̄) =
ỹ∇g(x̄) by (26), and therefore ∇ f (0) = −ỹ. The vectors w belonging to the set T in
(27) correspond one-to-one with the vectors z ∈ D through ∇g(x̄)w = z, and under
this, through (28), the vectors w ∈ K correspond to the vectors z ∈ D such that
⟨z, ỹ⟩ = 0, i.e., the vectors in the critical cone KD (0, ỹ) = KD (0, −∇ f (0)).
The second-order conditions in Theorem 2G.1, in the context of the transformed
version of problem (19), thus revolve around the nonnegativity or positivity of
⟨z, ∇2 f (0)z⟩ for vectors z ∈ KD (0, ỹ). It will be useful that this is the same as the
nonnegativity or positivity of ⟨z, ∇2 h(0)z⟩ for the function h(u) = L(s(u), ȳ), since
⟨z, ∇2 h(0)z⟩ = ϕ ″(0) for the function ϕ (t) = h(tz) = L(s(tz), ȳ).
Writing x(t) = s(tz) and w = x′(0), the chain rule gives ϕ ″(0) = ⟨w, ∇2xx L(x̄, ȳ)w⟩ +
⟨∇x L(x̄, ȳ), x″(0)⟩. But ∇x L(x̄, ȳ) = 0 from the first-order conditions. Thus, (29) holds,
as claimed.
The final assertion of part (b) automatically carries over from the corresponding
assertion of part (b) of Theorem 2G.1 under the local change of coordinates that we
utilized.
Exercise 2G.7. In the context of Theorem 2G.6, let ȳ be a multiplier associated with
x̄ through the first-order condition (22). Let I0 (x̄, ȳ) be the set of indices i ∈ I(x̄) such
that i ≤ s and ȳi = 0. Then an equivalent description of the cone K in the second-
order conditions is that
w ∈ K ⇐⇒ ⟨∇gi (x̄), w⟩ ≤ 0 for i ∈ I0 (x̄, ȳ) and ⟨∇gi (x̄), w⟩ = 0 for i ∈ I(x̄)\I0 (x̄, ȳ).
Guide. Utilize the fact that −∇g0 (x̄) = ȳ1 ∇g1 (x̄) + · · · + ȳm ∇gm (x̄) with ȳi ≥ 0 for
i = 1, . . . , s.
The alternative description in 2G.7 lends insights in some situations, but it makes
K appear to depend on ȳ, whereas in reality it doesn’t.
Next we take up the study of a parameterized version of the nonlinear program-
ming problem in the form
(30) minimize g0 (p, x) over all x satisfying gi (p, x) ≤ 0 for i ∈ [1, s], gi (p, x) = 0 for i ∈ [s + 1, m].
The pairs (x, y) satisfying this variational inequality are the Karush–Kuhn–Tucker
pairs for the problem specified by p in (30). The x components of such pairs might
or might not give a local minimum according to the circumstances in Theorem 2G.6
(or whether certain convexity assumptions are fulfilled), and indeed we are not im-
posing a linear independence condition on the constraint gradients in (30) of the
kind on which Theorem 2G.6 was based. But these x’s serve anyway as station-
ary points and we wish to learn more about their behavior under perturbations by
studying the Karush–Kuhn–Tucker mapping S : IRd → IRn × IRm defined by
(33) S(p) = {(x, y) | f (p, x, y) + NE (x, y) ∋ (0, 0)}.
ḡ0 (w) = L( p̄, x̄, ȳ) + ⟨∇x L( p̄, x̄, ȳ), w⟩ + (1/2)⟨w, ∇2xx L( p̄, x̄, ȳ)w⟩,
ḡi (w) = gi ( p̄, x̄) + ⟨∇x gi ( p̄, x̄), w⟩ for i = 1, . . . , m,
The auxiliary problem, depending on a tilt parameter vector v but also now an addi-
tional parameter vector u = (u1 , . . . , um ), is to
ḡ0 (w) − ⟨v, w⟩ + z1 [ḡ1 (w) + u1 ] + · · · + zm [ḡm (w) + um ] =: L̄(w, z) − ⟨v, w⟩ + ⟨z, u⟩.
(37) ∇2xx L( p̄, x̄, ȳ)w + z1 ∇x g1 ( p̄, x̄) + · · · + zm ∇x gm ( p̄, x̄) − v = 0,
with zi ≥ 0 for i ∈ I0 having ḡi (w) + ui = 0,
and zi = 0 for i ∈ I0 having ḡi (w) + ui < 0 and for i ∈ I1 .
We need to pay heed to the auxiliary solution mapping S̄ : IRn × IRm →→ IRn × IRm
defined by
(38) S̄(v, u) = {(w, z) | f̄(w, z) + NĒ (w, z) ∋ (v, u)} = {(w, z) satisfying (37)},
which has
(0, 0) ∈ S̄(0, 0).
The following subspaces will enter our analysis of the properties of the mapping S̄:
(39) M + = {w ∈ IRn | w ⊥ ∇x gi ( p̄, x̄) for all i ∈ I\I0 },
M − = {w ∈ IRn | w ⊥ ∇x gi ( p̄, x̄) for all i ∈ I}.
Theorem 2G.8 (implicit function theorem for stationary points). Let (x̄, ȳ) ∈ S( p̄)
for the mapping S in (33), constructed from functions gi that are twice continuously
differentiable. Assume for the auxiliary mapping S̄ in (38) that
(40) S̄ has a Lipschitz continuous single-valued localization s̄ around (0, 0) for (0, 0).
Then S has a Lipschitz continuous single-valued localization s around p̄ for (x̄, ȳ),
and this localization s is semidifferentiable at p̄ with semiderivative given by
(41) Ds( p̄)(q) = s̄(−Bq), where B = ∇p f ( p̄, x̄, ȳ) =
⎡ ∇2xp L( p̄, x̄, ȳ) ⎤
⎢ −∇p g1 ( p̄, x̄) ⎥
⎢ ··· ⎥
⎣ −∇p gm ( p̄, x̄) ⎦ .
Moreover the condition in (40) is necessary for the existence of a Lipschitz continu-
ous single-valued localization of S around p̄ for (x̄, ȳ) when the (n + m) × d matrix B
has rank n + m. In particular, S̄ is sure to satisfy (40) when the following conditions
are both fulfilled:
(a) the gradients ∇x gi ( p̄, x̄) for i ∈ I are linearly independent,
(b) ⟨w, ∇2xx L( p̄, x̄, ȳ)w⟩ > 0 for every nonzero w ∈ M + with ∇2xx L( p̄, x̄, ȳ)w ⊥ M − ,
with M + and M− as in (39).
On the other hand, (40) always entails at least (a).
Proof. This is obtained by applying 2E.1 with the additions in 2E.6 and 2E.8 to the
variational inequality (32). Since ∇y L(p, x, y) = g(p, x) for g(p, x) = (g1 (p, x), . . . ,
gm (p, x)), the Jacobian in question is
(42) ∇(x,y) f ( p̄, x̄, ȳ) =
⎡ ∇2xx L( p̄, x̄, ȳ) ∇x g( p̄, x̄)T ⎤
⎣ −∇x g( p̄, x̄) 0 ⎦ .
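Because the off-diagonal blocks in (42) are ∇x g( p̄, x̄)T and −∇x g( p̄, x̄), the z-terms cancel in the quadratic form ⟨(w, z), A(w, z)⟩, leaving ⟨w, Hw⟩ with H = ∇2xx L( p̄, x̄, ȳ). A small numeric check of this cancellation (our own illustration, with arbitrary small matrices standing in for H and ∇x g):

```python
# Small numeric example: H symmetric 2x2 (playing the role of the Hessian of
# the Lagrangian), G a 1x2 matrix of constraint gradients.  The KKT Jacobian
# A = [[H, G^T], [-G, 0]] then satisfies <(w,z), A(w,z)> = <w, Hw>.

H = [[4.0, 1.0],
     [1.0, 3.0]]
G = [[2.0, -1.0]]          # one constraint, so z is a scalar here

def quad_form_kkt(w, z):
    """Compute <(w,z), A(w,z)> for A = [[H, G^T], [-G, 0]] directly."""
    # top block of A(w,z): H w + G^T z ; bottom block: -G w
    top = [sum(H[i][j] * w[j] for j in range(2)) + G[0][i] * z for i in range(2)]
    bottom = -sum(G[0][j] * w[j] for j in range(2))
    return sum(w[i] * top[i] for i in range(2)) + z * bottom

def quad_form_h(w):
    """Compute <w, Hw> alone."""
    return sum(w[i] * H[i][j] * w[j] for i in range(2) for j in range(2))
```

The cancellation ⟨w, GT z⟩ − ⟨z, Gw⟩ = 0 is exactly what makes the skew off-diagonal structure of (42) harmless in the quadratic form, a fact used below in analyzing the sufficient condition from 2E.6.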
In terms of the polyhedral convex cone Y = IRs+ × IRm−s , the critical cone to the
polyhedral convex cone E is
for the polyhedral cone W in (36). By taking A to be the matrix in (42) and K to be
the cone in (43), the auxiliary mapping S̄ can be identified with (A + NK )−1 in the
framework of Theorem 2E.6. (The w and v in that result have here turned into pairs
(w, z) and (v, u).)
This leads to all the conclusions except for establishing that (40) implies (a) and
working out the details of the sufficient condition provided by Theorem 2E.6. To
verify that (40) implies (a), consider any ε > 0 and let
vε = ∑mi=1 zεi ∇x gi ( p̄, x̄), where zεi = ε for i ∈ I0 and zεi = 0 otherwise.
Then, as seen from the conditions in (37), we have (0, zε ) ∈ S̄(vε , 0). If (a) didn’t
hold, we would also have ∑mi=1 ζi ∇x gi ( p̄, x̄) = 0 for some coefficient vector ζ ≠ 0
with ζi = 0 when i ∈ I1 . Then for every δ > 0 small enough that ε + δ ζi ≥ 0 for
all i ∈ I0 , we would also have (0, zε + δ ζ ) ∈ S̄(vε , 0). Since ε and δ can be chosen
arbitrarily small, this would contradict the single-valuedness in (40). Thus, (a) is
necessary for (40).
To come next to an understanding of what the sufficient condition in 2E.6 means
here, we observe in terms of Y = IRs+ × IRm−s that
KE+ (x̄, ȳ, − f ( p̄, x̄, ȳ)) = IRn × KY+ (ȳ, g( p̄, x̄)),
KE− (x̄, ȳ, − f ( p̄, x̄, ȳ)) = IRn × KY− (ȳ, g( p̄, x̄)),
where
z ∈ KY+ (ȳ, g( p̄, x̄)) ⇐⇒ zi = 0 for i ∈ I1 ,
z ∈ KY− (ȳ, g( p̄, x̄)) ⇐⇒ zi = 0 for i ∈ I0 ∪ I1 .
In the shorthand notation
H = ∇2xx L( p̄, x̄, ȳ), K + = KE+ (x̄, ȳ, − f ( p̄, x̄, ȳ)), K − = KE− (x̄, ȳ, − f ( p̄, x̄, ȳ)),
our concern is to have ⟨(w, z), A(w, z)⟩ > 0 for every (w, z) ∈ K + with A(w, z) ⊥ K −
and (w, z) ≠ (0, 0). It’s clear from (42) that
Having ∇x g( p̄, x̄)w ⊥ KY− corresponds to having w ⊥ ∇x gi ( p̄, x̄) for all i ∈ I \ I0 ,
which means w ∈ M + . On the other hand, having Hw+ ∇x g( p̄, x̄)T z = 0 corresponds
to having Hw = −(z1 ∇x g1 ( p̄, x̄) + · · · + zm ∇x gm ( p̄, x̄)). The sufficient condition in
2E.6 boils down, therefore, to the requirement that
⟨w, Hw⟩ > 0 when (w, z) ≠ (0, 0), w ∈ M + , Hw = − ∑i∈I zi ∇x gi ( p̄, x̄).
Our final topic concerns the conditions under which the mapping S in (33)
describes perturbations not only of stationarity, but also of local minima.
Theorem 2G.9 (implicit function theorem for local minima). Suppose in the setting
of the parameterized nonlinear programming problem (30) for twice continuously
differentiable functions gi and its Karush–Kuhn–Tucker mapping S in (33) that the
following conditions hold in the notation coming from (34):
(a) the gradients ∇x gi ( p̄, x̄) for i ∈ I are linearly independent, and
(b) ⟨w, ∇2xx L( p̄, x̄, ȳ)w⟩ > 0 for every w ≠ 0 in the subspace M + in (39).
Then not only does S have a localization s with the properties laid out in Theorem
2G.8, but also, for every p in some neighborhood of p̄, the x component of s(p)
furnishes a strong local minimum in (30). Moreover, (a) and (b) are necessary for
this additional conclusion when n + m is the rank of the (n + m) × d matrix B in (41).
Proof. Sufficiency. Condition (b) here is a stronger assumption than (b) of Theorem
2G.8, so we can be sure that (a) and (b) guarantee the existence of a localization
s possessing the properties in that result. Moreover (b) implies satisfaction of the
sufficient condition for a local minimum at x̄ in Theorem 2G.6, inasmuch as the cone
K in that theorem is obviously contained in the set of w such that ⟨∇x gi ( p̄, x̄), w⟩ = 0
for all i ∈ I \ I0 . We need to demonstrate, however, that this local minimum property
persists in passing from p̄ to nearby p.
To proceed with that, denote the two components of s(p) by x(p) and y(p), and
let I(p), I0 (p) and I1 (p) be the index sets which correspond to x(p) as I, I0 and
I1 do to x̄, so that I(p) consists of the indices i ∈ {1, . . . , m} with gi (p, x(p)) = 0,
and I(p) \ I0 (p) consists of the indices i ∈ I(p) having yi (p) > 0 for inequality
constraints, but I1 (p) consists of the indices of the inequality constraints having
gi (p, x(p)) < 0. Consider the following conditions, which reduce to (a) and (b) when
p = p̄:
(a(p)) the gradients ∇x gi (p, x(p)) for i ∈ I(p) are linearly independent,
(b(p)) ⟨w, ∇2xx L(p, x(p), y(p))w⟩ > 0 for every w ≠ 0 such that w ⊥ ∇x gi (p, x(p))
for all i ∈ I(p) \ I0 (p).
Since x(p) and y(p) tend toward x( p̄) = x̄ and y( p̄) = ȳ as p → p̄, the fact that
yi (p) = 0 for i ∈ I1 (p) and the continuity of the gi ’s ensure that
Through this and the fact that ∇x gi (p, x(p)) tends toward ∇x gi ( p̄, x̄) as p goes to p̄,
we see that the linear independence in (a) entails the linear independence in (a(p))
for p near enough to p̄. Indeed, not only (a(p)) but also (b(p)) must hold, in fact in
the stronger form that there exist ε > 0 and a neighborhood Q of p̄ for which
⟨w, ∇2xx L(p, x(p), y(p))w⟩ ≥ ε for all p ∈ Q
when |w| = 1 and w ⊥ ∇x gi (p, x(p)) for all i ∈ I(p) \ I0 (p). Indeed, otherwise there
would be sequences of vectors pk → p̄ and wk → w violating this condition for
εk → 0, and this would lead to a contradiction of (b) in view of the continuous
dependence of the matrix ∇2xx L(p, x(p), y(p)) on p.
Of course, with both (a(p)) and (b(p)) holding when p is in some neighborhood
of p̄, we can conclude through Theorem 2G.6, as we did for x̄, that x(p) furnishes a
strong local minimum for problem (30) for such p, since the cone
K(p) = set of w satisfying ⟨∇x gi (p, x(p)), w⟩ ≤ 0 for i ∈ I(p) with i ≤ s, and
⟨∇x gi (p, x(p)), w⟩ = 0 for i = s + 1, . . . , m and i = 0
lies in the subspace formed by the vectors w with ⟨∇x gi (p, x(p)), w⟩ = 0 for all
i ∈ I(p) \ I0 (p).
Necessity. Suppose that S has a Lipschitz continuous single-valued localization
s around p̄ for (x̄, ȳ). We already know from Theorem 2G.8 that, under the rank
condition in question, the auxiliary mapping S̄ in (38) must have such a localization
around (0, 0) for (0, 0), and that this requires the linear independence in (a). Under
the further assumption now that x(p) gives a local minimum in problem (30) when
p is near enough to p̄, we wish to deduce that (b) must hold as well. Having a local
minimum at x(p) implies that the second-order necessary condition for optimality
in Theorem 2G.6 is satisfied with respect to the multiplier vector y(p):
(44) ⟨w, ∇2xx L(p, x(p), y(p))w⟩ ≥ 0 for all w ∈ K(p) when p is near to p̄.
We will now find a value of the parameter p close to p̄ such that (x(p), y(p)) = (x̄, ȳ)
and K(p) = M + . If I0 = ∅ there is nothing to prove. Let I0 ≠ ∅. The rank condition
on the Jacobian B = ∇p f ( p̄, x̄, ȳ) provides through Theorem 1F.6 (for k = 1)
the existence of p(v, u), depending continuously on some (v, u) in a neighborhood
of (0, −g( p̄, x̄)), such that f (p(v, u), x̄, ȳ) = (v, u), i.e., ∇x L(p(v, u), x̄, ȳ) = v and
−g(p(v, u), x̄) = u. For an arbitrarily small ε > 0, let the vector uε have uεi = −ε
for i ∈ I0 but uεi = 0 for all other i. Let pε = p(0, uε ). Then ∇x L(pε , x̄, ȳ) = 0 with
gi (pε , x̄) = 0 for i ∈ I \ I0 but gi (pε , x̄) < 0 for i ∈ I0 as well as for i ∈ I1 . Thus,
I(pε ) = I \ I0 , I0 (pε ) = ∅, I1 (pε ) = I0 ∪ I1 , and (x̄, ȳ) ∈ S(pε ); moreover, (x̄, ȳ)
furnishes a local minimum in (30) for p = pε , with K(pε ) coming out to be the subspace
M + (pε ) = {w | w ⊥ ∇x gi (pε , x̄) for all i ∈ I \ I0 }.
The necessary condition (44) for p = pε therefore gives ⟨w, ∇2xx L(pε , x̄, ȳ)w⟩ ≥ 0 for
all w ∈ M + (pε ), whereas we are asking in (b) for this to hold with strict inequality
for w ≠ 0 in the case of M + = M + ( p̄).
We know that pε → p̄ as ε → 0. Owing to (a) and the continuity of the functions
gi and their derivatives, the gradients ∇x gi (pε , x̄) for i ∈ I must be linearly inde-
pendent when ε is sufficiently small. It follows from this that any w in M + can be
approximated as ε → 0 by vectors wε belonging to the subspaces M + (pε ). In the
limit therefore, we have at least that
(45) ⟨w, ∇2xx L( p̄, x̄, ȳ)w⟩ ≥ 0 for all w ∈ M + .
How are we to conclude strict inequality when w ≠ 0? It’s important that the matrix
H = ∇2xx L( p̄, x̄, ȳ) is symmetric. In line with the positive semidefiniteness in (45),
any w̄ ∈ M + with ⟨w̄, H w̄⟩ = 0 must have H w̄ ⊥ M + . But then in particular, the
auxiliary solution mapping S̄ in (38) would have (t w̄, 0) ∈ S̄(0, 0) for all t ≥ 0, in
contradiction to the fact, coming from Theorem 2G.8, that S̄(0, 0) contains only
(0, 0) in the current circumstances.
Commentary
The basic facts about convexity, polyhedral sets, and tangent and normal cones
given in Section 2A are taken mainly from Rockafellar [1970]. Robinson’s implicit
function theorem was stated and proved in Robinson [1980], where the author was
clearly motivated by the problem of how the solutions of the standard nonlinear
programming problem depend on parameters, and he pursued this goal in the same
paper.
At that time it was already known from the work of Fiacco and McCormick
[1968] that under the linear independence of the constraint gradients and the stan-
dard second-order sufficient condition, together with strict complementarity slack-
ness at the reference point (which means that there are no inequality constraints
satisfied as equalities that are associated with zero Lagrange multipliers), the solu-
tion mapping for the standard nonlinear programming problem has a smooth single-
valued localization around the reference point. The proof of this result was based on
the classical implicit function theorem, inasmuch as under strict complementarity
slackness the Karush–Kuhn–Tucker system turns into a system of equations locally.
Robinson looked at the case when the strict complementarity slackness is violated,
which may happen, as already noted in 2B, when the “stationary point trajectory”
hits the constraints. Based on his implicit function theorem, which actually reached
far beyond his immediate goal, Robinson proved, still in his paper from 1980, that
under a stronger form of the second-order sufficient condition, together with linear
independence of the constraint gradients, the solution mapping of the standard non-
linear programming problem has a Lipschitz continuous single-valued localization
around the reference point; see Theorem 2G.9 for an updated statement.
This result was a stepping stone to the subsequent extensive development of sta-
bility analysis in optimization, whose maturity came with the publication of the
books Bank, Guddat, Klatte, Kummer and Tammer [1983], Levitin [1992], Bonnans
and Shapiro [2000], Klatte and Kummer [2002] and Facchinei and Pang [2003].
Robinson’s breakthrough in the stability analysis of nonlinear programming was
in fact much needed for the emerging numerical analysis of variational problems
more generally. In his paper from 1980, Robinson noted the thesis of his Ph.D.
student Josephy [1979], who proved that strong regularity yields local quadratic
convergence of Newton’s method for solving variational inequalities, a method
whose version for constrained optimization problems is well known as the sequen-
tial quadratic programming (SQP) method in nonlinear programming.
Quite a few years after Robinson’s theorem was published, it was realized that
the result could be used as a tool in the analysis of a variety of variational prob-
lems, and beyond. Alt [1990] applied it to optimal control, while in Dontchev and
Hager [1993], and further in Dontchev [1995b], the statement of Robinson’s theo-
rem was observed actually to hold for generalized equations of the form 2B(1) for
an arbitrary mapping F, not just a normal cone mapping. Variational inequalities
thus serve as an example, not a limitation. Important applications, e.g. to conver-
gence analysis of algorithms and discrete approximations to infinite-dimensional
variational problems, came later. In the explosion of works in this area in the 80’s
and 90’s Robinson’s contribution, if not forgotten, was sometimes taken for granted.
More about this will come out in Chapter 6.
The presentation of the material in Section 2B mainly follows Dontchev and
Rockafellar [2009a], while that in Section 2C comes from Dontchev and Rockafellar
[2001]. In Section 2D, we used some facts from the books of Facchinei and Pang
[2003] (in particular, 2D.5) and Scholtes [1994]. Theorem 2E.6 is a particular case
of a result in Dontchev and Rockafellar [1996].
Sections 2F and 2G give an introduction to the theory of monotone mappings
which for its application to optimization problems goes back to Rockafellar [1976a]
and [1976b]. Much more about this kind of monotonicity and its earlier history can
be found in Chapter 12 of the book of Rockafellar and Wets [1998]. The stability
analysis in 2G uses material from Dontchev and Rockafellar [1996,1998], but some
versions of these results could be extracted from earlier works.
Chapter 3
Regularity Properties of Set-valued Solution
Mappings
fi (p, x) = 0 for i = 1, . . . , m,
In studying the behavior of the corresponding solution mapping S : IRd →→ IRn given
by
(3) S(p) = {x | x satisfies (2)} (covering (1) as a special case),
we are therefore still, very naturally, in the realm of the “extended implicit function
theory” we have been working to build up.
Here, F is not a normal cone mapping NC , so we are not dealing with a variational
inequality. The results in Chapter 2 for solution mappings to parameterized gener-
alized equations would anyway, in principle, be applicable, but in this framework
they miss the mark. The trouble is that those results focus on the prospects of find-
ing single-valued localizations of a solution mapping, especially ones that exhibit
Lipschitz continuity. For a solution mapping S as in (3), coming from a general-
ized equation as in (2), single-valued localizations are unlikely to exist at all (apart
from the pure equation case with m = n) and aren’t even a topic of serious concern.
Rather, we are confronted with a “varying set” S(p) which cannot be reduced locally
to a “varying point.” That could be the case even if, in (2), F(x) is not a constant set
but a sort of continuously moving or deforming set. What we are looking for is not
so much a generalized implicit function theorem, but an implicit mapping theorem,
the distinction being that “mappings” truly embrace set-valuedness.
To understand the behavior of such a solution mapping S, whether qualitatively
or quantitatively, we have to turn to other concepts, beyond those in Chapter 2.
Our immediate task, in Sections 3A, 3B, 3C and 3D, is to introduce the notions
of Painlevé–Kuratowski convergence and Pompeiu–Hausdorff convergence for se-
quences of sets, and to utilize them in developing properties of continuity and
Lipschitz continuity for set-valued mappings. In tandem with this, we gain impor-
tant insights into the solution mappings (3) associated with constraint systems as in
(1) and (2), especially for cases where f is affine. We also obtain by-products con-
cerning the behavior of various mappings associated with problems of optimization.
In Section 3E, however, we open a broader investigation in which the Aubin
property, serving as a sort of localized counterpart to Lipschitz continuity for set-
valued mappings, is tied to the concept of metric regularity, which directly relates
to estimates of distances to solutions. The natural context for this is the study of
how properties of a set-valued mapping correspond to properties of its set-valued
inverse, or in other words, the paradigm of the inverse function theorem. We are
An alternative description of these values is that lim supk→∞ rk is the highest r for
which there exists N ∈ N such that rk →N r, whereas lim infk→∞ rk is the lowest such
r. The limit itself exists if and only if these upper and lower limits coincide. For
simplicity, we often just write lim supk , lim infk and limk , with the understanding
that this refers to k → ∞.
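These upper and lower limits can be illustrated numerically through their tail characterizations lim supk rk = limk supj≥k rj and lim infk rk = limk infj≥k rj (a sketch of ours, not from the text):

```python
# Illustration: r_k = (-1)^k + 1/k has lim sup = 1 and lim inf = -1, each
# attained along an index set N (the even / odd indices), while the plain
# limit does not exist.

def tail_sup(r, k):
    """sup of r_j over the stored tail j >= k; decreases toward lim sup."""
    return max(r[k:])

def tail_inf(r, k):
    """inf of r_j over the stored tail j >= k; increases toward lim inf."""
    return min(r[k:])

r = [(-1) ** k + 1.0 / k for k in range(1, 2001)]   # r[0] corresponds to k = 1
approx_limsup = tail_sup(r, 1000)
approx_liminf = tail_inf(r, 1000)
```

With 2000 terms, the tail quantities are already within about 10−3 of the true values 1 and −1.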
In working with sequences of sets, a similar pattern is encountered in which
“outer” and “inner” limits always exist and give a “limit” when they agree.
(b) The inner limit of this sequence, denoted by lim infk Ck , is the set of all x ∈
IRn for which
there exist N ∈ N and xk ∈ Ck for k ∈ N such that xk →N x.
(c) When the inner and outer limits are the same set C, this set is defined to be
the limit of the sequence {Ck }∞k=1 :
(1b) lim infk→∞ Ck = {x | ∀ neighborhood V of x, ∃ N ∈ N , ∀ k ∈ N : Ck ∩V ≠ ∅}.
(2b) lim infk→∞ Ck = {x | ∀ ε > 0, ∃ N ∈ N : x ∈ Ck + ε IB (k ∈ N)}.
Both the outer and inner limits of a sequence {Ck }k∈IN are closed sets. Indeed, if
x ∉ lim supk Ck , then, from (2a), there exists ε > 0 such that for every N ∈ N we
have x ∉ Ck + ε IB, that is, IB(x, ε ) ∩ Ck = ∅, for some k ∈ N. But then a neighborhood
of x can meet Ck for finitely many k only. Hence no points in this neighborhood can
be cluster points of sequences {xk } with xk ∈ Ck for infinitely many k. This implies
that the complement of lim supk Ck is an open set and therefore that lim supk Ck is
closed. An analogous argument works for lim infk Ck (this could also be derived by
the following Proposition 3A.1).
Recall from Section 1D that the distance from a point x ∈ IRn to a subset C of IRn
is d(x,C) = infy∈C |x − y|.
(3b) lim infk→∞ Ck = {x | limk→∞ d(x,Ck ) = 0}.
Proof. If x ∈ lim supk Ck then, by (2a), for any ε > 0 there exists N ∈ N such
that d(x,Ck ) ≤ ε for all k ∈ N. But then, by the definition of the lower limit for
a sequence of real numbers, as recalled in the beginning of this section, we have
lim infk→∞ d(x,Ck ) = 0. The left side of (3a) is therefore contained in the right side.
Conversely, if x is in the set on the right side of (3a), then there exist N ∈ N and
xk ∈ Ck for all k ∈ N such that xk →N x; then, by definition, x must belong to the left
side of (3a).
If x is not in the set on the right side of (3b), then there exist ε > 0 and N ∈ N
such that d(x,Ck ) > ε for all k ∈ N. Then x ∉ Ck + ε IB for all k ∈ N and hence by
(2b) x is not in lim infk Ck . In a similar way, from (2b) we obtain that x ∉ lim infk Ck
only if lim supk d(x,Ck ) > 0. This gives us (3b).
Observe that the distance to a set does not distinguish whether this set is closed
or not. Therefore, in the context of convergence, there is no difference whether the
sets in a sequence are closed or not. (But limits of all types are closed sets.)
More examples.
1) The limit of the sequence of intervals [k, ∞) as k → ∞ is the empty set, whereas
the limit of the sequence of intervals [1/k, ∞) is [0, ∞).
2) More generally for monotone sequences of subsets Ck ⊂ IRn , if Ck ⊃ Ck+1 for
all k ∈ IN, then limk Ck = ∩k cl Ck , whereas if Ck ⊂ Ck+1 for all k, then limk Ck =
cl ∪kCk .
3) The constant sequence Ck = D, where D is the set of vectors in IRn whose
coordinates are rational numbers, converges not to D, which isn’t closed, but to the
closure of D, which is IRn . More generally, if Ck = C for all k, then limk Ck = cl C.
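Example 1) above can also be tried through distances, in the spirit of the characterization (3b); the following sketch (ours, not from the text) checks that d(x, Ck) → d(x, C) for Ck = [1/k, ∞) and C = [0, ∞):

```python
# Sets C_k = [1/k, oo) converging to C = [0, oo), tested through distances,
# in the spirit of lim_k d(x, C_k) = d(x, C).

def dist_halfline(x, a):
    """Distance from the point x to the half-line [a, oo)."""
    return max(a - x, 0.0)

def d_Ck(x, k):
    return dist_halfline(x, 1.0 / k)

def d_C(x):
    return dist_halfline(x, 0.0)

# at x = 0:  d(0, C_k) = 1/k -> 0 = d(0, C)
# at x = -1: d(-1, C_k) = 1 + 1/k -> 1 = d(-1, C)
# at x > 0 inside the sets: both distances are 0 for large k
```

Note that 0 lies in the limit [0, ∞) even though 0 ∉ Ck for every k: membership in the limit requires only d(x, Ck) → 0, not eventual membership in the Ck themselves.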
(c) C ⊂ lim infk Ck if and only if for every ρ > 0 and ε > 0 there is an index set
N ∈ N such that C ∩ ρ IB ⊂ Ck + ε IB for all k ∈ N ;
(d) C ⊃ lim supk Ck if and only if for every ρ > 0 and ε > 0 there is an index set
N ∈ N such that Ck ∩ ρ IB ⊂ C + ε IB for all k ∈ N ;
(e) C ⊂ lim infk Ck if and only if lim supk d(x,Ck ) ≤ d(x,C) for every x ∈ IRn ;
(f) C ⊃ lim supk Ck if and only if d(x,C) ≤ lim infk d(x,Ck ) for every x ∈ IRn .
Thus, from (c) and (d), C = limk Ck if and only if for every ρ > 0 and ε > 0 there is an
index set N ∈ N such that
C ∩ ρ IB ⊂ Ck + ε IB and Ck ∩ ρ IB ⊂ C + ε IB for all k ∈ N.
Also, from (e) and (f), C = limk Ck if and only if limk d(x,Ck ) = d(x,C) for every x ∈ IRn .
Proof. (a): Necessity comes directly from (1b). To show sufficiency, assume that
there exists x ∈ C \ lim infk Ck . But then, by (1b), there exists an open neighborhood
V of x such that for every N ∈ N there exists k ∈ N with V ∩ Ck = ∅ and also
V ∩ C ≠ ∅. This is the negation of the condition on the right.
(b): Let C ⊃ lim supk Ck and let there exist a compact set B with C ∩ B = ∅, such
that for any N ∈ N one has Ck ∩ B ≠ ∅ for some k ∈ N. But then there exist N ∈ N
and a convergent sequence xk ∈ Ck for k ∈ N whose limit is not in C, a contradiction.
Conversely, if there exists x ∈ lim supk Ck which is not in C then, from (2a), a ball
IBε (x) with sufficiently small radius ε does not meet C yet meets Ck for infinitely
many k; this contradicts the condition on the right.
Sufficiency in (c): Consider any point x ∈ C, and any ρ > |x|. For an arbitrary
ε > 0, there exists, by assumption, an index set N ∈ N such that C ∩ ρ IB ⊂ Ck + ε IB
for all k ∈ N. Then x ∈ Ck + ε IB for all k ∈ N. By (2b), this yields x ∈ lim infk Ck .
Hence, C ⊂ lim infk Ck .
Necessity of (c): It will be demonstrated that if the condition fails, there must be
a point x̄ ∈ C lying outside of lim infk Ck . To say that the condition fails is to say
that there exist ρ > 0 and ε > 0, such that, for each N ∈ N , the inclusion C ∩ ρ IB ⊂
Ck + ε IB is false for at least one k ∈ N. Then there is an index set N0 ∈ N such that
this inclusion is false for all k ∈ N0 ; there are points xk ∈ [C ∩ ρ IB] \ [Ck + ε IB] for all
k ∈ N0 . Such points form a bounded sequence in the closed set C with the property
that d(xk ,Ck ) ≥ ε . A subsequence {xk }k∈N1 , for an index set N1 ∈ N within N0 ,
converges in that case to a point x̄ ∈ C. Since d(xk ,Ck ) ≤ d(x̄,Ck ) + |x̄− xk |, we must
have
d(x̄,Ck ) ≥ ε /2 for all k ∈ N1 large enough.
It is impossible then for x̄ to belong to lim infk Ck , because that requires d(x̄,Ck ) to
converge to 0, cf. (3b).
Sufficiency in (d): Let x̄ ∈ lim supk Ck ; then for some N0 ∈ N there are points
N
xk ∈ Ck such that xk →0 x̄. Fix any ρ > |x̄|, so that xk ∈ ρ IB for k ∈ N0 large enough.
By assumption, there exists for any ε > 0 an index set N ∈ N such that Ck ∩ ρ IB ⊂
C + ε IB when k ∈ N. Then for large enough k ∈ N0 ∩ N we have xk ∈ C + ε IB, hence
N
d(xk ,C) ≤ ε . Because d(x̄,C) ≤ d(xk ,C) + |xk − x̄| and xk →0 x̄, it follows from the
arbitrary choice of ε that d(x̄,C) = 0, which means x̄ ∈ C (since C is closed).
138 3 Regularity Properties of Set-valued Solution Mappings
Necessity in (d): Suppose to the contrary that one can find ρ > 0, ε > 0 and
N ∈ N such that, for all k ∈ N, there exists xk ∈ [Ck ∩ ρ IB] \ [C + ε IB]. The sequence
{xk }k∈N is then bounded, so it has a cluster point x̄ which, by definition, belongs
to lim supk Ck . On the other hand, since each xk lies outside of C + ε IB, we have
d(xk ,C) ≥ ε and, in the limit, d(x̄,C) ≥ ε . Hence x̄ ∉ C, and therefore lim supk Ck is
not a subset of C.
(e): Sufficiency follows from (3b) by taking x ∈ C. To prove necessity, choose
x ∈ IRn and let y ∈ C be a projection of x on C: |x − y| = d(x,C). By the definition of
lim inf there exist N ∈ N and yk ∈ Ck , k ∈ N, such that yk → y as k ∈ N. For such yk
we have d(x,Ck ) ≤ |yk − x|, k ∈ N, and passing to the limit with k → ∞ we get the
condition on the right.
(f): Sufficiency follows from (3a) by taking x ∈ lim supk Ck . Choose x ∈ IRn . If
x ∈ C there is nothing to prove. If not, note that for any nonnegative α the condition
d(x,C) > α is equivalent to C ∩ IBα (x) = ∅. But then from (b) there exists N ∈ N
with Ck ∩ IBα (x) = ∅ for k ∈ N, which is the same as d(x,Ck ) > α for k ∈ N. This
implies the condition on the right.
Observe that in parts (c) and (d) of 3A.2 we can replace the phrase “for every ρ ” by
“there is some ρ0 ≥ 0 such that for every ρ ≥ ρ0 ”.
Set convergence can also be characterized in terms of concepts of distance be-
tween sets.
Excess and Pompeiu–Hausdorff distance. For sets C and D in IRn , the excess of
C beyond D is defined by
e(C, D) = sup_{x ∈ C} d(x, D),
and the Pompeiu–Hausdorff distance between C and D is
h(C, D) = inf { τ ≥ 0 | C ⊂ D + τ IB, D ⊂ C + τ IB }.
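These quantities are directly computable for finite sets. The following Python sketch (an illustration added here, with made-up point sets, not part of the text) evaluates d(x, D), e(C, D) and h(C, D) = max{e(C, D), e(D,C)} under the Euclidean norm, and exhibits the asymmetry of the excess.

```python
# Numerical sketch: excess and Pompeiu-Hausdorff distance for finite
# subsets of IR^2, following the definitions above (Euclidean norm).
import math

def dist(x, S):
    """d(x, S) = inf over u in S of |x - u|, for a finite set S."""
    return min(math.dist(x, u) for u in S)

def excess(C, D):
    """e(C, D) = sup over x in C of d(x, D)."""
    return max(dist(x, D) for x in C)

def pompeiu_hausdorff(C, D):
    """h(C, D) = max{e(C, D), e(D, C)}."""
    return max(excess(C, D), excess(D, C))

# made-up data: the excess is not symmetric, h is
C = [(0.0, 0.0), (1.0, 0.0)]
D = [(0.0, 0.0), (3.0, 0.0)]
print(excess(C, D))             # 1.0
print(excess(D, C))             # 2.0
print(pompeiu_hausdorff(C, D))  # 2.0
```

Note how the two one-sided excesses differ while h(C, D) picks out the larger of them, in line with the estimate |d(x,C) − d(x, D)| ≤ h(C, D) proved below.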
The excess and the Pompeiu–Hausdorff distance are illustrated in Fig. 3.1. They
are unaffected by whether C and D are closed or not, but in the case of closed sets the
infima in the alternative formulas are attained. Note that both e(C, D) and h(C, D)
can sometimes be ∞ when unbounded sets are involved. For that reason in particular,
the Pompeiu–Hausdorff distance does not furnish a metric on the space of nonempty
closed subsets of IRn , although it does on the space of nonempty closed subsets of a
bounded set.
Fig. 3.1. The excess e(C, D) and the Pompeiu–Hausdorff distance h(C, D) = e(D,C).
Proof. Since the distance to a set doesn’t distinguish whether the set is closed or
not, we may assume that C and D are nonempty closed sets.
According to 1D.4, for any x ∈ IRn we can pick u ∈ C such that d(x, u) = d(x,C).
For any v ∈ D, the triangle inequality tells us that d(x, v) ≤ d(x, u) + d(u, v). Taking
the infimum on both sides with respect to v ∈ D, we see that d(x, D) ≤ d(x, u) +
d(u, D), where d(u, D) ≤ e(C, D). Therefore, d(x, D) − d(x,C) ≤ e(C, D), and by
symmetry in exchanging the roles of C and D, also d(x,C) − d(x, D) ≤ e(D,C), so
that
|d(x,C) − d(x, D)| ≤ max{e(C, D), e(D,C)} = h(C, D).
Hence “≥” holds in (4).
On the other hand, since d(x,C) = 0 when x ∈ C, we have e(C, D) = sup_{x∈C} [d(x, D) − d(x,C)] ≤
sup_{x∈IRn} |d(x,C) − d(x, D)|, and likewise for e(D,C); hence “≤” holds in (4) as well.
When the sets Ck and C are all contained in a ball ρ IB, the inclusions in (5) can be written without truncation:
Ck = Ck ∩ ρ IB ⊂ C + ε IB and C = C ∩ ρ IB ⊂ Ck + ε IB,
while conversely
Ck ∩ ρ IB ⊂ Ck ⊂ C + ε IB and C ∩ ρ IB ⊂ C ⊂ Ck + ε IB for k ∈ IN,
and hence (5) holds and we have convergence of Ck to C with respect to Pompeiu–
Hausdorff distance.
(a) for every open set O ⊂ IRn with C ∩ O ≠ ∅ there exists N ∈ N such that
Ck ∩ O ≠ ∅ for all k ∈ N ;
(b) for every open set O ⊂ IRn with C ⊂ O there exists N ∈ N such that Ck ⊂ O
for all k ∈ N .
Moreover, condition (a) is always necessary for Pompeiu–Hausdorff conver-
gence, while (b) is necessary when the set C is bounded.
Proof. Let (a) and (b) hold and let the first inclusion in (5) be violated, that is, there exist
x ∈ C, a scalar ε > 0 and an index set N ∈ N such that x ∉ Ck + ε IB for k ∈ N. Then
an open neighborhood of x does not meet Ck for infinitely many k; this contradicts
condition (a). Furthermore, (b) implies that for any ε > 0 there exists N ∈ N such
that Ck ⊂ C + ε IB for all k ∈ N, which is the second inclusion in (5). Then Pompeiu–
Hausdorff convergence follows from (5).
According to 3A.2(a), condition (a) is equivalent to C ⊂ lim infk Ck , and hence
it is necessary for Painlevé–Kuratowski convergence, and then also for Pompeiu–
Hausdorff convergence. To show necessity of (b), let C ⊂ O for some open set O ⊂
IRn . For k ∈ IN let there exist points xk ∈ C and yk in the complement of O such that
|xk − yk | → 0 as k → ∞. Since C is compact, there exist N ∈ N and x ∈ C such
that xk → x as k ∈ N, hence yk → x as well. But then x must also be in the complement of
O, which is impossible. The contradiction so obtained shows there is an ε > 0 such
that C + ε IB ⊂ O; then, from (5), for some N ∈ N we have Ck ⊂ O for k ∈ N.
Examples 3A.7 (unboundedness issues). As an illustration of the troubles that may
occur when we deal with unbounded sets, consider first the sequence of bounded
sets Ck ⊂ IR2 in which Ck is the segment having one end at the origin and the other
at the point (cos(1/k), sin(1/k)); that is,
Ck = { x ∈ IR2 | x1 = t cos(1/k), x2 = t sin(1/k), 0 ≤ t ≤ 1 }.
Both the Painlevé–Kuratowski and Pompeiu–Hausdorff limits exist and are equal to
the segment having one end at the origin and the other at the point (1, 0). Also, both
conditions (a) and (b) in 3A.6 are satisfied.
Let us now modify this example by taking as Ck , instead of a segment, the whole
unbounded ray with its end at the origin. That is,
Ck = { x ∈ IR2 | x1 = t cos(1/k), x2 = t sin(1/k), t ≥ 0 }.
The Painlevé–Kuratowski limit is the ray { x ∈ IR2 | x1 ≥ 0, x2 = 0 }, whereas the
Pompeiu–Hausdorff limit fails to exist. In this case condition (a) in 3A.6 holds,
whereas (b) is violated.
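The failure of Pompeiu–Hausdorff convergence for the rotating rays can also be seen numerically. In the sketch below (our own illustration, not from the text), a point t(cos(1/k), sin(1/k)) on Ck has distance t sin(1/k) to the limit ray, so the excess over any bounded window ρ IB shrinks to 0, while the excess over the whole unbounded ray is infinite for every fixed k.

```python
# Rotating rays C_k = { t(cos(1/k), sin(1/k)) : t >= 0 } versus the limit
# ray C = { (t, 0) : t >= 0 }: a point at parameter t has distance
# t*sin(1/k) to C, so truncation to rho*IB matters.
import math

def excess_on_window(k, rho, steps=1000):
    # e(C_k intersected with rho*IB, C), sampled along the ray
    return max((rho * i / steps) * math.sin(1.0 / k)
               for i in range(steps + 1))

for k in (1, 10, 100):
    print(k, excess_on_window(k, rho=1.0))  # tends to 0 as k grows

# Without truncation, sup over t of t*sin(1/k) is infinite for each k,
# so h(C_k, C) = infinity: Pompeiu-Hausdorff convergence fails even
# though Painleve-Kuratowski convergence holds.
```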
As another example demonstrating issues with unboundedness, consider the se-
quence of sets
Ck = { x ∈ IR2 | x1 > 0, x2 ≥ 1/x1 − 1/k },
and
lim inf_{y→ȳ} S(y) = ∩_{yk →ȳ} lim inf_{k→∞} S(yk )
= { x | ∀ yk → ȳ, ∃ N ∈ N and xk → x as k ∈ N with xk ∈ S(yk ) }.
In other words, the limsup is the set of all possible limits of sequences xk ∈ S(yk )
when yk → ȳ, while the liminf is the set of points x for which there exists a sequence
xk ∈ S(yk ) when yk → ȳ such that xk → x.
These terms are invoked relative to a subset D in IRm when the properties hold
for limits taken with y → ȳ in D (but not necessarily for limits y → ȳ without this
restriction). Continuity is taken to refer to Painlevé–Kuratowski continuity, unless
otherwise specified.
For single-valued mappings both definitions of continuity reduce to the usual
definition of continuity of a function. Note that when S is isc at ȳ relative to D then
there must exist a neighborhood V of ȳ such that D ∩ V ⊂ dom S. When D = IRm ,
this means ȳ ∈ int (dom S).
Proof. Necessity in (a): Suppose that there exists x ∉ S(ȳ) such that for any neigh-
borhood U of x and neighborhood V of ȳ we have S(y) ∩U ≠ ∅ for some y ∈ V ∩ D.
But then there exist a sequence yk → ȳ with yk ∈ D and points xk ∈ S(yk ) such that xk → x.
This implies that x ∈ lim supk S(yk ), hence x ∈ S(ȳ) since S is osc, a contradiction.
Sufficiency in (a): Let x ∉ S(ȳ). Then there exists ρ > 0 such that S(ȳ) ∩ IBρ (x) = ∅;
the condition in the second half of (a) then gives a neighborhood V of ȳ such that,
for every N ∈ N , every sequence yk → ȳ as k ∈ N with yk ∈ D ∩V has S(yk ) ∩ IBρ (x) = ∅.
But in that case d(x, S(yk )) > ρ /2 for all large k, which implies, by Proposition 3A.1
and the definition of limsup, that x ∉ lim supy→ȳ S(y). This means that S is osc at ȳ.
Necessity in (b): Suppose that there exists x ∈ S(ȳ) such that for some neighbor-
hood U of x and any neighborhood V of ȳ we have S(y) ∩U = ∅ for some y ∈ V ∩ D.
Then there is a sequence yk convergent to ȳ in D such that for every sequence xk → x
one has xk ∉ S(yk ) for infinitely many k. This means that x ∉ lim infy→ȳ S(y). But then S is not isc at ȳ.
Sufficiency in (b): If S is not isc at ȳ relative to D, then, according to 3A.2(a),
there exist an infinite sequence yk → ȳ in D, a point x ∈ S(ȳ) and an open neighbor-
hood U of x such that S(yk ) ∩ U = ∅ for infinitely many k. But then there exists a
neighborhood V of ȳ such that D ∩V is not contained in S−1 (U), which is the opposite of (b).
(c): S has closed graph if and only if for any (y, x) ∉ gph S there exist open neigh-
borhoods V of y and U of x such that V ∩ S−1 (U) = ∅. From (a), this comes down
to S being osc at every y ∈ dom S.
(d): Every sequence in a compact set B has a convergent subsequence, and on
the other hand, a set consisting of a convergent sequence and its limit is a compact
set. Therefore the condition in the second part of (d) is equivalent to the condition
that if xk → x̄, yk ∈ S−1 (xk ) and yk → ȳ with yk ∈ D, one has ȳ ∈ S−1 (x̄). But this is
precisely the condition for S to be osc relative to D.
(e): Failure of the condition in (e) means the existence of an open set O and a
sequence yk → ȳ in D such that ȳ ∈ S−1 (O) but yk ∉ S−1 (O); that is, S(ȳ) ∩ O ≠ ∅
yet S(yk ) ∩ O = ∅ for all k. This last property says that lim infk S(yk ) does not include
S(ȳ), by 3A.2(a). Hence the condition in (e) fails precisely when S is not isc.
The equivalences in (f) and (g) follow from 3A.2(e) and 3A.2(f).
Assume that each fi is a continuous real-valued function on IRd × IRn . Then S is osc
at any point of its domain. If moreover each fi is convex in x for each p and p̄ is
such that there exists x̄ with fi ( p̄, x̄) < 0 for each i = 1, . . . , m, then S is continuous
at p̄.
Detail. The graph of S is the intersection of the sets { (p, x) | fi (p, x) ≤ 0 }, which
are closed by the continuity of fi . Then gph S is closed, and the osc property comes
from Theorem 3B.2(c). The isc part will follow from a much more general result
(the Robinson–Ursescu theorem) which we present in Chapter 5.
Here f 0 is the objective function and Sfeas is the feasible set mapping (with Sfeas (p)
taken to be the empty set when p ∈ / P). In particular, Sfeas could be specified by
constraints in the manner of Example 3B.4, but we now allow it to be more general.
Our attention is focused now on two other mappings in this situation: the optimal
value mapping acting from IRd to IR and defined by
Sval : p → inf_x { f0 (p, x) | x ∈ Sfeas (p) } when the inf is finite,
and the optimal set mapping acting from P to IRn and defined by
Sopt : p → { x ∈ Sfeas (p) | f0 (p, x) = Sval (p) }.
This gives us
Then there exist ε > 0 and sequences pk → p̄ in P and xk ∈ Sfeas (pk ), k ∈ IN, such
that
From (a) we see that d(xk , Sfeas ( p̄)) → 0 as k → ∞. This provides the existence of a
sequence of points x̄k ∈ Sfeas ( p̄) such that |xk − x̄k | → 0 as k → ∞. Because Sfeas ( p̄)
is compact, there must be some x̄ ∈ Sfeas ( p̄) along with an index set N ∈ N such
that x̄k → x̄ as k ∈ N, in which case xk → x̄ as well. Then, from the continuity of f0 at ( p̄, x̄),
we have f0 ( p̄, x̄) ≤ f0 (pk , xk ) + ε for k ∈ N sufficiently large, which, together
with (3), implies for such k that
which, combined with (1), gives us the continuity of the optimal value mapping Sval
at p̄ relative to P.
Example 3B.6 (minimization over a fixed set). Let X be a nonempty, compact sub-
set of IRn and let f0 be a continuous function from P× X to IR, where P is a nonempty
subset of IRd . For each p ∈ P, let
Sval (p) = min_{x∈X} f0 (p, x), Sopt (p) = argmin_{x∈X} f0 (p, x).
Then the function Sval : P → IR is continuous relative to P, and the mapping
Sopt : P →→ IRn is osc relative to P.
Detail. This exploits the case of Theorem 3B.5 where Sfeas is the constant mapping
p → X.
Example 3B.7 (Painlevé–Kuratowski continuity of the feasible set does not imply
continuity of the optimal value). Consider the minimization of f0 (x1 , x2 ) = ex1 + x22
on the set
Sfeas (p) = { (x1 , x2 ) ∈ IR2 | −x1 /(1 + x1²) − p ≤ x2 ≤ x1 /(1 + x1²) + p }.
For parameter value p = 0, the optimal value Sval (0) = 1 is attained at Sopt (0) =
{(0, 0)}, but for p > 0 the asymptotics of the function x1 /(1 + x1²) open up a “phantom”
portion of the feasible set along the negative x1 -axis, and the optimal value is 0, see
Fig. 3.2. The feasible set nonetheless does depend continuously on p in [0, ∞) in the
Painlevé–Kuratowski sense.
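A rough grid search (our own sketch, not from the text; the grid bounds and resolution are made-up choices) makes the jump in the optimal value visible: for p = 0 the value is near 1, while for any p > 0 the “phantom” portion along the negative x1 -axis drives the infimum toward 0.

```python
# Grid-search approximation of Sval(p) for Example 3B.7:
# minimize f0(x1,x2) = exp(x1) + x2^2 over Sfeas(p).
import math

def approx_val(p, x1_lo=-50.0, x1_hi=5.0, n=2001):
    best = float("inf")
    for i in range(n):
        x1 = x1_lo + (x1_hi - x1_lo) * i / (n - 1)
        c = x1 / (1.0 + x1 * x1)
        lo, hi = -c - p, c + p          # feasible band for x2 at this x1
        if lo > hi:
            continue                    # empty slice of the feasible set
        x2 = min(max(0.0, lo), hi)      # best feasible x2 is closest to 0
        best = min(best, math.exp(x1) + x2 * x2)
    return best

print(approx_val(0.0))   # near 1: optimum at the origin
print(approx_val(0.1))   # tiny: the phantom part along negative x1
```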
(3) d(x, S(y)) ≤ κ d(y, S−1 (x) ∩ D) for all x ∈ IRn and y ∈ D.
Since the left side of this inequality does not depend on ε , passing to zero with ε we
conclude that (3) holds with the κ of (1).
Conversely, let (3) hold, let y, y′ ∈ D ⊂ dom S and let x ∈ S(y). Then
d(x, S(y′ )) ≤ κ d(y′ , S−1 (x) ∩ D) ≤ κ |y′ − y|,
since y ∈ S−1 (x) ∩ D. Taking the supremum with respect to x ∈ S(y), we obtain
e(S(y), S(y′ )) ≤ κ |y′ − y| and, by symmetry, we get (1).
For the inverse mapping F = S−1 the property described in (3) can be written as
d(x, F −1 (y)) ≤ κ d(y, F(x) ∩ D) for all x ∈ IRn and y ∈ D,
and when gph F is closed this can be interpreted in the following manner. Whenever
we pick a y ∈ D and an x ∈ dom F, the distance from x to the set of solutions u of
the inclusion y ∈ F(u) is proportional to d(y, F(x) ∩ D), which measures the extent
to which x itself fails to solve this inclusion. In Section 3E we will introduce a local
version of this property which plays a major role in variational analysis and is known
as “metric regularity.”
The difficulty with the concept of Lipschitz continuity for set-valued map-
pings S with values S(y) that may be unbounded comes from the fact that usually
h(C1 ,C2 ) = ∞ when C1 or C2 is unbounded, the only exceptions being cases where
both C1 and C2 are unbounded and “the unboundedness points in the same direc-
tion.” For instance, when C1 and C2 are lines in IR2 , one has h(C1 ,C2 ) < ∞ only
when these lines are parallel.
In the remainder of this section we consider a particular class of set-valued map-
pings, with significant applications in variational analysis, which are automatically
Lipschitz continuous even when their values are unbounded sets.
In the notational context of elements x ∈ S(y) for a mapping S : IRm →→ IRn , poly-
hedral convexity of S is equivalent to the existence of a positive integer r, matrices
D ∈ IRr×n , E ∈ IRr×m , and a vector q ∈ IRr such that
(4) S(y) = { x ∈ IRn | Dx + Ey ≤ q } for all y ∈ IRm .
Note for instance that any mapping S whose graph is a linear subspace is a poly-
hedral convex mapping.
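The defining inequalities in (4) make convexity of gph S easy to test numerically. The sketch below (our own illustration, with made-up data D, E, q) checks that convex combinations of two points of the graph remain in the graph, which holds by linearity of the constraints.

```python
# Convexity check for gph S = {(y, x) : Dx + Ey <= q} with made-up data.
D = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # r = 3 rows, n = 2
E = [[0.5], [0.0], [1.0]]                     # m = 1
q = [2.0, 2.0, 1.0]

def in_graph(y, x, tol=1e-9):
    for Di, Ei, qi in zip(D, E, q):
        lhs = sum(d * xj for d, xj in zip(Di, x)) \
            + sum(e * yj for e, yj in zip(Ei, y))
        if lhs > qi + tol:
            return False
    return True

p1 = ([1.0], [0.5, 0.5])     # two (y, x) pairs satisfying the inequalities
p2 = ([-2.0], [2.0, 1.0])
assert in_graph(*p1) and in_graph(*p2)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    y = [(1 - t) * a + t * b for a, b in zip(p1[0], p2[0])]
    x = [(1 - t) * a + t * b for a, b in zip(p1[1], p2[1])]
    assert in_graph(y, x)    # the segment stays in the graph
print("graph is convex along the sampled segment")
```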
Also, recall that the convex hull of a set C ⊂ IRn , which will be denoted by co C,
is the smallest convex set that includes C. (It can be identified as the intersection of
all convex sets that include C, but also can be described as consisting of all linear
combinations λ0 x0 + λ1 x1 + · · · + λn xn with xi ∈ C, λi ≥ 0, and λ0 + λ1 + · · · + λn =
1; this is Carathéodory’s theorem.) The closed convex hull of C is the closure of
the convex hull of C and denoted cl co C; it is the smallest closed convex set that
contains C.
(5) d(x, S(y)) ≤ L|(Ax − y)+ | for every y ∈ dom S and every x ∈ IRn .
Proof. For any y ∈ dom S the set S(y) is nonempty, convex and closed, hence any
point x ∉ S(y) has a unique (Euclidean) projection u = PS(y) (x) on S(y) (Proposi-
tion 1D.5), characterized by
x − u ∈ NS(y) (u),
where NS(y) is the normal cone mapping to the convex set S(y). In these terms, the
problem of projecting x on S(y) is equivalent to that of finding the unique u ≠ x such
that
(6) x ∈ u + NS(y) (u).
where the ai ’s are the rows of the matrix A regarded as vectors in IRn . Thus, the
projection u of x on S(y), as described by (6), can be obtained by finding a pair
(u, λ ) such that
(7) x − u − ∑_{i=1}^{m} λi ai = 0, λi ≥ 0, λi (⟨ai , u⟩ − yi ) = 0, i = 1, . . . , m.
While the projection u exists and is unique, this variational inequality might not
have a unique solution (u, λ ) because the λ component might not be unique. But
since u ≠ x (through our assumption that x ∉ S(y)), we can conclude from the first
relation in (7) that for any solution (u, λ ) the vector λ = (λ1 , . . . , λm ) is not the zero
vector. Consider the family J of subsets J of {1, . . . , m} for which there are real
numbers λ1 , . . . , λm with λi > 0 for i ∈ J and λi = 0 for i ∉ J and such that (u, λ )
satisfies (7). Of course, if ⟨ai , u⟩ − yi < 0 for some i, then λi = 0 according to the
second relation (complementarity) in (7), and then this i cannot be an element of
any J. That is,
(8) ⟨ai , u⟩ = yi for every i ∈ J and every J ∈ J .
Since the set of vectors λ such that (u, λ ) solves (7) does not contain the zero vector,
we have J ≠ ∅.
We will now prove that there is a nonempty index set J̄ ∈ J for which there are
no numbers βi > 0, i ∈ J̄, satisfying
(9) ∑_{i∈J̄} βi ai = 0.
On the contrary, suppose that for every J ∈ J this is not the case, that is, (9) holds
with J̄ = J for some βi > 0, i ∈ J. Let J′ be a set in J with a minimal number of
elements (J′ might not be unique). Note that the number of elements in J′ is
greater than 1. Indeed, if there were just one element i′ in J′ , then we would have
βi′ ai′ = 0 and βi′ > 0, hence ai′ = 0, and then, since (7) holds for (u, λ ) with
λi′ > 0 and λi = 0 for i ≠ i′ , from the first equality in (7) we would get x = u, which
contradicts the assumption that x ∉ S(y). Since J′ ∈ J , there are λi > 0, i ∈ J′ , such
that
(10) x − u = ∑_{i∈J′} λi ai ,
and, since (9) holds with J̄ = J′ , there are βi > 0, i ∈ J′ , with
(11) ∑_{i∈J′} βi ai = 0.
Multiplying both sides of the equality in (11) by a positive scalar t and adding to
(10), we obtain
x − u = ∑_{i∈J′} (λi − t βi )ai .
Let
t0 = min { λi /βi | i ∈ J′ with βi > 0 }.
Thus, the vector λ′ ∈ IRm with components λ′i = λi − t0 βi when i ∈ J′ and λ′i = 0 when
i ∉ J′ is such that (u, λ′ ) satisfies (7). Hence, we found a nonempty index set J ∈ J
having fewer elements than J′ , which contradicts the choice of J′ . The contradiction
obtained proves that there is a nonempty index set J̄ ∈ J for which there are no
numbers βi > 0, i ∈ J̄, satisfying (9). In particular, the zero vector in IRn is not in the
convex hull co {a j | j ∈ J̄}.
Let λ̄i > 0, i ∈ J̄, be the corresponding vector of multipliers such that, if we set
λ̄i = 0 for i ∉ J̄, we have that (u, λ̄ ) is a solution of (7). Since ∑_{i∈J̄} λ̄i ai ≠ 0, because
otherwise (9) would hold for βi = λ̄i , we have
γ := ∑_{i∈J̄} λ̄i > 0.
Because (7) holds with (u, λ̄ ), using (7) and (8) we have
d(0, co {a j | j ∈ J̄}) |x − u| ≤ | ∑_{i∈J̄} (λ̄i /γ ) ai | |x − u| = (1/γ )|x − u| |x − u|
= ⟨ (1/γ )(x − u), x − u ⟩ = ⟨ ∑_{i∈J̄} (λ̄i /γ ) ai , x − u ⟩
= ∑_{i∈J̄} (λ̄i /γ )(⟨ai , x⟩ − ⟨ai , u⟩) = ∑_{i∈J̄} (λ̄i /γ )(⟨ai , x⟩ − yi )
≤ max_{i∈J̄} (⟨ai , x⟩ − yi )+ .
Since the zero vector is not in co {a j | j ∈ J̄}, the left side is bounded below by a positive
multiple of |x − u| = d(x, S(y)), and only finitely many sets J̄ are possible.
This inequality remains valid (perhaps with a different constant c) after passing from
the max vector norm to the equivalent Euclidean norm. This proves (5).
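A quick check of the estimate (5) in a case where the projection is explicit (our own toy instance, not from the text): taking A to be the identity, S(y) = { x | x ≤ y componentwise }, the projection of x on S(y) is the componentwise min(x, y), so d(x, S(y)) coincides with |(x − y)+| and (5) holds with L = 1.

```python
# Sanity check of (5) with A = I: d(x, S(y)) equals the residual |(x - y)+|.
import math

def d_to_S(x, y):
    # projection on {u : u <= y} is componentwise min(x, y)
    proj = [min(xi, yi) for xi, yi in zip(x, y)]
    return math.dist(x, proj)

def residual(x, y):
    # |(Ax - y)+| with A = I, Euclidean norm of the positive part
    return math.sqrt(sum(max(xi - yi, 0.0) ** 2 for xi, yi in zip(x, y)))

x, y = [2.0, -1.0, 5.0], [1.0, 0.0, 1.0]
print(d_to_S(x, y), residual(x, y))   # both equal sqrt(17)
```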
Proof of Theorem 3C.3. Let y, y′ ∈ dom S and let x ∈ S(y). Since S is polyhedral,
from the representation (4) we have Dx + Ey − q ≤ 0 and then Dx + Ey′ − q ≤ E(y′ − y).
Then from Lemma 3C.4 above we obtain the existence of a constant L such that
d(x, S(y′ )) ≤ L|(Dx + Ey′ − q)+ | ≤ L|E| |y′ − y| = κ |y′ − y|
with κ = L|E|. The same must hold with the roles of y and y′ reversed, and in
consequence S is Lipschitz continuous on dom S.
Here c is a fixed vector in IRn , A is a fixed matrix in IRm×n . Define the solution
mappings associated with (13) as in Section 3B, that is, the feasible set mapping
(14) Sfeas : y → { x | Ax ≤ y },
It is known from the theory of linear programming that Sopt (y) ≠ ∅ when the infi-
mum in (15) is finite (and only then).
Next, for the case of Sval , consider any y, y′ ∈ D and any x ∈ Sopt (y), which exists
because Sopt (y) is nonempty when y ∈ D. In particular we have x ∈ Sfeas (y). From the
Lipschitz continuity of Sfeas , there exists x′ ∈ Sfeas (y′ ) such that |x − x′ | ≤ κ |y − y′ |.
Use this along with the fact that Sval (y′ ) ≤ ⟨c, x′ ⟩ but Sval (y) = ⟨c, x⟩ to get a bound
on Sval (y′ ) − Sval (y) which confirms the Lipschitz continuity claimed for Sval .
For the case of Sopt , consider the set-valued mapping
G : (y,t) → { x ∈ IRn | Ax ≤ y, ⟨c, x⟩ ≤ t } for (y,t) ∈ IRm × IR.
Confirm that this mapping is polyhedral convex and apply Theorem 3C.3 to it. Ob-
serve that Sopt (y) = G(y, Sval (y)) for y ∈ D and invoke the Lipschitz continuity of
Sval .
or equivalently
(1), and furthermore outer Lipschitz continuity might not entail outer semicontinu-
ity (where closed-valuedness is essential). Therefore, we hold back from such an
extension.
We present next a result which historically was the main motivation for intro-
ducing the property of outer Lipschitz continuity and which complements Theo-
rem 3C.3. It uses the following concept.
Then each Si is Lipschitz continuous on its domain, according to Theorem 3C.3. Let
ȳ ∈ dom S and let
J = { i | there exists x ∈ IRn with (ȳ, x) ∈ Gi }.
For any i ∉ J , since the sets {ȳ} × IRn and Gi are disjoint and polyhedral convex,
there is a neighborhood Vi of ȳ such that (Vi × IRn ) ∩ Gi = ∅. Let V = ∩_{i∉J} Vi . Then
of course V is a neighborhood of ȳ and we have
(4) (V × IRn ) ∩ gph S ⊂ ( ∪_{i=1}^{k} Gi ) \ ( ∪_{i∉J} Gi ) ⊂ ∪_{i∈J} Gi .
Let y ∈ V . If S(y) = ∅, then the relation (1) holds trivially. Let x be any point in S(y).
Then from (4),
(y, x) ∈ (V × IRn ) ∩ gph S ⊂ ∪_{i∈J} Gi ,
hence for some i ∈ J we have (y, x) ∈ Gi , that is, x ∈ Si (y). Since each Si is Lip-
schitz continuous with constant κi , say, and ȳ ∈ dom Si , we obtain by using (3) that
d(x, S(ȳ)) ≤ maxi d(x, Si (ȳ)) ≤ maxi e(Si (y), Si (ȳ)) ≤ maxi κi |y − ȳ|.
We know from Section 2E that the normal cone to C at the point x ∈ C is the set
NC (x) = { u | u = ∑_{i=1}^{m} yi ai , yi ≥ 0 for i ∈ I(x), yi = 0 for i ∉ I(x) },
where I(x) = { i | ⟨ai , x⟩ = αi } is the active index set for x ∈ C. The graph of the
normal cone mapping NC is not convex, unless C is a translate of a subspace, but it
is the union, with respect to all possible subsets J of {1, . . . , m}, of the polyhedral
convex sets
{ (x, u) | u = ∑_{i=1}^{m} yi ai with ⟨ai , x⟩ = αi , yi ≥ 0 for i ∈ J and ⟨ai , x⟩ < αi , yi = 0 for i ∉ J }.
It remains to observe that the graph of the sum A+ NC is also the union of polyhedral
convex sets.
Outer Lipschitz continuity becomes automatically Lipschitz continuity when the
mapping is inner semicontinuous, a property we introduced in the preceding section.
Theorem 3D.3 (isc criterion for Lipschitz continuity). Consider a set-valued map-
ping S : IRm →→ IRn and a convex set D ⊂ dom S such that S(y) is closed for every
y ∈ D. Then S is Lipschitz continuous relative to D with constant κ if and only if S is
both inner semicontinuous (isc) relative to D and outer Lipschitz continuous relative
to D with constant κ.
Proof. Let S be inner semicontinuous and outer Lipschitz continuous with constant
κ , both relative to D. Choose y, y′ ∈ D and let yt = (1 − t)y + ty′ for t ∈ [0, 1]. The assumed outer
Lipschitz continuity together with the closedness of the values of S implies that for
each t ∈ [0, 1] there exists a positive rt such that
S(ys ) ⊂ S(yt ) + κ |ys − yt |IB for every s ∈ [0, 1] with |ys − yt | < rt .
Let
(5) τ = sup { t ∈ [0, 1] | S(ys ) ⊂ S(y) + κ |ys − y|IB for each s ∈ [0,t] }.
Hence, yτ ∉ S−1 (O), that is, S(yτ ) ∩ O = ∅, and therefore S(yτ ) is a subset of S(y) +
κ |yτ − y|IB. This implies that the supremum in (5) is attained.
Let us next prove that τ = 1. If τ < 1, there must exist η ∈ (τ , 1) with |yη − yτ | <
rτ such that
S(yη ) ⊂ S(yτ ) + κ |yη − yτ |IB ⊂ S(y) + κ (|yη − yτ | + |yτ − y|)IB = S(y) + κ |yη − y|IB,
where the final equality holds because yτ is a point in the segment [y, yη ]. This
contradicts (6), hence τ = 1. Putting τ = 1 into (5) results in S(y′ ) ⊂ S(y) + κ |y′ − y|IB.
By the symmetry of y and y′ , we obtain that S is Lipschitz continuous relative to D.
Conversely, if S is Lipschitz continuous relative to D, then S is of course outer
Lipschitz continuous. Let now y ∈ D and let O be an open set such that y ∈ S−1 (O).
Then there exist x ∈ S(y) and ε > 0 such that x ∈ S(y) ∩ O and x + ε IB ⊂ O. Let 0 <
ρ < ε /κ and pick a point y′ ∈ D ∩ IBρ (y). Then
S(y) ⊂ S(y′ ) + κ |y − y′ |IB ⊂ S(y′ ) + ε IB.
Hence there exists x′ ∈ S(y′ ) with |x′ − x| ≤ ε and thus x′ ∈ S(y′ ) ∩ O, that is, y′ ∈
S−1 (O). This means that S−1 (O) is open relative to D, and from Theorem 3B.2(e)
we conclude that S is isc relative to D.
We obtain from Theorems 3D.1 and 3D.3 some further insights.
Corollary 3D.6. If the solution mapping S of the linear variational inequality in Ex-
ercise 3D.2 is single-valued everywhere in IRn , then it must be Lipschitz continuous
globally.
Inner Lipschitz continuity might be of interest on its own, but no significant appli-
cation of this property in variational analysis has come to light as yet. Even very
simple polyhedral (nonconvex) mappings fail to have this property (e.g., consider the
mapping from IR to IR whose graph is the union of the two coordinate axes, and choose
the origin as reference point), in contrast to outer Lipschitz continuity, which holds
for every polyhedral mapping. In addition, a local version of this property does not obey
the general implicit function theorem paradigm, as we will show in Section 3H. We
therefore drop inner Lipschitz continuity from further consideration.
Aubin property. A mapping S : IRm →→ IRn is said to have the Aubin property at
ȳ ∈ IRm for x̄ ∈ IRn if x̄ ∈ S(ȳ) and there is a constant κ ≥ 0 together with neighborhoods
U of x̄ and V of ȳ such that
(1) e(S(y′ ) ∩U, S(y)) ≤ κ |y′ − y| for all y′ , y ∈ V,
which can also be written as
(2) S(y′ ) ∩U ⊂ S(y) + κ |y′ − y|IB for all y′ , y ∈ V.
The infimum of κ over all such combinations of κ, U and V is called the Lipschitz
modulus of S at ȳ for x̄ and denoted by lip (S; ȳ| x̄). The absence of this property is
signaled by lip (S; ȳ| x̄) = ∞.
It is not claimed that (1) and (2) are themselves equivalent, although that is true
when S(y) is closed for every y ∈ V . Nonetheless, the infimum furnishing lip (S; ȳ| x̄)
is the same whichever formulation is adopted. When S is single-valued on a neigh-
borhood of ȳ, then the Lipschitz modulus lip (S; ȳ|S(ȳ)) equals the usual Lipschitz
modulus lip (S; ȳ) for functions.
In contrast to Lipschitz continuity, the Aubin property is tied to a particular point
in the graph of the mapping. As an example, consider the set-valued mapping S :
IR →→ IR defined as
S(y) = {0, 1 + √y} for y ≥ 0 and S(y) = {0} for y < 0.
At 0, the value S(0) consists of two points, 0 and 1. This mapping has the Aubin
property at 0 for 0 but not at 0 for 1. Also, S is not Lipschitz continuous relative to
any interval containing 0.
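The two branches of this example can be probed numerically. In the sketch below (our own illustration, not from the text), difference quotients along the branch 1 + √y blow up as y decreases to 0, which is why the Aubin property fails at 0 for 1, while the branch through 0 is constant and causes no trouble.

```python
# Probing S(y) = {0, 1 + sqrt(y)} for y >= 0, S(y) = {0} for y < 0,
# near the graph point (0, 1): the ratio distance/|y' - y| is unbounded.
import math

def branch_near_1(y):
    # the part of S(y) inside a small neighborhood U of x = 1
    return [1.0 + math.sqrt(y)] if y >= 0 else []

ratios = []
for k in range(1, 6):
    y, yp = 0.0, 4.0 ** (-k)
    # distance from the point of S(y') near 1 to S(y) = {1}, over |y' - y|
    ratios.append(abs(branch_near_1(yp)[0] - 1.0) / abs(yp - y))
print(ratios)   # equals [2, 4, 8, 16, 32]: 1/sqrt(y') = 2**k blows up
```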
Observe that when a set-valued mapping S has the Aubin property at ȳ for x̄, then,
for every point (y, x) ∈ gph S which is sufficiently close to (ȳ, x̄), it has the Aubin
property at y for x as well. It is also important to note that the Aubin property of S
at ȳ for x̄ implicitly requires ȳ to be an element of int dom S; this is exhibited in the
following proposition.
Proposition 3E.1 (local nonemptiness). If S : IRm →→ IRn has the Aubin property at
ȳ for x̄, then for every neighborhood U of x̄ there exists a neighborhood V of ȳ such
that S(y) ∩U ≠ ∅ for all y ∈ V .
Proof. The inclusion (2) for y = ȳ yields
That is, S(y) intersects every neighborhood of x̄ when y is sufficiently close to ȳ.
The property displayed in Proposition 3E.1 is a local version of inner semi-
continuity. Further, if S is Lipschitz continuous relative to an open set D, then S
has the Aubin property at any y ∈ D ∩ int dom S for any x ∈ S(y). In particular, the
inverse A−1 of a linear mapping A has the Aubin property at any point of its graph
provided that rge A = dom A−1 has nonempty interior, that is, A is surjective. The converse is also
true, since the inverse A−1 of a surjective linear mapping A is Lipschitz continuous
on the whole space, by Theorem 3C.3, and hence A−1 has the Aubin property at any
point.
Hence, there exists x′ ∈ S(y′ ) such that |x′ − s(y)| ≤ d(s(y), S(y′ )) + a/4 ≤ a/2. Since
|s(y) − x̄| = d(x̄, S(y)), we obtain
|x′ − x̄| ≤ |x′ − s(y)| + |s(y) − x̄| ≤ a/2 + d(x̄, S(y)) ≤ a/2 + κ |y − ȳ| < a,
that is, S has the desired Aubin property. For the “only if” part, suppose now that S
has the Aubin property at ȳ for x̄ with constant κ , and let a > 0 and b > 0 be such
that
(3) S(y′ ) ∩ IBa (x̄) ⊂ S(y) + κ |y′ − y|IB for all y′ , y ∈ IBb (ȳ).
(5) |x − x′ | ≤ κ |y′ − y|.
If x ∈ IBa (x̄), there is nothing more to prove, so assume that r := |x − x̄| > a. By (4),
we can choose a point x̃ ∈ S(y) ∩ IBa/2 (x̄). Since S(y) is convex, there exists a point
z ∈ S(y) on the segment [x, x̃] such that |z − x̄| = a, and then z ∈ S(y) ∩ IBa (x̄). We
will now show that
(6) |z − x′ | ≤ 5κ |y′ − y|,
which yields that the mapping y → S(y) ∩ IBa (x̄) is Lipschitz continuous on IBb (ȳ)
with constant 5κ .
By construction, there exists t ∈ (0, 1) such that z = (1 − t)x + t x̃. Then
t ≤ (r − a)/(r − a/2).
Using the triangle inequality |x̃ − x| ≤ |x − x̄| + |x̃ − x̄| ≤ r + a/2, we obtain
(7) |z − x| = t|x̃ − x| ≤ [(r − a)/(r − a/2)] (r + a/2).
Note that d := r − a is exactly the distance from x to the ball IBa (x̄), hence d ≤ |x − x′ |
because x′ ∈ IBa (x̄). Combining this with (9) and taking into account (5), we arrive
at
|z − x′ | ≤ |z − x| + |x − x′ | ≤ 4d + |x − x′ | ≤ 5|x − x′ | ≤ 5κ |y′ − y|.
But this is (6), and we are done.
The Aubin property could alternatively be defined with one variable “free,” as
shown in the next proposition.
Proof. Clearly, (10) implies (1). Assume (1) with corresponding U and V and
choose positive a and b such that IBa (x̄) ⊂ U and IBb (ȳ) ⊂ V . Let 0 < a′ < a and
0 < b′ < b be such that
(11) 2κ b′ + a′ ≤ κ b.
hence
Take any y′ ∈ IRm . If y′ ∈ IBb (ȳ) the inequality in (10) comes from (1) and there is
nothing more to prove. Assume |y′ − ȳ| > b. Then |y′ − y| > b − b′ and from (11),
κ b′ + a′ ≤ κ (b − b′ ) ≤ κ |y′ − y|. Using this in (12) we obtain
and since S(y′ ) ∩ IBa′ (x̄) is obviously a subset of IBa′ (x̄), we come again to (10).
The Aubin property of a mapping is characterized by Lipschitz continuity of the
distance function associated with it.
Proof. Let κ > lip (S; ȳ| x̄). Then, from 3E.1, there exist positive constants a and b
such that
(14) ∅ ≠ S(y) ∩ IBa (x̄) ⊂ S(y′ ) + κ |y − y′ |IB for all y, y′ ∈ IBb (ȳ).
Without loss of generality, let a/(4κ ) ≤ b. Let y ∈ IBa/(4κ ) (ȳ) and x ∈ IBa/4 (x̄) and
let x̃ be a projection of x on cl S(y). Using 1D.4(b) and (14) with y′ = ȳ we have
Using the fact that for any set C and for any r ≥ 0 one has
By the symmetry of y and y′ , we conclude that lip_y (s; (ȳ, x̄)) ≤ κ . Since κ can be
arbitrarily close to lip (S; ȳ| x̄), it follows that lip_y (s; (ȳ, x̄)) ≤ lip (S; ȳ| x̄).
Conversely, let κ > lip_y (s; (ȳ, x̄)). Then there exist neighborhoods U and V of
x̄ and ȳ, respectively, such that s(·, x) is Lipschitz continuous relative to V with
constant κ for any given x ∈ U. Let y, y′ ∈ V . Since V ⊂ dom s(·, x) for any x ∈ U,
we have that S(y) ≠ ∅. Pick any x ∈ S(y′ ) ∩ U; then s(y′ , x) = 0 and, by the
Taking the supremum with respect to x ∈ S(y′ ) ∩ U on the left, we obtain that S has
the Aubin property at ȳ for x̄ with constant κ . Since κ can be arbitrarily close to
lip_y (s; (ȳ, x̄)), we get
lip_y (s; (ȳ, x̄)) ≥ lip (S; ȳ| x̄).
The infimum of κ over all such combinations of κ , U and V is called the regularity
modulus for F at x̄ for ȳ and denoted by reg (F; x̄| ȳ). The absence of metric
regularity is signaled by reg (F; x̄| ȳ) = ∞.
Metric regularity is a valuable concept in its own right, especially for numerical
purposes. For a general set-valued mapping F and a vector y, it gives an estimate
for how far a point x is from being a solution to the generalized equation F(x) ∋ y
in terms of the “residual” d(y, F(x)).
To be specific, let x̄ be a solution of the inclusion ȳ ∈ F(x), let F be metrically
regular at x̄ for ȳ, and let xa and ya be approximations to x̄ and ȳ, respectively. Then
from (18), the distance from xa to the set of solutions of the inclusion ya ∈ F(x)
is bounded by the constant κ times the residual d(ya , F(xa )). In applications, the
residual is typically easy to compute or estimate, whereas finding a solution might
be considerably more difficult. Metric regularity says that there exists a solution to
the inclusion ya ∈ F(x) at distance from xa proportional to the residual. In particular,
if we know the rate of convergence of the residual to zero, then we will obtain the
rate of convergence of approximate solutions to an exact one.
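As a concrete illustration of this error bound, here is a sketch of our own (not from the text) with the single-valued mapping f(x) = 2x + sin x on IR: since f′(x) = 2 + cos x ≥ 1 everywhere, the estimate d(x, f⁻¹(y)) ≤ κ d(y, f(x)) holds globally with κ = 1, and the easily computed residual controls the unknown distance to the exact solution.

```python
import math

# Illustrative single-valued example (ours, not from the text): f(x) = 2x + sin x
# has f'(x) = 2 + cos x >= 1, so |f(u) - f(v)| >= |u - v| for all u, v, and the
# metric regularity estimate d(x, f^{-1}(y)) <= kappa * d(y, f(x)) holds with kappa = 1.

def f(x):
    return 2.0 * x + math.sin(x)

def solve(y, lo=-10.0, hi=10.0, tol=1e-12):
    """Exact solution of f(x) = y by bisection (f is strictly increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

kappa = 1.0
y_a, x_a = 0.7, 0.5                 # approximate data and approximate solution
residual = abs(y_a - f(x_a))        # d(y_a, F(x_a)), easy to compute
true_error = abs(x_a - solve(y_a))  # d(x_a, F^{-1}(y_a)), usually unknown
assert true_error <= kappa * residual + 1e-9
```

The final assertion mirrors the bound in the text: the cheap residual dominates the true distance to the solution set.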
Proposition 3C.1 for a mapping S, when applied to F = S−1 , F −1 = S, ties the
Lipschitz continuity of F −1 relative to a set D to a condition resembling (18), but
with F(x) replaced by F(x) ∩ D on the right, and with U × V replaced by IRn × D.
We demonstrate now that metric regularity of F in the sense of (18) corresponds to
the Aubin property of F −1 for the points in question.
Theorem 3E.6 (equivalence of metric regularity and the inverse Aubin property). A
set-valued mapping F : IRn → → IRm is metrically regular at x̄ for ȳ with a constant κ if
and only if its inverse F −1 : IRm → → IRn has the Aubin property at ȳ for x̄ with constant
κ, i.e. there exist neighborhoods U of x̄ and V of ȳ such that
Thus,
Proof. Let κ > reg (F; x̄| ȳ); then there are positive constants a and b such that (18)
holds with U = IBa (x̄), V = IBb (ȳ) and with this κ . Without loss of generality, assume
b < a/κ . Choose y, y′ ∈ IBb (ȳ). If F −1 (y) ∩ IBa (x̄) = ∅, then d(x̄, F −1 (y)) ≥ a. But
then the inequality (18) with x = x̄ yields
a contradiction. Hence there exists x ∈ F −1 (y) ∩ IBa (x̄), and for any such x we have
from (18) that
Taking the supremum with respect to x ∈ F −1 (y) ∩ IBa (x̄) we obtain (19) with U =
IBa (x̄) and V = IBb (ȳ), and therefore
(23) e(F −1 (y′ ) ∩U, F −1 (y)) ≤ κ |y′ − y| for all y ∈ IRm and y′ ∈ V.
This holds for any y ∈ F(x), hence, by taking the infimum with respect to y ∈ F(x)
in the last expression we get
(If F(x) = ∅, then because of the convention d(y, ∅) = ∞, this inequality holds auto-
matically.) Hence, F is metrically regular at x̄ for ȳ with a constant κ . Then we have
κ ≥ reg (F; x̄| ȳ) and hence reg (F; x̄| ȳ) ≤ lip (F −1 ; ȳ| x̄). This inequality together
with (22) results in (20).
Observe that metric regularity of F at x̄ for ȳ does not require that x̄ ∈ int dom F.
Indeed, if x̄ is an isolated point of dom F then the right side in (18) is ∞ for all x ∈ U,
x ≠ x̄, and then (18) holds automatically. On the other hand, for x = x̄ the right side
of (18) is always finite (since by assumption x̄ ∈ dom F), and then F −1 (y) ≠ ∅ for
y ∈ V . This also follows from 3E.1 via 3E.6.
(24) d(x, F −1 (y)) ≤ κ d(y, F(x)) for all x ∈ U having F(x) ∩V ≠ ∅ and all y ∈ V.
Guide. First, note that (18) implies (24). Let (24) hold with constant κ and neigh-
borhoods IBa (x̄) and IBb (ȳ) having b < a/κ . Choose y, y′ ∈ IBb (ȳ). As in the proof of
3E.6 show first that F −1 (y) ∩ IBa (x̄) ≠ ∅ by noting that F(x̄) ∩ IBb (ȳ) ≠ ∅ and hence
the inequality in (24) holds for x̄ and y. Then for any x ∈ F −1 (y) ∩ IBa (x̄) we have
that y ∈ F(x) ∩ IBb (ȳ), that is, F(x) ∩ IBb (ȳ) ≠ ∅. Thus, the inequality in (24) holds
with y′ and any x ∈ F −1 (y) ∩ IBa (x̄), which leads to (21) and hence to (19), in the
same way as in the proof of 3E.6. The rest follows from the equivalence of (18) and
(19) established in 3E.6.
There is a third property, which we introduced for functions in Section 1F, and
which is closely related to both metric regularity and the Aubin property.
Openness. A mapping F : IRn → → IRm is said to be open at x̄ for ȳ if ȳ ∈ F(x̄) and for
every neighborhood U of x̄, F(U) is a neighborhood of ȳ.
From the equivalence of metric regularity of F at x̄ for ȳ and the Aubin property
of F −1 at ȳ for x̄, and Proposition 3E.1, we obtain that if a mapping F is metrically
regular at x̄ for ȳ, then F is open at x̄ for ȳ. Metric regularity is actually equivalent
to the following stronger version of the openness property:
Openness is a particular case of linear openness and follows from (25) for x = x̄.
Linear openness postulates openness around the reference point with balls having
proportional radii.
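To get a concrete feel for linear openness as in (25), here is an illustration of our own (not from the text) with the surjective linear mapping A(x1, x2) = x1 + x2 from IR² onto IR. Its smallest singular value is √2, and linear openness holds with κ = 1/√2: every y in A(x) + r intIB is reached from a point within κ r of x, namely by the least-norm correction.

```python
import math
import random

# Linear openness for the surjective linear map A(x1, x2) = x1 + x2 (an
# illustration of ours).  The smallest singular value of A is sqrt(2), and
# (25) holds with kappa = 1/sqrt(2): every y in A(x) + r*int IB is hit from
# a point within kappa*r of x, via the least-norm correction below.

def A(x):
    return x[0] + x[1]

kappa = 1.0 / math.sqrt(2.0)

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    r = random.uniform(0.1, 3.0)
    d = random.uniform(-r, r)                  # y = A(x) + d lies in A(x) + r*int IB
    x_new = (x[0] + d / 2.0, x[1] + d / 2.0)   # least-norm preimage correction
    assert abs(A(x_new) - (A(x) + d)) < 1e-12  # x_new is a preimage of y
    assert math.hypot(d / 2.0, d / 2.0) <= kappa * abs(d) + 1e-12  # within kappa*r
```

The correction has norm |d|/√2 = κ|d| < κr, so the image of the ball x + κr intIB covers A(x) + r intIB, exactly the proportional-radii picture described above.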
r = |y − y′ |, for every ε > 0 we have y ∈ (F(x′ ) + r(1 + ε ) intIB) ∩V . From (25), there
exists x ∈ F −1 (y) with |x − x′ | ≤ κ (1 + ε )r = κ (1 + ε )|y′ − y|. Then d(x′ , F −1 (y)) ≤
κ (1 + ε )|y′ − y|. Taking infimum with respect to y′ ∈ F(x′ ) on the right and passing to
zero with ε (since the left side does not depend on ε ), we obtain that F is metrically
regular at x̄ for ȳ with constant κ .
For the converse implication, we use the characterization of the Aubin property
given in Proposition 3E.4. Let x ∈ U, r > 0, and let y ∈ (F(x) + r intIB) ∩ V . Then
there exists y′ ∈ F(x) such that |y − y′ | < r. Let ε > 0 be so small that (κ + ε )|y −
y′ | < κ r. From (10) we obtain d(x, F −1 (y)) ≤ κ |y − y′ | ≤ (κ + ε )|y − y′ |. Then
there exists x′ ∈ F −1 (y) such that |x − x′ | ≤ (κ + ε )|y − y′ |. But then y ∈ F(x′ ) ⊂
F(x + (κ + ε )|y − y′ |IB) ⊂ F(x + κ r intIB), which yields (25) with constant κ .
In the classical setting, of course, the equation f (p, x) = 0 is solved for x in
terms of p, and the goal is to determine when this reduces to x being a function of p
through a localization, moreover one with some kind of property of differentiability,
or at least Lipschitz continuity. Relinquishing single-valuedness entirely, we can
look at “solving” the relation
Fixing a pair ( p̄, x̄) such that x̄ ∈ S( p̄), we can raise questions about local behavior
of S as might be deduced from assumptions on G.
We will concentrate here on the extent to which S can be guaranteed to have the
Aubin property at p̄ for x̄. This turns out to be true when G has the Aubin property
with respect to p and a weakened metric regularity property with respect to x, but
we have to formulate exactly what we need about this in a local sense.
Partial Aubin property. The mapping G : IRd × IRn → → IRm is said to have the par-
tial Aubin property with respect to p uniformly in x at ( p̄, x̄) for ȳ if ȳ ∈ G( p̄, x̄) and
there is a constant κ ≥ 0 together with neighborhoods Q of p̄, U of x̄ and V of ȳ
such that
The infimum of κ over all such combinations of κ, Q, U and V is called the partial
Lipschitz modulus of G with respect to p uniformly in x at ( p̄, x̄) for ȳ and denoted
by lip_p (G; p̄, x̄| ȳ). The absence of this property is signaled by lip_p (G; p̄, x̄| ȳ) = ∞.
The basic result we are able now to state about the solution mapping in (27) could
be viewed as an “implicit function” complement to the “inverse function” result in
Theorem 3E.6, rather than as a result in the pattern of the implicit function theorem
(which features approximations of one kind or another).
(29) d(x, S(p)) ≤ λ d(0, G(p, x)) for all (p, x) close to ( p̄, x̄).
Then the solution mapping S in (27) has the Aubin property at p̄ for x̄ with constant
λ κ.
Proof. Take p, p′ ∈ Q and x ∈ S(p) ∩U so that (28) holds for a neighborhood V of
0. From (29) and then (28) we have

d(x, S(p′ )) ≤ λ d(0, G(p′ , x)) ≤ λ e(G(p, x) ∩V, G(p′ , x)) ≤ λ κ |p′ − p|.

Taking the supremum of the left side with respect to x ∈ S(p) ∩U, we obtain that S
has the Aubin property with constant λ κ .
including the limit cases when either of these moduli is 0 and then the other is ∞,
under the convention 0 · ∞ = ∞.
Guide. Let κ > reg (F; x̄| ȳ) and γ > lip (F; x̄| ȳ). Then there are neighborhoods U
of x̄ and V of ȳ corresponding to metric regularity and the Aubin property of F with
constants κ and γ , respectively. Let (x, y) ∈ U × V be such that d(x, F −1 (y)) > 0
(why does such a point exist?). Then there exists x′ ∈ F −1 (y) such that 0 < |x − x′ | =
d(x, F −1 (y)). We have
Hence, κγ ≥ 1.
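The simplest case in which the product of the two moduli attains the lower bound 1 is a nonzero linear function on IR; the check below is our own illustration, not from the text.

```python
# The simplest case (ours, not from the text) where reg * lip attains the
# lower bound 1: the linear function f(x) = c*x on IR with c != 0.  Here
# d(x, f^{-1}(y)) = |x - y/c| = |c*x - y|/|c|, so every regularity quotient
# equals 1/|c|, while every Lipschitz quotient equals |c|.

def f(c, x):
    return c * x

def reg_quotient(c, x, y):
    return abs(x - y / c) / abs(f(c, x) - y)   # d(x, f^{-1}(y)) / d(y, f(x))

def lip_quotient(c, x, u):
    return abs(f(c, x) - f(c, u)) / abs(x - u)

for c in (0.5, 2.0, -3.0):
    prod = reg_quotient(c, 0.3, 1.1) * lip_quotient(c, -0.7, 1.5)
    assert abs(prod - 1.0) < 1e-12    # reg * lip >= 1 holds here with equality
```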
For a solution mapping S = F −1 of an inclusion F(x) ∋ y with a parameter y, the
quantity lip (S; ȳ| x̄) measures how “stable” solutions near x̄ are under changes of the
parameter near ȳ. In this context, the smaller this modulus is, the “better” stability we
have. In view of 3E.11, better stability means larger values of the regularity modulus
reg (S; ȳ| x̄). In the limit case, when S is a constant function near ȳ, that is, when the
solution is not sensitive at all with respect to small changes of the parameter y near
ȳ, then lip (S; ȳ|S(ȳ)) = 0 while the metric regularity modulus of S there is infinity.
In Section 6A we will see that the “larger” the regularity modulus of a mapping is,
the “easier” it is to perturb the mapping so that it loses its metric regularity. In this
connection, 3E.11 provides a good candidate for a condition number of a general
mapping, but we shall not go into this here.
Theorem 3F.1 (inverse mapping theorem with metric regularity). Consider a map-
ping F : IRn → → IRm and any (x̄, ȳ) ∈ gph F at which gph F is locally closed and let κ
and μ be nonnegative constants such that
Then for any function g : IRn → IRm with x̄ ∈ int dom g and lip (g; x̄) ≤ μ, one has

(1) reg (g + F; x̄| g(x̄) + ȳ) ≤ κ /(1 − κ μ ).
Corollary 3F.2 (detailed estimates). Consider a mapping F : IRn → → IRm and any
pair (x̄, ȳ) ∈ gph F at which gph F is locally closed. If reg (F; x̄| ȳ) > 0, then for any
g : IRn → IRm such that reg (F; x̄| ȳ) · lip (g; x̄) < 1, one has
(2) reg (g + F; x̄|g(x̄) + ȳ) ≤ (reg (F; x̄| ȳ)−1 − lip (g; x̄))−1 .
If reg (F; x̄| ȳ) = 0, then reg (g + F; x̄|g(x̄) + ȳ) = 0 for any g : IRn → IRm with
lip (g; x̄) < ∞. If reg (F; x̄| ȳ) = ∞, then reg (g + F; x̄|g(x̄) + ȳ) = ∞ for any g : IRn →
IRm with lip (g; x̄) = 0.
Proof. If reg (F; x̄| ȳ) < ∞, then by choosing κ and μ appropriately and pass-
ing to limits in (1) we obtain the claimed inequality (2) also for the case where
reg (F; x̄| ȳ) = 0. Let reg (F; x̄| ȳ) = ∞, and suppose that reg (g + F; x̄|g(x̄) + ȳ) < κ
for some κ and a function g with lip (g; x̄) = 0. Note that g is Lipschitz continuous
around x̄, hence the graph of g is locally closed around (x̄, g(x̄)). Then g + F has
locally closed graph at (x̄, g(x̄) + ȳ). Applying Theorem 3F.1 to the mapping g + F
with perturbation −g, and noting that lip (−g; x̄) = 0, we get reg (F; x̄| ȳ) ≤ κ , which
constitutes a contradiction.
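To see that the estimate (2) can hold with equality, here is a sketch of our own (not from the text): take F(x) = x on IR, so reg (F; 0|0) = 1, and perturb by the linear function g(x) = −μx with 0 ≤ μ < 1.

```python
# Sharpness of estimate (2) in the simplest setting (our sketch, not from
# the text): F(x) = x on IR has reg(F; 0|0) = 1; perturbing by g(x) = -mu*x,
# with lip(g; 0) = mu < 1, gives (g + F)(x) = (1 - mu)*x, whose regularity
# modulus is exactly 1/(1 - mu) -- the right side of (2).

def reg_linear(c):
    # for x -> c*x on IR:  d(x, preimage of y) = |c*x - y| / |c|
    return 1.0 / abs(c)

for mu in (0.0, 0.3, 0.9):
    bound = 1.0 / (1.0 / reg_linear(1.0) - mu)   # (reg(F)^{-1} - lip(g))^{-1}
    actual = reg_linear(1.0 - mu)                # reg(g + F; 0|0)
    assert abs(actual - bound) < 1e-12
```

As μ approaches 1 both the bound and the actual modulus blow up, matching the condition κ μ < 1 in Theorem 3F.1.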
When the perturbation g has zero Lipschitz modulus at the reference point, we
obtain another interesting fact.
Proof. The cases with reg (F; x̄| ȳ) = 0 or reg (F; x̄| ȳ) = ∞ are already covered by
Corollary 3F.2. If 0 < reg (F; x̄| ȳ) < ∞, we get from (2) that
(x̄, ȳ) (or in other words that gph F is locally closed at (x̄, ȳ − f (x̄))). Then, for the
linearization
M0 (x) = f (x̄) + ∇ f (x̄)(x − x̄) + F(x)
one has
reg (M; x̄ | ȳ) = reg (M0 ; x̄| ȳ).
In the case when m = n and the mapping F is the normal cone mapping to a
polyhedral convex set, we can likewise employ a “first-order approximation” of F.
When f is linear the corresponding result parallels 2E.6.
Ax + NC (x) ∋ y.
Let x̄ be a solution for ȳ, let v̄ = ȳ − Ax̄, so that v̄ ∈ NC (x̄), and let K = KC (x̄, v̄) be
the critical cone to C at x̄ for v̄. Then, for the mappings
we have
reg (G; x̄| ȳ) = reg (G0 ; 0|0).
Proof. From reduction lemma 2E.4, for (w, u) in a neighborhood of (0, 0), we have
that v̄ + u ∈ NC (x̄ + w) if and only if u ∈ NK (w). Then, for (w, v) in a neighborhood
of (0, 0), we obtain ȳ+ v ∈ G(x̄ + w) if and only if v ∈ G0 (w). Thus, metric regularity
of A + NC at x̄ for ȳ with a constant κ implies metric regularity of A + NK at 0 for 0
with the same constant κ , and conversely.
Combining 3F.5 and 3F.6 we obtain the following corollary:
We are ready now to take up once more the study of a generalized equation
having the form
This time, however, we are not looking for single-valued localizations of S but aim-
ing at a better understanding of situations in which S may not have any such localiza-
tion, as in the example of parameterized constraint systems. Recall from Chapter 1
that, for f : IRd × IRn → IRm and a point ( p̄, x̄) ∈ int dom f , a function h : IRn → IRm
is said to be a strict estimator of f with respect to x uniformly in p at ( p̄, x̄) with
constant μ if h(x̄) = f ( p̄, x̄) and
Theorem 3F.8 (implicit mapping theorem with metric regularity). For the general-
ized equation (3) and its solution mapping S in (4), and a pair ( p̄, x̄) with x̄ ∈ S( p̄),
let h : IRn → IRm be a strict estimator of f with respect to x uniformly in p at ( p̄, x̄)
with constant μ, let gph (h + F) be locally closed at (x̄, 0) and let h + F be metrically
regular at x̄ for 0 with reg (h + F; x̄|0) ≤ κ. Suppose that
(6) lip (S; p̄| x̄) ≤ κ λ /(1 − κ μ ).
This theorem will be established in an infinite-dimensional setting in Section 5E,
so we will not prove it separately here. An immediate consequence is obtained by
specializing the function h in Theorem 3F.8 to a linearization of f with respect to
x. We add to this the effect of ample parameterization, in parallel to the case of
single-valued localization in Theorem 2C.2.
Theorem 3F.9 (using strict differentiability and ample parameterization). For the
generalized equation (3) and its solution mapping S in (4), and a pair ( p̄, x̄) with
x̄ ∈ S( p̄), suppose that f is strictly differentiable at ( p̄, x̄) and that gph F is locally
closed at (x̄, − f ( p̄, x̄)). If the mapping
is metrically regular at x̄ for 0, then S has the Aubin property at p̄ for x̄ with
then the converse implication holds as well: the mapping h + F is metrically regular
at x̄ for 0 provided that S has the Aubin property at p̄ for x̄.
Proof of 3F.9, initial part. In these circumstances with this choice of h, the condi-
tions in (5) are satisfied because, for e = f − h,
Thus, (7) follows from (6) with μ = 0. In the remainder of the proof, regarding
ample parameterization, we will make use of the following fact.
the form
y → N(y) = { x | x ∈ M(ψ (x, y)) } for y ∈ IRm .
Let ψ satisfy
and, for p̄ = ψ (x̄, 0), let ( p̄, x̄) be a point of gph M at which gph M is locally closed. Under
these conditions, if M has the Aubin property at p̄ for x̄, then N has the Aubin
property at 0 for x̄.
Proof. Let the mapping M have the Aubin property at p̄ for x̄ with neighborhoods
Q of p̄ and U of x̄ and constant κ > lip (M; p̄ | x̄). Choose λ > 0 with λ < 1/κ and
let γ > lip_y (ψ ; (x̄, 0)). By (9) there exist positive constants a and b such that for
any y ∈ IBa (0) the function ψ (·, y) is Lipschitz continuous on IBb (x̄) with Lipschitz
constant λ and for every x ∈ IBb (x̄) the function ψ (x, ·) is Lipschitz continuous on
IBa (0) with Lipschitz constant γ . Pick a positive constant c and make a and b smaller
if necessary so that:
(a) IBc ( p̄) ⊂ Q and IBb (x̄) ⊂ U,
(b) the set gph M ∩ (IBc ( p̄) × IBb (x̄)) is closed, and
(c) the following inequalities are satisfied:
(10) 4κ γ a/(1 − κ λ ) ≤ b and γ a + λ b ≤ c.
Let y′ , y ∈ IBa (0) and let x′ ∈ N(y′ ) ∩ IBb/2 (x̄). Then x′ ∈ M(ψ (x′ , y′ )) ∩ IBb/2 (x̄).
Further, we have

and the same for ψ (x′ , y). From the Aubin property of M we obtain the existence of
x1 ∈ M(ψ (x′ , y)) such that

|x1 − x̄| ≤ |x1 − x′ | + |x′ − x̄| ≤ κ γ |y′ − y| + |x′ − x̄| ≤ κ γ (2a) + b/2 ≤ b,
and consequently
utilizing the second inequality in (10). Hence again, from the Aubin property of M
applied to x1 ∈ M(ψ (x′ , y)) ∩ IBb (x̄), there exists x2 ∈ M(ψ (x1 , y)) such that
≤ b/2 + ∑_{j=0}^{k−1} (κ λ )^j κ γ |y′ − y| ≤ b/2 + 2aκ γ /(1 − κ λ ) ≤ b,
where we use the first inequality in (10). Hence |ψ (xk , y) − p̄| ≤ λ b + γ a ≤ c. Then
there exists xk+1 ∈ M(ψ (xk , y)) such that
we obtain, for any κ ′ ≥ (κ γ )/(1 − κ λ ), and on passing to the limit with respect
to k → ∞, that |x − x′ | ≤ κ ′ |y′ − y|. Thus, N has the Aubin property at 0 for x̄ with
constant κ ′ .
Proof of 3F.9, final part. Under the ample parameterization condition (8), Lemma
2C.1 guarantees the existence of neighborhoods U of x̄, V of 0, and Q of p̄, as well
as a local selection ψ : U × V → Q around (x̄, 0) for p̄ of the mapping
(x, y) → { p | y + f (p, x) = h(x) }
for h(x) = f ( p̄, x̄) + ∇x f ( p̄, x̄)(x − x̄) which satisfies the conditions in (9). Hence,
Fix y ∈ V . If x ∈ (h+F)−1 (y)∩U and p = ψ (x, y), then p ∈ Q and y+ f (p, x) = h(x),
hence x ∈ S(p)∩U. Conversely, if x ∈ S(ψ (x, y))∩U, then clearly x ∈ (h+F)−1 (y)∩
U. Thus,
(11) (h + F)−1 (y) ∩U = { x | x ∈ S(ψ (x, y)) ∩U }.
Since the Aubin property of S at p̄ for x̄ is a local property of the graph of S relative to
the point ( p̄, x̄), it holds if and only if the same holds for the truncated mapping SU :
p → S(p) ∩ U (see Exercise 3F.11 below). That equivalence is valid for (h + F)−1
as well. Thus, if the mapping SU has the Aubin property at p̄ for x̄, from Proposition
3F.10 in the context of (11), we obtain that (h + F)−1 has the Aubin property at 0
for x̄, hence, by 3E.6, h + F is metrically regular at x̄ for 0 as desired.
Exercise 3F.11. Let S : IRm → → IRn have the Aubin property at ȳ for x̄ with constant
κ. Show that for any neighborhood U of x̄ the mapping SU : y → S(y) ∩U also has
the Aubin property at ȳ for x̄ with constant κ.
Guide. Choose sufficiently small a > 0 and b > 0 such that IBa (x̄) ⊂ U and b ≤
a/(4κ ). Then for every y, y′ ∈ IBb (ȳ) and every x ∈ S(y) ∩ IBa/2 (x̄) there exists x′ ∈
S(y′ ) with |x′ − x| ≤ κ |y′ − y| ≤ 2κ b ≤ a/2. Then both x and x′ are from U.
Let us now look at the case of 3F.9 in which F is a constant mapping, F(x) ≡ K,
which was featured at the beginning of this chapter as a motivation for investigating
real set-valuedness in solution mappings. Solving f (p, x) + F(x) ∋ 0 for a given p
then means finding an x such that − f (p, x) ∈ K. For particular choices of K this
amounts to solving some mixed system of equations and inequalities, for example.
Example 3F.12 (application to general constraint systems). For f : IRd × IRn → IRm
and a closed set K ⊂ IRm , let
S(p) = { x | f (p, x) ∈ K }.
If S̄ has the Aubin property at 0 for x̄, then S has the Aubin property at p̄ for x̄. The
converse implication holds under the ample parameterization condition (8).
The key to applying this result, of course, is being able to ascertain when the
linearized system does have the Aubin property in question. In the important case of
K = IR^s_− × {0}^{m−s} , a necessary and sufficient condition will emerge in the so-called
Mangasarian–Fromovitz constraint qualification. This will be seen in Section 4D.
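For this choice of K, the condition f(p, x) ∈ K is the mixed system f_i(p, x) ≤ 0 for i ≤ s and f_j(p, x) = 0 for j > s, and the residual d(f(p, x), K) from the earlier error bound has a simple closed form. The helper below is our own illustration (the name dist_to_K is not from the text).

```python
import math

# For K = IR^s_- x {0}^(m-s), membership f(p,x) ∈ K is the mixed system
# f_i <= 0 (i <= s), f_j = 0 (j > s), and the residual d(f(p,x), K) has the
# closed form computed below (dist_to_K is our own helper, not from the text).

def dist_to_K(v, s):
    """Euclidean distance from a vector v in IR^m to K = IR^s_- x {0}^(m-s)."""
    sq = 0.0
    for i, vi in enumerate(v):
        if i < s:
            sq += max(vi, 0.0) ** 2   # inequality part: project onto (-inf, 0]
        else:
            sq += vi ** 2             # equality part: project onto {0}
    return math.sqrt(sq)

# (-1, 2, 3) with s = 1: the inequality component -1 is already feasible,
# so only the equality components (2, 3) contribute.
assert abs(dist_to_K([-1.0, 2.0, 3.0], 1) - math.sqrt(13.0)) < 1e-12
assert dist_to_K([-0.5, 0.0, 0.0], 1) == 0.0   # a point of K itself
```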
Example 3F.13 (application to polyhedral variational inequalities). For f : IRd ×
IRn → IRn and a convex polyhedral set C ⊂ IRn , let
S(p) = { x | f (p, x) + NC (x) ∋ 0 }.
Fix p̄ and x̄ ∈ S( p̄) and for v̄ = − f ( p̄, x̄) let K = KC (x̄, v̄) be the associated critical
cone to C. Suppose that f is continuously differentiable on a neighborhood of ( p̄, x̄),
and consider the solution mapping for an associated reduced system:
S̄(y) = { x | ∇x f ( p̄, x̄)x + NK (x) ∋ y }.
If S̄ has the Aubin property at 0 for 0, then S has the Aubin property at p̄ for x̄. The
converse implication holds under the ample parameterization condition (8).
If a mapping F : IRn → → IRm has the Aubin property at x̄ for ȳ, then for any function
f : IRn → IRm with lip ( f ; x̄) < ∞, the mapping f + F has the Aubin property at x̄ for
f (x̄) + ȳ as well. This is a particular case of the following observation which utilizes
ample parameterization.
Exercise 3F.14. Consider a mapping F : IRn → → IRm with (x̄, ȳ) ∈ gph F and a func-
tion f : IRd × IRn → IRm having ȳ = − f ( p̄, x̄) and which is strictly differentiable at
( p̄, x̄) and satisfies the ample parameterization condition (8). Prove that the mapping
has the Aubin property at x̄ for p̄ if and only if F has the Aubin property at x̄ for ȳ.
Guide. First, apply 3F.9 to show that, under the ample parameterization condition
(8), the mapping
(x, y) → Ω (x, y) = { p | y + f (p, x) = 0 }
has the Aubin property at (x̄, ȳ) for p̄. Let F have the Aubin property at x̄ for ȳ with
neighborhoods U of x̄ and V of ȳ and constant κ . Choose a neighborhood Q of p̄ and
adjust U and V accordingly so that Ω has the Aubin property with constant λ and
neighborhoods U × V and Q. Let b > 0 be such that IBb (ȳ) ⊂ V , then choose a > 0
and adjust Q such that IBa (x̄) ⊂ U, a ≤ b/(4κ ) and also − f (p, x) ∈ IBb/2 (ȳ) for x ∈
IBa (x̄) and p ∈ Q. Let x, x′ ∈ IBa (x̄) and p ∈ P(x) ∩ Q. Then y = − f (p, x) ∈ F(x) ∩V
and by the Aubin property of F there exists y′ ∈ F(x′ ) such that |y − y′ | ≤ κ |x − x′ |.
But then |y′ − ȳ| ≤ κ (2a) + b/2 ≤ b. Thus y′ ∈ V and hence, by the Aubin property
of Ω , there exists p′ satisfying y′ + f (p′ , x′ ) = 0 and
Noting that p′ ∈ P(x′ ) we get that P has the Aubin property at x̄ for p̄.
Conversely, let P have the Aubin property at x̄ for p̄ with associated constant κ
and neighborhoods U and Q of x̄ and p̄, respectively. Let f be Lipschitz continuous
on Q × U with constant μ . We already know that the mapping Ω has the Aubin
property at (x̄, ȳ) for p̄; let λ be the associated constant and U ×V and Q the neigh-
borhoods of (x̄, ȳ) and of p̄, respectively. Choose c > 0 such that IBc ( p̄) ⊂ Q and let
a > 0 satisfy
Let x, x′ ∈ IBa (x̄) and y ∈ F(x) ∩ IBa (ȳ). Since Ω has the Aubin property and
p̄ ∈ Ω (x̄, ȳ) ∩ IBc ( p̄), there exists p ∈ Ω (x, y) such that |p − p̄| ≤ λ (2a) ≤ c/2.
This means that p ∈ P(x) ∩ IBc/2 ( p̄) and from the Aubin property of P there ex-
ists p′ ∈ P(x′ ) so that |p′ − p| ≤ κ |x′ − x|. Thus, |p′ − p̄| ≤ κ (2a) + c/2 ≤ c. Let
y′ = − f (p′ , x′ ). Then y′ ∈ F(x′ ) because p′ ∈ P(x′ ) and the Lipschitz continuity of
f gives us
(1) x ∈ U0 , y ∈ V0 =⇒ y − g(x) ∈ V.
Suppose to the contrary that y ∈ V0 and x, x′ ∈ U0 , x ≠ x′ , are such that both x and
x′ belong to (g + F)−1 (y). Clearly x ∈ (g + F)−1 (y) ∩ U0 if and only if x ∈ U0 and
y ∈ g(x) + F(x), or equivalently y − g(x) ∈ F(x). The latter, in turn, is the same as
having x ∈ F −1 (y − g(x)) ∩U0 ⊂ F −1 (y − g(x)) ∩U = s(y − g(x)), where y − g(x) ∈
V by (1). Then
which is absurd.
The observation in 3G.1 leads to a definition.
Strong metric regularity. A mapping F : IRn → → IRm having the equivalent proper-
ties in 3G.1 will be called strongly metrically regular at x̄ for ȳ.
For a linear mapping represented by an m × n matrix A, strong metric regularity
comes out as the nonsingularity of A and thus requires that m = n. Moreover, for
any single-valued function f : IRn → IRm , strong metric regularity requires m = n by
Theorem 1F.1 on the invariance of domain. This property can be seen therefore as
corresponding closely to the one in the classical implicit function theorem, except
for its focus on Lipschitz continuity instead of continuous differentiability. It was
the central property in fact, if not in name, in Robinson’s implicit function theorem
2B.1.
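The linear case just mentioned can be made concrete with a small sketch of our own (not from the text): for F(x) = Ax with A a nonsingular 2×2 matrix, F⁻¹ is the single-valued linear mapping y ↦ A⁻¹y, globally Lipschitz continuous, which is the prototype of strong metric regularity.

```python
# The linear prototype of strong metric regularity (our own sketch): for
# F(x) = A x with A a nonsingular 2x2 matrix, F^{-1} is the single-valued
# linear mapping y -> A^{-1} y, which is globally Lipschitz continuous.

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    assert det != 0, "a singular A is not strongly metrically regular"
    return ((M[1][1] / det, -M[0][1] / det),
            (-M[1][0] / det, M[0][0] / det))

A = ((2.0, 1.0), (0.0, 1.0))
Ainv = inverse_2x2(A)
y = (3.0, -1.0)
x = apply(Ainv, y)          # the unique solution of A x = y: the localization
assert max(abs(a - b) for a, b in zip(apply(A, x), y)) < 1e-12
```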
The terminology of strong metric regularity offers a way of gaining new per-
spectives on earlier results by translating them into the language of metric regula-
rity. Indeed, strong metric regularity is just metric regularity plus the existence of a
single-valued localization of the inverse. According to Theorem 3F.1, metric regula-
rity of a mapping F with a locally closed graph is stable under addition of a function
g with a “small” Lipschitz constant, and so too is local single-valuedness, accord-
ing to 3G.2 above. Thus, strong metric regularity must be stable under perturbation
in the same way as metric regularity. The corresponding result is a version of the
inverse function result in 2B.10 corresponding to the extended form of Robinson’s
implicit function theorem in 2B.5.
Theorem 3G.3 (inverse function theorem with strong metric regularity). Consider
a mapping F : IRn → → IRm and any (x̄, ȳ) ∈ gph F such that F is strongly metrically
regular at x̄ for ȳ. Let κ and μ be nonnegative constants such that
Then for any function g : IRn → IRm with x̄ ∈ int dom g and lip (g; x̄) ≤ μ, the map-
ping g + F is strongly metrically regular at x̄ for g(x̄) + ȳ. Moreover,
κ
reg g + F; x̄g(x̄) + ȳ ≤ .
1 − κμ
Proof. Our hypothesis that F is strongly metrically regular at x̄ for ȳ implies that
a graphical localization of F −1 around (ȳ, x̄) is single-valued and continuous near
ȳ and therefore that gph F is locally closed at (x̄, ȳ). Further, by fixing λ > κ and
using Proposition 3G.1, we can get neighborhoods U of x̄ and V of ȳ such that for
any y ∈ V the set F −1 (y) ∩ U consists of exactly one point, which we may denote
by s(y) and know that the function s : y → F −1 (y) ∩ U is Lipschitz continuous on
V with Lipschitz constant λ . Let μ < ν < λ −1 and choose a function g : IRn → IRm
and a neighborhood U ⊂ U of x̄ on which g is Lipschitz continuous with constant
ν . Applying Proposition 3G.2 we obtain that the mapping (g + F)−1 has a local-
ization at g(x̄) + ȳ for x̄ which is nowhere multi-valued. On the other hand, since F
has locally closed graph at (x̄, ȳ), we know from Theorem 3F.1 that for such g the
mapping g + F is metrically regular at g(x̄) + ȳ for x̄. Applying Proposition 3G.1
once more, we complete the proof.
In much the same way we can state in terms of strong metric regularity an implicit
function result paralleling Theorem 2B.5.
Theorem 3G.4 (implicit function theorem with strong metric regularity). For the
generalized equation f (p, x) + F(x) ∋ 0 with f : IRd × IRn → IRm and F : IRn → → IRm
and its solution mapping

S : p → { x | f (p, x) + F(x) ∋ 0 },
consider a pair ( p̄, x̄) with x̄ ∈ S( p̄). Let h : IRn → IRm be a strict estimator of f
with respect to x uniformly in p at ( p̄, x̄) with constant μ and let h + F be strongly
metrically regular at x̄ for 0 with reg (h + F; x̄|0) ≤ κ. Suppose that
Then S has a Lipschitz continuous single-valued localization s around p̄ for x̄, more-
over with

lip (s; p̄) ≤ κ λ /(1 − κ μ ).
Many corollaries of this theorem could be stated in a mode similar to that in
Section 3F, but the territory has already been covered essentially in Chapter 2. We
will get back to this result in Section 5F.
In some situations, metric regularity automatically entails strong metric regula-
rity. That is the case, for instance, for a linear mapping from IRn to itself represented
by an n × n matrix A. Such a mapping is metrically regular if and only if it is sur-
jective, which means that A has full rank, but then A is nonsingular, so that we
have strong metric regularity. More generally, for any mapping which describes the
Karush-Kuhn-Tucker optimality system in a nonlinear programming problem, met-
ric regularity implies strong metric regularity. We will prove this fact in Section
4F.
We will describe now another class of mappings for which metric regularity and
strong metric regularity come out to be the same thing. This class depends on a
localized, set-valued form of the monotonicity concept which appeared in Section
2F in the context of variational inequalities.
Since the metric regularity of F implies through 3E.6 the Aubin property of F −1 at
ȳ for x̄, there exist κ > 0 and a > 0 such that
Then for k large, we have yk , yk + τk hk ∈ IBa (ȳ) and xk ∈ F −1 (yk ) ∩ IBa (x̄), and hence
there exists uk ∈ F −1 (yk + τk hk ) satisfying
⟨uk − zk , yk + τk hk − yk ⟩ ≥ 0.
Calmness. A mapping S : IRm → → IRn is said to be calm at ȳ for x̄ if (ȳ, x̄) ∈ gph S and
there is a constant κ ≥ 0 along with neighborhoods U of x̄ and V of ȳ such that
although perhaps with larger constant κ. The infimum of κ over all such combina-
tions of κ, U and V is called the calmness modulus of S at ȳ for x̄ and denoted by
clm (S; ȳ| x̄). The absence of this property is signaled by clm (S; ȳ| x̄) = ∞.
As in the case of the Lipschitz modulus lip (S; ȳ| x̄) in 3E, it is not claimed that (1)
and (2) are themselves equivalent, although that is true when S(y) is closed for every
y ∈ V . But anyway, the infimum furnishing clm (S; ȳ| x̄) is the same with respect to
(2) as with respect to (1).
In the case when S is not multi-valued, the definition above reduces to the def-
inition of calmness of a function in Section 1C relative to a neighborhood V of ȳ;
clm (S; ȳ, S(ȳ)) = clm (S; ȳ). Indeed, for any y ∈ V \ dom S the inequality (1) holds
automatically.
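To see how calmness can hold while Lipschitz continuity fails, consider an illustration of our own (not from the text): the function s(y) = y sin(1/y), s(0) = 0, is calm at 0 for 0 with constant 1, since |s(y) − s(0)| ≤ |y|, yet no Lipschitz constant works near 0, because difference quotients between pairs of nearby points are unbounded.

```python
import math

# Calmness without Lipschitz continuity (an example of ours, not from the
# text): s(y) = y*sin(1/y), s(0) = 0, is calm at 0 for 0 with constant 1,
# since |s(y) - s(0)| <= |y|, yet no Lipschitz constant works near 0: the
# quotient between the nearby points 1/(k*pi) and 1/(k*pi + pi/2) is about 2k.

def s(y):
    return 0.0 if y == 0.0 else y * math.sin(1.0 / y)

for y in (0.1, 0.01, 3e-3, 1e-6):       # calmness quotient against ȳ = 0
    assert abs(s(y) - s(0.0)) <= abs(y)

def quotient(k):                         # difference quotient between nearby points
    y1, y2 = 1.0 / (k * math.pi), 1.0 / (k * math.pi + math.pi / 2.0)
    return abs(s(y1) - s(y2)) / abs(y1 - y2)

assert quotient(10) > 15.0 and quotient(1000) > 1500.0   # unbounded growth
```

Calmness only compares with the fixed base point ȳ = 0; the Lipschitz (Aubin-type) requirement compares all nearby pairs, which is what fails here.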
Clearly, for mappings with closed values, outer Lipschitz continuity implies
calmness. In particular, we get the following fact from Theorem 3D.1.
Exercise 3H.2 (local outer Lipschitz continuity under truncation). Show that a
mapping S : IRm → → IRn with (ȳ, x̄) ∈ gph S and with S(ȳ) convex is calm at ȳ for
x̄ if and only if there is a neighborhood U of x̄ such that the truncated mapping
y → S(y) ∩U is outer Lipschitz continuous at ȳ.
Guide. Mimic the proof of 3E.3 with y = ȳ.
Is there a “one-point” variant of the metric regularity which would characterize
calmness of the inverse, in the way metric regularity characterizes the Aubin prop-
erty of the inverse? Yes, as we explore next.
The infimum of all κ for which this holds is the modulus of metric subregula-
rity, denoted by subreg (F; x̄| ȳ). The absence of metric subregularity is signaled
by subreg(F; x̄| ȳ) = ∞.
The main difference between metric subregularity and metric regularity is that
the data input ȳ is now fixed and not perturbed to a nearby y. Since d(ȳ, F(x)) ≤
d(ȳ, F(x) ∩V ), it is clear that subregularity is a weaker condition than metric re-
gularity, and
subreg (F; x̄| ȳ) ≤ reg (F; x̄| ȳ).
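A simple function separating the two properties, as an example of our own (not from the text), is f(x) = max(x, 0) at x̄ = 0 for ȳ = 0: it is metrically subregular there with κ = 1, yet not metrically regular, since f⁻¹(y) = ∅ for every y < 0.

```python
# A function that is metrically subregular but not metrically regular (our
# example, not from the text): f(x) = max(x, 0) at x̄ = 0 for ȳ = 0.  Here
# f^{-1}(0) = (-inf, 0], so d(x, f^{-1}(0)) = max(x, 0) = |f(x)| and the
# subregularity estimate holds with kappa = 1; but f^{-1}(y) = ∅ for every
# y < 0, so the Aubin property of f^{-1} -- hence metric regularity -- fails.

def f(x):
    return max(x, 0.0)

def dist_to_solutions(x):          # distance from x to f^{-1}(0) = (-inf, 0]
    return max(x, 0.0)

for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    assert dist_to_solutions(x) <= 1.0 * abs(f(x))      # subregularity, kappa = 1

# no point maps to a negative value: f^{-1}(y) is empty whenever y < 0
assert all(f(i / 100.0) >= 0.0 for i in range(-300, 301))
```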
The following result reveals the equivalence of metric subregularity of a mapping
with calmness of its inverse:
which is (3). This shows that (4) implies (3) and that
inf { κ | U, V, κ satisfying (4) } ≥ inf { κ | U, V, κ satisfying (3) },
Exercise 3H.4 (equivalent formulations). For a mapping F : IRn → → IRm and a point
(x̄, ȳ) ∈ gph F metric subregularity of F at x̄ for ȳ with constant κ is equivalent
simply to the existence of a neighborhood U of x̄ such that
whereas the calmness of F −1 at ȳ for x̄ with constant κ can be identified with the
existence of a neighborhood U of x̄ such that
Guide. Assume that (3) holds with κ > 0 and associated neighborhoods U and V .
We can choose within V a neighborhood of the form V ′ = IBε (ȳ) for some ε > 0.
Let U ′ := U ∩ (x̄ + ε κ IB) and pick x ∈ U ′ . If F(x) ∩V ′ ≠ ∅ then d(ȳ, F(x) ∩V ′ ) =
d(ȳ, F(x)) and (3) becomes (5) for this x. Otherwise, F(x) ∩V ′ = ∅ and then
d(ȳ, F(x)) ≥ ε ≥ (1/κ )|x − x̄| ≥ (1/κ ) d(x, F −1 (ȳ)),
which is (5).
Similarly, (6) entails the calmness in (4), so attention can be concentrated on
showing that we can pass from (4) to (6) under an adjustment in the size of U. We
already know from 3H.3 that the calmness condition in (4) leads to the metric sub-
regularity in (3), and further, from the argument just given, that such subregularity
yields the condition in (5). But that condition can be plugged into the argument in
the proof of 3H.3, by taking V = IRm , to get the corresponding calmness property
with V = IRm but with U replaced by a smaller neighborhood of x̄.
Although we could take (5) as a redefinition of metric subregularity, we prefer
to retain the neighborhood V in (3) in order to underscore the parallel with metric
regularity; similarly for calmness.
Does metric subregularity enjoy stability properties under perturbation resem-
bling those of metric regularity and strong metric regularity? In other words, does
metric subregularity obey the general paradigm of the implicit function theorem?
The answer to this question turns out to be no even for simple functions. Indeed, the
function f (x) = x2 is clearly not metrically subregular at 0 for 0, but its derivative
D f (0), which is the zero mapping, is metrically subregular.
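To see the failure concretely, here is the computation behind this counterexample (our verification sketch, using the distance notation of the text):

```latex
% Failure of metric subregularity for f(x)=x^2 at \bar x=0 for \bar y=0:
f^{-1}(0)=\{0\},\qquad d\bigl(x,f^{-1}(0)\bigr)=|x|,\qquad d\bigl(0,f(x)\bigr)=x^2,
% so the subregularity estimate |x|\le\kappa x^2 would force 1\le\kappa|x|,
% which fails for 0<|x|<1/\kappa, however large \kappa is chosen.
%
% For the derivative Df(0)=0 (the zero mapping), Df(0)^{-1}(0)=\mathbb{R}, hence
d\bigl(x,Df(0)^{-1}(0)\bigr)=0\le\kappa\,d\bigl(0,Df(0)x\bigr)=0\quad\text{for all }x,
% so Df(0) is metrically subregular at 0 for 0 with any constant \kappa\ge 0.
```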
More generally, every linear mapping A : IRn → IRm is metrically subregular, and
hence the derivative mapping of any smooth function is metrically subregular. But of
course, not every smooth function is subregular. For this reason, there cannot be an
implicit mapping theorem in the vein of 3F.8 in which metric regularity is replaced
by metric subregularity, even for the classical case of an equation with smooth f
and no set-valued F.
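The claim about linear mappings can be verified directly; the following sketch chooses the constant through the smallest positive singular value of A (this particular choice is ours, not the text's):

```latex
% Metric subregularity of a linear mapping A: for \bar y=A\bar x we have
% A^{-1}(\bar y)=\bar x+\ker A, and with P the orthogonal projection onto (\ker A)^\perp,
d\bigl(x,A^{-1}(\bar y)\bigr)=|P(x-\bar x)|
\le\frac{1}{\sigma}\,|A(x-\bar x)|=\frac{1}{\sigma}\,|Ax-\bar y|,
% where \sigma>0 is the smallest positive singular value of A, so any \kappa\ge 1/\sigma
% works. (When A=0 the left side is 0 and any \kappa\ge 0 serves.)
```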
An illuminating but more intricate counterexample of instability of metric sub-
regularity of set-valued mappings is as follows. In IR × IR, let gph F be the set of
all (x, y) such that x ≥ 0, y ≥ 0 and yx = 0. Then F −1 (0) = [0, ∞) ⊃ F −1 (y) for all
3 Regularity Properties of Set-valued Solution Mappings 185
It turns out that this property likewise fails to be preserved when f is perturbed
to f + g by a function g with lip (g; x̄) = 0. This is demonstrated by the following
example. Define f : IR² → IR² by taking f(0, 0) = (0, 0) and, for x = (x1, x2) ≠ (0, 0),
f(x1, x2) = ( x1² − x2², 2|x1| x2 ) / √(x1² + x2²).
Then f satisfies (7) at x̄ = 0 with b = a, since | f (x)| = |x|. The function g(x1 , x2 ) =
(0, x2³) has g(0, 0) = (0, 0) and lip (g; (0, 0)) = 0, but (f + g)^{-1}(c, 0) = ∅ when c < 0.
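A quick way to check this example is through polar coordinates (our verification, not part of the text):

```latex
% Write x=(x_1,x_2)=r(\cos\theta,\sin\theta) with r=|x|. For x_1\ge 0
% (\theta\in[-\pi/2,\pi/2]),
f(x)=\frac{1}{r}\bigl(x_1^2-x_2^2,\;2|x_1|x_2\bigr)=r\,(\cos 2\theta,\;\sin 2\theta),
% and f(-x_1,x_2)=f(x_1,x_2), so f doubles the polar angle of the right half-plane
% onto the whole plane and |f(x)|=r=|x|, which gives (7) with b=a.
%
% If (f+g)(x)=(c,0) with c<0, the first coordinate forces x_2^2>x_1^2, so x_2\neq 0;
% but then the second coordinate 2|x_1|x_2/r+x_2^3 has the strict sign of x_2 and
% cannot vanish. Hence (f+g)^{-1}(c,0)=\emptyset for every c<0.
```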
A “metric regularity variant” of the openness property (7), equally failing to be
preserved under small Lipschitz continuous perturbations, as shown by this same
example, is the requirement that d(x̄, F −1 (y)) ≤ κ |y − ȳ| for y close to ȳ. The same
trouble comes up also for an “inner semicontinuity variant,” namely the condition
that there exist neighborhoods U of x̄ and V of ȳ such that F −1 (y) ∩ U = 0/ for all
y ∈ V.
If we consider calmness as a local version of the outer Lipschitz continuity, then it
might seem to be worthwhile to define a local version of inner Lipschitz continuity,
introduced in Section 3D. For a mapping S : IRm → IRn with (ȳ, x̄) ∈ gph S, this would
refer to the existence of neighborhoods U of x̄ and V of ȳ such that
We will not give a name to this property here, or a name to the associated property
of the inverse of a mapping satisfying (8). We will only demonstrate, by an example,
that the property of the inverse associated to (8), similar to metric subregularity, is
not stable under perturbation, in the sense we have been exploring, and hence does
not support the implicit function theorem paradigm.
Consider the mapping S : IR →→ IR whose value is the set of three points
{−√y, 0, √y} for y ≥ 0 and the empty set for y < 0. This mapping has the
property in (8) at ȳ = 0 for x̄ = 0. Now consider the inverse S^{-1} and add to it the
function g(x) = −x², which has zero derivative at x̄ = 0. The sum S^{-1} + g is the
mapping whose value at x = 0 is the interval [0, ∞) but is just {0} for x ≠ 0. The
inverse (S^{-1} + g)^{-1} has (−∞, ∞) as its value for y = 0, but {0} for y > 0 and the empty
set for y < 0. Clearly, this inverse does not have the property displayed in (8) at ȳ = 0
for x̄ = 0.
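The values involved can be tabulated explicitly (our computation, following the text's definitions):

```latex
% S(y)=\{-\sqrt{y},\,0,\,\sqrt{y}\} for y\ge 0 and S(y)=\emptyset for y<0. Inverting:
S^{-1}(x)=\begin{cases}[0,\infty) & x=0,\\ \{x^2\} & x\neq 0,\end{cases}
\qquad
(S^{-1}+g)(x)=\begin{cases}[0,\infty) & x=0,\\ \{0\} & x\neq 0,\end{cases}
% since g(x)=-x^2 cancels the value x^2. Inverting once more:
(S^{-1}+g)^{-1}(y)=\begin{cases}(-\infty,\infty) & y=0,\\ \{0\} & y>0,\\
\emptyset & y<0.\end{cases}
% The empty values for y<0 arbitrarily close to \bar y=0 are what rule out the
% property in (8) for this inverse.
```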
It should be noted that for special cases of mappings with particular perturbations
one might still obtain stability of metric subregularity, or of the property associated
to (8), but we shall not go into this further.
Isolated calmness. A mapping S : IRm →→ IRn is said to have the isolated calmness
property if it is calm at ȳ for x̄ and, in addition, S has a graphical localization at ȳ
for x̄ that is single-valued at ȳ itself (with value x̄). Specifically, this refers to the
existence of a constant κ ≥ 0 and neighborhoods U of x̄ and V of ȳ such that
S(y) ∩ U ⊂ x̄ + κ|y − ȳ|IB for all y ∈ V.
Observe that in this definition S(ȳ) ∩U is a singleton, namely the point x̄, so x̄ is
an isolated point in S(ȳ), hence the terminology. Isolated calmness can equivalently
be defined as the existence of a (possibly slightly larger) constant κ and neighbor-
hoods U of x̄ and V of ȳ such that
For a linear mapping A, isolated calmness holds at every point, whereas isolated
calmness of A−1 holds at some point of dom A−1 if and only if A is nonsingular.
More generally we have the following fact through Theorem 3D.1 for polyhedral
mappings, as defined there.
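For the linear case the two assertions can be checked directly (a sketch under this finite-dimensional setting, with nonsingularity read as ker A = {0}):

```latex
% A itself is single-valued with |Ax-A\bar x|\le\|A\|\,|x-\bar x| and A(\bar x) the
% singleton \{A\bar x\}, so isolated calmness holds at every point with
% \kappa=\|A\|.
%
% For the inverse: A^{-1}(\bar y)=\bar x+\ker A whenever A\bar x=\bar y, and \bar x
% is isolated in this affine set exactly when \ker A=\{0\}. In that case
% \sigma_{\min}(A)>0 and
|x-\bar x|\le\frac{1}{\sigma_{\min}(A)}\,|Ax-A\bar x|
=\frac{1}{\sigma_{\min}(A)}\,|y-\bar y|\quad\text{for }x\in A^{-1}(y),
% which is isolated calmness of A^{-1} at \bar y for \bar x.
```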
The infimum of all κ such that the inclusion (4) holds for some U and V equals
subreg(F; x̄ | ȳ).
Proof. Assume first that F is strongly subregular at x̄ for ȳ. Let κ > subreg (F; x̄| ȳ).
Then there are neighborhoods U for x̄ and V for ȳ such that (3) holds with the
indicated κ. Consider any y ∈ V. If F^{-1}(y) ∩ U = ∅, then (4) holds trivially. If not,
let x ∈ F −1 (y) ∩ U. This entails y ∈ F(x) ∩ V , hence d(ȳ, F(x) ∩ V ) ≤ |y − ȳ| and
consequently |x − x̄| ≤ κ |y − ȳ| by (3). Thus, x ∈ x̄ + κ |y − ȳ|IB, and we conclude
that (4) holds. Also, we see that subreg(F; x̄| ȳ) is not less than the infimum of all κ
such that (4) holds for some choice of U and V .
For the converse, suppose (4) holds for some κ and neighborhoods U and V .
Consider any x ∈ U. If F(x) ∩ V = ∅ the right side of (3) is ∞ and there is nothing
more to prove. If not, for an arbitrary y ∈ F(x) ∩ V we have x ∈ F −1 (y) ∩ U, and
therefore x ∈ x̄ + κ |y − ȳ|IB by (4), which means |x − x̄| ≤ κ |y − ȳ|. This being true
for all y ∈ F(x) ∩ V , we must have |x − x̄| ≤ κ d(ȳ, F(x) ∩ V ). Thus, (3) holds, and
in particular we have κ ≥ subreg(F; x̄| ȳ). Therefore, the infimum of κ in (4) equals
subreg(F; x̄| ȳ).
Observe also, through 3H.4, that the neighborhood V in (2) and (3) can be chosen
to be the entire space IRm, by adjusting the size of U; that is, strong metric subregularity
as in (3) with constant κ is equivalent to the existence of a neighborhood U of x̄ such that
(5) |x − x̄| ≤ κ d(ȳ, F(x)) for all x ∈ U,
and the inclusion in (4) with constant κ correspondingly reduces to the existence of a
neighborhood U of x̄ such that
(6) F^{-1}(y) ∩ U ⊂ x̄ + κ|y − ȳ|IB for all y ∈ IRm.
Exercise 3I.3. Provide direct proofs of the equivalence of (3) and (5), and (4) and
(6), respectively.
Guide. Use the argument in the proof of 3H.4.
Similarly to the distance function characterization in Theorem 3E.8 for the Aubin
property, the isolated calmness property is characterized by uniform calmness of the
distance function associated with the inverse mapping:
Consider the function s(y, x) = d(x, F −1 (y)). Then the mapping F is strongly met-
rically subregular at x̄ for ȳ if and only if s is calm with respect to y uniformly in x
at (ȳ, x̄), in which case
Proof. Let F be strongly metrically subregular at x̄ for ȳ and let κ > subreg (F; x̄| ȳ).
Let (5) and (6) hold with U = IBa(x̄) and also F^{-1}(ȳ) ∩ IBa(x̄) = {x̄}. Let b > 0 be
such that, according to (7), F^{-1}(y) ∩ IBa(x̄) ≠ ∅ for all y ∈ IBb(ȳ). Make b smaller
if necessary so that b ≤ a/(10κ). Choose y ∈ IBb(ȳ) and x ∈ IBa/4(x̄); then from (6)
we have
Since all points of F^{-1}(ȳ) other than x̄ lie at distance greater than a/4 from x, we obtain
and consequently
Therefore,
According to (6),
Since x and y were arbitrarily chosen in dom s and close to x̄ and ȳ, respectively, we
obtain by combining (11) and (14) that clm_y(s; (ȳ, x̄)) ≤ κ, hence
To show the converse inequality, let κ > clm_y(s; (ȳ, x̄)); then there exists a > 0
such that s(·, x) is calm on IBa(ȳ) with constant κ uniformly in x ∈ IBa(x̄). Adjust a so
that F^{-1}(ȳ) ∩ IBa(x̄) = {x̄}. Pick any x ∈ IBa/3(x̄). If F(x) = ∅, (5) holds automatically.
If not, choose any y ∈ IRm such that (x, y) ∈ gph F. Since s(y, x) = 0, we have
Since y is arbitrarily chosen in F(x), this gives us (5). This means that F is strongly
subregular at x̄ for ȳ with constant κ and hence
Theorem 3I.6 (inverse mapping theorem for strong metric subregularity). Consider
a mapping F : IRn →→ IRm and a point (x̄, ȳ) ∈ gph F such that F is strongly metrically
subregular at x̄ for ȳ and let κ and μ be nonnegative constants such that
Then for any function g : IRn → IRm with x̄ ∈ dom g and clm (g; x̄) ≤ μ, one has
subreg(g + F; x̄ | g(x̄) + ȳ) ≤ κ/(1 − κμ).
Proof. Choose κ and μ as in the statement of the theorem and let λ > κ , ν > μ be
such that λ ν < 1. Pick g : IRn → IRm with clm (g; x̄) < ν . Without loss of generality,
let g(x̄) = 0; then there exists a > 0 such that
Since subreg (F; x̄| ȳ) < λ , we can arrange, by taking a smaller if necessary, that
(17) |x − x̄| ≤ λ |y − ȳ| when (x, y) ∈ gph F ∩ (IBa (x̄) × IBa (ȳ)).
These relations entail z ∈ g(x) + F(x), hence z = y + g(x) for some y ∈ F(x). From
(16) and since x ∈ IB_{a/(2ν)}(x̄), we have |g(x)| ≤ μ(a/(2ν)) ≤ a/2 (inasmuch as ν ≥
μ). Using the equality y − ȳ = z − g(x) − ȳ we get |y − ȳ| ≤ |z − ȳ| + |g(x)| ≤ (a/2) +
(a/2) = a. However, because (x, y) ∈ gph F ∩ (IBa(x̄) × IBa(ȳ)), through (17),
hence |x − x̄| ≤ (λ/(1 − λν)) |z − ȳ|. Since x and z are chosen as in (18) and λ and ν
could be arbitrarily close to κ and μ, respectively, the proof is complete.
Corollaries that parallel those for metric regularity given in Section 3F can im-
mediately be derived.
subreg(F; x̄| ȳ) · clm (g; x̄) < 1. Then the mapping g + F is strongly metrically sub-
regular at x̄ for ȳ, and one has
subreg(g + F; x̄ | g(x̄) + ȳ) ≤ ( subreg(F; x̄ | ȳ)^{-1} − clm(g; x̄) )^{-1}.
If subreg(F; x̄ | ȳ) = 0, then subreg(g + F; x̄ | g(x̄) + ȳ) = 0 for any g : IRn → IRm with
clm(g; x̄) < ∞. If subreg(F; x̄ | ȳ) = ∞, then subreg(g + F; x̄ | g(x̄) + ȳ) = ∞ for any
g : IRn → IRm with clm(g; x̄) = 0.
This result implies in particular that the property of strong metric subregularity is
preserved under perturbations with zero calmness moduli. The only difference from
the corresponding results for metric regularity in Section 3F is that now a larger
class of perturbations is allowed, with first-order approximations replacing the strict
first-order approximations.
x̄ ∈ int dom f ∩ int dom g which are first-order approximations to each other at x̄.
Then the mapping f + F is strongly metrically subregular at x̄ for f (x̄) + ȳ if and
only if g + F is strongly metrically subregular at x̄ for g(x̄) + ȳ, in which case
This corollary takes a more concrete form when the first-order approximation is
represented by a linearization:
Then M is strongly metrically subregular at x̄ for ȳ if and only if M0 has this property.
Moreover subreg(M; x̄ | ȳ) = subreg(M0 ; x̄| ȳ).
Through 3I.1, the result in Corollary 3I.9 could equally well be stated in terms of
the isolated calmness property of M−1 in relation to that of M0−1 . We can specialize
that result in the following way.
Corollary 3I.10 (linearization with polyhedrality). Let M : IRn →→ IRm with ȳ ∈ M(x̄)
be of the form M = f + F for f : IRn → IRm and F : IRn →→ IRm such that f is differentiable
at x̄ and F is polyhedral. Let M0(x) = f(x̄) + ∇f(x̄)(x − x̄) + F(x). Then M^{-1}
has the isolated calmness property at ȳ for x̄ if and only if x̄ is an isolated point of
M0−1 (ȳ).
Proof. This applies 3I.1 in the framework of the isolated calmness restatement of
3I.9 in terms of the inverses.
Applying Corollary 3I.9 to the case where F is the zero mapping, we obtain yet
another inverse function theorem in the classical setting:
Corollary 3I.11 (an inverse function result). Let f : IRn → IRm be differentiable at
x̄ and such that ker ∇ f (x̄) = {0}. Then there exist κ > 0 and a neighborhood U of
x̄ such that
|x − x̄| ≤ κ | f (x) − f (x̄)| for every x ∈ U.
Proof. This comes from (5).
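As a simple illustration of this corollary (our example), take f : IR → IR², f(x) = (x, x²) at x̄ = 0:

```latex
% \nabla f(0)=\binom{1}{0}, so \ker\nabla f(0)=\{0\}; and indeed
|f(x)-f(0)|=\sqrt{x^2+x^4}\ \ge\ |x|,
% so |x-\bar x|\le\kappa\,|f(x)-f(\bar x)| holds with \kappa=1, and here the
% neighborhood U can even be taken to be all of \mathbb{R}.
```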
Next, we state and prove an implicit function theorem for strong metric subregu-
larity:
Theorem 3I.12 (implicit mapping theorem with strong metric subregularity). For
the generalized equation f(p, x) + F(x) ∋ 0 and its solution mapping
S : p ↦ { x | f(p, x) + F(x) ∋ 0 },
consider a pair ( p̄, x̄) with x̄ ∈ S( p̄). Let h : IRn → IRm be an estimator of f with
respect to x at ( p̄, x̄) with constant μ and let h + F be strongly metrically subregular
at x̄ for 0 with subreg(h + F; x̄|0) ≤ κ. Suppose that
Then S has the isolated calmness property at p̄ for x̄, moreover with
clm(S; p̄ | x̄) ≤ κλ/(1 − κμ).
Proof. The proof goes along the lines of the proof of Theorem 3I.6 with a different
choice of constants. Let κ, μ and λ be as required and let δ > κ and ν > μ be such
that δ ν < 1. Let γ > λ . By the assumptions for the mapping h + F and the functions
f and h, there exist positive scalars a and r such that
(20) |x − x̄| ≤ δ |y| for all x ∈ (h + F)−1 (y) ∩ IBa (x̄) and y ∈ IBν a+γ r (0),
(21) | f (p, x) − f ( p̄, x)| ≤ γ |p − p̄| for all p ∈ IBr ( p̄) and x ∈ IBa (x̄),
(22) |e(p, x) − e(p, x̄)| ≤ ν |x − x̄| for all x ∈ IBa (x̄) and p ∈ IBr ( p̄).
Let x ∈ S(p) ∩ IBa (x̄) for some p ∈ IBr ( p̄). Then, since h(x̄) = f ( p̄, x̄), we obtain
from (21) and (22) that
Observe that x ∈ (h + F)−1 (− f (p, x) + h(x)) ∩ IBa (x̄), and then from (20) and (23)
we have
|x − x̄| ≤ δ | − f (p, x) + h(x)| ≤ δ ν |x − x̄| + δ γ |p − p̄|.
In consequence,
|x − x̄| ≤ (δγ/(1 − δν)) |p − p̄|.
Since δ is arbitrarily close to κ , ν is arbitrarily close to μ and γ is arbitrarily close
to λ , we arrive at the desired result.
In the theorem we state next, we can get away with a property of f at ( p̄, x̄)
which is weaker than local continuous differentiability, namely a kind of uniform
differentiability we introduced in Section 1C. We say that f (p, x) is differentiable
in x uniformly with respect to p at ( p̄, x̄) if f is differentiable with respect to (p, x)
at ( p̄, x̄) and for every ε > 0 there is a (p, x)-neighborhood of ( p̄, x̄) in which
then the converse implication holds as well: the mapping h + F is strongly metrically
subregular at x̄ for 0 provided that S has the isolated calmness property at p̄ for x̄.
Proof. With this choice of h, the assumption (19) of 3I.12 holds and then (24) fol-
lows from the conclusion of this theorem. To handle the ample parameterization we
employ Lemma 2C.1 by repeating the argument in the proof of 3F.10, simply re-
placing the composition rule there with the one in the following proposition.
the form
y ↦ N(y) = { x | x ∈ M(ψ(x, y)) } for y ∈ IRm.
Let ψ satisfy
and let (ψ (x̄, 0), x̄) ∈ gph M . If M has the isolated calmness property at ψ (x̄, 0) for
x̄, then N has the isolated calmness property at 0 for x̄.
Proof. Let M have the isolated calmness property with neighborhoods IBb (x̄), IBc ( p̄)
and constant κ > clm (M; p̄ | x̄), where p̄ = ψ (x̄, 0). Choose λ > 0 with λ < 1/κ and
a > 0 such that for any y ∈ IBa(0) the function ψ(·, y) is calm on IBb(x̄) with calmness
constant λ. Pick γ > clm_y(ψ; (x̄, 0)) and make a and b smaller if necessary so that
the function ψ(x, ·) is calm on IBa(0) with constant γ and also
(26) λ b + γ a ≤ c.
Let y ∈ IBa (0) and x ∈ N(y) ∩ IBb (x̄). Then x ∈ M(ψ (x, y)) ∩ IBb (x̄). Using the as-
sumed calmness properties (25) of ψ and utilizing (26) we see that
hence
|x − x̄| ≤ (κγ/(1 − κλ)) |y|.
This establishes that the mapping N has the isolated calmness property at 0 for x̄
with constant κγ /(1 − κλ ).
Then, from 3I.13 we obtain that if the solution mapping for (28) has the isolated
calmness property at ȳ = f ( p̄, x̄) − Ax̄ for x̄, then the solution mapping for (27)
has the isolated calmness property at p̄ for x̄. Under the ample parameterization
condition, rank ∇ p f ( p̄, x̄) = n, the converse implication holds as well.
Commentary
The inner and outer limits of sequences of sets were introduced by Painlevé in
his lecture notes as early as 1902 and later popularized by Hausdorff [1927] and
Kuratowski [1933]. The definition of excess was first given by Pompeiu [1905],
who also defined the distance between sets C and D as e(C, D) + e(D,C). Hausdorff
[1927] gave the definition we use here. These two definitions are equivalent in the
sense that they induce the same convergence of sets. The reader can find much more
about set-convergence and continuity properties of set-valued mappings together
with extended historical commentary in Rockafellar and Wets [1998]. This includes
the reason why we prefer “inner and outer” in contrast to the more common terms
“lower and upper,” so as to avoid certain conflicts in definition that unfortunately
pervade the literature.
Theorem 3B.4 is a particular case of a result sometimes referred to as the Berge
theorem; see Section 8.1 in Dontchev and Zolezzi [1993] for a general statement.
Theorem 3C.3 comes from Walkup and Wets [1969], while the Hoffman lemma,
3C.4, is due to Hoffman [1952].
The concept of outer Lipschitz continuity was introduced by Robinson [1981]
under the name “upper Lipschitz continuity” and adjusted to “outer Lipschitz con-
tinuity” later in Robinson [2007]. Theorem 3D.1 is due to Robinson [1981] while
3D.3 is a version, given in Robinson [2007], of a result due to Wu Li [1994].
The Aubin property of set-valued mappings was introduced by J.-P. Aubin
[1984], who called it “pseudo-Lipschitz continuity”; it was renamed after Aubin in
Dontchev and Rockafellar [1996]. In the literature one can also find it termed “Aubin
continuity,” but we do not use that here since the Aubin property does not imply con-
tinuity. Theorem 3E.3 is from Bessis, Ledyaev and Vinter [2001]. The name “metric
regularity” was coined by J. M. Borwein [1986a], but the origins of this concept go
back to the Banach open mapping theorem and even earlier. In the literature, metric
regularity is defined in various ways, for example in Schirotzek [2007] the property
expressed in 3E.6 is called weak metric regularity; see e.g. Mordukhovich [2006]
for other names. Theorem 3E.4 is from Rockafellar [1985]. Theorem 3E.9 comes
from Ledyaev and Zhu [1999]. For historical remarks regarding inverse and implicit
mapping theorems with metric regularity, see the commentary to Chapter 5.
As we mentioned earlier in Chapter 2, the term “strong regularity” comes from
Robinson [1980], who used it in the framework of variational inequalities. Theorem
3F.5 is a particular case of a more general result due to Kenderov [1975]; see also
Levy and Poliquin [1997].
Calmness and metric subregularity, as well as isolated calmness and metric sub-
regularity, have been considered in various contexts and under various names in
the literature; here we follow the terminology of Dontchev and Rockafellar [2004].
Isolated calmness was formally introduced in Dontchev [1995a], where its stability
(Theorem 3I.6) was first proved. The equivalent property of strong metric subregu-
larity was considered earlier, without giving it a name, by Rockafellar [1989]; see
also the commentary to Chapter 4.
Chapter 4
Regularity Properties Through Generalized
Derivatives
A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View 197
from Variational Analysis, Springer Monographs in Mathematics,
DOI 10.1007/978-0-387-87821-8 4, c Springer Science+Business Media, LLC 2009
Described as an outer limit in this way, it is clear in particular that TC (x) is always
a closed set. When C is a “smooth manifold” in IRn , TC (x) is the usual tangent
subspace, but in general, of course, TC (x) need not even be convex. The tangent
cone mapping TC has dom TC = C but gph TC is not necessarily a closed subset of
IRn × IRn even when C is closed.
As noted in 2A.4, when the set C is convex, the tangent cone TC (x) is also convex
for any x ∈ C. In this case the limsup in (1) can be replaced by lim, as shown in the
following proposition.
Proposition 4A.1 (tangent cones to convex sets). For a convex set C ⊂ IRn and a
point x ∈ C,
Now let v ∈ KC (x). Then v = (x̃ − x)/τ for some τ > 0 and x̃ ∈ C. Take an arbitrary
sequence τk ↓ 0 as k → ∞. Since C is convex, we have
x + τk v = (1 − τk/τ) x + (τk/τ) x̃ ∈ C for all k.
But then v ∈ (C − x)/τk for all k and hence v ∈ lim infk τk−1 (C − x). Since τk was
arbitrarily chosen, we conclude that
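For instance, with the convex set C = IR²₊ the limit in the proposition can be computed explicitly (our worked example):

```latex
% At x=(0,1)\in C=\mathbb{R}^2_+: for c=(c_1,c_2)\in C and \lambda\ge 0,
% \lambda(c-x)=(\lambda c_1,\ \lambda(c_2-1)) sweeps out
K_C(x)=\{(v_1,v_2)\mid v_1\ge 0\}=\mathbb{R}_+\times\mathbb{R},
% which is already closed, so T_C(x)=\mathbb{R}_+\times\mathbb{R}: only the active
% constraint x_1\ge 0 restricts the tangent directions.
% At the corner x=(0,0) the same computation gives T_C(x)=\mathbb{R}^2_+=C.
```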
Graphical derivatives. For a mapping F : IRn → → IRm and a pair (x, y) with y ∈ F(x),
the graphical derivative of F at x for y is the mapping DF(x|y) : IRn → → IRm whose
graph is the tangent cone Tgph F (x, y) to gph F at (x, y):
Proposition 4A.2 (sum rule). For a function f : IRn → IRm which is differentiable at
x, a set-valued mapping F : IRn →→ IRm and any y ∈ F(x), one has
ferentiability of f ,
(4) f(x) − K ∋ y,
for a function f : IRn → IRm , a set K ⊂ IRm and a parameter vector y, and let x be a
solution of (4) for y at which f is differentiable. Then for the mapping
one has
Detail. This applies the sum rule to the case of a constant mapping F ≡ −K,
for which the definition of the graphical derivative gives DF(x|z) = T−K (z) =
−TK (−z).
In the special but important case of Example 4A.3 in which K = IRs− × {0}m−s
with f = ( f1 , . . . , fm ), the constraint system (4) with respect to y = (y1 , . . . , ym ) takes
the form
fi(x) ≤ yi for i = 1, . . . , s,
fi(x) = yi for i = s + 1, . . . , m.
The graphical derivative formula (5) says then that a vector v = (v1 , . . . , vm ) is in
DG(x|y)(u) if and only if
D fi(x)u ≤ vi for i ∈ [1, s] with fi(x) = yi,
D fi(x)u = vi for i = s + 1, . . . , m.
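As an illustration (our example), take n = 2, m = 2, s = 1 with f(x) = (x1² + x2², x1 − x2), and the point x = (√2/2, √2/2) with y = (1, 0), where the inequality constraint is active:

```latex
% f_1(x)=x_1^2+x_2^2=1=y_1 (active) and f_2(x)=x_1-x_2=0=y_2, so v\in DG(x|y)(u) iff
\sqrt{2}\,(u_1+u_2)\le v_1\quad\text{and}\quad u_1-u_2=v_2,
% since Df_1(x)u=2x_1u_1+2x_2u_2=\sqrt{2}(u_1+u_2) and Df_2(x)u=u_1-u_2.
% Had the inequality been inactive, f_1(x)<y_1, the first relation would impose
% no restriction on v_1.
```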
one has
Detail. From the sum rule in 4A.2 we have DG(x|y)u = D f (x)u + DNC (x|v)(u).
According to Lemma 2E.4 (the reduction lemma for normal cone mappings to poly-
hedral convex sets), for any (x, v) ∈ gph NC there exists a neighborhood O of the
origin in IRn × IRn such that for (x , v ) ∈ O one has
This reveals in particular that the tangent cone to gph NC at (x, v) is just gph NKC (x,v) ,
or in other words, that DNC (x|v) is the normal cone mapping NKC (x,v) . Thus we have
(7).
Because graphical derivative mappings are positively homogeneous, general
properties of positively homogeneous mappings can be applied to them. Norm con-
cepts are available in particular for capturing quantitative characteristics.
Outer and inner norms. For any positively homogeneous mapping H : IRn →→ IRm,
the outer norm and the inner norm are defined, respectively, by
(8) |H|⁺ = sup_{|x|≤1} sup_{y∈H(x)} |y| and |H|⁻ = sup_{|x|≤1} inf_{y∈H(x)} |y|.
Moreover,
(10) |H|⁻ < ∞ =⇒ dom H = IRn;
If H has closed and convex graph, then the implication (10) becomes equivalence:
−
(14) |H| < ∞ ⇐⇒ dom H = IRn
H(x) = Ax − K
for a linear mapping A : IRn → IRm and a closed convex cone K ⊂ IRm . Then H is
positively homogeneous with closed and convex graph. Moreover
and
and
(18) |H^{-1}|⁺ < ∞ ⇐⇒ [ Ax − K ∋ 0 =⇒ x = 0 ].
Proof. Formula (15) follows from the definition (8) while (16) comes from (14)
applied to this case. Formula (17) follows from (12) while (18) is the specification
of (13).
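A tiny instance (our example) shows how these formulas play out: let A = (1 0) : IR² → IR and K = IR₋, so H(x) = Ax − K = [x1, ∞):

```latex
% Inner norm: for |x|\le 1, \inf_{y\in[x_1,\infty)}|y|=\max\{x_1,0\}, so
|H|^{-}=\sup_{|x|\le 1}\max\{x_1,0\}=1<\infty,
% consistent with \operatorname{dom}H=\mathbb{R}^2 (H(x)\neq\emptyset for every x).
% Outer norm: the condition in (18) fails, since Ax-K\ni 0 means x_1\le 0,
% which admits x=(-1,0)\neq 0; accordingly |H^{-1}|^{+}=\infty.
```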
We will come back to the general theory of positively homogeneous mappings
and their norms in Section 5A. In the meantime there will be applications to the case
of derivative mappings.
Some properties of the graphical derivatives of convex-valued mappings under
Lipschitz continuity are displayed in the following exercise.
and in particular,
Then use the convexity of the values of F as in the proof of Proposition 4A.1 to
show that liminf in (21) can be replaced by lim and use this to obtain convexity of
DF(x|y)(u) from the convexity of F(x + τ u). Lastly, to show (20) apply 4A.1 to
(19) in the case u = 0.
Thus, F is metrically regular at x̄ for ȳ if and only if the right side of (1) is finite.
The proof of Theorem 4B.1 will be furnished later in this section. Note that in the
case when m ≤ n and F is a function f which is differentiable on a neighborhood of
x̄, the representation of the regularity modulus in (1) says that f is metrically regular
precisely when the Jacobians ∇ f (x) for x near x̄ are of full rank and the inner norms
of their inverses ∇ f (x)−1 are uniformly bounded. This holds automatically when f
is continuously differentiable around x̄ with ∇ f (x̄) of full rank, in which case we get
not only metric regularity but also existence of a continuously differentiable local
selection of f −1 , as in 1F.3. When m = n this becomes nonsingularity and we come
to the classical inverse function theorem.
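For a smooth single-valued instance (our example), take f : IR² → IR, f(x) = x1 + sin x2:

```latex
% \nabla f(x)=(1,\ \cos x_2) has full rank 1 everywhere, and for a surjective matrix A
% the inner norm of the inverse is |A^{-1}|^{-}=1/\sigma_{\min}(A), so here
|\nabla f(x)^{-1}|^{-}=\frac{1}{\sqrt{1+\cos^2 x_2}}\ \le\ 1.
% These inner norms are uniformly bounded near any \bar x, so by (1) f is metrically
% regular at every point, with \operatorname{reg}(f;\bar x\,|\,f(\bar x))\le 1.
```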
Also to be kept in mind here is the connection between metric regularity and the
Aubin property in 3E.6. This allows Theorem 4B.1 to be formulated equivalently as
a statement about that property of the inverse mapping.
Theorem 4B.2 (derivative criterion for the Aubin property). For a mapping F :
IRn →→ IRm, consider the inverse mapping S = F^{-1} or, equivalently, the solution
mapping S : IRm →→ IRn for the generalized equation F(x) ∋ y:
S(y) = { x | F(x) ∋ y }.
Fix ȳ and any x̄ ∈ S(ȳ), and suppose that gph F or, equivalently, gph S, is locally
closed at (x̄, ȳ). Then
(2) lip(S; ȳ | x̄) = limsup_{(y,x)→(ȳ,x̄), (y,x)∈gph S} |DS(y|x)|⁻.
Thus, S has the Aubin property at ȳ for x̄ if and only if the right side of (2) is finite.
Note that 4B.2 can be stated for the mapping S alone, without referring to it as
an inverse or a solution mapping.
Solution mappings of much greater generality can also be handled with these
ideas. For this, we return to the framework introduced briefly at the end of Section
3E and delve into it much further. We consider the parameterized relation
Theorem 4B.3 (solution mapping estimate). For the generalized equation (3) and
its solution mapping S in (4), let x̄ ∈ S( p̄), so that ( p̄, x̄, 0) ∈ gph G. Suppose that
gph G is locally closed at ( p̄, x̄, 0) and that the distance mapping p → d(0, G(p, x̄))
is upper semicontinuous at p̄. Then for every c ∈ (0, ∞) satisfying
Proof. Let c satisfy (5). Then there exists η > 0 such that
(7) for every (p, x, y) ∈ gph G with |p − p̄| + max{|x − x̄|, c|y|} ≤ 2η
and for every v ∈ IRm, there exists u ∈ Dx G(p, x | y)^{-1}(v) with |u| ≤ c|v|.
Lemma 4B.4 (intermediate estimate). For c and η as above, let ε > 0 and s > 0 be
such that
Then for every y ∈ IBs (ν ) there exists x̂ with y ∈ G(p, x̂) such that
(11) |x̂ − ω| ≤ (1/ε) |y − ν|.
In the proof of the lemma we apply a fundamental result in variational analysis,
which is stated next:
and
f(u_δ) < f(u) + δ ρ(u, u_δ) for every u ∈ X, u ≠ u_δ.
Proof of Lemma 4B.4. On the product space Z := IRn × IRm we introduce the norm
which is equivalent to the Euclidean norm. Pick ε , s and (p, ω , ν ) ∈ gph G as re-
quired in (9) and (10) and let y ∈ IBs (ν ). By (8) the set
E_p := { (x, y) | (p, x, y) ∈ gph G, |p − p̄| + ‖(x, y) − (x̄, 0)‖ ≤ 2η } ⊂ IRn × IRm
is closed, hence, equipped with the metric induced by the norm in question, it is a
complete metric space. The function Vp : E p → IR defined by
and
(14) Vp (x̂, ŷ) ≤ Vp (x, y) + ε (x, y) − (x̂, ŷ) for every (x, y) ∈ E p .
and
(16) |y − ŷ| ≤ |y − y| + ε (x, y) − (x̂, ŷ) for every (x, y) ∈ E p .
This gives us
that is,
|y − ŷ| ≤ |vk − (y − ŷ)| + ε ‖(uk, vk)‖.
Passing to the limit as k → ∞ leads to |y − ŷ| ≤ ε ‖(u, y − ŷ)‖ and then, taking
into account the second relation in (19), we conclude that |y − ŷ| ≤ ε c|y − ŷ|. Since
ε c < 1 by (9), the only possibility here is that y = ŷ. But then y ∈ G(p, x̂) and (17)
yields (11). This proves the lemma.
We continue now with the proof of Theorem 4B.3. Let τ = η /(4c). Since the
function p → d(0, G(p, x̄)) is upper semicontinuous at p̄, there exists a positive δ ≤
cτ such that d(0, G(p, x̄)) ≤ τ /2 for all p with |p − p̄| < δ . Set V := IBδ ( p̄), U :=
IBcτ (x̄) and pick any p ∈ V and x ∈ U. We can find y such that y ∈ G(p, x̄) with
|y| ≤ d(0, G(p, x̄)) + τ /3 < τ . Note that
Choose ε > 0 such that 1/2 < ε c < 1 and let s = εη . Then s > τ . We apply
Lemma 4B.4 with the indicated ε and s, and with (p, ω , ν ) = (p, x̄, y) which, as
seen in (20), satisfies (10), and with y = 0, since 0 ∈ IBs (y). Thus, there exists x̂
such that 0 ∈ G(p, x̂), that is, x̂ ∈ S(p), and also, from (11), |x̂ − x̄| ≤ |y|/ε . There-
fore, in view of the choice of y, we have x̂ ∈ IBτ /ε (x̄). We now consider two cases.
C ASE 1. d(0, G(p, x)) ≥ 2τ . We just proved that there exists x̂ ∈ S(p) with x̂ ∈
IBτ /ε (x̄); then
and then, by (8), the nonempty set G(p, x) ∩ 2τ IB is closed. Hence, there exists ỹ ∈
G(p, x) such that |ỹ| = d(0, G(p, x)) < 2τ and therefore
c|ỹ| < 2cτ = η/2.
We conclude that the point (p, x, ỹ) ∈ gph G satisfies
|p − p̄| + max{|x − x̄|, c|ỹ|} ≤ δ + max{cτ, η/2} ≤ η.
Thus, the assumptions of Lemma 4B.4 hold for (p, ω , ν ) = (p, x, ỹ), s = 2τ , and
y = 0. Hence there exists x̃ ∈ S(p) such that
|x̃ − x| ≤ (1/ε) |ỹ|.
Then, by the choice of ỹ,
d(x, S(p)) ≤ |x − x̃| ≤ (1/ε) |ỹ| = (1/ε) d(0, G(p, x)).
Hence, by (21), for both cases 1 and 2, and therefore for any p in V and x ∈ U, we
have
d(x, S(p)) ≤ (1/ε) d(0, G(p, x)).
Since U and V do not depend on ε , and 1/ε can be arbitrarily close to c, this gives
us (6).
With this result in hand, we can confirm the criterion for metric regularity pre-
sented at the beginning of this section.
Proof of Theorem 4B.1. For short, let dDF denote the right side of (1). We will
start by showing that reg(F; x̄| ȳ) ≤ dDF . If dDF = ∞ there is nothing to prove.
Let dDF < c < ∞. Applying Theorem 4B.3 to G(p, x) = F(x) − p and this c, let-
ting y take the place of p, we have S(y) = F −1 (y) and d(0, G(y, x)) = d(y, F(x)).
Condition (6) becomes the definition of metric regularity of F at x̄ for ȳ = p̄, and
therefore reg (F; x̄| ȳ) ≤ c. Since c can be arbitrarily close to dDF we conclude that
reg (F; x̄| ȳ) ≤ dDF .
We turn now to demonstrating the opposite inequality,
If reg (F; x̄| ȳ) = ∞ we are done. Suppose therefore that F is metrically regular at x̄
for ȳ with respect to a constant κ and neighborhoods U for x̄ and V for ȳ. Then
We know from 3E.1 that V can be chosen so small that F^{-1}(y) ∩ U ≠ ∅ for every
y ∈ V. Pick any y ∈ V and x ∈ F^{-1}(y) ∩ U, and let v ∈ IB. Take a sequence τ^k ↓ 0
such that y^k := y + τ^k v ∈ V for all k. By (23) and the local closedness of gph F at
(x̄, ȳ) there exists x^k ∈ F^{-1}(y + τ^k v) such that
|DF(x|y)^{-1}|⁻ ≤ κ.
Since (y, x) ∈ gph S is arbitrarily chosen near (x̄, ȳ), and κ is independent of this
choice, we conclude that (22) holds and hence we have (1).
We apply Theorem 4B.3 now to obtain, for the implicit mapping result in
Theorem 3E.9, an elaboration in which graphical derivatives provide estimates. Recall
here the definition of lip_p(G; p̄, x̄ | ȳ), the modulus of the partial Aubin property
introduced just before 3E.9.
Theorem 4B.6 (implicit mapping theorem with graphical derivatives). For the
general inclusion (3) and its solution mapping S in (4), let x̄ ∈ S( p̄), so that
( p̄, x̄, 0) ∈ gph G. Suppose that gph G is locally closed at ( p̄, x̄, 0) and that the dis-
tance d(0, G(p, x̄)) depends upper semicontinuously on p at p̄. Assume further that
G has the partial Aubin property with respect to p uniformly in x at ( p̄, x̄), and that
Proof. This just combines Theorem 3E.9 with the estimate now available from
Theorem 4B.3.
Note from Proposition 4A.5 that finiteness in condition (25) necessitates, in par-
ticular, having the range of Dx G(p, x|y) be all of IRm when (p, x, y) is sufficiently
close to ( p̄, x̄, 0) in gph G.
Next we specialize Theorem 4B.6 to the generalized equations we studied in
detail in Chapters 2 and 3, or in other words, to a solution mapping of the type
(27) S(p) = { x | f(p, x) + F(x) ∋ 0 },
where f : IRd × IRn → IRm and F : IRn →→ IRm . In the next two corollaries we take a
closer look at the Aubin property of the solution mapping (27).
Corollary 4B.7 (derivative criterion for generalized equations). For the solution
mapping S in (27), and a pair (p̄, x̄) with x̄ ∈ S(p̄), suppose that lip_p(f; (p̄, x̄)) < ∞.
Then the mapping G(p, x) := f (p, x) + F(x) has the partial Aubin property with
respect to p uniformly in x at ( p̄, x̄) with
(30) lip(S; p̄ | x̄) ≤ λ lip_p(f; p̄ | x̄).
Proof. By definition, the mapping G has (p̄, x̄, 0) ∈ gph G. Let μ > lip_p(f; (p̄, x̄))
and let Q and U be neighborhoods of p̄ and x̄ such that f is Lipschitz continuous
with respect to p ∈ Q uniformly in x ∈ U with Lipschitz constant μ. Let p, p′ ∈ Q,
x ∈ U and y ∈ G(p, x); then y − f(p, x) ∈ F(x) and we have

d(y, G(p′, x)) = d(y − f(p′, x), F(x)) ≤ |f(p, x) − f(p′, x)| ≤ μ|p′ − p|.

Thus,

e(G(p, x), G(p′, x)) ≤ μ|p′ − p|
and hence G has the partial Aubin (actually, Lipschitz) property with respect to
p uniformly in x at ( p̄, x̄) with modulus satisfying (28). The assumptions that
f is differentiable near ( p̄, x̄) and gph F is locally closed at (x̄, − f ( p̄, x̄)) yield
that gph G is locally closed at ( p̄, x̄, 0) as well. Further, observe that the function
p → d(0, G(p, x̄)) = d(− f (p, x̄), F(x̄)) is Lipschitz continuous near p̄ and therefore
upper semicontinuous at p̄. Then we can apply Theorem 4B.6 where, by using the
sum rule 4A.2, the condition (25) comes down to (29) while (26) yields (30).
From Section 3F we know that when the function f is continuously differen-
tiable, the Aubin property of the solution mapping in (27) can be obtained by pass-
ing to the linearized generalized equation, in which case we can also utilize the
ample parameterization condition. Specifically, we have the following result:
Corollary 4B.8 (derivative criterion with continuous differentiability and ample pa-
rameterization). For the solution mapping S in (27), and a pair ( p̄, x̄) with x̄ ∈ S( p̄),
suppose that f is continuously differentiable on a neighborhood of ( p̄, x̄) and that
gph F is locally closed at (x̄, − f(p̄, x̄)). If

(31) lim sup_{(x,y)→(x̄,0), y∈h(x)+F(x)} |(∇_x f(p̄, x̄) + DF(x| y − h(x)))^{-1}|^- ≤ λ < ∞

for h(x) = f(p̄, x̄) + ∇_x f(p̄, x̄)(x − x̄), then S has the Aubin property at p̄ for x̄ with

(32) lip(S; p̄| x̄) ≤ λ |∇_p f(p̄, x̄)|.

Moreover, under the ample parameterization condition

(33) rank ∇_p f(p̄, x̄) = m,

the converse implication holds as well; that is, S has the Aubin property at p̄
for x̄ if and only if condition (31) is satisfied.
4 Regularity Properties Through Generalized Derivatives 213
Proof. According to Theorem 3F.9, the mapping S has the Aubin property at p̄ for
x̄ provided that the linearized mapping

(34) h + F for h(x) = f(p̄, x̄) + ∇_x f(p̄, x̄)(x − x̄)

is metrically regular at x̄ for 0, and the converse implication holds under the ample
parameterization condition (33). Further, according to the derivative criterion for
metric regularity 4B.1, metric regularity of the mapping h + F in (34) is equivalent
to condition (31) and its regularity modulus is bounded by λ . Then the estimate (32)
follows from formula 3F(7) in the statement of 3F.9.
The purpose of the next exercise is to understand what condition (29) means in
the setting of the classical implicit function theorem.
Exercise 4B.9. Consider the solution mapping S in (27) in the case F ≡ 0, that is, S(p) = { x | f(p, x) = 0 },
and a pair (p̄, x̄) with x̄ ∈ S(p̄). Suppose that f is differentiable in a neighborhood
of (p̄, x̄) with Jacobians satisfying

|∇_x f(p, x)^{-1}|^- ≤ λ and |∇_p f(p, x)| ≤ κ for all (p, x) in this neighborhood.

Show that then S has the Aubin property at p̄ for x̄ with constant λκ.
When f is continuously differentiable we can apply Corollary 4B.8, and the
assumptions in 4B.9 can in that case be captured by conditions on the Jacobian
∇ f ( p̄, x̄). Then 4B.8 goes a long way toward the classical implicit function theo-
rem, 1A.1. But Steps 2 and 3 of Proof I of that theorem would afterward need to
be carried out to reach the conclusion that S has a single-valued localization that is
smooth around p̄.
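In the classical single-equation case the constant λκ of 4B.9 can be observed directly. The following sketch uses a hypothetical instance f(p, x) = x³ + x + p (our own choice, not from the text), for which |∇_x f(p, x)^{-1}| ≤ 1 and |∇_p f(p, x)| = 1 near (0, 0), and compares difference quotients of the solution mapping with the bound λκ = 1:

```python
# Hypothetical illustration of Exercise 4B.9: for f(p, x) = x^3 + x + p
# the solution map S(p) = {x : f(p, x) = 0} is single-valued, and its
# Lipschitz modulus at p_bar = 0 is bounded by lambda * kappa with
#   lambda >= sup |d_x f(p, x)^{-1}|  and  kappa >= sup |d_p f(p, x)|
# near (p_bar, x_bar) = (0, 0).

def f(p, x):
    return x**3 + x + p

def solve(p, lo=-2.0, hi=2.0, tol=1e-12):
    # bisection: f(p, .) is strictly increasing, so the root is unique
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(p, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# bounds valid near (0, 0): d_x f = 3x^2 + 1 >= 1, d_p f = 1
lam = 1.0      # |d_x f(p, x)^{-1}| = 1/(3x^2 + 1) <= 1
kappa = 1.0    # |d_p f(p, x)| = 1

worst_ratio = 0.0
for k in range(1, 20):
    p = 1e-3 * k
    ratio = abs(solve(p) - solve(0.0)) / p   # observed Lipschitz quotient
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio <= lam * kappa + 1e-6)
```

The observed quotients stay below λκ, as the exercise predicts; of course a numerical check of this kind illustrates the estimate but does not prove it.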
Applications of Theorem 4B.6 and its corollaries to constraint systems and va-
riational inequalities will be worked out in Sections 4D and 4E. We conclude the
present section with a variant of the graphical derivative formula for the modulus
of metric regularity in Theorem 4B.1, which will be put to use in the numerical
variational analysis of Chapter 6.
Recall that the closed convex hull of a set C ⊂ IRn , which will be denoted by
cl co C, is the smallest closed convex set that contains C.
Convexified graphical derivative. For a mapping F : IRn → → IRm and a pair (x, y)
with y ∈ F(x), the convexified graphical derivative of F at x for y is the mapping
D̃F(x|y) : IRn → → IRm whose graph is the closed convex hull of the tangent cone
T_{gph F}(x, y) to gph F at (x, y):

gph D̃F(x| y) = cl co T_{gph F}(x, y).

Theorem 4B.10. For a mapping F : IR^n →→ IR^m and a pair (x̄, ȳ) ∈ gph F at which gph F is locally closed, one has

reg(F; x̄| ȳ) = lim sup_{(x,y)→(x̄,ȳ), (x,y)∈gph F} |D̃F(x| y)^{-1}|^-.
Proof. Since D̃F(x| y)^{-1}(v) ⊃ DF(x| y)^{-1}(v) for any v ∈ IR^n, we have

lim sup_{(x,y)→(x̄,ȳ), (x,y)∈gph F} |D̃F(x| y)^{-1}|^- ≤ lim sup_{(x,y)→(x̄,ȳ), (x,y)∈gph F} |DF(x| y)^{-1}|^-
and to complete the proof we only need validate the opposite inequality. Choose λ
such that

lim sup_{(x,y)→(x̄,ȳ), (x,y)∈gph F} |D̃F(x| y)^{-1}|^- ≤ λ < ∞.

Then there exists r > 0 such that

(35) sup_{v∈IB} inf_{u∈D̃F(x| y)^{-1}(v)} |u| ≤ λ for all (x, y) ∈ gph F ∩ IB_r(x̄, ȳ),

and such that the set gph F ∩ IB_r(x̄, ȳ) is closed. We will now demonstrate that
(36) sup_{v∈IB} inf_{u∈DF(x| y)^{-1}(v)} |u| ≤ λ for all (x, y) ∈ gph F ∩ int IB_r(x̄, ȳ).
Observe that the point (u∗ , v∗ ) is the unique projection of any point in the open seg-
ment ((u∗ , v∗ ), (w, v)) on gph DF(x|y). We will show that (u∗ , v∗ ) = (w, v), thereby
confirming (36).
By the definition of the graphical derivative DF(x| y), there exist sequences
τ^k ↘ 0, u^k → u* and v^k → v*, such that y + τ^k v^k ∈ F(x + τ^k u^k) for all k. Let (x^k, y^k) be
a point in cl gph F which is closest to (x, y) + (τ^k/2)(u* + w, v* + v). Since (x, y) ∈ gph F
we have

|(x, y) + (τ^k/2)(u* + w, v* + v) − (x^k, y^k)| ≤ (τ^k/2)|(u* + w, v* + v)|,

and consequently
|(x, y) − (x^k, y^k)| ≤ |(x, y) + (τ^k/2)(u* + w, v* + v) − (x^k, y^k)| + (τ^k/2)|(u* + w, v* + v)| ≤ τ^k |(u* + w, v* + v)|.
Thus, for k sufficiently large, we have (x^k, y^k) ∈ int IB_r(x̄, ȳ) and hence (x^k, y^k) ∈
gph F ∩ int IB_r(x̄, ȳ). Setting (ū^k, v̄^k) = (1/τ^k)(x^k − x, y^k − y), we get, from the basic properties of projections, that

(1/2)(u* + w, v* + v) − (ū^k, v̄^k) ∈ [T_{gph F}(x^k, y^k)]* = [gph D̃F(x^k| y^k)]*.
Then, by (35), there exists w^k ∈ λIB such that v ∈ D̃F(x^k| y^k)(w^k) and also

(37) ⟨(u* + w)/2 − ū^k, w^k⟩ + ⟨(v* + v)/2 − v̄^k, v⟩ ≤ 0.
We will show now that (ū^k, v̄^k) converges to (u*, v*) as k → ∞. First observe that

|((u* + w)/2, (v* + v)/2) − (ū^k, v̄^k)| = (1/τ^k) |(x, y) + τ^k ((u* + w)/2, (v* + v)/2) − (x^k, y^k)|
≤ (1/τ^k) |(x, y) + τ^k ((u* + w)/2, (v* + v)/2) − (x, y) − τ^k (u^k, v^k)|
= |((u* + w)/2, (v* + v)/2) − (u^k, v^k)|.
Therefore, since (u^k, v^k) is a bounded sequence, the sequence {(ū^k, v̄^k)} is bounded
too and has a cluster point (ū, v̄) which, since y^k = y + τ^k v̄^k ∈ F(x^k) = F(x + τ^k ū^k),
belongs to gph DF(x| y). Moreover, by the last estimation, the limit (ū, v̄) satisfies

|((u* + w)/2, (v* + v)/2) − (ū, v̄)| ≤ |((u* + w)/2, (v* + v)/2) − (u*, v*)|.

This inequality, together with the fact that (u*, v*) is the unique closest point to
(1/2)(u* + w, v* + v) in gph DF(x| y), implies that (ū, v̄) = (u*, v*).
Up to a subsequence, the sequence of points w^k satisfying (37) converges to some
w̄ ∈ λIB. Passing to the limit in (37) we obtain

⟨(u* + w)/2 − u*, w̄⟩ + ⟨(v* + v)/2 − v*, v⟩ ≤ 0.
Since (w, v) is the unique projection of (u*, v*) on the closed convex set λIB × {v},
we conclude that (u*, v*) = (w, v), which confirms (36) and completes the proof.
Exercise 4B.11 (sum rule for convexified derivatives). For a function f : IR^n → IR^m
which is differentiable at x and a mapping F : IR^n →→ IR^m with y ∈ f(x) + F(x), prove that

D̃(f + F)(x| y) = ∇f(x) + D̃F(x| y − f(x)).
We end this section with yet another proof of the classical inverse function theo-
rem 1A.1. This time it is based on the Ekeland principle given in 4B.5.
Proof of Theorem 1A.1. Without loss of generality, let x̄ = 0, f(x̄) = 0. Let A =
∇f(0) and let δ = |A^{-1}|^{-1}. Choose a > 0 such that

(41) |f(x′) − f(x) − A(x′ − x)| ≤ (δ/2)|x′ − x| for all x, x′ ∈ aIB,
and let b = aδ/2. We now redo Step 1 in Proof I, showing that the localization s of f^{-1} with
respect to the neighborhoods bIB and aIB is nonempty-valued. The other two steps
remain the same as in Proof I.
Fix y ∈ bIB and consider the function | f (x)−y| with domain containing the closed
ball aIB, which we view as a complete metric space equipped with the Euclidean
metric. This function is continuous and bounded below; hence, by the Ekeland principle
4B.5 with the indicated δ and ū = 0, there exists x_δ ∈ aIB such that

(42) |y − f(x_δ)| < |y − f(x)| + (δ/2)|x − x_δ| for all x ∈ aIB, x ≠ x_δ.
Let us assume that y ≠ f(x_δ). Then x̃ := A^{-1}(y − f(x_δ)) + x_δ ≠ x_δ. Moreover, from
(41) with x′ = x_δ and x = 0 and the choice of δ and b we get
|x̃| ≤ |A^{-1}|(|y| + |− f(x_δ) + Ax_δ|) ≤ |A^{-1}|(b + aδ/2) = |A^{-1}|aδ = a.

Hence x̃ ∈ aIB, and (42) applied with x = x̃ yields

(43) |y − f(x_δ)| < |y − f(x̃)| + (δ/2)|x̃ − x_δ|.
Using (41), we have

|y − f(x̃)| = |f(x̃) − f(x_δ) − A(x̃ − x_δ)| ≤ (δ/2)|x̃ − x_δ|,

and also

|x̃ − x_δ| = |A^{-1}(y − f(x_δ))| ≤ δ^{-1}|y − f(x_δ)|.

Combined with (43), these estimates give |y − f(x_δ)| < δ|x̃ − x_δ| ≤ |y − f(x_δ)|, which furnishes a contradiction. Thus, our assumption that y ≠ f(x_δ) is voided, and
we have x_δ ∈ f^{-1}(y) ∩ (aIB). This means that s is nonempty-valued, and the proof
is complete.
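The proof above is quantitative: it shows that f maps aIB onto a set containing bIB for b = aδ/2. This can be sanity-checked numerically on a hypothetical one-dimensional instance (our own choice f(x) = x + 0.25 sin 2x, with A = f′(0) = 1.5 and a = 0.5; the estimate (41) is verified on a grid rather than proved):

```python
import math

# Hypothetical 1-D instance of the scheme above: f(x) = x + 0.25*sin(2x),
# A = f'(0) = 1.5, delta = |A^{-1}|^{-1} = 1.5, a = 0.5, b = a*delta/2.
A = 1.5
delta = 1.5
a = 0.5
b = a * delta / 2        # = 0.375

def f(x):
    return x + 0.25 * math.sin(2 * x)

# spot-check the estimate (41) on a grid of pairs in aIB
pts = [a * (k / 10 - 1) for k in range(21)]        # grid on [-a, a]
for x in pts:
    for xp in pts:
        assert abs(f(xp) - f(x) - A * (xp - x)) <= delta / 2 * abs(xp - x) + 1e-12

# conclusion of the proof: every y in bIB has a preimage in aIB
def preimage(y, lo=-a, hi=a, tol=1e-10):
    while hi - lo > tol:                # f is strictly increasing on [-a, a]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

for k in range(-5, 6):
    y = b * k / 5
    x = preimage(y)
    assert abs(f(x) - y) < 1e-8 and abs(x) <= a
print("f(aIB) covers bIB on samples")
```

The bisection plays the role that the Ekeland principle plays in the proof: it only certifies solvability on samples, whereas the argument above covers every y ∈ bIB.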
Proof. The equivalence between (2) and (3) comes from 4A.6. To get the equiv-
alence of these conditions with strong metric subregularity, suppose first that κ >
subreg(F; x̄| ȳ) so that F is strongly metrically subregular at x̄ for ȳ and (1) holds for
some neighborhoods U and V. By definition, having v ∈ DF(x̄| ȳ)(u) refers to the
existence of sequences u^k → u, v^k → v and τ^k ↘ 0 such that ȳ + τ^k v^k ∈ F(x̄ + τ^k u^k).
Then x̄ + τ^k u^k ∈ U and ȳ + τ^k v^k ∈ V eventually, so that (1) yields |(x̄ + τ^k u^k) − x̄| ≤
κ|(ȳ + τ^k v^k) − ȳ|, which is the same as |u^k| ≤ κ|v^k|. In the limit, this implies
|u| ≤ κ|v|. But then, by 4A.6, |DF(x̄| ȳ)^{-1}|^+ ≤ κ and hence

(5) |DF(x̄| ȳ)^{-1}|^+ ≤ subreg(F; x̄| ȳ).
In the other direction, (3) implies the existence of a κ > 0 such that

|u| ≤ κ|v| whenever v ∈ DF(x̄| ȳ)(u).
This in turn implies that |x − x̄| ≤ κ |y − ȳ| for all (x, y) ∈ gph F close to (x̄, ȳ). That
description fits with (1). Further, κ can be chosen arbitrarily close to |DF(x̄| ȳ)−1 |+ ,
and therefore |DF(x̄| ȳ)−1 |+ ≥ subreg (F; x̄| ȳ). This, combined with (5), finishes the
argument.
Fix ȳ and any x̄ ∈ S(ȳ), and suppose that gph F is locally closed at (x̄, ȳ), which is
the same as gph S being locally closed at (ȳ, x̄). Then
clm(S; ȳ| x̄) = |DS(ȳ| x̄)|^+.
Thus, S has the isolated calmness property at ȳ for x̄ if and only if |DS(ȳ| x̄)|+ < ∞.
Theorem 4C.1 immediately gives us the linearization result in Corollary 3I.9 by
using the sum rule in 4A.2.
Implicit function theorems could be developed for the isolated calmness of solution mappings to general inclusions G(p, x) ∋ 0 in parallel to the results in 4B,
but we shall not do this here. We limit ourselves to an application of the derivative
criterion 4C.1 to the solution mapping of a generalized equation:

(6) S(p) = { x | f(p, x) + F(x) ∋ 0 },
Corollary 4C.3 (derivative rule for isolated calmness of solution mappings). For
the solution mapping S in (6) and a pair ( p̄, x̄) with x̄ ∈ S( p̄), suppose that f is
differentiable with respect to x uniformly in p at ( p̄, x̄) and also differentiable with
respect to p uniformly in x at ( p̄, x̄). Also, suppose that gph F is locally closed at
(x̄, − f(p̄, x̄)). If

(7) 0 ∈ ∇_x f(p̄, x̄)u + DF(x̄| − f(p̄, x̄))(u) =⇒ u = 0,

then S has the isolated calmness property at p̄ for x̄, moreover with

(8) clm(S; p̄| x̄) ≤ |(∇_x f(p̄, x̄) + DF(x̄| − f(p̄, x̄)))^{-1}|^+ |∇_p f(p̄, x̄)|.

Moreover, under the ample parameterization condition

(9) rank ∇_p f(p̄, x̄) = m,

the converse implication holds as well; that is, S has the isolated calmness property
at p̄ for x̄ if and only if (7) is satisfied.
Proof. We apply Theorem 3I.13 according to which the mapping S has the isolated
calmness property at p̄ for x̄ if the mapping

h + F for h(x) = f(p̄, x̄) + ∇_x f(p̄, x̄)(x − x̄)

is strongly metrically subregular at x̄ for 0, and the converse implication holds under
the ample parameterization condition (9). Then, it is sufficient to apply 4C.1 together
with the sum rule 4A.2 to the mapping h + F. The estimate (8) follows from formula
3I(24).
and therefore

|DF(0| 0)^{-1}|^+ = 0, |DF(0| 0)^{-1}|^- = ∞.
This fits with F being strongly metrically subregular, but not metrically regular at 0
for 0.
Theorem 4D.1 (implicit mapping theorem for a constraint system). Let x̄ ∈ S(p̄) for
the solution mapping S of the constraint system (1), and suppose that f is differentiable
in a neighborhood of (p̄, x̄) and satisfies lip_p(f; (p̄, x̄)) < ∞, and that the set K is
closed. If

(3) lim sup_{(p,x,y)→(p̄,x̄,0), f(p,x)−y∈K} sup_{|v|≤1} d(0, D_x f(p, x)^{-1}(v + T_K(f(p, x) − y))) ≤ λ < ∞,

then S has the Aubin property at p̄ for x̄ with

(4) lip(S; p̄| x̄) ≤ λ lip_p(f; (p̄, x̄)).
Next, we use the definition of inner norm in 4A(8) to write 4B(29) as (3) and apply
4B.7 to obtain that S has the Aubin property at p̄ for x̄. The estimate (4) follows
immediately from 4B(30).
A much sharper result can be obtained when f is continuously differentiable and
the set K in the system (1) is polyhedral convex.
Theorem 4D.2 (constraint systems with polyhedral convexity). Let x̄ ∈ S( p̄) for the
solution mapping S of the constraint system (1) in the case of a polyhedral convex
set K . Suppose that f is continuously differentiable in a neighborhood of ( p̄, x̄).
Then for S to have the Aubin property at p̄ for x̄, it is sufficient that

(5) ∇_x f(p̄, x̄)IR^n + T_K(f(p̄, x̄)) = IR^m,

in which case the corresponding modulus satisfies lip(S; p̄| x̄) ≤ λ |∇_p f(p̄, x̄)| for

(6) λ = sup_{|v|≤1} d(0, D_x f(p̄, x̄)^{-1}(v + T_K(f(p̄, x̄)))).
Moreover (5) is necessary for S to have the Aubin property at p̄ for x̄ under the
ample parameterization condition

(7) rank ∇_p f(p̄, x̄) = m.
Proof. We invoke Theorem 4D.1 but make special use of the fact that K is polyhe-
dral. That property implies that TK (w) ⊃ TK (w̄) for all w sufficiently near to w̄, as
seen in 2E.3; we apply this to w = f (p, x) − y and w̄ = f ( p̄, x̄) in the formulas (3)
and (4) of 4D.1. The distances in question are greatest when the cone is as small as
possible; this, combined with the continuous differentiability of f , allows us to drop
the limit in (3). Further, from the equivalence relation 4A(16) in Corollary 4A.7, we
obtain that the finiteness of λ in (6) is equivalent to (5).
For the necessity, we bring in a further argument which makes use of the ample
parameterization condition (7). According to Theorem 3F.9, under (7) the Aubin
property of S at p̄ for x̄ implies metric regularity of the linearized mapping h − K
for h(x) = f ( p̄, x̄) + ∇x f ( p̄, x̄)(x − x̄). The derivative criterion for metric regularity
4B.1 tells us then that
(8) lim sup_{(x,y)→(x̄,0), f(p̄,x̄)+D_x f(p̄,x̄)(x−x̄)−y∈K} sup_{|v|≤1} d(0, D_x f(p̄, x̄)^{-1}(v + T_K(f(p̄, x̄) + D_x f(p̄, x̄)(x − x̄) − y))) < ∞.
Let x̄ solve this for p̄ and let each fi be continuously differentiable around ( p̄, x̄).
Then a sufficient condition for S to have the Aubin property at p̄ for x̄ is the
Mangasarian–Fromovitz condition:
(9) ∃ w ∈ IR^n with ∇_x f_i(p̄, x̄)w < 0 for i ∈ [1, s] with f_i(p̄, x̄) = 0, and ∇_x f_i(p̄, x̄)w = 0 for i ∈ [s + 1, m],

and

(10) the gradients ∇_x f_i(p̄, x̄), i ∈ [s + 1, m], are linearly independent.
Moreover, the combination of (9) and (10) is also necessary for S to have the Aubin
property under the ample parameterization condition (7). In particular, when f is
independent of p and then 0 ∈ f (x̄) − K, the Mangasarian–Fromovitz condition (9)–
(10) is a necessary and sufficient condition for metric regularity of the mapping
f − K at x̄ for 0.
Detail. According to 4D.2, it is enough to show that (5) is equivalent to the combination of (9) and (10) in the case of K = IR^s_− × {0}^{m−s}. Observe that the tangent
cone to the set K at f(p̄, x̄) has the following form:

(11) v ∈ T_K(f(p̄, x̄)) ⟺ v_i ≤ 0 for i ∈ [1, s] with f_i(p̄, x̄) = 0 and v_i = 0 for i ∈ [s + 1, m].
Let (5) hold. Then, using (11), we obtain that the matrix whose rows are the vectors
∇_x f_{s+1}(p̄, x̄), . . . , ∇_x f_m(p̄, x̄) must be of full rank; hence (10) holds. If (9) is violated,
then for every w ∈ IR^n either ∇_x f_i(p̄, x̄)w ≥ 0 for some i ∈ [1, s] with f_i(p̄, x̄) = 0, or
∇_x f_i(p̄, x̄)w ≠ 0 for some i ∈ [s + 1, m], which contradicts (5) in an obvious way.
The combination of (9) and (10) implies that for every y ∈ IR^m there exist w, v ∈ IR^n and z ∈ IR^m with z_i ≤ 0 for i ∈ [1, s] with f_i(p̄, x̄) = 0 such that

∇_x f_i(p̄, x̄)w − z_i = y_i for i ∈ [1, s] with f_i(p̄, x̄) = 0,
∇_x f_i(p̄, x̄)(w + v) = y_i for i ∈ [s + 1, m].
But then (5) follows directly from the form (11) of the tangent cone.
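The constructive step just described (solve the equality rows first, then push along the MFCQ direction w until the active inequality rows are satisfied) can be carried out numerically. Below is a sketch on a hypothetical 2 × 2 instance (the gradients g1, g2 and the direction w are our own choices, not data from the text); it verifies that ∇_x f(p̄, x̄)u + T_K ∋ y for sample right-hand sides y, with T_K = IR_− × {0}:

```python
# Hypothetical 2x2 instance of the argument above: one active inequality
# with gradient g1 and one equality with gradient g2, K = R_- x {0}.
# MFCQ direction w: <g1, w> < 0 and <g2, w> = 0 (here <g1, w> = -1).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

g1 = (1.0, 1.0)   # gradient of the active inequality constraint
g2 = (0.0, 1.0)   # gradient of the equality constraint (full rank)
w  = (-1.0, 0.0)  # MFCQ direction

def solve_inclusion(y):
    """Find u and t in T_K = R_- x {0} with (g1.u, g2.u) + t = y."""
    v = (0.0, y[1])                      # handles the equality row: g2.v = y2
    alpha = max(0.0, y[0] - dot(g1, v))  # push along -w to raise g1.u
    u = (v[0] - alpha * w[0], v[1] - alpha * w[1])
    t = (y[0] - dot(g1, u), 0.0)         # nonpositive residual goes into the cone
    return u, t

for y in [(3.0, -2.0), (-5.0, 1.0), (0.25, 0.0)]:
    u, t = solve_inclusion(y)
    assert t[0] <= 1e-12                          # t lies in T_K
    assert abs(dot(g1, u) + t[0] - y[0]) < 1e-9   # inequality row
    assert abs(dot(g2, u) + t[1] - y[1]) < 1e-9   # equality row
print("surjectivity verified on samples")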
If f is independent of p, by 3E.6 the metric regularity of − f + K is equivalent
to the Aubin property of the inverse (− f + K)−1 , which is the same as the solution
mapping
S(p) = { x | p + f(x) ∈ K },
for which the ample parameterization condition (7) holds automatically. Then, from
4D.2, for x̄ ∈ S( p̄), the Aubin property of S at p̄ for x̄ and hence metric regularity of
f − K at x̄ for p̄ is equivalent to (5) and therefore to (9)–(10).
Exercise 4D.4. Consider the constraint system in 4D.3 with f (p, x) = g(x) − p,
p̄ = 0 and g continuously differentiable near x̄. Show that the existence of a
Lipschitz continuous local selection of the solution mapping S at 0 for x̄ implies
the Mangasarian–Fromovitz condition. In other words, the existence of a Lipschitz
continuous local selection of S at 0 for x̄ implies metric regularity of the mapping
g − K at x̄ for 0.
Guide. Utilizing 2B.11, from the existence of a local selection of S at 0 for x̄ we
obtain that the inverse F0−1 of the linearization F0 (x) := g(x̄) + ∇g(x̄)(x − x̄) − K
has a Lipschitz continuous local selection at 0 for x̄. Then, in particular, for every
v ∈ IRm there exists w ∈ IRn such that
∇gi (x̄)w ≤ vi for i ∈ [1, s] with gi (x̄) = 0,
∇gi (x̄)w = vi for i ∈ [s + 1, m].
for a function f : IR^d × IR^n → IR^n and the normal cone mapping N_C associated with a
nonempty, closed, convex set C ⊂ IR^n, and the solution mapping S : IR^d →→ IR^n defined
by

(2) S(p) = { x | f(p, x) + N_C(x) ∋ 0 }.
Especially strong results were obtained in 2E for the case in which C is a polyhedral
convex set, and that will also persist here. Of special importance in that setting is
the critical cone associated with C at a point x with respect to a vector v ∈ N_C(x),
defined by

(3) K_C(x, v) = T_C(x) ∩ [v]^⊥.
Theorem 4E.1 (isolated calmness for variational inequalities). For the variational
inequality (1) and its solution mapping (2) under the assumption that the convex
set C is polyhedral, let x̄ ∈ S( p̄) and suppose that f is continuously differentiable
around ( p̄, x̄). Let A = ∇x f ( p̄, x̄) and let K = KC (x̄, v̄) be the corresponding critical
cone in (3) for v̄ = − f(p̄, x̄). If

(4) 0 ∈ Aw + N_K(w) =⇒ w = 0,

then the solution mapping S has the isolated calmness property at p̄ for x̄ with

clm(S; p̄| x̄) ≤ |(A + N_K)^{-1}|^+ |∇_p f(p̄, x̄)|.
Moreover, under the ample parameterization condition rank ∇ p f ( p̄, x̄) = n, the
property in (4) is not just sufficient but also necessary for S to have the isolated
calmness property at p̄ for x̄.
Proof. Utilizing the specific form of the graphical derivative established in 4A.4
and the equivalence relation 4A(13) in 4A.6, we see that (4) is equivalent to the
condition 4C(7) in Corollary 4C.3. Everything then follows from the claim of that
corollary.
Exercise 4E.2 (alternative cone condition). In terms of the cone K ∗ that is polar to
K , show that the condition in (4) is equivalent to
(5) w ∈ K, −Aw ∈ K ∗ , w ⊥ Aw =⇒ w = 0.
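Conditions of the kind in (5) can be tested exhaustively for small problems. A sketch (our own hypothetical 2 × 2 matrices, with K = IR²_+ so that K* = IR²_−): a nonzero w with w ∈ K, −Aw ∈ K*, w ⊥ Aw is a nonzero solution of w ≥ 0, Aw ≥ 0, ⟨w, Aw⟩ = 0, and the orthogonality forces, for each index i, either w_i = 0 or (Aw)_i = 0, so enumerating the support of w makes the search finite:

```python
# Hypothetical check of a condition of type (5) for K = R^2_+ (K* = R^2_-):
# look for a nonzero w with w >= 0, Aw >= 0 and <w, Aw> = 0 by
# enumerating the support of w (exact in the 2x2 case).

def violates_condition(A, tol=1e-12):
    """True if some nonzero w >= 0 satisfies Aw >= 0 and w ⊥ Aw (2x2 case)."""
    (a, b), (c, d) = A
    # support {0}: w = (1, 0) up to positive scaling
    if abs(a) <= tol and c >= -tol:
        return True
    # support {1}: w = (0, 1)
    if abs(d) <= tol and b >= -tol:
        return True
    # support {0, 1}: need a componentwise positive null vector of A
    if abs(a * d - b * c) <= tol:
        for w in [(-b, a), (d, -c)]:
            if w[0] > tol and w[1] > tol:
                return True
    return False

# positive definite A: the condition holds (no nonzero w exists)
assert not violates_condition([(2.0, -1.0), (-1.0, 2.0)])
# A with a zero first column: w = (1, 0) gives Aw = 0, so the condition fails
assert violates_condition([(0.0, 1.0), (0.0, 1.0)])
print("cone condition check done")
```

For a general polyhedral critical cone K the same idea applies face by face; the 2 × 2 orthant case is chosen only to keep the enumeration short.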
This will serve to illustrate the result in Theorem 4E.1. Using the notation introduced
in Section 2E for the analysis of a complementarity problem, we associate with the
reference point (x̄, v̄) ∈ gph NIRn+ the index sets J1 , J2 and J3 in {1, . . . , n} given by
J_1 = { j | x̄_j > 0, v̄_j = 0 }, J_2 = { j | x̄_j = 0, v̄_j = 0 }, J_3 = { j | x̄_j = 0, v̄_j < 0 }.
Then, by 2E.5, the critical cone K = K_C(x̄, v̄) = T_{IR^n_+}(x̄) ∩ [f(p̄, x̄)]^⊥ is described by

(7) w ∈ K ⟺ w_j free for j ∈ J_1, w_j ≥ 0 for j ∈ J_2, w_j = 0 for j ∈ J_3.
Any solution x of (9) is a stationary point for problem (8); the set of such points is denoted S(v), and the
associated stationary point mapping is v ↦ S(v) = (Dg + N_C)^{-1}(v). The set of local
minimizers of (8) for v is a subset of S(v). If the function g is convex, every stationary point is not only a local but also a global minimizer. For the variational inequality
(9), the critical cone to C associated with a solution x for v has the form

K_C(x, v − ∇g(x)) = T_C(x) ∩ [v − ∇g(x)]^⊥.
If x furnishes a local minimum of (8) for v, then, according to 2G.1(a), x must satisfy
the second-order necessary condition

(10) ⟨u, ∇²g(x)u⟩ ≥ 0 for all u ∈ K_C(x, v − ∇g(x)).

In addition, from 2G.1(b), when x ∈ S(v) satisfies the second-order sufficient condition

(11) ⟨u, ∇²g(x)u⟩ > 0 for all nonzero u ∈ K_C(x, v − ∇g(x)),
1 For a detailed description of the classes of matrices appearing in the theory of linear complemen-
tarity problems, see the book Cottle, Pang and Stone [1992].
then x is a local optimal solution of (8) for v. Having x satisfy (9) and (11) is
equivalent to the existence of ε > 0 and δ > 0 such that
where KC+ (x, v) = KC (x, v − ∇g(x)) − KC (x, v − ∇g(x)) is the critical subspace asso-
ciated with x and v. We now complement this result with a necessary and sufficient
condition for isolated calmness of S combined with local optimality at the reference
point.
(13) u ∈ K, −Au ∈ K ∗ , u ⊥ Au =⇒ u = 0,
where K = KC (x̄, v̄−∇g(x̄)). Let (i) hold. Then of course x̄ is a local optimal solution
as described. If (ii) doesn’t hold, there must exist some u = 0 satisfying the condi-
tions in the left side of (13), and that would contradict the inequality u, Au > 0 in
(11).
Conversely, assume that (ii) is satisfied. Then the second-order necessary condi-
tion (10) must hold; this can be written as
u ∈ K =⇒ −Au ∈ K*.
The isolated calmness property of S at v̄ for x̄ is identified with (13), which in turn
eliminates the possibility of there being a nonzero u ∈ K such that the inequality
in (10) fails to be strict. Thus, the necessary condition (10) turns into the sufficient
condition (11). We already know that (11) implies (12), so the proof is complete.
Theorem 4F.1 (localization criterion under polyhedral convexity). For the solution
mapping S of (1) under the assumption that the convex set C is polyhedral, let x̄ ∈
S( p̄) and suppose that f is continuously differentiable near ( p̄, x̄). Let
(2) A = ∇_x f(p̄, x̄) and K = { w ∈ T_C(x̄) | w ⊥ f(p̄, x̄) },
noting that the critical cone K is likewise polyhedral convex. Suppose the mapping
A + N_K has the property that

(3) (A + N_K)^{-1} has a Lipschitz continuous single-valued localization around 0 for 0.

Then S has a Lipschitz continuous single-valued localization around p̄ for x̄.
In addition, under the ample parameterization condition rank ∇ p f ( p̄, x̄) = n the con-
dition in (3) is not just sufficient but also necessary for S to have a Lipschitz contin-
uous single-valued localization around p̄ for x̄.
Through graphical derivatives, we will actually be able to show that strong metric
regularity of A + NK is implied simply by metric regularity, or in other words, that
the invertibility condition in (3) follows already from (A + NK )−1 having the Aubin
property at 0 for 0, due to the special structure of the mapping A + NK .
Our tactic for bringing this out will involve applying Theorem 4B.1 to A + NK .
Before that can be done, however, we put some effort into a better understanding of
the normal cone mapping NK .
Faces of a cone. For a polyhedral convex cone K, a face is a set F of the form

F = K ∩ [v]^⊥ for some v ∈ K*.
Lemma 4F.2 (critical face lemma). Let C be a convex polyhedral set, let v ∈ NC (x)
and let K = KC (x, v) be the critical cone for C at (x, v),
K = TC (x) ∩ [v]⊥ .
Then there exists a neighborhood O of (x, v) such that for every choice of (x′, v′) ∈
gph N_C ∩ O the corresponding critical cone K_C(x′, v′) has the form

K_C(x′, v′) = F_1 − F_2

for closed faces F_1, F_2 of K with F_1 ⊃ F_2; conversely, every such difference F_1 − F_2 of closed faces is K_C(x′, v′) for some (x′, v′) ∈ gph N_C ∩ O.
Now, let (x′, v′) ∈ gph N_C be close to (x, v) and let x″ = x′ − x. Then from (5) we
have

K_C(x′, v′) = T_C(x′) ∩ [v′]^⊥ = (T_C(x) + [x″]) ∩ [v′]^⊥.
We will next show that K_C(x, v′) ⊂ K for v′ sufficiently close to v. If this were not
so, there would be a sequence v^k → v and another sequence w^k ∈ K_C(x, v^k) such that
w^k ∉ K for all k. Each set K_C(x, v^k) is a face of T_C(x), but since T_C(x) is polyhedral,
the set of its faces is finite; hence for some face F of T_C(x) we have K_C(x, v^k) = F
for infinitely many k. Note that the set gph K_C(x, ·) is closed; hence for any w ∈ F,
since (v^k, w) is in this graph, the limit (v, w) belongs to it as well. But then w ∈ K,
and since w ∈ F is arbitrarily chosen, we have F ⊂ K. Thus w^k ∈ K
for infinitely many k, which is a contradiction. Hence K_C(x, v′) ⊂ K.
Let (x′, v′) ∈ gph N_C be close to (x, v). Relation (7) tells us that K_C(x′, v′) =
K_C(x, v′) + [x″] for x″ = x′ − x. Let F_1 = T_C(x) ∩ [v′]^⊥, this being a face of T_C(x).
The critical cone K = K_C(x, v) = T_C(x) ∩ [v]^⊥ is itself a face of T_C(x), and any face
of T_C(x) within K is also a face of K. Then F_1 is a face of the polyhedral cone K.
Let F_2 be the face of F_1 having x″ in its relative interior. Then F_2 is also a face of K,
and therefore K_C(x′, v′) = F_1 − F_2, furnishing the desired representation.
Conversely, let F_1 be a face of K. Then there exists v′ ∈ K* = N_K(0) such that
F_1 = K ∩ [v′]^⊥. The size of v′ does not matter; hence we may assume that v + v′ ∈
N_C(x) by the reduction lemma 2E.4. By repeating the above argument we have F_1 =
T_C(x) ∩ [v″]^⊥ for v″ := v + v′. Now let F_2 be a face of F_1 and let x″ be in the relative
interior of F_2. In particular, x″ ∈ T_C(x), so by taking the norm of x″ sufficiently small
we can arrange that the point x′ = x + x″ lies in C. We have x″ ⊥ v″ and, as in (7),

F_1 − F_2 = T_C(x) ∩ [v″]^⊥ + [x″] = (T_C(x) + [x″]) ∩ [v″]^⊥ = T_C(x′) ∩ [v″]^⊥ = K_C(x′, v″).
Lemma 4F.3 (regularity modulus from derivative criterion). For A + N_K with A and
K as in (2), we have

(8) reg(A + N_K; 0| 0) = max { |(A + N_{F_1−F_2})^{-1}|^- : F_1, F_2 ∈ F_K, F_1 ⊃ F_2 },

where F_K denotes the collection of all closed faces of K. Thus, A + N_K is metrically regular at 0 for 0 if and only if |(A + N_{F_1−F_2})^{-1}|^- < ∞
for every F_1, F_2 ∈ F_K with F_1 ⊃ F_2.

Proof. From Theorem 4B.1, combined with Example 4A.4, we have that

reg(A + N_K; 0| 0) = lim sup_{(x,y)→(0,0), y∈Ax+N_K(x)} |(A + N_{T_K(x)∩[y−Ax]^⊥})^{-1}|^-.

Lemma 4F.2 with (x, v) = (0, 0) gives us the desired representation N_{T_K(x)∩[y−Ax]^⊥} =
N_{F_1−F_2} for (x, y) near zero and hence (8).
Example 4F.4 (critical faces for complementarity problems). Consider the complementarity problem

f(p, x) + N_{IR^n_+}(x) ∋ 0,

with a solution x̄ for p̄, with K and A as in (2) (with C = IR^n_+) and index sets

J_1 = { j | x̄_j > 0, v̄_j = 0 }, J_2 = { j | x̄_j = 0, v̄_j = 0 }, J_3 = { j | x̄_j = 0, v̄_j < 0 }

for v̄ = − f(p̄, x̄). Then the cones F_1 − F_2, where F_1 and F_2 are closed faces of K with
F_1 ⊃ F_2, are the cones K̃ of the following form: there is a partition of {1, . . . , n} into
index sets J_1′, J_2′, J_3′ with

J_1 ⊂ J_1′ ⊂ J_1 ∪ J_2, J_3 ⊂ J_3′ ⊂ J_2 ∪ J_3,

such that

(9) x ∈ F_1 − F_2 ⟺ x_i free for i ∈ J_1′, x_i ≥ 0 for i ∈ J_2′, x_i = 0 for i ∈ J_3′.
Detail. Each face F of K has the form K ∩ [v′]^⊥ for some vector v′ ∈ K*. The vectors
v′ in question are those with

v′_i = 0 for i ∈ J_1, v′_i ≤ 0 for i ∈ J_2, v′_i free for i ∈ J_3.

The closed faces F of K therefore correspond one-to-one with the subsets of J_2: the
face F corresponding to an index set J_2^F consists of the vectors x such that

x_i free for i ∈ J_1, x_i ≥ 0 for i ∈ J_2 \ J_2^F, x_i = 0 for i ∈ J_3 ∪ J_2^F.
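This face description can be cross-checked by brute force on a small hypothetical instance (our own choice: n = 4 with J_1 = {0}, J_2 = {1, 2}, J_3 = {3}, indices 0-based in the code). Each closed face is encoded by its subset J_2^F, nested faces correspond to nested subsets, and the difference cone F_1 − F_2 is computed coordinate by coordinate and compared with the pattern in (9):

```python
from itertools import combinations

# Hypothetical instance: n = 4, J1 = {0}, J2 = {1, 2}, J3 = {3}.
J1, J2, J3 = {0}, {1, 2}, {3}
n = 4

def subsets(s):
    s = sorted(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def face_pattern(S):
    """Coordinate types of the closed face of K with zero set S inside J2."""
    return ["free" if i in J1 else
            "zero" if i in J3 or i in S else "nonneg" for i in range(n)]

def diff(a, b):
    """Type of {s - t : s of type a, t of type b} for one coordinate."""
    if a == "free" or b == "free":
        return "free"
    if a == "nonneg" and b == "nonneg":
        return "free"          # difference of two half-lines fills the line
    if a == "nonneg":          # b == "zero"
        return "nonneg"
    if b == "nonneg":          # a == "zero"
        return "nonpos"
    return "zero"

for S2 in subsets(J2):
    for S1 in subsets(S2):            # F1 ⊃ F2  <=>  S1 ⊂ S2
        got = [diff(a, b) for a, b in zip(face_pattern(S1), face_pattern(S2))]
        # pattern predicted by (9): J1' = J1 ∪ (J2 \ S2), J2' = S2 \ S1,
        # J3' = J3 ∪ S1
        want = ["free" if i in J1 or (i in J2 and i not in S2) else
                "nonneg" if i in S2 - S1 else "zero" for i in range(n)]
        assert got == want
print("all F1 - F2 patterns match (9)")
```

Because K is a product of coordinate cones, the difference F_1 − F_2 factors coordinatewise, which is what makes this per-coordinate verification legitimate.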
Exercise 4F.5 (critical face criterion for metric regularity). For a continuously
differentiable function f : IR^n → IR^n and a polyhedral convex set C ⊂ IR^n, let
f(x̄) + N_C(x̄) ∋ 0. Show that the mapping f + N_C is metrically regular at x̄ for 0
if and only if, for all choices of closed faces F_1 and F_2 of the critical cone K to the set C at
x̄ for v̄ = − f(x̄), with F_1 ⊃ F_2, the following condition holds with A = ∇f(x̄):

|(A + N_{F_1−F_2})^{-1}|^- < ∞.
Exercise 4F.6 (variational inequality over a subspace). Show that when the critical
cone K in 4F.5 is a subspace of IR^n of dimension m ≤ n, then the matrix BAB^T is
nonsingular, where B is the matrix whose rows form an orthonormal basis of K.
Using some of the results obtained so far in this section, we will now prove
that, for a mapping appearing in particular in the Karush–Kuhn–Tucker (KKT) op-
timality conditions in nonlinear programming, metric regularity and strong metric
regularity are equivalent properties. Specifically, consider the standard nonlinear
programming problem

(10) minimize g_0(x) over all x satisfying g_i(x) = 0 for i ∈ [1, r] and g_i(x) ≤ 0 for i ∈ [r + 1, m].

The associated Karush–Kuhn–Tucker (KKT) system for (10) can be written as the generalized equation

(11) f(x, y) + N_E(x, y) ∋ 0,

where
(12) f(x, y) = ( ∇g_0(x) + Σ_{i=1}^{m} y_i ∇g_i(x), −g_1(x), . . . , −g_m(x) )

and

(13) E = IR^n × IR^r × IR^{m−r}_+.
Theorem 2A.8 tells us that, under the constraint qualification condition 2A(14), for
any local minimum x of (10) there exists a Lagrange multiplier y, with yi ≥ 0 for
i = r + 1, . . . , m, such that (x, y) is a solution of (11). We will now establish an
important fact for the mapping on the left side of (11).
Theorem 4F.7 (KKT metric regularity implies strong metric regularity). Consider
the mapping F : IR^{n+m} →→ IR^{n+m} defined as

(14) F = f + N_E,

with f as in (12) for z = (x, y) and E as in (13), and let z̄ = (x̄, ȳ) solve (11), that is,
F(z̄) ∋ 0. If F is metrically regular at z̄ for 0, then F is strongly metrically regular
there.
We already showed in Theorem 3G.5 that this kind of equivalence holds for lo-
cally monotone mappings, but here F need not be monotone even locally, although
it is a special kind of mapping in another way.
The claimed equivalence is readily apparent in a simple case of (10) when F is
an affine mapping, which corresponds to problem (10) with no constraints and with
g_0 being a quadratic function, g_0(x) = (1/2)⟨x, Ax⟩ + ⟨b, x⟩ for an n × n matrix A and a
vector b ∈ IR^n. Then F(x) = Ax + b, and metric regularity of F (at any point) means
that A has full rank. But then A must be nonsingular, so F is in fact strongly regular.
The general argument for F = f + NE is lengthy and proceeds through a se-
ries of reductions. First, since our analysis is local, we can assume without loss of
generality that all inequality constraints are active at x̄. Indeed, if for some index
i ∈ [r + 1, m] we have g_i(x̄) < 0, then ȳ_i = 0. For q ∈ IR^{n+m} consider the solution set
of the inclusion F(z) ∋ q. Then for any q near zero and all x near x̄ we will have
g_i(x) < q_i, and hence any Lagrange multiplier y associated with such an x must have
y_i = 0; thus, for q close to zero the solution set of F(z) ∋ q will not change if
we drop the constraint with index i. Further, if there exists an index i such that
ȳi > 0, then we can always rearrange the constraints so that ȳi > 0 for i ∈ [r + 1, s]
for some r < s ≤ m. Under these simplifying assumptions the critical cone K =
K_E(z̄, v̄) to the set E in (13) at z̄ = (x̄, ȳ) for v̄ = − f(z̄) is the product IR^n × IR^s ×
IR^{m−s}_+. (Show that this form of the critical cone can also be derived by utilizing
Example 2E.5.) The normal cone mapping N_K to the critical cone K then has the
form N_K = {0}^n × {0}^s × N_{IR^{m−s}_+}.
We next recall that metric regularity of F is equivalent to metric regularity of the
mapping
L : z → ∇ f (z̄)z + NK (z) for z = (x, y) ∈ IRn+m
at 0 for 0 and the same equivalence holds for strong metric regularity. This reduction
to a simpler situation has already been highlighted several times in this book, e.g. in
2E.8 for strong metric regularity and 3F.7 for metric regularity. Thus, to achieve our
goal of confirming the claimed equivalence between metric regularity and strong
regularity for F, it is enough to focus on the mapping L which, in terms of the
functions gi in (10), has the form
(15) L = [ A  B^T; −B  0 ] + N_K,

where

A = ∇²g_0(x̄) + Σ_{i=1}^{m} ȳ_i ∇²g_i(x̄)

and B is the m × n matrix with rows ∇g_1(x̄), . . . , ∇g_m(x̄).
Taking into account the specific form of N_K, the inclusion (v, w) ∈ L(x, y) becomes

(16) v = Ax + B^T y, (w + Bx)_i = 0 for i ∈ [1, s], (w + Bx)_i ≤ 0, y_i ≥ 0, y_i(w + Bx)_i = 0 for i ∈ [s + 1, m].
In further preparation for proving Theorem 4F.7, next we state and prove three
lemmas. From now on any kind of regularity is at 0 for 0, unless specified otherwise.
Lemma 4F.8 (KKT metric regularity implies strong metric subregularity). If the
mapping L in (15) is metrically regular, then it is strongly subregular.
Proof. Suppose that L is metrically regular. Then the critical face criterion dis-
played in 4F.5 with critical faces given in 4F.4 takes the following form: for ev-
ery partition J1 , J2 , J3 of {s + 1, . . . , m} and for every (v, w) ∈ IRn × IRm there exists
(x, y) ∈ IRn × IRm satisfying
(17) v = Ax + B^T y, (w + Bx)_i = 0 for i ∈ [1, s], (w + Bx)_i = 0 for i ∈ J_1, (w + Bx)_i ≤ 0, y_i ≥ 0, y_i(w + Bx)_i = 0 for i ∈ J_2, y_i = 0 for i ∈ J_3.
Indeed, to reach such a conclusion it is enough to take J_1 = J and J_2 = ∅ in (17). By
4E.1, the mapping L in (15) is strongly subregular if and only if
Now, suppose that L is not strongly subregular. Then, by (19), for some index set J ⊂
{s + 1, . . ., m}, possibly the empty set, there exists a nonzero vector (x, y) ∈ IRn × IRm
satisfying (17) for v = 0, w = 0. Note that this y has y_j = 0 for j ∈ {s + 1, . . . , m} \ J.
But then the nonzero vector z = (x, y), with y restricted to its components indexed by {1, . . . , s} ∪ J,
solves N(J)z = 0, where the matrix N(J) is defined in (18). Hence, N(J) is singular,
and then the condition involving (17) is violated; thus, the mapping L is not metrically regular. This contradiction means that L must be strongly subregular.
The next two lemmas present general facts that are separate from the specific circumstances of the nonlinear programming problem (10) under consideration. The second lemma is a simple consequence of Brouwer's invariance of domain theorem 1F.1:
ϕ (x̂(pk )) → ϕ (x̂(p)) as k → ∞.
from IR^n × IR^m to itself, where b_i are the rows of the matrix B and where we let y^+ = max{0, y} and y^- = y − y^+.
For a given (v, u) ∈ IR^n × IR^m, let (x, y) ∈ H^{-1}(v, u). Then for z_i = y_i^+, i = s + 1, …, m, and z_i = y_i otherwise, we have (x, z) ∈ L^{-1}(v, u). Indeed, for each i = s + 1, …, m, if y_i ≤ 0, then u_i + ⟨b_i, x⟩ = y_i^- ≤ 0 and (u_i + ⟨b_i, x⟩) z_i = 0; otherwise u_i + ⟨b_i, x⟩ = y_i^- = 0. Conversely, if (x, z) ∈ L^{-1}(v, u), then for

(21)   y_i =  z_i               if z_i > 0,
              u_i + ⟨b_i, x⟩    if z_i = 0,
we obtain (x, y) ∈ H^{-1}(v, u). Thus, in order to achieve our goal for the mapping L, we can focus on the same question, the equivalence between metric regularity and strong metric regularity, for the function H in (20).
Suppose that H is metrically regular but not strongly metrically regular. Then,
from 4F.8 and the equivalence between regularity properties of L and H, H is
strongly subregular. Consequently, its inverse H −1 has both the Aubin property and
the isolated calmness property, both at 0 for 0. In particular, since H is positively
homogeneous and has closed graph, for each w sufficiently close to 0, H −1 (w) is
a compact set contained in an arbitrarily small ball around 0. Let a > 0. For any
w ∈ aIB the problem
has a solution (x(w), y(w)) which, from the property of H −1 mentioned just above
(22), has a nonempty-valued graphical localization around 0 for 0. According to
Lemma 4F.10, this localization is either a continuous function or a multi-valued
mapping. If it is a continuous function, Lemma 4F.9 implies that H −1 has a con-
tinuous single-valued localization around 0 for 0. But then, since H −1 has the
Aubin property at that point, we conclude that H must be strongly metrically re-
gular, which contradicts the assumption made. Hence, any graphical localization
of the solution mapping of (22) is multi-valued. Thus, there exist a sequence z^k = (v^k, u^k) → 0 and two sequences (x^k, y^k) → 0 and (ξ^k, η^k) → 0, whose k-th terms both lie in H^{-1}(z^k), such that the m-th components of y^k and η^k coincide, y^k_m = η^k_m, but (x^k, y^k) ≠ (ξ^k, η^k) for all k. Remove from y^k the final component y^k_m and denote the remaining vector by y^k_{-m}; do the same for η^k. Then (x^k, y^k_{-m}) and (ξ^k, η^k_{-m}) are both solutions of
v^k − b_m y^k_m = A x^k + ∑_{i=1}^{s} b_i y_i + ∑_{i=s+1}^{m−1} b_i y_i^+,
u^k_1 = −⟨b_1, x⟩ + y_1,
  ⋮
u^k_s = −⟨b_s, x⟩ + y_s,
u^k_{s+1} = −⟨b_{s+1}, x⟩ + y^-_{s+1},
  ⋮
u^k_{m−1} = −⟨b_{m−1}, x⟩ + y^-_{m−1}.
This relation concerns the reduced mapping H−m with m − 1 vectors bi , and accord-
ingly a vector y of dimension m − 1:
H_{-m}(x, y) = \begin{pmatrix} Ax + ∑_{i=1}^{s} b_i y_i + ∑_{i=s+1}^{m−1} b_i y_i^+ \\ −⟨b_1, x⟩ + y_1 \\ ⋮ \\ −⟨b_s, x⟩ + y_s \\ −⟨b_{s+1}, x⟩ + y^-_{s+1} \\ ⋮ \\ −⟨b_{m−1}, x⟩ + y^-_{m−1} \end{pmatrix}.
We obtain that the mapping H_{-m} cannot be strongly metrically regular, because for the same value z^k = (v^k − b_m y^k_m, u^k_{-m}) of the parameter, arbitrarily close to 0, we have two solutions (x^k, y^k_{-m}) and (ξ^k, η^k_{-m}). On the other hand, H_{-m} is metrically regular as a submapping of H; this follows, e.g., from the characterization in (17) for metric regularity of the mapping L, which is equivalent to metric regularity of H_{-m} if we choose J₃ in (17) always to include the index m.
Thus, our assumption for the mapping H leads to a submapping H_{-m}, with one less variable y_i associated with the "inequality" part of L, for which the same assumption is satisfied. Proceeding further with "deleting inequalities," we end up with no inequalities at all, and then the mapping L becomes just the linear mapping represented by the square matrix

\begin{pmatrix} A & B^T \\ -B & 0 \end{pmatrix}.

But this linear mapping cannot be simultaneously metrically regular and not strongly metrically regular, because a square matrix of full rank is automatically nonsingular. Hence, our assumption that the mapping H is metrically regular but not strongly metrically regular cannot hold.
Exercise 4F.11. Find a formula for the metric regularity modulus of the mapping F
in (14).
Guide. The regularity modulus of F at z̄ for 0 equals the regularity modulus of the
mapping L in (15) at 0 for 0. To find a formula for the latter, utilize Lemma 4F.3.
Generalized Jacobian. For f : IR^n → IR^m and any x̄ ∈ dom f where lip(f; x̄) < ∞, denote by ∇̄f(x̄) the set consisting of all matrices A ∈ IR^{m×n} for which there is a sequence of points x^k → x̄ such that f is differentiable at x^k and ∇f(x^k) → A. The Clarke generalized Jacobian of f at x̄ is the convex hull of this set: co ∇̄f(x̄).
Note that ∇̄f(x̄) is a nonempty, closed, bounded subset of IR^{m×n}. This ensures that the convex set co ∇̄f(x̄) is nonempty, closed, and bounded as well. Strict differentiability of f at x̄ is known to be characterized by ∇̄f(x̄) consisting of a single matrix A (or, equivalently, by co ∇̄f(x̄) consisting of a single matrix A), in which case A = ∇f(x̄).
The inverse function theorem based on this notion, which we state next without
proof2 , says roughly that a Lipschitz continuous function can be inverted when all
elements of the generalized Jacobian are nonsingular. Compared with the classical
inverse function theorem, the main difference is that the single-valued graphical
localization so obtained can only be claimed to be Lipschitz continuous.
Theorem 4G.1 (Clarke’s inverse function theorem). Consider f : IRn → IRn and a
point x̄ ∈ int dom f where lip(f; x̄) < ∞. Let ȳ = f(x̄). If all of the matrices in the generalized Jacobian co ∇̄f(x̄) are nonsingular, then f^{-1} has a Lipschitz continuous
single-valued localization around ȳ for x̄.
For illustration, we provide some elementary cases. A function f : IR → IR given, for instance, by f(x) = x for x ≤ 0 and f(x) = 2x for x ≥ 0 has ∇̄f(0) = {1, 2}, hence generalized Jacobian co ∇̄f(0) = [1, 2], which does not contain 0. According to Theorem 4G.1, f^{-1} has a Lipschitz continuous single-valued localization around 0 for 0.
In contrast, the function f : IR → IR given by f(x) = |x| has co ∇̄f(0) = [−1, 1], which does contain 0. Although the theorem makes no claims about this case, there is in fact no graphical localization of f^{-1} around 0 for 0 that is single-valued.
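The limit process defining ∇̄f can be mimicked numerically. A minimal sketch (the sampling helper is ours, not from the text) recovers ∇̄f(0) = {−1, 1} for f(x) = |x|:

```python
import numpy as np

def sampled_gradients(f_prime, xbar, n_samples=2001, radius=1e-3):
    """Approximate grad-bar f(xbar): collect derivative values at points of
    differentiability near xbar; their limits make up the set in question."""
    xs = xbar + np.linspace(-radius, radius, n_samples)
    xs = xs[xs != xbar]              # |x| is not differentiable at 0 itself
    return sorted(set(f_prime(x) for x in xs))

# f(x) = |x| is differentiable away from 0 with derivative sign(x)
slopes = sampled_gradients(lambda x: 1.0 if x > 0 else -1.0, 0.0)
```

Here slopes comes out as [−1.0, 1.0]; the convex hull co ∇̄f(0) = [−1, 1] contains 0, consistent with the failure of Lipschitz invertibility of |x| at 0.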
A simple two-dimensional example³ is

f(x) = \begin{pmatrix} |x₁| + x₂ \\ 2x₁ + |x₂| \end{pmatrix},

for which

co ∇̄f(0, 0) = { \begin{pmatrix} λ & 1 \\ 2 & τ \end{pmatrix} : −1 ≤ λ ≤ 1, −1 ≤ τ ≤ 1 }.

This set of matrices does not contain a singular matrix, and hence 4G.1 can be applied.
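The nonsingularity claim is easy to check: each matrix in this set has determinant λτ − 2, which stays within [−3, −1] on the square −1 ≤ λ, τ ≤ 1. A small grid verification (the resolution is an arbitrary choice):

```python
import numpy as np

# Determinants of the matrices [[lam, 1], [2, tau]] with lam, tau in [-1, 1]
grid = np.linspace(-1.0, 1.0, 201)
dets = np.array([[lam * tau - 2.0 for tau in grid] for lam in grid])

det_max = dets.max()   # attained at lam = tau = 1
det_min = dets.min()   # attained at lam = 1, tau = -1 (or lam = -1, tau = 1)
```

Since the determinant never vanishes, no matrix of the family is singular.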
A better hold on the existence of single-valued Lipschitz continuous localizations
can be gained through a different version of graphical differentiation.
Strict graphical derivative. For a function f : IR^n → IR^m and any point x̄ ∈ dom f, the strict graphical derivative at x̄ is the set-valued mapping D*f(x̄) : IR^n →→ IR^m defined by

D*f(x̄)(u) = { w | ∃ (t^k, x^k, u^k) → (0, x̄, u) with w = lim_{k→∞} [f(x^k + t^k u^k) − f(x^k)] / t^k }.
When lip(f; x̄) < ∞, the set D*f(x̄)(u) is nonempty, closed and bounded in IR^m for each u ∈ IR^n. Then, too, the definition of D*f(x̄)(u) can be simplified by taking u^k ≡ u. In this Lipschitzian setting it can be shown that D*f(x̄)(u) = { Au | A ∈ co ∇̄f(x̄) } for all u when m = 1, but that fails in higher dimensions. In general, it is known⁴ that for a function f : IR^n → IR^m with lip(f; x̄) < ∞, one has

(1)   { Au | A ∈ co ∇̄f(x̄) } ⊃ D*f(x̄)(u)   for all u ∈ IR^n.
θ^- : x ↦ x^- := min{x, 0},   x ∈ IR,

satisfies

θ^-(x) = x − θ^+(x).

Then, just by applying the definition, we get D*θ^-(0)(u) = { λu | 0 ≤ λ ≤ 1 } for every u ∈ IR.
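The difference quotients in the definition of the strict graphical derivative are easy to sample numerically. In the sketch below (the offsets and step size are our choices), taking the base points x = s·t for offsets s ∈ [−1, 1] realizes, in the limit, every value λu with 0 ≤ λ ≤ 1, here at u = 1:

```python
import numpy as np

theta_minus = lambda x: np.minimum(x, 0.0)     # x^- = min{x, 0}

# quotients (theta^-(x + t*u) - theta^-(x)) / t  with x = s*t near 0:
#   s >= 0 gives 0,  s <= -1 gives u,  -1 < s < 0 gives the fraction -s of u,
# so the quotients sweep out [0, 1] * u, matching D* theta^-(0)(u).
u, t = 1.0, 1e-8
quotients = [float((theta_minus(s * t + t * u) - theta_minus(s * t)) / t)
             for s in np.linspace(-1.0, 1.0, 101)]
lo, hi = min(quotients), max(quotients)
```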
Equipped with strict graphical derivatives, we are now able to present a general-
ization of the classical inverse function theorem which furnishes a complete charac-
terization of the existence of a Lipschitz continuous localization of the inverse, and
thus sharpens the theorem of Clarke.
Theorem 4G.3 (Kummer’s inverse function theorem). Let f : IRn → IRn be contin-
uous around x̄, with f (x̄) = ȳ. Then f −1 has a Lipschitz continuous single-valued
localization around ȳ for x̄ if and only if
(3) 0 ∈ D∗ f (x̄)(u) =⇒ u = 0.
Proof. Recall Theorem 1F.2, which says that for a function f : IR^n → IR^n that is continuous around x̄, the inverse f^{-1} has a Lipschitz continuous localization around f(x̄) for x̄ if and only if, in some neighborhood U of x̄, there is a constant c > 0 such that

(4)   |f(x) − f(x′)| ≥ c |x − x′|   for all x, x′ ∈ U.
We will show first that (3) implies (4), from which the sufficiency of the condition will follow. Aiming at a contradiction, assume there are sequences c^k → 0, x^k → x̄ and x̃^k → x̄ with x^k ≠ x̃^k such that

|f(x̃^k) − f(x^k)| ≤ c^k |x̃^k − x^k|.

Then the vector

u^k := (x̃^k − x^k) / |x̃^k − x^k|

satisfies |u^k| = 1 for all k; hence a subsequence u^{k_i} of it converges to some u ≠ 0. Restricting ourselves to such a subsequence, we obtain for t^{k_i} = |x^{k_i} − x̃^{k_i}| that

[f(x^{k_i} + t^{k_i} u^{k_i}) − f(x^{k_i})] / t^{k_i} = [f(x̃^{k_i}) − f(x^{k_i})] / t^{k_i} → 0   as i → ∞.

By definition, the limit on the left side belongs to D*f(x̄)(u), yet u ≠ 0, which is contrary to (3). Hence (3) does imply (4).
For the converse, we argue that if (3) were violated, there would be sequences t^k → 0, x^k → x̄, and u^k → u with u ≠ 0, for which

(5)   lim_{k→∞} [f(x^k + t^k u^k) − f(x^k)] / t^k = 0.

On the other hand, under (4) one has

|f(x^k + t^k u^k) − f(x^k)| / t^k ≥ c |u^k|,

which, combined with (5) and the fact that u^k stays away from 0, leads to a contradiction for large k. Thus (4) guarantees that (3) holds.
The property recorded in (1) indicates clearly that Clarke's theorem follows from that of Kummer. However, although the characterization of Lipschitz invertibility in Kummer's theorem looks simple, a price must still be paid: we have to be able to calculate the strict graphical derivative in every case of interest. This task could be quite hard without calculus rules.
A rule that immediately follows from the definitions, at least, is the following.
Exercise 4G.4 (strict graphical derivatives of a sum). For a function f₁ : IR^n → IR^m that is strictly differentiable at x̄ (or, in particular, continuously differentiable in a neighborhood of x̄) and a function f₂ : IR^n → IR^m that is Lipschitz continuous around x̄, one has

D*(f₁ + f₂)(x̄)(u) = ∇f₁(x̄)u + D*f₂(x̄)(u)   for every u ∈ IR^n.
(6)   minimize  g₀(x) − ⟨v, x⟩  over all x satisfying g_i(x) ≤ u_i for i ∈ [1, m],
where we put

g(x) = \begin{pmatrix} g₁(x) \\ ⋮ \\ g_m(x) \end{pmatrix}   and   y = \begin{pmatrix} y₁ \\ ⋮ \\ y_m \end{pmatrix}.
More conveniently, for the mapping

(8)   G : (x, y) ↦ \begin{pmatrix} ∇g₀(x) + y∇g(x) \\ −g(x) \end{pmatrix} + N_E(x, y),
the solution mapping of (7) is just G−1 (here, without changing anything, we take the
negative of the first row since the normal cone to IRn is the zero mapping). We now
focus on inverting the mapping G. Choose a reference value (v̄, ū) of the parameters
and let (x̄, ȳ) solve (7) for (v̄, ū), that is, (v̄, ū) ∈ G(x̄, ȳ).
To apply Kummer's theorem, we convert, as in the final part of the proof of 4F.7, the variational inequality (7) into an equation involving the function H : IR^{n+m} → IR^{n+m} defined by

H(x, y) = \begin{pmatrix} ∇g₀(x) + ∑_{i=1}^{m} y_i^+ ∇g_i(x) \\ −g₁(x) + y₁^- \\ ⋮ \\ −g_m(x) + y_m^- \end{pmatrix},

where we set z̄_i = ȳ_i^+; if G^{-1} has a Lipschitz continuous localization around (v̄, ū) for (x̄, z̄), then H^{-1} has the same property at (v̄, ū) for (x̄, ȳ), where ȳ satisfies (10).
To invoke Kummer’s theorem for the function H, we need to determine the strict
graphical derivative of H. There is no trouble in differentiating the expressions
−g_i(x) + y_i^-, inasmuch as we already know from 4G.2 the strict graphical derivative of y^-. A little more involved is the determination of the strict graphical derivative of ϕ_i(x, y) := ∇g_i(x) y_i^+ for i = 1, …, m. Adding and subtracting the same expressions, passing to the limit as in the definition, and using (2), we obtain the strict graphical derivatives of the functions ϕ_i; combining these, we arrive at

(12)   M(Λ) ∈ D*H(x̄, ȳ)  ⟺  M(Λ) = \begin{pmatrix} A & B^T Λ \\ −B & I_m − Λ \end{pmatrix},

where Λ ranges over the m × m diagonal matrices with diagonal entries λ_i ∈ [0, 1] satisfying λ_i = 1 when ȳ_i > 0 and λ_i = 0 when ȳ_i < 0.
This formula can be simplified by re-ordering the functions g_i according to the sign of ȳ_i. We first introduce some notation. Let I = {1, …, m} and, without loss of generality, suppose that

{ i ∈ I | ȳ_i > 0 } = {1, …, k}   and   { i ∈ I | ȳ_i = 0 } = {k + 1, …, l}.
Let

B₊ = \begin{pmatrix} ∇g₁(x̄) \\ ⋮ \\ ∇g_k(x̄) \end{pmatrix}   and   B₀ = \begin{pmatrix} ∇g_{k+1}(x̄) \\ ⋮ \\ ∇g_l(x̄) \end{pmatrix},
let Λ₀ be the (l − k) × (l − k) diagonal matrix with diagonal elements λ_i ∈ [0, 1], let I₀ be the identity matrix for IR^{l−k}, and let I_{m−l} be the identity matrix for IR^{m−l}. Then, since λ_i = 1 for i = 1, …, k and λ_i = 0 for i = l + 1, …, m, the matrix M(Λ) in (12) takes the form

M(Λ₀) = \begin{pmatrix} A & B₊^T & B₀^T Λ₀ & 0 \\ & 0 & 0 & 0 \\ −B & 0 & I₀ − Λ₀ & 0 \\ & 0 & 0 & I_{m−l} \end{pmatrix},

where the block −B spans the first column of the second through fourth block rows.
Each column of M(Λ₀) depends on at most one λ_i; hence there are numbers a_{k+1}, b_{k+1}, …, a_l, b_l such that

det M(Λ₀) = (a_{k+1} + λ_{k+1} b_{k+1}) ⋯ (a_l + λ_l b_l).

Therefore, det M(Λ₀) ≠ 0 for all λ_i ∈ [0, 1], i = k + 1, …, l, if and only if the following condition holds:

(13)   det M(J) ≠ 0 and has the same sign for every J ⊂ {k + 1, …, l}, where M(J) denotes M(Λ₀) with Λ₀ = I₀^J,

where I₀^J is the diagonal matrix having 0 as its (i − k)-th element if i ∈ J and 1 otherwise. Clearly, all the matrices M(J) are obtained from M(Λ₀) by taking each λ_i either 0 or 1. The condition (13) can then be written equivalently as

(14)   det M(J) ≠ 0 and sign det M(J) is the same for all J.
Let the matrix B(J) have as its rows the row vectors ∇g_i(x̄) for i ∈ J. Reordering the last m − k columns and rows of M(J), if necessary, we obtain

M(J) = \begin{pmatrix} A & B₊^T & B(J)^T & 0 \\ −B & 0 & 0 & I \end{pmatrix},

where I is now the identity for IR^{{k+1,…,m}\setminus J}. The particular form of the matrix M(J) implies that M(J) fulfills (14) if and only if (14) holds for just a part of it, namely for the matrix

(15)   N(J) := \begin{pmatrix} A & B₊^T & B(J)^T \\ −B₊ & 0 & 0 \\ −B(J) & 0 & 0 \end{pmatrix}.
Theorem 4G.5 (strong regularity characterization for KKT mappings). The solution mapping of the Karush–Kuhn–Tucker variational inequality (7) has a Lipschitz continuous single-valued localization around (v̄, ū) for (x̄, ȳ) if and only if, for the matrix N(J) in (15), det N(J) has the same nonzero sign for all J ⊂ { i ∈ I | ȳ_i = 0 }.
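The condition in Theorem 4G.5 is finitely checkable: enumerate the subsets J of the degenerate index set, assemble N(J), and compare the signs of the determinants. A sketch with hypothetical data (the matrix A and the gradients are made up; for simplicity no index has ȳ_i > 0, so the B₊ blocks of (15) are absent):

```python
import numpy as np
from itertools import combinations

A = np.eye(2)                       # stand-in for the Hessian-type block
grads0 = [np.array([1.0, 0.0]),     # gradients of the degenerate constraints,
          np.array([0.0, 1.0])]     # i.e. the rows available for B(J)

def N_of_J(J):
    """Assemble N(J) = [[A, B(J)^T], [-B(J), 0]] for a tuple of indices J."""
    if not J:
        return A
    BJ = np.vstack([grads0[i] for i in J])
    q = BJ.shape[0]
    return np.vstack([np.hstack([A, BJ.T]),
                      np.hstack([-BJ, np.zeros((q, q))])])

dets = [np.linalg.det(N_of_J(J))
        for r in range(len(grads0) + 1)
        for J in combinations(range(len(grads0)), r)]

# the test of the theorem: all determinants nonzero with one common sign
same_nonzero_sign = all(d > 1e-9 for d in dets) or all(d < -1e-9 for d in dets)
```

For this data every determinant equals 1, so the criterion holds and a Lipschitz continuous single-valued localization exists.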
We should note that, in this specific case, the set of matrices D*H(x̄, ȳ) happens to be convex, and then D*H(x̄, ȳ) coincides with the generalized Jacobian co ∇̄H(x̄, ȳ).
Graphical derivatives, defined in terms of tangent cones to graphs, have been the
mainstay for most of the developments in this chapter so far, with the exception of
the variants in 4G. However, an alternative approach can be made to many of the
same issues in terms of graphical coderivatives, defined instead in terms of normal
cones to graphs. This theory is readily available in other texts in variational analysis,
so we will only lay out the principal ideas and facts here without going into their
detailed development.
Normal cones NC (x) have already been prominent, of course, in our work with
optimality conditions and variational inequalities, starting in Section 2A, but only
in the case of convex sets C. To arrive at coderivatives for a mapping F : IR^n →→ IR^m, we wish to make use of normal cones to gph F at points (x, y), but to keep the door open to significant applications we need to deal with graph sets that are not convex. The first task, therefore, is generalizing N_C(x) to the case of nonconvex C.
General normal cones. For a set C ⊂ IR^n and a point x ∈ C at which C is locally closed, a vector v is said to be a regular normal if ⟨v, x′ − x⟩ ≤ o(|x′ − x|) for x′ ∈ C. The set of all such vectors v is called the regular normal cone to C at x and is denoted by N̂_C(x). A vector v is said to be a general normal to C at x if there are sequences {x^k} and {v^k} with x^k ∈ C such that

x^k → x,   v^k ∈ N̂_C(x^k),   and   v^k → v.

The set of all such vectors v is called the general normal cone to C at x and is denoted by N_C(x). For x ∉ C, N_C(x) is taken to denote the empty set.
Very often, the limit process in the definition of the general normal cone N_C(x) is superfluous: no additional vectors v are produced in that manner, and one merely has N_C(x) = N̂_C(x). This circumstance is termed the Clarke regularity of C at x. When C is convex, for instance, it is Clarke regular at every one of its points x, and the general normal cone N_C(x) agrees with the normal cone defined earlier, in 2A. In any case, N_C(x) is always a closed cone.
Coderivatives of mappings. For a mapping F : IR^n →→ IR^m and a pair (x, y) ∈ gph F at which gph F is locally closed, the coderivative of F at x for y is the mapping D*F(x|y) : IR^m →→ IR^n defined by

D*F(x|y)(u) = { v ∈ IR^n | (v, −u) ∈ N_{gph F}(x, y) }.
Obviously this is a “dual” sort of notion, but where does it fit in with classi-
cal differentiation? The answer can be seen by specializing to the case where F is
single-valued, thus reducing to a function f : IRn → IRm . Suppose f is strictly dif-
ferentiable at x; then for y = f (x), the graphical derivative D f (x|y) is of course
the linear mapping from IRn to IRm with matrix ∇ f (x). In contrast, the coderiva-
tive D∗ f (x|y) comes out as the adjoint linear mapping from IRm to IRn with matrix
∇ f (x)T .
The most striking fact about coderivatives in our context is the following simple
characterization of metric regularity.
Thus, F is metrically regular if and only if the right side of (1) is finite, which is equivalent to the implication

(2)   0 ∈ D*F(x̄ | ȳ)(u)  ⟹  u = 0.
where f : IRd × IRn → IRn and C is a polyhedral convex set in IRn . Let x̄ be a solution
of (2) for p̄ and let f be continuously differentiable around ( p̄, x̄). From Theorem
4F.1 we know that a sufficient condition for the existence of a Lipschitz single-
valued localization of the solution mapping S of (2) around p̄ for x̄ is the metric
regularity at 0 for 0 of the reduced mapping
(4) A + NK ,
Based on earlier results in 2E we also noted there that in the case of ample parame-
terization, this sufficient condition is necessary as well. Further, in 4F.3 we obtained
a characterization of metric regularity of (3) by means of the derivative criterion in
4B.1. We will now apply the coderivative criterion (2) for metric regularity to the mapping in (4); for that purpose we have to compute the coderivative of that mapping. The first step is easy, and we give it as an exercise.
Exercise 4H.2 (reduced coderivative formula). Show that, for a linear mapping A : IR^n → IR^n and a closed convex cone K ⊂ IR^n, one has

D*(A + N_K)(x | y)(u) = A^T u + D*N_K(x | y − Ax)(u)   for every u.
polyhedral convex sets in IR^{2n} (due to K being polyhedral), only finitely many cones can be manifested as T_G(x, v) at points (x, v) ∈ G near (0, 0). Thus, for a sufficiently small neighborhood O of the origin in IR^{2n} we have

(7)   N_G(0, 0) = ⋃_{(x,v) ∈ O ∩ G} T_G(x, v)^*.
It follows from the reduction lemma 2E.4 that T_G(x, v) = gph N_{K(x,v)}, where K(x, v) = { x′ ∈ T_K(x) | x′ ⊥ v }. Therefore,

T_G(x, v) = { (x′, v′) | x′ ∈ K(x, v), v′ ∈ K(x, v)^*, x′ ⊥ v′ },

and we have

T_G(x, v)^* = { (r, u) | ⟨(r, u), (x′, v′)⟩ ≤ 0 for all (x′, v′) ∈ T_G(x, v) }
            = { (r, u) | ⟨r, x′⟩ + ⟨u, v′⟩ ≤ 0 for all x′ ∈ K(x, v), v′ ∈ K(x, v)^* with x′ ⊥ v′ }.
Hence NG (0, 0) is the union of all product sets K̂ ∗ × K̂ associated with cones K̂ such
that K̂ = K(x, v) for some (x, v) ∈ G near enough to (0, 0).
It remains to observe that the form of the critical cones K̂ = K(x, v) at points (x, v) close to (0, 0) has already been derived in Lemma 4F.2: for every choice of (x, v) ∈ gph N_K near (0, 0) (this last requirement is actually not needed), the corresponding critical cone K̂ = K(x, v) is given by

(9)   K̂ = F₁ − F₂   for some faces F₁, F₂ ∈ F_K with F₁ ⊃ F₂,

where F_K is the collection of all faces of K, as defined in 4F. To see this, all we need to do is to replace C by K and (x, v) by (0, 0) in the proof of 4F.2. Summarizing, from (6), (7), (8) and (9), and the coderivative criterion in 4H.1, we come to the following result:
Lemma 4H.3 (regularity modulus from coderivative criterion). For the mapping in (4)–(5) we have

(10)   reg(A + N_K; 0 | 0) = max_{F₁, F₂ ∈ F_K, F₁ ⊃ F₂}  sup_{u ∈ F₁ − F₂, |u| = 1}  1 / d(A^T u, (F₁ − F₂)^*).
Thus, A + N_K is metrically regular at 0 for 0 if and only if, for every choice of critical faces F₁, F₂ ∈ F_K with F₂ ⊂ F₁,

u ∈ F₁ − F₂ and A^T u ∈ (F₁ − F₂)^*   ⟹   u = 0.
Commentary
Graphical derivatives of set-valued mappings were introduced by Aubin [1981];
for more, see Aubin and Frankowska [1990]. The material in sections 4B and 4C
is from Dontchev, Quincampoix and Zlateva [2006], where results of Aubin and
Frankowska [1987, 1990] were used.
The statement 4B.5 of the Ekeland principle is from Ekeland [1990]. A detailed
presentation of this principle along with various forms and extensions is given in
Borwein and Zhu [2005]. The proof of the classical implicit function theorem 1A.1
given at the end of Section 4B is close, but not identical, to that in Ekeland [1990].
The derivative criterion for metric subregularity in 4C.1 was obtained by
Rockafellar [1989], but the result itself was embedded in a proof of a statement requiring additional assumptions. The necessity without those assumptions was later noted in King and Rockafellar [1992], and the sufficiency in Levy [1996].
The statement and the proof of 4C.1 are from Dontchev and Rockafellar [2004].
Sections 4E and 4F collect various results scattered in the literature. Theorem
4E.4 is from Dontchev and Rockafellar [2004] while the critical face lemma 4F.2
is a particular case of Lemma 3.5 in Robinson [1984]; see also Theorem 5.6 in
Rockafellar [1989]. Theorem 4F.6 is a particular case of Theorem 3 in Dontchev
and Rockafellar [1996] which in turn is based on a deeper result in Robinson [1992],
see also Ralph [1993]. The presented proof uses a somewhat modified version of a
reduction argument from the book Klatte and Kummer [2002], Section 7.5.
Clarke’s inverse function theorem, 4G.1, was first published in Clarke [1976]; for
more information regarding the generalized Jacobian see the book of Clarke [1983].
Theorem 4G.3 is from Kummer [1991]; see also Klatte and Kummer [2002] and
Páles [1997]. It is interesting to note that a nonsmooth implicit function theorem, which is a special case of both Clarke's theorem and Kummer's theorem, appeared as early as 1916 in a paper by Hedrick and Westfall [1916].
Theorem 4G.5 originates from Robinson [1980]; the proof given here uses some
ideas from Kojima [1980] and Jongen et al. [1987].
The coderivative criterion in 4H.1 goes back to the early works of Ioffe [1981,
1984], Kruger [1982] and Mordukhovich [1984]. A broad review of the role of
coderivatives in variational analysis is given in Mordukhovich [2006].
Chapter 5
Regularity in Infinite Dimensions
The theme of this chapter has origins in the early days of functional analysis and
the Banach open mapping theorem, which concerns continuous linear mappings
from one Banach space to another. The graphs of such mappings are subspaces
of the product of the two Banach spaces, but remarkably much of the classical
theory extends to set-valued mappings whose graphs are convex sets or cones in-
stead of subspaces. Openness connects up then with metric regularity and interior-
ity conditions on domains and ranges, as seen in the Robinson–Ursescu theorem.
Infinite-dimensional inverse function theorems and implicit function theorems due
to Lyusternik, Graves, and Bartle and Graves can be derived and extended. Banach
spaces can even be replaced to some degree by more general metric spaces.
Before proceeding we review some notation and terminology. Already in the first
section of Chapter 1 we stated the contraction mapping principle in metric spaces.
Given a set X, a function ρ : X × X → IR+ is said to be a metric in X when
(i) ρ (x, y) = 0 if and only if x = y;
(ii) ρ (x, y) = ρ (y, x);
(iii) ρ (x, y) ≤ ρ (x, z) + ρ (z, y) (triangle inequality).
A set X equipped with a metric ρ is called a metric space (X, ρ ). In a metric space
(X, ρ ), a sequence {xk } is called a Cauchy sequence if for every ε > 0 there ex-
ists n ∈ IN such that ρ (xk , x j ) < ε for all k, j > n. A metric space is complete if
every Cauchy sequence converges to an element of the space. Any closed set in a
Euclidean space is a complete metric space with the metric ρ (x, y) = |x − y|.
A linear (vector) space over the reals is a set X in which addition and scalar
multiplication are defined obeying the standard algebraic laws of commutativity,
associativity and distributivity. A linear space X with elements x is normed if it is furnished with a real-valued expression ‖x‖, called the norm of x, having the properties
(i) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
(ii) ‖αx‖ = |α| ‖x‖ for α ∈ IR;
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Any normed space is a metric space with the metric ρ(x, y) = ‖x − y‖. A complete
normed vector space is called a Banach space. On a finite-dimensional space, all
A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Monographs in Mathematics, DOI 10.1007/978-0-387-87821-8_5, © Springer Science+Business Media, LLC 2009
252 5 Regularity in Infinite Dimensions
norms are equivalent, but when we refer specifically to IRn we ordinarily have in
mind the Euclidean norm denoted by | · |. Regardless of the particular norm being
employed in a Banach space, the closed unit ball for that norm will be denoted by
IB, and the distance from a point x to a set C will be denoted by d(x,C), and so forth.
As in finite dimensions, a function A acting from a Banach space X into a Banach
space Y is called a linear mapping if dom A = X and A(α x + β y) = α Ax + β Ay for
all x, y ∈ X and all scalars α and β . The range of a linear mapping A from X to
Y is always a subspace of Y , but it might not be a closed subspace, even if A is
continuous. A linear mapping A : X → Y is surjective if rge A = Y and injective if
ker A = {0}.
Although in finite dimensions a linear mapping A : X → Y is automatically con-
tinuous, this fails in infinite dimensions; neither does surjectivity of A when X = Y
necessarily yield invertibility, in the sense that A−1 is single-valued. However, if A
is continuous at any one point of X, then it is continuous at every point of X . That,
moreover, is equivalent to A being bounded, in the sense that A carries bounded
subsets of X into bounded subsets of Y , or what amounts to the same thing due to
linearity, the image of the unit ball in X is included in some multiple of the unit ball
in Y , i.e., the value
A = sup Ax
x≤1
is finite. This expression defines the operator norm on the space L (X ,Y ), consisting
of all continuous linear mappings A : X → Y , which is then another Banach space.
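In the finite-dimensional Euclidean case this norm is computable: for a matrix, the supremum is attained and equals the largest singular value. A small sketch (the matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

op_norm = np.linalg.norm(A, 2)      # largest singular value of A

# crude check: evaluate |Ax| over sampled unit vectors x
angles = np.linspace(0.0, 2.0 * np.pi, 1000)
sampled = max(np.linalg.norm(A @ np.array([np.cos(t), np.sin(t)]))
              for t in angles)
```

Here sampled approaches op_norm from below as the sampling is refined.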
Special and important in this respect is the Banach space L (X, IR), consisting
of all linear and continuous real-valued functions on X. It is the space dual to X,
symbolized by X*, and its elements are typically denoted by x*; the value that an x* ∈ X* assigns to an x ∈ X is written as ⟨x*, x⟩. The dual of the Banach space X* is the bidual X** of X; when every function x** ∈ X** on X* can be represented as x* ↦ ⟨x*, x⟩ for some x ∈ X, the space X is called reflexive. This holds in particular when X is a Hilbert space with ⟨x, y⟩ as its inner product, and each x* ∈ X* corresponds to a function x ↦ ⟨x, y⟩ for some y ∈ X, so that X* can be identified with X itself.
Another thing to be mentioned for a pair of Banach spaces X and Y and their duals X* and Y* is that any A ∈ L(X, Y) has an adjoint A* ∈ L(Y*, X*) such that ⟨Ax, y*⟩ = ⟨x, A*y*⟩ for all x ∈ X and y* ∈ Y*. Furthermore, ‖A*‖ = ‖A‖. A generalization of this to set-valued mappings having convex cones as their graphs will be
In fact most of the definitions, and even many of the results, in the preceding
chapters will carry over with hardly any change, the major exception being results
with proofs which truly depended on the compactness of IB. Our initial task, in Sec-
tion 5A, will be to formulate various facts in this broader setting while coordinating
them with classical theory. In the remainder of the chapter, we present inverse and
implicit mapping theorems with metric regularity in abstract spaces. Parallel results
for metric subregularity are not considered.
As before, the infimum of all such κ associated with choices of U and V is denoted
by reg (F; x̄| ȳ) and called the modulus of metric regularity of F at x̄ for ȳ.
The classical theorem about openness only addresses linear mappings. There are
numerous versions of it available in the literature; we provide the following formu-
lation:
Theorem 5A.1 (Banach open mapping theorem). For any A ∈ L (X,Y ) the follow-
ing properties are equivalent:
(a) A is surjective;
(b) A is open (at every point);
(c) 0 ∈ int A(intIB);
(d) there is a κ > 0 such that for all y ∈ Y there exists x ∈ X with Ax = y and ‖x‖ ≤ κ ‖y‖.
This theorem will be derived in Section 5B from a far more general result about
set-valued mappings F than just linear mappings A. Our immediate interest lies in
connecting it with the ideas in previous chapters, so as to shed light on where we
have arrived and where we are going.
The first observation to make is that (d) of Theorem 5A.1 is the same as the existence of a κ > 0 such that d(0, A⁻¹(y)) ≤ κ ‖y‖ for all y. Clearly (d) does imply this, but the converse holds also by passing to a slightly larger κ if need be. But the linearity of A can also be brought in. For x ∈ X and y ∈ Y in general, we have d(x, A⁻¹(y)) = d(0, A⁻¹(y) − x), and since z ∈ A⁻¹(y) − x corresponds to A(x + z) = y, we have d(0, A⁻¹(y) − x) = d(0, A⁻¹(y − Ax)) ≤ κ ‖y − Ax‖. Thus, (d) of Theorem 5A.1 is actually equivalent to:

(3)   there exists κ > 0 such that d(x, A⁻¹(y)) ≤ κ d(y, Ax) for all x ∈ X, y ∈ Y.
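Property (3) can be observed numerically in finite dimensions: for a surjective matrix A, the set A⁻¹(y) is an affine subspace, the nearest point of it to x is x + A⁺(y − Ax) with A⁺ the pseudoinverse, and κ = ‖A⁺‖ works globally. A sketch (the matrix is an arbitrary full-row-rank choice):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])        # 2x3 of full row rank => surjective

A_pinv = np.linalg.pinv(A)
kappa = np.linalg.norm(A_pinv, 2)      # = 1 / (smallest singular value of A)

def dist_to_preimage(x, y):
    """d(x, A^{-1}(y)); the projection of x on {z : Az = y} is x + A^+(y - Ax)."""
    return np.linalg.norm(A_pinv @ (y - A @ x))

rng = np.random.default_rng(0)
ok = all(dist_to_preimage(x, y) <= kappa * np.linalg.norm(y - A @ x) + 1e-9
         for x, y in ((rng.standard_normal(3), rng.standard_normal(2))
                      for _ in range(100)))
```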
Obviously this is the same as the metric regularity property in (2) as specialized to
A, with the local character property becoming global through the arbitrary scaling
made available because A(λ x) = λ Ax. In fact, due to linearity, metric regularity of
A with respect to any pair (x̄, ȳ) in its graph is identical to metric regularity with
respect to (0, 0), and the same modulus of metric regularity prevails everywhere.
We can simply denote this modulus by reg A and use the formula that
where the excess e(C, D) is defined in Section 3A. The infimum of all κ in (6) over
various choices of U and V is the modulus lip (F −1 ; ȳ| x̄).
Theorem 5A.3 (metric regularity, linear openness and the inverse Aubin prop-
erty). For Banach spaces X and Y and a mapping F : X → → Y , the following proper-
ties with respect to a pair (x̄, ȳ) ∈ gph F are equivalent:
(a) F is linearly open at x̄ for ȳ with constant κ,
(b) F is metrically regular at x̄ for ȳ with constant κ,
(c) F −1 has the Aubin property at ȳ for x̄ with constant κ.
Moreover reg (F; x̄| ȳ) = lip (F −1 ; ȳ| x̄).
When F is taken to be a mapping A ∈ L (X ,Y ), how does the content of The-
orem 5A.3 compare with that of Theorem 5A.1? With linearity, the openness in
5A.1(b) comes out the same as the linear openness in 5A.3(a) and is easily seen to
reduce as well to the interiority condition in 5A.1(c). On the other hand, 5A.1(d)
has already been shown to be equivalent to the subsequently added property (e), to
which 5A.3(b) reduces when F = A. From 5A.3(c), though, we get yet another prop-
erty which could be added to the equivalences in Theorem 5A.1 for A ∈ L (X,Y ),
specifically that
(f) A−1 : Y →→ X has the Aubin property at every ȳ ∈ Y for every x̄ ∈ A−1 (ȳ),
where lip (A−1 ; ȳ| x̄) = reg A always. This goes farther than the observation in Corol-
lary 5A.2, which covered only single-valued A−1 . In general, of course, the Aubin
property in 5A.3(c) turns into local Lipschitz continuity when F −1 is single-valued.
An important feature of Theorem 5A.1, which is not represented at all in Theo-
rem 5A.3, is the assertion that surjectivity is sufficient, as well as necessary, for all
these properties to hold. An extension of that aspect to nonlinear F will be possible,
in a local sense, under the restriction that gph F is closed and convex. This will
emerge in the next section, in Theorem 5B.4.
Another result which we now wish to upgrade to infinite dimensions is the es-
timation for perturbed inversion which appeared in matrix form in Corollary 1E.7
with elaborations in 1E.8. It lies at the heart of the theory of implicit functions and
will eventually be generalized in more than one way. We provide it here with a direct
proof (compare with 1E.8(b)).
Lemma 5A.4 (estimate for perturbed inversion). Let A ∈ L(X, Y) be invertible and let B ∈ L(X, Y) satisfy ‖A⁻¹‖·‖B‖ < 1. Then A + B is invertible, and

(7)  ‖(A + B)⁻¹‖ ≤ ‖A⁻¹‖ / (1 − ‖A⁻¹‖·‖B‖).
Proof. Let C = BA⁻¹; then ‖C‖ < 1 and hence ‖Cⁿ‖ ≤ ‖C‖ⁿ → 0 as n → ∞. Also, the elements

Sₙ = ∑ᵢ₌₀ⁿ Cⁱ for n = 0, 1, . . .

form a Cauchy sequence in the Banach space L(Y, Y) which therefore converges to some S ∈ L(Y, Y). Observe that, for each n,

Sₙ(I − C) = I − Cⁿ⁺¹ = (I − C)Sₙ,
and hence, through passing to the limit, one has S = (I − C)⁻¹. On the other hand,

‖Sₙ‖ ≤ ∑ᵢ₌₀ⁿ ‖C‖ⁱ ≤ ∑ᵢ₌₀^∞ ‖C‖ⁱ = 1/(1 − ‖C‖).

Thus, we obtain

‖(I − C)⁻¹‖ ≤ 1/(1 − ‖C‖).
All that remains is to bring in the identity (I − C)A = A − B and the inequality ‖C‖ ≤ ‖A⁻¹‖·‖B‖, and to observe that the sign of B does not matter.

Note that, with the conventions ∞ · 0 = 0, 1/0 = ∞ and 1/∞ = 0, Lemma 5A.4 also covers the cases ‖A⁻¹‖ = ∞ and ‖A⁻¹‖·‖B‖ = 1.
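A quick numerical sketch of Lemma 5A.4 for matrices (our own example; A and B are arbitrary choices satisfying the smallness condition):

```python
import numpy as np

# Check the perturbed-inversion bound (7):
# ||(A+B)^{-1}|| <= ||A^{-1}|| / (1 - ||A^{-1}|| ||B||), valid when ||A^{-1}|| ||B|| < 1.
A = np.array([[3.0, 1.0], [0.0, 2.0]])
B = np.array([[0.1, -0.05], [0.02, 0.1]])

a = np.linalg.norm(np.linalg.inv(A), 2)   # ||A^{-1}|| in the operator 2-norm
b = np.linalg.norm(B, 2)                  # ||B||
lhs = np.linalg.norm(np.linalg.inv(A + B), 2)
rhs = a / (1 - a * b)
```

Here a·b is well below 1, so the lemma applies and lhs ≤ rhs.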
Exercise 5A.5. Derive Lemma 5A.4 from the contraction mapping principle 1A.2.
Guide. Setting a = ‖A⁻¹‖, choose B ∈ L(X, Y) with ‖B‖ < ‖A⁻¹‖⁻¹ and y ∈ Y with ‖y‖ ≤ 1 − a‖B‖. Show that the mapping Φ : x → A⁻¹(y − Bx) satisfies the conditions in 1A.2 with λ = a‖B‖ and hence there is a unique x ∈ aIB such that x = A⁻¹(y − Bx), that is, (A + B)x = y. Thus, A + B is invertible. Moreover ‖x‖ = ‖(A + B)⁻¹(y)‖ ≤ a for every y ∈ (1 − a‖B‖)IB, which implies that

‖(A + B)⁻¹z‖ ≤ ‖A⁻¹‖/(1 − ‖A⁻¹‖·‖B‖) for every z ∈ IB.
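The contraction argument in this guide can be carried out numerically (our own example; the matrices and right-hand side are arbitrary choices with ‖A⁻¹‖·‖B‖ < 1):

```python
import numpy as np

# Fixed-point iteration for Phi(x) = A^{-1}(y - Bx): Phi is a contraction with
# constant ||A^{-1}|| ||B|| < 1, and its fixed point solves (A + B)x = y.
A = np.array([[3.0, 0.5], [0.0, 2.0]])
B = np.array([[0.1, 0.0], [0.05, -0.1]])
y = np.array([1.0, -1.0])

contraction = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(B, 2)

x = np.zeros(2)
for _ in range(200):
    x = np.linalg.solve(A, y - B @ x)   # x <- Phi(x)

residual = np.linalg.norm((A + B) @ x - y)
```

The iterates converge geometrically at the contraction rate, and the limit inverts A + B at y.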
Exercise 5A.6. Show that for a linear and bounded mapping C with ‖C‖ < 1 one has

‖(I − C)⁻¹ − I − C‖ ≤ ‖C‖²/(1 − ‖C‖).

Guide. Use the sequence of mappings Sₙ in the proof of 5A.4 and observe that

‖Sₙ − I − C‖ = ‖C² + C³ + · · · + Cⁿ‖ ≤ ‖C‖²/(1 − ‖C‖).
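The tail estimate above is easy to observe numerically (our own matrix choice with ‖C‖ < 1):

```python
import numpy as np

# The left side below is the Neumann-series tail C^2 + C^3 + ..., whose norm is
# bounded by ||C||^2 / (1 - ||C||).
C = np.array([[0.2, 0.1], [0.0, 0.3]])
I = np.eye(2)

c = np.linalg.norm(C, 2)
lhs = np.linalg.norm(np.linalg.inv(I - C) - I - C, 2)
rhs = c**2 / (1 - c)
```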
Proposition 5A.7 (outer and inner norms). The inner norm of a positively homogeneous mapping H : X →→ Y satisfies

‖H‖⁻ = inf { κ ∈ (0, ∞) | H(x) ∩ κIB ≠ ∅ for all x ∈ IB },

so that, in particular,

(9)  ‖H‖⁻ < ∞ =⇒ dom H = X,

and we have

(10)  ‖H‖⁺ < ∞ =⇒ H(0) = {0},

with this implication becoming an equivalence when H has closed graph and dim X < ∞.

The equivalence generally fails in (10) when dim X = ∞ because of the lack of compactness then (with respect to the norm) of the ball IB in X.
An extension of Lemma 5A.4 to possibly set-valued mappings that are positively
homogeneous is now possible in terms of the outer norm. Recall that for a positively
homogeneous H : X → → Y and a linear B : X → Y we have (H + B)(x) = H(x) + Bx
for every x ∈ X .
(11)  ‖(H + B)⁻¹‖⁺ ≤ ‖H⁻¹‖⁺ / (1 − ‖H⁻¹‖⁺·‖B‖).

Proof. If (11) were false, there would exist y ∈ IB and x ∈ (H + B)⁻¹(y) such that ‖x‖ > ([‖H⁻¹‖⁺]⁻¹ − ‖B‖)⁻¹, which is the same as

(12)  1/(‖x‖⁻¹ + ‖B‖) > ‖H⁻¹‖⁺.

Then x ∈ H⁻¹(y − Bx), and hence

‖H⁻¹‖⁺ ≥ ‖x‖/‖y − Bx‖.

Since ‖y − Bx‖ ≤ 1 + ‖B‖·‖x‖, this gives

‖H⁻¹‖⁺ ≥ ‖x‖/‖y − Bx‖ ≥ ‖x‖/(1 + ‖B‖·‖x‖) = 1/(‖x‖⁻¹ + ‖B‖) > ‖H⁻¹‖⁺,

which is a contradiction.
For any set C in a Banach space X and any point x̄ ∈ C, the tangent cone T_C(x̄) at x̄ is defined as in 2A to consist of all limits v of sequences (1/τᵏ)(xᵏ − x̄) with xᵏ → x̄ in C and τᵏ ↘ 0. When C is convex, T_C(x̄) has an equivalent description as the closure of the convex cone consisting of all vectors λ(x − x̄) with x ∈ C and λ > 0.
In infinite dimensions, the normal cone NC (x) to C at x can be introduced in
various ways that extend the general definition given for finite dimensions in 4H,
but we will only be concerned with the case of convex sets C. For that case, the
special definition in 2A suffices with only minor changes caused by the need to
work with the dual space X ∗ and the pairing x, x∗ between X and X ∗ . Namely,
N_C(x) consists of all x∗ ∈ X∗ such that

⟨x′ − x, x∗⟩ ≤ 0 for all x′ ∈ C.
Equivalently, through the alternative description of TC (x) for convex C, the normal
cone NC (x) is the polar TC (x)∗ of the tangent cone TC (x). It follows that TC (x) is in
turn the polar cone NC (x)∗ .
As earlier, NC (x) is taken to be the empty set when x ∈/ C so as to get a set-valued
mapping NC defined for all x, but this normal cone mapping now goes from X to
X ∗ instead of from the underlying space into itself (except in the case of a Hilbert
space, where X ∗ can be identified with X as recalled above). A generalized equation
of the form
f(x) + N_C(x) ∋ 0 for a function f : X → X∗
is again a variational inequality. Such generalized equations are central, for in-
stance, to many applications involving differential or integral operators, especially
in a Hilbert space framework.
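In a Hilbert space, x solves f(x) + N_C(x) ∋ 0 for convex closed C exactly when x = P_C(x − t f(x)) for any t > 0, with P_C the projection onto C. A minimal one-dimensional sketch of this projected fixed-point characterization (our own example, with C = [0, 1] and f(x) = x − 2; these choices are not from the text):

```python
# Solve f(x) + N_C(x) ∋ 0 on C = [0, 1] with f(x) = x - 2.
# The solution is x = 1: there -f(1) = 1 lies in N_C(1) = [0, +inf).
def proj(x):                       # projection onto C = [0, 1]
    return min(1.0, max(0.0, x))

def f(x):
    return x - 2.0

x = 0.0
for _ in range(100):
    x = proj(x - 0.5 * f(x))       # projected fixed-point iteration

fixed_point_gap = abs(proj(x - 0.5 * f(x)) - x)
```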
Exercise 5A.9 (normals to cones). Show that for a closed convex cone K ⊂ X and its polar K∗ ⊂ X∗, one has

x∗ ∈ N_K(x) ⇐⇒ x ∈ K, x∗ ∈ K∗, ⟨x, x∗⟩ = 0.
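The complementarity in 5A.9 is easy to test concretely for the cone K = IRⁿ₊ (our own choice of cone for this sketch), whose polar is K∗ = IRⁿ₋:

```python
import numpy as np

# x* in N_K(x) for K = R^n_+ exactly when x >= 0, x* <= 0, and <x, x*> = 0.
def in_normal_cone(x, x_star, tol=1e-12):
    return bool(np.all(x >= -tol) and np.all(x_star <= tol)
                and abs(np.dot(x, x_star)) <= tol)

x = np.array([0.0, 2.0, 0.0])
ok = in_normal_cone(x, np.array([-1.0, 0.0, -3.0]))    # complementary slackness holds
bad = in_normal_cone(x, np.array([-1.0, -0.5, 0.0]))   # <x, x*> = -1, not 0
```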
A set C is called absorbing if 0 ∈ core C. Obviously core C ⊃ int C always, but there are circumstances where necessarily core C = int C. It is elementary that this holds when C is convex with int C ≠ ∅, but more attractive is the potential of using the purely algebraic test of whether a point x belongs to core C to confirm that x ∈ int C without first having to establish that int C ≠ ∅. Most importantly for our purposes here,

(1)  C closed, convex and absorbing =⇒ 0 ∈ int C.
Theorem 5B.1 (interiority criteria for domains and ranges). For any mapping F : X →→ Y with closed convex graph, one has

(2)  core rge F = int rge F and core dom F = int dom F.

In addition, core cl rge F = int cl rge F and core cl dom F = int cl dom F, where moreover

(3)  int cl rge F = int rge F when dom F is bounded,
     int cl dom F = int dom F when rge F is bounded.
(xᵏ⁺¹, yᵏ⁺¹) = (αₖ/(1 + αₖ))(xᵏ, yᵏ) + (1/(1 + αₖ))(uᵏ, vᵏ).
Clearly, (xᵏ⁺¹, yᵏ⁺¹) ∈ gph F by its convexity. Also, the sequence {yᵏ} satisfies

‖yᵏ⁺¹ − ỹ‖ = ‖vᵏ − wᵏ‖/(1 + αₖ) ≤ (1/2)‖yᵏ − ỹ‖.
If yᵏ⁺¹ = ỹ, we take x̃ = xᵏ⁺¹ and (xⁿ, yⁿ) = (x̃, ỹ) for all n = k + 1, k + 2, . . .. If not, we perform the induction step again. As a result, we generate an infinite sequence {(xᵏ, yᵏ)}, each element of which is equal to (x̃, ỹ) after some k, or else has yᵏ ≠ ỹ for all k and also

(4)  ‖yᵏ − ỹ‖ ≤ (1/2ᵏ)‖y⁰ − ỹ‖ for all k = 1, 2, . . . .
In the latter case, we have yᵏ → ỹ. Further, for the associated sequence {xᵏ} we obtain

‖xᵏ⁺¹ − xᵏ‖ = ‖xᵏ − uᵏ‖/(1 + αₖ) ≤ ((‖xᵏ‖ + ‖uᵏ‖)/(‖yᵏ − ỹ‖ + δ))·‖yᵏ − ỹ‖.
Both xk and uk are from dom F and thus are bounded. Therefore, from (4), {xk } is a
Cauchy sequence, hence (because X is a complete metric space) convergent to some
x̃. Because gph F is closed, we end up with (x̃, ỹ) ∈ gph F, as required.
Next we address the second claim in (2), where the inclusion core dom F ⊂
int dom F suffices for establishing equality. We must show that an arbitrarily cho-
sen point of core dom F belongs to int dom F, but through a translation of gph F
we can focus without loss of generality on that point in core dom F being 0, with F(0) ∋ 0. Let F₀ : X →→ Y be defined by F₀(x) = F(x) ∩ IB. The graph of F₀, being [X × IB] ∩ gph F, is closed and convex, and we have dom F₀ ⊂ dom F
and rge F0 ⊂ IB (bounded). The relations already established in (3) tell us that
int cl dom F0 = int dom F0 , where cl dom F0 is a closed convex set. By demon-
strating that cl dom F0 is absorbing, we will be able to conclude from (1) that
0 ∈ int dom F0 , hence 0 ∈ int dom F. It is enough actually to show that dom F0
itself is absorbing.
Consider any x ∈ X . We have to show the existence of ε > 0 such that tx ∈ dom F0
for t ∈ [0, ε ]. We do know, because dom F is absorbing, that tx ∈ dom F for all t > 0
sufficiently small. Fix t₀ as such a t, let y₀ ∈ F(t₀x), and set y = y₀/t₀, so that
t0 (x, y) ∈ gph F. The pair t(x, y) = (tx,ty) belongs then to gph F for all t ∈ [0,t0 ]
through the convexity of gph F and our arrangement that (0, 0) ∈ gph F. Take ε > 0 small enough that ε‖y‖ ≤ 1. Then for t ∈ [0, ε] we have ‖ty‖ = t‖y‖ ≤ 1, giving us ty ∈ F(tx) ∩ IB and therefore tx ∈ dom F₀, as required.
Regularity properties will now be explored. The property of a mapping F : X → →Y
being open at x̄ for ȳ, as extended to Banach spaces X and Y in 5A(1), can be restated
equivalently in a manner that more closely resembles the linear openness property
defined in 5A(5):
(5) for any a > 0 there exists b > 0 such that F(x̄ + a intIB) ⊃ ȳ + b intIB.
Linear openness requires a linear scaling relationship between a and b. Under posi-
tive homogeneity, such scaling is automatic. On the other hand, an intermediate type
of property holds automatically without positive homogeneity when the graph of F
is convex, and it will be a stepping stone toward other, stronger, consequences of
convexity.
Theorem 5B.2 (openness for mappings with convex graphs). For a mapping F : X →→ Y with convex graph and any (x̄, ȳ) ∈ gph F, the openness of F at x̄ for ȳ in the sense of (5) is equivalent to the existence of a constant c > 0 such that

(6)  F(x̄ + intIB) ⊃ ȳ + c intIB.
Proof. Clearly, (5) implies (6). For the converse, assume (6) and consider any a > 0.
Take b = min{1, a}c. If a ≥ 1, the left side of (6) is contained in the left side of (5),
and hence (5) holds. Suppose therefore that a < 1. Let w ∈ ȳ + b intIB. The point v = (w/a) − (1 − a)(ȳ/a) satisfies ‖v − ȳ‖ = ‖w − ȳ‖/a < b/a = c, hence v ∈ ȳ + c intIB.
Then from (6) there exists u ∈ x̄ + intIB with (u, v) ∈ gph F. The convexity of gph F
implies a(u, v) + (1 − a)(x̄, ȳ) ∈ gph F and yields av + (1 − a)ȳ ∈ F(au + (1 − a)x̄) ⊂
F(x̄ + a intIB). Substituting v = (w/a) − (1 − a)(ȳ/a) in this inclusion, we see that
w ∈ F(x̄ + a intIB), and since w was an arbitrary point in ȳ + b intIB, we get (5).
The following fact bridges, for set-valued mappings with convex graphs, between
condition (6) and metric regularity.
Lemma 5B.3 (metric regularity estimate). Let F : X →→ Y have convex graph containing (x̄, ȳ), and suppose (6) is fulfilled. Then

(7)  d(x, F⁻¹(y)) ≤ ((1 + ‖x − x̄‖)/(c − ‖y − ȳ‖))·d(y, F(x)) for all x ∈ X, y ∈ ȳ + c intIB.
Proof. We may assume that (x̄, ȳ) = (0, 0), since this can be arranged by translating
gph F to gph F − (x̄, ȳ). Then condition (6) has the simpler form
d(x, F⁻¹(y)) < ((1 + ‖x‖)/(α − ε))·[d(y, F(x)) + ε].
Letting ε → 0, we finish the proof.
Condition (6) entails in particular having ȳ ∈ int rge F. It turns out that when
the graph of F is not only convex but also closed, the converse implication holds as
well, that is, ȳ ∈ int rge F is equivalent to (6). This is a consequence of the following
theorem, which furnishes a far-reaching generalization of the Banach open mapping
theorem.
By a translation, we can reduce to the case of (x̄, ȳ) = (0, 0). To conclude (9)
in this setting, where F(0) ∋ 0 and 0 ∈ int rge F, it will be enough to show, for an
arbitrary δ ∈ (0, 1), that 0 ∈ int F(δIB). Define the mapping F_δ : X →→ Y by F_δ(x) = F(x) when x ∈ δIB but F_δ(x) = ∅ otherwise. Then F_δ has closed convex graph given
by [δ IB×Y ]∩gph F. Also F(δ IB) = rge Fδ and dom Fδ ⊂ δ IB. We want to show that
0 ∈ int rge Fδ , but have Theorem 5B.1 at our disposal, according to which we only
need to show that rge Fδ is absorbing. For that purpose we use an argument which
closely parallels one already presented in the proof of Theorem 5B.1. Consider any
y ∈ Y . Because 0 ∈ int rge F, there exists t0 such that ty ∈ rge F when t ∈ [0,t0 ]. Then
there exists x0 such that t0 y ∈ F(x0 ). Let x = x0 /t0 , so that (t0 x,t0 y) ∈ gph F. Since
gph F is convex and contains (0, 0), it also then contains (tx,ty) for all t ∈ [0,t0 ].
Taking ε > 0 for which ε‖x‖ ≤ δ, we get for all t ∈ [0, ε] that (tx, ty) ∈ gph F_δ,
hence ty ∈ rge Fδ , as desired.
Utilizing (9), we can put the argument for the equivalences in Theorem 5B.4 to-
gether. That (b) implies (a) is obvious. We work next on getting from (a) to (c). When
(a) holds, we have from (9) that (6) holds for some c, in which case Lemma 5B.3
provides (7). By restricting x and y to small neighborhoods of x̄ and ȳ in (7), we
deduce the metric regularity of F at x̄ for ȳ with any constant κ > 1/c. Thus, (c)
holds. Finally, out of (c) and the equivalences in Theorem 5A.3 we may conclude
that F is linearly open at x̄ for ȳ, and this gets us back to (b).
The preceding argument passed through linear openness as a fourth property
which could be added to the equivalences in Theorem 5B.4, but which was left out
of the theorem’s statement for historical reasons. We now record this fact separately.
Theorem 5B.5 (linear openness from openness and convexity). For a mapping F :
X→ → Y with closed convex graph, openness at x̄ for ȳ always entails linear openness
at x̄ for ȳ.
Another fact, going beyond the original versions of Theorem 5B.4, has come up
as well.
Theorem 5B.6 (core criterion for regularity). Condition (a) of Theorem 5B.4 can
be replaced by the criterion that ȳ ∈ core rge F .
Proof. This calls up the core property in Theorem 5B.1.
We can finish tying up loose ends now by returning to the Banach open mapping
theorem at the beginning of this chapter and tracing how it fits with the Robinson–
Ursescu theorem.
Derivation of Theorem 5A.1 from Theorem 5B.4. It was already noted in the
sequel to 5A.1 that condition (d) in that result was equivalent to the metric regularity
of the linear mapping A, stated as condition (e). It remains only to observe that when
Theorem 5B.4 is applied to F = A ∈ L (X,Y ) with x̄ = 0 and ȳ = 0, the graph of
A being a closed subspace of X × Y (in particular a convex set), and the positive
homogeneity of A is brought in, we not only get (b) and (c) of Theorem 5A.1, but
also (a).
The argument for Theorem 5B.4, in obtaining metric regularity, also revealed a
relationship between that property and the openness condition in 5B.2 which can be
stated in the form
(10)  sup { c ∈ (0, ∞) | (6) holds } ≤ [reg(F; x̄| ȳ)]⁻¹.
convex set which, although not necessarily closed in X , is sure to have core D =
int D. Moreover, on that interior g is locally Lipschitz continuous.
Guide. Look at the mapping F : X →→ IR defined by F(x) = { y ∈ IR | y ≥ g(x) }.
Apply results in this section and also 5A.3.
Moreover, reg(H; 0|0) < ∞ if and only if H is surjective, in which case H⁻¹ is Lipschitz continuous on Y (in the sense of Pompeiu–Hausdorff distance as defined in 3A) and the infimum of the Lipschitz constants κ for this equals ‖H⁻¹‖⁻.
Proof. Let κ > reg (H; 0|0). Then, from 5A.3, H is linearly open at 0 for 0 with
constant κ , which reduces to H(κ intIB) ⊃ intIB. On the other hand, just from know-
ing that H(κ intIB) ⊃ intIB, we obtain for arbitrary (x, y) ∈ gph H and r > 0 through
This establishes that reg(H; x|y) ≤ reg(H; 0|0) for all (x, y) ∈ gph H. Appealing again to positive homogeneity, we get

(3)  reg(H; 0|0) = inf { κ > 0 | H(κ intIB) ⊃ intIB }.

The right side of (3) does not change if we replace the open balls with their closures, hence, by 5A.7 or just by the definition of the inner norm, it equals ‖H⁻¹‖⁻. This confirms (2).
The finiteness of the right side of (3) corresponds to H being surjective, by virtue
of positive homogeneity. We are left now with showing that H⁻¹ is Lipschitz continuous on Y with ‖H⁻¹‖⁻ as the infimum of the available constants κ.
If H⁻¹ is Lipschitz continuous on Y with constant κ, it must in particular have the Aubin property at 0 for 0 with this constant, and then κ ≥ reg(H; 0|0) by 5A.3. We already know that this regularity modulus equals ‖H⁻¹‖⁻, so we are left with proving that, for every κ > reg(H; 0|0), H⁻¹ is Lipschitz continuous on Y with constant κ.
Let c < [‖H⁻¹‖⁻]⁻¹ and κ > 1/c. Taking (2) into account, we apply the inequality 5B(7) derived in 5B.3 with x = x̄ = 0 and ȳ = 0, obtaining the existence of a > 0 such that

d(0, H⁻¹(y)) ≤ κ d(y, H(0)) ≤ κ‖y‖ for all y ∈ aIB.
(Here, without loss of generality, we replace the open ball for y by its closure.) For any y ∈ Y, we have ay/‖y‖ ∈ aIB, and from the positive homogeneity of H we get
If ‖H⁻¹‖⁻ = 0 then 0 ∈ H⁻¹(y) for all y ∈ Y (see 4A.9), hence (4) follows automatically.
Let y, y′ ∈ Y and x ∈ H⁻¹(y′). Through the surjectivity of H again, we can find for any δ > 0 an x_δ ∈ H⁻¹(y − y′) such that ‖x_δ‖ ≤ d(0, H⁻¹(y − y′)) + δ, and then from (4) we get
Hence x + x_δ ∈ H⁻¹(y), that is, x ∈ H⁻¹(y) + ‖x_δ‖IB. Recalling (5), we arrive finally at the existence of x′ ∈ H⁻¹(y) such that ‖x − x′‖ ≤ κ‖y − y′‖ + δ. Since δ can be arbitrarily small, this yields Lipschitz continuity of H⁻¹, and we are done.
Then S is a sublinear mapping with closed graph, and the following properties are
equivalent:
(a) S(y) ≠ ∅ for all y ∈ Y;
(b) there exists κ such that d(x, S(y)) ≤ κ d(Ax − y, K) for all x ∈ X, y ∈ Y;
(c) there exists κ such that h(S(y), S(y′)) ≤ κ‖y − y′‖ for all y, y′ ∈ Y,
in which case the infimum of the constants κ that work in (b) coincides with the infimum of the constants κ that work in (c) and equals ‖S‖⁻.
Detail. Here S = H −1 for H(x) = Ax − K, and the assertions of Theorem 5C.1 then
translate into this form.
Additional insights into the structure of sublinear mappings will emerge from
applying a notion which comes out of the following fact.
Proof. Let G = gph H, this being a closed, convex cone in X × Y , therefore having
rc G = G. For any (x, y) ∈ G, the recession cone rc H(x) consists of the vectors w
such that y + tw ∈ H(x) for all t ≥ 0, which are the same as the vectors w such that
(x, y) + t(0, w) ∈ G for all t ≥ 0, i.e., the vectors w such that (0, w) ∈ rc G = G. But
these are the vectors w ∈ H(0). That proves (8).
The first inclusion in (9) just reflects the rule that H(x + [−x]) ⊃ H(x) + H(−x)
by sublinearity. To obtain the second inclusion, let y1 and y2 belong to H(x), which
is the same as having y1 − y2 ∈ H(x) − H(x), and let y ∈ H(−x). Then by the first
inclusion we have both y1 + y and y2 + y in H(0), hence their difference lies in
H(0) − H(0).
Note that, without the closedness of the graph of H in Theorem 5C.7, there would
be no assurance that (b) implies (a). We would still have a linear mapping, but it
might not be continuous.
‖(H + B)⁻¹‖⁻ ≤ ‖H⁻¹‖⁻ / (1 − ‖H⁻¹‖⁻·‖B‖).
The proof of this is postponed until 5E, where it will be deduced from the connec-
tion between these properties and metric regularity in 5C.1. Perturbations of metric
regularity will be a major theme, starting in Section 5D.
Duality. A special feature of sublinear mappings, which parallels linear mappings,
is the availability of “adjoints” in the framework of the duals X ∗ and Y ∗ of the
Banach spaces X and Y . For a sublinear mapping H : X → → Y , the upper adjoint
H ∗+ : Y ∗ →
→ X ∗ is defined by
When H reduces to a linear mapping A ∈ L (X,Y ), both adjoints come out as the
usual adjoint A∗ ∈ L (Y ∗ , X ∗ ). In that setting the graphs are subspaces instead of
just cones, and the difference between (10) and (11) has no effect. The fact that ‖A∗‖ = ‖A‖ in this case has the following generalization.
Theorem 5C.10 (duality of inner and outer norms). For any sublinear mapping
H:X →→ Y with closed graph, one has
(13)  ‖H‖⁺ = ‖H∗⁻‖⁻ = ‖H∗⁺‖⁻,  ‖H‖⁻ = ‖H∗⁻‖⁺ = ‖H∗⁺‖⁺.
The proof requires some additional background. First, we need to update to Ba-
nach spaces the semicontinuity properties introduced in a finite-dimensional frame-
work in Section 3B, but this only involves an extension of notation. A mapping
F :X →→ Y is inner semicontinuous at x̄ ∈ dom F if for every y ∈ F(x̄) and every
neighborhood V of y one can find a neighborhood U of x̄ with U ⊂ F −1 (V ) or,
equivalently, F(x) ∩V = 0/ for all x ∈ U (this corresponds to 3B.2). Outer semiconti-
nuity has a parallel extension. Next, we record a standard fact in functional analysis
which will be called upon.
Let f : M → IR be a linear functional such that f (x) ≤ p(x) for all x ∈ M. Then
there exists a linear functional l : X → IR such that l(x) = f (x) for all x ∈ M, and
l(x) ≤ p(x) for all x ∈ X .
Essentially, this says geometrically that a closed convex set is the intersection of
all the “closed half-spaces” that include it.
Proof of Theorem 5C.10. First, observe from (10) and (11) that

‖H∗⁻‖⁻ = ‖H∗⁺‖⁻ and ‖H∗⁻‖⁺ = ‖H∗⁺‖⁺.

To prove that ‖H‖⁺ = ‖H∗⁻‖⁻ we fix any y∗ ∈ Y∗ and show that
If inf_{x∗ ∈ H∗⁻(y∗)} ‖x∗‖ < r for some r > 0, then there exists x∗ ∈ H∗⁻(y∗) such that ‖x∗‖ < r. For any x̃ ∈ IB and ỹ ∈ H(x̃) we have
To prove the inequality opposite to (16) and hence the equality (15), assume that sup_{x ∈ IB} sup_{y ∈ H(x)} ⟨y∗, y⟩ < r for some r > 0 and pick 0 < d < r such that
First, observe that gph G is convex. Indeed, if (x₁, z₁), (x₂, z₂) ∈ gph G and 0 < λ < 1, then there exist yᵢ ∈ Y and wᵢ ∈ IB with zᵢ = ⟨y∗, yᵢ⟩ and yᵢ ∈ H(xᵢ + wᵢ), for i = 1, 2. Since H is sublinear, we get λy₁ + (1 − λ)y₂ ∈ H(λ(x₁ + w₁) + (1 − λ)(x₂ + w₂)). Hence, λy₁ + (1 − λ)y₂ ∈ H(λx₁ + (1 − λ)x₂ + IB), and thus,
We will show next that G is inner semicontinuous at 0. Take z̃ ∈ G(0) and ε > 0. Let z̃ = ⟨y∗, ỹ⟩ for ỹ ∈ H(w̃) and w̃ ∈ IB. Since ⟨y∗, ·⟩ is continuous, there is some γ > 0 such that |⟨y∗, y⟩ − z̃| ≤ ε when ‖y − ỹ‖ ≤ γ. Choose δ ∈ (0, 1) such that δ‖ỹ‖ ≤ γ. Then

(1 − δ)ỹ ∈ H((1 − δ)w̃) = H(x + ((1 − δ)w̃ − x)) ⊂ H(x + IB) whenever ‖x‖ ≤ δ.
Moreover, ‖(1 − δ)ỹ − ỹ‖ = δ‖ỹ‖ ≤ γ, and then |⟨y∗, (1 − δ)ỹ⟩ − z̃| ≤ ε. Therefore, for all x ∈ δIB, we have ⟨y∗, (1 − δ)ỹ⟩ ∈ G(x) ∩ IB_ε(z̃), and hence G is inner semicontinuous at 0 as desired.
Let us now define a mapping K : X → → IR whose graph is the conical hull of
gph (d − G) where d is as in (17); that is, its graph is the set of points λ h for h ∈
gph (d − G) and λ ≥ 0. The conical hull of a convex set is again convex, so K is
another sublinear mapping. Since G is inner semicontinuous at 0, there is some
neighborhood U of 0 with U ⊂ dom G, and therefore dom K = X. Consider the
functional
k : x → inf { z | z ∈ K(x) } for x ∈ X.
Because K is sublinear and d − H(0) ⊂ IR+ , we have
This inclusion implies in particular that any point in −K(−x) furnishes a lower
bound in IR for the set of values K(x), for any x ∈ X. Indeed, let x ∈ X and y ∈
−K(−x). Then (18) yields K(x) − y ⊂ IR+ , and consequently y ≤ z for all z ∈ K(x).
Therefore k(x) is finite for all x ∈ X; we have dom k = X . Also, from the sublinearity
of K and the properties of the infimum, we have
Since d − G(x) ⊂ k(x) + IR₊ and k(x) ≥ l(x), we have d − G(x) ⊂ l(x) + IR₊ and (l(x) + IR₊) ∩ V ≠ ∅ for all x ∈ (δ/λ)IB, so that (l(λx) + IR₊) ∩ λV ≠ ∅ for all x ∈ (δ/λ)IB. This yields
which means that for all x ∈ δ IB there exists some z ≥ l(x) with |z| ≤ ε . The linearity
of l makes l(x) = −l(−x), and therefore |l(x)| ≤ ε for all x ∈ δ IB. This confirms the
continuity of l.
The inclusion d − G(x) − l(x) ⊂ IR₊ is by definition equivalent to having d − ⟨y∗, y⟩ − l(x) ≥ 0 whenever x ∈ H⁻¹(y) − IB. Let x∗ ∈ X∗ be such that ⟨x∗, x⟩ = −l(x) for all x ∈ X. Then
Pick any y ∈ H(x) and λ > 0. Then λy ∈ H(λx) and ⟨y∗, λy⟩ − ⟨x∗, λx⟩ ≤ d, or equivalently,

⟨y∗, y⟩ − ⟨x∗, x⟩ ≤ d/λ.

Passing to the limit with λ → ∞, we obtain x∗ ∈ H∗⁻(y∗). Let now x ∈ IB. Since 0 ∈ H(0), we have 0 ∈ H(−x + IB) and hence ⟨y∗, 0⟩ − ⟨x∗, −x⟩ ≤ d. Therefore ‖x∗‖ ≤ d < r, so that inf_{x∗ ∈ H∗⁻(y∗)} ‖x∗‖ < r. This, combined with (16), gives us the equality
in (15) and hence the equalities in the first line of (13).
We will now confirm the equality in the second line of (13). Suppose ‖H‖⁻ < r for some r > 0. Then for any x̃ ∈ IB there is some ỹ ∈ H(x̃) such that ‖ỹ‖ < r. Given y∗ ∈ IB and x∗ ∈ H∗⁺(y∗), we have
This being valid for arbitrary x̃ ∈ IB, we conclude that ‖x∗‖ ≤ r, and therefore ‖H∗⁺‖⁺ ≤ r.
Suppose now that ‖H∗⁺‖⁺ < r and pick s > 0 with

sup_{x∗ ∈ H∗⁺(IB)} ‖x∗‖ = ‖H∗⁺‖⁺ ≤ s < r,
The condition on the left of (19) can be written as sup_{y ∈ IB} sup_{x ∈ H⁻¹(y)} ⟨x∗, x⟩ ≤ 1, which in turn is completely analogous to (17), with d = 1 and H replaced by H⁻¹ and with y and y∗ replaced by x and x∗, respectively. By repeating the argument in the first part of the proof after (17), we obtain y∗ ∈ (H⁻¹)∗⁻(x∗) = (H∗⁺)⁻¹(x∗) with ‖y∗‖ ≤ 1. But then x∗ ∈ H∗⁺(IB), and since H∗⁺(IB) ⊂ sIB we have (19).
Now we will show that (19) implies
Then

λ⟨x̃∗, u⟩ > 1 > sup_{x ∈ cl H⁻¹(IB)} λ⟨x̃∗, x⟩.
Exercise 5C.13 (more norm duality). For a sublinear mapping H : X →→ Y with closed graph show that

‖(H∗⁺)⁻¹‖⁺ = ‖H⁻¹‖⁻.
since for an invertible mapping A ∈ L(X, Y) one has reg A = ‖A⁻¹‖. This alternative formulation opens the way to extending the estimate to nonlinear and even set-valued mappings.
First, we recall a basic definition of differentiability in infinite dimensions, which
is just an update of the definition employed in the preceding chapters in finite dimen-
sions. With differentiability as well as Lipschitz continuity and calmness, the only
difference is that the Euclidean norm is now replaced by the norms of the Banach
spaces X and Y that we work with.
If actually
lip ( f − D f (x̄); x̄) = 0,
then f is said to be strictly differentiable at x̄.
Partial Fréchet differentiability and partial strict differentiability can be intro-
duced as well on the basis of the partial Lipschitz moduli, by updating the defini-
tions in Section 1D to infinite dimensions. Building on the formulas for the calmness
and Lipschitz moduli, we could alternatively express these definitions in an epsilon-
delta mode as at the beginning of Chapter 1. If a function f is Fréchet differentiable
at every point x of an open set O and the mapping x → D f (x) is continuous from
O to the Banach space L (X ,Y ), then f is said to be continuously Fréchet differ-
entiable on O. Most of the assertions in Section 1D about functions acting in finite
dimensions remain valid in Banach spaces, e.g., continuous Fréchet differentiability
around a point implies strict differentiability at this point.
The extension of the Banach open mapping theorem to nonlinear and set-
valued mappings goes back to the works of Lyusternik and Graves. In 1934,
L. A. Lyusternik published a result, saying that when a function f : X → Y is con-
tinuously Fréchet differentiable in a neighborhood of a point x̄ where f (x̄) = 0 and
its derivative mapping D f (x̄) is surjective, then the tangent manifold to f −1 (0) at x̄
is the set x̄ + ker D f(x̄). In the current setting we adopt the following statement¹ of the Lyusternik theorem:
Then, in terms of ȳ := f(x̄) and c = κ⁻¹ − μ, if y is such that ‖y − ȳ‖ ≤ cε, then the equation y = f(x) has a solution x ∈ IBε(x̄).
Proof. Without loss of generality, let x̄ = 0 and ȳ = f(x̄) = 0. Note that κ > 0, hence 0 < c < ∞. Take y ∈ Y with ‖y‖ ≤ cε. Starting from x₀ = 0 we use induction to
construct an infinite sequence {xk }, the elements of which satisfy for all k = 1, 2, . . .
the following three conditions:
and
By (d) in the Banach open mapping theorem 5A.1 there exists x1 ∈ X such that
That is, x₁ satisfies all three conditions (2a), (2b) and (2c). In particular, by the choice of y and the constant c we have ‖x₁‖ ≤ ε.
Suppose now that for some j ≥ 1 we have obtained points xₖ satisfying (2a), (2b) and (2c) for k = 1, . . . , j. Then, since ‖y‖/c ≤ ε, we have from (2c) that all the points xₖ satisfy ‖xₖ‖ ≤ ε. Again using (d) in 5A.1, we can find xⱼ₊₁ such that
If we plug y = Axⱼ − Axⱼ₋₁ + f(xⱼ₋₁) into the second relation in (3) and use (1) for x = xⱼ and x′ = xⱼ₋₁ which, as we already know, are from εIB, we obtain

‖xⱼ₊₁ − xⱼ‖ ≤ κ(κμ)ʲ‖y‖.
Furthermore,

‖xⱼ₊₁‖ ≤ ‖x₁‖ + ∑ᵢ₌₁ʲ ‖xᵢ₊₁ − xᵢ‖ ≤ ∑ᵢ₌₀ʲ (κμ)ⁱ κ‖y‖ ≤ κ‖y‖/(1 − κμ) = ‖y‖/c.
Thus, {xₖ} is a Cauchy sequence, hence convergent to some x, and then, passing to the limit with k → ∞ in (2a) and (2c), this x satisfies y = f(x) and ‖x‖ ≤ ‖y‖/c. The final inequality gives us ‖x‖ ≤ ε and the proof is finished.
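The iteration in this proof, which at each step solves the linearized equation A(xₖ₊₁ − xₖ) = y − f(xₖ), can be sketched numerically. The function f below is our own illustrative example (not from the text), with x̄ = 0 and A = D f(0) = I:

```python
import numpy as np

# Graves-type iteration with the derivative frozen at x_bar = 0:
# solve A(x_{k+1} - x_k) = y - f(x_k) repeatedly, here with A = Df(0) = I.
def f(x):
    return np.array([x[0] + 0.1 * np.sin(x[1]), x[1] + 0.1 * x[0] ** 2])

A = np.eye(2)                  # Df(0) for this particular f
y = np.array([0.05, -0.05])    # target value close to f(0) = 0

x = np.zeros(2)
for _ in range(100):
    x = x + np.linalg.solve(A, y - f(x))   # A(x_{k+1} - x_k) = y - f(x_k)

residual = np.linalg.norm(f(x) - y)
```

Since f − A has a small Lipschitz constant near 0, the steps contract and the limit solves f(x) = y, just as the induction above produces a solution with ‖x‖ ≤ ‖y‖/c.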
Observe that in the Graves theorem no differentiability of the function f is re-
quired, but only “approximate differentiability” as in the theorem of Hildebrand and
Graves; see the commentary to Chapter 1. If we suppose that for every μ > 0 there exists ε > 0 such that (1) holds for every x, x′ ∈ IBε(x̄), then A is, by definition, the
strict derivative of f at x̄, A = D f (x̄). That is, the Graves theorem encompasses the
following special case: if f is strictly differentiable at x̄ and its derivative D f (x̄) is
onto, then there exist ε > 0 and c > 0 such that for every y ∈ Y with ‖y − ȳ‖ ≤ cε there is an x ∈ X such that ‖x − x̄‖ ≤ ε and y = f(x).
The statement of the Graves theorem above does not reflect all the information that can be extracted from its proof. In particular, the proof produces a solution x of f(x) = y which not only is in the ball IBε(x̄) but also satisfies ‖x − x̄‖ ≤ ‖y − ȳ‖/c. Taking into account that x ∈ f⁻¹(y), which yields d(x̄, f⁻¹(y)) ≤ ‖x − x̄‖, along with the form of the constant c, we get

d(x̄, f⁻¹(y)) ≤ (κ/(1 − κμ))‖y − f(x̄)‖.
Furthermore, this inequality actually holds not only at x̄ but also for all x close to x̄,
and this important extension is hidden in the proof of the theorem.
Indeed, let (1) hold for x, x′ ∈ IBε(x̄) and choose a positive τ < ε. Then there is a neighborhood U of x̄ such that IBτ(x) ⊂ IBε(x̄) for all x ∈ U. Make U smaller if necessary so that ‖f(x) − f(x̄)‖ < cτ for x ∈ U. Pick x ∈ U and a neighborhood V of ȳ such that ‖y − f(x)‖ ≤ cτ for y ∈ V. Then, remembering that in the proof x̄ = 0, modify the first induction step in the following way: there exists x₁ ∈ X such that
and then

(4)  ‖xₖ − x‖ ≤ ∑ᵢ₌₁ᵏ ‖xᵢ − xᵢ₋₁‖ ≤ κ‖y − f(x)‖ ∑ᵢ₌₁ᵏ (κμ)ⁱ⁻¹ ≤ (κ/(1 − κμ))‖y − f(x)‖.
Thus,

‖xₖ − x‖ ≤ ‖y − f(x)‖/c ≤ τ.
The sequence {xₖ} is a Cauchy sequence, and therefore convergent to some x̃. In passing to the limit in (4) we get

‖x̃ − x‖ ≤ (κ/(1 − κμ))‖y − f(x)‖.
Since x̃ ∈ f −1 (y), we see that, under the conditions of the Graves theorem, there
exist neighborhoods U of x̄ and V of f (x̄) such that
(5)  d(x, f⁻¹(y)) ≤ (κ/(1 − κμ))‖y − f(x)‖ for (x, y) ∈ U × V.
The property described in (5) is something we know from Chapter 3: this is metric regularity of the function f at x̄ for ȳ. Noting that the μ in (1) satisfies μ ≥ lip( f − A; x̄), we arrive at the following result:
For f = A + B we obtain from this result the estimation for perturbed inversion
of linear mappings in 5A.4.
The version of the Lyusternik theorem2 stated as Theorem 5D.1 can be derived
from the updated Graves theorem 5D.3. Indeed, the assumptions of 5D.1 are clearly
stronger than those of Theorem 5D.3. From (5) with y = f (x̄) we get
(7)  d(x, f⁻¹(ȳ)) ≤ (κ/(1 − κμ))‖f(x) − f(x̄)‖
for all x sufficiently close to x̄. Let ε > 0 and choose δ > 0 such that

(8)  ‖f(x) − f(x̄) − D f(x̄)(x − x̄)‖ ≤ ((1 − κμ)ε/κ)‖x − x̄‖ whenever x ∈ IBδ(x̄).
But then for any x ∈ (x̄ + ker D f(x̄)) ∩ IBδ(x̄), from (7) and (8) we obtain

d(x, f⁻¹(ȳ)) ≤ (κ/(1 − κμ))‖f(x) − f(x̄)‖ ≤ ε‖x − x̄‖,
2 The iteration (3), which is a key step in the proof of Graves, is also present in the original proof
of Lyusternik [1934], see also Lyusternik and Sobolev [1965]. In the case when A is invertible, it
goes back to Goursat [1903], see the commentary to Chapter 1.
Exercise 5D.4 (correction function version of Graves theorem). Show that under the conditions of Theorem 5D.3, for ȳ = f(x̄) there exist neighborhoods U of x̄ and V of ȳ such that for every y ∈ V and x ∈ U there exists ξ with the property

f(ξ + x) = y and ‖ξ‖ ≤ (κ/(1 − κμ))‖f(x) − y‖.
Guide. From Theorem 5D.3 we see that there exist neighborhoods U of x̄ and V of ȳ such that for every x ∈ U and y ∈ V

d(x, f⁻¹(y)) ≤ (κ/(1 − κμ))‖y − f(x)‖.

Without loss of generality, let y ≠ f(x); then we can slightly increase μ so that the latter inequality becomes strict. Then there exists η ∈ f⁻¹(y) such that ‖x − η‖ ≤ (κ/(1 − κμ))‖y − f(x)‖. Next take ξ = η − x.
If the function f in 5D.3 is strictly differentiable at x̄, we can choose A = D f (x̄),
and then μ = 0. In this case (6) reduces to
In the following section we will show that this inequality actually holds as equality:
In that case, one obtains the existence of a single-valued graphical localization of the
inverse f −1 around f (x̄) for x̄. If the derivative mapping D f (x̄) is merely surjective,
as assumed in the Graves theorem, the inverse f −1 may not have a single-valued
graphical localization at ȳ for x̄ but, still, this inverse, being a set-valued mapping,
has the Aubin property at ȳ for x̄.
Our second observation is that in the proof of Theorem 5D.2 we use the linearity
of the mapping A only to apply the Banach open mapping theorem. But we can em-
ploy the regularity modulus for any, even set-valued, mapping. After this somewhat
historical section, we will explore this idea further in the section which follows.
For set-valued mappings F acting in such spaces, the definitions of metric regularity
and the Aubin property persist in the same manner. The equivalence of metric regu-
larity with the Aubin property of the inverse (Theorem 3E.4) with the same constant
remains valid as well.
It will be important for our efforts to take the metric space X to be complete and to suppose that Y is a linear space equipped with a shift-invariant metric σ. Shift invariance means that

σ(y + z, y′ + z) = σ(y, y′) for all y, y′, z ∈ Y.
Theorem 5E.1 (inverse mapping theorem for metric regularity in metric spaces). Let (X, ρ) be a complete metric space and let (Y, σ) be a linear space with shift-invariant metric σ. Consider a mapping F : X →→ Y and any (x̄, ȳ) ∈ gph F at which gph F is locally closed, and let κ and μ be nonnegative constants such that

κ ≥ reg (F; x̄| ȳ) and κμ < 1.

Then for any function g : X → Y with x̄ ∈ int dom g and lip (g; x̄) ≤ μ, one has
(1) reg (g + F; x̄| g(x̄) + ȳ) ≤ κ/(1 − κμ).
Before arguing this, we note that it immediately allows us to supply 5C.9 and
5D.5 with proofs.
Proof of 5C.9. We apply 5E.1 with X and Y Banach spaces, F = H, x̄ = 0 and ȳ = 0. According to 5C.1, reg(H; 0|0) = ‖H⁻¹‖⁻, so 5E.1 tells us that for any κ > ‖H⁻¹‖⁻, any B ∈ L(X, Y) with ‖B‖ < 1/κ, and any μ with ‖B‖ < μ < 1/κ, one has from (1) that ‖(H + B)⁻¹‖⁻ ≤ κ/(1 − κμ). It remains only to pass to the limit as κ → ‖H⁻¹‖⁻ and μ → ‖B‖.
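In finite dimensions this resolvent estimate can be tested directly. The following sketch is our own illustration with hypothetical 2×2 data, using the operator 1-norm (max column sum), for which the same inequality holds.

```python
# Check ||(H + B)^{-1}|| <= kappa/(1 - kappa*mu) for an invertible H and a
# small perturbation B, with kappa > ||H^{-1}|| and ||B|| < mu < 1/kappa.
# Matrices are ((a, b), (c, d)); the norm is the operator 1-norm.

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def norm1(M):  # operator norm induced by the l1 vector norm: max column sum
    (a, b), (c, d) = M
    return max(abs(a) + abs(c), abs(b) + abs(d))

def add(M, N):
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(M, N))

H = ((2.0, 0.3), (0.1, 1.5))     # our invertible "model" operator
B = ((0.04, -0.02), (0.01, 0.03))  # a small linear perturbation

kappa = norm1(inv2(H)) * 1.01    # any kappa > ||H^{-1}||
mu = norm1(B) * 1.01             # ||B|| < mu
assert mu < 1.0 / kappa          # smallness of the perturbation

lhs = norm1(inv2(add(H, B)))
assert lhs <= kappa / (1.0 - kappa * mu) + 1e-12
```

This is the matrix analogue of the Neumann-series (Banach lemma) argument behind 5C.9.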
Proof of 5D.5. To obtain the inequality opposite to 5D(9), choose F = f and g =
D f (x̄) − f and apply 5E.1, in this case with μ = 0.
We now present two separate proofs of Theorem 5E.1, which echo on a more abstract level the way we proved the classical inverse function theorem 1A.1 in Chapter 1. The first proof uses an iteration in line with the original argument in the proof of the Graves theorem 5D.2, while the second is based on a contraction mapping principle for set-valued mappings.
Proof I of Theorem 5E.1. Let κ and μ be as in the statement of the theorem and
choose a function g : X → Y with lip (g; x̄) ≤ μ . Without loss of generality, suppose
g(x̄) = 0. Let λ > κ and ν > μ satisfy λ ν < 1. Let α > 0 be small enough that the
set gph F ∩ (IBα (x̄) × IBα (ȳ)) is closed, g is Lipschitz continuous with constant ν on
IBα (x̄), and
(2) d(x, F⁻¹(y)) ≤ λ d(y, F(x)) for all (x, y) ∈ IBα(x̄) × IBα(ȳ).
It is not difficult to see that there are positive a, b and ε that satisfy this system.
Indeed, first fix ε such that these inequalities hold strictly for a = b = 0; then pick
sufficiently small a and b so that both the second and the third inequality are not
violated.
Let x ∈ IBa(x̄) and y ∈ IBb(ȳ). We will show that

(5) d(x, (g + F)⁻¹(y)) ≤ λ/(1 − λν) d(y, (g + F)(x)).
Since x and y are arbitrarily chosen in the corresponding balls around x̄ and ȳ, and
λ and ν are arbitrarily close to κ and μ , respectively, this gives us (1).
According to the choice of a and of b in (4), we have
Hence, by (4),
We already found z1 which gives us (10) for k = 0. Suppose that for some n ≥ 1 we have generated z1, z2, . . . , zn satisfying (10). If zn = zn−1 then zn ∈ F⁻¹(y − g(zn)) and hence zn ∈ (g + F)⁻¹(y). Then, by using (2), (7) and (10), we get

d(x, (g + F)⁻¹(y)) ≤ ρ(zn, x) ≤ ∑_{i=0}^{n−1} ρ(zi+1, zi) ≤ ∑_{i=0}^{n−1} (λν + ε)^i ρ(z1, x)
≤ 1/(1 − (λν + ε)) ρ(z1, x) ≤ λ/(1 − (λν + ε)) (d(y, (g + F)(x)) + ε/λ).

Since the left side of this inequality does not depend on the ε on the right, we are able to obtain (5) by letting ε go to 0.
Assume zn ≠ zn−1. We will first show that zi ∈ IBα(x̄) for all i = 2, 3, . . . , n. Utilizing (10), for such an i we have

ρ(zi, x) ≤ ∑_{j=0}^{i−1} ρ(zj+1, zj) ≤ ∑_{j=0}^{i−1} (λν + ε)^j ρ(z1, x) ≤ 1/(1 − (λν + ε)) ρ(z1, x).
Since ρ(zn, zn−1) > 0, from (3) there exists zn+1 ∈ F⁻¹(y − g(zn)) such that

ρ(zn+1, zn) ≤ d(zn, F⁻¹(y − g(zn))) + ε ρ(zn, zn−1),
We conclude that the sequence {zk} satisfies the Cauchy condition, and all its elements are in IBα(x̄). Hence this sequence converges to some z ∈ IBα(x̄) which, from (10) and the local closedness of gph F, satisfies z ∈ F⁻¹(y − g(z)), that is, z ∈ (g + F)⁻¹(y). Moreover,
d(x, (g + F)⁻¹(y)) ≤ ρ(z, x) = lim_{k→∞} ρ(zk, x) ≤ lim_{k→∞} ∑_{i=0}^{k} ρ(zi+1, zi)
≤ lim_{k→∞} ∑_{i=0}^{k} (λν + ε)^i ρ(z1, x) ≤ 1/(1 − (λν + ε)) ρ(z1, x)
≤ 1/(1 − (λν + ε)) (λ d(y, (g + F)(x)) + ε),

the final inequality being obtained from (2) and (7). Taking the limit as ε → 0 we obtain (5), and the proof is finished.
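The iteration behind Proof I can be watched numerically in a one-dimensional sketch. The data below are our own example, not from the text, with F single-valued so that each step zk+1 ∈ F⁻¹(y − g(zk)) is forced.

```python
import math

# Our example: F(x) = {2x}, so F^{-1}(y) = {y/2} and F is metrically regular
# with modulus kappa = 1/2; the perturbation g(x) = 0.2*sin(x) has Lipschitz
# constant mu = 0.2, and kappa*mu = 0.1 < 1.
kappa, mu = 0.5, 0.2
g = lambda x: 0.2 * math.sin(x)

def solve(y, x0=0.0, iters=60):
    """z_{k+1} in F^{-1}(y - g(z_k)); here F^{-1} is single-valued."""
    z = x0
    for _ in range(iters):
        z = (y - g(z)) / 2.0
    return z

x, y = 0.4, 0.25
z = solve(y, x0=x)
assert abs(2.0 * z + g(z) - y) < 1e-12          # z solves (g + F)(z) = y
# the distance estimate (5) with lambda -> kappa, nu -> mu:
assert abs(x - z) <= kappa / (1.0 - kappa * mu) * abs(y - (2.0 * x + g(x)))
```

Since 2x + g(x) is strictly increasing, (g + F)⁻¹(y) is a singleton here, so d(x, (g + F)⁻¹(y)) is just |x − z|.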
The second proof of Theorem 5E.1 uses the following extension of the contraction mapping principle 1A.2 to set-valued mappings, furnished with a proof whose idea goes back to Banach [1922], if not earlier.
Theorem 5E.2 (contraction mapping principle for set-valued mappings). Let (X, ρ) be a complete metric space, and consider a set-valued mapping Φ : X →→ X and a point x̄ ∈ X. Suppose that there exist scalars a > 0 and λ ∈ (0, 1) such that the set gph Φ ∩ (IBa(x̄) × IBa(x̄)) is closed and

(a) d(x̄, Φ(x̄)) < a(1 − λ);
(b) e(Φ(u) ∩ IBa(x̄), Φ(v)) ≤ λ ρ(u, v) for all u, v ∈ IBa(x̄).

Then Φ has a fixed point in IBa(x̄); that is, there exists x ∈ IBa(x̄) such that x ∈ Φ(x).
Proof. By assumption (a) there exists x1 ∈ Φ (x̄) such that ρ (x1 , x̄) < a(1 − λ ).
Proceeding by induction, let x0 = x̄ and suppose that there exists xk+1 ∈ Φ (xk ) ∩
IBa (x̄) for k = 0, 1, . . . , j − 1 with
By assumption (b),
Thus, {xk } is a Cauchy sequence and consequently converges to some x ∈ IBa (x̄).
Since (xk−1 , xk ) ∈ gph Φ ∩ (IBa (x̄) × IBa (x̄)) which is a closed set, we conclude that
x ∈ Φ (x).
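An elementary numerical instance of Theorem 5E.2 (our own example, with hypothetical data): on X = ℝ let Φ(x) = {x/2 − 1, x/2 + 3}, whose two branches each move with Lipschitz constant λ = 1/2, so condition (b) holds; around x̄ = −1.8 condition (a) holds with a = 1, and the iteration "pick a point of Φ(xk) nearest to xk" reproduces the proof.

```python
# Phi(x) = {x/2 - 1, x/2 + 3}: a set-valued contraction with lambda = 1/2.
lam = 0.5
Phi = lambda x: (x / 2.0 - 1.0, x / 2.0 + 3.0)

xbar, a = -1.8, 1.0
# condition (a): d(xbar, Phi(xbar)) < a*(1 - lambda)
assert min(abs(v - xbar) for v in Phi(xbar)) < a * (1.0 - lam)

x = xbar
for _ in range(80):
    x = min(Phi(x), key=lambda v: abs(v - x))   # nearest point of Phi(x)
assert abs(x - xbar) <= a                        # the iterates stay in IB_a(xbar)
assert min(abs(v - x) for v in Phi(x)) < 1e-12   # x in Phi(x): a fixed point
```

The iterates converge to the fixed point −2 of the lower branch, within the ball IBa(x̄), exactly as the theorem guarantees.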
For completeness, we now supply a proof of the (standard) contraction mapping principle 1A.2.
(12) e(F⁻¹(y′) ∩ IBa(x̄), F⁻¹(y)) ≤ λ σ(y′, y) for all y′, y ∈ IBb+νa(ȳ),

and also

σ(g(x′), g(x)) ≤ ν ρ(x′, x) for all x′, x ∈ IBa(x̄).
Take any λ⁺ > λ and make b > 0 smaller if necessary so that

(14) λ⁺b/(1 − λν) < a/4.
For any y ∈ IBb (g(x̄) + ȳ) and x ∈ IBa (x̄), using the shift-invariance of the metric σ
and the triangle inequality, we obtain
(15) σ (−g(x) + y, ȳ) = σ (y, g(x) + ȳ) ≤ σ (y, g(x̄) + ȳ) + σ (g(x), g(x̄)) ≤ b + ν a.
Let y, y′ ∈ IBb(g(x̄) + ȳ), y ≠ y′, and let x′ ∈ (g + F)⁻¹(y′) ∩ IBa/2(x̄). We will establish now that there is a fixed point x ∈ Φy(x) in the closed ball centered at x′ with radius

ε := λ⁺ σ(y, y′)/(1 − λν).
Since x′ ∈ F⁻¹(−g(x′) + y′) ∩ IBa(x̄) and both (x′, y′) and (x′, y) satisfy (15), from (12) we get

By the triangle inequality and (14), ε ≤ λ⁺(2b)/(1 − λν) < a/2, so that IBε(x′) ⊂ IBa(x̄). Then we have that for any u, v ∈ IBε(x′),

e(Φy(u) ∩ IBε(x′), Φy(v)) ≤ e(F⁻¹(−g(u) + y) ∩ IBa(x̄), F⁻¹(−g(v) + y)) ≤ λ σ(g(u), g(v)) ≤ λν ρ(u, v).
By (13) the set gph Φy ∩ (IBε(x′) × IBε(x′)) is closed; hence we can apply the contraction mapping principle in Theorem 5E.2 to the mapping Φy, with constants a = ε and the λ there taken to be the λν here, to obtain the existence of a fixed point x ∈ Φy(x) within distance ε from x′. Since x ∈ (g + F)⁻¹(y), we obtain

d(x′, (g + F)⁻¹(y)) ≤ ε = λ⁺/(1 − λν) σ(y′, y).
This tells us that (g + F)⁻¹ has the Aubin property at g(x̄) + ȳ for x̄ with constant λ⁺/(1 − λν). Hence, by 5A.3, the mapping g + F is metrically regular at x̄ for g(x̄) + ȳ with the same constant. Since x and y are arbitrarily chosen in the corresponding balls around x̄ and ȳ, λ⁺ and λ are arbitrarily close to κ, and ν is arbitrarily close to μ, this comes down to (1).
Next, we put together an implicit function version of Theorem 5E.1.
Theorem 5E.3 (implicit mapping theorem for metric regularity in metric spaces). Let (X, ρ) be a complete metric space and let (Y, σ) be a linear metric space with shift-invariant metric. Let (P, π) be a metric space. For f : P × X → Y and F : X →→ Y, consider the generalized equation f(p, x) + F(x) ∋ 0 with solution mapping

S(p) = { x | f(p, x) + F(x) ∋ 0 } having x̄ ∈ S(p̄).
Proof. Choose λ > κ and ν > μ such that λν < 1. Also, let β > γ. Then there exist positive scalars α and τ such that the set gph (h + F) ∩ (IBα(x̄) × IBα(0)) is closed,

(17) e((h + F)⁻¹(y′) ∩ IBα(x̄), (h + F)⁻¹(y)) ≤ λ σ(y′, y) for all y′, y ∈ IBα(0),

(18) σ(r(p, x′), r(p, x)) ≤ ν ρ(x′, x) for all x′, x ∈ IBα(x̄) and p ∈ IBτ(p̄),

(19) σ(f(p′, x), f(p, x)) ≤ β π(p′, p) for all p′, p ∈ IBτ(p̄) and x ∈ IBα(x̄).
Let

(20) 2λβ/(1 − λν) ≥ λ⁺ > λβ/(1 − λν).

Now, choose positive a < α and then positive q ≤ τ such that

(21) νa + βq ≤ α and 4λβq/(1 − λν) + a ≤ α.
Then, from (18) and (19), for every x ∈ IBa(x̄) and p ∈ IBq(p̄) we have

(22) σ(r(p, x), 0) ≤ σ(r(p, x), r(p, x̄)) + σ(r(p, x̄), r(p̄, x̄)) ≤ ν ρ(x, x̄) + β π(p, p̄) ≤ νa + βq ≤ α.
Observe that for any x ∈ IBa(x̄) and p ∈ IBq(p̄), x ∈ Φp(x) ⇐⇒ x ∈ S(p), and also that the set gph Φp ∩ (IBα(x̄) × IBα(x̄)) is closed. Let p′, p ∈ IBq(p̄) with p′ ≠ p and let x′ ∈ S(p′) ∩ IBa(x̄). Let ε := λ⁺ π(p′, p); then ε ≤ λ⁺(2q). Thus, remembering that x′ ∈ Φp′(x′) ∩ IBα(x̄), from (17), where we use (22), and from (18), (19) and (20), we deduce that

d(x′, Φp(x′)) ≤ e((h + F)⁻¹(−r(p′, x′)) ∩ IBα(x̄), (h + F)⁻¹(−r(p, x′)))
≤ λ σ(f(p′, x′), f(p, x′)) ≤ λβ π(p′, p) < λ⁺ π(p′, p)(1 − λν) = ε(1 − λν).
Since, by (20) and (21),

ε ≤ 2λ⁺q ≤ 4λβq/(1 − λν),

we get IBε(x′) ⊂ IBα(x̄). Then, for any u, v ∈ IBε(x′), using again (17) (with (22)) and (18), we see that
e(Φp(u) ∩ IBε(x′), Φp(v)) ≤ e((h + F)⁻¹(−r(p, u)) ∩ IBα(x̄), (h + F)⁻¹(−r(p, v))) ≤ λ σ(r(p, u), r(p, v)) ≤ λν ρ(u, v).
Hence the contraction mapping principle in Theorem 5E.2 applies, with the λ there taken to be the λν here, and it follows that there exists x ∈ Φp(x) ∩ IBε(x′) and hence x ∈ S(p) ∩ IBε(x′). Thus, d(x′, S(p)) ≤ ε = λ⁺ π(p′, p). Since this inequality holds for any x′ ∈ S(p′) ∩ IBa(x̄) and any λ⁺ fulfilling (20), we arrive at

e(S(p′) ∩ IBa(x̄), S(p)) ≤ λ⁺ π(p′, p).
That is, S has the Aubin property at p̄ for x̄ with modulus not greater than λ⁺. Since λ⁺ can be arbitrarily close to λβ/(1 − λν), and λ, ν and β can be arbitrarily close to κ, μ and γ, respectively, we achieve the estimate (16).
We can also state Theorem 3F.9 in Banach spaces with only minor adjustments
in notation and terminology.
is metrically regular at x̄ for 0, then S has the Aubin property at p̄ for x̄ with
then the converse implication holds as well: the mapping h + F is metrically regular
at x̄ for 0 provided that S has the Aubin property at p̄ for x̄.
The following exercise, which we supply with a detailed guide, deals with a
more general kind of perturbation and shows that not only the constant but also the
neighborhoods of metric regularity of the perturbed mapping can be independent of
the perturbation provided that its Lipschitz constant is “sufficiently small.” For better
transparency, we consider mappings with closed graphs acting in Banach spaces.
Exercise 5E.5. Let X and Y be Banach spaces and consider a continuous function f : X → Y, a mapping F : X →→ Y with closed graph and a point x̄ ∈ X such that
there exist positive constants α and β such that for every mapping A : X × X → Y and every x̃ ∈ IBα(x̄) and ỹ ∈ IBβ(0) with the properties that

and

‖[A(x′, u) − f(x′)] − [A(x, u) − f(x)]‖ ≤ μ ‖x′ − x‖ for every x′, x, u ∈ IBα+5κβ(x̃),

we have that for any u ∈ IBα(x̃) the mapping A(·, u) + F(·) is metrically regular at x̃ for ỹ with constant κ and neighborhoods IBα(x̃) and IBβ(ỹ).
Guide. Let a and b be positive constants such that f + F is metrically regular with constant κ and neighborhoods IBa(x̄) and IBb(0). Choose κ satisfying (23) and then positive α and β such that

α ≤ 2κβ, 2α + 5κβ ≤ a and 6β + μα ≤ b.
‖−A(x, u) + f(x) + y′‖ ≤ ‖y′ − ỹ‖ + ‖ỹ‖ + ‖−A(x, u) + f(x) + A(x̃, u) − f(x̃)‖
+ ‖A(x̃, u) − A(x̃, x̃)‖ + ‖A(x̃, x̃) − f(x̃)‖ ≤ 4β + μα + 2β ≤ 6β + μα ≤ b.
The same estimate of course holds with y′ replaced by y; that is, both −A(x, u) + f(x) + y′ and −A(x, u) + f(x) + y are in IBb(0). Consider the mapping
Since IBα(x̃) ⊂ IBa(x̄), utilizing the metric regularity of f + F we obtain
where r := κ ‖y′ − y‖. Observe that r ≤ 5κβ and IBr(x) ⊂ IBα+5κβ(x̃) ⊂ IBa(x̄). Again, by utilizing the metric regularity of f + F we get that for every v, w ∈ IBr(x),
Noting that the set gph Φ ∩ (IBr(x) × IBr(x)) is closed, Theorem 5E.2 then yields the existence of a fixed point x̂ ∈ Φy(x̂) ∩ IBr(x); that is,
If A(x, u) + F(x) = ∅ there is nothing to prove. If not, choose ε > 0 and w ∈ A(x, u) + F(x) such that ‖w − y‖ ≤ d(y, A(x, u) + F(x)) + ε. If w ∈ IB4β(ỹ), then from (25) with y = w we get
which yields (25) since the left side does not depend on ε . Otherwise, we have
‖y − w‖ ≥ ‖w − ỹ‖ − ‖y − ỹ‖ ≥ 4β − β = 3β.
But then,
Then F is metrically regular at 0 for 0 while G has the Aubin property at 0 for 0.
Moreover, both F −1 and G are Lipschitz continuous (with respect to the Pompeiu-
Hausdorff distance) on the whole of IR. We have reg (F; 0|0) = 1/2 whereas the
Lipschitz modulus of the single-valued localization of G around 0 for 0 is 0 and
serves also as the infimum of all Aubin constants. The mapping
At the end of this section we will derive from Theorem 5E.2 the following fixed
point theorem due to Nadler [1969]:
Theorem 5E.8 (Nadler). Let (X , ρ ) be a complete metric space and suppose that Φ
maps X into the set of closed subsets of X and is Lipschitz continuous in the sense
of Pompeiu-Hausdorff distance on X with Lipschitz constant λ ∈ (0, 1). Then Φ has
a fixed point.
Proof. We will first show that Φ has closed graph. Indeed, let (xk, yk) ∈ gph Φ and (xk, yk) → (x, y). Then

d(y, Φ(x)) ≤ ρ(y, yk) + d(yk, Φ(x)) ≤ ρ(y, yk) + h(Φ(xk), Φ(x)) ≤ ρ(y, yk) + λ ρ(xk, x) → 0 as k → ∞.

Hence d(y, Φ(x)) = 0 and, since Φ(x) is closed, we have (x, y) ∈ gph Φ; therefore gph Φ is closed, as claimed.
Let x̄ ∈ X and choose a > d(x̄, Φ(x̄))/(1 − λ). Since gph Φ is closed, the set gph Φ ∩ (IBa(x̄) × IBa(x̄)) is closed as well. Furthermore, for every u, v ∈ IBa(x̄) we obtain

e(Φ(u) ∩ IBa(x̄), Φ(v)) ≤ e(Φ(u), Φ(v)) ≤ h(Φ(u), Φ(v)) ≤ λ ρ(u, v).

Hence Theorem 5E.2 applies, and Φ has a fixed point in IBa(x̄).
Theorem 5F.1 (inverse function theorem with strong metric regularity in metric spaces). Let (X, ρ) be a complete metric space and let (Y, σ) be a linear metric space with shift-invariant metric σ. Consider a mapping F : X →→ Y and any (x̄, ȳ) ∈ gph F such that, for a nonnegative constant κ and neighborhoods U of x̄ and V of ȳ, the mapping y → F⁻¹(y) ∩ U is a Lipschitz continuous function on V with Lipschitz constant κ.
Then for every nonnegative constant μ with κμ < 1 there exist neighborhoods U of x̄ and V of ȳ such that, for every function g : X → Y which is Lipschitz continuous on U with Lipschitz constant μ, the mapping y → (g + F)⁻¹(y) ∩ U is a Lipschitz continuous function on g(x̄) + V with Lipschitz constant κ/(1 − κμ).
Proof. We apply the standard (single-valued) version of the contraction mapping principle, 1A.2, as in Proof II of 5E.1 but with some adjustments in the argument. By assumption, for the function s(y) = F⁻¹(y) ∩ U for y ∈ V we have

(1) ρ(s(y′), s(y)) ≤ κ σ(y′, y) for all y′, y ∈ V.
Pick μ > 0 such that κ μ < 1 and then choose positive constants a and b such that
For any y ∈ IBb (g(x̄) + ȳ) and any x ∈ IBa (x̄) we have
σ (−g(x) + y, ȳ) = σ (y, g(x) + ȳ) ≤ σ (y, g(x̄) + ȳ) + σ (g(x), g(x̄)) ≤ b + μ a,
and hence, by (2), −g(x) + y ∈ V ⊂ dom s. Fix y ∈ IBb (g(x̄) + ȳ) and consider the
mapping
Φy : x → s(−g(x) + y) for x ∈ IBa (x̄).
Then, by using (1), (2) and (3) we get
ρ (Φy (u), Φy (v)) = ρ (s(y − g(u)), s(y − g(v))) ≤ κ σ (g(u), g(v)) ≤ κ μ ρ (u, v).
Thus, by the contraction mapping principle 1A.2, there exists a fixed point x = Φy(x) in IBa(x̄), and there is no more than one such fixed point in IBa(x̄). The mapping from y ∈ IBb(g(x̄) + ȳ) to the unique fixed point x(y) of Φy in IBa(x̄) is a function which satisfies x(y) = s(y − g(x(y))); therefore, for any y, y′ ∈ IBb(g(x̄) + ȳ) we have

Hence,

ρ(x(y), x(y′)) ≤ κ/(1 − κμ) σ(y, y′).
Choosing U = IBa (x̄) and V = IBb (ȳ), and noting that IBb (g(x̄) + ȳ) = g(x̄) + IBb (ȳ),
we end the proof.
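A scalar sketch of the mechanism in this proof (our own example, with hypothetical data): F(x) = {2x} has the Lipschitz single-valued inverse s(y) = y/2 with constant κ = 1/2, and the fixed-point iteration x ← s(y − g(x)) produces the localization x(y) of (g + F)⁻¹, which we then test for the claimed Lipschitz constant κ/(1 − κμ).

```python
import math

kappa, mu = 0.5, 0.2
s = lambda y: y / 2.0                   # the Lipschitz localization of F^{-1}
g = lambda x: 0.2 * math.cos(x)         # perturbation with lip(g) = mu

def x_of(y, iters=80):
    """Fixed point of x -> s(y - g(x)); contraction factor kappa*mu = 0.1."""
    x = 0.0
    for _ in range(iters):
        x = s(y - g(x))
    return x

L = kappa / (1.0 - kappa * mu)          # the claimed Lipschitz constant
for y1, y2 in [(0.0, 0.3), (-0.2, 0.5), (0.1, 0.11)]:
    assert abs(x_of(y1) - x_of(y2)) <= L * abs(y1 - y2) + 1e-12
```

Here x(y) solves 2x + g(x) = y, and its slope 1/(2 − 0.2 sin x) never exceeds κ/(1 − κμ) = 5/9, which is what the assertions check on sample pairs.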
Compared with 3G.3, the above theorem exposes the fact that not only the Lipschitz constant, but also the neighborhoods associated with the Lipschitz localization of (g + F)⁻¹, depend on the mapping F only, and not on the perturbation g, as long as its Lipschitz modulus is less than the reciprocal of the regularity modulus of F. We already stated such a result for metric regularity in 5E.5.
Prove that for each γ ≥ κ/(1 − κμ) there are neighborhoods U of x̄, V of ȳ and Q of p̄ such that, for every p ∈ Q, the mapping y → (r(p, ·) + F)⁻¹(y) ∩ U is a Lipschitz continuous function on r(p̄, x̄) + V with Lipschitz constant γ.
Guide. Choose neighborhoods U of x̄ and Q of p̄ such that for each p ∈ Q the function r(p, ·) is Lipschitz continuous on U with Lipschitz constant μ. Applying Theorem 5F.1, we obtain a constant γ and neighborhoods U of x̄ and V′ of ȳ such that for every p ∈ Q the mapping y → (r(p, ·) + F)⁻¹(y) ∩ U is a Lipschitz continuous function on r(p, x̄) + V′ with Lipschitz constant γ. Since V′ is independent of p ∈ Q, by making Q small enough we can find a neighborhood V of ȳ such that r(p̄, x̄) + V ⊂ r(p, x̄) + V′ for every p ∈ Q.
We present next a strong regularity extension of Theorem 5E.3 such as has already appeared in various forms in the preceding chapters. We proved a weaker version of this result in 2B.5 via Lemma 2B.6 and stated it again in Theorem 3G.4, which we left unproved. Here we treat a general case which can be deduced from Theorem 5E.3 by taking into account that the strong metric regularity of h + F automatically implies local closedness of its graph, and by adjoining to that the argument in the proof of 3G.2. Since the result is central in this book, at the risk of repeating ourselves we supply it with an unabbreviated proof.
Theorem 5F.4 (implicit function theorem with strong metric regularity in metric spaces). Let (X, ρ) be a complete metric space and let (Y, σ) be a linear metric space with shift-invariant metric. Let (P, π) be a metric space. For f : P × X → Y and F : X →→ Y, consider the generalized equation f(p, x) + F(x) ∋ 0 with solution mapping

S(p) = { x | f(p, x) + F(x) ∋ 0 } having x̄ ∈ S(p̄).

Let f(·, x̄) be continuous at p̄ and let h : X → Y be a strict estimator of f with respect to x uniformly in p at (p̄, x̄) with constant μ. Suppose that h + F is strongly metrically regular at x̄ for 0 or, equivalently, that the inverse (h + F)⁻¹ has a Lipschitz continuous single-valued localization ω around 0 for x̄ such that there exists κ ≥ reg (h + F; x̄|0) = lip (ω; 0) with κμ < 1.
Then the solution mapping S has a single-valued localization s around p̄ for x̄. Moreover, for every ε > 0 there exists a neighborhood Q of p̄ such that

(4) ρ(s(p′), s(p)) ≤ (κ + ε)/(1 − κμ) σ(f(p′, s(p)), f(p, s(p))) for all p′, p ∈ Q.
In particular, s is continuous at p̄. In addition, if
Proof. Let ε > 0 and choose λ > κ and ν > μ such that

(9) λν < 1 and λ/(1 − λν) ≤ (κ + ε)/(1 − κμ).
Then there exists a positive scalar α such that for each y ∈ IBα(0) the set (h + F)⁻¹(y) ∩ IBα(x̄) is a singleton, equal to the value ω(y) of the single-valued localization of (h + F)⁻¹, and this localization ω is Lipschitz continuous with Lipschitz constant λ on IBα(0). We adjust α and choose a positive τ to also have, for e(p, x) = f(p, x) − h(x),

(10) σ(e(p, x′), e(p, x)) ≤ ν ρ(x′, x) for all x′, x ∈ IBα(x̄) and p ∈ IBτ(p̄).
Next, choose a positive a such that

(11) νa + a(1 − λν)/λ ≤ α

and then a positive r ≤ τ such that

(12) σ(f(p, x̄), f(p̄, x̄)) ≤ a(1 − λν)/λ for all p ∈ IBr(p̄).
Then for every x ∈ IBa(x̄) and p ∈ IBr(p̄), from (10)–(12) we have

σ(e(p, x), 0) ≤ σ(e(p, x), e(p, x̄)) + σ(e(p, x̄), e(p̄, x̄)) ≤ ν ρ(x, x̄) + σ(f(p, x̄), f(p̄, x̄)) ≤ νa + a(1 − λν)/λ ≤ α.
Observe that for any x ∈ IBa (x̄) having x = Φ p (x) implies x ∈ S(p) ∩ IBa (x̄), and
conversely. Noting that x̄ = ω (0) and using (12) we obtain
ρ (x̄, Φ p (x̄)) = ρ (ω (0), ω (−e(p, x̄))) ≤ λ σ ( f ( p̄, x̄), f (p, x̄)) ≤ a(1 − λ ν ).
Hence the contraction mapping principle 1A.2 applies, with the λ there taken to be
the λ ν here, and it follows that for each p ∈ IBr ( p̄) there exists exactly one s(p) in
IBa (x̄) such that s(p) ∈ S(p); thus
Hence,

ρ(s(p′), s(p)) ≤ λ/(1 − λν) σ(f(p′, s(p)), f(p, s(p))).
Taking into account (9), we obtain (4). In particular, for p = p̄, from the continuity of f(·, x̄) at p̄ we get that s is continuous at p̄. Under (5), the estimate (6) follows directly from (4) by letting ε go to 0, and the same holds for (8) under (7). If h is a strict first-order approximation of f, then μ can be arbitrarily small, and by letting κ tend to lip (ω; 0) and μ to 0 we obtain from (6) and (8) the last two estimates in the statement.
Utilizing strict differentiability and ample parameterization we come to the fol-
lowing infinite-dimensional implicit function theorem which parallels 5E.4.
then the converse implication holds as well: the mapping h + F is strongly met-
rically regular at x̄ for 0 provided that S has a Lipschitz continuous single-valued
localization around p̄ for x̄.
Ds(ȳ) = [Df(x̄)]⁻¹.
In Section 1F we considered what may happen (in finite dimensions) when the
derivative mapping is merely surjective; by adjusting the proof of Theorem 1F.6
one obtains that when the Jacobian ∇ f (x̄) has full rank, the inverse f −1 has a local
selection which is strictly differentiable at f (x̄). The claim can be easily extended
to Hilbert (and even more general) spaces:
selection of A⁻¹; this selection, however, may not be linear. The original Bartle–Graves theorem is for nonlinear mappings and says the following:
In other words, the surjectivity of the strict derivative at x̄ implies that f⁻¹ has a local selection s which is continuous around f(x̄) and calm at f(x̄). It is known³ that, in contrast to the strictly differentiable local selection in 5G.2 for Hilbert spaces, the selection in the Bartle–Graves theorem, even for a bounded linear mapping f, may fail to be Lipschitz continuous around ȳ. For this case we have:
Theorem 5G.5 (Michael's selection theorem). Let X and Y be Banach spaces and consider a mapping F : Y →→ X which is closed-convex-valued and inner semicontinuous on dom F ≠ ∅. Then F has a continuous selection s : dom F → X.
We require a lemma which connects the Aubin property of a mapping with the
inner semicontinuity of a truncation of this mapping:
Lemma 5G.6 (inner semicontinuous selection from the Aubin property). Consider a mapping S : Y →→ X and any (ȳ, x̄) ∈ gph S, and suppose that S has the Aubin property at ȳ for x̄ with constant κ. Suppose, for some c > 0, that the sets S(y) ∩ IBc(x̄) are convex and closed for all y ∈ IBc(ȳ). Then for any α > κ there exists β > 0 such that the mapping

y → Mα(y) = S(y) ∩ IBα‖y−ȳ‖(x̄) for y ∈ IBβ(ȳ), and Mα(y) = ∅ otherwise,

is closed-convex-valued and inner semicontinuous on IBβ(ȳ).
Let

(2) εk = (α + κ)‖yk − y‖ / ((α − κ)‖yk − ȳ‖ + (α + κ)‖yk − y‖),

where in the last inequality we take into account the expression (2) for εk. Thus xk ∈ Mα(yk), and since xk → x, we are done.
Lemma 5G.6 allows us to apply Michael’s selection theorem to the mapping Mα ,
obtaining the following result:
Proof. Choose α such that α > reg (F; x̄| ȳ), and apply Michael's theorem 5G.5 to the mapping Mα in 5G.6 for S = F⁻¹. By the definition of Mα, the continuous local selection obtained in this way is calm with constant α.
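The truncation Mα can be made concrete in a simple linear example of our own (not from the text): for the surjective map F(x1, x2) = x1 + 2x2 from ℝ² onto ℝ one has reg(F; x̄|ȳ) = 1/√5, and the least-displacement selection of F⁻¹ through x̄ stays inside the ball of radius α‖y − ȳ‖ around x̄ for every α > 1/√5, i.e. it is a continuous and calm selection as Theorem 5G.7 predicts.

```python
import math

# F(x1, x2) = x1 + 2*x2; xbar in F^{-1}(ybar); reg(F) = 1/sqrt(5).
xbar, ybar = (1.0, -0.5), 0.0
assert abs(xbar[0] + 2.0 * xbar[1] - ybar) < 1e-12   # xbar in F^{-1}(ybar)

def s(y):
    """Least-displacement selection: project xbar onto the line F(x) = y."""
    t = (y - ybar) / 5.0
    return (xbar[0] + t, xbar[1] + 2.0 * t)

alpha = 1.01 / math.sqrt(5.0)                        # any alpha > reg(F)
for y in (-1.0, -0.3, 0.0, 0.2, 0.9):
    x = s(y)
    assert abs(x[0] + 2.0 * x[1] - y) < 1e-12        # s(y) in F^{-1}(y)
    dist = math.hypot(x[0] - xbar[0], x[1] - xbar[1])
    assert dist <= alpha * abs(y - ybar) + 1e-12     # s(y) in M_alpha(y): calmness
```

Here ‖s(y) − x̄‖ = |y − ȳ|/√5 exactly, so the calmness constant α cannot be pushed below reg(F; x̄|ȳ), matching the remark after 5G.7.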
Note that the continuous local selection s in 5G.7 depends on α and therefore we
cannot replace α in (3) with reg (F; x̄| ȳ).
In the remainder of this section we show that if a mapping F satisfies the assumptions of Theorem 5G.7, then for any function g : X → Y with lip (g; x̄) < 1/reg (F; x̄| ȳ), the mapping (g + F)⁻¹ has a continuous and calm local selection around g(x̄) + ȳ for x̄. We will prove this generalization of the Bartle–Graves theorem by repeatedly using an argument similar to the proof of Lemma 5G.6, the idea of which goes back to the (modified) Newton's method used to prove the theorems of Lyusternik and Graves and, in fact, to Goursat's proof of his version of the classical inverse function theorem. We put the theorem in the format of the general implicit function theorem paradigm:
Theorem 5G.8 (inverse mapping theorem with continuous and calm local selections). Consider a mapping F : X →→ Y and any (x̄, ȳ) ∈ gph F, and suppose that for some c > 0 the mapping IBc(ȳ) ∋ y → F⁻¹(y) ∩ IBc(x̄) is closed-convex-valued. Let κ and μ be nonnegative constants such that

κ ≥ reg (F; x̄| ȳ) and κμ < 1.

Then for any function g : X → Y with x̄ ∈ int dom g and lip (g; x̄) ≤ μ and for every γ with

κ/(1 − κμ) < γ,

the mapping (g + F)⁻¹ has a continuous local selection s around g(x̄) + ȳ for x̄, which moreover is calm at g(x̄) + ȳ with

(4) ‖s(y) − x̄‖ ≤ γ ‖y − g(x̄) − ȳ‖ for all y near g(x̄) + ȳ.
Proof. The proof consists of two steps. In the first step, we use induction to obtain a Cauchy sequence of continuous functions z0, z1, . . . such that zn is a continuous and calm selection of the mapping y → F⁻¹(y − g(zn−1(y))). Then we show that this sequence has a limit in the space of continuous functions acting from a fixed ball around ȳ to the space X, equipped with the supremum norm, and that this limit is the selection whose existence is claimed.
Choose κ and μ as in the statement of the theorem and let γ > κ/(1 − κμ). Let λ, α and ν be such that κ < λ < α < 1/ν and ν > μ, and also λ/(1 − αν) ≤ γ. Without loss of generality, we can assume that g(x̄) = 0. Let IBa(x̄) and IBb(ȳ) be the neighborhoods of x̄ and ȳ, respectively, that are associated with the assumed properties of the mapping F and the function g. Specifically:
(a) For every y, y′ ∈ IBb(ȳ) and x ∈ F⁻¹(y) ∩ IBa(x̄) there exists x′ ∈ F⁻¹(y′) with ‖x′ − x‖ ≤ λ ‖y′ − y‖.
(b) For every y ∈ IBb(ȳ) the set F⁻¹(y) ∩ IBa(x̄) is nonempty, closed and convex (that is, max{a, b} ≤ c).
(c) The function g is Lipschitz continuous on IBa(x̄) with constant ν.
According to 5G.7, we can find a constant β, 0 < β ≤ b, and a continuous function z0 : IBβ(ȳ) → X such that

F(z0(y)) ∋ y and ‖z0(y) − x̄‖ ≤ λ ‖y − ȳ‖ for all y ∈ IBβ(ȳ).
‖y − g(z0(y)) − ȳ‖ ≤ τ + ν ‖z0(y) − x̄‖ ≤ τ + νλτ ≤ (1 − αν)(1 + νλ)(β/2) ≤ β ≤ b.

Then from the Aubin property of F⁻¹ there exists x ∈ F⁻¹(y − g(z0(y))) with
(6) ‖x̌k − z0(yk)‖ ≤ λ ‖g(z0(yk)) − g(x̄)‖ ≤ λν ‖z0(yk) − x̄‖ ≤ αν ‖z0(yk) − x̄‖.

Then x̌k ∈ M1(yk), and in particular, x̌k ∈ IBa(x̄). Further, the inclusion x ∈ F⁻¹(y − g(z0(y))) ∩ IBa(x̄) combined with the Aubin property of F⁻¹ entails the existence of x̃k ∈ F⁻¹(yk − g(z0(yk))) such that
Note that, for k → ∞, the numerator in the definition of εk goes to 0 because of the continuity of z0 and (7), while the denominator converges to (α − λ)ν ‖z0(y) − x̄‖ > 0; therefore εk → 0 as k → ∞. Let

xk = εk x̌k + (1 − εk) x̃k.

z1(y) ∈ F⁻¹(y − g(z0(y))) and ‖z1(y) − z0(y)‖ ≤ αν ‖z0(y) − x̄‖ for all y ∈ IBτ(ȳ).

‖z1(y) − x̄‖ ≤ ‖z1(y) − z0(y)‖ + ‖z0(y) − x̄‖ ≤ (1 + αν)λ ‖y − ȳ‖ ≤ γ ‖y − ȳ‖.
The induction step is parallel to the first step. Let z0 and z1 be as above and suppose we have also found functions z1, z2, . . . , zn, such that each zj, j = 1, 2, . . . , n, is a continuous selection of the mapping y → Mj(y), where

Mj(y) = { x ∈ F⁻¹(y − g(zj−1(y))) | ‖x − zj−1(y)‖ ≤ αν ‖zj−1(y) − zj−2(y)‖ }

and

‖zj(y) − zj−1(y)‖ ≤ (αν)^{j−1} ‖z1(y) − z0(y)‖ ≤ (αν)^j ‖z0(y) − x̄‖, j = 2, . . . , n.
Therefore,

‖zj(y) − x̄‖ ≤ ‖z0(y) − x̄‖ + ∑_{i=1}^{j} ‖zi(y) − zi−1(y)‖ ≤ ∑_{i=0}^{j} (αν)^i ‖z0(y) − x̄‖ ≤ λ/(1 − αν) ‖y − ȳ‖ ≤ γ ‖y − ȳ‖
and also

(9) ‖y − g(zj(y)) − ȳ‖ ≤ τ + ν ‖zj(y) − x̄‖ ≤ τ + λντ/(1 − αν) ≤ τ/(1 − αν) ≤ β ≤ b.
Consider the mapping y → Mn+1(y), where

Mn+1(y) = { x ∈ F⁻¹(y − g(zn(y))) | ‖x − zn(y)‖ ≤ αν ‖zn(y) − zn−1(y)‖ }.
‖xk − zn(yk)‖ ≤ λ ‖g(zn(yk)) − g(zn−1(yk))‖ ≤ αν ‖zn(yk) − zn−1(yk)‖.

the Aubin property of F⁻¹ implies the existence of x̌k ∈ F⁻¹(yk − g(zn(yk))) such that

‖x̌k − zn(yk)‖ ≤ λ ‖g(zn(yk)) − g(zn−1(yk))‖ ≤ λν ‖zn(yk) − zn−1(yk)‖.
Similarly, since x ∈ F⁻¹(y − g(zn(y))) ∩ IBa(x̄), there exists x̃k ∈ F⁻¹(yk − g(zn(yk))) such that

Put

εk := (αν ‖zn−1(y) − zn−1(yk)‖ + (1 + αν) ‖zn(y) − zn(yk)‖ + ‖x̃k − x‖) / (αν ‖zn(y) − zn−1(y)‖ − λν ‖zn(yk) − zn−1(yk)‖).

Then εk → 0 as k → ∞. Taking

xk = εk x̌k + (1 − εk) x̃k,
we obtain that xk ∈ F⁻¹(yk − g(zn(yk))) for large k. Further, we estimate ‖xk − zn(yk)‖ in the same way as in the first step, that is,

zn+1(y) ∈ F⁻¹(y − g(zn(y))) and ‖zn+1(y) − zn(y)‖ ≤ αν ‖zn(y) − zn−1(y)‖.

Thus

‖zn+1(y) − zn(y)‖ ≤ (αν)^{n+1} ‖z0(y) − x̄‖.
The induction step is now complete. In consequence, we have an infinite sequence of bounded continuous functions z0, . . . , zn, . . . such that for all y ∈ IBτ(ȳ) and for all n,

‖zn(y) − x̄‖ ≤ ∑_{i=0}^{n} (αν)^i ‖z0(y) − x̄‖ ≤ λ/(1 − αν) ‖y − ȳ‖ ≤ γ ‖y − ȳ‖

and moreover,

sup_{y ∈ IBτ(ȳ)} ‖zn+1(y) − zn(y)‖ ≤ (αν)^n sup_{y ∈ IBτ(ȳ)} ‖z0(y) − x̄‖ ≤ (αν)^n λτ for n ≥ 1.
The sequence {zn} is a Cauchy sequence in the space of functions that are continuous and bounded on IBτ(ȳ), equipped with the supremum norm. Hence this sequence has a limit s which is a continuous function on IBτ(ȳ) and satisfies

s(y) ∈ F⁻¹(y − g(s(y)))

and

‖s(y) − x̄‖ ≤ λ/(1 − αν) ‖y − ȳ‖ ≤ γ ‖y − ȳ‖ for all y ∈ IBτ(ȳ).

Thus, s is a continuous local selection of (g + F)⁻¹ which has the calmness property (4). This brings the proof to its end.
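The two-step construction in this proof can be imitated numerically in a small example of our own: F(x1, x2) = {x1 + 2x2} maps ℝ² onto ℝ with reg(F; x̄|0) = 1/√5 ≤ κ = 0.45, the perturbation g(x) = 0.1 sin(x1) has μ = 0.1, and each step selects the point of the affine set F⁻¹(y − g(zn)) nearest to zn, mimicking the selection from Mn+1.

```python
import math

kappa, mu = 0.45, 0.1
xbar = (0.0, 0.0)
g = lambda x: 0.1 * math.sin(x[0])

def proj_line(z, c):
    """Project z onto {x : x1 + 2*x2 = c}: the nearest-point selection."""
    t = (z[0] + 2.0 * z[1] - c) / 5.0
    return (z[0] - t, z[1] - 2.0 * t)

def s(y, iters=60):
    z = xbar
    for _ in range(iters):
        z = proj_line(z, y - g(z))   # z_{n+1} in F^{-1}(y - g(z_n))
    return z

gamma = 0.5                           # any gamma > kappa/(1 - kappa*mu)
assert kappa / (1.0 - kappa * mu) < gamma
for y in (-0.8, -0.1, 0.2, 0.7):
    z = s(y)
    assert abs(z[0] + 2.0 * z[1] + g(z) - y) < 1e-10   # z in (g + F)^{-1}(y)
    assert math.hypot(*z) <= gamma * abs(y)            # calmness (4) at ybar = 0
```

The iteration contracts with factor roughly μ/√5, and its limit s(y) is a continuous selection of (g + F)⁻¹ satisfying the calmness bound with the chosen γ.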
Proof of Theorem 5G.3. Apply 5G.8 with F = Df(x̄) and g(x) = f(x) − Df(x̄)x. Metric regularity of F is equivalent to surjectivity of Df(x̄), and F⁻¹ is closed-convex-valued. The mapping g has lip (g; x̄) = 0 and, finally, F + g = f.

Note that Theorem 5G.7 follows from 5G.8 with g the zero function.
Theorem 5G.9 (implicit mapping version). Let X, Y and P be Banach spaces. For f : P × X → Y and F : X →→ Y, consider the generalized equation f(p, x) + F(x) ∋ 0 with solution mapping

S(p) = { x | f(p, x) + F(x) ∋ 0 } having x̄ ∈ S(p̄).

Suppose that F satisfies the conditions in Theorem 5G.7 with ȳ = 0 and associated constant κ ≥ reg (F; x̄|0), and also that f is continuous on a neighborhood of (x̄, p̄) and has lip_x (f; (p̄, x̄)) ≤ μ, where μ is a nonnegative constant satisfying κμ < 1.
Then for every γ satisfying

(10) κ/(1 − κμ) < γ
there exist neighborhoods U of x̄ and Q of p̄ along with a continuous function s : Q → U such that

(11) s(p) ∈ S(p) and ‖s(p) − x̄‖ ≤ γ ‖f(p, x̄) − f(p̄, x̄)‖ for every p ∈ Q.
Proof. The proof is parallel to the proof of Theorem 5G.8. First we choose γ satisfying (10) and then λ, α and ν such that κ < λ < α < ν⁻¹ and ν > μ, and also

(12) λ/(1 − αν) < γ.
There are neighborhoods U, V and Q of x̄, 0 and p̄, respectively, which are associated with the metric regularity of F at x̄ for 0 with constant λ and with the Lipschitz continuity of f with respect to x with constant ν uniformly in p. By appropriately choosing a sufficiently small radius τ of a ball around p̄, we construct an infinite sequence of continuous and bounded functions zk : IBτ(p̄) → X, k = 0, 1, . . ., which converge uniformly on IBτ(p̄) to a function s satisfying the conditions in (11). The initial z0 satisfies

z0(p) ∈ F⁻¹(−f(p, x̄)) and ‖z0(p) − x̄‖ ≤ λ ‖f(p, x̄) − f(p̄, x̄)‖
for p ∈ IBτ(p̄), where z−1(p) = x̄. Then for all p ∈ IBτ(p̄) we obtain

zk(p) ∈ F⁻¹(−f(p, zk−1(p))) and ‖zk(p) − zk−1(p)‖ ≤ (αν)^k ‖z0(p) − x̄‖,

hence,

(13) ‖zk(p) − x̄‖ ≤ λ/(1 − αν) ‖f(p, x̄) − f(p̄, x̄)‖.
The sequence {zk} is a Cauchy sequence of continuous and bounded functions, hence it is convergent with respect to the supremum norm. Passing to the limit as k → ∞ and taking into account (12) and (13), we obtain a selection s with the desired properties.
Commentary
The equivalence of (a) and (d) in 5A.1 was shown in Theorem 10 on p. 150 of the
original treatise of S. Banach [1932]. The statements of this theorem usually include
the equivalence of (a) and (b), which is called in Dunford and Schwartz [1958] the
“interior mapping principle.” Lemma 5A.4 is usually stated for Banach algebras,
see, e.g., Theorem 10.7 in Rudin [1991]. Theorem 5A.8 is from Robinson [1972].
The generalization of the Banach open mapping theorem to set-valued map-
pings with convex closed graphs was obtained independently by Robinson [1976]
and Ursescu [1975]; the proof of 5B.3 given here is close to the original proof in
Robinson [1976]. A particular case of this result for positively homogeneous map-
pings was shown earlier by Ng [1973]. The Baire category theorem can be found
in Dunford and Schwartz [1958], p. 20. The Robinson–Ursescu theorem is stated
in various ways in the literature, see, e.g., Theorem 3.3.1 in Aubin and Ekeland
[1984], Theorem 2.2.2 in Aubin and Frankowska [1990], Theorem 2.83 in Bonnans
and Shapiro [2000], Theorem 9.48 in Rockafellar and Wets [1998], Theorem 1.3.11
in Zălinescu [2002] and Theorem 4.21 in Mordukhovich [2006].
Sublinear mappings (under the name “convex processes”) and their adjoints were
introduced by Rockafellar [1967]; see also Rockafellar [1970]. Theorem 5C.9 first
appeared in Lewis [1999], see also Lewis [2001]. The norm duality theorem, 5C.10,
was originally proved by Borwein [1983], who later gave in Borwein [1986b] a
more detailed argument. The statement of the Hahn–Banach theorem 5C.11 is from
Dunford and Schwartz [1958], p. 62.
Theorems 5D.1 and 5D.2 are versions of results originally published in Lyusternik
[1934] and Graves [1950], with some adjustments to the current setting. Lyusternik
apparently viewed his theorem mainly as a stepping stone to obtain the Lagrange
multiplier rule for abstract minimization problems, and the title of his paper from
1934 clearly says so. It is also interesting to note that, after the statement of the
Lyusternik theorem as 8.10.2 in the functional analysis book by Lyusternik and
Sobolev [1965], the authors say that “the proof of this theorem is a modification
of the proof of the implicit function theorem, and the [Lyusternik] theorem is a
direct generalization of this [implicit function] theorem.”
It is quite likely that Graves considered his theorem as an extension of the Banach
open mapping theorem for nonlinear mappings. But there is more in its statement
and proof; namely, the Graves theorem does not involve differentiation and then,
as shown in 5D.3, can be easily extended to become a generalization of the
basic Lemma 5A.4 for nonlinear mappings. This was mentioned already in the his-
torical remarks of Dunford and Schwartz [1958], p. 85. A further generalization in
line with the present setting was revealed in Dmitruk, Milyutin and Osmolovskiı̆
[1980], where the approximating linear mapping is replaced by a Lipschitz continu-
ous function with a sufficiently small Lipschitz constant. Estimates for the regularity
modulus of the kind given in 5D.3 are also present in Ioffe [1979].
In the second part of the last century, when the development of optimality condi-
tions was a key issue, the approach of Lyusternik was recognized for its virtues and
4 The original statement of Michael’s selection theorem is for mappings acting from a paracompact
space to a Banach space; by a theorem of A. H. Stone every metric space is paracompact and hence
every subset of a Banach space is paracompact.
5 Michael’s theorem was not known at that time.
the theory and applications behind it is given in Hamilton [1982]. In the following
lines, we only briefly point out a connection to the results in Section 5E.
The Nash–Moser theorem is about mappings acting in Fréchet spaces, which are
more general than Banach spaces. Consider a linear (vector) space F equipped
with a countable collection of seminorms { ‖·‖n | n ∈ IN } (a seminorm differs from a norm
in that the seminorm of a nonzero element could be zero). The topology induced
by this collection of seminorms makes the space F a locally convex
topological vector space. If x = 0 whenever ‖x‖n = 0 for all n, the space is Hausdorff.
In a Hausdorff space, one may define a metric based on the family of seminorms in the
following way:

(1)    ρ(x, y) = ∑_{n=1}^{∞} 2^{−n} ‖x − y‖n / (1 + ‖x − y‖n).
It is not difficult to see that this metric is shift-invariant. A sequence {xk} is said to be
Cauchy when ‖xk − xj‖n → 0 as k and j → ∞ for all n, or, equivalently, ρ(xk, xj) → 0
as k → ∞ and j → ∞. As usual, a space is complete if every Cauchy sequence converges. A Fréchet space is a complete Hausdorff metrizable locally convex topological vector space.
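As a small computational aside (our illustration, not part of the text), formula (1) is easy to evaluate. The sketch below works with real sequences under the seminorms pn(x) = max over i ≤ n of |xi|, truncated to finitely many terms, and checks that ρ vanishes on the diagonal and is shift-invariant.

```python
# Sketch (our illustration): the Fréchet-space metric (1) built from a
# countable family of seminorms, truncated to finitely many terms.
# The space here is real sequences; the n-th seminorm is
# p_n(x) = max(|x_1|, ..., |x_n|).

def seminorm(x, n):
    # p_n(x) = max_{i <= n} |x_i|; x is a finite list standing in for a sequence
    return max(abs(t) for t in x[:n])

def rho(x, y, N=20):
    # rho(x, y) = sum_{n>=1} 2^{-n} p_n(x - y) / (1 + p_n(x - y)), truncated at N
    d = [a - b for a, b in zip(x, y)]
    total = 0.0
    for n in range(1, min(N, len(d)) + 1):
        p = seminorm(d, n)
        total += 2.0 ** (-n) * p / (1.0 + p)
    return total

x = [1.0, -2.0, 3.0, 0.5]
y = [0.0, 1.0, 1.0, 0.0]
z = [5.0, 5.0, 5.0, 5.0]
xz = [a + c for a, c in zip(x, z)]
yz = [b + c for b, c in zip(y, z)]

print(rho(x, x))                      # 0.0: the metric vanishes on the diagonal
print(rho(x, y) < 1.0)                # each term is < 2^{-n}, so rho < 1
print(abs(rho(xz, yz) - rho(x, y)))   # shift-invariance: rho(x+z, y+z) = rho(x, y)
```

The weights 2^{−n} make the series converge no matter how large the seminorm values are; any summable sequence of positive weights would serve equally well.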
Having two Fréchet spaces F and G, we can now introduce metrics ρ and σ asso-
ciated with their collections of seminorms as in (1) above, and define Lipschitz con-
tinuity and metric regularity accordingly. Then Theorem 5E.3 will of course apply,
and we can obtain, e.g., a Graves-type theorem in Fréchet spaces, in terms of the
metrics ρ and σ , and also an implicit function theorem in Fréchet spaces from the
general Theorem 5E.4. To get to the Nash–Moser theorem, however, we have a long
way to go, translating the meaning of, e.g., the assumptions of Theorem 5E.4 in terms
of the metrics ρ and σ for the collections of seminorms and the mappings consid-
ered. For that we will need more structure in the spaces, an ordering (grading) of the
sequence of seminorms and, moreover, a certain uniform approximation property
called the tameness condition. For the mappings, the associated tameness property
means that certain growth estimates hold. The statement of the Nash–Moser theorem
is surprisingly similar to the classical inverse function theorem, but the meaning of
the concepts used is much more involved: when a smooth tame mapping f acting be-
tween Fréchet spaces has an invertible tame derivative, then f −1 has a smooth tame
single-valued localization. The rigorous introduction of the tame spaces, mappings
and derivatives is beyond the scope of this book; we only note here that extending
the Nash–Moser theorem to set-valued mappings, e.g. in the setting of Section 5E,
is a challenging avenue for future research.
Chapter 6
Applications in Numerical Variational Analysis
The classical implicit function theorem finds a wide range of applications in numer-
ical analysis. For instance, it helps in deriving error estimates for approximations to
differential equations and is often relied on in establishing the convergence of algo-
rithms. Can the generalizations of the classical theory to which we have devoted so
much of this book have comparable applications in the numerical treatment of non-
classical problems for generalized equations and beyond? In this chapter we provide
positive answers in several directions.
We begin with a topic at the core of numerical work, the “conditioning” of a
problem and how it extends to concepts like metric regularity. We also explain how
the conditioning of a feasibility problem, like solving a system of inequalities, can
be understood. Next we take up a general iterative scheme for solving generalized
equations under metric regularity, obtaining convergence by means of our earlier
basic results. As particular cases, we get various modes of convergence of the age-
old procedure known as Newton’s method in several guises, and of the much more
recently introduced proximal point algorithm. We go a step further with Newton’s
method by showing that the mapping which assigns to an instance of a parameter the
set of all sequences generated by the method obeys, in a Banach space of sequences,
the implicit function theorem paradigm in the same pattern as the solution mapping
for the underlying generalized equation. Approximations of quadratic optimization
problems in Hilbert spaces are then studied. Finally, we apply our methodology to
discrete approximations in optimal control.
A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View 311
from Variational Analysis, Springer Monographs in Mathematics,
DOI 10.1007/978-0-387-87821-8 6, c Springer Science+Business Media, LLC 2009
In this sense, |A⁻¹|⁻¹ gives the radius of nonsingularity around A. As long as B lies
within that distance from A, the nonsingularity of A + B is assured. Clearly from this
angle as well, a large value of the condition number |A⁻¹| points toward numerical
difficulties.
The model provided to us by this example is that of a radius theorem, furnishing
a bound on how far perturbations of some sort in the specification of a problem can
go before some key property is lost. Radius theorems can be investigated not only
for solving equations, linear and nonlinear, but also for generalized equations, systems
of constraints, etc.
We start down that track by stating the version of the cited matrix result that
works in infinite dimensions for bounded linear mappings acting in Banach spaces.
Theorem 6A.1 (radius theorem for invertibility of bounded linear mappings). Let
X and Y be Banach spaces and let A ∈ L(X,Y) be invertible. Then

(1)    inf_{B ∈ L(X,Y)} { ‖B‖ | A + B is not invertible } = 1 / ‖A⁻¹‖.
(2)    Bx = − x∗(x) Ax̂ / ‖x̂‖

has ‖B‖ = 1/‖x̂‖ and (A + B)x̂ = Ax̂ − Ax̂ = 0. Then A + B is not invertible and
hence the infimum in (1) is ≤ r. It remains to note that B in (2) is of rank one.
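For a concrete feel for this radius formula, here is a numerical sketch (ours, not part of the text) for a 2 × 2 matrix under the spectral norm. There ‖A⁻¹‖ is the reciprocal of the smallest singular value of A, and a rank-one perturbation of exactly that norm, built from the smallest singular pair in the spirit of B in (2), destroys invertibility.

```python
# Sketch (our illustration): Theorem 6A.1 for a matrix in the spectral norm.
# Here 1/||A^{-1}|| equals sigma_min(A), and a rank-one perturbation of exactly
# that norm makes A + B singular.
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
radius = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)   # = s[-1], smallest singular value

# Rank-one perturbation along the smallest singular pair, analogous to B in (2):
# it sends the smallest right singular vector v_min to -A v_min.
B = -s[-1] * np.outer(U[:, -1], Vt[-1, :])

print(np.isclose(radius, s[-1]))                     # radius = sigma_min(A)
print(np.isclose(np.linalg.norm(B, 2), radius))      # ||B|| attains the radius
print(abs(np.linalg.det(A + B)) < 1e-10)             # A + B is singular
```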
The initial step that can be taken toward generality beyond linear mappings is
in the direction of positively homogeneous mappings H : X → → Y ; here and further
on, X and Y are Banach spaces. For such a mapping, ordinary norms can no longer
be of help in conditioning, but the outer and inner norms introduced in 4A in finite
dimensions and extended in 5A to Banach spaces can come into play:
‖H‖⁺ = sup_{‖x‖≤1} sup_{y∈H(x)} ‖y‖    and    ‖H‖⁻ = sup_{‖x‖≤1} inf_{y∈H(x)} ‖y‖,

‖H⁻¹‖⁺ = sup_{‖y‖≤1} sup_{x∈H⁻¹(y)} ‖x‖    and    ‖H⁻¹‖⁻ = sup_{‖y‖≤1} inf_{x∈H⁻¹(y)} ‖x‖.
In thinking of H⁻¹(y) as the set of solutions x to H(x) ∋ y, it is clear that the outer
and inner norms of H⁻¹ capture two different aspects of solution behavior, roughly
the distance to the farthest solution and the distance to the nearest solution (when
multi-valuedness is present). We are able to assert, for instance, that
caused by a shift from y to y + δy. Without H being linear, there seems little hope
of quantifying that aspect of error, not to speak of relative error. Nonetheless, it will
be possible to get radius theorems in which the reciprocals of ‖H⁻¹‖⁺ and ‖H⁻¹‖⁻
are featured.
For ‖H⁻¹‖⁺, we can utilize the inversion estimate for the outer norm in 5A.8. A
definition is needed first.
Bx = − x∗(x) ŷ / ‖x̂‖

has ‖B‖ = 1/‖x̂‖ < r and (H + B)(x̂) = H(x̂) − ŷ ∋ 0. Then the nonzero vector x̂
belongs to (H + B)⁻¹(0), hence ‖(H + B)⁻¹‖⁺ = ∞, i.e., H + B is singular. The infimum in (3) must therefore be less than r. Appealing to the choice of r we conclude
that the infimum in (3) cannot be more than 1/‖H⁻¹‖⁺, and we are done.
To develop a radius theorem about ‖H⁻¹‖⁻, we have to look more narrowly
at sublinear mappings, which are characterized by having graphs that are not just
cones, as corresponds to positive homogeneity, but convex cones. For such a map-
ping H, if its graph is also closed, we have an inversion estimate for the inner norm
in 5C.9. Furthermore, we know from 5C.2 that the surjectivity of H is equivalent to
having ‖H⁻¹‖⁻ < ∞. We also have available the notion of the adjoint mapping as
introduced in Section 5C: the upper adjoint of H : X →→ Y is the sublinear mapping
H∗⁺ : Y∗ →→ X∗ defined by

H∗⁺(y∗) = { x∗ ∈ X∗ | ⟨x∗, x⟩ ≤ ⟨y∗, y⟩ for all (x, y) ∈ gph H }.
Recall too, from 5C.13, that for a sublinear mapping H with closed graph,

(4)    ‖(H∗⁺)⁻¹‖⁺ = ‖H⁻¹‖⁻,
Theorem 6A.3 (radius theorem for surjectivity of sublinear mappings). For any
H : X →→ Y that is sublinear, surjective, and with closed graph,

(6)    inf_{B ∈ L(X,Y)} { ‖B‖ | H + B is not surjective } = 1 / ‖H⁻¹‖⁻.
Proof. The right side of (6) can be identified through Theorem 6A.2 with

(7)    inf_{C ∈ L(Y∗,X∗)} { ‖C‖ | H∗⁺ + C is singular } = 1 / ‖(H∗⁺)⁻¹‖⁺

by the observation that any C ∈ L(Y∗, X∗) of rank one has the form B∗ for some
B ∈ L(X,Y) of rank one. It remains to apply the relation in (4). In consequence of
that, the left side of (7) is 1/‖H⁻¹‖⁻, and we get the desired equality.
In the case of H being a bounded linear mapping A : X → Y , Theorems 6A.2 and
6A.3 both furnish results which complement Theorem 6A.1, since nonsingularity
just comes down to A−1 being single-valued on rge A, while surjectivity corresponds
only to dom A−1 being all of Y , and neither of those properties automatically entails
the other. When X = Y = IRn , of course, all three theorems reduce to the matrix
result recalled at the beginning of this section.
The surjectivity result in 6A.3 offers more than an extended insight into equation
solving, however. It can be applied also to systems of inequalities. This is true even
in infinite dimensions, but we are not yet prepared to speak of inequality constraints
in that framework, so we limit the following illustration to solving Ax ≤ y in the
case of a matrix A ∈ IRm×n . It will be convenient to say that
We adopt for y = (y1, . . . , ym) ∈ IRm the maximum norm |y|∞ = max_{1≤k≤m} |yk| but
equip IRn with any norm. The associated operator norm for linear mappings acting
from IRn to IRm is denoted by |·|∞. Also, we use the notation for y = (y1, . . . , ym) that
y⁺ = (y1⁺, . . . , ym⁺), where yk⁺ = max{0, yk}.
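As a small illustration of this notation (ours, not the book's): x satisfies Ax ≤ y exactly when |(Ax − y)⁺|∞ = 0, so that quantity acts as a natural residual for the inequality system.

```python
# Sketch (our illustration of the notation): y+ = (y_1^+, ..., y_m^+) with
# y_k^+ = max(0, y_k), and the residual |(Ax - y)^+|_inf, which is zero
# exactly when x satisfies the inequality system Ax <= y.

def plus(y):
    # componentwise positive part
    return [max(0.0, t) for t in y]

def max_norm(y):
    # the maximum norm |y|_inf
    return max(abs(t) for t in y)

def residual(A, x, y):
    Ax = [sum(a * t for a, t in zip(row, x)) for row in A]
    return max_norm(plus([u - v for u, v in zip(Ax, y)]))

A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [1.0, 1.0, 0.0]
print(residual(A, [0.5, 0.5], y))   # 0.0: this point satisfies Ax <= y
print(residual(A, [2.0, 0.0], y))   # 1.0: the first inequality is violated by 1
```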
Since any linear mapping is sublinear, this equality combined with 6A.3 gives us
yet another radius result.
Corollary 6A.5 (radius theorem for metric regularity of strictly differentiable functions). Let f : X → Y be strictly differentiable at x̄, let ȳ := f (x̄), and let D f (x̄) be
surjective. Then

inf_{B ∈ L(X,Y)} { ‖B‖ | f + B is not metrically regular at x̄ for ȳ + Bx̄ } = 1 / ‖D f (x̄)⁻¹‖⁻.
It should not escape attention here that in 6A.5 we are not focused any more on
the origins of X and Y but on a general pair (x̄, ȳ) in the graph of f . This allows us
to return to “conditioning” from a different perspective, if we are willing to think of
such a property in a local sense only.
Suppose that a y near to ȳ is perturbed to y + δy. The solution set f⁻¹(y) to
the problem of solving f (x) = y is thereby shifted to f⁻¹(y + δy), and we have an
interest in understanding the “error” vectors δx such that x + δx ∈ f⁻¹(y + δy).
Since anyway x need not be the only element of f⁻¹(y), it is appropriate to quantify
the shift by looking for the smallest size of δx, or in other words at dist(x, f⁻¹(y +
δy)) and how it compares to ‖δy‖. This ratio, in its limit as (x, y) goes to (x̄, ȳ) and
δy goes to 0, is precisely reg ( f ; x̄ | ȳ).
In this sense, reg ( f ; x̄ | ȳ) can be deemed the absolute condition number locally
with respect to x̄ and ȳ for the problem of solving f (x) = y for x in terms of y. We
then have a local, nonlinear analog of Theorem 6A.1, tying a condition number to
a radius. It provides something more even for linear f , of course, since in contrast
to Theorem 6A.1, it imposes no requirement of invertibility.
Corollary 6A.5 can be stated in a more general form, which we give here as an
exercise:
Exercise 6A.6. Let F : X → → Y with (x̄, ȳ) ∈ gph F being a pair at which gph F
is locally closed. Let F be metrically regular at x̄ for ȳ, and let f : X → Y satisfy
x̄ ∈ int dom f and lip ( f ; x̄) = 0. Then
inf_{B ∈ L(X,Y)} { ‖B‖ | F + B is not metrically regular at x̄ for ȳ + Bx̄ }
    = inf_{B ∈ L(X,Y)} { ‖B‖ | F + f + B is not metrically regular at x̄ for ȳ + f (x̄) + Bx̄ }.
Guide. Observe that, by the Banach space version of 3F.4 (which follows from
5E.1), the mapping F + B is metrically regular at x̄ for ȳ + Bx̄ if and only if the mapping
F + f + B is metrically regular at x̄ for ȳ + f (x̄) + Bx̄.
We will show next that, in finite dimensions at least, the radius result in 6A.5 is
valid when f is replaced by any set-valued mapping F whose graph is locally closed
around the reference pair (x̄, ȳ).
Theorem 6A.7 (radius theorem for metric regularity). Let X and Y be finite-
dimensional normed linear spaces, and for F : X → → Y and ȳ ∈ F(x̄) let gph F be
locally closed at (x̄, ȳ). Suppose F is metrically regular at x̄ for ȳ. Then
(8)    inf_{B ∈ L(X,Y)} { ‖B‖ | F + B is not metrically regular at x̄ for ȳ + Bx̄ } = 1 / reg (F; x̄| ȳ).
which becomes the equality (8) in the case when reg (F; x̄| ȳ) = 0 under the conven-
tion 1/0 = ∞. To confirm the opposite inequality when reg (F; x̄| ȳ) > 0, we apply
Theorem 4B.9, according to which
reg (F; x̄| ȳ) + εk ≥ ‖D̃F(xk |yk)⁻¹‖⁻ ≥ reg (F; x̄| ȳ) − εk > 0.
Let Hk := D̃F(xk |yk) and Sk := Hk∗⁺; then norm duality gives us ‖Hk⁻¹‖⁻ = ‖Sk⁻¹‖⁺,
see 5C.13.
For each k > k̄ choose a positive real rk satisfying ‖Sk⁻¹‖⁺ − εk < 1/rk < ‖Sk⁻¹‖⁺.
From the last inequality there must exist (ŷk, x̂k) ∈ gph Sk with ‖x̂k‖ = 1 and
‖Sk⁻¹‖⁺ ≥ ‖ŷk‖ > 1/rk. Pick y∗k ∈ Y with ⟨ŷk, y∗k⟩ = ‖ŷk‖ and ‖y∗k‖ = 1, and define
the rank-one mapping Ĝk ∈ L(Y∗, X∗) by
the rank-one mapping Ĝk ∈ L (Y, X) by
Ĝk(y) := − (⟨y, y∗k⟩ / ‖ŷk‖) x̂k.
Then Ĝk(ŷk) = −x̂k and hence (Sk + Ĝk)(ŷk) = Sk(ŷk) + Ĝk(ŷk) = Sk(ŷk) − x̂k ∋ 0.
Therefore, ŷk ∈ (Sk + Ĝk)⁻¹(0), and since ŷk ≠ 0 and Sk is positively homogeneous
with closed graph, we have by Proposition 5A.7, formula 5A(10), that
Ĝ(y) := − (⟨y, y∗⟩ / ‖ŷ‖) x̂.

Then we have ‖Ĝ‖ = reg (F; x̄| ȳ)⁻¹ and ‖Ĝk − Ĝ‖ → 0.
Let B := (Ĝ)∗ and suppose F + B is metrically regular at x̄ for ȳ + Bx̄. Theorem
4B.9 yields that there is a finite positive constant c such that for k > k̄ sufficiently
large, we have

c > ‖D̃(F + B)(xk |yk + Bxk)⁻¹‖⁻.
Take k > k̄ sufficiently large such that ‖Ĝ − Ĝk‖ ≤ 1/(2c). Setting Pk := Sk + Ĝ and
Qk := Ĝk − Ĝ, we have that
By using the inversion estimate for the outer norm in Theorem 5C.9, we have
‖(Sk + Ĝk)⁻¹‖⁺ = ‖(Pk + Qk)⁻¹‖⁺ ≤ ( [‖Pk⁻¹‖⁺]⁻¹ − ‖Qk‖ )⁻¹ ≤ 2c < ∞.
This contradicts (10). Hence, F + B is not metrically regular at x̄ for ȳ + Bx̄. Noting
that ‖B‖ = ‖Ĝ‖ = 1/ reg (F; x̄| ȳ) and that B is of rank one, we are finished.
In a pattern just like the one laid out after Corollary 6A.5, it is appropriate to
consider reg (F; x̄| ȳ) as the local absolute condition number with respect to x̄ and ȳ
for the problem of solving F(x) ∋ y for x in terms of y. An even grander extension
of the fact in 6A.1, that the reciprocal of the absolute condition number gives the
radius of perturbation for preserving an associated property, is thereby achieved.
Based on Theorem 6A.7, it is now easy to obtain a parallel radius result for strong
metric regularity.
Theorem 6A.8 (radius theorem for strong metric regularity). For finite-dimensional
normed linear spaces X and Y , let F : X → → Y have ȳ ∈ F(x̄). Suppose that F is
strongly metrically regular at x̄ for ȳ. Then
(12)    inf_{B ∈ L(X,Y)} { ‖B‖ | F + B is not strongly regular at x̄ for ȳ + Bx̄ } = 1 / reg (F; x̄| ȳ).
Moreover, the infimum is unchanged if taken with respect to linear mappings of rank
1, but also remains unchanged when the class of perturbations B is enlarged to the
class of locally Lipschitz continuous functions g with ‖B‖ replaced by the Lipschitz
modulus lip (g; x̄).
Proof. Theorem 5F.1 reveals that “≥” holds in (12) when the linear perturbation is
replaced by a Lipschitz perturbation, and moreover that (12) is satisfied in the limit
case reg (F; x̄| ȳ) = 0 under the convention 1/0 = ∞. The inequality becomes an
equality with the observation that the assumed strong metric regularity of F implies
that F has locally closed graph at (x̄, ȳ) and is metrically regular at x̄ for ȳ. Hence
the infimum in (12) is not greater than the infimum in (8).
Next comes a radius theorem for strong subregularity to go along with the ones
for metric regularity and strong metric regularity.
Theorem 6A.9 (radius theorem for strong metric subregularity). Let X and Y be
finite-dimensional normed linear spaces, and for F : X →→ Y and ȳ ∈ F(x̄)
let gph F be locally closed at (x̄, ȳ). Suppose that F is strongly metrically subregular
at x̄ for ȳ. Then

inf_{B ∈ L(X,Y)} { ‖B‖ | F + B is not strongly subregular at x̄ for ȳ + Bx̄ } = 1 / subreg (F; x̄| ȳ).
We know from the sum rule for graphical differentiation (4A.1) that D(F + B)(x̄| ȳ+
Bx̄) = DF(x̄| ȳ) + B, hence
(14)    inf_{B ∈ L(X,Y)} { ‖B‖ | D(F + B)(x̄| ȳ + Bx̄) is singular }
         = inf_{B ∈ L(X,Y)} { ‖B‖ | DF(x̄| ȳ) + B is singular }.
including the case ‖DF(x̄| ȳ)⁻¹‖⁺ = 0 with the convention 1/0 = ∞. Theorem 4C.1
tells us also that ‖DF(x̄| ȳ)⁻¹‖⁺ = subreg (F; x̄| ȳ) and then, putting together (13),
(14) and (15), we get the desired equality.
As with the preceding results, the modulus subreg (F; x̄| ȳ) can be regarded
as a sort of local absolute condition number. But in this case only the ratio of
dist(x̄, F⁻¹(ȳ + δy)) to ‖δy‖ is considered in its limsup as δy goes to 0, not the
limsup of all the ratios dist(x, F⁻¹(y + δy))/‖δy‖ with (x, y) ∈ gph F tending to
(x̄, ȳ), which gives reg (F; x̄| ȳ). Specifically, with reg (F; x̄| ȳ) appropriately termed
the absolute condition number for F locally with respect to x̄ and ȳ, subreg (F; x̄| ȳ)
is the corresponding subcondition number.
The radius-type theorems above could be rewritten in terms of the associated
equivalent properties of the inverse mappings. For example, Theorem 6A.7 could
be restated in terms of perturbations B of a mapping F whose inverse has the Aubin
property.
Feasibility. Problem (1) will be called feasible if F⁻¹(0) ≠ ∅, i.e., 0 ∈ rge F, and
strictly feasible if 0 ∈ int rge F.
Two examples will point the way toward progress. Recall that any closed, convex
cone K ⊂ Y with nonempty interior induces a partial ordering “≤K ” under the rule
that y0 ≤K y1 means y1 − y0 ∈ K. Correspondingly, y0 <K y1 means y1 − y0 ∈ int K.
Example 6B.1 (convex constraint systems). Let C ⊂ X be a closed convex set, let
K ⊂ Y be a closed convex cone, and let A : C → Y be a continuous and convex
mapping with respect to the partial ordering in Y induced by K; that is,
F∗⁺(y∗) = { A∗(y∗) − C⁺   if y∗ ∈ K⁺,
          { ∅             if y∗ ∉ K⁺,
Detail. In this case the graph of F is clearly a convex cone, and that means F is
sublinear. The claims about the adjoint of F follow by elementary calculation.
Along the lines of the analysis in 6A, in dealing with the feasibility problem (1)
we will be interested in perturbations in which F is replaced by F + B for some
B ∈ L(X,Y), and at the same time, the zero on the right is replaced by some other
b ∈ Y. Such a double perturbation, the magnitude of which can be quantified by the
norm

(2)    ‖(B, b)‖ = max { ‖B‖, ‖b‖ },

transforms the condition F(x) ∋ 0 to (F + B)(x) ∋ b and the solution set F⁻¹(0) to
(F + B)⁻¹(b), creating infeasibility if (F + B)⁻¹(b) = ∅, i.e., if b ∉ rge (F + B). We
want to understand how large ‖(B, b)‖ can be before this happens.
Proof. Let S1 denote the set of (B, b) over which the infimum is taken in (3) and let
S2 be the corresponding set in (4). Obviously S1 ⊂ S2, so the first infimum cannot
be less than the second. We must show that it also cannot be greater. This amounts
to demonstrating that for any (B, b) ∈ S2 and any ε > 0 we can find (B′, b′) ∈ S1
such that ‖(B′, b′)‖ ≤ ‖(B, b)‖ + ε. In fact, we can get this with B′ = B simply by
noting that when b ∉ int rge (F + B) there must exist b′ ∈ Y with b′ ∉ rge (F + B)
and ‖b′ − b‖ ≤ ε.
By utilizing the Robinson–Ursescu theorem 5B.4, we can see furthermore that
the distance to infeasibility is actually the same as the distance to metric nonregu-
larity:
Lemma 6B.4 (distance to infeasibility equals radius of metric regularity). The distance to infeasibility in problem (1) coincides with the value

(5)    inf_{(B,b) ∈ L(X,Y)×Y} { ‖(B, b)‖ | F + B is not metrically regular at any x̄ for b }.
Proof. In view of the equivalence of infeasibility with strict infeasibility in 6B.3, the
Robinson–Ursescu theorem 5B.4 just says that problem (1) is strictly feasible if and only if
F is metrically regular at x̄ for 0 for any x̄ ∈ F⁻¹(0), hence (5).
In order to estimate the distance to infeasibility in terms of the modulus of metric
regularity, we pass from F to a special mapping F̄ constructed as a “homogeniza-
tion” of F. We will then be able to apply to F̄ the result on distance to metric
nonregularity of sublinear mappings given in 6A.3.
We use the horizon mapping F ∞ associated with F, the graph of F ∞ in X × Y
being the recession cone of gph F in the sense of convex analysis:
We are now ready to state and prove a result which gives a quantitative expression
for the magnitude of the distance to infeasibility:
that if 0 ∈ int rge F, then 0 ∈ int rge F̄. Since rge F̄ is a convex cone, the latter is
equivalent to having rge F̄ = Y , i.e., surjectivity.
Conversely now, suppose F̄ is surjective. Theorem 5B.4 (Robinson–Ursescu) in-
forms us that in this case, 0 ∈ int F̄(W ) for every neighborhood W of the origin in
IR × X. It must be verified, however, that 0 ∈ int rge F. In terms of C(t) = F̄(IB,t) ⊂
Y , it will suffice to show that 0 ∈ int C(t) for some t > 0. Note that the sublinearity
of F̄ implies that
We will use this to show that actually 0 ∈ int C(τ ). For y∗ ∈ Y ∗ define
The property in (9) makes σ (y∗ ,t) concave in t, and the same then follows for λ (t).
As long as 0 ≤ t ≤ 2τ , we have σ (y∗ ,t) ≥ 0 and λ (t) ≥ 0 by (10). On the other
hand, the union in (11) includes some ball around the origin. Therefore,
(12)    ∃ ε > 0 such that sup_{0≤t≤2τ} σ(y∗, t) ≥ ε for all y∗ ∈ Y∗ with ‖y∗‖ = 1.
We argue next that λ (τ ) > 0. If not, then since λ is a nonnegative concave func-
tion on [0, 2τ ], we would have to have λ (t) = 0 for all t ∈ [0, 2τ ]. Supposing that
to be the case, choose δ ∈ (0, ε /2) and, in the light of the definition of λ (τ ), an
element ŷ∗ with σ (ŷ∗ , τ ) < δ . The nonnegativity and concavity of σ (ŷ∗ , ·) on [0, 2τ ]
imply then that σ (ŷ∗ ,t) ≤ (δ /τ )t when τ ≤ t ≤ 2τ and σ (ŷ∗ ,t) ≤ 2δ − (δ /τ )t when
0 ≤ t ≤ τ . But that gives us σ (ŷ∗ ,t) ≤ 2δ < ε for all t ∈ [0, 2τ ], in contradiction to
the property of ε in (12). Therefore, λ (τ ) > 0, as claimed.
We have σ(y∗, τ) ≥ λ(τ) when ‖y∗‖ = 1, and hence by positive homogeneity
σ(y∗, τ) ≥ λ(τ)‖y∗‖ for all y∗ ∈ Y∗. In this inequality, σ(·, τ) is the support function
of the convex set C(τ), or equivalently of cl C(τ), whereas λ(τ)‖·‖ is the support
function of λ(τ)IB. It follows therefore that cl C(τ) ⊃ λ(τ)IB, so that at least 0 ∈
int cl C(τ).
Now, remembering that C(τ) = τ F(τ⁻¹IB), we obtain 0 ∈ int cl F(τ⁻¹IB). Consider the mapping

F̃(x) = { F(x)   if x ∈ τ⁻¹IB,
        { ∅      otherwise.
Clearly rge F̃ ⊂ rge F. Applying Theorem 5B.1 to the mapping F̃ gives us
Hence, through Lemma 6B.4, the distance to infeasibility for the system F(x) ∋ 0
is the infimum of ‖B̄‖ over all B̄ ∈ L(X × IR, Y) such that F̄ + B̄ is not surjective.
Theorem 6A.3 then furnishes the conclusion in (8).
Passing to the adjoint mapping, we can obtain a “dual” formula for the distance
to infeasibility:
Proof. In this case the function h in 6B.6 has h(x∗ , y∗ ) = 0 when x∗ ∈ F ∗+ (y∗ ), but
h(x∗ , y∗ ) = ∞ otherwise.
In particular, for a linear-conic constraint system of type x ≥C 0, A(x) ≤K 0, with
respect to a continuous linear mapping A : X → Y and closed, convex cones C ⊂ X
and K ⊂ Y , we obtain
At the end of the section we will provide more details about this method, putting it
in the perspective of monotone mappings and optimization problems.
Our main result, which follows, concerns convergence of the iterative process (2)
under the assumption of metric regularity of the mapping f + F.
for all k = 0, 1, . . . .
Then there is a neighborhood O of x̄ such that, for any starting point x0 ∈ O and
any sequence δk ↘ 0 satisfying
(8)    γk := (κ μk + δk) / (1 − κ εk) < 1 for k = 0, 1, . . . ,
there exists a sequence {xk } generated by the procedure (2) which converges to x̄
with
Proof. Let constants a > 0 and b > 0 be such that the mapping f + F is metrically
regular at x̄ for 0 with constant κ and neighborhoods IBa (x̄) and IBb (0). Make a
smaller if necessary so that IBa (x̄) ⊂ U and
From the second inequality in (5) there exists a sequence δk ↘ 0 satisfying (8);
choose such a sequence and determine γk from (8). Pick x0 ∈ IBa(x̄). If x0 = x̄ then
take xk = x̄ for all k and there is nothing more to prove. If not, consider the function
x → g0 (x) := f (x) − A0 (x, x0 ). For any x ∈ IBa (x̄), using (6), (7) and (10), and noting
that Ak (x̄, x̄) = f (x̄) from (7), we have
We will demonstrate that the mapping Φ0 : x → ( f + F)−1 (g0 (x)) satisfies the as-
sumptions of the contraction mapping principle for set-valued mappings (Theorem
5E.2).
By virtue of the metric regularity of f + F, the form of g0, the fact that − f (x̄) =
−A0(x̄, x̄) ∈ F(x̄), and (7), we have
where γ0 is defined in (8) and hence γ0‖x0 − x̄‖ ≤ a. Let u, v ∈ IB_{γ0‖x0−x̄‖}(x̄). Invoking again the metric regularity of f + F as well as the estimate (11), for any
u, v ∈ IB_{γ0‖x0−x̄‖}(x̄) we obtain

e(Φ0(u) ∩ IB_{γ0‖x0−x̄‖}(x̄), Φ0(v)) ≤ e(Φ0(u) ∩ IBa(x̄), Φ0(v))
    = sup { d(x, ( f + F)⁻¹(g0(v))) | x ∈ ( f + F)⁻¹(g0(u)) ∩ IBa(x̄) }
    ≤ sup { κ d(g0(v), f (x) + F(x)) | x ∈ ( f + F)⁻¹(g0(u)) ∩ IBa(x̄) }
    ≤ κ ‖ f (v) − A0(v, x0) − [ f (u) − A0(u, x0)]‖ ≤ κ ε0 ‖u − v‖.
Hence, by the contraction mapping principle 5E.2 there exists a fixed point x1 ∈
Φ0(x1) ∩ IB_{γ0‖x0−x̄‖}(x̄). This translates to g0(x1) = f (x1) − A0(x1, x0) ∈ ( f + F)(x1),
meaning that x1 is obtained from x0 by iteration (2) and satisfies (9) for k = 0.
The induction step is now clear. If xk ∈ IBa(x̄) and xk ≠ x̄, by defining gk(x) =
f (x) − Ak(x, xk), we obtain as in (11) that ‖gk(x)‖ ≤ b for all x ∈ IBa(x̄). Then
Theorem 5E.2 applies to Φk : x → ( f + F)⁻¹(gk(x)) on the ball IB_{γk‖xk−x̄‖}(x̄) and
yields the existence of an iterate xk+1 satisfying (9). The condition in (8) ensures
that the sequence {xk} is convergent and its limit is x̄.
It is a standard concept in numerical analysis that a sequence {xk} is linearly
convergent to x̄ when

lim sup_{k→∞} ‖xk+1 − x̄‖ / ‖xk − x̄‖ < 1.
Thus, the sequence {xk} whose existence is claimed in 6C.1 is linearly convergent
to x̄. If the stronger condition

lim_{k→∞} ‖xk+1 − x̄‖ / ‖xk − x̄‖ = 0

holds, the sequence is said to be superlinearly convergent to x̄.
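The two convergence notions can be sketched numerically (our illustration on model error sequences, not tied to a particular method): the ratio of successive errors stays at a fixed value below 1 for linear convergence and tends to 0 for superlinear convergence.

```python
# Sketch (our illustration): ratio tests for linear vs superlinear convergence.
# e_k = 2^{-k} has constant ratio 1/2 (linear); e_k = 2^{-2^k} has ratio -> 0
# (superlinear).

def ratios(errors):
    # successive ratios e_{k+1}/e_k
    return [b / a for a, b in zip(errors, errors[1:])]

linear = [2.0 ** (-k) for k in range(10)]
superlinear = [2.0 ** (-2 ** k) for k in range(6)]

print(ratios(linear)[-1])        # 0.5: bounded away from 0 but below 1
print(ratios(superlinear)[-1])   # tends to 0, so convergence is superlinear
```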
(12)    lim_{k→∞} μk = 0.
Let xk ∈ IBa (x̄) for k = 0, 1, . . . be generated by (2), and let gk (x) = f (x) − Ak (x, xk ).
From (6) and (7),
Exercise 6C.4 (convergence under strong metric regularity). Under the conditions
of Theorem 6C.1, assume in addition that the mapping f + F is strongly metrically
regular at x̄ for 0. Then there exists a neighborhood O of x̄ such that, for any x0 ∈ O,
there is a unique sequence {xk } generated by the iterative process (2). This sequence
is linearly convergent to x̄. If (12) holds, the sequence is superlinearly convergent.
Guide. This could be verified in several ways, one of which is to repeat the proof
of 6C.1 using the standard contraction mapping principle, 1A.2, instead of 5E.2.
We will see next what the assumptions in 6C.1 mean in the specific cases of
Newton's method (3) and the proximal point method (4).
Proof. The proof follows the fixed point argument in the proof of 6C.1, but with
some modifications that require attention. Choose γ as in (14) and let
(16) κ > reg ( f + F; x̄|0) and μ > lip (D f ; x̄) be such that γ > κ μ /2.
Further, choose a > 0 and b > 0 so that f + F is metrically regular at x̄ for 0 with
constant κ and neighborhoods IBa (x̄) and IBb (0). Make a > 0 smaller if necessary
so that
(18)    (5/2) μ a ≤ b,   κ μ a < 1,   (1/2) κ μ < γ (1 − κ μ a)   and   γ a < 1.
Here and in the following section we use an estimate for smooth functions obtained
by elementary calculus. From the standard equality

f (u) − f (v) = ∫₀¹ D f (v + t(u − v))(u − v) dt

and the Lipschitz continuity of D f near x̄ with constant μ, this yields

(19)    ‖ f (u) − f (v) − D f (v)(u − v)‖ ≤ (1/2) μ ‖u − v‖².
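Estimate (19) is easy to check numerically; in the sketch below (ours) we take f = sin, whose derivative cos is Lipschitz continuous with constant μ = 1.

```python
# Sketch (our numerical check of estimate (19)): for f = sin, Df = cos is
# Lipschitz with constant mu = 1, so
# |f(u) - f(v) - Df(v)(u - v)| <= (mu/2) |u - v|^2.
import math

mu = 1.0
ok = True
for i in range(200):
    u = -3.0 + 0.03 * i          # sample points in [-3, 3]
    v = 1.3 - 0.017 * i
    lhs = abs(math.sin(u) - math.sin(v) - math.cos(v) * (u - v))
    rhs = 0.5 * mu * (u - v) ** 2
    ok = ok and lhs <= rhs + 1e-15
print(ok)   # True: the quadratic bound (19) holds at every sampled pair
```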
Fix w ∈ IBa(x̄) and consider the function

g(w, x) := f (x) − f (w) − D f (w)(x − w).

Pick x0 ∈ IBa(x̄), x0 ≠ x̄, and consider the mapping Φ0(x) = ( f + F)⁻¹(g(x0, x)).
Noting that 0 ∈ ( f + F)(x̄) and using the metric regularity of f + F together with
(20), and also (18) and (19), we obtain
d(x̄, Φ0(x̄)) = d(x̄, ( f + F)⁻¹(g(x0, x̄))) ≤ κ d(g(x0, x̄), f (x̄) + F(x̄))
    ≤ κ ‖ f (x̄) − f (x0) − D f (x0)(x̄ − x0)‖
    ≤ (1/2) κ μ ‖x0 − x̄‖² < r0 (1 − κ μ a),

where r0 = γ ‖x0 − x̄‖² ≤ a. Moreover, for any u, v ∈ IB_{r0}(x̄),
Hence, by 5E.2 there exists x₁ ∈ Φ₀(x₁) ∩ IB_{r₀}(x̄), which translates to having x₁ obtained from x₀ as a first iterate of Newton's method (3) and satisfying (15) for k = 0. The induction step is completely analogous, giving us a sequence {x_k} which satisfies (15). Since γa < 1 as required in (18), this sequence is convergent.
then there exists a neighborhood O of x̄ such that for any x0 ∈ O there is a sequence
{xk } generated by the method starting at x0 which is linearly convergent to x̄.
(ii) If f + F is strongly metrically subregular at x̄ for 0 and

(22)   sup_k λ_k < 1 / (2 subreg(f + F; x̄|0)),
then there exists a neighborhood O of x̄ such that any sequence {xk } generated by
the method which is contained in O is linearly convergent to x̄.
(iii) If f + F is strongly metrically regular at x̄ for 0, then for the neighborhood O
in (i) and any x0 ∈ O the method generates a unique sequence {xk }. This sequence,
according to (i), is linearly convergent to x̄.
If the sequence of numbers λk in (4) is chosen such that limk→∞ λk = 0, the
convergence claimed in (i), (ii) and (iii) is superlinear.
Proof. With A_k(x, u) = λ_k(x − u) + f(x), in (6)–(8) we can take ε_k = μ_k = λ_k for all k = 0, 1, . . . . Then, with a particular choice of κ, from (21) or (22) we get (6) and also (7) and (8) for any neighborhood U of x̄. If λ_k → 0, then (12) holds, implying superlinear convergence.
Let us go back to Newton’s method in the general form (3) and apply it to the
nonlinear programming problem considered in sections 2A and 2G:
(23)   minimize g₀(x) over all x satisfying g_i(x) ≤ 0 for i ∈ [1, s] and g_i(x) = 0 for i ∈ [s + 1, m],
where we denote by g(x) the vector with components g1 (x), . . . , gm (x). Let x̄ be
a local minimum for (23) satisfying the constraint qualification, and let ȳ be an
associated Lagrange multiplier vector. As applied to the variational inequality (24),
Newton’s method (3) consists in generating a sequence {(xk , yk )} starting from a
point (x0 , y0 ), close enough to (x̄, ȳ), according to the iteration
(25)   ∇ₓL(x_k, y_k) + ∇²ₓₓL(x_k, y_k)(x_{k+1} − x_k) + ∇g(x_k)ᵀ(y_{k+1} − y_k) = 0,
        g(x_k) + ∇g(x_k)(x_{k+1} − x_k) ∈ N_{IR^s_+ × IR^{m−s}}(y_{k+1}).
Thus, in the circumstances of (23) under strong metric regularity of the mapping in
(24), Newton’s method (3) comes down to sequentially solving quadratic programs
of the form (26). This specific application of Newton’s method is therefore called
the sequential quadratic programming (SQP) method.
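As a much simplified illustration of the SQP iteration (our own toy example, with hypothetical data, not from the text), consider a problem of the form (23) with a single equality constraint; then each quadratic subproblem reduces to one linear KKT system, i.e., a Newton step on the Lagrangian optimality system.

```python
# SQP sketch for the toy problem (hypothetical data, our own illustration):
#   minimize g0(x) = x1^2 + x2^2  subject to  g1(x) = x1^2 + x2 - 1 = 0,
# whose solution is (x1, x2) = (sqrt(1/2), 1/2) with multiplier y = -1.

def solve3(M, b):
    # Gaussian elimination with partial pivoting for a 3x3 system
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x

def sqp(x1, x2, y, iters=20):
    for _ in range(iters):
        gL = [2 * x1 + 2 * y * x1, 2 * x2 + y]   # gradient of the Lagrangian
        g = x1 ** 2 + x2 - 1                     # constraint value
        H = [[2 + 2 * y, 0.0], [0.0, 2.0]]       # Hessian of the Lagrangian
        J = [2 * x1, 1.0]                        # constraint Jacobian
        KKT = [[H[0][0], H[0][1], J[0]],
               [H[1][0], H[1][1], J[1]],
               [J[0],    J[1],    0.0]]
        d = solve3(KKT, [-gL[0], -gL[1], -g])
        x1, x2, y = x1 + d[0], x2 + d[1], y + d[2]
    return x1, x2, y

x1, x2, y = sqp(0.8, 0.4, -0.5)
```

The primal iterates and the multiplier estimate converge together, reflecting the joint convergence of the sequence {(x_k, y_k)} discussed above.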
We summarize the conclusions obtained so far about the SQP method as an il-
lustration of the power of the theory developed in this section.
Example 6C.8 (quadratic convergence of SQP). Consider the nonlinear program-
ming problem (23) with the associated Karush–Kuhn–Tucker condition (24) and let
x̄ be a solution with an associated Lagrange multiplier vector ȳ. In the notation
I = { i ∈ [1, m] | g_i(x̄) = 0 } ⊃ {s + 1, . . . , m},
I₀ = { i ∈ [1, s] | g_i(x̄) = 0 and ȳ_i = 0 } ⊂ I
and
M⁺ = { w ∈ IRⁿ | w ⊥ ∇ₓg_i(x̄) for all i ∈ I \ I₀ },
M⁻ = { w ∈ IRⁿ | w ⊥ ∇ₓg_i(x̄) for all i ∈ I },
suppose that the following conditions are both fulfilled:
(a) the gradients ∇ₓg_i(x̄) for i ∈ I are linearly independent,
(b) ⟨w, ∇²ₓₓL(x̄, ȳ)w⟩ > 0 for every nonzero w ∈ M⁺ with ∇²ₓₓL(x̄, ȳ)w ⊥ M⁻.
Then there exists a neighborhood O of (x̄, ȳ) such that, for any starting point
(x0 , y0 ) ∈ O, the SQP method (26) generates a unique sequence which converges
quadratically to (x̄, ȳ).
There are various numerical issues related to implementation of the SQP method
that have been investigated in the last several decades, and various enhancements
are available as commercial software, but we shall not go into this further.
Lastly, we will discuss a bit more the proximal point method in the context of
monotone mappings. First, note that the iterative process (4) can be equally well
written as
It has been extensively studied under the additional assumption that X is a Hilbert
space (e.g., consider IRn under the Euclidean norm) and T is a maximal monotone
mapping from X to X. Monotonicity, which we considered in 2F only for functions from IRn to IRn, refers in the case of a potentially set-valued mapping T to the property of having

(28)   ⟨y′ − y, x′ − x⟩ ≥ 0 whenever (x, y) ∈ gph T and (x′, y′) ∈ gph T.

It is called maximal when no more points can be added to gph T without running into a violation of (28). (A localized monotonicity for set-valued mappings was introduced at the end of 3G, but again only in finite dimensions.)
The following fact about maximal monotone mappings, recalled here without its
proof, underlies much of the literature on the proximal point method in basic form
and indicates its fixed-point motivation.
f(x) + N_C(x) ∋ 0,
as an instance of the generalized equation (1), describes the points x (if any) which
minimize h over C. In comparison, in the iterations for this case of the proximal point method in the basic form (4), the point x_{k+1} determined from x_k is the unique minimizer of h(x) + (λ_k/2)‖x − x_k‖² over C.
Detail. This invokes the gradient monotonicity property associated with convexity in 2F.3(a) (which is equally valid in infinite dimensions), along with the optimality condition in 2A.6. The addition of the quadratic expression (λ_k/2)‖x − x_k‖² to h creates a function h_k which is strongly convex with constant λ_k and thus attains its minimum, moreover uniquely.
The expression (λ_k/2)‖x − x_k‖² in 6C.11 is called a proximal term because it helps to keep x near to the current point x_k. Its effect is to stabilize the procedure while inducing technically desirable properties like strong convexity in place of plain convexity. It's from this that the algorithm got its name.
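A minimal numerical sketch of this basic proximal point step (with a hypothetical one-dimensional h and C of our own choosing, not data from the text): for h(x) = (x − 3)² and C = [0, 4], each subproblem min h(x) + (λ_k/2)(x − x_k)² is a one-dimensional convex quadratic, so its minimizer over the interval C is the clipped unconstrained minimizer.

```python
# Proximal point iteration (4) on a hypothetical 1-D problem (our own
# illustration): minimize h(x) = (x - 3)^2 over C = [0, 4]; solution x̂ = 3.

def prox_step(x_k, lam):
    # unconstrained minimizer of (x - 3)^2 + (lam/2)(x - x_k)^2 ...
    x = (6.0 + lam * x_k) / (2.0 + lam)
    # ... projected onto the interval C = [0, 4]
    return min(max(x, 0.0), 4.0)

def proximal_point(x0, lams):
    xs = [x0]
    for lam in lams:
        xs.append(prox_step(xs[-1], lam))
    return xs

# lam_k -> 0 corresponds to the superlinear regime noted for the method
iterates = proximal_point(0.0, [1.0 / (k + 1) for k in range(30)])
```

Each step contracts the error by the factor λ_k/(2 + λ_k), so sending λ_k to 0 visibly accelerates the convergence.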
Instead of adding a quadratic term to h, the strategy in Example 6C.11 could be generalized to adding a term r_k(x − x_k) for some other convex function r_k having its minimum at the origin, and adjusting the algorithm accordingly.
Exercise 6C.12. Prove Theorem 6C.1 by using the result stated in Exercise 5E.5.
We will focus on the version of Newton’s method treated in the previous section,
the only difference being that now we utilize the partial derivative of the function f
with respect to x:
with a given starting point x0 . We will consider the method (3) more broadly by
reconceiving Newton’s iteration as an inclusion, the solution of which gives a whole
sequence instead of just an element in X. Let l∞(X) be the Banach space consisting of all infinite sequences ξ = {x₁, x₂, . . . , x_k, . . .} with elements x_k ∈ X, k = 1, 2, . . . , equipped with the supremum norm

‖ξ‖∞ = sup_{k≥1} ‖x_k‖.
whose value for a given (u, p) is the set of all sequences {x_k}_{k=1}^∞ generated by Newton's iteration (3) for p that start from u. If x̄ is a solution to (1) for p̄, then the constant sequence ξ̄ = {x̄, x̄, . . . , x̄, . . .} satisfies ξ̄ ∈ Ξ(x̄, p̄).
Our first result reveals uniform quadratic convergence under strong metric regu-
larity.
(5)   G(x) = f(p̄, x̄) + Dₓf(p̄, x̄)(x − x̄) + F(x), for which G(x̄) ∋ 0
in Theorem 5F.4. Moreover the convergence is quadratic with constant γ, that is,
The assumed strong regularity of the mapping G in (6) at x̄ for 0 and the choice of κ guarantee the existence of positive constants α and b such that the mapping y → σ(y) = G⁻¹(y) ∩ IB_α(x̄) is a Lipschitz continuous function on IB_b(0) with Lipschitz constant κ. Along with the mapping G consider the parameterized mapping

Note that G_{p,w}(x) = G(x) + r(p, w; x), where the function
Now, let κ′ be such that κ > κ′ > lip(σ; 0), and let χ > 0 satisfy

χκ′ < 1   and   κ′/(1 − χκ′) < κ.
Applying 5F.3, which is a special case of 5F.1, and taking into account that r(p̄, x̄; x̄) = 0, we find that, after shrinking the positive constants α and b if necessary, for p and w satisfying η(p, w) ≤ χ the mapping y → G⁻¹_{p,w}(y) ∩ IB_α(x̄) is a Lipschitz continuous function on IB_b(0) with Lipschitz constant κ. We denote this function by Θ(p, w; ·).
Since Dₓf is continuous, there are positive constants c and a such that η(p, w) ≤ χ as long as p ∈ IB_c(p̄) and w ∈ IB_a(x̄). Make a and c smaller if necessary so that a ≤ α and moreover

(9)   ‖Dₓf(p, x) − Dₓf(p, x′)‖ ≤ μ‖x − x′‖ for x, x′ ∈ IB_a(x̄) and p ∈ IB_c(p̄).
By Theorem 5F.4, we can further adjust a and c so that the truncation S(p) ∩ IBa (x̄)
of the solution mapping S in (2) is a function s which is Lipschitz continuous on
(12)   ‖s(p) − x̄‖ ≤ a/2 and ‖f(p, x̄) − f(p̄, x̄)‖ ≤ δ for p ∈ IB_c(p̄).
Summarizing to this point, we have determined constants a, b and c such that, for each p ∈ IB_c(p̄) and w ∈ IB_a(x̄), the function Θ(p, w; ·) is Lipschitz continuous on IB_b(0) with constant κ, and also, the conditions (9)–(12) are satisfied.
From 6C(19) applied now to the function f(p, x) we have through (9) that, for all u, v ∈ IB_a(x̄) and p ∈ IB_c(p̄),

(13)   ‖f(p, u) − f(p, v) − Dₓf(p, v)(u − v)‖ ≤ (μ/2)‖u − v‖².
Fix p ∈ IBc ( p̄) and w ∈ IBa (x̄), and consider the function
Recall that here s(p) = S(p) ∩ IBa/2 (x̄) for all p ∈ IBc ( p̄). For any x ∈ IBa (x̄), we have
using (9) and (13) that
Hence, remembering that p ∈ IBc ( p̄) and s(p) ∈ IBa (x̄), we see that both g(p, w; x)
and f ( p̄, x̄) − f (p, s(p)) − Dx f (p, s(p))(x̄ − s(p)) are in the domain of Θ (p, s(p); ·)
where this function is Lipschitz continuous with Lipschitz constant κ .
We now choose p ∈ IB_c(p̄) and u ∈ IB_a(x̄), and construct a sequence ξ(u, p) generated by Newton's iteration (3) starting from u for the value p of the parameter, whose existence, uniqueness and quadratic convergence are claimed in the statement of the theorem.
If u = s(p) there is nothing to prove, so assume u ≠ s(p). Our first step is to show
that, for the function g defined in (14), the mapping
plus (15), (16) and the Lipschitz continuity of Θ (p, s(p); ·) in IBb (0) with constant
κ , and then the second inequality in (12), (13) and the second inequality in (11), we
get
        ‖x̄ − Φ₀(x̄)‖
           = ‖Θ(p, s(p); −f(p̄, x̄) + f(p, s(p)) + Dₓf(p, s(p))(x̄ − s(p)))
              − Θ(p, s(p); g(p, u; x̄))‖
(17)       ≤ κ‖−f(p̄, x̄) + f(p, s(p)) + Dₓf(p, s(p))(x̄ − s(p))
              − [−f(p, u) − Dₓf(p, u)(x̄ − u) + f(p, s(p)) + Dₓf(p, s(p))(x̄ − s(p))]‖
           = κ‖−f(p̄, x̄) + f(p, u) + Dₓf(p, u)(x̄ − u)‖
           ≤ κ‖−f(p̄, x̄) + f(p, x̄)‖ + κ‖f(p, u) − f(p, x̄) − Dₓf(p, u)(u − x̄)‖
           ≤ κδ + (1/2)κμ‖u − x̄‖² ≤ κδ + (1/2)κμa² ≤ a(1 − κε).
Further, for any v, v′ ∈ IB_a(x̄), by (15), the Lipschitz continuity of Θ(p, s(p); ·), (9), and the second inequality in (10), we obtain

        ‖Φ₀(v) − Φ₀(v′)‖ = ‖Θ(p, s(p); g(p, u; v)) − Θ(p, s(p); g(p, u; v′))‖
(18)       ≤ κ‖g(p, u; v) − g(p, u; v′)‖ = κ‖(−Dₓf(p, u) + Dₓf(p, s(p)))(v − v′)‖
           ≤ κμ‖u − s(p)‖‖v − v′‖ ≤ (3/2)aκμ‖v − v′‖ ≤ κε‖v − v′‖.
Hence, by 1A.2, there is a fixed point x₁ ∈ Φ₀(x₁) ∩ IB_a(x̄). This translates to g(p, u; x₁) ∈ G_{p,s(p)}(x₁) or, equivalently,

This means that x₁ is obtained by Newton's iteration (3) from u for p, and there is no more than one such iterate in IB_a(x̄).
Now we will demonstrate that x₁ satisfies a tighter estimate. Let

ω₀ = γ‖u − s(p)‖².

Then ω₀ > 0 and, by the last inequality in (10), ω₀ ≤ γ(a + a/2)² ≤ a/2. We apply again the basic contraction mapping principle 1A.2 to the mapping Φ₀ but now on
IB_{ω₀}(s(p)). Noting that s(p) = Θ(p, s(p); 0) and using (8), (13) and (15), we have

Since IB_{ω₀}(s(p)) ⊂ IB_a(x̄), we immediately get from (18) that
Thus, the contraction mapping principle applied to the function Φ₀ on the ball IB_{ω₀}(s(p)) yields the existence of x₁′ in this ball such that x₁′ = Φ₀(x₁′). But the fixed point x₁′ of Φ₀ in IB_{ω₀}(s(p)) must then coincide with the unique fixed point x₁ of Φ₀ in the larger set IB_a(x̄). Hence the fixed point x₁ of Φ₀ on IB_a(x̄) satisfies
Then the single-valued localization ξ of the mapping Ξ in (4) around (x̄, p̄) for ξ̄
described in Theorem 6D.1 is Lipschitz continuous near (x̄, p̄), moreover with
Proof. First, recall some notation and facts established in Theorem 6D.1 and its proof. We know that for any κ > lip(σ; 0), there exist positive constants a, α, b and c such that a ≤ α and, for every p ∈ IB_c(p̄) and w ∈ IB_a(x̄), the mapping y → G⁻¹_{p,w}(y) ∩ IB_α(x̄) is a function, with values Θ(p, w; y), which is Lipschitz continuous on IB_b(0) with Lipschitz constant κ; moreover, the truncation S(p) ∩ IB_a(x̄) of the solution mapping in (2) is a Lipschitz continuous function on IB_c(p̄) and its values are in IB_{a/2}(x̄); also, for any starting point u ∈ IB_a(x̄) and any p ∈ IB_c(p̄), there is a unique sequence ξ(u, p) starting from u and generated by Newton's method (3) for p whose components are contained in IB_a(x̄), with this sequence being quadratically convergent to s(p) as described in (7).
Our starting observation is that, for any positive a′ ≤ a, by adjusting the size of the constant c and taking as a starting point u ∈ IB_{a′}(x̄), we can arrange that, for any p ∈ IB_c(p̄), all elements x_k of the sequence ξ(u, p) are actually in IB_{a′}(x̄). Indeed, by taking δ > 0 to satisfy (11) with a replaced by a′ and then choosing c so that (12) holds for the new δ and for a′, all requirements for a will hold for a′ as well, and hence all Newton's iterates x_k will be at distance at most a′ from x̄.
Choose

η > lip(Dₓf; (p̄, x̄))   and   ν > lip_p(f; (p̄, x̄)).
Pick a positive constant d ≤ a/2 and make c smaller if necessary, so that for every p, p′ ∈ IB_c(p̄) and every w, w′ ∈ IB_d(x̄), we have

and, in addition, for every x ∈ IB_d(x̄), every p, p′ ∈ IB_c(p̄) and every w, w′ ∈ IB_d(x̄), we have
Choose a positive τ such that τκ < 1/3. Make d and c smaller if necessary so that

(26)   3η(d + c) < τ.

Since

κτ/(1 − κτ) < 1/2,

we can take c still smaller in order to have

(27)   [κτ(2d) + 3κ(τ + ν)(2c)] / (1 − κτ) ≤ d.
Let p, p′ ∈ IB_c(p̄) and u, u′ ∈ IB_d(x̄) with (p, u) ≠ (p′, u′). In accordance with Theorem 6D.1 and the observation above, let ξ(u, p) = (x₁, . . . , x_k, . . .) be the unique sequence generated by Newton's iteration (3) starting from u whose components x_k are all in IB_d(x̄) and hence in IB_{a/2}(x̄). For this sequence, denoting x₀ = u, we know that for all k ≥ 0
(28)   x_{k+1} = Θ(p, x_k; 0) := (f(p, x_k) + Dₓf(p, x_k)(· − x_k) + F(·))⁻¹(0) ∩ IB_α(x̄).
Let

γ₀ = [κτ‖u − u′‖ + κ(τ + ν)‖p − p′‖] / (1 − κτ).
By using (27) we get that γ₀ ≤ d and then IB_{γ₀}(x₁) ⊂ IB_a(x̄). Consider the function
Employing (25) and then the Lipschitz continuity of Θ(p, u; ·) on IB_b(0), and applying (13), (23), (24), (26) and (28), we obtain

(29)
‖x₁ − Φ₀(x₁)‖ = ‖Θ(p, u; 0)
      − Θ(p, u; −f(p′, u′) − Dₓf(p′, u′)(x₁ − u′) + f(p, u) + Dₓf(p, u)(x₁ − u))‖
   ≤ κ‖f(p′, u) − f(p′, u′) − Dₓf(p′, u′)(u − u′)‖
      + κ‖(Dₓf(p, u) − Dₓf(p′, u′))(x₁ − u)‖ + κ‖−f(p′, u) + f(p, u)‖
   ≤ (1/2)κη‖u − u′‖² + κη‖u − u′‖‖x₁ − u‖
      + κη‖p − p′‖‖x₁ − u‖ + κν‖p − p′‖
   ≤ 3κηd‖u − u′‖ + κ(2ηd + ν)‖p − p′‖
   ≤ κτ‖u − u′‖ + κ(τ + ν)‖p − p′‖ = γ₀(1 − κτ).
For v, v′ ∈ IB_{γ₀}(x₁), we have by way of (23), (24) and (26) that
Hence, by the contraction mapping principle 1A.2, there is a unique x₁′ in IB_{γ₀}(x₁) such that x₁′ = Φ₀(x₁′). But then

f(p′, u′) + Dₓf(p′, u′)(x₁′ − u′) + F(x₁′) ∋ 0,

that is, x₁′ is the unique Newton's iterate from u′ for p′ which satisfies

‖x₁′ − x₁‖ ≤ γ₀.

Since γ₀ ≤ d, we obtain that x₁′ ∈ IB_a(x̄) and then x₁′ is the unique Newton's iterate from u′ for p′ which is in IB_a(x̄).
(32)   γ_k ≤ [κτ/(1 − κτ)]‖u − u′‖ + [κ(τ + ν)/(1 − 2κτ)]‖p − p′‖.

In particular, we obtain through (27) that γ_k ≤ d for all k, and consequently x_k′ ∈ IB_d(x_k) ⊂ IB_a(x̄).
To show that x_{k+1}′ is a Newton's iterate from x_k′ for p′, we proceed in the same way as in obtaining x₁′ from u′ for p′. Consider the function

Φ_k : x → Θ(p, x_k; −f(p′, x_k′) − Dₓf(p′, x_k′)(x − x_k′) + f(p, x_k) + Dₓf(p, x_k)(x − x_k)).
and

‖Φ_k(v) − Φ_k(v′)‖ ≤ κτ‖v − v′‖ for any v, v′ ∈ IB_{γ_k}(x_{k+1}).

Then, by the contraction mapping principle 1A.2 there is a unique x_{k+1}′ in IB_{γ_k}(x_{k+1}) with x_{k+1}′ = Φ_k(x_{k+1}′), which gives us

‖ξ(u, p) − ξ(u′, p′)‖∞ ≤ O(τ)‖u − u′‖ + (κν + O(τ))‖p − p′‖.
As in the case of the classical implicit function theorem, the inverse function
version of Theorem 6D.2 turns into an “if and only if” result.
Consider the generalized equation (1) with f(p, x) = g(x) − p, whose solution mapping is S = (g + F)⁻¹, and let x̄ ∈ S(0). In order to apply 6D.2 suppose that g is differentiable near x̄ with lip(Dg; x̄) < ∞. The corresponding Newton's iteration mapping in (3) then has the form
(33)   ϒ : (u, p) → { ξ ∈ l∞(X) | g(x_k) + Dg(x_k)(x_{k+1} − x_k) + F(x_{k+1}) ∋ p for k = 0, 1, . . . , with x₀ = u }.
Theorem 6D.3 (inverse function theorem for Newton’s iteration). The mapping g +
F is strongly regular at x̄ for 0 if and only if the mapping ϒ in (33) has a Lipschitz
continuous single-valued localization ξ around (x̄, 0) for ξ̄ with
and is such that, for each (u, p) close to (x̄, 0), the sequence ξ (u, p) is convergent.
Moreover, in this case
Proof. The “only if” part follows from the combination of 6D.1 and 6D.2. Noting
that the Lipschitz modulus of the single-valued localization σ in 6D.1 equals the
regularity modulus of g + F, from (22) we get
This means that the mapping g + F is metrically regular at x̄ for 0. We will demon-
strate that the mapping (g + F)−1 has a single-valued localization around 0 for
x̄. We know that dom(g + F)⁻¹ contains a neighborhood of 0. Assume that for any neighborhoods U of x̄ and Q of 0 there exist p ∈ Q and w, w′ ∈ U such that w ≠ w′ and both w and w′ are in (g + F)⁻¹(p). Then the constant sequences {w, w, . . . , w, . . .} ∈ ϒ(w, p) and {w′, w′, . . . , w′, . . .} ∈ ϒ(w′, p) and all their components are in U, hence {w, w, . . . , w, . . .} = ξ(w, p) and {w′, w′, . . . , w′, . . .} = ξ(w′, p).
In the beginning of the proof we have chosen the neighborhoods U and Q such that for a fixed p ∈ Q the mapping u → ξ(u, p) is a Lipschitz continuous function from X to l∞(X) with Lipschitz constant ε < 1, and hence this condition holds for all of its components. This yields

‖w − w′‖ ≤ ε‖w − w′‖ < ‖w − w′‖,
which is absurd. Hence, (g + F)−1 has a single-valued localization s around 0 for x̄.
But then from (39) this localization is Lipschitz continuous around 0 with lip(s; 0) ≤ κ. The Banach space versions of Theorems 2B.10 and 3G.1 say that lip(σ; 0) = lip(s; 0) = reg(g + F; x̄|0) and hence lip(σ; 0) ≤ κ. Since κ could be arbitrarily close to lip_p(ξ; (x̄, 0)), we get the inequality opposite to (36), and hence the equality (35) holds.
As an illustration of possible applications of the results in Theorems 6D.1 and
6D.2 in studying complexity of Newton’s iteration, we will produce an estimate
for the number of iterations needed to achieve a particular accuracy of the method,
which is the same for all values of the parameter p in some neighborhood of the
reference point p̄. Given an accuracy measure ρ , suppose that Newton’s method (3)
is to be terminated at the k-th step if
Also suppose that the constant μ and the constants a and c are chosen to satisfy
(9). For p ∈ IBc ( p̄) consider the unique sequence {xk } generated by (3) for p, all
elements of which are in IBa (x̄). Since xk is a Newton’s iterate from xk−1 , we have
that
Let k_ρ be the first iteration at which (40) holds; then for k < k_ρ from (41) we obtain

(42)   ρ < (1/2)μ‖x_k − x_{k−1}‖².
Further, utilizing (21) we get

‖x_k − x_{k−1}‖ ≤ ‖x_k − s(p)‖ + ‖x_{k−1} − s(p)‖ ≤ θ^(2^k − 2)(1 + θ)(‖x₀ − x̄‖ + ‖s(p) − x̄‖),

and from the choice of x₀ and the first inequality in (12) we have

‖x_k − x_{k−1}‖ ≤ θ^(2^k − 2)(1 + θ) · (3a/2).
But then, taking into account (42), we obtain

ρ < (1/2)μ θ^(2^(k+1)) · [9a²(1 + θ)² / (4θ⁴)].
Therefore k_ρ satisfies

k_ρ ≤ log₂ ( log_θ [ 8θ⁴ρ / (9a²μ(1 + θ)²) ] ) − 1.
but also necessary for the existence of a single-valued localization of the mapping
Ξ in (4) whose values are convergent, as in the statement of 6D.1.
The topic of this section is likewise a traditional scheme in numerical analysis and
its properties of convergence, again placed in a broader setting than the classical
one. The problem at which this scheme will be directed is quadratic optimization in
a Hilbert space setting:
(1)   minimize (1/2)⟨x, Ax⟩ − ⟨v, x⟩ over x ∈ C,

where C is a nonempty, closed and convex set in a Hilbert space X, and v ∈ X is a parameter.
Here ⟨·, ·⟩ denotes the inner product in X; the associated norm is ‖x‖ = ⟨x, x⟩^(1/2). We take A : X → X to be a linear and bounded mapping, entailing dom A = X; furthermore, we take A to be self-adjoint, ⟨x, Ay⟩ = ⟨y, Ax⟩ for all x, y ∈ X, and require that

(2)   ⟨x, Ax⟩ ≥ μ‖x‖² for all x ∈ C − C, where μ > 0.
This property of A, sometimes called coercivity (a term which can have conflict-
ing manifestations), corresponds to A being strongly monotone relative to C in the
sense defined in 2F, as well as to the quadratic function in (1) being strongly convex
relative to C. For X = IRn , (2) is equivalent to positive definiteness of A relative to
the subspace generated by C −C. For any Hilbert space X in which that subspace is
dense, it entails A being invertible with ‖A⁻¹‖ ≤ μ⁻¹.
In the usual framework for Galerkin's method, C would be all of X, so the targeted problem would be unconstrained. The idea is to consider an increasing sequence of finite-dimensional subspaces X_k of X and, by minimizing over each X_k to get a solution point x̂_k, generate a sequence which, in the limit, solves the problem over X.
This approach has proven valuable in circumstances where X is a standard func-
tion space and the special functions making up the subspaces Xk are familiar tools
of approximation, such as trigonometric expansions. Here, we will work more gen-
erally with convex sets Ck furnishing “inner approximations” to C, with the eventual
possibility of taking Ck = C ∩ Xk for a subspace Xk .
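The following sketch mimics the scheme in a finite-dimensional stand-in for X (our own toy data, not from the text): A is diagonal and positive definite, C = X, and the subspace X_k is spanned by the first k coordinate vectors, so each subproblem can be solved componentwise.

```python
# Galerkin sketch (hypothetical data, our own illustration): minimize
# (1/2)<x, Ax> - <v, x> over X_k = span{e_1, ..., e_k} with A diagonal.
# The minimizer over X_k zeroes the gradient on the first k coordinates:
# x_i = v_i / a_i for i < k and x_i = 0 otherwise.

def galerkin_solution(a, v, k):
    return [vi / ai if i < k else 0.0 for i, (ai, vi) in enumerate(zip(a, v))]

a = [float(i + 1) for i in range(8)]   # diagonal of A, all positive
v = [1.0] * 8

full = galerkin_solution(a, v, 8)      # solution of the full problem

def dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

# distances from the Galerkin solutions to the full solution shrink to 0
errors = [dist(galerkin_solution(a, v, k), full) for k in range(1, 9)]
```

As the subspaces grow, the error decreases monotonically and vanishes once X_k fills the whole space, matching the limiting behavior the method is designed for.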
In Section 2G with X = IRn, we looked at a problem like (1) in which the function was not necessarily quadratic, and we studied the dependence of its solution on the parameter v. Before proceeding with anything else, we must update, to our Hilbert space context with a quadratic function, the particular facts from that development which will be called upon.
Theorem 6E.1 (optimality and its characterization). For problem (1) under condi-
tion (2), there exists for each v a unique solution x. The solution mapping S : v → x
is thus single-valued with dom S = X. Moreover, this mapping S is Lipschitz continuous with constant μ⁻¹, and it is characterized by a variational inequality:
Proof. The existence of a solution x for a fixed v comes from the fact that, for each
sufficiently large α ∈ IR the set Cα of x ∈ C for which the function being minimized
in (1) has value ≤ α is nonempty, convex, closed and bounded, with the bound
coming from (2). Such a subset of X is weakly compact; the intersection of the
Cα that are nonempty is therefore nonempty. That intersection comprises all possible solutions x. The uniqueness of such x follows, however, from the strong convexity of the function in question. The characterization of x in (3) is proved
exactly as in the case of X = IRn in 2A.6. The Lipschitz property of S comes out of
the same argument that was used in the second half of the proof of 2F.9, utilizing
the strong monotonicity of A.
As an important consequence of Theorem 6E.1, we get a Hilbert space version
of the projection result in 1D.5 for convex sets in IRn .
Corollary 6E.2 (projections onto convex sets). For a nonempty, closed, convex set
C in a Hilbert space X , there exists for each v ∈ X a unique nearest point x of
C, called the projection of v on C and denoted by PC (v). The projection mapping
PC : X → C is Lipschitz continuous with constant 1.
Proof. Take A = I in (1), noting that (2) holds then with μ = 1. Problem (1) is equivalent then to minimizing ‖x − v‖ over x ∈ C, because the expression being minimized differs from (1/2)‖x − v‖² only by the constant term (1/2)‖v‖².
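For a simple concrete case (our own illustration, not from the text): when C is a box in the plane, P_C is coordinatewise clipping, and a random search exhibits the Lipschitz constant 1 asserted in Corollary 6E.2.

```python
# Projection onto a box in the plane, a simple instance of Corollary 6E.2
# (hypothetical data, our own illustration): clipping each coordinate
# realizes P_C, and no pair of points yields a Lipschitz ratio above 1.

import math
import random

def proj_box(v, lo, hi):
    return tuple(min(max(vi, l), h) for vi, l, h in zip(v, lo, hi))

random.seed(0)
lo, hi = (0.0, 0.0), (1.0, 2.0)
max_ratio = 0.0
for _ in range(1000):
    u = (random.uniform(-3, 3), random.uniform(-3, 3))
    v = (random.uniform(-3, 3), random.uniform(-3, 3))
    num = math.dist(proj_box(u, lo, hi), proj_box(v, lo, hi))
    den = math.dist(u, v)
    if den > 0:
        max_ratio = max(max_ratio, num / den)
```

The observed maximal ratio never exceeds 1, in line with the nonexpansiveness of P_C.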
In Galerkin’s method, when we get to it, there will be need of comparing solu-
tions to (1) with solutions to other problems for the same v but sets different from
C. In effect, we have to be able to handle the choice of C as another sort of param-
eter. For a start, consider just two different sets, D1 and D2 . How might solutions
to the versions of (1) with D1 and D2 in place of C, but with fixed v, relate to each
other? To get anywhere with this we require a joint strong monotonicity condition
extending (2):
(4)   ⟨x, Ax⟩ ≥ μ‖x‖² for all x ∈ D_i − D_j and i, j ∈ {1, 2}, i ≠ j, where μ > 0.
Obviously (4) holds without any fuss over different sets if we simply have A strongly
monotone with constant μ on all of X.
Proposition 6E.3 (solution estimation for varying sets). Consider any nonempty,
closed, convex sets D1 and D2 in X satisfying (4). If x1 and x2 are the solutions of
problem (1) with constraint sets D1 and D2 , respectively, in place of C, then
Adding the inequalities in (7) to the one in (6) and rearranging the sum, we obtain
as claimed in (5).
Having this background at our disposal, we are ready to make progress with our
generalized version of Galerkin’s method. We consider along with C a sequence of
sets Ck ⊂ X for k = 1, 2, . . . which, like C, are nonempty, closed and convex. We
suppose that
and let
as provided by Theorem 6E.1 through the observation that (2) carries over to any
subset of C. By the generalized Galerkin sequence associated with (8) for a given v, we will mean the sequence of solutions x̂_k = S_k(v), k = 1, 2, . . . .
This quadratic inequality in d_k = ‖x̂_k − x̂‖ implies that the sequence {d_k} is bounded, say by b. Putting this b in place of ‖x̂_k − x̂‖ on the right side of (11), we get a bound of the form in (10).
Is the square root describing the rate of convergence through the estimate in (10)
exact? The following example shows that this is indeed the case, and no improve-
ment is possible, in general.
Example 6E.5 (counterexample to improving the general estimate). Consider problem (1) in the case of X = IR², C = { (x₁, x₂) | x₂ ≤ 0 } (lower half-plane), v = (0, 1) and A = I, so that the issue revolves around projecting v on C and the solution is x̂ = (0, 0). For each k = 1, 2, . . . let a_k = (1/k, 0) and let C_k consist of the points x ∈ C such that ⟨x − a_k, v − a_k⟩ ≤ 0. Then the projection x̂_k of v on C_k is a_k, and

|x̂_k − x̂| = 1/k,   d(x̂, C_k) = 1 / (k√(1 + k²)).

In this case the ratio |x̂_k − x̂| / d(x̂, C_k)^p is unbounded in k for any p > 1/2.
Detail. The fact that the projection of v on C_k is a_k comes from the observation that v − a_k ∈ N_{C_k}(a_k). A similar observation confirms that the specified x̄_k is the projection of x̂ on C_k. The ratio |x̂_k − x̂| / d(x̂, C_k)^p can be calculated as k^(2p−1)(1 + 1/k²)^(p/2), and from that the conclusion is clear that it is bounded with respect to k if and only if 2 − (1/p) ≤ 0, or in other words, p ≤ 1/2.
[Figure: the half-plane C with v = (0, 1), the solution x̂ = (0, 0), the set C_k, and the projection x̂_k.]
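A quick numerical check of the quantities in Example 6E.5 (simply evaluating the displayed formulas): already for p = 1 the ratio grows like k, confirming that no exponent p > 1/2 can work in general.

```python
# Evaluate |x̂_k - x̂| = 1/k and d(x̂, C_k) = 1/(k*sqrt(1 + k^2)) from
# Example 6E.5 and watch the ratio |x̂_k - x̂| / d(x̂, C_k)**p blow up for
# p = 1 (any p > 1/2 behaves the same way, growing like k**(2p - 1)).

def ratio(k, p):
    err = 1.0 / k
    dist = 1.0 / (k * (1.0 + k * k) ** 0.5)
    return err / dist ** p

ratios = [ratio(k, 1.0) for k in (1, 10, 100, 1000)]
```

For p = 1 the ratio equals √(1 + k²), which is unbounded in k, exactly as the example asserts.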
There is, nevertheless, an important case in which the exponent 1/2 in (10) can
be replaced by 1. This case is featured in the following result:
Theorem 6E.6 (improved rate of convergence for subspaces). Under the conditions
of Theorem 6E.4, if the sets C and Ck are subspaces of X , then there is a constant c
such that
Proof. In this situation the variational inequality in (3) reduces to the requirement that Ax̂ − v ⊥ C. We then have Ax̂ − v ∈ C^⊥ ⊂ C_k^⊥ and Ax̂_k − v ∈ C_k^⊥, so that A(x̂_k − x̂) ∈ C_k^⊥. Consider now an arbitrary x ∈ C_k, noting that since x̂_k ∈ C_k we also have x̂_k − x ∈ C_k. We calculate from (2) that
For an example which illustrates how the theory of solution mappings can be applied
in infinite dimensions with an eye toward numerical approximations, we turn to a
basic problem in optimal control, the so-called linear-quadratic regulator problem.
That problem takes the form:
(1)   minimize ∫₀¹ { (1/2)[x(t)ᵀQx(t) + u(t)ᵀRu(t)] + s(t)ᵀx(t) − r(t)ᵀu(t) } dt
subject to
(2) ẋ(t) = Ax(t) + Bu(t) + p(t) for a.e. t ∈ [0, 1], x(0) = a,
This concerns the control system governed by (2) in which x(t) ∈ IRn is the state at
time t and u(t) is the control exercised at time t. The choice of the control function
u : [0, 1] → IRm yields from the initial state a ∈ IRn and the dynamical equation in (2)
a corresponding state trajectory x : [0, 1] → IRn with derivative ẋ. The matrices A, B,
Q and R have dimensions fitting these circumstances, with Q and R symmetric and
positive semidefinite so as to ensure (as will be seen below) that the function being
minimized in (1) is convex.
The set U ⊂ IRm from which the values of the control have to be selected in (3) is nonempty, convex and compact³. We also assume that the matrix R is positive definite relative to U; in other words, there exists μ > 0 such that

(4)   uᵀRu ≥ μ|u|² for all u ∈ U − U.
We follow that Hilbert space pattern throughout, assuming that the function r in (1)
belongs to L2 (IRm , [0, 1]) while p and s belong to L2 (IRn , [0, 1]). This is a convenient
compromise which will put us in the framework of quadratic optimization in 6E.
There are two ways of looking at problem (1). We can think of it in terms of min-
imizing over function pairs (u, x) constrained by both (2) and (3), or we can regard
x as a “dependent variable” produced from u through (2) and standard facts about
differential equations, so as to think of the minimization revolving only around the
choice of u. For any u satisfying (3) (and therefore essentially bounded), there is
a unique state trajectory x specified by (2) in the sense of x being an absolutely
continuous function of t and therefore differentiable a.e. Due to the assumption
that p ∈ L2 (IRn , [0, 1]), the derivative ẋ can then be interpreted as an element of
L2 (IRn , [0, 1]) as well. Indeed, x is given by the Cauchy formula
(5)   x(t) = e^{At} a + ∫₀ᵗ e^{A(t−τ)} (Bu(τ) + p(τ)) dτ for all t ∈ [0, 1].
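In the scalar case n = m = 1 with constant control, the Cauchy formula can be checked directly against a crude Euler discretization of the differential equation; the numbers below are hypothetical, chosen only for illustration.

```python
# Scalar check of the Cauchy formula (hypothetical data, our own
# illustration): for xdot = a*x + b*u + p with constant u and p,
#   x(t) = exp(a t) x0 + (b*u + p) * (exp(a t) - 1) / a,
# since the integral of exp(a(t - s)) over [0, t] is (exp(a t) - 1)/a.

import math

a_c, b_c, u_c, p_c, x0 = -1.0, 2.0, 0.5, 0.3, 1.0
t_end = 1.0

forcing = b_c * u_c + p_c
x_formula = math.exp(a_c * t_end) * x0 + forcing * (math.exp(a_c * t_end) - 1.0) / a_c

# Euler discretization of the same equation
n = 20000
h = t_end / n
x = x0
for _ in range(n):
    x += h * (a_c * x + b_c * u_c + p_c)
```

The two values agree to within the Euler discretization error, as the formula predicts.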
In particular, we can view it as belonging to the Banach space C(IRn , [0, 1]) of con-
tinuous functions from [0, 1] to IRn equipped with the norm
3 We do not really need U to be bounded, but this assumption simplifies the analysis.
The relation between u and x can be cast in a frame of inputs and outputs. Define
the mapping T : L2 (IRn , [0, 1]) → L2 (IRn , [0, 1]) as
(6)   (Tw)(t) = ∫₀ᵗ e^{A(t−τ)} w(τ) dτ for a.e. t ∈ [0, 1],
and, on the other hand, let W : L2 (IRn , [0, 1]) → L2 (IRn , [0, 1]) be the mapping defined
by
Finally, with a slight abuse of notation, denote by B the mapping from L2 (IRm , [0, 1])
to L2 (IRn , [0, 1]) associated with the matrix B, that is (Bu)(t) = Bu(t); later we do
the same for the mappings Q and R. Then the formula for x in (5) comes out as
where u is the input, x is the output, and p is a parameter. Note that in this case we
are treating x as an element of L2 (IRn , [0, 1]) instead of C(IRn , [0, 1]). This makes no
real difference but will aid in the analysis.
Exercise 6F.1 (adjoint in the Cauchy formula). Prove that the mapping T defined by (6) is linear and bounded. Also show that the adjoint (dual) mapping T*, satisfying ⟨x, Tu⟩ = ⟨T*x, u⟩, is given by

(T*x)(t) = ∫ₜ¹ e^{Aᵀ(τ−t)} x(τ) dτ for a.e. t ∈ [0, 1].
Also show (TB)* = B*T*, where B* is the mapping from L2(IRn, [0, 1]) to L2(IRm, [0, 1]) associated with the transposed matrix Bᵀ; that is,

((TB)*x)(t) = Bᵀ ∫ₜ¹ e^{Aᵀ(τ−t)} x(τ) dτ for a.e. t ∈ [0, 1].
subject to
In (9) we have dropped the constant terms that do not affect the solution. Noting
that z = (T B)(u) and utilizing the adjoint T ∗ of the mapping T , let
(12)   A = B*T*QTB + R.
Here, as for the mapping B, we regard Q and R as linear bounded mappings acting between L2 spaces: (Ru)(t) = Ru(t), and so forth. Let

(13)   C = { u ∈ L2 | u(t) ∈ U for a.e. t ∈ [0, 1] }.
With this notation, problem (9)–(10) can be written in the form treated in 6E:
$$(14)\qquad \text{minimize}\quad \tfrac{1}{2}\langle u, A u\rangle + \langle V(y), u\rangle \quad \text{subject to } u \in C.$$
Exercise 6F.2 (coercivity in control). Prove that the set C in (13) is a closed and
convex subset of L2 and that the mapping A ∈ L(L2, L2) in (12) satisfies the condition
$$\langle u, A u\rangle \ge \mu\,\|u\|_2^2 \quad \text{for all } u \in C - C,$$
where μ is the constant in (4).
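The coercivity claimed in 6F.2 can be made concrete in a discretized scalar model. In the sketch below all data (a, b, q, r, the grid size) are made-up illustrative choices; since ⟨u, Au⟩ = q‖TBu‖²₂ + r‖u‖²₂ ≥ r‖u‖²₂ whenever q ≥ 0 and r > 0, condition (4) holds with μ = r, and the code checks this on random discretized controls.

```python
import math
import random

# Discrete scalar sketch (n = m = 1, made-up data) of the coercivity in 6F.2:
# with Q = q >= 0 and R = r > 0,
#   <u, Au> = q ||TBu||_2^2 + r ||u||_2^2  >=  r ||u||_2^2,
# so condition (4) holds with mu = r.
N = 100
h = 1.0 / N
a_coef, b, q, r = -0.4, 2.0, 1.5, 0.3
t = [i * h for i in range(N)]

def T_apply(w):
    # Riemann-sum discretization of (Tw)(t_i)
    return [sum(math.exp(a_coef * (t[i] - t[j])) * w[j] * h for j in range(i))
            for i in range(N)]

def sq_norm(f):
    # discrete L2 norm squared on [0, 1]
    return sum(fi * fi * h for fi in f)

random.seed(0)
ok = True
for _ in range(20):
    u = [random.uniform(-1.0, 1.0) for _ in range(N)]
    z = T_apply([b * ui for ui in u])              # z = TBu
    quad_form = q * sq_norm(z) + r * sq_norm(u)    # <u, Au>
    ok = ok and quad_form >= r * sq_norm(u)
print(ok)   # prints True: the q-term is a sum of squares, hence nonnegative
```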
Applying Theorem 6E.1 in the presence of 6F.2, we obtain a necessary and suf-
ficient condition for the optimality of u in problem (14), namely the variational
inequality
For (15), or equivalently for (14) or (1)–(3), we then arrive at the following result of
implicit-function type:
Theorem 6F.3 (implicit function theorem for optimal control in L2 ). Under (4), the
solution mapping S which goes from parameter elements y = (p, s, r) to pairs (u, x)
solving (1)–(3) is single-valued and globally Lipschitz continuous from the space
L2 (IRn × IRn × IRm , [0, 1]) to the space L2 (IRm , [0, 1]) × C(IRn , [0, 1]).
Proof. Because V in (11) is an affine function of y = (p, s, r), we obtain from 6E.1
that for each y problem (14) has a unique solution u(y) and, moreover, the function
y → u(y) is globally Lipschitz continuous in the respective norms. The value u(y)
is the unique optimal control in problem (1)–(3) for y. Taking norms in the Cauchy
formula (5), we see further that for any y = (p, s, r) and y′ = (p′, s′, r′), if x and x′ are the corresponding solutions of (2) for u(y), p and u(y′), p′, then, for some constants c1 and c2, we get
$$|x(t) - x'(t)| \le c_1 \int_0^t \big(|B|\,|u(y)(\tau) - u(y')(\tau)| + |p(\tau) - p'(\tau)|\big)\,d\tau \le c_2 \big(\|u(y) - u(y')\|_2 + \|p - p'\|_2\big).$$
Taking the supremum on the left and having in mind that y → u(y) is Lipschitz
continuous, we obtain that the optimal trajectory mapping y → x(y) is Lipschitz
continuous from the L2 space of y to C(IRn , [0, 1]). Putting these facts together, we
confirm the claim in the theorem.
The optimal control u whose existence and uniqueness for a given y is asserted in
6F.3 is actually, as an element of L2 , an equivalence class of functions differing from
each other only on sets of measure zero in [0, 1]. Thus, having specified an optimal
control function u, we may change its values u(t) on a t-set of measure zero without
altering the value of the expression being minimized or affecting optimality. We will
go on to show now that one can pick a particular function from the equivalence class
which has better continuity properties with respect to both time and the parameter
dependence.
For a given control u and parameter y = (p, s, r), let
ψ = T ∗ (Qx + s),
where x solves (8). Then, through the Cauchy formula and 6F.1, ψ is given by
$$\psi(t) = \int_t^1 e^{A^{\mathsf T}(\tau - t)}\,\big(Qx(\tau) + s(\tau)\big)\,d\tau.$$
Differentiating this expression with respect to t, we see that ψ satisfies
$$(16)\qquad \dot\psi(t) = -A^{\mathsf T}\psi(t) - Qx(t) - s(t) \ \text{ for a.e. } t \in [0,1], \qquad \psi(1) = 0,$$
where x is the solution of (2) for the given u. The function ψ is called the adjoint or dual trajectory associated with a given control u and its corresponding state trajectory x, and (16) is called the adjoint equation. Bearing in mind the particular form
of V (y) in (11) and that, by definition,
where B∗ stands for the linear mapping associated with the transpose of the matrix
B. The boundary value problem combining (2) and (16), coupled with the variational
inequality (17), fully characterizes the solution to problem (1)–(3).
We need next a standard fact from Lebesgue integration. For a function ϕ on
[0, 1], a point t̂ ∈ (0, 1) is said to be a Lebesgue point of ϕ when
$$\lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \int_{\hat t - \varepsilon}^{\hat t + \varepsilon} \varphi(\tau)\,d\tau = \varphi(\hat t).$$
It is known that when ϕ is integrable on [0, 1], its set of Lebesgue points is of full
measure 1.
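The Lebesgue-point property can be illustrated numerically. The sketch below uses a made-up step function and approximates the symmetric averages by the midpoint rule: at a continuity point the averages converge to the function value, while at the jump they converge to the midpoint of the two one-sided values, so the jump is not a Lebesgue point.

```python
# Numerical illustration (made-up step function) of the Lebesgue point
# property: the symmetric averages (1/2e) * integral over [t-e, t+e]
# converge to phi(t) at every continuity point of phi.
def phi(t):
    return 1.0 if t < 0.5 else 3.0   # jump at t = 0.5

def avg(t_hat, eps, m=20000):
    # midpoint-rule approximation of the symmetric average around t_hat
    w = 2.0 * eps / m
    return sum(phi(t_hat - eps + (k + 0.5) * w) for k in range(m)) * w / (2.0 * eps)

# t = 0.3 is a continuity (hence Lebesgue) point: averages tend to phi(0.3) = 1
for eps in (0.1, 0.01, 0.001):
    print(round(avg(0.3, eps), 6))
# At the jump t = 0.5 the symmetric averages tend to the midpoint value 2,
# not phi(0.5) = 3, so t = 0.5 is not a Lebesgue point of this phi.
print(round(avg(0.5, 0.001), 6))
```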
Now, let u be the optimal control for a particular parameter value y, and let x and
ψ be the associated optimal trajectory and adjoint trajectory, respectively. Let tˆ be a
Lebesgue point of both u and r (the set of such tˆ is of full measure). Pick any w ∈ U,
and for 0 < ε < min{t̂, 1 − t̂} consider the function
$$\hat u_\varepsilon(t) = \begin{cases} w & \text{for } t \in (\hat t - \varepsilon, \hat t + \varepsilon),\\[2pt] u(t) & \text{otherwise.} \end{cases}$$
Then for every sufficiently small ε the function ûε is a feasible control, i.e., belongs
to the set C in (13), and from (17) we obtain
$$\int_{\hat t - \varepsilon}^{\hat t + \varepsilon} \big({-r(\tau)} + Ru(\tau) + B^{\mathsf T}\psi(\tau)\big)^{\mathsf T}\big(w - u(\tau)\big)\,d\tau \ge 0.$$
Since t̂ is a Lebesgue point of the function under the integral (we know that ψ is continuous, and hence its set of Lebesgue points is the entire interval [0, 1]), we can let ε go to zero and, taking into account that t̂ is an arbitrary point from a set of full measure in [0, 1] and that w can be any element of U, arrive at the following pointwise variational inequality, which is required to hold for a.e. t ∈ [0, 1]:
As is easily seen, (18) implies (17) as well, and hence these two variational inequalities are equivalent.
Summarizing, we can now say that a feasible control u is the solution of (1)–(3)
for a given y = (p, s, r) with corresponding optimal trajectory x and adjoint trajectory
ψ if and only if the triple (u, x, ψ ) solves the following boundary value problem
coupled with a pointwise variational inequality:
$$(19a)\qquad \dot x(t) = Ax(t) + Bu(t) + p(t), \quad x(0) = a, \qquad \dot\psi(t) = -A^{\mathsf T}\psi(t) - Qx(t) - s(t), \quad \psi(1) = 0,$$
$$(19b)\qquad r(t) \in Ru(t) + B^{\mathsf T}\psi(t) + N_U(u(t)) \ \text{ for a.e. } t \in [0,1].$$
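The optimality system also suggests a simple computational scheme, which the following sketch illustrates in the scalar case with made-up data (a, b, q, R, the initial state, and U = [−1, 1]); everything here except the structure of the system itself is a hypothetical choice. It alternates a forward Euler sweep for the state, a backward sweep for the adjoint, and the pointwise update u(t) = proj_U(−B^Tψ(t)/R) to which (19b) reduces when r = 0; for this data the iteration is a contraction and settles on a feasible control.

```python
import math

# Fixed-point sweep for a scalar instance of the optimality system (19)
# (hypothetical data): minimize (1/2) * integral of (q x^2 + R u^2)
# subject to x' = a x + b u, x(0) = x0, u(t) in U = [-1, 1].
N = 200
h = 1.0 / N
a_coef, b, q, R, x0 = -1.0, 1.0, 1.0, 1.0, 2.0

def proj(v):
    # projection onto U = [-1, 1]
    return max(-1.0, min(1.0, v))

u = [0.0] * N
diff = float("inf")
for _ in range(200):
    # forward Euler for the state equation x' = a x + b u, x(0) = x0
    x = [x0] * (N + 1)
    for i in range(N):
        x[i + 1] = x[i] + h * (a_coef * x[i] + b * u[i])
    # backward sweep for the adjoint: psi' = -a psi - q x, psi(1) = 0
    psi = [0.0] * (N + 1)
    for i in range(N - 1, -1, -1):
        psi[i] = psi[i + 1] + h * (a_coef * psi[i + 1] + q * x[i + 1])
    # pointwise projected update from (19b) with r = 0: u = proj_U(-b psi / R)
    u_new = [proj(-b * psi[i] / R) for i in range(N)]
    diff = max(abs(un - uo) for un, uo in zip(u_new, u))
    u = u_new
    if diff < 1e-10:
        break

print(diff < 1e-10, all(-1.0 <= ui <= 1.0 for ui in u))
```

Whether such a fixed-point iteration converges depends on the data; here the decay rate of the dynamics makes the composed map contractive.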
That is, for an optimal control u and associated optimal state and adjoint trajectories
x and ψ , there exists a set of full measure in [0, 1] such that (19b) holds for every t
in this set. Under an additional condition on the function r we obtain the following
result:
Theorem 6F.4 (implicit function theorem for continuous optimal controls). Let the
parameter y = (p, s, r) in (1)–(3) be such that the function r is Lipschitz continuous
on [0, 1]. Then, from the equivalence class of optimal control functions for this y,
there exists an optimal control u(y) for which (19b) holds for all t ∈ [0, 1] and which
is Lipschitz continuous with respect to t on [0, 1]. Moreover, the solution mapping
y → u(y) is Lipschitz continuous from the space L2 (IRn × IRn , [0, 1]) × C(IRm , [0, 1])
to the space C(IRm , [0, 1]).
Proof. It is clear that the adjoint trajectory ψ is Lipschitz continuous in t on [0, 1]
for any feasible control; indeed, it is the solution of the linear differential equation
(16), the right side of which is a function in L2 . Let x and ψ be the optimal state and
adjoint trajectories and let u be a function satisfying (19b) for all t ∈ σ, where σ is a set of full measure in [0, 1]. For t ∉ σ we define u(t) to be the unique solution of the following strongly monotone variational inequality in IRm:
Then this u is within the equivalence class of optimal controls, and, moreover, the
vector u(t) satisfies (20) for all t ∈ [0, 1]. Noting that q is a Lipschitz continuous
function in t on [0, 1], we get from 2F.10 that for each fixed t ∈ [0, 1] the solution
mapping of (20) is Lipschitz continuous with respect to q(t). Since the composition
of Lipschitz continuous functions is Lipschitz continuous, the particular optimal
control function u which satisfies (19b) for all t ∈ [0, 1] is Lipschitz continuous in t
on [0, 1].
We already know from 6F.3 that the optimal trajectory mapping y → x(y) is
Lipschitz continuous into C(IRn , [0, 1]). By the same argument, the associated ad-
joint mapping y → ψ (y) is Lipschitz continuous from L2 to C(IRn , [0, 1]). But then,
according to 2F.10 again, for every y, y′ with r, r′ Lipschitz continuous and every t ∈ [0, 1], with the optimal control values u(y)(t) and u(y′)(t) at t being the unique solutions of (20), we have
$$|u(y)(t) - u(y')(t)| \le \mu^{-1}\big(|r(t) - r'(t)| + |B|\,|\psi(y)(t) - \psi(y')(t)|\big).$$
This holds for every t ∈ [0, 1], so by invoking the maximum norm we get the desired
result.
We focus next on the issue of solving problem (1)–(3) numerically. By this we
mean determining the optimal control function u. This is a matter of recovering
a function on [0, 1] which is only specified implicitly, in this case by a variational problem. Aside from very special cases, it means producing numerically an acceptable approximation of the desired function u. For simplicity, let us assume that
y = (p, s, r) = 0.
There are various numerical techniques for solving problems of this form; here we
shall not discuss this issue.
We will now derive an estimate for the error in approximating the solution of
problem (1)–(3) by use of discretization (21) of the optimality system (19ab).
We suppose that for each given N we can solve (21) exactly, obtaining vectors u^N_i ∈ U, i = 0, . . . , N − 1, and x^N_i ∈ IRn, ψ^N_i ∈ IRn, i = 0, . . . , N. For a given N, the
solution (u^N, x^N, ψ^N) of (21) is identified with a function on [0, 1], where x^N and ψ^N are the piecewise linear and continuous interpolations across the grid {t_i} over [0, 1] of (a, x^N_1, . . . , x^N_N) and (ψ^N_0, ψ^N_1, . . . , ψ^N_{N−1}, 0), respectively, and u^N is the piecewise constant interpolation of (u^N_0, u^N_1, . . . , u^N_{N−1}), which is continuous from the right across the grid points t_i = ih, i = 0, 1, . . . , N − 1, and from the left at t_N = 1. The functions x^N and ψ^N are piecewise differentiable and their derivatives ẋ^N and ψ̇^N are piecewise constant functions which are assumed to have the same continuity properties in t as the control u^N. Thus, (u^N, x^N, ψ^N) is a function defined on the whole interval [0, 1], and it belongs to L2.
Theorem 6F.5 (error estimate for discrete approximation). Consider problem (1)–
(3) with r = 0, s = 0 and p = 0 under condition (4) and let, according to 6F.4,
(u, x, ψ ) be the solution of the equivalent optimality system (19ab) for all t ∈ [0, 1],
with u Lipschitz continuous in t on [0, 1]. Consider also the discretization (21) and, for N = 1, 2, . . . and mesh size h = 1/N, denote by (u^N, x^N, ψ^N) its solution extended by interpolation to the interval [0, 1] in the manner described above. Then
the following estimate holds:
Observe that system (23) has the same form as (19ab) with a particular choice of the parameters. Specifically, (u^N, x^N, ψ^N) is the solution of (19ab) for the parameter value y^N := (p^N, s^N, r^N), while (u, x, ψ) is the solution of (19ab) for y = (p, s, r) = (0, 0, 0). Then, by the implicit function theorem 6F.4, the solution mapping of (19)
is Lipschitz continuous in the maximum norms, so there exists a constant c such that
For that purpose we employ the following standard result in the theory of difference equations, which we state here without proof:
$$0 \le \alpha_N \le a \quad\text{and}\quad 0 \le \alpha_i \le a + b \sum_{j=i+1}^{N} \alpha_j \ \text{ for } i = 0, \dots, N-1,$$
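The backward estimate that such a discrete Gronwall lemma yields can be checked numerically. The conclusion is not reproduced above, so the standard bound α_i ≤ a(1 + b)^{N−i} is taken here as an assumption; the sketch builds random sequences satisfying the hypotheses and verifies the bound at every index.

```python
import random

# Check of the standard backward discrete Gronwall bound (assumed here, since
# the lemma's conclusion is not reproduced in the text): if 0 <= alpha_N <= a
# and 0 <= alpha_i <= a + b * sum(alpha_{i+1}..alpha_N), then
# alpha_i <= a * (1 + b)**(N - i).
random.seed(1)
N, a, b = 50, 0.3, 0.1
ok = True
for _ in range(100):
    # build a random sequence satisfying the hypotheses, back to front
    alpha = [0.0] * (N + 1)
    alpha[N] = random.uniform(0.0, a)
    for i in range(N - 1, -1, -1):
        bound = a + b * sum(alpha[i + 1:])
        alpha[i] = random.uniform(0.0, bound)
    # verify the Gronwall-type estimate at every index
    ok = ok and all(alpha[i] <= a * (1.0 + b) ** (N - i) + 1e-12
                    for i in range(N + 1))
print(ok)
```

The extremal sequence, obtained by taking equality in the recursion, attains α_i = a(1 + b)^{N−i} exactly, which shows the bound is sharp.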
Then, since all u^N_i are from the compact set U, from the first equation in (21) we get
with some constants c1 , c2 independent of N. On the other hand, the first equation
in (21) can be written equivalently as
$$x^N(t_{i+1}) = a + \sum_{j=0}^{i} h\big(Ax^N(t_j) + Bu^N(t_j)\big),$$
and then, by taking norms and applying the direct part of the discrete Gronwall lemma 6F.6, we obtain that sup_{0 ≤ i ≤ N} |x^N(t_i)| is bounded by a constant which does not depend on N. This gives an error of order O(h) for p^N in the maximum norm. By
repeating this argument for the discrete adjoint equation (the second equation in
(21)), but now applying the backward part of 6F.7, we get the same order of magni-
tude for sN and rN . This proves (25) and hence also (22).
Note that the order of the discretization error is O(h), which is sharp for the Euler
scheme. Using higher-order schemes may improve the order of approximation, but
this may require better continuity properties of the optimal control.
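The sharpness of the O(h) order for the Euler scheme can be seen already on the simplest model equation: halving h roughly halves the endpoint error. The example below is a made-up test problem, unrelated to the control system, used only to illustrate first-order convergence.

```python
import math

# Illustration that forward Euler is first-order, which is why the O(h)
# estimate in 6F.5 is sharp for this scheme: for the model problem
# x' = -x, x(0) = 1, the endpoint error roughly halves when h halves.
def euler_error(N):
    h, x = 1.0 / N, 1.0
    for _ in range(N):
        x += h * (-x)              # one forward Euler step for x' = -x
    return abs(x - math.exp(-1.0)) # error against the exact solution at t = 1

e1, e2 = euler_error(100), euler_error(200)
print(round(e1 / e2, 2))   # ratio close to 2, consistent with order O(h)
```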
In the proof of 6F.5 we used the combination of the implicit function theorem 6F.4 for the variational system involved and the estimate (25) for the residual y^N = (p^N, s^N, r^N) of the approximation scheme. The convergence to zero of the
residual comes out of the approximation scheme and the continuity properties of the
solution of the original problem with respect to time t; in numerical analysis this is
called the consistency of the problem and its approximation. The property emerg-
ing from the implicit function theorem 6F.4, that is, the Lipschitz continuity of the
solution with respect to the residual, is sometimes called stability. Theorem 6F.5
furnishes an illustration of a well-known paradigm in numerical analysis: stability
plus consistency yields convergence.
Having the analysis of the linear-quadratic problem as a basis, we could pro-
ceed to more general nonlinear and nonconvex optimal control problems and obtain
convergence of approximations and error estimates by applying more advanced im-
plicit function theorems using, e.g., linearization of the associated nonlinear opti-
mality systems. However, this would involve more sophisticated techniques which
go beyond the scope of this book, so here is where we stop.
Commentary
Theorem 6A.2 is from Dontchev, Lewis and Rockafellar [2003], where it is supplied
with a direct proof. Theorem 6A.3 was first shown by Lewis [1999]; see also Lewis
[2001]. Theorem 6A.7 was initially proved in Dontchev, Lewis and Rockafellar
[2003] by using the characterization of the metric regularity of a mapping in terms
of the nonsingularity of its coderivative (see Section 4H) and applying the radius
theorem for nonsingularity in 6A.2. The proof given here is from Dontchev, Quin-
campoix and Zlateva [2006]. For extensions to infinite-dimensional spaces see Ioffe
[2003a,b]. Theorems 6A.8 and 6A.9 are from Dontchev and Rockafellar [2004].
The material in Section 6B is basically from Dontchev, Lewis and Rockafellar
[2003]. The results in Sections 6C and 6D have roots in several papers; see Rock-
afellar [1976a,b], Robinson [1994], Dontchev [2000], Aragón Artacho, Dontchev,
and Geoffroy [2007] and Dontchev and Rockafellar [2009b].
Most of the results in Section 6E can be found in basic texts on variational meth-
ods; for a recent such book see Attouch, Buttazzo and Michaille [2006]. Section
6F presents a very simplified version of a result in Dontchev [1996]; for advanced
studies in this area see Malanowski [2001] and Veliov [2006].
References
Bartle, R. G. and D. R. Sherbert [1992], Introduction to real analysis, Second edition, John Wiley,
New York.
Bessis, D. N., Yu. S. Ledyaev and R. B. Vinter [2001], Dualization of the Euler and Hamiltonian
inclusions, Nonlinear Analysis, 43, 861–882.
Bonnans, J. F. and A. Shapiro [2000], Perturbation analysis of optimization problems, Springer
Series in Operations Research, Springer, New York.
Borwein, J. M. [1983], Adjoint process duality, Mathematics of Operations Research, 8, 403–434.
Borwein, J. M. [1986a], Stability and regular points of inequality systems, Journal of Optimization
Theory and Applications, 48, 9–52.
Borwein, J. M. [1986b], Norm duality for convex processes and applications, Journal of
Optimization Theory and Applications, 48, 53–64.
Borwein, J. M. and A. L. Dontchev [2003], On the Bartle-Graves theorem, Proceedings of the
American Mathematical Society, 131, 2553–2560.
Borwein, J. M. and A. S. Lewis [2006], Convex analysis and nonlinear optimization. Theory and
examples, Second edition, CMS Books in Mathematics/Ouvrages de Mathématiques
de la SMC, 3. Springer, New York.
Borwein, J. M. and Q. J. Zhu [2005], Techniques of variational analysis, CMS Books in
Mathematics/Ouvrages de Mathématiques de la SMC, 20, Springer, New York.
Cauchy, A.-L. [1831], Résumé d'un mémoire sur la mécanique céleste et sur un nouveau calcul
appelé calcul des limites, in Oeuvres Complètes d'Augustin Cauchy, volume 12,
pp. 48–112, Gauthier-Villars, Paris 1916. Edited by L'Académie des Sciences.
The part pertaining to series expansions was read at a meeting of the Académie de
Turin, 11 October, 1831.
Chipman, J. S. [1997], “Proofs” and proofs of the Eckart-Young theorem, in Stochastic processes
and functional analysis, 71–83, Lecture Notes in Pure and Applied Mathematics, 186,
Dekker, New York.
Clarke, F. H. [1976], On the inverse function theorem, Pacific Journal of Mathematics, 64, 97–102.
Clarke, F. H. [1983], Optimization and nonsmooth analysis, Canadian Mathematical Society Series
of Monographs and Advanced Texts. A Wiley-Interscience Publication.
John Wiley & Sons, Inc., New York.
Cottle, R. W., J.-S. Pang and R. E. Stone [1992], The linear complementarity problem, Academic
Press, Inc., Boston, MA.
Courant, R. [1988], Differential and integral calculus, Vol. II. Translated from the German by
E. J. McShane. Reprint of the 1936 original. Wiley Classics Library.
A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York.
Deimling, K. [1992], Multivalued differential equations, Walter de Gruyter, Berlin.
Deville, R., G. Godefroy and V. Zizler [1993], Smoothness and renormings in Banach spaces,
Pitman Monographs and Surveys in Pure and Applied Mathematics, 64. Longman
Scientific & Technical, Harlow; copublished in the United States with
John Wiley & Sons, Inc., New York.
Deutsch, F. [2001], Best approximation in inner product spaces, CMS Books in
Mathematics/Ouvrages de Mathématiques de la SMC, 7. Springer, New York.
Dieudonné, J. [1969], Foundations of modern analysis, Enlarged and corrected printing.
Pure and Applied Mathematics, Vol. 10-I, Academic Press, New York.
Ekeland, I. [1990], The ε -variational principle revisited, with notes by S. Terracini, in Methods of
nonconvex analysis (Varenna 1989), 1–15, Lecture Notes in Mathematics, 1446,
Springer, Berlin.
Facchinei, F. and J-S. Pang [2003], Finite-dimensional variational inequalities and
complementarity problems, Springer, New York.
Fiacco, A. V. and G. P. McCormick [1968], Nonlinear programming: Sequential unconstrained
minimization techniques, John Wiley & Sons, Inc., New York–London–Sydney.
Fitzpatrick, P. [2006], Advanced calculus, Thomson Brooks/Cole.
Golub, G. H. and C. F. Van Loan [1996], Matrix computations, The Johns Hopkins University
Press, Baltimore, MD, 3rd edition.
Goursat, Ed. [1903], Sur la théorie des fonctions implicites, Bulletin de la Société Mathématiques
de France, 31, 184–192.
Goursat, Ed. [1904], A course in mathematical analysis, English translation by E. R. Hedrick,
Ginn Co., Boston.
Graves, L. M. [1950], Some mapping theorems, Duke Mathematical Journal, 17, 111–114.
Halkin, H. [1974], Implicit functions and optimization problems without continuous
differentiability of the data, SIAM Journal on Control, 12, 229–236.
Hamilton, R. S. [1982], The inverse function theorem of Nash and Moser, Bulletin of the
American Mathematical Society (N.S.), 7, 65–222.
Hausdorff, F. [1927], Mengenlehre, Walter de Gruyter and Co., Berlin.
Hedrick, E. R. and W. D. A. Westfall [1916], Sur l’existence des fonctions implicites, Bulletin de
la Société Mathématiques de France, 44, 1–13.
Hildebrand, H. and L. M. Graves [1927], Implicit functions and their differentials in general
analysis, Transactions of the American Mathematical Society, 29, 127–153.
Hoffman, A. J. [1952], On approximate solutions of systems of linear inequalities, Journal of
Research of the National Bureau of Standards, 49, 263–265.
Hurwicz, L. and M. K. Richter [2003], Implicit functions and diffeomorphisms without C^1,
Advances in mathematical economics, 5, 65–96.
Ioffe, A. D. [1979], Regular points of Lipschitz functions, Transactions of the American
Mathematical Society, 251, 61–69.
Ioffe, A. D. [1981], Nonsmooth analysis: differential calculus of nondifferentiable mappings,
Transactions of the American Mathematical Society, 266, 1–56.
Ioffe, A. D. [1984], Approximate subdifferentials and applications, I. The finite-dimensional
theory, Transactions of the American Mathematical Society, 281, 389–416.
Ioffe, A. D. [2000], Metric regularity and subdifferential calculus, Uspekhi Mat. Nauk, 55, no. 3
(333), 103–162 (Russian), English translation in Russian Mathematical Surveys,
55, 501–558.
Ioffe, A. D. [2003a], On robustness of the regularity property of maps, Control and Cybernetics,
32, 543–554.
Ioffe, A. D. [2003b], On stability estimates for the regularity property of maps, in H. Brezis,
K.C. Chang, S.J. Li, and P. Rabinowitz, editors, Topological methods, variational
methods and their applications, pp. 133–142, World Science Publishing, NJ.
Li, Wu [1994], Sharp Lipschitz constants for basic optimal solutions and basic feasible solutions
of linear programs, SIAM Journal on Control and Optimization, 32, 140–153.
Lyusternik, L. A. [1934], On the conditional extrema of functionals, Mat. Sbornik, 41, 390–401
(Russian).
Lyusternik, L. A. and V. I. Sobolev [1965], Elements of functional analysis, Nauka, Moscow
(Russian).
Malanowski, K. [2001], Stability and sensitivity analysis for optimal control problems with
control-state constraints, Dissertationes Mathematicae, 394, 55pp.
Mordukhovich, B. S. [1984], Nonsmooth analysis with nonconvex generalized differentials and
conjugate mappings, Doklady Akad. Nauk BSSR, 28, 976–979, (Russian).
Mordukhovich, B. S. [2006], Variational analysis and generalized differentiation, I. Basic theory,
Springer, Berlin.
Nadler, Sam B., Jr. [1969], Multi-valued contraction mappings, Pacific Journal of Mathematics,
30, 475–488.
Ng, Kung Fu [1973], An open mapping theorem, Proceedings of Cambridge Philosophical
Society, 74, 61–66.
Nijenhuis, A. [1974], Strong derivatives and inverse mappings, The American Mathematical
Monthly, 81, 969–980.
Noble, B. and J. W. Daniel [1977], Applied linear algebra, Second edition, Prentice-Hall, Inc.,
Englewood Cliffs, N.J.
Páles, Z. [1997], Inverse and implicit function theorems for nonsmooth maps in Banach spaces,
Journal of Mathematical Analysis and Applications, 209, 202–220.
Pompeiu, D. [1905], Fonctions de variable complexes, Annales de la Faculté des Sciences
de l'Université de Toulouse, 7, 265–345.
Ralph, D. [1993], A new proof of Robinson’s homeomorphism theorem for PL-normal maps,
Linear Algebra and Applications, 178, 249–260.
Robinson, S. M. [1972], Normed convex processes, Transactions of the American Mathematical
Society, 174, 127–140.
Robinson, S. M. [1976], Regularity and stability for convex multivalued functions, Mathematics of
Operations Research, 1, 130–143.
Robinson, S. M. [1980], Strongly regular generalized equations, Mathematics of Operations
Research, 5, 43–62.
Robinson, S. M. [1981], Some continuity properties of polyhedral multifunctions, Mathematical
Programming Study, 14, 206–214.
Robinson, S. M. [1984], Local structure of feasible sets in nonlinear programming, Part II:
Nondegeneracy, Mathematical Programming Study, 22, 217–230.
Robinson, S. M. [1991], An implicit-function theorem for a class of nonsmooth functions,
Mathematics of Operations Research, 16, 292–309.
Robinson, S. M. [1992], Normal maps induced by linear transformations, Mathematics of
Operations Research, 17, 691–714.
Robinson, S. M. [1994], Newton’s method for a class of nonsmooth functions, Set-valued
Analysis, 2, 291–305.