Convex Analysis and Minimization Algorithms II - Chap. XI
Prerequisites. Sublinear functions and associated convex sets (Chap. V); characterization of the subdifferential via the conjugacy correspondence (§X.1.4); calculus rules on conjugate functions (§X.2); and also: behaviour at infinity of one-dimensional convex functions (§I.2.3).
Introduction. There are two motivations for the concepts introduced in this chapter:
a practical one, related with descent methods, and a more theoretical one, in the
framework of differential calculus.
- In §VIII.2.2, we have seen that the steepest-descent method is not convergent, essentially because the subdifferential is not a continuous mapping. Furthermore, we have defined Algorithm IX.1.6 which, to find a descent direction, needs to extract limits of subgradients: an impossible task on a computer.
- On the theoretical side, we have seen in Chap. VI the directional derivative of a finite
convex function, which supports a convex set: the subdifferential. This latter set was
generalized to extended-valued functions in §X.1.4; and infinite-valued directional
derivatives have also been seen (Proposition 1.4.1.3, Example X.2.4.3). A natural
question is then: is the supporting property still true in the extended-valued case?
The answer is not quite yes, see below the example illustrated by Fig. 2.1.1.
The two difficulties above are overcome altogether by the so-called ε-subdifferential of f, denoted ∂_ε f, which is a certain perturbation of the subdifferential studied in Chap. VI for finite convex functions. While the two sets are identical for ε = 0, the properties of ∂_ε f turn out to be substantially different from those of ∂f. We therefore study ∂_ε f with the help of the relevant tools, essentially the conjugate function f* (which was of no use in Chap. VI). In return, particularizing our study to the case ε = 0 enables us to generalize the results of Chap. VI to extended-valued functions.
Throughout this chapter, and unless otherwise specified, we therefore have

f ∈ Conv ℝⁿ and ε ≥ 0.
However, keeping in mind that our development has a practical importance for nu-
merical optimization, we will often pay special attention to the finite-valued case.
J.-B. Hiriart-Urruty et al., Convex Analysis and Minimization Algorithms II
© Springer-Verlag Berlin Heidelberg 1993
Of course, s is still an ε-subgradient if (1.1.1) holds only for y ∈ dom f. The set of all ε-subgradients of f at x is the ε-subdifferential (of f at x), denoted by ∂_ε f(x).
Even though we will rarely take ε-subdifferentials of functions not in Conv ℝⁿ, it goes without saying that the relation of definition (1.1.1) can be applied to any function finite at x. Also, one could set ∂_ε f(x) = ∅ for x ∉ dom f. □
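The defining inequality (1.1.1) can be probed numerically. The sketch below is not part of the text: the function f(y) = y², the test point x = 1 and the grids are arbitrary choices. For this f one can verify that ∂_ε f(1) = [2 − 2√ε, 2 + 2√ε].

```python
import numpy as np

# Checking the defining inequality (1.1.1):
#     f(y) >= f(x) + <s, y - x> - eps   for all y
# on a grid of trial points.  For f(y) = y^2 one can verify that
# d_eps f(1) = [2 - 2*sqrt(eps), 2 + 2*sqrt(eps)].

def is_eps_subgradient(f, x, s, eps, ys):
    """Grid test of (1.1.1): a numerical check, not a proof."""
    return bool(np.all(f(ys) >= f(x) + s * (ys - x) - eps - 1e-12))

f = lambda y: y ** 2
ys = np.linspace(-10.0, 10.0, 2001)

print(is_eps_subgradient(f, 1.0, 2.0, 0.0, ys))    # exact subgradient -> True
print(is_eps_subgradient(f, 1.0, 1.0, 0.25, ys))   # endpoint for eps = 1/4 -> True
print(is_eps_subgradient(f, 1.0, 0.9, 0.25, ys))   # slope slightly too small -> False
```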
∂_{αε+(1−α)ε′} f(x) ⊃ α ∂_ε f(x) + (1 − α) ∂_{ε′} f(x) for all α ∈ ]0, 1[.  (1.1.4)

The last relation means that the graph of the multifunction ℝ₊ ∋ ε ↦ ∂_ε f(x) is a convex set in ℝⁿ⁺¹; more will be said about this set later (Proposition 1.3.3).
We will continue to use the notation ∂f, rather than ∂₀f, for the exact subdifferential, knowing that ∂_ε f can be called an approximate subdifferential when ε > 0.
Figure 1.1.1 gives a geometric illustration of Definition 1.1.1: s, together with r ∈ ℝ, defines the affine function y ↦ a_{s,r}(y) := r + ⟨s, y − x⟩; we say that s is an ε-subgradient of f at x when it is possible to have simultaneously r ≥ f(x) − ε and a_{s,r} ≤ f. The condition r = f(x), corresponding to exact subgradients, is thus relaxed by ε; thanks to closed convexity of f, this relaxation makes it possible to find such an a_{s,r}:

Fig. 1.1.1. Supporting hyperplanes within ε
Incidentally, the above proof shows that, among the ε-subgradients, there is one for which strict inequality holds in (1.1.1). A consequence of this result is that, for ε > 0, the domain of the multifunction x ↦ ∂_ε f(x) is the convex set dom f. Here is a difference with the case ε = 0: we know that dom ∂f need not be the whole of dom f; it may even be a nonconvex set.
Consider for example the one-dimensional convex function f(x) = |x|, for which

∂_ε f(x) = [−1, −1 − ε/x]  if x < −ε/2,
∂_ε f(x) = [−1, +1]        if −ε/2 ≤ x ≤ ε/2,
∂_ε f(x) = [1 − ε/x, 1]    if x > ε/2.
The two parts of Fig. 1.1.2 display this set, as a multifunction of ε and x respectively. It is always a segment, reduced to the singleton {f′(x)} only for ε = 0 (when x ≠ 0). This example suggests that the approximate subdifferential is usually a proper enlargement of the exact subdifferential; this will be confirmed by Proposition 1.2.3 below.
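The piecewise description above can be cross-checked by brute force: scan candidate slopes s and keep those satisfying the defining inequality (1.1.1) against f = |·| on a wide grid of points y. A sketch, with arbitrary grid sizes and tolerances:

```python
import numpy as np

# Brute-force recovery of d_eps|.|(x): keep every slope s satisfying
# (1.1.1) for f = |.| on a wide grid of points y, then report the
# endpoints of the resulting segment.
ys = np.linspace(-1e4, 1e4, 100001)

def eps_subdiff_abs(x, eps, ss=np.linspace(-1.5, 1.5, 601)):
    keep = [s for s in ss
            if np.all(np.abs(ys) >= abs(x) + s * (ys - x) - eps - 1e-9)]
    return round(float(min(keep)), 6), round(float(max(keep)), 6)

print(eps_subdiff_abs(2.0, 1.0))   # x > eps/2: [1 - eps/x, 1] -> (0.5, 1.0)
print(eps_subdiff_abs(0.3, 1.0))   # |x| <= eps/2: [-1, 1] -> (-1.0, 1.0)
```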
Another interesting instance is the indicator function of a nonempty closed convex set:

The ε-normal set is thus an intersection of half-spaces but is usually not a cone; it contains the familiar normal cone N_C(x), to which it reduces when ε = 0. A condensed form of (1.1.6) uses the polar of the set C − {x}:

∂_ε I_C(x) = ε (C − x)° for all x ∈ C and ε > 0.
These examples raise the question of the boundedness of ∂_ε f(x).

Theorem 1.1.4 For ε ≥ 0, ∂_ε f(x) is a closed convex set, which is nonempty and bounded if and only if x ∈ int dom f.

PROOF. Closedness and convexity come immediately from the definition (1.1.1). Now, if x ∈ int dom f, then ∂_ε f(x) contains the nonempty set ∂f(x) (Theorem X.1.4.2). Then let δ > 0 be such that B(x, δ) ⊂ int dom f, and let L be a Lipschitz constant for f on B(x, δ) (Theorem IV.3.1.2). For 0 ≠ s ∈ ∂_ε f(x), take y = x + δs/‖s‖:

f(x) + Lδ ≥ f(y) ≥ f(x) + δ⟨s, s/‖s‖⟩ − ε,

i.e. ‖s‖ ≤ L + ε/δ. Thus, the nonempty ∂_ε f(x) is also bounded.
Conversely, take any s₁ in the normal cone to dom f at x ∈ dom f:

⟨s₁, y − x⟩ ≤ 0 for all y ∈ dom f.

If ∂_ε f(x) ≠ ∅, add this inequality to (1.1.1) to obtain

f(y) ≥ f(x) + ⟨s + s₁, y − x⟩ − ε for all y ∈ dom f,

so that s + s₁ is again in ∂_ε f(x); boundedness therefore forces this normal cone to reduce to {0}, i.e. x ∈ int dom f. □
In the introduction to this chapter, it was mentioned that one motivation for the ε-subdifferential was practical. The next result gives a first explanation: ∂_ε f can be used to characterize the ε-solutions of a convex minimization problem; but, starting from Chap. XIII, we will see that its role is much more important than that.

Theorem 1.1.5 The following two properties are equivalent:

0 ∈ ∂_ε f(x),
f(x) ≤ f(y) + ε for all y ∈ ℝⁿ.

PROOF. Apply the definition. □
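Theorem 1.1.5 is the natural stopping test in minimization: 0 ∈ ∂_ε f(x) exactly when x is an ε-minimizer. A one-line numerical illustration (the choice f(y) = y², with infimum 0, is arbitrary):

```python
import numpy as np

# Theorem 1.1.5 as a stopping test: 0 in d_eps f(x) iff
# f(x) <= f(y) + eps for all y, i.e. x is an eps-minimizer.
f = lambda y: y * y
ys = np.linspace(-50.0, 50.0, 100001)

def zero_in_eps_subdiff(x, eps):
    # (1.1.1) with s = 0, tested on a grid
    return bool(np.all(f(ys) >= f(x) - eps))

print(zero_in_eps_subdiff(0.5, 0.3))   # f(0.5) = 0.25 <= inf f + 0.3 -> True
print(zero_in_eps_subdiff(0.6, 0.3))   # f(0.6) = 0.36 >  inf f + 0.3 -> False
```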
which, remembering that f*(s) is the supremum of the bracket, is equivalent to (1.2.1). This implies that ∂_ε f(x) ⊂ dom f* and, applying (1.2.1) with f replaced by f*:

Indeed, the inclusion "⊂" comes directly from (1.2.1); conversely, if s ∈ dom f*, we know from Theorem 1.1.2 that there exists x ∈ ∂_ε f*(s), i.e. s ∈ ∂_ε f(x).
Likewise, for fixed x ∈ dom f,

Here again, the "⊂" comes directly from (1.2.1) while, if s ∈ dom f*, set
{s ∈ b + Im Q : f(x) + ½⟨s − b, Q⁻(s − b)⟩ ≤ ⟨s, x⟩ + ε}.

This set has a nicer expression if we single out ∇f(x): setting p = s − b − Qx, we see that s − b ∈ Im Q means p ∈ Im Q and, via some algebra,

∂_ε f(x) = {∇f(x) + p : p ∈ Im Q, ½⟨p, Q⁻p⟩ ≤ ε}.

When Q is invertible and b = 0, f defines the norm |||x||| = ⟨Qx, x⟩^{1/2}. Its ε-subdifferential is then a neighborhood of the gradient for the metric associated with the dual norm |||s|||* = ⟨s, Q⁻¹s⟩^{1/2}. □
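This example lends itself to a numerical check: with b = 0 and Q invertible, the quantity f*(s) + f(x) − ⟨s, x⟩ equals ½⟨p, Q⁻¹p⟩ for p = s − ∇f(x), so membership in ∂_ε f(x) is exactly membership in the dual-metric ball of radius √(2ε) around the gradient. A sketch (the matrix Q and the test point are arbitrary choices):

```python
import numpy as np

# For f(x) = 1/2 <Qx, x> with Q invertible, membership in d_eps f(x),
# i.e. f*(s) + f(x) - <s, x> <= eps, is the same as
# 1/2 <p, Q^{-1} p> <= eps with p = s - grad f(x): a dual-metric ball
# of radius sqrt(2 eps) around the gradient.  Numerical identity check.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
Qinv = np.linalg.inv(Q)
x = np.array([1.0, -1.0])

f = lambda z: 0.5 * z @ Q @ z
fstar = lambda s: 0.5 * s @ Qinv @ s       # conjugate of f
g = Q @ x                                  # gradient of f at x

rng = np.random.default_rng(0)
for _ in range(100):
    s = g + rng.normal(size=2)
    p = s - g
    assert np.isclose(fstar(s) + f(x) - s @ x, 0.5 * p @ Qinv @ p)
print("identity verified on 100 random points")
```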
The above example suggests once more that ∂_ε f is usually a proper enlargement of ∂f; in particular it is "never" reduced to a singleton, in contrast with ∂f, which is "often" reduced to the gradient of f. This is made precise by the next results, which somehow describe two opposite situations.

int ∂_{ε′} f(x) = {s ∈ ℝⁿ : f*(s) + f(x) − ⟨s, x⟩ < ε′} ⊃ ∂_ε f(x).
Proposition 1.2.4 Let f ∈ Conv ℝⁿ and suppose that ∂_{ε₀} f(x₀) is a singleton for some x₀ ∈ dom f and ε₀ > 0. Then f is affine on ℝⁿ.

PROOF. Denote by s₀ the unique ε₀-subgradient of f at x₀. Let ε ∈ ]0, ε₀[; in view of the monotonicity property (1.1.2), the nonempty set ∂_ε f(x₀) can only be the same singleton {s₀}. Then let ε′ > ε₀; the graph-convexity property (1.1.4) easily shows that ∂_{ε′} f(x₀) is again {s₀}.
Thus, using the characterization (1.2.1) of an approximate subgradient, we have proved:

s ≠ s₀ ⟹ f*(s) > ε + ⟨s, x₀⟩ − f(x₀) for all ε > 0,

i.e. f* is +∞ outside {s₀}; then f = f** is affine. □
For ε = 0, we do obtain the exposed face itself: ∂σ_C(d) = F_C(d).

Fig. 1.2.1. An ε-face
Along the same lines, the conjugate of I_C in (1.1.6) is σ_C, so the ε-normal set to C at x is equivalently defined as

N_{C,ε}(x) = {s ∈ ℝⁿ : σ_C(s) ≤ ⟨s, x⟩ + ε}.

Beware of the difference with (1.2.6); one set looks like a face, the other like a cone; the relation linking them connotes the polarity relation, see Remark V.3.2.6.
(a) Elementary Calculus. First, we list some properties coming directly from the definitions.

Proposition 1.3.1
(i) For the function g(x) := f(x) + r, ∂_ε g(x) = ∂_ε f(x).
(ii) For the function g(x) := αf(x) and α > 0, ∂_ε g(x) = α ∂_{ε/α} f(x).
(iii) For the function g(x) := f(αx) and α ≠ 0, ∂_ε g(x) = α ∂_ε f(αx).
(iv) More generally, if A is an invertible linear operator, ∂_ε(f ∘ A)(x) = A* ∂_ε f(Ax).
(v) For the function g(x) := f(x − x₀), ∂_ε g(x + x₀) = ∂_ε f(x).
(vi) For the function g(x) := f(x) + ⟨s₀, x⟩, ∂_ε g(x) = ∂_ε f(x) + {s₀}.
(vii) If f₁ ≤ f₂ and f₁(x) = f₂(x), then ∂_ε f₁(x) ⊂ ∂_ε f₂(x).

PROOF. Apply (1.1.1), or combine (1.2.1) with the elementary calculus rules X.1.3.1, whichever seems easier. □
Proposition 1.3.2 Let H be a subspace containing a point of dom f and call P_H the operator of orthogonal projection onto H. For all x ∈ dom f ∩ H,
(b) The Tilted Conjugate Function. From (1.2.1), ∂_ε f(x) appears as the sublevel-set at level ε of the "tilted conjugate function"

ℝⁿ ∋ s ↦ g*_x(s) := f*(s) + f(x) − ⟨s, x⟩,  (1.3.1)

which is clearly in Conv ℝⁿ (remember x ∈ dom f!) and plays an important role. Its infimum on ℝⁿ is 0 (Theorem 1.1.2), attained at the subgradients of f at x, if there are any.
Proposition 1.3.3 For x ∈ dom f, the epigraph of g*_x is the graph of the multifunction ε ↦ ∂_ε f(x):

epi g*_x = {(s, ε) ∈ ℝⁿ × ℝ : s ∈ ∂_ε f(x)}.  (1.3.3)

The support function of this set has, at (d, −u) ∈ ℝⁿ × ℝ, the value

σ_{epi g*_x}(d, −u) = sup {⟨s, d⟩ − εu : s ∈ ∂_ε f(x)}.  (1.3.4)

PROOF. The equality of the two sets in (1.3.3) is trivial, in view of (1.2.1) and (1.3.1). Then (1.3.4) comes either from direct calculations, or via Proposition X.1.2.1, with f replaced by g*_x, whose domain is just dom f* (Proposition X.1.3.1 may also be used). □
Remember §IV.2.2 and Example 1.3.2.2 to realize that, up to the closure operation at u = 0, the function (1.3.4) is just the perspective of the shifted function h ↦ f(x + h) − f(x).
Figure 1.3.1 displays the graph of ε ↦ ∂_ε f(x), with the variable s plotted along the horizontal axis, as usual (see also the right part of Fig. 1.1.2). Rotating the picture so that this axis becomes vertical, we obtain epi g*_x.
Fig. 1.3.1. Approximate subdifferential as a sublevel set in the dual space
The relation expressed in (1.3.5) is rather natural, and can be compared to Theorem 1.1.5. It defines a sort of "critical" value for ε, satisfying the following property:

PROOF. Fix x ∈ dom f. Then set u = 1 in (1.3.4) to obtain σ_{epi g*_x}(h, −1) = g_x(h), i.e. (1.3.7), which is just a closed form for (1.3.6). □
Theorem 1.3.6 gives a converse to Proposition 1.3.1(vii): if, for a particular x ∈ dom f₁ ∩ dom f₂,

∂_ε f₁(x) ⊂ ∂_ε f₂(x) for all ε > 0,

then

f₁(y) − f₁(x) ≤ f₂(y) − f₂(x) for all y ∈ ℝⁿ.
which also expresses a closed convex function as a supremum, but of affine functions. The index s can be restricted to dom f*, or even to ri dom f*; another, better idea: restrict s to dom ∂f* (⊃ ri dom f*), in which case f*(s) = ⟨s, y⟩ − f(y) for some y ∈ dom ∂f. In other words, we can write

This formula has a refined form, better suited to numerical optimization where only one subgradient at a particular point is usually known (see again the concept of black box (U1) in §VIII.3.5).
y_t := x + td ∈ ri dom f for t ∈ ]0, 1].

Then

f(x + d) ≥ f(y_t) + ⟨s(y_t), x + d − y_t⟩,

so that

⟨s(y_t), d⟩ ≤ [f(x + d) − f(y_t)]/(1 − t).

Then write
Throughout this section, x is fixed in dom f. As a (nonempty) closed convex set, the approximate subdifferential ∂_ε f(x) then has a support function, for any ε > 0. We denote it by f′_ε(x, ·):

a closed sublinear function. The notation f′_ε is motivated by §VI.1.1: f′(x, ·) supports ∂f(x), so it is natural to denote by f′_ε(x, ·) the function supporting ∂_ε f(x). The present section is devoted to a study of this support function, which is obtained via an "approximate difference quotient".
Theorem 2.1.1 For x ∈ dom f and ε > 0, the support function of ∂_ε f(x) is

ℝⁿ ∋ d ↦ f′_ε(x, d) = inf_{t>0} [f(x + td) − f(x) + ε]/t,  (2.1.1)

which will be called the ε-directional derivative of f at x.
PROOF. We use Proposition 1.3.3: embedding the set ∂_ε f(x) in the larger space ℝⁿ × ℝ, we view it as the intersection of epi g*_x with the horizontal hyperplane

(rotate and contemplate thoroughly Fig. 1.3.1). Correspondingly, f′_ε(x, d) is the value at (d, 0) of the support function of our embedded set epi g*_x ∩ H_ε:

ε ∈ A(ri epi g*_x) = ri[A(epi g*_x)],

where the last equality comes from Proposition III.2.1.12. But we know from Theorem 1.1.2 and Proposition 1.3.3 that A(epi g*_x) is ℝ₊ or ]0, +∞[; in both cases, its relative interior is ]0, +∞[, which contains the given ε > 0. Our assumption is checked.
As a result, the following problem

When ε > 0, this function does achieve its minimum on ℝ₊; in the t-language, this means that the case t ↓ 0 never occurs in the minimization problem (2.1.1). On the other hand, 0 may be the unique minimum point of r_ε, i.e. (2.1.1) may have no solution "at finite distance".
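The infimum (2.1.1) is easy to evaluate by brute force on a logarithmic grid of stepsizes. The sketch below (with the arbitrary choice f = exp, x = 0, d = 1) recovers f′(x, d) at ε = 0 and illustrates the monotonicity of ε ↦ f′_ε(x, d):

```python
import numpy as np

# Brute-force evaluation of the eps-directional derivative (2.1.1):
#     f'_eps(x, d) = inf_{t>0} [f(x + t d) - f(x) + eps] / t
# on a logarithmic grid of stepsizes t.

def eps_dir_deriv(f, x, d, eps, ts=np.logspace(-6, 2, 3001)):
    return min((f(x + t * d) - f(x) + eps) / t for t in ts)

f = np.exp                       # arbitrary finite convex function
vals = [eps_dir_deriv(f, 0.0, 1.0, e) for e in (0.0, 0.5, 1.0)]
print([round(float(v), 3) for v in vals])
# eps = 0 recovers f'(0, 1) = 1 (infimum approached as t -> 0);
# the values then increase with eps, as the monotonicity of
# eps -> d_eps f(x) predicts.
```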
which cannot be a support function: it is not even closed. On the other hand, the infimum in (2.1.1), which is obtained for x + td on the unit sphere, is easy to compute for d ≠ 0; the resulting formula (2.1.4) distinguishes the cases ε > 0 and ε ≤ 0.
Note: to obtain (2.1.4), it is fun to use elementary geometry (carry out in Fig. 2.1.1 the inversion operation from the pole x), but the argument of Remark VI.6.3.7 is more systematic, and is also interesting.
Remark 2.1.2 We have here an illustration of Example X.2.4.3: closing f′(x, ·) is just what is needed to obtain the support function of ∂f(x). For x ∈ dom ∂f, the function

ℝⁿ ∋ d ↦ f′(x, d) = lim_{t↓0} [f(x + td) − f(x)]/t

need not be closed. This property appears also in the proof of Theorem 2.1.1: for ε = 0, the closure operation must still be performed after (2.1.2) is solved. □
Figure 2.1.2 illustrates (2.1.1) in a less dramatic situation: for ε > 0, the line representing t ↦ f(x) − ε + t f′_ε(x, d) supports the graph of t ↦ f(x + td), not at t = 0 but at some point t_ε > 0; among all the slopes joining (0, f(x) − ε) to an arbitrary point on the graph of f(x + ·d), the right-hand side of (2.1.1) is the smallest possible one.
On the other hand, Fig. 2.1.2 is the trace in ℝ × ℝ of a picture in ℝⁿ × ℝ: among all the possible hyperplanes passing through (x, f(x) − ε) and supporting epi f, there is one touching gr f somewhere along the given d; this hyperplane therefore gives the maximal slope along d, which is the value (2.0.1). The contact x + t_ε d plays an important role in minimization algorithms and we will return to it later.
The same picture illustrates Theorem 1.3.6: consider the point x + t_ε d as fixed and call it y. Now, for arbitrary δ ≥ 0, draw a hyperplane supporting epi f and passing through (0, f(x) − δ). Its altitude at y is f(x) − δ + f′_δ(x, y − x) which, by definition of a support, is not larger than f(y), but equal to it when δ = ε.
In summary: fix (x, d) ∈ dom f × ℝⁿ and consider a pair (ε, y) ∈ ℝ₊ × ℝⁿ linked by the relation y = x + t_ε d.
- To obtain y from ε, use (2.1.1): as a function of the "horizontal" variable t > 0, draw the line passing through (x, f(x) − ε) and (x + td, f(x + td)); the resulting slope must be minimized.
- To obtain ε from y, use (1.3.7): as a function of the "vertical" variable δ ≥ 0, draw a support passing through (x, f(x) − δ); its altitude at y must be maximized.
Example 2.1.3 Take again the convex quadratic function of Example 1.2.2: f′_ε(x, d) is the optimal value of the one-dimensional minimization problem

inf_{t>0} [½⟨Qd, d⟩t² + ⟨∇f(x), d⟩t + ε]/t.

If ⟨Qd, d⟩ = 0, this infimum is ⟨∇f(x), d⟩. Otherwise, it is attained at

t_ε = √(2ε/⟨Qd, d⟩)

and

f′_ε(x, d) = ⟨∇f(x), d⟩ + √(2ε⟨Qd, d⟩).

This is a general formula: it also holds for ⟨Qd, d⟩ = 0.
It is interesting to observe that [f′_ε(x, d) − f′(x, d)]/ε → +∞ when ε ↓ 0, but

(1/2ε)[f′_ε(x, d) − f′(x, d)]² ≡ ⟨Qd, d⟩.
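The closed formula of Example 2.1.3, f′_ε(x, d) = ⟨∇f(x), d⟩ + √(2ε⟨Qd, d⟩), can be checked against a direct minimization of the difference quotient; a sketch with an arbitrary choice of Q, x, d:

```python
import math
import numpy as np

# Check of the quadratic closed formula for f(z) = 1/2 <Qz, z>:
# minimize the difference quotient of (2.1.1) on a fine grid and
# compare with <grad f(x), d> + sqrt(2 eps <Qd, d>).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
x, d, eps = np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.25

f = lambda z: 0.5 * z @ Q @ z
numeric = min((f(x + t * d) - f(x) + eps) / t
              for t in np.linspace(1e-4, 10.0, 20001))
closed = (Q @ x) @ d + math.sqrt(2 * eps * (d @ Q @ d))
print(round(float(numeric), 4), round(float(closed), 4))   # -> 2.0 2.0
# The grid minimum sits near t_eps = sqrt(2 eps / <Qd, d>) = 0.5.
```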
This suggests that, for C² convex functions,

f′_ε(x, d) ≃ f′(x, d) + √(2ε f″(x, d)) for small ε,

where f″(x, d) := ⟨∇²f(x)d, d⟩ is the second derivative of f at x in the direction d. Here is one more motivation for the approximate subdifferential: it accounts for second-order behaviour of f. □
Our aim is now to study the minimization of q_ε, a relevant problem in view of Theorem 2.1.1.

which describes the behaviour of f(x + td) for t → +∞ (see §IV.3.2 and Example X.2.4.6 for the asymptotic function f′∞). The other,

concerns the case t ↓ 0: t_ℓ = 0 if f(x + td) has a "positive curvature for t ↓ 0". When t_ℓ = +∞, i.e. f is affine on the half-line x + ℝ₊d, we have q_ε(t) = f′(x, d) + ε/t; then it is clear that f′_ε(x, d) = f′(x, d) = f′∞(d) and that T_ε is empty for ε > 0.
Example 2.2.1 Before making precise statements, let us see in Fig. 2.2.1 what can be expected, with the help of the example

The lower part of the picture gives the correspondence ε ↦ T_ε, with the same abscissa-axis as the upper part (namely t). Some important properties are thus introduced:
- As already known, q increases from f′(x, d) = 1 to f′∞(d) = 3;
- indeed, q(t) is constantly equal to its minimal value f′(x, d) for 0 < t ≤ 1/2, and t_ℓ = 1/2;
Fig. 2.2.1
- f′_ε(x, d) stays between f′(x, d) and f′∞(d), and reaches this last value for ε ≥ 1;
- T₀ is the segment ]0, 1/2]; T₁ is the half-line [1/2, +∞[, and T_ε is empty for ε > 1. □
This example reveals another important number associated with large t, which we will call ε∞ (equal to 1 in the example). The statements (i) and (ii) in the next result are already known, but their proof is more natural in the t-language.

satisfies

T_ε ≠ ∅ if ε < ε∞ and T_ε = ∅ if ε > ε∞.
Because f′(x, ·) ≤ f′∞, this in turn means that q(t) is constant for t > 0, or that t_ℓ = +∞.
Now observe that ε ↦ f′_ε(x, d) is increasing (just as ∂_ε f); so E, when nonempty, is an interval containing 0: ε < ε∞ implies ε ∈ E, hence T_ε ≠ ∅. Conversely, if ε > ε∞, take ε′ in ]ε∞, ε[ (so ε′ ∉ E) and t > 0 arbitrary:

The example t ↦ f(x + td) = √(1 + t²) illustrates the case ε∞ < +∞ (here ε∞ = 1); it corresponds to f(x + ·d) having the asymptote t ↦ f(x) − ε∞ + f′∞(d)t. Also, one directly sees in (2.2.3) that ε∞ = +∞ whenever f′∞(d) is infinite. With the example f of (1.1.5) (and taking d = 1), we have t_ℓ = 0, f′∞(1) = 0 and ε∞ = +∞.
(b) The Closed Convex Function r_ε. For a more accurate study of T_ε, we now use the function r_ε of (2.1.3), obtained via the change of variable u = 1/t: r_ε(u) = q_ε(1/u) for u > 0. It is the trace on ℝ of the function (1.3.4), which is known to be in Conv(ℝⁿ × ℝ); therefore

r_ε ∈ Conv ℝ.
PROOF. The whole point is to compute the subdifferential of the convex function u ↦ ψ(u) := uφ(1/u), and this amounts to computing its one-sided derivatives. Take positive u′ = 1/t′ and u = 1/t, with u ∈ dom r_ε (hence φ(1/u) < +∞), and compute the difference quotient of ψ (cf. the proof of Theorem 1.1.1.6):

[u′φ(1/u′) − uφ(1/u)]/(u′ − u) = φ(t) − t·[φ(t′) − φ(t)]/(t′ − t).

Knowing that r_ε(u′) = ψ(u′) − u′[f(x) − ε] for all u′ > 0, we readily obtain

∂r_ε(u) = φ(t) − t ∂φ(t) − f(x) + ε = r_ε(u)/u − t ∂φ(t). □
whose graph supports epi φ at τ = t. Its value l(0) at the vertical axis τ = 0 is a subderivative of u ↦ uφ(1/u) at u = 1/t, and t is optimal in (2.1.1) when l(0) reaches the given value f(x) − ε. In this case, T_ε is the contact set between gr l and gr φ. Note: convexity and Proposition 2.2.2(iii) tell us that f(x) − ε∞ ≤ l(0) ≤ f(x). □
Let us summarize the results of this section concerning the optimal set T_ε, or U_ε of (2.2.4).
- First, we have the somewhat degenerate case t_ℓ = +∞, meaning that f is affine on the half-line x + ℝ₊d. This can be described by one of the following equivalent statements:

f′(x, d) = f′∞(d);
f′_ε(x, d) = f′(x, d) for all ε > 0;
∀t > 0, q_ε(t) > f′∞(d) for all ε > 0;
T_ε = ∅ for all ε > 0;
U_ε = {0} for all ε > 0.
- The second situation, more interesting, is when t_ℓ < +∞; then three essentially different cases may occur, according to the value of ε.
- When 0 < ε < ε∞, one has equivalently

f′(x, d) < f′_ε(x, d) < f′∞(d);
∃t > 0 such that q_ε(t) < f′∞(d);
T_ε is a nonempty compact interval;
0 ∉ U_ε.

- When ε∞ < ε:

f′_ε(x, d) = f′∞(d);
∀t > 0, q_ε(t) > f′∞(d);
T_ε = ∅;
U_ε = {0}.

- When ε = ε∞:

f′_ε(x, d) = f′∞(d);
T_ε is empty or unbounded;
0 ∈ U_ε.

Note: in the last case, T_ε nonempty but unbounded means that f(x + td) touches its asymptote f(x) − ε + t f′∞(d) for t large enough.
v(ε) := −f′_ε(x, d) if ε ≥ 0, +∞ otherwise.

Then v ∈ Conv ℝ: in fact, dom v is either [0, +∞[ or ]0, +∞[. When ε ↓ 0, v(ε) tends to −f′(x, d) ∈ ℝ ∪ {+∞}.
For −ε > 0, we have trivially (remember that f′(x, d) < +∞)

and this confirms the relevance of the notation f′∞ for an asymptotic function.
Theorem 2.3.2 With the notation of Proposition 2.3.1 and (2.2.4),

which exactly means that u ∈ U_ε. The rest follows from the conclusions of §2.2. □
Remark 2.3.3 Some useful formulae follow from (2.3.2): whenever T_ε ≠ ∅, we have

f′_η(x, d) ≤ f′_ε(x, d) + (η − ε)/t for all t ∈ T_ε,

f′_η(x, d) = f′_ε(x, d) + (η − ε)/t̄_ε − o(η − ε) if η ≥ ε,

f′_η(x, d) = f′_ε(x, d) + (η − ε)/t_ε − o(η − ε) if η ≤ ε,

where t_ε and t̄_ε are respectively the smallest and largest elements of T_ε, and the remainder terms o(·) are nonnegative (of course, t_ε = t̄_ε except possibly for countably many values of ε). We also have the integral representation (1.4.2.6)
A natural question is now: what happens when ε ↓ 0? We already know that f′_ε(x, d) → f′(x, d), but at what speed? A qualitative answer is as follows (see also Example 2.1.3).

lim_{ε↓0} [f′_ε(x, d) − f′(x, d)]/ε = 1/t_ℓ ∈ [0, +∞],
For fixed x and d, use the notation of Remark 2.2.4 and look again at Fig. 2.1.2, with the results of the present section in mind: it is important to meditate on the correspondence between the horizontal set of stepsizes and the vertical set of f-decrements.
To any stepsize t ≥ 0 and slope ⟨s, d⟩ of a line supporting epi φ at (t, φ(t)) is associated a value

ε_{t,s} := f(x) − l(0) = f(x) − f(x + td) + t⟨s, d⟩ ≥ 0.  (2.3.2)

Likewise, to any decrement ε ≥ 0 from f(x) is associated a stepsize t_ε ≥ 0 via the slope f′_ε(x, d). This defines a pair of multifunctions t ↦ −∂r(1/t) and ε ↦ T_ε, inverse to each other, and monotone in the sense that

for t ∈ T_ε and t′ ∈ T_{ε′}, ε > ε′ ⟺ t > t′.

To go analytically from t to ε, one goes first to the somewhat abstract set of inverse stepsizes u, from which ε is obtained by the duality correspondence of Lemma 2.3.1. See also the lower part of Fig. 2.2.1, for an instance of a mapping T (or rather its inverse).
Proposition 2.3.5 With the hypotheses and notations given so far, assume t_ℓ < +∞ (i.e. f is not affine on the whole half-line x + ℝ₊d). For t > 0, there is a unique ε(t) ∈ [0, ε∞[ such that

{v(α) : 0 ≤ α < ε∞} = −[f′(x, d), f′∞(d)[.

Figure 2.3.2 emphasizes the difference between this ε(t) and the ε_{t,s} defined in (2.3.2).
From Theorem X.2.3.1, the conjugate of a sum of two functions f₁ + f₂ is the closure of the infimal convolution f₁* □ f₂*. Expressing ∂_ε(f₁ + f₂)(x) will require an expression for this infimal convolution, which in turn requires the following basic assumption:

When s ∈ dom(f₁ + f₂)*, (f₁ + f₂)*(s) = f₁*(p₁) + f₂*(p₂) for some pᵢ satisfying p₁ + p₂ = s.  (3.1.1)

This just expresses that the inf-convolution of f₁* and f₂* is exact at s = p₁ + p₂: the couple (p₁, p₂) actually minimizes the function (p₁, p₂) ↦ f₁*(p₁) + f₂*(p₂) over the pairs summing to s. Furthermore, we know (Theorem X.2.3.2) that this property holds under various conditions on f₁ and f₂; one of them is
This s is therefore in dom(f₁ + f₂)* and we can apply (3.1.1): with the help of some p₁ and p₂, we write (3.1.4) as ε₁ + ε₂ ≤ ε, where we have set
Let us compute the ε-subdifferential of this function, i.e. the ε-normal set of Definition 1.1.3. The approximate normal set to H_j⁻ has been given in Example 1.2.5; we obtain for x ∈ C

(3.1.7)

(and the normal cone N_C(x) can even be added to the left-hand set). Likewise, set k(x) := min_j c_j(x). If k(x) > 0,
is not −∞. The ε-minimizers of f on C are those x ∈ C such that f(x) ≤ l_C + ε; clearly enough, an ε-minimizer is an x such that (remember Theorem 1.1.5)

Here dom f = ℝⁿ: we conclude from Theorem 3.1.1 that an ε-minimizer is an x such that

0 ∈ ∂_α f(x) + N_{C,ε−α}(x) for some α ∈ [0, ε],

i.e. f has at x an α-subgradient whose opposite lies in the (ε − α)-normal set to C. The situation simplifies in some cases.
An ε-minimizer is then an x ∈ C for which there exists μ = (μ₁, …, μ_m) ∈ ℝᵐ such that

Σ_{j=1}^m μ_j s_j + s₀ = 0,
Σ_{j=1}^m μ_j r_j + ⟨s₀, x⟩ ≤ ε,
μ_j ≥ 0 for j = 1, …, m. □
We leave it as an exercise to redo the above examples when
is described in standard form (K being a closed convex polyhedral cone, say the nonnegative
orthant).
Remark 3.1.5 In the space ℝ^{n₁} × ℝ^{n₂} equipped with the scalar product of a product space, take a decomposable function:

Because of the calculus rule X.1.3.1(ix), the basic assumption (3.1.1) holds automatically, so we always have
Given g ∈ Conv ℝᵐ and an affine mapping A : ℝⁿ → ℝᵐ (Ax = A₀x + y₀ ∈ ℝᵐ with A₀ linear), take f := g ∘ A ∈ Conv ℝⁿ. As in §3.1, we need an assumption, which is in this context:

(⟨·, ·⟩ will denote indifferently the scalar product in ℝⁿ or ℝᵐ). As was the case with (3.1.1), Theorem X.2.2.1 tells us that the above p actually minimizes the function
Theorem 3.2.1 Let g and A be defined as above. For all ε ≥ 0 and x such that Ax ∈ dom g, there holds

∂_ε(g ∘ A)(x) = A₀* ∂_ε g(Ax),  (3.2.3)

where we have used the property A(y − x) = A₀(y − x). Thus we have proved that A₀*p ∈ ∂_ε(g ∘ A)(x).
Conversely, let s ∈ ∂_ε(g ∘ A)(x), i.e.

Apply (3.2.1): with the help of some p such that A₀*p = s, (3.2.4) can be written

ε ≥ g*(p) − ⟨p, y₀⟩ + g(Ax) − ⟨p, A₀x⟩ = g*(p) + g(Ax) − ⟨p, Ax⟩.

This shows that p ∈ ∂_ε g(Ax). Altogether, we have proved that our s is in A₀* ∂_ε g(Ax). □

Naturally, only the linear part of A counts in the right-hand side of (3.2.3): the translation is taken care of by Proposition X.1.3.1(v).
As an illustration of this calculus rule, take x₀ ∈ dom g, a direction d ≠ 0, and compute the approximate subdifferential of the function
We recall that, for g ∈ Conv ℝᵐ and A linear from ℝᵐ to ℝⁿ, the image of g under A is the function Ag defined by

ℝⁿ ∋ x ↦ (Ag)(x) := inf {g(y) : Ay = x}.  (3.3.1)

Once again, we need an assumption for characterizing the ε-subdifferential of Ag, namely that the infimum in (3.3.1) is attained "at finite distance". A sufficient assumption for this is

Im A* ∩ ri dom g* ≠ ∅,  (3.3.2)

which implies at the same time that Ag ∈ Conv ℝⁿ (see Theorem X.2.2.3). As already seen for condition (X.2.2.0.iii), this assumption is implied by

g′∞(d) > 0 for all nonzero d ∈ Ker A.
Theorem 3.3.1 Let ε ≥ 0 and x ∈ dom Ag = A(dom g). Suppose that there is some y ∈ ℝᵐ with Ay = x and g(y) = (Ag)(x); for example assume (3.3.2). Then

∂_ε(Ag)(x) = {s ∈ ℝⁿ : A*s ∈ ∂_ε g(y)}.  (3.3.3)

PROOF. To say that s ∈ ∂_ε(Ag)(x) is to say that

where we have made use of the existence and properties of y. Then apply Theorem X.2.1.1: (Ag)* = g* ∘ A*, so

s ∈ ∂_ε(Ag)(x) ⟺ g*(A*s) + g(y) − ⟨A*s, y⟩ ≤ ε. □

This result can of course be compared to Theorem VI.4.5.1: the hypotheses are just the same, except for the extended-valuedness possibility. Thus, we see that the inverse image under A* of ∂_ε g(y_x) does not depend on the particular y_x optimal in (3.3.1).
We know that a particular case is the marginal function:

ℝⁿ ∋ x ↦ f(x) := inf {g(x, z) : z ∈ ℝᵖ},  (3.3.4)

where g ∈ Conv(ℝⁿ × ℝᵖ). Indeed, f is the image of g under the projection mapping from ℝⁿ⁺ᵖ to ℝⁿ defined by A(x, z) = x. The above result can be particularized to this case:

Corollary 3.3.2 With g ∈ Conv(ℝⁿ × ℝᵖ), let g* be associated with a scalar product preserving the structure of ℝᵐ = ℝⁿ × ℝᵖ as a product space, namely:

(3.3.5)

and consider the marginal function f of (3.3.4). Let ε ≥ 0, x ∈ dom f; suppose that there is some z ∈ ℝᵖ such that g(x, z) = f(x); z exists for example when

∃s₀ ∈ ℝⁿ such that (s₀, 0) ∈ ri dom g*.  (3.3.6)

Then

∂_ε f(x) = {s ∈ ℝⁿ : (s, 0) ∈ ∂_ε g(x, z)}.  (3.3.7)
PROOF. Set A : (x, z) ↦ x. With the scalar product (3.3.5), A* : ℝⁿ → ℝⁿ × ℝᵖ is defined by A*s = (s, 0), Im A* = ℝⁿ × {0}, so (3.3.6) and (3.3.7) are just (3.3.2) and (3.3.3) respectively. □
where f₁ and f₂ are both in Conv ℝⁿ. With m = 2n, and ℝ²ⁿ being equipped with the Euclidean structure of a product space, this is indeed an image function:

Theorem 3.4.1 Let ε ≥ 0 and x ∈ dom(f₁ □ f₂) = dom f₁ + dom f₂. Suppose that there are y₁ and y₂ such that the inf-convolution is exact at x = y₁ + y₂; this is the case for example when

ri dom f₁* ∩ ri dom f₂* ≠ ∅.  (3.4.2)

Then
so (Proposition III.2.1.11)

so that s ∈ ∂_ε fᵢ(yᵢ) for i = 1, 2. Particularizing (3.3.3) to our present situation:

If ∂f₁(y₁) ∩ ∂f₂(y₂) ≠ ∅, the inf-convolution is exact at x = y₁ + y₂ and equality holds in (3.4.6).

In view of the Fenchel inequality (X.1.1.3), this is actually an equality, i.e. s ∈ ∂f(x). Now use this last equality as a value for ⟨s, x⟩ in (3.4.7) to obtain

i.e. the inf-convolution is exact at x = y₁ + y₂; equality in (3.4.6) follows from (3.4.5). □
Remark 3.4.3 Beware that D may be empty on the whole of H while ∂(f₁ □ f₂)(x) ≠ ∅. Take for example

f₁(y) = exp y and f₂(y) = exp(−y);

then f₁ □ f₂ ≡ 0, hence ∂(f₁ □ f₂) ≡ {0}. Yet D(y₁, y₂) = ∅ for all (y₁, y₂) ∈ H: the inf-convolution is nowhere exact.
Note also that (3.4.5) may express the equality between empty sets: take for example f₁ = I_{[0,+∞[} and

ℝ ∋ y₂ ↦ f₂(y₂) = −√y₂ if y₂ ≥ 0, +∞ otherwise.
Example 3.4.4 (Moreau–Yosida Regularizations) For c > 0 and f ∈ Conv ℝⁿ, let f_(c) := f □ (½c‖·‖²), i.e.

f_(c)(x) = min {f(y) + ½c‖x − y‖² : y ∈ ℝⁿ};

the minimum is attained at a unique point, denoted x_c. Using the approximate subdifferential of ½c‖·‖² (see Example 1.2.2 if necessary), Theorem 3.4.1 gives

∂_ε f_(c)(x) = ∪_{0≤α≤ε} [∂_{ε−α} f(x_c) ∩ B(c(x − x_c), √(2cα))].

Thus f_(c) is a differentiable convex function. It can even be said that ∇f_(c) is Lipschitzian with constant c on ℝⁿ. To see this, recall that the conjugate

(f_(c))* = f* + (1/2c)‖·‖²

is strongly convex with modulus 1/c; then apply Theorem X.4.2.1.
In the particular case where c = 1 and f is the indicator function of a closed convex set C, f_(c) = ½d_C² (the squared distance to C) and x_c = P_C(x) (the projection of x onto C). Using the notation of (1.1.6):
Just as in Example 3.4.4, we can particularize f_[c] to the case where f is the indicator function of a closed convex set C: we get f_[c] = c d_C and therefore

∂_ε(c d_C)(x) = N_{C,ε}(x) ∩ B(0, c) for all x ∈ C.

Thus, when ε increases, the set K′ in Fig. V.2.3.1 increases but stays confined within B(0, c).
Remark 3.4.6 For x ∉ C_c[f], there is some y_c ≠ x yielding the infimum in (3.4.9).
At such a y_c, the Euclidean norm is differentiable, so we obtain that f_[c] is differentiable
as well and
∇f_[c](x) = c (x − y_c)/‖x − y_c‖ ∈ ∂f(y_c).
Compare this with Example 3.4.4: here y_c need not be unique, but the direction y_c − x
depends only on x. □
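In one dimension the unit vector (x − y_c)/‖x − y_c‖ is just ±1, so the formula predicts ∇f_[c](x) = ±c outside the coincidence set. A brute-force sketch (ours; f(y) = y² and c = 0.5, with x = 2 chosen outside that set):

```python
# f(y) = y^2, f_[c] = f inf-convolved with c|.| on R
c, x = 0.5, 2.0
f = lambda y: y * y

def inf_conv(z, lo=-10.0, hi=10.0, n=200001):
    best_v, best_y = float("inf"), 0.0
    for i in range(n):
        y = lo + (hi - lo) * i / (n - 1)
        v = f(y) + c * abs(z - y)
        if v < best_v:
            best_v, best_y = v, y
    return best_v, best_y

v, yc = inf_conv(x)                    # minimizer y_c is c/2 = 0.25 here
h = 1e-4
grad_num = (inf_conv(x + h)[0] - inf_conv(x - h)[0]) / (2 * h)
grad_formula = c * (x - yc) / abs(x - yc)
print(yc, grad_num, grad_formula)      # both gradient values should be c = 0.5
```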
An important issue is whether this minimization problem has a solution, and whether
the infimal value is a closed function of s. Here again, the situation is not simple when
the functions are extended-valued.
Theorem 3.5.1 Let f_1, …, f_p be a finite number of convex functions from ℝⁿ to ℝ
and let f := max_j f_j; set m := min{p, n + 1}. Then s ∈ ∂_ε f(x) if and only if there
exist m vectors s_j ∈ dom f_j*, convex multipliers α_j, and nonnegative ε_j such that
f*(s) = Σ_{j=1}^m α_j f_j*(s_j),
Σ_{j=1}^m α_j f_j*(s_j) + f(x) ≤ ε + Σ_{j=1}^m α_j ⟨s_j, x⟩,
which we write
Σ_{j=1}^m α_j [f_j*(s_j) + f_j(x) − ⟨s_j, x⟩] ≤ ε + Σ_{j=1}^m α_j f_j(x) − f(x). (3.5.3)
Thus, if s ∈ ∂_ε f(x), i.e. if (3.5.3) holds, we can set
It is important to realize what (3.5.2) means. Its third relation (which, incidentally, could
be replaced by an equality) can be written
Σ_{j=1}^m (ε_j + α_j e_j) ≤ ε, (3.5.4)
where, for each j, the number
e_j := f(x) − f_j(x)
is nonnegative and measures how close f_j comes to the maximal f at x. Using elementary
calculus rules for approximate subdifferentials, a set-formulation of (3.5.2) is
Example 3.5.2 Consider the function f⁺ := max{0, f}, where f : ℝⁿ → ℝ is convex. We
get
∂_ε(f⁺)(x) = ∪ {∂_δ(αf)(x) : 0 ≤ α ≤ 1, δ + f⁺(x) − αf(x) ≤ ε}.
Setting δ(α) := ε − f⁺(x) + αf(x), this can be written as
Each ε_j/α_j-subdifferential in (3.5.2) is constantly {s_j}: the ε_j's play no role and can be
eliminated from (3.5.4), which can be used as a mere definition, with
ℝⁿ ∋ y ↦ f(y) = f(x) + max {−e_j + ⟨s_j, y − x⟩ : j = 1, …, p}.
The constant term f(x) is of little importance, as far as subdifferentials are concerned.
Neglecting it, e_j thus appears as the value at x (the point where ∂_ε f is computed) of the
j-th affine function making up f.
Geometrically, ∂_ε f(x) dilates when ε increases, and describes a sort of spider web with
∂f(x) as "kernel". When ε reaches the value max_j e_j, ∂_ε f(x) stops at co{s_1, …, s_p}. □
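As a numerical complement (our own sketch, with arbitrary data), take f to be the maximum of three affine functions on ℝ. Following the discussion above (the ε_j's playing no role in the affine case), every s = Σ_j α_j s_j with convex multipliers satisfying Σ_j α_j e_j ≤ ε should pass the defining test of ∂_ε f(x):

```python
import random
random.seed(0)

# f(y) = max_j (a_j*y + b_j); at x = 0: e_j = f(0) - b_j
A, B = [2.0, 0.0, -1.0], [-1.0, 0.0, -2.0]
f = lambda y: max(a * y + b for a, b in zip(A, B))
x, eps = 0.0, 0.5
fx = f(x)
e = [fx - (a * x + b) for a, b in zip(A, B)]      # here e = [1, 0, 2]

def in_eps_subdiff(s, tol=1e-9):
    # definition of s in the eps-subdifferential at x = 0, tested on a grid
    return all(f(i * 0.005) >= fx + s * (i * 0.005 - x) - eps - tol
               for i in range(-10000, 10001))

# sample convex multipliers alpha with sum(alpha_j * e_j) <= eps
members = []
for _ in range(5000):
    w = [random.random() for _ in A]
    alpha = [wi / sum(w) for wi in w]
    if sum(al * ej for al, ej in zip(alpha, e)) <= eps:
        members.append(sum(al * aj for al, aj in zip(alpha, A)))
print(min(members), max(members))   # samples fill the interval [-0.25, 1]
print(in_eps_subdiff(min(members)), in_eps_subdiff(max(members)))  # both True
```

For these data ∂_ε f(0) works out to [−0.25, 1], strictly between ∂f(0) = {0} and co{s_1, s_2, s_3} = [−1, 2]: the spider web at an intermediate ε.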
Finally, if ε = 0, (3.5.4) can be satisfied only by ε_j = 0 and α_j e_j = 0 for all j. We thus
recover the important Corollary VI.4.3.2. Comparing it with (3.5.6), we see that, for larger e_j
or η_j, the η_j-subgradient s_j is "more remote from ∂f(x)", and as a result, its weight α_j must
be smaller.
PROOF. [⇒] Let α ≥ 0 be a minimum of the function ψ_s in Theorem X.2.5.1; there
are two cases:
(a) α = 0. Because dom f = ℝⁿ, this implies s = 0 and (g ∘ f)*(0) = g*(0). The
characterization of s = 0 ∈ ∂_ε(g ∘ f)(x) is
g*(0) + g(f(x)) ≤ ε.
Thus, (3.6.1) holds with ε_2 = ε, ε_1 = 0 and α = 0 (note: ∂(0·f) ≡ {0} because f
is finite everywhere).
(b) α > 0. Then (g ∘ f)*(s) = α f*(s/α) + g*(α) and the characterization of
s ∈ ∂_ε(g ∘ f)(x) is
α f*(s/α) + g*(α) + g(f(x)) − ⟨s, x⟩ ≤ ε,
i.e.
(αf)*(s) + αf(x) − αf(x) + g*(α) + g(f(x)) − ⟨s, x⟩ ≤ ε.
Then we obtain: s ∈ N_{C,ε}(x) if and only if there are nonnegative α, ε_1, ε_2, with
ε_1 + ε_2 = ε, such that
s ∈ ∂_{ε_1}(αc)(x) and αc(x) + ε_2 ≥ 0. (3.6.3)
4 The Approximate Subdifferential as a Multifunction 127
Corollary 3.6.2 Let C be described by (3.6.2) and assume that c(x_0) < 0 for some
x_0. Then
N_{C,ε}(x) = ∪ {∂_δ(αc)(x) : α ≥ 0, δ ≥ 0, δ − αc(x) ≤ ε}. (3.6.4)
In particular, if x ∈ bd C, i.e. if c(x) = 0,
N_{C,ε}(x) = ∪ {∂_ε(αc)(x) : α ≥ 0}. (3.6.5)
PROOF. Eliminate ε_2 and let δ := ε_1 in (3.6.3) to obtain the set-formulation (3.6.4).
Then remember (VI.1.3.6) and use again the monotonicity of the multifunction
δ ↦ ∂_δ(αc)(x). □
We will see in this section that the multifunction ∂_ε f is much more regular when ε > 0
than the exact subdifferential, studied in §VI.6.2. We start with two useful properties,
stating that the approximate subdifferential (ε, x) ↦ ∂_ε f(x) has a closed graph,
and is locally bounded on the interior of dom f; see §A.5 for the terminology.
Proposition 4.1.1 Let {(ε_k, x_k, s_k)} be a sequence converging to (ε, x, s), with
s_k ∈ ∂_{ε_k} f(x_k) for all k. Then s ∈ ∂_ε f(x).
PROOF. By definition,
f(y) ≥ f(x_k) + ⟨s_k, y − x_k⟩ − ε_k for all y ∈ ℝⁿ.
Pass to the limit on k and use the lower semi-continuity of f. □
Proposition 4.1.2 Assume int dom f ≠ ∅; let δ > 0 and L be such that f is Lipschitzian with constant L on some ball B(x, δ), where x ∈ int dom f. Then, for all
δ' < δ and all s ∈ ∂_ε f(x') with x' ∈ B(x, δ'),
‖s‖ ≤ L + ε/(δ − δ'). (4.1.1)
Theorem 4.1.3 Let f : ℝⁿ → ℝ be Lipschitzian with constant L on ℝⁿ. Then there
is a constant K (depending only on L) such that, for all x, x' in ℝⁿ and all positive
ε, ε',
Δ_H(∂_ε f(x), ∂_{ε'} f(x')) ≤ (K/min{ε, ε'}) (‖x − x'‖ + |ε − ε'|). (4.1.2)
PROOF. With d of norm 1, use (2.1.1): for any η > 0, there is t_η > 0 such that
q_ε(x, t_η) ≤ f'_ε(x, d) + η, (4.1.3)
where we have used the notation q_ε(x, t) for the approximate difference quotient. By
assumption, we can let δ → +∞ in Proposition 4.1.2: there is a global Lipschitz
constant, say L, such that f'_ε(x, d) ≤ L. From
q_ε(x, t_η) ≥ −L + ε/t_η
we therefore obtain
1/t_η ≤ (2L + η)/ε.
Then we can write, using (2.1.1) and (4.1.3) again:
f'_{ε'}(x', d) − f'_ε(x, d) ≤ η + (1/t_η)(2L‖x' − x‖ + |ε' − ε|) ≤ η + ((2L + η)/ε)(2L‖x' − x‖ + |ε' − ε|).
Remembering that η > 0 is arbitrary and inverting the roles of (x, ε) and (x', ε'), we do
obtain (4.1.2). □
Estimate (4.1.2) amounts to the inclusion
∂_{ε'} f(x') ⊂ ∂_ε f(x) + (K/min{ε, ε'})(‖x − x'‖ + |ε − ε'|) B(0, 1),
a property already illustrated by Fig. 1.1.2. Remember from §VI.6.2 that the multifunction ∂f need not be inner semi-continuous; when ε = 0, no inclusion resembling
the above can hold, unless x is fixed.
A local version of Theorem 4.1.3 can similarly be proved: (4.1.2) holds on the
compact sets included in int dom f. Here, we consider an extension of the result to
unbounded subdifferentials. Recall that the Hausdorff distance is not convenient for
unbounded sets. A better distance is obtained by comparing the bounded parts of
closed convex sets: for c ≥ 0, we take
Δ_{H,c}(S_1, S_2) := Δ_H(S_1 ∩ B(0, c), S_2 ∩ B(0, c)).
Corollary 4.1.4 Let f ∈ Conv ℝⁿ. Suppose that S ⊂ dom f and ℓ > 0 are such
that ∂f(x) ∩ B(0, ℓ) ≠ ∅ for all x ∈ S. Then, for all c > ℓ, there exists K_c such that,
for all x, x' in S and ε, ε' positive,
Δ_{H,c}(∂_ε f(x), ∂_{ε'} f(x')) ≤ (K_c/min{ε, ε'}) (‖x' − x‖ + |ε' − ε|).
PROOF. Consider f_[c] := f ⊞ c‖·‖ and observe that we are in the conditions of
Proposition 3.4.5: f_[c] is Lipschitzian on ℝⁿ and Theorem 4.1.3 applies. The rest
follows because the coincidence set of f_[c] and f contains S. □
Applying this result to an f finite everywhere, we obtain for example the following
local Lipschitz continuity:
Corollary 4.1.5 Let f : ℝⁿ → ℝ be convex. For any δ ≥ 0, there is K_δ > 0 such
that
Δ_H(∂_ε f(x), ∂_ε f(x')) ≤ (K_δ/ε) ‖x − x'‖ for all x and x' in B(0, δ).
PROOF. We know from Proposition 4.1.2 that ∂_ε f is bounded on B(0, δ), so the result
is a straightforward application of Corollary 4.1.4. □
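The 1/ε factor in Corollary 4.1.5 is easy to observe on f = |·|, whose ε-subdifferential is an explicit interval (obtained from f*(s) + f(x) − sx ≤ ε, f* being the indicator of [−1, 1]). The constant K = 4 used below is our own crude bound for this particular f, not a constant from the text:

```python
def eps_subdiff_abs(x, eps):
    # eps-subdifferential of |.| at x: { s in [-1,1] : |x| - s*x <= eps }
    if x > 0:
        return (max(-1.0, 1.0 - eps / x), 1.0)
    if x < 0:
        return (-1.0, min(1.0, -1.0 - eps / x))
    return (-1.0, 1.0)

def hausdorff(i1, i2):
    # Hausdorff distance between two closed intervals [a,b] and [c,d]
    return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

eps, step = 0.1, 0.01
pts = [i * step for i in range(-300, 301)]
worst = max(hausdorff(eps_subdiff_abs(u, eps), eps_subdiff_abs(v, eps)) / step
            for u, v in zip(pts, pts[1:]))
print(worst, "<=", 4 / eps)   # ratio Delta_H/|x-x'| stays below K/eps, K = 4
```

The worst ratio occurs where the lower endpoint max(−1, 1 − ε/x) starts moving, near x = ε/2; there its slope is of order 1/ε, which is exactly the blow-up the corollary allows.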
130 XI. Approximate Subdifferentials of Convex Functions
In §VI.6.3, we have seen that ∂f(x) can be constructed by piecing together limits of
subgradients along directional sequences. From a practical point of view, this is an
important property: for example, it is the basis for descent schemes in nonsmooth optimization, see Chap. IX. This kind of property is even more important for approximate
subdifferentials: remember from §II.1.2 that the only information obtainable from f
is a black box (U1), which computes an exact subgradient at designated points. The
concept of approximate subgradient is therefore of no use, as long as there is no "black
box" to compute one. Starting from this observation, we study here the problem of
constructing ∂_ε f(x) with the sole help of the same (U1).
Theorem 4.2.1 (A. Brøndsted and R.T. Rockafellar) Let there be given f ∈ Conv ℝⁿ,
x ∈ dom f and ε ≥ 0. For any η > 0 and s ∈ ∂_ε f(x), there exist x_η ∈ B(x, η) and
s_η ∈ ∂f(x_η) such that ‖s_η − s‖ ≤ ε/η.
PROOF. The data are x ∈ dom f, ε > 0 (if ε = 0, just take x_η = x and s_η = s!),
η > 0 and s ∈ ∂_ε f(x). Consider the closed convex function
ℝⁿ ∋ y ↦ φ(y) := f(y) + f*(s) − ⟨s, y⟩.
It is nonnegative (Fenchel's inequality), satisfies φ(x) ≤ ε (cf. (1.2.1)), and its subdifferential is
∂φ(y) = ∂f(y) − {s} for all y ∈ dom φ = dom f.
Perturb φ to the closed convex function
ℝⁿ ∋ y ↦ ψ(y) := φ(y) + (ε/η)‖y − x‖,
whose subdifferential at y ∈ dom f is (apply Corollary 3.1.2: (3.1.2) obviously holds)
∂ψ(y) = ∂φ(y) + B(0, ε/η).
Because ψ(y) ≥ (ε/η)‖y − x‖, the function ψ is 1-coercive and has a minimizer x_η;
the corresponding optimality condition 0 ∈ ∂ψ(x_η) is written
0 ∈ ∂f(x_η) − {s} + B(0, ε/η).
It remains to prove that x_η ∈ B(x, η). Using the nonnegativity of φ and optimality of
x_η, we get (ε/η)‖x_η − x‖ ≤ ψ(x_η) ≤ ψ(x) = φ(x) ≤ ε, hence ‖x_η − x‖ ≤ η. □
In particular,
∂_ε f(x) ⊂ ∩_{η>0} ∪_{‖y−x‖≤η} {∂f(y) + B(0, ε/η)}. (4.2.1)
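A one-dimensional illustration of the theorem (ours): with f = exp, the conjugate f*(s) = s ln s − s and the exact subgradients e^y are explicit, so both the membership s ∈ ∂_ε f(x) and the distance from s to the nearest exact subgradient taken on B(x, η) can be computed directly, and the latter should stay below ε/η:

```python
import math

# f = exp on R, x = 0, eps = 0.5; membership test: f*(s) + f(x) - s*x <= eps,
# with f*(s) = s*ln(s) - s for s > 0
f, x, eps, s = math.exp, 0.0, 0.5, 2.0
conj = lambda t: t * math.log(t) - t
print("s is an eps-subgradient:", conj(s) + f(x) - s * x <= eps)

results = {}
for eta in (1.0, 0.5, 0.25, 0.1):
    # distance from s to the nearest exact subgradient e^y, y in B(x, eta)
    results[eta] = min(abs(math.exp(x + (i / 1000 - 1) * eta) - s)
                       for i in range(2001))
    print(f"eta={eta}: {results[eta]:.4f} <= eps/eta = {eps / eta}")
```

Note how the guarantee degrades as η shrinks: forcing x_η very close to x leaves more room (ε/η) for the subgradient perturbation, exactly the trade-off expressed by the theorem.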
Proposition 4.2.2 (Transportation Formula) With x and x' in dom f, let s' ∈
∂f(x'). Then s' ∈ ∂_ε f(x) if and only if
f(x') ≥ f(x) + ⟨s', x' − x⟩ − ε. (4.2.2)
PROOF. The condition is obviously necessary, since the relation of definition (1.1.1)
must in particular hold at y = x'. Conversely, for s' ∈ ∂f(x'), we have for all y
f(y) ≥ f(x') + ⟨s', y − x'⟩ = f(x) + ⟨s', y − x⟩ + [f(x') − f(x) + ⟨s', x − x'⟩].
By (4.2.2), the bracketed term is at least −ε, so s' ∈ ∂_ε f(x). □
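A quick numerical check of the transportation formula (our sketch; here we write e(x, x', s') := f(x) − f(x') − ⟨s', x − x'⟩ for the gap appearing in (4.2.2)): for f(y) = y², the exact subgradient s' = 2x' at x' satisfies e(x, x', s') = (x − x')², and s' should belong to ∂_ε f(x) exactly when ε ≥ e(x, x', s').

```python
# f(y) = y^2 on R: s' = 2x' is the exact subgradient at x', and
# e(x, x', s') = f(x) - f(x') - s'*(x - x')  (= (x - x')^2 here)
f = lambda y: y * y
x, xp = 0.0, 2.0
sp = 2 * xp
e = f(x) - f(xp) - sp * (x - xp)          # = 4.0

def in_eps_subdiff(s, eps, tol=1e-12):
    return all(f(i * 0.001) >= f(x) + s * (i * 0.001 - x) - eps - tol
               for i in range(-5000, 5001))

print("e =", e)
print(in_eps_subdiff(sp, e))        # True: transported into the e-subdifferential
print(in_eps_subdiff(sp, e - 0.1))  # False: in no tighter approximate subdifferential
```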
Fig. 4.2.1. Linearization errors and the transportation formula
Definition 4.2.3 (Linearization Error) For (x, x', s') ∈ dom f × dom f × ℝⁿ, the
linearization error made at x, when f is linearized at x' with slope s', is the number
e(x, x', s') := f(x) − f(x') − ⟨s', x − x'⟩.
This linearization error is of particular interest when s' ∈ ∂f(x'); then, calling ℓ_{x',s'}
the affine function y ↦ f(x') + ⟨s', y − x'⟩, we have e(x, x', s') = f(x) − ℓ_{x',s'}(x) ≥ 0.
The definition of s' ∈ ∂f(x') via the conjugate function can also be used:
e(x, x', s') = f(x) + f*(s') − ⟨s', x⟩.
The content of the transportation formula (4.2.2), illustrated by Fig. 4.2.1, is that
any s' ∈ ∂f(x') is an e(x, x', s')-subgradient of f at x; and also that it is not in a
tighter approximate subdifferential, i.e.
s' ∈ ∂_ε f(x) if and only if ε ≥ e(x, x', s').
This latter property relies on the contact (at x') between gr ℓ_{x',s'} and gr f. In fact, a
result slightly different from Proposition 4.2.2 is: if s' ∈ ∂_η f(x'), then s' ∈ ∂_ε f(x)
provided that
f(x') ≥ f(x) + ⟨s', x' − x⟩ − (ε − η),
or equivalently
e(x, x', s') + η ≤ ε.
PROOF. [(i)] Apply Proposition 4.1.2: there exist δ > 0 and a constant K such that,
for all x' ∈ B(x, δ) and s' ∈ ∂f(x'),
e(x, x', s') ≤ |f(x') − f(x)| + ‖s'‖ ‖x' − x‖ ≤ 2K‖x' − x‖;
see Fig. 4.2.1: we insert a point between x and x'. A consequence of (4.2.4) will be
Y_ε(x) ⊂ cl V_ε(x) (let t ↑ 1).
To prove our claim, set d := x' − x and use the function r = r_0 of (2.1.3) to
realize with Remark 2.2.4 that −e(x, x', s') ∈ ∂r(1). Then we take arbitrary 1/t = u > 1
and s'' ∈ ∂f(x + td), so that −e(x, x + td, s'') ∈ ∂r(u). The monotonicity property
(VI.6.1.1) or (I.4.2.1) of the subdifferential ∂r gives −e(x, x + td, s'') ≥ −e(x, x', s'),
hence e(x, x + td, s'') ≤ e(x, x', s') ≤ ε, which proves (4.2.4).
There remains to prove that Y_ε(x) is closed, so let {x_k} be a sequence of Y_ε(x)
converging to some x'. To each x_k, we associate s_k ∈ ∂f(x_k) ∩ ∂_ε f(x); {s_k} is
bounded by virtue of Theorem 1.1.4: extracting a subsequence if necessary, we may
assume that {s_k} has a limit s'. Then, Proposition 4.1.1 and Theorem 1.1.4 show that
s' ∈ ∂f(x') ∩ ∂_ε f(x), which means that x' ∈ Y_ε(x). □
From a practical point of view, the set V_ε(x) and the property (i) above are both important:
provided that y is close enough to x, the s(y) ∈ ∂f(y) computed by the black box (U1)
is guaranteed to be an ε-subgradient at x. Now, V_ε(x) and Y_ε(x) differ very little (they
are numerically indistinguishable), and the latter has a strong intrinsic value: by definition,
x' ∈ Y_ε(x) if and only if
∃ s' ∈ ∂_ε f(x) such that s' ∈ ∂f(x'), i.e. such that x' ∈ ∂f*(s').
Furthermore the above proof, especially (4.2.5), establishes a connection between Y_ε(x)
and §2.2. First, Y_ε(x) − {x} is star-shaped. Also, consider the intersection of Y_ε(x) with a
direction d issuing from x. Keeping (4.2.3) in mind, we see that this set is the closed interval
where the perturbed difference quotient q_ε is decreasing. In a word, let t_ε(d) be the largest
element of T_ε = T_ε(d) defined by (2.2.2), with the convention t_ε(d) = +∞ if T_ε is empty or
unbounded (meaning that the approximate difference quotient is decreasing on the whole of
ℝ₊). Then
Y_ε(x) = {x + td : t ∈ [0, t_ε(d)], d ∈ B(0, 1)}. (4.2.7)
See again the geometric interpretations of §2, mainly the end of §2.3.
Our neighborhoods V_ε(x) and Y_ε(x) enjoy some interesting properties if additional assumptions are made on f:
Proposition 4.2.6
(i) If f is 1-coercive, then Y_ε(x) is bounded.
(ii) If f is differentiable on ℝⁿ, then V_ε(x) = Y_ε(x) and ∇f(V_ε(x)) ⊂ ∂_ε f(x).
(iii) If f is 1-coercive and differentiable, then ∇f(V_ε(x)) = ∂_ε f(x).
PROOF. In case (i), f* is finite everywhere, so the result follows from (4.2.6) and the
local boundedness of the subdifferential mapping (Proposition VI.6.2.2).
When f is differentiable, the equality between the two neighborhoods has already
been observed, and the stated inclusion comes from the very definition of V_ε(x).
Finally, let us establish the converse inclusion in case (iii): for s ∈ ∂_ε f(x), pick
y ∈ ∂f*(s), i.e. s = ∇f(y). The result follows from (4.2.6): y ∈ Y_ε(x) = V_ε(x). □
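Statement (iii) can be checked numerically in a case where everything is explicit (our own sketch): f(y) = y² is 1-coercive and differentiable, with ∂_ε f(0) = [−2√ε, 2√ε] (from f*(s) = s²/4 ≤ ε) and V_ε(0) = [−√ε, √ε].

```python
import math

# f(y) = y^2 (1-coercive, differentiable) at x = 0: grad f(V_eps(x)) = eps-subdiff
eps = 0.25
df = lambda y: 2 * y

def in_eps_subdiff(s):
    # s in the eps-subdifferential at 0  <=>  f*(s) = s^2/4 <= eps
    return s * s / 4 <= eps + 1e-12

# V_eps(0): points y whose exact gradient is an eps-subgradient at 0
V = [i * 0.001 for i in range(-2000, 2001) if in_eps_subdiff(df(i * 0.001))]
grads = sorted(df(y) for y in V)
print(grads[0], grads[-1])   # endpoints are +-2*sqrt(eps) = +-1.0 here
```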
(see Example 4.2.7). However, f is minimal on the unit ball B(0, 1), i.e. 0 ∈ ∂f(x')
for all x' ∈ B(0, 1); furthermore, 0 ∉ ∂_ε f(x). In view of its definition (4.2.3), V_ε(x)
therefore does not meet B(0, 1).
On the other hand, it suffices to remove B(0, 1) from D, and this can be seen from the
definition (4.2.3) of Y_ε(x): the linearization error e(x, ·, ·) is left unperturbed by the
max-operation defining f. In summary:
Y_ε(x) = {(ξ, η) : (ξ − 1)² + (η − 1)² ≤ 1/2, ξ² + η² ≥ 1}.
We thus obtain a nonconvex neighborhood; observe its star-shaped character. □
Another expression, which shows that V_ε*(x) is closed and convex, can also be given.
Reproduce the proof of Proposition 4.2.5(i) to see that V_ε*(x) is a neighborhood of x.
For the quadratic function of Example 4.2.7, V_ε*(x) and V_ε(x) coincide.