Convex Analysis and Minimization Algorithms II-ChapXI

This document discusses the concept of approximate subdifferentials of convex functions, introducing the ε-subdifferential as a perturbation of the traditional subdifferential. It highlights the practical and theoretical motivations for this concept, including its relevance in descent methods and the characterization of convex functions. The document also presents definitions, properties, and theorems related to ε-subgradients and their applications in convex analysis and optimization.


XI. Approximate Subdifferentials of Convex Functions

Prerequisites. Sublinear functions and associated convex sets (Chap. V); characterization of the subdifferential via the conjugacy correspondence (§X.1.4); calculus rules on conjugate functions (§X.2); and also: behaviour at infinity of one-dimensional convex functions (§I.2.3).

Introduction. There are two motivations for the concepts introduced in this chapter: a practical one, related to descent methods, and a more theoretical one, in the framework of differential calculus.

- In §VIII.2.2, we have seen that the steepest-descent method is not convergent, essentially because the subdifferential is not a continuous mapping. Furthermore, we have defined Algorithm IX.1.6 which, to find a descent direction, needs to extract limits of subgradients: an impossible task on a computer.
- On the theoretical side, we have seen in Chap. VI the directional derivative of a finite convex function, which supports a convex set: the subdifferential. This latter set was generalized to extended-valued functions in §X.1.4; and infinite-valued directional derivatives have also been seen (Proposition I.4.1.3, Example X.2.4.3). A natural question is then: is the supporting property still true in the extended-valued case? The answer is not quite yes; see below the example illustrated by Fig. 2.1.1.

The two difficulties above are overcome altogether by the so-called ε-subdifferential of f, denoted ∂_ε f, which is a certain perturbation of the subdifferential studied in Chap. VI for finite convex functions. While the two sets are identical for ε = 0, the properties of ∂_ε f turn out to be substantially different from those of ∂f. We therefore study ∂_ε f with the help of the relevant tools, essentially the conjugate function f* (which was of no use in Chap. VI). In return, particularizing our study to the case ε = 0 enables us to generalize the results of Chap. VI to extended-valued functions. Throughout this chapter, and unless otherwise specified, we therefore have

    f ∈ Conv ℝⁿ  and  ε ≥ 0.

However, keeping in mind that our development has a practical importance for numerical optimization, we will often pay special attention to the finite-valued case.
J.-B. Hiriart-Urruty et al., Convex Analysis and Minimization Algorithms II
© Springer-Verlag Berlin Heidelberg 1993

1 The Approximate Subdifferential

1.1 Definition, First Properties and Examples

Definition 1.1.1 Given x ∈ dom f, the vector s ∈ ℝⁿ is called an ε-subgradient of f at x when the following property holds:

    f(y) ≥ f(x) + ⟨s, y − x⟩ − ε  for all y ∈ ℝⁿ. (1.1.1)

Of course, s is still an ε-subgradient if (1.1.1) holds only for y ∈ dom f. The set of all ε-subgradients of f at x is the ε-subdifferential (of f at x), denoted by ∂_ε f(x).
Even though we will rarely take ε-subdifferentials of functions not in Conv ℝⁿ, it goes without saying that the relation of definition (1.1.1) can be applied to any function finite at x. Also, one could set ∂_ε f(x) = ∅ for x ∉ dom f. □

It follows immediately from the definition that

    ∂_ε f(x) ⊂ ∂_ε′ f(x)  whenever ε ≤ ε′; (1.1.2)

    ∂f(x) = ∂₀f(x) = ∩{∂_ε f(x) : ε > 0}  [= lim_{ε↓0} ∂_ε f(x)]; (1.1.3)

    ∂_{αε+(1−α)ε′} f(x) ⊃ α ∂_ε f(x) + (1 − α) ∂_{ε′} f(x)  for all α ∈ ]0, 1[. (1.1.4)

The last relation means that the graph of the multifunction ℝ₊ ∋ ε ↦ ∂_ε f(x) is a convex set in ℝⁿ⁺¹; more will be said about this set later (Proposition 1.3.3).
We will continue to use the notation ∂f, rather than ∂₀f, for the exact subdifferential, knowing that ∂_ε f can be called an approximate subdifferential when ε > 0.
Figure 1.1.1 gives a geometric illustration of Definition 1.1.1: s, together with r ∈ ℝ, defines the affine function y ↦ a_{s,r}(y) := r + ⟨s, y − x⟩; we say that s is an ε-subgradient of f at x when it is possible to have simultaneously r ≥ f(x) − ε and a_{s,r} ≤ f. The condition r = f(x), corresponding to exact subgradients, is thus relaxed by ε; thanks to closed convexity of f, this relaxation makes it possible to find such an a_{s,r}:

Fig. 1.1.1. Supporting hyperplanes within ε

Theorem 1.1.2 For all x ∈ dom f, ∂_ε f(x) ≠ ∅ whenever ε > 0.



PROOF. Theorem III.4.1.1 implies the existence in ℝⁿ × ℝ of a hyperplane H_{s,α} separating strictly the point (x, f(x) − ε) from the closed convex set epi f: for all y ∈ ℝⁿ,

    ⟨s, x⟩ + α[f(x) − ε] < ⟨s, y⟩ + α f(y).

Taking y = x in this inequality shows that

    α[f(x) − ε] < α f(x) < +∞,

hence α > 0 (the hyperplane is non-vertical). Then s′ := −s/α ∈ ∂_ε f(x). □

Incidentally, the above proof shows that, among the ε-subgradients, there is one for which strict inequality holds in (1.1.1). A consequence of this result is that, for ε > 0, the domain of the multifunction x ↦ ∂_ε f(x) is the convex set dom f. Here is a difference with the case ε = 0: we know that dom ∂f need not be the whole of dom f; it may even be a nonconvex set.
Consider for example the one-dimensional convex function

    f(x) := −2√x  if x ≥ 0,  +∞ otherwise. (1.1.5)

Applying the definition, an ε-subgradient of f at x = 0 is an s ∈ ℝ satisfying

    sy + 2√y − ε ≤ 0  for all y ≥ 0,

and easy calculation shows that the ε-subdifferential of f at 0 is ]−∞, −1/ε]: it is nonempty for all ε > 0, even though ∂f(0) = ∅. Also, note that it is unbounded, just because 0 is on the boundary of dom f.
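This endpoint −1/ε can be cross-checked directly against definition (1.1.1). The following Python sketch (not part of the text; a finite grid stands in for the half-line y ≥ 0, so the test there is only a necessary condition) confirms the claim for a sample value of ε:

```python
import math

def is_eps_subgradient(s, eps, x=0.0):
    """Grid test of (1.1.1) for f(y) = -2*sqrt(y) on y >= 0, at x = 0."""
    f = lambda y: -2.0 * math.sqrt(y)
    ys = [k / 100.0 for k in range(4000)]   # sample of [0, 40)
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-9 for y in ys)

eps = 0.5
assert is_eps_subgradient(-1.0 / eps, eps)            # the endpoint -1/eps is in
assert is_eps_subgradient(-5.0, eps)                  # so is everything below it
assert not is_eps_subgradient(-1.0 / eps + 0.1, eps)  # slightly above it is not
```

The violation for s > −1/ε occurs near y = 1/s², well inside the grid, which is why the finite test detects it.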
For a further illustration, let f : ℝ → ℝ be f(x) := |x|. We have

    ∂_ε f(x) = [−1, −1 − ε/x]  if x < −ε/2,
    ∂_ε f(x) = [−1, +1]        if −ε/2 ≤ x ≤ ε/2,
    ∂_ε f(x) = [1 − ε/x, 1]    if x > ε/2.

The two parts of Fig. 1.1.2 display this set, as a multifunction of ε and x respectively. It is always a segment, reduced to the singleton {f′(x)} only for ε = 0 (when x ≠ 0). This example suggests that the approximate subdifferential is usually a proper enlargement of the exact subdifferential; this will be confirmed by Proposition 1.2.3 below.
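The piecewise formula above can be cross-checked against characterization (1.2.1): here f* is the indicator of [−1, 1], so ∂_ε f(x) = {s ∈ [−1, 1] : |x| − sx ≤ ε}. A Python sketch (illustrative only; a finite grid stands in for [−1, 1]):

```python
def eps_subdiff_abs(x, eps):
    """Endpoints of the eps-subdifferential of |.| at x, from the displayed formula."""
    if x < -eps / 2:
        return (-1.0, -1.0 - eps / x)
    if x <= eps / 2:
        return (-1.0, 1.0)
    return (1.0 - eps / x, 1.0)

def eps_subdiff_abs_direct(x, eps, n=2001):
    """Same set from (1.2.1): s in [-1, 1] with |x| - s*x <= eps, on a grid."""
    grid = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    inside = [s for s in grid if abs(x) - s * x <= eps + 1e-12]
    return (min(inside), max(inside))

for x in (-3.0, -0.2, 0.0, 0.4, 2.5):
    lo, hi = eps_subdiff_abs(x, 1.0)
    lo2, hi2 = eps_subdiff_abs_direct(x, 1.0)
    assert abs(lo - lo2) < 1e-2 and abs(hi - hi2) < 1e-2
```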
Another interesting instance is the indicator function of a nonempty closed convex
set:

Definition 1.1.3 The set of ε-normal directions to a closed convex set C at x ∈ C, or the ε-normal set for short, is the ε-subdifferential of the indicator function I_C at x:

    N_{C,ε}(x) := ∂_ε I_C(x) = {s ∈ ℝⁿ : ⟨s, y − x⟩ ≤ ε for all y ∈ C}. (1.1.6)  □

Fig. 1.1.2. Graphs of the approximate subdifferential of |x| (left: for fixed ε; right: for fixed x)

The ε-normal set is thus an intersection of half-spaces but is usually not a cone; it contains the familiar normal cone N_C(x), to which it reduces when ε = 0. A condensed form of (1.1.6) uses the polar of the set C − {x}:

    ∂_ε I_C(x) = ε (C − x)°  for all x ∈ C and ε > 0.
These examples raise the question of the boundedness of ∂_ε f(x).

Theorem 1.1.4 For ε ≥ 0, ∂_ε f(x) is a closed convex set, which is nonempty and bounded if and only if x ∈ int dom f.
PROOF. Closedness and convexity come immediately from the definition (1.1.1). Now, if x ∈ int dom f, then ∂_ε f(x) contains the nonempty set ∂f(x) (Theorem X.1.4.2). Then let δ > 0 be such that B(x, δ) ⊂ int dom f, and let L be a Lipschitz constant for f on B(x, δ) (Theorem IV.3.1.2). For 0 ≠ s ∈ ∂_ε f(x), take y = x + δs/‖s‖:

    f(x) + Lδ ≥ f(y) ≥ f(x) + δ⟨s, s/‖s‖⟩ − ε,

i.e. ‖s‖ ≤ L + ε/δ. Thus, the nonempty ∂_ε f(x) is also bounded.
Conversely, take any s₁ in the normal cone to dom f at x ∈ dom f:

    ⟨s₁, y − x⟩ ≤ 0  for all y ∈ dom f.

If ∂_ε f(x) ≠ ∅, add this inequality to (1.1.1) to obtain

    f(y) ≥ f(x) + ⟨s + s₁, y − x⟩ − ε

for all y ∈ dom f. We conclude s + s₁ ∈ ∂_ε f(x): ∂_ε f(x) contains (actually: is equal to) ∂_ε f(x) + N_{dom f}(x), which is unbounded if x ∉ int dom f. □

In the introduction to this chapter, it was mentioned that one motivation for the ε-subdifferential was practical. The next result gives a first explanation: ∂_ε f can be used to characterize the ε-solutions of a convex minimization problem; but, starting from Chap. XIII, we will see that its role is much more important than that.
Theorem 1.1.5 The following two properties are equivalent:

    0 ∈ ∂_ε f(x);
    f(x) ≤ f(y) + ε  for all y ∈ ℝⁿ.

PROOF. Apply the definition. □
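For f = |·|, whose infimum is 0, the theorem says that 0 ∈ ∂_ε f(x) exactly when |x| ≤ ε; combined with the interval formula of §1.1, this gives a check one can run mechanically. A Python sketch (sample values only, not from the text):

```python
def zero_in_eps_subdiff_abs(x, eps):
    """0 in the eps-subdifferential of |.| at x, via the interval formula of Sec. 1.1."""
    if x < -eps / 2:
        lo, hi = -1.0, -1.0 - eps / x
    elif x <= eps / 2:
        lo, hi = -1.0, 1.0
    else:
        lo, hi = 1.0 - eps / x, 1.0
    return lo <= 0.0 <= hi

eps = 0.8
for x in (-2.0, -0.8, -0.3, 0.0, 0.5, 0.8, 1.7):
    # Theorem 1.1.5: equivalent to f(x) <= inf f + eps, i.e. |x| <= eps here
    assert zero_in_eps_subdiff_abs(x, eps) == (abs(x) <= eps)
```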

1.2 Characterization via the Conjugate Function

The conjugate function f* allows a condensed writing for (1.1.1):

Proposition 1.2.1 The vector s ∈ ℝⁿ is an ε-subgradient of f at x ∈ dom f if and only if

    f*(s) + f(x) − ⟨s, x⟩ ≤ ε. (1.2.1)

As a result,

    s ∈ ∂_ε f(x) ⟺ x ∈ ∂_ε f*(s). (1.2.2)

PROOF. The definition (1.1.1) of ∂_ε f(x) can be written

    f(x) + [⟨s, y⟩ − f(y)] − ⟨s, x⟩ ≤ ε  for all y ∈ dom f

which, remembering that f*(s) is the supremum of the bracket, is equivalent to (1.2.1). This implies that ∂_ε f(x) ⊂ dom f* and, applying (1.2.1) with f replaced by f*:

    x ∈ ∂_ε f*(s) ⟺ f**(x) + f*(s) − ⟨s, x⟩ ≤ ε.

The conclusion follows since Conv ℝⁿ ∋ f = f**. □


In contrast with the case ε = 0, note that

    ∂_ε f(ℝⁿ) = dom f*  whenever ε > 0.

Indeed, the inclusion "⊂" comes directly from (1.2.1); conversely, if s ∈ dom f*, we know from Theorem 1.1.2 that there exists x ∈ ∂_ε f*(s), i.e. s ∈ ∂_ε f(x).
Likewise, for fixed x ∈ dom f,

    [lim_{ε→+∞} ∂_ε f(x) =] ∪{∂_ε f(x) : ε > 0} = dom f*. (1.2.3)

Here again, the "⊂" comes directly from (1.2.1) while, if s ∈ dom f*, set

    ε_s := f*(s) + f(x) − ⟨s, x⟩ ∈ [0, +∞[

to see that s is in the corresponding ∂_{ε_s} f(x).

Example 1.2.2 (Convex Quadratic Functions) Consider the function

    ℝⁿ ∋ x ↦ f(x) := ½⟨Qx, x⟩ + ⟨b, x⟩

where Q is symmetric positive semi-definite with pseudo-inverse Q⁻. Using Example X.1.1.4, we see that ∂_ε f(x) is the set

    {s ∈ b + Im Q : f(x) + ½⟨s − b, Q⁻(s − b)⟩ ≤ ⟨s, x⟩ + ε}.

This set has a nicer expression if we single out ∇f(x): setting p = s − b − Qx, we see that s − b ∈ Im Q means p ∈ Im Q and, via some algebra,

    ∂_ε f(x) = {∇f(x)} + {p ∈ Im Q : ½⟨p, Q⁻p⟩ ≤ ε}. (1.2.4)

Another equivalent formula is obtained if we set y := Q⁻p (so p = Qy):

    ∂_ε f(x) = {∇f(x) + Qy : ½⟨Qy, y⟩ ≤ ε}. (1.2.5)

To illustrate Theorem 1.1.5, we see from (1.2.4) that

    x minimizes f within ε ⟺ ∇f(x) = Qx + b ∈ Im Q [i.e. b ∈ Im Q], and ½⟨∇f(x), Q⁻∇f(x)⟩ ≤ ε.

When Q is invertible and b = 0, f defines the norm |||x||| = ⟨Qx, x⟩^{1/2}. Its ε-subdifferential is then a neighborhood of the gradient for the metric associated with the dual norm |||s|||* = ⟨s, Q⁻¹s⟩^{1/2}. □
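Formula (1.2.5) lends itself to a direct numerical spot-check: s = ∇f(x) + Qy should satisfy definition (1.1.1) when ½⟨Qy, y⟩ ≤ ε and violate it otherwise. A Python sketch (Q, b, x and the finite grid of test points are arbitrary choices, not taken from the text):

```python
# f(z) = 0.5*<Qz, z> + <b, z> on R^2, with Q positive definite.
Q = [[2.0, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]

def mv(M, v):   return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]
def dot(u, v):  return u[0] * v[0] + u[1] * v[1]
def f(z):       return 0.5 * dot(mv(Q, z), z) + dot(b, z)

def is_eps_subgradient(s, x, eps, pts):
    """Grid test of definition (1.1.1)."""
    return all(f(y) >= f(x) + dot(s, [y[0] - x[0], y[1] - x[1]]) - eps - 1e-9
               for y in pts)

x = [0.3, -0.7]
grad = [mv(Q, x)[0] + b[0], mv(Q, x)[1] + b[1]]          # gradient Qx + b
pts = [[(i - 10) * 0.7, (j - 10) * 0.7] for i in range(21) for j in range(21)]

eps = 0.5
y_in, y_out = [0.5, 0.5], [1.0, 0.0]   # 0.5*<Qy,y> = 0.3125 <= eps, resp. 1.0 > eps
s_in  = [grad[k] + mv(Q, y_in)[k]  for k in range(2)]
s_out = [grad[k] + mv(Q, y_out)[k] for k in range(2)]
assert is_eps_subgradient(s_in, x, eps, pts)
assert not is_eps_subgradient(s_out, x, eps, pts)
```

For a quadratic, the worst violation of (1.1.1) by s = ∇f(x) + Qy occurs exactly at the shift h = y, so a grid containing a neighbourhood of x + y suffices to expose it.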

The above example suggests once more that ∂_ε f is usually a proper enlargement of ∂f; in particular it is "never" reduced to a singleton, in contrast with ∂f, which is "often" reduced to the gradient of f. This is made precise by the next results, which somehow describe two opposite situations.

Proposition 1.2.3 Let f ∈ Conv ℝⁿ be 1-coercive, i.e. f(x)/‖x‖ → +∞ if ‖x‖ → ∞. Then, for all x ∈ dom f,

    0 ≤ ε < ε′ ⟹ ∂_ε f(x) ⊂ int ∂_{ε′} f(x).

Furthermore, any s ∈ ℝⁿ is an ε-subgradient of f at x, provided that ε is large enough.

PROOF. Written in the form (1.2.1), an approximate subdifferential of f appears as a sublevel-set of the convex function f* − ⟨·, x⟩; but the 1-coercivity of f means that this function is finite everywhere (Proposition X.1.3.8). For ε′ > ε, the conditions of Proposition VI.1.3.3 are then satisfied to compute the interior of our sublevel-set:

    int ∂_{ε′} f(x) = {s ∈ ℝⁿ : f*(s) + f(x) − ⟨s, x⟩ < ε′} ⊃ ∂_ε f(x).

As for the second claim, it comes directly from (1.2.3). □

Proposition 1.2.4 Let f ∈ Conv ℝⁿ and suppose that ∂_{ε₀} f(x₀) is a singleton for some x₀ ∈ dom f and ε₀ > 0. Then f is affine on ℝⁿ.

PROOF. Denote by s₀ the unique ε₀-subgradient of f at x₀. Let ε ∈ ]0, ε₀[; in view of the monotonicity property (1.1.2), the nonempty set ∂_ε f(x₀) can only be the same singleton {s₀}. Then let ε′ > ε₀; the graph-convexity property (1.1.4) easily shows that ∂_{ε′} f(x₀) is again {s₀}.
Thus, using the characterization (1.2.1) of an approximate subgradient, we have proved:

    s ≠ s₀ ⟹ f*(s) > ε + ⟨s, x₀⟩ − f(x₀)  for all ε > 0,

and not much room is left for f*: indeed

    f* = r + I_{{s₀}}  for some real number r.

The affine character of f follows; and naturally r = f*(s₀) = ⟨s₀, x₀⟩ − f(x₀). □

Our next example somehow generalizes this last situation.

Example 1.2.5 (Support and Indicator Functions) Let σ = σ_C be a closed sublinear function, i.e. the support of the nonempty closed convex set C = ∂σ(0). Its conjugate σ* is just the indicator of C, hence

    ∂_ε σ_C(d) = {s ∈ C : σ(d) ≤ ⟨s, d⟩ + ε}. (1.2.6)

Remembering that σ(0) = 0, we see that

    ∂_ε σ_C(d) ⊂ ∂σ_C(0) = ∂_ε σ_C(0) = C  for all d ∈ ℝⁿ.

Keeping Proposition 1.2.4 in mind, the approximate subdifferential of a linear (or even affine) function is a singleton. The set ∂_ε σ_C(d) could be called the "ε-face" of C exposed by d, i.e. the set of ε-maximizers of ⟨·, d⟩ over C. As shown in Fig. 1.2.1, it is the intersection of the convex set C with a certain half-space depending on ε. For ε = 0, we do obtain the exposed face itself: ∂σ_C(d) = F_C(d).

Fig. 1.2.1. An ε-face

Along the same lines, the conjugate of I_C in (1.1.6) is σ_C, so the ε-normal set to C at x is equivalently defined as

    N_{C,ε}(x) = {s ∈ ℝⁿ : σ_C(s) ≤ ⟨s, x⟩ + ε}.

Beware of the difference with (1.2.6); one set looks like a face, the other like a cone; the relation linking them connotes the polarity relation, see Remark V.3.2.6.

Consider for example a half-space: with s ≠ 0,

    H⁻ := {x ∈ ℝⁿ : ⟨s, x⟩ ≤ r}.

Its support function is computed by straightforward calculations, already seen in Example V.2.3.1; we obtain for x ∈ H⁻:

    N_{H⁻,ε}(x) = {λs : 0 ≤ λ and (r − ⟨s, x⟩)λ ≤ ε}.

In other words, the approximate normal set to a half-space is a segment, which stretches to the whole normal cone when x approaches the boundary. Figure 2.1.1 will illustrate an ε-normal set in a nonlinear situation. □

1.3 Some Useful Properties

(a) Elementary Calculus. First, we list some properties coming directly from the definitions.

Proposition 1.3.1
(i) For the function g(x) := f(x) + r, ∂_ε g(x) = ∂_ε f(x).
(ii) For the function g(x) := αf(x) and α > 0, ∂_ε g(x) = α ∂_{ε/α} f(x).
(iii) For the function g(x) := f(αx) and α ≠ 0, ∂_ε g(x) = α ∂_ε f(αx).
(iv) More generally, if A is an invertible linear operator, ∂_ε(f ∘ A)(x) = A* ∂_ε f(Ax).
(v) For the function g(x) := f(x − x₀), ∂_ε g(x + x₀) = ∂_ε f(x).
(vi) For the function g(x) := f(x) + ⟨s₀, x⟩, ∂_ε g(x) = ∂_ε f(x) + {s₀}.
(vii) If f₁ ≤ f₂ and f₁(x) = f₂(x), then ∂_ε f₁(x) ⊂ ∂_ε f₂(x).

PROOF. Apply (1.1.1), or combine (1.2.1) with the elementary calculus rules X.1.3.1, whichever seems easier. □

Our next result expresses how the approximate subdifferential is transformed when the starting function is restricted to a subspace.

Proposition 1.3.2 Let H be a subspace containing a point of dom f and call P_H the operator of orthogonal projection onto H. For all x ∈ dom f ∩ H,

    P_H(∂_ε(f + I_H)(x)) = ∂_ε(f ∘ P_H)(x),

i.e.

    ∂_ε(f + I_H)(x) = ∂_ε(f ∘ P_H)(x) + H^⊥.

PROOF. From the characterization (1.2.1), s ∈ ∂_ε(f + I_H)(x) means

    ε ≥ (f + I_H)(x) + (f + I_H)*(s) − ⟨s, x⟩
      = f(x) + [(f ∘ P_H)* ∘ P_H](s) − ⟨s, x⟩  [from Prop. X.1.3.2]
      = f(P_H(x)) + (f ∘ P_H)*(P_H(s)) − ⟨s, P_H(x)⟩  [x = P_H(x)]
      = (f ∘ P_H)(x) + (f ∘ P_H)*(P_H(s)) − ⟨P_H(s), x⟩.  [P_H is symmetric]

This just means that P_H(s) is in the ε-subdifferential of f ∘ P_H at x. □

A particular case is when dom f ⊂ H: then f + I_H coincides with f and we have ∂_ε f = ∂_ε(f ∘ P_H) + H^⊥.

(b) The Tilted Conjugate Function. From (1.2.1), ∂_ε f(x) appears as the sublevel-set at level ε of the "tilted conjugate function"

    ℝⁿ ∋ s ↦ f*(s) − ⟨s, x⟩ + f(x) =: g*_x(s), (1.3.1)

which is clearly in Conv ℝⁿ (remember x ∈ dom f!) and plays an important role. Its infimum on ℝⁿ is 0 (Theorem 1.1.2), attained at the subgradients of f at x, if there are any. Its conjugate (g*_x)* = g_x is easily computed, say via Proposition X.1.3.1, and is simply the "shifted" function

    ℝⁿ ∋ h ↦ g_x(h) := f(x + h) − f(x). (1.3.2)

Indeed, when connecting g*_x to the approximate subdifferential, we just perform the conjugacy operation on f, but with a special role attributed to the particular point x under consideration.

Proposition 1.3.3 For x ∈ dom f, the epigraph of g*_x is the graph of the multifunction ε ↦ ∂_ε f(x):

    {(s, ε) ∈ ℝⁿ × ℝ : g*_x(s) ≤ ε} = {(s, ε) ∈ ℝⁿ × ℝ : s ∈ ∂_ε f(x)}. (1.3.3)

The support function of this set has, at (d, −u) ∈ ℝⁿ × ℝ, the value

    σ_{epi g*_x}(d, −u) = sup{⟨s, d⟩ − εu : ε ≥ 0, s ∈ ∂_ε f(x)} (1.3.4)
      = u[f(x + d/u) − f(x)]      if u > 0,
      = σ_{dom f*}(d) = f′∞(d)    if u = 0,
      = +∞                        if u < 0.

PROOF. The equality of the two sets in (1.3.3) is trivial, in view of (1.2.1) and (1.3.1). Then (1.3.4) comes either from direct calculations, or via Proposition X.1.2.1, with f replaced by g*_x, whose domain is just dom f* (Proposition X.1.3.1 may also be used). □

Remember §IV.2.2 and Example I.3.2.2 to realize that, up to the closure operation at u = 0, the function (1.3.4) is just the perspective of the shifted function h ↦ f(x + h) − f(x).
Figure 1.3.1 displays the graph of ε ↦ ∂_ε f(x), with the variable s plotted along the horizontal axis, as usual (see also the right part of Fig. 1.1.2). Rotating the picture so that this axis becomes vertical, we obtain epi g*_x.

Fig. 1.3.1. Approximate subdifferential as a sublevel set in the dual space

(c) A Critical ε-Value.

Proposition 1.3.4 Assume that the infimal value f̄ of f over ℝⁿ is finite. For all x ∈ dom f, there holds

    inf{ε > 0 : 0 ∈ ∂_ε f(x)} = f(x) − f̄. (1.3.5)

PROOF. Because g*_x is a nonnegative function, we always have

    g*_x(0) = inf{ε > 0 : ε ≥ g*_x(0)}

which, in view of (1.3.3) and of the definition of g*_x, can also be written

    inf{ε > 0 : 0 ∈ ∂_ε f(x)} = f*(0) + f(x) = −f̄ + f(x). □
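For the one-dimensional quadratic f(t) = ½qt², formula (1.2.5) gives ∂_ε f(x) = [qx − √(2εq), qx + √(2εq)] and f̄ = 0, so (1.3.5) predicts that 0 enters ∂_ε f(x) exactly at ε = f(x). A Python sketch of this prediction (q and x are arbitrary sample values, not from the text):

```python
import math

q = 3.0                         # f(t) = 0.5*q*t**2 on R, inf f = 0
def eps_subdiff(x, eps):
    """Interval (1.2.5) for this f: q*x plus/minus sqrt(2*eps*q)."""
    r = math.sqrt(2.0 * eps * q)
    return (q * x - r, q * x + r)

x = 1.4
crit = 0.5 * q * x * x          # the critical value f(x) - inf f of (1.3.5)
lo, hi = eps_subdiff(x, 0.9 * crit)
assert not (lo <= 0.0 <= hi)    # below the critical value: 0 is not an eps-subgradient
lo, hi = eps_subdiff(x, 1.1 * crit)
assert lo <= 0.0 <= hi          # above it: x is eps-optimal (Theorem 1.1.5)
```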

The relation expressed in (1.3.5) is rather natural, and can be compared to Theorem 1.1.5. It defines a sort of "critical" value for ε, satisfying the following property:

Proposition 1.3.5 Assume that f is 1-coercive. For a non-optimal x ∈ dom f, define

    ε̄ := f(x) − f̄ > 0.

Then the normal cone to ∂_{ε̄} f(x) at 0 is the set of directions pointing from x to the minimizers of f:

    N_{∂_{ε̄} f(x)}(0) = ℝ₊(Argmin f − {x}).

PROOF. We know that ∂_{ε̄} f(x) is the sublevel-set of g*_x at level ε̄ and we use Theorem VI.1.3.5 to compute its normal cone. From the coercivity of f, g*_x is finite everywhere (Proposition X.1.3.8); if, in addition, ε̄ > 0, the required Slater assumption is satisfied; hence, for an s satisfying g*_x(s) = ε̄ [i.e. s ∈ bd ∂_{ε̄} f(x)],

    N_{∂_{ε̄} f(x)}(s) = ℝ₊ ∂g*_x(s) = ℝ₊(∂f*(s) − {x}).

In this formula, we can set ε = ε̄ > 0 and s = 0 because

    g*_x(0) = f*(0) + f(x) − ⟨0, x⟩ = ε̄.

The result follows from the expression (X.1.4.6) of Argmin f as ∂f*(0). □

(d) Representations of a Closed Convex Function. An observation inspired by the function g*_x is that the multifunctions ε ↦ F(ε) that are ε-subdifferentials (of a function f at a point x) are the closed convex epigraphs; see §IV.1.3(g). Also, if ∂_ε f(x) is known for all ε > 0 at a given x ∈ dom f, then f* is also known, so f itself is completely determined:

Theorem 1.3.6 For x ∈ dom f, there holds

    f(x + h) = f(x) + sup{⟨s, h⟩ − ε : ε > 0, s ∈ ∂_ε f(x)} (1.3.6)

for all h ∈ ℝⁿ; or, using support functions:

    f(x + h) = f(x) + sup_{ε>0} [σ_{∂_ε f(x)}(h) − ε]. (1.3.7)

PROOF. Fix x ∈ dom f. Then set u = 1 in (1.3.4) to obtain σ_{epi g*_x}(h, −1) = g_x(h), i.e. (1.3.7), which is just a closed form for (1.3.6). □
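For f = |·|, whose ε-subdifferential is the explicit interval computed in §1.1, the right-hand side of (1.3.7) can be maximized over a finite grid of ε-values; up to discretization error it reproduces f(x + h). A Python sketch (illustrative; the grid bounds are chosen large enough for the supremum to be attained inside them):

```python
def eps_subdiff_abs(x, eps):
    if x < -eps / 2:   return (-1.0, -1.0 - eps / x)
    if x <= eps / 2:   return (-1.0, 1.0)
    return (1.0 - eps / x, 1.0)

def reconstruct(x, h, eps_grid):
    """Right-hand side of (1.3.7) for f = |.|, the sup taken over a finite grid."""
    best = -float("inf")
    for eps in eps_grid:
        lo, hi = eps_subdiff_abs(x, eps)
        best = max(best, max(lo * h, hi * h) - eps)   # support value minus eps
    return abs(x) + best

grid = [k * 0.001 + 1e-6 for k in range(20000)]        # eps in (0, 20]
for x, h in ((1.0, -2.5), (0.5, 3.0), (-2.0, 1.0), (0.0, -1.0)):
    assert abs(reconstruct(x, h, grid) - abs(x + h)) < 1e-2
```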

Using the positive homogeneity of a support function, another way of writing (1.3.7) is: if x ∈ dom f,

    f(x + td) = f(x) + t sup_{α>0} [σ_{∂_{αt} f(x)}(d) − α]  for all t > 0.

Theorem 1.3.6 gives a converse to Proposition 1.3.1(vii): if, for a particular x ∈ dom f₁ ∩ dom f₂,

    ∂_ε f₁(x) ⊂ ∂_ε f₂(x)  for all ε > 0,

then

    f₁(y) − f₁(x) ≤ f₂(y) − f₂(x)  for all y ∈ ℝⁿ.

Remark 1.3.7 The function g*_x differs from f* by the affine function f(x) − ⟨·, x⟩ only. In the primal graph-space, g_x is obtained from f by a mere translation of the origin from (0, 0) to (x, f(x)). Basically, considering the graph of ∂_ε f, instead of the epigraph of f*, just amounts to this change of variables. In the framework of numerical optimization, the distinction is significant: it is near the current iterate x of an algorithm that the behaviour of f matters. Here is one more practical motivation for introducing the approximate subdifferential.
Theorem 1.3.6 confirms this observation: numerical algorithms need a model of f near the current x (remember §II.2). A possibility is to consider a model for f*: it induces via (1.2.1) a model for the multifunction ε ↦ ∂_ε f(x), which in turn gives via (1.3.7) a model for f, hence a possible strategy for finding the next iterate x₊. □
This remark suggests that, in the framework of numerical optimization, (1.3.7) will be especially useful with h close to 0. More globally, we have

    [f**(x) =] f(x) = sup{⟨s, x⟩ − f*(s) : s ∈ ℝⁿ},

which also expresses a closed convex function as a supremum, but of affine functions. The index s can be restricted to dom f*, or even to ri dom f*; another better idea: restrict s to dom ∂f* (⊃ ri dom f*), in which case f*(s) = ⟨s, y⟩ − f(y) for some y ∈ dom ∂f. In other words, we can write

    f(x) = sup{f(y) + ⟨s, x − y⟩ : y ∈ dom ∂f, s ∈ ∂f(y)}.

This formula has a refined form, better suited to numerical optimization where only one subgradient at a particular point is usually known (see again the concept of black box (U1) in §VIII.3.5).

Theorem 1.3.8 Associated with f ∈ Conv ℝⁿ, let s : ri dom f → ℝⁿ be a mapping satisfying s(y) ∈ ∂f(y) for all y ∈ ri dom f. Then there holds

    f(x) = sup_{y ∈ ri dom f} [f(y) + ⟨s(y), x − y⟩]  for all x ∈ dom f. (1.3.8)

PROOF. The existence of the mapping s is guaranteed by Theorem X.1.4.2; besides, only the inequality ≤ in (1.3.8) must be proved. From Lemma III.2.1.6, we can take d ∈ ℝⁿ such that

    y_t := x + td ∈ ri dom f  for t ∈ ]0, 1].

Then

    f(x + d) ≥ f(y_t) + ⟨s(y_t), x + d − y_t⟩

so that

    ⟨s(y_t), d⟩ ≤ [f(x + d) − f(y_t)] / (1 − t).

Then write

    f(y_t) + ⟨s(y_t), x − y_t⟩ = f(y_t) − t⟨s(y_t), d⟩ ≥ [f(y_t) − t f(x + d)] / (1 − t)

and let t ↓ 0; use for example Proposition IV.1.2.5 to see that the right-hand side tends to f(x). □
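For f = |·| and the selection s(y) = sign(y) (here ri dom f = ℝ), the supremum in (1.3.8) is attained and recovers f exactly. A Python sketch (the sampling set and the test points are arbitrary choices, not from the text):

```python
def f(x): return abs(x)
def s(y): return -1.0 if y < 0 else (1.0 if y > 0 else 0.0)   # a selection in df(y)

ys = [(k - 500) * 0.01 for k in range(1001)]   # finite sample of ri dom f = R
for x in (-3.0, -0.255, 0.0, 1.3):
    approx = max(f(y) + s(y) * (x - y) for y in ys)
    assert abs(approx - f(x)) < 1e-9           # sup of linearizations equals f(x)
```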

2 The Approximate Directional Derivative

Throughout this section, x is fixed in dom f. As a (nonempty) closed convex set, the approximate subdifferential ∂_ε f(x) then has a support function, for any ε > 0. We denote it by f′_ε(x, ·):

    ℝⁿ ∋ d ↦ f′_ε(x, d) := σ_{∂_ε f(x)}(d) = sup{⟨s, d⟩ : s ∈ ∂_ε f(x)}, (2.0.1)

a closed sublinear function. The notation f′_ε is motivated by §VI.1.1: f′(x, ·) supports ∂f(x), so it is natural to denote by f′_ε(x, ·) the function supporting ∂_ε f(x). The present section is devoted to a study of this support function, which is obtained via an "approximate difference quotient".

2.1 The Support Function of the Approximate Subdifferential

Theorem 2.1.1 For x ∈ dom f and ε > 0, the support function of ∂_ε f(x) is

    ℝⁿ ∋ d ↦ f′_ε(x, d) = inf_{t>0} [f(x + td) − f(x) + ε] / t, (2.1.1)

which will be called the ε-directional derivative of f at x.

PROOF. We use Proposition 1.3.3: embedding the set ∂_ε f(x) in the larger space ℝⁿ × ℝ, we view it as the intersection of epi g*_x with the horizontal hyperplane H_ε (rotate and contemplate thoroughly Fig. 1.3.1). Correspondingly, f′_ε(x, d) is the value at (d, 0) of the support function of our embedded set epi g*_x ∩ H_ε:

    f′_ε(x, d) = (I_{epi g*_x} + I_{H_ε})*(d, 0).

Our aim is then to apply the calculus rule X.2.3.2, so we need to check the qualification assumption (X.2.3.1). The relative interior of the hyperplane H_ε being H_ε itself, what we have to do is to find in ri epi g*_x a point of the form (s, ε). Denote by A the linear operator which, to (s, α) ∈ ℝⁿ × ℝ, associates A(s, α) = α: we want

    ε ∈ A(ri epi g*_x) = ri[A(epi g*_x)],

where the last equality comes from Proposition III.2.1.12. But we know from Theorem 1.1.2 and Proposition 1.3.3 that A(epi g*_x) is ℝ₊ or ]0, +∞[; in both cases, its relative interior is ]0, +∞[, which contains the given ε > 0. Our assumption is checked.
As a result, the following problem

    min{σ_{H_ε}(p, u) + σ_{epi g*_x}(q, v) : p + q = d, u + v = 0} = f′_ε(x, d) (2.1.2)

has an optimal solution. Now look at its minimand:
- unless p = 0, σ_{H_ε}(p, u) = +∞; and σ_{H_ε}(0, u) = uε;
- unless v ≤ 0, σ_{epi g*_x}(q, v) = +∞ — see (1.3.4).
In a word,

    f′_ε(x, d) = min{σ_{epi g*_x}(d, −u) + uε : u ≥ 0}.

Remembering that, as a support function, u ↦ σ_{epi g*_x}(d, −u) is in particular continuous for u ↓ 0, and using its value (1.3.4), we finally obtain (2.1.1). □
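Both sides of (2.1.1) can be compared numerically for f = |·|, using the interval formula of §1.1 for ∂_ε f(x). A Python sketch (the finite t-grid stands in for ]0, +∞[, so cases whose infimum is only approached as t → ∞ agree up to a small discretization error):

```python
def eps_dir_deriv(x, d, eps, ts):
    """inf over the grid ts of the quotient (|x + t*d| - |x| + eps) / t, as in (2.1.1)."""
    return min((abs(x + t * d) - abs(x) + eps) / t for t in ts)

def support_eps_subdiff_abs(x, d, eps):
    """Support function of the interval formula for the eps-subdifferential of |.| at x."""
    if x < -eps / 2:   lo, hi = -1.0, -1.0 - eps / x
    elif x <= eps / 2: lo, hi = -1.0, 1.0
    else:              lo, hi = 1.0 - eps / x, 1.0
    return max(lo * d, hi * d)

ts = [k * 0.01 for k in range(1, 100001)]          # t in (0, 1000]
for x, d, eps in ((2.0, -1.0, 0.5), (2.0, 1.0, 0.5), (-0.1, 1.0, 1.0)):
    assert abs(eps_dir_deriv(x, d, eps, ts) - support_eps_subdiff_abs(x, d, eps)) < 1e-2
```

In the first case the infimum is attained at t_ε = 2 (the "contact" of Fig. 2.1.2 below); in the other two it is only approached as t → ∞, in line with the discussion of r_ε.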

The above proof strongly suggests to consider the change of variable t = 1/u in (2.1.1), and to set

    r_ε(u) := u[f(x + d/u) − f(x) + ε]  if u > 0,
    r_ε(u) := f′∞(d) = σ_{dom f*}(d)    if u = 0, (2.1.3)
    r_ε(u) := +∞                        if u < 0.

When ε > 0, this function does achieve its minimum on ℝ₊; in the t-language, this means that the case t ↓ 0 never occurs in the minimization problem (2.1.1). On the other hand, 0 may be the unique minimum point of r_ε, i.e. (2.1.1) may have no solution "at finite distance".

The importance of the assumption ε > 0 cannot be overemphasized; Theorem X.2.3.2 cannot be invoked without it, and the proof breaks down. Consider Fig. 2.1.1 for a counter-example: B(0, 1) is the unit Euclidean ball of ℝ², f its indicator function and x = (0, −1). Clearly enough, the directional derivative f′(x, d) for d = (γ, δ) ≠ 0 is

    f′(x, d) = 0 if δ > 0,  +∞ if δ ≤ 0,

which cannot be a support function: it is not even closed. On the other hand, the infimum in (2.1.1), which is obtained for x + td on the unit sphere, is easy to compute: for d ≠ 0,

    f′_ε(x, d) = ε‖d‖²/(2δ) if δ > 0,  +∞ if δ ≤ 0.

It is the support function of the set

    {(s₁, s₂) ∈ ℝ² : s₁² + 2εs₂ ≤ ε²} (2.1.4)

which, in view of Theorem 2.1.1, is the ε-normal set to B(0, 1) at x.

Fig. 2.1.1. The approximate subdifferential of an indicator function

Note: to obtain (2.1.4), it is fun to use elementary geometry (carry out in Fig. 2.1.1 the inversion operation from the pole x), but the argument of Remark VI.6.3.7 is more systematic, and is also interesting.

Remark 2.1.2 We have here an illustration of Example X.2.4.3: closing f′(x, ·) is just what is needed to obtain the support function of ∂f(x). For x ∈ dom ∂f, the function

    ℝⁿ ∋ d ↦ σ_{∂f(x)}(d) = sup{⟨s, d⟩ : s ∈ ∂f(x)}

is the closure of the function

    ℝⁿ ∋ d ↦ f′(x, d) = lim_{t↓0} [f(x + td) − f(x)] / t.

This property appears also in the proof of Theorem 2.1.1: for ε = 0, the closure operation must still be performed after (2.1.2) is solved. □

Figure 2.1.2 illustrates (2.1.1) in a less dramatic situation: for ε > 0, the line representing t ↦ f(x) − ε + t f′_ε(x, d) supports the graph of t ↦ f(x + td), not at t = 0 but at some point t_ε > 0; among all the slopes joining (0, f(x) − ε) to an arbitrary point on the graph of f(x + ·d), the right-hand side of (2.1.1) is the smallest possible one.
On the other hand, Fig. 2.1.2 is the trace in ℝ × ℝ of a picture in ℝⁿ × ℝ: among all the possible hyperplanes passing through (x, f(x) − ε) and supporting epi f, there is one touching gr f somewhere along the given d; this hyperplane therefore gives the maximal slope along d, which is the value (2.0.1). The contact x + t_ε d plays an important role in minimization algorithms and we will return to it later.

Fig. 2.1.2. The ε-directional derivative and the minimizing t_ε

The same picture illustrates Theorem 1.3.6: consider the point x + t_ε d as fixed and call it y. Now, for arbitrary δ ≥ 0, draw a hyperplane supporting epi f and passing through (0, f(x) − δ). Its altitude at y is f(x) − δ + f′_δ(x, y − x) which, by definition of a support, is not larger than f(y), but equal to it when δ = ε.
In summary: fix (x, d) ∈ dom f × ℝⁿ and consider a pair (ε, y) ∈ ]0, +∞[ × ℝⁿ linked by the relation y = x + t_ε d.
- To obtain y from ε, use (2.1.1): as a function of the "horizontal" variable t > 0, draw the line passing through (x, f(x) − ε) and (x + td, f(x + td)); the resulting slope must be minimized.
- To obtain ε from y, use (1.3.7): as a function of the "vertical" variable δ ≥ 0, draw a support passing through (x, f(x) − δ); its altitude at y must be maximized.

Example 2.1.3 Take again the convex quadratic function of Example 1.2.2: f′_ε(x, d) is the optimal value of the one-dimensional minimization problem

    inf_{t>0} [½⟨Qd, d⟩t² + ⟨∇f(x), d⟩t + ε] / t.

If ⟨Qd, d⟩ = 0, this infimum is ⟨∇f(x), d⟩. Otherwise, it is attained at

    t_ε = √(2ε / ⟨Qd, d⟩)

and

    f′_ε(x, d) = ⟨∇f(x), d⟩ + √(2ε⟨Qd, d⟩).

This is a general formula: it also holds for ⟨Qd, d⟩ = 0.
It is interesting to observe that [f′_ε(x, d) − f′(x, d)]/ε → +∞ when ε ↓ 0, but

    (1/ε) · ½[f′_ε(x, d) − f′(x, d)]² ≡ ⟨Qd, d⟩.

This suggests that, for C² convex functions,

    lim_{ε↓0} (1/ε) · ½[f′_ε(x, d) − f′(x, d)]² = f″(x, d),

where

    f″(x, d) := lim_{t↓0} [f(x + td) − f(x) − t f′(x, d)] / (½t²) = lim_{t↓0} [f′(x + td, d) − f′(x, d)] / t

is the second derivative of f at x in the direction d. Here is one more motivation for the approximate subdifferential: it accounts for second-order behaviour of f. □
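In one dimension the example is easy to confirm numerically, together with the second-order identity ½[f′_ε(x, d) − f′(x, d)]²/ε = ⟨Qd, d⟩. A Python sketch (q, x, d and ε are arbitrary sample values, not from the text):

```python
import math

q, x, d, eps = 2.0, 1.0, 1.0, 0.3     # f(z) = 0.5*q*z**2, so <Qd,d> = q*d*d

def q_eps(t):
    """Approximate difference quotient (f(x + t*d) - f(x) + eps) / t."""
    return (0.5 * q * (x + t * d) ** 2 - 0.5 * q * x ** 2 + eps) / t

closed_form = q * x * d + math.sqrt(2.0 * eps * q * d * d)
t_eps = math.sqrt(2.0 * eps / (q * d * d))
numeric = min(q_eps(k * 1e-4) for k in range(1, 200001))   # t in (0, 20]
assert abs(numeric - closed_form) < 1e-3
assert abs(q_eps(t_eps) - closed_form) < 1e-12             # minimized at t_eps
# second-order identity: 0.5 * (f'_eps - f')**2 / eps == <Qd, d>
assert abs(0.5 * (closed_form - q * x * d) ** 2 / eps - q * d * d) < 1e-9
```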

2.2 Properties of the Approximate Difference Quotient

For x ∈ dom f, the function q_ε encountered in (2.1.1) and defined by

    ]0, +∞[ ∋ t ↦ q_ε(t) := [f(x + td) − f(x) + ε] / t (2.2.1)

is called the approximate difference quotient. In what follows, we will set q := q₀; to avoid the trivial case q_ε ≡ +∞, we assume

    [x, x + td] ⊂ dom f  for some t > 0.

Our aim is now to study the minimization of q_ε, a relevant problem in view of Theorem 2.1.1.

(a) Behaviour of q_ε. In order to characterize the set of minimizers

    T_ε := {t > 0 : q_ε(t) = f′_ε(x, d)}, (2.2.2)

two numbers are important. One is

    f′∞(d) := sup{q(t) : t > 0},

which describes the behaviour of f(x + td) for t → +∞ (see §IV.3.2 and Example X.2.4.6 for the asymptotic function f′∞). The other,

    t_ℓ := sup{t ≥ 0 : f(x + td) − f(x) = t f′(x, d)},

concerns the case t ↓ 0: t_ℓ = 0 if f(x + td) has a "positive curvature for t ↓ 0". When t_ℓ = +∞, i.e. f is affine on the half-line x + ℝ₊d, we have q_ε(t) = f′(x, d) + ε/t; then it is clear that f′_ε(x, d) = f′(x, d) = f′∞(d) and that T_ε is empty for ε > 0.

Example 2.2.1 Before making precise statements, let us see in Fig. 2.2.1 what can be expected, with the help of the example

    0 ≤ t ↦ f(x+td) := max {t, 3t − 1}.

The lower part of the picture gives the correspondence ε ↦ T_ε, with the same abscissa-axis as the upper part (namely t). Some important properties are thus introduced:
- As already known, q increases from f'(x,d) = 1 to f'_∞(d) = 3;
- indeed, q(t) is constantly equal to its minimal value f'(x,d) for 0 < t ≤ 1/2, and t_ℓ = 1/2;
2 The Approximate Directional Derivative 107

Fig. 2.2.1. A possible q_ε and T_ε

- f'_ε(x,d) stays between f'(x,d) and f'_∞(d), and reaches this last value for ε ≥ 1;
- T₀ is the segment ]0, 1/2]; T₁ is the half-line [1/2, +∞[, and T_ε is empty for ε > 1. □
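The behaviour described in this example can be reproduced numerically. Below is a quick grid computation (not from the book; the closed form min(1 + 2ε, 3) for f'_ε is our own hand calculation for this particular φ):

```python
# A quick numerical look (not from the book) at the example
# phi(t) = f(x + t*d) = max(t, 3t - 1), f(x) = phi(0) = 0.  A hand
# computation gives f'_eps(x,d) = min(1 + 2*eps, 3): the minimum of
# q_eps sits at t = 1/2 for 0 < eps <= 1, while for eps > 1 the
# infimum 3 is only approached as t -> +infinity (T_eps is empty).

phi = lambda t: max(t, 3.0 * t - 1.0)

def f_eps(eps, t_max=1000.0, n=400000):
    # grid minimization of q_eps(t) = (phi(t) + eps)/t
    ts = (1e-3 + k * (t_max / n) for k in range(n))
    return min((phi(t) + eps) / t for t in ts)

vals = {eps: f_eps(eps) for eps in (0.25, 0.5, 1.0, 2.0)}
print(vals)   # ~ 1.5, 2.0, 3.0, and just above 3 for eps = 2
```

For ε = 2 the grid minimum stays strictly above 3 (it equals 3 + 1/t at the largest grid point), which is the numerical signature of T_ε being empty for ε > ε^∞ = 1.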

This example reveals another important number associated with large t, which we will call ε^∞ (equal to 1 in the example). The statements (i) and (ii) in the next result are already known, but their proof is more natural in the t-language.

Proposition 2.2.2 The notations and assumptions are as above.

(i) If ε > 0, then q_ε(t) → +∞ when t ↓ 0.
(ii) When t → +∞, q_ε(t) → f'_∞(d), independently of ε ≥ 0.
(iii) The set

    E := {ε ≥ 0 : f'_ε(x,d) < f'_∞(d)}   (2.2.3)

is empty if and only if t_ℓ = +∞. Its upper bound

    sup E =: ε^∞ ∈ [0, +∞]

satisfies

    T_ε ≠ ∅ if ε < ε^∞ and T_ε = ∅ if ε > ε^∞.

PROOF. For ε > 0, let s₀ ∈ ∂_{ε/2} f(x): for all t > 0,

    q_ε(t) ≥ [f(x) + t⟨s₀,d⟩ − ε/2 − f(x) + ε] / t = ⟨s₀,d⟩ + ε/(2t),

and this takes care of (i). For t → +∞, q_ε(t) = q(t) + ε/t has the same limit as q(t), so (ii) is straightforward.
Thus f'_ε(x,d) ≤ f'_∞(d), and non-vacuousness of T_ε depends only on the behaviour of q_ε at infinity. To say that E is empty means, from its definition (2.2.3):

    f'_∞(d) ≤ inf_{ε≥0} inf_{t>0} q_ε(t) = inf_{t>0} inf_{ε≥0} q_ε(t) = f'(x,d).

Because f'(x,·) ≤ f'_∞, this in turn means that q(t) is constant for t > 0, or that t_ℓ = +∞.
Now observe that ε ↦ f'_ε(x,d) is increasing (just as ∂_ε f); so E, when nonempty, is an interval containing 0: ε < ε^∞ implies ε ∈ E, hence T_ε ≠ ∅. Conversely, if ε > ε^∞, take ε' in ]ε^∞, ε[ (so ε' ∉ E) and t > 0 arbitrary:

    q_ε(t) = q_{ε'}(t) + (ε − ε')/t > q_{ε'}(t) ≥ f'_{ε'}(x,d) = f'_∞(d) = f'_ε(x,d);

t cannot be in T_ε, which is therefore empty. □

The example t ↦ f(x+td) = √(1+t²) illustrates the case ε^∞ < +∞ (here ε^∞ = 1); it corresponds to f(x+·d) having the asymptote t ↦ f(x) − ε^∞ + f'_∞(d)t. Also, one directly sees in (2.2.3) that ε^∞ = +∞ whenever f'_∞(d) is infinite. With the example f of (1.1.5) (and taking d = 1), we have t_ℓ = 0, f'_∞(1) = 0 and ε^∞ = +∞.

(b) The Closed Convex Function r_ε. For a more accurate study of T_ε, we now use the function r_ε of (2.1.3), obtained via the change of variable u = 1/t: r_ε(u) = q_ε(1/u) for u > 0. It is the trace on ℝ of the function (1.3.4), which is known to be in Conv(ℝⁿ × ℝ); therefore

    r_ε ∈ Conv ℝ.

We know from Theorem 2.1.1 that r_ε is minimized on a closed interval

    U_ε := {u ≥ 0 : r_ε(u) = f'_ε(x,d)}   (2.2.4)

which, in view of Proposition 2.2.2(i), is nonempty and bounded if ε > 0. Likewise,

    T_ε = {t = 1/u : u ∈ U_ε and u > 0}

is a closed interval, empty if U_ε = {0}, a half-line if 0 ∈ U_ε ≠ {0}. Knowing that U_ε is a closed interval, it will be convenient to set, with the conventions 1/0 = +∞, 1/∞ = 0:

    t̲_ε := 1/(sup U_ε) and t̄_ε := 1/(inf U_ε).   (2.2.5)

There remains to characterize T_ε, simply by expressing the minimality condition

    0 ∈ ∂r_ε(u).
Lemma 2.2.3 Let φ be the convex function 0 ≤ t ↦ φ(t) := f(x+td). For ε ≥ 0 and 0 < u ∈ dom r_ε, the subdifferential of r_ε at u is given by

    ∂r_ε(u) = [r_ε(u) − ∂φ(1/u)] / u .   (2.2.6)

PROOF. The whole point is to compute the subdifferential of the convex function u ↦ ψ(u) := uφ(1/u), and this amounts to computing its one-sided derivatives. Take positive u' = 1/t' and u = 1/t, with u ∈ dom r_ε (hence φ(1/u) < +∞), and compute the difference quotient of ψ (cf. the proof of Theorem I.1.1.6):

    [u'φ(1/u') − uφ(1/u)] / (u' − u) = φ(t) − t [φ(t') − φ(t)] / (t' − t).

Letting u' ↓ u, i.e. t' ↑ t, we obtain the right-derivative

    D₊ψ(u) = φ(t) − t D₋φ(t);

the left-derivative is obtained likewise. Thus, the subdifferential of ψ at u is the closed interval

    ∂ψ(u) = φ(t) − t ∂φ(t).

Knowing that r_ε(u') = ψ(u') − u'[f(x) − ε] for all u' > 0, we readily obtain

    ∂r_ε(u) = φ(t) − t∂φ(t) − f(x) + ε = r_ε(u)/u − t ∂φ(t). □

The minimality condition 0 ∈ ∂r_ε(u) is therefore r_ε(u) ∈ ∂φ(1/u). In t-language, we say: t ∈ T_ε if and only if there is some σ ∈ ∂φ(t) such that

    f(x+td) − tσ = f(x) − ε.   (2.2.7)
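In a smooth case, condition (2.2.7) can be solved explicitly and compared with a direct minimization. The following sketch (not from the book; the choice φ(t) = t² is ours) does exactly that:

```python
import math

# Sketch (not from the book) of characterization (2.2.7) in a smooth case:
# phi(t) = f(x + t*d) = t**2 with f(x) = phi(0) = 0.  Then phi'(t) = 2t and
# (2.2.7), phi(t) - t*phi'(t) = f(x) - eps, reads -t**2 = -eps, so the
# optimal stepsize should be t = sqrt(eps).

eps = 0.7
t_star = math.sqrt(eps)                  # root of (2.2.7)

q_eps = lambda t: (t * t + eps) / t      # the approximate difference quotient
grid = [1e-3 + k * 1e-3 for k in range(5000)]
t_num = min(grid, key=q_eps)             # numerical minimizer of q_eps

print(t_star, t_num)   # both close to sqrt(0.7) ~ 0.8367
```

The two stepsizes coincide up to the grid resolution, as Lemma 2.2.3 predicts.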


Remark 2.2.4 Using the calculus rule of Theorem 3.2.1 below, we will see that ∂φ can be expressed with the help of ∂f. More precisely, if x + ℝ₊d meets ri dom f, we have

    ∂r_ε(u) = [r_ε(u) − ⟨∂f(x + d/u), d⟩] / u .

Then Fig. 2.1.2 gives a nice interpretation of (2.2.7): associated with t > 0 and s ∈ ∂f(x+td), consider the affine function

    0 ≤ τ ↦ ℓ(τ) := f(x+td) + ⟨s,d⟩(τ − t),

whose graph supports epi φ at τ = t. Its value ℓ(0) at the vertical axis τ = 0 is a subderivative of u ↦ uφ(1/u) at u = 1/t, and t is optimal in (2.1.1) when ℓ(0) reaches the given f(x) − ε. In this case, T_ε is the contact-set between gr ℓ and gr φ. Note: convexity and Proposition 2.2.2(iii) tell us that f(x) − ε^∞ ≤ ℓ(0) ≤ f(x). □

Let us summarize the results of this section concerning the optimal set T_ε, or U_ε of (2.2.4).
- First, we have the somewhat degenerate case t_ℓ = +∞, meaning that f is affine on the half-line x + ℝ₊d. This can be described by one of the following equivalent statements:
    f'(x,d) = f'_∞(d);
    f'_ε(x,d) = f'(x,d) for all ε > 0;
    ∀t > 0, q_ε(t) > f'_∞(d) for all ε > 0;
    T_ε = ∅ for all ε > 0;
    U_ε = {0} for all ε > 0.
- The second situation, more interesting, is when t_ℓ < +∞; then three essentially different cases may occur, according to the value of ε.
- When 0 < ε < ε^∞, one has equivalently
    f'(x,d) < f'_ε(x,d) < f'_∞(d);
    ∃t > 0 such that q_ε(t) < f'_∞(d);
    T_ε is a nonempty compact interval;
    0 ∉ U_ε.
- When ε^∞ < ε:
    f'_ε(x,d) = f'_∞(d);
    ∀t > 0, q_ε(t) > f'_∞(d);
    T_ε = ∅;
    U_ε = {0}.
- When ε = ε^∞ (if ε^∞ < +∞):
    f'_ε(x,d) = f'_∞(d);
    T_ε is empty or unbounded;
    0 ∈ U_ε.
Note: in the last case, T_ε nonempty but unbounded means that f(x+td) touches its asymptote f(x) − ε + t f'_∞(d) for t large enough.

2.3 Behaviour of f'_ε and T_ε as Functions of ε


In this section, we assume again [x, x+td] ⊂ dom f for some t > 0. From the fundamental relation (2.1.1), the function ε ↦ −f'_ε(x,d) appears as a supremum of affine functions, and is therefore closed and convex: it is the conjugate of some other function. Following our general policy, we extend it to the whole real line, setting

    v(ε) := −f'_ε(x,d) if ε ≥ 0, +∞ otherwise.

Then v ∈ Conv ℝ: in fact, dom v is either [0, +∞[ or ]0, +∞[. When ε ↓ 0, v(ε) tends to −f'(x,d) ∈ ℝ ∪ {+∞}.

Lemma 2.3.1 With r := r₀ of (2.1.3) and for all real numbers ε and u:

    v(ε) = r*(−ε), i.e. r(u) = v*(−u).

PROOF. For ε ≥ 0, just apply the definitions:

    −v(ε) = f'_ε(x,d) = inf_{u ∈ dom r} [r(u) + εu] = −r*(−ε).

For −ε > 0, we have trivially (remember that f'(x,d) < +∞)

    r*(−ε) ≥ lim_{u→+∞} [−εu − r(u)] = +∞ − f'(x,d) = +∞ = v(ε). □

Observe that the case u = 1 is just Theorem 1.3.6; u = 0 gives

    [f'_∞(d) =] r(0) = sup_{ε>0} f'_ε(x,d) [= lim_{ε→+∞} f'_ε(x,d)],

and this confirms the relevance of the notation f'_∞ for an asymptotic function.
Theorem 2.3.2 With the notation of Lemma 2.3.1 and (2.2.4),

    ∂v(ε) = −U_ε for all ε ≥ 0.   (2.3.1)

Furthermore, ε ↦ f'_ε(x,d) is strictly increasing on [0, ε^∞[ and is constantly equal to f'_∞(d) for ε ≥ ε^∞.

PROOF. By definition and using Lemma 2.3.1, −u ∈ ∂v(ε) if and only if

    −uε = v(ε) + v*(−u) = −f'_ε(x,d) + r(u), i.e. f'_ε(x,d) = r_ε(u),

which exactly means that u ∈ U_ε. The rest follows from the conclusions of §2.2. □

Figure 2.3.1 illustrates the graph of ε ↦ f'_ε(x,d) in the example of Fig. 2.2.1. Since the derivative of ε ↦ q_ε(t) is 1/t, a formal application of Theorem VI.4.4.2 would give (2.3.1) directly; but the assumptions of that theorem are hardly satisfied, and certainly not for ε ≥ ε^∞.

Fig. 2.3.1. The ε-directional derivative as a function of ε

Remark 2.3.3 Some useful formulae follow from (2.3.1): whenever T_ε ≠ ∅, we have

    f'_η(x,d) ≤ f'_ε(x,d) + (η − ε)/t for all t ∈ T_ε,
    f'_η(x,d) = f'_ε(x,d) + (η − ε)/t̄_ε − o(η − ε) if η ≥ ε,
    f'_η(x,d) = f'_ε(x,d) + (η − ε)/t̲_ε − o(ε − η) if η ≤ ε,

where t̲_ε and t̄_ε are respectively the smallest and largest elements of T_ε, and the remainder terms o(·) are nonnegative (of course, t̲_ε = t̄_ε except possibly for countably many values of ε). We also have the integral representation (cf. (I.4.2.6))

    f'_ε(x,d) = f'(x,d) + ∫₀^ε u_α dα for all 0 ≤ ε < ε^∞, with u_α ∈ U_α. □


A natural question is now: what happens when ε ↓ 0? We already know that f'_ε(x,d) → f'(x,d), but at what speed? A qualitative answer is as follows (see also Example 2.1.3).

Proposition 2.3.4 Assuming −∞ < f'(x,d), there holds

    lim_{ε↓0} [f'_ε(x,d) − f'(x,d)] / ε = 1/t_ℓ ∈ [0, +∞].

PROOF. We know from (2.2.5) and (2.3.1) that ∂v(0) = −U₀ = ]−∞, −1/t_ℓ], so v'₊(0) = −1/t_ℓ and everything comes from the elementary results of §I.4.2. □
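Proposition 2.3.4 is easy to observe numerically. The sketch below (not from the book) reuses the function of Example 2.2.1, for which t_ℓ = 1/2:

```python
# Numerical illustration (not from the book) of Proposition 2.3.4 on the
# function of Example 2.2.1, phi(t) = max(t, 3t - 1): there t_l = 1/2, so
# the quotient [f'_eps(x,d) - f'(x,d)]/eps should tend to 1/t_l = 2.

phi = lambda t: max(t, 3.0 * t - 1.0)

def f_eps(eps):
    ts = (1e-4 + k * 1e-4 for k in range(20000))   # grid on ]0, 2]
    return min((phi(t) + eps) / t for t in ts)

f_prime = 1.0                                      # f'(x, d) for this phi
ratios = [(f_eps(e) - f_prime) / e for e in (0.1, 0.01, 0.001)]
print(ratios)   # all close to 2
```

Here the ratio is essentially constant in ε because f'_ε is affine in ε near 0 for this piecewise affine example; for the quadratic of Example 2.1.3 (where t_ℓ = 0) the same ratio blows up like 1/√ε.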

For fixed x and d, use the notation of Remark 2.2.4 and look again at Fig. 2.1.2, with the results of the present section in mind: it is important to meditate on the correspondence between the horizontal set of stepsizes and the vertical set of f-decrements.
To any stepsize t ≥ 0 and slope ⟨s,d⟩ of a line supporting epi φ at (t, φ(t)) is associated a value

    ε_{t,s} := f(x) − ℓ(0) = f(x) − f(x+td) + t⟨s,d⟩ ≥ 0.   (2.3.2)

Likewise, to any decrement ε ≥ 0 from f(x) is associated a stepsize t_ε ≥ 0 via the slope f'_ε(x,d). This defines a pair of multifunctions t ↦ −∂r(1/t) and ε ↦ T_ε, inverse to each other, and monotone in the sense that

    for t ∈ T_ε and t' ∈ T_{ε'}, ε > ε' ⟺ t > t'.

To go analytically from t to ε, one goes first to the somewhat abstract set of inverse stepsizes u, from which ε is obtained by the duality correspondence of Lemma 2.3.1. See also the lower part of Fig. 2.2.1, for an instance of a mapping T (or rather its inverse).

To finish this section, we give an additional first-order development: there always holds the estimate

    f(x+td) = f(x) + t f'(x,d) + o(t),

and various ways exist to eliminate the remainder term o(t): one is (1.3.7); the mean-value Theorem VI.2.3.3 is more classical; here is a third one.

Proposition 2.3.5 With the hypotheses and notations given so far, assume t_ℓ < +∞ (i.e. f is not affine on the whole half-line x + ℝ₊d). For t > 0, there is a unique ε(t) ∈ [0, ε^∞[ such that

    f(x+td) = f(x) + t f'_{ε(t)}(x,d).   (2.3.3)

PROOF. We have to solve the equation in ε: v(ε) = −q(t). In view of the second part of Theorem 2.3.2, this equation has a unique solution provided that the right-hand side −q(t) is in the interval

    {v(α) : 0 ≤ α < ε^∞} = −[f'(x,d), f'_∞(d)[.

As a result, the unique ε(t) of (2.3.3) is

    ε(t) = 0 for t ∈ ]0, t_ℓ],   ε(t) = (−v)⁻¹(q(t)) for t > t_ℓ. □
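For a concrete instance, ε(t) can be computed in closed form and checked against a grid minimization. This is a sketch (not from the book; the choice φ(t) = t² is ours):

```python
# Illustration (not from the book) of Proposition 2.3.5 with
# phi(t) = f(x + t*d) = t**2, f(x) = 0.  Example 2.1.3 gives
# f'_eps(x,d) = 2*sqrt(eps), so solving (2.3.3), t**2 = t*f'_eps(x,d),
# for eps yields eps(t) = t**2/4.  We confirm by minimizing q_eps.

t = 1.2
eps_t = t * t / 4.0                                  # predicted eps(t) = 0.36

taus = (1e-4 + k * 1e-4 for k in range(100000))      # grid on ]0, 10]
f_eps_num = min((tau * tau + eps_t) / tau for tau in taus)

print(t * t, t * f_eps_num)   # both close to 1.44
```

With this ε(t), the chord value f(x+td) is recovered exactly as t·f'_{ε(t)}(x,d), which is the content of (2.3.3).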

Figure 2.3.2 emphasizes the difference between this ε(t) and the ε_{t,s} defined in (2.3.2).

Fig. 2.3.2. The numbers ε(t) and ε_{t,s}

3 Calculus Rules on the Approximate Subdifferential

In §VI.4, we developed a calculus to compute the subdifferential of a finite-valued convex function f constructed from other functions f_j. Here, we extend the results of §VI.4 to the case of ∂_ε f, with ε ≥ 0 and f ∈ Conv ℝⁿ. Some rudimentary such calculus has already been given in Proposition 1.3.1.
The reader may have observed that the approximate subdifferential is a global concept (in contrast to the exact subdifferential, which is purely local): for ε arbitrarily large, ∂_ε f(x) depends on the behaviour of f arbitrarily far from x. It can therefore be guessed that ∂_ε f has to depend on the global behaviour of the f_j's, while ∂f(x) depends exclusively on the ∂f_j(x)'s. It turns out that the conjugacy operation gives a convenient tool for taking this global behaviour into account. Indeed, knowing that f results from some operation on the f_j's, the characterization (1.2.1) in terms of f* shows that the whole issue is to determine the effect of the conjugacy on this operation: it is the set of calculus rules of §X.2 that is at stake.
Recall that all the functions involved are in Conv ℝⁿ.

3.1 Sum of Functions

From Theorem X.2.3.1, the conjugate of a sum of two functions f₁ + f₂ is the closure of the infimal convolution f₁* ⊞ f₂*. Expressing ∂_ε(f₁ + f₂)(x) will require an expression for this infimal convolution, which in turn requires the following basic assumption:

    When s ∈ dom(f₁+f₂)*, (f₁+f₂)*(s) = f₁*(p₁) + f₂*(p₂) for some p_i satisfying p₁ + p₂ = s.   (3.1.1)

This just expresses that the inf-convolution of f₁* and f₂* is exact at s = p₁ + p₂: the couple (p₁, p₂) actually minimizes the function (p₁, p₂) ↦ f₁*(p₁) + f₂*(p₂). Furthermore, we know (Theorem X.2.3.2) that this property holds under various conditions on f₁ and f₂; one of them is

    ri dom f₁ ∩ ri dom f₂ ≠ ∅,   (3.1.2)

which is slightly more stringent than the minimal assumption

    dom f₁ ∩ dom f₂ ≠ ∅ [⟺ f₁ + f₂ ≢ +∞].


Theorem 3.1.1 For any ε ≥ 0 and x ∈ dom(f₁+f₂) = dom f₁ ∩ dom f₂, there holds

    ∂_ε(f₁+f₂)(x) ⊃ ∪ {∂_{ε₁}f₁(x) + ∂_{ε₂}f₂(x) : ε_i ≥ 0, ε₁ + ε₂ ≤ ε}   (3.1.3)

with equality under assumption (3.1.1), for example if (3.1.2) holds.

PROOF. Let ε ≥ 0 be given. For arbitrary nonnegative ε₁, ε₂ with ε₁ + ε₂ ≤ ε, Definition 1.1.1 clearly implies

    ∂_{ε₁}f₁(x) + ∂_{ε₂}f₂(x) ⊂ ∂_ε(f₁+f₂)(x).

Conversely, take s ∈ ∂_ε(f₁+f₂)(x), i.e.

    (f₁+f₂)*(s) + (f₁+f₂)(x) − ⟨s, x⟩ ≤ ε.   (3.1.4)

This s is therefore in dom(f₁+f₂)* and we can apply (3.1.1): with the help of some p₁ and p₂, we write (3.1.4) as ε₁ + ε₂ ≤ ε, where we have set

    ε_i := f_i*(p_i) + f_i(x) − ⟨p_i, x⟩ for i = 1, 2.

We see that p_i ∈ ∂_{ε_i}f_i(x) for i = 1, 2, and the required converse inclusion is proved. □
Naturally, if f = Σ_{i=1}^m f_i, the right-hand side in (3.1.3) becomes

    ∪ {Σ_{i=1}^m ∂_{ε_i}f_i(x) : ε_i ≥ 0, Σ_{i=1}^m ε_i ≤ ε}.   (3.1.5)

Since ∂_ε f(x) increases with ε, the constraint Σ_i ε_i ≤ ε can be replaced by Σ_i ε_i = ε in (3.1.3) and (3.1.5). Also, when ε = 0, there is only one possibility for the ε_i in (3.1.3):

Corollary 3.1.2 For x ∈ dom f₁ ∩ dom f₂, there holds

    ∂(f₁+f₂)(x) ⊃ ∂f₁(x) + ∂f₂(x),

with equality under, for example, assumption (3.1.2). □
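The easy inclusion of Theorem 3.1.1 can be exercised numerically on the real line. Below is a hedged sketch (not from the book; the functions, the point x = 1, and the particular subgradients are our own choices, the ε-subdifferentials [1−ε, 1] and [2−2√ε, 2+2√ε] being computed by hand):

```python
# Hedged one-dimensional check (not from the book) of the easy inclusion in
# Theorem 3.1.1: if s1 is an eps1-subgradient of f1 at x and s2 an
# eps2-subgradient of f2 at x, then s1 + s2 is an (eps1 + eps2)-subgradient
# of f1 + f2 at x.  Here f1 = |.|, f2 = (.)**2 and x = 1.

f1 = abs
f2 = lambda y: y * y
x = 1.0
ys = [-10.0 + k * 0.01 for k in range(2001)]       # test points in [-10, 10]

def viol(f, s, eps):
    # worst violation of f(y) >= f(x) + s*(y - x) - eps over the grid
    return max(f(x) + s * (y - x) - eps - f(y) for y in ys)

s1, eps1 = 0.6, 0.5      # 0.6 lies in the eps1-subdifferential [0.5, 1]
s2, eps2 = 1.5, 0.25     # 1.5 lies in the eps2-subdifferential [1, 3]
assert viol(f1, s1, eps1) <= 0 and viol(f2, s2, eps2) <= 0

f12 = lambda y: f1(y) + f2(y)
assert viol(f12, s1 + s2, eps1 + eps2) <= 0        # the inclusion at work
```

The converse direction, splitting a given ε-subgradient of the sum into two pieces, is exactly what assumption (3.1.1) makes possible.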



To emphasize the need of an assumption such as (3.1.2), consider in ℝ² the Euclidean balls C₁ and C₂, of radius 1 and centered respectively at (−1,0) and (1,0): they meet at the unique point x = (0,0). Then take for f_i the indicator function of C_i, so f is the indicator function I_{{0}}. We have ∂f(0) = ℝ², while ∂f₁(0) + ∂f₂(0) = ℝ × {0}.

Example 3.1.3 (ε-Normal Sets to Closed Convex Polyhedra) Let C be a closed convex polyhedron described by its supporting hyperplanes:

    H_j⁻ := {x ∈ ℝⁿ : ⟨s_j, x⟩ ≤ r_j} for j = 1,…,m,
    C := ∩_j H_j⁻ = {x ∈ ℝⁿ : ⟨s_j, x⟩ ≤ r_j for j = 1,…,m}.   (3.1.6)

Its indicator function is clearly

    I_C = Σ_{j=1}^m I_{H_j⁻}.

Let us compute the ε-subdifferential of this function, i.e. the ε-normal set of Definition 1.1.3. The approximate normal set to H_j⁻ has been given in Example 1.2.5; we obtain for x ∈ C

    N_{C,ε}(x) = {Σ_{j=1}^m μ_j s_j : μ_j ≥ 0, Σ_{j=1}^m μ_j c_j(x) ≤ ε},   (3.1.7)

where we have set c_j(x) := r_j − ⟨s_j, x⟩ ≥ 0. In fact, I_C is a sum of polyhedral functions, and Theorem X.2.3.2, with its qualification assumption (X.2.3.Q_j), does imply (3.1.1) in this case.
Still for x ∈ C, set K(x) := max_j c_j(x). Clearly enough

    (ε/K(x)) co {s₁,…,s_m} ⊂ N_{C,ε}(x)

(and the normal cone N_C(x) can even be added to the left-hand set). Likewise, set k(x) := min_j c_j(x). If k(x) > 0,

    N_{C,ε}(x) ⊂ (ε/k(x)) co {s₁,…,s_m}. □


Example 3.1.4 (Approximate Minimality Conditions) Let there be given a convex function f : ℝⁿ → ℝ and a nonempty closed convex set C; assume that

    f̄_C := inf {f(x) : x ∈ C}   (3.1.8)

is not −∞. The ε-minimizers of f on C are those x ∈ C such that f(x) ≤ f̄_C + ε; clearly enough, an ε-minimizer is an x such that (remember Theorem 1.1.5)

    (f + I_C)(x) ≤ inf_{ℝⁿ} (f + I_C) + ε, or equivalently 0 ∈ ∂_ε(f + I_C)(x).

Here dom f = ℝⁿ: we conclude from Theorem 3.1.1 that an ε-minimizer is an x such that

    0 ∈ ∂_α f(x) + N_{C,ε−α}(x) for some α ∈ [0, ε],

i.e. f has at x an α-subgradient whose opposite lies in the (ε−α)-normal set to C. The situation simplifies in some cases.
- Set ε = 0 to obtain the standard minimality condition of (VII.1.1.3):

    x solves (3.1.8) ⟺ −∂f(x) ∩ N_C(x) ≠ ∅.

- Another case is when C = {x₀} + H is an affine manifold. Then N_{C,β}(x) = H^⊥ for all β ≥ 0 and ∂_α f(x) increases with α: our ε-minimality condition becomes −∂_ε f(x) ∩ H^⊥ ≠ ∅.
- Also, when f = ⟨s₀,·⟩ is linear, ∂_α f(x) = ∂f(x) = {s₀} for all α ≥ 0, while N_{C,β}(x) increases with β; so our ε-minimality condition is: −s₀ ∈ N_{C,ε}(x), a triviality if we replace N_{C,ε}(x) by its definition (1.1.6).
- If C is a closed convex polyhedron as in (3.1.6), its ε-normal set can be specified as in (3.1.7). Take for (3.1.8) a linear programming problem

    min {⟨s₀, x⟩ : ⟨s_j, x⟩ ≤ r_j for j = 1,…,m}.

An ε-minimizer is then an x ∈ C for which there exists μ = (μ₁,…,μ_m) ∈ ℝᵐ such that

    Σ_{j=1}^m μ_j s_j + s₀ = 0,
    Σ_{j=1}^m μ_j r_j + ⟨s₀, x⟩ ≤ ε,
    μ_j ≥ 0 for j = 1,…,m. □
We leave it as an exercise to redo the above examples when

    C := {x ∈ K : ⟨s_j, x⟩ = r_j for j = 1,…,m}

is described in standard form (K being a closed convex polyhedral cone, say the nonnegative orthant).

Remark 3.1.5 In the space ℝ^{n₁} × ℝ^{n₂} equipped with the scalar product of a product-space, take a decomposable function:

    f(x₁, x₂) := f₁(x₁) + f₂(x₂), with f_i ∈ Conv ℝ^{n_i}.

Because of the calculus rule X.1.3.1(ix), the basic assumption (3.1.1) holds automatically, so we always have

    ∂_ε f(x₁, x₂) = ∪ {∂_{ε₁}f₁(x₁) × ∂_{ε₂}f₂(x₂) : ε_i ≥ 0, ε₁ + ε₂ ≤ ε};

but beware that this set is not a product-set in ℝ^{n₁} × ℝ^{n₂}, except for ε = 0. □

3.2 Pre-Composition with an Affine Mapping

Given g ∈ Conv ℝᵐ and an affine mapping A : ℝⁿ → ℝᵐ (Ax = A₀x + y₀ ∈ ℝᵐ with A₀ linear), take f := g ∘ A ∈ Conv ℝⁿ. As in §3.1, we need an assumption, which is in this context:

    When s ∈ dom(g∘A)*, (g∘A)*(s) = g*(p) − ⟨p, y₀⟩ for some p such that A₀*p = s   (3.2.1)

(⟨·,·⟩ will denote indifferently the scalar product in ℝⁿ or ℝᵐ). As was the case with (3.1.1), Theorem X.2.2.1 tells us that the above p actually minimizes the function p ↦ g*(p) − ⟨p, y₀⟩ on the affine manifold of equation A₀*p = s. Furthermore we know (Theorem X.2.2.3) that (3.2.1) holds under various conditions on g and A₀; one of them is

    A(ℝⁿ) ∩ ri dom g ≠ ∅.   (3.2.2)

Once again, note that removing the words "relative interior" from (3.2.2) amounts to assuming g ∘ A ≢ +∞.

Theorem 3.2.1 Let g and A be defined as above. For all ε ≥ 0 and x such that Ax ∈ dom g, there holds

    ∂_ε(g∘A)(x) ⊃ A₀* ∂_ε g(Ax)   (3.2.3)

with equality under assumption (3.2.1), for example if (3.2.2) holds.

PROOF. Fix x such that Ax ∈ dom g and let p ∈ ∂_ε g(Ax) ⊂ ℝᵐ:

    g(z) ≥ g(Ax) + ⟨p, z − Ax⟩ − ε for all z ∈ ℝᵐ.

Taking in particular z = Ay with y describing ℝⁿ:

    g(Ay) ≥ g(Ax) + ⟨A₀*p, y − x⟩ − ε for all y ∈ ℝⁿ,

where we have used the property Ay − Ax = A₀(y − x). Thus we have proved that A₀*p ∈ ∂_ε(g∘A)(x).
Conversely, let s ∈ ∂_ε(g∘A)(x), i.e.

    (g∘A)*(s) + (g∘A)(x) − ⟨s, x⟩ ≤ ε.   (3.2.4)

Apply (3.2.1): with the help of some p such that A₀*p = s, (3.2.4) can be written

    ε ≥ g*(p) − ⟨p, y₀⟩ + g(Ax) − ⟨p, A₀x⟩ = g*(p) + g(Ax) − ⟨p, Ax⟩.

This shows that p ∈ ∂_ε g(Ax). Altogether, we have proved that our s is in A₀* ∂_ε g(Ax). □

Naturally, only the linear part of A counts in the right-hand side of (3.2.3): the translation is taken care of by Proposition X.1.3.1(v).
As an illustration of this calculus rule, take x₀ ∈ dom g, a direction d ≠ 0, and compute the approximate subdifferential of the function

    ℝ ∋ t ↦ φ(t) := g(x₀ + td).

If x₀ + ℝd meets ri dom g, we can write

    ∂_ε φ(t) = ⟨∂_ε g(x₀ + td), d⟩ for all t ∈ dom φ.

3.3 Image and Marginal Functions

We recall that, for g ∈ Conv ℝᵐ and A linear from ℝᵐ to ℝⁿ, the image of g under A is the function Ag defined by

    ℝⁿ ∋ x ↦ (Ag)(x) := inf {g(y) : Ay = x}.   (3.3.1)

Once again, we need an assumption for characterizing the ε-subdifferential of Ag, namely that the infimum in (3.3.1) is attained "at finite distance". A sufficient assumption for this is

    Im A* ∩ ri dom g* ≠ ∅,   (3.3.2)

which implies at the same time that Ag ∈ Conv ℝⁿ (see Theorem X.2.2.3). As already seen for condition (X.2.2.Qiii), this assumption is implied by

    g'_∞(d) > 0 for all nonzero d ∈ Ker A.

Theorem 3.3.1 Let ε ≥ 0 and x ∈ dom Ag = A(dom g). Suppose that there is some y ∈ ℝᵐ with Ay = x and g(y) = (Ag)(x); for example assume (3.3.2). Then

    ∂_ε(Ag)(x) = {s ∈ ℝⁿ : A*s ∈ ∂_ε g(y)}.   (3.3.3)

PROOF. To say that s ∈ ∂_ε(Ag)(x) is to say that

    (Ag)*(s) + g(y) − ⟨s, Ay⟩ ≤ ε,

where we have made use of the existence and properties of y. Then apply Theorem X.2.1.1: (Ag)* = g* ∘ A*, so

    s ∈ ∂_ε(Ag)(x) ⟺ g*(A*s) + g(y) − ⟨A*s, y⟩ ≤ ε. □

This result can of course be compared to Theorem VI.4.5.1: the hypotheses are just the same, except for the extended-valuedness possibility. Thus, we see that the inverse image under A* of ∂_ε g(y_x) does not depend on the particular y_x optimal in (3.3.1).
We know that a particular case is the marginal function:

    ℝⁿ ∋ x ↦ f(x) := inf {g(x, z) : z ∈ ℝᵖ},   (3.3.4)

where g ∈ Conv(ℝⁿ × ℝᵖ). Indeed, f is the image of g under the projection mapping from ℝ^{n+p} to ℝⁿ defined by A(x, z) = x. The above result can be particularized to this case:
Corollary 3.3.2 With g ∈ Conv(ℝⁿ × ℝᵖ), let g* be associated with a scalar product preserving the structure of ℝᵐ = ℝⁿ × ℝᵖ as a product space, namely:

    ⟨(s, p), (x, z)⟩ := ⟨s, x⟩ + ⟨p, z⟩,   (3.3.5)

and consider the marginal function f of (3.3.4). Let ε ≥ 0, x ∈ dom f; suppose that there is some z ∈ ℝᵖ such that g(x, z) = f(x); such a z exists for example when

    ∃s₀ ∈ ℝⁿ such that (s₀, 0) ∈ ri dom g*.   (3.3.6)

Then

    ∂_ε f(x) = {s ∈ ℝⁿ : (s, 0) ∈ ∂_ε g(x, z)}.   (3.3.7)

PROOF. Set A : (x, z) ↦ x. With the scalar product (3.3.5), A* : ℝⁿ → ℝⁿ × ℝᵖ is defined by A*s = (s, 0), Im A* = ℝⁿ × {0}, so (3.3.6) and (3.3.7) are just (3.3.2) and (3.3.3) respectively. □

3.4 A Study of the Infimal Convolution

As another example of image functions, consider the infimal convolution

    (f₁ ⊞ f₂)(x) := inf {f₁(y₁) + f₂(y₂) : y₁ + y₂ = x},

where f₁ and f₂ are both in Conv ℝⁿ. With m = 2n, and ℝ^{2n} being equipped with the Euclidean structure of a product-space, this is indeed an image function:

    ⟨(s₁, s₂), (y₁, y₂)⟩_{2n} := ⟨s₁, y₁⟩ + ⟨s₂, y₂⟩,
    g(y₁, y₂) := f₁(y₁) + f₂(y₂); A(y₁, y₂) := y₁ + y₂,   (3.4.1)
    g*(s₁, s₂) = f₁*(s₁) + f₂*(s₂); A*s = (s, s).

Theorem 3.4.1 Let ε ≥ 0 and x ∈ dom(f₁ ⊞ f₂) = dom f₁ + dom f₂. Suppose that there are y₁ and y₂ such that the inf-convolution is exact at x = y₁ + y₂; this is the case for example when

    ri dom f₁* ∩ ri dom f₂* ≠ ∅.   (3.4.2)

Then

    ∂_ε(f₁ ⊞ f₂)(x) = ∪ {∂_{ε₁}f₁(y₁) ∩ ∂_{ε₂}f₂(y₂) : ε_i ≥ 0, ε₁ + ε₂ ≤ ε}.   (3.4.3)

PROOF. We apply Theorem 3.3.1 to g and A of (3.4.1). First of all,

    dom g* = dom f₁* × dom f₂*,

so (Proposition III.2.1.11)

    ri dom g* = ri dom f₁* × ri dom f₂*,

and (3.3.2) means:

    ∃s ∈ ℝⁿ such that (s, s) ∈ ri dom f₁* × ri dom f₂*,

which is nothing other than (3.4.2). Now, with (y₁, y₂) as stated and s ∈ ℝⁿ, set

    ε_i := f_i*(s) + f_i(y_i) − ⟨s, y_i⟩ ≥ 0 for i = 1, 2,   (3.4.4)

so that s ∈ ∂_{ε_i}f_i(y_i) for i = 1, 2. Particularizing (3.3.3) to our present situation:

    s ∈ ∂_ε(f₁ ⊞ f₂)(x) ⟺ A*s = (s, s) ∈ ∂_ε g(y₁, y₂),

which, in view of the definitions (3.4.1), means

    f₁*(s) + f₂*(s) + f₁(y₁) + f₂(y₂) − ⟨s, y₁⟩ − ⟨s, y₂⟩ ≤ ε.

With (3.4.4), this is just (3.4.3). □

As with Theorem 3.1.1, nothing is changed if we impose ε₁ + ε₂ = ε in (3.4.3). The particular value ε = 0 in the above result brings

    ∂(f₁ ⊞ f₂)(x) = ∂f₁(y₁) ∩ ∂f₂(y₂)   (3.4.5)

when the inf-convolution is exact at x = y₁ + y₂. Some more information, not directly contained in (3.4.3), can also be derived: a sort of converse result ensuring that the inf-convolution is exact.

Proposition 3.4.2 With f := f₁ ⊞ f₂, consider y_i ∈ dom f_i for i = 1, 2 and x := y₁ + y₂ (∈ dom f). Then

    ∂f₁(y₁) ∩ ∂f₂(y₂) ⊂ ∂f(x).   (3.4.6)

If ∂f₁(y₁) ∩ ∂f₂(y₂) ≠ ∅, the inf-convolution is exact at x = y₁ + y₂ and equality holds in (3.4.6).

PROOF. Let s ∈ ∂f₁(y₁) ∩ ∂f₂(y₂): then

    f₁*(s) + f₁(y₁) + f₂*(s) + f₂(y₂) = ⟨s, y₁ + y₂⟩ = ⟨s, x⟩.   (3.4.7)

Then, using f₁* + f₂* = (f₁ ⊞ f₂)* (Corollary X.2.1.3) and the definition of an inf-convolution,

    (f₁ ⊞ f₂)*(s) + (f₁ ⊞ f₂)(x) ≤ ⟨s, x⟩;

in view of the Fenchel inequality (X.1.1.3), this is actually an equality, i.e. s ∈ ∂f(x). Now use this last equality as a value for ⟨s, x⟩ in (3.4.7) to obtain

    f₁(y₁) + f₂(y₂) = (f₁ ⊞ f₂)(x),

i.e. the inf-convolution is exact at x = y₁ + y₂; equality in (3.4.6) follows from (3.4.5). □

In summary, fix x ∈ dom(f₁ ⊞ f₂) and denote by H ⊂ ℝⁿ × ℝⁿ the hyperplane of equation y₁ + y₂ = x. When (y₁, y₂) describes H, the set D(y₁, y₂) := ∂f₁(y₁) ∩ ∂f₂(y₂) assumes at most two values: the empty set and ∂(f₁ ⊞ f₂)(x). Actually, there are two possibilities:
- either D(y₁, y₂) = ∅ for all (y₁, y₂) ∈ H;
- or D(y₁, y₂) ≠ ∅ for some (y₁, y₂) ∈ H. This implies ∂(f₁ ⊞ f₂)(x) ≠ ∅ and then, for all (y₁, y₂) ∈ H:

    D(y₁, y₂) ≠ ∅ ⟺ D(y₁, y₂) = ∂(f₁ ⊞ f₂)(x) ⟺ the inf-convolution is exact at x = y₁ + y₂.

Remark 3.4.3 Beware that D may be empty on the whole of H while ∂(f₁ ⊞ f₂)(x) ≠ ∅. Take for example

    f₁(y) = exp y and f₂(y) = exp(−y);

then f₁ ⊞ f₂ ≡ 0, hence ∂(f₁ ⊞ f₂) ≡ {0}. Yet D(y₁, y₂) = ∅ for all (y₁, y₂) ∈ H: the inf-convolution is nowhere exact.
Note also that (3.4.5) may express the equality between empty sets: take for example f₁ = I_{[0,+∞[} and

    ℝ ∋ y₂ ↦ f₂(y₂) := −√y₂ if y₂ ≥ 0, +∞ otherwise.

It is easy to see that

    (f₁ ⊞ f₂)(x) = inf {−√y₂ : 0 ≤ y₂ ≤ x} = −√x for all x ≥ 0.

This inf-convolution is exact in particular at 0 = 0 + 0, yet ∂(f₁ ⊞ f₂)(0) = ∅. □

Example 3.4.4 (Moreau-Yosida Regularizations) For c > 0 and f ∈ Conv ℝⁿ, let f^(c) := f ⊞ (c/2)‖·‖², i.e.

    f^(c)(x) = min {f(y) + (c/2)‖x − y‖² : y ∈ ℝⁿ};

this f^(c) is called the Moreau-Yosida regularization of f. Denote by x_c the unique minimal y above, characterized by

    0 ∈ ∂f(x_c) + c(x_c − x).

Using the approximate subdifferential of ½‖·‖² (see Example 1.2.2 if necessary), Theorem 3.4.1 gives

    ∂_ε f^(c)(x) = ∪_{0≤α≤ε} [∂_{ε−α}f(x_c) ∩ B(c(x − x_c), √(2cα))].

It is interesting to note that, when ε = 0, this formula reduces to

    ∇f^(c)(x) = c(x − x_c) [∈ ∂f(x_c)].   (3.4.8)

Thus f^(c) is a differentiable convex function. It can even be said that ∇f^(c) is Lipschitzian with constant c on ℝⁿ. To see this, recall that the conjugate

    (f^(c))* = f* + (1/2c)‖·‖²

is strongly convex with modulus 1/c; then apply Theorem X.4.2.1.
In the particular case where c = 1 and f is the indicator function of a closed convex set C, f^(c) = ½d_C² (the squared distance to C) and x_c = P_C(x) (the projection of x onto C). Using the notation of (1.1.6):

    ∂_ε(½d_C²)(x) = ∪_{0≤α≤ε} [N_{C,ε−α}(P_C(x)) ∩ B(x − P_C(x), √(2α))]. □
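The differentiability statement (3.4.8) is easy to observe numerically. Here is a hedged sketch (not from the book; the one-dimensional choice f = |·| and the data c = 4, x = 2 are ours):

```python
# A small numerical sketch (not from the book): Moreau-Yosida regularization
# of f = |.| on R.  For x > 1/c the minimizing y is x - 1/c, and (3.4.8)
# then predicts the derivative (f^(c))'(x) = c*(x - x_c) = 1; we check this
# by central finite differences on a grid minimization.

c, x = 4.0, 2.0

def f_reg(z):
    ys = (-5.0 + k * 1e-4 for k in range(100000))
    return min(abs(y) + 0.5 * c * (z - y) ** 2 for y in ys)

h = 1e-3
deriv = (f_reg(x + h) - f_reg(x - h)) / (2.0 * h)
print(deriv)   # close to 1.0
```

The derivative 1 is indeed a subgradient of |·| at the minimizer x_c = x − 1/c > 0, as (3.4.8) asserts.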

Another regularization is f_[c] := f ⊞ c‖·‖, i.e.

    f_[c](x) := inf {f(y) + c‖x − y‖ : y ∈ ℝⁿ},   (3.4.9)

which will be of importance in §4.1.
Proposition 3.4.5 For f ∈ Conv ℝⁿ, define f_[c] as above; assume

    c > inf {‖s‖ : s ∈ ∂f(x), x ∈ dom ∂f}

and consider

    C_c[f] := {x ∈ ℝⁿ : f_[c](x) = f(x)},

the coincidence set of f and f_[c]. Then
(i) f_[c] is convex, finite-valued, and Lipschitzian with Lipschitz constant c on ℝⁿ;
(ii) the coincidence set is nonempty and characterized by

    C_c[f] = {x ∈ ℝⁿ : ∂f(x) ∩ B(0, c) ≠ ∅};

(iii) for all x ∈ C_c[f], there holds

    ∂_ε f_[c](x) = ∂_ε f(x) ∩ B(0, c).

PROOF. [(i)] There exist by assumption x₀ and s₀ ∈ ∂f(x₀) with ‖s₀‖ ≤ c, hence

    f ≥ f(x₀) + ⟨s₀, · − x₀⟩ and c‖·‖ ≥ ⟨s₀, ·⟩;

f and c‖·‖ are minorized by a common affine function: from Proposition IV.2.3.2, f_[c] ∈ Conv ℝⁿ. Furthermore, f_[c] is finite everywhere by construction.
Now take x and x' in ℝⁿ. For any η > 0, there is y_η such that

    f(y_η) + c‖x − y_η‖ ≤ f_[c](x) + η

and, by definition of f_[c],

    f_[c](x') ≤ f(y_η) + c‖x' − y_η‖ ≤ f(y_η) + c‖x' − x‖ + c‖x − y_η‖ ≤ f_[c](x) + c‖x' − x‖ + η.

This finishes the proof of (i), since η was arbitrary.
[(ii) and (iii)] To say that x ∈ C_c[f] is to say that the inf-convolution of f and c‖·‖ is exact at x = x + 0. From Proposition 3.4.2, this means that ∂f(x) ∩ B(0, c) is nonempty: indeed

    ∂(c‖·‖)(0) = cB(0, 1) = B(0, c) = ∂_ε(c‖·‖)(0).

The last equality comes from the sublinearity of c‖·‖ (cf. Example 1.2.5) and implies, with the aid of Theorem 3.4.1:

    ∂_ε f_[c](x) = ∪ {∂_α f(x) ∩ B(0, c) : 0 ≤ α ≤ ε} = ∂_ε f(x) ∩ B(0, c). □

Just as in Example 3.4.4, we can particularize f_[c] to the case where f is the indicator function of a closed convex set C: we get f_[c] = c d_C and therefore

    ∂_ε(c d_C)(x) = N_{C,ε}(x) ∩ B(0, c) for all x ∈ C.

Thus, when ε increases, the set K' in Fig. V.2.3.1 increases but stays confined within B(0, c).

Remark 3.4.6 For x ∉ C_c[f], there is some y_c ≠ x yielding the infimum in (3.4.9). At such a y_c, the Euclidean norm is differentiable, so we obtain that f_[c] is differentiable as well and

    ∇f_[c](x) = c (x − y_c)/‖x − y_c‖ ∈ ∂f(y_c).

Compare this with Example 3.4.4: here y_c need not be unique, but the direction y_c − x depends only on x. □

3.5 Maximum of Functions

In view of Theorem X.2.4.4, computing the approximate subdifferential of a supremum involves the closed convex hull of an infimum. Here, we limit ourselves to the case of finitely many functions, say f := max_{j=1,…,p} f_j, a situation complicated enough. Thus, to compute f*(s), we have to solve

    inf_{α_j, s_j} Σ_{j=1}^p α_j f_j*(s_j),
    Σ_{j=1}^p α_j s_j = s,   (3.5.1)
    α_j ≥ 0, Σ_{j=1}^p α_j = 1 [i.e. α ∈ Δ_p].

An important issue is whether this minimization problem has a solution, and whether the infimal value is a closed function of s. Here again, the situation is not simple when the functions are extended-valued.

Theorem 3.5.1 Let f₁, …, f_p be a finite number of convex functions from ℝⁿ to ℝ and let f := max_j f_j; set m := min{p, n+1}. Then s ∈ ∂_ε f(x) if and only if there exist m vectors s_j ∈ dom f_j*, convex multipliers α_j, and nonnegative ε_j such that

    s_j ∈ ∂_{ε_j/α_j}f_j(x) for all j such that α_j > 0,
    s = Σ_j α_j s_j,   (3.5.2)
    Σ_j [ε_j − α_j f_j(x)] + f(x) ≤ ε.

PROOF. Theorem X.2.4.7 tells us that

    f*(s) = Σ_{j=1}^m α_j f_j*(s_j),

where (α_j, s_j) ∈ ℝ₊ × dom f_j* form a solution of (3.5.1) (after possible renumbering of the j's). By the characterization (1.2.1), the property s ∈ ∂_ε f(x) is therefore equivalent to the existence of (α_j, s_j) ∈ ℝ₊ × dom f_j*, more precisely a solution of (3.5.1), such that

    Σ_{j=1}^m α_j f_j*(s_j) + f(x) ≤ ε + Σ_{j=1}^m α_j ⟨s_j, x⟩,

which we write

    Σ_{j=1}^m α_j [f_j*(s_j) + f_j(x) − ⟨s_j, x⟩] ≤ ε + Σ_{j=1}^m α_j f_j(x) − f(x).   (3.5.3)

Thus, if s ∈ ∂_ε f(x), i.e. if (3.5.3) holds, we can set

    ε_j := α_j [f_j*(s_j) + f_j(x) − ⟨s_j, x⟩],

so that s_j ∈ ∂_{ε_j/α_j}f_j(x) if α_j > 0: the (s_j, α_j, ε_j) are exhibited for (3.5.2). Conversely, if (3.5.2) holds, multiply by α_j each of the inequalities

    f_j*(s_j) + f_j(x) − ⟨s_j, x⟩ ≤ ε_j/α_j;

add to the list thus obtained those inequalities having α_j = 0 (they hold trivially!); then sum up to obtain (3.5.3). □

It is important to realize what (3.5.2) means. Its third relation (which, incidentally, could be replaced by an equality) can be written

  Σ_{j=1}^m (ε_j + α_j e_j) ≤ ε,   (3.5.4)

where, for each j, the number

  e_j := f(x) − f_j(x)

is nonnegative and measures how close f_j comes to the maximal f at x. Using elementary calculus rules for approximate subdifferentials, a set-formulation of (3.5.2) is

  ∂_ε f(x) = ∪ { Σ_{j=1}^m ∂_{ε_j}(α_j f_j)(x) : α ∈ Δ_m, Σ_{j=1}^m (ε_j + α_j e_j) ≤ ε }   (3.5.5)

(remember that the ε-subdifferential of the zero function is identically {0}). Another observation is as follows: those elements (α_j, s_j) with α_j = 0 do not matter and can be dropped from the combination making up s in (3.5.2). Then set η_j := ε_j/α_j to realize that s ∈ ∂_ε f(x) if and only if there are positive α_j summing up to 1, η_j ≥ 0 and s_j ∈ ∂_{η_j} f_j(x) such that

  Σ_j α_j (η_j + e_j) ≤ ε and Σ_j α_j s_j = s.   (3.5.6)

The above formulae are rather complicated, even more so than (3.1.5) corresponding to a sum - which was itself not simple. In Theorem 3.5.1, denote by

  J_ε(x) := {j : f_j(x) ≥ f(x) − ε}

the set of ε-active indices; then we have the inclusion

  ∪_{j ∈ J_ε(x)} ∂f_j(x) ⊂ ∂_ε f(x),

which is already simpler. Unfortunately, equality need not hold: with n = 1, take f_1(ξ) = ξ, f_2(ξ) = −ξ. At ξ = 1, the left-hand side is {1} for all ε ∈ [0, 2[, while the right-hand side has been computed in §1.1: ∂_ε f(1) = [1 − ε, 1].
In some cases, the exact formulae simplify:

Example 3.5.2 Consider the function f⁺ := max{0, f}, where f : ℝ^n → ℝ is convex. We get

  ∂_ε(f⁺)(x) = ∪ { ∂_δ(αf)(x) : 0 ≤ α ≤ 1, δ + f⁺(x) − αf(x) ≤ ε }.

Setting ε(α) := ε − f⁺(x) + αf(x), this can be written as

  ∂_ε(f⁺)(x) = ∪ { ∂_{ε(α)}(αf)(x) : 0 ≤ α ≤ 1 },

with the convention ∂_η(αf)(x) = ∅ if η < 0. □

Another important simplification is obtained when each f_j is affine:

Example 3.5.3 (Piecewise Affine Functions) Take

  f(x) := max{⟨s_j, x⟩ + b_j : j = 1, ..., p}.   (3.5.7)

Each ε_j/α_j-subdifferential in (3.5.2) is constantly {s_j}: the ε_j's play no role and can be eliminated from (3.5.4), which can be used as a mere definition

  0 [≤ Σ_{j=1}^p ε_j] ≤ ε − Σ_{j=1}^p α_j e_j.

Thus, the approximate subdifferential of the function (3.5.7) is the compact convex polyhedron

  ∂_ε f(x) = { Σ_{j=1}^p α_j s_j : α ∈ Δ_p, Σ_{j=1}^p α_j e_j ≤ ε }.

In this formulation, e_j = f(x) − ⟨s_j, x⟩ − b_j. The role of e_j appears more explicitly if the origin of the graph-space is carried over to (x, f(x)) (remember Example VI.3.4): f of (3.5.7) can be alternatively defined by

  ℝ^n ∋ y ↦ f(y) = f(x) + max{−e_j + ⟨s_j, y − x⟩ : j = 1, ..., p}.

The constant term f(x) is of little importance, as far as subdifferentials are concerned. Neglecting it, e_j thus appears as the value at x (the point where ∂_ε f is computed) of the j-th affine function making up f.
Geometrically, ∂_ε f(x) dilates when ε increases, and describes a sort of spider web with ∂f(x) as "kernel". When ε reaches the value max_j e_j, ∂_ε f(x) stops at co{s_1, ..., s_p}. □

Finally, if ε = 0, (3.5.4) can be satisfied only by ε_j = 0 and α_j e_j = 0 for all j. We thus recover the important Corollary VI.4.3.2. Comparing it with (3.5.6), we see that, for larger e_j or η_j, the η_j-subgradient s_j is "more remote from ∂f(x)", and as a result, its weight α_j must be smaller.
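The polyhedron of Example 3.5.3 is easy to evaluate numerically. Below is a small Python sketch (ours, not from the book) for p = 2 affine pieces in one dimension; it scans the simplex Δ_2 on a grid and recovers ∂_ε f(1) = [1 − ε, 1] for f(ξ) = max{ξ, −ξ}, the counterexample used after Theorem 3.5.1.

```python
# Sketch (illustration only): evaluate the polyhedron of Example 3.5.3,
#   d_eps f(x) = { sum_j a_j s_j : a in Delta_p, sum_j a_j e_j <= eps },
# with e_j = f(x) - s_j x - b_j, for p = 2 affine pieces in dimension 1,
# by scanning the simplex Delta_2 = { (a, 1-a) : a in [0, 1] }.

def eps_subdiff_interval(lines, x, eps, steps=100000):
    """lines = [(s_1, b_1), (s_2, b_2)]; returns (min, max) of d_eps f(x)."""
    (s1, b1), (s2, b2) = lines
    fx = max(s * x + b for s, b in lines)
    e1, e2 = fx - s1 * x - b1, fx - s2 * x - b2   # linearization errors e_j
    pts = [a * s1 + (1 - a) * s2
           for a in (k / steps for k in range(steps + 1))
           if a * e1 + (1 - a) * e2 <= eps]      # constraint of Example 3.5.3
    return min(pts), max(pts)

# f(xi) = max{xi, -xi} at x = 1: e_1 = 0, e_2 = 2, and the formula gives
# the interval [1 - eps, 1], as computed in Section 1.1.
lo, hi = eps_subdiff_interval([(1.0, 0.0), (-1.0, 0.0)], x=1.0, eps=0.5)
```

For ε ≥ max_j e_j = 2 the interval saturates at co{s_1, s_2} = [−1, 1], as noted at the end of the example.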

3.6 Post-Composition with an Increasing Convex Function

Theorem 3.6.1 Let f : ℝ^n → ℝ be convex, g ∈ Conv ℝ be increasing, and assume the qualification condition f(ℝ^n) ∩ int dom g ≠ ∅. Then, for all x such that f(x) ∈ dom g,

  s ∈ ∂_ε(g ∘ f)(x) ⟺ ∃ ε_1, ε_2 ≥ 0 and α ≥ 0 such that
    ε_1 + ε_2 = ε, s ∈ ∂_{ε_1}(αf)(x), α ∈ ∂_{ε_2} g(f(x)).   (3.6.1)

PROOF. [⇒] Let α ≥ 0 be a minimum of the function ψ_s in Theorem X.2.5.1; there are two cases:
(a) α = 0. Because dom f = ℝ^n, this implies s = 0 and (g ∘ f)*(0) = g*(0). The characterization of s = 0 ∈ ∂_ε(g ∘ f)(x) is

  g*(0) + g(f(x)) ≤ ε, i.e. 0 ∈ ∂_ε g(f(x)).

Thus, (3.6.1) holds with ε_2 = ε, ε_1 = 0 - and α = 0 (note: ∂(0·f) ≡ {0} because f is finite everywhere).
(b) α > 0. Then (g ∘ f)*(s) = αf*(s/α) + g*(α) and the characterization of s ∈ ∂_ε(g ∘ f)(x) is

  αf*(s/α) + g*(α) + g(f(x)) − ⟨s, x⟩ ≤ ε,

i.e.

  (αf)*(s) + αf(x) − ⟨s, x⟩ + g*(α) + g(f(x)) − αf(x) ≤ ε.

Split the above left-hand side into

  (αf)*(s) + αf(x) − ⟨s, x⟩ =: ε_1 ≥ 0,
  g*(α) + g(f(x)) − αf(x) =: ε_2 ≥ 0,

which can be enunciated as: s ∈ ∂_{ε_1}(αf)(x), α ∈ ∂_{ε_2} g(f(x)), ε_1 + ε_2 ≤ ε. Because, once again, η-subdifferentials increase with η, (3.6.1) is established.
[⇐] (3.6.1) means:

  α[f(y) − f(x)] ≥ ⟨s, y − x⟩ − ε_1 for all y ∈ ℝ^n,
  g(r) ≥ g(f(x)) + α[r − f(x)] − ε_2 for all r ∈ ℝ,

hence, with r = f(y):

  g(f(y)) ≥ g(f(x)) + ⟨s, y − x⟩ − ε for all y ∈ ℝ^n. □
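The "⇐" direction just proved lends itself to a quick numeric check (ours; the functions and numbers are chosen only for illustration): take f(ξ) = ξ² and the increasing convex g(r) = max{r, 0}, and verify the ε-subgradient inequality for g ∘ f.

```python
# Toy check of the "if" part of Theorem 3.6.1:
# if s in d_{e1}(a f)(x) and a in d_{e2} g(f(x)), then s in d_{e1+e2}(g o f)(x).
f = lambda xi: xi * xi                 # f(xi) = xi^2, so f*(s) = s^2/4
g = lambda r: max(r, 0.0)              # increasing convex

x, eps1, eps2 = 1.0, 0.04, 0.0
alpha = 1.0                            # alpha = 1 in dg(f(x)), since f(x) = 1 > 0
s = 1.6                                # s in d_{e1} f(x) iff (s/2 - x)^2 <= e1
assert (s / 2.0 - x) ** 2 <= eps1 + 1e-12

eps = eps1 + eps2
# subgradient inequality for g o f at x, tested on a grid of points y:
ok = all(g(f(y)) >= g(f(x)) + s * (y - x) - eps - 1e-12
         for y in (0.001 * k for k in range(-3000, 3001)))
```

Here the inequality reduces to (y − 0.8)² ≥ 0, with equality at y = 0.8, so the bound is tight.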


As an application, consider the set

C := {x E lRn : c(x) ~ O} with c: lRn ~ lR convex. (3.6.2)

Writing Ie as the compositionI]_oo,o]OC, the 8-normal setNe,e(x) of Definition 1.1.3


can be characterized in terms of approximate subdifferentials of c at x. Theorem 3.6.1
is valid under our qualification condition c(lRn)n ]-00, O[ oF 0, in which we recognize
Slater's assumption. Furthermore, the approximate subdifferential ofI]-oo,o] = g has
been computed in Example 1.2.5:

Then we obtain: s E Ne,e(x) if and only if there are nonnegative a, 8t. 82, with
81 + 82 = 8, such that
s E oel (ac)(x) and ac(x) + 82 ~ O. (3.6.3)

Corollary 3.6.2 Let C be described by (3.6.2) and assume that c(x_0) < 0 for some x_0. Then

  N_{C,ε}(x) = ∪ { ∂_δ(αc)(x) : α ≥ 0, δ ≥ 0, δ − αc(x) ≤ ε }.   (3.6.4)

In particular, if x ∈ bd C, i.e. if c(x) = 0,

  N_{C,ε}(x) = ∪ { ∂_ε(αc)(x) : α ≥ 0 }.   (3.6.5)

PROOF. Eliminate ε_2 and let δ := ε_1 in (3.6.3) to obtain the set-formulation (3.6.4). Then remember (VI.1.3.6) and use again the monotonicity of the multifunction δ ↦ ∂_δ(αc)(x). □

Naturally, (3.6.5) with ε = 0 reduces to Theorem VI.1.3.5.


All the necessary material to reproduce Chap. VII is now at hand. For example, Corollary 3.6.2 and various calculus rules from the present Section 3 allow the derivation of necessary and sufficient conditions for approximate minimality in a constrained convex minimization problem. With the help of Example 3.5.2, the theory of exact penalty can be reproduced, etc. It is interesting to note that the various (Q)-type assumptions, giving the calculus rules in §X.2.2 and §X.2.3, are intimately related to the constraint qualification conditions of §VII.2.
Let us conclude with a general remark: in §III.5.3, we have alluded to some calculus rules for normal and tangent cones. They can be developed in a rigorous manner, by an application of the present calculus to functions of the form f_j = I_{C_j}. All these exercises are left to the reader.

4 The Approximate Subdifferential as a Multifunction

4.1 Continuity Properties of the Approximate Subdifferential

We will see in this section that the multifunction ∂_ε f is much more regular when ε > 0 than the exact subdifferential, studied in §VI.6.2. We start with two useful properties, stating that the approximate subdifferential (ε, x) ↦ ∂_ε f(x) has a closed graph, and is locally bounded on the interior of dom f; see §A.5 for the terminology.
Proposition 4.1.1 Let {(ε_k, x_k, s_k)} be a sequence converging to (ε, x, s), with s_k ∈ ∂_{ε_k} f(x_k) for all k. Then s ∈ ∂_ε f(x).

PROOF. By definition,

  f(y) ≥ f(x_k) + ⟨s_k, y − x_k⟩ − ε_k for all y ∈ ℝ^n.

Pass to the limit on k and use the lower semi-continuity of f. □

Proposition 4.1.2 Assume int dom f ≠ ∅; let δ > 0 and L be such that f is Lipschitzian with constant L on some ball B(x, δ), where x ∈ int dom f. Then, for all δ' < δ,

  ‖s‖ ≤ L + ε / (δ − δ')   (4.1.1)

whenever s ∈ ∂_ε f(y), with y ∈ B(x, δ').

PROOF. We know (Theorem IV.3.1.2) that f is locally Lipschitzian on int dom f, so we can take x, δ, L as stated. To prove (4.1.1), take δ', y and s as stated, assume s ≠ 0 and set z := y + (δ − δ') s/‖s‖ in the definition

  f(z) ≥ f(y) + ⟨s, z − y⟩ − ε.

Observe that z and y are in B(x, δ) and conclude

  L(δ − δ') ≥ (δ − δ')‖s‖ − ε. □


As a result, the multifunction ∂_ε f is outer semi-continuous, just as is the exact subdifferential. But for ε > 0, it also enjoys fairly strong continuity properties. We first assume that f is finite everywhere, and then we consider the general case via the regularized version f □ c‖·‖ of f. In the result below, recall from Theorem V.3.3.8 that the Hausdorff distance between two (nonempty) compact convex sets A and B has the expression

  Δ_H(A, B) = max {|σ_A(d) − σ_B(d)| : ‖d‖ ≤ 1}.

Theorem 4.1.3 Let f : ℝ^n → ℝ be a convex Lipschitzian function on ℝ^n. Then there exists K > 0 such that, for all x, x' in ℝ^n and ε, ε' positive:

  Δ_H(∂_ε f(x), ∂_{ε'} f(x')) ≤ (K / min{ε, ε'}) (‖x − x'‖ + |ε − ε'|).   (4.1.2)

PROOF. With d of norm 1, use (2.1.1): for any η > 0, there is t_η > 0 such that

  f'_ε(x, d) ≤ q_ε(x, t_η) ≤ f'_ε(x, d) + η,   (4.1.3)

where we have used the notation q_ε(x, t) for the approximate difference quotient. By assumption, we can let δ → +∞ in Proposition 4.1.2: there is a global Lipschitz constant, say L, such that f'_ε(x, d) ≤ L. From

  −L + ε/t_η ≤ q_ε(x, t_η) ≤ f'_ε(x, d) + η ≤ L + η,

we therefore obtain

  1/t_η ≤ (2L + η)/ε.

Then we can write, using (2.1.1) and (4.1.3) again:

  f'_{ε'}(x', d) − f'_ε(x, d) − η ≤ q_{ε'}(x', t_η) − q_ε(x, t_η)
    = [f(x' + t_η d) − f(x + t_η d) + f(x) − f(x') + ε' − ε] / t_η
    ≤ [2L‖x' − x‖ + |ε' − ε|] / t_η ≤ ((2L + η)/ε) (2L‖x' − x‖ + |ε' − ε|).

Remembering that η > 0 is arbitrary and inverting (x, ε) with (x', ε'), we do obtain

  |f'_{ε'}(x', d) − f'_ε(x, d)| ≤ (2L / min{ε, ε'}) (2L‖x' − x‖ + |ε' − ε|),

so the theorem is proved, for example with K = max{2L, 4L²}. □
This result definitely implies the inner semi-continuity of (x, ε) ↦ ∂_ε f(x) for a Lipschitz-continuous f. In particular, for fixed ε > 0,

  ∂_ε f(y) ⊂ ∂_ε f(x) + ‖y − x‖ B(0, K/ε) for all x and y,

a property already illustrated by Fig. 1.1.2. Remember from §VI.6.2 that the multifunction ∂f need not be inner semi-continuous; when ε = 0, no inclusion resembling the above can hold, unless x is fixed.
A local version of Theorem 4.1.3 can similarly be proved: (4.1.2) holds on the compact sets included in int dom f. Here, we consider an extension of the result to unbounded subdifferentials. Recall that the Hausdorff distance is not convenient for unbounded sets. A better distance is obtained by comparing the bounded parts of closed convex sets: for c ≥ 0, we take

  Δ_{H,c}(A, B) := Δ_H(A ∩ B(0, c), B ∩ B(0, c)).

Corollary 4.1.4 Let f ∈ Conv ℝ^n. Suppose that S ⊂ dom f and ℓ > 0 are such that ∂f(x) ∩ B(0, ℓ) ≠ ∅ for all x ∈ S. Then, for all c > ℓ, there exists K_c such that, for all x, x' in S and ε, ε' positive,

  Δ_{H,c}(∂_ε f(x), ∂_{ε'} f(x')) ≤ (K_c / min{ε, ε'}) (‖x' − x‖ + |ε' − ε|).

PROOF. Consider f_[c] := f □ c‖·‖ and observe that we are in the conditions of Proposition 3.4.5: f_[c] is Lipschitzian on ℝ^n and Theorem 4.1.3 applies. The rest follows because the coincidence set of f_[c] and f contains S. □

Applying this result to an f finite everywhere, we obtain for example the following local Lipschitz continuity:

Corollary 4.1.5 Let f : ℝ^n → ℝ be convex. For any δ ≥ 0, there is K_δ > 0 such that

  Δ_H(∂_ε f(x), ∂_ε f(x')) ≤ (K_δ / ε) ‖x − x'‖ for all x and x' in B(0, δ).

PROOF. We know from Proposition 4.1.2 that ∂_ε f is bounded on B(0, δ), so the result is a straightforward application of Corollary 4.1.4. □
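For f(ξ) = ξ² the ε-subdifferential can be written in closed form (solve f*(s) + f(x) − sx ≤ ε with f*(s) = s²/4), which gives ∂_ε f(x) = [2x − 2√ε, 2x + 2√ε]; this lets the Lipschitz behaviour of Corollary 4.1.5 be observed directly. A small Python sketch (ours, not the book's):

```python
# For f(xi) = xi^2, solving f*(s) + f(x) - s*x <= eps with f*(s) = s^2/4
# gives the interval  d_eps f(x) = [2x - 2*sqrt(eps), 2x + 2*sqrt(eps)].
import math

def eps_subdiff(x, eps):
    r = 2.0 * math.sqrt(eps)
    return (2.0 * x - r, 2.0 * x + r)

def hausdorff(i1, i2):
    # Hausdorff distance between two intervals [a, b] and [c, d]
    return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

# At fixed eps the distance is exactly 2|x - x'|, uniformly in eps,
# consistent with (and here better than) the K_delta/eps bound above:
d = hausdorff(eps_subdiff(0.3, 0.1), eps_subdiff(0.7, 0.1))
```

Varying ε instead, the endpoints move like 2√ε, so the distance degrades as 2|ε − ε'|/(√ε + √ε'), which blows up as ε → 0 in line with the 1/min{ε, ε'} factor of Theorem 4.1.3.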

4.2 Transportation of Approximate Subgradients

In §VI.6.3, we have seen that ∂f(x) can be constructed by piecing together limits of subgradients along directional sequences. From a practical point of view, this is an important property: for example, it is the basis for descent schemes in nonsmooth optimization, see Chap. IX. This kind of property is even more important for approximate subdifferentials: remember from §II.1.2 that the only information obtainable from f is a black box (U1), which computes an exact subgradient at designated points. The concept of approximate subgradient is therefore of no use, as long as there is no "black box" to compute one. Starting from this observation, we study here the problem of constructing ∂_ε f(x) with the sole help of the same (U1).

Theorem 4.2.1 (A. Brøndsted and R.T. Rockafellar) Let be given f ∈ Conv ℝ^n, x ∈ dom f and ε ≥ 0. For any η > 0 and s ∈ ∂_ε f(x), there exist x_η ∈ B(x, η) and s_η ∈ ∂f(x_η) such that ‖s_η − s‖ ≤ ε/η.

PROOF. The data are x ∈ dom f, ε > 0 (if ε = 0, just take x_η = x and s_η = s!), η > 0 and s ∈ ∂_ε f(x). Consider the closed convex function

  ℝ^n ∋ y ↦ φ(y) := f(y) + f*(s) − ⟨s, y⟩.

It is nonnegative (Fenchel's inequality), satisfies φ(x) ≤ ε (cf. (1.2.1)), and its subdifferential is

  ∂φ(y) = ∂f(y) − {s} for all y ∈ dom φ = dom f.

Perturb φ to the closed convex function

  ℝ^n ∋ y ↦ ψ(y) := φ(y) + (ε/η) ‖y − x‖,

whose subdifferential at y ∈ dom f is (apply Corollary 3.1.2: (3.1.2) obviously holds)

  ∂ψ(y) = ∂φ(y) + (ε/η) ∂(‖· − x‖)(y) ⊂ ∂φ(y) + B(0, ε/η).

Because φ is bounded from below, the 0-coercivity of the norm implies the 0-coercivity of ψ; there exists a point, say x_η, minimizing ψ on ℝ^n; then 0 ∈ ∂ψ(x_η) is written

  0 ∈ ∂f(x_η) − {s} + B(0, ε/η).

It remains to prove that x_η ∈ B(x, η). Using the nonnegativity of φ and the optimality of x_η:

  (ε/η) ‖x_η − x‖ ≤ ψ(x_η) ≤ ψ(x) = φ(x) ≤ ε,

whence ‖x_η − x‖ ≤ η. □

This result can be written in a set formulation:

  ∂_ε f(x) ⊂ ∩_{η>0} ∪_{‖y−x‖≤η} {∂f(y) + B(0, ε/η)}.   (4.2.1)

It says that any ε-subgradient at x can be approximated by some exact subgradient, computed at some y, possibly different from x. For y close to x (η small), the approximation may be coarse; an accurate approximation may require seeking y far from x (η large). The value η = √ε is a compromise, which equates the deviation from x and the degree of approximation: (4.2.1) implies

  ∂_ε f(x) ⊂ ∪ {∂f(y) + B(0, √ε) : y ∈ B(x, √ε)}.

An illustrative example is the one-dimensional function f : x ↦ x + 1/x, with x = 1 and ε = 1 (which is the value ε_∞ of §2.2). Then 1 ∈ ∂_ε f(x) but 1 is nowhere the derivative of f: in the above proof, x_η is unbounded for η → +∞.
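On this example the construction of the proof can be replayed numerically (a sketch under our arbitrary choice η = 9, with a grid search in place of the exact minimization):

```python
# Numeric replay (ours) of the proof of Theorem 4.2.1 on f(y) = y + 1/y (y > 0),
# at x = 1 with s = 1 in d_1 f(1): indeed f*(1) + f(1) - 1*1 = 0 + 2 - 1 = 1.
X, EPS, ETA = 1.0, 1.0, 9.0

def psi(y):
    # psi(y) = phi(y) + (eps/eta)|y - x|, with phi(y) = f(y) + f*(s) - s*y = 1/y
    return 1.0 / y + (EPS / ETA) * abs(y - X)

grid = [k / 1000.0 for k in range(500, 20001)]   # y in [0.5, 20]
x_eta = min(grid, key=psi)                       # minimizer of psi
s_eta = 1.0 - 1.0 / x_eta ** 2                   # f'(x_eta)
# Here the exact minimizer is x_eta = sqrt(ETA) = 3, so that
# |x_eta - x| = 2 <= ETA and |s_eta - s| = 1/9 = EPS/ETA, as the theorem says.
```

Letting η grow, x_η = √η escapes to +∞ while s_η = 1 − 1/η approaches s = 1, which is the unboundedness phenomenon described above.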
We also see from (4.2.1) that ∂_ε f(x) is contained in the closure of ∂f(ℝ^n) - and the example above shows that the closure operation is necessary. Our next question is then: given an exact subgradient s computed somewhere, how can we recognize whether this s is in the set ∂_ε f(x) that we are interested in? The answer turns out to be very simple:

Proposition 4.2.2 (Transportation Formula) With x and x' in dom f, let s' ∈ ∂f(x'). Then s' ∈ ∂_ε f(x) if and only if

  f(x') ≥ f(x) + ⟨s', x' − x⟩ − ε.   (4.2.2)

PROOF. The condition is obviously necessary, since the relation of definition (1.1.1) must in particular hold at y = x'. Conversely, for s' ∈ ∂f(x'), we have for all y

  f(y) ≥ f(x') + ⟨s', y − x'⟩ = f(x) + ⟨s', y − x⟩ + [f(x') − f(x) + ⟨s', x − x'⟩].

If (4.2.2) holds, s' ∈ ∂_ε f(x). □


The next chapters, dealing with numerical algorithms, will use (4.2.2) intensively. We call it the transportation formula, since it "transports" at x a given subgradient at x'. Geometrically, Fig. 4.2.1 shows that it is very natural, and it reveals an important concept:

Fig. 4.2.1. Linearization errors and the transportation formula

Definition 4.2.3 (Linearization Error) For (x, x', s') ∈ dom f × dom f × ℝ^n, the linearization error made at x, when f is linearized at x' with slope s', is the number

  e(x, x', s') := f(x) − f(x') − ⟨s', x − x'⟩.

This linearization error is of particular interest when s' ∈ ∂f(x'); then, calling

  ℓ_{x',s'}(y) := f(x') + ⟨s', y − x'⟩

the corresponding affine approximation of f, there hold the relations

  ℓ_{x',s'} ≤ f, ℓ_{x',s'}(x') = f(x'), ℓ_{x',s'}(x) = f(x) − e(x, x', s').

The definition of s' ∈ ∂f(x') via the conjugate function can also be used:

  e(x, x', s') = f(x) + f*(s') − ⟨s', x⟩ ≥ 0 if s' ∈ ∂f(x'). □
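In code, the transportation test is a one-liner. The sketch below (ours, with f = |·| standing in for the black box (U1)) computes the linearization error of Definition 4.2.3 and checks that the transported subgradient satisfies the defining inequality (1.1.1) of ∂_e f(x):

```python
# Sketch (illustration only): the transportation formula as a computation,
# with the toy oracle f = |.| playing the role of the black box (U1).
def f(x):
    return abs(x)

def subgrad(x):
    return 1.0 if x >= 0 else -1.0          # one element of df(x)

def lin_error(x, xp, sp):
    """e(x, x', s') = f(x) - f(x') - <s', x - x'> of Definition 4.2.3."""
    return f(x) - f(xp) - sp * (x - xp)

x, xp = 1.0, -0.25
sp = subgrad(xp)                             # s' = -1, computed at x'
e = lin_error(x, xp, sp)                     # e = 1 - 0.25 - (-1)*1.25 = 2
# Proposition 4.2.2: s' is an e-subgradient at x; check (1.1.1) on a grid
ok = all(f(y) >= f(x) + sp * (y - x) - e - 1e-12
         for y in (0.01 * k for k in range(-500, 501)))
```

When x' = x and s' ∈ ∂f(x), the error is 0, recovering an exact subgradient; this is the mechanism bundle methods exploit in the next chapters.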

The content of the transportation formula (4.2.2), illustrated by Fig. 4.2.1, is that any s' ∈ ∂f(x') is an e(x, x', s')-subgradient of f at x; and also that it is not in a tighter approximate subdifferential, i.e.

  ε < e(x, x', s') ⟹ s' ∉ ∂_ε f(x).

This latter property relies on the contact (at x') between gr ℓ_{x',s'} and gr f. In fact, a result slightly different from Proposition 4.2.2 is:

Proposition 4.2.4 Let s' ∈ ∂_η f(x'). Then s' ∈ ∂_ε f(x) if

  f(x') ≥ f(x) + ⟨s', x' − x⟩ − ε + η,

or equivalently

  e(x, x', s') + η ≤ ε.

PROOF. Proceed as for the "if"-part of Proposition 4.2.2. □


The transportation formula gives an easy answer to the pragmatic question: "Let
a subgradient be computed at some point by the black box (UI), is it an approximate
subgradient at some other point?" Answer: just compare I-and l-values. Returning
to a more theoretical framework, we now ask: given x and e, what are those x' such
that (4.2.2) holds? This question is ambiguous when ol(x') is not a singleton, so we
define two sets:
Vdx) := {x' E domol : ol(x') c oe!(x)}
Vs(x) := {x' E]Rn : ol(x') n oe!(x) =f:. 0} .

Equivalent definitions are

Ve(x):= {x' E domol : e(x,x',s'):S:;; e for all s' E ol(x')}


(4.2.3)
Vs(x) := {x' E]Rn : e(x, x', s') :s:;; e for some s' E 01 (x') } ,
and it is clear that Ve (x) C Vs (x). If 1 is differentiable on ]Rn, the two sets obviously
coincide:
Ve(x) = Vs(x) = {x' E]Rn : e(x, x', V/(x'»:s:;; e}.
The next result motivates our notation.

Proposition 4.2.5 Suppose that x ∈ int dom f. Then
(i) V_ε(x) is a neighborhood of x if ε > 0.
(ii) V̄_ε(x) is the closure of V_ε(x).

PROOF. [(i)] Apply Proposition 4.1.2: there exist δ > 0 and a constant K such that, for all x' ∈ B(x, δ) and s' ∈ ∂f(x'),

  e(x, x', s') ≤ |f(x') − f(x)| + ‖s'‖ ‖x' − x‖ ≤ 2K‖x' − x‖;

so V_ε(x) contains the ball of center x and radius min{δ, ε/(2K)}.
[(ii)] Take a point x' ∈ V̄_ε(x). We claim that

  ∂f(x + t(x' − x)) ⊂ ∂_ε f(x) for all t ∈ ]0, 1[;   (4.2.4)

see Fig. 4.2.1: we insert a point between x and x'. A consequence of (4.2.4) will be V̄_ε(x) ⊂ cl V_ε(x) (let t ↑ 1).
To prove our claim, set d := x' − x and use the function r = r_0 of (2.1.3) to realize with Remark 2.2.4 that

  {e(x, x + td, s) : s ∈ ∂f(x + td)} = −∂r(1/t) for all t > 0.   (4.2.5)

By definition of V̄_ε(x), we can pick s' ∈ ∂f(x') such that

  −∂r(1) ∋ e(x, x', s') ≤ ε.

Then we take arbitrary 1/t = u > 1 and s'' ∈ ∂f(x + td), so that −e(x, x + td, s'') ∈ ∂r(u). The monotonicity property (VI.6.1.1) or (I.4.2.1) of the subdifferential ∂r gives

  [−e(x, x + td, s'') + e(x, x', s')](u − 1) ≥ 0,

hence e(x, x + td, s'') ≤ e(x, x', s') ≤ ε, which proves (4.2.4).
There remains to prove that V̄_ε(x) is closed, so let {x_k} be a sequence of V̄_ε(x) converging to some x'. To each x_k, we associate s_k ∈ ∂f(x_k) ∩ ∂_ε f(x); {s_k} is bounded by virtue of Theorem 1.1.4: extracting a subsequence if necessary, we may assume that {s_k} has a limit s'. Then, Proposition 4.1.1 and Theorem 1.1.4 show that s' ∈ ∂f(x') ∩ ∂_ε f(x), which means that x' ∈ V̄_ε(x). □

From a practical point of view, the set V_ε(x) and the property (i) above are both important: provided that y is close enough to x, the s(y) ∈ ∂f(y) computed by the black box (U1) is guaranteed to be an ε-subgradient at x. Now, V_ε(x) and V̄_ε(x) differ very little (they are numerically indistinguishable), and the latter has a strong intrinsic value: by definition, x' ∈ V̄_ε(x) if and only if

  ∃ s' ∈ ∂_ε f(x) such that s' ∈ ∂f(x'), i.e. such that x' ∈ ∂f*(s').

In the language of multifunctions, this can be written

  V̄_ε(x) = ∂f*(∂_ε f(x)).   (4.2.6)

Furthermore, the above proof, especially (4.2.5), establishes a connection between V̄_ε(x) and §2.2. First, V̄_ε(x) − {x} is star-shaped. Also, consider the intersection of V̄_ε(x) with a direction d issuing from x. Keeping (4.2.3) in mind, we see that this set is the closed interval where the perturbed difference quotient q_ε is decreasing. In a word, let t_ε(d) be the largest element of T_ε = T_ε(d) defined by (2.2.2), with the convention t_ε(d) = +∞ if T_ε is empty or unbounded (meaning that the approximate difference quotient is decreasing on the whole of ℝ_+). Then

  V̄_ε(x) = {x + td : t ∈ [0, t_ε(d)], d ∈ B(0, 1)}.   (4.2.7)

See again the geometric interpretations of §2, mainly the end of §2.3.

Our neighborhoods V_ε(x) and V̄_ε(x) enjoy some interesting properties if additional assumptions are made on f:

Proposition 4.2.6
(i) If f is 1-coercive, then V̄_ε(x) is bounded.
(ii) If f is differentiable on ℝ^n, then V_ε(x) = V̄_ε(x) and ∇f(V_ε(x)) ⊂ ∂_ε f(x).
(iii) If f is 1-coercive and differentiable, then ∇f(V_ε(x)) = ∂_ε f(x).

PROOF. In case (i), f* is finite everywhere, so the result follows from (4.2.6) and the local boundedness of the subdifferential mapping (Proposition VI.6.2.2).
When f is differentiable, the equality between the two neighborhoods has already been observed, and the stated inclusion comes from the very definition of V_ε(x).
Finally, let us establish the converse inclusion in case (iii): for s ∈ ∂_ε f(x), pick y ∈ ∂f*(s), i.e. s = ∇f(y). The result follows from (4.2.6): y ∈ V̄_ε(x) = V_ε(x). □

Example 4.2.7 Let f be convex quadratic as in Example 1.2.2. Substitute in (1.2.5)

  ∇f(x) + Qy = Qx + b + Qy = Q(x + y) + b = ∇f(x + y)

so as to obtain the form

  ∂_ε f(x) = {∇f(x') : ½⟨Q(x' − x), x' − x⟩ ≤ ε},   (4.2.8)

which discloses the neighborhood

  V̄_ε(x) = {x' ∈ ℝ^n : ½⟨Q(x' − x), x' − x⟩ ≤ ε}.   (4.2.9)

When Q is invertible, we have a perfect duality correspondence: V̄_ε(x) [resp. ∂_ε f(x)] is the ball of radius √(2ε) associated with the metric of Q [resp. Q⁻¹], and centered at x [resp. at ∇f(x)].
Finally note that we can use a pseudo-inverse in (4.2.9):

  x' = Q⁻(Qx' + b − b) = Q⁻[∇f(x') − b]

and obtain the form

  V̄_ε(x) = {Q⁻[∇f(x') − b] : ½⟨Q(x' − x), x' − x⟩ ≤ ε}.

This illustrates (4.2.6), knowing that ∂f*(s) = Q⁻(s − b) + Ker Q for s − b ∈ Im Q and that x + Ker Q ⊂ V̄_ε(x). □
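Formula (4.2.8) is easy to test numerically. The sketch below (ours, with a diagonal Q, b = 0 and random sampling, all chosen only for illustration) draws x' in the ellipsoid (4.2.9) and verifies that s = ∇f(x') satisfies the ε-subgradient inequality at x:

```python
# Random check (illustration only) of (4.2.8) for f(z) = 0.5 <Qz, z>,
# Q = diag(2, 1): every s = grad f(x') with 0.5 <Q(x'-x), x'-x> <= eps
# must be an eps-subgradient of f at x.
import random

Q = (2.0, 1.0)                                    # diagonal entries of Q
def f(z): return 0.5 * (Q[0] * z[0]**2 + Q[1] * z[1]**2)
def grad(z): return (Q[0] * z[0], Q[1] * z[1])
def dot(u, v): return u[0] * v[0] + u[1] * v[1]

x, eps = (1.0, 1.0), 0.5
random.seed(0)
ok, tested = True, 0
for _ in range(200):
    d = (random.uniform(-1, 1), random.uniform(-1, 1))
    if 0.5 * (Q[0] * d[0]**2 + Q[1] * d[1]**2) > eps:
        continue                                  # x' = x + d outside (4.2.9)
    tested += 1
    s = grad((x[0] + d[0], x[1] + d[1]))          # s = grad f(x')
    for _ in range(20):
        y = (random.uniform(-3, 3), random.uniform(-3, 3))
        ok = ok and f(y) >= f(x) + dot(s, (y[0] - x[0], y[1] - x[1])) - eps - 1e-9
```

The check succeeds because, for this f, f*(s) + f(x) − ⟨s, x⟩ equals exactly ½⟨Q(x' − x), x' − x⟩ when s = ∇f(x'), which is the computation behind (4.2.8).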
4 The Approximate Subdifferential as a Multifunction 135

A final remark: we have seen in §II.2 the importance of defining a neighborhood


of a given iterate x = Xk, to construct suitable directions of search. For the sake of
efficiency, this neighborhood had to reflect the behaviour of I near x. Here, Ye (x) is
a possible candidate: interpret the constraint in (11.2.3.1) as defining a neighborhood
to be compared with (4.2.9). However, it is only because VI and Vf* are affine
mappings that Ye (x) has such a nice expression in the quadratic case. For a general
I, Ye (x) is hardly acceptable; (4.2.7) shows that it need not be bounded, and (4.2.6)
suggests that it need not be convex, being the image of a convex set by a mapping
which is rather complicated

Example 4.2.8 Let f be defined by

  f(x) = max{0, ξ² + η² − 1} for all x = (ξ, η) ∈ ℝ².

Consider x = (1, 1), at which f(x) = 1 and ∇f(x) = (2, 2), and take ε = 1/2. Look at Fig. 4.2.2 to see how V̄_ε(x) is constructed: if f were the quadratic function ξ² + η² − 1, we would obtain the ball

  D = {(ξ, η) : (ξ − 1)² + (η − 1)² ≤ 1/2}

(see Example 4.2.7). However, f is minimal on the unit ball B(0, 1), i.e. 0 ∈ ∂f(x') for all x' ∈ B(0, 1); furthermore, 0 ∉ ∂_{1/2} f(x). In view of its definition (4.2.3), V_ε(x) therefore does not meet B(0, 1):

  V_{1/2}(x) ⊂ D \ B(0, 1).

On the other hand, it suffices to remove B(0, 1) from D, and this can be seen from the definition (4.2.3) of V̄_ε(x): the linearization error e(x, ·, ·) is left unperturbed by the max-operation defining f. In summary:

  V̄_ε(x) = {(ξ, η) : (ξ − 1)² + (η − 1)² ≤ 1/2, ξ² + η² ≥ 1}.

We thus obtain a nonconvex neighborhood; observe its star-shaped character. □


Fig. 4.2.2. A nonconvex neighborhood V̄_ε(x)

By contrast, another interesting neighborhood is indeed convex: it is obtained by inverting the role of x and x' in (4.2.3). For x ∈ int dom f, define the set illustrated in Fig. 4.2.3:

  V*_ε(x) := {x' ∈ ℝ^n : e(x', x, s) ≤ ε for all s ∈ ∂f(x)}.

Another expression is

  V*_ε(x) = {x' ∈ ℝ^n : f(x') ≤ f(x) + ⟨s, x' − x⟩ + ε for all s ∈ ∂f(x)},

which shows that V*_ε(x) is closed and convex. Reproduce the proof of Proposition 4.2.5(i) to see that V*_ε(x) is a neighborhood of x. For the quadratic function of Example 4.2.7, V*_ε(x) and V_ε(x) coincide.


Fig. 4.2.3. A second-order neighborhood
