
Convex Analysis and Optimization

Lecture 6
Instructor: Wu Yuqia

1.1 Directional Derivative

Convex sets and functions can be characterized in many ways by their behavior along lines. For example,
a set is convex if and only if its intersection with any line is convex, a convex set is bounded if and only if
its intersection with every line is bounded, a function is convex if and only if it is convex along any line,
and a convex function is coercive if and only if it is coercive along any line. Similarly, it turns out that the
differentiability properties of a convex function are determined by the corresponding properties along lines.
With this in mind, we first consider convex functions of a single variable.

Lemma 1.1 Let f : I → R be a convex function, where I is an interval, i.e., a convex set of scalars, which
may be open, closed, or neither open nor closed. The convexity of f implies the following important inequality
$$\frac{f(y)-f(x)}{y-x} \;\le\; \frac{f(z)-f(x)}{z-x} \;\le\; \frac{f(z)-f(y)}{z-y}, \qquad (1)$$
which holds for all x, y, z ∈ I such that x < y < z.

Proof: By noting that $y = \frac{y-x}{z-x}\,z + \frac{z-y}{z-x}\,x$ and $\frac{y-x}{z-x} + \frac{z-y}{z-x} = 1$, we have from the convexity of $f$ that
$$f(y) \le \frac{y-x}{z-x}\, f(z) + \frac{z-y}{z-x}\, f(x).$$
Therefore,
$$f(y) - f(x) \le \frac{y-x}{z-x}\, f(z) + \frac{x-y}{z-x}\, f(x) = \frac{y-x}{z-x}\,\bigl(f(z) - f(x)\bigr),$$
which gives the first inequality in (1). The second inequality can be obtained by a similar argument.
We define
$$s^+(x, \alpha) = \frac{f(x+\alpha) - f(x)}{\alpha}, \qquad s^-(x, \alpha) = \frac{f(x) - f(x-\alpha)}{\alpha}.$$
If $x$ is not equal to the right end point of $I$, we define the right derivative of $f$ at $x$ to be
$$f^+(x) = \lim_{\alpha \downarrow 0} \frac{f(x+\alpha) - f(x)}{\alpha} = \inf_{\alpha > 0} \frac{f(x+\alpha) - f(x)}{\alpha} = \inf_{\alpha > 0} s^+(x, \alpha).$$
Similarly, if $x$ is not equal to the left end point of $I$, we define the left derivative of $f$ at $x$ to be
$$f^-(x) = \lim_{\alpha \downarrow 0} \frac{f(x) - f(x-\alpha)}{\alpha} = \sup_{\alpha > 0} \frac{f(x) - f(x-\alpha)}{\alpha} = \sup_{\alpha > 0} s^-(x, \alpha).$$
If $I = [a, b]$, we define for completeness
$$f^-(a) = -\infty, \qquad f^+(b) = +\infty.$$
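As a quick numerical sanity check of these definitions (a sketch only; the test functions $t^2$ and $|t|$, the points, and the step sizes are illustrative choices, not part of the lecture), the quotient $s^+(x,\alpha)$ decreases and $s^-(x,\alpha)$ increases as $\alpha$ shrinks, and the two one-sided derivatives can differ at a kink:

```python
# A minimal numerical sketch of the quotients s^+(x, alpha) and s^-(x, alpha).
# The test functions t**2 and |t| and the step sizes are illustrative choices,
# not taken from the notes.

def s_plus(f, x, a):
    # forward quotient s^+(x, a) = (f(x + a) - f(x)) / a
    return (f(x + a) - f(x)) / a

def s_minus(f, x, a):
    # backward quotient s^-(x, a) = (f(x) - f(x - a)) / a
    return (f(x) - f(x - a)) / a

def square(t):
    return t * t

for a in (1.0, 0.1, 0.01, 0.001):
    # smooth case, x = 1: s^+ decreases and s^- increases toward f'(1) = 2
    print(f"t^2 at 1 : alpha={a:6.3f}  s+={s_plus(square, 1.0, a):.4f}  s-={s_minus(square, 1.0, a):.4f}")

for a in (1.0, 0.1, 0.01, 0.001):
    # kink case, x = 0: the quotients stay at +1 and -1, so f^-(0) = -1 < 1 = f^+(0)
    print(f"|t| at 0 : alpha={a:6.3f}  s+={s_plus(abs, 0.0, a):.4f}  s-={s_minus(abs, 0.0, a):.4f}")
```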


Proposition 1.2 Let I be an interval of real numbers, whose left and right end points are denoted by a and
b, respectively, and let f : I → R be a convex function.

(a) We have $f^-(x) \le f^+(x)$ for every $x \in I$.

(b) If $x$ belongs to the interior of $I$, then $f^+(x)$ and $f^-(x)$ are finite.

(c) If $x, z \in I$ and $x < z$, then $f^+(x) \le f^-(z)$.

(d) The functions $f^-, f^+ : I \to [-\infty, +\infty]$ are nondecreasing.

Proof:

(a) If $x$ is an end point of $I$, the result follows, since if $x = a$ then $f^-(x) = -\infty$, and if $x = b$ then $f^+(x) = +\infty$. Assume that $x$ is an interior point of $I$. Then we let $\alpha > 0$, and we use Eq. (1), with $x, y, z$ replaced by $x - \alpha, x, x + \alpha$, respectively, to obtain $s^-(x, \alpha) \le s^+(x, \alpha)$. Taking the limit as $\alpha$ decreases to zero, we obtain $f^-(x) \le f^+(x)$.

(b) Let $x$ belong to the interior of $I$ and let $\alpha > 0$ be such that $x - \alpha \in I$. Then $f^-(x) \ge s^-(x, \alpha) > -\infty$. Similarly, we obtain $f^+(x) < \infty$. Part (a) then implies that $f^+(x)$ and $f^-(x)$ are finite.

(c) We use Eq. (1), with $y = (z+x)/2$, to obtain $s^+(x, (z-x)/2) \le s^-(z, (z-x)/2)$. The result then follows because $f^+(x) \le s^+(x, (z-x)/2)$ and $s^-(z, (z-x)/2) \le f^-(z)$.

(d) This follows by combining parts (a) and (c).

1.1.1 Directional derivative and gradient

We will now discuss notions of directional differentiability of multidimensional real-valued functions. The directional derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $x \in \mathbb{R}^n$ in the direction $y \in \mathbb{R}^n$ is given by
$$f'(x; y) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha y) - f(x)}{\alpha},$$
provided that the limit exists, in which case we say that $f$ is directionally differentiable at $x$ in the direction $y$, and we call $f'(x; y)$ the directional derivative of $f$ at $x$ in the direction $y$. We say that $f$ is directionally differentiable at $x$ if it is directionally differentiable at $x$ in all directions.
Let $f : \mathbb{R}^n \to \mathbb{R}$ be some function, fix some $x \in \mathbb{R}^n$, and consider the expression
$$\lim_{\alpha \downarrow 0} \frac{f(x + \alpha e_i) - f(x)}{\alpha},$$
where $e_i$ is the $i$-th unit vector (all components are 0 except for the $i$-th component, which is 1). If the above limit exists, it is called the $i$-th partial derivative of $f$ at the vector $x$ and it is denoted by $(\partial f / \partial x_i)(x)$ or $\partial f(x) / \partial x_i$ ($x_i$ in this section will denote the $i$-th component of the vector $x$).
Assuming all of these partial derivatives exist, the gradient of $f$ at $x$ is defined as the column vector
$$\nabla f(x) = \begin{pmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{pmatrix}.$$

If the directional derivative of $f$ at a vector $x$ exists in all directions $y$ and $f'(x; y)$ is a linear function of $y$, we say that $f$ is differentiable at $x$. This type of differentiability is also called Gateaux differentiability.

Lemma 1.3 $f$ is differentiable at $x$ if and only if the gradient $\nabla f(x)$ exists and satisfies
$$\nabla f(x)^\top y = f'(x; y), \qquad \forall y \in \mathbb{R}^n.$$

Proof: $\Rightarrow$: If $f$ is differentiable at $x$, the directional derivative exists for $y = e_i$, $i = 1, 2, \ldots, n$, which implies the existence of $\nabla f(x)$. On the other hand, since $y = \sum_{i=1}^n y_i e_i$ and $f'(x; \cdot)$ is linear,
$$f'(x; y) = \sum_{i=1}^n y_i f'(x; e_i) = \nabla f(x)^\top y, \qquad \forall y \in \mathbb{R}^n.$$
$\Leftarrow$: The existence of $f'(x; y)$ for all $y \in \mathbb{R}^n$ is clear, and the linearity of $f'(x; y)$ in $y$ follows from the identity $\nabla f(x)^\top y = f'(x; y)$.
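As a numerical illustration of Lemma 1.3 (a sketch; the quadratic $f$, the point $x$, the direction $y$, and the finite-difference step are illustrative assumptions), one can compare $\nabla f(x)^\top y$ with a one-sided difference quotient approximating $f'(x; y)$:

```python
# Sketch: for a differentiable convex f, the directional derivative f'(x; y)
# should agree with grad f(x)^T y (Lemma 1.3). The quadratic, the test point,
# the direction, and the finite-difference step are illustrative assumptions.
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # symmetric positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x + b @ x      # smooth convex quadratic

def grad_f(x):
    return Q @ x + b                    # exact gradient

def dir_deriv(f, x, y, alpha=1e-6):
    # one-sided quotient approximating f'(x; y)
    return (f(x + alpha * y) - f(x)) / alpha

x = np.array([0.3, -0.7])
y = np.array([1.0, 2.0])
print("grad f(x)^T y      :", grad_f(x) @ y)
print("numerical f'(x; y) :", dir_deriv(f, x, y))
```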

The function $f$ is called differentiable over a subset $U$ of $\mathbb{R}^n$ if it is differentiable at every $x \in U$. The function $f$ is called differentiable (without qualification) if it is differentiable at all $x \in \mathbb{R}^n$.

Proposition 1.4 If a function $f : \mathbb{R}^n \to \mathbb{R}$ is convex, then $f$ is directionally differentiable at all points $x \in \mathbb{R}^n$.

Proof: Let $y \in \mathbb{R}^n$ and define
$$F_y(\alpha) := f(x + \alpha y).$$
Then we have
$$G_y(\alpha) := \frac{f(x + \alpha y) - f(x)}{\alpha} = \frac{F_y(\alpha) - F_y(0)}{\alpha}.$$
It is not hard to check that $F_y$ is a convex function of $\alpha$, which by Lemma 1.1 implies that $G_y(\alpha)$ is nondecreasing in $\alpha$. Let $\bar\alpha > 0$ be a constant. Applying Lemma 1.1 again, we have
$$G_y(\alpha) \ge \frac{f(x) - f(x - \bar\alpha y)}{\bar\alpha}, \qquad \text{for } \alpha > 0.$$
Therefore, as $\alpha \downarrow 0$, $G_y(\alpha)$ decreases, which together with its boundedness from below implies that $\lim_{\alpha \downarrow 0} G_y(\alpha)$ exists; that is, $f$ is directionally differentiable at $x$ in the direction $y$. Since $y$ was arbitrarily picked, we obtain the desired result.

From the proof of the proposition, for a convex function, an equivalent definition of the directional derivative is
$$f'(x; y) = \inf_{\alpha > 0} \frac{f(x + \alpha y) - f(x)}{\alpha}. \qquad (1.1)$$
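The monotonicity behind Eq. (1.1) can be seen numerically; in the following sketch (the max-of-affine function, the point, and the direction are illustrative assumptions), the difference quotient does not increase as $\alpha$ shrinks and its infimum equals the directional derivative:

```python
# Sketch of Eq. (1.1): for a convex f, the quotient (f(x + a*y) - f(x)) / a is
# nondecreasing in a, so f'(x; y) equals its infimum over a > 0. The particular
# max-of-affine function, point, and direction are illustrative assumptions.
import numpy as np

def f(x):
    # pointwise maximum of affine functions, hence convex
    return max(x[0] + 2 * x[1], -x[0], 3 * x[1] - 1)

x = np.array([0.0, 0.0])
y = np.array([-1.0, 1.0])

for a in (2.0, 1.0, 0.5, 0.1, 0.01):
    q = (f(x + a * y) - f(x)) / a
    # the quotient never increases as a shrinks; here its infimum is f'(x; y) = 1
    print(f"a={a:5.2f}  quotient={q:.4f}")
```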

Proposition 1.5 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function, and let $\{f_k\}$ be a sequence of convex functions $f_k : \mathbb{R}^n \to \mathbb{R}$ with the property that $\lim_{k \to \infty} f_k(x_k) = f(x)$ for every $x \in \mathbb{R}^n$ and every sequence $\{x_k\}$ that converges to $x$. Then, for any $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$, and any sequences $\{x_k\}$ and $\{y_k\}$ converging to $x$ and $y$, respectively, we have
$$\limsup_{k \to \infty} f_k'(x_k; y_k) \le f'(x; y).$$
Furthermore, if $f$ is differentiable over $\mathbb{R}^n$, then it is continuously differentiable over $\mathbb{R}^n$.



Proof: Since $f$ is convex, it follows by Prop. 1.4 that $f$ is directionally differentiable. From the definition of the directional derivative, for any $\epsilon > 0$, there exists an $\alpha > 0$ such that
$$\frac{f(x + \alpha y) - f(x)}{\alpha} < f'(x; y) + \epsilon.$$
Hence, for all sufficiently large $k$, we have, using Eq. (1.1) applied to $f_k$ together with the assumption that $f_k(x_k + \alpha y_k) \to f(x + \alpha y)$ and $f_k(x_k) \to f(x)$,
$$f_k'(x_k; y_k) \le \frac{f_k(x_k + \alpha y_k) - f_k(x_k)}{\alpha} < f'(x; y) + \epsilon,$$
so by taking the limit as $k \to \infty$,
$$\limsup_{k \to \infty} f_k'(x_k; y_k) \le f'(x; y) + \epsilon.$$
Since this is true for all $\epsilon > 0$, we obtain $\limsup_{k \to \infty} f_k'(x_k; y_k) \le f'(x; y)$.
If $f$ is differentiable at all $x \in \mathbb{R}^n$, then using the continuity of $f$ and the part of the proposition just proved (with $f_k = f$ for all $k$), we have for every sequence $\{x_k\}$ converging to $x$ and every $y \in \mathbb{R}^n$,
$$\limsup_{k \to \infty} \nabla f(x_k)^\top y = \limsup_{k \to \infty} f'(x_k; y) \le f'(x; y) = \nabla f(x)^\top y.$$
By replacing $y$ with $-y$ in the preceding argument, we obtain
$$-\liminf_{k \to \infty} \nabla f(x_k)^\top y = \limsup_{k \to \infty} \bigl(-\nabla f(x_k)^\top y\bigr) \le -\nabla f(x)^\top y.$$
Therefore, we have $\nabla f(x_k)^\top y \to \nabla f(x)^\top y$ for every $y$, which implies that $\nabla f(x_k) \to \nabla f(x)$. Hence, $\nabla f(\cdot)$ is continuous.

1.2 Subgradient and Subdifferential

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. We say that a vector $v \in \mathbb{R}^n$ is a subgradient of $f$ at a point $x \in \mathbb{R}^n$ if
$$f(z) \ge f(x) + (z - x)^\top v, \qquad \forall z \in \mathbb{R}^n. \qquad (2)$$
The set of all subgradients of a convex function $f$ at $x \in \mathbb{R}^n$ is called the subdifferential of $f$ at $x$, and is denoted by $\partial f(x)$.
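For a concrete feel for definition (2), the following sketch checks the subgradient inequality numerically for the $\ell_1$ norm (the point $x$, the candidate vectors, and the random test points are illustrative assumptions, not part of the notes):

```python
# Sketch: checking the subgradient inequality (2) numerically for f(x) = ||x||_1.
# The point x, the candidate vectors, and the random test points are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.abs(x).sum()              # the l1 norm, a convex function

def looks_like_subgradient(v, x, trials=10000):
    # sample test points z and check f(z) >= f(x) + (z - x)^T v for each,
    # up to a small numerical tolerance
    Z = rng.normal(size=(trials, x.size)) * 5.0
    return bool(np.all(np.abs(Z).sum(axis=1) >= f(x) + (Z - x) @ v - 1e-9))

x = np.array([2.0, 0.0, -3.0])
good = np.array([1.0, 0.3, -1.0])       # sign(x_i) where x_i != 0, a value in [-1, 1] where x_i = 0
bad = np.array([1.0, 1.5, -1.0])        # middle component lies outside [-1, 1]

print("good candidate passes:", looks_like_subgradient(good, x))   # expected: True
print("bad  candidate passes:", looks_like_subgradient(bad, x))    # expected: False
```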
A subgradient admits an intuitive geometrical interpretation: it can be identified with a nonvertical supporting hyperplane to the epigraph of $f$ at $(x, f(x))$. Such a hyperplane provides a linear approximation to the function $f$, which underestimates $f$ because $f$ is convex.
Now we discuss the existence, convexity, and compactness of the subdifferential.

Proposition 1.6 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. The subdifferential $\partial f(x)$ is nonempty, convex, and compact for all $x \in \mathbb{R}^n$.

Proof:
Nonemptiness: Since $f$ is proper and convex, $\operatorname{epi} f$ is also convex, and does not contain any vertical line. Since $(x, f(x)) \notin \operatorname{int}(\operatorname{epi} f)$, there exists $v \in \mathbb{R}^n$ such that
$$v^\top x + f(x) \le v^\top y + \alpha, \qquad \forall (y, \alpha) \in \operatorname{epi} f,$$



which implies that $v^\top x + f(x) \le v^\top y + f(y)$ for all $y \in \mathbb{R}^n$, and hence
$$f(y) - f(x) \ge (-v)^\top (y - x), \qquad \forall y \in \mathbb{R}^n,$$
so $-v \in \partial f(x)$ and $\partial f(x) \ne \emptyset$.

Convexity: Fix any $v_1, v_2 \in \partial f(x)$, take $\lambda \in [0, 1]$, and let $v_\lambda = \lambda v_1 + (1 - \lambda) v_2$. For all $y \in \mathbb{R}^n$,
$$f(y) - f(x) - v_\lambda^\top (y - x) = f(y) - f(x) - \bigl(\lambda v_1^\top (y - x) + (1 - \lambda) v_2^\top (y - x)\bigr)$$
$$= \lambda \bigl(f(y) - f(x) - v_1^\top (y - x)\bigr) + (1 - \lambda) \bigl(f(y) - f(x) - v_2^\top (y - x)\bigr) \ge 0,$$
which implies that $v_\lambda \in \partial f(x)$, and this proves the convexity of $\partial f(x)$.
Compactness: We first prove closedness. Let $H_y = \{v \in \mathbb{R}^n \mid v^\top (y - x) \le f(y) - f(x)\}$, which is clearly a closed set. Since
$$\partial f(x) = \bigcap_{y \in \mathbb{R}^n} H_y,$$
we know that $\partial f(x)$ is also closed. Next we prove boundedness by contradiction. If $\partial f(x)$ is not bounded, there exists $\{v_k\} \subset \partial f(x)$ with $\|v_k\| \to \infty$. Define $\tilde v_k = v_k / \|v_k\|$. Passing to a subsequence if necessary, we may assume that $\tilde v_k \to \tilde v$ with $\|\tilde v\| = 1$. For each $v_k$, by the definition of the subdifferential, we have
$$v_k^\top (y - x) \le f(y) - f(x), \qquad \forall y \in \mathbb{R}^n.$$
Pick $y = x + t\tilde v$ for some fixed $t > 0$. Then
$$v_k^\top (t\tilde v) = t\, v_k^\top \tilde v \le f(x + t\tilde v) - f(x).$$
Since $\tilde v_k^\top \tilde v \to \|\tilde v\|^2 = 1$ and $\|v_k\| \to \infty$, we have $t\, v_k^\top \tilde v = t\, \|v_k\|\, \tilde v_k^\top \tilde v \to \infty$, while the right-hand side is a constant. This is a contradiction, and hence $\partial f(x)$ is bounded.
The directional derivative and the subdifferential of a convex function are closely linked. To see this, let $d \in \mathbb{R}^n$ and note that the subgradient inequality (2), with $v = d$, is equivalent to
$$\frac{f(x + \alpha y) - f(x)}{\alpha} \ge y^\top d, \qquad \forall y \in \mathbb{R}^n, \ \forall \alpha > 0.$$
Since the quotient on the left above decreases monotonically to $f'(x; y)$ as $\alpha \downarrow 0$, we conclude that the subgradient inequality (2) is equivalent to
$$f'(x; y) \ge y^\top d \quad \text{for all } y \in \mathbb{R}^n.$$
Therefore, we obtain
$$d \in \partial f(x) \iff f'(x; y) \ge y^\top d, \ \forall y \in \mathbb{R}^n, \qquad (3)$$
and it follows that
$$f'(x; y) \ge \max_{d \in \partial f(x)} y^\top d.$$
Next we show that equality holds.

Proposition 1.7 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. For every $x \in \mathbb{R}^n$, we have
$$f'(x; y) = \max_{d \in \partial f(x)} y^\top d, \qquad \forall y \in \mathbb{R}^n. \qquad (4)$$
In particular, $f$ is differentiable at $x$ with gradient $\nabla f(x)$ if and only if it has $\nabla f(x)$ as its unique subgradient at $x$.

Proof: We have already shown that
$$f'(x; y) \ge \max_{d \in \partial f(x)} y^\top d, \qquad \forall y \in \mathbb{R}^n.$$
Now we prove the reverse inequality. Take $x, y \in \mathbb{R}^n$ and consider the subset of $\mathbb{R}^{n+1}$
$$C_1 = \{(z, w) \mid z \in \mathbb{R}^n, \ f(z) < w\},$$
and the half-line
$$C_2 = \{(z, w) \mid z = x + \alpha y, \ w = f(x) + \alpha f'(x; y), \ \alpha \ge 0\}.$$
Using the definition of the directional derivative and the convexity of $f$, it follows that these two sets are nonempty, convex, and disjoint. Thus we can use the Separating Hyperplane Theorem to assert the existence of a nonzero vector $(\mu, \gamma) \in \mathbb{R}^{n+1}$ such that
$$\gamma w + \mu^\top z \ge \gamma \bigl(f(x) + \alpha f'(x; y)\bigr) + \mu^\top (x + \alpha y), \qquad \forall \alpha \ge 0, \ z \in \mathbb{R}^n, \ w > f(z). \qquad (5)$$
We cannot have $\gamma < 0$, since then the left-hand side above could be made arbitrarily small by choosing $w$ sufficiently large. Also, if $\gamma = 0$, then Eq. (5) implies that $\mu = 0$, which contradicts $(\mu, \gamma) \ne 0$. Therefore, $\gamma > 0$ and, by dividing Eq. (5) by $\gamma$ if necessary, we may assume that $\gamma = 1$, i.e.,
$$w + (z - x)^\top \mu \ge f(x) + \alpha f'(x; y) + \alpha y^\top \mu, \qquad \forall \alpha \ge 0, \ z \in \mathbb{R}^n, \ w > f(z). \qquad (6)$$
By setting $\alpha = 0$ in the above relation and by taking the limit as $w \downarrow f(z)$, we obtain
$$f(z) \ge f(x) - (z - x)^\top \mu, \qquad \forall z \in \mathbb{R}^n,$$
implying that $-\mu \in \partial f(x)$. By setting $z = x$ and $\alpha = 1$ in Eq. (6), and by taking the limit as $w \downarrow f(x)$, we obtain $-y^\top \mu \ge f'(x; y)$, which implies that
$$\max_{d \in \partial f(x)} y^\top d \ge f'(x; y),$$
and completes the proof of Eq. (4).
From the definition of the directional derivative, we see that $f$ is differentiable at $x$ if and only if the directional derivative $f'(x; y)$ is a linear function of $y$ of the form $f'(x; y) = \nabla f(x)^\top y$. Thus, from Eq. (4), $f$ is differentiable at $x$ if and only if it has $\nabla f(x)$ as its unique subgradient at $x$.
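A small numerical check of Eq. (4) (a sketch; the function $|t|$ and the point $x = 0$, for which the subdifferential is the interval $[-1, 1]$ directly from definition (2), together with the grid below, are illustrative assumptions):

```python
# Sketch of Eq. (4) for the scalar convex function f(t) = |t| at x = 0.
# From definition (2), d is a subgradient at 0 iff |z| >= d*z for all z, i.e.
# iff d lies in [-1, 1]; this interval and the grid below are the assumptions
# of the check.
import numpy as np

f = abs
alpha = 1e-8
d_grid = np.linspace(-1.0, 1.0, 2001)   # discretisation of the subdifferential [-1, 1]

for y in (2.0, -0.5, 3.0):
    dir_deriv = (f(alpha * y) - f(0.0)) / alpha      # numerical f'(0; y), equal to |y|
    support = np.max(y * d_grid)                     # max over d in [-1, 1] of y*d
    print(f"y={y:+.1f}  f'(0; y)={dir_deriv:.6f}  max_d y*d={support:.6f}")
```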

Proposition 1.8 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function.

(a) If $X \subset \mathbb{R}^n$ is a bounded set, then the set $\bigcup_{x \in X} \partial f(x)$ is bounded.

(b) If a sequence $\{x_k\}$ converges to a vector $x \in \mathbb{R}^n$ and $d_k \in \partial f(x_k)$ for all $k$, then the sequence $\{d_k\}$ is bounded and each of its limit points is a subgradient of $f$ at $x$.

Proof: (a) Assume the contrary, i.e., that there exists a sequence $\{x_k\} \subset X$ and an unbounded sequence $\{d_k\}$ with $d_k \in \partial f(x_k)$ for all $k$. Without loss of generality, we assume that $d_k \ne 0$ for all $k$, and we denote $y_k = d_k / \|d_k\|$. Since both $\{x_k\}$ and $\{y_k\}$ are bounded, they must contain convergent subsequences. We assume without loss of generality that $\{x_k\}$ converges to some $x$ and $\{y_k\}$ converges to some $y$. Since $d_k \in \partial f(x_k)$, we have
$$f(x_k + y_k) - f(x_k) \ge d_k^\top y_k = \|d_k\|.$$

Since $\{x_k\}$ and $\{y_k\}$ converge, by the continuity of $f$, the left-hand side above is bounded. This implies that the right-hand side is bounded, thereby contradicting the unboundedness of $\{d_k\}$.
(b) By Prop. 1.7, we have
$$y^\top d_k \le f'(x_k; y), \qquad \forall y \in \mathbb{R}^n.$$
By part (a), the sequence $\{d_k\}$ is bounded, so let $d$ be a limit point of $\{d_k\}$. By taking the limit along the relevant subsequence in the above relation and by using Prop. 1.5, it follows that
$$y^\top d \le \limsup_{k \to \infty} f'(x_k; y) \le f'(x; y), \qquad \forall y \in \mathbb{R}^n.$$
Therefore, by Eq. (3), we have $d \in \partial f(x)$.
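Proposition 1.8(b) can be illustrated with a simple sketch (the function $|t|$ and the particular sequence are illustrative assumptions): the points $x_k = (-1)^k/k$ converge to $0$, the subgradients $d_k = \operatorname{sign}(x_k)$ remain bounded, and both limit points $\pm 1$ lie in the subdifferential $[-1, 1]$ of $|\cdot|$ at $0$:

```python
# Sketch of Proposition 1.8(b) for f(t) = |t|: the points x_k = (-1)^k / k
# converge to 0, the subgradients d_k = sign(x_k) stay bounded, and each limit
# point (+1 and -1) lies in the subdifferential [-1, 1] of |.| at 0. The
# particular sequence is an illustrative assumption.

def in_subdiff_of_abs_at_0(d, zs=(-2.0, -0.7, 0.3, 1.0, 4.0)):
    # the subgradient inequality (2) for |.| at x = 0 reads |z| >= d*z
    return all(abs(z) >= d * z for z in zs)

xs = [(-1) ** k / k for k in range(1, 9)]
ds = [1.0 if xk > 0 else -1.0 for xk in xs]        # d_k in the subdifferential at x_k
print("x_k :", [round(v, 3) for v in xs])
print("d_k :", ds)
print("limit points +1 and -1 are subgradients at 0:",
      in_subdiff_of_abs_at_0(1.0) and in_subdiff_of_abs_at_0(-1.0))
```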


The subdifferential of a sum of convex functions is the sum of the corresponding subdifferentials.

Proposition 1.9 Let $f_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, \ldots, m$, be convex functions and let $f = f_1 + \cdots + f_m$. Then
$$\partial f(x) = \partial f_1(x) + \cdots + \partial f_m(x).$$

Proof: It will suffice to prove the result for the case where $f = f_1 + f_2$. If $v_1 \in \partial f_1(x)$ and $v_2 \in \partial f_2(x)$, then from the subgradient inequality, we have
$$f_1(z) \ge f_1(x) + (z - x)^\top v_1, \qquad \forall z \in \mathbb{R}^n,$$
$$f_2(z) \ge f_2(x) + (z - x)^\top v_2, \qquad \forall z \in \mathbb{R}^n.$$
So by adding, we obtain
$$f(z) \ge f(x) + (z - x)^\top (v_1 + v_2), \qquad \forall z \in \mathbb{R}^n.$$
Hence, $v_1 + v_2 \in \partial f(x)$, implying that $\partial f_1(x) + \partial f_2(x) \subseteq \partial f(x)$.
To prove the reverse inclusion, assume, to arrive at a contradiction, that there exists a $v \in \partial f(x)$ such that $v \notin \partial f_1(x) + \partial f_2(x)$. Since by Prop. 1.6 the sets $\partial f_1(x)$ and $\partial f_2(x)$ are compact, the set $\partial f_1(x) + \partial f_2(x)$ is compact, and by the Strict Separation Theorem there exists a hyperplane strictly separating $v$ from $\partial f_1(x) + \partial f_2(x)$, i.e., a vector $y$ and a scalar $b$ such that
$$y^\top (v_1 + v_2) < b < y^\top v, \qquad \forall v_1 \in \partial f_1(x), \ \forall v_2 \in \partial f_2(x).$$
Therefore,
$$\sup_{v_1 \in \partial f_1(x)} y^\top v_1 + \sup_{v_2 \in \partial f_2(x)} y^\top v_2 < y^\top v,$$
and by Prop. 1.7,
$$f_1'(x; y) + f_2'(x; y) < y^\top v.$$
By using the definition of the directional derivative, we have $f_1'(x; y) + f_2'(x; y) = f'(x; y)$, so that
$$f'(x; y) < y^\top v,$$
which contradicts the assumption $v \in \partial f(x)$, in view of Prop. 1.7.
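The sum rule can be checked numerically in a simple case (a sketch; the functions $f_1(t) = |t|$, $f_2(t) = |t - 1|$, the point $x = 0$, and the grid of test points are illustrative assumptions): $\partial f_1(0) = [-1, 1]$ and $\partial f_2(0) = \{-1\}$, so Proposition 1.9 predicts $\partial (f_1 + f_2)(0) = [-2, 0]$:

```python
# Sketch of Proposition 1.9 with f1(t) = |t| and f2(t) = |t - 1| at x = 0:
# their subdifferentials at 0 are [-1, 1] and {-1}, so the sum rule predicts
# that the subdifferential of f = f1 + f2 at 0 is [-2, 0]. The grid of test
# points and the candidate slopes are illustrative assumptions.
import numpy as np

def f(t):
    return np.abs(t) + np.abs(t - 1.0)  # f = f1 + f2, convex

def is_subgrad_at_0(d):
    z = np.linspace(-5.0, 5.0, 2001)
    # subgradient inequality f(z) >= f(0) + d*z, up to a small tolerance
    return bool(np.all(f(z) >= f(0.0) + d * z - 1e-9))

for d in (-2.0, -1.0, 0.0, 0.5):
    # -2, -1 and 0 lie in [-2, 0]; 0.5 does not
    print(f"d={d:+.1f}  subgradient of f at 0: {is_subgrad_at_0(d)}")
```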
Finally, we present some versions of the chain rule for directional derivatives and subgradients.

Proposition 1.10 (Chain Rule) (a) Let $f : \mathbb{R}^m \to \mathbb{R}$ be a convex function, and let $A$ be an $m \times n$ matrix. Then the subdifferential of the function $F$, defined by
$$F(x) = f(Ax),$$
is given by
$$\partial F(x) = A^\top \partial f(Ax) = \{A^\top g \mid g \in \partial f(Ax)\}.$$

(b) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $g : \mathbb{R} \to \mathbb{R}$ be a smooth scalar function. Then the function $F$, defined by
$$F(x) = g(f(x)),$$
is directionally differentiable at all $x$, and its directional derivative is given by
$$F'(x; y) = \nabla g(f(x))\, f'(x; y), \qquad \forall x \in \mathbb{R}^n, \ \forall y \in \mathbb{R}^n. \qquad (7)$$
Furthermore, if $g$ is convex and monotonically nondecreasing, then $F$ is convex and its subdifferential is given by
$$\partial F(x) = \nabla g(f(x))\, \partial f(x), \qquad \forall x \in \mathbb{R}^n. \qquad (8)$$

Proof: (a) From the definition of the directional derivative, it can be seen that
$$F'(x; y) = f'(Ax; Ay), \qquad \forall y \in \mathbb{R}^n.$$
Let $g \in \partial f(Ax)$. Then by Prop. 1.7, we have
$$g^\top z \le f'(Ax; z), \qquad \forall z \in \mathbb{R}^m,$$
and in particular,
$$g^\top Ay \le f'(Ax; Ay), \qquad \forall y \in \mathbb{R}^n,$$
or
$$(A^\top g)^\top y \le F'(x; y), \qquad \forall y \in \mathbb{R}^n.$$
Hence, by Eq. (3), we have $A^\top g \in \partial F(x)$, so that $A^\top \partial f(Ax) \subseteq \partial F(x)$.
To prove the reverse inclusion, assume, to arrive at a contradiction, that there exists a $d \in \partial F(x)$ such that $d \notin A^\top \partial f(Ax)$. Since by Prop. 1.6 the set $\partial f(Ax)$ is compact, the set $A^\top \partial f(Ax)$ is also compact, and by the Strict Separation Theorem there exists a hyperplane strictly separating $d$ from $A^\top \partial f(Ax)$, i.e., a vector $y$ and a scalar $c$ such that
$$y^\top (A^\top g) < c < y^\top d, \qquad \forall g \in \partial f(Ax).$$
From this we obtain
$$\max_{g \in \partial f(Ax)} (Ay)^\top g < y^\top d,$$
or, by using Prop. 1.7,
$$f'(Ax; Ay) < y^\top d.$$
Since $f'(Ax; Ay) = F'(x; y)$, it follows that
$$F'(x; y) < y^\top d,$$
which contradicts the assumption $d \in \partial F(x)$, in view of Prop. 1.7.


(b) We have $F'(x; y) = \lim_{\alpha \downarrow 0} s(x, y, \alpha)$, provided the limit exists, where
$$s(x, y, \alpha) = \frac{F(x + \alpha y) - F(x)}{\alpha} = \frac{g(f(x + \alpha y)) - g(f(x))}{\alpha}. \qquad (9)$$
From the convexity of $f$, it follows that there are three possibilities (see Exercise 1.6):

(1) For some $\bar\alpha > 0$, $f(x + \alpha y) = f(x)$ for all $\alpha \in (0, \bar\alpha]$,

(2) For some $\bar\alpha > 0$, $f(x + \alpha y) > f(x)$ for all $\alpha \in (0, \bar\alpha]$,

(3) For some $\bar\alpha > 0$, $f(x + \alpha y) < f(x)$ for all $\alpha \in (0, \bar\alpha]$.

In case (1), from Eq. (9), we have $\lim_{\alpha \downarrow 0} s(x, y, \alpha) = 0$ and $F'(x; y) = 0$; since also $f'(x; y) = 0$ in this case, Eq. (7) holds.
In case (2), Eq. (9) is written as
$$s(x, y, \alpha) = \frac{f(x + \alpha y) - f(x)}{\alpha} \cdot \frac{g(f(x + \alpha y)) - g(f(x))}{f(x + \alpha y) - f(x)}.$$
As $\alpha \downarrow 0$, we have $f(x + \alpha y) \to f(x)$, so the preceding equation yields
$$F'(x; y) = \lim_{\alpha \downarrow 0} s(x, y, \alpha) = \nabla g(f(x))\, f'(x; y).$$

The proof for case (3) is similar.


If $g$ is convex and monotonically nondecreasing, then $F$ is convex (see Exercise 1.4). To obtain the equation for the subdifferential of $F$, we note that by Eq. (3), $d \in \partial F(x)$ if and only if $y^\top d \le F'(x; y)$ for all $y \in \mathbb{R}^n$, or equivalently (from what has already been shown)
$$y^\top d \le \nabla g(f(x))\, f'(x; y), \qquad \forall y \in \mathbb{R}^n.$$
If $\nabla g(f(x)) = 0$, this relation yields $d = 0$, so $\partial F(x) = \{0\}$ and the desired Eq. (8) holds. If $\nabla g(f(x)) \ne 0$, we have $\nabla g(f(x)) > 0$ by the monotonicity of $g$, so we obtain
$$y^\top \frac{d}{\nabla g(f(x))} \le f'(x; y), \qquad \forall y \in \mathbb{R}^n,$$
which, by Eq. (3), is equivalent to $d / \nabla g(f(x)) \in \partial f(x)$. Thus, we have shown that $d \in \partial F(x)$ if and only if $d / \nabla g(f(x)) \in \partial f(x)$, which proves the desired Eq. (8).
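Both parts of the chain rule can be sanity-checked numerically at points where the functions involved are differentiable, so that the predicted subdifferentials are singletons (a sketch; the matrix $A$, the choices $f(u) = \|u\|_1$ and $g(t) = e^t$, and the test point are illustrative assumptions, not taken from the notes):

```python
# Sketch of both chain rules at points where everything is differentiable, so
# the predicted subdifferentials are singletons that can be compared with
# finite differences. The matrix A, the choices f(u) = ||u||_1 and g(t) = exp(t),
# and the test point are illustrative assumptions.
import numpy as np

def num_grad(F, x, h=1e-6):
    # central-difference gradient, used only as a reference
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (F(x + e) - F(x - e)) / (2 * h)
    return grad

A = np.array([[1.0, -2.0, 0.5],
              [0.0,  1.0, 3.0]])
x = np.array([0.3, -0.4, 1.2])

# (a) F(x) = f(Ax) with f(u) = ||u||_1; since Ax has no zero entry here,
#     part (a) predicts grad F(x) = A^T sign(Ax).
def F_a(x):
    return np.abs(A @ x).sum()

print("chain rule (a):", A.T @ np.sign(A @ x))
print("numeric       :", num_grad(F_a, x))

# (b) F(x) = g(f(x)) with g(t) = exp(t) (smooth, convex, nondecreasing) and
#     f(x) = ||x||_1; since x has no zero entry, Eq. (8) predicts
#     grad F(x) = exp(||x||_1) * sign(x).
def F_b(x):
    return np.exp(np.abs(x).sum())

print("chain rule (b):", np.exp(np.abs(x).sum()) * np.sign(x))
print("numeric       :", num_grad(F_b, x))
```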
