
L. Vandenberghe ECE236C (Spring 2020)

2. Subgradients

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative

2.1
Basic inequality

recall the basic inequality for differentiable convex functions:

f (y) ≥ f (x) + ∇ f (x)T (y − x) for all y ∈ dom f

(figure: graph of f with its first-order approximation at (x, f (x)); the vector (∇ f (x), −1) is an outward normal to epi f at that point)

• the first-order approximation of f at x is a global lower bound


• ∇ f (x) defines non-vertical supporting hyperplane to epigraph of f at (x, f (x)):

(∇ f (x), −1)T ((y, t) − (x, f (x))) ≤ 0 for all (y, t) ∈ epi f

Subgradients 2.2
Subgradient

g is a subgradient of a convex function f at x ∈ dom f if

f (y) ≥ f (x) + gT (y − x) for all y ∈ dom f

(figure: graph of f (y) with three affine lower bounds f (x1) + g1T (y − x1), f (x1) + g2T (y − x1), f (x2) + g3T (y − x2))
g1, g2 are subgradients at x1; g3 is a subgradient at x2

Subgradients 2.3
Subdifferential

the subdifferential ∂ f (x) of f at x is the set of all subgradients:

∂ f (x) = {g | gT (y − x) ≤ f (y) − f (x), ∀y ∈ dom f }

Properties

• ∂ f (x) is a closed convex set (possibly empty)


this follows from the definition: ∂ f (x) is an intersection of halfspaces

• if x ∈ int dom f then ∂ f (x) is nonempty and bounded


proof on next two pages

Subgradients 2.4
Proof: we show that ∂ f (x) is nonempty when x ∈ int dom f

• (x, f (x)) is in the boundary of the convex set epi f

• therefore there exists a supporting hyperplane to epi f at (x, f (x)):

∃(a, b) ≠ 0:   (a, b)T ((y, t) − (x, f (x))) ≤ 0   for all (y, t) ∈ epi f

• b > 0 gives a contradiction as t → ∞

• b = 0 gives a contradiction for y = x + εa with small ε > 0

• therefore b < 0 and g = (1/|b|) a is a subgradient of f at x

Subgradients 2.5
Proof: ∂ f (x) is bounded when x ∈ int dom f

• for small r > 0, define a set of 2n points

B = {x ± r ek | k = 1, . . . , n} ⊂ dom f

and define M = max_{y ∈ B} f (y) < ∞

• for every g ∈ ∂ f (x), there is a point y ∈ B with

r ‖g‖∞ = gT (y − x)

(choose an index k with |gk | = ‖g‖∞, and take y = x + r sign(gk ) ek)

• since g is a subgradient, this implies that

f (x) + r ‖g‖∞ = f (x) + gT (y − x) ≤ f (y) ≤ M

• we conclude that ∂ f (x) is bounded:

‖g‖∞ ≤ (M − f (x))/r for all g ∈ ∂ f (x)

Subgradients 2.6
Example

f (x) = max { f1(x), f2(x)} with f1, f2 convex and differentiable

(figure: graphs of f1(y), f2(y), and their pointwise maximum f (y))

• if f1( x̂) = f2( x̂), subdifferential at x̂ is line segment [∇ f1( x̂), ∇ f2( x̂)]
• if f1( x̂) > f2( x̂), subdifferential at x̂ is {∇ f1( x̂)}
• if f1( x̂) < f2( x̂), subdifferential at x̂ is {∇ f2( x̂)}

Subgradients 2.7
Examples

Absolute value f (x) = |x|

(figure: graph of f (x) = |x| and of its subdifferential ∂ f (x), which is {−1} for x < 0, the interval [−1, 1] at x = 0, and {1} for x > 0)

Euclidean norm f (x) = ‖x‖2

∂ f (x) = {x/‖x‖2 } if x ≠ 0,   ∂ f (x) = {g | ‖g‖2 ≤ 1} if x = 0

Subgradients 2.8
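these two examples translate directly into code; below is a minimal NumPy sketch (the helper name is mine, not from the slides) that returns one subgradient of the Euclidean norm, using the zero vector as a valid choice from the unit ball when x = 0

```python
import numpy as np

def subgrad_l2norm(x, tol=1e-12):
    """Return one subgradient of f(x) = ||x||_2 at x."""
    nrm = np.linalg.norm(x)
    if nrm > tol:
        return x / nrm          # unique subgradient (the gradient) when x != 0
    return np.zeros_like(x)     # 0 lies in the unit ball {g : ||g||_2 <= 1}

print(subgrad_l2norm(np.array([3.0, 4.0])))   # [0.6 0.8]
print(subgrad_l2norm(np.zeros(2)))            # [0. 0.]
```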
Monotonicity

the subdifferential of a convex function is a monotone operator:

(u − v)T (x − y) ≥ 0 for all x, y, u ∈ ∂ f (x), v ∈ ∂ f (y)

Proof: by definition

f (y) ≥ f (x) + uT (y − x), f (x) ≥ f (y) + v T (x − y)

combining the two inequalities shows monotonicity

Subgradients 2.9
Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at x = 0

• f : R → R, dom f = R+

f (x) = 1 if x = 0, f (x) = 0 if x > 0

• f : R → R, dom f = R+
f (x) = −√x

the only supporting hyperplane to epi f at (0, f (0)) is vertical

Subgradients 2.10
Subgradients and sublevel sets

if g is a subgradient of f at x , then

f (y) ≤ f (x) =⇒ gT (y − x) ≤ 0


the nonzero subgradients at x define supporting hyperplanes to the sublevel set

{y | f (y) ≤ f (x)}

Subgradients 2.11
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative
Subgradient calculus

Weak subgradient calculus: rules for finding one subgradient

• sufficient for most nondifferentiable convex optimization algorithms


• if you can evaluate f (x), you can usually compute a subgradient

Strong subgradient calculus: rules for finding ∂ f (x) (all subgradients)

• some algorithms, optimality conditions, etc., need entire subdifferential


• can be quite complicated

we will assume that x ∈ int dom f

Subgradients 2.12
Basic rules

Differentiable functions: ∂ f (x) = {∇ f (x)} if f is differentiable at x

Nonnegative linear combination

if f (x) = α1 f1(x) + α2 f2(x) with α1, α2 ≥ 0, then

∂ f (x) = α1 ∂ f1(x) + α2 ∂ f2(x)

(right-hand side is addition of sets)

Affine transformation of variables: if f (x) = h(Ax + b), then

∂ f (x) = AT ∂h(Ax + b)

Subgradients 2.13
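as an illustration of the affine rule (my own example, not from the slides): for f (x) = ‖Ax + b‖2 one subgradient is AT (Ax + b)/‖Ax + b‖2 when Ax + b ≠ 0, and AT g for any ‖g‖2 ≤ 1 (for example g = 0) otherwise

```python
import numpy as np

def subgrad_l2_affine(A, b, x, tol=1e-12):
    """One subgradient of f(x) = ||A x + b||_2 via the rule: subgradients of f are A^T times subgradients of ||.||_2 at A x + b."""
    r = A @ x + b
    nrm = np.linalg.norm(r)
    g_h = r / nrm if nrm > tol else np.zeros_like(r)   # a subgradient of h = ||.||_2 at r
    return A.T @ g_h

A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([-1.0, 3.0])
print(subgrad_l2_affine(A, b, np.array([1.0, 0.0])))   # [0. 1.]
```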
Pointwise maximum

f (x) = max { f1(x), . . . , fm (x)}

define I(x) = {i | fi (x) = f (x)}, the ‘active’ functions at x

Weak result

to compute a subgradient at x , choose any k ∈ I(x), any subgradient of fk at x

Strong result
∂ f (x) = conv ( ∪_{i ∈ I(x)} ∂ fi (x) )

• the convex hull of the union of subdifferentials of ‘active’ functions at x


• if fi ’s are differentiable, ∂ f (x) = conv {∇ fi (x) | i ∈ I(x)}

Subgradients 2.14
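a minimal sketch of the weak rule for differentiable fi (helper names and the example functions are mine): evaluate all fi at x, pick any active index, and return its gradient

```python
import numpy as np

def subgrad_pointwise_max(fs, grads, x):
    """Weak rule: one subgradient of f(x) = max_i f_i(x) for differentiable f_i."""
    vals = np.array([f(x) for f in fs])
    k = int(np.argmax(vals))        # any active index works; argmax picks one
    return grads[k](x)

# example: f(x) = max(x^T x, 1 + a^T x)
a = np.array([1.0, -2.0])
fs = [lambda x: x @ x, lambda x: 1.0 + a @ x]
grads = [lambda x: 2.0 * x, lambda x: a]
print(subgrad_pointwise_max(fs, grads, np.array([0.5, 0.5])))
```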
Example: piecewise-linear function

f (x) = max_{i=1,...,m} (aiT x + bi )

(figure: a piecewise-linear function f (x), the maximum of affine functions aiT x + bi )
the subdifferential at x is a polyhedron

∂ f (x) = conv {ai | i ∈ I(x)}

with I(x) = {i | aiT x + bi = f (x)}


Subgradients 2.15
Example: `1-norm

f (x) = ‖x‖1 = max_{s ∈ {−1,1}^n} sT x

the subdifferential is a product of intervals

 [−1, 1]

 xk = 0
∂ f (x) = J1 × · · · × Jn, Jk = {1} xk > 0

 {−1} xk < 0

(figure: three panels showing these subdifferentials as subsets of R²)

∂ f (0, 0) = [−1, 1] × [−1, 1],   ∂ f (1, 0) = {1} × [−1, 1],   ∂ f (1, 1) = {(1, 1)}

Subgradients 2.16
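a sketch of picking one element of this product of intervals (my own helper; it chooses 0 from [−1, 1] on zero coordinates)

```python
import numpy as np

def subgrad_l1(x):
    """One subgradient of ||x||_1: sign(x), with 0 chosen from [-1, 1] where x_k = 0."""
    return np.sign(x)

print(subgrad_l1(np.array([1.0, 0.0])))    # [1. 0.]  (an element of {1} x [-1, 1])
print(subgrad_l1(np.array([1.0, 1.0])))    # [1. 1.]
```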
Pointwise supremum

f (x) = sup_{α ∈ A} fα (x),   fα (x) convex in x for every α

Weak result: to find a subgradient at x̂ ,

• find any β for which f ( x̂) = f β ( x̂) (assuming maximum is attained)


• choose any g ∈ ∂ f β ( x̂)

(Partial) strong result: define I(x) = {α ∈ A | fα (x) = f (x)}

conv ( ∪_{α ∈ I(x)} ∂ fα (x) ) ⊆ ∂ f (x)

equality requires extra conditions (for example, A compact, fα continuous in α)

Subgradients 2.17
Exercise: maximum eigenvalue

Problem: explain how to find a subgradient of

f (x) = λmax(A(x)) = sup_{‖y‖2 = 1} yT A(x)y

where A(x) = A0 + x1 A1 + · · · + xn An with symmetric coefficients Ai

Solution: to find a subgradient at x̂ ,

• choose any unit eigenvector y with eigenvalue λmax(A( x̂))


• the gradient of yT A(x)y at x̂ is a subgradient of f :

(yT A1 y, . . . , yT An y) ∈ ∂ f ( x̂)

Subgradients 2.18
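a NumPy sketch of this recipe (assuming the coefficient matrices are supplied as a list [A0, A1, ..., An]; function and variable names are mine)

```python
import numpy as np

def subgrad_lambda_max(A_list, x):
    """One subgradient of f(x) = lambda_max(A0 + x1*A1 + ... + xn*An)."""
    A0, As = A_list[0], A_list[1:]
    Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(Ax)          # eigenvalues in ascending order
    y = V[:, -1]                       # unit eigenvector for lambda_max(A(x))
    return np.array([y @ Ai @ y for Ai in As])

A0 = np.diag([1.0, 2.0])
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.eye(2)
print(subgrad_lambda_max([A0, A1, A2], np.array([0.5, -0.3])))
```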
Minimization

f (x) = inf_y h(x, y),   h jointly convex in (x, y)

Weak result: to find a subgradient at x̂ ,

• find ŷ that minimizes h( x̂, y) (assuming minimum is attained)


• find subgradient (g, 0) ∈ ∂h( x̂, ŷ)

Proof: for all x, y,

h(x, y) ≥ h(x̂, ŷ) + gT (x − x̂) + 0T (y − ŷ)
        = f (x̂) + gT (x − x̂)

therefore
f (x) = inf_y h(x, y) ≥ f (x̂) + gT (x − x̂)

Subgradients 2.19
Exercise: Euclidean distance to convex set

Problem: explain how to find a subgradient of

f (x) = inf_{y ∈ C} ‖x − y‖2

where C is a closed convex set

Solution: to find a subgradient at x̂ ,

• if f ( x̂) = 0 (that is, x̂ ∈ C), take g = 0


• if f ( x̂) > 0, find projection ŷ = P( x̂) on C and take

g = (x̂ − ŷ)/‖ŷ − x̂‖2 = (x̂ − P(x̂))/‖x̂ − P(x̂)‖2

Subgradients 2.20
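a sketch of this recipe with C taken to be the Euclidean unit ball, so that the projection has a closed form (the choice of C and the helper names are mine, not from the slides)

```python
import numpy as np

def proj_unit_ball(x):
    """Euclidean projection onto C = {y : ||y||_2 <= 1}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

def subgrad_dist(x, proj, tol=1e-12):
    """One subgradient of f(x) = inf_{y in C} ||x - y||_2, given a projector onto C."""
    p = proj(x)
    d = np.linalg.norm(x - p)
    return np.zeros_like(x) if d <= tol else (x - p) / d   # g = 0 when x is in C

x = np.array([3.0, 4.0])
print(subgrad_dist(x, proj_unit_ball))   # [0.6 0.8]
```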
Composition

f (x) = h( f1(x), . . . , fk (x)), h convex and nondecreasing, fi convex

Weak result: to find a subgradient at x̂ ,

• find z ∈ ∂h( f1( x̂), . . . , fk ( x̂)) and gi ∈ ∂ fi ( x̂)


• then g = z1 g1 + · · · + z k gk ∈ ∂ f ( x̂)

reduces to standard formula for differentiable h, fi

Proof:

f (x) ≥ h( f1(x̂) + g1T (x − x̂), . . . , fk (x̂) + gkT (x − x̂) )
      ≥ h( f1(x̂), . . . , fk (x̂) ) + zT ( g1T (x − x̂), . . . , gkT (x − x̂) )
      = h( f1(x̂), . . . , fk (x̂) ) + (z1 g1 + · · · + zk gk )T (x − x̂)
      = f (x̂) + gT (x − x̂)

Subgradients 2.21
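a sketch of the weak composition rule for the (hypothetical) example f (x) = log(1 + exp(‖x‖1)), i.e. the nondecreasing convex function h(u) = log(1 + e^u) composed with f1(x) = ‖x‖1

```python
import numpy as np

def subgrad_softplus_of_l1(x):
    """One subgradient of f(x) = log(1 + exp(||x||_1)) via the composition rule."""
    u = np.sum(np.abs(x))
    z = 1.0 / (1.0 + np.exp(-u))   # h'(u) for h(u) = log(1 + e^u); h is nondecreasing
    g1 = np.sign(x)                # a subgradient of f1(x) = ||x||_1 at x
    return z * g1                  # g = z1 * g1 (only one inner function here)

print(subgrad_softplus_of_l1(np.array([1.0, -2.0, 0.0])))
```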
Optimal value function

define f (u, v) as the optimal value of convex problem

minimize f0(x)
subject to fi (x) ≤ ui, i = 1, . . . , m
Ax = b + v

(functions fi are convex; optimization variable is x )

Weak result: suppose f (û, v̂) is finite and strong duality holds with the dual

maximize   inf_x ( f0(x) + Σ_i λi ( fi (x) − ûi ) + νT (Ax − b − v̂) )
subject to λ ⪰ 0

if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂ ) then (−λ̂, −ν̂) ∈ ∂ f (û, v̂)

Subgradients 2.22
Proof: by weak duality for problem with right-hand sides u, v

f (u, v) ≥ inf_x ( f0(x) + Σ_i λ̂i ( fi (x) − ui ) + ν̂T (Ax − b − v) )

         = inf_x ( f0(x) + Σ_i λ̂i ( fi (x) − ûi ) + ν̂T (Ax − b − v̂) ) − λ̂T (u − û) − ν̂T (v − v̂)

         = f (û, v̂) − λ̂T (u − û) − ν̂T (v − v̂)

Subgradients 2.23
Expectation

f (x) = E h(x, u) u random, h convex in x for every u

Weak result: to find a subgradient at x̂ ,

• choose a function u ↦ g(u) with g(u) ∈ ∂x h(x̂, u)


• then, g = Eu g(u) ∈ ∂ f ( x̂)

Proof: by convexity of h and definition of g(u),

f (x) = E h(x, u)
      ≥ E [ h(x̂, u) + g(u)T (x − x̂) ]
      = f (x̂) + gT (x − x̂)

Subgradients 2.24
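a Monte Carlo sketch under an assumed setup (not from the slides): h(x, u) = |x − u| with u standard normal, so g(u) = sign(x − u) and the sample mean of g(u) approximates a subgradient of f (x) = E |x − u|

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_subgrad_expectation(x, n_samples=100_000):
    """Monte Carlo estimate of g = E_u g(u), with g(u) = sign(x - u), a subgradient of E|x - u|."""
    u = rng.standard_normal(n_samples)
    return np.mean(np.sign(x - u))

print(approx_subgrad_expectation(0.0))  # close to 0 (x = 0 minimizes E|x - u| for symmetric u)
print(approx_subgrad_expectation(1.0))  # close to 2*Phi(1) - 1, about 0.68
```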
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative
Optimality conditions — unconstrained

x⋆ minimizes f (x) if and only if

0 ∈ ∂ f (x⋆)

this follows directly from the definition of subgradient:

f (y) ≥ f (x⋆) + 0T (y − x⋆) for all y ⇐⇒ 0 ∈ ∂ f (x⋆)

Subgradients 2.25
Example: piecewise-linear minimization

f (x) = max_{i=1,...,m} (aiT x + bi )

Optimality condition

0 ∈ conv {ai | i ∈ I(x⋆)} where I(x) = {i | aiT x + bi = f (x)}

• in other words, x⋆ is optimal if and only if there is a λ with

λ ⪰ 0,   1T λ = 1,   Σ_{i=1}^m λi ai = 0,   λi = 0 for i ∉ I(x⋆)

• these are the optimality conditions for the equivalent linear program and its dual

minimize   t                        maximize   bT λ
subject to Ax + b ⪯ t1              subject to AT λ = 0
                                               λ ⪰ 0,  1T λ = 1

Subgradients 2.26
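a sketch of the equivalent LP solved with scipy.optimize.linprog (the example data is mine): variables (x, t), objective t, constraints Ax + b ⪯ t1

```python
import numpy as np
from scipy.optimize import linprog

def minimize_piecewise_linear(A, b):
    """Minimize max_i (a_i^T x + b_i) via the LP: min t subject to A x + b <= t 1."""
    m, n = A.shape
    c = np.r_[np.zeros(n), 1.0]                 # objective: minimize t
    A_ub = np.c_[A, -np.ones(m)]                # A x - t 1 <= -b
    b_ub = -b
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[n]

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([0.0, 1.0, 0.5, 0.0])
x_opt, t_opt = minimize_piecewise_linear(A, b)
print(x_opt, t_opt)   # optimal value t_opt = 0.5 for this data
```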
Optimality conditions — constrained

minimize f0(x)
subject to fi (x) ≤ 0, i = 1, . . . , m

assume dom fi = Rn, so functions fi are subdifferentiable everywhere

Karush–Kuhn–Tucker conditions

if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if
1. x⋆ is primal feasible

2. λ⋆ ⪰ 0

3. λi⋆ fi (x⋆) = 0 for i = 1, . . . , m

4. x⋆ is a minimizer of L(x, λ⋆) = f0(x) + Σ_{i=1}^m λi⋆ fi (x):

0 ∈ ∂ f0(x⋆) + Σ_{i=1}^m λi⋆ ∂ fi (x⋆)

Subgradients 2.27
Outline

• definition

• subgradient calculus

• duality and optimality conditions

• directional derivative
Directional derivative

Definition (for general f ): the directional derivative of f at x in the direction y is

f ′(x; y) = lim_{α ↘ 0} ( f (x + αy) − f (x) ) / α
          = lim_{t → ∞} t ( f (x + (1/t) y) − f (x) )

(if the limit exists)

• f ′(x; y) is the right derivative of g(α) = f (x + αy) at α = 0

• f ′(x; y) is homogeneous in y :

f ′(x; λy) = λ f ′(x; y) for λ ≥ 0

Subgradients 2.28
Directional derivative of a convex function

Equivalent definition (for convex f ): replace lim with inf

f ′(x; y) = inf_{α > 0} ( f (x + αy) − f (x) ) / α
          = inf_{t > 0} t ( f (x + (1/t) y) − f (x) )

Proof

• the function h(y) = f (x + y) − f (x) is convex in y , with h(0) = 0

• its perspective th(y/t) is nonincreasing in t (ECE236B ex. A2.5); hence

f ′(x; y) = lim_{t → ∞} t h(y/t) = inf_{t > 0} t h(y/t)

Subgradients 2.29
Properties

consequences of the expressions (for convex f )

f ′(x; y) = inf_{α > 0} ( f (x + αy) − f (x) ) / α
          = inf_{t > 0} t ( f (x + (1/t) y) − f (x) )

• f ′(x; y) is convex in y (partial minimization of a convex function in y, t )

• f ′(x; y) defines a lower bound on f in the direction y :

f (x + αy) ≥ f (x) + α f ′(x; y) for all α ≥ 0

Subgradients 2.30
Directional derivative and subgradients

for convex f and x ∈ int dom f

f ′(x; y) = sup_{g ∈ ∂ f (x)} gT y

f ′(x; y) is support function of ∂ f (x)

(figure: the set ∂ f (x), the direction y, and the maximizing subgradient, at which gT y = f ′(x; y))

• generalizes f ′(x; y) = ∇ f (x)T y for differentiable functions

• implies that f ′(x; y) exists for all x ∈ int dom f , all y (see page 2.4)

Subgradients 2.31
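a quick numerical check of this identity for the (hypothetical) case f = ‖·‖1 at x = 0: there ∂ f (0) = [−1, 1]^n, so the support function of ∂ f (0) at y is ‖y‖1, which matches the difference-quotient definition of f ′(0; y)

```python
import numpy as np

def dir_deriv_l1_at_zero(y, alpha=1e-8):
    """Difference quotient (f(0 + alpha*y) - f(0)) / alpha for f = ||.||_1 at x = 0."""
    return np.sum(np.abs(alpha * y)) / alpha

y = np.array([2.0, -3.0, 0.5])
print(dir_deriv_l1_at_zero(y))      # 5.5
print(np.sum(np.abs(y)))            # support function of [-1, 1]^3 at y: ||y||_1 = 5.5
```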
Proof: if g ∈ ∂ f (x) then from page 2.29

f ′(x; y) ≥ inf_{α > 0} ( f (x) + α gT y − f (x) ) / α = gT y

it remains to show that f ′(x; y) = ĝT y for at least one ĝ ∈ ∂ f (x)

• f ′(x; y) is convex in y with domain Rn, hence subdifferentiable at all y

• let ĝ be a subgradient of f ′(x; y) at y : then for all v, λ ≥ 0,

λ f ′(x; v) = f ′(x; λv) ≥ f ′(x; y) + ĝT (λv − y)

• taking λ → ∞ shows that f ′(x; v) ≥ ĝT v ; from the lower bound on page 2.30,

f (x + v) ≥ f (x) + f ′(x; v) ≥ f (x) + ĝT v for all v

hence ĝ ∈ ∂ f (x)

• taking λ = 0 we see that f ′(x; y) ≤ ĝT y

Subgradients 2.32
Descent directions and subgradients

y is a descent direction of f at x if f ′(x; y) < 0

• the negative gradient of a differentiable f is a descent direction (if ∇ f (x) ≠ 0)


• negative subgradient is not always a descent direction

Example: f (x1, x2) = |x1 | + 2|x2 |


(figure: level curves of f in the (x1, x2)-plane, with the subgradient g = (1, 2) drawn at the point (1, 0))

g = (1, 2) ∈ ∂ f (1, 0), but y = (−1, −2) is not a descent direction at (1, 0):
f ((1, 0) + α(−1, −2)) = |1 − α| + 4α = 1 + 3α > f (1, 0) for 0 < α < 1

Subgradients 2.33
Steepest descent direction

Definition: (normalized) steepest descent direction at x ∈ int dom f is

∆xnsd = argmin_{‖y‖2 ≤ 1} f ′(x; y)

∆xnsd is the primal solution y of the pair of dual problems (BV §8.1.3)

minimize (over y)  f ′(x; y)          maximize (over g)  −‖g‖2
subject to ‖y‖2 ≤ 1                   subject to g ∈ ∂ f (x)

• dual optimal g⋆ is subgradient with least norm

• f ′(x; ∆xnsd) = −‖g⋆‖2

• if 0 ∉ ∂ f (x), ∆xnsd = −g⋆/‖g⋆‖2

• ∆xnsd can be expensive to compute

(figure: the set ∂ f (x), its least-norm element g⋆, and the direction ∆xnsd, which satisfies g⋆T ∆xnsd = f ′(x; ∆xnsd))
Subgradients 2.34
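for the example on page 2.33 this can be worked out by hand: ∂ f (1, 0) = {1} × [−2, 2], the least-norm subgradient is g⋆ = (1, 0), and ∆xnsd = (−1, 0); the sketch below (my own, using the box shape of this particular subdifferential) finds g⋆ by projecting 0 onto ∂ f (1, 0)

```python
import numpy as np

# subdifferential of f(x1, x2) = |x1| + 2|x2| at (1, 0) is the box {1} x [-2, 2]
lower = np.array([1.0, -2.0])
upper = np.array([1.0,  2.0])

g_star = np.clip(np.zeros(2), lower, upper)   # least-norm element: project 0 onto the box
dx_nsd = -g_star / np.linalg.norm(g_star)     # normalized steepest descent direction

print(g_star)   # [1. 0.]
print(dx_nsd)   # [-1.  0.]  -- a genuine descent direction, unlike -(1, 2)
```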
Subgradients and distance to sublevel sets

if f is convex, f (y) < f (x), g ∈ ∂ f (x), then for small t > 0,

‖x − tg − y‖2² = ‖x − y‖2² − 2t gT (x − y) + t² ‖g‖2²
              ≤ ‖x − y‖2² − 2t ( f (x) − f (y)) + t² ‖g‖2²
              < ‖x − y‖2²

• −g is descent direction for ‖x − y‖2, for any y with f (y) < f (x)

• in particular, −g is descent direction for distance to any minimizer of f

Subgradients 2.35
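this inequality is what drives the basic subgradient method; a minimal sketch (the step sizes and the example problem are mine, not from the slides) minimizing f (x) = ‖x − c‖1 with steps x ← x − tk g, tk = 1/k

```python
import numpy as np

c = np.array([1.0, -2.0, 3.0])          # minimizer of f(x) = ||x - c||_1

def f(x):
    return np.sum(np.abs(x - c))

def subgrad(x):
    return np.sign(x - c)               # one subgradient of f at x

x = np.zeros(3)
f_best = f(x)
for k in range(1, 2001):
    g = subgrad(x)
    x = x - (1.0 / k) * g               # diminishing step sizes
    f_best = min(f_best, f(x))          # f itself need not decrease monotonically

print(f_best)                           # close to 0
```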
References

• A. Beck, First-Order Methods in Optimization (2017), chapter 3.

• D. P. Bertsekas, A. Nedić, A. E. Ozdaglar, Convex Analysis and Optimization (2003), chapter 4.

• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms (1993), chapter VI.

• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.1.

• B. T. Polyak, Introduction to Optimization (1987), section 5.1.

Subgradients 2.36
