Subgradients: Subgradient Calculus, Duality and Optimality Conditions, Directional Derivative
2. Subgradients
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Subgradients 2.1
Basic inequality

recall the basic inequality for a differentiable convex function f:

f(y) ≥ f(x) + ∇f(x)^T (y − x) for all y ∈ dom f

• the first-order approximation of f at x is a global lower bound
• equivalently, the vector (∇f(x), −1) defines a supporting hyperplane to epi f at (x, f(x))

[figure: graph of f with the supporting hyperplane at (x, f(x)); normal vector (∇f(x), −1)]
Subgradients 2.2
Subgradient

g is a subgradient of f at x ∈ dom f if

f(y) ≥ f(x) + g^T (y − x) for all y ∈ dom f

[figure: graph of f(y) with affine global underestimators constructed from subgradients at two points x1, x2]
Subgradients 2.3
Subdifferential

the set of all subgradients of f at x is called the subdifferential of f at x, written ∂f(x)

Properties

• ∂f(x) is a closed convex set (an intersection of halfspaces, one for each y ∈ dom f)
• if f is convex, ∂f(x) is nonempty and bounded for x ∈ int dom f (proofs on the next two pages)
Subgradients 2.4
Proof: we show that ∂f(x) is nonempty when x ∈ int dom f

• (x, f(x)) is a boundary point of the convex set epi f, so there is a supporting hyperplane: a nonzero (a, b) with

a^T (y − x) + b (t − f(x)) ≤ 0 for all (y, t) ∈ epi f

• b ≤ 0, because t can be arbitrarily large; b ≠ 0, because for b = 0 the inequality a^T (y − x) ≤ 0 would fail for some y near x ∈ int dom f
• therefore b < 0 and g = (1/|b|) a is a subgradient of f at x
Subgradients 2.5
Proof: ∂f(x) is bounded when x ∈ int dom f

• choose r > 0 small enough that B = {x ± r e_k | k = 1, . . . , n} ⊂ dom f, and define M = max_{y ∈ B} f(y)
• for any g ∈ ∂f(x), take y = x + r sign(g_k) e_k, where k is an index with |g_k| = ‖g‖∞; then

r ‖g‖∞ = g^T (y − x) ≤ f(y) − f(x) ≤ M − f(x)

• hence ‖g‖∞ ≤ (M − f(x)) / r for all g ∈ ∂f(x)
Subgradients 2.6
Example

f(x) = max { f1(x), f2(x) }, with f1, f2 convex and differentiable

[figure: graphs of f1(y), f2(y), and their pointwise maximum f(y)]

• if f1(x̂) > f2(x̂), subdifferential at x̂ is {∇f1(x̂)}
• if f1(x̂) < f2(x̂), subdifferential at x̂ is {∇f2(x̂)}
• if f1(x̂) = f2(x̂), subdifferential at x̂ is the line segment [∇f1(x̂), ∇f2(x̂)]
Subgradients 2.7
Examples

absolute value f(x) = |x|:

∂f(x) = {1} if x > 0, ∂f(x) = {−1} if x < 0, ∂f(x) = [−1, 1] if x = 0

[figure: graphs of f(x) = |x| and of the set-valued mapping ∂f(x)]

Euclidean norm f(x) = ‖x‖2:

∂f(x) = { x / ‖x‖2 } if x ≠ 0, ∂f(x) = { g | ‖g‖2 ≤ 1 } if x = 0
Subgradients 2.8
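These formulas are easy to check numerically; the sketch below (Python/NumPy, with arbitrary test points) verifies the subgradient inequality f(y) ≥ f(x) + g^T (y − x) for the Euclidean-norm subgradient:

```python
import numpy as np

def subgrad_norm2(x, tol=1e-12):
    """One subgradient of f(x) = ||x||_2: x/||x||_2 if x != 0, else 0
    (at x = 0 any g with ||g||_2 <= 1 would do)."""
    n = np.linalg.norm(x)
    return x / n if n > tol else np.zeros_like(x)

# verify f(y) >= f(x) + g^T (y - x) at random test points
rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])
g = subgrad_norm2(x)               # (0.6, 0.8)
for _ in range(100):
    y = rng.standard_normal(2)
    assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-9
```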
Monotonicity

the subdifferential of a convex function is a monotone operator:

(u − v)^T (x − y) ≥ 0 for all x, y, u ∈ ∂f(x), v ∈ ∂f(y)

Proof: by definition

f(y) ≥ f(x) + u^T (y − x),   f(x) ≥ f(y) + v^T (x − y)

adding the two inequalities shows monotonicity
Subgradients 2.9
Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at x = 0

• f : R → R, dom f = R+, f(x) = 1 for x = 0, f(x) = 0 for x > 0
• f : R → R, dom f = R+, f(x) = −√x

the only supporting hyperplane to epi f at (0, f(0)) is vertical
Subgradients 2.10
Subgradients and sublevel sets
if g is a subgradient of f at x, then

f(y) ≤ f(x) =⇒ g^T (y − x) ≤ 0

hence a nonzero subgradient at x defines a supporting hyperplane to the sublevel set {y | f(y) ≤ f(x)} at x

[figure: sublevel set {y | f(y) ≤ f(x)} with the supporting hyperplane through x with normal g]
Subgradients 2.11
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Subgradient calculus
Subgradients 2.12
Basic rules

nonnegative scaling: if f = αh with α ≥ 0, then ∂f(x) = α ∂h(x)

sum: if f(x) = f1(x) + · · · + fm(x), with each fi convex, then

∂f(x) = ∂f1(x) + · · · + ∂fm(x)

(a set sum; equality requires a condition such as dom fi = R^n)

affine transformation of variables: if f(x) = h(Ax + b), then

∂f(x) = A^T ∂h(Ax + b)
Subgradients 2.13
Pointwise maximum

f(x) = max { f1(x), . . . , fm(x) }

Weak result: to compute a subgradient at x, choose any k such that fk(x) = f(x), and any g ∈ ∂fk(x)

Strong result: with I(x) = {i | fi(x) = f(x)} the set of active functions at x,

∂f(x) = conv ⋃_{i ∈ I(x)} ∂fi(x)
Subgradients 2.14
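The weak result is what subgradient methods use in practice: evaluate the fi, pick any active one, return its gradient (or, per the strong result, any convex combination of active gradients). A small sketch (Python/NumPy; the two quadratics are made-up example data):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) for two differentiable convex quadratics;
# any convex combination of gradients of *active* functions is a subgradient
fs = [lambda x: x[0]**2 + x[1]**2,           # f1
      lambda x: (x[0] - 2)**2 + x[1]**2]     # f2
grads = [lambda x: np.array([2*x[0], 2*x[1]]),
         lambda x: np.array([2*(x[0] - 2), 2*x[1]])]

def subgrad_max(x, tol=1e-9):
    vals = np.array([f(x) for f in fs])
    active = np.flatnonzero(vals >= vals.max() - tol)
    gs = np.array([grads[i](x) for i in active])
    return gs.mean(axis=0)      # a convex combination of active gradients

x = np.array([1.0, 0.5])        # f1(x) = f2(x) = 1.25: both active
g = subgrad_max(x)              # average of (2, 1) and (-2, 1) -> (0, 1)

# check the subgradient inequality at random test points
rng = np.random.default_rng(1)
fx = max(f(x) for f in fs)
for _ in range(200):
    y = x + rng.standard_normal(2)
    assert max(f(y) for f in fs) >= fx + g @ (y - x) - 1e-9
```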
Example: piecewise-linear function

f(x) = max_{i=1,...,m} (ai^T x + bi)

[figure: graph of f(x), the upper envelope of the affine functions ai^T x + bi]

the subdifferential at x is a polyhedron

∂f(x) = conv { ai | i ∈ I(x) },   I(x) = { i | ai^T x + bi = f(x) }

Subgradients 2.15

Example: ℓ1-norm

f(x) = ‖x‖1 = |x1| + · · · + |xn|

the subdifferential is a product of intervals

∂f(x) = J1 × · · · × Jn,   Jk = [−1, 1] if xk = 0, Jk = {1} if xk > 0, Jk = {−1} if xk < 0

[figure: ∂f(x) for n = 2, e.g. ∂f(0, 0) is the square with corners (±1, ±1), ∂f(1, 0) = {1} × [−1, 1], ∂f(1, 1) = {(1, 1)}]

Subgradients 2.16
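For the ℓ1-norm, this product structure makes a subgradient one sign computation away; a minimal sketch (Python/NumPy; np.sign conveniently picks 0 ∈ [−1, 1] at the kinks):

```python
import numpy as np

# f(x) = ||x||_1: since partial f(x) = J_1 x ... x J_n with
# J_k = {sign(x_k)} for x_k != 0 and J_k = [-1, 1] for x_k = 0,
# g = sign(x) (componentwise) is always a valid subgradient
def subgrad_l1(x):
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
g = subgrad_l1(x)               # (1, 0, -1)

# check f(y) >= f(x) + g^T (y - x) at random test points
rng = np.random.default_rng(2)
for _ in range(200):
    y = rng.standard_normal(3)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-9
```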
Pointwise supremum

f(x) = sup_{α ∈ A} fα(x)

Weak result: with I(x) = { α ∈ A | fα(x) = f(x) } the active set at x,

conv ⋃_{α ∈ I(x)} ∂fα(x) ⊆ ∂f(x)

(equality requires additional assumptions, for example, compactness of A and continuity in α)
Subgradients 2.17
Exercise: maximum eigenvalue

f(x) = λmax(A(x)), where A(x) = A0 + x1 A1 + · · · + xn An with symmetric matrices Ai

exercise: show that if y is a unit-norm eigenvector of A(x̂) for the eigenvalue λmax(A(x̂)), then

(y^T A1 y, . . . , y^T An y) ∈ ∂f(x̂)

(hint: f(x) = sup_{‖y‖2 = 1} y^T A(x) y is a pointwise supremum of functions affine in x)
Subgradients 2.18
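The exercise can be checked numerically: build random symmetric Ai (assumed example data), take a top eigenvector at x̂, and test the subgradient inequality at random points:

```python
import numpy as np

# f(x) = lambda_max(A0 + x1*A1 + x2*A2) for random symmetric A0, A1, A2
rng = np.random.default_rng(6)
def sym(m):
    return (m + m.T) / 2
A0, A1, A2 = (sym(rng.standard_normal((4, 4))) for _ in range(3))

def f(x):
    # eigvalsh returns eigenvalues in ascending order; take the largest
    return np.linalg.eigvalsh(A0 + x[0]*A1 + x[1]*A2)[-1]

xhat = np.array([0.3, -0.7])
w, V = np.linalg.eigh(A0 + xhat[0]*A1 + xhat[1]*A2)
y = V[:, -1]                     # unit eigenvector for lambda_max
g = np.array([y @ A1 @ y, y @ A2 @ y])

# f(x) >= y^T A(x) y = f(xhat) + g^T (x - xhat): subgradient inequality
for _ in range(200):
    x = xhat + rng.standard_normal(2)
    assert f(x) >= f(xhat) + g @ (x - xhat) - 1e-9
```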
Minimization

f(x) = inf_y h(x, y), with h jointly convex in (x, y)

Weak result: to compute a subgradient at x̂, find ŷ that minimizes h(x̂, y), and g such that (g, 0) ∈ ∂h(x̂, ŷ)

Proof: for all x, y,

h(x, y) ≥ h(x̂, ŷ) + g^T (x − x̂) + 0^T (y − ŷ) = f(x̂) + g^T (x − x̂)

therefore

f(x) = inf_y h(x, y) ≥ f(x̂) + g^T (x − x̂)
Subgradients 2.19
Exercise: Euclidean distance to convex set

f(x) = dist(x, C) = inf_{y ∈ C} ‖x − y‖2, with C closed and convex

exercise: show that if x̂ ∉ C and ŷ = P(x̂) is the Euclidean projection of x̂ on C, then

g = (1/‖ŷ − x̂‖2) (x̂ − ŷ) = (1/‖x̂ − P(x̂)‖2) (x̂ − P(x̂))

is a subgradient of f at x̂
Subgradients 2.20
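A sketch of this result for a set with a closed-form projection (here C is the Euclidean unit ball, an assumption made so that P(x) = x / max(‖x‖2, 1) is explicit):

```python
import numpy as np

# C = unit Euclidean ball: dist(x, C) = max(||x||_2 - 1, 0),
# P(x) = x / max(||x||_2, 1); for x outside C a subgradient of dist
# at x is (x - P(x)) / ||x - P(x)||_2
def proj_ball(x):
    return x / max(np.linalg.norm(x), 1.0)

def dist_ball(x):
    return max(np.linalg.norm(x) - 1.0, 0.0)

x = np.array([3.0, 4.0])                 # outside the ball, ||x||_2 = 5
p = proj_ball(x)                         # (0.6, 0.8)
g = (x - p) / np.linalg.norm(x - p)      # points from p toward x

# check the subgradient inequality at random test points
rng = np.random.default_rng(3)
for _ in range(200):
    y = x + rng.standard_normal(2)
    assert dist_ball(y) >= dist_ball(x) + g @ (y - x) - 1e-9
```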
Composition

f(x) = h( f1(x), . . . , fk(x) ), with h convex and nondecreasing, each fi convex

Weak result: to compute a subgradient at x̂, find z ∈ ∂h( f1(x̂), . . . , fk(x̂) ) and gi ∈ ∂fi(x̂), and take g = z1 g1 + · · · + zk gk

Proof:

f(x) ≥ h( f1(x̂) + g1^T (x − x̂), . . . , fk(x̂) + gk^T (x − x̂) )
     ≥ h( f1(x̂), . . . , fk(x̂) ) + z^T ( g1^T (x − x̂), . . . , gk^T (x − x̂) )
     = f(x̂) + g^T (x − x̂)

(the first inequality uses fi(x) ≥ fi(x̂) + gi^T (x − x̂) and monotonicity of h; the second uses z ∈ ∂h)
Subgradients 2.21
Optimal value function

define f(u, v) as the optimal value of the convex problem

minimize f0(x)
subject to fi(x) ≤ ui, i = 1, . . . , m
Ax = b + v

(a convex function of the right-hand sides u, v)

Weak result: suppose f(û, v̂) is finite and strong duality holds with the dual

maximize inf_x ( f0(x) + Σ_i λi ( fi(x) − ûi ) + ν^T (Ax − b − v̂) )
subject to λ ⪰ 0

if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂) then (−λ̂, −ν̂) ∈ ∂f(û, v̂)
Subgradients 2.22
Proof: by weak duality for the problem with right-hand sides u, v,

f(u, v) ≥ inf_x ( f0(x) + Σ_i λ̂i ( fi(x) − ui ) + ν̂^T (Ax − b − v) )

        = inf_x ( f0(x) + Σ_i λ̂i ( fi(x) − ûi ) + ν̂^T (Ax − b − v̂) ) − λ̂^T (u − û) − ν̂^T (v − v̂)

        = f(û, v̂) − λ̂^T (u − û) − ν̂^T (v − v̂)

(the last step uses strong duality at û, v̂); this is the subgradient inequality for (−λ̂, −ν̂) at (û, v̂)
Subgradients 2.23
Expectation

f(x) = E h(x, u), with u random and h convex in x for every u

Weak result: to compute a subgradient at x̂, choose g(u) ∈ ∂x h(x̂, u) for every u, and take g = E g(u); then

f(x) = E h(x, u)
     ≥ E ( h(x̂, u) + g(u)^T (x − x̂) )
     = f(x̂) + g^T (x − x̂)
Subgradients 2.24
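As a concrete instance (assumed example data: h(x, u) = |x − u| with u uniform on [0, 1]), a subgradient of h(·, u) at x is sign(x − u), and its expectation E sign(x − u) = 2x − 1 for 0 < x < 1 is a subgradient of f(x) = E |x − u|; a Monte Carlo sketch:

```python
import numpy as np

# f(x) = E |x - u|, u ~ Uniform[0, 1]: the averaged per-sample subgradient
# sign(x - u) estimates E sign(x - u) = P(u < x) - P(u > x) = 2x - 1
rng = np.random.default_rng(4)
u = rng.uniform(size=200_000)
x = 0.3
g_mc = np.mean(np.sign(x - u))   # Monte Carlo estimate, should be near -0.4
print(g_mc, 2*x - 1)
```

This averaging is exactly what stochastic subgradient methods exploit: a single sample sign(x − u) is already an unbiased estimate of a subgradient of f.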
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Optimality conditions — unconstrained

x⋆ minimizes f(x) if and only if

0 ∈ ∂f(x⋆)

this follows directly from the definition of subgradient:

f(y) ≥ f(x⋆) + 0^T (y − x⋆) for all y ⟺ 0 ∈ ∂f(x⋆)
Subgradients 2.25
Example: piecewise-linear minimization

minimize f(x) = max_{i=1,...,m} (ai^T x + bi)

Optimality condition

0 ∈ conv { ai | i ∈ I(x⋆) },   I(x) = { i | ai^T x + bi = f(x) }

in other words, x⋆ is optimal if and only if there exists a λ with

λ ⪰ 0,   1^T λ = 1,   Σ_{i=1}^m λi ai = 0,   λi = 0 for i ∉ I(x⋆)

• these are the optimality conditions for the equivalent linear program

minimize t                     maximize b^T λ
subject to Ax + b ⪯ t1         subject to A^T λ = 0
                               λ ⪰ 0, 1^T λ = 1
Subgradients 2.26
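The λ-certificate is easy to exhibit on a small instance (assumed example data: three affine pieces in one variable):

```python
import numpy as np

# f(x) = max(x - 1, -2x + 2, 0.5x); x* is optimal iff some lambda >= 0
# with sum(lambda) = 1 and sum_i lambda_i a_i = 0 is supported on active pieces
a = np.array([1.0, -2.0, 0.5])
b = np.array([-1.0, 2.0, 0.0])

def f(x):
    return np.max(a * x + b)

# at x* = 0.8 the piece values are (-0.2, 0.4, 0.4): pieces 2, 3 active;
# lambda = (0, 0.2, 0.8) satisfies the optimality conditions:
lam = np.array([0.0, 0.2, 0.8])
assert np.all(lam >= 0) and abs(lam.sum() - 1) < 1e-12
assert abs(lam @ a) < 1e-12          # 0.2*(-2) + 0.8*0.5 = 0

# hence x* = 0.8 minimizes f: compare against a fine grid
xs = np.linspace(-5, 5, 10001)
assert all(f(0.8) <= f(x) + 1e-9 for x in xs)
```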
Optimality conditions — constrained
minimize f0(x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Karush–Kuhn–Tucker conditions

if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if

1. x⋆ is primal feasible
2. λ⋆ ⪰ 0
3. λi⋆ fi(x⋆) = 0 for i = 1, . . . , m (complementary slackness)
4. x⋆ minimizes the Lagrangian over x:

0 ∈ ∂f0(x⋆) + Σ_{i=1}^m λi⋆ ∂fi(x⋆)
Subgradients 2.27
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Directional derivative

the directional derivative of f at x in the direction y is

f′(x; y) = lim_{α↘0} ( f(x + αy) − f(x) ) / α

         = lim_{t→∞} ( t f(x + (1/t) y) − t f(x) )

(if the limit exists)

• f′(x; y) is positively homogeneous in y: f′(x; λy) = λ f′(x; y) for λ ≥ 0
Subgradients 2.28
Directional derivative of a convex function

if f is convex, the limits in the definition are infima:

f′(x; y) = inf_{α>0} ( f(x + αy) − f(x) ) / α

         = inf_{t>0} ( t f(x + (1/t) y) − t f(x) )

in particular, the directional derivative of a convex function exists in every direction

Proof: by convexity, the difference quotient ( f(x + αy) − f(x) ) / α is nondecreasing in α > 0, so its limit as α ↘ 0 equals its infimum over α > 0
Subgradients 2.29
Properties

from the expression

f′(x; y) = inf_{α>0} ( f(x + αy) − f(x) ) / α

• f′(x; y) is convex and positively homogeneous in y
• f′(x; y) ≤ f(x + y) − f(x) (take α = 1)
• lower bound: if g ∈ ∂f(x), then f′(x; y) ≥ g^T y for all y
Subgradients 2.30
Directional derivative and subgradients

if f is convex and x ∈ int dom f, then

f′(x; y) = sup_{g ∈ ∂f(x)} g^T y

i.e., f′(x; y) is the support function of ∂f(x)

[figure: the set ∂f(x) and a direction y; the supremum is attained at a boundary point ĝ of ∂f(x) with outward normal y, where f′(x; y) = ĝ^T y]
Subgradients 2.31
Proof: if g ∈ ∂f(x) then from page 2.29

f′(x; y) = inf_{α>0} ( f(x + αy) − f(x) ) / α ≥ g^T y

so f′(x; y) ≥ sup_{g ∈ ∂f(x)} g^T y

conversely, fix y and choose ĝ ∈ ∂v f′(x; v) at v = y (a subgradient of the convex function v ↦ f′(x; v)); then for all v and λ ≥ 0, by homogeneity,

λ f′(x; v) = f′(x; λv) ≥ f′(x; y) + ĝ^T (λv − y)

• taking λ → ∞ shows that f′(x; v) ≥ ĝ^T v; from the lower bound on page 2.30,

f(x + v) ≥ f(x) + f′(x; v) ≥ f(x) + ĝ^T v for all v

hence ĝ ∈ ∂f(x); taking λ = 0 above gives f′(x; y) ≤ ĝ^T y, which completes the proof
Subgradients 2.32
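The support-function identity can be sanity-checked against the limit definition; for f(x) = |x| at x = 0, ∂f(0) = [−1, 1] gives f′(0; y) = sup_{|g|≤1} g y = |y| (plain-Python sketch):

```python
# numerical directional derivative via the limit definition,
# compared with the support function of the subdifferential
def dir_deriv(f, x, y, alpha=1e-8):
    return (f(x + alpha * y) - f(x)) / alpha

# for f = abs at x = 0: f'(0; y) should equal |y|
for y in [2.0, -3.0, 0.5]:
    assert abs(dir_deriv(abs, 0.0, y) - abs(y)) < 1e-6
```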
Descent directions and subgradients

the negative of a subgradient is not necessarily a descent direction

Example: f(x1, x2) = |x1| + 2|x2|

[figure: contour lines of f in the (x1, x2)-plane, with the point (1, 0) and the subgradient g = (1, 2) shown]

g = (1, 2) ∈ ∂f(1, 0), but y = (−1, −2) is not a descent direction at (1, 0)
Subgradients 2.33
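This example is easy to verify numerically (assuming, consistent with the figure data, f(x1, x2) = |x1| + 2|x2|):

```python
import numpy as np

# f(x1, x2) = |x1| + 2|x2| (assumed instance for the example above)
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])     # in {1} x [-2, 2], the subdifferential at x

# g is a valid subgradient: f(y) >= f(x) + g^T (y - x) at random points
rng = np.random.default_rng(5)
for _ in range(200):
    y = rng.standard_normal(2)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9

# ...yet moving along -g increases f: f(x - t g) = |1 - t| + 4t > 1 for t > 0
for t in [0.01, 0.1, 0.5]:
    assert f(x - t * g) > f(x)
```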
Steepest descent direction

the normalized steepest descent direction at x is

Δxnsd = argmin { f′(x; y) | ‖y‖2 ≤ 1 }

Δxnsd is the primal solution y of the pair of dual problems (BV §8.1.3); the dual computes the minimum-norm subgradient g⋆ ∈ ∂f(x), and Δxnsd = −g⋆ / ‖g⋆‖2
Subgradients 2.34
Subgradients and distance to sublevel sets

if f is convex, f(y) < f(x), and g ∈ ∂f(x), then

g^T (x − y) ≥ f(x) − f(y) > 0

so for small enough t > 0,

‖x − t g − y‖2² = ‖x − y‖2² − 2t g^T (x − y) + t² ‖g‖2² < ‖x − y‖2²

• −g is a descent direction for ‖x − y‖2, for any y with f(y) < f(x)
• in particular, −g is a descent direction for the distance to the minimizers of f
Subgradients 2.35
References

• S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, 2004), cited above as BV
Subgradients 2.36