06 SG Method

The document summarizes the subgradient method for minimizing convex functions that may not be differentiable. It begins by introducing subgradients as generalizations of gradients for nondifferentiable functions. It then provides examples of subgradients for various functions. The subgradient method is presented as an analog of gradient descent that replaces gradients with subgradients. Convergence analysis shows the subgradient method converges at a rate of O(1/sqrt(k)) for k iterations. The document also discusses applications of subgradients to problems like soft thresholding and finding the intersection of convex sets.


Subgradient method

Geoff Gordon & Ryan Tibshirani


Optimization 10-725 / 36-725

Remember gradient descent


We want to solve
    min_{x ∈ R^n} f(x),
for f convex and differentiable

Gradient descent: choose initial x^(0) ∈ R^n, repeat:
    x^(k) = x^(k-1) - t_k ∇f(x^(k-1)),   k = 1, 2, 3, ...

If ∇f Lipschitz, gradient descent has convergence rate O(1/k)


Downsides:
- Can be slow (we will see faster methods later)
- Doesn't work for nondifferentiable functions (today's topic)

Outline

Today:
- Subgradients
- Examples and properties
- Subgradient method
- Convergence rate

Subgradients
Remember that for convex and differentiable f : R^n → R,
    f(y) ≥ f(x) + ∇f(x)^T (y - x)   for all x, y
I.e., the linear approximation always underestimates f

A subgradient of convex f : R^n → R at x is any g ∈ R^n such that
    f(y) ≥ f(x) + g^T (y - x),   for all y

- Always exists
- If f is differentiable at x, then g = ∇f(x) uniquely
- Actually, the same definition works for nonconvex f (however, a
  subgradient need not exist)
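The defining inequality is easy to verify numerically. A small sketch (the function, helper name, and test points are my own illustration, not from the slides), checking candidate subgradients of f(x) = |x|:

```python
import numpy as np

# Numerical check of the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f(x) = |x|: at x = 0 any g in [-1, 1] works; at x != 0 only g = sign(x).
def is_subgradient(f, x, g, ys):
    # g is a subgradient at x iff the affine minorant sits below f everywhere
    return all(f(y) >= f(x) + g * (y - x) - 1e-12 for y in ys)

ys = np.linspace(-5.0, 5.0, 101)

assert is_subgradient(abs, 0.0, 0.3, ys)      # g = 0.3 in [-1, 1] works at x = 0
assert not is_subgradient(abs, 0.0, 1.5, ys)  # g = 1.5 outside [-1, 1] fails
assert is_subgradient(abs, 2.0, 1.0, ys)      # g = sign(2) = 1 works at x = 2
```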

Examples

[Figure: plot of f(x) = |x|]
Consider f : R → R, f(x) = |x|

- For x ≠ 0, unique subgradient g = sign(x)
- For x = 0, subgradient g is any element of [-1, 1]

Consider f : R^n → R, f(x) = ‖x‖ (Euclidean norm)

[Figure: surface plot of f(x) = ‖x‖ over (x_1, x_2)]

- For x ≠ 0, unique subgradient g = x/‖x‖
- For x = 0, subgradient g is any element of {z : ‖z‖ ≤ 1}

Consider f : R^n → R, f(x) = ‖x‖_1

[Figure: surface plot of f(x) = ‖x‖_1 over (x_1, x_2)]

- For x_i ≠ 0, the ith component g_i = sign(x_i) uniquely
- For x_i = 0, the ith component g_i is any element of [-1, 1]

[Figure: plot of two crossing functions and their pointwise maximum]

Let f_1, f_2 : R^n → R be convex and differentiable, and consider
    f(x) = max{f_1(x), f_2(x)}

- For f_1(x) > f_2(x), unique subgradient g = ∇f_1(x)
- For f_2(x) > f_1(x), unique subgradient g = ∇f_2(x)
- For f_1(x) = f_2(x), subgradient g is any point on the line
  segment between ∇f_1(x) and ∇f_2(x)

Subdifferential

The set of all subgradients of convex f is called the subdifferential:
    ∂f(x) = {g ∈ R^n : g is a subgradient of f at x}

- ∂f(x) is closed and convex (even for nonconvex f)
- Nonempty for convex f (can be empty for nonconvex f)
- If f is differentiable at x, then ∂f(x) = {∇f(x)}
- If ∂f(x) = {g}, then f is differentiable at x and ∇f(x) = g

Connection to convex geometry


Convex set C ⊆ R^n; consider the indicator function I_C : R^n → R ∪ {∞},
    I_C(x) = I{x ∈ C} = 0 if x ∈ C,  ∞ if x ∉ C

For x ∈ C, ∂I_C(x) = N_C(x), the normal cone of C at x:
    N_C(x) = {g ∈ R^n : g^T x ≥ g^T y for any y ∈ C}

Why? Recall the definition of a subgradient g:
    I_C(y) ≥ I_C(x) + g^T (y - x)   for all y
- For y ∉ C, I_C(y) = ∞, so the inequality holds trivially
- For y ∈ C, this means 0 ≥ g^T (y - x)

Subgradient calculus
Basic rules for convex functions:
- Scaling: ∂(af) = a · ∂f provided a > 0
- Addition: ∂(f_1 + f_2) = ∂f_1 + ∂f_2
- Affine composition: if g(x) = f(Ax + b), then
      ∂g(x) = A^T ∂f(Ax + b)
- Finite pointwise maximum: if f(x) = max_{i=1,...,m} f_i(x), then
      ∂f(x) = conv( ∪_{i : f_i(x) = f(x)} ∂f_i(x) ),
  the convex hull of the union of subdifferentials of all active
  functions at x

- General pointwise maximum: if f(x) = max_{s ∈ S} f_s(x), then
      ∂f(x) ⊇ cl conv( ∪_{s : f_s(x) = f(x)} ∂f_s(x) ),
  and under some regularity conditions (on S, f_s), we get equality

- Norms: important special case, f(x) = ‖x‖_p. Let q be such that
  1/p + 1/q = 1; then
      ∂f(x) = { y : ‖y‖_q ≤ 1 and y^T x = max_{‖z‖_q ≤ 1} z^T x }
  Why is this a special case? Note
      ‖x‖_p = max_{‖z‖_q ≤ 1} z^T x

Why subgradients?

Subgradients are important for two reasons:


- Convex analysis: optimality characterization via subgradients,
  monotonicity, relationship to duality
- Convex optimization: if you can compute subgradients, then you
  can minimize (almost) any convex function

Optimality condition
For convex f,
    f(x*) = min_{x ∈ R^n} f(x)   ⟺   0 ∈ ∂f(x*)

I.e., x* is a minimizer if and only if 0 is a subgradient of f at x*

Why? Easy: g = 0 being a subgradient means that for all y,
    f(y) ≥ f(x*) + 0^T (y - x*) = f(x*)

Note the analogy to the differentiable case, where ∂f(x) = {∇f(x)}

Soft-thresholding
The lasso problem can be parametrized as
    min_x  (1/2)‖y - Ax‖^2 + λ‖x‖_1
where λ ≥ 0. Consider the simplified problem with A = I:
    min_x  (1/2)‖y - x‖^2 + λ‖x‖_1

Claim: the solution of the simple problem is x* = S_λ(y), where S_λ
is the soft-thresholding operator:

    [S_λ(y)]_i = y_i - λ   if y_i > λ
                 0         if -λ ≤ y_i ≤ λ
                 y_i + λ   if y_i < -λ

Why? Subgradients of f(x) = (1/2)‖y - x‖^2 + λ‖x‖_1 are
    g = x - y + λs,
where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [-1, 1] if x_i = 0

[Figure: soft-thresholding in one variable]

Now just plug in x = S_λ(y) and check that we can achieve g = 0
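This claim is easy to check numerically. A sketch (the operator name and test vector are my own choices, not from the slides):

```python
import numpy as np

# Soft-thresholding operator S_lambda(y), applied componentwise
def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, -0.5, 1.2, -4.0])
lam = 1.0
x = soft_threshold(y, lam)

# Optimality: g = x - y + lam*s must be able to equal 0, where
# s_i = sign(x_i) if x_i != 0, and s_i can be anything in [-1, 1] if x_i = 0
nz = x != 0
assert np.allclose(x[nz] - y[nz] + lam * np.sign(x[nz]), 0.0)  # g_i = 0 exactly
assert np.all(np.abs((y[~nz] - x[~nz]) / lam) <= 1.0)          # a valid s_i exists
```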

Subgradient method
Given convex f : R^n → R, not necessarily differentiable

Subgradient method: just like gradient descent, but replacing
gradients with subgradients. I.e., initialize x^(0), then repeat
    x^(k) = x^(k-1) - t_k g^(k-1),   k = 1, 2, 3, ...,
where g^(k-1) is any subgradient of f at x^(k-1)

The subgradient method is not necessarily a descent method, so we
keep track of the best iterate x_best^(k) among x^(1), ..., x^(k) so far, i.e.,
    f(x_best^(k)) = min_{i=1,...,k} f(x^(i))
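The update above can be sketched in a few lines. This is my own illustration (the ℓ1 objective and the t_k = 1/k steps are illustrative picks, not from the slides):

```python
import numpy as np

# Subgradient method sketch on f(x) = ||x||_1, whose subgradient we take
# componentwise as sign(x) (a valid choice: 0 where x_i = 0).
def subgradient_method(x0, f, subgrad, step, iters):
    """Run the subgradient method, tracking the best iterate so far."""
    x = x0.copy()
    x_best, f_best = x.copy(), f(x)
    for k in range(1, iters + 1):
        x = x - step(k) * subgrad(x)       # x^(k) = x^(k-1) - t_k g^(k-1)
        if f(x) < f_best:                  # not a descent method: keep the best
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

f = lambda x: np.sum(np.abs(x))
x0 = np.array([2.0, -1.5, 0.7])

# Diminishing steps t_k = 1/k: square summable but not summable
x_best, f_best = subgradient_method(x0, f, np.sign, lambda k: 1.0 / k, 500)
assert f_best < 0.1    # approaches the minimum value f(0) = 0
```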

Step size choices

Fixed step size: t_k = t, for all k = 1, 2, 3, ...

Diminishing step size: choose t_k to satisfy
    ∑_{k=1}^∞ t_k^2 < ∞,   ∑_{k=1}^∞ t_k = ∞,
i.e., square summable but not summable

Important that step sizes go to zero, but not too fast

Other options too, but an important difference to gradient descent:
all step size options are pre-specified, not adaptively computed

Convergence analysis
Assume that f : R^n → R is convex, and also:

- f is Lipschitz continuous with constant G > 0:
      |f(x) - f(y)| ≤ G‖x - y‖   for all x, y
  Equivalently: ‖g‖ ≤ G for any subgradient of f at any x
- ‖x^(1) - x*‖ ≤ R (equivalently, ‖x^(0) - x*‖ is bounded)

Theorem: For a fixed step size t, the subgradient method satisfies
    lim_{k→∞} f(x_best^(k)) ≤ f(x*) + G^2 t / 2

Theorem: For diminishing step sizes, the subgradient method satisfies
    lim_{k→∞} f(x_best^(k)) = f(x*)

Basic inequality
Can prove both results from same basic inequality. Key steps:
- Using the definition of subgradient,
      ‖x^(k+1) - x*‖^2 ≤ ‖x^(k) - x*‖^2 - 2 t_k (f(x^(k)) - f(x*)) + t_k^2 ‖g^(k)‖^2
- Iterating the last inequality,
      ‖x^(k+1) - x*‖^2 ≤ ‖x^(1) - x*‖^2 - 2 ∑_{i=1}^k t_i (f(x^(i)) - f(x*)) + ∑_{i=1}^k t_i^2 ‖g^(i)‖^2

- Using ‖x^(k+1) - x*‖^2 ≥ 0 and ‖x^(1) - x*‖ ≤ R,
      2 ∑_{i=1}^k t_i (f(x^(i)) - f(x*)) ≤ R^2 + ∑_{i=1}^k t_i^2 ‖g^(i)‖^2
- Introducing f(x_best^(k)),
      ∑_{i=1}^k t_i (f(x^(i)) - f(x*)) ≥ ∑_{i=1}^k t_i (f(x_best^(k)) - f(x*))
- Plugging this in and using ‖g^(i)‖ ≤ G,
      f(x_best^(k)) - f(x*) ≤ (R^2 + G^2 ∑_{i=1}^k t_i^2) / (2 ∑_{i=1}^k t_i)

Convergence proofs
For constant step size t, the basic bound is
    (R^2 + G^2 t^2 k) / (2 t k)  →  G^2 t / 2   as k → ∞

For diminishing step sizes t_k with
    ∑_{i=1}^∞ t_i^2 < ∞,   ∑_{i=1}^∞ t_i = ∞,
we get
    (R^2 + G^2 ∑_{i=1}^k t_i^2) / (2 ∑_{i=1}^k t_i)  →  0   as k → ∞

Convergence rate
After k iterations, what is the complexity of the error f(x_best^(k)) - f(x*)?

Consider taking t_i = R/(G√k), for all i = 1, ..., k. Then the basic
bound is
    (R^2 + G^2 ∑_{i=1}^k t_i^2) / (2 ∑_{i=1}^k t_i) = RG/√k

Can show this choice is the best we can do (i.e., minimizes the bound)

I.e., the subgradient method has convergence rate O(1/√k)

I.e., to get f(x_best^(k)) - f(x*) ≤ ε, we need O(1/ε^2) iterations
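A quick numerical sanity check of this bound, on a 1-D instance of my own choosing: f(x) = |x| is Lipschitz with G = 1, and starting at x^(0) = 1 with minimizer x* = 0 gives R = 1.

```python
import numpy as np

# f(x) = |x| in 1-D: Lipschitz with G = 1; x^(0) = 1, x* = 0, so R = 1
G, R, k = 1.0, 1.0, 400
t = R / (G * np.sqrt(k))          # fixed t_i = R/(G sqrt(k)), i = 1, ..., k

x, f_best = 1.0, 1.0
for _ in range(k):
    x = x - t * np.sign(x)        # subgradient step (sign(0) = 0 keeps x at 0)
    f_best = min(f_best, abs(x))

# basic bound: f(x_best^(k)) - f(x*) <= RG/sqrt(k)
assert f_best <= R * G / np.sqrt(k) + 1e-12
```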

Intersection of sets
Example from Boyd's lecture notes: suppose we want to find
x* ∈ C_1 ∩ ... ∩ C_m, i.e., find a point in the intersection of the
closed, convex sets C_1, ..., C_m

First define
    f(x) = max_{i=1,...,m} dist(x, C_i),
and now solve
    min_{x ∈ R^n} f(x)

Note that f(x*) = 0  ⟺  x* ∈ C_1 ∩ ... ∩ C_m

Recall the distance to a set C:
    dist(x, C) = min{‖x - u‖ : u ∈ C}

For closed, convex C, there is a unique point minimizing ‖x - u‖
over u ∈ C, denoted u* = P_C(x), so dist(x, C) = ‖x - P_C(x)‖

Let f_i(x) = dist(x, C_i), for each i. Then f(x) = max_{i=1,...,m} f_i(x), and:

- For each i, and x ∉ C_i,
      ∇f_i(x) = (x - P_{C_i}(x)) / ‖x - P_{C_i}(x)‖
- If f(x) = f_i(x) ≠ 0, then
      (x - P_{C_i}(x)) / ‖x - P_{C_i}(x)‖ ∈ ∂f(x)

Now apply the subgradient method with step size t_k = f(x^(k-1))
(the Polyak step size; can show that we get convergence)

Hence at iteration k, find C_i so that x^(k-1) is farthest from C_i.
Then update
    x^(k) = x^(k-1) - f(x^(k-1)) · (x^(k-1) - P_{C_i}(x^(k-1))) / ‖x^(k-1) - P_{C_i}(x^(k-1))‖
          = P_{C_i}(x^(k-1))
Here we used
    f(x^(k-1)) = dist(x^(k-1), C_i) = ‖x^(k-1) - P_{C_i}(x^(k-1))‖

For two sets, this is exactly the famous alternating projections
method, i.e., just keep projecting back and forth

[Figure: alternating projections between two sets, from Boyd's notes]
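A minimal alternating-projections sketch for two sets in R^2 (the sets and tolerances are my own choices, not from the slides): C_1 the unit disk and C_2 the half-plane {x : x_1 ≥ 0.5}. With the Polyak step size each iteration is exactly a projection, so for two sets we simply project back and forth.

```python
import numpy as np

# C1 = unit disk, C2 = half-plane {x : x[0] >= 0.5}; both closed and convex
def proj_disk(x):
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n            # radial projection onto the disk

def proj_halfplane(x):
    return np.array([max(x[0], 0.5), x[1]])  # clip the first coordinate

x = np.array([3.0, 2.0])                     # start outside both sets
for _ in range(100):
    x = proj_disk(proj_halfplane(x))         # alternate projections

# the limit point lies (approximately) in the intersection C1 ∩ C2
assert np.linalg.norm(x) <= 1 + 1e-8
assert x[0] >= 0.5 - 1e-6
```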

Can we do better?
Strength of the subgradient method: broad applicability

Downside: the O(1/√k) rate is really slow ... can we do better?


Given starting point x^(0). Setup:

- Problem class: convex functions f with solution x*, with
  ‖x^(0) - x*‖ ≤ R, and f Lipschitz with constant G > 0 on
  {x : ‖x - x^(0)‖ ≤ R}
- Weak oracle: given x, the oracle returns a subgradient g ∈ ∂f(x)
- Nonsmooth first-order methods: iterative methods that start
  with x^(0) and update x^(k) in
      x^(0) + span{g^(0), g^(1), ..., g^(k-1)},
  where the subgradients g^(0), g^(1), ..., g^(k-1) come from the weak oracle

Lower bound
Theorem (Nesterov): For any k ≤ n - 1 and starting point x^(0),
there is a function in the problem class such that any nonsmooth
first-order method satisfies
    f(x^(k)) - f(x*) ≥ RG / (2(1 + √(k+1)))

Proof: We'll do the proof for k = n - 1 and x^(0) = 0; the proof is
similar otherwise. Let
    f(x) = max_{i=1,...,n} x_i + (1/2)‖x‖^2

Solution: x* = (-1/n, ..., -1/n), f(x*) = -1/(2n)

For R = 1/√n, f is Lipschitz with G = 1 + 1/√n

Oracle: returns g = e_j + x, where j is the smallest index such that
x_j = max_{i=1,...,n} x_i

Claim: for any i ∈ {1, ..., n-1}, the ith iterate satisfies
    x_{i+1}^(i) = ... = x_n^(i) = 0

Start with i = 1: note g^(0) = e_1. Then:
    span{g^(0), g^(1)} ⊆ span{e_1, e_2}
    span{g^(0), g^(1), g^(2)} ⊆ span{e_1, e_2, e_3}
    ...
    span{g^(0), g^(1), ..., g^(i-1)} ⊆ span{e_1, ..., e_i}

Therefore f(x^(n-1)) ≥ 0; recall f(x*) = -1/(2n), so
    f(x^(n-1)) - f(x*) ≥ 1/(2n) = RG / (2(1 + √n))

Improving on the subgradient method


To improve, we must go beyond nonsmooth first-order methods

There are many ways to improve for general nonsmooth problems,
e.g., localization methods, filtered subgradients, memory terms

Instead, we'll focus on minimizing functions of the form
    f(x) = g(x) + h(x)
where g is convex and differentiable, and h is convex

For a lot of problems (i.e., functions h), we can recover the O(1/k)
rate of gradient descent with a simple algorithm, which has big
practical consequences

References
- S. Boyd, Lecture Notes for EE 264B, Stanford University, Spring
  2010-2011
- Y. Nesterov (2004), Introductory Lectures on Convex Optimization:
  A Basic Course, Kluwer Academic Publishers, Chapter 3
- B. Polyak (1987), Introduction to Optimization, Optimization
  Software Inc., Chapter 5
- R. T. Rockafellar (1970), Convex Analysis, Princeton University
  Press, Chapters 23-25
- L. Vandenberghe, Lecture Notes for EE 236C, UCLA, Spring
  2011-2012
