Introductory Course On Non-Smooth Optimisation: Lecture 01 - Gradient Methods
Jingwei Liang
Outline:
Descent methods
Gradient descent
Heavy-ball method
Dynamical system
Convexity
Convex set
A set S ⊂ Rn is convex if for any θ ∈ [0, 1] and two points x, y ∈ S,
θx + (1 − θ)y ∈ S.
Convex function
Function F : Rn → R is convex if dom(F) is convex and for all x, y ∈ dom(F) and θ ∈ [0, 1],
F(θx + (1 − θ)y) ≤ θF(x) + (1 − θ)F(y).
Proper convex: F(x) < +∞ for at least one x and F(x) > −∞ for all x.
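As a quick numerical illustration of the convexity inequality (a minimal sketch; the function F(x) = ||x||² and the sample points below are illustrative choices, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
F = lambda x: x @ x                               # F(x) = ||x||^2 is convex
x, y = rng.standard_normal(5), rng.standard_normal(5)
for theta in np.linspace(0.0, 1.0, 11):
    lhs = F(theta * x + (1 - theta) * y)          # F(θx + (1 − θ)y)
    rhs = theta * F(x) + (1 - theta) * F(y)       # θF(x) + (1 − θ)F(y)
    assert lhs <= rhs + 1e-12                     # convexity inequality holds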
Problem
Unconstrained smooth optimisation
min_{x∈Rn} F(x).
[Figure: a smooth function F, its gradient ∇F(x) at a generic point, and ∇F(x⋆) = 0 at the minimiser.]
Quadratic programming
General quadratic programming problem
min_{x∈Rn} (1/2)xᵀAx + bᵀx + c.
Optimality condition:
0 = Ax⋆ + b.
Least squares, min_{x∈Rn} (1/2)||Ax − b||², optimality condition (normal equations):
AᵀAx⋆ = Aᵀb.
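Both optimality conditions can be checked numerically; a minimal sketch (the data A, b below are randomly generated for illustration, with A made positive definite so the quadratic programme is convex):

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                       # symmetric positive definite
b = rng.standard_normal(4)

# quadratic programming: gradient Ax + b vanishes at the minimiser
x_qp = np.linalg.solve(A, -b)
print(np.linalg.norm(A @ x_qp + b))           # ~ 0

# least squares: normal equations A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)
print(np.linalg.norm(A.T @ (A @ x_ls - b)))   # ~ 0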
Geometric programming
min_{x∈Rn} log( Σ_{i=1}^m exp(aiᵀx + bi) ).
Optimality condition:
0 = (1 / Σ_{i=1}^m exp(aiᵀx⋆ + bi)) Σ_{i=1}^m exp(aiᵀx⋆ + bi) ai.
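The optimality condition says that a softmax-weighted average of the ai vanishes at x⋆. A sketch using SciPy (the data ai, bi are random illustrative values, symmetrised so the problem is bounded below):

import numpy as np
from scipy.special import logsumexp, softmax
from scipy.optimize import minimize

rng = np.random.default_rng(2)
a_half = rng.standard_normal((6, 3))
a = np.vstack([a_half, -a_half])              # 0 in the hull of the a_i: bounded below
b = rng.standard_normal(12)

F = lambda x: logsumexp(a @ x + b)            # log sum_i exp(a_i^T x + b_i)
gradF = lambda x: softmax(a @ x + b) @ a      # sum_i softmax_i(x) a_i

res = minimize(F, np.zeros(3), jac=gradF)
print(np.linalg.norm(gradF(res.x)))           # ~ 0: optimality condition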
Descent methods
Problem
Assume the set of minimisers Argmin(F) is non-empty.
Iterative strategy to find one x⋆ ∈ Argmin(F): start from x0 and generate a sequence {xk}k∈N such that
lim_{k→∞} xk = x⋆ ∈ Argmin(F).
[Figure: iterates xk−1, xk, xk+1, xk+2 approaching x⋆.]
Iterative scheme
For each k = 1, 2, ..., find γk > 0 and dk ∈ Rn and then
xk+1 = xk + γk dk ,
where
dk is called the search (descent) direction.
γk is called the step-size.
Descent methods
An algorithm is called a descent method if
F(xk+1) < F(xk).
By convexity, F(xk+1) ≥ F(xk) + ⟨∇F(xk), xk+1 − xk⟩, hence
⟨∇F(xk), xk+1 − xk⟩ ≥ 0 =⇒ F(xk+1) ≥ F(xk),
so a descent direction must satisfy ⟨∇F(xk), dk⟩ < 0.
[Figures: descent directions at xk relative to ∇F(xk); the line-search function γ ↦ F(xk + γdk) with the chosen step-size γk.]
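A generic descent method in code, as a sketch: dk = −∇F(xk) and a standard backtracking (Armijo) line search stand in for the unspecified choices of direction and step-size.

import numpy as np

def descent_method(F, gradF, x0, max_iter=500, tol=1e-8):
    """Iterate x_{k+1} = x_k + gamma_k d_k with F(x_{k+1}) < F(x_k)."""
    x = x0.copy()
    for _ in range(max_iter):
        g = gradF(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                          # descent direction: <grad F(x), d> < 0
        gamma = 1.0
        # backtracking: shrink gamma until sufficient decrease holds
        while F(x + gamma * d) > F(x) + 1e-4 * gamma * (g @ d):
            gamma *= 0.5
        x = x + gamma * d
    return x

# example on F(x) = ||x||^2 (illustrative)
x_hat = descent_method(lambda x: x @ x, lambda x: 2 * x, np.array([3.0, -2.0]))
print(x_hat)   # ~ [0, 0]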
Monotonicity
Monotonicity of gradient
Let F : Rn → R be proper, convex and differentiable, then
⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0, ∀x, y ∈ dom(F),
and
F(x) ≥ F(y) + ⟨∇F(y), x − y⟩.
Lipschitz continuity
The gradient of F is L-Lipschitz continuous if there exists L > 0 such that
||∇F(x) − ∇F(y)|| ≤ L||x − y||, ∀x, y ∈ dom(F).
If F ∈ C_L^1, then
H(x) := (L/2)||x||² − F(x)
is convex.
Hint: monotonicity of ∇H(x), i.e.
⟨∇H(x) − ∇H(y), x − y⟩ = L||x − y||² − ⟨∇F(x) − ∇F(y), x − y⟩
  ≥ L||x − y||² − L||x − y||²
  = 0.
Corollary
Let F ∈ C_L^1 and x⋆ ∈ Argmin(F), then
(1/(2L))||∇F(x)||² ≤ F(x) − F(x⋆) ≤ (L/2)||x − x⋆||², ∀x ∈ dom(F).
Left-hand inequality:
F(x⋆) ≤ min_{y∈dom(F)} { F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||² }
  = F(x) − (1/(2L))||∇F(x)||².
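A numerical check of this corollary on a quadratic, where L is the largest eigenvalue of A (the test data are illustrative):

import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)                 # positive definite, minimiser x* = 0
L = np.linalg.eigvalsh(A).max()         # Lipschitz constant of the gradient
F = lambda x: 0.5 * x @ A @ x           # F(x*) = 0

x = rng.standard_normal(5)
g = A @ x
assert (g @ g) / (2 * L) <= F(x) <= L / 2 * (x @ x)   # the sandwich inequality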
Co-coercivity
Let F ∈ C_L^1, then
⟨x − y, ∇F(x) − ∇F(y)⟩ ≥ (1/L)||∇F(x) − ∇F(y)||².
Proof sketch: fix x and define R(y) := F(y) − ⟨∇F(x), y⟩, which lies in C_L^1 and is minimised at y = x. The previous corollary applied to R gives
F(y) − F(x) − ⟨∇F(x), y − x⟩ = R(y) − R(x) ≥ (1/(2L))||∇R(y)||² = (1/(2L))||∇F(y) − ∇F(x)||².
Adding this inequality to the same one with x and y exchanged yields the claim.
Strong convexity
Function F : Rn → R is α-strongly convex if dom(F) is convex and there exists α > 0 such that for all x, y ∈ dom(F) and θ ∈ [0, 1],
F(θx + (1 − θ)y) ≤ θF(x) + (1 − θ)F(y) − (α/2)θ(1 − θ)||x − y||².
Equivalently, G(x) := F(x) − (α/2)||x||² is convex.
Monotonicity:
⟨∇F(x) − ∇F(y), x − y⟩ ≥ α||x − y||², ∀x, y ∈ dom(F).
Proof: first-order condition of convexity for G(x) := F(x) − (α/2)||x||².
Corollary
Let F ∈ C¹ be α-strongly convex and x⋆ ∈ Argmin(F), then
(α/2)||x − x⋆||² ≤ F(x) − F(x⋆) ≤ (1/(2α))||∇F(x)||², ∀x ∈ dom(F).
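The strongly convex counterpart admits the same kind of check, now with α the smallest eigenvalue (illustrative data):

import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)                 # alpha-strongly convex quadratic, x* = 0
alpha = np.linalg.eigvalsh(A).min()
F = lambda x: 0.5 * x @ A @ x

x = rng.standard_normal(5)
g = A @ x
assert alpha / 2 * (x @ x) <= F(x) <= (g @ g) / (2 * alpha)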
Gradient descent
Unconstrained smooth optimisation
Assumptions:
F ∈ C1 is convex.
∇F(x) is L-Lipschitz continuous for some L > 0.
Set of minimisers is non-empty, i.e. Argmin(F) ≠ ∅.
Gradient descent
initial : x0 ∈ dom(F);
repeat :
1. Choose step-size γk > 0
2. Update xk+1 = xk − γk ∇F(xk )
until : stopping criterion is satisfied.
For step-size γ ∈ ]0, 1/L], each step of gradient descent satisfies
F(xk+1) ≤ F(x⋆) + (1/(2γ))(||xk − x⋆||² − ||xk+1 − x⋆||²).
Summing over the first iterations, the right-hand side telescopes:
Σ_{i=0}^{k} (F(xi+1) − F(x⋆)) ≤ (1/(2γ))(||x0 − x⋆||² − ||xk+1 − x⋆||²)
  ≤ (1/(2γ))||x0 − x⋆||².
Since F(xk) is non-increasing, this gives the sub-linear rate F(xk) − F(x⋆) ≤ ||x0 − x⋆||²/(2γk).
If F is moreover α-strongly convex, the distance to the minimiser contracts linearly:
||xk+1 − x⋆||² ≤ ρ||xk − x⋆||², with ρ = 1 − 2γαL/(α + L).
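A minimal implementation of the scheme above with the fixed step γ = 1/L (the quadratic test problem is an illustrative choice):

import numpy as np

def gradient_descent(gradF, x0, gamma, max_iter=5000, tol=1e-10):
    x = x0.copy()
    for k in range(max_iter):
        g = gradF(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - gamma * g               # x_{k+1} = x_k − γ ∇F(x_k)
    return x, k

rng = np.random.default_rng(5)
M = rng.standard_normal((10, 10))
A = M.T @ M + np.eye(10)                # F(x) = (1/2) x^T A x + b^T x
b = rng.standard_normal(10)
L = np.linalg.eigvalsh(A).max()

x_hat, iters = gradient_descent(lambda x: A @ x + b, np.zeros(10), gamma=1.0 / L)
print(iters, np.linalg.norm(A @ x_hat + b))   # gradient norm ~ 0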
Heavy-ball method
Observations
Gradient descent:
−γ∇F(xk ) = xk+1 − xk .
Consider the angle θk := angle(∇F(xk+1), ∇F(xk)); then
lim_{k→+∞} θk = 0,
i.e. consecutive gradients become asymptotically aligned.
Exercise: prove this claim for least squares.
[Figures: zigzagging gradient descent iterates approaching x⋆; inertial iterates with extrapolation point yk.]
Heavy-ball method: add an inertial term to gradient descent,
xk+1 = xk + a(xk − xk−1) − γ∇F(xk),
with inertia parameter a and step-size γ.
Theorem
Let x⋆ be a (local) minimiser of F such that αId ⪯ ∇²F(x⋆) ⪯ LId and choose a, γ with a ∈ [0, 1[, γ ∈ ]0, 2(1 + a)/L[. There exists ρ̄ < 1 such that, for any ρ with ρ̄ < ρ < 1, if x0, x1 are close enough to x⋆, one has
||xk − x⋆|| ≤ Cρᵏ.
Moreover, if
a = ((√L − √α)/(√L + √α))², γ = 4/(√L + √α)², then ρ = (√L − √α)/(√L + √α).
Taylor expansion:
xk+1 = xk + a(xk − xk−1) − γ∇²F(x⋆)(xk − x⋆) + o(||xk − x⋆||).
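A heavy-ball sketch using the optimal a, γ from the theorem; on the same quadratic test problem it typically needs far fewer iterations than plain gradient descent (data illustrative):

import numpy as np

def heavy_ball(gradF, x0, a, gamma, max_iter=5000, tol=1e-10):
    """x_{k+1} = x_k + a (x_k − x_{k−1}) − γ ∇F(x_k)."""
    x_prev, x = x0.copy(), x0.copy()
    for k in range(max_iter):
        g = gradF(x)
        if np.linalg.norm(g) < tol:
            break
        x, x_prev = x + a * (x - x_prev) - gamma * g, x
    return x, k

rng = np.random.default_rng(6)
M = rng.standard_normal((10, 10))
A = M.T @ M + np.eye(10)
b = rng.standard_normal(10)
eigs = np.linalg.eigvalsh(A)
alpha, L = eigs.min(), eigs.max()

a = ((np.sqrt(L) - np.sqrt(alpha)) / (np.sqrt(L) + np.sqrt(alpha))) ** 2
gamma = 4.0 / (np.sqrt(L) + np.sqrt(alpha)) ** 2
x_hb, iters = heavy_ball(lambda x: A @ x + b, np.zeros(10), a, gamma)
print(iters, np.linalg.norm(A @ x_hb + b))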
Convergence rate of gradient descent
[Figure: gradient descent iterates compared with accelerated iterates using the extrapolation point yk.]
Accelerated scheme:
initial : x0 ∈ dom(F), x−1 = x0;
repeat :
1. Update yk = xk + ak (xk − xk−1)
2. Update xk+1 by
xk+1 = yk − (1/L)∇F(yk).
until : stopping criterion is satisfied.
Convergence rate
Let φ0 ≥ √(α/L), then
F(xk) − F(x⋆) ≤ min{ (1 − √(α/L))ᵏ, 4L/(2√L + k√ν)² } × (F(x0) − F(x⋆) + (ν/2)||x0 − x⋆||²),
where ν = φ0(φ0L − α)/(1 − φ0).
Parameter choices:
F ∈ C_L^1: φ0 = 1, q = 0, φk ≈ 2/(k + 1) → 0 and ak ≈ (1 − φk)/(1 + φk) → 1.
F ∈ S_{α,L}^1: φ0 = √(α/L), q = α/L, φk ≡ √(α/L) and ak ≡ (√L − √α)/(√L + √α).
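A sketch of the accelerated scheme with the constant strongly convex choice ak ≡ (√L − √α)/(√L + √α) (the quadratic test problem is illustrative):

import numpy as np

def accelerated_gd(gradF, x0, a, L, max_iter=5000, tol=1e-10):
    """y_k = x_k + a (x_k − x_{k−1}); x_{k+1} = y_k − (1/L) ∇F(y_k)."""
    x_prev, x = x0.copy(), x0.copy()
    for k in range(max_iter):
        y = x + a * (x - x_prev)
        g = gradF(y)
        if np.linalg.norm(g) < tol:
            break
        x, x_prev = y - g / L, x
    return x, k

rng = np.random.default_rng(7)
M = rng.standard_normal((10, 10))
A = M.T @ M + np.eye(10)
b = rng.standard_normal(10)
eigs = np.linalg.eigvalsh(A)
alpha, L = eigs.min(), eigs.max()

a = (np.sqrt(L) - np.sqrt(alpha)) / (np.sqrt(L) + np.sqrt(alpha))
x_acc, iters = accelerated_gd(lambda x: A @ x + b, np.zeros(10), a, L)
print(iters, np.linalg.norm(A @ x_acc + b))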
Dynamical system of gradient descent
Discretisation
Explicit Euler method:
Ẋ(t) = (X(t + h) − X(t))/h.
Implicit Euler method:
Ẋ(t) = (X(t) − X(t − h))/h.
Applying the explicit Euler method to the gradient flow Ẋ(t) = −∇F(X(t)) recovers gradient descent: X(t + h) = X(t) − h∇F(X(t)).
Discretisation of the damped second-order system Ẍ(t) + λ(t)Ẋ(t) + ∇F(X(t)) = 0:
2nd-order term:
Ẍ(t) = (X(t + h) − 2X(t) + X(t − h))/h².
Implicit Euler method:
Ẋ(t) = (X(t) − X(t − h))/h.
Combining the two:
X(t + h) − X(t) − (1 − hλ(t))(X(t) − X(t − h)) + h²∇F(X(t)) = 0.
Choices:
Heavy-ball: hλ(t) ∈ ]0, 1[.
Nesterov: λ(t) = d/t, d > 3.
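A sketch iterating the combined discretisation above for both damping choices (the step h, the constant hλ = 0.5 and d = 3.5 are illustrative values; the quadratic data are random):

import numpy as np

def inertial_discretisation(gradF, x0, h, lam, n_steps=3000):
    """X(t+h) = X(t) + (1 − h λ(t))(X(t) − X(t−h)) − h² ∇F(X(t))."""
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, n_steps):
        t = k * h
        x, x_prev = x + (1 - h * lam(t)) * (x - x_prev) - h**2 * gradF(x), x
    return x

rng = np.random.default_rng(8)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)
b = rng.standard_normal(5)
gradF = lambda x: A @ x + b
h = 1.0 / np.sqrt(np.linalg.eigvalsh(A).max())    # so that h² = 1/L

x_hb  = inertial_discretisation(gradF, np.zeros(5), h, lam=lambda t: 0.5 / h)  # hλ ≡ 0.5 ∈ ]0, 1[
x_nes = inertial_discretisation(gradF, np.zeros(5), h, lam=lambda t: 3.5 / t)  # λ(t) = d/t, d > 3
print(np.linalg.norm(gradF(x_hb)), np.linalg.norm(gradF(x_nes)))               # both ~ 0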