OPTCON Optimization 2023 10 11

This document summarizes key concepts in unconstrained optimization from a course on optimal control and nonlinear optimization taught by Professor Giuseppe Notarstefano. It defines optimal solutions, local and global minima/maxima, and introduces the concepts of gradients and Hessian matrices. It then presents the first and second order necessary conditions for optimality in unconstrained problems, as well as sufficient conditions. Finally, it discusses convex sets, functions, and how constraints define convex regions.

Optimal Control

Nonlinear Optimization

Prof. Giuseppe Notarstefano

Department of Electrical, Electronic, and Information Engineering


Alma Mater Studiorum Università di Bologna
[email protected]

A special thanks to L. Sforni and I. Notarnicola for their support in preparing the slides.
These slides are for internal use in the course Optimal Control @ University of Bologna.
Unconstrained optimization: definitions, optimality conditions and
special problem classes
Unconstrained optimization: introduction
Consider the unconstrained optimization problem

min ℓ(x),
x∈Rn

with ℓ : Rn → R a cost function to be minimized and x a decision vector.

We say that x⋆ is a
• global minimum if ℓ(x⋆ ) ≤ ℓ(x) for all x ∈ Rn
• strict global minimum if ℓ(x⋆ ) < ℓ(x) for all x ̸= x⋆
• local minimum if there exists ϵ > 0 such that ℓ(x⋆ ) ≤ ℓ(x) for all
x ∈ B(x⋆ , ϵ) = {x ∈ Rn | ∥x − x⋆ ∥ < ϵ}
• strict local minimum if there exists ϵ > 0 such that ℓ(x⋆ ) < ℓ(x) for all x ∈ B(x⋆ , ϵ) and x ̸= x⋆ .
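As a quick numerical illustration of these definitions (a sketch, not part of the original slides), we can sample a hypothetical scalar quartic cost on a fine grid and flag points that are lower than both neighbors (candidate local minima) and the overall lowest point (the global minimum on the grid):

```python
import numpy as np

# Hypothetical scalar cost with several stationary points
def ell(x):
    return x**4 + 2 * x**3 - 12 * x**2 - 2 * x + 6

xs = np.linspace(-5.0, 4.0, 9001)
vals = ell(xs)

# Interior grid points lower than both neighbors: candidate local minima
interior = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
local_mins = xs[1:-1][interior]

# The global minimum over the grid
x_glob = xs[np.argmin(vals)]

print("candidate local minimizers:", local_mins)
print("global minimizer (on grid):", x_glob)
```

This quartic has two local minimizers (one near x ≈ −3.2, one near x ≈ 1.9); the leftmost one is also the global minimizer, matching the distinction drawn above.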

Prof. Giuseppe Notarstefano – Optimal Control – Nonlinear Optimization 2 | 63


Unconstrained optimization: introduction

[Figure: a one-dimensional cost ℓ(x) exhibiting a strict local maximum, local minima, and a strict global minimum.]

Remark Maxima can be equivalently defined. Moreover, maxima of ℓ are minima of −ℓ.


Notation
Optimal solution
We denote ℓ(x⋆) the optimal (minimum) value of a generic optimization problem, i.e.,

ℓ(x⋆) = min_{x∈Rn} ℓ(x),

where x⋆ is the minimum point (optimal value of the optimization variable), i.e.,

x⋆ = argmin_{x∈Rn} ℓ(x).


Remarks on notation: gradient and Hessian matrix of a function
Gradient of a function
For a function r : Rn → R, the gradient is denoted as

∇r(x) = [ ∂r(x)/∂x1  ···  ∂r(x)/∂xn ]⊤ ∈ Rn×1

Hessian matrix of a function
For a function r : Rn → R, the Hessian matrix is denoted as

∇2r(x) = [ ∂2r(x)/∂x1∂x1  ···  ∂2r(x)/∂x1∂xn
           ⋮               ⋱    ⋮
           ∂2r(x)/∂xn∂x1  ···  ∂2r(x)/∂xn∂xn ] ∈ Rn×n

The Hessian matrix is a symmetric matrix, since the assumption of continuity of the second derivatives implies that the order of differentiation does not matter.
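A minimal sketch (not in the slides) of how these objects can be approximated numerically with central finite differences, checked against a quadratic r(x) = x⊤Qx + q⊤x whose gradient 2Qx + q and Hessian 2Q are known in closed form:

```python
import numpy as np

def num_gradient(r, x, h=1e-6):
    """Central finite-difference approximation of the gradient (n x 1 convention)."""
    n = x.size
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (r(x + e) - r(x - e)) / (2 * h)
    return g

def num_hessian(r, x, h=1e-4):
    """Finite-difference Hessian; symmetrized to kill discretization asymmetry."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[:, i] = (num_gradient(r, x + e) - num_gradient(r, x - e)) / (2 * h)
    return 0.5 * (H + H.T)

# Check on a quadratic r(x) = x^T Q x + q^T x (hypothetical data)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -1.0])
r = lambda x: x @ Q @ x + q @ x

x0 = np.array([0.3, -0.7])
print(num_gradient(r, x0))   # ≈ 2 Q x0 + q
print(num_hessian(r, x0))    # ≈ 2 Q
```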


Remarks on notation: gradient of a vector-valued function
Gradient of a (vector-valued) function
For a vector field r : Rn → Rm, the gradient is denoted as

∇r(x) = [ ∇r1(x)  ···  ∇rm(x) ]
      = [ ∂r1(x)/∂x1  ···  ∂rm(x)/∂x1
          ⋮            ⋱    ⋮
          ∂r1(x)/∂xn  ···  ∂rm(x)/∂xn ] ∈ Rn×m,

which is the transpose of the Jacobian matrix of r.


Necessary conditions of optimality (unconstrained)
First order necessary condition (FNC) of optimality (unconstrained)
Let x⋆ be an unconstrained local minimum of ℓ : Rn → R and assume that ℓ is continuously differentiable
(C 1 ) in B(x⋆ , ϵ) for some ϵ > 0. Then ∇ℓ(x⋆ ) = 0.

Second order necessary condition of optimality (unconstrained)


If additionally ℓ is twice continuously differentiable (C 2 ) in B(x⋆ , ϵ), then ∇2 ℓ(x⋆ ) ≥ 0 (the Hessian
∇2 ℓ(x⋆ ) of ℓ is positive semidefinite).

Remark Points x̄ satisfying ∇ℓ(x̄) = 0 are called stationary points. They include minima, maxima and
saddle points.

[Figure: a one-dimensional cost whose stationary points include a saddle point, a strict local maximum, local minima, and a strict global minimum.]


Second order sufficient conditions of optimality
Second order sufficient condition of optimality (unconstrained)
Let ℓ : Rn → R be twice continuously differentiable (C 2 ) in B(x⋆ , ϵ) for some ϵ > 0. Suppose that
x⋆ ∈ Rn satisfies

∇ℓ(x⋆ ) = 0 and ∇2 ℓ(x⋆ ) > 0 (positive definite).

Then x⋆ is a strict (unconstrained) local minimum of ℓ.
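The necessary and sufficient conditions above translate directly into a classification test: check that the gradient vanishes, then inspect the Hessian eigenvalues. A sketch on a hypothetical cost ℓ(x, y) = x⁴ − 2x² + y², which has strict local minima at (±1, 0) and a saddle at the origin:

```python
import numpy as np

# Hypothetical smooth cost: ℓ(x, y) = x^4 - 2x^2 + y^2
def grad(v):
    x, y = v
    return np.array([4 * x**3 - 4 * x, 2 * y])

def hess(v):
    x, y = v
    return np.array([[12 * x**2 - 4, 0.0], [0.0, 2.0]])

def classify(v, tol=1e-8):
    """Apply the first/second order conditions at a candidate point."""
    if np.linalg.norm(grad(v)) > tol:
        return "not stationary"
    eigs = np.linalg.eigvalsh(hess(v))
    if np.all(eigs > tol):
        return "strict local minimum"   # second order sufficient condition
    if np.any(eigs < -tol):
        return "not a minimum"          # second order necessary condition fails
    return "inconclusive"               # semidefinite Hessian: higher order terms decide

print(classify(np.array([1.0, 0.0])))   # strict local minimum
print(classify(np.array([0.0, 0.0])))   # saddle: Hessian eigenvalues -4 and 2
```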



Convex sets
Convex set
A set X ⊂ Rn is convex if, for any two points xA and xB in X and for all λ ∈ [0, 1],

λxA + (1 − λ)xB ∈ X.

[Figure: a convex set X (every segment between points xA and xB stays inside X) next to a nonconvex set.]


Convex functions
Convex function
Let X ⊂ Rn be a convex set. A function ℓ : X → R is convex if, for any two points xA and xB in X and for all λ ∈ [0, 1],

ℓ(λxA + (1 − λ)xB) ≤ λℓ(xA) + (1 − λ)ℓ(xB).

Remark A function ℓ is concave if −ℓ is convex. A function ℓ is strictly convex if the inequality holds
strictly for xA ̸= xB and λ ∈ (0, 1).
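The defining inequality can be probed numerically. Below is a randomized sampling check (a sketch, not from the slides): passing is only evidence of convexity, while a single violation disproves it.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_convex_on_samples(f, n=2, trials=2000):
    """Randomized check of ℓ(λ xA + (1-λ) xB) ≤ λ ℓ(xA) + (1-λ) ℓ(xB)."""
    for _ in range(trials):
        xa, xb = rng.normal(size=n), rng.normal(size=n)
        lam = rng.uniform()
        lhs = f(lam * xa + (1 - lam) * xb)
        rhs = lam * f(xa) + (1 - lam) * f(xb)
        if lhs > rhs + 1e-9:
            return False          # inequality violated: f is not convex
    return True

print(is_convex_on_samples(lambda x: np.sum(x**2)))    # True: convex
print(is_convex_on_samples(lambda x: -np.sum(x**2)))   # concave: violation found
```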



Inequality constraints and convex sets
Let g : Rn → Rp , we can define a set Xineq ⊂ Rn as

Xineq = {x ∈ Rn | g(x) ≤ 0}

The set Xineq is convex if g is a quasi-convex function (e.g., a monotone function on R), i.e., a function whose sublevel sets are all convex.

Corollary: If g is convex, then Xineq is convex.

[Figure: two convex sets in the (x1, x2) plane — the ellipse ax1² + bx2² − c ≤ 0 and the halfspace a⊤x − c ≤ 0.]


Equality constraints and convex sets
Let h : Rn → Rp , we can define a set Xeq ⊂ Rn as

Xeq = {x ∈ Rn | h(x) = 0}

The set Xeq is convex if and only if h is an affine function. Convex sets identified through equality constraints are affine subspaces (e.g., hyperplanes).

[Figure: two sets in the (x1, x2) plane — the quadric x⊤Qx + q⊤x − c = 0 (nonconvex in general) and the hyperplane a⊤x − c = 0 (convex).]


Minimization of convex functions
Proposition (Prop. B.10 Nonlin. Progr.)
Let X ⊂ Rn be a convex set and ℓ : X → R a convex function. Then a local minimum of ℓ is also a
global minimum.
Proof
Let x⋆ be a local minimum of ℓ. Suppose by contradiction that x⋆ is not a global minimum. Then there
exists a global minimum x̄ ∈ X such that ℓ(x̄) < ℓ(x⋆ ).

Using the definition of convexity, it holds that

ℓ(λx⋆ + (1 − λ)x̄) ≤ λℓ(x⋆ ) + (1 − λ)ℓ(x̄)


< λℓ(x⋆ ) + (1 − λ)ℓ(x⋆ ) = ℓ(x⋆ ),

for every λ ∈ [0, 1).

This contradicts the assumption that x⋆ is a local minimum since we found points, x̃ = λx⋆ + (1 − λ)x̄
with λ ∈ [0, 1), arbitrarily close to x⋆ with ℓ(x̃) < ℓ(x⋆ ). This concludes the proof.



Minimization of convex functions
Necessary and sufficient condition of optimality (unconstrained)
For the unconstrained minimization of a convex function it can be shown that the first order necessary
condition of optimality is also sufficient (for a global minimum).

Proposition (Prop. 1.1.2 Nonlin. Progr.)


Let ℓ : Rn → R be a convex function. Then x⋆ is a global minimum if and only if ∇ℓ(x⋆ ) = 0.
Proof (sketch)
Use the fact (Proposition B.3 in Nonlinear Programming) that ℓ is convex if and only if

ℓ(x) ≥ ℓ(x⋆) + ∇ℓ(x⋆)T (x − x⋆)

for all x ∈ Rn. Thus, if ∇ℓ(x⋆) = 0 it follows immediately that ℓ(x) ≥ ℓ(x⋆) for all x ∈ Rn. The converse is true by the first order necessary condition and the fact that a local minimum is a global minimum.


Quadratic Programming (unconstrained)
Let us consider a special class of optimization problems, namely quadratic optimization problems or
quadratic programs.

Quadratic program

min xT Qx + q T x
x∈Rn

with Q = QT ∈ Rn×n and q ∈ Rn .



Quadratic Programming (unconstrained): optimality conditions
First-order necessary condition for optimality: if x⋆ is a minimum then

∇ℓ(x⋆) = 0 =⇒ 2Qx⋆ + q = 0

Second-order necessary condition for optimality: if x⋆ is a minimum then

∇2ℓ(x⋆) ≥ 0 =⇒ 2Q ≥ 0

Important
A necessary condition for the existence of minima for a quadratic program is that Q ≥ 0.
Thus, quadratic programs admitting at least a minimum are convex optimization problems.



Quadratic Programming (unconstrained): properties
Since quadratic programs admitting a minimum are convex programs (Q ≥ 0 is necessary to have a local minimum), the following holds.

Important
For a quadratic program necessary conditions of optimality are also sufficient
and minima are global.

If Q > 0, then there exists a unique global minimum given by

x⋆ = −(1/2) Q−1 q.
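The closed-form minimizer is easy to verify numerically (a sketch with hypothetical data): the gradient 2Qx⋆ + q vanishes at x⋆, and no sampled point does better.

```python
import numpy as np

# Hypothetical data: Q symmetric positive definite, so the QP is strictly convex
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, -2.0])

ell = lambda x: x @ Q @ x + q @ x

# Unique global minimizer: x* = -(1/2) Q^{-1} q
x_star = -0.5 * np.linalg.solve(Q, q)

# Sanity checks: gradient vanishes, random points are never better
print(2 * Q @ x_star + q)                          # ≈ [0, 0]
rng = np.random.default_rng(1)
others = rng.normal(size=(100, 2))
print(all(ell(x_star) <= ell(x) for x in others))  # True
```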


Unconstrained optimization: algorithms
Optimization algorithms (unconstrained): iterative descent methods
We consider optimization algorithms relying on the iterative descent idea.

Notation We denote xk ∈ Rn an estimate of a local minimum at iteration k ∈ N.

The algorithm starts at a given initial guess x0 and iteratively generates vectors x1 , x2 , . . . such that ℓ
is decreased at each iteration, i.e.,

ℓ(xk+1) < ℓ(xk), k = 0, 1, 2, . . .

[Figure: iterates xk, xk+1 descending along the graph of ℓ(x) toward a minimum x⋆.]


Optimization algorithms: two-step procedure
We consider a general two-step procedure that reads as follows

xk+1 = xk + γ k dk, k = 0, 1, 2, . . .

in which
1. each γ k > 0 is a “step-size”,
2. dk ∈ Rn is a “direction”.

The goal is to
1. choose a direction dk along which the cost decreases for γ k sufficiently small;
2. select a step-size γ k guaranteeing a sufficient decrease.

Notation In other references these are called line-search methods.
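The two-step procedure can be sketched as a generic loop with pluggable direction and step-size rules (a minimal illustration, not from the slides); steepest descent with a constant step-size on ℓ(x) = ∥x∥² is used as the running example:

```python
import numpy as np

def descent(x0, grad, direction, step_size, iters=100, tol=1e-8):
    """Generic two-step iterative descent: x^{k+1} = x^k + γ^k d^k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stationary point reached
            break
        d = direction(x, g)              # e.g. d = -g for steepest descent
        x = x + step_size(x, d) * d
    return x

# Usage sketch on ℓ(x) = ||x||^2 with steepest descent and a constant step-size
x_min = descent(
    x0=[2.0, -1.0],
    grad=lambda x: 2 * x,
    direction=lambda x, g: -g,
    step_size=lambda x, d: 0.4,
)
print(x_min)   # ≈ [0, 0]
```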



Gradient methods: introduction
Let xk be such that ∇ℓ(xk ) ̸= 0. We start by considering the update rule

xk+1 = xk − γ k ∇ℓ(xk ),

i.e., we choose dk = −∇ℓ(xk ).

From the first order Taylor expansion of ℓ at xk we have

ℓ(xk+1) = ℓ(xk) + ∇ℓ(xk)T (xk+1 − xk) + o(∥xk+1 − xk∥)
        = ℓ(xk) − γ k ∥∇ℓ(xk)∥2 + o(γ k).

Thus, for γ k > 0 sufficiently small it can be shown that ℓ(xk+1) < ℓ(xk).


Gradient methods: introduction
The update rule

xk+1 = xk − γ k ∇ℓ(xk ),

can be generalized to so called gradient methods

xk+1 = xk + γ k dk ,

with dk such that

∇ℓ(xk )T dk < 0.



Gradient methods: selecting the descent direction
Several gradient methods can be written as

xk+1 = xk − γ k Dk ∇ℓ(xk ) k = 1, 2, . . .

where Dk ∈ Rn×n is a symmetric positive definite matrix.


It can be immediately seen that

−∇ℓ(xk )T Dk ∇ℓ(xk ) < 0,

i.e., dk = −Dk ∇ℓ(xk ) is a descent direction.



Gradient methods: selecting the descent direction
Some choices of Dk
• Steepest descent: Dk = In
• Newton’s method: Dk = (∇2ℓ(xk))−1
  It can be used when ∇2ℓ(xk) > 0. It typically converges very fast asymptotically. For γ k = 1 it is called the pure Newton’s method.
• Discretized Newton’s method: Dk = (H(xk))−1,
  where H(xk) is a positive definite symmetric approximation of ∇2ℓ(xk) obtained by using finite difference approximations of the second derivatives.


Steepest descent (gradient method)
The update rule obtained for Dk = I,

xk+1 = xk − γ k ∇ℓ(xk ),

is called steepest descent (or gradient method tout court).

The name steepest descent is due to the following property. The normalized negative gradient direction

dk = −∇ℓ(xk) / ∥∇ℓ(xk)∥

minimizes the slope ∇ℓ(xk)T dk among all normalized directions, i.e., it gives the steepest descent.


Newton’s method for root finding
Consider the nonlinear root (zero) finding problem

r(x) = 0

Idea: Iteratively refine the solution such that the improved guess xk+1 represents a root (zero) of the
linear approximation of r about the current tentative solution xk .

Consider the linear approximation of r about xk :

rk(xk + ∆xk) = r(xk) + ∇r(xk)⊤ ∆xk.

Then, finding the zero of the approximation, we have

∆xk = −(∇r(xk)⊤)−1 r(xk).

Thus, the solution is improved as

xk+1 = xk − (∇r(xk)⊤)−1 r(xk).
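The root-finding iteration above can be sketched in a few lines (a hypothetical scalar example, with the Jacobian supplied explicitly):

```python
import numpy as np

def newton_root(r, jac, x0, iters=20, tol=1e-12):
    """Newton's method for r(x) = 0: repeatedly zero the linearization of r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        step = np.linalg.solve(jac(x), r(x))   # solves ∇r(x)^T Δx = -r(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Usage sketch: root of r(x) = x^2 - 2 (scalar case, Jacobian is [2x])
root = newton_root(lambda x: np.array([x[0]**2 - 2.0]),
                   lambda x: np.array([[2.0 * x[0]]]),
                   x0=[1.0])
print(root)   # ≈ [1.41421356]
```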



Newton’s Method for Unconstrained Optimization
Consider the unconstrained optimization problem

min ℓ(x),
x∈Rn

stationary points x̄ satisfy the first-order optimality condition

∇ℓ(x̄) = 0.

We can look at it as a root finding problem, with r(x) = ∇ℓ(x), and solve it via Newton’s method.

Therefore, we can compute ∆xk as the solution of the linearization of r(x) = ∇ℓ(x) at xk , i.e.,

∇ℓ(xk) + ∇2ℓ(xk)∆xk = 0,

and run the update

xk+1 = xk + ∆xk = xk − ∇2ℓ(xk)−1 ∇ℓ(xk).
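The same update, applied to minimization, looks like the root-finding loop with r = ∇ℓ (a sketch on a hypothetical smooth strictly convex cost ℓ(x) = e^{x1} + x1² + x2²):

```python
import numpy as np

def newton_minimize(grad, hess, x0, iters=50, tol=1e-10):
    """Pure Newton (γ^k = 1): x^{k+1} = x^k - ∇²ℓ(x^k)^{-1} ∇ℓ(x^k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)
    return x

# Hypothetical cost ℓ(x) = e^{x1} + x1^2 + x2^2 (strictly convex, so ∇²ℓ > 0)
grad = lambda x: np.array([np.exp(x[0]) + 2 * x[0], 2 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]) + 2.0, 0.0], [0.0, 2.0]])

x_star = newton_minimize(grad, hess, x0=[1.0, 1.0])
print(np.linalg.norm(grad(x_star)) < 1e-8)   # True: stationary point found
```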



Newton’s Method via Quadratic Optimization

Observe that

∇ℓ(xk) + ∇2ℓ(xk)∆xk = 0

is the first-order necessary and sufficient condition of optimality for the quadratic program

∆xk = argmin_{∆x} ∇ℓ(xk)⊤∆x + 1/2 ∆x⊤∇2ℓ(xk)∆x. (1)

Thus, the k-th iteration of Newton’s method can be seen as

xk+1 = xk + ∆xk

with ∆xk the solution of the quadratic program (1).


Gradient methods via quadratic optimization
Similarly to Newton’s method, a descent direction ∆xk = −Dk∇ℓ(xk) can be seen as the direction that minimizes at each iteration a different quadratic approximation of ℓ about xk.

In fact, consider the quadratic approximation ℓk(x) of ℓ about xk given by

ℓk(x) = ℓ(xk) + ∇ℓ(xk)⊤(x − xk) + 1/2 (x − xk)⊤(Dk)−1(x − xk).

By setting the derivative to zero,

∇ℓ(xk) + (Dk)−1(x − xk) = 0,

we can compute the minimum of ℓk(x) and set it as the next iterate xk+1 :

xk+1 = xk − Dk∇ℓ(xk),

namely,

∆xk = −Dk∇ℓ(xk).

Remark The direction ∆xk can be computed by solving the quadratic approximation. This can be very helpful in optimization problems in which a gradient is not (easily) available, but it is easy to solve a quadratic optimization problem. It is also useful in case the domain is a compact convex set.


Gradient methods: step-size selection rules (I)

Armijo rule (backtracking line-search)
The step-size is selected following the procedure:
1. Set γ̄ 0 > 0, β ∈ (0, 1), c ∈ (0, 1)
2. While ℓ(xk + γ̄ i dk) ≥ ℓ(xk) + cγ̄ i ∇ℓ(xk)⊤dk :
   γ̄ i+1 = β γ̄ i
3. Set γ k = γ̄ i

[Figure: the cost ℓ(xk + γdk) as a function of γ, together with the lines ℓ(xk) + γ∇ℓ(xk)⊤dk and ℓ(xk) + cγ∇ℓ(xk)⊤dk ; the candidate step-sizes γ̄ 0 = 1, γ̄ 1, γ̄ 2, . . . are shrunk until the Armijo condition is satisfied.]

Idea: Select γ k such that the cost is reduced while the algorithm makes progress.

Remark Typical values are β = 0.7 and c = 0.5.
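The Armijo rule is short to implement (a sketch, with the hypothetical ill-conditioned quadratic ℓ(x) = x1² + 5x2² used for illustration):

```python
import numpy as np

def armijo_step(ell, xk, dk, grad_xk, gamma0=1.0, beta=0.7, c=0.5, max_iter=50):
    """Backtracking line-search: shrink γ̄ until
    ℓ(x^k + γ̄ d^k) < ℓ(x^k) + c γ̄ ∇ℓ(x^k)^T d^k."""
    gamma = gamma0
    slope = grad_xk @ dk            # negative for a descent direction
    for _ in range(max_iter):
        if ell(xk + gamma * dk) < ell(xk) + c * gamma * slope:
            break
        gamma *= beta
    return gamma

# Steepest descent with Armijo step-sizes on ℓ(x) = x1^2 + 5 x2^2
ell = lambda x: x[0]**2 + 5 * x[1]**2
grad = lambda x: np.array([2 * x[0], 10 * x[1]])

x = np.array([1.0, 1.0])
for _ in range(100):
    g = grad(x)
    x = x - armijo_step(ell, x, -g, g) * g
print(ell(x) < 1e-8)   # True: the cost has been driven to (numerical) zero
```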


Gradient methods: step-size selection rules (II)
Constant step-size: γ k = γ > 0 for all k = 0, 1, 2 . . .

Diminishing step-size: γ k → 0.
It must be avoided that the step-size becomes too small to guarantee substantial progress. For this reason we require

Σ_{k=0}^{∞} γ k = ∞.

Remark The associated convergence rate tends to be slow.

Typical choice: γ k chosen such that

Σ_{k=0}^{∞} γ k = ∞ and Σ_{k=0}^{∞} (γ k)2 < ∞,

e.g., γ k = 1/k^α with 1/2 < α ≤ 1.


Gradient methods: convergence result (Armijo)
Proposition (convergence with Armijo step-size)
Let {xk} be a sequence generated by a gradient method xk+1 = xk − γ k Dk ∇ℓ(xk) with d1 I ≤ Dk ≤ d2 I for some 0 < d1 ≤ d2. Assume that γ k is chosen by the Armijo rule and that ℓ is continuously differentiable.
Then, every limit point x̄ of the sequence {xk } is a stationary point, i.e., ∇ℓ(x̄) = 0.

Remark Recall that a vector x ∈ Rn is a limit point of a sequence {xk } in Rn if there exists a subsequence
of {xk } that converges to x.



Gradient methods: convergence result (constant/diminishing)
Proposition (convergence with constant or diminishing step-size)
Let {xk} be a sequence generated by a gradient method xk+1 = xk − γ k Dk ∇ℓ(xk) with d1 I ≤ Dk ≤ d2 I for some 0 < d1 ≤ d2. Assume that for some L > 0

∥∇ℓ(x) − ∇ℓ(y)∥ ≤ L∥x − y∥, ∀x, y ∈ Rn.

Assume either
1. γ k = γ > 0 sufficiently small, or
2. γ k → 0 and Σ_{k=0}^{∞} γ k = ∞.

Then, every limit point x̄ of the sequence {xk} is a stationary point, i.e., ∇ℓ(x̄) = 0.

Remark Check whether the problem min_{x∈Rn} c⊤x, with c ∈ Rn, satisfies the assumptions. □


Gradient methods: remarks
Remark The propositions guarantee neither that the sequence converges (to a point) nor that limit points exist. One can show, however, that either ℓ(xk) → −∞ or ℓ(xk) converges to a finite value and ∇ℓ(xk) → 0. In the second case, any convergent subsequence {xkp} converges to some stationary point x̄ satisfying ∇ℓ(x̄) = 0. □

Remark Existence of minima can be guaranteed by excluding ℓ(xk) → −∞ via suitable assumptions. Assume, e.g., ℓ coercive (or radially unbounded), i.e., lim∥x∥→∞ ℓ(x) = ∞. □

Remark For general (nonconvex) problems, assuming coercivity, only convergence (of subsequences) to stationary points (i.e., to x̄ with ∇ℓ(x̄) = 0) can be proven. □

[Figure: the sequence {(−1)n}n>0 does not converge, yet it has two limit points, +1 (the even subsequence) and −1 (the odd subsequence).]


Remark For convex programs, assuming coercivity, convergence (of subsequences) to global minima is guaranteed since necessary conditions of optimality are also sufficient. □


Example: Gradient Method applied to Quadratic Programs
Quadratic program

min xT Qx + q T x
x∈Rn

with Q = QT ∈ Rn×n and q ∈ Rn .

Steepest descent method

xk+1 = xk − γ k ∇ℓ(xk )

for a Quadratic Program is given by

xk+1 = xk − γ k (2Qxk + q).
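A sketch of this iteration with hypothetical data, checked against the closed-form minimizer x⋆ = −(1/2)Q⁻¹q (the constant step-size must satisfy γ < 2/λmax(2Q) for convergence):

```python
import numpy as np

# Hypothetical convex QP data (Q symmetric positive definite)
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
q = np.array([-4.0, 0.0])

x = np.array([5.0, 5.0])
gamma = 0.08                              # constant step-size, < 2 / λmax(2Q) = 0.1
for _ in range(500):
    x = x - gamma * (2 * Q @ x + q)       # steepest descent step for the QP

x_star = -0.5 * np.linalg.solve(Q, q)     # closed-form minimizer for comparison
print(np.allclose(x, x_star, atol=1e-6))  # True
```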



Optimization (constrained) over convex sets: optimality conditions
and algorithms
Optimization over convex sets
Consider the optimization problem

min ℓ(x),
x∈X

where X ⊂ Rn is nonempty, convex and closed, and ℓ is continuously differentiable on X.


Remark Recall that a set X is closed if it contains all of its limit points.

Important
The function ℓ can be nonconvex.

Optimality Conditions
If a point x∗ ∈ X is a local minimum of ℓ(x) over X, then

∇ℓ(x∗ )⊤ (x̄ − x∗ ) ≥ 0 ∀x̄ ∈ X



Optimization over convex sets
Projection over a convex set
Given a point x ∈ Rn and a closed convex set X, it can be shown that

PX(x) := argmin_{z∈X} ∥z − x∥2

exists and is unique.

The point PX(x) is called the projection of x on X.

[Figure: a point x outside X and its projection PX(x) on the boundary of X.]

Question: can you prove existence and uniqueness of projection point?



Projected gradient
Gradient methods can be generalized to optimization over convex sets.

Projected gradient
 
xk+1 = PX xk − γ k ∇ℓ(xk ) .

The algorithm is based on the idea of generating at each iteration k feasible points (i.e., belonging to X) that give a descent in the cost.

The analysis follows similar arguments to the one of unconstrained gradient methods.
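For box constraints the projection is simply coordinate-wise clipping, which makes the projected gradient method a one-line modification of steepest descent (a sketch with a hypothetical cost whose unconstrained minimizer lies outside X):

```python
import numpy as np

# Projection onto the box X = [lo, hi]^n (clipping is the exact projection)
def proj_box(x, lo=0.0, hi=1.0):
    return np.clip(x, lo, hi)

# Hypothetical cost ℓ(x) = ||x - c||^2 with unconstrained minimizer c outside X
c = np.array([2.0, -1.0])
grad = lambda x: 2 * (x - c)

x = np.array([0.5, 0.5])
gamma = 0.1
for _ in range(200):
    x = proj_box(x - gamma * grad(x))   # projected gradient step

print(x)   # ≈ [1, 0]: the constrained minimizer, on the boundary of X
```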



Feasible direction method
Find x̃ ∈ X such that

x̃ = argmin_{x∈X} ℓ(xk) + ∇ℓ(xk)⊤(x − xk) + 1/2 (x − xk)⊤(x − xk)

Update the solution along the feasible direction x̃ − xk :

xk+1 = xk + γ k (x̃ − xk) = (1 − γ k)xk + γ k x̃

Note: For γ k sufficiently small, xk+1 ∈ X.


Example: Projected Gradient Method for Quadratic Programs
Quadratic program

min xT Qx + q T x
x∈X

with Q = QT ≥ 0 ∈ Rn×n and q ∈ Rn .

Projected gradient method


 
xk+1 = PX xk − γ k ∇ℓ(xk )

is given by
 
xk+1 = PX xk − γ k (2Qxk + q) .



Examples of cost functions
ℓ(x) = x2
ℓ(x) = x4 + 2x3 − 12x2 − 2x + 6
ℓ(x) = e^{x1+3x2−0.1} + e^{x1−3x2−0.1} + e^{−x1−0.1}
ℓ(x) = 1/2 x⊤Qx + r⊤x


Barrier function strategy for inequality constraints
Consider the inequality constrained optimization problem

min_{x∈Rd} ℓ(x)
subj.to gj(x) ≤ 0, j ∈ {1, . . . , r}

Inequality constraints can be relaxed and embedded in the cost function by means of a barrier function −ε log(·).

The resulting unconstrained problem reads as

min_{x∈Rd} ℓ(x) − ε Σ_{j=1}^{r} log(−gj(x))

[Figure: the barrier −ε log(z), which grows unbounded as z → 0+ and vanishes pointwise as ε → 0.]

Implementation: every few iterations shrink the barrier parameter ε.
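A minimal sketch of the strategy on a hypothetical 1-D problem (min x subject to 1 − x ≤ 0, whose constrained minimum is x⋆ = 1): run gradient steps on the barrier cost, then shrink ε and repeat.

```python
# Hypothetical 1-D problem: min x  subj.to  g(x) = 1 - x <= 0  (minimum at x* = 1)
# Barrier cost: B(x) = x - eps * log(-g(x)) = x - eps * log(x - 1),
# whose unconstrained minimizer is x = 1 + eps.
def barrier_grad(x, eps):
    return 1.0 - eps / (x - 1.0)

x, eps = 2.0, 1.0
for _ in range(20):                     # outer loop: shrink the barrier parameter
    for _ in range(100):                # inner loop: gradient steps on B
        x = x - 0.5 * eps * barrier_grad(x, eps)
    eps *= 0.5

print(x)   # ≈ 1 + eps: approaches the constrained minimum as eps shrinks
```

The inner step-size is scaled by ε because the barrier curvature near its minimizer grows like 1/ε; this keeps the inner loop stable as ε shrinks.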



Constrained optimization (equality and inequality constraints):
optimality conditions
Constrained optimization problem set-up
min ℓ(x)
x∈X

subj.to gj (x) ≤ 0 j ∈ {1, . . . , r}


hi (x) = 0 i ∈ {1, . . . , m}



Active constraints and regular points

Set of active inequality constraints


For a point x, the set of active inequality constraints at x is A(x) = {j ∈ {1, . . . , r} | gj (x) = 0}.

Regular point
A point x is regular if the vectors ∇hi(x), i ∈ {1, . . . , m}, and ∇gj(x), j ∈ A(x), are linearly independent.


Lagrangian function

In order to state the first-order necessary conditions of optimality for (equality and inequality) constrained
problems it is useful to introduce the Lagrangian function
L(x, µ, λ) = ℓ(x) + Σ_{j=1}^{r} µj gj(x) + Σ_{i=1}^{m} λi hi(x)

Remark µj and λi can be seen as prices for violating the associated constraint. □

Remark We are relaxing the constrained problem, but we cannot minimize the Lagrangian. □



Karush-Kuhn-Tucker (KKT) necessary conditions
Let x⋆ be a regular local minimum of

min_{x∈Rd} ℓ(x)
subj.to gj(x) ≤ 0, j ∈ {1, . . . , r}
        hi(x) = 0, i ∈ {1, . . . , m}

where ℓ, gj and hi are C 1.
Then there exist unique µ⋆j and λ⋆i, called Lagrange multipliers, such that

∇1 L(x⋆, µ⋆, λ⋆) = 0
µ⋆j ≥ 0
µ⋆j gj(x⋆) = 0, j ∈ {1, . . . , r}

Remark If ℓ, hi and gj are twice differentiable, second order conditions also hold.

Notation Points satisfying the KKT necessary conditions of optimality are referred to as KKT points. They are the counterpart of stationary points in constrained optimization.


Remarks on Karush-Kuhn-Tucker (KKT) necessary conditions
Remark The condition

∇1 L(x⋆, µ⋆, λ⋆) = 0

can be explicitly written as

∇ℓ(x⋆) + Σ_{j=1}^{r} µ⋆j ∇gj(x⋆) + Σ_{i=1}^{m} λ⋆i ∇hi(x⋆) = 0

Remark The condition µ⋆j gj(x⋆) = 0, j ∈ {1, . . . , r}, is called complementary slackness.
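The conditions are straightforward to verify at a candidate point. A sketch on a hypothetical problem with a known solution (min ∥x∥² subject to 1 − x1 − x2 ≤ 0, solved at x⋆ = (1/2, 1/2) with the constraint active):

```python
import numpy as np

# Hypothetical problem: min ||x||^2  subj.to  g(x) = 1 - x1 - x2 <= 0
x_star = np.array([0.5, 0.5])
grad_ell = 2 * x_star                 # ∇ℓ(x*)
grad_g = np.array([-1.0, -1.0])       # ∇g(x*)
g_val = 1.0 - x_star.sum()            # g(x*) = 0: the constraint is active

mu = 1.0                              # candidate Lagrange multiplier

stationarity = grad_ell + mu * grad_g           # ∇ℓ(x*) + µ ∇g(x*)
print(np.allclose(stationarity, 0.0))           # True: stationarity of L
print(mu >= 0.0 and abs(mu * g_val) < 1e-12)    # True: sign + complementary slackness
```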



Karush-Kuhn-Tucker (KKT) necessary conditions (second order)
Let x⋆ be a regular local minimum of

min ℓ(x)
x∈Rd

subj.to gj (x) ≤ 0 j ∈ {1, . . . , r}


hi (x) = 0 i ∈ {1, . . . , m}
1
where ℓ, gj and hi are C .
Then ∃ unique µ⋆j and λ⋆i , called Lagrange multipliers, s.t.

∇1 L(x⋆ , µ⋆ , λ⋆ ) = 0, µ⋆j ≥ 0, µ⋆j gj (x⋆ ) = 0 j ∈ {1, . . . , r}

Moreover, if ℓ, gj and hi are C 2 it holds

y ⊤ ∇211 L(x⋆ , µ⋆ , λ⋆ )y ≥ 0

for all y ∈ Rn such that

∇hi (x)⊤ y = 0, i ∈ {1, . . . , m}, ∇gj (x)⊤ y = 0, j ∈ A(x) (i.e. j ∈ {1, . . . , r} s.t. gj (x) = 0)
Prof. Giuseppe Notarstefano – Optimal Control – Nonlinear Optimization 50 | 63
Quadratic Programming (constrained)
Let us consider quadratic optimization problems with linear equality constraints.

Quadratic program

min xT Qx + q T x
x∈Rn

subj.to Ax = b

with Q = QT ∈ Rn×n , q ∈ Rn , A ∈ Rp×n and b ∈ Rp .



Quadratic Programming (constrained): optimality conditions
Let us introduce the Lagrangian of the constrained optimization problem, i.e.,

L(x, λ) = xT Qx + q T x + λ⊤(Ax − b), λ ∈ Rp

First-order necessary condition for optimality: if x⋆ is a minimum then there exists λ⋆ such that

[ ∇1 L(x⋆, λ⋆) ; ∇2 L(x⋆, λ⋆) ] = 0 =⇒ [ 2Qx⋆ + q + A⊤λ⋆ ; Ax⋆ − b ] = 0
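These stacked conditions form a linear system in (x⋆, λ⋆), so an equality-constrained QP can be solved with one linear solve (a sketch with hypothetical data):

```python
import numpy as np

# Hypothetical equality-constrained QP: min x^T Q x + q^T x  subj.to  A x = b
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])    # single constraint: x1 + x2 = 1
b = np.array([1.0])

# KKT system: [2Q  A^T; A  0] [x; λ] = [-q; b]
n, p = Q.shape[0], A.shape[0]
K = np.block([[2 * Q, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, lam_star = sol[:n], sol[n:]

print(np.allclose(A @ x_star, b))                           # True: feasibility
print(np.allclose(2 * Q @ x_star + q + A.T @ lam_star, 0))  # True: stationarity
```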


Quadratic Programming (constrained): optimality conditions
Second-order necessary condition for optimality: if x⋆ is a minimum then

y⊤ ∇2₁₁L(x⋆, λ⋆) y = 2y⊤Qy ≥ 0

for all y ∈ Rn such that

∇hi(x⋆)⊤y = 0, i ∈ {1, . . . , p} =⇒ Ay = 0,

namely, for all y ∈ Rn in the null-space of A.


Constrained optimization (equality and inequality constraints):
optimization algorithms
Optimality conditions for equality constrained problems (refresh)
Consider the equality-constrained minimization problem:

min ℓ(x)
x

subj.to h(x) = 0

where ℓ : Rn → R and h : Rn → Rm .

The first-order necessary conditions of optimality require the solution of

∇ℓ(x) + ∇h(x)λ = 0
h(x) = 0

which can be written as

∇L(x, λ) = [ ∇ℓ(x) + ∇h(x)λ ; h(x) ] = 0.


Newton’s method for equality constrained problems
KKT points can be found by solving a root finding problem in variables x, λ with r(x, λ) = ∇L(x, λ).

Newton’s method for this root finding problem reads as

[ xk+1 ; λk+1 ] = [ xk ; λk ] + [ ∆xk ; ∆λk ]

with

[ ∆xk ; ∆λk ] = −(∇2L(xk, λk))−1 ∇L(xk, λk)

where

∇2L(xk, λk) := [ H k , ∇h(xk) ; ∇h(xk)⊤ , 0 ],   ∇L(xk, λk) := [ ∇ℓ(xk) + ∇h(xk)λk ; h(xk) ]

and H k := ∇2₁₁L(xk, λk).


Newton’s method for equality constrained problems

We can write

∇2L(xk, λk) [ ∆xk ; ∆λk ] = −∇L(xk, λk),

namely

[ H k , ∇h(xk) ; ∇h(xk)⊤ , 0 ] [ ∆xk ; ∆λk ] = −[ ∇ℓ(xk) + ∇h(xk)λk ; h(xk) ].

Thus, (∆xk, ∆λk) can be obtained as the solution of a linear system of equations in the variables (∆x, ∆λ).


Newton’s method via Quadratic Programming
The linear system of equations can be rewritten as

H k ∆xk + ∇h(xk)∆λk = −∇ℓ(xk) − ∇h(xk)λk
∇h(xk)⊤∆xk = −h(xk),

and equivalently as

∇ℓ(xk) + H k ∆xk + ∇h(xk)λk+1 = 0
h(xk) + ∇h(xk)⊤∆xk = 0.

We can observe that

∇ℓ(xk) + H k ∆x + ∇h(xk)λ = 0
h(xk) + ∇h(xk)⊤∆x = 0

are the necessary and sufficient optimality conditions for the Quadratic Program (QP)

min_{∆x} ∇ℓ(xk)⊤∆x + 1/2 ∆x⊤H k ∆x
subj.to h(xk) + ∇h(xk)⊤∆x = 0.


Therefore, in the Newton’s update, we can obtain (∆xk , λk+1 ) by solving this QP.



Sequential Quadratic Programming (SQP)

Start from a tentative solution x0


For k = 0, 1, . . . (up to convergence)
1. Compute ∇ℓ(xk ), H k , ∇h(xk )
2. Obtain (∆xk, λ∗QP) from

∆xk = argmin_{∆x} ∇ℓ(xk)⊤∆x + 1/2 ∆x⊤H k ∆x
      subj.to h(xk) + ∇h(xk)⊤∆x = 0, (2)

with λ∗QP the Lagrange multiplier associated to the optimal solution of (2).
3. Update

xk+1 = xk + ∆xk
λk+1 = λ∗QP

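The SQP loop can be sketched in a few lines by solving each QP subproblem through its KKT linear system. The problem below is hypothetical (min x1 + x2 subject to x1² + x2² = 2, whose constrained minimum is (−1, −1) with multiplier 1/2), and the exact Lagrangian Hessian is used as H k:

```python
import numpy as np

# Hypothetical problem: min x1 + x2  subj.to  h(x) = x1^2 + x2^2 - 2 = 0
grad_ell = lambda x: np.array([1.0, 1.0])
h = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
grad_h = lambda x: np.array([[2 * x[0]], [2 * x[1]]])      # n x m convention
hess_L = lambda x, lam: lam[0] * 2.0 * np.eye(2)           # H^k = ∇²_11 L(x^k, λ^k)

x, lam = np.array([-2.0, 0.5]), np.array([1.0])
for _ in range(30):
    Hk, Ak = hess_L(x, lam), grad_h(x)
    # KKT system of the QP subproblem: [H^k ∇h; ∇h^T 0][Δx; λ_QP] = [-∇ℓ; -h]
    K = np.block([[Hk, Ak], [Ak.T, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, -np.concatenate([grad_ell(x), h(x)]))
    x, lam = x + sol[:2], sol[2:]       # x^{k+1} = x^k + Δx^k, λ^{k+1} = λ_QP

print(np.round(x, 6))   # ≈ [-1, -1], the constrained minimizer
```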


Barrier function strategy for inequality constraints
Consider the inequality and equality constrained optimization problem

min_{x∈Rd} ℓ(x)
subj.to gj(x) ≤ 0, j ∈ {1, . . . , r}
        h(x) = 0

Inequality constraints can be embedded in the cost function by means of a barrier function −ε log(·). The resulting equality-constrained problem reads as

min_{x∈Rd} ℓ(x) − ε Σ_{j=1}^{r} log(−gj(x))
subj.to h(x) = 0

[Figure: the barrier −ε log(z) as ε → 0.]

Implementation: every few iterations shrink the barrier parameter ε.



Concluding Remarks on Notation
Warning on notation in Optimization and Optimal Control
Notation x
We denote x ∈ Rn the generic optimization variable of a generic optimization problem, e.g.

min ℓ(x),
x∈Rn

where ℓ : Rn → R is the cost function.

Notation xk
We denote xk ∈ Rn the solution estimate at iteration k ∈ N of an optimization algorithm for the
optimization problem.



Warning on notation in Optimization and Optimal Control
Notation xt
Consider a discrete-time system in standard control notation x(t + 1) = ft (x(t), u(t)) with t ∈ N.
We denote xt := x(t) and ut := u(t) the state and the input respectively, so that the dynamics is
written as

xt+1 = ft (xt , ut )

Notation (x, u), (xk , uk ), (xkt , ukt )


The decision vector in optimal control problems is (x, u) and we will denote (xk , uk ) the solution
estimate at iteration k of an optimal control algorithm.
We will also use (xkt , ukt ) to denote the vector component of (xk , uk ) associated to the state and input
at time t.

