OPTCON Optimization – 2023-10-11
Nonlinear Optimization
A special thanks to L. Sforni and I. Notarnicola for their support in the slide preparation.
The present slides are for internal use of the course
Optimal Control @ University of Bologna.
Unconstrained optimization: definitions, optimality conditions and
special problem classes
Unconstrained optimization: introduction
Consider the unconstrained optimization problem
min_{x ∈ R^n} ℓ(x).
We say that x⋆ is a
• global minimum if ℓ(x⋆ ) ≤ ℓ(x) for all x ∈ Rn
• strict global minimum if ℓ(x⋆ ) < ℓ(x) for all x ̸= x⋆
• local minimum if there exists ϵ > 0 such that ℓ(x⋆ ) ≤ ℓ(x) for all
x ∈ B(x⋆ , ϵ) = {x ∈ Rn | ∥x − x⋆ ∥ < ϵ}
• strict local minimum if there exists ϵ > 0 such that ℓ(x⋆ ) < ℓ(x) for all x ∈ B(x⋆ , ϵ) and x ̸= x⋆ .
[Figure: the problem min_{x ∈ R^n} ℓ(x) with a nonconvex cost ℓ having multiple local minima]
Remark Maxima can be defined analogously. Moreover, maxima of ℓ are minima of −ℓ.
where x∗ is the minimum point (the optimal value of the optimization variable), i.e.,
x∗ = argmin_{x ∈ R^n} ℓ(x).
The Hessian matrix is a symmetric matrix, since the assumption of continuity of the second derivatives
implies that the order of differentiation does not matter.
Remark Points x̄ satisfying ∇ℓ(x̄) = 0 are called stationary points. They include minima, maxima and
saddle points.
[Figure: stationary points of a nonconvex cost: local minima and a saddle point]
A set X ⊆ R^n is convex if, for all xA, xB ∈ X and all λ ∈ [0, 1],
λxA + (1 − λ)xB ∈ X.
[Figure: a convex set X containing the segment between two of its points xA and xB]
Remark A function ℓ is concave if −ℓ is convex. A function ℓ is strictly convex if the inequality holds
strictly for xA ̸= xB and λ ∈ (0, 1).
Xineq = {x ∈ Rn | g(x) ≤ 0}
The set Xineq is convex if g is a quasi-convex function (e.g., a monotone function on the real line).
[Figure: two convex sets in R² described by inequality constraints: the ellipse a x1² + b x2² − c ≤ 0 and the half-plane a⊤x − c ≤ 0]
Xeq = {x ∈ Rn | h(x) = 0}
The set Xeq is convex if h is an affine function. Convex sets identified through affine equality constraints are affine subspaces (e.g., hyperplanes).
This contradicts the assumption that x⋆ is a local minimum, since we have found points x̃ = λx⋆ + (1 − λ)x̄, with λ ∈ [0, 1) arbitrarily close to 1, that are arbitrarily close to x⋆ and satisfy ℓ(x̃) < ℓ(x⋆). This concludes the proof.
By convexity of ℓ, ℓ(x) ≥ ℓ(x⋆) + ∇ℓ(x⋆)⊤(x − x⋆) for all x ∈ R^n. Thus, if ∇ℓ(x⋆) = 0 it follows immediately that ℓ(x) ≥ ℓ(x⋆) for all x ∈ R^n. The converse is true by the first-order necessary condition and the fact that, for convex ℓ, a local minimum is a global minimum.
Quadratic program
min_{x ∈ R^n} x⊤Qx + q⊤x
First-order necessary condition: ∇ℓ(x⋆) = 0 ⟹ 2Qx⋆ + q = 0
Second-order necessary condition: ∇²ℓ(x⋆) ≥ 0 ⟹ 2Q ≥ 0
Important
A necessary condition for the existence of minima for a quadratic program is that Q ≥ 0.
Thus, quadratic programs admitting at least one minimum are convex optimization problems.
Important
For a quadratic program, the necessary conditions of optimality are also sufficient, and minima are global.
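As a quick numerical check of these conditions, the following minimal sketch (assuming numpy and hypothetical data Q, q) computes the stationary point of a convex quadratic program by solving 2Qx + q = 0:

```python
import numpy as np

# Hypothetical problem data: Q must be positive semidefinite for a minimum to exist.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([-1.0, 2.0])

# First-order condition: 2 Q x + q = 0  =>  x* solves the linear system 2Q x = -q.
x_star = np.linalg.solve(2 * Q, -q)

# Second-order condition: 2Q >= 0, i.e., all eigenvalues of Q are nonnegative.
assert np.all(np.linalg.eigvalsh(Q) >= 0)

print("stationary point:", x_star)
print("gradient at x*:", 2 * Q @ x_star + q)  # should be ~0
```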
The algorithm starts at a given initial guess x0 and iteratively generates vectors x1 , x2 , . . . such that ℓ
is decreased at each iteration, i.e.,
ℓ(x^{k+1}) < ℓ(x^k),   k = 0, 1, 2, . . .
[Figure: iterates x^k, x^{k+1} descending along the graph of ℓ toward a minimum x⋆]
x^{k+1} = x^k + γ^k d^k,   k = 0, 1, 2, . . .
in which
1. each γ^k > 0 is a “step-size”,
2. d^k ∈ R^n is a “direction”.
The goal is to
1. choose a direction d^k along which the cost decreases for γ^k sufficiently small;
2. select a step-size γ^k guaranteeing a sufficient decrease.
Gradient method:
x^{k+1} = x^k − γ^k ∇ℓ(x^k).
Thus, for γ^k > 0 sufficiently small it can be shown that ℓ(x^{k+1}) < ℓ(x^k).
More generally, one can consider updates
x^{k+1} = x^k + γ^k d^k,
with d^k a descent direction, i.e., ∇ℓ(x^k)⊤ d^k < 0.
Scaled gradient method (with D^k a positive definite matrix):
x^{k+1} = x^k − γ^k D^k ∇ℓ(x^k),   k = 0, 1, 2, . . .
Steepest descent:
x^{k+1} = x^k − γ^k ∇ℓ(x^k).
The name steepest descent is due to the following property: the normalized negative gradient direction
d^k = −∇ℓ(x^k) / ∥∇ℓ(x^k)∥
minimizes the slope ∇ℓ(x^k)⊤ d^k among all directions of unit norm, i.e., it gives the steepest descent.
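A minimal sketch of the gradient method with a constant step-size, on an assumed illustrative smooth cost, could look as follows (the function names and data are hypothetical):

```python
import numpy as np

def gradient_method(grad, x0, gamma=0.1, max_iters=1000, tol=1e-8):
    """Gradient method with constant step-size: x^{k+1} = x^k - gamma * grad(x^k)."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop near a stationary point
            break
        x = x - gamma * g             # move along the steepest-descent direction
    return x

# Illustrative smooth cost l(x) = exp(x1) + exp(-x1) + x2^2 (minimum at the origin).
grad = lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]])
print(gradient_method(grad, x0=[1.0, 2.0]))
```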
Root-finding problem: r(x) = 0, with r : R^n → R^n.
Idea: Iteratively refine the solution such that the improved guess x^{k+1} represents a root (zero) of the linear approximation of r about the current tentative solution x^k.
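A minimal sketch of this idea for a scalar root-finding problem (the example r(x) = x² − 2 and the tolerances are illustrative assumptions):

```python
def newton_root(r, jac, x0, max_iters=50, tol=1e-10):
    """Newton's method: at each step, take the root of the linearization
    r(x^k) + jac(x^k) (x - x^k) = 0 as the next guess."""
    x = x0
    for _ in range(max_iters):
        if abs(r(x)) < tol:
            break
        x = x - r(x) / jac(x)   # root of the linear approximation about x
    return x

# Hypothetical scalar example: r(x) = x^2 - 2 has root sqrt(2).
print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))
```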
For the unconstrained optimization problem min_{x ∈ R^n} ℓ(x), the first-order necessary condition of optimality reads
∇ℓ(x̄) = 0.
We can look at it as a root finding problem, with r(x) = ∇ℓ(x), and solve it via Newton’s method.
Therefore, we can compute Δx^k as the solution of the linearization of r(x) = ∇ℓ(x) about x^k, i.e., of the linear system ∇²ℓ(x^k) Δx^k = ∇ℓ(x^k). Observe that this is the first-order necessary and sufficient condition of optimality for the quadratic program obtained as the quadratic approximation of ℓ about x^k.
x^{k+1} = x^k − Δx^k
Remark The direction Δx^k can be computed by solving the quadratic approximation directly. This can be very helpful in optimization problems in which a gradient is not (easily) available, but it is easy to solve a quadratic optimization problem. It is also useful when the domain is a compact convex set.
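Following the convention above (Δx^k obtained from the linearization of ∇ℓ about x^k, with update x^{k+1} = x^k − Δx^k), a minimal sketch assuming the gradient and Hessian are available as functions (all names and the test cost are hypothetical):

```python
import numpy as np

def newton_minimize(grad, hess, x0, max_iters=50, tol=1e-10):
    """Newton's method for min l(x): solve hess(x^k) dx = grad(x^k), set x^{k+1} = x^k - dx."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), g)   # Newton direction from the linearized condition
        x = x - dx
    return x

# Illustrative smooth cost l(x) = x1^4 + x2^2: gradient and Hessian in closed form.
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton_minimize(grad, hess, x0=[1.0, 1.0]))
```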
Idea: Select γ^k such that the cost is reduced while the algorithm makes progress.
Backtracking (Armijo) line search: given an initial guess γ̄^0 > 0 and parameters β ∈ (0, 1), c ∈ (0, 1):
1. check the sufficient-decrease condition ℓ(x^k + γ̄^i d^k) ≤ ℓ(x^k) + c γ̄^i ∇ℓ(x^k)⊤ d^k;
2. if it does not hold, shrink the tentative step-size, γ̄^{i+1} = β γ̄^i, and go back to 1;
3. set γ^k = γ̄^i.
[Figure: ℓ(x^k + γ d^k) as a function of γ, compared with the line ℓ(x^k) + γ ∇ℓ(x^k)⊤ d^k]
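A minimal backtracking sketch along these lines; the parameter values (c, beta) and the test problem are illustrative assumptions:

```python
import numpy as np

def backtracking_step(l, x, d, grad_x, gamma0=1.0, beta=0.5, c=1e-4, max_tries=50):
    """Shrink gamma until the sufficient-decrease (Armijo) condition
    l(x + gamma d) <= l(x) + c * gamma * grad_x^T d holds, then return gamma."""
    gamma = gamma0
    slope = grad_x @ d            # must be negative for a descent direction
    for _ in range(max_tries):
        if l(x + gamma * d) <= l(x) + c * gamma * slope:
            break
        gamma = beta * gamma      # gamma_{i+1} = beta * gamma_i
    return gamma

# Illustrative use on l(x) = ||x||^2 with the negative gradient as direction.
l = lambda x: x @ x
x = np.array([2.0, -1.0])
g = 2 * x
gamma_k = backtracking_step(l, x, d=-g, grad_x=g)
print(gamma_k)
```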
Diminishing step-size: γ^k → 0.
It must be avoided that γ^k becomes too small to guarantee substantial progress. For this reason we require
Σ_{k=0}^{∞} γ^k = ∞,
e.g., γ^k = 1/k^α with 1/2 < α ≤ 1.
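For instance, a diminishing schedule of this kind (with the assumed choice α = 0.75 and an illustrative quadratic cost) can be plugged directly into the gradient method:

```python
import numpy as np

# Gradient method with diminishing step-size gamma^k = 1/(k+1)^alpha, 1/2 < alpha <= 1.
alpha = 0.75
grad = lambda x: 2 * x          # illustrative cost l(x) = ||x||^2
x = np.array([5.0, -3.0])
for k in range(2000):
    x = x - (1.0 / (k + 1) ** alpha) * grad(x)
print(x)   # approaches the minimizer 0
```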
Remark Recall that a vector x ∈ Rn is a limit point of a sequence {xk } in Rn if there exists a subsequence
of {xk } that converges to x.
Assume either
1. γ^k = γ > 0 sufficiently small, or
2. γ^k → 0 and Σ_{k=0}^{∞} γ^k = ∞.
Then, every limit point x̄ of the sequence {x^k} is a stationary point, i.e., ∇ℓ(x̄) = 0.
Remark Check whether the problem min_{x ∈ R^n} c⊤x, with c ∈ R^n, satisfies the assumptions. □
Remark Existence of minima can be guaranteed by excluding ℓ(xk ) → −∞ via suitable assumptions.
Assume, e.g., ℓ coercive (or radially unbounded), i.e., lim∥x∥→∞ ℓ(x) = ∞. □
Remark For general (nonconvex) problems, assuming coercivity, only convergence (of subsequences) to
stationary points (i.e., to x̄ with ∇ℓ(x̄) = 0) can be proven. □
[Figure: the sequence {(−1)^k}_{k>0} has limit points −1 and 1; e.g., the subsequence with even indices k = 2n converges to 1]
Remark For convex programs, assuming coercivity, convergence (of subsequences) to global minima is
guaranteed since necessary conditions of optimality are also sufficient. □
For the quadratic program min_{x ∈ R^n} x⊤Qx + q⊤x, the gradient method reads
x^{k+1} = x^k − γ^k ∇ℓ(x^k) = x^k − γ^k (2Qx^k + q).
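A minimal sketch of this iteration, assuming a positive definite Q (hypothetical data) so that a small constant step-size converges, and comparing the result with the closed-form stationary point:

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # hypothetical Q > 0
q = np.array([1.0, -1.0])

gamma = 0.1                         # must satisfy gamma < 1 / lambda_max(Q) for convergence
x = np.zeros(2)
for k in range(500):
    x = x - gamma * (2 * Q @ x + q) # gradient of x^T Q x + q^T x is 2 Q x + q

x_star = np.linalg.solve(2 * Q, -q) # stationary point from the optimality condition
print(np.linalg.norm(x - x_star))   # should be close to zero
```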
Consider the constrained optimization problem
min_{x ∈ X} ℓ(x).
Important
The function ℓ can be nonconvex.
Optimality Conditions
If a point x∗ ∈ X is a local minimum of ℓ(x) over a convex set X, then
∇ℓ(x∗)⊤(x − x∗) ≥ 0 for all x ∈ X.
[Figure: projection of a point x onto X, P_X(x) = argmin_{y∈X} ∥y − x∥]
Projected gradient
x^{k+1} = P_X(x^k − γ^k ∇ℓ(x^k)).
The algorithm is based on the idea of generating, at each iteration k, feasible points (i.e., belonging to X) that give a descent in the cost. The analysis follows arguments similar to those used for unconstrained gradient methods.
x̃ = argmin_{x ∈ X} ℓ(x^k) + ∇ℓ(x^k)⊤(x − x^k) + ½ (x − x^k)⊤(x − x^k)
x^{k+1} = x^k + γ^k (x̃ − x^k)   (x̃ − x^k is a feasible direction)
        = (1 − γ^k) x^k + γ^k x̃
The projected gradient method for the constrained quadratic program min_{x ∈ X} x⊤Qx + q⊤x
is given by
x^{k+1} = P_X(x^k − γ^k (2Qx^k + q)).
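A minimal sketch for the special case in which X is a box (an assumed constraint set, chosen because its Euclidean projection is a componentwise clipping); all data are hypothetical:

```python
import numpy as np

# Projected gradient for min_{x in X} x^T Q x + q^T x with X the box [lo, hi]^n.
Q = np.array([[2.0, 0.0],
              [0.0, 1.0]])
q = np.array([-6.0, 1.0])
lo, hi = -1.0, 1.0

def proj_box(x):
    """Euclidean projection onto the box: clip each component."""
    return np.clip(x, lo, hi)

gamma = 0.1
x = np.zeros(2)
for k in range(500):
    x = proj_box(x - gamma * (2 * Q @ x + q))   # x^{k+1} = P_X(x^k - gamma * grad)

print(x)   # unconstrained minimizer (1.5, -0.5) is outside the box; iterates go to (1, -0.5)
```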
min_{x ∈ R^d} ℓ(x)
subj. to h_i(x) = 0, i ∈ {1, . . . , m},  g_j(x) ≤ 0, j ∈ {1, . . . , r}
Regular point
A point x is regular if the vectors ∇h_i(x), i ∈ {1, . . . , m}, and ∇g_j(x), j ∈ A(x) (the set of inequality constraints active at x, i.e., those with g_j(x) = 0), are linearly independent.
In order to state the first-order necessary conditions of optimality for (equality and inequality) constrained
problems it is useful to introduce the Lagrangian function
L(x, µ, λ) = ℓ(x) + Σ_{j=1}^{r} µ_j g_j(x) + Σ_{i=1}^{m} λ_i h_i(x)
Remark µj and λi can be seen as prices for violating the associated constraint. □
Remark We are relaxing the constrained problem, but we cannot minimize the Lagrangian. □
min_{x ∈ R^d} ℓ(x)
subj. to h_i(x) = 0, i ∈ {1, . . . , m},  g_j(x) ≤ 0, j ∈ {1, . . . , r}
First-order necessary conditions of optimality (KKT conditions): if x⋆ is a local minimum and a regular point, then there exist multipliers µ⋆ ∈ R^r and λ⋆ ∈ R^m such that
∇1L(x⋆, µ⋆, λ⋆) = 0
µ⋆_j ≥ 0,  µ⋆_j g_j(x⋆) = 0,   j ∈ {1, . . . , r}
Notation Points satisfying the KKT necessary conditions of optimality are referred to as KKT points.
They are the counterpart of stationary points in constrained optimization.
Remark The condition µ⋆_j g_j(x⋆) = 0, j ∈ {1, . . . , r}, is called complementary slackness.
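As an illustration, the following sketch checks the KKT conditions at a candidate point of a small, hypothetical inequality-constrained problem (all data are assumptions made for the example):

```python
import numpy as np

# Hypothetical problem: min x1^2 + x2^2  subj. to  g(x) = 1 - x1 <= 0.
grad_l = lambda x: 2 * x
grad_g = lambda x: np.array([-1.0, 0.0])
g      = lambda x: 1.0 - x[0]

x_cand, mu_cand = np.array([1.0, 0.0]), 2.0

stationarity = grad_l(x_cand) + mu_cand * grad_g(x_cand)        # should be ~0
print("grad_x L =", stationarity)
print("mu >= 0:", mu_cand >= 0)
print("complementary slackness mu * g(x):", mu_cand * g(x_cand))  # should be 0
```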
Second-order necessary condition of optimality: if x⋆ is a local minimum and a regular point, with µ⋆, λ⋆ satisfying the KKT conditions, then
y⊤ ∇²_{11}L(x⋆, µ⋆, λ⋆) y ≥ 0
for all y such that ∇h_i(x⋆)⊤ y = 0, i ∈ {1, . . . , m}, and ∇g_j(x⋆)⊤ y = 0, j ∈ A(x⋆) (i.e., j ∈ {1, . . . , r} s.t. g_j(x⋆) = 0).
Quadratic Programming (constrained)
Let us consider quadratic optimization problems with linear equality constraints.
Quadratic program
min_{x ∈ R^n} x⊤Qx + q⊤x
subj. to Ax = b
L(x, λ) = x⊤Qx + q⊤x + λ⊤(Ax − b),   λ ∈ R^p
First-order necessary condition for optimality: if x⋆ is a minimum then there exists λ⋆ such that
∇1L(x⋆, λ⋆) = 0,  ∇2L(x⋆, λ⋆) = 0   ⟹   2Qx⋆ + q + A⊤λ⋆ = 0,  Ax⋆ − b = 0
Second-order necessary condition: y⊤ ∇²_{11}L(x⋆, λ⋆) y = 2 y⊤Qy ≥ 0 for all y such that Ay = 0.
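These conditions form a linear system in (x⋆, λ⋆); a minimal sketch, with hypothetical data, that solves the KKT system directly:

```python
import numpy as np

# Equality-constrained QP: min x^T Q x + q^T x  subj. to  A x = b (hypothetical data).
Q = np.array([[2.0, 0.0],
              [0.0, 1.0]])
q = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])    # p x n, here p = 1
b = np.array([1.0])

n, p = Q.shape[0], A.shape[0]

# KKT system:  [2Q  A^T] [x  ]   [-q]
#              [A    0 ] [lam] = [ b]
KKT = np.block([[2 * Q, A.T],
                [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x_star, lam_star = sol[:n], sol[n:]
print("x* =", x_star, "lambda* =", lam_star)
print("Ax* - b =", A @ x_star - b)   # feasibility check, should be ~0
```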
min_x ℓ(x)
subj. to h(x) = 0
where ℓ : R^n → R and h : R^n → R^m.
First-order necessary conditions of optimality:
∇ℓ(x) + ∇h(x)λ = 0
h(x) = 0
These can be seen as a root-finding problem r(x, λ) = 0 in the unknowns (x, λ).
with
[Δx^k ; Δλ^k] = −(∇²L(x^k, λ^k))^{-1} ∇L(x^k, λ^k)   (i.e., of the form −∇r(·)^{-1} r(·))
where
∇²L(x^k, λ^k) := [ H^k, ∇h(x^k) ; ∇h(x^k)⊤, 0 ],   ∇L(x^k, λ^k) := [ ∇ℓ(x^k) + ∇h(x^k)λ^k ; h(x^k) ]
We can write
∇²L(x^k, λ^k) [Δx^k ; Δλ^k] = −∇L(x^k, λ^k),
namely
[ H^k, ∇h(x^k) ; ∇h(x^k)⊤, 0 ] [Δx^k ; Δλ^k] = −[ ∇ℓ(x^k) + ∇h(x^k)λ^k ; h(x^k) ].
Thus, Δx^k, Δλ^k can be obtained as the solution of a linear system of equations in the variables Δx, Δλ.
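One iteration of this scheme can be sketched as follows; the callables and the small test problem are assumptions, and H^k is taken here to be the Hessian of ℓ (exact in this example since h is affine):

```python
import numpy as np

def lagrangian_newton_step(grad_l, hess_l, h, jac_h, x, lam):
    """One Newton step on the optimality conditions of min l(x) s.t. h(x) = 0,
    obtained by solving the linear system above for (dx, dlam)."""
    n, m = x.size, lam.size
    Jh = jac_h(x)                             # m x n Jacobian of h, so grad h(x) = Jh.T
    KKT = np.block([[hess_l(x), Jh.T],
                    [Jh, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_l(x) + Jh.T @ lam, h(x)])
    step = np.linalg.solve(KKT, rhs)          # [dx; dlam] = -(grad^2 L)^{-1} grad L
    dx, dlam = step[:n], step[n:]
    return x + dx, lam + dlam                 # full Newton update of (x, lambda)

# Hypothetical example: min x1^2 + x2^2  subj. to  x1 + x2 - 1 = 0.
grad_l = lambda x: 2 * x
hess_l = lambda x: 2 * np.eye(2)
h      = lambda x: np.array([x[0] + x[1] - 1.0])
jac_h  = lambda x: np.array([[1.0, 1.0]])

x, lam = np.array([0.0, 0.0]), np.array([0.0])
for _ in range(5):
    x, lam = lagrangian_newton_step(grad_l, hess_l, h, jac_h, x, lam)
print(x, lam)   # converges to x = (0.5, 0.5)
```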
with Δλ∗_QP the Lagrange multiplier associated with the optimal solution of (2).
3. Update
x^{k+1} = x^k + Δx^k
λ^{k+1} = Δλ∗_QP
min_{x ∈ R^d} ℓ(x)
subj. to h(x) = 0
Notation
We denote by x^k ∈ R^n the solution estimate at iteration k ∈ N of an optimization algorithm for the optimization problem min_{x ∈ R^n} ℓ(x).
x_{t+1} = f_t(x_t, u_t)