Optimization
DR. ABHISHEK SARKAR
MECHANICAL ENGINEERING
BITS HYDERABAD
Optimization
• Optimization comes from the same root
as “optimal”, which means best. When
you optimize something, you are “making
it best”.
• Gradient Descent
• Lagrange Multiplier
Stationary Point
• Gradient Descent is an
iterative process that finds
the minima of a function.
• A differentiable function has
a stationary point when the
derivative becomes zero.
$\dfrac{df}{dx} = 0$
Gradient descent
• Finding the global
minimum of the
objective function is
feasible if the
objective function is
convex, i.e. any local
minimum is a global
minimum.
Example
• Find the local minimum of the function $y = (x + 5)^2$ starting from the point $x_0 = 3$
• Learning rate = 0.01
• Iteration 1:
$\dfrac{dy}{dx} = 2(x + 5)$
• Therefore $x_1 = x_0 - 0.01\,\dfrac{dy}{dx}\Big|_{x_0}$
$x_1 = 3 - 0.01 \cdot 2\,(3 + 5) = 2.84$
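A minimal Python sketch of this iteration (the stopping tolerance and iteration cap are illustrative assumptions, not part of the slide):

```python
# Gradient descent on y = (x + 5)^2, starting from x0 = 3 with learning rate 0.01.
def dy_dx(x):
    return 2 * (x + 5)           # derivative of (x + 5)^2

x = 3.0                          # starting point x0
lr = 0.01                        # learning rate
for _ in range(10000):           # iteration cap (assumed)
    step = lr * dy_dx(x)
    if abs(step) < 1e-6:         # stop once the update becomes negligible (assumed tolerance)
        break
    x -= step                    # update: x_{k+1} = x_k - lr * dy/dx

print(x)                         # approaches the minimizer x = -5
```

The first pass through the loop reproduces the slide's value: $x_1 = 3 - 0.01 \cdot 2\,(3 + 5) = 2.84$.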
Example
• Last few
iterations
before it stops.
Exception
• y′ = 0 does not always indicate a minimum or maximum; for example, y = x³ has y′ = 0 at x = 0, yet that stationary point is an inflection point, not an extremum.
Two Dimensional Function
• Let $f(X, Y)$ have a local minimum at a point $(X_0, Y_0)$ and have continuous partial derivatives at this point.
• Then $f_X(X_0, Y_0) = 0$ and $f_Y(X_0, Y_0) = 0$
Two Dimensional Function
• The second partial derivative test classifies the point $(X_0, Y_0)$ as a local minimum or a local maximum.
• Let $D = f_{xx}(X_0, Y_0)\, f_{yy}(X_0, Y_0) - \left[ f_{xy}(X_0, Y_0) \right]^2$
• $f(X_0, Y_0)$ is
• - a local maximum if $D > 0$ and $f_{xx}(X_0, Y_0) < 0$
• - a local minimum if $D > 0$ and $f_{xx}(X_0, Y_0) > 0$
• - a saddle point if $D < 0$
• If $D = 0$, then the test is inconclusive.
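A small sympy sketch of this test; the sample function f = x² + 3y² and the critical point (0, 0) are illustrative choices, not taken from the slides:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*y**2                          # sample function (assumed for illustration)
x0, y0 = 0, 0                              # its critical point: f_x = f_y = 0 there

fxx = sp.diff(f, x, 2).subs({x: x0, y: y0})
fyy = sp.diff(f, y, 2).subs({x: x0, y: y0})
fxy = sp.diff(f, x, y).subs({x: x0, y: y0})

D = fxx * fyy - fxy**2                     # discriminant of the second derivative test
if D > 0 and fxx > 0:
    print("local minimum")                 # here D = 12 > 0 and fxx = 2 > 0
elif D > 0 and fxx < 0:
    print("local maximum")
elif D < 0:
    print("saddle point")
else:
    print("test inconclusive")
```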
Functions of many variables
• Instead of examining the determinant of the Hessian matrix,
one must look at the eigenvalues of the Hessian matrix at the
critical point
Functions of many variables
• If the Hessian is positive definite (equivalently, all its eigenvalues are positive) at the critical point a, then f attains a local minimum at a.
• If the Hessian is negative definite (equivalently, all its eigenvalues are negative) at a, then f attains a local maximum at a.
• If the Hessian has both positive and negative eigenvalues, then a is a saddle point of f(x1, x2, ...) (and in fact this is true even if the Hessian is degenerate).
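A numeric sketch of the eigenvalue check; the Hessian matrix below is a made-up symmetric example, not one from the slides:

```python
import numpy as np

# Hessian of f evaluated at a critical point a (example values, assumed).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals = np.linalg.eigvalsh(H)            # eigenvalues of the symmetric Hessian
if np.all(eigvals > 0):
    print("positive definite -> local minimum at a")
elif np.all(eigvals < 0):
    print("negative definite -> local maximum at a")
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    print("indefinite -> saddle point at a")
else:
    print("some eigenvalues are zero -> test inconclusive")
```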
Contour Function
• Contour lines are used
in a map to portray
differences in
elevation.
• When contour lines
are closer together on
a map, they indicate a
steep slope.
Gradient descent
in Contour plot
• 2D view of Contour
lines used in a map to
portray differences in
elevation.
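A two-variable sketch of gradient descent, to connect the update rule with the contour picture; the bowl-shaped function f(x, y) = x² + 4y², the starting point, and the step size are all illustrative assumptions:

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, 8 * y])        # gradient of f(x, y) = x^2 + 4y^2

p = np.array([3.0, 2.0])                   # starting point (assumed)
lr = 0.1                                   # learning rate (assumed)
for _ in range(200):
    p = p - lr * grad(p)                   # step against the gradient, i.e. downhill
                                           # across the contour lines

print(p)                                   # converges towards the minimum at (0, 0)
```

Each step moves along the negative gradient, which is perpendicular to the local contour line, so the iterates cut across the closely spaced contours where the slope is steep.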
Constrained Optimization
• Other types of optimization problems involve maximizing or
minimizing a quantity subject to an external constraint.
• In these cases the extreme values frequently won't occur at
the points where the gradient is zero, but rather at other
points that satisfy an important geometric condition.
• These problems can be solved with the method of Lagrange
Multipliers
Constrained Optimization
• Find the maximum and minimum values of $f(x, y) = y^2 - x^2$ subject to the constraint $g(x, y) = x^2 + y^2 - 4 = 0$
• Without restrictions on x, y there would be no maximum or minimum values of $f(x, y)$, just the saddle point at the origin!
• The constraint places restrictions on the values of x, y in $f(x, y)$.
• We could use single-variable methods: simply eliminate y from $f(x, y)$ using $g(x, y) = 0$, as sketched below.
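A quick sympy sketch of that elimination (the code is purely illustrative; any single-variable method would do):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = y**2 - x**2
g = x**2 + y**2 - 4                        # constraint g(x, y) = 0

# On the constraint, y**2 = 4 - x**2, so f reduces to a single-variable function of x.
f_on_constraint = sp.simplify(f.subs(y**2, 4 - x**2))
print(f_on_constraint)                     # 4 - 2*x**2, valid for -2 <= x <= 2
```

On −2 ≤ x ≤ 2 the reduced function 4 − 2x² is maximized at x = 0, giving f = 4 at (0, ±2), and minimized at x = ±2, giving f = −4 at (±2, 0).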
Method of Lagrange Multiplier
• In the 1700's, Joseph
Louis Lagrange studied
constrained optimization
problems of this kind,
and he found a clever
way to express all of our
conditions into a single
equation.
Method of Lagrange Multiplier
• The maximum and minimum values of $z = f(x, y)$ subject to the constraint $g(x, y) = 0$ occur at a point $(a, b)$ for which there exists $\lambda$ such that
• $\nabla f(a, b) = \lambda\, \nabla g(a, b)$ where $g(a, b) = 0$
• and $\nabla g(a, b) \neq 0$
• Such points $(a, b)$ will be called critical points.
• The method of Lagrange multipliers works just as well when f and g are functions of 3 variables (or any greater number of variables, for that matter).
Example
• Maximize $f(x, y) = x + y$
• subject to $x^2 + y^2 = 1$
• For the method of Lagrange multipliers, the constraint is $g(x, y) = x^2 + y^2 - 1$
• Hence
$L(x, y, \lambda) = f(x, y) - \lambda\, g(x, y) = x + y - \lambda\,(x^2 + y^2 - 1)$
Example
• Now we can calculate the gradient
$\nabla_{x,y,\lambda}\, L(x, y, \lambda) = \left( \dfrac{\partial L}{\partial x},\; \dfrac{\partial L}{\partial y},\; \dfrac{\partial L}{\partial \lambda} \right) = \left( 1 - 2\lambda x,\; 1 - 2\lambda y,\; -(x^2 + y^2 - 1) \right)$
• and therefore
$\nabla_{x,y,\lambda}\, L(x, y, \lambda) = 0 \;\Longleftrightarrow\; \begin{cases} 1 - 2\lambda x = 0 \\ 1 - 2\lambda y = 0 \\ x^2 + y^2 - 1 = 0 \end{cases}$
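A sympy sketch that solves this system directly (symbol names are illustrative):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
L = x + y - lam * (x**2 + y**2 - 1)        # Lagrangian from the previous slide

eqs = [sp.diff(L, v) for v in (x, y, lam)] # the three stationarity conditions
sols = sp.solve(eqs, [x, y, lam], dict=True)
for s in sols:
    print(s, 'f =', s[x] + s[y])
# Two critical points: (1/sqrt(2), 1/sqrt(2)) gives the maximum f = sqrt(2),
# and (-1/sqrt(2), -1/sqrt(2)) gives the minimum f = -sqrt(2).
```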