
Multivariable Optimization

Multivariable Functions
 A multivariable function depends on more than one variable.
 For a multivariable function, the gradient is not a scalar
quantity; it is a vector quantity.
 The objective function is a function of N variables
represented by x1, x2, . . . , xN.
 The gradient vector at any point x(t) is represented
by ∇f(x(t)), an N-dimensional vector given as follows:
∇f(x(t)) = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xN)T, evaluated at x(t)
Gradient of a multivariable function
 Geometrically, the gradient vector is
normal to the tangent plane at the point
x*.
 Also, it points in the direction of
maximum increase of the function.
Function of Two Variables (contour)
Unidirectional Search
 Many multivariable optimization techniques use
successive unidirectional searches to find the
minimum point along a particular search direction.
 A unidirectional search is a search performed by
comparing function values only along a specified
direction.
 A unidirectional search is performed from a point x(t)
in a specified direction s(t).
 Any arbitrary point on that line can be expressed as
follows:
x(α) = x(t) + α s(t)

 The parameter α is a scalar quantity specifying a relative
measure of the distance of the point x(α) from x(t).
Minimize
starting from (2, 1) in the direction of (2, 5)
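The unidirectional search can be sketched in Python as follows. Since the slide's objective function is given elsewhere, the quadratic f below is a hypothetical stand-in; the start point (2, 1) and direction (2, 5) follow the example above, and α is compared over a simple grid:

```python
# Minimal sketch of a unidirectional search: minimize f along the line
# x(alpha) = x_t + alpha * s_t by comparing function values only.

def f(x):
    # hypothetical objective: f(x1, x2) = (x1 - 3)^2 + (x2 - 2)^2
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

def line_point(x_t, s_t, alpha):
    # x(alpha) = x_t + alpha * s_t
    return [xi + alpha * si for xi, si in zip(x_t, s_t)]

def unidirectional_search(f, x_t, s_t, alphas):
    # return the alpha (and point) with the smallest function value
    best_alpha = min(alphas, key=lambda a: f(line_point(x_t, s_t, a)))
    return best_alpha, line_point(x_t, s_t, best_alpha)

# search from (2, 1) along the direction (2, 5), alpha in [0, 1]
alphas = [i / 1000.0 for i in range(1001)]
alpha_star, x_star = unidirectional_search(f, [2.0, 1.0], [2.0, 5.0], alphas)
```

A grid scan is the crudest choice; in practice any single-variable method (golden section, bounding phase) can be applied to the function of α alone.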
Direct Search Methods
 Use function values only.
 In single-variable function optimization, there
are only two directions in which a point can be
modified: the positive x-direction or
the negative x-direction.
 In multivariable function optimization, each
variable can be modified either in the positive or
in the negative direction, thereby totaling 2^N
different ways.
Box’s Evolutionary Optimization
Method
 Developed by G. E. P. Box in 1957.
 The algorithm requires (2^N + 1) points, of which 2^N
are corner points of an N-dimensional hypercube
centred on the other point.
 All (2^N + 1) function values are compared and the
best point is identified.
 In the next iteration, another hypercube is formed
around this best point.
 If at any iteration an improved point is not found,
the size of the hypercube is reduced.
 This process continues until the hypercube becomes
very small.
Algorithm for Box Evolutionary
optimization method
Step 1 Choose an initial point x(0) and size reduction
parameters Δi for all design variables, i = 1, 2, . . . , N.
Choose a termination parameter ϵ. Set x̄ = x(0).
Step 2 If ∥Δ∥ < ϵ, Terminate; Else create 2^N points
by adding and subtracting Δi/2 from each variable at
the point x̄.
Step 3 Compute function values at all (2^N + 1) points.
Find the point having the minimum function value.
Designate the minimum point to be x̄.
Step 4 If x̄ = x(0), reduce size parameters Δi = Δi/2
and go to Step 2; Else set x(0) = x̄ and go to Step 2.
Box’s Evolutionary Optimization
Method
In the above algorithm, x(0) is always set as the
current best point.
Thus, at the end of the simulation, x(0) becomes the
obtained optimum point.
It is evident from the algorithm that at most 2^N
function evaluations are made at each iteration.
Thus, the required number of function
evaluations increases exponentially with N.
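The algorithm above can be sketched in Python as follows. The objective f is a hypothetical quadratic with its minimum at (3, 2), matching the solution quoted in the worked problem below; the 2^N hypercube corners are generated as all sign combinations of Δi/2:

```python
import itertools

# Minimal sketch of Box's evolutionary optimization method.

def f(x):
    # hypothetical objective with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

def box_evolutionary(f, x0, deltas, eps=1e-4):
    x = list(x0)
    delta = list(deltas)
    # Step 2: loop until the hypercube becomes very small
    while sum(d * d for d in delta) ** 0.5 >= eps:
        # create the 2^N corner points of the hypercube around x
        corners = [[xi + si * di / 2.0 for xi, si, di in zip(x, signs, delta)]
                   for signs in itertools.product((-1.0, 1.0), repeat=len(x))]
        # Step 3: best of all (2^N + 1) points
        best = min(corners + [x], key=f)
        if best == x:
            # Step 4: no improvement, so halve the hypercube size
            delta = [d / 2.0 for d in delta]
        else:
            x = best
    return x

x_opt = box_evolutionary(f, [0.0, 0.0], [1.0, 1.0])
```

Note the exponential cost: `itertools.product` generates 2^N corners, so each iteration's function-evaluation count grows exponentially with N, as stated above.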
Problem
Minimize
It is interesting to note that although the
minimum point is found, the algorithm does not
terminate at this step.
Since the current point is the minimum, no other
point can be found better than x(0) = (3, 2)T;
therefore, in subsequent iterations the value of
the size parameter will continue to decrease
(according to Step 4 of the algorithm).
When the value ∥Δ∥ becomes smaller than ϵ, the
algorithm terminates.
Simplex Search Method
 The number of points in the initial simplex is much
smaller than in Box's evolutionary optimization
method.
 This reduces the number of function evaluations
required in each iteration.
 For N variables, only (N + 1) points are used in the
initial simplex.
 It is important that the points chosen for the initial
simplex do not form a zero-volume N-dimensional
hypercube.
 Thus, in a function with two variables, the three
points chosen for the simplex should not lie along a line.
Simplex Search Method
 At each iteration, the worst point in the simplex is
found first.
 Then, a new simplex is formed from the old
simplex by fixed rules that steer the search away
from the worst point in the simplex.
 The extent of steering depends on the relative
function values of the simplex. Four different
situations may arise depending on the function
values.
Simplex Search Method
 This algorithm was originally proposed by Spendley et al.
(1962) and later modified by Nelder and Mead (1965).
 At first, the centroid (xc) of all points except worst point is
determined.
 Thereafter, the worst point in the simplex is reflected about
the centroid and a new point xr is found.
 If the function value at this point is better than the best point
in the simplex, the reflection is considered to have taken the
simplex to a good region.
 Thus, an expansion along the direction from the centroid to
the reflected point is performed. The amount of expansion is
controlled by the factor γ.
Simplex Search Method
 If the function value at the reflected point is worse than the
worst point in the simplex, the reflection is considered to
have taken the simplex to a bad region in the search space.
 Thus, a contraction in the direction from the centroid to the
reflected point is made.
 The amount of contraction is controlled by a factor β (a
negative value of β is used).
 If the function value at the reflected point is better than the
worst point in the simplex, a contraction is made with a
positive β value
 The default scenario is the reflected point itself. The
obtained new point replaces the worst point in the simplex
and the algorithm continues with the new simplex.
Simplex Search Method
Algorithm Simplex Search Method
 Step 1 Choose γ > 1, β ∈ (0, 1), and a termination parameter ϵ.
Create an initial simplex.
 Step 2 Find xh (the worst point), xl (the best point), and xg (the
next-to-worst point). Calculate the centroid of all points
excluding the worst: xc = (1/N) Σi≠h xi.
 Step 3 Calculate the reflected point xr = 2xc − xh. Set xnew = xr.
 If f(xr) < f(xl), set xnew = (1 + γ)xc − γxh (expansion);
 Else if f(xr) ≥ f(xh), set xnew = (1 − β)xc + βxh (contraction);
 Else if f(xg) < f(xr) < f(xh), set xnew = (1 + β)xc − βxh (contraction).
 Calculate f(xnew) and replace xh by xnew.
 Step 4 If {Σi (f(xi) − f(xc))² / (N + 1)}^(1/2) ≤ ϵ, Terminate;
 Else go to Step 2.
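The step rules above can be sketched in Python as follows. This is a minimal sketch of the simplex variant described here (not the many production Nelder-Mead variants); the quadratic objective f and the initial three points are hypothetical stand-ins for the example data:

```python
# Minimal sketch of the simplex search method with the reflection,
# expansion, and contraction rules given in the algorithm above.

def f(x):
    # hypothetical objective with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

def simplex_search(f, simplex, gamma=2.0, beta=0.5, eps=0.2, max_iter=200):
    for _ in range(max_iter):
        simplex.sort(key=f)                 # xl first, xh last
        xl, xg, xh = simplex[0], simplex[-2], simplex[-1]
        n = len(simplex) - 1
        # centroid of all points except the worst
        xc = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        fc = f(xc)
        # Step 4: termination test on the spread of function values
        if (sum((f(p) - fc) ** 2 for p in simplex) / (n + 1)) ** 0.5 <= eps:
            break
        xr = [2 * xc[i] - xh[i] for i in range(n)]   # reflection
        if f(xr) < f(xl):
            xnew = [(1 + gamma) * xc[i] - gamma * xh[i] for i in range(n)]
        elif f(xr) >= f(xh):
            xnew = [(1 - beta) * xc[i] + beta * xh[i] for i in range(n)]
        elif f(xg) < f(xr):
            xnew = [(1 + beta) * xc[i] - beta * xh[i] for i in range(n)]
        else:
            xnew = xr
        simplex[-1] = xnew                   # replace the worst point
    return min(simplex, key=f)

x_best = simplex_search(f, [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
```

Since only the worst point is ever replaced, the best function value in the simplex never increases from iteration to iteration.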
Example: Minimize the given function, taking the points defining the
initial simplex as given, with β = 0.5 and γ = 2.0. For convergence,
take the value of ϵ as 0.2.
Hooke-Jeeves Pattern Search Method
 The pattern search method works by creating a set of search
directions iteratively. The created search directions should
be such that they completely span the search space.
 In an N-dimensional problem, this requires at least N linearly
independent search directions.
 In the Hooke-Jeeves method, a combination of exploratory
moves and pattern moves is made iteratively.
 An exploratory move is performed in the vicinity of the
current point systematically to find the best point around the
current point.
 Thereafter, two such points are used to make a pattern move.
1. Algorithm of Exploratory move
Assume that the current solution (the base point) is
denoted by xc and that the ith variable of xc is
perturbed by Δi. Set i = 1 and x = xc.
 Step 1 Calculate f = f(x), f+ = f(x with xi increased by Δi),
and f− = f(x with xi decreased by Δi).
 Step 2 Find fmin = min(f, f+, f−). Set x to the point
corresponding to fmin.
 Step 3 Is i = N? If no, set i = i + 1 and go to Step 1;
Else x is the result; go to Step 4.
 Step 4 If x ≠ xc, success; Else failure.
Exploratory move
 In the exploratory move, the current point is perturbed
in positive and negative directions along each variable
one at a time and the best point is recorded.
 The current point is changed to the best point at the end
of each variable perturbation.
 If the point found at the end of all variable perturbations
is different than the original point, the exploratory move
is a success, otherwise the exploratory move is a failure.
 In any case, the best point is considered to be the
outcome of the exploratory move.
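The exploratory move can be sketched in Python as follows, assuming a hypothetical quadratic objective f; each variable is perturbed in turn and the best of the three candidate points is retained before the next variable is considered:

```python
# Minimal sketch of the Hooke-Jeeves exploratory move.

def f(x):
    # hypothetical objective with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

def exploratory_move(f, xc, deltas):
    x = list(xc)
    for i, d in enumerate(deltas):
        # Step 1: the current point and its two perturbations along xi
        candidates = [list(x) for _ in range(3)]
        candidates[1][i] += d
        candidates[2][i] -= d
        # Step 2: keep the point with the minimum function value
        x = min(candidates, key=f)
    # Step 4: success if the point changed, failure otherwise
    return x, x != list(xc)

x_new, ok = exploratory_move(f, [0.0, 0.0], [0.5, 0.5])
```

Starting from (0, 0) with Δ = (0.5, 0.5), the move first improves x1 to 0.5 and then x2 to 0.5, so the move is a success.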
2. Pattern move
A new point is found by jumping from the current best
point xc along a direction connecting the previous best
point x(k−1) and the current base point x(k) as follows:
xp(k+1) = x(k) + (x(k) − x(k−1))
 The Hooke-Jeeves method comprises an iterative
application of an exploratory move in the locality of the
current point and a subsequent jump using the pattern
move.
 If the pattern move does not take the solution to a better
region, the pattern move is not accepted and the extent
of the exploratory search is reduced.
3. Algorithm of the Hooke-Jeeves method
 Step 1 Choose a starting point x(0), variable increments Δi (i = 1,
2, . . . ,N), a step reduction factor α > 1, and a termination
parameter ϵ. Set k = 0.
 Step 2 Perform an exploratory move with x(k) as the base point. Say
x is the outcome of the exploratory move. If the exploratory move is
a success, set x(k+1) = x and go to Step 4; Else go to Step 3.
 Step 3 Is ∥Δ∥ < ϵ? If yes, Terminate; Else set Δi = Δi/α for i = 1,
2, . . . ,N and go to Step 2.
 Step 4 Set k = k + 1 and perform the pattern move:
xp(k+1) = x(k) + (x(k) − x(k−1)).
 Step 5 Perform another exploratory move using xp(k+1) as the base
point. Let the result be x(k+1).
 Step 6 Is f(x(k+1)) < f(x(k))? If yes, go to Step 4; Else go to Step 3.
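Putting the exploratory and pattern moves together gives the following minimal sketch, again on a hypothetical quadratic with minimum at (3, 2); pattern moves are repeated while they keep improving the function, and step sizes are reduced by α on failure:

```python
# Minimal sketch of the Hooke-Jeeves pattern search method.

def f(x):
    # hypothetical objective with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2

def exploratory_move(f, xc, deltas):
    x = list(xc)
    for i, d in enumerate(deltas):
        cands = [list(x) for _ in range(3)]
        cands[1][i] += d
        cands[2][i] -= d
        x = min(cands, key=f)
    return x, x != list(xc)

def hooke_jeeves(f, x0, deltas, alpha=2.0, eps=1e-6, max_iter=1000):
    x = list(x0)
    deltas = list(deltas)
    for _ in range(max_iter):
        # Step 2: exploratory move around the current base point
        y, ok = exploratory_move(f, x, deltas)
        if not ok:
            # Step 3: terminate when steps are small, else reduce them
            if sum(d * d for d in deltas) ** 0.5 < eps:
                break
            deltas = [d / alpha for d in deltas]
            continue
        # Steps 4-6: repeat pattern moves while they improve f
        while True:
            xp = [2 * yi - xi for yi, xi in zip(y, x)]  # pattern move
            z, _ = exploratory_move(f, xp, deltas)      # Step 5
            if f(z) < f(y):                             # Step 6
                x, y = y, z
            else:
                break
        x = y
    return x

x_opt = hooke_jeeves(f, [0.0, 0.0], [0.5, 0.5])
```

On this example the pattern moves accelerate: the jump length doubles along the improving direction, reaching (3, 2) in a handful of moves before the step reduction triggers termination.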
Powell’s Conjugate Direction Method
Parallel subspace property
 Given a quadratic function Q(X) = C + BTX + (1/2)XTAX of
two variables (where C is a scalar, B is a vector, and
A is a 2 × 2 matrix), two arbitrary but distinct points X(1) and
X(2), and a direction d:
 If y(1) is the solution to the problem minimize Q(X(1) + λd)
 and y(2) is the solution to the problem minimize Q(X(2) + λd)
 then the direction (y(2)−y(1)) is conjugate to d or, in other
words, the quantity (y(2)−y(1))TA d is zero.
Powell’s Conjugate Direction Method

 Powell’s method is an extension of the basic pattern search
method.
 A conjugate directions method will minimize a quadratic function
in a finite number of steps.
 Since a general nonlinear function can be approximated reasonably
well by a quadratic function near its minimum, a conjugate
directions method is expected to speed up the convergence of even
general nonlinear objective functions.
 The basic idea is to create a set of N linearly independent search
directions and perform a series of unidirectional searches along
each of these search directions, starting each time from the
previous best point.
 The algorithm is designed on the basis of minimizing a quadratic
function and uses the parallel subspace property.
Algorithm

 Step 1 Choose a starting point x(0) and a set of N linearly
independent directions; possibly s(i) = e(i) for i = 1, 2, . . . ,N.
 Step 2 Minimize along the N unidirectional search directions using
the previous minimum point to begin the next search. Begin
with the search direction s(1) and end with s(N). Thereafter,
perform another unidirectional search along s(1).
 Step 3 Form a new conjugate direction d using the extended
parallel subspace property.
 Step 4 If ∥d∥ is small, Terminate; Else replace s(j) = s(j−1) for all
j = N, N − 1, . . . , 2. Set s(1) = d/∥d∥ and go to Step 2.
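The steps above can be sketched in Python as follows. The line minimization uses a crude golden-section search and the objective is a hypothetical convex quadratic with minimum at (3, 2); both are assumptions, not part of the algorithm statement itself:

```python
# Minimal sketch of Powell's conjugate direction method.

def f(x):
    # hypothetical quadratic with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] - 2.0) ** 2 \
        + (x[0] - 3.0) * (x[1] - 2.0)

def line_minimize(f, x, s, lo=-10.0, hi=10.0, tol=1e-8):
    # golden-section search for the best step alpha along direction s
    g = (5 ** 0.5 - 1) / 2
    phi = lambda a: f([xi + a * si for xi, si in zip(x, s)])
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    alpha = (a + b) / 2
    return [xi + alpha * si for xi, si in zip(x, s)]

def powell(f, x0, eps=1e-6, max_iter=50):
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(max_iter):
        # Step 2: minimize along s(1) ... s(N), then along s(1) again
        y = line_minimize(f, x, dirs[0])
        first = y
        for s in dirs[1:]:
            y = line_minimize(f, y, s)
        y = line_minimize(f, y, dirs[0])
        # Step 3: conjugate direction connecting the two s(1) minima
        d = [yi - fi for yi, fi in zip(y, first)]
        norm = sum(di * di for di in d) ** 0.5
        if norm < eps:                       # Step 4: terminate
            return y
        dirs = [[di / norm for di in d]] + dirs[:-1]
        x = y
    return x

x_opt = powell(f, [0.0, 0.0])
```

Because the example is quadratic and the line searches are (nearly) exact, the parallel subspace property makes the second pass along the new conjugate direction land on the minimum.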
Example: Minimize the given function from the starting point X1
using Powell’s method, taking
s(1) = (1, 0)T and s(2) = (0, 1)T.
Gradient-based Methods
 The direct search methods described above require many
function evaluations to converge to the
minimum point.
 Gradient-based methods exploit the
derivative information of the function and are
usually faster search methods.
 Where the derivative information is easily
available, gradient-based methods are very
efficient.
Search Direction
 The first derivative ∇f(x(t)) at any point x(t)
represents the direction of the maximum
increase of the function value.
Search Direction
 To find a point with a smaller function value, we should
ideally search along the direction opposite to the first
derivative, that is, along the −∇f(x(t)) direction.
 More generally, any search direction s(t) leading to a smaller
function value than that at the current point x(t) is useful.
Thus, a search direction s(t) that satisfies the following
relation is a descent direction.
Descent direction
A search direction, s(t), is a descent direction at point x(t)
if the condition ∇f(x(t)) · s(t) ≤ 0 is satisfied in the vicinity
of the point x(t).
x(k+1) = x(k) + α s(k)
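The descent condition is easy to check numerically. The sketch below uses the analytic gradient of a hypothetical quadratic and tests the strict inequality ∇f(x)·s < 0:

```python
# Minimal check of the descent-direction condition grad_f(x) . s < 0.

def grad_f(x):
    # gradient of the hypothetical f(x1, x2) = (x1 - 3)^2 + (x2 - 2)^2
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] - 2.0)]

def is_descent(x, s):
    # s is a descent direction at x if the dot product is negative
    return sum(gi * si for gi, si in zip(grad_f(x), s)) < 0

g = grad_f([0.0, 0.0])          # gradient at (0, 0) is (-6, -4)
steepest = [-gi for gi in g]    # the negative gradient direction
```

The negative gradient always satisfies the condition (its dot product with the gradient is −∥∇f∥², which is negative away from a stationary point), while the gradient itself does not.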
Cauchy’s (steepest descent) Method
 The steepest descent method uses the gradient vector
at each point as the search direction for each iteration.
 The search direction used in Cauchy’s method is the
negative of the gradient at any particular point x(t):

s(k) = −∇f(x(k)).
Cauchy’s (steepest descent) Method
 Since this direction gives maximum descent in function
values, it is also known as the steepest descent method.
 At every iteration, the derivative is computed at the current
point and a unidirectional search is performed along the
negative of this derivative direction to find the minimum
point along that direction.
 The minimum point becomes the current point and the
search is continued from this point.
 The algorithm continues until a point having a small enough
gradient vector is found. This algorithm guarantees
improvement in the function value at every iteration.
Algorithm :Cauchy’s (steepest
descent) Method
 Step 1 Choose a maximum number of iterations M to be
performed, an initial point x(0), two termination parameters
ϵ1, ϵ2, and set k = 0.
 Step 2 Calculate ∇f(x(k)), the first derivative at the point x(k).
 Step 3 If ∥∇f(x(k))∥ ≤ ϵ1, Terminate; Else if k ≥ M;
Terminate; Else go to Step 4.
 Step 4 Perform a unidirectional search to find α(k) using ϵ2
such that f(x(k+1)) = f(x(k) − α(k)∇f(x(k))) is minimum. One
criterion for termination is when |∇f(x(k+1)) · ∇f(x(k))| ≤ ϵ2.
 Step 5 Is ∥x(k+1)−x(k)∥ /∥x(k)∥ ≤ ϵ1? If yes, Terminate; Else set
k = k + 1 and go to Step 2.
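The algorithm above can be sketched in Python as follows, with an analytic gradient and a coarse grid scan standing in for the unidirectional search; the quadratic f is a hypothetical example:

```python
# Minimal sketch of Cauchy's steepest descent method.

def f(x):
    # hypothetical quadratic with minimum at (3, 2)
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] - 2.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] - 2.0)]

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=200):
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        # terminate when the gradient norm is small enough
        if sum(gi * gi for gi in g) ** 0.5 <= eps:
            break
        # unidirectional search along -g (coarse grid scan over alpha)
        alphas = [i / 1000.0 for i in range(1, 1001)]
        x = min(([xi - a * gi for xi, gi in zip(x, g)] for a in alphas),
                key=f)
    return x

x_opt = steepest_descent(f, grad_f, [0.0, 0.0])
```

Each iteration decreases the function value, consistent with the guarantee stated above; a proper single-variable method would normally replace the grid scan.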
Second order derivative
The second-order derivatives of a multivariable
function form a matrix, ∇2f(x(t)) (better known as
the Hessian matrix), given as follows:
[∇2f(x(t))]ij = ∂2f/∂xi∂xj, evaluated at x(t), for i, j = 1, 2, . . . , N
Derivatives

The computation of the first derivative with respect to each variable requires
two function evaluations, thus totaling 2N function evaluations for the
complete first derivative vector. The computation of each pure second
derivative ∂2f/∂xi2 requires three function evaluations, while each mixed
second-order partial derivative ∂2f/∂xi∂xj requires four function evaluations.
Thus, the computation of the Hessian matrix requires (2N² + 1) function
evaluations in total.
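These counts correspond to standard central-difference formulas, sketched below on a hypothetical quadratic whose analytic gradient and Hessian are known, so the numerical results can be checked:

```python
# Central-difference gradient and Hessian, matching the evaluation
# counts above: 2 evaluations per gradient component, 3 per diagonal
# Hessian entry, 4 per mixed partial.

def f(x):
    # hypothetical quadratic: gradient (2(x1-3)+x2, 4(x2-2)+x1),
    # Hessian [[2, 1], [1, 4]]
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] - 2.0) ** 2 + x[0] * x[1]

def gradient(f, x, h=1e-5):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))   # 2 evaluations per variable
    return g

def hessian(f, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    fx = f(x)
    for i in range(n):
        for j in range(n):
            if i == j:
                xp, xm = list(x), list(x)
                xp[i] += h
                xm[i] -= h
                # diagonal entry: f(x+h), f(x), f(x-h)
                H[i][i] = (f(xp) - 2 * fx + f(xm)) / (h * h)
            else:
                pts = [list(x) for _ in range(4)]
                pts[0][i] += h; pts[0][j] += h
                pts[1][i] += h; pts[1][j] -= h
                pts[2][i] -= h; pts[2][j] += h
                pts[3][i] -= h; pts[3][j] -= h
                # mixed partial: 4 evaluations
                H[i][j] = (f(pts[0]) - f(pts[1]) - f(pts[2])
                           + f(pts[3])) / (4 * h * h)
    return H

g = gradient(f, [1.0, 1.0])
H = hessian(f, [1.0, 1.0])
```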
Newton’s Method
 Newton’s method presented in single variable optimization
can be extended for the minimization of multivariable
functions.
 Consider the quadratic approximation of the function f(X) at
X = Xi using the Taylor series expansion:
f(X) ≈ f(Xi) + ∇fiT (X − Xi) + (1/2)(X − Xi)T [Ji] (X − Xi)
 where [Ji] = [J]|Xi is the matrix of second partial derivatives
(Hessian matrix) of f evaluated at the point Xi.
 Setting the partial derivatives of this approximation equal to
zero for the minimum of f(X) gives
∇f = ∇fi + [Ji](X − Xi) = 0
 If [Ji] is nonsingular, this can be solved to obtain an improved
approximation X = Xi+1 as
Xi+1 = Xi − [Ji]−1 ∇fi
 Newton’s method uses second-order derivatives to create
search directions.
 This allows faster convergence to the minimum point.

 If f(X) is a non-quadratic function, Newton’s method may
sometimes diverge. This problem can be avoided by modifying
the method to include a unidirectional search, as in the
following algorithm.
Algorithm
 The algorithm is the same as Cauchy’s method
except that Step 4 is modified as follows:
 Step 4 Perform a unidirectional search to find
α(k) using ϵ2 such that
f(x(k+1)) = f(x(k) − α(k)[∇2f(x(k))]−1 ∇f(x(k))) is
minimum. Here [∇2f(x(k))] is the Hessian [Ji] evaluated at x(k).
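A single Newton step can be sketched as follows, using the analytic gradient and Hessian of a hypothetical quadratic with minimum at (3, 2) and an explicit 2 × 2 inverse; because the example is quadratic, one step with α = 1 reaches the minimum exactly, so no unidirectional search is shown:

```python
# Minimal sketch of one Newton step x_{i+1} = x_i - [J_i]^{-1} grad f(x_i).

def grad_f(x):
    # gradient of f = (x1-3)^2 + 2(x2-2)^2 + (x1-3)(x2-2)
    return [2.0 * (x[0] - 3.0) + (x[1] - 2.0),
            4.0 * (x[1] - 2.0) + (x[0] - 3.0)]

def hess_f(x):
    # constant Hessian of the quadratic above
    return [[2.0, 1.0], [1.0, 4.0]]

def newton_step(x):
    g, H = grad_f(x), hess_f(x)
    # explicit inverse of the 2x2 Hessian applied to the gradient
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    dx0 = (H[1][1] * g[0] - H[0][1] * g[1]) / det
    dx1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
    return [x[0] - dx0, x[1] - dx1]

x1 = newton_step([0.0, 0.0])   # one step suffices for a quadratic
```

For larger N one would solve the linear system [Ji] d = ∇fi rather than form the inverse, which connects directly to the practical drawbacks listed below.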
 Despite these advantages, the method is not very useful
in practice, due to the following features of the method:
1. It requires the storing of the N × N matrix [Ji].
2. It becomes very difficult and sometimes impossible to
compute the elements of the matrix [Ji].
3. It requires the inversion of the matrix [Ji] at each step.
4. It requires the evaluation of the quantity [Ji]−1∇fi at
each step.
These features make the method impractical for problems
involving a complicated objective function with a large
number of variables.
