Unconstrained Optimization
f(α) = f(x0 + αs) .   (4.1.2)
Thus, the minimization problem reduces to finding the value α* that minimizes the function f(α). In fact, one of the simplest methods used in minimizing functions of n variables is to seek the minimum of the objective function by changing only one variable at a time, while keeping all other variables fixed, and performing a one-dimensional minimization along each of the coordinate directions of an n-dimensional design space. This procedure is called the univariate search technique.
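The univariate search can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the inner one-dimensional minimizer here is a crude interval-thirds search, and any of the line searches of Section 4.1 could be substituted; the function names and the bracket limits are our assumptions.

```python
def line_min(f, lo=-10.0, hi=10.0, iters=100):
    """Crude one-dimensional minimization by interval thirds.

    Assumes f is unimodal on [lo, hi]; any zeroth order line search
    from Section 4.1 could replace this helper.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def univariate_search(f, x0, cycles=60):
    """Minimize f by changing one variable at a time along each
    coordinate direction of the design space."""
    x = list(x0)
    for _ in range(cycles):
        for j in range(len(x)):
            def along(alpha, j=j):
                y = list(x)
                y[j] += alpha
                return f(y)
            x[j] += line_min(along)
    return x
```

Applied to the cantilever-beam function used later in this chapter, f = 12x1² + 4x2² − 12x1x2 + 2x1, the search converges toward (−1/3, −1/2).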
In classifying the minimization algorithms for both the one-dimensional and
multi-dimensional problems we generally use three distinct categories. These categories are the zeroth, first, and second order methods. Zeroth order methods use
only the value of the function during the minimization process. First order methods
employ values of the function and its first derivatives with respect to the variables.
Finally, second order methods use the values of the function and its first and second derivatives. In the following discussion of one-variable function minimizations, the function is assumed to be in the form f = f(α). However, the methods to be discussed are equally applicable for minimization of multivariable problems along a preselected direction, s, using Eq. (4.1.1).
4.1.1 Zeroth Order Methods
Bracketing Method. As the name suggests, this method brackets the minimum of the
function to be minimized between two points, through a series of function evaluations.
The method begins with an initial point α0, a function value f(α0), a step size δ0, and a step expansion parameter γ > 1. The steps of the algorithm [2] are outlined as
1. Evaluate f(α0) and f(α0 + δ0).
2. If f(α0 + δ0) < f(α0), let α1 = α0 + δ0 and δ1 = γδ0, and evaluate f(α1 + δ1). Otherwise go to step 4.
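The remaining steps of the bracketing algorithm are not reproduced above; they continue expanding the step until the function value increases, at which point the minimum is trapped between the two outermost retained points. The sketch below follows that usual expanding-step logic; the defaults and function name are our assumptions, not the book's.

```python
def bracket_minimum(f, a0=0.0, step=0.1, gamma=2.0, max_steps=100):
    """Expand steps from a0 until f increases, trapping a minimum of a
    unimodal f between the two outermost points kept."""
    fa = f(a0)
    d = step
    if f(a0 + d) > fa:
        d = -d                              # try the descending direction instead
        if f(a0 + d) > fa:
            return (a0 - step, a0 + step)   # a0 is already interior
    a_prev, a_cur = a0, a0 + d
    f_cur = f(a_cur)
    for _ in range(max_steps):
        d *= gamma                          # expansion parameter gamma > 1
        a_next = a_cur + d
        f_next = f(a_next)
        if f_next > f_cur:                  # increase detected: bracket found
            lo, hi = sorted((a_prev, a_next))
            return (lo, hi)
        a_prev, a_cur, f_cur = a_cur, a_next, f_next
    raise RuntimeError("failed to bracket a minimum")
```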
b = (4p1 − 3p0 − p2)/(2δ)   and   c = (p2 + p0 − 2p1)/(2δ²) ,   if p2 = f(x0 + 2δs) ,

or

b = (p1 − p2)/(2δ)   and   c = (p1 − 2p0 + p2)/(2δ²) ,   if p2 = f(x0 − δs) .   (4.1.5)
3. The value of α = α* at which p(α) is extremized for the current cycle is then given by

α* = −b/(2c) .   (4.1.6)
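The coefficient formulas and the minimizer prediction can be checked directly. A small sketch, with names of our choosing, using the three samples p0 = f(x0), p1 = f(x0 + δs), p2 = f(x0 + 2δs):

```python
def quadratic_step(f, x0, s, delta):
    """Predict the minimizing step along direction s from three samples.

    b = (4p1 - 3p0 - p2)/(2 delta), c = (p2 + p0 - 2p1)/(2 delta**2);
    the extremum of p(alpha) = a + b*alpha + c*alpha**2 is alpha* = -b/(2c),
    a minimum when c > 0.
    """
    p0 = f(x0)
    p1 = f([xi + delta * si for xi, si in zip(x0, s)])
    p2 = f([xi + 2 * delta * si for xi, si in zip(x0, s)])
    b = (4 * p1 - 3 * p0 - p2) / (2 * delta)
    c = (p2 + p0 - 2 * p1) / (2 * delta ** 2)
    return -b / (2 * c), c
```

For an exactly quadratic f the prediction recovers the true minimizer in one cycle.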
4. α* corresponds to a minimum of p if c > 0, and the prediction based on Eq. (4.1.3) is repeated using (x0 + α*s) as the initial point for the next cycle, with p0 = f(x0 + α*s), until the desired accuracy is obtained.
5. If the point α = α* corresponds to a maximum of p rather than a minimum, or if it corresponds to a minimum of p which is at a distance greater than a prescribed maximum α_max (possibly meaning that α* is outside the bracket points), then the maximum allowed step is taken in the direction of decreasing f, and the point furthest away from this new point is discarded in order to repeat the process.
In step 4, instead of starting with (x0 + α*s) as the initial point and repeating the previous steps, there is a cheaper alternative in terms of the number of function evaluations. The point (x0 + α*s) and the two points closest to it from the left and
implies that

f(α1) > f(α2) ,   (4.1.7)

implies that

f(α2) > f(α1) .   (4.1.8)
α1 = a0 + (F_{n−1}/F_{n+1}) l0 ,   (4.1.9)

α2 = b0 − (F_{n−1}/F_{n+1}) l0 ,   (4.1.10)

and

α_{k+1} = a_k + (F_{n−(k+1)}/F_{n−(k−1)}) l_k   or   α_{k+1} = b_k − (F_{n−(k+1)}/F_{n−(k−1)}) l_k .   (4.1.11)
lim_{n→∞} F_{n−1}/F_{n+1} = (3 − √5)/2 ≈ 0.382 .   (4.1.13)

Thus, it is possible to approximate the optimal location of the points given by Eqs. (4.1.9)-(4.1.11) by the following relations
α1 = a0 + 0.382 l0 ,   (4.1.14)

α2 = b0 − 0.382 l0 ,   (4.1.15)

and

α_{k+1} = a_k + 0.382 l_k   or   α_{k+1} = b_k − 0.382 l_k .   (4.1.16)
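The 0.382 placement and the Fibonacci limit can both be verified numerically. A minimal sketch, with names of our choosing:

```python
def fib(n):
    """nth Fibonacci number, fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# F_{n-1}/F_{n+1} approaches (3 - sqrt(5))/2 = 0.381966..., the value
# used in the point placements of Eqs. (4.1.14)-(4.1.16).
RATIO = 0.381966

def golden_section_search(f, a, b, tol=1e-7):
    """Interval reduction with the 0.382 point placement."""
    x1 = a + RATIO * (b - a)
    x2 = b - RATIO * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                  # minimum lies in (x1, b)
            a, x1, f1 = x1, x2, f2
            x2 = b - RATIO * (b - a)
            f2 = f(x2)
        else:                        # minimum lies in (a, x2)
            b, x2, f2 = x2, x1, f1
            x1 = a + RATIO * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)
```

Each iteration shrinks the interval of uncertainty by the factor 0.618 at the cost of a single new function evaluation, since one interior point is reused.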
α̃ = (a + b)/2 ,   (4.1.17)

which is the point midway between a and b. The value of f′ is then evaluated at α̃. If f′(α̃) agrees in sign with f′(a), then the point a is replaced by α̃ and the new interval of uncertainty is given by (α̃, b). If, on the other hand, f′(α̃) agrees in sign with f′(b), then the point b is replaced by α̃ and the new interval of uncertainty is (a, α̃). The process is then repeated using Eq. (4.1.17).
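The sign-matching rule above is an ordinary bisection applied to the derivative. A short sketch under the assumption that the starting interval brackets a minimum, i.e. f′(a) < 0 < f′(b):

```python
def bisection_minimize(df, a, b, tol=1e-9):
    """Locate a minimizer of f as the zero of its derivative df on (a, b).

    Requires df(a) < 0 < df(b), i.e. (a, b) is an interval of
    uncertainty that brackets a minimum.
    """
    dfa = df(a)
    while b - a > tol:
        mid = 0.5 * (a + b)              # Eq. (4.1.17)
        dfm = df(mid)
        if (dfm < 0.0) == (dfa < 0.0):   # same sign as df(a): replace a
            a, dfa = mid, dfm
        else:                            # same sign as df(b): replace b
            b = mid
    return 0.5 * (a + b)
```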
Davidon's Cubic Interpolation Method. This is a polynomial approximation method which uses both the function values and its derivatives for locating its minimum. It is especially useful in those multivariable minimization techniques which require the evaluation of the function and its gradients.
We begin by assuming the function to be minimized, f(x0 + αs), to be approximated by a polynomial of the form

p(α) = a + bα + cα² + dα³ ,   (4.1.18)
p0 = p(0) = f(x0)   and   p1 = p(λ) = f(x0 + λs) ,   (4.1.19)
and

g0 = dp/dα(0) = s^T ∇f(x0) ,   g1 = dp/dα(λ) = s^T ∇f(x0 + λs) .   (4.1.20)
p(α) = p0 + g0 α − ((g0 + e)/λ) α² + ((g0 + g1 + 2e)/(3λ²)) α³ ,   (4.1.21)

where

e = (3/λ)(p0 − p1) + g0 + g1 .   (4.1.22)
We can now locate the minimum, α = α_m, of Eq. (4.1.21) by setting its derivative with respect to α to zero. This results in

α_m = λ (g0 + e ± h) / (g0 + g1 + 2e) ,   (4.1.23)

where

h = (e² − g0 g1)^{1/2} .   (4.1.24)
It can be easily verified, by checking d²p/dα², that the positive sign must be retained in Eq. (4.1.23) for α_m to be a minimum rather than a maximum. Thus, the algorithm for Davidon's cubic interpolation [5] may be summarized as follows.
1. Evaluate p0 = f(x0) and g0 = s^T ∇f(x0), and make sure that g0 < 0.
2. In the absence of an estimate of the initial step length λ, we may calculate it on the basis of a quadratic interpolation derived using p0, g0, and an estimate of p_min. Thus,

λ = 2(p_min − p0)/g0 .   (4.1.25)
3. Evaluate p1 = f(x0 + λs) and g1 = df(x0 + λs)/dλ. Convergence is achieved when

df(x0 + α_m s)/dα_m ≈ 0 .   (4.1.26)
α_{i+1} = α_i − f′(α_i)/f″(α_i) ,   (4.1.29)

α_{i+1} = α_i − δ f′(α_i)/[f′(α_i + δ) − f′(α_i)] ,   (4.1.30)

where α_i and α_{i+1} are the ith and the (i + 1)st estimates of the minimum value of α, and δ is a non-zero constant.
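The Newton iteration with the second derivative replaced by a forward difference can be sketched as follows; the function name, default constants, and stopping rule are our assumptions:

```python
def newton_line_min(df, a0, delta=1e-5, tol=1e-10, max_iter=50):
    """Newton iteration for a stationary point of f, with f'' replaced
    by a forward difference over the non-zero constant delta."""
    a = a0
    for _ in range(max_iter):
        d1 = df(a)
        d2 = (df(a + delta) - d1) / delta   # finite-difference second derivative
        if d2 == 0.0:
            break
        step = d1 / d2
        a -= step
        if abs(step) < tol:
            break
    return a
```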
4.1.4 Safeguarded Polynomial Interpolation [7], p. 92
Polynomial interpolations such as the quadratic interpolation and Davidon's cubic interpolation are sometimes found to be quite inefficient and unreliable for
locating the minimum of a function along a line. If the interpolation function is not
representative of the behavior of the function to be minimized within the interval
of uncertainty, the minimum may fall outside the interval, or become unbounded
below, or the successive iterations may be too close to one another without achieving
a significant improvement in the function value. In such cases, we use what are
known as safeguarded procedures. These procedures consist of combining polynomial
interpolations with a simple bisection technique or the golden section search technique
described earlier. At the end of the polynomial interpolation, the bisection technique
would be used to find the zero of the derivative of the function f . The golden
section search, on the other hand, would work with the function f itself using the
known interval of uncertainty (a, b) and locate the point which corresponds to the
minimum of f within the interval.
4.2 Minimization of Functions of Several Variables
x_j = x0 + p e_j + Σ_{k=1, k≠j}^{n} q e_k ,   j = 1, . . . , n ,   (4.2.1)

with

p = (a/(n√2)) (√(n+1) + n − 1)   and   q = (a/(n√2)) (√(n+1) − 1) ,   (4.2.2)
where ek is the unit base vector along the kth coordinate direction, and x0 is the
initial base point. For example, for a problem in two-dimensional design space Eqs.
(4.2.1) and (4.2.2) lead to an equilateral triangle of side a.
Once the simplex is defined, the function f is evaluated at each of the n+1 vertices
x0 , x1 , . . . , xn . Let xh and xl denote the vertices where the function f assumes its
maximum and minimum values, respectively, and xs the vertex where it assumes the
second highest value. The simplex method discards the vertex x_h and replaces it by a point where f has a lower value. This is achieved by three operations, namely reflection, contraction, and expansion.
The reflection operation creates a new point x_r along the line joining x_h to the centroid x̄ of the remaining points, defined as

x̄ = (1/n) Σ_{i=0, i≠h}^{n} x_i .   (4.2.3)

The reflected point is given by

x_r = x̄ + α(x̄ − x_h) ,   (4.2.4)

where α > 0 is the reflection coefficient. If the reflection produces a new minimum, an expanded point is sought further along the same line as

x_e = x̄ + γ(x_r − x̄) ,   (4.2.5)
with the expansion coefficient γ often being chosen to be 2. If the value of the function f_e = f(x_e) is smaller than the value f_r at the end of the reflection step, then we replace x_h by x_e and repeat the process with the new simplex. However, if the expansion leads to a function value equal to or larger than f_r, then we form the new simplex by replacing x_h by x_r and continue.
Finally, if the process of reflection leads to a point x_r such that f_r < f_h, then we replace x_h by x_r and perform contraction. Otherwise (f_r ≥ f_h), we perform contraction without any replacement using
x_c = x̄ + β(x_h − x̄) ,   (4.2.6)
with the contraction coefficient β, 0 < β < 1, usually chosen to be 1/2. If f_c = f(x_c) is greater than f_h, then we replace all the points by a new set of points

x_i = x_i + (1/2)(x_l − x_i) ,   i = 0, 1, . . . , n ,   (4.2.7)
and restart the process with this new simplex. Otherwise, we simply replace xh by
xc and restart the process with this simplex. The operation in Eq. (4.2.7) causes the
distance between the points of the old simplex and the point with the lowest function
value to be halved and is therefore referred to as the shrinkage operation. The flow
chart of the complete method is given in Figure 4.2.1. For the convergence criterion
to terminate the algorithm Nelder and Mead [9] proposed the following
{ (1/(1 + n)) Σ_{i=0}^{n} [f_i − f(x̄)]² }^{1/2} < ε ,   (4.2.8)
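The operations above can be collected into a short sketch. This is a conventional Nelder-Mead implementation rather than the exact branching of the variant described in the text; the axis-aligned start simplex, the coefficient values (α = 1, γ = 2, β = 1/2), and the iteration limit are illustrative assumptions.

```python
import math

def nelder_mead(f, x0, side=1.0, alpha=1.0, gamma=2.0, beta=0.5,
                eps=1e-10, max_iter=2000):
    """Simplex minimization with reflection, expansion, contraction and
    shrinkage, using a spread-of-values convergence test like Eq. (4.2.8)."""
    n = len(x0)
    simplex = [list(x0)]
    for j in range(n):                        # axis-aligned starting simplex
        v = list(x0); v[j] += side
        simplex.append(v)
    fv = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: fv[i])
        l, h = order[0], order[-1]            # best and worst vertices
        cen = [sum(simplex[i][j] for i in range(n + 1) if i != h) / n
               for j in range(n)]             # centroid excluding x_h
        fbar = f(cen)
        if math.sqrt(sum((fv[i] - fbar) ** 2 for i in range(n + 1)) / (n + 1)) < eps:
            break                             # spread of vertex values is small
        xr = [cen[j] + alpha * (cen[j] - simplex[h][j]) for j in range(n)]
        fr = f(xr)                            # reflection
        if fr < fv[l]:                        # new best: try expansion
            xe = [cen[j] + gamma * (xr[j] - cen[j]) for j in range(n)]
            fe = f(xe)
            if fe < fr:
                simplex[h], fv[h] = xe, fe
            else:
                simplex[h], fv[h] = xr, fr
            continue
        if fr < fv[h]:                        # accept reflection
            simplex[h], fv[h] = xr, fr
            continue
        xc = [cen[j] + beta * (simplex[h][j] - cen[j]) for j in range(n)]
        fc = f(xc)                            # contraction
        if fc > fv[h]:                        # shrink toward the best point
            for i in range(n + 1):
                if i != l:
                    simplex[i] = [simplex[i][j] + 0.5 * (simplex[l][j] - simplex[i][j])
                                  for j in range(n)]
                    fv[i] = f(simplex[i])
        else:
            simplex[h], fv[h] = xc, fc
    best = min(range(n + 1), key=lambda i: fv[i])
    return simplex[best], fv[best]
```

On the cantilever-beam function of Eq. (4.2.18) the sketch converges to the exact solution (−1/3, −1/2) from the starting point used in the examples of this section.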
σ = [ Σ_{i=0}^{n} (f_i − f̄)² / (n + 1) ]^{1/2} ,   (4.2.10)

and the associated coefficient of variation

cv = σ / f̄ .   (4.2.11)
The performance of this modified simplex method has been compared [10] with the simplex method proposed by Nelder and Mead, and also with more powerful methods such as the second order Davidon-Fletcher-Powell (DFP) method, which will be discussed later in this chapter. For high dimensional problems the modified simplex algorithm was found to be more efficient and robust than the DFP algorithm. Nelder and Mead [9] have also provided several illustrations of the use of their algorithm in minimizing classical test functions and compared its performance with Powell's conjugate directions method, which will be discussed next.
Powell's Conjugate Directions Method and its Subsequent Modification. Although most problems have functions which are not quadratic, many unconstrained minimization algorithms are developed to minimize a quadratic function. This is because a function can be approximated well by a quadratic function near a minimum.
Powell's conjugate directions algorithm is a typical example. A quadratic function in R^n may be written as

f(x) = (1/2) x^T Q x + b^T x + c .   (4.2.12)

A set of directions s_i, i = 1, 2, . . . are said to be Q-conjugate if

s_i^T Q s_j = 0 ,   for i ≠ j .   (4.2.13)
Furthermore, it can be shown that if the function f is minimized once along each
direction of a set s of linearly independent Q-conjugate directions then the minimum
4. If the criterion

|λ| < [ (f0 − fp) / Δf_m ]^{1/2}   (4.2.14)

is satisfied (where λ is the step length along the pattern direction s_p^k, f0 and fp are the values of f at the beginning of the cycle and after the move along the pattern direction, and Δf_m is the largest decrease in f along any single direction of the cycle), then use the same old directions again for the next univariate cycle (that is, do not discard any of the directions of the previous cycle in preference to the pattern direction s_p^k). If Eq. (4.2.14) is not satisfied, then replace the mth direction by the pattern direction s_p^k.
5. Begin the next univariate cycle with the directions decided in step 4, and repeat steps 2 through 4 until convergence to a specified accuracy. Convergence is assumed to be achieved when the Euclidean norm ‖x_{k+1} − x_k‖ is less than a prespecified quantity ε.
Although Powell's original method does possess a quadratic termination property, his modified algorithm does not [3]. The modified method will now be illustrated on the following simple example from structural analysis.
Figure 4.2.2 Tip loaded cantilever beam and its finite element model.
v(ξ) = [(1 − 3ξ² + 2ξ³)   l(ξ − 2ξ² + ξ³)   (3ξ² − 2ξ³)   l(−ξ² + ξ³)] {v1, θ1, v2, θ2}^T ,   (4.2.15)
where ξ = x/l. The corresponding potential energy of the beam model is given by
Π = (EI/2l³) ∫₀¹ (d²v/dξ²)² dξ + p v2 .   (4.2.16)
Because of the cantilever end condition at ξ = 0, the first two degrees of freedom in Eq. (4.2.15) are zero. Therefore, substituting Eq. (4.2.15) into Eq. (4.2.16) we obtain

Π = (EI/2l³)(12v2² + 4θ2²l² − 12v2θ2l) + p v2 .   (4.2.17)
Defining f = 2Πl³/EI, x1 = v2, x2 = θ2l, and choosing pl³/EI = 1, the problem of determining the tip deflection and rotation of the beam reduces to an unconstrained minimization of

f = 12x1² + 4x2² − 12x1x2 + 2x1 .   (4.2.18)

Starting with an initial point of x10 = (−1, −2)^T and f(x10) = 2, we will minimize f using Powell's conjugate directions method. The exact solution of this problem is at x* = (−1/3, −1/2)^T.
Choosing s11 = (1, 0)^T, we have

f(α) = 12(−1 + α)² + 4(−2)² − 12(−1 + α)(−2) + 2(−1 + α) .   (4.2.20)

Taking the derivative of Eq. (4.2.20) with respect to α, we obtain the value of α which minimizes f to be α = −1/12. Hence,

x11 = (−13/12, −2)^T   and   f(x11) = 1.916666667 .
Choosing s12 = (0, 1)^T, we obtain

x = (−13/12, −2)^T + α (0, 1)^T = (−13/12, −2 + α)^T ,   (4.2.21)
and

f(α) = 12(13/12)² + 4(−2 + α)² − 12(−13/12)(−2 + α) + 2(−13/12) ,   (4.2.22)
which is minimized at α = 3/8. Therefore, at the end of the univariate search we have

x12 = (−13/12, −13/8)^T ,   (4.2.23)

and the pattern direction

s1p = x12 − x10 = (−1/12, 3/8)^T .   (4.2.24)

Minimizing f along s1p starting from x12 yields the step λ = −9/49 and the point (−157/147, −83/49)^T, at which f = 1.319727891.
The direction that corresponds to the largest decrease in the objective function f during the first cycle of the univariate search is associated with the second variable. We can now decide whether we want to replace the second (m = 2) univariate search direction by the pattern direction or not by checking the condition stated in step 4 of the algorithm, Eq. (4.2.14). That is, Powell's criterion

|λ| = 9/49 = 0.1837 < [ (2 − 1.319727891) / (1.916666667 − 1.354166667) ]^{1/2} = 1.0997   (4.2.25)

is satisfied; therefore, we retain the old univariate search directions for the second cycle and restart the procedure by going back to step 2 of the algorithm. The results of the second cycle are tabulated in Table 4.2.1.
Table 4.2.1. Solution of the beam problem using Powell's conjugate directions method

Cycle No.    x1           x2           f
0           −1.0         −2.0          2.0
1           −1.083334    −2.0          1.916667
1           −1.083334    −1.625        1.354167
2           −0.895834    −1.625        0.9322967
2           −0.895834    −1.34375      0.6158854
2           −0.33334     −0.499999    −0.333333
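The cycle arithmetic of the table can be checked numerically. The sketch below uses the closed-form minimizing step for a quadratic (cf. Eq. (4.2.32)) along the two coordinate directions for two cycles, then moves along the pattern direction of the last cycle; all helper names are ours, not the book's.

```python
# Beam objective of Eq. (4.2.18) and its quadratic data f = x'Qx/2 + b'x.
def fq(x):
    return 12*x[0]**2 + 4*x[1]**2 - 12*x[0]*x[1] + 2*x[0]

Q = [[24.0, -12.0], [-12.0, 8.0]]
b = [2.0, 0.0]

def exact_step(x, s):
    """Exact minimizing step along s for a quadratic function."""
    grad = [sum(Q[i][j]*x[j] for j in range(2)) + b[i] for i in range(2)]
    num = grad[0]*s[0] + grad[1]*s[1]
    den = sum(s[i]*Q[i][j]*s[j] for i in range(2) for j in range(2))
    return -num / den

x = [-1.0, -2.0]
history = [list(x)]
for _ in range(2):                           # two univariate cycles
    for s in ([1.0, 0.0], [0.0, 1.0]):
        a = exact_step(x, s)
        x = [x[0] + a*s[0], x[1] + a*s[1]]
        history.append(list(x))
# pattern move along the net displacement of the last cycle
sp = [history[-1][0] - history[-3][0], history[-1][1] - history[-3][1]]
a = exact_step(x, sp)
x = [x[0] + a*sp[0], x[1] + a*sp[1]]
```

The intermediate points reproduce the table entries, and the final pattern move lands on the exact solution (−1/3, −1/2).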
The rate of change of f along a direction s is

df/dα = s^T ∇f ,   (4.2.27)

with s normalized so that

Σ_{i=1}^{n} s_i² = 1 .   (4.2.28)
It can easily be verified (see Exercise 6) that the steepest descent direction is given by

s = −∇f/‖∇f‖ ,   (4.2.29)
where ‖ · ‖ denotes the Euclidean norm, and it provides the largest decrease in the function f. Starting with a point x_k at the kth iteration of the minimization process, we obtain the next point x_{k+1} as

x_{k+1} = x_k + α s .   (4.2.30)
Here s is given by Eq. (4.2.29) and α is determined such that f is minimized along the chosen direction by using any one of the one-dimensional minimization techniques covered in the previous section. If the function to be minimized is quadratic in R^n and expressed as

f = (1/2) x^T Q x + b^T x + c ,   (4.2.31)

the step length can be determined directly by substituting Eq. (4.2.30) into Eq. (4.2.31) for the (k + 1)st iteration, followed by a minimization of f with respect to α, which yields

α = − (x_k^T Q + b^T) s / (s^T Q s) .   (4.2.32)
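For the quadratic beam function, the steepest descent iteration with the exact step of Eq. (4.2.32) can be sketched directly; the function name and iteration count are our assumptions:

```python
def steepest_descent(x0, iters=200):
    """Steepest descent with exact steps for the quadratic of Eq. (4.2.18)."""
    Q = [[24.0, -12.0], [-12.0, 8.0]]
    x = list(x0)
    for _ in range(iters):
        g = [24*x[0] - 12*x[1] + 2, 8*x[1] - 12*x[0]]   # gradient = Qx + b
        gg = g[0]*g[0] + g[1]*g[1]
        if gg < 1e-20:
            break
        s = [-g[0], -g[1]]                              # descent direction
        den = sum(s[i] * Q[i][j] * s[j] for i in range(2) for j in range(2))
        a = gg / den                                    # exact step, Eq. (4.2.32)
        x = [x[0] + a*s[0], x[1] + a*s[1]]
    return x
```

Note how many iterations the method needs here compared with the conjugate direction methods that follow; this is the linear convergence behavior discussed below.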
y1 = √12 (x1 − x2/2) ,   y2 = x2 .   (4.2.33)

The function f may now be expressed in terms of the new variables y1 and y2 as

f(y1, y2) = y1² + y2² + (1/√3) y1 + y2 .   (4.2.34)
The steepest descent direction now points directly to the point

y1 = −√3/6 ,   y2 = −1/2 ,

at which the gradient of f is zero, implying that it is a minimum point. The corresponding values of the original variables x1 and x2 are −1/3 and −1/2, respectively.
This simple demonstration clearly shows the effectiveness of scaling in convergence
of the steepest descent algorithm to the minimum of a function in Rn . It can be
shown [6] that the steepest descent method has only a linear rate of convergence in
the absence of an appropriate scaling.
Unfortunately, in most multivariable function minimizations it is not easy to determine the appropriate scaling transformation that leads to a one step convergence
to the minimum of a general quadratic form in Rn using the steepest descent algorithm. This would require calculating the Hessian matrix and then performing an
expensive eigenvalue analysis of the matrix. Hence, we are forced to look at other
alternatives for rapid convergence to the minimum of a quadratic form. One such
alternative is provided by minimizing along a set of conjugate gradient directions
which guarantees a quadratic termination property. Hestenes and Stiefel [12] and
later Fletcher and Reeves [13] offered such an algorithm which will be covered next.
Fletcher-Reeves Conjugate Gradient Algorithm. This algorithm begins from an initial point x0 by first minimizing f along the steepest descent direction, s0 = −∇f(x0) = −g0, to the new iterate x1. The direction for the next iteration
s1 must be constructed so that it is Q-conjugate to s0 where Q is the Hessian of
the quadratic f . The function is then minimized along s1 to yield the next iterate
x2 . The next direction s2 from x2 is constructed to be Q-conjugate to the previous
directions s0 and s1, and the process is continued until convergence to the minimum is achieved. By virtue of Powell's theorem on conjugate directions for quadratic functions, convergence to the minimum is theoretically guaranteed at the end of the minimization of the function f along the conjugate direction s_{n−1}. For functions which are not quadratic, conjugacy of the directions s_i, i = 1, . . . , n loses its meaning, since the Hessian of the function is not a matrix of constants. However, it is common practice to use this algorithm for non-quadratic functions. Since, for such functions, convergence to the minimum will rarely be achieved in n steps or less, the algorithm is restarted after every n steps. The basic steps of the algorithm at the (k + 1)th iterate are as follows.
1. Calculate x_{k+1} = x_k + α_{k+1} s_k, where α_{k+1} is determined such that

df(α_{k+1})/dα_{k+1} = 0 .

2. Compute the new direction

s_{k+1} = −g_{k+1} + (g_{k+1}^T g_{k+1} / g_k^T g_k) s_k ,   (4.2.36)

where

g_k = ∇f(x_k) .   (4.2.37)
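For the quadratic beam function the two steps above can be sketched directly; the exact line search uses the closed-form step for quadratics, and the function name is our assumption:

```python
def fletcher_reeves(x0, iters=10):
    """Conjugate gradient iteration with the ratio of Eq. (4.2.36),
    specialized to the beam quadratic of Eq. (4.2.18)."""
    Q = [[24.0, -12.0], [-12.0, 8.0]]
    grad = lambda x: [24*x[0] - 12*x[1] + 2, 8*x[1] - 12*x[0]]
    x = list(x0)
    g = grad(x)
    s = [-g[0], -g[1]]                       # first direction: steepest descent
    for _ in range(iters):
        den = sum(s[i] * Q[i][j] * s[j] for i in range(2) for j in range(2))
        if den <= 0.0:
            break
        a = -(g[0]*s[0] + g[1]*s[1]) / den   # exact line search for a quadratic
        x = [x[0] + a*s[0], x[1] + a*s[1]]
        g_new = grad(x)
        if g_new[0]**2 + g_new[1]**2 < 1e-20:
            break
        beta = (g_new[0]**2 + g_new[1]**2) / (g[0]**2 + g[1]**2)   # Eq. (4.2.36)
        s = [-g_new[0] + beta*s[0], -g_new[1] + beta*s[1]]
        g = g_new
    return x
```

Starting from (−1, −2) the first iterate and the ratio β1 = 0.4275 match the example worked in the text, and the second iterate reaches the exact minimum, illustrating quadratic termination in n = 2 steps.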
For the beam example of Eq. (4.2.18), starting from x0 = (−1, −2)^T, the first line search gives x1 = (−1.0961, −1.8077)^T and g1 = (−2.6154, −1.3077)^T, so that

β1 = g1^T g1 / g0^T g0 = 0.4275 ,   s1 = −g1 + β1 s0 = (1.76036, 3.0178)^T ,

and

x2 = (−1.0961, −1.8077)^T + α2 (1.76036, 3.0178)^T .
Since

s0^T Q s1 = (−2, 4) [24 −12; −12 8] (1.76036, 3.0178)^T ≈ 0 ,
we have verified the Q-conjugacy of the two directions s0 and s1. The progress of the minimization using this method is illustrated in Figure 4.2.3.
Beale's Restarted Conjugate Gradient Technique. In minimizing non-quadratic functions using the conjugate gradient method, restarting the method after every n steps is not always a good strategy. Such a strategy is insensitive to the nonlinear character of the function being minimized. Beale [14] and later Powell [15] have proposed restart techniques that take the nonlinearity of the function into account in deciding when to restart the algorithm. Numerical experiments with minimization of several general functions have led to the following algorithm by Powell [15].
1. Given x0, define s0 to be the steepest descent direction,

s0 = −∇f(x0) = −g0 ,   (4.2.38)

let k = t = 0, and begin iterations by incrementing k.
2. For k ≥ 1 the direction s_k is defined by Beale's formula [14]

s_k = −g_k + β_k s_{k−1} + γ_k s_t ,   g_k = ∇f(x_k) ,   (4.2.39)

where

β_k = g_k^T [g_k − g_{k−1}] / s_{k−1}^T [g_k − g_{k−1}] ,   (4.2.40)

and

γ_k = g_k^T [g_{t+1} − g_t] / s_t^T [g_{t+1} − g_t] ,   if k > t + 1 ,   (4.2.41)

γ_k = 0 ,   if k = t + 1 .   (4.2.42)

3. A restart (t = k − 1) is made when the iterations lose conjugacy, for example when

|g_{k−1}^T g_k| ≥ 0.2 ‖g_k‖² .   (4.2.43)
Expanding the gradient of a quadratic approximation of f about the current point x_k and setting it to zero gives

∇f(x_k) + Q(x − x_k) = 0 ,   (4.2.45)

so that

x = x_k − Q^{−1} ∇f(x_k) ,   (4.2.46)

where Q is the Hessian of the objective function. The general form of the update equation of Newton's method for minimizing a function in R^n is given by

x_{k+1} = x_k − α_{k+1} Q_k^{−1} ∇f(x_k) ,   (4.2.47)
where the step length α_{k+1} allows the pure Newton step to be moderated,

x_{k+1} = x_k + α_{k+1} s .   (4.2.48)

Rather than inverting Q_k, one may solve the linear system

Q_k s = −∇f(x_k)   (4.2.49)
to obtain the direction vector s. For every iteration (if Q is non-sparse), Newton's method involves the calculation of the n(n + 1)/2 elements of the symmetric Q matrix, and of the order of n³ operations for obtaining s from the solution of Eq. (4.2.49). It is this feature of Newton's method that has led to the development of methods known as quasi-Newton or variable-metric methods, which seek to use the gradient information to construct approximations for the Hessian matrix or its inverse.
Quasi-Newton or Variable Metric Algorithms. Consider the Taylor series expansion of the gradient of f around x_{k+1},

∇f(x_{k+1}) ≈ ∇f(x_k) + Q(x_{k+1} − x_k) .   (4.2.50)

Define

y_k = ∇f(x_{k+1}) − ∇f(x_k) ,   (4.2.51)

and

p_k = x_{k+1} − x_k .   (4.2.52)

A quasi-Newton method requires the approximation B_{k+1} of the inverse of the Hessian to satisfy

B_{k+1} y_k = p_k .   (4.2.53)

The new iterate is obtained from

x_{k+1} = x_k + α_{k+1} s_k ,   (4.2.54)

where

s_k = −B_k ∇f(x_k) ,   (4.2.55)

with B_k being a positive definite symmetric matrix.
Rank-One Updates. In the class of rank-one updates we have the well-known symmetric Broyden update [19] for B_{k+1}, given as

B_{k+1} = B_k + (p_k − B_k y_k)(p_k − B_k y_k)^T / [(p_k − B_k y_k)^T y_k] .   (4.2.56)
B_{k+1} = B_k − (B_k y_k y_k^T B_k)/(y_k^T B_k y_k) + φ_k (p_k p_k^T)/(p_k^T y_k) + θ_k v_k v_k^T ,   (4.2.57)

where

v_k = (y_k^T B_k y_k)^{1/2} [ p_k/(p_k^T y_k) − B_k y_k/(y_k^T B_k y_k) ] ,   (4.2.58)
and θ_k and φ_k are scalar parameters that are chosen appropriately. Updates given by Eqs. (4.2.57) and (4.2.58) are subsets of Huang's family of updates [20], which guarantee that B_{k+1} y_k = p_k. If we set θ_k = 0 and φ_k = 1 for all k, we obtain the Davidon-Fletcher-Powell (DFP) update formula [21, 22], which is given as
B_{k+1} = B_k − (B_k y_k y_k^T B_k)/(y_k^T B_k y_k) + (p_k p_k^T)/(p_k^T y_k) .   (4.2.59)
The DFP update formula preserves the positive definiteness and symmetry of the matrices B_k, and has some other interesting properties as well. When used for minimizing quadratic functions, it generates Q-conjugate directions and, therefore, at the nth iteration B_n becomes the exact inverse of the Hessian Q. Thus, it has the features of the conjugate gradient as well as the Newton-type algorithms. The DFP algorithm can be used without an exact line search in determining α_{k+1} in Eq. (4.2.54). However, the step length must guarantee a reduction in the function value, and must be such that p_k^T y_k > 0 in order to maintain positive definiteness of B_k. The performance of the algorithm, however, was shown to deteriorate as the accuracy of the line search decreases [20]. In most cases the DFP formula works quite successfully. In a few cases the algorithm has been known to break down because B_k became singular. This has led to the introduction of another update formula developed simultaneously
by Broyden, Fletcher, Goldfarb, and Shanno, known as the BFGS update,

B_{k+1} = B_k + (1 + y_k^T B_k y_k / p_k^T y_k)(p_k p_k^T)/(p_k^T y_k) − (p_k y_k^T B_k + B_k y_k p_k^T)/(p_k^T y_k) .   (4.2.60)

Equation (4.2.60) can also be written in a more compact manner as

B_{k+1} = (I − p_k y_k^T/(p_k^T y_k)) B_k (I − y_k p_k^T/(p_k^T y_k)) + (p_k p_k^T)/(p_k^T y_k) .   (4.2.61)
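A compact sketch of a quasi-Newton iteration using the expanded BFGS update of Eq. (4.2.60), with B0 = I and the line search supplied by the caller; the function names and the skip rule for non-positive p_k^T y_k are our assumptions:

```python
def bfgs(grad, line_search, x0, iters=10):
    """Quasi-Newton minimization with the BFGS inverse-Hessian update."""
    n = len(x0)
    B = [[float(i == j) for j in range(n)] for i in range(n)]   # B0 = I
    x = list(x0)
    g = grad(x)
    for _ in range(iters):
        if sum(gi * gi for gi in g) < 1e-18:
            break
        s = [-sum(B[i][j] * g[j] for j in range(n)) for i in range(n)]
        a = line_search(x, s)
        x_new = [x[i] + a * s[i] for i in range(n)]
        g_new = grad(x_new)
        p = [x_new[i] - x[i] for i in range(n)]       # step vector
        y = [g_new[i] - g[i] for i in range(n)]       # gradient change
        py = sum(p[i] * y[i] for i in range(n))
        if py <= 0.0:            # would destroy positive definiteness: skip update
            x, g = x_new, g_new
            continue
        By = [sum(B[i][j] * y[j] for j in range(n)) for i in range(n)]
        yBy = sum(y[i] * By[i] for i in range(n))
        B = [[B[i][j] + (1.0 + yBy / py) * p[i] * p[j] / py
              - (p[i] * By[j] + By[i] * p[j]) / py
              for j in range(n)] for i in range(n)]   # Eq. (4.2.60)
        x, g = x_new, g_new
    return x, B
```

On the beam quadratic with exact line searches this reproduces the example below: two iterations reach the exact minimum, and B becomes the exact inverse of the Hessian.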
Using A_{k+1} = B_{k+1}^{−1} and A_k = B_k^{−1}, we can invert the above formula to arrive at an update for the Hessian approximations. It is found that this update formula reduces to

A_{k+1} = A_k − (A_k p_k p_k^T A_k)/(p_k^T A_k p_k) + (y_k y_k^T)/(y_k^T p_k) ,   (4.2.62)
which is the analog of the DFP formula (4.2.59) with B_k replaced by A_k, and p_k and y_k interchanged. Conversely, if the inverse Hessian B_k is updated by the DFP formula, then the Hessian A_k is updated according to an analog of the BFGS formula.
It is for this reason that the BFGS formula is often called the complementary DFP formula. Numerical experiments with the BFGS algorithm [26] suggest that it is superior to all known variable-metric algorithms. We will illustrate its use by minimizing the potential energy function of the cantilever beam problem.
Example 4.2.4
Minimize f(x1, x2) = 12x1² + 4x2² − 12x1x2 + 2x1 by using the BFGS update algorithm with exact line searches, starting with the initial guess x0^T = (−1, −2).
We initiate the algorithm with a line search along the steepest descent direction. This is associated with the assumption that B0 = I, which is symmetric and positive definite. The resulting point was previously calculated in example 4.2.3 to be

x1 = (−1.0961, −1.8077)^T ,   and   ∇f(x1) = (−2.6154, −1.3077)^T .
From Eq. (4.2.52) we calculate

p0 = (−1.0961, −1.8077)^T − (−1, −2)^T = (−0.0961, 0.1923)^T ,

y0 = (−2.6154, −1.3077)^T − (2, −4)^T = (−4.6154, 2.6923)^T .
Substituting the terms

p0^T y0 = (−0.0961)(−4.6154) + (0.1923)(2.6923) = 0.44354 + 0.51773 = 0.96127 ,

p0 y0^T = [0.44354  −0.25873; −0.88754  0.51773] ,

into Eq. (4.2.61), we obtain

B1 = (I − p0 y0^T/0.96127)(I − y0 p0^T/0.96127) + (1/0.96127) [0.00923  −0.01848; −0.01848  0.03698]
   = [0.37213  0.60225; 0.60225  1.10385] .
Next, we calculate the new move direction from Eq. (4.2.55),

s1 = −[0.37213  0.60225; 0.60225  1.10385] (−2.6154, −1.3077)^T = (1.7608, 3.0186)^T ,
and obtain

x2 = (−1.0961, −1.8077)^T + α2 (1.7608, 3.0186)^T .

Setting the derivative of f(x2) with respect to α2 to 0 yields the value α2 = 0.4332055, and

x2 = (−0.3333, −0.5000)^T ,   with   ∇f(x2) ≈ (0, 0)^T .
This implies convergence to the exact solution. It is left to the reader to verify that if B1 is updated once more, we obtain

B2 = [0.1667  0.25; 0.25  0.5] ,

which is the exact inverse of the Hessian matrix

Q = [24  −12; −12  8] .
It can also be verified that, as expected, the directions s0 and s1 are Q-conjugate. Q-conjugacy of the directions of travel has meaning only for quadratic functions, and is guaranteed for such problems in the case of variable-metric algorithms belonging to Huang's family only if the line searches are exact. In fact, Q-conjugacy of the directions is not necessary for ensuring a quadratic termination property [26]. This realization has led to the development of methods based on the DFP and BFGS formulae that abandon the computationally expensive exact line searches. The line searches must be such that they guarantee positive definiteness of the A_k or B_k matrices while reducing the function value appropriately. Positive definiteness is
guaranteed if

p_k^T y_k > 0 ,   (4.2.63)

and the accuracy of the line search is controlled by requiring

|s^T ∇f(x_{k+1})| / |s^T ∇f(x_k)| < 0.9 .   (4.2.64)
The convergence of the BFGS algorithm under these conditions has been studied by Powell [27]. Similar convergence studies with Beale's restarted conjugate gradient method under the same two conditions have been carried out by Shanno [28].
4.2.4 Applications to Analysis
∇f(x) = g(x) = 0 ,   (4.2.65)

where the Hessian of f and the Jacobian of g are the same. In cases where the problems are posed directly as

g(x) = 0 ,   (4.2.66)
Dennis and Schnabel [6] and others solve Eq. (4.2.66) by minimizing the nonlinear least squares function

f = (1/2) g^T g .   (4.2.67)
In this case, however, the Hessian of f and the Jacobian of g are not identical but a
positive definite approximation to the Hessian of f appropriate for most minimization schemes can be easily generated from the Jacobian of g [6]. Minimization of f
then permits the determination of not only stable but also unstable equilibrium configurations provided the minimization does not converge to a local minimum. In the
case of convergence to a local minimum, certain restart [6] or deflation and tunnelling
techniques [29, 30] can be invoked to force convergence to the global minimum of f
at which kgk = 0.
The update information is stored in the form of the vectors

p_k = x_{k+1} − x_k ,   (4.3.1)

and

y_k = ∇f(x_{k+1}) − ∇f(x_k) ,   (4.3.2)

and reintroduced to compute the new search directions. After a sequence of five to ten iterations during which the BFGS updates are used, the stiffness matrix is recomputed and the update information is deleted.
Sparse updates for solving large-scale problems were perhaps first proposed by Schubert [32], who proposed a modification of Broyden's method [33] according to which the ith row of the Hessian A_{k+1} is updated by using

A_{k+1}^{(i)} = A_k^{(i)} + [(y_k − A_k p_k)_i / (p̂_k^T p̂_k)] p̂_k^T ,   (4.3.3)

where p̂_k is obtained from p_k by zeroing the components that correspond to known zero entries of the ith row.
∂g_i/∂x_j = [g_i(x0 + h_j e_j) − g_i(x0)] / h_j ,   (4.3.4)
where e_j is the jth coordinate vector and h_j is a suitable step size. Each step size may be adjusted such that the greatest ratio of the round-off to truncation error for any column of the Hessian falls within a specified range. However, such an adjustment of step sizes would require a significantly large number of gradient evaluations. Hence, to economize on the number of gradient evaluations, the step sizes are not allowed to leave the range

[max(η|x_j|, ε h_j^u), h_j^u] ,   (4.3.5)

where η is the greatest relative round-off in a single operation, ε is the relative machine precision, and h_j^u is an upper bound on h_j [36].
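The forward-difference estimate of Eq. (4.3.4) can be sketched as a column-by-column Jacobian builder; the function name and the fixed step size (in place of the adaptive h_j just discussed) are our simplifying assumptions:

```python
def fd_jacobian(g, x0, h=1e-6):
    """Forward-difference estimate of the Jacobian of g, probing one
    coordinate direction e_j at a time as in Eq. (4.3.4)."""
    g0 = g(x0)
    J = [[0.0] * len(x0) for _ in g0]
    for j in range(len(x0)):
        xp = list(x0)
        xp[j] += h
        gp = g(xp)
        for i in range(len(g0)):
            J[i][j] = (gp[i] - g0[i]) / h
    return J
```

When g is the gradient of the beam function of Eq. (4.2.18), this recovers its Hessian Q; the sparsity-exploiting CPR and Powell-Toint strategies reduce the number of such gradient evaluations by probing several columns at once.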
Powell and Toint [37] extended the CPR strategy to exploit the symmetry of the Hessian. They proposed two methods, one of which is known as the substitution method. According to this, the CPR strategy is first applied to the lower triangular part, L, of the symmetric Hessian, A. Because not all the elements of A computed this way will be correct, the incorrect elements are corrected by a back-substitution scheme. Details of this back-substitution scheme may be found in Ref. 37.
The Powell-Toint (PT) strategy of estimating sparse Hessians directly appears to be a much better alternative to Toint's sparse update algorithm [38]. One major drawback of Toint's update algorithm is that the updated Hessian approximation is not guaranteed to remain positive definite even if the initial Hessian approximation was positive definite.
4.3.2 Coercion of Hessians for Suitability with Quasi-Newton Methods
In minimizing a multivariable function f using a discrete Newton method or Toint's update algorithm, we must ensure that the Hessian approximation is positive
(4.4.4)
where P = 0.99 for S > 100, and P = 0.995 for S < 100. For discrete valued variables there are often many options for defining the neighborhood of the design. One possibility is to define it as all the designs that can be obtained by changing one design variable to its next higher or lower value. A broader immediate neighborhood can be defined by changing more than one design variable to its next higher or lower value. For an n variable problem, the immediate neighborhood has

S = 3^n − 1 .   (4.4.5)
A common choice is to reduce the temperature by a constant factor,

T_{k+1} = c T_k ,   k = 0, 1, 2, . . . , K ,   (4.4.6)

where 0.5 ≤ c ≤ 0.95. Nahar [54] fixes the number of decrement steps K, and suggests determination of the values of the T_k experimentally. It is also possible to divide the interval [0, T0] into a fixed number K of steps and use

T_k = ((K − k)/K) T0 ,   k = 1, 2, . . . , K .   (4.4.7)
As an example, consider the bit string

0110 | 101 | 11 | 1011   →   x1 x2 x3 x4 ,   (4.4.9)

where string equivalents of the individual variables are connected head-to-tail and, in this example, the base 10 values of the variables are x1 = 6, x2 = 5, x3 = 3, x4 = 11, and their ranges correspond to 0 ≤ x1, x4 ≤ 15, 0 ≤ x2 ≤ 7, and 0 ≤ x3 ≤ 3. Because of the bit string representation of the variables, genetic algorithms are
ideally suited for problems where the variables are required to take discrete or integer values. For problems where the design variables are continuous within a range x_i^L ≤ x_i ≤ x_i^U, one may need to use a large number of bits to represent the variables to high accuracy. The number of bits needed depends on the accuracy required for the final solution. For example, if a variable is defined in the range 0.01 ≤ x_i ≤ 1.81 and the accuracy needed for the final value is x_incr = 0.001, then the number of binary digits needed for an appropriate representation can be calculated from

2^m ≥ (x_i^U − x_i^L)/x_incr + 1 ,   (4.4.10)

where m is the number of digits. In this example, the smallest number of digits that satisfies the requirement is m = 11, which actually produces increments of 0.00087 in the value of the variable, instead of the required value of 0.001.
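The bit-count calculation of Eq. (4.4.10) and the increment it actually realizes can be sketched in a few lines; the function names are our assumptions:

```python
def bits_needed(x_low, x_high, x_incr):
    """Smallest m satisfying 2**m >= (x_high - x_low)/x_incr + 1."""
    m = 0
    levels = (x_high - x_low) / x_incr + 1.0
    while 2 ** m < levels:
        m += 1
    return m

def actual_increment(x_low, x_high, m):
    """Increment realized by an m-bit encoding of the range."""
    return (x_high - x_low) / (2 ** m - 1)
```

For the range 0.01 to 1.81 with a requested increment of 0.001, this yields m = 11 and a realized increment of about 0.00088, as in the text.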
Unlike the search algorithms discussed earlier that move from one point to another in the design variable space, genetic algorithms work with a population of strings. For example, if the two parent strings

0 1 1 0 1 | 0 1 1 1
0 1 0 0 1 | 0 0 0 1   (4.4.11)
(4.4.11)
are mated with a crossover point of k = 5, the offsprings will have the following
composition,
offspring 1:
011010001
.
(4.4.12)
offspring 2:
010010111
Multiple point crossovers, in which information between the two parents is swapped among more string segments, are also possible, but because of the mixing of the strings the crossover becomes a more random process, and the performance of the algorithm might degrade, De Jong [60]. An exception to this is the two-point crossover. In fact, the one point crossover can be viewed as a special case of the two point crossover in which the end of the string is the second crossover point. Booker [61] showed that by choosing the end-point of the segment to be crossed randomly, the performance of the algorithm can actually be improved.
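The one-point crossover above amounts to swapping string tails, which can be sketched in one line per offspring:

```python
def one_point_crossover(parent1, parent2, k):
    """Swap the tails of two parent bit strings after crossover point k."""
    return parent1[:k] + parent2[k:], parent2[:k] + parent1[k:]
```

Applied to the parent strings of Eq. (4.4.11) with k = 5, it reproduces the two offspring of Eq. (4.4.12).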
Mutation serves the important task of preventing premature loss of important genetic information by the occasional introduction of random alterations of a string. As
4.5 Exercises
m = A1/A2 ,   λ = l1/l2 ,   θ1 = h/l1 ,   p̄ = p/(EA2) ,

where E is the elastic modulus and A1 and A2 are the cross-sectional areas of the bars. Using the BFGS algorithm determine the equilibrium configuration in terms of x1 and x2 for m = 5, λ = 4, θ1 = 0.02, p̄ = 2 × 10^−5. Use x0^T = (0, 0).
5. Continuing the analysis of problem 4, it can be shown that the critical load p_cr at which the shallow truss is unstable (snap-through instability) is given by

p_cr = E A1 A2 (λ + 1)² θ1³ / [3√3 (A1 + A2) λ³] .
Suppose now that pcr as given above is to be maximized subject to the condition that
A1 l1 + A2 l2 = v0 = constant .
The exterior penalty formulation of Chapter 5 reduces the above problem to the
unconstrained minimization of
p̃_cr(A1, A2, r) = −E A1 A2 (λ + 1)² θ1³ / [3√3 (A1 + A2) λ³] + r (A1 l1 + A2 l2 − v0)² ,
6. Show that the direction s that minimizes s^T ∇f subject to the normalization condition

Σ_{i=1}^{n} s_i² = 1

is the steepest descent direction

s = −∇f/‖∇f‖ .   (4.5.1)
4.6 References
[1] Kamat, M.P. and Hayduk, R.J., "Recent Developments in Quasi-Newton Methods for Structural Analysis and Synthesis," AIAA J., 20 (5), 672-679, 1982.
[2] Avriel, M., Nonlinear Programming: Analysis and Methods. Prentice-Hall, Inc., 1976.
[3] Powell, M.J.D., "An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives," Computer J., 7, pp. 155-162, 1964.
[4] Kiefer, J., "Sequential Minimax Search for a Maximum," Proceedings of the American Mathematical Society, 4, pp. 502-506, 1953.