VII. NUMERICAL METHODS

E. M. L. Beale
C-E-I-R Ltd., London

Contents

I. THE MINIMIZATION OF A NONLINEAR FUNCTION OF SEVERAL VARIABLES WITHOUT CONSTRAINTS  135
II. AN INTRODUCTION TO BEALE'S METHOD OF QUADRATIC PROGRAMMING  143
III. THE PRACTICAL VERSION OF BEALE'S METHOD OF QUADRATIC PROGRAMMING  154
IV. THE INVERSE MATRIX METHOD FOR LINEAR AND QUADRATIC PROGRAMMING  164
V. SEPARABLE PROGRAMMING  173
VI. PARAMETRIC SEPARABLE PROGRAMMING AND INTERPOLATION PROCEDURES  182
VII. METHODS OF APPROXIMATION PROGRAMMING  189
VIII. DECOMPOSITION AND PARTITIONING METHODS FOR NONLINEAR PROGRAMMING  197
REFERENCES  204

I. THE MINIMIZATION OF A NONLINEAR FUNCTION OF SEVERAL VARIABLES WITHOUT CONSTRAINTS

I.1. Introduction

This first chapter covers some aspects of the problem of minimizing a nonlinear function of several variables in the absence of constraints. This may seem to be out of order, since mathematical programming is specifically concerned with methods of handling constraints. But as a mathematical programming problem becomes more and more nonlinear, the fact that it involves constraints becomes less and less important. One is therefore building on a foundation of sand if one goes straight into a discussion of the special problems introduced by the constraints without first reviewing methods of handling the corresponding problems in the absence of constraints. This is especially true since the developments in computer capabilities in the last few years have stimulated research in methods of solving minimization problems without constraints in the same way as they have stimulated research in nonlinear programming.

Almost any numerical problem can be expressed as the minimization of a function of several variables; but this course deals specifically with iterative methods of finding local minima, so it seems logical to restrict our attention to such methods when dealing with unconstrained problems. Both here and later, when dealing with programming problems, one is particularly happy with methods for finding local minima when the problem is known to be convex, since the local minimum must then be a global minimum. Many real problems are not convex, but one may still be content with a local minimum for one of three reasons:

(a) because the nonconvex elements in the problem are in the nature of small perturbations, so that it seems unlikely that they could introduce unwanted local minima,
(b) because one feels intuitively, perhaps from a knowledge of the real problem represented, that it will not have unwanted local minima (at least if one starts from a good initial solution), or
(c) because the only practical alternative to a local minimum is a completely arbitrary solution.

Mathematicians have concentrated on methods for finding global minima in convex problems. Many of these can also be used to find local minima in nonconvex problems. Others cannot, and are therefore much less useful in practice. The only method discussed in this part of the course that requires any convexity assumptions is decomposition.

Now the essence of an iterative method is that one has a trial solution to the problem and looks for a better one. There are three broad classes of such methods for minimization problems. There are quadratic methods, which use estimates of the first and second derivatives of the objective function in the neighbourhood of the current trial solution. There are linear methods, which use first but not second derivatives. And there are directional methods, which use no derivatives.
(The first derivatives form a vector known as the gradient vector. The second derivatives can be written as a symmetric square matrix known as the Hessian.)

In a sense quadratic methods are the most natural. Any twice differentiable function can be approximated by a quadratic function in the neighbourhood of an unconstrained minimum. To ensure rapid convergence in the closing stages of an iterative procedure, it is therefore natural to require that the procedure should converge in a finite number of steps if the objective function is truly quadratic. This can easily be accomplished with a quadratic method. On the other hand one can use the concept of "conjugate directions" to achieve the same result with other methods. These methods are a little more complicated, and the number of steps is increased; but the work per step may be much less, since a function of p variables has p first derivatives and ½p(p+1) second derivatives.

This chapter is concerned with quadratic methods. They are appropriate when second derivatives can be estimated fairly easily, and in particular — for reasons discussed below — when the objective function is a sum of squares. Other methods are discussed by Dr. Wolfe elsewhere in the course.

I.2. Gauss's method for sums of squares problems

Let us now consider problems in which the objective function is a sum of squares. The problem is then to minimize

    S = \sum_i z_i^2 ,

where the z_i are nonlinear functions of the independent variables of the problem. This type of problem arises in a statistical context when one is estimating the parameters of a nonlinear model by the method of least squares. The z_i are then the residuals, i.e. the deviations between the observed and fitted values of the observations. A similar situation arises when solving nonlinear simultaneous equations. This problem may be formulated as one of minimizing the sum of squares of the residuals between the left and right hand sides of the equations. We then know that the minimum value of S is zero, which may be useful, but otherwise the problems are the same.

The importance of this form of the expression for the objective function is that if we take linear approximations, not to S itself but to the individual components z_i, then we can use these to define a quadratic approximation to S. For if the variables are x_1, ..., x_p, our trial solution is given by x_j = x_{j0}, and a_{ij} denotes the first derivative of z_i with respect to x_j at the trial solution, then, writing \Delta x_j for x_j - x_{j0} and z_{i0} for z_i(x_{10}, ..., x_{p0}), we have

    S \approx S_{app} = \sum_i (z_{i0} + a_{i1}\Delta x_1 + \cdots + a_{ip}\Delta x_p)^2
                      = b_{00} + 2\sum_j b_j \Delta x_j + \sum_j \sum_k b_{jk}\,\Delta x_j \Delta x_k ,

where

    b_{00} = \sum_i z_{i0}^2 ,   b_j = \sum_i z_{i0} a_{ij} ,   b_{jk} = \sum_i a_{ij} a_{ik} .

We can now find the values of the \Delta x_j that minimize S_{app} by solving the "normal equations" of multiple regression in the usual way; and these define our next trial solution to the problem. This approach is due to Gauss.

It should be noted that the quadratic approximation S_{app} is not precisely the one that one would obtain by computing the first and second derivatives of S. For if one expands S as a power series in the \Delta x_j, one will get some quadratic terms from the expansions of the individual quantities z_i. In fact if

    z_i = z_{i0} + \sum_j a_{ij} \Delta x_j + \sum_j \sum_k a_{ijk}\,\Delta x_j \Delta x_k + \cdots

(with the a_{ijk} taken symmetric in j and k), we find that

    S = c_{00} + 2\sum_j c_j \Delta x_j + \sum_j \sum_k c_{jk}\,\Delta x_j \Delta x_k + \cdots ,

where

    c_{00} = \sum_i z_{i0}^2 ,   c_j = \sum_i z_{i0} a_{ij} ,   c_{jk} = \sum_i ( a_{ij} a_{ik} + 2 z_{i0} a_{ijk} ) .

The question whether one should use the c_{jk} or the b_{jk} as the quadratic terms in the approximate expression for S was discussed by Wilson and Puffer [1].
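To make the mechanics of Gauss's procedure concrete, the following sketch carries out a few iterations on a small problem of the nonlinear-simultaneous-equations type mentioned above. The residual functions, the starting point and the use of first differences for the a_{ij} are illustrative assumptions; this is a minimal sketch, not a production least-squares code.

```python
# Minimal sketch of Gauss's method for minimizing S = sum_i z_i(x)^2.
# The two residuals below come from the hypothetical equations
#   x1^2 + x2^2 = 4,   x1 * x2 = 1,
# so the minimum value of S is zero, as noted in the text.
import numpy as np

def residuals(x):
    return np.array([x[0]**2 + x[1]**2 - 4.0,
                     x[0]*x[1] - 1.0])

def derivatives(x, h=1e-6):
    # a_ij estimated from first differences of the z_i
    z0 = residuals(x)
    a = np.empty((z0.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h
        a[:, j] = (residuals(xh) - z0) / h
    return z0, a

def gauss_step(x):
    z0, a = derivatives(x)
    b_j  = a.T @ z0                    # b_j  = sum_i z_i0 a_ij
    b_jk = a.T @ a                     # b_jk = sum_i a_ij a_ik
    dx = np.linalg.solve(b_jk, -b_j)   # normal equations for the Delta x_j
    return x + dx

x = np.array([2.0, 0.5])               # trial solution x_j0
for _ in range(6):
    x = gauss_step(x)
print(x, residuals(x))                 # converges to a root near (1.932, 0.518)
```

The coefficient matrix b_{jk} formed in gauss_step is simply a'a and is therefore automatically positive (semi-)definite; this is the property of the b_{jk} that Wilson and Puffer emphasize in the discussion that follows.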
They point out that, quite apart from the labour of computing the quan- tities ajj,, there are advantages in using the by, while optimizing - for example the matrix (b;,) is certainly positive-definite, so the turning point obtained by equating the derivatives to zero is definitely a minimum of the quadratic approximation to S. This is a useful property not necessarily shared by the matrix (cj,). And since ¢; = 6j, if one has a trial solution for which Sipp is minimized with all Ax; = 0, then S is minimized. Wilson and Puffer suggest that if in a statistical problem one wants to quote approximate standard errors for the estimates of the x;, then one should derive these from the matrix (¢,) rather than (b;,). But this problem lies outside the scope of this course. The theoretical statistical problems involved are discussed by Beale [2]. It is, perhaps, surprising that Wilson and Puffer’s work has not been rediscovered and republished by someone else in the last 30 years. But as far as I know this point has not received detailed discussion in print since that time. 1.3 Interpolation methods The procedure outlined above will work well for nearly linear problems, particularly if there are not many variables. But otherwise it may fail to converge. One way to overcome this problem is to regard the elementary theory as simply defining a direction from the current trial solution in which to seek the next point. One then interpolates, or extrapolates, along this line to find the point minimizing S. This procedure was suggested by Box [3], and implemented in a computer program for nonlinear estimation on the NUMERICAL METHODS 139 IBM 704 computer written under his direction. It is discussed in more detail by Hartley [4]. There are various points in favour of this procedure. Theoretically it is bound to converge if the quantities z; are differentiable (once), since S starts to decrease as one moves from the present trial solution towards the suggested new one. Furthermore there is often a fair amount of work involved in computing the derivatives (the quantities a,;) at a trial solution, $0 it is sensible to do some exploration to improve the effectiveness of the step taken as a result of this work. On the other hand, if one is not going all the way to the point indicated by the quadratic approximation, there is no particular reason to go in this direction. The point is illustrated in 2 dimensions in the following diagram. The point P defines a trial solution, and the cllipses are contours of constant values of the quadratic approximation S,,, to S. This quadratic approximation is minimized at the point P’, and the broken line indicates the set of points considered by the Box-Hartley technique as candidates for the next trial solution. But if it turns out to be impossible to move more than a small distance from P in the selected direction before the true value of S starts to increase, then it would seem more logical to move not towards P’ but in the gradieut direction of the function S. In our example the gradient direction is indicated by the arrow. This direction may be up to 89.5° away from the direction PP’, and in high-dimensional problems it often is. Marquardt [5] reports that, having monitored this angle for a variety of problems, he found it usually lay between 80° and 90°. The difficulty arising from such a large deviation from the gradient direction is particularly serious for problems where the derivatives aj are estimated numerically from first differences of the function z;. 
One then has to accept errors due to using nonlocal derivatives if the step-length is large, or alternatively rounding-off errors due to taking the difference between two nearly equal quantities if the step-length is small. These errors often result in failure to find a better trial solution at all if one only looks in a direction that even theoretically is inclined at a high angle to the gradient direction. 140 E. M, L. BEALE 14 The Levenberg method A suitable method of resolving this difficulty is given by Levenberg [6] and elaborated by Marquardt [5]. The argument is as follows. Suppose that, instead of trying to minimize S,,, directly, one asked for the point minimizing S,,, conditional on it being not more than a given distance from the origin (i.e. the point where all 4x; = 0). The logic behind this request is that one wants to remain fairly close to P because the accuracy of the quadratic approximation will tend to dectease the further away one goes. Then the procedure is of course to minimize, not Sp), but P Seno = Supp t2¥ (4x,)", it where the Lagrange multiplier 4 must be chosen so that the resulting point is the required distance from P. Computationally this is an easy thing to do; one simply adds / to all the diagonal elements of the normal equations before solving for the Ax;. The only real difficulty is to choose 2. It is clear that if 2 = 0 one goes the whole way to the point P’, while as 2 — oo one goes to a point an arbitrarily short distance from P along the gradient direction. One is liable to find that an intermediate value produces a result very near to one of these extremes, and it is generally not worthwhile to try out several values of / at one step. But this difficulty can be mitigated by allowing the program to adjust the value of 2 according to the progress of the calculations. The very best method of modifying 2 is not entirely clear. One approach that has proved satisfactory in practice is as follows. First one defines a standard value of 4, say 2*, which is a suitably small number. Just how small this is depends on the scales on which the variables are measured. And it should be noted that the variables must be scaled to be commensurable in some sense — this scaling can be done either by a physical knowledge of the problem, or numerically, e.g. by choosing scales so that by = 1 for all j, in which case a suitable standard value of 4* may be around 0.0001. Then one defines a scale of used values of 4 equal to (2"—1)/ *by for n = 0, 1,2,-°+. The multiplication by bp makes the procedure independent of the scale on which the objective function (or dependent variable) is measured, Then the basis of the rule for choosing A is the idea that if the new point proves better than the old,then one moves to the new point and reduces NUMERICAL METHODS 141 n by 1, while if the new point proves worse than the old, then one remains at the old point and increases by 1. This seems to be right in principle, since one likes to use as small a value of mas circumstances permit, to give as nearly true quadratic convergence as possible. On the other hand if one is in trouble one must take a smaller step, and this is achieved by increasing n. But one can have situations in which the solution progresses in a rather erratic manner with a low value of # and in which one can do better with a larger n. This simple rule has therefore been modified in two respects. 
Firstly, if 2 is increased, one notes the reduction in S achieved at the last iteration, and if one achieves an even greater reduction with a higher value of n one increases n by another unit for the next iteration. Secondly, after an iteration with n = 0, ie. 4 = 0, one always increases n (to 1) for the next iteration, to try the effect of a positive Levenberg parameter. Other variants of the scheme will be appropriate in other situations. In particular it will often be desirable to use previously calculated values of the derivatives, ic. of the a,,, only recalculating the z;o at every iteration. One will then recalculate the aj; (a) if one fails to make progress at some iteration, or (b) if one has used a set of derivatives for many iterations without con- verging satisfactorily. LS Miscellaneous practical points A few miscellaneous practical points are perhaps worth noting. Firstly, it is entirely feasible to combine the use of the interpolation scheme with the Levenberg scheme. This is desirable in 2 situations: (a) if there are several independent variables, so that it is a nontrivial matter to recalculate the next trial point from the normal equations, or (b) if it is not feasible (or economical) to store the old derivatives a, so that there is appreciably more function-calculation involved in making a complete new step rather than exploring along a line. Secondly, it is desirable to solve the normal equations by inverting the matrix of sums of squares and products, pivoting on the diagonal elements as in step-wise multiple linear regression. One need only work with half the matrix if one keeps it symmetrical by adopting the appropriate sign conven- tion, indicated for example by Stiefel [7] on page 65. The advantage of this 142 E. M, L, BEALE procedure is that one can (and should) refuse to pivot on a diagonal element that has become very small. This just means that one does not consider changing the trial value of this particular variable while there is little independent evidence concerning its value. This difficulty is likely to arise in practice only when the Levenberg parameter is zero. Thirdly, it is often desirable to take the quantities 2; in groups, since the formulae for their values in terms of the independent variables may have common features that would otherwise have to be recalculated. This situation arises in particular if the quantities in a group refer to the same physical quantity at different times. Finally, it should be borne in mind that one does not necessarily have to make a sharp choice between computing derivatives theoretically and numerically. For example one may decide that most derivatives can most easily be computed numerically in spite of the rounding-off problems involved, but some may be self-evidently zero, while others can be computed in a “wholesale fashion”, using intermediate results obtained while comput- ing other derivatives. ‘NUMERICAL METHODS 143 Il. AN INTRODUCTION TO BEALE’S METHOD OF QUADRATIC PROGRAMMING 1.1 Introduction Historically the first venture into the theory of nonlinear programming has been to the problem known as quadratic programming. This name is restricted to the specific problem of minimizing a convex quadratic objective function of variables subject to linear constraints. In accordance with the philosophy outlined in Chapter I, | would extend this definition to include the problem of finding a local minimum of any quadratic objective function of variables subject to linear constraints. 
But many people will object to this generalization on the grounds that most of the methods that have been proposed for quadratic programming require that the objective function be convex: indeed some require that it be strictly convex. Many methods for quadratic programming have been published. Kiinzi and Krelle [8] discuss 7 of them, in addition to 3 versions of the applica- tion of gradient methods to quadratic programming. Since that time a number of variants of Wolfe’s [9] method have been published. I am not going to make any pretence of being impartial between these methods, I will content myself with explaining how my own method works and why I think it has great advantages over all other methods. T think it is natural that quadratic programming should have received so much attention from theorists. Mathematically, it is the natural first extension beyond the realm of linear programming; and it has the great mathematical advantage of being solvable in a finite number of steps. In practice it has not been used very extensively, and I think there are 3 main reasons for this. (a) In practical problems the nonlinearities in the constraints are often more important than the nonlinearities in the objective function. (b) When one has a problem with a nonlinear objective function there are generally rather few variables that enter the problem nonlinearly. But most methods for quadratic programming work no more simply in this case 144 E, M. L. BEALE than with a completely general quadratic objective function, and (c) Quadratic perturbations on a basically linear problem may well not be convex, although one may be fairly confident that, since they are perturba- tions, they will not introduce local minima. But most methods for quadratic programming cannot cope with such problems. Of these difficulties, the last does not apply to any version of Beale’s algorithm. The second does not apply to the second version of the algorithm discussed in the next chapter. Quadratic programming may therefore become more widely used now that this algorithm has been implemented in at least one general mathematical programming system for a large computer. The first difficulty is more fundamental. But one should note that the natural way to solve an unconstrained minimization problem is to make the simplest meaningful local approximation, This is to make linear approx- imations to the derivatives of the objective function in the neighbourhood of one’s trial solution, ie. to make a quadratic approximation to the objective function itself. Applying the same philosophy to constrained ‘minimization problems, one would naturally make linear approximations to the constraints and quadratic approximations to the objective function. T will return to the point in Chapter VII, when in particular I will show how one can throw the local nonlinearities in the constraints into the objective function. This is an essential part of the implementation of this philosophy. Having indicated the status of quadratic programming from the point of view of someone interested in applications of nonlinear programming, J now turn to Beale’s method for this problem. This exists in 2 versions. The first was originally published in Beale [10] and amplified in Beale [11]. The second is simply a streamlined way of organizing the computations following the logic of the first method. It was introduced in Beale [11] on page 236, and is discussed in detail in the following chapter. 
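In the linear case this stopping point is found by the familiar ratio test over the equations x_i = a_{i0} + \sum_k a_{ik} z_k for the basic variables. A minimal sketch follows; the small tableau in it is a hypothetical example, not one from the text.

```python
# Sketch of the ratio test that limits the increase of a nonbasic variable z_p.
# Basic variables are held as x_i = a_i0 + sum_k a_ik z_k; increasing z_p is
# blocked by the first basic variable driven down to zero.
a_i0 = [3.0, 2.0, 4.0]      # current values of the basic variables
a_ip = [-1.0, 0.5, -2.0]    # coefficients of the entering variable z_p

# Only rows with a_ip < 0 limit the increase: x_i reaches zero at z_p = a_i0 / (-a_ip).
limits = [(a_i0[i] / -a_ip[i], i) for i in range(len(a_i0)) if a_ip[i] < 0]
if limits:
    step, blocking_row = min(limits)
    print(step, blocking_row)   # 2.0, row 2: that basic variable becomes nonbasic
else:
    print("unbounded: z_p can be increased indefinitely")
```

When C is quadratic, the same test is merely supplemented by the possibility, discussed next, that the derivative of C with respect to z_p vanishes before any basic variable does.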
Perhaps the most important thing about this version of the algorithm is that it is particu- arly convenient for the product form of inverse basis method of carrying out the simplex calculations. This aspect is covered in the chapter on the inverse matrix method. This chapter is concerned with the basic logic of Beale’s method, which can be explained most easily in terms of the original version. IL2 The simplex method J will start by describing the simplex method in a way that applies to its use in linear programming and to its use in Beale quadratic programming. NUMERICAL METHODS 145 Let me point out that this differs from Wolfe’s version of the simplex method for quadratic programming. Wolfe [12] remarks that Beale’s method has a better claim to this name. There are 2 main reasons for this. (a). It was published some years earlier as an application of the simplex method to quadratic programming, and (b) it reduces to the ordinary simplex method when the objective function is linear. On the other hand the development of Wolfe’s method to a point where i¢ can handle purely linear problems yields an algorithm in which each step of the simplex method has to be performed twice (once on the original problem and once on its transpose). This last point is important, since it suggests that Wolfe’s method will take about twice as many steps as Beale’s on a nearly linear problem. But this is by the way. Suppose that we want to find a minimum, or at least a local minimum, of some objective function C of n variables x, that must be nonnegative and satisfy the m (<7) linearly independent equations, or constraints D Aus Fa If these constraints are not inconsistent, we can find a basic feasible solution in which m of these variables takes positive values and the others are zero. We can then use the constraints to express the basic variables, i.e. those that are positive, in terms of the nonbasic variables. We will then have sm). Xn = Giot Daze (his sims 1 where =, denotes x,,,,,- It is customary to write these equations in tableau form, corresponding to the coefficients of the equations when the z, are written on the left hand sides. The above form was introduced in Beale [13]. A. W. Tucker has recently suggested writing the equations in the form y= yo Yan —20), so that the numerical coefficients have the same signs as in the traditional tableau. This modification obviously has merit, but | find it rather cumber- some and have not adopted it. Let us now return to the description of the logic of the simplex method. 146 E. M. L. BEALE In the present trial solution of the problem, the basic variables x, equal ayo, and the nonbasic variables are all zero. One can now use the equations for the basic variables to express the objective function C in terms of the nonbasic variables. And one can then consider the partial derivative of C with respect to any one of them, say Z,, assuming that all the other nonbasic variables remain fixed and equal to zero. If now @C/dz, = 0, then a small increase in z,, with the other nonbasic variables held equal to zero, will not reduce C. But if @C/dz, < 0, then a small increase in z, will reduce C. If C is a linear function, then 0C/éz, is constant, and it will be profitable to go on increasing z, until one has to stop to avoid making one of the basic variables negative. 
Tf C is a nonlinear function with continuous derivatives, then it will be profitable to go on increasing z, until either (a) one has to stop to avoid making some basic variable, say x,, negative, or (b) 8C/az, vanishes and is about to become positive. In case (a), which is the only possible case when C is linear, one changes the basis by making x, nonbasic in place of z,, and uses the equation Xq= dgot D On% Fi to substitute for z, in terms of x, and the other nonbasic variables throughout the constraints and also the expression for C. In case '(b) one is in trouble if C is an arbitrary function. But if C is a quadratic function, then @C/@z, is a linear function of the nonbasic variables. The way in which the function C is expressed is the primary difference between the first, or theoretical, and the second, or practical, version of the algorithm. The theoretically simplest way to express C is as symmetric matrix (cy) for k, 1 = 0, 1,+++,2—m, such that Yew K= 150 C where zo = Land z,,° Then _n denote the nonbasie variables. and if this quantity becomes positive, as z, is increased keeping the other ‘NUMERICAL METHODS 147 nonbasic variables equal to zero, before any basic variable goes negative, then one defines a new nonbasic variable 4, = yo D Con zn & (where the subscript ¢ simply indicates that this is the ¢” such variable introduced into the problem). One then makes u, the new nonbasic variable, using the above equation to substitute for z, in terms of u, and the other nonbasic variables throughout the constraints and the expression for C. Note that if z, is an x-variable, then there will be one more basic x- variable after this iteration than before. The mechanics for substituting for z, in the expression for C will be discussed later. Let us first concentrate on the theory of the procedure. Note that u, is not restricted to nonnegative values. It is therefore called a free variable, as opposed to the original x-variables which are called restricted variables. But there is no objection to having a free nonbasic variable. One simply has to remember that if @C/éu, > 0, then C can be reduced by making u, negative — or alternatively by replacing u, by the variable v, = —u, and increasing v, in the usual way. There is one other point about these free variables. Once a free variable has been removed from the set of nonbasic variables one can forget about it. There are only 2 reasons for keeping track of the expressions for the basic variables in terms of the others. One is to know their yalues in the trial solution when it is obtained. The other, and much more fundamental, reason is to prevent such variables from surreptitiously becoming negative. Neither reason applies to any basic free variable in this type of problem. There remains the problem of explaining why one should make this particular choice of free variable. It is obviously convenient to have a set of nonbasie variables that all vanish at the present trial solution, because the values of the basic variables are then simply represented by the constant terms in the transformed equations. But any expression of the form &p0+ CppZp+ Y, Anz ip would satisfy this condition, and it might seem much simpler to put all a, = 0. The mathematically correct way to justify this is in terms of conjugate directions. One wants to change the nonbasic variables such that if the values of other nonbasic variables are subsequently changed, keeping u, = 0, the 148 E. M L. 
BEALE direction of motion is conjugate to the step already taken with respect to the objective function. In other words one would like to ensure that, having made 4C{dz, = 0, this derivative will remain zero. If this could be achieved, then the solution to the problem could be reached in at most n—m steps. Unfortunately these intentions are frustrated whenever one comes up against a new constraint. This means that one has to work on a new face of the feasible region and to start again setting up conjugate directions in this new face. In spite of these hazards, the process must terminate in a finite number of steps, as we now show. IL3 Proof of convergence To prove convergence in a finite number of steps, we first make the rules of procedure a little more specific. If there is any free variable that was introduced before a restricted variable was last made nonbasic, then we insist that some such free variable be removed from the set of nonbasic variables at the next iteration. I have not bothered to consider whether this condition is really necessary, since it is obviously reasonable to remove such a variable because it cannot remain nonbasic in the final solution. We now make a further definition. We say that the objective function C is in “standard form” if the linear terms in its expression contain no free variable. When C is in standard form, its value in the present trial solution, coos is a stationary value of C subject to the restriction that all the present nonbasic restricted variables take the value zero (without any restrictions on the sign of the basic variables). So there can be only one such valus for any set of nonbasic restricted variables. Now we know that C decreases at every step. So it can never return to a standard form with the same set of nonbasic restricted variables, even with a different set of free variables. There is only a finite number of possible sets of nonbasic restricted variables, so the algo- rithm must terminate if it always reaches a standard form in a finite number of steps when it is not already in standard form. To prove this, we note that whenever C is not in standard form a free variable will be removed from the set of nonbasic variables. So s, the number of iiGiibasic free variables, cannot increase. Further, if the new nonbasic vafiable is free, it is easy to show that the off-diagonal elements in the new expression for C in the row and column associated with the new nonbasic variable must vanish. It follows that C does not contain a linear term in this variable, and furthermore C can never contain a linear term in this variable unless some other restricted variable becomes nonbasic, thereby decreasing s. Therefore, if C is not in standard form, and s = so, say, then s cannot NUMERICAL METHODS 149 increase and it must decrease after at most so steps unless C meanwhile achieves standard form, Since C is always in standard form when s = 0, the required result follows. IL4 Updating the tableau There is one loose end in the above procedure that should be discussed theoretically before we turn to a numerical example. This concerns the updating of the tableau from one iteration to the next. There is no difficulty about the expressions for the constraints, but the objective function must also be updated. It is unprofitable to discuss this problem in detail, because it does not arise in this form in the practical version of the algorithm to be given in my next chapter. 
The procedure suggested in Beale [10] and Beale [11] can be expressed algebraically by saying that, starting from the expres- sion x’Cx, one first substitutes for the final x the vector y of new nonbasic variables, deducing the coefficients of C* where and then substitutes for the initial x’, This would never be very convenient in a computer, since it involves operating on both the rows and columns of the matrix. I am indebted to Dr. D. G. Prinz for pointing this out, and for pointing out that the solution is to use the algebraic expressions for the combined effects of these transformations, which are given in Beale [10]. These expressions are as follows: If we denote the pivotal column by the subscript g, and if the expression for the new basic variable z, in terms of the new nonbasic variables is y= CotegZat Y eezus Kea (where z, denotes the new nonbasic variable replacing z,), then the new coefficients (cj,) are given in terms of the old coefficients cy and the e, as follows: ia = eas Chg = Chk = Ceq@qt Cag &q&ks Chr = Cea t CagQert Cate + Cag ee ets where k, | # q. ‘These expressions can be written in a more elegant form if we write Hana OF = Ceg th Cage for k # q. 150 E. M. L. BEALE For then we have y= ae Cea = yey» , oo 2, Chg = Cie = Cee beg ees ce = Cut cre t che. IL5 A numerical example We conclude this chapter with a numerical example. The example given by Beale [11] illustrates most of the features of the method and has a simple geometrical interpretation. A similar example is not repeated here, because it seems better to produce an example that illustrates a special, though not very attractive, feature of the method. This feature is the necessity to sometimes eliminate more than one free variable from the set of nonbasic variables before restoring the problem to standard form. While this is being done one has no chance of reaching the optimum solution to the problem. On the other hand one is making progress, and in particular one may move onto a better face of the feasible region without having to complete the optimization on the present face. To illustrate this situation we have to go into (at least) 3 dimensions. ‘We therefore consider the following problem. Minimize C = 9—8x,— 6x2 —4x3+2x7 +2x3 +23 +201 %2 +2415, subject to the constraints 120, 20, x20, xy +xX,4+2x5 S 3. We start by introducing a slack variable x,, and write Xq = 39—x yy — 25. We can also express C in a form that displays the coefficients (¢,;) and at the same time can be read as an intelligible equation as follows: C=( 9-4x;—3x2—2x3) + (—442x + a+ X3)xy + (-34 x1 +2x2 2 + (-2+ + X3)x3- We see that C can be decreased by increasing x,. This will decrease NUMERICAL METHODS 151 X4, but x4 remains positive until x; = 3, But 2C/0x; = —4+2xy+x2+x3, and this becomes zero if x, = 2. So we introduce the variable uy = 44 2xy txp b%5 as our new nonbasic variable. ‘We then have x = 24+4u,—4x2—4x53 wy = [dud des. To deduce the new expression for C we note that @=h e=% a= &=-k a =-h Mg=l $=-2 che d= G=h and C=( 1 -% ) +O y wy +L +B —dss)s2 + bat bale. We now note that C can be decreased by increasing x,. Again we are stopped by the derivative going to zero before any basic variable becomes negative. So we write uy = —l +43x. —4r. Introducing u as a nonbasic variable in place of x,, we have x2 =F +3 +s 3 =F thy -by -¥ - 4 1 5 MSF Sy Hn Es. 1=% e=% %&=0 G=h =k Hyg =% G=-1, cf=0, c= cf =—4, c=¢ 5) +C dy dey +¢ Buy ie + {-$ +4%3)%5- We now note that C can be decreased by increasing x5. 
But this time we are stopped by x4 going to zero, when x3 = 2, 152 E. M, L. BEALE So we write 2 3 me: = F stn ote 4 3 1 x2 = +t ska 4 4 2 a = + sla 43% a= Hag = We now have to remove both , and #, from the set of nonbasic variables. Starting with u,, we must decrease this. 2C/du, becomes zero when wy 3, and all basic variables are still positive. So we write Us +e0Ms Le. uy +42 Pu x3 — Sus a — Hs x + Tus q=l, ea tye, e Xqq = Foo: ct=h ge Cc =GS + +( Aris +(g5 + +G5 ta¥st2 This trial solution is of some theoretical interest, since it is one that other methods, such as Wolfe’s, manage to bypass, Some methods for quadratic programming have been put forward as variants of Beale’s algorithm. One good test of the validity of this claim is whether the method passes through this trial solution on this problem. Our last step is to remove the variable u, from the set of nonbasic variables. We again have to decrease the variable being removed. And again we can go to the point where the partial derivative vanishes. NUMERICAL METHODS 153, So we put Us Ge Uy 154 E. M. L, BEALE JL THE PRACTICAL VERSION OF BEALE’S METHOD OF QUADRATIC PROGRAMMING TUL.1. Introduction At the beginning of the last chapter I stressed the importance of having a method for quadratic programming that: (a) would find a local minimum of a nonconvex funetion, and (b) would be specially simple to operate if there were only a few quadratic terms. This chapter starts with a few remarks about local minima. ‘This is follow- ed by a discussion of the extent to which the algorithm presented in the previous chapter meets the second of these criteria. The practical version of this algorithm, originally presented rather briefly on page 236 of Beale [11], is then introduced, It is illustrated by the same numerical example as that solved in the previous chapter. The practical version of the algorithm can be applied more easily using the inverse matrix method. This important point will be taken up in the chapter devoted to the inverse matrix method. TI1.2. Local minima and virtual local minima In principle the algorithm described in the previous chapter will find a jocal minimum of a nonconvex quadratic objective function. The objective function is reduced at every step. At no stage have we assumed that the diagonal clements of the matrix (c,,) must be positive. If we are increasing z, and the element cp, is negative or zero, then there is no danger of 0C/0z, vanishing and threatening to go positive, but this is no disadvantage. Note that if we do introduce a free nonbasic variable it must have a positive squared term in the expression for C, and this coefficient remains unaltered unless some new restricted variable becomes nonbasic, in which case this free variable will in due course be removed from the set of nonbasic variables. So the algorithm cannot terminate with negative quadratic coefficients. Unfortunately there is a theoretical danger of termination at a point that is not a local minimum. One might have some restricted nonbasic variable, say x,, with a reduced cost of zero, ie. such that 6Cjéx, = 0. NUMERICAL METHODS 155 The algorithm may then terminate, but if the objective function is not convex an increase in this variable might be profitable. This is an example of general difficulty when looking for local minima. It is convenient to define a point that is not strictly a local minimum but could easily be taken for one in a numerical minimization process as a “virtual local minimum”. 
I am grateful to my colleagues at the NATO Advanced Study Institute, and in particular E. H. Jackson, H. I. Scoins and A. C. Williams, for help in sorting out a satisfactory definition of a virtual local minimum. Beale [11] defines it as a point that could be made into a Jocal minimum by an arbitrarily small change in the coefficients of the objective function in a quadratic programming problem. But this definition cannot be applied to more general nonlinear programming problems, and in particular to problems involving nonlinear constraints. Jackson points out that in a minimization problem without constraints it is natural to define a virtual local minimum as a point where the gradient vector vanishes and the Hessian is positive semi-definite. This can be expressed as a point that can be turned into a local minimum by arbitrarily small changes in the linear and quadratic terms of the objective function. But other problems can arise with nonlinear constraints, and it is desirable to extend the class of perturba- tions permitted to include arbitrarily small changes in the constant terms of constraints. Note that the assumption that we can make arbitrarily small changes in the constant terms of the constraints, and in the linear terms of the objective function, means in the terminology of the simplex method that we do not have to worry about either primal or dual degeneracy. So much for generalities at this point. Returning to the subject of quadratic programming, we find that it is possible to extend the algorithm so that it can only terminate at a true local minimum. Whether this is worthwhile in practice, and whether it could cause cycling, I am not sure. But, for the record, the procedure will now be outlined. The partial derivative @C/éz, is given by Acoot LY Conn): mt Normally one terminates the algorithm if the objective function is in standard form, and all c,o = 0. But in the modified algorithm one will not stop if some cg = 0 unless ¢, = 0 for all & such that ey = 0. It is easy to see that if one does stop under these conditions the trial solution must be a local minimum. But if the conditions are not satisfied the trial ee OO—— . Vr 156 E. M. L. BEALE solution may be only a virtual local minimum. If ey) = 0 and ¢,, <0 one can immediately decrease C by increasing z, as far as possible. If cg = 0 and cpp = 0, one can decrease C by first increasing z, and then increasing some other nonbasic variable for which ¢yo = 0 and Cyy <0. If ¢po = 0 and Cp, > 0 one cannot increase z, immediately without increasing C. But by making z, nonbasic and introducing a new free nonbasic variable one may produce a situation in which several of the present nonbasic variables can be increased together so as to reduce C. 111.3. The compactness of Beale’s algorithm One of the features of Beale’s algorithm is that the size of the tableau fluctuates, This could be a nuisance in a computer program, though it is not necessarily so, if the matrix is stored by rows on magnetic tape (or other convenient backing store). In any case it is of some interest to have an upper bound on the number of rows required. For an arbitrary problem, one might have every single x-variable non- basic. One then needs n rows to store the expressions for these variables, plus the e row containing the expression for the variable just leaving the set of nonbasic variables (which might conceivably be a free variable and therefore not among the n x-variables), plus the c* row, plus the expression for C. 
The situation will be better if there are only a few quadratic terms. This means among other things that the rank r of the matrix of the purely quadratic terms in the objective functions must be much less than its maximum possible value of n—m. Now this rank will be unaffected by any change of basis. And this proves that one cannot have more than r non- basic free variables at any iteration. For before one could introduce an (r+1)" such variable, C would have to be in standard form with r non- basic free variables. There would therefore be r nonzero diagonal elements in the quadratic part of C with nonzero off-diagonal elements in the same row or column. This implies that all the remaining elements of C referring to quadratic terms must vanish if the rank is to be not more than r. And this in turn implies that the next new nonbasic variable (if any) must be a restricted variable. So there must be at least n—m-—r restricted nonbasic variables. And there cannot therefore be more than m-+r restricted basic variables. This is some consolation, but in a typical linear programming problem. there are many more variables than constraints. So if the objective function is stored as an (n—m) x (n—m) matrix then most of the tableau will be taken NUMERICAL METHODS 157 up by these coefficients. It would be possible to use the symmetry of the matrix and work with only half of it (ic. the coefficients cy for k < 0). The coding for this would come quite easily to anyone who had coded the stepwise multiple regression procedure to work in this way. But even saving half the matrix leaves the problem in an unwieldy form if one has, say, 10 quadratic variables and 200 linear ones. TIL4, Representing the objective function as a sum of squares As with so many problems in mathematical programming, the important decision here is not so much the choice of basic logic for the iterative solution procedure as the choice of how to keep track of the numbers required to implement it. Now it is clear that the most compact way to represent a quadratic function of low rank r is as a sum or difference of squares. We can write C= Atty ut Ya, @.1) Bi inten where the J, are linear functions of the variables of the problem, which can be updated from iteration to iteration in the same way as any other rows of the problem. (It is to be understood that the second summation is vacuous if C is convex, ie. if r, =r, and the first summation is vacuous if C is concave, i.e. ifr, = 0.) It turns out that this is a reasonably convenient procedure even if r is not small, so this approach is recommended for a general quadratic programming code. Let us now define the steps of the procedure in detail. The first stage is to express the objective function in the required form. We may refer to this as the “diagonalization” of the objective function. The best approach will depend on how the problem is specified, so we treat it as a preliminary operation, to be carried out in the matrix generator before entering the main mathematical programming routine. Many problems may start with only squared quadratic terms, so there will be nothing more to do at this stage. But obviously we should not rely on this. One procedure is to use a standard subroutine to find the eigenvectors and eigenvalues of the matrix (¢;). This is quite convenient with a moderate sized problem and a powerful computer; but itis theoretically over-elaborate, since it goes to some trouble to create an orthogonality between the A; that has no real relevance to the problem. 
If one is prepared to write a special routine for this part of the work, the following logic is recommended. 158 E. M. L, BEALE Consider the expression cHy Yeux ei where ¢,; = Cy, and look for its largest coefficient in absolute value, If this is a diagonal element, say cpp, and if it is positive, then define Then we see that kep l#p where cf, = Cu—CepCpilCpp- Similarly if ¢,» is negative we write perm 4 - V2 San Cpp HI and C = —42243, cfimx; where cj, is defined as above. So in this way we have removed one variable from the part of the expres sion for C that is not in the form of a sum or difference of squares. We can now repeat the procedure on the remainder. Tt therefore only remains to define the procedure if the largest coefficient is not a diagonal one. This is something that cannot happen if C is convex, but it is important not to be bound to this condition. We might then be in trouble without some additional rule of procedure. For example all the diagonal elements might even vanish, So we adopt the following policy. If the largest coefficient is cp, with p # q we make a preliminary change of variable as follows: Write Dp = Xpt%q Ya = Xp—%ae then CH=Y Yewm= YY cater 4 14 <1 14 where y, = x fork #p.q NUMERICAL METHODS 159 and Cop = Cpt Cqgt2Cpq pa = Sop San Cra = ppt Sag 2 pa Chp = Ckp + Cag Chg = Cee kg Ck = Ca, Where k,/ A p,q. And if cpg is the numerically largest of the c,,, then either cp, or cjg must be the numerically largest of the cj,. We are therefore back in the situation discussed earlier and can extract a new squared term in the y-variables. Tt is then a simple matter to substitute back the original x-variables in this expression. Now suppose that the diagonalization is complete, and denote the ex- pressions for the 4; in terms of the nonbasic variables of the problem by 4, = digt dnt (3.2) a To carry out the logic of the algorithm in this form, one must be able to compute @C/éz,. (It is easiest to work with this quantity rather than half of it.) Its value ao, at the current trial solution, used to decide which nonbasic variable to increase, is given by 0p = doy Yi diody~_ Yo dot (63) from (3.1) and (3.2). Having chosen the variable z, to be increased, we must be able to compute 2*C/dz? in order to decide whether the new nonbasic variable should bea new free variable, This second derivative is obtainable as La Yt ee itt Finally, if a new free variable u, is needed, it is defined by the equations He = dopt Y digki= dip Ga) = Hop 560%» (3.3) where = Yi dadig— Yd G8) 160 E. M, L. BEALE Note that, to avoid difficulty with rounding-off errors, we do not use the derivatives to test whether a free variable should be removed from the set of nonbasic variables. We do this by theory, removing such variables if and only if they became nonbasic before the last restricted variable became nonbasic. At the very end of the process, we shall need to know the value of the objective function. This is of course doot bY dio 4 Ld. (3.7) inthe TELS. Numerical example We now resolve the numerical example considered in the last chapter using the practical version of the algorithm. It must be admitted that for this type of problem, with a quadratic form of full rank, the practical yersion of the algorithm has no significant advantage over the original form. But nevertheless the problem can be solved quite easily this way. We must first diagonalize the quadratic terms in the objective function. 
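The diagonalization logic recommended above (pivoting on the largest remaining coefficient, with a preliminary change of variables when that coefficient is off-diagonal) can be sketched as follows. Only the purely quadratic part of C is treated, the linear terms being carried separately as in the representation above. This is an illustrative reconstruction, not Beale's own routine; the function name and the bookkeeping through the matrix T are assumptions of the sketch.

```python
# Sketch of the "diagonalization" of a quadratic form: the symmetric matrix Q of
# purely quadratic coefficients is written as a sum and difference of squares of
# linear forms,  x'Qx = sum_i s_i * (d_i . x)^2  with s_i = +1 or -1.
import numpy as np

def diagonalize(Q, tol=1e-10):
    Q = np.array(Q, dtype=float)
    n = Q.shape[0]
    T = np.eye(n)              # accumulated change of variables: x = T y
    forms = []                 # pairs (sign, coefficients of the linear form in x)
    for _ in range(n):
        p, q = np.unravel_index(np.argmax(np.abs(Q)), Q.shape)
        if abs(Q[p, q]) < tol:
            break              # nothing of significance left
        if p != q:
            # largest coefficient is off-diagonal: preliminary substitution
            # x_p = y_p + y_q, x_q = y_p - y_q, after which one of the two new
            # diagonal coefficients is the largest in magnitude
            E = np.eye(n)
            E[p, p], E[p, q], E[q, p], E[q, q] = 1.0, 1.0, 1.0, -1.0
            Q = E.T @ Q @ E
            T = T @ E
            if abs(Q[q, q]) > abs(Q[p, p]):
                p = q
        sign = 1.0 if Q[p, p] > 0 else -1.0
        d = Q[p, :] / np.sqrt(abs(Q[p, p]))            # lambda(y) = d . y
        forms.append((sign, np.linalg.solve(T.T, d)))  # the same form in the x variables
        Q = Q - sign * np.outer(d, d)                  # c*_kl = c_kl - c_kp c_pl / c_pp
    return forms

# example: 2*x1*x2 = (x1 + x2)^2 / 2  -  (x1 - x2)^2 / 2
print(diagonalize([[0.0, 1.0], [1.0, 0.0]]))
```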
Following the steps outlined above for doing this we find that C = 9-8x, —6x,—4x, + xd + QxZ t+ x3+42xy xy +204 x5 = 9-8x,—6x,—4y5 +4 Qxp+ x, + x3)? + bx}—x x3 +43 = 9-B8x, 6x, 425 +4 Qxt+ x4 x3)? +4 OaJ3—Fay3? +43. So our initial tableau is as follows: X, = 3-x, — x,—2x3 Jy = 98x, —6x.—4x5 A= Wxyt at xy Ag Xp /3—4x3)/3 As = 4x3 /6 and ry =3, r= 3. NUMERICAL METHODS 161 Applying the formula for derivatives, we find that @C/éx, = —8, 0Clax, = —6, 8CJOxs = —4. So we increase x,. We see that 0*C/0x? = 4. So, applying the usual ratio test, we see that we must introduce a free nonbasic variable. We write uy = —84+4x,+2x24+2x5. So the next tableau becomes X= 2thy-by hrs y= l-dy-dx, 3s dg = —7—2u;—2x; A= 4th = xn/3—dasV/3 as = dora/6. Applying the formula for the derivatives of C with respect to the nonbasic restricted variables, we find that aCjax, = -2, @C/éxs = So we increase x,. We see that 6*C/ax} = 3. So, applying the ratio test, we see that we must again introduce a free nonbasic variable. We write tt, = —243x,—%5- So the next tableau becomes e=3t dnt ds 3 a egt hatha —ds m= 3- fu-te—Hs dg = — 2 2 Fn — 3% Ag=4 0 +h dy = 33 +Ha/3 dy = 4x3/6. Applying the formula for the derivative of C with respect to the remaining nonbasic restricted variable, we find that aC}axs = —3. So we increase x3. We see that @2C/4x3 = 3. So, applying the ratio test, we see that we must make x, nonbasic. So we write 162 2 3 el i $b +zoM ~~ tom + hy 4 3 v3 26 The variables u, and w, must now be eliminated. Starting with u,, we see that aC PC _ ae ou, "aw So we must decrease u,, and introduce a new nonbasic free variable. We write 1 3 t+root1 = +ro0H2 T¥0%a- So the next tableau becomes w= 35 w= 8 me x= 8 ay = ~ 838 A= 8 4y= 4Y3 Js = Tov We must now eliminate uy. We see that ac @C _ as Qu, we So we must decrease up, and introduce a new nonbasic free variable. We write 24418 2 Mg = gy tastatea%4- NUMERICAL METHODS 163 So the next tableau becomes + ~ 3% B= F — 3% m= > — 3X4 m= F + Fis 3a + 3% do= 7 ARR, tty + 8X A= A Fey y aaa Xa da= tH + Fite /3 —2p%e3 Jy = BJO — Bus /6 —sate 6 —z7xey/6. The problem is now back in standard form, We must therefore again consider the derivative of C with respect to the (only) nonbasic restricted variable. We see that 8C/ax, = 2, which is positive. So we have the final solution. The value of the objective functions is. ~BHGE? +GiV3)- G9} = 4. We have achieved the same result as before. OO 164 E. M. L. BEALE IV. THE INVERSE MATRIX METHOD FOR LINEAR AND QUADRATIC PROGRAMMING IV.1. Introduction Tt is widely known that all, or nearly all, production computer codes for solving large linear programming problems use the product form of the inverse matrix method. But it is rather less widely known what this form really involves. It therefore seems desirable to review the product form for linear programming before discussing its application to quadratic programming. IV.2. Outline of the inverse matrix method In the original, or straight, simplex method one works with the tableau of coefficients in the expressions for the basic variables as linear functions of the nonbasic variables. If there are m equations and n variables this means that one works with an array of (m+1) x (1n—m+1) coefficients, assuming one objective function and one “right hand side”, or column of constant terms. All these coefficients have to be up-dated from one iteration to the next, although only very few of them are actually used in any single iteration. 
Specifically one uses:

(a) all the coefficients in the objective function, in order to select the new pivotal column (normally chosen as the one with the most negative reduced cost),
(b) the right hand side and the elements in the pivotal column, in order to select the new pivotal row, and
(c) the other elements in the pivotal row, in order to update the expression for the objective function.

The remaining columns are updated simply because they may be needed as pivotal columns in a subsequent iteration. It is therefore natural to wonder whether, instead of carrying around all this information in case it is needed, one cannot represent the problem more compactly, calculating particular elements of the tableau only when required. It turns out that this is possible. To explain this clearly it seems necessary to use matrix notation.

The original constraints of the problem can be expressed as equations by adding suitable slack variables in all inequality constraints. And these constraints can be written as the matrix equation Ax = b, where A is an m x n matrix, x an n-vector and b an m-vector. The complete problem is defined by the constraints and the objective function, and it is desirable that our matrix equation should include the expression for the objective function. We therefore define a new variable x_0 and a new equation

    x_0 + \sum_j a_{0j} x_j = b_0 ,

where \sum_j a_{0j} x_j - b_0 represents the expression to be minimized. This minimization can obviously be achieved by maximizing x_0. So we now think of A as an (m+1) x (n+1) dimensional matrix.

Next, consider the situation at any particular iteration during the solution of this linear programming problem by the simplex method. There will be a set of basic variables, and we can imagine that the variables are renumbered so that these are the variables x_1, ..., x_m. Then our matrix equation can be written in the form

    (B | A_N) x = b ,

where B is an (m+1) x (m+1) square matrix of coefficients of the variables x_0, x_1, ..., x_m, and A_N denotes the remaining columns of A, i.e. the coefficients of the nonbasic variables. If we now premultiply our matrix equation by B^{-1}, we have it in the form

    (I | B^{-1} A_N) x = \beta ,   where \beta = B^{-1} b .

And this means in effect that we have expressed the variables x_0, x_1, ..., x_m as linear functions of the variables x_{m+1}, ..., x_n. So the coefficients of the matrix B^{-1} A_N are in fact the coefficients in the current tableau, and the coefficients of the column vector \beta are the current right hand sides, or values of the objective function and the basic variables.

Furthermore, the coefficients of the matrix B^{-1} can be updated from one iteration to the next in just the same way as one updates a tableau in the straight simplex method. This important fact follows most easily from the fact that the columns of B^{-1} can be regarded as columns in the tableau: they are the columns of coefficients of the original slack or artificial variables associated with each row of the matrix.

In the inverse matrix method one therefore works with the original matrix A, the current right hand side \beta, and some expression for B^{-1}, the inverse of the current basis. For the time being we may imagine this as an ordinary matrix. We then have the explicit inverse form of the simplex method. Later we shall consider the alternative, product, form.

IV.3. The steps of the inverse matrix method

Each simplex iteration can be subdivided into 5 steps when using the inverse matrix method, as follows:

Step 1: Form a pricing vector, i.e.
a row vector x that can be multiplied into any column of the matrix A to determine the reduced cost for the corresponding variable. The point here is that we wish to pick out the first row of the tableau, i.e. of the matrix B71A, which can be done by premultiplying by a row vector c whose first element is unity and whose remaining m elements are zero. ‘We therefore form the vector product x = cB™*, so that we can subsequently form the product 7A to determine the row vector of reduced costs. If B™* is stored explicitly, this operation simply involves picking out its top row. Step 2: Price out the columns of the matrix A to find a variable to remove from the set of nonbasic variables. This is normally done by forming every element of the matrix 7A and choosing the most negative, though alternative methods have been proposed, since this step involves most of the computation in the inverse matrix method. Step 3: Form the updated column « of coeflicients of the variable to be removed from the set of nonbasic variables, by premultiplying the appro- priate column of 4 by Bol. Step 4: Perform a ratio test between the elements of x and the clements of B to determine the pivot, and the variable to be made nonbasic. Step 5: Update the set of basic variables, the vector B and the inverse Bo}. The first of these operations is simply bookkeeping. The remainder js the same as the straight simplex method ona smaller matrix consisting of a pivotal column « (whose updated value is not needed), the columns of Bo}, and a right hand side p. IV.4. The product form of inverse Using the product form, the inverse of the basis is not recorded explicitly. Instead, it is represented as a product of elementary matrices. Each of these NUMERICAL METHODS 167 represents the effect of a single pivotal operation. Such an operation, in which the pivot is in the p™ row, is to premultiply B~* by a matrix that is a unit matrix except for the p™ column, which is computed from the updated pivotal column in the tableau. This pivotal column is computed in Step 4 of the inverse matrix method. If its components are denoted by «;, then the p" clement of the p column of the elementary matrix is 1/z,, and the i element, for i # p, is —a:/2ty, These elementary matrices can be stored in the computer very compactly. One simply records the column number p, and the nonzero elements in it; the remaining, unit, columns being under- stood, In fact these unit columns are so taken for granted that the elemen- tary matrices are ofien referred to as vectors, specifically “‘y-vectors”. The steps of the inverse matrix method can be carried out when B7* is represented as a product of n-vectors. Step 1 is then called the “backward transformation”, since the row vector ¢ is postmultiplied by each y-vector successively, in the opposite order to that in which they were generated. It will be noted that each y-vector affects only the p™ element of the row vector being built up. Step 2 is carried out as with the explicit inverse. Step 3 is called the “forward transformation”, since the pivotal column of A is premultiplied by each y-vector successively in the order in which they were generated. Step 4 is carried out as with the explicit inverse, as is Step 5, except that the process of updating B~' is very simple - one just adds another -vector to the end of the list. After a while, the list of y-vectors becomes undesirably long, and to shorten it one goes through a process known as reinversion. 
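A minimal sketch may make the two transformations concrete. Each elementary matrix (an eta vector, the "y-vectors" of the text) is stored as its pivot row number together with its one non-unit column; B^{-1}a is obtained by applying them in the order generated (the forward transformation), and cB^{-1} by applying them in the reverse order (the backward transformation). The representation below is an illustrative assumption, not production linear programming code.

```python
# Product-form sketch: B^{-1} is held as a list of elementary column
# transformations.  Each eta is (p, col): an identity matrix whose p-th column
# has been replaced by col, built from an updated pivotal column alpha.
import numpy as np

def eta_from_pivot(alpha, p):
    col = -alpha / alpha[p]
    col[p] = 1.0 / alpha[p]
    return (p, col)

def ftran(etas, a):
    """Forward transformation: B^{-1} a, applying the etas in the order generated."""
    a = np.array(a, dtype=float)
    for p, col in etas:
        ap = a[p]
        a = a + col * ap
        a[p] = col[p] * ap
    return a

def btran(etas, c):
    """Backward transformation: c B^{-1}, applying the etas in reverse order."""
    c = np.array(c, dtype=float)
    for p, col in reversed(etas):
        c[p] = c @ col          # each eta changes only the p-th element of the row vector
    return c

# build B^{-1} for a small basis by pivoting its columns in, one per row
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])
etas = []
for r in range(2):                      # pivot column r into row r
    alpha = ftran(etas, B[:, r])        # Step 3: update the incoming column
    etas.append(eta_from_pivot(alpha, r))

print(ftran(etas, [1.0, 0.0]))   # first column of B^{-1}: [0.6, -0.2]
print(btran(etas, [1.0, 0.0]))   # first row of B^{-1}:    [0.6, -0.2]
```

In an actual code the eta columns are of course stored sparsely, and reinversion replaces a long list of such transformations by a shorter one built afresh from the current basis.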
It should perhaps be made clear that this reinversion does not mean recording some explicit inverse which is then simply updated by adding η-vectors. One always starts from a unit inverse representing an all-slack-or-artificial basis, and adds η-vectors representing pivotal operations to replace unwanted slacks or artificials by genuine variables. Knowing which variables are to be introduced, and which slacks or artificials are to be removed, one can perform these pivotal operations without thinking at all about the signs of the values of the basic variables or of the reduced costs at intermediate stages of the inversion. In fact one has considerable freedom in the choice of pivotal columns and rows during reinversion. One must choose a column corresponding to a variable that is due to be in the basis but has not yet been used, and one must choose a row corresponding to a slack or artificial that is due to be removed from the basis but has not yet been used, and which will give a nonzero value to the pivot (α_p). But that is all.

If all the elements of B were nonzero, then it would probably be best to choose pivotal columns and rows to give the largest possible pivot in absolute value, so as to maximize numerical accuracy. But in practice matrices arising in large linear programming problems contain a high proportion of zero elements, and the number of nonzero elements in the set of η-vectors depends very significantly on the order in which the pivotal columns and rows are selected. An important computational aspect of linear programming is the choice of pivots during inversion, since this affects the speed of inversion itself and also the speed of subsequent backward and forward transformations. If B were a lower triangular matrix, and one pivoted on the diagonal elements in order, starting with the first and finishing with the last, then the η-vectors would contain no more off-diagonal elements than the original matrix B. It is of course possible to invert the matrix using different pivots, or the same pivots in a different order, but this will in general produce more nonzero elements in the resulting η-vectors. In principle therefore what a modern inversion routine does is to permute the rows and columns of B so that they are as nearly in lower triangular form as possible, and then pick the pivots in the original matrix B corresponding to pivoting on the diagonal elements of the permuted matrix in order.

In practice one often starts a linear programming calculation by inverting to a prescribed initial basis. This initial inversion is performed in exactly the same way as a reinversion.

IV.5. The advantages of the product form

The advantages of the explicit inverse method over the straight simplex method, and of the product form over the explicit inverse, are often summarized as follows. In the straight simplex method one has to update the complete tableau, involving (m+1) × (n−m+1) numbers, at each iteration. In the explicit inverse method one only has to update the inverse, which involves (m+1) × (m+1) numbers. This is a considerable saving if, as is usual, n is much larger than m. The advantage of the product form comes from the fact that even this updating is avoided. From a practical point of view this explanation describes the situation reasonably accurately.
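As a purely illustrative count (the figures are hypothetical and not taken from the text): with m = 100 constraints and n = 1000 variables, the straight simplex method updates about (m+1) × (n−m+1) = 101 × 901 ≈ 91,000 numbers per iteration, whereas the explicit inverse method updates only (m+1) × (m+1) = 101 × 101 ≈ 10,200 — roughly a ninefold reduction — and the product form avoids even this.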
But from a theoretical point of view this explanation is so unsatisfactory as to cause a number of workers in the field to think that the inverse matrix method is an elaborate hoax and that the straight simplex method is really the best. The theoretical weakness of the argument lies in the fact that if the matrix A were full (i.e. contained no zero coefficients) then the pricing operation, Step 2 of the inverse matrix method, would already involve as much arithmetic as updating the tableau, and the perhaps small amount of additional work in Steps 1 and 5 would simply make the inverse matrix method even less competitive.

Obviously, in order to get to the heart of the matter we must take note of the fact that a very large proportion of the elements of the matrix A vanish. But we must still be careful. If a proportion p of the elements of A are nonzero, then pricing in the inverse matrix method will involve about p × (m+1) × (n−m+1) multiplications, since the π-vector will usually be full. (And this assumes that one does not price the basic vectors.) On the other hand if a proportion p of the elements of the tableau are nonzero, then the updating involves only p² × (m+1) × (n−m+1) multiplications, since an element needs updating only when the corresponding elements of both the pivotal row and the pivotal column are nonzero. So the argument of sparseness can be used to provide further support for the straight simplex method.

So what is the true position? Part of the real advantage of the inverse matrix method, and in particular the product form, lies in its greater flexibility. One does not have to complete the pricing operation for every iteration. For example one can use multiple-column-selection techniques in which one uses the pricing operation to select a number of the most promising columns, updates them all in Step 3, and then does a few steps of the straight simplex method on this very small subproblem. But even without this flexibility the inverse matrix method would still be advantageous on typical large problems because the two p's in the above formulae are not the same. The original matrix A will be much more sparse than typical tableaux during the calculation. On some computers it is worthwhile taking further advantage of the special structure of the matrix A by noting that many of its elements are ±1. These unit elements can be stored more compactly than general elements. And when working with such unit elements during pricing one needs only add or subtract rather than multiply.

Again, the advantage of the product form over the explicit inverse lies not so much in the fact that it simplifies the task of updating the inverse as in the fact that it generally provides a more compact expression for the inverse, if one has a fast and efficient inversion routine on the lines described at the end of the last subsection.

There is another practical advantage of the inverse matrix method, which applies even more strongly in the product form. Having updated a tableau, or even an explicit inverse, one generally has to write it out on to a magnetic tape or some other backing store if one is solving a large problem. It is true that most computers can transfer information in this way at the same time as getting on with other calculations, but these transfers are apt to impede the process of reading further information into the working store for more processing. This bottleneck may be a passing phase in computer technology, but if so it is taking a long time to pass.
Vast improvements have been made in the past 10 years in methods of moving information from one part of the machine to another, but these are having a hard time keeping up with the vast improvements in arithmetic speed.

Published references to these problems of computational efficiency are somewhat meagre. Some careful theoretical analyses are given in Chapters 4 and 5 of Zoutendijk [14]; numerical results are reported by Wolfe and Cutler [15] and by Smith and Orchard-Hays [16].

IV.6. The inverse matrix method for quadratic programming

Let us now return to the subject of quadratic programming. It has been said that one disadvantage of Beale's method is that it does not lend itself to the inverse matrix method. In fact the opposite is more nearly true — only with Beale's method is the matrix an appropriate shape (with many more columns than rows) for the inverse matrix method to be attractive. The application of the inverse matrix method to Beale's quadratic programming has some interesting features, which we now explore.

The program should store the information in the usual way, by columns. It is apparently awkward that one has to add rows, and delete rows, throughout the calculation. But in fact this causes no great difficulty. One will start with m+r+1 rows, representing m constraints, 1 linear part of the objective function, and r rows representing the λ_i. In addition one needs some spare rows, or "u-rows", in which to record the equations for the free variables. Theoretically r+2 u-rows will be enough, but in practice it is desirable to allow rather more. The free variables are numbered serially from 1 upwards. If there are U u-rows, then free variable i will be defined in u-row j, where j is the remainder when i is divided by U.

Two "markers" I and J are required, such that all free variables with a serial number less than or equal to I have been removed from the nonbasic set, and all free variables with a serial number less than or equal to J must be removed from the nonbasic set. Initially I = J = 0, but J is increased to the number of the latest free variable whenever a restricted variable is made nonbasic. If I < J, then free variable I+1 is made basic at the next iteration and I is increased by one. Once a free variable has been made basic it may be removed from the problem if one is using the explicit form of inverse. But in the product form it must be retained until the next reinversion.

As we have seen, an iteration in the inverse matrix method consists of 5 steps, but in quadratic programming the first 2 are omitted if I < J.

1. Form a pricing vector, i.e. a row vector that can be multiplied into any column of the matrix to determine the reduced cost, i.e. the value of ∂C/∂z_k, for that variable. In quadratic programming this operation is unaffected, except that (once the problem has become feasible) the original row vector c has elements 1, d_{10}, ..., d_{s0}, −d_{s+1,0}, ..., −d_{r0} in the columns corresponding to the row defining the linear part of the objective function and the rows defining λ_1, ..., λ_r respectively.

2. Price out the columns of the matrix to find a variable z_p to remove from the set of nonbasic variables. In quadratic programming this operation is applied in the usual way to the restricted variables if I = J. Otherwise one just picks the free variable I+1 and increases I by one.

3. Update the column α of coefficients of z_p in the tableau, by premultiplying the original coefficients of z_p by the inverse of the current basis.
This operation is the same in quadratic and linear programming.

4. Perform a ratio test to choose the next variable to become nonbasic. This works rather differently in quadratic programming, but the changes are fairly obvious.

(a) If a free variable is being made basic, and has a positive reduced cost, then the signs of all the elements of α must be reversed.

(b) The rows associated with the quantities λ_i, and any basic free variable, must be omitted from the test.

(c) Having found θ, the amount by which z_p can be increased, compute

Δ = a_{0p} + θ (d_{1p}² + ··· + d_{sp}² − d_{s+1,p}² − ··· − d_{rp}²),

where a_{0p} is defined by (3.3). If Δ is negative, remove the indicated restricted variable from the basis in the usual way (and increase J to the number of the latest free variable). But if Δ is nonnegative a new free variable must be defined and made nonbasic. This can be done most compactly using eq. (3.4). On the other hand one must use eqs. (3.5) and (3.6) if one is not prepared to invert immediately. But it is not necessary to choose between these approaches — we can use both eqs. (3.4) and (3.5). The coefficients for the expression for the new free variable in terms of both the λ_i and the z_j can be found using a pricing vector, formed in the usual way except that the row vector to be postmultiplied by the inverse of the basis has elements d_{1p}, ..., d_{sp}, −d_{s+1,p}, ..., −d_{rp} in the columns corresponding to the rows defining λ_1, ..., λ_r, and zeroes elsewhere. Both sets of coefficients can be entered in the A matrix. Until the next reinversion the coefficients of the λ_i can be ignored, since the λ_i remain basic throughout. Immediately before reinversion the coefficients of the z_j can be removed from the A matrix.

5. Update the set of basic variables, the vector of constant terms, and the inverse. This operation is the same in quadratic and linear programming. The necessary changes in the markers I and J have already been described.

V. SEPARABLE PROGRAMMING

V.1. Introduction

My remaining chapters are concerned with methods of solving mathematical programming problems that are linear programming problems except for a relatively small number of nonlinear constraints. The question whether the objective function is allowed to be nonlinear is not important, because, as Wolfe first pointed out many years ago, one can always make the objective function linear by introducing another nonlinear constraint. For if C is a nonlinear function to be minimized, then one redefines the problem so that one has to minimize z, where z is a new variable satisfying the constraint

C − z ≤ 0.

This point is made, for example, in Wolfe [12]. I confess that when I first heard this I thought it was a mathematical observation of no practical importance. But now I think that it is of some significance, since it emphasizes the fact that satisfactory methods of dealing with nonlinear constraints are all one really needs to solve nonlinear programming problems.

As Wolfe points out elsewhere in this volume, powerful special methods have been developed for maximizing general nonlinear objective functions of variables subject to linear constraints. Nevertheless I do not believe that it is worthwhile to develop efficient computer codes for solving such problems on a production basis. It is important to keep one's armoury of production programs from growing unnecessarily. Each such program needs maintenance effort in much the same way as physical equipment. This involves having people who know how to operate the program.
But they must also know enough about its structure to fix bugs that may appear when one tries to use it in new circumstances, and to alter it to meet special requirements or to take advantage of new techniques (in either hardware or software) that may have become available. Such maintenance effort is always in short supply, and it should be concentrated as much as possible without unduly restricting the class of problems one can solve efficiently. And nonlinear programming problems with linear constraints can be solved reasonably efficiently using codes for more general problems.

A useful method that can in principle be applied to all nonlinear constraints, and can in practice be applied to a wide variety of real problems, has been devised by Miller. It is called separable programming. It is still not as well known as it deserves to be, primarily because it was not formally published until 1963, see Miller [17], and perhaps partly because it is so simple that it may appear trivial to the theorist.

This chapter describes the theory of the method, and discusses some practical points concerned with its application, with particular reference to the handling of product terms in constraints. The material for this chapter is largely taken from Miller's paper, and from a paper by Beale, Coen and Flowerdew [18]. Some minor technical points, and also parametric programming and interpolation procedures, will be deferred until the next chapter.

V.2. The theory of separable programming

Miller's technique is known as separable programming because it assumes that all the nonlinear constraints in the problem can be separated out into sums and differences of nonlinear functions of single arguments. At first this assumption seems severely to restrict the usefulness of the method. But we shall see later that this is not really so.

As Miller points out, the technique is related to earlier work. Charnes and Lemke [19] first pointed out that convex separable nonlinearities in the objective function can be handled by the simplex method. Dantzig [20] reviews methods for minimizing separable convex objective functions. Miller's special contribution was to extend this approach to nonconvex problems. After this it was a simple matter to apply the method to nonlinear constraints as well as to nonlinear objective functions.

Suppose that we have some variable z, and that we want to deal with some function f(z). Suppose that the graph of f(z) looks something like this:

[Figure: graph of f(z) against z.]

We now replace this function by a piecewise linear approximation based on a finite number of points on the graph. In the diagram I have taken 8 points P_1, ..., P_8. Now let the coordinates of these 8 points be (a_i, b_i) and introduce 8 new nonnegative variables λ_1, ..., λ_8 and the equations

λ_1 + ··· + λ_8 = 1,     (5.1)
a_1 λ_1 + ··· + a_8 λ_8 = z,     (5.2)
b_1 λ_1 + ··· + b_8 λ_8 = f(z).     (5.3)

We call these new variables a single group of "special variables", for reasons that will soon be clear. The eqs. (5.1) and (5.2) are known respectively as the convexity and reference rows for this group of special variables. Note that the quantities z and f(z) are not necessarily nonnegative. This causes no inconvenience, since we will not normally want to introduce them explicitly in the mathematical programming model in any case.

Let us now consider some typical solutions of eqs. (5.1), (5.2) and (5.3). If we put λ_1 = 1, λ_2 = ··· = λ_8 = 0, then z = a_1 and f(z) = b_1, and we have the point P_1.
If we put λ_1 = ½, λ_2 = ½, λ_3 = ··· = λ_8 = 0, then z = ½(a_1 + a_2) and f(z) = ½(b_1 + b_2), and we have a point half way between P_1 and P_2. More generally, if we allow any 2 neighbouring special variables to take nonzero values, keeping the other special variables of the group equal to zero, then we will map out the piecewise linear approximation to f(z) that we have agreed to use. On the other hand if we put say λ_1 = ½ and λ_3 = ½, λ_2 = λ_4 = ··· = λ_8 = 0, we have a point midway between P_1 and P_3, which is not a valid one.

Now in some problems we know beforehand, perhaps from convexity considerations, that such inadmissible combinations of special variables cannot occur in an optimal solution even if we take no special steps to exclude them. In these circumstances we do not need to use separable programming. Separable programming is a method of reaching a local optimum solution, which may possibly not be a global optimum, to a nonconvex problem by taking special steps to exclude inadmissible combinations of special variables. Miller [17] gives an example of a phenomenon that he calls "special degeneracy", which can cause the procedure to terminate in a virtual local optimum as defined earlier. But he indicates that this hazard is not a serious one in practice.

The required special steps are very easy if we are using the simplex method. All we have to do is to restrict the set of variables to be considered as candidates for entering the basis. If two special variables of a group are in the current basis, then we do not consider allowing a third. And if one is already in the basis we consider only its neighbours.

The proof that one must reach at least a virtual local optimum is straightforward. Obviously the algorithm must terminate, since it is a version of the simplex method for linear programming with the possibility of earlier termination. And when it does terminate we know that we have a true optimum to a linear programming problem obtained from our separable problem by suppressing all special variables other than those in the basis, and their neighbours when they are the only representatives of their group in the basis. This is a local optimum if all the basic special variables are at positive levels; if any are at zero levels it is still a virtual local optimum by the definition of permitted perturbations in Chapter II.

The implementation of these steps is itself easy in a mathematical programming code in which the variables have names. For example the C-E-I-R code LP/90/94 allows variable names to be any 6 characters, provided that the first is either blank or 1, 2, 3, 4, 5, 6, 7, 8 or 9. If the program is in the separable mode, then all variables with a 9 as their first character are treated as special ones. The next 3 characters define the group of special variables, and the last 2 are decimal digits defining the sequence of the special variables.

Incidentally, as Miller [17] points out, one can very easily deal with several different nonlinear functions of the same variable in the same problem. One simply writes down one equation of the type (5.3) for each function. And in practice one does not usually have to include these equations explicitly in the model, since the left hand side can be substituted for the right wherever it occurs.

Separable programming has also been implemented in codes making special provisions for bounded variables. One can then apply separable programming without introducing constraints of the type (5.1), by introducing bounded special variables representing the different increments in the independent variable z. A special variable is then allowed to increase above its lower bound (of zero) only if the previous variable of the group is at its upper bound. And it is allowed to decrease below its upper bound only if the following variable of the group is at its lower bound.

And that is all there is to the theory of separable programming. A number of technical points have to be considered when one comes to apply it; a small sketch of the basic restricted-basis-entry rule is given below.
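As an illustration (not from the text) of the mechanics just described, the sketch below generates the rows (5.1)–(5.3) for one group of special variables from a grid of breakpoints, and applies the restricted-basis-entry rule to decide which members of the group may be considered as candidates for entering the basis. All names are the writer's own; a production code would embed the rule in the pricing step of the simplex method.

```python
def lambda_rows(breakpoints, f):
    """Return the convexity, reference and function row coefficients
    (eqs. (5.1)-(5.3)) for one group of special variables."""
    a = list(breakpoints)              # a_1, ..., a_k
    b = [f(x) for x in a]              # b_i = f(a_i)
    convexity = [1.0] * len(a)         # sum of lambdas = 1
    reference = a                      # sum a_i * lambda_i = z
    function = b                       # sum b_i * lambda_i approximates f(z)
    return convexity, reference, function

def admissible_candidates(basic, k):
    """Restricted basis entry for one group of k special variables.
    'basic' is the set of (0-based) indices of the group's variables
    currently in the basis; returns the indices that may be considered
    as candidates for entering the basis."""
    if len(basic) >= 2:
        return set()                   # two already in: allow no third
    if len(basic) == 1:
        (i,) = basic                   # one in: only its neighbours
        return {j for j in (i - 1, i + 1) if 0 <= j < k}
    return set(range(k))               # none in: any may enter

# Example: approximate f(z) = z**2 on 8 points z = 0, 1, ..., 7, and ask
# which special variables may enter when lambda_4 is the only one basic.
conv, ref, fun = lambda_rows(range(8), lambda z: z * z)
print(admissible_candidates({3}, 8))   # {2, 4}: only the neighbours
```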
One can then apply separable programming without introducing constraints of the type (5.1) by introduc- ing bounded special variables representing the different increments in the independent variable z. A special variable is then allowed to increase above its lower bound (of zero) only if the previous variable of the group is at its upper bound. And it is allowed to decrease below its upper bound only if the following variable of the group is at its lower bound. ‘And that is all there is to the theory of separable programming. A number of technical points have to be considered when one comes to apply it. NUMERICAL METHODS 117 We discuss some of these in connexion with an important class of applica~ tions, namely to product terms. V.3. Product terms A number of mathematical programming problems are linear except for the presence of a few product terms. One may have a price of some commodity that is a linear function of other variables of the problem. The amount spent on this commodity is then the product of the price times the amount bought. In oil production problems, the productivity of a well may be an approx- imately linear function of variables relating to production from this reservoir in previous years. The production available in the current year is then given by the product of the well productivity times the number of wells drilled, The problem considered by Beale, Coen and Flowerdew [18] is actually concerned with iron-making, but it is in principle a rather general one. Raw material is being fed into a number of production units. Various raw materials are available at varying costs, and they all have different specifications concerning their chemical compositions, etc. This sort of situation often leads to a standard application of linear programming to determine the cheapest combination of raw materials to meet certain spe- cifications on the overall properties of the material supplies to the produc- tion units. But nonlinearities arise if some raw material can be fed into a preprocessing unit, the output of which is then fed into more than one main production unit. In the iron-making application this preprocessor is a sinter plant. If the preprocessor only has to feed one type of main unit, or if it can be operated in different ways to feed the different types of main unit, then linear methods may still be applicable. But if the preprocessor has to be run in a fixed way to feed several types of main unit, then the problem is apt to be nonlinear. A good way to handle such problems is often to define variables re- presenting the proportion of the output from the preprocessor that is fed to particular main units. The amount of some chemical element in the preprocessed material supplied to a main unit is then the product of this new variable times the amount of this element in the output from the preprocessor. This amount is itself linearly related to the inputs to the preprocessor, So it is important to be able to handle product terms. The expression uy U> is not a nonlinear function of a single variable, so it might appear that ————TTETTTTE 178 E. M. L, BEALE it was not amenable to separable programming. If that were true, then separable programming would be of very limited value, but fortunately it is not true. We note that Ue = Hey Hu)? Ay ta)? G4) so we can always express a product of 2 linear variables as a difference between 2 nonlinear functions of linear variables. 
This is a special case of the fact (exploited in Chapter III) that any quadratic function can be represented as a sum or difference of squares. Any such function is therefore amenable to separable programming. And we could then handle the product of such a quadratic function with another variable in the same way; so in theory there is no limit to the class of functions that can be represented in this way. In practice of course a very involved representation would be cumbersome computationally.

So we can deal with a simple product term by introducing 2 groups of special variables. This involves 4 extra equations — 2 convexity rows to represent the conditions that the sum of the special variables in each group must equal 1, and 2 reference rows to represent the values of the arguments of the nonlinear functions in terms of the special variables. These correspond to eqs. (5.1) and (5.2), in general with some linear combination of the basic variables of the problem substituted for z in (5.2). In practice we will generally not write down an equation corresponding to (5.3) explicitly, since we can substitute the left hand side of this equation for f(z) wherever it occurs.

I am grateful to Eli Hellerman for pointing out that repeated use of (5.4) is not the only way to handle more complicated product terms. One may have to consider an expression of the form

x_1^{a_1} x_2^{a_2} ··· x_n^{a_n},

where the x_i are essentially positive and have a lower bound that is strictly greater than zero. It will then generally be more economical to write this expression as exp w, where

w = a_1 ln x_1 + a_2 ln x_2 + ··· + a_n ln x_n.

This treatment involves n+1 groups of special variables and 2(n+1) extra equations — one of which defines the special variables representing exp w in terms of sums of special variables from the other groups. So this logarithmic treatment is inferior to the use of (5.4) for simple product terms, but it will generally be best for more complicated products when the variables concerned are essentially positive. And it may be advantageous for simple product terms if one has several such terms involving the same set of variables.

V.4. Defining the ranges of the variables

The next point to notice is that separable programming only deals with nonlinear functions over definite ranges of their arguments. It is clear from eq. (5.2) that z cannot be less than a_1 or greater than a_8. Occasionally the fact that one automatically fixes lower and upper bounds on the independent variable defining a nonlinear function is a useful bonus to the formulator of the problem. More often it is an extra chore to have to fix realistic bounds for these variables, or else run one of 3 risks:

(a) to have an unnecessarily inaccurate approximation (due to using a widely spaced grid), or
(b) to have an unnecessarily large problem (due to having more points than are necessary to define the nonlinear functions involved), or
(c) to finish up with a solution in which some independent variable for a nonlinear function is up against one of its limits, indicating that a better solution might possibly be available beyond this limit. (Of course if the solution falls within all the chosen limits, then it would necessarily remain a solution if the limits were relaxed, even if one did not know a priori that these limits were justified.)

In the problems I have encountered so far, there has been no particular difficulty in fixing appropriate limits.
In this connexion it is often helpful to work with the proportion of some input or output that is composed or used in a certain way. This proportion must obviously lie between 0 and 1. An alternative formulation might be possible in terms of the ratio of the amounts used in 2 different ways; but such a ratio might vary between 0 and ∞. Another illustration of the convenience of proportions is that if one has a variable z defined by

z = Σ_i c_i x_i,

where the x_i are proportions, and are therefore nonnegative and sum to 1, then z must lie between the smallest and the largest of the c_i.

When dealing with product terms, it seems best to define the quantities u_1 and u_2 to which one applies the identity (5.4) so that each covers the range 0 ≤ u_1, u_2 ≤ 1. If one has a product term v_1 v_2 with arbitrary, but specified, ranges for v_1 and v_2, one can always write v_1 = a_1 + b_1 u_1, v_2 = a_2 + b_2 u_2, where 0 ≤ u_1, u_2 ≤ 1.
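The following sketch (the names and sample ranges are the writer's own assumptions, not from the text) shows how a product v_1 v_2 with specified ranges might be rewritten in terms of variables u_1 and u_2 lying between 0 and 1, leaving only the product u_1 u_2 to be handled through identity (5.4):

```python
def rescale(lo, hi):
    """Map a variable v with specified range [lo, hi] onto v = a + b*u,
    with 0 <= u <= 1; returns the constants (a, b)."""
    return lo, hi - lo

def product_as_separable(v1_range, v2_range):
    """Express v1*v2, with v1 = a1 + b1*u1 and v2 = a2 + b2*u2, as
    a1*a2 + a2*b1*u1 + a1*b2*u2 + b1*b2*u1*u2.  Only the coefficients are
    returned; in a model the remaining term u1*u2 would be replaced by
    (u1+u2)**2/4 - (u1-u2)**2/4, each square being approximated by its own
    group of special variables."""
    a1, b1 = rescale(*v1_range)
    a2, b2 = rescale(*v2_range)
    return {"const": a1 * a2, "u1": a2 * b1, "u2": a1 * b2, "u1*u2": b1 * b2}

# Example: v1 ranging over [10, 30] and v2 over [2, 5].
print(product_as_separable((10, 30), (2, 5)))
# {'const': 20, 'u1': 40, 'u2': 30, 'u1*u2': 60}
```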
