
Mathematical Programming 11 (1976) 128-149.

North-Holland Publishing Company

FINDING THE NEAREST POINT IN A POLYTOPE

Philip WOLFE
IBM Thomas J. Watson Research Center, New York, U.S.A.

Received 18 June 1974


Revised manuscript received 26 January 1976

A terminating algorithm is developed for the problem of finding the point of smallest
Euclidean norm in the convex hull of a given finite point set in Euclidean n-space, or
equivalently for finding an "optimal" hyperplane separating a given point from a given finite
point set. Its efficiency and accuracy are investigated, and its extension to the separation of two
sets and other convex programming problems is described.

1. Introduction

We develop here a numerical algorithm for finding that point of a given polytope
in Euclidean n-space which has smallest Euclidean norm. A polytope is the
convex hull of a given point set P = {P_1, ..., P_m}; algebraically, we are required to
minimize |X|^2 = X^T X over all X of the form

$$X = \sum_{k=1}^{m} P_k w_k, \qquad \sum_{k=1}^{m} w_k = 1, \qquad w_k \ge 0 \text{ for all } k. \tag{1.1}$$

The point X is thus the nearest point to the origin in the polytope. Solving the
problem for P = {P_1 - Y, ..., P_m - Y} yields X + Y as the nearest point to Y; for
simplicity we keep Y = 0 throughout.
That problem, or its dual, arises in many contexts. We have encountered it in
the optimization of nondifferentiable functions, in approximation theory, and in
pattern recognition, and have felt the need for an efficient and reasonably
foolproof way of solving it. We offer what follows as an approximation to that
goal.
The problem is, of course, a problem of quadratic programming, for which there
are several excellent general algorithms; but the special nature of our problem lets
us improve on them. The central feature of our approach is that the representation
(1.1) of X is explicitly used, and the set of such X viewed as a polytope, while the
usual general algorithm concentrates on the constraints w ≥ 0, taking the description of the set as a polyhedron (the intersection of halfspaces) as fundamental.
The algorithm to be described has been trimmed to the present problem; we think
it improves on the general-purpose algorithms in computational effort, storage,
and roundoff error. Other procedures we have seen (e.g., [2, 9]) are convergent,
non-terminating methods, and cannot be properly compared with the present
scheme.
The next section presents some elementary notions required to set the problem.

The algorithm is described in purely geometric terms in Section 3, and the


corresponding algebraic description given in Sections 4 and 5. Section 6 presents
results from tests on easily generated problems of several kinds, while Section 7
considers some possible improvements on the method. Section 8 studies the
accuracy of the solution. Section 9 mentions some handy ways to use the method
when a problem already solved is modified, and Section 10 describes the use of the
algorithm to find "good" or "optimal" separating hyperplanes. Finally, Section 11
discusses some natural extensions of the algorithm to other constraint sets and
objective functions.

We are indebted to Daniel Chazan for suggesting this study, and to him, Harlan
Crowder, G.W. Stewart, and Christoph Witzgall for many helpful suggestions.

2. Preliminaries

Let P = {P_1, ..., P_m} be a finite point set in a Euclidean space of dimension n. The
smallest flat (translation of a linear subspace) containing P is called the affine hull
of P,

$$A(P) = \Big\{X : X = \sum_{j=1}^{m} P_j w_j, \; \sum_{j=1}^{m} w_j = 1\Big\}. \tag{2.1}$$

The smallest convex body containing P is the convex hull of P,

C(P) = {X : (2.1) holds for some w ≥ 0}.

In both cases, the vector w = (w_1, ..., w_m) constitutes the weights, or barycentric
coordinates, of the point X in P.

Letting the set P be represented (without ambiguity) by the n-by-m matrix P
whose columns are the points P_j, we may write

A(P) = {X : X = Pw, e^T w = 1},
C(P) = {X : X = Pw, e^T w = 1, w ≥ 0},

where e is the column vector (1, 1, ..., 1)^T. (Throughout, the number of components
of vectors such as e and w is to be inferred from the context. In this case, it
is m for both. An inequality relating two vectors is understood to hold simultaneously
for each component.)

A point set (or matrix) Q is affinely independent if no point of Q belongs to the
affine hull of the remaining points. In that case, the weights expressing the point
X = Qw, e^T w = 1 as a member of A(Q) are uniquely determined. Affine independence
is equivalent to the property that the (n + 1)-by-k matrix

$$\begin{pmatrix} e^T \\ Q \end{pmatrix}$$

(where k is the number of points of Q) has rank k, as well as to the property that
the symmetric matrix of order k + 1

$$\begin{pmatrix} 0 & e^T \\ e & Q^T Q \end{pmatrix} \tag{2.2}$$

have rank k + 1, that is, be nonsingular. The sets A(Q) and C(Q) then have
dimension k - 1, and C(Q) is a nondegenerate simplex whose vertices are the
points of Q, while all the faces of dimension p of that simplex are the convex hulls
of all the subsets of p + 1 points of Q. The relative interior of C(Q) (relative to the
smallest affine set, A(Q), containing it) is just the set of points whose weights in Q
are all positive, and for any X ∈ C(Q) there is a unique face having X in its
relative interior: its vertices are those points for which the weight of X in Q is
positive.

When Q is affinely independent, we can easily minimize |X| on A(Q), that is,
solve the problem

Minimize |X|^2 = w^T Q^T Q w,
subject to e^T w = 1.

Forming the Lagrangian w^T Q^T Q w + 2λ(e^T w - 1) and differentiating, we have the
necessary conditions

$$e^T w = 1, \qquad e\lambda + Q^T Q w = 0, \tag{2.3}$$

which have a unique solution owing to the nonsingularity of their matrix (2.2).
Since the minimand is convex these conditions are also sufficient; X = Qw
minimizes |X| on A(Q).
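For concreteness, the system (2.3) can be solved directly. The following numpy sketch is ours, not part of the paper's routines; the function name is arbitrary, and the dense solve is for clarity only (Section 5 discusses better ways to handle these equations).

```python
# Illustrative sketch: minimize |X| over A(Q) by solving (2.3).
# Q is n-by-k with affinely independent columns.
import numpy as np

def nearest_in_affine_hull(Q):
    k = Q.shape[1]
    M = np.zeros((k + 1, k + 1))      # the matrix (2.2)
    M[0, 1:] = 1.0                    # e^T
    M[1:, 0] = 1.0                    # e
    M[1:, 1:] = Q.T @ Q
    rhs = np.zeros(k + 1)
    rhs[0] = 1.0
    w = np.linalg.solve(M, rhs)[1:]   # drop the multiplier lambda
    return Q @ w, w                   # X = Qw and its weights
```

Applied to the columns P1 = (0, 2) and P2 = (3, 0) of the Section 3 example, this returns the point R = (0.923, 1.385) of Table 1.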
Our algorithm will repeatedly produce such points X for selected sets Q. If it
happens that w ≥ 0, then of course X belongs to C(Q) and minimizes |X| there;
and if Q ⊆ P has been suitably chosen, X may even minimize over C(P), and
solve the original problem. The test for that, given below, amounts to determining
whether the hyperplane

$$H(X) = \{x : X^T x = |X|^2\} \tag{2.4}$$

is a supporting hyperplane of P, and hence of C(P). Since |x|^2 is strictly convex,
the point which minimizes |x| over C(P) is unique. We refer to it as
Nr P.

Theorem 2.1 (Optimality). X ∈ C(P) is Nr P if and only if

X^T P_j ≥ |X|^2 for all j.

Proof. Let Y ∈ C(P), 0 ≤ θ ≤ 1; then X + θ(Y - X) ∈ C(P) and
|X + θ(Y - X)|^2 = |X|^2 + 2θ(X^T Y - X^T X) + θ^2 |Y - X|^2, which is less than |X|^2 for
small θ unless X^T Y ≥ X^T X. It follows that X = Nr P if and only if X^T Y ≥ X^T X
for all Y ∈ C(P), from which the theorem follows.

The next result is convenient for testing the quality of an approximation to Nr P.

Theorem 2.2 (Estimation). If 0 ≠ X ∈ C(P), then

$$|X| \ge |\mathrm{Nr}\,P| \ge \min_j X^T P_j / |X|. \tag{2.5}$$

Proof. Let Nr P = Σ_j λ_j P_j, Σ_j λ_j = 1, all λ_j ≥ 0. Then

$$\min_j X^T P_j/|X| \le \sum_j \lambda_j X^T P_j/|X| = X^T(\mathrm{Nr}\,P)/|X| \le |\mathrm{Nr}\,P|.$$

The second inequality of (2.5) of course holds for any X ≠ 0. Theorem 2.2 states
just the useful part of a duality theorem for our problem, which we can state in
greater generality for the discussion in Section 10.
Let |·| denote any norm (a positive-homogeneous finite-valued convex function
vanishing only at the origin) on E^n, and |·|* its dual norm

$$|y|^* = \max\{y^T x : |x| \le 1\}. \tag{2.6}$$

Let

$$g(y) = \min_j y^T P_j. \tag{2.7}$$

The problems

$$\min\{|x| : x \in C(P)\}, \tag{2.8}$$

$$\max\{g(y) : |y|^* \le 1\} \tag{2.9}$$

are dual:
(i) g(y) ≤ |x| for all x ∈ C(P), |y|* ≤ 1;
(ii) the extrema (2.8), (2.9) are equal;
(iii) among the solutions of the two problems are found all the saddlepoints of
y^T x, that is, all pairs x̄, ȳ such that y^T x̄ ≤ ȳ^T x̄ ≤ ȳ^T x for all x ∈ C(P), |y|* ≤ 1.
(This duality is well known in approximation theory, but its first appearance
eludes us.)

For the Euclidean norm we have the simplification that |y|* = |y|, and (2.5)
follows from the fact that the saddlepoint (iii) is given by x̄ = Nr P and ȳ = x̄/|x̄| if
Nr P ≠ 0, and by x̄ = ȳ = 0 otherwise.
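As a small numerical illustration of Theorem 2.2 (ours, using the data of the Section 3 example below), any nonzero X ∈ C(P) brackets |Nr P| between min_j X^T P_j/|X| and |X|:

```python
# Sketch: the two-sided estimate (2.5) on the Fig. 1 example points.
import numpy as np

P = np.array([[0.0, 3.0, -2.0],    # columns P1, P2, P3 of Fig. 1
              [2.0, 0.0,  1.0]])
X = P.mean(axis=1)                 # a point of C(P), weights (1/3, 1/3, 1/3)
upper = np.linalg.norm(X)          # |X| = 1.054...
lower = (X @ P).min() / upper      # min_j X^T P_j / |X| = 0.316...
print(lower, upper)                # brackets |Nr P| = |T| = 0.588...
```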

3. The algorithm: geometry

We call an affinely independent subset Q of P a corral if Nr Q is in its relative
interior. Note that any singleton is a corral. There is a corral whose convex hull
contains the solution of the smallest-norm problem over P, and our algorithm will
find it.

The algorithm consists of a finite number of major cycles, each of which may
involve some number of minor cycles. At the start of each major cycle we have a
corral Q and the point Nr Q. The cycle consists in choosing a new point,
adjoining it to Q, and determining whether the result is a corral. If so, the cycle is
finished; if not, a minor cycle is begun. At the start of a minor cycle we have just
some affinely independent set Q and some point X ∈ C(Q). Each minor cycle
removes one selected point from Q and alters X. Minor cycles are repeated until a
corral is found and X = Nr Q, terminating the major cycle.

The initial corral is the singleton given by Step 0 below. Subsequently each
major cycle begins at Step 1; the minor cycles, if any, in a major cycle constitute
repetitions of Steps 3 and 2. Following the algorithm here we show how it works
on the example of Fig. 1.

[Figure: the example points P1 = (0, 2), P2 = (3, 0), and P3 = (-2, 1) in the plane, with the origin; the value of P3 is the one consistent with R, S, and T of Table 1.]
Fig. 1. Example.

Step 0. Find a point of P of minimal norm. Let X be that point and Q = {X}.
Step 1. If X = 0 or if H(X) separates P from the origin, stop. Otherwise
choose P_J ∈ P on the near side of H(X) and replace Q by Q ∪ {P_J}.
Step 2. Let Y be the point of smallest norm in A(Q). If Y is in the relative
interior of C(Q), replace X by Y and return to Step 1. Otherwise:
Step 3. Let Z be the nearest point to Y on the line segment C(Q) ∩ XY (thus a
boundary point of C(Q)). Delete from Q one of the points not on the face of
C(Q) in which Z lies, and replace X by Z. Go to Step 2.
Fig. 1 and Table 1 illustrate the algorithm in a simple problem, giving the current
Q, X, and Y at the end of each step.
We must show that the algorithm terminates in a solution of the problem. First,

Table 1
Solution of the example

Step   X      Q             Y
0      P1     P1
1      P1     P1, P2
2      R      do.           R
1      R      P1, P2, P3
2      R      do.           0
3      S      P2, P3
2      T      do.           T
1      STOP

R = (0.923, 1.385) is the nearest point to 0 on P1P2.
S = (0.353, 0.529) is the intersection of OR and P2P3.
T = (0.115, 0.577) is the answer.

observe that Q is always affinely independent: it changes only by the deletion of
single points or by the adjunction of P_J in Step 1. Now the line OX is normal to
A(Q), since |X| is minimal there; thus A(Q) ⊂ H(X). Since P_J ∉ H(X), we know
P_J ∉ A(Q), so Q ∪ {P_J} is affinely independent if Q is. Next, there can be no more
minor cycles in a major cycle beginning with a given Q than the dimension of
C(Q), for when Q is a singleton, Step 2 returns us to Step 1 (indeed, the total
number of minor cycles that have been performed from the beginning cannot
exceed the number of major cycles). Every time Step 1 is followed by the
replacement in Step 2 (the major cycle has no minor cycle) the value of |X| is
reduced, since the segment XY intersects the interior of C(Q ∪ {P_J}) and |X|
strictly decreases along that segment. For the same reason the first minor cycle, if
any, of a major cycle also reduces |X|, and subsequent minor cycles cannot
increase it. Thus |X| is reduced in each major cycle. Since X is uniquely
determined by the corral on hand at Step 1, no corral can enter the algorithm more
than once. Since there is but a finite number of corrals the algorithm must
terminate, and it can only do so when the problem is solved.

The reader familiar with the Simplex Method for linear programming will notice
its close relationship to our algorithm. More is said about this in Section 11.

4. The algorithm: algebra

We describe the algorithm of Section 3 algebraically. The points P_j are the
columns of the n-by-m matrix P. The matrix for the corral Q is not explicitly
formed; rather, we maintain the set of indices S ⊆ {1, ..., m} designating those
columns of P constituting Q. By P[S] we mean the submatrix of P consisting of
those columns, so that Q = P[S]. Note that w is a vector of varying length (one,
in Step 0). The "notes" at the end of this section should be consulted.

Step 0. Defining J by |P_J|^2 = Min{|P_j|^2 : j = 1, ..., m}, set S = {J} and w = (1).
Step 1. (a) Set X = P[S]w.
    (b) Define J by X^T P_J = Min{X^T P_j, all j} (Note 1).
    (c) If X^T P_J > X^T X - Z1 Max{|P_J|^2, Max_{j∈S} |P_j|^2}, STOP (Notes 2, 7).
    (d) If J ∈ S, then STOP (Note 3).
    (e) If not stopped above, then replace S by S ∪ {J} and w by (w, 0).
Step 2. (a) Solve the equations

$$e^T v = 1, \qquad e\lambda + P[S]^T P[S] v = 0 \tag{4.1}$$

for v. (Section 5 is devoted to that task.)
    (b) If v > Z2 e (Note 4), set w = v and go to Step 1. Otherwise:
Step 3. (a) Let POS be the set of indices i ∈ S for which w_i - v_i > Z3 (Note 5).
    (b) Set θ = Min{1, Min{w_i/(w_i - v_i) : i ∈ POS}}.
    (c) Replace w by θv + (1 - θ)w.
    (d) Replace by zero all elements of w not greater than Z2 (Note 6).
    (e) Delete from w some zero component, and from S the corresponding index. Go to Step 2.
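The steps above translate directly into a short program. The following numpy sketch is ours, not the author's routine: it solves (4.1) by a dense solve each cycle instead of the updating Methods A-D of Section 5, and it reports the stop 1(d) by raising an error.

```python
# Sketch of the Section 4 algorithm (illustrative implementation).
import numpy as np

def nearest_point(P, Z1=1e-12, Z2=1e-10, Z3=1e-10):
    n, m = P.shape
    norms2 = (P * P).sum(axis=0)
    J = int(norms2.argmin())                       # Step 0
    S, w = [J], np.array([1.0])
    while True:                                    # major cycles
        X = P[:, S] @ w                            # Step 1(a)
        J = int((X @ P).argmin())                  # 1(b)
        if X @ P[:, J] > X @ X - Z1 * max(norms2[J], norms2[S].max()):
            return X, S, w                         # 1(c): X is Nr P
        if J in S:
            raise RuntimeError("stop 1(d): accuracy failure (Note 3)")
        S.append(J)                                # 1(e)
        w = np.append(w, 0.0)
        while True:                                # minor cycles
            k = len(S)
            Q = P[:, S]
            M = np.zeros((k + 1, k + 1))           # matrix of (4.1)
            M[0, 1:] = M[1:, 0] = 1.0
            M[1:, 1:] = Q.T @ Q
            rhs = np.zeros(k + 1)
            rhs[0] = 1.0
            v = np.linalg.solve(M, rhs)[1:]        # Step 2(a)
            if (v > Z2).all():                     # 2(b): S is a corral
                w = v
                break
            POS = [i for i in range(k) if w[i] - v[i] > Z3]   # 3(a)
            theta = min(1.0, min(w[i] / (w[i] - v[i]) for i in POS))
            w = theta * v + (1.0 - theta) * w      # 3(c)
            w[w <= Z2] = 0.0                       # 3(d)
            i0 = int(np.where(w == 0.0)[0][0])     # 3(e)
            S.pop(i0)
            w = np.delete(w, i0)
```

On the example of Fig. 1 (columns (0, 2), (3, 0), (-2, 1)) this returns the answer T = (0.115, 0.577) of Table 1.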

Notes. 1. This rule for choosing J is the handiest, and keeps the number of cycles
low, as well as making approximate affine dependence of P_J on P[S] unlikely. For
large-scale problems there are likely better rules: see Section 7.

2. The term subtracted from X^T X is intended to make due allowance for
roundoff error in the whole procedure. When Z1 = 0 we have the optimality test of
Theorem 2.1. Calculating with sixteen-digit precision we use the generous
value Z1 = 10⁻¹². Without some such provision, the optimality test might never be
passed.
At the stop in 1(c), X is Nr P, P[S] is the corral which holds it, and w its
barycentric coordinates in P[S].

3. Stopping in 1(d) signals temporary disaster: X is so inaccurate that the
relation X^T P_j = X^T X, which should hold for all j ∈ S, fails to more than the degree
measured by Z1. We have encountered this when P contained some nearly
identical points, and generally find then that the stop 1(c) would have occurred if a
modestly larger value of Z1 had been used. That is a happy ending; see Section 8
regarding the effect of Z1 on the final answer.
Some work would be saved by using only j ∉ S in 1(b) and omitting this step,
but then some other kind of guard should be kept on the affine independence of Q.
The check is also a fine trap for programming errors.
4. Another precaution. We use Z2 = 10⁻¹⁰.
5. We use Z3 = 10⁻¹⁰.
6. The value of θ given by 3(b) is the smaller of 1 and the largest value for
which all components of θv + (1 - θ)w are nonnegative. If θ < 1, then at least one
of those components will vanish. The replacement of 3(d) makes that vanishing
decisive. We use the tolerance Z2 here for consistency with Step 2(b).

7. Answers to well-formed optimization problems are subject to checks which
are usually so easy it would be a sin to forgo them. Necessary and sufficient (and
redundant) conditions that X, S, and w ≥ 0 solve our problem are the vanishing of
these four numbers:

(a) 1 - e^T w,
(b) |X - P[S]w|,
(c) Max_{j∈S} |X^T P_j - X^T X|,
(d) Min_j X^T P_j - X^T X

(whose connection with the accuracy of the final solution is discussed in Section
8). If they do not approximate zero to a plausible degree, the algorithm has
blundered. A possible fix would be to repeat the calculation, using instead of the
matrix P just P[S], and perform the tests again. We have had no experience with
unsatisfactory performance when using the methods recommended in the next
section.
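In code the four checks are one-liners. A sketch (ours), assuming a claimed solution X, S, w for the n-by-m matrix P:

```python
# Sketch of the Note 7 checks (a)-(d); all four should be near zero.
import numpy as np

def note7_checks(P, S, w, X):
    XP = X @ P                                # X^T P_j for all j
    XX = X @ X
    a = 1.0 - w.sum()                         # (a) 1 - e^T w
    b = np.linalg.norm(X - P[:, S] @ w)       # (b) |X - P[S]w|
    c = np.abs(XP[S] - XX).max()              # (c) max over j in S
    d = XP.min() - XX                         # (d) min over all j
    return a, b, c, d
```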
8. A sometimes most useful feature of the algorithm is that after Step 0 the
point set P is consulted only in Step 1(b), and that step requires only one point P_J
achieving Min{X^T P_j}. The set P may well be presented in some form other than
that of a simple list, yet permit the step to be executed efficiently. An important
case occurs when P is actually the sum of two other sets: P = C + D =
{c + d : c ∈ C, d ∈ D}. Although the number of points in P is the product of those
in C and D, it is only necessary to look at each point of C and D once, for the
desired member of P is just c̄ + d̄, where c̄ and d̄ minimize X^T p for p ∈ C and
p ∈ D, respectively. The problem of finding the shortest distance between two
convex hulls allows such a formulation, which we exploit in Section 10; and some
of the extensions of this algorithm considered in Section 11 likewise depend on
such a "decomposition" of P.
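A sketch of that observation (ours), with C and D as matrices of columns: the linear minimization over the product-sized set P = C + D costs only one pass over C and one over D.

```python
# Sketch of Note 8: argmin of X^T p over P = C + D, without forming P.
import numpy as np

def argmin_over_sum(X, C, D):
    c_bar = C[:, int((X @ C).argmin())]   # minimizes X^T c over C
    d_bar = D[:, int((X @ D).argmin())]   # minimizes X^T d over D
    return c_bar + d_bar                  # minimizes X^T p over C + D
```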

5. Solving the equations

We have worked with the four ways A-D below of handling the equations (4.1),
and concluded that D is the best. Let s be the number of indices of S, that is, the
number of points in the current corral. By "operation" below we mean the
execution of one floating multiply and one floating add. In making estimates, we
give only the leading term of the polynomial constituting the accurate value.
Storage estimates assume that symmetric matrices are stored without redundancy.

Method A. Maintain in storage the inverse E of the matrix of equations (4.1),
modifying it when S changes. Initially, for S = {J},

$$E = \begin{pmatrix} -|P_J|^2 & 1 \\ 1 & 0 \end{pmatrix}.$$

When S is increased in Step 1, E is replaced by

$$\begin{pmatrix} E + YY^T/t & -Y/t \\ -Y^T/t & 1/t \end{pmatrix},$$

where Y = Eb, b = (1; P[S]^T P_J) is the new bordering column, and t = P_J^T P_J - b^T Y.
(The "pivot" t is easily shown to be positive [14]. It is the square of the distance of P_J
from A(P[S]).) When S is decreased in Step 3, the appropriate column C and diagonal entry d of E are used
to form E - Cd⁻¹C^T, from which the new inverse results by dropping the same
row and column. The solution of the equations is just the top row of E.
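A numpy sketch of these two updates (ours; the bordered-inverse algebra is standard). Here Q = P[S] holds the current corral, PJ is the point being added, and `delete` drops the point in position i of S (row and column 0 of E belong to the multiplier λ).

```python
# Sketch of the Method A updates of the inverse E of the matrix of (4.1).
import numpy as np

def adjoin(E, Q, PJ):
    b = np.concatenate(([1.0], Q.T @ PJ))   # new bordering column
    Y = E @ b
    t = PJ @ PJ - b @ Y                     # the "pivot"; t > 0 in theory
    top = np.hstack([E + np.outer(Y, Y) / t, (-Y / t)[:, None]])
    bottom = np.concatenate([-Y / t, [1.0 / t]])
    return np.vstack([top, bottom])

def delete(E, i):
    j = i + 1                               # skip the lambda row/column
    C, d = E[:, j], E[j, j]
    E = E - np.outer(C, C) / d
    return np.delete(np.delete(E, j, 0), j, 1)
```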
Method B. Maintain in storage the "tableau" for the quadratic programming
problem (1.1). The tableau will contain E above and all the other data of the
problem, suitably transformed by E. This approach has been discussed elsewhere
[14] for its pedagogical value, but its storage ((1/2)m²) and arithmetic requirements
(on the order of m² operations per cycle) rule it out unless m ≈ n, which is rare. Since it suffers
from roundoff error even more than Method A, which is already in agony (see
Section 8), we will not pursue it further.

For both procedures below it is efficient to find v of (4.1) by using these
equations:

$$(ee^T + P[S]^T P[S])u = e, \tag{5.1}$$

$$v = u/e^T u, \tag{5.2}$$

whose equivalence with (4.1) is easily checked. The order of the system (5.1) is
one less than that of (4.1), and the normalization e^T v = 1 is almost perfectly
achieved.

Method C. The system (5.1) constitutes the normal equations A^T A u = A^T b for
the least-squares solution of the equations Au = b, taking

$$A = \begin{pmatrix} e^T \\ P[S] \end{pmatrix}$$

and b = (1, 0, ..., 0)^T. In the interpretive language APL/360 the "domino"
operator provides such a solution, and (5.1) is solved as

u ← b ⌹ A.

This method is of course the most convenient to use in an APL program, and
about as accurate as Method D below; in fact, the machine-language implementation
of "domino" uses the arithmetic of Method D to compute u. Interestingly, in
APL Method C is much quicker than D, although each time C is called, a number
of D steps are taken; and it also uses slightly less time than Method A.
Method D. We maintain the matrix of the equations (5.1) in the form R^T R,
where R is an upper triangular matrix, doing the arithmetic in the manner
suggested by Golub and Saunders' treatment of the least-squares problem [6].
We have always

$$R^T R = ee^T + P[S]^T P[S], \tag{5.3}$$

and the solution of (5.1) is the result of solving the two triangular systems

$$R^T y = e, \qquad Ru = y. \tag{5.4}$$

R must be altered whenever S is. The required work changes the algorithm of
Section 4 as follows:
(i) Add to Step 0: Let R be the 1-by-1 matrix [(1 + |P_J|²)^{1/2}].
(ii) Add to Step 1:
(f) Solve for r the system

$$R^T r = e + P[S]^T P_J. \tag{5.5}$$

(g) Adjoin to R, on the right, the column (r; ρ), where

ρ = (1 + P_J^T P_J - r^T r)^{1/2}.

(iii) Delete "Go to Step 2" from 3(e), and add to Step 3:
(f) Let I be the position of the component deleted in (e); delete the I-th
column of R.
(g) If I exceeds the number of columns of (the new) R, go to Step 2.
Otherwise let

a = R_{I,I},  b = R_{I+1,I},  c = (a² + b²)^{1/2};

replace R_{I,·} (row I of R) by (aR_{I,·} + bR_{I+1,·})/c and R_{I+1,·} by
(-bR_{I,·} + aR_{I+1,·})/c; increase I by one, and repeat this step.

It is easy to check that (ii) above and (f) of (iii) maintain the relation (5.3). Step
3(g) uses plane rotations to restore R to upper triangular form after 3(f) has
damaged it.
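A numpy sketch of the two R-updates (ours). `adjoin_column` implements (ii); `delete_column` implements 3(f)-3(g), after which the last row of R is zero (exactly so in exact arithmetic) and is dropped.

```python
# Sketch of the Method D updates of the triangular factor R.
import numpy as np

def adjoin_column(R, Q, PJ):                # (ii): a point joins S
    r = np.linalg.solve(R.T, 1.0 + Q.T @ PJ)    # (5.5): R^T r = e + Q^T PJ
    rho = np.sqrt(1.0 + PJ @ PJ - r @ r)
    top = np.hstack([R, r[:, None]])
    return np.vstack([top, np.concatenate([np.zeros(R.shape[1]), [rho]])])

def delete_column(R, I):                    # 3(f)-3(g): drop column I
    R = np.delete(R, I, axis=1)             # leaves entries below the diagonal
    for i in range(I, R.shape[1]):          # plane rotations on rows i, i+1
        a, b = R[i, i], R[i + 1, i]
        c = np.hypot(a, b)
        Ri, Ri1 = R[i].copy(), R[i + 1].copy()
        R[i] = (a * Ri + b * Ri1) / c
        R[i + 1] = (-b * Ri + a * Ri1) / c
    return R[:-1]                           # last row is now zero
```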
The storage needed for this scheme is the same as for A above. Solving a
triangular system of s equations takes (1/2)s² operations, so (5.4) takes s². Equation
(5.5) takes (1/2)s², while the rotations of 3(g) take 2(s - I)², which has the expected
value (2/3)s² if I is randomly, uniformly chosen. Thus when S is increased
(decreased), (3/2)s² ((5/3)s²) operations are done, for an average of (19/12)s² in the long run.
We find that with this procedure there is virtually no accumulation of roundoff
error, unlike A (see Section 8). Since the difference, a fraction of s², in the average operation
counts of the two methods is swamped by the mn operations taken by Step 1(b) in
the typical problem, we prefer D.

6. Experience

Although the amount of computational work done in a single step of the algorithm
is well determined (see Section 5), we do not know how to estimate the number of
iterations required. That number is of course bounded by the number of
possible corrals in P, which is in turn bounded by the number of subsets of P
having no more than n + 1 elements, a calculable number; but the result is
preposterously large. As with linear programming, we must resort to experiment
to determine when the method is practical.

We have tested the algorithm on four types of problem. All of them start with a
set P⁰ of m points chosen at random, uniformly distributed over an n-cube of side
2 centered at the origin. (Draw mn integers at random without replacement from
1, 2, ..., 10⁴, divide by 5000, and subtract 1.)

Type 0: P = P⁰.
Type 1: The point X⁰ is chosen randomly from 2P⁰, and P_j = X⁰ + P_j⁰ for all j.
Type 2: P⁰ is compressed by 10⁻³, and then displaced one unit, along the x₁ axis:
P = {(1 + 10⁻³x₁, x₂, ..., x_n) : (x₁, x₂, ..., x_n) ∈ P⁰}.
Type 3: Like Type 2, but displaced by 0.01 instead:
P = {(10⁻² + 10⁻³x₁, x₂, ..., x_n) : (x₁, ..., x_n) ∈ P⁰}.
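A generator for these test sets might look as follows (our sketch; reading "X⁰ chosen randomly from 2P⁰" as twice a randomly chosen point of P⁰ is our interpretation):

```python
# Sketch of the Section 6 problem generators, Types 0-3.
# Requires n*m <= 10^4, since the draw is without replacement.
import numpy as np

def make_problem(n, m, ptype, seed=0):
    rng = np.random.default_rng(seed)
    ints = rng.choice(np.arange(1, 10**4 + 1), size=n * m, replace=False)
    P0 = ints.reshape(n, m) / 5000.0 - 1.0       # uniform on the cube
    if ptype == 0:
        return P0
    if ptype == 1:
        X0 = 2.0 * P0[:, rng.integers(m)]        # a random point of 2*P0
        return X0[:, None] + P0
    shift = 1.0 if ptype == 2 else 0.01          # Types 2 and 3
    P = P0.copy()
    P[0] = shift + 1e-3 * P0[0]                  # compress and displace x1
    return P
```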
For a problem of Type 0, Nr P = 0 is almost certain, while for Type 1, Nr P ≠ 0
is. Both problems are very easy: Type 0 since the origin is well-centered in P,
Type 1 since the origin is likely to be near a "corner" of P and require only a few
points of P to determine the solution. When m > n a Type 0 problem almost
always terminates in n + 1 major, and no minor, cycles: the first corral of
dimension n + 1 constructed contains the origin. Table 6.1 gives the number of
major and minor cycles for Type 1 problems of various sizes. In each case, the
number is the average for a sample of ten problems.

Table 6.2 gives the same data for Type 3 problems. We omit the corresponding
data for Type 2, since they can be generally described as requiring about 20%

fewer major cycles than do Type 3, and like Type 3 generally have terminal
corrals of maximal size.
Note that the sum of the number of major and minor cycles is the total number
of times a system of equations is solved. The difference of those numbers is the
number of points in the terminal corral. The number of minor cycles can be
viewed as a count of the "mistakes" made by the algorithm: the number of points
selected which do not wind up in the terminal corral. When that number is zero,
the amount of work the algorithm does in equation-solving in the updating
versions A,B,D is simply what would be required to determine the weights w if the
terminal corral were known in advance. The results tend to support our hunch that
for large m the number of cycles increases roughly as log m.
The graphs in Figs. 6.1-6.3 show the convergence of |X^k| and the dual objective
g(X^k) (see Section 2) toward their common value d = |Nr P|, as a function of the
major cycle number k, for three problems having 80 points in E²⁰. Each of the
problems 0, 2, 3 is of the corresponding type, built from the same P⁰. The numbers
of major cycles to termination were respectively 21, 41, 54, and of minor cycles
0, 21, 34. The calculations were done using Method C of the previous section in an
APL\360 routine on an IBM 360 Model 91. The CPU times required were 1.70,
4.63, and 6.63 seconds.

It appears that the convergence of |X^k| to d is roughly linear, as is that of g(X^k)
when d > 0. (Note that the plotted points are subject to a discretization error of
the width of one dot, and that since the terminal values of |·| and g were within
10⁻¹⁵ of d, those points are omitted from the logarithmic graphs.)

Table 6.1.
Cycles for Type 1 problems (major above minor)

n\m    20     40     60     80     100    120    140    160
10     3.3    3.5    4.2    4.1    5.4    4.2    4.7    5.0
       0.0    0.2    0.2    0.2    0.1    0.1    0.2    0.6
20     4.2    5.1    5.2    4.6
       0.1    0.2    0.1    0.1
30     4.9    5.7
       0.1    0.0

Table 6.2.
Cycles for Type 3 problems (major above minor)

n\m    20     40     60     80     100    120    140    160
10     14.6   20.2   24.1   25.5   26.0   28.8   30.5   30.7
       5.1    10.2   14.1   15.5   16.0   18.8   20.5   20.7
20     14.5   29.8   29.6   50.0
       1.3    10.6   19.6   30.0
30     16.2   27.1
       1.1    3.3
[Figure: |X^k| and g(X^k) against the major cycle number k, with the logarithmic error plot log10[|X^k| - d].]
Fig. 6.1. Solving Problem 0.

[Figure: |X^k| and g(X^k) against k, with log10[|X^k| - d] and log10[d - g(X^k)].]
Fig. 6.2. Solving Problem 2.

[Figure: |X^k| and g(X^k) against k, with log10[|X^k| - d] and log10[d - g(X^k)].]
Fig. 6.3. Solving Problem 3.


7. Possible improvements

For a problem having a very large number m of points, almost all the computing
time this algorithm requires will be taken in finding the new point P_J in Step 1(b).
There are several possibilities for changing the selection criterion given in Section
4 to improve matters.

The first possibility considered is that the criterion used could be improved so
that the number of major cycles might be reduced. Step 1(b) as given chooses P_J
as that point, on the same side of the separating hyperplane X^T x = X^T X as the
origin, which has greatest distance from the hyperplane. We could, instead, either
(i) determine how much reduction in |X| would ensue in adjoining any P_j, (ii)
estimate the latter quantity by finding the distance from the origin to the segment
XP_j for all P_j, or more crudely (iii) choose P_j so as to maximize the angle between
X and P_j - X. We deem (i) as too hard to be worthwhile, (ii) as interesting, and (iii)
as easy enough to try out quickly, which is what we have done. The results, even
for problems with n = 5, m = 100, were uniformly negative: there was a reduction
of a few percent in the number of major cycles, but the extra calculation required
almost doubled the total CPU time. A cheap approximation to this is (iv): choose
J to minimize X^T(P_j - X)/|P_j|, having calculated |P_j| once and for all. The amount
of work is about the same as for the original method, but so is the number of
iterations, on the average, for our problems.
We have not yet tried out the other two ideas worth mentioning, although their
analogues have proved successful for linear programming problems with very
large numbers of variables.
"Cyclic pricing": Only a portion of the points of P are examined before
choosing Pj and performing the rest of the algorithm; then another portion is
treated the same way; and so on, beginning again when all of P has been
examined. Ultimately no candidate will be found in P, and we are done.
"Suboptimization": A small number of points of P are chosen after examining
all of them - for instance, the points having the ten lowest values for x T p i -- and
the problem is completely solved on that subset; and this is done repeatedly.
Both procedures above are almost certain to increase the number of major
cycles required by the algorithm, but hold promise of greatly decreasing the total
work. At this time we have no more specific recommendation as to how to carry
them out.

8. Accuracy

In most applications not much accuracy is required of the solution of our problem;
but the answer obtained is quite accurate anyway. Here we show that the problem
of finding Nr P is well-posed, and examine the round-off error in its solution.
The problem is well-posed if small changes in its data lead to small changes in its
answer - i.e. if Nr P is continuous in P. That is the content of the Theorem below,
due to G.W. Stewart.

Theorem 8.1. Suppose that X = Nr P, X̄ = Nr P̄, and

|P̄_j - P_j| < δ for all j.

Then

$$\big|\,|\bar X| - |X|\,\big| < \delta \tag{8.1}$$

and

$$|\bar X - X| < (4|X|\delta + \delta^2)^{1/2}. \tag{8.2}$$

Proof. Write X = Pw, X̄ = P̄w̄, with e^T w = 1, e^T w̄ = 1, w, w̄ ≥ 0. Then |X̄| =
|P̄w̄| ≤ |P̄w| ≤ |Pw| + |(P̄ - P)w| < |X| + δ, and similarly |X| < |X̄| + δ, which
proves (8.1).

For all j, X^T P_j ≥ X^T X, whence X^T P̄_j ≥ X^T X - δ|X|; thus X^T X̄ ≥ X^T X - δ|X|.
Consequently |X̄ - X|² = |X̄|² - 2X^T X̄ + |X|² < (|X| + δ)² - 2(|X|² - δ|X|) + |X|² =
4|X|δ + δ², completing the proof.

The bound (8.1) is obviously sharp, and the example of Fig. 8.1 shows that (8.2)
is not bad. We set |P_1| = r, P_1P_2 ⊥ P_1, |P_2| = r + δ; P̄_1 = (1 + δ/r)P_1, P̄_2 =
P_2/(1 + δ/r). Then X = Nr{P_1, P_2} = P_1, X̄ = Nr{P̄_1, P̄_2} = P̄_2, |P̄_1 - P_1| =
|P̄_2 - P_2| = δ, and |X̄ - X| = 2r sin(θ/2) = [2r²(1 - cos θ)]^{1/2} = [2r²δ/(r + δ)]^{1/2} =
[2|X|δ/(1 + δ/|X|)]^{1/2}.

[Figure: the points P_1, P_2 and the perturbed points P̄_1, P̄_2, with the origin at 0 and the angle θ between X and X̄.]
Fig. 8.1. Perturbation of P.

The theorem above permits us to make a simple a posteriori error analysis.
Suppose the computation has finished in Step 1(c), as it should, with X ≠ 0. Let
B = Max_j |P_j|. Define

$$\begin{aligned} e_a &= |e^T w - 1|, \\ e_b &= |X - P[S]w|/B, \\ e_c &= \max_{j \in S} |X^T P_j - X^T X| / B|X|, \\ e_d &= \big|\min_j X^T P_j - X^T X\big| / B|X|. \end{aligned} \tag{8.3}$$

Theorem 8.2 shows the role of these quantities in a backward error analysis,
which is a posteriori because we do not know how to give a priori bounds for e_c
and e_d.

Theorem 8.2. Let P, S, etc. be as above, with w > 0, X ≠ 0, and e_a < 1. Set
e = Max{e_c, e_d}. There is a point set P̄ such that X = Nr P̄ and

$$\max_j |\bar P_j - P_j| < B(2e_a + 2e_b + e).$$

Proof. Let T be the set of all indices j belonging to S or such that X^T P_j - X^T X <
0. Set

$$\bar P_j = P_j e^T w + X - Pw - X\alpha_j, \qquad \text{where} \qquad \alpha_j = X^T(P_j e^T w - Pw)/X^T X,$$

for all j ∈ T, and P̄_j = P_j for all other j. (The required P̄ is readily deduced by
altering w, and then P, so as to make the quantities of (8.3) vanish, in the order of
their writing.) Let w̄ = w/e^T w. Then w̄ > 0, e^T w̄ = 1, Σ_j w̄_j α_j = 0, X = P̄[S]w̄, and
X^T P̄_j - X^T X = 0 for j ∈ T are all easily checked, showing that X = Nr P̄.

Now for j ∈ T,

$$|X|^2 \alpha_j = X^T P_j(e^T w - 1) + X^T(P_j - X) + X^T(X - Pw),$$

so |X| · |α_j| ≤ |P_j| e_a + Be + Be_b. Further, for j ∈ T,

$$\bar P_j - P_j = P_j(e^T w - 1) + X - Pw - X\alpha_j,$$

so

$$|\bar P_j - P_j| \le |P_j| e_a + Be_b + |X| \cdot |\alpha_j|,$$

from which the conclusion follows.
In practice the errors e_a, e_b are trifling compared with e, and e_c is small if the
equations of the problem are well handled, so that e_a = e_b = 0, e = e_d closely
represents the normal situation. If the computation for the stop rule 1(c) has been
done with reasonable accuracy we should have Min_j X^T P_j - X^T X ≤ Z1 B², so that
e ≤ Z1 B/|X|. By Theorem 8.2 the hypotheses of Theorem 8.1 hold with δ =
Z1 B²/|X|; we conclude that, where X* is the exact solution of the problem,

||X*| - |X|| ≤ Z1 B²/|X|

and

|X* - X| ≤ 2B Z1^{1/2}

(ignoring the term δ² of (8.2)).

As mentioned in Section 4, something like the quantities of (8.3) should always
be checked before a solution is accepted. In our routines the calculation of e_a and
e_b is actually fallacious: e_b vanishes because the same arithmetic is used to form
X = P[S]w as to check it, and calculating e_a amounts largely to determining how
accurately the machine represents numbers close to 1 (in our case, to within about
10⁻¹⁶).

Problems 2 and 3 of Section 6 have been run by Methods A and D, as well as by
C used in Section 6. The terminal values of e_c and e_d are given in Table 8.1.

Table 8.1
Terminal values of e_c, e_d

Problem    Method A                       Method D
2          1.5×10⁻¹⁰, -1.5×10⁻¹⁰          9.7×10⁻¹⁶, -9.7×10⁻¹⁶
3          abandoned                      9.6×10⁻¹⁶, -8.2×10⁻¹⁶

Problem 3 did not terminate properly with Method A. At major cycle 48 the
algorithm was abandoned when it calculated a negative pivot t. (Mathematically,
t > 0; t < 0 is disaster, since even the less inaccurate value t = 0 destroys the logic of
the algorithm. We might try to recalculate a better inverse, but there is no
guarantee that we could do so using the Method A scheme.) At that point B|X|e_c
was 1.9×10⁻⁵, a bad value.

Fig. 8.2 plots log10(B|X|e_c) for the runs of Table 8.1. Method D seems to entail
no growth of roundoff error, while Method A, probably in its minor cycles,
undergoes exponential growth after a series of good cycles mostly having no
minor cycles.

[Figure: four panels plotting the quantity below against the major cycle number: Method A, Problem 2; Method A, Problem 3 (abandoned); Method D, Problem 2; Method D, Problem 3.]
Fig. 8.2. Error in equation solving: log10 Max_j |X^T P_j - X^T X|.

9. Convenience features

In some applications the algorithm for Nr P is used repetitively: after a problem
has been solved, P is altered by the adjunction or deletion of some points. It will
not normally be necessary to go through the whole algorithm with the new set to
solve the new problem, provided the data S and w are retained, as well as
whatever information is maintained for the solution of the equations.

The routine we use can be entered in any of five modes, three of which are
concerned with changing P. Modes 2-5 below assume the problem to have been
solved, with S and w at their terminal values.

Mode 1. The entire problem is solved. The routine returns Nr P.
Mode 2. Points are adjoined to P. Here we need only begin the algorithm again
at Step 1(a) (or at 1(b) if X has also been retained), and proceed as in Mode 1.
Mode 3. Points not in the terminal corral (i.e., whose indices are not in S) are
deleted. Here we discard those points from P and alter appropriately the
identifying tags of S; Nr P is unchanged.
Mode 4. Points in the terminal corral are deleted. If there are many such points
it might be best to start the problem again from scratch. If not, then the existing
information can be utilized by replacing S by S* (those indices of S retained) and
w by w*, obtained from w by discarding the components in S\S* and
normalizing the result. The algorithm is then entered at Step 2 for Method C of
handling the equations; likewise for D, after deleting columns of R and restoring
triangular form by rotations.
Mode 5. The quantities (a)-(d) of Note 7, Section 4 are calculated and returned
for inspection of the errors.

10. Separating hyperplanes

As Theorem 2.1 shows, a nearest-point algorithm solves the dual problem of
finding a hyperplane separating a point from a given finite set. (Hereafter the word
"plane" will be used for "hyperplane".) Our procedure may be a fine way to
construct planes, but many proposals have been made for that problem, both
iterative, nonterminating methods (e.g. [2, 9]) and finite methods using linear or
quadratic programming [3, 8], and we have not attempted any comparisons; so
we will deal with the subject only briefly.

Recall the dual objective function (2.7) g(y) = Min_j y^T P_j. We take |y|* = 1. The
plane

G(y) = {x : y^T x = g(y)}

separates C(P) from the origin when g(y) ≥ 0, and strictly separates when
g(y) > 0. The distance from the origin to G(y) is

Min{|x| : y^T x = g(y)} = Min{|x| : y^T x ≥ g(y)}
  = Min{r : y^T xr ≥ g(y) for some |x| = 1}
  = Min{r : |y|* r ≥ g(y)} = g(y),

so the solution of the dual problem (2.9), Max{g(y) : |y|* ≤ 1}, gives the "best"
separation.
There are only two norms one would want to compute much with other than the
Euclidean:

$$|x|_1 = \sum_i |x_i| \qquad \text{and} \qquad |x|_\infty = \max_i |x_i|.$$

They are dual to each other (|·|₁* = |·|_∞ and |·|_∞* = |·|₁), and using either of them our
primal and dual problems can be stated as the same pair of dual linear
programming problems. Computationally, then, there is essentially one linear
programming problem to be solved, and the simplex method is the right way to do
that. Properly organized, the resulting algorithm looks much like that for the
Euclidean norm.
We suspect that a good measure of difficulty, in number of cycles required, for
the problem of finding a single (not the best) separating hyperplane, when one
exists, is D/d, where D is the diameter of the projection along Nr P of C(P) and
d = |Nr P|. In the experiments reported in Section 6, D was about 2(n - 1)^{1/2}; for
problems 1, 2, 3 (n = 20) such D/d has the values 2.1, 8.7, 1300; the actual numbers
k at which g(x^k) > 0 first happened were 1, 3, 33.
A common use of planes is in determining the separability of two point sets. We
follow Canon and Cullum [3] in reducing that problem to a nearest-point problem,
using these observations: (i) if two closed, compact convex sets C_1, C_2 are
disjoint, then points X ∈ C_1 and Y ∈ C_2 can be found such that |Y - X| is
minimal, and any plane normal to Y - X may be translated to separate the two
sets; (ii) such X, Y may be found as the resolution Z = Y - X of the point
Z = Nr[C_2 - C_1]. (Note that C_2 - C_1 is defined as {y - x : x ∈ C_1, y ∈ C_2}, and that
for any sets Q, R, C(Q) - C(R) = C(Q - R). This reduction of the two-set
problem to a one-set problem has been attributed to Caratheodory, but we have no
citation.)

Let Q and R be two finite point sets. It is required to determine whether their
convex hulls intersect and, if not, to find a well-separating hyperplane; that is, to
find a vector X and numbers a, b such that X^T Q_j ≥ b for all Q_j ∈ Q, X^T R_i ≤ a for
all R_i ∈ R, and a < b, with a and b well separated in the sense that (b - a)/|X| is
large.

Let g(y) = Min_j y^T Q_j - Max_i y^T R_i. If g(y) > 0, then y is the normal of a plane
separating Q from R, and the maximum of g for |y| ≤ 1 will give the best
separation. Since

$$g(y) = \min_{i,j} y^T(Q_j - R_i),$$

it is the dual objective for the problem Nr P, setting

P = {Q_j - R_i : all i, j}.

Write

$$\mathrm{Nr}\,P = \sum_{i,j} (Q_j - R_i) w_{ij}, \qquad w_{ij} \ge 0, \qquad \sum_{i,j} w_{ij} = 1.$$

It is easy to check that

$$X_Q = \sum_{i,j} Q_j w_{ij} \in C(Q), \qquad X_R = \sum_{i,j} R_i w_{ij} \in C(R)$$

are the points of C(Q), C(R) for which |X_Q - X_R| is minimized. X = Nr P =
X_Q - X_R is the common normal to the two planes G(X/|X|), G(-X/|X|), which
constitute the best separation of Q from R.
To apply the algorithm of this paper to the present problem efficiently one
avoids finding the set P explicitly, but works only with Q and R, as foreshadowed
above (Section 4, Note 8). For Step 0 of the algorithm of Section 4 we choose a
point of Q, find that point of R nearest it, then that point of Q nearest the chosen
point of R, and so on. When no new points arise, P_1 is set to the difference of the
last member of Q and the last member of R found. (This does not find Min|P_j|, but
that is a hard task.) Throughout, the set S is a set of index pairs (i, j); the first pair
identifies the members of Q and R just chosen. In Step 1(b) we find Q_j minimizing
X^T Q_j and R_i maximizing X^T R_i, forming the new member of P as Q_j - R_i. The rest
of the algorithm is essentially unchanged.
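A sketch of that Step 1(b) (ours), with Q and R as matrices of columns; the new member of P and its index pair come from one pass over each set:

```python
# Sketch: Step 1(b) for P = {Q_j - R_i}, without forming P explicitly.
import numpy as np

def new_member(X, Q, R):
    j = int((X @ Q).argmin())       # Q_j minimizing X^T Q_j
    i = int((X @ R).argmax())       # R_i maximizing X^T R_i
    return Q[:, j] - R[:, i], (i, j)
```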
When Method C of Section 5 is used, it is appropriate to retain those members
of P that have been constructed as long as they have positive weights w, and to
discard them when their weights vanish in 3(d). In Method D it is not necessary to
retain them at all once the inverse has been updated, so the storage requirements
are the same as for the problem Nr Q∪R. A few experiments support our intuition
that when Q is separable from R the number of steps to find a separating plane is
measured by D/d as above, and that the number to find the best plane is about the
total required by the separate problems Nr Q, Nr R.

11. Variants and extensions

In this section we point out the relationship of our algorithm to the simplex
method for linear programming and to the Frank-Wolfe method for convex
programming, and how it can be extended to various types of constraint set and
objective function.
The description of the algorithm in Section 3 parallels quite closely Dantzig's
"simplex interpretation of the simplex method" [4, Section 7.3], a visualization of
his procedure for the linear programming problem Min{c^T w : w ≥ 0, e^T w = 1,
Aw = b} in the space of the vectors P_j = (c_j; A_j), where {A_j} are the columns of
A. A "basis" for the linear programming problem corresponds to our corral, and a
"change of basis" to a complete major cycle containing just one minor cycle.
There are differences in detail, of course: unlike our problem, a basis must have n
points in order to define a hyperplane by means of which the solution can either be
improved or shown to be optimal; and in case of degeneracy a major cycle may
not reduce the objective, so special provision must be made to avoid repetition of
bases. Our experience has been that the two methods take similar numbers of
steps to solve problems of the same size.
The procedure can also be viewed as extending the Frank-Wolfe algorithm.
While its original presentation dealt with the minimization of a convex function
over a polyhedron (i.e., a set explicitly defined as the intersection of halfspaces), it
is best conceived as working with a polytope whose vertices are generated by the
solution of linear programming problems in case the constraint set is a polyhedron.
The present algorithm reduces to the Frank-Wolfe algorithm if the corral Q
is required to be no more than one-dimensional. (The effect of such a restriction
on the convergence rate is, of course, drastic.)

Gilbert [5] extended the Frank-Wolfe algorithm to the problem Min{|x| : x ∈ S},

where S is any convex set for which the "support" problem Min{y^T x : x ∈ S} is
readily solved, rather than just a polyhedron, and Barr [1] showed that relaxing
the restriction to one dimension considerably improves matters. Barr solves a new
quadratic programming problem each time a new point x ∈ S is generated; it
would seem that our algorithm should be used instead, with the support problem
replacing our Step 1.

The basic idea of our algorithm seems worth using for some variants of our
problem, although we have not had occasion to implement any of them. For a set
of vectors P = {P_1, ..., P_m} let

$$L(P) = \Big\{X : X = \sum_j P_j w_j\Big\} \tag{*}$$

be the linear subspace spanned by P and

K(P) = {X : (*) holds for some w ≥ 0}

be the convex cone spanned by P. An algorithm for finding that point of K(P)
closest to a given point R ∈ E^n closely parallels the algorithm above. (Here H(X)
denotes the hyperplane {x : (X - R)^T x = |X - R|²}.)

For Step 0, find the ray K(P_J) closest to R (as Max_j R^T P_j/|P_j|). Let X be the point
of the ray closest to R, and set Q = {P_J}. The rest of the algorithm is exactly as for
Nr P if "A(Q)" is replaced by "L(Q)" and "C(Q)" by "K(Q)" throughout. The
algebraic algorithm for this problem is, in fact, a bit simpler than that for Nr P, but
the numerical optimality criterion requires more thought. The resulting algorithm
is almost exactly that introduced by Wilhelmsen [11] in connection with a certain
approximation problem.
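For illustration, Step 0 of this cone variant is a one-pass computation (our sketch): pick the ray scoring highest on R^T P_j/|P_j|, then project R onto it.

```python
# Sketch of Step 0 for the problem of the point of K(P) nearest R.
import numpy as np

def cone_step0(P, R):
    scores = (R @ P) / np.linalg.norm(P, axis=0)   # R^T P_j / |P_j|
    J = int(scores.argmax())
    PJ = P[:, J]
    t = max(0.0, (R @ PJ) / (PJ @ PJ))             # projection onto the ray
    return t * PJ, J                               # the initial X and Q = {P_J}
```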
The most general problem of this kind is not much worse. Any polyhedron, the
intersection of finitely many halfspaces, has the form C(P¹) + K(P²) for some
finite sets P¹, P². In order to find the nearest point to R in that, we begin by doing
Step 0 for both the "C" problem (when P² is empty) and the "K" problem (when
P¹ is empty) and adding the results. In general one has Q¹ ⊆ P¹, Q² ⊆ P² (Q¹ ∪ Q², a
generalized corral, has at most n + 1 members), and performs Step 1 for either the
C problem or the K problem. The remaining steps are as in the C algorithm, with
A(Q) replaced by A(Q¹) + L(Q²) and C(Q) replaced by C(Q¹) + K(Q²). The
algebraic version of this procedure will be very little different from that of the C
algorithm, but again the numerical questions are important and unstudied.

When a polyhedron is presented as the intersection of halfspaces it is usually an
enormous task to determine P¹, P² of the preceding paragraph explicitly. That
could be avoided by proceeding in the spirit of the original Frank-Wolfe algorithm
(or of the "column generation" idea [13]): for each major cycle we require only
the solution of Min_j (X - R)^T P_j^i for i = 1 or i = 2, and such a solution will be the
outcome of minimizing (X - R)^T x over the polyhedron using the simplex method.
Such a procedure might compete with the usual quadratic programming algorithms
for problems of small dimension with many constraints, but we think
that the following ingenious reformulation, due to Witzgall [12], is the right way to
proceed. It rests on the fact [10, p. 55] that if K is a convex cone and K^p its dual
(or negative polar: K^p = {z : y^T z ≤ 0 for all y ∈ K}), x is any point, and y and z

are respectively the nearest points to x in K and K^p, then y and z are orthogonal
and x = y + z. In other words,

$$x = \big[x + \mathrm{Nr}(K - x)\big] + \big[x + \mathrm{Nr}(K^p - x)\big] \tag{**}$$

for any x. Now let R be the polyhedron in n-space defined by the inequalities
A_k^T x + b_k ≤ 0, k = 1, ..., K. Letting B_k be the (n + 1)-vector (A_k; b_k), define the
convex cone in (n + 1)-space

K = {(x; t) : B_k^T (x; t) ≤ 0, all k};

then R = {x : (x; 1) ∈ K}. If R is nonempty and (y; s) is the nearest point in K to
(0, ..., 0; 1), it is easy to show that 0 < s ≤ 1 and that y/s = Nr R. The cone K^p is
exactly the set of all nonnegative linear combinations of the B_k, so that our algorithm
for cones can be applied to find Nr(K^p - (0, ..., 0; 1)), and (**) immediately gives
the answer to the same problem for K, from which Nr R is determined.
The algorithm can also in principle be extended to the minimization of any
differentiable convex function f on a polytope. Step 1 of the algorithm is then the
task of minimizing ∇f(X)^T P_j, and the rest of the algorithm is the same provided
one has a means of finding a satisfactory approximation to the minimum of f over
any affine set A(Q). The required algebra is easily done when f is a strictly convex
quadratic function; the ensuing algorithm differs from that of Section 4 only in the
equations to be solved. The general scheme, in connection with generation of
vertices of the polytope when it is presented as a polyhedron, has been used in the
interesting work of Von Hohenbalken [7], who gives an algorithm, based on the
same ideas we have used, for the minimization of any pseudo-convex function
over a bounded polyhedron. He has also applied the method to the minimization
of a certain nondifferentiable convex function. It would be fascinating to know
under what conditions this kind of method is competitive with other schemes for
convex programming.

References

[1] R.O. Barr, "An efficient computational procedure for a generalized quadratic programming problem", SIAM Journal on Control 7 (1969) 415-429.
[2] L.C. Barbosa and E. Wong, "On a class of iterative algorithms for linear inequalities with applications to pattern classification", in: Proceedings of the first annual Princeton conference on information sciences and systems, 1967, pp. 86-89.
[3] M.D. Canon and C.D. Cullum, "The determination of optimum separating hyperplanes I. A finite step procedure", RC 2023, IBM Watson Research Center, Yorktown Heights, New York (February, 1968).
[4] G.B. Dantzig, Linear programming and extensions (Princeton University Press, Princeton, N.J., 1963).
[5] E.G. Gilbert, "An iterative procedure for computing the minimum of a quadratic form on a convex set", SIAM Journal on Control 4 (1966) 61-80.
[6] G.H. Golub and M.A. Saunders, "Linear least squares and quadratic programming", in: J. Abadie, ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) Ch. 10.
[7] B. von Hohenbalken, "A finite algorithm to maximize certain pseudoconcave functions on polytopes", Mathematical Programming 9 (1975) 189-206.
[8] O.L. Mangasarian, "Linear and nonlinear separation of patterns by linear programming", Operations Research 13 (1965) 444-452.
[9] B.F. Mitchell, V.F. Demyanov and V.N. Malozemov, "Finding the point of a polyhedron closest to the origin", SIAM Journal on Control 12 (1974) 19-26.
[10] J. Stoer and C. Witzgall, Convexity and optimization in finite dimensions I (Springer, Berlin, 1970).
[11] D.R. Wilhelmsen, "A linear algorithm for generating positive weight cubatures", Mathematics of Computation 30 (1976), to appear.
[12] C. Witzgall, correspondence with P. Wolfe (December, 1974).
[13] P. Wolfe, "Convergence theory in nonlinear programming", in: J. Abadie, ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) pp. 1-36.
[14] P. Wolfe, "Algorithm for a least-distance programming problem", Mathematical Programming Study 1 (1974) 190-205.
