FINDING THE NEAREST POINT IN A POLYTOPE
Philip WOLFE
IBM Thomas J. Watson Research Center, New York, U.S.A.
A terminating algorithm is developed for the problem of finding the point of smallest
Euclidean norm in the convex hull of a given finite point set in Euclidean n-space, or
equivalently for finding an "optimal" hyperplane separating a given point from a given finite
point set. Its efficiency and accuracy are investigated, and its extension to the separation of two
sets and other convex programming problems is described.
1. Introduction
We develop here a numerical algorithm for finding that point of a given polytope
in Euclidean n-space which has smallest Euclidean norm. A polytope is the
convex hull of a given point set P = {P_1, ..., P_m}; algebraically, we are required to
minimize |X|^2 = X^T X over all X of the form

    X = Sum_j w_j P_j,   Sum_j w_j = 1,   w_j >= 0 (j = 1, ..., m).        (1.1)

The point X is thus the nearest point to the origin in the polytope. Solving the
problem for P = {P_1 - Y, ..., P_m - Y} yields X + Y as the nearest point to Y; for
simplicity we keep Y = 0 throughout.
That problem, or its dual, arises in many contexts. We have encountered it in
the optimization of nondifferentiable functions, in approximation theory, and in
pattern recognition, and have felt the need for an efficient and reasonably
foolproof way of solving it. We offer what follows as an approximation to that
goal.
The problem is, of course, a problem of quadratic programming, for which there
are several excellent general algorithms; but the special nature of our problem lets
us improve on them. The central feature of our approach is that the representation
(1.1) of X is explicitly used, and the set of such X viewed as a polytope, while the
usual general algorithm concentrates on the constraints w >= 0, taking the descrip-
tion of the set as a polyhedron (the intersection of halfspaces) as fundamental.
The algorithm to be described has been trimmed to the present problem; we think
it improves on the general-purpose algorithms in computational effort, storage,
and roundoff error. Other procedures we have seen (e.g., [2, 9]) are convergent,
non-terminating methods, and cannot be properly compared with the present
scheme.
The next section presents some elementary notions required to set the problem.
2. Preliminaries
Let Q = {Q_1, ..., Q_k} be a finite subset of E^n, let A(Q) and C(Q) denote its
affine and convex hulls, and let e denote the vector of ones. Affine independence
of Q requires that the matrix

    [ 0    e^T   ]
    [ e    Q^T Q ]                                                         (2.2)

have rank k + 1, that is, be nonsingular. The sets A(Q) and C(Q) then have
dimension k - 1, and C(Q) is a nondegenerate simplex whose vertices are the
points of Q, while all the faces of dimension p of that simplex are the convex hulls
of all the subsets of p + 1 points of Q. The relative interior of C(Q) (relative to the
smallest affine set, A(Q), containing it) is just the set of points whose weights in Q
are all positive, and for any X in C(Q) there is a unique face having X in its
relative interior: its vertices are those points for which the weight of X in Q is
positive.
When Q is affinely independent, we can easily minimize |X| on A(Q), that is,
solve the problem

    Minimize |X|^2 = w^T Q^T Q w,
    subject to e^T w = 1.

Forming the Lagrangian w^T Q^T Q w + 2 lambda (e^T w - 1) and differentiating, we have the
necessary conditions

    e^T w = 1,
    e lambda + Q^T Q w = 0,                                                (2.3)

which have a unique solution owing to the nonsingularity of their matrix (2.2).
Since the minimand is convex these conditions are also sufficient; X = Qw
minimizes |X| on A(Q).
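As a concrete illustration (ours, not the paper's), the subproblem (2.3) is a single dense solve; a minimal NumPy sketch:

    import numpy as np

    def min_norm_affine(Q):
        """Minimize |Qw| subject to e^T w = 1 (Q is n-by-k with affinely
        independent columns) by solving the bordered system (2.3).
        Returns (X, w) with X = Qw the nearest point to the origin in A(Q)."""
        k = Q.shape[1]
        M = np.zeros((k + 1, k + 1))
        M[0, 1:] = 1.0                  # e^T
        M[1:, 0] = 1.0                  # e
        M[1:, 1:] = Q.T @ Q             # Q^T Q
        rhs = np.zeros(k + 1)
        rhs[0] = 1.0
        sol = np.linalg.solve(M, rhs)   # unique, since (2.2) is nonsingular
        w = sol[1:]
        return Q @ w, w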
Our algorithm will repeatedly produce such points X for selected sets Q. If it
happens that w >= 0, then of course X belongs to C(Q) and minimizes |X| there;
and if Q, a subset of P, has been suitably chosen, X may even minimize over C(P), and
solve the original problem. The test for that, given below, amounts to determining
whether the hyperplane H(X) = {z: X^T z = X^T X} separates P from the origin.
The second inequality of (2.5) of course holds for any X != 0. Theorem 2.2 states
just the useful part of a duality theorem for our problem, which we can state in
greater generality for the discussion in Section 10.
Let |.| denote any norm (a positive-homogeneous finite-valued convex function
vanishing only at the origin) on E^n, and |.|* its dual norm

    |y|* = Max{y^T x: |x| <= 1}.
The problems

    Min{|x|: x in C(P)},                                                   (2.8)
    Max{g(y): |y|* <= 1}                                                   (2.9)

are dual:
(i) g(y) <= |x| for all x in C(P), |y|* <= 1;
(ii) the extrema (2.8), (2.9) are equal;
(iii) among the solutions of the two problems are found all the saddlepoints of
y^T x, that is, all pairs x̄, ȳ such that y^T x̄ <= ȳ^T x̄ <= ȳ^T x for all x in C(P), |y|* <= 1.
(This duality is well known in approximation theory, but its first appearance
eludes us.)
For the Euclidean norm we have the simplification that |y|* = |y|, and (2.5)
follows from the fact that the saddlepoint (iii) is given by x̄ = Nr P and ȳ = x̄/|x̄| if
Nr P != 0, and by x̄ = ȳ = 0 otherwise.
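To make the Euclidean saddlepoint explicit (our verification, not spelled out in the text): for any |y| <= 1 the Cauchy-Schwarz inequality gives

    y^T x̄ <= |y| |x̄| <= |x̄| = ȳ^T x̄,

while the optimality of x̄ = Nr P in the convex set C(P) gives |x̄ + t(x - x̄)|^2 >= |x̄|^2 for any x in C(P) and 0 <= t <= 1, whence, letting t tend to 0 from above, x̄^T (x - x̄) >= 0, that is,

    ȳ^T x̄ <= ȳ^T x for all x in C(P).

Together these are the saddlepoint inequalities of (iii).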
3. The algorithm

Each minor cycle of the algorithm works with some affinely independent set Q
and some point X in C(Q); it removes one selected point from Q and alters X.
Minor cycles are repeated until a corral is found and X = Nr Q, terminating the
major cycle.
The initial corral is the singleton given by Step 0 below. Subsequently each
major cycle begins at Step 1; the minor cycles, if any, in a major cycle constitute
repetitions of Steps 3 and 2. Following the algorithm here we show how it works
on the example of Fig. 1.
Fig. 1. Example, with P_1 = (0, 2) and P_3 = (3, 0).
Step 0. Find a point of P of minimal norm. Let X be that point and Q = {X}.
Step 1. If X = 0 or if H(X) separates P from the origin, stop. Otherwise
choose P_J in P on the near side of H(X) and replace Q by Q u {P_J}.
Step 2. Let Y be the point of smallest norm in A(Q). If Y is in the relative
interior of C(Q), replace X by Y and return to Step 1. Otherwise
Step 3. Let Z be the nearest point to Y on the line segment C(Q) n XY (thus a
boundary point of C(Q)). Delete from Q one of the points not on the face of
C(Q) in which Z lies, and replace X by Z. Go to Step 2.
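For readers who want to experiment, the four steps translate into a short NumPy sketch (ours, not the paper's APL implementation). The tolerance TOL is a hypothetical stand-in for the Z1-Z3 of Section 4, and the Step 2 subproblem is solved afresh from (2.3) rather than updated as in Section 5:

    import numpy as np

    TOL = 1e-10  # hypothetical stand-in for the tolerances Z1-Z3

    def nearest_point(P):
        """Nearest point to the origin in the convex hull of the columns of
        the n-by-m array P, following Steps 0-3."""
        B2 = max((P * P).sum(axis=0))                # B^2 = max_j |P_j|^2
        S = [int(np.argmin((P * P).sum(axis=0)))]    # Step 0
        w = np.array([1.0])
        X = P[:, S] @ w
        while True:
            # Step 1: stop if H(X) separates P from the origin (rule 1(c))
            if np.min(X @ P) > X @ X - TOL * B2:
                return X, S, w
            S.append(int(np.argmin(X @ P)))
            w = np.append(w, 0.0)
            while True:
                # Step 2: Y = nearest point of A(Q), weights v, via (2.3)
                Q = P[:, S]
                k = len(S)
                M = np.block([[np.zeros((1, 1)), np.ones((1, k))],
                              [np.ones((k, 1)), Q.T @ Q]])
                v = np.linalg.solve(M, np.eye(k + 1)[:, 0])[1:]
                if np.all(v > TOL):                  # Y interior to C(Q)
                    w = v
                    X = Q @ w
                    break                            # back to Step 1
                # Step 3: Z = nearest point to Y on the segment C(Q) n XY
                pos = w - v > TOL
                if np.any(pos):
                    theta = min(1.0, float(np.min(w[pos] / (w[pos] - v[pos]))))
                else:
                    theta = 1.0
                w = theta * v + (1.0 - theta) * w
                # rule 3(d): drop a point whose weight has vanished
                drop = int(np.argmin(np.where(w <= TOL, w, np.inf)))
                S.pop(drop)
                w = np.delete(w, drop)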
Fig. 1 and Table 1 illustrate the algorithm in a simple problem, giving the current
Q, X, and Y at the end of each step.
We must show that the algorithm terminates in a solution of the problem. First,
Table 1
Solution of the example

Step   X      Q                 Y
0      P_1    P_1
1      P_1    P_1, P_2
2      R      do.               R
1      R      P_1, P_2, P_3
2      R      do.               0
3      S      P_2, P_3
2      T      do.               T
1      STOP
Notes. 1. This rule for choosing J is the handiest, and keeps the number of cycles
low, as well as making approximate affine dependence of P_J on P[S] unlikely. For
large-scale problems there are likely better rules: see Section 7.
2. The term subtracted from X^T X is intended to make due allowance for
roundoff error in the whole procedure. When Z1 = 0 we have the optimality test of
Theorem 2.1, Section 2. Calculating with sixteen-digit precision we use the generous
value Z1 = 10^-12. Without some such provision, the optimality test might never be
passed.
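In code, the tolerated stop rule might read as follows (our sketch; B is the bound on the |P_j| used in Section 8):

    import numpy as np

    def stop_rule(X, P, Z1=1e-12):
        """Tolerated optimality test of 1(c): treat H(X) as separating P from
        the origin when no P_j lies more than Z1*B^2 on the near side."""
        B2 = max((P * P).sum(axis=0))      # B^2, with B >= max_j |P_j|
        return np.min(X @ P) >= X @ X - Z1 * B2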
At the stop in 1(c), X is Nr P, P[S] is the corral which holds it, and w its
barycentric coordinates in P[S].
3. Stopping in 1(d) signals temporary disaster: X is so inaccurate that the
relation X^T P_j = X^T X, which should hold for all j in S, fails to more than the degree
measured by Z1. We have encountered this when P contained some nearly
identical points, and generally find then that the stop 1(c) would have occurred if a
modestly larger value of Z1 had been used. That is a happy ending; see Section 8
regarding the effect of Z1 on the final answer.
Some work would be saved by using only j not in S in 1(b) and omitting this step,
but then some other kind of guard should be kept on the affine independence of Q.
The check is also a fine trap for programming errors.
4. Another precaution. We use Z2 = 10^-10.
5. We use Z3 = 10^-10.
6. The value of theta given by 3(b) is the smaller of 1 and the largest value for
which all components of theta v + (1 - theta) w are nonnegative. If theta < 1, then at least one
of those components will vanish. The replacement of 3(d) makes that vanishing
decisive. We use the tolerance Z2 here for consistency with Step 2(b).
7. Answers to well-formed optimization problems are subject to checks which
are usually so easy it would be a sin to forgo them. Necessary and sufficient (and
redundant) conditions that X, S, and w >= 0 solve our problem are the vanishing of
these four numbers:
(a) 1 - e^T w,
(b) |X - P[S]w|,
(c) Max_{j in S} |X^T P_j - X^T X|,
(d) Min_j X^T P_j - X^T X
(whose connection with the accuracy of the final solution is discussed in Section
8). If they do not approximate zero to a plausible degree, the algorithm has
blundered. A possible fix would be to repeat the calculation, using instead of the
matrix P just P[S], and perform the tests again. We have had no experience with
unsatisfactory performance when using the methods recommended in the next
section.
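These checks are a few lines in any matrix language; a sketch (ours) in NumPy:

    import numpy as np

    def solution_checks(X, S, w, P):
        """The four residuals (a)-(d) of Note 7; all should nearly vanish."""
        PS = P[:, S]
        return (1.0 - w.sum(),                          # (a) 1 - e^T w
                float(np.linalg.norm(X - PS @ w)),      # (b) |X - P[S]w|
                float(np.max(np.abs(X @ PS - X @ X))),  # (c) over j in S
                float(np.min(X @ P) - X @ X))           # (d) over all j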
8. A sometimes most useful feature of the algorithm is that after Step 0 the
point set P is consulted only in Step 1(b), and that step requires only one point P_J
achieving Min_j{X^T P_j}. The set P may well be presented in some form other than
that of a simple list, yet permit the step to be executed efficiently. An important
example is the separation of two point sets treated in Section 9, where the
members of P are constructed only as needed.
5. Solving the equations
We have worked with the four ways A-D below of handling the equations (4.1),
and concluded that D is the best. Let s be the number of indices of S, that is, the
number of points in the current corral. By "operation" below we mean the
execution of one floating multiply and one floating add. In making estimates, we
give only the leading term of the polynomial constituting the accurate value.
Storage estimates assume that symmetric matrices are stored without redundancy.
Method A. Maintain in storage the inverse E of the matrix of equations (4.1),
modifying it when S changes. When a column [c; d] is adjoined, the new inverse is
the bordered matrix

    [ E + YY^T/t    -Y/t ]
    [ -Y^T/t         1/t ]

(with Y = Ec and t = d - c^T Ec), whose equivalence with (4.1) is easily checked.
The order of the system (5.1) is one less than that of (4.1), and the normalization
e^T v = 1 is almost perfectly achieved.
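The bordered update can be checked numerically; a small sketch (ours, in NumPy), with M the current matrix, c the adjoined column, and d the adjoined diagonal entry:

    import numpy as np

    def border_inverse(E, c, d):
        """Bordered-inverse update of Method A: given E = inv(M), return
        inv([[M, c], [c^T, d]]) using Y = E c and t = d - c^T Y."""
        Y = E @ c
        t = d - c @ Y
        top = np.hstack([E + np.outer(Y, Y) / t, (-Y / t)[:, None]])
        bot = np.append(-Y / t, 1.0 / t)
        return np.vstack([top, bot])

    # check against direct inversion on a random symmetric matrix
    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4)); M = M + M.T + 8.0 * np.eye(4)
    c = rng.standard_normal(4); d = 10.0
    Mn = np.block([[M, c[:, None]], [c[None, :], np.array([[d]])]])
    assert np.allclose(border_inverse(np.linalg.inv(M), c, d), np.linalg.inv(Mn))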
Method C. The system (5.1) constitutes the normal equations A^T A u = A^T b for
the least-squares solution of the equations Au = b, taking

    A = [ e^T  ]
        [ P[S] ],

and the solution of (5.1) is the result of solving the two triangular systems

    R^T ū = e,
    R u = ū,                                                               (5.3)

where R is the upper triangular factor given by (5.2), R^T R = A^T A.
Method D. R must be altered whenever S is. The required work changes the algorithm of
Section 4 as follows:
(i) Add to Step 0: Let R be the 1-by-1 matrix [(1 + |P_J|^2)^{1/2}].
(ii) Add to Step 1:
(f) Solve for r the system

    R^T r = e + P[S]^T P_J.                                                (5.4)

(g) Adjoin to R, on the right, the column [r, rho]^T, where

    rho = (1 + P_J^T P_J - r^T r)^{1/2}.

(iii) Delete "Go to Step 2" from 3(e), and add to Step 3:
(f) Let I be the position of the component deleted in (e); delete the I-th
column of R.
(g) If I exceeds the number of columns of (the new) R, go to Step 2.
Otherwise let

    a = R_{I,I},  b = R_{I+1,I},  c = (a^2 + b^2)^{1/2};
It is easy to check that (ii) above and (f) of (iii) maintain the relation (5.2). Step
3(g) uses plane rotations to restore R to upper triangular form after 3(f) has
damaged it.
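Concretely (our sketch, not the paper's implementation): deleting a column of the upper triangular R leaves one subdiagonal entry in each later column, and the rotation built from a, b, c above clears it:

    import numpy as np

    def delete_column(R, I):
        """Step 3(f)-(g): delete column I (0-based) of upper triangular R and
        restore triangularity with plane (Givens) rotations."""
        R = np.delete(R, I, axis=1)
        for k in range(I, R.shape[1]):
            a, b = R[k, k], R[k + 1, k]
            c = np.hypot(a, b)                   # c = (a^2 + b^2)^{1/2}
            G = np.array([[a / c, b / c], [-b / c, a / c]])
            R[k:k + 2, k:] = G @ R[k:k + 2, k:]  # zero the subdiagonal entry
        return R[:-1, :]                         # last row is now zero

The loop touches columns I through s - 1, matching the 2(s - I)^2 operation count below.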
The storage needed for this scheme is the same as for A above. Solving a
triangular system of s equations takes s^2/2 operations, so (5.3) takes s^2. Equations
(5.4) take s^2/2, while the rotations of 3(g) take 2(s - I)^2, which has the expected
value 2s^2/3 if I is randomly, uniformly chosen. Thus when S is increased
(decreased), 3s^2/2 (5s^2/3) operations are done, for an average of 19s^2/12 in the
long run. We find that with this procedure there is virtually no accumulation of roundoff
error, unlike A (see Section 8). Since the difference of order s^2 in the average operation
counts of the two methods is swamped by the mn operations taken by Step 1(b) in
the typical problem, we prefer D.
6. Experience
Although the amount of computational work done in a single step of the algorithm
is well determined (see Section 5), we do not know how to estimate the number of
iterations required. That number is of course bounded by the number of
possible corrals in P, which is in turn bounded by the number of subsets of P
having no more than n + 1 elements, a calculable number; but the result is
preposterously large. As with linear programming, we must resort to experiment
to determine when the method is practical.
We have tested the algorithm on four types of problem. All of them start with a
set P^0 of m points chosen at random, uniformly distributed over an n-cube of side
2 centered at the origin. (Draw mn integers at random without replacement from
1, 2, ..., 10^4, divide by 5000 and subtract 1.)
Type 0: P = P^0.
Type 1: The point X^0 is chosen randomly from 2P^0, and P_j = X^0 + P_j^0 for all j.
Type 2: P^0 is compressed by 10^-3, and then displaced one unit, along the x_1
axis:

    P = {(1 + 10^-3 x_1, x_2, ..., x_n): (x_1, x_2, ..., x_n) in P^0}.

Type 3: Like Type 2, but displaced by 0.01 instead:

    P = {(10^-2 + 10^-3 x_1, x_2, ..., x_n): (x_1, ..., x_n) in P^0}.
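A sketch (ours) of this test-problem generator:

    import numpy as np

    def test_problem(m, n, ptype, seed=0):
        """Random test sets of the four types (assumes m*n <= 10**4)."""
        rng = np.random.default_rng(seed)
        draws = rng.choice(np.arange(1, 10**4 + 1), size=m * n, replace=False)
        P0 = draws.reshape(n, m) / 5000.0 - 1.0    # uniform on the cube
        if ptype == 0:                             # Type 0
            return P0
        if ptype == 1:                             # Type 1: translate by X^0
            X0 = 2.0 * P0[:, rng.integers(m)]
            return X0[:, None] + P0
        P = P0.copy()                              # Types 2 and 3
        P[0, :] = 1e-3 * P[0, :] + (1.0 if ptype == 2 else 1e-2)
        return P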
For a problem of Type 0, Nr P = 0 is almost certain, while for Type 1, Nr P != 0
is. Both problems are very easy: Type 0 since the origin is well-centered in P,
Type 1 since the origin is likely to be near a "corner" of P and require only a few
points of P to determine the solution. When m > n a Type 0 problem almost
always terminates in n + 1 major, and no minor, cycles: the first corral of
dimension n + 1 constructed contains the origin. Table 6.1 gives the number of
major and minor cycles for Type 1 problems of various size. In each case, the
number is the average for a sample of ten problems.
Table 6.2 gives the same data for Type 3 problems. We omit the corresponding
data for Type 2, since they can be generally described as requiring about 20%
fewer major cycles than do Type 3, and like Type 3 generally have terminal
corrals of maximal size.
Note that the sum of the number of major and minor cycles is the total number
of times a system of equations is solved. The difference of those numbers is the
number of points in the terminal corral. The number of minor cycles can be
viewed as a count of the "mistakes" made by the algorithm: the number of points
selected which do not wind up in the terminal corral. When that number is zero,
the amount of work the algorithm does in equation-solving in the updating
versions A, B, D is simply what would be required to determine the weights w if the
terminal corral were known in advance. The results tend to support our hunch that
for large m the number of cycles increases roughly as log m.
The graphs in Figs. 6.1-6.3 show the convergence of |X^k| and the dual objective
g(X^k) (see Section 2) toward their common value d = |Nr P|, as a function of the
major cycle number k, for three problems having 80 points in E^20. Each of the
problems 0, 2, 3 is of the corresponding type, built from the same P^0. The numbers
of major cycles to termination were respectively 21, 41, 54, and minor cycles
0, 21, 34. The calculations were done using Method C of the previous section in an
APL\360 routine on an IBM 360 Model 91. The CPU times required were 1.70,
4.63, and 6.63 seconds.
It appears that the convergence of |X^k| to d is roughly linear, as is that of g(X^k)
when d > 0. (Note that the plotted points are subject to a discretization error of
the width of one dot, and that since the terminal values of |.| and g were within
10^-15 of d, those points are omitted from the logarithmic graphs.)
Table 6.1.
Cycles for Type 1 problems (major above minor)
Table 6.2.
Cycles for Type 3 problems
I-" Ixkl
0
o%
............... k 2F
0 "-.... k
-I ~. ~°=,,...." ......
. g(x k) .... • -... 10g,0[Ix"l-,J]
-2 , I , i , I
I0 20 30 0 I0 20
Fig. 6.1. Solving P r o b l e m 0.
-I °°
• iOq,o[txkl_d]
2.5 -2 %°
°°ooo°°°ooo•
OO°o.o.
2.C -3
..°o
I.~ . Ixkd °
i I I "~
I.E "°-,.o,.•.,IlIIIIlIIIII|IlU ..........
I0 20 50 40
(10
°,.o
-3
-,d.I q I h t i I i I
I0 20 50 4O
Fig. 6.2. Solving P r o b l e m 2.
2.0 0
-2
"'.. ....... .. IOq,o[Ix'l-o]
r.5
"%'•°*•°'°°•'o••.•.,°°•••O.oo,,,l,
I.O °°
[xk(
r I i r i I r I i r i I
0.5 •°
Oo
0 I0 20 30 40 50 60
(10 =====================;:-~; ..... k
,°..•'.~ °•°'•
-0.5
. .'... ~y~xk~
"~
"• lOglO[d_g(xk)]
-I.0 , ,
-0.5
oO•
-2
_ j...".'... :.'...._._
~g( •°•
i I i i p i i I i I -3 x):O .• •
-2.00 I0 20 30 40 50
0 ro 20 30 40 50 60
Fig. 6.3. Solving P r o b l e m 3.
7. Possible improvements
For a problem having a very large number m of points, almost all the computing
time this algorithm requires will be taken in finding the new point P_J in Step 1(b).
There are several possibilities for changing the selection criterion given in Section
4 to improve matters.
The first possibility considered is that the criterion used could be improved so
that the number of major cycles might be reduced. Step 1(b) as given chooses P_J
as that point on the same side of the separating hyperplane X^T x = X^T X as the
origin which has greatest distance from the hyperplane. We could, instead, either
(i) determine how much reduction in |X| would ensue in adjoining any P_i, (ii)
estimate the latter quantity by finding the distance from the origin to the segment
XP_i for all P_i, or more crudely (iii) choose P_i so as to maximize the angle between
X and P_i - X. We deem (i) as too hard to be worthwhile, (ii) as interesting, and (iii)
as easy enough to try out quickly, which is what we have done. The results, even
for problems with n = 5, m = 100, were uniformly negative: there was a reduction
of a few percent in the number of major cycles, but the extra calculation required
almost doubled the total CPU time. A cheap approximation to this is (iv): Choose
J to minimize X^T (P_j - X)/|P_j|, having calculated |P_j| once and for all. The amount
of work is about the same as for the original method, but so is the number of
iterations, on the average, for our problems.
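The competing rules are one-liners; a sketch (ours) of the original rule 1(b) and of variant (iv), with Pnorm = (|P_1|, ..., |P_m|) computed once, e.g. np.linalg.norm(P, axis=0):

    import numpy as np

    def choose_j_original(X, P):
        """Step 1(b): the point deepest on the origin's side of X^T x = X^T X."""
        return int(np.argmin(X @ P))

    def choose_j_variant_iv(X, P, Pnorm):
        """Variant (iv): minimize X^T (P_j - X) / |P_j|."""
        return int(np.argmin((X @ P - X @ X) / Pnorm))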
We have not yet tried out the other two ideas worth mentioning, although their
analogues have proved successful for linear programming problems with very
large numbers of variables.
"Cyclic pricing": Only a portion of the points of P are examined before
choosing Pj and performing the rest of the algorithm; then another portion is
treated the same way; and so on, beginning again when all of P has been
examined. Ultimately no candidate will be found in P, and we are done.
"Suboptimization": A small number of points of P are chosen after examining
all of them - for instance, the points having the ten lowest values for x T p i -- and
the problem is completely solved on that subset; and this is done repeatedly.
Both procedures above are almost certain to increase the number of major
cycles required by the algorithm, but hold promise of greatly decreasing the total
work. At this time we have no more specific recommendation as to how to carry
them out.
8. Accuracy
In most applications not much accuracy is required of the solution of our problem;
but the answer obtained is quite accurate anyway. Here we show that the problem
of finding Nr P is well-posed, and examine the round-off error in its solution.
The problem is well-posed if small changes in its data lead to small changes in its
answer - i.e. if Nr P is continuous in P. That is the content of the Theorem below,
due to G.W. Stewart.
Fig. 8.1. Perturbation of P.
    e_a = |1 - e^T w|,
    e_b = |X - P[S]w| / B,
    e_c = Max_{j in S} |X^T P_j - X^T X| / B|X|,                           (8.3)
    e_d = |Min_j X^T P_j - X^T X| / B|X|.

Define

    P̂_j = P_j e^T w + X - Pw - X a_j,  where  a_j = X^T (P_j e^T w - Pw) / X^T X,

for all j in T, and P̂_j = P_j for all other j. (The required P̂ is readily deduced by
altering w, and then P, so as to make the quantities of (8.3) vanish, in the order of
their writing.) Let ŵ = w / e^T w. Then ŵ > 0, e^T ŵ = 1, Sum_j ŵ_j a_j = 0, X = P̂[S]ŵ, and
X^T P̂_j - X^T X = 0 for j in T are all easily checked, showing that X = Nr P̂.
Now for j in T,

    |X|^2 a_j = X^T P_j (e^T w - 1) + X^T (P_j - X) + X^T (X - Pw),
    P̂_j - P_j = P_j (e^T w - 1) + X - Pw - X a_j,

so

    |P̂_j - P_j| <= |P_j| e_a + B e_b + |X| |a_j|,

from which the conclusion follows.
In practice the errors e_a, e_b are trifling compared with e, and e_c is small if the
equations of the problem are well handled, so that e_a = e_b = 0, e = e_d closely
represents the normal situation. If the computation for the stop rule 1(c) has been
done with reasonable accuracy we should have Min_j X^T P_j - X^T X <= Z1 B^2, so that
e <= Z1 B / |X|. By Theorem 8.2 the hypotheses of Theorem 8.1 hold with delta =
Z1 B^2 / |X|; we conclude a corresponding bound on the distance of X from the
exact solution X* of the problem.
Table 8.1. e_c and e_d, by problem, for Methods A and D.
Problem 3 did not terminate properly with Method A. At major cycle 48 the
algorithm was abandoned when it calculated a negative pivot t. (Mathematically,
t > 0; t < 0 is disaster, since even the less inaccurate value t = 0 destroys the logic of
the algorithm. We might try to recalculate a better inverse, but there is no
guarantee that we could do so using the Method A scheme.) At that point B|X|e_c
was 1.9 x 10^-5, a bad value.
Fig. 8.2 plots log_10(B|X|e_c) for the runs of Table 8.1. Method D seems to entail
no growth of roundoff error, while Method A, probably in its minor cycles,
undergoes exponential growth after a series of good cycles mostly having no
minor cycles.
Fig. 8.2. log_10(B|X|e_c) against the major cycle number: Method A, Problem 2;
Method A, Problem 3 (abandoned); Method D, Problem 2; Method D, Problem 3.
9. Convenience features
so the solution of the dual problem (2.9), Max{g(y): |y|* <= 1}, gives the "best"
separation.
There are only two norms one would want to compute much with other than the
Euclidean:

    |x|_1 = Sum_i |x_i|,   |x|_inf = Max_i |x_i|.

They are dual to each other (|.|_1* = |.|_inf), and using either of them our
primal and dual problems can be stated as the same pair of dual linear
avoids finding the set P explicitly, but works only with Q and R, as foreshadowed
above (Section 4, Note 8). For Step 0 of the algorithm of Section 4 we choose a
point of Q, find that point of R nearest it, then that point of Q nearest the chosen
point of R, and so on. When no new points arise, P_1 is set to the difference of the
last member of Q and the last member of R found. (This does not find Min |P_j|, but
that is a hard task.) Throughout, the set S is a set of index-pairs (i, j); the first pair
identifies the members of Q and R just chosen. In Step 1(b) we find Q_j minimizing
X^T Q_j and R_i maximizing X^T R_i, forming the new member of P as Q_j - R_i. The rest
of the algorithm is essentially unchanged.
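In code (our sketch), Step 1(b) for the implicit set P = {Q_j - R_i} decomposes into two independent searches over the n-by-|Q| and n-by-|R| arrays Q and R:

    import numpy as np

    def new_point_pair(X, Q, R):
        """Find Q_j minimizing X^T Q_j and R_i maximizing X^T R_i; then
        Q_j - R_i minimizes X^T p over all implicit points p = Q_j - R_i."""
        j = int(np.argmin(X @ Q))
        i = int(np.argmax(X @ R))
        return (i, j), Q[:, j] - R[:, i]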
When Method C of Section 5 is used, it is appropriate to retain those members
of P that have been constructed as long as they have positive weights w, and to
discard them when their weights vanish in 3(d). In Method D it is not necessary to
retain them at all once the inverse has been updated, so the storage requirements
are the same as for the problem Nr(Q u R). A few experiments support our intuition
that when Q is separable from R the number of steps to find a separating plane is
measured by D/d as above, and that the number to find the best plane is about the
total required by the separate problems Nr Q, Nr R.
In this section we point out the relationship of our algorithm to the simplex
method for linear programming and to the Frank-Wolfe method for convex
programming, and how it can be extended to various types of constraint set and
objective function.
The description of the algorithm in Section 3 parallels quite closely Dantzig's
"Simplex interpretation of the simplex method" [4, Section 7.3], a visualization of
his procedure for the linear programming problem Min{c^T w: w >= 0, e^T w = 1,
Aw = b} in the space of the vectors P_j = [c_j; A_j], where {A_j} are the columns of
A. A "basis" for the linear programming problem corresponds to our corral, and a
"change of basis" to a complete major cycle containing just one minor cycle.
There are differences in detail, of course: unlike our problem, a basis must have n
points in order to define a hyperplane by means of which the solution can either be
improved or shown to be optimal; and in case of degeneracy a major cycle may
not reduce the objective, so special provision must be made to avoid repetition of
bases. Our experience has been that the two methods take similar numbers of
steps to solve problems of the same size.
The procedure can also be viewed as extending the Frank-Wolfe algorithm.
While its original presentation dealt with the minimization of a convex function
over a polyhedron (i.e., a set explicitly defined as the intersection of halfspaces), it
is best conceived as working with a polytope whose vertices are generated by the
solution of linear programming problems in case the constraint set is a polyhed-
ron. The present algorithm reduces to the Frank-Wolfe algorithm if the corral Q
is required to be no more than one-dimensional. (The effect of such a restriction
on the convergence rate is, of course, drastic.)
Gilbert [5] extended the Frank-Wolfe algorithm to the problem Min{|x|: x in S},
where S is any convex set for which the "support" problem Min{y^T x: x in S} is
readily solved, rather than just a polyhedron, and Barr [1] showed that relaxing
the restriction to one dimension considerably improves matters. Barr solves a new
quadratic programming problem each time a new point x E S is generated; it
would seem that our algorithm should be used instead, with the support problem
replacing our Step 1.
The basic idea of our algorithm seems worth using for some variants of our
problem, although we have not had occasion to implement any of them. For a set
of vectors P = {P_1, ..., P_m} let
References
[1] R.O. Barr, "An efficient computational procedure for a generalized quadratic programming
problem", SIAM Journal on Control 7 (1969) 415-429.
[2] L.C. Barbosa and E. Wong, "On a class of iterative algorithms for linear inequalities with
applications to pattern classification", in: Proceedings of the first annual Princeton conference on
information sciences and systems, 1967, pp. 86-89.
[3] M.D. Canon and C.D. Cullum, "The determination of optimum separating hyperplanes I. A finite
step procedure", RC 2023, IBM Watson Research Center, Yorktown Heights, New York
(February, 1968).
[4] G.B. Dantzig, Linear programming and extensions (Princeton University Press, Princeton, N.J.
1963).
[5] E.G. Gilbert, "An iterative procedure for computing the minimum of a quadratic form on a
convex set", SIAM Journal on Control 4 (1966) 61-80.
[6] G.H. Golub and M.A. Saunders, "Linear least squares and quadratic programming", in: J. Abadie,
ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) Ch. 10.
[7] B. von Hohenbalken, "A finite algorithm to maximize certain pseudoconcave functions on
polytopes", Mathematical Programming 9 (1975) 189-206.
[8] O.L. Mangasarian, "Linear and nonlinear separation of patterns by linear programming",
Operations Research 13 (1965) 444-452.
[9] B.F. Mitchell, V.F. Demyanov and V.N. Malozemov, "Finding the point of a polyhedron closest
to the origin", SIAM Journal on Control 12 (1974) 19-26.
[10] J. Stoer and C. Witzgall, Convexity and optimization in finite dimensions I (Springer, Berlin,
1970).
[11] D.R. Wilhelmsen, "A linear algorithm for generating positive weight cubatures", Mathematics of
Computation 30 (1976) to appear.
[12] C. Witzgall, correspondence with P. Wolfe (December, 1974).
[13] P. Wolfe, "Convergence theory in nonlinear programming", in: J. Abadie, ed., Integer and
nonlinear programming (North-Holland, Amsterdam, 1970) pp. 1-36.
[14] P. Wolfe, "Algorithm for a least-distance programming problem", Mathematical Programming
Study 1 (1974) 190-205.