A Scaling-Invariant Algorithm for Linear Programming Whose Running Time Depends Only on the Constraint Matrix

Daniel Dadush, Sophie Huiberts, Bento Natura, and László A. Végh∗

∗ Supported by the ERC Starting Grants ScaleOpt–757481 and QIP–805241.

ABSTRACT
Following the breakthrough work of Tardos (Oper. Res. ’86) in the bit-complexity model, Vavasis and Ye (Math. Prog. ’96) gave the first exact algorithm for linear programming in the real model of computation with running time depending only on the constraint matrix. For solving a linear program (LP) max c⊤x, Ax = b, x ≥ 0, A ∈ R^{m×n}, Vavasis and Ye developed a primal-dual interior point method using a ‘layered least squares’ (LLS) step, and showed that O(n^3.5 log(χ̄_A + n)) iterations suffice to solve (LP) exactly, where χ̄_A is a condition measure controlling the size of solutions to linear systems related to A.

Monteiro and Tsuchiya (SIAM J. Optim. ’03), noting that the central path is invariant under rescalings of the columns of A and c, asked whether there exists an LP algorithm depending instead on the measure χ̄*_A, defined as the minimum χ̄_{AD} value achievable by a column rescaling AD of A, and gave strong evidence that this should be the case. We resolve this open question affirmatively.

Our first main contribution is an O(m²n² + n³) time algorithm which works on the linear matroid of A to compute a nearly optimal diagonal rescaling D satisfying χ̄_{AD} ≤ n(χ̄*)³. This algorithm also allows us to approximate the value of χ̄_A up to a factor n(χ̄*)². This result is in (surprising) contrast to that of Tunçel (Math. Prog. ’99), who showed NP-hardness for approximating χ̄_A to within 2^{poly(rank(A))}. The key insight for our algorithm is to work with ratios g_i/g_j of circuits of A—i.e., minimal linear dependencies Ag = 0—which allow us to approximate the value of χ̄*_A by a maximum geometric mean cycle computation in what we call the ‘circuit ratio digraph’ of A.

While this resolves Monteiro and Tsuchiya’s question by appropriate preprocessing, it falls short of providing either a truly scaling invariant algorithm or an improvement upon the base LLS analysis. In this vein, as our second main contribution we develop a scaling invariant LLS algorithm, which uses and dynamically maintains improving estimates of the circuit ratio digraph, together with a refined potential function based analysis for LLS algorithms in general. With this analysis, we derive an improved O(n^2.5 log n log(χ̄*_A + n)) iteration bound for optimally solving (LP) using our algorithm. The same argument also yields a factor n/log n improvement on the iteration complexity bound of the original Vavasis-Ye algorithm.

CCS CONCEPTS
• Theory of computation → Linear programming.

KEYWORDS
Linear programming, interior point methods, condition number, chi bar, circuits, linear matroids

ACM Reference Format:
Daniel Dadush, Sophie Huiberts, Bento Natura, and László A. Végh. 2020. A Scaling-Invariant Algorithm for Linear Programming Whose Running Time Depends Only on the Constraint Matrix. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC ’20), June 22–26, 2020, Chicago, IL, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3357713.3384326

1 INTRODUCTION
The linear programming (LP) problem in primal-dual form is to solve

min c⊤x        max y⊤b
Ax = b         A⊤y + s = c        (LP)
x ≥ 0,         s ≥ 0,

where A ∈ R^{m×n}, rank(A) = m ≤ n, b ∈ R^m, c ∈ R^n are given in the input, and x, s ∈ R^n, y ∈ R^m are the variables. We consider the program in x to be the primal problem and the program in y, s to be the dual problem.
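To make the primal-dual pair concrete, the following small sketch—our own illustration, not part of the paper—solves both programs of (LP) for an arbitrary tiny instance and numerically checks strong duality and complementary slackness.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny instance of (LP): min c^T x s.t. Ax = b, x >= 0, together with its
# dual max b^T y s.t. A^T y + s = c, s >= 0. The instance is arbitrary.
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([4.0, 1.0])
c = np.array([2.0, 3.0, 1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
# dual as an inequality-form LP: max b^T y  <=>  min -b^T y, A^T y <= c
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))

x, y = primal.x, dual.x
s = c - A.T @ y                        # dual slack
print(primal.fun, -dual.fun)           # equal at optimality (strong duality)
print(np.max(np.abs(x * s)))           # complementary slackness: x_i s_i = 0
```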
Khachiyan [18] used the ellipsoid method to give the first polynomial time LP algorithm in the bit-complexity model, that is, polynomial in the bit description length of A, b, c. Following Khachiyan’s work, the now forty year old open question is whether there exists a strongly polynomial time algorithm for LP. The task is to solve LP using poly(n, m) basic arithmetic operations. Furthermore, the algorithm must be in PSPACE, that is, the numbers occurring in the computations must remain polynomially bounded in the input size. Known strongly polynomially solvable LP problem classes include: feasibility for two variable per inequality systems [26],
the minimum-cost circulation problem [41], the maximum generalized flow problem [33, 50], and discounted Markov decision problems [52, 53].

For more general LP classes, for which strongly polynomial algorithms are not known, the principal line of attack has been to reduce the numerical complexity of LP algorithms. More precisely, the goal has been to develop algorithms whose number of arithmetic operations depends on natural condition measures of the base LP; at a high level, these condition measures attempt to finely measure the “intrinsic complexity” of the LP. An important line of work in this area has been to parametrize LPs by the “niceness” of their solutions (e.g. the depth of the most interior point), where relevant examples include the Goffin measure [11] for conic systems and Renegar’s distance to ill-posedness for general LPs [35, 36], and bounded ratios between the nonzero entries in basic feasible solutions [4, 19].

Parametrizing by the Constraint Matrix. A second line of research, and the main focus of this work, makes no assumptions on the “niceness” of solutions and instead focuses on the complexity of the constraint matrix A. The first breakthrough in this area was given by Tardos [42], who showed that if A has integer entries and all square submatrices of A have determinant at most ∆ in absolute value, then (LP) can be solved in time poly(n, m, log ∆). This is achieved by finding the exact solutions to n² rounded LPs derived from the original LP, with the right hand side vector and cost function being integers of absolute value bounded in terms of n and ∆. From n such rounded problem instances, one can infer, via proximity results, that a constraint x_i = 0 must be valid for every optimal solution. The process continues by induction until the optimal primal face is identified.

Path-Following Methods and the Vavasis-Ye Algorithm. In a seminal work, Vavasis and Ye [49] introduced a new type of interior-point method that optimally solves (LP) within O(n^3.5 log(χ̄_A + n)) iterations, where the condition number χ̄_A controls the size of solutions to certain linear systems related to the kernel of A (see Section 2 for the formal definition).

Before detailing the Vavasis-Ye (henceforth VY) algorithm, we recall the basics of path following interior-point methods. If both the primal and dual problems in (LP) are strictly feasible, the central path for (LP) is the curve ((x(µ), y(µ), s(µ)) : µ > 0) defined by

x(µ)_i s(µ)_i = µ, ∀i ∈ [n]
Ax(µ) = b, x(µ) > 0,        (CP)
A⊤y(µ) + s(µ) = c, s(µ) > 0,

which converges to complementary optimal primal and dual solutions (x*, y*, s*) as µ → 0, recalling that the optimality gap at time µ is exactly x(µ)⊤s(µ) = nµ. We thus refer to µ as the normalized duality gap. Methods that “follow the path” generate iterates that stay in a certain neighborhood around it while trying to achieve rapid multiplicative progress w.r.t. µ, where given (x, y, s) close to the path, we define the effective µ as µ(x, y, s) = Σ_{i=1}^n x_i s_i / n.

In general, the direction of movement at each iteration is computed by solving a carefully chosen linear system. Given a target parameter µ′ and a starting point close to the path at parameter µ, standard path following methods [12] can compute a point at parameter below µ′ in at most O(√n log(µ/µ′)) iterations, and hence the quantity log(µ/µ′) can be usefully interpreted as the length of the corresponding segment of the central path.

Crossover Events and Layered Least Squares Steps. At a very high level, Vavasis and Ye show that the central path can be decomposed into at most n² short but curved segments, possibly joined by long (a priori unbounded) but very straight segments. At the end of each curved segment, they show that a new ordering relation x_i(µ) > x_j(µ)—called a ‘crossover event’—is implicitly learned, where this relation did not hold at the start of the segment, but will hold at every point from the end of the segment onwards. These (n choose 2) relations give a combinatorial way to measure progress along the central path. In contrast to Tardos’s algorithm, where the main progress is setting variables to zero explicitly, the variables participating in crossover events cannot be identified, only their existence is shown.

At a technical level, the VY algorithm is a variant of the Mizuno-Todd-Ye [29] predictor-corrector method (MTY P-C). In predictor-corrector methods, corrector steps bring an iterate closer to the path, i.e., improve centrality, and predictor steps “shoot down” the path, i.e., reduce µ without losing too much centrality. VY’s main algorithmic innovation was the introduction of a new predictor step, called the ‘layered least squares’ (LLS) step, which crucially allowed them to cross each aforementioned “straight” segment of the central path in a single step, recalling that these straight segments may be arbitrarily long. To traverse the short and curved segments of the path, the standard predictor step, known as affine scaling (AS), in fact suffices.

To compute the LLS direction, the variables are decomposed into ‘layers’ J_1 ∪ J_2 ∪ . . . ∪ J_p = [n]. The goal of such a decomposition is to eventually learn a refinement of the optimal partition of the variables B* ∪ N* = [n], where B* := {i ∈ [n] : x*_i > 0} and N* := {i ∈ [n] : s*_i > 0} for the limit optimal solution (x*, y*, s*).

The primal affine scaling direction can be equivalently described by solving a weighted least squares problem in Ker(A), with respect to a weighting defined according to the current iterate. The primal LLS direction is obtained by solving a series of weighted least squares problems, starting with focusing only on the final layer J_p. This solution is gradually extended to the higher layers (which refers to layers with lower indices). The dual directions have analogous interpretations, with the solutions on the layers obtained in the opposite direction, starting with J_1. If we use the two-level layering J_1 = B*, J_2 = N*, and are sufficiently close to the limit (x*, y*, s*) of the central path, then the LLS step reaches an exact optimal solution in a single step. We note that standard AS steps generically never find an exact optimal solution, and thus some form of “LLS rounding” is always necessary to achieve finite termination.

Of course, guessing B* and N* correctly is just as hard as solving (LP). Still, if we work with a “good” layering, these will reveal new information about the “optimal order” of the variables, where B* is placed on higher layers than N*. The crossover events correspond to swapping two wrongly ordered variables into the correct ordering. Namely, a variable i ∈ B* and j ∈ N* are currently ordered on the same layer, or j is in a higher layer than i. After the crossover event, i will always be placed on a higher layer than j.
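As a concrete illustration of (CP)—a minimal sketch under our own assumptions, not the algorithm of this paper—one can trace the central path of a tiny instance by applying Newton’s method to the defining equations for a fixed µ:

```python
import numpy as np

# Newton's method on F(x, y, s) = (Ax - b, A^T y + s - c, xs - mu) = 0,
# with damping to keep x, s > 0. The instance is arbitrary.
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([3.0]); c = np.array([1.0, 2.0, 4.0])
m, n = A.shape

def central_path_point(mu, iters=50):
    x, y, s = np.ones(n), np.zeros(m), c.copy()    # assumes c > 0 here
    for _ in range(iters):
        r = np.concatenate([A @ x - b, A.T @ y + s - c, x * s - mu])
        K = np.block([
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.zeros((n, n)), A.T,              np.eye(n)],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)],
        ])
        d = np.linalg.solve(K, -r)
        dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
        t = 1.0                                    # damped step keeps x, s > 0
        while np.any(x + t * dx <= 0) or np.any(s + t * ds <= 0):
            t *= 0.5
        x, y, s = x + t * dx, y + t * dy, s + t * ds
    return x, y, s

for mu in [1.0, 0.1, 0.01]:
    x, y, s = central_path_point(mu)
    print(mu, x.round(4), float(x @ s))            # gap x^T s approaches n*mu
```

As µ shrinks, the computed iterates approach the optimal face, matching the limit behavior described above.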
Computing Good Layerings and the χ̄_A Condition Measure. Given the above discussion, the obvious question is how to come up with “good” layerings. The philosophy behind LLS can be stated as saying that if modifying a set of variables x_I barely affects the variables in x_{[n]\I} (recalling that movement is constrained to ∆x ∈ Ker(A)), then one should optimize over x_I without regard to the effect on x_{[n]\I}; hence x_I should be placed on lower layers.

VY’s strategy for computing such layerings was to directly use the size of the coordinates of the current iterate x (where (x, y, s) is a point near the central path). In particular, assuming x_1 ≥ x_2 ≥ . . . ≥ x_n, the layering J_1 ∪ J_2 ∪ . . . ∪ J_p = [n] corresponds to consecutive intervals constructed in decreasing order of x_i values. The break between J_i and J_{i+1} occurs if the gap x_r/x_{r+1} > g, where r is the rightmost element of J_i and g > 0 is a threshold parameter. Thus, the expectation is that if x_i > g x_j, then a small multiplicative change to x_j, subject to moving in Ker(A), should induce a small multiplicative change to x_i. By proximity to the central path, the dual ordering is reversed as mentioned above.

The threshold g for which this was justified in VY was a function of the χ̄_A condition measure. We now provide a convenient definition, which immediately yields this justification (see Proposition 2.3). Letting W = Ker(A) and π_I(W) = {x_I : x ∈ W}, we define χ̄_A := χ̄_W as the minimum number M ≥ 1 such that for any ∅ ≠ I ⊆ [n] and z ∈ π_I(W), there exists y ∈ W with y_I = z and ∥y∥ ≤ M∥z∥. Thus, a change of ε in the variables in I can be lifted to a change of at most χ̄_A ε in the variables in [n] \ I. Crucially, χ̄ is a “self-dual” quantity. That is, χ̄_W = χ̄_{W⊥}, where W⊥ = range(A⊤) is the movement subspace for the dual problem, justifying the reversed layering for the dual (see Section 2 for more details).

The Question of Scale Invariance and χ̄*_A. While the VY layering procedure is powerful, its properties are somewhat mismatched with those of the central path. In particular, variable ordering information has no intrinsic meaning on the central path, as the path itself is scaling invariant. Namely, the central path point (x(µ), y(µ), s(µ)) w.r.t. the problem instance (A, b, c) is in bijective correspondence with the central path point (D⁻¹x(µ), y(µ), Ds(µ)) w.r.t. the problem instance (AD, b, Dc) for any positive diagonal matrix D. The standard path following algorithms are also scaling invariant in this sense.

This led Monteiro and Tsuchiya [31] to ask whether a scaling invariant LLS algorithm exists. They noted that any such algorithm would then depend on the potentially much smaller parameter

χ̄*_A := inf_D χ̄_{AD},        (1)

where the infimum is taken over the set of n × n positive diagonal matrices. Thus, Monteiro and Tsuchiya’s question can be rephrased as whether there exists an exact LP algorithm with running time poly(n, m, log χ̄*_A).

Substantial progress on this question was made in the followup works [21, 32]. The paper [32] showed that the MTY predictor-corrector algorithm [29] can get from µ_0 > 0 to η > 0 on the central path in O(n^3.5 log χ̄* + min{n² log log(µ_0/η), √n log(µ_0/η)}) iterations. This is attained by showing that the standard AS steps are reasonably close to the LLS steps. This proximity can be used to show that the AS steps can traverse the curved parts of the central path in the same iteration complexity bound as the VY algorithm. Moreover, on the “straight” parts of the path, the rate of progress amplifies geometrically, thus attaining a log log convergence on these parts. Subsequently, [21] developed an affine invariant trust region step, which traverses the full path in O(n^3.5 log(χ̄*_A + n)) iterations. However, each iteration is weakly polynomial in b and c. The question of developing an LP algorithm with complexity bound poly(n, m, log χ̄*_A) thus remained open.

A related open problem to the above is whether it is possible to compute a near-optimal rescaling D for program (1). This would give an alternate pathway to the desired LP algorithm by simply preprocessing the matrix A. The related question of approximating χ̄_A was already studied by Tunçel [45], who showed NP-hardness for approximating χ̄_A to within a 2^{poly(rank(A))} factor. Taken at face value, this may seem to suggest that approximating the rescaling D should be hard.

A further open question is whether Vavasis and Ye’s base crossover analysis can be improved. Ye [?] showed that the iteration complexity can be reduced to O(n^2.5 log(χ̄_A + n)) for feasibility problems and further to O(n^1.5 log(χ̄_A + n)) for homogeneous systems, though the O(n^3.5 log(χ̄_A + n)) bound for optimization has remained unimproved since [49].

Our Contributions. In this work, we resolve all of the above questions in the affirmative. We detail our contributions below.

1. Finding an Approximately Optimal Rescaling. As our first contribution, we give an O(m²n² + n³) time algorithm which works on the linear matroid of A to compute a diagonal rescaling matrix D which achieves χ̄_{AD} ≤ n(χ̄*_A)³, given any m × n matrix A. Furthermore, this same algorithm allows us to approximate χ̄_A to within a factor n(χ̄*_A)². The algorithm bypasses Tunçel’s hardness result by allowing the approximation factor to depend on A itself, namely on χ̄*_A. This gives a simple first answer to Monteiro and Tsuchiya’s question: by applying the Vavasis-Ye algorithm directly on the preprocessed A matrix, we may solve any LP with constraint matrix A using O(n^3.5 log(χ̄*_A + n)) iterations. Note that the approximation factor n(χ̄*_A)² increases the runtime only by a constant factor.

To achieve this result, we work directly with the circuits of A, where a circuit C ⊆ [n] is C = supp(g) for a minimal linear dependency Ag = 0. With each circuit, we can associate a vector g^C ∈ Ker(A) with supp(g^C) = C that is unique up to scaling. By the ‘circuit ratio’ of (i, j), we mean the largest ratio |g^C_j/g^C_i| taken over every circuit C of A such that i, j ∈ C. As our first observation, we show that the maximum of all circuit ratios, which we call the ‘circuit imbalance measure’, in fact characterizes χ̄_A up to a factor n. This measure was first studied by Vavasis [48], who showed that it lower bounds χ̄_A, though, as far as we are aware, our upper bound is new. The circuit ratios of the pairs (i, j) induce a weighted directed graph we call the circuit ratio digraph of A. From here, our main result is that χ̄*_A is up to a factor n equal to the maximum geometric mean cycle in the circuit ratio digraph. Our approximation algorithm populates the circuit ratio digraph with ratios for each i, j using basic matroid techniques, and then computes a rescaling by solving the dual of the maximum geometric mean ratio cycle on the ‘approximate circuit ratio digraph’.
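For tiny instances, the objects in this paragraph can be computed directly by brute force. The following sketch—ours, with an arbitrary matrix and no efficiency claims—enumerates circuits, builds the circuit ratio digraph, and evaluates the maximum geometric mean cycle.

```python
import itertools
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 3.0, 1.0]])
m, n = A.shape

def circuits(A):
    # C is a circuit iff Ker(A_C) is one-dimensional with a nowhere-zero
    # spanning vector (then no strict subset of C can be dependent)
    out = []
    for r in range(2, n + 1):
        for C in itertools.combinations(range(n), r):
            K = null_space(A[:, list(C)])
            if K.shape[1] == 1 and np.min(np.abs(K[:, 0])) > 1e-10:
                g = np.zeros(n); g[list(C)] = K[:, 0]
                out.append((C, g))
    return out

kappa = np.zeros((n, n))                # circuit ratio digraph weights
for C, g in circuits(A):
    for i, j in itertools.permutations(C, 2):
        kappa[i, j] = max(kappa[i, j], abs(g[j] / g[i]))

# maximum geometric mean cycle, brute force over all simple cycles
best = 0.0
for r in range(2, n + 1):
    for cyc in itertools.permutations(range(n), r):
        edges = list(zip(cyc, cyc[1:] + cyc[:1]))
        if all(kappa[i, j] > 0 for i, j in edges):
            best = max(best, np.prod([kappa[i, j] for i, j in edges]) ** (1 / r))
print("max geometric mean cycle:", best)
```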
2. Scaling Invariant LLS Algorithm. While the above yields an LP algorithm with poly(n, m, log χ̄*_A) running time, it does not satisfactorily address Monteiro and Tsuchiya’s question for a scaling invariant algorithm. As our second contribution, we use the circuit ratio digraph directly to give a natural scaling invariant LLS layering algorithm together with a scaling invariant crossover analysis.

At a conceptual level, we show that the circuit ratios give a scale invariant way to measure whether ‘x_i > x_j’ and enable a natural layering algorithm. Let κ_ij be the circuit imbalance between i and j, i.e., the maximum value |g_j/g_i| for a minimal kernel solution g containing i and j in the support. Given the circuit ratio graph induced by κ and a primal point x near the path, our layering algorithm can be described as follows. We first rescale the variables so that x becomes the all ones vector, which rescales κ_ij to κ_ij x_i/x_j. We then restrict the graph to its edges of length ≥ 1/poly(n)—the long edges of the (rescaled) circuit ratio graph—and let the layering J_1 ∪ J_2 ∪ . . . ∪ J_p be a topological ordering of its strongly connected components (SCC) with edges going from left to right. Intuitively, variables that “affect each other” should be in the same layer, which motivates the SCC definition.

We note that our layering algorithm does not in fact have access to the true circuit ratios κ_ij, as these are NP-hard to compute. Getting a good enough initial estimate for our purposes is however easy: we let κ̂_ij be the ratio corresponding to an arbitrary circuit containing i and j. This already turns out to be within a factor (χ̄*_A)² from the true value κ_ij, which we recall is the maximum over all such circuits. Our layering algorithm in fact learns better circuit ratio estimates if the “lifting costs” of our SCC layering, i.e., how much it costs to lift changes from lower layer variables to higher layers (as in the definition of χ̄_A), are larger than we expected them to be.

For our analysis, we define crossovers in a scaling invariant way as follows. Before the crossover event, poly(n)(χ̄*_A)^n > κ_ij x_i/x_j, and after the crossover event, poly(n)(χ̄*_A)^n < κ_ij x_i/x_j for all further central path points. Our analysis relies on χ̄*_A in only a minimalistic way, and does not require an estimate on the value of χ̄*_A. Namely, it is only used to show that if i, j ∈ J_q, for a layer q ∈ [p], then the rescaled circuit ratio κ_ij x_i/x_j is in the range (poly(n)χ̄*_A)^{O(±|J_q|)}. The argument to show this crucially utilizes the maximum geometric mean cycle characterization. Furthermore, unlike prior analyses [31, 49], our definition of a “good” layering (i.e., ‘balanced’ layerings, see Section 3.4) is completely independent of χ̄*_A.

3. Improved Potential Analysis. As our third contribution, we improve the Vavasis-Ye crossover analysis using a new and simple potential function based approach. When applied to our new LLS algorithm, we derive an O(n^2.5 log n log(χ̄*_A + n)) iteration bound for path following, improving the polynomial term by an Ω(n/log n) factor compared to the VY analysis.

Our potential function can be seen as a fine-grained version of the crossover events described above. In case of such a crossover event, it is guaranteed that in every subsequent iteration, i is in a layer before j. Instead, we analyze less radical changes: an “event” parametrized by τ means that i and j are currently together on a layer of size ≤ τ, and after the event, i is on a layer before j, or if they are together on the same layer, then this layer must have size ≥ 2τ. For every LLS step, we can find a parameter τ such that an event of this type happens concurrently for at least τ − 1 pairs within the next O(√n τ log(χ̄*_A + n)) iterations.

Our improved analysis is also applicable to the original VY algorithm. Let us now comment on the relation between the VY algorithm and our new algorithm. The VY algorithm starts a new layer once x_{π(i)} > g x_{π(i+1)} between two consecutive variables, where the permutation π is a non-increasing order of the x_i variables. Here, g = poly(n)χ̄. Setting the initial ‘estimates’ κ̂_ij = g/poly(n) for a suitable polynomial, our algorithm runs the same way as the VY algorithm. Using these estimates, the layering procedure becomes much simpler: there is no need to verify ‘balancedness’ as in our general algorithm.

However, setting g = κ̂_ij has drawbacks. Most importantly, it does not give a lower bound on the true circuit ratio κ_ij—to the contrary, g will be an upper bound! In effect, this causes VY’s layers to be “much larger” than ours, and for this reason, the connection to χ̄* is lost. Nevertheless, our potential function analysis can still be adapted to the VY algorithm to obtain the same Ω(n/log n) improvement on the iteration complexity bound; see Section 4.1 for more details.

1.1 Related Work
Since the seminal works of Karmarkar [17] and Renegar [34], there has been a tremendous amount of work on speeding up and improving interior-point methods. In contrast to the present work, the focus of these works has mostly been to improve the complexity of approximately solving LPs. Progress has taken many forms, such as the development of novel barrier methods, such as Vaidya’s volumetric barrier [46], the recent entropic barrier of Bubeck and Eldan [3] and the weighted log-barrier of Lee and Sidford [22, 24], together with new path following techniques, such as the predictor-corrector framework [28, 29], as well as advances in fast linear system solving [23, 39]. For this last line, there has been substantial progress in improving IPM by amortizing the cost of the iterative updates, and working with approximate computations, see e.g. [5, 34, 46, 47]. Very recently, Cohen, Lee and Song [5] developed a new inverse maintenance scheme to get a randomized Õ(n^2.37 log(1/ε))-time algorithm for ε-approximate LP, which was derandomized by van den Brand [47]. For special classes of LP such as network flow problems, fast algorithms have been obtained by using fast Laplacian solvers, see e.g. [6, 25]. Given the progress above, we believe it to be an interesting problem to understand to what extent these new numerical techniques can be applied to speed up LLS computations, though we expect that such computations will require very high precision. We note that no attempt has been made in the present work to optimize the complexity of the linear algebra.

Ho and Tunçel [14] showed how to extend Tardos’ framework to the real model of computation (i.e., to non-integral A), providing a blackbox alternative to the VY algorithm. The numerical complexity of the LPs arising in their reduction is controlled by the minimum and maximum subdeterminants of A restricted to non-singular submatrices and the minimum non-zero slack of any basic primal or dual solution over a certain grid of right hand sides and objectives.
With regard to LLS algorithms, the original VY algorithm required explicit knowledge of χ̄_A to implement their layering algorithm. [27] showed that this could be avoided by computing all LLS steps associated with n candidate partitions and picking the best one. In particular, they showed that all such LLS steps can be computed in O(m²n) time. [31] gave an alternate approach which computes an LLS partition directly from the coefficients of the AS step. We note that these methods crucially rely on the variable ordering, and hence are not scaling invariant. Kitahara and Tsuchiya [20] gave a 2-layer LLS step which achieves a running time depending only on χ̄*_A and the right-hand side b, but with no dependence on the objective, assuming the primal feasible region is bounded.

A series of papers have studied the central path from a differential geometry perspective. Monteiro and Tsuchiya [30] showed that a curvature integral of the central path, first introduced by Sonnevend, Stoer, and Zhao [38], is in fact upper bounded by O(n^3.5 log(χ̄*_A + n)). This has been extended to SDP and symmetric cone programming [16], and also studied in the context of information geometry [15].

Circuits have appeared in several papers on linear and integer optimization (see [8] and its references). The idea of using circuits within the context of LP algorithms also appears in [7]. They develop an augmentation framework for LP (as well as ILP) and show that a simplex-like algorithm which takes steps according to the “best circuit” direction achieves linear convergence, though these steps are hard to compute.

Our algorithm makes progress towards strongly polynomial solvability of LP, by improving the dependence poly(n, m, log χ̄) to poly(n, m, log χ̄*). However, in a remarkable recent paper, Allamigeon et al. [2] have shown, using tools from tropical geometry, that path-following methods for the standard logarithmic barrier cannot be strongly polynomial. In particular, they give a parametrized family of instances, where, for sufficiently large parameter values, any sequence of iterations following the central path must be of exponential length—thus, χ̄* will be doubly exponential. We note that it is unclear whether their instance is robust to changing the barrier method itself; e.g., the weighted log-barrier [22].

1.2 Organization
Section 2 begins with the necessary background on the condition measures χ̄_A and χ̄*_A. It culminates in the approximate χ̄*_A rescaling and χ̄_A approximation algorithm. This algorithm relies upon the circuit imbalance measure in Section 2.1, the min-max characterization in Section 2.2, and a circuit finding algorithm in Section 2.3.

In Section 3, we develop our scaling invariant interior-point method. Interior-point preliminaries are given in Section 3.1, the layered least squares step is explained in Section 3.3, our scaling invariant layering algorithm is given in Section 3.4, and lastly, our overall algorithm is given in Section 3.5.

In Section 4, we describe the potential function proof for the improved iteration bound. Section 4.1 shows that our argument also leads to a factor Ω(n/log n) improvement in the iteration complexity bound of the VY algorithm. Finally, in Section 5, we discuss the initialization of our interior-point method.

Proofs can be found in the full version of this paper, accessible at https://arxiv.org/abs/1912.06252.

2 FINDING AN APPROXIMATELY OPTIMAL RESCALING
Notation. Our notation will largely follow [31, 32]. We let R_{++} denote the set of positive reals, and R_+ the set of nonnegative reals. For n ∈ N, we let [n] = {1, 2, . . . , n}. Let e_i ∈ R^n denote the ith unit vector, and e ∈ R^n the all 1s vector. For a vector x ∈ R^n, we let Diag(x) ∈ R^{n×n} denote the diagonal matrix with x on the diagonal. We let D denote the set of all positive n × n diagonal matrices. For x, y ∈ R^n, we use the notation xy ∈ R^n to denote xy = Diag(x)y = (x_i y_i)_{i∈[n]}. The scalar product of the two vectors is denoted as x⊤y. For p ∈ Q, we also use the notation x^p to denote the vector (x_i^p)_{i∈[n]}. Similarly, for x, y ∈ R^n, we let x/y denote the vector (x_i/y_i)_{i∈[n]}. We denote the support of a vector x ∈ R^n by supp(x) = {i ∈ [n] : x_i ≠ 0}.

For an index subset I ⊆ [n], we use π_I : R^n → R^I for the coordinate projection. That is, π_I(x) = x_I, and for a subset S ⊆ R^n, π_I(S) = {x_I : x ∈ S}. We let R^n_I = {x ∈ R^n : x_{[n]\I} = 0}.

For a matrix B ∈ R^{n×k}, I ⊂ [n] and J ⊂ [k], we let B_{I,J} denote the submatrix of B restricted to the set of rows in I and columns in J. We also use B_{I,•} = B_{I,[k]} and B_J = B_{•,J} = B_{[n],J}. We let B† ∈ R^{k×n} denote the pseudo-inverse of B.

We let Ker(A) denote the kernel of the matrix A ∈ R^{m×n}. Throughout, we assume that the matrix A in (LP) has full row rank, and that n ≥ 3.

Subspace Formulation. Throughout the paper, we let W = Ker(A) ⊆ R^n denote the kernel of the matrix A. Using this notation, (LP) can be written in the form

min c⊤x        max d⊤(c − s)
x ∈ W + d      s ∈ W⊥ + c        (2)
x ≥ 0,         s ≥ 0,

where d ∈ R^n satisfies Ad = b.

The condition number χ̄. The condition number χ̄_A is defined as

χ̄_A = sup { ∥A⊤(ADA⊤)⁻¹AD∥ : D ∈ D }
    = sup { ∥A⊤y∥/∥p∥ : y minimizes ∥D^{1/2}(A⊤y − p)∥ for some 0 ≠ p ∈ R^n and D ∈ D }.        (3)

This condition number was first studied by Dikin [9], Stewart [40], and Todd [43], among others, and plays a key role in the analysis of the Vavasis-Ye interior point method [49]. There is an extensive literature on the properties and applications of χ̄_A, as well as its relations to other condition numbers. We refer the reader to the papers [14, 31, 49] for further results and references.

It is important to note that χ̄_A only depends on the subspace W = Ker(A). Hence, we can also write χ̄_W for a subspace W ⊆ R^n, defined to be equal to χ̄_A for some matrix A ∈ R^{k×n} with W = Ker(A). We will use the notations χ̄_A and χ̄_W interchangeably.

The next lemma summarizes some important known properties of χ̄_A.

Proposition 2.1. Let A ∈ R^{m×n} with full row rank and W = Ker(A).
(i) If the entries of A are all integers, then χ̄_A is bounded by 2^{O(L_A)}, where L_A is the input bit length of A.
(ii) χ̄_A = max{∥B⁻¹A∥ : B non-singular m × m submatrix of A}.
(iii) χ̄_W = χ̄_{W⊥}.

Proof. Part (i) was proved in [49, Lemma 24]. For part (ii), see [44, Theorem 1] and [49, Lemma 3]. The duality statement (iii) was shown in [13]. □

In Proposition 3.8, we will also give another proof of (iii). We now define the lifting map, a key operation in this paper, and explain its connection to χ̄_A.

Definition 2.2. Let us define the lifting map L^W_I : π_I(W) → W by
L^W_I(p) = arg min {∥z∥ : z_I = p, z ∈ W}.

Note that L^W_I is the unique linear map from π_I(W) to W such that L^W_I(p)_I = p and L^W_I(p) is orthogonal to W ∩ R^n_{[n]\I}.

We have the following characterization. This will be the most suitable characterization of χ̄_W for our purposes.

Proposition 2.3. For a linear subspace W ⊆ R^n,
χ̄_W = max {∥L^W_I∥ : I ⊆ [n], I ≠ ∅}.

The following notation will be convenient for our algorithm. For a subspace W ⊆ R^n and an index set I ⊆ [n], if π_I(W) ≠ {0}, we define the lifting score

ℓ^W(I) := √(∥L^W_I∥² − 1).        (4)

Otherwise, we define ℓ^W(I) = 0. This means that for any z ∈ π_I(W) and x = L^W_I(z), ∥x_{[n]\I}∥ ≤ ℓ^W(I)∥z∥.

Definition 2.5. For a linear subspace W ⊆ R^n and a matrix A such that W = Ker(A), a circuit is an inclusion-wise minimal dependent set of columns of A. Equivalently, a circuit is a set C ⊆ [n] such that W ∩ R^n_C is one-dimensional and no strict subset of C has this property. Any circuit is associated with a vector g ∈ W with inclusion-wise minimal support. The set of circuits of W is denoted C_W.

Note that these are also known as the circuits of the linear matroid associated with A.

Definition 2.6. For a circuit C ∈ C_W, let g^C ∈ W be such that supp(g^C) = C. For i, j ∈ C, we let

κ^W_ij(C) = |g^C_j / g^C_i|.        (5)

For any i, j ∈ [n], we define the circuit ratio as the maximum of κ^W_ij(C) over all choices of the circuit C:

κ^W_ij = max {κ^W_ij(C) : C ∈ C_W, i, j ∈ C}.        (6)

By convention we set κ^W_ij = 0 if there is no circuit supporting i and j. Further, we define the circuit imbalance measure as

κ_W = max {κ^W_ij : i, j ∈ [n]}.

Minimizing over all coordinate rescalings, we define

κ*_W = min {κ_{WD} : D ∈ D}.

We omit the index W whenever it is clear from context. In such cases, for D = Diag(d) ∈ D, we write κ^d_ij = κ^{WD}_ij and κ^d = κ_{WD}.

We want to remark that a priori it is not clear that κ*_W is well-defined. Theorem 2.11 will show that the minimum of {κ_{WD} : D ∈ D} is indeed attained. Observe that κ^W_ij(C) does not depend on the choice of g^C, since there is only a single choice up to scalar multiplication.

The circuit ratio, as well as the circuit imbalance measure, are self-dual.

Lemma 2.7. For any subspace W ⊆ R^n and i, j ∈ [n], κ^W_ij = κ^{W⊥}_ji.

Our LLS algorithm in Section 3 will use the subroutine described in Lemma 2.10. For a subspace W ⊆ R^n, an index set I ⊆ [n], and a threshold θ > 0, the algorithm Verify-Lift(W, I, θ) outputs either of the answers ‘pass’ or ‘fail’. If the answer is ‘pass’, then it is guaranteed that ℓ^W(I) ≤ θ. If the answer is ‘fail’, then a pair of indices i ∈ I, j ∈ [n] \ I, and a bound t are returned, such that θ/n ≤ t ≤ κ^W_ij.
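The lifting map and lifting score admit a compact numerical sketch—our own illustration of Definition 2.2 and (4), not the paper’s Verify-Lift implementation—using an orthonormal basis of W:

```python
import numpy as np
from scipy.linalg import null_space

def lift_and_score(A, I):
    """Return the matrix of L_I^W on pi_I(W) and the lifting score (4)."""
    B = null_space(A)                 # orthonormal basis of W = Ker(A)
    BI = B[list(I), :]
    # min-norm z in W with z_I = p: z = B u with u the min-norm solution
    # of B_I u = p; orthonormal columns of B give ||z|| = ||u||
    M = B @ np.linalg.pinv(BI)        # p -> L_I^W(p)
    op = np.linalg.norm(M, 2)         # spectral norm equals ||L_I^W||
    # ||L_I^W|| >= 1 always (the lift restricts to p on I); the max(.,0)
    # only guards against rounding
    return M, np.sqrt(max(op**2 - 1.0, 0.0))

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
M, score = lift_and_score(A, [0, 1])
print(score)                          # ell^W(I); 'pass' would need <= theta
```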
To implement Verify-Lift, we first need to select a minimal I′ ⊂ I such that dim(π_{I′}(W)) = dim(π_I(W)). This can be found by computing a matrix M ∈ R^{(n−m)×n} such that range(M) = W, and selecting a maximal number of linearly independent columns of M_I. Then, we compute the matrix B ∈ R^{([n]\I)×I′} that implements the transformation [L^W_{I′}]_{[n]\I} : π_{I′}(W) → π_{[n]\I}(W). The algorithm returns the pair (i, j) corresponding to the entry maximizing |B_{ji}|.

2.2 A Min-Max Theorem on κ*_W
We next provide a combinatorial min-max characterization of κ*_W. Consider the circuit ratio digraph G = ([n], E) on the node set [n] where (i, j) ∈ E if κ_ij > 0, that is, there exists a circuit C ∈ C_W with i, j ∈ C. An edge (i, j) ∈ E is said to have weight κ_ij = κ^W_ij. (Note that (i, j) ∈ E if and only if (j, i) ∈ E, but the weights of these two edges can be different.)

Let H be a cycle in G, that is, a sequence of points i_1, i_2, . . . , i_k, i_{k+1} = i_1. We use |H| = k to denote the length of the cycle. (In our terminology, ‘cycles’ always refer to objects in G, whereas ‘circuits’ refer to the minimal supports in Ker(A).)

We use the notation κ(H) = κ_W(H) = Π_{j=1}^k κ^W_{i_j i_{j+1}}. For a vector d ∈ R^n_{++}, we let κ^d(H) = κ^d_W(H) = κ_{WD}(H) for D = Diag(d). A simple but important observation is that such a rescaling does not change the value associated with the cycle, that is,

κ^d_W(H) = κ_W(H) ∀d ∈ R^n_{++} for any cycle H in G.        (7)

We are ready to formulate our theorem.

Theorem 2.11. For a subspace W ⊂ R^n, we have
κ*_W = min_{d>0} κ^d_W = max {κ_W(H)^{1/|H|} : H is a cycle in G}.

The following example shows that κ* ≤ χ̄* can be arbitrarily big.

Example 2.12. Take W = span((0, 1, 1, M), (1, 0, M, 1)), where M > 0. Then {2, 3, 4} and {1, 3, 4} are circuits with κ^W_34({2, 3, 4}) = M and κ^W_43({1, 3, 4}) = M. Hence, by Theorem 2.11, we see that κ* ≥ M.

The following corollary of Theorem 2.11 is particularly useful. It asserts that any arbitrary circuit containing i and j yields a (κ*)² approximation to κ_ij.

Corollary 2.13. We are given a linear subspace W ⊆ R^n and i, j ∈ [n], i ≠ j, and a circuit C ∈ C_W with i, j ∈ C. Let g ∈ W be the corresponding vector with supp(g) = C. Then,

κ^W_ij / (κ*_W)² ≤ |g_j|/|g_i| ≤ κ^W_ij.

2.3 Finding Circuits: A Detour in Matroid Theory
We now show how to efficiently obtain a family Ĉ ⊆ C_W such that for any i, j ∈ [n], Ĉ includes a circuit containing both i and j, provided there exists such a circuit.

We need some simple concepts and results from matroid theory. We refer the reader to [37, Chapter 39] or [10, Chapter 5] for definitions and background. Let M = ([n], I) be a matroid on ground set [n] with independent sets I ⊆ 2^{[n]}. The rank rk(S) of a set S ⊆ [n] is the maximum size of an independent set contained in S. The maximal independent sets are called bases. All bases have the same cardinality rk([n]).

For the matrix A ∈ R^{m×n}, we will work with the linear matroid M(A) = ([n], I(A)), where a subset I ⊆ [n] is independent if the columns {A_i : i ∈ I} are linearly independent. Note that rk([n]) = m under the assumption that A has full row rank.

The circuits of the matroid are the inclusion-wise minimal non-independent sets. Let I ∈ I be an independent set, and i ∈ [n] \ I such that I ∪ {i} ∉ I. Then, there exists a unique circuit C(I, i) ⊆ I ∪ {i} that is called the fundamental circuit of i with respect to I. Note that i ∈ C(I, i).

The matroid M is separable if the ground set [n] can be partitioned into two nonempty subsets [n] = S ∪ T such that I ∈ I if and only if I ∩ S, I ∩ T ∈ I. In this case, the matroid is the direct sum of its restrictions to S and T. In particular, every circuit is fully contained in S or in T.

For the linear matroid M(A), separability means that Ker(A) = Ker(A_S) ⊕ Ker(A_T). In this case, solving (LP) can be decomposed into two subproblems, restricted to the columns in A_S and in A_T, and χ̄_A = max{χ̄_{A_S}, χ̄_{A_T}}.

Hence, we can focus on non-separable matroids. The following characterization is well-known, see e.g. [10, Theorems 5.2.5, 5.2.7–5.2.9]. For a hypergraph H = ([n], E), we define the underlying graph G_H = ([n], E′) such that (i, j) ∈ E′ if there is a hyperedge S ∈ E with i, j ∈ S. That is, we add a clique corresponding to each hyperedge. The hypergraph is called connected if the underlying graph G_H is connected.

Proposition 2.14. For a matroid M = ([n], I), the following are equivalent:
(i) M is non-separable.
(ii) The hypergraph of the circuits is connected.
(iii) For any base B of M, the hypergraph formed by the fundamental circuits C_B = {C(B, i) : i ∈ [n] \ B} is connected.
(iv) For any i, j ∈ [n], there exists a circuit containing i and j.

We are ready to describe the algorithm that will be used to obtain lower bounds on all κ_ij values. For a matrix A ∈ R^{m×n}, we let Find-Circuits(A) denote the subroutine described in the lemma for the linear matroid M(A).

Theorem 2.15. Given A ∈ R^{m×n}, there exists an O(n²m²) time algorithm Find-Circuits(A) that obtains a decomposition of M(A) into a direct sum of non-separable linear matroids, and returns a family Ĉ of circuits such that if i and j are in the same non-separable component, then there exists a circuit in Ĉ containing both i and j. Further, for each i ≠ j in the same component, the algorithm returns a value κ̂_ij as the maximum of |g_j/g_i| such that g ∈ W, supp(g) = C for some C ∈ Ĉ containing i and j. For these values, κ̂_ij ≤ κ_ij ≤ (κ*)²κ̂_ij.

The rescaling algorithm described in Theorem 2.4 functions by first running Find-Circuits(A) to approximate the circuit ratio graph. Taking t = √(1 + max_{(i,j)∈E} κ̂²_ij) approximates χ̄_A per Theorem 2.8. A maximum-mean cycle computation allows us to compute a suitable rescaling to approximately minimize κ^d_W in O(n³) time (see e.g. [1, Theorem 5.8]).
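A hedged sketch of this pipeline—ours; Karp’s algorithm on log-weights stands in for the maximum-mean cycle step cited above, and Bellman-Ford potentials realize the dual rescaling:

```python
import math

def max_geo_mean_cycle(kappa):
    # kappa: dict (i, j) -> estimated ratio > 0. Karp's algorithm on
    # log-weights; returns 0.0 if the digraph has no cycle.
    nodes = sorted({v for e in kappa for v in e})
    n = len(nodes)
    NEG = -math.inf
    # D[k][v] = max log-weight of a k-edge walk ending at v
    D = [{v: (0.0 if k == 0 else NEG) for v in nodes} for k in range(n + 1)]
    for k in range(1, n + 1):
        for (i, j), r in kappa.items():
            if D[k-1][i] > NEG:
                D[k][j] = max(D[k][j], D[k-1][i] + math.log(r))
    best = NEG
    for v in nodes:
        if D[n][v] > NEG:
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] > NEG))
    return math.exp(best) if best > NEG else 0.0

def rescaling(kappa, t):
    # Bellman-Ford potentials z with z_j <= z_i + log(t / kappa_ij);
    # feasible whenever t is at least the max geometric mean cycle, and
    # d = exp(z) then certifies kappa_ij * d_j / d_i <= t on every edge.
    nodes = sorted({v for e in kappa for v in e})
    z = {v: 0.0 for v in nodes}       # implicit source at distance 0
    for _ in range(len(nodes)):
        for (i, j), r in kappa.items():
            z[j] = min(z[j], z[i] + math.log(t / r))
    return {v: math.exp(z[v]) for v in nodes}
```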
3 A SCALING-INVARIANT LAYERED LEAST SQUARES INTERIOR-POINT ALGORITHM

3.1 Preliminaries on Interior-Point Methods
In this section, we introduce the standard definitions, concepts and results from the interior-point literature that will be required for our algorithm. We consider an LP problem in the form (LP), or equivalently, in the subspace form (2) for W = Ker(A). We let

P^{++} = {x ∈ R^n : Ax = b, x > 0}
D^{++} = {(y, s) ∈ R^{m+n} : A⊤y + s = c, s > 0}.

Recall the central path defined in (CP), with w(µ) = (x(µ), y(µ), s(µ)) denoting the central path point corresponding to µ > 0. We let w* = (x*, y*, s*) denote the primal and dual optimal solutions to (LP) that correspond to the limit of the central path for µ → 0. For a point w = (x, y, s) ∈ P^{++} × D^{++}, the normalized duality gap is µ(w) = x⊤s/n.

The ℓ₂-neighborhood of the central path with opening β > 0 is the set

N(β) = { w ∈ P^{++} × D^{++} : ∥xs/µ(w) − e∥ ≤ β }.

Throughout the paper, we will assume β is chosen from (0, 1/4]; in Algorithm 2 we use the value β = 1/8. The following proposition gives a bound on the distance between w and w(µ) if w ∈ N(β). See e.g. [12, Lemma 5.4].

Proposition 3.1. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/4] and µ = µ(w), and consider the central path point w(µ) = (x(µ), y(µ), s(µ)). For each i ∈ [n],

x_i/(1 + 2β) ≤ ((1 − 2β)/(1 − β)) · x_i ≤ x_i(µ) ≤ x_i/(1 − β), and
s_i/(1 + 2β) ≤ ((1 − 2β)/(1 − β)) · s_i ≤ s_i(µ) ≤ s_i/(1 − β).

We will often use the following immediate corollary.

Corollary 3.2. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/4], and µ = µ(w). Then for each i ∈ [n],

(1 − β)√µ ≤ √(x_i s_i) ≤ (1 + 2β)√µ.

A key property of the central path is “near monotonicity”, formulated in the following lemma, see [49, Lemma 16].

Lemma 3.3. Let w = (x, y, s) be a central path point for µ and w′ = (x′, y′, s′) be a central path point for µ′ ≤ µ. Then ∥x′/x + s′/s∥_∞ ≤ n. Further, for the optimal solution w* = (x*, y*, s*) corresponding to the central path limit µ → 0, we have ∥x*/x∥₁ + ∥s*/s∥₁ = n.

3.2 Predictor and Corrector Steps
Given w = (x, y, s) ∈ P^{++} × D^{++}, the search directions commonly used in interior-point methods are obtained as the solution (∆x, ∆y, ∆s) to the following linear system for some σ ∈ [0, 1].

A∆x = 0        (8)
A⊤∆y + ∆s = 0        (9)
s∆x + x∆s = σµe − xs        (10)

Predictor-corrector methods, such as the Mizuno-Todd-Ye Predictor-Corrector (MTY P-C) algorithm [29], alternate between two types of steps. In predictor steps, we use σ = 0. This direction is also called the affine scaling direction, and will be denoted as ∆w^a = (∆x^a, ∆y^a, ∆s^a) throughout. In corrector steps, we use σ = 1. This gives the centrality direction, denoted as ∆w^c = (∆x^c, ∆y^c, ∆s^c).

In the predictor steps, we make progress along the central path. Given the search direction at the current iterate w = (x, y, s) ∈ N(β), the step-length is chosen maximal such that we remain in N(2β), i.e.

α^a := sup{α ∈ [0, 1] : ∀α′ ∈ [0, α] : w + α′∆w^a ∈ N(2β)}.

Thus, we obtain a point w⁺ = w + α^a∆w^a ∈ N(2β). The corrector step finds a next iterate w^c = w⁺ + ∆w^c, where ∆w^c is the centrality direction computed at w⁺. The next proposition summarizes well-known properties, see e.g. [51, Section 4.5.1].

Proposition 3.4. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/4].
(i) For the affine scaling step, we have µ(w⁺) = (1 − α^a)µ(w).
(ii) The affine scaling step-length is
α^a ≥ max { β/√n, 1 − ∥∆x^a∆s^a∥/(βµ(w)) }.
(iii) For w⁺ ∈ N(2β) and w^c = w⁺ + ∆w^c, we have µ(w^c) = µ(w⁺) and w^c ∈ N(β).
(iv) After a sequence of O(√n t) predictor and corrector steps, we obtain an iterate w′ = (x′, y′, s′) ∈ N(β) such that µ(w′) ≤ µ(w)/2^t.

Minimum norm viewpoint and residuals. For any point w = (x, y, s) ∈ P^{++} × D^{++} we define

δ = δ(w) = s^{1/2}x^{−1/2} ∈ R^n.        (11)

With this notation, we can write (10) in the form

δ∆x + δ^{−1}∆s = −s^{1/2}x^{1/2}.        (12)

From Proposition 3.1, we see that if w ∈ N(β), and µ = µ(w), then for each i ∈ [n],

√(1 − 2β) · δ_i(w) ≤ δ_i(w(µ)) ≤ (1/√(1 − 2β)) · δ_i(w).        (13)

The matrix Diag(δ(w)) will often be used for rescaling in the algorithm. That is, for the current iterate w = (x, y, s) in the interior-point method, we will perform projections in the space W Diag(δ(w)). To simplify notation, for δ = δ(w), we use L^δ_I and κ^δ_ij as shorthands for L^{W Diag(δ)}_I and κ^{W Diag(δ)}_ij. The subspace W = Ker(A) will be fixed throughout.

It is easy to see from the optimality conditions that the components of the affine scaling direction (∆x^a, ∆y^a, ∆s^a) are the optimal solutions of the following minimum-norm problems.

∆x^a = arg min_{∆x∈R^n} {∥δ(x + ∆x)∥² : A∆x = 0}
(∆y^a, ∆s^a) = arg min_{(∆y,∆s)∈R^m×R^n} {∥δ^{−1}(s + ∆s)∥² : A⊤∆y + ∆s = 0}        (14)

Following [32], for a search direction ∆w = (∆x, ∆y, ∆s), we define the residuals as

Rx := δ(x + ∆x)/√µ,   Rs := δ^{−1}(s + ∆s)/√µ.        (15)
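A minimal numerical sketch of this minimum-norm viewpoint (our illustration, not the paper’s implementation): the rescaled primal direction is minus the projection of (xs)^{1/2} onto the rescaled kernel, and ∆s, ∆y then follow from (10) and (9).

```python
import numpy as np

def affine_scaling_direction(A, x, s):
    delta = np.sqrt(s / x)                         # eq. (11)
    AD = A * (1.0 / delta)                         # A Diag(delta)^{-1}
    v = np.sqrt(x * s)                             # (xs)^{1/2} = delta * x
    # split v into its row-space and kernel components w.r.t. AD;
    # the kernel component carries delta * dx
    pv = AD.T @ np.linalg.solve(AD @ AD.T, AD @ v)
    dx = -(v - pv) / delta                         # dx in Ker(A)
    ds = -(x * s + s * dx) / x                     # from s dx + x ds = -xs
    dy = np.linalg.lstsq(A.T, -ds, rcond=None)[0]  # A^T dy + ds = 0
    return dx, dy, ds

# residuals (15): Rx = delta*(x+dx)/sqrt(mu), Rs = (s+ds)/(delta*sqrt(mu))
```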
Hence, the primal affine scaling direction ∆x^a is the one that minimizes the ℓ₂-norm of the primal residual Rx, and the dual affine scaling direction (∆y^a, ∆s^a) minimizes the ℓ₂-norm of the dual residual Rs. The next lemma summarizes simple properties of the residuals, see [32].

Lemma 3.5. For β ∈ (0, 1/4] such that w = (x, y, s) ∈ N(β) and the affine scaling direction ∆w = (∆x^a, ∆y^a, ∆s^a), we have
(i) Rx^a Rs^a = ∆x^a∆s^a/µ,   Rx^a + Rs^a = x^{1/2}s^{1/2}/√µ,        (16)
(ii) ∥Rx^a∥² + ∥Rs^a∥² = n,
(iii) ∥Rx^a∥, ∥Rs^a∥ ≤ √n, and for each i ∈ [n], we have max{|Rx^a_i|, |Rs^a_i|} ≥ (1 − β)/2,
(iv) Rx^a = −(1/√µ) δ^{−1}∆s^a,   Rs^a = −(1/√µ) δ∆x^a.

For a subset I ⊂ [n], we define

ε^a_I(w) := max_{i∈I} min{|Rx^a_i|, |Rs^a_i|}, and ε^a(w) := ε^a_{[n]}(w).        (17)

The next claim shows that for the affine scaling direction, a small ε^a(w) yields a long step; see [32, Lemma 2.5].

Lemma 3.6. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/4]. Then for the affine scaling step, we have

µ(w + α^a∆w^a)/µ(w) ≤ min { 1 − β/√n, √n ε^a(w)/β }.

3.3 Layered Least Squares Direction
Let J = (J_1, J_2, . . . , J_p) be an ordered partition of [n]. For k ∈ [p], we use the notations J_{<k} := J_1 ∪ . . . ∪ J_{k−1} and J_{>k} := J_{k+1} ∪ . . . ∪ J_p, and similarly J_{≤k} and J_{≥k}. We will also refer to the sets J_k as layers, and to J as a layering. Layers with lower indices will be referred to as ‘higher’ layers.

Given w = (x, y, s) ∈ P^{++} × D^{++} and the layering J, the layered-least-squares (LLS) direction is defined as follows. For the primal direction, we proceed backwards, with k = p, p − 1, . . . , 1. Assume the components on the lower layers ∆x^{ll}_{J_{>k}} have already been determined. We define the components in J_k as the coordinate projection ∆x^{ll}_{J_k} = π_{J_k}(X_k), where the affine subspace X_k is defined as the set of minimizers

X_k := arg min_{∆x∈R^n} {∥δ_{J_k}(x_{J_k} + ∆x_{J_k})∥² : A∆x = 0, ∆x_{J_{>k}} = ∆x^{ll}_{J_{>k}}}.        (18)

The dual direction ∆s^{ll} is determined in the forward order of the layers k = 1, 2, . . . , p. Assume we already fixed the components ∆s^{ll}_{J_{<k}} on the higher layers. Then, ∆s^{ll}_{J_k} = π_{J_k}(S_k) for the set of minimizers

S_k := arg min_{(∆y,∆s)} {∥δ^{−1}_{J_k}(s_{J_k} + ∆s_{J_k})∥² : A⊤∆y + ∆s = 0, ∆s_{J_{<k}} = ∆s^{ll}_{J_{<k}}}.        (19)

The component ∆y^{ll} is obtained as the optimal ∆y for the final layer k = p. We use the notation Rx^{ll} and ε^{ll}(w) analogously to the affine scaling direction. This search direction was first introduced in [49].

The affine scaling direction is a special case for the single element partition. In this case, the definitions (18) and (19) coincide with those in (14).

3.3.1 A Linear System Viewpoint. We now present an equivalent definition of the LLS step, generalizing the linear system (9)-(10). We use the subspace notation. With this notation, (9)-(10) for the affine scaling direction can be written as

s∆x^a + x∆s^a = −xs,   ∆x^a ∈ W, and ∆s^a ∈ W⊥.        (20)

Recall that (20) is equivalent to δ∆x^a + δ^{−1}∆s^a = −x^{1/2}s^{1/2}.

Given the layering J and w = (x, y, s), for each k ∈ [p] we define the subspaces

W_{J,k} := {x_{J_k} : x ∈ W, x_{J_{>k}} = 0}
W⊥_{J,k} := {x_{J_k} : x ∈ W⊥, x_{J_{<k}} = 0}.

It is easy to see that these two subspaces are orthogonal complements. Analogously to (20), the primal LLS step ∆x^{ll} is obtained as the unique solution to the linear system

δ∆x^{ll} + δ^{−1}∆s = −x^{1/2}s^{1/2},   ∆x^{ll} ∈ W, and ∆s ∈ W⊥_{J,1} ⊕ · · · ⊕ W⊥_{J,p},        (21)

and the dual LLS step ∆s^{ll} is the unique solution to

δ∆x + δ^{−1}∆s^{ll} = −x^{1/2}s^{1/2},   ∆x ∈ W_{J,1} ⊕ · · · ⊕ W_{J,p}, and ∆s^{ll} ∈ W⊥.        (22)

It is important to note that ∆s in (21) may be different from ∆s^{ll}, and ∆x in (22) may be different from ∆x^{ll}. In fact, ∆s^{ll} = ∆s and ∆x^{ll} = ∆x can only be the case for the affine scaling step.

The following lemma proves that the above linear systems are indeed uniquely solved by the LLS step.

Lemma 3.7. For t ∈ R^n, W ⊆ R^n, δ ∈ R^n_{++}, and J = (J_1, J_2, . . . , J_p), let w = LLS^{J,δ}_W(t) be defined by

δw + δ^{−1}v = δt,   w ∈ W,   v ∈ W⊥_{J,1} ⊕ · · · ⊕ W⊥_{J,p}.

Then LLS^{J,δ}_W(t) is well-defined and

∥δ_{J_k}(t_{J_k} − w_{J_k})∥ = min {∥δ_{J_k}(t_{J_k} − z_{J_k})∥ : z ∈ W, z_{J_{>k}} = w_{J_{>k}}}

for every k ∈ [p].

In the notation of the above lemma we have, for ordered partitions J = (J_1, J_2, . . . , J_p), J̄ = (J_p, J_{p−1}, . . . , J_1), and (x, y, s) ∈ P^{++} × D^{++} with δ = s^{1/2}x^{−1/2}, that ∆x^{ll} = LLS^{J,δ}_W(−x) and ∆s^{ll} = LLS^{J̄,δ^{−1}}_{W⊥}(−s).
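The backward sweep of (18) can be sketched as follows—our illustration via a kernel basis, not the paper’s implementation; the dual sweep (19) is analogous with the layer order reversed.

```python
import numpy as np
from scipy.linalg import null_space

def primal_lls_direction(A, x, s, layers):
    # layers: list of index lists, highest layer first; B spans W = Ker(A)
    n = len(x)
    delta = np.sqrt(s / x)
    B = null_space(A)                       # orthonormal columns
    k_dim = B.shape[1]
    dx = np.zeros(n)
    fixed = []                              # indices already determined
    for Jk in reversed(layers):
        Jk = list(Jk)
        if fixed:
            Bf = B[fixed, :]
            u0 = np.linalg.pinv(Bf) @ dx[fixed]   # B_fixed u0 = dx_fixed
            N = null_space(Bf)                    # remaining freedom
        else:
            u0, N = np.zeros(k_dim), np.eye(k_dim)
        # minimize ||delta_Jk (x_Jk + dx_Jk)|| over dx = B(u0 + N v);
        # the minimizing dx_Jk is unique even if the minimizer set is not
        M = delta[Jk][:, None] * B[Jk, :]
        rhs = -delta[Jk] * x[Jk] - M @ u0
        u = u0
        if N.size:
            v = np.linalg.lstsq(M @ N, rhs, rcond=None)[0]
            u = u0 + N @ v
        dx[Jk] = B[Jk, :] @ u
        fixed += Jk
    return dx
```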
Proposition 3.8. For a linear subspace W ⊆ R^n and index set I ⊆ [n] with J = [n] \ I,

∥L^W_I∥ ≤ max{1, ∥L^{W⊥}_J∥}.

(iv) Let ε^{ll}(w) = max_{i∈[n]} min{|Rx^{ll}_i|, |Rs^{ll}_i|}, and define the step length as

α := sup{α′ ∈ [0, 1] : ∀ᾱ ∈ [0, α′] : w + ᾱ∆w^{ll} ∈ N(2β)}.

We obtain the following bounds on the progress in the LLS step:

µ(w + α∆w^{ll}) = (1 − α)µ, and α ≥ 1 − 3√n ε^{ll}(w)/β.

(v) We have ε^{ll}(w) = 0 if and only if α = 1. These are further equivalent to w + ∆w^{ll} = (x + ∆x^{ll}, y + ∆y^{ll}, s + ∆s^{ll}) being an optimal solution to (LP).

The following lemma shows that within each layer, the κ^δ_ij values are within a bounded range. This will play an important role in our potential analysis.

Lemma 3.14. Let 0 < σ < 1 and t > 0, and i, j ∈ [n], i ≠ j. If the graph G_{δ,σ} contains a directed path of at most t − 1 edges from j to i, then

κ^δ_ij < (χ̄*/σ)^t and κ^δ_ji > (σ/χ̄*)^t.
Description of the Layering Subroutine. Consider an iterate w = (x, y, s) ∈ N(β) of the algorithm with δ = δ(w). The subroutine Layering(δ, κ̂), described in Algorithm 1, constructs a δ-balanced layering. We recall that the approximated auxiliary graph Ĝ_{δ,γ/n} with respect to κ̂ is as in Definition 3.11.

Algorithm 1: Layering(δ, κ̂)
Input: δ ∈ R^n_{++} and κ̂ ∈ R^E_{++}.
Output: δ-balanced layering J = (J_1, . . . , J_p) and updated values κ̂ ∈ R^E_{++}.
1 Compute the strongly connected components C_1, C_2, . . . , C_ℓ of Ĝ_{δ,γ/n}, listed in the ordering imposed by Ĝ_{δ,γ/n};
2 Ē ← Ê_{δ,γ/n};
3 for k = 2, . . . , ℓ do
4   Call Verify-Lift(W Diag(δ), C_{≥k}, γ) that answers ‘pass’ or ‘fail’;
5   if the answer is ‘fail’ then
6     Let i ∈ C_{≥k}, j ∈ C_{<k}, and t be the output of Verify-Lift such that γ/n ≤ t ≤ κ^δ_ij;
7     κ̂_ij ← tδ_i/δ_j;
8     Ē ← Ē ∪ {(i, j)};
9 Compute the strongly connected components J_1, J_2, . . . , J_p of ([n], Ē), listed in the ordering imposed by Ĝ_{δ,γ/n};
10 return J = (J_1, J_2, . . . , J_p), κ̂.

We now give an overview of the subroutine Layering(δ, κ̂). We start by computing the strongly connected components (SCCs) of the directed graph Ĝ_{δ,γ/n}. The edges of this graph are obtained using the current estimates κ̂^δ_ij. According to Lemma 3.12, we have (i, j) ∈ Ê_{δ,γ/n} or (j, i) ∈ Ê_{δ,γ/n} for every i, j ∈ [n], i ≠ j. Hence, there is a linear ordering of the components C_1, C_2, . . . , C_ℓ such that (u, v) ∈ Ê_{δ,γ/n} whenever u ∈ C_i, v ∈ C_j, and i < j. We call this the ordering imposed by Ĝ_{δ,γ/n}.

Next, for each k = 2, . . . , ℓ, we use the subroutine Verify-Lift(W Diag(δ), C_{≥k}, γ) described after Lemma 2.10. If the subroutine returns ‘pass’, then we conclude ℓ^δ(C_{≥k}) ≤ γ, and proceed to the next layer. If the answer is ‘fail’, then the subroutine returns as certificates i ∈ C_{≥k}, j ∈ C_{<k}, and t such that γ/n ≤ t ≤ κ^δ_ij. In this case, we update κ̂^δ_ij to the higher value t. We add (i, j) to an edge set Ē; this edge set was initialized to contain Ê_{δ,γ/n}. After adding (i, j), all components between those containing i and j will be merged into a single strongly connected component. To see this, recall that if i′ ∈ C_ℓ and j′ ∈ C_{ℓ′} for ℓ < ℓ′, then (i′, j′) ∈ Ê_{δ,γ/n} according to Lemma 3.12.

Finally, we compute the strongly connected components of ([n], Ē). We let J_1, J_2, . . . , J_p denote their unique acyclic order, and return these layers.

Lemma 3.15. The subroutine Layering(δ, κ̂) returns a δ-balanced layering in O(nm² + n²) time.

The difficult part of the proof of the above lemma is showing the running time bound. We note that the weaker bound O(n²m²) can be obtained by a simpler argument.

3.5 The Overall Algorithm

Algorithm 2: LP-Solve(A, b, c, w⁰)
Input: A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, and an initial feasible solution w⁰ = (x⁰, y⁰, s⁰) ∈ N(1/8) to (LP).
Output: Optimal solution w* = (x*, y*, s*) to (LP).
1 Call Find-Circuits(A) to obtain the lower bounds κ̂_ij for each i, j ∈ [n], i ≠ j;
2 k ← 0, α ← 0;
3 repeat
4   /* Predictor step */
5   Compute the affine scaling direction ∆w^a = (∆x^a, ∆y^a, ∆s^a) for w;
6   if ε^a(w) < 10n^{3/2}γ then   // recall ε^a(w) defined in (17)
7     δ ← (s^k)^{1/2}(x^k)^{−1/2};
8     (J, κ̂) ← Layering(δ, κ̂);
9     Compute the Layered Least Squares direction ∆w^{ll} = (∆x^{ll}, ∆y^{ll}, ∆s^{ll}) for the layering J and w;
10    ∆w ← ∆w^{ll};
11  else
12    ∆w ← ∆w^a;
13  α ← sup{α′ ∈ [0, 1] : ∀ᾱ ∈ [0, α′] : w + ᾱ∆w ∈ N(1/4)};
14  w′ ← w^k + α∆w;
15  /* Corrector step */
16  Compute the centrality direction ∆w^c = (∆x^c, ∆y^c, ∆s^c) for w′;
17  w^{k+1} ← w′ + ∆w^c;
18  k ← k + 1;
19 until µ(w^k) = 0;
20 return w^k = (x^k, y^k, s^k).

Algorithm 2 presents the overall algorithm LP-Solve(A, b, c, w⁰). We assume that an initial feasible solution w⁰ = (x⁰, y⁰, s⁰) ∈ N(1/8) is given. We address this in Section 5, by adapting the extended system used in [49]. We note that this subroutine requires an upper bound on χ̄*. Since computing χ̄* is hard, we can implement it by a doubling search on log χ̄*, as explained in Section 5. Other than for initialization, the algorithm does not require an estimate on χ̄*.

The algorithm starts with the subroutine Find-Circuits(A) as in Theorem 2.15. The iterations are similar to the MTY Predictor-Corrector algorithm [29]. The main difference is that certain affine scaling steps are replaced by LLS steps. In every predictor step, we compute the affine scaling direction, and consider the quantity ε^a(w) = max_{i∈[n]} min{|Rx^a_i|, |Rs^a_i|}. If this is above the threshold 10n^{3/2}γ, then we perform the affine scaling step. However, in case ε^a(w) < 10n^{3/2}γ, we use the LLS direction instead. In each such iteration, we call the subroutine Layering(δ, κ̂) (Algorithm 1) to compute the layers, and we compute the LLS step for this layering.

Another important difference is that the algorithm does not require a final rounding step. It terminates with the exact optimal
6   if ε^a(w) < 10n^{3/2}γ then   // Recall ε^a(w) defined in (17)
7       δ ← (s^k)^{1/2}(x^k)^{−1/2};
8       (J, κ̂) ← Layering(δ, κ̂);
9       Compute Layered Least Squares direction ∆w^{ll} = (∆x^{ll}, ∆y^{ll}, ∆s^{ll}) for the layering J and w;
10      ∆w ← ∆w^{ll};
11  else
12      ∆w ← ∆w^a;
13  α ← sup{α′ ∈ [0, 1] : ∀ᾱ ∈ [0, α′] : w + ᾱ∆w ∈ N(1/4)};
14  w′ ← w^k + α∆w;
15  /* Corrector step */
16  Compute centrality direction ∆w^c = (∆x^c, ∆y^c, ∆s^c) for w′;
17  w^{k+1} ← w′ + ∆w^c;
18  k ← k + 1;
19  until µ(w^k) = 0;
20  return w^k = (x^k, y^k, s^k).

Algorithm 2 presents the overall algorithm LP-Solve(A, b, c, w^0). We assume that an initial feasible solution w^0 = (x^0, y^0, s^0) ∈ N(β) is given. We address this in Section 5, by adapting the extended system used in [49]. We note that this subroutine requires an upper bound on χ̄^∗. Since computing χ̄^∗ is hard, we can implement it by a doubling search on log χ̄^∗, as explained in Section 5. Other than for initialization, the algorithm does not require an estimate on χ̄^∗.

The algorithm starts with the subroutine Find-Circuits(A) as in Theorem 2.15. The iterations are similar to the MTY Predictor-Corrector algorithm [29]. The main difference is that certain affine scaling steps are replaced by LLS steps. In every predictor step, we compute the affine scaling direction, and consider the quantity ε^a(w) = max_{i∈[n]} min{|Rx^a_i|, |Rs^a_i|}. If this is above the threshold 10n^{3/2}γ, then we perform the affine scaling step. However, in case ε^a(w) < 10n^{3/2}γ, we use the LLS direction instead. In each such iteration, we call the subroutine Layering(δ, κ̂) (Algorithm 1) to compute the layers, and we compute the LLS step for this layering.

Another important difference is that the algorithm does not require a final rounding step. It terminates with the exact optimal solution w^∗ once a predictor step is able to perform a full step with α = 1.
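The following Python skeleton mirrors this predictor-corrector loop. The direction subroutines (affine scaling, LLS, centrality), the residuals Rx^a, Rs^a, and the step-length rule are supplied as callbacks on an assumed `oracles` object, since their definitions live in earlier sections; `layering` is the sketch shown above. This is a control-flow sketch under those assumptions, not the authors' implementation.

    import numpy as np

    def mu(x, s):
        """Normalized duality gap µ(w) = x^T s / n."""
        return float(x @ s) / len(x)

    def eps_a(Rx_a, Rs_a):
        """ε^a(w) = max_i min{|Rx^a_i|, |Rs^a_i|}, per the quantity in (17)."""
        return float(np.max(np.minimum(np.abs(Rx_a), np.abs(Rs_a))))

    def lp_solve(A, b, c, w0, gamma, oracles):
        """Control-flow sketch of LP-Solve(A, b, c, w0) (Algorithm 2)."""
        n = A.shape[1]
        kappa_hat = oracles.find_circuits(A)      # initial estimates (Theorem 2.15)
        x, y, s = w0
        while mu(x, s) > 0:
            # Predictor step.
            dw_a, Rx_a, Rs_a = oracles.affine_scaling(A, b, c, x, y, s)
            if eps_a(Rx_a, Rs_a) < 10 * n**1.5 * gamma:
                delta = np.sqrt(s / x)            # delta = s^{1/2} x^{-1/2}
                J, kappa_hat = layering(delta, kappa_hat, gamma, oracles.verify_lift)
                dw = oracles.lls(A, x, y, s, J)   # layered least squares direction
            else:
                dw = dw_a
            alpha = oracles.max_step(x, y, s, dw) # largest step staying in N(1/4)
            x, y, s = x + alpha * dw[0], y + alpha * dw[1], s + alpha * dw[2]
            if alpha == 1.0:
                return x, y, s                    # full predictor step: exact optimum
            # Corrector step: recenter toward the central path.
            dx, dy, ds = oracles.centrality(A, x, y, s)
            x, y, s = x + dx, y + dy, s + ds
        return x, y, s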
Theorem 3.16. For given A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, and an initial feasible solution w^0 = (x^0, y^0, s^0) ∈ N(1/8), Algorithm 2 finds an optimal solution to (LP) in O(n^{2.5} log n log(χ̄^∗_A + n)) iterations.

4 THE POTENTIAL FUNCTION AND THE OVERALL ANALYSIS

Let µ > 0 and δ(µ) = s(µ)^{1/2} x(µ)^{−1/2} correspond to the point on the central path. For i, j ∈ [n], i ≠ j, we define

    ϱ^µ(i, j) := log κ^{δ(µ)}_{ij} / log(3n χ̄^∗_A / γ),

and the main potentials in the algorithm as

    Ψ^µ(i, j) := max{1, min{2n, inf_{µ′ ≤ µ} ϱ^{µ′}(i, j)}},
    Ψ(µ) := Σ_{i,j∈[n], i≠j} log₂ Ψ^µ(i, j).

The quantity Ψ^µ(i, j) is motivated by the bounds in Lemma 3.14. The next statement is an immediate consequence of this lemma and (13).

Lemma 4.1. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/4], let µ = µ(w), and δ = δ(w). Let i, j ∈ [n], i ≠ j. If the graph G^{δ,γ/(3n)} contains a path from j to i of at most t − 1 edges, then ϱ^µ(i, j) < t. If there is a path of at most t − 1 edges from i to j, then −t < ϱ^µ(i, j).

If Ψ^µ(i, j) ≥ t, then i and j cannot be together on a layer of size ≤ t, and j cannot be on a layer preceding the layer containing i in any δ(w′)-balanced layering, where w′ = (x′, y′, s′) ∈ N(β) with µ(w′) < µ.

Our potentials Ψ^µ(i, j) can be seen as fine-grained analogues of the crossover events analyzed in [31, 32, 49]. Roughly speaking, a

Lemma 4.2. Let w = (x, y, s) ∈ N(β) for β ∈ (0, 1/8], let J = (J_1, . . . , J_p) be a δ(w)-balanced layering, and let ∆w^{ll} = (∆x^{ll}, ∆y^{ll}, ∆s^{ll}) be the corresponding LLS direction. Then the following statements hold for every q ∈ [p]:
(i) There exists i ∈ J_q such that

    x^∗_i ≥ (2x_i / (3√n)) · (∥Rx^{ll}_{J_q}∥ − 2γn).   (26)

(ii) There exists j ∈ J_q such that

    s^∗_j ≥ (2s_j / (3√n)) · (∥Rs^{ll}_{J_q}∥ − 2γn).   (27)

We emphasize that the lemma only shows the existence of such indices i and j, but does not provide an efficient algorithm for identifying them. It is also useful to note that for any i ∈ [n], max{|Rx^{ll}_i|, |Rs^{ll}_i|} ≥ 1/2 − (4/5)β according to Lemma 3.10(iii). Thus, for each q ∈ [p], we obtain a positive lower bound either in case (i) or (ii).

The next lemma shows how we can argue for an increase in the potential function value for multiple pairs of variables, if we have lower bounds on both x^∗_i and s^∗_j for some i, j ∈ [n], along with a lower bound on ϱ^µ(i, j).

Lemma 4.3. Let w = (x, y, s) ∈ N(2β) for β ∈ (0, 1/8], let µ = µ(w) and δ = δ(w). Let i, j ∈ [n] and 2 ≤ τ ≤ n such that for the optimal solution w^∗ = (x^∗, y^∗, s^∗), we have x^∗_i ≥ βx_i/(2^{10}n^{5.5}) and s^∗_j ≥ βs_j/(2^{10}n^{5.5}), and assume ϱ^µ(i, j) ≥ −τ. Let µ′ be the normalized duality gap after Ω(√n τ log(χ̄^∗ + n)) iterations subsequent to the iterate w. Then Ψ^{µ′}(i, j) ≥ 2τ, and for every ℓ ∈ [n] \ {i, j}, either Ψ^{µ′}(i, ℓ) ≥ 2τ, or Ψ^{µ′}(ℓ, j) ≥ 2τ.

We note that i and j as in the lemma are necessarily different, since i = j would imply 0 = x^∗_i s^∗_i ≥ β²µ/(2^{20}n^{11}).

The overall potential argument in the proof of Theorem 3.16 uses Lemma 4.3 in three cases: ξ^{ll}_J(w) ≥ 4γn (Lemma 4.2 applies);
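As a sanity check on the definitions of ϱ^µ and Ψ(µ) above, the following snippet evaluates them from a matrix of circuit-ratio values. The matrix kappa_delta of κ^{δ(µ)}_{ij} values and the precomputed inner infimum over earlier central-path points are stand-ins for quantities the analysis tracks, so this is illustrative only; in particular the reading of the infimum follows our reconstruction of the garbled display above.

    import numpy as np

    def varrho(kappa_delta, n, chi_star, gamma):
        """ϱ^µ(i, j) = log κ^{δ(µ)}_{ij} / log(3n·χ̄*_A/γ), as an n×n matrix
        (the diagonal is unused; set it to 1 so the log is defined)."""
        return np.log(kappa_delta) / np.log(3 * n * chi_star / gamma)

    def Psi_pair(varrho_inf, n):
        """Ψ^µ(i, j) = max{1, min{2n, inf_{µ' ≤ µ} ϱ^{µ'}(i, j)}}; the infimum
        is assumed to be tracked along the path and passed in."""
        return np.clip(varrho_inf, 1.0, 2.0 * n)

    def Psi_total(psi):
        """Ψ(µ) = Σ_{i≠j} log₂ Ψ^µ(i, j). Since 1 ≤ Ψ^µ(i, j) ≤ 2n, this
        total lies between 0 and n(n−1)·log₂(2n)."""
        n = psi.shape[0]
        off_diag = ~np.eye(n, dtype=bool)
        return float(np.sum(np.log2(psi[off_diag])))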
With these edge weights, it is easy to see that our Layering(δ, κ̂) subroutine finds the exact same components as VY. Moreover, the layers will be the initial strongly connected components C_i of G^{δ,γ/n}: due to the choice of g, this partition is automatically δ-balanced. There is no need to call Verify-Lift.

The essential difference compared to our algorithm is that the values κ̂_{ij} = gγ/n are not lower bounds on κ_{ij} as we require, but upper bounds instead. This is convenient to simplify the construction of the layering. On the negative side, the strongly connected components of Ĝ^{δ,γ/n} may no longer be strongly connected in G^{δ,γ/n}. Hence, we cannot use Lemma 4.1, and consequently, Lemma 4.3 does not hold.

Still, the κ̂_{ij} bounds overestimate κ_{ij} by at most a factor poly(n)χ̄. Therefore, the strongly connected components of Ĝ^{δ,γ/n} are strongly connected in G^{δ,σ} for some σ = 1/(poly(n)χ̄). Hence, the entire argument described in this section is applicable to the VY algorithm, with a different potential function defined with χ̄ instead of χ̄^∗. This is the reason why the iteration bound in Lemma 4.3, and therefore in Theorem 3.16, also changes to a dependence on χ̄.

It is worth noting that due to the overestimation of the κ_{ij} values, the VY algorithm uses a coarser layering than our algorithm. Our algorithm splits up the VY layers into smaller parts so that ℓ^δ(J) remains small, but within each part, the gaps between the variables are bounded as a function of χ̄^∗_A instead of χ̄_A.
5 INITIALIZATION

Our main algorithm (Algorithm 2 in Section 3.5) requires an initial solution w^0 = (x^0, y^0, s^0) ∈ N(β). In this section, we remove this assumption by adapting the initialization method of [49] to our setting.

We use the "big-M method", a standard initialization approach for path-following interior point methods that introduces an auxiliary system whose optimal solutions map back to the optimal solutions of the original system. The primal-dual system we consider is

    min  c⊤x + M e⊤x̲          max  y⊤b + 2M e⊤z
         Ax − Ax̲ = b                A⊤y + z + s = c
         x + x̄ = 2M e               z + s̄ = 0            (Init-LP)
         x, x̄, x̲ ≥ 0                −A⊤y + s̲ = M e
                                     s, s̄, s̲ ≥ 0.

The constraint matrix used in this system is

    Â = ( A   0   −A )
        ( I   I    0 )
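As a concrete illustration, the extended data (Â, b̂, ĉ) can be assembled as below. This is a sketch of the big-M construction only: the column ordering (x, x̄, x̲) is inferred from the displayed constraints, and the function name is ours.

    import numpy as np

    def init_lp_data(A, b, c, M):
        """Assemble the (Init-LP) primal data; columns are ordered (x, x̄, x̲),
        matching Â = [[A, 0, −A], [I, I, 0]]."""
        m, n = A.shape
        A_hat = np.block([
            [A,         np.zeros((m, n)), -A              ],
            [np.eye(n), np.eye(n),        np.zeros((n, n))],
        ])
        b_hat = np.concatenate([b, 2 * M * np.ones(n)])           # Ax − Ax̲ = b, x + x̄ = 2Me
        c_hat = np.concatenate([c, np.zeros(n), M * np.ones(n)])  # min c⊤x + M e⊤x̲
        return A_hat, b_hat, c_hat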
The next lemma asserts that the χ̄ condition number of Â is not much bigger than that of the matrix A of the original system (LP).

Lemma 5.1 ([49, Lemma 23]). χ̄_Â ≤ 3√2(χ̄_A + 1).

We extend this bound to χ̄^∗.

Lemma 5.2. χ̄^∗_Â ≤ 3√2(χ̄^∗_A + 1).

Also, for sufficiently large M, the optimal solutions of the original system are preserved. We let d be the min-norm solution to Ax = b, i.e., d = A⊤(AA⊤)^{−1}b.

Proposition 5.3. Assume both the primal and dual of (LP) are feasible, and M > max{(χ̄_A + 1)∥c∥, χ̄_A∥d∥}. Every optimal solution (x, y, s) to (LP) can be extended to an optimal solution (x, x̲, x̄, y, s, s̲, s̄) to (Init-LP); and conversely, from every optimal solution (x, x̲, x̄, y, z, s, s̲, s̄) to (Init-LP), we obtain an optimal solution (x, y, s) by deleting the auxiliary variables.

The next lemma is from [31, Lemma 4.4]. Recall that w = (x, y, s) ∈ N(β) if ∥xs/µ(w) − e∥ ≤ β.

Lemma 5.4. Let w = (x, y, s) ∈ P^{++} × D^{++}, and let ν > 0. Assume that ∥xs/ν − e∥ ≤ τ. Then (1 − τ/√n)ν ≤ µ(w) ≤ (1 + τ/√n)ν and w ∈ N(τ/(1 − τ)).

The new system has the advantage that we can easily initialize it with a feasible solution in close proximity to the central path:

Proposition 5.5. We can initialize system (Init-LP) close to the central path with initial solution w^0 = (x^0, y^0, s^0) ∈ N(1/8) and parameter µ(w^0) ≈ M² if M > 15 max{(χ̄_A + 1)∥c∥, χ̄_A∥d∥}.

Detecting Infeasibility. For using the extended system (Init-LP), we still need to assume that both the primal and dual programs in (LP) are feasible. For arbitrary instances, we first need to check whether this is the case, or conclude that the primal or the dual (or both) are infeasible.

This can be done by employing a two-phase method. The first phase decides feasibility by running (Init-LP) with data (A, b, 0) and M > χ̄_A∥d∥. The objective value of the optimal primal-dual pair is 0 if and only if (LP) has a feasible solution. If the optimal primal/dual solution (x^∗, x̲^∗, x̄^∗, y^∗, s^∗, s̲^∗, s̄^∗) has positive objective value, we can extract an infeasibility certificate.

Feasibility of the dual of (LP) can be decided by running (Init-LP) on data (A, 0, c) and M > (χ̄_A + 1)∥c∥ with the same argumentation: either the objective of the dual is 0, and therefore the dual optimal solution (y^∗, s^∗, s̲^∗, s̄^∗) corresponds to a feasible dual solution of (LP), or the objective value is negative and we extract a dual infeasibility certificate.
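The two-phase test can be phrased as follows. Here solve_init_lp is an assumed black box returning the optimal objective value of the extended system built from the given data, and certificate extraction is elided; this is a sketch of the logic, not the paper's procedure.

    import numpy as np

    def decide_feasibility(A, b, c, M_primal, M_dual, solve_init_lp):
        """Sketch of the two-phase feasibility check via (Init-LP).
        Phase 1 uses data (A, b, 0) with M > χ̄_A·∥d∥; phase 2 uses
        data (A, 0, c) with M > (χ̄_A + 1)·∥c∥."""
        m, n = A.shape
        primal_val = solve_init_lp(A, b, np.zeros(n), M_primal)
        primal_feasible = (primal_val == 0)   # positive value: infeasibility certificate
        dual_val = solve_init_lp(A, np.zeros(m), c, M_dual)
        dual_feasible = (dual_val == 0)       # negative value: dual infeasibility certificate
        return primal_feasible, dual_feasible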
Finding the Right Value of M. Whereas Algorithm 2 does not require any estimate on χ̄^∗ or χ̄, for the initialization we need to set M ≥ max{(χ̄_A + 1)∥c∥, χ̄_A∥d∥} as in Proposition 5.3.

A straightforward guessing approach (attributed to J. Renegar in [49]) starts with a constant guess, say χ̄_A = 100, constructs the extended system, and runs the algorithm. In case the optimal solution to the extended system does not map to an optimal solution of (LP), we restart with χ̄_A = 100² and try again; we continue squaring the guess until an optimal solution is found.

This would still require a series of log log χ̄_A guesses, and thus result in a dependence on χ̄_A in the running time. However, if we initially rescale our system using the near-optimal rescaling of Theorem 2.4, then we can turn the dependence on χ̄_A into one on χ̄^∗_A. The overall iteration complexity remains O(n^{2.5} log n log(χ̄^∗_A + n)), since the running time for the final guess on χ̄^∗_A dominates the total running time of all previous computations due to the repeated squaring.

Note that this guessing technique handles bad guesses gracefully. For the first phase, if neither a feasible solution to (LP) is returned nor a Farkas certificate can be extracted, we have proof by the above paragraph that the guess was too low. Similarly, in phase two, when feasibility was decided in the affirmative for primal and dual, an optimal solution to (Init-LP) that corresponds to an infeasible solution to (LP) serves as a certificate that another squaring of the guess is necessary.
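The repeated-squaring search then looks like the following sketch. Here run_and_check is an assumed helper that builds (Init-LP) for the current guess, runs Algorithm 2, and reports whether the result maps back to an optimum (or an infeasibility certificate) of (LP); the choice of M follows Propositions 5.3 and 5.5 with the guess in place of χ̄_A.

    import numpy as np

    def min_norm_solution(A, b):
        """d = A⊤(AA⊤)^{−1}b, the minimum-norm solution of Ax = b."""
        return A.T @ np.linalg.solve(A @ A.T, b)

    def solve_by_guessing(A, b, c, run_and_check, guess=100.0):
        """Square the guess for χ̄_A until the extended system yields an
        optimal solution of (LP)."""
        d = min_norm_solution(A, b)
        while True:
            M = 15 * max((guess + 1) * np.linalg.norm(c), guess * np.linalg.norm(d))
            ok, solution = run_and_check(A, b, c, M)  # fails iff the guess was too low
            if ok:
                return solution
            guess = guess ** 2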
REFERENCES
[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc.
[2] Xavier Allamigeon, Pascal Benchimol, Stéphane Gaubert, and Michael Joswig. 2018. Log-barrier interior point methods are not strongly polynomial. SIAM Journal on Applied Algebra and Geometry 2, 1 (2018), 140–178.
[3] Sébastien Bubeck and Ronen Eldan. 2014. The entropic barrier: a simple and optimal universal self-concordant barrier. arXiv preprint arXiv:1412.1587.
[4] Sergei Chubanov. 2014. A polynomial algorithm for linear optimization which is strongly polynomial under certain conditions on optimal solutions. (2014). http://www.optimization-online.org/DB_HTML/2014/12/4710.html.
[5] Michael B. Cohen, Yin Tat Lee, and Zhao Song. 2019. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC). 938–942.
[6] Samuel I. Daitch and Daniel A. Spielman. 2008. Faster approximate lossy generalized flow via interior point algorithms. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC). 451–460.
[7] Jesús A. De Loera, Raymond Hemmecke, and Jon Lee. 2015. On Augmentation Algorithms for Linear and Integer-Linear Programming: From Edmonds–Karp to Bland and Beyond. SIAM Journal on Optimization 25, 4 (2015), 2494–2511.
[8] Jesús A. De Loera, Sean Kafer, and Laura Sanità. 2019. Pivot Rules for Circuit-Augmentation Algorithms in Linear Optimization. arXiv preprint arXiv:1909.12863 (2019).
[9] I. I. Dikin. 1967. Iterative solution of problems of linear and quadratic programming. Doklady Akademii Nauk 174, 4 (1967), 747–748.
[10] András Frank. 2011. Connections in Combinatorial Optimization. Number 38 in Oxford Lecture Series in Mathematics and its Applications. Oxford University Press.
[11] Jean-Louis Goffin. 1980. The relaxation method for solving systems of linear inequalities. Mathematics of Operations Research 5, 3 (1980), 388–414.
[12] Clovis C. Gonzaga. 1992. Path-following methods for linear programming. SIAM Review 34, 2 (1992), 167–224.
[13] Clovis C. Gonzaga and Hugo J. Lara. 1997. A note on properties of condition numbers. Linear Algebra Appl. 261, 1 (1997), 269–273.
[14] Jackie C. K. Ho and Levent Tunçel. 2002. Reconciliation of Various Complexity and Condition Measures for Linear Programming Problems and a Generalization of Tardos' Theorem. In Foundations of Computational Mathematics. World Scientific, 93–147.
[15] Satoshi Kakihara, Atsumi Ohara, and Takashi Tsuchiya. 2013. Information geometry and interior-point algorithms in semidefinite programs and symmetric cone programs. Journal of Optimization Theory and Applications 157 (2013), 749–780.
[16] Satoshi Kakihara, Atsumi Ohara, and Takashi Tsuchiya. 2014. Curvature integrals and iteration complexities in SDP and symmetric cone programs. Computational Optimization and Applications 57 (2014), 623–665.
[17] Narendra Karmarkar. 1984. A new polynomial-time algorithm for linear programming. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC). 302–311.
[18] Leonid G. Khachiyan. 1979. A polynomial algorithm in linear programming. In Doklady Akademii Nauk SSSR, Vol. 244. 1093–1096.
[19] Tomonari Kitahara and Shinji Mizuno. 2013. A bound for the number of different basic solutions generated by the simplex method. Mathematical Programming 137, 1-2 (2013), 579–586.
[20] Tomonari Kitahara and Takashi Tsuchiya. 2013. A simple variant of the Mizuno–Todd–Ye predictor-corrector algorithm and its objective-function-free complexity. SIAM Journal on Optimization 23, 3 (2013), 1890–1903.
[21] Guanghui Lan, Renato D. C. Monteiro, and Takashi Tsuchiya. 2009. A polynomial predictor-corrector trust-region algorithm for linear programming. SIAM Journal on Optimization 19, 4 (2009), 1918–1946.
[22] Yin Tat Lee and Aaron Sidford. 2014. Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 424–433.
[23] Yin Tat Lee and Aaron Sidford. 2015. Efficient inverse maintenance and faster algorithms for linear programming. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS). 230–249.
[24] Yin Tat Lee and Aaron Sidford. 2019. Solving Linear Programs with Õ(√rank) Linear System Solves. arXiv preprint arXiv:1910.08033.
[25] Aleksander Madry. 2013. Navigating central path with electrical flows: From flows to matchings, and back. In Proceedings of the 54th IEEE Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 253–262.
[26] Nimrod Megiddo. 1983. Towards a genuinely polynomial algorithm for linear programming. SIAM J. Comput. 12, 2 (1983), 347–353.
[27] Nimrod Megiddo, Shinji Mizuno, and Takashi Tsuchiya. 1998. A modified layered-step interior-point algorithm for linear programming. Mathematical Programming 82, 3 (1998), 339–355.
[28] Sanjay Mehrotra. 1992. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization 2, 4 (1992), 575–601.
[29] Shinji Mizuno, Michael Todd, and Yinyu Ye. 1993. On Adaptive-Step Primal-Dual Interior-Point Algorithms for Linear Programming. Mathematics of Operations Research 18, 4 (1993), 964–981. https://doi.org/10.1287/moor.18.4.964
[30] Renato D. C. Monteiro and Takashi Tsuchiya. 2008. A strong bound on the integral of the central path curvature and its relationship with the iteration-complexity of primal-dual path-following LP algorithms. Mathematical Programming 115, 1 (2008), 105–149.
[31] Renato D. C. Monteiro and Takashi Tsuchiya. 2003. A Variant of the Vavasis–Ye Layered-Step Interior-Point Algorithm for Linear Programming. SIAM Journal on Optimization 13, 4 (2003), 1054–1079.
[32] Renato D. C. Monteiro and Takashi Tsuchiya. 2005. A New Iteration-Complexity Bound for the MTY Predictor-Corrector Algorithm. SIAM Journal on Optimization 15, 2 (2005), 319–347.
[33] Neil Olver and László A. Végh. 2017. A simpler and faster strongly polynomial algorithm for generalized flow maximization. In Proceedings of the Forty-Ninth Annual ACM Symposium on Theory of Computing (STOC). 100–111.
[34] James Renegar. 1988. A polynomial-time algorithm, based on Newton's method, for linear programming. Mathematical Programming 40, 1-3 (1988), 59–93.
[35] James Renegar. 1994. Is it possible to know a problem instance is ill-posed?: some foundations for a general theory of condition numbers. Journal of Complexity 10, 1 (1994), 1–56.
[36] James Renegar. 1995. Incorporating condition measures into the complexity theory of linear programming. SIAM Journal on Optimization 5, 3 (1995), 506–524.
[37] Alexander Schrijver. 2003. Combinatorial Optimization – Polyhedra and Efficiency. Springer.
[38] György Sonnevend, Josef Stoer, and Gongyun Zhao. 1991. On the complexity of following the central path of linear programs by linear extrapolation II. Mathematical Programming 52, 1-3 (1991), 527–553.
[39] Daniel A. Spielman and Shang-Hua Teng. 2004. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC).
[40] G. W. Stewart. 1989. On scaled projections and pseudoinverses. Linear Algebra Appl. 112 (1989), 189–193. https://doi.org/10.1016/0024-3795(89)90594-6
[41] Éva Tardos. 1985. A strongly polynomial minimum cost circulation algorithm. Combinatorica 5, 3 (1985), 247–255.
[42] Éva Tardos. 1986. A strongly polynomial algorithm to solve combinatorial linear programs. Operations Research (1986), 250–256.
[43] Michael J. Todd. 1990. A Dantzig–Wolfe-Like Variant of Karmarkar's Interior-Point Linear Programming Algorithm. Operations Research 38, 6 (1990), 1006–1018. https://doi.org/10.1287/opre.38.6.1006
[44] Michael J. Todd, Levent Tunçel, and Yinyu Ye. 2001. Characterizations, bounds, and probabilistic analysis of two complexity measures for linear programming problems. Mathematical Programming 90, 1 (2001), 59–69.
[45] Levent Tunçel. 1999. Approximating the complexity measure of Vavasis–Ye algorithm is NP-hard. Mathematical Programming 86, 1 (1999), 219–223.
[46] Pravin M. Vaidya. 1989. Speeding-up linear programming using fast matrix multiplication. In Proceedings of the 30th IEEE Annual Symposium on Foundations of Computer Science (FOCS). 332–337.
[47] Jan van den Brand. 2020. A Deterministic Linear Program Solver in Current Matrix Multiplication Time. In Proceedings of the Symposium on Discrete Algorithms (SODA).
[48] Stephen A. Vavasis. 1994. Stable numerical algorithms for equilibrium systems. SIAM J. Matrix Anal. Appl. 15, 4 (1994), 1108–1131.
[49] Stephen A. Vavasis and Yinyu Ye. 1996. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming 74, 1 (1996), 79–120.
[50] László A. Végh. 2017. A Strongly Polynomial Algorithm for Generalized Flow Maximization. Mathematics of Operations Research 42, 2 (2017), 179–211.
[51] Yinyu Ye. 1997. Interior-Point Algorithms: Theory and Analysis. John Wiley and Sons, New York.
[52] Yinyu Ye. 2005. A new complexity result on solving the Markov decision problem. Mathematics of Operations Research 30, 3 (2005), 733–749.
[53] Yinyu Ye. 2011. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research 36, 4 (2011), 593–603.