
CMM Summer School, 2025. Lecturer: D. Dadush

Primal-Dual Interior Point Methods

In this lecture series, we will analyze the geometry of linear programs (LP) from the perspective of interior point methods (IPM). This will eventually lead to a new combinatorial measure of complexity for linear programs, called straight-line complexity, which has been used to provide the first strongly polynomial algorithm for linear programs having at most two variables per inequality [ADL+22, DKN+24], as well as to provide examples of linear programs for which IPMs require a number of iterations that is exponential in the dimension [ABGJ18, AGV22].
In the present lecture, we will present a primal-dual interior point method (IPM) for solving linear programs; together with the simplex method, IPMs are the principal methods for solving linear programs in practice. The primal-dual pair of linear programs we will consider throughout is:

min ⟨c, x⟩
    Ax = b                      (Primal LP)
    x ≥ 0_n,

max ⟨b, y⟩
    A^⊤ y + s = c               (Dual LP)
    s ≥ 0_n.
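Both programs are already in the equality ("standard") form accepted by off-the-shelf LP solvers. As a quick point of reference, and purely for illustration (this uses SciPy's bundled solver rather than the interior point method developed in these notes; the tiny instance is made up), a small instance of (Primal LP) can be solved as follows:

    import numpy as np
    from scipy.optimize import linprog

    # A tiny instance of (Primal LP): min <c, x>  s.t.  Ax = b, x >= 0.
    A = np.array([[1.0, 1.0, 1.0],
                  [1.0, 2.0, 0.0]])   # rank(A) = m = 2, n = 3
    b = np.array([4.0, 3.0])
    c = np.array([1.0, 0.0, 0.0])

    res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
    print(res.x, res.fun)             # optimal vertex (0, 1.5, 2.5) with value 0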

Notation. The instance data is the constraint matrix A ∈ R^{m×n}, b ∈ R^m and c ∈ R^n, with rk(A) = m ≤ n. We use the notation ⟨x, y⟩ := x^⊤ y = ∑_{i=1}^n x_i y_i to denote the standard inner product on R^n, ∥x∥_2 := √(∑_{i=1}^n x_i²) = √⟨x, x⟩ for the ℓ2 norm, and ∥x∥_∞ := max_{i∈[n]} |x_i| for the ℓ∞ norm. The primal and dual feasible regions are denoted P := {x ∈ R^n | Ax = b, x ≥ 0_n} and D := {(s, y) ∈ R^{n+m} | A^⊤ y + s = c, s ≥ 0_n}, with the strictly feasible solutions denoted by P++ := {x ∈ P | x > 0_n} and D++ := {(s, y) ∈ D | s > 0_n}.

A Very Brief History of LP Algorithms. The algorithmic theory of LP solving began with Dantzig's development of the simplex algorithm in 1947 [Dan90]. The first polynomial time algorithm (which solves LPs exactly) is due to Khachiyan [Kha79], who relied on the ellipsoid method of Yudin and Nemirovski [Yud76] (see also Shor [Sho77]) to achieve a running time that scales as poly(n, log⟨A, b, c⟩), where ⟨A, b, c⟩ denotes the bit-complexity of the instance. While polynomial, the ellipsoid method was known for having very poor practical performance. The first polynomial time interior point method for LP was designed by Karmarkar [Kar84], and had the benefit of being effective in practice. Renegar [Ren88] gave an improved IPM which followed a central path and relied on the classical Newton's method. The first primal-dual path following algorithm was developed by Kojima, Mizuno and Yoshise [KMY89], variants of which are implemented in all commercial solvers (see [YTM94, Gon96, Wri97]).
A major open question is whether LPs can be solved in strongly polynomial time. Stated slightly simplistically, the question is whether there is an algorithm which solves (Primal LP) and (Dual LP) using only poly(n, m) basic arithmetic operations on real numbers (i.e., +, ×, /, ≥).¹ All known polynomial time algorithms require a number of operations that depends on ⟨A, b, c⟩, and hence are not strongly polynomial. This question was explicitly asked by Megiddo [Meg83] and highlighted by Fields medalist Steve Smale [Sma98] as one of the mathematical challenges for the 21st century.

¹ There are important considerations relating to the size of the numbers used during the arithmetic computations, which we ignore here.

LP Duality. The fundamental result which underlies the polynomial solvability of linear pro-
gramming is the linear programming duality:

Theorem 1 (Strong Duality) If (Primal LP) and (Dual LP) are both feasible, they have the same value and both admit optimal solutions. Furthermore, for x ∈ P and (s, y) ∈ D, we have that

⟨c, x⟩ − ⟨y, b⟩ = ⟨c, x⟩ − ⟨y, Ax⟩ = ⟨c − A^⊤ y, x⟩ = ⟨s, x⟩ ≥ 0.    (Gap Formula)
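As a quick sanity check of (Gap Formula), the following sketch (numpy assumed; the random instance is ours) constructs a feasible primal point and a feasible dual pair directly and confirms that the duality gap equals ⟨s, x⟩ ≥ 0:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 6

    # Build a feasible primal-dual pair by construction: pick x > 0, y and s >= 0
    # first, then define b := Ax and c := A^T y + s so both sides are feasible.
    A = rng.standard_normal((m, n))
    x = rng.uniform(0.5, 2.0, n)        # primal feasible (and strictly positive)
    y = rng.standard_normal(m)          # arbitrary dual multipliers
    s = rng.uniform(0.0, 1.0, n)        # nonnegative dual slacks
    b, c = A @ x, A.T @ y + s

    gap = c @ x - b @ y                 # duality gap <c,x> - <b,y>
    print(gap, s @ x)                   # both print the same nonnegative number
    assert np.isclose(gap, s @ x) and gap >= 0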

We now recall the complementary slackness condition for optimality. Assuming both programs are feasible, we may let x^∗, s^∗, y^∗ be optimal primal-dual solutions. Since the primal and dual have the same value, we see that

0 = ⟨c, x^∗⟩ − ⟨y^∗, b⟩ = ⟨x^∗, s^∗⟩.

Since x^∗, s^∗ are non-negative vectors, x^∗ and s^∗ must have disjoint supports. That is, x_i^∗ s_i^∗ = 0 for all i ∈ [n]. As we show next, interior point methods will solve the LP by relaxing the constraint x_i s_i = 0 to x_i s_i = µ > 0, for all i ∈ [n], and slowly driving µ → 0.

1 Primal-Dual Central Path


We will solve the primal-dual system above by following a primal-dual central path that converges towards primal and dual optimal solutions. The method will require that the primal and dual be strictly feasible, and that we have knowledge of a good starting point. A way to initialize the method without this requirement will be discussed in Section 4.
We define the primal-dual central path

CP := {z(µ) := ( x (µ), s(µ), y(µ)) | µ > 0},

as the optimal solutions to the following sequence of parametric optimization problems:

x(µ) := argmin { ⟨c, x⟩ − µ ∑_{i=1}^n ln(x_i) | Ax = b, x > 0_n },                    (Primal Path Program)

(s(µ), y(µ)) := argmax { ⟨y, b⟩ + µ ∑_{i=1}^n ln(s_i) | A^⊤ y + s = c, s > 0_n }.      (Dual Path Program)

The logarithmic term in the objectives is called the logarithmic barrier, which blows up on the boundary of the feasible region. The barrier encourages the path to stay as far away from the boundary as possible. The optimal solutions x(µ) and (s(µ), y(µ)) are in fact unique, which follows from the strict convexity of the primal objective and the strict concavity of the dual objective.
While the primal and dual central path programs are seemingly independent, their optimal solutions are intimately linked via the central path equations, which we explain below. For this purpose, we will need the Hadamard product of two vectors x, y ∈ R^n:

x ◦ y := (x_1 y_1, x_2 y_2, . . . , x_n y_n).    (1)

For simplicity of notation, we simply write xy for x ◦ y. We generalize this notation to expressions of the form x^α / y^β := (x_1^α / y_1^β, . . . , x_n^α / y_n^β) for α, β ∈ R. We also use the notation X^α, where X := diag(x_1, . . . , x_n) is the diagonal matrix having x on its diagonal.

The relation between the paths follows from the fact that, up to an additive constant, the respective programs are Lagrangian duals of each other. Starting with the primal program, we may Lagrangify out the equality constraints using multipliers y ∈ R^m as follows:

inf_{Ax=b, x>0_n} ⟨c, x⟩ − µ ∑_{i=1}^n ln(x_i) ≥ inf_{x>0_n} ⟨c, x⟩ + ⟨y, b − Ax⟩ − µ ∑_{i=1}^n ln(x_i)
                                               = inf_{x>0_n} ⟨y, b⟩ + ⟨c − A^⊤ y, x⟩ − µ ∑_{i=1}^n ln(x_i).    (2)

Letting s := c − A^⊤ y, it is easy to verify that the relaxed program has value > −∞ (a non-trivial lower bound) if and only if s > 0_n, i.e., if and only if (s, y) ∈ D++. Assuming s > 0_n, the unique choice of x which minimizes ⟨s, x⟩ − µ ∑_{i=1}^n ln(x_i) is precisely x = µ s^{−1} > 0_n, as the objective function is convex and this choice sets the gradient to 0_n (recall that (ln t)′ = 1/t). For (s, y) ∈ D++, the value of the relaxed program becomes

inf_{x>0_n} ⟨y, b⟩ + ⟨s, x⟩ − µ ∑_{i=1}^n ln(x_i) = ⟨y, b⟩ + ⟨s, µ s^{−1}⟩ − µ ∑_{i=1}^n ln(µ/s_i)
                                                  = ⟨y, b⟩ + µ ∑_{i=1}^n ln(s_i) + µ(1 − ln µ) n.    (3)

The problem of maximizing the value of this lower bound over (s, y) ∈ D++ is, up to the additive constant µ(1 − ln µ)n, precisely the problem (Dual Path Program) (one may also verify that dualizing (Dual Path Program) yields a program equivalent to (Primal Path Program)).
As with strong LP duality, assuming that P++ and D++ are non-empty, strong Lagrangian duality² implies that both programs have optimal solutions and that the value of (Primal Path Program) equals the value of (Dual Path Program) plus µ(1 − ln µ)n. Letting x(µ) and (s(µ), y(µ)) denote the respective optimal solutions, all the inequalities derived in (2) and (3) must be tight when y = y(µ). One can verify that the inequalities can only be tight when x(µ) = µ s(µ)^{−1} ⇔ x(µ) s(µ) = µ1_n. This yields the following characterization of the central path.
Lemma 2 Assuming the primal and dual are strictly feasible, the central path is well-defined for any µ > 0. Furthermore, the tuple (x(µ), s(µ), y(µ)) is uniquely characterized by the following equations:

Ax(µ) = b,  x(µ) > 0_n                          (Strict Primal Feasibility)
A^⊤ y(µ) + s(µ) = c,  s(µ) > 0_n                (Strict Dual Feasibility)
x(µ) s(µ) = µ1_n,                               (Centrality Equation)

where 1_n denotes the vector of all ones.

From the above equations, the optimality gap between the primal solution x (µ) and the dual
solution (s(µ), y(µ)) is precisely

⟨c, x (µ)⟩ − ⟨y(µ), b⟩ = ⟨ x (µ), s(µ)⟩ = nµ.


² We will not require in-depth knowledge of Lagrangian duality. We only use it here to help explain the central path equations.
Thus, as µ → 0, the optimality gap also shrinks to 0. One can in fact show that ( x (µ), s(µ), y(µ))
indeed converges as µ → 0 to optimal solutions ( x ∗ , s∗ , y∗ ), where these solutions are the “most
interior” among optimal solutions. In particular, the limit ( x ∗ , s∗ , y∗ ) will be strictly complemen-
tary: for each i ∈ [n], we will have either xi∗ > 0 or si∗ > 0.
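To make the characterization concrete, here is a small numeric sketch (numpy and SciPy assumed; the instance, starting point, and tolerances are our own choices) that solves (Primal Path Program) for a single value of µ with a generic constrained optimizer and then checks the centrality equation and the gap nµ, up to solver accuracy:

    import numpy as np
    from scipy.optimize import minimize, LinearConstraint, Bounds

    rng = np.random.default_rng(1)
    m, n, mu = 2, 5, 0.1

    # Synthetic instance with both sides strictly feasible:
    # b := A u with u > 0, and c := A^T v + w with w > 0.
    A = rng.standard_normal((m, n))
    b = A @ rng.uniform(1.0, 2.0, n)
    c = A.T @ rng.standard_normal(m) + rng.uniform(1.0, 2.0, n)

    # Solve (Primal Path Program):  min <c,x> - mu * sum(log x_i)  s.t.  Ax = b, x > 0.
    res = minimize(lambda x: c @ x - mu * np.sum(np.log(x)),
                   np.ones(n),
                   jac=lambda x: c - mu / x,
                   method="trust-constr",
                   constraints=[LinearConstraint(A, b, b)],
                   bounds=Bounds(1e-9, np.inf))
    x = res.x

    # At the optimum c - mu/x lies in im(A^T), so s(mu) = mu/x and A^T y(mu) = c - s(mu).
    s = mu / x
    y, *_ = np.linalg.lstsq(A.T, c - s, rcond=None)
    print(np.max(np.abs(x * s - mu)))   # centrality: x(mu) s(mu) ≈ mu 1_n
    print(c @ x - b @ y, n * mu)        # optimality gap ≈ n * mu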

2 Predictor-Corrector Algorithm for Primal-Dual Path Following


We now explain a method that produces feasible iterates {(x^k, s^k, y^k)}_{k≥0} that closely track the central path, and whose gap ⟨x^k, s^k⟩ decreases geometrically. Specifically, we will show that every O(√n) iterations the gap decreases by a factor of 2 (in fact, this is a rather pessimistic estimate).
To measure our distance to the central path, we will utilize a multiplicative notion of distance from centrality.

Definition 3 (Normalized Gap, Centrality Distance and ℓ2 Neighborhood) For z := (x, s, y) ∈ P × D primal-dual feasible, the normalized gap is

µ(z) := ⟨x, s⟩ / n.

When µ(z) > 0, we define the distance to centrality of z by

dist_c(z) := ∥ xs/µ(z) − 1_n ∥_2.

If z = (x, s, y) is on the primal-dual central path, then µ := x_1 s_1 = · · · = x_n s_n, and hence µ = ∑_{i=1}^n x_i s_i / n = µ(z) and dist_c(z) = 0.
The ℓ2 neighborhood N2(β) of the central path of width β ∈ (0, 1) is then defined by

N2(β) := { z ∈ P++ × D++ : dist_c(z) ≤ β }.

We further define N̄2(β) to be the closure of the ℓ2 neighborhood. The closure will contain the limit optimal solution (x^∗, s^∗, y^∗) := lim_{µ→0} (x(µ), s(µ), y(µ)) at the end of the path.

The width parameter β is a tunable parameter that controls how close we wish to stay to the path. The method we present will slow down predictably as β gets smaller. The constant β can in fact be fixed to 1/2 in the present analysis; however, a variant of the algorithm we will analyze later will need β to be a bit smaller for provable correctness (e.g., β = 1/100).
The algorithm we present below, due to Mizuno, Todd and Ye [MTY93], alternates between two types of steps. The first type are corrector steps, which bring us closer to the central path by decreasing our centrality distance. The second type are predictor steps, which try to move us down the path by decreasing the duality gap as fast as possible while keeping the centrality distance below the β-threshold.

Corrector Step. Given z = (x, s, y) ∈ N2(β), we would like to compute an update ∆z^c := (∆x^c, ∆s^c, ∆y^c) such that (x + ∆x^c)(s + ∆s^c) moves closer to µ(z)1_n. Specifically, we will aim for z + ∆z^c ∈ N2(β′) for some β′ ≤ β/2. We will compute the update by using a simple linear approximation of the corresponding quadratic equation:

µ(z)1_n = (x + ∆x)(s + ∆s) = xs + s∆x + x∆s + ∆x∆s ≈ xs + s∆x + x∆s,

where we optimistically assume that the quadratic term ∆x∆s is “small”. Heuristically, if we are β-close to the path, we will hope that the corresponding quadratic term will have size ≈ β² ≪ β, assuming β ≪ 1.
Using the linear approximation, the corrector direction at z = (x, s, y) is defined by the following linear system of equations:

s∆x^c + x∆s^c = µ(z)1_n − xs,
A∆x^c = 0_m,                                    (Corrector Direction)
A^⊤∆y^c + ∆s^c = 0_n.

We then take a corrector step by updating z to z + ∆z^c.

Predictor Step. Given z = (x, s, y) ∈ N2(β′), for β′ ≤ β/2, we would like to substantially decrease µ(z) while staying inside the N2(β) neighborhood. Our goal will be to compute a predictor direction ∆z^p := (∆x^p, ∆s^p, ∆y^p) and a largest possible step-size α ∈ [0, 1] such that (x + α∆x^p)(s + α∆s^p) ≈ (1 − α)xs and z + α∆z^p ∈ N̄2(β). This will decrease µ(z) by a factor of (1 − α).
The predictor direction is computed using a linear approximation of the equation

0_n = (x + ∆x^p)(s + ∆s^p) = xs + s∆x^p + x∆s^p + ∆x^p∆s^p ≈ xs + s∆x^p + x∆s^p,

where we again hope that the contribution of the quadratic term ∆x^p∆s^p is small. The above equation is motivated by the fact that any optimal primal-dual solution (x^∗, s^∗, y^∗) satisfies x^∗ s^∗ = 0_n. The predictor direction at z = (x, s, y) is then defined as follows:

s∆x^p + x∆s^p = −xs,
A∆x^p = 0_m,                                    (Predictor Direction)
A^⊤∆y^p + ∆s^p = 0_n.

We then take a predictor step by moving to z + α∆z^p, where α ∈ [0, 1] is chosen as large as possible to ensure the resulting iterate lies in N̄2(β) (the closure ensures we can reach the end of the path).

Remark 1 The linear systems defining the predictor and corrector directions differ only in the µ1_n term in the first equation. Importantly, the linear systems will always admit a solution. In the literature, the predictor direction defined above is known as the affine scaling direction.
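Putting the two directions together, the following is a minimal numpy sketch of the predictor-corrector loop that Algorithm 1 below makes precise. The instance construction, the backtracking search used to approximate the largest admissible step size α^p, and the tolerances are our own illustrative choices, not part of the method as analyzed.

    import numpy as np

    def step_direction(A, x, s, t):
        # Solve  s*dx + x*ds = t,  A dx = 0,  A^T dy + ds = 0  (the step equations).
        # Eliminating ds = -A^T dy and dx = (t - x*ds)/s leads to the m x m
        # "normal equations"  (A diag(x/s) A^T) dy = -A (t/s).
        d = x / s
        dy = np.linalg.solve((A * d) @ A.T, -A @ (t / s))
        ds = -A.T @ dy
        dx = (t - x * ds) / s
        return dx, ds, dy

    def predictor_corrector(A, x, s, y, beta=0.5, eps=1e-8):
        # Sketch of the loop in Algorithm 1; assumes (x, s, y) starts in N_2(beta).
        n = len(x)
        dist = lambda x, s: np.linalg.norm(x * s / (x @ s / n) - np.ones(n))
        while x @ s > eps:
            mu = x @ s / n
            # Corrector step: target mu*1_n - xs, recentering at the same gap.
            dx, ds, dy = step_direction(A, x, s, mu - x * s)
            x, s, y = x + dx, s + ds, y + dy
            # Predictor step: target -xs. Backtrack on alpha until the new point
            # stays in N_2(beta); Lemma 5 gives an explicit admissible alpha instead.
            dx, ds, dy = step_direction(A, x, s, -x * s)
            alpha = 1.0
            while alpha > 1e-12:
                xa, sa = x + alpha * dx, s + alpha * ds
                if np.all(xa > 0) and np.all(sa > 0) and dist(xa, sa) <= beta:
                    break
                alpha *= 0.9
            x, s, y = x + alpha * dx, s + alpha * ds, y + alpha * dy
        return x, s, y

    # Synthetic instance whose all-ones point is exactly central (x = s = 1_n, so
    # xs = 1_n and mu = 1): b := A 1_n and c := A^T y0 + 1_n.
    rng = np.random.default_rng(2)
    m, n = 3, 8
    A = rng.standard_normal((m, n))
    y0 = rng.standard_normal(m)
    b, c = A @ np.ones(n), A.T @ y0 + np.ones(n)

    x, s, y = predictor_corrector(A, np.ones(n), np.ones(n), y0)
    print(x @ s, c @ x - b @ y)   # final duality gap, at most eps up to roundoff

On this synthetic instance the all-ones point is exactly central at µ = 1, so no initialization phase is needed; Section 4 explains how to obtain such a starting point in general.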

Predictor-Corrector Algorithm. The algorithm is given below. Importantly, the algorithm assumes as input an initial iterate z^0 := (x^0, s^0, y^0) in the β-neighborhood N2(β). Note that the initial iterate determines the right-hand side b = Ax^0 and the objective c = A^⊤ y^0 + s^0.

Algorithm 1: Predictor-Corrector IPM
  Input : Constraint matrix A ∈ R^{m×n} with rank(A) = m, initial iterate
          z^0 := (x^0, s^0, y^0) ∈ N2(β), β ∈ (0, 1/2], error parameter ε ≥ 0.
  Output: z = (x, s, y) ∈ N̄2(β) with ⟨x, s⟩ ≤ ε.
  1  z = (x, s, y) ← (x^0, s^0, y^0);
  2  while ⟨x, s⟩ > ε do
  3      Compute ∆z^c at z according to (Corrector Direction);
  4      z ← z + ∆z^c;
  5      Compute ∆z^p at z according to (Predictor Direction);
         // See Lemma 5, part (3), for the precise choice of α^p
  6      Choose α^p ∈ [0, 1] as large as possible so that z + α^p ∆z^p ∈ N̄2(β);
  7      z ← z + α^p ∆z^p;
  8  return z;

2.1 Projection Characterization of Step Equations

In this subsection, we show how to interpret the step equations in the form of orthogonal projections:

s∆x + x∆s = t                                   (Target Eq)
A∆x = 0_m                                       (Kernel Eq)
A^⊤∆y + ∆s = 0_n                                (Image Eq)
for x, s > 0_n and t ∈ R^n. The equations (Kernel Eq) and (Image Eq) are subspace equations that can be concisely expressed as ∆x ∈ ker(A) = {w ∈ R^n | Aw = 0_m}, the kernel of A, and ∆s ∈ im(A^⊤) := {A^⊤ z | z ∈ R^m}, the image of A^⊤. Under the assumption that ∆s ∈ im(A^⊤), note that ∆y is uniquely determined by the equation (Image Eq), since the rows of A are linearly independent.
For a linear subspace W ⊆ R^n, the orthogonal complement of W is defined by W^⊥ := {z ∈ R^n : ⟨z, w⟩ = 0, ∀w ∈ W}. With this definition, we have ker(A) = im(A^⊤)^⊥, since

Ax = 0_m ⇔ ⟨y, Ax⟩ = 0, ∀y ∈ R^m ⇔ ⟨A^⊤ y, x⟩ = 0, ∀y ∈ R^m ⇔ ⟨w, x⟩ = 0, ∀w ∈ im(A^⊤).    (4)

Let us multiply (Target Eq) by the diagonal matrix (XS)^{−1/2}, recalling that X := diag(x) and S := diag(s), which yields

√(s/x) ∆x + √(x/s) ∆s = t/√(xs).    (5)

Letting D := X^{1/2} S^{−1/2}, we have √(x/s) ∆s = D∆s ∈ D im(A^⊤) = im((AD)^⊤). Similarly, √(s/x) ∆x = D^{−1} ∆x ∈ D^{−1} ker(A) = ker(AD).
Letting W := im((AD)^⊤), by (4) we know that W^⊥ = ker(AD). We may thus interpret (5) as an orthogonal decomposition of t/√(xs) into its component √(x/s) ∆s on W and its component √(s/x) ∆x on W^⊥. We conclude that the system of equations indexed by (Target Eq) always admits a solution. More precisely, letting Π_W, Π_{W^⊥} ∈ R^{n×n} denote the orthogonal projections onto W and W^⊥ respectively, we get that

( √(s/x) ∆x, √(x/s) ∆s ) = ( Π_{W^⊥}(t/√(xs)), Π_W(t/√(xs)) ),

which follows since z = Π_W(z) + Π_{W^⊥}(z) for any z ∈ R^n.³

³ One may in fact define Π_W as the unique linear map satisfying Π_W(z) ∈ W and z − Π_W(z) ∈ W^⊥, ∀z ∈ R^n. By symmetry, note then I_n − Π_W = Π_{W^⊥}, where I_n is the n × n identity.
A useful property of orthogonal decompositions is the Pythagorean identity:

∥z∥_2² = ∥Π_W z + Π_{W^⊥} z∥_2² = ⟨Π_W z, Π_W z⟩ + 2⟨Π_W z, Π_{W^⊥} z⟩ + ⟨Π_{W^⊥} z, Π_{W^⊥} z⟩    (6)
       = ∥Π_W z∥_2² + ∥Π_{W^⊥} z∥_2²,    (7)

where the cross term vanishes since ⟨Π_W z, Π_{W^⊥} z⟩ = 0.

Specializing to (5), we derive the corresponding identity, which we will use repeatedly to analyze the IPM:

∥ √(s/x) ∆x ∥_2² + ∥ √(x/s) ∆s ∥_2² = ∥ t/√(xs) ∥_2².    (8)
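The projection viewpoint is easy to verify numerically. The sketch below (numpy assumed; the instance is arbitrary) forms W = im((AD)^⊤), computes the two components of t/√(xs) by least squares, recovers ∆x and ∆s from them, and confirms (Target Eq), (Kernel Eq), and the identity (8):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 3, 7
    A = rng.standard_normal((m, n))
    x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
    t = rng.standard_normal(n)

    d = np.sqrt(x / s)                  # the diagonal of D = X^{1/2} S^{-1/2}
    B = A * d                           # AD, so W = im((AD)^T) and W_perp = ker(AD)
    v = t / np.sqrt(x * s)              # the right-hand side t / sqrt(xs)

    # Orthogonal projection of v onto W = im(B^T) via least squares; the residual
    # is automatically the component in W_perp = ker(B).
    z, *_ = np.linalg.lstsq(B.T, v, rcond=None)
    proj_W = B.T @ z                    # Pi_W(v)        = sqrt(x/s) * ds
    proj_Wp = v - proj_W                # Pi_{W_perp}(v) = sqrt(s/x) * dx

    ds, dx = proj_W / d, proj_Wp * d    # undo the scaling to recover ds and dx
    print(np.allclose(s * dx + x * ds, t), np.allclose(A @ dx, 0.0))
    print(np.isclose(np.linalg.norm(proj_W) ** 2 + np.linalg.norm(proj_Wp) ** 2,
                     np.linalg.norm(v) ** 2))          # the identity (8)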

3 Analysis of Predictor-Corrector Algorithm


We analyze the properties of corrector and predictor steps separately. In the two lemmas below, recall the definition of the normalized gap µ(z) := ⟨x, s⟩/n for an iterate z := (x, s, y) ∈ P × D.

Lemma 4 (Corrector Analysis) Let z ∈ N2(β) for β ∈ (0, 1/2]. The corrector direction ∆z^c at z satisfies:

1. µ(z + ∆z^c) = µ(z).

2. z + ∆z^c ∈ N2( β²/(2(1 − β)) ) ⊆ N2(β/2).

Remark 2 For β ∈ (0, 1/2], β²/(2(1 − β)) ≤ β² ≤ β/2.

Lemma 5 (Predictor Analysis) Let z ∈ N2(β/2) for β ∈ (0, 1/2], let ∆z^p = (∆x^p, ∆s^p, ∆y^p) be the predictor direction at z, and let q_z := ∥ ∆x^p∆s^p/µ(z) ∥_2 be the normalized quadratic error. Then the following holds:

1. µ(z + α∆z^p) = (1 − α)µ(z), for α ∈ R.

2. q_z ≤ n/2.

3. Define h(α) := (1 − α)/α² for α ∈ (0, 1]. Then α^p := h^{−1}(2q_z/β) ≥ (1/2)√(β/n) satisfies z + α∆z^p ∈ N2(β) for all α ∈ [0, α^p). In particular, z + α^p ∆z^p ∈ N̄2(β).

Combining the above two lemmas, we directly obtain the following estimate on the worst-case convergence rate of Predictor-Corrector IPM.

Theorem 6 After T ≥ 0 iterations of the while loop, Predictor-Corrector IPM produces an iterate z = (x, s, y) ∈ N̄2(β), β ∈ (0, 1/2], satisfying ⟨x, s⟩ ≤ ⟨x^0, s^0⟩ e^{−(T/2)√(β/n)}. Furthermore, if ⟨x^0, s^0⟩ ≥ ε > 0, the algorithm terminates after at most 2√(n/β) ln(⟨x^0, s^0⟩/ε) + 1 iterations.

Proof: We prove the first claim by induction on T ≥ 0. Note that the claim is trivially true for T = 0 (i.e., before we enter the while loop), since by assumption z^0 ∈ N2(β) ⊆ N̄2(β). Assuming that the induction hypothesis holds after iteration T, we prove it for iteration T + 1. If we terminate before iteration T + 1, there is nothing to prove and the statement holds. So assume we enter the while loop at iteration T + 1. Then at the beginning of iteration T + 1, by our induction hypothesis, we have that

z = (x, s, y) ∈ N̄2(β), and nµ(z) = ⟨x, s⟩ ≤ ⟨x^0, s^0⟩ e^{−(T/2)√(β/n)}.

Given that we entered the while loop, we must have that nµ(z) = ⟨x, s⟩ > ε ≥ 0. In this case, z ∈ N̄2(β) implies that z ∈ N2(β), since N̄2(β) \ N2(β) only contains points with µ(z) = 0.
Starting from z, we now take one corrector and one predictor step. For the corrector step, we update z to z^c := z + ∆z^c. By Lemma 4, we have that µ(z) = µ(z^c) and that z^c ∈ N2(β/2), since β ∈ (0, 1/2]. For the predictor step, we update z^c to z^p := z^c + α^p ∆z^p, where α^p is chosen according to Lemma 5, part (3), which ensures z^p ∈ N̄2(β) and

µ(z^p) = (1 − α^p)µ(z^c) = (1 − α^p)µ(z)                                  (Lemma 5, part (1))
       ≤ (1 − (1/2)√(β/n)) µ(z)                                           (Lemma 5, part (3))
       ≤ e^{−(1/2)√(β/n)} µ(z) ≤ e^{−((T+1)/2)√(β/n)} µ(z^0).             (induction hypothesis)

Recalling that z^p = (x^p, s^p, y^p) is the iterate produced after T + 1 iterations, that nµ(z^p) = ⟨x^p, s^p⟩ and nµ(z^0) = ⟨x^0, s^0⟩, we conclude that the induction hypothesis continues to hold after T + 1 iterations.
For the furthermore part, let T be the number of iterations performed by Predictor-Corrector IPM with target error ε, where ⟨x^0, s^0⟩ ≥ ε > 0. Since ⟨x^0, s^0⟩/ε ≥ 1, note that 2√(n/β) ln(⟨x^0, s^0⟩/ε) + 1 ≥ 1. Therefore, we may assume that T ≥ 1, since otherwise the statement holds trivially. Then, by definition of T, the duality gap after iteration T − 1 ≥ 0 is greater than ε. By the invariant proved above, we conclude that ⟨x^0, s^0⟩ e^{−((T−1)/2)√(β/n)} > ε. In particular, T < 2√(n/β) ln(⟨x^0, s^0⟩/ε) + 1, as needed. □

3.1 Helper Propositions


We begin with the following simple propositions.
Proposition 7 Let z = (x, s, y) ∈ P++ × D++. If dist_c(z) ≤ β, then (1 − β)1_n ≤ xs/µ(z) ≤ (1 + β)1_n.

Proof: By definition, dist_c(z) ≤ β if and only if we can write xs/µ(z) = 1_n + ξ, where ∥ξ∥_2 ≤ β. Therefore, ∥ξ∥_∞ := max_{i∈[n]} |ξ_i| ≤ ∥ξ∥_2 ≤ β. In particular, (1 − β)1_n ≤ 1_n + ξ ≤ (1 + β)1_n, as needed. □

Proposition 8 Let u, v ∈ R^n. Then ∥uv∥_2 ≤ (1/2)(∥u∥_2² + ∥v∥_2²).

Proof: Recall that for a, b ≥ 0, we have a + b ≥ 2√(ab), since a + b − 2√(ab) = (√a − √b)² ≥ 0. Using this inequality with a = ∥u∥_2² and b = ∥v∥_2², we derive the desired inequality as follows:

(1/2)(∥u∥_2² + ∥v∥_2²) ≥ ∥u∥_2 ∥v∥_2 = √(∑_{i∈[n]} u_i²) √(∑_{j∈[n]} v_j²) = √(∑_{i,j∈[n]} u_i² v_j²) ≥ √(∑_{i∈[n]} u_i² v_i²) = ∥uv∥_2.  □

Proposition 9 Define h(α) := (1 − α)/α² for α ∈ (0, 1]. Then for ν ≥ 0,

h^{−1}(ν) ≥  1 − ν        for 0 ≤ ν ≤ 1/2,
             1/2          for 1/2 ≤ ν ≤ 1,
             1/(2√ν)      for ν ≥ 1.

Proof: Since h(α) = (1 − α)/α² is monotone decreasing in α ∈ (0, 1], we have that h^{−1}(ν) ≥ α ⇔ h(α) ≥ ν. Using this, we verify the desired lower bounds on h^{−1}. For 0 ≤ ν ≤ 1/2, we have h(1 − ν) = (1 − (1 − ν))/(1 − ν)² = ν/(1 − ν)² ≥ ν. For 1/2 ≤ ν ≤ 1, we have h(1/2) = (1 − 1/2)/(1/2)² = 2 ≥ ν. For ν ≥ 1, we have h(1/(2√ν)) = (1 − 1/(2√ν)) · 4ν ≥ (1 − 1/2) · 4ν ≥ ν. □

3.2 Proof of Lemma 4


Let ∆z^c = (∆x^c, ∆s^c, ∆y^c) be the corrector direction at z ∈ N2(β), which satisfies s∆x^c + x∆s^c = µ1_n − xs where µ := µ(z), ∆x^c ∈ ker(A), and ∆s^c ∈ im(A^⊤). In particular, ⟨∆x^c, ∆s^c⟩ = 0.

Proof of Part (1).

nµ(z + ∆z^c) = ⟨x + ∆x^c, s + ∆s^c⟩ = ⟨x, s⟩ + ⟨x, ∆s^c⟩ + ⟨∆x^c, s⟩ + ⟨∆x^c, ∆s^c⟩
             = ⟨xs + x∆s^c + s∆x^c, 1_n⟩ = ⟨µ1_n, 1_n⟩ = nµ(z),

where we used that ⟨∆x^c, ∆s^c⟩ = 0 and that xs + x∆s^c + s∆x^c = µ1_n by (Corrector Direction).

Proof of Part (2). We start with an exact expression for the centrality distance of z + ∆z^c:

dist_c(z + ∆z^c) = ∥ (x + ∆x^c)(s + ∆s^c)/µ(z + ∆z^c) − 1_n ∥_2
                 = ∥ (xs + s∆x^c + x∆s^c + ∆x^c∆s^c)/µ − 1_n ∥_2
                 = ∥ (µ1_n + ∆x^c∆s^c)/µ − 1_n ∥_2 = ∥ ∆x^c∆s^c/µ ∥_2.

It suffices to show that ∥ ∆x^c∆s^c/µ ∥_2 ≤ β²/(2(1 − β)). This is derived as follows:

∥ ∆x^c∆s^c/µ ∥_2 = ∥ (√(s/(xµ)) ∆x^c)(√(x/(sµ)) ∆s^c) ∥_2
  ≤ (1/2)( ∥ √(s/(xµ)) ∆x^c ∥_2² + ∥ √(x/(sµ)) ∆s^c ∥_2² )           (Proposition 8)
  = (1/2) ∥ (µ1_n − xs)/√(µxs) ∥_2²                                   (Equation (8))
  ≤ (1/2) ∥ (µ1_n − xs)/µ ∥_2² · ∥ µ/(xs) ∥_∞
  ≤ (β²/2) max_{i∈[n]} µ/(x_i s_i)                                    (z ∈ N2(β))
  ≤ β²/(2(1 − β)).                                                    (Proposition 7)

3.3 Proof of Lemma 5


Let ∆z^p = (∆x^p, ∆s^p, ∆y^p) be the predictor direction at z ∈ N2(β/2), which satisfies s∆x^p + x∆s^p = −xs where µ := µ(z), ∆x^p ∈ ker(A), and ∆s^p ∈ im(A^⊤). In particular, ⟨∆x^p, ∆s^p⟩ = 0.

Proof of Part (1).

nµ(z + α∆z^p) = ⟨x + α∆x^p, s + α∆s^p⟩ = ⟨x, s⟩ + α⟨x, ∆s^p⟩ + α⟨∆x^p, s⟩ + α²⟨∆x^p, ∆s^p⟩
              = ⟨xs + αx∆s^p + αs∆x^p, 1_n⟩ = (1 − α)⟨xs, 1_n⟩ = (1 − α)nµ(z),

where we used that ⟨∆x^p, ∆s^p⟩ = 0 and that xs + αx∆s^p + αs∆x^p = (1 − α)xs by (Predictor Direction).

Proof of Part (2). Recall that q_z := ∥ ∆x^p∆s^p/µ ∥_2. The bound on q_z is derived as follows:

∥ ∆x^p∆s^p/µ ∥_2 = ∥ (√(s/(xµ)) ∆x^p)(√(x/(sµ)) ∆s^p) ∥_2
  ≤ (1/2)( ∥ √(s/(xµ)) ∆x^p ∥_2² + ∥ √(x/(sµ)) ∆s^p ∥_2² )           (Proposition 8)
  = (1/2) ∥ −xs/√(µxs) ∥_2² = (1/2) ∑_{i=1}^n x_i s_i/µ = n/2.        (Equation (8))

Proof of Part (3). Let α ∈ [0, 1). We bound the centrality distance as follows:

dist_c(z + α∆z^p) = ∥ (x + α∆x^p)(s + α∆s^p)/µ(z + α∆z^p) − 1_n ∥_2
                  = ∥ (xs + αs∆x^p + αx∆s^p + α²∆x^p∆s^p)/((1 − α)µ) − 1_n ∥_2
                  = ∥ ((1 − α)xs + α²∆x^p∆s^p)/((1 − α)µ) − 1_n ∥_2
                  ≤ ∥ xs/µ − 1_n ∥_2 + (α²/(1 − α)) ∥ ∆x^p∆s^p/µ ∥_2
                  ≤ β/2 + (α²/(1 − α)) q_z.                            (z ∈ N2(β/2))

For α^p := h^{−1}(2q_z/β), we must show that z + α∆z^p ∈ N2(β) for all α ∈ [0, α^p). By the above, it suffices to show that (α²/(1 − α)) q_z ≤ β/2 for all α ∈ [0, α^p), equivalently that 2q_z/β ≤ (1 − α)/α² for all α ∈ [0, α^p). This follows directly from the definition of h^{−1} using the fact that h is monotone decreasing. From here, we have

α^p = h^{−1}(2q_z/β) ≥ h^{−1}(n/β) ≥ (1/2)√(β/n),                      (by part (2))

where the last inequality follows from Proposition 9 and n/β ≥ 1.

4 Initialization
In this section, we present the (magical) homogeneous self-dual initialization due to Ye, Todd
and Mizuno [YTM94], which works in an extended space.

Self-Dual Optimality System. We start with the following homogeneous feasibility system (SDOS):

Ax = τb
A^⊤ y + s = τc                                  (SDOS)
⟨c, x⟩ − ⟨b, y⟩ + κ = 0
x ≥ 0_n, s ≥ 0_n, τ ≥ 0, κ ≥ 0, y ∈ R^m.

Note that any solution above with τ > 0 satisfies (x/τ, s/τ, y/τ) ∈ P × D. As in the computation of (Gap Formula), the equality constraints of (SDOS) imply

−κτ = (⟨c, x⟩ − ⟨b, y⟩)τ = ⟨τc, x⟩ − ⟨τb, y⟩ = ⟨τc, x⟩ − ⟨Ax, y⟩ = ⟨τc − A^⊤ y, x⟩ = ⟨s, x⟩.

In particular, for any feasible solution, we have

⟨s, x⟩ + κτ = 0,    (9)

where both terms on the left-hand side are non-negative.

If τ > 0, then by the above we must have κ = 0. Therefore, if τ > 0, then (x/τ, s/τ, y/τ) forms an optimal primal-dual pair of solutions. Similarly, if κ > 0, then τ = 0, and then ⟨c, x⟩ − ⟨b, y⟩ + κ = 0 implies that either ⟨c, x⟩ < 0 or ⟨b, y⟩ > 0. In the former case, we get ⟨c, x⟩ < 0, Ax = 0_m (since τ = 0) and x ≥ 0_n, and thus x certifies that the primal is unbounded from below (i.e., the dual is infeasible). Similarly, in the latter case, we get ⟨y, b⟩ > 0, A^⊤ y + s = 0_n (since τ = 0) and s ≥ 0_n, which certifies that the dual is unbounded from above (i.e., the primal is infeasible). Linear programming duality indeed guarantees that there always exists a solution to (SDOS) with either τ or κ positive.

Self-Dual Initialization System. From the above discussion, finding a solution either with τ >
0 or κ > 0 immediately solves the primal-dual linear program (it provides optimal solutions or
an infeasibility certificate for one side). As of yet, however, it is not clear how one could initialize a path-following scheme to find such a solution.
The insight of Ye, Todd and Mizuno [YTM94] is that one can add an additional variable
and constraint which allows one to violate the non-negativity constraints, and where one can
explicitly initialize the central path for minimizing infeasibility. The corresponding self-dual
initialization system (SDIS) is given by

min (n + 1)θ
A(x − θ1_n) = (τ − θ)b
A^⊤ y + (s − θ1_n) = (τ − θ)c                   (SDIS)
⟨c, x − θ1_n⟩ − ⟨b, y⟩ + (κ − θ) = 0
⟨1_n, x⟩ + ⟨1_n, s⟩ + τ + κ − (n + 1)θ = n + 1
x ≥ 0_n, s ≥ 0_n, τ ≥ 0, κ ≥ 0, y ∈ R^m, θ ∈ R.

Note that if (x, s, τ, κ, y, θ) is feasible for (SDIS), then (x − θ1_n, s − θ1_n, τ − θ, κ − θ, y) satisfies the equality constraints of (SDOS), with the non-negativity constraints relaxed from ≥ 0 to ≥ −θ. Using this interpretation combined with (9), we see that θ ≥ 0 for any feasible solution to (SDIS). In particular, the optimal value of the above program is 0, and any feasible solution with θ = 0 is a feasible solution to (SDOS) with ⟨1_n, x⟩ + ⟨1_n, s⟩ + τ + κ = n + 1 (in particular, the solution is non-zero).
The above program is self-dual in the sense that the dual program is equivalent to the primal problem. The dual variables are also (x, s, τ, κ, y, θ), subject to the same constraints, and the dual objective is max −(n + 1)θ. Most important in this duality is which variables are dual to each other, since this is what we need in order to define the primal-dual gap. In particular, the x and s variables are dual to each other, and the variables κ and τ are dual to each other.
This self-duality yields the following symmetric central path equations:
( x (µ), s(µ), τ (µ), κ (µ), y(µ), θ (µ)) is the primal central path point at parameter µ > 0 if it is
feasible for (SDIS) and
x (µ)s(µ) = µ1n and τ (µ)κ (µ) = µ.
The dual central path is the same except that we swap x (µ) with s(µ) as well as τ (µ) with κ (µ).
With these equations, it is easy to see that the primal central path point at parameter µ = 1 is
given by
( x (1), s(1), τ (1), κ (1), y(1), θ (1)) = (1n , 1n , 1, 1, 0m , 1).
With this explicit initialization, we can now directly apply Predictor-Corrector IPM start-
ing from µ = 1. As the IPM ensures that the limit optimal solutions are strictly complementary,
we will have that either τ(µ) or κ(µ) will converge to something non-zero as µ → 0. We note that the duality gap in this initialization roughly corresponds to “approximate” feasibility and optimality in the original system (assuming the starting primal and dual LPs are feasible).
As a last remark, we note that a naive implementation of the predictor-corrector IPM would explicitly keep track of both primal and dual iterates, which corresponds to twice the 2(n + 1) + m + 1 variables of (SDIS). Due to the symmetry, one can design an optimized implementation that directly keeps track of only one side.
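As a final concreteness check (numpy assumed; the instance is arbitrary), the sketch below assembles the equality constraints of (SDIS) for a given (A, b, c) and verifies that the all-ones point (x, s, τ, κ, y, θ) = (1_n, 1_n, 1, 1, 0_m, 1) is feasible and satisfies the centrality equations at µ = 1:

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 3, 6
    A = rng.standard_normal((m, n))
    b, c = rng.standard_normal(m), rng.standard_normal(n)

    # The explicit central path point of (SDIS) at mu = 1.
    x, s = np.ones(n), np.ones(n)
    tau, kappa, y, theta = 1.0, 1.0, np.zeros(m), 1.0
    ones = np.ones(n)

    # The four equality constraints of (SDIS), written as residuals.
    r1 = A @ (x - theta * ones) - (tau - theta) * b
    r2 = A.T @ y + (s - theta * ones) - (tau - theta) * c
    r3 = c @ (x - theta * ones) - b @ y + (kappa - theta)
    r4 = ones @ x + ones @ s + tau + kappa - (n + 1) * theta - (n + 1)
    print(np.allclose(r1, 0), np.allclose(r2, 0), np.isclose(r3, 0), np.isclose(r4, 0))

    # Centrality at mu = 1:  x s = 1_n and tau * kappa = 1.
    print(np.allclose(x * s, 1.0), np.isclose(tau * kappa, 1.0))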

References
[ABGJ18] Xavier Allamigeon, Pascal Benchimol, Stéphane Gaubert, and Michael Joswig. Log-
barrier interior point methods are not strongly polynomial. SIAM Journal on Applied
Algebra and Geometry, 2(1):140–178, 2018.

[ADL+ 22] Xavier Allamigeon, Daniel Dadush, Georg Loho, Bento Natura, and László A Végh.
Interior point methods are not worse than simplex. In 2022 IEEE 63rd Annual Sympo-
sium on Foundations of Computer Science (FOCS), pages 267–277. IEEE, 2022.

[AGV22] Xavier Allamigeon, Stéphane Gaubert, and Nicolas Vandame. No self-concordant


barrier interior point method is strongly polynomial. In Proceedings of the 54th Annual
ACM Symposium on Theory of Computing (STOC), pages 515–528, 2022.

[Dan90] George B. Dantzig. Origins of the simplex method, page 141–151. Association for Com-
puting Machinery, New York, NY, USA, 1990.

[DKN+ 24] Daniel Dadush, Zhuan Khye Koh, Bento Natura, Neil Olver, and László A Végh. A
strongly polynomial algorithm for linear programs with at most two nonzero entries
per row or column. In Proceedings of the 56th Annual ACM Symposium on Theory of
Computing, pages 1561–1572, 2024.

[Gon96] Jacek Gondzio. Multiple centrality corrections in a primal-dual method for linear
programming. Computational optimization and applications, 6(2):137–156, 1996.

[Kar84] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In


Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC), pages
302–311, 1984.

[Kha79] Leonid G Khachiyan. A polynomial algorithm in linear programming. In Doklady


Academii Nauk SSSR, volume 244, pages 1093–1096, 1979.

[KMY89] Masakazu Kojima, Shinji Mizuno, and Akiko Yoshise. A Primal-Dual Interior Point
Algorithm for Linear Programming, pages 29–47. Springer New York, New York, NY,
1989.

[Meg83] Nimrod Megiddo. Towards a genuinely polynomial algorithm for linear program-
ming. SIAM Journal on Computing, 12(2):347–353, 1983.

[MTY93] Shinji Mizuno, Michael Todd, and Yinyu Ye. On adaptive-step primal-dual interior-
point algorithms for linear programming. Mathematics of Operations Research, 18:964–
981, 11 1993.

[Ren88] James Renegar. A polynomial-time algorithm, based on Newton’s method, for linear
programming. Mathematical Programming, 40(1-3):59–93, 1988.

[Sho77] N.Z. Shor. Cut-off method with space extension in convex programming problems.
Kibernetika, 13(1):94–95, 1977. Translated in Cybernetics 13(1), 94-96.

[Sma98] Steve Smale. Mathematical problems for the next century. The Mathematical Intelli-
gencer, 20:7–15, 1998.

[Wri97] Stephen J Wright. Primal-dual interior-point methods. SIAM, 1997.



[YTM94] Yinyu Ye, Michael J. Todd, and Shinji Mizuno. An O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Mathematics of Operations Research, 19(1):53–67, 1994.

[Yud76] D. B. Yudin and A. S. Nemirovski. Informational complexity and effective methods of solution for convex extremal problems. Ekonomika i Matematicheskie Metody, 12:357–359, 1976. Translated in Matekon: Translations of Russian and East European Math. Economics 13, 25–45, Spring '77.
