
Noname manuscript No.
(will be inserted by the editor)

Quadratically regularized optimal transport

Dirk A. Lorenz · Paul Manns · Christian Meyer

the date of receipt and acceptance should be inserted later

arXiv:1903.01112v4 [math.OC] 9 Sep 2019

Dirk A. Lorenz, E-mail: [email protected], TU Braunschweig, Institute of Analysis and Algebra, Germany
Paul Manns, E-mail: [email protected], TU Braunschweig, Institute of Mathematical Optimization, Germany
Christian Meyer, E-mail: [email protected], TU Dortmund, Fakultät für Mathematik, Germany

Abstract We investigate the problem of optimal transport in the so-called Kantorovich form, i.e. given two Radon measures on two compact sets, we seek an optimal transport plan, which is another Radon measure on the product of the sets that has these two measures as marginals and minimizes a certain cost function. We consider quadratic regularization of the problem, which forces the optimal transport plan to be a square integrable function rather than a Radon measure. We derive the dual problem and show strong duality and existence of primal and dual solutions to the regularized problem. Then we derive two algorithms to solve the dual problem of the regularized problem, a Gauss-Seidel method and a semismooth quasi-Newton method, and investigate both methods numerically. Our experiments show that the methods perform well even for small regularization parameters. Quadratic regularization is of interest since the resulting optimal transport plans are sparse, i.e. they have small support (which is not the case for the often used entropic regularization, where the support of the optimal transport plan always has full measure).

Keywords optimal transport, regularization, semismooth Newton method, Gauss-Seidel method, duality

Mathematics Subject Classification (2010) 49Q20, 65D99, 90C25

1 Introduction

In this paper we will investigate a regularized version of the optimal transport problem. Optimal transport dates back to the work of Monge in 1781 but the

problem formulation we use here is the one of Kantorovich [15]. Let us fix some notation and formulate the problem: Let Ω1 ⊂ R^d1, Ω2 ⊂ R^d2 be two compact domains, denote Ω = Ω1 × Ω2, and assume we are given two positive regular Radon measures µ1 and µ2 on Ω1 and Ω2, respectively. Further we assume that a cost function c : Ω1 × Ω2 → R is given that models the cost of transporting a unit of mass from x1 ∈ Ω1 to x2 ∈ Ω2. The optimal transport problem asks to find a transport plan π, which is a Radon measure on Ω, that has minimal overall transport cost ∫_Ω c(x1, x2) dπ(x1, x2) among all measures π which have µ1 and µ2 as first and second marginals, respectively, i.e. for all Borel sets A ⊂ Ω1 it holds that π(A × Ω2) = µ1(A) and for all Borel sets B ⊂ Ω2 it holds that π(Ω1 × B) = µ2(B).
This problem has been studied extensively and we refer to the books [18, 19, 23, 24, 21]. One particular result is that an optimal plan π∗ exists and that the support of optimal plans is contained in the so-called c-superdifferential of a c-concave function [1, Theorem 1.13]. For many cost functions c, this means that optimal transport plans are supported on small sets and that they are in fact singular with respect to the Lebesgue measure on Ω. This makes the numerical treatment of optimal transport problems difficult, and one can employ regularization to obtain approximately optimal plans π that are functions on Ω. The regularization method that has received the most attention recently is regularization with the negative entropy of π, and we refer to [16, 10, 4]. Entropic regularization has become popular in machine learning applications due to the fact that it allows for the very simple Sinkhorn algorithm (in the discrete case), see [9, 13] and also [17] for a recent and thorough review of the computational aspects of optimal transport.
Regularizations different from entropic regularization have been much less studied. We are only aware of works in the discrete case, e.g. [3, 11]. In this work we will investigate the case where we regularize the problem in L²(Ω). The paper is organized as follows: In Section 2 we state the problem and analyze existence and duality. It will turn out that existence of solutions of the dual problem is quite tricky to show, but we will show that dual solutions exist in respective L² spaces and that a straightforward optimality system characterizes primal-dual optimality. In Section 3 we derive two different algorithms for the discrete version of the quadratically regularized optimal transport problem, and in Section 4 we comment on a simple discretization scheme and report numerical examples.

Notation. We will abbreviate x+ = max(x, 0) (and will apply this also to functions and to measures, where + will mean the positive part from the Hahn-Jordan decomposition). By C(Ω) we denote the space of continuous functions on Ω (and we will always work on compact sets) equipped with the supremum norm ‖·‖∞, and by M(Ω) we denote the space of Radon measures on a compact domain, equipped with the norm ‖µ‖_M = sup{∫ f dµ | f ∈ C(Ω), |f| ≤ 1}. The Lebesgue measure will be λ (and we also use λ1 and λ2 to specify the Lebesgue measure on the sets Ω1 and Ω2, respectively). For convenience, we use |Ω| for the Lebesgue measure of the set Ω. Furthermore, for a Radon measure w ∈ M, we denote the absolutely continuous and singular parts arising from the Lebesgue decomposition with respect to the Lebesgue measure by w_ac and w_s, i.e. they satisfy w_ac ≪ λ and w_s ⊥ λ. Duality pairings are denoted by ⟨·, ·⟩. If both arguments of the duality pairing are positive and the duality pairing does not necessarily exist, e.g. for ψ ∈ M(Ω) and x ∈ L²(Ω), we set ⟨ψ, x⟩ := +∞.

2 Quadratic regularization in the continuous case

For the quadratically regularized optimal transport problem we seek a transport plan π ∈ L²(Ω1 × Ω2) which for a given cost function c ∈ L²(Ω1 × Ω2), a regularization parameter γ > 0, and given functions µi ∈ L²(Ωi), i = 1, 2, solves

    min_π  ⟨c, π⟩_{L²} + (γ/2)‖π‖²_{L²}
    subject to  ∫_{Ω2} π(x1, x2) dλ2 = µ1(x1),
                ∫_{Ω1} π(x1, x2) dλ1 = µ2(x2),
                π(x1, x2) ≥ 0,                                        (1)

where the constraints are understood pointwise almost everywhere.

2.1 Solutions of the primal problem

It is straightforward to show that optimal transport plans exist:

Lemma 2.1 Problem (1) has an optimal solution if and only if µ1 ∈ L²(Ω1), µ2 ∈ L²(Ω2), µ1, µ2 ≥ 0 almost everywhere, and ∫_{Ω1} µ1(x1) dλ1 = ∫_{Ω2} µ2(x2) dλ2.

Proof Assume that there is an optimal solution π∗ ∈ L²(Ω1 × Ω2). By Jensen's inequality we get

    ∫_{Ω1} µ1(x1)² dλ1 = ∫_{Ω1} ( ∫_{Ω2} π∗(x1, x2) dλ2 )² dλ1
                       ≤ |Ω2| ∫∫_{Ω1×Ω2} π∗(x1, x2)² dλ1 dλ2 < ∞,

which shows µ1 ∈ L²(Ω1). The argument for µ2 is similar. Non-negativity of µ1 and µ2 follows from non-negativity of π∗. Finally, by Fubini's theorem,

    ∫_{Ω1} µ1(x1) dλ1 = ∫∫_{Ω1×Ω2} π∗(x1, x2) dλ1 dλ2 = ∫_{Ω2} µ2(x2) dλ2.

Conversely, if µ1 ∈ L²(Ω1) and µ2 ∈ L²(Ω2) and µ1, µ2 ≥ 0, we set C := ∫_{Ω1} µ1(x1) dλ1 = ∫_{Ω2} µ2(x2) dλ2. Then π(x1, x2) = µ1(x1)µ2(x2)/C is feasible for (1), and since the objective is continuous, coercive, and strongly convex, a (unique) minimizer exists. □

2.2 Dual problem and existence of dual solutions

In the following section, we apply classical Lagrange duality to the linear-quadratic program (1). To this end, let us define the Lagrangian associated with (1). In order to shorten the notation, we set

    µ := γ µ1 ⊗ µ2.

Furthermore, we define

    P1 : L²(Ω) ∋ π ↦ ∫_{Ω2} π dλ2 ∈ L²(Ω1),    P2 : L²(Ω) ∋ π ↦ ∫_{Ω1} π dλ1 ∈ L²(Ω2),    (2)

and denote the primal objective by

    Eγ : L²(Ω) → R,    Eγ(π) := ∫_Ω c π dλ + (γ/2)‖π‖²_{L²(Ω)}.    (3)

Then, the Lagrangian associated with (1) is given by

    L : L²(Ω) × L²(Ω1) × L²(Ω2) × L²(Ω) → R,
    L(π, α1, α2, ρ) := Eγ(π) − ⟨ρ, π⟩_{L²(Ω)} − ⟨α1, P1π − µ1⟩_{L²(Ω1)} − ⟨α2, P2π − µ2⟩_{L²(Ω2)}.

Then, by standard arguments, the primal problem in (1) is equivalent to

    inf_{π ∈ L²(Ω)}  sup_{α1 ∈ L²(Ω1), α2 ∈ L²(Ω2), ρ ∈ L²(Ω), ρ ≥ 0}  L(π, α1, α2, ρ),    (PP)

while its (Lagrangian) dual is given by

    sup_{α1 ∈ L²(Ω1), α2 ∈ L²(Ω2), ρ ∈ L²(Ω), ρ ≥ 0}  inf_{π ∈ L²(Ω)}  L(π, α1, α2, ρ).    (DP)

The main part of the upcoming analysis is devoted to the existence of solutions to (DP). Once this is established, the necessary and sufficient optimality condition associated with (1) in the form of a variational inequality will allow us to derive an optimality system that is also amenable to numerical computations.
To show existence for (DP), we first reformulate the dual problem. Since L is quadratic w.r.t. π, the inner inf-problem is solved by the stationarity condition γπ + c − ρ − α1 ⊕ α2 = 0, i.e. by

    π = (1/γ)(ρ + α1 ⊕ α2 − c),    (4)

where the mapping ⊕ : L²(Ω1) × L²(Ω2) → L²(Ω) is defined via

    (v1 ⊕ v2)(x1, x2) := v1(x1) + v2(x2)    (5)

for almost all (x1, x2) ∈ Ω and all vi ∈ L²(Ωi), i = 1, 2.

Remark 2.2 The map ⊕ is related to the adjoints of the projections P1 and P2 from (2) by α1 ⊕ α2 = P1∗α1 + P2∗α2.

Inserting (4) into (DP) yields

    sup_{α1 ∈ L²(Ω1), α2 ∈ L²(Ω2)} sup_{ρ ≥ 0} { −(1/2γ) ∫_Ω (ρ + α1 ⊕ α2 − c)² dλ + ∫_{Ω1} µ1 α1 dλ1 + ∫_{Ω2} µ2 α2 dλ2 }.    (6)

Again, the inner optimization problem is quadratic w.r.t. ρ, so that its solution is given by

    ρ = (α1 ⊕ α2 − c)−.    (7)

Inserted in (6) and after rescaling with −γ (which turns the sup into a min), this results in the following dual problem:

    min  Φ(α1, α2) := (1/2)‖(α1 ⊕ α2 − c)+‖²_{L²(Ω)} − γ⟨α1, µ1⟩ − γ⟨α2, µ2⟩
    s.t. αi ∈ L²(Ωi), i = 1, 2.    (D)

To prove existence of solutions for this problem, we need to require the following:

Assumption 1 The domains Ω1 and Ω2 are compact. Moreover, the cost function c is in L²(Ω) and fulfills c ≥ c_min > −∞ for some constant c_min ∈ R. Furthermore, the marginals µ1 and µ2 satisfy µi ∈ L²(Ωi) and µi ≥ δ > 0, i = 1, 2. In addition we assume that ∫_{Ω1} µ1 dλ1 = ∫_{Ω2} µ2 dλ2 = 1.

Remark 2.3 The last assumption on the normalization of the marginals is just to ease the subsequent analysis and can be relaxed to ∫_{Ω1} µ1 dλ1 = ∫_{Ω2} µ2 dλ2, which is needed anyway to ensure the existence of a solution to the primal problem, see Lemma 2.1.
Remark 2.4 Note that there is an obvious source of non-uniqueness for the dual problem (D): We can add a constant to α1 and subtract it from α2, and this does not change the dual objective, i.e. for any constant C it holds that Φ(α1 + C, α2 − C) = Φ(α1, α2). This non-uniqueness will not cause trouble in the proofs, and when convenient, we remove it, e.g. by demanding that ∫_{Ω2} α2 dλ2 = 0.

Observe that the objective Φ in (D) is also well defined for functions αi ∈ L¹(Ωi) with (α1 ⊕ α2 − c)+ ∈ L²(Ω). This gives rise to the following auxiliary dual problem:

    min  Φ(α1, α2)
    s.t. αi ∈ L¹(Ωi), i = 1, 2,  (α1 ⊕ α2 − c)+ ∈ L²(Ω).    (D')
Our strategy to prove existence of solutions to (D) is now as follows:
1. First, we show that (D') admits a solution (α1∗, α2∗) ∈ L¹(Ω1) × L¹(Ω2), see Proposition 2.9.
2. Then, we prove that α1∗ and α2∗ possess higher regularity, namely that they are functions in L²(Ωi), i = 1, 2, cf. Theorem 2.10.
3. Thus, (α1∗, α2∗) is feasible for (D) and, since the feasible set of (D') contains the one of (D), while the objective of (D') restricted to L²-functions coincides with the objective in (D), this finally gives that (α1∗, α2∗) is indeed optimal for (D).

The reason to consider (D') is essentially that the objective Φ is not coercive in L²(Ω), but only in L¹(Ω) (at least w.r.t. the negative part of the αi). Therefore, we have to deal with weakly∗ converging sequences in the space of Radon measures within the proof of existence of solutions. For this purpose, we need to extend the objective to a suitable set. To that end, let us define

    G : L²(Ω) ∋ w ↦ ∫_Ω ( (1/2)w+² − wµ ) dλ ∈ R.    (8)

Note that, thanks to ∫_{Ω1} µ1 dλ1 = ∫_{Ω2} µ2 dλ2 = 1, it holds that

    Φ(α1, α2) = G(α1 ⊕ α2 − c) − ∫_Ω c µ dλ    ∀ αi ∈ L²(Ωi), i = 1, 2.    (9)

Of course, G is also well defined as a functional on the feasible set of (D') and we will denote this functional by the same symbol to ease notation. In order to extend G to the space of Radon measures, consider for a given measure w ∈ M(Ω) the Hahn-Jordan decomposition w = w+ − w− and assume that w+ ∈ L²(Ω). Then, we set G(w) = ∫_Ω (1/2)w+² dλ − ∫_Ω µ dw. With a slight abuse of notation, we denote this mapping by G, too. Note that for w+ ∈ L²(Ω), the integral ∫_Ω w+ µ dλ is finite for µ ∈ L²(Ω) as in Assumption 1. Regarding the negative part, we define ∫_Ω µ dw− := ∞ whenever this expression is not properly defined, which is in line with our convention for duality pairings, as w− and µ are both positive. Combining this, we obtain that −∫_Ω µ dw ∈ R ∪ {∞}.
Note in this context that, if the singular part of w (w.r.t. the Lebesgue measure) vanishes, then also w+ ∈ L¹(Ω) and w+(x) = max{0, w(x)} λ-a.e. in Ω, so that both functionals coincide on L²(Ω), which justifies this notation. Furthermore, we also generalize the map ⊕ to the measure space by setting

    α1 ⊕ α2 := α1 ⊗ λ2 + λ1 ⊗ α2,    αi ∈ M(Ωi), i = 1, 2.

Again, it is easily seen that, for αi ∈ L²(Ωi), i = 1, 2, this definition boils down to the one in (5). Also Remark 2.2 applies in that we can express α1 ⊕ α2 in terms of the adjoints of P1 and P2 from (2) when defined appropriately.
The next lemma is rather obvious and covers the coercivity of G in L¹(Ω) as indicated above.

Lemma 2.5 Let Assumption 1 hold and suppose that a sequence {w^n} ⊂ L²(Ω) fulfills

    G(w^n) ≤ C < ∞    ∀ n ∈ N.

Then, the sequences {w^n_+} and {w^n_−} are bounded in L²(Ω) and L¹(Ω), respectively.

Proof We rewrite G as G(w) = ∫_Ω ( (1/2)w+² − w+µ ) dλ + ∫_Ω w−µ dλ. The positivity of µ then implies

    (1/2)‖w^n_+‖²_{L²(Ω)} = G(w^n) + ∫_Ω w^n_+ µ dλ − ∫_Ω w^n_− µ dλ ≤ C + ‖µ‖_{L²(Ω)} ‖w^n_+‖_{L²(Ω)},

which gives the first assertion. To see the second one, we use µ ≥ δ to estimate

    C ≥ G(w^n) = ∫_Ω (1/2)(w^n_+ − µ)² dλ − ∫_Ω µ²/2 dλ + ∫_Ω w^n_− µ dλ
              ≥ −∫_Ω µ²/2 dλ + δ‖w^n_−‖_{L¹(Ω)},

which finishes the proof. □

The next lemma provides a lower semicontinuity result for G w.r.t. weak∗ convergence in M(Ω). Note that, here, we need the extension of G as introduced above.

Lemma 2.6 Let Assumption 1 be fulfilled and a sequence {w^n} ⊂ L²(Ω) be given such that w^n ⇀∗ w∗ in M(Ω) and G(w^n) ≤ C < ∞ for all n ∈ N. Then there holds w∗_+ ∈ L²(Ω) and

    G(w∗) ≤ lim inf_{n→∞} G(w^n).    (10)

Proof By virtue of Lemma 2.5, {w^n_+} is bounded in L²(Ω) and thus there is a subsequence of {w^n_+}, to ease notation denoted by the same symbol, that converges weakly in L²(Ω) to some θ+ ∈ L²(Ω). Since the set {v ∈ L²(Ω) : v ≥ 0 a.e. in Ω} is clearly weakly closed, we have θ+ ≥ 0 a.e. in Ω. With a little abuse of notation, we denote the Radon measure induced by C(Ω) ∋ ϕ ↦ ∫_Ω θ+ ϕ dλ ∈ R by θ+, too. If we define θ− := θ+ − w∗ ∈ M(Ω), then w^n_− = w^n_+ − w^n ⇀∗ θ− in M(Ω) with θ− ≥ 0. Thus we have w∗ = θ+ − θ− with two positive Radon measures θ+, θ−. The maximality property of the Hahn-Jordan decomposition then implies w∗_+ ≤ θ+. Since θ+ is absolutely continuous w.r.t. λ, the same thus holds for w∗_+, i.e. w∗_+ ∈ L¹(Ω). Applying again w∗_+ ≤ θ+, which clearly also holds for the densities pointwise λ-almost everywhere, we moreover deduce from the weak convergence of w^n_+ in L²(Ω) that

    ∫_Ω (w∗_+)² dλ ≤ ∫_Ω (θ+)² dλ ≤ lim inf_{n→∞} ∫_Ω (w^n_+)² dλ,    (11)

which implies w∗_+ ∈ L²(Ω) as claimed. Since the above reasoning applies to every subsequence of {w^n_+} that is weakly converging in L²(Ω), (11) holds for the whole sequence {w^n_+}, which together with the weak∗ convergence of w^n and the definition of G gives (10). □

Before we are in the position to prove existence for (D'), we need two additional results on the ⊕-operator in the space of Radon measures.

Lemma 2.7 If αi ∈ M(Ωi), i = 1, 2, and ∫_{Ω2} dα2 = 0, then it holds that

    ‖α1‖_M ≤ (1/|Ω2|)‖α1 ⊕ α2‖_M    and    ‖α2‖_M ≤ (2/|Ω1|)‖α1 ⊕ α2‖_M.

Proof We estimate

    ‖α1 ⊕ α2‖_M = sup_{‖φ‖∞ ≤ 1} ∫∫_{Ω1×Ω2} φ(x1, x2) d(α1 ⊕ α2)(x1, x2)
                ≥ sup_{‖φ1‖∞ ≤ 1, ‖φ2‖∞ ≤ 1} ∫∫_{Ω1×Ω2} φ1(x1)φ2(x2) d(α1 ⊕ α2)(x1, x2)
                = sup_{‖φ1‖∞ ≤ 1, ‖φ2‖∞ ≤ 1} [ ∫∫_{Ω1×Ω2} φ1(x1)φ2(x2) dα1(x1) dλ2
                                              + ∫∫_{Ω1×Ω2} φ1(x1)φ2(x2) dλ1 dα2(x2) ].    (12)

Taking φ2 ≡ 1 and using ∫_{Ω2} dα2(x2) = 0 gives

    ‖α1 ⊕ α2‖_M ≥ sup_{‖φ1‖∞ ≤ 1} [ |Ω2| ∫_{Ω1} φ1(x1) dα1(x1) + ∫_{Ω2} dα2(x2) ∫_{Ω1} φ1(x1) dλ1 ]
                = |Ω2| ‖α1‖_M.

Now we start again at (12) and estimate from below by taking φ1 ≡ 1 to get

    ‖α1 ⊕ α2‖_M ≥ sup_{‖φ2‖∞ ≤ 1} [ ∫_{Ω1} dα1(x1) ∫_{Ω2} φ2(x2) dλ2 + |Ω1| ∫_{Ω2} φ2(x2) dα2(x2) ]
                ≥ −|Ω2| ‖α1‖_M + |Ω1| ‖α2‖_M,

which implies

    |Ω1| ‖α2‖_M ≤ ‖α1 ⊕ α2‖_M + |Ω2| ‖α1‖_M ≤ 2 ‖α1 ⊕ α2‖_M,

where the last step uses the first estimate. This completes the proof. □
The next lemma will be used to show that the negative part of the minimizer of (D') does not have a singular part.

Lemma 2.8 Let c ∈ L¹(Ω) and αi ∈ M(Ωi) for i ∈ {1, 2} with Lebesgue decompositions αi = fi + ηi satisfying fi ≪ λ and ηi ⊥ λ for i ∈ {1, 2}.
1. It holds that

    (α1 ⊕ α2 − c)+ = (f1 ⊕ f2 − c + (η1)+ ⊕ (η2)+)+.    (13)

2. If (αi)+ is absolutely continuous for i = 1, 2, then for α̃i = αi + (ηi)− (i.e. α̃i = fi), i = 1, 2, it holds that

    Φ(α̃1, α̃2) ≤ Φ(α1, α2).

Proof We first prove point 1. The measures fi, ηi exist by Lebesgue's decomposition theorem, see Theorem 1.155 in [12]. We combine these decompositions with α1 ⊕ α2 = α1 ⊗ λ2 + λ1 ⊗ α2 to arrive at the Lebesgue decomposition of α1 ⊕ α2 − c with respect to λ ⊗ λ, namely

    α1 ⊕ α2 − c = f1 ⊕ f2 − c + η1 ⊕ η2,    (14)
    f1 ⊕ f2 − c ≪ λ ⊗ λ,    (15)
    η1 ⊕ η2 ⊥ λ ⊗ λ    (16)

(which holds true because c ∈ L¹(Ω) ↪ M(Ω)). Now, we consider the Hahn-Jordan decomposition of η1,

    η1 = (η1)+ − (η1)−,    (η1)+ ⊥ (η1)−,    (17)

and obtain from (14) that

    α1 ⊕ α2 − c = (f1 + η1) ⊕ (f2 + η2) − c
                = f1 ⊕ f2 + η1 ⊕ η2 − c
                = f1 ⊕ f2 + ((η1)+ − (η1)−) ⊕ η2 − c
                = f1 ⊕ f2 + (η1)+ ⊗ λ2 − (η1)− ⊗ λ2 + λ1 ⊗ η2 − c
                = f1 ⊕ f2 − c + (η1)+ ⊕ η2 − (η1)− ⊗ λ2.

Furthermore,

    (η1)− ⊗ λ2 ⊥ f1 ⊕ f2 − c + (η1)+ ⊕ η2,

where the singularity with respect to f1 ⊕ f2 − c is due to (15) and (16) and the singularity with respect to (η1)+ ⊕ η2 is due to (17). Thus,

    (α1 ⊕ α2 − c)− = (f1 ⊕ f2 − c + (η1)+ ⊕ η2)− + (−(η1)− ⊗ λ2)−
                   = (f1 ⊕ f2 − c + (η1)+ ⊕ η2)− + (η1)− ⊗ λ2,

as (η1)− ⊗ λ2 is a positive measure. Consequently,

    (α1 ⊕ α2 − c)+ = (f1 ⊕ f2 − c + (η1)+ ⊕ η2)+.

Repeating this argument with the Hahn-Jordan decomposition of η2 yields the claim.
The second part of the lemma is a direct consequence of the first: Since (α1 ⊕ α2 − c)+ = (α̃1 ⊕ α̃2 − c)+, the first summand in the functional Φ is equal for the αi and the α̃i. However, the second summand in Φ cannot increase when passing from αi to α̃i, since α̃i ≥ αi and µi ≥ 0, where γ⟨(ηi)−, µi⟩ = ∞ if the duality pairing does not exist. □

Now we are ready to prove the existence result for (D'):

Proposition 2.9 Under Assumption 1 the minimization problem (D') admits a solution (α1∗, α2∗) ∈ L¹(Ω1) × L¹(Ω2).

Proof We proceed via the classical direct method of the calculus of variations. For this purpose, let {(α1^n, α2^n)} ⊂ L¹(Ω1) × L¹(Ω2) with (α1^n ⊕ α2^n − c)+ ∈ L²(Ω) be a minimizing sequence for (D'), where we shift α1^n and α2^n by adding and subtracting constants such that ∫_{Ω2} α2^n dλ2 = 0. Note that, due to its additive structure, this does not change the objective Φ in (D'), cf. Remark 2.4.
Next, let us define w^n := α1^n ⊕ α2^n − c. Then, thanks to (9) and Lemma 2.5, the sequence {w^n} is bounded in L¹(Ω). Hence, there is a weakly∗ converging subsequence, which we denote by the same symbol w.l.o.g., i.e. w^n ⇀∗ w̃ in M(Ω). Now, Lemma 2.6 applies, giving that

    w̃+ ∈ L²(Ω),    (18)
    G(w̃) ≤ lim inf_{n→∞} G(w^n).    (19)

Since {w^n} is bounded in M(Ω), the same holds for {α1^n ⊕ α2^n} and, as α2^n is normalized, Lemma 2.7 gives that {αi^n} is bounded in M(Ωi), i = 1, 2. Therefore, we can select a further (sub-)subsequence, still denoted by the same symbol to ease notation, such that

    αi^n ⇀∗ α̃i in M(Ωi),    i = 1, 2.

Since the mapping M(Ω1) × M(Ω2) ∋ (α1, α2) ↦ α1 ⊕ α2 ∈ M(Ω) is the adjoint of the projection mapping C(Ω) ∋ ϕ ↦ ( ∫_{Ω2} ϕ dλ2, ∫_{Ω1} ϕ dλ1 ) ∈ C(Ω1) × C(Ω2), see Remark 2.2, it is weakly∗ continuous, so that

    w̃ = α̃1 ⊕ α̃2 − c.    (20)

Next, we investigate the singular parts of α̃1 and α̃2. We start with the positive parts and employ Lebesgue's decomposition of α̃1 and α̃2:

    α̃i = αi∗ + η̃i,    αi∗ ≪ λi,    η̃i ⊥ λi,    i = 1, 2.

In the following we will see that the regular parts αi∗ ∈ L¹(Ωi), i = 1, 2, are exactly the solution of (D'). For this purpose, we first show that the positive parts of η̃1 and η̃2 vanish. We have α1∗ ⊕ α2∗ − c ≪ λ, η̃1 ⊕ η̃2 ⊥ λ, and, by uniqueness of Lebesgue's decomposition, w̃s = η̃1 ⊕ η̃2. But from (18), we know that (w̃s)+ = 0. Combining this fact with Lemma 2.8, applied to the case f1 = 0, f2 = 0, and c = 0, we obtain

    (η̃1 ⊕ η̃2)+ = (η̃1)+ ⊕ (η̃2)+,

and consequently (η̃i)+ = 0 for i = 1, 2 by positivity. Therefore, the (α̃i)+ are L¹-functions rather than measures and

    w̃+ = (α̃1 ⊕ α̃2 − c)+ = (α1∗ ⊕ α2∗ − c)+.    (21)

Now Lemma 2.8 shows feasibility of (α1∗, α2∗) for (D') and we also see that

    Φ(α1∗, α2∗) = (1/2)∫_Ω (α1∗ ⊕ α2∗ − c)+² dλ − γ∫_{Ω1} µ1 α1∗ dλ1 − γ∫_{Ω2} µ2 α2∗ dλ2
               ≤ G(α̃1 ⊕ α̃2 − c) − ∫_Ω c µ dλ
               = G(w̃) − ∫_Ω c µ dλ    (22)
               ≤ lim inf_{n→∞} Φ(α1^n, α2^n),

which demonstrates the optimality of (α1∗, α2∗). □

In the following, we assume that ∫_{Ω2} α2∗ dλ2 = 0. If this is not the case, then we can again shift α1∗ and α2∗ without changing the value of Φ, cf. Remark 2.4.

Theorem 2.10 Let Assumption 1 hold. Then every optimal dual solution (α1∗, α2∗) from Proposition 2.9 satisfies αi∗ ∈ L²(Ωi), i = 1, 2, and is therefore also a solution of the original dual problem (D). Moreover, the negative parts of the αi∗ are bounded and the function (1/γ)(α1∗ ⊕ α2∗ − c)+ has µ1 and µ2 as marginals.

Proof We again consider the positive and the negative parts separately and start with (α1∗)−. Let ϕ ∈ Cc∞(Ω1) and t > 0 be fixed, but arbitrary. Then, thanks to

    0 ≤ ((α1∗ + tϕ) ⊕ α2∗ − c)+ ≤ (α1∗ ⊕ α2∗ − c)+ + t ϕ+,

Proposition 2.9 implies that ((α1∗ + tϕ) ⊕ α2∗ − c)+ ∈ L²(Ω), so that (α1∗ + tϕ, α2∗) is feasible for (D'). Therefore, the optimality of (α1∗, α2∗) for (D') yields

    ∫_Ω (1/2t)[ ((α1∗ + tϕ) ⊕ α2∗ − c)+² − (α1∗ ⊕ α2∗ − c)+² ] dλ − γ∫_{Ω1} µ1 ϕ dλ1 ≥ 0    ∀ t > 0.

Owing to the continuous differentiability of R ∋ r ↦ r+² ∈ R, the first integrand converges to (α1∗ ⊕ α2∗ − c)+ ϕ λ-a.e. in Ω for t ↘ 0. Moreover, the Lipschitz continuity of the max-function gives that

    (1/t)| ((α1∗ + tϕ) ⊕ α2∗ − c)+² − (α1∗ ⊕ α2∗ − c)+² | ≤ |ϕ|² + 2|ϕ|(α1∗ ⊕ α2∗ − c)+    a.e. in Ω

holds for 0 < t ≤ 1. Hence, due to Lebesgue's dominated convergence theorem, we are allowed to pass to the limit t ↘ 0 and obtain in this way

    ∫_{Ω1} ( ∫_{Ω2} (α1∗ ⊕ α2∗ − c)+ dλ2 − γµ1 ) ϕ dλ1 ≥ 0.

Since ϕ ∈ Cc∞(Ω1) was arbitrary, the fundamental lemma of the calculus of variations thus gives

    ∫_{Ω2} (α1∗ ⊕ α2∗ − c)+ dλ2 = γµ1    λ1-a.e. in Ω1.    (23)

Next, define the following sequence of functions in L¹(Ω2):

    fn(x2) := (−n + α2∗(x2) − c_min)+,    n ∈ N,

where c_min is the lower bound for c from Assumption 1. Then we have fn ≥ 0 λ2-a.e. in Ω2 and fn ↘ 0 λ2-a.e. in Ω2, so that the monotone convergence theorem gives

    ∫_{Ω2} (−n + α2∗(x2) − c_min)+ dλ2 = ∫_{Ω2} fn(x2) dλ2 → 0    as n → ∞.

Thus there exists N ∈ N such that

    ∫_{Ω2} (−N + α2∗(x2) − c_min)+ dλ2 < γδ,    (24)

where δ > 0 is the threshold for µ1 from Assumption 1. Now assume that α1∗ ≤ −N λ1-a.e. on a set E ⊂ Ω1 of positive Lebesgue measure. Then

    ∫_{Ω2} (α1∗ ⊕ α2∗ − c)+ dλ2 ≤ ∫_{Ω2} (−N + α2∗ − c_min)+ dλ2 < γδ ≤ γµ1    λ1-a.e. in E,

which contradicts (23). Therefore, α1∗ > −N λ1-a.e. in Ω1, which even implies that (α1∗)− ∈ L∞(Ω1). Concerning (α2∗)−, one can argue in exactly the same way to conclude that (α2∗)− ∈ L∞(Ω2), too.
For the positive parts we find

    |Ω2| ‖α1∗‖²_{L²(Ω1)} + |Ω1| ‖α2∗‖²_{L²(Ω2)}
        = ∫_Ω |α1∗ ⊕ α2∗|² dλ    (since ∫_{Ω2} α2∗ dλ2 = 0)
        = ∫_Ω (α1∗ ⊕ α2∗)+² + (α1∗ ⊕ α2∗)−² dλ
        ≤ 2 ∫_Ω (α1∗ ⊕ α2∗ − c)+² + c+² + (α1∗)−² + (α2∗)−² dλ < ∞,

where we used (21) and the boundedness of the negative parts proven above. Note that the constant shift, potentially needed to ensure ∫_{Ω2} α2∗ dλ2 = 0, has no effect on the equation in (21) due to the additive structure of ⊕.
We have thus shown that (α1∗, α2∗) is feasible for (D). Since (α1∗, α2∗) solves (D'), whose objective is the same as in (D), while its feasible set is larger, this implies that we have found a solution to (D). □

We now show that, if π∗ is of the form π∗ = γ⁻¹(α1∗ ⊕ α2∗ − c)+ with two functions αi∗ ∈ L²(Ωi), i = 1, 2, and has the marginals µ1 and µ2, respectively, then it solves the necessary and sufficient optimality condition of the primal problem (1) in the form of the following variational inequality:

    π∗ ∈ F,    ⟨γπ∗ + c, π − π∗⟩_{L²} ≥ 0    ∀ π ∈ F.    (VI)

Herein, F is the (convex) feasible set of (1), i.e.

    F := { π ∈ L²(Ω) : π ≥ 0 λ-a.e. in Ω,  ∫_{Ω2} π dλ2 = µ1 λ1-a.e. in Ω1,  ∫_{Ω1} π dλ1 = µ2 λ2-a.e. in Ω2 }.

For this purpose, let π ∈ F be fixed but arbitrary. Multiplying the equality constraints in F with α1∗ and α2∗, respectively, integrating the arising equations, and adding them yields

    ∫_{Ω1} µ1 α1∗ dλ1 + ∫_{Ω2} µ2 α2∗ dλ2 = ∫_Ω π (α1∗ ⊕ α2∗) dλ
        = ∫_Ω π [ (α1∗ ⊕ α2∗ − c)+ + c ] dλ − ∫_Ω π (α1∗ ⊕ α2∗ − c)− dλ
        ≤ ∫_Ω π (γπ∗ + c) dλ,    (25)

where we used π ≥ 0 for the last inequality. Using the feasibility of π∗, we find similarly

    ∫_{Ω1} µ1 α1∗ dλ1 + ∫_{Ω2} µ2 α2∗ dλ2 = ∫_Ω π∗ [ (α1∗ ⊕ α2∗ − c) + c ] dλ
        = ∫_Ω γ⁻¹ (α1∗ ⊕ α2∗ − c)+ [ (α1∗ ⊕ α2∗ − c) + c ] dλ
        = ∫_Ω π∗ (γπ∗ + c) dλ.    (26)

Combining (25) and (26) now yields (VI). As (1) is a strictly convex minimization problem, this shows that, if π∗ has the form π∗ = γ⁻¹(α1∗ ⊕ α2∗ − c)+ with functions αi∗ ∈ L²(Ωi) and satisfies π∗ ∈ F, then it is a solution of (1). On the other hand, we know from Theorem 2.10 that, under Assumption 1 (more or less needed for the existence of solutions of (1) anyway), there always exist αi∗ ∈ L²(Ωi) so that π∗ = γ⁻¹(α1∗ ⊕ α2∗ − c)+ satisfies the equality constraints in F. Therefore, in summary we have deduced the following:

Theorem 2.11 (Necessary and Sufficient Optimality Conditions for (1)) Under Assumption 1, π∗ ∈ L²(Ω) is a solution of (1) if and only if there exist functions αi∗ ∈ L²(Ωi), i = 1, 2, such that the following optimality system is fulfilled:

    π∗ − (1/γ)(α1∗ ⊕ α2∗ − c)+ = 0    λ-a.e. in Ω,    (27a)
    ∫_{Ω2} (α1∗ ⊕ α2∗ − c)+ dλ2 = γµ1    λ1-a.e. in Ω1,    (27b)
    ∫_{Ω1} (α1∗ ⊕ α2∗ − c)+ dλ1 = γµ2    λ2-a.e. in Ω2.    (27c)

The significance of Theorem 2.11 lies in the fact that we can characterize
optimality of π by just two equalities in L2 (Ω1 ) and L2 (Ω2 ), respectively, namely
(27b) and (27c). Thus, we effectively reduce the size of the problem from searching
one function on Ω = Ω1 × Ω2 to searching two functions, one on Ω1 and one on Ω2
(similarly as for entropic regularization, cf. [4]). This will be exploited numerically
in Section 3.

2.3 Regularization of the dual problem

As seen before, the dual problem (D) is not uniquely solvable. One source of non-uniqueness is of course the kernel of the map (α1, α2) ↦ α1 ⊕ α2. This kernel is one-dimensional and is spanned by the pair of constant functions (1, −1), which could easily be taken into account in an algorithmic framework. However, there is another source of non-uniqueness due to the max-operator that cuts off the negative part. Here is a simple example where dual solutions are not unique: For Ω1 = Ω2 = [0, 1], µ1 = µ2 ≡ 1, γ = 1, and

    c(x, y) := C  if 1/2 ≤ x ≤ 1 and 1/2 ≤ y ≤ 1,    c(x, y) := 0  else,    with C > 4,

one can show by a straightforward calculation that, for every δ ∈ [0, (C − 4)/2], the tuple

    α1∗(x) = −1 − δ  if x ∈ [0, 1/2),    α1∗(x) = 1 + δ  if x ∈ [1/2, 1],
    α2∗(y) = 1 − δ   if y ∈ [0, 1/2),    α2∗(y) = 3 + δ  if y ∈ [1/2, 1]

solves the optimality system (27b)–(27c). This shows that the potential structure of non-uniqueness might become fairly intricate. A situation like this can certainly happen in the discretized problem we will derive in Section 2.4 and can lead to problems when we derive algorithms for the discrete problem, since non-unique solutions imply a degenerate Hessian at the optimum.
Therefore, we investigate the following regularization of the dual problem:

    min  Φε(α1, α2) := Φ(α1, α2) + (ε/2)( ‖α1‖²_{L²(Ω1)} + ‖α2‖²_{L²(Ω2)} )
    s.t. αi ∈ L²(Ωi), i = 1, 2,    (Dε)

with a regularization parameter ε > 0. It is clear that the additional quadratic terms in the regularized objective Φε render the latter strictly convex and coercive in L²(Ω1) × L²(Ω2). Therefore, for every ε > 0, (Dε) admits a unique solution.

Proposition 2.12 Let {εn} ⊂ R+ be a sequence converging to zero and denote the solutions of (Dε) with ε = εn by (α1^n, α2^n) ∈ L²(Ω1) × L²(Ω2). Then the sequence {(α1^n, α2^n)} admits a weak accumulation point, and every weak accumulation point is also a strong one and a solution of the original dual problem (D).

Proof Let (α1∗, α2∗) ∈ L²(Ω1) × L²(Ω2) denote an arbitrary globally optimal solution of (D) (whose existence is guaranteed by Theorem 2.10). Then the optimality of (α1∗, α2∗) for (D) and of (α1^n, α2^n) for (Dε) (with ε = εn) gives

    Φ(α1∗, α2∗) + (εn/2)( ‖α1^n‖²_{L²(Ω1)} + ‖α2^n‖²_{L²(Ω2)} ) ≤ Φεn(α1^n, α2^n) ≤ Φεn(α1∗, α2∗),

which implies

    ‖α1^n‖²_{L²(Ω1)} + ‖α2^n‖²_{L²(Ω2)} ≤ ‖α1∗‖²_{L²(Ω1)} + ‖α2∗‖²_{L²(Ω2)}.    (28)

Thus, {(α1^n, α2^n)} is bounded in L²(Ω1) × L²(Ω2). This in turn gives the existence of a weak accumulation point as claimed.
Now assume that (α̃1, α̃2) is such a weak accumulation point, i.e.

    (α1^n, α2^n) ⇀ (α̃1, α̃2) in L²(Ω1) × L²(Ω2)    (29)

(for a subsequence). Using again the optimality of (α1∗, α2∗) and (α1^n, α2^n), respectively, we obtain

    Φ(α1∗, α2∗) ≤ Φ(α1^n, α2^n) ≤ Φεn(α1^n, α2^n) ≤ Φεn(α1∗, α2∗) → Φ(α1∗, α2∗).    (30)

On the other hand, by convexity and weak lower semicontinuity of Φ, we get from (29) and (30) that

    Φ(α̃1, α̃2) ≤ lim inf_{n→∞} Φ(α1^n, α2^n) = lim_{n→∞} Φ(α1^n, α2^n) = Φ(α1∗, α2∗),

which gives in turn the optimality of the weak limit. Estimate (28) for the choice (α1∗, α2∗) = (α̃1, α̃2) shows that

    ‖α1^n‖²_{L²(Ω1)} + ‖α2^n‖²_{L²(Ω2)} ≤ ‖α̃1‖²_{L²(Ω1)} + ‖α̃2‖²_{L²(Ω2)},

while weak lower semicontinuity of the norm gives

    ‖α̃1‖²_{L²(Ω1)} + ‖α̃2‖²_{L²(Ω2)} ≤ lim inf_{n→∞} ( ‖α1^n‖²_{L²(Ω1)} + ‖α2^n‖²_{L²(Ω2)} ).

Hence the norms converge, and together with the weak convergence (29) this yields (α1^n, α2^n) → (α̃1, α̃2) in L²(Ω1) × L²(Ω2). □

Theorem 2.13 Let {εn} ⊂ R+ be a sequence converging to zero and denote the solutions of (Dε) with ε = εn again by (α1^n, α2^n) ∈ L²(Ω1) × L²(Ω2). Moreover, define

    πn := (1/γ)(α1^n ⊕ α2^n − c)+.    (31)

Then πn converges strongly in L²(Ω) to the unique solution of (1).

Proof From (28), we know that {(α1^n, α2^n)} is bounded and hence {πn} is bounded in L²(Ω). Thus,

    πn ⇀ π̃ in L²(Ω)    (32)

for some subsequence. Now we show that π̃ is optimal for (1). Weak closedness of {π ∈ L²(Ω) : π(x1, x2) ≥ 0 a.e. in Ω} implies π̃ ≥ 0. The first-order optimality conditions for (Dε) read

    ∫_{Ω2} (α1^n ⊕ α2^n − c)+ dλ2 + εn α1^n = γµ1    λ1-a.e. in Ω1,    (33)
    ∫_{Ω1} (α1^n ⊕ α2^n − c)+ dλ1 + εn α2^n = γµ2    λ2-a.e. in Ω2.    (34)

Integrating (33) against some ϕ1 ∈ Cc∞(Ω1), inserting the definition of πn, and integrating over Ω1 yields

    ∫_{Ω1} ( ∫_{Ω2} πn dλ2 ) ϕ1 dλ1 = ∫_{Ω1} µ1 ϕ1 dλ1 − (εn/γ) ∫_{Ω1} α1^n ϕ1 dλ1.

Passing to the limit we obtain

    ∫_{Ω1} ( ∫_{Ω2} π̃ dλ2 ) ϕ1 dλ1 = ∫_{Ω1} µ1 ϕ1 dλ1,

and thus π̃ satisfies the first equality constraint in (1). The second equality constraint can be verified analogously.
To show optimality of π̃, we test the optimality conditions in (33) and (34) with α1^n and α2^n, respectively, and get

    Φεn(α1^n, α2^n) = (γ²/2)‖πn‖²_{L²(Ω)} − γ∫_Ω πn (α1^n ⊕ α2^n) dλ − (εn/2)‖α1^n‖²_{L²(Ω1)} − (εn/2)‖α2^n‖²_{L²(Ω2)}
                   = −(γ²/2)‖πn‖²_{L²(Ω)} − γ∫_Ω c πn dλ − (εn/2)‖α1^n‖²_{L²(Ω1)} − (εn/2)‖α2^n‖²_{L²(Ω2)}
                   = −γEγ(πn) − (εn/2)‖α1^n‖²_{L²(Ω1)} − (εn/2)‖α2^n‖²_{L²(Ω2)},

where Eγ is the primal objective from (3). Similarly, we get

    Φ(α1∗, α2∗) = −γEγ(π∗),

where π∗ ∈ L²(Ω) is the unique solution of (1) and (α1∗, α2∗) ∈ L²(Ω1) × L²(Ω2) solves the dual problem (D). Now, putting everything so far together, we obtain

    lim_{n→∞} Eγ(πn) = lim_{n→∞} ( −(1/γ)Φεn(α1^n, α2^n) − (εn/2γ)‖α1^n‖²_{L²(Ω1)} − (εn/2γ)‖α2^n‖²_{L²(Ω2)} )
                     = −(1/γ)Φ(α1∗, α2∗) = Eγ(π∗).

On the other hand, Eγ is weakly lower semicontinuous, and therefore

    Eγ(π̃) ≤ lim inf_{n→∞} Eγ(πn) = Eγ(π∗).

This gives the optimality of π̃ and, by strict convexity, also uniqueness, i.e. π̃ = π∗. Thus, the weak limit is unique and a well-known argument by contradiction therefore implies the weak convergence of the whole sequence {πn} to π∗. Finally, strong convergence follows from a standard argument. □

2.4 The discrete dual problem

We show a simple discretization of the quadratically regularized optimal transport problem (1) by piecewise constant approximation in Appendix A. To keep the notation concise, we state the corresponding discrete optimal transport problem and illustrate the duality already here. This will be the basis of the algorithms we derive in Section 3. A discrete version of the continuous problem (1) is the finite-dimensional problem

    min_{π ∈ R^{M×N}}  ⟨π, c⟩ + (γ/2)‖π‖²_F    s.t.  π^T 1_M = µ,  π 1_N = ν,  π ≥ 0,    (35)

where 1_N ∈ R^N denotes the vector of all ones, µ ∈ R^N and ν ∈ R^M denote the discretized marginals with Σ_{j=1}^N µ_j = Σ_{i=1}^M ν_i, and c ∈ R^{M×N} denotes the discretized cost. Note that we slightly changed the notation from µ1 and µ2 to µ and ν, respectively. For the discrete form of the optimality system (27) we further replace the Lagrange multipliers α1 and α2 by α and β, respectively, and get

    π = (1/γ)(α ⊕ β − c)+,    (36a)
    Σ_{i=1}^M (α_i + β_j − c_ij)+ = γµ_j,    j = 1, …, N,    (36b)
    Σ_{j=1}^N (α_i + β_j − c_ij)+ = γν_i,    i = 1, …, M,    (36c)

where α ∈ R^M, β ∈ R^N and (α ⊕ β)_{i,j} = α_i + β_j is the "outer sum". The discrete counterpart of Φ from (D) is

    Φ(α, β) = (1/2)‖(α ⊕ β − c)+‖²_F − γ⟨ν, α⟩ − γ⟨µ, β⟩,

where ‖·‖_F denotes the Frobenius norm.
We write the optimality conditions (36b)–(36c) as a non-smooth equation F(α, β) = 0 in R^{M+N} with

    F(α, β) = ( F1(α, β) ; F2(α, β) )
            = ( ( Σ_{j=1}^N (α_i + β_j − c_ij)+ − γν_i )_{i=1,…,M} ; ( Σ_{i=1}^M (α_i + β_j − c_ij)+ − γµ_j )_{j=1,…,N} )    (37)

(note that F1 = ∂_α Φ and F2 = ∂_β Φ). Since F is the composition of Lipschitz continuous and semismooth functions, we have the following result (for the chain rule for semismooth functions, see e.g. [14, Thm. 2.10]):
Lemma 2.14 The function F (and thus, the gradient of Φ) is (globally) Lipschitz
continuous and semismooth.
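For illustration, the following NumPy sketch (our own, not code from the paper; all names are chosen for this example) evaluates the discrete dual objective Φ and the residual F from (37):

    import numpy as np

    def dual_objective(alpha, beta, c, mu, nu, gamma):
        # Phi(alpha, beta) = 1/2 ||(alpha (+) beta - c)_+||_F^2 - gamma<nu,alpha> - gamma<mu,beta>
        P = np.maximum(alpha[:, None] + beta[None, :] - c, 0.0)
        return 0.5 * np.sum(P**2) - gamma * (nu @ alpha) - gamma * (mu @ beta)

    def residual(alpha, beta, c, mu, nu, gamma):
        # F(alpha, beta) from (37): row/column sums of (alpha (+) beta - c)_+ minus scaled marginals
        P = np.maximum(alpha[:, None] + beta[None, :] - c, 0.0)
        return P.sum(axis=1) - gamma * nu, P.sum(axis=0) - gamma * mu

Here alpha has length M, beta has length N, and c has shape (M, N), matching the conventions of (35).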

3 Algorithms

The optimality system (36b), (36c) for the smooth and convex problem (D) can be solved by different methods. In [3] the authors propose to use a generic L-BFGS solver and also derive an alternating minimization scheme, which is similar to the non-linear Gauss-Seidel method in the next section but differs slightly in the numerical realization; [20] also uses an off-the-shelf solver. Here we propose methods that exploit the special structure of the optimality system: a non-linear Gauss-Seidel method and a semismooth Newton method.

3.1 Non-linear Gauss-Seidel

The method in this section is similar to the one described in the appendix of [3], but we describe it here for the sake of completeness. A close look at the optimality system

    Σ_{j=1}^N (α_i + β_j − c_ij)+ = γν_i,    i = 1, …, M,    (38a)
    Σ_{i=1}^M (α_i + β_j − c_ij)+ = γµ_j,    j = 1, …, N,    (38b)

shows that we can solve all M equations in (38a) for the α_i in parallel (for fixed β), since the i-th equation depends on α_i only. Similarly, all N equations in (38b) can be solved for the β_j if α is fixed. Hence, we can perform a non-linear Gauss-Seidel method for these non-smooth equations (also known as alternating minimization, nonlinear SOR, or coordinate descent for Φ [6, 25]), i.e. alternatingly solving the equations (38a) for α (for fixed β) and then the equations (38b) for β (for fixed α). The whole method is stated in Algorithm 1. Since Φ is convex with Lipschitz continuous gradient (cf. Lemma 2.14), the convergence of the algorithm follows from results in [2].

Algorithm 1 Non-linear Gauss-Seidel for quadratically regularized optimal transport
  Initialize: β^0 ∈ R^N, set k = 0
  repeat
    Set α^{k+1} to be the solution of (38a) with β = β^k.
    Set β^{k+1} to be the solution of (38b) with α = α^{k+1}.
    k ← k + 1
  until some stopping criterion
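A compact NumPy sketch of Algorithm 1 follows (our own illustration; `solve_plus` stands for any scalar solver for equations of the type (39), two of which are discussed below):

    import numpy as np

    def gauss_seidel(c, mu, nu, gamma, iters=200):
        # Alternating exact minimization of Phi over alpha and beta (Algorithm 1).
        M, N = c.shape
        alpha, beta = np.zeros(M), np.zeros(N)
        for _ in range(iters):
            for i in range(M):   # (38a): solve sum_j (alpha_i + beta_j - c_ij)_+ = gamma*nu_i
                alpha[i] = solve_plus(c[i, :] - beta, gamma * nu[i])
            for j in range(N):   # (38b): solve sum_i (alpha_i + beta_j - c_ij)_+ = gamma*mu_j
                beta[j] = solve_plus(c[:, j] - alpha, gamma * mu[j])
        return alpha, beta

Note that (38a) for fixed β is exactly of the form (39) with y = (c_ij − β_j)_j and b = γν_i, which is what the calls above exploit.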

Each equation for an α_i or β_j is just a single scalar equation for a scalar quantity, and the structure of the equation is of the following form: For a given vector y ∈ R^n and right-hand side b ∈ R, solve

    f(x) := Σ_{j=1}^n (x − y_j)+ = b.    (39)

Of course, one can solve this problem by bisection, but here are two other, more efficient methods to solve equations of the type (39):
Direct search. If we denote by y_[j] the j-th smallest entry of y (i.e. we sort y in an ascending way), we get that

    f(x) = Σ_{j=1}^n (x − y_[j])+
         = 0                          if x ≤ y_[1],
         = kx − Σ_{j=1}^k y_[j]       if y_[k] ≤ x ≤ y_[k+1], k = 1, …, n−1,
         = nx − Σ_{j=1}^n y_[j]       if x ≥ y_[n].

To obtain the solution of (39) we evaluate f at the break points y_[j] until we find the interval [y_[k], y_[k+1][ in which the solution lies (by finding k such that f(y_[k]) ≤ b < f(y_[k+1])), and then set

    x = ( b + Σ_{j=1}^k y_[j] ) / k.

The complexity of the method is dominated by the sorting of the vector y, which costs O(n log(n)).
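A NumPy sketch of the direct search (our own illustration; y and b are as in (39), and b > 0 is assumed so that the solution lies above y_[1]):

    import numpy as np

    def solve_plus(y, b):
        # Direct search: find x with sum_j max(x - y_j, 0) = b by scanning the break points.
        ys = np.sort(y)
        n = len(ys)
        csum = np.cumsum(ys)                 # csum[k-1] = y_[1] + ... + y_[k]
        for k in range(1, n):
            # f at the break point ys[k] equals k*ys[k] - csum[k-1]
            if k * ys[k] - csum[k - 1] > b:  # solution lies before this break point
                return (b + csum[k - 1]) / k
        return (b + csum[-1]) / n            # solution lies beyond the largest break point

For example, solve_plus(np.array([0.0, 1.0, 2.0]), 1.5) returns 1.25, since f(x) = 2x − 1 on [1, 2].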
Semismooth Newton. Although f is non-smooth, we may perform Newton's method here. The function f is piecewise linear, and on each interval ]y_[j], y_[j+1][ it has the slope j (a simple situation with n = 3 is shown in Figure 1). At the break points we may define f'(y_[j]) = j and then we iterate

    x^{k+1} = x^k − f(x^k)/f'(x^k).

If we start with x^0 ≥ y_[n] = max_k y_k, the method will produce a monotonically decreasing sequence which converges in at most n steps. Actually, we can initialize the method with any x^0 that is strictly larger than y_[1] = min_k y_k. Note that we do not need to sort the values of y_k to calculate the derivative, since we have f'(x) = #{i : x ≥ y_i}. In practice, the method usually needs far fewer than n iterations.

Fig. 1 Illustration of the non-smooth function f from (39).
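The corresponding scalar semismooth Newton iteration, again as a sketch of our own (it can be used as a drop-in replacement for `solve_plus` above):

    import numpy as np

    def solve_plus_newton(y, b, tol=1e-12):
        # Newton's method on the piecewise linear f(x) = sum_j max(x - y_j, 0);
        # started above max(y), the iterates decrease monotonically to the solution.
        x = np.max(y) + b / len(y)
        r = np.sum(np.maximum(x - y, 0.0)) - b
        while abs(r) > tol:
            slope = np.count_nonzero(x >= y)   # f'(x) = #{i : x >= y_i}
            x -= r / slope
            r = np.sum(np.maximum(x - y, 0.0)) - b
        return x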

3.2 Semismooth Newton

As seen in Lemma 2.14, the mapping F is semismooth and hence we may use a semismooth Newton method [5, 7].
A simple calculation proves the following lemma.

Lemma 3.1 A Newton derivative of F from (37) at (α, β) is given by

    G = [ diag(σ 1_N)    σ             ]
        [ σ^T            diag(σ^T 1_M) ]  ∈ R^{(M+N)×(M+N)},

where σ ∈ R^{M×N} is given by

    σ_ij = 1  if α_i + β_j − c_ij ≥ 0,    σ_ij = 0  otherwise.

A step of the semismooth Newton method for the solution of F(α, β) = 0 would consist of setting

    (α^{k+1}; β^{k+1}) = (α^k; β^k) − (δα^k; δβ^k),    where    F(α^k, β^k) = G (δα^k; δβ^k).

However, the next lemma shows that G has a non-trivial kernel.
Lemma 3.2 Let G be the Newton derivative of F at (α, β) defined in Lemma 3.1. Then the following holds true:
1. G ∈ R^{(M+N)×(M+N)} is symmetric,
2. G is positive semi-definite,
3. (a, b) ∈ kern(G) if and only if σ_ij(a_i + b_j) = 0 for all 1 ≤ i ≤ M, 1 ≤ j ≤ N.

Proof Symmetry of G is clear by construction. To see that G is positive semi-definite we calculate

    (a, b)^T G (a, b) = Σ_{j=1}^N Σ_{i=1}^M σ_ij a_i² + Σ_{j=1}^N Σ_{i=1}^M σ_ij b_j² + 2 Σ_{j=1}^N Σ_{i=1}^M σ_ij a_i b_j
                      = Σ_{j=1}^N Σ_{i=1}^M σ_ij (a_i + b_j)² ≥ 0.

Due to the non-negativity of σ, this also shows the last point. □
The third point of the lemma shows that the kernel of G may have a high dimension, depending on the matrix σ. Hence we resort to a quasi-Newton method where we regularize the Newton step arising from the dual problem from Section 2.2 by setting

    (α^{k+1}; β^{k+1}) = (α^k; β^k) − (δα^k; δβ^k),    where    F(α^k, β^k) = (G + εI)(δα^k; δβ^k),

with a small ε > 0. By [5], the method still converges, but only a local linear rate is guaranteed. We note that we have not applied the semismooth Newton method to the regularized dual problem from Section 2.3. This would also be possible, but it would not only lead to the regularized Newton matrix from above; we would also have to adapt the residual F in the computation of the update.
Let us make a few remarks on the regularized Newton step and its numerical treatment.
– The matrix σ (and hence the Newton matrix G) is usually very sparse. The closer α and β are to the optimal ones, the closer (α_i + β_j − c_ij)+ is to the optimal regularized transport plan π, and for small γ this is usually very sparse.
– Since G is positive semi-definite, the regularized step could be done by the method of conjugate gradients. However, any linear solver that can exploit the sparsity of G can be used.
As usual, the regularized semismooth Newton method may not converge globally. A simple globalization technique is an Armijo linesearch in the Newton direction. The full method is described in Algorithm 2.

Algorithm 2 Globalized and regularized semismooth Newton method for quadratically regularized optimal transport
  Initialize: α^0 ∈ R^M, β^0 ∈ R^N, set k = 0, choose regularization parameter ε > 0, Armijo parameters θ, κ ∈ ]0, 1[, and a tolerance τ > 0
  repeat
    Calculate
        P_ij = α_i^k + β_j^k − c_ij,    σ_ij = 1 if P_ij ≥ 0 and 0 otherwise,    π_ij = max(P_ij, 0)/γ.
    Calculate δα and δβ by solving
        ( [ diag(σ 1_N)  σ ; σ^T  diag(σ^T 1_M) ] + εI ) (δα; δβ) = −γ (π 1_N − ν; π^T 1_M − µ).
    Set t = 1 and compute the directional derivative
        d = D_{(δα,δβ)} Φ(α^k, β^k) = γ Σ_{ij} π_ij (δα_i + δβ_j) − γ(⟨δα, ν⟩ + ⟨δβ, µ⟩).
    while Φ(α^k + tδα, β^k + tδβ) ≥ Φ(α^k, β^k) + tθd do
        t ← κt
    end while
    Set α^{k+1} = α^k + tδα, β^{k+1} = β^k + tδβ
    k ← k + 1
  until ‖π 1_N − ν‖∞ ≤ τ and ‖π^T 1_M − µ‖∞ ≤ τ
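The following SciPy sketch (our own illustration, not the authors' code) assembles the sparse regularized Newton system of Algorithm 2 and performs one step; the Armijo linesearch is omitted and a direct sparse solve stands in for the conjugate gradient method mentioned above:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def ssn_step(alpha, beta, c, mu, nu, gamma, eps=1e-6):
        # One regularized semismooth Newton step (full step, i.e. t = 1).
        M, N = c.shape
        P = alpha[:, None] + beta[None, :] - c
        sigma = (P >= 0).astype(float)              # active set, cf. Lemma 3.1
        pi = np.maximum(P, 0.0) / gamma
        S = sp.csr_matrix(sigma)
        G = sp.bmat([[sp.diags(sigma.sum(axis=1)), S],
                     [S.T, sp.diags(sigma.sum(axis=0))]], format="csr")
        rhs = -gamma * np.concatenate([pi.sum(axis=1) - nu, pi.sum(axis=0) - mu])
        delta = spsolve(G + eps * sp.eye(M + N), rhs)
        return alpha + delta[:M], beta + delta[M:]

For large problems one would build σ without densifying P; the sketch keeps P dense for brevity.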

4 Numerical examples

4.1 Illustration of γ → 0

In our first numerical example we illustrate how the solutions π∗ of the regularized problem converge for vanishing regularization parameter γ → 0. We generate some marginals, fix a transport cost, compute solutions of the discretized transport problems (35) for a sequence γ_n → 0, and illustrate the optimal transport plans (and the related regularized transport costs). Our marginals are non-negative functions sampled at equidistant points x_i, y_j in the interval [0, 1]; we used M = N = 400 and the cost c_ij = (x_i − y_j)², the squared distance between the sampling points. The results are shown in Figure 2. One observes that the optimal transport plans converge to a measure that is singular and is supported on the graph of a monotonically increasing function, exactly as the fundamental theorem of optimal transport [1] predicts.
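For reproducibility, here is a sketch of such a setup (the concrete marginals below are hypothetical, since the text only specifies "some marginals"; the grid and cost follow the description above):

    import numpy as np

    x = np.linspace(0.0, 1.0, 400)              # equidistant sampling points, M = N = 400
    c = (x[:, None] - x[None, :])**2            # squared-distance cost c_ij = (x_i - y_j)^2
    mu = 1.0 + 0.8 * np.sin(2 * np.pi * x)      # hypothetical positive marginal
    nu = 1.0 + 0.8 * np.cos(3 * np.pi * x)      # hypothetical positive marginal
    mu, nu = mu / mu.sum(), nu / nu.sum()       # equal total mass
    # solve (35) for a decreasing sequence of gamma, e.g. with gauss_seidel or ssn_step from Section 3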
We repeat the same experiment where the cost is the (non-squared) distance c_ij = |x_i − y_j|. Here we had to choose larger regularization parameters, as it turned out that values similar to Figure 2 would lead to almost indistinguishable results. The results are shown in Figure 3. Note the different structure of the transport plan (which is again in agreement with the predicted results from the fundamental theorem of optimal transport). In Figure 4 we show the results for the concave but increasing cost c_ij = √|x_i − y_j| and again observe the expected effect that a concave transport cost encourages that as much mass as possible stays in place (as can be seen by the concentration of mass along the diagonal of the transport plan).

Fig. 2 Visualization of transport plans of the quadratically regularized optimal transport problem with M = N = 400 and quadratic transport cost c_ij = (x_i − y_j)². Panels from left to right: γ = 10, 1, 0.1, 0.01.

Fig. 3 Visualization of transport plans of the quadratically regularized optimal transport problem with M = N = 400 and metric transport cost c_ij = |x_i − y_j|. Panels from left to right: γ = 1,000, 100, 10, 1.

Fig. 4 Visualization of transport plans of the quadratically regularized optimal transport problem with M = N = 400 and concave increasing transport cost c_ij = √|x_i − y_j|. Panels from left to right: γ = 1,000, 100, 10, 1.

4.2 Mesh independence and comparison of SSN and NLGS

While we did not analyze our algorithms in the continuous case, we made an experiment to see how the methods behave when we change the mesh size of the discretization. To that end, we did a simple piecewise constant approximation of the marginals, the cost, and the transport plan as described in Appendix A. This derivation shows that one has to scale up the marginals for finer discretizations (or, equivalently, scale down the regularization parameter γ) to get consistent results. We also took care to adapt the termination criteria so that we terminate the algorithms when the continuous counterpart of the termination criterion is satisfied (again, see Table 1 in Appendix A for details).
We used marginals µ, ν : [0, 1] → [0, ∞[ of the form

    µ(x) = r · 1/(1 + m(x − a)²),    ν(x) = s · ( 1/(1 + m1(x − a1)²) + 1/(1 + m2(x − a2)²) )

with varying m, m1, m2 > 0, 0 < a, a1, a2 < 1, and appropriate normalization factors r, s, and quadratic cost c(x, y) = (x − y)², and discretized each instance of the problem with M = N varying from 10 to 1,000. We solved the problem for each size for regularization parameter γ = 0.001 with the semismooth Newton method from Algorithm 2 (with parameter ε = 10⁻⁶ and Armijo parameters κ = 0.5 and θ = 0.1) up to tolerance 10⁻³ and report the number of iterations needed in Figure 5. As can be observed, the number of iterations is comparable for each instance of the problem. Moreover, it seems that the number of iterations does not grow with finer discretization (however, the number of iterations seems to oscillate unpredictably for coarse discretizations). This hints at mesh independence of the method, and one could hope to prove this in future research. We performed a similar experiment for the nonlinear Gauss-Seidel method from Algorithm 1 (with larger regularization parameter γ = 0.05 and only up to M = N = 500) and show the results in Figure 6. We see an overall, but only very slight, increase in the number of iterations (with several instances where the number of iterations does not increase with finer discretization).

Fig. 5 Number of iterations for the semismooth Newton method to achieve a desired accuracy, plotted over the problem size M = N. Each graph corresponds to one instance of the problem.

4.3 Optimal transport between empirical distributions

As an example in two space dimensions, we consider two distributions µ, ν. Instead of using these as marginals, we consider empirical distributions, i.e. we generate samples (x_i)_{i=1,…,N} from µ and (y_j)_{j=1,…,M} from ν. These samples give the empirical approximations

    µ̂ = (1/N) Σ_{i=1}^N δ_{x_i},    ν̂ = (1/M) Σ_{j=1}^M δ_{y_j}.

The optimal transport problem (1) with these two marginals does not fulfill Assumption 1, since the marginals are not L²-functions. However, we can consider it as a discrete optimal transport problem in the form (35) when we denote
Fig. 6 Number of iterations for the nonlinear Gauss-Seidel method to achieve a desired accuracy, plotted over the problem size M = N. Each graph corresponds to one instance of the problem.

c_ij = c(x_i, y_j) (for some cost c) and marginals 1_M and 1_N, respectively. We solve this discrete optimal transport problem and obtain a transport plan π∗. Since we use quadratic regularization, the plan will be sparse, and hence we can visualize it by plotting arrows from x_i to y_j, where we make the thickness of the arrows proportional to the size of the entry π_ij. In other words: The thickness of the arrow from x_i to y_j indicates how much of the mass in x_i has been transported to y_j.
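A sketch of how such an empirical instance can be set up (our own illustration; the sampling distributions are rough stand-ins for the ones used in Figure 7, and we normalize the marginals to total mass one so that the two sides balance):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(80, 2)) * np.array([2.0, 0.5])   # anisotropic Gaussian samples x_i
    phi = rng.uniform(0.25 * np.pi, 0.75 * np.pi, 120)    # a segment of an annulus
    rad = rng.uniform(3.0, 4.0, 120)
    y = np.c_[rad * np.cos(phi), rad * np.sin(phi)]       # target samples y_j

    gamma = 1.0
    c = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # c_ij = ||x_i - y_j||_2
    nu = np.full(80, 1.0 / 80)      # row marginal (weights of mu-hat)
    mu = np.full(120, 1.0 / 120)    # column marginal (weights of nu-hat)
    alpha, beta = gauss_seidel(c, mu, nu, gamma)          # Gauss-Seidel sketch from Section 3.1
    pi = np.maximum(alpha[:, None] + beta[None, :] - c, 0.0) / gamma  # sparse plan; draw arrows where pi_ij > 0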
In Figure 7 we show the result for N = 80 samples from an anisotropic Gaussian distribution (centered at the origin) and M = 120 samples from a uniform distribution on a segment of an annulus. We used c(x_i, y_j) = ‖x_i − y_j‖₂ with the Euclidean norm and regularization parameter γ = 1. The resulting plan π∗ has 212 non-zero entries. For comparison we show the result of entropically regularized optimal transport in the same situation in Figure 8. We used γ = 0.05 (which is the smallest value for which our naive implementation of the Sinkhorn algorithm is still stable). The resulting plan has 6730 nonzero entries and we only plot lines for transport entries which are larger than 1% of the largest entry in the optimal transport plan.

5 Conclusion

We analyzed the quadratically regularized optimal transport problem in Kantorovich form. While it is straightforward to derive the dual problem, our proof of existence of dual optima is quite intricate. We note that we are not aware of any proof of existence of solutions of the dual of other regularized transport problems in the continuous case besides the very recent [8] for entropic regularization. We derived two algorithms to solve the dual problem, both of which converge by standard results. It turns out that the semismooth quasi-Newton method converges fast in all cases and that it behaves stably with respect to the regularization parameter in our numerical experiments. We even observe mesh independence of the method in the experiments. One drawback of the semismooth Newton method (compared with, e.g., the Sinkhorn iteration [9]) is that we need to assemble the Newton matrix in each step. While this matrix is usually very sparse, one still needs to check MN cases, which may be too large for large scale problems. We did not investigate how special structure of the cost function c may help to reduce the cost of assembling the sparse matrix σ.
Fig. 7 Illustration of the quadratically regularized optimal transport between empirical distributions. Left: Source distribution µ̂ denoted by blue stars and target distribution ν̂ denoted by red circles, together with lines that indicate the transport. Right: The transport plan and its histogram in semi-log scale.

Fig. 8 Illustration of the entropically regularized optimal transport between empirical distributions. Left: Source distribution µ̂ denoted by blue stars and target distribution ν̂ denoted by red circles, together with lines that indicate the transport. Right: The transport plan and its histogram in semi-log scale.

Acknowledgements We would like to thank the reviewer for helpful suggestions that led to an improved presentation, and also Stephan Walther (TU Dortmund) for helping with the construction of the counterexample in Section 2.3.

A Discretization with piecewise-constant ansatz functions

For the sake of brevity, we just consider an equidistant discretization of [0, 1] into N intervals using piecewise constant ansatz functions, i.e.

    π(x, y) := Σ_{i,j=0}^{N−1} π_ij χ_{(i/N, (i+1)/N) × (j/N, (j+1)/N)}(x, y),

for coefficients π_ij, and assume analogous definitions for the quantities c, µ+, µ−, α and β. They have to coincide on average over the intervals. Again, we study this for π and obtain that the identity

    ∫_{i/N}^{(i+1)/N} ∫_{j/N}^{(j+1)/N} π(x, y) dy dx = (1/N²) π_ij

holds. Analogous identities hold for the quantities c, µ+, µ−, α and β; the ones with a one-dimensional domain are scaled by 1/N instead of 1/N².
Now, we consider the discrete Algorithm 2, which operates on discrete quantities, and establish a consistent mapping of the quantities from the discretization to the ones of the solver. We denote its input quantities by c̄_ij, µ̄−_i, µ̄+_j and its output quantities by ᾱ_i, β̄_j, π̄_ij, and Ē. It solves for
    Σ_{j=0}^{N−1} π̄_ij = γ µ̄−_i,

which we desire to correspond to

    ∫_{i/N}^{(i+1)/N} ∫_0^1 π(x, y) dy dx = ∫_{i/N}^{(i+1)/N} µ−(x) dx.

We plug in the ansatz functions and obtain the identity

    (1/N²) Σ_{j=0}^{N−1} π_ij = (1/N) µ−_i.

We set π̄_ij := γ π_ij and obtain

    Σ_{j=0}^{N−1} π̄_ij = γ Σ_{j=0}^{N−1} π_ij = γ N µ−_i.

Thus, the choice µ̄−_i := N µ−_i gives a consistent conversion. Similarly, we obtain µ̄+_j := N µ+_j.
We proceed with the objective. Plugging in the ansatz functions into the continuous objective gives

    E = (γ/2)(1/N²) Σ_{i,j=0}^{N−1} π_ij² − (1/N)( Σ_{i=0}^{N−1} α_i µ−_i + Σ_{j=0}^{N−1} β_j µ+_j ).

The solver computes

    Ē = (1/2) Σ_{i,j=0}^{N−1} π̄_ij² − γ( Σ_{i=0}^{N−1} ᾱ_i µ̄−_i + Σ_{j=0}^{N−1} β̄_j µ̄+_j ).

Plugging in N µ−_i = µ̄−_i, N µ+_j = µ̄+_j, and γπ_ij = π̄_ij gives

    Ē = (γ²/2) Σ_{i,j=0}^{N−1} π_ij² − γN( Σ_{i=0}^{N−1} ᾱ_i µ−_i + Σ_{j=0}^{N−1} β̄_j µ+_j ).

Thus, the consistent identity E = (1/(γN²)) Ē follows if we choose ᾱ_i := α_i and β̄_j := β_j. The solver computes ᾱ_i as the solution of

    Σ_{j=0}^{N−1} (ᾱ_i + β̄_j − c̄_ij)+ = γ µ̄−_i,

whereas the discretization of the corresponding continuous equation reads

    (1/N) Σ_{j=0}^{N−1} (α_i + β_j − c_ij)+ = γ µ−_i

in terms of the coefficients. Plugging in the choices α_i = ᾱ_i, β_j = β̄_j, c_ij = c̄_ij, and N µ−_i = µ̄−_i yields equivalence of the latter equation to

    (1/N) Σ_{j=0}^{N−1} (ᾱ_i + β̄_j − c̄_ij)+ = γ (1/N) µ̄−_i,

which is equivalent to the equation that is solved by Algorithm 2. The argument for µ̄+_j is carried out analogously.
Regarding termination, the solver checks the criteria

    max_i | (1/γ) Σ_{j=0}^{N−1} π̄_ij − µ̄−_i | < τ̄    and    max_j | (1/γ) Σ_{i=0}^{N−1} π̄_ij − µ̄+_j | < τ̄.

We only consider the first and plug the identity γπ_ij = π̄_ij into it, which gives equivalence to

    | Σ_{j=0}^{N−1} π_ij − N µ−_i | < τ̄.

This in turn is equivalent to

    | N² ∫_{i/N}^{(i+1)/N} ∫_0^1 π(x, y) dy dx − N² ∫_{i/N}^{(i+1)/N} µ−(x) dx | < τ̄,

i.e.

    | ∫_{i/N}^{(i+1)/N} ( ∫_0^1 π(x, y) dy − µ−(x) ) dx | < τ̄ / N².

Moreover, the ansatz functions for π and µ− are constant on (i/N, (i+1)/N), which induces equivalence to

    | ∫_0^1 π(x, y) dy − µ−(x) | < τ̄ / N    for x ∈ (i/N, (i+1)/N).

This implies that if the solver terminates, we have

    ‖ ∫_0^1 π(·, y) dy − µ−(·) ‖_{L¹((0,1))} < τ̄ / N.

We summarize the choices for the consistent mapping of quantities arising from the discretization to quantities the solver operates on in Table 1. Finally, we make a note on the calculation of the coefficients c_ij for the cost function c(x, y) := (x − y)²:

    c_ij = N² ∫_{i/N}^{(i+1)/N} ∫_{j/N}^{(j+1)/N} (x − y)² dy dx = (1/N²) ( (i − j)² + 1/6 ).
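A small sketch (our own) that performs the conversions of Table 1 for a given mesh size, using the exact cell averages for the quadratic cost:

    import numpy as np

    def to_solver_quantities(N, mu_minus, mu_plus, gamma, tau):
        # Map discretization coefficients to solver quantities following Table 1.
        mid = (np.arange(N) + 0.5) / N    # cell midpoints; the ansatz is constant per cell
        i = np.arange(N)
        c_bar = ((i[:, None] - i[None, :])**2 + 1.0 / 6.0) / N**2  # exact cell averages of (x-y)^2
        mu_minus_bar = N * mu_minus(mid)  # midpoint values stand in for the exact cell averages
        mu_plus_bar = N * mu_plus(mid)
        tau_bar = tau * N                 # solver tolerance for continuous tolerance tau
        return c_bar, mu_minus_bar, mu_plus_bar, tau_bar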

Conflict of Interest: The authors declare that they have no conflict of interest.
Table 1 Mapping discretization quantities to solver quantities.

    Coefficient   Solver quantity   Conversion
    π_ij          π̄_ij              π̄_ij = γ π_ij
    c_ij          c̄_ij              c̄_ij = c_ij
    µ−_i          µ̄−_i              µ̄−_i = N µ−_i
    µ+_j          µ̄+_j              µ̄+_j = N µ+_j
    α_i           ᾱ_i               ᾱ_i = α_i
    β_j           β̄_j               β̄_j = β_j
    E             Ē                 Ē = E N² γ
    τ             τ̄                 τ̄ = τ N

References

1. Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and
optimisation of flows on networks, pages 1–155. Springer, 2013.
2. Dimitri P. Bertsekas. Nonlinear programming. Athena Scientific Optimization and Com-
putation Series. Athena Scientific, Belmont, MA, third edition, 2016.
3. Mathieu Blondel, Vivien Seguy, and Antoine Rolet. Smooth and sparse optimal transport.
In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First In-
ternational Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings
of Machine Learning Research, pages 880–889, Playa Blanca, Lanzarote, Canary Islands,
09–11 Apr 2018. PMLR.
4. Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of
entropic schemes for optimal transport and gradient flows. SIAM Journal on Mathematical
Analysis, 49(2):1385–1418, 2017.
5. Xiaojun Chen. Superlinear convergence of smoothing quasi-Newton methods for nonsmooth equations. Journal of Computational and Applied Mathematics, 80(1):105–126, 1997.
6. Xiaojun Chen. On convergence of SOR methods for nonsmooth equations. Numer. Linear
Algebra Appl., 9(1):81–92, 2002.
7. Xiaojun Chen, Zuhair Nashed, and Liqun Qi. Smoothing methods and semismooth meth-
ods for nondifferentiable operator equations. SIAM J. Numer. Anal., 38(4):1200–1216,
2000.
8. Christian Clason, Dirk A Lorenz, Hinrich Mahler, and Benedikt Wirth. Entropic regu-
larization of continuous optimal transport problems. arXiv preprint arXiv:1906.01333,
2019.
9. Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In
Advances in neural information processing systems, pages 2292–2300, 2013.
10. Marco Cuturi and Gabriel Peyré. A smoothed dual approach for variational Wasserstein
problems. SIAM J. Imaging Sci., 9(1):320–343, 2016.
11. Montacer Essid and Justin Solomon. Quadratically regularized optimal transport on
graphs. SIAM Journal on Scientific Computing, 40(4):A1961–A1986, 2018.
12. Irene Fonseca and Giovanni Leoni. Modern methods in the calculus of variations: Lp
spaces. Springer Monographs in Mathematics. Springer, New York, 2007.
13. Aude Genevay, Gabriel Peyré, and Marco Cuturi. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pages 1608–1617, 2018.
14. Michael Hinze, René Pinnau, Michael Ulbrich, and Stefan Ulbrich. Optimization with
PDE constraints, volume 23. Springer Science & Business Media, 2008.
15. Leonid V. Kantorovič. On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS
(N.S.), 37:199–201, 1942.
16. Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal transport with proximal
splitting. SIAM J. Imaging Sci., 7(1):212–238, 2014.
17. Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and
Trends in Machine Learning, 11(5-6):355–607, 2019.
18. Svetlozar T. Rachev and Ludger Rüschendorf. Mass transportation problems. Vol. I.
Probability and its Applications (New York). Springer-Verlag, New York, 1998. Theory.

19. Svetlozar T. Rachev and Ludger Rüschendorf. Mass transportation problems. Vol. II.
Probability and its Applications (New York). Springer-Verlag, New York, 1998. Applica-
tions.
20. Lucas Roberts, Leo Razoumov, Lin Su, and Yuyang Wang. Gini-regularized optimal trans-
port with an application to spatio-temporal forecasting. arXiv preprint arXiv:1712.02512,
2017.
21. Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of
Progress in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer,
Cham, 2015. Calculus of variations, PDEs, and modeling.
22. Fredi Tröltzsch. Regular Lagrange multipliers for control problems with mixed pointwise
control-state constraints. SIAM Journal on Optimization, 15:616–634, 2005.
23. Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Math-
ematics. American Mathematical Society, Providence, RI, 2003.
24. Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathe-
matischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-
Verlag, Berlin, 2009.
25. Stephen J. Wright. Coordinate descent algorithms. Math. Program., 151(1, Ser. B):3–34,
2015.
