Quadratically Regularized Optimal Transport
1 Introduction
problem formulation we use here is the one of Kantorovich [15]. Let us fix some notation and formulate the problem: Let $\Omega_1 \subset \mathbb{R}^{d_1}$, $\Omega_2 \subset \mathbb{R}^{d_2}$ be two compact domains, denote $\Omega = \Omega_1 \times \Omega_2$, and assume we are given two positive regular Radon measures $\mu_1$ and $\mu_2$ on $\Omega_1$ and $\Omega_2$, respectively. Further we assume that a cost function $c : \Omega_1 \times \Omega_2 \to \mathbb{R}$ is given that models the cost of transporting a unit of mass from $x_1 \in \Omega_1$ to $x_2 \in \Omega_2$. The optimal transport problem asks to find a transport plan $\pi$, which is a Radon measure on $\Omega$, that has minimal overall transport cost $\int_\Omega c(x_1, x_2)\,\mathrm{d}\pi(x_1, x_2)$ among all measures $\pi$ which have $\mu_1$ and $\mu_2$ as first and second marginals, respectively, i.e. for all Borel sets $A \subset \Omega_1$ it holds that $\pi(A \times \Omega_2) = \mu_1(A)$ and for all Borel sets $B \subset \Omega_2$ it holds that $\pi(\Omega_1 \times B) = \mu_2(B)$.
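To make the formulation concrete in the discrete case, here is a minimal sketch that solves a small Kantorovich problem as a linear program (the sizes, random data, and use of scipy are our own illustration, not taken from this paper):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny discrete Kantorovich problem: pi is a nonnegative M x N matrix with
# row sums mu1 and column sums mu2 that minimizes <c, pi>.  All names and
# sizes here are illustrative choices.
M, N = 3, 4
rng = np.random.default_rng(0)
c = rng.random((M, N))
mu1 = np.full(M, 1 / M)
mu2 = np.full(N, 1 / N)
A_eq = np.vstack([np.kron(np.eye(M), np.ones(N)),   # row-sum constraints
                  np.kron(np.ones(M), np.eye(N))])  # column-sum constraints
res = linprog(c.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu1, mu2]),
              bounds=(0, None))
pi = res.x.reshape(M, N)   # optimal plans are supported on few entries
```

Already in this tiny instance one can observe the phenomenon discussed next: the optimal plan is supported on very few entries.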
This problem has been studied extensively and we refer to the books [18, 19, 23, 24, 21]. One particular result is that an optimal plan $\pi^*$ exists and that the support of optimal plans is contained in the so-called c-superdifferential of a c-concave function [1, Theorem 1.13]. For many cost functions c, this means that optimal transport plans are supported on small sets and that they are in fact singular with respect to the Lebesgue measure on Ω. This makes the numerical treatment of optimal transport problems difficult, and one can employ regularization to obtain approximately optimal plans π that are functions on Ω. The regularization method that has received the most attention recently is regularization with the negative entropy of π, and we refer to [16, 10, 4]. Entropic regularization has become popular in machine learning applications because it allows for the very simple Sinkhorn algorithm (in the discrete case), see [9, 13], and also [17] for a recent and thorough review of the computational aspects of optimal transport.
Regularizations different from entropic regularization have been much less studied. We are only aware of works in the discrete case, e.g. [3, 11]. In this work we investigate the case where we regularize the problem in $L^2(\Omega)$. The paper is organized as follows: In Section 2 we state the problem and analyze existence and duality. It will turn out that existence of solutions of the dual problem is quite tricky to show, but we will show that dual solutions exist in respective $L^2$ spaces and that a straightforward optimality system characterizes primal-dual optimality. In Section 3 we derive two different algorithms for the discrete version of the quadratically regularized optimal transport problem, and in Section 4 we comment on a simple discretization scheme and report numerical examples.
Notation. We abbreviate $x_+ = \max(x, 0)$ (and apply this also to functions and to measures, where $+$ means the positive part from the Hahn-Jordan decomposition). By $C(\Omega)$ we denote the space of continuous functions on Ω (we will always work on compact sets) equipped with the supremum norm $\|\cdot\|_\infty$, and by $\mathcal{M}(\Omega)$ we denote the space of Radon measures on a compact domain, equipped with the norm $\|\mu\|_{\mathcal{M}} = \sup\{\int f \,\mathrm{d}\mu \mid f \in C(\Omega),\ |f| \leq 1\}$. The Lebesgue measure will be λ (and we also use $\lambda_1$ and $\lambda_2$ to specify the Lebesgue measure on the sets $\Omega_1$ and $\Omega_2$, respectively). For convenience, we use $|\Omega|$ for the Lebesgue measure of the set Ω. Furthermore, for a Radon measure $w \in \mathcal{M}$, we denote the absolutely continuous and singular part arising from the Lebesgue decomposition with respect to the Lebesgue measure by $w_{ac}$ and $w_s$, i.e. they satisfy $w_{ac} \ll \lambda$ and $w_s \perp \lambda$. Duality pairings are denoted by $\langle \cdot, \cdot \rangle$. If both arguments of the duality pairing are positive and the duality pairing does not necessarily exist, e.g. for $\psi \in \mathcal{M}(\Omega)$ and $x \in L^2(\Omega)$, we set $\langle \psi, x \rangle := +\infty$.
Proof Assume that there is an optimal solution $\pi^* \in L^2(\Omega_1 \times \Omega_2)$. By Jensen's inequality we get
$$\int_{\Omega_1} \mu_1(x_1)^2 \,\mathrm{d}\lambda_1 = \int_{\Omega_1} \Big( \int_{\Omega_2} \pi^*(x_1, x_2) \,\mathrm{d}\lambda_2 \Big)^2 \mathrm{d}\lambda_1 \leq |\Omega_2| \iint_{\Omega_1 \times \Omega_2} \pi^*(x_1, x_2)^2 \,\mathrm{d}\lambda_1 \,\mathrm{d}\lambda_2 < \infty,$$
which shows $\mu_1 \in L^2(\Omega_1)$. The argument for $\mu_2$ is similar. Non-negativity of $\mu_1$ and $\mu_2$ follows from non-negativity of $\pi^*$. Finally, by Fubini's theorem,
$$\int_{\Omega_1} \mu_1(x_1) \,\mathrm{d}\lambda_1 = \iint_{\Omega_1 \times \Omega_2} \pi^*(x_1, x_2) \,\mathrm{d}\lambda_1 \,\mathrm{d}\lambda_2 = \int_{\Omega_2} \mu_2(x_2) \,\mathrm{d}\lambda_2.$$
In the following section, we apply the classical Lagrange duality to the linear-
quadratic program (1). To this end, let us define the Lagrangian associated with
(1). In order to shorten the notation, we set
µ := γ µ1 ⊗ µ2 .
Furthermore, we define
$$P_1 : L^2(\Omega) \ni \pi \mapsto \int_{\Omega_2} \pi \,\mathrm{d}\lambda_2 \in L^2(\Omega_1), \qquad P_2 : L^2(\Omega) \ni \pi \mapsto \int_{\Omega_1} \pi \,\mathrm{d}\lambda_1 \in L^2(\Omega_2), \tag{2}$$
and the Lagrangian
$$L : L^2(\Omega) \times L^2(\Omega_1) \times L^2(\Omega_2) \times L^2(\Omega) \to \mathbb{R},$$
$$L(\pi, \alpha_1, \alpha_2, \rho) := E_\gamma(\pi) - \langle \rho, \pi \rangle_{L^2(\Omega)} + \langle \alpha_1, P_1\pi - \mu_1 \rangle_{L^2(\Omega_1)} + \langle \alpha_2, P_2\pi - \mu_2 \rangle_{L^2(\Omega_2)}.$$
The main part of the upcoming analysis is devoted to the existence of solutions to (DP). Once this is established, the necessary and sufficient optimality condition associated with (1) in the form of a variational inequality will allow us to derive an optimality system that is also amenable to numerical computations.
To show existence for (DP), we first reformulate the dual problem. Since L is quadratic w.r.t. π, the inner inf-problem is solved by
$$\pi = \frac{1}{\gamma}\,(\rho + \alpha_1 \oplus \alpha_2 - c). \tag{4}$$
Remark 2.2 The map ⊕ is related to the adjoints of the projections $P_1$ and $P_2$ from (2) by $\alpha_1 \oplus \alpha_2 = P_1^*\alpha_1 + P_2^*\alpha_2$.
Again, the inner optimization problem is quadratic w.r.t. ρ so that its solution is given by
$$\rho = -(\alpha_1 \oplus \alpha_2 - c)_-. \tag{7}$$
Inserted in (6), this results in the following dual problem:
$$\min \ \Phi(\alpha_1, \alpha_2) := \tfrac{1}{2}\big\|(\alpha_1 \oplus \alpha_2 - c)_+\big\|_{L^2(\Omega)}^2 - \gamma\langle \alpha_1, \mu_1 \rangle - \gamma\langle \alpha_2, \mu_2 \rangle \quad \text{s.t.} \quad \alpha_i \in L^2(\Omega_i), \ i = 1, 2. \tag{D}$$
To prove existence of solutions for this problem, we need to require the following assumption.

Assumption 1 The domains $\Omega_1$ and $\Omega_2$ are compact. Moreover, the cost function c is in $L^2(\Omega)$ and fulfills $c \geq \underline{c} > -\infty$. Furthermore, the marginals $\mu_1$ and $\mu_2$ satisfy $\mu_i \in L^2(\Omega_i)$ and $\mu_i \geq \delta > 0$, $i = 1, 2$. In addition we assume that $\int_{\Omega_1} \mu_1 \,\mathrm{d}\lambda_1 = \int_{\Omega_2} \mu_2 \,\mathrm{d}\lambda_2 = 1$.
Observe that the objective Φ in (D) is also well defined for functions $\alpha_i \in L^1(\Omega_i)$ with $(\alpha_1 \oplus \alpha_2 - c)_+ \in L^2(\Omega)$. This gives rise to the following auxiliary dual problem:
$$\min \ \Phi(\alpha_1, \alpha_2) \quad \text{s.t.} \quad \alpha_i \in L^1(\Omega_i), \ i = 1, 2, \quad (\alpha_1 \oplus \alpha_2 - c)_+ \in L^2(\Omega). \tag{D'}$$
Our strategy to prove existence of solutions to (D) is now as follows:
1. First, we show that (D') admits a solution $(\alpha_1^*, \alpha_2^*) \in L^1(\Omega_1) \times L^1(\Omega_2)$, see Proposition 2.9.
2. Then, we prove that $\alpha_1^*$ and $\alpha_2^*$ possess higher regularity, namely that they are functions in $L^2(\Omega_i)$, $i = 1, 2$, cf. Theorem 2.10.
3. Thus, $(\alpha_1^*, \alpha_2^*)$ is feasible for (D) and, since the feasible set of (D') contains the one of (D), while the objective of (D') restricted to $L^2$-functions coincides with the objective in (D), this finally gives that $(\alpha_1^*, \alpha_2^*)$ is indeed optimal for (D).
The reason to consider (D') is essentially that the objective Φ is not coercive in $L^2(\Omega)$, but only in $L^1(\Omega)$ (at least w.r.t. the negative part of $\alpha_i$). Therefore, we have to deal with weakly* converging sequences in the space of Radon measures within the proof of existence of solutions. For this purpose, we need to extend the objective to a suitable set. To that end, let us define
$$G : L^2(\Omega) \ni w \mapsto \int_\Omega \big(\tfrac{1}{2} w_+^2 - w\mu\big) \,\mathrm{d}\lambda \in \mathbb{R}. \tag{8}$$
Note that, thanks to $\int_{\Omega_1} \mu_1 \,\mathrm{d}\lambda_1 = \int_{\Omega_2} \mu_2 \,\mathrm{d}\lambda_2 = 1$, it holds that
$$\Phi(\alpha_1, \alpha_2) = G(\alpha_1 \oplus \alpha_2 - c) - \int_\Omega c\,\mu \,\mathrm{d}\lambda \qquad \forall\, \alpha_i \in L^2(\Omega_i),\ i = 1, 2. \tag{9}$$
Of course, G is also well defined as a functional on the feasible set of (D') and we will denote this functional by the same symbol to ease notation. In order to extend G to the space of Radon measures, consider for a given measure $w \in \mathcal{M}(\Omega)$ the Hahn-Jordan decomposition $w = w_+ - w_-$ and assume that $w_+ \in L^2(\Omega)$. Then, we set
$$G(w) = \int_\Omega \tfrac{1}{2} w_+^2 \,\mathrm{d}\lambda - \int_\Omega \mu \,\mathrm{d}w.$$
With a slight abuse of notation, we denote this mapping by G, too. Furthermore, for $w_+ \in L^2(\Omega)$, the integral $\int_\Omega w_+ \mu \,\mathrm{d}\lambda$ is finite for $\mu \in L^2(\Omega)$ as in Assumption 1. Regarding the negative part, we define $\int_\Omega \mu \,\mathrm{d}w_- := \infty$ where this expression is not properly defined, as $w_-$ and µ are both positive. Combining this, we obtain that $-\int_\Omega \mu \,\mathrm{d}w \in \mathbb{R} \cup \{\infty\}$.
Note in this context that, if the singular part of w (w.r.t. the Lebesgue measure)
vanishes, then also w+ ∈ L1 (Ω ) and w+ (x) = max{0, w(x)} λ-a.e. in Ω so that both
functionals coincide on L2 (Ω ), which justifies this notation. Furthermore, we also
generalize the map ⊕ to the measure space by setting
α1 ⊕ α2 := α1 ⊗ λ2 + λ1 ⊗ α2 , αi ∈ M(Ωi ), i = 1, 2.
Again, it is easily seen that, for αi ∈ L2 (Ωi ), i = 1, 2, this definition boils down to
the one in (5). Also Remark 2.2 applies in that we can express α1 ⊕ α2 in terms
of the adjoints of P1 and P2 from (2) when defined appropriately.
The next lemma is rather obvious and covers the coercivity of G in L1 (Ω ) as
indicated above.
Lemma 2.5 Let Assumption 1 hold and suppose that a sequence $\{w^n\} \subset L^2(\Omega)$ fulfills
$$G(w^n) \leq C < \infty \qquad \forall\, n \in \mathbb{N}.$$
Then, the sequences $\{w^n_+\}$ and $\{w^n_-\}$ are bounded in $L^2(\Omega)$ and $L^1(\Omega)$, respectively.
Proof We rewrite G as $G(w) = \int_\Omega \big(\tfrac{1}{2}w_+^2 - w_+\mu\big) \,\mathrm{d}\lambda + \int_\Omega w_-\mu \,\mathrm{d}\lambda$. The positivity of µ then implies
$$\tfrac{1}{2}\|w^n_+\|_{L^2(\Omega)}^2 = G(w^n) + \int_\Omega w^n_+ \mu \,\mathrm{d}\lambda - \int_\Omega w^n_- \mu \,\mathrm{d}\lambda \leq C + \|\mu\|_{L^2(\Omega)}\, \|w^n_+\|_{L^2(\Omega)},$$
which gives the first assertion. To see the second one, we use $\mu \geq \delta$ to estimate
$$C \geq G(w^n) = \int_\Omega \tfrac{1}{2}(w^n_+ - \mu)^2 \,\mathrm{d}\lambda - \int_\Omega \mu^2/2 \,\mathrm{d}\lambda + \int_\Omega w^n_- \mu \,\mathrm{d}\lambda \geq -\int_\Omega \mu^2/2 \,\mathrm{d}\lambda + \delta\,\|w^n_-\|_{L^1(\Omega)},$$
which proves the second assertion. ⊓⊔
The next lemma provides a lower semicontinuity result for G w.r.t. weak∗
convergence in M(Ω ). Note that, here, we need the extension of G as introduced
above.
Proof By virtue of Lemma 2.5, $\{w^n_+\}$ is bounded in $L^2(\Omega)$ and thus there is a subsequence of $\{w^n_+\}$, to ease notation denoted by the same symbol, that converges weakly in $L^2(\Omega)$ to some $\theta_+ \in L^2(\Omega)$. Since the set $\{v \in L^2(\Omega) : v \geq 0 \text{ a.e. in } \Omega\}$ is clearly weakly closed, we have $\theta_+ \geq 0$ a.e. in Ω. With a little abuse of notation, we denote the Radon measure induced by $C(\Omega) \ni \varphi \mapsto \int_\Omega \theta_+ \varphi \,\mathrm{d}\lambda \in \mathbb{R}$ by $\theta_+$, too. If we define $\theta_- := \theta_+ - w^* \in \mathcal{M}(\Omega)$, then $w^n_- = w^n_+ - w^n \rightharpoonup^* \theta_-$ in $\mathcal{M}(\Omega)$ with $\theta_- \geq 0$. Thus we have $w^* = \theta_+ - \theta_-$ with two positive Radon measures $\theta_+, \theta_-$. The maximality property of the Hahn-Jordan decomposition then implies $w^*_+ \leq \theta_+$. Since $\theta_+$ is absolutely continuous w.r.t. λ, the same thus holds for $w^*_+$, i.e. $w^*_+ \in L^1(\Omega)$. Applying again $w^*_+ \leq \theta_+$, which clearly also holds for the densities pointwise λ-almost everywhere, we moreover deduce from the weak convergence of $w^n_+$ in $L^2(\Omega)$ that
$$\int_\Omega (w^*_+)^2 \,\mathrm{d}\lambda \leq \int_\Omega \theta_+^2 \,\mathrm{d}\lambda \leq \liminf_{n\to\infty} \int_\Omega (w^n_+)^2 \,\mathrm{d}\lambda, \tag{11}$$
which implies $w^*_+ \in L^2(\Omega)$ as claimed. Since the above reasoning applies to every subsequence of $\{w^n_+\}$ that is weakly converging in $L^2(\Omega)$, (11) holds for the whole sequence $\{w^n_+\}$, which together with the weak* convergence of $w^n$ and the definition of G, gives (10). ⊓⊔
Before we are in the position to prove existence for (D'), we need two additional results on the ⊕-operator in the space of Radon measures.

Lemma 2.7 If $\alpha_i \in \mathcal{M}(\Omega_i)$, $i = 1, 2$, and $\int_{\Omega_2} \mathrm{d}\alpha_2 = 0$, then it holds that
$$\|\alpha_1\|_{\mathcal{M}} \leq \frac{1}{|\Omega_2|}\,\|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}} \qquad \text{and} \qquad \|\alpha_2\|_{\mathcal{M}} \leq \frac{2}{|\Omega_1|}\,\|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}}.$$
Proof We estimate
$$\begin{aligned} \|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}} &= \sup_{\|\varphi\|_\infty \leq 1} \iint_{\Omega_1\times\Omega_2} \varphi(x_1, x_2) \,\mathrm{d}(\alpha_1 \oplus \alpha_2)(x_1, x_2) \\ &\geq \sup_{\substack{\|\varphi_1\|_\infty \leq 1\\ \|\varphi_2\|_\infty \leq 1}} \iint_{\Omega_1\times\Omega_2} \varphi_1(x_1)\varphi_2(x_2) \,\mathrm{d}(\alpha_1 \oplus \alpha_2)(x_1, x_2) \\ &= \sup_{\substack{\|\varphi_1\|_\infty \leq 1\\ \|\varphi_2\|_\infty \leq 1}} \bigg[ \iint_{\Omega_1\times\Omega_2} \varphi_1(x_1)\varphi_2(x_2) \,\mathrm{d}\alpha_1(x_1)\,\mathrm{d}\lambda_2(x_2) + \iint_{\Omega_1\times\Omega_2} \varphi_1(x_1)\varphi_2(x_2) \,\mathrm{d}\lambda_1(x_1)\,\mathrm{d}\alpha_2(x_2) \bigg]. \end{aligned} \tag{12}$$
Taking $\varphi_2 \equiv 1$ and using $\int_{\Omega_2} \mathrm{d}\alpha_2 = 0$ gives
$$\|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}} \geq \sup_{\|\varphi_1\|_\infty \leq 1} \bigg[ |\Omega_2| \int_{\Omega_1} \varphi_1(x_1) \,\mathrm{d}\alpha_1(x_1) + \int_{\Omega_2} \mathrm{d}\alpha_2(x_2) \int_{\Omega_1} \varphi_1(x_1) \,\mathrm{d}\lambda_1 \bigg] = |\Omega_2|\,\|\alpha_1\|_{\mathcal{M}}.$$
Now we start again at (12) and estimate from below by taking $\varphi_1 \equiv 1$ to get
$$\|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}} \geq \sup_{\|\varphi_2\|_\infty \leq 1} \bigg[ \int_{\Omega_1} \mathrm{d}\alpha_1(x_1) \int_{\Omega_2} \varphi_2(x_2) \,\mathrm{d}\lambda_2 + |\Omega_1| \int_{\Omega_2} \varphi_2(x_2) \,\mathrm{d}\alpha_2(x_2) \bigg] \geq -|\Omega_2|\, \Big|\int_{\Omega_1} \mathrm{d}\alpha_1\Big| + |\Omega_1|\,\|\alpha_2\|_{\mathcal{M}},$$
which, together with $\big|\int_{\Omega_1} \mathrm{d}\alpha_1\big| \leq \|\alpha_1\|_{\mathcal{M}}$ and the first estimate, implies
$$|\Omega_1|\,\|\alpha_2\|_{\mathcal{M}} \leq \|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}} + |\Omega_2|\,\|\alpha_1\|_{\mathcal{M}} \leq 2\,\|\alpha_1 \oplus \alpha_2\|_{\mathcal{M}},$$
which completes the proof. ⊓⊔
The next lemma will be used to show that the negative part of the minimizer of (D) does not have a singular part.

Lemma 2.8 Let $c \in L^1(\Omega)$ and $\alpha_i \in \mathcal{M}(\Omega_i)$ for $i \in \{1, 2\}$ with Lebesgue decompositions $\alpha_i = f_i + \eta_i$ satisfying $f_i \ll \lambda$ and $\eta_i \perp \lambda$ for $i \in \{1, 2\}$.
1. It holds that
$$(\alpha_1 \oplus \alpha_2 - c)_+ = \big(f_1 \oplus f_2 - c + (\eta_1)_+ \oplus (\eta_2)_+\big)_+. \tag{13}$$
2. If $(\alpha_i)_+$ is absolutely continuous for $i = 1, 2$, then for $\tilde\alpha_i = \alpha_i - (\eta_i)_-$, $i = 1, 2$, it holds that
$$\Phi(\tilde\alpha_1, \tilde\alpha_2) \leq \Phi(\alpha_1, \alpha_2).$$
Proof We first prove point 1. The measures $f_i, \eta_i$ exist by Lebesgue's decomposition theorem, see Theorem 1.155 in [12]. We combine these decompositions with $\alpha_1 \oplus \alpha_2 = \alpha_1 \otimes \lambda + \lambda \otimes \alpha_2$ to arrive at Lebesgue's decomposition of $\alpha_1 \oplus \alpha_2$ with respect to $\lambda \otimes \lambda$, namely
$$\alpha_1 \oplus \alpha_2 - c = f_1 \oplus f_2 - c + \eta_1 \oplus \eta_2, \tag{14}$$
$$f_1 \oplus f_2 - c \ll \lambda \otimes \lambda, \tag{15}$$
$$\eta_1 \oplus \eta_2 \perp \lambda \otimes \lambda \tag{16}$$
(which holds true because $c \in L^1(\Omega) \hookrightarrow \mathcal{M}(\Omega)$). Now, we consider the Hahn-Jordan decomposition of $\eta_1$,
$$\eta_1 = (\eta_1)_+ - (\eta_1)_-, \qquad (\eta_1)_+ \perp (\eta_1)_-, \tag{17}$$
and obtain from (14) that
$$\begin{aligned} \alpha_1 \oplus \alpha_2 - c &= (f_1 + \eta_1) \oplus (f_2 + \eta_2) - c = f_1 \oplus f_2 + \eta_1 \oplus \eta_2 - c \\ &= f_1 \oplus f_2 + \big((\eta_1)_+ - (\eta_1)_-\big) \oplus \eta_2 - c \\ &= f_1 \oplus f_2 + (\eta_1)_+ \otimes \lambda - (\eta_1)_- \otimes \lambda + \lambda \otimes \eta_2 - c \\ &= f_1 \oplus f_2 - c + (\eta_1)_+ \oplus \eta_2 - (\eta_1)_- \otimes \lambda. \end{aligned}$$
Furthermore,
$$(\eta_1)_- \otimes \lambda \perp f_1 \oplus f_2 - c + (\eta_1)_+ \oplus \eta_2,$$
where the singularity with respect to $f_1 \oplus f_2 - c$ is due to (15) and (16), and the singularity with respect to $(\eta_1)_+ \oplus \eta_2$ is due to (17). Thus, $(\eta_1)_- \otimes \lambda$ does not contribute to the positive part of $\alpha_1 \oplus \alpha_2 - c$, and arguing analogously for $(\eta_2)_-$ yields (13).
Proposition 2.9 Under Assumption 1 the minimization problem (D’) admits a solu-
tion (α1∗ , α2∗ ) ∈ L1 (Ω1 ) × L1 (Ω2 ).
Proof We proceed via the classical direct method of the calculus of variations. For this purpose, let $\{(\alpha_1^n, \alpha_2^n)\} \subset L^1(\Omega_1) \times L^1(\Omega_2)$ with $(\alpha_1^n \oplus \alpha_2^n - c)_+ \in L^2(\Omega)$ be a minimizing sequence for (D'), where we shift $\alpha_1^n$ and $\alpha_2^n$ by adding and subtracting constants such that we obtain $\int_{\Omega_2} \alpha_2^n \,\mathrm{d}\lambda_2 = 0$. Note that, due to its additive structure, this does not change the objective Φ in (D'), cf. Remark 2.4.

Next, let us define $w^n := \alpha_1^n \oplus \alpha_2^n - c$. Then, thanks to (9) and Lemma 2.5, the sequence $\{w^n\}$ is bounded in $L^1(\Omega)$. Hence, there is a weakly* converging subsequence, which we denote by the same symbol w.l.o.g., i.e. $w^n \rightharpoonup^* \tilde w$ in $\mathcal{M}(\Omega)$. Now, Lemma 2.6 applies, giving that
$$\tilde w_+ \in L^2(\Omega), \tag{18}$$
$$G(\tilde w) \leq \liminf_{n\to\infty} G(w^n). \tag{19}$$
Since $\{w^n\}$ is bounded in $\mathcal{M}(\Omega)$, the same holds for $\{\alpha_1^n \oplus \alpha_2^n\}$ and, as $\alpha_2^n$ is normalized, Lemma 2.7 gives that $\{\alpha_i^n\}$ is bounded in $\mathcal{M}(\Omega_i)$, $i = 1, 2$. Therefore, we can select a further (sub-)subsequence, still denoted by the same symbol to ease notation, such that $\alpha_i^n \rightharpoonup^* \tilde\alpha_i$ in $\mathcal{M}(\Omega_i)$, $i = 1, 2$.
Since the mapping $\mathcal{M}(\Omega_1) \times \mathcal{M}(\Omega_2) \ni (\alpha_1, \alpha_2) \mapsto \alpha_1 \oplus \alpha_2 \in \mathcal{M}(\Omega)$ is the adjoint of the projection mapping $C(\Omega) \ni \varphi \mapsto \big(\int_{\Omega_2} \varphi \,\mathrm{d}\lambda_2, \int_{\Omega_1} \varphi \,\mathrm{d}\lambda_1\big) \in C(\Omega_1) \times C(\Omega_2)$, see Remark 2.2, it is weakly* continuous, so that $\alpha_1^n \oplus \alpha_2^n \rightharpoonup^* \tilde\alpha_1 \oplus \tilde\alpha_2$ in $\mathcal{M}(\Omega)$ and hence $\tilde w = \tilde\alpha_1 \oplus \tilde\alpha_2 - c$.
Next, we investigate the singular parts of $\tilde\alpha_1$ and $\tilde\alpha_2$. We start with the positive part and employ Lebesgue's decomposition of $\tilde\alpha_1$ and $\tilde\alpha_2$:
$$\tilde\alpha_i = \alpha_i^* + \tilde\eta_i, \qquad \alpha_i^* \ll \lambda_i, \quad \tilde\eta_i \perp \lambda_i, \quad i = 1, 2.$$
Proof We again consider the positive and the negative part separately and start with $(\alpha_1^*)_-$. Let $\varphi \in C_c^\infty(\Omega_1)$ and $t > 0$ be fixed, but arbitrary. Then, thanks to
$$0 \leq \big((\alpha_1^* + t\varphi) \oplus \alpha_2^* - c\big)_+ \leq (\alpha_1^* \oplus \alpha_2^* - c)_+ + t\,\varphi_+,$$
Proposition 2.9 implies that $((\alpha_1^* + t\varphi) \oplus \alpha_2^* - c)_+ \in L^2(\Omega)$, so that $(\alpha_1^* + t\varphi, \alpha_2^*)$ is feasible for (D'). Therefore, the optimality of $(\alpha_1^*, \alpha_2^*)$ for (D') yields
$$\int_\Omega \frac{1}{2t}\Big[\big((\alpha_1^* + t\varphi) \oplus \alpha_2^* - c\big)_+^2 - \big(\alpha_1^* \oplus \alpha_2^* - c\big)_+^2\Big] \,\mathrm{d}\lambda - \gamma \int_{\Omega_1} \mu_1 \varphi \,\mathrm{d}\lambda_1 \geq 0 \qquad \forall\, t > 0.$$
Owing to the continuous differentiability of $\mathbb{R} \ni r \mapsto r_+^2 \in \mathbb{R}$ (with derivative $2r_+$), the first integrand converges to $(\alpha_1^* \oplus \alpha_2^* - c)_+ \varphi$ λ-a.e. in Ω for $t \searrow 0$. Moreover, the Lipschitz continuity of the max-function gives that, for $t \leq 1$,
$$\frac{1}{2t}\Big|\big((\alpha_1^* + t\varphi) \oplus \alpha_2^* - c\big)_+^2 - \big(\alpha_1^* \oplus \alpha_2^* - c\big)_+^2\Big| \leq |\varphi|^2 + 2\,|\varphi|\,(\alpha_1^* \oplus \alpha_2^* - c)_+ \qquad \text{a.e. in } \Omega,$$
so that Lebesgue's dominated convergence theorem allows passing to the limit $t \searrow 0$ in the inequality above.
Since $\varphi \in C_c^\infty(\Omega_1)$ was arbitrary, the fundamental lemma of the calculus of variations thus gives
$$\int_{\Omega_2} (\alpha_1^* \oplus \alpha_2^* - c)_+ \,\mathrm{d}\lambda_2 = \gamma\mu_1 \qquad \lambda_1\text{-a.e. in } \Omega_1, \tag{23}$$
where $\delta > 0$ is the threshold for $\mu_1$ from Assumption 1. Now assume that $\alpha_1^* \leq -N$ $\lambda_1$-a.e. on a set $E \subset \Omega_1$ of positive Lebesgue measure. Then
$$\int_{\Omega_2} (\alpha_1^* \oplus \alpha_2^* - c)_+ \,\mathrm{d}\lambda_2 \leq \int_{\Omega_2} \big((-N) \oplus \alpha_2^* - c\big)_+ \,\mathrm{d}\lambda_2 < \gamma\delta \leq \gamma\mu_1 \qquad \lambda_1\text{-a.e. in } E,$$
which contradicts (23). Therefore, $\alpha_1^* > -N$ $\lambda_1$-a.e. in $\Omega_1$, which even implies that $(\alpha_1^*)_- \in L^\infty(\Omega_1)$. Concerning $(\alpha_2^*)_-$, one can argue in exactly the same way to conclude that $(\alpha_2^*)_- \in L^\infty(\Omega_2)$, too.
For the positive parts we find, using (21) and the boundedness of the negative parts proven above, that $(\alpha_i^*)_+ \in L^2(\Omega_i)$, $i = 1, 2$. Note that the constant shift, potentially needed to ensure $\int_{\Omega_2} \alpha_2^* \,\mathrm{d}\lambda_2 = 0$, has no effect on the equation in (21) due to the additive structure of ⊕.

We have thus shown that $(\alpha_1^*, \alpha_2^*)$ is feasible for (D). Since $(\alpha_1^*, \alpha_2^*)$ solves (D'), whose objective is the same as in (D) while its feasible set is larger, this implies that we have found a solution to (D). ⊓⊔
We now show that, if $\pi^*$ is of the form $\pi^* = \gamma^{-1}(\alpha_1^* \oplus \alpha_2^* - c)_+$ with two functions $\alpha_i^* \in L^2(\Omega_i)$, $i = 1, 2$, and has the marginals $\mu_1$ and $\mu_2$, respectively, then it solves the necessary and sufficient optimality conditions of the primal problem (1) in the form of the following variational inequality:
$$\pi^* \in F, \qquad \langle \gamma\pi^* + c,\ \pi - \pi^* \rangle_{L^2} \geq 0 \quad \forall\, \pi \in F. \tag{VI}$$
Herein, F is the (convex) feasible set of (1), i.e.
$$F := \Big\{ \pi \in L^2(\Omega) \,:\, \pi \geq 0 \ \lambda\text{-a.e. in } \Omega,\ \int_{\Omega_2} \pi \,\mathrm{d}\lambda_2 = \mu_1 \ \lambda_1\text{-a.e. in } \Omega_1,\ \int_{\Omega_1} \pi \,\mathrm{d}\lambda_1 = \mu_2 \ \lambda_2\text{-a.e. in } \Omega_2 \Big\}.$$
For this purpose, let $\pi \in F$ be fixed but arbitrary. Multiplying the equality constraints in F with $\alpha_1^*$ and $\alpha_2^*$, respectively, integrating the arising equations, and adding them yields
$$\begin{aligned} \int_{\Omega_1} \mu_1 \alpha_1^* \,\mathrm{d}\lambda_1 + \int_{\Omega_2} \mu_2 \alpha_2^* \,\mathrm{d}\lambda_2 &= \int_\Omega \pi\,(\alpha_1^* \oplus \alpha_2^*) \,\mathrm{d}\lambda \\ &= \int_\Omega \pi\,\big[(\alpha_1^* \oplus \alpha_2^* - c)_+ + c\big] \,\mathrm{d}\lambda - \int_\Omega \pi\,(\alpha_1^* \oplus \alpha_2^* - c)_- \,\mathrm{d}\lambda \\ &\leq \int_\Omega \pi\,(\gamma\pi^* + c) \,\mathrm{d}\lambda, \end{aligned} \tag{25}$$
where we used $\pi \geq 0$ for the last inequality. Using the feasibility of $\pi^*$, we find similarly
$$\begin{aligned} \int_{\Omega_1} \mu_1 \alpha_1^* \,\mathrm{d}\lambda_1 + \int_{\Omega_2} \mu_2 \alpha_2^* \,\mathrm{d}\lambda_2 &= \int_\Omega \pi^*\big[(\alpha_1^* \oplus \alpha_2^* - c) + c\big] \,\mathrm{d}\lambda \\ &= \int_\Omega \gamma^{-1}(\alpha_1^* \oplus \alpha_2^* - c)_+ \big[(\alpha_1^* \oplus \alpha_2^* - c) + c\big] \,\mathrm{d}\lambda \\ &= \int_\Omega \pi^*(\gamma\pi^* + c) \,\mathrm{d}\lambda. \end{aligned} \tag{26}$$
Combining (25) and (26) now yields (VI). As (1) is a strictly convex minimization
problem, this shows that, if π ∗ has the form π ∗ = γ −1 (α1∗ ⊕ α2∗ − c)+ with functions
αi∗ ∈ L2 (Ωi ) and satisfies π ∗ ∈ F , then it is a solution of (1). On the other
hand, we know from Theorem 2.10 that, under Assumption 1 (more or less needed
for the existence of solutions of (1) anyway), there always exist αi∗ ∈ L2 (Ωi ) so
that π ∗ = γ −1 (α1∗ ⊕ α2∗ − c)+ satisfies the equality constraints in F . Therefore, in
summary we have deduced the following:
Theorem 2.11 (Necessary and Sufficient Optimality Conditions for (1)) Under Assumption 1, $\pi^* \in L^2(\Omega)$ is a solution of (1) if and only if there exist functions $\alpha_i^* \in L^2(\Omega_i)$, $i = 1, 2$, such that the following optimality system is fulfilled:
$$\pi^* - \tfrac{1}{\gamma}\big(\alpha_1^* \oplus \alpha_2^* - c\big)_+ = 0 \qquad \lambda\text{-a.e. in } \Omega, \tag{27a}$$
$$\int_{\Omega_2} \big(\alpha_1^* \oplus \alpha_2^* - c\big)_+ \,\mathrm{d}\lambda_2 = \gamma\mu_1 \qquad \lambda_1\text{-a.e. in } \Omega_1, \tag{27b}$$
$$\int_{\Omega_1} \big(\alpha_1^* \oplus \alpha_2^* - c\big)_+ \,\mathrm{d}\lambda_1 = \gamma\mu_2 \qquad \lambda_2\text{-a.e. in } \Omega_2. \tag{27c}$$
The significance of Theorem 2.11 lies in the fact that we can characterize optimality of π by just two equalities in $L^2(\Omega_1)$ and $L^2(\Omega_2)$, respectively, namely (27b) and (27c). Thus, we effectively reduce the size of the problem from searching for one function on $\Omega = \Omega_1 \times \Omega_2$ to searching for two functions, one on $\Omega_1$ and one on $\Omega_2$ (similarly as for entropic regularization, cf. [4]). This will be exploited numerically in Section 3.
As seen before, the dual problem in (D) is not uniquely solvable. One source of non-uniqueness is of course the kernel of the map $(\alpha_1, \alpha_2) \mapsto \alpha_1 \oplus \alpha_2$. This kernel is one-dimensional and is spanned by the pair of constant functions $(1, -1)$, which could easily be taken into account in an algorithmic framework. However, there is another source of non-uniqueness due to the max-operator that cuts off the negative part. Here is a simple example where dual solutions are not unique: For $\Omega_1 = \Omega_2 = [0, 1]$, $\mu_1 = \mu_2 \equiv 1$, $\gamma = 1$, and
$$c(x, y) := \begin{cases} C, & \text{if } \tfrac12 \leq x \leq 1 \text{ and } \tfrac12 \leq y \leq 1, \\ 0, & \text{else}, \end{cases} \qquad \text{with } C > 4,$$
one can show by a straightforward calculation that, for every $\delta \in [0, \tfrac{C-4}{2}]$, the tuple
$$\alpha_1^*(x) = \begin{cases} -1 - \delta, & \text{if } x \in [0, \tfrac12), \\ 1 + \delta, & \text{if } x \in [\tfrac12, 1], \end{cases} \qquad \alpha_2^*(y) = \begin{cases} 1 - \delta, & \text{if } y \in [0, \tfrac12), \\ 3 + \delta, & \text{if } y \in [\tfrac12, 1], \end{cases}$$
solves the optimality system (27b)–(27c).
solves the optimality system (27b)–(27c). This shows that the potential structure
of non-uniqueness might become fairly intricate. A situation like this can certainly
happen in the discretized problem we will derive in Section 2.4 and can lead to
problems when we derive algorithms for the discrete problem since non-unique
solutions imply a degenerate Hessian at the optimum.
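As a sanity check, here is a small numerical verification of the family of dual solutions in the example above (a sketch assuming the reconstruction given there, in particular γ = 1; the value C = 6 and the grid size are arbitrary choices):

```python
import numpy as np

C, n = 6.0, 2000
x = (np.arange(n) + 0.5) / n                  # midpoint grid on [0, 1]
cost = C * ((x[:, None] >= 0.5) & (x[None, :] >= 0.5))
for delta in [0.0, 0.5, (C - 4) / 2]:
    a1 = np.where(x < 0.5, -1 - delta, 1 + delta)
    a2 = np.where(x < 0.5, 1 - delta, 3 + delta)
    plan = np.maximum(a1[:, None] + a2[None, :] - cost, 0.0)
    # both marginals of the plan equal gamma * mu = 1, for every delta
    assert np.allclose(plan.mean(axis=1), 1.0)
    assert np.allclose(plan.mean(axis=0), 1.0)
```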
Therefore, we investigate the following regularization of the dual problem:
$$\min \ \Phi_\varepsilon(\alpha_1, \alpha_2) := \Phi(\alpha_1, \alpha_2) + \frac{\varepsilon}{2}\Big(\|\alpha_1\|_{L^2(\Omega_1)}^2 + \|\alpha_2\|_{L^2(\Omega_2)}^2\Big) \quad \text{s.t.} \quad \alpha_i \in L^2(\Omega_i), \ i = 1, 2. \tag{Dε}$$
Proposition 2.12 Let $\{\varepsilon_n\} \subset \mathbb{R}_+$ be a sequence converging to zero and denote the solutions of (Dε) with $\varepsilon = \varepsilon_n$ by $(\alpha_1^n, \alpha_2^n) \in L^2(\Omega_1) \times L^2(\Omega_2)$. Then the sequence $\{(\alpha_1^n, \alpha_2^n)\}$ admits a weak accumulation point, and every weak accumulation point is also a strong one and a solution of the original dual problem (D).
Proof Let $(\alpha_1^*, \alpha_2^*) \in L^2(\Omega_1) \times L^2(\Omega_2)$ denote an arbitrary globally optimal solution of (D) (whose existence is guaranteed by Theorem 2.10). Then the optimality of $(\alpha_1^*, \alpha_2^*)$ for (D) and of $(\alpha_1^n, \alpha_2^n)$ for (Dε) (with $\varepsilon = \varepsilon_n$) gives
$$\Phi(\alpha_1^*, \alpha_2^*) + \frac{\varepsilon_n}{2}\Big(\|\alpha_1^n\|_{L^2(\Omega_1)}^2 + \|\alpha_2^n\|_{L^2(\Omega_2)}^2\Big) \leq \Phi_{\varepsilon_n}(\alpha_1^n, \alpha_2^n) \leq \Phi_{\varepsilon_n}(\alpha_1^*, \alpha_2^*),$$
which implies
$$\|\alpha_1^n\|_{L^2(\Omega_1)}^2 + \|\alpha_2^n\|_{L^2(\Omega_2)}^2 \leq \|\alpha_1^*\|_{L^2(\Omega_1)}^2 + \|\alpha_2^*\|_{L^2(\Omega_2)}^2. \tag{28}$$
Thus, $\{(\alpha_1^n, \alpha_2^n)\}$ is bounded in $L^2(\Omega_1) \times L^2(\Omega_2)$, which in turn gives the existence of a weak accumulation point as claimed.
Now assume that $(\tilde\alpha_1, \tilde\alpha_2)$ is such a weak accumulation point, i.e.,
$$(\alpha_1^n, \alpha_2^n) \rightharpoonup (\tilde\alpha_1, \tilde\alpha_2) \quad \text{in } L^2(\Omega_1) \times L^2(\Omega_2) \tag{29}$$
for a subsequence. Using again the optimality of $(\alpha_1^*, \alpha_2^*)$ and $(\alpha_1^n, \alpha_2^n)$, respectively, we obtain
$$\Phi(\alpha_1^*, \alpha_2^*) \leq \Phi(\alpha_1^n, \alpha_2^n) \leq \Phi_{\varepsilon_n}(\alpha_1^n, \alpha_2^n) \leq \Phi_{\varepsilon_n}(\alpha_1^*, \alpha_2^*) \to \Phi(\alpha_1^*, \alpha_2^*). \tag{30}$$
On the other hand, by convexity and weak lower semicontinuity of Φ, we get from (29) and (30) that
$$\Phi(\tilde\alpha_1, \tilde\alpha_2) \leq \liminf_{n\to\infty} \Phi(\alpha_1^n, \alpha_2^n) = \lim_{n\to\infty} \Phi(\alpha_1^n, \alpha_2^n) = \Phi(\alpha_1^*, \alpha_2^*),$$
which in turn gives the optimality of the weak limit. Estimate (28) for the choice $(\alpha_1^*, \alpha_2^*) = (\tilde\alpha_1, \tilde\alpha_2)$ shows that
$$\|\alpha_1^n\|_{L^2(\Omega_1)}^2 + \|\alpha_2^n\|_{L^2(\Omega_2)}^2 \leq \|\tilde\alpha_1\|_{L^2(\Omega_1)}^2 + \|\tilde\alpha_2\|_{L^2(\Omega_2)}^2,$$
and hence
$$\liminf_{n\to\infty}\Big(\|\alpha_1^n\|_{L^2(\Omega_1)}^2 + \|\alpha_2^n\|_{L^2(\Omega_2)}^2\Big) \leq \|\tilde\alpha_1\|_{L^2(\Omega_1)}^2 + \|\tilde\alpha_2\|_{L^2(\Omega_2)}^2.$$
If the subsequence in (29) did not converge strongly, then the weak convergence would yield
$$\|\tilde\alpha_1\|_{L^2(\Omega_1)}^2 + \|\tilde\alpha_2\|_{L^2(\Omega_2)}^2 < \liminf_{n\to\infty}\Big(\|\alpha_1^n\|_{L^2(\Omega_1)}^2 + \|\alpha_2^n\|_{L^2(\Omega_2)}^2\Big),$$
a contradiction. Thus, every weak accumulation point is also a strong one. ⊓⊔
Theorem 2.13 Let $\{\varepsilon_n\} \subset \mathbb{R}_+$ be a sequence converging to zero and denote the solutions of (Dε) with $\varepsilon = \varepsilon_n$ again by $(\alpha_1^n, \alpha_2^n) \in L^2(\Omega_1) \times L^2(\Omega_2)$. Moreover, define
$$\pi_n := \tfrac{1}{\gamma}\big(\alpha_1^n \oplus \alpha_2^n - c\big)_+. \tag{31}$$
Then $\pi_n$ converges strongly in $L^2(\Omega)$ to the unique solution $\pi^*$ of (1).
Proof From (28), we know that $\{(\alpha_1^n, \alpha_2^n)\}$ is bounded and hence $\{\pi_n\}$ is bounded in $L^2(\Omega)$. Thus,
$$\pi_n \rightharpoonup \tilde\pi \quad \text{in } L^2(\Omega) \tag{32}$$
for some subsequence. Now we show that $\tilde\pi$ is optimal for (1). Weak closedness of $\{\pi \in L^2(\Omega) : \pi(x_1, x_2) \geq 0 \text{ a.e. in } \Omega\}$ implies $\tilde\pi \geq 0$. Integrating the first-order optimality conditions for (Dε),
$$\int_{\Omega_2} \big(\alpha_1^n \oplus \alpha_2^n - c\big)_+ \,\mathrm{d}\lambda_2 + \varepsilon_n \alpha_1^n = \gamma\mu_1 \qquad \lambda_1\text{-a.e. in } \Omega_1, \tag{33}$$
$$\int_{\Omega_1} \big(\alpha_1^n \oplus \alpha_2^n - c\big)_+ \,\mathrm{d}\lambda_1 + \varepsilon_n \alpha_2^n = \gamma\mu_2 \qquad \lambda_2\text{-a.e. in } \Omega_2, \tag{34}$$
against some $\varphi_1 \in C_c^\infty(\Omega_1)$, inserting the definition of $\pi_n$, and integrating over $\Omega_1$ yields
$$\int_{\Omega_1} \Big(\int_{\Omega_2} \pi_n \,\mathrm{d}\lambda_2\Big)\,\varphi_1 \,\mathrm{d}\lambda_1 = \int_{\Omega_1} \mu_1 \varphi_1 \,\mathrm{d}\lambda_1 - \frac{\varepsilon_n}{\gamma} \int_{\Omega_1} \alpha_1^n \varphi_1 \,\mathrm{d}\lambda_1.$$
Passing to the limit, we obtain
$$\int_{\Omega_1} \Big(\int_{\Omega_2} \tilde\pi \,\mathrm{d}\lambda_2\Big)\,\varphi_1 \,\mathrm{d}\lambda_1 = \int_{\Omega_1} \mu_1 \varphi_1 \,\mathrm{d}\lambda_1,$$
and thus $\tilde\pi$ satisfies the first equality constraint in (1). The second equality constraint can be verified analogously.
To show optimality of $\tilde\pi$, we test the optimality conditions (33) and (34) with $\alpha_1^n$ and $\alpha_2^n$, respectively, and get
$$\begin{aligned} \Phi_{\varepsilon_n}(\alpha_1^n, \alpha_2^n) &= \frac{\gamma^2}{2}\|\pi_n\|_{L^2(\Omega)}^2 - \gamma\int_\Omega \pi_n\,(\alpha_1^n \oplus \alpha_2^n) \,\mathrm{d}\lambda - \frac{\varepsilon_n}{2}\|\alpha_1^n\|_{L^2(\Omega_1)}^2 - \frac{\varepsilon_n}{2}\|\alpha_2^n\|_{L^2(\Omega_2)}^2 \\ &= -\frac{\gamma^2}{2}\|\pi_n\|_{L^2(\Omega)}^2 - \gamma\int_\Omega c\,\pi_n \,\mathrm{d}\lambda - \frac{\varepsilon_n}{2}\|\alpha_1^n\|_{L^2(\Omega_1)}^2 - \frac{\varepsilon_n}{2}\|\alpha_2^n\|_{L^2(\Omega_2)}^2 \\ &= -\gamma E_\gamma(\pi_n) - \frac{\varepsilon_n}{2}\|\alpha_1^n\|_{L^2(\Omega_1)}^2 - \frac{\varepsilon_n}{2}\|\alpha_2^n\|_{L^2(\Omega_2)}^2. \end{aligned}$$
Analogously, testing the optimality system (27b)–(27c) with $\alpha_1^*$ and $\alpha_2^*$ yields
$$\Phi(\alpha_1^*, \alpha_2^*) = -\gamma E_\gamma(\pi^*),$$
where $\pi^* \in L^2(\Omega)$ is the unique solution of (1) and $(\alpha_1^*, \alpha_2^*) \in L^2(\Omega_1) \times L^2(\Omega_2)$ solves the dual problem (D). Now, putting everything so far together, we obtain
$$\lim_{n\to\infty} E_\gamma(\pi_n) = \lim_{n\to\infty}\Big(-\tfrac{1}{\gamma}\Phi_{\varepsilon_n}(\alpha_1^n, \alpha_2^n) - \tfrac{\varepsilon_n}{2\gamma}\|\alpha_1^n\|_{L^2(\Omega_1)}^2 - \tfrac{\varepsilon_n}{2\gamma}\|\alpha_2^n\|_{L^2(\Omega_2)}^2\Big) = -\tfrac{1}{\gamma}\Phi(\alpha_1^*, \alpha_2^*) = E_\gamma(\pi^*).$$
This gives $E_\gamma(\tilde\pi) \leq \liminf_{n\to\infty} E_\gamma(\pi_n) = E_\gamma(\pi^*)$ by weak lower semicontinuity of $E_\gamma$, hence the optimality of $\tilde\pi$, and by strict convexity also uniqueness, i.e. $\tilde\pi = \pi^*$. Thus, the weak limit is unique, and a well-known argument by contradiction therefore implies the weak convergence of the whole sequence $\{\pi_n\}$ to $\pi^*$. Finally, strong convergence follows from a standard argument. ⊓⊔
3 Algorithms
The optimality system (36b), (36c) for the smooth and convex problem (D) can be solved by different methods. In [3] the authors propose to use a generic L-BFGS solver and also derive an alternating minimization scheme, which is similar to the non-linear Gauss-Seidel method in the next section but differs slightly in the numerical realization; [20] also uses an off-the-shelf solver. Here we propose methods that exploit the special structure of the optimality system: a non-linear Gauss-Seidel method and a semismooth Newton method.
The method in this section is similar to the one described in the appendix of [3], but we describe it here for the sake of completeness. A close look at the optimality system
$$\sum_{j=1}^{N} (\alpha_i + \beta_j - c_{ij})_+ = \gamma\nu_i, \qquad i = 1, \dots, M, \tag{38a}$$
$$\sum_{i=1}^{M} (\alpha_i + \beta_j - c_{ij})_+ = \gamma\mu_j, \qquad j = 1, \dots, N, \tag{38b}$$
shows that we can solve all M equations in (38a) for the $\alpha_i$ in parallel (for fixed β), since the i-th equation depends on $\alpha_i$ only. Similarly, all N equations in (38b) can be solved for the $\beta_j$ if α is fixed. Hence, we can perform a non-linear Gauss-Seidel method for these non-smooth equations (also known as alternating minimization, nonlinear SOR, or coordinate descent for Φ [6, 25]), i.e. alternatingly solving the equations (38a) for α (for fixed β) and then the equations (38b) for β (for fixed α). The whole method is stated in Algorithm 1; a sketch is also given below. Since Φ is convex with Lipschitz continuous gradient (cf. Lemma 2.14), the convergence of the algorithm follows from results in [2].
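The following NumPy sketch illustrates this iteration (our own condensed rendering, not Algorithm 1 verbatim); it assumes a routine solve_scalar(y, b) for the one-dimensional equations of type (39) treated next:

```python
import numpy as np

def gauss_seidel_qot(c, nu, mu, gamma, solve_scalar, iters=1000, tol=1e-8):
    """Nonlinear Gauss-Seidel for (38a)-(38b): with beta fixed, the i-th
    row equation determines alpha_i alone, and vice versa; solve_scalar
    solves the one-dimensional equations of type (39)."""
    M, N = c.shape
    alpha, beta = np.zeros(M), np.zeros(N)
    for _ in range(iters):
        for i in range(M):   # (38a): sum_j (alpha_i + beta_j - c_ij)_+ = gamma nu_i
            alpha[i] = solve_scalar(c[i, :] - beta, gamma * nu[i])
        for j in range(N):   # (38b): sum_i (alpha_i + beta_j - c_ij)_+ = gamma mu_j
            beta[j] = solve_scalar(c[:, j] - alpha, gamma * mu[j])
        plan = np.maximum(alpha[:, None] + beta[None, :] - c, 0.0) / gamma
        if (np.abs(plan.sum(axis=1) - nu).max() < tol
                and np.abs(plan.sum(axis=0) - mu).max() < tol):
            break
    return alpha, beta, plan
```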
Of course, one can solve this problem by bisection, but here are two other, more efficient methods to solve equations of the type (39).

Direct search. If we denote by $y_{[j]}$ the j-th smallest entry of y (i.e. we sort y in ascending order), we get that
$$f(x) = \sum_{j=1}^{n} (x - y_{[j]})_+ = \begin{cases} 0, & x \leq y_{[1]}, \\ kx - \sum_{j=1}^{k} y_{[j]}, & y_{[k]} \leq x \leq y_{[k+1]}, \ k = 1, \dots, n-1, \\ nx - \sum_{j=1}^{n} y_{[j]}, & x \geq y_{[n]}. \end{cases}$$
To obtain the solution of (39) we evaluate f at the break points $y_{[j]}$ until we find the interval $[y_{[k]}, y_{[k+1]})$ in which the solution lies (by finding k such that $f(y_{[k]}) \leq b < f(y_{[k+1]})$), and then set
$$x = \frac{b + \sum_{j=1}^{k} y_{[j]}}{k}.$$
The complexity of the method is dominated by the sorting of the vector y, i.e. it is $O(n \log(n))$.
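A sketch of this direct search in NumPy (the function name and interface are our own; it solves $\sum_j (x - y_j)_+ = b$ for $b > 0$):

```python
import numpy as np

def solve_scalar(y, b):
    """Direct search for the x solving f(x) = sum_j (x - y_j)_+ = b (b > 0):
    sort y, evaluate f at the break points, then invert the linear piece."""
    y = np.sort(np.asarray(y, dtype=float))
    prefix = np.cumsum(y)
    # f at the sorted break points: f(y_[k]) = (k-1) y_[k] - sum_{j<k} y_[j]
    f_break = np.arange(y.size) * y - np.concatenate(([0.0], prefix[:-1]))
    # number of active terms on the linear piece containing the solution
    k = int(np.searchsorted(f_break, b, side="right"))
    return (b + prefix[k - 1]) / k
```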
Semismooth Newton. Although f is non-smooth, we may perform Newton's method here. The function f is piecewise linear, and on each interval $(y_{[j]}, y_{[j+1]})$ it has slope j (a simple situation with n = 3 is shown in Figure 1). At the break points we may define $f'(y_{[j]}) = j$ and then we iterate
$$x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}.$$

Fig. 1 The piecewise linear function f(x) for n = 3 with break points $y_{[1]}, y_{[2]}, y_{[3]}$.
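A compact sketch of this one-dimensional semismooth Newton iteration, here written for $f(x) = b$ rather than $f(x) = 0$ (our own formulation; starting at $\max_j y_j$ keeps every iterate on a piece with positive slope):

```python
import numpy as np

def newton_scalar(y, b, tol=1e-12, max_iter=100):
    """Semismooth Newton for f(x) = sum_j (x - y_j)_+ = b, with the slope
    f'(x) taken as the number of active terms #{j : y_j <= x}."""
    y = np.asarray(y, dtype=float)
    x = y.max()                   # start on a piece with positive slope
    for _ in range(max_iter):
        active = y <= x
        fx = np.sum(x - y[active])
        if abs(fx - b) <= tol:
            break
        x -= (fx - b) / max(int(active.sum()), 1)
    return x
```

Since f is convex and piecewise linear, this iteration terminates after finitely many steps.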
As seen in Lemma 2.14, the mapping F is semismooth and hence, we may use a
semismooth Newton method [5, 7].
A simple calculation proves the following lemma.
Lemma 3.1 A Newton derivative of F from (37) at (α, β) is given by
$$G = \begin{pmatrix} \operatorname{diag}(\sigma \mathbb{1}_N) & \sigma \\ \sigma^\top & \operatorname{diag}(\sigma^\top \mathbb{1}_M) \end{pmatrix} \in \mathbb{R}^{(M+N)\times(M+N)},$$
where $\sigma \in \mathbb{R}^{M \times N}$ is given by
$$\sigma_{ij} = \begin{cases} 1, & \alpha_i + \beta_j - c_{ij} \geq 0, \\ 0, & \text{otherwise}. \end{cases}$$
A step of the semismooth Newton method for the solution of $F(\alpha, \beta) = 0$ would consist of setting
$$\begin{pmatrix} \alpha^{k+1} \\ \beta^{k+1} \end{pmatrix} = \begin{pmatrix} \alpha^k \\ \beta^k \end{pmatrix} - \begin{pmatrix} \delta_\alpha^k \\ \delta_\beta^k \end{pmatrix}, \qquad \text{where} \quad F(\alpha^k, \beta^k) = G \begin{pmatrix} \delta_\alpha^k \\ \delta_\beta^k \end{pmatrix}.$$
However, the next lemma shows that G has a non-trivial kernel.

Lemma 3.2 Let G be the Newton derivative of F at (α, β) defined in Lemma 3.1. Then the following holds true:
1. $G \in \mathbb{R}^{(M+N)\times(M+N)}$ is symmetric,
2. G is positive semi-definite,
3. $(a, b) \in \ker(G)$ if and only if $\sigma_{ij}(a_i + b_j) = 0$ for all $1 \leq i \leq M$, $1 \leq j \leq N$.
Proof Symmetry of G is clear by construction. To see that G is positive semi-definite, we calculate
$$(a, b)^\top G\,(a, b) = \sum_{j=1}^{N}\sum_{i=1}^{M} \sigma_{ij} a_i^2 + \sum_{j=1}^{N}\sum_{i=1}^{M} \sigma_{ij} b_j^2 + 2\sum_{j=1}^{N}\sum_{i=1}^{M} \sigma_{ij} a_i b_j = \sum_{j=1}^{N}\sum_{i=1}^{M} \sigma_{ij} (a_i + b_j)^2 \geq 0.$$
4 Numerical examples
4.1 Illustration of γ → 0
In our first numerical example we illustrate how the solutions $\pi^*$ of the regularized problem converge for vanishing regularization parameter $\gamma \to 0$. We generate some marginals, fix a transport cost, compute solutions of the discretized transport problems (35) for a sequence $\gamma_n \to 0$, and illustrate the optimal transport plans (and the related regularized transport costs). Our marginals are non-negative functions sampled at equidistant points $x_i$, $y_j$ in the interval $[0, 1]$ with $M = N = 400$, and the cost $c_{ij} = (x_i - y_j)^2$ is the squared distance between the sampling points. The results are shown in Figure 2. One observes that the optimal transport plans converge to a measure that is singular and supported on the graph of a monotonically increasing function, exactly as the fundamental theorem of optimal transport [1] predicts.
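For illustration, a hypothetical driver for this experiment reusing the gauss_seidel_qot and solve_scalar sketches from Section 3 (the marginals below are stand-ins of our own, not the ones used for Figure 2):

```python
import numpy as np

N = 400
x = (np.arange(N) + 0.5) / N                 # equidistant sample points
c = (x[:, None] - x[None, :]) ** 2           # squared-distance cost
nu = 1.0 / (1.0 + 200 * (x - 0.3) ** 2)      # stand-in marginals
mu = 1.0 / (1.0 + 100 * (x - 0.7) ** 2)
nu, mu = nu / nu.sum(), mu / mu.sum()        # normalize total mass to 1
for gamma in [1e-1, 1e-2, 1e-3]:
    # small gamma may require many Gauss-Seidel sweeps to converge
    alpha, beta, plan = gauss_seidel_qot(c, nu, mu, gamma, solve_scalar)
    # plans concentrate near the graph of a monotone map as gamma -> 0
    print(gamma, np.count_nonzero(plan))
```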
We repeat the same experiment where the cost is the (non-squared) distance $c_{ij} = |x_i - y_j|$. Here we had to choose larger regularization parameters, as it turned out that values similar to Figure 2 would lead to almost indistinguishable results. The results are shown in Figure 3. Note the different structure of the transport plan (which is again in agreement with the predicted results from the fundamental theorem of optimal transport). In Figure 4 we show the results for the concave but increasing cost $c_{ij} = \sqrt{|x_i - y_j|}$ and again observe the expected effect that a concave transport cost encourages as much mass as possible to stay in place (as can be seen by the concentration of mass along the diagonal of the transport plan).
While we did not analyze our algorithms in the continuous case, we made an
experiment to see how the methods converge when we change the mesh size of the
discretization. To that end, we did a simple piecewise constant approximation of
the marginals, the cost and the transport plan as described in Appendix A. This
derivation shows that one has to scale up the marginals for finer discretization (or,
equivalently, scale down the regularization parameter γ ) to get consistent results.
We also took care to adapt the termination criteria so that we terminate the
algorithms when the continuous counterpart of the termination criteria is satisfied
(again, see Table 1 in Appendix A for details).
We used marginals $\mu, \nu : [0, 1] \to [0, \infty[$ of the form
$$\mu(x) = r\,\frac{1}{1 + m(x-a)^2}, \qquad \nu(x) = s\left(\frac{1}{1 + m_1(x-a_1)^2} + \frac{1}{1 + m_2(x-a_2)^2}\right).$$
Fig. 5 Number of iterations for the semismooth Newton method to achieve a desired accuracy, plotted over the discretization size M = N. Each graph corresponds to one instance of the problem.
Fig. 6 Number of iterations for the nonlinear Gauss-Seidel method to achieve a desired accuracy, plotted over the discretization size M = N. Each graph corresponds to one instance of the problem.

The optimal transport problem (1) with these two marginals does not fulfill Assumption 1, since the marginals are not $L^2$-functions. However, we can consider it as a discrete optimal transport problem in the form (35) when we denote
$c_{ij} = c(x_i, y_j)$ (for some cost c) and marginals $\mathbb{1}_M$ and $\mathbb{1}_N$, respectively. We solve this discrete optimal transport problem and obtain a transport plan $\pi^*$. Since we use quadratic regularization, the plan will be sparse and hence we can visualize it by plotting arrows from $x_i$ to $y_j$, making the thickness of the arrows proportional to the size of the entry $\pi^*_{ij}$. In other words: the thickness of the arrow from $x_i$ to $y_j$ indicates how much of the mass in $x_i$ has been transported to $y_j$.
In Figure 7 we show the result for N = 80 samples from an anisotropic Gaussian distribution (centered at the origin) and M = 120 samples from a uniform distribution on a segment of an annulus. We used $c(x_i, y_j) = \|x_i - y_j\|_2$ with the Euclidean norm and regularization parameter γ = 1. The resulting plan $\pi^*$ has 212 non-zero entries. For comparison, we show the result of entropically regularized optimal transport in the same situation in Figure 8. We used γ = 0.05 (which is the smallest value for which our naive implementation of the Sinkhorn algorithm is still stable). The resulting plan has 6730 nonzero entries, and we only plot lines for the transports which are larger than 1% of the largest entry in the optimal transport plan.
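For reference, the naive Sinkhorn iteration used for this comparison can be sketched as follows (a standard textbook version; the exact implementation behind Figure 8 is not shown here):

```python
import numpy as np

def sinkhorn(c, nu, mu, gamma, iters=2000):
    """Naive Sinkhorn iteration for entropically regularized OT: the plan
    diag(u) K diag(v) is strictly positive, i.e. dense, in contrast to
    the sparse plans of quadratic regularization."""
    K = np.exp(-c / gamma)        # overflows/underflows for small gamma
    u = np.ones(c.shape[0])
    for _ in range(iters):
        v = mu / (K.T @ u)        # match column marginals
        u = nu / (K @ v)          # match row marginals
    return u[:, None] * K * v[None, :]
```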
Fig. 7 Illustration of the quadratically regularized optimal transport between empirical distributions. Left: source distribution µ̂ denoted by blue stars and target distribution ν̂ denoted by red circles, together with lines that indicate the transport. Right: the transport plan and its histogram in semi-log scale.

Fig. 8 Illustration of the entropically regularized optimal transport between empirical distributions. Left: source distribution µ̂ denoted by blue stars and target distribution ν̂ denoted by red circles, together with lines that indicate the transport. Right: the transport plan and its histogram in semi-log scale.

5 Conclusion
One direction for future work is to investigate how special structure of the cost function c may help to reduce the cost of assembling the sparse matrix σ.
Acknowledgements We would like to thank the reviewer for helpful suggestions that led to an improved presentation, and also Stephan Walther (TU Dortmund) for helping with the construction of the counterexample in Section 2.3.
For the sake of brevity, we just consider an equidistant discretization of [0, 1] into N intervals using piecewise constant ansatz functions, i.e.
$$\pi(x, y) := \sum_{i,j=0}^{N-1} \pi_{ij}\, \chi_{(\frac{i}{N}, \frac{i+1}{N}) \times (\frac{j}{N}, \frac{j+1}{N})}(x, y)$$
for coefficients $\pi_{ij}$, and assume analogous definitions for the quantities c, $\mu^+$, $\mu^-$, α, and β. They have to coincide on average over the intervals. Again, we study this for π and obtain that the identity
$$\int_{\frac{i}{N}}^{\frac{i+1}{N}} \int_{\frac{j}{N}}^{\frac{j+1}{N}} \pi(x, y) \,\mathrm{d}y\,\mathrm{d}x = \int_{\frac{i}{N}}^{\frac{i+1}{N}} \int_{\frac{j}{N}}^{\frac{j+1}{N}} \sum_{k,l=0}^{N-1} \pi_{kl}\, \chi_{(\frac{k}{N}, \frac{k+1}{N}) \times (\frac{l}{N}, \frac{l+1}{N})}(x, y) \,\mathrm{d}y\,\mathrm{d}x = \frac{1}{N^2}\,\pi_{ij}$$
holds. Again, analogous identities hold for the quantities c, $\mu^+$, $\mu^-$, α, and β. The ones with one-dimensional domain are scaled by $\frac{1}{N}$ instead of $\frac{1}{N^2}$.
Now, we consider the discrete Algorithm 2, which operates on discrete quantities, and establish a consistent mapping of the quantities from the discretization to the ones of the solver. We denote its input quantities by $\bar c_{ij}$, $\bar\mu_i^-$, $\bar\mu_j^+$ and its output quantities by $\bar\alpha_i$, $\bar\beta_j$, $\bar\pi_{ij}$, and $\bar E$. It solves for
$$\sum_{j=0}^{N-1} \bar\pi_{ij} = \gamma\,\bar\mu_i^+,$$
Plugging in $N\mu_i^- = \bar\mu_i^-$, $N\mu_j^+ = \bar\mu_j^+$ and $\gamma\pi_{ij} = \bar\pi_{ij}$ gives
$$\bar E = \frac{\gamma^2}{2} \sum_{i,j=0}^{N-1} \pi_{ij}^2 - \gamma N \left( \sum_{i=0}^{N-1} \bar\alpha_i \mu_i^- + \sum_{j=0}^{N-1} \bar\beta_j \mu_j^+ \right).$$
i=0 j=0
1
Thus, the consistent identity E = γN 2
Ē follows if we choose ᾱi := αi and β̄i := βi . The solver
computes ᾱi as the solution of
N −1
(ᾱi + β̄j − c̄ij )+ = γ µ̄−
X
i ,
j=0
26 Dirk A. Lorenz et al.
N −1
1 X
(αi + βj − cij )+ = γµ−
i
N j=0
in terms of the coefficients. Plugging in the choices αi = ᾱi , βj = β̄j , cij = c̄ij and N µ− −
i = µ̄i
yields equivalence of the latter equation to
N −1
1 X 1
(ᾱi + β̄j − c̄ij )+ = γ µ̄− ,
N j=0 N j
which is equivalent to the equation that is solved by Algorithm 2. The argument for µ̄+
j is
carried out analogously.
Regarding termination, the solver checks the criteria
$$\Big| \frac{1}{\gamma} \sum_{j=0}^{N-1} \bar\pi_{ij} - \bar\mu_i^- \Big| < \tau \qquad \text{and} \qquad \Big| \frac{1}{\gamma} \sum_{i=0}^{M-1} \bar\pi_{ij} - \bar\mu_j^+ \Big| < \tau.$$
We only consider the first and plug the identity $\gamma\pi_{ij} = \bar\pi_{ij}$ into it, which gives equivalence to
$$\Big| \sum_{j=0}^{N-1} \pi_{ij} - N\mu_i^- \Big| < \tau.$$
We summarize the choices for the consistent mapping of quantities arising from the discretization to quantities the solver operates on in Table 1. Finally, we make a note on the calculation of the coefficients $c_{ij}$ for the cost function $c(x, y) := (x-y)^2$:
$$c_{ij} = N^2 \int_{\frac{i}{N}}^{\frac{i+1}{N}} \int_{\frac{j}{N}}^{\frac{j+1}{N}} (x - y)^2 \,\mathrm{d}y\,\mathrm{d}x = \dots = \frac{1}{N^2}\Big( (i-j)^2 + \frac{1}{6} \Big).$$
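The omitted computation can be verified symbolically, e.g. with sympy (a small check of the stated closed form):

```python
import sympy as sp

# Symbolic verification of the cell-averaged cost coefficients for
# c(x, y) = (x - y)^2, matching the closed form stated above.
x, y = sp.symbols('x y')
i, j, N = sp.symbols('i j N', positive=True)
cij = N**2 * sp.integrate(
    sp.integrate((x - y)**2, (y, j/N, (j + 1)/N)),
    (x, i/N, (i + 1)/N))
assert sp.simplify(cij - ((i - j)**2 + sp.Rational(1, 6)) / N**2) == 0
```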
Conflict of Interest: The authors declare that they have no conflict of interest.
References
1. Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.
2. Dimitri P. Bertsekas. Nonlinear programming. Athena Scientific Optimization and Com-
putation Series. Athena Scientific, Belmont, MA, third edition, 2016.
3. Mathieu Blondel, Vivien Seguy, and Antoine Rolet. Smooth and sparse optimal transport.
In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First In-
ternational Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings
of Machine Learning Research, pages 880–889, Playa Blanca, Lanzarote, Canary Islands,
09–11 Apr 2018. PMLR.
4. Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of
entropic schemes for optimal transport and gradient flows. SIAM Journal on Mathematical
Analysis, 49(2):1385–1418, 2017.
5. Xiaojun Chen. Superlinear convergence of smoothing quasi-Newton methods for nonsmooth equations. Journal of Computational and Applied Mathematics, 80(1):105–126, 1997.
6. Xiaojun Chen. On convergence of SOR methods for nonsmooth equations. Numer. Linear
Algebra Appl., 9(1):81–92, 2002.
7. Xiaojun Chen, Zuhair Nashed, and Liqun Qi. Smoothing methods and semismooth meth-
ods for nondifferentiable operator equations. SIAM J. Numer. Anal., 38(4):1200–1216,
2000.
8. Christian Clason, Dirk A Lorenz, Hinrich Mahler, and Benedikt Wirth. Entropic regu-
larization of continuous optimal transport problems. arXiv preprint arXiv:1906.01333,
2019.
9. Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In
Advances in neural information processing systems, pages 2292–2300, 2013.
10. Marco Cuturi and Gabriel Peyré. A smoothed dual approach for variational Wasserstein
problems. SIAM J. Imaging Sci., 9(1):320–343, 2016.
11. Montacer Essid and Justin Solomon. Quadratically regularized optimal transport on
graphs. SIAM Journal on Scientific Computing, 40(4):A1961–A1986, 2018.
12. Irene Fonseca and Giovanni Leoni. Modern methods in the calculus of variations: Lp
spaces. Springer Monographs in Mathematics. Springer, New York, 2007.
13. Aude Genevay, Gabriel Peyré, and Marco Cuturi. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pages 1608–1617, 2018.
14. Michael Hinze, René Pinnau, Michael Ulbrich, and Stefan Ulbrich. Optimization with
PDE constraints, volume 23. Springer Science & Business Media, 2008.
15. Leonid V. Kantorovič. On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS
(N.S.), 37:199–201, 1942.
16. Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal transport with proximal
splitting. SIAM J. Imaging Sci., 7(1):212–238, 2014.
17. Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and
Trends in Machine Learning, 11(5-6):355–607, 2019.
18. Svetlozar T. Rachev and Ludger Rüschendorf. Mass transportation problems. Vol. I.
Probability and its Applications (New York). Springer-Verlag, New York, 1998. Theory.
19. Svetlozar T. Rachev and Ludger Rüschendorf. Mass transportation problems. Vol. II.
Probability and its Applications (New York). Springer-Verlag, New York, 1998. Applica-
tions.
20. Lucas Roberts, Leo Razoumov, Lin Su, and Yuyang Wang. Gini-regularized optimal trans-
port with an application to spatio-temporal forecasting. arXiv preprint arXiv:1712.02512,
2017.
21. Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of
Progress in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer,
Cham, 2015. Calculus of variations, PDEs, and modeling.
22. Fredi Tröltzsch. Regular Lagrange multipliers for control problems with mixed pointwise
control-state constraints. SIAM Journal on Optimization, 15:616–634, 2005.
23. Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Math-
ematics. American Mathematical Society, Providence, RI, 2003.
24. Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathe-
matischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-
Verlag, Berlin, 2009.
25. Stephen J. Wright. Coordinate descent algorithms. Math. Program., 151(1, Ser. B):3–34,
2015.