Optimal Transport
Cédric Villani
Springer
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Do mo chuisle mo chroí, Aëlle
Contents

Preface
Conventions
Introduction
4 Basic properties
12 Smoothness
References
Index
When I was first approached for the 2005 edition of the Saint-Flour
Probability Summer School, I was intrigued, flattered and scared.¹
Apart from the challenge posed by the teaching of a rather analytical
subject to a probabilistic audience, there was the danger of producing
a remake of my recent book Topics in Optimal Transportation.
However, I gradually realized that I was being offered a unique op-
portunity to rewrite the whole theory from a different perspective, with
alternative proofs and a different focus, and a more probabilistic pre-
sentation; plus the incorporation of recent progress. Among the most
striking of these recent advances, there was the rising awareness that
John Mather’s minimal measures had a lot to do with optimal trans-
port, and that both theories could actually be embedded in a single
framework. There was also the discovery that optimal transport could
provide a robust synthetic approach to Ricci curvature bounds. These
links with dynamical systems on one hand, differential geometry on
the other hand, were only briefly alluded to in my first book; here on
the contrary they will be at the basis of the presentation. To summa-
rize: more probability, more geometry, and more dynamical systems.
Of course there cannot be more of everything, so in some sense there
is less analysis and less physics, and also there are fewer digressions.
So the present course is by no means a reduction or an expansion of
my previous book, but should be regarded as a complementary reading.
Both sources can be read independently, or together, and hopefully the
complementarity of points of view will have pedagogical value.
Throughout the book I have tried to optimize the results and the
presentation, to provide complete and self-contained proofs of the most
important results, and comprehensive bibliographical notes — a daunt-
ingly difficult task in view of the rapid expansion of the literature. Many
statements and theorems have been written specifically for this course,
and many results appear in rather sharp form for the first time. I also
added several appendices, either to present some domains of mathe-
matics to non-experts, or to provide proofs of important auxiliary re-
sults. All this has resulted in a rapid growth of the document, which in
the end is about six times (!) the size that I had planned initially. So
the non-expert reader is advised to skip long proofs at first
reading, and concentrate on explanations, statements, examples and
sketches of proofs when they are available.
¹ Fans of Tom Waits may have identified this quotation.
Several parts of the subject are not developed as much as they would
deserve. Numerical simulation is not addressed at all, except for a few
comments in the concluding part. The regularity theory of optimal
transport is described in Chapter 12 (including the remarkable recent
works of Xu-Jia Wang, Neil Trudinger and Grégoire Loeper), but with-
out the core proofs and latest developments; this is not only because
of the technicality of the subject, but also because smoothness is not
needed in the rest of the book. Still another poorly developed subject is
the Monge–Mather–Mañé problem arising in dynamical systems, and
including as a variant the optimal transport problem when the cost
function is a distance. This topic is discussed in several treatises, such as
Albert Fathi’s monograph, Weak KAM theorem in Lagrangian dynam-
ics; but now it would be desirable to rewrite everything in a framework
that also encompasses the optimal transport problem. An important
step in this direction was recently performed by Patrick Bernard and
Boris Buffoni. In Chapter 8 I shall provide an introduction to Mather’s
theory, but there would be much more to say.
The treatment of Chapter 22 (concentration of measure) is strongly
influenced by Michel Ledoux’s book, The Concentration of Measure
Phenomenon; while the results of Chapters 23 to 25 owe a lot to
the monograph by Luigi Ambrosio, Nicola Gigli and Giuseppe Savaré,
Gradient flows in metric spaces and in the space of probability mea-
sures. Both references are warmly recommended complementary read-
ing. One can also consult the two-volume treatise by Svetlozar Rachev
and Ludger Rüschendorf, Mass Transportation Problems, for many ap-
plications of optimal transport to various fields of probability theory.
While writing this text I asked for help from a number of friends
and collaborators. Among them, Luigi Ambrosio and John Lott are
the ones whom I requested most to contribute; this book owes a lot
to their detailed comments and suggestions. Most of Part III, but also
significant portions of Parts I and II, are made up of ideas taken from
my collaborations with John, which started in 2004 as I was enjoying
the hospitality of the Miller Institute in Berkeley. Frequent discussions
with Patrick Bernard and Albert Fathi allowed me to get the links
between optimal transport and John Mather’s theory, which were a
key to the presentation in Part I; John himself gave precious hints
about the history of the subject. Neil Trudinger and Xu-Jia Wang spent
vast amounts of time teaching me the regularity theory of Monge–
Ampère equations. Alessio Figalli took up the dreadful challenge to
check the entire set of notes from first to last page. Apart from these
people, I got valuable help from Stefano Bianchini, François Bolley,
Yann Brenier, Xavier Cabré, Vincent Calvez, José Antonio Carrillo,
Dario Cordero-Erausquin, Denis Feyel, Sylvain Gallot, Wilfrid Gangbo,
Diogo Aguiar Gomes, Nathaël Gozlan, Arnaud Guillin, Nicolas Juillet,
Kazuhiro Kuwae, Michel Ledoux, Grégoire Loeper, Francesco Maggi,
Robert McCann, Shin-ichi Ohta, Vladimir Oliker, Yann Ollivier, Felix
Otto, Ludger Rüschendorf, Giuseppe Savaré, Walter Schachermayer,
Benedikt Schulte, Theo Sturm, Josef Teichmann, Anthon Thalmaier,
Hermann Thorisson, Süleyman Üstünel, Anatoly Vershik, and others.
Short versions of this course were tried on mixed audiences in the
Universities of Bonn, Dortmund, Grenoble and Orléans, as well as the
Borel seminar in Leysin and the IHES in Bures-sur-Yvette. Part of
the writing was done during stays at the marvelous MFO Institute
in Oberwolfach, the CIRM in Luminy, and the Australian National
University in Canberra. All these institutions are warmly thanked.
It is a pleasure to thank Jean Picard for all his organization work
on the 2005 Saint-Flour summer school; and the participants for their
questions, comments and bug-tracking, in particular Sylvain Arlot
(great bug-tracker!), Fabrice Baudoin, Jérôme Demange, Steve Evans
(whom I also thank for his beautiful lectures), Christophe Leuridan,
Jan Oblój, Erwan Saint Loubert Bié, and others. I extend these thanks
to the joyful group of young PhD students and maîtres de conférences
with whom I spent such a good time on excursions, restaurants, quan-
tum ping-pong and other activities, making my stay in Saint-Flour
truly wonderful (with special thanks to my personal driver, Stéphane
Loisel, and my table tennis sparring-partner and adversary, François
Simenhaus). I will cherish my visit there in memory as long as I live!
Typing these notes was mostly performed on my (now defunct)
faithful laptop Torsten, a gift of the Miller Institute. Support by the
Agence Nationale de la Recherche and Institut Universitaire de France
is acknowledged. My eternal gratitude goes to those who made fine
typesetting accessible to every mathematician, most notably Donald
Knuth for TeX, and the developers of LaTeX, BibTeX and XFig. Final
thanks to Catriona Byrne and her team for a great editing process.
As usual, I encourage all readers to report mistakes and misprints.
I will maintain a list of errata, accessible from my Web page.
Cédric Villani
Lyon, June 2008
Conventions
Axioms
I use the classical axioms of set theory; not the full version of the axiom
of choice (only the classical axiom of “countable dependent choice”).
Sets and structures
Id is the identity mapping, whatever the space. If A is a set then the
function 1A is the indicator function of A: 1A (x) = 1 if x ∈ A, and 0
otherwise. If F is a formula, then 1F is the indicator function of the
set defined by the formula F .
If f and g are two functions, then (f, g) is the function x ↦ (f(x), g(x)). The composition f ◦ g will often be denoted by f(g).
N is the set of positive integers: N = {1, 2, 3, . . .}. A sequence is
written (xk )k∈N , or simply, when no confusion seems possible, (xk ).
R is the set of real numbers. When I write Rn it is implicitly assumed
that n is a positive integer. The Euclidean scalar product between two
vectors a and b in Rn is denoted interchangeably by a · b or ⟨a, b⟩. The
Euclidean norm will be denoted simply by | · |, independently of the
dimension n.
Mn (R) is the space of real n × n matrices, and In the n × n identity
matrix. The trace of a matrix M will be denoted by tr M, its determinant by det M, its adjoint by M∗, and its Hilbert–Schmidt norm √(tr (M∗M)) by ‖M‖HS (or just ‖M‖).
Unless otherwise stated, Riemannian manifolds appearing in the
text are finite-dimensional, smooth and complete. If a Riemannian
manifold M is given, I shall usually denote by n its dimension, by
d the geodesic distance on M , and by vol the volume (= n-dimensional
Hausdorff) measure on M . The tangent space at x will be denoted by
Tx M , and the tangent bundle by TM . The norm on Tx M will most
of the time be denoted by | · |, as in Rn , without explicit mention of
the point x. (The symbol ‖ · ‖ will be reserved for special norms or
functional norms.) If S is a set without smooth structure, the notation
Tx S will instead denote the tangent cone to S at x (Definition 10.46).
If Q is a quadratic form defined on Rn, or on the tangent bundle of a manifold, its value on a (tangent) vector v will be denoted by ⟨Q · v, v⟩, or simply Q(v).
The open ball of radius r and center x in a metric space X is denoted
interchangeably by B(x, r) or Br (x). If X is a Riemannian manifold,
the distance is of course the geodesic distance. The closed ball will be
denoted interchangeably by B[x, r] or Br] (x). The diameter of a metric
space X will be denoted by diam (X ).
Function spaces
C(X ) is the space of continuous functions X → R, Cb (X ) the space
of bounded continuous functions X → R; and C0 (X ) the space of
continuous functions X → R converging to 0 at infinity; all of them
are equipped with the norm of uniform convergence ‖ϕ‖∞ = sup |ϕ|.
Then Cbk (X ) is the space of k-times continuously differentiable func-
tions u : X → R, such that all the partial derivatives of u up to order k
are bounded; it is equipped with the norm given by the supremum of all norms ‖∂u‖Cb , where ∂u is a partial derivative of order at most k;
Cck (X ) is the space of k-times continuously differentiable functions with
compact support; etc. When the target space is not R but some other
space Y, the notation is transformed in an obvious way: C(X ; Y), etc.
Lp is the Lebesgue space of exponent p; the space and the measure
will often be implicit, but clear from the context.
Calculus
The derivative of a function u = u(t), defined on an interval of R and
valued in Rn or in a smooth manifold, will be denoted by u′ , or more
often by u̇. The notation d+u/dt stands for the upper right-derivative of a real-valued function u: d+u/dt = lim sup_{s↓0} [u(t + s) − u(t)]/s.
If u is a function of several variables, the partial derivative with
respect to the variable t will be denoted by ∂t u, or ∂u/∂t. The notation
ut does not stand for ∂t u, but for u(t).
The gradient operator will be denoted by grad or simply ∇; the di-
vergence operator by div or ∇· ; the Laplace operator by ∆; the Hessian
operator by Hess or ∇2 (so ∇2 does not stand for the Laplace opera-
tor). The notation is the same in Rn or in a Riemannian manifold. ∆ is
the divergence of the gradient, so it is typically a nonpositive operator.
The value of the gradient of f at point x will be denoted either by
∇x f or ∇f(x). The notation ∇̃ stands for the approximate gradient,
introduced in Definition 10.2.
If T is a map Rn → Rn , ∇T stands for the Jacobian matrix of T ,
that is the matrix of all partial derivatives (∂Ti /∂xj ) (1 ≤ i, j ≤ n).
All these differential operators will be applied to (smooth) functions
but also to measures, by duality. For instance, the Laplacian of a measure µ is defined via the identity ∫ ζ d(∆µ) = ∫ (∆ζ) dµ (ζ ∈ Cc2). The
notation is consistent in the sense that ∆(f vol) = (∆f ) vol. Similarly,
I shall take the divergence of a vector-valued measure, etc.
f = o(g) means f /g −→ 0 (in an asymptotic regime that should be
clear from the context), while f = O(g) means that f /g is bounded.
log stands for the natural logarithm with base e.
The positive and negative parts of x ∈ R are defined respectively
by x+ = max (x, 0) and x− = max (−x, 0); both are nonnegative, and
|x| = x+ + x− . The notation a ∧ b will sometimes be used for min (a, b).
All these notions are extended in the usual way to functions and also
to signed measures.
Probability measures
δx is the Dirac mass at point x.
All measures considered in the text are Borel measures on Polish
spaces, which are complete, separable metric spaces, equipped with
their Borel σ-algebra. I shall usually not use the completed σ-algebra,
except on some rare occasions (emphasized in the text) in Chapter 5.
A measure is said to be finite if it has finite mass, and locally finite
if it attributes finite mass to compact sets.
² Depending on the authors, the measure T#µ is often denoted by T#µ, T∗µ, T(µ), Tµ, ∫ δT(a) µ(da), µ ◦ T−1, µT−1, or µ[T ∈ · ].
If µ and ν are the only laws in the problem, then without loss of
generality one may choose Ω = X × Y. In a more measure-theoretical
formulation, coupling µ and ν means constructing a measure π on X ×Y
such that π admits µ and ν as marginals on X and Y respectively.
The following three statements are equivalent ways to rephrase that
marginal condition:
• π = (Id , T )# µ.
Here below are some of the most famous couplings used in mathematics
— of course the list is far from complete, since everybody has his or
her own preferred coupling technique. Each of these couplings comes
with its own natural setting; this variety of assumptions reflects the
variety of constructions. (This is a good reason to state each of them
with some generality.)
and set
T = G−1 ◦ F.
If µ does not have atoms, then T# µ = ν. This rearrangement is quite
simple, explicit, as smooth as can be, and enjoys good geometric
properties.
4. The Knothe–Rosenblatt rearrangement in Rn . Let µ and ν be
two probability measures on Rn , such that µ is absolutely continu-
ous with respect to Lebesgue measure. Then define a coupling of µ
and ν as follows.
Step 1: Take the marginal on the first variable: this gives probabil-
ity measures µ1 (dx1 ), ν1 (dy1 ) on R, with µ1 being atomless. Then
define y1 = T1 (x1 ) by the formula for the increasing rearrangement
of µ1 into ν1 .
Step 2: Now take the marginal on the first two variables and dis-
integrate it with respect to the first variable. This gives proba-
bility measures µ2 (dx1 dx2 ) = µ1 (dx1 ) µ2 (dx2 |x1 ), ν2 (dy1 dy2 ) =
ν1 (dy1 ) ν2 (dy2 |y1 ). Then, for each given x1 ∈ R, set y1 = T1 (x1 ),
and define y2 = T2 (x2 ; x1 ) by the formula for the increasing rear-
rangement of µ2 (dx2 |x1 ) into ν2 (dy2 |y1 ). (See Figure 1.1.)
Then repeat the construction, adding variables one after the other and defining y3 = T3(x3; x1, x2); etc. After n steps, this produces a map y = T(x) which transports µ to ν, and in practical situations might be computed explicitly with little effort (see the numerical sketch after this list). Moreover, the
Jacobian matrix of the change of variables T is (by construction)
upper triangular with positive entries on the diagonal; this makes
it suitable for various geometric applications. On the negative side,
this mapping does not satisfy many interesting intrinsic properties;
it is not invariant under isometries of Rn , not even under relabeling
of coordinates.
5. The Holley coupling on a lattice. Let µ and ν be two discrete
probabilities on a finite lattice Λ, say {0, 1}N , equipped with the
natural partial ordering (x ≤ y if xn ≤ yn for all n). Assume that
Fig. 1.1. Second step in the construction of the Knothe–Rosenblatt map: After the
correspondence x1 → y1 has been determined, the conditional probability of x2 (seen
as a one-dimensional probability on a small “slice” of width dx1 ) can be transported
to the conditional probability of y2 (seen as a one-dimensional probability on a slice
of width dy1 ).
There are many variants with important differences which are all intended
to make two trajectories close to each other after some time: the
Ornstein coupling, the ε-coupling (in which one requires the
two variables to be close, rather than to occupy the same state),
the shift-coupling (in which one allows an additional time-shift),
etc.
8. The optimal coupling or optimal transport. Here one intro-
duces a cost function c(x, y) on X × Y, that can be interpreted
as the work needed to move one unit of mass from location x to
location y. Then one considers the Monge–Kantorovich minimization problem

inf E c(X, Y),

where the pair (X, Y) runs over all possible couplings of (µ, ν); or equivalently, in terms of measures,

inf ∫_{X×Y} c(x, y) dπ(x, y),

where the infimum runs over all couplings π of (µ, ν) (a numerical sketch of this problem for finitely supported marginals follows this list).
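When µ and ν are finitely supported, the Monge–Kantorovich problem above is a finite-dimensional linear program in the entries of the transference plan π, and can be handed to any LP solver. Here is a minimal sketch using scipy; the marginals, support points and cost matrix are placeholders chosen for illustration, not data from the text.

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Monge-Kantorovich problem:
#   minimize   sum_ij c[i, j] * pi[i, j]
#   subject to pi >= 0, sum_j pi[i, j] = mu[i], sum_i pi[i, j] = nu[j].
mu = np.array([0.5, 0.3, 0.2])            # source marginal (masses of the x_i)
nu = np.array([0.4, 0.4, 0.2])            # target marginal (masses of the y_j)
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 3.0])
c = (x[:, None] - y[None, :]) ** 2        # cost c(x, y) = |x - y|^2

m, n = len(mu), len(nu)
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0      # row sums of pi equal mu
for j in range(n):
    A_eq[m + j, j::n] = 1.0               # column sums of pi equal nu
b_eq = np.concatenate([mu, nu])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(m, n)                  # an optimal transference plan
print("optimal cost:", res.fun)
print(pi)
```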
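The increasing rearrangement of item 3 and the Knothe–Rosenblatt construction of item 4 can also be followed almost literally on a computer. Below is a minimal sketch in dimension 2, assuming the two measures are given by positive density values on rectangular grids; quantile_map implements T = G−1 ◦ F by matching cumulative distribution functions, the nearest-slice approximation of the conditional law of ν is a crude shortcut, and all names are illustrative.

```python
import numpy as np

def quantile_map(p_src, grid_src, p_tgt, grid_tgt):
    """Increasing rearrangement of the density p_src (given on grid_src)
    onto the density p_tgt (given on grid_tgt): T = G^{-1} o F,
    computed by matching the two cumulative distribution functions."""
    F = np.cumsum(p_src).astype(float); F /= F[-1]
    G = np.cumsum(p_tgt).astype(float); G /= G[-1]
    return np.interp(F, G, grid_tgt)          # T evaluated on grid_src

def knothe_rosenblatt(mu, nu, x1, x2, y1, y2):
    """Sketch of the Knothe-Rosenblatt map in dimension 2.
    mu[i, j] and nu[k, l] are density values at (x1[i], x2[j]) and (y1[k], y2[l])."""
    # Step 1: increasing rearrangement of the first marginals
    T1 = quantile_map(mu.sum(axis=1), x1, nu.sum(axis=1), y1)   # T1[i] = T1(x1[i])
    # Step 2: on each slice x1 = x1[i], transport the conditional density
    # mu(dx2 | x1) onto nu(dy2 | y1 = T1(x1)); the conditional law of nu
    # is crudely approximated here by its nearest grid slice.
    T2 = np.empty(mu.shape)
    for i in range(len(x1)):
        k = np.argmin(np.abs(y1 - T1[i]))
        T2[i] = quantile_map(mu[i], x2, nu[k], y2)              # T2[i, j] = T2(x2[j]; x1[i])
    return T1, T2

# Tiny example: two Gaussian-like densities on [0, 1]^2
x = np.linspace(0.0, 1.0, 64)
MU = np.exp(-((x[:, None] - 0.3) ** 2 + (x[None, :] - 0.3) ** 2) / 0.02)
NU = np.exp(-((x[:, None] - 0.7) ** 2 + (x[None, :] - 0.6) ** 2) / 0.05)
T1, T2 = knothe_rosenblatt(MU, NU, x, x, x, x)
```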
Gluing
π123 (dx1 dx2 dx3 ) = π12 (dx1 |x2 ) µ2 (dx2 ) π23 (dx3 |x2 ).
∂µ/∂t + ∇ · (µ ξ) = 0,
where the time-derivative is taken in the weak sense, and the diver-
gence operator is defined by duality against continuously differentiable
functions with compact support:
∫M ϕ ∇ · (µ ξ) = − ∫M (ξ · ∇ϕ) dµ.
Diffusion formula
Remark 1.6. Actually, there is a finer criterion for the diffusion equa-
tion to hold true: it is sufficient that the Ricci curvature at point x be
bounded below by −Cd(x0 , x)2 gx as x → ∞, where gx is the metric at
point x and x0 is an arbitrary reference point. The exponent 2 here is
sharp.
∂t ρ = ∆ρ.
with associated flow (Tt (x))0≤t≤1 , and a family (µt )0<t<1 of probability
measures by
µt = (1 − t) µ0 + t µ1 .
It is easy to check that
∂t µt = (ρ1 − ρ0) ν,    ∇ · (µt ξ(t, ·)) = ∇ · ((∇u) e−V vol) = (e−V (∆u − ∇V · ∇u)) vol = (ρ0 − ρ1) ν.
So µt satisfies the formula of conservation of mass, therefore µt =
(Tt )# µ0 . In particular, T1 pushes µ0 forward to µ1 .
In the case when M is compact and V = 0, the above construction
works if ρ0 and ρ1 are Lipschitz continuous and positive. Indeed, the
solution u of ∆u = ρ0 − ρ1 will be of class C 2,α for all α ∈ (0, 1),
and in particular ∇u will be of class C 1 (in fact C 1,α ). In more general
situations, things might depend on the regularity of V , and its behavior
at infinity.
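In the compact case with V = 0, the only step of the construction that requires computation is the Poisson equation ∆u = ρ0 − ρ1. Here is a minimal sketch on the flat 2-torus with a spectral solver, which also forms the velocity field ξ(t, ·) = ∇u / ((1 − t)ρ0 + tρ1) appearing in the identities above; the grid size and densities are arbitrary choices for illustration.

```python
import numpy as np

# Flat 2-torus [0,1)^2 discretized on an N x N grid
N = 128
x = np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")

# Two smooth positive probability densities rho0, rho1 (both integrate to 1)
rho0 = 1.0 + 0.3 * np.cos(2 * np.pi * X)
rho1 = 1.0 + 0.3 * np.cos(2 * np.pi * Y)

# Solve Delta u = rho0 - rho1 by Fourier transform (the right-hand side has zero mean)
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
KX, KY = np.meshgrid(k, k, indexing="ij")
lap = -(KX**2 + KY**2)                       # Fourier multiplier of the Laplacian
rhs_hat = np.fft.fft2(rho0 - rho1)
u_hat = np.zeros_like(rhs_hat)
u_hat[lap != 0] = rhs_hat[lap != 0] / lap[lap != 0]
u = np.real(np.fft.ifft2(u_hat))

# Gradient of u and the Moser velocity field xi(t, .) = grad u / ((1-t) rho0 + t rho1)
ux = np.real(np.fft.ifft2(1j * KX * np.fft.fft2(u)))
uy = np.real(np.fft.ifft2(1j * KY * np.fft.fft2(u)))
t = 0.5
xi = np.stack([ux, uy]) / ((1 - t) * rho0 + t * rho1)

print("max |Delta u - (rho0 - rho1)| :",
      np.abs(np.real(np.fft.ifft2(lap * np.fft.fft2(u))) - (rho0 - rho1)).max())
```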
Bibliographical notes
(In [814], for the sake of consistency of the presentation I treated optimal coupling on R as a particular case of optimal coupling on Rn; however, this has the drawback of involving subtle arguments.)
The Knothe–Rosenblatt coupling was introduced in 1952 by Rosen-
blatt [709], who suggested that it might be useful to “normalize” sta-
tistical data before applying a statistical test. In 1957, Knothe [523]
rediscovered it for applications to the theory of convex bodies. It is
quite likely that other people have discovered this coupling indepen-
dently. An infinite-dimensional generalization was studied by Bogachev,
Kolesnikov and Medvedev [134, 135].
FKG inequalities were introduced in [375], and have since then
played a crucial role in statistical mechanics. Holley’s proof by coupling
appears in [477]. Recently, Caffarelli [188] has revisited the subject in
connection with optimal transport.
It was in 1965 that Moser proved his coupling theorem, for smooth
compact manifolds without boundaries [640]; noncompact manifolds
were later considered by Greene and Shiohama [432]. Moser himself also
worked with Dacorogna on the more delicate case where the domain
is an open set with boundary, and the transport is required to fix the
boundary [270].
Strassen’s duality theorem is discussed e.g. in [814, Section 1.4].
The gluing lemma is due to several authors, starting with Vorob’ev
in 1962 for finite sets. The modern formulation seems to have emerged
around 1980, independently by Berkes and Philipp [101], Kallenberg,
Thorisson, and maybe others. Refinements were discussed e.g. by de
Acosta [273, Theorem A.1] (for marginals indexed by an arbitrary set)
or Thorisson [781, Theorem 5.1]; see also the bibliographic comments
in [317, p. 20]. For a proof of the statement in these notes, it is suf-
ficient to consult Dudley [317, Theorem 1.1.10], or [814, Lemma 7.6].
A comment about terminology: I like the word “gluing” which gives a
good indication of the construction, but many authors just talk about
“composition” of plans.
The formula of change of variables for C 1 or Lipschitz change of vari-
ables can be found in many textbooks, see e.g. Evans and Gariepy [331,
Chapter 3]. The generalization to approximately differentiable maps is
explained in Ambrosio, Gigli and Savaré [30, Section 5.5]. Such a gen-
erality is interesting in the context of optimal transportation, where
changes of variables are often very rough (say BV , which means of
bounded variation). In that context however, there is more structure:
so in particular

d/dt (|αt|2/2) = −⟨∇V(Xt) − ∇V(Yt), Xt − Yt⟩ ≤ −K |Xt − Yt|2 = −K |αt|2.

It follows by Gronwall's lemma that

|Xt − Yt|2 ≤ e−2Kt |X0 − Y0|2.

Assume for simplicity that E |X0|2 and E |Y0|2 are finite. Then

E |Xt − Yt|2 ≤ e−2Kt E |X0 − Y0|2 ≤ 2 (E |X0|2 + E |Y0|2) e−2Kt.   (2.4)
ν(dy) = e−V(y) dy / Z

(where Z = ∫ e−V is a normalization constant) is stationary: If
law (Y0 ) = ν, then also law (Yt ) = ν. Then (2.4) easily implies that
µt := law (Xt ) converges weakly to ν; in addition, the convergence is
exponentially fast.
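Inequality (2.4) is easy to observe numerically: run the two diffusions with the same Brownian path (an Euler–Maruyama discretization) and compare the mean squared distance with the exponential bound. The quadratic potential, step size and initial laws below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, dt, steps, n_paths = 1.0, 1e-3, 2000, 10000

def grad_V(x):
    return K * x                      # V(x) = K x^2 / 2 is uniformly convex with constant K

# Two diffusions dX = -grad V(X) dt + sqrt(2) dB driven by the SAME Brownian path
X = rng.normal(3.0, 1.0, n_paths)     # samples of law(X_0)
Y = rng.normal(-2.0, 0.5, n_paths)    # samples of law(Y_0), independent of X_0
for _ in range(steps):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X - grad_V(X) * dt + np.sqrt(2.0) * dB
    Y = Y - grad_V(Y) * dt + np.sqrt(2.0) * dB

t = steps * dt
# E|X_0 - Y_0|^2 = (3 - (-2))^2 + Var(X_0) + Var(Y_0) = 25 + 1 + 0.25
print("simulated   E|X_t - Y_t|^2  :", np.mean((X - Y) ** 2))
print("bound e^{-2Kt} E|X_0 - Y_0|^2:", np.exp(-2 * K * t) * 26.25)
```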
Euclidean isoperimetry
Among all subsets of Rn with given surface, which one has the largest
volume? To simplify the problem, let us assume that we are looking
for a bounded open set Ω ⊂ Rn with, say, Lipschitz boundary ∂Ω, and
that the measure |∂Ω| of the boundary is given; then the problem is to maximize the measure |Ω| of Ω. To measure ∂Ω one should use the (n − 1)-dimensional
Hausdorff measure, and to measure Ω the n-dimensional Hausdorff
measure, which of course is the same as the Lebesgue measure in Rn .
It has been known, at least since ancient times, that the solution
to this “isoperimetric problem” is the ball. A simple scaling argument
shows that this statement is equivalent to the Euclidean isoperimetric
inequality:
|∂Ω| / |Ω|^{(n−1)/n} ≥ |∂B| / |B|^{(n−1)/n},
where B is any ball.
There are very many proofs of the isoperimetric inequality, and
many refinements as well. It is less known that there is a proof by
coupling.
Here is a sketch of the argument, forgetting about regularity issues.
Let B be a ball such that |∂B| = |∂Ω|. Consider a random point X dis-
tributed uniformly in Ω, and a random point Y distributed uniformly
in B. Introduce the Knothe–Rosenblatt coupling of X and Y : This is
a deterministic coupling of the form Y = T (X), such that, at each
x ∈ Ω, the Jacobian matrix ∇T (x) is triangular with nonnegative di-
agonal entries. Since the law of X (resp. Y ) has uniform density 1/|Ω|
(resp. 1/|B|), the change of variables formula yields
∀x ∈ Ω,   1/|Ω| = det ∇T(x) / |B|.   (2.5)

Since ∇T(x) is triangular with nonnegative diagonal entries, the arithmetic–geometric mean inequality gives (det ∇T(x))^{1/n} ≤ (∇ · T)(x)/n, so (2.5) implies

1/|Ω|^{1/n} ≤ (∇ · T)(x) / (n |B|^{1/n}).

Integrating this inequality over Ω and applying the divergence theorem (recall that T takes values in B) yields

|Ω|^{1−1/n} ≤ |∂Ω| / (n |B|^{1/n}).

Since |∂Ω| = |∂B| = n|B|, the right-hand side is actually |B|^{1−1/n}, so the volume of Ω is indeed bounded by the volume of B. This concludes the proof.
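The pointwise inequality used in the middle of the argument, (det ∇T)^{1/n} ≤ (∇ · T)/n for a triangular Jacobian with nonnegative diagonal, is just the arithmetic–geometric mean inequality applied to the diagonal entries; here is a quick, purely illustrative numerical check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
for _ in range(1000):
    A = np.triu(rng.normal(size=(n, n)))           # triangular Jacobian, like grad T
    np.fill_diagonal(A, rng.uniform(0.1, 2.0, n))  # nonnegative diagonal entries
    lhs = np.linalg.det(A) ** (1.0 / n)            # (det grad T)^{1/n} = geometric mean of the diagonal
    rhs = np.trace(A) / n                          # (div T)/n = arithmetic mean of the diagonal
    assert lhs <= rhs + 1e-12
print("AM-GM check passed for 1000 random triangular matrices")
```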
The above argument suggests the following problem:
Open Problem 2.1. Can one devise an optimal coupling between sets
(in the sense of a coupling between the uniform probability measures on
these sets) in such a way that the total cost of the coupling decreases
under some evolution converging to balls, such as mean curvature mo-
tion?
Bibliographical notes
minimize the total cost. Monge assumed that the transport cost of one
unit of mass along a certain distance was given by the product of the
mass by the distance.
Nowadays there is a Monge street in Paris, and therein one can find
an excellent bakery called Le Boulanger de Monge. To acknowledge this,
and to illustrate how Monge’s problem can be recast in an economic
perspective, I shall express the problem as follows. Consider a large
number of bakeries, producing loaves, that should be transported each
morning to cafés where consumers will eat them. The amount of bread
that can be produced at each bakery, and the amount that will be
consumed at each café are known in advance, and can be modeled as
probability measures (there is a “density of production” and a “density
of consumption”) on a certain space, which in our case would be Paris
(equipped with the natural metric such that the distance between two
points is the length of the shortest path joining them). The problem is
to find in practice where each unit of bread should go (see Figure 3.2),
in such a way as to minimize the total transport cost. So Monge’s
problem really is the search of an optimal coupling; and to be more
precise, he was looking for a deterministic optimal coupling.
Fig. 3.2. Economic illustration of Monge’s problem: squares stand for production
units, circles for consumption places.
It was only several years after his main results that Kantorovich
made the connection with Monge’s work. The problem of optimal cou-
pling has since then been called the Monge–Kantorovich problem.
Throughout the second half of the twentieth century, optimal cou-
pling techniques and variants of the Kantorovich–Rubinstein distance
(nowadays often called Wasserstein distances, or other denominations)
were used by statisticians and probabilists. The “basis” space could be
finite-dimensional, or infinite-dimensional: For instance, optimal cou-
plings give interesting notions of distance between probability measures
on path spaces. Noticeable contributions from the seventies are due
to Roland Dobrushin, who used such distances in the study of parti-
cle systems; and to Hiroshi Tanaka, who applied them to study the
time-behavior of a simple variant of the Boltzmann equation. By the
mid-eighties, specialists of the subject, like Svetlozar Rachev or Ludger
Rüschendorf, were in possession of a large library of ideas, tools, tech-
niques and applications related to optimal transport.
During that time, reparametrization techniques (yet another word
for change of variables) were used by many researchers working on in-
equalities involving volumes or integrals. Only later would it be under-
stood that optimal transport often provides useful reparametrizations.
At the end of the eighties, three directions of research emerged inde-
pendently and almost simultaneously, which completely reshaped the
whole picture of optimal transport.
One of them was John Mather’s work on Lagrangian dynamical
systems. Action-minimizing curves are basic important objects in the
theory of dynamical systems, and the construction of closed action-
minimizing curves satisfying certain qualitative properties is a classical
problem. By the end of the eighties, Mather found it convenient to
study not only action-minimizing curves, but action-minimizing sta-
tionary measures in phase space. Mather’s measures are a generaliza-
tion of action-minimizing curves, and they solve a variational problem
which in effect is a Monge–Kantorovich problem. Under some condi-
tions on the Lagrangian, Mather proved a celebrated result according to
which (roughly speaking) certain action-minimizing measures are au-
tomatically concentrated on Lipschitz graphs. As we shall understand
in Chapter 8, this problem is intimately related to the construction of
a deterministic optimal coupling.
The second direction of research came from the work of Yann Bre-
nier. While studying problems in incompressible fluid mechanics, Bre-
nier needed to construct an operator that would act like the projection
on the set of measure-preserving mappings in an open set (in probabilis-
tic language, measure-preserving mappings are deterministic couplings
of the Lebesgue measure with itself). He understood that he could do
so by introducing an optimal coupling: If u is the map for which one
wants to compute the projection, introduce a coupling of the Lebesgue
measure L with u# L. This study revealed an unexpected link between
optimal transport and fluid mechanics; at the same time, by pointing
out the relation with the theory of Monge–Ampère equations, Brenier
attracted the attention of the community working on partial differential
equations.
The third direction of research, certainly the most surprising, came
from outside mathematics. Mike Cullen was part of a group of meteo-
rologists with a well-developed mathematical taste, working on semi-
geostrophic equations, used in meteorology for the modeling of atmo-
spheric fronts. Cullen and his collaborators showed that a certain fa-
mous change of unknown due to Brian Hoskins could be re-interpreted
in terms of an optimal coupling problem, and they identified the min-
imization property as a stability condition. A striking outcome of this
work was that optimal transport could arise naturally in partial differ-
ential equations which seemed to have nothing to do with it.
All three contributions emphasized (in their respective domain) that
important information can be gained by a qualitative description of
optimal transport. These new directions of research attracted various
mathematicians (among the first, Luis Caffarelli, Craig Evans, Wilfrid
Gangbo, Robert McCann, and others), who worked on a better descrip-
tion of the structure of optimal transport and found other applications.
An important conceptual step was accomplished by Felix Otto, who
discovered an appealing formalism introducing a differential point of
view in optimal transport theory. This opened the way to a more geo-
metric description of the space of probability measures, and connected
optimal transport to the theory of diffusion equations, thus leading to
a rich interplay of geometry, functional analysis and partial differential
equations.
Nowadays optimal transport has become a thriving industry, involv-
ing many researchers and many trends. Apart from meteorology, fluid
mechanics and diffusion equations, it has also been applied to such di-
verse topics as the collapse of sandpiles, the matching of images, and the
design of networks or reflector antennas. My book, Topics in Optimal
Transportation, written between 2000 and 2003, was the first attempt
to present a synthetic view of the modern theory. Since then the field
has grown much faster than I expected, and it was never so active as
it is now.
Bibliographical notes
Before the twentieth century, the main references for the problem of
“déblais et remblais” are the memoirs by Monge [636], Dupin [319] and
Appell [42]. Besides achieving important mathematical results, Monge
and Dupin were strongly committed to the development of society and
it is interesting to browse some of their writings about economics and
industry (a list can be found online at gallica.bnf.fr). A lively ac-
count of Monge’s life and political commitments can be found in Bell’s
delightful treatise, Men of Mathematics [80, Chapter 12]. It seems how-
ever that Bell did dramatize the story a bit, at the expense of accuracy
and neutrality. A more cold-blooded biography of Monge was written
by de Launay [277]. Considered as one of the greatest geologists of his
time, not particularly sympathetic to the French Revolution, de Lau-
nay documented himself with remarkable rigor, going back to original
sources whenever possible. Other biographies have been written since
then by Taton [778, 779] and Aubry [50].
Monge originally formulated his transport problem in Euclidean
space for the cost function c(x, y) = |x − y|; he probably had no idea of
the extreme difficulty of a rigorous treatment. It was only in 1979 that
Sudakov [765] claimed a proof of the existence of a Monge transport
for general probability densities with this particular cost function. But
his proof was not completely correct, and was amended much later by
Ambrosio [20]. In the meantime, alternative rigorous proofs had been
devised first by Evans and Gangbo [330] (under rather strong assump-
tions on the data), then by Trudinger and Wang [791], and Caffarelli,
Feldman and McCann [190].
Kantorovich defined linear programming in [499], introduced his
minimization problem and duality theorem in [500], and in [501] applied
his theory to the problem of optimal transport; this note can be consid-
ered as the act of birth of the modern formulation of optimal transport.
Later he made the link with Monge’s problem in [502]. His major work
ject, see in particular [269]; see also the review article [263], the works
by Cullen and Gangbo [266], Cullen and Feldman [265] or the recent
book by Cullen [262].
Further links between optimal transport and other fields of mathe-
matics (or physics) can be found in my book [814], or in the treatise by
Rachev and Rüschendorf [696]. An important source of inspiration was
the relation with the qualitative behavior of certain diffusive equations
arising from gas dynamics; this link was discovered by Jordan, Kinder-
lehrer and Otto at the end of the nineties [493], and then explored by
several authors [208, 209, 210, 211, 212, 213, 214, 216, 669, 671].
Below is a nonexhaustive list of some other unexpected applications.
Relations with the modeling of sandpiles are reviewed by Evans [328],
as well as compression molding problems; see also Feldman [353] (this
is for the cost function c(x, y) = |x − y|). Applications of optimal
transport to image processing and shape recognition are discussed by
Gangbo and McCann [400], Ahmad [6], Angenent, Haker, Tannen-
baum, and Zhu [462, 463], Chazal, Cohen-Steiner and Mérigot [224],
and many other contributors from the engineering community (see
e.g. [700, 713]). X.-J. Wang [834], and independently Glimm and
Oliker [419] (around 2000 and 2002 respectively) discovered that the
theoretical problem of designing reflector antennas could be recast in
terms of optimal transport for the cost function c(x, y) = − log(1−x·y)
on S 2 ; see [402, 419, 660] for further work in the area, and [420] for
another version of this problem involving two reflectors.1 Rubinstein
and Wolansky adapted the strategy in [420] to study the optimal de-
sign of lenses [712]; and Gutiérrez and Huang to treat a refraction
problem [453]. In his PhD Thesis, Bernot [108] made the link be-
tween optimal transport, irrigation and the design of networks. Such
topics were also considered by Santambrogio with various collabora-
tors [152, 207, 731, 732, 733, 734]; in particular it is shown in [732]
that optimal transport theory gives a rigorous basis to some varia-
tional constructions used by physicists and hydrologists to study river
basin morphology [65, 706]. Buttazzo and collaborators [178, 179, 180]
explored city planning via optimal transport. Brenier found a connec-
tion to the electrodynamic equations of Maxwell and related models
in string theory [161, 162, 163, 164, 165, 166]. Frisch and collaborators
¹ According to Oliker, the connection between the two-reflector problem (as for-
mulated in [661]) and optimal transport is in fact much older, since it was first
formulated in a 1993 conference in which he and Caffarelli were participating.
The first part of this course is devoted to the description and charac-
terization of optimal transport under certain regularity assumptions on
the measures and the cost function.
As a start, some general theorems about optimal transport plans are
established in Chapters 4 and 5, in particular the Kantorovich duality
theorem. The emphasis is on c-cyclically monotone maps, both in the
statements and in the proofs. The assumptions on the cost function
and the spaces will be very general.
From the Monge–Kantorovich problem one can derive natural dis-
tance functions on spaces of probability measures, by choosing the cost
function as a power of the distance. The main properties of these dis-
tances are established in Chapter 6.
In Chapter 7 a time-dependent version of the Monge–Kantorovich
problem is investigated, which leads to an interpolation procedure be-
tween probability measures, called displacement interpolation. The nat-
ural assumption is that the cost function derives from a Lagrangian
action, in the sense of classical mechanics; still (almost) no smoothness
is required at that level. In Chapter 8 I shall make further assumptions
of smoothness and convexity, and recover some regularity properties of
the displacement interpolant by a strategy due to Mather.
Then in Chapters 9 and 10 it is shown how to establish the exis-
tence of deterministic optimal couplings, and characterize the associ-
ated transport maps, again under adequate regularity and convexity
assumptions. The Change of variables Formula is considered in Chap-
ter 11. Finally, in Chapter 12 I shall discuss the regularity of the trans-
port map, which in general is not smooth.
The main results of this part are synthesized and summarized in
Chapter 13. A good understanding of this chapter is sufficient to go
through Part II of this course.
4
Basic properties
Existence
The first good thing about optimal couplings is that they exist:
Theorem 4.1 (Existence of an optimal coupling). Let (X , µ)
and (Y, ν) be two Polish probability spaces; let a : X → R ∪ {−∞}
and b : Y → R ∪ {−∞} be two upper semicontinuous functions such
that a ∈ L1 (µ), b ∈ L1 (ν). Let c : X × Y → R ∪ {+∞} be a lower
semicontinuous cost function, such that c(x, y) ≥ a(x) + b(y) for all
x, y. Then there is a coupling of (µ, ν) which minimizes the total cost
E c(X, Y ) among all possible couplings (X, Y ).
Remark 4.2. The lower bound assumption on c guarantees that the
expected cost E c(X, Y ) is well-defined in R ∪ {+∞}. In most cases of
applications — but not all — one may choose a = 0, b = 0.
The proof relies on basic variational arguments involving the topol-
ogy of weak convergence (i.e. imposed by bounded continuous test func-
tions). There are two key properties to check: (a) lower semicontinuity,
(b) compactness. These issues are taken care of respectively in Lem-
mas 4.3 and 4.4 below, which will be used again in the sequel. Before
going on, I recall Prokhorov’s theorem: If X is a Polish space, then
a set P ⊂ P (X ) is precompact for the weak topology if and only if it is
tight, i.e. for any ε > 0 there is a compact set Kε such that µ[X \Kε ] ≤ ε
for all µ ∈ P.
Lemma 4.3 (Lower semicontinuity of the cost functional). Let
X and Y be two Polish spaces, and c : X × Y → R ∪ {+∞} a lower
Then

∫_{X×Y} c dπ ≤ lim inf_{k→∞} ∫_{X×Y} c dπk.

In particular, if c is nonnegative, then F : π ↦ ∫ c dπ is lower semicontinuous on P(X × Y), equipped with the topology of weak convergence. ⊓⊔
The desired result follows since this bound is independent of the cou-
pling, and Kε × Lε is compact in X × Y. ⊓⊔
Thus π is minimizing. ⊓⊔
Remark 4.5. This existence theorem does not imply that the optimal
cost is finite. It might be that all transport plans lead to an infinite total cost, i.e. ∫ c dπ = +∞ for all π ∈ Π(µ, ν). A simple condition to rule out this annoying possibility is

∫ c(x, y) dµ(x) dν(y) < +∞,
which guarantees that at least the independent coupling has finite total
cost. In the sequel, I shall sometimes make the stronger assumption

c(x, y) ≤ cX(x) + cY(y),    (cX, cY) ∈ L1(µ) × L1(ν),

which implies that any coupling has finite total cost, and has other nice
consequences (see e.g. Theorem 5.10).
Restriction property
The second good thing about optimal couplings is that any sub-coupling
is still optimal. In words: If you have an optimal transport plan, then
any induced sub-plan (transferring part of the initial mass to part of
the final mass) has to be optimal too — otherwise you would be able
to lower the cost of the sub-plan, and as a consequence the cost of the
whole plan. This is the content of the next theorem.
π′ := π̃ / π̃[X × Y]
Proof of Theorem 4.6. Assume that π ′ is not optimal; then there exists
π ′′ such that
Then consider

π̂ := (π − π̃) + Z̃ π′′,   (4.3)

where Z̃ = π̃[X × Y] > 0. Clearly, π̂ is a nonnegative measure. On the other hand, it can be written as

π̂ = π + Z̃ (π′′ − π′);

then (4.1) shows that π̂ has the same marginals as π, while (4.2) implies that it has a lower transport cost than π. (Here I use the fact that the total cost is finite.) This contradicts the optimality of π. The conclusion is that π′ is in fact optimal.
Convexity properties
The following estimates are of constant use:
Theorem 4.8 (Convexity of the optimal cost). Let X and Y be
two Polish spaces, let c : X ×Y → R∪{+∞} be a lower semicontinuous
function, and let C be the associated optimal transport cost functional
on P (X ) × P (Y). Let (Θ, λ) be a probability space, and let µθ , νθ be two
measurable functions defined on Θ, with values in P (X ) and P (Y) re-
spectively. Assume that c(x, y) ≥ a(x) + b(y), where a ∈ L1 (dµθ dλ(θ)),
b ∈ L1 (dνθ dλ(θ)). Then
C( ∫Θ µθ λ(dθ), ∫Θ νθ λ(dθ) ) ≤ ∫Θ C(µθ, νθ) λ(dθ).
About the second question: Why don’t we try to apply the same
reasoning as in the proof of Theorem 4.1? The problem is that the set
of deterministic couplings is in general not compact; in fact, this set
is often dense in the larger space of all couplings! So we may expect
that the value of the infimum in the Monge problem coincides with the
value of the minimum in the Kantorovich problem; but there is no a
priori reason to expect the existence of a Monge minimizer.
Fig. 4.1. The optimal plan, represented in the left image, consists in splitting the
mass in the center into two halves and transporting mass horizontally. On the right
the filled regions represent the lines of transport for a deterministic (without splitting
of mass) approximation of the optimum.
Bibliographical notes
Fig. 5.1. An attempt to improve the cost by a cycle; solid arrows indicate the mass
transport in the original plan, dashed arrows the paths along which a bit of mass is
rerouted.
The new plan is (strictly) better than the older one if and only if

c(x1, y2) + c(x2, y3) + · · · + c(xN, y1) < c(x1, y1) + c(x2, y2) + · · · + c(xN, yN).
Thus, if you can find such cycles (x1 , y1 ), . . . , (xN , yN ) in your trans-
ference plan, certainly the latter is not optimal. Conversely, if you do
not find them, then your plan cannot be improved (at least by the pro-
cedure described above) and it is likely to be optimal. This motivates
the following definitions.
Clearly, if we can find a pair (ψ, φ) and a transference plan π for which
there is equality, then (ψ, φ) is optimal in the left-hand side and π is
also optimal in the right-hand side.
A pair of price functions (ψ, φ) will informally be said to be com-
petitive if it satisfies (5.2). For a given y, it is of course in the interest
of the company to set the highest possible competitive price φ(y), i.e.
the highest lower bound for (i.e. the infimum of) ψ(x) + c(x, y), among
all bakeries x. Similarly, for a given x, the price ψ(x) should be the
supremum of all φ(y) − c(x, y). Thus it makes sense to describe a pair
of prices (ψ, φ) as tight if
φ(y) = inf_x [ψ(x) + c(x, y)],    ψ(x) = sup_y [φ(y) − c(x, y)].   (5.5)
In words, prices are tight if it is impossible for the company to raise the
selling price, or lower the buying price, without losing its competitivity.
Consider an arbitrary pair of competitive prices (ψ, φ). We can always improve φ by replacing it by φ1(y) = inf_x [ψ(x) + c(x, y)]. Then we can also improve ψ by replacing it by ψ1(x) = sup_y [φ1(y) − c(x, y)]; then replacing φ1 by φ2(y) = inf_x [ψ1(x) + c(x, y)], and so on. It turns
out that this process is stationary: as an easy exercise, the reader can
check that φ2 = φ1 , ψ2 = ψ1 , which means that after just one iteration
one obtains a pair of tight prices. Thus, when we consider the dual
Kantorovich problem (5.3), it makes sense to restrict our attention to
tight pairs, in the sense of equation (5.5). From that equation we can
reconstruct φ in terms of ψ, so we can just take ψ as the only unknown
in our problem.
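On a finite set of bakeries and cafés the stationarity of this iteration is easy to check directly: starting from an arbitrary price function ψ, a single inf/sup sweep already yields a tight pair. A minimal sketch with a random placeholder cost matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 5                                # m bakeries x_i, n cafes y_j
c = rng.random((m, n))                     # cost c(x_i, y_j)
psi = rng.normal(size=m)                   # arbitrary starting prices psi(x_i)

phi1 = np.min(psi[:, None] + c, axis=0)    # phi_1(y) = inf_x [psi(x) + c(x, y)]
psi1 = np.max(phi1[None, :] - c, axis=1)   # psi_1(x) = sup_y [phi_1(y) - c(x, y)]
phi2 = np.min(psi1[:, None] + c, axis=0)   # phi_2(y) = inf_x [psi_1(x) + c(x, y)]
psi2 = np.max(phi2[None, :] - c, axis=1)

assert np.allclose(phi2, phi1) and np.allclose(psi2, psi1)
print("one inf/sup sweep already yields a tight (stationary) pair of prices")
```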
That unknown cannot be just any function: if you take a general
function ψ, and compute φ by the first formula in (5.5), there is no
chance that the second formula will be satisfied. In fact this second
formula will hold true if and only if ψ is c-convex, in the sense of the
next definition (illustrated by Figure 5.2).
or equivalently
Fig. 5.2. A c-convex function is a function whose graph you can entirely caress
from below with a tool whose shape is the negative of the cost function (this shape
might vary with the point y). In the picture yi ∈ ∂c ψ(xi ).
open set Ω where it is finite, but lower semicontinuity might fail at the
boundary of Ω.) One can think of the cost function c(x, y) = −x · y
as basically the same as c(x, y) = |x − y|2 /2, since the “interaction”
between the positions x and y is the same for both costs.
Particular Case 5.4. If c = d is a distance on some metric space
X , then a c-convex function is just a 1-Lipschitz function, and it
is its own c-transform. Indeed, if ψ is c-convex it is obviously 1-
Lipschitz; conversely, if ψ is 1-Lipschitz, then ψ(x) ≤ ψ(y) + d(x, y), so
ψ(x) = inf y [ψ(y) + d(x, y)] = ψ c (x). As an even more particular case,
if c(x, y) = 1x≠y , then ψ is c-convex if and only if sup ψ − inf ψ ≤ 1,
and then again ψ c = ψ. (More generally, if c satisfies the triangle in-
equality c(x, z) ≤ c(x, y) + c(y, z), then ψ is c-convex if and only if
ψ(y) − ψ(x) ≤ c(x, y) for all x, y; and then ψ = ψ c .)
Remark 5.5. There is no measure theory in Definition 5.2, so no as-
sumption of measurability is made, and the supremum in (5.6) is a true
supremum, not just an essential supremum; the same for the infimum
in (5.7). If c is continuous, then a c-convex function is automatically
lower semicontinuous, and its subdifferential is closed; but if c is not
continuous the measurability of ψ and ∂c ψ is not a priori guaranteed.
Remark 5.6. I excluded the case when ψ ≡ +∞ so as to avoid trivial
situations; what I called a c-convex function might more properly (!)
be called a proper c-convex function. This automatically implies that
ζ in (5.6) does not take the value +∞ at all if c is real-valued. If
c does achieve infinite values, then the correct convention in (5.6) is
(+∞) − (+∞) = −∞.
If ψ is a function on X , then its c-transform is a function on Y.
Conversely, given a function on Y, one may define its c-transform as a
function on X . It will be convenient in the sequel to define the latter
concept by an infimum rather than a supremum. This convention has
the drawback of breaking the symmetry between the roles of X and Y,
but has other advantages that will be apparent later on.
Definition 5.7 (c-concavity). With the same notation as in Defini-
tion 5.2, a function φ : Y → R ∪ {−∞} is said to be c-concave if it
is not identically −∞, and there exists ψ : X → R ∪ {±∞} such that
φ = ψ c . Then its c-transform is the function φc defined by
∀x ∈ X ,   φc(x) = sup_{y∈Y} [φ(y) − c(x, y)];
In spite of its short and elementary proof, the next crucial result is
one of the main justifications of the concept of c-convexity.
then the choice x̃ = x shows that φccc(x) ≤ φc(x); while the choice ỹ = y shows that φccc(x) ≥ φc(x).
If ψ is c-convex, then there is ζ such that ψ = ζc, so ψcc = ζccc = ζc = ψ.
The converse is obvious: If ψcc = ψ, then ψ is c-convex, as the c-transform of ψc. ⊓⊔
Kantorovich duality
We are now ready to state and prove the main result in this chapter.
and in the above suprema one might as well impose that ψ be c-convex
and φ c-concave.
(ii) If c is real-valued and the optimal cost C(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c dπ
is finite, then there is a measurable c-cyclically monotone set Γ ⊂ X ×Y
(closed if a, b, c are continuous) such that for any π ∈ Π(µ, ν) the fol-
lowing five statements are equivalent:
(a) π is optimal;
(b) π is c-cyclically monotone;
(c) There is a c-convex ψ such that, π-almost surely,
ψ c (y) − ψ(x) = c(x, y);
(d) There exist ψ : X → R ∪ {+∞} and φ : Y → R ∪ {−∞},
such that φ(y) − ψ(x) ≤ c(x, y) for all (x, y),
with equality π-almost surely;
(e) π is concentrated on Γ .
(iii) If c is real-valued, C(µ, ν) < +∞, and one has the pointwise upper bound

c(x, y) ≤ cX(x) + cY(y),    (cX, cY) ∈ L1(µ) × L1(ν),
then both the primal and dual Kantorovich problems have solutions, so
min_{π∈Π(µ,ν)} ∫_{X×Y} c(x, y) dπ(x, y)
   = max_{(ψ,φ)∈L1(µ)×L1(ν); φ−ψ≤c} [ ∫_Y φ(y) dν(y) − ∫_X ψ(x) dµ(x) ]
   = max_{ψ∈L1(µ)} [ ∫_Y ψc(y) dν(y) − ∫_X ψ(x) dµ(x) ],
Remark 5.12. Note the difference between statements (b) and (e):
The set Γ appearing in (ii)(e) is the same for all optimal π’s, it only
depends on µ and ν. This set is in general not unique. If c is contin-
uous and Γ is imposed to be closed, then one can define a smallest
Γ , which is the closure of the union of all the supports of the opti-
mal π’s. There is also a largest Γ , which is the intersection of all the
c-subdifferentials ∂c ψ, where ψ is such that there exists an optimal π
supported in ∂c ψ. (Since the cost function is assumed to be continuous,
the c-subdifferentials are closed, and so is their intersection.)
Remark 5.15. If the variables x and y are swapped, then (µ, ν) should
be replaced by (ν, µ) and (ψ, φ) by (−φ, −ψ).
Particular Case 5.16. Particular Case 5.4 leads to the following vari-
ant of Theorem 5.10. When c(x, y) = d(x, y) is a distance on a Polish
space X , and µ, ν belong to P1 (X ), then
inf E d(X, Y) = sup E [ψ(X) − ψ(Y)] = sup [ ∫ ψ dµ − ∫ ψ dν ],   (5.11)
where the infimum on the left is over all couplings (X, Y ) of (µ, ν), and
the supremum on the right is over all 1-Lipschitz functions ψ. This is
the Kantorovich–Rubinstein formula; it holds true as soon as the
infimum in the left-hand side is finite, and it is very useful.
−(|x|2 + |y|2 )/2. So if x → |x|2 ∈ L1 (µ) and y → |y|2 ∈ L1 (ν), then one
can invoke the Particular Case 5.3 to deduce from Theorem 5.10 that
sup E (X · Y) = inf E [ϕ(X) + ϕ∗(Y)] = inf [ ∫ ϕ dµ + ∫ ϕ∗ dν ],   (5.12)
where the supremum on the left is over all couplings (X, Y ) of (µ, ν), the
infimum on the right is over all (lower semicontinuous) convex functions
on Rn , and ϕ∗ stands for the usual Legendre transform of ϕ. In for-
mula (5.12), the signs have been changed with respect to the statement
of Theorem 5.10, so the problem is to maximize the correlation of
the random variables X and Y .
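In dimension 1, and for empirical measures with equally weighted atoms, the coupling maximizing the correlation is simply the monotone one: pair the i-th smallest value of X with the i-th smallest value of Y, by the rearrangement inequality. A small empirical illustration, not a proof:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=1000))                  # sorted sample of X
y = np.sort(rng.exponential(size=1000))             # sorted sample of Y

monotone = np.mean(x * y)                           # monotone coupling: i-th smallest with i-th smallest
random_pairings = [np.mean(x * rng.permutation(y)) for _ in range(200)]
print("monotone coupling    E[XY] ~", monotone)
print("best random pairing  E[XY] ~", max(random_pairings))  # observed to be smaller
```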
Rigorous proof of Theorem 5.10, Part (i). First I claim that it is suffi-
cient to treat the case when c is nonnegative. Indeed, let
c̃(x, y) := c(x, y) − a(x) − b(y) ≥ 0,    Λ := ∫ a dµ + ∫ b dν ∈ R.
c real-valued =⇒ c̃ real-valued;
c lower semicontinuous =⇒ c̃ lower semicontinuous;
ψ̃ ∈ L1(µ) ⇐⇒ ψ ∈ L1(µ);    φ̃ ∈ L1(ν) ⇐⇒ φ ∈ L1(ν);
∀π ∈ Π(µ, ν),    ∫ c̃ dπ = ∫ c dπ − Λ;
∀(ψ, φ) ∈ L1(µ) × L1(ν),    ∫ φ̃ dν − ∫ ψ̃ dµ = ∫ φ dν − ∫ ψ dµ − Λ;
ψ is c-convex ⇐⇒ ψ̃ is c̃-convex;
φ is c-concave ⇐⇒ φ̃ is c̃-concave;
(φ, ψ) are c-conjugate ⇐⇒ (φ̃, ψ̃) are c̃-conjugate;
Γ is c-cyclically monotone ⇐⇒ Γ is c̃-cyclically monotone.
Thanks to these formulas, it is equivalent to establish Theorem 5.10
for the cost c or for the nonnegative cost c̃. So in the sequel, I shall assume, without further comment, that c is nonnegative.
The rest of the proof is divided into five steps.
Step 1: If µ = (1/n) Σ_{i=1}^{n} δxi , ν = (1/n) Σ_{j=1}^{n} δyj , where the costs c(xi , yj ) are finite, then there is at least one cyclically monotone transference plan.
Indeed, in that particular case, a transference plan between µ and
ν can be identified with a bistochastic n × n array of real numbers
aij ∈ [0, 1]: each aij tells what proportion of the 1/n mass carried by
point xi will go to destination yj . So the Monge–Kantorovich problem becomes

inf_{(aij)} Σij aij c(xi , yj ).
Here we are minimizing a linear function on the compact set [0, 1]n×n ,
so obviously there exists a minimizer; the corresponding transference
plan π can be written as
π = (1/n) Σij aij δ(xi ,yj ) ,
and its support S is the set of all couples (xi , yj ) such that aij > 0.
Assume that S is not cyclically monotone: Then there exist N ∈ N
and (xi1 , yj1 ), . . . , (xiN , yjN ) in S such that
c(xi1 , yj2 ) + c(xi2 , yj3 ) + . . . + c(xiN , yj1 ) < c(xi1 , yj1 ) + . . . + c(xiN , yjN ).
(5.15)
Let a := min(ai1 j1 , . . . , aiN jN ) > 0. Define a new transference plan π̃ by the formula

π̃ = π + (a/n) Σ_{ℓ=1}^{N} ( δ(xiℓ , yjℓ+1 ) − δ(xiℓ , yjℓ ) ).
It is easy to check that this has the correct marginals, and by (5.15)
the cost associated with π̃ is strictly less than the cost associated with
π. This is a contradiction, so S is indeed c-cyclically monotone!
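The improvement procedure of Step 1 is easy to try on a concrete discrete plan: scan the support for cost-decreasing cycles (only cycles of length 2 are checked below, which is already enough to detect non-optimality when such a cycle exists; longer cycles are handled the same way). The cost matrix and the plan, here the independent coupling, are placeholders.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 6
c = rng.random((n, n))                       # cost c(x_i, y_j)
pi = np.full((n, n), 1.0 / n**2)             # independent coupling of two uniform marginals

support = [(i, j) for i in range(n) for j in range(n) if pi[i, j] > 0]
bad_cycles = [((i1, j1), (i2, j2))
              for (i1, j1), (i2, j2) in combinations(support, 2)
              if c[i1, j2] + c[i2, j1] < c[i1, j1] + c[i2, j2] - 1e-12]
print("length-2 cycles along which the cost strictly decreases:", len(bad_cycles))
# Moving a small mass a > 0 along such a cycle,
#   pi[i1, j1] -= a; pi[i2, j2] -= a; pi[i1, j2] += a; pi[i2, j1] += a,
# preserves both marginals and strictly lowers the total cost, so pi is not optimal.
```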
Step 2: If c is continuous, then there is a cyclically monotone trans-
ference plan.
To prove this, consider sequences of independent random variables
xi ∈ X , yj ∈ Y, with respective law µ, ν. According to the law of
large numbers for empirical measures (sometimes called fundamental
theorem of statistics, or Varadarajan’s theorem), one has, with proba-
bility 1,
µn := (1/n) Σ_{i=1}^{n} δxi −→ µ,    νn := (1/n) Σ_{j=1}^{n} δyj −→ ν    (5.16)
In the definition of ψ, it does not matter whether one takes the supre-
mum over m − 1 or over m variables, since one also takes the supremum
over m. So the previous inequality can be recast as
In particular, ψ(x) + c(x, y) ≥ ψ(x) + c(x, y). Taking the infimum over
x ∈ X in the left-hand side, we deduce that
¹ A lower semicontinuous function on a Polish space is always measurable, even if
it is obtained as a supremum of uncountably many continuous functions; in fact
it can always be written as a supremum of countably many continuous functions!
dual problem with cost c. Moreover, for each k the functions φk and
ψk are uniformly continuous because c itself is uniformly continuous.
By Lemma 4.4, Π(µ, ν) is weakly sequentially compact. Thus, up to
extraction of a subsequence, we can assume that πk converges to some π̃ ∈ Π(µ, ν). For all indices ℓ ≤ k, we have cℓ ≤ ck , so
∫ cℓ dπ̃ = lim_{k→∞} ∫ cℓ dπk
   ≤ lim sup_{k→∞} ∫ ck dπk
   = lim sup_{k→∞} [ ∫ φk(y) dν(y) − ∫ ψk(x) dµ(x) ].
So

inf_{Π(µ,ν)} ∫ c dπ ≤ ∫ c dπ̃ ≤ lim sup_{k→∞} [ ∫ φk(y) dν(y) − ∫ ψk(x) dµ(x) ] ≤ inf_{Π(µ,ν)} ∫ c dπ.
Moreover,
∫ φk (y) dν(y) − ∫ ψk (x) dµ(x) −−−→_{k→∞} inf_{Π(µ,ν)} ∫ c dπ.    (5.20)
Since each pair (ψk , φk ) lies in Cb (X ) × Cb (Y), the duality also holds
with bounded continuous (and even Lipschitz) test functions, as claimed
in Theorem 5.10(i). ⊓⊔
Proof of Theorem 5.10, Part (ii). From now on, I shall assume that the
optimal transport cost C(µ, ν) is finite, and that c is real-valued. As
in the proof of Part (i) I shall assume that c is nonnegative, since the
general case can always be reduced to that particular case. Part (ii)
will be established in the following way: (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒
(a) ⇒ (e) ⇒ (b). There seems to be some redundancy in this chain of
implications, but this is because the implication (a) ⇒ (c) will be used
to construct the set Γ appearing in (e).
where
Fm,k (x0 , y0 , . . . , xm , ym ) := [ c(x0 , y0 ) − c(x1 , y0 ) ] + [ c(x1 , y1 ) − c(x2 , y1 ) ] + · · · + [ c(xm , ym ) − c(x, ym ) ].
then we have

lim_{ℓ→∞} ψm,k,ℓ (x) = sup { c(x0 , y0 ) − c(x1 , y0 ) + c(x1 , y1 ) − c(x2 , y1 )
        + · · · + c(xm , ym ) − c(x, ym ) ;  (x1 , y1 ), . . . , (xm , ym ) ∈ Γk }.
and

∫_{ξ<0} ξn dπ −−−→_{n→∞} ∫_{ξ<0} ξ dπ = 0,
so π is optimal.
Before completing the chain of equivalences, we should first con-
struct the set Γ . By Theorem 4.1, there is at least one optimal transference plan, say π̃. From the implication (a) ⇒ (c), there is some ψ̃ such that π̃ is concentrated on ∂c ψ̃; just choose Γ := ∂c ψ̃.
(a) ⇒ (e): Let π̃ be the optimal plan used to construct Γ , and let
ψ = ψ̃ be the associated c-convex function; let φ = ψ^c . Then let π
be another optimal plan. Since π and π̃ have the same cost and same
marginals,

∫ c dπ = ∫ c dπ̃ = lim_{n→∞} ∫ (Tn φ − Tn ψ) dπ̃ = lim_{n→∞} ∫ (Tn φ − Tn ψ) dπ,

where Tn is the truncation operator that was already used in the proof
of (d) ⇒ (a). So

∫ [ c(x, y) − Tn φ(y) + Tn ψ(x) ] dπ(x, y) −−−→_{n→∞} 0.    (5.25)
Proof of Theorem 5.10, Part (iii). As in the proof of Part (i) we may
assume that c ≥ 0. Let π be optimal, and let ψ be a c-convex func-
tion such that π is concentrated on ∂c ψ. Let φ := ψ c . The goal is to
show that under the assumption c ≤ cX + cY , (ψ, φ) solves the dual
Kantorovich problem.
The point is to prove that ψ and φ are integrable. For this we repeat
the estimates of Step 4 in the proof of Part (i), with some variants: After
securing (x0 , y0 ) such that φ(y0 ), ψ(x0 ), cX (x0 ) and cY (y0 ) are finite,
we write
ψ(x) + cX (x) = sup_y [ φ(y) − c(x, y) + cX (x) ] ≥ sup_y [ φ(y) − cY (y) ] ≥ φ(y0 ) − cY (y0 );

φ(y) − cY (y) = inf_x [ ψ(x) + c(x, y) − cY (y) ] ≤ inf_x [ ψ(x) + cX (x) ] ≤ ψ(x0 ) + cX (x0 );
and it results from Part (i) of the theorem that both π and (ψ, φ) are
optimal, respectively in the original and the dual Kantorovich prob-
lems.
To prove the last part of (iii), assume that c is continuous; then the
subdifferential of any c-convex function is a closed c-cyclically mono-
tone set.
Let π be an arbitrary optimal transference plan, and (ψ, φ) an op-
timal pair of prices. We know that (ψ, ψ c ) is optimal in the dual Kan-
torovich problem, so
∫ c(x, y) dπ(x, y) = ∫ ψ^c dν − ∫ ψ dµ.
Restriction property
The dual side of the Kantorovich problem also behaves well with respect
to restriction, as shown by the following results.
Application: Stability
An important consequence of Theorem 5.10 is the stability of optimal
transference plans. For simplicity I shall prove it under the assumption
that c is bounded below.
Theorem 5.20 (Stability of optimal transport). Let X and Y be
Polish spaces, and let c ∈ C(X × Y) be a real-valued continuous cost
function, inf c > −∞. Let (ck )k∈N be a sequence of continuous cost
functions converging uniformly to c on X × Y. Let (µk )k∈N and (νk )k∈N
be sequences of probability measures on X and Y respectively. Assume
that µk converges to µ (resp. νk converges to ν) weakly. For each k, let
πk be an optimal transference plan between µk and νk . If
∀k ∈ N,    ∫ ck dπk < +∞,
Since this is a closed set, the same is true for π ⊗N , and then by letting
ε → 0 we see that π ⊗N is concentrated on the set C(N ) defined by
Σ_{1≤i≤N} c(xi , yi ) ≤ Σ_{1≤i≤N} c(xi , yi+1 ).
Theorem 5.20 also admits the following corollary about the stability
of transport maps.
stand for the value of the optimal transport cost of transport between
µ and ν.
If ν is a given reference measure, inequalities of the form
∀µ ∈ P (X ), C(µ, ν) ≤ F (µ)
Remark 5.28. One may simplify (ii’) by taking the supremum over t;
since Λ is nonincreasing, the result is
Λ( Φ( ∫_Y φ dν − φ^c ) ) ≤ 0.    (5.29)
∀µ ∈ P (X ), C(µ, ν) ≤ Hν (µ)
and R Z
φ dν c
∀φ ∈ Cb (X ), e ≤ eφ dν
are equivalent.
Proof of Theorem 5.26. First assume that (i) is satisfied. Then for all
ψ ≥ φ^c ,

Λ( ∫_Y φ dν − ψ ) = sup_{µ∈P (X )} [ ∫_X ( ∫_Y φ dν − ψ ) dµ − F (µ) ]
                  = sup_{µ∈P (X )} [ ∫_Y φ dν − ∫_X ψ dµ − F (µ) ]
                  ≤ sup_{µ∈P (X )} [ C(µ, ν) − F (µ) ] ≤ 0,
where the easiest part of Theorem 5.10 (that is, inequality (5.4)) was
used to go from the next-to-last line to the last one. Then (ii) follows
upon taking the supremum over ψ.
Conversely, assume that (ii) is satisfied. Then, for any pair (ψ, φ) ∈
Cb (X ) × Cb (Y) one has, by (5.28),
∫_Y φ dν − ∫_X ψ dµ = ∫_X ( ∫_Y φ dν − ψ ) dµ ≤ Λ( ∫_Y φ dν − ψ ) + F (µ).
Then (i) follows upon taking the supremum over φ ∈ Cb (Y) and apply-
ing Theorem 5.10 (i).
Now let us consider the equivalence between (i’) and (ii’). By as-
sumption, Φ(r) ≤ 0 for r ≤ 0, so Φ∗ (t) = supr [r t − Φ(r)] = +∞ if
t < 0. Then the Legendre inversion formula says that
∀r ∈ R,    Φ(r) = sup_{t∈R+} [ r t − Φ∗ (t) ].
(The important thing is that the supremum is over R+ and not R.)
If (i’) is satisfied, then for all φ ∈ Cb (X ), for all ψ ≥ φc and for all
t ∈ R+ ,
Λ( t ∫_Y φ dν − t ψ − Φ∗ (t) )
   = sup_{µ∈P (X )} [ ∫_X ( t ∫_Y φ dν − t ψ − Φ∗ (t) ) dµ − F (µ) ]
   = sup_{µ∈P (X )} [ t ∫_Y φ dν − t ∫_X ψ dµ − Φ∗ (t) − F (µ) ]
   ≤ sup_{µ∈P (X )} [ t C(µ, ν) − Φ∗ (t) − F (µ) ]
   ≤ sup_{µ∈P (X )} [ Φ(C(µ, ν)) − F (µ) ] ≤ 0,
Bibliographical notes
There are many ways to state the Kantorovich duality, and even more
ways to prove it. There are also several economic interpretations, that
belong to folklore. The one which I formulated in this chapter is a
variant of one that I learnt from Caffarelli. Related economic inter-
pretations underlie some algorithms, such as the fascinating “auction
has been written; see [696, Chapter 6] for results and references. This
topic is closely related to the subject of “Kantorovich norms”: see [464],
[695, Chapters 5 and 6], [450, Chapter 4], [149] and the many references
therein.
Around the mid-eighties, it was understood that the study of
the dual problem, and in particular the existence of a maximizer,
could lead to precious qualitative information about the solution of
the Monge–Kantorovich problem. This point of view was emphasized
by many authors such as Knott and Smith [524], Cuesta-Albertos,
Matrán and Tuero-Dı́az [254, 255, 259], Brenier [154, 156], Rachev and
Rüschendorf [696, 722], Abdellaoui and Heinich [1, 2], Gangbo [395],
Gangbo and McCann [398, 399], McCann [616] and probably others.
Then Ambrosio and Pratelli proved the existence of a maximizing pair
under the conditions (5.10); see [32, Theorem 3.2]. Under adequate as-
sumptions, one can also prove the existence of a maximizer for the dual
problem by direct arguments which do not use the original problem (see
for instance [814, Chapter 2]).
The classical theory of convexity and its links with the property
of cyclical monotonicity are exposed in the well-known treatise by
Rockafellar [705]. The more general notions of c-convexity and c-
cyclical monotonicity were studied by several researchers, in particular
Rüschendorf [722]. Some authors prefer to use c-concavity; I personally
advocate working with c-convexity, because signs get better in many
situations. However, the conventions used in the present set of notes
have the disadvantage that the cost function c( · , y) is not c-convex.
A possibility to remedy this would be to call (−c)-convexity the no-
tion which I defined. This convention (suggested to me by Trudinger)
is appealing, but would have forced me to write (−c)-convex hundreds
of times throughout this book.
The notation ∂c ψ(x) and the terminology of c-subdifferential is de-
rived from the usual notation ∂ψ(x) in convex analysis. Let me stress
that in my notation ∂c ψ(x) is a set of points, not a set of tangent vectors
or differential forms. Some authors prefer to call ∂c ψ(x) the contact
set of ψ at x (any y in the contact set is a contact point) and to use
the notation ∂c ψ(x) for a set of tangent vectors (which under suitable
assumptions can be identified with the contact set, and which I shall
denote by −∇x c(x, ∂c ψc (x)), or ∇− c ψ(x), in Chapters 10 and 12).
In [712] the authors argue that c-convex functions should be con-
structible in practice when the cost function c is convex (in the usual
Assume, as before, that you are in charge of the transport of goods be-
tween producers and consumers, whose respective spatial distributions
are modeled by probability measures. The farther producers and con-
sumers are from each other, the more difficult will be your job, and you
would like to summarize the degree of difficulty with just one quantity.
For that purpose it is natural to consider, as in (5.27), the optimal
transport cost between the two measures:
C(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c(x, y) dπ(x, y),    (6.1)
where c(x, y) is the cost for transporting one unit of mass from x to
y. Here we do not care about the shape of the optimizer but only the
value of this optimal cost.
One can think of (6.1) as a kind of distance between µ and ν, but in
general it does not, strictly speaking, satisfy the axioms of a distance
function. However, when the cost is defined in terms of a distance, it
is easy to cook up a distance from (6.1):
Example 6.3. Wp (δx , δy ) = d(x, y). In this example, the distance does
not depend on p; but this is not the rule.
Wp (µ1 , µ3 ) ≤ ( E d(X1′ , X3′ )^p )^{1/p} ≤ ( E [ d(X1′ , X2′ ) + d(X2′ , X3′ ) ]^p )^{1/p}
             ≤ ( E d(X1′ , X2′ )^p )^{1/p} + ( E d(X2′ , X3′ )^p )^{1/p}
             = Wp (µ1 , µ2 ) + Wp (µ2 , µ3 ).
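As a quick numerical aside (not from the text): for uniform empirical measures on the real line, the optimal coupling is the monotone rearrangement, so Wp can be computed simply by sorting the samples; the snippet below checks the triangle inequality on arbitrary simulated data.

    # W_p between uniform empirical measures on R, via the monotone (sorted)
    # rearrangement, which is the optimal coupling in dimension one.
    import numpy as np

    def wasserstein_1d(x, y, p=2):
        # x, y: samples of equal size, each point carrying mass 1/len(x)
        return (np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)) ** (1.0 / p)

    rng = np.random.default_rng(1)
    m1, m2, m3 = rng.normal(0, 1, 500), rng.normal(2, 1, 500), rng.exponential(1.0, 500)
    w12, w23, w13 = wasserstein_1d(m1, m2), wasserstein_1d(m2, m3), wasserstein_1d(m1, m3)
    assert w13 <= w12 + w23 + 1e-12     # triangle inequality for W_p
    print(w12, w23, w13)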
Remark 6.5. Theorem 5.10(i) and Particular Case 5.4 together lead
to the useful duality formula for the Kantorovich–Rubinstein
distance: For any µ, ν in P1 (X ),
W1 (µ, ν) = sup_{‖ψ‖Lip ≤1} { ∫_X ψ dµ − ∫_X ψ dν }.    (6.3)
Among many applications of this formula I shall just mention the fol-
lowing covariance inequality: if f is a probability density with respect
to µ then
∫ f dµ ∫ g dµ − ∫ (f g) dµ ≤ ‖g‖Lip W1 (f µ, µ).
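A one-line justification of this inequality (filling in the step): since ∫ f dµ = 1,

∫ f dµ ∫ g dµ − ∫ (f g) dµ = ∫ g dµ − ∫ g d(f µ) ≤ ‖g‖Lip W1 (f µ, µ),

where the last inequality is (6.3) applied to ψ = g/‖g‖Lip and the pair of measures (µ, f µ).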
p ≤ q =⇒ Wp ≤ Wq . (6.4)
(ii) µk −→ µ and lim sup_{k→∞} ∫ d(x0 , x)^p dµk (x) ≤ ∫ d(x0 , x)^p dµ(x);

(iii) µk −→ µ and lim_{R→∞} lim sup_{k→∞} ∫_{d(x0 ,x)≥R} d(x0 , x)^p dµk (x) = 0;

(iv) For all continuous functions ϕ with |ϕ(x)| ≤ C (1 + d(x0 , x)^p ),
C ∈ R, one has

∫ ϕ(x) dµk (x) −→ ∫ ϕ(x) dµ(x).
To prove Theorem 6.9 I shall use the following lemma, which has
interest on its own and will be useful again later.
The proof is not so obvious and one might skip it at first reading.
remains bounded as k → ∞.
Since Wp ≥ W1 , the sequence (µk ) is also Cauchy in the W1 sense.
Let ε > 0 be given, and let N ∈ N be such that
µk [Uε ] ≥ 1 − ε − ε²/ε = 1 − 2ε.
At this point we have shown the following: For each ε > 0 there
is a finite family (xi )1≤i≤m such that all measures µk give mass at
least 1 − 2ε to the set Z := ∪B(xi , 2ε). The point is that Z might not
be compact. There is a classical remedy: Repeat the reasoning with ε
replaced by 2−(ℓ+1) ε, ℓ ∈ N; so there will be (xi )1≤i≤m(ℓ) such that
µk [ X \ ∪_{1≤i≤m(ℓ)} B(xi , 2^{−ℓ} ε) ] ≤ 2^{−ℓ} ε.

Thus
µk [X \ S] ≤ ε,
where
S := ∩_{1≤p≤∞} ∪_{1≤i≤m(p)} B(xi , ε 2^{−p} ).
Wp (µ̃, µ) ≤ lim inf_{k′→∞} Wp (µk′ , µ) = 0.
(a + b)^p ≤ (1 + ε) a^p + Cε b^p .
But of course,
∫ d(x, y)^p dπk (x, y) = Wp (µk , µ)^p −−−→_{k→∞} 0;
therefore,

lim sup_{k→∞} ∫ d(x0 , x)^p dµk (x) ≤ (1 + ε) ∫ d(x0 , x)^p dµ(x).
1_{d(x,y)≥R} ≤ 1_{[d(x,x0 )≥R/2 and d(x,x0 )≥d(x,y)/2]} + 1_{[d(x0 ,y)≥R/2 and d(x0 ,y)≥d(x,y)/2]} .

So, obviously,

( d(x, y)^p − R^p )₊ ≤ d(x, y)^p 1_{[d(x,x0 )≥R/2 and d(x,x0 )≥d(x,y)/2]}
                       + d(x, y)^p 1_{[d(x0 ,y)≥R/2 and d(x0 ,y)≥d(x,y)/2]}
                     ≤ 2^p d(x, x0 )^p 1_{d(x,x0 )≥R/2} + 2^p d(x0 , y)^p 1_{d(x0 ,y)≥R/2} .
It follows that

Wp (µk , µ)^p = ∫ d(x, y)^p dπk (x, y)
             = ∫ ( d(x, y) ∧ R )^p dπk (x, y) + ∫ ( d(x, y)^p − R^p )₊ dπk (x, y)
             ≤ ∫ ( d(x, y) ∧ R )^p dπk (x, y) + 2^p ∫_{d(x,x0 )≥R/2} d(x, x0 )^p dπk (x, y)
                                              + 2^p ∫_{d(x0 ,y)≥R/2} d(x0 , y)^p dπk (x, y)
             = ∫ ( d(x, y) ∧ R )^p dπk (x, y) + 2^p ∫_{d(x,x0 )≥R/2} d(x, x0 )^p dµk (x)
                                              + 2^p ∫_{d(x0 ,y)≥R/2} d(x0 , y)^p dµ(y).
where the infimum is over all couplings (X, Y ) of (µ, ν); this identity
can be seen as a very particular case of Kantorovich duality for the cost
function 1x6=y .
It seems natural that a control in Wasserstein distance should be
weaker than a control in total variation. This is not completely true,
because total variation does not take into account large distances. But
one can control Wp by weighted total variation:
Remark 6.17. The integral in the right-hand side of (6.12) can be in-
terpreted as the Wasserstein distance W1 for the particular cost func-
tion [d(x0 , x) + d(x0 , y)]1x6=y .
= (1/a) ∫ d(x, y)^p d(µ − ν)₊ (x) d(µ − ν)₋ (y)
≤ (2^{p−1}/a) ∫ [ d(x, x0 )^p + d(x0 , y)^p ] d(µ − ν)₊ (x) d(µ − ν)₋ (y)
≤ 2^{p−1} [ ∫ d(x, x0 )^p d(µ − ν)₊ (x) + ∫ d(x0 , y)^p d(µ − ν)₋ (y) ]
= 2^{p−1} ∫ d(x, x0 )^p d[ (µ − ν)₊ + (µ − ν)₋ ](x)
= 2^{p−1} ∫ d(x, x0 )^p d|µ − ν|(x).    ⊓⊔
Proof of Theorem 6.18. The fact that Pp (X ) is a metric space was al-
ready explained, so let us turn to the proof of separability. Let D be
a dense sequence in X , and let P be the space of probability measures
that can be written Σ aj δxj , where the aj are rational coefficients, and
the xj are finitely many elements in D. It will turn out that P is dense
in Pp (X ).
To prove this, let ε > 0 be given, and let x0 be an arbitrary element
of D. If µ lies in Pp (X ), then there exists a compact set K ⊂ X such
that

∫_{X \K} d(x0 , x)^p dµ(x) ≤ ε^p .
Since (Id , f ) is a coupling of µ and f# µ, we have Wp (µ, f# µ)^p ≤ 2 ε^p .
Of course, f# µ can be written as Σ aj δxj , 0 ≤ j ≤ N .
that µ might be approximated, with arbitrary precision, by a finite
combination of Dirac masses. To conclude, it is sufficient to show that
and obviously the latter quantity can be made as small as possible for
some well-chosen rational coefficients bj .
Finally, let us prove the completeness. Again let (µk )k∈N be a
Cauchy sequence in Pp (X ). By Lemma 6.14, it admits a subsequence
(µk′ ) which converges weakly (in the usual sense) to some measure µ.
Then,
∫ d(x0 , x)^p dµ(x) ≤ lim inf_{k′→∞} ∫ d(x0 , x)^p dµk′ (x) < +∞,
so in particular
which means that µℓ′ converges to µ in the Wp sense (and not just in
the sense of weak convergence). Since (µk ) is a Cauchy sequence with
a converging subsequence, it follows by a classical argument that the
whole sequence is converging. ⊓⊔
Bibliographical notes
and maybe others); (b) the explicit definition of this distance is not so
easy to find in Wasserstein’s work; and (c) Wasserstein was only inter-
ested in the case p = 1. By the way, also the spelling of Wasserstein is
doubtful: the original spelling was Vasershtein. (Similarly, Rubinstein
was spelled Rubinshtein.) These issues are discussed in a historical
note by Rüschendorf [720], who advocates the denomination of “min-
imal Lp -metric” instead of “Wasserstein distance”. Also Vershik [808]
tells about the discovery of the metric by Kantorovich and stands up
in favor of the terminology “Kantorovich distance”.
However, the terminology “Wasserstein distance” (or “Wasserstein
metric”) has been extremely successful: at the time of writing, about
30,000 occurrences can be found on the Internet. Nearly all recent pa-
pers relating optimal transport to partial differential equations, func-
tional inequalities or Riemannian geometry (including my own works)
use this convention. I will therefore stick to this by-now well-established
terminology. After all, even if this convention is a bit unfair since it does
not give credit to all contributors, not even to the most important of
them (Kantorovich), at least it does give credit to somebody.
As I learnt from Bernot, terminological confusion was enhanced in
the mid-nineties, when a group of researchers in image processing in-
troduced the denomination of “Earth Mover’s distance” [713] for the
Wasserstein (Kantorovich–Rubinstein) W1 distance. This terminology
was very successful and rapidly spread by the high rate of growth of the
engineering literature; it is already starting to compete with “Wasser-
stein distance”, scoring more than 15,000 occurrences on Internet.
Gini considered the special case where the random variables are dis-
crete and lie on the real line; like Mallows later, he was interested by
applications in statistics (the “Gini distance” is often used to roughly
quantify the inequalities of wealth or income distribution in a given
population). Tanaka discovered applications to partial differential equa-
tions. Both Mallows and Tanaka worked with the case p = 2, while Gini
was interested both in p = 1 and p = 2, and Hoeffding and Fréchet
worked with general p (see for instance [381]). A useful source on the
point of view of Kantorovich and Rubinstein is Vershik’s review [809].
Kantorovich and Rubinstein [506] made the important discovery
that the original Kantorovich distance (W1 in my notation) can be
extended into a norm on the set M (X ) of signed measures over a Pol-
ish space X . It is common to call this extension the Kantorovich–
Rubinstein norm, and by abuse of language I also used the denomina-
due to Rachev [695, Section 6.3], and Ambrosio, Gigli and Savaré [30].
In the latter reference the proof is very simple but makes use of the
deep Kolmogorov extension theorem. Here I followed a much more el-
ementary argument due to Bolley [136].
The statement in Remark 6.19 is proven in [30, Remark 7.1.9].
In a Euclidean or Riemannian context, the Wasserstein distance
W2 between two very close measures, say (1 + h1 ) ν and (1 + h2 ) ν with
h1 , h2 very small, is approximately equal to the H −1 (ν)-norm of h1 −h2 ;
see [671, Section 7], [814, Section 7.6] or Exercise 22.20. (One may also
take a look at [567, 569].) There is in fact a natural one-parameter
family of distances interpolating between H −1 (ν) and W2 , defined by
a variation on the Benamou–Brenier formula (7.34) (insert a factor
(dµt /dν)1−α , 0 ≤ α ≤ 1 in the integrand of (7.33); this construction is
due to Dolbeault, Nazaret and Savaré [312]).
Applications of the Wasserstein distances are too numerous to
be listed here; some of them will be encountered again in the se-
quel. In [150] Wasserstein distances are used to study the best ap-
proximation of a measure by a finite number of points. Various au-
thors [700, 713] use them to compare color distributions in different
images. These distances are classically used in statistics, limit the-
orems, and all kinds of problems involving approximation of prob-
ability measures [254, 256, 257, 282, 694, 696, 716]. Rio [704] de-
rives sharp quantitative bounds in Wasserstein distance for the cen-
tral limit theorem on the real line, and surveys the previous lit-
erature on this problem. Wasserstein distances are well adapted to
study rates of fluctuations of empirical measures, see [695, Theo-
rem 11.1.6], [696, Theorem 10.2.1], [498, Section 4.9], and the research
papers [8, 307, 314, 315, 479, 771, 845]. (The most precise results are
those in [307]: there it is shown that the average W1 distance between
two independent copies of the empirical measure behaves like
(∫ ρ^{1−1/d} )/N^{1/d} , where N is the size of the samples, ρ the density
of the common law of the random variables, and d ≥ 3; the proofs are
partly based on subadditivity, as in [150].) Quantitative Sanov-type
theorems have been considered in [139, 742]. Wasserstein distances are
also commonly used in statistical mechanics, most notably in the the-
ory of propagation of chaos, or more generally the mean behavior of
large particle systems [768] [757, Chapter 5]; the original idea seems
to go back to Dobrushin [308, 309] and has been applied in a large
number of problems, see for instance [81, 82, 221, 590, 624]. The origi-
Displacement interpolation
As in the previous chapter I shall assume that the initial and final
probability measures are defined on the same Polish space (X , d). The
main additional structure assumption is that the cost is associated with
an action, which is a way to measure the cost of displacement along
a continuous curve, defined on a given time-interval, say [0, 1]. So the
cost function between an initial point x and a final point y is obtained
by minimizing the action among paths that go from x to y:
c(x, y) = inf { A(γ);  γ0 = x,  γ1 = y;  γ ∈ C }.    (7.1)
d²x/dt² = −∇V (x).    (7.4)
To make sure that A(γ) is well-defined, it is natural to assume that
the path γ is continuously differentiable, or piecewise continuously dif-
ferentiable, or at least almost everywhere differentiable as a function
of t. A classical and general setting is that of absolutely continuous
curves: By definition, if (X , d) is a metric space, a continuous curve
γ : [0, 1] → X is said to be absolutely continuous if there exists a func-
tion ℓ ∈ L1 ([0, 1]; dt) such that for all intermediate times t0 < t1 in
[0, 1],

d(γt0 , γt1 ) ≤ ∫_{t0}^{t1} ℓ(t) dt.    (7.5)
More generally, it is said to be absolutely continuous of order p if for-
mula (7.5) holds with some ℓ ∈ Lp ([0, 1]; dt).
If γ is absolutely continuous, then the function t 7−→ d(γt0 , γt ) is
differentiable almost everywhere, and its derivative is integrable. But
the converse is false: for instance, if γ is the “Devil’s staircase”, en-
countered in measure theory textbooks (a nonconstant function whose
distributional derivative is concentrated on the Cantor set in [0, 1]),
then γ is differentiable almost everywhere, and γ̇(t) = 0 for almost
every t, even though γ is not constant! This motivates the “integral”
definition of absolute continuity based on formula (7.5).
If X is Rn , or a smooth differentiable manifold, then absolutely
continuous paths are differentiable for Lebesgue-almost all t ∈ [0, 1]; in
physical words, the velocity is well-defined for almost all times.
Before going further, here are some simple and important examples.
For all of them, the class of admissible curves is the space of absolutely
continuous curves.
Remark 7.3. This example shows that very different Lagrangians can
have the same minimizing curves.
which is called the Hamiltonian; then one can recast (7.6) in terms
of a Hamiltonian system, and access to the rich mathematical world
This looks general enough, however there are interesting cases where
X does not have enough differentiable structure for the velocity vector
to be well-defined (tangent spaces might not exist, for lack of smooth-
ness). In such a case, it is still possible to define the speed along the
curve:
|γ̇t | := lim sup_{ε→0} d(γt , γt+ε )/|ε| .    (7.8)
This generalizes the natural notion of speed, which is the norm of the
velocity vector. Thus it makes perfect sense to write a Lagrangian of the
Then minimizing curves are called geodesics. They may have vari-
able speed, but, just as on a Riemannian manifold, one can always
reparametrize them (that is, replace γ by γ e where γ et = γs(t) , with s
continuous increasing) in such a way that they have constant speed. In
that case d(γs , γt ) = |t − s| L(γ) for all s, t ∈ [0, 1].
c^{s,t}(x, y) = d(x, y)^p / (t − s)^{p−1} .
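This expression can be checked directly when the action is A^{s,t}(γ) = ∫_s^t |γ̇τ |^p dτ with p ≥ 1 (a short verification, under the assumption that constant-speed minimizing geodesics exist): for any curve γ joining x to y on [s, t], Jensen's inequality gives

∫_s^t |γ̇τ |^p dτ ≥ (t − s)^{1−p} ( ∫_s^t |γ̇τ | dτ )^p ≥ d(x, y)^p / (t − s)^{p−1} ,

with equality when γ is a minimizing geodesic parametrized with constant speed; taking the infimum over curves joining x to y yields the formula above.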
The functional A = Ati ,tf will just be called the action, and the
cost function c = cti ,tf the cost associated with the action. A curve
γ : [ti , tf ] → X is said to be action-minimizing if it minimizes A among
all curves having the same endpoints.
(ii) A length space is a space in which As,t (γ) = L(γ) (here L is the
length) defines a Lagrangian action.
If [t′i , t′f ] ⊂ [ti , tf ] then it is clear that A^{ti ,tf} induces an action
A^{t′i ,t′f} on the time-interval [t′i , t′f ], just by restriction.
In the rest of this section I shall take (ti , tf ) = (0, 1), just for simplic-
ity; of course one can always reduce to this case by reparametrization.
It will now be useful to introduce further assumptions about exis-
tence and compactness of minimizing curves.
(ii) If s < t are any two intermediate times, and Ks , Kt are any
two nonempty compact sets such that cs,t (x, y) < +∞ for all x ∈ Ks ,
y ∈ Kt , then the set Γ^{s,t}_{Ks →Kt} is compact and nonempty.
In particular, minimizing curves between any two fixed points x, y
with c(x, y) < +∞ should always exist and form a compact set.
Remark 7.14. If each As,t has compact sub-level sets (more explicitly,
if {γ; As,t(γ) ≤ A} is compact in C([s, t]; X ) for any A ∈ R), then the
lower semicontinuity of As,t, together with a standard compactness
argument (just as in Theorem 4.1) imply the existence of at least one
ct1 ,t3 (γt1 , γt3 ) = ct1 ,t2 (γt1 , γt2 ) + ct2 ,t3 (γt2 , γt3 ). (7.15)
(v) If the cost functions cs,t are continuous, then the set Γ of all
action-minimizing curves is closed in the topology of uniform conver-
gence;
(vi) For all times s < t, there exists a Borel map Ss→t : X × X →
C([s, t]; X ), such that for all x, y ∈ X , Ss→t (x, y) belongs to Γ^{s,t}_{x→y} . In
words, there is a measurable recipe to join any two endpoints x and y
by a minimizing curve γ : [s, t] → X .
cs,t (x, y) ≤ As,t (γ) ≤ lim inf As,t(γk ) = lim inf cs,t (xk , yk ).
ct1 ,t3 (x1 , x3 ) ≤ At1 ,t3 (γ) = At1 ,t2 (γ1→2 ) + At2 ,t3 (γ2→3 )
= ct1 ,t2 (x1 , x2 ) + ct2 ,t3 (x2 , x3 ).
By point (iii) in Definition 7.11, it follows that c0,1 (γ0 , γ1 ) = A0,1 (γ),
which proves (iv).
If 0 ≤ t1 < t2 < t3 ≤ 1, now let Γ (t1 , t2 , t3 ) stand for the set
of all curves satisfying (7.15). If all functions cs,t are continuous, then
Γ (t1 , t2 , t3 ) is closed for the topology of uniform convergence. Then Γ is
the intersection of all Γ (t1 , t2 , t3 ), so it is closed also; this proves state-
ment (v). (Now there is a similarity with the proof of Theorem 5.20.)
For given times s < t, let Γ s,t be the set of all action-minimizing
curves defined on [s, t], and let Es,t be the “endpoints” mapping, defined
on Γ s,t by γ 7−→ (γs , γt ). By assumption, any two points are joined by
at least one minimizing curve, so Es,t is onto X × X . It is clear that
Es,t is a continuous map between Polish spaces, and by assumption
E⁻¹_{s,t}(x, y) is compact for all x, y. It follows by general theorems of
measurable selection (see the bibliographical notes in case of need)
that Es,t admits a measurable right-inverse Ss→t , i.e. Es,t ◦ Ss→t = Id .
This proves statement (vi). ⊓⊔
π0,1 := (e0 , e1 )# Π
The next theorem is the main result of this chapter. It shows that
the law at time t of a dynamical optimal coupling can be seen as a
minimizing path in the space of probability measures. In the important
case when the cost is a power of a geodesic distance, the corollary stated
right after the theorem shows that displacement interpolation can be
(iii) The path (µt )0≤t≤1 is a minimizing curve for the coercive action
functional defined on P (X ) by
A^{s,t}(µ) = sup_{N ∈N} sup_{s=t0 <t1 <...<tN =t} Σ_{i=0}^{N −1} C^{ti ,ti+1}(µti , µti+1 )    (7.16)
           = inf_γ E A^{s,t}(γ),    (7.17)
where the last infimum is over all random curves γ : [s, t] → X such
that law (γτ ) = µτ (s ≤ τ ≤ t).
In that case (µt )0≤t≤1 is said to be a displacement interpolation be-
tween µ0 and µ1 . There always exists at least one such curve.
Finally, if K0 and K1 are two compact subsets of P (X ), such that
C 0,1 (µ0 , µ1 ) < +∞ for all µ0 ∈ K0 , µ1 ∈ K1 , then the set of dynamical
optimal transference plans Π with (e0 )# Π ∈ K0 and (e1 )# Π ∈ K1 is
compact.
• Is there a more explicit formula for the action on the space of probability measures, say for a simple enough action on X ? Can it be written as ∫₀¹ L(µt , µ̇t , t) dt? (Of course, in Corollary 7.22 this is the case with L = |µ̇|^p , but this expression is not very “explicit”.)
• Are geodesic paths nonbranching? (Does the velocity at initial time
uniquely determine the final measure µ1 ?)
• Can one identify simple conditions for the existence of a unique
geodesic path between two given probability measures?
Main idea in the proof of Theorem 7.21. The delicate part consists in
showing that if (µt ) is a given action-minimizing curve, then there ex-
ists a random minimizer γ such that µt = law (γt ). This γ will be
constructed by dyadic approximation, as follows. First let (γ^{(0)}_0 , γ^{(0)}_1 )
be an optimal coupling of (µ0 , µ1 ). (Here the notation γ^{(0)}_0 could be
replaced by just x0 , it does not mean that there is some curve γ^{(0)}
behind.) Then let (γ^{(1)}_0 , γ^{(1)}_{1/2} ) be an optimal coupling of (µ0 , µ1/2 ), and
((γ ′ )^{(1)}_{1/2} , γ^{(1)}_1 ) be an optimal coupling of (µ1/2 , µ1 ). By gluing these cou-
plings together, I can actually assume that (γ ′ )^{(1)}_{1/2} = γ^{(1)}_{1/2} , so that I have
a triple (γ^{(1)}_0 , γ^{(1)}_{1/2} , γ^{(1)}_1 ) in which the first two components on the one
hand, and the last two components on the other hand, constitute opti-
mal couplings.
Now the key observation is that if (γt1 , γt2 ) and (γt2 , γt3 ) are optimal
couplings of (µt1 , µt2 ) and (µt2 , µt3 ) respectively, and the µtk satisfy the
equality appearing in (ii), then also (γt1 , γt3 ) should be optimal. Indeed,
by taking expectation in the inequality
ct1 ,t3 (γt1 , γt3 ) ≤ ct1 ,t2 (γt1 , γt2 ) + ct2 ,t3 (γt2 , γt3 )
E ct1 ,t3 (γt1 , γt3 ) ≤ C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ).
so actually
E ct1 ,t3 (γt1 , γt3 ) ≤ C t1 ,t3 (µt1 , µt3 ),
which means that indeed (γt1 , γt3 ) is an optimal coupling of (µt1 , µt3 )
for the cost ct1 ,t3 .
So (γ^{(1)}_0 , γ^{(1)}_1 ) is an optimal coupling of (µ0 , µ1 ). Now we can proceed
in the same manner and define, for each k, a random discrete path
(γ^{(k)}_{j 2^{−k}} ) such that (γ^{(k)}_s , γ^{(k)}_t ) is an optimal coupling for all times s, t of
the form j/2^k . These are only discrete paths, but it is possible to extend
them into paths (γ^{(k)}_t )_{0≤t≤1} that are minimizers of the action. Of course,
if t is not of the form j/2^k , there is no reason why law (γ^{(k)}_t ) would
coincide with µt . But hopefully we can pass to the limit as k → ∞, for
each dyadic time, and conclude by a density argument. ⊓⊔
C t1 ,t3 (µt1 , µt3 ) ≤ E ct1 ,t3 (γt1 , γt3 ) ≤ E ct1 ,t2 (γt1 , γt2 ) + E ct2 ,t3 (γt2 , γt3 )
= C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ).
C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ) ≤ E ct1 ,t2 (γt1 , γt2 ) + E ct2 ,t3 (γt2 , γt3 )
= E At1 ,t2 (γ) + E At2 ,t3 (γ) = E At1 ,t3 (γ) = E ct1 ,t3 (γt1 , γt3 ). (7.18)
(Recall that et is just the evaluation at time t.) Then the law Π (k) of
(γt )0≤t≤1 is a probability measure on the set of continuous curves in X .
I claim that Π (k) is actually concentrated on minimizing curves.
(Skip at first reading and go directly to Step 4.) To prove this, it is
sufficient to check the criterion in Proposition 7.16(iv), involving three
intermediate times t1 , t2 , t3 . By construction, the criterion holds true if
all these times belong to the same time-interval [i/2k , (i + 1)/2k ], and
also if they are all of the form j/2k ; the problem consists in “crossing
subintervals”. Let us show that
c^{s,t}(γs , γt ) = c^{s, i/2^k}(γs , γ_{i/2^k} ) + c^{i/2^k , j/2^k}(γ_{i/2^k} , γ_{j/2^k} ) + c^{j/2^k , t}(γ_{j/2^k} , γt ).    (7.19)
To prove this, we start with
c^{(i−1)/2^k , (j+1)/2^k}(γ_{(i−1)/2^k} , γ_{(j+1)/2^k} )
   ≤ c^{(i−1)/2^k , s}(γ_{(i−1)/2^k} , γs ) + c^{s,t}(γs , γt ) + c^{t, (j+1)/2^k}(γt , γ_{(j+1)/2^k} )
   ≤ c^{(i−1)/2^k , s}(γ_{(i−1)/2^k} , γs ) + c^{s, i/2^k}(γs , γ_{i/2^k} ) + c^{i/2^k , (i+1)/2^k}(γ_{i/2^k} , γ_{(i+1)/2^k} )
      + · · · + c^{j/2^k , t}(γ_{j/2^k} , γt ) + c^{t, (j+1)/2^k}(γt , γ_{(j+1)/2^k} ).    (7.20)
and by construction of Π^{(k)} this is just c^{(i−1)/2^k , (j+1)/2^k}(γ_{(i−1)/2^k} , γ_{(j+1)/2^k} ). So there
has to be equality everywhere in (7.20), which leads to (7.19). (Here
I use the fact that cs,t (γs , γt ) < +∞.) After that it is an easy game
to conclude the proof of the minimizing property for arbitrary times
t1 , t2 , t3 .
Step 4. To recapitulate: Starting from a curve (µt )0≤t≤1 , we have
constructed a family of probability measures Π (k) which are all concen-
trated on the set Γ of minimizing curves, and satisfy (et )# Π (k) = µt
for all t = j/2k . It remains to pass to the limit as k → ∞. For that we
shall check the tightness of the sequence (Π (k) )k∈N . Let ε > 0 be arbi-
trary. Since µ0 , µ1 are tight, there are compact sets K0 , K1 such that
µ0 [X \ K0 ] ≤ ε, µ1 [X \ K1 ] ≤ ε. From the coercivity of the action, the
set Γ^{0,1}_{K0 →K1} of action-minimizing curves joining K0 to K1 is compact,
and Π[ Γ \ Γ^{0,1}_{K0 →K1} ] is (with obvious notation)

P [ (γ0 , γ1 ) ∉ K0 × K1 ] ≤ P [γ0 ∉ K0 ] + P [γ1 ∉ K1 ] = µ0 [X \ K0 ] + µ1 [X \ K1 ] ≤ 2ε.
This proves the tightness of the family (Π (k) ). So one can extract a
subsequence thereof, still denoted Π (k) , that converges weakly to some
probability measure Π.
By Proposition 7.16(v), Γ is closed; so Π is still supported in Γ .
Moreover, for all dyadic time t = i/2ℓ in [0, 1], we have, if k is larger
than ℓ, (et )# Π (k) = µt , and by passing to the limit we find that
(et )# Π = µt also.
By assumption, µt depends continuously on t. So, to conclude that
(et )# Π = µt for all times t ∈ [0, 1] it now suffices to check the continuity
of (et )# Π as a function of t. In other words, if ϕ is an arbitrary bounded
continuous function on X , one has to show that
ψ(t) = E ϕ(γt )
where the next-to-last inequality follows from the fact that (γs , γt ) is
a coupling of (µs , µt ), and the last inequality is a consequence of the
definition of cs,t . This shows that
Σ_i C^{ti ,ti+1}(µti , µti+1 ) ≤ E A^{s,t}(γ).
(µτ )s≤τ ≤t for the Lagrangian action restricted to [s, t]. Then Property
(ii) of Theorem 7.21 implies As,t (µ) = C s,t (µs , µt ). Finally, Property
(iii) in Definition 7.11 is also satisfied by construction. In the end, (A)
does define a Lagrangian action, with induced cost functionals C s,t .
To conclude the proof of Theorem 7.21 it only remains to check
the coercivity of the action; then the equivalence of (i), (ii) and (iii)
will follow from Proposition 7.16(iv). Let s < t be two given times in
[0, 1], and let Ks , Kt be compact sets of probability measures such that
C s,t(µs , µt ) < +∞ for all µs ∈ Ks , µt ∈ Kt . Action-minimizing curves
for As,t can be written as law (γτ )s≤τ ≤t , where γ is a random action-
minimizing curve [s, t] → X such that law (γs ) ∈ Ks , law (γt ) ∈ Kt .
One can use an argument similar to the one in Step 4 above to prove
that the laws Π of such minimizing curves form a tight, closed set; so
we have a compact set of dynamical transference plans Π s,t, that are
probability measures on C([s, t]; X ). The problem is to show that the
paths (eτ )# Π s,t constitute a compact set in C([s, t]; P (X )). Since the
continuous image of a compact set is compact, it suffices to check that
the map
Π s,t 7−→ ((eτ )# Π s,t )s≤τ ≤t
is continuous from P (C([s, t]; X )) to C([s, t]; P (X )). To do so, it will
be convenient to metrize P (X ) with the Wasserstein distance W1 , re-
placing if necessary d by a bounded, topologically equivalent distance.
(Recall Corollary 6.13.) Then the uniform distance on C([s, t]; X ) is
also bounded and there is an associated Wasserstein distance W1 on
P (C([s, t]; X )). Let Π and Π̃ be two dynamical optimal transference
plans, and let ((γτ ), (γ̃τ )) be an optimal coupling of Π and Π̃; let also
µτ , µ̃τ be the associated displacement interpolations; then the required
continuity follows from the chain of inequalities

sup_{t∈[0,1]} W1 (µt , µ̃t ) ≤ sup_{t∈[0,1]} E d(γt , γ̃t ) ≤ E sup_{t∈[0,1]} d(γt , γ̃t ) = W1 (Π, Π̃).
Then
c^{s,t}(x, y) = d(x, y)^p / (t − s)^{p−1} ,
and all our assumptions hold true for this action and cost. (The assump-
tion of local compactness is used to prove that the action is coercive,
see the Appendix.) The important point now is that

C^{s,t}(µ, ν) = Wp (µ, ν)^p / (t − s)^{p−1} .
So, according to the remarks in Example 7.9, Property (ii) in Theo-
rem 7.21 means that (µt ) is in turn a minimizer of the action associated
with the Lagrangian |µ̇|p , i.e. a geodesic in Pp (X ). Note that if µt is
the law of a random optimal geodesic γt at time t, then
So (d+ /dt)[Ψ (t)1/p ] ≤ Wp (µ, ν), thus |Ψ (1)1/p − Ψ (0)1/p | ≤ Wp (µ, ν),
which is the desired result. ⊓⊔
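Here is a small numerical sketch of this geodesic picture for the quadratic cost in the Euclidean plane (an illustration, not part of the text): the optimal coupling of two uniform discrete measures is an assignment problem, and µt is obtained by moving each matched pair along the segment (1 − t)x + t y; since the restricted coupling remains optimal, Wp (µ0 , µt ) = t Wp (µ0 , µ1 ) along the interpolation.

    # Sketch: displacement interpolation for c(x, y) = |x - y|^2 between two
    # uniform discrete measures on R^2. The optimal coupling is a permutation,
    # and mu_t is the empirical measure of the interpolated points.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))                                 # support of mu_0
    Y = 0.5 * rng.normal(size=(n, 2)) + np.array([4.0, 1.0])    # support of mu_1

    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    row, col = linear_sum_assignment(cost)                      # optimal permutation
    W2 = np.sqrt(cost[row, col].mean())

    def mu_t(t):
        # points of the displacement interpolant, each carrying mass 1/n
        return (1.0 - t) * X[row] + t * Y[col]

    for t in (0.25, 0.5, 0.75):
        Wt = np.sqrt(((X[row] - mu_t(t)) ** 2).sum(-1).mean())
        assert abs(Wt - t * W2) < 1e-9                          # constant-speed geodesic
    print("W2(mu_0, mu_1) =", W2)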
Again let µ0 and µ1 be any two probability measures, (µt )0≤t≤1 a dis-
placement interpolation associated with a dynamical optimal transfer-
ence plan Π, and (γt )0≤t≤1 a random action-minimizing curve with
law (γ) = Π. In particular (γ0 , γ1 ) is an optimal coupling of (µ0 , µ1 ).
Π ′ := Π̃ / Π̃[ C([t0 , t1 ]; X ) ] ,        µ′t = (et )# Π ′ .
[γt = γ̃t ] =⇒ [γ = γ̃].
law (γ̃^{t0 ,t1} ) = law (γ̂^{t0 ,t1} ) = law (F (γ̂^{0,t0} )) = law (F (γ^{0,t0} )) = law (γ^{t0 ,t1} ).
c^{0,1}(γ0 , γ̃1 ) ≤ c^{0,t}(γ0 , X) + c^{t,1}(X, γ̃1 ),    (7.22)
and similarly
c^{0,1}(γ̃0 , γ1 ) ≤ c^{0,t}(γ̃0 , X) + c^{t,1}(X, γ1 ).    (7.23)
Since the reverse inequality holds true by (7.21), equality has to hold
in all intermediate inequalities, for instance in (7.22). Then it is easy
to see that the path γ̂ defined by γ̂(s) = γ(s) for 0 ≤ s ≤ t, and
γ̂(s) = γ̃(s) for s ≥ t, is a minimizing curve. Since it coincides with γ
on a nontrivial time-interval, it has to coincide with γ everywhere, and
similarly it has to coincide with γ̃ everywhere. So γ = γ̃.
If the costs c^{s,t} are continuous, the previous conclusion holds true
not only Π ⊗ Π-almost surely, but actually for any two minimizing
curves γ, γ̃ lying in the support of Π. Indeed, inequality (7.21) defines
a closed set C in Γ ×Γ , where Γ stands for the set of minimizing curves;
so Spt Π × Spt Π = Spt(Π ⊗ Π) ⊂ C.
It remains to prove (v). Let Γ 0,1 be a c-cyclically monotone subset
of X × X on which π is concentrated, and let Γ be the set of mini-
mizing curves γ : [0, 1] → X such that (γ0 , γ1 ) ∈ Γ 0,1 . Let (Kℓ )ℓ∈N be
Interpolation of prices
When the path µt varies in time, what becomes of the pair of “prices”
(ψ, φ) in the Kantorovich duality? The short answer is that these func-
tions will also evolve continuously in time, according to Hamilton–
Jacobi equations.
( H^{t,s}_− φ )(x) = sup_{y∈X} [ φ(y) − c^{s,t}(x, y) ].
The family of operators (H^{s,t}_+ )_{t>s} (resp. (H^{s,t}_− )_{s<t} ) is called the forward (resp. backward) Hamilton–Jacobi (or Hopf–Lax, or Lax–Oleinik) semigroup.
Roughly speaking, H^{s,t}_+ gives the values of ψ at time t, from its values
at time s; while H^{s,t}_− does the reverse. So the semigroups H− and H+
are in some sense inverses of each other. Yet it is not true in general
that H^{t,s}_− H^{s,t}_+ = Id . Proposition 7.34 below summarizes some of the
main properties of these semigroups; the denomination of “semigroup”
itself is justified by Property (ii).
Proposition 7.34 (Elementary properties of Hamilton–Jacobi
semigroups). With the notation of Definition 7.33,
(i) H^{s,t}_+ and H^{s,t}_− are order-preserving: ψ ≤ ψ̃ =⇒ H^{s,t}_± ψ ≤ H^{s,t}_± ψ̃.
(ii) Whenever t1 < t2 < t3 are three intermediate times in [0, 1],
H^{t2 ,t3}_+ H^{t1 ,t2}_+ = H^{t1 ,t3}_+ ;        H^{t2 ,t1}_− H^{t3 ,t2}_− = H^{t3 ,t1}_− .
(iii) Whenever s < t are two times in [0, 1],
H^{t,s}_− H^{s,t}_+ ≤ Id ;        H^{s,t}_+ H^{t,s}_− ≥ Id .
Proof of Proposition 7.34. Properties (i) and (ii) are immediate conse-
quences of the definitions and Proposition 7.16(iii). To check Property
(iii), e.g. the first half of it, write
H^{t,s}_− ( H^{s,t}_+ ψ )(x) = sup_y inf_{x′} [ ψ(x′ ) + c^{s,t}(x′ , y) − c^{s,t}(x, y) ].
∂S₊/∂t + |∇⁻S₊|²/2 = 0,
where
|∇⁻f |(x) := lim sup_{y→x} [f (y) − f (x)]⁻ / d(x, y) ,        z⁻ = max(−z, 0).
and
φt (y) − ψs (x) ≤ cs,t (x, y),
with equality πs,t (dx dy)-almost surely, where πs,t is any optimal trans-
ference plan between µs and µt .
Since c0,s (x′ , x) + cs,t (x, y) + ct,1 (y, y ′ ) ≥ c0,1 (x′ , y ′ ), it follows that
φt (y) − ψs (x) − c^{s,t}(x, y) ≤ sup_{y′ , x′} [ φ1 (y ′ ) − ψ0 (x′ ) − c^{0,1}(x′ , y ′ ) ] ≤ 0.
So φt (y) − ψs (x) ≤ cs,t (x, y). This inequality does not depend on the
fact that (ψ0 , φ1 ) is a tight pair of prices, in the sense of (5.5), but only
on the inequality φ1 − ψ0 ≤ c0,1 .
Next, introduce a random action-minimizing curve γ such that
µt = law (γt ). Since (ψ0 , φ1 ) is an optimal pair, we know from The-
orem 5.10(ii) that, almost surely,
From the identity c0,1 (γ0 , γ1 ) = c0,s (γ0 , γs )+cs,t (γs , γt )+ct,1 (γt , γ1 ) and
the definition of ψs and φt ,
c^{s,t}(γs , γt ) = [ φ1 (γ1 ) − c^{t,1}(γt , γ1 ) ] − [ ψ0 (γ0 ) + c^{0,s}(γ0 , γs ) ]
                ≤ φt (γt ) − ψs (γs ).
This shows that actually cs,t (γs , γt ) = φt (γt ) − ψs (γs ) almost surely, so
(ψs , φt ) has to be optimal in the dual Kantorovich problem between
µs = law (γs ) and µt = law (γt ). ⊓⊔
Exercise 7.39. After reading the rest of Part I, the reader can come
back to this exercise and check his or her understanding by proving
that, for a quadratic Lagrangian:
(i) The displacement interpolation between two balls in Euclidean
space is always a ball, whose radius increases linearly in time (here I am
identifying a set with the uniform probability measure on this set).
(ii) More generally, the displacement interpolation between two el-
lipsoids is always an ellipsoid.
p(x) · v = ξ(x) · v,
where v ∈ Tx M , and the dot in the left-hand side just means “p(x)
applied to v”, while the dot in the right-hand side stands for the scalar
product defined by g. As a particular case, if p is the differential of a
function f , the corresponding vector field ξ is the gradient of f , denoted
by ∇f or ∇x f .
If f = f (x, v) is a function on TM , then one can differentiate it with
respect to x or with respect to v. Since T(x,v) Tx M ≃ Tx M , both dx f
and dv f can be seen as linear forms on Tx M ; this allows us to define
∇x f and ∇v f , the “gradient with respect to the position variable” and
the “gradient with respect to the velocity variable”.
Differentiating functions does not pose any particular conceptual
problem, but differentiating vector fields is quite a different story. If ξ
is a vector field on M , then ξ(x) and ξ(y) live in different vector spaces,
so it does not a priori make sense to compare them, unless one identifies
in some sense Tx M and Ty M . (Of course, one could say that ξ is a map
M → TM and define its differential as a map TM → T (TM ) but this
Further, note that in the second formula of (7.28) the symbol f˙ stands
for the usual derivative of t → f (γt ); while the symbols ξ̇ and ζ̇ stand
for the covariant derivatives of the vector fields ξ and ζ along γ.
A third approach to covariant derivation is based on coordinates.
Let x ∈ M , then there is a neighborhood O of x which is diffeomorphic
to some open subset U ⊂ Rn . Let Φ be a diffeomorphism U → O, and
let (e1 , . . . , en ) be the usual basis of Rn . A point m in O is said to have
coordinates (y 1 , . . . , y n ) if m = Φ(y 1 , . . . , y n ); and a vector v ∈ Tm M
is said to have components v 1 , . . . , v k if d(y1 ,...,yn ) Φ · (v1 , . . . , vk ) = v.
Then the coefficients of the metric g are the functions gij defined by
g(v, v) = Σ gij v^i v^j .
The coordinate point of view reduces everything to “explicit” com-
putations and formulas in Rn ; for instance the derivation of a function
f along the ith direction is defined as ∂i f := (∂/∂y i )(f ◦ Φ). This is
conceptually simple, but rapidly leads to cumbersome expressions. A
central role in these formulas is played by the Christoffel symbols,
which are defined by
Γ^m_{ij} := (1/2) Σ_{k=1}^{n} ( ∂i gjk + ∂j gki − ∂k gij ) g^{km} ,
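As an illustration of how these symbols are used (a standard formula recalled here, not taken from this passage), the geodesic equations, i.e. the Euler–Lagrange equations of the action ∫ |γ̇t |² dt, read in coordinates

d²γ^m/dt² + Σ_{i,j} Γ^m_{ij} (dγ^i/dt)(dγ^j/dt) = 0,    1 ≤ m ≤ n.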
which minimize the action, we can compute the differential of the ac-
tion. So let γ be a curve, and h a small variation of that curve. (This
amounts to considering a family γs,t in such a way that γ0,t = γt and
(d/ds)|s=0 γs,t = h(t).) Then the infinitesimal variation of the action A
at γ, along the variation h, is
dA(γ) · h = ∫₀¹ [ ∇x L(γt , γ̇t , t) · h(t) + ∇v L(γt , γ̇t , t) · ḣ(t) ] dt.
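If h vanishes at the endpoints, an integration by parts in the second term gives (a sketch of the classical computation, assuming enough smoothness of L and γ)

dA(γ) · h = ∫₀¹ [ ∇x L(γt , γ̇t , t) − (d/dt) ∇v L(γt , γ̇t , t) ] · h(t) dt,

so that critical curves, in particular minimizing curves, satisfy the Euler–Lagrange equation (d/dt) ∇v L(γt , γ̇t , t) = ∇x L(γt , γ̇t , t).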
can also reparametrize geodesic curves γ in such a way that their speed
|γ̇| is constant, or equivalently that for all intermediate times s and t,
their length between times s and t coincides with the distance between
their positions at times s and t.
The same proof that I sketched for Riemannian manifolds applies
in geodesic spaces, to show that the set Γx,y of (minimizing, constant
speed) geodesics joining x to y is compact; more generally, the set
ΓK0 →K1 of geodesics γ with γ0 ∈ K0 and γ1 ∈ K1 is compact, as soon
as K0 and K1 are compact. So there are important common points be-
tween the structure of a length space and the structure of a Riemannian
manifold. From the practical point of view, some main differences are
that (i) there is no available equation for geodesic curves, (ii) geodesics
may “branch”, (iii) there is no guarantee that geodesics between x and
y are unique for y very close to x, (iv) there is neither a unique notion of
dimension, nor a canonical reference measure, (v) there is no guarantee
that geodesics will be almost everywhere unique. Still there is a the-
ory of differential analysis on nonsmooth geodesic spaces (first variation
formula, norms of Jacobi fields, etc.) mainly in the case where there are
lower bounds on the sectional curvature (in the sense of Alexandrov,
as will be described in Chapter 26).
Bibliographical notes
where the infimum is taken among all paths (µt )0≤t≤1 satisfying certain
regularity conditions. Brenier himself gave two sketches of the proof for
this formula [88, 164], and another formal argument was suggested by
Otto and myself [671, Section 3]. Rigorous proofs were later provided
by several authors under various assumptions [814, Theorem 8.1] [451]
[30, Chapter 8] (the latter reference contains the most precise results).
The adaptation to Riemannian manifolds has been considered in [278,
431, 491]. We shall come back to these formulas later on, after a more
precise qualitative picture of optimal transport has emerged. One of
and this would contradict the fact that the support of π is c-cyclically
monotone. Stated otherwise: Given two crossing line segments, we can
shorten the total length of the paths by replacing these lines by the
new transport lines [x1 , y2 ] and [x2 , y1 ] (see Figure 8.1).
For cost functions that do not satisfy a triangle inequality, Monge’s ar-
gument does not apply, and pathlines can cross. However, it is often the
case that the crossing of the curves (with the time variable explicitly
taken into account) is forbidden. Here is the most basic example: Con-
sider the quadratic cost function in Euclidean space (c(x, y) = |x−y|2 ),
and let (x1 , y1 ) and (x2 , y2 ) belong to the support of some optimal cou-
pling. By cyclical monotonicity,
Then let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2
|x1 − y2 |² + |x2 − y1 |² = |x1 − X|² + |X − y2 |² − 2⟨X − x1 , X − y2 ⟩
                            + |x2 − X|² + |X − y1 |² − 2⟨X − x2 , X − y1 ⟩
                          = [ t0² + (1 − t0 )² ] ( |x1 − y1 |² + |x2 − y2 |² ) + 4 t0 (1 − t0 ) ⟨x1 − y1 , x2 − y2 ⟩
                          ≤ [ t0² + (1 − t0 )² + 2 t0 (1 − t0 ) ] ( |x1 − y1 |² + |x2 − y2 |² )
                          = |x1 − y1 |² + |x2 − y2 |²,
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2 .
(By the way, this inequality is easily seen to be optimal.) So the uniform
distance between the whole paths γ1 and γ2 can be controlled by their
distance at some time t0 ∈ (0, 1).
Example 8.3. The cost function c(x, y) = d(x, y)2 corresponds to the
Lagrangian function L(x, v, t) = |v|2 , which obviously satisfies the as-
sumptions of Corollary 8.2. In that case the exponent β = 1 is ad-
missible. Moreover, it is natural to expect that the constant CK can
be controlled in terms of just a lower bound on the sectional curvature
of M . I shall come back to this issue later in this chapter (see Open
Problem 8.21).
Example 8.4. The cost function c(x, y) = d(x, y)1+α does not satisfy
the assumptions of Corollary 8.2 for 0 < α < 1. Even if the associated
Lagrangian L(x, v, t) = |v|1+α is not smooth, the equation for mini-
mizing curves is just the geodesic equation, so Assumption (i) in The-
orem 8.1 is still satisfied. Then, by tracking exponents in the proof of
Theorem 8.1, one can find that (8.4) holds true with β = (1+α)/(3−α).
But this is far from optimal: By taking advantage of the homogeneity
of the power function, one can prove that the exponent β = 1 is also
admissible, for all α ∈ (0, 1). (It is the constant, rather than the expo-
nent, which deteriorates as α ↓ 0.) I shall explain this argument in the
Appendix, in the Euclidean case, and leave the Riemannian case as a
delicate exercise. This example suggests that Theorem 8.1 still leaves
room for improvement.
Fig. 8.2. In this example the map γ(0) → γ(1/2) is not well-defined, but the map
γ(1/2) → γ(0) is well-defined and Lipschitz, just as the map γ(1/2) → γ(1). Also
µ0 is singular, but µt is absolutely continuous as soon as t > 0.
Thus, Π ⊗ Π(dγ dγ̃)-almost surely,
Π ′ := 1K Π / Π[K] ,
and let π ′ := (e0 , e1 )# Π ′ be the associated transference plan, and µ′t =
(et )# Π ′ the marginal of Π ′ at time t. In particular,
µ′1 ≤ (e1 )# Π / Π[K] = µ1 / Π[K] ,
Now let us turn to the proof of Theorem 8.1. It is certainly more im-
portant to grasp the idea of the proof (Figure 8.3) than to follow the
calculations, so the reader might be content with the informal expla-
nations below and skip the rigorous proof at first reading.
Idea of the proof of Theorem 8.1. Assume, to fix the ideas, that γ1 and
γ2 cross each other at a point m0 and at time t0 . Close to m0 , these
two curves look like two straight lines crossing each other, with re-
spective velocities v1 and v2 . Now cut these curves on the time inter-
val [t0 − τ, t0 + τ ] and on that interval introduce “deviations” (like a
plumber installing a new piece of pipe to short-cut a damaged region
of a channel) that join the first curve to the second, and vice versa.
This amounts to replacing (on a short interval of time) two curves
with approximate velocity v1 and v2 , by two curves with approximate
velocities (v1 +v2 )/2. Since the time-interval where the modification oc-
curs is short, everything is concentrated in the neighborhood of (m0 , t0 ),
so the modification in the Lagrangian action of the two curves is ap-
proximately
(2τ ) [ 2 L( m0 , (v1 + v2 )/2 , t0 ) − L(m0 , v1 , t0 ) − L(m0 , v2 , t0 ) ].
then this means that (γ1 (t0 ), γ̇1 (t0 )) and (γ2 (t0 ), γ̇2 (t0 )) are very close
to each other in TM ; more precisely they are separated by a distance
which is O( d(γ1 (t0 ), γ2 (t0 ))^β ). Then by Assumption (i) and Cauchy–
Lipschitz theory this bound will be propagated backward and forward
in time, so the distance between (γ1 (t), γ̇1 (t)) and (γ2 (t), γ̇2 (t)) will
remain bounded by O( d(γ1 (t0 ), γ2 (t0 ))^β ). Thus to conclude the argument
it is sufficient to prove (8.8).
Step 2: Construction of shortcuts. First some notation: Let us
write x1 (t) = γ1 (t), x2 (t) = γ2 (t), v1 (t) = γ̇1 (t), v2 (t) = γ̇2 (t), and also
X1 = x1 (t0 ), V1 = v1 (t0 ), X2 = x2 (t0 ), V2 = v2 (t0 ). The goal is to
control |V1 − V2 | by |X1 − X2 |.
Then the path x21 (t) and its time-derivative v21 (t) are defined symmetri-
cally (see Figure 8.4). These definitions are rather natural: First we try
to construct paths on [t0 − τ, t0 + τ ] whose velocity is about the half of
the velocities of γ1 and γ2 ; then we correct these paths by adding simple
functions (linear in time) to make them match the correct endpoints.
I shall conclude this step with some basic estimates about the paths
x12 and x21 on the time-interval [t0 − τ, t0 + τ ]. For a start, note that
x12 − (x1 + x2 )/2 = − ( x21 − (x1 + x2 )/2 ),        v12 − (v1 + v2 )/2 = − ( v21 − (v1 + v2 )/2 ).    (8.9)
In the sequel, the symbol O(m) will stand for any expression which
is bounded by Cm, where C only depends on V and on the regularity
bounds on the Lagrangian L on V. From Cauchy–Lipschitz theory and
Assumption (i),
|v1 − v2 |(t) + |x1 − x2 |(t) = O( |X1 − X2 | + |V1 − V2 | ),    (8.10)
Fig. 8.4. The paths x12 (t) and x21 (t) obtained by using the shortcuts to switch
from one original path to the other.
and then by plugging this back into the equation for minimizing curves
we obtain
|v̇1 − v̇2 |(t) = O( |X1 − X2 | + |V1 − V2 | ).
Upon integration in times, these bounds imply
x1 (t) − x2 (t) = (X1 − X2 ) + O( τ (|X1 − X2 | + |V1 − V2 |) );    (8.11)
v1 (t) − v2 (t) = (V1 − V2 ) + O( τ (|X1 − X2 | + |V1 − V2 |) );    (8.12)
and therefore also
x1 (t) − x2 (t) = (X1 − X2 ) + (t − t0 )(V1 − V2 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) ).    (8.13)
As a consequence of (8.12), if τ is small enough (depending only on
the Lagrangian L),
|v1 − v2 |(t) ≥ |V1 − V2 |/2 − O( τ |X1 − X2 | ).    (8.14)
Next, from Cauchy–Lipschitz again,
x2 (t0 + τ ) − x1 (t0 + τ ) = X2 − X1 + τ (V2 − V1 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) );
and also
[ x2 (t0 + τ ) − x1 (t0 + τ ) ]/2 + [ x1 (t0 − τ ) − x2 (t0 − τ ) ]/2
        = τ (V2 − V1 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) ).    (8.16)
v12 (t) − [ v1 (t) + v2 (t) ]/2 = O( |X1 − X2 |/τ + τ |V1 − V2 | ).    (8.17)
After integration in time and use of (8.16), one obtains
x12 (t) − [ x1 (t) + x2 (t) ]/2 = [ x12 (t0 ) − ( x1 (t0 ) + x2 (t0 ) )/2 ] + O( |X1 − X2 | + τ² |V1 − V2 | )
                               = O( |X1 − X2 | + τ |V1 − V2 | ).    (8.18)
In particular,
|x12 − x21 |(t) = O( |X1 − X2 | + τ |V1 − V2 | );    (8.19)
similarly
L(x2 , v2 ) − L( (x1 + x2 )/2 , v2 )
        = ∇x L( (x1 + x2 )/2 , v2 ) · (x2 − x1 )/2 + O( |x1 − x2 |^{1+α} ).
Moreover,
∇x L( (x1 + x2 )/2 , v1 ) − ∇x L( (x1 + x2 )/2 , v2 ) = O( |v1 − v2 |^α ).
The combination of these three identities, together with estimates (8.11)
and (8.12), yields

L(x1 , v1 ) + L(x2 , v2 ) − [ L( (x1 + x2 )/2 , v1 ) + L( (x1 + x2 )/2 , v2 ) ]
        = O( |x1 − x2 |^{1+α} + |x1 − x2 | |v1 − v2 |^α )
        = O( |X1 − X2 |^{1+α} + τ |V1 − V2 |^{1+α} + |X1 − X2 | |V1 − V2 |^α + τ^{1+α} |V1 − V2 | |X1 − X2 |^α ).
L(x21 , v21 ) − L( x21 , (v1 + v2 )/2 )
        = ∇v L( x21 , (v1 + v2 )/2 ) · ( v21 − (v1 + v2 )/2 ) + O( |v21 − (v1 + v2 )/2|^{1+α} ),

∇v L( x12 , (v1 + v2 )/2 ) − ∇v L( x21 , (v1 + v2 )/2 ) = O( |x12 − x21 |^α ).
Combining this with (8.9), (8.17) and (8.19), one finds
L(x12 , v12 ) + L(x21 , v21 ) − [ L( x12 , (v1 + v2 )/2 ) + L( x21 , (v1 + v2 )/2 ) ]
        = O( |v12 − (v1 + v2 )/2|^{1+α} + |v12 − (v1 + v2 )/2| |x12 − x21 |^α )
        = O( |X1 − X2 |^{1+α}/τ^{1+α} + τ^{1+α} |V1 − V2 |^{1+α} ).
After that,

L( x12 , (v1 + v2 )/2 ) − L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = ∇x L( (x1 + x2 )/2 , (v1 + v2 )/2 ) · ( x12 − (x1 + x2 )/2 ) + O( |x12 − (x1 + x2 )/2|^{1+α} ),

L( x21 , (v1 + v2 )/2 ) − L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = ∇x L( (x1 + x2 )/2 , (v1 + v2 )/2 ) · ( x21 − (x1 + x2 )/2 ) + O( |x21 − (x1 + x2 )/2|^{1+α} ),

and now by (8.9) the terms in ∇x cancel each other exactly upon summation, so the bound (8.18) leads to

L( x12 , (v1 + v2 )/2 ) + L( x21 , (v1 + v2 )/2 ) − 2 L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = O( |x21 − (x1 + x2 )/2|^{1+α} )
        = O( |X1 − X2 |^{1+α} + τ^{1+α} |V1 − V2 |^{1+α} ).
From Step 3, we can replace in the integrand all positions by (x1 +x2 )/2,
and v12 , v21 by (v1 + v2 )/2, up to a small error. Collecting the various
error terms, and taking into account the smallness of τ , one obtains
(dropping the t variable again)
(1/2τ ) ∫_{t0 −τ}^{t0 +τ} [ L( (x1 + x2 )/2 , v1 ) + L( (x1 + x2 )/2 , v2 ) − 2 L( (x1 + x2 )/2 , (v1 + v2 )/2 ) ] dt
        ≤ C ( |X1 − X2 |^{1+α}/τ^{1+α} + τ |V1 − V2 |^{1+α} ).    (8.21)
On the other hand, from the convexity condition (iii) and (8.14),
the left-hand side of (8.21) is bounded below by
(K/2τ ) ∫_{t0 −τ}^{t0 +τ} |v1 − v2 |^{2+κ} dt ≥ K ′ ( |V1 − V2 | − Aτ |X1 − X2 | )^{2+κ}.    (8.22)
If |V1 − V2 | ≤ 2Aτ |X1 − X2 |, then the proof is finished. If this is not the
case, this means that |V1 − V2 | − Aτ |X1 − X2 | ≥ |V1 − V2 |/2, and then
the comparison of the upper bound (8.21) and the lower bound (8.22)
yields
|V1 − V2 |^{2+κ} ≤ C ( |X1 − X2 |^{1+α}/τ^{1+α} + τ |V1 − V2 |^{1+α} ).    (8.23)
If |V1 − V2 | = 0, then the proof is finished. Otherwise, the conclusion
follows by choosing τ small enough that Cτ |V1 − V2 |1+α ≤ (1/2)|V1 −
V2 |^{2+κ}; then τ = O( |V1 − V2 |^{1+κ−α} ) and (8.23) implies

|V1 − V2 | = O( |X1 − X2 |^β ),    β = (1 + α) / [ (1 + α)(1 + κ − α) + 2 + κ ].    (8.24)
In the particular case when κ = 0 and α = 1, one has
|V1 − V2 |² ≤ C ( |X1 − X2 |²/τ² + τ |V1 − V2 |² ),
and if τ is small enough this implies just

|V1 − V2 | ≤ C |X1 − X2 | / τ .    (8.25)
The upper bound on τ depends on the regularity and strict convexity
of τ in V, but also on t0 since τ cannot be greater than min(t0 , 1 − t0 ).
This is actually the only way in which t0 explicitly enters the estimates.
So inequality (8.25) concludes the argument. ⊓
⊔
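As a quick consistency check (added here; not part of the original argument), one may plug the special values κ = 0, α = 1 directly into formula (8.24):

β = (1 + α) / [ (1 + α)(1 + κ − α) + 2 + κ ] = 2 / (2 · 0 + 2) = 1,

so (8.24) predicts |V_1 − V_2| = O(|X_1 − X_2|), which is consistent with (8.25), up to the explicit factor 1/τ that the general formula does not track.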
Remark 8.13. If L does not depend on t, then one can apply the pre-
vious result for any T = 2^{−ℓ}, and then use a compactness argument to
construct a constant curve (µt )t∈R satisfying Properties (i)–(vi) above.
In particular µ0 is a stationary measure for the Lagrangian system.
Before giving its proof, let me explain briefly why Theorem 8.11
is interesting from the point of view of the dynamics. A trajectory
of the dynamical system defined by the Lagrangian L is a curve γ
which is locally action-minimizing; that is, one can cover the time-
interval by small subintervals on which the curve is action-minimizing.
It is a classical problem in mechanics to construct and study periodic
trajectories having certain given properties. Theorem 8.11 does not
construct a periodic trajectory, but at least it constructs a random
trajectory γ (or equivalently a probability measure Π on the set of
trajectories) which is periodic on average: The law µt of γt satisfies
µt+T = µt . This can also be thought of as a probability measure Π on
the set of all possible trajectories of the system.
Of course this in itself is not too striking, since there may be a great
many invariant measures for a dynamical system, and some of them
are often easy to construct. The important point in the conclusion of
Theorem 8.11 is that the curve γ is not “too random”, in the sense
that the random variable (γ(0), γ̇(0)) takes values in a Lipschitz graph.
(If (γ(0), γ̇(0)) were a deterministic element in TM , this would mean
that Π just sees a single periodic curve. Here we may have an infinite
collection of curves, but still it is not “too large”.)
Another remarkable property of the curves γ is the fact that the
minimization property holds along any time-interval in R, not neces-
sarily small.
Proof of Theorem 8.11. I shall repeatedly use Proposition 7.16 and The-
orem 7.21. First, C 0,T (µ, µ) is a lower semicontinuous function of µ,
bounded below by T (inf L) > −∞, so the minimization problem (8.26)
does admit a solution.
Define µ0 = µT = µ, then define µt by displacement interpolation
for 0 < t < T , and extend the result by periodicity.
Let k ∈ N be given and let µ̃ be a minimizer for the variational
problem
inf_{µ∈P(M)} C^{0,kT}(µ, µ).
C^{0,kT}(µ̃, µ̃) ≤ C^{0,kT}(µ_0, µ_{kT}) ≤ Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T})
C^{0,T}(µ, µ) ≤ C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{jT} )   (8.28)
   = C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} )   (8.29)
(the last equality because µ̃_{kT} = µ̃_0 = µ̃, so both averages involve the same measures).
C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} ) ≤ (1/k) Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ̃_{jT}, µ̃_{(j+1)T})
   = (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}),   (8.30)
where the last equality is a consequence of Property (ii) in Theo-
rem 7.21. Inequalities (8.29) and (8.30) together imply
C^{0,T}(µ, µ) ≤ (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}) = (1/k) C^{0,kT}(µ̃, µ̃).
Since the reverse inequality holds true by (8.27), in fact all the inequal-
ities in (8.27), (8.29) and (8.30) have to be equalities. In particular,
C^{0,kT}(µ_0, µ_{kT}) = Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T}).   (8.31)
Next, let us check that the identity
C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) = C^{t_1,t_3}(µ_{t_1}, µ_{t_3})   (8.32)
holds true for any three intermediate times t1 < t2 < t3 . By periodicity,
it suffices to do this for t1 ≥ 0. If 0 ≤ t1 < t2 < t3 ≤ T , then (8.32)
is true by the property of displacement interpolation (Theorem 7.21
again). If jT ≤ t1 < t2 < t3 ≤ (j + 1)T , this is also true because of the
T -periodicity. In the remaining cases, we may choose k large enough
that t3 ≤ kT . Then
C^{0,kT}(µ_0, µ_{kT}) ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_3}(µ_{t_1}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
   ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
   ≤ Σ_j C^{s_j,s_{j+1}}(µ_{s_j}, µ_{s_{j+1}}),   (8.33)
(Consider for instance the particular case when 0 < t1 < t2 < T <
t3 < 2T ; then one can write C 0,t1 + C t1 ,t2 + C t2 ,T = C 0,T , and also
C T,t3 + C t3 ,2T = C T,2T . So C 0,t1 + C t1 ,t2 + C t2 ,T + C T,t3 + C t3 ,2T =
C 0,T + C T,2T .)
But (8.34) is just C 0,kT (µ0 , µkT ), as shown in (8.31). So there is
in fact equality in all these inequalities, and (8.32) follows. Then by
Theorem 7.21, (µt ) defines a displacement interpolation between any
two of its intermediate values. This proves (i). At this stage we have
also proven (iii) in the case when t0 = 0.
Now for any t ∈ R, one has, by (8.32) and the T -periodicity,
c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) = c^{t_1,t_3}(γ_{t_1}, γ_{t_3}),   (8.35)
where the property of optimality of the path (µt )t∈R was used in the
last step. So all these inequalities are equalities, and in particular
E [ c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) − c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) − c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) ] = 0.
The story does not end here. First, there is a powerful dual point
of view to Mather’s theory, based on solutions to the dual Kantorovich
problem; this is a maximization problem defined by
sup ∫ (φ − ψ) dµ,   (8.36)
a certain critical value; above that value, the Mather measures are sup-
ported by revolutions. At the critical value, the Mather and Aubry sets
differ: the Aubry set (viewed in the variables (x, v)) is the union of the
two revolutions of infinite period.
Fig. 8.5. In the left figure, the pendulum oscillates with little energy between two
extreme positions; its trajectory is an arc of circle which is described clockwise, then
counterclockwise, then clockwise again, etc. On the right figure, the pendulum has
much more energy and draws complete circles again and again, either clockwise or
counterclockwise.
The dual point of view in Mather’s theory, and the notion of the
Aubry set, are intimately related to the so-called weak KAM the-
ory, in which stationary solutions of Hamilton–Jacobi equations play
a central role. The next theorem partly explains the link between the
two theories.
In the sequel, I shall write just γ for γ^{(T+1)}. Of course the estimate
above remains unchanged upon replacement of T by T + 1, so
(1/T) ∫_{−(T+1)}^{0} L(γ(s), γ̇(s)) ds = c + O(1/T).
Then define
µ_T := (1/T) ∫_{−(T+1)}^{−1} δ_{γ(s)} ds;   ν_T := (1/T) ∫_{−T}^{0} δ_{γ(s)} ds.
Then from (8.39) and the lower semicontinuity of the optimal transport
cost,
C^{0,1}(µ, µ) ≤ lim inf_{T→∞} C^{0,1}(µ_T, ν_T) ≤ c.
one intermediate time is clearly not enough to control the distance be-
tween the positions along the whole geodesic curves. One cannot hope
either to control the distance between the velocities of these curves,
since the velocities might not be well-defined. On the other hand, we
may take advantage of the property of preservation of speed along the
minimizing curves, since this remains true even in a nonsmooth con-
text. The next theorem exploits this to show that if geodesics in a
displacement interpolation pass near each other at some intermediate
time, then their lengths have to be approximately equal.
Theorem 8.22 (A rough nonsmooth shortening lemma). Let
(X , d) be a metric space, and let γ1 , γ2 be two constant-speed, minimiz-
ing geodesics such that
d(γ_1(0), γ_1(1))^2 + d(γ_2(0), γ_2(1))^2 ≤ d(γ_1(0), γ_2(1))^2 + d(γ_2(0), γ_1(1))^2.
Let L_1 and L_2 stand for the respective lengths of γ_1 and γ_2, and let D
be a bound on the diameter of (γ_1 ∪ γ_2)([0, 1]). Then
|L_1 − L_2| ≤ ( C √D / √(t_0 (1 − t_0)) ) √( d(γ_1(t_0), γ_2(t_0)) ),
for some numeric constant C.
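Before the proof, here is a quick numerical illustration (a hedged sketch added for this text, not part of the original argument): straight segments in the plane are constant-speed minimizing geodesics for the Euclidean distance, so one can sample random pairs of segments, keep those satisfying the quadratic condition above, and observe that |L_1 − L_2| √(t_0(1−t_0)) / √(D d(γ_1(t_0), γ_2(t_0))) indeed stays below a small constant (√3, according to the computation in the proof below).

    import numpy as np

    # Hedged numerical sketch: straight segments in R^2 as minimizing geodesics.
    rng = np.random.default_rng(1)
    t0, worst = 0.3, 0.0
    for _ in range(20000):
        x1, y1, x2, y2 = rng.uniform(-1, 1, size=(4, 2))
        # quadratic "no-crossing" condition of Theorem 8.22
        if (np.sum((y1 - x1)**2) + np.sum((y2 - x2)**2)
                > np.sum((y2 - x1)**2) + np.sum((y1 - x2)**2)):
            continue
        L1, L2 = np.linalg.norm(y1 - x1), np.linalg.norm(y2 - x2)
        X1 = (1 - t0) * x1 + t0 * y1          # gamma_1(t0)
        X2 = (1 - t0) * x2 + t0 * y2          # gamma_2(t0)
        d12 = np.linalg.norm(X1 - X2)
        pts = [x1, y1, x2, y2]
        D = max(np.linalg.norm(p - q) for p in pts for q in pts)  # diameter bound
        if d12 > 1e-9 and D > 1e-9:
            worst = max(worst, abs(L1 - L2) * np.sqrt(t0 * (1 - t0))
                        / np.sqrt(D * d12))
    print(worst)   # observed values stay below sqrt(3) ~ 1.73, as the proof predicts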
Proof of Theorem 8.22. Write x_i = γ_i(0), y_i = γ_i(1), d_{12} = d(x_1, y_2), d_{21} = d(x_2, y_1), X_1 =
γ_1(t_0), X_2 = γ_2(t_0). From the minimizing assumption, the triangle
inequality and explicit calculations,
0 ≤ d_{12}^2 + d_{21}^2 − L_1^2 − L_2^2
   ≤ ( d(x_1, X_1) + d(X_1, X_2) + d(X_2, y_2) )^2 + ( d(x_2, X_2) + d(X_2, X_1) + d(X_1, y_1) )^2 − L_1^2 − L_2^2
   = ( t_0 L_1 + d(X_1, X_2) + (1 − t_0) L_2 )^2 + ( t_0 L_2 + d(X_1, X_2) + (1 − t_0) L_1 )^2 − L_1^2 − L_2^2
   = 2 d(X_1, X_2) ( L_1 + L_2 + d(X_1, X_2) ) − 2 t_0 (1 − t_0) (L_1 − L_2)^2.
As a consequence,
|L_1 − L_2| ≤ √( (L_1 + L_2 + d(X_1, X_2)) / (t_0 (1 − t_0)) ) √( d(X_1, X_2) ),
and the conclusion follows since L_1, L_2 and d(X_1, X_2) are all bounded by D. ⊓⊔
Further, let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2 .
Proof of Theorem 8.23. First note that it suffices to work in the affine
space generated by x1 , y1 , x2 , y2 , which is of dimension at most 3; hence
all the constants will be independent of the dimension n. For notational
simplicity, I shall assume that t0 = 1/2, which has no important in-
fluence on the computations. Let X1 := γ1 (1/2), X2 := γ2 (1/2). It is
sufficient to show that
where δx and δy are vectors of small norm (recall that x1 − y1 has unit
norm). Of course
X_1 − X_2 = (δx + δy)/2,   x_2 − x_1 = δx,   y_2 − y_1 = δy;
so to conclude the proof it is sufficient to show that
|(δx + δy)/2| ≥ K (|δx| + |δy|),   (8.43)
as soon as |δx| and |δy| are small enough, and (8.40) is satisfied.
By using the formulas |a + b|^2 = |a|^2 + 2⟨a, b⟩ + |b|^2 and
(1 + ε)^{(1+α)/2} = 1 + ((1+α)/2) ε − ((1+α)(1−α)/8) ε^2 + O(ε^3),
one easily deduces from (8.40) that
|δx − δy|^2 − |δx|^2 − |δy|^2 ≤ (1 − α) [ ⟨δx − δy, e⟩^2 − ⟨δx, e⟩^2 − ⟨δy, e⟩^2 ] + O(|δx|^3 + |δy|^3).
(which is indeed a scalar product because α > 0), and denote the
associated norm by ‖v‖. Then the above conclusion can be summarized
into
⟨⟨δx, δy⟩⟩ ≥ O(‖δx‖^3 + ‖δy‖^3).   (8.44)
It follows that
‖(δx + δy)/2‖^2 = (1/4) ( ‖δx‖^2 + ‖δy‖^2 + 2 ⟨⟨δx, δy⟩⟩ )
   ≥ (1/4) (‖δx‖^2 + ‖δy‖^2) + O(‖δx‖^3 + ‖δy‖^3).
So inequality (8.43) is indeed satisfied if |δx| + |δy| is small enough. ⊓⊔
Exercise 8.25. Extend this result to the cost function d(x, y)1+α on a
Riemannian manifold, when γ and γ̃ stay within a compact set.
Hints: This tricky exercise is only for a reader who feels very comfort-
able. One can use a reasoning similar to that in Step 2 of the above
proof, introducing a sequence (γ^{(k)}, γ̃^{(k)}) which is asymptotically the
"worst possible", and converges, up to extraction of a subsequence,
to (γ^{(∞)}, γ̃^{(∞)}). There are three cases: (i) γ^{(∞)} and γ̃^{(∞)} are distinct
geodesic curves which cross; this is ruled out by Theorem 8.1. (ii) γ^{(k)}
and γ̃^{(k)} converge to a point; then everything becomes local and one
can use the result in R^n, Theorem 8.23. (iii) γ^{(k)} and γ̃^{(k)} converge to
a nontrivial geodesic γ^{(∞)}; then these curves can be approximated by
infinitesimal perturbations of γ^{(∞)}, which are described by differential
equations (Jacobi equations).
Remark 8.26. Of course it would be much better to avoid the com-
pactness arguments and derive the bounds directly, but I don’t see how
to proceed.
Bibliographical notes
Monge’s observation about the impossibility of crossing appears in his
seminal 1781 memoir [636]. The argument is likely to apply whenever
the cost function satisfies a triangle inequality, which is always the
case in what Bernard and Buffoni have called the Monge–Mañé prob-
lem [104]. I don’t know of a quantitative version of it.
A very simple argument, due to Brenier, shows how to construct,
without any calculations, configurations of points that lead to line-
crossing for a quadratic cost [814, Chapter 10, Problem 1].
There are several possible computations to obtain inequalities of the
style of (8.3). The use of the identity (8.2) was inspired by a result by
Figalli, which is described below.
It is an old observation in Riemannian geometry that two minimiz-
ing curves cannot intersect twice and remain minimizing; the way to
prove this is the shortcut method already known to Monge. This simple
principle has important geometrical consequences, see for instance the
works by Morse [637, Theorem 3] and Hedlund [467, p. 722]. (These
references, as well as a large part of the historical remarks below, were
pointed out to me by Mather.)
(So in this case there is no need for an upper bound on the distances
between x1 , x2 , y1 , y2 .) The general case where K might be negative
seems to be quite more tricky. As a consequence of (8.45), Theorem 8.7
holds when the cost is the squared distance on an Alexandrov space
with nonnegative curvature; but this can also be proven by the method
of Figalli and Juillet [366].
Theorem 8.22 takes inspiration from the no-crossing proof in [246,
Lemma 5.3]. I don’t know whether the Hölder-1/2 regularity is optimal,
and I don’t know either whether it is possible/useful to obtain similar
estimates for more general cost functions.
9
Solution of the Monge problem I: Global approach
In the present chapter and the next one I shall investigate the solv-
ability of the Monge problem for a Lagrangian cost function. Recall
from Theorem 5.30 that it is sufficient to identify conditions under
which the initial measure µ does not see the set of points where the
c-subdifferential of a c-convex function ψ is multivalued.
Consider a Riemannian manifold M, and a cost function c(x, y) on
M × M , deriving from a Lagrangian function L(x, v, t) on TM × [0, 1]
satisfying the classical conditions of Definition 7.6. Let µ0 and µ1 be
two given probability measures, and let (µt )0≤t≤1 be a displacement
interpolation, written as the law of a random minimizing curve γ at
time t.
If the Lagrangian satisfies adequate regularity and convexity prop-
erties, Theorem 8.5 shows that the coupling (γ(s), γ(t)) is always de-
terministic as soon as 0 < s < 1, however singular µ0 and µ1 might
be. The question whether one can construct a deterministic coupling
of (µ0 , µ1 ) is much more subtle, and cannot be answered without reg-
ularity assumptions on µ0 . In this chapter, a simple approach to this
problem will be attempted, but only with partial success, since even-
tually it will work out only for a particular class of cost functions,
including at least the quadratic cost in Euclidean space (arguably the
most important case).
Our main assumption on the cost function c will be:
Assumption (C): For any c-convex function ψ and any x ∈ M , the
c-subdifferential ∂c ψ(x) is pathwise connected.
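For orientation (an added remark, not taken from the original text): for the quadratic cost c(x, y) = |x − y|^2/2 on R^n × R^n, a c-convex function ψ is characterized by the convexity of x ↦ ψ(x) + |x|^2/2, and ∂_c ψ(x) coincides with the subdifferential of that convex function at x, which is a convex, hence pathwise connected, subset of R^n. So Assumption (C) holds in this model case, in agreement with the remark below that it can only fail for the quadratic cost when the target domain is nonconvex.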
Proof of Theorem 9.2. Let Z be the set of points x for which ψ(x) <
+∞ but ∂c ψ(x) is not single-valued; the problem is to show that Z is
of dimension at most (n − 1)/β.
Let x ∈ M with ψ(x) < +∞, and let y ∈ ∂c ψ(x). Introduce an
action-minimizing curve γ = γ x,y joining x = γ(0) to y = γ(1). I claim
that the map
F : γ(1/2) ↦ x
Thus
c(x, y) + c(x′ , y ′ ) ≤ c(x, y ′ ) + c(x′ , y).
Then by (9.1),
d(x, x′) ≤ C d( γ(1/2), γ′(1/2) )^β.
This implies that m = γ(1/2) determines x = F (m) unambiguously,
and even that F is Hölder-β. (Obviously, this is the same reasoning as
in the proof of Theorem 8.5.)
Now, cover M by a countable number of open sets in which M is
diffeomorphic to a subset U of Rn , via some diffeomorphism ϕU . In each
of these open sets U , consider the union HU of all hyperplanes passing
through a point of rational coordinates, orthogonal to a unit vector
with rational coordinates. Transport this set back to M thanks to the
local diffeomorphism, and take the union over all the sets U . This gives
a set D ⊂ M with the following properties: (i) It is of dimension n − 1;
(ii) It meets every nontrivial continuous curve drawn on M (to see this,
write the curve locally in terms of ϕU and note that, by continuity, at
least one of the coordinates of the curve has to become rational at some
time).
Next, let x ∈ Z, and let y0 , y1 be two distinct elements of ∂c ψ(x).
By assumption there is a continuous curve (yt )0≤t≤1 lying entirely in
∂c ψ(x). For each t, introduce an action-minimizing curve (γt (s))0≤s≤1
between x and yt (s here is the time parameter along the curve). De-
fine mt := γt (1/2). This is a continuous path, nontrivial (otherwise
γ0 (1/2) = γ1 (1/2), but two minimizing trajectories starting from x can-
not cross in their middle, or they have to coincide at all times by (9.1)).
So there has to be some t such that yt ∈ D. Moreover, the map F con-
structed above satisfies F (yt ) = x for all t. It follows that x ∈ F (D).
(See Figure 9.1.)
Fig. 9.1. Scheme of proof for Theorem 9.2. Here there is a curve (yt )0≤t≤1 lying
entirely in ∂c ψ(x), and there is a nontrivial path (mt )0≤t≤1 obtained by taking the
midpoint between x and yt . This path has to meet D; but its image by γ(1/2) 7→ γ(0)
is {x}, so x ∈ F (D).
Remark 9.5. The assumption that µ does not give mass to sets of di-
mension at most n − 1 is optimal for the existence of a Monge coupling,
as can be seen by choosing µ = H1 |[0,1]×{0} (the one-dimensional Haus-
dorff measure concentrated on the segment [0, 1] × {0} in R2 ), and then
ν = (1/2) H1 |[0,1]×{−1}∪[0,1]×{+1} . (See Figure 9.2.) It is also optimal
for the uniqueness, as can be seen by taking µ = (1/2) H^1|_{{0}×[−1,1]} and
ν = (1/2) H^1|_{[−1,1]×{0}}. In fact, whenever µ, ν ∈ P_2(R^n) are supported
on orthogonal subspaces of Rn , then any transference plan is optimal!
To see this, define a function ψ by ψ = 0 on Conv(Spt µ), ψ = +∞ else-
where; then ψ is convex lower semicontinuous, ψ ∗ = 0 on Conv(Spt ν),
so ∂ψ contains Spt µ × Spt ν, and any transference plan is supported
in ∂ψ.
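A direct computation complementing the duality argument just given (added for this text): if Spt µ and Spt ν are contained in two orthogonal subspaces, then ⟨x, y⟩ = 0 for µ-almost all x and ν-almost all y, hence |x − y|^2 = |x|^2 + |y|^2, and for any transference plan π,

∫ |x − y|^2 dπ(x, y) = ∫ |x|^2 dµ(x) + ∫ |y|^2 dν(y),

which does not depend on π at all; so every transference plan has the same, hence optimal, cost.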
Fig. 9.2. The source measure is drawn as a thick line, the target measure as a thin
line; the cost function is quadratic. On the left, there is a unique optimal coupling
but no optimal Monge coupling. On the right, there are many optimal couplings, in
fact any transference plan is optimal.
In the next chapter, we shall see that Theorem 9.4 can be improved
in at least two ways: Equation (9.4) can be rewritten y = ∇ψ(x);
and the assumption (9.3) can be replaced by the weaker assumption
C(µ, ν) < +∞ (finite optimal transport cost).
Now if one wants to apply Theorem 9.2 to nonquadratic cost func-
tions, the question arises of how to identify those cost functions c(x, y)
which satisfy Assumption (C). Obviously, there might be some geo-
metric obstructions imposed by the domains X and Y: For instance,
if Y is a nonconvex subset of Rn , then Assumption (C) is violated
even by the quadratic cost function. But even in the whole of, say, Rn ,
Assumption (C) is not a generic condition, and so far there is only a
short list of known examples. These include the cost functions c(x, y) =
√(1 + |x − y|^2) on R^n × R^n, or more generally c(x, y) = (1 + |x − y|^2)^{p/2}
(1 < p < 2) on B_R(0) × B_R(0) ⊂ R^n × R^n, where R = 1/√(p − 1); and
c(x, y) = d(x, y)^2 on S^{n−1} × S^{n−1}, where d is the geodesic distance on
the sphere. For such cost functions, the Monge problem can be solved
by combining Theorems 8.1, 9.2 and 5.30, exactly as in the proof of
Theorem 9.4.
This approach suffers, however, from two main drawbacks: First it
seems to be limited to a small number of examples; secondly, the verifi-
cation of Assumption (C) is subtle. In the next chapter we shall inves-
tigate a more pedestrian approach, which will apply in much greater
generality.
I shall end this chapter with a simple example of a cost function
which does not satisfy Assumption (C).
where β is a smooth even function, β(0) = 0, β ′ (t) > 0 for |t| > 0.
Further, let r > 0 and X± = (±r, 0). (The fact that the segments
[X− , X+ ] and [y−1 , y1 ] are orthogonal is not accidental.) Then ηt (0) =
β(t) is an increasing function of |t|; while η_t(X_±) = −(r^2 + t^2)^{p/2} + |t|^p +
β(t) is a decreasing function of |t| if 0 < β′(t) < p t [ (r^2 + t^2)^{p/2−1} − t^{p−2} ],
which we shall assume. Now define ψ(x) = sup {ηt (x); t ∈ [−1, 1]}. By
construction this is a c-convex function, and ψ(0) = β(1) > 0, while
ψ(X_±) = η_0(X_±) = −r^p.
We shall check that ∂c ψ(0) is not connected. First, ∂c ψ(0) is not
empty: this can be shown by elementary means or as a consequence
of Example 10.20 and Theorem 10.24 in the next chapter. Secondly,
∂_c ψ(0) ⊂ {(y_1, y_2) ∈ R^2; y_1 = 0}: This comes from the fact that all
functions η_t are decreasing as a function of |x_1|. (So ψ is also nonin-
creasing in |x_1|, and if (y_1, y_2) ∈ ∂_c ψ(0, 0), then (y_1^2 + y_2^2)^{p/2} + ψ(0, 0) ≤
|y_2|^p + ψ(y_1, 0) ≤ |y_2|^p + ψ(0, 0), which imposes y_1 = 0.) Obviously,
∂_c ψ(0) is a symmetric subset of the line {y_1 = 0}. But if 0 ∈ ∂_c ψ(0),
then 0 < ψ(0) ≤ |X_±|^p + ψ(X_±) = 0, which is a contradiction. So
∂_c ψ(0) does not contain 0, therefore it is not connected.
(What is happening is the following. When replacing η_0 by ψ, we
have raised the origin, but we have kept the points (X_±, η_0(X_±))
in place, which forbids us to touch the graph of ψ from below at the
origin with a translation of η_0.) ⊓⊔
Bibliographical notes
the next chapter (see Theorem 10.42, Corollary 10.44 and Particular
Case 10.45).
Ma, Trudinger and X.-J. Wang [585, Section 7.5] were the first to
seriously study Assumption (C); they had the intuition that it was
connected to a certain fourth-order differential condition on the cost
function which plays a key role in the smoothness of optimal transport.
Later Trudinger and Wang [793], and Loeper [570] showed that the
above-mentioned differential condition is essentially, under adequate
geometric and regularity assumptions, equivalent to Assumption (C).
These issues will be discussed in more detail in Chapter 12. (See in
particular Proposition 12.15(iii).)
The counterexample in Proposition 9.6 is extracted from [585]. The
fact that c(x, y) = (1 + |x − y|^2)^{p/2} satisfies Assumption (C) on the ball
of radius 1/√(p − 1) is also taken from [585, 793]. It is Loeper [571] who
discovered that the squared geodesic distance on S n−1 satisfies Assump-
tion (C); then a simplified argument was devised by von Nessi [824].
As mentioned in the end of the chapter, by combining Loeper’s
result with Theorems 8.1, 9.2 and 5.30, one can mimic the proof of
Theorem 9.4 and get the unique solvability of the Monge problem for
the quadratic distance on the sphere, as soon as µ does not see sets of
dimension at most n − 1. Such a theorem was first obtained for general
compact Riemannian manifolds by McCann [616], with a completely
different argument.
Other examples of cost functions satisfying Assumption (C) will
be listed in Chapter 12 (for instance |x − y|^{−2}, or −|x − y|^p/p for
−2 ≤ p ≤ 1, or |x − y|^2 + |f(x) − g(y)|^2, where f and g are convex
and 1-Lipschitz). But these other examples do not come from a smooth
convex Lagrangian, so it is not clear whether they satisfy Assumption
(ii) in Theorem 9.2.
In the particular case when ν has finite support, one can prove
the unique solvability of the Monge problem under much more general
assumptions, namely that the cost function is continuous, and µ does
not charge sets of the form {x; c(x, a) − c(x, b) = k} (where a, b, k
are arbitrary), see [261]. This condition was recently used again by
Gozlan [429].
10
Solution of the Monge problem II: Local approach
A heuristic argument
Let ψ be a c-convex function on a Riemannian manifold M , and φ = ψ c .
Assume that y ∈ ∂c ψ(x); then, from the definition of c-subdifferential,
one has, for all x̃ ∈ M,
φ(y) − ψ(x) = c(x, y),   φ(y) − ψ(x̃) ≤ c(x̃, y).   (10.1)
It follows that
ψ(x) − ψ(x̃) ≤ c(x̃, y) − c(x, y).   (10.2)
Now the idea is to see what happens when x̃ → x, along a given
direction. Let w be a tangent vector at x, and consider a path ε ↦ x̃(ε),
defined for ε ∈ [0, ε_0), with initial position x and initial velocity w.
(For instance, x̃(ε) = exp_x(εw); or in R^n, just consider x̃(ε) = x + εw.)
Assume that ψ and c( · , y) are differentiable at x, divide both sides
of (10.2) by ε > 0 and pass to the limit:
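−⟨∇ψ(x), w⟩ ≤ ⟨∇_x c(x, y), w⟩.
Since w is an arbitrary tangent vector (replace w by −w), this forces ∇ψ(x) + ∇_x c(x, y) = 0, the identity that will reappear later in the chapter (compare (10.25) and (10.28)).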
Fig. 10.1. The distance function d(·, y) on S 1 , and its square. The upper-pointing
singularity is typical. The square distance is not differentiable when |x − y| = π; still
it is superdifferentiable, in a sense that is explained later.
For instance, if N and S respectively stand for the north and south
poles on S 2 , then d(x, S) fails to be differentiable at x = N .
Of course, for any x this happens only for a negligible set of y’s; and
the cost function is differentiable everywhere else, so we might think
that this is not a serious problem. But who can tell us that the optimal
transport will not try to take each x (or a lot of them) to a place y
such that c(x, y) is not differentiable?
To solve these problems, it will be useful to use some concepts from
nonsmooth analysis: subdifferentiability, superdifferentiability, approxi-
mate differentiability. The short answers to the above problems are that
(a) under adequate assumptions on the cost function, ψ will be differ-
entiable out of a very small set (of codimension at least 1); (b) c will
be superdifferentiable because it derives from a Lagrangian, and subd-
ifferentiable wherever ψ itself is differentiable; (c) where it exists, ∇x c
will be injective because c derives from a strictly convex Lagrangian.
The next three sections will be devoted to some basic reminders
about differentiability and regularity in a nonsmooth context. For the
convenience of the non-expert reader, I shall provide complete proofs
of the most basic results about these issues. Conversely, readers who
feel very comfortable with these notions can skip these sections.
∇̃f(x) = ∇f̃(x).
Proof that ∇̃f(x) is well-defined. Since this concept is local and invari-
ant by diffeomorphism, it is sufficient to treat the case when U is a
subset of R^n.
Let f̃_1 and f̃_2 be two measurable functions on U which are both
differentiable at x and coincide with f on a set of density 1. The problem
is to show that ∇f̃_1(x) = ∇f̃_2(x).
For each r > 0, let Z_r be the set of points in B_r(x) where either
f ≠ f̃_1 or f ≠ f̃_2. It is clear that vol[Z_r] = o(vol[B_r(x)]).
Since f̃_1 and f̃_2 are continuous at x, one can write
f̃_1(x) = lim_{r→0} (1/vol[B_r(x)]) ∫_{B_r(x)} f̃_1(z) dz
      = lim_{r→0} (1/vol[B_r(x) \ Z_r]) ∫_{B_r(x) \ Z_r} f̃_1(z) dz
      = lim_{r→0} (1/vol[B_r(x) \ Z_r]) ∫_{B_r(x) \ Z_r} f̃_2(z) dz
      = lim_{r→0} (1/vol[B_r(x)]) ∫_{B_r(x)} f̃_2(z) dz = f̃_2(x).
z ∉ Z_r =⇒ ⟨w, z − x⟩ = o(r).   (10.5)
Since the unit vector (z − x)/|z − x| can take arbitrary fixed values in
the unit sphere as z → x, it follows that p = q. Then
f(z) − f(x) = ⟨p, z − x⟩ + o(|z − x|),
Ok of Rn . Then, since all the concepts involved are local and invariant
under diffeomorphism, we may work in Ok . So in the sequel, I shall
pretend that U is a subset of Rn .
Let us start with the proof of (i). Let f be continuous on U , and
let V be an open subset of U ; the problem is to show that f admits
at least one point of subdifferentiability in V . So let x0 ∈ V , and let
r > 0 be so small that B(x0 , r) ⊂ V . Let B = B(x0 , r), let ε > 0
and let g be defined on B by g(x) := f (x) + |x − x0 |2 /ε. Since f is
continuous, g attains its minimum on B. But g on ∂B is bounded below
by r 2 /ε − M , where M is an upper bound for |f | on B. If ε < r 2 /(2M ),
then g(x0 ) = f (x0 ) ≤ M < r 2 /ε − M ≤ inf ∂B g; so g cannot achieve
its minimum on ∂B, and has to achieve it at some point x1 ∈ B. Then
g is subdifferentiable at x1 , and therefore f also. This establishes (i).
The other two statements are more tricky. Let f : U → R be a
Lipschitz function. For v ∈ Rn and x ∈ U , define
D_v f(x) := lim_{t→0} [ f(x + tv) − f(x) ] / t,   (10.6)
provided that this limit exists. The problem is to show that for almost
any x, there is a vector p(x) such that Dv f (x) = hp(x), vi and the limit
in (10.6) is uniform in, say, v ∈ S n−1 . Since the functions [f (x + tv) −
f (x)]/t are uniformly Lipschitz in v, it is enough to prove the pointwise
convergence (that is, the mere existence of Dv f (x)), and then the limit
will automatically be uniform by Ascoli’s theorem. So the goal is to
show that for almost any x, the limit Dv f (x) exists for any v, and is
linear in v.
It is easily checked that:
(a) Dv f (x) is homogeneous in v: Dtv f (x) = t Dv f (x);
(b) D_v f(x) is a Lipschitz function of v on its domain: in fact,
|D_v f(x) − D_w f(x)| ≤ L |v − w|, where L = ‖f‖_Lip;
(c) If D_w f(x) → ℓ as w → v, then D_v f(x) = ℓ; this comes from the
estimate
sup_t | [f(x + tv) − f(x)]/t − [f(x + tv_k) − f(x)]/t | ≤ ‖f‖_Lip |v − v_k|.
ζ ∗ [Dv+w f − Dv f − Dw f ] = 0.
for almost all x ∈ Rn \ (Av ∩ Aw ∩ Av+w ), that is, for almost all x ∈ Rn .
Now it is easy to conclude. Let B_{v,w} be the set of all x ∈ R^n such
that D_v f(x), D_w f(x) or D_{v+w} f(x) is not well-defined, or (10.7) does
not hold true. Let (v_k)_{k∈N} be a dense sequence in R^n, and let B :=
∪_{j,k∈N} B_{v_j,v_k}. Then B is still Lebesgue-negligible, and for each x ∉ B
we have
D_{v_j+v_k} f(x) = D_{v_j} f(x) + D_{v_k} f(x).   (10.8)
Since D_v f(x) is a Lipschitz continuous function of v, it can be extended
uniquely into a Lipschitz continuous function, defined for all x ∉ B
Since ψ(x0 ) is fixed and ψ(y) is bounded above, it follows that ψ(x) is
bounded below for x ∈ B.
Step 4: ψ is locally Lipschitz. Let x0 ∈ U , let V be a neighborhood of
x0 on which |ψ| ≤ M < +∞, and let r > 0 be such that Br (x0 ) ⊂ V .
For any y, y ′ ∈ Br/2 (x0 ), we can write y ′ = (1 − t) y + t z, where
t = |y − y ′ |/r, so z = (y − y ′ )/t + y ∈ Br (x0 ) and |y − z| = r. Then
ψ(y ′ ) ≤ (1 − t) ψ(y) + t ψ(z) + 2 t(1 − t) ω(|y − z|), so
where Σ (ℓ) is the set of points x such that ∇− ψ(x) contains a segment
[p, p′ ] of length 1/ℓ and |p| ≤ ℓ. To conclude, it is sufficient to show that
each Σ (ℓ) is countably (n − 1)-rectifiable, and for that it is sufficient to
show that for each x ∈ Σ (ℓ) the dimension of the tangent cone Tx Σ (ℓ)
is at most n − 1 (Theorem 10.48(i) in the First Appendix).
So let x ∈ Σ^{(ℓ)}, and let q ∈ T_x Σ^{(ℓ)}, q ≠ 0. By assumption, there are
a sequence x_k ∈ Σ^{(ℓ)} and positive numbers t_k → 0 such that
(x_k − x)/t_k −→ q.
In particular |x − x_k|/t_k converges to the finite, nonzero limit |q|.
For any k ∈ N, there is a segment [p_k, p′_k], of length ℓ^{−1}, that is
contained in ∇^−ψ(x_k); and |p_k| ≤ ℓ, |p′_k| ≤ ℓ + ℓ^{−1}. By compactness,
up to extraction of a subsequence one has x_k → x, p_k → p, p′_k → p′,
|p − p′| = ℓ^{−1}. By continuity of ∇^−ψ, both p and p′ belong to ∇^−ψ(x).
Then the two inequalities
ψ(x) ≥ ψ(x_k) + ⟨p′_k, x − x_k⟩ − ω(|x − x_k|),
ψ(x_k) ≥ ψ(x) + ⟨p, x_k − x⟩ − ω(|x − x_k|)
combine to yield
⟨p − p′_k, x − x_k⟩ ≥ −2 ω(|x − x_k|).
So
⟨ p − p′_k, (x − x_k)/t_k ⟩ ≥ −2 [ ω(|x − x_k|)/|x − x_k| ] [ |x − x_k|/t_k ].
Passing to the limit, we find
⟨p − p′, q⟩ ≥ 0.
Exchanging the roles of the primed and unprimed vectors gives the reverse inequality, so in fact
⟨p − p′, q⟩ = 0.
(x_k − x)/t_k −→ p;   (x′_k − x)/t′_k −→ p′   as k → ∞.
Then m(x_k, x′_k) ∈ D and m(x_k, x′_k) = (x_k + x′_k)/2 + o(|x_k − x′_k|) =
x + t_k (p_k + p′_k)/2 + o(t_k), so
(p + p′)/2 = lim_{k→∞} [ m(x_k, x′_k) − x ] / t_k ∈ T_x D.
Thus Tx D is a convex cone. This leaves two possibilities: either Tx D is
included in a half-space, or it is the whole of Rn .
Assume that Tx D = Rn . If C is a small (Euclidean) cube of side 2r,
centered at x0 , for r small enough any point in a neighborhood of x0 can
be written as a combination of barycenters of the vertices x1 , . . . , xN of
C, and all these barycenters will lie within a ball of radius 2r centered
Since x0 is fixed and ψ(y ′ ) is bounded above, this shows that ψ(y) is
bounded below as y varies in B.
Next, let us show that ψ is locally Lipschitz. Let V be a neigh-
borhood of x0 in which |ψ| is bounded by M . If r > 0 is small
enough, then for any y, y ′ ∈ Br (x0 ) there is z = z(y, y ′ ) ∈ V such
that y ′ = [y, z]λ , λ = d(y, y ′ )/4r ∈ [0, 1/2]. (Indeed, choose r so small
that all geodesics in B5r (x0 ) are minimizing, and B5r (x0 ) ⊂ V . Given
y and y ′ , take the geodesic going from y to y ′ , say expy (tv), 0 ≤ t ≤ 1;
extend it up to time t(λ) = 1/(1 − λ), write z = expy (t(λ) v). Then
d(x0 , z) ≤ d(x0 , y) + t(λ) d(y, y ′ ) ≤ d(x0 , y) + 2 d(y, y ′ ) < 5r.) So
ψ(y ′ ) ≤ (1 − λ) ψ(y) + λ ψ(z) + λ(1 − λ) ω(d(y, z)),
whence
Indeed, let γ(t) = expx (tw), y = expx w, then for any t ∈ [0, 1],
so
[ψ(γ(t)) − ψ(x)] / (t|w|) ≤ [ψ(y) − ψ(x)]/|w| + (1 − t) ω(|w|)/|w|.
On the other hand, by subdifferentiability,
[ψ(γ(t)) − ψ(x)] / (t|w|) ≥ ⟨p, tw⟩/(t|w|) − o(t|w|)/(t|w|) = ⟨p, w/|w|⟩ − o(t|w|)/(t|w|).
The combination of the two previous inequalities implies
⟨p, w/|w|⟩ − o(t|w|)/(t|w|) ≤ [ψ(y) − ψ(x)]/|w| + (1 − t) ω(|w|)/|w|.
The limit t → 0 gives (10.14). At the same time, it shows that for
|w| ≤ r,
⟨p, w/|w|⟩ ≤ ‖ψ‖_{Lip(B_{2r}(x_0))} + ω(r)/r.
By choosing w = rp, we conclude that |p| is bounded above, indepen-
dently of x. So ∇− ψ is locally bounded in the sense that there is a
uniform bound on the norms of all elements of ∇− ψ(x) when x varies
in a compact subset of the domain of ψ.
(H∞)1 For any x and for any measurable set S which does not “lie
on one side of x” (in the sense that Tx S is not contained in a
half-space) there is a finite collection of elements z1 , . . . , zk ∈
S, and a small open ball B containing x, such that for any
y outside of a compact set,
Remark 10.17. The requirements in (ii) and (iii) are fulfilled if the
Lagrangian L is time-independent, C^2, and strictly convex and superlinear as a
function of v (recall Example 7.5). But they also hold true for other
interesting cases such as L(x, v, t) = |v|^{1+α}, 0 < α < 1.
Remark 10.18. Part (i) of Proposition 10.15 means that the behavior
of the (squared) distance function is typical: if one plots c(x, y) as a
function of x, for fixed y, one will always see upward-pointing crests as
in Figure 10.1, never downward-pointing ones.
or equivalently, with h = x − y,
c(h) − c( h (1 − ε/|h|) ) −−→ +∞   as |h| → ∞.
c(h) − c(h_ε) ≥ ∇c(h_ε) · (ε h/|h|) = ε ∇c(h_ε) · (h_ε/|h_ε|) −−→ +∞   as |h| → ∞,
as desired.
We shall now return to optimal transport, and arrive at the core of the
analysis of the Monge problem: the study of the regularity of c-convex
∂_c ψ(x) ≠ ∅ =⇒ ∇^−ψ(x) ≠ ∅.
I shall use the notion of tangent cone defined later in Definition 10.46,
and show that if x ∈ S is such that Tx S is not included in a half-space,
then ψ is bounded on a small ball around x. It will follow that x is in
fact in the interior of Ω. So for each x ∈ S \ Ω, Tx S will be included
in a half-space, and by Theorem 10.48(ii) S \ Ω will be of dimension at
most n − 1. Moreover, this will show that ψ is locally bounded in Ω.
So let x be such that ψ(x) < +∞, and Tx S is not included in a
half-space. By assumption, there are points z1 , . . . , zk in S, a small ball
B around x, and a compact set K ⊂ Y such that for any y ∈ Y \ K,
So if y ∉ K, there is a z such that ψ(z) + c(z, y) ≤ c(x, y) − (M + 1) ≤
ψ(x) + c(x, y) − 1, and
φ(y) − c(x, y) ≤ inf_{z∈B} [ ψ(z) + c(z, y) ] − c(x, y)
   ≤ ψ(x) − 1 = sup_{y′∈Y} [ φ(y′) − c(x, y′) ] − 1.
Then the supremum of φ(y) − c(x, y) over all Y is the same as the
supremum over only K. But this is a maximization problem for an
upper semicontinuous function on a compact set, so it admits a solution,
which belongs to ∂c ψ(x).
The same reasoning can be made with x replaced by w in a small
neighborhood B of x, then the conclusion is that ∂c ψ(w) is nonempty
and contained in the compact set K, uniformly for z ∈ B. If K ′ ⊂ Ω
is a compact set, we can cover it by a finite number of small open
balls Bj such that ∂c ψ(Bj ) is contained in a compact set Kj , so that
∂c ψ(K ′ ) ⊂ ∪Kj . Since on the other hand ∂c ψ(K ′ ) is closed by the
continuity of c, it follows that ∂c ψ(K ′ ) is compact. This concludes the
proof of Theorem 10.24. ⊓⊔
leads to
ψ(γ_t) = sup_y [ φ(y) − c(γ_t, y) ]
   ≤ sup_y [ φ(y) − (1 − t) c(γ_0, y) − t c(γ_1, y) + t(1 − t) ω(d(γ_0, γ_1)) ]
   = sup_y [ (1 − t)(φ(y) − c(γ_0, y)) + t (φ(y) − c(γ_1, y)) ] + t(1 − t) ω(d(γ_0, γ_1))
   ≤ (1 − t) sup_y [ φ(y) − c(γ_0, y) ] + t sup_y [ φ(y) − c(γ_1, y) ] + t(1 − t) ω(d(γ_0, γ_1))
   = (1 − t) ψ(γ_0) + t ψ(γ_1) + t(1 − t) ω(d(γ_0, γ_1)).
Remark 10.27. Theorems 10.24 to 10.26, and (in the Lagrangian case)
Proposition 10.15 provide a good picture of differentiability points of
This section and the next one deal with extensions of Theorem 10.28.
Here we shall learn how to cover situations in which no control at
infinity is assumed, and in particular Assumption (iii) of Theorem 10.28
might not be satisfied. The short answer is that it is sufficient to replace
the gradient in (10.20) by an approximate gradient. (Actually a little
bit more will be lost, see Remarks 10.39 and 10.40 below.)
determines the unique optimal coupling between µℓ and νℓ , for the cost
cℓ . (Note that ∇x cℓ coincides with ∇x c when x is in the interior of Bℓ ,
and µℓ [∂Bℓ ] = 0, so equation (10.24) does hold true πℓ -almost surely.)
Now we can define our Monge coupling. For each ℓ ∈ N, and each
x ∈ S̃_ℓ \ Z_ℓ, ψ_ℓ coincides with ψ on a set which has density 1 at x, so
∇ψ_ℓ(x) = ∇̃ψ(x), and (10.24) becomes
∇_x c(x, y) + ∇̃ψ(x) = 0.   (10.25)
then
∇̃ψ(x) + ∇_x c(x, y) = 0   π(dx dy)-almost surely.   (10.28)
µ[S_ℓ ∩ B_r(x)] / µ[B_r(x)] −−→ 1   as r → 0.
(The proof of this uses the fact that we are working on a Riemannian
manifold; see the bibliographical notes for more information.)
Moreover, the transport plan πℓ induced by π on Sℓ coincides with
the deterministic transport associated with the map
So at least one of the sets {ψ_ℓ < ψ̃_ℓ} ∩ B_r(x) and {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x)
has µ-measure at least µ[B_r(x)]/2. Without loss of generality, I shall
assume that this is the set {ψ_ℓ > ψ̃_ℓ}; so
µ[ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] ≥ µ[B_r(x)]/2.   (10.30)
Next, ψ_ℓ coincides with ψ on the set S_ℓ, which has µ-density 1 at x,
and similarly ψ̃_ℓ coincides with ψ̃ on a set of µ-density 1 at x. It follows
that
µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] ≥ (µ[B_r(x)]/2) (1 − o(1))   as r → 0.   (10.31)
Then since x is a Besicovich point of {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ C_ℓ ∩ C̃_ℓ,
µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ B_r(x) ]
   ≥ µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] − µ[ B_r(x) \ (C_ℓ ∩ C̃_ℓ) ]
   ≥ µ[B_r(x)] ( 1/2 − o(1) − o(1) ).
As a conclusion,
∀r > 0   µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ B_r(x) ] > 0.   (10.32)
Now let
A := {ψ > ψ̃}.
The proof will result from the next two claims:
Claim 1: T̃^{−1}(T(A)) ⊂ A;
Claim 2: The set {ψ_ℓ > ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ T̃^{−1}(T(A))
lies a positive distance away from x.
Let us postpone the proofs of these claims for a while, and see why
they imply the theorem. Let S ⊂ A be defined by
S := {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ),
and let
r := d( x, S ∩ T̃^{−1}(T(A)) ) / 2.
On the one hand, since S ∩ T̃^{−1}(T(A)) ∩ B(x, r) = ∅ by definition of r, we have µ[S ∩ B(x, r) ∩
T̃^{−1}(T(A))] = µ[∅] = 0. On the other hand, r is positive by Claim 2,
so µ[S ∩ B(x, r)] > 0 by (10.32). Then
µ[ A \ T̃^{−1}(T(A)) ] ≥ µ[ S ∩ B(x, r) \ T̃^{−1}(T(A)) ] = µ[S ∩ B(x, r)] > 0.
(Recall that x is such that ψ_ℓ(x) = ψ(x) = ψ̃(x) = ψ̃_ℓ(x).) On the
other hand, since ψ_ℓ is differentiable at x, we have
which is possible only if ∇ψ̃_ℓ(x) − ∇ψ_ℓ(x) = 0. But this contradicts the
definition of x. So Claim 2 holds true, and this concludes the proof of
Theorem 10.42. ⊓⊔
The next theorem summarizes two results which were useful in the
present chapter:
Proof of Theorem 10.48. For each x ∈ S, let πx stand for the orthogonal
projection on Tx S, and let πx⊥ = Id − πx stand for the orthogonal
projection on (Tx S)⊥ . I claim that
1 1 + 2δ
|x− x′ | ≤ =⇒ |πℓ⊥ (x− x′ )| ≤ L |πℓ (x− x′ )|, L= ; (10.39)
k 1 − 2δ
this will imply that the intersection of Skℓ with a ball of diameter 1/k
is contained in an L-Lipschitz graph over πℓ (Rn ), and the conclusion
will follow immediately.
To prove (10.39), note that, if π, π′ are any two orthogonal projec-
tions, then |(π^⊥ − (π′)^⊥)(z)| = |(Id − π)(z) − (Id − π′)(z)| = |(π − π′)(z)|,
therefore ‖π^⊥ − (π′)^⊥‖ = ‖π − π′‖, and
∀x ∈ ∂S, ∃r > 0, ∃ν ∈ F, ∀y ∈ ∂S ∩ B_r(x),   ⟨y − x, ν⟩ ≤ |y − x|/2.   (10.40)
Indeed, otherwise there is x ∈ ∂S such that for all k ∈ N and for all
ν ∈ F there is y_k ∈ ∂S such that |y_k − x| ≤ 1/k and ⟨y_k − x, ν⟩ >
|y_k − x|/2. By assumption there is ξ ∈ S^{n−1} such that
∀ζ ∈ T_x S,   ⟨ξ, ζ⟩ ≤ 0.
Let ν ∈ F be such that |ξ − ν| < 1/8 and let (y_k)_{k∈N} be a sequence as
above. Since y_k ∈ ∂S and y_k ≠ x, there is y′_k ∈ S such that |y_k − y′_k| <
|y_k − x|/8. Then
⟨y′_k − x, ξ⟩ ≥ ⟨y_k − x, ν⟩ − |y_k − x| |ξ − ν| − |y_k − y′_k| ≥ |y_k − x|/4 ≥ |x − y′_k|/8.
So
⟨ (y′_k − x)/|y′_k − x|, ξ ⟩ ≥ 1/8.   (10.41)
Up to extraction of a subsequence, (y′_k − x)/|y′_k − x| converges to some
ζ ∈ T_x S, and then by passing to the limit in (10.41) we have ⟨ζ, ξ⟩ ≥
x = ζ(x′ , y),
x ∈ M ′ ⇐⇒ y = ϕ(x′ ).
Fig. 10.2. k-dimensional graph
and
Z = (−β, β) × Z ′ ; D = V \ Z.
I claim that λ_n[Z] = 0. To prove this it is sufficient to check that
λ_{n−1}[Z′] = 0. But Z′ is the nonincreasing limit of (Z′_ℓ)_{ℓ∈N}, where
Z′_ℓ = { y ∈ B(0, r_0);  λ_1[ { t ∈ (−β, β); ∃ i; ∇f_i(t, y) does not exist } ] ≥ 1/ℓ }.
By Fubini's theorem,
λ_n[ { x ∈ O; ∇f_i(x) does not exist for some i } ] ≥ (λ_{n−1}[Z′_ℓ]) × (1/ℓ);
and the left-hand side is equal to 0 since all f_i are differentiable almost
everywhere. It follows that λ_{n−1}[Z′_ℓ] = 0, and by taking the limit ℓ → ∞
we obtain λ_{n−1}[Z′] = 0.
Let f = Σ_i f_i, and let ∂_1 f = ⟨∇f, v⟩ stand for its partial derivative
with respect to the first coordinate. The first step of the proof has
shown that ∂1 f (x) ≥ α/2 at each point x where all functions fi are
This holds true for all ((t, y), (t′ , y)) in D × D. Since Z = V \ D
has zero Lebesgue measure, D is dense in V , so (10.43) extends to
all ((t, y), (t′ , y)) ∈ V .
For all y ∈ B(0, r0 ), inequality (10.43), combined with the estimate
|f(0, y)| = |f(0, y) − f(0, 0)| ≤ ‖f‖_Lip |y| ≤ αβ/4,
guarantees that the equation f (t, y) = 0 admits exactly one solution
t = ϕ(y) in (−β, β).
It only remains to check that ϕ is Lipschitz on B(0, r0 ). Let y, z ∈
B(0, r0 ), then f (ϕ(y), y) = f (ϕ(z), z) = 0, so
f (ϕ(y), y) − f (ϕ(z), y) = f (ϕ(z), z) − f (ϕ(z), y). (10.44)
Since the first partial derivative of f is no less than α/2, the left-hand
side of (10.44) is bounded below by (α/2)|ϕ(y) − ϕ(z)|, while the right-
hand side is bounded above by kf kLip |z − y|. The conclusion is that
|ϕ(y) − ϕ(z)| ≤ (2 ‖f‖_Lip / α) |z − y|,
so ϕ is indeed Lipschitz. ⊓⊔
where σ(t) is the sectional curvature of the plane generated by γ̇(t) and
ei (t) inside Tγ(t) M . To relate k(t) and h(t), we note that
∇_x [ d(y, x)^2/2 ] = d(y, x) ∇_x d(y, x);
∇_x^2 [ d(y, x)^2/2 ] = d(y, x) ∇_x^2 d(y, x) + ∇_x d(x, y) ⊗ ∇_x d(x, y).
By applying this to the tangent vector ei (t) and using the fact that
∇x d(x, y) at x = γ(t) is just γ̇(t), we get
From (10.46) follow the two comparison results which were used in
Theorem 10.41 and Corollary 10.44:
(a) Assume that the sectional curvatures of M are all non-
negative. Then (10.46) forces ḣ ≤ 0, so h remains bounded above by 1
for all times. In short:
nonnegative sectional curvature   =⇒   ∇_x^2 [ d(x, y)^2/2 ] ≤ Id_{T_xM}.   (10.47)
(If we think of the Hessian as a bilinear form, this is the same as
∇2x (d(x, y)2 /2) ≤ g, where g is the Riemannian metric.) Inequal-
ity (10.47) is rigorous if d(x, y)2 /2 is twice differentiable at x; otherwise
the conclusion should be reinterpreted as
x ↦ d(x, y)^2/2 is semiconcave with a modulus ω(r) = r^2/2.
x ↦ d(x, y)^2/2 is semiconcave with a modulus ω(r) = C(K) r^2/2.
Examples 10.53. The previous result applies to any compact mani-
fold, or any manifold which has been obtained from Rn by modification
on a compact set. But it does not apply to the hyperbolic space Hn ;
in fact, if y is any given point in Hn , then the function x → d(y, x)2 is
not uniformly semiconcave as x → ∞. (Take for instance the unit disk
in R2 , with polar coordinates (r, θ) as a model of H2 , then the distance
from the origin is d(r, θ) = log((1 + r)/(1 − r)); an explicit computation
shows that the first (and only nonzero) coefficient of the matrix of the
Hessian of d2 /2 is 1 + r d(r), which diverges logarithmically as r → 1.)
Bibliographical notes
The key ideas in this chapter were first used in the case of the quadratic
cost function in Euclidean space [154, 156, 722].
The existence of solutions to the Monge problem and the differ-
entiability of c-convex functions, for strictly superlinear convex cost
functions in Rn (other than quadratic) was investigated by several au-
thors, including in particular Rüschendorf [717] (formula (10.4) seems
to appear there for the first time), Smith and Knott [754], Gangbo and
McCann [398, 399]. In the latter reference, the authors get rid of all mo-
ment assumptions by avoiding the explicit use of Kantorovich duality.
These results are reviewed in [814, Chapter 2]. Gangbo and McCann
impose some assumptions of growth and superlinearity, such as the one
tance on the Wiener space; then usual strategies seem to fail, in the
first place because of (non)measurability issues [23].
The optimal transport problem with a distance cost function is
also related to the irrigation problem studied recently by various au-
thors [109, 110, 111, 112, 152], the Bouchitté–Buttazzo variational
problem [147, 148], and other problems as well. In this connection,
see also Pratelli [689].
The partial optimal transport problem, where only a fixed fraction
of the mass is transferred, was studied in [192, 365]. Under adequate as-
sumptions on the cost function, one has the following results: whenever
the transferred mass is at least equal to the shared mass between the
measures µ and ν, then (a) there is uniqueness of the partial transport
map; (b) all the shared mass is at the same time both source and target;
(c) the “active” region depends monotonically on the mass transferred,
and is the union of the intersection of the supports and a semiconvex
set.
To conclude, here are some remarks about the technical ingredients
used in this chapter.
Rademacher [697] proved his theorem of almost everywhere differ-
entiability in 1918, for Lipschitz functions of two variables; this was
later generalized to an arbitrary number of variables. The simple ar-
gument presented in this section seems to be due to Christensen [233];
it can also be found, up to minor variants, in modern textbooks about
real analysis such as the one by Evans and Gariepy [331, pp. 81–84].
Ambrosio showed me another simple argument which uses Lebesgue’s
density theorem and the identification of a Lipschitz function with a
function whose distributional derivative is essentially bounded.
The book by Cannarsa and Sinestrari [199] is an excellent reference
for semiconvexity and subdifferentiability in Rn , as well as the links
with the theory of Hamilton–Jacobi equations. It is centered on semi-
concavity rather than semiconvexity (and superdifferentiability rather
than subdifferentiability), but this is just a question of convention.
Many regularity results in this chapter have been adapted from that
source (see in particular Theorem 2.1.7 and Corollary 4.1.13 there).
Also the proof of Theorem 10.48(i) is adapted from [199, Theorem 4.1.6
and Corollary 4.1.9]. The core results in this circle of ideas and tools
can be traced back to a pioneering paper by Alberti, Ambrosio and
Cannarsa [12]. Following Ambrosio’s advice, I used the same methods
to establish Theorem 10.48(ii) in the present notes.
There are two important things that one should check before writing
the Jacobian equation: First, T should be injective on its domain of
definition; secondly, it should possess some minimal regularity.
So how smooth should T be for the Jacobian equation to hold true?
We learn in elementary school that it is sufficient for T to be continu-
ously differentiable, and a bit later that it is actually enough to have T
Lipschitz continuous. But that degree of regularity is not always avail-
able in optimal transport! As we shall see in Chapter 12, the transport
map T might fail to be even continuous.
There are (at least) three ways out of this situation:
(i) Only use the Jacobian equation in situations where the opti-
mal map is smooth. Such situations are rare; this will be discussed in
Chapter 12.
(ii) Only use the Jacobian equation for the optimal map between
µt0 and µt , where (µt )0≤t≤1 is a compactly supported displacement
If T_{t_0→t} = T_t ∘ T_{t_0}^{−1} stands for the transport map between µ_{t_0} and µ_t,
then the equation
also holds true for t0 ∈ (0, 1); but now this is just the theorem of change
of variables for Lipschitz maps.
Then,
∫_M F(y, f_t(y)) vol(dy) = ∫_M F( T_{t_0→t}(x), f_{t_0}(x)/J_{t_0→t}(x) ) J_{t_0→t}(x) vol(dx).
Furthermore, µt0 (dx)-almost surely, Jt0 →t (x) > 0 for all t ∈ [0, 1].
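Before the proof, here is a minimal numerical sketch (added for this text, with illustrative densities of my own choosing) of the Jacobian equation in its most elementary setting: in one dimension, for the quadratic cost, the optimal map is the monotone rearrangement T = G^{−1} ∘ F, where F and G are the cumulative distribution functions of the two measures, and the Jacobian equation reduces to f(x) = g(T(x)) T′(x).

    import numpy as np

    # Source mu: uniform density f = 1 on [0, 1]; target nu: density g(y) = 2y on [0, 1].
    f = lambda x: np.ones_like(x)
    g = lambda y: 2.0 * y
    F = lambda x: x                      # cdf of mu
    G_inv = lambda u: np.sqrt(u)         # inverse cdf of nu, since G(y) = y^2
    T = lambda x: G_inv(F(x))            # monotone (optimal) map: T(x) = sqrt(x)

    x = np.linspace(0.05, 0.95, 10)
    h = 1e-6
    T_prime = (T(x + h) - T(x - h)) / (2 * h)        # numerical derivative of T
    print(np.max(np.abs(f(x) - g(T(x)) * T_prime)))  # ~1e-9: f(x) = g(T(x)) T'(x) holds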
Proof of Theorem 11.3. For brevity I shall abbreviate vol(dx) into just
dx. Let us first consider the case when (µt )0≤t≤1 is compactly sup-
ported. Let Π be a probability measure on the set of minimizing curves,
such that µt = (et )# Π. Let Kt = et (Spt Π) and Kt0 = et0 (Spt Π). By
Theorem 8.5, the map γt0 → γt is well-defined and Lipschitz for all
γ ∈ Spt Π. So Tt0 →t (γt0 ) = γt is a Lipschitz map Kt0 → Kt . By as-
sumption µt is absolutely continuous, so Theorem 10.28 (applied with
the cost function ct0 ,t (x, y), or maybe ct,t0 (x, y) if t < t0 ) guarantees
that the coupling (γt , γt0 ) is deterministic, which amounts to saying
that γt0 → γt is injective apart from a set of zero probability.
Then we can use the change of variables formula with g = 1Kt ,
T = Tt0 →t , and we find f (x) = Jt0 →t (x). Therefore, for any nonnegative
measurable function G on M ,
∫_{K_t} G(y) dy = ∫_{K_t} G(y) d((T_{t_0→t})_# µ)(y)
   = ∫_{K_{t_0}} (G ∘ T_{t_0→t})(x) f(x) dx
   = ∫_{K_{t_0}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx.
We can apply this to G(y) = F (y, ft (y)) and replace ft (Tt0 →t (x)) by
ft0 (x)/Jt0 →t (x); this is allowed since in the right-hand side the contri-
bution of those x with ft (Tt0 →t (x)) = 0 is negligible, and Jt0 →t (x) = 0
implies (almost surely) ft0 (x) = 0. So in the end
∫_{K_t} F(y, f_t(y)) dy = ∫_{K_{t_0}} F( T_{t_0→t}(x), f_{t_0}(x)/J_{t_0→t}(x) ) J_{t_0→t}(x) dx.
Since this is true almost surely on Kℓ′ , for each ℓ′ , it is also true almost
surely.
Next, for any nonnegative measurable function G, by monotone con-
vergence and the first part of the proof one has
∫_{∪_ℓ K_{t,ℓ}} G(y) dy = lim_{ℓ→∞} ∫_{K_{t,ℓ}} G(y) dy
   = lim_{ℓ→∞} ∫_{K_{t_0,ℓ}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx
   = ∫_{∪_ℓ K_{t_0,ℓ}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx.
maps F1 : γ(t0 ) → (γ(0), γ(t0 )), F2 : (γ(0), γ(t0 )) → (γ(0), γ̇(0)) and
F3 : (γ(0), γ̇(0)) → γ(t). Both F2 and F3 have a positive Jacobian de-
terminant, at least if t < 1; so if x is chosen in such a way that F1 has
a positive Jacobian determinant at x, then also Tt0 →t = F3 ◦ F2 ◦ F1
will have a positive Jacobian determinant at x for t ∈ [0, 1). ⊓⊔
Bibliographical notes
Theorem 11.1 can be obtained (in Rn ) by combining Lemma 5.5.3
in [30] with Theorem 3.83 in [26].
In the context of optimal transport, the change of variables for-
mula (11.1) was proven by McCann [614]. His argument is based on
Lebesgue’s density theory, and takes advantage of Alexandrov’s the-
orem, alluded to in this chapter and proven later as Theorem 14.25:
A convex function admits a Taylor expansion at order 2 at almost
each x in its domain of definition. Since the gradient of a convex func-
tion has locally bounded variation, Alexandrov’s theorem can be seen
essentially as a particular case of the theorem of approximate differ-
entiability of functions with bounded variation. McCann’s argument is
reproduced in [814, Theorem 4.8].
Along with Cordero-Erausquin and Schmuckenschläger, McCann
later generalized his result to the case of Riemannian manifolds [246].
Modulo certain complications, the proof basically follows the same pat-
tern as in Rn . Then Cordero-Erausquin [243] treated the case of strictly
convex cost functions in Rn in a similar way.
Ambrosio pointed out that those results could be retrieved within
the general framework of push-forward by approximately differentiable
mappings. This point of view has the disadvantage of involving more
subtle arguments, but the advantage of showing that it is not a special
feature of optimal transport. It also applies to nonsmooth cost functions
such as |x − y|p . In fact it covers general strictly convex costs of the
form c(x− y) as soon as c has superlinear growth, is C 1 everywhere and
C 2 out of the origin. A more precise discussion of these subtle issues
can be found in [30, Section 6.2.1].
It is a general feature of optimal transport with strictly convex cost
in Rn that if T stands for the optimal transport map, then the Jacobian
matrix ∇T , even if not necessarily nonnegative symmetric, is diagonal-
izable with nonnegative eigenvalues; see Cordero-Erausquin [243] and
Ambrosio, Gigli and Savaré [30, Section 6.2]. From an Eulerian perspec-
tive, that diagonalizability property was already noticed by Otto [666,
Proposition A.4]. I don’t know if there is an analog on Riemannian
manifolds.
Changes of variables of the form y = expx (∇ψ(x)) (where ψ is
not necessarily d2 /2-convex) have been used in a remarkable paper
by Cabré [181] to investigate qualitative properties of nondivergent el-
liptic equations (Liouville theorem, Alexandrov–Bakelman–Pucci esti-
mates, Krylov–Safonov–Harnack inequality) on Riemannian manifolds
with nonnegative sectional curvature. (See for instance [189, 416, 786]
for classical proofs in Rn .) It is mentioned in [181] that the methods
extend to sectional curvature bounded below. For the Harnack inequal-
ity, Cabré’s method was extended to nonnegative Ricci curvature by
S. Kim [516].
12
Smoothness
det ∇^2ψ(x) = f(x) / g(∇ψ(x)).   (12.4)
Caffarelli’s counterexample
supports, and yet the optimal transport between µ(dx) = f (x) dx and
ν(dy) = g(y) dy, for the cost c(x, y) = |x − y|2 , is discontinuous.
Proof of Theorem 12.3. Let f be the indicator function of the unit ball
B in R2 (normalized to be a probability measure), and let g = gε be
the (normalized) indicator function of a set Cε obtained by first sepa-
rating the ball into two halves B1 and B2 (say with distance 2), then
building a thin bridge between those two halves, of width O(ε). (See
Figure 12.1.) Let also g be the normalized indicator function of B1 ∪B2 :
this is the limit of gε as ε ↓ 0. It is not difficult to see that gε (identified
with a probability measure) can be obtained from f by a continuous
deterministic transport (after all, one can deform B continuously into
Cε ; just think that you are playing with clay, then it is possible to mas-
sage the ball into Cε , without tearing off). However, we shall see here
that for ε small enough, the optimal transport cannot be continuous.
Fig. 12.1. Principle behind Caffarelli’s counterexample. The optimal transport from
the ball to the “dumb-bells” has to be discontinuous, and in effect splits the upper
region S into the upper left and upper right regions S− and S+ . Otherwise, there
should be some transport along the dashed lines, but for some lines this would
contradict monotonicity.
Loeper’s counterexample
Fig. 12.2. Principle behind Loeper’s counterexample. This is the surface S, im-
mersed in R3 , “viewed from above”. By symmetry, O has to stay in place. Because
most of the initial mass is close to A+ and A− , and most of the final mass is close
to B+ and B− , at least some mass has to move from one of the A-balls to one
of the B-balls. But then, because of the modified (negative curvature) Pythagoras
inequality, it is more efficient to replace the transport scheme (A → B, O → O), by
(A → O, O → B).
X and g on Y such that any optimal transport map from f vol to g vol
is discontinuous.
Proof of Theorem 12.7. Note first that the continuity of the cost function and the compactness of X × Y imply the continuity of ψ. In the
proof, the notation d will be used interchangeably for the distances on
X and Y.
Let C1 and C2 be two disjoint connected components of ∂c ψ(x).
Since ∂c ψ(x) is closed, C1 and C2 lie a positive distance apart. Let
r = d(C1 , C2 )/5, and let C = {y ∈ Y; d(y, ∂c ψ(x)) ≥ 2 r}. Further, let
y 1 ∈ C1 , y 2 ∈ C2 , B1 = Br (y 1 ), B2 = Br (y 2 ). Obviously, C is compact,
and any path going from B1 to B2 has to go through C.
Then let K = {z ∈ X ; ∃ y ∈ C; y ∈ ∂c ψ(z)}. It is clear that K is
compact: Indeed, if zk ∈ K converges to z ∈ X , and yk ∈ ∂c ψ(zk ), then
without loss of generality yk converges to some y ∈ C and one can pass
to the limit in the inequality ψ(zk ) + c(zk , yk ) ≤ ψ(t) + c(t, yk ), where
t ∈ X is arbitrary. Also K is not empty since X and Y are compact.
Next I claim that for any y ∈ C, for any x such that y ∈ ∂c ψ(x), and for any i ∈ {1, 2},

ψ(x) + c(x, y_i) ≥ inf_{x̃∈X} [ ψ(x̃) + c(x̃, y_i) ] + ε   (12.8)

for some ε > 0. Indeed, y_i ∉ ∂c ψ(x), so inf_{x̃∈X} [ψ(x̃) + c(x̃, y_i)] < ψ(x) + c(x, y_i); the uniform bound (12.8) then follows easily from a contradiction argument based on the compactness of C and K, and once again the continuity of ∂c ψ.
Then let δ ∈ (0, r) be small enough that for any (x, y) ∈ K × C, for any i ∈ {1, 2}, the inequalities d(x, x′) ≤ 2δ, d(y, y′) ≤ 2δ imply

|c(x′, y′) − c(x, y)| ≤ ε/10,    |c(x′, y′_i) − c(x, y_i)| ≤ ε/10.   (12.11)
Let K δ = {x ∈ X ; d(x, K) ≤ δ}. From the assumptions on X and Y,
Bδ (x) has positive volume, so we can fix a smooth positive probability
density f on X such that the measure µ = f vol satisfies
µ[Bδ(x)] ≥ 3/4;    f ≥ ε0 > 0 on K^δ.   (12.12)
Also we can construct a sequence of smooth positive probability den-
sities (gk )k∈N on Y such that the measures νk = gk vol satisfy
νk → (1/2)(δ_{y1} + δ_{y2})   weakly as k → ∞.   (12.13)
Let us assume the existence of a continuous optimal transport Tk sending µ to νk , for any k. We shall reach a contradiction, and this
will prove the theorem.
From (12.13), νk [Bδ (y 1 )] ≥ 1/3 for k large enough. Then by (12.12)
the transport Tk has to send some mass from Bδ (x) to Bδ (y 1 ), and
similarly from Bδ (x) to Bδ (y 2 ). Since Tk (Bδ (x)) is connected, it has to
meet C. So there are yk ∈ C and x′k ∈ Bδ (x) such that Tk (x′k ) = yk .
Let xk ∈ K be such that yk ∈ ∂c ψ(xk ). Without loss of generality
we may assume that xk → x∞ ∈ K as k → ∞. By the second part
of (12.12), m := µ[Bδ (x∞ )] ≥ ε0 vol [Bδ (x∞ )] > 0. When k is large
enough, νk [Bδ (y 1 ) ∪ Bδ (y 2 )] > 1 − m (by (12.13) again), so Tk has to
send some mass from Bδ (x∞ ) to either Bδ (y 1 ) or Bδ (y 2 ); say Bδ (y 1 ).
In other words, there is some x′k ∈ Bδ (x∞ ) such that Tk (x′k ) ∈ Bδ (y 1 ).
Let us recapitulate: for k large enough,
Before stating the main definition of this section, I shall now in-
troduce some more notation. If X is a closed subset of a Riemannian manifold M and c : X × Y → R is a continuous cost function, for
any x in the interior of X I shall denote by Dom ′ (∇x c(x, · )) the in-
terior of Dom (∇x c(x, · )). Moreover I shall write Dom ′ (∇x c) for the
union of all sets {x} × Dom ′ (∇x c(x, · )), where x varies in the inte-
rior of X . For instance, if X = Y = M is a complete Riemannian
manifold and c(x, y) = d(x, y)2 is the square of the geodesic distance,
then Dom ′ (∇x c) is obtained from M × M by removing the cut locus,
while Dom (∇x c) might be slightly bigger (these facts are recalled in
the Appendix).
∇⁻_c ψ(x) = ∇⁻ψ(x),
where (x, y0 ) and (x, y1 ) belong to Dom ′ (∇x c). (The functions ψx,y0 ,y1
play in some sense the role of x → |x1 | in usual convexity theory.) See
Figure 12.3 for an illustration of the resulting “recipe”.
Fig. 12.3. Regular cost function. Take two cost-shaped mountains peaked at y0
and y1 , let x be a pass, choose an intermediate point yt on (y0 , y1 )x , and grow a
mountain peaked at yt from below; the mountain should emerge at x. (Note: the
shape of the mountain is the negative of the cost function.)
Proof of Proposition 12.15. Let us start with (i). The necessity of the
condition is obvious since y0 , y1 both belong to ∂c ψx,y0 ,y1 (x). Con-
versely, if the condition is satisfied, let ψ be any c-convex function
X → R, let x belong to the interior of X and let y0 , y1 ∈ ∂c ψ(x). By
adding a suitable constant to ψ (which will not change the subdifferen-
tial), we may assume that ψ(x) = 0. Since ψ is c-convex, ψ ≥ ψx,y0 ,y1 ,
so for any t ∈ [0, 1] and x ∈ X ,
E(∇⁻ψ(x)) ⊂ ∇⁻_c ψ(x) ⊂ ∇⁻ψ(x),
The next result will show that the regularity property of the cost
function is a necessary condition for a general theory of regularity of
optimal transport. In view of Remark 12.19 it is close in spirit to The-
orem 12.7.
Remark 12.24. One can refine Proposition 10.15 to show that cost
functions deriving from a well-behaved Lagrangian do satisfy the strong
twist condition. In the Appendix I shall give more details for the im-
portant particular case of the squared geodesic distance.
by the formula
Sc(x, y)(ξ, η) = (3/2) Σ_{ijkℓrs} ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) ξ^i ξ^j η^k η^ℓ.   (12.20)
In other words, c-expx (p) is the unique y such that ∇x c(x, y)+p = 0.
When c(x, y) = d(x, y)2 /2 on X = Y = M , a complete Riemannian
manifold, one recovers the usual exponential of Riemannian geometry,
whose domain of definition can be extended to the whole of TM . More
generally, if c comes from a time-independent Lagrangian, under suit-
able assumptions the c-exponential can be defined as the solution at
time 1 of the Lagrangian system starting at x with initial velocity v,
in such a way that ∇v L(x, v) = p.
Then, with the notation p = −∇x c(x, y), we have the following
reformulation of the c-curvature operator:
Sc(x, y)(ξ, η) = −(3/2) (d²/ds²)(d²/dt²)|_{s=t=0} c( expx(tξ), c-expx(p + sη) ),   (12.21)
where η in the right-hand side is an abuse of notation for the tangent
vector at x obtained from η by the operation of −∇2xy c(x, y) (viewed
as an operator Ty M → Tx M ). In other words, Sc is obtained by
differentiating the cost function c(x, y) twice with respect to x and
twice with respect to p, not with respect to y. Getting formula (12.20)
from (12.21) is just an exercise, albeit complicated, in classical differ-
ential calculus; it involves the differentiation formula for the matrix
inverse: d(M −1 ) · H = −M −1 HM −1 .
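(As a sanity check, recorded here because it is used tacitly: this differentiation formula follows at once by differentiating the identity M M⁻¹ = I in the direction H:

0 = H M⁻¹ + M ( d(M⁻¹)·H ),   so   d(M⁻¹)·H = −M⁻¹ H M⁻¹.)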
To establish (12.22), first note that for any fixed small t, the geodesic
curve joining x to expx (tξ) is orthogonal to s → expx (sη) at s = 0, so
(d/ds)s=0 F (t, s) = 0. Similarly (d/dt)t=0 F (t, s) = 0 for any fixed s, so
the Taylor expansion of F at (0, 0) takes the form
d( expx(tξ), expx(sη) )² / 2 = A t² + B s² + C t⁴ + D t²s² + E s⁴ + O(t⁶ + s⁶).
Since F (t, 0) and F (0, s) are quadratic functions of t and s respectively,
necessarily C = E = 0, so Sc (x, x) = −6D is −6 times the coefficient of
t4 in the expansion of d(expx (tξ), expx (tη))2 /2. Then the result follows
from formula (14.1) in Chapter 14.
Remark 12.31. Formula (12.21) shows that Sc (x, y) is intrinsic, in
the sense that it is independent of any choice of geodesic coordinates
(this was not obvious from Definition 12.20). However, the geometric
interpretation of Sc is related to the regularity property, which is in-
dependent of the choice of Riemannian structure; so we may suspect
that the choice to work in geodesic coordinates is nonessential. It turns
out indeed that Definition 12.20 is independent of any choice of coor-
dinates, geodesic or not: We may apply Formula (12.20) by just letting
ci,j = ∂ 2 c(x, y)/∂xi ∂yj , pi = −ci (partial derivative of c(x, y) with
respect to xi ), and replace (12.21) by
Sc(x, y)(ξ, η) = −(3/2) (∂²/∂p²_η)(∂²/∂x̄²_ξ) c( x̄, c-expx(p) ) |_{x̄=x, p=−dx c(x,y)}   (12.23)
          = −(3/2) (∂²/∂p²_η)(∂²/∂q²_ξ) c( č-expy(q), c-expx(p) ) |_{p=−dx c(x,y), q=−dy c(x,y)}.   (12.24)
Proof of Theorem 12.35. Let us start with some reminders about clas-
sical convexity in Euclidean space. If Ω is an open set in Rn with C 2
boundary and x ∈ ∂Ω, let Tx Ω stand for the tangent space to ∂Ω at x,
and n for the outward normal on ∂Ω (extended smoothly in a neigh-
borhood of x). The second fundamental form of Ω, evaluated at x, is defined on TxΩ by II(x)(ξ) = Σ_{ij} ∂i nj ξ^i ξ^j. A defining function for Ω
at x is a function Φ defined in a neighborhood of x, such that Φ < 0
in Ω, Φ > 0 outside Ω, and |∇Φ| > 0 on ∂Ω. Such a function always
exists (locally), for instance one can choose Φ(x) = ±d(x, ∂Ω), with +
sign when x is outside Ω and − when x is in Ω. (In that case ∇Φ is the
outward normal on ∂Ω.) If Φ is a defining function, then n = ∇Φ/|∇Φ|
on ∂Ω, and for all ξ⊥n,
∂i nj ξ^i ξ^j = ( Φ_{ij}/|∇Φ| − Φ_j Φ_k Φ_{ik}/|∇Φ|³ ) ξ^i ξ^j = Φ_{ij} ξ^i ξ^j / |∇Φ|.
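(A standard example, added as an illustration: if Ω = B_R(0) is the Euclidean ball of radius R, one may take Φ(x) = |x| − R; then |∇Φ| = 1, n = x/|x|, Φ_{ij} = δ_{ij}/|x| − x_i x_j/|x|³, and for any ξ ⊥ n,

II(x)(ξ) = Φ_{ij} ξ^i ξ^j = |ξ|²/R,

so the second fundamental form of the ball is 1/R times the identity on the tangent space.)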
By assumption the left-hand side is finite, and the right-hand side is exactly ∫ #{t; y_t^z ∈ Σ} H^{n−1}(dz); so the integrand is finite for almost all z, and in particular there is a sequence zk → 0 such that each (y_t^{zk}) intersects Σ finitely many times. ⊓⊔
Proof of Theorem 12.36. Let us first assume that the cost c is regular
on D, and prove the nonnegativity of Sc .
and (12.20) are the same, one performs a direct (tedious) computation
to check that if η is a tangent vector in the p-space and ξ is a tangent
vector in the x-space, then
( ∂⁴c(x, y(x, p)) / ∂pk ∂pℓ ∂xi ∂xj ) ξ^i ξ^j ηk ηℓ = ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) c^{k,m} c^{ℓ,n} ξ^i ξ^j ηm ηn
    = ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) ξ^i ξ^j η^k η^ℓ.
So

ḣ(t) = −( c_{,j}(x̄, yt) − c_{,j}(x, yt) ) ζ^j = c_{i,j} η^i ζ^j,

where ηj = −c_{,j}(x̄, yt) + c_{,j}(x, yt) and η^i = −c^{j,i} ηj.

Next,

ḧ(t) = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) ) ζ^i ζ^j + ( c_{,j}(x̄, yt) − c_{,j}(x, yt) ) c^{j,k} c_{k,ℓi} ζ^ℓ ζ^i
    = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) − ηℓ c^{ℓ,k} c_{k,ij} ) ζ^i ζ^j
    = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) − η^k c_{k,ij} ) ζ^i ζ^j.
Now freeze t, yt, ζ, and let Φ(x) = c_{,ij}(x, yt) ζ^i ζ^j = ⟨∇²_y c(x, yt)·ζ, ζ⟩. This can be seen either as a function of x or as a function of q =
Remark 12.43. The implications (i) ⇒ (ii) and (ii) ⇒ (iii) remain
true without the convexity assumptions. It is a natural open prob-
lem whether these assumptions can be completely dispensed with in
Theorem 12.42. A bold conjecture would be that (i), (ii) and (iii)
are always equivalent and automatically imply the total c-convexity
of Dom ′ (∇x c).
To prove (12.29) it is sufficient to show that h(t) ≥ h(0) for all t ∈ [0, 1]. The č-convexity of Ď implies that (xt, y) always lies in D. Let q = −∇y c(x, y), q̄ = −∇y c(x̄, y), η = q̄ − q; then as in the proof of Theorem 12.36 we have (ẋt)^j = −c^{k,j} ηk = η^j,

ḣ(t) = ( ψi(xt) + ci(xt, y) ) η^i = −c_{i,j}(xt, y) η^i ζ^j

ḧ ≥ −C |ḣ(t)|,   (12.32)
The property of c-convexity of the target is the key to get good control
of the localization of the gradient of the solution to (12.2). This asser-
tion might seem awkward: After all, we already know that under gen-
eral assumptions, T (Spt µ) = Spt ν (recall the end of Theorem 10.28),
Smoothness results
Remark 12.53. Theorem 12.51 shows that the regularity of the cost
function is sufficient to build a strong regularity theory. These results
are still not optimal and likely to be refined in the near future; in
particular one can ask whether C α → C 2,α estimates are available for
plainly regular cost functions (but Caffarelli’s methods strongly use the
affine invariance properties of the quadratic cost function); or whether
interior estimates exist (Theorem 12.52(ii) shows that this is the case
for uniformly regular costs).
Remark 12.54. On the other hand, the first part of Theorem 12.52
shows that a uniformly regular cost function behaves better, in certain
ways, than the square Euclidean norm! For instance, the condition in
Theorem 12.52(i) is automatically satisfied if µ(dx) = f (x) dx, f ∈ Lp
for p > n; but it also allows µ to be a singular measure. (Such estimates
are not even true for the linear Laplace equation!) As observed by
specialists, uniform regularity makes the equation much more elliptic.
With Theorem 12.56 and some more work to establish the strict
convexity, it is possible to extend Caffarelli’s theory to unbounded do-
mains.
‖ψ‖_{C^{3,α}(Ω′)} ≤ C( α, Ω, Ω′, c|_{Ω×Λ}, ‖F‖_{C^{1,1}(Ω)}, ‖∇ψ‖_{L^∞(Ω)} );
‖ψ‖_{C^{k+2,α}(Ω′)} ≤ C( k, α, Ω, Ω′, c|_{Ω×Λ}, ‖F‖_{C^{k,α}(Ω)}, ‖∇ψ‖_{L^∞(Ω)} ).
well-defined and nonsingular at vx,y = (expx )−1 (y), which is the initial
velocity of the unique geodesic going from x to y. But ∇x c(x, y) coin-
cides with −vx→y ; so ∇2xy c(x, y) = −∇y ((expx )−1 ) = −(∇v expx )−1 is
nonsingular. This concludes the proof of the strong twist condition.
It is also true that c satisfies (Cutn−1 ); in fact, for any compact
subset K of M and any x ∈ M one has
Bibliographical notes
Monge himself was probably not aware of the relation between the
Monge problem and Monge–Ampère equations; this link was made
much later, maybe in the work of Knott and Smith [524]. In any case
it is Brenier [156] who made this connection popular among the com-
munity of partial differential equations. Accordingly, weak solutions of
Monge–Ampère-type equations constructed by means of optimal trans-
port are often called Brenier solutions in the literature. McCann [614]
proved that such a solution automatically satisfies the Monge–Ampère
equation almost everywhere (see the bibliographical notes for Chap-
ter 11). Caffarelli [185] showed that for a convex target, Brenier’s notion
of solution is equivalent to the older concepts of Alexandrov solution
and viscosity solution. These notions are reviewed in [814, Chapter 4]
and a proof of the equivalence between Brenier and Alexandrov so-
lutions is recast there. (The concept of Alexandrov solution is devel-
oped in [53, Section 11.2].) Feyel and Üstünel [359, 361, 362] studied
the infinite-dimensional Monge–Ampère equation induced by optimal
transport with quadratic cost on the Wiener space.
The modern regularity theory of the Monge–Ampère equation was
pioneered by Alexandrov [16, 17] and Pogorelov [684, 685]. Since then
it has become one of the most prestigious subjects of fully nonlinear
main makes the oblique condition more elliptic in some sense than the
Neumann condition.) Fully nonlinear elliptic equations with oblique
boundary condition had been studied before in [555, 560, 561], and the
connection with the second boundary value problem for the Monge–
Ampère equation had been suggested in [562]. Compared to Caffarelli’s
method, this one only covers the global estimates, and requires higher
initial regularity; but it is more elementary.
The generalization of these regularity estimates to nonquadratic
cost functions stood as an open problem for some time. Then Ma,
Trudinger and Wang [585] discovered that the older interior estimates
by Wang [833] could be adapted to general cost functions satisfying the
condition Sc > 0 (this condition was called (A3) in their paper and
in subsequent works). Theorem 12.52(ii) is extracted from this refer-
ence. A subtle caveat in [585] was corrected in [793] (see Theorem 1
there). A key property discovered in this study is that if c is a regular
cost function and ψ is c-convex, then any local c-support function for
ψ is also a global c-support function (which is nontrivial unless ψ is
differentiable); an alternative proof can be derived from the method of
Y.-H. Kim and McCann [519, 520].
Trudinger and Wang [794] later adapted the method of Urbas to
treat the boundary regularity under the weaker condition Sc ≥ 0 (there
called (A3w)). The proof of Theorem 12.51 can be found there.
At this point Loeper [570] made three crucial contributions to the
theory. First he derived the very strong estimates in Theorem 12.52(i)
which showed that the Ma–Trudinger–Wang (A3) condition (called
(As) in Loeper’s paper) leads to a theory which is stronger than
the Euclidean one in some sense (this was already somehow implicit
in [585]). Secondly, he found a geometric interpretation of this condi-
tion, namely the regularity property (Definition 12.14), and related it to
well-known geometric concepts such as sectional curvature (Particular
Case 12.30). Thirdly, he proved that the weak condition (A3w) (called
(Aw) in his work) is mandatory to derive regularity (Theorem 12.21).
The psychological impact of this work was important: before that, the
Ma–Trudinger–Wang condition could be seen as an obscure ad hoc as-
sumption, while now it became the natural condition.
The proof of Theorem 12.52(i) in [570] was based on approximation
and used auxiliary results from [585] and [793] (which also used some
of the arguments in [570]. . . but there is no loophole!)
Loeper [571] further proved that the squared distance on the sphere
is a uniformly regular cost, and combined all the above elements to
derive Theorem 12.58; the proof is simplified in [572]. In [571], Loeper
derived smoothness estimates similar to those in Theorem 12.52 for the
far-field reflector antenna problem.
The exponent β in Theorem 12.52(i) is explicit; for instance, in
the case when f = dµ/dx is bounded above and g is bounded below,
Loeper obtained β = (4n − 1)−1 , n being the dimension. (See [572] for
a simplified proof.) However, this is not optimal: Liu [566] improved
this into β = (2n − 1)−1 , which is sharp.
In a different direction, Caffarelli, Gutiérrez and Huang [191] could
get partial regularity for the far-field reflector antenna problem by very
elaborate variants of Caffarelli’s older techniques. This “direct” ap-
proach does not yet yield results as powerful as the a priori estimates
by Loeper, Ma, Trudinger and Wang, since only C 1 regularity is ob-
tained in [191], and only when the densities are bounded from above
and below; but it gives new insights into the subject.
In dimension 2, the whole theory of Monge–Ampère equations be-
comes much simpler, and has been the object of numerous studies [744].
Old results by Alexandrov [17] and Heinz [471] imply C 1 regularity of
the solution of det(∇2 ψ) = h as soon as h is bounded from above (and
strict convexity if it is bounded from below). Loeper noticed that this
implied strengthened results for the solution of optimal transport with
quadratic cost in dimension 2, and together with Figalli [368] extended
this result to regular cost functions.
Now I shall briefly discuss the story of counterexamples. Counterexam-
ples by Pogorelov and Caffarelli (see for instance [814, pp. 128–129])
show that solutions of the usual Monge–Ampère equation are not
smooth in general: some strict convexity on the solution is needed,
and it has to come from boundary conditions in one way or the other.
The counterexample in Theorem 12.3 is taken from Caffarelli [185],
where it is used to prove that the “Hessian measure” (a generalized
formulation of the Hessian determinant) cannot be absolutely continu-
ous if the bridge is thin enough; in the present notes I used a slightly
different reasoning to directly prove the discontinuity of the optimal
transport. The same can be said of Theorem 12.4, which is adapted
from Loeper [570]. (In Loeper’s paper the contradiction was obtained
indirectly as in Theorem 12.44.)
ties, and the probability densities satisfy certain size restrictions. Then
Loeper and I [572] established smoothness estimates for the optimal
transport on C 4 perturbations of the projective space, without any
size restriction.
The cut locus is also a major issue in the study of the perturbation
of these smoothness results. Because the dependence of the geodesic
distance on the Riemannian metric is not smooth near the cut locus,
it is not clear whether the Ma–Trudinger–Wang condition is stable
under C k perturbations of the metric, however large k may be. This
stability problem, first formulated in [572], is in my opinion extremely
interesting; it is solved by Figalli and Rifford [371] near S 2 .
Without knowing the stability of the Ma–Trudinger–Wang condi-
tion, if pointwise a priori bounds on the probability densities are given,
one can afford a C 4 perturbation of the metric and retain the Hölder
continuity of optimal transport; or even afford a C 2 perturbation and
retain a mesoscopic version of the Hölder continuity [822].
Some of the smoothness estimates discussed in these notes also hold
for other more complicated fully nonlinear equations, such as the reflec-
tor antenna problem [507] (which in its general formulation does not
seem to be equivalent to an optimal transport problem) or the so-called
Hessian equations [789, 790, 792, 800], where the dominant term is a
symmetric function of the eigenvalues of the Hessian of the unknown.
The short survey by Trudinger [788] presents some results of this type,
with applications to conformal geometry, and puts this into perspec-
tive together with optimal transport. In this reference Trudinger also
notes that the problem of the prescribed Schouten tensor resembles an
optimal transport problem with logarithmic cost function; this connec-
tion had also been made by McCann (see the remarks in [520]) who
had long ago noticed the properties of conformal invariance of this cost
function.
A topic which I did not address at all is the regularity of certain sets
solving variational problems involving optimal transport; see [632].
13
Qualitative picture
Recap
for some nontrivial (i.e. not identically −∞, and never +∞) function φ.
In case (ii), if nothing is known about the behavior of the distance
function at infinity, then the gradient ∇ in the left-hand side of (13.2) should be replaced by an approximate gradient ∇̃.
4. Under the same assumptions, the (generalized) displacement
interpolation (µt )0≤t≤1 is unique. This follows from the almost sure
uniqueness of the minimizing curve joining γ0 to γ1 , where (γ0 , γ1 ) is
the optimal coupling. (Corollary 7.23 applies when the total cost is
finite; but even if the total cost is infinite, we can apply a reasoning
similar to the one in Corollary 7.23. Note that the result does not follow
from the vol ⊗ vol (dx0 dx1 )-uniqueness of the minimizing curve joining
x0 to x1 .)
5. Without loss of generality, one might assume that

ψ(x) = sup_{y∈M} [ φ(y) − c(x, y) ],    φ(y) = inf_{x∈M} [ ψ(x) + c(x, y) ]

(these are true supremum and true infimum, not just up to a negligible set). One can also assume without loss of generality that

φ(y) − ψ(x) ≤ c(x, y)   for all x, y,

and

φ(x1) − ψ(x0) = c(x0, x1)   almost surely.
6. It is still possible that two minimizing curves meet at time t = 0
or t = 1, but this event may occur only on a very small set, of dimension
at most n − 1.
7. All of the above remains true if one replaces µ0 at time 0 by µt
at time t, with obvious changes of notation (e.g. replace c = c0,1 by
ct,1 ); the function φ is unchanged, but now ψ should be changed into
ψt defined by
ψt(y) = inf_{x∈M} [ ψ0(x) + c_{0,t}(x, y) ].   (13.3)
In particular,
9. Whenever 0 ≤ t0 < t1 ≤ 1,
∫ ψ_{t1} dµ_{t1} − ∫ ψ_{t0} dµ_{t0} = C^{t0,t1}(µ_{t0}, µ_{t1})
    = ∫_{t0}^{t1} ∫ L( x, [(∇v L)(x, · , t)]⁻¹(∇ψt(x)), t ) dµt(x) dt;
recall indeed Theorems 7.21 and 7.36, Remarks 7.25 and 7.37, and (13.4).
Open Problem 13.1. If the initial and final densities, ρ0 and ρ1 , are
positive everywhere, does this imply that the intermediate densities ρt
are also positive? Otherwise, can one identify simple sufficient con-
ditions for the density of the displacement interpolant to be positive
everywhere?
Zℓ ↑ 1; Zℓ πℓ ↑ π; Zℓ µt,ℓ ↑ µt ; Zℓ Πℓ ↑ Π.
(i) each π k is an optimal transference plan between µk0 and µk1 , and
any one of the probability measures µk0 , µk1 has a smooth, compactly
supported density;
(ii) µk0 → µ0 , µk1 → µ1 , π k → π in the weak sense as k → ∞.
If the cost function is just the square of the distance, then these
equations become
∂t µt + ∇ · (ξt µt) = 0;
ξt(x) = ∇ψt(x);
ψ0 is d²/2-convex;                          (13.6)
∂t ψt + |∇ψt|²/2 = 0.
Finally, for the square of the Euclidean distance, this simplifies into
∂t µt + ∇ · (ξt µt) = 0;
ξt(x) = ∇ψt(x);
x ↦ |x|²/2 + ψ0(x) is lower semicontinuous convex;      (13.7)
∂t ψt + |∇ψt|²/2 = 0.
Apart from the special choice of initial datum, the latter system
is well-known in physics as the pressureless Euler equation, for a
potential velocity field.
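A completely explicit solution of (13.7), added here as a minimal example, is provided by translations: take ψ0(x) = ⟨v, x⟩ for a fixed vector v ∈ Rⁿ. Then x ↦ |x|²/2 + ψ0(x) is convex, the Hamilton–Jacobi equation is solved by

ψt(x) = ⟨v, x⟩ − t |v|²/2,   so   ξt(x) = ∇ψt(x) = v,

and the continuity equation is solved by µt = (Id + tv)_# µ0: every particle travels in a straight line with constant velocity v.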
Spt(ψ) ⊂ K,    ‖ψ‖_{C²} ≤ ε
is d²/2-convex.
Remark 13.7. The end of the proof took advantage of a general prin-
ciple, independent of the particular cost c: If there is a surjective map
The structure of P2 (M )
A striking discovery made by Otto at the end of the nineties is that the
differentiable structure on a Riemannian manifold M induces a kind of
differentiable structure in the space P2 (M ). This idea takes substance
from the following remarks: All of the path (µt )0≤t≤1 is determined
from the initial velocity field ξ0 (x), which in turn is determined by ∇ψ
as in (13.4). So it is natural to think of the function ∇ψ as a kind of
“initial velocity” for the path (µt ). The conceptual shift here is about
the same as when we decided that µt could be seen either as the law
of a random minimizing curve at time t, or as a path in the space of
measures: Now we decide that ∇ψ can be seen either as the field of the
initial velocities of our minimizing curves, or as the (abstract) velocity
of the path (µt ) at time t = 0.
There is an abstract notion of tangent space Tx X (at point x) to a
metric space (X , d): in technical language, this is the pointed Gromov–
Hausdorff limit of the rescaled space. It is a rather natural notion: fix
your point x, and zoom onto it, by multiplying all distances by a large
factor ε−1 , while keeping x fixed. This gives a new metric space Xx,ε ,
and if one is not too curious about what happens far away from x, then
the space Xx,ε might converge in some nice sense to some limit space,
that may not be a vector space, but in any case is a cone. If that limit
space exists, it is said to be the tangent space (or tangent cone) to X
at x. (I shall come back to these issues in Part III.)
In terms of that construction, the intuition sketched above is in-
deed correct: let P2 (M ) be the metric space consisting of probability
measures on M , equipped with the Wasserstein distance W2 . If µ is
absolutely continuous, then the tangent cone Tµ P2 (M ) exists and can
be identified isometrically with the closed vector space generated by the gradients of d²/2-convex functions ψ, equipped with the norm

‖∇ψ‖_{L²(µ;TM)} := ( ∫_M |∇ψ|²_x dµ(x) )^{1/2}.
Actually, in view of Theorem 13.5, this is the same as the vector space
generated by all smooth, compactly supported gradients, completed
with respect to that norm.
With what we know about optimal transport, this theorem is not
that hard to prove, but this would require a bit too much geometric
machinery for now. Instead, I shall spend some time on an important
related result by Ambrosio, Gigli and Savaré, according to which any
Lipschitz curve in the space P2 (M ) admits a velocity (which for all t
lives in the tangent space at µt ). Surprisingly, the proof will not require
absolute continuity.
W2 (µs , µt ) ≤ L |t − s|.
For any t ∈ [0, 1], let Ht be the Hilbert space generated by gradients of
continuously differentiable, compactly supported ψ:

Ht := the closure of Vect{∇ψ; ψ ∈ Cc¹(M)} in L²(µt; TM).
Then there exists a measurable vector field ξt (x) ∈ L∞ (dt; L2 (dµt (x))),
µt (dx) dt-almost everywhere unique, such that ξt ∈ Ht for all t (i.e. the
velocity field really is tangent along the path), and
∂t µt + ∇ · (ξt µt ) = 0 (13.10)
The proof of Theorem 13.8 requires some analytical tools, and the
reader might skip it at first reading.
Now the key remark is that the time-derivative (d/dt) ∫ (ψ + C) dµt does not depend on the constant C. This shows that (d/dt) ∫ ψ dµt really is a functional of ∇ψ, obviously linear. The above estimate shows
that this functional is continuous with respect to the norm in L2 (dµt ).
just showed that there is a negligible set of times, τK, such that (13.12) holds true for all ψ ∈ C¹_K(M) and t ∉ τK. Now choose an increasing
family of compact sets (Km )m∈N , with ∪Km = M , so that any compact
set is included in some Km . Then (13.12) holds true for all ψ ∈ Cc1 (M ),
as soon as t does not belong to the union of τKm , which is still a
negligible set of times.
But equation (13.12) is really the weak formulation of (13.10). Since
ξt is uniquely determined in L2 (dµt ), for almost all t, actually the vector
field ξt (x) is dµt (x) dt-uniquely determined.
To conclude the proof of the theorem, it only remains to prove the
converse implication. Let (µt ) and (ξt ) solve (13.10). By the equation
of conservation of mass, µt = law (γt ), where γt is a (random) solution
of
γ̇t = ξt (γt ).
Let s < t be any two times in [0, 1]. From the formula
d(γs, γt)² = (t − s) inf { ∫_s^t |ζ̇τ|² dτ; ζs = γs, ζt = γt },
we deduce
d(γs, γt)² ≤ (t − s) ∫_s^t |γ̇τ|² dτ ≤ (t − s) ∫_s^t |ξτ(γτ)|² dτ.
So
E d(γs, γt)² ≤ (t − s) ∫_s^t ∫ |ξτ(x)|² dµτ(x) dτ ≤ (t − s)² ‖ξ‖²_{L∞(dt; L²(dµt))}.
In particular
W2 (µs , µt )2 ≤ E d(γs , γt )2 ≤ L2 (t − s)2 ,
where L is an upper bound for the norm of ξ in L∞ (L2 ). This concludes
the proof of Theorem 13.8. ⊓
⊔
Remark 13.9. With hardly any more work, the preceding theorem can
be extended to cover paths that are absolutely continuous of order 2,
in the sense defined on p. 127. Then of course the velocity field will not
live in L∞ (dt; L2 (dµt )), but in L2 (dµt dt).
Observe that in a displacement interpolation, the initial measure µ0
and the initial velocity field ∇ψ0 uniquely determine the final measure
µ1 : this implies that geodesics in P2 (M ) are nonbranching, in the strong
sense that their initial position and velocity determine uniquely their
final position.
Finally, we can now derive an “explicit” formula for the action func-
tional determining displacement interpolations as minimizing curves.
Let µ = (µt )0≤t≤1 be any Lipschitz (or absolutely continuous) path in
P2 (M ); let ξt (x) = ∇ψt (x) be the associated time-dependent velocity
field. By the formula of conservation of mass, µt can be interpreted as
the law of γt , where γ is a random solution of γ̇t = ξt (γt ). Define
A(µ) := inf E ∫_0^1 |ξt(γt)|² dt,   (13.13)
where the infimum is taken over all possible realizations of the random
curves γ. By Fubini’s theorem,
A(µ) = inf E ∫_0^1 |ξt(γt)|² dt = inf E ∫_0^1 |γ̇t|² dt
     ≥ E inf ∫_0^1 |γ̇t|² dt
     = E d(γ0, γ1)²,
and the infimum is achieved if and only if the coupling (γ0 , γ1 ) is mini-
mal, and the curves γ are (almost surely) action-minimizing. This shows
that displacement interpolations are characterized as the minimizing
curves for the action A. Actually A is the same as the action appear-
ing in Theorem 7.21 (iii), the only improvement is that now we have
produced a more explicit form in terms of vector fields.
The expression (13.13) can be made slightly more explicit by noting
that the optimal choice of velocity field is the one provided by Theo-
rem 13.8, which is a gradient, so we may restrict the action functional
to gradient velocity fields:
A(µ) := ∫_0^1 ( ∫ |∇ψt|² dµt ) dt;    ∂t µt + ∇ · (∇ψt µt) = 0.   (13.14)
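On the translation example considered before (µt = (Id + tv)_# µ0, ψt(x) = ⟨v, x⟩ − t|v|²/2) one can check the consistency of these formulas by hand — a simple verification which the reader may skip: ∇ψt = v, so (13.14) gives

A(µ) = ∫_0^1 ∫ |v|² dµt dt = |v|²;

on the other hand each curve γt = γ0 + tv is a minimizing geodesic, so E d(γ0, γ1)² = |v|² = W2(µ0, µ1)², and the infimum in (13.13) is indeed attained.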
Bibliographical notes
The issues discussed in this part are concisely reviewed in the sur-
veys by Cordero-Erausquin [244] and myself [821] (both in French).
Ricci curvature
Riem(X, Y ) := ∇Y ∇X − ∇X ∇Y + ∇[X,Y ] ;
Fig. 14.1. The dashed line gives the recipe for the construction of the Gauss map;
its Jacobian determinant is the Gauss curvature.
∇2 f · v = ∇v (∇f ).
(Recall that ∇v stands for the covariant derivation in the direction v.)
In short, ∇2 f is the covariant gradient of the gradient of f .
A convenient way to compute the Hessian of a function is to differ-
entiate it twice along a geodesic path. Indeed, if (γt )0≤t≤1 is a geodesic
path, then ∇γ̇ γ̇ = 0, so
d²/dt² f(γt) = (d/dt) ⟨∇f(γt), γ̇t⟩ = ⟨∇_γ̇ ∇f(γt), γ̇t⟩ + ⟨∇f(γt), ∇_γ̇ γ̇t⟩ = ⟨∇²f(γt)·γ̇t, γ̇t⟩.

In particular, if γt = expx(tv) is the geodesic starting from x with velocity v, then

f(γt) = f(x) + t ⟨∇f(x), v⟩ + (t²/2) ⟨∇²f(x)·v, v⟩ + o(t²).   (14.3)
This identity can actually be used to define the Hessian operator.
A similar computation shows that for any two tangent vectors u, v
at x,
(D/Ds)(d/dt)|_{s=t=0} f( expx(su + tv) ) = ⟨∇²f(x)·u, v⟩,   (14.4)
where expx v is the value at time 1 of the constant speed geodesic
starting from x with velocity v. Identity (14.4) together with (14.2)
shows that if f ∈ C 2 (M ), then ∇2 f (x) is a symmetric operator:
⟨∇²f(x)·u, v⟩x = ⟨∇²f(x)·v, u⟩x. In that case it will often be conve-
nient to think of ∇2 f (x) as a quadratic form on Tx M .
The Hessian is related to another fundamental second-order dif-
ferential operator, the Laplacian, or Laplace–Beltrami operator. The
Laplacian can be defined as the trace of the Hessian:
∆f = ∇ · (∇f ),
Remark 14.3. As the proof will show, Property (i) can be replaced by
the following more precise statement involving the subdifferential of ψ:
If ξ is any vector field valued in ∇− ψ (i.e. ξ(y) ∈ ∇− ψ(y) for all y),
then ∇v ξ(x) = Av.
Remark 14.4. For the main part of this course, we shall not need
the full strength of Theorem 14.1, but just the particular case when
ψ is continuously differentiable and ∇ψ is Lipschitz; then the proof
becomes much simpler, and ∇ψ is almost everywhere differentiable in
the usual sense. Still, on some occasions we shall need the full generality
of Theorem 14.1.
J(x) := lim_{δ↓0} vol[T(Pδ)] / vol[Pδ].
Fig. 14.2. The orthonormal basis E, here represented by a small cube, goes along
the geodesic by parallel transport.
(All of these quantities depend implicitly on the starting point x.) The
reader who prefers to stay away from the Riemann curvature tensor
can take (14.6) as the equation defining the matrix R; the only things
that one needs to know about it are (a) R(t) is symmetric; (b) the
first row of R(t) vanishes (which is the same, modulo identification, as
R(t) γ̇(t) = 0); (c) tr R(t) = Ricγt (γ̇t , γ̇t ) (which one can also adopt as
a definition of the Ricci tensor); (d) R is invariant under the transform
t → 1 − t, E(t) → −E(1 − t), γt → γ1−t .
Equation (14.6) is of second order in time, so it should come with initial conditions for both J(0) and J̇(0). On the one hand, since T0(y) = y,

Ji(0) = (d/dδ)|_{δ=0} (x + δei) = ei,
Fig. 14.3. At time t = 0, the matrices J(t) and E(t) coincide, but at later times
they (may) differ, due to geodesic distortion.
(d/dt) Jij = (d/dt) ⟨Ji, ej⟩ = ⟨DJi/Dt, ej⟩ = ⟨(∇ξ)ei, ej⟩.

We conclude that the initial conditions are

J(0) = In,    J̇(0) = ∇ξ(x),   (14.8)
all the tangent spaces Tγ(t) M with Rn . This path depends on x via the
initial conditions (14.8), so in the sequel we shall put that dependence
explicitly. It might be very rough as a function of x, but it is very
smooth as a function of t. The Jacobian of the map Tt is defined by J(t, x) := det J(t, x).

(d/dt)(tr U) + tr(U²) + tr R = 0.
Now the trace of R(t, x) only depends on γt and γ̇t ; in fact, as noticed
before, it is precisely the value of the Ricci curvature at γ(t), evaluated
in the direction γ̇(t). So we have arrived at our first important equation
involving Ricci curvature:
(d/dt)(tr U) + tr(U²) + Ric(γ̇) = 0,   (14.12)
where of course Ric(γ̇) is an abbreviation for Ricγ(t) (γ̇(t), γ̇(t)).
Equation (14.12) holds true for any vector field ξ, as long as ξ is
covariantly differentiable at x. But in the sequel, I shall only apply it
in the particular case when ξ derives from a function: ξ = ∇ψ; and ψ is
tr(U²) ≥ (tr U)²/n;
then, by plugging this inequality into (14.12), we obtain an important
differential inequality involving Ricci curvature:
(d/dt)(tr U) + (tr U)²/n + Ric(γ̇) ≤ 0.   (14.13)
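(The trace inequality used here is just the Cauchy–Schwarz inequality in the space of matrices; a two-line check, recorded for completeness: since U is symmetric in the case at hand, writing λ1, ..., λn for its eigenvalues,

(tr U)² = (Σ λi)² ≤ n Σ λi² = n tr(U²),

with equality precisely when all eigenvalues coincide, i.e. when U is a multiple of the identity.)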
There are several ways to rewrite this result in terms of the Jacobian
determinant J (t). For instance, by differentiating the formula
tr U = J̇/J,

one obtains easily

(d/dt)(tr U) + (tr U)²/n = J̈/J − (1 − 1/n)(J̇/J)².
So (14.13) becomes

J̈/J − (1 − 1/n)(J̇/J)² ≤ −Ric(γ̇).   (14.14)

Equivalently, in terms of D(t) := J(t)^{1/n},

D̈/D ≤ −Ric(γ̇)/n.   (14.15)
Yet another useful formula is obtained by introducing ℓ(t) :=
− log J (t), and then (14.13) becomes
ℓ̈(t) ≥ ℓ̇(t)²/n + Ric(γ̇).   (14.16)
In all of these formulas, we have always taken t = 0 as the starting
time, but it is clear that we could do just the same with any starting
time t0 ∈ [0, 1], that is, consider, instead of Tt (x) = exp(t∇ψ(x)), the
map Tt0 →t (x) = exp((t − t0 )∇ψ(x)). Then all the differential inequali-
ties are unchanged; the only difference is that the Jacobian determinant
at time t = 0 is not necessarily 1.
The previous formulas are quite sufficient to derive many useful geo-
metric consequences. However, one can refine them by taking advantage
of the fact that curvature is not felt in the direction of motion. In other
words, if one is traveling along some geodesic γ, one will never be able
to detect some curvature by considering variations (in the initial posi-
tion, or initial velocity) in the direction of γ itself: the path will always
be the same, up to reparametrization. This corresponds to the property
R(t)γ̇(t) = 0, where R(t) is the matrix appearing in (14.6). In short,
curvature is felt only in n − 1 directions out of n. This loose principle
often leads to a refinement of estimates by a factor (n − 1)/n.
Here is a recipe to “separate out” the direction of motion from
the other directions. As before, assume that the first vector of the or-
thonormal basis J(0) is e1 (0) = γ̇(0)/|γ̇(0)|. (The case when γ̇(0) = 0
and, of course,
ℓ̈⊥ ≥ (ℓ̇⊥)²/(n − 1) + Ric(γ̇),   (14.21)
D̈⊥/D⊥ ≤ −Ric(γ̇)/(n − 1).   (14.22)
To summarize: The basic inequalities for ℓ⊥ and ℓ// are the same
as for ℓ, but with the exponent n replaced by n − 1 in the case of ℓ⊥ ,
and 1 in the case of ℓ// ; and the number Ric(γ̇) replaced by 0 in the
case of ℓ// . The same for D⊥ and D// .
ℓ̈ ≥ ℓ̇²/(n − 1) + K,
Bochner’s formula
(the same formula that we had before at time t = 0). Under the identi-
fication of Rn with Tγ(t) M provided by the basis E(t), we can identify
J with the matrix J, and then
U(t, x) = J̇(t, x) J(t, x)⁻¹ = ∇ξ( t, γ(t, x) ),   (14.25)
where again the linear operator ∇ξ is identified with its matrix in the
basis E.
Then tr U (t, x) = tr ∇ξ(t, x) coincides with the divergence of
ξ(t, · ), evaluated at x. By the chain-rule and (14.24),
(d/dt)(tr U)(t, x) = (d/dt)(∇ · ξ)(t, γ(t, x))
    = ∇ · (∂ξ/∂t)(t, γ(t, x)) + γ̇(t, x) · ∇(∇ · ξ)(t, γ(t, x))
    = ( −∇ · (∇_ξ ξ) + ξ · ∇(∇ · ξ) )(t, γ(t, x)).
All functions here are evaluated at (t, γ(t, x)), and of course we can
choose t = 0, and x arbitrary. So (14.26) is an identity that holds true
for any smooth (say C 2 ) vector field ξ on our manifold M . Of course
it can also be established directly by a coordinate computation.1
While formula (14.26) holds true for all vector fields ξ, if ∇ξ is
symmetric then two simplifications arise:
(a) ∇_ξ ξ = ∇ξ · ξ = ∇( |ξ|²/2 );
(b) tr(∇ξ)² = ‖∇ξ‖²_HS, HS standing for Hilbert–Schmidt norm.
So (14.26) becomes

−∆( |ξ|²/2 ) + ξ · ∇(∇ · ξ) + ‖∇ξ‖²_HS + Ric(ξ) = 0.   (14.27)
We shall apply it only in the case when ξ is a gradient: ξ = ∇ψ; then
∇ξ = ∇2 ψ is indeed symmetric, and the resulting formula is
−∆( |∇ψ|²/2 ) + ∇ψ · ∇(∆ψ) + ‖∇²ψ‖²_HS + Ric(∇ψ) = 0.   (14.28)
1
With the notation ∇ξ = ξ·∇ (which is classical in fluid mechanics), and tr (∇ξ)2 =
∇ξ··∇ξ, (14.26) takes the amusing form −∇·ξ·∇ξ+ξ·∇∇·ξ+∇ξ··∇ξ+Ric(ξ) = 0.
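In Euclidean space, identity (14.28) reduces to an elementary computation, which the reader may find reassuring (a quick check in the flat case, Ric ≡ 0): for smooth ψ on Rⁿ,

∆( |∇ψ|²/2 ) = Σ_j ∂_j ( Σ_i ∂_iψ ∂_{ij}ψ ) = Σ_{i,j} (∂_{ij}ψ)² + Σ_i ∂_iψ ∂_i(∆ψ) = ‖∇²ψ‖²_HS + ∇ψ · ∇(∆ψ),

which is exactly (14.28) with Ric(∇ψ) = 0.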
Bochner’s formula 389
Remark 14.5. With the ansatz ξ = ∇ψ, the pressureless Euler equa-
tion (14.24) reduces to the Hamilton–Jacobi equation
∂ψ/∂t + |∇ψ|²/2 = 0.   (14.29)
One can use this equation to obtain (14.28) directly, instead of first
deriving (14.26). Here equation (14.29) is to be understood in a viscosity
sense (otherwise there are many spurious solutions); in fact the reader
might just as well take the identity
ψ(t, x) = inf_{y∈M} [ ψ(y) + d(x, y)²/(2t) ]
Remark 14.6. Here I have not tried to derive Bochner’s formula for
nonsmooth functions. This could be done for semiconvex ψ, with an appropriate “compensated” definition for −∆( |∇ψ|²/2 ) + ∇ψ · ∇(∆ψ). In fact, the semiconvexity of ψ prevents the formation of instantaneous shocks, and will allow the Lagrangian/Eulerian duality for a short time.
Remark 14.7. The operator U (t, x) coincides with ∇2 ψ(t, γ(t, x)),
which is another way to see that it is symmetric for t > 0.
From this point on, we shall only work with (14.28). Of course, by using the Cauchy–Schwarz inequality as before, we can bound ‖∇²ψ‖²_HS below by (∆ψ)²/n; therefore (14.28) implies
2
In (14.26) or (14.28) I have written Bochner’s formula in purely “metric” terms,
which will probably look quite ugly to many geometer readers. An equivalent but
more “topological” way to write Bochner’s formula is
∆ + ∇∇∗ + Ric = 0,
∆( |∇ψ|²/2 ) − ∇ψ · ∇(∆ψ) ≥ (∆ψ)²/n + Ric(∇ψ).   (14.30)
Apart from regularity issues, this inequality is strictly equivalent to
(14.13), and therefore to (14.14) or (14.15).
Not so much has been lost when going from (14.28) to (14.30): there
is still equality in (14.30) at all points x where ∇2 ψ(x) is a multiple of
the identity.
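(Indeed — a two-line check: if ∇²ψ(x) = λ In, then ‖∇²ψ(x)‖²_HS = nλ² and (∆ψ(x))²/n = (nλ)²/n = nλ², so the Cauchy–Schwarz step loses nothing at such a point.)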
One can also take out the direction of motion, ∇̂ψ := ∇ψ/|∇ψ|, from the Bochner identity. The Hamilton–Jacobi equation implies ∂t ∇ψ + ∇²ψ · ∇ψ = 0, so

∂t ⟨∇²ψ · ∇̂ψ, ∇̂ψ⟩ = −⟨∇²( |∇ψ|²/2 ) · ∇̂ψ, ∇̂ψ⟩ − 2 ⟨∇²ψ · (∇²ψ · ∇̂ψ), ∇̂ψ⟩,

and by symmetry the latter term can be rewritten as −2 |(∇²ψ) · ∇̂ψ|². From this one easily obtains the following refinement of Bochner's formula: Define

∆_{//} f = ⟨∇²f · ∇̂ψ, ∇̂ψ⟩,    ∆_⊥ = ∆ − ∆_{//};

then

∆_{//}( |∇ψ|²/2 ) − ∇ψ · ∇∆_{//}ψ + 2 |(∇²ψ) · ∇̂ψ|² ≥ (∆_{//}ψ)²,
∆_⊥( |∇ψ|²/2 ) − ∇ψ · ∇∆_⊥ψ − 2 |(∇²ψ) · ∇̂ψ|² ≥ ‖∇²_⊥ψ‖²_HS + Ric(∇ψ).
(14.31)
This is the “Bochner formula with the direction of motion taken out”.
I have to confess that I never saw these frightening formulas anywhere,
and don’t know whether they have any use. But of course, they are
equivalent to their Lagrangian counterpart, which will play a crucial
role in the sequel.
where V(r) = ∫_0^r S(r′) dr′,
S(r) = c_{n,K} ×   sin^{n−1}( √(K/(n−1)) r )     if K > 0
                   r^{n−1}                        if K = 0
                   sinh^{n−1}( √(|K|/(n−1)) r )   if K < 0.
Here of course S(r) is the surface area of Br (0) in the model space, that
is the (n − 1)-dimensional volume of ∂Br (0), and cn,K is a nonessential
normalizing constant. (See Theorem 18.8 later in this course.)
2. Diameter estimates: The Bonnet–Myers theorem states that,
if K > 0, then M is compact and more precisely
diam(M) ≤ π √( (n−1)/K ),
with equality for the model sphere.
3. Spectral gap inequalities: If K > 0, then the spectral gap λ1 of
the nonnegative operator −∆ is bounded below:
λ1 ≥ nK/(n − 1),
with equality again for the model sphere. (See Theorem 21.20 later in
this course.)
4. (Sharp) Sobolev inequalities: If K > 0 and n ≥ 3, let µ =
vol/vol[M ] be the normalized volume measure on M ; then for any
smooth function on M ,
‖f‖²_{L^{2⋆}(µ)} ≤ ‖f‖²_{L²(µ)} + ( 4/(K n(n − 2)) ) ‖∇f‖²_{L²(µ)},    2⋆ = 2n/(n − 2),
and those constants are sharp for the model sphere.
5. Heat kernel bounds: There are many of them, in particular the
well-known Li–Yau estimates: If K ≥ 0, then the heat kernel pt (x, y)
satisfies
pt(x, y) ≤ ( C / vol[B_{√t}(x)] ) exp( −d(x, y)²/(2Ct) ),
for some constant C which only depends on n. For K < 0, a similar
bound holds true, only now C depends on K and there is an additional
For later purposes it will be useful to keep track of all error terms
in the inequalities. So rewrite (14.12) as
(tr U)· + (tr U)²/n + Ric(γ̇) = −‖ U − (tr U/n) In ‖²_HS.   (14.33)
(log J0)·· + [(log J0)·]²/n + Ric(γ̇)
    = (log J)·· + ⟨∇²V(γ) · γ̇, γ̇⟩ + [ (log J)· + γ̇ · ∇V(γ) ]²/n + Ric(γ̇).
By using the identity
a²/n = (a + b)²/N − b²/(N − n) + [ n/(N(N − n)) ] ( b − ((N − n)/n) a )²,   (14.34)
we see that
[ (log J)· + γ̇ · ∇V(γ) ]² / n
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) (log J)· + (N/n) γ̇ · ∇V(γ) )²
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) (log J0)· + γ̇ · ∇V(γ) )²
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) tr U + γ̇ · ∇V(γ) )².
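(The identity (14.34) itself is elementary and can be checked by brute force; the verification is recorded here because the identity looks mysterious at first sight. Expanding the last square,

[ n/(N(N − n)) ] ( b − ((N − n)/n) a )² = n b²/(N(N − n)) − 2ab/N + (N − n) a²/(Nn),

and adding (a + b)²/N − b²/(N − n): the b² terms and the ab terms cancel, leaving a²/N + (N − n)a²/(Nn) = a²/n, as claimed.)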
To summarize these computations it will be useful to introduce some
more notation: first, as usual, the negative logarithm of the Jacobian
determinant:
ℓ(t, x) := − log J (t, x); (14.35)
and then, the generalized Ricci tensor:
Ric_{N,ν} := Ric + ∇²V − (∇V ⊗ ∇V)/(N − n),   (14.36)

where the tensor product ∇V ⊗ ∇V is a quadratic form on TM, defined by its action on tangent vectors as

(∇V ⊗ ∇V)_x(v) = (∇V(x) · v)²;

so

Ric_{N,ν}(γ̇) = (Ric + ∇²V)(γ̇) − (∇V · γ̇)²/(N − n).
It is implicitly assumed in (14.36) that N ≥ n (otherwise the correct
definition is RicN,ν = −∞); if N = n the convention is 0 × ∞ = 0,
so (14.36) still makes sense if ∇V = 0. Note that Ric∞,ν = Ric + ∇2 V ,
while Ricn,vol = Ric.
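The most basic example, added here to fix ideas: take M = Rⁿ equipped with a reference measure ν = e^{−V} dx. Then Ric = 0, so Ric_{∞,ν} = ∇²V and Ric_{N,ν} = ∇²V − (∇V ⊗ ∇V)/(N − n). In particular, for the standard Gaussian measure one has V(x) = |x|²/2 up to an irrelevant additive constant, so Ric_{∞,ν} = In: the Gaussian space satisfies a curvature bound K = 1 with effective dimension N = ∞. On the other hand, for finite N the quantity Ric_{N,ν}(v) = |v|² − (x·v)²/(N − n) becomes very negative in the radial direction for large |x|, so no uniform lower bound with finite N can hold in this example.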
The conclusion of the preceding computations is that
ℓ̈ = ℓ̇²/N + Ric_{N,ν}(γ̇) + ‖ U − (tr U/n) In ‖²_HS
      + [ n/(N(N − n)) ] ( ((N − n)/n) tr U + γ̇ · ∇V(γ) )².   (14.37)
As corollaries,
ℓ̈⊥ ≥ (ℓ̇⊥)²/(N − 1) + Ric_{N,ν}(γ̇);    −N D̈⊥/D⊥ ≥ Ric_{N,ν}(γ̇).   (14.45)
L = ∆ − ∇V · ∇, (14.46)
Γ (f, g) = ∇f · ∇g.
Γ2(ψ) := Γ2(ψ, ψ) = L( |∇ψ|²/2 ) − ∇ψ · ∇(Lψ).   (14.49)
Then our previous computations can be rewritten as
Γ2(ψ) = (Lψ)²/N + Ric_{N,ν}(∇ψ)
      + ‖ ∇²ψ − (∆ψ/n) In ‖²_HS + [ n/(N(N − n)) ] ( ((N − n)/n) ∆ψ + ∇V · ∇ψ )².
(14.50)

In particular,

Γ2(ψ) ≥ (Lψ)²/N + Ric_{N,ν}(∇ψ).   (14.51)
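To see the Γ2 formalism at work on a concrete example (added here; it plays no role in the sequel), take M = Rⁿ and V(x) = |x|²/2, so that L = ∆ − x·∇ is the Ornstein–Uhlenbeck operator. A direct computation gives, for smooth ψ,

L( |∇ψ|²/2 ) = ‖∇²ψ‖²_HS + ∇ψ · ∇(∆ψ) − x · (∇²ψ · ∇ψ),
∇ψ · ∇(Lψ) = ∇ψ · ∇(∆ψ) − |∇ψ|² − x · (∇²ψ · ∇ψ),

so Γ2(ψ) = ‖∇²ψ‖²_HS + |∇ψ|² ≥ Ric_{∞,ν}(∇ψ), in agreement with (14.51) for N = ∞ and Ric_{∞,ν} = In.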
And as the reader has certainly guessed, one can now take out the
direction of motion (this computation is provided for completeness but
will not be used): As before, define
∇̂ψ = ∇ψ/|∇ψ|,

and next,

L_⊥ f = ∆_⊥ f − ∇V · ∇f,
Γ_{2,⊥}(ψ) = L_⊥( |∇ψ|²/2 ) − ∇ψ · ∇(L_⊥ψ) − 2 |(∇²ψ) · ∇̂ψ|².

Then

Γ_{2,⊥}(ψ) = (L_⊥ψ)²/(N − 1) + Ric_{N,ν}(∇ψ) + ‖ ∇²_⊥ψ − (∆_⊥ψ/(n − 1)) I ‖²_HS
      + [ (n − 1)/((N − 1)(N − n)) ] ( ((N − n)/(n − 1)) ∆_⊥ψ + ∇V · ∇ψ )² + Σ_{j=2}^n (∂_{1j}ψ)².
Curvature-dimension bounds
Remark 14.9. Note carefully that the inequalities (i)–(iv’) are re-
quired to be true always: For instance (ii) should be true for all ψ,
all x and all t ∈ (0, 1). The equivalence is that [(i) true for all x] is
equivalent to [(ii) true for all ψ, all x and all t], etc.
So indeed RicN,ν ≥ K.
The proof goes in the same way for the equivalence between (i) and
(ii’), (iii’), (iv’): again the problem is to understand why (ii’) implies
(i), and the reasoning is almost the same as before; the key point being
that the extra error terms in ∂1j ψ, j 6= 2, all vanish at x0 . ⊓
⊔
There are two classical generalizations. The first one states that the
differential inequality f¨+ K ≤ 0 is equivalent to the integral inequality
f( (1 − λ) t0 + λ t1 ) ≥ (1 − λ) f(t0) + λ f(t1) + (Kλ(1 − λ)/2) (t0 − t1)².
Another one is as follows: The differential inequality
where

τ^{(λ)}(θ) =   sin(λθ√Λ) / sin(θ√Λ)          if Λ > 0
               λ                              if Λ = 0
               sinh(λθ√(−Λ)) / sinh(θ√(−Λ))   if Λ < 0.
A more precise statement, together with a proof, are provided in the
Second Appendix of this chapter.
This leads to the following integral characterization of CD(K, N ):
D(t, x) ≥ τ^{(1−t)}_{K,N} D(0, x) + τ^{(t)}_{K,N} D(1, x)    (N < ∞)   (14.54)

ℓ(t, x) ≤ (1 − t) ℓ(0, x) + t ℓ(1, x) − (Kt(1 − t)/2) d(x, y)²    (N = ∞),   (14.55)
The reasoning is the same for the case N = ∞, starting from in-
equality (ii) in Theorem 14.8. ⊓
⊔
The next result states that the coefficients τ^{(t)}_{K,N} obtained in
Theorem 14.11 can be automatically improved if N is finite and K 6= 0,
by taking out the direction of motion:
D(t, x) ≥ τ^{(1−t)}_{K,N} D(0, x) + τ^{(t)}_{K,N} D(1, x),   (14.56)
where now

τ^{(t)}_{K,N} =   t^{1/N} ( sin(tα)/sin α )^{1−1/N}      if K > 0
                  t                                       if K = 0
                  t^{1/N} ( sinh(tα)/sinh α )^{1−1/N}     if K < 0

and

α = √( |K|/(N − 1) ) d(x, y)    (α ∈ [0, π] if K > 0).
Remark 14.13. When N < ∞ and K > 0, Theorem 14.12 contains the Bonnet–Myers theorem, according to which d(x, y) ≤ π √((N − 1)/K). With Theorem 14.11 the bound was only π √(N/K).
Proof of Theorem 14.12. The proof that (14.56) implies CD(K, N ) is
done in the same way as for (14.54). (In fact (14.56) is stronger
than (14.54).)
As for the other implication: Start from (14.22), and transform it
into an integral bound:
D⊥(t, x) ≥ σ^{(1−t)}_{K,N} D⊥(0, x) + σ^{(t)}_{K,N} D⊥(1, x),

where σ^{(t)}_{K,N} = sin(tα)/sin α if K > 0; t if K = 0; sinh(tα)/sinh α if K < 0. Next transform (14.19) into the integral bound
D// (t, x) ≥ (1 − t) D// (0, x) + t D// (1, x).
Both estimates can be combined thanks to Hölder's inequality:

D(t, x) = D⊥(t, x)^{1−1/N} D_{//}(t, x)^{1/N}
    ≥ ( σ^{(1−t)}_{K,N} D⊥(0, x) + σ^{(t)}_{K,N} D⊥(1, x) )^{1−1/N} ( (1 − t) D_{//}(0, x) + t D_{//}(1, x) )^{1/N}
    ≥ (σ^{(1−t)}_{K,N})^{1−1/N} (1 − t)^{1/N} D(0, x) + (σ^{(t)}_{K,N})^{1−1/N} t^{1/N} D(1, x).
This implies inequality (14.56). ⊓
⊔
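The elementary inequality invoked in the last step is recalled here for the reader's convenience: for nonnegative numbers a, b, c, d and θ ∈ (0, 1),

a^θ c^{1−θ} + b^θ d^{1−θ} ≤ (a + b)^θ (c + d)^{1−θ}.

Indeed, by Young's inequality, a^θ c^{1−θ}/((a+b)^θ(c+d)^{1−θ}) ≤ θ a/(a+b) + (1−θ) c/(c+d), and similarly for the second term; summing the two bounds gives θ + (1−θ) = 1. It is applied above with θ = 1 − 1/N.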
Distortion coefficients
Apart from Definition 14.19, the material in this section is not necessary
to the understanding of the rest of this course. Still, it is interesting
because it will give a new interpretation of Ricci curvature bounds, and
motivate the introduction of distortion coefficients, which will play a
crucial role in the sequel.
Fig. 14.4. The meaning of distortion coefficients: Because of positive curvature
effects, the observer overestimates the surface of the light source; in a negatively
curved world this would be the contrary.
Fig. 14.5. The distortion coefficient is approximately equal to the ratio of the
volume filled with lines, to the volume whose contour is in dashed line. Here the
space is negatively curved and the distortion coefficient is less than 1.
β^{(K,∞)}_t(x, y) = e^{ (K/6)(1−t²) d(x,y)² }.   (14.64)

• For t = 0, define β^{(K,N)}_0(x, y) = 1.
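For comparison, and assuming the usual explicit formula for the finite-dimensional coefficients when K > 0, namely β^{(K,N)}_t(x, y) = ( sin(tα)/(t sin α) )^{N−1} with α = √(K/(N−1)) d(x, y), one can recover (14.64) as a limit: letting N → ∞ with K and d(x, y) fixed, α → 0 and

sin(tα)/(t sin α) = 1 + (1 − t²) α²/6 + O(α⁴),

so ( sin(tα)/(t sin α) )^{N−1} → exp( (K/6)(1 − t²) d(x, y)² ).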
Fig. 14.6. The shape of the curves β^{(K,N)}_t(x, y), for fixed t ∈ (0, 1), as a function of α = √( |K|/(N − 1) ) d(x, y).
the case N < ∞ by passing to the limit, since lim_{N→∞} [N(a^{1/N} − 1)] = log a. So all we have to show is that if n ≤ N < ∞, then

J(t)^{1/N} ≥ (1 − t) β_{1−t}(y, x)^{1/N} J(0)^{1/N} + t β_t(x, y)^{1/N} J(1)^{1/N}.
(Here J1,0 , J0,1 are identified with their expressions J 1,0 , J 0,1 in the
moving basis E.)
As noted after (14.7), the Jacobi equation is invariant under the
change t → 1 − t, E → −E, so J 1,0 becomes J 0,1 when one exchanges
the roles of x and y, and replaces t by 1 − t. In particular, we have the
formula
det J^{1,0}(t) / (1 − t)^n = β_{1−t}(y, x).   (14.66)
As in the beginning of this chapter, the issue is to compute the
determinant at time t of a Jacobi field J(t). Since the Jacobi fields
are solutions of a linear differential equation of the form J¨ + R J = 0,
they form a vector space of dimension 2n, and they are invariant under
right-multiplication by a constant matrix. This implies
and then factor out the positive quantity (det K(t))1/n to get
( det J(t) )^{1/n} ≥ ( det J^{1,0}(t) )^{1/n} + ( det( J^{0,1}(t) J(1) ) )^{1/n}.
‖ϕ‖_{Lip(B(x0,r))} ≤ ( 2 max_{1≤i≤n+1} [ ϕ(xi) − ϕ(x0) ] ) / r.
Let us assume that ε ≤ |v|; then, by (i), for all z ∈ B2ε (x),
∇ϕ(z) = ∇ϕ(x) + A(z − x) + o(|v|).
If y ∈ Bε(x), we can integrate this identity against ζε(y − z) dz (since ζε(y − z) = 0 if |y − z| > ε); taking into account ∫ (z − x) ζε(z − x) dz = 0,
we obtain
(∂i ∂j ϕ) ∗ ζ = ∂i (∂j ϕ ∗ ζ) = ∂i ∂j (ϕ ∗ ζ)
= ∂j ∂i (ϕ ∗ ζ) = ∂j (∂i ϕ ∗ ζ) = (∂j ∂i ϕ) ∗ ζ.
2 Qv0 (t, x) − max Qvi (t, x) ≤ Qv (t, x) ≤ max Qvi (t, x).
So
(1/δⁿ) ‖ µv − ρv(x0) λn ‖_{TV(Bδ(x0))} → 0   as δ → 0.
The goal is to show that qv (x0 ) = ρv (x0 ). Then the proof will be com-
plete, since ρv is a quadratic form in v (indeed, ρv (x0 ) is obtained by
where
τ^{(λ)}(θ) =   sin(λθ√Λ) / sin(θ√Λ)          if 0 < Λ < π²
               λ                              if Λ = 0
               sinh(λθ√(−Λ)) / sinh(θ√(−Λ))   if Λ < 0.

If Λ = π² then f(t) = c sin(πt) for some c ≥ 0; finally if Λ > π² then f = 0.
τ^{(1/2)}(θ) = (1/2) ( 1 + θ²Λ/8 + o(θ³) )
and
( f(t0) + f(t1) )/2 = f( (t0 + t1)/2 ) + ( (t0 − t1)²/8 ) f̈( (t0 + t1)/2 ) + o(|t0 − t1|²).
So, if we fix t ∈ (0, 1) and let t0 , t1 → t in such a way that t = (t0 +t1 )/2,
we get
Let R : t ↦ R(t) be a continuous map defined on [0, 1], valued in the space of n × n symmetric matrices, and let U be the space of functions u : t ↦ u(t) ∈ Rⁿ solving the second-order linear differential equation

ü(t) + R(t) u(t) = 0.
Assume that J10 (t) is invertible for all t ∈ (0, 1]. Then:
(a) S(t) := J10 (t)−1 J01 (t) is symmetric positive for all t ∈ (0, 1], and
it is a decreasing function of t.
(b) There is a unique pair of Jacobi matrices (J 1,0 , J 0,1 ) such that
are symmetric. Moreover, det K(t) > 0 for all t ∈ [0, 1).
Proposition 14.31 (Jacobi matrices with positive determinant).
Let S(t) and K(t) be the matrices defined in Proposition 14.30. Let J
be a Jacobi matrix such that J(0) = In and J̇(0) is symmetric. Then the following properties are equivalent:
(i) J̇(0) + S(1) > 0;
(ii) K(t) J 0,1 (t) J(1) > 0 for all t ∈ (0, 1);
(iii) K(t) J(t) > 0 for all t ∈ [0, 1];
(iv) det J(t) > 0 for all t ∈ [0, 1].
The equivalence remains true if one replaces the strict inequalities in
(i)–(ii) by nonstrict inequalities, and the time-interval [0, 1] in (iii)–(iv)
by [0, 1).
Before proving these propositions, it is interesting to discuss their
geometric interpretation:
• If γ(t) = expx (t∇ψ(x)) is a minimizing geodesic, then γ̇(t) =
∇ψ(t, γ(t)), where ψ solves the Hamilton–Jacobi equation
∂ψ(t, x)/∂t + |∇ψ(t, x)|²/2 = 0,    ψ(0, · ) = ψ;
0 = J01(t) − J10(t) ( H(t)/t ),

where (modulo identification)

H(t) = ∇²_x ( d( · , γ(t))²/2 ).

(The extra t in the denominator comes from time-reparameterization.) So S(t) = J10(t)⁻¹ J01(t) = H(t)/t should be symmetric.
• The assumption J(0) = In,    J̇(0) + S(1) ≥ 0
(d/ds) [ ⟨u(s), v̇(s)⟩ − ⟨u̇(s), v(s)⟩ ] = −⟨u(s), R(s) v(s)⟩ + ⟨R(s) u(s), v(s)⟩ = 0   (14.76)
then the flow (u(s), u̇(s)) 7−→ (u(t), u̇(t)), where u ∈ U, preserves ω.
The subspaces U0 = {u ∈ U; u(0) = 0} and U̇0 = {u ∈ U; u̇(0) = 0} are
Lagrangian: this means that their dimension is half the dimension
So

⟨Ṡ(t)w, w⟩ = −⟨ w, (∂²u/∂s∂t)(0, t) ⟩
    = −⟨ w, (∂/∂s)(∂t u)(0, t) ⟩ = −⟨ u(0), (∂v/∂s)(0) ⟩,
where s 7−→ v(s) = (∂t u)(s, t) and s 7−→ u(s) = u(s, t) are solu-
tions of the Jacobi equation. Moreover, by differentiating the conditions
in (14.77) one obtains
∂u
v(t) + (t) = 0; v(0) = 0. (14.78)
∂s
By (14.76) again,
D ∂v E D ∂u E D ∂v E D ∂u E
− u(0), (0) = − (0), v(0) − u(t), (t) + (t), v(t) .
∂s ∂s ∂s ∂s
The first two terms in the right-hand side vanish because v(0) = 0 and
u(t) = u(t, t) = 0. Combining this with the first identity in (14.78) one
finds in the end
hṠ(t)w, wi = −kv(t)k2 . (14.79)
We already know that v(0) = 0; if in addition v(t) = 0 then 0 =
v(t) = J10 (t)v̇(0), so (by invertibility of J10 (t)) v̇(0) = 0, and v vanishes
Third Appendix: Jacobi fields forever 431
J 1,0 (t) = J01 (t) − J10 (t) J10 (1)−1 J01 (1); J 0,1 (t) = J10 (t) J10 (1)−1 .
(14.80)
˙ 0 −1 1 ˙ 0,1 ˙
Moreover, J (0) = −J1 (1) J0 (1) and J (t) = J1 (1) J1 (1)−1 are
1,0 0 0
is positive symmetric since by (a) the matrix S(t) = J10 (t)−1 J01 (t) is
a strictly decreasing function of t. In particular K(t) is invertible for
t > 0; but since K(0) = In , it follows by continuity that det K(t)
remains positive on [0, 1).
Finally, if J satisfies the assumptions of (c), then S(1) J(0)−1 is
symmetric (because S(1)∗ = S(1)). Then
(K(t) J 0,1 (t) J(1) J(0)−1
= t J10 (t)−1 J10 (t) J10 (1)−1 J01 (1) + J10 (1) J˙(0) J(0)−1
˙ J(0)−1
= t J10 (1)−1 J01 (1) J(0)−1 + J(0)
is also symmetric. ⊓
⊔
As we already noticed, the first matrix is positive for t ∈ (0, 1); and the
second is also positive, by assumption. In particular (ii) holds true.
The implication (ii) ⇒ (iii) is obvious since J(t) = J 1,0 (t) +
J 0,1 (t) J(1)
is the sum of two positive matrices for t ∈ (0, 1). (At t = 0
one sees directly K(0) J(0) = In .)
432 14 Ricci curvature
If (iii) is true then (det K(t)) (det J(t)) > 0 for all t ∈ [0, 1), and we
already know that det K(t) > 0; so det J(t) > 0, which is (iv).
˙
It remains to prove (iv) ⇒ (i). Recall that K(t) J(t) = t [S(t)+ J(0)];
since det K(t) > 0, the assumption (iv) is equivalent to the statement
that the symmetric matrix A(t) = t S(t) + t J(0) ˙ has positive deter-
minant for all t ∈ (0, 1]. The identity t S(t) = K(t) J01 (t) shows that
A(t) approaches In as t → 0; and since none of its eigenvalues vanishes,
A(t) has to remain positive for all t. So S(t) + J(0)˙ is positive for all
t ∈ (0, 1]; but S is a decreasing function of t, so this is equivalent to
˙
S(1) + J(0) > 0, which is condition (i).
The last statement in Proposition 14.31 is obtained by similar ar-
guments and its proof is omitted. ⊓
⊔
Bibliographical notes
the idea that if the Ricci curvature is bounded below by K, and the
dimension is less than N , then volumes along geodesic fields grow no
faster than volumes in model spaces of constant sectional curvature hav-
ing dimension N and Ricci curvature identically equal to K. These com-
putations are usually performed in a smooth setting; their adaptation
to the nonsmooth context of semiconvex functions has been achieved
only recently, first by Cordero-Erausquin, McCann and Schmucken-
schläger [246] (in a form that is somewhat different from the one pre-
sented here) and more recently by various sets of authors [247, 577, 761].
Bochner’s formula appears, e.g., as [394, Proposition 4.15] (for a
vector field ξ = ∇ψ) or as [680, Proposition 3.3 (3)] (for a vector field
ξ such that ∇ξ is symmetric, i.e. the 1-form p → ξ · p is closed). In
both cases, it is derived from properties of the Riemannian curvature
tensor. Another derivation of Bochner’s formula for a gradient vector
field is via the properties of the square distance function d(x0 , x)2 ; this
is quite simple, and not far from the presentation that I have followed,
since d(x0 , x)2 /2 is the solution of the Hamilton–Jacobi equation at
time 1, when the initial datum is 0 at x0 and +∞ everywhere else.
But I thought that the explicit use of the Lagrangian/Eulerian duality
would make Bochner’s formula more intuitive to the readers, especially
those who have some experience of fluid mechanics.
There are several other Bochner formulas in the literature; Chap-
ter 7 of Petersen’s book [680] is entirely devoted to that subject. In
fact “Bochner formula” is a generic name for many identities involving
commutators of second-order differential operators and curvature.
The examples (14.10) are by now standard; they have been discussed
for instance by Bakry and Qian [61], in relation with spectral gap esti-
mates. When the dimension N is an integer, these reference spaces are
obtained by “projection” of the model spaces with constant sectional
curvature.
The practical importance of separating out the direction of motion is
implicit in Cordero-Erausquin, McCann and Schmuckenschläger [246],
but it was Sturm who attracted my attention on this. To implement
this idea in the present chapter, I essentially followed the discussion
in [763, Section 1]. Also the integral bound (14.56) can be found in this
reference.
Many analytic and geometric consequences of Ricci curvature bounds
are discussed in Riemannian geometry textbooks such as the one by
434 14 Ricci curvature
Otto calculus
Both the pressure and the iterated pressure will appear naturally when
one differentiates the energy functional: the pressure for first-order
derivatives, and the iterated pressure for second-order derivatives.
Example 15.1. Let m 6= 1, and
ρm − ρ
U (ρ) = U (m) (ρ) = ;
m−1
then
p(ρ) = ρm , p2 (ρ) = (m − 1) ρm .
There is an important limit case as m → 1:
then
p(ρ) = ρ, p2 (ρ) = 0.
By the way, the linear part −ρ/(m − 1) in U (m) does not contribute
to the pressure, but has the merit of displaying the link between U (m)
and U (1) .
Differential operators will also be useful. Let ∆ be the Laplace oper-
ator on M , then the distortion of the volume element by the function V
leads to a natural second-order operator:
L = ∆ − ∇V · ∇. (15.5)
gradµ H = −∆µ,
which can be identified with the function −∆ρ. Thus the gradient of
Boltzmann’s entropy is the Laplace operator. This short statement is
one of the first striking conclusions of Otto’s formalism.
15 Otto calculus 439
Z
= ∇U ′ (ρ) · ∇ψ ρ e−V
Z
= ∇U ′ (ρ) · ∇ψ dµ.
15 Otto calculus 441
If this should hold true for all ψ, the only possible choice is that
∇U ′ (ρ) = ∇ζ(ρ), at least µ-almost everywhere. In any case ζ := U ′ (ρ)
provides an admissible representation of gradµ Uν . This proves for-
mula (15.8). The other two formulas are obtained by noting that
p′ (ρ) = ρ U ′′ (ρ), and so
therefore
∇ · µ ∇U ′ (ρ) = ∇ · e−V ρ ∇U ′ (ρ) = e−V L p(ρ).
⊓
⊔
For the second order (formula (15.7)), things are more intricate.
The following identity will be helpful: If ξ is a tangent vector at x on
a Riemannian manifold M, and F is a function on M, then
d2
Hessx F (ξ) = 2 F (γ(t)), (15.15)
dt t=0
From the proof of the gradient formula, one has, with the notation
µt = ρt ν,
Z
dUν (µt )
= ∇ψt · ∇U ′ (ρt ) ρt dν
dt
ZM
= ∇ψt · ∇p(ρt ) dν
MZ
Open Problem 15.11. Find a nice formula for the Hessian of the
functional F appearing in (15.21).
Open Problem 15.12. Find a nice formalism playing the role of the
Otto calculus in the space Pp (M ), for p 6= 2. More generally, are there
nice formal rules for taking derivatives along displacement interpola-
tion, for general Lagrangian cost functions?
444 15 Otto calculus
Bibliographical notes
Displacement convexity I
∇2 F ≥ 0 (16.4)
0 t 1 s
Proof of Proposition 16.2. The arguments in this proof will come again
several times in the sequel, in various contexts.
Assume that (i) holds true. Consider x0 and x1 in M , and introduce
a constant-speed minimizing geodesic γ joining γ0 = x0 to γ1 = x1 .
Then
d2
2
F (γ t ) = ∇ F (γt ) · γ̇t , γ˙t ≥ Λ(γt , γ̇t ).
dt2
Then Property (ii) follows from identity (16.5) with ϕ(t) := F (γt ).
As for Property (iii), it can be established either by dividing the
inequality in (ii) by t > 0, and then letting t → 0, or directly from (i)
by using the Taylor formula at order 2 with ϕ(t) = F (γt ) again. Indeed,
ϕ̇(0) = h∇F (γ0 ), γ̇0 i, while ϕ̈(t) ≥ Λ(γt , γ̇t ).
To go from (iii) to (iv), replace the geodesic γt by the geodesic γ1−t ,
to get
Z 1
F (γ0 ) ≥ F (γ1 ) − ∇F (γ1 ), γ̇1 + Λ(γ1−t , γ̇1−t ) (1 − t) dt.
0
Displacement convexity
e
Tt (x) = expx (t ∇ψ(x)), (16.11)
d
v t, Tt (x) = Tt (x),
dt
and one also has
e t (Tt (x)),
v t, Tt (x) = ∇ψ
where ψt is a solution at time t of the quadratic Hamilton–Jacobi equa-
tion with initial datum ψ0 = ψ.
The next definition adapts the notions of convexity, λ-convexity
and Λ-convexity to the setting of optimal transport. Below λ is a real
number that might nonnegative or nonpositive, while Λ = Λ(µ, v) de-
fines for each probability measure µ a quadratic form on vector fields
v : M → TM .
λ t(1 − t)
∀t ∈ [0, 1] F (µt ) ≤ (1−t) F (µ0 )+t F (µ1 )− W2 (µ0 , µ1 )2 ;
2
• Λ-displacement convex, if, whenever (µt )0≤t≤1 is a (constant-
speed, minimizing) geodesic in P2ac (M ), and (ψt )0≤t≤1 is an associ-
ated solution of the Hamilton–Jacobi equation,
Z 1
∀t ∈ [0, 1] F (µt ) ≤ (1−t) F (µ0 )+t F (µ1 )− e s ) G(s, t) ds,
Λ(µs , ∇ψ
0
µ̇ + ∇ · (∇ψ µ) = 0.
These functions will come back again and again in the sequel, and the
associated functionals will be denoted by HN,ν .
If inequality (16.16) holds true, then
Hessµ Uν ≥ KΛU ,
where Z
ΛU (µ, µ̇) = |∇ψ|2 p(ρ) dν. (16.18)
M
So the conclusion is as follows:
Comparing the latter expression with (16.19) shows that RicN,ν (v0 ) ≥
K|v0 |2 . Since x0 and v0 were arbitrary, this implies RicN,ν ≥ K. Note
that this reasoning only used the functional HN,ν = (UN )ν , and prob-
ability measures µ that are very concentrated around a given point.
This heuristic discussion is summarized in the following:
(i) Ric ≥ 0;
(ii) If the nonlinearity U is such that the nonnegative iterated pres-
sure p2 is nonnegative, then the functional Uvol is displacement convex;
t=0 t=1
Fig. 16.2. The lazy gas experiment: To go from state 0 to state 1, the lazy gas
uses a path of least action. In a nonnegatively curved world, the trajectories of the
particles first diverge, then converge, so that at intermediate times the gas can afford
to have a lower density (higher entropy).
Bibliographical notes
Displacement convexity II
In Chapter 16, a conjecture was formulated about the links between dis-
placement convexity and curvature-dimension bounds; its plausibility
was justified by some formal computations based on Otto’s calculus.
In the present chapter I shall provide a rigorous justification of this
conjecture. For this I shall use a Lagrangian point of view, in contrast
with the Eulerian approach used in the previous chapter. Not only is
the Lagrangian formalism easier to justify, but it will also lead to new
curvature-dimension criteria based on so-called “distorted displacement
convexity”.
The main results in this chapter are Theorems 17.15 and 17.37.
p(r)
(ii) is a nondecreasing function of r;
r 1−1/N
( )
δN U (δ−N ) (δ > 0) if N < ∞
(iii) u(δ) := δ −δ
is a convex
e U (e ) (δ ∈ R) if N = ∞
function of δ.
Remark 17.3. It is clear (from condition (i) for instance) that DCN ′ ⊂
DCN if N ′ ≥ N . So the smallest class of all is DC∞ , while DC1 is the
largest (actually, conditions (i)–(iii) are void for N = 1).
∀r ≥ 0, U (r) ≥ a r log r + b r.
(iii) Let N ∈ [1, ∞] and let U ∈ DCN . If r0 ∈ (0, +∞) is such that
p(r0 ) > 0, then there is a constant K > 0 such that p′ (r) ≥ Kr −1/N
for all r ≥ r0 . If on the contrary p(r0 ) = 0, then U is linear on [0, r0 ].
In particular, the set {r; U ′′ (r) = 0} is either empty, or an interval of
the form [0, r0 ].
(iv) Let N ∈ [1, ∞] and let U ∈ DCN . Then U is the pointwise
nondecreasing limit of a sequence of functions (Uℓ )ℓ∈N in DCN , such
that (a) Uℓ coincides with U on [0, rℓ ], where rℓ is arbitrarily large;
1
(b) for each ℓ there are a ≥ 0 and b ∈ R such that Uℓ (r) = −a r 1− N +b r
(or a r log r + b r if N = ∞) for r large enough; (c) Uℓ′ (∞) → U ′ (∞)
as ℓ → ∞.
(v) Let N ∈ [1, ∞] and let U ∈ DCN . Then U is the pointwise
nonincreasing limit of a sequence of functions (Uℓ )ℓ∈N in DCN , such
466 17 Displacement convexity II
U (r) Uℓ (r)
u(δ) = δN Ψ (δ−N ).
r − N − 1 ′′ − 1
1
′ −1
p′ (r) = r U ′′ (r) = r N u (r N ) − (N − 1) u (r N )
N2
1
−N
(N − 1) u′ (r0 ) 1
≥ − r− N .
N2
−1/N
If on the other hand u′ (r0 ) = 0, then necessarily u′ (r −1/N ) = 0 for
−1/N
all r ≤ r0 , which means that u is constant on [r0 , +∞), so U is
linear on [0, r0 ].
The reasoning is the same in the case N = ∞, with the help of the
formulas
1 1 ′′ 1 1
p(r) = −r u′ log , U ′′ (r) = u log − u′ log
r r r r
and 1
r ≥ r0 =⇒ p′ (r) = r U ′′ (r) ≥ −u′ log .
r0
Displacement convexity classes 469
Since u′ is nondecreasing and u′ (aℓ+1 ) < u′ (aℓ /2), the curve Tℓ lies
strictly below the curve Tℓ+1 on [0,R aℓaℓ /2], ′′and therefore on [0, aℓ+1 ]. By
choosing χℓ in such a way that aℓ /2 χℓ u is very small, we can make
sure that uℓ is very close to the line Tℓ (s) on [0, aℓ ]; and in particular
that the whole curve uℓ is bounded above by Tℓ+1 on [aℓ+1 , aℓ ]. This
will ensure that uℓ is a nondecreasing function of ℓ.
To recapitulate: uℓ ≤ u; uℓ+1 ≤ uℓ ; uℓ = u on [aℓ , +∞); 0 ≤ u′′ℓ ≤ u′′ ;
0 ≥ u′ℓ ≥ u′ ; uℓ is affine on [0, aℓ /2].
Now let
Uℓ (r) = r uℓ (r −1/N ).
By direct computation,
r −1− N − 1 ′′ − 1
1
′ −1
Uℓ′′ (r) = r N u (r N ) − (N − 1) u (r N ) .
ℓ ℓ (17.4)
N2
Since uℓ is convex nonincreasing, the above expression is nonnegative;
so Uℓ is convex, and by construction, it lies in DCN . Moreover Uℓ sat-
isfies the first requirement in (vi), since Uℓ′′ (r) is bounded above by
(r −1−1/N /N 2 ) (r −1/N u′′ (r −1/N ) − (N − 1) u′ (r −1/N )) = U ′′ (r).
470 17 Displacement convexity II
r −1− N − 1
1
−1 1
Uℓ′′ (r) = r ℓ
′
N χ (r N ) r N − (N − 1) u (r)
ℓ
N2
1
−1− N
r
≤ 2
1 + (N − 1)a ;
N
′′ N −1 1
U (r) = 2
a r −1− N .
N
So C = 1 + 1/((N − 1)a) is admissible.
Case 3: u is not affine at infinity and u′ (+∞) = 0. The proof is
based again on the same principle, but modified as follows:
• on [0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), u′′ℓ (s) = ζℓ (s) u′′ (s), where ζℓ is a smooth function
identically equal to 1 close to aℓ , identically equal to 0 at infinity,
with values in [0, 2].
Choose aℓ < bℓ < cℓ , and ζℓ supported in [aℓ , cℓ ], so that 1 ≤ ζℓ ≤ 2
on [aℓ , bℓ ], 0 ≤ ζℓ ≤ 2 on [bℓ , cℓ ], and
Z bℓ Z cℓ
′′ ′ ′
ζℓ (s) u (s) ds > u (bℓ ) − u (aℓ ); ζℓ (s) u′′ (s) ds = −u′ (aℓ ).
aℓ aℓ
R +∞
This is possible since u′ and u′′ are continuous and aℓ (2u′′ (s)) ds =
2(u′ (+∞) − u′ (aℓ )) > u′ (+∞) − u′ (aℓ ) > 0 (otherwise u would be affine
on [aℓ , +∞)). Then choose aℓ+1 > cℓ + 1.
The resulting function uℓ is convex and it satisfies u′ℓ (+∞) =
u (aℓ ) − u′ (aℓ ) = 0, so u′ℓ ≤ 0 and uℓ is constant at infinity.
′
true on [bℓ , cℓ ]. Then these inequalities will also hold true on [cℓ , +∞)
since uℓ is constant there, and u is nonincreasing.
Define Uℓ (r) = r uℓ (r −1/N ). The same reasoning as in the previous
case shows that Uℓ lies in DCN , Uℓ ≥ U , Uℓ is linear on [0, c−N ℓ ], Uℓ
converges monotonically to U as ℓ → ∞, and Uℓ′ (0) = uℓ (+∞) con-
verges to u(+∞) = U ′ (0). The sequence (Uℓ ) satisfies all the desired
properties; in particular the inequalities u′′ℓ ≤ 2u′′ and u′ℓ ≥ u′ ensure
that Uℓ′′ ≤ 2U ′′ .
Case 4: u is not affine at infinity and u′ (+∞) < 0. In this case the
proof is based on the same principle, and uℓ is defined as follows:
• on [0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), u′′ℓ (s) = ηℓ (s) u′′ (s)+ χℓ (s)/s, where χℓ and ηℓ are both
valued in [0, 1], χℓ is a smooth cutoff function with compact support
in (aℓ , +∞), and ηℓ is a smooth function identically equal to 1 close
to aℓ , and identically equal to 0 close to infinity.
To construct these functions, first choose bℓ > aℓ and χℓ supported
in [aℓ , bℓ ] in such a way that
Z bℓ
χℓ (s) u′ (bℓ ) + u′ (+∞)
ds = − .
aℓ s 2
R +∞
This is always possible since aℓ ds/s = +∞, u′ is continuous and
−(u′ (bℓ )+u′ (+∞))/2 approaches the finite limit −u′ (+∞) as bℓ → +∞.
Then choose cℓ > bℓ , and ηℓ supported in [aℓ , cℓ ] such that ηℓ = 1
on [aℓ , bℓ ] and Z cℓ
u′ (+∞) − u′ (bℓ )
ηℓ u′′ = .
bℓ 2
R +∞
This is always possible since bℓ u′′ (s) ds = u′ (+∞) − u′ (bℓ ) >
[u′ (+∞) − u′ (bℓ )]/2 > 0 (otherwise u would be affine on [bℓ , +∞)).
Finally choose aℓ+1 ≥ cℓ + 1.
The function uℓ so constructed is convex, affine at infinity, and
Z bℓ Z bℓ Z cℓ
χℓ (s)
u′ℓ (+∞) = u′ (aℓ ) + u′′ + ds + ηℓ u′′ = 0.
aℓ aℓ s bℓ
On [bℓ , +∞), one has u′ℓ ≥ u′ℓ (bℓ ) = (u′ (bℓ ) − u′ (+∞))/2 ≥ u′ (+∞)
if u′ (bℓ ) ≥ 3u′ (+∞). We can always ensure that this inequality holds
true by choosing a1 large enough thatRu′ (a1 ) ≥ 3u′ (+∞). Then uℓ (s) ≥
s
uℓ (bℓ ) + u′ (+∞) (s − bℓ ) ≥ uℓ (bℓ ) + bℓ u′ = u(s); so uℓ ≥ u also on
[bℓ , +∞).
Define Uℓ (r) = r uℓ (r −1/N ). All the desired properties of Uℓ can
be shown just as before, except for the bound on Uℓ′′ , which we shall
now check. On [a−N ′′ ′′ −N
ℓ , +∞), Uℓ = U . On [0, aℓ ), with the notation
a = −u′ (+∞), we have u′ℓ (r −1/N ) ≥ −a, u′ (r −1/N ) ≥ −3a (recall that
we imposed u′ (a1 ) ≥ −3a), so
r −1− N − 1 ′′ − 1
1
r −1− N − 1 ′′ − 1
1
′′
U (r) ≥ r N u (r N ) + (N − 1)a ,
N2
and once again Uℓ′′ ≤ CU ′′ with C = 3 + 1/((N − 1)a).
It remains to prove the second part of (vi). This will be done by
a further approximation scheme. So let U ∈ DCN be linear close to
the origin. (We can always reduce to this case by (v).) If U is linear
on the whole of R+ , there is nothing to do. Otherwise, by (iii), the set
where U ′′ vanishes is an interval [0, r0 ]. The goal is to show that we may
approximate U by Uℓ in such a way that Uℓ ∈ DCN , Uℓ is nonincreasing
in ℓ, Uℓ is linear on some interval [0, r0 (ℓ)], and Uℓ′′ increases nicely
from 0 on [r0 (ℓ), r1 (ℓ)).
In this case, u is a nonincreasing function, identically equal to a
−1/N
constant on [s0 , +∞), with s0 = r0 ; and also u′ is nonincreasing
to 0, so in fact u is strictly decreasing up to s0 . Let a1 ∈ (s0 /2, s0 ). We
can recursively define real numbers aℓ and C 2 functions uℓ as follows:
• on (0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), (uℓ )′′ = χℓ u′′ + ηℓ (−u′ ), where χℓ and ηℓ are smooth
functions valued in [0, 2], χℓ (r) is identically equal to 1 for r close
to aℓ , and identically equal to 0 for r ≥ bℓ ; and ηℓ is compactly
supported in [bℓ , cℓ ] and decreasing to 0 close to cℓ ; aℓ < bℓ < cℓ < s0 .
Let us choose χℓ , ηℓ , bℓ , cℓ in such a way that
474 17 Displacement convexity II
Z bℓ Z bℓ Z bℓ Z cℓ
χℓ u′′ > u′′ ; χℓ u′′ + ηℓ (−u′ ) = −u′ℓ (aℓ );
aℓ aℓ aℓ bℓ
Z cℓ
ηℓ (−u′ ) > 0.
bℓ
Rs
This is possible since u′ , u′′ are continuous, aℓ0 (2u′′ ) = −2u′ℓ (aℓ ) >
−u′ℓ (aℓ ), and (−u′ ) is strictly positive on [aℓ , s0 ]. It is clear that uℓ ≥ u
and u′ℓ ≥ u′ on [aℓ , bℓ ], with strict inequalities at bℓ ; by choosing cℓ very
close to bℓ , we can make sure that these inequalities are preserved on
[bℓ , cℓ ]. Then we choose aℓ+1 = (cℓ + s0 )/2.
Let us check that Uℓ (r) := r uℓ (r −1/N ) satisfies all the required
properties. To bound Uℓ′′ , note that for r ∈ [s−N 0 , (s0 /2)
−N ],
Uℓ′′ (r) ≤ C(N, r0 ) u′′ℓ (r −1/N ) − u′ℓ (r −1/N )
≤ 2C(N, r0 ) u′′ (r −1/N ) − u′ (r −1/N )
and
U ′′ (r) ≥ K(N, r0 ) u′′ (r −1/N ) − u′ (r −1/N ) ,
where C(N, r0 ), K(N, r0 ) are positive constants. Finally, on [bℓ , cℓ ],
u′′ℓ = ηℓ (−u′ ) is decreasing close to cℓ (indeed, ηℓ is decreasing close to
cℓ , and −u′ is positive nonincreasing); and of course −u′ℓ is decreasing
as well. So u′′ℓ (r −1/N ) and −u′ℓ (r −1/N ) are increasing functions of r in
a small interval [r0 , r1 ]. This concludes the argument.
To prove (vi), we may first approximate u by a C ∞ convex, non-
increasing function uℓ , in such a way that ku − uℓ kC 2 ((a,b)) → 0 for
any a, b > 0. This can be done in such a way that uℓ (s) is nonde-
creasing for small s and nonincreasing for large s; and u′ℓ (0) → u′ (0),
u′ℓ (+∞) → u′ (+∞). The conclusion follows easily since p(r)/r 1−1/N is
nondecreasing and equal to −(1/N )u′ (r −1/N ) (−u′ (log 1/r) in the case
N = ∞). ⊓
⊔
with nonnegative RRicci curvature. (Indeed, vol[Br (x0 )] = O(r N ) for any
fixed x0 ∈ M , so dν(x)/[1 + d(x0 , x)]p(N −1) < +∞ if p(N − 1) > N .)
Proof of Theorem 17.8. The problem is to show that under the assump-
tions of the theorem,
R U (ρ) is bounded below by a ν-integrable function;
then Uν (µ) = U (ρ) dν will be well-defined in R ∪ {+∞}.
Suppose first that N < ∞. By convexity of u, there is a constant
A > 0 so that δN U (δ−N ) ≥ −Aδ − A, which means
1
U (ρ) ≥ −A ρ + ρ1− N . (17.6)
Domain of the functionals Uν 477
I shall often write Hν instead of H∞,ν ; and I may even write just H if
the reference measure is the volume measure. This notation
R is justified
by analogy with Boltzmann’s H functional: H(ρ) = ρ log ρ dvol.
For each U ∈ DCN , formula (16.18) defines a functional ΛU which
will later play a role in displacement convexity estimates. It will be
convenient to compare this quantity with ΛN := ΛUN ; explicitly,
Z
1
ΛN (µ, v) = |v(x)|2 ρ1− N (x) dν(x), µ = ρ ν. (17.9)
M
Remark 17.16. Statement (ii) means, explicitly, that for any displace-
ment interpolation (µt )0≤t≤1 in Ppac (M ), and for any t ∈ [0, 1],
Z 1Z 1
Uν (µt ) + KN,U e s (x)|2 dν(x) G(s, t) ds
ρs (x)1− N |∇ψ
0 M
≤ (1 − t) Uν (µ0 ) + t Uν (µ1 ), (17.11)
0,s
where ρt is the density of µt , ψs = H+ ψ (Hamilton–Jacobi semigroup),
2 e
ψ is d /2-convex, exp(∇ψ) is the Monge transport µ0 → µ1 , and KN,U
is defined by (17.10).
Core of the proof of Theorem 17.15. Before giving a complete proof, for
pedagogical reasons I shall give the main argument behind the impli-
cation (i) ⇒ (ii) in Theorem 17.15, in the simple case K = 0.
Let (µt )0≤t≤1 be a Wasserstein geodesic, with µt absolutely contin-
uous, and let ρt be the density of µt with respect to ν. By change of
variables, Z Z
ρ0
U (ρt ) dν = U Jt dν,
Jt
where Jt is the Jacobian of the optimal transport taking µ0 to µt . The
next step consists in rewriting this as a function of the mean distortion.
Let u(δ) = δN U (δ−N ), then
1
Z Z
ρ0 Jt JN
U ρ0 dν = u t1 ρ0 dν.
J t ρ0 ρN
0
Displacement convexity from curvature bounds, revisited 481
Note that the last integral is finite since |∇ψs (x)|2 is almost surely
bounded by D2 , where D is the maximum distance between elements
R 1− 1
of Spt(µ0 ) and elements of Spt(µ1 ); and that ρs N dν ≤ ν[Spt µs ]1/N
by Jensen’s inequality.
If either Uν (µ0 ) = +∞ or Uν (µ1 ) = +∞, then there is nothing to
prove; so let us assume that these quantities are finite.
Let t0 be a fixed time in (0, 1); on Tt0 (M ), define, for all t ∈ [0, 1],
Tt0 →t expx (t0 ∇ψ(x)) = expx (t∇ψ(x)).
Upon integration against µt0 and use of Fubini’s theorem, this inequal-
ity becomes
This concludes the proof of Property (ii) when µ0 and µ1 have com-
pact support. In a second step I shall relax this compactness as-
sumption by a restriction argument. Let p ∈ [2, +∞) ∪ {c} satisfy the
assumptions of Theorem 17.8, and let µ0 , µ1 be two probability mea-
sures in Ppac (M ). Let (Zℓ )ℓ∈N , (µt,ℓ )0≤t≤1, ℓ∈N (ψt,ℓ )0≤t≤1, ℓ∈N be as in
Proposition 13.2. Let ρt,ℓ stand for the density of µt,ℓ . By Remark 17.4,
the function Uℓ : r → U (Zℓ r) belongs to DCN ; and it is easy to check
1
1−
that KN,Uℓ = Zℓ N KN,U . Since the measures µt,ℓ are compactly sup-
ported, we can apply the previous inequality with µt replaced by µt,ℓ
and U replaced by Uℓ :
Z Z Z
U (Zℓ ρt,ℓ ) dν ≤ (1 − t) U (Zℓ ρ0,ℓ ) dν + t U (Zℓ ρ1,ℓ ) dν
1
Z 1Z
1− N 1
− Zℓ KN,U ρs,ℓ (y)1− N |∇ψs,ℓ (y)|2 dν(y) G(s, t) ds. (17.16)
0 M
On the other hand, the proof of Theorem 17.8 shows that U− (r) ≤
1
A(r + r 1− N ) for some constant A = A(N, U ); so
1
1− N 1− 1 1− 1
U− (Zℓ ρt,ℓ ) ≤ A Zℓ ρt,ℓ + Zℓ ρt,ℓ N ≤ A ρt + ρt N . (17.17)
By the proof of Theorem 17.8 and Remark 17.12, the function on the
right-hand side of (17.17) is ν-integrable, and then we may pass to the
limit by dominated convergence. To summarize:
Z Z
U+ (Zℓ ρt,ℓ ) dν −−−→ U+ (ρt ) dν by monotone convergence;
ℓ→∞
Z Z
U− (Zℓ ρt,ℓ ) dν −−−→ U− (ρt ) dν by dominated convergence.
ℓ→∞
So we can pass to the limit in the first three terms appearing in
the inequality (17.16). As for the last term, note that |∇ψs,ℓ (y)|2 =
d(y, Ts→1,ℓ (y))2 /(1 − s)2 , at least µs,ℓ (dy)-almost surely; but then ac-
cording to Proposition 13.2 this coincides with d(y, Ts→1 (y))2 /(1−s)2 =
e s (y)|2 . So the last term in (17.16) can be rewritten as
|∇ψ
Z 1Z
1
KN,U e s (y)|2 dν(y) G(s, t) ds,
(Zℓ ρs,ℓ (y))1− N |∇ψ
0 M
µ0 = ρ0 ν; µt = exp(t∇ψ)# µ0 .
δ̈(t, x) 1 2
= − θ RicN,ν (v0 ) + O(θ 3 ) . (17.20)
δ(t, x) N
By repeating the proof of (i) ⇒ (ii) with U = UN and using (17.20),
one obtains
486 17 Displacement convexity II
Proof of Proposition 17.24. First, |∇ψ e t (x)| = d(x, Tt→1 (x))/(1 − t),
R
where Tt→1 is the optimal transport µt → µ1 . So ρt |∇ψ e t |2 dν =
2 2 2
W2 (µt , µ1 ) /(1 − t) = W2 (µ0 , µ1 ) . This proves (i).
To prove (ii), I shall start from
Z
1
(1 − t) ρt (x)1− N |∇ψe t (x)|2 ν(dx)
Z
1 1
= ρt (x)1− N d(x, Tt→1 (x))2 ν(dx).
1−t
Let us first see how to bound the integral in the right-hand side, with-
out worrying about the factor (1 − t)−1 in front. This can be done
with the help of Jensen’s inequality, in the same spirit as the proof of
Theorem 17.8: If r ≥ 0 is to be chosen later, then
Z
1
ρt (x)1− N d(x, Tt→1 (x))2 ν(dx)
Z N−1
r
N
2(N−1 ) N
≤ ρt (x) 1 + d(z, x) d(x, Tt→1 (x)) ν(dx)
Z !1
N
ν(dx)
× N −1
1 + d(z, x)r
Z N−1
r
2( N ) N
≤C ρt (x) 1 + d(z, x) d(z, x) + d(z, Tt→1 (x) N−1
ν(dx)
Displacement convexity from curvature bounds, revisited 489
Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
≤C ρt (x) 1 + d(z, x) + d(z, Tt→1 (x)) ν(dx)
Z Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
= C 1 + ρt (x) d(z, x) + ρ1 (y) d(z, y) ν(dy) ,
where !1
Z N
ν(dx)
C = C(r, N, ν) = C(r, N ) N −1 ,
1 + d(z, x)r
and C(r, N ) stands for some constant depending only on r and N . By
Remark 17.12, the previous expression is bounded by
Z Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
C(r, N, ν) 1 + d(z, x) µ0 (dx) + d(z, x) µ1 (dx) ;
In Theorem 17.15, all the R 1 influence of the Ricci curvature bounds lies
in the additional term 0 (. . .) G(s, t) ds. As a consequence, as soon as
K 6= 0 and N < ∞, the formulation involves not only µt , µ0 and µ1 , but
the whole geodesic path (µs )0≤s≤1 . This makes the exploitation of the
resulting inequality (in geometric applications, for instance) somewhat
delicate, if not impossible.
I shall now present a different formulation, expressed only in terms of
µt , µ0 and µ1 . As a price to pay, the functionals Uν (µ0 ) and Uν (µ1 ) will
be replaced by more complicated expressions in which extra distortion
coefficients will appear. From the technical point of view, this new
formulation relies on the principle that one can “take the direction of
motion out”, in all reformulations of Ricci curvature bounds that were
examined in Chapter 14.
We already know by the proof of Theorem 17.8 that (ρ log ρ)− and
ρ lie in L1 (ν), or equivalently in L1 (π(dy|x) ν(dx)). To check the in-
tegrability of the negative part of −a ρ log β(x, y), it suffices to note
that
Z Z
ρ(x) (log β(x, y))+ π(dy|x) ν(dx) ≤ (log β(x, y))+ π(dy|x) µ(dx)
Z
= (log β(x, y))+ π(dx dy),
(K,N )
Remark 17.31. The limit in (17.31) is well-defined; indeed, βt is
increasing as N decreases, and U (r)/r is nondecreasing as a function
(K,N ) (K,N )
of r; so U (ρ(x)/βt (x, y)) βt (x, y) is a nonincreasing function
of N and the limit in (17.31) is monotone. The monotone convergence
theorem guarantees that this definition coincides with the original defi-
nition (17.27) when it applies, i.e. when the integrand is bounded below
by a π(dy|x) ν(dx)-integrable function.
Remark
q 17.33. If diam (M ) = DK,N then actually M is the sphere
S N ( NK−1 ) and ν = vol; but we don’t need this information. (The as-
sumption of M being complete without boundary is important for this
statement to be true, otherwise the one-dimensional reference spaces
of Examples 14.10 provide a counterexample.) See the end of the bib-
liographical notes for more explanations. In the case N = 1, if M is
distinct from a point then it is one-dimensional, so it is either the real
line or a circle.
494 17 Displacement convexity II
Before explaining the proof of this theorem, let me state two very
natural open problems (I have no idea how difficult they are).
Open Problem 17.39. Theorem 17.15 and 17.37 yield two different
upper bounds for Uν (µt ): on the one hand,
Can one compare those two bounds, and if yes, which one is sharpest?
Proof of Theorem 17.37. The proof shares many common points with
the proof of Theorem 17.15. I shall restrict to the case N < ∞, since
the case N = ∞ is similar.
Let us start with the implication (i) ⇒ (ii). In a first step, I shall
assume that µ0pand µ1 are compactly supported, and (if K > 0)
diam (M ) < π (N − 1)/K. With the same notation as in the be-
ginning of the proof of Theorem 17.15,
Z Z
U (ρt (x)) dν(x) = u(δt0 (t, x)) dµt0 (x).
M M
496 17 Displacement convexity II
Theorem 17.28 (together with Application 17.29) shows that the latter
quantity is always integrable. As a conclusion,
Z
Zℓ ρ0,ℓ (x0 )
U+ β(x0 , T (x0 )) ν(dx0 )
β(x0 , T (x0 ))
Z
ρ0 (x0 )
−−−→ U+ β(x0 , T (x0 )) ν(dx0 )
ℓ→∞ β(x0 , T (x0 ))
Let us see how this expression behaves in the limit θ → 0; for instance
I shall focus on the first term in (17.39). From the definitions,
(K,N)
Z
β1−t 1 (K,N ) 1
HN,π,ν (µ0 ) − HN,ν (µ0 ) = N ρ0 (x)1− N 1−β1−t (x, T (x)) N dν(x),
(17.40)
where T = exp(∇ψ) is the optimal transport from µ0 to µ1 . A standard
Taylor expansion shows that
Bibliographical notes
proved that Ric ≥ 0 is equivalent to the property that the heat equa-
tion is a contraction in Wp distance, where p is fixed in [1, ∞). Also,
Sturm [761] showed that a Riemannian manifold (equipped with the
volume measure) satisfies CD(0, N ) if and only if the nonlinear equa-
tion ∂t ρ = ∆ρm is a contraction for m ≥ 1 − 1/N . (There is a more
complicated criterion for CD(K, N ).) As will be explained in Chap-
ter 24, these results are natural in view of the gradient flow structure
of these diffusion equations.
Even if one sticks to displacement convexity, there are possible vari-
ants in which one allows the functional to explicitly depend on the in-
terpolation time. Lott [576] showed that a measured Riemannian man-
ifold (M, ν) satisfies CD(0, N ) if and only if t 7−→ t Hν (µt ) + N t log t
is a convex function of t ∈ [0, 1] along any displacement interpolation.
There is also a more general version of this statement for CD(K, N )
bounds.
Now come some more technical details. The use of Theorem 17.8
to control noncompactly supported probability densities is essentially
taken from [577]; the only change with respect to that reference is that
I do not try to define Uν on the whole of P2ac , and therefore do not
require p to be equal to 2.
In this chapter I used restriction arguments to remove the compact-
ness assumption. An alternative strategy consists in using a density
argument and stability theorems (as in [577, Appendix E]); these tools
will be used later in Chapters 23 and 30. If the manifold has nonnegative
sectional curvature, it is also possible to directly apply the argument
of change of variables to the family (µt ), even if it is not compactly
supported, thanks to the uniform inequality (8.45).
Another innovation in the proofs of this chapter is the idea of choos-
ing µt0 as the reference measure with respect to which changes of vari-
ables are performed. The advantage of that procedure (which evolved
from discussions with Ambrosio) is that the transport map from µt0
to µt is Lipschitz for all times t, as we know from Chapter 8, while
the transport map from µ0 to µ1 is only of bounded variation. So the
proof given in this section only uses the Jacobian formula for Lipschitz
changes of variables, and not the more subtle formula for BV changes
of variables.
Paths (µt )0≤t≤1 defined in terms of transport from a given measure
e (not necessarily of the form µt0 ) are studied in [30] in the context of
µ
generalized geodesics in P2 (Rn ). The procedure amounts to considering
504 17 Displacement convexity II
Contrary
R to what is stated in [814], this is false in dimension 1; in fact
µ 7−→ W (x − y) µ(dx) µ(dy) is displacement convex on P2 (R) if and
only if z 7−→ W (z) + W (−z) is convex on R+ (This is because, by
monotonicity, (x, y) 7−→ (T (x), T (y)) preserves the set {y ≥ x} ⊂ R2 .)
As a matter of fact, an interesting example coming from statistical me-
chanics, where W is not convex on the whole of R, is discussed in [202].
There is no simple displacement convexity statement known for the
Coulomb interaction potential; however, Blower [123] proved that
Z
1 1
E(µ) = log µ(dx) µ(dy)
2 R2 |x − y|
defines a displacement convex functional on P2ac (R). Blower further
studied what happens when one adds a potential energy to E, and used
these tools to establish concentration inequalities for the eigenvalues of
some large random matrices. Also Calvez and Carrillo [196, Chapter 7]
recently gave a sharp analysis of the defect of displacement convexity
for the logarithmic potential in dimension 1 (which arguably should be
the worst) with applications to the long-time study of a one-dimensional
nonlinear diffusion equation modeling chemotaxis.
Exercise 17.43. Let M be a compact Riemannian manifold of di-
mension n ≥ 2, and let Ψ be a continuous function on M × M ;
show that Ψ defines a displacement functional on P2 (M ) if and only
if (x, y) 7−→ Ψ (x, y) + Ψ (y, x) is geodesically convex on M × M .
Hints: Note that a product of geodesics in M is also a geodesic in
M × M . First show that Ψ is locally convex on M × M \ ∆, where
∆ = {(x, x)} ⊂ M × M . Use a density argument to conclude; note that
this argument fails if n = 1.
I shall conclude with some comments about Remark 17.33. The
classical Cheng–Toponogov theorem states the following: If a Rie-
mannian manifold M has dimension N , Ricci curvature bounded below
by K > 0, p and diameter equal to the limit Bonnet–Myers diameter
DK,N = π (N − 1)/K, then it is a sphere. I shall explain why this re-
sult remains true when the reference measure is not the volume, and M
is assumed to satisfy CD(K, N ). Cheng’s original proof was based on
eigenvalue comparisons, but there is now a simpler argument relying on
the Bishop–Gromov inequality [846, p. 229]. This proof goes through
when the volume measure is replaced by another reference measure ν,
and then one sees that Ψ = − log(dν/dvol) should solve a certain dif-
ferential equation of Ricatti type (replace the inequality in [573, (4.11)]
506 17 Displacement convexity II
Volume control
Remark 18.3. It does not really matter whether the definition is for-
mulated in terms of open or in terms of closed balls; at worst this
changes the value of the constant D.
When the distance d and the reference measure ν are clear from the
context, I shall often say that the space X is doubling (resp. locally
doubling), instead of writing that the measure ν is doubling on the
metric space (X , d).
508 18 Volume control
Fig. 18.1. The natural volume measure on this “singular surface” (a balloon with
a spine) is not doubling.
Proof. Let x ∈ X , and let r > 0. Since ν is nonzero, there is R > 0 such
that ν[BR] (x)] > 0. Then there is a constant C, possibly depending on
x and R, such that ν is C-doubling inside BR] (x). Let n ∈ N be large
enough that R ≤ 2n r; then
So ν[Br] (x)] > 0. Since r is arbitrarily small, x has to lie in the support
of ν. ⊓
⊔
transport. This is not the standard strategy, but it will work just as
well as any other, since the results in the end will be optimal. As a
preliminary step, I shall establish a “distorted” version of the famous
Brunn–Minkowski inequality.
• If N = ∞,
1 1 1
log ≤ (1 − t) log + t log
ν [A0 , A1 ]t ν[A0 ] ν[A1 ]
Kt(1 − t)
− sup d(x0 , x1 )2 . (18.5)
2 x0 ∈A0 , x1 ∈A1
Detailed proof of Theorem 18.5. First consider the case N < ∞. For
(K,N )
brevity I shall write just βt instead of βt . By regularity of the
measure ν and an easy approximation argument, it is sufficient to treat
the case when ν[A0 ] > 0 and ν[A1 ] > 0. Then one may define µ0 = ρ0 ν,
µ1 = ρ1 ν, where
1A0 1A1
ρ0 = , ρ1 = .
ν[A0 ] ν[A1 ]
In words, µt0 (t0 ∈ {0, 1}) is the law of a random point distributed uni-
formly in At0 . Let (µt )0≤t≤1 be the unique displacement interpolation
between µ0 and µ1 , for the cost function d(x, y)2 . Since M satisfies the
curvature-dimension bound CD(K, N ), Theorem 17.37, applied with
1
U (r) = UN (r) = −N r 1− N − r , implies
Z
UN (ρt (x)) ν(dx)
M
Z
ρ0 (x0 )
≤ (1 − t) UN β1−t (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
M β1−t (x0 , x1 )
Z
ρ1 (x1 )
+t UN βt (x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
M βt (x0 , x1 )
Z
ρ0 (x0 ) β1−t (x0 , x1 )
= (1 − t) UN π(dx0 dx1 )
M β1−t (x0 , x1 ) ρ0 (x0 )
Z
ρ1 (x1 ) βt (x0 , x1 )
+ t UN π(dx0 dx1 ),
M β (x
t 0 1, x ) ρ1 (x1 )
where βt stands for the minimum of βt (x0 , x1 ) over all pairs (x0 , x1 ) ∈
A0 × A1 . Then, by explicit computation,
Z Z
1 1 1 1
1− N
ρ0 (x0 ) dν(x0 ) = ν[A0 ] N , ρ1 (x1 )1− N dν(x1 ) = ν[A1 ] N .
M M
Bishop–Gromov inequality
ν[Br (x)]
Z r !N −1 ,
r
K
sin t dt
0 N −1
ν[Br (x)]
resp. Z r !N −1 .
r
|K|
sinh t dt
0 N −1
Here is a precise statement:
d+
dr ν[Br ]
is nonincreasing, (18.8)
s(K,N )(r)
for K < 0 the same formula remains true with sin replaced by sinh, K
by |K| and r + ε by r − ε. In the sequel, I shall only consider K > 0, the
treatment of K < 0 being obviously similar. After applying the above
bounds, inequality (18.4) yields
q N−1
N
h i1 K i1
N sin t N −1 (r + ε) N
ν Bt(r+ε) \ Btr ≥t q ν Br+ε \ Br ;
K
t sin N −1 (r + ε)
If φ(r) stands for ν[Br ], then the above inequality can be rewritten as
φ′ (tr) φ′ (r)
≥ .
s(K,N ) (tr) s(K,N ) (r)
This was for any t ∈ [0, 1], so φ′ /s(K,N ) is indeed nonincreasing, and
the proof is complete. ⊓
⊔
The following lemma was used in the proof of Theorem 18.8. At first
sight it seems obvious and the reader may skip its proof.
Doubling property 515
This implies Rx Ry
f f
a
R x ≤ Rxy ,
a g x g
and (18.9) follows. ⊓
⊔
Doubling property
where V (r) is the volume of Br (x) in the model space. This implies
that ν[Br (x)] is a continuous function of r. Of course, this property
is otherwise obvious, but the Bishop–Gromov inequality provides an
explicit modulus of continuity.
Dimension-free bounds
Proof of Theorem 18.12. For brevity I shall write Br for Br] (x0 ). Ap-
ply (18.5) with A0 = Bδ , A1 = Br , and t = δ/(2r) ≤ 1/2. For any
minimizing geodesic γ going from A0 to A1 , one has d(γ0 , γ1 ) ≤ r + δ,
so
< +∞.
Bibliographical notes
Remark 19.3. The word “local” in Definition 19.1 means that the
inequality is interested in averages around some point x0 . This is in
contrast with the “global” Poincaré inequalities that will be considered
later in Chapter 21, in which averages are over the whole space.
The next theorem is the key result of this chapter. The notation [x, y]t
stands for the set of all t-barycenters of x and y (as in Theorem 18.5).
(19.2)
1
− 1 −N
where by convention (1 − t) a− N + t b N = 0 if either a or b
is 0;
• If N = ∞, one has the pointwise bound
Kt(1 − t)
ρt (x) ≤ sup ρ0 (x0 )1−t ρ1 (x1 )t exp − d(x0 , x1 )2 .
x∈[x0 ,x1 ]t 2
(19.3)
524 19 Density control and local regularity
As I said before, there are (at least) two possible schemes of proof
for Theorem 19.4. The first one is by direct application of the Jacobian
estimates from Chapter 14; the second one is based on the displacement
convexity estimates from Chapter 17. The first one is formally simpler,
while the second one has the advantage of being based on very robust
functional inequalities. I shall only sketch the first proof, forgetting
about regularity issues, and give a detailed treatment of the second
one.
Similarly,
ρ0 (x0 ) = ρ1 (x1 ) J (1, x0 ).
Then the result follows directly from Theorems 14.11 and 14.12: Apply
1
equation (14.56) if N < ∞, (14.55) if N = ∞ (recall that D = J N ,
ℓ = − log J ). ⊓
⊔
Further, define π ′ = law (γ0′ , γ1′ ), µ′s = law (γs′ ) = (es )# Π ′ . Obviously,
Π Π
Π′ ≤ = ,
Π[Z] µt [Bδ (y)]
so for all s ∈ [0, 1],
µs
µ′s ≤ .
µt [Bδ (y)]
In particular, µ′s is absolutely continuous and its density ρ′s satisfies
(ν-almost surely)
ρs
ρ′s ≤ . (19.4)
µt [Bδ (y)]
When s = t, inequality (19.4) can be refined into
ρt 1Bδ (y)
ρ′t = , (19.5)
µt [Bδ (y)]
since
1γt ∈Bδ (y) 1x∈Bδ (y) ((et )# Π)
(et )# = .
µt [Bδ (y)] µt [Bδ (y)]
(This is more difficult to write down than to understand!)
From the restriction property (Theorem 4.6), (γ0′ , γ1′ ) is an optimal
coupling of (µ′0 , µ′1 ), and therefore (µ′s )0≤s≤1 is a displacement interpo-
1
lation. By Theorem 17.37 applied with U (r) = −r 1− N ,
Z Z
1 1 1
(ρ′t )1− N dν ≥ (1 − t) (ρ′0 (x0 ))− N β1−t (x0 , x1 ) N π ′ (dx0 dx1 )
M M ×M
Z
1 1
+ t (ρ′1 (x1 ))− N βt (x0 , x1 ) N π ′ (dx0 dx1 ). (19.6)
M ×M
On the other hand, from (19.4) the right-hand side of (19.6) can be
bounded below by
Z
1 1 1
µt [Bδ (y)] N (1 − t) (ρ0 (x0 ))− N β1−t (x0 , x1 ) N
M ×M
1 1
+ t (ρ1 (x1 ))− N βt (x0 , x1 ) N π ′ (dx0 dx1 )
1
h 1 1
= µt [Bδ (y)] N E (1 − t) (ρ0 (γ0′ ))− N β1−t (γ0′ , γ1′ ) N
1 1
i
+ t (ρ1 (γ1′ ))− N βt (γ0′ , γ1′ ) N
1
h 1 1
≥ µt [Bδ (y)] N E inf (1 − t) (ρ0 (γ0′ ))− N β1−t (γ0′ , γ1′ ) N
γt ∈[x0 ,x1 ]t
1 1
i
+ t (ρ1 (γ1′ ))− N βt (γ0′ , γ1′ ) N , (19.9)
where the last inequality follows just from the (obvious) remark that
γt′ ∈ [γ0′ , γ1′ ]t . In all of these inequalities, we can restrict π ′ to the set
{ρ0 (x0 ) > 0, ρ1 (x1 ) > 0} which is of full measure.
Let
1 1
F (x) := inf (1 − t) (ρ0 (x0 ))− N β1−t (x0 , x1 ) N
x∈[x0 ,x1 ]t
1 1
+ t (ρ1 (x1 ))− N βt (x0 , x1 ) N ;
provided that ρt (y) > 0; and then ρt (y) ≤ F (y)−N , as desired. In the
case ρt (y) = 0 the conclusion still holds true.
Some final words about measurability. It is not clear (at least to me)
that F is measurable; but instead of F one may use the measurable
function
1 1 1 1
Fe(x) = (1 − t) ρ0 (γ0 )− N β1−t (γ0 , γ1 ) N + t ρ1 (γ1 )− N βt (γ0 , γ1 ) N ,
Clearly, the sum above can be restricted to those pairs (x0 , x1 ) such that
x1 lies in the support of µ1 , i.e. x1 ∈ B; and x0 lies in the support of µt0 ,
Pointwise estimates on the interpolant density 529
ρ1 (x1 )
= sup .
x∈[x0 ,x1 ]t ; x0 ∈[z0 ,B]t0 ; x1 ∈B tN βt (x0 , x1 )
S(t0 , z0 , B)
ρt′ (x) ≤ ,
tN ν[B]
where
n 1
o
S(t0 , z0 , B) := sup βt (x0 , x1 )− N ; x0 ∈ [z0 , B]t0 , x1 ∈ B . (19.13)
To conclude, I shall state a theorem which holds true with the in-
trinsic distortion coefficients of the manifold, without any reference to a
choice of K and N , and without any assumption on the behavior of the
manifold at infinity (if the total cost is infinite, we can appeal to the
notion of generalized optimal coupling and generalized displacement
interpolation, as in Chapter 13). Recall Definition 14.17.
Democratic condition
However, a geodesic joining two points in B cannot leave the ball 2B,
so (19.23) and the democratic condition together imply that
Z Z
2C r
− |u − huiB | dν ≤ g dν. (19.24)
ν[B] 2B
B
1 D
By the doubling property, ν[B] ≤ ν[2B] . The conclusion is that
Z Z
− |u − huiB | dν ≤ 2 C D r − g dν. (19.25)
B 2B
Remark 19.15. With almost the same proof, it is easy to derive the
following refinement of the local Poincaré inequality:
Z Z
|u(x) − u(y)|
dν(x) dν(y) ≤ P (K, N, R) |∇u|(x) dν(x).
B[x,r] d(x, y) B[x,2r]
Back to Brunn–Minkowski and Prékopa–Leindler inequalities 535
Now integrate this against ρt (x) dν(x): since the right-hand side
does not depend on x any longer,
Z
1 1 1
1− N
ρt (x) dν(x) ≥ (1 − t) inf β1−t (x0 , x1 ) N ν[A0 ] N
x∈[A0 ,A1 ]t
1 1
+ t inf βt (x0 , x1 ) N ν[A1 ] N .
x∈[A0 ,A1 ]t
Proof ofRTheorem
R 19.16. By an easy homogeneity argument, we may
assume f = g = 1. Then write ρ0 = f , ρ1 = g; by Theorem 19.4,
the displacement interpolant ρt between
R ρ0Rν and ρ1 ν satisfies (19.3).
From (19.26), h ≥ ρt . It follows that h ≥ ρt = 1, as desired. ⊓
⊔
Proof of Theorem 19.18. The proof is quite similar to the proof of The-
orem 19.16, except that now N is finite. Let f , g and h satisfy the
assumptions of the theorem, define ρ0 = f /kf kL1 , ρ1 = g/kgkL1 , and
let ρt be the density of the displacement interpolant at time t between
ρ0 ν and ρ1 ν.RLet M be the right-hand side of (19.29); the problem is
to show that (h/M) ≥ 1, and this is obviously true if h/M ≥ ρt . In
view of Theorem 19.4, it is sufficient to establish
h(x) 1 β1−t (x0 , x1 ) βt (x0 , x1 ) −1
≥ sup Mt
N
, . (19.30)
M x∈[x0 ,x1 ]t ρ0 (x0 ) ρ1 (x1 )
In view of the assumption of h and the form of M, it is sufficient to
check that
q f (x0 ) g(x1 )
1 M ,
t β1−t (x0 ,x1 ) βt (x0 ,x1 )
1 ≤ q .
MtN β1−t (x0 ,x1 ) βt (x0 ,x1 )
ρ0 (x0 ) , ρ1 (x1 ) M t
1+Nq
(kf kL1 , kgk L 1 )
But this is a consequence of the following computation:
1
= Mst (a, b)
M−s −1 −1
t (a , b )
q a b
a b Mt c, d
≤ Mqt , Mt r (c, d) = , (19.31)
c d M−r
t (c, d)
1 1 1
+ = , q + r ≥ 0,
q r s
where the two equalities in (19.31) are obvious by homogeneity, and the
central inequality is a consequence of the two-point Hölder inequality
(see the bibliographical notes for references). ⊓
⊔
538 19 Density control and local regularity
Bibliographical notes
(19.33)
It was recently shown by Bobkov and Ledoux [132] that this inequality
can be used to establish optimal Sobolev inequalities in RN (with the
usual Prékopa–Leindler inequality one can apparently reach only the
logarithmic Sobolev inequality, that is, the dimension-free case [131]).
See [132] for the history and derivation of (19.33).
20
on a nonsmooth length space, there are still natural definitions for the
norm of the gradient, |∇f |. The most common one is
|f (y) − f (x)|
|∇f |(x) := lim sup . (20.1)
y→x d(x, y)
Rigorously speaking, this formula makes sense only if x is not isolated,
which will always be the case in the sequel. A slightly finer notion is
the following:
[f (y) − f (x)]−
|∇− f |(x) := lim sup , (20.2)
y→x d(x, y)
where a− = max(−a, 0) stands for the negative part of a (which is
a nonnegative number!). It is obvious that |∇− f | ≤ |∇f |, and both
notions coincide with the usual one if f is differentiable. Note that
|∇− f |(x) is automatically 0 if x is a local minimum of f .
Theorem 20.1 (Differentiating an energy along optimal trans-
port). Let (X , d, ν) and U be as above, and let (µt )0≤t≤1 be a geodesic
in P2 (X ), such that each µt is absolutely continuous with respect to ν,
with density ρt , and U (ρt )− is ν-integrable for all t. Further assume
that ρ0 is Lipschitz continuous, U (ρ0 ) and ρ0 U ′ (ρ0 ) are ν-integrable,
and U ′ is Lipschitz continuous on ρ0 (X ). Then
Uν (µt ) − Uν (µ0 )
lim inf ≥
t↓0 t
Z
− U ′′ (ρ0 (x0 )) |∇− ρ0 |(x0 ) d(x0 , x1 ) π(dx0 dx1 ), (20.3)
X
where π is an optimal coupling of (µ0 , µ1 ) associated with the geodesic
path (µt )0≤t≤1 .
Remark 20.2. The technical assumption on the negative part of U (ρt )
being integrable is a standard way to make sure that Uν (µt ) is well-
defined, with values in R ∪ {+∞}. As for the assumption about U ′
being Lipschitz on ρ0 (X ), it means in practice that either U is twice
(right-)differentiable at the origin, or ρ0 is bounded away from 0.
Remark 20.3. Here is a more probabilistic reformulation of (20.3)
(which will also make more explicit the link between π and µt ): Let
γ be a random geodesic such that µt = law (γt ), then
h i
Uν (µt ) − Uν (µ0 )
lim inf ≥ − E U ′′ (ρ0 (γ0 )) |∇− ρ0 |(γ0 ) d(γ0 , γ1 ) .
t↓0 t
Time-derivative of the energy 543
Now let γ be a random geodesic, such that µt = law (γt ). Then the
above inequality can be rewritten
Uν (µt ) − Uν (µ0 ) ≥ E U ′ (ρ0 (γt )) − E U ′ (ρ0 (γ0 ))
h i
= E U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) .
Since U ′ is nondecreasing,
U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) ≥ U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) 1ρ0 (γ0 )>ρ0 (γt ) .
Similarly,
ρ0 (γt ) − ρ0 (γ0 )
lim inf 1ρ0 (γ0 )>ρ0 (γt ) ≥ − |∇− ρ0 |(γ0 ).
t→0 d(γ0 , γt )
So, if vt (γ) stands for the integrand in the right-hand side of (20.5),
one has
(note indeed that p′ (r) = r U ′′ (r)). Since π = (ρ0 ν) ⊗ δx1 =T (x0 ) with
T = exp ∇ψ, the right-hand side can be rewritten
Z
U ′′ (ρ0 )∇ρ0 · ∇ψ dπ.
Exercise 20.5. Use Otto’s calculus to guess that (d/dt)Uν (µt ) should
coincide with the right-hand side of (20.7).
HWI inequalities
Hν (µt ) − Hν (µ0 )
≤ Hν (µ1 ) − Hν (µ0 ).
t
Under suitable assumptions we may then apply Theorem 20.1 to pass
to the limit as t → 0, and get
Z
|∇ρ0 (x0 )|
− d(x0 , x1 ) π(dx0 dx1 ) ≤ Hν (µ1 ) − Hν (µ0 ).
ρ0 (x0 )
By explicit computation,
N −1
α
sin α >1 if K > 0
β(x0 , x1 ) = 1 if K = 0 (20.9)
N −1
α
sinh α < 1 if K < 0,
α
−(N − 1) 1 − tan α < 0 if K > 0
β ′ (x0 , x1 ) = 0 if K = 0 (20.10)
α
(N − 1) tanh α − 1 > 0 if K < 0,
where r
|K|
α= d(x0 , x1 ).
N −1
HWI inequalities 547
K K
β ≃1− d(x0 , x1 )2 , β′ ≃ − d(x0 , x1 )2 ,
6 3
whatever the sign of K.
The next definition is a generalization of the classical notion of
Fisher information:
(Strictly speaking this is true only if ρ > 0, but the integral in (20.11)
may be restricted to the set {ρ > 0}.) Also, in Definition 20.6 one can
replace |∇ρ| by |∇− ρ| and |∇p(ρ)| by |∇− p(ρ)| since a locally Lipschitz
function is differentiable almost everywhere.
Uν (µ0 ) − Uν (µ1 )
Z
W2 (µ0 , µ1 )2
≤ U ′′ (ρ0 (x0 )) |∇ρ0 (x0 )| d(x0 , x1 ) π(dx0 dx1 ) − K∞,U
2
q W2 (µ0 , µ1 )2
≤ W2 (µ0 , µ1 ) IU,ν (µ0 ) − K∞,U , (20.14)
2
where K∞,U is defined in (17.10).
(iii) If N < ∞, K ≥ 0 and Uν (µ1 ) < +∞ then
HWI inequalities 549
Uν (µ0 ) − Uν (µ1 )
Z
≤ U ′′ (ρ0 (x0 )) |∇ρ0 (x0 )| d(x0 , x1 ) π(dx0 dx1 )
− 1 W2 (µ0 , µ1 )2
− KλN,U max kρ0 kL∞ (ν) , kρ1 kL∞ (ν) N
2
q
≤ W2 (µ0 , µ1 ) IU,ν (µ0 )
− 1 W2 (µ0 , µ1 )2
− KλN,U max kρ0 kL∞ (ν) , kρ1 kL∞ (ν) N
,
2
(20.15)
where
p(r)
λN,U = lim 1 . (20.16)
r→0 r 1− N
Exercise 20.11. When U is well-behaved, give a more direct deriva-
tion of (20.14), via plain displacement convexity (rather than distorted
displacement convexity). The same for (20.15), with the help of Exer-
cise 17.23.
Proof of Theorem 20.10. First recall from the proof of Theorem 17.8
that U− (ρ0 ) is integrable; since ρ0 U ′ (ρ0 ) ≥ U (ρ0 ), the integrability of
ρ0 U ′ (ρ0 ) implies the integrability of U (ρ0 ). Moreover, if N = ∞ then
U (r) ≥ a r log r−b r for some positive constants a, b (unless U is linear).
So
if N = ∞
ρ0 U ′ (ρ0 ) ∈ L1 =⇒ U (ρ0 ) ∈ L1 =⇒ ρ0 log+ ρ0 ∈ L1 .
where C stands for various numeric constants. Then (20.19) also holds
true for K < 0.
Second term of (20.18): This is the same as for the first term except
that the inequalities are reversed. If K > 0 then U (ρ1 ) ≥ U (ρ1 /βt ) βt ↓
U (ρ1 /β) β, and to pass to the limit it suffices to check the integrability
of U+ (ρ1 ). If N < ∞ this follows from the Lipschitz continuity of U ,
while if N = ∞ this comes from the assumption ρ1 log+ ρ1 ∈ L1 (ν).
552 20 Infinitesimal displacement convexity
If K < 0 then U (ρ1 ) ≤ U (ρ1 /βt ) βt ↑ U (ρ1 /β) β, and now we can
conclude because U− (ρ1 ) is integrable by Theorem 17.8. In either case,
Z
ρ1 (x1 )
U βt (x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
βt (x0 , x1 )
Z
ρ1 (x1 )
−−→ U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 ). (20.20)
t→0 β(x0 , x1 )
To prove (20.25), first note that p(0) = 0 (because p(r)/r 1−1/N is non-
decreasing), so pℓ (r) → p(r) for all r, and the integrand in the left-hand
side converges to the integrand in the right-hand side.
Moreover, since pℓ (0) = 0 and p′ℓ (r) = r Uℓ′′ (r) ≤ C r U ′′ (r) =
C p′ (r), we have 0 ≤ pℓ (r) ≤ C p(r). Then:
• If K = 0 then β ′ = 0 and there is nothing to prove.
R
• If K < 0 then β ′ > 0. If p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 ) <
+∞ then the left-hand side converges to the right-hand side by
dominated convergence; otherwise the inequality is obvious.
• If K > 0 and N < ∞ then β ′ is bounded
R and we may conclude by
dominated convergence as soon as p(ρ0 (x0 )) dν(x0 ) < +∞. This
in turn results from the fact that ρ0 U ′ (ρ0 ), U− (ρ0 ) ∈ L1 (ν).
554 20 Infinitesimal displacement convexity
In particular,
ρ1 (x1 ) β(x0 , x1 )
U
β(x0 , x1 ) ρ1 (x1 )
1
≤ U (ρ1 (x1 ))
ρ1 (x1 )
β(x0 , x1 ) ρ1 (x1 ) 1 β(x0 , x1 )
+ p log − log
ρ1 (x1 )β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 )
U (ρ1 (x1 )) β(x0 , x1 ) ρ1 (x1 ) K
= − p d(x0 , x1 )2
ρ1 (x1 ) ρ1 (x1 ) β(x0 , x1 ) 6
U (ρ1 (x1 )) K∞,U
≤ − d(x0 , x1 )2 .
ρ1 (x1 ) 6
Thus
556 20 Infinitesimal displacement convexity
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
β(x0 , x1 )
Z Z
K∞,U
≤ U (ρ1 (x1 )) ν(dx1 ) − ρ1 (x1 ) d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6
Z
K∞,U
= U (ρ1 ) dν − W2 (µ0 , µ1 )2 . (20.28)
6
On the other hand,
Z
p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
Z
K∞,U
≤− ρ0 (x0 ) d(x0 , x1 )2 π(dx1 |x0 ) ν(dx0 )
3
K∞,U
=− W2 (µ0 , µ1 )2 . (20.29)
3
Plugging (20.28) and (20.29) into (20.12) finishes the proof of (20.14).
The proof of (iii) is along the same lines: I shall establish the identity
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
β(x0 , x1 )
Z
+ p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
1 1
!
(sup ρ0 )− N (sup ρ1 )− N
≤ Uν (µ1 ) − Kλ + W2 (µ0 , µ1 )2 . (20.30)
3 6
This combined with Corollary 19.5 will lead from (20.12) to (20.15).
So let us prove (20.30). By convexity of s 7−→ sN U (s−N ),
ρ1 (x1 ) β(x0 , x1 ) U (ρ1 (x1 ))
U ≤
β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 )
1− 1 " 1 #
β(x0 , x1 ) N ρ1 (x1 ) β(x0 , x1 ) N 1
+N p − 1 ,
ρ1 (x1 ) β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 ) N
which is the same as
ρ1 (x1 )
U β(x0 , x1 ) ≤ U (ρ1 (x1 ))
β(x0 , x1 )
ρ1 (x1 ) 1 1
+Np β(x0 , x1 )1− N β(x0 , x1 ) N − 1 .
β(x0 , x1 )
HWI inequalities 557
As a consequence,
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 ) ≤ Uν (µ1 )
β(x0 , x1 )
Z
ρ1 (x1 ) 1 1
+N p β(x0 , x1 )1− N β(x0 , x1 ) N −1 π(dx0 |x1 ) ν(dx1 ).
β(x0 , x1 )
(20.31)
(see the bibliographical notes for details), the right-hand side of (20.31)
is bounded above by
Z
K ρ1 (x1 ) 1
Uν (µ1 ) − p β(x0 , x1 )1− N π(dx0 |x1 ) ν(dx1 )
6 β(x0 , x1 )
Z
K p(r) 1
≤ Uν (µ1 ) − inf 1 ρ1 (x1 )1− N d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6 r>0 r 1− N
≤ Uν (µ1 )
Z
K p(r) −N 1
− lim (sup ρ1 ) ρ1 (x1 ) d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6 r→0 r 1− N1
K 1
= Uν (µ1 ) − λ (sup ρ1 )− N W2 (µ0 , µ1 )2 , (20.33)
6
where λ = λN,U .
On the other hand, since β ′ (x0 , x1 ) = −(N − 1)(1 − (α/ tan α)) < 0,
we can use the elementary inequality
α α2
0 < α ≤ π =⇒ (N − 1) 1 − ≥ (N − 1) (20.34)
tan α 3
(see the bibliographical notes again) to deduce
558 20 Infinitesimal displacement convexity
Z
p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 ) (20.35)
Z
p(r) 1
≤ inf 1 ρ0 (x0 )1− N β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
r>0 r 1− N
Z
p(r) −N1
≤ lim 1 (sup ρ0 ) ρ0 (x0 ) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
r→0 r 1− N
Z
p(r) −N1 Kd(x0 , x1 )2
≤ lim 1 (sup ρ0 ) ρ0 (x0 ) π(dx1 |x0 ) ν(dx0 )
r→0 r 1− N 3
1 Z
Kλ (sup ρ0 )− N
= d(x0 , x1 )2 π(dx0 dx1 )
3
1
Kλ (sup ρ0 )− N
= W2 (µ0 , µ1 )2 . (20.36)
3
The combination of (20.33) and (20.36) implies (20.30) and con-
cludes the proof of Theorem 20.10. ⊓
⊔
Bibliographical notes
leading roles in information theory [252, 295]. They also have a leading
part in statistical mechanics and kinetic theory (see e.g. [817, 812]).
The HWI inequality was established in my joint work with Otto [671];
it obviously extends to any reasonable K-displacement convex func-
tional. A precursor inequality was studied by Otto [669]. An applica-
tion to a “concrete” problem of partial differential equations can be
found in [213, Section 5]. Recently Gao and Wu [405] used the HWI
inequality to derive new criteria of uniqueness for certain spin systems.
It is shown in [671, Appendix] and [814, Proof of Theorem 9.17,
Step 1] how to devise approximating sequences of smooth densities in
such a way that the Hν and Iν functionals pass to the limit. By adapting
these arguments one may conclude the proof of Corollary 20.13.
The role of the HWI inequality as an interpolation inequality is
briefly discussed in [814, Section 9.4] and turned into application
in [213, Proof of Theorem 5.1]: in that reference we study rates of con-
vergence for certain nonlinear partial differential equations, and com-
bine a bound on the Fisher information with a convergence estimate in
Wasserstein distance, to establish a convergence estimate in a stronger
sense (L1 norm, for instance).
The HWI inequality is also interesting as an “infinite-dimensional”
interpolation inequality; this is applied in [445] to the study of the limit
behavior of the entropy in a hydrodynamic limit.
A slightly different derivation of the HWI inequality is due to
Cordero-Erausquin [242]; a completely different derivation is due to
Bobkov, Gentil and Ledoux [127]. Variations of these inequalities were
studied by Agueh, Ghoussoub and Kang [5]; and Cordero-Erausquin,
Gangbo and Houdré [245].
There is no well-identified analog of the HWI inequality for non-
quadratic cost functions. For nonquadratic costs in Rn , some inequali-
ties in the spirit of HWI are established in [76], where they are used to
derive various isoperimetric-type inequalities.
The first somewhat systematic studies of HWI-type inequalities in
the case N < ∞ are due to Lott and myself [577, 578].
The elementary inequalities (20.32) and (20.34) are proven in [578,
Section 5], where they are used to derive the Lichnerowicz spectral gap
inequality (Theorem 21.20 in Chapter 21).
21
Isoperimetric-type inequalities
Sobolev inequalities
with some restrictions on the exponents. I will not say more about
Sobolev-type inequalities, but there are entire books devoted to them.
In a Riemannian setting, there is a famous family of Sobolev inequal-
ities obtained from the curvature-dimension bound CD(K, N ) with
K > 0 and 2 < N < ∞:
$$1\le q\le\frac{2N}{N-2}\ \Longrightarrow\ \frac{c}{q-2}\left[\left(\int|u|^q\,d\nu\right)^{\frac2q} - \int|u|^2\,d\nu\right] \le \int|\nabla u|^2\,d\nu,\qquad c=\frac{NK}{N-1}. \tag{21.7}$$
where
Θ(N,K) (r, g) =
r α 1− 1
N −1 g N −1 N
r sup 1 α+N 1−
0≤α≤π N r N
1+ K sin α
α 1
+ (N − 1) − 1 r − N . (21.11)
tan α
As a consequence,
$$H_{N,\nu}(\mu) \le \frac{1}{2K}\int_M \frac{|\nabla\rho|^2}{\rho}\;\frac{\bigl(\frac{N-1}{N}\bigr)^2\rho^{-\frac2N}}{\frac13 + \frac23\,\rho^{-\frac1N}}\,d\nu. \tag{21.12}$$
Proof of Theorem 21.9. Start from Theorem 20.10 and choose $U(r) = -N\,(r^{1-\frac1N}-r)$. After some straightforward calculations, one obtains
$$H_{N,\nu}(\mu) \le \int_M \theta^{(N,K)}\bigl(\rho,\,|\nabla\rho|,\,\alpha\bigr),$$
where $\alpha = \sqrt{K/(N-1)}\,d(x_0,x_1)\in[0,\pi]$, and $\theta^{(N,K)}$ is an explicit function such that
and the infimum is taken over all functions g ∈ L1 (Rn ), not identi-
cally 0.
Remark 21.13. The assumption of Lipschitz continuity for u can be
removed, but I shall not do so here. Actually, inequality (21.13) holds
true as soon as u is locally integrable and vanishes at infinity, in the
sense that the Lebesgue measure of {|u| ≥ r} is finite for any r > 0.
Remark 21.14. The constant Sn (p) in (21.13) is optimal.
Proof of Theorem 21.12. Choose M = Rn , ν = Lebesgue measure,
and apply Theorem 20.10 with K = 0, N = n, and µ0 = ρ0 ν,
µ1 = ρ1 ν, both of them compactly supported. By formula (20.14) in
Theorem 20.10(i),
Moreover, as $\lambda\to\infty$, the probability measure $\mu_0^{(\lambda)} = \rho_0^{(\lambda)}\nu$ converges weakly to the Dirac mass $\delta_0$ at the origin; so
$$W_{p'}\bigl(\mu_0^{(\lambda)},\mu_1\bigr) \longrightarrow W_{p'}(\delta_0,\mu_1) = \left(\int|y|^{p'}\,d\mu_1(y)\right)^{\frac1{p'}}.$$
After writing (21.14) for $\mu_0 = \mu_0^{(\lambda)}$ and then passing to the limit as $\lambda\to\infty$, one obtains
$$n\int_{\mathbb R^n}\rho_1^{1-\frac1n}\,d\nu \le \left(\int\rho_0^{-p(1+\frac1n)}\,|\nabla\rho_0|^p\,d\mu_0\right)^{\frac1p}\left(\int|y|^{p'}\,d\mu_1(y)\right)^{\frac1{p'}}. \tag{21.15}$$
Let us change unknowns and define $\rho_0 = u^{p^\star}$, $\rho_1 = g$; inequality (21.15) then becomes
$$1 \le \frac{p\,(n-1)}{n\,(n-p)}\,\|\nabla u\|_{L^p}\;\frac{\left(\int|y|^{p'}g(y)\,dy\right)^{\frac1{p'}}}{\int g^{1-\frac1n}},$$
where $u$ and $g$ are only required to satisfy $\int u^{p^\star} = 1$, $\int g = 1$. The inequality (21.13) follows by homogeneity again. ⊓⊔
Recall that $\nu[B]=1$; then after the change of unknowns $\rho_0 = u^{N/(N-1)}$, inequality (21.18) implies
$$1 \le S(K,N,R)\,\bigl(\|\nabla u\|_{L^1(M)} + \|u\|_{L^1(M)}\bigr),$$
Isoperimetric inequalities
Isoperimetric inequalities are sometimes obtained as limits of Sobolev
inequalities applied to indicator functions. The most classical example
is the equivalence between the Euclidean isoperimetric inequality
$$\frac{|\partial A|}{|A|^{\frac{n-1}{n}}} \ge \frac{|\partial B^n|}{|B^n|^{\frac{n-1}{n}}}$$
Poincaré inequalities
This writing makes the formal connection with the logarithmic Sobolev
inequality very natural. (The Poincaré inequality is obtained as the
limit of the logarithmic Sobolev inequality when one sets µ = (1+ εu) ν
and lets ε → 0.)
Like Sobolev inequalities, Poincaré inequalities express the domi-
nation of a function by its gradient; but unlike Sobolev inequalities,
they do not include any gain of integrability. Poincaré inequalities have
spectral content, since the best constant λ can be interpreted as the
spectral gap for the Laplace operator on M .1 There is no Poincaré in-
equality on Rn equipped with the Lebesgue measure (the usual “flat”
Laplace operator does not have a spectral gap), but there is a Poincaré
1
This is one reason to take λ (universally accepted notation for spectral gap) as
the constant defining the Poincaré inequality. Unfortunately this is not consistent
with the convention that I used for local Poincaré inequalities; another choice
would have been to call λ−1 the Poincaré constant.
and the first term on the right-hand side vanishes by assumption. Similarly,
$$\int\frac{|\nabla\rho|^2}{\rho}\;\frac{\rho^{-\frac2N}}{\frac13+\frac23\,\rho^{-\frac1N}}\,d\nu = \varepsilon^2\int|\nabla f|^2\,d\nu + o(\varepsilon^2).$$
So (21.12) implies
$$\frac{N-1}{N}\int\frac{f^2}{2}\,d\nu \le \frac{1}{2K}\left(\frac{N-1}{N}\right)^2\int|\nabla f|^2\,d\nu,$$
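For the non-expert reader, here is the elementary Taylor expansion behind this linearization (not spelled out in the text; it assumes $\int f\,d\nu = 0$ and that the expansion may be integrated term by term). With $U(r) = -N(r^{1-\frac1N}-r)$ as above,
$$H_{N,\nu}\bigl((1+\varepsilon f)\,\nu\bigr) = N\int\Bigl[(1+\varepsilon f) - (1+\varepsilon f)^{1-\frac1N}\Bigr]\,d\nu = \frac{N-1}{2N}\,\varepsilon^2\int f^2\,d\nu + o(\varepsilon^2),$$
so at order $\varepsilon^2$ inequality (21.12) reduces to $\int f^2\,d\nu \le \frac{N-1}{KN}\int|\nabla f|^2\,d\nu$, i.e. the Lichnerowicz spectral gap bound $\lambda_1\ge KN/(N-1)$ (Theorem 21.20).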
Bibliographical notes
Concentration inequalities
$$d_p\bigl((x_1,\ldots,x_N);(y_1,\ldots,y_N)\bigr) = \left(\sum_{i=1}^N d(x_i,y_i)^p\right)^{\frac1p}.$$
$$\forall\varphi\in C_b(\mathcal X),\qquad \frac1\lambda\log\int_{\mathcal X} e^{\lambda\varphi}\,d\nu = \sup_{\mu\in P(\mathcal X)}\left[\int\varphi\,d\mu - \frac{H_\nu(\mu)}{\lambda}\right]. \tag{22.7}$$
(See the bibliographical notes for proofs of these identities.)
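As a quick sanity check (this verification is not part of the text), the supremum in (22.7) is formally attained at the Gibbs measure associated with $\varphi$:
$$d\mu_\varphi = \frac{e^{\lambda\varphi}\,d\nu}{\int e^{\lambda\varphi}\,d\nu}\ \Longrightarrow\ \int\varphi\,d\mu_\varphi - \frac{H_\nu(\mu_\varphi)}{\lambda} = \int\varphi\,d\mu_\varphi - \frac1\lambda\int\Bigl(\lambda\varphi - \log\!\int e^{\lambda\varphi}d\nu\Bigr)d\mu_\varphi = \frac1\lambda\log\int_{\mathcal X}e^{\lambda\varphi}\,d\nu,$$
while for any other $\mu\ll\nu$ the difference between the two sides of (22.7) equals $\lambda^{-1}H_{\mu_\varphi}(\mu)\ge0$.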
Let us first treat the case $p=2$. Apply Theorem 5.26 with $c(x,y)=d(x,y)^2/2$, $F(\mu)=(1/\lambda)H_\nu(\mu)$, $\Lambda(\varphi)=(1/\lambda)\log\int e^{\lambda\varphi}\,d\nu$. The conclusion is that $\nu$ satisfies $T_2(\lambda)$ if and only if
$$\forall\phi\in C_b(\mathcal X),\qquad \log\int\exp\bigl(-\lambda\,\phi^c\bigr)\,d\nu + \lambda\int\phi\,d\nu \le 0,$$
i.e.
$$\int e^{-\lambda\phi^c}\,d\nu \le e^{-\lambda\int\phi\,d\nu},$$
where $\phi^c(x) := \sup_y\bigl(\phi(y) - d(x,y)^2/2\bigr)$. Upon changing $\phi$ for $\varphi=-\phi$, this is the desired result. Note that the Particular Case 22.4 is obtained from (22.5) by choosing $p=1$ and performing the change of variables $t\to\lambda t$.
The case $p<2$ is similar, except that now we appeal to the equivalence between (i') and (ii') in Theorem 5.26, and choose
$$c(x,y) = \frac{d(x,y)^p}{p};\qquad \Phi(r) = \frac p2\,r^{\frac2p}\,1_{r\ge0};\qquad \Phi^*(t) = \Bigl(\frac1p-\frac12\Bigr)\,t^{\frac{2}{2-p}}.$$
⊓
⊔
coupling between µ3 (dx3 |x1 , x2 ) and ν(dy3 ), call it π3 (dx3 dy3 |x1 , x2 );
etc. In the end, glue these plans together to get a coupling
$$\pi(dx\,dy) = \pi_1(dx_1\,dy_1)\,\pi_2(dx_2\,dy_2|x_1)\cdots\pi_N(dx_N\,dy_N|x^{N-1}),$$
where of course, for each $i$ and each $x^{i-1}=(x_1,\ldots,x_{i-1})$, the measure $\pi(\,\cdot\,|x^{i-1})$ is an optimal transference plan between its marginals. So the right-hand side of (22.8) can be rewritten as
$$\sum_{i=1}^N\int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\,\mu^{i-1}(dx^{i-1}).$$
Since this cost is achieved for the transference plan $\pi$, we obtain the key estimate
$$W_p(\mu,\nu^{\otimes N})^p \le \sum_{i=1}^N\int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\,\mu^{i-1}(dx^{i-1}). \tag{22.9}$$
$$\sum_i\int\left(\frac2\lambda\,H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\right)^{\frac p2}\mu^{i-1}(dx^{i-1}). \tag{22.10}$$
Since $p\le2$, we can apply Hölder's inequality, in the form $\sum_{i\le N}a_i^{p/2} \le N^{1-p/2}\bigl(\sum a_i\bigr)^{p/2}$, and bound (22.10) by
$$N^{1-\frac p2}\left[\frac2\lambda\sum_{i=1}^N\int H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\,\mu^{i-1}(dx^{i-1})\right]^{\frac p2}. \tag{22.11}$$
$$\forall\mu\in P(\mathcal X),\qquad C(\mu,\nu)\le H_\nu(\mu)$$
implies
$$\forall\mu\in P(\mathcal X^N),\qquad C^N(\mu,\nu^{\otimes N})\le H_{\nu^{\otimes N}}(\mu),$$
where $C^N$ is the optimal transport cost associated with the cost function $c_N(x,y)=\sum_i c(x_i,y_i)$ on $\mathcal X^N$.
The following important lemma was used in the course of the proof
of Proposition 22.5.
(iii) There is a constant C > 0 such that for any Borel set A ⊂ X ,
$$\nu[A] \ge \frac12 \ \Longrightarrow\ \forall r>0,\quad \nu[A^r] \ge 1 - e^{-Cr^2}.$$
(iv) There is a constant C > 0 such that
so (ii) implies
$$\int e^{tf(x)}\,\nu(dx) \le e^{\frac{t^2}{2\lambda}}\,e^{t\int f\,d\nu}.$$
With the shorthand $\langle f\rangle = \int f\,d\nu$, this is the same as
$$\int e^{t(f-\langle f\rangle)}\,d\nu \le e^{\frac{t^2}{2\lambda}},$$
where C = λ/2 (cf. the remark at the end of the proof of (i) ⇒ (iv)).
The implication (v) ⇒ (iv) is trivial.
Let us now consider the implication (i) ⇒ (iii). Assume that
$$\forall\mu\in P_1(\mathcal X),\qquad W_1(\mu,\nu) \le \sqrt{C\,H_\nu(\mu)}. \tag{22.14}$$
thus
$$\left(\int\varphi\,|u|\,d\nu\right)^2 \le C\,H_\nu(\mu), \tag{22.18}$$
where
$$C := \frac{\displaystyle\iint(1-t)\,(1+tu)\,\varphi^2\,d\nu\,dt}{\displaystyle\left(\int_0^1(1-t)\,dt\right)^2}. \tag{22.19}$$
The numerator can be rewritten as follows:
$$\iint(1-t)(1+tu)\,\varphi^2\,d\nu\,dt = \left(\int(1-t)\,t\,dt\right)\int(1+u)\,\varphi^2\,d\nu + \left(\int(1-t)^2\,dt\right)\int\varphi^2\,d\nu = \frac16\int\varphi^2\,d\mu + \frac13\int\varphi^2\,d\nu. \tag{22.20}$$
From the Legendre representation of the $H$ functional,
$$\int\varphi^2\,d\mu \le H_\nu(\mu) + \log\int e^{\varphi^2}\,d\nu, \tag{22.21}$$
≤ (H + 2L) 2 (22.24)
where s r
8 √ √
m= 1+L+ (L − 1)2 + L ≤ 2 L + 1.
3
This concludes the proof. ⊓
⊔
Remark 22.19. Part (ii) of Theorem 22.17 shows that the T2 inequal-
ity on a Riemannian manifold contains spectral information, and im-
poses qualitative restrictions on measures satisfying T2 . For instance,
the support of such a measure needs to be connected. (Otherwise take $u=a$ on one connected component, $u=b$ on another, $u=0$ elsewhere, where $a$ and $b$ are two constants chosen in such a way that $\int u\,d\nu = 0$. Then $\int|\nabla u|^2\,d\nu = 0$, while $\int u^2\,d\nu > 0$.) This remark shows that $T_2$ does not result from just decay estimates, in contrast with $T_1$.
and
$$\varphi(t) = \int_M H_tg\,d\nu + O(t). \tag{22.33}$$
By Proposition 22.16(iv), $H_tg$ converges pointwise to $g$ as $t\to0^+$; then by the dominated convergence theorem,
$$\lim_{t\to0^+}\varphi(t) = \int_M g\,d\nu. \tag{22.34}$$
So it all amounts to showing that φ(1) ≤ limt→0+ φ(t), and this will
obviously be true if φ(t) is nonincreasing in t. To prove this, we shall
compute the time-derivative φ′ (t). We shall go slowly, so the hasty
reader may go directly to the result, which is formula (22.41) below.
Let $t\in(0,1]$ be given. For $s>0$, we have
$$\frac{\varphi(t+s)-\varphi(t)}{s} = \frac1s\left[\frac{1}{K(t+s)}-\frac{1}{Kt}\right]\log\int_M e^{K(t+s)H_{t+s}g}\,d\nu + \frac{1}{Kts}\left[\log\int_M e^{K(t+s)H_{t+s}g}\,d\nu - \log\int_M e^{KtH_tg}\,d\nu\right]. \tag{22.35}$$
On the other hand, the second term in the right-hand side of (22.35) converges to
$$\frac{1}{Kt\int e^{KtH_tg}\,d\nu}\;\lim_{s\to0^+}\frac1s\left[\int_M e^{K(t+s)H_{t+s}g}\,d\nu - \int_M e^{KtH_tg}\,d\nu\right], \tag{22.37}$$
provided that the latter limit exists.
To evaluate the limit in (22.37), we decompose the expression inside square brackets into
$$\int_M\frac{e^{K(t+s)H_{t+s}g}-e^{KtH_{t+s}g}}{s}\,d\nu + \int_M\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s}\,d\nu. \tag{22.38}$$
The integrand of the first term in the above formula can be rewritten as $(e^{KtH_{t+s}g})(e^{KsH_{t+s}g}-1)/s$, which is uniformly bounded and converges pointwise to $(e^{KtH_tg})\,K\,H_tg$ as $s\to0^+$. So the first integral in (22.38) converges to $\int_M(K\,H_tg)\,e^{KtH_tg}\,d\nu$.
Let us turn to the second term of (22.38). By Proposition 22.16(vii), for each $x\in M$,
$$H_{t+s}g(x) = H_tg(x) - s\left(\frac{|\nabla^-H_tg(x)|^2}{2} + o(1)\right), \tag{22.39}$$
and therefore $H_{t+s}g = H_tg + O(s)$, so
$$\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s} = O(1)\qquad\text{as }s\to0^+. \tag{22.40}$$
By (22.39), (22.40) and the dominated convergence theorem,
$$\lim_{s\to0^+}\int_M\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s}\,d\nu = -\int_M Kt\,\frac{|\nabla^-H_tg|^2}{2}\,e^{KtH_tg}\,d\nu.$$
Collecting all the terms, we obtain
$$\varphi'(t) = \frac{1}{Kt^2\int_M e^{KtH_tg}\,d\nu}\left[\int_M(KtH_tg)\,e^{KtH_tg}\,d\nu - \Bigl(\int_M e^{KtH_tg}\,d\nu\Bigr)\log\int_M e^{KtH_tg}\,d\nu - \frac{1}{2K}\int_M\bigl|Kt\,\nabla^-H_tg\bigr|^2\,e^{KtH_tg}\,d\nu\right]. \tag{22.41}$$
$$\|h\|_{H^{-1}(\nu)} = \sup_{g\ne0}\frac{\langle h,g\rangle_{L^2(\nu)}}{\|\nabla g\|_{L^2(\nu)}} = \bigl\|\nabla(L^{-1}h)\bigr\|_{L^2(\nu)},$$
$$e^{KtH_th} = 1 + KtH_th + \frac{K^2t^2}{2}(H_th)^2 + O(t^3) \tag{22.43}$$
$$= 1 + KtH_th + \frac{K^2t^2}{2}h^2 + o(t^2).$$
So the right-hand side of (22.42) equals
$$\limsup_{t\to0^+}\int_M\frac{h-H_th}{t}\,d\nu - \frac K2\int_M h^2\,d\nu.$$
To close this section, I will show that the Talagrand inequality does
imply a logarithmic Sobolev inequality under strong enough curvature
assumptions.
(iii) There is a constant $C>0$ such that for any $N\in\mathbb N$ and any $f\in\mathrm{Lip}(\mathcal X^N,d_2)$ (resp. $\mathrm{Lip}(\mathcal X^N,d_2)\cap L^1(\nu^{\otimes N})$),
$$\nu^{\otimes N}\Bigl[x\in\mathcal X^N;\ f(x)\ge m+r\Bigr] \le e^{-\frac{Cr^2}{\|f\|_{\mathrm{Lip}}^2}},$$
$$\widehat\mu^N_x = \frac1N\sum_{i=1}^N\delta_{x_i}\in P(\mathcal X),$$
and let
$$f_N(x) = W_2\bigl(\widehat\mu^N_x,\,\nu\bigr).$$
In a compact notation, cqℓ (x, y) = min(d(x, y)2 , d(x, y)). The optimal
total cost associated with cqℓ will be denoted by Cqℓ .
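The quadratic-linear cost is easy to experiment with numerically. The following minimal sketch (not from the book; the supports and weights are made-up illustrative data) computes $C_{q\ell}$ between two small discrete measures on the real line by solving the Kantorovich linear program with scipy:

```python
# Optimal transport cost for c_ql(x,y) = min(d(x,y)^2, d(x,y)) between two
# discrete measures on R, via a plain linear program over transport plans.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 0.3, 2.0])      # support of mu
y = np.array([0.1, 1.5])           # support of nu
a = np.array([0.5, 0.3, 0.2])      # weights of mu
b = np.array([0.6, 0.4])           # weights of nu

d = np.abs(x[:, None] - y[None, :])
cost = np.minimum(d ** 2, d).ravel()           # c_ql, flattened row-major

m, n = len(x), len(y)
A_eq = []
for i in range(m):                              # row marginals: sum_j pi_ij = a_i
    row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1; A_eq.append(row)
for j in range(n):                              # column marginals: sum_i pi_ij = b_j
    col = np.zeros(m * n); col[j::n] = 1; A_eq.append(col)
b_eq = np.concatenate([a, b])

res = linprog(cost, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
print("C_ql(mu, nu) =", res.fun)
```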
Theorem 22.25 (Reformulations of Poincaré inequalities). Let
M be a Riemannian manifold equipped with a reference probability mea-
sure ν = e−V vol. Then the following statements are equivalent:
(i) ν satisfies a Poincaré inequality;
(ii) There are constants $c,K>0$ such that for any Lipschitz probability density $\rho$,
$$|\nabla\log\rho|\le c \ \Longrightarrow\ H_\nu(\mu) \le \frac{I_\nu(\mu)}{K},\qquad \mu=\rho\,\nu; \tag{22.48}$$
(iii) $\nu\in P_1(M)$ and there is a constant $C>0$ such that
$$\forall\mu\in P_1(M),\qquad C_{q\ell}(\mu,\nu)\le C\,H_\nu(\mu). \tag{22.49}$$
Remark 22.26. The equivalence between (i) and (ii) remains true
when the Riemannian manifold M is replaced by a general metric space.
On the other hand, the equivalence with (iii) uses at least a little bit
of the Riemannian structure (say, a local Poincaré inequality, a local
doubling property and a length property).
Remark 22.27. The equivalence between (i), (ii) and (iii) can be made
more precise. As the proof will show, if $\nu$ satisfies a Poincaré inequality
with constant $\lambda$, then for any $c < 2\sqrt\lambda$ there is an explicit constant
K = K(c) > 0 such that (22.48) holds true; and the K(c) converges to
λ as c → 0. Conversely, if for each c > 0 we call K(c) the best constant
in (22.48), then ν satisfies a Poincaré inequality with constant λ =
limc→0 K(c). Also, in (ii) ⇒ (iii) one can choose C = max (4/K, 2/c),
while in (iii) ⇒ (i) the Poincaré constant can be taken equal to C −1 .
Proof of Theorem 22.25. We shall start with the proof of (i) $\Rightarrow$ (ii). Let $f = \log\rho - \int(\log\rho)\,d\nu$; so $\int f\,d\nu = 0$ and the assumption in (ii) reads $|\nabla f|\le c$. Moreover, with $a = \int(\log\rho)\,d\nu$ and $X = \int e^f\,d\nu$,
$$I_\nu(\mu) = e^a\int|\nabla f|^2\,e^f\,d\nu;$$
$$H_\nu(\mu) = \int(f+a)\,e^{f+a}\,d\nu - \int e^{f+a}\,d\nu\,\log\int e^{f+a}\,d\nu$$
$$= e^a\left(\int f\,e^f\,d\nu - \int e^f\,d\nu + 1\right) - e^a\bigl(X\log X - X + 1\bigr) \le e^a\left(\int f\,e^f\,d\nu - \int e^f\,d\nu + 1\right).$$
So it is sufficient to prove
$$|\nabla f|\le c \ \Longrightarrow\ \int\bigl(f\,e^f - e^f + 1\bigr)\,d\nu \le \frac1K\int|\nabla f|^2\,e^f\,d\nu. \tag{22.53}$$
In the sequel, $c$ is any constant satisfying $0<c<2\sqrt\lambda$. Inequality (22.53) will be proven by two auxiliary inequalities:
$$\int f^2\,d\nu \le e^{c\sqrt{5/\lambda}}\int f^2\,e^{-|f|}\,d\nu; \tag{22.54}$$
$$\int f^2\,e^f\,d\nu \le \frac1\lambda\left(\frac{2\sqrt\lambda+c}{2\sqrt\lambda-c}\right)^2\int|\nabla f|^2\,e^f\,d\nu. \tag{22.55}$$
Note that the upper bound on |∇f | is crucial in both inequalities.
Once (22.54) and (22.55) are established, the result is immediately
obtained. Indeed, the right-hand side of (22.54) is obviously bounded by
the left-hand side of (22.55), so both expressions are bounded above by
R
a constant multiple of |∇f |2 ef dν. On the other hand, an elementary
study shows that
∀f ∈ R, f ef − ef + 1 ≤ max (f 2 , f 2 ef ),
(22.56)
R R
By the Poincaré
R 2 inequality,R f 2 dν ≤ (1/λ) |∇f |2 dν ≤ c2 /λ, which
implies ( f dν)2 ≤ (c2 /λ) f 2 dν. Also by the Poincaré inequality,
Z Z 2 Z
2 2 2
(f ) dν − f dν ≤ (1/λ) |∇(f 2 )|2 dν
Z Z
= (4/λ) f 2 |∇f |2 dν ≤ (4c2 /λ) f 2 dν.
in other words,
$$\int f^2\,d\nu \le \exp\!\left(\frac{\int|f|^3\,d\nu}{\int f^2\,d\nu}\right)\int f^2\,e^{-|f|}\,d\nu.$$
Since L∗ is quadratic on [0, 1], factors ε2 cancel out on both sides, and
we are back with the usual Poincaré inequality. ⊓
⊔
$$\forall r\ge0,\qquad \nu[A^r] \ge 1 - \frac{e^{-C\min(r,\,r^2)}}{\nu[A]}. \tag{22.61}$$
then µ⊗N will satisfy a Poincaré inequality with the same constant
as ν, and we may apply Theorem 22.30 to study concentration in
(M N , d2 , ν ⊗N ).
c(A, B) ≥ r 2 .
so z ∈ Ar;d2 . Similarly,
$$d_1(z,y) = \sum_{d(x_i,y_i)>1}d(x_i,y_i) \le \sum_i c_{q\ell}(x_i,y_i) = c(x,y) < r^2;$$
where Brd stands for the ball of center 0 and radius r in RN for the
distance d.
any Borel set in $\mathbb R^N$, and let $y\in T_N^{-1}(A) + B_r^{d_2} + B_{r^2}^{d_1}$. This means that there are $w$ and $x$ such that $T_N(w)\in A$, $|x-w|_2\le r$, $|y-x|_1\le r^2$. Then by (22.68),
$$|T_N(w)-T_N(y)|_2^2 = \sum_i|T(w_i)-T(y_i)|^2 \le C^2\sum_i\min\bigl(|w_i-y_i|,\ |w_i-y_i|^2\bigr)$$
$$\le C^2\Bigl(\sum_{|w_i-x_i|\ge|x_i-y_i|}2|w_i-x_i| + \sum_{|w_i-x_i|<|x_i-y_i|}4|x_i-y_i|^2\Bigr) \le 4C^2\Bigl(\sum_i|x_i-w_i| + \sum_i|x_i-y_i|^2\Bigr) \le 8\,C^2r^2;$$
so $T_N(y)\in A + B_{\sqrt8\,Cr}^{d_2}$. In summary, if $C'=\sqrt8\,C$, then
$$T_N\Bigl(T_N^{-1}(A) + B_r^{d_2} + B_{r^2}^{d_1}\Bigr) \subset A + B_{C'r}^{d_2}.$$
As a consequence, if $A\subset\mathbb R^N$ is any Borel set satisfying $\gamma^{\otimes N}[A]\ge1/2$, then $\nu^{\otimes N}[T_N^{-1}(A)] = \gamma^{\otimes N}[A]\ge1/2$, and
$$\gamma^{\otimes N}\bigl[A^{C'r}\bigr] \ge \gamma^{\otimes N}\Bigl[T_N\bigl(T_N^{-1}(A)+B_r^{d_2}+B_{r^2}^{d_1}\bigr)\Bigr] = \nu^{\otimes N}\Bigl[T_N^{-1}(A)+B_r^{d_2}+B_{r^2}^{d_1}\Bigr] \ge 1 - e^{-cr^2}$$
free form.
Remark 22.35. In certain situations, (22.67) provides sharper con-
centration properties for the Gaussian measure, than the usual Gaus-
sian concentration bounds. This might look paradoxical, but can be
Dimension-dependent inequalities
1
where Q = (α/ sin α)1− N . But this is immediate because Q is a sym-
metric function of x0 and x1 , and π has marginals µ = ρ ν and ν,
so
Z Z Z
Q(x0 , x1 ) dν(x0 ) = Q(x0 , x1 ) dν(x1 ) = Q(x0 , x1 ) dπ(x0 , x1 )
Z
= Q(x0 , x1 ) ρ(x0 ) dν(x0 ).
⊓
⊔
1
Proof of Corollary 22.39. Write again Q = (α/ sin α)1− N . The classical
′
Young inequality can be written ab ≤ ap /p+bp /p′ , where p′ = p/(p−1)
is the conjugate exponent to p; so
′
h i 1 − 1 − pp
1− 1 1 p −1 ρ NQ Q
N pρ Np = (N pρ) ρ− N Q Q p ≤ (N pρ) + .
p p′
⊓
⊔
Exercise 22.41. Use the inequalities proven in this section, and the
result of Exercise 22.20, to recover, at least formally, the inequality
$$\int h\,d\nu = 0 \ \Longrightarrow\ \frac{KN}{N-1}\,\|h\|^2_{H^{-1}(\nu)} \le \|h\|^2_{L^2(\nu)}$$
Remark 22.42. If one applies the same procedure to (22.71), one re-
covers a constant K(N p)/(N p − 1), which reduces to the correct con-
stant only in the limit p → 1. As for inequality (22.73), it leads to just
K (which would be the limit p → ∞).
Note that inequality (22.73) does not solve this problem, since by
Remark 22.42 it only implies the Poincaré inequality with constant K.
I shall conclude with a very loosely formulated open problem, which
might be nonsense:
Open Problem 22.45. In the Euclidean case, is there a particular
variant of the Talagrand inequality which takes advantage of the homo-
geneity under dilations, just as the usual Sobolev inequality in Rn ? Is
it useful?
Recap
In the end, the main results of this chapter can be summarized by just
a few diagrams:
• Relations between functional inequalities: By combining
Theorems 21.2, 22.17, 22.10 and elementary inequalities, one has
CD(K, ∞) =⇒ (LS) =⇒ (T2 ) =⇒ (P) =⇒ (exp1 )
⇓
(T1 ) ⇐⇒ (exp2 ) =⇒ (exp1 )
All these symbols designate properties of the reference measure ν: (LS)
stands for logarithmic Sobolev inequality, (P) for Poincaré inequality,
exp2 means that ν has a finite square-exponential moment, and exp1
that it has a finite exponential moment.
• Reformulations of Poincaré inequality: Theorem 22.25 can
be visualized as
(P) ⇐⇒ (LSLL) ⇐⇒ (Tqℓ )
where (LSLL) means logarithmic Sobolev for log-Lipschitz functions,
and (Tqℓ ) designates the transportation-cost inequality involving the
quadratic-linear cost.
• Concentration properties via functional inequalities: The
three main such results proven in this chapter are
(T1 ) ⇐⇒ Gaussian concentration (Theorem 22.10)
(T2 ) ⇐⇒ dimension free Gaussian concentration (Theorem 22.22)
(P ) ⇐⇒ dimension free exponential concentration (Theorem 22.32)
Proof of Theorem 22.46. First, note that the inverse L−1 of L is well-
defined R+ → R+ since L is strictly increasing and goes to +∞ at
infinity. Also L′ (∞) = limr→∞ (L(r)/r) is well-defined in (0, +∞]. Fur-
ther, note that
$$L^*(p) = \sup_{r\ge0}\bigl(p\,r - L(r)\bigr),$$
so
$$g(x) \ge H_sg(x) \ge g(x) + \inf_{d(x,y)\le R(g,s)}\Bigl[s\,L\Bigl(\frac{d(x,y)}{s}\Bigr) - L'(\infty)\,d(x,y)\Bigr] \ge g(x) + s\,L\Bigl(\frac{R(g,s)}{s}\Bigr) - L'(\infty)\,R(g,s), \tag{22.76}$$
s
where I have used the inequality p r ≤ L(r) + L∗ (p). Statement (v) fol-
lows at once from (22.77). Moreover, if L′ (∞) = +∞, then L∗ is contin-
uous on R+ , so by the definition of |∇− g| and the fact that R(g, s) → 0,
$$\limsup_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \le L^*\!\left(\limsup_{s\downarrow0}\ \sup_{d(x,y)\le R(g,s)}\frac{[g(y)-g(x)]^-}{d(x,y)}\right) = L^*\bigl(|\nabla^-g(x)|\bigr),$$
which proves (vi) in the case L′ (∞) = +∞.
If L′ (∞) < +∞, things are a bit more intricate. If kgkLip ≤ L′ (∞),
then of course |∇− g(x)| ≤ L′ (∞). I shall distinguish two situations:
• If $|\nabla^-g(x)| = L'(\infty)$, the same argument as before shows
$$\frac{g(x)-H_sg(x)}{s} \le L^*(\|g\|_{\mathrm{Lip}}) \le L^*(L'(\infty)) = L^*\bigl(|\nabla^-g(x)|\bigr).$$
• If $|\nabla^-g(x)| < L'(\infty)$, I claim that there is a function $\alpha=\alpha(s)$, depending on $x$, such that $\alpha(s)\to0$ as $s\to0$, and
$$H_sg(x) = \inf_{d(x,y)\le\alpha(s)}\Bigl[g(y) + s\,L\Bigl(\frac{d(x,y)}{s}\Bigr)\Bigr]. \tag{22.78}$$
If this is true then the same argument as in the case L′ (∞) = +∞
will work.
This is obvious if |∇− g(x)| = 0, so let us assume |∇− g(x)| > 0. (Note
that |∇− g(x)| < +∞ since g is Lipschitz.)
By the same computation as before,
$$\frac{g(x)-H_sg(x)}{s} = \frac1s\sup_{d(x,y)\le R(g,s)}\Bigl[g(x) - g(y) - s\,L\Bigl(\frac{d(x,y)}{s}\Bigr)\Bigr].$$
$$\frac{R(g,s)}{s} \longrightarrow L^{-1}(\infty) = +\infty.$$
Let
$$\psi(r) = \sup_{d(x,y)=r}\frac{g(x)-g(y)}{d(x,y)}.$$
If it can be shown that
$$\liminf_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \ge |\nabla^-g(x)|\,q - L(q) = L^*\bigl(|\nabla^-g(x)|\bigr).$$
If L′ (∞) < +∞ and |∇− g(x)| = L′ (∞), the above reasoning fails
because ∂L∗ (|∇− g(x)|) might be empty. However, for any θ < |∇− g(x)|
we may find $q\in\partial L^*(\theta)$; then the previous argument shows that
$$\liminf_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \ge L^*(\theta);$$
the conclusion is obtained by letting θ → |∇− g(x)| and using the lower
semicontinuity of L∗ .
So it all boils down to checking (22.82). This is where the semicon-
cavity of g will be useful. (Indeed (22.82) might fail for an arbitrary
Lipschitz function.) The problem can be rewritten
or equivalently,
So
$$d(x,y)=r' \ \Longrightarrow\ \psi(r) - \frac{g(x)-g(y)}{d(x,y)} \ge -C\,(r-r').$$
By taking the supremum over $y$, we conclude that
$$\psi(r) - \psi(r') \ge -C\,(r-r').$$
$$\limsup_{y\to x}\frac{[g(y)-g(x)]^-}{d(x,y)} < L.$$
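Because this appendix is entirely about the short-time behavior of the infimum-convolution semigroup, it can be illustrated numerically. The following sketch (not from the book; it assumes M = [0,1] with the usual distance, the quadratic Lagrangian L(r) = r²/2 — so L*(p) = p²/2 — and a particular Lipschitz g) checks that (g − Hₛg)/s approaches L*(|∇⁻g|) as s → 0:

```python
# Hopf-Lax / inf-convolution semigroup on a grid:
#   H_s g(x) = inf_y [ g(y) + s L(d(x,y)/s) ],  L(r) = r^2/2.
import numpy as np

xs = np.linspace(0.0, 1.0, 2001)
g = np.abs(xs - 0.5)                      # Lipschitz test function, |grad- g| = 1 off the minimum

def hopf_lax(g, xs, s):
    # brute-force infimum over the grid
    d = np.abs(xs[:, None] - xs[None, :])
    return np.min(g[None, :] + s * (d / s) ** 2 / 2.0, axis=1)

for s in [1e-1, 1e-2, 1e-3]:
    Hs = hopf_lax(g, xs, s)
    i = 250                               # a point left of the kink, descending slope 1
    print(f"s={s:.0e}   (g - H_s g)/s = {(g[i] - Hs[i]) / s:.4f}   L*(|grad- g|) = 0.5")
```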
Bibliographical notes
The proof of [127] was adapted by Lott and myself [579] to compact
length spaces (X , d) equipped with a reference measure ν that is locally
doubling and satisfies a local Poincaré inequality; see Theorem 30.28
in the last chapter of these notes. In fact the proof of Theorem 22.17,
as I have written it, is essentially a copy–paste from [579]. (A detailed
proof of Proposition 22.16 is also provided there.) Then Gozlan [429]
relaxed these assumptions even further.
If M is a compact Riemannian manifold, then the normalized vol-
ume measure on M satisfies a Talagrand (T2 ) inequality: This results
from the existence of a logarithmic Sobolev inequality [710] and The-
orem 22.17. Moreover, by [671, Theorem 4], the diameter of M can
be bounded in terms of the constant in the Talagrand inequality, the
dimension of M and a lower bound on the Ricci curvature, just as
in (21.21) (where now λ stands for the constant in the Talagrand in-
equality). (The same bound certainly holds true even if M is not a priori
assumed to be compact, but this was not explicitly checked in [671].)
There is an analogous result where Talagrand inequality is replaced by
logarithmic Sobolev inequality [544, 727].
The remarkable result according to which dimension free Gaussian
concentration bounds are equivalent to T2 inequality (Theorem 22.22)
is due to Gozlan [429]; the proof of (iii) ⇒ (i) in Theorem 22.22 is ex-
tracted from this paper. Gozlan’s argument relies on Sanov’s theorem in
large deviation theory [296, Theorem 6.2.10]; this classical result states
that the rate of deviation of the empirical measure of independent, iden-
tically distributed samples is the (Kullback) information with respect
to their common law; in other words, under adequate conditions,
$$-\frac1N\log\nu^{\otimes N}\bigl[\widehat\mu^N_x\in A\bigr] \simeq \inf\bigl\{H_\nu(\mu);\ \mu\in A\bigr\}.$$
A simple proof of the particular estimate (22.47) is provided in the
Appendix of [429]. The observation that Talagrand inequalities and
Sanov’s theorem match well goes back at least to [139]; but Gozlan’s
theorem uses this ingredient with a quite new twist.
Varadarajan’s theorem (law of large numbers for empirical mea-
sures) was already used in the proof of Theorem 5.10; it is proven for
instance in [318, Theorem 11.4.1]. It is anyway implied by Sanov’s the-
orem.
Theorem 22.10 shows that T1 is quite well understood, but many
questions remain open about the more interesting T2 inequality. One
of the most natural is the following: given a probability measure ν
R
satisfying T2, and a bounded function $v$, does $e^{-v}\,\nu/(\int e^{-v}\,d\nu)$ also
satisfy a T2 inequality? For the moment, the only partial result in this
direction is (22.29). This formula was first established by Blower [122]
and later recovered with simpler methods by Bolley and myself [140].
If one considers probability measures of the form e−V (x) dx with
V (x) behaving like |x|β for large |x|, then the critical exponents for
concentration-type inequalities are the same as we already discussed for
isoperimetric-type inequalities: If β ≥ 2 there is the T2 inequality, while
for β = 1 there is the transport inequality with linear-quadratic cost
function. What happens for intermediate values of β has been investi-
gated by Gentil, Guillin and Miclo in [410], by means of modified log-
arithmic Sobolev inequalities in the style of Bobkov and Ledoux [130].
Exponents β > 2 have also been considered in [131].
It was shown in [671] that (Talagrand) ⇒ (log Sobolev) in Rn , if
the reference measure ν is log concave (with respect to the Lebesgue
measure). It was natural to conjecture that the same argument would
work under an assumption of nonnegative curvature (say CD(0, ∞));
Theorem 22.21 shows that such is indeed the case.
It is only recently that Cattiaux and Guillin [219] produced a coun-
terexample on the real line, showing that the T2 inequality does not
necessarily imply a log Sobolev inequality. Their counterexample takes
the form dν = e−V dx, where V oscillates rather wildly at infinity,
in particular V ′′ is not bounded below. More precisely, their potential
looks like V (x) = |x|3 + 3x2 sin2 x + |x|β as x → +∞; then ν satisfies a
logarithmic Sobolev inequality only if β ≥ 5/2, but a T2 inequality as
soon as β > 2. Counterexamples with V ′′ bounded below have still not
yet been found.
Even more recently, Gozlan [425, 426, 428] exhibited a characteriza-
tion of T2 and other transport inequalities on R, for certain classes of
measures. He even identified situations where it is useful to deduce log-
arithmic Sobolev inequalities from T2 inequalities. Gentil, Guillin and
Miclo [411] considered transport inequalities on R for log-concave prob-
ability measures. This is a rather active area of research. For instance,
consider a transport inequality of the form C(µ, ν) ≤ Hν (µ), where the
cost function is c(x, y) = θ(a |x − y|), a > 0, and θ : R+ → R+ is convex
with θ(r) = r 2 for 0 ≤ r ≤ 1. If ν(dx) = e−V dx with V ′′ = o(V ′2 ) at
infinity and lim supx→+∞ θ ′ (λ x)/V ′ (x) < +∞ for some λ > 0, then
there exists a > 0 such that the inequality holds true.
$$\nu^{\otimes N}\Bigl[A + 6\sqrt r\,B_1^{d_2} + 9r\,B_1^{d_1}\Bigr] \ge 1 - \frac{e^{-r}}{\nu^{\otimes N}[A]}.$$
where (Xs )s≥0 is the symmetric diffusion process with invariant mea-
sure ν, µ = law (X0 ), and ϕ is an arbitrary Lipschitz function. (Com-
pare with Theorem 22.10(v).)
1
A related remark, which I learnt from Ben Arous, is that the logarithmic Sobolev
inequality compares the rate functions of two large deviation principles, one for
the empirical measure of independent samples and the other one for the empirical
time-averages.
23
Gradient flows I
Around the end of the nineties, Jordan, Kinderlehrer and Otto made
the important discovery that a number of well-known partial differen-
tial equations can be reformulated as gradient flows in the Wasserstein
space. The most emblematic example is that of the heat equation,
∂t ρ = ∆ρ,
(It is not a priori assumed that o(ε) is uniform in w.) Proposition 16.2
will also be useful.
(v) For any y ∈ M , and any geodesic (γs )0≤s≤1 joining γ0 = X(t)
to γ1 = y,
$$\frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(y) - \Phi(X(t)) - \int_0^1\Lambda(\gamma_s,\dot\gamma_s)\,(1-s)\,ds;$$
(vi) For any y ∈ M , and any geodesic (γs )0≤s≤1 joining γ0 = X(t)
to γ1 = y,
$$\frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(y) - \Phi(X(t)) - \lambda[\gamma]\,\frac{d(X(t),y)^2}{2},$$
Remark 23.2. As the proof will show, the equivalence between (iii),
(iv), (v) and (vi) does not require the differentiability of Φ; it is sufficient
that Φ be valued in R ∪ {+∞} and Φ(X(t)) < +∞.
Finally, Properties (iv) to (vi) are quite handy to study gradient flows
in abstract metric spaces. As a matter of fact, in the sequel I shall use
(iv) to define gradient flows in the Wasserstein space.
$$-\frac{d}{dt}\Phi(X(t)) = \bigl\langle-\nabla\Phi(X(t)),\,\dot X(t)\bigr\rangle \le \bigl|\nabla\Phi(X(t))\bigr|\,\bigl|\dot X(t)\bigr| \le \frac{|\nabla\Phi(X(t))|^2 + |\dot X(t)|^2}{2},$$
with equality if and only if ∇Φ(X(t)) and Ẋ(t) have the same norm
and opposite directions. So (i) is equivalent to (ii).
Next, if Φ is differentiable then
So (v) implies
$$\bigl\langle-\varepsilon w,\,\dot X(t)\bigr\rangle = \frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(\exp_{X(t)}\varepsilon w) - \Phi(X(t)) - \lambda[\gamma]\,\frac{d(X(t),y)^2}{2} = \Phi(\exp_{X(t)}\varepsilon w) - \Phi(X(t)) - \lambda[\gamma]\,\frac{\varepsilon^2|w|^2}{2}.$$
As a consequence,
$$\Phi(\exp_{X(t)}\varepsilon w) \ge \Phi(X(t)) + \varepsilon\,\bigl\langle w,\,-\dot X(t)\bigr\rangle + o(\varepsilon),$$
The proof is the same as for the implication (iv) ⇒ (v) in Proposi-
tion 23.1. I could have used Inequality (23.2) to define gradient flows
in metric spaces, at least for λ-convex functions; but Definition 23.7 is
more general.
Remark 23.10. Recall that Theorem 10.41 gives a list of a few conditions under which the approximate gradient $\widetilde\nabla$ can be replaced by the usual gradient $\nabla$ in the formulas above.
µs = (Tt→s )# µt .
The idea is to compose the transport Tt→s with some optimal transport;
this will not result in an optimal transport, but at least it will provide
bounds on the Wasserstein distance.
In other words, µt = law (γt ), where γt is a random solution of
γ̇t = ξt (γt ). Restricting the time-interval slightly if necessary, we may
assume that these curves are defined on the closed interval [t1 , t2 ]. Each
of these curves is continuous and thereforeR bounded.
If γ solves γ̇t = ξt (γt ), then d(γs , γt ) ≤ |ξτ (γτ )| dτ . Since (γs , γt ) is
a coupling of (µs , µt ), it follows from the very definition of W2 that
$$W_2(\mu_s,\mu_t) \le \sqrt{\mathbb E\left(\int_s^t|\xi_\tau(\gamma_\tau)|\,d\tau\right)^2} \le \sqrt{\mathbb E\Bigl[|s-t|\int_s^t|\xi_\tau(\gamma_\tau)|^2\,d\tau\Bigr]}$$
$$= \sqrt{|s-t|}\,\sqrt{\mathbb E\int_s^t|\xi_\tau(\gamma_\tau)|^2\,d\tau} = \sqrt{|s-t|}\,\sqrt{\int_s^t\!\!\int|\xi_\tau|^2\,d\mu_\tau\,d\tau} \le \frac12\left[|s-t| + \int_s^t\!\!\int|\xi_\tau|^2\,d\mu_\tau\,d\tau\right].$$
The maps $\exp(\widetilde\nabla\psi)$ and $\exp(\widetilde\nabla\widehat\psi)$ are inverse to each other in the almost sure sense. So for $\sigma(dx)$-almost all $x$, there is a minimizing geodesic connecting $T(x)$ to $x$ with initial velocity $\widetilde\nabla\psi(T(x))$; then by the formula of first variation,
$$\limsup_{s\downarrow0}\left[\frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr)^2 - d(x,T(x))^2}{2s}\right] \le -\bigl\langle\xi_t(T(x)),\,\widetilde\nabla\psi(T(x))\bigr\rangle.$$
So we should check that we can indeed pass to the lim sup in (23.9).
Let $v(s,x)$ be the integrand in the right-hand side of (23.9): If $0<s\le1$ then
$$v(s,x) = \frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr)^2 - d(x,T(x))^2}{2s} \le \frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr) - d(x,T(x))}{s}\cdot\frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr) + d(x,T(x))}{2}$$
$$\le \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)}{s}\left(d(x,T(x)) + \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)}{2}\right) \le \frac{d(x,T(x))^2}{2} + \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)^2}{s^2} =: w(s,x).$$
Note that $x\mapsto d(x,T(x))^2\in L^1(\sigma)$, since
$$\int d(x,T(x))^2\,d\sigma(x) = W_2(\sigma,\mu_t)^2 < +\infty. \tag{23.10}$$
Moreover,
$$\int\frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)^2}{s^2}\,d\sigma(x) = \int\frac{d\bigl(y,\,T_{t\to t+s}(y)\bigr)^2}{s^2}\,d\mu_t(y) \le \frac{1}{s^2}\int\left(\int_0^s\bigl|\xi_{t+\tau}(T_{t\to t+\tau}(y))\bigr|\,d\tau\right)^2 d\mu_t(y)$$
$$\le \frac1s\int\int_0^s\bigl|\xi_{t+\tau}(T_{t\to t+\tau}(y))\bigr|^2\,d\tau\,d\mu_t(y) = \frac1s\int_0^s\!\!\int\bigl|\xi_{t+\tau}(z)\bigr|^2\,d\mu_{t+\tau}(z)\,d\tau.$$
By assumption, the latter quantity converges as $s\to0$ to
$$\int|\xi_t(x)|^2\,d\mu_t(x) = \int|\xi_t(T(x))|^2\,d\sigma(x).$$
Since d(T (x), Tt→t+s (T (x)))2 /s2 −→ |ξt (T (x))|2 as s → 0, we can com-
bine this with (23.10) to deduce that
Z Z
lim sup w(s, x) dσ(x) ≤ lim w(s, x) dσ(x).
s↓0 s↓0
Under this assumption I shall show, with the same notation as in Step 1,
that t → W2 (µt , σ)2 is differentiable on the whole of (t1 , t2 ), and
$$\frac{d}{dt}\frac{W_2(\mu_t,\sigma)^2}{2} = -\int_M\bigl\langle\widetilde\nabla\psi_t,\,\xi_t\bigr\rangle\,d\mu_t.$$
dt 2 M
I shall start with some estimates on the flow Tt→s . From the as-
sumptions,
$$\frac{d}{ds}\,d\bigl(z,\,T_{t\to s}(x)\bigr) \le \bigl|\xi_s(T_{t\to s}(x))\bigr| \le C\bigl(1 + d(z,\,T_{t\to s}(x))\bigr),$$
In the sequel, the symbol C will stand for other constants that may
depend only on τ and the Lipschitz constant of ξ.
$$\le C\int(1+d(z,x))^2\,\mu_{t_0}(dx) < +\infty. \tag{23.12}$$
$$\le C\,|t-s|^2\int(1+d(z,x))^2\,\mu_t(dx);$$
$$W_2(\mu_t,\mu_s) \le C\,|t-s|.$$
For each s > 0, let T (s) be the optimal transport between σ and
µt+s . As s ↓ 0 we can extract a subsequence sk → 0, such that
$$\liminf_{s\downarrow0}\frac{W_2(\mu_{t+s},\sigma)^2 - W_2(\mu_t,\sigma)^2}{2s} = \lim_{k\to\infty}\frac{W_2(\mu_{t+s_k},\sigma)^2 - W_2(\mu_t,\sigma)^2}{2s_k}.$$
Changing signs and reasoning as in Step 1, we obtain
$$\limsup_{s\downarrow0}\frac{W_2(\mu_t,\sigma)^2 - W_2(\mu_{t+s},\sigma)^2}{2s} \le \limsup_{k\to\infty}\int v_k(x)\,\sigma(dx), \tag{23.14}$$
where
$$v_k(x) = \frac{d\bigl(x,\,T_{t+s_k\to t}\circ T^{(t+s_k)}(x)\bigr)^2 - d\bigl(x,\,T^{(t+s_k)}(x)\bigr)^2}{2s_k}.$$
Let us bound the functions $v_k$. For each $x$ and $k$, we can use (23.11) to derive
$$d\bigl(x,\,T_{t+s_k\to t}\circ T^{(t+s_k)}(x)\bigr)^2 - d\bigl(x,\,T^{(t+s_k)}(x)\bigr)^2 \le \Bigl[d\bigl(x,\,T_{t+s_k\to t}(T^{(t+s_k)}(x))\bigr) + d\bigl(x,\,T^{(t+s_k)}(x)\bigr)\Bigr]\,d\bigl(T^{(t+s_k)}(x),\,T_{t+s_k\to t}(T^{(t+s_k)}(x))\bigr)$$
$$\le C\Bigl(1 + d(z,x) + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)\Bigr)\,s_k\,d\bigl(z,\,T^{(t+s_k)}(x)\bigr) \le C\,s_k\Bigl(1 + d(z,x)^2 + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)^2\Bigr).$$
So
$$v_k(x) \le C\Bigl(1 + d(z,x)^2 + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)^2\Bigr).$$
$$W_2(\mu_s,\widehat\mu_t)^2 - W_2(\mu_{s'},\widehat\mu_t)^2 \le C\,|s-s'|$$
$$\mu_{t,k} = (e_t)_\#\Pi_k,\qquad \Pi_k(d\gamma) = \frac{1_{\gamma\in A_k}\,\Pi(d\gamma)}{\Pi[A_k]},$$
$$\frac{\partial\mu_{t,k}}{\partial t} + \nabla\cdot(\xi_t\,\mu_{t,k}) = 0. \tag{23.18}$$
But by definition µt,k is concentrated on the ball B[z, k], so in (23.18)
we may replace ξt by ξt,k = ξχk , where χk is a smooth cutoff function,
0 ≤ χk ≤ 1, χk = 1 on B[z, k], χk = 0 outside of B[z, 2k].
Let $\widehat A_k$, $\widehat Z_k$, $\widehat\mu_{t,k}$ and $\widehat\xi_{t,k}$ be defined similarly in terms of $\widehat\xi$ and $\widehat\mu_t$.
b
Since ξt,k and ξt,k are compactly supported, we may apply the result
of Step 3: for all t ∈ (t1 , t2 ),
$$\frac{d}{dt}\frac{W_2(\mu_{t,k},\widehat\mu_{t,k})^2}{2} = -\int\bigl\langle\widetilde\nabla\psi_{t,k},\,\xi_{t,k}\bigr\rangle\,d\mu_{t,k} - \int\bigl\langle\widetilde\nabla\widehat\psi_{t,k},\,\widehat\xi_{t,k}\bigr\rangle\,d\widehat\mu_{t,k}, \tag{23.19}$$
where $\exp(\widetilde\nabla\psi_{t,k})$ and $\exp(\widetilde\nabla\widehat\psi_{t,k})$ are the optimal transports $\mu_{t,k}\to\widehat\mu_{t,k}$ and $\widehat\mu_{t,k}\to\mu_{t,k}$.
Since $t\to\mu_{t,k}$ and $t\to\widehat\mu_{t,k}$ are Lipschitz paths, $W_2(\mu_{t,k},\widehat\mu_{t,k})$ is also a Lipschitz function of $t$, so (23.19) integrates up to
$$\frac{W_2(\mu_{t,k},\widehat\mu_{t,k})^2}{2} = \frac{W_2(\mu_{0,k},\widehat\mu_{0,k})^2}{2} - \int_0^t\left[\int\bigl\langle\widetilde\nabla\psi_{s,k},\,\xi_{s,k}\bigr\rangle\,d\mu_{s,k} + \int\bigl\langle\widetilde\nabla\widehat\psi_{s,k},\,\widehat\xi_{s,k}\bigr\rangle\,d\widehat\mu_{s,k}\right]ds. \tag{23.20}$$
$$\frac{W_2(\mu_t,\widehat\mu_t)^2}{2} = \frac{W_2(\mu_0,\widehat\mu_0)^2}{2} - \int_0^t\left[\int\bigl\langle\widetilde\nabla\psi_s,\,\xi_s\bigr\rangle\,d\mu_s + \int\bigl\langle\widetilde\nabla\widehat\psi_{s,k},\,\widehat\xi_{s,k}\bigr\rangle\,d\widehat\mu_{s,k}\right]ds, \tag{23.21}$$
Then
$$\liminf_{t\downarrow0}\frac{U_\nu(\mu_t)-U_\nu(\mu)}{t} \ge \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu; \tag{23.25}$$
and
$$U_\nu(\sigma) \ge U_\nu(\mu) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu + K_{N,U}\int_0^1\!\!\int|\widetilde\nabla\psi_t(x)|^2\,\rho_t(x)^{1-\frac1N}\,\nu(dx)\,(1-t)\,dt. \tag{23.26}$$
so
$$\mathcal J_{0\to t}(x) = e^{-[V(\exp_x t\nabla\psi(x)) - V(x)]}\,\det\bigl(d_x\exp(t\nabla\psi)\bigr) = \bigl(1 - t\,\nabla V(x)\cdot\nabla\psi(x)\bigr)\bigl(1 + t\,\Delta\psi(x)\bigr) + o(t) = 1 + t\,(L\psi)(x) + o(t). \tag{23.32}$$
$$\frac{d^2u(t,x)}{dt^2} \ge K_{N,U}\,\rho_t\bigl(\exp_x t\nabla\psi(x)\bigr)^{-\frac1N}\,\bigl|\nabla\psi_t\bigl(\exp_x t\nabla\psi(x)\bigr)\bigr|^2,$$
and by assumption $K_{N,U}$ is finite. Note that $|\nabla\psi_t(\exp_x t\nabla\psi(x))| = d(x,T(x))$, which is bounded ($\mu$-almost surely) by the maximum distance between points in the support of $\mu$ and points in the support of $\nu$. So there is a positive constant $C$ such that
$$\frac{d^2u(t,x)}{dt^2} \ge -C\,\rho_t\bigl(\exp_x t\nabla\psi(x)\bigr)^{-\frac1N}. \tag{23.34}$$
Let
$$R(t,x) = C\int_0^t\!\!\int_0^s\rho_\tau\bigl(\exp_x\tau\nabla\psi(x)\bigr)^{-\frac1N}\,d\tau\,ds; \tag{23.35}$$
$$\widetilde u(t,x) = u(t,x) + R(t,x);\qquad \widetilde w(t,x) = \frac{\widetilde u(t,x)-\widetilde u(0,x)}{t}.$$
$$\exp_x\bigl(\widetilde\nabla\psi(x)\bigr) = \exp_x\bigl(\nabla\psi(x)\bigr),\qquad \mu(dx)\text{-almost surely}. \tag{23.39}$$
Also recall from Remark 10.32 that $\mu$-almost surely, $x$ and $T(x) = \exp_x(\nabla\psi(x))$ are joined by a unique geodesic. So (23.39) implies
$$\widetilde\nabla\psi(x) = \nabla\psi(x),\qquad \mu(dx)\text{-almost surely}.$$
$$\frac{U_\nu(\mu_t)-U_\nu(\mu_0)}{t} \le U_\nu(\mu_1) - U_\nu(\mu_0) - K_{N,U}\int_0^1\!\!\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx)\,\frac{G(s,t)}{t}\,ds. \tag{23.48}$$
0 t
Then we can use Steps 1 and 2 to pass to the lim inf in the left-hand
side.
As for the right-hand side of (23.48), we note that if B is a large
ball containing the supports of all ρs (0 ≤ s ≤ 1), and D = diam (B),
then
$$\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx) \le D^2\int\rho_s^{1-\frac1N}\,d\nu \le D^2\,\nu[B]^{\frac1N} < +\infty.$$
(Here I have used Jensen’s inequality again as in Step 1.) So the quan-
tity inside brackets in the right-hand side of (23.48) is uniformly (in t)
bounded. Since G(s, t)/t converges to 1 − s in L1 (ds) as t → 0, we can
pass to the limit in the right-hand side of (23.48) too. The final result
is
$$\int\bigl\langle\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu \le U_\nu(\mu_1) - U_\nu(\mu_0) - K_{N,U}\int_0^1\!\!\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx)\,(1-s)\,ds,$$
• If $\rho_0\ge A$ and $\chi_k\rho_0\le B$, then $\chi_k\,p'(\chi_k\rho_0) \le C\,\chi_k^{1/N} \le C\,B^{1/N}\rho_0^{-1/N}$, so
$$|\widetilde\nabla\psi|\,\chi_k\,p'(\chi_k\rho_0)\,|\nabla\rho_0| \le C\,|\widetilde\nabla\psi|\,\rho_0^{-\frac1N}\,|\nabla\rho_0| \le C\,|\widetilde\nabla\psi|\,p'(\rho_0)\,|\nabla\rho_0|.$$
To summarize: In all cases, $|\widetilde\nabla\psi|\,\chi_k\,p'(\chi_k\rho_0)\,|\nabla\rho_0|$ is bounded by $C\,|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|$, and the latter quantity is integrable since
$$\int|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|\,d\nu \le \sqrt{\int\rho_0\,|\widetilde\nabla\psi|^2\,d\nu}\;\sqrt{\int\frac{|\nabla p(\rho_0)|^2}{\rho_0}\,d\nu} = W_2(\mu_0,\mu_1)\,\sqrt{I_{U,\nu}(\mu_0)}. \tag{23.55}$$
So we can pass to the limit in the third term of (23.49), and the proof
is complete.
Case 2: Spt(µ1 ) is not compact. In this case we shall definitely
not use the standard approximation scheme by restriction, but instead
a more classical procedure of smooth truncation.
Again let χ : R+ → R+ be a smooth nondecreasing function with
0 ≤ χ ≤ 1, χ(r) = 1 for r ≤ 1, χ(r) = 0 for r ≥ 2; now we require in
addition that (χ′ )2 /χ is bounded. (It is easy to construct such a cutoff
function rather explicitly.) Then we define $\chi_k(x) = \chi(d(z,x)/k)$, where $z$ is an arbitrary point in $M$. For $k$ large enough, $Z_{1,k} := \int\chi_k\,d\mu_1 \ge 1/2$. Then we choose $\ell=\ell(k)$ large enough that $Z_{0,k} := \int\chi_\ell\,d\mu_0$ is
larger than Z1,k ; this is possible since Z1,k < 1 (otherwise µ1 would be
compactly supported). Then we let
χℓ(k) µ0 χk µ 1
µ0,k = ; µ1,k = .
Z0,k Z1,k
For each k, these are two compactly supported, absolutely continuous
probability measures; let (µt,k )0≤t≤1 be the displacement interpolation
joining them, and let ρt,k be the density of µt,k . Further, let ψk be a
d2 /2-convex function so that exp(∇ψk ) is the optimal (Monge) trans-
port µ0 and µ1 , and let ψt,k be deduced from ψk by the Hamilton–Jacobi
forward semigroup.
Note carefully: It is obvious from the construction that Z0,k ρ0,k ↑ ρ0 ,
Z1,k ρ1,k ↑ ρ1 , but there is a priori no monotone convergence relating ρt,k
to ρt ! Instead, we have the following information. Since µ0,k → µ0 and
µ1,k → µ1 , Corollary 7.22 shows that the geodesic curves (µt,k )0≤t≤1
converge, up to extraction of a subsequence, to some geodesic curve
(µt,∞ )0≤t≤1 joining µ0 to µ1 . (The convergence holds true for all t.)
Since (µt ) is the unique such curve, actually µt,∞ = µt , which shows
that µt,k converges weakly to µt for all t ∈ [0, 1].
For each $k$, we can apply the results of Step 4 with $U$ replaced by $U_k = U(Z_{1,k}\,\cdot\,)$ and $\mu_t$ replaced by $\mu_{t,k}$:
$$U_{k,\nu}(\mu_{1,k}) \ge U_{k,\nu}(\mu_{0,k}) + \int\bigl\langle\nabla\psi_k,\,\nabla p_k(\rho_{0,k})\bigr\rangle\,d\nu + K_{N,U_k}\int_0^1\!\!\int_M\rho_{t,k}(x)^{1-\frac1N}\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx)\,(1-t)\,dt, \tag{23.56}$$
where pk (r) = r Uk′ (r) − Uk (r). The problem is to pass to the limit as
k → ∞. We shall consider all four terms in (23.56) separately, and use
a few results which will be proven later on in Part III of this course (in
a more general context).
First term of (23.56): Since $U_{k,\nu}(\mu_{1,k}) = \int U(Z_{1,k}\,\rho_{1,k})\,d\nu$ and
Z1,k ρ1,k = χk ρ1 converges monotonically to ρ1 , the same arguments
as in the proof of Theorem 17.15 apply to show that
(23.62)
Observe that:
R
(a) $\int\rho_{0,k}\,|\nabla\psi_k|^2\,d\nu = W_2(\mu_{0,k},\mu_{1,k})^2$. To prove that this converges to $W_2(\mu_0,\mu_1)^2$, by Corollary 6.11 it suffices to check that
$$W_2(\mu_{0,k},\mu_0)\longrightarrow0;\qquad W_2(\mu_{1,k},\mu_1)\longrightarrow0;$$
Plugging back the results of (a), (b) and (c) into (23.62), we obtain
$$\int\rho_{0,k}\,\bigl|\nabla\psi_k - \widetilde\nabla\psi\bigr|^2\,d\nu \xrightarrow[k\to\infty]{} 0. \tag{23.63}$$
I claim that
$$\int_0^1\!\!\int\rho_t(x)^{1-\frac1N}\,|\widetilde\nabla\psi_t(x)|^2\,\nu(dx)\,(1-t)\,dt \ge \limsup_{k\to\infty}\int_0^1\!\!\int\rho_{t,k}(x)^{1-\frac1N}\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx)\,(1-t)\,dt. \tag{23.65}$$
and this quantity is positive if δ is chosen small enough and then C large
enough. Then the event E has zero Πk ⊗ Πk measure since (e0 , e1 )# Πk
is cyclically monotone (Theorems 5.10 and 7.21). So the left-hand side
in (23.70) vanishes, and (23.67) is true. A similar result holds for µt,k
and ∇ψt,k replaced by µt and ∇ψ e t , respectively.
Let $Z_k = \int\chi_\ell\,d\mu_{t,k}$ (which is positive if $\ell$ is large enough), and let $\widehat\Pi_k$ be the probability measure on geodesics defined by
$$\widehat\Pi_k(d\gamma) = \frac{\chi_\ell(\gamma_t)\,\Pi_k(d\gamma)}{Z_k}.$$
On the other hand, $\int\rho_{t,k}^{1-1/N}\,d\nu$ is bounded independently of $k$ (by Theorem 17.8; the moment condition used there is weaker than the one which is presently enforced). This and (23.71) imply
$$\int\chi_\ell(x)\,\rho_{t,k}(x)^{1-\frac1N}\,|w_k(x)-w(x)|\,\nu(dx) \xrightarrow[k\to\infty]{} 0. \tag{23.72}$$
(see Theorem 29.20 later in Chapter 29 and change signs; note that
χℓ w ν is a compactly supported measure).
Combining (23.72) and (23.73) yields
$$\limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx) = \limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,w_k(x)\,\nu(dx)$$
$$= \limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,w(x)\,\nu(dx) \le \int\rho_t(x)^{1-\frac1N}\,\chi_\ell(x)\,w(x)\,\nu(dx) = \int\rho_t(x)^{1-\frac1N}\,\chi_\ell(x)\,|\widetilde\nabla\psi_t(x)|^2\,\nu(dx).$$
This completes the proof of (23.66). Then we can at last pass to the
lim sup in the fourth term of (23.56).
Let us recapitulate: In this step we have shown that
• if N = ∞, then
$$U_\nu(\mu_1) \ge U_\nu(\mu_0) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu + K_{\infty,U}\,\frac{W_2(\mu_0,\mu_1)^2}{2};$$
• if $N<\infty$ (or $N=\infty$) and $K\ge0$, then
$$U_\nu(\mu_1) \ge U_\nu(\mu_0) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu;$$
$$1_{a\le\rho_0\le b}\,\nabla p_\ell(\rho_0) = p'_\ell(\rho_0)\,1_{a\le\rho_0\le b}\,\nabla\rho_0 \xrightarrow[\ell\to\infty]{} 1_{a\le\rho_0\le b}\,\nabla p(\rho_0).$$
This proves that ∇pℓ (ρ0 ) converges almost surely to ∇p(ρ0 ) on each
set {r0 < a ≤ ρ0 ≤ b}, and therefore on the whole of {ρ0 > r0 }. On the
other hand, if ρ0 ≤ r0 then p(ρ0 ) = 0, so ∇p(ρ0 ) vanishes almost surely
on {ρ0 ≤ r0 } (this is a well-known theorem from distribution theory;
see the bibliographical notes in case of need), and also ∇pℓ (ρ0 ) = 0 on
that set. This proves the almost everywhere convergence of ∇pℓ (ρ0 ) to
∇p(ρ0 ). At the same time, this reasoning proves (23.74).
So to pass to the limit in (23.75) it suffices to prove that the inte-
grand is dominated by an integrable function. But
$$\bigl\langle\widetilde\nabla\psi,\,\nabla p_\ell(\rho_0)\bigr\rangle \le |\widetilde\nabla\psi|\,|\nabla p_\ell(\rho_0)| \le C\,|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|$$
The first part of the proof of Proposition 17.24(ii) shows that the ex-
pression inside brackets is uniformly bounded as soon as, say, t ≤ 1/2.
So
$$U_\nu(\mu_t) \ge U_\nu(\mu_0) + t\int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu - O(t^2)\qquad\text{as }t\to0,$$
$$\frac{\partial\rho_t}{\partial t} = L\,p(\rho_t), \tag{23.79}$$
and let $\mu_t = \rho_t\nu$. Assume that $U_\nu(\mu_t)<+\infty$ for all $t>0$; and that for all $0<t_1<t_2$,
$$\int_{t_1}^{t_2} I_{U,\nu}(\mu_t)\,dt < +\infty.$$
Stability
$$W_2(\mu_t,\widehat\mu_t) \le e^{-\lambda t}\,W_2(\mu_0,\widehat\mu_0).$$
$$\int\bigl\langle\widetilde\nabla\psi_t,\,\nabla U'(\rho_t)\bigr\rangle\,d\mu_t \le U_\nu(\widehat\mu_t) - U_\nu(\mu_t) - \lambda\,\frac{W_2(\mu_t,\widehat\mu_t)^2}{2}. \tag{23.83}$$
Similarly,
$$\int\bigl\langle\widetilde\nabla\widehat\psi_t,\,\nabla U'(\widehat\rho_t)\bigr\rangle\,d\widehat\mu_t \le U_\nu(\mu_t) - U_\nu(\widehat\mu_t) - \lambda\,\frac{W_2(\widehat\mu_t,\mu_t)^2}{2}. \tag{23.84}$$
This last section evokes some key issues which I shall not develop,
although they are closely related to the material in the rest of this
chapter.
There is a general theory of gradient flows in metric spaces, based
for instance on Definition 23.7, or other variants appearing in Propo-
sition 23.1. Motivations for these developments come from both pure
and applied mathematics. This theory was pushed to a high degree
of sophistication by many researchers, in particular De Giorgi and his
school. A key role is played by discrete-time approximation schemes,
the simplest of which can be stated as follows:
1. Choose your initial datum X0 ;
2. Choose a time step τ , which in the end will decrease to 0;
3. Let $X_1^{(\tau)}$ be a minimizer of $X\longmapsto \Phi(X) + \dfrac{d(X_0,X)^2}{2\tau}$; then define inductively $X_{k+1}^{(\tau)}$ as a minimizer of $X\longmapsto \Phi(X) + \dfrac{d(X_k^{(\tau)},X)^2}{2\tau}$.
4. Pass to the limit in $X_k^{(\tau)}$ as $\tau\to0$, $k\tau\to t$, hopefully recovering a function $X(t)$ which is the value of the gradient flow at time $t$ (a minimal finite-dimensional sketch of this scheme is given right after this list).
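Here is a minimal numerical sketch of steps 1–4 above. It assumes, purely for illustration, that the metric space is just ℝ² with a smooth convex Φ; the genuinely interesting case, where the metric space is the Wasserstein space, requires much more work and is not attempted here.

```python
# Minimizing-movement (proximal) scheme for a gradient flow in R^2.
import numpy as np
from scipy.optimize import minimize

def phi(x):
    # example potential: a quadratic, whose gradient flow is x'(t) = -A x
    A = np.diag([1.0, 4.0])
    return 0.5 * x @ A @ x

def step(x_prev, tau):
    # one step of the scheme: minimize Phi(X) + |X - X_prev|^2 / (2 tau)
    obj = lambda x: phi(x) + np.sum((x - x_prev) ** 2) / (2.0 * tau)
    return minimize(obj, x_prev).x

def discrete_flow(x0, tau, n_steps):
    xs = [np.array(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(step(xs[-1], tau))
    return np.array(xs)

if __name__ == "__main__":
    # compare with the exact gradient flow x(t) = exp(-A t) x0 at t = 1
    x0, tau, T = np.array([1.0, 1.0]), 0.01, 1.0
    path = discrete_flow(x0, tau, int(T / tau))
    exact = np.array([np.exp(-1.0 * T), np.exp(-4.0 * T)]) * x0
    print("discrete scheme at t=1:", path[-1])
    print("exact gradient flow   :", exact)
```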
$$U_\nu(\mu_{t+dt}) - U_\nu(\mu_t) + \frac{W_2(\mu_t,\mu_{t+dt})^2}{2\,dt}.$$
By using the interpretation of the Wasserstein distance between infinitesimally close probability measures, this can also be rewritten as
$$\frac{W_2(\mu_t,\mu_{t+dt})^2}{dt^2} \simeq \inf\left\{\int|v|^2\,d\mu_t;\quad \frac{\partial\mu}{\partial t} + \nabla\cdot(\mu v) = 0\right\}.$$
Kdt − dS,
The following important lemma was used in the proof of Theorems 23.9
and 23.26.
$$\sup_{0\le s\le T}\bigl|F(s,t) - F(s,t')\bigr| \le \int_t^{t'}v(\tau)\,d\tau.$$
Bibliographical notes
and in [814, Theorem 5.30] for general functions U ; all these references
only consider M = Rn . The procedure of extension of ∇ψ (Step 2)
appears e.g. in [248, Proof of Theorem 2] (in the particular case of
convex functions). The integration by parts of Step 3 appears in many
papers; under adequate assumptions, it can be justified in the whole of
Rn (without any assumption of compact support): see [248, Lemma 7],
[214, Lemma 5.12], [30, Lemma 10.4.5]. The proof in [214, 248] relies on
the possibility to find an exhaustive sequence of cutoff functions with
Hessian uniformly bounded, while the proof in [30] uses the fact that
in Rn , the distance to a convex set is a convex function. None of these
arguments seems to apply in more general noncompact Riemannian
manifolds (the second proof would probably work in nonnegative cur-
vature), so I have no idea whether the integration by parts in the proof
of Theorem 23.14 could be performed without compactness assump-
tion; this is the reason why I went through the painful2 approximation
procedure used in the end of the proof of Theorem 23.14.
It is interesting to compare the two strategies used in the exten-
sion from compact to noncompact situations, in Theorem 17.15 on the
one hand, and in Theorem 23.14 on the other. In the former case,
I could use the standard approximation scheme of Proposition 13.2,
with an excellent control of the displacement interpolation and the op-
timal transport. But for Theorem 23.14, this seems to be impossible
because of the need to control the smoothness of the approximation of
ρ0 ; as a consequence, passing to the limit is more delicate. Further, note
that Theorem 17.15 was used in the proof of Theorem 23.14, since it is
the convexity properties of Uν along displacement interpolation which
allows us to go back and forth between the integral and the differential
(in the t variable) formulations.
The argument used to prove that the first term of (23.61) converges
to 0 is reminiscent of the well-known argument from functional analysis,
according to which convergence in weak L2 combined with convergence
of the L2 norm imply convergence in strong L2 .
1,1
At some point I have used the following theorem: If u ∈ Wloc (M ),
then for any constant c, ∇u = 0 almost everywhere on {u = c}. This
classical result can be found e.g. in [554, Theorem 6.19].
Another strategy to attack Theorem 23.14 would have been to start
1/N
from the “curve-above-tangent” formulation of the convexity of Jt ,
where Jt is the Jacobian determinant. (Instead I used the “curve-below-
2
Still an intense experience!
Here I have assumed that Φ is bounded below (which is the case when
Φ is the free energy functional). When inf Φ = −∞, there are still esti-
mates of the same type, only quite more complicated [30, Section 3.2].
Ambrosio and Savaré recently found a simplified proof of error esti-
mates and convergence for time-discretized gradient flows [34].
Otto applied the same method to various classes of nonlinear dif-
fusion equations, including porous medium and fast diffusion equa-
tions [669], and parabolic p-Laplace type equations [666], but also more
exotic models [667, 668] (see also [737]). For background about the
theory of porous medium and fast diffusion equations, the reader may
consult the review texts by Vázquez [804, 806].
In his work about porous medium equations, Otto also made two
important conceptual contributions: First, he introduced the abstract
formalism allowing him to interpret these equations as gradient flows,
directly at the continuous level (without going through the time-
discretization). Secondly, he showed that certain features of the porous
medium equations (qualitative behavior, rates of convergence to equi-
librium) were best seen via the new gradient flow interpretation. The
This looked a bit like a formal game, but it was later found out
that related equations were common in the physical literature about
flux-limited diffusion processes [627], and that in fact Brenier’s very
equation had already been considered by Rosenau [708]. A rigorous
treatment of these equations leads to challenging analytical difficulties,
which triggered several recent technical works, see e.g. [39, 40, 618] and
the references therein. By the way, the cost function $1-\sqrt{1-|x-y|^2}$
was later found to have applications in the design of lenses [712].
Lemma 23.28 in the Appendix is borrowed from [30, Lemma 4.3.4].
As Ambrosio pointed out to me, the argument is quite reminiscent of
Kruzkhov’s doubling method for the proof of uniqueness in the the-
ory of scalar conservation laws, see for instance the nice presentation
in [327, Sections 10.2 and 11.4]. It is important to note that the al-
most everywhere differentiability of F in both variables separately is
not enough to apply this lemma.
The stability theorem (Theorem 23.26) is a particular case of more
abstract general results, see for instance [30, Theorem 4.0.4(iv)].
In their study of gradient flows, Ambrosio, Gigli and Savaré point
out that it is useful to construct curves (µt ) satisfying the convexity-
type inequality
acceleration rather than the squared velocity; and there are heuristic
arguments to believe that this system should converge to a physical
solution satisfying Dafermos’s entropy criterion (the physical energy,
which is formally conserved, should decrease as much as possible). Nu-
merical simulations based on this scheme perform surprisingly well, at
least in dimension 1.
In the big picture also lies the work of Nelson [201, 647, 648, 650,
651, 652] on the foundations of quantum mechanics. Nelson showed that
the usual Schrödinger equation can be derived from a principle of least
action over solutions of a stochastic differential equation, where the
noise is fixed but the drift is unknown. Other names associated with this
approach are Guerra, Morato and Carlen. The reader may consult [343,
Chapter 5] for more information. In Chapter 7 of the same reference,
I have briefly made the link with the optimal transport problem. Von
Renesse [826] explicitly reformulated the Schrödinger equation as a
Hamiltonian system in Wasserstein space.
A more or less equivalent way to see Nelson’s point of view (ex-
plained to me by Carlen) is to study the critical points of the action
$$\mathcal A(\rho,m) = \int_0^1\bigl[K(\rho_t,m_t) - F(\rho_t)\bigr]\,dt, \tag{23.93}$$
Calculation rules
Having put equation (24.1) in gradient flow form, one may use Otto’s
calculus to shortcut certain formal computations, and quickly get rel-
evant results, without risks of computational errors. When it comes
to rigorous justification, things however are not so nice, and regular-
ity issues should — alas! — be addressed.1 For the most important of
these gradient flows, such as the heat, Fokker–Planck or porous medium
equations, these regularity issues are nowadays under good control.
To avoid inflating the size of this chapter much further, I shall not
go into these regularity issues, and be content with theorems that will
be conditional to the regularity of the solution.
(f) $\rho$, $p(\rho)$, $Lp(\rho)$, $p_2(\rho)$, $\nabla p_2(\rho)$, $U'(\rho)$, $\nabla U'(\rho)$, $LU'(\rho)$, $\nabla LU'(\rho)$, $L|\nabla U'(\rho)|^2$, $L(\nabla U'(\rho)\cdot\nabla LU'(\rho))$ and $e^{-V}$ satisfy adequate growth/decay conditions at infinity.
Then the following formulas hold true:
(i) $\forall t>0$, $\displaystyle \frac{d}{dt}\int A(\rho_t)\,d\nu = -\int p'(\rho_t)\,A''(\rho_t)\,|\nabla\rho_t|^2\,d\nu$;
(ii) $\forall t>0$, $\displaystyle \frac{d}{dt}U_\nu(\mu_t) = -I_{U,\nu}(\mu_t)$;
(iii) $\forall t>0$,
$$\frac{d}{dt}I_{U,\nu}(\mu_t) = -2\int_M\Bigl[\|\nabla^2U'(\rho_t)\|_{HS}^2 + \bigl(\mathrm{Ric}+\nabla^2V\bigr)(\nabla U'(\rho_t))\Bigr]\,p(\rho_t)\,d\nu - 2\int_M\bigl(LU'(\rho_t)\bigr)^2\,p_2(\rho_t)\,d\nu;$$
(iv) $\forall\sigma\in P_2^{\mathrm{ac}}(M)$, $\displaystyle \frac{d}{dt}W_2(\sigma,\mu_t) \le \sqrt{I_{U,\nu}(\mu_t)}$ for almost all $t>0$.
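Statement (ii) is easy to test numerically in the simplest case U(r) = r log r, V = 0, M = the flat torus: along the heat flow the entropy should decay at the rate given by the Fisher information. The following finite-difference sketch is not part of the text; the grid sizes and the initial density are arbitrary illustrative choices.

```python
# Check dH/dt = -I along the 1-D periodic heat equation, H = int rho log rho,
# I = int |grad rho|^2 / rho.
import numpy as np

n, dx, dt = 256, 1.0 / 256, 1e-6
x = np.arange(n) * dx
rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)        # smooth positive probability density

def lap(f):                                     # periodic finite-difference Laplacian
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

def grad(f):                                    # centered periodic gradient
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def entropy(r):
    return np.sum(r * np.log(r)) * dx

def fisher(r):
    return np.sum(grad(r) ** 2 / r) * dx

for step in range(2000):
    H_before = entropy(rho)
    rho = rho + dt * lap(rho)                   # explicit Euler step of d rho/dt = Lap rho
    if step % 500 == 0:
        dH_dt = (entropy(rho) - H_before) / dt
        print(f"t={step*dt:.1e}  dH/dt={dH_dt:+.4f}   -I={-fisher(rho):+.4f}")
```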
which is (ii).
Next, we can differentiate the previous expression once again along
the gradient flow µ̇ = −gradUν (µ):
$$\frac{d}{dt}\bigl\|\mathrm{grad}_{\mu_t}U_\nu\bigr\|^2 = -2\,\bigl\langle \mathrm{Hess}_{\mu_t}U_\nu\cdot\mathrm{grad}_{\mu_t}U_\nu,\ \mathrm{grad}_{\mu_t}U_\nu\bigr\rangle,$$
and then (iii) follows from Formula 15.7.
As for (iv), this is just a particular case of the general formula
(d/dt)d(X0 , γ(t)) ≤ |γ̇(t)|γ(t) .
⊓
⊔
$$= \int U'(\rho)\,Lp(\rho)\,d\nu = \int LU'(\rho)\,p(\rho)\,d\nu, \tag{24.2}$$
$$+ \int LU'(\rho_t)\,p'(\rho_t)\,\nabla_\nu\cdot\bigl(\rho_t\,\nabla U'(\rho_t)\bigr)\,d\nu. \tag{24.3}$$
The last two terms in this formula are actually equal: Indeed, if $\rho$ is smooth then
$$\int\rho\,U''(\rho)\,LU'(\rho)\,Lp(\rho)\,d\nu = \int p'(\rho)\,LU'(\rho)\,\nabla_\nu\cdot\bigl(\rho\,\nabla U'(\rho)\bigr)\,d\nu.$$
where $\exp(\widetilde\nabla\psi)$ is the optimal transport $\mu_t\to\sigma$. It follows that
$$\frac{d^+}{dt}\frac{W_2(\mu_t,\sigma)^2}{2} \le \sqrt{\int\frac{|\nabla p(\rho_t)|^2}{\rho_t}\,d\nu}\;\sqrt{\int\rho_t\,|\widetilde\nabla\psi|^2\,d\nu} = \sqrt{I_{U,\nu}(\mu_t)}\;W_2(\mu_t,\sigma);$$
Large-time behavior
Otto’s calculus, described in Chapter 15, was first developed to estimate
rates of equilibration for certain nonlinear diffusion equations. The next
theorem illustrates this.
Theorem 24.7 (Equilibration in positive curvature). Let M be
a Riemannian manifold equipped with a reference measure ν = e−V ,
V ∈ C 4 (M ), satisfying a curvature-dimension bound CD(K, N ) for
some K > 0, N ∈ (1, ∞], and let U ∈ DCN . Then:
(i) (exponential convergence to equilibrium) Any smooth so-
lution (µt )t≥0 of (24.1) satisfies the following estimates:
(a) $U_\nu(\mu_t) - U_\nu(\nu) \le e^{-2K\lambda t}\,\bigl[U_\nu(\mu_0) - U_\nu(\nu)\bigr]$
(b) $I_{U,\nu}(\mu_t) \le e^{-2K\lambda t}\,I_{U,\nu}(\mu_0)$ \hfill (24.5)
(c) $W_2(\mu_t,\nu) \le e^{-K\lambda t}\,W_2(\mu_0,\nu)$,
where
$$\lambda := \lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\,\Bigl(\sup_{x\in M}\rho_0(x)\Bigr)^{-\frac1N}. \tag{24.6}$$
In particular, $\lambda$ is independent of $\rho_0$ if $N=\infty$.
(ii) (exponential contraction) Any two smooth solutions $(\mu_t)_{t\ge0}$ and $(\widetilde\mu_t)_{t\ge0}$ of (24.1) satisfy
$$W_2(\mu_t,\widetilde\mu_t) \le e^{-K\lambda t}\,W_2(\mu_0,\widetilde\mu_0), \tag{24.7}$$
where
$$\lambda := \lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\,\Bigl[\max\Bigl(\sup_{x\in M}\rho_0(x),\ \sup_{x\in M}\widetilde\rho_0(x)\Bigr)\Bigr]^{-\frac1N}. \tag{24.8}$$
Remark 24.10. The rate of decay O(e−λ t ) is optimal for (24.9) if di-
mension is not taken into account; but if N is finite, the optimal rate
of decay is O(e−λt ) with λ = KN/(N − 1). The method presented in
this chapter is not clever enough to catch this sharp rate.
$$\sup\rho_s \le \max\bigl(\sup\rho_t,\ \sup\widetilde\rho_t\bigr) \le \max\bigl(\sup\rho_0,\ \sup\widetilde\rho_0\bigr), \tag{24.12}$$
On the other hand, Theorem 23.9 shows that $(d/dt)(W_2(\mu_t,\widetilde\mu_t)^2/2)$ is equal to the left-hand side of (24.15), for almost all $t$. We conclude that
$$\frac{d^+}{dt}W_2(\mu_t,\widetilde\mu_t)^2 \le -2K\lambda\,W_2(\mu_t,\widetilde\mu_t)^2, \tag{24.16}$$
and the desired result follows. ⊓⊔
Remark 24.13. Here is an alternative scheme of proof for Theo-
rem 24.7(ii). The problem is to estimate
$$\int\bigl\langle\nabla U'(\rho_t),\,\nabla\psi\bigr\rangle\,d\mu_t + \int\bigl\langle\nabla U'(\widetilde\rho_t),\,\nabla\widetilde\psi\bigr\rangle\,d\widetilde\mu_t.$$
Short-time behavior
A popular and useful topic in the study of diffusion processes consists
in establishing regularization estimates in short time. Typically, a
certain functional used to quantify the regularity of the solution (for
instance, the supremum of the unknown or some Lebesgue or Sobolev
norm) is shown to be bounded like O(t−κ ) for some characteristic expo-
nent κ, independent of the initial datum (or depending only on certain
weak estimates on the initial datum), when t > 0 is small enough. Here
I shall present some slightly unconventional estimates of this type.
Theorem 24.16 (Short-time regularization for gradient flows).
Let M be a Riemannian manifold satisfying a curvature-dimension
bound CD(K, ∞), K ∈ R; let ν = e−V vol ∈ P2 (M ), with V ∈ C 4 (M ),
and let U ∈ DC∞ with U (1) = 0. Further, let (µt )t≥0 be a smooth
solution of (24.1). Then:
(i) If K ≥ 0 then for any t ≥ 0,
$$t^2\,I_{U,\nu}(\mu_t) + 2t\,U_\nu(\mu_t) + W_2(\mu_t,\nu)^2 \le W_2(\mu_0,\nu)^2.$$
In particular,
$$U_\nu(\mu_t) \le \frac{W_2(\mu_0,\nu)^2}{2t}, \tag{24.18}$$
$$I_{U,\nu}(\mu_t) \le \frac{W_2(\mu_0,\nu)^2}{t^2}. \tag{24.19}$$
(ii) If $K\ge0$ and $t\ge s>0$, then
$$W_2(\mu_s,\mu_t) \le \min\Bigl(\sqrt{2\,U_\nu(\mu_s)\,|t-s|},\ \sqrt{I_{U,\nu}(\mu_s)}\,|t-s|\Bigr) \tag{24.20}$$
$$\le W_2(\mu_0,\nu)\,\min\Bigl(\frac{\sqrt{|t-s|}}{\sqrt s},\ \frac{|t-s|}{s}\Bigr). \tag{24.21}$$
(iii) If $K<0$, the previous conclusions become
$$U_\nu(\mu_t) \le \frac{e^{2Ct}\,W_2(\mu_0,\nu)^2}{2t};\qquad I_{U,\nu}(\mu_t) \le \frac{e^{2Ct}\,W_2(\mu_0,\nu)^2}{t^2};$$
$$W_2(\mu_s,\mu_t) \le e^{Ct}\,\min\Bigl(\sqrt{2\,U_\nu(\mu_s)\,|t-s|},\ \sqrt{I_{U,\nu}(\mu_s)}\,|t-s|\Bigr) \le e^{2Ct}\,W_2(\mu_0,\nu)\,\min\Bigl(\frac{\sqrt{|t-s|}}{\sqrt s},\ \frac{|t-s|}{s}\Bigr),$$
with $C=-K$.
Remark 24.21. I would bet that the estimates in (24.22) are optimal
in general (although they would deserve more thinking) as far as the
dependence on µ0 and t is concerned. On the other hand, if µ0 is given,
these bounds are terrible estimates for the short-time behavior of the
Kullback and Fisher informations as functions of just t. Indeed, the
correct scale for the Kullback information Hν (µt ) is O(log(1/t)), and
for the Fisher information it is O(1/t), as can be checked easily in the
particular case when M = Rn and ν is the Gaussian measure.
$$\frac{d^+}{dt}W_2(\mu_t,\nu)^2 \le -2\,U_\nu(\mu_t). \tag{24.25}$$
Now introduce $\psi(t) = a(t)\,I_{U,\nu}(\mu_t) + b(t)\,U_\nu(\mu_t) + c(t)\,W_2(\mu_t,\nu)^2$; then
$$\frac{d^+\psi}{dt} \le \bigl[a'(t)-b(t)\bigr]\,I_{U,\nu}(\mu_t) + \bigl[b'(t)-2c(t)\bigr]\,U_\nu(\mu_t) + c'(t)\,W_2(\mu_t,\nu)^2.$$
If we choose $a(t)=t^2$, $b(t)=2t$, $c(t)=1$, all three brackets vanish, so $\psi$ is nonincreasing; this proves (i). Next, for $t\ge s$,
$$\frac{d^+}{dt}W_2(\mu_s,\mu_t)^2 \le 2\,\bigl[U_\nu(\mu_s) - U_\nu(\mu_t)\bigr] \le 2\,U_\nu(\mu_s).$$
So
$$W_2(\mu_s,\mu_t)^2 \le 2\,U_\nu(\mu_s)\,|t-s|. \tag{24.27}$$
Then (ii) results from the combination of (24.26) and (24.27), together with (i).
The proof of (iii) is pretty much the same, with the following modifications:
$$\frac{d\,I_{U,\nu}(\mu_t)}{dt} \le (-2K)\,I_{U,\nu}(\mu_t);\qquad \frac{d^+}{dt}W_2(\mu_t,\nu)^2 \le -2\,U_\nu(\mu_t) + (-2K)\,W_2(\mu_t,\nu)^2;$$
$$\psi(t) := e^{2Kt}\bigl[t^2\,I_{U,\nu}(\mu_t) + 2t\,U_\nu(\mu_t) + W_2(\mu_t,\nu)^2\bigr].$$
Details are left to the reader. (The estimates in (iii) can be somewhat refined.) ⊓⊔
$$I_{U,\nu}(\mu_t) \le \frac{U_\nu(\mu_0)}{t}.$$
Remark 24.23. There are many known regularization results in short
time, for certain of the gradient flows considered in this chapter. The
two most famous examples are:
• the Li–Yau estimates, which give lower bounds on $\Delta\log\rho_t$, for a solution of the heat equation on a Riemannian manifold, under certain curvature-dimension conditions. For instance, if $M$ satisfies $CD(0,N)$, then
$$\Delta\log\rho_t \ge -\frac{N}{2t};$$
• the Aronson–Bénilan estimates, which give lower bounds on $\Delta\rho_t^{m-1}$ for solutions of the nonlinear diffusion equation $\partial_t\rho = \Delta\rho^m$ in $\mathbb R^n$, where $1-2/n<m<1$:
$$\Delta\bigl(\rho_t^{m-1}\bigr) \ge -\frac{m}{m-1}\,\frac{n}{\lambda t},\qquad \lambda = 2 - n(1-m).$$
There is an obvious similarity between these two estimates, and both
can be interpreted as a lower bound on the rate of divergence of the
vector field which drives particles in the gradient flow interpretation of
these partial differential equations. I think it would be very interesting
to have a unified proof of these inequalities, under certain geometric
conditions. For instance one could try to use the gradient flow interpre-
tation of the heat and nonlinear diffusion equations, and maybe some
localization by restriction.
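A sanity check not included in the text: in the model case of the Euclidean heat kernel (which satisfies $CD(0,n)$), the Li–Yau bound is saturated. Indeed, for $\rho_t(x) = (4\pi t)^{-n/2}\,e^{-|x|^2/(4t)}$,
$$\log\rho_t(x) = -\frac n2\log(4\pi t) - \frac{|x|^2}{4t},\qquad \Delta\log\rho_t(x) = -\frac{n}{2t},$$
so equality holds in the bound $\Delta\log\rho_t \ge -N/(2t)$ with $N=n$.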
Bibliographical notes
In [669], Otto advocated the use of his formalism both for the purpose
of finding new schemes of proof, and for giving a new understanding of
certain results.
What I call the Fokker–Planck equation is
$$\frac{\partial\mu}{\partial t} = \Delta\mu + \nabla\cdot(\mu\,\nabla V).$$
This is in fact an equation on measures. It can be recast as an equation on functions (densities):
$$\frac{\partial\rho}{\partial t} = \Delta\rho - \nabla V\cdot\nabla\rho.$$
From the point of view of stochastic processes, the relation between
these two formalisms is the following: µt can be thought
√ of as law (Xt ),
where Xt is the stochastic process defined by dXt = 2 dBt −∇V (Xt ) dt
(Bt = standard Brownian motion on the manifold), while ρt (x) is de-
fined by the equation ρt (x) = E x ρ0 (Xt ) (the subscript x means that
the process Xt starts at X0 = x). In the particular case when V is a
quadratic potential in Rn , the evolution equation for ρt is often called
the Ornstein–Uhlenbeck equation.
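As an illustration of this probabilistic picture (my own sketch, not from the text), here is a small Python simulation for the quadratic potential V(x) = x²/2 on R: an Euler–Maruyama discretization of dX_t = √2 dB_t − ∇V(X_t) dt, whose law should approach the stationary measure proportional to e^{-V}, i.e. N(0, 1).

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_V(x):
        # V(x) = x^2 / 2, so grad V(x) = x (Ornstein-Uhlenbeck case)
        return x

    # Euler-Maruyama discretization of dX_t = sqrt(2) dB_t - grad V(X_t) dt
    n_paths, dt, T = 100_000, 1e-3, 5.0
    X = np.full(n_paths, 3.0)              # all paths start at x = 3
    for _ in range(int(T / dt)):
        X += -grad_V(X) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)

    # The invariant measure proportional to exp(-V) is N(0, 1) here, so the
    # empirical mean and variance of X_T should be close to 0 and 1.
    print(X.mean(), X.var())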
The observation that the Fisher information Iν is the time-derivative
of the entropy functional −Hν along the heat semigroup seems to first
appear in a famous paper by Stam [758] at the end of the fifties, in the
case M = R (equipped with the Lebesgue measure). Stam gives credit
to de Bruijn for that remark. The generalization appearing in Theo-
rem 24.2(ii) has been discovered and rediscovered by many authors.
Theorem 24.2(iii) goes back to Bakry and Émery [56] for the case
U (r) = r log r. After many successive generalizations, the statement as
I wrote it was formally derived in [577, Appendix D]. To my knowledge,
the argument given in the present chapter is the first rigorous one to
be written down in detail (modulo the technical justifications of the
integrations by parts), although it is a natural expansion of previous
works.
Theorem 24.2(iv) was proven by Otto and myself [671] for σ = µ0 .
The case σ = ν is also useful and was considered in [219].
Regularity theory for porous medium equations has been the object
of many works, see in particular the synthesis works by Vázquez [804,
805, 806]. When one studies nonlinear diffusions by means of optimal
transport theory, the regularity theory is the first thing to worry about.
In a Riemannian context, Demange [291, 292, 290, 293] presents many
approximation arguments based on regularization, truncation, etc. in
great detail. Going into these issues would have led me to considerably
expand the size of this chapter; but ignoring them completely would
have led to incorrect proofs.
It has been known since the mid-seventies that logarithmic Sobolev
inequalities yield rates of convergence to equilibrium for heat-like equa-
tions, and that these estimates are independent of the dimension. For
certain problems of convergence to equilibrium involving entropy, logarithmic Sobolev inequalities are much more convenient than spectral tools. This is especially true in infinite dimension, although logarithmic
Sobolev inequalities are also very useful in finite dimension. For more
information see the bibliographical notes of Chapter 21.
As recalled in Remark 24.11, convergence in the entropy sense im-
plies convergence in total variation. In [220] various functional methods
leading to convergence in total variation are examined and compared.
Around the mid-nineties, Toscani [784, 785] introduced the logarith-
mic Sobolev inequality in kinetic theory, where it was immediately rec-
ognized as a powerful tool (see e.g. [300]). The links between logarithmic
Sobolev inequalities and Fokker–Planck equations were re-investigated
by the kinetic theory community, see in particular [43] and the refer-
ences therein. The emphasis was more on proving logarithmic Sobolev
inequalities thanks to the study of the convergence to equilibrium for
Fokker–Planck equations, than the reverse. So the key was the study of
convergence to equilibrium in the Fisher information sense, as in Chap-
ter 25; but the final goal really was convergence in the entropy sense.
To my knowledge, it is only in a recent study of certain algorithms
based on stochastic integration [549], that convergence in the Fisher
information sense in itself has been found useful. (In this work some
constructive criteria for exponential convergence in Fisher information
are given; for instance this is true for the heat equation ∂t ρ = ∆ρ, under
a CD(K, ∞) bound (K < 0) and a logarithmic Sobolev inequality.)
Around 2000, it was discovered independently by Otto [669], Carrillo
and Toscani [215] and Del Pino and Dolbeault [283] that the same
“information-theoretical” tools could be used for nonlinear equations
of the form
\frac{\partial\rho}{\partial t} = \Delta\rho^m \qquad (24.28)
slightly stronger than the one which I derived in Theorem 24.7 and
Remark 24.12, but the asymptotic rate is the same.
All the methods described before apply to the study of the time
asymptotics of the porous medium equation ∂t ρ = ∆ρm , but only under
the restriction m ≥ 1 − 1/N . In that regime one can use time-rescaling
and tools similar to the ones described in this chapter, to prove that
the solutions become close to Barenblatt’s self-similar solution.
When m < 1−1/N , displacement convexity and related tricks do not
apply any more. This is why it was rather a sensation when Carrillo and
Vázquez [217] applied the Aronson–Bénilan estimates to the problem
of asymptotic behavior for fast diffusion equations with exponents m in (1 − 2/N, 1 − 1/N), which is about the best that one can hope for, since Barenblatt profiles do not exist for m ≤ 1 − 2/N.
Here we see the limits of Otto’s formalism: such results as the di-
mensional refinement of the rate of convergence for diffusive equations
(Remark 24.10), or the Carrillo–Vázquez estimates, rely on inequalities
of the form
\int p(\rho)\, \Gamma_2\big( \nabla U'(\rho) \big)\, d\nu + \int p_2(\rho)\, \big( L\, U'(\rho) \big)^2\, d\nu \ge \dots
in which one takes advantage of the fact that the same function ρ
appears in the terms p(ρ) and p2 (ρ) on the one hand, and in the terms
∇U ′ (ρ) and LU ′ (ρ) on the other. The technical tool might be changes
of variables for the Γ2 (as in [541]), or elementary integration by parts
(as in [217]); but I don’t see any interpretation of these tricks in terms
of the Wasserstein space P2 (M ).
The story about the rates of equilibration for fast diffusion equations
does not end here. At the same time as Carrillo and Vázquez obtained
their main results, Denzler and McCann [298, 299] computed the spec-
tral gap for the linearized fast diffusion equations in the same interval
of exponents (1 − 2/N, 1 − 1/N). This study showed that the rate of conver-
gence obtained by Carrillo and Vázquez is off the value suggested by the
linearized analysis by a factor 2 (except in the radially symmetric case
where they obtain the optimal rate thanks to a comparison method).
The connection between the nonlinear and the linearized dynamics is
still unclear, although some partial results have been obtained by Mc-
Cann and Slepčev [619]. More recently, S.J. Kim and McCann [517]
have derived optimal rates of convergence for the “fastest” nonlinear
diffusion equations, in the range 1 − 2/N < m ≤ 1 − 2/(N + 2), by
comparison methods involving Newtonian potentials. Another work by
Cáceres and Toscani [183] also recovers some of the results of Denzler
and McCann by means of completely different methods with their roots
in kinetic theory. There is still ongoing research to push the rates of
convergence and the range of admissible nonlinearities, in particular by
Denzler, Koch, McCann and probably others.
In dimension 2, the limit case m = 0 corresponds to a logarithmic
diffusion; it is related to geometric problems, such as the evolution of
conformal surfaces or the Ricci flow [806, Chapter 8].
More general nonlinear diffusion equations of the form ∂t ρ = ∆p(ρ)
have been studied by Biler, Dolbeault and Esteban [119], and Car-
rillo, Di Francesco and Toscani [210, 211] in Rn . In the latter work
the rescaling procedure is recast in a more geometric and physical
interpretation, in terms of temperature and projections; a sequel by
Carrillo and Vázquez [218] shows that the intermediate asymptotics
can be complicated for well-chosen nonlinearities. Nonlinear diffusion
equations on manifolds were also studied by Demange [291] under a
CD(K, N ) curvature-dimension condition, K > 0.
Theorem 24.7(ii) is related to a long tradition of study of contraction
rates in Wasserstein distance for diffusive equations [231, 232, 458, 662].
Sturm and Renesse [764] noted that such contraction rates character-
ize nonnegative Ricci curvature; Sturm [761] went on to give various
characterizations of CD(K, N ) bounds in terms of contraction rates for
possibly nonlinear diffusion equations.
In the one-dimensional case (M = R) there are alternative methods
to get contraction rates in W2 distance, and one can also treat larger
classes of models (for instance viscous conservation laws), and even ob-
tain decay in Wp for any p; see for instance [137, 212]. Recently, Brenier
found a re-interpretation of these one-dimensional contraction proper-
ties in terms of monotone operators [167]. Also the asymptotic behavior
of certain conservation laws has been analyzed in this way [208, 209]
(with the help of the strong “W∞ distance”!).
Another model for which contraction in W2 distance has been estab-
lished is the Boltzmann equation, in the particular case of a spatially
homogeneous gas of Maxwellian molecules. This contraction property
was discovered by Tanaka [644, 776, 777]; see [138] for recent work on
the subject. Some striking uniqueness results have been obtained by
Fournier and Mouhot [377, 379] with a related method (see also [378]).
To conclude this discussion about contraction estimates, I shall
briefly discuss some links with Perelman’s analysis of the backward
where the infimum is taken over all C 1 paths γ : [t0 , t1 ] → M such that
γ(t0 ) = x and γ(t1 ) = y, and S(x, t) is the scalar curvature of M (evolv-
ing under backward Ricci flow) at point x and time t. As in Chapter 7,
this induces a Hamilton–Jacobi equation, and an action in the space of measures. Then it is shown in [576] that H(\mu_t) - \int \varphi_t\, d\mu_t is convex in t along the associated displacement interpolation, where (φ_t) is a so-
lution of the Hamilton–Jacobi equation. Other theorems in [576, 782]
deal with a variant of L0 in which some time-rescalings have been per-
formed. Not only do these results generalize the contraction property
of [620], but they also imply Perelman’s estimates of monotonicity of
the so-called W -entropy and reduced volume functionals (which were
an important tool in the proof of the Poincaré conjecture).
I shall now comment on short-time decay estimates. The short-time
behavior of the entropy and Fisher information along the heat flow
(Theorem 24.16) was studied by Otto and myself around 1999 as a
technical ingredient to get certain a priori estimates in a problem of hy-
drodynamical limits. This work was not published, and I was quite sur-
prised to discover that Bobkov, Gentil and Ledoux [127, Theorem 4.3]
had found similar inequalities and applied them to get a new proof of
the HWI inequality. Otto and I published our method [672] as a com-
ment to [127]; this is the same as the proof of Theorem 24.16. It can be
considered as an adaptation, in the context of the Wasserstein space, of
some classical estimates about gradient flows in Hilbert spaces, that can
be found in Brézis [171, Théorème 3.7]. The result of Bobkov, Gentil
and Ledoux is actually more general than ours, because these authors
seem to have sharp constants under CD(K, ∞) for all values of K ∈ R,
U_\nu(\mu) - U_\nu(\nu) \le \frac{I_{U,\nu}(\mu)}{2K\lambda}.
\forall \mu \in P^{ac}(M), \qquad H_\nu(\mu) \le \frac{I_\nu(\mu)}{2K}.
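To fix ideas, here is a two-line check of the latter inequality on Gaussian measures (my own example, with the standard normalization): take ν = γ = N(0, I_n), which satisfies CD(1, ∞), and µ = N(m, I_n). Then dµ/dγ = exp(m·x − |m|²/2), so
H_\gamma(\mu) = \frac{|m|^2}{2}, \qquad I_\gamma(\mu) = \int \Big| \nabla \log\frac{d\mu}{d\gamma} \Big|^2\, d\mu = |m|^2,
and H_ν(µ) ≤ I_ν(µ)/(2K) holds with equality for K = 1.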
Sloppy proof of Theorem 25.1. By using Theorem 17.7(vii) and an ap-
proximation argument, we may assume that ρ is smooth, that U is
smooth on (0, +∞), that the solution (ρt )t≥0 of the gradient flow
\frac{\partial\rho}{\partial t} = L\, p(\rho_t),
starting from ρ0 = ρ is smooth, that Uν (µ0 ) is finite, and that t →
Uν (µt ) is continuous at t = 0.
For notational simplicity, let H(t) := U_\nu(\mu_t) and I(t) := I_{U,\nu}(\mu_t); then
\frac{dH(t)}{dt} = -I(t), \qquad I(t) \le I(0)\, e^{-2K\lambda t}.
By Theorem 24.7(i)(a), H(t) → 0 as t → ∞. So
H(0) = \int_0^{+\infty} I(t)\, dt \le I(0) \int_0^{+\infty} e^{-2K\lambda t}\, dt = \frac{I(0)}{2K\lambda},
which is the desired result. ⊓⊔
Sloppy proof of Theorem 25.3. By density, we may assume that the den-
sity ρ0 of µ is smooth; we may also assume that A and U are smooth
on (0, +∞) (recall Proposition 17.7(vii)). Let (ρt )t≥0 be the solution of
the gradient flow equation
\frac{\partial\rho}{\partial t} = \nabla\cdot\big( \rho\, \nabla U'(\rho) \big), \qquad (25.2)
and as usual µt = ρt ν. It can be shown that ρt is uniformly bounded
below by a positive number as t → ∞.
By Theorem 24.2(iii),
\frac{d}{dt}\, I_{U,\nu}(\mu_t) \le -2K\lambda \int_M \rho_t^{1-\frac{1}{N}}\, |\nabla U'(\rho_t)|^2\, d\nu. \qquad (25.3)
On the other hand, from the assumption A''(r) = r^{-\frac{1}{N}}\, U''(r),
\nabla A'(\rho) = \rho^{-\frac{1}{N}}\, \nabla U'(\rho).
Further assume that the Cauchy problem associated with the gradient
flow ∂t ρ = L p(ρ) admits smooth solutions for smooth initial data.
Then, for any µ ∈ P_2^{ac}(M), the following inequality holds:
\frac{d^+}{dt}\, W_2(\mu_0,\mu_t) \le \sqrt{I_{U,\nu}(\mu_t)}.
On the other hand, by assumption,
\sqrt{I_{U,\nu}(\mu_t)} \le \frac{I_{U,\nu}(\mu_t)}{\sqrt{2K\, U_\nu(\mu_t)}} = -\frac{d}{dt}\, \sqrt{\frac{2\, U_\nu(\mu_t)}{K}}. \qquad (25.8)
The proofs in the present chapter were based on gradient flows of dis-
placement convex functionals, while proofs in Chapters 21 and 22 were
more directly based on displacement interpolation. How do these two
strategies compare to each other?
From a formal point of view, they are not so different as one may
think. Take the case of the heat equation,
\frac{\partial\rho}{\partial t} = \Delta\rho,
or equivalently
\frac{\partial\rho}{\partial t} + \nabla\cdot\big( \rho\, \nabla(-\log\rho) \big) = 0.
The evolution of ρ is determined by the “vector field” ρ → ∇(− log ρ), in the space of probability densities. Rescale time and the vector field itself as follows:
\varphi_\varepsilon(t,x) = -\varepsilon\, \log\rho\Big( \frac{\varepsilon t}{2},\, x \Big).
Then ϕ_ε solves
\frac{\partial\varphi_\varepsilon}{\partial t} + \frac{|\nabla\varphi_\varepsilon|^2}{2} = \frac{\varepsilon}{2}\, \Delta\varphi_\varepsilon.
Passing to the limit as ε → 0, one gets, at least formally, the Hamilton–
Jacobi equation
\frac{\partial\varphi}{\partial t} + \frac{|\nabla\varphi|^2}{2} = 0,
which is in some sense the equation driving displacement interpolation.
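For the record, here is the elementary computation behind the rescaling (a sketch under the same formal assumptions as above): writing ϕ_ε(t,x) = −ε log ρ(εt/2, x) and using ∂_s ρ = ∆ρ,
\frac{\partial\varphi_\varepsilon}{\partial t} = -\frac{\varepsilon^2}{2}\, \frac{\Delta\rho}{\rho}, \qquad \nabla\varphi_\varepsilon = -\varepsilon\, \frac{\nabla\rho}{\rho}, \qquad \Delta\varphi_\varepsilon = -\varepsilon\,\Big( \frac{\Delta\rho}{\rho} - \frac{|\nabla\rho|^2}{\rho^2} \Big),
so that ∂_t ϕ_ε + |∇ϕ_ε|²/2 = (ε²/2)(|∇ρ|²/ρ² − ∆ρ/ρ) = (ε/2) ∆ϕ_ε, which is the viscous Hamilton–Jacobi equation displayed above.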
There is a general principle here: After suitable rescaling, the ve-
locity field associated with a gradient flow resembles the velocity field
of a geodesic flow. Here is one possible way to see this. Take an
arbitrary smooth function U , and consider the evolution
need a control of Hess Uν in all directions. This might explain why there
is at present no displacement convexity analogue of Demange’s proof of
the Sobolev inequality (so far only weaker inequalities with nonsharp
constants have been obtained).
On the other hand, proofs based on displacement convexity are usu-
ally rather simpler, and more robust than proofs based on gradient
flows: no issues about the regularity of the semigroup, no subtle in-
terplay between the Hessian of the functional and the “direction of
evolution”. . .
In the end we can put some of the main functional inequalities dis-
cussed in these notes in a nice array. Below, “LSI” stands for “Logarith-
mic Sobolev inequality”; “T” for “Talagrand inequality”; and “Sob2 ”
for the Sobolev inequality with exponent 2. So LSI(K), T(K), HWI(K)
and Sob2 (K, N ) respectively stand for (21.4), (22.4) (with p = 2),
(20.17) and (21.8).
Theorem Gradient flow proof Displ. convexity proof
CD(K, ∞) ⇒ LSI(K) Bakry–Émery Otto–Villani
LSI(K) ⇒ T(K) Otto–Villani Bobkov–Gentil–Ledoux
CD(K, ∞) ⇒ HWI(K) Bobkov–Gentil–Ledoux Otto–Villani
CD(K, N ) ⇒ Sob2 (K, N ) Demange ??
Bibliographical notes
Stam used a heat semigroup argument to prove an inequality which
is equivalent to the Gaussian logarithmic Sobolev inequality in dimen-
sion 1 (recall the bibliographical notes for Chapter 21). His argument
was not completely rigorous because of regularity issues, but can be
repaired; see for instance [205, 783].
The proof of Theorem 25.1 in this chapter follows the strategy by
Bakry and Émery, who were only interested in the Particular Case 25.2.
These authors used a set of calculus rules which has been dubbed the
“Γ2 calculus”. They were not very careful about regularity issues, and
for that reason the original proof probably cannot be considered as
completely rigorous (in particular for noncompact manifolds, in which
regularity issues are not so innocent, even if the curvature-dimension
condition prevents the blow-up of the heat semigroup). However, re-
cently Demange [291] carried out complete proofs for much more deli-
cate situations, so there is no reason to doubt that the Bakry–Émery
r\, q'(r) + q(r) \ge \frac{9N}{4(N+2)}\, q(r)^2, \qquad q(r) = \frac{r\, U''(r)}{U'(r)} + \frac{1}{N}. \qquad (25.10)
Fig. 26.1. The triangle on the left is drawn in X , the triangle on the right is drawn
on the model space R2 ; the lengths of their edges are the same. The thin geodesic
lines go through the apex to the middle of the basis; the one on the left is longer
than the one on the right. In that sense the triangle on the left is fatter than the
triangle on the right. If all triangles in X look like this, then X has nonnegative
curvature. (Think of a triangle as the belly of some individual, the belt being the
basis, and the neck being the apex; of course the line going from the apex to the
middle of the basis is the tie. The fatter the individual, the longer his tie should be.)
Still there is a generalization of what it means to have curvature
bounded below by K ∈ R, where K is an arbitrary real number, not
necessarily 0. It is obtained by replacing the model space R2 by the
model space with constant curvature K, that is:
• the sphere S²(1/√K) with radius R = 1/√K, if K > 0;
• the plane R², if K = 0;
• the hyperbolic space H(1/√|K|) with “hyperbolic radius” R = 1/√|K|, if K < 0; this can be realized as the half-plane R × (0, +∞), equipped with the metric g_{(x,y)}(dx\, dy) = (dx² + dy²)/(|K|\, y²) (a quick curvature check is sketched right after this list).
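Here is that check (a computation of mine, not in the text): for a conformal metric e^{2φ}(dx² + dy²) on the plane, the Gaussian curvature equals −e^{−2φ} ∆φ; with e^{2φ} = 1/(|K| y²), i.e. φ = −½ log|K| − log y, one finds ∆φ = 1/y², hence a curvature equal to −|K| y² · (1/y²) = −|K| = K, as required when K < 0.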
Bibliographical notes
∀t ∈ [0, 1], d(x, γt )2 ≥ (1−t) d(x, y)2 +t d(x, z)2 −S 2 t(1−t) d(y, z)2 .
The central question in this chapter is the following: What does it mean
to say that a metric-measure space (X , dX , νX ) is “close” to another
metric-measure space (Y, dY , νY )? We would like to have an answer
that is as “intrinsic” as possible, in the sense that it should depend
only on the metric-measure properties of X and Y.
So as not to inflate this chapter too much, I shall omit many proofs
when they can be found in accessible references, and prefer to insist on
the main stream of ideas.
Hausdorff topology
There is a well-established notion of distance between compact sets in
a given metric space, namely the Hausdorff distance. If X and Y are
two compact subsets of a metric space (Z, d), their Hausdorff distance
is
d_H(X, Y) = \max\Big( \sup_{x\in X} d(x, Y),\ \sup_{y\in Y} d(y, X) \Big),
Fig. 27.1. In solid lines, the borders of the two sets X and Y; in dashed lines, the
borders of their enlargements. The width of enlargement is just sufficient that any
of the enlarged sets covers both X and Y.
historically the former came first). This will become more apparent if I rewrite the Hausdorff distance as
d_H(A, B) = \inf\big\{ r > 0;\ A \subset B^{r]} \text{ and } B \subset A^{r]} \big\},
where C stands for an arbitrary closed set, and C^{r]} is the set of all points whose distance to C is no more than r, i.e. the union of all closed balls B[x, r], x ∈ C.
The analogy between the two notions goes further: While the
Prokhorov distance can be defined in terms of couplings, the Hausdorff
distance can be defined in terms of correspondences. By definition, a
correspondence (or relation) between two sets X and Y is a subset R of
X × Y: if (x, y) ∈ R, then x and y are said to be in correspondence; it
is required that each x ∈ X should be in correspondence with at least
one y, and each y ∈ Y should be in correspondence with at least one x.
Then we have the two very similar formulas:
d_P(µ, ν) = \inf\big\{ r > 0;\ ∃ coupling (X, Y) of (µ, ν);\ P[d(X, Y) > r] ≤ r \big\};
d_H(X, Y) = \inf\big\{ r > 0;\ ∃ correspondence R in X × Y;\ ∀(x, y) ∈ R, d(x, y) ≤ r \big\}.
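To make the parallel concrete, here is a small Python sketch (mine, not from the text) computing the Hausdorff distance between two finite subsets of the plane directly from the sup–inf formula; for finite sets this is also the infimum, over correspondences R, of sup_{(x,y)∈R} d(x,y).

    import numpy as np

    def hausdorff(A, B):
        # A, B: arrays of shape (n, 2) and (m, 2), finite subsets of R^2.
        D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)   # pairwise distances
        # d_H(A, B) = max( sup_a inf_b d(a, b), sup_b inf_a d(a, b) )
        return max(D.min(axis=1).max(), D.min(axis=0).max())

    A = np.array([[0.0, 0.0], [1.0, 0.0]])
    B = np.array([[0.0, 0.0], [1.0, 0.5]])
    print(hausdorff(A, B))   # 0.5: the point (1, 0) is at distance 0.5 from B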
nothing in common? First one would like to say that two spaces which
are isometric really are the same. Recall the definition of an isometry:
If (X , d) and (X ′ , d′ ) are two metric spaces, a map f : X → X ′ is called
an isometry if:
(a) it preserves distances: for all x, y ∈ X , d′ (f (x), f (y)) = d(x, y);
(b) it is surjective: for any x′ ∈ X ′ there is x ∈ X with f (x) = x′ .
An isometry is automatically injective, so it has to be a bijection,
and its inverse f −1 is also an isometry. Two metric spaces are said to
be isometric if there exists an isometry between them. If two spaces
X and X ′ are isometric, then any statement about X which can be
expressed in terms of just the distance, is automatically “transported”
to X ′ by the isometry.
This motivates the desire to work with isometry classes, rather than
metric spaces. By definition, an isometry class X is the set of all metric
spaces which are isometric to some given space X . Instead of “isometry
class”, I shall often write “abstract metric space”. All the spaces in a
given isometry class have the same topological properties, so it makes
sense to say of an abstract metric space that it is compact, or complete,
etc.
This looks good, but a bit frightening: There are so many metric
spaces around that the concept of abstract metric space seems to be
ill-posed from the set-theoretical point of view (just like there is no “set
of all sets”). However, things become much friendlier when one
realizes that any compact metric space, being separable, is isometric to
the completion of N for a suitable metric. (To see this, introduce a dense
sequence (xk ) in your favorite space X , and define d(k, ℓ) = dX (xk , xℓ ).)
Then we might think of an isometry class as a subset of the set of all
distances on N; this is still huge, but at least it makes sense from a
set-theoretical point of view.
Now the problem is to find a good distance on the set of abstract
compact metric spaces. The natural concept here is the Gromov–
Hausdorff distance, which is obtained by formally taking the quo-
tient of the Hausdorff distance by isometries: If (X , dX ) and (Y, dY ) are
two compact metric spaces, define
Representation by semi-distances
x R x′ ⇐⇒ d(x, x′ ) = 0.
Z = (X ⊔ Y)/d := (X ⊔ Y)/R
there is a triple of metric spaces (Xe1 , Xe2 , Xe3 ), all subspaces of a common
metric space (Z, dZ ), such that (Xe1 , Xe2 ) is isometric (as a coupling) to
(X1′ , X2′ ), and (Xe2 , Xe3 ) is isometric to (X2′′ , X3′′ ).
Z = (X1 ⊔ X2 ⊔ X3 )/d,
∀y ∈ Y ∃x ∈ X; d(f (x), y) ≤ ε.
In particular, dH (f (X ), Y) ≤ ε.
then dis (R) ≤ 3ε, and the left inequality in (27.3) follows by for-
mula (27.2). Conversely, if R is a relation with distortion η, then for
any ε > η one can define an ε-isometry f whose graph is included in
R: The idea is to define f (x) = y, where y is such that (x, y) ∈ R. (See
the comments at the end of the bibliographical notes.)
The symmetry between X and Y seems to have been lost in (27.3),
but this is not serious, because any approximate isometry admits an
approximate inverse: If f is an ε-isometry X → Y, then there is a
(4ε)-isometry f ′ : Y → X such that for all x ∈ X , y ∈ Y,
d_X\big( f'\circ f(x), x \big) \le 3\varepsilon, \qquad d_Y\big( f\circ f'(y), y \big) \le \varepsilon. \qquad (27.4)
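As an illustration of the notion of distortion (my own sketch, not from the text): for finite metric spaces described by distance matrices, the distortion of a correspondence R is a maximum over pairs of related points, and by (27.2) half of it bounds the Gromov–Hausdorff distance from above.

    import numpy as np

    def distortion(DX, DY, R):
        # DX, DY: distance matrices of two finite metric spaces.
        # R: list of index pairs (i, j) putting x_i and y_j in correspondence;
        #    every i and every j is assumed to appear at least once.
        return max(abs(DX[i, k] - DY[j, l]) for (i, j) in R for (k, l) in R)

    # Three points on a line vs. the same configuration stretched by 10%.
    DX = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
    DY = 1.1 * DX
    R = [(0, 0), (1, 1), (2, 2)]       # the "identity" correspondence
    print(distortion(DX, DY, R) / 2)   # 0.1, an upper bound on d_GH of the two spaces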
Fig. 27.2. A very thin tire (2-dimensional manifold) is very close, in Gromov–
Hausdorff sense, to a circle (1-dimensional manifold)
Remark 27.7. Keeping Remark 27.3 in mind, two spaces are close in
Gromov–Hausdorff topology if they look the same to a short-sighted
person (see Figure 27.2). I learnt from Lott the expression Mr. Magoo
topology to convey this idea.
Fig. 27.3. A balloon with a very small handle (not simply connected) is very close
to a balloon without handle (simply connected).
Noncompact spaces
If one works in length spaces, as will be the case in the rest of this
course, the above definition does not seem so good because K (ℓ) will in
general not be a strictly intrinsic (i.e. geodesic) length space: Geodesics
joining elements of K (ℓ) might very well leave K (ℓ) at some intermediate
time; so properties involving geodesics might not pass to the limit. This
is the reason for requirement (iii) in the following definition.
dis (Rk ) −→ 0;
Many theorems about metric spaces still hold true, after appropriate
modification, for converging sequences of metric spaces. Such is the
Remark 27.23. In the previous proposition, the fact that the maps
fk are approximate isometries is useful only in the noncompact case;
otherwise it just boils down to the compactness of P (X ) when X is
compact.
∀k ∈ N, νk [BR] (⋆k )] ≤ M.
Proof of Proposition 27.24. For any fixed R > 0, (fk )# νk [BR] (⋆)] =
νk [(fk )−1 (BR] (⋆))] ≤ νk [BR+εk ] (⋆k )] is uniformly bounded by M (R+ 1)
for k large enough. Since on the other hand BR] (⋆) is compact, we may
extract a subsequence in k such that (fk )# νk [BR] (⋆)] converges to some
finite measure νR in the weak-∗ topology of BR] (⋆). Then the result
follows by taking R = ℓ → ∞ and applying a diagonal extraction. ⊓ ⊔
(The metric structure of X and Y has not disappeared since the infi-
mum is only over isometries.)
Both dGHP and dGP satisfy the triangle inequality, as can be checked
by a gluing argument again. (Now one should use both the metric and
the probabilistic gluing!) Then there is no difficulty in checking that
dGHP is an honest distance on classes of metric-measure isomorphisms,
with point of view (a). Similarly, dGP is a distance on classes of metric-
measure isomorphisms, with point of view (b), but now it is quite non-
trivial to check that [dGP (X , Y) = 0] =⇒ [X = Y]. I shall not insist on
this issue, for in the sequel I shall focus on point of view (a).
There are several variants of these constructions:
1. Use other distances on probability metrics. Essentially everybody
agrees on the Hausdorff distance to measure distances between sets, but
as we know, there are many natural choices of distances between prob-
ability measures. In particular, one can replace the Prokhorov distance
by the Wasserstein distance of order p, and thus obtain the Gromov–
Hausdorff–Wasserstein distance of order p:
d_{GHW_p}(X, Y) = \inf\big\{ d_H(X', Y') + W_p(\nu_{X'}, \nu_{Y'}) \big\},
The discussion of the previous section showed that one should be cau-
tious about which notion of convergence is used. However, whenever
they are available, doubling estimates, in the sense of Definition 18.1,
basically rule out the discrepancy between approaches (a) and (b)
above. The idea is that doubling prevents the formation of sharp spikes
as in Figure 27.4. This discussion is not so clearly made in the literature
that I know, so in this section I shall provide more careful proofs.
where 1
ΦR,D (δ) = max 8 δ, R (16 δ) log2 D + δ
Now let x ∈ X . There is at least one index j such that d(x, xj ) < 2r;
otherwise x would lie in the complement of the union of all the balls
B2r (xj ), and could be added to the family {xj }. So {xj } constitutes
a 2r-net in X , with cardinality at most N = D n . This concludes the
proof. ⊓
⊔
After all this discussion I can state a precise definition of the notion of
convergence that will be used in the sequel for metric-measure spaces:
this is the measured Gromov–Hausdorff topology. It is associated
with the convergence of spaces as metric spaces and as measure spaces.
This concept can be defined quantitatively in terms of, e.g., the distance
dGHP and its variants, but I shall be content with a purely topological
(qualitative) definition. As in the case of the plain Gromov–Hausdorff
topology, there is a convenient reformulation in terms of approximate
isometries.
(fk )# νk −−−→ ν
k→∞
(fk )# νk −−−→ ν,
k→∞
Proof of Theorem 27.32. Part (i) follows from the combination of Propo-
sition 27.29, Theorem 27.10 and Proposition 27.22. Part (ii) follows in
addition from the definition of pointed measured Gromov–Hausdorff
convergence and Proposition 27.24. Note that in (ii), the doubling as-
sumption is used to prevent the formation of “spikes”, but also to ensure
uniform bounds on the mass of balls of radius R for any R, once it is
known for some R. (Here I chose R = 1, but of course any other choice
would have done.) ⊓
⊔
Bibliographical notes
Remarks 9.1.40 to 9.1.42 in [174] to avoid traps (or see [626]). For
Alexandrov spaces with curvature bounded below, both notions coin-
cide, see [174, Section 10.9] and the references therein.
A classical source for the measured Gromov–Hausdorff topology is
Gromov’s book [438]. The point of view mainly used there consists in
forgetting the Gromov–Hausdorff distance and “using the measure to
kill infinity”; so the distances that are found there would be of the sort
of dGWp or dGP. An interesting example is Gromov’s “box” metric ⊡_1, defined as follows [438, pp. 116–117]. If d and d′ are two metrics on a given probability space X, define ⊡_1(d, d′) as the infimum of ε > 0 such that |d − d′| ≤ ε outside of a set of measure at most ε in X × X (the subscript 1 means that the measure of the discarded set is at most 1 × ε). Take the particular metric space I = [0, 1], equipped with the usual distance and with the Lebesgue measure λ, as reference space. If (X, d, ν) and (X′, d′, ν′) are two Polish probability spaces, define ⊡_1(X, X′) as the infimum of ⊡_1(d ◦ φ, d′ ◦ φ′) where φ (resp. φ′) varies in the set of all measure-preserving maps φ : I → X (resp. φ′ : [0, 1] → X′).
Sturm made a detailed study of dGW2 (denoted by D in [762]) and
advocated it as a natural distance between classes of equivalence of
probability spaces in the context of optimal transport. He proved that
D satisfies the length property, and compared it with Gromov’s box
distance as follows:
D(X, Y) \le \big( \max\{ \operatorname{diam}(X),\, \operatorname{diam}(Y) \} + 4 \big)\, ⊡_1(X, Y)^{1/2};
D(X, Y) \ge (1/2)^{3/2}\, ⊡_1(X, Y)^{3/2}.
The alternative point of view in which one takes care of both the
metric and the measure was introduced by Fukaya [386]. This is the
one which was used by Lott and myself in our study of displacement
convexity in geodesic spaces [577].
The pointed Gromov–Hausdorff topology is presented for instance
in [174]; it has become very popular as a way to study tangent spaces
in the absence of smoothness. In the context of optimal transport, the
pointed Gromov–Hausdorff topology was used independently in [30,
Section 12.4] and in [577, Appendix A].
I introduced the definition of local Gromov–Hausdorff topology for
the purpose of these notes; it looks to me quite natural if one wants to
Most of the objects that were introduced and studied in the context of
optimal transport on Riemannian manifolds still make sense on a gen-
eral metric-measure length space (X , d, ν), satisfying certain regularity
assumptions. I shall assume here that (X , d) is a locally compact,
complete separable geodesic space equipped with a σ-finite refer-
ence Borel measure ν. From general properties of such spaces, plus the
results in Chapters 6 and 7:
• the cost function c(x, y) = d(x, y)2 is associated with the coercive
Lagrangian action A(γ) = L(γ)2 , and minimizers are constant-
speed, minimizing geodesics, the collection of which is denoted by
Γ (X );
and |v|(t, x) really is |γ̇ x,t |, that is, the speed at time t and position x.
Thus Definition 28.1 is consistent with the usual notions of kinetic
energy and speed field (speed = norm of the velocity).
and |v| = |v|(t, x) the associated speed field. Then, for each t ∈ (0, 1)
one can modify |v|(t, · ) on a µt -negligible set in such a way that for all
x, y ∈ X ,
\big|\, |v|(t,x) - |v|(t,y) \,\big| \le \frac{C\,\sqrt{\operatorname{diam}(X)}}{\sqrt{t(1-t)}}\, \sqrt{d(x,y)}, \qquad (28.2)
Proof of Theorem 28.5. Let t be a fixed time in (0, 1). Let γ1 and γ2
be two minimizing geodesics in the support of Π, and let x = γ1 (t),
y = γ2 (t). Then by Theorem 8.22,
\big|\, L(\gamma_1) - L(\gamma_2) \,\big| \le \frac{C\,\sqrt{\operatorname{diam}(X)}}{\sqrt{t(1-t)}}\, \sqrt{d(x,y)}. \qquad (28.3)
Let Xt be the union of all γ(t), for γ in the support of Π. For a given
x ∈ Xt , there might be several geodesics γ passing through x, but
(as a special case of (28.3)) they will all have the same length; define
|v|(t, x) to be that length. This is a measurable function, since it can
be rewritten
|v|(t,x) = \int_\Gamma L(\gamma)\, \Pi\big( d\gamma \,\big|\, \gamma(t) = x \big),
where Π(dγ|γ(t) = x) is of course the disintegration of Π with respect
to µt = law (γt ). Then |v|(t, · ) is an admissible density for εt , and as a
consequence of (28.3) it satisfies (28.2) for all x, y ∈ Xt .
To extend |v|(t, x) to the whole of X, I shall adapt the proof of a well-known extension theorem for Lipschitz functions. Let H := C\,\sqrt{\operatorname{diam}(X)/(t(1-t))}, so that |v| is Hölder-1/2 on X_t with constant H. Define, for x ∈ X,
w(x) := \inf_{y\in X_t}\Big[ H\,\sqrt{d(x,y)} + |v|(t,y) \Big].
w(x) - w(x') = \inf_{y\in X_t}\Big[ H\sqrt{d(x,y)} + |v|(t,y) \Big] - \inf_{y'\in X_t}\Big[ H\sqrt{d(x',y')} + |v|(t,y') \Big]
= \sup_{y'\in X_t}\ \inf_{y\in X_t}\ \Big[ H\sqrt{d(x,y)} - H\sqrt{d(x',y')} + |v|(t,y) - |v|(t,y') \Big]
\le H\, \sup_{y'\in X_t}\ \inf_{y\in X_t}\ \Big[ \sqrt{d(x,y)} - \sqrt{d(x',y')} + \sqrt{d(y,y')} \Big]
\le H\, \sup_{y'\in X_t}\ \Big[ \sqrt{d(x,y')} - \sqrt{d(x',y')} \Big]. \qquad (28.4)
But
\sqrt{d(x,y')} \le \sqrt{d(x,x') + d(x',y')} \le \sqrt{d(x,x')} + \sqrt{d(x',y')};
so (28.4) is bounded above by H\,\sqrt{d(x,x')}.
To summarize: w coincides with |v|(t, · ) on Xt , and it satisfies the
same Hölder-1/2 estimate. Since µt is concentrated on Xt , w is also
an admissible density for εt , so we can take it as the new definition of
|v|(t, · ), and then (28.2) holds true on the whole of X. ⊓⊔
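The extension recipe above is easy to test numerically; here is a small Python sketch of the same formula (mine, with made-up sample data), extending a function f that is 1/2-Hölder with constant H on a finite set of points.

    import numpy as np

    def holder_extension(x, sample_pts, sample_vals, H):
        # w(x) = min over sample points y of [ H * sqrt(d(x, y)) + f(y) ];
        # this coincides with f on the samples and keeps the 1/2-Holder constant H.
        d = np.abs(sample_pts - x)       # distances on the real line, for simplicity
        return np.min(H * np.sqrt(d) + sample_vals)

    pts  = np.array([0.0, 1.0, 4.0])
    vals = np.sqrt(pts)                  # f = sqrt is 1/2-Holder with constant H = 1
    print(holder_extension(2.0, pts, vals, H=1.0))   # ~ 1.414 = sqrt(2)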
The main goal of this section is the proof of the convergence of the
Wasserstein space P2 (X ), as expressed in the next statement.
Theorem 28.6 (If Xk converges then P2 (Xk ) also). Let (Xk )k∈N
and X be compact metric spaces such that
X_k \xrightarrow{\ GH\ } X.
Then
P_2(X_k) \xrightarrow{\ GH\ } P_2(X).
Moreover, if fk : Xk → X are approximate isometries, then the maps
(fk )# : P2 (Xk ) → P2 (X ), defined by (fk )# (µ) = (fk )# µ, are approxi-
mate isometries too.
The identity
d_2\big(f(x_1), f(y_1)\big)^2 - d_1(x_1, y_1)^2 = \Big[ d_2\big(f(x_1), f(y_1)\big) - d_1(x_1, y_1) \Big]\, \Big[ d_2\big(f(x_1), f(y_1)\big) + d_1(x_1, y_1) \Big]
implies
\Big| d_2\big(f(x_1), f(y_1)\big)^2 - d_1(x_1, y_1)^2 \Big| \le \varepsilon\, \big( \operatorname{diam}(X_1) + \operatorname{diam}(X_2) \big),
hence
W_2\big( f_\#\mu_1, f_\#\mu_1' \big) \le W_2(\mu_1, \mu_1') + \sqrt{ \varepsilon\, \big( \operatorname{diam}(X_1) + \operatorname{diam}(X_2) \big) }. \qquad (28.8)
(iv) lim (fk )# εk,t = εt , in the weak topology of measures, for each
k→∞
t ∈ (0, 1).
Assume further that each Πk is an optimal dynamical transference
plan, for the square distance cost function. Then:
(v) For each t ∈ (0, 1), there is a choice of the speed fields |vk |
associated with the plans Πk , such that lim |vk | ◦ fk′ = |v|, in the
k→∞
uniform topology;
(vi) The limit Π is an optimal dynamical transference plan, so π is
an optimal transference plan and (µt )0≤t≤1 is a displacement interpo-
lation.
Proof of Theorem 28.9. The proof is quite technical, so the reader might
skip it at first reading and go directly to the last section of this chapter.
In a first step, I shall establish the compactness of the relevant objects,
and in a second step pass to the limit.
It will be convenient to regularize rough paths with the help of some
continuous mollifiers. For δ ∈ (0, 1/2), define
\varphi^\delta(s) = \frac{\delta+s}{\delta^2}\, 1_{-\delta\le s<0} + \frac{\delta-s}{\delta^2}\, 1_{0\le s\le\delta} \qquad (28.14)
and
ϕδ+ (s) = ϕδ (s − δ), ϕδ− (s) = ϕδ (s + δ). (28.15)
Then supp ϕδ+ ⊂ [0, 2δ] and supp ϕδ− ⊂ [−2δ, 0]. These functions have
a graph that looks like a sharp “tent hat”; their integral (on the real
line) is equal to 1, and as δ → 0 they converge in the weak topology to
the Dirac mass δ0 at the origin.
Since all of the lengths d(γk (0), γk (1)) are uniformly bounded, we
conclude that there is a constant C such that for all t0 and s0 as above,
\Big| L^\delta_{t_0\to s_0}(f_k\circ\gamma_k) - |s_0 - t_0|\, L^\delta_{0\to 1}(f_k\circ\gamma_k) \Big| \le C(\delta + \varepsilon_k); \qquad L^\delta_{0\to 1}(f_k\circ\gamma_k) \le C.
In particular,
L^\delta_{t_0\to s_0}(\sigma) \le C\,\big( |s_0 - t_0| + \delta \big). \qquad (28.18)
In Lemma 28.11 below it will be shown that, as a consequence
of (28.18), σ can be written as (Id , γ)# λ for some Lipschitz-continuous
curve γ : [0, 1] → X . Once that is known, the end of the proof of (a) is
straightforward: Since
L^\delta_{t_0\to s_0}(\sigma) = d\big( \gamma(t_0), \gamma(s_0) \big) + O(\delta),
where
F^\delta(t,s) = \int_0^1\!\!\int_0^1 \beta(t_0, s_0)\, \varphi^\delta_+(t - t_0)\, \varphi^\delta_-(s - s_0)\, dt_0\, ds_0, \qquad \Lambda(t,s) = \int_{X\times X} d(x,y)\, d\nu_t(x)\, d\nu_s(y).
Now plug this back into (28.26) and let δ → 0 to conclude that
\int_{X\times[0,1]} \int_{X\times[0,1]} \beta(t,s)\, d(x,y)\, d\nu_t(x)\, dt\, d\nu_s(y)\, ds \le C \int_0^1\!\!\int_0^1 \beta(t,s)\, |s-t|\, dt\, ds.
Next, let (ψk )k∈N be a countable dense subset of C(X ). For any k,
\int_X \psi_k\, d\nu_t^\varepsilon = \frac{1}{2\varepsilon} \int_{t-\varepsilon}^{t+\varepsilon} \int_X \psi_k(x)\, d\nu_\tau(x)\, d\tau. \qquad (28.31)
Noncompact spaces
Bibliographical notes
Theorem 28.6 is taken from [577, Section 4], while Theorem 28.13 is
an adaptation of [577, Appendix E]. Theorem 28.9 is new. (A part of
this theorem was included in a preliminary version of [578], and later
removed from that reference.)
The discussion about push-forwarding dynamical transference plans
is somewhat subtle. The point of view adopted in this chapter is the
following: when an approximate isometry f is given between two spaces,
use it to push-forward a dynamical transference plan Π, via (f ◦)# Π.
The advantage is that this is the same map that will push-forward the
measure and the dynamical plan. The drawback is that the resulting
object (f ◦)# Π is not a dynamical transference plan, in fact it may
not even be supported on continuous paths. This leads to the kind of
technical sport that we’ve encountered in this chapter, embedding into
probability measures on probability measures and so on.
Another option would be as follows: Given two spaces X and Y, with
an approximate isometry f : X → Y, and a dynamical transference plan
Π on Γ (X ), define a true dynamical transference plan on Γ (Y), which
is a good approximation of (f ◦)# Π. The point is to construct a recipe
which to any geodesic γ in X associates a geodesic S(γ) in Y that is
“close enough” to f ◦ γ. This strategy was successfully implemented
in the final version of [578, Appendix]; it is much simpler, and still
quite sufficient for some purposes. The example treated in [578] is the
stability of the “democratic condition” considered by Lott and myself;
but certainly this simplified version will work for many other stability
issues. On the other hand, I don’t know if it is enough to treat such
topics as the stability of general weak Ricci bounds, which will be
considered in the next chapter.
The study of the kinetic energy measure and the speed field occurred
to me during a parental meeting of the Crèche Le Rêve en Couleurs
and
\int U(\rho_t)\, d\nu \le (1-t) \int_{M\times M} U\!\left( \frac{\rho_0(x_0)}{\beta^{(K,N)}_{1-t}(x_0,x_1)} \right) \beta^{(K,N)}_{1-t}(x_0,x_1)\, \pi(dx_1|x_0)\, \nu(dx_0)
\;+\; t \int_{M\times M} U\!\left( \frac{\rho_1(x_1)}{\beta^{(K,N)}_{t}(x_0,x_1)} \right) \beta^{(K,N)}_{t}(x_0,x_1)\, \pi(dx_0|x_1)\, \nu(dx_1). \qquad (29.3)
Here G(s, t) is the one-dimensional Green function of (16.6), K_{N,U} is defined by (17.10), and the distortion coefficients β_t = β_t^{(K,N)} are those appearing in (14.61).
Which of these formulas should we choose for the extension to non-
smooth spaces? When K = 0, both inequalities reduce to just
\int U(\rho_t)\, d\nu \le (1-t) \int U(\rho_0)\, d\nu + t \int U(\rho_1)\, d\nu. \qquad (29.4)
U^\beta_{\pi,\nu}(\mu) = \int_{X\times X} U\!\left( \frac{1}{\beta(x,y)}\, \frac{d\mu}{d\nu}(x) \right) \beta(x,y)\, \pi(dy|x)\, \nu(dx),
when µ is not absolutely continuous with respect to ν. It would be a
mistake to keep the same definition and replace dµ/dν by the density
of the absolutely continuous part of µ with respect to ν. In fact there is only one natural extension of the functionals U_ν and U^β_{π,ν}; before
giving it explicitly, I shall try to motivate it. Think of the singular part
of µ as something which “always has infinite density”. Assume that
the respective contributions of finite and infinite values of the density
decouple, so that one would define separately the contributions of the
absolutely continuous part µac and of the singular part µs . Only the
asymptotic behavior of U(r) as r → ∞ should count when one defines the contribution of µ_s. Finally, if U(r) were increasing like c\,r, it is natural to assume that U_ν(µ_s) should be \int_X c\, dµ_s = c\, µ_s[X]. So it is the asymptotic slope of U that should matter. Since U is convex, there is a natural notion of asymptotic slope of U:
U'(\infty) := \lim_{r\to\infty} \frac{U(r)}{r} = \lim_{r\to\infty} U'(r) \in \mathbb{R}\cup\{+\infty\}. \qquad (29.5)
µ = ρ ν + µs
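Two quick examples, consistent with this definition (the computations are mine): for U(r) = r log r, U'(∞) = lim_{r→∞}(log r + 1) = +∞, so any singular part makes the extended functional equal to +∞; for U(r) = −r^{1−1/N}, U'(∞) = lim_{r→∞}(−r^{−1/N}) = 0, so the singular part of µ contributes nothing.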
Remark 29.3. Remark 17.27 applies here too: I shall often identify π
with the probability measure π(dx dy) = µ(dx) π(dy|x) on X × X .
For later use I record here two elementary lemmas about the functionals U^β_{π,ν}. The reader may skip them at first reading and go directly to the next section.
First, there is a handy way to rewrite U^β_{π,ν}(µ) when µ_s = 0:
where v(r) = U (r)/r, with the conventions U (0)/0 = U ′ (0) ∈ [−∞, +∞),
U (∞)/∞ = U ′ (∞) ∈ (−∞, +∞], and ρ = 0 on Spt µs .
Proof of Lemma 29.6. The identity is formally obvious if one notes that
ρ(x) π(dy|x) ν(dx) = π(dy|x) µ(dx) = π(dy dx); so all the subtlety lies
in the fact that in (29.7) the convention is U (0)/0 = 0, while in (29.8)
it is U (0)/0 = U ′ (0). Switching between both conventions is allowed
because the set {ρ = 0} is anyway of zero π-measure. ⊓
⊔
Secondly, the functionals U^β_{π,ν} (and the functionals U_ν) satisfy a principle of “rescaled subadditivity”, which might at first sight seem contradictory with the convexity property, but is not at all.
U^\beta_{\sum_j Z_j\,\pi_j,\;\nu}\Big( \sum_j Z_j\, \mu_j \Big) \ge \sum_j Z_j\, (U_{Z_j})^\beta_{\pi_j,\nu}(\mu_j),
with equality if the measures µ_j are singular with respect to each other.
• U is Lipschitz, if N < ∞;
• U is locally Lipschitz and U (r) = a r log r + b r for r large enough,
if N = ∞ (a ≥ 0, b ∈ R).
ρ0 (x0 ) (K,N )
(Uℓ )′ (1) (K,N )
β1−t (x0 , x1 ) ≤ U ′ (1) ρ0 (x0 ).
β1−t (x0 , x1 )
In both cases, the upper bound belongs to L1 π(dx0 |x1 ) ν(dx1 ) , which
proves the claim.
Step 3: Behavior at infinity. Finally we consider the case of a
general U ∈ DCN , and approximate it at infinity so that it has the
desired behavior. The reasoning is pretty much the same as for Step 2.
Let us assume for instance N < ∞. By Proposition 17.7(iv), there is
a nondecreasing sequence (Uℓ )ℓ∈N , converging pointwise to U , with the
desired behavior at infinity, and Uℓ′ (∞) → U ′ (∞). By Step 2,
(U_\ell)_\nu(\mu_t) \le (1-t)\, (U_\ell)^{\beta^{(K,N)}_{1-t}}_{\pi,\nu}(\mu_0) + t\, (U_\ell)^{\beta^{(K,N)}_{t}}_{\check\pi,\nu}(\mu_1),
Then we know that Uℓ′ (∞) → U ′ (∞), so we may pass to the limit
in the second term of (29.15). To pass to the limit in the first term
by monotone convergence, it suffices to check that U_ℓ(ρ_t) is bounded below, uniformly in ℓ, by a ν-integrable function. But this is true since, for instance, U_ℓ ≥ U_0, which is bounded below by an affine function of the form r → −C(r+1), C ≥ 0; so U_ℓ(ρ_t) ≥ −Cρ_t − C\, 1_{ρ_t>0}, and the latter function is integrable since ρ_t has compact support. ⊓⊔
then the first term is always displacement convex, and the second is
displacement convex if V is convex (simple exercise).
On the other hand, \int \rho_\epsilon(x-v)\, \log\rho_\epsilon(x-v)\, dx is independent of v ∈ R^n; so
H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - (1-t)x_0 - t x_1)\, dx \big) > (1-t)\, H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - x_0)\, dx \big) + t\, H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - x_1)\, dx \big).
Since the path (ρǫ (x − (1 − s)x0 − sx1 ) dx)0≤s≤1 is a geodesic interpola-
tion (this is the translation at uniform speed, corresponding to ∇ψ =
constant), we see that (Rn , d2 , e−V (x) dx) cannot be a weak CD(0, ∞)
space. So the conclusion of Example 29.13 can be refined as follows:
(Rn , d2 , e−V (x) dx) is a weak CD(0, ∞) space if and only if V is convex.
Example 29.15. Let M be a smooth compact n-dimensional Rieman-
nian manifold with nonnegative Ricci curvature, and let G be a com-
pact Lie group acting isometrically on M . (See the bibliographical
notes for references on these notions.) Then let X = M/G and let
q : M → X be the quotient map. Equip X with the quotient distance
d(x, y) = inf{dM (x′ , y ′ ); q(x′ ) = x, q(y ′ ) = y}, and with the measure
ν = q# volM . The resulting space (X , d, ν) is a weak CD(0, n) space,
that in general will not be a manifold. (There will typically be singu-
larities at fixed points of the group action.)
Example 29.16. It will be shown in the concluding chapter that
(Rn , k·k, λn ) is a weak CD(K, N ) space, where k · k is any norm on Rn ,
and λn is the n-dimensional Lebesgue measure. This example proves
that a weak CD(K, N ) space may be “strongly” branching (recall the
discussion in Example 27.17).
Example 29.17. Let X = \prod_{i=1}^\infty T_i, where T_i = R/(ε_i Z) is equipped with the usual distance d_i and the normalized Lebesgue measure λ_i, and ε_i = 2\,\operatorname{diam}(T_i) is some positive number. If \sum ε_i^2 < +\infty then the product distance d = \sqrt{\sum d_i^2} turns X into a compact metric space. Equip X with the product measure ν = \prod λ_i; then (X, d, ν) is a weak CD(0, ∞) space. (Indeed, it is the measured Gromov–Hausdorff limit of X_k = \prod_{j=1}^k T_j, which is CD(0, k), hence CD(0, ∞); and it will be shown in Theorem 29.24 that the CD(0, ∞) property is stable under measured Gromov–Hausdorff limits.)
Continuity properties of the functionals U_ν and U^β_{π,ν}
(i) U_\nu(\mu) = \sup\Big\{ \int_X \varphi\, d\mu - \int_X U^*(\varphi)\, d\nu;\ \varphi \in L^\infty(X);\ \varphi \le U'(\infty) \Big\}
(ii) U_\nu(\mu) = \sup\Big\{ \int_X \varphi\, d\mu - \int_X U^*(\varphi)\, d\nu;\ \varphi \in C(X),\ U'\Big(\frac{1}{M}\Big) \le \varphi \le U'(M);\ M \in \mathbb{N} \Big\}.
The deceptive simplicity of these formulas hides some subtleties: For
instance, it is in general impossible to drop the restriction ϕ ≤ U ′ (∞)
in (i), so the supremum is not taken over the whole vector space L∞ (X )
but only on a subspace thereof. Proposition 29.19 can be proven by
elementary tools of measure theory; see the bibliographical notes for
references and comments.
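As an illustration (the computation is mine, not taken from the text): for U(r) = r log r one finds U^*(p) = \sup_{r\ge 0}(pr - r\log r) = e^{p-1} and U'(\infty) = +\infty, so formula (i) becomes
H_\nu(\mu) = \sup_{\varphi\in L^\infty(X)} \Big\{ \int_X \varphi\, d\mu - \int_X e^{\varphi-1}\, d\nu \Big\},
a variational representation of the relative entropy.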
The next statement gathers the three important properties on which
the main results of this chapter rest: (i) Uν (µ) is lower semicontinuous
in (µ, ν); (ii) Uν (µ) is never increased by push-forward; (iii) µ can be
β
regularized in such a way that Uπ,ν (µ) is upper semicontinuous in (π, µ)
along the approximation. In the next statement, M+ (X ) will stand for
the set of finite (nonnegative) Borel measures on X , and L1+ (ν) for the
set of nonnegative ν-integrable measurable functions on X .
Exercise 29.22. Use Property (ii) in the case U (r) = r log r to recover
the Csiszár–Kullback–Pinsker inequality (22.25). Hint: Take f to be
valued in {0, 1}, apply Csiszár’s two-point inequality
\forall x, y \in [0,1], \qquad x\,\log\frac{x}{y} + (1-x)\,\log\frac{1-x}{1-y} \ge 2\,(x-y)^2
To summarize: Uν (µε ) ≤ Uν (µ) for all ε > 0, and then the conclusion
follows.
If µ is not absolutely continuous, define
\rho_{a,\varepsilon}(x) = \int_{\mathrm{Spt}\,\nu} K_\varepsilon(x,y)\, \rho(y)\, \nu(dy); \qquad \rho_{s,\varepsilon}(x) = \int_{\mathrm{Spt}\,\nu} K_\varepsilon(x,y)\, \mu_s(dy),
whence the conclusion. Finally, if U does not have a constant sign, this
means that there is r0 > 0 such that U (r) ≤ 0 for r ≤ r0 , and U (r) > 0
for r > r0 . Then:
• if r < B −1 r0 , then U+ (ar) = 0 for all a ≤ B and (29.19) is obviously
true;
• if r > B r0 , then U (ar) > 0 for all a ∈ [B −1 , B] and one estab-
lishes (29.20) as before;
• if B −1 r0 ≤ r ≤ B r0 , then U+ (ar) is bounded above for a ≤ B,
while r is bounded below, so obviously (29.19) is satisfied for some
well-chosen constant C.
Next, we may dismiss the case when the right-hand side of the in-
equality in (29.17) is +∞ as trivial; so we might assume that
\int \beta(x,y)\, U_+\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx) < +\infty; \qquad U'(\infty)\, \mu_s[X] < +\infty. \qquad (29.21)
Step 2: Reduction to the case U ′ (0) > −∞. The problem now
is to get rid of possibly very large negative values of U ′ close to 0. For
any δ > 0, define Uδ′ (r) := max (U ′ (δ), U ′ (r)) and
U_\delta(r) = \int_0^r U_\delta'(s)\, ds.
Since U′ is nondecreasing, Uδ′ converges monotonically to U ′ ∈ L1 (0, r)
as δ → 0. It follows that Uδ (r) decreases to U (r), for all r > 0. Let us
check that all the assumptions which we imposed on U , still hold true
for Uδ . First, Uδ (0) = 0. Also, since Uδ′ is nondecreasing, Uδ is convex.
Finally, Uδ has polynomial growth; indeed:
• if r ≤ δ, then r (Uδ )′ (r) = r U ′ (δ) is bounded above by a constant
multiple of r;
• if r > δ, then r (Uδ )′ (r) = r U ′ (r), which is bounded (by assumption)
by C(U (r)+ + r), and this obviously is bounded by C(Uδ (r)+ + r),
for U ≤ Uδ .
The next claim is that the integral
\int \beta(x,y)\, U_\delta\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx)
makes sense and is not +∞. Indeed, as we have just seen, there is a
constant C such that (Uδ )+ ≤ C(U+ (r) + r). Then the contribution of
the linear part Cr is finite, since
\int \beta(x,y)\, \frac{\rho(x)}{\beta(x,y)}\, \pi(dy|x)\, \nu(dx) = \int \rho(x)\, \nu(dx) \le 1;
and the contribution of C U+ (r) is also finite in view of (29.21). So
\int \beta(x,y)\, (U_\delta)_+\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx) < +\infty,
which proves the claim.
Now assume that Theorem 29.20(iii) has been proved with Uδ in
place of U . Then, for any δ > 0,
\limsup_{\varepsilon\downarrow 0} \int \beta(x,y)\, U\!\left( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \right) \pi_\varepsilon(dy|x)\, \nu(dx)
\le \limsup_{\varepsilon\downarrow 0} \int \beta(x,y)\, U_\delta\!\left( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \right) \pi_\varepsilon(dy|x)\, \nu(dx)
\le \int \beta(x,y)\, U_\delta\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx).
and for the right-hand side the computation is similar, once one has
noticed that the first marginal of π is the weak limit of the first marginal
µε of πε , i.e. µ (as recalled in the First Appendix).
So in the sequel I shall assume that U ≥ 0.
Step 4: Treatment of the singular part. To take care of the
singular part, the reasoning is similar to the one already used in the
particular case β = 1: Write µ = ρ ν + µs , and
\rho_{a,\varepsilon}(x) = \int_X K_\varepsilon(x,y)\, \rho(y)\, \nu(dy); \qquad \rho_{s,\varepsilon}(x) = \int_X K_\varepsilon(x,y)\, \mu_s(dy).
U^\beta_{\pi,\nu}(\mu_\varepsilon) \le U^\beta_{\pi,\nu}(\rho_{a,\varepsilon}\,\nu) + U'(\infty)\, \mu_s[X].
In the next two steps I shall focus on the first term U^\beta_{\pi,\nu}(\rho_{a,\varepsilon}\,\nu); I shall write ρ_ε for ρ_{a,ε}.
Step 5: Approximation of β. For any two points x, y in X, define
\beta_\varepsilon(x,y) = \int_{X\times X} K_\varepsilon(x,\overline{x})\, K_\varepsilon(y,\overline{y})\, \beta(\overline{x},\overline{y})\, \nu(d\overline{x})\, \nu(d\overline{y}).
Now assume that β and \tilde\beta are bounded from above and below by positive constants, say B^{-1} \le \beta, \tilde\beta \le B; then by (29.19) there is C > 0, depending only on B, such that
\Big| \tilde\beta\, U\Big( \frac{\rho}{\tilde\beta} \Big) - \beta\, U\Big( \frac{\rho}{\beta} \Big) \Big| \le C\, |\tilde\beta - \beta|\, \big( U(\rho) + \rho \big).
Then
\bigg| \int \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) - \int \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) \bigg|
\le \int \Big| \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big) - \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big) \Big|\, \pi_\varepsilon(dy|x)\, \nu(dx)
\le C \int \big| \beta_\varepsilon(x,y) - \beta(x,y) \big|\, \big( U(\rho_\varepsilon(x)) + \rho_\varepsilon(x) \big)\, \pi_\varepsilon(dy|x)\, \nu(dx)
\le C \Big( \sup_{x,y\in X} \big| \beta_\varepsilon(x,y) - \beta(x,y) \big| \Big) \int \big( U(\rho_\varepsilon(x)) + \rho_\varepsilon(x) \big)\, \nu(dx)
\le C \Big( \sup_{x,y\in X} \big| \beta_\varepsilon(x,y) - \beta(x,y) \big| \Big) \int \big( U(\rho) + \rho \big)\, d\nu,
where the last inequality follows from Jensen's inequality as in the proof of the particular case β ≡ 1. To summarize:
\limsup_{\varepsilon\downarrow 0} \bigg| \int \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) - \int \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) \bigg| = 0.
In particular,
Z
eε (dx dy) = U ′ (∞) µs (dx).
g(x, y) π
Spt µs
In particular,
\int_X \sup_y\, g(x,y)\, d\mu(x) = \int_X \sup_y\, \beta(x,y)\, U\!\Big( \frac{\rho(x)}{\beta(x,y)} \Big)\, d\nu(x) < +\infty;
in other words, g belongs to the vector space L^1\big( (X,\mu);\, C(X) \big) of µ-integrable functions valued in the normed space C(X) (equipped with the norm of uniform convergence).
• If U'(\infty) < +\infty then g(x, \cdot) is also continuous for all x (it is identically equal to U'(\infty) if x ∈ Spt µ_s), and \sup_y g(x,y) \le U'(\infty) is obviously µ(dx)-integrable; so the conclusion g ∈ L^1\big( (X,\mu);\, C(X) \big) still holds true.
In the sequel I shall drop the index k and write just Ψ for Ψk . It will
be useful in the sequel to know that Ψ (x, y) = U ′ (∞) when x ∈ Spt µs ;
apart from that, Ψ might be any continuous function.
(a)
Step 8: Variations on a regularization theme. Let π eε be the
eε . Explicitly,
contribution of ρ ν to π
Z
(a)
eε (dx dy) =
π Kε (x, x) Kε (y, y) πε (dy|x) ρ(x) ν(dx) ν(dy) ν(dx).
Then let
Z
π ε(a) (dx dy) = Kε (x, x) Kε (y, y) ρ(x) ν(dx) πε (dy|x) ν(dy) ν(dx);
Z
bε(a) (dx dy)
π = Kε (x, x) Kε (y, y) ρε (x) ν(dx) πε (dy|x) ν(dy) ν(dx).
kπ ε(a) − π
e(a) kT V
Zε
≤ Kε (x, x) Kε (y, y) |ρ(x) − ρ(x)| πε (dy|x) ν(dx) ν(dy) ν(dx)
Z
= Kε (x, x) |ρ(x) − ρ(x)| πε (dy|x) ν(dx) ν(dx)
Z
= Kε (x, x) |ρ(x) − ρ(x)| ν(dx) ν(dx). (29.27)
kπ ε(a) − π
eε(a) kT V −→ 0. (29.31)
Next,
kπ ε(a) − π
b(a) kT V
Zε
≤ Kε (x, x) Kε (y, y) |ρε (x) − ρ(x)| πε (dy|x) ν(dx) ν(dy) ν(dx)
Z
= Kε (x, x) |ρε (x) − ρ(x)| πε (dy|x) ν(dx) ν(dx)
Z
= Kε (x, x) |ρε (x) − ρ(x)| ν(dx) ν(dx)
Z
= |ρε (x) − ρ(x)| ν(dx)
Z Z
= Kε (x, y) ρ(y) − ρ(x) ν(dy) ν(dx)
Z
≤ Kε (x, y) |ρ(y) − ρ(x)| ν(dy) ν(dx),
πε(a) − π ε(a) kT V −→ 0,
kb (29.32)
(a) (a)
By (29.31) and (29.32), kb
πε −e πε kT V −→ 0 as ε → 0. In particular,
Z Z
Ψ dbπε(a) − Ψ deπε(a) −−→ 0. (29.33)
ε↓0
Now let
Z
eε(s) (dx dy)
π = Kε (x, x) Kε (y, y) πε (dy|x) ν(dx) ν(dy) µs (dx);
Z
bε(s) (dx dy)
π = Kε (x, x) Kε (y, y) µs,ε (dx) πε (dy|x) ν(dy) ν(dx);
(a) (s)
eε (dx dy) = π
so that π eε (dx dy) + π
eε (dx dy). Further, define
while Z
πε(s) = U ′ (∞) µs,ε [X ] = U ′ (∞) µs [X ].
Ψ db
At
R this point,
R to finish the proof of the theorem it suffices to establish
πε −→ Ψ dπ; which is true if π
Ψ db bε converges weakly to π.
Step 9: Duality. Proving the convergence of π bε to π will be easy
because πbε is a kind of regularization of πε , and it will be possible to
“transfer the regularization to the test function” by duality. Indeed:
Z Z
bε (dx dy) = Ψ (x, y) Kε (x, x) Kε (y, y) πε (dy dx) ν(dy) ν(dx)
Ψ (x, y) π
Z
= Ψε (x, y) πε (dy dx),
where
Z
Ψε (x, y) = Ψ (x, y) Kε (x, x) Kε (y, y) ν(dy) ν(dx).
By Theorem 28.9 (in the very simple case when Xk = X for all k), we
may extract a subsequence such that µk,t −→ µt in P2 (X ), for each
t ∈ [0, 1], and πk −→ π in P (X × X ), where (µt )0≤t≤1 is a displacement
interpolation and π ∈ Π(µ0 , µ1 ) is an associated optimal transference
plan. Then by Theorem 29.20(i),
as required.
It remains to treat the case when β_t^{(K,N)} is not continuous. By Proposition 29.11 this can occur only if N = 1 or diam(X) = π\sqrt{(N-1)/K}. In both cases, Proposition 29.10 and the previous proof show that
\forall N' > N, \qquad U_\nu(\mu_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi,\nu}(\mu_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi,\nu}(\mu_1).
Now we have all the tools to prove the main result of this chapter: The
weak curvature-dimension bound CD(K, N ) passes to the limit. Once
again, the compact case will imply the general statement.
U_{\nu_k}(\mu^k_{\varepsilon,t}) \le (1-t)\, U^{\beta^{(K,N)}_{1-t}}_{\pi^k_\varepsilon,\nu_k}(\mu^k_0) + t\, U^{\beta^{(K,N)}_{t}}_{\check\pi^k_\varepsilon,\nu_k}(\mu^k_1), \qquad (29.39)
where β_t^{(K,N)} is given by (14.61) (with the distance d_k) and π^k_ε is an optimal coupling associated with (µ^k_{ε,t})_{0≤t≤1}. This means that for each
ε ∈ (0, 1) and k ∈ N there is a dynamical optimal transference plan Πεk
such that
µkε,t = (et )# Πεk , πεk = (e0 , e1 )# Πεk ,
where et is the evaluation at time t.
By Theorem 28.9, up to extraction of a subsequence in k, there is a
dynamical optimal transference plan Πε on Γ (X ) such that, as k → ∞,
(f_k\circ)_\#\, \Pi^k_\varepsilon \longrightarrow \Pi_\varepsilon \quad \text{weakly in } P\big( P([0,1]\times X) \big);
(f_k, f_k)_\#\, \pi^k_\varepsilon \longrightarrow \pi_\varepsilon \quad \text{weakly in } P(X\times X);
\sup_{0\le t\le 1} W_2\big( (f_k)_\#\mu^k_{\varepsilon,t},\ \mu_{\varepsilon,t} \big) \longrightarrow 0;
where
µε,t = (et )# Πε , πε = (e0 , e1 )# Πε .
Each curve (µε,t )0≤t≤1 is D-Lipschitz with D = diam (X ). By As-
coli’s theorem, from ε ∈ (0, 1) we may extract a subsequence (still
denoted ε for simplicity) such that
\sup_{0\le t\le 1} W_2\big( \mu_{\varepsilon,t},\, \mu_t \big) \xrightarrow[\varepsilon\to 0]{} 0, \qquad (29.40)
and since ρkε,0 and U are continuous, β(x0 , x1 ) U (ρkε,0 (x0 )/β(x0 , x1 ))
is uniformly close to β(fk (x0 ), fk (x1 )) U (ρkε,0 (x0 )/β(fk (x0 ), fk (x1 ))) as
k → ∞. So
\lim_{k\to\infty} \bigg[ \int \beta(x_0,x_1)\, U\!\left( \frac{\rho^k_{\varepsilon,0}(x_0)}{\beta(x_0,x_1)} \right) \pi^k_\varepsilon(dx_1|x_0)\, \nu_k(dx_0)
- \int \beta(f_k(x_0), f_k(x_1))\, U\!\left( \frac{\rho^k_{\varepsilon,0}(x_0)}{\beta(f_k(x_0), f_k(x_1))} \right) \pi^k_\varepsilon(dx_1|x_0)\, \nu_k(dx_0) \bigg] = 0. \qquad (29.45)
Similarly,
\limsup_{\varepsilon\downarrow 0}\ \limsup_{k\to\infty}\ U^{\beta^{(K,N)}_{t}}_{\check\pi^k_\varepsilon,\nu_k}(\mu^k_{\varepsilon,1}) \le U^{\beta^{(K,N)}_{t}}_{\check\pi,\nu}(\mu_1). \qquad (29.49)
where N ′ > 1 (recall Remark 17.31) to deduce that for any two proba-
bility measures µk0 , µk1 on Xk , there is a Wasserstein geodesic (µkt )t∈[0,1]
and an associated coupling π k such that
U_{\nu_k}(\mu^k_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi^k,\nu_k}(\mu^k_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi^k,\nu_k}(\mu^k_1).
Then the same proof as before shows that for any two probability mea-
sures µ0 , µ1 on X , there is a Wasserstein geodesic (µt )t∈[0,1] and an
associated coupling π such that
U_\nu(\mu_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi,\nu}(\mu_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi,\nu}(\mu_1).
If K > 0, 1 < N < ∞ and sup diam (Xk ) = DK,N , then we can
apply a similar reasoning, introducing again the bounded coefficients β_t^{(K,N')} for N' > N and then passing to the limit as N' ↓ N.
This concludes the proof of Theorem 29.24. ⊓
⊔
Remark 29.27. What the previous proof really shows is that under
certain assumptions the property of distorted displacement convexity
is stable under measured Gromov–Hausdorff convergence. The usual
displacement convexity is a particular case (take βt ≡ 1).
then for k large enough and ε small enough, the supports of µkε,0 and
µkε,1 are contained in BR+1] (⋆k ). So a geodesic which starts from the
support of µkε,0 and ends in the support of µkε,1 will necessarily have its
image included in B2(R+1)] (⋆k ); thus each measure µkε,t has its support
included in B2(R+1)] (⋆k ).
From that point on, the very same reasoning as in the proof of
Theorem 29.24 can be applied, since, say, the ball B2(R+2)] (⋆k ) in
Xk converges in the measured Gromov–Hausdorff topology to the ball
B2(R+2)] (⋆) in X , etc. ⊓
⊔
Remark 29.29. All the interest of this theorem lies in the fact that
the measured Gromov–Hausdorff convergence is a very weak notion of
convergence, which does not imply the convergence of the Ricci tensor.
Further, if f ∈ L1 (X , ν),
define Kε f := Kε (f ν).
The linear operator Kε : µ → (Kε µ)ν is mass-preserving, in
the sense that for any nonnegative finite measure µ on Y, one has
((Kε µ)ν)[Y] = µ[Y]. More generally, Kε defines a (nonstrict) contrac-
tion operator on M (Y). Moreover, as ε → 0,
• If f ∈ C(X ), then Kε f converges uniformly to f on Y;
• If µ is a finite measure supported in Y, then (Kε µ)ν converges
weakly (against Cb (X )) to µ (this follows from the previous property
by a duality argument);
• If f ∈ L1 (Y), then Kε f converges to f in L1 (Y) (this follows from
the density of C(Y) in L1 (Y, ν), the fact that Kε f converges uni-
formly to f if f is continuous, and the contraction property of Kε ).
There is in fact a more precise statement: For any f ∈ L1 (Y, ν),
    ∫_{Y×Y} |f(x) − f(y)| K_ε(x, y) ν(dx) ν(dy) −→ 0    as ε → 0.
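As a small numerical illustration of these properties (a sketch under illustrative assumptions, not the construction used in the text): on a discretized model space Y = [0, 1] with a uniform reference measure ν, a Gaussian-type kernel normalized against ν in the x variable preserves mass exactly and converges in L1(ν); the grid, the kernel shape and the test function below are all arbitrary choices.

```python
import numpy as np

# Sketch: a regularizing kernel K_eps on Y = [0, 1] with nu = uniform grid measure.
# Illustrates two properties listed above: (K_eps f) has the same nu-mass as f,
# and K_eps f -> f in L^1(nu) as eps -> 0.  All specific choices are illustrative.

n = 2000
x = np.linspace(0.0, 1.0, n)
nu = np.full(n, 1.0 / n)                      # reference probability measure on the grid

def K_eps(eps):
    """Kernel normalized so that  int K_eps(x, y) nu(dx) = 1  for every y."""
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / eps**2)
    return K / (nu @ K)[None, :]

def smooth(f, eps):
    """(K_eps f)(x) = int K_eps(x, y) f(y) nu(dy)."""
    return K_eps(eps) @ (f * nu)

f = np.sign(x - 0.3) + x**2                   # a bounded, discontinuous test density
for eps in [0.3, 0.1, 0.03, 0.01]:
    g = smooth(f, eps)
    mass_error = abs(np.sum(g * nu) - np.sum(f * nu))   # exactly 0 up to rounding
    l1_error = np.sum(np.abs(g - f) * nu)               # decreases as eps -> 0
    print(f"eps={eps:5.2f}  mass error={mass_error:.1e}  L1 error={l1_error:.3f}")
```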
Proof of Lemma 29.36. Let us first treat the case when Y is just a point.
Then the first part of the statement of Lemma 29.36 is just the density
of C(X ) in L1 (X , µ), which is a classical result. To prove the second
part of the statement, let f ∈ L1 (µ), let K be a compact subset of X ,
let h ∈ C(K) such that f = h on K, and let ε > 0. Let ψ ∈ Cc (X \ K)
be such that kψ − f kL1 (X \K,µ) ≤ ε.
Since µ and f ν are regular, there is an open set Oε containing K
such that µ[Oε \ K] ≤ ε/(sup |ψ| + sup |h|) and kf − hkL1 (Oε ) ≤ ε.
The sets Oε and X \ K are open and cover X , so there are continuous
functions χ and η, defined on X and valued in [0, 1], such that χ+η = 1,
χ is supported in Oε and η is supported in X \ K. (In particular χ ≡ 1
in K.) Let
Ψ = h χ + ψ η.
Then Ψ coincides with h (and therefore with f ) in K, Ψ is continuous,
and
So
Z
sup |f (x, y) − Ψ (x, y)| µ(dx)
y
ZX
≤ sup |f (x, z) − f (x, z ′ )| µ(dx)
X d(z,z ′ )≤δ XZ
+ |f (x, yℓ ) − ψℓ (x)| µ(dx)
Z ℓ X
≤ mδ (x) µ(dx) + L(δ) η, (29.51)
X
where n o
mδ (x) = sup |f (x, z) − f (x, z ′ )|; d(z, z ′ ) ≤ δ .
Bibliographical notes
Here are some (probably too lengthy) comments about the genesis
of Definition 29.8. It comes after a series of particular cases and/or
variants studied by Lott and myself [577, 578] on the one hand,
Sturm [762, 763] on the other. To summarize: In a first step, Lott
and I [577] treated CD(K, ∞) and CD(0, N ), while Sturm [762] inde-
pendently treated CD(K, ∞). These cases can be handled with just
displacement convexity. Then it took some time before Sturm [763] came up with the brilliant idea to use distorted displacement convexity as the basis of the definition of CD(K, N) for N < ∞ and K ≠ 0.
There are slight variations in the definitions appearing in all these
works; and they are not exactly the ones appearing in this course either.
I shall describe the differences in some detail below.
In the case K = 0, for compact spaces, Definition 29.8 is exactly
the definition that was used in [577]. In the case N = ∞, the definition
in [577] was about the same as Definition 29.8, but it was based on
inequality (29.2) (which is very simple in the case K = ∞) instead
of (29.3). Sturm [762] also used a similar definition, but preferred to
impose the weak displacement convexity inequality only for the Boltz-
mann H functional, i.e. for U (r) = r log r, not for the whole class DC∞ .
It is interesting to note that precisely for the H functional and N = ∞,
inequalities (29.2) and (29.3) are the same, while in general the former
is a priori weaker. So the definition which I have adopted here is a priori
stronger than both definitions in [577] and [762].
Now for the general CD(K, N ) criterion. Sturm’s original defini-
tion [763] is quite close to Definition 29.8, with three differences. First,
he does not impose the basic inequality to hold true for all members of the class DC_N, but only for functions of the form −r^{1−1/N′} with N′ ≥ N. Secondly, he does not require the displacement interpolation
(µt )0≤t≤1 and the coupling π to be related via some dynamical optimal
transference plan. Thirdly, he imposes µ0 , µ1 to be absolutely continu-
ous with respect to ν, rather than just to have their support included in
Spt ν. After becoming aware of Sturm’s work, Lott and I [578] modified
his definition, imposing the inequality to hold true for all U ∈ DCN ,
imposing a relation between (µt ) and π, and allowing in addition µ0
and µ1 to be singular. In the present course, I decided to extend the
new definition to the case N = ∞.
Sturm [763] proved the stability of his definition under a variant
of measured Gromov–Hausdorff convergence, provided that one stays
away from the limit Bonnet–Myers diameter. Then Lott and I [578]
briefly sketched a proof of stability for our modified definition. Details
appear here for the first time, in particular the painful2 proof of upper semicontinuity of U^β_{π,ν}(µ) under regularization (Theorem 29.20(iii)). It
should be noted that Sturm manages to prove the stability of his defini-
tion without using this upper semicontinuity explicitly; but this might
be due to the particular form of the functions U that he is considering,
and the assumption of absolute continuity.
The treatment of noncompact spaces here is not exactly the same
as in [577] or [763]. In the present set of notes I adopted a rather weak
point of view in which every “noncompact” statement reduces to the
compact case; in particular in Definition 29.8 I only consider compactly
supported probability densities. This leads to simpler proofs, but the
treatment in [577, Appendix E] is more precise in that it passes to the
limit directly in the inequalities for probability measures that are not
compactly supported.
Other tentative definitions have been rejected for various reasons.
Let me mention four of them:
(i) Imposing the displacement convexity inequality along all dis-
placement interpolations in Definition 29.8, rather than along some
displacement interpolation. This concept is not stable under measured
Gromov–Hausdorff convergence. (See the last remark in the concluding
chapter.)
(ii) Replace the integrated displacement convexity inequalities by
pointwise inequalities, in the style of those appearing in Chapter 14.
For instance, with the same notation as in Definition 29.8, one may define
    J_t(γ_0) := ρ_0(γ_0) / E[ ρ_t(γ_t) | γ_0 ],
2
As a matter of fact, I was working on precisely this problem when my left lung
collapsed, earning me a one-week holiday in hospital with unlimited amounts of
pain-killers.
Borel set A ⊂ X,
    ∫ β_t^{(K,N)}(x, y) P^{(t)}(x, y; A) ν(dy) ≤ ν[A] / t^N;
and symmetrically
    ∫ β_{1−t}^{(K,N)}(x, y) P^{(t)}(x, y; A) ν(dx) ≤ ν[A] / (1 − t)^N.
So far nobody has undertaken this program seriously and it is not known whether it includes some serious analytical difficulties.
Lott [576] noticed that (for a Riemannian manifold) at least CD(0, N )
bounds can be formulated in terms of displacement convexity of cer-
tain functionals explicitly involving the time variable. For instance,
CD(0, N ) is equivalent to the convexity of t → t Uν (µt ) + N t log t
on [0, 1], along displacement interpolation, for all U ∈ DC∞ ; rather
than convexity of Uν (µt ) for all U ∈ DCN . (Note carefully: in one
formulation the dimension is taken care of by the time-dependence
of the functional, while in the other one it is taken care of by the
class of nonlinearities.) More general curvature-dimension bounds can
probably be encoded by refined convexity estimates: for instance, CD(K, N) seems to be equivalent to the (displacement) convexity of t H_ν(µ_t) + N t log t − K (t³/6) W_2(µ_0, µ_1)².
It seems likely that this observation can be developed into a com-
plete theory. For geometric applications, this point of view is probably
less sharp than the one based on distortion coefficients, but it may be
technically simpler.
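A simple model computation (a worked illustration, not taken from the text) shows the role of the dimensional correction N t log t. In R^N with Lebesgue measure ν (a CD(0, N) space), let µ_1 be the uniform measure on a ball B and let µ_t be the uniform measure on tB, which is the displacement interpolation between δ_0 and µ_1. Then ρ_t ≡ 1/(t^N |B|) on tB, so that
    H_ν(µ_t) = −N log t + H_ν(µ_1),    hence    t H_ν(µ_t) + N t log t = t H_ν(µ_1),
which is affine on (0, 1]: the contraction of a ball onto a point is an equality case of the time-dependent convexity criterion, just as it saturates the usual displacement convexity inequality for U ∈ DC_N.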
A completely different approach to Ricci bounds in metric-measure
spaces has been under consideration in a work by Kontsevich and
Soibelman [528], in relation to Quantum Field Theory, mirror sym-
metry and heat kernels; see also [756]. Kontsevich pointed out to me
that the class of spaces covered by this approach is probably strictly
smaller than the class defined here, since it does not seem to include
the normed spaces considered in Example 29.16; he also suggested that
this point of view is related to the one of Ollivier, described below.
To close this list, I shall evoke the recent independent contributions
by Joulin [494, 495] and Ollivier [662] who suggested defining the infi-
mum of the Ricci curvature as the best constant K in the contraction
inequality
W1 (Pt δx , Pt δy ) ≤ e−Kt d(x, y),
where Pt is the heat semigroup (defined on probability measures); or
equivalently, as the best constant K in the inequality
    ‖P_t^* f‖_{Lip} ≤ e^{−Kt} ‖f‖_{Lip},
where P_t^* is the adjoint of P_t (that is, the heat semigroup on functions). Similar inequalities have been used before in concentration theory [305, 595], and in the study of spectral gaps [231, 232] or large-time behavior.
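The discrete-time counterpart of this contraction can be tested numerically. The sketch below (an illustration with arbitrary choices, not a construction from the text) computes, for the lazy simple random walk on the hypercube {0,1}^3 with Hamming distance, the quantity κ(x, y) = 1 − W_1(m_x, m_y)/d(x, y) considered by Ollivier, for two adjacent vertices; here m_x is the one-step transition measure from x, and the W_1 distance is evaluated by solving the transportation linear program with scipy.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Coarse curvature of the lazy simple random walk on {0,1}^3 (illustrative example).
states = list(itertools.product([0, 1], repeat=3))
n = len(states)
d = np.array([[sum(a != b for a, b in zip(s, t)) for t in states] for s in states], float)

def transition(i):
    """Lazy walk: stay with probability 1/2, otherwise flip one coordinate uniformly."""
    m = np.zeros(n)
    m[i] = 0.5
    for j in range(n):
        if d[i, j] == 1:
            m[j] = 0.5 / 3
    return m

def W1(mu, nu):
    """Wasserstein-1 distance, computed from the transportation linear program."""
    c = d.reshape(-1)                            # cost of moving mass from i to j
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0         # row sums: first marginal mu
        A_eq[n + i, i::n] = 1.0                  # column sums: second marginal nu
    b_eq = np.concatenate([mu, nu])
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs").fun

x, y = 0, 1                                      # two adjacent vertices, d(x, y) = 1
w = W1(transition(x), transition(y))
print(f"W1(m_x, m_y) = {w:.4f}   coarse curvature kappa(x, y) = {1 - w:.4f}")
```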
Elementary properties
The next proposition gathers some almost immediate consequences of
the definition of weak CD(K, N ) spaces. I shall say that a subset X ′ of
a geodesic space (X , d) is totally convex if any geodesic whose endpoints
belong to X ′ is entirely contained in X ′ .
Proof of Theorem 30.2. First assume that (Spt ν, d, ν) has the weak
CD(K, N ) property. Replacing Spt ν by X does not enlarge the class
of probability measures that can be chosen for µ_0 and µ_1 in Definition 29.8, and does not change the functionals U_ν or U^{β_t^{(K,N)}}_{π,ν} either.
Because Spt ν is (by assumption) a length space, geodesics in Spt ν
are also geodesics in X . So geodesics in P2 (Spt ν) are also geodesics
in P2 (X ) (it is the converse that might be false), and then the prop-
erty of X ′ being a weak CD(K, N ) space implies that X is also a weak
CD(K, N ) space.
The converse implication is more subtle. Assume that (X , d, ν) is a
weak CD(K, N ) space. Let µ0 , µ1 be two compactly supported prob-
ability measures on X with Spt µ0 ⊂ Spt ν, Spt µ1 ⊂ Spt ν. For any
t0 ∈ {0, 1}, choose a sequence of probability measures (µk,t0 )k∈N con-
verging to µt0 , satisfying the conclusion of Theorem 29.20(iii), such
that the supports of all measures µk,t0 are included in a common com-
pact set. By definition of the weak CD(K, N ) property, for each k ∈ N
there is a Wasserstein geodesic (µk,t )0≤t≤1 and an associated coupling
π_k ∈ Π(µ_{k,0}, µ_{k,1}) such that for all t ∈ [0, 1] and U ∈ DC_N,
    U_ν(µ_{k,t}) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) + t U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}).        (30.1)
Displacement convexity
µ = ρ ν + µs .
    U^{β_t^{(K,N)}}_{π,ν}(µ) := ∫_{X×X} U( ρ(x) / β_t^{(K,N)}(x, y) ) β_t^{(K,N)}(x, y) π(dy|x) ν(dx) + U′(∞) µ_s[X],
Proof of Theorem 30.4. The proof is the same as for Theorem 17.28
and Application 17.29, with only two minor differences: (a) ρ is not
necessarily a probability density, but still its integral is bounded above
by 1; (b) there is an additional term U′(∞) µ_s[X] ∈ R ∪ {+∞}. ⊓⊔
The next theorem is the final goal of this section: It extends the
displacement convexity inequalities of Definition 29.8 to noncompact
situations.
These inequalities are the starting point for all subsequent inequal-
ities considered in the present chapter.
The proof of Theorem 30.5 will use two auxiliary results, which
generalize Theorems 29.20(i) and (iii) to noncompact situations. These
are definitely not the most general results of their kind, but they will
be enough to derive displacement convexity inequalities with a lot of
generality. As usual, I shall denote by M+ (X ) the set of finite (non-
negative) Borel measures on X , and by L1+ (X ) the set of nonnegative
ν-integrable measurable functions on X . Recall from Definition 6.8 the
notion of (weak) convergence in P2 (X ).
    µ_k −→ µ in P_2(X) as k → ∞,
and for any sequence of probability measures (π_k)_{k∈N} such that the first marginal of π_k is µ_k, and the second one is µ_{k,1}, the limits
    π_k −→ π,
    ∫ d(x, y)² π_k(dx dy) −→ ∫ d(x, y)² π(dx dy),
    µ_{k,1} −→ µ_1 in P_2(X)
(all as k → ∞) imply
    lim_{k→∞} U^β_{π_k,ν}(µ_k) = U^β_{π,ν}(µ).
Proof of Theorem 30.6. First of all, we may reduce to the case when U
is valued in R+ , just replacing U by r → U (r) + c r. So in the sequel U
will be nonnegative.
Let us start with (i). Let z be an arbitrary base point, and let
(χR )R>0 be a z-cutoff as in the Appendix (that is, a family of cutoff
continuous functions that are identically equal to 1 on a ball BR (z) and
identically equal to 0 outside BR+1] (z)). For any R > 0, write
    U_ν(χ_R µ) = ∫ U(χ_R ρ) dν + U′(∞) ∫ χ_R dµ_s.
In particular,
    U_ν(µ) = sup_{R>0} U_ν(χ_R µ).        (30.8)
On the other hand, for each fixed R, we have
    U_ν(χ_R µ) = U_{χ_{R+1}ν}(χ_R µ),
and then we can apply Proposition 29.19(i) with the compact space (B_{R+1]}(z), ν), to get
    U_ν(χ_R µ) = sup { ∫_X ϕ χ_R dµ − ∫_X U*(ϕ) χ_{R+1} dν;    ϕ ∈ C_b(B_{R+1]}(z)),  ϕ ≤ U′(∞) }.
Let ρ^{(R)} be the density of the absolutely continuous part of µ^{(R)}, and µ^{(R)}_s be the singular part. It is obvious that ρ^{(R)} converges to ρ in L1(ν) and µ^{(R)}_s[X] → µ_s[X] as R → ∞.
    Next, define
    π^{(R)}(dy|x) = χ_R(y) π(dy|x) + ( ∫ (1 − χ_R(y′)) π(dy′|x) ) δ_z;        π^{(R)}(dx dy) = µ^{(R)}(dx) π^{(R)}(dy|x).
shows that π^{(R)}_k converges to π^{(R)} as k → ∞, for any fixed R.
The plan is to first replace the original expressions by the expressions
with the superscript (R), and then to pass to the limit as k → ∞ for
fixed R. For that I will distinguish two cases.
Similarly,1
    | U^β_{π^{(R)}_k,ν}(µ^{(R)}_k) − U^β_{π_k,ν}(µ_k) | ≤ C ( ‖ρ^{(R)}_k − ρ_k‖_{L1(ν)} + | µ^{(R)}_{k,s}[X] − µ_{k,s}[X] | + (1/R²) ∫ d(z, y)² µ_{k,1}(dy) ).        (30.10)
Note that for k ≥ R, ρ^{(R)}_k = ρ^{(R)}, and µ^{(R)}_{k,s} = µ^{(R)}_s. Then in view of the definition of µ_k and the fact that ∫ d(z, y)² µ_{k,1}(dy) is bounded, we easily deduce from (30.9) and (30.10) that
    lim_{R→∞} | U^β_{π^{(R)},ν}(µ^{(R)}) − U^β_{π,ν}(µ) | = 0;
    lim_{R→∞} lim sup_{k→∞} | U^β_{π^{(R)}_k,ν}(µ^{(R)}_k) − U^β_{π_k,ν}(µ_k) | = 0.
The interest of this reduction is that all probability measures µ^{(R)}_k (resp. π^{(R)}_k) are now supported in a common compact set, namely the closed ball B_{2R} (resp. B_{2R} × B_{2R}). Note that µ^{(R)}_k converges to µ^{(R)}.
    If k is large enough, µ^{(R)}_k = µ^{(R)}, so (30.11) becomes
In the sequel, I shall drop the superscript (R), so the goal will be
The argument now is similar to the one used in the proof of Theo-
rem 29.20(iii). Define
    g(x, y) = (β(x, y)/ρ(x)) U( ρ(x)/β(x, y) ),
Using ρ^{(R)} ≤ 2ρ, log(1/β) ≤ C d(x, y)² and reasoning as in the first case, we can bound the above expression by
    C ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C ∫ |ρ^{(R)}(x) − ρ(x)| (1 + d(x, y)²) π^{(R)}(dy|x) ν(dx)
        + C ∫ ρ(x) log(2 + ρ(x)) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫ ρ(x) (1 + d(x, z)²) (1 − χ_R(y)) π(dy|x) ν(dx)
    ≤ C ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C (1 + D²) ∫ |ρ^{(R)}(x) − ρ(x)| ν(dx)
        + C ∫_{d(x,y)≥D} |ρ^{(R)}(x) − ρ(x)| (1 + d(x, y)²) [π(dy|x) + δ_z(dy)] ν(dx)
        + C log(2 + M) ∫ ρ(x) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫_{ρ(x)≥M} ρ(x) log(2 + ρ(x)) π(dy|x) ν(dx)
        + C (1 + D²) ∫ ρ(x) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫_{d(x,z)≥D} ρ(x) (1 + d(x, z)²) π(dy|x) ν(dx)
    ≤ C (1 + D²) ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C ∫_{d(x,y)≥D} (1 + d(x, y)²) π(dx dy)
        + C ∫_{d(x,z)≥D} (1 + d(x, z)²) π(dx dy)
        + C [log(2 + M) + (1 + D²)] ∫ (1 − χ_R(y)) π(dx dy)
        + C ∫_{ρ≥M} ρ log(2 + ρ) dν.
Since d(x, y)² 1_{d(x,y)≥D} ≤ d(x, z)² 1_{d(x,z)≥D/2} + d(y, z)² 1_{d(y,z)≥D/2}, the above expression can in turn be bounded by
    C (1 + D²) ∫ |ρ^{(R)} − ρ| log(2 + ρ) dν + C ∫_{d(z,x)≥D/2} (1 + d(x, z)²) µ(dx)
        + C ∫_{d(z,y)≥D/2} (1 + d(y, z)²) µ_1(dy) + C ∫_{ρ≥M} ρ log(2 + ρ) dν
        + C [ (log(2 + M) + D²) / R² ] ∫_{d(z,y)≥D/2} d(z, y)² µ_1(dy).
From that point on, the proof is similar to the one in the first case.
(To prove that g ∈ L1 (B2R ; C(B2R )) one can use the fact that β is
bounded from above and below by positive constants on B2R × B2R ,
and apply the same estimates as in the proof of Theorem 29.20(ii).) ⊓ ⊔
Proof of Theorem 30.5. By an approximation theorem as in the proof of
Proposition 29.12, we may restrict to the case when U is nonnegative;
we may also assume that U is Lipschitz (in case N < ∞) or that it
behaves at infinity like a r log r+b r (in case N = ∞). By approximating
N by N′ > N, we may also assume that the distortion coefficients β_t^{(K,N)}(x, y) are locally bounded and |log β_t^{(K,N)}(x, y)| = O(d(x, y)²).
Let (µk,0 )k∈N (resp. (µk,1 )k∈N ) be a sequence converging to µ0 (resp.
to µ1 ) and satisfying the conclusions of Theorem 30.6(ii). For each k
there is a Wasserstein geodesic (µk,t )0≤t≤1 and an associated coupling
πk of (µk,0 , µk,1 ) such that
    U_ν(µ_{k,t}) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) + t U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}).        (30.15)
Further, let Πk be a dynamical optimal transference plan such that
(et )# Πk = µk,t and (e0 , e1 )# Πk = πk . Since the sequence µk,0 con-
verges weakly to µ0 , its elements belong to a compact subset of P (X );
the same is true of the measures µk,1 . By Theorem 7.21 the families
(µk,t )0≤t≤1 belong to a compact subset of C([0, 1]; P (X )); and also the
dynamical optimal transference plans Πk belong to a compact subset
of P (C([0, 1]; X )). So up to extraction of a subsequence we may as-
sume that Πk converges to some Π, (µk,t )0≤t≤1 converges to some path
(µt )0≤t≤1 (uniformly in t), and πk converges to some π. Since the eval-
uation map is continuous, it is immediate that π = (e0 , e1 )# Π and
µt = (et )# Π.
By Theorem 30.6(i), U_ν(µ_t) ≤ lim inf_{k→∞} U_ν(µ_{k,t}). Then, by construction (and Theorem 30.6(ii)),
    lim sup_{k→∞} U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) ≤ U^{β_{1−t}^{(K,N)}}_{π,ν}(µ_0),
    lim sup_{k→∞} U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}) ≤ U^{β_t^{(K,N)}}_{π̌,ν}(µ_1).
The desired inequality (30.5) follows by plugging the above into (30.15).
so
    U^{β_t^{(K,∞)}}_{π̌,ν}(µ_1) ≤ U_ν(µ_1) − λ(K, U) ((1 − t²)/6) W_2(µ_0, µ_1)².        (30.16)
Similarly,
    U^{β_{1−t}^{(K,∞)}}_{π,ν}(µ_0) ≤ U_ν(µ_0) − λ(K, U) ((1 − (1 − t)²)/6) W_2(µ_0, µ_1)².        (30.17)
Then (30.6) follows from (30.5), (30.16), (30.17) and the identity
    t (1 − t²)/6 + (1 − t) (1 − (1 − t)²)/6 = t(1 − t)/2.
⊓⊔
Brunn–Minkowski inequality
The next theorem can be taken as the first step to control volumes in
weak CD(K, N ) spaces:
• If N = ∞, then
    log (1/ν[[A_0, A_1]_t]) ≤ (1 − t) log(1/ν[A_0]) + t log(1/ν[A_1]) − (K t(1 − t)/2) sup_{x_0∈A_0, x_1∈A_1} d(x_0, x_1)².        (30.20)
Remark 30.8. The result fails if A0 , A1 are not assumed to lie in the
support of ν. (Take ν = δx0 , x1 6= x0 , and A0 = {x0 }, A1 = {x1 }.)
This was the easy part, which does not need the CD(K, N ) condition.
To prove the lower bound, apply (30.18) with A0 = {x}, A1 = A:
This results in
    ν[[x, A]_t]^{1/N} ≥ t inf_{a∈A} β_t^{(K,N)}(x, a)^{1/N} ν[A]^{1/N}.
As t → 1, inf_{a∈A} β_t^{(K,N)}(x, a) converges to 1, so we may pass to the lim inf and recover
    lim inf_{t→1} ν[[x, A]_t] ≥ ν[A].
Bishop–Gromov inequalities
Once we know that ν has no atom, we can get much more precise
information and control on the growth of the volume of balls, and in
particular prove sharp Bishop–Gromov inequalities for weak CD(K, N )
spaces with N < ∞:
    ν[B_r(x_0)] / ∫_0^r s_{(K,N)}(t) dt    is a nonincreasing function of r,        (30.22)
    ν[B_r(x_0)] ≤ e^{Cr} e^{K_− r²/2};        (30.23)
    ν[B_{r+δ}(x_0) \ B_r(x_0)] ≤ e^{Cr} e^{−K r²/2}    if K > 0.        (30.24)
Before providing the proof of this theorem, I shall state three im-
mediate but important corollaries, all of them in finite dimension.
r ≤ R.
Uniqueness of geodesics
It is an important result in Riemannian geometry that for almost any
pair of points (x, y) in a complete Riemannian manifold, x and y are
linked by a unique geodesic. This statement does not extend to gen-
eral weak CD(K, N ) spaces, as will be discussed in the concluding
chapter; however, it becomes true if the weak CD(K, N ) criterion is
supplemented with a nonbranching condition. Recall that a geodesic
space (X , d) is said to be nonbranching if two distinct constant-speed
geodesics cannot coincide on a nontrivial interval.
where the first equality follows from the monotone convergence theorem
and the second from Corollary 30.10.
So for any k ∈ N, the set Zk of points in Bk (x) which can be joined
to x by several geodesics is of zero measure. The set of points in X
which can be joined to x by several geodesics is contained in the union
of all Z_k, and is therefore of zero measure too. ⊓⊔
cannot hold.
On the other hand, there has to be some dynamical optimal trans-
port plan Π ′′ such that (e0 )# Π ′′ = µ′0 , (e1 )# Π ′′ = µ′1 and inequal-
ity (30.26) holds true with µ′t0 replaced by µ′′t0 = (et0 )# Π ′′ . In par-
ticular, Uν (µ′′t0 ) < Uν (µ′t0 ) = 0, which implies that µ′′t0 is not purely
singular.
2
Here I am cheating a bit because Theorem 30.6, in the version which I have
stated, assumes U (r) ≥ −c r. To deal with this issue, one could prove a more
general version of Theorem 30.6; but a simpler remedy is to introduce ε > 0 and
choose U (r) = −N r (r +ε)−1/N instead of −N r 1−1/N . If µ0 and µ1 are compactly
supported, one may keep the proof as it is and use Theorem 29.20(i) instead of
Theorem 30.6(i).
Now consider the plan Π̂ defined by
    Π̂ = P[γ_{t_0} ∈ Z] Π″ + 1_{[γ_{t_0} ∉ Z]} Π.        (30.27)
(To pass from the first to the second line I used the fact that Π ′ and
Π ′′ are displacement optimal transference plans between the same two
measures.)
It follows from (30.27) and the ν-negligibility of Z that
    ρ̂_{t_0} = ρ_{t_0} + a ρ″_{t_0}    ν-almost surely,
where ρ̂_{t_0}, ρ″_{t_0} and ρ_{t_0} respectively stand for the density of the absolutely continuous parts of µ̂_{t_0}, µ″_{t_0} and µ_{t_0}, and a = P[γ_{t_0} ∈ Z] > 0. Then from the minimality property of µ_{t_0},
    ∫ U(ρ_{t_0}(x)) dν(x) = U_ν(µ_{t_0}) ≤ U_ν(µ̂_{t_0}) = ∫ U( ρ_{t_0}(x) + a ρ″_{t_0}(x) ) dν(x).
If, say, µ0 is not purely singular, then the first term on the right-hand
side is negative, while the second one is nonpositive. It follows that
U_ν(µ_t) < 0, and therefore µ_t is not purely singular. ⊓⊔
    ‖ρ_t‖^p_{L^p(ν)} ≤ U_ν(µ_t) ≤ (1 − t) ‖ρ_0‖^p_{L^p(ν)} + t ‖ρ_1‖^p_{L^p(ν)} ≤ max( ‖ρ_0‖_{L^p(ν)}, ‖ρ_1‖_{L^p(ν)} )^p.        (30.28)
⊓⊔
Remark 30.21. The above argument exploited the fact that in the def-
inition of weak CD(K, N ) spaces the displacement convexity inequal-
ity (29.11) is required to hold for all members of DCN and along a
common Wasserstein geodesic.
where |∇− ρ| is defined by (20.2) (one may also use |∇ρ| in place of
|∇− ρ|). With this notion, one has the following estimates:
Sobolev inequalities
Diameter control
Recall from Proposition 29.11 that a weak CD(K, N ) space with K > 0
and N < ∞ satisfies the Bonnet–Myers diameter bound
    diam(Spt ν) ≤ π √( (N − 1)/K ).
Slightly weaker conclusions can also be obtained under a priori
weaker assumptions: For instance, if X is at the same time a weak
CD(0, N ) space and a weak CD(K, ∞) space, then there is a universal
constant C such that
    diam(Spt ν) ≤ C √( (N − 1)/K ).        (30.32)
See the bibliographical notes for more details.
Poincaré inequalities
    −∫_{B_R(x_0)} |u − ⟨u⟩_{B_R(x_0)}| dν ≤ P(K, N, R) R −∫_{B_{2R}(x_0)} |∇u| dν,        (30.33)
where −∫_B = (ν[B])^{−1} ∫_B stands for the averaged integral over B; ⟨u⟩_B = −∫_B u dν stands for the average of the function u on B; P(K, N, R) = 2^{2N+1} C(K, N, R) D(K, N, R); and C(K, N, R), D(K, N, R) are defined by (19.11) and (18.10) respectively.
In particular, if K ≥ 0 then P (K, N, R) = 22N +1 is admissible; so
ν satisfies a uniform local Poincaré inequality. Moreover, (30.33) still
holds true if the local “norm of the gradient” |∇u| is replaced by any
upper gradient of u, that is, a function g such that for any Lipschitz path γ : [0, 1] → X,
    |u(γ(1)) − u(γ(0))| ≤ ∫_0^1 g(γ(t)) |γ̇(t)| dt.
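(A standard example, recalled here only as an illustration: if u is locally Lipschitz, then its local Lipschitz constant |∇u|(x) = lim sup_{y→x} |u(y) − u(x)|/d(x, y) is an upper gradient of u.)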
Talagrand inequalities
The proof of (ii) and (iii) is the same as the proof of Theorem 22.17,
once one has an analog of Proposition 22.16. It turns out that proper-
ties (i)–(vi) of Proposition 22.16 and Theorem 22.46 are still satisfied
when the Riemannian manifold M is replaced by any metric space X ,
but property (vii) might fail in general. It is still true that this property
holds true for ν-almost all x, under the assumption that ν is locally
doubling and satisfies a local Poincaré inequality. See Theorem 30.30
below for a precise statement (and the bibliographical notes for refer-
ences). This is enough for the proof of Theorem 22.17 to go through.
⊓⊔
Remark 30.33. In the case N = 1, (30.35) does not make any sense,
but the equivalence (i) ⇒ (iii) still holds. This can be seen by working in
dimension N ′ > 1 and letting N ′ ↓ 1, as in the proof of Theorem 17.41.
Since all the measures µk,0 and µk,1 are supported in a uniform compact
set, Corollary 7.22 guarantees that the sequence (Πk )k∈N converges, up
to extraction, to some dynamical optimal transference plan Π with
(e0 )# Π = µ0 and (e1 )# Π = µ1 . Then µk,t converges weakly to µt =
(et )# Π, and πk := (e0 , e1 )# Πk converges weakly to π = (e0 , e1 )# Π. It
remains to pass to the limit as k → ∞ in the inequality (30.38); this is
easy in view of (30.37) and Theorem 29.20(i), which imply
“γt ∈ Bδ (y)”.) Let µ′t = (et )# Π ′ , let ρ′t be the density of the abso-
lutely continuous part of µ′t , and let π ′ := (e0 , e1 )# Π ′ . Since Π ′ is the
unique dynamical optimal transference plan between µ′0 and µ′1 , we can
write the displacement convexity inequality
    H_{N,ν}(µ′_t) ≤ (1 − t) H^{β_{1−t}}_{N,π′,ν}(µ′_0) + t H^{β_t}_{N,π̌′,ν}(µ′_1).
In other words,
    ∫_X (ρ′_t)^{1−1/N} dν ≥ (1 − t) ∫_{X×X} (ρ′_0(x_0))^{−1/N} β_{1−t}(x_0, x_1)^{1/N} π′(dx_0 dx_1)
        + t ∫_{X×X} (ρ′_1(x_1))^{−1/N} β_t(x_0, x_1)^{1/N} π′(dx_0 dx_1),        (30.40)
If we define
    f(γ) := (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1),
then
    ν[B_δ(y)]^{1/N} / µ_t[B_δ(y)]^{1/N} ≥ E^Π[ f(γ) | γ_t ∈ B_δ(y) ] = E[ f(γ) 1_{[γ_t∈B_δ(y)]} ] / µ_t[B_δ(y)].        (30.41)
and this coincides with f(F_t(y)) if ρ_t(y) ≠ 0. All in all, µ_t(dy)-almost surely,
    ρ_t(y)^{−1/N} ≥ f(F_t(y)).
Equivalently, Π(dγ)-almost surely,
    ρ_t(γ_t)^{−1/N} ≥ f(F_t(γ_t)) = f(γ).
Let us recapitulate: We have shown that Π(dγ)-almost surely,
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1).        (30.43)
    ρ_t(γ_t)^{−1/N} ≥ ((1 − t − ε)/(1 − ε)) ( β_{(1−t−ε)/(1−ε)}(γ_0, γ_{1−ε}) / ρ_0(γ_0) )^{1/N}
        + (t/(1 − ε)) ( β_{t/(1−ε)}(γ_0, γ_{1−ε}) / ρ_{1−ε}(γ_{1−ε}) )^{1/N}.        (30.44)
    ρ_{1−ε}(γ_{1−ε})^{−1/N} ≥ (ε/(1 − t)) ( β_{ε/(1−t)}(γ_t, γ_1) / ρ_t(γ_t) )^{1/N}
        + ((1 − t − ε)/(1 − t)) ( β_{(1−t−ε)/(1−t)}(γ_t, γ_1) / ρ_1(γ_1) )^{1/N}.        (30.45)
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1),
as desired.
Step 3: Now we wish to establish inequality (30.36) in the case
when µt is absolutely continuous, that is, we just want to drop the
assumption of compact support.
It follows from Step 2 that (X , d, ν) is a weak CD(K, N ) space, so we
now have access to Theorem 30.19 even if µ0 and µ1 are not compactly
supported; and also we can appeal to Theorem 30.5 to guarantee that
Property (ii) is verified for probability measures that are not necessar-
ily compactly supported. Then we can repeat Steps 1 and 2 without
the assumption of compact support, and in the end establish inequal-
ity (30.36) for measures that are not compactly supported.
Step 4: Now we shall consider the case when µt is not absolutely
continuous. (This is the part of the proof which has interest even in
a smooth setting.) Let (µt )s stand for the singular part of µt , and
m := (µt )s [X ] > 0.
Let E (a) and E (s) be two disjoint Borel sets in X such that the
absolutely continuous part of µt is concentrated on E (a) , and the sin-
gular part of µt is concentrated on E (s) . Obviously, Π[γt ∈ E (s) ] =
(µt )s [X ] = m, and Π[γt ∈ E (a) ] = 1 − m. Let us decompose Π into
Π = (1 − m) Π (a) + m Π (s) , where
Similarly,
    U^{β_t}_{π̌,ν}(µ_1) = (U_m)^{β_t}_{π̌^{(a)},ν}(µ_1) + m U′(∞).        (30.50)
The combination of (30.47), (30.48), (30.49) and (30.50) implies
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1).
    log (1/ρ_t(γ_t)) ≥ (1 − t) log(1/ρ_0(γ_0)) + t log(1/ρ_1(γ_1)) + (K t(1 − t)/2) d(γ_0, γ_1)².        (30.51)
At a technical level, there is a small simplification since it is not nec-
essary to treat singular measures (if µ is singular and U is not linear,
then according to Proposition 17.7(ii) U ′ (∞) = +∞, so Uν (µ) = +∞).
On the other hand, there is a serious complication: The proof of Step 1
breaks down since the measure ν is not a priori locally doubling, and
Lebesgue’s density theorem does not apply!
It seems a bit of a miracle that the method of proof can still be
saved, as I shall now explain. First assume that ρ0 and ρ1 satisfy the
same assumptions as in Step 1 above, but that in addition they are
upper semicontinuous. As in Step 1, define
    f(γ) = (1 − t) log( β_{1−t}(γ_0, γ_1) / ρ_0(γ_0) ) + t log( β_t(γ_0, γ_1) / ρ_1(γ_1) )
         = (1 − t) log(1/ρ_0(γ_0)) + t log(1/ρ_1(γ_1)) + (K t(1 − t)/2) d(γ_0, γ_1)².
The family of balls {Br (z); z ∈ Bδ/2 (y); r ≤ δ/2} generates the Borel
σ-algebra of Bδ/2 (y), so (30.52) holds true for any measurable set S ⊂
B_{δ/2}(y) instead of B_r(z). Then we can pass to densities:
    ρ_t(z) ≤ exp( − inf_{x∈B_δ(y)} f(F_t(x)) )    almost surely in B_{δ/2}(y).
Locality
Locality is one of the most fundamental properties that one may expect
from any notion of curvature. In the setting of weak CD(K, N ) spaces,
the locality problem may be loosely formulated as follows: If (X , d, ν) is
weakly CD(K, N ) in the neighborhood of any of its points, then (X , d, ν)
should be a weakly CD(K, N ) space.
So far it is not known whether this “local-to-global” property holds
in general. However, it is true at least in a nonbranching space, if K = 0
and N < ∞. The validity of a more general statement may depend on
the following:
    f((1 − λ) t + λ t′) ≥ (1 − λ) f(t) ( sin((1 − λ) α|t − t′|) / ((1 − λ) sin(α|t − t′|)) )^θ
        + λ f(t′) ( sin(λ α|t − t′|) / (λ sin(α|t − t′|)) )^θ        (30.56)
Proof of Theorem 30.37. If we can treat the case N > 1, then the case
N = 1 will follow by letting N go to 1 (as in the proof of Theo-
rem 29.24). So let us assume 1 < N < ∞. In the sequel, I shall use the shorthand β_t = β_t^{(K,N)}.
Let (X , d, ν) be a nonbranching local weak CD(K, N ) space. By re-
peating the proof of Theorem 30.32, we can show that for any x0 ∈ X
there is r = r(x0 ) > 0 such that (30.58) holds true along any displace-
ment interpolation (µt )0≤t≤1 which is supported in B(x0 , r). Moreover,
if Π is a dynamical optimal transference plan such that (et )# Π = µt ,
and each measure µt is absolutely continuous with density ρt , then
Π(dγ)-almost all geodesics will satisfy inequality (30.43), which I re-
cast below:
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1).        (30.59)
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1).        (30.60)
The plan is to cut Π into very small pieces, each of which will
be included in a sufficiently small ball that the local weak CD(K, N )
criterion can be used. I shall first proceed to construct these small
pieces.
Cover the closed ball B[z, R] by a finite number of balls B(xj , rj /3)
with rj = r(xj ), and let r := inf(rj /3). For any y ∈ B[z, R], the ball
B(y, r) lies inside some B(xj , rj ); so if (µt )0≤t≤1 is any displacement in-
terpolation supported in some ball B(y, r), Π is an associated dynam-
ical optimal transference plan, and µ0 , µ1 are absolutely continuous,
then the density ρt of µt will satisfy the inequality
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1),        (30.61)
The sets Γℓ are disjoint. We discard the sequences ℓ such that Π[Γℓ ] = 0.
Then let Z_ℓ = Π[Γ_ℓ], and let
    Π_ℓ = 1_{Γ_ℓ} Π / Z_ℓ
be the law of γ conditioned by the event {γ ∈ Γ_ℓ}. Further, let µ_{ℓ,t} = (e_t)_# Π_ℓ, and π_ℓ = (e_0, e_1)_# Π_ℓ.
For each ℓ and k ∈ {0, . . . , m−2}, we define Πℓk to be the image of Πℓ
by the restriction map [0, 1] → [kδ, (k+2)δ]. Up to affine reparametriza-
tion of time, Πℓk is a dynamical optimal transference plan between the
measures µℓ,kδ and µℓ,(k+2)δ (Theorem 7.30(i)–(ii)).
∀ k ∈ {0, . . . , m − 2},  Π_ℓ(dγ)-almost surely,  ∀ t ∈ [0, 1],  ∀ t_0, t_1 ∈ [kδ, (k + 2)δ],
    ρ_{ℓ,(1−t)t_0+tt_1}(γ_{(1−t)t_0+tt_1})^{−1/N} ≥ (1 − t) β_{1−t}(γ_{t_0}, γ_{t_1})^{1/N} / ρ_{ℓ,t_0}(γ_{t_0}) + t β_t(γ_{t_0}, γ_{t_1})^{1/N} / ρ_{ℓ,t_1}(γ_{t_1}).        (30.62)
    ρ_{ℓ,t}(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_{ℓ,0}(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_{ℓ,1}(γ_1).        (30.63)
and
    ∀ k ∈ N,  ∀ j ∈ N (j ≤ 1/δ),    sup ρ^{(k)}_{jδ} < +∞,        (30.69)
where the supremum really is an essential supremum, and ρ^{(k)}_t is the density of µ^{(k)}_t = (e_t)_# Π^{(k)} with respect to ν.
    If we can do this, then by repeating the proof of Theorem 30.37 we shall obtain
    H_ν(µ^{(k)}_t) ≤ (1 − t) H^{β_{1−t}^{(K,∞)}}_{π^{(k)},ν}(µ^{(k)}_0) + t H^{β_t^{(K,∞)}}_{π̌^{(k)},ν}(µ^{(k)}_1).
    Z^{k_1} = Π̂^{k_1}[Γ];        Π^{k_1} = Π̂^{k_1} / Z^{k_1}.
As k_1 goes to infinity, it is clear that Z^{k_1} ↑ 1 (in particular, we may assume without loss of generality that Z^{k_1} > 0) and Z^{k_1} Π^{k_1} ↑ Π. Moreover, if ρ^{k_1}_t stands for the density of (e_t)_# Π^{k_1}, then ρ^{k_1}_δ = (Z^{k_1})^{−1} h^{k_1}_δ is bounded.
Now comes the second step: For each k_1, let (h^{k_1,k_2}_{2δ})_{k_2∈N} be a nondecreasing sequence of bounded functions converging almost surely to ρ^{k_1}_{2δ} as k_2 → ∞. Let
    Z^{k_1,k_2} = Π̂^{k_1,k_2}[Γ],        Π^{k_1,k_2} = Π̂^{k_1,k_2} / Z^{k_1,k_2},
and let ρ^{k_1,k_2}_t stand for the density of (e_t)_# Π^{k_1,k_2}. Then ρ^{k_1,k_2}_δ ≤ (Z^{k_1,k_2})^{−1} ρ^{k_1}_δ = (Z^{k_1,k_2} Z^{k_1})^{−1} h^{k_1}_δ and ρ^{k_1,k_2}_{2δ} = (Z^{k_1,k_2})^{−1} h^{k_1,k_2}_{2δ} are both bounded.
Then repeat the process: If Π^{k_1,...,k_j} has been constructed for any k_1, . . . , k_j in N, introduce a nonincreasing sequence (h^{k_1,...,k_{j+1}}_{(j+1)δ})_{k_{j+1}∈N} converging almost surely to ρ^{k_1,...,k_j}_{(j+1)δ} as k_{j+1} → ∞; define
is bounded.
    After m operations this process has constructed Π^{k_1,...,k_m} such that all densities ρ^{k_1,...,k_m}_{jδ} are bounded. The proof is completed by choosing Π^{(k)} = Π^{k,...,k}, Z^{(k)} = Z^k · Z^{k,k} · . . . · Z^{k,...,k}. ⊓⊔
In this Appendix I recall some basic facts about the use of cutoff func-
tions to reduce to compact sets. Again, the natural setting is that of
boundedly compact metric spaces, i.e. metric spaces where closed balls
are compact.
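For instance, given a base point z, one admissible choice (a standard construction, not necessarily the one used in the text) is χ_R(x) = min(1, max(0, R + 1 − d(z, x))): this function is 1-Lipschitz, identically equal to 1 on B_R(z) and identically equal to 0 outside B_{R+1]}(z).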
Bibliographical notes
Most of the material in this chapter comes from papers by Lott and
myself [577, 578, 579] and by Sturm [762, 763]. Some of the results
are new. Prior to these references, there had been an important se-
ries of papers by Cheeger and Colding [227, 228, 229, 230], with a
follow-up by Ding [303], about the structure of measured Gromov–
Hausdorff limits of sequences of Riemannian manifolds satisfying a
uniform CD(K, N ) bound. Some of the results by Cheeger and Cold-
ing can be re-interpreted in the present framework, but for many other
theorems this remains open: Examples are the generalized splitting the-
orem [227, Theorem 6.64], the theorem of mutual absolute continuity
of admissible reference measures [230, Theorem 4.17], or the theorem
of continuity of the volume in absence of collapsing [228, Theorem 5.9].
(A positive answer to the problem raised in Remark 30.16 would solve
the latter issue.)
Theorem 30.2 is taken from work by Lott and myself [577] as well
as Corollary 30.9, Theorems 30.22 and 30.23, and the first part of The-
orem 30.28. Theorem 30.7, Corollary 30.10, Theorems 30.11 and 30.17
are due to Sturm [762, 763]. Part (i) of Theorem 30.19 was proven
by Lott and myself in the case K = 0. Part (ii) follows a scheme of
proof communicated to me by Sturm. In a Euclidean context, Theo-
rem 30.20 is well-known to specialists and used in several recent works
about optimal transport; I don’t know who first made this observation.
The Poincaré inequalities appearing in Theorems 30.25 and 30.26
(in the case K = 0) are due to Lott and myself [578]. The concept of
upper gradient was put forward by Heinonen and Koskela [470] and
other authors; it played a key role in Cheeger’s construction [226] of a
differentiable structure on metric spaces satisfying a doubling condition
and a local Poincaré inequality. Independently of [578], there were sev-
eral simultaneous treatments of local Poincaré inequalities under weak
CD(K, N ) conditions, by Sturm [763] on the one hand, and von Re-
nesse [825] on the other. The proofs in all of these works have many
common points, and also common features with the proof by Cheeger
and Colding [230]. But the argument by Cheeger and Colding uses an-
other inequality called the “segment inequality” [227, Theorem 2.11],
which as far as I know has not been adapted to the context of metric-
measure spaces. In [578] on the other hand we used the concept of
“democratic condition”, as in Theorem 19.13.
All these notions (possibly coupled with a doubling condition) are
stable under the Gromov–Hausdorff limit: this was checked in [226, 510,
529] for the Poincaré inequality, in [230] for the segment inequality, and
in [578] for the democratic condition.
Theorem 30.28(ii) is due to Lott and myself [579]; it uses Propo-
sition 30.30 with L(d) = d2 /2. In this particular case (and under a
nonessential compactness assumption), a complete proof of Proposi-
tion 30.30 can be found in [579]. It is also shown there that the conclu-
sions of Proposition 22.16 all remain true if (X , d) is a finite-dimensional
Alexandrov space with curvature bounded below; this is a pointwise statement slightly different from the one which I gave here; it uses Lemma 29.7. An
alternative “Eulerian” approach to displacement convexity for singular
measures was implemented by Daneri and Savaré [271].
In Alexandrov spaces, the locality of the notion “curvature is
bounded below by κ” is called Toponogov’s theorem; in full gen-
erality it is due to Perelman [175]. A proof can be found in [174, The-
orem 10.3.1], along with bibliographical comments.
The conditional locality of CD(K, ∞) in nonbranching spaces (The-
orem 30.42) was first proven by Sturm [762, Theorem 4.17], with a
different argument than the one used here. Sturm does not make any
assumption about infinite-dimensional points, but he assumes that the
space of probability measures µ with Hν (µ) < +∞ is geodesically con-
vex. It is clear that the proof of Theorem 30.42 can be adapted and
simplified to cover this assumption. Theorem 30.37 is new as far as
I know. Example 30.41 was suggested to me by Lott.
When one restricts to λ = 1/2, Conjecture 30.34 takes a simpler
form, and at least seems to be true for all θ outside (0, 1); but of course
we are interested precisely in the range θ ∈ (0, 1). I once hoped to prove
Conjecture 30.34 by reinterpreting it as the locality of the CD(K, N )
property in 1-dimensional spaces, and classifying 1-dimensional local
weak CD(K, N ) spaces; but I did not manage to get things to work.
Natural geometric questions, related to the locality problem, are the
stability of CD(K, N ) under quotient by Lie group action and lifting
to the universal covering. I shall briefly discuss what is known about
these issues.
• About the quotient problem, there are some results. In [577, Sec-
tion 5.5], Lott and I proved that the quotient of a CD(K, N ) metric-
measure space X by a Lie group of isometries G is itself CD(K, N ),
under the assumptions that (a) X and G are compact; (b) K = 0
or N = ∞; (c) any two absolutely continuous probability measures
on X are joined by a unique displacement interpolation which is ab-
solutely continuous for all times. The definition of CD(K, ∞) which
was used in [577] is not exactly the same as in these notes, but Theo-
rem 30.32 guarantees that there is no difference if X is nonbranching.
Assumption (c) was used only to make sure that any displacement in-
terpolation between absolutely continuous probability measures would
satisfy the displacement interpolation inequalities which are charac-
teristic of CD(0, N ); but Theorem 30.32 ensures that this is the case
in nonbranching CD(0, N ) spaces, so the proof would go through if
which are often at the basis of the derivation of such sharp inequalities,
as in the recent papers of Jérôme Demange. To add to the confusion, the
mysterious structure condition (25.10) has popped out in these works;
it is natural to ask whether this condition has any interpretation in
terms of optimal transport.
• Are there interesting examples of displacement convex functionals apart from the ones that have already been explored during the past ten years — basically all of the form ∫_M U(ρ) dν + ∫_{M^k} V dµ^{⊗k}? It is frustrating that so few examples of displacement convex functionals are known, in contrast with the enormous amount of plainly convex functionals that one can construct. Open Problem 15.11 might be related to this question.
• Is there a transport-based proof of the famous Lévy–Gromov
isoperimetric inequalities (Open Problem 21.16), that would not
involve so much “hard analysis” as the currently known arguments?
Besides its intrinsic interest, such a proof could hopefully be adapted
to nonsmooth spaces such as the weak CD(K, N ) spaces studied in
Part III.
• Caffarelli’s log concave perturbation theorem (alluded to in
Chapter 2) is another riddle in the picture. The Gaussian space can
be seen as the infinite-dimensional version of the sphere, which is the
Riemannian “reference space” with positive constant (sectional) curva-
ture; and the space Rn equipped with a log concave measure is a space
of nonnegative Ricci curvature. So Caffarelli’s theorem can be restated
as follows: If the Euclidean space (Rn , d2 ) is equipped with a probability
measure ν that makes it a CD(K, ∞) space, then ν can be realized as a
1-Lipschitz push-forward of the reference Gaussian measure with cur-
vature K. This implies almost obviously that isoperimetric inequalities
in (Rn , d2 , ν) are not worse than isoperimetric inequalities in the Gaus-
sian space; so there is a strong analogy between Caffarelli’s theorem on
the one hand, and the Lévy–Gromov isoperimetric inequality on the
other hand. It is natural to ask whether there is a common framework
for both results; this does not seem obvious at all, and I have not been
able to formulate even a decent guess of what could be a geometric
generalization of Caffarelli’s theorem.
• Another important remark is that the geometric theory has been
almost exclusively developed in the case of the optimal transport with
quadratic cost function; the exponent p = 2 here is natural in the con-
text of Riemannian geometry, but working with other exponents (or
I did not include this theorem in the body of these notes, because it
appeals to some results that have not yet been adapted to a genuinely
geometric context, and which I preferred not to discuss. I shall sketch
the proof at the end of this text, but first I would like to explain why
this result is at the same time motivating, and a bit shocking:
(a) As pointed out to me by John Lott, if k · k is not Euclidean,
then the metric-measure space (Rn , k · k, λn ) cannot be realized as a
limit of smooth Riemannian manifolds with a uniform CD(0, N ) bound,
because it fails to satisfy the splitting principle. (If a nonnegatively
curved space admits a line, i.e. a geodesic parametrized by R, then the
space can be “factorized” by this geodesic.) Results by Jeff Cheeger,
Toby Colding and Detlef Gromoll say that the splitting principle holds
for CD(0, N ) manifolds and their measured Gromov–Hausdorff limits.
(b) If k · k is not the Euclidean norm, the resulting metric space
is very singular in certain respects: It is in general not an Alexandrov
space, and it can be extremely branching. For instance, if k · k is the
ℓ∞ norm, then any two distinct points are joined by an uncountable
is one of the topics that Donald Knuth has planned to address in his
long-awaited Volume 4 of The Art of Computer Programming.
Needless to say, the theory might also decide to explore new horizons
which I am unable to foresee.
    λ I_n ≤ ∇²(N²) ≤ Λ I_n
for some positive constants λ and Λ. Then the cost function c(x, y) = N(x − y)² is both strictly convex and C^{1,1}, i.e. uniformly semiconcave.
This makes it possible to apply Theorem 10.28 (recall Example 10.35)
and deduce the following theorem about the structure of optimal maps:
If µ0 and µ1 are compactly supported and absolutely continuous, then
there is a unique optimal transport, and it takes the form
Since the norm is uniformly convex, geodesic lines are just straight lines; so the displacement interpolation takes the form (T_t)_#(ρ_0 λ^n), where
    T_t(x) = x − t ∇(N²)*(−∇ψ(x)),        0 ≤ t ≤ 1.
Let θ(x) = ∇(N²)*(−∇ψ(x)). By [814, Remark 2.56], the Jacobian matrix ∇θ, although not symmetric, is pointwise diagonalizable, with eigenvalues bounded above by 1 (this remark goes back at least to a 1996 preprint by Otto [666, Proposition A.4]; a more general statement is in [30, Theorem 6.2.7]). It follows easily that t → det(I_n − t∇θ)^{1/n} is a concave function of t [814, Lemma 5.21], and one can reproduce the proof of displacement convexity for U_{λ^n}, as soon as U ∈ DC_n [814, Theorem 5.15(i)].
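The determinant fact invoked above is easy to test numerically. The sketch below (an illustration only; the random matrix M is not the Jacobian ∇θ of the proof) checks that t → det(I_n − tM)^{1/n} is concave on [0, 1] for a diagonalizable matrix M with real eigenvalues bounded above by 1.

```python
import numpy as np

# Numerical check: if M is diagonalizable with real eigenvalues <= 1, then
# f(t) = det(I - t M)^(1/n) is concave on [0, 1].  The construction of M below is
# purely illustrative.
rng = np.random.default_rng(0)
n = 5
P = rng.standard_normal((n, n))                   # generic, hence invertible
eigs = rng.uniform(-2.0, 0.9, size=n)             # real eigenvalues, kept below 1
M = P @ np.diag(eigs) @ np.linalg.inv(P)          # diagonalizable, non-symmetric

t = np.linspace(0.0, 1.0, 201)
f = np.array([np.linalg.det(np.eye(n) - s * M) ** (1.0 / n) for s in t])

second_diff = f[:-2] - 2.0 * f[1:-1] + f[2:]      # <= 0 (up to rounding) iff concave
print("max second difference:", second_diff.max())
```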
This shows that (Rn , N, λn ) satisfies the CD(0, n) displacement con-
vexity inequalities when N is a smooth uniformly convex norm. Now if
N is arbitrary, it can be approximated by a sequence (Nk )k∈N of smooth
uniformly convex norms, in such a way that (Rn , N, λn , 0) is the pointed
measured Gromov–Hausdorff limit of (Rn , Nk , λn , 0) as k → ∞. Then
the general conclusion follows by stability of the weak CD(0, n) crite-
rion (Theorem 29.24). ⊓⊔
1. Abdellaoui, T., and Heinich, H. Sur la distance de deux lois dans le cas
vectoriel. C. R. Acad. Sci. Paris Sér. I Math. 319, 4 (1994), 397–400.
2. Abdellaoui, T., and Heinich, H. Caractérisation d’une solution optimale
au problème de Monge–Kantorovitch. Bull. Soc. Math. France 127, 3 (1999),
429–443.
3. Agrachev, A., and Lee, P. Optimal transportation under nonholonomic
constraints. To appear in Trans. Amer. Math. Soc.
4. Agueh, M. Existence of solutions to degenerate parabolic equations via the
Monge–Kantorovich theory. Adv. Differential Equations 10, 3 (2005), 309–360.
5. Agueh, M., Ghoussoub, N., and Kang, X. Geometric inequalities via a
general comparison principle for interacting gases. Geom. Funct. Anal. 14, 1
(2004), 215–244.
6. Ahmad, N. The geometry of shape recognition via the Monge–Kantorovich
optimal transport problem. PhD thesis, Univ. Toronto, 2004.
7. Aida, S. Uniform positivity improving property, Sobolev inequalities, and
spectral gaps. J. Funct. Anal. 158, 1 (1998), 152–185.
8. Ajtai, M., Komlós, J., and Tusnády, G. On optimal matchings. Combi-
natorica 4, 4 (1984), 259–264.
9. Alberti, G. On the structure of singular sets of convex functions. Calc. Var.
Partial Differential Equations 2, 1 (1994), 17–27.
10. Alberti, G. Some remarks about a notion of rearrangement. Ann. Scuola
Norm. Sup. Pisa Cl. Sci (4) 29, 2 (2000), 457–472.
11. Alberti, G., and Ambrosio, L. A geometrical approach to monotone func-
tions in Rn . Math. Z. 230, 2 (1999), 259–316.
12. Alberti, G., Ambrosio, L., and Cannarsa, P. On the singularities of
convex functions. Manuscripta Math. 76, 3-4 (1992), 421–435.
13. Alberti, G., and Bellettini, G. A nonlocal anisotropic model for phase
transitions. I: The optimal profile problem. Math. Ann. 310, 3 (1998), 527–560.
14. Albeverio, S., and Cruzeiro, A. B. Global flows with invariant (Gibbs)
measures for Euler and Navier–Stokes two-dimensional fluids. Comm. Math.
Phys. 129, 3 (1990), 431–444.
15. Alesker, S., Dar, S., and Milman, V. D. A remarkable measure preserving
diffeomorphism between two convex bodies in Rn . Geom. Dedicata 74, 2 (1999),
201–212.
37. Ambrosio, L., and Tilli, P. Topics on analysis in metric spaces, vol. 25 of
Oxford Lecture Series in Mathematics and its Applications. Oxford University
Press, Oxford, 2004.
38. Andres, S., and von Renesse, M.-K. Particle approximation of the Wasser-
stein diffusion. Preprint, 2007.
39. Andreu, F., Caselles, V., and Mazón, J. M. The Cauchy problem for
a strongly degenerate quasilinear equation. J. Eur. Math. Soc. (JEMS) 7, 3
(2005), 361–393.
40. Andreu, F., Caselles, V., Mazón, J., and Moll, S. Finite propagation
speed for limited flux diffusion equations. Arch. Rational Mech. Anal. 182, 2
(2006), 269–297.
41. Ané, C., Blachère, S., Chafaı̈, D., Fougères, P., Gentil, I., Malrieu,
F., Roberto, C., and Scheffer, G. Sur les inégalités de Sobolev logarith-
miques, vol. 10 of Panoramas et Synthèses. Société Mathématique de France,
2000.
42. Appell, P. Mémoire sur les déblais et les remblais des systèmes continus
ou discontinus. Mémoires présentés par divers Savants à l’Académie des Sci-
ences de l’Institut de France, Paris No. 29 (1887), 1–208. Available online at
gallica.bnf.fr.
43. Arnold, A., Markowich, P., Toscani, G., and Unterreiter, A. On
logarithmic Sobolev inequalities and the rate of convergence to equilibrium for
Fokker–Planck type equations. Comm. Partial Differential Equations 26, 1–2
(2001), 43–100.
44. Arnol′ d, V. I. Mathematical methods of classical mechanics, vol. 60 of Grad-
uate Texts in Mathematics. Springer-Verlag, New York. Translated from the
1974 Russian original by K. Vogtmann and A. Weinstein. Corrected reprint of
the second (1989) edition.
45. Aronson, D. G., and Bénilan, Ph. Régularité des solutions de l’équation
des milieux poreux dans RN . C. R. Acad. Sci. Paris Sér. A-B 288, 2 (1979),
A103–A105.
46. Aronsson, G. A mathematical model in sand mechanics: Presentation and
analysis. SIAM J. Appl. Math. 22 (1972), 437–458.
47. Aronsson, G., and Evans, L. C. An asymptotic model for compression
molding. Indiana Univ. Math. J. 51, 1 (2002), 1–36.
48. Aronsson, G., Evans, L. C., and Wu, Y. Fast/slow diffusion and growing
sandpiles. J. Differential Equations 131, 2 (1996), 304–335.
49. Attouch, L., and Soubeyran, A. From procedural rationality to routines:
A “Worthwile to Move” approach of satisficing with not too much sacrificing.
Preprint, 2005. Available online at
www.gate.cnrs.fr/seminaires/2006 2007.
50. Aubry, P. Monge, le savant ami de Napoléon Bonaparte, 1746–1818.
Gauthier-Villars, Paris, 1954.
51. Aubry, S. The twist map, the extended Frenkel–Kontorova model and the
devil’s staircase. In Order in chaos (Los Alamos, 1982), Phys. D 7, 1–3 (1983),
240–258.
52. Aubry, S., and Le Daeron, P. Y. The discrete Frenkel–Kontorova model and its extensions, I. Phys. D 8, 3 (1983), 381–422.
53. Bakelman, I. J. Convex analysis and nonlinear geometric elliptic equations.
Edited by S. D. Taliaferro. Springer-Verlag, Berlin, 1994.
74. Barthe, F., Bakry, D., Cattiaux, P., and Guillin, A. Poincaré inequal-
ities for logconcave probability measures: a Lyapunov function approach. To
appear in Electron. Comm. Probab.
75. Barthe, F., Cattiaux, P., and Roberto, C. Interpolated inequalities be-
tween exponential and Gaussian, Orlicz hypercontractivity and application to
isoperimetry. Rev. Mat. Iberoamericana 22, 3 (2006), 993–1067.
76. Barthe, F., and Kolesnikov, A. Mass transport and variants of the loga-
rithmic Sobolev inequality. Preprint, 2008.
77. Barthe, F., and Roberto, C. Sobolev inequalities for probability measures
on the real line. Studia Math. 159, 3 (2003), 481–497.
78. Beckner, W. A generalized Poincaré inequality for Gaussian measures. Proc.
Amer. Math. Soc. 105, 2 (1989), 397–400.
79. Beiglböck, M., Goldstern, M., Maresch, G., and Schachermayer, W.
Optimal and better transport plans. Preprint, 2008. Available online at
www.fam.tuwien.ac.at/~wschach/pubs.
80. Bell, E. T. Men of Mathematics. Simon and Schuster, 1937.
81. Benachour, S., Roynette, B., Talay, D., and Vallois, P. Nonlinear self-
stabilizing processes. I. Existence, invariant probability, propagation of chaos.
Stochastic Process. Appl. 75, 2 (1998), 173–201.
82. Benachour, S., Roynette, B., and Vallois, P. Nonlinear self-stabilizing
processes. II. Convergence to invariant probability. Stochastic Process. Appl.
75, 2 (1998), 203–224.
83. Benaı̈m, M., Ledoux, M., and Raimond, O. Self-interacting diffusions.
Probab. Theory Related Fields 122, 1 (2002), 1–41.
84. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. II. Convergence in
law. Ann. Inst. H. Poincaré Probab. Statist. 39, 6 (2003), 1043–1055.
85. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. III. Symmetric
interactions. Ann. Probab. 33, 5 (2005), 1717–1759.
86. Benamou, J.-D. Transformations conservant la mesure, mécanique des flu-
ides incompressibles et modèle semi-géostrophique en météorologie. Mémoire
présenté en vue de l’Habilitation à Diriger des Recherches. PhD thesis, Univ.
Paris-Dauphine, 1992.
87. Benamou, J.-D., and Brenier, Y. Weak existence for the semigeostrophic
equations formulated as a coupled Monge–Ampère/transport problem. SIAM
J. Appl. Math. 58, 5 (1998), 1450–1461.
88. Benamou, J.-D., and Brenier, Y. A numerical method for the optimal
time-continuous mass transport problem and related problems. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997). Amer. Math. Soc., Providence, RI, 1999, pp. 1–11.
89. Benamou, J.-D., and Brenier, Y. A computational fluid mechanics solution
to the Monge–Kantorovich mass transfer problem. Numer. Math. 84, 3 (2000),
375–393.
90. Benamou, J.-D., and Brenier, Y. Mixed L2 -Wasserstein optimal mapping
between prescribed density functions. J. Optim. Theory Appl. 111, 2 (2001),
255–271.
91. Benamou, J.-D., Brenier, Y., and Guittet, K. The Monge–Kantorovitch
mass transfer and its computational fluid mechanics formulation. ICFD Con-
ference on Numerical Methods for Fluid Dynamics (Oxford, 2001). Internat.
J. Numer. Methods Fluids 40, 1–2 (2002), 21–30.
212. Carrillo, J. A., Gualdani, M. P., and Toscani, G. Finite speed of prop-
agation in porous media by mass transportation methods. C. R. Math. Acad.
Sci. Paris 338, 10 (2004), 815–818.
213. Carrillo, J. A., McCann, R. J., and Villani, C. Kinetic equilibration
rates for granular media and related equations: entropy dissipation and mass
transportation estimates. Rev. Mat. Iberoamericana 19, 3 (2003), 971–1018.
214. Carrillo, J. A., McCann, R. J., and Villani, C. Contractions in the 2-
Wasserstein length space and thermalization of granular media. Arch. Ration.
Mech. Anal. 179, 2 (2006), 217–263.
215. Carrillo, J. A., and Toscani, G. Asymptotic L1 -decay of solutions of the
porous medium equation to self-similarity. Indiana Univ. Math. J. 49, 1 (2000),
113–142.
216. Carrillo, J. A., and Toscani, G. Contractive probability metrics and
asymptotic behavior of dissipative kinetic equations. Proceedings of the 2006
Porto Ercole Summer School. Riv. Mat. Univ. Parma 6 (2007), 75–198.
217. Carrillo, J. A., and Vázquez, J. L. Fine asymptotics for fast diffusion
equations. Comm. Partial Differential Equations 28, 5–6 (2003), 1023–1056.
218. Carrillo, J. A., and Vázquez, J. L. Asymptotic complexity in filtration
equations. J. Evol. Equ. 7, 3 (2007), 471–495.
219. Cattiaux, P., and Guillin, A. On quadratic transportation cost inequalities.
J. Math. Pures Appl. (9) 86, 4 (2006), 341–361.
220. Cattiaux, P., and Guillin, A. Trends to equilibrium in total variation
distance. To appear in Ann. Inst. H. Poincaré Probab. Statist.
221. Cattiaux, P., Guillin, A., and Malrieu, F. Probabilistic approach for
granular media equations in the nonuniformly convex case. Probab. Theory
Related Fields 140, 1–2 (2008), 19–40.
222. Champion, Th., De Pascale, L., and Juutinen, P. The ∞-Wasserstein
distance: local solutions and existence of optimal transport maps. Preprint,
2007.
223. Chavel, I. Riemannian geometry — a modern introduction, vol. 108 of Cam-
bridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1993.
224. Chazal, F., Cohen-Steiner, D., and Mérigot, Q. Stability of boundary
measures. INRIA Report, 2007. Available online at
www-sop.inria.fr/geometrica/team/Quentin.Merigot/
225. Cheeger, J. A lower bound for the smallest eigenvalue of the Laplacian. In
Problems in analysis (Papers dedicated to Salomon Bochner, 1969), pp. 195–
199. Princeton Univ. Press, Princeton, N. J., 1970.
226. Cheeger, J. Differentiability of Lipschitz functions on metric measure spaces.
Geom. Funct. Anal. 9, 3 (1999), 428–517.
227. Cheeger, J., and Colding, T. H. Lower bounds on Ricci curvature and the
almost rigidity of warped products. Ann. of Math. (2) 144, 1 (1996), 189–237.
228. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. I. J. Differential Geom. 46, 3 (1997), 406–480.
229. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. II. J. Differential Geom. 54, 1 (2000), 13–35.
230. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. III. J. Differential Geom. 54, 1 (2000), 37–74.
231. Chen, M.-F. Trilogy of couplings and general formulas for lower bound of
spectral gap. Probability towards 2000 (New York, 1995), 123–136, Lecture
Notes in Statist. 128, Springer, New York, 1998.
232. Chen, M.-F., and Wang, F.-Y. Application of coupling method to the first
eigenvalue on manifold. Science in China (A) 37, 1 (1994), 1–14.
233. Christensen, J. P. R. Measure theoretic zero sets in infinite dimensional
spaces and applications to differentiability of Lipschitz mappings. Publ. Dép.
Math. (Lyon) 10, 2 (1973), 29–39. Actes du Deuxième Colloque d’Analyse
Fonctionnelle de Bordeaux (Univ. Bordeaux, 1973), I, pp. 29–39.
234. Cianchi, A., Fusco, N., Maggi, F., and Pratelli, A. The sharp Sobolev
inequality in quantitative form. Preprint, 2007.
235. Clarke, F. H. Methods of dynamic and nonsmooth optimization, vol. 57 of
CBMS-NSF Regional Conference Series in Applied Mathematics. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1989.
236. Clarke, F. H., and Vinter, R. B. Regularity properties of solutions in the
basic problem of calculus of variations. Trans. Amer. Math. Soc. 289 (1985),
73–98.
237. Clement, Ph., and Desch, W. An elementary proof of the triangle inequality
for the Wasserstein metric. Proc. Amer. Math. Soc. 136, 1 (2008), 333–339.
238. Contreras, G., and Iturriaga, R. Global minimizers of autonomous La-
grangians. 22º Colóquio Brasileiro de Matemática. Instituto de Matemática
Pura e Aplicada (IMPA), Rio de Janeiro, 1999.
239. Contreras, G., Iturriaga, R., Paternain, G. P., and Paternain, M.
The Palais–Smale condition and Mañé’s critical values. Ann. Henri Poincaré
1, 4 (2000), 655–684.
240. Cordero-Erausquin, D. Sur le transport de mesures périodiques. C. R.
Acad. Sci. Paris Sér. I Math. 329, 3 (1999), 199–202.
241. Cordero-Erausquin, D. Inégalité de Prékopa–Leindler sur la sphère. C. R.
Acad. Sci. Paris Sér. I Math. 329, 9 (1999), 789–792.
242. Cordero-Erausquin, D. Some applications of mass transport to Gaussian-
type inequalities. Arch. Ration. Mech. Anal. 161, 3 (2002), 257–269.
243. Cordero-Erausquin, D. Non-smooth differential properties of optimal trans-
port. In Recent advances in the theory and applications of mass transport,
vol. 353 of Contemp. Math., Amer. Math. Soc., Providence, 2004, pp. 61–71.
244. Cordero-Erausquin, D. Quelques exemples d’application du transport de
mesure en géométrie euclidienne et riemannienne. In Séminaire de Théorie
Spectrale et Géométrie. Vol. 22, Année 2003–2004, pp. 125–152.
245. Cordero-Erausquin, D., Gangbo, W., and Houdré, Ch. Inequalities for
generalized entropy and optimal transportation. In Recent advances in the
theory and applications of mass transport, vol. 353 of Contemp. Math., Amer.
Math. Soc., Providence, RI, 2004, pp. 73–94.
246. Cordero-Erausquin, D., McCann, R. J., and Schmuckenschläger, M.
A Riemannian interpolation inequality à la Borell, Brascamp and Lieb. Invent.
Math. 146, 2 (2001), 219–257.
247. Cordero-Erausquin, D., McCann, R. J., and Schmuckenschläger, M.
Prékopa–Leindler type inequalities on Riemannian manifolds, Jacobi fields, and
optimal transport. Ann. Fac. Sci. Toulouse Math. (6) 15, 4 (2006), 613–635.
248. Cordero-Erausquin, D., Nazaret, B., and Villani, C. A mass-transpor-
tation approach to sharp Sobolev and Gagliardo–Nirenberg inequalities. Adv.
Math. 182, 2 (2004), 307–332.
249. Coulhon, Th., and Saloff-Coste, L. Isopérimétrie pour les groupes et les
variétés. Rev. Mat. Iberoamericana 9, 2 (1993), 293–314.
392. Gallay, T., and Wayne, C. E. Global stability of vortex solutions of the
two-dimensional Navier–Stokes equation. Comm. Math. Phys. 255, 1 (2005),
97–129.
393. Gallot, S. Isoperimetric inequalities based on integral norms of Ricci curva-
ture. Astérisque, 157–158 (1988), 191–216.
394. Gallot, S., Hulin, D., and Lafontaine, J. Riemannian geometry, sec-
ond ed. Universitext. Springer-Verlag, Berlin, 1990.
395. Gangbo, W. An elementary proof of the polar factorization of vector-valued
functions. Arch. Rational Mech. Anal. 128, 4 (1994), 381–399.
396. Gangbo, W. The Monge mass transfer problem and its applications. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc., Providence, RI,
1999, pp. 79–104.
397. Gangbo, W. Review on the book Gradient flows in metric spaces and in the
space of probability measures by Ambrosio, Gigli and Savaré, 2006. Available
online at www.math.gatech.edu/~gangbo.
398. Gangbo, W., and McCann, R. J. Optimal maps in Monge’s mass transport
problem. C. R. Acad. Sci. Paris Sér. I Math. 321, 12 (1995), 1653–1658.
399. Gangbo, W., and McCann, R. J. The geometry of optimal transportation.
Acta Math. 177, 2 (1996), 113–161.
400. Gangbo, W., and McCann, R. J. Shape recognition via Wasserstein dis-
tance. Quart. Appl. Math. 58, 4 (2000), 705–737.
401. Gangbo, W., Nguyen, T., and Tudorascu, A. Euler–Poisson systems as
action-minimizing paths in the Wasserstein space. Preprint, 2006.
Available online at www.math.gatech.edu/~gangbo/publications.
402. Gangbo, W., and Oliker, V. I. Existence of optimal maps in the reflector-
type problems. ESAIM Control Optim. Calc. Var. 13, 1 (2007), 93–106.
403. Gangbo, W., and Świȩch, A. Optimal maps for the multidimensional
Monge–Kantorovich problem. Comm. Pure Appl. Math. 51, 1 (1998), 23–45.
404. Gangbo, W., and Westdickenberg, M. Optimal transport for the system
of isentropic Euler equations. Work in progress.
405. Gao, F., and Wu, L. Transportation-Information inequalities for Gibbs mea-
sures. Preprint, 2007.
406. Gardner, R. The Brunn–Minkowski inequality. Bull. Amer. Math. Soc.
(N.S.) 39, 3 (2002), 355–405.
407. Gelbrich, M. On a formula for the L2 Wasserstein metric between measures
on Euclidean and Hilbert spaces. Math. Nachr. 147 (1990), 185–203.
408. Gentil, I. Inégalités de Sobolev logarithmiques et hypercontractivité en
mécanique statistique et en E.D.P. PhD thesis, Univ. Paul-Sabatier (Toulouse),
2001. Available online at
www.ceremade.dauphine.fr/~gentil/maths.html.
409. Gentil, I. Ultracontractive bounds on Hamilton–Jacobi solutions. Bull. Sci.
Math. 126, 6 (2002), 507–524.
410. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities and transportation inequalities. Probab. Theory Related Fields 133,
3 (2005), 409–436.
411. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities in null curvature. Rev. Mat. Iberoamericana 23, 1 (2007), 237–260.
412. Gentil, I., and Malrieu, F. Équations de Hamilton–Jacobi et inégalités
entropiques généralisées. C. R. Acad. Sci. Paris 335 (2002), 437–440.
473. Hiai, F., Ohya, M., and Tsukada, M. Sufficiency and relative entropy in ∗-
algebras with applications in quantum systems. Pacific J. Math. 107, 1 (1983),
117–140.
474. Hiai, F., Petz, D., and Ueda, Y. Free transportation cost inequalities via
random matrix approximation. Probab. Theory Related Fields 130, 2 (2004),
199–221.
475. Hiai, F., and Ueda, Y. Free transportation cost inequalities for noncommu-
tative multi-variables. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 9, 3
(2006), 391–412.
476. Hohloch, S. Optimale Massebewegung im Monge–Kantorovich-Transport-
problem. Diploma thesis, Freiburg Univ., 2002.
477. Holley, R. Remarks on the FKG inequalities. Comm. Math. Phys. 36 (1974),
227–231.
478. Holley, R., and Stroock, D. W. Logarithmic Sobolev inequalities and
stochastic Ising models. J. Statist. Phys. 46, 5–6 (1987), 1159–1194.
479. Horowitz, J., and Karandikar, R. L. Mean rates of convergence of empir-
ical measures in the Wasserstein metric. J. Comput. Appl. Math. 55, 3 (1994),
261–273. (See also the reviewer’s note on MathSciNet.)
480. Hoskins, B. J. Atmospheric frontogenesis models: some solutions. Q.J.R.
Met. Soc. 97 (1971), 139–153.
481. Hoskins, B. J. The geostrophic momentum approximation and the semi-
geostrophic equations. J. Atmosph. Sciences 32 (1975), 233–242.
482. Hoskins, B. J. The mathematical theory of frontogenesis. Ann. Rev. of Fluid
Mech. 14 (1982), 131–151.
483. Hsu, E. P. Stochastic analysis on manifolds, vol. 38 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI, 2002.
484. Hsu, E. P., and Sturm, K.-T. Maximal coupling of Euclidean Brownian
motions. FB-Preprint No. 85, Bonn. Available online at
www-wt.iam.uni-bonn.de/~sturm/de/index.html.
485. Huang, C., and Jordan, R. Variational formulations for Vlasov–Poisson–
Fokker–Planck systems. Math. Methods Appl. Sci. 23, 9 (2000), 803–843.
486. Ichihara, K. Curvature, geodesics and the Brownian motion on a Riemannian
manifold. I. Recurrence properties, II. Explosion properties. Nagoya Math. J.
87 (1982), 101–114; 115–125.
487. Ishii, H. Asymptotic solutions for large time of Hamilton–Jacobi equations.
International Congress of Mathematicians, Vol. III, Eur. Math. Soc., Zürich,
2006, pp. 213–227. Short presentation available online at
www.edu.waseda.ac.jp/~ishii.
488. Ishii, H. Unpublished lecture notes on the weak KAM theorem, 2004. Available
online at www.edu.waseda.ac.jp/~ishii.
489. Itoh, J.-i., and Tanaka, M. The Lipschitz continuity of the distance function
to the cut locus. Trans. Amer. Math. Soc. 353, 1 (2001), 21–40.
490. Jian, H.-Y., and Wang, X.-J. Continuity estimates for the Monge–Ampère
equation. Preprint, 2006.
491. Jimenez, Ch. Dynamic formulation of optimal transport problems. To appear
in J. Convex Anal.
492. Jones, P., Maggioni, M., and Schul, R. Universal local parametrizations
via heat kernels and eigenfunctions of the Laplacian. Preprint, 2008.
493. Jordan, R., Kinderlehrer, D., and Otto, F. The variational formulation
of the Fokker–Planck equation. SIAM J. Math. Anal. 29, 1 (1998), 1–17.
515. Khesin, B., and Misiolek, G. Shock waves for the Burgers equation and
curvatures of diffeomorphism groups. Proc. Steklov Inst. Math. 250 (2007),
1–9.
516. Kim, S. Harnack inequality for nondivergent elliptic operators on Riemannian
manifolds. Pacific J. Math. 213, 2 (2004), 281–293.
517. Kim, Y. J., and McCann, R. J. Sharp decay rates for the fastest conservative
diffusions. C. R. Math. Acad. Sci. Paris 341, 3 (2005), 157–162.
518. Kim, Y.-H. Counterexamples to continuity of optimal transportation on pos-
itively curved Riemannian manifolds. Preprint, 2007.
519. Kim, Y.-H., and McCann, R. J. On the cost-subdifferentials of cost-convex
functions. Preprint, 2007. Archived online at arxiv.org/abs/0706.1266.
520. Kim, Y.-H., and McCann, R. J. Continuity, curvature, and the general
covariance of optimal transportation. Preprint, 2007.
521. Kleptsyn, V., and Kurtzmann, A. Ergodicity of self-attracting Brownian
motion. Preprint, 2008.
522. Kloeckner, B. A geometric study of the Wasserstein space of the line.
Preprint, 2008.
523. Knothe, H. Contributions to the theory of convex bodies. Michigan Math. J.
4 (1957), 39–52.
524. Knott, M., and Smith, C. S. On the optimal mapping of distributions.
J. Optim. Theory Appl. 43, 1 (1984), 39–49.
525. Knott, M., and Smith, C. S. On a generalization of cyclic monotonicity and
distances among random vectors. Linear Algebra Appl. 199 (1994), 363–371.
526. Kolesnikov, A. V. Convexity inequalities and optimal transport of infinite-
dimensional measures. J. Math. Pures Appl. (9) 83, 11 (2004), 1373–1404.
527. Kontorovich, L. A linear programming inequality with applications to con-
centration of measures. Preprint, 2006. Archived online at
arxiv.org/abs/math.FA/0610712.
528. Kontsevich, M., and Soibelman, Y. Homological mirror symmetry and
torus fibrations. In Symplectic geometry and mirror symmetry (Seoul, 2000).
World Sci. Publ., River Edge, NJ, 2001, pp. 203–263.
529. Koskela, P. Upper gradients and Poincaré inequality on metric measure
spaces. In Lecture notes on Analysis in metric spaces (Trento, 1999), Appunti
Corsi Tenuti Docenti Sc., Scuola Norm. Sup., Pisa, 2000, pp. 55–69.
530. Krylov, N. V. Boundedly nonhomogeneous elliptic and parabolic equations.
Izv. Akad. Nauk SSSR, Ser. Mat. 46, 3 (1982), 487–523. English translation in
Math. USSR Izv. 20, 3 (1983), 459–492.
531. Krylov, N. V. Fully nonlinear second order elliptic equations: recent devel-
opment. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 25, 3–4 (1998), 569–595.
532. Krylov, N. V., and Safonov, M. V. An estimate of the probability that a
diffusion process hits a set of positive measure. Dokl. Acad. Nauk SSSR 245,
1 (1979), 18–20.
533. Kuksin, S., Piatnitski, A., and Shirikyan, A. A coupling approach to
randomly forced nonlinear PDE’s, II. Comm. Math. Phys. 230, 1 (2002), 81–
85.
534. Kullback, S. A lower bound for discrimination information in terms of vari-
ation. IEEE Trans. Inform. Theory 4 (1967), 126–127.
535. Kurtzmann, A. The ODE method for some self-interacting diffusions on Rd.
Preprint, 2008.
556. Liese, F., and Vajda, I. Convex statistical distances, vol. 95 of Teubner-Texte
zur Mathematik. BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1987. With
German, French and Russian summaries.
557. Lions, J.-L. Quelques méthodes de résolution des problèmes aux limites non
linéaires. Dunod, 1969.
558. Lions, P.-L. Generalized solutions of Hamilton–Jacobi equations. Pitman
(Advanced Publishing Program), Boston, Mass., 1982.
559. Lions, P.-L. Personal communication, 2003.
560. Lions, P.-L., and Trudinger, N. S. Linear oblique derivative problems for
the uniformly elliptic Hamilton–Jacobi–Bellman equation. Math. Z. 191, 1
(1986), 1–15.
561. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem
for equations of Monge–Ampère type. Comm. Pure Appl. Math. 39, 4 (1986),
539–563.
562. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem for
equations of Monge–Ampère type. Proceedings of the Centre for Mathematical
Analysis, Australian National University, 10, Canberra, 1986, pp. 135–140.
563. Lions, P.-L., and Lasry, J.-M. Régularité optimale de racines carrées. C. R.
Acad. Sci. Paris Sér. I Math. 343, 10 (2006), 679–684.
564. Lions, P.-L., Papanicolaou, G., and Varadhan, S. R. S. Homogenization
of Hamilton–Jacobi equations. Unpublished preprint, 1987.
565. Lisini, S. Characterization of absolutely continuous curves in Wasserstein
spaces. Calc. Var. Partial Differential Equations 28, 1 (2007), 85–120.
566. Liu, J. Hölder regularity of optimal mapping in optimal transportation. To
appear in Calc. Var. Partial Differential Equations.
567. Loeper, G. Quasi-neutral limit of the Euler–Poisson and Euler–Monge–Ampère
systems. Comm. Partial Differential Equations 30, 7–9 (2005), 1141–1167.
568. Loeper, G. The reconstruction problem for the Euler-Poisson system in cos-
mology. Arch. Ration. Mech. Anal. 179, 2 (2006), 153–216.
569. Loeper, G. A fully nonlinear version of Euler incompressible equations: the
Semi-Geostrophic system. SIAM J. Math. Anal. 38, 3 (2006), 795–823.
570. Loeper, G. On the regularity of maps solutions of optimal transportation
problems. To appear in Acta Math.
571. Loeper, G. On the regularity of maps solutions of optimal transportation
problems II. Work in progress, 2007.
572. Loeper, G., and Villani, C. Regularity of optimal transport in curved
geometry: the nonfocal case. Preprint, 2007.
573. Lott, J. Optimal transport and Ricci curvature for metric-measure spaces.
To appear in Surveys in Differential Geometry.
574. Lott, J. Some geometric properties of the Bakry–Émery–Ricci tensor. Com-
ment. Math. Helv. 78, 4 (2003), 865–883.
575. Lott, J. Some geometric calculations on Wasserstein space. To appear in
Comm. Math. Phys. Available online at www.math.lsa.umich.edu/~lott.
576. Lott, J. Optimal transport and Perelman’s reduced volume. Preprint, 2008.
577. Lott, J., and Villani, C. Ricci curvature for metric-measure spaces via
optimal transport. To appear in Ann. of Math. (2) Available online at
www.umpa.ens-lyon.fr/~cvillani.
578. Lott, J., and Villani, C. Weak curvature bounds and functional inequalities.
J. Funct. Anal. 245, 1 (2007), 311–333.
579. Lott, J., and Villani, C. Hamilton–Jacobi semigroup on length spaces and
applications. J. Math. Pures Appl. (9) 88, 3 (2007), 219–229.
580. Lu, P., Ni, L., Vázquez, J.-L., and Villani, C. Local Aronson–Bénilan
estimates and entropy formulae for porous medium and fast diffusion equations
on manifolds. Preprint, 2008.
581. Lusternik, L. A. Die Brunn–Minkowskische Ungleichung für beliebige mess-
bare Mengen. Dokl. Acad. Sci. URSS, 8 (1935), 55–58.
582. Lutwak, E., Yang, D., and Zhang, G. Optimal Sobolev norms and the Lp
Minkowski problem. Int. Math. Res. Not. (2006), Art. ID 62987, 21.
583. Lytchak, A. Differentiation in metric spaces. Algebra i Analiz 16, 6 (2004),
128–161.
584. Lytchak, A. Open map theorem for metric spaces. Algebra i Analiz 17, 3
(2005), 139–159.
585. Ma, X.-N., Trudinger, N. S., and Wang, X.-J. Regularity of potential
functions of the optimal transportation problem. Arch. Ration. Mech. Anal.
177, 2 (2005), 151–183.
586. Maggi, F. Some methods for studying stability in isoperimetric type problems.
To appear in Bull. Amer. Math. Soc.
587. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
J. Geom. Anal. 15, 1 (2005), 83–121.
588. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
Part II: Variants and extensions. Calc. Var. Partial Differential Equations 31,
1 (2008), 47–74.
589. Mallows, C. L. A note on asymptotic joint normality. Ann. Math. Statist.
43 (1972), 508–515.
590. Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDE’s.
Stochastic Process. Appl. 95, 1 (2001), 109–132.
591. Malrieu, F. Convergence to equilibrium for granular media equations and
their Euler schemes. Ann. Appl. Probab. 13, 2 (2003), 540–560.
592. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. In
International Conference on Dynamical Systems (Montevideo, 1995), vol. 362
of Pitman Res. Notes Math. Ser. Longman, Harlow, 1996, pp. 120–131.
593. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. Bol.
Soc. Brasil. Mat. (N.S.) 28, 2 (1997), 141–153.
594. Maroofi, H. Applications of the Monge–Kantorovich theory. PhD thesis,
Georgia Tech, 2002.
595. Marton, K. A measure concentration inequality for contracting Markov
chains. Geom. Funct. Anal. 6 (1996), 556–571.
596. Marton, K. Measure concentration for Euclidean distance in the case of
dependent random variables. Ann. Probab. 32, 3B (2004), 2526–2544.
597. Massart, P. Concentration inequalities and model selection. Lecture Notes
from the 2003 Saint-Flour Probability Summer School. To appear in the
Springer book series Lecture Notes in Math. Available online at
www.math.u-psud.fr/~massart.
598. Mather, J. N. Existence of quasiperiodic orbits for twist homeomorphisms
of the annulus. Topology 21, 4 (1982), 457–467.
599. Mather, J. N. More Denjoy minimal sets for area preserving diffeomorphisms.
Comment. Math. Helv. 60 (1985), 508–557.
600. Mather, J. N. Minimal measures. Comment. Math. Helv. 64, 3 (1989), 375–
394.
645. Namah, G., and Roquejoffre, J.-M. Remarks on the long time behaviour
of the solutions of Hamilton–Jacobi equations. Comm. Partial Differential
Equations 24, 5–6 (1999), 883–893.
646. Nash, J. Continuity of solutions of parabolic and elliptic equations. Amer. J.
Math. 80 (1958), 931–954.
647. Nelson, E. Derivation of the Schrödinger equation from Newtonian mechanics.
Phys. Rev. 150 (1966), 1079–1085.
648. Nelson, E. Dynamical theories of Brownian motion. Princeton University
Press, Princeton, N.J., 1967. 2001 re-edition by J. Suzuki. Available online at
www.math.princeton.edu/~nelson/books.html.
649. Nelson, E. The free Markoff field. J. Functional Analysis 12 (1973), 211–227.
650. Nelson, E. Critical diffusions. In Séminaire de probabilités, XIX, 1983/84,
vol. 1123 of Lecture Notes in Math. Springer, Berlin, 1985, pp. 1–11.
651. Nelson, E. Quantum fluctuations. Princeton Series in Physics. Princeton
University Press, Princeton, NJ, 1985.
652. Nelson, E. Stochastic mechanics and random fields. In École d’Été de Proba-
bilités de Saint-Flour XV–XVII, 1985–87, vol. 1362 of Lecture Notes in Math.,
Springer, Berlin, 1988, pp. 427–450.
653. Neunzert, H. An introduction to the nonlinear Boltzmann–Vlasov equation.
In Kinetic theories and the Boltzmann equation, C. Cercignani, Ed., vol. 1048
of Lecture Notes in Math., Springer, Berlin, Heidelberg, 1984, pp. 60–110.
654. Ohta, S.-i. On the measure contraction property of metric measure spaces.
Comment. Math. Helv. 82, 4 (2007), 805–828.
655. Ohta, S.-i. Gradient flows on Wasserstein spaces over compact Alexandrov
spaces. To appear in Amer. J. Math.
656. Ohta, S.-i. Products, cones, and suspensions of spaces with the measure
contraction property. J. Lond. Math. Soc. (2) 76, 1 (2007), 225–236.
657. Ohta, S.-i. Finsler interpolation inequalities. Preprint, 2008. Available online
at www.math.kyoto-u.ac.jp/~sohta.
658. Øksendal, B. Stochastic differential equations. An introduction with applica-
tions, sixth ed. Universitext. Springer-Verlag, Berlin, 2003.
659. Oliker, V. Embedding Sn into Rn+1 with given integral Gauss curvature and
optimal mass transport on Sn. Adv. Math. 213, 2 (2007), 600–620.
660. Oliker, V. Variational solutions of some problems in convexity via Monge–
Kantorovich optimal mass transport theory. Conference in Oberwolfach, July
2006 (personal communication).
661. Oliker, V., and Prussner, L. D. A new technique for synthesis of offset dual
reflector systems. In 10th Annual Review of Progress in Applied Computational
Electromagnetics, 1994, pp. 45–52.
662. Ollivier, Y. Ricci curvature of Markov chains on metric spaces. Preprint,
2007. Available online at www.umpa.ens-lyon.fr/~yollivie/publs.html.
663. Ollivier, Y., and Pansu, P. Courbure de Ricci et concentration de la mesure.
Working seminar notes. Available online at
www.umpa.ens-lyon.fr/~yollivie.
664. Osserman, R. The isoperimetric inequality. Bull. Amer. Math. Soc. 84, 6
(1978), 1182–1238.
665. Otsu, Y., and Shioya, T. The Riemannian structure of Alexandrov spaces.
J. Differential Geom. 39, 3 (1994), 629–658.
666. Otto, F. Double degenerate diffusion equations as steepest descent. Preprint
Univ. Bonn, 1996.
750. Sheng, W., Trudinger, N. S., and Wang, X.-J. The Yamabe problem for
higher order curvatures. Preprint, archived online at
arxiv.org/abs/math/0505463.
751. Shioya, T. Mass of rays in Alexandrov spaces of nonnegative curvature. Com-
ment. Math. Helv. 69, 2 (1994), 208–228.
752. Siburg, K. F. The principle of least action in geometry and dynamics,
vol. 1844 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2004.
753. Simon, L. Lectures on geometric measure theory. Proceedings of the Centre
for Mathematical Analysis, Australian National University, 3, Canberra, 1983.
754. Smith, C., and Knott, M. On Hoeffding–Fréchet bounds and cyclic mono-
tone relations. J. Multivariate Anal. 40, 2 (1992), 328–334.
755. Sobolevskiı̆, A., and Frisch, U. Application of optimal transportation
theory to the reconstruction of the early Universe. Zap. Nauchn. Sem. S.-
Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 312, 11 (2004), 303–309, 317.
English translation in J. Math. Sci. (N. Y.) 133, 4 (2006), 1539–1542.
756. Soibelman, Y. Notes on noncommutative Riemannian geometry. Personal
communication, 2006.
757. Spohn, H. Large scale dynamics of interacting particles. Texts and Mono-
graphs in Physics. Springer-Verlag, Berlin, 1991.
758. Stam, A. Some inequalities satisfied by the quantities of information of Fisher
and Shannon. Inform. Control 2 (1959), 101–112.
759. Stroock, D. W. An introduction to the analysis of paths on a Riemannian
manifold, vol. 74 of Mathematical Surveys and Monographs. American Mathe-
matical Society, Providence, RI, 2000.
760. Sturm, K.-T. Diffusion processes and heat kernels on metric spaces. Ann.
Probab. 26, 1 (1998), 1–55.
761. Sturm, K.-T. Convex functionals of probability measures and nonlinear dif-
fusions on manifolds. J. Math. Pures Appl. (9) 84, 2 (2005), 149–168.
762. Sturm, K.-T. On the geometry of metric measure spaces. I. Acta Math. 196,
1 (2006), 65–131.
763. Sturm, K.-T. On the geometry of metric measure spaces. II. Acta Math. 196,
1 (2006), 133–177.
764. Sturm, K.-T., and von Renesse, M.-K. Transport inequalities, gradient
estimates, entropy and Ricci curvature. Comm. Pure Appl. Math. 58, 7 (2005),
923–940.
765. Sudakov, V. N. Geometric problems in the theory of infinite-dimensional
probability distributions. Proc. Steklov Inst. Math. 141 (1979), 1–178.
766. Sudakov, V. N., and Cirel′son, B. S. Extremal properties of half-spaces
for spherically invariant measures. Zap. Naučn. Sem. Leningrad. Otdel. Mat.
Inst. Steklov. (LOMI) 41 (1974), 14–24. English translation in J. Soviet Math.
9 (1978), 9–18.
767. Sznitman, A.-S. Équations de type de Boltzmann, spatialement homogènes.
Z. Wahrsch. Verw. Gebiete 66 (1984), 559–562.
768. Sznitman, A.-S. Topics in propagation of chaos. In École d’Été de Probabilités
de Saint-Flour XIX—1989. Springer, Berlin, 1991, pp. 165–251.
769. Szulga, A. On minimal metrics in the space of random variables. Teor.
Veroyatnost. i Primenen. 27, 2 (1982), 401–405.
770. Talagrand, M. A new isoperimetric inequality for product measure and the
tails of sums of independent random variables. Geom. Funct. Anal. 1, 2 (1991),
211–223.
794. Trudinger, N. S., and Wang, X.-J. On the second boundary value problem
for Monge–Ampère type equations and optimal transportation. Preprint, 2006.
Archived online at arxiv.org/abs/math.AP/0601086.
795. Tuero-Dı́az, A. On the stochastic convergence of representations based on
Wasserstein metrics. Ann. Probab. 21, 1 (1993), 72–85.
796. Uckelmann, L. Optimal couplings between one-dimensional distributions.
Distributions with given marginals and moment problems (Prague, 1996),
Kluwer Acad. Publ., Dordrecht, 1997, pp. 275–281.
797. Unterreiter, A., Arnold, A., Markowich, P., and Toscani, G. On
generalized Csiszár–Kullback inequalities. Monatsh. Math. 131, 3 (2000), 235–
253.
798. Urbas, J. On the second boundary value problem for equations of Monge–
Ampère type. J. Reine Angew. Math. 487 (1997), 115–124.
799. Urbas, J. Mass transfer problems. Lecture notes from a course given in Univ.
Bonn, 1997–1998.
800. Urbas, J. The second boundary value problem for a class of Hessian equations.
Comm. Partial Differential Equations 26, 5–6 (2001), 859–882.
801. Valdimarsson, S. I. On the Hessian of the optimal transport potential.
Preprint, 2006.
802. Varadhan, S. R. S. Mathematical statistics. Courant Institute of Mathemat-
ical Sciences New York University, New York, 1974. Lectures given during the
academic year 1973–1974, Notes by Michael Levandowsky and Norman Rubin.
803. Vasershtein, L. N. Markov processes over denumerable products of spaces
describing large system of automata. Problemy Peredači Informacii 5, 3 (1969),
64–72.
804. Vázquez, J. L. An introduction to the mathematical theory of the porous
medium equation. In Shape optimization and free boundaries (Montreal, PQ,
1990). Kluwer Acad. Publ., Dordrecht, 1992, pp. 347–389.
805. Vázquez, J. L. The porous medium equation. Mathematical theory. Ox-
ford Mathematical Monographs. The Clarendon Press Oxford University Press,
New York, 2006.
806. Vázquez, J. L. Smoothing and decay estimates for nonlinear parabolic equa-
tions of porous medium type, vol. 33 of Oxford Lecture Series in Mathematics
and its Applications. Oxford University Press, 2006.
807. Vershik, A. M. Some remarks on the infinite-dimensional problems of linear
programming. Russian Math. Surveys 25, 5 (1970), 117–124.
808. Vershik, A. M. L. V. Kantorovich and linear programming. Historical note
(2001, updated in 2007). Archived at www.arxiv.org/abs/0707.0491.
809. Vershik, A. M. The Kantorovich metric: the initial history and little-known
applications. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov.
(POMI) 312, Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11 (2004),
69–85, 311.
810. Vershik, A. M., Ed. J. Math. Sci. 133, 4 (2006). Special issue dedicated
to L. V. Kantorovich. Springer, New York, 2006. Translated from the Rus-
sian: Zapiski Nauchn. Seminarov POMI, vol. 312: “Theory of representation of
Dynamical Systems. Special Issue”. Saint-Petersburg, 2004.
811. Villani, C. Remarks about negative pressure problems. Unpublished notes,
2002.
834. Wang, X.-J. On the design of a reflector antenna. II. Calc. Var. Partial
Differential Equations 20, 3 (2004), 329–341.
835. Wang, X.-J. Schauder estimates for elliptic and parabolic equations. Chin.
Ann. Math. 27B, 6 (2006), 637–642.
836. Werner, R. F. The uncertainty relation for joint measurement of position
and momentum. Quantum Information and Computing (QIC) 4, 6–7 (2004),
546–562. Archived online at arxiv.org/abs/quant-ph/0405184.
837. Wolansky, G. On time reversible description of the process of coagulation
and fragmentation. To appear in Arch. Ration. Mech. Anal.
838. Wolansky, G. Extended least action principle for steady flows under a pre-
scribed flux. To appear in Calc. Var. Partial Differential Equations.
839. Wolansky, G. Minimizers of Dirichlet functionals on the n-torus
and the weak KAM theory. Preprint, 2007. Available online at
www.math.technion.ac.il/~gershonw.
840. Wolfson, J. G. Minimal Lagrangian diffeomorphisms and the Monge–
Ampère equation. J. Differential Geom. 46, 2 (1997), 335–373.
841. Wu, L. Poincaré and transportation inequalities for Gibbs measures under the
Dobrushin uniqueness condition. Ann. Probab. 34, 5 (2006), 1960–1989.
842. Wu, L. A simple inequality for probability measures and applications.
Preprint, 2006.
843. Wu, L., and Zhang, Z. Talagrand’s T2-transportation inequality w.r.t. a
uniform metric for diffusions. Acta Math. Appl. Sin. Engl. Ser. 20, 3 (2004),
357–364.
844. Wu, L., and Zhang, Z. Talagrand’s T2-transportation inequality and log-
Sobolev inequality for dissipative SPDEs and applications to reaction-diffusion
equations. Chinese Ann. Math. Ser. B 27, 3 (2006), 243–262.
845. Yukich, J. E. Optimal matching and empirical measures. Proc. Amer. Math.
Soc. 107, 4 (1989), 1051–1059.
846. Zhu, S. The comparison geometry of Ricci curvature. In Comparison geometry
(Berkeley, CA, 1993–94), vol. 30 of Math. Sci. Res. Inst. Publ., Cambridge
University Press, Cambridge, 1997, pp. 221–262.
List of short statements
Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Deterministic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Existence of an optimal coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Lower semicontinuity of the cost functional . . . . . . . . . . . . . . . . . . . . . . 55
Tightness of transference plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Optimality is inherited by restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Convexity of the optimal cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cyclical monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
c-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Alternative characterization of c-convexity . . . . . . . . . . . . . . . . . . . . . . 69
Kantorovich duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Restriction of c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Restriction for the Kantorovich duality theorem . . . . . . . . . . . . . . . . . 88
Stability of optimal transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Compactness of the set of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . 90
Measurable selection of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Stability of the transport map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Dual transport inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Criterion for solvability of the Monge problem . . . . . . . . . . . . . . . . . . . 96
Wasserstein distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Kantorovich–Rubinstein distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Wasserstein space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Weak convergence in Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Wp metrizes Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Continuity of Wp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Metrizability of the weak topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Cauchy sequences in Wp are tight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Cost                      Setting     Use                                              Where quoted
f(x3 − y3)                R3          semi-geostrophic equations                       [268]
erf(α|x − y|)             R           Hsu–Sturm’s maximal coupling of Brownian paths   [484]
|x − y|β, 0 < β < 1       R or R2     modeling in economy                              [399]