Optimal Transport
Cédric Villani
Springer
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Do mo chuisle mo chroí, Aëlle
Contents

Preface
Conventions
Introduction
4 Basic properties
12 Smoothness
References
Index
When I was first approached for the 2005 edition of the Saint-Flour
Probability Summer School, I was intrigued, flattered and scared.¹
Apart from the challenge posed by the teaching of a rather analytical
subject to a probabilistic audience, there was the danger of producing
a remake of my recent book Topics in Optimal Transportation.
However, I gradually realized that I was being offered a unique op-
portunity to rewrite the whole theory from a different perspective, with
alternative proofs and a different focus, and a more probabilistic pre-
sentation; plus the incorporation of recent progress. Among the most
striking of these recent advances, there was the rising awareness that
John Mather’s minimal measures had a lot to do with optimal trans-
port, and that both theories could actually be embedded in a single
framework. There was also the discovery that optimal transport could
provide a robust synthetic approach to Ricci curvature bounds. These
links with dynamical systems on one hand, differential geometry on
the other hand, were only briefly alluded to in my first book; here on
the contrary they will be at the basis of the presentation. To summa-
rize: more probability, more geometry, and more dynamical systems.
Of course there cannot be more of everything, so in some sense there
is less analysis and less physics, and also there are fewer digressions.
So the present course is by no means a reduction or an expansion of
my previous book, but should be regarded as a complementary reading.
Both sources can be read independently, or together, and hopefully the
complementarity of points of view will have pedagogical value.
Throughout the book I have tried to optimize the results and the
presentation, to provide complete and self-contained proofs of the most
important results, and comprehensive bibliographical notes — a daunt-
ingly difficult task in view of the rapid expansion of the literature. Many
statements and theorems have been written specifically for this course,
and many results appear in rather sharp form for the first time. I also
added several appendices, either to present some domains of mathe-
matics to non-experts, or to provide proofs of important auxiliary re-
sults. All this has resulted in a rapid growth of the document, which in
the end is about six times (!) the size that I had planned initially. So
the non-expert reader is advised to skip long proofs at first
reading, and concentrate on explanations, statements, examples and
sketches of proofs when they are available.
¹ Fans of Tom Waits may have identified this quotation.
Several parts of the subject are not developed as much as they would
deserve. Numerical simulation is not addressed at all, except for a few
comments in the concluding part. The regularity theory of optimal
transport is described in Chapter 12 (including the remarkable recent
works of Xu-Jia Wang, Neil Trudinger and Grégoire Loeper), but with-
out the core proofs and latest developments; this is not only because
of the technicality of the subject, but also because smoothness is not
needed in the rest of the book. Still another poorly developed subject is
the Monge–Mather–Mañé problem arising in dynamical systems, and
including as a variant the optimal transport problem when the cost
function is a distance. This topic is discussed in several treatises, such as
Albert Fathi’s monograph, Weak KAM theorem in Lagrangian dynam-
ics; but now it would be desirable to rewrite everything in a framework
that also encompasses the optimal transport problem. An important
step in this direction was recently performed by Patrick Bernard and
Boris Buffoni. In Chapter 8 I shall provide an introduction to Mather’s
theory, but there would be much more to say.
The treatment of Chapter 22 (concentration of measure) is strongly
influenced by Michel Ledoux’s book, The Concentration of Measure
Phenomenon; while the results of Chapters 23 to 25 owe a lot to
the monograph by Luigi Ambrosio, Nicola Gigli and Giuseppe Savaré,
Gradient flows in metric spaces and in the space of probability mea-
sures. Both references are warmly recommended complementary read-
ing. One can also consult the two-volume treatise by Svetlozar Rachev
and Ludger Rüschendorf, Mass Transportation Problems, for many ap-
plications of optimal transport to various fields of probability theory.
While writing this text I asked for help from a number of friends
and collaborators. Among them, Luigi Ambrosio and John Lott are
the ones whom I requested most to contribute; this book owes a lot
to their detailed comments and suggestions. Most of Part III, but also
significant portions of Parts I and II, are made up of ideas taken from
my collaborations with John, which started in 2004 as I was enjoying
the hospitality of the Miller Institute in Berkeley. Frequent discussions
with Patrick Bernard and Albert Fathi allowed me to get the links
between optimal transport and John Mather’s theory, which were a
key to the presentation in Part I; John himself gave precious hints
about the history of the subject. Neil Trudinger and Xu-Jia Wang spent
vast amounts of time teaching me the regularity theory of Monge–
Ampère equations. Alessio Figalli took up the dreadful challenge to
check the entire set of notes from first to last page. Apart from these
people, I got valuable help from Stefano Bianchini, François Bolley,
Yann Brenier, Xavier Cabré, Vincent Calvez, José Antonio Carrillo,
Dario Cordero-Erausquin, Denis Feyel, Sylvain Gallot, Wilfrid Gangbo,
Diogo Aguiar Gomes, Nathaël Gozlan, Arnaud Guillin, Nicolas Juillet,
Kazuhiro Kuwae, Michel Ledoux, Grégoire Loeper, Francesco Maggi,
Robert McCann, Shin-ichi Ohta, Vladimir Oliker, Yann Ollivier, Felix
Otto, Ludger Rüschendorf, Giuseppe Savaré, Walter Schachermayer,
Benedikt Schulte, Theo Sturm, Josef Teichmann, Anthon Thalmaier,
Hermann Thorisson, Süleyman Üstünel, Anatoly Vershik, and others.
Short versions of this course were tried on mixed audiences in the
Universities of Bonn, Dortmund, Grenoble and Orléans, as well as the
Borel seminar in Leysin and the IHES in Bures-sur-Yvette. Part of
the writing was done during stays at the marvelous MFO Institute
in Oberwolfach, the CIRM in Luminy, and the Australian National
University in Canberra. All these institutions are warmly thanked.
It is a pleasure to thank Jean Picard for all his organization work
on the 2005 Saint-Flour summer school; and the participants for their
questions, comments and bug-tracking, in particular Sylvain Arlot
(great bug-tracker!), Fabrice Baudoin, Jérôme Demange, Steve Evans
(whom I also thank for his beautiful lectures), Christophe Leuridan,
Jan Oblój, Erwan Saint Loubert Bié, and others. I extend these thanks
to the joyful group of young PhD students and maîtres de conférences
with whom I spent such a good time on excursions, restaurants, quan-
tum ping-pong and other activities, making my stay in Saint-Flour
truly wonderful (with special thanks to my personal driver, Stéphane
Loisel, and my table tennis sparring-partner and adversary, François
Simenhaus). I will cherish my visit there in memory as long as I live!
Typing these notes was mostly performed on my (now defunct)
faithful laptop Torsten, a gift of the Miller Institute. Support by the
Agence Nationale de la Recherche and Institut Universitaire de France
is acknowledged. My eternal gratitude goes to those who made fine
typesetting accessible to every mathematician, most notably Donald
Knuth for TeX, and the developers of LaTeX, BibTeX and XFig. Final
thanks to Catriona Byrne and her team for a great editing process.
As usual, I encourage all readers to report mistakes and misprints.
I will maintain a list of errata, accessible from my Web page.
Cédric Villani
Lyon, June 2008
Conventions
Axioms
I use the classical axioms of set theory; not the full version of the axiom
of choice (only the classical axiom of “countable dependent choice”).
Sets and structures
Id is the identity mapping, whatever the space. If A is a set then the
function 1A is the indicator function of A: 1A (x) = 1 if x ∈ A, and 0
otherwise. If F is a formula, then 1F is the indicator function of the
set defined by the formula F .
If f and g are two functions, then (f, g) is the function x ↦ (f(x), g(x)). The composition f ◦ g will often be denoted by f(g).
N is the set of positive integers: N = {1, 2, 3, . . .}. A sequence is
written (xk )k∈N , or simply, when no confusion seems possible, (xk ).
R is the set of real numbers. When I write Rn it is implicitly assumed
that n is a positive integer. The Euclidean scalar product between two
vectors a and b in Rn is denoted interchangeably by a · b or ⟨a, b⟩. The
Euclidean norm will be denoted simply by | · |, independently of the
dimension n.
Mn (R) is the space of real n × n matrices, and In the n × n identity
matrix. The trace of a matrix M will be denoted by tr M, its determinant by det M, its adjoint by M∗, and its Hilbert–Schmidt norm √(tr (M∗M)) by ‖M‖HS (or just ‖M‖).
Unless otherwise stated, Riemannian manifolds appearing in the
text are finite-dimensional, smooth and complete. If a Riemannian
manifold M is given, I shall usually denote by n its dimension, by
d the geodesic distance on M , and by vol the volume (= n-dimensional
Hausdorff) measure on M . The tangent space at x will be denoted by
Tx M , and the tangent bundle by TM . The norm on Tx M will most
of the time be denoted by | · |, as in Rn , without explicit mention of
the point x. (The symbol ‖ · ‖ will be reserved for special norms or
functional norms.) If S is a set without smooth structure, the notation
Tx S will instead denote the tangent cone to S at x (Definition 10.46).
If Q is a quadratic form defined on Rn, or on the tangent bundle of a manifold, its value on a (tangent) vector v will be denoted by ⟨Q · v, v⟩, or simply Q(v).
The open ball of radius r and center x in a metric space X is denoted
interchangeably by B(x, r) or Br (x). If X is a Riemannian manifold,
the distance is of course the geodesic distance. The closed ball will be
denoted interchangeably by B[x, r] or Br] (x). The diameter of a metric
space X will be denoted by diam (X ).
Function spaces
C(X ) is the space of continuous functions X → R, Cb (X ) the space
of bounded continuous functions X → R; and C0 (X ) the space of
continuous functions X → R converging to 0 at infinity; all of them
are equipped with the norm of uniform convergence ‖ϕ‖∞ = sup |ϕ|.
Then Cbk (X ) is the space of k-times continuously differentiable func-
tions u : X → R, such that all the partial derivatives of u up to order k
are bounded; it is equipped with the norm given by the supremum of all norms ‖∂u‖Cb , where ∂u is a partial derivative of order at most k;
Cck (X ) is the space of k-times continuously differentiable functions with
compact support; etc. When the target space is not R but some other
space Y, the notation is transformed in an obvious way: C(X ; Y), etc.
Lp is the Lebesgue space of exponent p; the space and the measure
will often be implicit, but clear from the context.
Calculus
The derivative of a function u = u(t), defined on an interval of R and
valued in Rn or in a smooth manifold, will be denoted by u′ , or more
often by u̇. The notation d+u/dt stands for the upper right-derivative of a real-valued function u: d+u/dt = lim sup_{s↓0} [u(t + s) − u(t)]/s.
If u is a function of several variables, the partial derivative with
respect to the variable t will be denoted by ∂t u, or ∂u/∂t. The notation
ut does not stand for ∂t u, but for u(t).
The gradient operator will be denoted by grad or simply ∇; the di-
vergence operator by div or ∇· ; the Laplace operator by ∆; the Hessian
operator by Hess or ∇2 (so ∇2 does not stand for the Laplace opera-
tor). The notation is the same in Rn or in a Riemannian manifold. ∆ is
the divergence of the gradient, so it is typically a nonpositive operator.
The value of the gradient of f at point x will be denoted either by
∇x f or ∇f(x). The notation ∇̃ stands for the approximate gradient,
introduced in Definition 10.2.
If T is a map Rn → Rn , ∇T stands for the Jacobian matrix of T ,
that is the matrix of all partial derivatives (∂Ti /∂xj ) (1 ≤ i, j ≤ n).
All these differential operators will be applied to (smooth) functions
but also to measures, by duality. For instance, the Laplacian of a measure µ is defined via the identity ∫ ζ d(∆µ) = ∫ (∆ζ) dµ (ζ ∈ Cc2). The
notation is consistent in the sense that ∆(f vol) = (∆f ) vol. Similarly,
I shall take the divergence of a vector-valued measure, etc.
f = o(g) means f /g −→ 0 (in an asymptotic regime that should be
clear from the context), while f = O(g) means that f /g is bounded.
log stands for the natural logarithm with base e.
The positive and negative parts of x ∈ R are defined respectively
by x+ = max (x, 0) and x− = max (−x, 0); both are nonnegative, and
|x| = x+ + x− . The notation a ∧ b will sometimes be used for min (a, b).
All these notions are extended in the usual way to functions and also
to signed measures.
Probability measures
δx is the Dirac mass at point x.
All measures considered in the text are Borel measures on Polish
spaces, which are complete, separable metric spaces, equipped with
their Borel σ-algebra. I shall usually not use the completed σ-algebra,
except on some rare occasions (emphasized in the text) in Chapter 5.
A measure is said to be finite if it has finite mass, and locally finite
if it attributes finite mass to compact sets.
² Depending on the authors, the measure T#µ is often denoted by T#µ, T∗µ, T(µ), Tµ, ∫ δT(a) µ(da), µ ◦ T−1, µT−1, or µ[T ∈ · ].
If µ and ν are the only laws in the problem, then without loss of
generality one may choose Ω = X × Y. In a more measure-theoretical
formulation, coupling µ and ν means constructing a measure π on X ×Y
such that π admits µ and ν as marginals on X and Y respectively.
The following three statements are equivalent ways to rephrase that
marginal condition:
• π = (Id , T )# µ.
Here below are some of the most famous couplings used in mathematics
— of course the list is far from complete, since everybody has his or
her own preferred coupling technique. Each of these couplings comes
with its own natural setting; this variety of assumptions reflects the
variety of constructions. (This is a good reason to state each of them
with some generality.)
and set
T = G−1 ◦ F.
If µ does not have atoms, then T# µ = ν. This rearrangement is quite
simple, explicit, as smooth as can be, and enjoys good geometric
properties.
4. The Knothe–Rosenblatt rearrangement in Rn . Let µ and ν be
two probability measures on Rn , such that µ is absolutely continu-
ous with respect to Lebesgue measure. Then define a coupling of µ
and ν as follows.
Step 1: Take the marginal on the first variable: this gives probabil-
ity measures µ1 (dx1 ), ν1 (dy1 ) on R, with µ1 being atomless. Then
define y1 = T1 (x1 ) by the formula for the increasing rearrangement
of µ1 into ν1 .
Step 2: Now take the marginal on the first two variables and dis-
integrate it with respect to the first variable. This gives proba-
bility measures µ2 (dx1 dx2 ) = µ1 (dx1 ) µ2 (dx2 |x1 ), ν2 (dy1 dy2 ) =
ν1 (dy1 ) ν2 (dy2 |y1 ). Then, for each given x1 ∈ R, set y1 = T1 (x1 ),
and define y2 = T2 (x2 ; x1 ) by the formula for the increasing rear-
rangement of µ2 (dx2 |x1 ) into ν2 (dy2 |y1 ). (See Figure 1.1.)
Then repeat the construction, adding variables one after the other and defining y3 = T3(x3; x1, x2); etc. After n steps, this produces a map y = T(x) which transports µ to ν, and in practical situations might be computed explicitly with little effort (see the numerical sketch after this list). Moreover, the
Jacobian matrix of the change of variables T is (by construction)
upper triangular with positive entries on the diagonal; this makes
it suitable for various geometric applications. On the negative side,
this mapping does not satisfy many interesting intrinsic properties;
it is not invariant under isometries of Rn , not even under relabeling
of coordinates.
5. The Holley coupling on a lattice. Let µ and ν be two discrete
probabilities on a finite lattice Λ, say {0, 1}N , equipped with the
natural partial ordering (x ≤ y if xn ≤ yn for all n). Assume that
Fig. 1.1. Second step in the construction of the Knothe–Rosenblatt map: After the
correspondence x1 → y1 has been determined, the conditional probability of x2 (seen
as a one-dimensional probability on a small “slice” of width dx1 ) can be transported
to the conditional probability of y2 (seen as a one-dimensional probability on a slice
of width dy1 ).
There are many variants with important differences which are all intended
to make two trajectories close to each other after some time: the
Ornstein coupling, the ε-coupling (in which one requires the
two variables to be close, rather than to occupy the same state),
the shift-coupling (in which one allows an additional time-shift),
etc.
8. The optimal coupling or optimal transport. Here one intro-
duces a cost function c(x, y) on X × Y, that can be interpreted
as the work needed to move one unit of mass from location x to
location y. Then one considers the Monge–Kantorovich minimization problem

inf E c(X, Y),

where the pair (X, Y) runs over all possible couplings of (µ, ν); or equivalently, in terms of measures,

inf ∫_{X×Y} c(x, y) dπ(x, y),

where the infimum runs over all couplings π of (µ, ν) (a numerical sketch of this problem for finitely supported marginals follows this list).
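When µ and ν are finitely supported, the Monge–Kantorovich problem above is a finite-dimensional linear program in the entries of the transference plan π, and can be handed to any LP solver. Here is a minimal sketch using scipy; the marginals, support points and cost matrix are placeholders chosen for illustration, not data from the text.

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Monge-Kantorovich problem:
#   minimize   sum_ij c[i, j] * pi[i, j]
#   subject to pi >= 0, sum_j pi[i, j] = mu[i], sum_i pi[i, j] = nu[j].
mu = np.array([0.5, 0.3, 0.2])            # source marginal (masses of the x_i)
nu = np.array([0.4, 0.4, 0.2])            # target marginal (masses of the y_j)
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 3.0])
c = (x[:, None] - y[None, :]) ** 2        # cost c(x, y) = |x - y|^2

m, n = len(mu), len(nu)
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0      # row sums of pi equal mu
for j in range(n):
    A_eq[m + j, j::n] = 1.0               # column sums of pi equal nu
b_eq = np.concatenate([mu, nu])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(m, n)                  # an optimal transference plan
print("optimal cost:", res.fun)
print(pi)
```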
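The increasing rearrangement of item 3 and the Knothe–Rosenblatt construction of item 4 can also be followed almost literally on a computer. Below is a minimal sketch in dimension 2, assuming the two measures are given by positive density values on rectangular grids; quantile_map implements T = G−1 ◦ F by matching cumulative distribution functions, the nearest-slice approximation of the conditional law of ν is a crude shortcut, and all names are illustrative.

```python
import numpy as np

def quantile_map(p_src, grid_src, p_tgt, grid_tgt):
    """Increasing rearrangement of the density p_src (given on grid_src)
    onto the density p_tgt (given on grid_tgt): T = G^{-1} o F,
    computed by matching the two cumulative distribution functions."""
    F = np.cumsum(p_src).astype(float); F /= F[-1]
    G = np.cumsum(p_tgt).astype(float); G /= G[-1]
    return np.interp(F, G, grid_tgt)          # T evaluated on grid_src

def knothe_rosenblatt(mu, nu, x1, x2, y1, y2):
    """Sketch of the Knothe-Rosenblatt map in dimension 2.
    mu[i, j] and nu[k, l] are density values at (x1[i], x2[j]) and (y1[k], y2[l])."""
    # Step 1: increasing rearrangement of the first marginals
    T1 = quantile_map(mu.sum(axis=1), x1, nu.sum(axis=1), y1)   # T1[i] = T1(x1[i])
    # Step 2: on each slice x1 = x1[i], transport the conditional density
    # mu(dx2 | x1) onto nu(dy2 | y1 = T1(x1)); the conditional law of nu
    # is crudely approximated here by its nearest grid slice.
    T2 = np.empty(mu.shape)
    for i in range(len(x1)):
        k = np.argmin(np.abs(y1 - T1[i]))
        T2[i] = quantile_map(mu[i], x2, nu[k], y2)              # T2[i, j] = T2(x2[j]; x1[i])
    return T1, T2

# Tiny example: two Gaussian-like densities on [0, 1]^2
x = np.linspace(0.0, 1.0, 64)
MU = np.exp(-((x[:, None] - 0.3) ** 2 + (x[None, :] - 0.3) ** 2) / 0.02)
NU = np.exp(-((x[:, None] - 0.7) ** 2 + (x[None, :] - 0.6) ** 2) / 0.05)
T1, T2 = knothe_rosenblatt(MU, NU, x, x, x, x)
```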
Gluing
π123 (dx1 dx2 dx3 ) = π12 (dx1 |x2 ) µ2 (dx2 ) π23 (dx3 |x2 ).
∂µ/∂t + ∇ · (µ ξ) = 0,
where the time-derivative is taken in the weak sense, and the diver-
gence operator is defined by duality against continuously differentiable
functions with compact support:
∫M ϕ ∇ · (µ ξ) = − ∫M (ξ · ∇ϕ) dµ.
Diffusion formula
Remark 1.6. Actually, there is a finer criterion for the diffusion equa-
tion to hold true: it is sufficient that the Ricci curvature at point x be
bounded below by −Cd(x0 , x)2 gx as x → ∞, where gx is the metric at
point x and x0 is an arbitrary reference point. The exponent 2 here is
sharp.
∂t ρ = ∆ρ.
with associated flow (Tt (x))0≤t≤1 , and a family (µt )0<t<1 of probability
measures by
µt = (1 − t) µ0 + t µ1 .
It is easy to check that
∂t µt = (ρ1 − ρ0) ν,    ∇ · (µt ξ(t, ·)) = ∇ · ((∇u) e−V vol) = (e−V (∆u − ∇V · ∇u)) vol = (ρ0 − ρ1) ν.
So µt satisfies the formula of conservation of mass, therefore µt =
(Tt )# µ0 . In particular, T1 pushes µ0 forward to µ1 .
In the case when M is compact and V = 0, the above construction
works if ρ0 and ρ1 are Lipschitz continuous and positive. Indeed, the
solution u of ∆u = ρ0 − ρ1 will be of class C 2,α for all α ∈ (0, 1),
and in particular ∇u will be of class C 1 (in fact C 1,α ). In more general
situations, things might depend on the regularity of V , and its behavior
at infinity.
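In the compact case with V = 0, the only step of the construction that requires computation is the Poisson equation ∆u = ρ0 − ρ1. Here is a minimal sketch on the flat 2-torus with a spectral solver, which also forms the velocity field ξ(t, ·) = ∇u / ((1 − t)ρ0 + tρ1) appearing in the identities above; the grid size and densities are arbitrary choices for illustration.

```python
import numpy as np

# Flat 2-torus [0,1)^2 discretized on an N x N grid
N = 128
x = np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")

# Two smooth positive probability densities rho0, rho1 (both integrate to 1)
rho0 = 1.0 + 0.3 * np.cos(2 * np.pi * X)
rho1 = 1.0 + 0.3 * np.cos(2 * np.pi * Y)

# Solve Delta u = rho0 - rho1 by Fourier transform (the right-hand side has zero mean)
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
KX, KY = np.meshgrid(k, k, indexing="ij")
lap = -(KX**2 + KY**2)                       # Fourier multiplier of the Laplacian
rhs_hat = np.fft.fft2(rho0 - rho1)
u_hat = np.zeros_like(rhs_hat)
u_hat[lap != 0] = rhs_hat[lap != 0] / lap[lap != 0]
u = np.real(np.fft.ifft2(u_hat))

# Gradient of u and the Moser velocity field xi(t, .) = grad u / ((1-t) rho0 + t rho1)
ux = np.real(np.fft.ifft2(1j * KX * np.fft.fft2(u)))
uy = np.real(np.fft.ifft2(1j * KY * np.fft.fft2(u)))
t = 0.5
xi = np.stack([ux, uy]) / ((1 - t) * rho0 + t * rho1)

print("max |Delta u - (rho0 - rho1)| :",
      np.abs(np.real(np.fft.ifft2(lap * np.fft.fft2(u))) - (rho0 - rho1)).max())
```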
Bibliographical notes
(In [814], for the sake of consistency of the presentation I treated optimal coupling on R as a particular case of optimal coupling on Rn; however, this has the drawback of involving subtle arguments.)
The Knothe–Rosenblatt coupling was introduced in 1952 by Rosen-
blatt [709], who suggested that it might be useful to “normalize” sta-
tistical data before applying a statistical test. In 1957, Knothe [523]
rediscovered it for applications to the theory of convex bodies. It is
quite likely that other people have discovered this coupling indepen-
dently. An infinite-dimensional generalization was studied by Bogachev,
Kolesnikov and Medvedev [134, 135].
FKG inequalities were introduced in [375], and have since then
played a crucial role in statistical mechanics. Holley’s proof by coupling
appears in [477]. Recently, Caffarelli [188] has revisited the subject in
connection with optimal transport.
It was in 1965 that Moser proved his coupling theorem, for smooth
compact manifolds without boundaries [640]; noncompact manifolds
were later considered by Greene and Shiohama [432]. Moser himself also
worked with Dacorogna on the more delicate case where the domain
is an open set with boundary, and the transport is required to fix the
boundary [270].
Strassen’s duality theorem is discussed e.g. in [814, Section 1.4].
The gluing lemma is due to several authors, starting with Vorob’ev
in 1962 for finite sets. The modern formulation seems to have emerged
around 1980, independently by Berkes and Philipp [101], Kallenberg,
Thorisson, and maybe others. Refinements were discussed e.g. by de
Acosta [273, Theorem A.1] (for marginals indexed by an arbitrary set)
or Thorisson [781, Theorem 5.1]; see also the bibliographic comments
in [317, p. 20]. For a proof of the statement in these notes, it is suf-
ficient to consult Dudley [317, Theorem 1.1.10], or [814, Lemma 7.6].
A comment about terminology: I like the word “gluing” which gives a
good indication of the construction, but many authors just talk about
“composition” of plans.
The formula of change of variables for C 1 or Lipschitz change of vari-
ables can be found in many textbooks, see e.g. Evans and Gariepy [331,
Chapter 3]. The generalization to approximately differentiable maps is
explained in Ambrosio, Gigli and Savaré [30, Section 5.5]. Such a gen-
erality is interesting in the context of optimal transportation, where
changes of variables are often very rough (say BV , which means of
bounded variation). In that context however, there is more structure:
so in particular

d/dt (|αt|2/2) = −⟨∇V(Xt) − ∇V(Yt), Xt − Yt⟩ ≤ −K |Xt − Yt|2 = −K |αt|2.

It follows by Gronwall's lemma that

|Xt − Yt|2 ≤ e−2Kt |X0 − Y0|2.

Assume for simplicity that E |X0|2 and E |Y0|2 are finite. Then

E |Xt − Yt|2 ≤ e−2Kt E |X0 − Y0|2 ≤ 2 (E |X0|2 + E |Y0|2) e−2Kt.   (2.4)
ν(dy) = e−V(y) dy / Z

(where Z = ∫ e−V is a normalization constant) is stationary: If
law (Y0 ) = ν, then also law (Yt ) = ν. Then (2.4) easily implies that
µt := law (Xt ) converges weakly to ν; in addition, the convergence is
exponentially fast.
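Inequality (2.4) is easy to observe numerically: run the two diffusions with the same Brownian path (an Euler–Maruyama discretization) and compare the mean squared distance with the exponential bound. The quadratic potential, step size and initial laws below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, dt, steps, n_paths = 1.0, 1e-3, 2000, 10000

def grad_V(x):
    return K * x                      # V(x) = K x^2 / 2 is uniformly convex with constant K

# Two diffusions dX = -grad V(X) dt + sqrt(2) dB driven by the SAME Brownian path
X = rng.normal(3.0, 1.0, n_paths)     # samples of law(X_0)
Y = rng.normal(-2.0, 0.5, n_paths)    # samples of law(Y_0), independent of X_0
for _ in range(steps):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X - grad_V(X) * dt + np.sqrt(2.0) * dB
    Y = Y - grad_V(Y) * dt + np.sqrt(2.0) * dB

t = steps * dt
# E|X_0 - Y_0|^2 = (3 - (-2))^2 + Var(X_0) + Var(Y_0) = 25 + 1 + 0.25
print("simulated   E|X_t - Y_t|^2  :", np.mean((X - Y) ** 2))
print("bound e^{-2Kt} E|X_0 - Y_0|^2:", np.exp(-2 * K * t) * 26.25)
```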
Euclidean isoperimetry
Among all subsets of Rn with given surface, which one has the largest
volume? To simplify the problem, let us assume that we are looking
for a bounded open set Ω ⊂ Rn with, say, Lipschitz boundary ∂Ω, and
that the measure |∂Ω| of the boundary is given; then the problem is to maximize the measure |Ω| of Ω. To measure ∂Ω one should use the (n − 1)-dimensional
Hausdorff measure, and to measure Ω the n-dimensional Hausdorff
measure, which of course is the same as the Lebesgue measure in Rn .
It has been known, at least since ancient times, that the solution
to this “isoperimetric problem” is the ball. A simple scaling argument
shows that this statement is equivalent to the Euclidean isoperimetric
inequality:
|∂Ω| / |Ω|^{(n−1)/n} ≥ |∂B| / |B|^{(n−1)/n},
where B is any ball.
There are very many proofs of the isoperimetric inequality, and
many refinements as well. It is less known that there is a proof by
coupling.
Here is a sketch of the argument, forgetting about regularity issues.
Let B be a ball such that |∂B| = |∂Ω|. Consider a random point X dis-
tributed uniformly in Ω, and a random point Y distributed uniformly
in B. Introduce the Knothe–Rosenblatt coupling of X and Y : This is
a deterministic coupling of the form Y = T (X), such that, at each
x ∈ Ω, the Jacobian matrix ∇T (x) is triangular with nonnegative di-
agonal entries. Since the law of X (resp. Y ) has uniform density 1/|Ω|
(resp. 1/|B|), the change of variables formula yields
∀x ∈ Ω,   1/|Ω| = det ∇T(x) / |B|.   (2.5)

Since ∇T(x) is triangular with nonnegative diagonal entries, the arithmetic–geometric mean inequality gives (det ∇T(x))^{1/n} ≤ (∇ · T)(x)/n, so (2.5) implies

1/|Ω|^{1/n} ≤ (∇ · T)(x) / (n |B|^{1/n}).

Integrating this inequality over Ω and applying the divergence theorem (recall that T takes values in B) yields

|Ω|^{1−1/n} ≤ |∂Ω| / (n |B|^{1/n}).

Since |∂Ω| = |∂B| = n|B|, the right-hand side is actually |B|^{1−1/n}, so the volume of Ω is indeed bounded by the volume of B. This concludes the proof.
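The pointwise inequality used in the middle of the argument, (det ∇T)^{1/n} ≤ (∇ · T)/n for a triangular Jacobian with nonnegative diagonal, is just the arithmetic–geometric mean inequality applied to the diagonal entries; here is a quick, purely illustrative numerical check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
for _ in range(1000):
    A = np.triu(rng.normal(size=(n, n)))           # triangular Jacobian, like grad T
    np.fill_diagonal(A, rng.uniform(0.1, 2.0, n))  # nonnegative diagonal entries
    lhs = np.linalg.det(A) ** (1.0 / n)            # (det grad T)^{1/n} = geometric mean of the diagonal
    rhs = np.trace(A) / n                          # (div T)/n = arithmetic mean of the diagonal
    assert lhs <= rhs + 1e-12
print("AM-GM check passed for 1000 random triangular matrices")
```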
The above argument suggests the following problem:
Open Problem 2.1. Can one devise an optimal coupling between sets
(in the sense of a coupling between the uniform probability measures on
these sets) in such a way that the total cost of the coupling decreases
under some evolution converging to balls, such as mean curvature mo-
tion?
Bibliographical notes
minimize the total cost. Monge assumed that the transport cost of one
unit of mass along a certain distance was given by the product of the
mass by the distance.
Nowadays there is a Monge street in Paris, and therein one can find
an excellent bakery called Le Boulanger de Monge. To acknowledge this,
and to illustrate how Monge’s problem can be recast in an economic
perspective, I shall express the problem as follows. Consider a large
number of bakeries, producing loaves, that should be transported each
morning to cafés where consumers will eat them. The amount of bread
that can be produced at each bakery, and the amount that will be
consumed at each café are known in advance, and can be modeled as
probability measures (there is a “density of production” and a “density
of consumption”) on a certain space, which in our case would be Paris
(equipped with the natural metric such that the distance between two
points is the length of the shortest path joining them). The problem is
to find in practice where each unit of bread should go (see Figure 3.2),
in such a way as to minimize the total transport cost. So Monge’s
problem really is the search of an optimal coupling; and to be more
precise, he was looking for a deterministic optimal coupling.
Fig. 3.2. Economic illustration of Monge’s problem: squares stand for production
units, circles for consumption places.
It was only several years after his main results that Kantorovich
made the connection with Monge’s work. The problem of optimal cou-
pling has since then been called the Monge–Kantorovich problem.
Throughout the second half of the twentieth century, optimal cou-
pling techniques and variants of the Kantorovich–Rubinstein distance
(nowadays often called Wasserstein distances, or other denominations)
were used by statisticians and probabilists. The “basis” space could be
finite-dimensional, or infinite-dimensional: For instance, optimal cou-
plings give interesting notions of distance between probability measures
on path spaces. Noticeable contributions from the seventies are due
to Roland Dobrushin, who used such distances in the study of parti-
cle systems; and to Hiroshi Tanaka, who applied them to study the
time-behavior of a simple variant of the Boltzmann equation. By the
mid-eighties, specialists of the subject, like Svetlozar Rachev or Ludger
Rüschendorf, were in possession of a large library of ideas, tools, tech-
niques and applications related to optimal transport.
During that time, reparametrization techniques (yet another word
for change of variables) were used by many researchers working on in-
equalities involving volumes or integrals. Only later would it be under-
stood that optimal transport often provides useful reparametrizations.
At the end of the eighties, three directions of research emerged inde-
pendently and almost simultaneously, which completely reshaped the
whole picture of optimal transport.
One of them was John Mather’s work on Lagrangian dynamical
systems. Action-minimizing curves are basic important objects in the
theory of dynamical systems, and the construction of closed action-
minimizing curves satisfying certain qualitative properties is a classical
problem. By the end of the eighties, Mather found it convenient to
study not only action-minimizing curves, but action-minimizing sta-
tionary measures in phase space. Mather’s measures are a generaliza-
tion of action-minimizing curves, and they solve a variational problem
which in effect is a Monge–Kantorovich problem. Under some condi-
tions on the Lagrangian, Mather proved a celebrated result according to
which (roughly speaking) certain action-minimizing measures are au-
tomatically concentrated on Lipschitz graphs. As we shall understand
in Chapter 8, this problem is intimately related to the construction of
a deterministic optimal coupling.
The second direction of research came from the work of Yann Bre-
nier. While studying problems in incompressible fluid mechanics, Bre-
nier needed to construct an operator that would act like the projection
on the set of measure-preserving mappings in an open set (in probabilis-
tic language, measure-preserving mappings are deterministic couplings
of the Lebesgue measure with itself). He understood that he could do
so by introducing an optimal coupling: If u is the map for which one
wants to compute the projection, introduce a coupling of the Lebesgue
measure L with u# L. This study revealed an unexpected link between
optimal transport and fluid mechanics; at the same time, by pointing
out the relation with the theory of Monge–Ampère equations, Brenier
attracted the attention of the community working on partial differential
equations.
The third direction of research, certainly the most surprising, came
from outside mathematics. Mike Cullen was part of a group of meteo-
rologists with a well-developed mathematical taste, working on semi-
geostrophic equations, used in meteorology for the modeling of atmo-
spheric fronts. Cullen and his collaborators showed that a certain fa-
mous change of unknown due to Brian Hoskins could be re-interpreted
in terms of an optimal coupling problem, and they identified the min-
imization property as a stability condition. A striking outcome of this
work was that optimal transport could arise naturally in partial differ-
ential equations which seemed to have nothing to do with it.
All three contributions emphasized (in their respective domain) that
important information can be gained by a qualitative description of
optimal transport. These new directions of research attracted various
mathematicians (among the first, Luis Caffarelli, Craig Evans, Wilfrid
Gangbo, Robert McCann, and others), who worked on a better descrip-
tion of the structure of optimal transport and found other applications.
An important conceptual step was accomplished by Felix Otto, who
discovered an appealing formalism introducing a differential point of
view in optimal transport theory. This opened the way to a more geo-
metric description of the space of probability measures, and connected
optimal transport to the theory of diffusion equations, thus leading to
a rich interplay of geometry, functional analysis and partial differential
equations.
Nowadays optimal transport has become a thriving industry, involv-
ing many researchers and many trends. Apart from meteorology, fluid
mechanics and diffusion equations, it has also been applied to such di-
verse topics as the collapse of sandpiles, the matching of images, and the
design of networks or reflector antennas. My book, Topics in Optimal
Transportation, written between 2000 and 2003, was the first attempt
to present a synthetic view of the modern theory. Since then the field
has grown much faster than I expected, and it was never so active as
it is now.
Bibliographical notes
Before the twentieth century, the main references for the problem of
“déblais et remblais” are the memoirs by Monge [636], Dupin [319] and
Appell [42]. Besides achieving important mathematical results, Monge
and Dupin were strongly committed to the development of society and
it is interesting to browse some of their writings about economics and
industry (a list can be found online at gallica.bnf.fr). A lively ac-
count of Monge’s life and political commitments can be found in Bell’s
delightful treatise, Men of Mathematics [80, Chapter 12]. It seems how-
ever that Bell did dramatize the story a bit, at the expense of accuracy
and neutrality. A more cold-blooded biography of Monge was written
by de Launay [277]. Considered as one of the greatest geologists of his
time, not particularly sympathetic to the French Revolution, de Lau-
nay documented himself with remarkable rigor, going back to original
sources whenever possible. Other biographies have been written since
then by Taton [778, 779] and Aubry [50].
Monge originally formulated his transport problem in Euclidean
space for the cost function c(x, y) = |x − y|; he probably had no idea of
the extreme difficulty of a rigorous treatment. It was only in 1979 that
Sudakov [765] claimed a proof of the existence of a Monge transport
for general probability densities with this particular cost function. But
his proof was not completely correct, and was amended much later by
Ambrosio [20]. In the meantime, alternative rigorous proofs had been
devised first by Evans and Gangbo [330] (under rather strong assump-
tions on the data), then by Trudinger and Wang [791], and Caffarelli,
Feldman and McCann [190].
Kantorovich defined linear programming in [499], introduced his
minimization problem and duality theorem in [500], and in [501] applied
his theory to the problem of optimal transport; this note can be consid-
ered as the act of birth of the modern formulation of optimal transport.
Later he made the link with Monge’s problem in [502]. His major work
ject, see in particular [269]; see also the review article [263], the works
by Cullen and Gangbo [266], Cullen and Feldman [265] or the recent
book by Cullen [262].
Further links between optimal transport and other fields of mathe-
matics (or physics) can be found in my book [814], or in the treatise by
Rachev and Rüschendorf [696]. An important source of inspiration was
the relation with the qualitative behavior of certain diffusive equations
arising from gas dynamics; this link was discovered by Jordan, Kinder-
lehrer and Otto at the end of the nineties [493], and then explored by
several authors [208, 209, 210, 211, 212, 213, 214, 216, 669, 671].
Below is a nonexhaustive list of some other unexpected applications.
Relations with the modeling of sandpiles are reviewed by Evans [328],
as well as compression molding problems; see also Feldman [353] (this
is for the cost function c(x, y) = |x − y|). Applications of optimal
transport to image processing and shape recognition are discussed by
Gangbo and McCann [400], Ahmad [6], Angenent, Haker, Tannen-
baum, and Zhu [462, 463], Chazal, Cohen-Steiner and Mérigot [224],
and many other contributors from the engineering community (see
e.g. [700, 713]). X.-J. Wang [834], and independently Glimm and
Oliker [419] (around 2000 and 2002 respectively) discovered that the
theoretical problem of designing reflector antennas could be recast in
terms of optimal transport for the cost function c(x, y) = − log(1−x·y)
on S 2 ; see [402, 419, 660] for further work in the area, and [420] for
another version of this problem involving two reflectors.1 Rubinstein
and Wolansky adapted the strategy in [420] to study the optimal de-
sign of lenses [712]; and Gutiérrez and Huang to treat a refraction
problem [453]. In his PhD Thesis, Bernot [108] made the link be-
tween optimal transport, irrigation and the design of networks. Such
topics were also considered by Santambrogio with various collabora-
tors [152, 207, 731, 732, 733, 734]; in particular it is shown in [732]
that optimal transport theory gives a rigorous basis to some varia-
tional constructions used by physicists and hydrologists to study river
basin morphology [65, 706]. Buttazzo and collaborators [178, 179, 180]
explored city planning via optimal transport. Brenier found a connec-
tion to the electrodynamic equations of Maxwell and related models
in string theory [161, 162, 163, 164, 165, 166]. Frisch and collaborators
¹ According to Oliker, the connection between the two-reflector problem (as for-
mulated in [661]) and optimal transport is in fact much older, since it was first
formulated in a 1993 conference in which he and Caffarelli were participating.
The first part of this course is devoted to the description and charac-
terization of optimal transport under certain regularity assumptions on
the measures and the cost function.
As a start, some general theorems about optimal transport plans are
established in Chapters 4 and 5, in particular the Kantorovich duality
theorem. The emphasis is on c-cyclically monotone maps, both in the
statements and in the proofs. The assumptions on the cost function
and the spaces will be very general.
From the Monge–Kantorovich problem one can derive natural dis-
tance functions on spaces of probability measures, by choosing the cost
function as a power of the distance. The main properties of these dis-
tances are established in Chapter 6.
In Chapter 7 a time-dependent version of the Monge–Kantorovich
problem is investigated, which leads to an interpolation procedure be-
tween probability measures, called displacement interpolation. The nat-
ural assumption is that the cost function derives from a Lagrangian
action, in the sense of classical mechanics; still (almost) no smoothness
is required at that level. In Chapter 8 I shall make further assumptions
of smoothness and convexity, and recover some regularity properties of
the displacement interpolant by a strategy due to Mather.
Then in Chapters 9 and 10 it is shown how to establish the exis-
tence of deterministic optimal couplings, and characterize the associ-
ated transport maps, again under adequate regularity and convexity
assumptions. The Change of variables Formula is considered in Chap-
ter 11. Finally, in Chapter 12 I shall discuss the regularity of the trans-
port map, which in general is not smooth.
The main results of this part are synthesized and summarized in
Chapter 13. A good understanding of this chapter is sufficient to go
through Part II of this course.
4
Basic properties
Existence
The first good thing about optimal couplings is that they exist:
Theorem 4.1 (Existence of an optimal coupling). Let (X , µ)
and (Y, ν) be two Polish probability spaces; let a : X → R ∪ {−∞}
and b : Y → R ∪ {−∞} be two upper semicontinuous functions such
that a ∈ L1 (µ), b ∈ L1 (ν). Let c : X × Y → R ∪ {+∞} be a lower
semicontinuous cost function, such that c(x, y) ≥ a(x) + b(y) for all
x, y. Then there is a coupling of (µ, ν) which minimizes the total cost
E c(X, Y ) among all possible couplings (X, Y ).
Remark 4.2. The lower bound assumption on c guarantees that the
expected cost E c(X, Y ) is well-defined in R ∪ {+∞}. In most cases of
applications — but not all — one may choose a = 0, b = 0.
The proof relies on basic variational arguments involving the topol-
ogy of weak convergence (i.e. imposed by bounded continuous test func-
tions). There are two key properties to check: (a) lower semicontinuity,
(b) compactness. These issues are taken care of respectively in Lem-
mas 4.3 and 4.4 below, which will be used again in the sequel. Before
going on, I recall Prokhorov’s theorem: If X is a Polish space, then
a set P ⊂ P (X ) is precompact for the weak topology if and only if it is
tight, i.e. for any ε > 0 there is a compact set Kε such that µ[X \Kε ] ≤ ε
for all µ ∈ P.
Lemma 4.3 (Lower semicontinuity of the cost functional). Let
X and Y be two Polish spaces, and c : X × Y → R ∪ {+∞} a lower
Then

∫_{X×Y} c dπ ≤ lim inf_{k→∞} ∫_{X×Y} c dπk.

In particular, if c is nonnegative, then F : π ↦ ∫ c dπ is lower semicontinuous on P(X × Y), equipped with the topology of weak convergence. ⊓⊔
The desired result follows since this bound is independent of the cou-
pling, and Kε × Lε is compact in X × Y. ⊓⊔
Thus π is minimizing. ⊓⊔
Remark 4.5. This existence theorem does not imply that the optimal
cost is finite. It might be that all transport plans lead to an infinite total cost, i.e. ∫ c dπ = +∞ for all π ∈ Π(µ, ν). A simple condition to rule out this annoying possibility is

∫ c(x, y) dµ(x) dν(y) < +∞,
which guarantees that at least the independent coupling has finite total
cost. In the sequel, I shall sometimes make the stronger assumption

c(x, y) ≤ cX(x) + cY(y),    (cX, cY) ∈ L1(µ) × L1(ν),

which implies that any coupling has finite total cost, and has other nice
consequences (see e.g. Theorem 5.10).
Restriction property
The second good thing about optimal couplings is that any sub-coupling
is still optimal. In words: If you have an optimal transport plan, then
any induced sub-plan (transferring part of the initial mass to part of
the final mass) has to be optimal too — otherwise you would be able
to lower the cost of the sub-plan, and as a consequence the cost of the
whole plan. This is the content of the next theorem.
π′ := π̃ / π̃[X × Y]
Proof of Theorem 4.6. Assume that π ′ is not optimal; then there exists
π ′′ such that
Then consider

π̂ := (π − π̃) + Z̃ π′′,   (4.3)

where Z̃ = π̃[X × Y] > 0. Clearly, π̂ is a nonnegative measure. On the other hand, it can be written as

π̂ = π + Z̃ (π′′ − π′);

then (4.1) shows that π̂ has the same marginals as π, while (4.2) implies that it has a lower transport cost than π. (Here I use the fact that the total cost is finite.) This contradicts the optimality of π. The conclusion is that π′ is in fact optimal.
Convexity properties
The following estimates are of constant use:
Theorem 4.8 (Convexity of the optimal cost). Let X and Y be
two Polish spaces, let c : X ×Y → R∪{+∞} be a lower semicontinuous
function, and let C be the associated optimal transport cost functional
on P (X ) × P (Y). Let (Θ, λ) be a probability space, and let µθ , νθ be two
measurable functions defined on Θ, with values in P (X ) and P (Y) re-
spectively. Assume that c(x, y) ≥ a(x) + b(y), where a ∈ L1 (dµθ dλ(θ)),
b ∈ L1 (dνθ dλ(θ)). Then
C( ∫Θ µθ λ(dθ), ∫Θ νθ λ(dθ) ) ≤ ∫Θ C(µθ, νθ) λ(dθ).
About the second question: Why don’t we try to apply the same
reasoning as in the proof of Theorem 4.1? The problem is that the set
of deterministic couplings is in general not compact; in fact, this set
is often dense in the larger space of all couplings! So we may expect
that the value of the infimum in the Monge problem coincides with the
value of the minimum in the Kantorovich problem; but there is no a
priori reason to expect the existence of a Monge minimizer.
Fig. 4.1. The optimal plan, represented in the left image, consists in splitting the
mass in the center into two halves and transporting mass horizontally. On the right
the filled regions represent the lines of transport for a deterministic (without splitting
of mass) approximation of the optimum.
Bibliographical notes
Fig. 5.1. An attempt to improve the cost by a cycle; solid arrows indicate the mass
transport in the original plan, dashed arrows the paths along which a bit of mass is
rerouted.
The new plan is (strictly) better than the older one if and only if

c(x1, y2) + c(x2, y3) + · · · + c(xN, y1) < c(x1, y1) + c(x2, y2) + · · · + c(xN, yN).
Thus, if you can find such cycles (x1 , y1 ), . . . , (xN , yN ) in your trans-
ference plan, certainly the latter is not optimal. Conversely, if you do
not find them, then your plan cannot be improved (at least by the pro-
cedure described above) and it is likely to be optimal. This motivates
the following definitions.
Clearly, if we can find a pair (ψ, φ) and a transference plan π for which
there is equality, then (ψ, φ) is optimal in the left-hand side and π is
also optimal in the right-hand side.
A pair of price functions (ψ, φ) will informally be said to be com-
petitive if it satisfies (5.2). For a given y, it is of course in the interest
of the company to set the highest possible competitive price φ(y), i.e.
the highest lower bound for (i.e. the infimum of) ψ(x) + c(x, y), among
all bakeries x. Similarly, for a given x, the price ψ(x) should be the
supremum of all φ(y) − c(x, y). Thus it makes sense to describe a pair
of prices (ψ, φ) as tight if
φ(y) = inf_x [ψ(x) + c(x, y)],    ψ(x) = sup_y [φ(y) − c(x, y)].   (5.5)
In words, prices are tight if it is impossible for the company to raise the
selling price, or lower the buying price, without losing its competitivity.
Consider an arbitrary pair of competitive prices (ψ, φ). We can always improve φ by replacing it by φ1(y) = inf_x [ψ(x) + c(x, y)]. Then we can also improve ψ by replacing it by ψ1(x) = sup_y [φ1(y) − c(x, y)]; then replacing φ1 by φ2(y) = inf_x [ψ1(x) + c(x, y)], and so on. It turns
out that this process is stationary: as an easy exercise, the reader can
check that φ2 = φ1 , ψ2 = ψ1 , which means that after just one iteration
one obtains a pair of tight prices. Thus, when we consider the dual
Kantorovich problem (5.3), it makes sense to restrict our attention to
tight pairs, in the sense of equation (5.5). From that equation we can
reconstruct φ in terms of ψ, so we can just take ψ as the only unknown
in our problem.
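On a finite set of bakeries and cafés the stationarity of this iteration is easy to check directly: starting from an arbitrary price function ψ, a single inf/sup sweep already yields a tight pair. A minimal sketch with a random placeholder cost matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 5                                # m bakeries x_i, n cafes y_j
c = rng.random((m, n))                     # cost c(x_i, y_j)
psi = rng.normal(size=m)                   # arbitrary starting prices psi(x_i)

phi1 = np.min(psi[:, None] + c, axis=0)    # phi_1(y) = inf_x [psi(x) + c(x, y)]
psi1 = np.max(phi1[None, :] - c, axis=1)   # psi_1(x) = sup_y [phi_1(y) - c(x, y)]
phi2 = np.min(psi1[:, None] + c, axis=0)   # phi_2(y) = inf_x [psi_1(x) + c(x, y)]
psi2 = np.max(phi2[None, :] - c, axis=1)

assert np.allclose(phi2, phi1) and np.allclose(psi2, psi1)
print("one inf/sup sweep already yields a tight (stationary) pair of prices")
```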
That unknown cannot be just any function: if you take a general
function ψ, and compute φ by the first formula in (5.5), there is no
chance that the second formula will be satisfied. In fact this second
formula will hold true if and only if ψ is c-convex, in the sense of the
next definition (illustrated by Figure 5.2).
or equivalently
Fig. 5.2. A c-convex function is a function whose graph you can entirely caress
from below with a tool whose shape is the negative of the cost function (this shape
might vary with the point y). In the picture yi ∈ ∂c ψ(xi ).
open set Ω where it is finite, but lower semicontinuity might fail at the
boundary of Ω.) One can think of the cost function c(x, y) = −x · y
as basically the same as c(x, y) = |x − y|2 /2, since the “interaction”
between the positions x and y is the same for both costs.
Particular Case 5.4. If c = d is a distance on some metric space
X , then a c-convex function is just a 1-Lipschitz function, and it
is its own c-transform. Indeed, if ψ is c-convex it is obviously 1-
Lipschitz; conversely, if ψ is 1-Lipschitz, then ψ(x) ≤ ψ(y) + d(x, y), so
ψ(x) = inf y [ψ(y) + d(x, y)] = ψ c (x). As an even more particular case,
if c(x, y) = 1x≠y , then ψ is c-convex if and only if sup ψ − inf ψ ≤ 1,
and then again ψ c = ψ. (More generally, if c satisfies the triangle in-
equality c(x, z) ≤ c(x, y) + c(y, z), then ψ is c-convex if and only if
ψ(y) − ψ(x) ≤ c(x, y) for all x, y; and then ψ = ψ c .)
Remark 5.5. There is no measure theory in Definition 5.2, so no as-
sumption of measurability is made, and the supremum in (5.6) is a true
supremum, not just an essential supremum; the same for the infimum
in (5.7). If c is continuous, then a c-convex function is automatically
lower semicontinuous, and its subdifferential is closed; but if c is not
continuous the measurability of ψ and ∂c ψ is not a priori guaranteed.
Remark 5.6. I excluded the case when ψ ≡ +∞ so as to avoid trivial
situations; what I called a c-convex function might more properly (!)
be called a proper c-convex function. This automatically implies that
ζ in (5.6) does not take the value +∞ at all if c is real-valued. If
c does achieve infinite values, then the correct convention in (5.6) is
(+∞) − (+∞) = −∞.
If ψ is a function on X , then its c-transform is a function on Y.
Conversely, given a function on Y, one may define its c-transform as a
function on X . It will be convenient in the sequel to define the latter
concept by an infimum rather than a supremum. This convention has
the drawback of breaking the symmetry between the roles of X and Y,
but has other advantages that will be apparent later on.
Definition 5.7 (c-concavity). With the same notation as in Defini-
tion 5.2, a function φ : Y → R ∪ {−∞} is said to be c-concave if it
is not identically −∞, and there exists ψ : X → R ∪ {±∞} such that
φ = ψ c . Then its c-transform is the function φc defined by
∀x ∈ X ,   φc(x) = sup_{y∈Y} [φ(y) − c(x, y)];
In spite of its short and elementary proof, the next crucial result is
one of the main justifications of the concept of c-convexity.
then the choice x̃ = x shows that φccc(x) ≤ φc(x); while the choice ỹ = y shows that φccc(x) ≥ φc(x).
If ψ is c-convex, then there is ζ such that ψ = ζc, so ψcc = ζccc = ζc = ψ.
The converse is obvious: If ψcc = ψ, then ψ is c-convex, as the c-transform of ψc. ⊓⊔
Kantorovich duality
We are now ready to state and prove the main result in this chapter.
and in the above suprema one might as well impose that ψ be c-convex
and φ c-concave.
(ii) If c is real-valued and the optimal cost C(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c dπ
is finite, then there is a measurable c-cyclically monotone set Γ ⊂ X ×Y
(closed if a, b, c are continuous) such that for any π ∈ Π(µ, ν) the fol-
lowing five statements are equivalent:
(a) π is optimal;
(b) π is c-cyclically monotone;
(c) There is a c-convex ψ such that, π-almost surely,
ψ c (y) − ψ(x) = c(x, y);
(d) There exist ψ : X → R ∪ {+∞} and φ : Y → R ∪ {−∞},
such that φ(y) − ψ(x) ≤ c(x, y) for all (x, y),
with equality π-almost surely;
(e) π is concentrated on Γ .
(iii) If c is real-valued, C(µ, ν) < +∞, and one has the pointwise upper bound

c(x, y) ≤ cX(x) + cY(y),    (cX, cY) ∈ L1(µ) × L1(ν),
then both the primal and dual Kantorovich problems have solutions, so
min_{π∈Π(µ,ν)} ∫_{X×Y} c(x, y) dπ(x, y)
   = max_{(ψ,φ)∈L1(µ)×L1(ν); φ−ψ≤c} [ ∫_Y φ(y) dν(y) − ∫_X ψ(x) dµ(x) ]
   = max_{ψ∈L1(µ)} [ ∫_Y ψc(y) dν(y) − ∫_X ψ(x) dµ(x) ],
Remark 5.12. Note the difference between statements (b) and (e):
The set Γ appearing in (ii)(e) is the same for all optimal π’s, it only
depends on µ and ν. This set is in general not unique. If c is contin-
uous and Γ is imposed to be closed, then one can define a smallest
Γ , which is the closure of the union of all the supports of the opti-
mal π’s. There is also a largest Γ , which is the intersection of all the
c-subdifferentials ∂c ψ, where ψ is such that there exists an optimal π
supported in ∂c ψ. (Since the cost function is assumed to be continuous,
the c-subdifferentials are closed, and so is their intersection.)
Remark 5.15. If the variables x and y are swapped, then (µ, ν) should
be replaced by (ν, µ) and (ψ, φ) by (−φ, −ψ).
Particular Case 5.16. Particular Case 5.4 leads to the following vari-
ant of Theorem 5.10. When c(x, y) = d(x, y) is a distance on a Polish
space X , and µ, ν belong to P1 (X ), then
inf E d(X, Y) = sup E [ψ(X) − ψ(Y)] = sup [ ∫ ψ dµ − ∫ ψ dν ],   (5.11)
where the infimum on the left is over all couplings (X, Y ) of (µ, ν), and
the supremum on the right is over all 1-Lipschitz functions ψ. This is
the Kantorovich–Rubinstein formula; it holds true as soon as the
infimum in the left-hand side is finite, and it is very useful.
−(|x|2 + |y|2 )/2. So if x → |x|2 ∈ L1 (µ) and y → |y|2 ∈ L1 (ν), then one
can invoke the Particular Case 5.3 to deduce from Theorem 5.10 that
sup E (X · Y) = inf E [ϕ(X) + ϕ∗(Y)] = inf [ ∫ ϕ dµ + ∫ ϕ∗ dν ],   (5.12)
where the supremum on the left is over all couplings (X, Y ) of (µ, ν), the
infimum on the right is over all (lower semicontinuous) convex functions
on Rn , and ϕ∗ stands for the usual Legendre transform of ϕ. In for-
mula (5.12), the signs have been changed with respect to the statement
of Theorem 5.10, so the problem is to maximize the correlation of
the random variables X and Y .
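In dimension 1, and for empirical measures with equally weighted atoms, the coupling maximizing the correlation is simply the monotone one: pair the i-th smallest value of X with the i-th smallest value of Y, by the rearrangement inequality. A small empirical illustration, not a proof:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=1000))                  # sorted sample of X
y = np.sort(rng.exponential(size=1000))             # sorted sample of Y

monotone = np.mean(x * y)                           # monotone coupling: i-th smallest with i-th smallest
random_pairings = [np.mean(x * rng.permutation(y)) for _ in range(200)]
print("monotone coupling    E[XY] ~", monotone)
print("best random pairing  E[XY] ~", max(random_pairings))  # observed to be smaller
```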
Rigorous proof of Theorem 5.10, Part (i). First I claim that it is suffi-
cient to treat the case when c is nonnegative. Indeed, let
c̃(x, y) := c(x, y) − a(x) − b(y) ≥ 0,    Λ := ∫ a dµ + ∫ b dν ∈ R.
c real-valued =⇒ c̃ real-valued;
c lower semicontinuous =⇒ c̃ lower semicontinuous;
ψ̃ ∈ L1(µ) ⇐⇒ ψ ∈ L1(µ);    φ̃ ∈ L1(ν) ⇐⇒ φ ∈ L1(ν);
∀π ∈ Π(µ, ν),    ∫ c̃ dπ = ∫ c dπ − Λ;
∀(ψ, φ) ∈ L1(µ) × L1(ν),    ∫ φ̃ dν − ∫ ψ̃ dµ = ∫ φ dν − ∫ ψ dµ − Λ;
ψ is c-convex ⇐⇒ ψ̃ is c̃-convex;
φ is c-concave ⇐⇒ φ̃ is c̃-concave;
(φ, ψ) are c-conjugate ⇐⇒ (φ̃, ψ̃) are c̃-conjugate;
Γ is c-cyclically monotone ⇐⇒ Γ is c̃-cyclically monotone.
Thanks to these formulas, it is equivalent to establish Theorem 5.10
for the cost c or for the nonnegative cost c̃. So in the sequel, I shall assume, without further comment, that c is nonnegative.
The rest of the proof is divided into five steps.
Step 1: If µ = (1/n) Σ_{i=1}^{n} δxi , ν = (1/n) Σ_{j=1}^{n} δyj , where the costs c(xi , yj ) are finite, then there is at least one cyclically monotone transference plan.
Indeed, in that particular case, a transference plan between µ and
ν can be identified with a bistochastic n × n array of real numbers
aij ∈ [0, 1]: each aij tells what proportion of the 1/n mass carried by
point xi will go to destination yj . So the Monge–Kantorovich problem becomes

inf_{(aij)} Σij aij c(xi , yj ).
Here we are minimizing a linear function on the compact set [0, 1]n×n ,
so obviously there exists a minimizer; the corresponding transference
plan π can be written as
π = (1/n) Σij aij δ(xi ,yj ) ,
and its support S is the set of all couples (xi , yj ) such that aij > 0.
Assume that S is not cyclically monotone: Then there exist N ∈ N
and (xi1 , yj1 ), . . . , (xiN , yjN ) in S such that
c(xi1 , yj2 ) + c(xi2 , yj3 ) + . . . + c(xiN , yj1 ) < c(xi1 , yj1 ) + . . . + c(xiN , yjN ).
(5.15)
Let a := min(ai1 j1 , . . . , aiN jN ) > 0. Define a new transference plan π̃ by the formula

π̃ = π + (a/n) Σ_{ℓ=1}^{N} ( δ(xiℓ , yjℓ+1 ) − δ(xiℓ , yjℓ ) ).
It is easy to check that this has the correct marginals, and by (5.15)
the cost associated with π̃ is strictly less than the cost associated with
π. This is a contradiction, so S is indeed c-cyclically monotone!
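The improvement procedure of Step 1 is easy to try on a concrete discrete plan: scan the support for cost-decreasing cycles (only cycles of length 2 are checked below, which is already enough to detect non-optimality when such a cycle exists; longer cycles are handled the same way). The cost matrix and the plan, here the independent coupling, are placeholders.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 6
c = rng.random((n, n))                       # cost c(x_i, y_j)
pi = np.full((n, n), 1.0 / n**2)             # independent coupling of two uniform marginals

support = [(i, j) for i in range(n) for j in range(n) if pi[i, j] > 0]
bad_cycles = [((i1, j1), (i2, j2))
              for (i1, j1), (i2, j2) in combinations(support, 2)
              if c[i1, j2] + c[i2, j1] < c[i1, j1] + c[i2, j2] - 1e-12]
print("length-2 cycles along which the cost strictly decreases:", len(bad_cycles))
# Moving a small mass a > 0 along such a cycle,
#   pi[i1, j1] -= a; pi[i2, j2] -= a; pi[i1, j2] += a; pi[i2, j1] += a,
# preserves both marginals and strictly lowers the total cost, so pi is not optimal.
```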
Step 2: If c is continuous, then there is a cyclically monotone trans-
ference plan.
To prove this, consider sequences of independent random variables
xi ∈ X , yj ∈ Y, with respective law µ, ν. According to the law of
large numbers for empirical measures (sometimes called fundamental
theorem of statistics, or Varadarajan’s theorem), one has, with proba-
bility 1,
µn := (1/n) Σ_{i=1}^{n} δxi −→ µ,    νn := (1/n) Σ_{j=1}^{n} δyj −→ ν    (5.16)
In the definition of ψ, it does not matter whether one takes the supre-
mum over m − 1 or over m variables, since one also takes the supremum
over m. So the previous inequality can be recast as
In particular, ψ(x) + c(x, y) ≥ ψ(x) + c(x, y). Taking the infimum over
x ∈ X in the left-hand side, we deduce that
¹ A lower semicontinuous function on a Polish space is always measurable, even if
it is obtained as a supremum of uncountably many continuous functions; in fact
it can always be written as a supremum of countably many continuous functions!
dual problem with cost c. Moreover, for each k the functions φk and
ψk are uniformly continuous because c itself is uniformly continuous.
By Lemma 4.4, Π(µ, ν) is weakly sequentially compact. Thus, up to
extraction of a subsequence, we can assume that πk converges to some π̃ ∈ Π(µ, ν). For all indices ℓ ≤ k, we have cℓ ≤ ck , so
∫ cℓ dπ̃ = lim_{k→∞} ∫ cℓ dπk
   ≤ lim sup_{k→∞} ∫ ck dπk
   = lim sup_{k→∞} [ ∫ φk(y) dν(y) − ∫ ψk(x) dµ(x) ].
So

inf_{Π(µ,ν)} ∫ c dπ ≤ ∫ c dπ̃ ≤ lim sup_{k→∞} [ ∫ φk(y) dν(y) − ∫ ψk(x) dµ(x) ] ≤ inf_{Π(µ,ν)} ∫ c dπ.
Moreover,
∫ φk (y) dν(y) − ∫ ψk (x) dµ(x) −−−→_{k→∞} inf_{Π(µ,ν)} ∫ c dπ.    (5.20)
Since each pair (ψk , φk ) lies in Cb (X ) × Cb (Y), the duality also holds
with bounded continuous (and even Lipschitz) test functions, as claimed
in Theorem 5.10(i). ⊓⊔
Proof of Theorem 5.10, Part (ii). From now on, I shall assume that the
optimal transport cost C(µ, ν) is finite, and that c is real-valued. As
in the proof of Part (i) I shall assume that c is nonnegative, since the
general case can always be reduced to that particular case. Part (ii)
will be established in the following way: (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒
(a) ⇒ (e) ⇒ (b). There seems to be some redundancy in this chain of
implications, but this is because the implication (a) ⇒ (c) will be used
to construct the set Γ appearing in (e).
where
Fm,k (x0 , y0 , . . . , xm , ym ) := [ c(x0 , y0 ) − c(x1 , y0 ) ] + [ c(x1 , y1 ) − c(x2 , y1 ) ] + · · · + [ c(xm , ym ) − c(x, ym ) ].
then we have

lim_{ℓ→∞} ψm,k,ℓ (x) = sup { c(x0 , y0 ) − c(x1 , y0 ) + c(x1 , y1 ) − c(x2 , y1 )
        + · · · + c(xm , ym ) − c(x, ym ) ;  (x1 , y1 ), . . . , (xm , ym ) ∈ Γk }.
and

∫_{ξ<0} ξn dπ −−−→_{n→∞} ∫_{ξ<0} ξ dπ = 0,
so π is optimal.
Before completing the chain of equivalences, we should first con-
struct the set Γ . By Theorem 4.1, there is at least one optimal transference plan, say π̃. From the implication (a) ⇒ (c), there is some ψ̃ such that π̃ is concentrated on ∂c ψ̃; just choose Γ := ∂c ψ̃.
(a) ⇒ (e): Let π̃ be the optimal plan used to construct Γ , and let
ψ = ψ̃ be the associated c-convex function; let φ = ψ^c . Then let π
be another optimal plan. Since π and π̃ have the same cost and same
marginals,

∫ c dπ = ∫ c dπ̃ = lim_{n→∞} ∫ (Tn φ − Tn ψ) dπ̃ = lim_{n→∞} ∫ (Tn φ − Tn ψ) dπ,

where Tn is the truncation operator that was already used in the proof
of (d) ⇒ (a). So

∫ [ c(x, y) − Tn φ(y) + Tn ψ(x) ] dπ(x, y) −−−→_{n→∞} 0.    (5.25)
Proof of Theorem 5.10, Part (iii). As in the proof of Part (i) we may
assume that c ≥ 0. Let π be optimal, and let ψ be a c-convex func-
tion such that π is concentrated on ∂c ψ. Let φ := ψ c . The goal is to
show that under the assumption c ≤ cX + cY , (ψ, φ) solves the dual
Kantorovich problem.
The point is to prove that ψ and φ are integrable. For this we repeat
the estimates of Step 4 in the proof of Part (i), with some variants: After
securing (x0 , y0 ) such that φ(y0 ), ψ(x0 ), cX (x0 ) and cY (y0 ) are finite,
we write
ψ(x) + cX (x) = sup_y [ φ(y) − c(x, y) + cX (x) ] ≥ sup_y [ φ(y) − cY (y) ] ≥ φ(y0 ) − cY (y0 );

φ(y) − cY (y) = inf_x [ ψ(x) + c(x, y) − cY (y) ] ≤ inf_x [ ψ(x) + cX (x) ] ≤ ψ(x0 ) + cX (x0 );
and it results from Part (i) of the theorem that both π and (ψ, φ) are
optimal, respectively in the original and the dual Kantorovich prob-
lems.
To prove the last part of (iii), assume that c is continuous; then the
subdifferential of any c-convex function is a closed c-cyclically mono-
tone set.
Let π be an arbitrary optimal transference plan, and (ψ, φ) an op-
timal pair of prices. We know that (ψ, ψ c ) is optimal in the dual Kan-
torovich problem, so
∫ c(x, y) dπ(x, y) = ∫ ψ^c dν − ∫ ψ dµ.
Restriction property
The dual side of the Kantorovich problem also behaves well with respect
to restriction, as shown by the following results.
Application: Stability
An important consequence of Theorem 5.10 is the stability of optimal
transference plans. For simplicity I shall prove it under the assumption
that c is bounded below.
Theorem 5.20 (Stability of optimal transport). Let X and Y be
Polish spaces, and let c ∈ C(X × Y) be a real-valued continuous cost
function, inf c > −∞. Let (ck )k∈N be a sequence of continuous cost
functions converging uniformly to c on X × Y. Let (µk )k∈N and (νk )k∈N
be sequences of probability measures on X and Y respectively. Assume
that µk converges to µ (resp. νk converges to ν) weakly. For each k, let
πk be an optimal transference plan between µk and νk . If
∀k ∈ N,    ∫ ck dπk < +∞,
Since this is a closed set, the same is true for π ⊗N , and then by letting
ε → 0 we see that π ⊗N is concentrated on the set C(N ) defined by
Σ_{1≤i≤N} c(xi , yi ) ≤ Σ_{1≤i≤N} c(xi , yi+1 ).
Theorem 5.20 also admits the following corollary about the stability
of transport maps.
stand for the value of the optimal transport cost of transport between
µ and ν.
If ν is a given reference measure, inequalities of the form
∀µ ∈ P (X ), C(µ, ν) ≤ F (µ)
Remark 5.28. One may simplify (ii’) by taking the supremum over t;
since Λ is nonincreasing, the result is
Λ( Φ( ∫_Y φ dν − φ^c ) ) ≤ 0.    (5.29)
∀µ ∈ P (X ), C(µ, ν) ≤ Hν (µ)
and R Z
φ dν c
∀φ ∈ Cb (X ), e ≤ eφ dν
are equivalent.
Proof of Theorem 5.26. First assume that (i) is satisfied. Then for all
ψ ≥ φ^c ,

Λ( ∫_Y φ dν − ψ ) = sup_{µ∈P (X )} [ ∫_X ( ∫_Y φ dν − ψ ) dµ − F (µ) ]
                  = sup_{µ∈P (X )} [ ∫_Y φ dν − ∫_X ψ dµ − F (µ) ]
                  ≤ sup_{µ∈P (X )} [ C(µ, ν) − F (µ) ] ≤ 0,
where the easiest part of Theorem 5.10 (that is, inequality (5.4)) was
used to go from the next-to-last line to the last one. Then (ii) follows
upon taking the supremum over ψ.
Conversely, assume that (ii) is satisfied. Then, for any pair (ψ, φ) ∈
Cb (X ) × Cb (Y) one has, by (5.28),
∫_Y φ dν − ∫_X ψ dµ = ∫_X ( ∫_Y φ dν − ψ ) dµ ≤ Λ( ∫_Y φ dν − ψ ) + F (µ).
Then (i) follows upon taking the supremum over φ ∈ Cb (Y) and apply-
ing Theorem 5.10 (i).
Now let us consider the equivalence between (i’) and (ii’). By as-
sumption, Φ(r) ≤ 0 for r ≤ 0, so Φ∗ (t) = supr [r t − Φ(r)] = +∞ if
t < 0. Then the Legendre inversion formula says that
∀r ∈ R,    Φ(r) = sup_{t∈R+} [ r t − Φ∗ (t) ].
(The important thing is that the supremum is over R+ and not R.)
If (i’) is satisfied, then for all φ ∈ Cb (X ), for all ψ ≥ φc and for all
t ∈ R+ ,
Λ( t ∫_Y φ dν − t ψ − Φ∗ (t) )
   = sup_{µ∈P (X )} [ ∫_X ( t ∫_Y φ dν − t ψ − Φ∗ (t) ) dµ − F (µ) ]
   = sup_{µ∈P (X )} [ t ∫_Y φ dν − t ∫_X ψ dµ − Φ∗ (t) − F (µ) ]
   ≤ sup_{µ∈P (X )} [ t C(µ, ν) − Φ∗ (t) − F (µ) ]
   ≤ sup_{µ∈P (X )} [ Φ(C(µ, ν)) − F (µ) ] ≤ 0,
Bibliographical notes
There are many ways to state the Kantorovich duality, and even more
ways to prove it. There are also several economic interpretations, that
belong to folklore. The one which I formulated in this chapter is a
variant of one that I learnt from Caffarelli. Related economic inter-
pretations underlie some algorithms, such as the fascinating “auction
has been written; see [696, Chapter 6] for results and references. This
topic is closely related to the subject of “Kantorovich norms”: see [464],
[695, Chapters 5 and 6], [450, Chapter 4], [149] and the many references
therein.
Around the mid-eighties, it was understood that the study of
the dual problem, and in particular the existence of a maximizer,
could lead to precious qualitative information about the solution of
the Monge–Kantorovich problem. This point of view was emphasized
by many authors such as Knott and Smith [524], Cuesta-Albertos,
Matrán and Tuero-Dı́az [254, 255, 259], Brenier [154, 156], Rachev and
Rüschendorf [696, 722], Abdellaoui and Heinich [1, 2], Gangbo [395],
Gangbo and McCann [398, 399], McCann [616] and probably others.
Then Ambrosio and Pratelli proved the existence of a maximizing pair
under the conditions (5.10); see [32, Theorem 3.2]. Under adequate as-
sumptions, one can also prove the existence of a maximizer for the dual
problem by direct arguments which do not use the original problem (see
for instance [814, Chapter 2]).
The classical theory of convexity and its links with the property
of cyclical monotonicity are exposed in the well-known treatise by
Rockafellar [705]. The more general notions of c-convexity and c-
cyclical monotonicity were studied by several researchers, in particular
Rüschendorf [722]. Some authors prefer to use c-concavity; I personally
advocate working with c-convexity, because signs get better in many
situations. However, the conventions used in the present set of notes
have the disadvantage that the cost function c( · , y) is not c-convex.
A possibility to remedy this would be to call (−c)-convexity the no-
tion which I defined. This convention (suggested to me by Trudinger)
is appealing, but would have forced me to write (−c)-convex hundreds
of times throughout this book.
The notation ∂c ψ(x) and the terminology of c-subdifferential is de-
rived from the usual notation ∂ψ(x) in convex analysis. Let me stress
that in my notation ∂c ψ(x) is a set of points, not a set of tangent vectors
or differential forms. Some authors prefer to call ∂c ψ(x) the contact
set of ψ at x (any y in the contact set is a contact point) and to use
the notation ∂c ψ(x) for a set of tangent vectors (which under suitable
assumptions can be identified with the contact set, and which I shall
denote by −∇x c(x, ∂c ψc (x)), or ∇− c ψ(x), in Chapters 10 and 12).
In [712] the authors argue that c-convex functions should be con-
structible in practice when the cost function c is convex (in the usual
Assume, as before, that you are in charge of the transport of goods be-
tween producers and consumers, whose respective spatial distributions
are modeled by probability measures. The farther producers and con-
sumers are from each other, the more difficult will be your job, and you
would like to summarize the degree of difficulty with just one quantity.
For that purpose it is natural to consider, as in (5.27), the optimal
transport cost between the two measures:
C(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c(x, y) dπ(x, y),    (6.1)
where c(x, y) is the cost for transporting one unit of mass from x to
y. Here we do not care about the shape of the optimizer but only the
value of this optimal cost.
One can think of (6.1) as a kind of distance between µ and ν, but in
general it does not, strictly speaking, satisfy the axioms of a distance
function. However, when the cost is defined in terms of a distance, it
is easy to cook up a distance from (6.1):
Example 6.3. Wp (δx , δy ) = d(x, y). In this example, the distance does
not depend on p; but this is not the rule.
Wp (µ1 , µ3 ) ≤ ( E d(X1′ , X3′ )^p )^{1/p} ≤ ( E [ d(X1′ , X2′ ) + d(X2′ , X3′ ) ]^p )^{1/p}
             ≤ ( E d(X1′ , X2′ )^p )^{1/p} + ( E d(X2′ , X3′ )^p )^{1/p}
             = Wp (µ1 , µ2 ) + Wp (µ2 , µ3 ).
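As a quick numerical aside (not from the text): for uniform empirical measures on the real line, the optimal coupling is the monotone rearrangement, so Wp can be computed simply by sorting the samples; the snippet below checks the triangle inequality on arbitrary simulated data.

    # W_p between uniform empirical measures on R, via the monotone (sorted)
    # rearrangement, which is the optimal coupling in dimension one.
    import numpy as np

    def wasserstein_1d(x, y, p=2):
        # x, y: samples of equal size, each point carrying mass 1/len(x)
        return (np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)) ** (1.0 / p)

    rng = np.random.default_rng(1)
    m1, m2, m3 = rng.normal(0, 1, 500), rng.normal(2, 1, 500), rng.exponential(1.0, 500)
    w12, w23, w13 = wasserstein_1d(m1, m2), wasserstein_1d(m2, m3), wasserstein_1d(m1, m3)
    assert w13 <= w12 + w23 + 1e-12     # triangle inequality for W_p
    print(w12, w23, w13)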
Remark 6.5. Theorem 5.10(i) and Particular Case 5.4 together lead
to the useful duality formula for the Kantorovich–Rubinstein
distance: For any µ, ν in P1 (X ),
W1 (µ, ν) = sup_{‖ψ‖Lip ≤1} { ∫_X ψ dµ − ∫_X ψ dν }.    (6.3)
Among many applications of this formula I shall just mention the fol-
lowing covariance inequality: if f is a probability density with respect
to µ then
∫ f dµ ∫ g dµ − ∫ (f g) dµ ≤ ‖g‖Lip W1 (f µ, µ).
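A one-line justification of this inequality (filling in the step): since ∫ f dµ = 1,

∫ f dµ ∫ g dµ − ∫ (f g) dµ = ∫ g dµ − ∫ g d(f µ) ≤ ‖g‖Lip W1 (f µ, µ),

where the last inequality is (6.3) applied to ψ = g/‖g‖Lip and the pair of measures (µ, f µ).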
p ≤ q =⇒ Wp ≤ Wq . (6.4)
(ii) µk −→ µ and lim sup_{k→∞} ∫ d(x0 , x)^p dµk (x) ≤ ∫ d(x0 , x)^p dµ(x);

(iii) µk −→ µ and lim_{R→∞} lim sup_{k→∞} ∫_{d(x0 ,x)≥R} d(x0 , x)^p dµk (x) = 0;

(iv) For all continuous functions ϕ with |ϕ(x)| ≤ C (1 + d(x0 , x)^p ),
C ∈ R, one has

∫ ϕ(x) dµk (x) −→ ∫ ϕ(x) dµ(x).
To prove Theorem 6.9 I shall use the following lemma, which has
interest on its own and will be useful again later.
The proof is not so obvious and one might skip it at first reading.
remains bounded as k → ∞.
Since Wp ≥ W1 , the sequence (µk ) is also Cauchy in the W1 sense.
Let ε > 0 be given, and let N ∈ N be such that
µk [Uε ] ≥ 1 − ε − ε²/ε = 1 − 2ε.
At this point we have shown the following: For each ε > 0 there
is a finite family (xi )1≤i≤m such that all measures µk give mass at
least 1 − 2ε to the set Z := ∪B(xi , 2ε). The point is that Z might not
be compact. There is a classical remedy: Repeat the reasoning with ε
replaced by 2−(ℓ+1) ε, ℓ ∈ N; so there will be (xi )1≤i≤m(ℓ) such that
µk [ X \ ∪_{1≤i≤m(ℓ)} B(xi , 2^{−ℓ} ε) ] ≤ 2^{−ℓ} ε.

Thus
µk [X \ S] ≤ ε,
where
S := ∩_{1≤p≤∞} ∪_{1≤i≤m(p)} B(xi , ε 2^{−p} ).
Wp (µ̃, µ) ≤ lim inf_{k′→∞} Wp (µk′ , µ) = 0.
(a + b)^p ≤ (1 + ε) a^p + Cε b^p .
But of course,
∫ d(x, y)^p dπk (x, y) = Wp (µk , µ)^p −−−→_{k→∞} 0;
therefore,

lim sup_{k→∞} ∫ d(x0 , x)^p dµk (x) ≤ (1 + ε) ∫ d(x0 , x)^p dµ(x).
1_{d(x,y)≥R} ≤ 1_{[d(x,x0 )≥R/2 and d(x,x0 )≥d(x,y)/2]} + 1_{[d(x0 ,y)≥R/2 and d(x0 ,y)≥d(x,y)/2]} .

So, obviously,

( d(x, y)^p − R^p )₊ ≤ d(x, y)^p 1_{[d(x,x0 )≥R/2 and d(x,x0 )≥d(x,y)/2]}
                       + d(x, y)^p 1_{[d(x0 ,y)≥R/2 and d(x0 ,y)≥d(x,y)/2]}
                     ≤ 2^p d(x, x0 )^p 1_{d(x,x0 )≥R/2} + 2^p d(x0 , y)^p 1_{d(x0 ,y)≥R/2} .
It follows that

Wp (µk , µ)^p = ∫ d(x, y)^p dπk (x, y)
             = ∫ ( d(x, y) ∧ R )^p dπk (x, y) + ∫ ( d(x, y)^p − R^p )₊ dπk (x, y)
             ≤ ∫ ( d(x, y) ∧ R )^p dπk (x, y) + 2^p ∫_{d(x,x0 )≥R/2} d(x, x0 )^p dπk (x, y)
                                              + 2^p ∫_{d(x0 ,y)≥R/2} d(x0 , y)^p dπk (x, y)
             = ∫ ( d(x, y) ∧ R )^p dπk (x, y) + 2^p ∫_{d(x,x0 )≥R/2} d(x, x0 )^p dµk (x)
                                              + 2^p ∫_{d(x0 ,y)≥R/2} d(x0 , y)^p dµ(y).
where the infimum is over all couplings (X, Y ) of (µ, ν); this identity
can be seen as a very particular case of Kantorovich duality for the cost
function 1x6=y .
It seems natural that a control in Wasserstein distance should be
weaker than a control in total variation. This is not completely true,
because total variation does not take into account large distances. But
one can control Wp by weighted total variation:
Remark 6.17. The integral in the right-hand side of (6.12) can be in-
terpreted as the Wasserstein distance W1 for the particular cost func-
tion [d(x0 , x) + d(x0 , y)]1x6=y .
= (1/a) ∫ d(x, y)^p d(µ − ν)₊ (x) d(µ − ν)₋ (y)
≤ (2^{p−1}/a) ∫ [ d(x, x0 )^p + d(x0 , y)^p ] d(µ − ν)₊ (x) d(µ − ν)₋ (y)
≤ 2^{p−1} [ ∫ d(x, x0 )^p d(µ − ν)₊ (x) + ∫ d(x0 , y)^p d(µ − ν)₋ (y) ]
= 2^{p−1} ∫ d(x, x0 )^p d[ (µ − ν)₊ + (µ − ν)₋ ](x)
= 2^{p−1} ∫ d(x, x0 )^p d|µ − ν|(x).    ⊓⊔
Proof of Theorem 6.18. The fact that Pp (X ) is a metric space was al-
ready explained, so let us turn to the proof of separability. Let D be
a dense sequence in X , and let P be the space of probability measures
that can be written Σ aj δxj , where the aj are rational coefficients, and
the xj are finitely many elements in D. It will turn out that P is dense
in Pp (X ).
To prove this, let ε > 0 be given, and let x0 be an arbitrary element
of D. If µ lies in Pp (X ), then there exists a compact set K ⊂ X such
that

∫_{X \K} d(x0 , x)^p dµ(x) ≤ ε^p .
Since (Id , f ) is a coupling of µ and f# µ, we have Wp (µ, f# µ)^p ≤ 2 ε^p .
Of course, f# µ can be written as Σ aj δxj , 0 ≤ j ≤ N .
that µ might be approximated, with arbitrary precision, by a finite
combination of Dirac masses. To conclude, it is sufficient to show that
and obviously the latter quantity can be made as small as possible for
some well-chosen rational coefficients bj .
Finally, let us prove the completeness. Again let (µk )k∈N be a
Cauchy sequence in Pp (X ). By Lemma 6.14, it admits a subsequence
(µk′ ) which converges weakly (in the usual sense) to some measure µ.
Then,
∫ d(x0 , x)^p dµ(x) ≤ lim inf_{k′→∞} ∫ d(x0 , x)^p dµk′ (x) < +∞,
so in particular
which means that µℓ′ converges to µ in the Wp sense (and not just in
the sense of weak convergence). Since (µk ) is a Cauchy sequence with
a converging subsequence, it follows by a classical argument that the
whole sequence is converging. ⊓⊔
Bibliographical notes
and maybe others); (b) the explicit definition of this distance is not so
easy to find in Wasserstein’s work; and (c) Wasserstein was only inter-
ested in the case p = 1. By the way, also the spelling of Wasserstein is
doubtful: the original spelling was Vasershtein. (Similarly, Rubinstein
was spelled Rubinshtein.) These issues are discussed in a historical
note by Rüschendorf [720], who advocates the denomination of “min-
imal Lp -metric” instead of “Wasserstein distance”. Also Vershik [808]
tells about the discovery of the metric by Kantorovich and stands up
in favor of the terminology “Kantorovich distance”.
However, the terminology “Wasserstein distance” (or “Wasserstein
metric”) has been extremely successful: at the time of writing, about
30,000 occurrences can be found on the Internet. Nearly all recent pa-
pers relating optimal transport to partial differential equations, func-
tional inequalities or Riemannian geometry (including my own works)
use this convention. I will therefore stick to this by-now well-established
terminology. After all, even if this convention is a bit unfair since it does
not give credit to all contributors, not even to the most important of
them (Kantorovich), at least it does give credit to somebody.
As I learnt from Bernot, terminological confusion was enhanced in
the mid-nineties, when a group of researchers in image processing in-
troduced the denomination of “Earth Mover’s distance” [713] for the
Wasserstein (Kantorovich–Rubinstein) W1 distance. This terminology
was very successful and rapidly spread by the high rate of growth of the
engineering literature; it is already starting to compete with “Wasser-
stein distance”, scoring more than 15,000 occurrences on Internet.
Gini considered the special case where the random variables are dis-
crete and lie on the real line; like Mallows later, he was interested by
applications in statistics (the “Gini distance” is often used to roughly
quantify the inequalities of wealth or income distribution in a given
population). Tanaka discovered applications to partial differential equa-
tions. Both Mallows and Tanaka worked with the case p = 2, while Gini
was interested both in p = 1 and p = 2, and Hoeffding and Fréchet
worked with general p (see for instance [381]). A useful source on the
point of view of Kantorovich and Rubinstein is Vershik’s review [809].
Kantorovich and Rubinstein [506] made the important discovery
that the original Kantorovich distance (W1 in my notation) can be
extended into a norm on the set M (X ) of signed measures over a Pol-
ish space X . It is common to call this extension the Kantorovich–
Rubinstein norm, and by abuse of language I also used the denomina-
due to Rachev [695, Section 6.3], and Ambrosio, Gigli and Savaré [30].
In the latter reference the proof is very simple but makes use of the
deep Kolmogorov extension theorem. Here I followed a much more el-
ementary argument due to Bolley [136].
The statement in Remark 6.19 is proven in [30, Remark 7.1.9].
In a Euclidean or Riemannian context, the Wasserstein distance
W2 between two very close measures, say (1 + h1 ) ν and (1 + h2 ) ν with
h1 , h2 very small, is approximately equal to the H −1 (ν)-norm of h1 −h2 ;
see [671, Section 7], [814, Section 7.6] or Exercise 22.20. (One may also
take a look at [567, 569].) There is in fact a natural one-parameter
family of distances interpolating between H −1 (ν) and W2 , defined by
a variation on the Benamou–Brenier formula (7.34) (insert a factor
(dµt /dν)1−α , 0 ≤ α ≤ 1 in the integrand of (7.33); this construction is
due to Dolbeault, Nazaret and Savaré [312]).
Applications of the Wasserstein distances are too numerous to
be listed here; some of them will be encountered again in the se-
quel. In [150] Wasserstein distances are used to study the best ap-
proximation of a measure by a finite number of points. Various au-
thors [700, 713] use them to compare color distributions in different
images. These distances are classically used in statistics, limit the-
orems, and all kinds of problems involving approximation of prob-
ability measures [254, 256, 257, 282, 694, 696, 716]. Rio [704] de-
rives sharp quantitative bounds in Wasserstein distance for the cen-
tral limit theorem on the real line, and surveys the previous lit-
erature on this problem. Wasserstein distances are well adapted to
study rates of fluctuations of empirical measures, see [695, Theo-
rem 11.1.6], [696, Theorem 10.2.1], [498, Section 4.9], and the research
papers [8, 307, 314, 315, 479, 771, 845]. (The most precise results are
those in [307]: there it is shown that the average W1 distance between
two independent copies of the empirical measure behaves like
(∫ ρ^{1−1/d} )/N^{1/d} , where N is the size of the samples, ρ the density
of the common law of the random variables, and d ≥ 3; the proofs are
partly based on subadditivity, as in [150].) Quantitative Sanov-type
theorems have been considered in [139, 742]. Wasserstein distances are
also commonly used in statistical mechanics, most notably in the the-
ory of propagation of chaos, or more generally the mean behavior of
large particle systems [768] [757, Chapter 5]; the original idea seems
to go back to Dobrushin [308, 309] and has been applied in a large
number of problems, see for instance [81, 82, 221, 590, 624]. The origi-
Displacement interpolation
As in the previous chapter I shall assume that the initial and final
probability measures are defined on the same Polish space (X , d). The
main additional structure assumption is that the cost is associated with
an action, which is a way to measure the cost of displacement along
a continuous curve, defined on a given time-interval, say [0, 1]. So the
cost function between an initial point x and a final point y is obtained
by minimizing the action among paths that go from x to y:
c(x, y) = inf { A(γ);  γ0 = x,  γ1 = y;  γ ∈ C }.    (7.1)
d²x/dt² = −∇V (x).    (7.4)
To make sure that A(γ) is well-defined, it is natural to assume that
the path γ is continuously differentiable, or piecewise continuously dif-
ferentiable, or at least almost everywhere differentiable as a function
of t. A classical and general setting is that of absolutely continuous
curves: By definition, if (X , d) is a metric space, a continuous curve
γ : [0, 1] → X is said to be absolutely continuous if there exists a func-
tion ℓ ∈ L1 ([0, 1]; dt) such that for all intermediate times t0 < t1 in
[0, 1],

d(γt0 , γt1 ) ≤ ∫_{t0}^{t1} ℓ(t) dt.    (7.5)
More generally, it is said to be absolutely continuous of order p if for-
mula (7.5) holds with some ℓ ∈ Lp ([0, 1]; dt).
If γ is absolutely continuous, then the function t 7−→ d(γt0 , γt ) is
differentiable almost everywhere, and its derivative is integrable. But
the converse is false: for instance, if γ is the “Devil’s staircase”, en-
countered in measure theory textbooks (a nonconstant function whose
distributional derivative is concentrated on the Cantor set in [0, 1]),
then γ is differentiable almost everywhere, and γ̇(t) = 0 for almost
every t, even though γ is not constant! This motivates the “integral”
definition of absolute continuity based on formula (7.5).
If X is Rn , or a smooth differentiable manifold, then absolutely
continuous paths are differentiable for Lebesgue-almost all t ∈ [0, 1]; in
physical words, the velocity is well-defined for almost all times.
Before going further, here are some simple and important examples.
For all of them, the class of admissible curves is the space of absolutely
continuous curves.
Remark 7.3. This example shows that very different Lagrangians can
have the same minimizing curves.
which is called the Hamiltonian; then one can recast (7.6) in terms
of a Hamiltonian system, and access to the rich mathematical world
This looks general enough, however there are interesting cases where
X does not have enough differentiable structure for the velocity vector
to be well-defined (tangent spaces might not exist, for lack of smooth-
ness). In such a case, it is still possible to define the speed along the
curve:
|γ̇t | := lim sup_{ε→0} d(γt , γt+ε )/|ε| .    (7.8)
This generalizes the natural notion of speed, which is the norm of the
velocity vector. Thus it makes perfect sense to write a Lagrangian of the
Then minimizing curves are called geodesics. They may have vari-
able speed, but, just as on a Riemannian manifold, one can always
reparametrize them (that is, replace γ by γ e where γ et = γs(t) , with s
continuous increasing) in such a way that they have constant speed. In
that case d(γs , γt ) = |t − s| L(γ) for all s, t ∈ [0, 1].
c^{s,t}(x, y) = d(x, y)^p / (t − s)^{p−1} .
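This expression can be checked directly when the action is A^{s,t}(γ) = ∫_s^t |γ̇τ |^p dτ with p ≥ 1 (a short verification, under the assumption that constant-speed minimizing geodesics exist): for any curve γ joining x to y on [s, t], Jensen's inequality gives

∫_s^t |γ̇τ |^p dτ ≥ (t − s)^{1−p} ( ∫_s^t |γ̇τ | dτ )^p ≥ d(x, y)^p / (t − s)^{p−1} ,

with equality when γ is a minimizing geodesic parametrized with constant speed; taking the infimum over curves joining x to y yields the formula above.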
The functional A = Ati ,tf will just be called the action, and the
cost function c = cti ,tf the cost associated with the action. A curve
γ : [ti , tf ] → X is said to be action-minimizing if it minimizes A among
all curves having the same endpoints.
(ii) A length space is a space in which As,t (γ) = L(γ) (here L is the
length) defines a Lagrangian action.
If [t′i , t′f ] ⊂ [ti , tf ] then it is clear that A^{ti ,tf} induces an action
A^{t′i ,t′f} on the time-interval [t′i , t′f ], just by restriction.
In the rest of this section I shall take (ti , tf ) = (0, 1), just for simplic-
ity; of course one can always reduce to this case by reparametrization.
It will now be useful to introduce further assumptions about exis-
tence and compactness of minimizing curves.
(ii) If s < t are any two intermediate times, and Ks , Kt are any
two nonempty compact sets such that cs,t (x, y) < +∞ for all x ∈ Ks ,
y ∈ Kt , then the set Γ^{s,t}_{Ks →Kt} is compact and nonempty.
In particular, minimizing curves between any two fixed points x, y
with c(x, y) < +∞ should always exist and form a compact set.
Remark 7.14. If each As,t has compact sub-level sets (more explicitly,
if {γ; As,t(γ) ≤ A} is compact in C([s, t]; X ) for any A ∈ R), then the
lower semicontinuity of As,t, together with a standard compactness
argument (just as in Theorem 4.1) imply the existence of at least one
ct1 ,t3 (γt1 , γt3 ) = ct1 ,t2 (γt1 , γt2 ) + ct2 ,t3 (γt2 , γt3 ). (7.15)
(v) If the cost functions cs,t are continuous, then the set Γ of all
action-minimizing curves is closed in the topology of uniform conver-
gence;
(vi) For all times s < t, there exists a Borel map Ss→t : X × X →
C([s, t]; X ), such that for all x, y ∈ X , Ss→t (x, y) belongs to Γ^{s,t}_{x→y} . In
words, there is a measurable recipe to join any two endpoints x and y
by a minimizing curve γ : [s, t] → X .
cs,t (x, y) ≤ As,t (γ) ≤ lim inf As,t(γk ) = lim inf cs,t (xk , yk ).
ct1 ,t3 (x1 , x3 ) ≤ At1 ,t3 (γ) = At1 ,t2 (γ1→2 ) + At2 ,t3 (γ2→3 )
= ct1 ,t2 (x1 , x2 ) + ct2 ,t3 (x2 , x3 ).
By point (iii) in Definition 7.11, it follows that c0,1 (γ0 , γ1 ) = A0,1 (γ),
which proves (iv).
If 0 ≤ t1 < t2 < t3 ≤ 1, now let Γ (t1 , t2 , t3 ) stand for the set
of all curves satisfying (7.15). If all functions cs,t are continuous, then
Γ (t1 , t2 , t3 ) is closed for the topology of uniform convergence. Then Γ is
the intersection of all Γ (t1 , t2 , t3 ), so it is closed also; this proves state-
ment (v). (Now there is a similarity with the proof of Theorem 5.20.)
For given times s < t, let Γ s,t be the set of all action-minimizing
curves defined on [s, t], and let Es,t be the “endpoints” mapping, defined
on Γ s,t by γ 7−→ (γs , γt ). By assumption, any two points are joined by
at least one minimizing curve, so Es,t is onto X × X . It is clear that
Es,t is a continuous map between Polish spaces, and by assumption
E⁻¹_{s,t}(x, y) is compact for all x, y. It follows by general theorems of
measurable selection (see the bibliographical notes in case of need)
that Es,t admits a measurable right-inverse Ss→t , i.e. Es,t ◦ Ss→t = Id .
This proves statement (vi). ⊓⊔
π0,1 := (e0 , e1 )# Π
The next theorem is the main result of this chapter. It shows that
the law at time t of a dynamical optimal coupling can be seen as a
minimizing path in the space of probability measures. In the important
case when the cost is a power of a geodesic distance, the corollary stated
right after the theorem shows that displacement interpolation can be
(iii) The path (µt )0≤t≤1 is a minimizing curve for the coercive action
functional defined on P (X ) by
A^{s,t}(µ) = sup_{N ∈N} sup_{s=t0 <t1 <...<tN =t} Σ_{i=0}^{N −1} C^{ti ,ti+1}(µti , µti+1 )    (7.16)
           = inf_γ E A^{s,t}(γ),    (7.17)
where the last infimum is over all random curves γ : [s, t] → X such
that law (γτ ) = µτ (s ≤ τ ≤ t).
In that case (µt )0≤t≤1 is said to be a displacement interpolation be-
tween µ0 and µ1 . There always exists at least one such curve.
Finally, if K0 and K1 are two compact subsets of P (X ), such that
C 0,1 (µ0 , µ1 ) < +∞ for all µ0 ∈ K0 , µ1 ∈ K1 , then the set of dynamical
optimal transference plans Π with (e0 )# Π ∈ K0 and (e1 )# Π ∈ K1 is
compact.
• Is there a more explicit formula for the action on the space of probability measures, say for a simple enough action on X ? Can it be written as ∫₀¹ L(µt , µ̇t , t) dt? (Of course, in Corollary 7.22 this is the case with L = |µ̇|^p , but this expression is not very “explicit”.)
• Are geodesic paths nonbranching? (Does the velocity at initial time
uniquely determine the final measure µ1 ?)
• Can one identify simple conditions for the existence of a unique
geodesic path between two given probability measures?
Main idea in the proof of Theorem 7.21. The delicate part consists in
showing that if (µt ) is a given action-minimizing curve, then there ex-
ists a random minimizer γ such that µt = law (γt ). This γ will be
constructed by dyadic approximation, as follows. First let (γ^{(0)}_0 , γ^{(0)}_1 )
be an optimal coupling of (µ0 , µ1 ). (Here the notation γ^{(0)}_0 could be
replaced by just x0 , it does not mean that there is some curve γ^{(0)}
behind.) Then let (γ^{(1)}_0 , γ^{(1)}_{1/2} ) be an optimal coupling of (µ0 , µ1/2 ), and
((γ ′ )^{(1)}_{1/2} , γ^{(1)}_1 ) be an optimal coupling of (µ1/2 , µ1 ). By gluing these cou-
plings together, I can actually assume that (γ ′ )^{(1)}_{1/2} = γ^{(1)}_{1/2} , so that I have
a triple (γ^{(1)}_0 , γ^{(1)}_{1/2} , γ^{(1)}_1 ) in which the first two components on the one
hand, and the last two components on the other hand, constitute opti-
mal couplings.
Now the key observation is that if (γt1 , γt2 ) and (γt2 , γt3 ) are optimal
couplings of (µt1 , µt2 ) and (µt2 , µt3 ) respectively, and the µtk satisfy the
equality appearing in (ii), then also (γt1 , γt3 ) should be optimal. Indeed,
by taking expectation in the inequality
ct1 ,t3 (γt1 , γt3 ) ≤ ct1 ,t2 (γt1 , γt2 ) + ct2 ,t3 (γt2 , γt3 )
E ct1 ,t3 (γt1 , γt3 ) ≤ C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ).
so actually
E ct1 ,t3 (γt1 , γt3 ) ≤ C t1 ,t3 (µt1 , µt3 ),
which means that indeed (γt1 , γt3 ) is an optimal coupling of (µt1 , µt3 )
for the cost ct1 ,t3 .
So (γ^{(1)}_0 , γ^{(1)}_1 ) is an optimal coupling of (µ0 , µ1 ). Now we can proceed
in the same manner and define, for each k, a random discrete path
(γ^{(k)}_{j 2^{−k}} ) such that (γ^{(k)}_s , γ^{(k)}_t ) is an optimal coupling for all times s, t of
the form j/2^k . These are only discrete paths, but it is possible to extend
them into paths (γ^{(k)}_t )_{0≤t≤1} that are minimizers of the action. Of course,
if t is not of the form j/2^k , there is no reason why law (γ^{(k)}_t ) would
coincide with µt . But hopefully we can pass to the limit as k → ∞, for
each dyadic time, and conclude by a density argument. ⊓⊔
C t1 ,t3 (µt1 , µt3 ) ≤ E ct1 ,t3 (γt1 , γt3 ) ≤ E ct1 ,t2 (γt1 , γt2 ) + E ct2 ,t3 (γt2 , γt3 )
= C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ).
C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ) ≤ E ct1 ,t2 (γt1 , γt2 ) + E ct2 ,t3 (γt2 , γt3 )
= E At1 ,t2 (γ) + E At2 ,t3 (γ) = E At1 ,t3 (γ) = E ct1 ,t3 (γt1 , γt3 ). (7.18)
(Recall that et is just the evaluation at time t.) Then the law Π (k) of
(γt )0≤t≤1 is a probability measure on the set of continuous curves in X .
I claim that Π (k) is actually concentrated on minimizing curves.
(Skip at first reading and go directly to Step 4.) To prove this, it is
sufficient to check the criterion in Proposition 7.16(iv), involving three
intermediate times t1 , t2 , t3 . By construction, the criterion holds true if
all these times belong to the same time-interval [i/2k , (i + 1)/2k ], and
also if they are all of the form j/2k ; the problem consists in “crossing
subintervals”. Let us show that
c^{s,t}(γs , γt ) = c^{s, i/2^k}(γs , γ_{i/2^k} ) + c^{i/2^k , j/2^k}(γ_{i/2^k} , γ_{j/2^k} ) + c^{j/2^k , t}(γ_{j/2^k} , γt ).    (7.19)
To prove this, we start with
c^{(i−1)/2^k , (j+1)/2^k}(γ_{(i−1)/2^k} , γ_{(j+1)/2^k} )
   ≤ c^{(i−1)/2^k , s}(γ_{(i−1)/2^k} , γs ) + c^{s,t}(γs , γt ) + c^{t, (j+1)/2^k}(γt , γ_{(j+1)/2^k} )
   ≤ c^{(i−1)/2^k , s}(γ_{(i−1)/2^k} , γs ) + c^{s, i/2^k}(γs , γ_{i/2^k} ) + c^{i/2^k , (i+1)/2^k}(γ_{i/2^k} , γ_{(i+1)/2^k} )
      + · · · + c^{j/2^k , t}(γ_{j/2^k} , γt ) + c^{t, (j+1)/2^k}(γt , γ_{(j+1)/2^k} ).    (7.20)
and by construction of Π^{(k)} this is just c^{(i−1)/2^k , (j+1)/2^k}(γ_{(i−1)/2^k} , γ_{(j+1)/2^k} ). So there
has to be equality everywhere in (7.20), which leads to (7.19). (Here
I use the fact that cs,t (γs , γt ) < +∞.) After that it is an easy game
to conclude the proof of the minimizing property for arbitrary times
t1 , t2 , t3 .
Step 4. To recapitulate: Starting from a curve (µt )0≤t≤1 , we have
constructed a family of probability measures Π (k) which are all concen-
trated on the set Γ of minimizing curves, and satisfy (et )# Π (k) = µt
for all t = j/2k . It remains to pass to the limit as k → ∞. For that we
shall check the tightness of the sequence (Π (k) )k∈N . Let ε > 0 be arbi-
trary. Since µ0 , µ1 are tight, there are compact sets K0 , K1 such that
µ0 [X \ K0 ] ≤ ε, µ1 [X \ K1 ] ≤ ε. From the coercivity of the action, the
set Γ^{0,1}_{K0 →K1} of action-minimizing curves joining K0 to K1 is compact,
and Π[ Γ \ Γ^{0,1}_{K0 →K1} ] is (with obvious notation)

P [ (γ0 , γ1 ) ∉ K0 × K1 ] ≤ P [γ0 ∉ K0 ] + P [γ1 ∉ K1 ] = µ0 [X \ K0 ] + µ1 [X \ K1 ] ≤ 2ε.
This proves the tightness of the family (Π (k) ). So one can extract a
subsequence thereof, still denoted Π (k) , that converges weakly to some
probability measure Π.
By Proposition 7.16(v), Γ is closed; so Π is still supported in Γ .
Moreover, for all dyadic time t = i/2ℓ in [0, 1], we have, if k is larger
than ℓ, (et )# Π (k) = µt , and by passing to the limit we find that
(et )# Π = µt also.
By assumption, µt depends continuously on t. So, to conclude that
(et )# Π = µt for all times t ∈ [0, 1] it now suffices to check the continuity
of (et )# Π as a function of t. In other words, if ϕ is an arbitrary bounded
continuous function on X , one has to show that
ψ(t) = E ϕ(γt )
where the next-to-last inequality follows from the fact that (γs , γt ) is
a coupling of (µs , µt ), and the last inequality is a consequence of the
definition of cs,t . This shows that
Σ_i C^{ti ,ti+1}(µti , µti+1 ) ≤ E A^{s,t}(γ).
(µτ )s≤τ ≤t for the Lagrangian action restricted to [s, t]. Then Property
(ii) of Theorem 7.21 implies As,t (µ) = C s,t (µs , µt ). Finally, Property
(iii) in Definition 7.11 is also satisfied by construction. In the end, (A)
does define a Lagrangian action, with induced cost functionals C s,t .
To conclude the proof of Theorem 7.21 it only remains to check
the coercivity of the action; then the equivalence of (i), (ii) and (iii)
will follow from Proposition 7.16(iv). Let s < t be two given times in
[0, 1], and let Ks , Kt be compact sets of probability measures such that
C s,t(µs , µt ) < +∞ for all µs ∈ Ks , µt ∈ Kt . Action-minimizing curves
for As,t can be written as law (γτ )s≤τ ≤t , where γ is a random action-
minimizing curve [s, t] → X such that law (γs ) ∈ Ks , law (γt ) ∈ Kt .
One can use an argument similar to the one in Step 4 above to prove
that the laws Π of such minimizing curves form a tight, closed set; so
we have a compact set of dynamical transference plans Π s,t, that are
probability measures on C([s, t]; X ). The problem is to show that the
paths (eτ )# Π s,t constitute a compact set in C([s, t]; P (X )). Since the
continuous image of a compact set is compact, it suffices to check that
the map
Π s,t 7−→ ((eτ )# Π s,t )s≤τ ≤t
is continuous from P (C([s, t]; X )) to C([s, t]; P (X )). To do so, it will
be convenient to metrize P (X ) with the Wasserstein distance W1 , re-
placing if necessary d by a bounded, topologically equivalent distance.
(Recall Corollary 6.13.) Then the uniform distance on C([s, t]; X ) is
also bounded and there is an associated Wasserstein distance W1 on
P (C([s, t]; X )). Let Π and Π̃ be two dynamical optimal transference
plans, and let ((γτ ), (γ̃τ )) be an optimal coupling of Π and Π̃; let also
µτ , µ̃τ be the associated displacement interpolations; then the required
continuity follows from the chain of inequalities

sup_{t∈[0,1]} W1 (µt , µ̃t ) ≤ sup_{t∈[0,1]} E d(γt , γ̃t ) ≤ E sup_{t∈[0,1]} d(γt , γ̃t ) = W1 (Π, Π̃).
Then
c^{s,t}(x, y) = d(x, y)^p / (t − s)^{p−1} ,
and all our assumptions hold true for this action and cost. (The assump-
tion of local compactness is used to prove that the action is coercive,
see the Appendix.) The important point now is that

C^{s,t}(µ, ν) = Wp (µ, ν)^p / (t − s)^{p−1} .
So, according to the remarks in Example 7.9, Property (ii) in Theo-
rem 7.21 means that (µt ) is in turn a minimizer of the action associated
with the Lagrangian |µ̇|p , i.e. a geodesic in Pp (X ). Note that if µt is
the law of a random optimal geodesic γt at time t, then
So (d+ /dt)[Ψ (t)1/p ] ≤ Wp (µ, ν), thus |Ψ (1)1/p − Ψ (0)1/p | ≤ Wp (µ, ν),
which is the desired result. ⊓⊔
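Here is a small numerical sketch of this geodesic picture for the quadratic cost in the Euclidean plane (an illustration, not part of the text): the optimal coupling of two uniform discrete measures is an assignment problem, and µt is obtained by moving each matched pair along the segment (1 − t)x + t y; since the restricted coupling remains optimal, Wp (µ0 , µt ) = t Wp (µ0 , µ1 ) along the interpolation.

    # Sketch: displacement interpolation for c(x, y) = |x - y|^2 between two
    # uniform discrete measures on R^2. The optimal coupling is a permutation,
    # and mu_t is the empirical measure of the interpolated points.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))                                 # support of mu_0
    Y = 0.5 * rng.normal(size=(n, 2)) + np.array([4.0, 1.0])    # support of mu_1

    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    row, col = linear_sum_assignment(cost)                      # optimal permutation
    W2 = np.sqrt(cost[row, col].mean())

    def mu_t(t):
        # points of the displacement interpolant, each carrying mass 1/n
        return (1.0 - t) * X[row] + t * Y[col]

    for t in (0.25, 0.5, 0.75):
        Wt = np.sqrt(((X[row] - mu_t(t)) ** 2).sum(-1).mean())
        assert abs(Wt - t * W2) < 1e-9                          # constant-speed geodesic
    print("W2(mu_0, mu_1) =", W2)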
Again let µ0 and µ1 be any two probability measures, (µt )0≤t≤1 a dis-
placement interpolation associated with a dynamical optimal transfer-
ence plan Π, and (γt )0≤t≤1 a random action-minimizing curve with
law (γ) = Π. In particular (γ0 , γ1 ) is an optimal coupling of (µ0 , µ1 ).
Π ′ := Π̃ / Π̃[ C([t0 , t1 ]; X ) ] ,        µ′t = (et )# Π ′ .
[γt = γ̃t ] =⇒ [γ = γ̃].
law (γ̃^{t0 ,t1} ) = law (γ̂^{t0 ,t1} ) = law (F (γ̂^{0,t0} )) = law (F (γ^{0,t0} )) = law (γ^{t0 ,t1} ).
c^{0,1}(γ0 , γ̃1 ) ≤ c^{0,t}(γ0 , X) + c^{t,1}(X, γ̃1 ),    (7.22)
and similarly
c^{0,1}(γ̃0 , γ1 ) ≤ c^{0,t}(γ̃0 , X) + c^{t,1}(X, γ1 ).    (7.23)
Since the reverse inequality holds true by (7.21), equality has to hold
in all intermediate inequalities, for instance in (7.22). Then it is easy
to see that the path γ̂ defined by γ̂(s) = γ(s) for 0 ≤ s ≤ t, and
γ̂(s) = γ̃(s) for s ≥ t, is a minimizing curve. Since it coincides with γ
on a nontrivial time-interval, it has to coincide with γ everywhere, and
similarly it has to coincide with γ̃ everywhere. So γ = γ̃.
If the costs c^{s,t} are continuous, the previous conclusion holds true
not only Π ⊗ Π-almost surely, but actually for any two minimizing
curves γ, γ̃ lying in the support of Π. Indeed, inequality (7.21) defines
a closed set C in Γ ×Γ , where Γ stands for the set of minimizing curves;
so Spt Π × Spt Π = Spt(Π ⊗ Π) ⊂ C.
It remains to prove (v). Let Γ 0,1 be a c-cyclically monotone subset
of X × X on which π is concentrated, and let Γ be the set of mini-
mizing curves γ : [0, 1] → X such that (γ0 , γ1 ) ∈ Γ 0,1 . Let (Kℓ )ℓ∈N be
Interpolation of prices
When the path µt varies in time, what becomes of the pair of “prices”
(ψ, φ) in the Kantorovich duality? The short answer is that these func-
tions will also evolve continuously in time, according to Hamilton–
Jacobi equations.
( H^{t,s}_− φ )(x) = sup_{y∈X} [ φ(y) − c^{s,t}(x, y) ].
The family of operators (H^{s,t}_+ )_{t>s} (resp. (H^{s,t}_− )_{s<t} ) is called the forward (resp. backward) Hamilton–Jacobi (or Hopf–Lax, or Lax–Oleinik) semigroup.
Roughly speaking, H^{s,t}_+ gives the values of ψ at time t, from its values
at time s; while H^{s,t}_− does the reverse. So the semigroups H− and H+
are in some sense inverses of each other. Yet it is not true in general
that H^{t,s}_− H^{s,t}_+ = Id . Proposition 7.34 below summarizes some of the
main properties of these semigroups; the denomination of “semigroup”
itself is justified by Property (ii).
Proposition 7.34 (Elementary properties of Hamilton–Jacobi
semigroups). With the notation of Definition 7.33,
(i) H^{s,t}_+ and H^{s,t}_− are order-preserving: ψ ≤ ψ̃ =⇒ H^{s,t}_± ψ ≤ H^{s,t}_± ψ̃.
(ii) Whenever t1 < t2 < t3 are three intermediate times in [0, 1],
H^{t2 ,t3}_+ H^{t1 ,t2}_+ = H^{t1 ,t3}_+ ;        H^{t2 ,t1}_− H^{t3 ,t2}_− = H^{t3 ,t1}_− .
(iii) Whenever s < t are two times in [0, 1],
H^{t,s}_− H^{s,t}_+ ≤ Id ;        H^{s,t}_+ H^{t,s}_− ≥ Id .
Proof of Proposition 7.34. Properties (i) and (ii) are immediate conse-
quences of the definitions and Proposition 7.16(iii). To check Property
(iii), e.g. the first half of it, write
H^{t,s}_− ( H^{s,t}_+ ψ )(x) = sup_y inf_{x′} [ ψ(x′ ) + c^{s,t}(x′ , y) − c^{s,t}(x, y) ].
∂S₊/∂t + |∇⁻S₊|²/2 = 0,
where
|∇⁻f |(x) := lim sup_{y→x} [f (y) − f (x)]⁻ / d(x, y) ,        z⁻ = max(−z, 0).
and
φt (y) − ψs (x) ≤ cs,t (x, y),
with equality πs,t (dx dy)-almost surely, where πs,t is any optimal trans-
ference plan between µs and µt .
Since c0,s (x′ , x) + cs,t (x, y) + ct,1 (y, y ′ ) ≥ c0,1 (x′ , y ′ ), it follows that
φt (y) − ψs (x) − c^{s,t}(x, y) ≤ sup_{y′ , x′} [ φ1 (y ′ ) − ψ0 (x′ ) − c^{0,1}(x′ , y ′ ) ] ≤ 0.
So φt (y) − ψs (x) ≤ cs,t (x, y). This inequality does not depend on the
fact that (ψ0 , φ1 ) is a tight pair of prices, in the sense of (5.5), but only
on the inequality φ1 − ψ0 ≤ c0,1 .
Next, introduce a random action-minimizing curve γ such that
µt = law (γt ). Since (ψ0 , φ1 ) is an optimal pair, we know from The-
orem 5.10(ii) that, almost surely,
From the identity c0,1 (γ0 , γ1 ) = c0,s (γ0 , γs )+cs,t (γs , γt )+ct,1 (γt , γ1 ) and
the definition of ψs and φt ,
c^{s,t}(γs , γt ) = [ φ1 (γ1 ) − c^{t,1}(γt , γ1 ) ] − [ ψ0 (γ0 ) + c^{0,s}(γ0 , γs ) ]
                ≤ φt (γt ) − ψs (γs ).
This shows that actually cs,t (γs , γt ) = φt (γt ) − ψs (γs ) almost surely, so
(ψs , φt ) has to be optimal in the dual Kantorovich problem between
µs = law (γs ) and µt = law (γt ). ⊓⊔
Exercise 7.39. After reading the rest of Part I, the reader can come
back to this exercise and check his or her understanding by proving
that, for a quadratic Lagrangian:
(i) The displacement interpolation between two balls in Euclidean
space is always a ball, whose radius increases linearly in time (here I am
identifying a set with the uniform probability measure on this set).
(ii) More generally, the displacement interpolation between two el-
lipsoids is always an ellipsoid.
p(x) · v = ξ(x) · v,
where v ∈ Tx M , and the dot in the left-hand side just means “p(x)
applied to v”, while the dot in the right-hand side stands for the scalar
product defined by g. As a particular case, if p is the differential of a
function f , the corresponding vector field ξ is the gradient of f , denoted
by ∇f or ∇x f .
If f = f (x, v) is a function on TM , then one can differentiate it with
respect to x or with respect to v. Since T(x,v) Tx M ≃ Tx M , both dx f
and dv f can be seen as linear forms on Tx M ; this allows us to define
∇x f and ∇v f , the “gradient with respect to the position variable” and
the “gradient with respect to the velocity variable”.
Differentiating functions does not pose any particular conceptual
problem, but differentiating vector fields is quite a different story. If ξ
is a vector field on M , then ξ(x) and ξ(y) live in different vector spaces,
so it does not a priori make sense to compare them, unless one identifies
in some sense Tx M and Ty M . (Of course, one could say that ξ is a map
M → TM and define its differential as a map TM → T (TM ) but this
Further, note that in the second formula of (7.28) the symbol f˙ stands
for the usual derivative of t → f (γt ); while the symbols ξ̇ and ζ̇ stand
for the covariant derivatives of the vector fields ξ and ζ along γ.
A third approach to covariant derivation is based on coordinates.
Let x ∈ M , then there is a neighborhood O of x which is diffeomorphic
to some open subset U ⊂ Rn . Let Φ be a diffeomorphism U → O, and
let (e1 , . . . , en ) be the usual basis of Rn . A point m in O is said to have
coordinates (y 1 , . . . , y n ) if m = Φ(y 1 , . . . , y n ); and a vector v ∈ Tm M
is said to have components v 1 , . . . , v k if d(y1 ,...,yn ) Φ · (v1 , . . . , vk ) = v.
Then the coefficients of the metric g are the functions gij defined by
g(v, v) = Σ gij v^i v^j .
The coordinate point of view reduces everything to “explicit” com-
putations and formulas in Rn ; for instance the derivation of a function
f along the ith direction is defined as ∂i f := (∂/∂y i )(f ◦ Φ). This is
conceptually simple, but rapidly leads to cumbersome expressions. A
central role in these formulas is played by the Christoffel symbols,
which are defined by
Γ^m_{ij} := (1/2) Σ_{k=1}^{n} ( ∂i gjk + ∂j gki − ∂k gij ) g^{km} ,
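As an illustration of how these symbols are used (a standard formula recalled here, not taken from this passage), the geodesic equations, i.e. the Euler–Lagrange equations of the action ∫ |γ̇t |² dt, read in coordinates

d²γ^m/dt² + Σ_{i,j} Γ^m_{ij} (dγ^i/dt)(dγ^j/dt) = 0,    1 ≤ m ≤ n.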
which minimize the action, we can compute the differential of the ac-
tion. So let γ be a curve, and h a small variation of that curve. (This
amounts to considering a family γs,t in such a way that γ0,t = γt and
(d/ds)|s=0 γs,t = h(t).) Then the infinitesimal variation of the action A
at γ, along the variation h, is
dA(γ) · h = ∫₀¹ [ ∇x L(γt , γ̇t , t) · h(t) + ∇v L(γt , γ̇t , t) · ḣ(t) ] dt.
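If h vanishes at the endpoints, an integration by parts in the second term gives (a sketch of the classical computation, assuming enough smoothness of L and γ)

dA(γ) · h = ∫₀¹ [ ∇x L(γt , γ̇t , t) − (d/dt) ∇v L(γt , γ̇t , t) ] · h(t) dt,

so that critical curves, in particular minimizing curves, satisfy the Euler–Lagrange equation (d/dt) ∇v L(γt , γ̇t , t) = ∇x L(γt , γ̇t , t).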
can also reparametrize geodesic curves γ in such a way that their speed
|γ̇| is constant, or equivalently that for all intermediate times s and t,
their length between times s and t coincides with the distance between
their positions at times s and t.
The same proof that I sketched for Riemannian manifolds applies
in geodesic spaces, to show that the set Γx,y of (minimizing, constant
speed) geodesics joining x to y is compact; more generally, the set
ΓK0 →K1 of geodesics γ with γ0 ∈ K0 and γ1 ∈ K1 is compact, as soon
as K0 and K1 are compact. So there are important common points be-
tween the structure of a length space and the structure of a Riemannian
manifold. From the practical point of view, some main differences are
that (i) there is no available equation for geodesic curves, (ii) geodesics
may “branch”, (iii) there is no guarantee that geodesics between x and
y are unique for y very close to x, (iv) there is neither a unique notion of
dimension, nor a canonical reference measure, (v) there is no guarantee
that geodesics will be almost everywhere unique. Still there is a the-
ory of differential analysis on nonsmooth geodesic spaces (first variation
formula, norms of Jacobi fields, etc.) mainly in the case where there are
lower bounds on the sectional curvature (in the sense of Alexandrov,
as will be described in Chapter 26).
Bibliographical notes
where the infimum is taken among all paths (µt )0≤t≤1 satisfying certain
regularity conditions. Brenier himself gave two sketches of the proof for
this formula [88, 164], and another formal argument was suggested by
Otto and myself [671, Section 3]. Rigorous proofs were later provided
by several authors under various assumptions [814, Theorem 8.1] [451]
[30, Chapter 8] (the latter reference contains the most precise results).
The adaptation to Riemannian manifolds has been considered in [278,
431, 491]. We shall come back to these formulas later on, after a more
precise qualitative picture of optimal transport has emerged. One of
and this would contradict the fact that the support of π is c-cyclically
monotone. Stated otherwise: Given two crossing line segments, we can
shorten the total length of the paths by replacing these lines by the
new transport lines [x1 , y2 ] and [x2 , y1 ] (see Figure 8.1).
For cost functions that do not satisfy a triangle inequality, Monge’s ar-
gument does not apply, and pathlines can cross. However, it is often the
case that the crossing of the curves (with the time variable explicitly
taken into account) is forbidden. Here is the most basic example: Con-
sider the quadratic cost function in Euclidean space (c(x, y) = |x−y|2 ),
and let (x1 , y1 ) and (x2 , y2 ) belong to the support of some optimal cou-
pling. By cyclical monotonicity,
Then let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2
|x1 − y2 |² + |x2 − y1 |² = |x1 − X|² + |X − y2 |² − 2⟨X − x1 , X − y2 ⟩
                            + |x2 − X|² + |X − y1 |² − 2⟨X − x2 , X − y1 ⟩
                          = [ t0² + (1 − t0 )² ] ( |x1 − y1 |² + |x2 − y2 |² ) + 4 t0 (1 − t0 ) ⟨x1 − y1 , x2 − y2 ⟩
                          ≤ [ t0² + (1 − t0 )² + 2 t0 (1 − t0 ) ] ( |x1 − y1 |² + |x2 − y2 |² )
                          = |x1 − y1 |² + |x2 − y2 |²,
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2 .
(By the way, this inequality is easily seen to be optimal.) So the uniform
distance between the whole paths γ1 and γ2 can be controlled by their
distance at some time t0 ∈ (0, 1).
Example 8.3. The cost function c(x, y) = d(x, y)2 corresponds to the
Lagrangian function L(x, v, t) = |v|2 , which obviously satisfies the as-
sumptions of Corollary 8.2. In that case the exponent β = 1 is ad-
missible. Moreover, it is natural to expect that the constant CK can
be controlled in terms of just a lower bound on the sectional curvature
of M . I shall come back to this issue later in this chapter (see Open
Problem 8.21).
Example 8.4. The cost function c(x, y) = d(x, y)1+α does not satisfy
the assumptions of Corollary 8.2 for 0 < α < 1. Even if the associated
Lagrangian L(x, v, t) = |v|1+α is not smooth, the equation for mini-
mizing curves is just the geodesic equation, so Assumption (i) in The-
orem 8.1 is still satisfied. Then, by tracking exponents in the proof of
Theorem 8.1, one can find that (8.4) holds true with β = (1+α)/(3−α).
But this is far from optimal: By taking advantage of the homogeneity
of the power function, one can prove that the exponent β = 1 is also
admissible, for all α ∈ (0, 1). (It is the constant, rather than the expo-
nent, which deteriorates as α ↓ 0.) I shall explain this argument in the
Appendix, in the Euclidean case, and leave the Riemannian case as a
delicate exercise. This example suggests that Theorem 8.1 still leaves
room for improvement.
Fig. 8.2. In this example the map γ(0) → γ(1/2) is not well-defined, but the map
γ(1/2) → γ(0) is well-defined and Lipschitz, just as the map γ(1/2) → γ(1). Also
µ0 is singular, but µt is absolutely continuous as soon as t > 0.
Thus, Π ⊗ Π(dγ dγ̃)-almost surely,
Π ′ := 1K Π / Π[K] ,
and let π ′ := (e0 , e1 )# Π ′ be the associated transference plan, and µ′t =
(et )# Π ′ the marginal of Π ′ at time t. In particular,
µ′1 ≤ (e1 )# Π / Π[K] = µ1 / Π[K] ,
Now let us turn to the proof of Theorem 8.1. It is certainly more im-
portant to grasp the idea of the proof (Figure 8.3) than to follow the
calculations, so the reader might be content with the informal expla-
nations below and skip the rigorous proof at first reading.
Idea of the proof of Theorem 8.1. Assume, to fix the ideas, that γ1 and
γ2 cross each other at a point m0 and at time t0 . Close to m0 , these
two curves look like two straight lines crossing each other, with re-
spective velocities v1 and v2 . Now cut these curves on the time inter-
val [t0 − τ, t0 + τ ] and on that interval introduce “deviations” (like a
plumber installing a new piece of pipe to short-cut a damaged region
of a channel) that join the first curve to the second, and vice versa.
This amounts to replacing (on a short interval of time) two curves
with approximate velocity v1 and v2 , by two curves with approximate
velocities (v1 +v2 )/2. Since the time-interval where the modification oc-
curs is short, everything is concentrated in the neighborhood of (m0 , t0 ),
so the modification in the Lagrangian action of the two curves is ap-
proximately
(2τ ) [ 2 L( m0 , (v1 + v2 )/2 , t0 ) − L(m0 , v1 , t0 ) − L(m0 , v2 , t0 ) ].
then this means that (γ1 (t0 ), γ̇1 (t0 )) and (γ2 (t0 ), γ̇2 (t0 )) are very close
to each other in TM ; more precisely they are separated by a distance
which is O( d(γ1 (t0 ), γ2 (t0 ))^β ). Then by Assumption (i) and Cauchy–
Lipschitz theory this bound will be propagated backward and forward
in time, so the distance between (γ1 (t), γ̇1 (t)) and (γ2 (t), γ̇2 (t)) will
remain bounded by O( d(γ1 (t0 ), γ2 (t0 ))^β ). Thus to conclude the argument
it is sufficient to prove (8.8).
Step 2: Construction of shortcuts. First some notation: Let us
write x1 (t) = γ1 (t), x2 (t) = γ2 (t), v1 (t) = γ̇1 (t), v2 (t) = γ̇2 (t), and also
X1 = x1 (t0 ), V1 = v1 (t0 ), X2 = x2 (t0 ), V2 = v2 (t0 ). The goal is to
control |V1 − V2 | by |X1 − X2 |.
Then the path x21 (t) and its time-derivative v21 (t) are defined symmetri-
cally (see Figure 8.4). These definitions are rather natural: First we try
to construct paths on [t0 − τ, t0 + τ ] whose velocity is about the half of
the velocities of γ1 and γ2 ; then we correct these paths by adding simple
functions (linear in time) to make them match the correct endpoints.
I shall conclude this step with some basic estimates about the paths
x12 and x21 on the time-interval [t0 − τ, t0 + τ ]. For a start, note that
x12 − (x1 + x2 )/2 = − ( x21 − (x1 + x2 )/2 ),        v12 − (v1 + v2 )/2 = − ( v21 − (v1 + v2 )/2 ).    (8.9)
In the sequel, the symbol O(m) will stand for any expression which
is bounded by Cm, where C only depends on V and on the regularity
bounds on the Lagrangian L on V. From Cauchy–Lipschitz theory and
Assumption (i),
|v1 − v2 |(t) + |x1 − x2 |(t) = O( |X1 − X2 | + |V1 − V2 | ),    (8.10)
Fig. 8.4. The paths x12 (t) and x21 (t) obtained by using the shortcuts to switch
from one original path to the other.
and then by plugging this back into the equation for minimizing curves
we obtain
|v̇1 − v̇2 |(t) = O( |X1 − X2 | + |V1 − V2 | ).
Upon integration in times, these bounds imply
x1 (t) − x2 (t) = (X1 − X2 ) + O( τ (|X1 − X2 | + |V1 − V2 |) );    (8.11)
v1 (t) − v2 (t) = (V1 − V2 ) + O( τ (|X1 − X2 | + |V1 − V2 |) );    (8.12)
and therefore also
x1 (t) − x2 (t) = (X1 − X2 ) + (t − t0 )(V1 − V2 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) ).    (8.13)
As a consequence of (8.12), if τ is small enough (depending only on
the Lagrangian L),
|v1 − v2 |(t) ≥ |V1 − V2 |/2 − O( τ |X1 − X2 | ).    (8.14)
Next, from Cauchy–Lipschitz again,
x2 (t0 + τ ) − x1 (t0 + τ ) = X2 − X1 + τ (V2 − V1 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) );
and also
[ x2 (t0 + τ ) − x1 (t0 + τ ) ]/2 + [ x1 (t0 − τ ) − x2 (t0 − τ ) ]/2
        = τ (V2 − V1 ) + O( τ² (|X1 − X2 | + |V1 − V2 |) ).    (8.16)
v12 (t) − [ v1 (t) + v2 (t) ]/2 = O( |X1 − X2 |/τ + τ |V1 − V2 | ).    (8.17)
After integration in time and use of (8.16), one obtains
x12 (t) − [ x1 (t) + x2 (t) ]/2 = [ x12 (t0 ) − ( x1 (t0 ) + x2 (t0 ) )/2 ] + O( |X1 − X2 | + τ² |V1 − V2 | )
                               = O( |X1 − X2 | + τ |V1 − V2 | ).    (8.18)
In particular,
|x12 − x21 |(t) = O( |X1 − X2 | + τ |V1 − V2 | );    (8.19)
similarly
L(x2 , v2 ) − L( (x1 + x2 )/2 , v2 )
        = ∇x L( (x1 + x2 )/2 , v2 ) · (x2 − x1 )/2 + O( |x1 − x2 |^{1+α} ).
Moreover,
∇x L( (x1 + x2 )/2 , v1 ) − ∇x L( (x1 + x2 )/2 , v2 ) = O( |v1 − v2 |^α ).
The combination of these three identities, together with estimates (8.11)
and (8.12), yields

L(x1 , v1 ) + L(x2 , v2 ) − [ L( (x1 + x2 )/2 , v1 ) + L( (x1 + x2 )/2 , v2 ) ]
        = O( |x1 − x2 |^{1+α} + |x1 − x2 | |v1 − v2 |^α )
        = O( |X1 − X2 |^{1+α} + τ |V1 − V2 |^{1+α} + |X1 − X2 | |V1 − V2 |^α + τ^{1+α} |V1 − V2 | |X1 − X2 |^α ).
L(x21 , v21 ) − L( x21 , (v1 + v2 )/2 )
        = ∇v L( x21 , (v1 + v2 )/2 ) · ( v21 − (v1 + v2 )/2 ) + O( |v21 − (v1 + v2 )/2|^{1+α} ),

∇v L( x12 , (v1 + v2 )/2 ) − ∇v L( x21 , (v1 + v2 )/2 ) = O( |x12 − x21 |^α ).
Combining this with (8.9), (8.17) and (8.19), one finds
L(x12 , v12 ) + L(x21 , v21 ) − [ L( x12 , (v1 + v2 )/2 ) + L( x21 , (v1 + v2 )/2 ) ]
        = O( |v12 − (v1 + v2 )/2|^{1+α} + |v12 − (v1 + v2 )/2| |x12 − x21 |^α )
        = O( |X1 − X2 |^{1+α}/τ^{1+α} + τ^{1+α} |V1 − V2 |^{1+α} ).
After that,

L( x12 , (v1 + v2 )/2 ) − L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = ∇x L( (x1 + x2 )/2 , (v1 + v2 )/2 ) · ( x12 − (x1 + x2 )/2 ) + O( |x12 − (x1 + x2 )/2|^{1+α} ),

L( x21 , (v1 + v2 )/2 ) − L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = ∇x L( (x1 + x2 )/2 , (v1 + v2 )/2 ) · ( x21 − (x1 + x2 )/2 ) + O( |x21 − (x1 + x2 )/2|^{1+α} ),

and now by (8.9) the terms in ∇x cancel each other exactly upon summation, so the bound (8.18) leads to

L( x12 , (v1 + v2 )/2 ) + L( x21 , (v1 + v2 )/2 ) − 2 L( (x1 + x2 )/2 , (v1 + v2 )/2 )
        = O( |x21 − (x1 + x2 )/2|^{1+α} )
        = O( |X1 − X2 |^{1+α} + τ^{1+α} |V1 − V2 |^{1+α} ).
From Step 3, we can replace in the integrand all positions by (x1 +x2 )/2,
and v12 , v21 by (v1 + v2 )/2, up to a small error. Collecting the various
error terms, and taking into account the smallness of τ , one obtains
(dropping the t variable again)
(1/2τ ) ∫_{t0 −τ}^{t0 +τ} [ L( (x1 + x2 )/2 , v1 ) + L( (x1 + x2 )/2 , v2 ) − 2 L( (x1 + x2 )/2 , (v1 + v2 )/2 ) ] dt
        ≤ C ( |X1 − X2 |^{1+α}/τ^{1+α} + τ |V1 − V2 |^{1+α} ).    (8.21)
On the other hand, from the convexity condition (iii) and (8.14),
the left-hand side of (8.21) is bounded below by
(K/2τ ) ∫_{t0 −τ}^{t0 +τ} |v1 − v2 |^{2+κ} dt ≥ K ′ ( |V1 − V2 | − Aτ |X1 − X2 | )^{2+κ}.    (8.22)
If |V1 − V2 | ≤ 2Aτ |X1 − X2 |, then the proof is finished. If this is not the
case, this means that |V1 − V2 | − Aτ |X1 − X2 | ≥ |V1 − V2 |/2, and then
the comparison of the upper bound (8.21) and the lower bound (8.22)
yields
|V1 − V2 |^{2+κ} ≤ C ( |X1 − X2 |^{1+α}/τ^{1+α} + τ |V1 − V2 |^{1+α} ).    (8.23)
If |V1 − V2 | = 0, then the proof is finished. Otherwise, the conclusion
follows by choosing τ small enough that Cτ |V1 − V2 |1+α ≤ (1/2)|V1 −
V2 |^{2+κ}; then τ = O( |V1 − V2 |^{1+κ−α} ) and (8.23) implies

|V1 − V2 | = O( |X1 − X2 |^β ),    β = (1 + α) / [ (1 + α)(1 + κ − α) + 2 + κ ].    (8.24)
In the particular case when κ = 0 and α = 1, one has
|V1 − V2 |² ≤ C ( |X1 − X2 |²/τ² + τ |V1 − V2 |² ),
and if τ is small enough this implies just

|V1 − V2 | ≤ C |X1 − X2 | / τ .    (8.25)
The upper bound on τ depends on the regularity and strict convexity
of τ in V, but also on t0 since τ cannot be greater than min(t0 , 1 − t0 ).
This is actually the only way in which t0 explicitly enters the estimates.
So inequality (8.25) concludes the argument. ⊓
⊔
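As a quick consistency check (added here; not part of the original argument), one may plug the special values κ = 0, α = 1 directly into formula (8.24):

β = (1 + α) / [ (1 + α)(1 + κ − α) + 2 + κ ] = 2 / (2 · 0 + 2) = 1,

so (8.24) predicts |V_1 − V_2| = O(|X_1 − X_2|), which is consistent with (8.25), up to the explicit factor 1/τ that the general formula does not track.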
Remark 8.13. If L does not depend on t, then one can apply the pre-
vious result for any T = 2^{−ℓ}, and then use a compactness argument to
construct a constant curve (µt )t∈R satisfying Properties (i)–(vi) above.
In particular µ0 is a stationary measure for the Lagrangian system.
Before giving its proof, let me explain briefly why Theorem 8.11
is interesting from the point of view of the dynamics. A trajectory
of the dynamical system defined by the Lagrangian L is a curve γ
which is locally action-minimizing; that is, one can cover the time-
interval by small subintervals on which the curve is action-minimizing.
It is a classical problem in mechanics to construct and study periodic
trajectories having certain given properties. Theorem 8.11 does not
construct a periodic trajectory, but at least it constructs a random
trajectory γ (or equivalently a probability measure Π on the set of
trajectories) which is periodic on average: The law µt of γt satisfies
µt+T = µt . This can also be thought of as a probability measure Π on
the set of all possible trajectories of the system.
Of course this in itself is not too striking, since there may be a great
many invariant measures for a dynamical system, and some of them
are often easy to construct. The important point in the conclusion of
Theorem 8.11 is that the curve γ is not “too random”, in the sense
that the random variable (γ(0), γ̇(0)) takes values in a Lipschitz graph.
(If (γ(0), γ̇(0)) were a deterministic element in TM , this would mean
that Π just sees a single periodic curve. Here we may have an infinite
collection of curves, but still it is not “too large”.)
Another remarkable property of the curves γ is the fact that the
minimization property holds along any time-interval in R, not neces-
sarily small.
Proof of Theorem 8.11. I shall repeatedly use Proposition 7.16 and The-
orem 7.21. First, C 0,T (µ, µ) is a lower semicontinuous function of µ,
bounded below by T (inf L) > −∞, so the minimization problem (8.26)
does admit a solution.
Define µ0 = µT = µ, then define µt by displacement interpolation
for 0 < t < T , and extend the result by periodicity.
Let k ∈ N be given and let µ̃ be a minimizer for the variational
problem
inf_{µ∈P(M)} C^{0,kT}(µ, µ).
C^{0,kT}(µ̃, µ̃) ≤ C^{0,kT}(µ_0, µ_{kT}) ≤ Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T})
C^{0,T}(µ, µ) ≤ C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{jT} )   (8.28)
   = C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} )   (8.29)
(the last equality because µ̃_{kT} = µ̃_0 = µ̃, so both averages involve the same measures).
C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} ) ≤ (1/k) Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ̃_{jT}, µ̃_{(j+1)T})
   = (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}),   (8.30)
where the last equality is a consequence of Property (ii) in Theo-
rem 7.21. Inequalities (8.29) and (8.30) together imply
C^{0,T}(µ, µ) ≤ (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}) = (1/k) C^{0,kT}(µ̃, µ̃).
Since the reverse inequality holds true by (8.27), in fact all the inequal-
ities in (8.27), (8.29) and (8.30) have to be equalities. In particular,
C^{0,kT}(µ_0, µ_{kT}) = Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T}).   (8.31)
Next, let us check that the identity
C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) = C^{t_1,t_3}(µ_{t_1}, µ_{t_3})   (8.32)
holds true for any three intermediate times t1 < t2 < t3 . By periodicity,
it suffices to do this for t1 ≥ 0. If 0 ≤ t1 < t2 < t3 ≤ T , then (8.32)
is true by the property of displacement interpolation (Theorem 7.21
again). If jT ≤ t1 < t2 < t3 ≤ (j + 1)T , this is also true because of the
T -periodicity. In the remaining cases, we may choose k large enough
that t3 ≤ kT . Then
C^{0,kT}(µ_0, µ_{kT}) ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_3}(µ_{t_1}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
   ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
   ≤ Σ_j C^{s_j,s_{j+1}}(µ_{s_j}, µ_{s_{j+1}}),   (8.33)
(Consider for instance the particular case when 0 < t1 < t2 < T <
t3 < 2T ; then one can write C 0,t1 + C t1 ,t2 + C t2 ,T = C 0,T , and also
C T,t3 + C t3 ,2T = C T,2T . So C 0,t1 + C t1 ,t2 + C t2 ,T + C T,t3 + C t3 ,2T =
C 0,T + C T,2T .)
But (8.34) is just C 0,kT (µ0 , µkT ), as shown in (8.31). So there is
in fact equality in all these inequalities, and (8.32) follows. Then by
Theorem 7.21, (µt ) defines a displacement interpolation between any
two of its intermediate values. This proves (i). At this stage we have
also proven (iii) in the case when t0 = 0.
Now for any t ∈ R, one has, by (8.32) and the T -periodicity,
c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) = c^{t_1,t_3}(γ_{t_1}, γ_{t_3}),   (8.35)
where the property of optimality of the path (µt )t∈R was used in the
last step. So all these inequalities are equalities, and in particular
E [ c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) − c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) − c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) ] = 0.
The story does not end here. First, there is a powerful dual point
of view to Mather’s theory, based on solutions to the dual Kantorovich
problem; this is a maximization problem defined by
sup ∫ (φ − ψ) dµ,   (8.36)
a certain critical value; above that value, the Mather measures are sup-
ported by revolutions. At the critical value, the Mather and Aubry sets
differ: the Aubry set (viewed in the variables (x, v)) is the union of the
two revolutions of infinite period.
Fig. 8.5. In the left figure, the pendulum oscillates with little energy between two
extreme positions; its trajectory is an arc of circle which is described clockwise, then
counterclockwise, then clockwise again, etc. On the right figure, the pendulum has
much more energy and draws complete circles again and again, either clockwise or
counterclockwise.
The dual point of view in Mather’s theory, and the notion of the
Aubry set, are intimately related to the so-called weak KAM the-
ory, in which stationary solutions of Hamilton–Jacobi equations play
a central role. The next theorem partly explains the link between the
two theories.
In the sequel, I shall write just γ for γ^{(T+1)}. Of course the estimate
above remains unchanged upon replacement of T by T + 1, so
(1/T) ∫_{−(T+1)}^{0} L(γ(s), γ̇(s)) ds = c + O(1/T).
Then define
µ_T := (1/T) ∫_{−(T+1)}^{−1} δ_{γ(s)} ds;   ν_T := (1/T) ∫_{−T}^{0} δ_{γ(s)} ds.
Then from (8.39) and the lower semicontinuity of the optimal transport
cost,
C^{0,1}(µ, µ) ≤ lim inf_{T→∞} C^{0,1}(µ_T, ν_T) ≤ c.
one intermediate time is clearly not enough to control the distance be-
tween the positions along the whole geodesic curves. One cannot hope
either to control the distance between the velocities of these curves,
since the velocities might not be well-defined. On the other hand, we
may take advantage of the property of preservation of speed along the
minimizing curves, since this remains true even in a nonsmooth con-
text. The next theorem exploits this to show that if geodesics in a
displacement interpolation pass near each other at some intermediate
time, then their lengths have to be approximately equal.
Theorem 8.22 (A rough nonsmooth shortening lemma). Let
(X , d) be a metric space, and let γ1 , γ2 be two constant-speed, minimiz-
ing geodesics such that
d(γ_1(0), γ_1(1))^2 + d(γ_2(0), γ_2(1))^2 ≤ d(γ_1(0), γ_2(1))^2 + d(γ_2(0), γ_1(1))^2.
Let L_1 and L_2 stand for the respective lengths of γ_1 and γ_2, and let D
be a bound on the diameter of (γ_1 ∪ γ_2)([0, 1]). Then
|L_1 − L_2| ≤ ( C √D / √(t_0 (1 − t_0)) ) √( d(γ_1(t_0), γ_2(t_0)) ),
for some numeric constant C.
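Before the proof, here is a quick numerical illustration (a hedged sketch added for this text, not part of the original argument): straight segments in the plane are constant-speed minimizing geodesics for the Euclidean distance, so one can sample random pairs of segments, keep those satisfying the quadratic condition above, and observe that |L_1 − L_2| √(t_0(1−t_0)) / √(D d(γ_1(t_0), γ_2(t_0))) indeed stays below a small constant (√3, according to the computation in the proof below).

    import numpy as np

    # Hedged numerical sketch: straight segments in R^2 as minimizing geodesics.
    rng = np.random.default_rng(1)
    t0, worst = 0.3, 0.0
    for _ in range(20000):
        x1, y1, x2, y2 = rng.uniform(-1, 1, size=(4, 2))
        # quadratic "no-crossing" condition of Theorem 8.22
        if (np.sum((y1 - x1)**2) + np.sum((y2 - x2)**2)
                > np.sum((y2 - x1)**2) + np.sum((y1 - x2)**2)):
            continue
        L1, L2 = np.linalg.norm(y1 - x1), np.linalg.norm(y2 - x2)
        X1 = (1 - t0) * x1 + t0 * y1          # gamma_1(t0)
        X2 = (1 - t0) * x2 + t0 * y2          # gamma_2(t0)
        d12 = np.linalg.norm(X1 - X2)
        pts = [x1, y1, x2, y2]
        D = max(np.linalg.norm(p - q) for p in pts for q in pts)  # diameter bound
        if d12 > 1e-9 and D > 1e-9:
            worst = max(worst, abs(L1 - L2) * np.sqrt(t0 * (1 - t0))
                        / np.sqrt(D * d12))
    print(worst)   # observed values stay below sqrt(3) ~ 1.73, as the proof predicts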
Proof of Theorem 8.22. Write x_i = γ_i(0), y_i = γ_i(1), d_{12} = d(x_1, y_2), d_{21} = d(x_2, y_1), X_1 =
γ_1(t_0), X_2 = γ_2(t_0). From the minimizing assumption, the triangle
inequality and explicit calculations,
0 ≤ d_{12}^2 + d_{21}^2 − L_1^2 − L_2^2
   ≤ ( d(x_1, X_1) + d(X_1, X_2) + d(X_2, y_2) )^2 + ( d(x_2, X_2) + d(X_2, X_1) + d(X_1, y_1) )^2 − L_1^2 − L_2^2
   = ( t_0 L_1 + d(X_1, X_2) + (1 − t_0) L_2 )^2 + ( t_0 L_2 + d(X_1, X_2) + (1 − t_0) L_1 )^2 − L_1^2 − L_2^2
   = 2 d(X_1, X_2) ( L_1 + L_2 + d(X_1, X_2) ) − 2 t_0 (1 − t_0) (L_1 − L_2)^2.
As a consequence,
|L_1 − L_2| ≤ √( (L_1 + L_2 + d(X_1, X_2)) / (t_0 (1 − t_0)) ) √( d(X_1, X_2) ),
and the conclusion follows since L_1, L_2 and d(X_1, X_2) are all bounded by D. ⊓⊔
Further, let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2 .
Proof of Theorem 8.23. First note that it suffices to work in the affine
space generated by x1 , y1 , x2 , y2 , which is of dimension at most 3; hence
all the constants will be independent of the dimension n. For notational
simplicity, I shall assume that t0 = 1/2, which has no important in-
fluence on the computations. Let X1 := γ1 (1/2), X2 := γ2 (1/2). It is
sufficient to show that
where δx and δy are vectors of small norm (recall that x1 − y1 has unit
norm). Of course
X_1 − X_2 = (δx + δy)/2,   x_2 − x_1 = δx,   y_2 − y_1 = δy;
so to conclude the proof it is sufficient to show that
|(δx + δy)/2| ≥ K (|δx| + |δy|),   (8.43)
as soon as |δx| and |δy| are small enough, and (8.40) is satisfied.
By using the formulas |a + b|^2 = |a|^2 + 2⟨a, b⟩ + |b|^2 and
(1 + ε)^{(1+α)/2} = 1 + ((1+α)/2) ε − ((1+α)(1−α)/8) ε^2 + O(ε^3),
one easily deduces from (8.40) that
|δx − δy|^2 − |δx|^2 − |δy|^2 ≤ (1 − α) [ ⟨δx − δy, e⟩^2 − ⟨δx, e⟩^2 − ⟨δy, e⟩^2 ] + O(|δx|^3 + |δy|^3).
(which is indeed a scalar product because α > 0), and denote the
associated norm by ‖v‖. Then the above conclusion can be summarized
into
⟨⟨δx, δy⟩⟩ ≥ O(‖δx‖^3 + ‖δy‖^3).   (8.44)
It follows that
‖(δx + δy)/2‖^2 = (1/4) ( ‖δx‖^2 + ‖δy‖^2 + 2 ⟨⟨δx, δy⟩⟩ )
   ≥ (1/4) (‖δx‖^2 + ‖δy‖^2) + O(‖δx‖^3 + ‖δy‖^3).
So inequality (8.43) is indeed satisfied if |δx| + |δy| is small enough. ⊓⊔
Exercise 8.25. Extend this result to the cost function d(x, y)1+α on a
Riemannian manifold, when γ and γ̃ stay within a compact set.
Hints: This tricky exercise is only for a reader who feels very comfort-
able. One can use a reasoning similar to that in Step 2 of the above
proof, introducing a sequence (γ^{(k)}, γ̃^{(k)}) which is asymptotically the
"worst possible", and converges, up to extraction of a subsequence,
to (γ^{(∞)}, γ̃^{(∞)}). There are three cases: (i) γ^{(∞)} and γ̃^{(∞)} are distinct
geodesic curves which cross; this is ruled out by Theorem 8.1. (ii) γ^{(k)}
and γ̃^{(k)} converge to a point; then everything becomes local and one
can use the result in R^n, Theorem 8.23. (iii) γ^{(k)} and γ̃^{(k)} converge to
a nontrivial geodesic γ^{(∞)}; then these curves can be approximated by
infinitesimal perturbations of γ^{(∞)}, which are described by differential
equations (Jacobi equations).
Remark 8.26. Of course it would be much better to avoid the com-
pactness arguments and derive the bounds directly, but I don’t see how
to proceed.
Bibliographical notes
Monge’s observation about the impossibility of crossing appears in his
seminal 1781 memoir [636]. The argument is likely to apply whenever
the cost function satisfies a triangle inequality, which is always the
case in what Bernard and Buffoni have called the Monge–Mañé prob-
lem [104]. I don’t know of a quantitative version of it.
A very simple argument, due to Brenier, shows how to construct,
without any calculations, configurations of points that lead to line-
crossing for a quadratic cost [814, Chapter 10, Problem 1].
There are several possible computations to obtain inequalities of the
style of (8.3). The use of the identity (8.2) was inspired by a result by
Figalli, which is described below.
It is an old observation in Riemannian geometry that two minimiz-
ing curves cannot intersect twice and remain minimizing; the way to
prove this is the shortcut method already known to Monge. This simple
principle has important geometrical consequences, see for instance the
works by Morse [637, Theorem 3] and Hedlund [467, p. 722]. (These
references, as well as a large part of the historical remarks below, were
pointed out to me by Mather.)
(So in this case there is no need for an upper bound on the distances
between x1 , x2 , y1 , y2 .) The general case where K might be negative
seems to be quite more tricky. As a consequence of (8.45), Theorem 8.7
holds when the cost is the squared distance on an Alexandrov space
with nonnegative curvature; but this can also be proven by the method
of Figalli and Juillet [366].
Theorem 8.22 takes inspiration from the no-crossing proof in [246,
Lemma 5.3]. I don’t know whether the Hölder-1/2 regularity is optimal,
and I don’t know either whether it is possible/useful to obtain similar
estimates for more general cost functions.
9
Solution of the Monge problem I: Global approach
In the present chapter and the next one I shall investigate the solv-
ability of the Monge problem for a Lagrangian cost function. Recall
from Theorem 5.30 that it is sufficient to identify conditions under
which the initial measure µ does not see the set of points where the
c-subdifferential of a c-convex function ψ is multivalued.
Consider a Riemannian manifold M, and a cost function c(x, y) on
M × M , deriving from a Lagrangian function L(x, v, t) on TM × [0, 1]
satisfying the classical conditions of Definition 7.6. Let µ0 and µ1 be
two given probability measures, and let (µt )0≤t≤1 be a displacement
interpolation, written as the law of a random minimizing curve γ at
time t.
If the Lagrangian satisfies adequate regularity and convexity prop-
erties, Theorem 8.5 shows that the coupling (γ(s), γ(t)) is always de-
terministic as soon as 0 < s < 1, however singular µ0 and µ1 might
be. The question whether one can construct a deterministic coupling
of (µ0 , µ1 ) is much more subtle, and cannot be answered without reg-
ularity assumptions on µ0 . In this chapter, a simple approach to this
problem will be attempted, but only with partial success, since even-
tually it will work out only for a particular class of cost functions,
including at least the quadratic cost in Euclidean space (arguably the
most important case).
Our main assumption on the cost function c will be:
Assumption (C): For any c-convex function ψ and any x ∈ M , the
c-subdifferential ∂c ψ(x) is pathwise connected.
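For orientation (an added remark, not taken from the original text): for the quadratic cost c(x, y) = |x − y|^2/2 on R^n × R^n, a c-convex function ψ is characterized by the convexity of x ↦ ψ(x) + |x|^2/2, and ∂_c ψ(x) coincides with the subdifferential of that convex function at x, which is a convex, hence pathwise connected, subset of R^n. So Assumption (C) holds in this model case, in agreement with the remark below that it can only fail for the quadratic cost when the target domain is nonconvex.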
Proof of Theorem 9.2. Let Z be the set of points x for which ψ(x) <
+∞ but ∂c ψ(x) is not single-valued; the problem is to show that Z is
of dimension at most (n − 1)/β.
Let x ∈ M with ψ(x) < +∞, and let y ∈ ∂c ψ(x). Introduce an
action-minimizing curve γ = γ x,y joining x = γ(0) to y = γ(1). I claim
that the map
F : γ(1/2) ↦ x
Thus
c(x, y) + c(x′ , y ′ ) ≤ c(x, y ′ ) + c(x′ , y).
Then by (9.1),
d(x, x′) ≤ C d( γ(1/2), γ′(1/2) )^β.
This implies that m = γ(1/2) determines x = F (m) unambiguously,
and even that F is Hölder-β. (Obviously, this is the same reasoning as
in the proof of Theorem 8.5.)
Now, cover M by a countable number of open sets in which M is
diffeomorphic to a subset U of Rn , via some diffeomorphism ϕU . In each
of these open sets U , consider the union HU of all hyperplanes passing
through a point of rational coordinates, orthogonal to a unit vector
with rational coordinates. Transport this set back to M thanks to the
local diffeomorphism, and take the union over all the sets U . This gives
a set D ⊂ M with the following properties: (i) It is of dimension n − 1;
(ii) It meets every nontrivial continuous curve drawn on M (to see this,
write the curve locally in terms of ϕU and note that, by continuity, at
least one of the coordinates of the curve has to become rational at some
time).
Next, let x ∈ Z, and let y0 , y1 be two distinct elements of ∂c ψ(x).
By assumption there is a continuous curve (yt )0≤t≤1 lying entirely in
∂c ψ(x). For each t, introduce an action-minimizing curve (γt (s))0≤s≤1
between x and yt (s here is the time parameter along the curve). De-
fine mt := γt (1/2). This is a continuous path, nontrivial (otherwise
γ0 (1/2) = γ1 (1/2), but two minimizing trajectories starting from x can-
not cross in their middle, or they have to coincide at all times by (9.1)).
So there has to be some t such that yt ∈ D. Moreover, the map F con-
structed above satisfies F (yt ) = x for all t. It follows that x ∈ F (D).
(See Figure 9.1.)
Fig. 9.1. Scheme of proof for Theorem 9.2. Here there is a curve (yt )0≤t≤1 lying
entirely in ∂c ψ(x), and there is a nontrivial path (mt )0≤t≤1 obtained by taking the
midpoint between x and yt . This path has to meet D; but its image by γ(1/2) 7→ γ(0)
is {x}, so x ∈ F (D).
Remark 9.5. The assumption that µ does not give mass to sets of di-
mension at most n − 1 is optimal for the existence of a Monge coupling,
as can be seen by choosing µ = H1 |[0,1]×{0} (the one-dimensional Haus-
dorff measure concentrated on the segment [0, 1] × {0} in R2 ), and then
ν = (1/2) H1 |[0,1]×{−1}∪[0,1]×{+1} . (See Figure 9.2.) It is also optimal
for the uniqueness, as can be seen by taking µ = (1/2) H^1|_{{0}×[−1,1]} and
ν = (1/2) H^1|_{[−1,1]×{0}}. In fact, whenever µ, ν ∈ P_2(R^n) are supported
on orthogonal subspaces of Rn , then any transference plan is optimal!
To see this, define a function ψ by ψ = 0 on Conv(Spt µ), ψ = +∞ else-
where; then ψ is convex lower semicontinuous, ψ ∗ = 0 on Conv(Spt ν),
so ∂ψ contains Spt µ × Spt ν, and any transference plan is supported
in ∂ψ.
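A direct computation complementing the duality argument just given (added for this text): if Spt µ and Spt ν are contained in two orthogonal subspaces, then ⟨x, y⟩ = 0 for µ-almost all x and ν-almost all y, hence |x − y|^2 = |x|^2 + |y|^2, and for any transference plan π,

∫ |x − y|^2 dπ(x, y) = ∫ |x|^2 dµ(x) + ∫ |y|^2 dν(y),

which does not depend on π at all; so every transference plan has the same, hence optimal, cost.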
Fig. 9.2. The source measure is drawn as a thick line, the target measure as a thin
line; the cost function is quadratic. On the left, there is a unique optimal coupling
but no optimal Monge coupling. On the right, there are many optimal couplings, in
fact any transference plan is optimal.
In the next chapter, we shall see that Theorem 9.4 can be improved
in at least two ways: Equation (9.4) can be rewritten y = ∇ψ(x);
and the assumption (9.3) can be replaced by the weaker assumption
C(µ, ν) < +∞ (finite optimal transport cost).
Now if one wants to apply Theorem 9.2 to nonquadratic cost func-
tions, the question arises of how to identify those cost functions c(x, y)
which satisfy Assumption (C). Obviously, there might be some geo-
metric obstructions imposed by the domains X and Y: For instance,
if Y is a nonconvex subset of Rn , then Assumption (C) is violated
even by the quadratic cost function. But even in the whole of, say, Rn ,
Assumption (C) is not a generic condition, and so far there is only a
short list of known examples. These include the cost functions c(x, y) =
√(1 + |x − y|^2) on R^n × R^n, or more generally c(x, y) = (1 + |x − y|^2)^{p/2}
(1 < p < 2) on B_R(0) × B_R(0) ⊂ R^n × R^n, where R = 1/√(p − 1); and
c(x, y) = d(x, y)^2 on S^{n−1} × S^{n−1}, where d is the geodesic distance on
the sphere. For such cost functions, the Monge problem can be solved
by combining Theorems 8.1, 9.2 and 5.30, exactly as in the proof of
Theorem 9.4.
This approach suffers, however, from two main drawbacks: First it
seems to be limited to a small number of examples; secondly, the verifi-
cation of Assumption (C) is subtle. In the next chapter we shall inves-
tigate a more pedestrian approach, which will apply in much greater
generality.
I shall end this chapter with a simple example of a cost function
which does not satisfy Assumption (C).
where β is a smooth even function, β(0) = 0, β ′ (t) > 0 for |t| > 0.
Further, let r > 0 and X± = (±r, 0). (The fact that the segments
[X− , X+ ] and [y−1 , y1 ] are orthogonal is not accidental.) Then ηt (0) =
β(t) is an increasing function of |t|; while η_t(X_±) = −(r^2 + t^2)^{p/2} + |t|^p +
β(t) is a decreasing function of |t| if 0 < β′(t) < p t [ (r^2 + t^2)^{p/2−1} − t^{p−2} ],
which we shall assume. Now define ψ(x) = sup {ηt (x); t ∈ [−1, 1]}. By
construction this is a c-convex function, and ψ(0) = β(1) > 0, while
ψ(X_±) = η_0(X_±) = −r^p.
We shall check that ∂c ψ(0) is not connected. First, ∂c ψ(0) is not
empty: this can be shown by elementary means or as a consequence
of Example 10.20 and Theorem 10.24 in the next chapter. Secondly,
∂_c ψ(0) ⊂ {(y_1, y_2) ∈ R^2; y_1 = 0}: This comes from the fact that all
functions η_t are decreasing as a function of |x_1|. (So ψ is also nonin-
creasing in |x_1|, and if (y_1, y_2) ∈ ∂_c ψ(0, 0), then (y_1^2 + y_2^2)^{p/2} + ψ(0, 0) ≤
|y_2|^p + ψ(y_1, 0) ≤ |y_2|^p + ψ(0, 0), which imposes y_1 = 0.) Obviously,
∂_c ψ(0) is a symmetric subset of the line {y_1 = 0}. But if 0 ∈ ∂_c ψ(0),
then 0 < ψ(0) ≤ |X_±|^p + ψ(X_±) = 0, which is a contradiction. So
∂_c ψ(0) does not contain 0, therefore it is not connected.
(What is happening is the following. When replacing η_0 by ψ, we
have raised the origin, but we have kept the points (X_±, η_0(X_±))
in place, which forbids us to touch the graph of ψ from below at the
origin with a translation of η_0.) ⊓⊔
Bibliographical notes
the next chapter (see Theorem 10.42, Corollary 10.44 and Particular
Case 10.45).
Ma, Trudinger and X.-J. Wang [585, Section 7.5] were the first to
seriously study Assumption (C); they had the intuition that it was
connected to a certain fourth-order differential condition on the cost
function which plays a key role in the smoothness of optimal transport.
Later Trudinger and Wang [793], and Loeper [570] showed that the
above-mentioned differential condition is essentially, under adequate
geometric and regularity assumptions, equivalent to Assumption (C).
These issues will be discussed in more detail in Chapter 12. (See in
particular Proposition 12.15(iii).)
The counterexample in Proposition 9.6 is extracted from [585]. The
fact that c(x, y) = (1 + |x − y|^2)^{p/2} satisfies Assumption (C) on the ball
of radius 1/√(p − 1) is also taken from [585, 793]. It is Loeper [571] who
discovered that the squared geodesic distance on S n−1 satisfies Assump-
tion (C); then a simplified argument was devised by von Nessi [824].
As mentioned in the end of the chapter, by combining Loeper’s
result with Theorems 8.1, 9.2 and 5.30, one can mimic the proof of
Theorem 9.4 and get the unique solvability of the Monge problem for
the quadratic distance on the sphere, as soon as µ does not see sets of
dimension at most n − 1. Such a theorem was first obtained for general
compact Riemannian manifolds by McCann [616], with a completely
different argument.
Other examples of cost functions satisfying Assumption (C) will
be listed in Chapter 12 (for instance |x − y|^{−2}, or −|x − y|^p/p for
−2 ≤ p ≤ 1, or |x − y|^2 + |f(x) − g(y)|^2, where f and g are convex
and 1-Lipschitz). But these other examples do not come from a smooth
convex Lagrangian, so it is not clear whether they satisfy Assumption
(ii) in Theorem 9.2.
In the particular case when ν has finite support, one can prove
the unique solvability of the Monge problem under much more general
assumptions, namely that the cost function is continuous, and µ does
not charge sets of the form {x; c(x, a) − c(x, b) = k} (where a, b, k
are arbitrary), see [261]. This condition was recently used again by
Gozlan [429].
10
Solution of the Monge problem II: Local approach
A heuristic argument
Let ψ be a c-convex function on a Riemannian manifold M , and φ = ψ c .
Assume that y ∈ ∂c ψ(x); then, from the definition of c-subdifferential,
one has, for all x̃ ∈ M,
φ(y) − ψ(x) = c(x, y),   φ(y) − ψ(x̃) ≤ c(x̃, y).   (10.1)
It follows that
ψ(x) − ψ(x̃) ≤ c(x̃, y) − c(x, y).   (10.2)
Now the idea is to see what happens when x̃ → x, along a given
direction. Let w be a tangent vector at x, and consider a path ε ↦ x̃(ε),
defined for ε ∈ [0, ε_0), with initial position x and initial velocity w.
(For instance, x̃(ε) = exp_x(εw); or in R^n, just consider x̃(ε) = x + εw.)
Assume that ψ and c( · , y) are differentiable at x, divide both sides
of (10.2) by ε > 0 and pass to the limit:
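−⟨∇ψ(x), w⟩ ≤ ⟨∇_x c(x, y), w⟩.
Since w is an arbitrary tangent vector (replace w by −w), this forces ∇ψ(x) + ∇_x c(x, y) = 0, the identity that will reappear later in the chapter (compare (10.25) and (10.28)).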
Fig. 10.1. The distance function d(·, y) on S 1 , and its square. The upper-pointing
singularity is typical. The square distance is not differentiable when |x − y| = π; still
it is superdifferentiable, in a sense that is explained later.
For instance, if N and S respectively stand for the north and south
poles on S 2 , then d(x, S) fails to be differentiable at x = N .
Of course, for any x this happens only for a negligible set of y’s; and
the cost function is differentiable everywhere else, so we might think
that this is not a serious problem. But who can tell us that the optimal
transport will not try to take each x (or a lot of them) to a place y
such that c(x, y) is not differentiable?
To solve these problems, it will be useful to use some concepts from
nonsmooth analysis: subdifferentiability, superdifferentiability, approxi-
mate differentiability. The short answers to the above problems are that
(a) under adequate assumptions on the cost function, ψ will be differ-
entiable out of a very small set (of codimension at least 1); (b) c will
be superdifferentiable because it derives from a Lagrangian, and subd-
ifferentiable wherever ψ itself is differentiable; (c) where it exists, ∇x c
will be injective because c derives from a strictly convex Lagrangian.
The next three sections will be devoted to some basic reminders
about differentiability and regularity in a nonsmooth context. For the
convenience of the non-expert reader, I shall provide complete proofs
of the most basic results about these issues. Conversely, readers who
feel very comfortable with these notions can skip these sections.
∇̃f(x) = ∇f̃(x).
Proof that ∇̃f(x) is well-defined. Since this concept is local and invari-
ant by diffeomorphism, it is sufficient to treat the case when U is a
subset of R^n.
Let f̃_1 and f̃_2 be two measurable functions on U which are both
differentiable at x and coincide with f on a set of density 1. The problem
is to show that ∇f̃_1(x) = ∇f̃_2(x).
For each r > 0, let Z_r be the set of points in B_r(x) where either
f ≠ f̃_1 or f ≠ f̃_2. It is clear that vol[Z_r] = o(vol[B_r(x)]).
Since f̃_1 and f̃_2 are continuous at x, one can write
f̃_1(x) = lim_{r→0} (1/vol[B_r(x)]) ∫_{B_r(x)} f̃_1(z) dz
      = lim_{r→0} (1/vol[B_r(x) \ Z_r]) ∫_{B_r(x) \ Z_r} f̃_1(z) dz
      = lim_{r→0} (1/vol[B_r(x) \ Z_r]) ∫_{B_r(x) \ Z_r} f̃_2(z) dz
      = lim_{r→0} (1/vol[B_r(x)]) ∫_{B_r(x)} f̃_2(z) dz = f̃_2(x).
z ∉ Z_r =⇒ ⟨w, z − x⟩ = o(r).   (10.5)
Since the unit vector (z − x)/|z − x| can take arbitrary fixed values in
the unit sphere as z → x, it follows that p = q. Then
f(z) − f(x) = ⟨p, z − x⟩ + o(|z − x|),
Ok of Rn . Then, since all the concepts involved are local and invariant
under diffeomorphism, we may work in Ok . So in the sequel, I shall
pretend that U is a subset of Rn .
Let us start with the proof of (i). Let f be continuous on U , and
let V be an open subset of U ; the problem is to show that f admits
at least one point of subdifferentiability in V . So let x0 ∈ V , and let
r > 0 be so small that B(x0 , r) ⊂ V . Let B = B(x0 , r), let ε > 0
and let g be defined on B by g(x) := f (x) + |x − x0 |2 /ε. Since f is
continuous, g attains its minimum on B. But g on ∂B is bounded below
by r 2 /ε − M , where M is an upper bound for |f | on B. If ε < r 2 /(2M ),
then g(x0 ) = f (x0 ) ≤ M < r 2 /ε − M ≤ inf ∂B g; so g cannot achieve
its minimum on ∂B, and has to achieve it at some point x1 ∈ B. Then
g is subdifferentiable at x1 , and therefore f also. This establishes (i).
The other two statements are more tricky. Let f : U → R be a
Lipschitz function. For v ∈ Rn and x ∈ U , define
D_v f(x) := lim_{t→0} [ f(x + tv) − f(x) ] / t,   (10.6)
provided that this limit exists. The problem is to show that for almost
any x, there is a vector p(x) such that Dv f (x) = hp(x), vi and the limit
in (10.6) is uniform in, say, v ∈ S n−1 . Since the functions [f (x + tv) −
f (x)]/t are uniformly Lipschitz in v, it is enough to prove the pointwise
convergence (that is, the mere existence of Dv f (x)), and then the limit
will automatically be uniform by Ascoli’s theorem. So the goal is to
show that for almost any x, the limit Dv f (x) exists for any v, and is
linear in v.
It is easily checked that:
(a) Dv f (x) is homogeneous in v: Dtv f (x) = t Dv f (x);
(b) D_v f(x) is a Lipschitz function of v on its domain: in fact,
|D_v f(x) − D_w f(x)| ≤ L |v − w|, where L = ‖f‖_Lip;
(c) If D_w f(x) → ℓ as w → v, then D_v f(x) = ℓ; this comes from the
estimate
sup_t | [f(x + tv) − f(x)]/t − [f(x + tv_k) − f(x)]/t | ≤ ‖f‖_Lip |v − v_k|.
ζ ∗ [Dv+w f − Dv f − Dw f ] = 0.
for almost all x ∈ Rn \ (Av ∩ Aw ∩ Av+w ), that is, for almost all x ∈ Rn .
Now it is easy to conclude. Let B_{v,w} be the set of all x ∈ R^n such
that D_v f(x), D_w f(x) or D_{v+w} f(x) is not well-defined, or (10.7) does
not hold true. Let (v_k)_{k∈N} be a dense sequence in R^n, and let B :=
∪_{j,k∈N} B_{v_j,v_k}. Then B is still Lebesgue-negligible, and for each x ∉ B
we have
D_{v_j+v_k} f(x) = D_{v_j} f(x) + D_{v_k} f(x).   (10.8)
Since D_v f(x) is a Lipschitz continuous function of v, it can be extended
uniquely into a Lipschitz continuous function, defined for all x ∉ B
Since ψ(x0 ) is fixed and ψ(y) is bounded above, it follows that ψ(x) is
bounded below for x ∈ B.
Step 4: ψ is locally Lipschitz. Let x0 ∈ U , let V be a neighborhood of
x0 on which |ψ| ≤ M < +∞, and let r > 0 be such that Br (x0 ) ⊂ V .
For any y, y ′ ∈ Br/2 (x0 ), we can write y ′ = (1 − t) y + t z, where
t = |y − y ′ |/r, so z = (y − y ′ )/t + y ∈ Br (x0 ) and |y − z| = r. Then
ψ(y ′ ) ≤ (1 − t) ψ(y) + t ψ(z) + 2 t(1 − t) ω(|y − z|), so
where Σ (ℓ) is the set of points x such that ∇− ψ(x) contains a segment
[p, p′ ] of length 1/ℓ and |p| ≤ ℓ. To conclude, it is sufficient to show that
each Σ (ℓ) is countably (n − 1)-rectifiable, and for that it is sufficient to
show that for each x ∈ Σ (ℓ) the dimension of the tangent cone Tx Σ (ℓ)
is at most n − 1 (Theorem 10.48(i) in the First Appendix).
So let x ∈ Σ^{(ℓ)}, and let q ∈ T_x Σ^{(ℓ)}, q ≠ 0. By assumption, there are
a sequence x_k ∈ Σ^{(ℓ)} and positive numbers t_k → 0 such that
(x_k − x)/t_k −→ q.
In particular |x − x_k|/t_k converges to the finite, nonzero limit |q|.
For any k ∈ N, there is a segment [p_k, p′_k], of length ℓ^{−1}, that is
contained in ∇^−ψ(x_k); and |p_k| ≤ ℓ, |p′_k| ≤ ℓ + ℓ^{−1}. By compactness,
up to extraction of a subsequence one has x_k → x, p_k → p, p′_k → p′,
|p − p′| = ℓ^{−1}. By continuity of ∇^−ψ, both p and p′ belong to ∇^−ψ(x).
Then the two inequalities
ψ(x) ≥ ψ(x_k) + ⟨p′_k, x − x_k⟩ − ω(|x − x_k|),
ψ(x_k) ≥ ψ(x) + ⟨p, x_k − x⟩ − ω(|x − x_k|)
combine to yield
⟨p − p′_k, x − x_k⟩ ≥ −2 ω(|x − x_k|).
So
⟨ p − p′_k, (x − x_k)/t_k ⟩ ≥ −2 [ ω(|x − x_k|)/|x − x_k| ] [ |x − x_k|/t_k ].
Passing to the limit, we find
⟨p − p′, q⟩ ≥ 0.
Exchanging the roles of the primed and unprimed vectors gives the reverse inequality, so in fact
⟨p − p′, q⟩ = 0.
(x_k − x)/t_k −→ p;   (x′_k − x)/t′_k −→ p′   as k → ∞.
Then m(x_k, x′_k) ∈ D and m(x_k, x′_k) = (x_k + x′_k)/2 + o(|x_k − x′_k|) =
x + t_k (p_k + p′_k)/2 + o(t_k), so
(p + p′)/2 = lim_{k→∞} [ m(x_k, x′_k) − x ] / t_k ∈ T_x D.
Thus Tx D is a convex cone. This leaves two possibilities: either Tx D is
included in a half-space, or it is the whole of Rn .
Assume that Tx D = Rn . If C is a small (Euclidean) cube of side 2r,
centered at x0 , for r small enough any point in a neighborhood of x0 can
be written as a combination of barycenters of the vertices x1 , . . . , xN of
C, and all these barycenters will lie within a ball of radius 2r centered
Since x0 is fixed and ψ(y ′ ) is bounded above, this shows that ψ(y) is
bounded below as y varies in B.
Next, let us show that ψ is locally Lipschitz. Let V be a neigh-
borhood of x0 in which |ψ| is bounded by M . If r > 0 is small
enough, then for any y, y ′ ∈ Br (x0 ) there is z = z(y, y ′ ) ∈ V such
that y ′ = [y, z]λ , λ = d(y, y ′ )/4r ∈ [0, 1/2]. (Indeed, choose r so small
that all geodesics in B5r (x0 ) are minimizing, and B5r (x0 ) ⊂ V . Given
y and y ′ , take the geodesic going from y to y ′ , say expy (tv), 0 ≤ t ≤ 1;
extend it up to time t(λ) = 1/(1 − λ), write z = expy (t(λ) v). Then
d(x0 , z) ≤ d(x0 , y) + t(λ) d(y, y ′ ) ≤ d(x0 , y) + 2 d(y, y ′ ) < 5r.) So
ψ(y ′ ) ≤ (1 − λ) ψ(y) + λ ψ(z) + λ(1 − λ) ω(d(y, z)),
whence
Indeed, let γ(t) = expx (tw), y = expx w, then for any t ∈ [0, 1],
so
[ψ(γ(t)) − ψ(x)] / (t|w|) ≤ [ψ(y) − ψ(x)]/|w| + (1 − t) ω(|w|)/|w|.
On the other hand, by subdifferentiability,
[ψ(γ(t)) − ψ(x)] / (t|w|) ≥ ⟨p, tw⟩/(t|w|) − o(t|w|)/(t|w|) = ⟨p, w/|w|⟩ − o(t|w|)/(t|w|).
The combination of the two previous inequalities implies
⟨p, w/|w|⟩ − o(t|w|)/(t|w|) ≤ [ψ(y) − ψ(x)]/|w| + (1 − t) ω(|w|)/|w|.
The limit t → 0 gives (10.14). At the same time, it shows that for
|w| ≤ r,
⟨p, w/|w|⟩ ≤ ‖ψ‖_{Lip(B_{2r}(x_0))} + ω(r)/r.
By choosing w = rp, we conclude that |p| is bounded above, indepen-
dently of x. So ∇− ψ is locally bounded in the sense that there is a
uniform bound on the norms of all elements of ∇− ψ(x) when x varies
in a compact subset of the domain of ψ.
(H∞)1 For any x and for any measurable set S which does not “lie
on one side of x” (in the sense that Tx S is not contained in a
half-space) there is a finite collection of elements z1 , . . . , zk ∈
S, and a small open ball B containing x, such that for any
y outside of a compact set,
Remark 10.17. The requirements in (ii) and (iii) are fulfilled if the
Lagrangian L is time-independent, C^2, and strictly convex and superlinear as a
function of v (recall Example 7.5). But they also hold true for other
interesting cases such as L(x, v, t) = |v|^{1+α}, 0 < α < 1.
Remark 10.18. Part (i) of Proposition 10.15 means that the behavior
of the (squared) distance function is typical: if one plots c(x, y) as a
function of x, for fixed y, one will always see upward-pointing crests as
in Figure 10.1, never downward-pointing ones.
or equivalently, with h = x − y,
c(h) − c( h (1 − ε/|h|) ) −−→ +∞   as |h| → ∞.
c(h) − c(h_ε) ≥ ∇c(h_ε) · (ε h/|h|) = ε ∇c(h_ε) · (h_ε/|h_ε|) −−→ +∞   as |h| → ∞,
as desired.
We shall now return to optimal transport, and arrive at the core of the
analysis of the Monge problem: the study of the regularity of c-convex
∂_c ψ(x) ≠ ∅ =⇒ ∇^−ψ(x) ≠ ∅.
I shall use the notion of tangent cone defined later in Definition 10.46,
and show that if x ∈ S is such that Tx S is not included in a half-space,
then ψ is bounded on a small ball around x. It will follow that x is in
fact in the interior of Ω. So for each x ∈ S \ Ω, Tx S will be included
in a half-space, and by Theorem 10.48(ii) S \ Ω will be of dimension at
most n − 1. Moreover, this will show that ψ is locally bounded in Ω.
So let x be such that ψ(x) < +∞, and Tx S is not included in a
half-space. By assumption, there are points z1 , . . . , zk in S, a small ball
B around x, and a compact set K ⊂ Y such that for any y ∈ Y \ K,
So if y ∉ K, there is a z such that ψ(z) + c(z, y) ≤ c(x, y) − (M + 1) ≤
ψ(x) + c(x, y) − 1, and
φ(y) − c(x, y) ≤ inf_{z∈B} [ ψ(z) + c(z, y) ] − c(x, y)
   ≤ ψ(x) − 1 = sup_{y′∈Y} [ φ(y′) − c(x, y′) ] − 1.
Then the supremum of φ(y) − c(x, y) over all Y is the same as the
supremum over only K. But this is a maximization problem for an
upper semicontinuous function on a compact set, so it admits a solution,
which belongs to ∂c ψ(x).
The same reasoning can be made with x replaced by w in a small
neighborhood B of x, then the conclusion is that ∂c ψ(w) is nonempty
and contained in the compact set K, uniformly for z ∈ B. If K ′ ⊂ Ω
is a compact set, we can cover it by a finite number of small open
balls Bj such that ∂c ψ(Bj ) is contained in a compact set Kj , so that
∂c ψ(K ′ ) ⊂ ∪Kj . Since on the other hand ∂c ψ(K ′ ) is closed by the
continuity of c, it follows that ∂c ψ(K ′ ) is compact. This concludes the
proof of Theorem 10.24. ⊓⊔
leads to
ψ(γ_t) = sup_y [ φ(y) − c(γ_t, y) ]
   ≤ sup_y [ φ(y) − (1 − t) c(γ_0, y) − t c(γ_1, y) + t(1 − t) ω(d(γ_0, γ_1)) ]
   = sup_y [ (1 − t)(φ(y) − c(γ_0, y)) + t (φ(y) − c(γ_1, y)) ] + t(1 − t) ω(d(γ_0, γ_1))
   ≤ (1 − t) sup_y [ φ(y) − c(γ_0, y) ] + t sup_y [ φ(y) − c(γ_1, y) ] + t(1 − t) ω(d(γ_0, γ_1))
   = (1 − t) ψ(γ_0) + t ψ(γ_1) + t(1 − t) ω(d(γ_0, γ_1)).
Remark 10.27. Theorems 10.24 to 10.26, and (in the Lagrangian case)
Proposition 10.15 provide a good picture of differentiability points of
This section and the next one deal with extensions of Theorem 10.28.
Here we shall learn how to cover situations in which no control at
infinity is assumed, and in particular Assumption (iii) of Theorem 10.28
might not be satisfied. The short answer is that it is sufficient to replace
the gradient in (10.20) by an approximate gradient. (Actually a little
bit more will be lost, see Remarks 10.39 and 10.40 below.)
determines the unique optimal coupling between µℓ and νℓ , for the cost
cℓ . (Note that ∇x cℓ coincides with ∇x c when x is in the interior of Bℓ ,
and µℓ [∂Bℓ ] = 0, so equation (10.24) does hold true πℓ -almost surely.)
Now we can define our Monge coupling. For each ℓ ∈ N, and each
x ∈ S̃_ℓ \ Z_ℓ, ψ_ℓ coincides with ψ on a set which has density 1 at x, so
∇ψ_ℓ(x) = ∇̃ψ(x), and (10.24) becomes
∇_x c(x, y) + ∇̃ψ(x) = 0.   (10.25)
then
∇̃ψ(x) + ∇_x c(x, y) = 0   π(dx dy)-almost surely.   (10.28)
µ[S_ℓ ∩ B_r(x)] / µ[B_r(x)] −−→ 1   as r → 0.
(The proof of this uses the fact that we are working on a Riemannian
manifold; see the bibliographical notes for more information.)
Moreover, the transport plan πℓ induced by π on Sℓ coincides with
the deterministic transport associated with the map
So at least one of the sets {ψ_ℓ < ψ̃_ℓ} ∩ B_r(x) and {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x)
has µ-measure at least µ[B_r(x)]/2. Without loss of generality, I shall
assume that this is the set {ψ_ℓ > ψ̃_ℓ}; so
µ[ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] ≥ µ[B_r(x)]/2.   (10.30)
Next, ψ_ℓ coincides with ψ on the set S_ℓ, which has µ-density 1 at x,
and similarly ψ̃_ℓ coincides with ψ̃ on a set of µ-density 1 at x. It follows
that
µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] ≥ (µ[B_r(x)]/2) (1 − o(1))   as r → 0.   (10.31)
Then since x is a Besicovich point of {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ C_ℓ ∩ C̃_ℓ,
µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ B_r(x) ]
   ≥ µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ B_r(x) ] − µ[ B_r(x) \ (C_ℓ ∩ C̃_ℓ) ]
   ≥ µ[B_r(x)] ( 1/2 − o(1) − o(1) ).
As a conclusion,
∀r > 0   µ[ {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ B_r(x) ] > 0.   (10.32)
Now let
A := {ψ > ψ̃}.
The proof will result from the next two claims:
Claim 1: T̃^{−1}(T(A)) ⊂ A;
Claim 2: The set {ψ_ℓ > ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ) ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ T̃^{−1}(T(A))
lies a positive distance away from x.
Let us postpone the proofs of these claims for a while, and see why
they imply the theorem. Let S ⊂ A be defined by
S := {ψ > ψ̃} ∩ {ψ_ℓ > ψ̃_ℓ} ∩ {∇ψ_ℓ ≠ ∇ψ̃_ℓ} ∩ (C_ℓ ∩ C̃_ℓ),
and let
r := d( x, S ∩ T̃^{−1}(T(A)) ) / 2.
On the one hand, since S ∩ T̃^{−1}(T(A)) ∩ B(x, r) = ∅ by definition of r, we have µ[S ∩ B(x, r) ∩
T̃^{−1}(T(A))] = µ[∅] = 0. On the other hand, r is positive by Claim 2,
so µ[S ∩ B(x, r)] > 0 by (10.32). Then
µ[ A \ T̃^{−1}(T(A)) ] ≥ µ[ S ∩ B(x, r) \ T̃^{−1}(T(A)) ] = µ[S ∩ B(x, r)] > 0.
(Recall that x is such that ψ_ℓ(x) = ψ(x) = ψ̃(x) = ψ̃_ℓ(x).) On the
other hand, since ψ_ℓ is differentiable at x, we have
which is possible only if ∇ψ̃_ℓ(x) − ∇ψ_ℓ(x) = 0. But this contradicts the
definition of x. So Claim 2 holds true, and this concludes the proof of
Theorem 10.42. ⊓⊔
The next theorem summarizes two results which were useful in the
present chapter:
Proof of Theorem 10.48. For each x ∈ S, let πx stand for the orthogonal
projection on Tx S, and let πx⊥ = Id − πx stand for the orthogonal
projection on (Tx S)⊥ . I claim that
1 1 + 2δ
|x− x′ | ≤ =⇒ |πℓ⊥ (x− x′ )| ≤ L |πℓ (x− x′ )|, L= ; (10.39)
k 1 − 2δ
this will imply that the intersection of Skℓ with a ball of diameter 1/k
is contained in an L-Lipschitz graph over πℓ (Rn ), and the conclusion
will follow immediately.
To prove (10.39), note that, if π, π′ are any two orthogonal projec-
tions, then |(π^⊥ − (π′)^⊥)(z)| = |(Id − π)(z) − (Id − π′)(z)| = |(π − π′)(z)|,
therefore ‖π^⊥ − (π′)^⊥‖ = ‖π − π′‖, and
∀x ∈ ∂S, ∃r > 0, ∃ν ∈ F, ∀y ∈ ∂S ∩ B_r(x),   ⟨y − x, ν⟩ ≤ |y − x|/2.   (10.40)
Indeed, otherwise there is x ∈ ∂S such that for all k ∈ N and for all
ν ∈ F there is y_k ∈ ∂S such that |y_k − x| ≤ 1/k and ⟨y_k − x, ν⟩ >
|y_k − x|/2. By assumption there is ξ ∈ S^{n−1} such that
∀ζ ∈ T_x S,   ⟨ξ, ζ⟩ ≤ 0.
Let ν ∈ F be such that |ξ − ν| < 1/8 and let (y_k)_{k∈N} be a sequence as
above. Since y_k ∈ ∂S and y_k ≠ x, there is y′_k ∈ S such that |y_k − y′_k| <
|y_k − x|/8. Then
⟨y′_k − x, ξ⟩ ≥ ⟨y_k − x, ν⟩ − |y_k − x| |ξ − ν| − |y_k − y′_k| ≥ |y_k − x|/4 ≥ |x − y′_k|/8.
So
⟨ (y′_k − x)/|y′_k − x|, ξ ⟩ ≥ 1/8.   (10.41)
Up to extraction of a subsequence, (y′_k − x)/|y′_k − x| converges to some
ζ ∈ T_x S, and then by passing to the limit in (10.41) we have ⟨ζ, ξ⟩ ≥
x = ζ(x′ , y),
x ∈ M ′ ⇐⇒ y = ϕ(x′ ).
Fig. 10.2. k-dimensional graph
and
Z = (−β, β) × Z ′ ; D = V \ Z.
I claim that λ_n[Z] = 0. To prove this it is sufficient to check that
λ_{n−1}[Z′] = 0. But Z′ is the nonincreasing limit of (Z′_ℓ)_{ℓ∈N}, where
Z′_ℓ = { y ∈ B(0, r_0);  λ_1[ { t ∈ (−β, β); ∃ i; ∇f_i(t, y) does not exist } ] ≥ 1/ℓ }.
By Fubini's theorem,
λ_n[ { x ∈ O; ∇f_i(x) does not exist for some i } ] ≥ (λ_{n−1}[Z′_ℓ]) × (1/ℓ);
and the left-hand side is equal to 0 since all f_i are differentiable almost
everywhere. It follows that λ_{n−1}[Z′_ℓ] = 0, and by taking the limit ℓ → ∞
we obtain λ_{n−1}[Z′] = 0.
Let f = Σ_i f_i, and let ∂_1 f = ⟨∇f, v⟩ stand for its partial derivative
with respect to the first coordinate. The first step of the proof has
shown that ∂1 f (x) ≥ α/2 at each point x where all functions fi are
This holds true for all ((t, y), (t′ , y)) in D × D. Since Z = V \ D
has zero Lebesgue measure, D is dense in V , so (10.43) extends to
all ((t, y), (t′ , y)) ∈ V .
For all y ∈ B(0, r0 ), inequality (10.43), combined with the estimate
|f(0, y)| = |f(0, y) − f(0, 0)| ≤ ‖f‖_Lip |y| ≤ αβ/4,
guarantees that the equation f (t, y) = 0 admits exactly one solution
t = ϕ(y) in (−β, β).
It only remains to check that ϕ is Lipschitz on B(0, r0 ). Let y, z ∈
B(0, r0 ), then f (ϕ(y), y) = f (ϕ(z), z) = 0, so
f (ϕ(y), y) − f (ϕ(z), y) = f (ϕ(z), z) − f (ϕ(z), y). (10.44)
Since the first partial derivative of f is no less than α/2, the left-hand
side of (10.44) is bounded below by (α/2)|ϕ(y) − ϕ(z)|, while the right-
hand side is bounded above by kf kLip |z − y|. The conclusion is that
|ϕ(y) − ϕ(z)| ≤ (2 ‖f‖_Lip / α) |z − y|,
so ϕ is indeed Lipschitz. ⊓⊔
where σ(t) is the sectional curvature of the plane generated by γ̇(t) and
ei (t) inside Tγ(t) M . To relate k(t) and h(t), we note that
∇_x [ d(y, x)^2/2 ] = d(y, x) ∇_x d(y, x);
∇_x^2 [ d(y, x)^2/2 ] = d(y, x) ∇_x^2 d(y, x) + ∇_x d(x, y) ⊗ ∇_x d(x, y).
By applying this to the tangent vector ei (t) and using the fact that
∇x d(x, y) at x = γ(t) is just γ̇(t), we get
From (10.46) follow the two comparison results which were used in
Theorem 10.41 and Corollary 10.44:
(a) Assume that the sectional curvatures of M are all non-
negative. Then (10.46) forces ḣ ≤ 0, so h remains bounded above by 1
for all times. In short:
nonnegative sectional curvature   =⇒   ∇_x^2 [ d(x, y)^2/2 ] ≤ Id_{T_xM}.   (10.47)
(If we think of the Hessian as a bilinear form, this is the same as
∇2x (d(x, y)2 /2) ≤ g, where g is the Riemannian metric.) Inequal-
ity (10.47) is rigorous if d(x, y)2 /2 is twice differentiable at x; otherwise
the conclusion should be reinterpreted as
x ↦ d(x, y)^2/2 is semiconcave with a modulus ω(r) = r^2/2.
x ↦ d(x, y)^2/2 is semiconcave with a modulus ω(r) = C(K) r^2/2.
Examples 10.53. The previous result applies to any compact mani-
fold, or any manifold which has been obtained from Rn by modification
on a compact set. But it does not apply to the hyperbolic space Hn ;
in fact, if y is any given point in Hn , then the function x → d(y, x)2 is
not uniformly semiconcave as x → ∞. (Take for instance the unit disk
in R2 , with polar coordinates (r, θ) as a model of H2 , then the distance
from the origin is d(r, θ) = log((1 + r)/(1 − r)); an explicit computation
shows that the first (and only nonzero) coefficient of the matrix of the
Hessian of d2 /2 is 1 + r d(r), which diverges logarithmically as r → 1.)
Bibliographical notes
The key ideas in this chapter were first used in the case of the quadratic
cost function in Euclidean space [154, 156, 722].
The existence of solutions to the Monge problem and the differ-
entiability of c-convex functions, for strictly superlinear convex cost
functions in Rn (other than quadratic) was investigated by several au-
thors, including in particular Rüschendorf [717] (formula (10.4) seems
to appear there for the first time), Smith and Knott [754], Gangbo and
McCann [398, 399]. In the latter reference, the authors get rid of all mo-
ment assumptions by avoiding the explicit use of Kantorovich duality.
These results are reviewed in [814, Chapter 2]. Gangbo and McCann
impose some assumptions of growth and superlinearity, such as the one
tance on the Wiener space; then usual strategies seem to fail, in the
first place because of (non)measurability issues [23].
The optimal transport problem with a distance cost function is
also related to the irrigation problem studied recently by various au-
thors [109, 110, 111, 112, 152], the Bouchitté–Buttazzo variational
problem [147, 148], and other problems as well. In this connection,
see also Pratelli [689].
The partial optimal transport problem, where only a fixed fraction
of the mass is transferred, was studied in [192, 365]. Under adequate as-
sumptions on the cost function, one has the following results: whenever
the transferred mass is at least equal to the shared mass between the
measures µ and ν, then (a) there is uniqueness of the partial transport
map; (b) all the shared mass is at the same time both source and target;
(c) the “active” region depends monotonically on the mass transferred,
and is the union of the intersection of the supports and a semiconvex
set.
To conclude, here are some remarks about the technical ingredients
used in this chapter.
Rademacher [697] proved his theorem of almost everywhere differ-
entiability in 1918, for Lipschitz functions of two variables; this was
later generalized to an arbitrary number of variables. The simple ar-
gument presented in this section seems to be due to Christensen [233];
it can also be found, up to minor variants, in modern textbooks about
real analysis such as the one by Evans and Gariepy [331, pp. 81–84].
Ambrosio showed me another simple argument which uses Lebesgue’s
density theorem and the identification of a Lipschitz function with a
function whose distributional derivative is essentially bounded.
The book by Cannarsa and Sinestrari [199] is an excellent reference
for semiconvexity and subdifferentiability in Rn , as well as the links
with the theory of Hamilton–Jacobi equations. It is centered on semi-
concavity rather than semiconvexity (and superdifferentiability rather
than subdifferentiability), but this is just a question of convention.
Many regularity results in this chapter have been adapted from that
source (see in particular Theorem 2.1.7 and Corollary 4.1.13 there).
Also the proof of Theorem 10.48(i) is adapted from [199, Theorem 4.1.6
and Corollary 4.1.9]. The core results in this circle of ideas and tools
can be traced back to a pioneering paper by Alberti, Ambrosio and
Cannarsa [12]. Following Ambrosio’s advice, I used the same methods
to establish Theorem 10.48(ii) in the present notes.
There are two important things that one should check before writing
the Jacobian equation: First, T should be injective on its domain of
definition; secondly, it should possess some minimal regularity.
So how smooth should T be for the Jacobian equation to hold true?
We learn in elementary school that it is sufficient for T to be continu-
ously differentiable, and a bit later that it is actually enough to have T
Lipschitz continuous. But that degree of regularity is not always avail-
able in optimal transport! As we shall see in Chapter 12, the transport
map T might fail to be even continuous.
There are (at least) three ways out of this situation:
(i) Only use the Jacobian equation in situations where the opti-
mal map is smooth. Such situations are rare; this will be discussed in
Chapter 12.
(ii) Only use the Jacobian equation for the optimal map between
µt0 and µt , where (µt )0≤t≤1 is a compactly supported displacement
If T_{t_0→t} = T_t ∘ T_{t_0}^{−1} stands for the transport map between µ_{t_0} and µ_t,
then the equation
also holds true for t0 ∈ (0, 1); but now this is just the theorem of change
of variables for Lipschitz maps.
Then,
∫_M F(y, f_t(y)) vol(dy) = ∫_M F( T_{t_0→t}(x), f_{t_0}(x)/J_{t_0→t}(x) ) J_{t_0→t}(x) vol(dx).
Furthermore, µt0 (dx)-almost surely, Jt0 →t (x) > 0 for all t ∈ [0, 1].
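Before the proof, here is a minimal numerical sketch (added for this text, with illustrative densities of my own choosing) of the Jacobian equation in its most elementary setting: in one dimension, for the quadratic cost, the optimal map is the monotone rearrangement T = G^{−1} ∘ F, where F and G are the cumulative distribution functions of the two measures, and the Jacobian equation reduces to f(x) = g(T(x)) T′(x).

    import numpy as np

    # Source mu: uniform density f = 1 on [0, 1]; target nu: density g(y) = 2y on [0, 1].
    f = lambda x: np.ones_like(x)
    g = lambda y: 2.0 * y
    F = lambda x: x                      # cdf of mu
    G_inv = lambda u: np.sqrt(u)         # inverse cdf of nu, since G(y) = y^2
    T = lambda x: G_inv(F(x))            # monotone (optimal) map: T(x) = sqrt(x)

    x = np.linspace(0.05, 0.95, 10)
    h = 1e-6
    T_prime = (T(x + h) - T(x - h)) / (2 * h)        # numerical derivative of T
    print(np.max(np.abs(f(x) - g(T(x)) * T_prime)))  # ~1e-9: f(x) = g(T(x)) T'(x) holds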
Proof of Theorem 11.3. For brevity I shall abbreviate vol(dx) into just
dx. Let us first consider the case when (µt )0≤t≤1 is compactly sup-
ported. Let Π be a probability measure on the set of minimizing curves,
such that µt = (et )# Π. Let Kt = et (Spt Π) and Kt0 = et0 (Spt Π). By
Theorem 8.5, the map γt0 → γt is well-defined and Lipschitz for all
γ ∈ Spt Π. So Tt0 →t (γt0 ) = γt is a Lipschitz map Kt0 → Kt . By as-
sumption µt is absolutely continuous, so Theorem 10.28 (applied with
the cost function ct0 ,t (x, y), or maybe ct,t0 (x, y) if t < t0 ) guarantees
that the coupling (γt , γt0 ) is deterministic, which amounts to saying
that γt0 → γt is injective apart from a set of zero probability.
Then we can use the change of variables formula with g = 1Kt ,
T = Tt0 →t , and we find f (x) = Jt0 →t (x). Therefore, for any nonnegative
measurable function G on M ,
∫_{K_t} G(y) dy = ∫_{K_t} G(y) d((T_{t_0→t})_# µ)(y)
   = ∫_{K_{t_0}} (G ∘ T_{t_0→t})(x) f(x) dx
   = ∫_{K_{t_0}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx.
We can apply this to G(y) = F (y, ft (y)) and replace ft (Tt0 →t (x)) by
ft0 (x)/Jt0 →t (x); this is allowed since in the right-hand side the contri-
bution of those x with ft (Tt0 →t (x)) = 0 is negligible, and Jt0 →t (x) = 0
implies (almost surely) ft0 (x) = 0. So in the end
∫_{K_t} F(y, f_t(y)) dy = ∫_{K_{t_0}} F( T_{t_0→t}(x), f_{t_0}(x)/J_{t_0→t}(x) ) J_{t_0→t}(x) dx.
Since this is true almost surely on Kℓ′ , for each ℓ′ , it is also true almost
surely.
Next, for any nonnegative measurable function G, by monotone con-
vergence and the first part of the proof one has
∫_{∪_ℓ K_{t,ℓ}} G(y) dy = lim_{ℓ→∞} ∫_{K_{t,ℓ}} G(y) dy
   = lim_{ℓ→∞} ∫_{K_{t_0,ℓ}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx
   = ∫_{∪_ℓ K_{t_0,ℓ}} G(T_{t_0→t}(x)) J_{t_0→t}(x) dx.
maps F1 : γ(t0 ) → (γ(0), γ(t0 )), F2 : (γ(0), γ(t0 )) → (γ(0), γ̇(0)) and
F3 : (γ(0), γ̇(0)) → γ(t). Both F2 and F3 have a positive Jacobian de-
terminant, at least if t < 1; so if x is chosen in such a way that F1 has
a positive Jacobian determinant at x, then also Tt0 →t = F3 ◦ F2 ◦ F1
will have a positive Jacobian determinant at x for t ∈ [0, 1). ⊓⊔
Bibliographical notes
Theorem 11.1 can be obtained (in Rn ) by combining Lemma 5.5.3
in [30] with Theorem 3.83 in [26].
In the context of optimal transport, the change of variables for-
mula (11.1) was proven by McCann [614]. His argument is based on
Lebesgue’s density theory, and takes advantage of Alexandrov’s the-
orem, alluded to in this chapter and proven later as Theorem 14.25:
A convex function admits a Taylor expansion at order 2 at almost
each x in its domain of definition. Since the gradient of a convex func-
tion has locally bounded variation, Alexandrov’s theorem can be seen
essentially as a particular case of the theorem of approximate differ-
entiability of functions with bounded variation. McCann’s argument is
reproduced in [814, Theorem 4.8].
Along with Cordero-Erausquin and Schmuckenschläger, McCann
later generalized his result to the case of Riemannian manifolds [246].
Modulo certain complications, the proof basically follows the same pat-
tern as in Rn . Then Cordero-Erausquin [243] treated the case of strictly
convex cost functions in Rn in a similar way.
Ambrosio pointed out that those results could be retrieved within
the general framework of push-forward by approximately differentiable
mappings. This point of view has the disadvantage of involving more
subtle arguments, but the advantage of showing that it is not a special
feature of optimal transport. It also applies to nonsmooth cost functions
such as |x − y|p . In fact it covers general strictly convex costs of the
form c(x− y) as soon as c has superlinear growth, is C 1 everywhere and
C 2 out of the origin. A more precise discussion of these subtle issues
can be found in [30, Section 6.2.1].
It is a general feature of optimal transport with strictly convex cost
in Rn that if T stands for the optimal transport map, then the Jacobian
matrix ∇T , even if not necessarily nonnegative symmetric, is diagonal-
izable with nonnegative eigenvalues; see Cordero-Erausquin [243] and
Ambrosio, Gigli and Savaré [30, Section 6.2]. From an Eulerian perspec-
tive, that diagonalizability property was already noticed by Otto [666,
Proposition A.4]. I don’t know if there is an analog on Riemannian
manifolds.
Changes of variables of the form y = expx (∇ψ(x)) (where ψ is
not necessarily d2 /2-convex) have been used in a remarkable paper
by Cabré [181] to investigate qualitative properties of nondivergent el-
liptic equations (Liouville theorem, Alexandrov–Bakelman–Pucci esti-
mates, Krylov–Safonov–Harnack inequality) on Riemannian manifolds
with nonnegative sectional curvature. (See for instance [189, 416, 786]
for classical proofs in Rn .) It is mentioned in [181] that the methods
extend to sectional curvature bounded below. For the Harnack inequal-
ity, Cabré’s method was extended to nonnegative Ricci curvature by
S. Kim [516].
12
Smoothness
det ∇^2ψ(x) = f(x) / g(∇ψ(x)).   (12.4)
Caffarelli’s counterexample
supports, and yet the optimal transport between µ(dx) = f (x) dx and
ν(dy) = g(y) dy, for the cost c(x, y) = |x − y|2 , is discontinuous.
Proof of Theorem 12.3. Let f be the indicator function of the unit ball
B in R2 (normalized to be a probability measure), and let g = gε be
the (normalized) indicator function of a set Cε obtained by first sepa-
rating the ball into two halves B1 and B2 (say with distance 2), then
building a thin bridge between those two halves, of width O(ε). (See
Figure 12.1.) Let also g be the normalized indicator function of B1 ∪B2 :
this is the limit of gε as ε ↓ 0. It is not difficult to see that gε (identified
with a probability measure) can be obtained from f by a continuous
deterministic transport (after all, one can deform B continuously into
Cε ; just think that you are playing with clay, then it is possible to mas-
sage the ball into Cε , without tearing off). However, we shall see here
that for ε small enough, the optimal transport cannot be continuous.
Fig. 12.1. Principle behind Caffarelli’s counterexample. The optimal transport from
the ball to the “dumb-bells” has to be discontinuous, and in effect splits the upper
region S into the upper left and upper right regions S− and S+ . Otherwise, there
should be some transport along the dashed lines, but for some lines this would
contradict monotonicity.
Loeper’s counterexample
Fig. 12.2. Principle behind Loeper’s counterexample. This is the surface S, im-
mersed in R3 , “viewed from above”. By symmetry, O has to stay in place. Because
most of the initial mass is close to A+ and A− , and most of the final mass is close
to B+ and B− , at least some mass has to move from one of the A-balls to one
of the B-balls. But then, because of the modified (negative curvature) Pythagoras
inequality, it is more efficient to replace the transport scheme (A → B, O → O), by
(A → O, O → B).
X and g on Y such that any optimal transport map from f vol to g vol
is discontinuous.
Proof of Theorem 12.7. Note first that the continuity of the cost function and the compactness of X × Y imply the continuity of ψ. In the
proof, the notation d will be used interchangeably for the distances on
X and Y.
Let C1 and C2 be two disjoint connected components of ∂c ψ(x).
Since ∂c ψ(x) is closed, C1 and C2 lie a positive distance apart. Let
r = d(C1 , C2 )/5, and let C = {y ∈ Y; d(y, ∂c ψ(x)) ≥ 2 r}. Further, let
y 1 ∈ C1 , y 2 ∈ C2 , B1 = Br (y 1 ), B2 = Br (y 2 ). Obviously, C is compact,
and any path going from B1 to B2 has to go through C.
Then let K = {z ∈ X ; ∃ y ∈ C; y ∈ ∂c ψ(z)}. It is clear that K is
compact: Indeed, if zk ∈ K converges to z ∈ X , and yk ∈ ∂c ψ(zk ), then
without loss of generality yk converges to some y ∈ C and one can pass
to the limit in the inequality ψ(zk ) + c(zk , yk ) ≤ ψ(t) + c(t, yk ), where
t ∈ X is arbitrary. Also K is not empty since X and Y are compact.
Next I claim that for any y ∈ C, for any x such that y ∈ ∂c ψ(x), and for any i ∈ {1, 2},

ψ(x) + c(x, y_i) ≥ inf_{x̃∈X} [ ψ(x̃) + c(x̃, y_i) ] + ε   (12.8)

for some ε > 0. Indeed, y_i ∉ ∂c ψ(x), so inf_{x̃∈X} [ψ(x̃) + c(x̃, y_i)] < ψ(x) + c(x, y_i); the uniform bound (12.8) then follows easily from a contradiction argument based on the compactness of C and K, and once again the continuity of ∂c ψ.
Then let δ ∈ (0, r) be small enough that for any (x, y) ∈ K × C, for any i ∈ {1, 2}, the inequalities d(x, x′) ≤ 2δ, d(y, y′) ≤ 2δ imply

|c(x′, y′) − c(x, y)| ≤ ε/10,    |c(x′, y′_i) − c(x, y_i)| ≤ ε/10.   (12.11)
Let K δ = {x ∈ X ; d(x, K) ≤ δ}. From the assumptions on X and Y,
Bδ (x) has positive volume, so we can fix a smooth positive probability
density f on X such that the measure µ = f vol satisfies
µ[Bδ(x)] ≥ 3/4;    f ≥ ε0 > 0 on K^δ.   (12.12)
Also we can construct a sequence of smooth positive probability den-
sities (gk )k∈N on Y such that the measures νk = gk vol satisfy
νk → (1/2)(δ_{y1} + δ_{y2})   weakly as k → ∞.   (12.13)
Let us assume the existence of a continuous optimal transport Tk sending µ to νk , for any k. We shall reach a contradiction, and this
will prove the theorem.
From (12.13), νk [Bδ (y 1 )] ≥ 1/3 for k large enough. Then by (12.12)
the transport Tk has to send some mass from Bδ (x) to Bδ (y 1 ), and
similarly from Bδ (x) to Bδ (y 2 ). Since Tk (Bδ (x)) is connected, it has to
meet C. So there are yk ∈ C and x′k ∈ Bδ (x) such that Tk (x′k ) = yk .
Let xk ∈ K be such that yk ∈ ∂c ψ(xk ). Without loss of generality
we may assume that xk → x∞ ∈ K as k → ∞. By the second part
of (12.12), m := µ[Bδ (x∞ )] ≥ ε0 vol [Bδ (x∞ )] > 0. When k is large
enough, νk [Bδ (y 1 ) ∪ Bδ (y 2 )] > 1 − m (by (12.13) again), so Tk has to
send some mass from Bδ (x∞ ) to either Bδ (y 1 ) or Bδ (y 2 ); say Bδ (y 1 ).
In other words, there is some x′k ∈ Bδ (x∞ ) such that Tk (x′k ) ∈ Bδ (y 1 ).
Let us recapitulate: for k large enough,
Before stating the main definition of this section, I shall now in-
troduce some more notation. If X is a closed subset of a Riemannian manifold M and c : X × Y → R is a continuous cost function, for
any x in the interior of X I shall denote by Dom ′ (∇x c(x, · )) the in-
terior of Dom (∇x c(x, · )). Moreover I shall write Dom ′ (∇x c) for the
union of all sets {x} × Dom ′ (∇x c(x, · )), where x varies in the inte-
rior of X . For instance, if X = Y = M is a complete Riemannian
manifold and c(x, y) = d(x, y)2 is the square of the geodesic distance,
then Dom ′ (∇x c) is obtained from M × M by removing the cut locus,
while Dom (∇x c) might be slightly bigger (these facts are recalled in
the Appendix).
∇⁻_c ψ(x) = ∇⁻ψ(x),
where (x, y0 ) and (x, y1 ) belong to Dom ′ (∇x c). (The functions ψx,y0 ,y1
play in some sense the role of x → |x1 | in usual convexity theory.) See
Figure 12.3 for an illustration of the resulting “recipe”.
Fig. 12.3. Regular cost function. Take two cost-shaped mountains peaked at y0
and y1 , let x be a pass, choose an intermediate point yt on (y0 , y1 )x , and grow a
mountain peaked at yt from below; the mountain should emerge at x. (Note: the
shape of the mountain is the negative of the cost function.)
Proof of Proposition 12.15. Let us start with (i). The necessity of the
condition is obvious since y0 , y1 both belong to ∂c ψx,y0 ,y1 (x). Con-
versely, if the condition is satisfied, let ψ be any c-convex function
X → R, let x belong to the interior of X and let y0 , y1 ∈ ∂c ψ(x). By
adding a suitable constant to ψ (which will not change the subdifferen-
tial), we may assume that ψ(x) = 0. Since ψ is c-convex, ψ ≥ ψx,y0 ,y1 ,
so for any t ∈ [0, 1] and x ∈ X ,
E(∇⁻ψ(x)) ⊂ ∇⁻_c ψ(x) ⊂ ∇⁻ψ(x),
The next result will show that the regularity property of the cost
function is a necessary condition for a general theory of regularity of
optimal transport. In view of Remark 12.19 it is close in spirit to The-
orem 12.7.
Remark 12.24. One can refine Proposition 10.15 to show that cost
functions deriving from a well-behaved Lagrangian do satisfy the strong
twist condition. In the Appendix I shall give more details for the im-
portant particular case of the squared geodesic distance.
by the formula
Sc(x, y)(ξ, η) = (3/2) Σ_{ijkℓrs} ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) ξ^i ξ^j η^k η^ℓ.   (12.20)
In other words, c-expx (p) is the unique y such that ∇x c(x, y)+p = 0.
When c(x, y) = d(x, y)2 /2 on X = Y = M , a complete Riemannian
manifold, one recovers the usual exponential of Riemannian geometry,
whose domain of definition can be extended to the whole of TM . More
generally, if c comes from a time-independent Lagrangian, under suit-
able assumptions the c-exponential can be defined as the solution at
time 1 of the Lagrangian system starting at x with initial velocity v,
in such a way that ∇v L(x, v) = p.
Then, with the notation p = −∇x c(x, y), we have the following
reformulation of the c-curvature operator:
Sc(x, y)(ξ, η) = −(3/2) (d²/ds²)(d²/dt²)|_{s=t=0} c( expx(tξ), c-expx(p + sη) ),   (12.21)
where η in the right-hand side is an abuse of notation for the tangent
vector at x obtained from η by the operation of −∇2xy c(x, y) (viewed
as an operator Ty M → Tx M ). In other words, Sc is obtained by
differentiating the cost function c(x, y) twice with respect to x and
twice with respect to p, not with respect to y. Getting formula (12.20)
from (12.21) is just an exercise, albeit complicated, in classical differ-
ential calculus; it involves the differentiation formula for the matrix
inverse: d(M −1 ) · H = −M −1 HM −1 .
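(As a sanity check, recorded here because it is used tacitly: this differentiation formula follows at once by differentiating the identity M M⁻¹ = I in the direction H:

0 = H M⁻¹ + M ( d(M⁻¹)·H ),   so   d(M⁻¹)·H = −M⁻¹ H M⁻¹.)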
To establish (12.22), first note that for any fixed small t, the geodesic
curve joining x to expx (tξ) is orthogonal to s → expx (sη) at s = 0, so
(d/ds)s=0 F (t, s) = 0. Similarly (d/dt)t=0 F (t, s) = 0 for any fixed s, so
the Taylor expansion of F at (0, 0) takes the form
d( expx(tξ), expx(sη) )² / 2 = A t² + B s² + C t⁴ + D t²s² + E s⁴ + O(t⁶ + s⁶).
Since F (t, 0) and F (0, s) are quadratic functions of t and s respectively,
necessarily C = E = 0, so Sc (x, x) = −6D is −6 times the coefficient of
t4 in the expansion of d(expx (tξ), expx (tη))2 /2. Then the result follows
from formula (14.1) in Chapter 14.
Remark 12.31. Formula (12.21) shows that Sc (x, y) is intrinsic, in
the sense that it is independent of any choice of geodesic coordinates
(this was not obvious from Definition 12.20). However, the geometric
interpretation of Sc is related to the regularity property, which is in-
dependent of the choice of Riemannian structure; so we may suspect
that the choice to work in geodesic coordinates is nonessential. It turns
out indeed that Definition 12.20 is independent of any choice of coor-
dinates, geodesic or not: We may apply Formula (12.20) by just letting
ci,j = ∂ 2 c(x, y)/∂xi ∂yj , pi = −ci (partial derivative of c(x, y) with
respect to xi ), and replace (12.21) by
Sc(x, y)(ξ, η) = −(3/2) (∂²/∂p²_η)(∂²/∂x̄²_ξ) c( x̄, c-expx(p) ) |_{x̄=x, p=−dx c(x,y)}   (12.23)
          = −(3/2) (∂²/∂p²_η)(∂²/∂q²_ξ) c( č-expy(q), c-expx(p) ) |_{p=−dx c(x,y), q=−dy c(x,y)}.   (12.24)
Proof of Theorem 12.35. Let us start with some reminders about clas-
sical convexity in Euclidean space. If Ω is an open set in Rn with C 2
boundary and x ∈ ∂Ω, let Tx Ω stand for the tangent space to ∂Ω at x,
and n for the outward normal on ∂Ω (extended smoothly in a neigh-
borhood of x). The second fundamental form of Ω, evaluated at x, is defined on TxΩ by II(x)(ξ) = Σ_{ij} ∂i nj ξ^i ξ^j. A defining function for Ω
at x is a function Φ defined in a neighborhood of x, such that Φ < 0
in Ω, Φ > 0 outside Ω, and |∇Φ| > 0 on ∂Ω. Such a function always
exists (locally), for instance one can choose Φ(x) = ±d(x, ∂Ω), with +
sign when x is outside Ω and − when x is in Ω. (In that case ∇Φ is the
outward normal on ∂Ω.) If Φ is a defining function, then n = ∇Φ/|∇Φ|
on ∂Ω, and for all ξ⊥n,
∂i nj ξ^i ξ^j = ( Φ_{ij}/|∇Φ| − Φ_j Φ_k Φ_{ik}/|∇Φ|³ ) ξ^i ξ^j = Φ_{ij} ξ^i ξ^j / |∇Φ|.
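(A standard example, added as an illustration: if Ω = B_R(0) is the Euclidean ball of radius R, one may take Φ(x) = |x| − R; then |∇Φ| = 1, n = x/|x|, Φ_{ij} = δ_{ij}/|x| − x_i x_j/|x|³, and for any ξ ⊥ n,

II(x)(ξ) = Φ_{ij} ξ^i ξ^j = |ξ|²/R,

so the second fundamental form of the ball is 1/R times the identity on the tangent space.)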
By assumption the left-hand side is finite, and the right-hand side is exactly ∫ #{t; y_t^z ∈ Σ} H^{n−1}(dz); so the integrand is finite for almost all z, and in particular there is a sequence zk → 0 such that each (y_t^{zk}) intersects Σ finitely many times. ⊓⊔
Proof of Theorem 12.36. Let us first assume that the cost c is regular
on D, and prove the nonnegativity of Sc .
and (12.20) are the same, one performs a direct (tedious) computation
to check that if η is a tangent vector in the p-space and ξ is a tangent
vector in the x-space, then
( ∂⁴c(x, y(x, p)) / ∂pk ∂pℓ ∂xi ∂xj ) ξ^i ξ^j ηk ηℓ = ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) c^{k,m} c^{ℓ,n} ξ^i ξ^j ηm ηn
    = ( c_{ij,r} c^{r,s} c_{s,kℓ} − c_{ij,kℓ} ) ξ^i ξ^j η^k η^ℓ.
So

ḣ(t) = −( c_{,j}(x̄, yt) − c_{,j}(x, yt) ) ζ^j = c_{i,j} η^i ζ^j,

where ηj = −c_{,j}(x̄, yt) + c_{,j}(x, yt) and η^i = −c^{j,i} ηj.

Next,

ḧ(t) = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) ) ζ^i ζ^j + ( c_{,j}(x̄, yt) − c_{,j}(x, yt) ) c^{j,k} c_{k,ℓi} ζ^ℓ ζ^i
    = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) − ηℓ c^{ℓ,k} c_{k,ij} ) ζ^i ζ^j
    = −( c_{,ij}(x̄, yt) − c_{,ij}(x, yt) − η^k c_{k,ij} ) ζ^i ζ^j.
Now freeze t, yt, ζ, and let Φ(x) = c_{,ij}(x, yt) ζ^i ζ^j = ⟨∇²_y c(x, yt)·ζ, ζ⟩. This can be seen either as a function of x or as a function of q =
Remark 12.43. The implications (i) ⇒ (ii) and (ii) ⇒ (iii) remain
true without the convexity assumptions. It is a natural open prob-
lem whether these assumptions can be completely dispensed with in
Theorem 12.42. A bold conjecture would be that (i), (ii) and (iii)
are always equivalent and automatically imply the total c-convexity
of Dom ′ (∇x c).
To prove (12.29) it is sufficient to show that h(t) ≥ h(0) for all t ∈ [0, 1]. The č-convexity of Ď implies that (xt, y) always lies in D. Let q = −∇y c(x, y), q̄ = −∇y c(x̄, y), η = q̄ − q; then as in the proof of Theorem 12.36 we have (ẋt)^j = −c^{k,j} ηk = η^j,

ḣ(t) = ( ψi(xt) + ci(xt, y) ) η^i = −c_{i,j}(xt, y) η^i ζ^j

ḧ ≥ −C |ḣ(t)|,   (12.32)
The property of c-convexity of the target is the key to get good control
of the localization of the gradient of the solution to (12.2). This asser-
tion might seem awkward: After all, we already know that under gen-
eral assumptions, T (Spt µ) = Spt ν (recall the end of Theorem 10.28),
Smoothness results
Remark 12.53. Theorem 12.51 shows that the regularity of the cost
function is sufficient to build a strong regularity theory. These results
are still not optimal and likely to be refined in the near future; in
particular one can ask whether C α → C 2,α estimates are available for
plainly regular cost functions (but Caffarelli’s methods strongly use the
affine invariance properties of the quadratic cost function); or whether
interior estimates exist (Theorem 12.52(ii) shows that this is the case
for uniformly regular costs).
Remark 12.54. On the other hand, the first part of Theorem 12.52
shows that a uniformly regular cost function behaves better, in certain
ways, than the square Euclidean norm! For instance, the condition in
Theorem 12.52(i) is automatically satisfied if µ(dx) = f (x) dx, f ∈ Lp
for p > n; but it also allows µ to be a singular measure. (Such estimates
are not even true for the linear Laplace equation!) As observed by
specialists, uniform regularity makes the equation much more elliptic.
With Theorem 12.56 and some more work to establish the strict
convexity, it is possible to extend Caffarelli’s theory to unbounded do-
mains.
‖ψ‖_{C^{3,α}(Ω′)} ≤ C( α, Ω, Ω′, c|_{Ω×Λ}, ‖F‖_{C^{1,1}(Ω)}, ‖∇ψ‖_{L^∞(Ω)} );
‖ψ‖_{C^{k+2,α}(Ω′)} ≤ C( k, α, Ω, Ω′, c|_{Ω×Λ}, ‖F‖_{C^{k,α}(Ω)}, ‖∇ψ‖_{L^∞(Ω)} ).
well-defined and nonsingular at vx,y = (expx )−1 (y), which is the initial
velocity of the unique geodesic going from x to y. But ∇x c(x, y) coin-
cides with −vx→y ; so ∇2xy c(x, y) = −∇y ((expx )−1 ) = −(∇v expx )−1 is
nonsingular. This concludes the proof of the strong twist condition.
It is also true that c satisfies (Cutn−1 ); in fact, for any compact
subset K of M and any x ∈ M one has
Bibliographical notes
Monge himself was probably not aware of the relation between the
Monge problem and Monge–Ampère equations; this link was made
much later, maybe in the work of Knott and Smith [524]. In any case
it is Brenier [156] who made this connection popular among the com-
munity of partial differential equations. Accordingly, weak solutions of
Monge–Ampère-type equations constructed by means of optimal trans-
port are often called Brenier solutions in the literature. McCann [614]
proved that such a solution automatically satisfies the Monge–Ampère
equation almost everywhere (see the bibliographical notes for Chap-
ter 11). Caffarelli [185] showed that for a convex target, Brenier’s notion
of solution is equivalent to the older concepts of Alexandrov solution
and viscosity solution. These notions are reviewed in [814, Chapter 4]
and a proof of the equivalence between Brenier and Alexandrov so-
lutions is recast there. (The concept of Alexandrov solution is devel-
oped in [53, Section 11.2].) Feyel and Üstünel [359, 361, 362] studied
the infinite-dimensional Monge–Ampère equation induced by optimal
transport with quadratic cost on the Wiener space.
The modern regularity theory of the Monge–Ampère equation was
pioneered by Alexandrov [16, 17] and Pogorelov [684, 685]. Since then
it has become one of the most prestigious subjects of fully nonlinear
main makes the oblique condition more elliptic in some sense than the
Neumann condition.) Fully nonlinear elliptic equations with oblique
boundary condition had been studied before in [555, 560, 561], and the
connection with the second boundary value problem for the Monge–
Ampère equation had been suggested in [562]. Compared to Caffarelli’s
method, this one only covers the global estimates, and requires higher
initial regularity; but it is more elementary.
The generalization of these regularity estimates to nonquadratic
cost functions stood as an open problem for some time. Then Ma,
Trudinger and Wang [585] discovered that the older interior estimates
by Wang [833] could be adapted to general cost functions satisfying the
condition Sc > 0 (this condition was called (A3) in their paper and
in subsequent works). Theorem 12.52(ii) is extracted from this refer-
ence. A subtle caveat in [585] was corrected in [793] (see Theorem 1
there). A key property discovered in this study is that if c is a regular
cost function and ψ is c-convex, then any local c-support function for
ψ is also a global c-support function (which is nontrivial unless ψ is
differentiable); an alternative proof can be derived from the method of
Y.-H. Kim and McCann [519, 520].
Trudinger and Wang [794] later adapted the method of Urbas to
treat the boundary regularity under the weaker condition Sc ≥ 0 (there
called (A3w)). The proof of Theorem 12.51 can be found there.
At this point Loeper [570] made three crucial contributions to the
theory. First he derived the very strong estimates in Theorem 12.52(i)
which showed that the Ma–Trudinger–Wang (A3) condition (called
(As) in Loeper’s paper) leads to a theory which is stronger than
the Euclidean one in some sense (this was already somehow implicit
in [585]). Secondly, he found a geometric interpretation of this condi-
tion, namely the regularity property (Definition 12.14), and related it to
well-known geometric concepts such as sectional curvature (Particular
Case 12.30). Thirdly, he proved that the weak condition (A3w) (called
(Aw) in his work) is mandatory to derive regularity (Theorem 12.21).
The psychological impact of this work was important: before that, the
Ma–Trudinger–Wang condition could be seen as an obscure ad hoc as-
sumption, while now it became the natural condition.
The proof of Theorem 12.52(i) in [570] was based on approximation
and used auxiliary results from [585] and [793] (which also used some
of the arguments in [570]. . . but there is no loophole!)
Loeper [571] further proved that the squared distance on the sphere
is a uniformly regular cost, and combined all the above elements to
derive Theorem 12.58; the proof is simplified in [572]. In [571], Loeper
derived smoothness estimates similar to those in Theorem 12.52 for the
far-field reflector antenna problem.
The exponent β in Theorem 12.52(i) is explicit; for instance, in
the case when f = dµ/dx is bounded above and g is bounded below,
Loeper obtained β = (4n − 1)−1 , n being the dimension. (See [572] for
a simplified proof.) However, this is not optimal: Liu [566] improved
this into β = (2n − 1)−1 , which is sharp.
In a different direction, Caffarelli, Gutiérrez and Huang [191] could
get partial regularity for the far-field reflector antenna problem by very
elaborate variants of Caffarelli’s older techniques. This “direct” ap-
proach does not yet yield results as powerful as the a priori estimates
by Loeper, Ma, Trudinger and Wang, since only C 1 regularity is ob-
tained in [191], and only when the densities are bounded from above
and below; but it gives new insights into the subject.
In dimension 2, the whole theory of Monge–Ampère equations be-
comes much simpler, and has been the object of numerous studies [744].
Old results by Alexandrov [17] and Heinz [471] imply C 1 regularity of
the solution of det(∇2 ψ) = h as soon as h is bounded from above (and
strict convexity if it is bounded from below). Loeper noticed that this
implied strengthened results for the solution of optimal transport with
quadratic cost in dimension 2, and together with Figalli [368] extended
this result to regular cost functions.
Now I shall briefly discuss the story of counterexamples. Counterexam-
ples by Pogorelov and Caffarelli (see for instance [814, pp. 128–129])
show that solutions of the usual Monge–Ampère equation are not
smooth in general: some strict convexity on the solution is needed,
and it has to come from boundary conditions in one way or the other.
The counterexample in Theorem 12.3 is taken from Caffarelli [185],
where it is used to prove that the “Hessian measure” (a generalized
formulation of the Hessian determinant) cannot be absolutely continu-
ous if the bridge is thin enough; in the present notes I used a slightly
different reasoning to directly prove the discontinuity of the optimal
transport. The same can be said of Theorem 12.4, which is adapted
from Loeper [570]. (In Loeper’s paper the contradiction was obtained
indirectly as in Theorem 12.44.)
ties, and the probability densities satisfy certain size restrictions. Then
Loeper and I [572] established smoothness estimates for the optimal
transport on C 4 perturbations of the projective space, without any
size restriction.
The cut locus is also a major issue in the study of the perturbation
of these smoothness results. Because the dependence of the geodesic
distance on the Riemannian metric is not smooth near the cut locus,
it is not clear whether the Ma–Trudinger–Wang condition is stable
under C k perturbations of the metric, however large k may be. This
stability problem, first formulated in [572], is in my opinion extremely
interesting; it is solved by Figalli and Rifford [371] near S 2 .
Without knowing the stability of the Ma–Trudinger–Wang condi-
tion, if pointwise a priori bounds on the probability densities are given,
one can afford a C 4 perturbation of the metric and retain the Hölder
continuity of optimal transport; or even afford a C 2 perturbation and
retain a mesoscopic version of the Hölder continuity [822].
Some of the smoothness estimates discussed in these notes also hold
for other more complicated fully nonlinear equations, such as the reflec-
tor antenna problem [507] (which in its general formulation does not
seem to be equivalent to an optimal transport problem) or the so-called
Hessian equations [789, 790, 792, 800], where the dominant term is a
symmetric function of the eigenvalues of the Hessian of the unknown.
The short survey by Trudinger [788] presents some results of this type,
with applications to conformal geometry, and puts this into perspec-
tive together with optimal transport. In this reference Trudinger also
notes that the problem of the prescribed Schouten tensor resembles an
optimal transport problem with logarithmic cost function; this connec-
tion had also been made by McCann (see the remarks in [520]) who
had long ago noticed the properties of conformal invariance of this cost
function.
A topic which I did not address at all is the regularity of certain sets
solving variational problems involving optimal transport; see [632].
13
Qualitative picture
Recap
for some nontrivial (i.e. not identically −∞, and never +∞) function φ.
In case (ii), if nothing is known about the behavior of the distance
function at infinity, then the gradient ∇ in the left-hand side of (13.2) should be replaced by an approximate gradient ∇̃.
4. Under the same assumptions, the (generalized) displacement
interpolation (µt )0≤t≤1 is unique. This follows from the almost sure
uniqueness of the minimizing curve joining γ0 to γ1 , where (γ0 , γ1 ) is
the optimal coupling. (Corollary 7.23 applies when the total cost is
finite; but even if the total cost is infinite, we can apply a reasoning
similar to the one in Corollary 7.23. Note that the result does not follow
from the vol ⊗ vol (dx0 dx1 )-uniqueness of the minimizing curve joining
x0 to x1 .)
5. Without loss of generality, one might assume that

ψ(x) = sup_{y∈M} [ φ(y) − c(x, y) ],    φ(y) = inf_{x∈M} [ ψ(x) + c(x, y) ]

(these are true supremum and true infimum, not just up to a negligible set). One can also assume without loss of generality that

φ(y) − ψ(x) ≤ c(x, y)   for all x, y,

and

φ(x1) − ψ(x0) = c(x0, x1)   almost surely.
6. It is still possible that two minimizing curves meet at time t = 0
or t = 1, but this event may occur only on a very small set, of dimension
at most n − 1.
7. All of the above remains true if one replaces µ0 at time 0 by µt
at time t, with obvious changes of notation (e.g. replace c = c0,1 by
ct,1 ); the function φ is unchanged, but now ψ should be changed into
ψt defined by
ψt(y) = inf_{x∈M} [ ψ0(x) + c_{0,t}(x, y) ].   (13.3)
In particular,
9. Whenever 0 ≤ t0 < t1 ≤ 1,
∫ ψ_{t1} dµ_{t1} − ∫ ψ_{t0} dµ_{t0} = C^{t0,t1}(µ_{t0}, µ_{t1})
    = ∫_{t0}^{t1} ∫ L( x, [(∇v L)(x, · , t)]⁻¹(∇ψt(x)), t ) dµt(x) dt;
recall indeed Theorems 7.21 and 7.36, Remarks 7.25 and 7.37, and (13.4).
Open Problem 13.1. If the initial and final densities, ρ0 and ρ1 , are
positive everywhere, does this imply that the intermediate densities ρt
are also positive? Otherwise, can one identify simple sufficient con-
ditions for the density of the displacement interpolant to be positive
everywhere?
Zℓ ↑ 1; Zℓ πℓ ↑ π; Zℓ µt,ℓ ↑ µt ; Zℓ Πℓ ↑ Π.
(i) each π k is an optimal transference plan between µk0 and µk1 , and
any one of the probability measures µk0 , µk1 has a smooth, compactly
supported density;
(ii) µk0 → µ0 , µk1 → µ1 , π k → π in the weak sense as k → ∞.
If the cost function is just the square of the distance, then these
equations become
∂t µt + ∇ · (ξt µt) = 0;
ξt(x) = ∇ψt(x);
ψ0 is d²/2-convex;                          (13.6)
∂t ψt + |∇ψt|²/2 = 0.
Finally, for the square of the Euclidean distance, this simplifies into
∂t µt + ∇ · (ξt µt) = 0;
ξt(x) = ∇ψt(x);
x ↦ |x|²/2 + ψ0(x) is lower semicontinuous convex;      (13.7)
∂t ψt + |∇ψt|²/2 = 0.
Apart from the special choice of initial datum, the latter system
is well-known in physics as the pressureless Euler equation, for a
potential velocity field.
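A completely explicit solution of (13.7), added here as a minimal example, is provided by translations: take ψ0(x) = ⟨v, x⟩ for a fixed vector v ∈ Rⁿ. Then x ↦ |x|²/2 + ψ0(x) is convex, the Hamilton–Jacobi equation is solved by

ψt(x) = ⟨v, x⟩ − t |v|²/2,   so   ξt(x) = ∇ψt(x) = v,

and the continuity equation is solved by µt = (Id + tv)_# µ0: every particle travels in a straight line with constant velocity v.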
Spt(ψ) ⊂ K,    ‖ψ‖_{C²} ≤ ε
is d²/2-convex.
Remark 13.7. The end of the proof took advantage of a general prin-
ciple, independent of the particular cost c: If there is a surjective map
The structure of P2 (M )
A striking discovery made by Otto at the end of the nineties is that the
differentiable structure on a Riemannian manifold M induces a kind of
differentiable structure in the space P2 (M ). This idea takes substance
from the following remarks: All of the path (µt )0≤t≤1 is determined
from the initial velocity field ξ0 (x), which in turn is determined by ∇ψ
as in (13.4). So it is natural to think of the function ∇ψ as a kind of
“initial velocity” for the path (µt ). The conceptual shift here is about
the same as when we decided that µt could be seen either as the law
of a random minimizing curve at time t, or as a path in the space of
measures: Now we decide that ∇ψ can be seen either as the field of the
initial velocities of our minimizing curves, or as the (abstract) velocity
of the path (µt ) at time t = 0.
There is an abstract notion of tangent space Tx X (at point x) to a
metric space (X , d): in technical language, this is the pointed Gromov–
Hausdorff limit of the rescaled space. It is a rather natural notion: fix
your point x, and zoom onto it, by multiplying all distances by a large
factor ε−1 , while keeping x fixed. This gives a new metric space Xx,ε ,
and if one is not too curious about what happens far away from x, then
the space Xx,ε might converge in some nice sense to some limit space,
that may not be a vector space, but in any case is a cone. If that limit
space exists, it is said to be the tangent space (or tangent cone) to X
at x. (I shall come back to these issues in Part III.)
In terms of that construction, the intuition sketched above is in-
deed correct: let P2 (M ) be the metric space consisting of probability
measures on M , equipped with the Wasserstein distance W2 . If µ is
absolutely continuous, then the tangent cone Tµ P2 (M ) exists and can
be identified isometrically with the closed vector space generated by the gradients of d²/2-convex functions ψ, equipped with the norm

‖∇ψ‖_{L²(µ;TM)} := ( ∫_M |∇ψ|²_x dµ(x) )^{1/2}.
Actually, in view of Theorem 13.5, this is the same as the vector space
generated by all smooth, compactly supported gradients, completed
with respect to that norm.
With what we know about optimal transport, this theorem is not
that hard to prove, but this would require a bit too much geometric
machinery for now. Instead, I shall spend some time on an important
related result by Ambrosio, Gigli and Savaré, according to which any
Lipschitz curve in the space P2 (M ) admits a velocity (which for all t
lives in the tangent space at µt ). Surprisingly, the proof will not require
absolute continuity.
W2 (µs , µt ) ≤ L |t − s|.
For any t ∈ [0, 1], let Ht be the Hilbert space generated by gradients of
continuously differentiable, compactly supported ψ:

Ht := the closure of Vect{∇ψ; ψ ∈ Cc¹(M)} in L²(µt; TM).
Then there exists a measurable vector field ξt (x) ∈ L∞ (dt; L2 (dµt (x))),
µt (dx) dt-almost everywhere unique, such that ξt ∈ Ht for all t (i.e. the
velocity field really is tangent along the path), and
∂t µt + ∇ · (ξt µt ) = 0 (13.10)
The proof of Theorem 13.8 requires some analytical tools, and the
reader might skip it at first reading.
Now the key remark is that the time-derivative (d/dt) ∫ (ψ + C) dµt does not depend on the constant C. This shows that (d/dt) ∫ ψ dµt really is a functional of ∇ψ, obviously linear. The above estimate shows
that this functional is continuous with respect to the norm in L2 (dµt ).
just showed that there is a negligible set of times, τK, such that (13.12) holds true for all ψ ∈ C¹_K(M) and t ∉ τK. Now choose an increasing
family of compact sets (Km )m∈N , with ∪Km = M , so that any compact
set is included in some Km . Then (13.12) holds true for all ψ ∈ Cc1 (M ),
as soon as t does not belong to the union of τKm , which is still a
negligible set of times.
But equation (13.12) is really the weak formulation of (13.10). Since
ξt is uniquely determined in L2 (dµt ), for almost all t, actually the vector
field ξt (x) is dµt (x) dt-uniquely determined.
To conclude the proof of the theorem, it only remains to prove the
converse implication. Let (µt ) and (ξt ) solve (13.10). By the equation
of conservation of mass, µt = law (γt ), where γt is a (random) solution
of
γ̇t = ξt (γt ).
Let s < t be any two times in [0, 1]. From the formula
d(γs, γt)² = (t − s) inf { ∫_s^t |ζ̇τ|² dτ; ζs = γs, ζt = γt },
we deduce
d(γs, γt)² ≤ (t − s) ∫_s^t |γ̇τ|² dτ ≤ (t − s) ∫_s^t |ξτ(γτ)|² dτ.
So
E d(γs, γt)² ≤ (t − s) ∫_s^t ∫ |ξτ(x)|² dµτ(x) dτ ≤ (t − s)² ‖ξ‖²_{L∞(dt; L²(dµt))}.
In particular
W2 (µs , µt )2 ≤ E d(γs , γt )2 ≤ L2 (t − s)2 ,
where L is an upper bound for the norm of ξ in L∞ (L2 ). This concludes
the proof of Theorem 13.8. ⊓
⊔
Remark 13.9. With hardly any more work, the preceding theorem can
be extended to cover paths that are absolutely continuous of order 2,
in the sense defined on p. 127. Then of course the velocity field will not
live in L∞ (dt; L2 (dµt )), but in L2 (dµt dt).
Observe that in a displacement interpolation, the initial measure µ0
and the initial velocity field ∇ψ0 uniquely determine the final measure
µ1 : this implies that geodesics in P2 (M ) are nonbranching, in the strong
sense that their initial position and velocity determine uniquely their
final position.
Finally, we can now derive an “explicit” formula for the action func-
tional determining displacement interpolations as minimizing curves.
Let µ = (µt )0≤t≤1 be any Lipschitz (or absolutely continuous) path in
P2 (M ); let ξt (x) = ∇ψt (x) be the associated time-dependent velocity
field. By the formula of conservation of mass, µt can be interpreted as
the law of γt , where γ is a random solution of γ̇t = ξt (γt ). Define
A(µ) := inf E ∫_0^1 |ξt(γt)|² dt,   (13.13)
where the infimum is taken over all possible realizations of the random
curves γ. By Fubini’s theorem,
A(µ) = inf E ∫_0^1 |ξt(γt)|² dt = inf E ∫_0^1 |γ̇t|² dt
     ≥ E inf ∫_0^1 |γ̇t|² dt
     = E d(γ0, γ1)²,
and the infimum is achieved if and only if the coupling (γ0 , γ1 ) is mini-
mal, and the curves γ are (almost surely) action-minimizing. This shows
that displacement interpolations are characterized as the minimizing
curves for the action A. Actually A is the same as the action appear-
ing in Theorem 7.21 (iii), the only improvement is that now we have
produced a more explicit form in terms of vector fields.
The expression (13.13) can be made slightly more explicit by noting
that the optimal choice of velocity field is the one provided by Theo-
rem 13.8, which is a gradient, so we may restrict the action functional
to gradient velocity fields:
A(µ) := ∫_0^1 ( ∫ |∇ψt|² dµt ) dt;    ∂t µt + ∇ · (∇ψt µt) = 0.   (13.14)
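On the translation example considered before (µt = (Id + tv)_# µ0, ψt(x) = ⟨v, x⟩ − t|v|²/2) one can check the consistency of these formulas by hand — a simple verification which the reader may skip: ∇ψt = v, so (13.14) gives

A(µ) = ∫_0^1 ∫ |v|² dµt dt = |v|²;

on the other hand each curve γt = γ0 + tv is a minimizing geodesic, so E d(γ0, γ1)² = |v|² = W2(µ0, µ1)², and the infimum in (13.13) is indeed attained.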
Bibliographical notes
The issues discussed in this part are concisely reviewed in the sur-
veys by Cordero-Erausquin [244] and myself [821] (both in French).
Ricci curvature
Riem(X, Y ) := ∇Y ∇X − ∇X ∇Y + ∇[X,Y ] ;
Fig. 14.1. The dashed line gives the recipe for the construction of the Gauss map;
its Jacobian determinant is the Gauss curvature.
∇2 f · v = ∇v (∇f ).
(Recall that ∇v stands for the covariant derivation in the direction v.)
In short, ∇2 f is the covariant gradient of the gradient of f .
A convenient way to compute the Hessian of a function is to differ-
entiate it twice along a geodesic path. Indeed, if (γt )0≤t≤1 is a geodesic
path, then ∇γ̇ γ̇ = 0, so
d²/dt² f(γt) = (d/dt) ⟨∇f(γt), γ̇t⟩ = ⟨∇_γ̇ ∇f(γt), γ̇t⟩ + ⟨∇f(γt), ∇_γ̇ γ̇t⟩ = ⟨∇²f(γt)·γ̇t, γ̇t⟩.

In particular, if γt = expx(tv) is the geodesic starting from x with velocity v, then

f(γt) = f(x) + t ⟨∇f(x), v⟩ + (t²/2) ⟨∇²f(x)·v, v⟩ + o(t²).   (14.3)
This identity can actually be used to define the Hessian operator.
A similar computation shows that for any two tangent vectors u, v
at x,
(D/Ds)(d/dt)|_{s=t=0} f( expx(su + tv) ) = ⟨∇²f(x)·u, v⟩,   (14.4)
where expx v is the value at time 1 of the constant speed geodesic
starting from x with velocity v. Identity (14.4) together with (14.2)
shows that if f ∈ C 2 (M ), then ∇2 f (x) is a symmetric operator:
⟨∇²f(x)·u, v⟩x = ⟨∇²f(x)·v, u⟩x. In that case it will often be conve-
nient to think of ∇2 f (x) as a quadratic form on Tx M .
The Hessian is related to another fundamental second-order dif-
ferential operator, the Laplacian, or Laplace–Beltrami operator. The
Laplacian can be defined as the trace of the Hessian:
∆f = ∇ · (∇f ),
Remark 14.3. As the proof will show, Property (i) can be replaced by
the following more precise statement involving the subdifferential of ψ:
If ξ is any vector field valued in ∇− ψ (i.e. ξ(y) ∈ ∇− ψ(y) for all y),
then ∇v ξ(x) = Av.
Remark 14.4. For the main part of this course, we shall not need
the full strength of Theorem 14.1, but just the particular case when
ψ is continuously differentiable and ∇ψ is Lipschitz; then the proof
becomes much simpler, and ∇ψ is almost everywhere differentiable in
the usual sense. Still, on some occasions we shall need the full generality
of Theorem 14.1.
J(x) := lim_{δ↓0} vol[T(Pδ)] / vol[Pδ].
Fig. 14.2. The orthonormal basis E, here represented by a small cube, goes along
the geodesic by parallel transport.
(All of these quantities depend implicitly on the starting point x.) The
reader who prefers to stay away from the Riemann curvature tensor
can take (14.6) as the equation defining the matrix R; the only things
that one needs to know about it are (a) R(t) is symmetric; (b) the
first row of R(t) vanishes (which is the same, modulo identification, as
R(t) γ̇(t) = 0); (c) tr R(t) = Ricγt (γ̇t , γ̇t ) (which one can also adopt as
a definition of the Ricci tensor); (d) R is invariant under the transform
t → 1 − t, E(t) → −E(1 − t), γt → γ1−t .
Equation (14.6) is of second order in time, so it should come with initial conditions for both J(0) and J̇(0). On the one hand, since T0(y) = y,

Ji(0) = (d/dδ)|_{δ=0} (x + δei) = ei,
Fig. 14.3. At time t = 0, the matrices J(t) and E(t) coincide, but at later times
they (may) differ, due to geodesic distortion.
(d/dt) Jij = (d/dt) ⟨Ji, ej⟩ = ⟨DJi/Dt, ej⟩ = ⟨(∇ξ)ei, ej⟩.

We conclude that the initial conditions are

J(0) = In,    J̇(0) = ∇ξ(x),   (14.8)
all the tangent spaces Tγ(t) M with Rn . This path depends on x via the
initial conditions (14.8), so in the sequel we shall put that dependence
explicitly. It might be very rough as a function of x, but it is very
smooth as a function of t. The Jacobian of the map Tt is defined by J(t, x) := det J(t, x).

(d/dt)(tr U) + tr(U²) + tr R = 0.
Now the trace of R(t, x) only depends on γt and γ̇t ; in fact, as noticed
before, it is precisely the value of the Ricci curvature at γ(t), evaluated
in the direction γ̇(t). So we have arrived at our first important equation
involving Ricci curvature:
(d/dt)(tr U) + tr(U²) + Ric(γ̇) = 0,   (14.12)
where of course Ric(γ̇) is an abbreviation for Ricγ(t) (γ̇(t), γ̇(t)).
Equation (14.12) holds true for any vector field ξ, as long as ξ is
covariantly differentiable at x. But in the sequel, I shall only apply it
in the particular case when ξ derives from a function: ξ = ∇ψ; and ψ is
tr(U²) ≥ (tr U)²/n;
then, by plugging this inequality into (14.12), we obtain an important
differential inequality involving Ricci curvature:
(d/dt)(tr U) + (tr U)²/n + Ric(γ̇) ≤ 0.   (14.13)
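(The trace inequality used here is just the Cauchy–Schwarz inequality in the space of matrices; a two-line check, recorded for completeness: since U is symmetric in the case at hand, writing λ1, ..., λn for its eigenvalues,

(tr U)² = (Σ λi)² ≤ n Σ λi² = n tr(U²),

with equality precisely when all eigenvalues coincide, i.e. when U is a multiple of the identity.)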
There are several ways to rewrite this result in terms of the Jacobian
determinant J (t). For instance, by differentiating the formula
tr U = J̇/J,

one obtains easily

(d/dt)(tr U) + (tr U)²/n = J̈/J − (1 − 1/n)(J̇/J)².
So (14.13) becomes

J̈/J − (1 − 1/n)(J̇/J)² ≤ −Ric(γ̇).   (14.14)

Equivalently, in terms of D(t) := J(t)^{1/n},

D̈/D ≤ −Ric(γ̇)/n.   (14.15)
Yet another useful formula is obtained by introducing ℓ(t) :=
− log J (t), and then (14.13) becomes
ℓ̈(t) ≥ ℓ̇(t)²/n + Ric(γ̇).   (14.16)
In all of these formulas, we have always taken t = 0 as the starting
time, but it is clear that we could do just the same with any starting
time t0 ∈ [0, 1], that is, consider, instead of Tt (x) = exp(t∇ψ(x)), the
map Tt0 →t (x) = exp((t − t0 )∇ψ(x)). Then all the differential inequali-
ties are unchanged; the only difference is that the Jacobian determinant
at time t = 0 is not necessarily 1.
The previous formulas are quite sufficient to derive many useful geo-
metric consequences. However, one can refine them by taking advantage
of the fact that curvature is not felt in the direction of motion. In other
words, if one is traveling along some geodesic γ, one will never be able
to detect some curvature by considering variations (in the initial posi-
tion, or initial velocity) in the direction of γ itself: the path will always
be the same, up to reparametrization. This corresponds to the property
R(t)γ̇(t) = 0, where R(t) is the matrix appearing in (14.6). In short,
curvature is felt only in n − 1 directions out of n. This loose principle
often leads to a refinement of estimates by a factor (n − 1)/n.
Here is a recipe to “separate out” the direction of motion from
the other directions. As before, assume that the first vector of the or-
thonormal basis J(0) is e1 (0) = γ̇(0)/|γ̇(0)|. (The case when γ̇(0) = 0
and, of course,
ℓ̈⊥ ≥ (ℓ̇⊥)²/(n − 1) + Ric(γ̇),   (14.21)
D̈⊥/D⊥ ≤ −Ric(γ̇)/(n − 1).   (14.22)
To summarize: The basic inequalities for ℓ⊥ and ℓ// are the same
as for ℓ, but with the exponent n replaced by n − 1 in the case of ℓ⊥ ,
and 1 in the case of ℓ// ; and the number Ric(γ̇) replaced by 0 in the
case of ℓ// . The same for D⊥ and D// .
ℓ̈ ≥ ℓ̇²/(n − 1) + K,
Bochner’s formula
(the same formula that we had before at time t = 0). Under the identi-
fication of Rn with Tγ(t) M provided by the basis E(t), we can identify
J with the matrix J, and then
U(t, x) = J̇(t, x) J(t, x)⁻¹ = ∇ξ( t, γ(t, x) ),   (14.25)
where again the linear operator ∇ξ is identified with its matrix in the
basis E.
Then tr U (t, x) = tr ∇ξ(t, x) coincides with the divergence of
ξ(t, · ), evaluated at x. By the chain-rule and (14.24),
(d/dt)(tr U)(t, x) = (d/dt)(∇ · ξ)(t, γ(t, x))
    = ∇ · (∂ξ/∂t)(t, γ(t, x)) + γ̇(t, x) · ∇(∇ · ξ)(t, γ(t, x))
    = ( −∇ · (∇_ξ ξ) + ξ · ∇(∇ · ξ) )(t, γ(t, x)).
All functions here are evaluated at (t, γ(t, x)), and of course we can
choose t = 0, and x arbitrary. So (14.26) is an identity that holds true
for any smooth (say C 2 ) vector field ξ on our manifold M . Of course
it can also be established directly by a coordinate computation.1
While formula (14.26) holds true for all vector fields ξ, if ∇ξ is
symmetric then two simplifications arise:
(a) ∇_ξ ξ = ∇ξ · ξ = ∇( |ξ|²/2 );
(b) tr(∇ξ)² = ‖∇ξ‖²_HS, HS standing for Hilbert–Schmidt norm.
So (14.26) becomes

−∆( |ξ|²/2 ) + ξ · ∇(∇ · ξ) + ‖∇ξ‖²_HS + Ric(ξ) = 0.   (14.27)
We shall apply it only in the case when ξ is a gradient: ξ = ∇ψ; then
∇ξ = ∇2 ψ is indeed symmetric, and the resulting formula is
−∆( |∇ψ|²/2 ) + ∇ψ · ∇(∆ψ) + ‖∇²ψ‖²_HS + Ric(∇ψ) = 0.   (14.28)
1
With the notation ∇ξ = ξ·∇ (which is classical in fluid mechanics), and tr (∇ξ)2 =
∇ξ··∇ξ, (14.26) takes the amusing form −∇·ξ·∇ξ+ξ·∇∇·ξ+∇ξ··∇ξ+Ric(ξ) = 0.
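In Euclidean space, identity (14.28) reduces to an elementary computation, which the reader may find reassuring (a quick check in the flat case, Ric ≡ 0): for smooth ψ on Rⁿ,

∆( |∇ψ|²/2 ) = Σ_j ∂_j ( Σ_i ∂_iψ ∂_{ij}ψ ) = Σ_{i,j} (∂_{ij}ψ)² + Σ_i ∂_iψ ∂_i(∆ψ) = ‖∇²ψ‖²_HS + ∇ψ · ∇(∆ψ),

which is exactly (14.28) with Ric(∇ψ) = 0.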
Bochner’s formula 389
Remark 14.5. With the ansatz ξ = ∇ψ, the pressureless Euler equa-
tion (14.24) reduces to the Hamilton–Jacobi equation
∂ψ/∂t + |∇ψ|²/2 = 0.   (14.29)
One can use this equation to obtain (14.28) directly, instead of first
deriving (14.26). Here equation (14.29) is to be understood in a viscosity
sense (otherwise there are many spurious solutions); in fact the reader
might just as well take the identity
ψ(t, x) = inf_{y∈M} [ ψ(y) + d(x, y)²/(2t) ]
Remark 14.6. Here I have not tried to derive Bochner’s formula for
nonsmooth functions. This could be done for semiconvex ψ, with an appropriate “compensated” definition for −∆( |∇ψ|²/2 ) + ∇ψ · ∇(∆ψ). In fact, the semiconvexity of ψ prevents the formation of instantaneous shocks, and will allow the Lagrangian/Eulerian duality for a short time.
Remark 14.7. The operator U (t, x) coincides with ∇2 ψ(t, γ(t, x)),
which is another way to see that it is symmetric for t > 0.
From this point on, we shall only work with (14.28). Of course, by using the Cauchy–Schwarz inequality as before, we can bound ‖∇²ψ‖²_HS below by (∆ψ)²/n; therefore (14.28) implies
2
In (14.26) or (14.28) I have written Bochner’s formula in purely “metric” terms,
which will probably look quite ugly to many geometer readers. An equivalent but
more “topological” way to write Bochner’s formula is
∆ + ∇∇∗ + Ric = 0,
∆( |∇ψ|²/2 ) − ∇ψ · ∇(∆ψ) ≥ (∆ψ)²/n + Ric(∇ψ).   (14.30)
Apart from regularity issues, this inequality is strictly equivalent to
(14.13), and therefore to (14.14) or (14.15).
Not so much has been lost when going from (14.28) to (14.30): there
is still equality in (14.30) at all points x where ∇2 ψ(x) is a multiple of
the identity.
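(Indeed — a two-line check: if ∇²ψ(x) = λ In, then ‖∇²ψ(x)‖²_HS = nλ² and (∆ψ(x))²/n = (nλ)²/n = nλ², so the Cauchy–Schwarz step loses nothing at such a point.)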
One can also take out the direction of motion, ∇̂ψ := ∇ψ/|∇ψ|, from the Bochner identity. The Hamilton–Jacobi equation implies ∂t ∇ψ + ∇²ψ · ∇ψ = 0, so

∂t ⟨∇²ψ · ∇̂ψ, ∇̂ψ⟩ = −⟨∇²( |∇ψ|²/2 ) · ∇̂ψ, ∇̂ψ⟩ − 2 ⟨∇²ψ · (∇²ψ · ∇̂ψ), ∇̂ψ⟩,

and by symmetry the latter term can be rewritten as −2 |(∇²ψ) · ∇̂ψ|². From this one easily obtains the following refinement of Bochner's formula: Define

∆_{//} f = ⟨∇²f · ∇̂ψ, ∇̂ψ⟩,    ∆_⊥ = ∆ − ∆_{//};

then

∆_{//}( |∇ψ|²/2 ) − ∇ψ · ∇∆_{//}ψ + 2 |(∇²ψ) · ∇̂ψ|² ≥ (∆_{//}ψ)²,
∆_⊥( |∇ψ|²/2 ) − ∇ψ · ∇∆_⊥ψ − 2 |(∇²ψ) · ∇̂ψ|² ≥ ‖∇²_⊥ψ‖²_HS + Ric(∇ψ).
(14.31)
This is the “Bochner formula with the direction of motion taken out”.
I have to confess that I never saw these frightening formulas anywhere,
and don’t know whether they have any use. But of course, they are
equivalent to their Lagrangian counterpart, which will play a crucial
role in the sequel.
where V(r) = ∫_0^r S(r′) dr′,
S(r) = c_{n,K} ×   sin^{n−1}( √(K/(n−1)) r )     if K > 0
                   r^{n−1}                        if K = 0
                   sinh^{n−1}( √(|K|/(n−1)) r )   if K < 0.
Here of course S(r) is the surface area of Br (0) in the model space, that
is the (n − 1)-dimensional volume of ∂Br (0), and cn,K is a nonessential
normalizing constant. (See Theorem 18.8 later in this course.)
2. Diameter estimates: The Bonnet–Myers theorem states that,
if K > 0, then M is compact and more precisely
diam(M) ≤ π √( (n−1)/K ),
with equality for the model sphere.
3. Spectral gap inequalities: If K > 0, then the spectral gap λ1 of
the nonnegative operator −∆ is bounded below:
λ1 ≥ nK/(n − 1),
with equality again for the model sphere. (See Theorem 21.20 later in
this course.)
4. (Sharp) Sobolev inequalities: If K > 0 and n ≥ 3, let µ =
vol/vol[M ] be the normalized volume measure on M ; then for any
smooth function on M ,
‖f‖²_{L^{2⋆}(µ)} ≤ ‖f‖²_{L²(µ)} + ( 4/(K n(n − 2)) ) ‖∇f‖²_{L²(µ)},    2⋆ = 2n/(n − 2),
and those constants are sharp for the model sphere.
5. Heat kernel bounds: There are many of them, in particular the
well-known Li–Yau estimates: If K ≥ 0, then the heat kernel pt (x, y)
satisfies
pt(x, y) ≤ ( C / vol[B_{√t}(x)] ) exp( −d(x, y)²/(2Ct) ),
for some constant C which only depends on n. For K < 0, a similar
bound holds true, only now C depends on K and there is an additional
For later purposes it will be useful to keep track of all error terms
in the inequalities. So rewrite (14.12) as
(tr U)· + (tr U)²/n + Ric(γ̇) = −‖ U − (tr U/n) In ‖²_HS.   (14.33)
(log J0)·· + [(log J0)·]²/n + Ric(γ̇)
    = (log J)·· + ⟨∇²V(γ) · γ̇, γ̇⟩ + [ (log J)· + γ̇ · ∇V(γ) ]²/n + Ric(γ̇).
By using the identity
a²/n = (a + b)²/N − b²/(N − n) + [ n/(N(N − n)) ] ( b − ((N − n)/n) a )²,   (14.34)
we see that
[ (log J)· + γ̇ · ∇V(γ) ]² / n
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) (log J)· + (N/n) γ̇ · ∇V(γ) )²
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) (log J0)· + γ̇ · ∇V(γ) )²
    = [(log J)·]²/N − (γ̇ · ∇V(γ))²/(N − n) + [ n/(N(N − n)) ] ( ((N − n)/n) tr U + γ̇ · ∇V(γ) )².
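(The identity (14.34) itself is elementary and can be checked by brute force; the verification is recorded here because the identity looks mysterious at first sight. Expanding the last square,

[ n/(N(N − n)) ] ( b − ((N − n)/n) a )² = n b²/(N(N − n)) − 2ab/N + (N − n) a²/(Nn),

and adding (a + b)²/N − b²/(N − n): the b² terms and the ab terms cancel, leaving a²/N + (N − n)a²/(Nn) = a²/n, as claimed.)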
To summarize these computations it will be useful to introduce some
more notation: first, as usual, the negative logarithm of the Jacobian
determinant:
ℓ(t, x) := − log J (t, x); (14.35)
and then, the generalized Ricci tensor:
Ric_{N,ν} := Ric + ∇²V − (∇V ⊗ ∇V)/(N − n),   (14.36)

where the tensor product ∇V ⊗ ∇V is a quadratic form on TM, defined by its action on tangent vectors as

(∇V ⊗ ∇V)_x(v) = (∇V(x) · v)²;

so

Ric_{N,ν}(γ̇) = (Ric + ∇²V)(γ̇) − (∇V · γ̇)²/(N − n).
It is implicitly assumed in (14.36) that N ≥ n (otherwise the correct
definition is RicN,ν = −∞); if N = n the convention is 0 × ∞ = 0,
so (14.36) still makes sense if ∇V = 0. Note that Ric∞,ν = Ric + ∇2 V ,
while Ricn,vol = Ric.
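The most basic example, added here to fix ideas: take M = Rⁿ equipped with a reference measure ν = e^{−V} dx. Then Ric = 0, so Ric_{∞,ν} = ∇²V and Ric_{N,ν} = ∇²V − (∇V ⊗ ∇V)/(N − n). In particular, for the standard Gaussian measure one has V(x) = |x|²/2 up to an irrelevant additive constant, so Ric_{∞,ν} = In: the Gaussian space satisfies a curvature bound K = 1 with effective dimension N = ∞. On the other hand, for finite N the quantity Ric_{N,ν}(v) = |v|² − (x·v)²/(N − n) becomes very negative in the radial direction for large |x|, so no uniform lower bound with finite N can hold in this example.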
The conclusion of the preceding computations is that
ℓ̈ = ℓ̇²/N + Ric_{N,ν}(γ̇) + ‖ U − (tr U/n) In ‖²_HS
      + [ n/(N(N − n)) ] ( ((N − n)/n) tr U + γ̇ · ∇V(γ) )².   (14.37)
As corollaries,
ℓ̈⊥ ≥ (ℓ̇⊥)²/(N − 1) + Ric_{N,ν}(γ̇);    −N D̈⊥/D⊥ ≥ Ric_{N,ν}(γ̇).   (14.45)
L = ∆ − ∇V · ∇, (14.46)
Γ (f, g) = ∇f · ∇g.
Γ2(ψ) := Γ2(ψ, ψ) = L( |∇ψ|²/2 ) − ∇ψ · ∇(Lψ).   (14.49)
Then our previous computations can be rewritten as
Γ2(ψ) = (Lψ)²/N + Ric_{N,ν}(∇ψ)
      + ‖ ∇²ψ − (∆ψ/n) In ‖²_HS + [ n/(N(N − n)) ] ( ((N − n)/n) ∆ψ + ∇V · ∇ψ )².
(14.50)

In particular,

Γ2(ψ) ≥ (Lψ)²/N + Ric_{N,ν}(∇ψ).   (14.51)
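To see the Γ2 formalism at work on a concrete example (added here; it plays no role in the sequel), take M = Rⁿ and V(x) = |x|²/2, so that L = ∆ − x·∇ is the Ornstein–Uhlenbeck operator. A direct computation gives, for smooth ψ,

L( |∇ψ|²/2 ) = ‖∇²ψ‖²_HS + ∇ψ · ∇(∆ψ) − x · (∇²ψ · ∇ψ),
∇ψ · ∇(Lψ) = ∇ψ · ∇(∆ψ) − |∇ψ|² − x · (∇²ψ · ∇ψ),

so Γ2(ψ) = ‖∇²ψ‖²_HS + |∇ψ|² ≥ Ric_{∞,ν}(∇ψ), in agreement with (14.51) for N = ∞ and Ric_{∞,ν} = In.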
And as the reader has certainly guessed, one can now take out the
direction of motion (this computation is provided for completeness but
will not be used): As before, define
∇̂ψ = ∇ψ/|∇ψ|,

and next,

L_⊥ f = ∆_⊥ f − ∇V · ∇f,
Γ_{2,⊥}(ψ) = L_⊥( |∇ψ|²/2 ) − ∇ψ · ∇(L_⊥ψ) − 2 |(∇²ψ) · ∇̂ψ|².

Then

Γ_{2,⊥}(ψ) = (L_⊥ψ)²/(N − 1) + Ric_{N,ν}(∇ψ) + ‖ ∇²_⊥ψ − (∆_⊥ψ/(n − 1)) I ‖²_HS
      + [ (n − 1)/((N − 1)(N − n)) ] ( ((N − n)/(n − 1)) ∆_⊥ψ + ∇V · ∇ψ )² + Σ_{j=2}^n (∂_{1j}ψ)².
Curvature-dimension bounds
Remark 14.9. Note carefully that the inequalities (i)–(iv’) are re-
quired to be true always: For instance (ii) should be true for all ψ,
all x and all t ∈ (0, 1). The equivalence is that [(i) true for all x] is
equivalent to [(ii) true for all ψ, all x and all t], etc.
So indeed RicN,ν ≥ K.
The proof goes in the same way for the equivalence between (i) and
(ii’), (iii’), (iv’): again the problem is to understand why (ii’) implies
(i), and the reasoning is almost the same as before; the key point being
that the extra error terms in ∂1j ψ, j 6= 2, all vanish at x0 . ⊓
⊔
There are two classical generalizations. The first one states that the
differential inequality f¨+ K ≤ 0 is equivalent to the integral inequality
f( (1 − λ) t0 + λ t1 ) ≥ (1 − λ) f(t0) + λ f(t1) + (Kλ(1 − λ)/2) (t0 − t1)².
Another one is as follows: The differential inequality
where

τ^{(λ)}(θ) =   sin(λθ√Λ) / sin(θ√Λ)          if Λ > 0
               λ                              if Λ = 0
               sinh(λθ√(−Λ)) / sinh(θ√(−Λ))   if Λ < 0.
A more precise statement, together with a proof, are provided in the
Second Appendix of this chapter.
This leads to the following integral characterization of CD(K, N ):
D(t, x) ≥ τ^{(1−t)}_{K,N} D(0, x) + τ^{(t)}_{K,N} D(1, x)    (N < ∞)   (14.54)

ℓ(t, x) ≤ (1 − t) ℓ(0, x) + t ℓ(1, x) − (Kt(1 − t)/2) d(x, y)²    (N = ∞),   (14.55)
The reasoning is the same for the case N = ∞, starting from in-
equality (ii) in Theorem 14.8. ⊓
⊔
The next result states that the coefficients τ^{(t)}_{K,N} obtained in
Theorem 14.11 can be automatically improved if N is finite and K 6= 0,
by taking out the direction of motion:
D(t, x) ≥ τ^{(1−t)}_{K,N} D(0, x) + τ^{(t)}_{K,N} D(1, x),   (14.56)
where now

τ^{(t)}_{K,N} =   t^{1/N} ( sin(tα)/sin α )^{1−1/N}      if K > 0
                  t                                       if K = 0
                  t^{1/N} ( sinh(tα)/sinh α )^{1−1/N}     if K < 0

and

α = √( |K|/(N − 1) ) d(x, y)    (α ∈ [0, π] if K > 0).
Remark 14.13. When N < ∞ and K > 0, Theorem 14.12 contains the Bonnet–Myers theorem, according to which d(x, y) ≤ π √((N − 1)/K). With Theorem 14.11 the bound was only π √(N/K).
Proof of Theorem 14.12. The proof that (14.56) implies CD(K, N ) is
done in the same way as for (14.54). (In fact (14.56) is stronger
than (14.54).)
As for the other implication: Start from (14.22), and transform it
into an integral bound:
D⊥(t, x) ≥ σ^{(1−t)}_{K,N} D⊥(0, x) + σ^{(t)}_{K,N} D⊥(1, x),

where σ^{(t)}_{K,N} = sin(tα)/sin α if K > 0; t if K = 0; sinh(tα)/sinh α if K < 0. Next transform (14.19) into the integral bound
D// (t, x) ≥ (1 − t) D// (0, x) + t D// (1, x).
Both estimates can be combined thanks to Hölder's inequality:

D(t, x) = D⊥(t, x)^{1−1/N} D_{//}(t, x)^{1/N}
    ≥ ( σ^{(1−t)}_{K,N} D⊥(0, x) + σ^{(t)}_{K,N} D⊥(1, x) )^{1−1/N} ( (1 − t) D_{//}(0, x) + t D_{//}(1, x) )^{1/N}
    ≥ (σ^{(1−t)}_{K,N})^{1−1/N} (1 − t)^{1/N} D(0, x) + (σ^{(t)}_{K,N})^{1−1/N} t^{1/N} D(1, x).
This implies inequality (14.56). ⊓
⊔
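The elementary inequality invoked in the last step is recalled here for the reader's convenience: for nonnegative numbers a, b, c, d and θ ∈ (0, 1),

a^θ c^{1−θ} + b^θ d^{1−θ} ≤ (a + b)^θ (c + d)^{1−θ}.

Indeed, by Young's inequality, a^θ c^{1−θ}/((a+b)^θ(c+d)^{1−θ}) ≤ θ a/(a+b) + (1−θ) c/(c+d), and similarly for the second term; summing the two bounds gives θ + (1−θ) = 1. It is applied above with θ = 1 − 1/N.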
Distortion coefficients
Apart from Definition 14.19, the material in this section is not necessary
to the understanding of the rest of this course. Still, it is interesting
because it will give a new interpretation of Ricci curvature bounds, and
motivate the introduction of distortion coefficients, which will play a
crucial role in the sequel.
Fig. 14.4. The meaning of distortion coefficients: Because of positive curvature
effects, the observer overestimates the surface of the light source; in a negatively
curved world this would be the contrary.
Fig. 14.5. The distortion coefficient is approximately equal to the ratio of the
volume filled with lines, to the volume whose contour is in dashed line. Here the
space is negatively curved and the distortion coefficient is less than 1.
β^{(K,∞)}_t(x, y) = e^{ (K/6)(1−t²) d(x,y)² }.   (14.64)

• For t = 0, define β^{(K,N)}_0(x, y) = 1.
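For comparison, and assuming the usual explicit formula for the finite-dimensional coefficients when K > 0, namely β^{(K,N)}_t(x, y) = ( sin(tα)/(t sin α) )^{N−1} with α = √(K/(N−1)) d(x, y), one can recover (14.64) as a limit: letting N → ∞ with K and d(x, y) fixed, α → 0 and

sin(tα)/(t sin α) = 1 + (1 − t²) α²/6 + O(α⁴),

so ( sin(tα)/(t sin α) )^{N−1} → exp( (K/6)(1 − t²) d(x, y)² ).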
Fig. 14.6. The shape of the curves β^{(K,N)}_t(x, y), for fixed t ∈ (0, 1), as a function of α = √( |K|/(N − 1) ) d(x, y).
the case N < ∞ by passing to the limit, since lim_{N→∞} [N(a^{1/N} − 1)] = log a. So all we have to show is that if n ≤ N < ∞, then

J(t)^{1/N} ≥ (1 − t) β_{1−t}(y, x)^{1/N} J(0)^{1/N} + t β_t(x, y)^{1/N} J(1)^{1/N}.
(Here J1,0 , J0,1 are identified with their expressions J 1,0 , J 0,1 in the
moving basis E.)
As noted after (14.7), the Jacobi equation is invariant under the
change t → 1 − t, E → −E, so J 1,0 becomes J 0,1 when one exchanges
the roles of x and y, and replaces t by 1 − t. In particular, we have the
formula
det J^{1,0}(t) / (1 − t)^n = β_{1−t}(y, x).   (14.66)
As in the beginning of this chapter, the issue is to compute the
determinant at time t of a Jacobi field J(t). Since the Jacobi fields
are solutions of a linear differential equation of the form J¨ + R J = 0,
they form a vector space of dimension 2n, and they are invariant under
right-multiplication by a constant matrix. This implies
and then factor out the positive quantity (det K(t))1/n to get
( det J(t) )^{1/n} ≥ ( det J^{1,0}(t) )^{1/n} + ( det( J^{0,1}(t) J(1) ) )^{1/n}.
‖ϕ‖_{Lip(B(x0,r))} ≤ ( 2 max_{1≤i≤n+1} [ ϕ(xi) − ϕ(x0) ] ) / r.
Let us assume that ε ≤ |v|; then, by (i), for all z ∈ B2ε (x),
∇ϕ(z) = ∇ϕ(x) + A(z − x) + o(|v|).
If y ∈ Bε(x), we can integrate this identity against ζε(y − z) dz (since ζε(y − z) = 0 if |y − z| > ε); taking into account ∫ (z − x) ζε(z − x) dz = 0,
we obtain
(∂i ∂j ϕ) ∗ ζ = ∂i (∂j ϕ ∗ ζ) = ∂i ∂j (ϕ ∗ ζ)
= ∂j ∂i (ϕ ∗ ζ) = ∂j (∂i ϕ ∗ ζ) = (∂j ∂i ϕ) ∗ ζ.
2 Qv0 (t, x) − max Qvi (t, x) ≤ Qv (t, x) ≤ max Qvi (t, x).
So
(1/δⁿ) ‖ µv − ρv(x0) λn ‖_{TV(Bδ(x0))} → 0   as δ → 0.
The goal is to show that qv (x0 ) = ρv (x0 ). Then the proof will be com-
plete, since ρv is a quadratic form in v (indeed, ρv (x0 ) is obtained by
where
τ^{(λ)}(θ) =   sin(λθ√Λ) / sin(θ√Λ)          if 0 < Λ < π²
               λ                              if Λ = 0
               sinh(λθ√(−Λ)) / sinh(θ√(−Λ))   if Λ < 0.

If Λ = π² then f(t) = c sin(πt) for some c ≥ 0; finally if Λ > π² then f = 0.
τ^{(1/2)}(θ) = (1/2) ( 1 + θ²Λ/8 + o(θ³) )
and
( f(t0) + f(t1) )/2 = f( (t0 + t1)/2 ) + ( (t0 − t1)²/8 ) f̈( (t0 + t1)/2 ) + o(|t0 − t1|²).
So, if we fix t ∈ (0, 1) and let t0 , t1 → t in such a way that t = (t0 +t1 )/2,
we get
Let R : t ↦ R(t) be a continuous map defined on [0, 1], valued in the space of n × n symmetric matrices, and let U be the space of functions u : t ↦ u(t) ∈ Rⁿ solving the second-order linear differential equation

ü(t) + R(t) u(t) = 0.
Assume that J10 (t) is invertible for all t ∈ (0, 1]. Then:
(a) S(t) := J10 (t)−1 J01 (t) is symmetric positive for all t ∈ (0, 1], and
it is a decreasing function of t.
(b) There is a unique pair of Jacobi matrices (J 1,0 , J 0,1 ) such that
are symmetric. Moreover, det K(t) > 0 for all t ∈ [0, 1).
Proposition 14.31 (Jacobi matrices with positive determinant).
Let S(t) and K(t) be the matrices defined in Proposition 14.30. Let J
be a Jacobi matrix such that J(0) = In and J̇(0) is symmetric. Then the following properties are equivalent:
(i) J̇(0) + S(1) > 0;
(ii) K(t) J 0,1 (t) J(1) > 0 for all t ∈ (0, 1);
(iii) K(t) J(t) > 0 for all t ∈ [0, 1];
(iv) det J(t) > 0 for all t ∈ [0, 1].
The equivalence remains true if one replaces the strict inequalities in
(i)–(ii) by nonstrict inequalities, and the time-interval [0, 1] in (iii)–(iv)
by [0, 1).
Before proving these propositions, it is interesting to discuss their
geometric interpretation:
• If γ(t) = expx (t∇ψ(x)) is a minimizing geodesic, then γ̇(t) =
∇ψ(t, γ(t)), where ψ solves the Hamilton–Jacobi equation
∂ψ(t, x)/∂t + |∇ψ(t, x)|²/2 = 0,    ψ(0, · ) = ψ;
0 = J01(t) − J10(t) ( H(t)/t ),

where (modulo identification)

H(t) = ∇²_x ( d( · , γ(t))²/2 ).

(The extra t in the denominator comes from time-reparameterization.) So S(t) = J10(t)⁻¹ J01(t) = H(t)/t should be symmetric.
• The assumption J(0) = In,    J̇(0) + S(1) ≥ 0
(d/ds) [ ⟨u(s), v̇(s)⟩ − ⟨u̇(s), v(s)⟩ ] = −⟨u(s), R(s) v(s)⟩ + ⟨R(s) u(s), v(s)⟩ = 0   (14.76)
then the flow (u(s), u̇(s)) 7−→ (u(t), u̇(t)), where u ∈ U, preserves ω.
The subspaces U0 = {u ∈ U; u(0) = 0} and U̇0 = {u ∈ U; u̇(0) = 0} are
Lagrangian: this means that their dimension is half the dimension
So

⟨Ṡ(t)w, w⟩ = −⟨ w, (∂²u/∂s∂t)(0, t) ⟩
    = −⟨ w, (∂/∂s)(∂t u)(0, t) ⟩ = −⟨ u(0), (∂v/∂s)(0) ⟩,
where s 7−→ v(s) = (∂t u)(s, t) and s 7−→ u(s) = u(s, t) are solu-
tions of the Jacobi equation. Moreover, by differentiating the conditions
in (14.77) one obtains
∂u
v(t) + (t) = 0; v(0) = 0. (14.78)
∂s
By (14.76) again,
D ∂v E D ∂u E D ∂v E D ∂u E
− u(0), (0) = − (0), v(0) − u(t), (t) + (t), v(t) .
∂s ∂s ∂s ∂s
The first two terms in the right-hand side vanish because v(0) = 0 and
u(t) = u(t, t) = 0. Combining this with the first identity in (14.78) one
finds in the end
hṠ(t)w, wi = −kv(t)k2 . (14.79)
We already know that v(0) = 0; if in addition v(t) = 0 then 0 =
v(t) = J10 (t)v̇(0), so (by invertibility of J10 (t)) v̇(0) = 0, and v vanishes
Third Appendix: Jacobi fields forever 431
J 1,0 (t) = J01 (t) − J10 (t) J10 (1)−1 J01 (1); J 0,1 (t) = J10 (t) J10 (1)−1 .
(14.80)
˙ 0 −1 1 ˙ 0,1 ˙
Moreover, J (0) = −J1 (1) J0 (1) and J (t) = J1 (1) J1 (1)−1 are
1,0 0 0
is positive symmetric since by (a) the matrix S(t) = J10 (t)−1 J01 (t) is
a strictly decreasing function of t. In particular K(t) is invertible for
t > 0; but since K(0) = In , it follows by continuity that det K(t)
remains positive on [0, 1).
Finally, if J satisfies the assumptions of (c), then S(1) J(0)−1 is
symmetric (because S(1)∗ = S(1)). Then
(K(t) J 0,1 (t) J(1) J(0)−1
= t J10 (t)−1 J10 (t) J10 (1)−1 J01 (1) + J10 (1) J˙(0) J(0)−1
˙ J(0)−1
= t J10 (1)−1 J01 (1) J(0)−1 + J(0)
is also symmetric. ⊓
⊔
As we already noticed, the first matrix is positive for t ∈ (0, 1); and the
second is also positive, by assumption. In particular (ii) holds true.
The implication (ii) ⇒ (iii) is obvious since J(t) = J 1,0 (t) +
J 0,1 (t) J(1)
is the sum of two positive matrices for t ∈ (0, 1). (At t = 0
one sees directly K(0) J(0) = In .)
432 14 Ricci curvature
If (iii) is true then (det K(t)) (det J(t)) > 0 for all t ∈ [0, 1), and we
already know that det K(t) > 0; so det J(t) > 0, which is (iv).
˙
It remains to prove (iv) ⇒ (i). Recall that K(t) J(t) = t [S(t)+ J(0)];
since det K(t) > 0, the assumption (iv) is equivalent to the statement
that the symmetric matrix A(t) = t S(t) + t J(0) ˙ has positive deter-
minant for all t ∈ (0, 1]. The identity t S(t) = K(t) J01 (t) shows that
A(t) approaches In as t → 0; and since none of its eigenvalues vanishes,
A(t) has to remain positive for all t. So S(t) + J(0)˙ is positive for all
t ∈ (0, 1]; but S is a decreasing function of t, so this is equivalent to
˙
S(1) + J(0) > 0, which is condition (i).
The last statement in Proposition 14.31 is obtained by similar ar-
guments and its proof is omitted. ⊓
⊔
Bibliographical notes
the idea that if the Ricci curvature is bounded below by K, and the
dimension is less than N , then volumes along geodesic fields grow no
faster than volumes in model spaces of constant sectional curvature hav-
ing dimension N and Ricci curvature identically equal to K. These com-
putations are usually performed in a smooth setting; their adaptation
to the nonsmooth context of semiconvex functions has been achieved
only recently, first by Cordero-Erausquin, McCann and Schmucken-
schläger [246] (in a form that is somewhat different from the one pre-
sented here) and more recently by various sets of authors [247, 577, 761].
Bochner’s formula appears, e.g., as [394, Proposition 4.15] (for a
vector field ξ = ∇ψ) or as [680, Proposition 3.3 (3)] (for a vector field
ξ such that ∇ξ is symmetric, i.e. the 1-form p → ξ · p is closed). In
both cases, it is derived from properties of the Riemannian curvature
tensor. Another derivation of Bochner’s formula for a gradient vector
field is via the properties of the square distance function d(x0 , x)2 ; this
is quite simple, and not far from the presentation that I have followed,
since d(x0 , x)2 /2 is the solution of the Hamilton–Jacobi equation at
time 1, when the initial datum is 0 at x0 and +∞ everywhere else.
But I thought that the explicit use of the Lagrangian/Eulerian duality
would make Bochner’s formula more intuitive to the readers, especially
those who have some experience of fluid mechanics.
There are several other Bochner formulas in the literature; Chap-
ter 7 of Petersen’s book [680] is entirely devoted to that subject. In
fact “Bochner formula” is a generic name for many identities involving
commutators of second-order differential operators and curvature.
The examples (14.10) are by now standard; they have been discussed
for instance by Bakry and Qian [61], in relation with spectral gap esti-
mates. When the dimension N is an integer, these reference spaces are
obtained by “projection” of the model spaces with constant sectional
curvature.
The practical importance of separating out the direction of motion is
implicit in Cordero-Erausquin, McCann and Schmuckenschläger [246],
but it was Sturm who attracted my attention on this. To implement
this idea in the present chapter, I essentially followed the discussion
in [763, Section 1]. Also the integral bound (14.56) can be found in this
reference.
Many analytic and geometric consequences of Ricci curvature bounds
are discussed in Riemannian geometry textbooks such as the one by
434 14 Ricci curvature
Otto calculus
Both the pressure and the iterated pressure will appear naturally when
one differentiates the energy functional: the pressure for first-order
derivatives, and the iterated pressure for second-order derivatives.
Example 15.1. Let m 6= 1, and
ρm − ρ
U (ρ) = U (m) (ρ) = ;
m−1
then
p(ρ) = ρm , p2 (ρ) = (m − 1) ρm .
There is an important limit case as m → 1:
then
p(ρ) = ρ, p2 (ρ) = 0.
By the way, the linear part −ρ/(m − 1) in U (m) does not contribute
to the pressure, but has the merit of displaying the link between U (m)
and U (1) .
Differential operators will also be useful. Let ∆ be the Laplace oper-
ator on M , then the distortion of the volume element by the function V
leads to a natural second-order operator:
L = ∆ − ∇V · ∇. (15.5)
gradµ H = −∆µ,
which can be identified with the function −∆ρ. Thus the gradient of
Boltzmann’s entropy is the Laplace operator. This short statement is
one of the first striking conclusions of Otto’s formalism.
15 Otto calculus 439
Z
= ∇U ′ (ρ) · ∇ψ ρ e−V
Z
= ∇U ′ (ρ) · ∇ψ dµ.
15 Otto calculus 441
If this should hold true for all ψ, the only possible choice is that
∇U ′ (ρ) = ∇ζ(ρ), at least µ-almost everywhere. In any case ζ := U ′ (ρ)
provides an admissible representation of gradµ Uν . This proves for-
mula (15.8). The other two formulas are obtained by noting that
p′ (ρ) = ρ U ′′ (ρ), and so
therefore
∇ · µ ∇U ′ (ρ) = ∇ · e−V ρ ∇U ′ (ρ) = e−V L p(ρ).
⊓
⊔
For the second order (formula (15.7)), things are more intricate.
The following identity will be helpful: If ξ is a tangent vector at x on
a Riemannian manifold M, and F is a function on M, then
d2
Hessx F (ξ) = 2 F (γ(t)), (15.15)
dt t=0
From the proof of the gradient formula, one has, with the notation
µt = ρt ν,
Z
dUν (µt )
= ∇ψt · ∇U ′ (ρt ) ρt dν
dt
ZM
= ∇ψt · ∇p(ρt ) dν
MZ
Open Problem 15.11. Find a nice formula for the Hessian of the
functional F appearing in (15.21).
Open Problem 15.12. Find a nice formalism playing the role of the
Otto calculus in the space Pp (M ), for p 6= 2. More generally, are there
nice formal rules for taking derivatives along displacement interpola-
tion, for general Lagrangian cost functions?
444 15 Otto calculus
Bibliographical notes
Displacement convexity I
∇2 F ≥ 0 (16.4)
0 t 1 s
Proof of Proposition 16.2. The arguments in this proof will come again
several times in the sequel, in various contexts.
Assume that (i) holds true. Consider x0 and x1 in M , and introduce
a constant-speed minimizing geodesic γ joining γ0 = x0 to γ1 = x1 .
Then
d2
2
F (γ t ) = ∇ F (γt ) · γ̇t , γ˙t ≥ Λ(γt , γ̇t ).
dt2
Then Property (ii) follows from identity (16.5) with ϕ(t) := F (γt ).
As for Property (iii), it can be established either by dividing the
inequality in (ii) by t > 0, and then letting t → 0, or directly from (i)
by using the Taylor formula at order 2 with ϕ(t) = F (γt ) again. Indeed,
ϕ̇(0) = h∇F (γ0 ), γ̇0 i, while ϕ̈(t) ≥ Λ(γt , γ̇t ).
To go from (iii) to (iv), replace the geodesic γt by the geodesic γ1−t ,
to get
Z 1
F (γ0 ) ≥ F (γ1 ) − ∇F (γ1 ), γ̇1 + Λ(γ1−t , γ̇1−t ) (1 − t) dt.
0
Displacement convexity
e
Tt (x) = expx (t ∇ψ(x)), (16.11)
d
v t, Tt (x) = Tt (x),
dt
and one also has
e t (Tt (x)),
v t, Tt (x) = ∇ψ
where ψt is a solution at time t of the quadratic Hamilton–Jacobi equa-
tion with initial datum ψ0 = ψ.
The next definition adapts the notions of convexity, λ-convexity
and Λ-convexity to the setting of optimal transport. Below λ is a real
number that might nonnegative or nonpositive, while Λ = Λ(µ, v) de-
fines for each probability measure µ a quadratic form on vector fields
v : M → TM .
λ t(1 − t)
∀t ∈ [0, 1] F (µt ) ≤ (1−t) F (µ0 )+t F (µ1 )− W2 (µ0 , µ1 )2 ;
2
• Λ-displacement convex, if, whenever (µt )0≤t≤1 is a (constant-
speed, minimizing) geodesic in P2ac (M ), and (ψt )0≤t≤1 is an associ-
ated solution of the Hamilton–Jacobi equation,
Z 1
∀t ∈ [0, 1] F (µt ) ≤ (1−t) F (µ0 )+t F (µ1 )− e s ) G(s, t) ds,
Λ(µs , ∇ψ
0
µ̇ + ∇ · (∇ψ µ) = 0.
These functions will come back again and again in the sequel, and the
associated functionals will be denoted by HN,ν .
If inequality (16.16) holds true, then
Hessµ Uν ≥ KΛU ,
where Z
ΛU (µ, µ̇) = |∇ψ|2 p(ρ) dν. (16.18)
M
So the conclusion is as follows:
Comparing the latter expression with (16.19) shows that RicN,ν (v0 ) ≥
K|v0 |2 . Since x0 and v0 were arbitrary, this implies RicN,ν ≥ K. Note
that this reasoning only used the functional HN,ν = (UN )ν , and prob-
ability measures µ that are very concentrated around a given point.
This heuristic discussion is summarized in the following:
(i) Ric ≥ 0;
(ii) If the nonlinearity U is such that the nonnegative iterated pres-
sure p2 is nonnegative, then the functional Uvol is displacement convex;
t=0 t=1
Fig. 16.2. The lazy gas experiment: To go from state 0 to state 1, the lazy gas
uses a path of least action. In a nonnegatively curved world, the trajectories of the
particles first diverge, then converge, so that at intermediate times the gas can afford
to have a lower density (higher entropy).
Bibliographical notes
Displacement convexity II
In Chapter 16, a conjecture was formulated about the links between dis-
placement convexity and curvature-dimension bounds; its plausibility
was justified by some formal computations based on Otto’s calculus.
In the present chapter I shall provide a rigorous justification of this
conjecture. For this I shall use a Lagrangian point of view, in contrast
with the Eulerian approach used in the previous chapter. Not only is
the Lagrangian formalism easier to justify, but it will also lead to new
curvature-dimension criteria based on so-called “distorted displacement
convexity”.
The main results in this chapter are Theorems 17.15 and 17.37.
p(r)
(ii) is a nondecreasing function of r;
r 1−1/N
( )
δN U (δ−N ) (δ > 0) if N < ∞
(iii) u(δ) := δ −δ
is a convex
e U (e ) (δ ∈ R) if N = ∞
function of δ.
Remark 17.3. It is clear (from condition (i) for instance) that DCN ′ ⊂
DCN if N ′ ≥ N . So the smallest class of all is DC∞ , while DC1 is the
largest (actually, conditions (i)–(iii) are void for N = 1).
∀r ≥ 0, U (r) ≥ a r log r + b r.
(iii) Let N ∈ [1, ∞] and let U ∈ DCN . If r0 ∈ (0, +∞) is such that
p(r0 ) > 0, then there is a constant K > 0 such that p′ (r) ≥ Kr −1/N
for all r ≥ r0 . If on the contrary p(r0 ) = 0, then U is linear on [0, r0 ].
In particular, the set {r; U ′′ (r) = 0} is either empty, or an interval of
the form [0, r0 ].
(iv) Let N ∈ [1, ∞] and let U ∈ DCN . Then U is the pointwise
nondecreasing limit of a sequence of functions (Uℓ )ℓ∈N in DCN , such
that (a) Uℓ coincides with U on [0, rℓ ], where rℓ is arbitrarily large;
1
(b) for each ℓ there are a ≥ 0 and b ∈ R such that Uℓ (r) = −a r 1− N +b r
(or a r log r + b r if N = ∞) for r large enough; (c) Uℓ′ (∞) → U ′ (∞)
as ℓ → ∞.
(v) Let N ∈ [1, ∞] and let U ∈ DCN . Then U is the pointwise
nonincreasing limit of a sequence of functions (Uℓ )ℓ∈N in DCN , such
466 17 Displacement convexity II
U (r) Uℓ (r)
u(δ) = δN Ψ (δ−N ).
r − N − 1 ′′ − 1
1
′ −1
p′ (r) = r U ′′ (r) = r N u (r N ) − (N − 1) u (r N )
N2
1
−N
(N − 1) u′ (r0 ) 1
≥ − r− N .
N2
−1/N
If on the other hand u′ (r0 ) = 0, then necessarily u′ (r −1/N ) = 0 for
−1/N
all r ≤ r0 , which means that u is constant on [r0 , +∞), so U is
linear on [0, r0 ].
The reasoning is the same in the case N = ∞, with the help of the
formulas
1 1 ′′ 1 1
p(r) = −r u′ log , U ′′ (r) = u log − u′ log
r r r r
and 1
r ≥ r0 =⇒ p′ (r) = r U ′′ (r) ≥ −u′ log .
r0
Displacement convexity classes 469
Since u′ is nondecreasing and u′ (aℓ+1 ) < u′ (aℓ /2), the curve Tℓ lies
strictly below the curve Tℓ+1 on [0,R aℓaℓ /2], ′′and therefore on [0, aℓ+1 ]. By
choosing χℓ in such a way that aℓ /2 χℓ u is very small, we can make
sure that uℓ is very close to the line Tℓ (s) on [0, aℓ ]; and in particular
that the whole curve uℓ is bounded above by Tℓ+1 on [aℓ+1 , aℓ ]. This
will ensure that uℓ is a nondecreasing function of ℓ.
To recapitulate: uℓ ≤ u; uℓ+1 ≤ uℓ ; uℓ = u on [aℓ , +∞); 0 ≤ u′′ℓ ≤ u′′ ;
0 ≥ u′ℓ ≥ u′ ; uℓ is affine on [0, aℓ /2].
Now let
Uℓ (r) = r uℓ (r −1/N ).
By direct computation,
r −1− N − 1 ′′ − 1
1
′ −1
Uℓ′′ (r) = r N u (r N ) − (N − 1) u (r N ) .
ℓ ℓ (17.4)
N2
Since uℓ is convex nonincreasing, the above expression is nonnegative;
so Uℓ is convex, and by construction, it lies in DCN . Moreover Uℓ sat-
isfies the first requirement in (vi), since Uℓ′′ (r) is bounded above by
(r −1−1/N /N 2 ) (r −1/N u′′ (r −1/N ) − (N − 1) u′ (r −1/N )) = U ′′ (r).
470 17 Displacement convexity II
r −1− N − 1
1
−1 1
Uℓ′′ (r) = r ℓ
′
N χ (r N ) r N − (N − 1) u (r)
ℓ
N2
1
−1− N
r
≤ 2
1 + (N − 1)a ;
N
′′ N −1 1
U (r) = 2
a r −1− N .
N
So C = 1 + 1/((N − 1)a) is admissible.
Case 3: u is not affine at infinity and u′ (+∞) = 0. The proof is
based again on the same principle, but modified as follows:
• on [0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), u′′ℓ (s) = ζℓ (s) u′′ (s), where ζℓ is a smooth function
identically equal to 1 close to aℓ , identically equal to 0 at infinity,
with values in [0, 2].
Choose aℓ < bℓ < cℓ , and ζℓ supported in [aℓ , cℓ ], so that 1 ≤ ζℓ ≤ 2
on [aℓ , bℓ ], 0 ≤ ζℓ ≤ 2 on [bℓ , cℓ ], and
Z bℓ Z cℓ
′′ ′ ′
ζℓ (s) u (s) ds > u (bℓ ) − u (aℓ ); ζℓ (s) u′′ (s) ds = −u′ (aℓ ).
aℓ aℓ
R +∞
This is possible since u′ and u′′ are continuous and aℓ (2u′′ (s)) ds =
2(u′ (+∞) − u′ (aℓ )) > u′ (+∞) − u′ (aℓ ) > 0 (otherwise u would be affine
on [aℓ , +∞)). Then choose aℓ+1 > cℓ + 1.
The resulting function uℓ is convex and it satisfies u′ℓ (+∞) =
u (aℓ ) − u′ (aℓ ) = 0, so u′ℓ ≤ 0 and uℓ is constant at infinity.
′
true on [bℓ , cℓ ]. Then these inequalities will also hold true on [cℓ , +∞)
since uℓ is constant there, and u is nonincreasing.
Define Uℓ (r) = r uℓ (r −1/N ). The same reasoning as in the previous
case shows that Uℓ lies in DCN , Uℓ ≥ U , Uℓ is linear on [0, c−N ℓ ], Uℓ
converges monotonically to U as ℓ → ∞, and Uℓ′ (0) = uℓ (+∞) con-
verges to u(+∞) = U ′ (0). The sequence (Uℓ ) satisfies all the desired
properties; in particular the inequalities u′′ℓ ≤ 2u′′ and u′ℓ ≥ u′ ensure
that Uℓ′′ ≤ 2U ′′ .
Case 4: u is not affine at infinity and u′ (+∞) < 0. In this case the
proof is based on the same principle, and uℓ is defined as follows:
• on [0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), u′′ℓ (s) = ηℓ (s) u′′ (s)+ χℓ (s)/s, where χℓ and ηℓ are both
valued in [0, 1], χℓ is a smooth cutoff function with compact support
in (aℓ , +∞), and ηℓ is a smooth function identically equal to 1 close
to aℓ , and identically equal to 0 close to infinity.
To construct these functions, first choose bℓ > aℓ and χℓ supported
in [aℓ , bℓ ] in such a way that
Z bℓ
χℓ (s) u′ (bℓ ) + u′ (+∞)
ds = − .
aℓ s 2
R +∞
This is always possible since aℓ ds/s = +∞, u′ is continuous and
−(u′ (bℓ )+u′ (+∞))/2 approaches the finite limit −u′ (+∞) as bℓ → +∞.
Then choose cℓ > bℓ , and ηℓ supported in [aℓ , cℓ ] such that ηℓ = 1
on [aℓ , bℓ ] and Z cℓ
u′ (+∞) − u′ (bℓ )
ηℓ u′′ = .
bℓ 2
R +∞
This is always possible since bℓ u′′ (s) ds = u′ (+∞) − u′ (bℓ ) >
[u′ (+∞) − u′ (bℓ )]/2 > 0 (otherwise u would be affine on [bℓ , +∞)).
Finally choose aℓ+1 ≥ cℓ + 1.
The function uℓ so constructed is convex, affine at infinity, and
Z bℓ Z bℓ Z cℓ
χℓ (s)
u′ℓ (+∞) = u′ (aℓ ) + u′′ + ds + ηℓ u′′ = 0.
aℓ aℓ s bℓ
On [bℓ , +∞), one has u′ℓ ≥ u′ℓ (bℓ ) = (u′ (bℓ ) − u′ (+∞))/2 ≥ u′ (+∞)
if u′ (bℓ ) ≥ 3u′ (+∞). We can always ensure that this inequality holds
true by choosing a1 large enough thatRu′ (a1 ) ≥ 3u′ (+∞). Then uℓ (s) ≥
s
uℓ (bℓ ) + u′ (+∞) (s − bℓ ) ≥ uℓ (bℓ ) + bℓ u′ = u(s); so uℓ ≥ u also on
[bℓ , +∞).
Define Uℓ (r) = r uℓ (r −1/N ). All the desired properties of Uℓ can
be shown just as before, except for the bound on Uℓ′′ , which we shall
now check. On [a−N ′′ ′′ −N
ℓ , +∞), Uℓ = U . On [0, aℓ ), with the notation
a = −u′ (+∞), we have u′ℓ (r −1/N ) ≥ −a, u′ (r −1/N ) ≥ −3a (recall that
we imposed u′ (a1 ) ≥ −3a), so
r −1− N − 1 ′′ − 1
1
r −1− N − 1 ′′ − 1
1
′′
U (r) ≥ r N u (r N ) + (N − 1)a ,
N2
and once again Uℓ′′ ≤ CU ′′ with C = 3 + 1/((N − 1)a).
It remains to prove the second part of (vi). This will be done by
a further approximation scheme. So let U ∈ DCN be linear close to
the origin. (We can always reduce to this case by (v).) If U is linear
on the whole of R+ , there is nothing to do. Otherwise, by (iii), the set
where U ′′ vanishes is an interval [0, r0 ]. The goal is to show that we may
approximate U by Uℓ in such a way that Uℓ ∈ DCN , Uℓ is nonincreasing
in ℓ, Uℓ is linear on some interval [0, r0 (ℓ)], and Uℓ′′ increases nicely
from 0 on [r0 (ℓ), r1 (ℓ)).
In this case, u is a nonincreasing function, identically equal to a
−1/N
constant on [s0 , +∞), with s0 = r0 ; and also u′ is nonincreasing
to 0, so in fact u is strictly decreasing up to s0 . Let a1 ∈ (s0 /2, s0 ). We
can recursively define real numbers aℓ and C 2 functions uℓ as follows:
• on (0, aℓ ], uℓ coincides with u;
• on [aℓ , +∞), (uℓ )′′ = χℓ u′′ + ηℓ (−u′ ), where χℓ and ηℓ are smooth
functions valued in [0, 2], χℓ (r) is identically equal to 1 for r close
to aℓ , and identically equal to 0 for r ≥ bℓ ; and ηℓ is compactly
supported in [bℓ , cℓ ] and decreasing to 0 close to cℓ ; aℓ < bℓ < cℓ < s0 .
Let us choose χℓ , ηℓ , bℓ , cℓ in such a way that
474 17 Displacement convexity II
Z bℓ Z bℓ Z bℓ Z cℓ
χℓ u′′ > u′′ ; χℓ u′′ + ηℓ (−u′ ) = −u′ℓ (aℓ );
aℓ aℓ aℓ bℓ
Z cℓ
ηℓ (−u′ ) > 0.
bℓ
Rs
This is possible since u′ , u′′ are continuous, aℓ0 (2u′′ ) = −2u′ℓ (aℓ ) >
−u′ℓ (aℓ ), and (−u′ ) is strictly positive on [aℓ , s0 ]. It is clear that uℓ ≥ u
and u′ℓ ≥ u′ on [aℓ , bℓ ], with strict inequalities at bℓ ; by choosing cℓ very
close to bℓ , we can make sure that these inequalities are preserved on
[bℓ , cℓ ]. Then we choose aℓ+1 = (cℓ + s0 )/2.
Let us check that Uℓ (r) := r uℓ (r −1/N ) satisfies all the required
properties. To bound Uℓ′′ , note that for r ∈ [s−N 0 , (s0 /2)
−N ],
Uℓ′′ (r) ≤ C(N, r0 ) u′′ℓ (r −1/N ) − u′ℓ (r −1/N )
≤ 2C(N, r0 ) u′′ (r −1/N ) − u′ (r −1/N )
and
U ′′ (r) ≥ K(N, r0 ) u′′ (r −1/N ) − u′ (r −1/N ) ,
where C(N, r0 ), K(N, r0 ) are positive constants. Finally, on [bℓ , cℓ ],
u′′ℓ = ηℓ (−u′ ) is decreasing close to cℓ (indeed, ηℓ is decreasing close to
cℓ , and −u′ is positive nonincreasing); and of course −u′ℓ is decreasing
as well. So u′′ℓ (r −1/N ) and −u′ℓ (r −1/N ) are increasing functions of r in
a small interval [r0 , r1 ]. This concludes the argument.
To prove (vi), we may first approximate u by a C ∞ convex, non-
increasing function uℓ , in such a way that ku − uℓ kC 2 ((a,b)) → 0 for
any a, b > 0. This can be done in such a way that uℓ (s) is nonde-
creasing for small s and nonincreasing for large s; and u′ℓ (0) → u′ (0),
u′ℓ (+∞) → u′ (+∞). The conclusion follows easily since p(r)/r 1−1/N is
nondecreasing and equal to −(1/N )u′ (r −1/N ) (−u′ (log 1/r) in the case
N = ∞). ⊓
⊔
with nonnegative RRicci curvature. (Indeed, vol[Br (x0 )] = O(r N ) for any
fixed x0 ∈ M , so dν(x)/[1 + d(x0 , x)]p(N −1) < +∞ if p(N − 1) > N .)
Proof of Theorem 17.8. The problem is to show that under the assump-
tions of the theorem,
R U (ρ) is bounded below by a ν-integrable function;
then Uν (µ) = U (ρ) dν will be well-defined in R ∪ {+∞}.
Suppose first that N < ∞. By convexity of u, there is a constant
A > 0 so that δN U (δ−N ) ≥ −Aδ − A, which means
1
U (ρ) ≥ −A ρ + ρ1− N . (17.6)
Domain of the functionals Uν 477
I shall often write Hν instead of H∞,ν ; and I may even write just H if
the reference measure is the volume measure. This notation
R is justified
by analogy with Boltzmann’s H functional: H(ρ) = ρ log ρ dvol.
For each U ∈ DCN , formula (16.18) defines a functional ΛU which
will later play a role in displacement convexity estimates. It will be
convenient to compare this quantity with ΛN := ΛUN ; explicitly,
Z
1
ΛN (µ, v) = |v(x)|2 ρ1− N (x) dν(x), µ = ρ ν. (17.9)
M
Remark 17.16. Statement (ii) means, explicitly, that for any displace-
ment interpolation (µt )0≤t≤1 in Ppac (M ), and for any t ∈ [0, 1],
Z 1Z 1
Uν (µt ) + KN,U e s (x)|2 dν(x) G(s, t) ds
ρs (x)1− N |∇ψ
0 M
≤ (1 − t) Uν (µ0 ) + t Uν (µ1 ), (17.11)
0,s
where ρt is the density of µt , ψs = H+ ψ (Hamilton–Jacobi semigroup),
2 e
ψ is d /2-convex, exp(∇ψ) is the Monge transport µ0 → µ1 , and KN,U
is defined by (17.10).
Core of the proof of Theorem 17.15. Before giving a complete proof, for
pedagogical reasons I shall give the main argument behind the impli-
cation (i) ⇒ (ii) in Theorem 17.15, in the simple case K = 0.
Let (µt )0≤t≤1 be a Wasserstein geodesic, with µt absolutely contin-
uous, and let ρt be the density of µt with respect to ν. By change of
variables, Z Z
ρ0
U (ρt ) dν = U Jt dν,
Jt
where Jt is the Jacobian of the optimal transport taking µ0 to µt . The
next step consists in rewriting this as a function of the mean distortion.
Let u(δ) = δN U (δ−N ), then
1
Z Z
ρ0 Jt JN
U ρ0 dν = u t1 ρ0 dν.
J t ρ0 ρN
0
Displacement convexity from curvature bounds, revisited 481
Note that the last integral is finite since |∇ψs (x)|2 is almost surely
bounded by D2 , where D is the maximum distance between elements
R 1− 1
of Spt(µ0 ) and elements of Spt(µ1 ); and that ρs N dν ≤ ν[Spt µs ]1/N
by Jensen’s inequality.
If either Uν (µ0 ) = +∞ or Uν (µ1 ) = +∞, then there is nothing to
prove; so let us assume that these quantities are finite.
Let t0 be a fixed time in (0, 1); on Tt0 (M ), define, for all t ∈ [0, 1],
Tt0 →t expx (t0 ∇ψ(x)) = expx (t∇ψ(x)).
Upon integration against µt0 and use of Fubini’s theorem, this inequal-
ity becomes
This concludes the proof of Property (ii) when µ0 and µ1 have com-
pact support. In a second step I shall relax this compactness as-
sumption by a restriction argument. Let p ∈ [2, +∞) ∪ {c} satisfy the
assumptions of Theorem 17.8, and let µ0 , µ1 be two probability mea-
sures in Ppac (M ). Let (Zℓ )ℓ∈N , (µt,ℓ )0≤t≤1, ℓ∈N (ψt,ℓ )0≤t≤1, ℓ∈N be as in
Proposition 13.2. Let ρt,ℓ stand for the density of µt,ℓ . By Remark 17.4,
the function Uℓ : r → U (Zℓ r) belongs to DCN ; and it is easy to check
1
1−
that KN,Uℓ = Zℓ N KN,U . Since the measures µt,ℓ are compactly sup-
ported, we can apply the previous inequality with µt replaced by µt,ℓ
and U replaced by Uℓ :
Z Z Z
U (Zℓ ρt,ℓ ) dν ≤ (1 − t) U (Zℓ ρ0,ℓ ) dν + t U (Zℓ ρ1,ℓ ) dν
1
Z 1Z
1− N 1
− Zℓ KN,U ρs,ℓ (y)1− N |∇ψs,ℓ (y)|2 dν(y) G(s, t) ds. (17.16)
0 M
On the other hand, the proof of Theorem 17.8 shows that U− (r) ≤
1
A(r + r 1− N ) for some constant A = A(N, U ); so
1
1− N 1− 1 1− 1
U− (Zℓ ρt,ℓ ) ≤ A Zℓ ρt,ℓ + Zℓ ρt,ℓ N ≤ A ρt + ρt N . (17.17)
By the proof of Theorem 17.8 and Remark 17.12, the function on the
right-hand side of (17.17) is ν-integrable, and then we may pass to the
limit by dominated convergence. To summarize:
Z Z
U+ (Zℓ ρt,ℓ ) dν −−−→ U+ (ρt ) dν by monotone convergence;
ℓ→∞
Z Z
U− (Zℓ ρt,ℓ ) dν −−−→ U− (ρt ) dν by dominated convergence.
ℓ→∞
So we can pass to the limit in the first three terms appearing in
the inequality (17.16). As for the last term, note that |∇ψs,ℓ (y)|2 =
d(y, Ts→1,ℓ (y))2 /(1 − s)2 , at least µs,ℓ (dy)-almost surely; but then ac-
cording to Proposition 13.2 this coincides with d(y, Ts→1 (y))2 /(1−s)2 =
e s (y)|2 . So the last term in (17.16) can be rewritten as
|∇ψ
Z 1Z
1
KN,U e s (y)|2 dν(y) G(s, t) ds,
(Zℓ ρs,ℓ (y))1− N |∇ψ
0 M
µ0 = ρ0 ν; µt = exp(t∇ψ)# µ0 .
δ̈(t, x) 1 2
= − θ RicN,ν (v0 ) + O(θ 3 ) . (17.20)
δ(t, x) N
By repeating the proof of (i) ⇒ (ii) with U = UN and using (17.20),
one obtains
486 17 Displacement convexity II
Proof of Proposition 17.24. First, |∇ψ e t (x)| = d(x, Tt→1 (x))/(1 − t),
R
where Tt→1 is the optimal transport µt → µ1 . So ρt |∇ψ e t |2 dν =
2 2 2
W2 (µt , µ1 ) /(1 − t) = W2 (µ0 , µ1 ) . This proves (i).
To prove (ii), I shall start from
Z
1
(1 − t) ρt (x)1− N |∇ψe t (x)|2 ν(dx)
Z
1 1
= ρt (x)1− N d(x, Tt→1 (x))2 ν(dx).
1−t
Let us first see how to bound the integral in the right-hand side, with-
out worrying about the factor (1 − t)−1 in front. This can be done
with the help of Jensen’s inequality, in the same spirit as the proof of
Theorem 17.8: If r ≥ 0 is to be chosen later, then
Z
1
ρt (x)1− N d(x, Tt→1 (x))2 ν(dx)
Z N−1
r
N
2(N−1 ) N
≤ ρt (x) 1 + d(z, x) d(x, Tt→1 (x)) ν(dx)
Z !1
N
ν(dx)
× N −1
1 + d(z, x)r
Z N−1
r
2( N ) N
≤C ρt (x) 1 + d(z, x) d(z, x) + d(z, Tt→1 (x) N−1
ν(dx)
Displacement convexity from curvature bounds, revisited 489
Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
≤C ρt (x) 1 + d(z, x) + d(z, Tt→1 (x)) ν(dx)
Z Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
= C 1 + ρt (x) d(z, x) + ρ1 (y) d(z, y) ν(dy) ,
where !1
Z N
ν(dx)
C = C(r, N, ν) = C(r, N ) N −1 ,
1 + d(z, x)r
and C(r, N ) stands for some constant depending only on r and N . By
Remark 17.12, the previous expression is bounded by
Z Z N−1
N N N
r+2(N−1 ) r+2(N−1 )
C(r, N, ν) 1 + d(z, x) µ0 (dx) + d(z, x) µ1 (dx) ;
In Theorem 17.15, all the R 1 influence of the Ricci curvature bounds lies
in the additional term 0 (. . .) G(s, t) ds. As a consequence, as soon as
K 6= 0 and N < ∞, the formulation involves not only µt , µ0 and µ1 , but
the whole geodesic path (µs )0≤s≤1 . This makes the exploitation of the
resulting inequality (in geometric applications, for instance) somewhat
delicate, if not impossible.
I shall now present a different formulation, expressed only in terms of
µt , µ0 and µ1 . As a price to pay, the functionals Uν (µ0 ) and Uν (µ1 ) will
be replaced by more complicated expressions in which extra distortion
coefficients will appear. From the technical point of view, this new
formulation relies on the principle that one can “take the direction of
motion out”, in all reformulations of Ricci curvature bounds that were
examined in Chapter 14.
We already know by the proof of Theorem 17.8 that (ρ log ρ)− and
ρ lie in L1 (ν), or equivalently in L1 (π(dy|x) ν(dx)). To check the in-
tegrability of the negative part of −a ρ log β(x, y), it suffices to note
that
Z Z
ρ(x) (log β(x, y))+ π(dy|x) ν(dx) ≤ (log β(x, y))+ π(dy|x) µ(dx)
Z
= (log β(x, y))+ π(dx dy),
(K,N )
Remark 17.31. The limit in (17.31) is well-defined; indeed, βt is
increasing as N decreases, and U (r)/r is nondecreasing as a function
(K,N ) (K,N )
of r; so U (ρ(x)/βt (x, y)) βt (x, y) is a nonincreasing function
of N and the limit in (17.31) is monotone. The monotone convergence
theorem guarantees that this definition coincides with the original defi-
nition (17.27) when it applies, i.e. when the integrand is bounded below
by a π(dy|x) ν(dx)-integrable function.
Remark
q 17.33. If diam (M ) = DK,N then actually M is the sphere
S N ( NK−1 ) and ν = vol; but we don’t need this information. (The as-
sumption of M being complete without boundary is important for this
statement to be true, otherwise the one-dimensional reference spaces
of Examples 14.10 provide a counterexample.) See the end of the bib-
liographical notes for more explanations. In the case N = 1, if M is
distinct from a point then it is one-dimensional, so it is either the real
line or a circle.
494 17 Displacement convexity II
Before explaining the proof of this theorem, let me state two very
natural open problems (I have no idea how difficult they are).
Open Problem 17.39. Theorem 17.15 and 17.37 yield two different
upper bounds for Uν (µt ): on the one hand,
Can one compare those two bounds, and if yes, which one is sharpest?
Proof of Theorem 17.37. The proof shares many common points with
the proof of Theorem 17.15. I shall restrict to the case N < ∞, since
the case N = ∞ is similar.
Let us start with the implication (i) ⇒ (ii). In a first step, I shall
assume that µ0pand µ1 are compactly supported, and (if K > 0)
diam (M ) < π (N − 1)/K. With the same notation as in the be-
ginning of the proof of Theorem 17.15,
Z Z
U (ρt (x)) dν(x) = u(δt0 (t, x)) dµt0 (x).
M M
496 17 Displacement convexity II
Theorem 17.28 (together with Application 17.29) shows that the latter
quantity is always integrable. As a conclusion,
Z
Zℓ ρ0,ℓ (x0 )
U+ β(x0 , T (x0 )) ν(dx0 )
β(x0 , T (x0 ))
Z
ρ0 (x0 )
−−−→ U+ β(x0 , T (x0 )) ν(dx0 )
ℓ→∞ β(x0 , T (x0 ))
Let us see how this expression behaves in the limit θ → 0; for instance
I shall focus on the first term in (17.39). From the definitions,
(K,N)
Z
β1−t 1 (K,N ) 1
HN,π,ν (µ0 ) − HN,ν (µ0 ) = N ρ0 (x)1− N 1−β1−t (x, T (x)) N dν(x),
(17.40)
where T = exp(∇ψ) is the optimal transport from µ0 to µ1 . A standard
Taylor expansion shows that
Bibliographical notes
proved that Ric ≥ 0 is equivalent to the property that the heat equa-
tion is a contraction in Wp distance, where p is fixed in [1, ∞). Also,
Sturm [761] showed that a Riemannian manifold (equipped with the
volume measure) satisfies CD(0, N ) if and only if the nonlinear equa-
tion ∂t ρ = ∆ρm is a contraction for m ≥ 1 − 1/N . (There is a more
complicated criterion for CD(K, N ).) As will be explained in Chap-
ter 24, these results are natural in view of the gradient flow structure
of these diffusion equations.
Even if one sticks to displacement convexity, there are possible vari-
ants in which one allows the functional to explicitly depend on the in-
terpolation time. Lott [576] showed that a measured Riemannian man-
ifold (M, ν) satisfies CD(0, N ) if and only if t 7−→ t Hν (µt ) + N t log t
is a convex function of t ∈ [0, 1] along any displacement interpolation.
There is also a more general version of this statement for CD(K, N )
bounds.
Now come some more technical details. The use of Theorem 17.8
to control noncompactly supported probability densities is essentially
taken from [577]; the only change with respect to that reference is that
I do not try to define Uν on the whole of P2ac , and therefore do not
require p to be equal to 2.
In this chapter I used restriction arguments to remove the compact-
ness assumption. An alternative strategy consists in using a density
argument and stability theorems (as in [577, Appendix E]); these tools
will be used later in Chapters 23 and 30. If the manifold has nonnegative
sectional curvature, it is also possible to directly apply the argument
of change of variables to the family (µt ), even if it is not compactly
supported, thanks to the uniform inequality (8.45).
Another innovation in the proofs of this chapter is the idea of choos-
ing µt0 as the reference measure with respect to which changes of vari-
ables are performed. The advantage of that procedure (which evolved
from discussions with Ambrosio) is that the transport map from µt0
to µt is Lipschitz for all times t, as we know from Chapter 8, while
the transport map from µ0 to µ1 is only of bounded variation. So the
proof given in this section only uses the Jacobian formula for Lipschitz
changes of variables, and not the more subtle formula for BV changes
of variables.
Paths (µt )0≤t≤1 defined in terms of transport from a given measure
e (not necessarily of the form µt0 ) are studied in [30] in the context of
µ
generalized geodesics in P2 (Rn ). The procedure amounts to considering
504 17 Displacement convexity II
Contrary
R to what is stated in [814], this is false in dimension 1; in fact
µ 7−→ W (x − y) µ(dx) µ(dy) is displacement convex on P2 (R) if and
only if z 7−→ W (z) + W (−z) is convex on R+ (This is because, by
monotonicity, (x, y) 7−→ (T (x), T (y)) preserves the set {y ≥ x} ⊂ R2 .)
As a matter of fact, an interesting example coming from statistical me-
chanics, where W is not convex on the whole of R, is discussed in [202].
There is no simple displacement convexity statement known for the
Coulomb interaction potential; however, Blower [123] proved that
Z
1 1
E(µ) = log µ(dx) µ(dy)
2 R2 |x − y|
defines a displacement convex functional on P2ac (R). Blower further
studied what happens when one adds a potential energy to E, and used
these tools to establish concentration inequalities for the eigenvalues of
some large random matrices. Also Calvez and Carrillo [196, Chapter 7]
recently gave a sharp analysis of the defect of displacement convexity
for the logarithmic potential in dimension 1 (which arguably should be
the worst) with applications to the long-time study of a one-dimensional
nonlinear diffusion equation modeling chemotaxis.
Exercise 17.43. Let M be a compact Riemannian manifold of di-
mension n ≥ 2, and let Ψ be a continuous function on M × M ;
show that Ψ defines a displacement functional on P2 (M ) if and only
if (x, y) 7−→ Ψ (x, y) + Ψ (y, x) is geodesically convex on M × M .
Hints: Note that a product of geodesics in M is also a geodesic in
M × M . First show that Ψ is locally convex on M × M \ ∆, where
∆ = {(x, x)} ⊂ M × M . Use a density argument to conclude; note that
this argument fails if n = 1.
I shall conclude with some comments about Remark 17.33. The
classical Cheng–Toponogov theorem states the following: If a Rie-
mannian manifold M has dimension N , Ricci curvature bounded below
by K > 0, p and diameter equal to the limit Bonnet–Myers diameter
DK,N = π (N − 1)/K, then it is a sphere. I shall explain why this re-
sult remains true when the reference measure is not the volume, and M
is assumed to satisfy CD(K, N ). Cheng’s original proof was based on
eigenvalue comparisons, but there is now a simpler argument relying on
the Bishop–Gromov inequality [846, p. 229]. This proof goes through
when the volume measure is replaced by another reference measure ν,
and then one sees that Ψ = − log(dν/dvol) should solve a certain dif-
ferential equation of Ricatti type (replace the inequality in [573, (4.11)]
506 17 Displacement convexity II
Volume control
Remark 18.3. It does not really matter whether the definition is for-
mulated in terms of open or in terms of closed balls; at worst this
changes the value of the constant D.
When the distance d and the reference measure ν are clear from the
context, I shall often say that the space X is doubling (resp. locally
doubling), instead of writing that the measure ν is doubling on the
metric space (X , d).
508 18 Volume control
Fig. 18.1. The natural volume measure on this “singular surface” (a balloon with
a spine) is not doubling.
Proof. Let x ∈ X , and let r > 0. Since ν is nonzero, there is R > 0 such
that ν[BR] (x)] > 0. Then there is a constant C, possibly depending on
x and R, such that ν is C-doubling inside BR] (x). Let n ∈ N be large
enough that R ≤ 2n r; then
So ν[Br] (x)] > 0. Since r is arbitrarily small, x has to lie in the support
of ν. ⊓
⊔
transport. This is not the standard strategy, but it will work just as
well as any other, since the results in the end will be optimal. As a
preliminary step, I shall establish a “distorted” version of the famous
Brunn–Minkowski inequality.
• If N = ∞,
1 1 1
log ≤ (1 − t) log + t log
ν [A0 , A1 ]t ν[A0 ] ν[A1 ]
Kt(1 − t)
− sup d(x0 , x1 )2 . (18.5)
2 x0 ∈A0 , x1 ∈A1
Detailed proof of Theorem 18.5. First consider the case N < ∞. For
(K,N )
brevity I shall write just βt instead of βt . By regularity of the
measure ν and an easy approximation argument, it is sufficient to treat
the case when ν[A0 ] > 0 and ν[A1 ] > 0. Then one may define µ0 = ρ0 ν,
µ1 = ρ1 ν, where
1A0 1A1
ρ0 = , ρ1 = .
ν[A0 ] ν[A1 ]
In words, µt0 (t0 ∈ {0, 1}) is the law of a random point distributed uni-
formly in At0 . Let (µt )0≤t≤1 be the unique displacement interpolation
between µ0 and µ1 , for the cost function d(x, y)2 . Since M satisfies the
curvature-dimension bound CD(K, N ), Theorem 17.37, applied with
1
U (r) = UN (r) = −N r 1− N − r , implies
Z
UN (ρt (x)) ν(dx)
M
Z
ρ0 (x0 )
≤ (1 − t) UN β1−t (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
M β1−t (x0 , x1 )
Z
ρ1 (x1 )
+t UN βt (x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
M βt (x0 , x1 )
Z
ρ0 (x0 ) β1−t (x0 , x1 )
= (1 − t) UN π(dx0 dx1 )
M β1−t (x0 , x1 ) ρ0 (x0 )
Z
ρ1 (x1 ) βt (x0 , x1 )
+ t UN π(dx0 dx1 ),
M β (x
t 0 1, x ) ρ1 (x1 )
where βt stands for the minimum of βt (x0 , x1 ) over all pairs (x0 , x1 ) ∈
A0 × A1 . Then, by explicit computation,
Z Z
1 1 1 1
1− N
ρ0 (x0 ) dν(x0 ) = ν[A0 ] N , ρ1 (x1 )1− N dν(x1 ) = ν[A1 ] N .
M M
Bishop–Gromov inequality
ν[Br (x)]
Z r !N −1 ,
r
K
sin t dt
0 N −1
ν[Br (x)]
resp. Z r !N −1 .
r
|K|
sinh t dt
0 N −1
Here is a precise statement:
d+
dr ν[Br ]
is nonincreasing, (18.8)
s(K,N )(r)
for K < 0 the same formula remains true with sin replaced by sinh, K
by |K| and r + ε by r − ε. In the sequel, I shall only consider K > 0, the
treatment of K < 0 being obviously similar. After applying the above
bounds, inequality (18.4) yields
q N−1
N
h i1 K i1
N sin t N −1 (r + ε) N
ν Bt(r+ε) \ Btr ≥t q ν Br+ε \ Br ;
K
t sin N −1 (r + ε)
If φ(r) stands for ν[Br ], then the above inequality can be rewritten as
φ′ (tr) φ′ (r)
≥ .
s(K,N ) (tr) s(K,N ) (r)
This was for any t ∈ [0, 1], so φ′ /s(K,N ) is indeed nonincreasing, and
the proof is complete. ⊓
⊔
The following lemma was used in the proof of Theorem 18.8. At first
sight it seems obvious and the reader may skip its proof.
Doubling property 515
This implies Rx Ry
f f
a
R x ≤ Rxy ,
a g x g
and (18.9) follows. ⊓
⊔
Doubling property
where V (r) is the volume of Br (x) in the model space. This implies
that ν[Br (x)] is a continuous function of r. Of course, this property
is otherwise obvious, but the Bishop–Gromov inequality provides an
explicit modulus of continuity.
Dimension-free bounds
Proof of Theorem 18.12. For brevity I shall write Br for Br] (x0 ). Ap-
ply (18.5) with A0 = Bδ , A1 = Br , and t = δ/(2r) ≤ 1/2. For any
minimizing geodesic γ going from A0 to A1 , one has d(γ0 , γ1 ) ≤ r + δ,
so
< +∞.
Bibliographical notes
Remark 19.3. The word “local” in Definition 19.1 means that the
inequality is interested in averages around some point x0 . This is in
contrast with the “global” Poincaré inequalities that will be considered
later in Chapter 21, in which averages are over the whole space.
The next theorem is the key result of this chapter. The notation [x, y]t
stands for the set of all t-barycenters of x and y (as in Theorem 18.5).
(19.2)
1
− 1 −N
where by convention (1 − t) a− N + t b N = 0 if either a or b
is 0;
• If N = ∞, one has the pointwise bound
Kt(1 − t)
ρt (x) ≤ sup ρ0 (x0 )1−t ρ1 (x1 )t exp − d(x0 , x1 )2 .
x∈[x0 ,x1 ]t 2
(19.3)
524 19 Density control and local regularity
As I said before, there are (at least) two possible schemes of proof
for Theorem 19.4. The first one is by direct application of the Jacobian
estimates from Chapter 14; the second one is based on the displacement
convexity estimates from Chapter 17. The first one is formally simpler,
while the second one has the advantage of being based on very robust
functional inequalities. I shall only sketch the first proof, forgetting
about regularity issues, and give a detailed treatment of the second
one.
Similarly,
ρ0 (x0 ) = ρ1 (x1 ) J (1, x0 ).
Then the result follows directly from Theorems 14.11 and 14.12: Apply
1
equation (14.56) if N < ∞, (14.55) if N = ∞ (recall that D = J N ,
ℓ = − log J ). ⊓
⊔
Further, define π ′ = law (γ0′ , γ1′ ), µ′s = law (γs′ ) = (es )# Π ′ . Obviously,
Π Π
Π′ ≤ = ,
Π[Z] µt [Bδ (y)]
so for all s ∈ [0, 1],
µs
µ′s ≤ .
µt [Bδ (y)]
In particular, µ′s is absolutely continuous and its density ρ′s satisfies
(ν-almost surely)
ρs
ρ′s ≤ . (19.4)
µt [Bδ (y)]
When s = t, inequality (19.4) can be refined into
ρt 1Bδ (y)
ρ′t = , (19.5)
µt [Bδ (y)]
since
1γt ∈Bδ (y) 1x∈Bδ (y) ((et )# Π)
(et )# = .
µt [Bδ (y)] µt [Bδ (y)]
(This is more difficult to write down than to understand!)
From the restriction property (Theorem 4.6), (γ0′ , γ1′ ) is an optimal
coupling of (µ′0 , µ′1 ), and therefore (µ′s )0≤s≤1 is a displacement interpo-
1
lation. By Theorem 17.37 applied with U (r) = −r 1− N ,
Z Z
1 1 1
(ρ′t )1− N dν ≥ (1 − t) (ρ′0 (x0 ))− N β1−t (x0 , x1 ) N π ′ (dx0 dx1 )
M M ×M
Z
1 1
+ t (ρ′1 (x1 ))− N βt (x0 , x1 ) N π ′ (dx0 dx1 ). (19.6)
M ×M
On the other hand, from (19.4) the right-hand side of (19.6) can be
bounded below by
Z
1 1 1
µt [Bδ (y)] N (1 − t) (ρ0 (x0 ))− N β1−t (x0 , x1 ) N
M ×M
1 1
+ t (ρ1 (x1 ))− N βt (x0 , x1 ) N π ′ (dx0 dx1 )
1
h 1 1
= µt [Bδ (y)] N E (1 − t) (ρ0 (γ0′ ))− N β1−t (γ0′ , γ1′ ) N
1 1
i
+ t (ρ1 (γ1′ ))− N βt (γ0′ , γ1′ ) N
1
h 1 1
≥ µt [Bδ (y)] N E inf (1 − t) (ρ0 (γ0′ ))− N β1−t (γ0′ , γ1′ ) N
γt ∈[x0 ,x1 ]t
1 1
i
+ t (ρ1 (γ1′ ))− N βt (γ0′ , γ1′ ) N , (19.9)
where the last inequality follows just from the (obvious) remark that
γt′ ∈ [γ0′ , γ1′ ]t . In all of these inequalities, we can restrict π ′ to the set
{ρ0 (x0 ) > 0, ρ1 (x1 ) > 0} which is of full measure.
Let
1 1
F (x) := inf (1 − t) (ρ0 (x0 ))− N β1−t (x0 , x1 ) N
x∈[x0 ,x1 ]t
1 1
+ t (ρ1 (x1 ))− N βt (x0 , x1 ) N ;
provided that ρt (y) > 0; and then ρt (y) ≤ F (y)−N , as desired. In the
case ρt (y) = 0 the conclusion still holds true.
Some final words about measurability. It is not clear (at least to me)
that F is measurable; but instead of F one may use the measurable
function
1 1 1 1
Fe(x) = (1 − t) ρ0 (γ0 )− N β1−t (γ0 , γ1 ) N + t ρ1 (γ1 )− N βt (γ0 , γ1 ) N ,
Clearly, the sum above can be restricted to those pairs (x0 , x1 ) such that
x1 lies in the support of µ1 , i.e. x1 ∈ B; and x0 lies in the support of µt0 ,
Pointwise estimates on the interpolant density 529
ρ1 (x1 )
= sup .
x∈[x0 ,x1 ]t ; x0 ∈[z0 ,B]t0 ; x1 ∈B tN βt (x0 , x1 )
S(t0 , z0 , B)
ρt′ (x) ≤ ,
tN ν[B]
where
n 1
o
S(t0 , z0 , B) := sup βt (x0 , x1 )− N ; x0 ∈ [z0 , B]t0 , x1 ∈ B . (19.13)
To conclude, I shall state a theorem which holds true with the in-
trinsic distortion coefficients of the manifold, without any reference to a
choice of K and N , and without any assumption on the behavior of the
manifold at infinity (if the total cost is infinite, we can appeal to the
notion of generalized optimal coupling and generalized displacement
interpolation, as in Chapter 13). Recall Definition 14.17.
Democratic condition
However, a geodesic joining two points in B cannot leave the ball 2B,
so (19.23) and the democratic condition together imply that
Z Z
2C r
− |u − huiB | dν ≤ g dν. (19.24)
ν[B] 2B
B
1 D
By the doubling property, ν[B] ≤ ν[2B] . The conclusion is that
Z Z
− |u − huiB | dν ≤ 2 C D r − g dν. (19.25)
B 2B
Remark 19.15. With almost the same proof, it is easy to derive the
following refinement of the local Poincaré inequality:
Z Z
|u(x) − u(y)|
dν(x) dν(y) ≤ P (K, N, R) |∇u|(x) dν(x).
B[x,r] d(x, y) B[x,2r]
Back to Brunn–Minkowski and Prékopa–Leindler inequalities 535
Now integrate this against ρt (x) dν(x): since the right-hand side
does not depend on x any longer,
Z
1 1 1
1− N
ρt (x) dν(x) ≥ (1 − t) inf β1−t (x0 , x1 ) N ν[A0 ] N
x∈[A0 ,A1 ]t
1 1
+ t inf βt (x0 , x1 ) N ν[A1 ] N .
x∈[A0 ,A1 ]t
Proof ofRTheorem
R 19.16. By an easy homogeneity argument, we may
assume f = g = 1. Then write ρ0 = f , ρ1 = g; by Theorem 19.4,
the displacement interpolant ρt between
R ρ0Rν and ρ1 ν satisfies (19.3).
From (19.26), h ≥ ρt . It follows that h ≥ ρt = 1, as desired. ⊓
⊔
Proof of Theorem 19.18. The proof is quite similar to the proof of The-
orem 19.16, except that now N is finite. Let f , g and h satisfy the
assumptions of the theorem, define ρ0 = f /kf kL1 , ρ1 = g/kgkL1 , and
let ρt be the density of the displacement interpolant at time t between
ρ0 ν and ρ1 ν.RLet M be the right-hand side of (19.29); the problem is
to show that (h/M) ≥ 1, and this is obviously true if h/M ≥ ρt . In
view of Theorem 19.4, it is sufficient to establish
h(x) 1 β1−t (x0 , x1 ) βt (x0 , x1 ) −1
≥ sup Mt
N
, . (19.30)
M x∈[x0 ,x1 ]t ρ0 (x0 ) ρ1 (x1 )
In view of the assumption of h and the form of M, it is sufficient to
check that
q f (x0 ) g(x1 )
1 M ,
t β1−t (x0 ,x1 ) βt (x0 ,x1 )
1 ≤ q .
MtN β1−t (x0 ,x1 ) βt (x0 ,x1 )
ρ0 (x0 ) , ρ1 (x1 ) M t
1+Nq
(kf kL1 , kgk L 1 )
But this is a consequence of the following computation:
1
= Mst (a, b)
M−s −1 −1
t (a , b )
q a b
a b Mt c, d
≤ Mqt , Mt r (c, d) = , (19.31)
c d M−r
t (c, d)
1 1 1
+ = , q + r ≥ 0,
q r s
where the two equalities in (19.31) are obvious by homogeneity, and the
central inequality is a consequence of the two-point Hölder inequality
(see the bibliographical notes for references). ⊓
⊔
538 19 Density control and local regularity
Bibliographical notes
(19.33)
It was recently shown by Bobkov and Ledoux [132] that this inequality
can be used to establish optimal Sobolev inequalities in RN (with the
usual Prékopa–Leindler inequality one can apparently reach only the
logarithmic Sobolev inequality, that is, the dimension-free case [131]).
See [132] for the history and derivation of (19.33).
20
on a nonsmooth length space, there are still natural definitions for the
norm of the gradient, |∇f |. The most common one is
|f (y) − f (x)|
|∇f |(x) := lim sup . (20.1)
y→x d(x, y)
Rigorously speaking, this formula makes sense only if x is not isolated,
which will always be the case in the sequel. A slightly finer notion is
the following:
[f (y) − f (x)]−
|∇− f |(x) := lim sup , (20.2)
y→x d(x, y)
where a− = max(−a, 0) stands for the negative part of a (which is
a nonnegative number!). It is obvious that |∇− f | ≤ |∇f |, and both
notions coincide with the usual one if f is differentiable. Note that
|∇− f |(x) is automatically 0 if x is a local minimum of f .
Theorem 20.1 (Differentiating an energy along optimal trans-
port). Let (X , d, ν) and U be as above, and let (µt )0≤t≤1 be a geodesic
in P2 (X ), such that each µt is absolutely continuous with respect to ν,
with density ρt , and U (ρt )− is ν-integrable for all t. Further assume
that ρ0 is Lipschitz continuous, U (ρ0 ) and ρ0 U ′ (ρ0 ) are ν-integrable,
and U ′ is Lipschitz continuous on ρ0 (X ). Then
Uν (µt ) − Uν (µ0 )
lim inf ≥
t↓0 t
Z
− U ′′ (ρ0 (x0 )) |∇− ρ0 |(x0 ) d(x0 , x1 ) π(dx0 dx1 ), (20.3)
X
where π is an optimal coupling of (µ0 , µ1 ) associated with the geodesic
path (µt )0≤t≤1 .
Remark 20.2. The technical assumption on the negative part of U (ρt )
being integrable is a standard way to make sure that Uν (µt ) is well-
defined, with values in R ∪ {+∞}. As for the assumption about U ′
being Lipschitz on ρ0 (X ), it means in practice that either U is twice
(right-)differentiable at the origin, or ρ0 is bounded away from 0.
Remark 20.3. Here is a more probabilistic reformulation of (20.3)
(which will also make more explicit the link between π and µt ): Let
γ be a random geodesic such that µt = law (γt ), then
h i
Uν (µt ) − Uν (µ0 )
lim inf ≥ − E U ′′ (ρ0 (γ0 )) |∇− ρ0 |(γ0 ) d(γ0 , γ1 ) .
t↓0 t
Time-derivative of the energy 543
Now let γ be a random geodesic, such that µt = law (γt ). Then the
above inequality can be rewritten
Uν (µt ) − Uν (µ0 ) ≥ E U ′ (ρ0 (γt )) − E U ′ (ρ0 (γ0 ))
h i
= E U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) .
Since U ′ is nondecreasing,
U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) ≥ U ′ (ρ0 (γt )) − U ′ (ρ0 (γ0 )) 1ρ0 (γ0 )>ρ0 (γt ) .
Similarly,
ρ0 (γt ) − ρ0 (γ0 )
lim inf 1ρ0 (γ0 )>ρ0 (γt ) ≥ − |∇− ρ0 |(γ0 ).
t→0 d(γ0 , γt )
So, if vt (γ) stands for the integrand in the right-hand side of (20.5),
one has
(note indeed that p′ (r) = r U ′′ (r)). Since π = (ρ0 ν) ⊗ δx1 =T (x0 ) with
T = exp ∇ψ, the right-hand side can be rewritten
Z
U ′′ (ρ0 )∇ρ0 · ∇ψ dπ.
Exercise 20.5. Use Otto’s calculus to guess that (d/dt)Uν (µt ) should
coincide with the right-hand side of (20.7).
HWI inequalities
Hν (µt ) − Hν (µ0 )
≤ Hν (µ1 ) − Hν (µ0 ).
t
Under suitable assumptions we may then apply Theorem 20.1 to pass
to the limit as t → 0, and get
Z
|∇ρ0 (x0 )|
− d(x0 , x1 ) π(dx0 dx1 ) ≤ Hν (µ1 ) − Hν (µ0 ).
ρ0 (x0 )
By explicit computation,
N −1
α
sin α >1 if K > 0
β(x0 , x1 ) = 1 if K = 0 (20.9)
N −1
α
sinh α < 1 if K < 0,
α
−(N − 1) 1 − tan α < 0 if K > 0
β ′ (x0 , x1 ) = 0 if K = 0 (20.10)
α
(N − 1) tanh α − 1 > 0 if K < 0,
where r
|K|
α= d(x0 , x1 ).
N −1
HWI inequalities 547
K K
β ≃1− d(x0 , x1 )2 , β′ ≃ − d(x0 , x1 )2 ,
6 3
whatever the sign of K.
The next definition is a generalization of the classical notion of
Fisher information:
(Strictly speaking this is true only if ρ > 0, but the integral in (20.11)
may be restricted to the set {ρ > 0}.) Also, in Definition 20.6 one can
replace |∇ρ| by |∇− ρ| and |∇p(ρ)| by |∇− p(ρ)| since a locally Lipschitz
function is differentiable almost everywhere.
Uν (µ0 ) − Uν (µ1 )
Z
W2 (µ0 , µ1 )2
≤ U ′′ (ρ0 (x0 )) |∇ρ0 (x0 )| d(x0 , x1 ) π(dx0 dx1 ) − K∞,U
2
q W2 (µ0 , µ1 )2
≤ W2 (µ0 , µ1 ) IU,ν (µ0 ) − K∞,U , (20.14)
2
where K∞,U is defined in (17.10).
(iii) If N < ∞, K ≥ 0 and Uν (µ1 ) < +∞ then
HWI inequalities 549
Uν (µ0 ) − Uν (µ1 )
Z
≤ U ′′ (ρ0 (x0 )) |∇ρ0 (x0 )| d(x0 , x1 ) π(dx0 dx1 )
− 1 W2 (µ0 , µ1 )2
− KλN,U max kρ0 kL∞ (ν) , kρ1 kL∞ (ν) N
2
q
≤ W2 (µ0 , µ1 ) IU,ν (µ0 )
− 1 W2 (µ0 , µ1 )2
− KλN,U max kρ0 kL∞ (ν) , kρ1 kL∞ (ν) N
,
2
(20.15)
where
p(r)
λN,U = lim 1 . (20.16)
r→0 r 1− N
Exercise 20.11. When U is well-behaved, give a more direct deriva-
tion of (20.14), via plain displacement convexity (rather than distorted
displacement convexity). The same for (20.15), with the help of Exer-
cise 17.23.
Proof of Theorem 20.10. First recall from the proof of Theorem 17.8
that U− (ρ0 ) is integrable; since ρ0 U ′ (ρ0 ) ≥ U (ρ0 ), the integrability of
ρ0 U ′ (ρ0 ) implies the integrability of U (ρ0 ). Moreover, if N = ∞ then
U (r) ≥ a r log r−b r for some positive constants a, b (unless U is linear).
So
if N = ∞
ρ0 U ′ (ρ0 ) ∈ L1 =⇒ U (ρ0 ) ∈ L1 =⇒ ρ0 log+ ρ0 ∈ L1 .
where C stands for various numeric constants. Then (20.19) also holds
true for K < 0.
Second term of (20.18): This is the same as for the first term except
that the inequalities are reversed. If K > 0 then U (ρ1 ) ≥ U (ρ1 /βt ) βt ↓
U (ρ1 /β) β, and to pass to the limit it suffices to check the integrability
of U+ (ρ1 ). If N < ∞ this follows from the Lipschitz continuity of U ,
while if N = ∞ this comes from the assumption ρ1 log+ ρ1 ∈ L1 (ν).
552 20 Infinitesimal displacement convexity
If K < 0 then U (ρ1 ) ≤ U (ρ1 /βt ) βt ↑ U (ρ1 /β) β, and now we can
conclude because U− (ρ1 ) is integrable by Theorem 17.8. In either case,
Z
ρ1 (x1 )
U βt (x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
βt (x0 , x1 )
Z
ρ1 (x1 )
−−→ U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 ). (20.20)
t→0 β(x0 , x1 )
To prove (20.25), first note that p(0) = 0 (because p(r)/r 1−1/N is non-
decreasing), so pℓ (r) → p(r) for all r, and the integrand in the left-hand
side converges to the integrand in the right-hand side.
Moreover, since pℓ (0) = 0 and p′ℓ (r) = r Uℓ′′ (r) ≤ C r U ′′ (r) =
C p′ (r), we have 0 ≤ pℓ (r) ≤ C p(r). Then:
• If K = 0 then β ′ = 0 and there is nothing to prove.
R
• If K < 0 then β ′ > 0. If p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 ) <
+∞ then the left-hand side converges to the right-hand side by
dominated convergence; otherwise the inequality is obvious.
• If K > 0 and N < ∞ then β ′ is bounded
R and we may conclude by
dominated convergence as soon as p(ρ0 (x0 )) dν(x0 ) < +∞. This
in turn results from the fact that ρ0 U ′ (ρ0 ), U− (ρ0 ) ∈ L1 (ν).
554 20 Infinitesimal displacement convexity
In particular,
ρ1 (x1 ) β(x0 , x1 )
U
β(x0 , x1 ) ρ1 (x1 )
1
≤ U (ρ1 (x1 ))
ρ1 (x1 )
β(x0 , x1 ) ρ1 (x1 ) 1 β(x0 , x1 )
+ p log − log
ρ1 (x1 )β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 )
U (ρ1 (x1 )) β(x0 , x1 ) ρ1 (x1 ) K
= − p d(x0 , x1 )2
ρ1 (x1 ) ρ1 (x1 ) β(x0 , x1 ) 6
U (ρ1 (x1 )) K∞,U
≤ − d(x0 , x1 )2 .
ρ1 (x1 ) 6
Thus
556 20 Infinitesimal displacement convexity
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
β(x0 , x1 )
Z Z
K∞,U
≤ U (ρ1 (x1 )) ν(dx1 ) − ρ1 (x1 ) d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6
Z
K∞,U
= U (ρ1 ) dν − W2 (µ0 , µ1 )2 . (20.28)
6
On the other hand,
Z
p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
Z
K∞,U
≤− ρ0 (x0 ) d(x0 , x1 )2 π(dx1 |x0 ) ν(dx0 )
3
K∞,U
=− W2 (µ0 , µ1 )2 . (20.29)
3
Plugging (20.28) and (20.29) into (20.12) finishes the proof of (20.14).
The proof of (iii) is along the same lines: I shall establish the identity
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 )
β(x0 , x1 )
Z
+ p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
1 1
!
(sup ρ0 )− N (sup ρ1 )− N
≤ Uν (µ1 ) − Kλ + W2 (µ0 , µ1 )2 . (20.30)
3 6
This combined with Corollary 19.5 will lead from (20.12) to (20.15).
So let us prove (20.30). By convexity of s 7−→ sN U (s−N ),
ρ1 (x1 ) β(x0 , x1 ) U (ρ1 (x1 ))
U ≤
β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 )
1− 1 " 1 #
β(x0 , x1 ) N ρ1 (x1 ) β(x0 , x1 ) N 1
+N p − 1 ,
ρ1 (x1 ) β(x0 , x1 ) ρ1 (x1 ) ρ1 (x1 ) N
which is the same as
ρ1 (x1 )
U β(x0 , x1 ) ≤ U (ρ1 (x1 ))
β(x0 , x1 )
ρ1 (x1 ) 1 1
+Np β(x0 , x1 )1− N β(x0 , x1 ) N − 1 .
β(x0 , x1 )
HWI inequalities 557
As a consequence,
Z
ρ1 (x1 )
U β(x0 , x1 ) π(dx0 |x1 ) ν(dx1 ) ≤ Uν (µ1 )
β(x0 , x1 )
Z
ρ1 (x1 ) 1 1
+N p β(x0 , x1 )1− N β(x0 , x1 ) N −1 π(dx0 |x1 ) ν(dx1 ).
β(x0 , x1 )
(20.31)
(see the bibliographical notes for details), the right-hand side of (20.31)
is bounded above by
Z
K ρ1 (x1 ) 1
Uν (µ1 ) − p β(x0 , x1 )1− N π(dx0 |x1 ) ν(dx1 )
6 β(x0 , x1 )
Z
K p(r) 1
≤ Uν (µ1 ) − inf 1 ρ1 (x1 )1− N d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6 r>0 r 1− N
≤ Uν (µ1 )
Z
K p(r) −N 1
− lim (sup ρ1 ) ρ1 (x1 ) d(x0 , x1 )2 π(dx0 |x1 ) ν(dx1 )
6 r→0 r 1− N1
K 1
= Uν (µ1 ) − λ (sup ρ1 )− N W2 (µ0 , µ1 )2 , (20.33)
6
where λ = λN,U .
On the other hand, since β ′ (x0 , x1 ) = −(N − 1)(1 − (α/ tan α)) < 0,
we can use the elementary inequality
α α2
0 < α ≤ π =⇒ (N − 1) 1 − ≥ (N − 1) (20.34)
tan α 3
(see the bibliographical notes again) to deduce
558 20 Infinitesimal displacement convexity
Z
p(ρ0 (x0 )) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 ) (20.35)
Z
p(r) 1
≤ inf 1 ρ0 (x0 )1− N β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
r>0 r 1− N
Z
p(r) −N1
≤ lim 1 (sup ρ0 ) ρ0 (x0 ) β ′ (x0 , x1 ) π(dx1 |x0 ) ν(dx0 )
r→0 r 1− N
Z
p(r) −N1 Kd(x0 , x1 )2
≤ lim 1 (sup ρ0 ) ρ0 (x0 ) π(dx1 |x0 ) ν(dx0 )
r→0 r 1− N 3
1 Z
Kλ (sup ρ0 )− N
= d(x0 , x1 )2 π(dx0 dx1 )
3
1
Kλ (sup ρ0 )− N
= W2 (µ0 , µ1 )2 . (20.36)
3
The combination of (20.33) and (20.36) implies (20.30) and con-
cludes the proof of Theorem 20.10. ⊓
⊔
Bibliographical notes
leading roles in information theory [252, 295]. They also have a leading
part in statistical mechanics and kinetic theory (see e.g. [817, 812]).
The HWI inequality was established in my joint work with Otto [671];
it obviously extends to any reasonable K-displacement convex func-
tional. A precursor inequality was studied by Otto [669]. An applica-
tion to a “concrete” problem of partial differential equations can be
found in [213, Section 5]. Recently Gao and Wu [405] used the HWI
inequality to derive new criteria of uniqueness for certain spin systems.
It is shown in [671, Appendix] and [814, Proof of Theorem 9.17,
Step 1] how to devise approximating sequences of smooth densities in
such a way that the Hν and Iν functionals pass to the limit. By adapting
these arguments one may conclude the proof of Corollary 20.13.
The role of the HWI inequality as an interpolation inequality is
briefly discussed in [814, Section 9.4] and turned into application
in [213, Proof of Theorem 5.1]: in that reference we study rates of con-
vergence for certain nonlinear partial differential equations, and com-
bine a bound on the Fisher information with a convergence estimate in
Wasserstein distance, to establish a convergence estimate in a stronger
sense (L1 norm, for instance).
The HWI inequality is also interesting as an “infinite-dimensional”
interpolation inequality; this is applied in [445] to the study of the limit
behavior of the entropy in a hydrodynamic limit.
A slightly different derivation of the HWI inequality is due to
Cordero-Erausquin [242]; a completely different derivation is due to
Bobkov, Gentil and Ledoux [127]. Variations of these inequalities were
studied by Agueh, Ghoussoub and Kang [5]; and Cordero-Erausquin,
Gangbo and Houdré [245].
There is no well-identified analog of the HWI inequality for non-
quadratic cost functions. For nonquadratic costs in Rn , some inequali-
ties in the spirit of HWI are established in [76], where they are used to
derive various isoperimetric-type inequalities.
The first somewhat systematic studies of HWI-type inequalities in
the case N < ∞ are due to Lott and myself [577, 578].
The elementary inequalities (20.32) and (20.34) are proven in [578,
Section 5], where they are used to derive the Lichnerowicz spectral gap
inequality (Theorem 21.20 in Chapter 21).
21
Isoperimetric-type inequalities
Sobolev inequalities
with some restrictions on the exponents. I will not say more about
Sobolev-type inequalities, but there are entire books devoted to them.
In a Riemannian setting, there is a famous family of Sobolev inequal-
ities obtained from the curvature-dimension bound CD(K, N ) with
K > 0 and 2 < N < ∞:
$$1\le q\le\frac{2N}{N-2}\ \Longrightarrow\ \frac{c}{q-2}\left[\left(\int|u|^q\,d\nu\right)^{\frac2q} - \int|u|^2\,d\nu\right] \le \int|\nabla u|^2\,d\nu,\qquad c=\frac{NK}{N-1}. \tag{21.7}$$
where
Θ(N,K) (r, g) =
r α 1− 1
N −1 g N −1 N
r sup 1 α+N 1−
0≤α≤π N r N
1+ K sin α
α 1
+ (N − 1) − 1 r − N . (21.11)
tan α
As a consequence,
$$H_{N,\nu}(\mu) \le \frac{1}{2K}\int_M \frac{|\nabla\rho|^2}{\rho}\;\frac{\bigl(\frac{N-1}{N}\bigr)^2\rho^{-\frac2N}}{\frac13 + \frac23\,\rho^{-\frac1N}}\,d\nu. \tag{21.12}$$
Proof of Theorem 21.9. Start from Theorem 20.10 and choose $U(r) = -N\,(r^{1-\frac1N}-r)$. After some straightforward calculations, one obtains
$$H_{N,\nu}(\mu) \le \int_M \theta^{(N,K)}\bigl(\rho,\,|\nabla\rho|,\,\alpha\bigr),$$
where $\alpha = \sqrt{K/(N-1)}\,d(x_0,x_1)\in[0,\pi]$, and $\theta^{(N,K)}$ is an explicit function such that
and the infimum is taken over all functions g ∈ L1 (Rn ), not identi-
cally 0.
Remark 21.13. The assumption of Lipschitz continuity for u can be
removed, but I shall not do so here. Actually, inequality (21.13) holds
true as soon as u is locally integrable and vanishes at infinity, in the
sense that the Lebesgue measure of {|u| ≥ r} is finite for any r > 0.
Remark 21.14. The constant Sn (p) in (21.13) is optimal.
Proof of Theorem 21.12. Choose M = Rn , ν = Lebesgue measure,
and apply Theorem 20.10 with K = 0, N = n, and µ0 = ρ0 ν,
µ1 = ρ1 ν, both of them compactly supported. By formula (20.14) in
Theorem 20.10(i),
Moreover, as $\lambda\to\infty$, the probability measure $\mu_0^{(\lambda)} = \rho_0^{(\lambda)}\nu$ converges weakly to the Dirac mass $\delta_0$ at the origin; so
$$W_{p'}\bigl(\mu_0^{(\lambda)},\mu_1\bigr) \longrightarrow W_{p'}(\delta_0,\mu_1) = \left(\int|y|^{p'}\,d\mu_1(y)\right)^{\frac1{p'}}.$$
After writing (21.14) for $\mu_0 = \mu_0^{(\lambda)}$ and then passing to the limit as $\lambda\to\infty$, one obtains
$$n\int_{\mathbb R^n}\rho_1^{1-\frac1n}\,d\nu \le \left(\int\rho_0^{-p(1+\frac1n)}\,|\nabla\rho_0|^p\,d\mu_0\right)^{\frac1p}\left(\int|y|^{p'}\,d\mu_1(y)\right)^{\frac1{p'}}. \tag{21.15}$$
Let us change unknowns and define $\rho_0 = u^{p^\star}$, $\rho_1 = g$; inequality (21.15) then becomes
$$1 \le \frac{p\,(n-1)}{n\,(n-p)}\,\|\nabla u\|_{L^p}\;\frac{\left(\int|y|^{p'}g(y)\,dy\right)^{\frac1{p'}}}{\int g^{1-\frac1n}},$$
where $u$ and $g$ are only required to satisfy $\int u^{p^\star} = 1$, $\int g = 1$. The inequality (21.13) follows by homogeneity again. ⊓⊔
Recall that $\nu[B]=1$; then after the change of unknowns $\rho_0 = u^{N/(N-1)}$, inequality (21.18) implies
$$1 \le S(K,N,R)\,\bigl(\|\nabla u\|_{L^1(M)} + \|u\|_{L^1(M)}\bigr),$$
Isoperimetric inequalities
Isoperimetric inequalities are sometimes obtained as limits of Sobolev
inequalities applied to indicator functions. The most classical example
is the equivalence between the Euclidean isoperimetric inequality
$$\frac{|\partial A|}{|A|^{\frac{n-1}{n}}} \ge \frac{|\partial B^n|}{|B^n|^{\frac{n-1}{n}}}$$
Poincaré inequalities
This writing makes the formal connection with the logarithmic Sobolev
inequality very natural. (The Poincaré inequality is obtained as the
limit of the logarithmic Sobolev inequality when one sets µ = (1+ εu) ν
and lets ε → 0.)
Like Sobolev inequalities, Poincaré inequalities express the domi-
nation of a function by its gradient; but unlike Sobolev inequalities,
they do not include any gain of integrability. Poincaré inequalities have
spectral content, since the best constant λ can be interpreted as the
spectral gap for the Laplace operator on M .1 There is no Poincaré in-
equality on Rn equipped with the Lebesgue measure (the usual “flat”
Laplace operator does not have a spectral gap), but there is a Poincaré
1
This is one reason to take λ (universally accepted notation for spectral gap) as
the constant defining the Poincaré inequality. Unfortunately this is not consistent
with the convention that I used for local Poincaré inequalities; another choice
would have been to call λ−1 the Poincaré constant.
and the first term on the right-hand side vanishes by assumption. Similarly,
$$\int\frac{|\nabla\rho|^2}{\rho}\;\frac{\rho^{-\frac2N}}{\frac13+\frac23\,\rho^{-\frac1N}}\,d\nu = \varepsilon^2\int|\nabla f|^2\,d\nu + o(\varepsilon^2).$$
So (21.12) implies
$$\frac{N-1}{N}\int\frac{f^2}{2}\,d\nu \le \frac{1}{2K}\left(\frac{N-1}{N}\right)^2\int|\nabla f|^2\,d\nu,$$
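For the non-expert reader, here is the elementary Taylor expansion behind this linearization (not spelled out in the text; it assumes $\int f\,d\nu = 0$ and that the expansion may be integrated term by term). With $U(r) = -N(r^{1-\frac1N}-r)$ as above,
$$H_{N,\nu}\bigl((1+\varepsilon f)\,\nu\bigr) = N\int\Bigl[(1+\varepsilon f) - (1+\varepsilon f)^{1-\frac1N}\Bigr]\,d\nu = \frac{N-1}{2N}\,\varepsilon^2\int f^2\,d\nu + o(\varepsilon^2),$$
so at order $\varepsilon^2$ inequality (21.12) reduces to $\int f^2\,d\nu \le \frac{N-1}{KN}\int|\nabla f|^2\,d\nu$, i.e. the Lichnerowicz spectral gap bound $\lambda_1\ge KN/(N-1)$ (Theorem 21.20).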
Bibliographical notes
Concentration inequalities
$$d_p\bigl((x_1,\ldots,x_N);(y_1,\ldots,y_N)\bigr) = \left(\sum_{i=1}^N d(x_i,y_i)^p\right)^{\frac1p}.$$
$$\forall\varphi\in C_b(\mathcal X),\qquad \frac1\lambda\log\int_{\mathcal X} e^{\lambda\varphi}\,d\nu = \sup_{\mu\in P(\mathcal X)}\left[\int\varphi\,d\mu - \frac{H_\nu(\mu)}{\lambda}\right]. \tag{22.7}$$
(See the bibliographical notes for proofs of these identities.)
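As a quick sanity check (this verification is not part of the text), the supremum in (22.7) is formally attained at the Gibbs measure associated with $\varphi$:
$$d\mu_\varphi = \frac{e^{\lambda\varphi}\,d\nu}{\int e^{\lambda\varphi}\,d\nu}\ \Longrightarrow\ \int\varphi\,d\mu_\varphi - \frac{H_\nu(\mu_\varphi)}{\lambda} = \int\varphi\,d\mu_\varphi - \frac1\lambda\int\Bigl(\lambda\varphi - \log\!\int e^{\lambda\varphi}d\nu\Bigr)d\mu_\varphi = \frac1\lambda\log\int_{\mathcal X}e^{\lambda\varphi}\,d\nu,$$
while for any other $\mu\ll\nu$ the difference between the two sides of (22.7) equals $\lambda^{-1}H_{\mu_\varphi}(\mu)\ge0$.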
Let us first treat the case $p=2$. Apply Theorem 5.26 with $c(x,y)=d(x,y)^2/2$, $F(\mu)=(1/\lambda)H_\nu(\mu)$, $\Lambda(\varphi)=(1/\lambda)\log\int e^{\lambda\varphi}\,d\nu$. The conclusion is that $\nu$ satisfies $T_2(\lambda)$ if and only if
$$\forall\phi\in C_b(\mathcal X),\qquad \log\int\exp\bigl(-\lambda\,\phi^c\bigr)\,d\nu + \lambda\int\phi\,d\nu \le 0,$$
i.e.
$$\int e^{-\lambda\phi^c}\,d\nu \le e^{-\lambda\int\phi\,d\nu},$$
where $\phi^c(x) := \sup_y\bigl(\phi(y) - d(x,y)^2/2\bigr)$. Upon changing $\phi$ for $\varphi=-\phi$, this is the desired result. Note that the Particular Case 22.4 is obtained from (22.5) by choosing $p=1$ and performing the change of variables $t\to\lambda t$.
The case $p<2$ is similar, except that now we appeal to the equivalence between (i') and (ii') in Theorem 5.26, and choose
$$c(x,y) = \frac{d(x,y)^p}{p};\qquad \Phi(r) = \frac p2\,r^{\frac2p}\,1_{r\ge0};\qquad \Phi^*(t) = \Bigl(\frac1p-\frac12\Bigr)\,t^{\frac{2}{2-p}}.$$
⊓
⊔
coupling between µ3 (dx3 |x1 , x2 ) and ν(dy3 ), call it π3 (dx3 dy3 |x1 , x2 );
etc. In the end, glue these plans together to get a coupling
$$\pi(dx\,dy) = \pi_1(dx_1\,dy_1)\,\pi_2(dx_2\,dy_2|x_1)\cdots\pi_N(dx_N\,dy_N|x^{N-1}),$$
where of course, for each $i$ and each $x^{i-1}=(x_1,\ldots,x_{i-1})$, the measure $\pi(\,\cdot\,|x^{i-1})$ is an optimal transference plan between its marginals. So the right-hand side of (22.8) can be rewritten as
$$\sum_{i=1}^N\int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\,\mu^{i-1}(dx^{i-1}).$$
Since this cost is achieved for the transference plan $\pi$, we obtain the key estimate
$$W_p(\mu,\nu^{\otimes N})^p \le \sum_{i=1}^N\int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\,\mu^{i-1}(dx^{i-1}). \tag{22.9}$$
$$\sum_i\int\left(\frac2\lambda\,H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\right)^{\frac p2}\mu^{i-1}(dx^{i-1}). \tag{22.10}$$
Since $p\le2$, we can apply Hölder's inequality, in the form $\sum_{i\le N}a_i^{p/2} \le N^{1-p/2}\bigl(\sum a_i\bigr)^{p/2}$, and bound (22.10) by
$$N^{1-\frac p2}\left[\frac2\lambda\sum_{i=1}^N\int H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\,\mu^{i-1}(dx^{i-1})\right]^{\frac p2}. \tag{22.11}$$
$$\forall\mu\in P(\mathcal X),\qquad C(\mu,\nu)\le H_\nu(\mu)$$
implies
$$\forall\mu\in P(\mathcal X^N),\qquad C^N(\mu,\nu^{\otimes N})\le H_{\nu^{\otimes N}}(\mu),$$
where $C^N$ is the optimal transport cost associated with the cost function $c_N(x,y)=\sum_i c(x_i,y_i)$ on $\mathcal X^N$.
The following important lemma was used in the course of the proof
of Proposition 22.5.
(iii) There is a constant C > 0 such that for any Borel set A ⊂ X ,
$$\nu[A] \ge \frac12 \ \Longrightarrow\ \forall r>0,\quad \nu[A^r] \ge 1 - e^{-Cr^2}.$$
(iv) There is a constant C > 0 such that
so (ii) implies
$$\int e^{tf(x)}\,\nu(dx) \le e^{\frac{t^2}{2\lambda}}\,e^{t\int f\,d\nu}.$$
With the shorthand $\langle f\rangle = \int f\,d\nu$, this is the same as
$$\int e^{t(f-\langle f\rangle)}\,d\nu \le e^{\frac{t^2}{2\lambda}},$$
where C = λ/2 (cf. the remark at the end of the proof of (i) ⇒ (iv)).
The implication (v) ⇒ (iv) is trivial.
Let us now consider the implication (i) ⇒ (iii). Assume that
$$\forall\mu\in P_1(\mathcal X),\qquad W_1(\mu,\nu) \le \sqrt{C\,H_\nu(\mu)}. \tag{22.14}$$
thus
$$\left(\int\varphi\,|u|\,d\nu\right)^2 \le C\,H_\nu(\mu), \tag{22.18}$$
where
$$C := \frac{\displaystyle\iint(1-t)\,(1+tu)\,\varphi^2\,d\nu\,dt}{\displaystyle\left(\int_0^1(1-t)\,dt\right)^2}. \tag{22.19}$$
The numerator can be rewritten as follows:
$$\iint(1-t)(1+tu)\,\varphi^2\,d\nu\,dt = \left(\int(1-t)\,t\,dt\right)\int(1+u)\,\varphi^2\,d\nu + \left(\int(1-t)^2\,dt\right)\int\varphi^2\,d\nu = \frac16\int\varphi^2\,d\mu + \frac13\int\varphi^2\,d\nu. \tag{22.20}$$
From the Legendre representation of the $H$ functional,
$$\int\varphi^2\,d\mu \le H_\nu(\mu) + \log\int e^{\varphi^2}\,d\nu, \tag{22.21}$$
≤ (H + 2L) 2 (22.24)
where s r
8 √ √
m= 1+L+ (L − 1)2 + L ≤ 2 L + 1.
3
This concludes the proof. ⊓
⊔
Remark 22.19. Part (ii) of Theorem 22.17 shows that the T2 inequal-
ity on a Riemannian manifold contains spectral information, and im-
poses qualitative restrictions on measures satisfying T2 . For instance,
the support of such a measure needs to be connected. (Otherwise take $u=a$ on one connected component, $u=b$ on another, $u=0$ elsewhere, where $a$ and $b$ are two constants chosen in such a way that $\int u\,d\nu = 0$. Then $\int|\nabla u|^2\,d\nu = 0$, while $\int u^2\,d\nu > 0$.) This remark shows that $T_2$ does not result from just decay estimates, in contrast with $T_1$.
and
$$\varphi(t) = \int_M H_tg\,d\nu + O(t). \tag{22.33}$$
By Proposition 22.16(iv), $H_tg$ converges pointwise to $g$ as $t\to0^+$; then by the dominated convergence theorem,
$$\lim_{t\to0^+}\varphi(t) = \int_M g\,d\nu. \tag{22.34}$$
So it all amounts to showing that φ(1) ≤ limt→0+ φ(t), and this will
obviously be true if φ(t) is nonincreasing in t. To prove this, we shall
compute the time-derivative φ′ (t). We shall go slowly, so the hasty
reader may go directly to the result, which is formula (22.41) below.
Let $t\in(0,1]$ be given. For $s>0$, we have
$$\frac{\varphi(t+s)-\varphi(t)}{s} = \frac1s\left[\frac{1}{K(t+s)}-\frac{1}{Kt}\right]\log\int_M e^{K(t+s)H_{t+s}g}\,d\nu + \frac{1}{Kts}\left[\log\int_M e^{K(t+s)H_{t+s}g}\,d\nu - \log\int_M e^{KtH_tg}\,d\nu\right]. \tag{22.35}$$
On the other hand, the second term in the right-hand side of (22.35) converges to
$$\frac{1}{Kt\int e^{KtH_tg}\,d\nu}\;\lim_{s\to0^+}\frac1s\left[\int_M e^{K(t+s)H_{t+s}g}\,d\nu - \int_M e^{KtH_tg}\,d\nu\right], \tag{22.37}$$
provided that the latter limit exists.
To evaluate the limit in (22.37), we decompose the expression inside square brackets into
$$\int_M\frac{e^{K(t+s)H_{t+s}g}-e^{KtH_{t+s}g}}{s}\,d\nu + \int_M\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s}\,d\nu. \tag{22.38}$$
The integrand of the first term in the above formula can be rewritten as $(e^{KtH_{t+s}g})(e^{KsH_{t+s}g}-1)/s$, which is uniformly bounded and converges pointwise to $(e^{KtH_tg})\,K\,H_tg$ as $s\to0^+$. So the first integral in (22.38) converges to $\int_M(K\,H_tg)\,e^{KtH_tg}\,d\nu$.
Let us turn to the second term of (22.38). By Proposition 22.16(vii), for each $x\in M$,
$$H_{t+s}g(x) = H_tg(x) - s\left(\frac{|\nabla^-H_tg(x)|^2}{2} + o(1)\right), \tag{22.39}$$
and therefore $H_{t+s}g = H_tg + O(s)$, so
$$\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s} = O(1)\qquad\text{as }s\to0^+. \tag{22.40}$$
By (22.39), (22.40) and the dominated convergence theorem,
$$\lim_{s\to0^+}\int_M\frac{e^{KtH_{t+s}g}-e^{KtH_tg}}{s}\,d\nu = -\int_M Kt\,\frac{|\nabla^-H_tg|^2}{2}\,e^{KtH_tg}\,d\nu.$$
Collecting all the terms, we obtain
$$\varphi'(t) = \frac{1}{Kt^2\int_M e^{KtH_tg}\,d\nu}\left[\int_M(KtH_tg)\,e^{KtH_tg}\,d\nu - \Bigl(\int_M e^{KtH_tg}\,d\nu\Bigr)\log\int_M e^{KtH_tg}\,d\nu - \frac{1}{2K}\int_M\bigl|Kt\,\nabla^-H_tg\bigr|^2\,e^{KtH_tg}\,d\nu\right]. \tag{22.41}$$
$$\|h\|_{H^{-1}(\nu)} = \sup_{g\ne0}\frac{\langle h,g\rangle_{L^2(\nu)}}{\|\nabla g\|_{L^2(\nu)}} = \bigl\|\nabla(L^{-1}h)\bigr\|_{L^2(\nu)},$$
$$e^{KtH_th} = 1 + KtH_th + \frac{K^2t^2}{2}(H_th)^2 + O(t^3) \tag{22.43}$$
$$= 1 + KtH_th + \frac{K^2t^2}{2}h^2 + o(t^2).$$
So the right-hand side of (22.42) equals
$$\limsup_{t\to0^+}\int_M\frac{h-H_th}{t}\,d\nu - \frac K2\int_M h^2\,d\nu.$$
To close this section, I will show that the Talagrand inequality does
imply a logarithmic Sobolev inequality under strong enough curvature
assumptions.
(iii) There is a constant $C>0$ such that for any $N\in\mathbb N$ and any $f\in\mathrm{Lip}(\mathcal X^N,d_2)$ (resp. $\mathrm{Lip}(\mathcal X^N,d_2)\cap L^1(\nu^{\otimes N})$),
$$\nu^{\otimes N}\Bigl[x\in\mathcal X^N;\ f(x)\ge m+r\Bigr] \le e^{-\frac{Cr^2}{\|f\|_{\mathrm{Lip}}^2}},$$
$$\widehat\mu^N_x = \frac1N\sum_{i=1}^N\delta_{x_i}\in P(\mathcal X),$$
and let
$$f_N(x) = W_2\bigl(\widehat\mu^N_x,\,\nu\bigr).$$
In a compact notation, cqℓ (x, y) = min(d(x, y)2 , d(x, y)). The optimal
total cost associated with cqℓ will be denoted by Cqℓ .
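The quadratic-linear cost is easy to experiment with numerically. The following minimal sketch (not from the book; the supports and weights are made-up illustrative data) computes $C_{q\ell}$ between two small discrete measures on the real line by solving the Kantorovich linear program with scipy:

```python
# Optimal transport cost for c_ql(x,y) = min(d(x,y)^2, d(x,y)) between two
# discrete measures on R, via a plain linear program over transport plans.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 0.3, 2.0])      # support of mu
y = np.array([0.1, 1.5])           # support of nu
a = np.array([0.5, 0.3, 0.2])      # weights of mu
b = np.array([0.6, 0.4])           # weights of nu

d = np.abs(x[:, None] - y[None, :])
cost = np.minimum(d ** 2, d).ravel()           # c_ql, flattened row-major

m, n = len(x), len(y)
A_eq = []
for i in range(m):                              # row marginals: sum_j pi_ij = a_i
    row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1; A_eq.append(row)
for j in range(n):                              # column marginals: sum_i pi_ij = b_j
    col = np.zeros(m * n); col[j::n] = 1; A_eq.append(col)
b_eq = np.concatenate([a, b])

res = linprog(cost, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
print("C_ql(mu, nu) =", res.fun)
```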
Theorem 22.25 (Reformulations of Poincaré inequalities). Let
M be a Riemannian manifold equipped with a reference probability mea-
sure ν = e−V vol. Then the following statements are equivalent:
(i) ν satisfies a Poincaré inequality;
(ii) There are constants $c,K>0$ such that for any Lipschitz probability density $\rho$,
$$|\nabla\log\rho|\le c \ \Longrightarrow\ H_\nu(\mu) \le \frac{I_\nu(\mu)}{K},\qquad \mu=\rho\,\nu; \tag{22.48}$$
(iii) $\nu\in P_1(M)$ and there is a constant $C>0$ such that
$$\forall\mu\in P_1(M),\qquad C_{q\ell}(\mu,\nu)\le C\,H_\nu(\mu). \tag{22.49}$$
Remark 22.26. The equivalence between (i) and (ii) remains true
when the Riemannian manifold M is replaced by a general metric space.
On the other hand, the equivalence with (iii) uses at least a little bit
of the Riemannian structure (say, a local Poincaré inequality, a local
doubling property and a length property).
Remark 22.27. The equivalence between (i), (ii) and (iii) can be made
more precise. As the proof will show, if $\nu$ satisfies a Poincaré inequality
with constant $\lambda$, then for any $c < 2\sqrt\lambda$ there is an explicit constant
K = K(c) > 0 such that (22.48) holds true; and the K(c) converges to
λ as c → 0. Conversely, if for each c > 0 we call K(c) the best constant
in (22.48), then ν satisfies a Poincaré inequality with constant λ =
limc→0 K(c). Also, in (ii) ⇒ (iii) one can choose C = max (4/K, 2/c),
while in (iii) ⇒ (i) the Poincaré constant can be taken equal to C −1 .
Proof of Theorem 22.25. We shall start with the proof of (i) $\Rightarrow$ (ii). Let $f = \log\rho - \int(\log\rho)\,d\nu$; so $\int f\,d\nu = 0$ and the assumption in (ii) reads $|\nabla f|\le c$. Moreover, with $a = \int(\log\rho)\,d\nu$ and $X = \int e^f\,d\nu$,
$$I_\nu(\mu) = e^a\int|\nabla f|^2\,e^f\,d\nu;$$
$$H_\nu(\mu) = \int(f+a)\,e^{f+a}\,d\nu - \int e^{f+a}\,d\nu\,\log\int e^{f+a}\,d\nu$$
$$= e^a\left(\int f\,e^f\,d\nu - \int e^f\,d\nu + 1\right) - e^a\bigl(X\log X - X + 1\bigr) \le e^a\left(\int f\,e^f\,d\nu - \int e^f\,d\nu + 1\right).$$
So it is sufficient to prove
$$|\nabla f|\le c \ \Longrightarrow\ \int\bigl(f\,e^f - e^f + 1\bigr)\,d\nu \le \frac1K\int|\nabla f|^2\,e^f\,d\nu. \tag{22.53}$$
In the sequel, $c$ is any constant satisfying $0<c<2\sqrt\lambda$. Inequality (22.53) will be proven by two auxiliary inequalities:
$$\int f^2\,d\nu \le e^{c\sqrt{5/\lambda}}\int f^2\,e^{-|f|}\,d\nu; \tag{22.54}$$
$$\int f^2\,e^f\,d\nu \le \frac1\lambda\left(\frac{2\sqrt\lambda+c}{2\sqrt\lambda-c}\right)^2\int|\nabla f|^2\,e^f\,d\nu. \tag{22.55}$$
Note that the upper bound on |∇f | is crucial in both inequalities.
Once (22.54) and (22.55) are established, the result is immediately
obtained. Indeed, the right-hand side of (22.54) is obviously bounded by
the left-hand side of (22.55), so both expressions are bounded above by
R
a constant multiple of |∇f |2 ef dν. On the other hand, an elementary
study shows that
∀f ∈ R, f ef − ef + 1 ≤ max (f 2 , f 2 ef ),
(22.56)
R R
By the Poincaré
R 2 inequality,R f 2 dν ≤ (1/λ) |∇f |2 dν ≤ c2 /λ, which
implies ( f dν)2 ≤ (c2 /λ) f 2 dν. Also by the Poincaré inequality,
Z Z 2 Z
2 2 2
(f ) dν − f dν ≤ (1/λ) |∇(f 2 )|2 dν
Z Z
= (4/λ) f 2 |∇f |2 dν ≤ (4c2 /λ) f 2 dν.
in other words,
$$\int f^2\,d\nu \le \exp\!\left(\frac{\int|f|^3\,d\nu}{\int f^2\,d\nu}\right)\int f^2\,e^{-|f|}\,d\nu.$$
Since L∗ is quadratic on [0, 1], factors ε2 cancel out on both sides, and
we are back with the usual Poincaré inequality. ⊓
⊔
$$\forall r\ge0,\qquad \nu[A^r] \ge 1 - \frac{e^{-C\min(r,\,r^2)}}{\nu[A]}. \tag{22.61}$$
then µ⊗N will satisfy a Poincaré inequality with the same constant
as ν, and we may apply Theorem 22.30 to study concentration in
(M N , d2 , ν ⊗N ).
c(A, B) ≥ r 2 .
so z ∈ Ar;d2 . Similarly,
$$d_1(z,y) = \sum_{d(x_i,y_i)>1}d(x_i,y_i) \le \sum_i c_{q\ell}(x_i,y_i) = c(x,y) < r^2;$$
where Brd stands for the ball of center 0 and radius r in RN for the
distance d.
any Borel set in $\mathbb R^N$, and let $y\in T_N^{-1}(A) + B_r^{d_2} + B_{r^2}^{d_1}$. This means that there are $w$ and $x$ such that $T_N(w)\in A$, $|x-w|_2\le r$, $|y-x|_1\le r^2$. Then by (22.68),
$$|T_N(w)-T_N(y)|_2^2 = \sum_i|T(w_i)-T(y_i)|^2 \le C^2\sum_i\min\bigl(|w_i-y_i|,\ |w_i-y_i|^2\bigr)$$
$$\le C^2\Bigl(\sum_{|w_i-x_i|\ge|x_i-y_i|}2|w_i-x_i| + \sum_{|w_i-x_i|<|x_i-y_i|}4|x_i-y_i|^2\Bigr) \le 4C^2\Bigl(\sum_i|x_i-w_i| + \sum_i|x_i-y_i|^2\Bigr) \le 8\,C^2r^2;$$
so $T_N(y)\in A + B_{\sqrt8\,Cr}^{d_2}$. In summary, if $C'=\sqrt8\,C$, then
$$T_N\Bigl(T_N^{-1}(A) + B_r^{d_2} + B_{r^2}^{d_1}\Bigr) \subset A + B_{C'r}^{d_2}.$$
As a consequence, if $A\subset\mathbb R^N$ is any Borel set satisfying $\gamma^{\otimes N}[A]\ge1/2$, then $\nu^{\otimes N}[T_N^{-1}(A)] = \gamma^{\otimes N}[A]\ge1/2$, and
$$\gamma^{\otimes N}\bigl[A^{C'r}\bigr] \ge \gamma^{\otimes N}\Bigl[T_N\bigl(T_N^{-1}(A)+B_r^{d_2}+B_{r^2}^{d_1}\bigr)\Bigr] = \nu^{\otimes N}\Bigl[T_N^{-1}(A)+B_r^{d_2}+B_{r^2}^{d_1}\Bigr] \ge 1 - e^{-cr^2}$$
free form.
Remark 22.35. In certain situations, (22.67) provides sharper con-
centration properties for the Gaussian measure, than the usual Gaus-
sian concentration bounds. This might look paradoxical, but can be
Dimension-dependent inequalities
1
where Q = (α/ sin α)1− N . But this is immediate because Q is a sym-
metric function of x0 and x1 , and π has marginals µ = ρ ν and ν,
so
Z Z Z
Q(x0 , x1 ) dν(x0 ) = Q(x0 , x1 ) dν(x1 ) = Q(x0 , x1 ) dπ(x0 , x1 )
Z
= Q(x0 , x1 ) ρ(x0 ) dν(x0 ).
⊓
⊔
1
Proof of Corollary 22.39. Write again Q = (α/ sin α)1− N . The classical
′
Young inequality can be written ab ≤ ap /p+bp /p′ , where p′ = p/(p−1)
is the conjugate exponent to p; so
′
h i 1 − 1 − pp
1− 1 1 p −1 ρ NQ Q
N pρ Np = (N pρ) ρ− N Q Q p ≤ (N pρ) + .
p p′
⊓
⊔
Exercise 22.41. Use the inequalities proven in this section, and the
result of Exercise 22.20, to recover, at least formally, the inequality
$$\int h\,d\nu = 0 \ \Longrightarrow\ \frac{KN}{N-1}\,\|h\|^2_{H^{-1}(\nu)} \le \|h\|^2_{L^2(\nu)}$$
Remark 22.42. If one applies the same procedure to (22.71), one re-
covers a constant K(N p)/(N p − 1), which reduces to the correct con-
stant only in the limit p → 1. As for inequality (22.73), it leads to just
K (which would be the limit p → ∞).
Note that inequality (22.73) does not solve this problem, since by
Remark 22.42 it only implies the Poincaré inequality with constant K.
I shall conclude with a very loosely formulated open problem, which
might be nonsense:
Open Problem 22.45. In the Euclidean case, is there a particular
variant of the Talagrand inequality which takes advantage of the homo-
geneity under dilations, just as the usual Sobolev inequality in Rn ? Is
it useful?
Recap
In the end, the main results of this chapter can be summarized by just
a few diagrams:
• Relations between functional inequalities: By combining
Theorems 21.2, 22.17, 22.10 and elementary inequalities, one has
CD(K, ∞) =⇒ (LS) =⇒ (T2 ) =⇒ (P) =⇒ (exp1 )
⇓
(T1 ) ⇐⇒ (exp2 ) =⇒ (exp1 )
All these symbols designate properties of the reference measure ν: (LS)
stands for logarithmic Sobolev inequality, (P) for Poincaré inequality,
exp2 means that ν has a finite square-exponential moment, and exp1
that it has a finite exponential moment.
• Reformulations of Poincaré inequality: Theorem 22.25 can
be visualized as
(P) ⇐⇒ (LSLL) ⇐⇒ (Tqℓ )
where (LSLL) means logarithmic Sobolev for log-Lipschitz functions,
and (Tqℓ ) designates the transportation-cost inequality involving the
quadratic-linear cost.
• Concentration properties via functional inequalities: The
three main such results proven in this chapter are
(T1 ) ⇐⇒ Gaussian concentration (Theorem 22.10)
(T2 ) ⇐⇒ dimension free Gaussian concentration (Theorem 22.22)
(P ) ⇐⇒ dimension free exponential concentration (Theorem 22.32)
Proof of Theorem 22.46. First, note that the inverse L−1 of L is well-
defined R+ → R+ since L is strictly increasing and goes to +∞ at
infinity. Also L′ (∞) = limr→∞ (L(r)/r) is well-defined in (0, +∞]. Fur-
ther, note that
$$L^*(p) = \sup_{r\ge0}\bigl(p\,r - L(r)\bigr),$$
so
$$g(x) \ge H_sg(x) \ge g(x) + \inf_{d(x,y)\le R(g,s)}\Bigl[s\,L\Bigl(\frac{d(x,y)}{s}\Bigr) - L'(\infty)\,d(x,y)\Bigr] \ge g(x) + s\,L\Bigl(\frac{R(g,s)}{s}\Bigr) - L'(\infty)\,R(g,s), \tag{22.76}$$
s
where I have used the inequality p r ≤ L(r) + L∗ (p). Statement (v) fol-
lows at once from (22.77). Moreover, if L′ (∞) = +∞, then L∗ is contin-
uous on R+ , so by the definition of |∇− g| and the fact that R(g, s) → 0,
$$\limsup_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \le L^*\!\left(\limsup_{s\downarrow0}\ \sup_{d(x,y)\le R(g,s)}\frac{[g(y)-g(x)]^-}{d(x,y)}\right) = L^*\bigl(|\nabla^-g(x)|\bigr),$$
which proves (vi) in the case L′ (∞) = +∞.
If L′ (∞) < +∞, things are a bit more intricate. If kgkLip ≤ L′ (∞),
then of course |∇− g(x)| ≤ L′ (∞). I shall distinguish two situations:
• If $|\nabla^-g(x)| = L'(\infty)$, the same argument as before shows
$$\frac{g(x)-H_sg(x)}{s} \le L^*(\|g\|_{\mathrm{Lip}}) \le L^*(L'(\infty)) = L^*\bigl(|\nabla^-g(x)|\bigr).$$
• If $|\nabla^-g(x)| < L'(\infty)$, I claim that there is a function $\alpha=\alpha(s)$, depending on $x$, such that $\alpha(s)\to0$ as $s\to0$, and
$$H_sg(x) = \inf_{d(x,y)\le\alpha(s)}\Bigl[g(y) + s\,L\Bigl(\frac{d(x,y)}{s}\Bigr)\Bigr]. \tag{22.78}$$
If this is true then the same argument as in the case L′ (∞) = +∞
will work.
This is obvious if |∇− g(x)| = 0, so let us assume |∇− g(x)| > 0. (Note
that |∇− g(x)| < +∞ since g is Lipschitz.)
By the same computation as before,
$$\frac{g(x)-H_sg(x)}{s} = \frac1s\sup_{d(x,y)\le R(g,s)}\Bigl[g(x) - g(y) - s\,L\Bigl(\frac{d(x,y)}{s}\Bigr)\Bigr].$$
$$\frac{R(g,s)}{s} \longrightarrow L^{-1}(\infty) = +\infty.$$
Let
$$\psi(r) = \sup_{d(x,y)=r}\frac{g(x)-g(y)}{d(x,y)}.$$
If it can be shown that
$$\liminf_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \ge |\nabla^-g(x)|\,q - L(q) = L^*\bigl(|\nabla^-g(x)|\bigr).$$
If L′ (∞) < +∞ and |∇− g(x)| = L′ (∞), the above reasoning fails
because ∂L∗ (|∇− g(x)|) might be empty. However, for any θ < |∇− g(x)|
we may find $q\in\partial L^*(\theta)$; then the previous argument shows that
$$\liminf_{s\downarrow0}\frac{g(x)-H_sg(x)}{s} \ge L^*(\theta);$$
the conclusion is obtained by letting θ → |∇− g(x)| and using the lower
semicontinuity of L∗ .
So it all boils down to checking (22.82). This is where the semicon-
cavity of g will be useful. (Indeed (22.82) might fail for an arbitrary
Lipschitz function.) The problem can be rewritten
or equivalently,
So
$$d(x,y)=r' \ \Longrightarrow\ \psi(r) - \frac{g(x)-g(y)}{d(x,y)} \ge -C\,(r-r').$$
By taking the supremum over $y$, we conclude that
$$\psi(r) - \psi(r') \ge -C\,(r-r').$$
$$\limsup_{y\to x}\frac{[g(y)-g(x)]^-}{d(x,y)} < L.$$
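Because this appendix is entirely about the short-time behavior of the infimum-convolution semigroup, it can be illustrated numerically. The following sketch (not from the book; it assumes M = [0,1] with the usual distance, the quadratic Lagrangian L(r) = r²/2 — so L*(p) = p²/2 — and a particular Lipschitz g) checks that (g − Hₛg)/s approaches L*(|∇⁻g|) as s → 0:

```python
# Hopf-Lax / inf-convolution semigroup on a grid:
#   H_s g(x) = inf_y [ g(y) + s L(d(x,y)/s) ],  L(r) = r^2/2.
import numpy as np

xs = np.linspace(0.0, 1.0, 2001)
g = np.abs(xs - 0.5)                      # Lipschitz test function, |grad- g| = 1 off the minimum

def hopf_lax(g, xs, s):
    # brute-force infimum over the grid
    d = np.abs(xs[:, None] - xs[None, :])
    return np.min(g[None, :] + s * (d / s) ** 2 / 2.0, axis=1)

for s in [1e-1, 1e-2, 1e-3]:
    Hs = hopf_lax(g, xs, s)
    i = 250                               # a point left of the kink, descending slope 1
    print(f"s={s:.0e}   (g - H_s g)/s = {(g[i] - Hs[i]) / s:.4f}   L*(|grad- g|) = 0.5")
```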
Bibliographical notes
The proof of [127] was adapted by Lott and myself [579] to compact
length spaces (X , d) equipped with a reference measure ν that is locally
doubling and satisfies a local Poincaré inequality; see Theorem 30.28
in the last chapter of these notes. In fact the proof of Theorem 22.17,
as I have written it, is essentially a copy–paste from [579]. (A detailed
proof of Proposition 22.16 is also provided there.) Then Gozlan [429]
relaxed these assumptions even further.
If M is a compact Riemannian manifold, then the normalized vol-
ume measure on M satisfies a Talagrand (T2 ) inequality: This results
from the existence of a logarithmic Sobolev inequality [710] and The-
orem 22.17. Moreover, by [671, Theorem 4], the diameter of M can
be bounded in terms of the constant in the Talagrand inequality, the
dimension of M and a lower bound on the Ricci curvature, just as
in (21.21) (where now λ stands for the constant in the Talagrand in-
equality). (The same bound certainly holds true even if M is not a priori
assumed to be compact, but this was not explicitly checked in [671].)
There is an analogous result where Talagrand inequality is replaced by
logarithmic Sobolev inequality [544, 727].
The remarkable result according to which dimension free Gaussian
concentration bounds are equivalent to T2 inequality (Theorem 22.22)
is due to Gozlan [429]; the proof of (iii) ⇒ (i) in Theorem 22.22 is ex-
tracted from this paper. Gozlan’s argument relies on Sanov’s theorem in
large deviation theory [296, Theorem 6.2.10]; this classical result states
that the rate of deviation of the empirical measure of independent, iden-
tically distributed samples is the (Kullback) information with respect
to their common law; in other words, under adequate conditions,
$$-\frac1N\log\nu^{\otimes N}\bigl[\widehat\mu^N_x\in A\bigr] \simeq \inf\bigl\{H_\nu(\mu);\ \mu\in A\bigr\}.$$
A simple proof of the particular estimate (22.47) is provided in the
Appendix of [429]. The observation that Talagrand inequalities and
Sanov’s theorem match well goes back at least to [139]; but Gozlan’s
theorem uses this ingredient with a quite new twist.
Varadarajan’s theorem (law of large numbers for empirical mea-
sures) was already used in the proof of Theorem 5.10; it is proven for
instance in [318, Theorem 11.4.1]. It is anyway implied by Sanov’s the-
orem.
Theorem 22.10 shows that T1 is quite well understood, but many
questions remain open about the more interesting T2 inequality. One
of the most natural is the following: given a probability measure ν
R
satisfying T2, and a bounded function $v$, does $e^{-v}\,\nu/(\int e^{-v}\,d\nu)$ also
satisfy a T2 inequality? For the moment, the only partial result in this
direction is (22.29). This formula was first established by Blower [122]
and later recovered with simpler methods by Bolley and myself [140].
If one considers probability measures of the form e−V (x) dx with
V (x) behaving like |x|β for large |x|, then the critical exponents for
concentration-type inequalities are the same as we already discussed for
isoperimetric-type inequalities: If β ≥ 2 there is the T2 inequality, while
for β = 1 there is the transport inequality with linear-quadratic cost
function. What happens for intermediate values of β has been investi-
gated by Gentil, Guillin and Miclo in [410], by means of modified log-
arithmic Sobolev inequalities in the style of Bobkov and Ledoux [130].
Exponents β > 2 have also been considered in [131].
It was shown in [671] that (Talagrand) ⇒ (log Sobolev) in Rn , if
the reference measure ν is log concave (with respect to the Lebesgue
measure). It was natural to conjecture that the same argument would
work under an assumption of nonnegative curvature (say CD(0, ∞));
Theorem 22.21 shows that such is indeed the case.
It is only recently that Cattiaux and Guillin [219] produced a coun-
terexample on the real line, showing that the T2 inequality does not
necessarily imply a log Sobolev inequality. Their counterexample takes
the form dν = e−V dx, where V oscillates rather wildly at infinity,
in particular V ′′ is not bounded below. More precisely, their potential
looks like V (x) = |x|3 + 3x2 sin2 x + |x|β as x → +∞; then ν satisfies a
logarithmic Sobolev inequality only if β ≥ 5/2, but a T2 inequality as
soon as β > 2. Counterexamples with V ′′ bounded below have still not
yet been found.
Even more recently, Gozlan [425, 426, 428] exhibited a characteriza-
tion of T2 and other transport inequalities on R, for certain classes of
measures. He even identified situations where it is useful to deduce log-
arithmic Sobolev inequalities from T2 inequalities. Gentil, Guillin and
Miclo [411] considered transport inequalities on R for log-concave prob-
ability measures. This is a rather active area of research. For instance,
consider a transport inequality of the form C(µ, ν) ≤ Hν (µ), where the
cost function is c(x, y) = θ(a |x − y|), a > 0, and θ : R+ → R+ is convex
with θ(r) = r 2 for 0 ≤ r ≤ 1. If ν(dx) = e−V dx with V ′′ = o(V ′2 ) at
infinity and lim supx→+∞ θ ′ (λ x)/V ′ (x) < +∞ for some λ > 0, then
there exists a > 0 such that the inequality holds true.
$$\nu^{\otimes N}\Bigl[A + 6\sqrt r\,B_1^{d_2} + 9r\,B_1^{d_1}\Bigr] \ge 1 - \frac{e^{-r}}{\nu^{\otimes N}[A]}.$$
where (Xs )s≥0 is the symmetric diffusion process with invariant mea-
sure ν, µ = law (X0 ), and ϕ is an arbitrary Lipschitz function. (Com-
pare with Theorem 22.10(v).)
1
A related remark, which I learnt from Ben Arous, is that the logarithmic Sobolev
inequality compares the rate functions of two large deviation principles, one for
the empirical measure of independent samples and the other one for the empirical
time-averages.
23
Gradient flows I
Around the end of the nineties, Jordan, Kinderlehrer and Otto made
the important discovery that a number of well-known partial differen-
tial equations can be reformulated as gradient flows in the Wasserstein
space. The most emblematic example is that of the heat equation,
∂t ρ = ∆ρ,
(It is not a priori assumed that o(ε) is uniform in w.) Proposition 16.2
will also be useful.
(v) For any y ∈ M , and any geodesic (γs )0≤s≤1 joining γ0 = X(t)
to γ1 = y,
$$\frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(y) - \Phi(X(t)) - \int_0^1\Lambda(\gamma_s,\dot\gamma_s)\,(1-s)\,ds;$$
(vi) For any y ∈ M , and any geodesic (γs )0≤s≤1 joining γ0 = X(t)
to γ1 = y,
$$\frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(y) - \Phi(X(t)) - \lambda[\gamma]\,\frac{d(X(t),y)^2}{2},$$
Remark 23.2. As the proof will show, the equivalence between (iii),
(iv), (v) and (vi) does not require the differentiability of Φ; it is sufficient
that Φ be valued in R ∪ {+∞} and Φ(X(t)) < +∞.
Finally, Properties (iv) to (vi) are quite handy to study gradient flows
in abstract metric spaces. As a matter of fact, in the sequel I shall use
(iv) to define gradient flows in the Wasserstein space.
$$-\frac{d}{dt}\Phi(X(t)) = \bigl\langle-\nabla\Phi(X(t)),\,\dot X(t)\bigr\rangle \le \bigl|\nabla\Phi(X(t))\bigr|\,\bigl|\dot X(t)\bigr| \le \frac{|\nabla\Phi(X(t))|^2 + |\dot X(t)|^2}{2},$$
with equality if and only if ∇Φ(X(t)) and Ẋ(t) have the same norm
and opposite directions. So (i) is equivalent to (ii).
Next, if Φ is differentiable then
So (v) implies
$$\bigl\langle-\varepsilon w,\,\dot X(t)\bigr\rangle = \frac{d^+}{dt}\frac{d(X(t),y)^2}{2} \le \Phi(\exp_{X(t)}\varepsilon w) - \Phi(X(t)) - \lambda[\gamma]\,\frac{d(X(t),y)^2}{2} = \Phi(\exp_{X(t)}\varepsilon w) - \Phi(X(t)) - \lambda[\gamma]\,\frac{\varepsilon^2|w|^2}{2}.$$
As a consequence,
$$\Phi(\exp_{X(t)}\varepsilon w) \ge \Phi(X(t)) + \varepsilon\,\bigl\langle w,\,-\dot X(t)\bigr\rangle + o(\varepsilon),$$
The proof is the same as for the implication (iv) ⇒ (v) in Proposi-
tion 23.1. I could have used Inequality (23.2) to define gradient flows
in metric spaces, at least for λ-convex functions; but Definition 23.7 is
more general.
Remark 23.10. Recall that Theorem 10.41 gives a list of a few conditions under which the approximate gradient $\widetilde\nabla$ can be replaced by the usual gradient $\nabla$ in the formulas above.
µs = (Tt→s )# µt .
The idea is to compose the transport Tt→s with some optimal transport;
this will not result in an optimal transport, but at least it will provide
bounds on the Wasserstein distance.
In other words, µt = law (γt ), where γt is a random solution of
γ̇t = ξt (γt ). Restricting the time-interval slightly if necessary, we may
assume that these curves are defined on the closed interval [t1 , t2 ]. Each
of these curves is continuous and thereforeR bounded.
If γ solves γ̇t = ξt (γt ), then d(γs , γt ) ≤ |ξτ (γτ )| dτ . Since (γs , γt ) is
a coupling of (µs , µt ), it follows from the very definition of W2 that
$$W_2(\mu_s,\mu_t) \le \sqrt{\mathbb E\left(\int_s^t|\xi_\tau(\gamma_\tau)|\,d\tau\right)^2} \le \sqrt{\mathbb E\Bigl[|s-t|\int_s^t|\xi_\tau(\gamma_\tau)|^2\,d\tau\Bigr]}$$
$$= \sqrt{|s-t|}\,\sqrt{\mathbb E\int_s^t|\xi_\tau(\gamma_\tau)|^2\,d\tau} = \sqrt{|s-t|}\,\sqrt{\int_s^t\!\!\int|\xi_\tau|^2\,d\mu_\tau\,d\tau} \le \frac12\left[|s-t| + \int_s^t\!\!\int|\xi_\tau|^2\,d\mu_\tau\,d\tau\right].$$
The maps $\exp(\widetilde\nabla\psi)$ and $\exp(\widetilde\nabla\widehat\psi)$ are inverse to each other in the almost sure sense. So for $\sigma(dx)$-almost all $x$, there is a minimizing geodesic connecting $T(x)$ to $x$ with initial velocity $\widetilde\nabla\psi(T(x))$; then by the formula of first variation,
$$\limsup_{s\downarrow0}\left[\frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr)^2 - d(x,T(x))^2}{2s}\right] \le -\bigl\langle\xi_t(T(x)),\,\widetilde\nabla\psi(T(x))\bigr\rangle.$$
So we should check that we can indeed pass to the lim sup in (23.9).
Let $v(s,x)$ be the integrand in the right-hand side of (23.9): If $0<s\le1$ then
$$v(s,x) = \frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr)^2 - d(x,T(x))^2}{2s} \le \frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr) - d(x,T(x))}{s}\cdot\frac{d\bigl(x,\,T_{t\to t+s}\circ T(x)\bigr) + d(x,T(x))}{2}$$
$$\le \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)}{s}\left(d(x,T(x)) + \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)}{2}\right) \le \frac{d(x,T(x))^2}{2} + \frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)^2}{s^2} =: w(s,x).$$
Note that $x\mapsto d(x,T(x))^2\in L^1(\sigma)$, since
$$\int d(x,T(x))^2\,d\sigma(x) = W_2(\sigma,\mu_t)^2 < +\infty. \tag{23.10}$$
Moreover,
$$\int\frac{d\bigl(T(x),\,T_{t\to t+s}(T(x))\bigr)^2}{s^2}\,d\sigma(x) = \int\frac{d\bigl(y,\,T_{t\to t+s}(y)\bigr)^2}{s^2}\,d\mu_t(y) \le \frac{1}{s^2}\int\left(\int_0^s\bigl|\xi_{t+\tau}(T_{t\to t+\tau}(y))\bigr|\,d\tau\right)^2 d\mu_t(y)$$
$$\le \frac1s\int\int_0^s\bigl|\xi_{t+\tau}(T_{t\to t+\tau}(y))\bigr|^2\,d\tau\,d\mu_t(y) = \frac1s\int_0^s\!\!\int\bigl|\xi_{t+\tau}(z)\bigr|^2\,d\mu_{t+\tau}(z)\,d\tau.$$
By assumption, the latter quantity converges as $s\to0$ to
$$\int|\xi_t(x)|^2\,d\mu_t(x) = \int|\xi_t(T(x))|^2\,d\sigma(x).$$
Since d(T (x), Tt→t+s (T (x)))2 /s2 −→ |ξt (T (x))|2 as s → 0, we can com-
bine this with (23.10) to deduce that
Z Z
lim sup w(s, x) dσ(x) ≤ lim w(s, x) dσ(x).
s↓0 s↓0
Under this assumption I shall show, with the same notation as in Step 1,
that t → W2 (µt , σ)2 is differentiable on the whole of (t1 , t2 ), and
$$\frac{d}{dt}\frac{W_2(\mu_t,\sigma)^2}{2} = -\int_M\bigl\langle\widetilde\nabla\psi_t,\,\xi_t\bigr\rangle\,d\mu_t.$$
dt 2 M
I shall start with some estimates on the flow Tt→s . From the as-
sumptions,
$$\frac{d}{ds}\,d\bigl(z,\,T_{t\to s}(x)\bigr) \le \bigl|\xi_s(T_{t\to s}(x))\bigr| \le C\bigl(1 + d(z,\,T_{t\to s}(x))\bigr),$$
In the sequel, the symbol C will stand for other constants that may
depend only on τ and the Lipschitz constant of ξ.
$$\le C\int(1+d(z,x))^2\,\mu_{t_0}(dx) < +\infty. \tag{23.12}$$
$$\le C\,|t-s|^2\int(1+d(z,x))^2\,\mu_t(dx);$$
$$W_2(\mu_t,\mu_s) \le C\,|t-s|.$$
For each s > 0, let T (s) be the optimal transport between σ and
µt+s . As s ↓ 0 we can extract a subsequence sk → 0, such that
$$\liminf_{s\downarrow0}\frac{W_2(\mu_{t+s},\sigma)^2 - W_2(\mu_t,\sigma)^2}{2s} = \lim_{k\to\infty}\frac{W_2(\mu_{t+s_k},\sigma)^2 - W_2(\mu_t,\sigma)^2}{2s_k}.$$
Changing signs and reasoning as in Step 1, we obtain
$$\limsup_{s\downarrow0}\frac{W_2(\mu_t,\sigma)^2 - W_2(\mu_{t+s},\sigma)^2}{2s} \le \limsup_{k\to\infty}\int v_k(x)\,\sigma(dx), \tag{23.14}$$
where
$$v_k(x) = \frac{d\bigl(x,\,T_{t+s_k\to t}\circ T^{(t+s_k)}(x)\bigr)^2 - d\bigl(x,\,T^{(t+s_k)}(x)\bigr)^2}{2s_k}.$$
Let us bound the functions $v_k$. For each $x$ and $k$, we can use (23.11) to derive
$$d\bigl(x,\,T_{t+s_k\to t}\circ T^{(t+s_k)}(x)\bigr)^2 - d\bigl(x,\,T^{(t+s_k)}(x)\bigr)^2 \le \Bigl[d\bigl(x,\,T_{t+s_k\to t}(T^{(t+s_k)}(x))\bigr) + d\bigl(x,\,T^{(t+s_k)}(x)\bigr)\Bigr]\,d\bigl(T^{(t+s_k)}(x),\,T_{t+s_k\to t}(T^{(t+s_k)}(x))\bigr)$$
$$\le C\Bigl(1 + d(z,x) + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)\Bigr)\,s_k\,d\bigl(z,\,T^{(t+s_k)}(x)\bigr) \le C\,s_k\Bigl(1 + d(z,x)^2 + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)^2\Bigr).$$
So
$$v_k(x) \le C\Bigl(1 + d(z,x)^2 + d\bigl(z,\,T^{(t+s_k)}(x)\bigr)^2\Bigr).$$
$$W_2(\mu_s,\widehat\mu_t)^2 - W_2(\mu_{s'},\widehat\mu_t)^2 \le C\,|s-s'|$$
$$\mu_{t,k} = (e_t)_\#\Pi_k,\qquad \Pi_k(d\gamma) = \frac{1_{\gamma\in A_k}\,\Pi(d\gamma)}{\Pi[A_k]},$$
$$\frac{\partial\mu_{t,k}}{\partial t} + \nabla\cdot(\xi_t\,\mu_{t,k}) = 0. \tag{23.18}$$
But by definition µt,k is concentrated on the ball B[z, k], so in (23.18)
we may replace ξt by ξt,k = ξχk , where χk is a smooth cutoff function,
0 ≤ χk ≤ 1, χk = 1 on B[z, k], χk = 0 outside of B[z, 2k].
Let $\widehat A_k$, $\widehat Z_k$, $\widehat\mu_{t,k}$ and $\widehat\xi_{t,k}$ be defined similarly in terms of $\widehat\xi$ and $\widehat\mu_t$.
b
Since ξt,k and ξt,k are compactly supported, we may apply the result
of Step 3: for all t ∈ (t1 , t2 ),
$$\frac{d}{dt}\frac{W_2(\mu_{t,k},\widehat\mu_{t,k})^2}{2} = -\int\bigl\langle\widetilde\nabla\psi_{t,k},\,\xi_{t,k}\bigr\rangle\,d\mu_{t,k} - \int\bigl\langle\widetilde\nabla\widehat\psi_{t,k},\,\widehat\xi_{t,k}\bigr\rangle\,d\widehat\mu_{t,k}, \tag{23.19}$$
where $\exp(\widetilde\nabla\psi_{t,k})$ and $\exp(\widetilde\nabla\widehat\psi_{t,k})$ are the optimal transports $\mu_{t,k}\to\widehat\mu_{t,k}$ and $\widehat\mu_{t,k}\to\mu_{t,k}$.
Since $t\to\mu_{t,k}$ and $t\to\widehat\mu_{t,k}$ are Lipschitz paths, $W_2(\mu_{t,k},\widehat\mu_{t,k})$ is also a Lipschitz function of $t$, so (23.19) integrates up to
$$\frac{W_2(\mu_{t,k},\widehat\mu_{t,k})^2}{2} = \frac{W_2(\mu_{0,k},\widehat\mu_{0,k})^2}{2} - \int_0^t\left[\int\bigl\langle\widetilde\nabla\psi_{s,k},\,\xi_{s,k}\bigr\rangle\,d\mu_{s,k} + \int\bigl\langle\widetilde\nabla\widehat\psi_{s,k},\,\widehat\xi_{s,k}\bigr\rangle\,d\widehat\mu_{s,k}\right]ds. \tag{23.20}$$
$$\frac{W_2(\mu_t,\widehat\mu_t)^2}{2} = \frac{W_2(\mu_0,\widehat\mu_0)^2}{2} - \int_0^t\left[\int\bigl\langle\widetilde\nabla\psi_s,\,\xi_s\bigr\rangle\,d\mu_s + \int\bigl\langle\widetilde\nabla\widehat\psi_{s,k},\,\widehat\xi_{s,k}\bigr\rangle\,d\widehat\mu_{s,k}\right]ds, \tag{23.21}$$
Then
$$\liminf_{t\downarrow0}\frac{U_\nu(\mu_t)-U_\nu(\mu)}{t} \ge \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu; \tag{23.25}$$
and
$$U_\nu(\sigma) \ge U_\nu(\mu) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu + K_{N,U}\int_0^1\!\!\int|\widetilde\nabla\psi_t(x)|^2\,\rho_t(x)^{1-\frac1N}\,\nu(dx)\,(1-t)\,dt. \tag{23.26}$$
so
$$\mathcal J_{0\to t}(x) = e^{-[V(\exp_x t\nabla\psi(x)) - V(x)]}\,\det\bigl(d_x\exp(t\nabla\psi)\bigr) = \bigl(1 - t\,\nabla V(x)\cdot\nabla\psi(x)\bigr)\bigl(1 + t\,\Delta\psi(x)\bigr) + o(t) = 1 + t\,(L\psi)(x) + o(t). \tag{23.32}$$
$$\frac{d^2u(t,x)}{dt^2} \ge K_{N,U}\,\rho_t\bigl(\exp_x t\nabla\psi(x)\bigr)^{-\frac1N}\,\bigl|\nabla\psi_t\bigl(\exp_x t\nabla\psi(x)\bigr)\bigr|^2,$$
and by assumption $K_{N,U}$ is finite. Note that $|\nabla\psi_t(\exp_x t\nabla\psi(x))| = d(x,T(x))$, which is bounded ($\mu$-almost surely) by the maximum distance between points in the support of $\mu$ and points in the support of $\nu$. So there is a positive constant $C$ such that
$$\frac{d^2u(t,x)}{dt^2} \ge -C\,\rho_t\bigl(\exp_x t\nabla\psi(x)\bigr)^{-\frac1N}. \tag{23.34}$$
Let
$$R(t,x) = C\int_0^t\!\!\int_0^s\rho_\tau\bigl(\exp_x\tau\nabla\psi(x)\bigr)^{-\frac1N}\,d\tau\,ds; \tag{23.35}$$
$$\widetilde u(t,x) = u(t,x) + R(t,x);\qquad \widetilde w(t,x) = \frac{\widetilde u(t,x)-\widetilde u(0,x)}{t}.$$
$$\exp_x\bigl(\widetilde\nabla\psi(x)\bigr) = \exp_x\bigl(\nabla\psi(x)\bigr),\qquad \mu(dx)\text{-almost surely}. \tag{23.39}$$
Also recall from Remark 10.32 that $\mu$-almost surely, $x$ and $T(x) = \exp_x(\nabla\psi(x))$ are joined by a unique geodesic. So (23.39) implies
$$\widetilde\nabla\psi(x) = \nabla\psi(x),\qquad \mu(dx)\text{-almost surely}.$$
$$\frac{U_\nu(\mu_t)-U_\nu(\mu_0)}{t} \le U_\nu(\mu_1) - U_\nu(\mu_0) - K_{N,U}\int_0^1\!\!\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx)\,\frac{G(s,t)}{t}\,ds. \tag{23.48}$$
0 t
Then we can use Steps 1 and 2 to pass to the lim inf in the left-hand
side.
As for the right-hand side of (23.48), we note that if B is a large
ball containing the supports of all ρs (0 ≤ s ≤ 1), and D = diam (B),
then
$$\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx) \le D^2\int\rho_s^{1-\frac1N}\,d\nu \le D^2\,\nu[B]^{\frac1N} < +\infty.$$
(Here I have used Jensen’s inequality again as in Step 1.) So the quan-
tity inside brackets in the right-hand side of (23.48) is uniformly (in t)
bounded. Since G(s, t)/t converges to 1 − s in L1 (ds) as t → 0, we can
pass to the limit in the right-hand side of (23.48) too. The final result
is
$$\int\bigl\langle\nabla\psi,\,\nabla p(\rho)\bigr\rangle\,d\nu \le U_\nu(\mu_1) - U_\nu(\mu_0) - K_{N,U}\int_0^1\!\!\int\rho_s(x)^{1-\frac1N}\,|\nabla\psi_s(x)|^2\,\nu(dx)\,(1-s)\,ds,$$
• If $\rho_0\ge A$ and $\chi_k\rho_0\le B$, then $\chi_k\,p'(\chi_k\rho_0) \le C\,\chi_k^{1/N} \le C\,B^{1/N}\rho_0^{-1/N}$, so
$$|\widetilde\nabla\psi|\,\chi_k\,p'(\chi_k\rho_0)\,|\nabla\rho_0| \le C\,|\widetilde\nabla\psi|\,\rho_0^{-\frac1N}\,|\nabla\rho_0| \le C\,|\widetilde\nabla\psi|\,p'(\rho_0)\,|\nabla\rho_0|.$$
To summarize: In all cases, $|\widetilde\nabla\psi|\,\chi_k\,p'(\chi_k\rho_0)\,|\nabla\rho_0|$ is bounded by $C\,|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|$, and the latter quantity is integrable since
$$\int|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|\,d\nu \le \sqrt{\int\rho_0\,|\widetilde\nabla\psi|^2\,d\nu}\;\sqrt{\int\frac{|\nabla p(\rho_0)|^2}{\rho_0}\,d\nu} = W_2(\mu_0,\mu_1)\,\sqrt{I_{U,\nu}(\mu_0)}. \tag{23.55}$$
So we can pass to the limit in the third term of (23.49), and the proof
is complete.
Case 2: Spt(µ1 ) is not compact. In this case we shall definitely
not use the standard approximation scheme by restriction, but instead
a more classical procedure of smooth truncation.
Again let χ : R+ → R+ be a smooth nondecreasing function with
0 ≤ χ ≤ 1, χ(r) = 1 for r ≤ 1, χ(r) = 0 for r ≥ 2; now we require in
addition that (χ′ )2 /χ is bounded. (It is easy to construct such a cutoff
function rather explicitly.) Then we define $\chi_k(x) = \chi(d(z,x)/k)$, where $z$ is an arbitrary point in $M$. For $k$ large enough, $Z_{1,k} := \int\chi_k\,d\mu_1 \ge 1/2$. Then we choose $\ell=\ell(k)$ large enough that $Z_{0,k} := \int\chi_\ell\,d\mu_0$ is
larger than Z1,k ; this is possible since Z1,k < 1 (otherwise µ1 would be
compactly supported). Then we let
χℓ(k) µ0 χk µ 1
µ0,k = ; µ1,k = .
Z0,k Z1,k
For each k, these are two compactly supported, absolutely continuous
probability measures; let (µt,k )0≤t≤1 be the displacement interpolation
joining them, and let ρt,k be the density of µt,k . Further, let ψk be a
d2 /2-convex function so that exp(∇ψk ) is the optimal (Monge) trans-
port µ0 and µ1 , and let ψt,k be deduced from ψk by the Hamilton–Jacobi
forward semigroup.
Note carefully: It is obvious from the construction that Z0,k ρ0,k ↑ ρ0 ,
Z1,k ρ1,k ↑ ρ1 , but there is a priori no monotone convergence relating ρt,k
to ρt ! Instead, we have the following information. Since µ0,k → µ0 and
µ1,k → µ1 , Corollary 7.22 shows that the geodesic curves (µt,k )0≤t≤1
converge, up to extraction of a subsequence, to some geodesic curve
(µt,∞ )0≤t≤1 joining µ0 to µ1 . (The convergence holds true for all t.)
Since (µt ) is the unique such curve, actually µt,∞ = µt , which shows
that µt,k converges weakly to µt for all t ∈ [0, 1].
For each $k$, we can apply the results of Step 4 with $U$ replaced by $U_k = U(Z_{1,k}\,\cdot\,)$ and $\mu_t$ replaced by $\mu_{t,k}$:
$$U_{k,\nu}(\mu_{1,k}) \ge U_{k,\nu}(\mu_{0,k}) + \int\bigl\langle\nabla\psi_k,\,\nabla p_k(\rho_{0,k})\bigr\rangle\,d\nu + K_{N,U_k}\int_0^1\!\!\int_M\rho_{t,k}(x)^{1-\frac1N}\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx)\,(1-t)\,dt, \tag{23.56}$$
where pk (r) = r Uk′ (r) − Uk (r). The problem is to pass to the limit as
k → ∞. We shall consider all four terms in (23.56) separately, and use
a few results which will be proven later on in Part III of this course (in
a more general context).
First term of (23.56): Since $U_{k,\nu}(\mu_{1,k}) = \int U(Z_{1,k}\,\rho_{1,k})\,d\nu$ and
Z1,k ρ1,k = χk ρ1 converges monotonically to ρ1 , the same arguments
as in the proof of Theorem 17.15 apply to show that
(23.62)
Observe that:
R
(a) $\int\rho_{0,k}\,|\nabla\psi_k|^2\,d\nu = W_2(\mu_{0,k},\mu_{1,k})^2$. To prove that this converges to $W_2(\mu_0,\mu_1)^2$, by Corollary 6.11 it suffices to check that
$$W_2(\mu_{0,k},\mu_0)\longrightarrow0;\qquad W_2(\mu_{1,k},\mu_1)\longrightarrow0;$$
Plugging back the results of (a), (b) and (c) into (23.62), we obtain
$$\int\rho_{0,k}\,\bigl|\nabla\psi_k - \widetilde\nabla\psi\bigr|^2\,d\nu \xrightarrow[k\to\infty]{} 0. \tag{23.63}$$
I claim that
$$\int_0^1\!\!\int\rho_t(x)^{1-\frac1N}\,|\widetilde\nabla\psi_t(x)|^2\,\nu(dx)\,(1-t)\,dt \ge \limsup_{k\to\infty}\int_0^1\!\!\int\rho_{t,k}(x)^{1-\frac1N}\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx)\,(1-t)\,dt. \tag{23.65}$$
and this quantity is positive if δ is chosen small enough and then C large
enough. Then the event E has zero Πk ⊗ Πk measure since (e0 , e1 )# Πk
is cyclically monotone (Theorems 5.10 and 7.21). So the left-hand side
in (23.70) vanishes, and (23.67) is true. A similar result holds for µt,k
and ∇ψt,k replaced by µt and ∇ψ e t , respectively.
Let $Z_k = \int\chi_\ell\,d\mu_{t,k}$ (which is positive if $\ell$ is large enough), and let $\widehat\Pi_k$ be the probability measure on geodesics defined by
$$\widehat\Pi_k(d\gamma) = \frac{\chi_\ell(\gamma_t)\,\Pi_k(d\gamma)}{Z_k}.$$
On the other hand, $\int\rho_{t,k}^{1-1/N}\,d\nu$ is bounded independently of $k$ (by Theorem 17.8; the moment condition used there is weaker than the one which is presently enforced). This and (23.71) imply
$$\int\chi_\ell(x)\,\rho_{t,k}(x)^{1-\frac1N}\,|w_k(x)-w(x)|\,\nu(dx) \xrightarrow[k\to\infty]{} 0. \tag{23.72}$$
(see Theorem 29.20 later in Chapter 29 and change signs; note that
χℓ w ν is a compactly supported measure).
Combining (23.72) and (23.73) yields
$$\limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,|\nabla\psi_{t,k}(x)|^2\,\nu(dx) = \limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,w_k(x)\,\nu(dx)$$
$$= \limsup_{k\to\infty}\int\rho_{t,k}(x)^{1-\frac1N}\,\chi_\ell(x)\,w(x)\,\nu(dx) \le \int\rho_t(x)^{1-\frac1N}\,\chi_\ell(x)\,w(x)\,\nu(dx) = \int\rho_t(x)^{1-\frac1N}\,\chi_\ell(x)\,|\widetilde\nabla\psi_t(x)|^2\,\nu(dx).$$
This completes the proof of (23.66). Then we can at last pass to the
lim sup in the fourth term of (23.56).
Let us recapitulate: In this step we have shown that
• if N = ∞, then
$$U_\nu(\mu_1) \ge U_\nu(\mu_0) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu + K_{\infty,U}\,\frac{W_2(\mu_0,\mu_1)^2}{2};$$
• if $N<\infty$ (or $N=\infty$) and $K\ge0$, then
$$U_\nu(\mu_1) \ge U_\nu(\mu_0) + \int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu;$$
$$1_{a\le\rho_0\le b}\,\nabla p_\ell(\rho_0) = p'_\ell(\rho_0)\,1_{a\le\rho_0\le b}\,\nabla\rho_0 \xrightarrow[\ell\to\infty]{} 1_{a\le\rho_0\le b}\,\nabla p(\rho_0).$$
This proves that ∇pℓ (ρ0 ) converges almost surely to ∇p(ρ0 ) on each
set {r0 < a ≤ ρ0 ≤ b}, and therefore on the whole of {ρ0 > r0 }. On the
other hand, if ρ0 ≤ r0 then p(ρ0 ) = 0, so ∇p(ρ0 ) vanishes almost surely
on {ρ0 ≤ r0 } (this is a well-known theorem from distribution theory;
see the bibliographical notes in case of need), and also ∇pℓ (ρ0 ) = 0 on
that set. This proves the almost everywhere convergence of ∇pℓ (ρ0 ) to
∇p(ρ0 ). At the same time, this reasoning proves (23.74).
So to pass to the limit in (23.75) it suffices to prove that the inte-
grand is dominated by an integrable function. But
$$\bigl\langle\widetilde\nabla\psi,\,\nabla p_\ell(\rho_0)\bigr\rangle \le |\widetilde\nabla\psi|\,|\nabla p_\ell(\rho_0)| \le C\,|\widetilde\nabla\psi|\,|\nabla p(\rho_0)|$$
The first part of the proof of Proposition 17.24(ii) shows that the ex-
pression inside brackets is uniformly bounded as soon as, say, t ≤ 1/2.
So
$$U_\nu(\mu_t) \ge U_\nu(\mu_0) + t\int\bigl\langle\widetilde\nabla\psi,\,\nabla p(\rho_0)\bigr\rangle\,d\nu - O(t^2)\qquad\text{as }t\to0,$$
$$\frac{\partial\rho_t}{\partial t} = L\,p(\rho_t), \tag{23.79}$$
and let $\mu_t = \rho_t\nu$. Assume that $U_\nu(\mu_t)<+\infty$ for all $t>0$; and that for all $0<t_1<t_2$,
$$\int_{t_1}^{t_2} I_{U,\nu}(\mu_t)\,dt < +\infty.$$
Stability
$$W_2(\mu_t,\widehat\mu_t) \le e^{-\lambda t}\,W_2(\mu_0,\widehat\mu_0).$$
$$\int\bigl\langle\widetilde\nabla\psi_t,\,\nabla U'(\rho_t)\bigr\rangle\,d\mu_t \le U_\nu(\widehat\mu_t) - U_\nu(\mu_t) - \lambda\,\frac{W_2(\mu_t,\widehat\mu_t)^2}{2}. \tag{23.83}$$
Similarly,
$$\int\bigl\langle\widetilde\nabla\widehat\psi_t,\,\nabla U'(\widehat\rho_t)\bigr\rangle\,d\widehat\mu_t \le U_\nu(\mu_t) - U_\nu(\widehat\mu_t) - \lambda\,\frac{W_2(\widehat\mu_t,\mu_t)^2}{2}. \tag{23.84}$$
This last section evokes some key issues which I shall not develop,
although they are closely related to the material in the rest of this
chapter.
There is a general theory of gradient flows in metric spaces, based
for instance on Definition 23.7, or other variants appearing in Propo-
sition 23.1. Motivations for these developments come from both pure
and applied mathematics. This theory was pushed to a high degree
of sophistication by many researchers, in particular De Giorgi and his
school. A key role is played by discrete-time approximation schemes,
the simplest of which can be stated as follows:
1. Choose your initial datum X0 ;
2. Choose a time step τ , which in the end will decrease to 0;
3. Let $X_1^{(\tau)}$ be a minimizer of $X\longmapsto \Phi(X) + \dfrac{d(X_0,X)^2}{2\tau}$; then define inductively $X_{k+1}^{(\tau)}$ as a minimizer of $X\longmapsto \Phi(X) + \dfrac{d(X_k^{(\tau)},X)^2}{2\tau}$.
4. Pass to the limit in $X_k^{(\tau)}$ as $\tau\to0$, $k\tau\to t$, hopefully recovering a function $X(t)$ which is the value of the gradient flow at time $t$ (a minimal finite-dimensional sketch of this scheme is given right after this list).
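Here is a minimal numerical sketch of steps 1–4 above. It assumes, purely for illustration, that the metric space is just ℝ² with a smooth convex Φ; the genuinely interesting case, where the metric space is the Wasserstein space, requires much more work and is not attempted here.

```python
# Minimizing-movement (proximal) scheme for a gradient flow in R^2.
import numpy as np
from scipy.optimize import minimize

def phi(x):
    # example potential: a quadratic, whose gradient flow is x'(t) = -A x
    A = np.diag([1.0, 4.0])
    return 0.5 * x @ A @ x

def step(x_prev, tau):
    # one step of the scheme: minimize Phi(X) + |X - X_prev|^2 / (2 tau)
    obj = lambda x: phi(x) + np.sum((x - x_prev) ** 2) / (2.0 * tau)
    return minimize(obj, x_prev).x

def discrete_flow(x0, tau, n_steps):
    xs = [np.array(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(step(xs[-1], tau))
    return np.array(xs)

if __name__ == "__main__":
    # compare with the exact gradient flow x(t) = exp(-A t) x0 at t = 1
    x0, tau, T = np.array([1.0, 1.0]), 0.01, 1.0
    path = discrete_flow(x0, tau, int(T / tau))
    exact = np.array([np.exp(-1.0 * T), np.exp(-4.0 * T)]) * x0
    print("discrete scheme at t=1:", path[-1])
    print("exact gradient flow   :", exact)
```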
$$U_\nu(\mu_{t+dt}) - U_\nu(\mu_t) + \frac{W_2(\mu_t,\mu_{t+dt})^2}{2\,dt}.$$
By using the interpretation of the Wasserstein distance between infinitesimally close probability measures, this can also be rewritten as
$$\frac{W_2(\mu_t,\mu_{t+dt})^2}{dt^2} \simeq \inf\left\{\int|v|^2\,d\mu_t;\quad \frac{\partial\mu}{\partial t} + \nabla\cdot(\mu v) = 0\right\}.$$
Kdt − dS,
The following important lemma was used in the proof of Theorems 23.9
and 23.26.
$$\sup_{0\le s\le T}\bigl|F(s,t) - F(s,t')\bigr| \le \int_t^{t'}v(\tau)\,d\tau.$$
Bibliographical notes
and in [814, Theorem 5.30] for general functions U ; all these references
only consider M = Rn . The procedure of extension of ∇ψ (Step 2)
appears e.g. in [248, Proof of Theorem 2] (in the particular case of
convex functions). The integration by parts of Step 3 appears in many
papers; under adequate assumptions, it can be justified in the whole of
Rn (without any assumption of compact support): see [248, Lemma 7],
[214, Lemma 5.12], [30, Lemma 10.4.5]. The proof in [214, 248] relies on
the possibility to find an exhaustive sequence of cutoff functions with
Hessian uniformly bounded, while the proof in [30] uses the fact that
in Rn , the distance to a convex set is a convex function. None of these
arguments seems to apply in more general noncompact Riemannian
manifolds (the second proof would probably work in nonnegative cur-
vature), so I have no idea whether the integration by parts in the proof
of Theorem 23.14 could be performed without compactness assump-
tion; this is the reason why I went through the painful2 approximation
procedure used in the end of the proof of Theorem 23.14.
It is interesting to compare the two strategies used in the exten-
sion from compact to noncompact situations, in Theorem 17.15 on the
one hand, and in Theorem 23.14 on the other. In the former case,
I could use the standard approximation scheme of Proposition 13.2,
with an excellent control of the displacement interpolation and the op-
timal transport. But for Theorem 23.14, this seems to be impossible
because of the need to control the smoothness of the approximation of
ρ0 ; as a consequence, passing to the limit is more delicate. Further, note
that Theorem 17.15 was used in the proof of Theorem 23.14, since it is
the convexity properties of Uν along displacement interpolation which
allows us to go back and forth between the integral and the differential
(in the t variable) formulations.
The argument used to prove that the first term of (23.61) converges
to 0 is reminiscent of the well-known argument from functional analysis,
according to which convergence in weak L2 combined with convergence
of the L2 norm imply convergence in strong L2 .
1,1
At some point I have used the following theorem: If u ∈ Wloc (M ),
then for any constant c, ∇u = 0 almost everywhere on {u = c}. This
classical result can be found e.g. in [554, Theorem 6.19].
Another strategy to attack Theorem 23.14 would have been to start
1/N
from the “curve-above-tangent” formulation of the convexity of Jt ,
where Jt is the Jacobian determinant. (Instead I used the “curve-below-
2
Still an intense experience!
Here I have assumed that Φ is bounded below (which is the case when
Φ is the free energy functional). When inf Φ = −∞, there are still esti-
mates of the same type, only quite more complicated [30, Section 3.2].
Ambrosio and Savaré recently found a simplified proof of error esti-
mates and convergence for time-discretized gradient flows [34].
Otto applied the same method to various classes of nonlinear dif-
fusion equations, including porous medium and fast diffusion equa-
tions [669], and parabolic p-Laplace type equations [666], but also more
exotic models [667, 668] (see also [737]). For background about the
theory of porous medium and fast diffusion equations, the reader may
consult the review texts by Vázquez [804, 806].
In his work about porous medium equations, Otto also made two
important conceptual contributions: First, he introduced the abstract
formalism allowing him to interpret these equations as gradient flows,
directly at the continuous level (without going through the time-
discretization). Secondly, he showed that certain features of the porous
medium equations (qualitative behavior, rates of convergence to equi-
librium) were best seen via the new gradient flow interpretation. The
This looked a bit like a formal game, but it was later found out
that related equations were common in the physical literature about
flux-limited diffusion processes [627], and that in fact Brenier’s very
equation had already been considered by Rosenau [708]. A rigorous
treatment of these equations leads to challenging analytical difficulties,
which triggered several recent technical works, see e.g. [39, 40, 618] and
the references therein. By the way, the cost function $1-\sqrt{1-|x-y|^2}$
was later found to have applications in the design of lenses [712].
Lemma 23.28 in the Appendix is borrowed from [30, Lemma 4.3.4].
As Ambrosio pointed out to me, the argument is quite reminiscent of
Kruzkhov’s doubling method for the proof of uniqueness in the the-
ory of scalar conservation laws, see for instance the nice presentation
in [327, Sections 10.2 and 11.4]. It is important to note that the al-
most everywhere differentiability of F in both variables separately is
not enough to apply this lemma.
The stability theorem (Theorem 23.26) is a particular case of more
abstract general results, see for instance [30, Theorem 4.0.4(iv)].
In their study of gradient flows, Ambrosio, Gigli and Savaré point
out that it is useful to construct curves (µt ) satisfying the convexity-
type inequality
acceleration rather than the squared velocity; and there are heuristic
arguments to believe that this system should converge to a physical
solution satisfying Dafermos’s entropy criterion (the physical energy,
which is formally conserved, should decrease as much as possible). Nu-
merical simulations based on this scheme perform surprisingly well, at
least in dimension 1.
In the big picture also lies the work of Nelson [201, 647, 648, 650,
651, 652] on the foundations of quantum mechanics. Nelson showed that
the usual Schrödinger equation can be derived from a principle of least
action over solutions of a stochastic differential equation, where the
noise is fixed but the drift is unknown. Other names associated with this
approach are Guerra, Morato and Carlen. The reader may consult [343,
Chapter 5] for more information. In Chapter 7 of the same reference,
I have briefly made the link with the optimal transport problem. Von
Renesse [826] explicitly reformulated the Schrödinger equation as a
Hamiltonian system in Wasserstein space.
A more or less equivalent way to see Nelson’s point of view (ex-
plained to me by Carlen) is to study the critical points of the action
$$\mathcal A(\rho,m) = \int_0^1\bigl[K(\rho_t,m_t) - F(\rho_t)\bigr]\,dt, \tag{23.93}$$
Calculation rules
Having put equation (24.1) in gradient flow form, one may use Otto’s
calculus to shortcut certain formal computations, and quickly get rel-
evant results, without risks of computational errors. When it comes
to rigorous justification, things however are not so nice, and regular-
ity issues should — alas! — be addressed.1 For the most important of
these gradient flows, such as the heat, Fokker–Planck or porous medium
equations, these regularity issues are nowadays under good control.
To avoid inflating the size of this chapter much further, I shall not
go into these regularity issues, and be content with theorems that will
be conditional to the regularity of the solution.
(f) $\rho$, $p(\rho)$, $Lp(\rho)$, $p_2(\rho)$, $\nabla p_2(\rho)$, $U'(\rho)$, $\nabla U'(\rho)$, $LU'(\rho)$, $\nabla LU'(\rho)$, $L|\nabla U'(\rho)|^2$, $L(\nabla U'(\rho)\cdot\nabla LU'(\rho))$ and $e^{-V}$ satisfy adequate growth/decay conditions at infinity.
Then the following formulas hold true:
(i) $\forall t>0$, $\displaystyle \frac{d}{dt}\int A(\rho_t)\,d\nu = -\int p'(\rho_t)\,A''(\rho_t)\,|\nabla\rho_t|^2\,d\nu$;
(ii) $\forall t>0$, $\displaystyle \frac{d}{dt}U_\nu(\mu_t) = -I_{U,\nu}(\mu_t)$;
(iii) $\forall t>0$,
$$\frac{d}{dt}I_{U,\nu}(\mu_t) = -2\int_M\Bigl[\|\nabla^2U'(\rho_t)\|_{HS}^2 + \bigl(\mathrm{Ric}+\nabla^2V\bigr)(\nabla U'(\rho_t))\Bigr]\,p(\rho_t)\,d\nu - 2\int_M\bigl(LU'(\rho_t)\bigr)^2\,p_2(\rho_t)\,d\nu;$$
(iv) $\forall\sigma\in P_2^{\mathrm{ac}}(M)$, $\displaystyle \frac{d}{dt}W_2(\sigma,\mu_t) \le \sqrt{I_{U,\nu}(\mu_t)}$ for almost all $t>0$.
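Statement (ii) is easy to test numerically in the simplest case U(r) = r log r, V = 0, M = the flat torus: along the heat flow the entropy should decay at the rate given by the Fisher information. The following finite-difference sketch is not part of the text; the grid sizes and the initial density are arbitrary illustrative choices.

```python
# Check dH/dt = -I along the 1-D periodic heat equation, H = int rho log rho,
# I = int |grad rho|^2 / rho.
import numpy as np

n, dx, dt = 256, 1.0 / 256, 1e-6
x = np.arange(n) * dx
rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)        # smooth positive probability density

def lap(f):                                     # periodic finite-difference Laplacian
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

def grad(f):                                    # centered periodic gradient
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def entropy(r):
    return np.sum(r * np.log(r)) * dx

def fisher(r):
    return np.sum(grad(r) ** 2 / r) * dx

for step in range(2000):
    H_before = entropy(rho)
    rho = rho + dt * lap(rho)                   # explicit Euler step of d rho/dt = Lap rho
    if step % 500 == 0:
        dH_dt = (entropy(rho) - H_before) / dt
        print(f"t={step*dt:.1e}  dH/dt={dH_dt:+.4f}   -I={-fisher(rho):+.4f}")
```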
which is (ii).
Next, we can differentiate the previous expression once again along
the gradient flow µ̇ = −gradUν (µ):
$$\frac{d}{dt}\bigl\|\mathrm{grad}_{\mu_t}U_\nu\bigr\|^2 = -2\,\bigl\langle \mathrm{Hess}_{\mu_t}U_\nu\cdot\mathrm{grad}_{\mu_t}U_\nu,\ \mathrm{grad}_{\mu_t}U_\nu\bigr\rangle,$$
and then (iii) follows from Formula 15.7.
As for (iv), this is just a particular case of the general formula
(d/dt)d(X0 , γ(t)) ≤ |γ̇(t)|γ(t) .
⊓
⊔
$$= \int U'(\rho)\,Lp(\rho)\,d\nu = \int LU'(\rho)\,p(\rho)\,d\nu, \tag{24.2}$$
$$+ \int LU'(\rho_t)\,p'(\rho_t)\,\nabla_\nu\cdot\bigl(\rho_t\,\nabla U'(\rho_t)\bigr)\,d\nu. \tag{24.3}$$
The last two terms in this formula are actually equal: Indeed, if $\rho$ is smooth then
$$\int\rho\,U''(\rho)\,LU'(\rho)\,Lp(\rho)\,d\nu = \int p'(\rho)\,LU'(\rho)\,\nabla_\nu\cdot\bigl(\rho\,\nabla U'(\rho)\bigr)\,d\nu.$$
where $\exp(\widetilde\nabla\psi)$ is the optimal transport $\mu_t\to\sigma$. It follows that
$$\frac{d^+}{dt}\frac{W_2(\mu_t,\sigma)^2}{2} \le \sqrt{\int\frac{|\nabla p(\rho_t)|^2}{\rho_t}\,d\nu}\;\sqrt{\int\rho_t\,|\widetilde\nabla\psi|^2\,d\nu} = \sqrt{I_{U,\nu}(\mu_t)}\;W_2(\mu_t,\sigma);$$
Large-time behavior
Otto’s calculus, described in Chapter 15, was first developed to estimate
rates of equilibration for certain nonlinear diffusion equations. The next
theorem illustrates this.
Theorem 24.7 (Equilibration in positive curvature). Let M be
a Riemannian manifold equipped with a reference measure ν = e−V ,
V ∈ C 4 (M ), satisfying a curvature-dimension bound CD(K, N ) for
some K > 0, N ∈ (1, ∞], and let U ∈ DCN . Then:
(i) (exponential convergence to equilibrium) Any smooth so-
lution (µt )t≥0 of (24.1) satisfies the following estimates:
(a) $U_\nu(\mu_t) - U_\nu(\nu) \le e^{-2K\lambda t}\,\bigl[U_\nu(\mu_0) - U_\nu(\nu)\bigr]$
(b) $I_{U,\nu}(\mu_t) \le e^{-2K\lambda t}\,I_{U,\nu}(\mu_0)$ \hfill (24.5)
(c) $W_2(\mu_t,\nu) \le e^{-K\lambda t}\,W_2(\mu_0,\nu)$,
where
$$\lambda := \lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\,\Bigl(\sup_{x\in M}\rho_0(x)\Bigr)^{-\frac1N}. \tag{24.6}$$
In particular, $\lambda$ is independent of $\rho_0$ if $N=\infty$.
(ii) (exponential contraction) Any two smooth solutions $(\mu_t)_{t\ge0}$ and $(\widetilde\mu_t)_{t\ge0}$ of (24.1) satisfy
$$W_2(\mu_t,\widetilde\mu_t) \le e^{-K\lambda t}\,W_2(\mu_0,\widetilde\mu_0), \tag{24.7}$$
where
$$\lambda := \lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\,\Bigl[\max\Bigl(\sup_{x\in M}\rho_0(x),\ \sup_{x\in M}\widetilde\rho_0(x)\Bigr)\Bigr]^{-\frac1N}. \tag{24.8}$$
Remark 24.10. The rate of decay O(e−λ t ) is optimal for (24.9) if di-
mension is not taken into account; but if N is finite, the optimal rate
of decay is O(e−λt ) with λ = KN/(N − 1). The method presented in
this chapter is not clever enough to catch this sharp rate.
$$\sup\rho_s \le \max\bigl(\sup\rho_t,\ \sup\widetilde\rho_t\bigr) \le \max\bigl(\sup\rho_0,\ \sup\widetilde\rho_0\bigr), \tag{24.12}$$
On the other hand, Theorem 23.9 shows that $(d/dt)(W_2(\mu_t,\widetilde\mu_t)^2/2)$ is equal to the left-hand side of (24.15), for almost all $t$. We conclude that
$$\frac{d^+}{dt}W_2(\mu_t,\widetilde\mu_t)^2 \le -2K\lambda\,W_2(\mu_t,\widetilde\mu_t)^2, \tag{24.16}$$
and the desired result follows. ⊓⊔
Remark 24.13. Here is an alternative scheme of proof for Theo-
rem 24.7(ii). The problem is to estimate
$$\int\bigl\langle\nabla U'(\rho_t),\,\nabla\psi\bigr\rangle\,d\mu_t + \int\bigl\langle\nabla U'(\widetilde\rho_t),\,\nabla\widetilde\psi\bigr\rangle\,d\widetilde\mu_t.$$
Short-time behavior
A popular and useful topic in the study of diffusion processes consists
in establishing regularization estimates in short time. Typically, a
certain functional used to quantify the regularity of the solution (for
instance, the supremum of the unknown or some Lebesgue or Sobolev
norm) is shown to be bounded like O(t−κ ) for some characteristic expo-
nent κ, independent of the initial datum (or depending only on certain
weak estimates on the initial datum), when t > 0 is small enough. Here
I shall present some slightly unconventional estimates of this type.
Theorem 24.16 (Short-time regularization for gradient flows).
Let M be a Riemannian manifold satisfying a curvature-dimension
bound CD(K, ∞), K ∈ R; let ν = e−V vol ∈ P2 (M ), with V ∈ C 4 (M ),
and let U ∈ DC∞ with U (1) = 0. Further, let (µt )t≥0 be a smooth
solution of (24.1). Then:
(i) If K ≥ 0 then for any t ≥ 0,
$$t^2\,I_{U,\nu}(\mu_t) + 2t\,U_\nu(\mu_t) + W_2(\mu_t,\nu)^2 \le W_2(\mu_0,\nu)^2.$$
In particular,
$$U_\nu(\mu_t) \le \frac{W_2(\mu_0,\nu)^2}{2t}, \tag{24.18}$$
$$I_{U,\nu}(\mu_t) \le \frac{W_2(\mu_0,\nu)^2}{t^2}. \tag{24.19}$$
(ii) If $K\ge0$ and $t\ge s>0$, then
$$W_2(\mu_s,\mu_t) \le \min\Bigl(\sqrt{2\,U_\nu(\mu_s)\,|t-s|},\ \sqrt{I_{U,\nu}(\mu_s)}\,|t-s|\Bigr) \tag{24.20}$$
$$\le W_2(\mu_0,\nu)\,\min\Bigl(\frac{\sqrt{|t-s|}}{\sqrt s},\ \frac{|t-s|}{s}\Bigr). \tag{24.21}$$
(iii) If $K<0$, the previous conclusions become
$$U_\nu(\mu_t) \le \frac{e^{2Ct}\,W_2(\mu_0,\nu)^2}{2t};\qquad I_{U,\nu}(\mu_t) \le \frac{e^{2Ct}\,W_2(\mu_0,\nu)^2}{t^2};$$
$$W_2(\mu_s,\mu_t) \le e^{Ct}\,\min\Bigl(\sqrt{2\,U_\nu(\mu_s)\,|t-s|},\ \sqrt{I_{U,\nu}(\mu_s)}\,|t-s|\Bigr) \le e^{2Ct}\,W_2(\mu_0,\nu)\,\min\Bigl(\frac{\sqrt{|t-s|}}{\sqrt s},\ \frac{|t-s|}{s}\Bigr),$$
with $C=-K$.
Remark 24.21. I would bet that the estimates in (24.22) are optimal
in general (although they would deserve more thinking) as far as the
dependence on µ0 and t is concerned. On the other hand, if µ0 is given,
these bounds are terrible estimates for the short-time behavior of the
Kullback and Fisher informations as functions of just t. Indeed, the
correct scale for the Kullback information Hν (µt ) is O(log(1/t)), and
for the Fisher information it is O(1/t), as can be checked easily in the
particular case when M = Rn and ν is the Gaussian measure.
$$\frac{d^+}{dt}W_2(\mu_t,\nu)^2 \le -2\,U_\nu(\mu_t). \tag{24.25}$$
Now introduce $\psi(t) = a(t)\,I_{U,\nu}(\mu_t) + b(t)\,U_\nu(\mu_t) + c(t)\,W_2(\mu_t,\nu)^2$; then
$$\frac{d^+\psi}{dt} \le \bigl[a'(t)-b(t)\bigr]\,I_{U,\nu}(\mu_t) + \bigl[b'(t)-2c(t)\bigr]\,U_\nu(\mu_t) + c'(t)\,W_2(\mu_t,\nu)^2.$$
If we choose $a(t)=t^2$, $b(t)=2t$, $c(t)=1$, all three brackets vanish, so $\psi$ is nonincreasing; this proves (i). Next, for $t\ge s$,
$$\frac{d^+}{dt}W_2(\mu_s,\mu_t)^2 \le 2\,\bigl[U_\nu(\mu_s) - U_\nu(\mu_t)\bigr] \le 2\,U_\nu(\mu_s).$$
So
$$W_2(\mu_s,\mu_t)^2 \le 2\,U_\nu(\mu_s)\,|t-s|. \tag{24.27}$$
Then (ii) results from the combination of (24.26) and (24.27), together with (i).
The proof of (iii) is pretty much the same, with the following modifications:
$$\frac{d\,I_{U,\nu}(\mu_t)}{dt} \le (-2K)\,I_{U,\nu}(\mu_t);\qquad \frac{d^+}{dt}W_2(\mu_t,\nu)^2 \le -2\,U_\nu(\mu_t) + (-2K)\,W_2(\mu_t,\nu)^2;$$
$$\psi(t) := e^{2Kt}\bigl[t^2\,I_{U,\nu}(\mu_t) + 2t\,U_\nu(\mu_t) + W_2(\mu_t,\nu)^2\bigr].$$
Details are left to the reader. (The estimates in (iii) can be somewhat refined.) ⊓⊔
$$I_{U,\nu}(\mu_t) \le \frac{U_\nu(\mu_0)}{t}.$$
Remark 24.23. There are many known regularization results in short
time, for certain of the gradient flows considered in this chapter. The
two most famous examples are:
• the Li–Yau estimates, which give lower bounds on $\Delta\log\rho_t$, for a solution of the heat equation on a Riemannian manifold, under certain curvature-dimension conditions. For instance, if $M$ satisfies $CD(0,N)$, then
$$\Delta\log\rho_t \ge -\frac{N}{2t};$$
• the Aronson–Bénilan estimates, which give lower bounds on $\Delta\rho_t^{m-1}$ for solutions of the nonlinear diffusion equation $\partial_t\rho = \Delta\rho^m$ in $\mathbb R^n$, where $1-2/n<m<1$:
$$\Delta\bigl(\rho_t^{m-1}\bigr) \ge -\frac{m}{m-1}\,\frac{n}{\lambda t},\qquad \lambda = 2 - n(1-m).$$
There is an obvious similarity between these two estimates, and both
can be interpreted as a lower bound on the rate of divergence of the
vector field which drives particles in the gradient flow interpretation of
these partial differential equations. I think it would be very interesting
to have a unified proof of these inequalities, under certain geometric
conditions. For instance one could try to use the gradient flow interpre-
tation of the heat and nonlinear diffusion equations, and maybe some
localization by restriction.
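A sanity check not included in the text: in the model case of the Euclidean heat kernel (which satisfies $CD(0,n)$), the Li–Yau bound is saturated. Indeed, for $\rho_t(x) = (4\pi t)^{-n/2}\,e^{-|x|^2/(4t)}$,
$$\log\rho_t(x) = -\frac n2\log(4\pi t) - \frac{|x|^2}{4t},\qquad \Delta\log\rho_t(x) = -\frac{n}{2t},$$
so equality holds in the bound $\Delta\log\rho_t \ge -N/(2t)$ with $N=n$.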
Bibliographical notes
In [669], Otto advocated the use of his formalism both for the purpose
of finding new schemes of proof, and for giving a new understanding of
certain results.
What I call the Fokker–Planck equation is
$$\frac{\partial\mu}{\partial t} = \Delta\mu + \nabla\cdot(\mu\,\nabla V).$$
This is in fact an equation on measures. It can be recast as an equation on functions (densities):
$$\frac{\partial\rho}{\partial t} = \Delta\rho - \nabla V\cdot\nabla\rho.$$
From the point of view of stochastic processes, the relation between
these two formalisms is the following: µt can be thought
√ of as law (Xt ),
where Xt is the stochastic process defined by dXt = 2 dBt −∇V (Xt ) dt
(Bt = standard Brownian motion on the manifold), while ρt (x) is de-
fined by the equation ρt (x) = E x ρ0 (Xt ) (the subscript x means that
the process Xt starts at X0 = x). In the particular case when V is a
quadratic potential in Rn , the evolution equation for ρt is often called
the Ornstein–Uhlenbeck equation.
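As an illustration of this probabilistic picture (my own sketch, not from the text), here is a small Python simulation for the quadratic potential V(x) = x²/2 on R: an Euler–Maruyama discretization of dX_t = √2 dB_t − ∇V(X_t) dt, whose law should approach the stationary measure proportional to e^{-V}, i.e. N(0, 1).

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_V(x):
        # V(x) = x^2 / 2, so grad V(x) = x (Ornstein-Uhlenbeck case)
        return x

    # Euler-Maruyama discretization of dX_t = sqrt(2) dB_t - grad V(X_t) dt
    n_paths, dt, T = 100_000, 1e-3, 5.0
    X = np.full(n_paths, 3.0)              # all paths start at x = 3
    for _ in range(int(T / dt)):
        X += -grad_V(X) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)

    # The invariant measure proportional to exp(-V) is N(0, 1) here, so the
    # empirical mean and variance of X_T should be close to 0 and 1.
    print(X.mean(), X.var())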
The observation that the Fisher information Iν is the time-derivative
of the entropy functional −Hν along the heat semigroup seems to first
appear in a famous paper by Stam [758] at the end of the fifties, in the
case M = R (equipped with the Lebesgue measure). Stam gives credit
to de Bruijn for that remark. The generalization appearing in Theo-
rem 24.2(ii) has been discovered and rediscovered by many authors.
Theorem 24.2(iii) goes back to Bakry and Émery [56] for the case
U (r) = r log r. After many successive generalizations, the statement as
I wrote it was formally derived in [577, Appendix D]. To my knowledge,
the argument given in the present chapter is the first rigorous one to
be written down in detail (modulo the technical justifications of the
integrations by parts), although it is a natural expansion of previous
works.
Theorem 24.2(iv) was proven by Otto and myself [671] for σ = µ0 .
The case σ = ν is also useful and was considered in [219].
Regularity theory for porous medium equations has been the object
of many works, see in particular the synthesis works by Vázquez [804,
805, 806]. When one studies nonlinear diffusions by means of optimal
transport theory, the regularity theory is the first thing to worry about.
In a Riemannian context, Demange [291, 292, 290, 293] presents many
approximation arguments based on regularization, truncation, etc. in
great detail. Going into these issues would have led me to considerably
expand the size of this chapter; but ignoring them completely would
have led to incorrect proofs.
It has been known since the mid-seventies that logarithmic Sobolev
inequalities yield rates of convergence to equilibrium for heat-like equa-
tions, and that these estimates are independent of the dimension. For
certain problems of convergence to equilibrium involving entropy, logarithmic Sobolev inequalities are much more convenient than spectral tools. This is especially true in infinite dimension, although logarithmic
Sobolev inequalities are also very useful in finite dimension. For more
information see the bibliographical notes of Chapter 21.
As recalled in Remark 24.11, convergence in the entropy sense im-
plies convergence in total variation. In [220] various functional methods
leading to convergence in total variation are examined and compared.
Around the mid-nineties, Toscani [784, 785] introduced the logarith-
mic Sobolev inequality in kinetic theory, where it was immediately rec-
ognized as a powerful tool (see e.g. [300]). The links between logarithmic
Sobolev inequalities and Fokker–Planck equations were re-investigated
by the kinetic theory community, see in particular [43] and the refer-
ences therein. The emphasis was more on proving logarithmic Sobolev
inequalities thanks to the study of the convergence to equilibrium for
Fokker–Planck equations, than the reverse. So the key was the study of
convergence to equilibrium in the Fisher information sense, as in Chap-
ter 25; but the final goal really was convergence in the entropy sense.
To my knowledge, it is only in a recent study of certain algorithms
based on stochastic integration [549], that convergence in the Fisher
information sense in itself has been found useful. (In this work some
constructive criteria for exponential convergence in Fisher information
are given; for instance this is true for the heat equation ∂t ρ = ∆ρ, under
a CD(K, ∞) bound (K < 0) and a logarithmic Sobolev inequality.)
Around 2000, it was discovered independently by Otto [669], Carrillo
and Toscani [215] and Del Pino and Dolbeault [283] that the same
“information-theoretical” tools could be used for nonlinear equations
of the form
\frac{\partial\rho}{\partial t} = \Delta\rho^m \qquad (24.28)
slightly stronger than the one which I derived in Theorem 24.7 and
Remark 24.12, but the asymptotic rate is the same.
All the methods described before apply to the study of the time
asymptotics of the porous medium equation ∂t ρ = ∆ρm , but only under
the restriction m ≥ 1 − 1/N . In that regime one can use time-rescaling
and tools similar to the ones described in this chapter, to prove that
the solutions become close to Barenblatt’s self-similar solution.
When m < 1−1/N , displacement convexity and related tricks do not
apply any more. This is why it was rather a sensation when Carrillo and
Vázquez [217] applied the Aronson–Bénilan estimates to the problem
of asymptotic behavior for fast diffusion equations with exponents m in (1 − 2/N, 1 − 1/N), which is about the best that one can hope for, since Barenblatt profiles do not exist for m ≤ 1 − 2/N.
Here we see the limits of Otto’s formalism: such results as the di-
mensional refinement of the rate of convergence for diffusive equations
(Remark 24.10), or the Carrillo–Vázquez estimates, rely on inequalities
of the form
\int p(\rho)\, \Gamma_2\big( \nabla U'(\rho) \big)\, d\nu + \int p_2(\rho)\, \big( L\, U'(\rho) \big)^2\, d\nu \ge \dots
in which one takes advantage of the fact that the same function ρ
appears in the terms p(ρ) and p2 (ρ) on the one hand, and in the terms
∇U ′ (ρ) and LU ′ (ρ) on the other. The technical tool might be changes
of variables for the Γ2 (as in [541]), or elementary integration by parts
(as in [217]); but I don’t see any interpretation of these tricks in terms
of the Wasserstein space P2 (M ).
The story about the rates of equilibration for fast diffusion equations
does not end here. At the same time as Carrillo and Vázquez obtained
their main results, Denzler and McCann [298, 299] computed the spec-
tral gap for the linearized fast diffusion equations in the same interval
of exponents (1 − 2/N, 1 − 1/N). This study showed that the rate of conver-
gence obtained by Carrillo and Vázquez is off the value suggested by the
linearized analysis by a factor 2 (except in the radially symmetric case
where they obtain the optimal rate thanks to a comparison method).
The connection between the nonlinear and the linearized dynamics is
still unclear, although some partial results have been obtained by Mc-
Cann and Slepčev [619]. More recently, S.J. Kim and McCann [517]
have derived optimal rates of convergence for the “fastest” nonlinear
diffusion equations, in the range 1 − 2/N < m ≤ 1 − 2/(N + 2), by
comparison methods involving Newtonian potentials. Another work by
Cáceres and Toscani [183] also recovers some of the results of Denzler
and McCann by means of completely different methods with their roots
in kinetic theory. There is still ongoing research to push the rates of
convergence and the range of admissible nonlinearities, in particular by
Denzler, Koch, McCann and probably others.
In dimension 2, the limit case m = 0 corresponds to a logarithmic
diffusion; it is related to geometric problems, such as the evolution of
conformal surfaces or the Ricci flow [806, Chapter 8].
More general nonlinear diffusion equations of the form ∂t ρ = ∆p(ρ)
have been studied by Biler, Dolbeault and Esteban [119], and Car-
rillo, Di Francesco and Toscani [210, 211] in Rn . In the latter work
the rescaling procedure is recast in a more geometric and physical
interpretation, in terms of temperature and projections; a sequel by
Carrillo and Vázquez [218] shows that the intermediate asymptotics
can be complicated for well-chosen nonlinearities. Nonlinear diffusion
equations on manifolds were also studied by Demange [291] under a
CD(K, N ) curvature-dimension condition, K > 0.
Theorem 24.7(ii) is related to a long tradition of study of contraction
rates in Wasserstein distance for diffusive equations [231, 232, 458, 662].
Sturm and Renesse [764] noted that such contraction rates character-
ize nonnegative Ricci curvature; Sturm [761] went on to give various
characterizations of CD(K, N ) bounds in terms of contraction rates for
possibly nonlinear diffusion equations.
In the one-dimensional case (M = R) there are alternative methods
to get contraction rates in W2 distance, and one can also treat larger
classes of models (for instance viscous conservation laws), and even ob-
tain decay in Wp for any p; see for instance [137, 212]. Recently, Brenier
found a re-interpretation of these one-dimensional contraction proper-
ties in terms of monotone operators [167]. Also the asymptotic behavior
of certain conservation laws has been analyzed in this way [208, 209]
(with the help of the strong “W∞ distance”!).
Another model for which contraction in W2 distance has been estab-
lished is the Boltzmann equation, in the particular case of a spatially
homogeneous gas of Maxwellian molecules. This contraction property
was discovered by Tanaka [644, 776, 777]; see [138] for recent work on
the subject. Some striking uniqueness results have been obtained by
Fournier and Mouhot [377, 379] with a related method (see also [378]).
To conclude this discussion about contraction estimates, I shall
briefly discuss some links with Perelman’s analysis of the backward
where the infimum is taken over all C 1 paths γ : [t0 , t1 ] → M such that
γ(t0 ) = x and γ(t1 ) = y, and S(x, t) is the scalar curvature of M (evolv-
ing under backward Ricci flow) at point x and time t. As in Chapter 7,
this induces a Hamilton–Jacobi equation, and an action in the space of measures. Then it is shown in [576] that H(\mu_t) - \int \varphi_t\, d\mu_t is convex in t along the associated displacement interpolation, where (φ_t) is a so-
lution of the Hamilton–Jacobi equation. Other theorems in [576, 782]
deal with a variant of L0 in which some time-rescalings have been per-
formed. Not only do these results generalize the contraction property
of [620], but they also imply Perelman’s estimates of monotonicity of
the so-called W -entropy and reduced volume functionals (which were
an important tool in the proof of the Poincaré conjecture).
I shall now comment on short-time decay estimates. The short-time
behavior of the entropy and Fisher information along the heat flow
(Theorem 24.16) was studied by Otto and myself around 1999 as a
technical ingredient to get certain a priori estimates in a problem of hy-
drodynamical limits. This work was not published, and I was quite sur-
prised to discover that Bobkov, Gentil and Ledoux [127, Theorem 4.3]
had found similar inequalities and applied them to get a new proof of
the HWI inequality. Otto and I published our method [672] as a com-
ment to [127]; this is the same as the proof of Theorem 24.16. It can be
considered as an adaptation, in the context of the Wasserstein space, of
some classical estimates about gradient flows in Hilbert spaces, that can
be found in Brézis [171, Théorème 3.7]. The result of Bobkov, Gentil
and Ledoux is actually more general than ours, because these authors
seem to have sharp constants under CD(K, ∞) for all values of K ∈ R,
U_\nu(\mu) - U_\nu(\nu) \le \frac{I_{U,\nu}(\mu)}{2K\lambda}.
\forall \mu \in P^{ac}(M), \qquad H_\nu(\mu) \le \frac{I_\nu(\mu)}{2K}.
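To fix ideas, here is a two-line check of the latter inequality on Gaussian measures (my own example, with the standard normalization): take ν = γ = N(0, I_n), which satisfies CD(1, ∞), and µ = N(m, I_n). Then dµ/dγ = exp(m·x − |m|²/2), so
H_\gamma(\mu) = \frac{|m|^2}{2}, \qquad I_\gamma(\mu) = \int \Big| \nabla \log\frac{d\mu}{d\gamma} \Big|^2\, d\mu = |m|^2,
and H_ν(µ) ≤ I_ν(µ)/(2K) holds with equality for K = 1.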
Sloppy proof of Theorem 25.1. By using Theorem 17.7(vii) and an ap-
proximation argument, we may assume that ρ is smooth, that U is
smooth on (0, +∞), that the solution (ρt )t≥0 of the gradient flow
\frac{\partial\rho}{\partial t} = L\, p(\rho_t),
starting from ρ0 = ρ is smooth, that Uν (µ0 ) is finite, and that t →
Uν (µt ) is continuous at t = 0.
For notational simplicity, let H(t) := U_\nu(\mu_t) and I(t) := I_{U,\nu}(\mu_t); then
\frac{dH(t)}{dt} = -I(t), \qquad I(t) \le I(0)\, e^{-2K\lambda t}.
By Theorem 24.7(i)(a), H(t) → 0 as t → ∞. So
H(0) = \int_0^{+\infty} I(t)\, dt \le I(0) \int_0^{+\infty} e^{-2K\lambda t}\, dt = \frac{I(0)}{2K\lambda},
which is the desired result. ⊓⊔
Sloppy proof of Theorem 25.3. By density, we may assume that the den-
sity ρ0 of µ is smooth; we may also assume that A and U are smooth
on (0, +∞) (recall Proposition 17.7(vii)). Let (ρt )t≥0 be the solution of
the gradient flow equation
\frac{\partial\rho}{\partial t} = \nabla\cdot\big( \rho\, \nabla U'(\rho) \big), \qquad (25.2)
and as usual µt = ρt ν. It can be shown that ρt is uniformly bounded
below by a positive number as t → ∞.
By Theorem 24.2(iii),
\frac{d}{dt}\, I_{U,\nu}(\mu_t) \le -2K\lambda \int_M \rho_t^{1-\frac{1}{N}}\, |\nabla U'(\rho_t)|^2\, d\nu. \qquad (25.3)
On the other hand, from the assumption A''(r) = r^{-\frac{1}{N}}\, U''(r),
\nabla A'(\rho) = \rho^{-\frac{1}{N}}\, \nabla U'(\rho).
Further assume that the Cauchy problem associated with the gradient
flow ∂t ρ = L p(ρ) admits smooth solutions for smooth initial data.
Then, for any µ ∈ P_2^{ac}(M), the following inequality holds:
\frac{d^+}{dt}\, W_2(\mu_0,\mu_t) \le \sqrt{I_{U,\nu}(\mu_t)}.
On the other hand, by assumption,
\sqrt{I_{U,\nu}(\mu_t)} \le \frac{I_{U,\nu}(\mu_t)}{\sqrt{2K\, U_\nu(\mu_t)}} = -\frac{d}{dt}\, \sqrt{\frac{2\, U_\nu(\mu_t)}{K}}. \qquad (25.8)
The proofs in the present chapter were based on gradient flows of dis-
placement convex functionals, while proofs in Chapters 21 and 22 were
more directly based on displacement interpolation. How do these two
strategies compare to each other?
From a formal point of view, they are not so different as one may
think. Take the case of the heat equation,
\frac{\partial\rho}{\partial t} = \Delta\rho,
or equivalently
\frac{\partial\rho}{\partial t} + \nabla\cdot\big( \rho\, \nabla(-\log\rho) \big) = 0.
The evolution of ρ is determined by the “vector field” ρ → ∇(− log ρ), in the space of probability densities. Rescale time and the vector field itself as follows:
\varphi_\varepsilon(t,x) = -\varepsilon\, \log\rho\Big( \frac{\varepsilon t}{2},\, x \Big).
Then ϕ_ε solves
\frac{\partial\varphi_\varepsilon}{\partial t} + \frac{|\nabla\varphi_\varepsilon|^2}{2} = \frac{\varepsilon}{2}\, \Delta\varphi_\varepsilon.
Passing to the limit as ε → 0, one gets, at least formally, the Hamilton–
Jacobi equation
\frac{\partial\varphi}{\partial t} + \frac{|\nabla\varphi|^2}{2} = 0,
which is in some sense the equation driving displacement interpolation.
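For the record, here is the elementary computation behind the rescaling (a sketch under the same formal assumptions as above): writing ϕ_ε(t,x) = −ε log ρ(εt/2, x) and using ∂_s ρ = ∆ρ,
\frac{\partial\varphi_\varepsilon}{\partial t} = -\frac{\varepsilon^2}{2}\, \frac{\Delta\rho}{\rho}, \qquad \nabla\varphi_\varepsilon = -\varepsilon\, \frac{\nabla\rho}{\rho}, \qquad \Delta\varphi_\varepsilon = -\varepsilon\,\Big( \frac{\Delta\rho}{\rho} - \frac{|\nabla\rho|^2}{\rho^2} \Big),
so that ∂_t ϕ_ε + |∇ϕ_ε|²/2 = (ε²/2)(|∇ρ|²/ρ² − ∆ρ/ρ) = (ε/2) ∆ϕ_ε, which is the viscous Hamilton–Jacobi equation displayed above.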
There is a general principle here: After suitable rescaling, the ve-
locity field associated with a gradient flow resembles the velocity field
of a geodesic flow. Here is one possible way to see this. Take an
arbitrary smooth function U , and consider the evolution
need a control of Hess Uν in all directions. This might explain why there
is at present no displacement convexity analogue of Demange’s proof of
the Sobolev inequality (so far only weaker inequalities with nonsharp
constants have been obtained).
On the other hand, proofs based on displacement convexity are usu-
ally rather simpler, and more robust than proofs based on gradient
flows: no issues about the regularity of the semigroup, no subtle in-
terplay between the Hessian of the functional and the “direction of
evolution”. . .
In the end we can put some of the main functional inequalities dis-
cussed in these notes in a nice array. Below, “LSI” stands for “Logarith-
mic Sobolev inequality”; “T” for “Talagrand inequality”; and “Sob2 ”
for the Sobolev inequality with exponent 2. So LSI(K), T(K), HWI(K)
and Sob2 (K, N ) respectively stand for (21.4), (22.4) (with p = 2),
(20.17) and (21.8).
Theorem Gradient flow proof Displ. convexity proof
CD(K, ∞) ⇒ LSI(K) Bakry–Émery Otto–Villani
LSI(K) ⇒ T(K) Otto–Villani Bobkov–Gentil–Ledoux
CD(K, ∞) ⇒ HWI(K) Bobkov–Gentil–Ledoux Otto–Villani
CD(K, N ) ⇒ Sob2 (K, N ) Demange ??
Bibliographical notes
Stam used a heat semigroup argument to prove an inequality which
is equivalent to the Gaussian logarithmic Sobolev inequality in dimen-
sion 1 (recall the bibliographical notes for Chapter 21). His argument
was not completely rigorous because of regularity issues, but can be
repaired; see for instance [205, 783].
The proof of Theorem 25.1 in this chapter follows the strategy by
Bakry and Émery, who were only interested in the Particular Case 25.2.
These authors used a set of calculus rules which has been dubbed the
“Γ2 calculus”. They were not very careful about regularity issues, and
for that reason the original proof probably cannot be considered as
completely rigorous (in particular for noncompact manifolds, in which
regularity issues are not so innocent, even if the curvature-dimension
condition prevents the blow-up of the heat semigroup). However, re-
cently Demange [291] carried out complete proofs for much more deli-
cate situations, so there is no reason to doubt that the Bakry–Émery
r\, q'(r) + q(r) \ge \frac{9N}{4(N+2)}\, q(r)^2, \qquad q(r) = \frac{r\, U''(r)}{U'(r)} + \frac{1}{N}. \qquad (25.10)
Fig. 26.1. The triangle on the left is drawn in X , the triangle on the right is drawn
on the model space R2 ; the lengths of their edges are the same. The thin geodesic
lines go through the apex to the middle of the basis; the one on the left is longer
than the one on the right. In that sense the triangle on the left is fatter than the
triangle on the right. If all triangles in X look like this, then X has nonnegative
curvature. (Think of a triangle as the belly of some individual, the belt being the
basis, and the neck being the apex; of course the line going from the apex to the
middle of the basis is the tie. The fatter the individual, the longer his tie should be.)
Still there is a generalization of what it means to have curvature
bounded below by K ∈ R, where K is an arbitrary real number, not
necessarily 0. It is obtained by replacing the model space R2 by the
model space with constant curvature K, that is:
• the sphere S²(1/√K) with radius R = 1/√K, if K > 0;
• the plane R², if K = 0;
• the hyperbolic space H(1/√|K|) with “hyperbolic radius” R = 1/√|K|, if K < 0; this can be realized as the half-plane R × (0, +∞), equipped with the metric g_{(x,y)}(dx\, dy) = (dx² + dy²)/(|K|\, y²) (a quick curvature check is sketched right after this list).
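Here is that check (a computation of mine, not in the text): for a conformal metric e^{2φ}(dx² + dy²) on the plane, the Gaussian curvature equals −e^{−2φ} ∆φ; with e^{2φ} = 1/(|K| y²), i.e. φ = −½ log|K| − log y, one finds ∆φ = 1/y², hence a curvature equal to −|K| y² · (1/y²) = −|K| = K, as required when K < 0.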
Bibliographical notes
∀t ∈ [0, 1], d(x, γt )2 ≥ (1−t) d(x, y)2 +t d(x, z)2 −S 2 t(1−t) d(y, z)2 .
The central question in this chapter is the following: What does it mean
to say that a metric-measure space (X , dX , νX ) is “close” to another
metric-measure space (Y, dY , νY )? We would like to have an answer
that is as “intrinsic” as possible, in the sense that it should depend
only on the metric-measure properties of X and Y.
So as not to inflate this chapter too much, I shall omit many proofs
when they can be found in accessible references, and prefer to insist on
the main stream of ideas.
Hausdorff topology
There is a well-established notion of distance between compact sets in
a given metric space, namely the Hausdorff distance. If X and Y are
two compact subsets of a metric space (Z, d), their Hausdorff distance
is
d_H(X, Y) = \max\Big( \sup_{x\in X} d(x, Y),\ \sup_{y\in Y} d(y, X) \Big),
Fig. 27.1. In solid lines, the borders of the two sets X and Y; in dashed lines, the
borders of their enlargements. The width of enlargement is just sufficient that any
of the enlarged sets covers both X and Y.
historically the former came first). This will become more apparent if I rewrite the Hausdorff distance as
d_H(A, B) = \inf\big\{ r > 0;\ A \subset B^{r]} \text{ and } B \subset A^{r]} \big\},
where C stands for an arbitrary closed set, and C^{r]} is the set of all points whose distance to C is no more than r, i.e. the union of all closed balls B[x, r], x ∈ C.
The analogy between the two notions goes further: While the
Prokhorov distance can be defined in terms of couplings, the Hausdorff
distance can be defined in terms of correspondences. By definition, a
correspondence (or relation) between two sets X and Y is a subset R of
X × Y: if (x, y) ∈ R, then x and y are said to be in correspondence; it
is required that each x ∈ X should be in correspondence with at least
one y, and each y ∈ Y should be in correspondence with at least one x.
Then we have the two very similar formulas:
d_P(µ, ν) = \inf\big\{ r > 0;\ ∃ coupling (X, Y) of (µ, ν);\ P[d(X, Y) > r] ≤ r \big\};
d_H(X, Y) = \inf\big\{ r > 0;\ ∃ correspondence R in X × Y;\ ∀(x, y) ∈ R, d(x, y) ≤ r \big\}.
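To make the parallel concrete, here is a small Python sketch (mine, not from the text) computing the Hausdorff distance between two finite subsets of the plane directly from the sup–inf formula; for finite sets this is also the infimum, over correspondences R, of sup_{(x,y)∈R} d(x,y).

    import numpy as np

    def hausdorff(A, B):
        # A, B: arrays of shape (n, 2) and (m, 2), finite subsets of R^2.
        D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)   # pairwise distances
        # d_H(A, B) = max( sup_a inf_b d(a, b), sup_b inf_a d(a, b) )
        return max(D.min(axis=1).max(), D.min(axis=0).max())

    A = np.array([[0.0, 0.0], [1.0, 0.0]])
    B = np.array([[0.0, 0.0], [1.0, 0.5]])
    print(hausdorff(A, B))   # 0.5: the point (1, 0) is at distance 0.5 from B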
nothing in common? First one would like to say that two spaces which
are isometric really are the same. Recall the definition of an isometry:
If (X , d) and (X ′ , d′ ) are two metric spaces, a map f : X → X ′ is called
an isometry if:
(a) it preserves distances: for all x, y ∈ X , d′ (f (x), f (y)) = d(x, y);
(b) it is surjective: for any x′ ∈ X ′ there is x ∈ X with f (x) = x′ .
An isometry is automatically injective, so it has to be a bijection,
and its inverse f −1 is also an isometry. Two metric spaces are said to
be isometric if there exists an isometry between them. If two spaces
X and X ′ are isometric, then any statement about X which can be
expressed in terms of just the distance, is automatically “transported”
to X ′ by the isometry.
This motivates the desire to work with isometry classes, rather than
metric spaces. By definition, an isometry class X is the set of all metric
spaces which are isometric to some given space X . Instead of “isometry
class”, I shall often write “abstract metric space”. All the spaces in a
given isometry class have the same topological properties, so it makes
sense to say of an abstract metric space that it is compact, or complete,
etc.
This looks good, but a bit frightening: There are so many metric
spaces around that the concept of abstract metric space seems to be
ill-posed from the set-theoretical point of view (just like there is no “set
of all sets”). However, things become much friendlier when one
realizes that any compact metric space, being separable, is isometric to
the completion of N for a suitable metric. (To see this, introduce a dense
sequence (xk ) in your favorite space X , and define d(k, ℓ) = dX (xk , xℓ ).)
Then we might think of an isometry class as a subset of the set of all
distances on N; this is still huge, but at least it makes sense from a
set-theoretical point of view.
Now the problem is to find a good distance on the set of abstract
compact metric spaces. The natural concept here is the Gromov–
Hausdorff distance, which is obtained by formally taking the quo-
tient of the Hausdorff distance by isometries: If (X , dX ) and (Y, dY ) are
two compact metric spaces, define
Representation by semi-distances
x R x′ ⇐⇒ d(x, x′ ) = 0.
Z = (X ⊔ Y)/d := (X ⊔ Y)/R
there is a triple of metric spaces (Xe1 , Xe2 , Xe3 ), all subspaces of a common
metric space (Z, dZ ), such that (Xe1 , Xe2 ) is isometric (as a coupling) to
(X1′ , X2′ ), and (Xe2 , Xe3 ) is isometric to (X2′′ , X3′′ ).
Z = (X1 ⊔ X2 ⊔ X3 )/d,
∀y ∈ Y ∃x ∈ X; d(f (x), y) ≤ ε.
In particular, dH (f (X ), Y) ≤ ε.
then dis (R) ≤ 3ε, and the left inequality in (27.3) follows by for-
mula (27.2). Conversely, if R is a relation with distortion η, then for
any ε > η one can define an ε-isometry f whose graph is included in
R: The idea is to define f (x) = y, where y is such that (x, y) ∈ R. (See
the comments at the end of the bibliographical notes.)
The symmetry between X and Y seems to have been lost in (27.3),
but this is not serious, because any approximate isometry admits an
approximate inverse: If f is an ε-isometry X → Y, then there is a
(4ε)-isometry f ′ : Y → X such that for all x ∈ X , y ∈ Y,
d_X\big( f'\circ f(x), x \big) \le 3\varepsilon, \qquad d_Y\big( f\circ f'(y), y \big) \le \varepsilon. \qquad (27.4)
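As an illustration of the notion of distortion (my own sketch, not from the text): for finite metric spaces described by distance matrices, the distortion of a correspondence R is a maximum over pairs of related points, and by (27.2) half of it bounds the Gromov–Hausdorff distance from above.

    import numpy as np

    def distortion(DX, DY, R):
        # DX, DY: distance matrices of two finite metric spaces.
        # R: list of index pairs (i, j) putting x_i and y_j in correspondence;
        #    every i and every j is assumed to appear at least once.
        return max(abs(DX[i, k] - DY[j, l]) for (i, j) in R for (k, l) in R)

    # Three points on a line vs. the same configuration stretched by 10%.
    DX = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
    DY = 1.1 * DX
    R = [(0, 0), (1, 1), (2, 2)]       # the "identity" correspondence
    print(distortion(DX, DY, R) / 2)   # 0.1, an upper bound on d_GH of the two spaces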
Fig. 27.2. A very thin tire (2-dimensional manifold) is very close, in Gromov–
Hausdorff sense, to a circle (1-dimensional manifold)
Remark 27.7. Keeping Remark 27.3 in mind, two spaces are close in
Gromov–Hausdorff topology if they look the same to a short-sighted
person (see Figure 27.2). I learnt from Lott the expression Mr. Magoo
topology to convey this idea.
Fig. 27.3. A balloon with a very small handle (not simply connected) is very close
to a balloon without handle (simply connected).
Noncompact spaces
If one works in length spaces, as will be the case in the rest of this
course, the above definition does not seem so good because K (ℓ) will in
general not be a strictly intrinsic (i.e. geodesic) length space: Geodesics
joining elements of K (ℓ) might very well leave K (ℓ) at some intermediate
time; so properties involving geodesics might not pass to the limit. This
is the reason for requirement (iii) in the following definition.
dis (Rk ) −→ 0;
Many theorems about metric spaces still hold true, after appropriate
modification, for converging sequences of metric spaces. Such is the
Remark 27.23. In the previous proposition, the fact that the maps
fk are approximate isometries is useful only in the noncompact case;
otherwise it just boils down to the compactness of P (X ) when X is
compact.
∀k ∈ N, νk [BR] (⋆k )] ≤ M.
Proof of Proposition 27.24. For any fixed R > 0, (fk )# νk [BR] (⋆)] =
νk [(fk )−1 (BR] (⋆))] ≤ νk [BR+εk ] (⋆k )] is uniformly bounded by M (R+ 1)
for k large enough. Since on the other hand BR] (⋆) is compact, we may
extract a subsequence in k such that (fk )# νk [BR] (⋆)] converges to some
finite measure νR in the weak-∗ topology of BR] (⋆). Then the result
follows by taking R = ℓ → ∞ and applying a diagonal extraction. ⊓ ⊔
(The metric structure of X and Y has not disappeared since the infi-
mum is only over isometries.)
Both dGHP and dGP satisfy the triangle inequality, as can be checked
by a gluing argument again. (Now one should use both the metric and
the probabilistic gluing!) Then there is no difficulty in checking that
dGHP is an honest distance on classes of metric-measure isomorphisms,
with point of view (a). Similarly, dGP is a distance on classes of metric-
measure isomorphisms, with point of view (b), but now it is quite non-
trivial to check that [dGP (X , Y) = 0] =⇒ [X = Y]. I shall not insist on
this issue, for in the sequel I shall focus on point of view (a).
There are several variants of these constructions:
1. Use other distances on probability metrics. Essentially everybody
agrees on the Hausdorff distance to measure distances between sets, but
as we know, there are many natural choices of distances between prob-
ability measures. In particular, one can replace the Prokhorov distance
by the Wasserstein distance of order p, and thus obtain the Gromov–
Hausdorff–Wasserstein distance of order p:
d_{GHW_p}(X, Y) = \inf\big\{ d_H(X', Y') + W_p(\nu_{X'}, \nu_{Y'}) \big\},
The discussion of the previous section showed that one should be cau-
tious about which notion of convergence is used. However, whenever
they are available, doubling estimates, in the sense of Definition 18.1,
basically rule out the discrepancy between approaches (a) and (b)
above. The idea is that doubling prevents the formation of sharp spikes
as in Figure 27.4. This discussion is not so clearly made in the literature
that I know, so in this section I shall provide more careful proofs.
where 1
ΦR,D (δ) = max 8 δ, R (16 δ) log2 D + δ
Now let x ∈ X . There is at least one index j such that d(x, xj ) < 2r;
otherwise x would lie in the complement of the union of all the balls
B2r (xj ), and could be added to the family {xj }. So {xj } constitutes
a 2r-net in X , with cardinality at most N = D n . This concludes the
proof. ⊓
⊔
After all this discussion I can state a precise definition of the notion of
convergence that will be used in the sequel for metric-measure spaces:
this is the measured Gromov–Hausdorff topology. It is associated
with the convergence of spaces as metric spaces and as measure spaces.
This concept can be defined quantitatively in terms of, e.g., the distance
dGHP and its variants, but I shall be content with a purely topological
(qualitative) definition. As in the case of the plain Gromov–Hausdorff
topology, there is a convenient reformulation in terms of approximate
isometries.
(fk )# νk −−−→ ν
k→∞
(fk )# νk −−−→ ν,
k→∞
Proof of Theorem 27.32. Part (i) follows from the combination of Propo-
sition 27.29, Theorem 27.10 and Proposition 27.22. Part (ii) follows in
addition from the definition of pointed measured Gromov–Hausdorff
convergence and Proposition 27.24. Note that in (ii), the doubling as-
sumption is used to prevent the formation of “spikes”, but also to ensure
uniform bounds on the mass of balls of radius R for any R, once it is
known for some R. (Here I chose R = 1, but of course any other choice
would have done.) ⊓
⊔
Bibliographical notes
Remarks 9.1.40 to 9.1.42 in [174] to avoid traps (or see [626]). For
Alexandrov spaces with curvature bounded below, both notions coin-
cide, see [174, Section 10.9] and the references therein.
A classical source for the measured Gromov–Hausdorff topology is
Gromov’s book [438]. The point of view mainly used there consists in
forgetting the Gromov–Hausdorff distance and “using the measure to
kill infinity”; so the distances that are found there would be of the sort
of dGWp or dGP. An interesting example is Gromov’s “box” metric ⊡_1, defined as follows [438, pp. 116–117]. If d and d′ are two metrics on a given probability space X, define ⊡_1(d, d′) as the infimum of ε > 0 such that |d − d′| ≤ ε outside of a set of measure at most ε in X × X (the subscript 1 means that the measure of the discarded set is at most 1 × ε). Take the particular metric space I = [0, 1], equipped with the usual distance and with the Lebesgue measure λ, as reference space. If (X, d, ν) and (X′, d′, ν′) are two Polish probability spaces, define ⊡_1(X, X′) as the infimum of ⊡_1(d ◦ φ, d′ ◦ φ′) where φ (resp. φ′) varies in the set of all measure-preserving maps φ : I → X (resp. φ′ : [0, 1] → X′).
Sturm made a detailed study of dGW2 (denoted by D in [762]) and
advocated it as a natural distance between classes of equivalence of
probability spaces in the context of optimal transport. He proved that
D satisfies the length property, and compared it with Gromov’s box
distance as follows:
D(X, Y) \le \big( \max\{ \operatorname{diam}(X),\, \operatorname{diam}(Y) \} + 4 \big)\, ⊡_1(X, Y)^{1/2};
D(X, Y) \ge (1/2)^{3/2}\, ⊡_1(X, Y)^{3/2}.
The alternative point of view in which one takes care of both the
metric and the measure was introduced by Fukaya [386]. This is the
one which was used by Lott and myself in our study of displacement
convexity in geodesic spaces [577].
The pointed Gromov–Hausdorff topology is presented for instance
in [174]; it has become very popular as a way to study tangent spaces
in the absence of smoothness. In the context of optimal transport, the
pointed Gromov–Hausdorff topology was used independently in [30,
Section 12.4] and in [577, Appendix A].
I introduced the definition of local Gromov–Hausdorff topology for
the purpose of these notes; it looks to me quite natural if one wants to
Most of the objects that were introduced and studied in the context of
optimal transport on Riemannian manifolds still make sense on a gen-
eral metric-measure length space (X , d, ν), satisfying certain regularity
assumptions. I shall assume here that (X , d) is a locally compact,
complete separable geodesic space equipped with a σ-finite refer-
ence Borel measure ν. From general properties of such spaces, plus the
results in Chapters 6 and 7:
• the cost function c(x, y) = d(x, y)2 is associated with the coercive
Lagrangian action A(γ) = L(γ)2 , and minimizers are constant-
speed, minimizing geodesics, the collection of which is denoted by
Γ (X );
and |v|(t, x) really is |γ̇ x,t |, that is, the speed at time t and position x.
Thus Definition 28.1 is consistent with the usual notions of kinetic
energy and speed field (speed = norm of the velocity).
and |v| = |v|(t, x) the associated speed field. Then, for each t ∈ (0, 1)
one can modify |v|(t, · ) on a µt -negligible set in such a way that for all
x, y ∈ X ,
\big|\, |v|(t,x) - |v|(t,y) \,\big| \le \frac{C\,\sqrt{\operatorname{diam}(X)}}{\sqrt{t(1-t)}}\, \sqrt{d(x,y)}, \qquad (28.2)
Proof of Theorem 28.5. Let t be a fixed time in (0, 1). Let γ1 and γ2
be two minimizing geodesics in the support of Π, and let x = γ1 (t),
y = γ2 (t). Then by Theorem 8.22,
\big|\, L(\gamma_1) - L(\gamma_2) \,\big| \le \frac{C\,\sqrt{\operatorname{diam}(X)}}{\sqrt{t(1-t)}}\, \sqrt{d(x,y)}. \qquad (28.3)
Let Xt be the union of all γ(t), for γ in the support of Π. For a given
x ∈ Xt , there might be several geodesics γ passing through x, but
(as a special case of (28.3)) they will all have the same length; define
|v|(t, x) to be that length. This is a measurable function, since it can
be rewritten
|v|(t,x) = \int_\Gamma L(\gamma)\, \Pi\big( d\gamma \,\big|\, \gamma(t) = x \big),
where Π(dγ|γ(t) = x) is of course the disintegration of Π with respect
to µt = law (γt ). Then |v|(t, · ) is an admissible density for εt , and as a
consequence of (28.3) it satisfies (28.2) for all x, y ∈ Xt .
To extend |v|(t, x) to the whole of X, I shall adapt the proof of a well-known extension theorem for Lipschitz functions. Let H := C\,\sqrt{\operatorname{diam}(X)/(t(1-t))}, so that |v| is Hölder-1/2 on X_t with constant H. Define, for x ∈ X,
w(x) := \inf_{y\in X_t}\Big[ H\,\sqrt{d(x,y)} + |v|(t,y) \Big].
w(x) - w(x') = \inf_{y\in X_t}\Big[ H\sqrt{d(x,y)} + |v|(t,y) \Big] - \inf_{y'\in X_t}\Big[ H\sqrt{d(x',y')} + |v|(t,y') \Big]
= \sup_{y'\in X_t}\ \inf_{y\in X_t}\ \Big[ H\sqrt{d(x,y)} - H\sqrt{d(x',y')} + |v|(t,y) - |v|(t,y') \Big]
\le H\, \sup_{y'\in X_t}\ \inf_{y\in X_t}\ \Big[ \sqrt{d(x,y)} - \sqrt{d(x',y')} + \sqrt{d(y,y')} \Big]
\le H\, \sup_{y'\in X_t}\ \Big[ \sqrt{d(x,y')} - \sqrt{d(x',y')} \Big]. \qquad (28.4)
But
\sqrt{d(x,y')} \le \sqrt{d(x,x') + d(x',y')} \le \sqrt{d(x,x')} + \sqrt{d(x',y')};
so (28.4) is bounded above by H\,\sqrt{d(x,x')}.
To summarize: w coincides with |v|(t, · ) on Xt , and it satisfies the
same Hölder-1/2 estimate. Since µt is concentrated on Xt , w is also
an admissible density for εt , so we can take it as the new definition of
|v|(t, · ), and then (28.2) holds true on the whole of X. ⊓⊔
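The extension recipe above is easy to test numerically; here is a small Python sketch of the same formula (mine, with made-up sample data), extending a function f that is 1/2-Hölder with constant H on a finite set of points.

    import numpy as np

    def holder_extension(x, sample_pts, sample_vals, H):
        # w(x) = min over sample points y of [ H * sqrt(d(x, y)) + f(y) ];
        # this coincides with f on the samples and keeps the 1/2-Holder constant H.
        d = np.abs(sample_pts - x)       # distances on the real line, for simplicity
        return np.min(H * np.sqrt(d) + sample_vals)

    pts  = np.array([0.0, 1.0, 4.0])
    vals = np.sqrt(pts)                  # f = sqrt is 1/2-Holder with constant H = 1
    print(holder_extension(2.0, pts, vals, H=1.0))   # ~ 1.414 = sqrt(2)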
The main goal of this section is the proof of the convergence of the
Wasserstein space P2 (X ), as expressed in the next statement.
Theorem 28.6 (If Xk converges then P2 (Xk ) also). Let (Xk )k∈N
and X be compact metric spaces such that
X_k \xrightarrow{\ GH\ } X.
Then
P_2(X_k) \xrightarrow{\ GH\ } P_2(X).
Moreover, if fk : Xk → X are approximate isometries, then the maps
(fk )# : P2 (Xk ) → P2 (X ), defined by (fk )# (µ) = (fk )# µ, are approxi-
mate isometries too.
The identity
d_2\big(f(x_1), f(y_1)\big)^2 - d_1(x_1, y_1)^2 = \Big[ d_2\big(f(x_1), f(y_1)\big) - d_1(x_1, y_1) \Big]\, \Big[ d_2\big(f(x_1), f(y_1)\big) + d_1(x_1, y_1) \Big]
implies
\Big| d_2\big(f(x_1), f(y_1)\big)^2 - d_1(x_1, y_1)^2 \Big| \le \varepsilon\, \big( \operatorname{diam}(X_1) + \operatorname{diam}(X_2) \big),
hence
W_2\big( f_\#\mu_1, f_\#\mu_1' \big) \le W_2(\mu_1, \mu_1') + \sqrt{ \varepsilon\, \big( \operatorname{diam}(X_1) + \operatorname{diam}(X_2) \big) }. \qquad (28.8)
(iv) lim (fk )# εk,t = εt , in the weak topology of measures, for each
k→∞
t ∈ (0, 1).
Assume further that each Πk is an optimal dynamical transference
plan, for the square distance cost function. Then:
(v) For each t ∈ (0, 1), there is a choice of the speed fields |vk |
associated with the plans Πk , such that lim |vk | ◦ fk′ = |v|, in the
k→∞
uniform topology;
(vi) The limit Π is an optimal dynamical transference plan, so π is
an optimal transference plan and (µt )0≤t≤1 is a displacement interpo-
lation.
Proof of Theorem 28.9. The proof is quite technical, so the reader might
skip it at first reading and go directly to the last section of this chapter.
In a first step, I shall establish the compactness of the relevant objects,
and in a second step pass to the limit.
It will be convenient to regularize rough paths with the help of some
continuous mollifiers. For δ ∈ (0, 1/2), define
\varphi^\delta(s) = \frac{\delta+s}{\delta^2}\, 1_{-\delta\le s<0} + \frac{\delta-s}{\delta^2}\, 1_{0\le s\le\delta} \qquad (28.14)
and
ϕδ+ (s) = ϕδ (s − δ), ϕδ− (s) = ϕδ (s + δ). (28.15)
Then supp ϕδ+ ⊂ [0, 2δ] and supp ϕδ− ⊂ [−2δ, 0]. These functions have
a graph that looks like a sharp “tent hat”; their integral (on the real
line) is equal to 1, and as δ → 0 they converge in the weak topology to
the Dirac mass δ0 at the origin.
Since all of the lengths d(γk (0), γk (1)) are uniformly bounded, we
conclude that there is a constant C such that for all t0 and s0 as above,
\Big| L^\delta_{t_0\to s_0}(f_k\circ\gamma_k) - |s_0 - t_0|\, L^\delta_{0\to 1}(f_k\circ\gamma_k) \Big| \le C(\delta + \varepsilon_k); \qquad L^\delta_{0\to 1}(f_k\circ\gamma_k) \le C.
In particular,
L^\delta_{t_0\to s_0}(\sigma) \le C\,\big( |s_0 - t_0| + \delta \big). \qquad (28.18)
In Lemma 28.11 below it will be shown that, as a consequence
of (28.18), σ can be written as (Id , γ)# λ for some Lipschitz-continuous
curve γ : [0, 1] → X . Once that is known, the end of the proof of (a) is
straightforward: Since
L^\delta_{t_0\to s_0}(\sigma) = d\big( \gamma(t_0), \gamma(s_0) \big) + O(\delta),
where
F^\delta(t,s) = \int_0^1\!\!\int_0^1 \beta(t_0, s_0)\, \varphi^\delta_+(t - t_0)\, \varphi^\delta_-(s - s_0)\, dt_0\, ds_0, \qquad \Lambda(t,s) = \int_{X\times X} d(x,y)\, d\nu_t(x)\, d\nu_s(y).
Now plug this back into (28.26) and let δ → 0 to conclude that
\int_{X\times[0,1]} \int_{X\times[0,1]} \beta(t,s)\, d(x,y)\, d\nu_t(x)\, dt\, d\nu_s(y)\, ds \le C \int_0^1\!\!\int_0^1 \beta(t,s)\, |s-t|\, dt\, ds.
Next, let (ψk )k∈N be a countable dense subset of C(X ). For any k,
\int_X \psi_k\, d\nu_t^\varepsilon = \frac{1}{2\varepsilon} \int_{t-\varepsilon}^{t+\varepsilon} \int_X \psi_k(x)\, d\nu_\tau(x)\, d\tau. \qquad (28.31)
Noncompact spaces
Bibliographical notes
Theorem 28.6 is taken from [577, Section 4], while Theorem 28.13 is
an adaptation of [577, Appendix E]. Theorem 28.9 is new. (A part of
this theorem was included in a preliminary version of [578], and later
removed from that reference.)
The discussion about push-forwarding dynamical transference plans
is somewhat subtle. The point of view adopted in this chapter is the
following: when an approximate isometry f is given between two spaces,
use it to push-forward a dynamical transference plan Π, via (f ◦)# Π.
The advantage is that this is the same map that will push-forward the
measure and the dynamical plan. The drawback is that the resulting
object (f ◦)# Π is not a dynamical transference plan, in fact it may
not even be supported on continuous paths. This leads to the kind of
technical sport that we’ve encountered in this chapter, embedding into
probability measures on probability measures and so on.
Another option would be as follows: Given two spaces X and Y, with
an approximate isometry f : X → Y, and a dynamical transference plan
Π on Γ (X ), define a true dynamical transference plan on Γ (Y), which
is a good approximation of (f ◦)# Π. The point is to construct a recipe
which to any geodesic γ in X associates a geodesic S(γ) in Y that is
“close enough” to f ◦ γ. This strategy was successfully implemented
in the final version of [578, Appendix]; it is much simpler, and still
quite sufficient for some purposes. The example treated in [578] is the
stability of the “democratic condition” considered by Lott and myself;
but certainly this simplified version will work for many other stability
issues. On the other hand, I don’t know if it is enough to treat such
topics as the stability of general weak Ricci bounds, which will be
considered in the next chapter.
The study of the kinetic energy measure and the speed field occurred
to me during a parental meeting of the Crèche Le Rêve en Couleurs
and
\int U(\rho_t)\, d\nu \le (1-t) \int_{M\times M} U\!\left( \frac{\rho_0(x_0)}{\beta^{(K,N)}_{1-t}(x_0,x_1)} \right) \beta^{(K,N)}_{1-t}(x_0,x_1)\, \pi(dx_1|x_0)\, \nu(dx_0)
\;+\; t \int_{M\times M} U\!\left( \frac{\rho_1(x_1)}{\beta^{(K,N)}_{t}(x_0,x_1)} \right) \beta^{(K,N)}_{t}(x_0,x_1)\, \pi(dx_0|x_1)\, \nu(dx_1). \qquad (29.3)
Here G(s, t) is the one-dimensional Green function of (16.6), K_{N,U} is defined by (17.10), and the distortion coefficients β_t = β_t^{(K,N)} are those appearing in (14.61).
Which of these formulas should we choose for the extension to non-
smooth spaces? When K = 0, both inequalities reduce to just
\int U(\rho_t)\, d\nu \le (1-t) \int U(\rho_0)\, d\nu + t \int U(\rho_1)\, d\nu. \qquad (29.4)
U^\beta_{\pi,\nu}(\mu) = \int_{X\times X} U\!\left( \frac{1}{\beta(x,y)}\, \frac{d\mu}{d\nu}(x) \right) \beta(x,y)\, \pi(dy|x)\, \nu(dx),
when µ is not absolutely continuous with respect to ν. It would be a
mistake to keep the same definition and replace dµ/dν by the density
of the absolutely continuous part of µ with respect to ν. In fact there is only one natural extension of the functionals U_ν and U^β_{π,ν}; before
giving it explicitly, I shall try to motivate it. Think of the singular part
of µ as something which “always has infinite density”. Assume that
the respective contributions of finite and infinite values of the density
decouple, so that one would define separately the contributions of the
absolutely continuous part µac and of the singular part µs . Only the
asymptotic behavior of U(r) as r → ∞ should count when one defines the contribution of µ_s. Finally, if U(r) were increasing like c\,r, it is natural to assume that U_ν(µ_s) should be \int_X c\, dµ_s = c\, µ_s[X]. So it is the asymptotic slope of U that should matter. Since U is convex, there is a natural notion of asymptotic slope of U:
U'(\infty) := \lim_{r\to\infty} \frac{U(r)}{r} = \lim_{r\to\infty} U'(r) \in \mathbb{R}\cup\{+\infty\}. \qquad (29.5)
µ = ρ ν + µs
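Two quick examples, consistent with this definition (the computations are mine): for U(r) = r log r, U'(∞) = lim_{r→∞}(log r + 1) = +∞, so any singular part makes the extended functional equal to +∞; for U(r) = −r^{1−1/N}, U'(∞) = lim_{r→∞}(−r^{−1/N}) = 0, so the singular part of µ contributes nothing.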
Remark 29.3. Remark 17.27 applies here too: I shall often identify π
with the probability measure π(dx dy) = µ(dx) π(dy|x) on X × X .
For later use I record here two elementary lemmas about the functionals U^β_{π,ν}. The reader may skip them at first reading and go directly to the next section.
First, there is a handy way to rewrite U^β_{π,ν}(µ) when µ_s = 0:
where v(r) = U (r)/r, with the conventions U (0)/0 = U ′ (0) ∈ [−∞, +∞),
U (∞)/∞ = U ′ (∞) ∈ (−∞, +∞], and ρ = 0 on Spt µs .
Proof of Lemma 29.6. The identity is formally obvious if one notes that
ρ(x) π(dy|x) ν(dx) = π(dy|x) µ(dx) = π(dy dx); so all the subtlety lies
in the fact that in (29.7) the convention is U (0)/0 = 0, while in (29.8)
it is U (0)/0 = U ′ (0). Switching between both conventions is allowed
because the set {ρ = 0} is anyway of zero π-measure. ⊓
⊔
Secondly, the functionals U^β_{π,ν} (and the functionals U_ν) satisfy a principle of “rescaled subadditivity”, which might at first sight seem contradictory with the convexity property, but is not at all.
U^\beta_{\sum_j Z_j\,\pi_j,\;\nu}\Big( \sum_j Z_j\, \mu_j \Big) \ge \sum_j Z_j\, (U_{Z_j})^\beta_{\pi_j,\nu}(\mu_j),
with equality if the measures µ_j are singular with respect to each other.
• U is Lipschitz, if N < ∞;
• U is locally Lipschitz and U (r) = a r log r + b r for r large enough,
if N = ∞ (a ≥ 0, b ∈ R).
ρ0 (x0 ) (K,N )
(Uℓ )′ (1) (K,N )
β1−t (x0 , x1 ) ≤ U ′ (1) ρ0 (x0 ).
β1−t (x0 , x1 )
In both cases, the upper bound belongs to L1 π(dx0 |x1 ) ν(dx1 ) , which
proves the claim.
Step 3: Behavior at infinity. Finally we consider the case of a
general U ∈ DCN , and approximate it at infinity so that it has the
desired behavior. The reasoning is pretty much the same as for Step 2.
Let us assume for instance N < ∞. By Proposition 17.7(iv), there is
a nondecreasing sequence (Uℓ )ℓ∈N , converging pointwise to U , with the
desired behavior at infinity, and Uℓ′ (∞) → U ′ (∞). By Step 2,
(U_\ell)_\nu(\mu_t) \le (1-t)\, (U_\ell)^{\beta^{(K,N)}_{1-t}}_{\pi,\nu}(\mu_0) + t\, (U_\ell)^{\beta^{(K,N)}_{t}}_{\check\pi,\nu}(\mu_1),
Then we know that Uℓ′ (∞) → U ′ (∞), so we may pass to the limit
in the second term of (29.15). To pass to the limit in the first term
by monotone convergence, it suffices to check that U_ℓ(ρ_t) is bounded below, uniformly in ℓ, by a ν-integrable function. But this is true since, for instance, U_ℓ ≥ U_0, which is bounded below by an affine function of the form r → −C(r+1), C ≥ 0; so U_ℓ(ρ_t) ≥ −Cρ_t − C\, 1_{ρ_t>0}, and the latter function is integrable since ρ_t has compact support. ⊓⊔
then the first term is always displacement convex, and the second is
displacement convex if V is convex (simple exercise).
On the other hand, \int \rho_\epsilon(x-v)\, \log\rho_\epsilon(x-v)\, dx is independent of v ∈ R^n; so
H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - (1-t)x_0 - t x_1)\, dx \big) > (1-t)\, H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - x_0)\, dx \big) + t\, H_{e^{-V}dx}\big( \rho_\epsilon(\,\cdot\, - x_1)\, dx \big).
Since the path (ρǫ (x − (1 − s)x0 − sx1 ) dx)0≤s≤1 is a geodesic interpola-
tion (this is the translation at uniform speed, corresponding to ∇ψ =
constant), we see that (Rn , d2 , e−V (x) dx) cannot be a weak CD(0, ∞)
space. So the conclusion of Example 29.13 can be refined as follows:
(Rn , d2 , e−V (x) dx) is a weak CD(0, ∞) space if and only if V is convex.
Example 29.15. Let M be a smooth compact n-dimensional Rieman-
nian manifold with nonnegative Ricci curvature, and let G be a com-
pact Lie group acting isometrically on M . (See the bibliographical
notes for references on these notions.) Then let X = M/G and let
q : M → X be the quotient map. Equip X with the quotient distance
d(x, y) = inf{dM (x′ , y ′ ); q(x′ ) = x, q(y ′ ) = y}, and with the measure
ν = q# volM . The resulting space (X , d, ν) is a weak CD(0, n) space,
that in general will not be a manifold. (There will typically be singu-
larities at fixed points of the group action.)
Example 29.16. It will be shown in the concluding chapter that
(Rn , k·k, λn ) is a weak CD(K, N ) space, where k · k is any norm on Rn ,
and λn is the n-dimensional Lebesgue measure. This example proves
that a weak CD(K, N ) space may be “strongly” branching (recall the
discussion in Example 27.17).
Example 29.17. Let X = \prod_{i=1}^\infty T_i, where T_i = R/(ε_i Z) is equipped with the usual distance d_i and the normalized Lebesgue measure λ_i, and ε_i = 2\,\operatorname{diam}(T_i) is some positive number. If \sum ε_i^2 < +\infty then the product distance d = \sqrt{\sum d_i^2} turns X into a compact metric space. Equip X with the product measure ν = \prod λ_i; then (X, d, ν) is a weak CD(0, ∞) space. (Indeed, it is the measured Gromov–Hausdorff limit of X_k = \prod_{j=1}^k T_j, which is CD(0, k), hence CD(0, ∞); and it will be shown in Theorem 29.24 that the CD(0, ∞) property is stable under measured Gromov–Hausdorff limits.)
Continuity properties of the functionals U_ν and U^β_{π,ν}
(i) U_\nu(\mu) = \sup\Big\{ \int_X \varphi\, d\mu - \int_X U^*(\varphi)\, d\nu;\ \varphi \in L^\infty(X);\ \varphi \le U'(\infty) \Big\}
(ii) U_\nu(\mu) = \sup\Big\{ \int_X \varphi\, d\mu - \int_X U^*(\varphi)\, d\nu;\ \varphi \in C(X),\ U'\Big(\frac{1}{M}\Big) \le \varphi \le U'(M);\ M \in \mathbb{N} \Big\}.
The deceptive simplicity of these formulas hides some subtleties: For
instance, it is in general impossible to drop the restriction ϕ ≤ U ′ (∞)
in (i), so the supremum is not taken over the whole vector space L∞ (X )
but only on a subspace thereof. Proposition 29.19 can be proven by
elementary tools of measure theory; see the bibliographical notes for
references and comments.
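As an illustration (the computation is mine, not taken from the text): for U(r) = r log r one finds U^*(p) = \sup_{r\ge 0}(pr - r\log r) = e^{p-1} and U'(\infty) = +\infty, so formula (i) becomes
H_\nu(\mu) = \sup_{\varphi\in L^\infty(X)} \Big\{ \int_X \varphi\, d\mu - \int_X e^{\varphi-1}\, d\nu \Big\},
a variational representation of the relative entropy.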
The next statement gathers the three important properties on which
the main results of this chapter rest: (i) Uν (µ) is lower semicontinuous
in (µ, ν); (ii) Uν (µ) is never increased by push-forward; (iii) µ can be
β
regularized in such a way that Uπ,ν (µ) is upper semicontinuous in (π, µ)
along the approximation. In the next statement, M+ (X ) will stand for
the set of finite (nonnegative) Borel measures on X , and L1+ (ν) for the
set of nonnegative ν-integrable measurable functions on X .
Exercise 29.22. Use Property (ii) in the case U (r) = r log r to recover
the Csiszár–Kullback–Pinsker inequality (22.25). Hint: Take f to be
valued in {0, 1}, apply Csiszár’s two-point inequality
\forall x, y \in [0,1], \qquad x\,\log\frac{x}{y} + (1-x)\,\log\frac{1-x}{1-y} \ge 2\,(x-y)^2
To summarize: Uν (µε ) ≤ Uν (µ) for all ε > 0, and then the conclusion
follows.
If µ is not absolutely continuous, define
\rho_{a,\varepsilon}(x) = \int_{\mathrm{Spt}\,\nu} K_\varepsilon(x,y)\, \rho(y)\, \nu(dy); \qquad \rho_{s,\varepsilon}(x) = \int_{\mathrm{Spt}\,\nu} K_\varepsilon(x,y)\, \mu_s(dy),
whence the conclusion. Finally, if U does not have a constant sign, this
means that there is r0 > 0 such that U (r) ≤ 0 for r ≤ r0 , and U (r) > 0
for r > r0 . Then:
• if r < B −1 r0 , then U+ (ar) = 0 for all a ≤ B and (29.19) is obviously
true;
• if r > B r0 , then U (ar) > 0 for all a ∈ [B −1 , B] and one estab-
lishes (29.20) as before;
• if B −1 r0 ≤ r ≤ B r0 , then U+ (ar) is bounded above for a ≤ B,
while r is bounded below, so obviously (29.19) is satisfied for some
well-chosen constant C.
Next, we may dismiss the case when the right-hand side of the in-
equality in (29.17) is +∞ as trivial; so we might assume that
\int \beta(x,y)\, U_+\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx) < +\infty; \qquad U'(\infty)\, \mu_s[X] < +\infty. \qquad (29.21)
Step 2: Reduction to the case U ′ (0) > −∞. The problem now
is to get rid of possibly very large negative values of U ′ close to 0. For
any δ > 0, define Uδ′ (r) := max (U ′ (δ), U ′ (r)) and
U_\delta(r) = \int_0^r U_\delta'(s)\, ds.
Since U′ is nondecreasing, Uδ′ converges monotonically to U ′ ∈ L1 (0, r)
as δ → 0. It follows that Uδ (r) decreases to U (r), for all r > 0. Let us
check that all the assumptions which we imposed on U , still hold true
for Uδ . First, Uδ (0) = 0. Also, since Uδ′ is nondecreasing, Uδ is convex.
Finally, Uδ has polynomial growth; indeed:
• if r ≤ δ, then r (Uδ )′ (r) = r U ′ (δ) is bounded above by a constant
multiple of r;
• if r > δ, then r (Uδ )′ (r) = r U ′ (r), which is bounded (by assumption)
by C(U (r)+ + r), and this obviously is bounded by C(Uδ (r)+ + r),
for U ≤ Uδ .
The next claim is that the integral
\int \beta(x,y)\, U_\delta\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx)
makes sense and is not +∞. Indeed, as we have just seen, there is a
constant C such that (Uδ )+ ≤ C(U+ (r) + r). Then the contribution of
the linear part Cr is finite, since
\int \beta(x,y)\, \frac{\rho(x)}{\beta(x,y)}\, \pi(dy|x)\, \nu(dx) = \int \rho(x)\, \nu(dx) \le 1;
and the contribution of C U+ (r) is also finite in view of (29.21). So
\int \beta(x,y)\, (U_\delta)_+\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx) < +\infty,
which proves the claim.
Now assume that Theorem 29.20(iii) has been proved with Uδ in
place of U . Then, for any δ > 0,
\limsup_{\varepsilon\downarrow 0} \int \beta(x,y)\, U\!\left( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \right) \pi_\varepsilon(dy|x)\, \nu(dx)
\le \limsup_{\varepsilon\downarrow 0} \int \beta(x,y)\, U_\delta\!\left( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \right) \pi_\varepsilon(dy|x)\, \nu(dx)
\le \int \beta(x,y)\, U_\delta\!\left( \frac{\rho(x)}{\beta(x,y)} \right) \pi(dy|x)\, \nu(dx).
and for the right-hand side the computation is similar, once one has
noticed that the first marginal of π is the weak limit of the first marginal
µε of πε , i.e. µ (as recalled in the First Appendix).
So in the sequel I shall assume that U ≥ 0.
Step 4: Treatment of the singular part. To take care of the
singular part, the reasoning is similar to the one already used in the
particular case β = 1: Write µ = ρ ν + µs , and
\rho_{a,\varepsilon}(x) = \int_X K_\varepsilon(x,y)\, \rho(y)\, \nu(dy); \qquad \rho_{s,\varepsilon}(x) = \int_X K_\varepsilon(x,y)\, \mu_s(dy).
U^\beta_{\pi,\nu}(\mu_\varepsilon) \le U^\beta_{\pi,\nu}(\rho_{a,\varepsilon}\,\nu) + U'(\infty)\, \mu_s[X].
In the next two steps I shall focus on the first term U^\beta_{\pi,\nu}(\rho_{a,\varepsilon}\,\nu); I shall write ρ_ε for ρ_{a,ε}.
Step 5: Approximation of β. For any two points x, y in X, define
\beta_\varepsilon(x,y) = \int_{X\times X} K_\varepsilon(x,\overline{x})\, K_\varepsilon(y,\overline{y})\, \beta(\overline{x},\overline{y})\, \nu(d\overline{x})\, \nu(d\overline{y}).
Now assume that β and \tilde\beta are bounded from above and below by positive constants, say B^{-1} \le \beta, \tilde\beta \le B; then by (29.19) there is C > 0, depending only on B, such that
\Big| \tilde\beta\, U\Big( \frac{\rho}{\tilde\beta} \Big) - \beta\, U\Big( \frac{\rho}{\beta} \Big) \Big| \le C\, |\tilde\beta - \beta|\, \big( U(\rho) + \rho \big).
Then
\bigg| \int \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) - \int \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) \bigg|
\le \int \Big| \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big) - \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big) \Big|\, \pi_\varepsilon(dy|x)\, \nu(dx)
\le C \int \big| \beta_\varepsilon(x,y) - \beta(x,y) \big|\, \big( U(\rho_\varepsilon(x)) + \rho_\varepsilon(x) \big)\, \pi_\varepsilon(dy|x)\, \nu(dx)
\le C \Big( \sup_{x,y\in X} \big| \beta_\varepsilon(x,y) - \beta(x,y) \big| \Big) \int \big( U(\rho_\varepsilon(x)) + \rho_\varepsilon(x) \big)\, \nu(dx)
\le C \Big( \sup_{x,y\in X} \big| \beta_\varepsilon(x,y) - \beta(x,y) \big| \Big) \int \big( U(\rho) + \rho \big)\, d\nu,
where the last inequality follows from Jensen's inequality as in the proof of the particular case β ≡ 1. To summarize:
\limsup_{\varepsilon\downarrow 0} \bigg| \int \beta_\varepsilon(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta_\varepsilon(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) - \int \beta(x,y)\, U\Big( \frac{\rho_\varepsilon(x)}{\beta(x,y)} \Big)\, \pi_\varepsilon(dy|x)\, \nu(dx) \bigg| = 0.
In particular,
Z
eε (dx dy) = U ′ (∞) µs (dx).
g(x, y) π
Spt µs
In particular,
\int_X \sup_y\, g(x,y)\, d\mu(x) = \int_X \sup_y\, \beta(x,y)\, U\!\Big( \frac{\rho(x)}{\beta(x,y)} \Big)\, d\nu(x) < +\infty;
in other words, g belongs to the vector space L^1\big( (X,\mu);\, C(X) \big) of µ-integrable functions valued in the normed space C(X) (equipped with the norm of uniform convergence).
• If U'(\infty) < +\infty then g(x, \cdot) is also continuous for all x (it is identically equal to U'(\infty) if x ∈ Spt µ_s), and \sup_y g(x,y) \le U'(\infty) is obviously µ(dx)-integrable; so the conclusion g ∈ L^1\big( (X,\mu);\, C(X) \big) still holds true.
In the sequel I shall drop the index k and write just Ψ for Ψk . It will
be useful in the sequel to know that Ψ (x, y) = U ′ (∞) when x ∈ Spt µs ;
apart from that, Ψ might be any continuous function.
(a)
Step 8: Variations on a regularization theme. Let π eε be the
eε . Explicitly,
contribution of ρ ν to π
Z
(a)
eε (dx dy) =
π Kε (x, x) Kε (y, y) πε (dy|x) ρ(x) ν(dx) ν(dy) ν(dx).
Then let
Z
π ε(a) (dx dy) = Kε (x, x) Kε (y, y) ρ(x) ν(dx) πε (dy|x) ν(dy) ν(dx);
Z
bε(a) (dx dy)
π = Kε (x, x) Kε (y, y) ρε (x) ν(dx) πε (dy|x) ν(dy) ν(dx).
kπ ε(a) − π
e(a) kT V
Zε
≤ Kε (x, x) Kε (y, y) |ρ(x) − ρ(x)| πε (dy|x) ν(dx) ν(dy) ν(dx)
Z
= Kε (x, x) |ρ(x) − ρ(x)| πε (dy|x) ν(dx) ν(dx)
Z
= Kε (x, x) |ρ(x) − ρ(x)| ν(dx) ν(dx). (29.27)
kπ ε(a) − π
eε(a) kT V −→ 0. (29.31)
Next,
kπ ε(a) − π
b(a) kT V
Zε
≤ Kε (x, x) Kε (y, y) |ρε (x) − ρ(x)| πε (dy|x) ν(dx) ν(dy) ν(dx)
Z
= Kε (x, x) |ρε (x) − ρ(x)| πε (dy|x) ν(dx) ν(dx)
Z
= Kε (x, x) |ρε (x) − ρ(x)| ν(dx) ν(dx)
Z
= |ρε (x) − ρ(x)| ν(dx)
Z Z
= Kε (x, y) ρ(y) − ρ(x) ν(dy) ν(dx)
Z
≤ Kε (x, y) |ρ(y) − ρ(x)| ν(dy) ν(dx),
πε(a) − π ε(a) kT V −→ 0,
kb (29.32)
(a) (a)
By (29.31) and (29.32), kb
πε −e πε kT V −→ 0 as ε → 0. In particular,
Z Z
Ψ dbπε(a) − Ψ deπε(a) −−→ 0. (29.33)
ε↓0
Now let
Z
eε(s) (dx dy)
π = Kε (x, x) Kε (y, y) πε (dy|x) ν(dx) ν(dy) µs (dx);
Z
bε(s) (dx dy)
π = Kε (x, x) Kε (y, y) µs,ε (dx) πε (dy|x) ν(dy) ν(dx);
(a) (s)
eε (dx dy) = π
so that π eε (dx dy) + π
eε (dx dy). Further, define
while Z
πε(s) = U ′ (∞) µs,ε [X ] = U ′ (∞) µs [X ].
Ψ db
At
R this point,
R to finish the proof of the theorem it suffices to establish
πε −→ Ψ dπ; which is true if π
Ψ db bε converges weakly to π.
Step 9: Duality. Proving the convergence of π bε to π will be easy
because πbε is a kind of regularization of πε , and it will be possible to
“transfer the regularization to the test function” by duality. Indeed:
Z Z
bε (dx dy) = Ψ (x, y) Kε (x, x) Kε (y, y) πε (dy dx) ν(dy) ν(dx)
Ψ (x, y) π
Z
= Ψε (x, y) πε (dy dx),
where
Z
Ψε (x, y) = Ψ (x, y) Kε (x, x) Kε (y, y) ν(dy) ν(dx).
By Theorem 28.9 (in the very simple case when Xk = X for all k), we
may extract a subsequence such that µk,t −→ µt in P2 (X ), for each
t ∈ [0, 1], and πk −→ π in P (X × X ), where (µt )0≤t≤1 is a displacement
interpolation and π ∈ Π(µ0 , µ1 ) is an associated optimal transference
plan. Then by Theorem 29.20(i),
as required.
It remains to treat the case when β_t^{(K,N)} is not continuous. By Proposition 29.11 this can occur only if N = 1 or diam(X) = π\sqrt{(N-1)/K}. In both cases, Proposition 29.10 and the previous proof show that
\forall N' > N, \qquad U_\nu(\mu_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi,\nu}(\mu_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi,\nu}(\mu_1).
Now we have all the tools to prove the main result of this chapter: The
weak curvature-dimension bound CD(K, N ) passes to the limit. Once
again, the compact case will imply the general statement.
U_{\nu_k}(\mu^k_{\varepsilon,t}) \le (1-t)\, U^{\beta^{(K,N)}_{1-t}}_{\pi^k_\varepsilon,\nu_k}(\mu^k_0) + t\, U^{\beta^{(K,N)}_{t}}_{\check\pi^k_\varepsilon,\nu_k}(\mu^k_1), \qquad (29.39)
where β_t^{(K,N)} is given by (14.61) (with the distance d_k) and π^k_ε is an optimal coupling associated with (µ^k_{ε,t})_{0≤t≤1}. This means that for each
ε ∈ (0, 1) and k ∈ N there is a dynamical optimal transference plan Πεk
such that
µkε,t = (et )# Πεk , πεk = (e0 , e1 )# Πεk ,
where et is the evaluation at time t.
By Theorem 28.9, up to extraction of a subsequence in k, there is a
dynamical optimal transference plan Πε on Γ (X ) such that, as k → ∞,
(f_k\circ)_\#\, \Pi^k_\varepsilon \longrightarrow \Pi_\varepsilon \quad \text{weakly in } P\big( P([0,1]\times X) \big);
(f_k, f_k)_\#\, \pi^k_\varepsilon \longrightarrow \pi_\varepsilon \quad \text{weakly in } P(X\times X);
\sup_{0\le t\le 1} W_2\big( (f_k)_\#\mu^k_{\varepsilon,t},\ \mu_{\varepsilon,t} \big) \longrightarrow 0;
where
µε,t = (et )# Πε , πε = (e0 , e1 )# Πε .
Each curve (µε,t )0≤t≤1 is D-Lipschitz with D = diam (X ). By As-
coli’s theorem, from ε ∈ (0, 1) we may extract a subsequence (still
denoted ε for simplicity) such that
\sup_{0\le t\le 1} W_2\big( \mu_{\varepsilon,t},\, \mu_t \big) \xrightarrow[\varepsilon\to 0]{} 0, \qquad (29.40)
and since ρkε,0 and U are continuous, β(x0 , x1 ) U (ρkε,0 (x0 )/β(x0 , x1 ))
is uniformly close to β(fk (x0 ), fk (x1 )) U (ρkε,0 (x0 )/β(fk (x0 ), fk (x1 ))) as
k → ∞. So
\lim_{k\to\infty} \bigg[ \int \beta(x_0,x_1)\, U\!\left( \frac{\rho^k_{\varepsilon,0}(x_0)}{\beta(x_0,x_1)} \right) \pi^k_\varepsilon(dx_1|x_0)\, \nu_k(dx_0)
- \int \beta(f_k(x_0), f_k(x_1))\, U\!\left( \frac{\rho^k_{\varepsilon,0}(x_0)}{\beta(f_k(x_0), f_k(x_1))} \right) \pi^k_\varepsilon(dx_1|x_0)\, \nu_k(dx_0) \bigg] = 0. \qquad (29.45)
Similarly,
\limsup_{\varepsilon\downarrow 0}\ \limsup_{k\to\infty}\ U^{\beta^{(K,N)}_{t}}_{\check\pi^k_\varepsilon,\nu_k}(\mu^k_{\varepsilon,1}) \le U^{\beta^{(K,N)}_{t}}_{\check\pi,\nu}(\mu_1). \qquad (29.49)
where N ′ > 1 (recall Remark 17.31) to deduce that for any two proba-
bility measures µk0 , µk1 on Xk , there is a Wasserstein geodesic (µkt )t∈[0,1]
and an associated coupling π k such that
U_{\nu_k}(\mu^k_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi^k,\nu_k}(\mu^k_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi^k,\nu_k}(\mu^k_1).
Then the same proof as before shows that for any two probability mea-
sures µ0 , µ1 on X , there is a Wasserstein geodesic (µt )t∈[0,1] and an
associated coupling π such that
U_\nu(\mu_t) \le (1-t)\, U^{\beta^{(K,N')}_{1-t}}_{\pi,\nu}(\mu_0) + t\, U^{\beta^{(K,N')}_{t}}_{\check\pi,\nu}(\mu_1).
If K > 0, 1 < N < ∞ and sup diam (Xk ) = DK,N , then we can
apply a similar reasoning, introducing again the bounded coefficients β_t^{(K,N')} for N' > N and then passing to the limit as N' ↓ N.
This concludes the proof of Theorem 29.24. ⊓
⊔
Remark 29.27. What the previous proof really shows is that under
certain assumptions the property of distorted displacement convexity
is stable under measured Gromov–Hausdorff convergence. The usual
displacement convexity is a particular case (take βt ≡ 1).
then for k large enough and ε small enough, the supports of µkε,0 and
µkε,1 are contained in BR+1] (⋆k ). So a geodesic which starts from the
support of µkε,0 and ends in the support of µkε,1 will necessarily have its
image included in B2(R+1)] (⋆k ); thus each measure µkε,t has its support
included in B2(R+1)] (⋆k ).
From that point on, the very same reasoning as in the proof of
Theorem 29.24 can be applied, since, say, the ball B2(R+2)] (⋆k ) in
Xk converges in the measured Gromov–Hausdorff topology to the ball
B2(R+2)] (⋆) in X , etc. ⊓
⊔
Remark 29.29. All the interest of this theorem lies in the fact that
the measured Gromov–Hausdorff convergence is a very weak notion of
convergence, which does not imply the convergence of the Ricci tensor.
Further, if f ∈ L1 (X , ν),
define Kε f := Kε (f ν).
The linear operator Kε : µ → (Kε µ)ν is mass-preserving, in
the sense that for any nonnegative finite measure µ on Y, one has
((Kε µ)ν)[Y] = µ[Y]. More generally, Kε defines a (nonstrict) contrac-
tion operator on M (Y). Moreover, as ε → 0,
• If f ∈ C(X ), then Kε f converges uniformly to f on Y;
• If µ is a finite measure supported in Y, then (Kε µ)ν converges
weakly (against Cb (X )) to µ (this follows from the previous property
by a duality argument);
• If f ∈ L1 (Y), then Kε f converges to f in L1 (Y) (this follows from
the density of C(Y) in L1 (Y, ν), the fact that Kε f converges uni-
formly to f if f is continuous, and the contraction property of Kε ).
There is in fact a more precise statement: For any f ∈ L1 (Y, ν),
    ∫_{Y×Y} |f(x) − f(y)| K_ε(x, y) ν(dx) ν(dy) −→ 0    as ε → 0.
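As a small numerical illustration of these properties (a sketch under illustrative assumptions, not the construction used in the text): on a discretized model space Y = [0, 1] with a uniform reference measure ν, a Gaussian-type kernel normalized against ν in the x variable preserves mass exactly and converges in L1(ν); the grid, the kernel shape and the test function below are all arbitrary choices.

```python
import numpy as np

# Sketch: a regularizing kernel K_eps on Y = [0, 1] with nu = uniform grid measure.
# Illustrates two properties listed above: (K_eps f) has the same nu-mass as f,
# and K_eps f -> f in L^1(nu) as eps -> 0.  All specific choices are illustrative.

n = 2000
x = np.linspace(0.0, 1.0, n)
nu = np.full(n, 1.0 / n)                      # reference probability measure on the grid

def K_eps(eps):
    """Kernel normalized so that  int K_eps(x, y) nu(dx) = 1  for every y."""
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / eps**2)
    return K / (nu @ K)[None, :]

def smooth(f, eps):
    """(K_eps f)(x) = int K_eps(x, y) f(y) nu(dy)."""
    return K_eps(eps) @ (f * nu)

f = np.sign(x - 0.3) + x**2                   # a bounded, discontinuous test density
for eps in [0.3, 0.1, 0.03, 0.01]:
    g = smooth(f, eps)
    mass_error = abs(np.sum(g * nu) - np.sum(f * nu))   # exactly 0 up to rounding
    l1_error = np.sum(np.abs(g - f) * nu)               # decreases as eps -> 0
    print(f"eps={eps:5.2f}  mass error={mass_error:.1e}  L1 error={l1_error:.3f}")
```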
Proof of Lemma 29.36. Let us first treat the case when Y is just a point.
Then the first part of the statement of Lemma 29.36 is just the density
of C(X ) in L1 (X , µ), which is a classical result. To prove the second
part of the statement, let f ∈ L1 (µ), let K be a compact subset of X ,
let h ∈ C(K) such that f = h on K, and let ε > 0. Let ψ ∈ Cc (X \ K)
be such that kψ − f kL1 (X \K,µ) ≤ ε.
Since µ and f ν are regular, there is an open set Oε containing K
such that µ[Oε \ K] ≤ ε/(sup |ψ| + sup |h|) and kf − hkL1 (Oε ) ≤ ε.
The sets Oε and X \ K are open and cover X , so there are continuous
functions χ and η, defined on X and valued in [0, 1], such that χ+η = 1,
χ is supported in Oε and η is supported in X \ K. (In particular χ ≡ 1
in K.) Let
Ψ = h χ + ψ η.
Then Ψ coincides with h (and therefore with f ) in K, Ψ is continuous,
and
So
Z
sup |f (x, y) − Ψ (x, y)| µ(dx)
y
ZX
≤ sup |f (x, z) − f (x, z ′ )| µ(dx)
X d(z,z ′ )≤δ XZ
+ |f (x, yℓ ) − ψℓ (x)| µ(dx)
Z ℓ X
≤ mδ (x) µ(dx) + L(δ) η, (29.51)
X
where n o
mδ (x) = sup |f (x, z) − f (x, z ′ )|; d(z, z ′ ) ≤ δ .
Bibliographical notes
Here are some (probably too lengthy) comments about the genesis
of Definition 29.8. It comes after a series of particular cases and/or
variants studied by Lott and myself [577, 578] on the one hand,
Sturm [762, 763] on the other. To summarize: In a first step, Lott
and I [577] treated CD(K, ∞) and CD(0, N ), while Sturm [762] inde-
pendently treated CD(K, ∞). These cases can be handled with just
displacement convexity. Then it took some time before Sturm [763] came up with the brilliant idea to use distorted displacement convexity as the basis of the definition of CD(K, N) for N < ∞ and K ≠ 0.
There are slight variations in the definitions appearing in all these
works; and they are not exactly the ones appearing in this course either.
I shall describe the differences in some detail below.
In the case K = 0, for compact spaces, Definition 29.8 is exactly
the definition that was used in [577]. In the case N = ∞, the definition
in [577] was about the same as Definition 29.8, but it was based on
inequality (29.2) (which is very simple in the case K = ∞) instead
of (29.3). Sturm [762] also used a similar definition, but preferred to
impose the weak displacement convexity inequality only for the Boltz-
mann H functional, i.e. for U (r) = r log r, not for the whole class DC∞ .
It is interesting to note that precisely for the H functional and N = ∞,
inequalities (29.2) and (29.3) are the same, while in general the former
is a priori weaker. So the definition which I have adopted here is a priori
stronger than both definitions in [577] and [762].
Now for the general CD(K, N ) criterion. Sturm’s original defini-
tion [763] is quite close to Definition 29.8, with three differences. First,
he does not impose the basic inequality to hold true for all members of the class DC_N, but only for functions of the form −r^{1−1/N′} with N′ ≥ N. Secondly, he does not require the displacement interpolation
(µt )0≤t≤1 and the coupling π to be related via some dynamical optimal
transference plan. Thirdly, he imposes µ0 , µ1 to be absolutely continu-
ous with respect to ν, rather than just to have their support included in
Spt ν. After becoming aware of Sturm’s work, Lott and I [578] modified
his definition, imposing the inequality to hold true for all U ∈ DCN ,
imposing a relation between (µt ) and π, and allowing in addition µ0
and µ1 to be singular. In the present course, I decided to extend the
new definition to the case N = ∞.
Sturm [763] proved the stability of his definition under a variant
of measured Gromov–Hausdorff convergence, provided that one stays
away from the limit Bonnet–Myers diameter. Then Lott and I [578]
briefly sketched a proof of stability for our modified definition. Details
appear here for the first time, in particular the painful2 proof of upper semicontinuity of U^β_{π,ν}(µ) under regularization (Theorem 29.20(iii)). It
should be noted that Sturm manages to prove the stability of his defini-
tion without using this upper semicontinuity explicitly; but this might
be due to the particular form of the functions U that he is considering,
and the assumption of absolute continuity.
The treatment of noncompact spaces here is not exactly the same
as in [577] or [763]. In the present set of notes I adopted a rather weak
point of view in which every “noncompact” statement reduces to the
compact case; in particular in Definition 29.8 I only consider compactly
supported probability densities. This leads to simpler proofs, but the
treatment in [577, Appendix E] is more precise in that it passes to the
limit directly in the inequalities for probability measures that are not
compactly supported.
Other tentative definitions have been rejected for various reasons.
Let me mention four of them:
(i) Imposing the displacement convexity inequality along all dis-
placement interpolations in Definition 29.8, rather than along some
displacement interpolation. This concept is not stable under measured
Gromov–Hausdorff convergence. (See the last remark in the concluding
chapter.)
(ii) Replace the integrated displacement convexity inequalities by
pointwise inequalities, in the style of those appearing in Chapter 14.
For instance, with the same notation as in Definition 29.8, one may define
    J_t(γ_0) := ρ_0(γ_0) / E[ ρ_t(γ_t) | γ_0 ],
2
As a matter of fact, I was working on precisely this problem when my left lung
collapsed, earning me a one-week holiday in hospital with unlimited amounts of
pain-killers.
Borel set A ⊂ X,
    ∫ β_t^{(K,N)}(x, y) P^{(t)}(x, y; A) ν(dy) ≤ ν[A] / t^N;
and symmetrically
    ∫ β_{1−t}^{(K,N)}(x, y) P^{(t)}(x, y; A) ν(dx) ≤ ν[A] / (1 − t)^N.
So far nobody has undertaken this program seriously and it is not known whether it includes some serious analytical difficulties.
Lott [576] noticed that (for a Riemannian manifold) at least CD(0, N )
bounds can be formulated in terms of displacement convexity of cer-
tain functionals explicitly involving the time variable. For instance,
CD(0, N ) is equivalent to the convexity of t → t Uν (µt ) + N t log t
on [0, 1], along displacement interpolation, for all U ∈ DC∞ ; rather
than convexity of Uν (µt ) for all U ∈ DCN . (Note carefully: in one
formulation the dimension is taken care of by the time-dependence
of the functional, while in the other one it is taken care of by the
class of nonlinearities.) More general curvature-dimension bounds can
probably be encoded by refined convexity estimates: for instance, CD(K, N) seems to be equivalent to the (displacement) convexity of t H_ν(µ_t) + N t log t − K (t³/6) W_2(µ_0, µ_1)².
It seems likely that this observation can be developed into a com-
plete theory. For geometric applications, this point of view is probably
less sharp than the one based on distortion coefficients, but it may be
technically simpler.
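A simple model computation (a worked illustration, not taken from the text) shows the role of the dimensional correction N t log t. In R^N with Lebesgue measure ν (a CD(0, N) space), let µ_1 be the uniform measure on a ball B and let µ_t be the uniform measure on tB, which is the displacement interpolation between δ_0 and µ_1. Then ρ_t ≡ 1/(t^N |B|) on tB, so that
    H_ν(µ_t) = −N log t + H_ν(µ_1),    hence    t H_ν(µ_t) + N t log t = t H_ν(µ_1),
which is affine on (0, 1]: the contraction of a ball onto a point is an equality case of the time-dependent convexity criterion, just as it saturates the usual displacement convexity inequality for U ∈ DC_N.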
A completely different approach to Ricci bounds in metric-measure
spaces has been under consideration in a work by Kontsevich and
Soibelman [528], in relation to Quantum Field Theory, mirror sym-
metry and heat kernels; see also [756]. Kontsevich pointed out to me
that the class of spaces covered by this approach is probably strictly
smaller than the class defined here, since it does not seem to include
the normed spaces considered in Example 29.16; he also suggested that
this point of view is related to the one of Ollivier, described below.
To close this list, I shall evoke the recent independent contributions
by Joulin [494, 495] and Ollivier [662] who suggested defining the infi-
mum of the Ricci curvature as the best constant K in the contraction
inequality
W1 (Pt δx , Pt δy ) ≤ e−Kt d(x, y),
where Pt is the heat semigroup (defined on probability measures); or
equivalently, as the best constant K in the inequality
    ‖P_t^* f‖_{Lip} ≤ e^{−Kt} ‖f‖_{Lip},
where P_t^* is the adjoint of P_t (that is, the heat semigroup on functions). Similar inequalities have been used before in concentration theory [305, 595], and in the study of spectral gaps [231, 232] or large-time behavior.
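The discrete-time counterpart of this contraction can be tested numerically. The sketch below (an illustration with arbitrary choices, not a construction from the text) computes, for the lazy simple random walk on the hypercube {0,1}^3 with Hamming distance, the quantity κ(x, y) = 1 − W_1(m_x, m_y)/d(x, y) considered by Ollivier, for two adjacent vertices; here m_x is the one-step transition measure from x, and the W_1 distance is evaluated by solving the transportation linear program with scipy.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Coarse curvature of the lazy simple random walk on {0,1}^3 (illustrative example).
states = list(itertools.product([0, 1], repeat=3))
n = len(states)
d = np.array([[sum(a != b for a, b in zip(s, t)) for t in states] for s in states], float)

def transition(i):
    """Lazy walk: stay with probability 1/2, otherwise flip one coordinate uniformly."""
    m = np.zeros(n)
    m[i] = 0.5
    for j in range(n):
        if d[i, j] == 1:
            m[j] = 0.5 / 3
    return m

def W1(mu, nu):
    """Wasserstein-1 distance, computed from the transportation linear program."""
    c = d.reshape(-1)                            # cost of moving mass from i to j
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0         # row sums: first marginal mu
        A_eq[n + i, i::n] = 1.0                  # column sums: second marginal nu
    b_eq = np.concatenate([mu, nu])
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs").fun

x, y = 0, 1                                      # two adjacent vertices, d(x, y) = 1
w = W1(transition(x), transition(y))
print(f"W1(m_x, m_y) = {w:.4f}   coarse curvature kappa(x, y) = {1 - w:.4f}")
```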
Elementary properties
The next proposition gathers some almost immediate consequences of
the definition of weak CD(K, N ) spaces. I shall say that a subset X ′ of
a geodesic space (X , d) is totally convex if any geodesic whose endpoints
belong to X ′ is entirely contained in X ′ .
Proof of Theorem 30.2. First assume that (Spt ν, d, ν) has the weak
CD(K, N ) property. Replacing Spt ν by X does not enlarge the class
of probability measures that can be chosen for µ_0 and µ_1 in Definition 29.8, and does not change the functionals U_ν or U^{β_t^{(K,N)}}_{π,ν} either.
Because Spt ν is (by assumption) a length space, geodesics in Spt ν
are also geodesics in X . So geodesics in P2 (Spt ν) are also geodesics
in P2 (X ) (it is the converse that might be false), and then the prop-
erty of X ′ being a weak CD(K, N ) space implies that X is also a weak
CD(K, N ) space.
The converse implication is more subtle. Assume that (X , d, ν) is a
weak CD(K, N ) space. Let µ0 , µ1 be two compactly supported prob-
ability measures on X with Spt µ0 ⊂ Spt ν, Spt µ1 ⊂ Spt ν. For any
t0 ∈ {0, 1}, choose a sequence of probability measures (µk,t0 )k∈N con-
verging to µt0 , satisfying the conclusion of Theorem 29.20(iii), such
that the supports of all measures µk,t0 are included in a common com-
pact set. By definition of the weak CD(K, N ) property, for each k ∈ N
there is a Wasserstein geodesic (µk,t )0≤t≤1 and an associated coupling
π_k ∈ Π(µ_{k,0}, µ_{k,1}) such that for all t ∈ [0, 1] and U ∈ DC_N,
    U_ν(µ_{k,t}) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) + t U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}).        (30.1)
Displacement convexity
µ = ρ ν + µs .
    U^{β_t^{(K,N)}}_{π,ν}(µ) := ∫_{X×X} U( ρ(x) / β_t^{(K,N)}(x, y) ) β_t^{(K,N)}(x, y) π(dy|x) ν(dx) + U′(∞) µ_s[X],
Proof of Theorem 30.4. The proof is the same as for Theorem 17.28
and Application 17.29, with only two minor differences: (a) ρ is not
necessarily a probability density, but still its integral is bounded above
by 1; (b) there is an additional term U′(∞) µ_s[X] ∈ R ∪ {+∞}. ⊓⊔
The next theorem is the final goal of this section: It extends the
displacement convexity inequalities of Definition 29.8 to noncompact
situations.
These inequalities are the starting point for all subsequent inequal-
ities considered in the present chapter.
The proof of Theorem 30.5 will use two auxiliary results, which
generalize Theorems 29.20(i) and (iii) to noncompact situations. These
are definitely not the most general results of their kind, but they will
be enough to derive displacement convexity inequalities with a lot of
generality. As usual, I shall denote by M+ (X ) the set of finite (non-
negative) Borel measures on X , and by L1+ (X ) the set of nonnegative
ν-integrable measurable functions on X . Recall from Definition 6.8 the
notion of (weak) convergence in P2 (X ).
    µ_k −→ µ in P_2(X) as k → ∞,
and for any sequence of probability measures (π_k)_{k∈N} such that the first marginal of π_k is µ_k, and the second one is µ_{k,1}, the limits
    π_k −→ π,
    ∫ d(x, y)² π_k(dx dy) −→ ∫ d(x, y)² π(dx dy),
    µ_{k,1} −→ µ_1 in P_2(X)
(all as k → ∞) imply
    lim_{k→∞} U^β_{π_k,ν}(µ_k) = U^β_{π,ν}(µ).
Proof of Theorem 30.6. First of all, we may reduce to the case when U
is valued in R+ , just replacing U by r → U (r) + c r. So in the sequel U
will be nonnegative.
Let us start with (i). Let z be an arbitrary base point, and let
(χR )R>0 be a z-cutoff as in the Appendix (that is, a family of cutoff
continuous functions that are identically equal to 1 on a ball BR (z) and
identically equal to 0 outside BR+1] (z)). For any R > 0, write
    U_ν(χ_R µ) = ∫ U(χ_R ρ) dν + U′(∞) ∫ χ_R dµ_s.
In particular,
    U_ν(µ) = sup_{R>0} U_ν(χ_R µ).        (30.8)
On the other hand, for each fixed R, we have
    U_ν(χ_R µ) = U_{χ_{R+1}ν}(χ_R µ),
and then we can apply Proposition 29.19(i) with the compact space (B_{R+1]}(z), ν), to get
    U_ν(χ_R µ) = sup { ∫_X ϕ χ_R dµ − ∫_X U*(ϕ) χ_{R+1} dν;    ϕ ∈ C_b(B_{R+1]}(z)),  ϕ ≤ U′(∞) }.
Let ρ^{(R)} be the density of the absolutely continuous part of µ^{(R)}, and µ^{(R)}_s be the singular part. It is obvious that ρ^{(R)} converges to ρ in L1(ν) and µ^{(R)}_s[X] → µ_s[X] as R → ∞.
    Next, define
    π^{(R)}(dy|x) = χ_R(y) π(dy|x) + ( ∫ (1 − χ_R(y′)) π(dy′|x) ) δ_z;        π^{(R)}(dx dy) = µ^{(R)}(dx) π^{(R)}(dy|x).
shows that π^{(R)}_k converges to π^{(R)} as k → ∞, for any fixed R.
The plan is to first replace the original expressions by the expressions
with the superscript (R), and then to pass to the limit as k → ∞ for
fixed R. For that I will distinguish two cases.
Similarly,1
    | U^β_{π^{(R)}_k,ν}(µ^{(R)}_k) − U^β_{π_k,ν}(µ_k) | ≤ C ( ‖ρ^{(R)}_k − ρ_k‖_{L1(ν)} + | µ^{(R)}_{k,s}[X] − µ_{k,s}[X] | + (1/R²) ∫ d(z, y)² µ_{k,1}(dy) ).        (30.10)
Note that for k ≥ R, ρ^{(R)}_k = ρ^{(R)}, and µ^{(R)}_{k,s} = µ^{(R)}_s. Then in view of the definition of µ_k and the fact that ∫ d(z, y)² µ_{k,1}(dy) is bounded, we easily deduce from (30.9) and (30.10) that
    lim_{R→∞} | U^β_{π^{(R)},ν}(µ^{(R)}) − U^β_{π,ν}(µ) | = 0;
    lim_{R→∞} lim sup_{k→∞} | U^β_{π^{(R)}_k,ν}(µ^{(R)}_k) − U^β_{π_k,ν}(µ_k) | = 0.
The interest of this reduction is that all probability measures µ^{(R)}_k (resp. π^{(R)}_k) are now supported in a common compact set, namely the closed ball B_{2R} (resp. B_{2R} × B_{2R}). Note that µ^{(R)}_k converges to µ^{(R)}.
    If k is large enough, µ^{(R)}_k = µ^{(R)}, so (30.11) becomes
In the sequel, I shall drop the superscript (R), so the goal will be
The argument now is similar to the one used in the proof of Theo-
rem 29.20(iii). Define
    g(x, y) = (β(x, y)/ρ(x)) U( ρ(x)/β(x, y) ),
Using ρ^{(R)} ≤ 2ρ, log(1/β) ≤ C d(x, y)² and reasoning as in the first case, we can bound the above expression by
    C ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C ∫ |ρ^{(R)}(x) − ρ(x)| (1 + d(x, y)²) π^{(R)}(dy|x) ν(dx)
        + C ∫ ρ(x) log(2 + ρ(x)) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫ ρ(x) (1 + d(x, z)²) (1 − χ_R(y)) π(dy|x) ν(dx)
    ≤ C ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C (1 + D²) ∫ |ρ^{(R)}(x) − ρ(x)| ν(dx)
        + C ∫_{d(x,y)≥D} |ρ^{(R)}(x) − ρ(x)| (1 + d(x, y)²) [π(dy|x) + δ_z(dy)] ν(dx)
        + C log(2 + M) ∫ ρ(x) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫_{ρ(x)≥M} ρ(x) log(2 + ρ(x)) π(dy|x) ν(dx)
        + C (1 + D²) ∫ ρ(x) (1 − χ_R(y)) π(dy|x) ν(dx)
        + C ∫_{d(x,z)≥D} ρ(x) (1 + d(x, z)²) π(dy|x) ν(dx)
    ≤ C (1 + D²) ∫ |ρ^{(R)}(x) − ρ(x)| log(2 + ρ(x)) ν(dx)
        + C ∫_{d(x,y)≥D} (1 + d(x, y)²) π(dx dy)
        + C ∫_{d(x,z)≥D} (1 + d(x, z)²) π(dx dy)
        + C [log(2 + M) + (1 + D²)] ∫ (1 − χ_R(y)) π(dx dy)
        + C ∫_{ρ≥M} ρ log(2 + ρ) dν.
Since d(x, y)² 1_{d(x,y)≥D} ≤ d(x, z)² 1_{d(x,z)≥D/2} + d(y, z)² 1_{d(y,z)≥D/2}, the above expression can in turn be bounded by
    C (1 + D²) ∫ |ρ^{(R)} − ρ| log(2 + ρ) dν + C ∫_{d(z,x)≥D/2} (1 + d(x, z)²) µ(dx)
        + C ∫_{d(z,y)≥D/2} (1 + d(y, z)²) µ_1(dy) + C ∫_{ρ≥M} ρ log(2 + ρ) dν
        + C [ (log(2 + M) + D²) / R² ] ∫_{d(z,y)≥D/2} d(z, y)² µ_1(dy).
From that point on, the proof is similar to the one in the first case.
(To prove that g ∈ L1 (B2R ; C(B2R )) one can use the fact that β is
bounded from above and below by positive constants on B2R × B2R ,
and apply the same estimates as in the proof of Theorem 29.20(ii).) ⊓ ⊔
Proof of Theorem 30.5. By an approximation theorem as in the proof of
Proposition 29.12, we may restrict to the case when U is nonnegative;
we may also assume that U is Lipschitz (in case N < ∞) or that it
behaves at infinity like a r log r+b r (in case N = ∞). By approximating
N by N′ > N, we may also assume that the distortion coefficients β_t^{(K,N)}(x, y) are locally bounded and |log β_t^{(K,N)}(x, y)| = O(d(x, y)²).
Let (µk,0 )k∈N (resp. (µk,1 )k∈N ) be a sequence converging to µ0 (resp.
to µ1 ) and satisfying the conclusions of Theorem 30.6(ii). For each k
there is a Wasserstein geodesic (µk,t )0≤t≤1 and an associated coupling
πk of (µk,0 , µk,1 ) such that
    U_ν(µ_{k,t}) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) + t U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}).        (30.15)
Further, let Πk be a dynamical optimal transference plan such that
(et )# Πk = µk,t and (e0 , e1 )# Πk = πk . Since the sequence µk,0 con-
verges weakly to µ0 , its elements belong to a compact subset of P (X );
the same is true of the measures µk,1 . By Theorem 7.21 the families
(µk,t )0≤t≤1 belong to a compact subset of C([0, 1]; P (X )); and also the
dynamical optimal transference plans Πk belong to a compact subset
of P (C([0, 1]; X )). So up to extraction of a subsequence we may as-
sume that Πk converges to some Π, (µk,t )0≤t≤1 converges to some path
(µt )0≤t≤1 (uniformly in t), and πk converges to some π. Since the eval-
uation map is continuous, it is immediate that π = (e0 , e1 )# Π and
µt = (et )# Π.
By Theorem 30.6(i), U_ν(µ_t) ≤ lim inf_{k→∞} U_ν(µ_{k,t}). Then, by construction (and Theorem 30.6(ii)),
    lim sup_{k→∞} U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) ≤ U^{β_{1−t}^{(K,N)}}_{π,ν}(µ_0),
    lim sup_{k→∞} U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}) ≤ U^{β_t^{(K,N)}}_{π̌,ν}(µ_1).
The desired inequality (30.5) follows by plugging the above into (30.15).
so
    U^{β_t^{(K,∞)}}_{π̌,ν}(µ_1) ≤ U_ν(µ_1) − λ(K, U) ((1 − t²)/6) W_2(µ_0, µ_1)².        (30.16)
Similarly,
    U^{β_{1−t}^{(K,∞)}}_{π,ν}(µ_0) ≤ U_ν(µ_0) − λ(K, U) ((1 − (1 − t)²)/6) W_2(µ_0, µ_1)².        (30.17)
Then (30.6) follows from (30.5), (30.16), (30.17) and the identity
    t (1 − t²)/6 + (1 − t) (1 − (1 − t)²)/6 = t(1 − t)/2.
⊓⊔
Brunn–Minkowski inequality
The next theorem can be taken as the first step to control volumes in
weak CD(K, N ) spaces:
• If N = ∞, then
    log (1/ν[[A_0, A_1]_t]) ≤ (1 − t) log(1/ν[A_0]) + t log(1/ν[A_1]) − (K t(1 − t)/2) sup_{x_0∈A_0, x_1∈A_1} d(x_0, x_1)².        (30.20)
Remark 30.8. The result fails if A0 , A1 are not assumed to lie in the
support of ν. (Take ν = δx0 , x1 6= x0 , and A0 = {x0 }, A1 = {x1 }.)
This was the easy part, which does not need the CD(K, N ) condition.
To prove the lower bound, apply (30.18) with A0 = {x}, A1 = A:
This results in
    ν[[x, A]_t]^{1/N} ≥ t inf_{a∈A} β_t^{(K,N)}(x, a)^{1/N} ν[A]^{1/N}.
As t → 1, inf_{a∈A} β_t^{(K,N)}(x, a) converges to 1, so we may pass to the lim inf and recover
    lim inf_{t→1} ν[[x, A]_t] ≥ ν[A].
Bishop–Gromov inequalities
Once we know that ν has no atom, we can get much more precise
information and control on the growth of the volume of balls, and in
particular prove sharp Bishop–Gromov inequalities for weak CD(K, N )
spaces with N < ∞:
    ν[B_r(x_0)] / ∫_0^r s_{(K,N)}(t) dt    is a nonincreasing function of r,        (30.22)
    ν[B_r(x_0)] ≤ e^{Cr} e^{K_− r²/2};        (30.23)
    ν[B_{r+δ}(x_0) \ B_r(x_0)] ≤ e^{Cr} e^{−K r²/2}    if K > 0.        (30.24)
Before providing the proof of this theorem, I shall state three im-
mediate but important corollaries, all of them in finite dimension.
r ≤ R.
Uniqueness of geodesics
It is an important result in Riemannian geometry that for almost any
pair of points (x, y) in a complete Riemannian manifold, x and y are
linked by a unique geodesic. This statement does not extend to gen-
eral weak CD(K, N ) spaces, as will be discussed in the concluding
chapter; however, it becomes true if the weak CD(K, N ) criterion is
supplemented with a nonbranching condition. Recall that a geodesic
space (X , d) is said to be nonbranching if two distinct constant-speed
geodesics cannot coincide on a nontrivial interval.
where the first equality follows from the monotone convergence theorem
and the second from Corollary 30.10.
So for any k ∈ N, the set Zk of points in Bk (x) which can be joined
to x by several geodesics is of zero measure. The set of points in X
which can be joined to x by several geodesics is contained in the union
of all Z_k, and is therefore of zero measure too. ⊓⊔
cannot hold.
On the other hand, there has to be some dynamical optimal trans-
port plan Π ′′ such that (e0 )# Π ′′ = µ′0 , (e1 )# Π ′′ = µ′1 and inequal-
ity (30.26) holds true with µ′t0 replaced by µ′′t0 = (et0 )# Π ′′ . In par-
ticular, Uν (µ′′t0 ) < Uν (µ′t0 ) = 0, which implies that µ′′t0 is not purely
singular.
2
Here I am cheating a bit because Theorem 30.6, in the version which I have
stated, assumes U (r) ≥ −c r. To deal with this issue, one could prove a more
general version of Theorem 30.6; but a simpler remedy is to introduce ε > 0 and
choose U (r) = −N r (r +ε)−1/N instead of −N r 1−1/N . If µ0 and µ1 are compactly
supported, one may keep the proof as it is and use Theorem 29.20(i) instead of
Theorem 30.6(i).
Now consider the plan Π̂ defined by
    Π̂ = P[γ_{t_0} ∈ Z] Π″ + 1_{[γ_{t_0} ∉ Z]} Π.        (30.27)
(To pass from the first to the second line I used the fact that Π ′ and
Π ′′ are displacement optimal transference plans between the same two
measures.)
It follows from (30.27) and the ν-negligibility of Z that
    ρ̂_{t_0} = ρ_{t_0} + a ρ″_{t_0}    ν-almost surely,
where ρ̂_{t_0}, ρ″_{t_0} and ρ_{t_0} respectively stand for the density of the absolutely continuous parts of µ̂_{t_0}, µ″_{t_0} and µ_{t_0}, and a = P[γ_{t_0} ∈ Z] > 0. Then from the minimality property of µ_{t_0},
    ∫ U(ρ_{t_0}(x)) dν(x) = U_ν(µ_{t_0}) ≤ U_ν(µ̂_{t_0}) = ∫ U( ρ_{t_0}(x) + a ρ″_{t_0}(x) ) dν(x).
If, say, µ0 is not purely singular, then the first term on the right-hand
side is negative, while the second one is nonpositive. It follows that
U_ν(µ_t) < 0, and therefore µ_t is not purely singular. ⊓⊔
    ‖ρ_t‖^p_{L^p(ν)} ≤ U_ν(µ_t) ≤ (1 − t) ‖ρ_0‖^p_{L^p(ν)} + t ‖ρ_1‖^p_{L^p(ν)} ≤ max( ‖ρ_0‖_{L^p(ν)}, ‖ρ_1‖_{L^p(ν)} )^p.        (30.28)
⊓⊔
Remark 30.21. The above argument exploited the fact that in the def-
inition of weak CD(K, N ) spaces the displacement convexity inequal-
ity (29.11) is required to hold for all members of DCN and along a
common Wasserstein geodesic.
where |∇− ρ| is defined by (20.2) (one may also use |∇ρ| in place of
|∇− ρ|). With this notion, one has the following estimates:
Sobolev inequalities
Diameter control
Recall from Proposition 29.11 that a weak CD(K, N ) space with K > 0
and N < ∞ satisfies the Bonnet–Myers diameter bound
    diam(Spt ν) ≤ π √( (N − 1)/K ).
Slightly weaker conclusions can also be obtained under a priori
weaker assumptions: For instance, if X is at the same time a weak
CD(0, N ) space and a weak CD(K, ∞) space, then there is a universal
constant C such that
    diam(Spt ν) ≤ C √( (N − 1)/K ).        (30.32)
See the bibliographical notes for more details.
Poincaré inequalities
    −∫_{B_R(x_0)} |u − ⟨u⟩_{B_R(x_0)}| dν ≤ P(K, N, R) R −∫_{B_{2R}(x_0)} |∇u| dν,        (30.33)
where −∫_B = (ν[B])^{−1} ∫_B stands for the averaged integral over B; ⟨u⟩_B = −∫_B u dν stands for the average of the function u on B; P(K, N, R) = 2^{2N+1} C(K, N, R) D(K, N, R); and C(K, N, R), D(K, N, R) are defined by (19.11) and (18.10) respectively.
In particular, if K ≥ 0 then P (K, N, R) = 22N +1 is admissible; so
ν satisfies a uniform local Poincaré inequality. Moreover, (30.33) still
holds true if the local “norm of the gradient” |∇u| is replaced by any
upper gradient of u, that is, a function g such that for any Lipschitz path γ : [0, 1] → X,
    |u(γ(1)) − u(γ(0))| ≤ ∫_0^1 g(γ(t)) |γ̇(t)| dt.
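(A standard example, recalled here only as an illustration: if u is locally Lipschitz, then its local Lipschitz constant |∇u|(x) = lim sup_{y→x} |u(y) − u(x)|/d(x, y) is an upper gradient of u.)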
Talagrand inequalities
The proof of (ii) and (iii) is the same as the proof of Theorem 22.17,
once one has an analog of Proposition 22.16. It turns out that proper-
ties (i)–(vi) of Proposition 22.16 and Theorem 22.46 are still satisfied
when the Riemannian manifold M is replaced by any metric space X ,
but property (vii) might fail in general. It is still true that this property
holds true for ν-almost all x, under the assumption that ν is locally
doubling and satisfies a local Poincaré inequality. See Theorem 30.30
below for a precise statement (and the bibliographical notes for refer-
ences). This is enough for the proof of Theorem 22.17 to go through.
⊓⊔
Remark 30.33. In the case N = 1, (30.35) does not make any sense,
but the equivalence (i) ⇒ (iii) still holds. This can be seen by working in
dimension N ′ > 1 and letting N ′ ↓ 1, as in the proof of Theorem 17.41.
Since all the measures µk,0 and µk,1 are supported in a uniform compact
set, Corollary 7.22 guarantees that the sequence (Πk )k∈N converges, up
to extraction, to some dynamical optimal transference plan Π with
(e0 )# Π = µ0 and (e1 )# Π = µ1 . Then µk,t converges weakly to µt =
(et )# Π, and πk := (e0 , e1 )# Πk converges weakly to π = (e0 , e1 )# Π. It
remains to pass to the limit as k → ∞ in the inequality (30.38); this is
easy in view of (30.37) and Theorem 29.20(i), which imply
“γt ∈ Bδ (y)”.) Let µ′t = (et )# Π ′ , let ρ′t be the density of the abso-
lutely continuous part of µ′t , and let π ′ := (e0 , e1 )# Π ′ . Since Π ′ is the
unique dynamical optimal transference plan between µ′0 and µ′1 , we can
write the displacement convexity inequality
    H_{N,ν}(µ′_t) ≤ (1 − t) H^{β_{1−t}}_{N,π′,ν}(µ′_0) + t H^{β_t}_{N,π̌′,ν}(µ′_1).
In other words,
    ∫_X (ρ′_t)^{1−1/N} dν ≥ (1 − t) ∫_{X×X} (ρ′_0(x_0))^{−1/N} β_{1−t}(x_0, x_1)^{1/N} π′(dx_0 dx_1)
        + t ∫_{X×X} (ρ′_1(x_1))^{−1/N} β_t(x_0, x_1)^{1/N} π′(dx_0 dx_1),        (30.40)
If we define
    f(γ) := (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1),
then
    ν[B_δ(y)]^{1/N} / µ_t[B_δ(y)]^{1/N} ≥ E^Π[ f(γ) | γ_t ∈ B_δ(y) ] = E[ f(γ) 1_{[γ_t∈B_δ(y)]} ] / µ_t[B_δ(y)].        (30.41)
and this coincides with f(F_t(y)) if ρ_t(y) ≠ 0. All in all, µ_t(dy)-almost surely,
    ρ_t(y)^{−1/N} ≥ f(F_t(y)).
Equivalently, Π(dγ)-almost surely,
    ρ_t(γ_t)^{−1/N} ≥ f(F_t(γ_t)) = f(γ).
Let us recapitulate: We have shown that Π(dγ)-almost surely,
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1).        (30.43)
    ρ_t(γ_t)^{−1/N} ≥ ((1 − t − ε)/(1 − ε)) ( β_{(1−t−ε)/(1−ε)}(γ_0, γ_{1−ε}) / ρ_0(γ_0) )^{1/N}
        + (t/(1 − ε)) ( β_{t/(1−ε)}(γ_0, γ_{1−ε}) / ρ_{1−ε}(γ_{1−ε}) )^{1/N}.        (30.44)
    ρ_{1−ε}(γ_{1−ε})^{−1/N} ≥ (ε/(1 − t)) ( β_{ε/(1−t)}(γ_t, γ_1) / ρ_t(γ_t) )^{1/N}
        + ((1 − t − ε)/(1 − t)) ( β_{(1−t−ε)/(1−t)}(γ_t, γ_1) / ρ_1(γ_1) )^{1/N}.        (30.45)
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1),
as desired.
Step 3: Now we wish to establish inequality (30.36) in the case
when µt is absolutely continuous, that is, we just want to drop the
assumption of compact support.
It follows from Step 2 that (X , d, ν) is a weak CD(K, N ) space, so we
now have access to Theorem 30.19 even if µ0 and µ1 are not compactly
supported; and also we can appeal to Theorem 30.5 to guarantee that
Property (ii) is verified for probability measures that are not necessar-
ily compactly supported. Then we can repeat Steps 1 and 2 without
the assumption of compact support, and in the end establish inequal-
ity (30.36) for measures that are not compactly supported.
Step 4: Now we shall consider the case when µt is not absolutely
continuous. (This is the part of the proof which has interest even in
a smooth setting.) Let (µt )s stand for the singular part of µt , and
m := (µt )s [X ] > 0.
Let E (a) and E (s) be two disjoint Borel sets in X such that the
absolutely continuous part of µt is concentrated on E (a) , and the sin-
gular part of µt is concentrated on E (s) . Obviously, Π[γt ∈ E (s) ] =
(µt )s [X ] = m, and Π[γt ∈ E (a) ] = 1 − m. Let us decompose Π into
Π = (1 − m) Π (a) + m Π (s) , where
Similarly,
    U^{β_t}_{π̌,ν}(µ_1) = (U_m)^{β_t}_{π̌^{(a)},ν}(µ_1) + m U′(∞).        (30.50)
The combination of (30.47), (30.48), (30.49) and (30.50) implies
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1).
    log (1/ρ_t(γ_t)) ≥ (1 − t) log(1/ρ_0(γ_0)) + t log(1/ρ_1(γ_1)) + (K t(1 − t)/2) d(γ_0, γ_1)².        (30.51)
At a technical level, there is a small simplification since it is not nec-
essary to treat singular measures (if µ is singular and U is not linear,
then according to Proposition 17.7(ii) U ′ (∞) = +∞, so Uν (µ) = +∞).
On the other hand, there is a serious complication: The proof of Step 1
breaks down since the measure ν is not a priori locally doubling, and
Lebesgue’s density theorem does not apply!
It seems a bit of a miracle that the method of proof can still be
saved, as I shall now explain. First assume that ρ0 and ρ1 satisfy the
same assumptions as in Step 1 above, but that in addition they are
upper semicontinuous. As in Step 1, define
    f(γ) = (1 − t) log( β_{1−t}(γ_0, γ_1) / ρ_0(γ_0) ) + t log( β_t(γ_0, γ_1) / ρ_1(γ_1) )
         = (1 − t) log(1/ρ_0(γ_0)) + t log(1/ρ_1(γ_1)) + (K t(1 − t)/2) d(γ_0, γ_1)².
The family of balls {Br (z); z ∈ Bδ/2 (y); r ≤ δ/2} generates the Borel
σ-algebra of Bδ/2 (y), so (30.52) holds true for any measurable set S ⊂
B_{δ/2}(y) instead of B_r(z). Then we can pass to densities:
    ρ_t(z) ≤ exp( − inf_{x∈B_δ(y)} f(F_t(x)) )    almost surely in B_{δ/2}(y).
Locality
Locality is one of the most fundamental properties that one may expect
from any notion of curvature. In the setting of weak CD(K, N ) spaces,
the locality problem may be loosely formulated as follows: If (X , d, ν) is
weakly CD(K, N ) in the neighborhood of any of its points, then (X , d, ν)
should be a weakly CD(K, N ) space.
So far it is not known whether this “local-to-global” property holds
in general. However, it is true at least in a nonbranching space, if K = 0
and N < ∞. The validity of a more general statement may depend on
the following:
    f((1 − λ) t + λ t′) ≥ (1 − λ) f(t) ( sin((1 − λ) α|t − t′|) / ((1 − λ) sin(α|t − t′|)) )^θ
        + λ f(t′) ( sin(λ α|t − t′|) / (λ sin(α|t − t′|)) )^θ        (30.56)
Proof of Theorem 30.37. If we can treat the case N > 1, then the case
N = 1 will follow by letting N go to 1 (as in the proof of Theo-
rem 29.24). So let us assume 1 < N < ∞. In the sequel, I shall use the shorthand β_t = β_t^{(K,N)}.
Let (X , d, ν) be a nonbranching local weak CD(K, N ) space. By re-
peating the proof of Theorem 30.32, we can show that for any x0 ∈ X
there is r = r(x0 ) > 0 such that (30.58) holds true along any displace-
ment interpolation (µt )0≤t≤1 which is supported in B(x0 , r). Moreover,
if Π is a dynamical optimal transference plan such that (et )# Π = µt ,
and each measure µt is absolutely continuous with density ρt , then
Π(dγ)-almost all geodesics will satisfy inequality (30.43), which I re-
cast below:
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1).        (30.59)
    U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}}_{π,ν}(µ_0) + t U^{β_t}_{π̌,ν}(µ_1).        (30.60)
The plan is to cut Π into very small pieces, each of which will
be included in a sufficiently small ball that the local weak CD(K, N )
criterion can be used. I shall first proceed to construct these small
pieces.
Cover the closed ball B[z, R] by a finite number of balls B(xj , rj /3)
with rj = r(xj ), and let r := inf(rj /3). For any y ∈ B[z, R], the ball
B(y, r) lies inside some B(xj , rj ); so if (µt )0≤t≤1 is any displacement in-
terpolation supported in some ball B(y, r), Π is an associated dynam-
ical optimal transference plan, and µ0 , µ1 are absolutely continuous,
then the density ρt of µt will satisfy the inequality
    ρ_t(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_0(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_1(γ_1),        (30.61)
The sets Γℓ are disjoint. We discard the sequences ℓ such that Π[Γℓ ] = 0.
Then let Z_ℓ = Π[Γ_ℓ], and let
    Π_ℓ = 1_{Γ_ℓ} Π / Z_ℓ
be the law of γ conditioned by the event {γ ∈ Γ_ℓ}. Further, let µ_{ℓ,t} = (e_t)_# Π_ℓ, and π_ℓ = (e_0, e_1)_# Π_ℓ.
For each ℓ and k ∈ {0, . . . , m−2}, we define Πℓk to be the image of Πℓ
by the restriction map [0, 1] → [kδ, (k+2)δ]. Up to affine reparametriza-
tion of time, Πℓk is a dynamical optimal transference plan between the
measures µℓ,kδ and µℓ,(k+2)δ (Theorem 7.30(i)–(ii)).
∀ k ∈ {0, . . . , m − 2},  Π_ℓ(dγ)-almost surely,  ∀ t ∈ [0, 1],  ∀ t_0, t_1 ∈ [kδ, (k + 2)δ],
    ρ_{ℓ,(1−t)t_0+tt_1}(γ_{(1−t)t_0+tt_1})^{−1/N} ≥ (1 − t) β_{1−t}(γ_{t_0}, γ_{t_1})^{1/N} / ρ_{ℓ,t_0}(γ_{t_0}) + t β_t(γ_{t_0}, γ_{t_1})^{1/N} / ρ_{ℓ,t_1}(γ_{t_1}).        (30.62)
    ρ_{ℓ,t}(γ_t)^{−1/N} ≥ (1 − t) β_{1−t}(γ_0, γ_1)^{1/N} / ρ_{ℓ,0}(γ_0) + t β_t(γ_0, γ_1)^{1/N} / ρ_{ℓ,1}(γ_1).        (30.63)
and
    ∀ k ∈ N,  ∀ j ∈ N (j ≤ 1/δ),    sup ρ^{(k)}_{jδ} < +∞,        (30.69)
where the supremum really is an essential supremum, and ρ^{(k)}_t is the density of µ^{(k)}_t = (e_t)_# Π^{(k)} with respect to ν.
    If we can do this, then by repeating the proof of Theorem 30.37 we shall obtain
    H_ν(µ^{(k)}_t) ≤ (1 − t) H^{β_{1−t}^{(K,∞)}}_{π^{(k)},ν}(µ^{(k)}_0) + t H^{β_t^{(K,∞)}}_{π̌^{(k)},ν}(µ^{(k)}_1).
    Z^{k_1} = Π̂^{k_1}[Γ];        Π^{k_1} = Π̂^{k_1} / Z^{k_1}.
As k_1 goes to infinity, it is clear that Z^{k_1} ↑ 1 (in particular, we may assume without loss of generality that Z^{k_1} > 0) and Z^{k_1} Π^{k_1} ↑ Π. Moreover, if ρ^{k_1}_t stands for the density of (e_t)_# Π^{k_1}, then ρ^{k_1}_δ = (Z^{k_1})^{−1} h^{k_1}_δ is bounded.
Now comes the second step: For each k_1, let (h^{k_1,k_2}_{2δ})_{k_2∈N} be a nondecreasing sequence of bounded functions converging almost surely to ρ^{k_1}_{2δ} as k_2 → ∞. Let
    Z^{k_1,k_2} = Π̂^{k_1,k_2}[Γ],        Π^{k_1,k_2} = Π̂^{k_1,k_2} / Z^{k_1,k_2},
and let ρ^{k_1,k_2}_t stand for the density of (e_t)_# Π^{k_1,k_2}. Then ρ^{k_1,k_2}_δ ≤ (Z^{k_1,k_2})^{−1} ρ^{k_1}_δ = (Z^{k_1,k_2} Z^{k_1})^{−1} h^{k_1}_δ and ρ^{k_1,k_2}_{2δ} = (Z^{k_1,k_2})^{−1} h^{k_1,k_2}_{2δ} are both bounded.
Then repeat the process: If Π^{k_1,...,k_j} has been constructed for any k_1, . . . , k_j in N, introduce a nonincreasing sequence (h^{k_1,...,k_{j+1}}_{(j+1)δ})_{k_{j+1}∈N} converging almost surely to ρ^{k_1,...,k_j}_{(j+1)δ} as k_{j+1} → ∞; define
is bounded.
    After m operations this process has constructed Π^{k_1,...,k_m} such that all densities ρ^{k_1,...,k_m}_{jδ} are bounded. The proof is completed by choosing Π^{(k)} = Π^{k,...,k}, Z^{(k)} = Z^k · Z^{k,k} · . . . · Z^{k,...,k}. ⊓⊔
In this Appendix I recall some basic facts about the use of cutoff func-
tions to reduce to compact sets. Again, the natural setting is that of
boundedly compact metric spaces, i.e. metric spaces where closed balls
are compact.
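For instance, given a base point z, one admissible choice (a standard construction, not necessarily the one used in the text) is χ_R(x) = min(1, max(0, R + 1 − d(z, x))): this function is 1-Lipschitz, identically equal to 1 on B_R(z) and identically equal to 0 outside B_{R+1]}(z).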
Bibliographical notes
Most of the material in this chapter comes from papers by Lott and
myself [577, 578, 579] and by Sturm [762, 763]. Some of the results
are new. Prior to these references, there had been an important se-
ries of papers by Cheeger and Colding [227, 228, 229, 230], with a
follow-up by Ding [303], about the structure of measured Gromov–
Hausdorff limits of sequences of Riemannian manifolds satisfying a
uniform CD(K, N ) bound. Some of the results by Cheeger and Cold-
ing can be re-interpreted in the present framework, but for many other
theorems this remains open: Examples are the generalized splitting the-
orem [227, Theorem 6.64], the theorem of mutual absolute continuity
of admissible reference measures [230, Theorem 4.17], or the theorem
of continuity of the volume in absence of collapsing [228, Theorem 5.9].
(A positive answer to the problem raised in Remark 30.16 would solve
the latter issue.)
Theorem 30.2 is taken from work by Lott and myself [577] as well
as Corollary 30.9, Theorems 30.22 and 30.23, and the first part of The-
orem 30.28. Theorem 30.7, Corollary 30.10, Theorems 30.11 and 30.17
are due to Sturm [762, 763]. Part (i) of Theorem 30.19 was proven
by Lott and myself in the case K = 0. Part (ii) follows a scheme of
proof communicated to me by Sturm. In a Euclidean context, Theo-
rem 30.20 is well-known to specialists and used in several recent works
about optimal transport; I don’t know who first made this observation.
The Poincaré inequalities appearing in Theorems 30.25 and 30.26
(in the case K = 0) are due to Lott and myself [578]. The concept of
upper gradient was put forward by Heinonen and Koskela [470] and
other authors; it played a key role in Cheeger’s construction [226] of a
differentiable structure on metric spaces satisfying a doubling condition
and a local Poincaré inequality. Independently of [578], there were sev-
eral simultaneous treatments of local Poincaré inequalities under weak
CD(K, N ) conditions, by Sturm [763] on the one hand, and von Re-
nesse [825] on the other. The proofs in all of these works have many
common points, and also common features with the proof by Cheeger
and Colding [230]. But the argument by Cheeger and Colding uses an-
other inequality called the “segment inequality” [227, Theorem 2.11],
which as far as I know has not been adapted to the context of metric-
measure spaces. In [578] on the other hand we used the concept of
“democratic condition”, as in Theorem 19.13.
All these notions (possibly coupled with a doubling condition) are
stable under the Gromov–Hausdorff limit: this was checked in [226, 510,
529] for the Poincaré inequality, in [230] for the segment inequality, and
in [578] for the democratic condition.
Theorem 30.28(ii) is due to Lott and myself [579]; it uses Propo-
sition 30.30 with L(d) = d2 /2. In this particular case (and under a
nonessential compactness assumption), a complete proof of Proposi-
tion 30.30 can be found in [579]. It is also shown there that the conclu-
sions of Proposition 22.16 all remain true if (X , d) is a finite-dimensional
Alexandrov space with curvature bounded below; this is a pointwise statement slightly different from the one which I gave here; it uses Lemma 29.7. An
alternative “Eulerian” approach to displacement convexity for singular
measures was implemented by Daneri and Savaré [271].
In Alexandrov spaces, the locality of the notion “curvature is
bounded below by κ” is called Toponogov’s theorem; in full gen-
erality it is due to Perelman [175]. A proof can be found in [174, The-
orem 10.3.1], along with bibliographical comments.
The conditional locality of CD(K, ∞) in nonbranching spaces (The-
orem 30.42) was first proven by Sturm [762, Theorem 4.17], with a
different argument than the one used here. Sturm does not make any
assumption about infinite-dimensional points, but he assumes that the
space of probability measures µ with Hν (µ) < +∞ is geodesically con-
vex. It is clear that the proof of Theorem 30.42 can be adapted and
simplified to cover this assumption. Theorem 30.37 is new as far as
I know. Example 30.41 was suggested to me by Lott.
When one restricts to λ = 1/2, Conjecture 30.34 takes a simpler
form, and at least seems to be true for all θ outside (0, 1); but of course
we are interested precisely in the range θ ∈ (0, 1). I once hoped to prove
Conjecture 30.34 by reinterpreting it as the locality of the CD(K, N )
property in 1-dimensional spaces, and classifying 1-dimensional local
weak CD(K, N ) spaces; but I did not manage to get things to work.
Natural geometric questions, related to the locality problem, are the
stability of CD(K, N ) under quotient by Lie group action and lifting
to the universal covering. I shall briefly discuss what is known about
these issues.
• About the quotient problem, there are some results. In [577, Sec-
tion 5.5], Lott and I proved that the quotient of a CD(K, N ) metric-
measure space X by a Lie group of isometries G is itself CD(K, N ),
under the assumptions that (a) X and G are compact; (b) K = 0
or N = ∞; (c) any two absolutely continuous probability measures
on X are joined by a unique displacement interpolation which is ab-
solutely continuous for all times. The definition of CD(K, ∞) which
was used in [577] is not exactly the same as in these notes, but Theo-
rem 30.32 guarantees that there is no difference if X is nonbranching.
Assumption (c) was used only to make sure that any displacement in-
terpolation between absolutely continuous probability measures would
satisfy the displacement interpolation inequalities which are charac-
teristic of CD(0, N ); but Theorem 30.32 ensures that this is the case
in nonbranching CD(0, N ) spaces, so the proof would go through if
which are often at the basis of the derivation of such sharp inequalities,
as in the recent papers of Jérôme Demange. To add to the confusion, the
mysterious structure condition (25.10) has popped out in these works;
it is natural to ask whether this condition has any interpretation in
terms of optimal transport.
• Are there interesting examples of displacement convex functionals apart from the ones that have already been explored during the past ten years — basically all of the form ∫_M U(ρ) dν + ∫_{M^k} V dµ^{⊗k}? It is frustrating that so few examples of displacement convex functionals are known, in contrast with the enormous amount of plainly convex functionals that one can construct. Open Problem 15.11 might be related to this question.
• Is there a transport-based proof of the famous Lévy–Gromov
isoperimetric inequalities (Open Problem 21.16), that would not
involve so much “hard analysis” as the currently known arguments?
Besides its intrinsic interest, such a proof could hopefully be adapted
to nonsmooth spaces such as the weak CD(K, N ) spaces studied in
Part III.
• Caffarelli’s log concave perturbation theorem (alluded to in
Chapter 2) is another riddle in the picture. The Gaussian space can
be seen as the infinite-dimensional version of the sphere, which is the
Riemannian “reference space” with positive constant (sectional) curva-
ture; and the space Rn equipped with a log concave measure is a space
of nonnegative Ricci curvature. So Caffarelli’s theorem can be restated
as follows: If the Euclidean space (Rn , d2 ) is equipped with a probability
measure ν that makes it a CD(K, ∞) space, then ν can be realized as a
1-Lipschitz push-forward of the reference Gaussian measure with cur-
vature K. This implies almost obviously that isoperimetric inequalities
in (Rn , d2 , ν) are not worse than isoperimetric inequalities in the Gaus-
sian space; so there is a strong analogy between Caffarelli’s theorem on
the one hand, and the Lévy–Gromov isoperimetric inequality on the
other hand. It is natural to ask whether there is a common framework
for both results; this does not seem obvious at all, and I have not been
able to formulate even a decent guess of what could be a geometric
generalization of Caffarelli’s theorem.
• Another important remark is that the geometric theory has been
almost exclusively developed in the case of the optimal transport with
quadratic cost function; the exponent p = 2 here is natural in the con-
text of Riemannian geometry, but working with other exponents (or
I did not include this theorem in the body of these notes, because it
appeals to some results that have not yet been adapted to a genuinely
geometric context, and which I preferred not to discuss. I shall sketch
the proof at the end of this text, but first I would like to explain why
this result is at the same time motivating, and a bit shocking:
(a) As pointed out to me by John Lott, if k · k is not Euclidean,
then the metric-measure space (Rn , k · k, λn ) cannot be realized as a
limit of smooth Riemannian manifolds with a uniform CD(0, N ) bound,
because it fails to satisfy the splitting principle. (If a nonnegatively
curved space admits a line, i.e. a geodesic parametrized by R, then the
space can be “factorized” by this geodesic.) Results by Jeff Cheeger,
Toby Colding and Detlef Gromoll say that the splitting principle holds
for CD(0, N ) manifolds and their measured Gromov–Hausdorff limits.
(b) If k · k is not the Euclidean norm, the resulting metric space
is very singular in certain respects: It is in general not an Alexandrov
space, and it can be extremely branching. For instance, if k · k is the
ℓ∞ norm, then any two distinct points are joined by an uncountable
is one of the topics that Donald Knuth has planned to address in his
long-awaited Volume 4 of The Art of Computer Programming.
Needless to say, the theory might also decide to explore new horizons
which I am unable to foresee.
    λ I_n ≤ ∇²(N²) ≤ Λ I_n
for some positive constants λ and Λ. Then the cost function c(x, y) = N(x − y)² is both strictly convex and C^{1,1}, i.e. uniformly semiconcave.
This makes it possible to apply Theorem 10.28 (recall Example 10.35)
and deduce the following theorem about the structure of optimal maps:
If µ0 and µ1 are compactly supported and absolutely continuous, then
there is a unique optimal transport, and it takes the form
Since the norm is uniformly convex, geodesic lines are just straight lines; so the displacement interpolation takes the form (T_t)_#(ρ_0 λ^n), where
    T_t(x) = x − t ∇(N²)*(−∇ψ(x)),        0 ≤ t ≤ 1.
Let θ(x) = ∇(N²)*(−∇ψ(x)). By [814, Remark 2.56], the Jacobian matrix ∇θ, although not symmetric, is pointwise diagonalizable, with eigenvalues bounded above by 1 (this remark goes back at least to a 1996 preprint by Otto [666, Proposition A.4]; a more general statement is in [30, Theorem 6.2.7]). It follows easily that t → det(I_n − t∇θ)^{1/n} is a concave function of t [814, Lemma 5.21], and one can reproduce the proof of displacement convexity for U_{λ^n}, as soon as U ∈ DC_n [814, Theorem 5.15(i)].
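The determinant fact invoked above is easy to test numerically. The sketch below (an illustration only; the random matrix M is not the Jacobian ∇θ of the proof) checks that t → det(I_n − tM)^{1/n} is concave on [0, 1] for a diagonalizable matrix M with real eigenvalues bounded above by 1.

```python
import numpy as np

# Numerical check: if M is diagonalizable with real eigenvalues <= 1, then
# f(t) = det(I - t M)^(1/n) is concave on [0, 1].  The construction of M below is
# purely illustrative.
rng = np.random.default_rng(0)
n = 5
P = rng.standard_normal((n, n))                   # generic, hence invertible
eigs = rng.uniform(-2.0, 0.9, size=n)             # real eigenvalues, kept below 1
M = P @ np.diag(eigs) @ np.linalg.inv(P)          # diagonalizable, non-symmetric

t = np.linspace(0.0, 1.0, 201)
f = np.array([np.linalg.det(np.eye(n) - s * M) ** (1.0 / n) for s in t])

second_diff = f[:-2] - 2.0 * f[1:-1] + f[2:]      # <= 0 (up to rounding) iff concave
print("max second difference:", second_diff.max())
```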
This shows that (Rn , N, λn ) satisfies the CD(0, n) displacement con-
vexity inequalities when N is a smooth uniformly convex norm. Now if
N is arbitrary, it can be approximated by a sequence (Nk )k∈N of smooth
uniformly convex norms, in such a way that (Rn , N, λn , 0) is the pointed
measured Gromov–Hausdorff limit of (Rn , Nk , λn , 0) as k → ∞. Then
the general conclusion follows by stability of the weak CD(0, n) crite-
rion (Theorem 29.24). ⊓⊔
1. Abdellaoui, T., and Heinich, H. Sur la distance de deux lois dans le cas
vectoriel. C. R. Acad. Sci. Paris Sér. I Math. 319, 4 (1994), 397–400.
2. Abdellaoui, T., and Heinich, H. Caractérisation d’une solution optimale
au problème de Monge–Kantorovitch. Bull. Soc. Math. France 127, 3 (1999),
429–443.
3. Agrachev, A., and Lee, P. Optimal transportation under nonholonomic
constraints. To appear in Trans. Amer. Math. Soc.
4. Agueh, M. Existence of solutions to degenerate parabolic equations via the
Monge–Kantorovich theory. Adv. Differential Equations 10, 3 (2005), 309–360.
5. Agueh, M., Ghoussoub, N., and Kang, X. Geometric inequalities via a
general comparison principle for interacting gases. Geom. Funct. Anal. 14, 1
(2004), 215–244.
6. Ahmad, N. The geometry of shape recognition via the Monge–Kantorovich
optimal transport problem. PhD thesis, Univ. Toronto, 2004.
7. Aida, S. Uniform positivity improving property, Sobolev inequalities, and
spectral gaps. J. Funct. Anal. 158, 1 (1998), 152–185.
8. Ajtai, M., Komlós, J., and Tusnády, G. On optimal matchings. Combi-
natorica 4, 4 (1984), 259–264.
9. Alberti, G. On the structure of singular sets of convex functions. Calc. Var.
Partial Differential Equations 2, 1 (1994), 17–27.
10. Alberti, G. Some remarks about a notion of rearrangement. Ann. Scuola
Norm. Sup. Pisa Cl. Sci (4) 29, 2 (2000), 457–472.
11. Alberti, G., and Ambrosio, L. A geometrical approach to monotone func-
tions in Rn . Math. Z. 230, 2 (1999), 259–316.
12. Alberti, G., Ambrosio, L., and Cannarsa, P. On the singularities of
convex functions. Manuscripta Math. 76, 3-4 (1992), 421–435.
13. Alberti, G., and Bellettini, G. A nonlocal anisotropic model for phase
transitions. I: The optimal profile problem. Math. Ann. 310, 3 (1998), 527–560.
14. Albeverio, S., and Cruzeiro, A. B. Global flows with invariant (Gibbs)
measures for Euler and Navier–Stokes two-dimensional fluids. Comm. Math.
Phys. 129, 3 (1990), 431–444.
15. Alesker, S., Dar, S., and Milman, V. D. A remarkable measure preserving
diffeomorphism between two convex bodies in Rn . Geom. Dedicata 74, 2 (1999),
201–212.
37. Ambrosio, L., and Tilli, P. Topics on analysis in metric spaces, vol. 25 of
Oxford Lecture Series in Mathematics and its Applications. Oxford University
Press, Oxford, 2004.
38. Andres, S., and von Renesse, M.-K. Particle approximation of the Wasser-
stein diffusion. Preprint, 2007.
39. Andreu, F., Caselles, V., and Mazón, J. M. The Cauchy problem for
a strongly degenerate quasilinear equation. J. Eur. Math. Soc. (JEMS) 7, 3
(2005), 361–393.
40. Andreu, F., Caselles, V., Mazón, J., and Moll, S. Finite propagation
speed for limited flux diffusion equations. Arch. Rational Mech. Anal. 182, 2
(2006), 269–297.
41. Ané, C., Blachère, S., Chafaı̈, D., Fougères, P., Gentil, I., Malrieu,
F., Roberto, C., and Scheffer, G. Sur les inégalités de Sobolev logarith-
miques, vol. 10 of Panoramas et Synthèses. Société Mathématique de France,
2000.
42. Appell, P. Mémoire sur les déblais et les remblais des systèmes continus
ou discontinus. Mémoires présentés par divers Savants à l’Académie des Sci-
ences de l’Institut de France, Paris No. 29 (1887), 1–208. Available online at
gallica.bnf.fr.
43. Arnold, A., Markowich, P., Toscani, G., and Unterreiter, A. On
logarithmic Sobolev inequalities and the rate of convergence to equilibrium for
Fokker–Planck type equations. Comm. Partial Differential Equations 26, 1–2
(2001), 43–100.
44. Arnol′ d, V. I. Mathematical methods of classical mechanics, vol. 60 of Grad-
uate Texts in Mathematics. Springer-Verlag, New York. Translated from the
1974 Russian original by K. Vogtmann and A. Weinstein. Corrected reprint of
the second (1989) edition.
45. Aronson, D. G., and Bénilan, Ph. Régularité des solutions de l’équation
des milieux poreux dans RN . C. R. Acad. Sci. Paris Sér. A-B 288, 2 (1979),
A103–A105.
46. Aronsson, G. A mathematical model in sand mechanics: Presentation and
analysis. SIAM J. Appl. Math. 22 (1972), 437–458.
47. Aronsson, G., and Evans, L. C. An asymptotic model for compression
molding. Indiana Univ. Math. J. 51, 1 (2002), 1–36.
48. Aronsson, G., Evans, L. C., and Wu, Y. Fast/slow diffusion and growing
sandpiles. J. Differential Equations 131, 2 (1996), 304–335.
49. Attouch, L., and Soubeyran, A. From procedural rationality to routines:
A “Worthwile to Move” approach of satisficing with not too much sacrificing.
Preprint, 2005. Available online at
www.gate.cnrs.fr/seminaires/2006 2007.
50. Aubry, P. Monge, le savant ami de Napoléon Bonaparte, 1746–1818.
Gauthier-Villars, Paris, 1954.
51. Aubry, S. The twist map, the extended Frenkel–Kontorova model and the
devil’s staircase. In Order in chaos (Los Alamos, 1982), Phys. D 7, 1–3 (1983),
240–258.
52. Aubry, S., and Le Daeron, P. Y. The discrete Frenkel–Kontorova model and its extensions, I. Phys. D 8, 3 (1983), 381–422.
53. Bakelman, I. J. Convex analysis and nonlinear geometric elliptic equations.
Edited by S. D. Taliaferro. Springer-Verlag, Berlin, 1994.
74. Barthe, F., Bakry, D., Cattiaux, P., and Guillin, A. Poincaré inequal-
ities for logconcave probability measures: a Lyapunov function approach. To
appear in Electron. Comm. Probab.
75. Barthe, F., Cattiaux, P., and Roberto, C. Interpolated inequalities be-
tween exponential and Gaussian, Orlicz hypercontractivity and application to
isoperimetry. Rev. Mat. Iberoamericana 22, 3 (2006), 993–1067.
76. Barthe, F., and Kolesnikov, A. Mass transport and variants of the loga-
rithmic Sobolev inequality. Preprint, 2008.
77. Barthe, F., and Roberto, C. Sobolev inequalities for probability measures
on the real line. Studia Math. 159, 3 (2003), 481–497.
78. Beckner, W. A generalized Poincaré inequality for Gaussian measures. Proc.
Amer. Math. Soc. 105, 2 (1989), 397–400.
79. Beiglböck, M., Goldstern, M., Maresch, G., and Schachermayer, W.
Optimal and better transport plans. Preprint, 2008. Available online at
www.fam.tuwien.ac.at/~wschach/pubs.
80. Bell, E. T. Men of Mathematics. Simon and Schuster, 1937.
81. Benachour, S., Roynette, B., Talay, D., and Vallois, P. Nonlinear self-
stabilizing processes. I. Existence, invariant probability, propagation of chaos.
Stochastic Process. Appl. 75, 2 (1998), 173–201.
82. Benachour, S., Roynette, B., and Vallois, P. Nonlinear self-stabilizing
processes. II. Convergence to invariant probability. Stochastic Process. Appl.
75, 2 (1998), 203–224.
83. Benaı̈m, M., Ledoux, M., and Raimond, O. Self-interacting diffusions.
Probab. Theory Related Fields 122, 1 (2002), 1–41.
84. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. II. Convergence in
law. Ann. Inst. H. Poincaré Probab. Statist. 39, 6 (2003), 1043–1055.
85. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. III. Symmetric
interactions. Ann. Probab. 33, 5 (2005), 1717–1759.
86. Benamou, J.-D. Transformations conservant la mesure, mécanique des flu-
ides incompressibles et modèle semi-géostrophique en météorologie. Mémoire
présenté en vue de l’Habilitation à Diriger des Recherches. PhD thesis, Univ.
Paris-Dauphine, 1992.
87. Benamou, J.-D., and Brenier, Y. Weak existence for the semigeostrophic
equations formulated as a coupled Monge–Ampère/transport problem. SIAM
J. Appl. Math. 58, 5 (1998), 1450–1461.
88. Benamou, J.-D., and Brenier, Y. A numerical method for the optimal
time-continuous mass transport problem and related problems. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997). Amer. Math. Soc., Providence, RI, 1999, pp. 1–11.
89. Benamou, J.-D., and Brenier, Y. A computational fluid mechanics solution
to the Monge–Kantorovich mass transfer problem. Numer. Math. 84, 3 (2000),
375–393.
90. Benamou, J.-D., and Brenier, Y. Mixed L2 -Wasserstein optimal mapping
between prescribed density functions. J. Optim. Theory Appl. 111, 2 (2001),
255–271.
91. Benamou, J.-D., Brenier, Y., and Guittet, K. The Monge–Kantorovitch
mass transfer and its computational fluid mechanics formulation. ICFD Con-
ference on Numerical Methods for Fluid Dynamics (Oxford, 2001). Internat.
J. Numer. Methods Fluids 40, 1–2 (2002), 21–30.
212. Carrillo, J. A., Gualdani, M. P., and Toscani, G. Finite speed of prop-
agation in porous media by mass transportation methods. C. R. Math. Acad.
Sci. Paris 338, 10 (2004), 815–818.
213. Carrillo, J. A., McCann, R. J., and Villani, C. Kinetic equilibration
rates for granular media and related equations: entropy dissipation and mass
transportation estimates. Rev. Mat. Iberoamericana 19, 3 (2003), 971–1018.
214. Carrillo, J. A., McCann, R. J., and Villani, C. Contractions in the 2-
Wasserstein length space and thermalization of granular media. Arch. Ration.
Mech. Anal. 179, 2 (2006), 217–263.
215. Carrillo, J. A., and Toscani, G. Asymptotic L1 -decay of solutions of the
porous medium equation to self-similarity. Indiana Univ. Math. J. 49, 1 (2000),
113–142.
216. Carrillo, J. A., and Toscani, G. Contractive probability metrics and
asymptotic behavior of dissipative kinetic equations. Proceedings of the 2006
Porto Ercole Summer School. Riv. Mat. Univ. Parma 6 (2007), 75–198.
217. Carrillo, J. A., and Vázquez, J. L. Fine asymptotics for fast diffusion
equations. Comm. Partial Differential Equations 28, 5–6 (2003), 1023–1056.
218. Carrillo, J. A., and Vázquez, J. L. Asymptotic complexity in filtration
equations. J. Evol. Equ. 7, 3 (2007), 471–495.
219. Cattiaux, P., and Guillin, A. On quadratic transportation cost inequalities.
J. Math. Pures Appl. (9) 86, 4 (2006), 341–361.
220. Cattiaux, P., and Guillin, A. Trends to equilibrium in total variation
distance. To appear in Ann. Inst. H. Poincaré Probab. Statist.
221. Cattiaux, P., Guillin, A., and Malrieu, F. Probabilistic approach for
granular media equations in the nonuniformly convex case. Probab. Theory
Related Fields 140, 1–2 (2008), 19–40.
222. Champion, Th., De Pascale, L., and Juutinen, P. The ∞-Wasserstein
distance: local solutions and existence of optimal transport maps. Preprint,
2007.
223. Chavel, I. Riemannian geometry — a modern introduction, vol. 108 of Cam-
bridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1993.
224. Chazal, F., Cohen-Steiner, D., and Mérigot, Q. Stability of boundary
measures. INRIA Report, 2007. Available online at
www-sop.inria.fr/geometrica/team/Quentin.Merigot/
225. Cheeger, J. A lower bound for the smallest eigenvalue of the Laplacian. In
Problems in analysis (Papers dedicated to Salomon Bochner, 1969), pp. 195–
199. Princeton Univ. Press, Princeton, N. J., 1970.
226. Cheeger, J. Differentiability of Lipschitz functions on metric measure spaces.
Geom. Funct. Anal. 9, 3 (1999), 428–517.
227. Cheeger, J., and Colding, T. H. Lower bounds on Ricci curvature and the
almost rigidity of warped products. Ann. of Math. (2) 144, 1 (1996), 189–237.
228. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. I. J. Differential Geom. 46, 3 (1997), 406–480.
229. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. II. J. Differential Geom. 54, 1 (2000), 13–35.
230. Cheeger, J., and Colding, T. H. On the structure of spaces with Ricci
curvature bounded below. III. J. Differential Geom. 54, 1 (2000), 37–74.
231. Chen, M.-F. Trilogy of couplings and general formulas for lower bound of
spectral gap. Probability towards 2000 (New York, 1995), 123–136, Lecture
Notes in Statist. 128, Springer, New York, 1998.
232. Chen, M.-F., and Wang, F.-Y. Application of coupling method to the first
eigenvalue on manifold. Science in China (A) 37, 1 (1994), 1–14.
233. Christensen, J. P. R. Measure theoretic zero sets in infinite dimensional
spaces and applications to differentiability of Lipschitz mappings. Publ. Dép.
Math. (Lyon) 10, 2 (1973), 29–39. Actes du Deuxième Colloque d’Analyse
Fonctionnelle de Bordeaux (Univ. Bordeaux, 1973), I, pp. 29–39.
234. Cianchi, A., Fusco, N., Maggi, F., and Pratelli, A. The sharp Sobolev
inequality in quantitative form. Preprint, 2007.
235. Clarke, F. H. Methods of dynamic and nonsmooth optimization, vol. 57 of
CBMS-NSF Regional Conference Series in Applied Mathematics. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1989.
236. Clarke, F. H., and Vinter, R. B. Regularity properties of solutions in the
basic problem of calculus of variations. Trans. Amer. Math. Soc. 289 (1985),
73–98.
237. Clement, Ph., and Desch, W. An elementary proof of the triangle inequality
for the Wasserstein metric. Proc. Amer. Math. Soc. 136, 1 (2008), 333–339.
238. Contreras, G., and Iturriaga, R. Global minimizers of autonomous La-
grangians. 22º Colóquio Brasileiro de Matemática. Instituto de Matemática
Pura e Aplicada (IMPA), Rio de Janeiro, 1999.
239. Contreras, G., Iturriaga, R., Paternain, G. P., and Paternain, M.
The Palais–Smale condition and Mañé’s critical values. Ann. Henri Poincaré
1, 4 (2000), 655–684.
240. Cordero-Erausquin, D. Sur le transport de mesures périodiques. C. R.
Acad. Sci. Paris Sér. I Math. 329, 3 (1999), 199–202.
241. Cordero-Erausquin, D. Inégalité de Prékopa–Leindler sur la sphère. C. R.
Acad. Sci. Paris Sér. I Math. 329, 9 (1999), 789–792.
242. Cordero-Erausquin, D. Some applications of mass transport to Gaussian-
type inequalities. Arch. Ration. Mech. Anal. 161, 3 (2002), 257–269.
243. Cordero-Erausquin, D. Non-smooth differential properties of optimal trans-
port. In Recent advances in the theory and applications of mass transport,
vol. 353 of Contemp. Math., Amer. Math. Soc., Providence, 2004, pp. 61–71.
244. Cordero-Erausquin, D. Quelques exemples d’application du transport de
mesure en géométrie euclidienne et riemannienne. In Séminaire de Théorie
Spectrale et Géométrie. Vol. 22, Année 2003–2004, pp. 125–152.
245. Cordero-Erausquin, D., Gangbo, W., and Houdré, Ch. Inequalities for
generalized entropy and optimal transportation. In Recent advances in the
theory and applications of mass transport, vol. 353 of Contemp. Math., Amer.
Math. Soc., Providence, RI, 2004, pp. 73–94.
246. Cordero-Erausquin, D., McCann, R. J., and Schmuckenschläger, M.
A Riemannian interpolation inequality à la Borell, Brascamp and Lieb. Invent.
Math. 146, 2 (2001), 219–257.
247. Cordero-Erausquin, D., McCann, R. J., and Schmuckenschläger, M.
Prékopa–Leindler type inequalities on Riemannian manifolds, Jacobi fields, and
optimal transport. Ann. Fac. Sci. Toulouse Math. (6) 15, 4 (2006), 613–635.
248. Cordero-Erausquin, D., Nazaret, B., and Villani, C. A mass-transpor-
tation approach to sharp Sobolev and Gagliardo–Nirenberg inequalities. Adv.
Math. 182, 2 (2004), 307–332.
249. Coulhon, Th., and Saloff-Coste, L. Isopérimétrie pour les groupes et les
variétés. Rev. Mat. Iberoamericana 9, 2 (1993), 293–314.
392. Gallay, T., and Wayne, C. E. Global stability of vortex solutions of the
two-dimensional Navier–Stokes equation. Comm. Math. Phys. 255, 1 (2005),
97–129.
393. Gallot, S. Isoperimetric inequalities based on integral norms of Ricci curva-
ture. Astérisque, 157–158 (1988), 191–216.
394. Gallot, S., Hulin, D., and Lafontaine, J. Riemannian geometry, sec-
ond ed. Universitext. Springer-Verlag, Berlin, 1990.
395. Gangbo, W. An elementary proof of the polar factorization of vector-valued
functions. Arch. Rational Mech. Anal. 128, 4 (1994), 381–399.
396. Gangbo, W. The Monge mass transfer problem and its applications. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc., Providence, RI,
1999, pp. 79–104.
397. Gangbo, W. Review on the book Gradient flows in metric spaces and in the
space of probability measures by Ambrosio, Gigli and Savaré, 2006. Available
online at www.math.gatech.edu/~gangbo.
398. Gangbo, W., and McCann, R. J. Optimal maps in Monge’s mass transport
problem. C. R. Acad. Sci. Paris Sér. I Math. 321, 12 (1995), 1653–1658.
399. Gangbo, W., and McCann, R. J. The geometry of optimal transportation.
Acta Math. 177, 2 (1996), 113–161.
400. Gangbo, W., and McCann, R. J. Shape recognition via Wasserstein dis-
tance. Quart. Appl. Math. 58, 4 (2000), 705–737.
401. Gangbo, W., Nguyen, T., and Tudorascu, A. Euler–Poisson systems as
action-minimizing paths in the Wasserstein space. Preprint, 2006.
Available online at www.math.gatech.edu/~gangbo/publications.
402. Gangbo, W., and Oliker, V. I. Existence of optimal maps in the reflector-
type problems. ESAIM Control Optim. Calc. Var. 13, 1 (2007), 93–106.
403. Gangbo, W., and Świȩch, A. Optimal maps for the multidimensional
Monge–Kantorovich problem. Comm. Pure Appl. Math. 51, 1 (1998), 23–45.
404. Gangbo, W., and Westdickenberg, M. Optimal transport for the system
of isentropic Euler equations. Work in progress.
405. Gao, F., and Wu, L. Transportation-Information inequalities for Gibbs mea-
sures. Preprint, 2007.
406. Gardner, R. The Brunn–Minkowski inequality. Bull. Amer. Math. Soc.
(N.S.) 39, 3 (2002), 355–405.
407. Gelbrich, M. On a formula for the L2 Wasserstein metric between measures
on Euclidean and Hilbert spaces. Math. Nachr. 147 (1990), 185–203.
408. Gentil, I. Inégalités de Sobolev logarithmiques et hypercontractivité en
mécanique statistique et en E.D.P. PhD thesis, Univ. Paul-Sabatier (Toulouse),
2001. Available online at
www.ceremade.dauphine.fr/~gentil/maths.html.
409. Gentil, I. Ultracontractive bounds on Hamilton–Jacobi solutions. Bull. Sci.
Math. 126, 6 (2002), 507–524.
410. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities and transportation inequalities. Probab. Theory Related Fields 133,
3 (2005), 409–436.
411. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities in null curvature. Rev. Mat. Iberoamericana 23, 1 (2007), 237–260.
412. Gentil, I., and Malrieu, F. Équations de Hamilton–Jacobi et inégalités
entropiques généralisées. C. R. Acad. Sci. Paris 335 (2002), 437–440.
473. Hiai, F., Ohya, M., and Tsukada, M. Sufficiency and relative entropy in ∗-
algebras with applications in quantum systems. Pacific J. Math. 107, 1 (1983),
117–140.
474. Hiai, F., Petz, D., and Ueda, Y. Free transportation cost inequalities via
random matrix approximation. Probab. Theory Related Fields 130, 2 (2004),
199–221.
475. Hiai, F., and Ueda, Y. Free transportation cost inequalities for noncommu-
tative multi-variables. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 9, 3
(2006), 391–412.
476. Hohloch, S. Optimale Massebewegung im Monge–Kantorovich-Transport-
problem. Diploma thesis, Freiburg Univ., 2002.
477. Holley, R. Remarks on the FKG inequalities. Comm. Math. Phys. 36 (1974),
227–231.
478. Holley, R., and Stroock, D. W. Logarithmic Sobolev inequalities and
stochastic Ising models. J. Statist. Phys. 46, 5–6 (1987), 1159–1194.
479. Horowitz, J., and Karandikar, R. L. Mean rates of convergence of empir-
ical measures in the Wasserstein metric. J. Comput. Appl. Math. 55, 3 (1994),
261–273. (See also the reviewer’s note on MathSciNet.)
480. Hoskins, B. J. Atmospheric frontogenesis models: some solutions. Q.J.R.
Met. Soc. 97 (1971), 139–153.
481. Hoskins, B. J. The geostrophic momentum approximation and the semi-
geostrophic equations. J. Atmosph. Sciences 32 (1975), 233–242.
482. Hoskins, B. J. The mathematical theory of frontogenesis. Ann. Rev. of Fluid
Mech. 14 (1982), 131–151.
483. Hsu, E. P. Stochastic analysis on manifolds, vol. 38 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI, 2002.
484. Hsu, E. P., and Sturm, K.-T. Maximal coupling of Euclidean Brownian
motions. FB-Preprint No. 85, Bonn. Available online at
www-wt.iam.uni-bonn.de/~sturm/de/index.html.
485. Huang, C., and Jordan, R. Variational formulations for Vlasov–Poisson–
Fokker–Planck systems. Math. Methods Appl. Sci. 23, 9 (2000), 803–843.
486. Ichihara, K. Curvature, geodesics and the Brownian motion on a Riemannian
manifold. I. Recurrence properties, II. Explosion properties. Nagoya Math. J.
87 (1982), 101–114; 115–125.
487. Ishii, H. Asymptotic solutions for large time of Hamilton–Jacobi equations.
International Congress of Mathematicians, Vol. III, Eur. Math. Soc., Zürich,
2006, pp. 213–227. Short presentation available online at
www.edu.waseda.ac.jp/~ishii.
488. Ishii, H. Unpublished lecture notes on the weak KAM theorem, 2004. Available
online at www.edu.waseda.ac.jp/~ishii.
489. Itoh, J.-i., and Tanaka, M. The Lipschitz continuity of the distance function
to the cut locus. Trans. Amer. Math. Soc. 353, 1 (2001), 21–40.
490. Jian, H.-Y., and Wang, X.-J. Continuity estimates for the Monge–Ampère
equation. Preprint, 2006.
491. Jimenez, Ch. Dynamic formulation of optimal transport problems. To appear
in J. Convex Anal.
492. Jones, P., Maggioni, M., and Schul, R. Universal local parametrizations
via heat kernels and eigenfunctions of the Laplacian. Preprint, 2008.
493. Jordan, R., Kinderlehrer, D., and Otto, F. The variational formulation
of the Fokker–Planck equation. SIAM J. Math. Anal. 29, 1 (1998), 1–17.
515. Khesin, B., and Misiolek, G. Shock waves for the Burgers equation and
curvatures of diffeomorphism groups. Proc. Steklov Inst. Math. 250 (2007),
1–9.
516. Kim, S. Harnack inequality for nondivergent elliptic operators on Riemannian
manifolds. Pacific J. Math. 213, 2 (2004), 281–293.
517. Kim, Y. J., and McCann, R. J. Sharp decay rates for the fastest conservative
diffusions. C. R. Math. Acad. Sci. Paris 341, 3 (2005), 157–162.
518. Kim, Y.-H. Counterexamples to continuity of optimal transportation on pos-
itively curved Riemannian manifolds. Preprint, 2007.
519. Kim, Y.-H., and McCann, R. J. On the cost-subdifferentials of cost-convex
functions. Preprint, 2007. Archived online at arxiv.org/abs/0706.1266.
520. Kim, Y.-H., and McCann, R. J. Continuity, curvature, and the general
covariance of optimal transportation. Preprint, 2007.
521. Kleptsyn, V., and Kurtzmann, A. Ergodicity of self-attracting Brownian
motion. Preprint, 2008.
522. Kloeckner, B. A geometric study of the Wasserstein space of the line.
Preprint, 2008.
523. Knothe, H. Contributions to the theory of convex bodies. Michigan Math. J.
4 (1957), 39–52.
524. Knott, M., and Smith, C. S. On the optimal mapping of distributions.
J. Optim. Theory Appl. 43, 1 (1984), 39–49.
525. Knott, M., and Smith, C. S. On a generalization of cyclic monotonicity and
distances among random vectors. Linear Algebra Appl. 199 (1994), 363–371.
526. Kolesnikov, A. V. Convexity inequalities and optimal transport of infinite-
dimensional measures. J. Math. Pures Appl. (9) 83, 11 (2004), 1373–1404.
527. Kontorovich, L. A linear programming inequality with applications to con-
centration of measures. Preprint, 2006. Archived online at
arxiv.org/abs/math.FA/0610712.
528. Kontsevich, M., and Soibelman, Y. Homological mirror symmetry and
torus fibrations. In Symplectic geometry and mirror symmetry (Seoul, 2000).
World Sci. Publ., River Edge, NJ, 2001, pp. 203–263.
529. Koskela, P. Upper gradients and Poincaré inequality on metric measure
spaces. In Lecture notes on Analysis in metric spaces (Trento, 1999), Appunti
Corsi Tenuti Docenti Sc., Scuola Norm. Sup., Pisa, 2000, pp. 55–69.
530. Krylov, N. V. Boundedly nonhomogeneous elliptic and parabolic equations.
Izv. Akad. Nauk SSSR, Ser. Mat. 46, 3 (1982), 487–523. English translation in
Math. USSR Izv. 20, 3 (1983), 459–492.
531. Krylov, N. V. Fully nonlinear second order elliptic equations: recent devel-
opment. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 25, 3–4 (1998), 569–595.
532. Krylov, N. V., and Safonov, M. V. An estimate of the probability that a
diffusion process hits a set of positive measure. Dokl. Acad. Nauk SSSR 245,
1 (1979), 18–20.
533. Kuksin, S., Piatnitski, A., and Shirikyan, A. A coupling approach to
randomly forced nonlinear PDE’s, II. Comm. Math. Phys. 230, 1 (2002), 81–
85.
534. Kullback, S. A lower bound for discrimination information in terms of vari-
ation. IEEE Trans. Inform. Theory 4 (1967), 126–127.
535. Kurtzmann, A. The ODE method for some self-interacting diffusions on Rd.
Preprint, 2008.
556. Liese, F., and Vajda, I. Convex statistical distances, vol. 95 of Teubner-Texte
zur Mathematik. BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1987. With
German, French and Russian summaries.
557. Lions, J.-L. Quelques méthodes de résolution des problèmes aux limites non
linéaires. Dunod, 1969.
558. Lions, P.-L. Generalized solutions of Hamilton–Jacobi equations. Pitman
(Advanced Publishing Program), Boston, Mass., 1982.
559. Lions, P.-L. Personal communication, 2003.
560. Lions, P.-L., and Trudinger, N. S. Linear oblique derivative problems for
the uniformly elliptic Hamilton–Jacobi–Bellman equation. Math. Z. 191, 1
(1986), 1–15.
561. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem
for equations of Monge–Ampère type. Comm. Pure Appl. Math. 39, 4 (1986),
539–563.
562. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem for
equations of Monge–Ampère type. Proceedings of the Centre for Mathematical
Analysis, Australian National University, 10, Canberra, 1986, pp. 135–140.
563. Lions, P.-L., and Lasry, J.-M. Régularité optimale de racines carrées. C. R.
Acad. Sci. Paris Sér. I Math. 343, 10 (2006), 679–684.
564. Lions, P.-L., Papanicolaou, G., and Varadhan, S. R. S. Homogenization
of Hamilton–Jacobi equations. Unpublished preprint, 1987.
565. Lisini, S. Characterization of absolutely continuous curves in Wasserstein
spaces. Calc. Var. Partial Differential Equations 28, 1 (2007), 85–120.
566. Liu, J. Hölder regularity of optimal mapping in optimal transportation. To
appear in Calc. Var. Partial Differential Equations.
567. Loeper, G. Quasi-neutral limit of the Euler–Poisson and Euler–Monge–Ampère
systems. Comm. Partial Differential Equations 30, 7–9 (2005), 1141–1167.
568. Loeper, G. The reconstruction problem for the Euler-Poisson system in cos-
mology. Arch. Ration. Mech. Anal. 179, 2 (2006), 153–216.
569. Loeper, G. A fully nonlinear version of Euler incompressible equations: the
Semi-Geostrophic system. SIAM J. Math. Anal. 38, 3 (2006), 795–823.
570. Loeper, G. On the regularity of maps solutions of optimal transportation
problems. To appear in Acta Math.
571. Loeper, G. On the regularity of maps solutions of optimal transportation
problems II. Work in progress, 2007.
572. Loeper, G., and Villani, C. Regularity of optimal transport in curved
geometry: the nonfocal case. Preprint, 2007.
573. Lott, J. Optimal transport and Ricci curvature for metric-measure spaces.
To appear in Surveys in Differential Geometry.
574. Lott, J. Some geometric properties of the Bakry–Émery–Ricci tensor. Com-
ment. Math. Helv. 78, 4 (2003), 865–883.
575. Lott, J. Some geometric calculations on Wasserstein space. To appear in
Comm. Math. Phys. Available online at www.math.lsa.umich.edu/~lott.
576. Lott, J. Optimal transport and Perelman’s reduced volume. Preprint, 2008.
577. Lott, J., and Villani, C. Ricci curvature for metric-measure spaces via
optimal transport. To appear in Ann. of Math. (2) Available online at
www.umpa.ens-lyon.fr/~cvillani.
578. Lott, J., and Villani, C. Weak curvature bounds and functional inequalities.
J. Funct. Anal. 245, 1 (2007), 311–333.
579. Lott, J., and Villani, C. Hamilton–Jacobi semigroup on length spaces and
applications. J. Math. Pures Appl. (9) 88, 3 (2007), 219–229.
580. Lu, P., Ni, L., Vázquez, J.-L., and Villani, C. Local Aronson–Bénilan
estimates and entropy formulae for porous medium and fast diffusion equations
on manifolds. Preprint, 2008.
581. Lusternik, L. A. Die Brunn–Minkowskische Ungleichung für beliebige mess-
bare Mengen. Dokl. Acad. Sci. URSS, 8 (1935), 55–58.
582. Lutwak, E., Yang, D., and Zhang, G. Optimal Sobolev norms and the Lp
Minkowski problem. Int. Math. Res. Not. (2006), Art. ID 62987, 21.
583. Lytchak, A. Differentiation in metric spaces. Algebra i Analiz 16, 6 (2004),
128–161.
584. Lytchak, A. Open map theorem for metric spaces. Algebra i Analiz 17, 3
(2005), 139–159.
585. Ma, X.-N., Trudinger, N. S., and Wang, X.-J. Regularity of potential
functions of the optimal transportation problem. Arch. Ration. Mech. Anal.
177, 2 (2005), 151–183.
586. Maggi, F. Some methods for studying stability in isoperimetric type problems.
To appear in Bull. Amer. Math. Soc.
587. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
J. Geom. Anal. 15, 1 (2005), 83–121.
588. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
Part II: Variants and extensions. Calc. Var. Partial Differential Equations 31,
1 (2008), 47–74.
589. Mallows, C. L. A note on asymptotic joint normality. Ann. Math. Statist.
43 (1972), 508–515.
590. Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDE’s.
Stochastic Process. Appl. 95, 1 (2001), 109–132.
591. Malrieu, F. Convergence to equilibrium for granular media equations and
their Euler schemes. Ann. Appl. Probab. 13, 2 (2003), 540–560.
592. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. In
International Conference on Dynamical Systems (Montevideo, 1995), vol. 362
of Pitman Res. Notes Math. Ser. Longman, Harlow, 1996, pp. 120–131.
593. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. Bol.
Soc. Brasil. Mat. (N.S.) 28, 2 (1997), 141–153.
594. Maroofi, H. Applications of the Monge–Kantorovich theory. PhD thesis,
Georgia Tech, 2002.
595. Marton, K. A measure concentration inequality for contracting Markov
chains. Geom. Funct. Anal. 6 (1996), 556–571.
596. Marton, K. Measure concentration for Euclidean distance in the case of
dependent random variables. Ann. Probab. 32, 3B (2004), 2526–2544.
597. Massart, P. Concentration inequalities and model selection. Lecture Notes
from the 2003 Saint-Flour Probability Summer School. To appear in the
Springer book series Lecture Notes in Math. Available online at
www.math.u-psud.fr/~massart.
598. Mather, J. N. Existence of quasiperiodic orbits for twist homeomorphisms
of the annulus. Topology 21, 4 (1982), 457–467.
599. Mather, J. N. More Denjoy minimal sets for area preserving diffeomorphisms.
Comment. Math. Helv. 60 (1985), 508–557.
600. Mather, J. N. Minimal measures. Comment. Math. Helv. 64, 3 (1989), 375–
394.
645. Namah, G., and Roquejoffre, J.-M. Remarks on the long time behaviour
of the solutions of Hamilton–Jacobi equations. Comm. Partial Differential
Equations 24, 5–6 (1999), 883–893.
646. Nash, J. Continuity of solutions of parabolic and elliptic equations. Amer. J.
Math. 80 (1958), 931–954.
647. Nelson, E. Derivation of the Schrödinger equation from Newtonian mechanics.
Phys. Rev. 150 (1966), 1079–1085.
648. Nelson, E. Dynamical theories of Brownian motion. Princeton University
Press, Princeton, N.J., 1967. 2001 re-edition by J. Suzuki. Available online at
www.math.princeton.edu/~nelson/books.html.
649. Nelson, E. The free Markoff field. J. Functional Analysis 12 (1973), 211–227.
650. Nelson, E. Critical diffusions. In Séminaire de probabilités, XIX, 1983/84,
vol. 1123 of Lecture Notes in Math. Springer, Berlin, 1985, pp. 1–11.
651. Nelson, E. Quantum fluctuations. Princeton Series in Physics. Princeton
University Press, Princeton, NJ, 1985.
652. Nelson, E. Stochastic mechanics and random fields. In École d’Été de Proba-
bilités de Saint-Flour XV–XVII, 1985–87, vol. 1362 of Lecture Notes in Math.,
Springer, Berlin, 1988, pp. 427–450.
653. Neunzert, H. An introduction to the nonlinear Boltzmann–Vlasov equation.
In Kinetic theories and the Boltzmann equation, C. Cercignani, Ed., vol. 1048
of Lecture Notes in Math., Springer, Berlin, Heidelberg, 1984, pp. 60–110.
654. Ohta, S.-i. On the measure contraction property of metric measure spaces.
Comment. Math. Helv. 82, 4 (2007), 805–828.
655. Ohta, S.-i. Gradient flows on Wasserstein spaces over compact Alexandrov
spaces. To appear in Amer. J. Math.
656. Ohta, S.-i. Products, cones, and suspensions of spaces with the measure
contraction property. J. Lond. Math. Soc. (2) 76, 1 (2007), 225–236.
657. Ohta, S.-i. Finsler interpolation inequalities. Preprint, 2008. Available online
at www.math.kyoto-u.ac.jp/~sohta.
658. Øksendal, B. Stochastic differential equations. An introduction with applica-
tions, sixth ed. Universitext. Springer-Verlag, Berlin, 2003.
659. Oliker, V. Embedding Sn into Rn+1 with given integral Gauss curvature and
optimal mass transport on Sn. Adv. Math. 213, 2 (2007), 600–620.
660. Oliker, V. Variational solutions of some problems in convexity via Monge–
Kantorovich optimal mass transport theory. Conference in Oberwolfach, July
2006 (personal communication).
661. Oliker, V., and Prussner, L. D. A new technique for synthesis of offset dual
reflector systems. In 10th Annual Review of Progress in Applied Computational
Electromagnetics, 1994, pp. 45–52.
662. Ollivier, Y. Ricci curvature of Markov chains on metric spaces. Preprint,
2007. Available online at www.umpa.ens-lyon.fr/~yollivie/publs.html.
663. Ollivier, Y., and Pansu, P. Courbure de Ricci et concentration de la mesure.
Working seminar notes. Available online at
www.umpa.ens-lyon.fr/~yollivie.
664. Osserman, R. The isoperimetric inequality. Bull. Amer. Math. Soc. 84, 6
(1978), 1182–1238.
665. Otsu, Y., and Shioya, T. The Riemannian structure of Alexandrov spaces.
J. Differential Geom. 39, 3 (1994), 629–658.
666. Otto, F. Double degenerate diffusion equations as steepest descent. Preprint
Univ. Bonn, 1996.
750. Sheng, W., Trudinger, N. S., and Wang, X.-J. The Yamabe problem for
higher order curvatures. Preprint, archived online at
arxiv.org/abs/math/0505463.
751. Shioya, T. Mass of rays in Alexandrov spaces of nonnegative curvature. Com-
ment. Math. Helv. 69, 2 (1994), 208–228.
752. Siburg, K. F. The principle of least action in geometry and dynamics,
vol. 1844 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2004.
753. Simon, L. Lectures on geometric measure theory. Proceedings of the Centre
for Mathematical Analysis, Australian National University, 3, Canberra, 1983.
754. Smith, C., and Knott, M. On Hoeffding–Fréchet bounds and cyclic mono-
tone relations. J. Multivariate Anal. 40, 2 (1992), 328–334.
755. Sobolevskiı̆, A., and Frisch, U. Application of optimal transportation
theory to the reconstruction of the early Universe. Zap. Nauchn. Sem. S.-
Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 312, 11 (2004), 303–309, 317.
English translation in J. Math. Sci. (N. Y.) 133, 4 (2006), 1539–1542.
756. Soibelman, Y. Notes on noncommutative Riemannian geometry. Personal
communication, 2006.
757. Spohn, H. Large scale dynamics of interacting particles. Texts and Mono-
graphs in Physics. Springer-Verlag, Berlin, 1991.
758. Stam, A. Some inequalities satisfied by the quantities of information of Fisher
and Shannon. Inform. Control 2 (1959), 101–112.
759. Stroock, D. W. An introduction to the analysis of paths on a Riemannian
manifold, vol. 74 of Mathematical Surveys and Monographs. American Mathe-
matical Society, Providence, RI, 2000.
760. Sturm, K.-T. Diffusion processes and heat kernels on metric spaces. Ann.
Probab. 26, 1 (1998), 1–55.
761. Sturm, K.-T. Convex functionals of probability measures and nonlinear dif-
fusions on manifolds. J. Math. Pures Appl. (9) 84, 2 (2005), 149–168.
762. Sturm, K.-T. On the geometry of metric measure spaces. I. Acta Math. 196,
1 (2006), 65–131.
763. Sturm, K.-T. On the geometry of metric measure spaces. II. Acta Math. 196,
1 (2006), 133–177.
764. Sturm, K.-T., and von Renesse, M.-K. Transport inequalities, gradient
estimates, entropy and Ricci curvature. Comm. Pure Appl. Math. 58, 7 (2005),
923–940.
765. Sudakov, V. N. Geometric problems in the theory of infinite-dimensional
probability distributions. Proc. Steklov Inst. Math. 141 (1979), 1–178.
766. Sudakov, V. N., and Cirel′son, B. S. Extremal properties of half-spaces
for spherically invariant measures. Zap. Naučn. Sem. Leningrad. Otdel. Mat.
Inst. Steklov. (LOMI) 41 (1974), 14–24. English translation in J. Soviet Math.
9 (1978), 9–18.
767. Sznitman, A.-S. Équations de type de Boltzmann, spatialement homogènes.
Z. Wahrsch. Verw. Gebiete 66 (1984), 559–562.
768. Sznitman, A.-S. Topics in propagation of chaos. In École d’Été de Probabilités
de Saint-Flour XIX—1989. Springer, Berlin, 1991, pp. 165–251.
769. Szulga, A. On minimal metrics in the space of random variables. Teor.
Veroyatnost. i Primenen. 27, 2 (1982), 401–405.
770. Talagrand, M. A new isoperimetric inequality for product measure and the
tails of sums of independent random variables. Geom. Funct. Anal. 1, 2 (1991),
211–223.
794. Trudinger, N. S., and Wang, X.-J. On the second boundary value problem
for Monge–Ampère type equations and optimal transportation. Preprint, 2006.
Archived online at arxiv.org/abs/math.AP/0601086.
795. Tuero-Dı́az, A. On the stochastic convergence of representations based on
Wasserstein metrics. Ann. Probab. 21, 1 (1993), 72–85.
796. Uckelmann, L. Optimal couplings between one-dimensional distributions.
Distributions with given marginals and moment problems (Prague, 1996),
Kluwer Acad. Publ., Dordrecht, 1997, pp. 275–281.
797. Unterreiter, A., Arnold, A., Markowich, P., and Toscani, G. On
generalized Csiszár–Kullback inequalities. Monatsh. Math. 131, 3 (2000), 235–
253.
798. Urbas, J. On the second boundary value problem for equations of Monge–
Ampère type. J. Reine Angew. Math. 487 (1997), 115–124.
799. Urbas, J. Mass transfer problems. Lecture notes from a course given in Univ.
Bonn, 1997–1998.
800. Urbas, J. The second boundary value problem for a class of Hessian equations.
Comm. Partial Differential Equations 26, 5–6 (2001), 859–882.
801. Valdimarsson, S. I. On the Hessian of the optimal transport potential.
Preprint, 2006.
802. Varadhan, S. R. S. Mathematical statistics. Courant Institute of Mathemat-
ical Sciences New York University, New York, 1974. Lectures given during the
academic year 1973–1974, Notes by Michael Levandowsky and Norman Rubin.
803. Vasershtein, L. N. Markov processes over denumerable products of spaces
describing large system of automata. Problemy Peredači Informacii 5, 3 (1969),
64–72.
804. Vázquez, J. L. An introduction to the mathematical theory of the porous
medium equation. In Shape optimization and free boundaries (Montreal, PQ,
1990). Kluwer Acad. Publ., Dordrecht, 1992, pp. 347–389.
805. Vázquez, J. L. The porous medium equation. Mathematical theory. Ox-
ford Mathematical Monographs. The Clarendon Press Oxford University Press,
New York, 2006.
806. Vázquez, J. L. Smoothing and decay estimates for nonlinear parabolic equa-
tions of porous medium type, vol. 33 of Oxford Lecture Series in Mathematics
and its Applications. Oxford University Press, 2006.
807. Vershik, A. M. Some remarks on the infinite-dimensional problems of linear
programming. Russian Math. Surveys 25, 5 (1970), 117–124.
808. Vershik, A. M. L. V. Kantorovich and linear programming. Historical note
(2001, updated in 2007). Archived at www.arxiv.org/abs/0707.0491.
809. Vershik, A. M. The Kantorovich metric: the initial history and little-known
applications. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov.
(POMI) 312, Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11 (2004),
69–85, 311.
810. Vershik, A. M., Ed. J. Math. Sci. 133, 4 (2006). Special issue dedicated
to L. V. Kantorovich. Springer, New York, 2006. Translated from the Rus-
sian: Zapiski Nauchn. Seminarov POMI, vol. 312: “Theory of representation of
Dynamical Systems. Special Issue”. Saint-Petersburg, 2004.
811. Villani, C. Remarks about negative pressure problems. Unpublished notes,
2002.
834. Wang, X.-J. On the design of a reflector antenna. II. Calc. Var. Partial
Differential Equations 20, 3 (2004), 329–341.
835. Wang, X.-J. Schauder estimates for elliptic and parabolic equations. Chin.
Ann. Math. 27B, 6 (2006), 637–642.
836. Werner, R. F. The uncertainty relation for joint measurement of position
and momentum. Quantum Information and Computing (QIC) 4, 6–7 (2004),
546–562. Archived online at arxiv.org/abs/quant-ph/0405184.
837. Wolansky, G. On time reversible description of the process of coagulation
and fragmentation. To appear in Arch. Ration. Mech. Anal.
838. Wolansky, G. Extended least action principle for steady flows under a pre-
scribed flux. To appear in Calc. Var. Partial Differential Equations.
839. Wolansky, G. Minimizers of Dirichlet functionals on the n-torus
and the weak KAM theory. Preprint, 2007. Available online at
www.math.technion.ac.il/~gershonw.
840. Wolfson, J. G. Minimal Lagrangian diffeomorphisms and the Monge–
Ampère equation. J. Differential Geom. 46, 2 (1997), 335–373.
841. Wu, L. Poincaré and transportation inequalities for Gibbs measures under the
Dobrushin uniqueness condition. Ann. Probab. 34, 5 (2006), 1960–1989.
842. Wu, L. A simple inequality for probability measures and applications.
Preprint, 2006.
843. Wu, L., and Zhang, Z. Talagrand’s T2-transportation inequality w.r.t. a
uniform metric for diffusions. Acta Math. Appl. Sin. Engl. Ser. 20, 3 (2004),
357–364.
844. Wu, L., and Zhang, Z. Talagrand’s T2-transportation inequality and log-
Sobolev inequality for dissipative SPDEs and applications to reaction-diffusion
equations. Chinese Ann. Math. Ser. B 27, 3 (2006), 243–262.
845. Yukich, J. E. Optimal matching and empirical measures. Proc. Amer. Math.
Soc. 107, 4 (1989), 1051–1059.
846. Zhu, S. The comparison geometry of Ricci curvature. In Comparison geometry
(Berkeley, CA, 1993–94), vol. 30 of Math. Sci. Res. Inst. Publ., Cambridge
University Press, Cambridge, 1997, pp. 221–262.
List of short statements
Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Deterministic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Existence of an optimal coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Lower semicontinuity of the cost functional . . . . . . . . . . . . . . . . . . . . . . 55
Tightness of transference plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Optimality is inherited by restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Convexity of the optimal cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cyclical monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
c-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Alternative characterization of c-convexity . . . . . . . . . . . . . . . . . . . . . . 69
Kantorovich duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Restriction of c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Restriction for the Kantorovich duality theorem . . . . . . . . . . . . . . . . . 88
Stability of optimal transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Compactness of the set of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . 90
Measurable selection of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Stability of the transport map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Dual transport inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Criterion for solvability of the Monge problem . . . . . . . . . . . . . . . . . . . 96
Wasserstein distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Kantorovich–Rubinstein distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Wasserstein space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Weak convergence in Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Wp metrizes Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Continuity of Wp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Metrizability of the weak topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Cauchy sequences in Wp are tight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Cost                      Setting     Use                                              Where quoted
f(x3 − y3)                R3          semi-geostrophic equations                       [268]
erf(α|x − y|)             R           Hsu–Sturm’s maximal coupling of Brownian paths   [484]
|x − y|β, 0 < β < 1       R or R2     modeling in economy                              [399]