Geodesic Convexity

This document provides an introduction to geodesic convex optimization. It discusses how convex optimization relies on convexity defined with respect to differentiation in Euclidean spaces. Geodesic convexity generalizes this to manifolds by redefining straight lines as geodesics and convexity with respect to geodesics. Some non-convex problems can be reformulated as geodesically convex optimization problems over geodesically convex sets. The document covers differentiation on manifolds, defining geodesics as straight lines or length-minimizing curves, defining geodesic convexity, and examples of geodesically convex formulations of problems like computing the Brascamp-Lieb constant and operator scaling.

Geodesic Convex Optimization:

Differentiation on Manifolds, Geodesics, and Convexity


Nisheeth K. Vishnoi∗

June 6, 2018

Abstract
Convex optimization is a vibrant and successful area due to the existence of a variety of efficient algorithms that leverage the rich structure provided by convexity. Convexity of a smooth set or a function in a Euclidean space is defined by how it interacts with the standard differential structure in this space – the Hessian of a convex function has to be positive semi-definite everywhere. However, in recent years, there is a growing demand to understand non-convexity and develop computational methods to optimize non-convex functions. Interestingly, there is a type of non-convexity that disappears once one introduces a suitable differentiable structure and redefines convexity with respect to the straight lines, or geodesics, with respect to this structure. Such convexity is referred to as geodesic convexity. Interest in studying it arises due to recent reformulations of some non-convex problems as geodesically convex optimization problems over geodesically convex sets. Geodesics on manifolds have been extensively studied in various branches of Mathematics and Physics. However, unlike convex optimization, understanding geodesics and geodesic convexity from a computational point of view largely remains a mystery. The goal of this exposition is to introduce the first half of geodesic convex optimization: geodesic convexity. Mathematically, we first present a variety of notions from differential and Riemannian geometry such as differentiation on manifolds, geodesics, and then introduce geodesic convexity. We conclude by showing that certain non-convex optimization problems such as computing the Brascamp-Lieb constant and the operator scaling problem have geodesically convex formulations.


Many thanks to Ozan Yıldız for preparing the scribe notes of my lecture on this topic.
Contents
1 Beyond Convexity

2 Manifolds
2.1 Differentiable manifolds
2.2 Riemannian manifolds

3 Differentiation on Manifolds
3.1 Differentiating a function on a manifold
3.2 Differentiating a function along a vector field on a manifold
3.3 Affine connections and Christoffel symbols
3.4 Derivative of a vector field along a smooth curve
3.5 The Levi-Civita connection

4 Geodesics
4.1 Geodesics as straight lines on a manifold
4.2 Geodesics as length minimizing curves on a Riemannian manifold
4.3 Equivalence of the two notions for Riemannian Manifolds
4.4 Examples

5 Geodesic Convexity
5.1 Totally convex sets
5.2 Geodesically convex functions
5.3 Examples of geodesically convex functions
5.4 What functions are not geodesically convex?

6 Application: Characterizing Brascamp-Lieb Constants
6.1 Problem statement
6.2 Non-convexity of Lieb’s formulation
6.3 A geodesically convex formulation

7 Application: Operator Scaling

A Proof of the Fundamental Theorem of Riemannian Manifold
A.1 Uniqueness of the Levi-Civita connection
A.2 Formula for Christoffel symbols in terms of the metric

B Euler-Lagrange Dynamics on a Riemannian Manifold
B.1 Euler-Lagrange dynamics in Newtonian mechanics
B.2 Derivation of Euler-Lagrange equations on a Riemannian manifold

C Minimizers of Energy Functional are Geodesics

1 Beyond Convexity
In the most general setting, an optimization problem takes the form

\[ \inf_{x \in K} f(x), \tag{1} \]

for some set $K$ and some function $f : K \to \mathbb{R}$. In the case when $K \subseteq \mathbb{R}^d$, we can talk about the convexity of $K$ and $f$. $K$ is said to be convex if any “straight line” joining two points in $K$ is entirely contained in $K$, and $f$ is said to be convex if on any such straight line, the average value of $f$ at the end points is at least the value of $f$ at the mid-point of the line. When $f$ is “smooth” enough, there are equivalent definitions of convexity in terms of the standard differential structure in $\mathbb{R}^d$: the gradient or the Hessian of $f$. Thus, convexity can also be viewed as a property arising from the interaction of the function and how we differentiate in $\mathbb{R}^d$. When both $K$ and $f$ are convex, the optimization problem in (1) is called a convex optimization problem. The fact that the convexity of $f$ implies that any local minimum of $f$ in $K$ is also a global minimum, along with the fact that computing gradients and Hessians is typically easy in Euclidean spaces, makes convex optimization problems well-suited for algorithms such as gradient descent, interior point methods, and cutting plane methods. Analyzing the convergence of these methods boils down to understanding how well-behaved the derivatives of the function are; see [BV04, Nes04, Vis18] for more on algorithms for convex optimization.
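For reference, the characterizations mentioned above can be written out explicitly. The block below is the standard textbook statement, given here under the assumption that $K \subseteq \mathbb{R}^d$ is open and convex and that $f$ is continuous (first line), differentiable (second line), or twice differentiable (third line); each condition is then equivalent to convexity of $f$ on $K$.

```latex
% Mid-point form used in the text:
f\!\left(\tfrac{x+y}{2}\right) \le \tfrac{1}{2} f(x) + \tfrac{1}{2} f(y)
  \qquad \text{for all } x, y \in K,
% First-order (gradient) form:
f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle
  \qquad \text{for all } x, y \in K,
% Second-order (Hessian) form:
\nabla^2 f(x) \succeq 0
  \qquad \text{for all } x \in K.
```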

Geodesic convexity. In recent times, several non-convex optimization problems have emerged and, as a consequence, there is a need to understand non-convexity and develop methods for such problems. Interestingly, there is a type of non-convexity that disappears when we view the domain as a manifold and redefine what we mean by a straight line on it. This redefinition of a straight line entails the introduction of a differential structure on $\mathbb{R}^d$, or, more generally, on a “smooth manifold”. Roughly speaking, a manifold is a topological space that locally looks like a Euclidean space. “Differentiable manifolds” are a special class of manifolds that come with a differential structure that allows one to do calculus over them. Straight lines on differentiable manifolds are called “geodesics”, and a set with the property that a geodesic joining any two points in it is entirely contained in the set is called geodesically convex (with respect to the given differential structure). A function $f$ whose average value at the end points of any geodesic is at least its value at the mid-point of that geodesic is called geodesically convex (with respect to the given differential structure). And, when $K$ and $f$ are both geodesically convex, the optimization problem in (1) is called a geodesically convex optimization problem. Geodesically convex functions also share key properties with convex functions, such as the fact that a local minimum is also a global minimum.

Geodesics. Geodesics on manifolds have been well-studied in various branches of Mathematics and Physics. Perhaps the most famous use of them is in the Theory of General Relativity by Einstein [Ein16]. In general, there is no unique notion of a geodesic on a smooth manifold – it depends on the differential structure. However, if one imposes an additional “metric” structure on the manifold that allows us to measure lengths and angles, there is an alternative, and equivalent, way to define geodesics – as shortest paths with respect to this metric. The most famous class of such metrics gives rise to Riemannian manifolds – these are manifolds where each point has an associated local inner product matrix that is positive definite. The fundamental theorem of Riemannian geometry states that the differential structure that is “compatible” with a Riemannian metric is unique, and thus we can view geodesics as either straight lines or distance minimizing curves.[1]

[1] This even holds for what are called pseudo-Riemannian manifolds that arise in relativity.

Applications. Geodesic convex optimization has recently become interesting in computer science and machine learning due to the realization that some important problems that appear to be non-convex at first glance are geodesically convex if we introduce a suitable differential structure and a metric. Computationally, however, there is an additional burden: one must be able to compute geodesics and argue about quantities such as areas and volumes that may no longer have closed form solutions. For instance, consider the problem

\[ \inf_{x > 0} \; \log p(x) - \sum_{i} \log x_i \]

for a polynomial $p(x) \in \mathbb{R}_+[x_1, \ldots, x_n]$. This problem and its variants arise in computing maximum entropy distributions; see e.g., [Gur06, SV17b]. While this problem is not convex in general, it turns out that it is geodesically convex if we consider the positive orthant $\mathbb{R}^n_+$ endowed with the usual differential structure but the Riemannian metric arising as the Hessian of the function $-\sum_i \log x_i$. This viewpoint, while not new, can be used to rewrite the optimization problem above as a convex optimization problem in new variables (by replacing $x_i$ by $e^{y_i}$ for $y_i \in \mathbb{R}$) and use algorithms from convex optimization to solve it efficiently; see [SV14, SV17a].
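As a small numerical illustration of this change of variables, the sketch below takes the one-variable instance $p(x) = 1 + x^2$ (a hypothetical choice made here for illustration, not an example from the cited references) and tests the mid-point inequality between $x = 2$ and $x = 4$ in two ways: at the Euclidean mid-point, and at the mid-point in the coordinate $y = \log x$. The names `f`, `euclidean_mid`, and `log_mid` are illustrative.

```python
import math

def f(x):
    # Objective log p(x) - log x for the illustrative choice p(x) = 1 + x^2.
    return math.log(1.0 + x * x) - math.log(x)

a, b = 2.0, 4.0
endpoint_average = 0.5 * (f(a) + f(b))

# Mid-point of the Euclidean straight line from a to b.
euclidean_mid = 0.5 * (a + b)
# Mid-point after the substitution x = e^y: average the logarithms instead.
log_mid = math.exp(0.5 * (math.log(a) + math.log(b)))

# Convexity along a line requires f(mid-point) <= average of the end points.
violates_euclidean = f(euclidean_mid) > endpoint_average
satisfies_in_log_coordinates = f(log_mid) <= endpoint_average
print(violates_euclidean, satisfies_in_log_coordinates)
```

For this instance the Euclidean mid-point violates the inequality while the mid-point in the $y$-coordinate satisfies it, which is consistent with $\log(1 + e^{2y}) - y$ being a convex function of $y$.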
An extension of the above optimization problem to matrices is the Operator Scaling problem. Given an operator $T$ that maps positive definite matrices to positive definite matrices, compute

\[ \inf_{X \succ 0} \; \log\det T(X) - \log\det X. \]

This problem was introduced in [Gur04] and was studied in [GGdOW16, GGdOW17]. This problem is non-convex, but unlike the previous example, there is no obvious way to convexify the problem. It turns out that this problem is also geodesically convex after the introduction of a suitable metric on the cone of positive definite matrices; see [AZGL+18]. Further, [AZGL+18] also showed how to extend a method from convex optimization to this setting.

A related problem is that of computing the Brascamp-Lieb constant [BL76a], an important tool from functional analysis with applications in convex geometry [Bal89], information theory [CCE09, LCCV16, LCCV17], machine learning [HM13], and theoretical computer science [DSW14, DGdOS18]. At the moment, there is no known convex formulation of the Brascamp-Lieb constant. However, recently, a geodesically convex formulation of the problem over the positive definite cone [SVY18] was discovered; it remains open whether this formulation can be solved efficiently.
Organization of this exposition. The goal of this exposition is to introduce the reader to geodesic convexity and its applications. We do not assume any knowledge of differential geometry and present the mathematical notions necessary to understand and formulate geodesics and geodesic convexity. We start by introducing differentiable and Riemannian manifolds in Section 2. In Section 3, we introduce the notion of differentiation of functions and vector fields on manifolds. In particular, we introduce the notion of an “affine connection” that allows us to take derivatives of one vector field on a manifold with respect to another. We show that affine connections can be completely specified by a tensor whose entries are called “Christoffel symbols”. We also prove the existence and uniqueness of the differential structure compatible with the Riemannian metric – the “Levi-Civita connection”. This gives rise to two views of geodesics, one as straight lines on a manifold and the second as length-minimizing curves on a manifold, which are discussed in Section 4. The second view can be reformulated as the “Euler-Lagrange dynamics” on a manifold, giving rise to the differential equations that enable us to compute geodesics. Subsequently, we define geodesic convexity in Section 5 and discuss various structural aspects of it. Finally, we present geodesically convex formulations for the Brascamp-Lieb problem and the Operator Scaling problem in Section 6 and Section 7. We do not present methods for geodesic convex optimization here.

2 Manifolds
Roughly speaking, a manifold is a topological space that resembles a Euclidean space at each
point. This resemblance is a local property and it may be applicable only around a very small
neighborhood of each point.

Definition 2.1 (Manifold). A set $M \subseteq \mathbb{R}^n$ with an associated topology $T$ is a $d$-dimensional manifold if for any point $p \in M$, there exists an open neighborhood $U_p$ of $p$ with respect to $T$ such that there exists a homeomorphism $\phi_p$ between $U_p$ and an open set in $\mathbb{R}^d$ with respect to the standard topology. Each pair $(U_p, \phi_p)$ is called a chart, and the collection of these charts over all $p \in M$ is called an atlas.

Fig. 1 visualizes some examples of local resemblance to the Euclidean spaces.

Figure 1: The sphere (top left), the boundary of a cube (bottom left), and the shape at the top
right can all be considered manifolds since, in a small neighborhood around any point, they are all
(topologically) Euclidean. The figure-eight (bottom right) is not a manifold since it is not Euclidean
in any neighborhood of the point where the figure-eight self-intersects.

Example 2.2 (The Euclidean space, $\mathbb{R}^n$). $\mathbb{R}^n$ with the standard topology is a Euclidean space. For any point $p \in \mathbb{R}^n$ and any open neighborhood $U_p$ of $p$, the homeomorphism $\phi_p$ is the identity map. In the remainder, whenever we discuss $\mathbb{R}^n$, we use these charts.

Example 2.3 (The positive definite cone, $S^n_{++}$). The set of $n \times n$ real symmetric matrices, $S^n$, with the standard topology is homeomorphic to $\mathbb{R}^{n(n+1)/2}$. Let $\{E_{ij}\}_{i,j=1}^{n}$ be the standard basis elements for $n \times n$ matrices and $\{e_i\}_{i=1}^{n(n+1)/2}$ be the standard basis elements for $\mathbb{R}^{n(n+1)/2}$. Let $\sigma$ be a bijective mapping from $\{ (i,j) \mid 1 \le i \le j \le n \}$ to $\{1, \ldots, n(n+1)/2\}$. For convenience, let us also set $\sigma((i,j)) := \sigma((j,i))$ if $i > j$. Then, the homeomorphism between $S^n$ and $\mathbb{R}^{n(n+1)/2}$ is given by

\[ \phi\Big( \sum_{i,j=1}^{n} \alpha_{ij} E_{ij} \Big) = \sum_{i,j=1}^{n} \alpha_{ij} \, e_{\sigma((i,j))}. \]

For any point $p \in S^n_{++}$ and open neighborhood $U_p$ of $p$, the homeomorphism $\phi_p$ is the restriction of $\phi$ to $U_p$. In the remainder, whenever we discuss $S^n_{++}$, we use these charts.
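The map from symmetric matrices to vectors in Example 2.3 can be sketched in a few lines. The code below follows a literal reading of the formula (each entry $\alpha_{ij}$ is sent to $e_{\sigma((i,j))}$, so an off-diagonal value is counted once for $(i,j)$ and once for $(j,i)$); the row-by-row enumeration used for $\sigma$ and the helper names `make_sigma`, `phi`, and `phi_inverse` are assumptions made here for illustration, and indices are 0-based.

```python
def make_sigma(n):
    # One possible bijection sigma: enumerate the pairs i <= j row by row (0-based).
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    return {pair: k for k, pair in enumerate(pairs)}

def phi(A):
    # Send a symmetric n x n matrix (list of rows) to a vector of length n(n+1)/2.
    n = len(A)
    sigma = make_sigma(n)
    v = [0.0] * (n * (n + 1) // 2)
    for i in range(n):
        for j in range(n):
            # sigma((i, j)) := sigma((j, i)) when i > j.
            v[sigma[(min(i, j), max(i, j))]] += A[i][j]
    return v

def phi_inverse(v, n):
    # Undo phi: diagonal entries are copied, off-diagonal sums are split evenly.
    sigma = make_sigma(n)
    A = [[0.0] * n for _ in range(n)]
    for (i, j), k in sigma.items():
        if i == j:
            A[i][i] = v[k]
        else:
            A[i][j] = A[j][i] = v[k] / 2.0
    return A

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
v = phi(A)
A_back = phi_inverse(v, 3)
print(v)
```

Both `phi` and `phi_inverse` are linear, hence continuous, which is what makes this bijection a homeomorphism.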

2.1 Differentiable manifolds
In the most general sense, a manifold is an abstract geometric object. Although it has interesting topological properties by itself, in various applications it becomes important to be able to do calculus over manifolds, just as in the Euclidean space. The first step towards doing that is the requirement that the different charts be consistent. If we have two charts corresponding to two neighborhoods of a point $p$, then these charts should agree on the region where they intersect. We can think of this as having two different maps for the same region: if we want to go from one place to another place that is covered in both maps, then it should not matter which map we are using. Furthermore, if we are following a smooth path in one of these maps, then the corresponding path in the other map should also be smooth. We formalize this intuition by defining differentiable manifolds.

Definition 2.4 (Differentiable manifold). Let $M$ be a manifold, and let $U_\alpha$ and $U_\beta$ be two open neighborhoods of $p \in M$ with charts $(U_\alpha, \psi_\alpha)$, $(U_\beta, \psi_\beta)$. There is a natural bijection $\psi_{\alpha\beta} = \psi_\beta \circ \psi_\alpha^{-1}$ between $\psi_\alpha(U_\alpha \cap U_\beta)$ and $\psi_\beta(U_\alpha \cap U_\beta)$ called the transition map. The manifold $M$ is called a differentiable manifold if all transition maps are differentiable. More generally, if all transition maps are $k$-times differentiable, then $M$ is a $C^k$-manifold. If all transition maps have derivatives of all orders, then $M$ is said to be a $C^\infty$-manifold or smooth manifold.

Figure 2: A visual representation of transition maps
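To make the notion of a transition map concrete, here is a minimal sketch on an example that is not part of the text: the unit circle in $\mathbb{R}^2$ with the two stereographic charts $\psi_N(x, y) = x/(1-y)$ (defined away from the north pole) and $\psi_S(x, y) = x/(1+y)$ (defined away from the south pole). Under these assumptions the transition map $\psi_S \circ \psi_N^{-1}$ is $u \mapsto 1/u$ on $u \neq 0$, which has derivatives of all orders there; the function names are illustrative.

```python
def psi_north(x, y):
    # Stereographic chart from the north pole (0, 1); undefined at that point.
    return x / (1.0 - y)

def psi_south(x, y):
    # Stereographic chart from the south pole (0, -1); undefined at that point.
    return x / (1.0 + y)

def psi_north_inverse(u):
    # Point of the unit circle whose north-pole coordinate is u.
    d = u * u + 1.0
    return (2.0 * u / d, (u * u - 1.0) / d)

def transition(u):
    # psi_south composed with the inverse of psi_north, defined for u != 0.
    return psi_south(*psi_north_inverse(u))

samples = [-3.0, -0.5, 0.25, 2.0]
# The chart inverse really inverts the chart ...
roundtrip_error = max(abs(psi_north(*psi_north_inverse(u)) - u) for u in samples)
# ... and the transition map agrees with u -> 1/u on the samples.
max_error = max(abs(transition(u) - 1.0 / u) for u in samples)
print(roundtrip_error, max_error)
```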

One of the main reasons we have defined differentiable manifolds is that we would like to move over the manifold. However, when we are moving over the manifold we cannot leave it. Depending on the manifold, there can be some directions that leave the manifold. For example, if we consider a $d$-dimensional manifold $M \subseteq \mathbb{R}^n$ with $d < n$, then for any $p \in M$, there should be at least one direction that leaves the manifold. If no such direction exists, then $M$ locally behaves like $\mathbb{R}^n$ and so it should be $n$-dimensional. We want to characterize the directions that allow us to stay in the manifold after an instantaneous movement in that direction.

Definition 2.5 (Tangent space). Let $M$ be a $d$-dimensional differentiable manifold and $p$ be a point in $M$. By definition, there exists a chart around $p$, $(U_p, \phi_p)$. The tangent space at $p$, $T_pM$, is the collection of directional derivatives of $f \circ \phi_p^{-1} : \phi_p(U_p) \to \mathbb{R}$ for any differentiable function $f : M \to \mathbb{R}$. The collection of $T_pM$ for $p$ in $M$ is called the tangent bundle and denoted by $TM$.

One can ask how we can compute the directional derivatives of $f \circ \phi_p^{-1}$. Observe that $V_p := \phi_p(U_p)$ is a subset of $\mathbb{R}^d$ by the definition of charts. Consequently, $f \circ \phi_p^{-1}$ is simply a real-valued differentiable multivariable function. Thus, its directional derivatives can be computed using standard multivariable calculus techniques.

Another important point is that we have defined a tangent space with respect to charts. Hence, we need to ensure that different selections of charts lead to the same definition. This can be seen by noting the consistency provided by the manifold being a differentiable manifold.

When we do calculus over differentiable manifolds, we often need to sum elements from a tangent space. This makes sense only if a tangent space is actually a vector space.

Proposition 2.6 (The tangent space is a vector space). Let $M$ be a differentiable manifold, and $p \in M$. Then, $T_pM$ is a vector space.

Proof. We need to verify two properties in order to show that $T_pM$ is a vector space: if $u, v \in T_pM$ and $c \in \mathbb{R}$, then $cu \in T_pM$ and $u + v \in T_pM$.

• If a vector $u \in T_pM$ is a directional derivative of $f \circ \phi_p^{-1}$ for some function $f$, then $cu$ is the directional derivative of $(cf) \circ \phi_p^{-1}$ for each $c \in \mathbb{R}$. Thus, $cu \in T_pM$.

• If another vector $v \in T_pM$ is a directional derivative of $f' \circ \phi_p^{-1}$ for some function $f'$, then $u + v$ is the directional derivative of $(f + f') \circ \phi_p^{-1}$. Thus, $u + v \in T_pM$.

Figure 3: The tangent space $T_pM$ to a manifold $M$ at the point $p \in M$.

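Example 2.7 (The Euclidean space, $\mathbb{R}^n$). Let $p \in \mathbb{R}^n$ and $U_\alpha$, $U_\beta$ be two open neighborhoods of $p$ with charts $(U_\alpha, \phi_\alpha)$ and $(U_\beta, \phi_\beta)$. As $\phi_\alpha$ and $\phi_\beta$ are identity maps, $\phi_{\alpha\beta} := \phi_\beta \circ \phi_\alpha^{-1}$ is also an identity map on $U_\alpha \cap U_\beta$. Since an identity map is smooth, $\mathbb{R}^n$ is a smooth manifold. The tangent space at $p \in \mathbb{R}^n$, $T_pM$, is $\mathbb{R}^n$ for each $p \in \mathbb{R}^n$. We can verify this by considering functions $f_i(u) := \langle e_i, u \rangle$ where $e_i$ is the $i$th standard basis element for $\mathbb{R}^n$.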

Example 2.8 (The positive definite cone, $S^n_{++}$). Let $P \in S^n_{++}$ and $U_\alpha$, $U_\beta$ be two open neighborhoods of $P$ with charts $(U_\alpha, \phi_\alpha)$ and $(U_\beta, \phi_\beta)$. As $\phi_\alpha$ and $\phi_\beta$ are restrictions of $\phi$ to $U_\alpha$ and $U_\beta$, the transition map $\phi_{\alpha\beta} := \phi_\beta \circ \phi_\alpha^{-1}$ is the identity map on $\phi(U_\alpha \cap U_\beta)$. Since an identity map is smooth, $S^n_{++}$ is a smooth manifold. The tangent space at $P \in S^n_{++}$, $T_PM$, is $\mathbb{R}^{n(n+1)/2}$, which is homeomorphic to $S^n$, for each $P \in S^n_{++}$. We can verify this by considering functions $f_i(u) := \langle e_i, u \rangle$ where $e_i$ is the $i$th standard basis element for $\mathbb{R}^{n(n+1)/2}$. To simplify our calculations we take the tangent spaces to be $S^n$, the set of all $n \times n$ symmetric matrices.

So far, we have defined differentiable manifolds and tangent spaces, which consist of the directions we can follow to move around the manifold. Now, we introduce vector fields to describe our movement over the manifold.

Definition 2.9 (Vector field). A vector field $X$ over a $d$-dimensional differentiable manifold $M$ is an assignment of tangent vectors to points in $M$. In other words, if $p \in M$, then $X(p)$ is a vector from $T_pM$. If $f$ is a smooth real-valued function on $M$, then multiplication of $f$ with $X$ is another vector field $fX$ defined as follows,

\[ (fX)(p) := f(p) X(p). \]

We say $X$ is a smooth vector field if, for any coordinate chart $\phi_p$ with basis elements $e_1, \ldots, e_d$ for $T_pM$, there are smooth functions $f_1, \ldots, f_d$ such that $X(q) = \sum_{i=1}^{d} f_i(\phi_p(q)) e_i$ for points $q$ in the neighborhood of $p$.

Let us consider the problem of moving from a point $p$ to another point $q$. If $p$ and $q$ are not close to each other, then there may be no chart covering both points. Consequently, it is not clear in which direction we should move from $p$ to reach $q$. We use vector fields to provide a complete description of a path, or curve, from $p$ to $q$. In a nutshell, a vector field tells us in which direction we need to move given our current position.
Lastly, we define a frame bundle: an ordered tuple of vector fields that forms an ordered basis for the tangent spaces.

Definition 2.10 (Frame bundle). Let $M$ be a $d$-dimensional smooth manifold. A frame bundle on $M$, $\{\partial_i\}_{i=1}^{d}$, is an ordered tuple of $d$ smooth vector fields, such that for any $p \in M$, $\{\partial_i(p)\}_{i=1}^{d}$ form a basis for $T_pM$.

We remark that, although we use the symbol $\partial_i$, which is the usual symbol for partial derivatives, frame bundles are different objects than partial derivatives.

2.2 Riemannian manifolds


Differentiable manifolds allow us to do the calculus that is necessary for optimization. However, as such, they lack quantitative notions such as distances and angles. For example, when we do optimization and are interested in optimal solutions of the problem rather than the optimal value, we need to approximate points. Without a measure of distance, we cannot argue about approximating points. Towards this end, we introduce Riemannian manifolds, differentiable manifolds on which one can measure distances, lengths, angles, etc. Before we introduce how to measure distance on a manifold, let us recall how we measure distance in the Euclidean space. Euclidean space is a metric space with metric $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $d(p, q) = \|p - q\|_2$. This metric gives the distance between $p$ and $q$, which is also the length of the straight line segment joining $p$ and $q$. If we examine this metric carefully, we notice that $d$ actually computes the norm of $q - p$, the direction we follow to reach $q$. This direction vector is a part of the tangent space at each point on the line segment joining $p$ to $q$. As we saw earlier, this tangent space is simply $\mathbb{R}^n$ itself. Thus, what we need is a way to define norms of vectors in a tangent space or, more generally, inner products between two vectors in a tangent space. This is what the metric tensor does.
Definition 2.11 (Metric tensor). Let $M \subseteq \mathbb{R}^n$ be a smooth manifold. A metric tensor over $M$ is a collection of functions $g_p : T_pM \times T_pM \to \mathbb{R}$ for $p \in M$ that satisfies the following conditions,
i. $g_p$ is a symmetric function: $g_p(u, v) = g_p(v, u)$ for all $u, v \in T_pM$,

ii. $g_p$ is linear in the first argument: $g_p(cu + v, w) = c\,g_p(u, w) + g_p(v, w)$ for all $u, v, w \in T_pM$ and $c \in \mathbb{R}$.
$g$ is called non-degenerate if, for a fixed $0 \neq u \in T_pM$, $g_p(u, v)$ is not identically zero; in other words, $\exists v \in T_pM$ such that $g_p(u, v) \neq 0$. $g$ is called positive definite if $g_p(v, v) \geq 0$ for $v \in T_pM$ and $g_p(v, v) = 0$ if and only if $v = 0$.
In the Euclidean space, the metric is the same for every tangent space, however, this is not required
for general manifolds. This definition allows us to assign arbitrary metrics to individual tangent
spaces. But this is neither intuitive nor useful. We know that tangent spaces change smoothly in a
small neighborhood around a point. As a result, tangent vectors also change smoothly in the same
neighborhood. Our expectation is that the distance between these tangent vectors also changes
smoothly. A class of metric tensors that satisfy this condition give rise to Riemannian manifolds.
Definition 2.12 (Riemannian manifold). Let $M$ be a smooth manifold and $g$ be a metric tensor over $M$. If for any open set $U$ and smooth vector fields $X$ and $Y$ over $U$, the function $g(X, Y)[p] := g_p(X_p, Y_p)$ is a smooth function of $p$, and $g$ is positive definite, then $(M, g)$ is called a Riemannian manifold.
If we relax the positive definiteness condition to just non-degeneracy, then one gets what is called a pseudo-Riemannian manifold. Pseudo-Riemannian manifolds arise in the theory of relativity and, although we focus only on Riemannian manifolds, most of the results can be extended to pseudo-Riemannian manifolds. One critical difference between pseudo-Riemannian manifolds and Riemannian manifolds is that the local norm of any tangent vector is non-negative in a Riemannian manifold, while the local norm of some tangent vectors can be negative on a pseudo-Riemannian manifold.
Example 2.13 (The Euclidean space, $\mathbb{R}^n$). The usual metric tensor over $\mathbb{R}^n$ is $g_p(u, v) := \langle u, v \rangle$ for $p, u, v \in \mathbb{R}^n$. $g_p$ is clearly a symmetric, bilinear, and positive definite function. It is also nondegenerate, as $\langle u, v \rangle = 0$ for every $v$ implies $\langle u, u \rangle = 0$ or equivalently $u = 0$. Next, we observe that $\mathbb{R}^n$ with $g$ is a Riemannian manifold. This follows from the observation that

\[ g(X, Y)[p] := g_p(X_p, Y_p) = \langle X_p, Y_p \rangle \]

is a smooth function of $p$, as $X$ and $Y$ are smooth vector fields. In the remainder, whenever we talk about $\mathbb{R}^n$ we use this metric tensor.
Example 2.14 (The positive orthant, $\mathbb{R}^n_{>0}$). We consider the metric tensor induced by the Hessian of the log-barrier function $-\sum_{i=1}^{n} \log(x_i)$. The induced metric tensor is $g_p(u, v) := \langle P^{-1}u, P^{-1}v \rangle$, where $P$ is the diagonal matrix whose diagonal entries are the coordinates of $p$, for $p \in \mathbb{R}^n_{>0}$ and $u, v \in \mathbb{R}^n$. $g_p$ is clearly a symmetric, bilinear, and positive definite function. It is also nondegenerate, as $\langle P^{-1}u, P^{-1}v \rangle = 0$ for every $v$ implies $\langle P^{-1}u, P^{-1}u \rangle = 0$ or equivalently $P^{-1}u = 0$. Since $P$ is a non-singular matrix, $P^{-1}u = 0$ is equivalent to $u = 0$. Next, we observe that $\mathbb{R}^n_{>0}$ with $g$ is a Riemannian manifold. This follows from the observation that

\[ g(X, Y)[p] := g_p(X_p, Y_p) = \langle P^{-1}X_p, P^{-1}Y_p \rangle = \sum_{i=1}^{n} \frac{X_p^i \, Y_p^i}{p_i^2} \]

is a smooth function of $p$, where $X_p := \sum_{i=1}^{n} X_p^i e_i$ and $Y_p := \sum_{i=1}^{n} Y_p^i e_i$ with $(e_i)_{i=1}^{n}$ the standard basis elements of $\mathbb{R}^n$. Here we used the fact that the tangent space of $\mathbb{R}^n_{>0}$ is $\mathbb{R}^n$. Now, $X_p^i$, $Y_p^i$, and $p_i^{-2}$ are all smooth functions of $p$. Thus the sum of their products, $g(X, Y)[p]$, is a smooth function of $p$. In the remainder of this note, whenever we talk about $\mathbb{R}^n_{>0}$ we use this metric tensor.
Remark 2.15 (Hessian manifold). In fact, when a metric arises as a Hessian of a convex function,
as in the example above, it is called a Hessian manifold.
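The relationship between the metric on the positive orthant and the Hessian of the barrier $-\sum_i \log x_i$ can be checked numerically. The sketch below is only an illustration: the point `p`, the tangent vectors `u` and `v`, and the finite-difference helper `second_directional_derivative` are choices made here, not part of the text.

```python
import math

def metric(p, u, v):
    # g_p(u, v) = <P^{-1} u, P^{-1} v> = sum_i u_i v_i / p_i^2
    return sum(ui * vi / (pi * pi) for pi, ui, vi in zip(p, u, v))

def log_barrier(x):
    # The barrier -sum_i log(x_i) on the positive orthant.
    return -sum(math.log(xi) for xi in x)

def second_directional_derivative(F, p, u, v, h=1e-4):
    # Central finite-difference estimate of d^2/(ds dt) F(p + s u + t v) at s = t = 0,
    # i.e. of the bilinear form u^T (Hessian of F at p) v.
    def shifted(s, t):
        return F([pi + s * ui + t * vi for pi, ui, vi in zip(p, u, v)])
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)

p = [1.0, 2.0, 0.5]      # a point of the positive orthant
u = [1.0, -1.0, 2.0]     # two tangent vectors at p
v = [0.5, 3.0, 1.0]

g_uv = metric(p, u, v)
g_uu = metric(p, u, u)
hessian_uv = second_directional_derivative(log_barrier, p, u, v)
print(g_uv, hessian_uv, g_uu)
```

The first two printed numbers agree up to finite-difference error, and `g_uu` is positive, as positive definiteness requires for a non-zero `u`.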
Example 2.16 (The positive definite cone, $S^n_{++}$). We consider the metric tensor induced by the Hessian of the log-barrier function $-\log\det(P)$ for a positive definite matrix $P$. The induced metric tensor is $g_P(U, V) := \mathrm{tr}[P^{-1} U P^{-1} V]$ for $P \in S^n_{++}$ and $U, V \in S^n$. $g_P$ is clearly a symmetric, bilinear, and positive definite function. It is also nondegenerate, as $\mathrm{tr}[P^{-1} U P^{-1} V] = 0$ for every $V$ implies

\[ \mathrm{tr}[P^{-1} U P^{-1} U] = \mathrm{tr}[P^{-1/2} U P^{-1/2} \, P^{-1/2} U P^{-1/2}] = 0 \]

or equivalently $P^{-1/2} U P^{-1/2} = 0$. Since $P$ is a non-singular matrix, $P^{-1/2} U P^{-1/2} = 0$ is equivalent to $U = 0$. Next, we observe that $S^n_{++}$ with $g$ is a Riemannian manifold. This follows from the observation that

\[ g(X, Y)[P] := g_P(X_P, Y_P) = \mathrm{tr}[P^{-1} X_P P^{-1} Y_P] \]

is a smooth function of $P$. Our observation is based on the fact that $P^{-1}$, $X_P$, and $Y_P$ are all smooth functions of $P$. Thus, their product $P^{-1} X_P P^{-1} Y_P$ is a smooth function of $P$. Finally, the trace, as a linear function, is smooth. Thus, $g(X, Y)[P]$ is a smooth function of $P$. In the remainder of this note, whenever we talk about $S^n_{++}$ we use this metric tensor.
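The trace metric on the positive definite cone can be explored in the same spirit. The following sketch works with $2 \times 2$ matrices stored as nested lists to stay dependency-free; the specific matrices `P`, `U`, `V` and all helper names are assumptions made for illustration. It evaluates $g_P(U, V) = \mathrm{tr}[P^{-1} U P^{-1} V]$ and compares it with a finite-difference estimate of the second directional derivative of $-\log\det$ at $P$.

```python
import math

def matmul(A, B):
    # Product of two small dense matrices stored as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(P):
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def metric(P, U, V):
    # g_P(U, V) = tr[P^{-1} U P^{-1} V]
    P_inv = inverse_2x2(P)
    return trace(matmul(matmul(P_inv, U), matmul(P_inv, V)))

def neg_log_det(P):
    # -log det(P) for a 2 x 2 matrix with positive determinant.
    return -math.log(P[0][0] * P[1][1] - P[0][1] * P[1][0])

def second_directional_derivative(F, P, U, V, h=1e-4):
    # Central finite-difference estimate of d^2/(ds dt) F(P + s U + t V) at s = t = 0.
    def shifted(s, t):
        return F([[P[i][j] + s * U[i][j] + t * V[i][j] for j in range(2)]
                  for i in range(2)])
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)

P = [[2.0, 1.0], [1.0, 2.0]]     # a positive definite base point
U = [[1.0, 0.0], [0.0, -1.0]]    # symmetric tangent directions
V = [[0.0, 1.0], [1.0, 3.0]]

g_uv = metric(P, U, V)
g_vu = metric(P, V, U)
g_uu = metric(P, U, U)
hessian_uv = second_directional_derivative(neg_log_det, P, U, V)
print(g_uv, g_vu, g_uu, hessian_uv)
```

Symmetry shows up as `g_uv` equal to `g_vu`, positivity as `g_uu > 0`, and the Hessian estimate matches `g_uv` up to finite-difference error.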

3 Differentiation on Manifolds
Consider the task of optimizing a given real valued function $f$ over a manifold. In a Euclidean space, traditional methods such as gradient descent or Newton’s method move from one point to another in the direction of the negative gradient of $f$ in order to minimize $f$. The performance of such methods also depends on how the derivative of $f$ with respect to a direction behaves. A smooth manifold, on the other hand, while it allows us to define smooth “curves”, does not tell us how to differentiate functions with respect to vector fields or, more generally, how to differentiate a vector field with respect to another. Differentiation inherently requires comparing the change in a function or vector value when we move from one point to another nearby point (in a given direction), but the fact that different points may have different charts poses a challenge in defining differentiation. Additionally, it turns out that there is no unique way to do differentiation on a smooth manifold; a differential structure that satisfies some natural properties that we would expect is referred to as an “affine connection”. In this section, we present the machinery from differential geometry that is required to understand how to differentiate over a differentiable manifold. On Riemannian manifolds, if we impose the additional constraint on the affine connection that it be compatible with the metric tensor, then this gives rise to the Levi-Civita connection, which is shown to be unique. This section is important for defining what it means for a curve on a manifold to be a geodesic or a straight line which, in turn, is important for defining geodesic convexity.

3.1 Differentiating a function on a manifold
Let us recall how one differentiates real-valued functions on the Euclidean space. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. The differential of $f$ at $x \in \mathbb{R}^n$, $df(x)$, is the unique vector that satisfies

\[ \lim_{h \to 0} \frac{f(x + h) - f(x) - \langle df(x), h \rangle}{\|h\|} = 0. \tag{2} \]

The direction vector $h$ in (2) is an arbitrary vector in $\mathbb{R}^n$. In the manifold setting, we cannot approach a given point from arbitrary directions. The directions are limited to the tangent space at that point. On the other hand, there exists a homeomorphism between the tangent space and some Euclidean space. Thus, we can apply the usual differentiation operator to functions over manifolds with slight modifications.

Definition 3.1 (Derivative of a function). Let $M$ be a differentiable manifold, $p \in M$, $T_pM$ be
the tangent space at $p$ and $(U_p, \varphi_p)$ be a chart around $p$. Let $f : M \to \mathbb{R}$ be a continuous function.
We say $f$ is differentiable at $p$ if there exists a linear operator $\lambda_p : T_pM \to \mathbb{R}$ such that
$$\lim_{h \to 0} \frac{f(\varphi_p(\varphi_p^{-1}(p) + h)) - f(p) - \lambda_p(h)}{\|h\|} = 0, \tag{3}$$
where $h$ varies over $T_pM$.

In the example manifolds we have discussed earlier, manifolds and their tangent spaces lie in a
common Euclidean space. Thus, we can drop the mappings $\varphi_p$ from (3). Consequently, differentiation
on $\mathbb{R}^n$ and $\mathbb{R}^n_{>0}$ is simply the usual differentiation.² On the other hand, we restrict the vectors $h$
in (2) to $\mathbb{S}^n$ while differentiating a function on $\mathbb{S}^n_{++}$.

3.2 Differentiating a function along a vector field on a manifold


Differentiation along a vector field is a generalization of partial derivatives from multivariable
calculus. The vector field gives a direction that is tangent to the manifold at every point on
the manifold. Differentiating f along the vector field X is equivalent to computing directional
derivatives of f at every point in the direction given by X.

Definition 3.2 (Derivative of a function along a vector field). Let $M$ be a differentiable
manifold, $X$ be a vector field over $M$, $p \in M$, $T_pM$ be the tangent space at $p$ and $(U_p, \varphi_p)$ be a
chart around $p$. Let $f : M \to \mathbb{R}$ be a differentiable function. We define the derivative of $f$ along $X$
at $p$ by
$$X(f)(p) := \lim_{h \to 0} \frac{f(\varphi_p(\varphi_p^{-1}(p) + hX(p))) - f(p)}{h}. \tag{4}$$
Similar to differentiation of a function on the manifold, we can drop the mappings $\varphi_p$ from (4) for
the example manifolds we have discussed so far. Thus, differentiation along a vector field on these
manifolds is exactly the same as taking the usual partial derivatives. The only difference is that at
any point, the vector field specifies the direction of the partial derivative.
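As a minimal numerical sketch of (4) in the case where the chart maps can be dropped, the snippet below estimates $X(f)(p)$ by a central difference. The function `f(x) = <1, log x>`, the vector field `X(p) = p`, and the helper name `derivative_along_field` are illustrative assumptions, not objects defined in the text. For these choices the exact value is $\sum_i p_i \cdot (1/p_i) = n$.

```python
import numpy as np

def derivative_along_field(f, X, p, h=1e-6):
    """Central-difference estimate of X(f)(p), the derivative of f at p in the direction X(p)."""
    p = np.asarray(p, dtype=float)
    v = X(p)  # tangent direction assigned to p by the vector field
    return (f(p + h * v) - f(p - h * v)) / (2.0 * h)

# Hypothetical example on the positive orthant: f(x) = sum_i log(x_i), X(p) = p.
f = lambda x: float(np.sum(np.log(x)))
X = lambda p: p

p = np.array([1.0, 0.5, 2.0])
value = derivative_along_field(f, X, p)  # exact answer is n = 3 for this f and X
```

The central difference is used only because it is slightly more accurate than the one-sided quotient in (4); both converge to the same limit for differentiable f.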
² Note, however, that this is not true if we were considering differentiation along a geodesic or an
arbitrary curve in the positive orthant example.

3.3 Affine connections and Christoffel symbols
We saw how to differentiate a function over a manifold as well as how to differentiate a function
along a given vector field. However, both of these definitions depend on the local charts and do not
tell us how to compare different curves between “far away” points. In order to compare different
curves, we need to be able to compare vector fields inducing these curves. For that reason, we
would like to measure the relative movement of a vector field Y with respect to another vector field
X. The idea is very similar to the second order partial derivatives, where we measure the change
in one component with respect to another component. Consequently, if we denote the collection of
vector fields over M by X(M ), then we would like to define an operator ∇ : X(M ) × X(M ) → X(M )
called the affine connection so that ∇ measures the relative movement between vector fields and
behaves like a differential operator. What we mean by behave like a differential operator is that it
should be bilinear and it should satisfy Leibniz’s rule.
Definition 3.3 (Affine connection). Let $M$ be a differentiable manifold and $\mathfrak{X}(M)$ be the collection
of vector fields over $M$. An operator $\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$ is called an affine connection if it
satisfies the following conditions:
• linearity in the first term: for any smooth functions $f, f' : M \to \mathbb{R}$ and $X, Y, Z \in \mathfrak{X}(M)$,
$$\nabla_{fX + f'Y} Z = f\nabla_X Z + f'\nabla_Y Z. \tag{5}$$
• linearity in the second term: for any $X, Y, Z \in \mathfrak{X}(M)$,
$$\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z. \tag{6}$$
• Leibniz’s rule: for any smooth function $f : M \to \mathbb{R}$ and $X, Y \in \mathfrak{X}(M)$,
$$\nabla_X (fY) = f\nabla_X Y + X(f)\,Y. \tag{7}$$

Such mappings are called “affine connections” because they allow one to relate tangent vectors at
two different points, in some sense “connecting” two points on the manifold. Once a frame is fixed,
an affine connection at a point $p$ is described by a linear operator over $T_pM \times T_pM \times T_pM$, and it can
be thought of as a $d \times d \times d$ tensor. Christoffel symbols of the second kind correspond to the entries of
these tensors.
Definition 3.4 (Christoffel symbols of the second kind). Let $M$ be a $d$-dimensional differentiable
manifold and $TM$ be its tangent bundle. Let us fix one frame $\{\partial_i\}_{i=1}^d$ for $M$. Given an
affine connection $\nabla$, define
$$\Gamma_{ij}^k := (\nabla_{\partial_i} \partial_j)^k, \tag{8}$$
the coefficient of $\nabla_{\partial_i} \partial_j$ in the direction of $\partial_k$. The coefficient $\Gamma_{ij}^k$ is called a Christoffel symbol of
the second kind.
Theorem 3.5 (Christoffel symbols determine the affine connection). The Christoffel sym-
bols of the second kind determine the affine connection ∇ uniquely.
Proof. Let $X, Y$ be two vector fields on $M$ and $\{\partial_i\}_{i=1}^d$ be a frame on $M$. Let us write $X$
as $\sum_{i=1}^d X^i \partial_i$ and $Y$ as $\sum_{i=1}^d Y^i \partial_i$. Then, $\nabla_X Y$ can be written as
$$\begin{aligned}
\nabla_X Y &\overset{(5)}{=} \sum_{i=1}^d X^i \nabla_{\partial_i} Y \\
&\overset{(6)}{=} \sum_{i,j=1}^d X^i \nabla_{\partial_i} (Y^j \partial_j) \\
&\overset{(7)}{=} \sum_{i,j=1}^d X^i Y^j \nabla_{\partial_i} \partial_j + X^i \partial_i(Y^j)\, \partial_j \\
&= \sum_{i,j,k=1}^d X^i Y^j \Gamma_{ij}^k \partial_k + \sum_{i,j=1}^d X^i \partial_i(Y^j)\, \partial_j \\
&= \sum_{k=1}^d \left( \sum_{i,j=1}^d X^i Y^j \Gamma_{ij}^k + \sum_{i=1}^d X^i \partial_i(Y^k) \right) \partial_k.
\end{aligned}$$
We note that $\sum_{i=1}^d X^i \partial_i(Y^k)$ depends neither on the affine connection nor on the Christoffel symbols
of the second kind for any $k = 1, \ldots, d$. It depends only on the vector fields $X$, $Y$, and the tangent
spaces induced by the topology over the manifold. Hence, given the Christoffel symbols of the
second kind, there is a unique affine connection corresponding to these symbols.

3.4 Derivative of a vector field along a smooth curve


We already saw how to compute the derivative of a function along a vector field as well as how to
differentiate a vector field with respect to another vector field. Now, we discuss how to compute
derivatives of a vector field along a curve. A useful application of this is that one can move a
tangent vector along the curve so that tangent vectors stay parallel with respect to the curve
(parallel transport), which we do not discuss in this exposition.
Let us consider a smooth curve $\gamma : I \to M$ on a differentiable manifold $M$ for an interval $I \subseteq \mathbb{R}$.
The derivative $\dot\gamma$ describes a tangent vector for each $\gamma(t) \in M$ for $t \in I$. However, $\dot\gamma$ may not correspond to a
vector field, in general. A vector field needs to assign a unique tangent vector at any given point.
However, if a smooth curve $\gamma : I \to M$ intersects itself, then the curve defines two tangent
vectors at the intersection. Unless these tangent vectors are the same, there cannot be a vector field
$X$ such that $\forall t \in I$, $X(\gamma(t)) = \dot\gamma(t)$. Although we cannot always find a vector field whose restriction to $\gamma$
is $\dot\gamma$ globally, we can still define differentiation with respect to $\gamma$ that is compatible with the affine
connection $\nabla$. The differentiation of a vector field $X$ with respect to a smooth curve $\gamma$ measures
the change of $X$ along $\dot\gamma$ and we denote it by $\nabla_{\dot\gamma} X$.
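Definition 3.6 (Covariant derivative). Let $M$ be a differentiable manifold and $\nabla$ be an affine
connection on $M$. Let $\gamma : I \to M$ be a smooth curve for some interval $I \subseteq \mathbb{R}$. We say $\nabla_{\dot\gamma}$ is a
covariant derivative if for any vector fields $X$ and $Y$ on $M$, real constants $\alpha, \beta$ and smooth function
$f : M \to \mathbb{R}$,
$$\nabla_{\dot\gamma} (\alpha X + \beta Y) = \alpha \nabla_{\dot\gamma} X + \beta \nabla_{\dot\gamma} Y \quad (\mathbb{R}\text{-linearity}) \tag{9}$$
$$\nabla_{\dot\gamma} (fX) = \dot{f} X + f \nabla_{\dot\gamma} X \quad \text{(Leibniz rule)} \tag{10}$$
and if there exists a vector field $Z$ such that $Z(\gamma(t)) = \dot\gamma(t)$, then
$$(\nabla_Z X)(\gamma(t)) = (\nabla_{\dot\gamma} X)(t). \tag{11}$$
If we fix the curve $\gamma$, then there exists a unique operator that satisfies conditions (9), (10), and (11).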
Proposition 3.7 (Uniqueness of the covariant derivative). Let M be a differentiable mani-
fold, ∇ be an affine connection on M and γ : I → M be a smooth curve for some interval I ⊆ R.
There is a unique covariant derivative corresponding to γ, denoted by ∇γ̇ .

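Proof. Let $\{\partial_i\}_{i=1}^d$ be a frame for $M$. We consider $\partial_i$ as a function of $t$, $\partial_i(t) := \partial_i(\gamma(t))$.
For a given vector field $X$, let us write $X := \sum_{i=1}^d X^i \partial_i$ and $\dot\gamma := \sum_{i=1}^d \dot\gamma_i \partial_i$. Then,
$$\begin{aligned}
\nabla_{\dot\gamma} X &= \nabla_{\dot\gamma} \sum_{i=1}^d X^i \partial_i \\
&\overset{(9)}{=} \sum_{i=1}^d \nabla_{\dot\gamma} (X^i \partial_i) \\
&\overset{(10)}{=} \sum_{i=1}^d \dot{X}^i \partial_i + X^i \nabla_{\dot\gamma} \partial_i \\
&\overset{(11)}{=} \sum_{i=1}^d \dot{X}^i \partial_i + X^i \nabla_{\sum_{j=1}^d \dot\gamma_j \partial_j} \partial_i \\
&\overset{(5)}{=} \sum_{i=1}^d \dot{X}^i \partial_i + \sum_{i,j=1}^d \dot\gamma_j X^i \nabla_{\partial_j} \partial_i \\
&= \sum_{k=1}^d \left( \dot{X}^k + \sum_{i,j=1}^d \dot\gamma_i X^j \Gamma_{ij}^k \right) \partial_k.
\end{aligned}$$
Thus, given an affine connection, there can be at most one covariant derivative corresponding to
$\gamma$. On the other hand, given $\nabla$, $\gamma$ and $X$, if we define $\nabla_{\dot\gamma} X$ as
$$\sum_{k=1}^d \left( \dot{X}^k + \sum_{i,j=1}^d \dot\gamma_i X^j \Gamma_{ij}^k \right) \partial_k$$
then it satisfies the conditions for the covariant derivative. Linearity clearly holds, and if $f : M \to \mathbb{R}$
is a smooth function, then
$$\begin{aligned}
\nabla_{\dot\gamma} (fX) &= \sum_{k=1}^d \left( \frac{d}{dt}(f X^k) + \sum_{i,j=1}^d \dot\gamma_i (fX)^j \Gamma_{ij}^k \right) \partial_k \\
&= \sum_{k=1}^d \left( \dot{f} X^k + f \dot{X}^k + f \sum_{i,j=1}^d \dot\gamma_i X^j \Gamma_{ij}^k \right) \partial_k \\
&= \dot{f} X + f \nabla_{\dot\gamma} X.
\end{aligned}$$
Hence, the covariant derivative exists.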

3.5 The Levi-Civita connection


Affine connections are very generic and there can be many of them. In fact, for any $d \times d \times d$ tensor,
there is a corresponding affine connection as stated in Theorem 3.5. In general, we do not want
to consider every affine connection but only those with “nice” properties. For a Riemannian manifold,
“compatibility” with the associated metric is one “nice” property we can ask for. By compatibility
we mean that the inner product between two vectors in tangent spaces should not change when we
move around the manifold. If we require the connection to satisfy another property called “torsion-freeness”,
then there is a unique connection that satisfies both properties. This connection is called
the Levi-Civita connection.
We define “torsion-freeness” using the Lie bracket of vector fields.

Definition 3.8 (Lie Bracket). Let $X$ and $Y$ be vector fields defined over a $d$-dimensional differentiable
manifold $M$. Let us fix a frame $\{\partial_i\}_{i=1}^d$ for $M$ and let us write $X = \sum_{i=1}^d X^i \partial_i$,
$Y = \sum_{i=1}^d Y^i \partial_i$. The Lie bracket of $X$ and $Y$, $[X, Y]$, is
$$[X, Y] = \sum_{i,j=1}^d (X^j \partial_j Y^i - Y^j \partial_j X^i)\, \partial_i.$$

Thus, we can think of a Lie bracket as a differential operator induced by two smooth vector fields.
Given a real-valued smooth function $f$ and vector fields $X, Y$, we apply the Lie bracket to $f$ as follows,
$$[X, Y](f) := \sum_{i,j=1}^d X^j \partial_j (Y^i \partial_i(f)) - Y^j \partial_j (X^i \partial_i(f)).$$
Linearity of this operator is immediate. With a quick calculation one can show that $[X, Y]$ satisfies
the Leibniz rule; in other words, for real-valued smooth functions $f$ and $g$,
$$[X, Y](fg) = f\,[X, Y](g) + [X, Y](f)\, g.$$

The torsion-free property requires that for every vector field pair X and Y , the difference between
the vector fields induced by deviation of Y along X and deviation of X along Y is equal to the
differential operator induced by these vector fields. Now, we can define the Levi-Civita connection
formally.

Definition 3.9 (Levi-Civita connection). Let $(M, g)$ be a Riemannian manifold. An affine
connection on $M$ is called a Levi-Civita connection if it is torsion-free,
$$\forall X, Y \in \mathfrak{X}(M), \quad \nabla_X Y - \nabla_Y X = [X, Y], \tag{12}$$
and it is compatible with the metric $g$,
$$\forall X, Y, Z \in \mathfrak{X}(M), \quad X(g(Y, Z)) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z). \tag{13}$$

The definition of the Levi-Civita connection states that it is any affine connection that is both torsion-free
and compatible with the metric. From this definition, it is hard to see whether the Levi-Civita connection
is unique or even exists. The fundamental theorem of Riemannian geometry asserts that the Levi-Civita
connection exists and is unique. This theorem also shows that the Christoffel symbols of the second
kind, and hence the Levi-Civita connection, are completely determined by the metric tensor. This
property is useful in deriving expressions for geodesics in several cases. In the scope of this exposition,
we focus on the uniqueness only. For the examples we have discussed, we present one Levi-Civita
connection, which is the only Levi-Civita connection by the theorem. The proof of this theorem
appears in Appendix A.

Theorem 3.10 (Fundamental Theorem of Riemannian Geometry). Let $(M, g)$ be a $d$-dimensional
Riemannian manifold. The Levi-Civita connection on $(M, g)$ exists and is unique.
Furthermore, its Christoffel symbols can be computed in terms of $g$. Let $g_{ij} := g(\partial_i, \partial_j)$, $G$ be the
$d \times d$ matrix whose entries are $g_{ij}$, and $g^{ij}$ be the $ij$th entry of $G^{-1}$. The Christoffel symbols of
the Levi-Civita connection are given by the following formula,
$$\Gamma_{ij}^k := \frac{1}{2} \sum_{l=1}^d g^{kl} (\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$$
for $i, j, k = 1, \ldots, d$.
Example 3.11 (The Euclidean space, $\mathbb{R}^n$). Let us fix $\partial_i(p)$ as $e_i$, the $i$th standard basis element
of $\mathbb{R}^n$. We can do this as $T_pM$ is $\mathbb{R}^n$ for any $p \in \mathbb{R}^n$. Also, $g_{ij}(p) := g_p(e_i, e_j) = \delta_{ij}$ where $\delta$ is the
Kronecker delta. Since $g_{ij}$ is a constant function, $\partial_k g_{ij}$, the directional derivative of $g_{ij}$ along $e_k$, is
0. Consequently, $\Gamma_{ij}^k = 0$ for any $i, j, k = 1, \ldots, n$.
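Example 3.12 (The positive orthant, $\mathbb{R}^n_{>0}$). Let us fix $\partial_i(p)$ as $e_i$, the $i$th standard basis element
of $\mathbb{R}^n$. We can do this as $T_pM$ is $\mathbb{R}^n$ for any $p \in \mathbb{R}^n_{>0}$. Also, $g_{ij}(p) := g_p(e_i, e_j) = \delta_{ij} p_i^{-2}$ where $\delta$ is the
Kronecker delta. Consequently, $G(p) = P^{-2}$ and $G(p)^{-1} = P^2$, where $P$ is the diagonal matrix with diagonal
entries $p$. Equivalently, $g^{ij}(p) = \delta_{ij} p_i^2$. The
directional derivative of $g_{ij}$ along $e_k$ at $p$, $\partial_k g_{ij}(p)$, is $-2\delta_{ij}\delta_{ik} p_i^{-3}$. Consequently, $\Gamma_{ii}^i(p) = -p_i^{-1}$ for
$i = 1, \ldots, n$ and $\Gamma_{ij}^k = 0$ unless $i$, $j$, and $k$ are all the same.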
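The formula of Theorem 3.10 can be checked numerically. The sketch below approximates the partial derivatives of the metric by central differences and evaluates the Christoffel symbols for the positive orthant metric $g_{ij}(p) = \delta_{ij} p_i^{-2}$. The helper names `christoffel` and `orthant_metric`, the step size, and the test point are illustrative assumptions. For this metric the formula gives $\Gamma_{ii}^i(p) = -1/p_i$, the value used in Example 4.8, and zero for all other index combinations.

```python
import numpy as np

def christoffel(metric, p, h=1e-5):
    """Return gamma[k, i, j] = 1/2 * sum_l g^{kl} (d_i g_{jl} + d_j g_{il} - d_l g_{ij}) at p."""
    p = np.asarray(p, dtype=float)
    d = p.size
    # dg[l, i, j] approximates the partial derivative of g_{ij} along e_l.
    dg = np.zeros((d, d, d))
    for l in range(d):
        e = np.zeros(d)
        e[l] = h
        dg[l] = (metric(p + e) - metric(p - e)) / (2.0 * h)
    g_inv = np.linalg.inv(metric(p))
    gamma = np.zeros((d, d, d))
    for k in range(d):
        for i in range(d):
            for j in range(d):
                gamma[k, i, j] = 0.5 * sum(
                    g_inv[k, l] * (dg[i, j, l] + dg[j, i, l] - dg[l, i, j])
                    for l in range(d)
                )
    return gamma

def orthant_metric(p):
    """Metric matrix G(p) = diag(p_i^{-2}) of the positive orthant example."""
    return np.diag(p ** -2.0)

p = np.array([1.0, 0.5, 2.0])
gamma = christoffel(orthant_metric, p)  # gamma[i, i, i] is approximately -1 / p[i]
```

Passing `lambda p: np.eye(p.size)` as the metric instead returns all zeros up to rounding, in line with Example 3.11.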

4 Geodesics
4.1 Geodesics as straight lines on a manifold
Geometrically, geodesics are curves whose tangent vectors remain parallel to the curve with respect
to the affine connection.
Definition 4.1 (Geodesics on a differentiable manifold). Let $M$ be a differentiable manifold
and $\nabla$ be an affine connection on $M$. Let $\gamma : I \to M$ be a smooth curve on $M$ where $I \subseteq \mathbb{R}$ is an
interval. $\gamma$ is a geodesic on $M$ if
$$\nabla_{\dot\gamma} \dot\gamma = 0$$
where $\dot\gamma := \frac{d\gamma(t)}{dt}$.

We already saw that one can specify an affine connection in terms of its Christoffel symbols.
Similarly, we can also describe geodesics using Christoffel symbols. This characterization is useful
when we want to compute geodesics when the affine connection is given in terms of its Christoffel
symbols. An important example is the Levi-Civita connection whose Christoffel symbols can be
derived from the metric tensor.
Proposition 4.2 (Geodesic equations in terms of Christoffel symbols). Let $M$ be a $d$-dimensional
differentiable manifold, $\{\partial_i\}_{i=1}^d$ be a frame of $M$, $\nabla$ be an affine connection on $M$
with Christoffel symbols $\Gamma_{ij}^k$ for $i, j, k = 1, \ldots, d$. If $\gamma$ is a geodesic on $M$ with respect to $\nabla$ and
$\dot\gamma = \sum_{i=1}^d \dot\gamma_i \partial_i$, then $\gamma$ satisfies the following differential equation,
$$\sum_{k=1}^d \left( \sum_{i,j=1}^d \dot\gamma_i \dot\gamma_j \Gamma_{ij}^k + \ddot\gamma_k \right) \partial_k = 0. \tag{14}$$

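Proof. Let us start by writing the condition $\nabla_{\dot\gamma} \dot\gamma = 0$ in terms of Christoffel symbols.
$$\begin{aligned}
0 &= \nabla_{\dot\gamma} \dot\gamma \\
&\overset{(9)}{=} \sum_{i=1}^d \nabla_{\dot\gamma} (\dot\gamma_i \partial_i) \\
&\overset{(10)}{=} \sum_{i=1}^d \ddot\gamma_i \partial_i + \dot\gamma_i \nabla_{\dot\gamma} \partial_i \\
&\overset{(5)}{=} \sum_{i,j=1}^d \dot\gamma_i \dot\gamma_j \nabla_{\partial_j} \partial_i + \sum_{i=1}^d \ddot\gamma_i \partial_i \\
&\overset{(8)}{=} \sum_{i,j,k=1}^d \dot\gamma_i \dot\gamma_j \Gamma_{ij}^k \partial_k + \sum_{i=1}^d \ddot\gamma_i \partial_i.
\end{aligned} \tag{15}$$
Consequently, (15) reduces to
$$0 = \sum_{i,j,k=1}^d \dot\gamma_i \dot\gamma_j \Gamma_{ij}^k \partial_k + \sum_{i=1}^d \ddot\gamma_i \partial_i.$$
Rearranging this equation, we see that $\gamma$ satisfies (14).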

4.2 Geodesics as length minimizing curves on a Riemannian manifold


Since the metric tensor allows us to measure distances on a Riemannian manifold, there is an
alternative, and sometimes useful, way of defining geodesics on it: as length minimizing curves.
Before we can define a geodesic in this manner, we need to define the length of a curve on a
Riemannian manifold. This gives rise to a notion of distance between two points as the minimum
length of a curve that joins these points. Using the metric tensor we can measure the instantaneous
length of a given curve. Integrating along the vector field induced by its derivative, we can measure
the length of the curve.
Definition 4.3 (Length of a curve on a Riemannian manifold). Let $(M, g)$ be a Riemannian
manifold and $\gamma : I \to M$ be a smooth curve on $M$ where $I \subseteq \mathbb{R}$ is an interval. We define the length
function $L$ as follows:
$$L[\gamma] := \int_I \sqrt{g_\gamma(\dot\gamma, \dot\gamma)}\, dt. \tag{16}$$

If we ignore the local metric, then this is simply the line integral used for measuring the length
of curves in the Euclidean spaces. On the other hand, the local metric on Euclidean spaces is
simply the 2-norm. Thus, this definition is a natural generalization of line integrals to Riemannian
manifolds. A notion related to the length of a curve, but mathematically more convenient, is defined in
terms of the “work” done by a particle moving along the curve. To define this work, we first need
to define the energy function as
$$E(\gamma, \dot\gamma, t) := g_{\gamma(t)}(\dot\gamma(t), \dot\gamma(t)),$$
which is just the square of the term in the integral above. Given the energy at each time, the work
done is the integral of the energy over the curve:
$$S[\gamma] = \int_0^1 E(\gamma, \dot\gamma, t)\, dt. \tag{17}$$
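To make the length functional (16) concrete, the following sketch approximates $L[\gamma]$ on a uniform grid using finite-difference velocities and the trapezoid rule. It compares two curves joining $(1, 0.5)$ to $(0.5, 1)$ in the positive orthant with the metric $G(x) = \mathrm{diag}(x_i^{-2})$: the Euclidean segment, and the curve $p_i (q_i/p_i)^t$ that Example 4.8 identifies as the geodesic. The function names, grid size, and endpoints are illustrative assumptions.

```python
import numpy as np

def curve_length(curve, metric, n=2001):
    """Approximate L[gamma] = int_0^1 sqrt(g_gamma(gamma', gamma')) dt on a uniform grid."""
    t = np.linspace(0.0, 1.0, n)
    pts = np.array([curve(s) for s in t])
    vel = np.gradient(pts, t, axis=0)  # finite-difference velocity at each grid point
    speed = np.array([np.sqrt(v @ metric(x) @ v) for x, v in zip(pts, vel)])
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t)))  # trapezoid rule

def orthant_metric(x):
    """Metric matrix diag(x_i^{-2}) of the positive orthant example."""
    return np.diag(x ** -2.0)

p = np.array([1.0, 0.5])
q = np.array([0.5, 1.0])
straight = lambda t: (1.0 - t) * p + t * q   # Euclidean segment from p to q
curved = lambda t: p * (q / p) ** t          # candidate geodesic from Example 4.8

len_straight = curve_length(straight, orthant_metric)
len_curved = curve_length(curved, orthant_metric)
```

Under this metric the curved path has constant speed $\|\log q - \log p\|_2$, so `len_curved` should be close to $\sqrt{2}\,\log 2$, and it comes out shorter than `len_straight`.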

Definition 4.4 (Geodesics on a Riemannian manifold). Let $(M, g)$ be a Riemannian manifold
and $p, q \in M$. Let $\Gamma$ be the set of all smooth curves joining $p$ to $q$ defined from $I$ to $M$ where $I \subseteq \mathbb{R}$
is an interval. $\gamma' \in \Gamma$ is a geodesic on $M$ if $\gamma'$ is a minimizer of
$$\inf_{\gamma \in \Gamma} L[\gamma]$$
where $L$ is the length function defined in (16). Equivalently, $\gamma'$ minimizes $S[\gamma]$.


The definition above characterizes a geodesic on a Riemannian manifold as a solution to an optimization
problem, which is not very useful as such for computing geodesics. We can recover the dynamics of this curve by
using a result from the calculus of variations, the Euler-Lagrange equations, to characterize optimal
solutions of this optimization problem.
Theorem 4.5 (Euler-Lagrange equations for geodesics). Let $(M, g)$ be a Riemannian manifold.
Let $p$ and $q$ be two points on $M$, $\gamma' : [0, 1] \to M$ be the geodesic that joins $p$ to $q$ and
minimizes $S[\gamma]$. Then $\gamma'$ satisfies the following differential equations:
$$\frac{d}{dt} \frac{\partial E}{\partial \dot\gamma}(\gamma') - \frac{\partial E}{\partial \gamma}(\gamma') = 0.$$
We derive the Euler-Lagrange equations and prove this theorem in Appendix B.

4.3 Equivalence of the two notions for Riemannian Manifolds


We presented two different definitions of geodesics, one for smooth manifolds with a given affine
connection and one for Riemannian manifolds in terms of the metric. As shown earlier, the metric
of a Riemannian manifold determines a unique affine connection, the Levi-Civita connection. We
now show that every curve that is a geodesic in the second sense is also a geodesic in the first.
Theorem 4.6. Let (M, g) be a d-dimensional Riemannian manifold and ∇ be its Levi-Civita
connection. Let γ : I → M be a smooth curve joining p to q. If γ is a minimizer of the work
function defined in (17), then
∇γ̇ γ̇ = 0.

We present the proof of this theorem in Appendix C. An immediate corollary of Theorem 4.5
and Theorem 4.6 is that length minimizing curves are also geodesics in the sense of differentiable
manifolds. The other direction is not necessarily true. In a given Riemannian manifold, there can
be geodesics whose length is greater than the distance between the points they join.

4.4 Examples
Example 4.7 (The Euclidean space, $\mathbb{R}^n$). We do a sanity check and show that geodesics in $\mathbb{R}^n$
are indeed straight lines. We have already shown that the Christoffel symbols are 0; see Example 3.11.
Hence, (14) simplifies to
$$\sum_{i=1}^n \ddot\gamma_i \partial_i = 0.$$
Since $\{\partial_i\}_{i=1}^n$ are linearly independent, for each $i = 1, \ldots, n$,
$$\ddot\gamma_i = 0.$$
This is simply the equation of a line. Therefore, geodesics in $\mathbb{R}^n$ are straight lines.

Example 4.8 (The positive orthant, $\mathbb{R}^n_{>0}$). We show that the geodesic that joins $p$ to $q$ in $\mathbb{R}^n_{>0}$
endowed with the metric corresponding to the Hessian of the log-barrier function can be parameterized
as follows,
$$\gamma(t) = (p_1 (q_1/p_1)^t, \ldots, p_n (q_n/p_n)^t).$$
We know that the Christoffel symbols are $\Gamma_{ii}^i(p) = -p_i^{-1}$, and $\Gamma_{ij}^k(p) = 0$ for $p \in \mathbb{R}^n_{>0}$ and $i, j, k =
1, \ldots, n$ such that $i$, $j$, and $k$ are not all the same; see Example 3.12. Hence, (14) simplifies to
$$0 = \sum_{i=1}^n (-\gamma_i^{-1} \dot\gamma_i^2 + \ddot\gamma_i)\, \partial_i.$$
Thus,
$$0 = \ddot\gamma_i - \dot\gamma_i^2 \gamma_i^{-1}$$
or equivalently
$$\frac{d}{dt} \log(\dot\gamma_i) = \frac{d}{dt} \log(\gamma_i).$$
Consequently,
$$\log(\dot\gamma_i) = \log(\gamma_i) + c_i$$
for some constant $c_i$. Equivalently,
$$\frac{d}{dt} \log(\gamma_i) = e^{c_i}.$$
Therefore,
$$\log(\gamma_i) = \alpha_i t + \beta_i$$
for some constants $\alpha_i$ and $\beta_i$. If we take $\beta_i$ as $\log p_i$ and $\alpha_i + \beta_i$ as $\log q_i$, so that $\gamma(0) = p$
and $\gamma(1) = q$, then $\gamma$ becomes
$$\gamma(t) = (p_1 (q_1/p_1)^t, \ldots, p_n (q_n/p_n)^t).$$
Fig. 4 visualizes a geodesic on $\mathbb{R}^2_{>0}$ and $\mathbb{R}^2$.

Figure 4: Two geodesics joining (1.0, 0.5) to (0.5, 1.0) on the smooth manifold $\mathbb{R}^2_{>0}$. The blue
curve corresponds to the geodesic with respect to the metric tensor $g_p(u, v) := \langle u, v \rangle$. The green curve
corresponds to the geodesic with respect to the metric tensor $g_p(u, v) := \langle P^{-1}u, P^{-1}v \rangle$ where $P$ is a
diagonal matrix whose diagonal entries are $p$.
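The closed form derived in Example 4.8 can be sanity-checked numerically. The sketch below evaluates the residual $\ddot\gamma_i - \dot\gamma_i^2 / \gamma_i$ of the simplified geodesic equation using central differences. The function names, step size, and sample times are illustrative assumptions; the endpoints are those of Figure 4.

```python
import numpy as np

def orthant_geodesic(p, q, t):
    """Closed form gamma_i(t) = p_i * (q_i / p_i)^t from Example 4.8."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return p * (q / p) ** t

def geodesic_residual(curve, t, h=1e-4):
    """Finite-difference estimate of gamma'' - (gamma')^2 / gamma, coordinate by coordinate."""
    g0 = curve(t)
    vel = (curve(t + h) - curve(t - h)) / (2.0 * h)
    acc = (curve(t + h) - 2.0 * g0 + curve(t - h)) / h ** 2
    return acc - vel ** 2 / g0

p = np.array([1.0, 0.5])
q = np.array([0.5, 1.0])
geodesic = lambda t: orthant_geodesic(p, q, t)
segment = lambda t: (1.0 - t) * p + t * q   # Euclidean straight line, for contrast

residual_geodesic = max(np.max(np.abs(geodesic_residual(geodesic, t))) for t in (0.2, 0.5, 0.8))
residual_segment = max(np.max(np.abs(geodesic_residual(segment, t))) for t in (0.2, 0.5, 0.8))
```

The residual of the closed-form curve vanishes up to discretization error, while the Euclidean segment leaves a residual of order one, so it is not a geodesic for this metric.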

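Example 4.9 (The positive definite cone, $\mathbb{S}^n_{++}$). We show that the geodesic with respect to
the Hessian of the logdet metric that joins $P$ to $Q$ on $\mathbb{S}^n_{++}$ can be parameterized as follows:
$$\gamma(t) := P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}. \tag{18}$$
Thus, $\gamma(0) = P$ and $\gamma(1) = Q$. Instead of using the straight-line formulation, here it turns out
to be a bit easier to use the Euler-Lagrange characterization of geodesics to derive a formula. Let
$\{e_i\}_{i=1}^n$ be the standard basis for $\mathbb{R}^n$ and
$$E_{ij} := \begin{cases} e_i e_j^\top + e_j e_i^\top, & \text{if } i \neq j, \\ e_i e_i^\top, & \text{if } i = j. \end{cases}$$
$\{E_{ij}\}_{1 \le i \le j \le n}$ is a basis for the set of symmetric matrices $\mathbb{S}^n$, the tangent space. Let $\gamma(t)$ be a
minimum length curve that joins $P$ to $Q$. If we write $\gamma(t)$ as
$$\sum_{1 \le i \le j \le n} \gamma_{ij}(t) E_{ij}$$
then
$$\frac{\partial \gamma(t)}{\partial \gamma_{ij}(t)} = E_{ij}$$
for $1 \le i \le j \le n$. Let us define $E(t)$ as follows,
$$E(t) := \frac{1}{2} \operatorname{tr}[\gamma(t)^{-1} \dot\gamma(t) \gamma(t)^{-1} \dot\gamma(t)]$$
and notice that this definition of $E(t)$ agrees, up to the constant factor $1/2$ (which does not change the
minimizers), with the energy function $E$ used in Theorem 4.5. Let us
compute the partial derivatives of $E$ before applying Theorem 4.5.
$$\begin{aligned}
\frac{\partial E}{\partial \gamma_{ij}} &= \frac{1}{2} \operatorname{tr}\left[ \frac{\partial \gamma^{-1}}{\partial \gamma_{ij}} \dot\gamma \gamma^{-1} \dot\gamma + \gamma^{-1} \frac{\partial \dot\gamma}{\partial \gamma_{ij}} \gamma^{-1} \dot\gamma + \gamma^{-1} \dot\gamma \frac{\partial \gamma^{-1}}{\partial \gamma_{ij}} \dot\gamma + \gamma^{-1} \dot\gamma \gamma^{-1} \frac{\partial \dot\gamma}{\partial \gamma_{ij}} \right] \\
&= -\operatorname{tr}[\gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma] \\
\frac{\partial E}{\partial \dot\gamma_{ij}} &= \frac{1}{2} \operatorname{tr}\left[ \frac{\partial \gamma^{-1}}{\partial \dot\gamma_{ij}} \dot\gamma \gamma^{-1} \dot\gamma + \gamma^{-1} \frac{\partial \dot\gamma}{\partial \dot\gamma_{ij}} \gamma^{-1} \dot\gamma + \gamma^{-1} \dot\gamma \frac{\partial \gamma^{-1}}{\partial \dot\gamma_{ij}} \dot\gamma + \gamma^{-1} \dot\gamma \gamma^{-1} \frac{\partial \dot\gamma}{\partial \dot\gamma_{ij}} \right] \\
&= \operatorname{tr}[\gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma] \\
\frac{d}{dt} \frac{\partial E}{\partial \dot\gamma_{ij}} &= \frac{d}{dt} \operatorname{tr}[\gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma] \\
&= \operatorname{tr}[-\gamma^{-1} \dot\gamma \gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma - \gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma + \gamma^{-1} E_{ij} \gamma^{-1} \ddot\gamma] \\
&= \operatorname{tr}[E_{ij} (\gamma^{-1} \ddot\gamma \gamma^{-1} - 2\gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1})]
\end{aligned}$$
for $1 \le i \le j \le n$. Thus,
$$\operatorname{tr}[E_{ij} (\gamma^{-1} \ddot\gamma \gamma^{-1} - 2\gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1})] = -\operatorname{tr}[\gamma^{-1} E_{ij} \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma],$$
or equivalently
$$\left\langle \gamma^{-1} \ddot\gamma \gamma^{-1} - \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1},\, E_{ij} \right\rangle_F = 0$$
for $1 \le i \le j \le n$. Since $\{E_{ij}\}$ forms a basis for $\mathbb{S}^n$, this implies that
$$\gamma^{-1} \ddot\gamma \gamma^{-1} - \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1} = 0,$$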

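or equivalently
$$0 = \ddot\gamma \gamma^{-1} - \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1} = \frac{d}{dt} \left( \dot\gamma \gamma^{-1} \right).$$
Hence,
$$\dot\gamma \gamma^{-1} = C$$
for some constant real matrix $C$. Equivalently,
$$\dot\gamma = C\gamma. \tag{19}$$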

If $\gamma$ were a real-valued function and $C$ a real constant, then (19) would be a first order linear
differential equation whose solution is $\gamma(t) = d \exp(ct)$ for some constant $d$. Using this intuition,
our guess for $\gamma(t)$ is $\exp(tC)D$ for some constant matrix $D$. We can see that this is a solution
by plugging it into (19), but we need to show that this is the unique solution to this differential
equation. Let us define $\phi(t) := \exp(-tC)\gamma(t)$ and compute $\dot\phi$,
$$\dot\phi(t) = -C \exp(-tC)\gamma(t) + \exp(-tC)\dot\gamma(t) = -\exp(-tC)C\gamma(t) + \exp(-tC)C\gamma(t) = 0.$$
Here, we used the fact that $\exp(-tC)$ and $C$ commute. Thus $\phi(t) = D$ for some constant matrix $D$.
Hence, the unique solution of (19) is $\gamma(t) = \exp(tC)D$. Now, let us notice that both $\gamma$ and $\dot\gamma$ are
symmetric. In other words,
$$\dot\gamma = \exp(tC)CD = C\exp(tC)D = C\gamma(t) = \gamma(t)C^\top = \exp(tC)DC^\top.$$
In other words, $CD = DC^\top$. Now, let us consider the boundary conditions,
$$P = \gamma(0) = \exp(0 \cdot C)D = D,$$
and write $C$ as $P^{1/2} S P^{-1/2}$. Such a matrix $S$ exists as $P$ is invertible. Thus, the $CD = DC^\top$
condition implies
$$P^{1/2} S P^{1/2} = P^{1/2} S^\top P^{1/2}$$
or equivalently $S = S^\top$. Thus, $C = P^{1/2} S P^{-1/2}$ for some symmetric matrix $S$. Hence,
$$\gamma(t) = \exp(tC)D = \left( \sum_{k=0}^\infty \frac{t^k (P^{1/2} S P^{-1/2})^k}{k!} \right) P = P^{1/2} \left( \sum_{k=0}^\infty \frac{t^k S^k}{k!} \right) P^{1/2} = P^{1/2} \exp(tS) P^{1/2}.$$
Finally, considering $\gamma(1) = Q$, we get
$$S = \log(P^{-1/2} Q P^{-1/2}).$$
Therefore, the geodesic joining $P$ to $Q$ is
$$\gamma(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}.$$
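The formula (18) is straightforward to evaluate with an eigendecomposition. The sketch below does so and checks two facts from the derivation: the endpoints are $P$ and $Q$, and $\dot\gamma \gamma^{-1}$ is the same constant matrix $C$ at different times. The helper names `spd_power`, `spd_geodesic`, and `velocity_ratio`, and the two test matrices, are illustrative assumptions.

```python
import numpy as np

def spd_power(A, t):
    """A^t for a symmetric positive definite matrix A, via A = V diag(w) V^T."""
    w, V = np.linalg.eigh(A)
    return (V * w ** t) @ V.T

def spd_geodesic(P, Q, t):
    """gamma(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}, as in (18)."""
    P_half = spd_power(P, 0.5)
    P_inv_half = spd_power(P, -0.5)
    M = P_inv_half @ Q @ P_inv_half
    M = 0.5 * (M + M.T)  # remove rounding asymmetry before calling eigh
    return P_half @ spd_power(M, t) @ P_half

def velocity_ratio(P, Q, t, h=1e-5):
    """Finite-difference estimate of gamma'(t) gamma(t)^{-1}, which (19) says is constant."""
    vel = (spd_geodesic(P, Q, t + h) - spd_geodesic(P, Q, t - h)) / (2.0 * h)
    return vel @ np.linalg.inv(spd_geodesic(P, Q, t))

# Hypothetical positive definite endpoints.
P = np.array([[2.0, 1.0], [1.0, 2.0]])
Q = np.array([[3.0, 0.0], [0.0, 1.0]])

midpoint = spd_geodesic(P, Q, 0.5)
C_early = velocity_ratio(P, Q, 0.3)
C_late = velocity_ratio(P, Q, 0.7)
```

The eigendecomposition route is chosen here only to keep the example dependent on NumPy alone; a matrix-function routine from another library would serve equally well.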

5 Geodesic Convexity
With all the machinery and the definitions of geodesics behind us, we are now ready to define
geodesic convexity. We refer the reader to the book [Udr94] for an extensive treatment on geodesic
convexity.

5.1 Totally convex sets
Definition 5.1 (Totally (geodesically) convex set). Let (M, g) be a Riemannian manifold. A
set K ⊆ M is said to be totally convex with respect to g, if for any p, q ∈ K, any geodesic γpq that
joins p to q lies entirely in K.

Totally convex sets are a generalization of convex sets. In the Euclidean case, there is a unique
geodesic joining points $p$ and $q$, namely the straight line segment between $p$ and $q$. Consequently,
totally convex sets and convex sets are the same in the Euclidean case. One can relax the definition of
totally convex sets by requiring only that the geodesics that minimize the distance between points
lie in the set. These sets are called geodesically convex sets. If there is a unique geodesic joining each
pair of points, then the two definitions coincide. However, in general, total convexity is more restrictive.
This can be seen from the following example of the unit sphere. On $S^n := \{ x \in \mathbb{R}^{n+1} \mid \|x\|_2 = 1 \}$
with the metric induced by the Euclidean norm, there are at least two geodesics joining any
two given points $p, q \in S^n$. These geodesics are the long and short arcs between $p$ and $q$ on a great circle
passing through both $p$ and $q$. A set on $S^n$ is geodesically convex if short arcs are part of the set.
On the other hand, a set on $S^n$ is totally convex if both short and long arcs are part of the set.

Example 5.2 (A non-convex but totally convex set). Let us consider the Riemannian manifold
on $\mathbb{S}^n_{++}$ and a positive number $c \in \mathbb{R}_+$. Let $D_c := \{ P \in \mathbb{S}^n_{++} \mid \det(P) = c \}$. One can easily
verify that $D_c$ is a non-convex set. On the other hand, if $P$ and $Q$ are two points in $D_c$, then the
geodesic joining $P$ to $Q$, $\gamma_{PQ}$, can be parameterized as follows
$$\gamma_{PQ}(t) := P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2},$$
see Example 4.9. Now, let us verify that $\forall t \in [0, 1]$, $\gamma_{PQ}(t) \in D_c$ or equivalently $\det(\gamma_{PQ}(t)) = c$.
$$\begin{aligned}
\det(\gamma_{PQ}(t)) &= \det(P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}) \\
&= \det(P)^{1/2} \left( \det(P)^{-1/2} \det(Q) \det(P)^{-1/2} \right)^t \det(P)^{1/2} \\
&= \det(P)^{1-t} \det(Q)^t \\
&= c^{1-t} c^t = c.
\end{aligned}$$
Therefore, $D_c$ is a totally convex but non-convex subset of $\mathbb{S}^n_{++}$.
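A small numerical illustration of Example 5.2, under assumed test matrices: the sketch below takes two matrices with determinant 1, evaluates the determinant along the geodesic of Example 4.9, and compares it with the determinant of the Euclidean midpoint. The helper names and the matrices `P`, `Q` are hypothetical choices made for this illustration.

```python
import numpy as np

def spd_power(A, t):
    """A^t for a symmetric positive definite matrix A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** t) @ V.T

def spd_geodesic(P, Q, t):
    """gamma_PQ(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}."""
    P_half = spd_power(P, 0.5)
    P_inv_half = spd_power(P, -0.5)
    M = P_inv_half @ Q @ P_inv_half
    return P_half @ spd_power(0.5 * (M + M.T), t) @ P_half

# Two positive definite matrices with determinant c = 1.
P = np.array([[2.0, 1.0], [1.0, 1.0]])
Q = np.array([[1.0, 1.0], [1.0, 2.0]])

dets_on_geodesic = [np.linalg.det(spd_geodesic(P, Q, t)) for t in np.linspace(0.0, 1.0, 11)]
det_of_midpoint = np.linalg.det(0.5 * (P + Q))  # Euclidean midpoint of P and Q
```

The determinant stays at 1 along the geodesic, while the Euclidean midpoint has determinant 1.25 and therefore leaves $D_1$.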

Figure 5: Visual representation of a geodesically convex set on $\mathbb{S}^2_{++}$. Given a $2 \times 2$ matrix
$X = \begin{pmatrix} x & y \\ y & z \end{pmatrix}$,
the outer surface, $\det(X) = 0$, is the boundary of $\mathbb{S}^2_{++}$ and the inner surface, $\det(X) = 1$, is a
geodesically convex set consisting of solutions of $\det(X) = 1$.

5.2 Geodesically convex functions
Definition 5.3 (Geodesically convex function). Let (M, g) be a Riemannian manifold and
K ⊆ M be a totally convex set with respect to g. A function f : K → R is said to be a geodesically
convex function with respect to g if for any p, q ∈ K, and for any geodesic γpq : [0, 1] → K that
joins p to q,
∀t ∈ [0, 1] f (γpq (t)) ≤ (1 − t)f (p) + tf (q).

This definition can be interpreted as follows: the univariate function constructed by restricting f
to a geodesic is a convex function if the geodesic lies in K.
Theorem 5.4 (First-order characterization of geodesically convex functions). Let $(M, g)$
be a Riemannian manifold and $K \subseteq M$ be an open and totally convex set with respect to $g$. A
differentiable function $f : K \to \mathbb{R}$ is geodesically convex with respect to $g$ if and only if
for any $p, q \in K$, and for any geodesic $\gamma_{pq} : [0, 1] \to K$ that joins $p$ to $q$,
$$f(p) + \dot\gamma_{pq}(f)(p) \le f(q), \tag{20}$$
where $\dot\gamma_{pq}(f)$ denotes the first derivative of $f$ along the geodesic.
Geometrically, Theorem 5.4 is equivalent to saying that the linear approximation given by the
tangent vector at any point is a lower bound for the function. Recall that convex functions also
satisfy this property. The first derivative of f along geodesics simply corresponds to the directional
derivative of f in that case.

Figure 6: The blue line is the graph of $f(x) := (\log x)^2$, which is a non-convex function. The red line
corresponds to the tangent line at $x = 2$. The green line corresponds to $f(2) + \dot\gamma_{2,x}(f)(2)$ where $\gamma$
is the geodesic with respect to the metric $g_x(u, v) := u x^{-1} v$.

Proof. Let us first show that if $f$ is geodesically convex, then (20) holds. Geodesic convexity, by
definition, implies that for any $p, q \in K$, any geodesic $\gamma_{pq}$ joining $p$ to $q$ with $\gamma_{pq}(0) = p$, $\gamma_{pq}(1) = q$,
and $t \in (0, 1]$,
$$f(p) + \frac{f(\gamma_{pq}(t)) - f(p)}{t} \le f(q). \tag{21}$$
Recall that
$$\dot\gamma_{pq}(f)(p) = \lim_{t \to 0} \frac{f(\gamma_{pq}(t)) - f(p)}{t}.$$
Hence, if we take the limit of (21) as $t \to 0$, we get
$$f(p) + \dot\gamma_{pq}(f)(p) \le f(q).$$
Now, let us assume that for any $p, q \in K$, and for any geodesic $\gamma_{pq} : [0, 1] \to K$ that joins $p$ to $q$, (20)
holds. Given $p$, $q$ and $\gamma_{pq}$, let us fix a $t \in (0, 1)$. Let us denote $\gamma_{pq}(t)$ by $r$. Next, let us consider the
curves $\alpha(u) := \gamma_{pq}(t + u(1 - t))$ and $\beta(u) := \gamma_{pq}(t - ut)$. These curves are reparametrizations
of $\gamma_{pq}$ from $r$ to $q$ and from $r$ to $p$. Consequently, they are geodesics joining $r$ to $q$ and $r$ to $p$. Their
derivatives with respect to $u$ are $\dot\alpha(0) = (1 - t)\dot\gamma_{pq}(t)$ and $\dot\beta(0) = -t\dot\gamma_{pq}(t)$. Thus, if we apply (20)
to $r$, $q$ and $\alpha$, then we get
$$f(q) \ge \dot\alpha(f)(r) + f(r) = f(r) + (1 - t)\dot\gamma_{pq}(f)(r). \tag{22}$$
Similarly, if we apply (20) to $r$, $p$ and $\beta$, then we get
$$f(p) \ge \dot\beta(f)(r) + f(r) = f(r) - t\dot\gamma_{pq}(f)(r). \tag{23}$$
If we multiply (22) by $t$ and (23) by $1 - t$, and sum them up, then we get
$$tf(q) + (1 - t)f(p) \ge f(r) = f(\gamma_{pq}(t)).$$
Therefore, $f$ is geodesically convex.
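The first-order condition (20) can be illustrated on the function $f(x) = (\log x)^2$ of Figure 6. The sketch below assumes the one-dimensional geodesics $\gamma(t) = p (q/p)^t$ of Example 4.8, whose initial velocity is $p \log(q/p)$, so that $\dot\gamma_{pq}(f)(p) = f'(p)\, p \log(q/p)$. The function names and the sample points are illustrative assumptions.

```python
import math

def f(x):
    return math.log(x) ** 2

def f_prime(x):
    return 2.0 * math.log(x) / x

def geodesic_lower_bound(p, q):
    """f(p) + gamma_dot(f)(p) for the geodesic gamma(t) = p * (q / p)^t, gamma_dot(0) = p log(q/p)."""
    return f(p) + f_prime(p) * p * math.log(q / p)

def tangent_line(p, q):
    """Ordinary Euclidean first-order approximation f(p) + f'(p) (q - p)."""
    return f(p) + f_prime(p) * (q - p)

p = 2.0
sample_points = [0.05, 0.5, 1.0, 2.0, 5.0, 50.0, 500.0]
geodesic_gaps = [f(q) - geodesic_lower_bound(p, q) for q in sample_points]
euclidean_gap_at_50 = f(50.0) - tangent_line(p, 50.0)
```

For this function the geodesic gap equals $(\log q - \log p)^2$, which is never negative, whereas the Euclidean tangent line at $x = 2$ overshoots $f$ at $x = 50$, reflecting that $f$ is not convex in the usual sense.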

Theorem 5.5 (Second-order characterization of geodesically convex functions). Let
$(M, g)$ be a Riemannian manifold and $K \subseteq M$ be an open and totally convex set with respect
to $g$. A twice differentiable function $f : K \to \mathbb{R}$ is a geodesically convex function with
respect to $g$ if and only if for any $p, q \in K$, and for any geodesic $\gamma_{pq}$ that joins $p$ to $q$,
$$\frac{d^2 f(\gamma_{pq}(t))}{dt^2} \ge 0.$$

Proof. Given $p, q \in K$ and $\gamma_{pq}$, let us define $\theta : [0, 1] \to \mathbb{R}$ as $\theta(t) = f(\gamma_{pq}(t))$. If $f$ is geodesically
convex, then $\forall t \in [0, 1]$,
$$f(\gamma_{pq}(t)) \le (1 - t)f(p) + tf(q)$$
or equivalently
$$\theta(t) \le (1 - t)\theta(0) + t\theta(1).$$
In other words, $\theta$ is a convex function. The second order characterization of convex functions leads
to
$$0 \le \frac{d^2 \theta(t)}{dt^2} = \frac{d^2 f(\gamma_{pq}(t))}{dt^2}.$$
On the other hand, if $f$ is not geodesically convex, then there exist $p, q \in K$, a geodesic $\gamma_{pq}$ joining
$p$ to $q$ and $t \in [0, 1]$ such that
$$f(\gamma_{pq}(t)) > (1 - t)f(p) + tf(q).$$
Thus,
$$\theta(t) > (1 - t)\theta(0) + t\theta(1).$$
Therefore, $\theta$ is not convex and consequently there exists $u \in [0, 1]$ such that
$$0 > \frac{d^2 \theta(u)}{dt^2} = \frac{d^2 f(\gamma_{pq}(u))}{dt^2}.$$
Hence, $f$ is geodesically convex if and only if
$$\frac{d^2 f(\gamma_{pq}(t))}{dt^2} \ge 0.$$

5.3 Examples of geodesically convex functions


In this section, we present some simple examples of geodesically convex functions.

Example 5.6 (Geodesically convex functions on the positive orthant). Let us denote
the set of multivariate polynomials with positive coefficients over Rn by P+n . Also, let us recall
that geodesics on Rn>0 are of the form exp(αt + β) for α, β ∈ Rn where exp function is evaluated
independently on each coordinate.

• The log barrier function, h1, log(x)i is both geodesically convex and concave, where 1 denotes
the vector whose coordinates are 1. We can verify this by observing that the restriction of
h1, log(x)i to a geodesic is h1, αit+h1, βi, a linear function of t. Thus, h1, log(x)i is geodesically
convex by the second order characterization of geodesically convex functions.

• Multivariate polynomials
Qn with positive coefficients, p ∈ P+n , are geodesically convex. Let use
denote monomial i=1 x i by xλ for λ ∈ Zn≥0 . The important observation is that if we restrict
λ

a monomial to a geodesic, then it has the form exp(hλ, αit + hλ, βi) and its first derivative
is hλ, αi exp(hλ, αit + hλ, βi). Thus, the second derivative of any monomial is positive along
any geodesic. Consequently, the second derivative of p is positive along any geodesic since
its coefficients are positive. Our observation can be also interpreted as when we compute the
derivative of a monomial along a geodesic, the result is a scaling of the same monomial. In
contrast, the derivative along a straight line results in a polynomial whose degree is 1 less
than the differentiated monomial.

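• $\log(p(x))$ for $p \in P_n^+$. Let us fix a geodesic $\gamma(t) := \exp(\alpha t + \beta)$ for $\alpha, \beta \in \mathbb{R}^n$. Let us write
$p(\gamma(t))$ as follows,
\[
p(\gamma(t)) = \sum_{\theta \in F} c_\theta \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle),
\]
where $F \subseteq \mathbb{Z}^n_{\geq 0}$ and $c_\theta$ is the coefficient of the monomial $\prod_{i=1}^n x_i^{\theta_i}$. Now, let us compute the
first and second derivatives of $\log(p(\gamma(t)))$:
\[
\frac{d \log(p(\gamma(t)))}{dt} = \frac{\sum_{\theta \in F} c_\theta \langle \theta, \alpha \rangle \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)}{\sum_{\theta \in F} c_\theta \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)},
\]
\[
\frac{d^2 \log(p(\gamma(t)))}{dt^2} = \frac{\sum_{\theta \in F} c_\theta \langle \theta, \alpha \rangle^2 \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)}{\sum_{\theta \in F} c_\theta \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)} - \left( \frac{\sum_{\theta \in F} c_\theta \langle \theta, \alpha \rangle \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)}{\sum_{\theta \in F} c_\theta \exp(\langle \theta, \alpha \rangle t + \langle \theta, \beta \rangle)} \right)^2.
\]
Non-negativity of $\frac{d^2 \log(p(\gamma(t)))}{dt^2}$ follows from the Cauchy-Schwarz inequality. Thus, $\log(p(x))$
is geodesically convex by the second order characterization of geodesically convex functions.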
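
The three examples above can be checked numerically. The following is a minimal sketch, not part of the original exposition: the polynomial `p`, the geodesic parameters `alpha` and `beta`, and the finite-difference step are illustrative choices. It restricts each function to the geodesic $\exp(\alpha t + \beta)$ and estimates the second derivative in $t$.

```python
import math

def geodesic(alpha, beta, t):
    """Point exp(alpha*t + beta) on the positive orthant, coordinate-wise."""
    return [math.exp(a * t + b) for a, b in zip(alpha, beta)]

def p(x):
    """Illustrative polynomial with positive coefficients: 2*x1^2*x2 + 3*x2 + 1."""
    return 2.0 * x[0] ** 2 * x[1] + 3.0 * x[1] + 1.0

def second_difference(f, t, h=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

alpha, beta = [1.0, -0.5], [0.3, 0.5]   # arbitrary geodesic parameters
ts = [i / 10 for i in range(11)]

def along_p(t):
    return p(geodesic(alpha, beta, t))

def along_log_p(t):
    return math.log(p(geodesic(alpha, beta, t)))

def along_barrier(t):
    # <1, log(x)> restricted to the geodesic; linear in t
    return sum(math.log(xi) for xi in geodesic(alpha, beta, t))

min_d2_p = min(second_difference(along_p, t) for t in ts)
min_d2_log_p = min(second_difference(along_log_p, t) for t in ts)
max_abs_d2_barrier = max(abs(second_difference(along_barrier, t)) for t in ts)

print(min_d2_p, min_d2_log_p, max_abs_d2_barrier)
```

Up to finite-difference error, the first two quantities should be non-negative and the third should vanish, matching the convexity of $p$ and $\log p$ and the linearity of the log barrier along geodesics.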

Proposition 5.7 (Geodesic linearity of logdet). $\log\det(X)$ is both geodesically convex and
geodesically concave on $S^n_{++}$ with respect to the metric $g_X(U, V) := \mathrm{tr}[X^{-1} U X^{-1} V]$.

Proof. Let $X, Y \in S^n_{++}$ and $t \in [0, 1]$. Then, the geodesic joining $X$ to $Y$ is
\[
\gamma(t) = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2},
\]
see Example 4.9. Thus, using the multiplicativity of the determinant and $\det(A^t) = \det(A)^t$ for positive definite $A$,
\[
\log\det(\gamma(t)) = \log\det\left(X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}\right) = (1 - t) \log\det(X) + t \log\det(Y).
\]
Therefore, $\log\det(X)$ is a geodesically linear function over the positive definite cone with respect
to the metric $g$.
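
As a sanity check of Proposition 5.7, the sketch below evaluates $\log\det$ along the geodesic joining two random positive definite matrices. It assumes NumPy is available; the helper names `spd_power`, `geodesic`, and `logdet` are ours and do not come from the text.

```python
import numpy as np

def spd_power(A, s):
    """A**s for a symmetric positive definite matrix A, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**s) @ V.T

def geodesic(X, Y, t):
    """X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, the geodesic from X to Y."""
    Xh, Xmh = spd_power(X, 0.5), spd_power(X, -0.5)
    inner = Xmh @ Y @ Xmh
    inner = 0.5 * (inner + inner.T)   # remove round-off asymmetry
    return Xh @ spd_power(inner, t) @ Xh

def logdet(A):
    return np.linalg.slogdet(A)[1]

rng = np.random.default_rng(0)
G, H = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
X = G @ G.T + np.eye(3)   # random positive definite matrices
Y = H @ H.T + np.eye(3)

# Deviation of logdet along the geodesic from the linear interpolation.
gaps = [abs(logdet(geodesic(X, Y, t)) - ((1 - t) * logdet(X) + t * logdet(Y)))
        for t in np.linspace(0.0, 1.0, 11)]
max_gap = max(gaps)
print(max_gap)
```

The printed deviation should be at the level of floating-point round-off.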

Proposition 5.8. Let $T(X)$ be a strictly positive linear operator from $S^n_{++}$ to $S^m_{++}$; in other
words, it maps positive definite matrices to positive definite matrices. Then $T(X)$ is geodesically convex
with respect to the Loewner partial ordering on $S^m_{++}$ over $S^n_{++}$ with respect to the metric $g_X(U, V) :=
\mathrm{tr}[X^{-1} U X^{-1} V]$. In other words, for any geodesic $\gamma : [0, 1] \to S^n_{++}$,
\[
\forall t \in [0, 1], \quad T(\gamma(t)) \preceq (1 - t) T(\gamma(0)) + t T(\gamma(1)). \tag{24}
\]

Proof. Any linear operator on $S^n_{++}$ can be written as $T(X) := \sum_{i=1}^d A_i X B_i$ for some $m \times n$
matrices $A_i$ and $n \times m$ matrices $B_i$. If we can show that $\frac{d^2}{dt^2} T(\gamma(t))$ is positive semi-definite, then this
implies (24). Let us consider the geodesic $\gamma(t) := P^{1/2} \exp(tQ) P^{1/2}$ for $P \in S^n_{++}$ and $Q \in S^n$. The
first and second derivatives of $T$ along $\gamma$ are
\[
\frac{d T(\gamma(t))}{dt} = \sum_{i=1}^d A_i P^{1/2} Q \exp(tQ) P^{1/2} B_i,
\]
\[
\frac{d^2 T(\gamma(t))}{dt^2} = \sum_{i=1}^d A_i P^{1/2} Q \exp(tQ) Q P^{1/2} B_i = T(P^{1/2} Q \exp(tQ) Q P^{1/2}).
\]
We can see that $P^{1/2} Q \exp(tQ) Q P^{1/2}$ is positive semi-definite for $t \in [0, 1]$ (and positive definite when $Q$
is invertible). Hence $T(P^{1/2} Q \exp(tQ) Q P^{1/2})$ is also positive semi-definite, since a strictly positive linear
map sends positive semi-definite matrices to positive semi-definite matrices by continuity. Consequently,
$\frac{d^2}{dt^2} T(\gamma(t))$ is positive semi-definite, and (24) holds.

Proposition 5.9 (Geodesic convexity of logdet of positive operators). Let $T(X)$ be a
strictly positive linear operator. Then, $\log\det(T(X))$ is geodesically convex on $S^n_{++}$ with respect
to the metric $g_X(U, V) := \mathrm{tr}[X^{-1} U X^{-1} V]$.

Proof. Let us write $T(X)$ as $T(X) := \sum_{i=1}^d A_i X B_i$ for some $m \times n$ matrices $A_i$ and $n \times m$ matrices
$B_i$. We need to show that the second derivative of $\log\det(T(X))$ is non-negative along any geodesic.
Let us consider the geodesic $\gamma(t) := P^{1/2} \exp(tQ) P^{1/2}$ for $P \in S^n_{++}$ and $Q \in S^n$. The first and second
derivatives of $\log\det(T(X))$ along $\gamma$ are
\[
\frac{d \log\det(T(\gamma(t)))}{dt} = \mathrm{tr}\left[ T(\gamma(t))^{-1} \frac{d}{dt} T(\gamma(t)) \right],
\]
\[
\frac{d^2 \log\det(T(\gamma(t)))}{dt^2} = \mathrm{tr}\left[ -T(\gamma(t))^{-1} \frac{d}{dt} T(\gamma(t)) \, T(\gamma(t))^{-1} \frac{d}{dt} T(\gamma(t)) + T(\gamma(t))^{-1} \frac{d^2}{dt^2} T(\gamma(t)) \right].
\]
We only need to verify that $\left. \frac{d^2 \log\det(T(\gamma(t)))}{dt^2} \right|_{t=0} \geq 0$. In other words, we need to show that
\[
\mathrm{tr}\left[ T(P)^{-1} \left( T(P^{1/2} Q^2 P^{1/2}) - T(P^{1/2} Q P^{1/2}) T(P)^{-1} T(P^{1/2} Q P^{1/2}) \right) \right] \geq 0.
\]
In particular, if we show that
\[
T(P^{1/2} Q^2 P^{1/2}) \succeq T(P^{1/2} Q P^{1/2}) T(P)^{-1} T(P^{1/2} Q P^{1/2}), \tag{25}
\]

then we are done. Let us define another strictly positive linear operator
\[
T'(X) := T(P)^{-1/2} T(P^{1/2} X P^{1/2}) T(P)^{-1/2}.
\]
If $T'(X^2) \succeq T'(X)^2$, then by picking $X = Q$ we arrive at (25). This inequality is an instance
of Kadison's inequality, see [Bha09] for more details. A classical result from matrix algebra is that
$A \succeq B D^{-1} C$ if and only if
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} \succeq 0,
\]
where $A$ and $D$ are square matrices, $B$ and $C = B^\top$ are matrices of compatible sizes, and $D$ is positive
definite. The quantity $A - B D^{-1} C$ is called the Schur complement. Thus, we need to verify that
\[
\begin{bmatrix} T'(X^2) & T'(X) \\ T'(X) & I_m \end{bmatrix} \succeq 0.
\]

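Let $\{u_i\}_{i=1}^n$ be eigenvectors of $X$ that form an orthonormal basis and $\{\lambda_i\}_{i=1}^n$ be the corresponding
eigenvalues. Then,
\[
X = \sum_{i=1}^n \lambda_i u_i u_i^\top
\]
and
\[
X^2 = \sum_{i=1}^n \lambda_i^2 u_i u_i^\top.
\]
Let us denote $T'(u_i u_i^\top)$ by $U_i$, then
\[
I_m = T'(I_n) = T'\left( \sum_{i=1}^n u_i u_i^\top \right) = \sum_{i=1}^n U_i
\]
and consequently,
\[
\begin{bmatrix} T'(X^2) & T'(X) \\ T'(X) & I_m \end{bmatrix}
= \begin{bmatrix} T'\left(\sum_{i=1}^n \lambda_i^2 u_i u_i^\top\right) & T'\left(\sum_{i=1}^n \lambda_i u_i u_i^\top\right) \\ T'\left(\sum_{i=1}^n \lambda_i u_i u_i^\top\right) & \sum_{i=1}^n U_i \end{bmatrix}
= \sum_{i=1}^n \begin{bmatrix} \lambda_i^2 U_i & \lambda_i U_i \\ \lambda_i U_i & U_i \end{bmatrix}.
\]
Since $\lambda_i^2 U_i - (\lambda_i U_i) U_i^{-1} (\lambda_i U_i) = 0$ (equivalently, each summand is the Kronecker product of the
positive semi-definite matrices $\begin{bmatrix} \lambda_i^2 & \lambda_i \\ \lambda_i & 1 \end{bmatrix}$ and $U_i$),
\[
\sum_{i=1}^n \begin{bmatrix} \lambda_i^2 U_i & \lambda_i U_i \\ \lambda_i U_i & U_i \end{bmatrix} \succeq 0.
\]
Thus, $T'(X^2) \succeq T'(X)^2$ for any $X \in S^n$. Therefore, (25) holds and $\log\det(T(X))$ is a geodesically
convex function.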

5.4 What functions are not geodesically convex?
A natural question one can ask is that given a manifold M and a smooth function f : M → R,
does there exist a metric g such that f is geodesically convex on M with respect to g. In general,
verifying if the function f is geodesically convex or not with respect to the metric g is relatively
easy. However, arguing that no such metric exists is not so easy. We already saw that non-convex
functions can be geodesically convex with respect to metrics induced by the Hessian of log-barrier
functions. Functions that are not geodesically convex with respect to these metrics can still be
geodesically convex with respect to other metrics.
To prove that something is not geodesically convex for any metric, we revisit convexity. We
know that any local minimum of a convex function is also its global minimum. This property
also extends to geodesically convex functions. Consequently, one class of functions that are not
geodesically convex with respect to any metric are functions that have a local minimum that is not
a global minimum.
Proposition 5.10 (Establishing non-geodesic convexity). Let $M$ be a smooth manifold, and
$f$ be a function such that there exist $p \in M$ and an open neighborhood $U_p$ of $p$ such that
\[
f(p) = \inf_{q \in U_p} f(q) \tag{26}
\]
but
\[
f(p) > \inf_{q \in M} f(q). \tag{27}
\]
Then there is no metric tensor $g$ on $M$ such that $f$ is geodesically convex with respect to $g$.
(26) corresponds to $p$ being a local minimum of $f$, while (27) corresponds to $p$ not being a global
minimum of $f$.

Proof. Let us assume that there exists a metric $g$ such that $f$ is geodesically convex with respect
to $g$. Let $q \in M$ be such that $f(q) < f(p)$ and $\gamma : [0, 1] \to M$ be a geodesic such that $\gamma(0) = p$ and
$\gamma(1) = q$. Since $f$ is geodesically convex, we have
\[
\forall t \in (0, 1], \quad f(\gamma(t)) \leq (1 - t) f(\gamma(0)) + t f(\gamma(1)) = (1 - t) f(p) + t f(q) < f(p).
\]
For some $t_0 \in (0, 1]$, $\gamma(t) \in U_p$ for $t \in [0, t_0)$ as $\gamma$ is continuous and $U_p$ is open. Then, $f(\gamma(t)) \geq f(p)$ for
$t \in [0, t_0)$ by (26). This is a contradiction. Therefore, there is no metric $g$ on $M$ such that $f$ is geodesically
convex with respect to $g$.
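
To make the hypothesis of Proposition 5.10 concrete, the following sketch locates the local minima of an illustrative function on $M = \mathbb{R}$ by a grid search. The quartic `f`, the interval, and the grid size are our own choices, not taken from the text. The function has two local minima with different values, so by the proposition no metric on $\mathbb{R}$ makes it geodesically convex.

```python
def f(x):
    """Illustrative quartic with a local minimum that is not global."""
    return x**4 - 3.0 * x**2 + x

def grid_local_minima(func, lo, hi, n):
    """Return (x, func(x)) at interior grid points lying below both neighbours."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    vals = [func(x) for x in xs]
    return [(xs[i], vals[i]) for i in range(1, n)
            if vals[i] <= vals[i - 1] and vals[i] < vals[i + 1]]

# f grows without bound outside [-2, 2], so the search interval contains
# the global minimum.
minima = grid_local_minima(f, -2.0, 2.0, 4000)
global_value = min(v for _, v in minima)
non_global = [(x, v) for x, v in minima if v > global_value + 1e-9]

print(minima)
print(non_global)   # the point p of Proposition 5.10
```

The grid search is only a heuristic for finding candidate points; it is the existence of the second, higher local minimum that triggers the proposition.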

6 Application: Characterizing Brascamp-Lieb Constants


The Brascamp-Lieb inequality [BL76b] is a powerful mathematical tool which unifies most of the
well-known classical inequalities with a single formulation. The inequality can be described as
follows.
Definition 6.1 (Brascamp-Lieb inequality [BL76b]). Given linear transformations $B = (B_j)_{j=1}^m$,
where $B_j$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^{n_j}$, and non-negative real numbers $p = (p_j)_{j=1}^m$,
there exists a number $C \in \mathbb{R}$ such that for any given function tuple $f = (f_j)_{j=1}^m$, where $f_j : \mathbb{R}^{n_j} \to \mathbb{R}_+$
is a Lebesgue measurable function, the following inequality holds:
\[
\int_{x \in \mathbb{R}^n} \prod_{j=1}^m f_j(B_j x)^{p_j} \, dx \leq C \prod_{j=1}^m \left( \int_{x \in \mathbb{R}^{n_j}} f_j(x) \, dx \right)^{p_j}. \tag{28}
\]

The smallest C such that (28) holds for any f is called the Brascamp-Lieb constant and we denote
it by BL(B, p). (B, p) is called “feasible” if C is finite.
Lieb [Lie90] showed that (28) is saturated by Gaussian inputs: $g_j(x) := \exp(-x^\top A_j x)$ for some
positive definite matrix $A_j$ for all $j$. Evaluating (28) with inputs $g_j$ leads to the following formulation
of the Brascamp-Lieb inequality:
\[
\left( \frac{\prod_{j=1}^m \det(A_j)^{p_j}}{\det\left( \sum_{j=1}^m p_j B_j^\top A_j B_j \right)} \right)^{1/2} \leq C.
\]

Also, Bennett et al. [BCCT08] characterized the necessary and sufficient conditions for a Brascamp-
Lieb datum to be feasible.
Theorem 6.2 (Feasibility of Brascamp-Lieb datum [BCCT08], Theorem 1.15). Let $(B, p)$
be a Brascamp-Lieb datum with $B_j \in \mathbb{R}^{n_j \times n}$ for each $j = 1, \ldots, m$. Then, $(B, p)$ is feasible if and
only if the following conditions hold:
1. $n = \sum_{j=1}^m p_j n_j$, and

2. $\dim(V) \leq \sum_{j=1}^m p_j \dim(B_j V)$ for any subspace $V$ of $\mathbb{R}^n$.

6.1 Problem statement


Lieb’s characterization leads to the following characterization of the Brascamp-Lieb constant.

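Brascamp-Lieb Constant

Input: An $m$-tuple of matrices $B := (B_j)_{j=1}^m$ where $B_j \in \mathbb{R}^{n_j \times n}$ and an $m$-dimensional
vector $p$ that satisfies $n = \sum_{j=1}^m n_j p_j$.

Goal: Compute
\[
\mathrm{BL}(B, p) := \sup_{X = (X_1, \ldots, X_m)} \left( \frac{\prod_{j=1}^m \det(X_j)^{p_j}}{\det\left( \sum_{j=1}^m p_j B_j^\top X_j B_j \right)} \right)^{1/2}
\quad \text{s.t. } \forall j = 1, \ldots, m, \; X_j \in S^{n_j}_{++}. \tag{29}
\]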

6.2 Non-convexity of Lieb’s formulation


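Let us first observe that, when $n_j = 1$ for all $j$, the formulation in (29) is log-concave by a change
of variables similar to the example of the log of a polynomial over the positive orthant. We can
verify this observation as follows. Let $X_1, \ldots, X_m$ be inputs with $X_j \in S^{n_j}_{++}$ and let us denote
$\log(p_j X_j)$ by $y_j$ for $j = 1, \ldots, m$. Then, $2 \log \mathrm{BL}(B, p; X_1, \ldots, X_m)$ becomes
\[
f(y) := \langle p, y \rangle - \log\det\left( \sum_{j=1}^m e^{y_j} B_j^\top B_j \right) - \sum_{j=1}^m p_j \log p_j.
\]
We note that $\det\left( \sum_{j=1}^m e^{y_j} B_j^\top B_j \right)$ is a multilinear function of $(e^{y_1}, \ldots, e^{y_m})$ with non-negative
coefficients. The Cauchy-Binet formula yields
\[
\det\left( \sum_{j=1}^m e^{y_j} B_j^\top B_j \right) = \sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle},
\]
where $F := \{ I \subseteq \{1, \ldots, m\} \mid |I| = n \}$, each $\alpha \in F$ is identified with its indicator vector in $\{0, 1\}^m$,
and $c_\alpha = \det(B_\alpha)^2$ where $B_\alpha$ is the matrix whose rows are $B_j$ for $j \in \alpha$. Thus,
\[
f(y) = \langle p, y \rangle - \sum_{j=1}^m p_j \log(p_j) - \log\left( \sum_{\alpha \in F} c_\alpha \exp(\langle \alpha, y \rangle) \right).
\]
A simple computation yields
\[
-\nabla^2 f(y) = \frac{\sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle} \alpha \alpha^\top}{\sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle}} - \frac{\left( \sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle} \alpha \right) \left( \sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle} \alpha^\top \right)}{\left( \sum_{\alpha \in F} c_\alpha e^{\langle \alpha, y \rangle} \right)^2}.
\]
Positive semi-definiteness of $-\nabla^2 f(y)$ follows from the Cauchy-Schwarz inequality, so $f$ is concave.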
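
The Cauchy-Binet expansion used above can be verified on a small instance. The sketch below takes $n = 2$, $m = 3$, and $n_j = 1$, so that each $B_j$ is a row vector in $\mathbb{R}^2$; the particular rows `B` and the point `y` are arbitrary illustrative values.

```python
import math
from itertools import combinations

def det2(M):
    """Determinant of a 2 x 2 matrix given as a sequence of two rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

B = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # rows B_j, each 1 x 2
y = [0.2, -0.4, 0.7]

# Left-hand side: det( sum_j e^{y_j} B_j^T B_j ).
S = [[0.0, 0.0], [0.0, 0.0]]
for b, yj in zip(B, y):
    w = math.exp(yj)
    for r in range(2):
        for c in range(2):
            S[r][c] += w * b[r] * b[c]
lhs = det2(S)

# Right-hand side: sum over 2-element subsets alpha of c_alpha * e^{<alpha, y>},
# with c_alpha = det(B_alpha)^2.
rhs = 0.0
for subset in combinations(range(3), 2):
    c_alpha = det2([B[i] for i in subset]) ** 2
    rhs += c_alpha * math.exp(sum(y[i] for i in subset))

print(lhs, rhs)
```

Both sides agree up to round-off, and every coefficient $c_\alpha$ is a square, hence non-negative, as claimed.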


Now, let us return to the general case. In general, (29) is not concave. We can get some intuition
for this by noting that if we fix all but one of the $X_j$'s, then $\log \mathrm{BL}(B, p; X_1, \ldots, X_m)$ reduces,
up to an additive constant that does not depend on $X_j$, to
\[
\frac{1}{2} \left( p_j \log\det(X_j) - \log\det(p_j B_j^\top X_j B_j + C_j) \right)
\]
where $C_j$ is a positive semi-definite matrix. This function is a difference of two concave functions
of $X_j$, which indicates that it may not be a concave function. In fact, for some feasible Brascamp-Lieb
data one can find inputs and directions along which this formulation is not concave. We remark
that although (29) is not a log-concave function in general, it is jointly geodesically log-concave;
see [SVY18].

6.3 A geodesically convex formulation


We present a geodesically convex formulation to compute $\mathrm{BL}(B, p)$. Given an $m$-tuple of matrices
$B := (B_j)_{j=1}^m$ where $B_j \in \mathbb{R}^{n_j \times n}$ and an $m$-dimensional vector $p$ that satisfies $n = \sum_{j=1}^m n_j p_j$,
define $F_{B,p}$ as follows,
\[
F_{B,p}(X) := \sum_{j=1}^m p_j \log\det(B_j X B_j^\top) - \log\det(X).
\]
It was proved in [SVY18] that this is a reformulation of the Brascamp-Lieb constant.

Theorem 6.3 (Reformulation of Brascamp-Lieb constant). If $(B, p)$ is feasible, then
\[
\mathrm{BL}(B, p) = \exp\left( -\frac{1}{2} \inf_{X \in S^n_{++}} F_{B,p}(X) \right).
\]

Subsequently, it was proved that this formulation is geodesically convex.

Theorem 6.4 (Brascamp-Lieb constant has a succinct geodesically convex formulation
[SVY18]). Let $(B, p)$ be a Brascamp-Lieb datum such that $n = \sum_{j=1}^m p_j n_j$ and $n = \sum_{j=1}^m p_j \dim(B_j \mathbb{R}^n)$.
Then $F_{B,p}(X)$ is a geodesically convex function on $S^n_{++}$ with respect to the local metric $g_X(U, V) :=
\mathrm{tr}[X^{-1} U X^{-1} V]$.

The conditions we impose on the Brascamp-Lieb datum are satisfied by any feasible datum, see
Theorem 6.2. A Brascamp-Lieb datum satisfying these conditions is called a non-degenerate datum.

Proof. Let us first observe that $F_{B,p}$ is a positive weighted sum of $-\log\det(X)$ and $\log\det(B_j X B_j^\top)$
for $j = 1, \ldots, m$. $-\log\det(X)$ is geodesically convex by Proposition 5.7. Also, we know that if $T(X)$
is a strictly positive linear operator, then $\log\det(T(X))$ is geodesically convex by Proposition 5.9.
Thus, we only need to show that $T_j(X) := B_j X B_j^\top$ is a strictly positive linear map if $(B, p)$ is
non-degenerate. This would imply that, for any geodesic $\gamma$ and for $t \in [0, 1]$,
\[
\begin{aligned}
F_{B,p}(\gamma(t)) &= \sum_{j=1}^m p_j \log\det(B_j \gamma(t) B_j^\top) - \log\det(\gamma(t)) \\
&\leq \sum_{j=1}^m p_j \left( (1 - t) \log\det(B_j \gamma(0) B_j^\top) + t \log\det(B_j \gamma(1) B_j^\top) \right) \\
&\qquad - \left( (1 - t) \log\det(\gamma(0)) + t \log\det(\gamma(1)) \right) \\
&= (1 - t) \left( \sum_{j=1}^m p_j \log\det(B_j \gamma(0) B_j^\top) - \log\det(\gamma(0)) \right) \\
&\qquad + t \left( \sum_{j=1}^m p_j \log\det(B_j \gamma(1) B_j^\top) - \log\det(\gamma(1)) \right) \\
&= (1 - t) F_{B,p}(\gamma(0)) + t F_{B,p}(\gamma(1)).
\end{aligned}
\]

Let us assume that for some $j_0 \in \{1, \ldots, m\}$, $T_{j_0}(X)$ is not a strictly positive linear map. Then,
there exists $X_0 \in S^n_{++}$ such that $T_{j_0}(X_0)$ is not positive definite. Thus, there exists a nonzero $v \in \mathbb{R}^{n_{j_0}}$
such that $v^\top T_{j_0}(X_0) v \leq 0$. Equivalently, $(B_{j_0}^\top v)^\top X_0 (B_{j_0}^\top v) \leq 0$. Since $X_0$ is positive definite,
we get $B_{j_0}^\top v = 0$. Hence, $v^\top B_{j_0} B_{j_0}^\top v = 0$. Consequently, the rank of $B_{j_0}$ is at most $n_{j_0} - 1$ and
$\dim(B_{j_0} \mathbb{R}^n) < n_{j_0}$. We note that $\dim(B_j \mathbb{R}^n) \leq n_j$ as $B_j \in \mathbb{R}^{n_j \times n}$. Therefore, assuming $p_{j_0} > 0$
(indices with $p_j = 0$ do not contribute to $F_{B,p}$),
\[
n = \sum_{j=1}^m p_j \dim(B_j \mathbb{R}^n) < \sum_{j=1}^m p_j n_j = n.
\]
This contradicts our assumptions on $(B, p)$. Consequently, for any $j \in \{1, \ldots, m\}$, $T_j(X) :=
B_j X B_j^\top$ is a strictly positive linear map whenever $(B, p)$ is a non-degenerate datum. Therefore,
$F_{B,p}(X) = \sum_{j \in \{1, \ldots, m\}} p_j \log\det(B_j X B_j^\top) - \log\det(X)$ is a geodesically convex function.
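
Theorem 6.4 can be illustrated numerically. The sketch below, which assumes NumPy, builds a random datum with $n = 3$, $m = 3$, $n_j = 2$, and $p_j = 1/2$, so that $n = \sum_j p_j n_j$ and, with probability one, every $B_j$ has full row rank. It then compares $F_{B,p}$ along a geodesic with the linear interpolation of its endpoint values. The helper names and the random instance are our own illustrative choices.

```python
import numpy as np

def spd_power(A, s):
    """A**s for a symmetric positive definite matrix A, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**s) @ V.T

def geodesic(X, Y, t):
    """Geodesic from X to Y for the metric tr[X^-1 U X^-1 V]."""
    Xh, Xmh = spd_power(X, 0.5), spd_power(X, -0.5)
    inner = Xmh @ Y @ Xmh
    inner = 0.5 * (inner + inner.T)   # remove round-off asymmetry
    return Xh @ spd_power(inner, t) @ Xh

def logdet(A):
    return np.linalg.slogdet(A)[1]

def F(X, Bs, p):
    """F_{B,p}(X) = sum_j p_j logdet(B_j X B_j^T) - logdet(X)."""
    return sum(pj * logdet(Bj @ X @ Bj.T) for Bj, pj in zip(Bs, p)) - logdet(X)

rng = np.random.default_rng(1)
n, m = 3, 3
Bs = [rng.standard_normal((2, n)) for _ in range(m)]   # each B_j is 2 x 3
p = [0.5, 0.5, 0.5]                                    # sum_j p_j * n_j = 3 = n

def random_spd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + np.eye(k)

X, Y = random_spd(n), random_spd(n)
F0, F1 = F(X, Bs, p), F(Y, Bs, p)

# Largest violation of F(gamma(t)) <= (1 - t) F(gamma(0)) + t F(gamma(1)).
worst = max(F(geodesic(X, Y, t), Bs, p) - ((1 - t) * F0 + t * F1)
            for t in np.linspace(0.0, 1.0, 21))
print(worst)
```

A value of `worst` that is non-positive up to round-off is consistent with geodesic convexity; a single random instance is of course evidence, not a proof.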

7 Application: Operator Scaling


The matrix scaling problem is a classic problem related to bipartite matchings in graphs and
asks the following question: given a matrix $M \in \mathbb{Z}^{n \times n}_{>0}$ with positive entries, find positive vectors
$x, y \in \mathbb{R}^n_{>0}$ such that
\[
Y^{-1} M X \text{ is a doubly-stochastic matrix,}
\]
where $X = \mathrm{diag}(x)$ and $Y = \mathrm{diag}(y)$. The operator scaling problem is a generalization of this to
the world of operators as follows:

Operator Scaling Problem for Square Operators

Input: An $m$-tuple of $n \times n$ matrices $(A_j)_{j=1}^m$.

Goal: Find two $n \times n$ matrices $X$ and $Y$ such that if $\hat{A}_i := Y^{-1} A_i X$, then
\[
\sum_i \hat{A}_i \hat{A}_i^\top = I, \qquad \sum_i \hat{A}_i^\top \hat{A}_i = I.
\]
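
For the matrix scaling special case recalled at the start of this section, a simple way to find the vectors $x$ and $y$ is the classical alternating normalization usually attributed to Sinkhorn. This method is not discussed in this exposition; the sketch below is only meant to make the problem statement concrete, and the matrix `M` and the iteration count are illustrative assumptions.

```python
def sinkhorn(M, iters=500):
    """Alternately rescale rows and columns of a positive square matrix M.

    Returns positive vectors x, y such that diag(y)^-1 M diag(x) is
    approximately doubly stochastic.
    """
    n = len(M)
    x = [1.0] * n
    y = [1.0] * n
    for _ in range(iters):
        # Choose y so that every row of Y^-1 M X sums to 1.
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Choose x so that every column of Y^-1 M X sums to 1.
        x = [1.0 / sum(M[i][j] / y[i] for i in range(n)) for j in range(n)]
    return x, y

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
x, y = sinkhorn(M)

scaled = [[M[i][j] * x[j] / y[i] for j in range(3)] for i in range(3)]
row_sums = [sum(row) for row in scaled]
col_sums = [sum(scaled[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

After the final column normalization the column sums equal 1 up to round-off, and the row sums approach 1 as the iteration converges.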
i

The Operator Scaling problem was studied by Gurvits [Gur04] and, more recently by Garg et
al. [GGdOW16]. They showed that the Operator Scaling problem can be formulated in terms of
solving the following “operator capacity” problem.

Operator Capacity for Square Operators

Input: A strictly positive operator of the form $T(X) := \sum_{j=1}^m A_j X A_j^\top$ given as
an $m$-tuple of $n \times n$ matrices $(A_j)_{j=1}^m$.

Goal: Compute
\[
\mathrm{cap}(T) := \inf_{X \in S^n_{++}} \frac{\det(T(X))}{\det(X)}.
\]

If the infimum of this problem is attained at some $X^\star$, then it can be used to compute
an optimal solution to the operator scaling problem as follows:
\[
\sum_i \hat{A}_i \hat{A}_i^\top = I, \qquad \sum_i \hat{A}_i^\top \hat{A}_i = I,
\]
where $\hat{A}_i = T(X^\star)^{-1/2} A_i (X^\star)^{1/2}$.


The operator capacity turns out to be a non-convex function [Gur04, GGdOW16]. Its geodesic
convexity follows from the examples of geodesically convex functions we discussed on the positive
definite cone. In particular, Proposition 5.7 asserts that $\log\det(X)$ is geodesically linear and
Proposition 5.9 asserts that $\log\det(T(X))$ is geodesically convex on the positive definite cone with respect
to the metric $g_X(U, V) := \mathrm{tr}[X^{-1} U X^{-1} V]$. Thus, their difference, which we denote by
$\log \mathrm{cap}(X) := \log\det(T(X)) - \log\det(X)$, is also geodesically convex.

Theorem 7.1 ([AZGL+18]). Given a strictly positive linear map $T(X) = \sum_{j=1}^m A_j X A_j^\top$, $\log \mathrm{cap}(X)$
is a geodesically convex function on $S^n_{++}$ with respect to the local metric $g_X(U, V) := \mathrm{tr}[X^{-1} U X^{-1} V]$.

A Proof of the Fundamental Theorem of Riemannian Manifold


A.1 Uniqueness of the Levi-Civita connection
Let $\nabla$ be a Levi-Civita connection, and $X, Y, Z$ be three vector fields in $\mathfrak{X}(M)$. We will show
that, if the assumptions of the Levi-Civita connection hold, then there is a formula which uniquely
determines $\nabla$. First, we compute the functions $X(g(Y, Z))$, $Y(g(Z, X))$, and $Z(g(X, Y))$. By
the compatibility condition (Equation (13)), we have
\[
\begin{aligned}
X(g(Y, Z)) &= g(\nabla_X Y, Z) + g(Y, \nabla_X Z), \\
Y(g(Z, X)) &= g(\nabla_Y Z, X) + g(Z, \nabla_Y X), \\
Z(g(X, Y)) &= g(\nabla_Z X, Y) + g(X, \nabla_Z Y).
\end{aligned}
\]

Therefore,
\[
\begin{aligned}
& X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) \\
&= g(\nabla_X Y, Z) + g(Y, \nabla_X Z) + g(\nabla_Y Z, X) + g(Z, \nabla_Y X) - g(\nabla_Z X, Y) - g(X, \nabla_Z Y) \\
&= g(\nabla_X Y + \nabla_Y X, Z) + g(\nabla_X Z - \nabla_Z X, Y) + g(\nabla_Y Z - \nabla_Z Y, X) \\
&= g([X, Z], Y) + g([Y, Z], X) - g([X, Y], Z) + 2 g(\nabla_X Y, Z),
\end{aligned}
\]
where the last equality is by the torsion-free property of the Levi-Civita connection. Thus,
\[
g(\nabla_X Y, Z) = \frac{1}{2} \Big( X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) - g([X, Z], Y) - g([Y, Z], X) + g([X, Y], Z) \Big) \tag{30}
\]
for all $X, Y, Z \in \mathfrak{X}(M)$. Since $g$ is non-degenerate, $\nabla_X Y$ is uniquely determined by Equation (30),
implying that the Levi-Civita connection is unique.

A.2 Formula for Christoffel symbols in terms of the metric


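Equation (30) is called the Koszul formula. Using the Koszul formula we can compute the Christoffel
symbols for the Levi-Civita connection. Recall that $\Gamma^k_{ij}$ denotes the coefficient of $\nabla_{\partial_i} \partial_j$ in the
direction of $\partial_k$. In Einstein summation convention, we have
\[
\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k,
\]
where the convention is that we sum over all indices that only appear on one side of the equation
(in this case, we sum over the index $k$). Define the matrix $G := [g_{ij}]_{n \times n}$ where each entry is
$g_{ij} := g(\partial_i, \partial_j)$. Denote each entry $(G^{-1})_{ij}$ of its inverse by $g^{ij}$. We have
\[
g(\nabla_{\partial_i} \partial_j, \partial_l) = g(\Gamma^k_{ij} \partial_k, \partial_l) = \Gamma^k_{ij} g_{kl} = g_{lk} \Gamma^k_{ij}.
\]
In matrix form,
\[
\begin{bmatrix} g(\nabla_{\partial_i} \partial_j, \partial_1) \\ \vdots \\ g(\nabla_{\partial_i} \partial_j, \partial_n) \end{bmatrix}
= G \begin{bmatrix} \Gamma^1_{ij} \\ \vdots \\ \Gamma^n_{ij} \end{bmatrix}.
\]
Thus, $\Gamma^k_{ij}$ is given by
\[
\Gamma^k_{ij} = g^{kl} g(\nabla_{\partial_i} \partial_j, \partial_l). \tag{31}
\]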

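We can compute $g(\nabla_{\partial_i} \partial_j, \partial_l)$ by setting $X$ to be $\partial_i$, $Y$ to be $\partial_j$, and $Z$ to be $\partial_l$ in Equation (30).
Before computing $g(\nabla_{\partial_i} \partial_j, \partial_l)$, we should note that
\[
g(\nabla_{\partial_i} \partial_j, \partial_l) = g_{lk} \Gamma^k_{ij} = g_{lk} \Gamma^k_{ji} = g(\nabla_{\partial_j} \partial_i, \partial_l)
\]
since $\Gamma^k_{ij} = \Gamma^k_{ji}$ because $\nabla$ is torsion-free. Therefore, Equation (30) implies that
\[
g(\nabla_{\partial_i} \partial_j, \partial_l) = \frac{1}{2} \Big( \partial_i(g(\partial_j, \partial_l)) + \partial_j(g(\partial_l, \partial_i)) - \partial_l(g(\partial_i, \partial_j)) - g([\partial_i, \partial_l], \partial_j) - g([\partial_j, \partial_l], \partial_i) + g([\partial_i, \partial_j], \partial_l) \Big), \tag{32}
\]
and
\[
\begin{aligned}
g(\nabla_{\partial_i} \partial_j, \partial_l) &= g(\nabla_{\partial_j} \partial_i, \partial_l) \\
&= \frac{1}{2} \Big( \partial_j(g(\partial_i, \partial_l)) + \partial_i(g(\partial_l, \partial_j)) - \partial_l(g(\partial_j, \partial_i)) - g([\partial_j, \partial_l], \partial_i) - g([\partial_i, \partial_l], \partial_j) + g([\partial_j, \partial_i], \partial_l) \Big).
\end{aligned} \tag{33}
\]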

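Since $g$ is symmetric, we have
\[
\begin{aligned}
\partial_j(g(\partial_l, \partial_i)) &= \partial_j(g(\partial_i, \partial_l)), \\
\partial_i(g(\partial_j, \partial_l)) &= \partial_i(g(\partial_l, \partial_j)), \\
\partial_l(g(\partial_i, \partial_j)) &= \partial_l(g(\partial_j, \partial_i)).
\end{aligned}
\]
Thus, combining Equations (32) and (33), we have
\[
g([\partial_i, \partial_j], \partial_l) = g([\partial_j, \partial_i], \partial_l).
\]
Since the Lie bracket is antisymmetric, we have that $[\partial_i, \partial_j] = -[\partial_j, \partial_i]$, implying that $g([\partial_i, \partial_j], \partial_l) = 0$. Since
our selection of indices $i, j, l$ was arbitrary, Equation (32) simplifies to
\[
g(\nabla_{\partial_i} \partial_j, \partial_l) = \frac{1}{2} (\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}). \tag{34}
\]
Combining (31) and (34), we get
\[
\Gamma^k_{ij} = \frac{1}{2} g^{kl} (\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}).
\]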
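
The closing formula for $\Gamma^k_{ij}$ can be evaluated numerically. The sketch below does this for the Hessian metric of the log barrier on the two-dimensional positive orthant, $g_{ij}(x) = \delta_{ij} / x_i^2$, using central finite differences for $\partial_l g_{ij}$. The choice of metric, the evaluation point, and the step size are illustrative assumptions. For this metric the only nonzero symbols are $\Gamma^i_{ii} = -1/x_i$.

```python
def metric(x):
    """Hessian metric of the log barrier: g_ij = delta_ij / x_i^2."""
    n = len(x)
    return [[(1.0 / x[i] ** 2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def inverse_2x2(G):
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]

def d_metric(x, l, h=1e-5):
    """Central-difference estimate of the matrix of partial_l g_ij at x."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    Gp, Gm = metric(xp), metric(xm)
    n = len(x)
    return [[(Gp[i][j] - Gm[i][j]) / (2 * h) for j in range(n)]
            for i in range(n)]

def christoffel(x):
    """Gamma[k][i][j] = 1/2 g^{kl} (d_i g_jl + d_j g_il - d_l g_ij)."""
    n = len(x)
    Ginv = inverse_2x2(metric(x))
    dG = [d_metric(x, l) for l in range(n)]   # dG[l][i][j] = partial_l g_ij
    Gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                Gamma[k][i][j] = 0.5 * sum(
                    Ginv[k][l] * (dG[i][j][l] + dG[j][i][l] - dG[l][i][j])
                    for l in range(n))
    return Gamma

x = [2.0, 0.5]
Gamma = christoffel(x)
print(Gamma[0][0][0], Gamma[1][1][1])   # expected close to -1/x_1 and -1/x_2
```

The helper `inverse_2x2` restricts this sketch to two dimensions; a general implementation would use a linear-algebra library for the inverse.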

B Euler-Lagrange Dynamics on a Riemannian Manifold


In this section, we derive the Euler-Lagrange dynamics, a useful concept from the calculus of
variations. The significance of the Euler-Lagrange equations is that they allow us to characterize minimizers
of the following work or action integral:
\[
S[\gamma] = \int_a^b L(\gamma, \dot{\gamma}, t) \, dt
\]
where $L$ is a smooth function, $\gamma$ is a curve whose end-points are fixed, and $\dot{\gamma} = \frac{d\gamma}{dt}$. An application of this
characterization is Theorem 4.5. Before we delve into the derivation of the Euler-Lagrange equations on
a Riemannian manifold, we present a view of the Euler-Lagrange equations from a Newtonian
mechanics perspective.

B.1 Euler-Lagrange dynamics in Newtonian mechanics
In Euler-Lagrange dynamics, one typically denotes the generalized position of a particle by $q$
and its generalized velocity by $\dot{q}$, as it is normally a time derivative of the position. In most cases
we should still think of $\dot{q}$ as a formal variable independent of $q$, but if we consider a trajectory
$(q(t), \dot{q}(t))$ of a particle, then clearly
\[
\frac{d}{dt} q(t) = \dot{q}(t).
\]
The central object of Euler-Lagrange dynamics is the Lagrangian function defined as
\[
L(q_1, \ldots, q_d, \dot{q}_1, \ldots, \dot{q}_d, t) = K(q_1, \ldots, q_d, \dot{q}_1, \ldots, \dot{q}_d, t) - V(q_1, \ldots, q_d, t),
\]
where $K$ is to be thought of as kinetic energy and $V$ as potential energy. An example of a Lagrangian
in one dimension is
\[
L = \frac{1}{2} m \dot{q}^2 - V(q). \tag{35}
\]
The Lagrangian can also depend on time. For instance, if $\beta > 0$ is a fixed constant and
\[
L' = e^{\beta t} \left( \frac{1}{2} m \dot{q}^2 - V(q) \right),
\]
then the above describes a physical system with friction.


We are now ready to state the Euler-Lagrange equations.

Definition B.1. The Euler-Lagrange equations describing the motion of a particle $(q, \dot{q})$ with
respect to a Lagrangian $L(q, \dot{q}, t)$ are given by
\[
\frac{\partial L}{\partial q_i} = \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_i}, \quad i = 1, 2, \ldots, d. \tag{36}
\]
We can see for instance that the Euler-Lagrange equations with respect to (35) are compatible with
Newtonian dynamics
\[
\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} = \frac{d}{dt} (m \dot{q}) = \frac{d}{dt} p = F(q) = -\nabla V(q) = \frac{\partial L}{\partial q},
\]
where $q$ is the position of the particle, $m$ is the mass of the particle, $p$ is the momentum of the particle,
$F(q)$ is the force applied at position $q$, and $V(q)$ is the potential energy at position $q$. The quantity
$\frac{\partial L}{\partial \dot{q}_i}$ is usually referred to as the conjugate momentum or generalized momentum, while $\frac{\partial L}{\partial q_i}$ is called the
generalized force. We note that the quantity
\[
S(q) := \int_a^b L(q, \dot{q}, t) \, dt
\]
represents the work performed by the particle. The Principle of Least Action asserts that the
trajectory taken by the particle to go from state $(q(a), \dot{q}(a))$ to state $(q(b), \dot{q}(b))$ will minimize $S(q)$.
In other words, if we compute the derivative of $S$ with respect to variations of the trajectory of the particle,
then the result should be 0. Starting from this observation and using properties of differentiation and
integrals one can arrive at equations (36). Here, we present the proof of the Euler-Lagrange equations in the more
general setting of an “embedded manifold”.
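
Before moving to the general setting, the Principle of Least Action can be illustrated on a discretized example. The sketch below takes the Lagrangian (35) with $V(q) = m g q$ (a particle in uniform gravity, our illustrative choice), whose Euler-Lagrange equation is $\ddot{q} = -g$. It compares the discretized action of the solution $q(t) = \frac{g}{2} t (1 - t)$ with that of perturbed paths sharing the same end-points. The discretization, the perturbation family, and all names are assumptions made for this example.

```python
import math

def action(path, dt, m=1.0, g=9.81):
    """Discretized action: sum of (kinetic - potential) * dt over segments."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        kinetic = 0.5 * m * ((b - a) / dt) ** 2
        potential = m * g * 0.5 * (a + b)   # V(q) = m*g*q at the segment midpoint
        total += (kinetic - potential) * dt
    return total

N = 1000
dt = 1.0 / N
ts = [k * dt for k in range(N + 1)]
g = 9.81

# Solution of q'' = -g with q(0) = q(1) = 0.
true_path = [0.5 * g * t * (1.0 - t) for t in ts]

def perturbed(eps, k):
    """Add eps * sin(k*pi*t), which vanishes at both end-points."""
    return [q + eps * math.sin(k * math.pi * t) for q, t in zip(true_path, ts)]

base = action(true_path, dt)
gaps = [action(perturbed(eps, k), dt) - base
        for eps in (-0.2, -0.05, 0.05, 0.2) for k in (1, 2, 3)]
print(min(gaps))
```

Every perturbed path has a strictly larger action than the solution of the Euler-Lagrange equation, so all entries of `gaps` are positive.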

B.2 Derivation of Euler-Lagrange equations on a Riemannian manifold
In this section, we derive the Euler-Lagrange equations on a manifold $M$ that is embedded in a larger
Euclidean space. In other words, we assume that $M$ and $TM$ are both part of $\mathbb{R}^n$. This assumption
allows us to use the usual calculus rules and is sufficient for all the applications discussed in this
exposition. One can extend the ideas presented in this section to the general manifold setting by
carefully handling local coordinate changes.

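Theorem B.2 (Euler-Lagrange equations). Let $(M, g)$ be a Riemannian manifold and $p, q \in M$.
Consider the set $\Gamma$ of all smooth curves that join $p$ to $q$ by mapping $[0, 1]$ to $M$. Let us consider
a smooth real-valued function $L(\gamma, \dot{\gamma}, t)$ for $\gamma \in \Gamma$ and
\[
S[\gamma] := \int_0^1 L(\gamma, \dot{\gamma}, t) \, dt. \tag{37}
\]
If $\gamma' \in \Gamma$ is a minimizer of $S[\gamma]$, then
\[
\frac{d}{dt} \frac{\partial L}{\partial \dot{\gamma}}(\gamma') - \frac{\partial L}{\partial \gamma}(\gamma') = 0.
\]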

In order to prove Theorem B.2, we need to argue about the change induced in S by changing γ.
Towards this end, we introduce the variation of a curve.

Definition B.3 (Variation of a curve). Let $M$ be a differentiable manifold and $\gamma : [0, 1] \to M$
be a smooth curve. A variation of $\gamma$ is a smooth function $\nu : (-c, c) \times [0, 1] \to M$ for some $c > 0$
such that $\nu(0, t) = \gamma(t)$ for all $t \in [0, 1]$, and $\nu(u, 0) = \gamma(0)$, $\nu(u, 1) = \gamma(1)$ for all $u \in (-c, c)$.

Figure 7: Variation of a curve. The black curve is the original curve and green curves are its smooth
variations.

Proof of Theorem B.2. If $\gamma$ is a minimizer of $S$, then no small variation of $\gamma$ can decrease
$S[\gamma]$. More formally, if $\nu : (-c, c) \times [0, 1] \to M$ is a variation of $\gamma$, then
\[
S[\nu_0] \leq S[\nu_u]
\]
for any $u \in (-c, c)$ where $\nu_u(t) := \nu(u, t)$. Consequently,
\[
\left. \frac{dS[\nu_u]}{du} \right|_{u=0} = 0.
\]

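Now, let us evaluate the derivative of $S[\nu_u]$ using (37).
\[
\begin{aligned}
0 &= \left. \frac{dS[\nu_u]}{du} \right|_{u=0} \\
&= \left. \frac{d}{du} \int_0^1 L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right) dt \, \right|_{u=0} \\
&= \int_0^1 \left. \frac{d}{du} L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right) \right|_{u=0} dt \\
&= \int_0^1 \left. \left\langle \frac{\partial L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right)}{\partial \nu_u}, \frac{\partial \nu_u}{\partial u} \right\rangle \right|_{u=0} + \left. \left\langle \frac{\partial L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right)}{\partial (\partial \nu_u / \partial t)}, \frac{\partial (\partial \nu_u / \partial t)}{\partial u} \right\rangle \right|_{u=0} dt.
\end{aligned}
\]
We note that
\[
\frac{\partial (\partial \nu_u / \partial t)}{\partial u} = \frac{\partial^2 \nu(u, t)}{\partial u \, \partial t} = \frac{\partial^2 \nu(u, t)}{\partial t \, \partial u} = \frac{\partial (\partial \nu_u / \partial u)}{\partial t},
\]
by smoothness. Let us denote $\partial \nu_u / \partial u |_{u=0}$ by $\mu$. Then by the chain rule, evaluating at $u = 0$ where
$\nu_u = \gamma$ and $\partial \nu_u / \partial t = \dot{\gamma}$,
\[
\begin{aligned}
0 &= \int_0^1 \left. \left\langle \frac{\partial L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right)}{\partial \nu_u}, \frac{\partial \nu_u}{\partial u} \right\rangle \right|_{u=0} + \left. \left\langle \frac{\partial L\left( \nu_u, \frac{\partial \nu_u}{\partial t}, t \right)}{\partial (\partial \nu_u / \partial t)}, \frac{\partial (\partial \nu_u / \partial t)}{\partial u} \right\rangle \right|_{u=0} dt \\
&= \int_0^1 \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma}, \mu \right\rangle + \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \frac{d\mu}{dt} \right\rangle dt.
\end{aligned}
\]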

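We note that $\mu(0) = \mu(1) = 0$ as $\nu_u(0) = \gamma(0)$ and $\nu_u(1) = \gamma(1)$ for $u \in (-c, c)$. Thus, by integration
by parts, we obtain
\[
\begin{aligned}
\int_0^1 \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \frac{d\mu}{dt} \right\rangle dt
&= \left. \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \mu \right\rangle \right|_0^1 - \int_0^1 \left\langle \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \mu \right\rangle dt \\
&= - \int_0^1 \left\langle \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \mu \right\rangle dt.
\end{aligned}
\]
Consequently, we have
\[
0 = \int_0^1 \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \mu \right\rangle dt. \tag{38}
\]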
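If $\gamma$ is a minimizer of $S$, then (38) should hold for any variation. As $\mu = \partial \nu_u / \partial u |_{u=0}$, we can pick $\mu$
to be any smooth vector field along $\gamma$. In particular, we can pick $\mu$ as
\[
\mu = \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}},
\]
since $\gamma$ is a smooth function. Consequently,
\[
\begin{aligned}
0 &= \int_0^1 \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \mu \right\rangle dt \\
&= \int_0^1 \left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}} \right\rangle dt.
\end{aligned}
\]
This implies that
\[
\left\langle \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}}, \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}} \right\rangle = 0
\]
as it is non-negative everywhere. Finally, the 0 vector is the only vector whose norm is 0. Therefore,
\[
\frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \gamma} - \frac{d}{dt} \frac{\partial L(\gamma, \dot{\gamma}, t)}{\partial \dot{\gamma}} = 0.
\]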

C Minimizers of Energy Functional are Geodesics


In this section, we prove Theorem 4.6. Given a curve $\gamma$ that minimizes the energy functional, we
derive the differential equations describing $\gamma$ in terms of the Christoffel symbols and the metric. Later, we
verify that these differential equations imply that $\nabla_{\dot{\gamma}} \dot{\gamma} = 0$ where $\nabla$ is the Levi-Civita connection.
In other words, $\gamma$ is a geodesic with respect to the Levi-Civita connection.

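Proof of Theorem 4.6. Let us assume that $M$ is a $d$-dimensional manifold. Let us fix a frame
bundle, $\{\partial_i\}_{i=1}^d$, for $M$ and let $\Gamma^k_{ij}$ be the Christoffel symbols corresponding to the Levi-Civita connection.
Let us also define
\[
E(t) := \frac{1}{2} \sum_{i,j=1}^d g_{ij}(\gamma(t)) \dot{\gamma}_i(t) \dot{\gamma}_j(t).
\]
Before we use Theorem 4.5, let us compute the partial derivatives of $E$.
\[
\begin{aligned}
\frac{\partial E}{\partial \gamma_i} &= \frac{1}{2} \sum_{j,k=1}^d \frac{\partial \big( g_{jk}(\gamma(t)) \dot{\gamma}_j(t) \dot{\gamma}_k(t) \big)}{\partial \gamma_i} \\
&= \frac{1}{2} \sum_{j,k=1}^d \frac{\partial g_{jk}(\gamma(t))}{\partial \gamma_i} \dot{\gamma}_j(t) \dot{\gamma}_k(t) \\
&= \frac{1}{2} \sum_{j,k=1}^d \partial_i(g_{jk}(\gamma(t))) \dot{\gamma}_j(t) \dot{\gamma}_k(t), \\
\frac{\partial E}{\partial \dot{\gamma}_i} &= \frac{1}{2} \sum_{j,k=1}^d \frac{\partial \big( g_{jk}(\gamma(t)) \dot{\gamma}_j(t) \dot{\gamma}_k(t) \big)}{\partial \dot{\gamma}_i} \\
&= \sum_{j=1}^d g_{ij}(\gamma(t)) \dot{\gamma}_j(t), \\
\frac{d}{dt} \frac{\partial E}{\partial \dot{\gamma}_i} &= \frac{d}{dt} \sum_{j=1}^d g_{ij}(\gamma(t)) \dot{\gamma}_j(t) \\
&= \sum_{j=1}^d \frac{d g_{ij}(\gamma(t))}{dt} \dot{\gamma}_j(t) + g_{ij}(\gamma(t)) \frac{d \dot{\gamma}_j(t)}{dt} \\
&= \sum_{j=1}^d \dot{\gamma}(t)(g_{ij}(\gamma(t))) \dot{\gamma}_j(t) + g_{ij}(\gamma(t)) \ddot{\gamma}_j(t) \\
&= \sum_{j,k=1}^d \partial_k(g_{ij})(\gamma(t)) \dot{\gamma}_k(t) \dot{\gamma}_j(t) + \sum_{j=1}^d g_{ij}(\gamma(t)) \ddot{\gamma}_j(t)
\end{aligned}
\]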

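for $i = 1, \ldots, d$. Thus, Theorem 4.5 implies that
\[
\begin{aligned}
0 &= \frac{d}{dt} \frac{\partial E}{\partial \dot{\gamma}_i} - \frac{\partial E}{\partial \gamma_i} \\
&= \sum_{j,k=1}^d \partial_k(g_{ij}(\gamma(t))) \dot{\gamma}_k(t) \dot{\gamma}_j(t) + \sum_{j=1}^d g_{ij}(\gamma(t)) \ddot{\gamma}_j(t) - \frac{1}{2} \sum_{j,k=1}^d \partial_i(g_{jk}(\gamma(t))) \dot{\gamma}_j(t) \dot{\gamma}_k(t) \\
&= \frac{1}{2} \sum_{j,k=1}^d \big( \partial_k g_{ij}(\gamma(t)) + \partial_j g_{ik}(\gamma(t)) - \partial_i g_{jk}(\gamma(t)) \big) \dot{\gamma}_j(t) \dot{\gamma}_k(t) + \sum_{j=1}^d g_{ij}(\gamma(t)) \ddot{\gamma}_j(t)
\end{aligned}
\]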

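for $i = 1, \ldots, d$. Consequently,
\[
\begin{aligned}
\ddot{\gamma}_l(t) &= \sum_{j=1}^d \delta_{lj} \ddot{\gamma}_j(t) \\
&= \sum_{i,j=1}^d g^{li}(\gamma(t)) g_{ij}(\gamma(t)) \ddot{\gamma}_j(t) \\
&= -\frac{1}{2} \sum_{j,k=1}^d \sum_{i=1}^d g^{li}(\gamma(t)) \big( \partial_k g_{ij}(\gamma(t)) + \partial_j g_{ik}(\gamma(t)) - \partial_i g_{jk}(\gamma(t)) \big) \dot{\gamma}_j(t) \dot{\gamma}_k(t) \\
&= -\sum_{j,k=1}^d \Gamma^l_{jk} \dot{\gamma}_j(t) \dot{\gamma}_k(t)
\end{aligned}
\]
for $l = 1, \ldots, d$ where $\delta_{lj}$ is the Kronecker delta. Therefore,
\[
\sum_{l=1}^d \left( \sum_{j,k=1}^d \dot{\gamma}_j(t) \dot{\gamma}_k(t) \Gamma^l_{jk} + \ddot{\gamma}_l \right) \partial_l = 0.
\]
Hence, $\nabla_{\dot{\gamma}} \dot{\gamma} = 0$ by Proposition 4.2.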
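
The geodesic equation $\ddot{\gamma}_l = -\sum_{j,k} \Gamma^l_{jk} \dot{\gamma}_j \dot{\gamma}_k$ obtained in the proof can be integrated numerically. As a sketch, consider the one-dimensional metric $g(x) = 1/x^2$ on $\mathbb{R}_{>0}$, whose Christoffel symbol is $\Gamma = -1/x$, and compare a Runge-Kutta solution with the closed form $\exp(\alpha t + \beta)$ recalled in Section 5. The integrator, step count, and parameter values are illustrative assumptions.

```python
import math

def rhs(state):
    """Geodesic equation as a first-order system: x' = v, v' = -Gamma * v^2."""
    x, v = state
    christoffel = -1.0 / x          # Christoffel symbol of g(x) = 1/x^2
    return (v, -christoffel * v * v)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

alpha, beta = 0.8, -0.3
# Initial position and velocity of t -> exp(alpha*t + beta) at t = 0.
state = (math.exp(beta), alpha * math.exp(beta))

steps = 1000
for _ in range(steps):
    state = rk4_step(state, 1.0 / steps)

error = abs(state[0] - math.exp(alpha + beta))
print(error)
```

The numerical solution at $t = 1$ agrees with $\exp(\alpha + \beta)$ up to the integration error.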

References
[AZGL+18] Zeyuan Allen-Zhu, Ankit Garg, Yuanzhi Li, Rafael Oliveira, and Avi Wigderson.
Operator scaling via geodesically convex optimization, invariant theory and polynomial
identity testing. In Proceedings of the 50th Annual ACM Symposium on Theory of
Computing, Los Angeles, CA, USA, June 25-29, 2018. To appear.

[Bal89] Keith Ball. Volumes of sections of cubes and related problems. In Geometric Aspects
of Functional Analysis, pages 251–260. Springer, 1989.

[BCCT08] Jonathan Bennett, Anthony Carbery, Michael Christ, and Terence Tao. The
Brascamp-Lieb inequalities: finiteness, structure and extremals. Geometric and Func-
tional Analysis, 17(5):1343–1415, 2008.

[Bha09] Rajendra Bhatia. Positive definite matrices. Princeton University Press, 2009.
[BL76a] Herm Jan Brascamp and Elliott H Lieb. Best constants in Young’s inequality, its con-
verse, and its generalization to more than three functions. Advances in Mathematics,
20(2):151–173, 1976.
[BL76b] Herm Jan Brascamp and Elliott H Lieb. Best constants in Young’s inequality, its
converse, and its generalization to more than three functions. Advances in Mathematics,
20(2):151–173, 1976.
[BV04] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university
press, 2004.
[CCE09] Eric A Carlen and Dario Cordero-Erausquin. Subadditivity of the entropy and its
relation to Brascamp–Lieb type inequalities. Geometric and Functional Analysis,
19(2):373–405, 2009.
[DGdOS18] Zeev Dvir, Ankit Garg, Rafael Mendes de Oliveira, and József Solymosi. Rank bounds
for design matrices with block entries and geometric applications. Discrete Analysis,
2018:5, Mar 2018.
[DSW14] Zeev Dvir, Shubhangi Saraf, and Avi Wigderson. Breaking the quadratic barrier
for 3-LCC’s over the reals. In Proceedings of the 46th Annual ACM Symposium on
Theory of Computing, Los Angeles, CA, USA, May 31- June 3, 2014, pages 784–793,
New York, NY, USA, 2014.
[Ein16] A Einstein. The foundation of the generalised theory of relativity. 1916.
[GGdOW16] Ankit Garg, Leonid Gurvits, Rafael Mendes de Oliveira, and Avi Wigderson. A deter-
ministic polynomial time algorithm for non-commutative rational identity testing. In
IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-
11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 109–117,
2016.
[GGdOW17] Ankit Garg, Leonid Gurvits, Rafael Mendes de Oliveira, and Avi Wigderson. Algo-
rithmic and optimization aspects of Brascamp-Lieb inequalities, via operator scaling.
In Proceedings of the 49th Annual ACM Symposium on Theory of Computing, Mon-
treal, QC, Canada, June 19-23, 2017, pages 397–409, 2017.
[Gur04] Leonid Gurvits. Classical complexity and quantum entanglement. Journal of Com-
puter and System Sciences, 69(3):448–484, 2004.
[Gur06] Leonid Gurvits. Hyperbolic polynomials approach to van der Waerden/Schrijver-Valiant
like conjectures: sharper bounds, simpler proofs and algorithmic applications.
In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing,
pages 417–426. ACM, 2006.
[HM13] Moritz Hardt and Ankur Moitra. Algorithms and hardness for robust subspace re-
covery. In Conference on Learning Theory, pages 354–375, 2013.

[LCCV16] Jingbo Liu, Thomas A. Courtade, Paul W. Cuff, and Sergio Verdú. Smoothing
Brascamp-Lieb inequalities and strong converses for common randomness genera-
tion. In IEEE International Symposium on Information Theory, Barcelona, Spain,
July 10-15, 2016, pages 1043–1047. IEEE, 2016.

[LCCV17] Jingbo Liu, Thomas A. Courtade, Paul W. Cuff, and Sergio Verdú. Information-
theoretic perspectives on Brascamp-Lieb inequality and its reverse. CoRR,
abs/1702.06260, 2017.

[Lie90] Elliott H Lieb. Gaussian kernels have only Gaussian maximizers. Inventiones
mathematicae, 102(1):179–208, 1990.

[Nes04] Yurii Nesterov. Introductory lectures on convex optimization, volume 87. Springer
Science & Business Media, 2004.

[SV14] Mohit Singh and Nisheeth K Vishnoi. Entropy, optimization and counting. In Pro-
ceedings of the forty-sixth annual ACM symposium on Theory of computing, pages
50–59. ACM, 2014.

[SV17a] Damian Straszak and Nisheeth K. Vishnoi. Computing maximum entropy distribu-
tions everywhere. ArXiv e-prints, November 2017.

[SV17b] Damian Straszak and Nisheeth K. Vishnoi. Real stable polynomials and matroids:
optimization and counting. In Proceedings of the 49th Annual ACM Symposium on
Theory of Computing, Montreal, QC, Canada, June 19-23, 2017, pages 370–383,
2017.

[SVY18] Suvrit Sra, Nisheeth K Vishnoi, and Ozan Yildiz. On geodesically convex formulations
for the Brascamp-Lieb constant. APPROX, arXiv preprint arXiv:1804.04051, 2018.

[Udr94] Constantin Udriste. Convex functions and optimization methods on Riemannian man-
ifolds, volume 297. Springer Science & Business Media, 1994.

[Vis18] Nisheeth K. Vishnoi. Algorithms for convex optimization, May 2018.
