Deterministic Approximation Algorithms for Volumes of Spectrahedra
Mahmut Levent Doğan∗ Jonathan Leake† Mohan Ravichandran‡
arXiv:2211.12541v1 [cs.CG] 22 Nov 2022

November 24, 2022

Abstract
We give a method for computing asymptotic formulas and approximations for the volumes
of spectrahedra, based on the maximum-entropy principle from statistical physics. The method
gives an approximate volume formula based on a single convex optimization problem of min-
imizing − log det P over the spectrahedron. Spectrahedra can be described as affine slices of
the convex cone of positive semi-definite (PSD) matrices, and the method yields efficient de-
terministic approximation algorithms and asymptotic formulas whenever the number of affine
constraints is sufficiently dominated by the dimension of the PSD cone.
Our approach is inspired by the work of Barvinok and Hartigan who used an analogous
framework for approximately computing volumes of polytopes. Spectrahedra, however, possess
a remarkable feature not shared by polytopes, a new fact that we also prove: central sections
of the set of density matrices (the quantum version of the simplex) all have asymptotically
the same volume. This allows for very general approximation algorithms, which apply to large
classes of naturally occurring spectrahedra.
We give two main applications of this method. First, we apply this method to what we
call the “multi-way Birkhoff spectrahedron” and obtain an explicit asymptotic formula for its
volume. This spectrahedron is the set of quantum states with maximal entanglement (i.e., the
quantum states having univariant quantum marginals equal to the identity matrix) and is the
quantum analog of the multi-way Birkhoff polytope. Second, we apply this method to explicitly
compute the asymptotic volume of central sections of the set of density matrices.


∗ Technische Universität Berlin, Institut für Mathematik, Strasse des 17. Juni 136, 10623 Berlin, Germany
† University of Waterloo, Department of Combinatorics and Optimization, 200 University Ave W, Waterloo, ON, Canada
‡ Department of Mathematics, Bogazici University, Bebek, Istanbul

Contents

1 Introduction

2 Main Results
2.1 General approximation and asymptotic results
2.2 The multi-way Birkhoff spectrahedron
2.3 Central sections of the standard spectraplex

3 Technical Overview
3.1 The overarching strategy
3.2 Step 1: The maximum-entropy distribution
3.3 Step 2: The random variable Y = AX
3.4 Step 3: Approximating the density function of Y
3.5 From the main technical result to the main results

4 Maximum Entropy Distributions over the PSD Cone

5 Basic Examples
5.1 The spectraplex
5.2 One PD constraint and one rank-one constraint
5.3 Diagonal constraints

6 Main Example: Multi-stochastic Completely Positive Maps
6.1 Transportation polytopes and completely positive maps
6.2 Multi-index transportation polytopes and completely positive maps
6.3 Applying Theorem 2.2 to multi-index Birkhoff spectrahedra

7 Proof of the Main Technical Result
7.1 A few small results
7.2 The characteristic function for small t
7.2.1 For large q(t)
7.2.2 For small q(t)
7.3 The characteristic function for large t

8 Proofs of the Main Results
8.1 Simplifying Theorem 7.1
8.2 Proof of Theorem 2.1: Main approximation result
8.3 Proof of Theorem 2.2: Main asymptotic result
8.4 Proof of Corollary 2.5: Central sections of the spectraplex

1 Introduction
Approximate computation of the volume of convex sets is a fundamental problem in computer
science. Although efficient deterministic approximation is impossible in general (e.g., see [DF88,
Ele86]), a large number of efficient randomized algorithms exist. The first such algorithm was
given in [DFK91], and since then many improvements and other algorithms have been given
[LS93b, BH93, KLS97, LV06, CV16, Cou17, CEF19]. Generally speaking, the problem of vol-
ume computation of convex sets has generated a large amount of literature in mathematics and
computer science.
This work deals with the more specific problem of computing the volume of spectrahedra, an
important class of convex sets that are well-studied in real algebraic geometry, optimization, and
beyond (e.g., see [Tod01, BPT12, WSV12, Ali95] and the references therein). Spectrahedra are
affine slices of the cone of real symmetric positive semi-definite (PSD) matrices, and in that sense,
they are relatives of polytopes, which are affine slices of the positive orthant. However, unlike
the class of polytopes where there are a number of novel approaches with different practical and
theoretical advantages [EF14, EF18, CDWY18, LV18, MV19, CF20, BR21], volume computation
for spectrahedra is often performed using vanilla random walk methods developed for general
convex sets without regard to the explicit presentations that are available [CFRT21]. In this work,
we revisit an idea due to Barvinok and Hartigan [BH10] for deterministically approximating the
volume of polytopes, and we adapt this technique to spectrahedra.
In [BH10], Barvinok and Hartigan employ the maximum entropy approach to approximate
the volume of a polytope P, represented as an affine slice of the positive orthant. This type
of approach was originally used by Jaynes [Jay57a, Jay57b], who was motivated by problems in
statistical mechanics. The general idea is to estimate the average value of a given functional f over
an unknown distribution µ by assuming the distribution maximizes entropy, conditioned on some
given assumptions. In [BH10], the unknown distribution µ is the uniform distribution on P, and
the functional f is the density function of µ, which is inversely proportional to the volume of P.
(When the polytope P is sufficiently complicated, it is reasonable to consider µ to be unknown.)
By assuming µ to be entropy-maximizing on the positive orthant and proving a local central limit
theorem, Barvinok and Hartigan are able to approximate the density function of µ by the density
function of an appropriate Gaussian at its expectation. Under certain conditions on the constraints
and the dimension of the polytope, this gives rise to a deterministic approximate volume formula
for P which depends on the input parameters and the solution to a simple convex optimization
problem over P.
In this paper, we adapt this method to obtain an asymptotic approximation algorithm for
the volume of a given family of spectrahedra $\{S_n\}_{n=1}^{\infty}$ whenever the number of affine constraints
which define $S_n$ is sufficiently dominated by the dimension of the PSD cone in which $S_n$ lies. An
asymptotic approximation algorithm is an algorithm for approximating a family of values $V(n)$.
Given any small $\epsilon > 0$, the algorithm approximates the value $V(n)$ with relative error $\epsilon$ for any given
$n \ge n_\epsilon$, where $n_\epsilon$ has nice dependence on $\epsilon$ and possibly other parameters. That is, an asymptotic
approximation algorithm is an approximation algorithm where the allowed input size is
lower-bounded by a function dependent on the error parameter $\epsilon$.
The asymptotic approximation algorithm we obtain in this paper for $V(n) = \operatorname{vol}(S_n)$ is
essentially a formula, depending on the input parameters and on the solution to a simple convex
optimization problem over the given spectrahedron $S_n$. Thus under certain nice circumstances, our
algorithm naturally becomes an explicit asymptotic formula for the given family of spectrahedra.
This is similar to the situation in [BH10] for polytopes.
Beyond our general results, we obtain explicit asymptotic formulas for two particular families of
spectrahedra (among a number of other examples). First, we provide an asymptotic formula for the
real symmetric version of the spectrahedron of mixed quantum states with prescribed marginals. A
quantum state is defined as a positive semi-definite linear operator $A$ (called the density matrix) on
the space $\mathbb{C}^{(n_1,\ldots,n_k)} := \bigotimes_{i=1}^{k} \mathbb{C}^{n_i}$ with $\operatorname{tr}(A) = 1$. For a subset $I \subset \{1, \ldots, k\}$, the quantum marginal
(or the reduced density matrix) $\rho_I(A)$ of $A$ is a linear operator on $\mathbb{C}^{I} := \bigotimes_{i \in I} \mathbb{C}^{n_i}$ which is obtained
by tracing out the spaces $\mathbb{C}^{n_j}$ for all $j \notin I$. The quantum marginal problem then asks whether a
given set of quantum marginals is consistent, i.e., given a collection of reduced density matrices does
there exist a quantum state A with these quantum marginals? For a study of the quantum marginal
problem we refer to [Kly02, Kly04]. For recent developments that demonstrate the connection
between the quantum marginal problem and geometric complexity theory, see [BGO+ 18, BFG+ 18].
A special case of the quantum marginal problem is the study of univariant marginals, i.e.,
the quantum marginals obtained by considering $\rho_{\{i\}}(A)$ for all $i \in \{1, \ldots, k\}$. These quantum
marginals are always consistent, and the set of quantum states with given fixed quantum marginals
forms a non-empty convex set. In fact, such sets of quantum states are affine slices of the positive
semi-definite cone, and thus they are spectrahedra. Our method applies to the real symmetric
versions of these families of spectrahedra, and we give a formula for the asymptotic volume in
the special case when the univariant marginals are all identity matrices in Corollary 2.3. These
spectrahedra can be viewed as the quantum analog of the multi-way Birkhoff polytope, consisting
of all multi-stochastic completely positive maps.
As a second example, we provide an asymptotic formula for central sections of the real symmetric
standard spectraplex. The standard spectraplex $S_N$ is the spectrahedron consisting of all $N \times N$ PSD
matrices with trace equal to 1 (called density matrices), and it can be considered as the quantum
analog of the standard simplex. A central section is then any intersection of the spectraplex with
a codimension-one affine hyperplane passing through $\frac{1}{N} I_N$, the center of $S_N$. In Corollary 2.5, we
show a remarkable fact: every sequence of central sections of $S_N$ of increasing dimension has the
same asymptotic formula. This implies the ratio of the maximum and minimum volume of central
sections of $S_N$ approaches 1 as $N \to \infty$. This differs from the case of the standard simplex, where
this ratio is bounded below by a constant greater than 1, see [Web96, Brz13].

2 Main Results
Define Sym(N ) to be the space of real symmetric N × N matrices, PSD(N ) to be the set of real
symmetric positive semi-definite matrices, and PD(N ) to be the set of real symmetric positive
definite matrices. Throughout, we consider Sym(N ) as a real inner product space with Frobenius
inner product defined via $\langle X, Y \rangle_F := \operatorname{tr}(XY)$. Given $A_1, A_2, \ldots, A_m \in \mathrm{Sym}(N)$ and $b \in \mathbb{R}^m$, we
define a spectrahedron $S$ by:

\[ S := \{ P \in \mathrm{PSD}(N) : \operatorname{tr}(A_k P) = b_k \text{ for } k \in [m] \}. \]

We assume that $S$ is compact, that the constraints $\operatorname{tr}(A_k P) = b_k$ are linearly independent, that
$m < \binom{N+1}{2} = \dim(\mathrm{PSD}(N))$, and that $S$ is of dimension exactly $\binom{N+1}{2} - m$. We now present our
main results, leaving the proofs to Section 8.

2.1 General approximation and asymptotic results
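Given a spectrahedron $S$ as defined above, let $P^\star \in S$ be the point which maximizes the function
\[
\begin{aligned}
\phi(P) &= \log \Gamma_N\!\left(\frac{N+1}{2}\right) - \frac{N(N+1)}{2} \log\!\left(\frac{N+1}{2e}\right) + \frac{N+1}{2} \log\det(P) \\
        &= \mathrm{const}(N) + \frac{N+1}{2} \log\det(P)
\end{aligned}
\]
over $S$, where $\Gamma_N$ is the multivariate gamma function. The function $\phi$ is the entropy function of
an associated Wishart distribution on $\mathrm{PSD}(N)$; see Definition 4.4 and Corollary 4.5. We discuss
this further in Section 4. Let $A$ and $B$ be linear operators from $\mathrm{Sym}(N)$ to $\mathbb{R}^m$, defined via

\[ AX := (\operatorname{tr}(A_1 X), \ldots, \operatorname{tr}(A_m X)) \]

and
\[ BX := \left(\operatorname{tr}(\sqrt{P^\star} A_1 \sqrt{P^\star} X), \ldots, \operatorname{tr}(\sqrt{P^\star} A_m \sqrt{P^\star} X)\right). \]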
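The entropy functional $\phi$ defined above depends on $P$ only through $N$ and $\log\det(P)$, so it can be evaluated with log-gamma functions alone. The following is a minimal sketch, assuming the standard product formula $\Gamma_N(a) = \pi^{N(N-1)/4} \prod_{j=1}^{N} \Gamma(a - (j-1)/2)$ for the multivariate gamma function; the helper names `log_multivariate_gamma` and `phi` are illustrative and not taken from the paper.

```python
import math


def log_multivariate_gamma(a: float, n: int) -> float:
    """Return log Gamma_n(a) = n(n-1)/4 * log(pi) + sum_j lgamma(a - (j-1)/2)."""
    return n * (n - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a - (j - 1) / 2.0) for j in range(1, n + 1)
    )


def phi(n: int, log_det_p: float) -> float:
    """Evaluate phi(P) for an N x N matrix P, given N and log det(P)."""
    half = (n + 1) / 2.0
    # const(N) = log Gamma_N((N+1)/2) - N(N+1)/2 * log((N+1)/(2e))
    const = log_multivariate_gamma(half, n) - n * half * math.log(half / math.e)
    return const + half * log_det_p


# For N = 1 the formula reduces to phi(p) = 1 + log(p), which is the
# differential entropy of an exponential distribution with mean p.
print(phi(1, math.log(2.0)))
```

Working with $\log\det(P)$ and log-gamma values rather than the quantities themselves avoids overflow, since $\Gamma_N$ grows very quickly with $N$.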
In the following results, we approximate the volume of $S$ by the formula
\[ \operatorname{vol}(S) \approx \left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{\det(AA^\top)}{\det(BB^\top)}\right)^{1/2} e^{\phi(P^\star)}, \]
under certain conditions on N and m. Our first result gives conditions under which the formula
yields a good approximation for vol(S).
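Theorem 2.1 (Main approximation result). Let $S$ be a spectrahedron defined with the above notation. Fix $\epsilon \le e^{-1}$ and suppose that
\[ \frac{\epsilon^2}{\log^3(\epsilon^{-1})} \ge \frac{\gamma\, m^3 \log N}{N}, \tag{1} \]
where $\gamma$ is an absolute constant (we can choose $\gamma = 32 \cdot 10^5$). Then the number
\[ \left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{\det(AA^\top)}{\det(BB^\top)}\right)^{1/2} e^{\phi(P^\star)} \]
approximates $\operatorname{vol}(S)$ within relative error $\epsilon$.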
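To get a feeling for how large $N$ must be relative to $m$ and $\epsilon$, Condition (1) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the constant $\gamma$ is passed in explicitly rather than hard-coded.

```python
import math


def condition_1_holds(dim_n: int, m: int, eps: float, gamma: float) -> bool:
    """Check eps^2 / log^3(1/eps) >= gamma * m^3 * log(N) / N."""
    if not 0.0 < eps <= math.exp(-1.0):
        raise ValueError("Theorem 2.1 assumes 0 < eps <= 1/e")
    lhs = eps ** 2 / math.log(1.0 / eps) ** 3
    rhs = gamma * m ** 3 * math.log(dim_n) / dim_n
    return lhs >= rhs


def first_power_of_two(m: int, eps: float, gamma: float, max_exp: int = 200):
    """Smallest N = 2^j (2 <= j <= max_exp) satisfying Condition (1), else None."""
    for j in range(2, max_exp + 1):
        if condition_1_holds(2 ** j, m, eps, gamma):
            return 2 ** j
    return None


# Toy call with gamma = 1 (an assumption made only to keep the numbers small).
print(first_power_of_two(m=1, eps=0.1, gamma=1.0))
```

Because $\log N / N$ is decreasing for $N \ge 3$, once the condition holds for some $N$ it continues to hold for all larger $N$ with the same $m$ and $\epsilon$.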
Our second result gives conditions under which the formula yields the correct asymptotics for
the volume of a family of spectrahedra.
Theorem 2.2 (Main asymptotic result). Let $\{S_n\}_{n=1}^{\infty}$ be a family of spectrahedra with the above
notation, using the subscript $n$ to denote which spectrahedron in the family $\{S_n\}_{n=1}^{\infty}$ we are referring
to. If
\[ \lim_{n \to \infty} \frac{m_n^3 \log N_n}{N_n} = 0, \tag{2} \]
then
\[ \lim_{n \to \infty} \frac{\operatorname{vol}(S_n)}{\left(\frac{N_n+1}{4\pi}\right)^{m_n/2} \left(\frac{\det(A_n A_n^\top)}{\det(B_n B_n^\top)}\right)^{1/2} e^{\phi_n(P_n^\star)}} = 1. \]

That is, we achieve an asymptotic formula (depending on $P_n^\star$) for the volume of $S_n$.

Utilizing the notation of Theorem 2.2 for a family of spectrahedra $\{S_n\}_{n=1}^{\infty}$, the above results
imply an asymptotic approximation algorithm whenever Condition (2) is satisfied: given small
$\epsilon > 0$, there exists $n_\epsilon$ such that for all $n \ge n_\epsilon$ we have a formula (based on the optimizer $P_n^\star$) which
approximates $\operatorname{vol}(S_n)$ within relative error $\epsilon$. Computing the optimizer $P_n^\star$ of the convex function
$-\log\det P$ is the last required step of the algorithm, and this can be done efficiently using standard
convex optimization techniques like interior point methods or the ellipsoid method. See [VBW98]
for further discussion related to this specific optimization problem.

2.2 The multi-way Birkhoff spectrahedron


We now give a formula for the asymptotic volume of the multi-way Birkhoff spectrahedron, consist-
ing of all multi-stochastic completely positive maps on real symmetric matrices. This spectrahedron
can be viewed as the quantum analog of the multi-way Birkhoff polytope, but we leave the formal
definitions of this and all notation to Section 6. We also give the proof of the asymptotic formula
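in Section 6. Fix $n, k \in \mathbb{N}$ and consider the spectraplex

\[ S := \{A \in \mathrm{PSD}(n^k) : \operatorname{tr}(A) = 1\}, \]

which can be interpreted as the set of density matrices (with real entries), acting on the space
$(\mathbb{R}^n)^{\otimes k}$. For $i = 1, 2, \ldots, k$, the $i$-th partial trace operator is the unique linear map

\[ \operatorname{tr}_i : \mathrm{PSD}(n^k) \to \mathrm{PSD}(n) \]

defined by the property that

\[ \operatorname{tr}_i(A_1 \otimes A_2 \otimes \cdots \otimes A_k) = \operatorname{tr}(A_{\hat{\imath}})\, A_i, \]

where $A_{\hat{\imath}}$ (the tensor product of all factors $A_j$ with $j \ne i$) is interpreted as a linear operator on $(\mathbb{R}^n)^{\otimes (k-1)}$. For a density matrix $A$, the partial traces
$\operatorname{tr}_i(A)$ are called the (univariant) quantum marginals of $A$. It is known that $\operatorname{tr}_i$ is a completely
positive map and $\operatorname{tr}_i(A) \in \mathrm{PSD}(n)$ for every $A \in S$. The multi-way Birkhoff spectrahedron is then
defined to be
\[ \mathrm{SCP}_{n,k} := \left\{ A \in \mathrm{PSD}(n^k) : \operatorname{tr}(A) = n \text{ and } \operatorname{tr}_i(A) = I_n \text{ for all } i = 1, \ldots, k \right\}. \]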
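The defining property of the partial trace can be made concrete in the bipartite case $k = 2$. The sketch below uses plain nested lists and assumes the usual Kronecker-product indexing, in which row $a_1 n + a_2$ of an $n^2 \times n^2$ matrix corresponds to the index pair $(a_1, a_2)$; the function names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, p = len(a), len(b)
    size = n * p
    return [
        [a[i // p][j // p] * b[i % p][j % p] for j in range(size)]
        for i in range(size)
    ]


def partial_trace(mat, n, keep):
    """Partial trace of an n^2 x n^2 matrix, keeping tensor factor 1 or 2."""
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            for c in range(n):
                if keep == 1:
                    # sum over the second index: tr_1(A1 (x) A2) = tr(A2) * A1
                    out[r][s] += mat[r * n + c][s * n + c]
                else:
                    # sum over the first index: tr_2(A1 (x) A2) = tr(A1) * A2
                    out[r][s] += mat[c * n + r][c * n + s]
    return out


a1 = [[2, 1], [1, 3]]
a2 = [[1, 0], [0, 4]]
print(partial_trace(kron(a1, a2), 2, keep=1))
print(partial_trace(kron(a1, a2), 2, keep=2))
```

With this convention the scaled identity $\frac{1}{n} I_{n^2}$ has both marginals equal to $I_n$, so it lies in the set defined above for $k = 2$.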
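Corollary 2.3. Fix $k \ge 7$, and set $N := n^k$ and $m := k\binom{n+1}{2} - k + 1$ for any $n \in \mathbb{N}$. As a function
of $n$, the asymptotic volume of $\mathrm{SCP}_{n,k}$ is given by
\[ \operatorname{vol}(\mathrm{SCP}_{n,k}) \approx \left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{N}{n}\right)^{m} \left(\frac{2en}{N(N+1)}\right)^{N(N+1)/2} \Gamma_N\!\left(\frac{N+1}{2}\right). \]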
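The role of the hypothesis $k \ge 7$ can be seen by evaluating the quantity $m^3 \log N / N$ from Condition (2) with the values of $N$ and $m$ in Corollary 2.3. The following sketch (with hypothetical function names) compares $k = 7$ against $k = 6$; since $m$ grows like $n^2$ and $N = n^k$, the quantity behaves like $n^{6-k} \log n$ up to constants.

```python
import math


def birkhoff_parameters(n: int, k: int):
    """Return (N, m) with N = n^k and m = k * C(n+1, 2) - k + 1."""
    big_n = n ** k
    m = k * math.comb(n + 1, 2) - k + 1
    return big_n, m


def condition_2_quantity(n: int, k: int) -> float:
    """Evaluate m^3 * log(N) / N, which must tend to 0 in Theorem 2.2."""
    big_n, m = birkhoff_parameters(n, k)
    return m ** 3 * math.log(big_n) / big_n


for k in (6, 7):
    print(k, [round(condition_2_quantity(n, k), 2) for n in (10, 100, 1000)])
```

For $k = 7$ the printed values decrease as $n$ grows, while for $k = 6$ they increase, which is consistent with the threshold in the corollary.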
Now assume that $k = 2$ and consider the intersection of the Birkhoff spectrahedron with the
set of $n^2 \times n^2$ diagonal matrices. Then $\mathrm{SCP}_{n,k} \cap \mathrm{Diag}(n^2)$ is a polytope. Indeed, this polytope
is the well-known Birkhoff polytope of $n \times n$ doubly stochastic matrices. Similarly, for $k \ge 3$,
the intersection of $\mathrm{SCP}_{n,k}$ with diagonal matrices gives rise to a polytope, the multi-way Birkhoff
polytope. This observation justifies the name Birkhoff spectrahedron and we view $\mathrm{SCP}_{n,k}$ as the
quantum analog of the Birkhoff polytope.
Remark 2.4. The usual definitions of quantum states and quantum marginals consider operators
on complex vector spaces rather than real ones. Our general results above only apply to the real
symmetric PSD cone, and thus we only give the asymptotic formulas in the real symmetric case.
That said, we believe similar results should be possible for the Hermitian PSD cone, and even the
quaternionic PSD cone; see the discussion at the end of Section 3.5.

2.3 Central sections of the standard spectraplex
Now consider the standard spectraplex, defined via
\[ S_1 := \{P \in \mathrm{PSD}(N) : \operatorname{tr}(P) = 1\}. \]
Equivalently, $S_1$ is the set of real symmetric density matrices, and it is the spectrahedral analog of
the standard simplex. Both $S_1$ and $\phi(P)$ are invariant under conjugation by orthogonal matrices,
which shows that the analytic center of $S_1$ equals $P^\star = \frac{1}{N} I_N$. A central section of $S_1$ is then given
by intersecting the spectraplex with an affine hyperplane through its center $\frac{1}{N} I_N$. That is,
\[ S_M := \left\{ P \in \mathrm{PSD}(N) : \operatorname{tr}(P) = 1 \text{ and } \operatorname{tr}(MP) = \frac{\operatorname{tr}(M)}{N} \right\}, \]
for some $M \in \mathrm{Sym}(N)$ linearly independent of $I_N$. Note that $P^\star = \frac{1}{N} I_N$ always satisfies the
second condition, and thus the analytic center of $S_M$ is $\frac{1}{N} I_N$. Therefore the approximation formula
from Theorem 2.1 gives the same value for every choice of $M$. Our next result shows that,
asymptotically, this formula actually gives a good approximation for vol(SM ). This result is an
immediate corollary of Theorem 2.2.
Corollary 2.5 (Central sections of the spectraplex). For every $\epsilon < e^{-1}$, there exists $N_\epsilon$ such that
for all $N \ge N_\epsilon$ and all $M \in \mathrm{Sym}(N)$ linearly independent of $I_N$, the number
\[ \frac{N^2 (N+1)}{4\pi}\, e^{\phi\left(\frac{1}{N} I_N\right)} = \frac{N^2 (N+1)}{4\pi} \left(\frac{2e}{N(N+1)}\right)^{N(N+1)/2} \Gamma_N\!\left(\frac{N+1}{2}\right) \]
approximates $\operatorname{vol}(S_M)$ within relative error $\epsilon$. In particular, there is a single asymptotic formula
for the volume, which holds for any sequence of central sections of increasing dimension.
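The factor $\frac{N^2(N+1)}{4\pi}$ in Corollary 2.5 comes from the general prefactor $\left(\frac{N+1}{4\pi}\right)^{m/2} \left(\det(AA^\top)/\det(BB^\top)\right)^{1/2}$ with $m = 2$, $A_1 = I_N$, $A_2 = M$ and $P^\star = \frac{1}{N} I_N$. The sketch below checks numerically that this prefactor does not depend on $M$, under the assumption that $AA^\top$ and $BB^\top$ are the Gram matrices with entries $\operatorname{tr}(A_i A_j)$ and $\operatorname{tr}(Z_i Z_j)$, where $Z_k = \sqrt{P^\star} A_k \sqrt{P^\star}$; function names are illustrative.

```python
import math


def trace_product(x, y):
    """tr(XY) for square matrices stored as nested lists."""
    n = len(x)
    return sum(x[i][j] * y[j][i] for i in range(n) for j in range(n))


def gram_det_2x2(mats):
    """Determinant of the 2 x 2 Gram matrix [tr(A_i A_j)] of two matrices."""
    a, b = mats
    return trace_product(a, a) * trace_product(b, b) - trace_product(a, b) ** 2


def central_section_prefactor(m_matrix):
    """((N+1)/(4 pi))^(m/2) * sqrt(det(AA^T)/det(BB^T)) for m = 2, P* = I/N."""
    n = len(m_matrix)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a_ops = [identity, m_matrix]
    # With P* = I/N we get sqrt(P*) A_k sqrt(P*) = A_k / N.
    b_ops = [[[entry / n for entry in row] for row in a] for a in a_ops]
    ratio = math.sqrt(gram_det_2x2(a_ops) / gram_det_2x2(b_ops))
    return (n + 1) / (4.0 * math.pi) * ratio


m_a = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 5.0]]
m_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
print(central_section_prefactor(m_a), central_section_prefactor(m_b))
```

Both calls return $N^2 (N+1) / (4\pi)$ with $N = 3$, because $BB^\top = N^{-2} AA^\top$ regardless of the choice of $M$.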
Corollary 2.5 demonstrates a striking distinction between spectrahedra and polytopes: All
central sections of the spectraplex have the same asymptotic volume, but the analogous statement
is false for the standard simplex. The analytic center of the standard simplex $\Delta_n = \{x \in \mathbb{R}^{n+1}_{\ge 0} : \sum_{i=1}^{n+1} x_i = 1\}$
is the scaled all-ones vector $\frac{1}{n+1} \mathbf{1}_{n+1}$, and there exist two central sections of the
standard simplex such that the ratio of their volumes tends to $\frac{e}{\sqrt{2}}$ as $n \to \infty$ (one containing $n - 1$
vertices of $\Delta_n$, and the other one parallel to one of the facets). It is an open problem whether
this ratio is optimal. We refer to [Web96, Brz13, AS17] for relevant computations and further
discussion.
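The limiting ratio for the simplex can be checked numerically. The sketch below relies on two formulas that are not stated in this text and should be read as assumptions: the central section of $\Delta_n$ through $n - 1$ vertices has volume $\sqrt{n+1} / ((n-1)!\sqrt{2})$ (as recalled from the literature on sections of the regular simplex), and the section parallel to a facet is the facet, of volume $\sqrt{n}/(n-1)!$, scaled by the factor $n/(n+1)$ in each of its $n - 1$ dimensions.

```python
import math


def log_vertex_section(n: int) -> float:
    """log of sqrt(n+1) / ((n-1)! * sqrt(2)), the section through n-1 vertices."""
    return 0.5 * math.log(n + 1) - math.lgamma(n) - 0.5 * math.log(2.0)


def log_facet_parallel_section(n: int) -> float:
    """log of (n/(n+1))^(n-1) * sqrt(n) / (n-1)!, the section parallel to a facet."""
    return (n - 1) * math.log(n / (n + 1)) + 0.5 * math.log(n) - math.lgamma(n)


def section_ratio(n: int) -> float:
    """Ratio of the two central section volumes of the simplex Delta_n."""
    return math.exp(log_vertex_section(n) - log_facet_parallel_section(n))


for n in (2, 10, 1000, 10 ** 6):
    print(n, section_ratio(n))
print("e / sqrt(2) =", math.e / math.sqrt(2.0))
```

Under these assumptions the ratio equals $\sqrt{(n+1)/(2n)}\,(1 + 1/n)^{n-1}$, which increases toward $e/\sqrt{2}$ and never approaches 1, in contrast with the spectraplex.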
Remark 2.6. A version of Corollary 2.5 actually holds for spectrahedra obtained by intersecting
$S_1$ with any collection of $m$ linearly independent affine hyperplanes that pass through $\frac{1}{N} I_N$, where
m is a fixed constant. One can essentially obtain the same result with the same proof; the only
difference is that the approximate volume formula will depend on m (but not on the choice of affine
hyperplanes).

3 Technical Overview
In this section, we give an overview of the techniques used for the main results. The main results
given in Section 2 are all derived in Section 8 from our main technical result given in Theorem 7.1.
That said, we first give a bird's-eye view of the overarching conceptual strategy of the proof of our
main technical result, and then we discuss in more detail what must actually be done to make this
strategy work. After that, we briefly discuss how we use our main technical result (Theorem 7.1)
to prove our main results.

3.1 The overarching strategy
In [BH10], the maximum-entropy approach is utilized to approximate the volume of certain poly-
topes. (This approach is also utilized to approximately count integer points of polytopes, but we
will not discuss this further here.) The authors consider a polytope P to be given by an affine slice
of the positive orthant, denoted by
\[ P := \{x \in \mathbb{R}^n_{\ge 0} : Ax = b\} \]
for some fixed $m \times n$ real matrix $A$ and $b \in \mathbb{R}^m$, where $m < n$ and $A$ has rank $m$. One main goal
of their paper is then to approximate the volume of P in various situations.
In this paper, we adapt the maximum-entropy strategy from [BH10] to the case of spectrahedra.
We consider a spectrahedron S to be given by an affine slice of the positive semi-definite (PSD)
cone contained in the space of real symmetric matrices, denoted by
\[ S := \{P \in \mathrm{PSD}(N) : \operatorname{tr}(A_k P) = b_k \text{ for } k = 1, 2, \ldots, m\} \]
for some fixed $N \times N$ real symmetric $A_1, \ldots, A_m$ and $b \in \mathbb{R}^m$, where $m < \binom{N+1}{2} = \dim(\mathrm{PSD}(N))$
and $A_1, \ldots, A_m$ are linearly independent. We will denote by $A$ the linear map from $N \times N$ matrices
to $\mathbb{R}^m$ sending $P$ to $(\operatorname{tr}(A_1 P), \operatorname{tr}(A_2 P), \ldots, \operatorname{tr}(A_m P))$, and thus $S$ can also be written as
$S = \{P \in \mathrm{PSD}(N) : AP = b\}$. We assume that $S$ is a compact subset of the PSD cone.
Our main goal is then to approximate the volume of $S$ in various situations. The overarching
strategy consists of the following steps, which have been adapted from the polytope case in [BH10].
(One can see the polytope case in the following steps by using the positive orthant $\mathbb{R}^n_{\ge 0}$ instead of
the PSD cone.)
1. Construct a certain random variable $X$ on $\mathrm{PSD}(N)$ which has expectation in $S$, and for which
the density function $f_X$ is constant on $S$. This random variable $X$ is distributed according to
a certain maximum-entropy distribution $\mu$ on $\mathrm{PSD}(N)$, and it has log-linear density function
$f_X(P) \propto e^{-\operatorname{tr}(MP)}$ for some $M \in \mathrm{PD}(N)$.
2. Construct the random variable $Y = AX = (\operatorname{tr}(A_1 X), \ldots, \operatorname{tr}(A_m X)) \in \mathbb{R}^m$, which has
expectation $E[AX] = A\,E[X] = b$. Since the density function $f_X$ of $X$ is constant on $S$, the volume of
$S$ can be related to the density function $f_Y$ of $Y$ via the standard change of variables formula
for $Y$:
\[ f_Y(b) = \int_{A^{-1}(b)} f_X(x)\, \frac{dx}{\sqrt{\det(AA^\top)}} = \frac{\operatorname{vol}(S) \cdot f_X(P_0)}{\sqrt{\det(AA^\top)}}, \]
where $P_0$ is any point of $S$.
3. Compute $f_X(P_0)$ and $f_Y(b)$ and rearrange to compute $\operatorname{vol}(S)$. Since $f_X$ is log-linear, $f_X(P_0)$
is easy to compute directly. Since $Y$ is a linear transformation applied to a maximum-entropy
distribution, it approximates a Gaussian random variable under certain conditions.
(Gaussians are the maximum-entropy distributions on $\mathbb{R}^m$.) Thus apply a local central limit
theorem-type argument to approximate $f_Y(y)$ at its expectation $y = b$.
Conceptually the steps are not very different between the polytope case and the spectrahedron
case, and in fact this overarching strategy theoretically may work for affine slices of any convex
body. There are two key features that then make this strategy practical for spectrahedra. First, the
maximum entropy value in Step 1 is equal to log det M (up to constant) for some M ∈ S. The fact
that this quantity has a nice formula is a crucial observation (see Section 4), and it only occurs for
certain classes of convex bodies (including polytopes and spectrahedra). And second, the technical
arguments of [BH10] used to prove the local central limit theorem in Step 3 can be adapted to the
spectrahedron case. This adaptation requires substantial work, as can be seen in Section 7.

3.2 Step 1: The maximum-entropy distribution
Maximum-entropy distributions on various subsets of Rn have been well-studied since the seminal
work of Shannon [Sha48] and Jaynes [Jay57a, Jay57b]. The most basic max-entropy optimization
problem on a convex domain $K \subset \mathbb{R}^n$ can be formulated as:
\[ f = \operatorname*{arg\,sup}_{\substack{f \text{ density function} \\ \operatorname{supp}(f) \subseteq K}} \int f(x) \log \frac{1}{f(x)}\, dx, \]
where $\int f(x) \log \frac{1}{f(x)}\, dx$ is known as the differential entropy of the distribution $\mu$ with density
function $f$. (Note that solutions may or may not exist for the above problem as stated.)
In the case that K ⊆ Rn is a compact convex set, there is a unique max-entropy distribution
µ: the uniform distribution on K. When K is a non-compact convex domain, there is no longer a
unique max-entropy distribution, and thus more information must be specified to obtain uniqueness.
One natural way to do this is to restrict the expectation of $\mu$. In particular, if $K$ is the PSD cone
$\mathrm{PSD}(N) \subset \mathbb{R}^{\binom{N+1}{2}}$, then there is a unique max-entropy distribution for each choice of expectation
in the strictly positive definite cone PD(N ). More generally, maximizing entropy over a space of
distributions with restricted expectation can yield distributions with interesting properties.
In the case of $K = \mathrm{PSD}(N)$, all max-entropy distributions belong to the family of Wishart
distributions on $\mathrm{PSD}(N)$ with log-linear density function given by $f(P) \propto e^{-\operatorname{tr}(MP)}$, where $M \in \mathrm{PD}(N)$
depends on the desired expectation of the distribution. This is proven in Section 4 by computing
the dual convex program to the following infinite-dimensional maximum entropy program:
\[ \psi(S) = \sup_{\substack{f \text{ density function} \\ \operatorname{supp}(f) \subseteq \mathrm{PSD}(N) \\ E[f] \in S}} \int f(x) \log \frac{1}{f(x)}\, dx. \]

We show that the dual convex program to this maximum entropy program is given by the
finite-dimensional optimization problem
\[ \psi(S) = \sup_{P \in S} \phi(P) = \sup_{P \in S} \left[ \mathrm{const}(N) + \frac{N+1}{2} \log\det(P) \right], \]

where the first equality above is a strong duality result which says that the optimal values of the
primal and dual programs are equal. The function φ(P ) is precisely the entropy of the associated
Wishart distribution with expectation P .
Thus, the value of $P \in S$ which optimizes the function $\phi$ gives rise to the max-entropy
distribution on $\mathrm{PSD}(N)$ with expectation contained in $S$. Specifically, it is given by $f(P) \propto e^{-\operatorname{tr}((P^\star)^{-1} P)}$
where $P^\star$ is the optimizer of $\phi$. We prove this in Section 4, but it also follows from the Appendix
of [LV20] where a max-entropy strong duality theorem is proven in much greater generality. See
also Section 3 of [BH10] for analogous results in the polytope case.
Given a random variable $X$ with associated density function $f_X(P) \propto e^{-\operatorname{tr}((P^\star)^{-1} P)}$, the crucial
observation is then that this density function is constant on the spectrahedron $S \subset \mathrm{PSD}(N)$.
Conceptually this is related to the fact that uniform distributions maximize entropy on compact
convex domains. More concretely, this follows from a basic gradient computation using the dual
formulation above. If $P^\star$ optimizes $\phi$, then the following equivalent conditions hold for all $P, Q \in S$:
\[ \langle P - Q, \nabla\phi(P^\star) \rangle = \operatorname{tr}((P - Q) \nabla\phi(P^\star)) = 0 \iff \operatorname{tr}((P^\star)^{-1} P) = \operatorname{tr}((P^\star)^{-1} Q), \]
since $\nabla\phi(P^\star) = \frac{N+1}{2} (P^\star)^{-1}$ is a positive multiple of $(P^\star)^{-1}$. Thus $f_X(P) \propto e^{-\operatorname{tr}((P^\star)^{-1} P)}$ is constant on $S$.
With this, we have a max-entropy distribution on PSD(N ) which is constant on S. This will en-
able us to relate the volume of S to other quantities as described in the overarching strategy above.
Further, computing this max-entropy distribution boils down to a particular finite-dimensional
optimization problem: maximizing φ over the spectrahedron S. Such problems are efficiently solvable
using interior-point methods (the maximizer of φ is known as the analytic center of S, see [Ren01]
and [NN94]) and the ellipsoid method (e.g., see [NY76, NY77, Sho77]). See [SV14] and [LV20] for
further discussion on the computational complexity of solving such max-entropy optimization prob-
lems. See also [VBW98] for optimization techniques applied to the specific problem of determinant
maximization. Finally, see [BH10] for further discussion in the polytope case.
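As a toy illustration of the optimization step, the analytic center can be computed by hand for a small instance. The sketch below takes $N = 2$, $m = 1$, $A_1 = \operatorname{diag}(1, 2)$ and $b_1 = 1$ (all chosen here for illustration), parametrizes $P = \begin{pmatrix} x & y \\ y & z \end{pmatrix}$ with $x = 1 - 2z$, and maximizes $\log\det P$ by plain gradient ascent. Gradient ascent is a stand-in chosen for brevity, not one of the interior-point or ellipsoid methods cited above.

```python
def analytic_center(steps: int = 2000, lr: float = 0.01):
    """Maximize log det [[x, y], [y, z]] subject to x + 2z = 1 by gradient ascent."""
    z, y = 0.3, 0.1  # strictly feasible start: x = 0.4 and det = 0.11 > 0
    for _ in range(steps):
        x = 1.0 - 2.0 * z
        det = x * z - y * y
        # Partial derivatives of log((1 - 2z) z - y^2) in z and y.
        z += lr * (1.0 - 4.0 * z) / det
        y += lr * (-2.0 * y) / det
    return 1.0 - 2.0 * z, y, z


def trace_inv_times(center, point):
    """tr(C^{-1} P) for symmetric 2 x 2 matrices given as (x, y, z) triples."""
    x, y, z = center
    p, q, r = point
    det = x * z - y * y
    return (z * p - 2.0 * y * q + x * r) / det


center = analytic_center()
print(center)
# Two boundary points of S: diag(1, 0) and diag(0, 1/2).
print(trace_inv_times(center, (1.0, 0.0, 0.0)), trace_inv_times(center, (0.0, 0.0, 0.5)))
```

The iteration converges to $P^\star = \operatorname{diag}(1/2, 1/4)$, whose inverse $\operatorname{diag}(2, 4)$ is a multiple of $A_1$. Consequently $\operatorname{tr}((P^\star)^{-1} P)$ takes the same value at every point of $S$, which is the constancy property used in Step 1.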

Remark 3.1. The function φ(P ) used above is the entropy of the associated Wishart distribution
with expectation P . That is, it is not the von Neumann entropy of the matrix P , which a priori
might seem to be the natural measure of entropy for a given PSD matrix. Using the Wishart entropy
instead of the von Neumann entropy is crucial to our analysis, and a similar situation can be found
in [LV20]. See [LV20] for further discussion on why the von Neumann entropy is the incorrect
measure of entropy in this case.

3.3 Step 2: The random variable Y = AX


Recall the spectrahedron $S$ is defined as the affine slice of $\mathrm{PSD}(N)$ given by $AP = b$, where
$AP = (\operatorname{tr}(A_1 P), \ldots, \operatorname{tr}(A_m P))$. Let $X$ be a random variable with density function $f_X$ distributed
according to the max-entropy distribution on $\mathrm{PSD}(N)$ discussed above, with expectation contained
in $S$. We now construct a new random variable $Y = AX$ on $\mathbb{R}^m$ with density function $f_Y$. Since $Y$
is defined as a linear map applied to $X$, we can use the standard change of variables formula for
evaluating the density function of $Y$ at $b$:
\[ f_Y(b) = \int_{A^{-1}(b)} f_X(P)\, \frac{dP}{\sqrt{\det(AA^\top)}} = \int_S f_X(P)\, \frac{dP}{\sqrt{\det(AA^\top)}} = \operatorname{vol}(S)\, \frac{f_X(P_0)}{\sqrt{\det(AA^\top)}}, \]
where $P_0$ is any point of $S$.
After computing $P^\star$ as above, we have $f_X(P) = \frac{1}{Z} e^{-\operatorname{tr}((P^\star)^{-1} P)}$ where the appropriate
normalizing constant $Z$ is somewhat complicated, but can be computed explicitly by considering the
expression for the density function of the Wishart distribution. A straightforward computation
then yields $f_X(P^\star) = e^{-\phi(P^\star)}$, where $\phi(P^\star)$ is defined above as the optimal entropy value. Taking
$P_0 = P^\star$, we then rearrange the above expression to achieve the following formula for the volume of $S$:
\[ \operatorname{vol}(S) = f_Y(b) \cdot e^{\phi(P^\star)} \sqrt{\det(AA^\top)}. \]
What remains then is to compute $f_Y(b)$.
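The volume identity above is exact, and this can be seen in the degenerate case $N = 1$, $m = 1$, which lies outside the standing assumption $m < \binom{N+1}{2}$ and is used here only as a sanity check. In that case $S = \{b/a\}$ is a single point with zero-dimensional volume 1, the Wishart distribution with expectation $p^\star = b/a$ is the exponential distribution with mean $p^\star$, and $\phi(p^\star) = 1 + \log p^\star$. The function name below is illustrative.

```python
import math


def exact_volume_1d(a: float, b: float) -> float:
    """Evaluate f_Y(b) * exp(phi(p*)) * sqrt(det(A A^T)) for N = m = 1, a, b > 0."""
    p_star = b / a                          # the only point of S = {p >= 0 : a p = b}
    f_x_at_p_star = math.exp(-1.0) / p_star  # exponential density with mean p*, at p*
    f_y_at_b = f_x_at_p_star / a            # density of Y = a X at y = b
    phi_star = 1.0 + math.log(p_star)       # entropy of that exponential distribution
    return f_y_at_b * math.exp(phi_star) * math.sqrt(a * a)


print(exact_volume_1d(3.0, 2.0))
```

The result is 1 up to rounding for any positive $a$ and $b$, so all of the approximation error in the final volume formula enters through the Gaussian estimate of $f_Y(b)$ in Step 3.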

3.4 Step 3: Approximating the density function of Y


To achieve an approximation formula for the volume of S, the above expression shows we now
only need to approximate fY (b), the density function of Y = AX at its expectation b. For the
polytope case, this is the part of the argument that requires the most technical work (see Section
6 in [BH10]), and a substantial part of this paper is then dedicated to generalizing their arguments
to spectrahedra (see Section 7).
The overarching idea is that, since X is distributed according to a max-entropy distribution on
PSD(N ), it is reasonable to guess that Y is distributed according to a max-entropy distribution
on Rm . Max-entropy distributions on Rm are precisely multivariate Gaussian distributions, and
thus we assume that Y is distributed according to a multivariate Gaussian with expectation b and
covariance matrix ΩY of Y . We can then hope to approximate fY (b) by simply evaluating the
density function of this Gaussian at its expectation. To do this, we first compute the covariance

matrix of $Y$. Note first that since $X$ is distributed according to the Wishart distribution with expectation $P^\star$ (and $N+1$ degrees of freedom), we may write
\[
X = \frac{1}{N+1} \sqrt{P^\star} \, GG^\top \sqrt{P^\star},
\]
where $G$ is an $N \times (N+1)$ random matrix with independent standard Gaussian entries. If $u_{ij}$ is the $(i,j)$th coordinate of $GG^\top$, then
\[
\operatorname{cov}(u_{ij}, u_{i'j'}) = (N+1) \cdot \left( \delta_{i=i',\, j=j'} + \delta_{i=j',\, j=i'} \right)
\]
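by bilinearity of cov and standard computations with independent normal random variables. Recalling $Y = AX = (\operatorname{tr}(A_1 X), \ldots, \operatorname{tr}(A_m X))$ and defining $Z_k := \sqrt{P^\star} A_k \sqrt{P^\star}$, a straightforward computation using the above covariance computation for $u_{ij}$ yields
\[
\operatorname{cov}(Y_i, Y_j) = \frac{1}{(N+1)^2} \operatorname{cov}\left( \operatorname{tr}(Z_i GG^\top), \operatorname{tr}(Z_j GG^\top) \right) = \frac{2}{N+1} \operatorname{tr}(Z_i Z_j).
\]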
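The "straightforward computation" can be spelled out as follows. This is only a sketch of the intermediate step; the entrywise notation $(Z_i)_{ab}$ is introduced here for illustration and is not used elsewhere in the paper. Since each $Z_i$ is symmetric, $\operatorname{tr}(Z_i GG^\top) = \sum_{a,b} (Z_i)_{ab} u_{ab}$, and so:

```latex
\begin{aligned}
\operatorname{cov}\left( \operatorname{tr}(Z_i GG^\top), \operatorname{tr}(Z_j GG^\top) \right)
  &= \sum_{a,b,c,d} (Z_i)_{ab} (Z_j)_{cd} \operatorname{cov}(u_{ab}, u_{cd}) \\
  &= (N+1) \sum_{a,b} (Z_i)_{ab} \left[ (Z_j)_{ab} + (Z_j)_{ba} \right] \\
  &= 2(N+1) \operatorname{tr}(Z_i Z_j).
\end{aligned}
```

Dividing by $(N+1)^2$ gives the stated value $\frac{2}{N+1} \operatorname{tr}(Z_i Z_j)$.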
Thus the covariance matrix of $Y$ is $\frac{2}{N+1}$ times the Gram matrix of the matrices $Z_1, \ldots, Z_m$. Letting $B$ be the linear map from $N \times N$ matrices to $\mathbb{R}^m$ defined by $BP := (\operatorname{tr}(Z_1 P), \ldots, \operatorname{tr}(Z_m P))$, this implies $\Omega_Y = \frac{2}{N+1} BB^\top$. Computing the density of the appropriate Gaussian evaluated at its expectation then gives
\[
f_Y(b) \approx \det(2\pi \cdot \Omega_Y)^{-1/2} = \left( \frac{N+1}{4\pi} \right)^{m/2} \left( \frac{1}{\det(BB^\top)} \right)^{1/2}.
\]
This yields the following approximation formula for the volume of S:
\[
\operatorname{vol}(S) \approx \left( \frac{N+1}{4\pi} \right)^{m/2} \left( \frac{\det(AA^\top)}{\det(BB^\top)} \right)^{1/2} e^{\phi(P^\star)},
\]
where φ(P ⋆ ) is the maximum entropy value, as discussed above. This is precisely the formula given
in Section 2.
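To make the formula concrete, here is a minimal Python sketch that evaluates it in log-space. The function names (`log_multigamma`, `phi`, `approx_log_volume`) and the choice to pass log-determinants as inputs are our own conventions, not part of the paper; computing $P^\star$ itself (a convex optimization problem) is assumed to be done elsewhere. The expression used for $\phi$ is the closed form stated later in Corollary 4.5.

```python
import math

def log_multigamma(a, N):
    """log of the multivariate gamma function Gamma_N(a)."""
    return N * (N - 1) / 4 * math.log(math.pi) + sum(
        math.lgamma(a - j / 2) for j in range(N)
    )

def phi(N, logdet_P):
    """Maximum entropy function phi(P), given log det(P) (Corollary 4.5)."""
    K = N * (N + 1) / 2
    return (log_multigamma((N + 1) / 2, N)
            - K * math.log((N + 1) / (2 * math.e))
            + (N + 1) / 2 * logdet_P)

def approx_log_volume(N, m, logdet_AAt, logdet_BBt, logdet_Pstar):
    """log of ((N+1)/(4 pi))^(m/2) * (det AA^T / det BB^T)^(1/2) * e^phi(P*)."""
    return (m / 2 * math.log((N + 1) / (4 * math.pi))
            + 0.5 * (logdet_AAt - logdet_BBt)
            + phi(N, logdet_Pstar))

# Usage: the spectraplex {P in PSD(N) : tr(P) = 1}, i.e. m = 1 and A_1 = I.
# Here P* = I/N, det(AA^T) = tr(I^2) = N and det(BB^T) = tr((I/N)^2) = 1/N.
N = 30
approx = approx_log_volume(N, 1, math.log(N), -math.log(N), -N * math.log(N))
print(approx)
```

Working with logarithms avoids overflow, since the individual factors grow or shrink super-exponentially in $N$.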
What remains then is to justify our assumption that Y is distributed like a multivariate Gaussian
with expectation b and covariance matrix ΩY . We do this by proving a local central limit theorem
for Y in a number of situations. In the polytope case, the argument in [BH10] relies heavily on the
well-roundedness of $B$, which is described by the ratio of the maximum column norm of $B$ to the minimum singular value of $B$. This quantity resembles the condition number of $\sqrt{BB^\top} = \sqrt{\Omega_Y}$, but is crucially different, as replacing the well-roundedness by the condition number of $\sqrt{\Omega_Y}$ yields vacuous results in [BH10].
To prove our main results for spectrahedra, we first prove a technical result (Theorem 7.1)
which is analogous to the main result of [BH10]. This result relies upon the use of a similar well-
roundedness quantity, suited to spectrahedra. One problem with this is that the maximum column
norm of B in this case has little to no conceptual meaning. Thus we replace the maximum column
norm by
\[
\max_{\|x\|_2 = 1} \left\| (x^\top Z_k x)_{k=1}^m \right\|_2,
\]
where the matrices $Z_k$ are as defined above. Note that this corresponds to the maximum column
norm of B whenever the matrices Zk are diagonal, and thus this leads to a natural matrix-theoretic
generalization of the well-roundedness. That said, this quantity remains a bit of a mystery to us,
and a more conceptual understanding of it would likely lead to improved results or simpler proofs.
A good upper bound on the well-roundedness of B then enables a proof of the local central
limit theorem, which implies the above approximate volume formula whenever the dimension of S
is large. (Conceptually, the largeness of the dimension is required because the local central limit
theorem only applies in the limit.) This is the last piece of the argument which leads to the main
technical result (Theorem 7.1), from which we derive our main results.

3.5 From the main technical result to the main results
We now discuss how our main technical result (Theorem 7.1) discussed above implies our main
results in Section 2. The key idea is to remove reliance upon the well-roundedness quantity discussed
in the previous section, by replacing it with something conceptually simpler. It is straightforward
from the definition that the well-roundedness of $B$ can be upper bounded by the condition number of the matrix $\sqrt{BB^\top}$. Thus in theory, we can replace the well-roundedness of $B$ by this condition number, at the cost of weakening the result. Even better, by a straightforward change of variables, we can assume further that the conjugated constraint matrices $Z_k = \sqrt{P^\star} A_k \sqrt{P^\star}$ are orthonormal, without changing the spectrahedron or the approximation formula (see Lemma 8.1). This implies $BB^\top = I_m$, and thus we can replace the condition number of $\sqrt{BB^\top}$ with its optimal value of 1.
The crucial difference between the polytope and spectrahedron case is then what happens when
we make this replacement. As discussed above, this replacement leads to vacuous results in the
polytope case. This is due to the fact that applying Theorem 2.2 of [BH10] roughly requires
\[
\omega^2 \cdot m^2 \log n \to 0
\]
asymptotically, where $n$ is the dimension of the positive orthant where the polytope $P$ lies, $m$ is the number of affine constraints which cut out $P$, and $\omega$ is the well-roundedness of $B$. We can replace $\omega$ with the optimal condition number value of 1, but the problem is then that $m^2 \log n \to 0$ is never satisfied.
On the other hand, the requirement in the spectrahedron case (see Condition A3 of Theorem
7.1) essentially becomes
\[
\omega^2 \cdot \frac{m^3 \log N}{N} \to 0.
\]
We can then replace ω with the optimal condition number value of 1, and it is still a priori possible
that this requirement is satisfied. In fact, it is satisfied for many interesting classes of spectrahedra
(such as those having m = O(1)), and the main results of Section 2 spell out various general cases
that follow as straightforward corollaries; see Section 8.
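The contrast between the two requirements is easy to see numerically. The following sketch simply evaluates both quantities with $\omega = 1$; the function names are ours, and the snippet illustrates the growth rates only, not the constants hidden in "roughly" and "essentially" above.

```python
import math

def spectrahedron_requirement(m, N, omega=1.0):
    """omega^2 * m^3 * log(N) / N, which should tend to 0 for spectrahedra."""
    return omega**2 * m**3 * math.log(N) / N

def polytope_requirement(m, n, omega=1.0):
    """omega^2 * m^2 * log(n), the analogous quantity for polytopes in [BH10]."""
    return omega**2 * m**2 * math.log(n)

# With a constant number of constraints (m = 2), the spectrahedron quantity
# decays as the ambient dimension grows, while the polytope quantity grows.
for size in (10**2, 10**4, 10**6):
    print(size, spectrahedron_requirement(2, size), polytope_requirement(2, size))
```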
Why this difference between the polytope case and the spectrahedron case? The denominator
factor of N in the above requirement for spectrahedra is the key difference, and it is related to the
extra factor of $\frac{2}{N+1}$ that appears in the expression for the covariance matrix of $Y$ in Section 3.4
above. (In the polytope case, where exponential distributions are considered, an analogous factor
does not appear.) That said, it is not immediately clear to us philosophically why this difference
occurs. It appears in the computations of the proof of Theorem 7.1 due to explicit differences
in the characteristic functions of the Wishart distribution and of the exponential distribution. In
particular, an extra factor of $\frac{2}{N+1}$ appears for the Wishart distribution; see Section 7.2. This
suggests to us that similar results to those presented in [BH10] and in this paper may be possible
for affine slices of symmetric cones in general (e.g., cones of Hermitian PSD matrices and cones of
quaternionic PSD matrices). The rank of a given symmetric cone may then have some connection
to these extra factors that appear. We do not discuss this any further here.

Remark 3.2. The difference between the polytope and spectrahedron cases discussed above may
seem a bit strange since spectrahedra are generalizations of polytopes. That said, our results for
spectrahedra do not actually improve the results in the polytope case. Representing a polytope via a
spectrahedral representation requires affine constraints which zero out all off-diagonal entries; this
would force m to be of the order N 2 and cause the above required condition to become vacuous (as
in the polytope case).

4 Maximum Entropy Distributions over the PSD Cone
In this section, we construct a probability distribution which maximizes differential entropy over
probability distributions on $\mathrm{PSD}(N)$ with a given fixed expectation. This probability distribution is the Wishart distribution $W_N\left( \frac{P}{N+1}, N+1 \right)$ on $\mathrm{PSD}(N)$ for any choice of expectation $P \in \mathrm{PD}(N)$ (see Theorem 4.3). This is in fact precisely the reason why the function $\phi$ appears in Theorem 7.1 above; specifically, the density function of the Wishart distribution with expectation $P$ evaluated at $P$ is given by $e^{-\phi(P)}$ (see Theorem 4.3 and Definition 4.4).
The way we utilize this fact to approximate the volume of spectrahedra then goes as follows.
Suppose $P^\star$ maximizes $\phi(P)$ over the spectrahedron $S$. Then in fact the Wishart distribution with expectation $P^\star$ given by $W_N\left( \frac{P^\star}{N+1}, N+1 \right)$ has constant density on all of $S$ (see Corollary 4.6). In particular, this means that the volume of $S$ is inversely proportional to the density function of this Wishart distribution evaluated at $P^\star \in S$, which is given by $e^{-\phi(P^\star)}$ as mentioned above. All
that remains is then to determine the constant of proportionality, and this is discussed in detail in
Section 7.
The remainder of this section is spent proving the facts we describe above. We start with a few
technical lemmas, and then prove the main results of this section in Theorem 4.3 and Corollary
4.6.

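Lemma 4.1 (see [Gül96]). Given $Y \in \mathrm{PD}(N)$, we have
\[
\log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(YQ)} \, dQ = -\frac{N+1}{2} \log \det(Y) + C
\]
with $C = \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Q)} \, dQ = \log \Gamma_N\left( \frac{N+1}{2} \right)$, where $\Gamma_N$ is the multivariate gamma function.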
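Proof. We may write $\operatorname{tr}(YQ) = \operatorname{tr}(\sqrt{Y} Q \sqrt{Y})$. Note that $Q \mapsto \sqrt{Y} Q \sqrt{Y}$ is an endomorphism of $\mathrm{Sym}(N)$ which preserves $\mathrm{PSD}(N)$. If $\lambda_1, \lambda_2, \ldots, \lambda_N$ denote the eigenvalues of $Y$, then the Jacobian of the map $Q \mapsto \sqrt{Y} Q \sqrt{Y}$ equals
\[
\prod_{i=1}^N \lambda_i \prod_{i < j} \sqrt{\lambda_i} \sqrt{\lambda_j} = \det(Y)^{(N+1)/2},
\]
which can be seen by diagonalizing $Y$ (the second product runs over unordered pairs of distinct indices, one for each off-diagonal coordinate of $\mathrm{Sym}(N)$). Hence, applying the change of variables $P = \sqrt{Y} Q \sqrt{Y}$ to the above integral, the equality
\[
\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(YQ)} \, dQ = \det(Y)^{-(N+1)/2} \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(P)} \, dP
\]
holds. The result follows from taking the logarithm of both sides.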
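As a sanity check on the Jacobian, consider the case $N = 2$ with $Y = \operatorname{diag}(\lambda_1, \lambda_2)$. This is a worked special case added for illustration, using the coordinates $(q_{11}, q_{12}, q_{22})$ on $\mathrm{Sym}(2)$:

```latex
\sqrt{Y} Q \sqrt{Y} =
\begin{pmatrix}
\lambda_1 q_{11} & \sqrt{\lambda_1 \lambda_2} \, q_{12} \\
\sqrt{\lambda_1 \lambda_2} \, q_{12} & \lambda_2 q_{22}
\end{pmatrix},
\qquad
\text{Jacobian} = \lambda_1 \cdot \sqrt{\lambda_1 \lambda_2} \cdot \lambda_2
= (\lambda_1 \lambda_2)^{3/2} = \det(Y)^{(N+1)/2}.
```

Each of the three independent coordinates is scaled separately, and the product of the scaling factors is the Jacobian.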

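Corollary 4.2. Given $P \in \mathrm{PD}(N)$, the convex optimization problem
\[
\inf_{Y \in \mathrm{PD}(N)} \left[ \operatorname{tr}(YP) + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(YQ)} \, dQ \right]
\]
attains its unique minimum at $Y^\star = \frac{N+1}{2} P^{-1}$.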

Proof. By Lemma 4.1, we can write an equivalent optimization problem as
\[
\inf_{Y \in \mathrm{PD}(N)} \left[ \operatorname{tr}(YP) - \frac{N+1}{2} \log \det(Y) \right].
\]
This objective function is continuous and strictly convex in $Y \in \mathrm{PD}(N)$ (since $-\log \det$ is strictly convex on $\mathrm{PD}(N)$), and by a standard computation its gradient is
\[
\nabla \left[ \operatorname{tr}(YP) - \frac{N+1}{2} \log \det(Y) \right] = P - \frac{N+1}{2} Y^{-1}.
\]
Therefore the gradient is 0 if and only if $Y = \frac{N+1}{2} P^{-1}$, and thus the objective function attains its unique minimum at $Y^\star = \frac{N+1}{2} P^{-1}$.
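The minimizer can be checked numerically in the simplest setting where both $Y$ and $P$ are diagonal, so that the objective splits into scalar terms. The sketch below is restricted to that diagonal case by assumption, and the function name is ours; it compares the objective at the claimed minimizer against coordinate-wise perturbations.

```python
import math

def dual_objective_diag(y, p):
    """tr(YP) - (N+1)/2 * log det(Y) for Y = diag(y) and P = diag(p)."""
    N = len(p)
    trace_term = sum(y_i * p_i for y_i, p_i in zip(y, p))
    logdet_term = sum(math.log(y_i) for y_i in y)
    return trace_term - (N + 1) / 2 * logdet_term

p = [0.5, 1.0, 2.0]
N = len(p)
y_star = [(N + 1) / (2 * p_i) for p_i in p]   # claimed minimizer (N+1)/2 * P^{-1}
best = dual_objective_diag(y_star, p)

# Scale one diagonal entry at a time by 0.9 or 1.1 and re-evaluate.
perturbed = [
    dual_objective_diag(
        [y_i * (s if i == k else 1.0) for i, y_i in enumerate(y_star)], p
    )
    for k in range(N)
    for s in (0.9, 1.1)
]
print(best, min(perturbed))
```

Every perturbed value should exceed `best`, consistent with the gradient vanishing only at $Y^\star$.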

The following is proven in more generality in [LV20], but we give a direct proof for the specific
case of distributions on PSD(N ) for the sake of the reader. In particular, we do not actually
prove here that the following optimization problems are a primal-dual pair with strong duality. We
instead simply demonstrate that their optimal values are equal.
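Theorem 4.3. For $P \in \mathrm{PD}(N)$, let us define $\phi(P)$ to be the maximum differential entropy of a probability distribution on $\mathrm{PSD}(N)$ with expectation $P$; that is,
\[
\begin{aligned}
\phi(P) = \sup \quad & \int_{\mathrm{PSD}(N)} -f(Q) \log f(Q) \, dQ \\
\text{subject to:} \quad & f(Q) > 0 \text{ for all } Q \in \mathrm{PD}(N), \\
& \int_{\mathrm{PSD}(N)} f(Q) \, dQ = 1, \\
& \int_{\mathrm{PSD}(N)} f(Q) \, Q \, dQ = P.
\end{aligned}
\]
Then $\phi(P)$ is equal to the dual optimum of the above primal optimization problem; that is,
\[
\phi(P) = \inf_{Y \in \mathrm{PD}(N)} \left[ \operatorname{tr}(YP) + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(YQ)} \, dQ \right].
\]
For fixed $P \in \mathrm{PD}(N)$, there is a probability distribution $\mu^\star$ on $\mathrm{PSD}(N)$ with differential entropy $\phi(P)$ and expectation $P$, and with density function $f^\star$ which can be expressed via
\[
f^\star(Q) = \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR},
\]
where $Y^\star \in \mathrm{PD}(N)$ is the unique minimizer of the dual formulation.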


Proof. By Corollary 4.2, the dual formulation of $\phi(P)$ attains its unique minimum at $Y^\star = \frac{N+1}{2} P^{-1}$. We first show that $f^\star(Q)$ is a maximizer for the primal (maximum entropy) optimization problem. Since differential entropy is concave, we compute the derivative of the primal objective function in all directions orthogonal to the two linear constraints of the primal optimization problem. Specifically, let $g$ be bounded and compactly supported on $\mathrm{PSD}(N)$, and such that
\[
\int_{\mathrm{PSD}(N)} g(Q) \, dQ = 0 \quad \text{and} \quad \int_{\mathrm{PSD}(N)} Q \cdot g(Q) \, dQ = 0.
\]

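Letting $f^\star$ be defined as above, we now compute
\[
\begin{aligned}
\frac{d}{dt} \bigg|_{t=0} - \int_{\mathrm{PSD}(N)} (f^\star + t \cdot g)(Q) \log (f^\star + t \cdot g)(Q) \, dQ
&= - \int_{\mathrm{PSD}(N)} g(Q) \log f^\star(Q) \, dQ - \int_{\mathrm{PSD}(N)} f^\star(Q) \cdot \frac{g(Q)}{f^\star(Q)} \, dQ \\
&= - \int_{\mathrm{PSD}(N)} g(Q) \log \left[ \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \right] dQ \\
&= \int_{\mathrm{PSD}(N)} g(Q) \cdot \operatorname{tr}(Y^\star Q) \, dQ \\
&= \operatorname{tr}\left( Y^\star \int_{\mathrm{PSD}(N)} Q \cdot g(Q) \, dQ \right) \\
&= 0.
\end{aligned}
\]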

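To show that $f^\star$ is a maximizer for the primal problem, we just need to show that it satisfies all the necessary constraints. First, that $f^\star(Q) > 0$ for all $Q \in \mathrm{PD}(N)$ and that $\int_{\mathrm{PSD}(N)} f^\star(Q) \, dQ = 1$ are immediate from the definition of $f^\star$. To show that $\int_{\mathrm{PSD}(N)} Q \cdot f^\star(Q) \, dQ = P$, we compute the gradient of the dual program directly via
\[
\begin{aligned}
0 = \nabla \big|_{Y = Y^\star} \left[ \operatorname{tr}(YP) + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(YQ)} \, dQ \right]
&= P - \frac{\int_{\mathrm{PSD}(N)} Q \cdot e^{-\operatorname{tr}(Y^\star Q)} \, dQ}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star Q)} \, dQ} \\
&= P - \int_{\mathrm{PSD}(N)} Q \cdot \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \, dQ \\
&= P - \int_{\mathrm{PSD}(N)} Q \cdot f^\star(Q) \, dQ.
\end{aligned}
\]
That is, $f^\star$ is a maximizer of the primal (maximum entropy) optimization problem.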


We now show that the optimal values of the primal and dual problems are equal. Since we have
determined optimal inputs for both problems, we simply compute the objective functions and show
that they are equal. We compute
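\[
\begin{aligned}
- \int_{\mathrm{PSD}(N)} f^\star(Q) \log f^\star(Q) \, dQ
&= - \int_{\mathrm{PSD}(N)} \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \log \left[ \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \right] dQ \\
&= \int_{\mathrm{PSD}(N)} \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \cdot \operatorname{tr}(Y^\star Q) \, dQ + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star Q)} \, dQ \\
&= \operatorname{tr}\left( Y^\star \int_{\mathrm{PSD}(N)} Q \cdot \frac{e^{-\operatorname{tr}(Y^\star Q)}}{\int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star R)} \, dR} \, dQ \right) + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star Q)} \, dQ \\
&= \operatorname{tr}(Y^\star P) + \log \int_{\mathrm{PSD}(N)} e^{-\operatorname{tr}(Y^\star Q)} \, dQ.
\end{aligned}
\]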

This is precisely the dual objective function evaluated at Y = Y ⋆ , and this completes the proof.

The function φ evaluated at its maximizer P ⋆ gives the maximum entropy value, and we have
shown in Theorem 4.3 above that this is also the optimal value of the dual convex program. We
now formally name this function φ, and then show that it is equal to the expression for φ given in
Section 2.

Definition 4.4. We refer to the function φ from Theorem 4.3 as the maximum entropy function
on PD(N ).

Corollary 4.5. The maximum entropy function $\phi$ can be expressed as
\[
\phi(P) = \log \Gamma_N\left( \frac{N+1}{2} \right) - \frac{N(N+1)}{2} \log \left( \frac{N+1}{2e} \right) + \frac{N+1}{2} \log \det(P).
\]

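Proof. By Theorem 4.3, Lemma 4.1, and Corollary 4.2, we have
\[
\begin{aligned}
\phi(P) &= \log \Gamma_N\left( \frac{N+1}{2} \right) + \operatorname{tr}(Y^\star P) - \frac{N+1}{2} \log \det(Y^\star) \\
&= \log \Gamma_N\left( \frac{N+1}{2} \right) - \frac{N(N+1)}{2} \log \left( \frac{N+1}{2e} \right) + \frac{N+1}{2} \log \det(P),
\end{aligned}
\]
where $Y^\star = \frac{N+1}{2} P^{-1}$.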
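A quick way to sanity-check this expression is the scalar case $N = 1$, where $\mathrm{PSD}(1) = [0, \infty)$ and $\Gamma_1(1) = 1$. The following worked specialization is added for illustration only:

```latex
\phi(p) = \log \Gamma_1(1) - \frac{1 \cdot 2}{2} \log\left( \frac{2}{2e} \right) + \frac{2}{2} \log p
        = 0 + 1 + \log p
        = 1 + \log p .
```

This is the differential entropy of the exponential distribution with mean $p$, which is the maximum entropy distribution on $[0, \infty)$ with a given mean. It matches the exponential distributions that play the corresponding role in the polytope case.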

Corollary 4.6. Given real symmetric matrices A1 , . . . , Am and real vector b ∈ Rm , define the
corresponding spectrahedron

S = {X ∈ PSD(N ) : tr(Ai X) = bi , i ∈ [m]}.

Suppose that S is bounded. The maximum entropy function φ is concave and attains its unique
maximum over S at a unique point P ⋆ in the relative interior of S. Further, let µ⋆ (with density
function f ⋆ ) be the probability distribution on PSD(N ) guaranteed by Theorem 4.3, with differential
entropy $\phi(P^\star)$ and expectation $P^\star$. Then $f^\star$ is constant on $S$; specifically,
\[
f^\star(P) = e^{-\phi(P^\star)} \quad \text{for all } P \in S.
\]

Proof. Using the formula for the maximum entropy function $\phi$ from Corollary 4.5, we compute its gradient via
\[
\nabla \phi(P) = \nabla \left[ \frac{N+1}{2} \log \det(P) \right] = \frac{N+1}{2} P^{-1}.
\]
Note first that for P near the boundary of S, the determinant of P approaches 0 and thus φ(P )
approaches −∞. Therefore by boundedness of S, it must be that φ is maximized at some point
in the relative interior of S. Concavity of φ and uniqueness of the maximizer then follow from the
fact that log det(P ) is a strictly concave function on PD(N ); see Example 11.7 of [BV04]. Let us
denote this unique maximizer by P ⋆ .
Since φ is smooth on PD(N ), the gradient of φ at P ⋆ is 0 when restricted to S. Equivalently,
the gradient of φ at P ⋆ must be orthogonal to the affine span of S. That is, for all P ∈ S we have
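\[
0 = \operatorname{tr}\left( \nabla \phi(P^\star) \cdot (P - P^\star) \right) = \frac{N+1}{2} \operatorname{tr}\left( (P^\star)^{-1} (P - P^\star) \right)
\implies \frac{N+1}{2} \operatorname{tr}\left( (P^\star)^{-1} P \right) = \frac{N(N+1)}{2}.
\]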

By Corollary 4.2 and Theorem 4.3, we have that
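\[
f^\star(Q) \propto e^{-\operatorname{tr}(Y^\star Q)} = e^{-\frac{N+1}{2} \operatorname{tr}((P^\star)^{-1} Q)},
\]
where $Y^\star$ is the unique minimizer of the dual formulation for $P^\star$. Since $\operatorname{tr}((P^\star)^{-1} P) = N$ for all $P \in S$ by the computation above, it immediately follows that $f^\star$ is constant on $S$. To see that this constant is equal to $e^{-\phi(P^\star)}$, we use Corollary 4.5 to compute
\[
e^{-\phi(P^\star)} = \frac{1}{\Gamma_N\left( \frac{N+1}{2} \right)} \left( \frac{N+1}{2e} \right)^{\frac{N(N+1)}{2}} \det(P^\star)^{-\frac{N+1}{2}}.
\]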

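By the above computations and Lemma 4.1, we then have
\[
\begin{aligned}
f^\star(P^\star) &= \frac{e^{-\frac{N+1}{2} \operatorname{tr}((P^\star)^{-1} P^\star)}}{\int_{\mathrm{PSD}(N)} e^{-\frac{N+1}{2} \operatorname{tr}((P^\star)^{-1} Q)} \, dQ} \\
&= e^{-\frac{N(N+1)}{2}} \cdot e^{\frac{N+1}{2} \log \det\left( \frac{N+1}{2} (P^\star)^{-1} \right) - \log \Gamma_N\left( \frac{N+1}{2} \right)} \\
&= \frac{1}{\Gamma_N\left( \frac{N+1}{2} \right)} \left( \frac{1}{e} \right)^{\frac{N(N+1)}{2}} \det\left( \frac{N+1}{2} (P^\star)^{-1} \right)^{\frac{N+1}{2}} \\
&= \frac{1}{\Gamma_N\left( \frac{N+1}{2} \right)} \left( \frac{N+1}{2e} \right)^{\frac{N(N+1)}{2}} \det(P^\star)^{-\frac{N+1}{2}}.
\end{aligned}
\]
That is, $f^\star(P^\star) = e^{-\phi(P^\star)}$ and thus $f^\star(P) = e^{-\phi(P^\star)}$ for $P \in S$ since $f^\star$ is constant on $S$.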

Finally, we note that the probability distribution µ⋆ of Corollary 4.6 actually maximizes differ-
ential entropy over all distributions with expectation contained in S. First, by Corollary 4.6 we
have that P ⋆ uniquely maximizes φ(P ) over all P ∈ S. Then, since φ can be represented as the
optimal value of the maximum entropy optimization problem given in Theorem 4.3, we have that
any probability distribution on PSD(N ) with differential entropy φ(P ⋆ ) and expectation contained
in S must actually have expectation equal to P ⋆ (or else P ⋆ would not uniquely maximize φ).

5 Basic Examples
5.1 The spectraplex
We first consider a simple example for which there is an explicit volume formula. Fixing A ∈ PD(N ),
we consider
S := {P ∈ PSD(N ) : tr(AP ) = 1} .
In the case that A is the identity matrix, this spectrahedron is called the spectraplex or the set
of density matrices. More generally, the spectrahedron S generalizes the simplex in the sense that
any simplex (up to translation) can be given as the intersection of a single affine hyperplane with
the positive orthant.
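The spectrahedron $S$ has an explicit volume formula, which we compute as follows. First we write
\[
\Gamma_N\left( \frac{N+1}{2} \right) = \int_{\mathrm{PD}(N)} e^{-\operatorname{tr}(X)} \, dX = \det(A)^{\frac{N+1}{2}} \int_{\mathrm{PD}(N)} e^{-\operatorname{tr}(AX)} \, dX.
\]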

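Next note that the line $f(t) = \frac{A}{\|A\|_F} t$ is the unique unit-speed line through 0 which is orthogonal to $S$, and $f(\|A\|_F^{-1}) \in S$. Thus we have
\[
\int_{\mathrm{PD}(N)} e^{-\operatorname{tr}(AX)} \, dX = \operatorname{vol}(\|A\|_F \cdot S) \int_0^\infty e^{-t \|A\|_F} \, t^{\dim(S)} \, dt = \dim(S)! \cdot \|A\|_F^{-1} \operatorname{vol}(S).
\]
Combining and rearranging gives
\[
\operatorname{vol}(S) = \Gamma_N\left( \frac{N+1}{2} \right) \frac{\|A\|_F}{\det(A)^{\frac{N+1}{2}}} \left[ \left( \binom{N+1}{2} - 1 \right)! \right]^{-1}.
\]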

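Using Stirling's approximation and the fact that $(k-1)! = \frac{k!}{k}$, we can obtain the following asymptotic formula for large $N$:
\[
\operatorname{vol}(S) \approx \Gamma_N\left( \frac{N+1}{2} \right) \frac{\|A\|_F}{\det(A)^{\frac{N+1}{2}}} \left( \frac{2e}{N(N+1)} \right)^{\binom{N+1}{2}} \left( \frac{N(N+1)}{4\pi} \right)^{\frac{1}{2}}.
\]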
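The quality of the Stirling step can be checked numerically. In the sketch below (function name ours), all factors common to the exact and asymptotic formulas cancel, so only the factorial and its replacement need to be compared. With $K = \binom{N+1}{2}$, the exact formula carries $1/(K-1)!$ and the asymptotic one carries $(e/K)^K \sqrt{K/(2\pi)}$.

```python
import math

def log_stirling_ratio(N):
    """log(asymptotic / exact) for the two spectraplex-type volume formulas."""
    K = N * (N + 1) // 2
    # (2e / (N(N+1)))^K * (N(N+1) / (4 pi))^(1/2), rewritten using N(N+1) = 2K
    log_asymptotic = K * (1 - math.log(K)) + 0.5 * math.log(K / (2 * math.pi))
    log_exact = -math.lgamma(K)   # 1 / (K-1)!, since (K-1)! = Gamma(K)
    return log_asymptotic - log_exact

for N in (2, 5, 10, 50):
    print(N, math.exp(log_stirling_ratio(N)))
```

The printed ratios should approach 1 as $N$ grows, with an error of order $1/K$.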

We now apply the asymptotic formula of Theorem 2.2 and compare the result to the above formula.

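Entropy function maximizer. We claim in this case that $P^\star = (NA)^{-1}$. To see this, recall that the gradient of $\log \det P$ is $P^{-1}$ since we are restricting to symmetric matrices, so that $\nabla \phi(P^\star) = \frac{N+1}{2} (P^\star)^{-1}$. For any $P \in S$, we then have
\[
\frac{2}{N+1} \langle P - P^\star, \nabla \phi(P^\star) \rangle = \operatorname{tr}(P (P^\star)^{-1}) - N = \operatorname{tr}(NAP) - N = 0.
\]
It is then clear that $P^\star \in S$.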

The asymptotic volume. Since S is defined by a constant number of constraints, we apply


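Theorem 2.2 to obtain an asymptotic formula for the volume of $S$. For large $N$, Theorem 2.2 gives
\[
\begin{aligned}
\operatorname{vol}(S) &\approx \left( \frac{N+1}{4\pi} \right)^{\frac{1}{2}} \left( \frac{\operatorname{tr}(A^2)}{N^{-1}} \right)^{\frac{1}{2}} e^{\phi(P^\star)} \\
&= \left( \frac{N(N+1)}{4\pi} \right)^{\frac{1}{2}} \|A\|_F \cdot \Gamma_N\left( \frac{N+1}{2} \right) \cdot \left( \frac{2e}{N+1} \right)^{\binom{N+1}{2}} \det(NA)^{-\frac{N+1}{2}} \\
&= \Gamma_N\left( \frac{N+1}{2} \right) \frac{\|A\|_F}{\det(A)^{\frac{N+1}{2}}} \left( \frac{2e}{N(N+1)} \right)^{\binom{N+1}{2}} \left( \frac{N(N+1)}{4\pi} \right)^{\frac{1}{2}}.
\end{aligned}
\]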

This is the same expression that we obtained above using an explicit formula for the volume, and
thus this proves our asymptotic result in this special case.

5.2 One PD constraint and one rank-one constraint


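Let us now consider another simple example. Fix $A \in \mathrm{PD}(N)$ and $v \in \mathbb{R}^N$ such that $\xi := v^\top A^{-1} v > 1$. We consider the case given by
\[
S := \left\{ P \in \mathrm{PSD}(N) : \operatorname{tr}(AP) = 1 \text{ and } \operatorname{tr}(vv^\top P) = v^\top P v = 1 \right\}.
\]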

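Entropy function maximizer. We claim in this case that $P^\star = (aA + b vv^\top)^{-1}$, where
\[
a = \frac{(N-1)\xi}{\xi - 1} \quad \text{and} \quad b = \frac{\xi - N}{\xi - 1}.
\]
To see this, recall that the gradient of $\log \det P$ is $P^{-1}$ since we are restricting to real symmetric matrices, so that $\nabla \phi(P^\star) = \frac{N+1}{2} (P^\star)^{-1}$. For any $P \in S$, we then have
\[
\frac{2}{N+1} \langle P - P^\star, \nabla \phi(P^\star) \rangle = \operatorname{tr}(P (P^\star)^{-1}) - N = \operatorname{tr}(aAP) + \operatorname{tr}(b vv^\top P) - N = a + b - N = 0.
\]
Thus we only have left to show that $P^\star \in S$. To see that $P^\star \in \mathrm{PD}(N)$, we first write
\[
(P^\star)^{-1} = \frac{(N-1)\xi A + t \, vv^\top}{\xi - 1} \bigg|_{t = \xi - N}.
\]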

If $\xi - N > 0$, then $(P^\star)^{-1}$ is positive definite and thus so is $P^\star$. Otherwise recall the following consequence of the matrix determinant lemma for invertible $B$:
\[
B + uu^\top \text{ is invertible} \iff u^\top B^{-1} u \neq -1.
\]
Applying this to the above expression, we have
\[
M(t) := (N-1)\xi A + t \, vv^\top \text{ is invertible} \iff \frac{t \cdot v^\top A^{-1} v}{(N-1)\xi} \neq -1 \iff t \neq 1 - N.
\]
Since $\xi - N > 1 - N$, we have that $M(t)$ is invertible for all $t \in [\xi - N, 0]$. Since $M(0)$ is positive definite, we therefore have that $M(\xi - N)$ is also positive definite and thus so is $P^\star$. We now finally
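apply the Sherman-Morrison formula to get
\[
\operatorname{tr}(AP^\star) = \operatorname{tr}\left( A \left( a^{-1} A^{-1} - \frac{a^{-2} b \, A^{-1} vv^\top A^{-1}}{1 + a^{-1} b \, v^\top A^{-1} v} \right) \right) = \frac{N}{a} - \frac{b\xi}{a^2 + ab\xi} = 1
\]
and
\[
\operatorname{tr}(vv^\top P^\star) = \operatorname{tr}\left( vv^\top \left( a^{-1} A^{-1} - \frac{a^{-2} b \, A^{-1} vv^\top A^{-1}}{1 + a^{-1} b \, v^\top A^{-1} v} \right) \right) = \frac{\xi}{a} - \frac{b\xi^2}{a^2 + ab\xi} = 1.
\]
Therefore $P^\star$ is the optimizer of $\phi$ over $S$.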
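The scalar identities behind this verification are easy to check numerically. The sketch below (function names ours) evaluates the two closed-form traces obtained from the Sherman-Morrison formula for given $N$ and $\xi$; it checks only these scalar expressions and does not construct the matrices $A$, $v$ or $P^\star$.

```python
def maximizer_coefficients(N, xi):
    """Coefficients a, b of P* = (a*A + b*v*v^T)^{-1}, for xi = v^T A^{-1} v > 1."""
    a = (N - 1) * xi / (xi - 1)
    b = (xi - N) / (xi - 1)
    return a, b

def constraint_values(N, xi):
    """tr(A P*) and tr(v v^T P*) via the Sherman-Morrison expressions."""
    a, b = maximizer_coefficients(N, xi)
    tr_AP = N / a - b * xi / (a**2 + a * b * xi)
    tr_vvP = xi / a - b * xi**2 / (a**2 + a * b * xi)
    return tr_AP, tr_vvP

# Both constraint values should equal 1, and a + b should equal N.
for N, xi in ((5, 3.0), (10, 1.5), (4, 20.0)):
    a, b = maximizer_coefficients(N, xi)
    print(N, xi, a + b, constraint_values(N, xi))
```

The simplification behind both identities is $a + b\xi = \xi$, which reduces the two traces to $(N - b)/a$ and $\xi(1 - b)/a$ respectively.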

Applying Theorem 2.1. Since S is defined by a constant number of constraints, we apply


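Theorem 2.1 which says that we can apply the volume formula with relative error $\epsilon$ whenever $0 < \epsilon \le \frac{1}{2}$ and
\[
\epsilon^2 \ge \frac{8 \cdot 10^5 \left( 2 + \log(\epsilon^{-1}) \right)^2 \log(N \epsilon^{-1})}{N+1}.
\]
Setting $\epsilon := \frac{10^3 \log^{3/2}(N+1)}{(N+1)^{1/2}}$, we have
\[
\left( 2 + \log(\epsilon^{-1}) \right)^2 \log(N \epsilon^{-1}) \le \log^2\left( \frac{e^2 (N+1)^{1/2}}{10^3 \log^{3/2}(N+1)} \right) \log\left( \frac{(N+1)^{3/2}}{10^3 \log^{3/2}(N+1)} \right) \le \frac{3}{8} \log^3(N+1),
\]
which implies
\[
\epsilon^2 = \frac{10^6 \log^3(N+1)}{N+1} \ge \frac{8 \cdot 10^5}{N+1} \cdot \frac{3}{8} \log^3(N+1) \ge \frac{8 \cdot 10^5 \left( 2 + \log(\epsilon^{-1}) \right)^2 \log(N \epsilon^{-1})}{N+1}.
\]
Thus we can apply Theorem 2.1 whenever $\epsilon \le \frac{1}{2}$, which is satisfied whenever $N \ge 10^{11}$, for example.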
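These thresholds can be checked directly. The following sketch (function names ours) evaluates the chosen $\epsilon$ and tests the displayed inequality for a few values of $N$, using natural logarithms throughout.

```python
import math

def epsilon(N):
    """The choice eps = 10^3 * log^{3/2}(N+1) / (N+1)^{1/2}."""
    return 1e3 * math.log(N + 1) ** 1.5 / math.sqrt(N + 1)

def condition_holds(N, eps):
    """Check eps^2 >= 8e5 * (2 + log(1/eps))^2 * log(N/eps) / (N+1)."""
    rhs = 8e5 * (2 + math.log(1 / eps)) ** 2 * math.log(N / eps) / (N + 1)
    return eps**2 >= rhs

for N in (10**10, 10**11, 10**12):
    eps = epsilon(N)
    print(N, eps, eps <= 0.5 and condition_holds(N, eps))
```

The requirement $\epsilon \le \frac{1}{2}$ is the binding one here: it fails at $N = 10^{10}$ and holds from $N = 10^{11}$ on, which illustrates how large the constants in Theorem 2.1 are.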

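The approximate volume. We now compute the approximate volume of $S$ for large enough $N$. Since $m = 2$, we have
\[
\operatorname{vol}(S) \approx \left( \frac{N+1}{4\pi} \right) \left( \frac{\det(AA^\top)}{\det(BB^\top)} \right)^{1/2} e^{\phi(P^\star)},
\]
where
\[
\det(AA^\top) = \|A\|_F^2 \cdot \|v\|_2^4 - (v^\top A v)^2
\]
and, after a straightforward computation using the Sherman-Morrison formula,
\[
\det(BB^\top) = \operatorname{tr}\left( (AP^\star)^2 \right) \cdot \operatorname{tr}\left( (vv^\top P^\star)^2 \right) - \left( \operatorname{tr}(AP^\star vv^\top P^\star) \right)^2 = \frac{(N-1)\xi^2}{a^2 (a + b\xi)^2} = \frac{(\xi - 1)^2}{(N-1)\xi^2}.
\]
Thus, we have that
\[
\left( \frac{\det(AA^\top)}{\det(BB^\top)} \right)^{1/2} = \frac{\xi (N-1)^{1/2}}{\xi - 1} \left( \|A\|_F^2 \cdot \|v\|_2^4 - (v^\top A v)^2 \right)^{1/2}.
\]
Further, we have
\[
e^{\phi(P^\star)} = \Gamma_N\left( \frac{N+1}{2} \right) \cdot \left( \frac{2e}{N+1} \right)^{\binom{N+1}{2}} \det(aA + b vv^\top)^{-\frac{N+1}{2}},
\]
where, by the matrix determinant lemma,
\[
\det(aA + b vv^\top) = (1 + a^{-1} b \xi) \cdot \det(aA) = \frac{\xi - 1}{N - 1} \cdot \left( \frac{(N-1)\xi}{\xi - 1} \right)^N \det(A).
\]
Combining everything gives
\[
\begin{aligned}
\operatorname{vol}(S) \approx{} & \frac{e}{2\pi} (N-1)^{\frac{N}{2}} \left( \frac{2e}{N^2 - 1} \right)^{\binom{N+1}{2} - 1} \Gamma_N\left( \frac{N+1}{2} \right) \\
& \times \left( \frac{1}{\xi - 1} \right)^{\frac{N+1}{2}} \left( \frac{\xi - 1}{\xi} \right)^{\binom{N+1}{2} - 1} \left( \frac{\|A\|_F^2 \cdot \|v\|_2^4 - (v^\top A v)^2}{\det(A)^{N+1}} \right)^{1/2}.
\end{aligned}
\]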

5.3 Diagonal constraints


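We compute one quick final example before moving on to the main example given in Section 6 below. Consider the problem of computing the volume of the spectrahedron given as follows:
\[
S_{\alpha,\beta} = \{ P \in \mathrm{PSD}(2N) : \operatorname{tr}(M_1 P) = \alpha, \ \operatorname{tr}(M_2 P) = \beta \},
\]
where the matrices $M_i$ are given by
\[
M_1 = \begin{pmatrix} I_N & 0 \\ 0 & 0 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 0 & 0 \\ 0 & I_N \end{pmatrix}.
\]
Equivalently,
\[
S_{\alpha,\beta} = \left\{ \begin{pmatrix} P_1 & P_2 \\ P_3 & P_4 \end{pmatrix} \in \mathrm{PSD}(2N) : \operatorname{tr}(P_1) = \alpha, \ \operatorname{tr}(P_4) = \beta \right\}.
\]
A straightforward calculation shows that the entropy function maximizer $P^\star$ is given by
\[
P^\star = \frac{1}{N} \begin{pmatrix} \alpha I_N & 0 \\ 0 & \beta I_N \end{pmatrix}.
\]
The linear operators $AA^\top$ and $BB^\top$ on $\mathbb{R}^2$ become respectively
\[
AA^\top = N \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad BB^\top = \frac{1}{N} \begin{pmatrix} \alpha^2 & 0 \\ 0 & \beta^2 \end{pmatrix}.
\]
Since $m = 2$, we can apply Theorem 2.2 to obtain the asymptotic expression
\[
\operatorname{vol}(S_{\alpha,\beta}) \approx \frac{N^2 (2N+1)}{4\pi\alpha\beta} e^{\phi(P^\star)} = \frac{N^2 (2N+1)}{4\pi\alpha\beta} \left( \frac{2e\sqrt{\alpha\beta}}{N(2N+1)} \right)^{N(2N+1)} \Gamma_{2N}\left( \frac{2N+1}{2} \right)
\]
for $N \to \infty$.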

6 Main Example: Multi-stochastic Completely Positive Maps


In this section, we apply our main asymptotic result (Theorem 2.2) to a spectrahedral generaliza-
tion of the multi-way Birkhoff polytope. This spectrahedron consists of real symmetric positive
definite matrices which are naturally associated to completely positive linear maps on matrices
with certain stochasticity properties. Such “multi-stochastic” completely positive maps generalize
doubly stochastic completely positive maps, which in turn generalize doubly stochastic matrices
(i.e., points of the Birkhoff polytope). As discussed in the introduction, this spectrahedron can
also be described as the set of all quantum states with maximal entanglement; i.e., the set of all
quantum states with all univariant quantum marginals equal to the identity matrix.

6.1 Transportation polytopes and completely positive maps


In [BH10], the main application of their technical results is to asymptotic integer point counting
and volume computation for transportation polytopes and their generalizations. Recall that a
transportation polytope is defined as the set of m × n matrices with non-negative entries with fixed
specified row and column sums. For example, the Birkhoff polytope is the set of all n × n matrices
with non-negative entries whose rows and columns all sum to 1. Equivalently, the Birkhoff polytope
is the set of all doubly stochastic n×n matrices. All of these polytopes are linear slices of the positive
orthant of some dimension.
Here, we generalize the notion of the Birkhoff polytope (and transportation polytopes more gen-
erally) to something which might be called the quantum Birkhoff polytope or Birkhoff spectrahedron.
Specifically, fix $A \in \mathrm{Sym}(n^2)$, and define a linear map $\Phi_A : \mathrm{Sym}(n) \to \mathrm{Sym}(n)$ via
\[
A = \sum_{i,j=1}^n E_{i,j} \otimes \Phi_A(E_{i,j}),
\]
where $E_{i,j}$ is the matrix with a 1 in the $(i,j)$ entry and 0 elsewhere. Note that this defines $\Phi_A$ on a basis via the $n \times n$ blocks of the matrix $A$. By Choi's theorem [Cho75], we have that $A \in \mathrm{PSD}(n^2)$ if and only if $\Phi_A$ is a completely positive map. (We do not define this term further here, but note that it is stronger than the condition $\Phi_A(\mathrm{PSD}(n)) \subseteq \mathrm{PSD}(n)$; e.g., see [LS93a] for further discussion.) This positivity condition on $A$ will serve to generalize the fact that elements of the Birkhoff polytope have non-negative entries. Also note that the adjoint linear operator $\Phi_A^*$ is given by
\[
A = \sum_{i,j=1}^n \Phi_A^*(E_{i,j}) \otimes E_{i,j},
\]
and thus Choi's theorem applies to $\Phi_A^*$ as well.
We now consider generalizations of the linear restrictions on the Birkhoff polytope; namely, the
row and column sum restrictions. An equivalent description of these conditions is given by: for B
in the Birkhoff polytope of size n we have

B · 1n = 1n and B ∗ · 1n = 1n ,

where 1n is the length-n all-ones column vector. Stated this way, these conditions are immediately
generalizable to A ∈ PSD(n2 ) via

ΦA (In ) = In and Φ∗A (In ) = In .

The matrices ΦA (In ) and Φ∗A (In ) are also called the partial traces of the operator A. For this
reason, a completely positive map ΦA (or sometimes A) is said to be doubly stochastic when it
satisfies the above conditions (e.g., see [LS93a] for further discussion). Note that the connection
to the Birkhoff polytope is strengthened by the fact that the set of all diagonal A ∈ PSD(n2 ) for
which ΦA is doubly stochastic is equal to the Birkhoff polytope after rearranging the diagonals of
each n × n block of A into columns of an n × n matrix.
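To make the block description concrete, here is a small Python sketch computing $\Phi_A(I_n)$ and $\Phi_A^*(I_n)$ from the entries of $A$. The function names, the nested-list matrix representation and the 0-based row-major indexing of Kronecker products are our own conventions. Under them, block $(i,j)$ of $A$ is $\Phi_A(E_{i,j})$, so $\Phi_A(I_n)$ is the sum of the diagonal blocks, while the $(p,q)$ entry of $\Phi_A^*(I_n)$ is the trace of block $(p,q)$.

```python
def partial_traces(A, n):
    """Return (Phi_A(I_n), Phi_A^*(I_n)) for an n^2 x n^2 matrix A (nested lists)."""
    phi_I = [[sum(A[i * n + p][i * n + q] for i in range(n)) for q in range(n)]
             for p in range(n)]
    phi_star_I = [[sum(A[p * n + i][q * n + i] for i in range(n)) for q in range(n)]
                  for p in range(n)]
    return phi_I, phi_star_I

def diagonal_embedding(D):
    """Diagonal A whose block (i, i) has column i of D on its diagonal."""
    n = len(D)
    A = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for p in range(n):
            A[i * n + p][i * n + p] = D[p][i]
    return A

# A doubly stochastic matrix D gives a doubly stochastic diagonal A.
D = [[0.25, 0.75], [0.75, 0.25]]
phi_I, phi_star_I = partial_traces(diagonal_embedding(D), 2)
print(phi_I, phi_star_I)
```

For the diagonal embedding, the two conditions reduce to the row sums and column sums of $D$ being 1, which is the correspondence with the Birkhoff polytope described above.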

6.2 Multi-index transportation polytopes and completely positive maps


The volume approximation result of [BH10] cannot be used directly to obtain asymptotics for the
volume of the Birkhoff polytope and transportation polytopes. This comes from the fact that the
asymptotic volume of the Birkhoff polytope given in [CM07] gives a different formula than the
one achieved in [BH10]. That said, one can achieve the correct asymptotic formula using similar
techniques and an “Edgeworth correction” term; see Remark 6.1 for more discussion.
Because of this, Barvinok and Hartigan in [BH10] apply their results to what they call multi-
index transportation polytopes. Instead of considering matrices with fixed row and column sums,
they consider higher-order tensors with fixed codimension-1 slice sums. More formally, they consider
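$n_1 \times n_2 \times \cdots \times n_k$ multi-dimensional matrices $A$ and fixed $\alpha_{ij} \in \mathbb{Z}_{\ge 0}$ for all $i \in [k]$ and $j \in [n_i]$, such that
\[
\sum_{\substack{\kappa \in [n_1] \times \cdots \times [n_k] \\ \kappa_i = j}} a_\kappa = \sum_{\kappa_1 = 1}^{n_1} \cdots \sum_{\kappa_{i-1} = 1}^{n_{i-1}} \sum_{\kappa_{i+1} = 1}^{n_{i+1}} \cdots \sum_{\kappa_k = 1}^{n_k} a_{\kappa_1, \ldots, \kappa_{i-1}, j, \kappa_{i+1}, \ldots, \kappa_k} = \alpha_{ij}
\]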

for all i ∈ [k] and j ∈ [ni ]. One can then define the multi-index Birkhoff polytope by setting
n1 = · · · = nk = n and αij = 1 for all i ∈ [k] and j ∈ [n]. The results of [BH10] then apply to this
case whenever k ≥ 5.
This generalization from transportation polytopes to multi-index transportation polytopes then
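easily extends to completely positive maps. Fix $A \in \mathrm{Sym}(n_1 n_2 \cdots n_k)$, and for all $i \in [k]$ define a linear map $\Phi_A^{(i)} : \mathrm{Sym}(n_1 \cdots n_{i-1} n_{i+1} \cdots n_k) \to \mathrm{Sym}(n_i)$ via
\[
A = \sum_{\kappa, \kappa' \in [n_1] \times \cdots \times [n_{i-1}] \times [n_{i+1}] \times \cdots \times [n_k]} E_{\kappa_1, \kappa'_1} \otimes \cdots \otimes E_{\kappa_{i-1}, \kappa'_{i-1}} \otimes \Phi_A^{(i)}(E_{\kappa, \kappa'}) \otimes E_{\kappa_i, \kappa'_i} \otimes \cdots \otimes E_{\kappa_{k-1}, \kappa'_{k-1}},
\]
where $E_{\kappa, \kappa'} := E_{\kappa_1, \kappa'_1} \otimes \cdots \otimes E_{\kappa_{k-1}, \kappa'_{k-1}}$. Note that as above, this expression defines $\Phi_A^{(i)}$ on a basis of $\mathrm{Sym}(n_1 \cdots n_{i-1} n_{i+1} \cdots n_k)$. By Choi's theorem [Cho75] again, the following are then equivalent: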
1. $A \in \mathrm{PSD}(n_1 n_2 \cdots n_k)$,
2. $\Phi_A^{(i)}$ is a completely positive linear map for some $i \in [k]$,
3. $\Phi_A^{(i)}$ is a completely positive linear map for all $i \in [k]$.
With this, we now explicitly define the multi-index Birkhoff spectrahedron which we denote $\mathrm{SCP}_{n,k}$ for "Stochastic Completely Positive". Setting $n_1 = \cdots = n_k = n$ and given $A \in \mathrm{PSD}(n^k)$, we define:
\[
A \in \mathrm{SCP}_{n,k} \iff \Phi_A^{(i)}(I_{n^{k-1}}) = I_n \text{ for all } i \in [k].
\]
In particular, $\mathrm{SCP}_{n,2}$ is precisely the set of matrices $A \in \mathrm{PSD}(n^2)$ which correspond to doubly stochastic completely positive maps $\Phi_A = \Phi_A^{(2)}$, since $\Phi_A^* = \Phi_A^{(1)}$ in this case.
As was the case with transportation polytopes and the Birkhoff polytope, our main result does
not apply to SCP n,2 . However, we will see next that our main result does yield asymptotics for the
volume of SCP n,k for all fixed k ≥ 7.
Remark 6.1. The results of [BH10] only apply directly to multi-index transportation polytopes when
the size of the tensor is k ≥ 5. However, later work showed that similar techniques could handle the
case of k = 2 using an “Edgeworth correction” to the approximation formula (see [BH12, BH09]
for further discussion). Additionally, it was also later shown that the original approximation of
[BH10] could be applied to k = 3, 4 cases in [BP14].

6.3 Applying Theorem 2.2 to multi-index Birkhoff spectrahedra


We now apply Theorem 2.2 to multi-index Birkhoff spectrahedra SCP n,k which we defined above.
To do this, we first rewrite the linear conditions which define SCP n,k in a way which is more
compatible with the statement of Theorem 2.2.

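Linear constraints. For $i \in [k]$ and $\alpha, \alpha' \in [n]$ we define
\[
A_{i,\alpha,\alpha'} := \sum_{\substack{\kappa, \kappa' \in [n]^k \\ \kappa_i = \alpha, \ \kappa'_i = \alpha' \\ \kappa_j = \kappa'_j \ \forall j \neq i}} \frac{E_{\kappa,\kappa'} + E_{\kappa',\kappa}}{2} \quad \text{and} \quad b_{i,\alpha,\alpha'} = \delta_{\alpha,\alpha'},
\]
where $E_{\kappa,\kappa'} := E_{\kappa_1,\kappa'_1} \otimes \cdots \otimes E_{\kappa_k,\kappa'_k}$ and $E_{p,q}$ is the matrix with a 1 in the $(p,q)$ entry and 0 elsewhere. An alternative description of $\mathrm{SCP}_{n,k}$ is then given by
\[
\mathrm{SCP}_{n,k} = \left\{ X \in \mathrm{PSD}(n^k) : \left( \operatorname{tr}(A_{i,\alpha,\alpha'} X) \right)_{i \in [k], \ \alpha, \alpha' \in [n], \ \alpha \le \alpha'} = b \right\},
\]
where $\alpha, \alpha'$ range over $[n]$ with $\alpha \le \alpha'$ and $i$ ranges over $[k]$. Further note that this description is partially redundant. By including an extra constraint on the trace of a given $X \in \mathrm{PSD}(n^k)$, we can eliminate the conditions indexed by $(i, n, n)$ for all $i \in [k]$. Specifically, we add the condition
\[
\operatorname{tr}(A_0 X) = b_0 \quad \text{where} \quad A_0 = \frac{I_{n^k}}{n} \quad \text{and} \quad b_0 = 1.
\]
Thus the total number of affine constraints required to describe $\mathrm{SCP}_{n,k}$ is $m_n = k \binom{n+1}{2} - k + 1$.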

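Entropy function maximizer. We next claim that $P^\star := \frac{I_{n^k}}{n^{k-1}}$ maximizes $\phi(P)$ over all $P \in \mathrm{SCP}_{n,k}$, or equivalently maximizes $\log \det P$. To see this, recall that the gradient of $\log \det P$ is $P^{-1}$ since we are restricting to real symmetric matrices, so that $\nabla \phi(P^\star) = \frac{n^k + 1}{2} (P^\star)^{-1}$. For any $P \in \mathrm{SCP}_{n,k}$, we then have
\[
\frac{2}{n^k + 1} \langle P - P^\star, \nabla \phi(P^\star) \rangle = \operatorname{tr}(P (P^\star)^{-1}) - n^k = n^{k-1} \operatorname{tr}(P) - n^k = 0,
\]
since the condition $\operatorname{tr}(A_0 P) = b_0$ implies $\operatorname{tr}(P) = n$. Since $\phi(P)$ is concave, this implies $P^\star = \frac{I_{n^k}}{n^{k-1}}$ maximizes $\phi$ over $\mathrm{SCP}_{n,k}$.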

Applying Theorem 2.2. To apply Theorem 2.2, we need to determine for which values of $k$ the number of affine constraints $m_n = k \binom{n+1}{2} - k + 1$ is sufficiently dominated by $N_n = n^k$. Recalling Equation 2 from the statement of Theorem 2.2, we compute
\[
\lim_{n \to \infty} \frac{m_n^3 \log N_n}{N_n} = \lim_{n \to \infty} \frac{\left( k \binom{n+1}{2} - k + 1 \right)^3 k \log n}{n^k} \le \lim_{n \to \infty} \frac{k^4 n^6 \log n}{n^k}.
\]
The right-hand side is 0 whenever $k \ge 7$, and thus for such $k$ we can apply Theorem 2.2 to obtain asymptotics for the volume of $\mathrm{SCP}_{n,k}$.
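The threshold $k \ge 7$ can be illustrated numerically. The sketch below (function names ours) evaluates $m_n^3 \log N_n / N_n$ exactly as defined above for a few values of $n$ and $k$.

```python
import math

def num_constraints(n, k):
    """m_n = k * C(n+1, 2) - k + 1."""
    return k * math.comb(n + 1, 2) - k + 1

def domination_ratio(n, k):
    """m_n^3 * log(N_n) / N_n with N_n = n^k."""
    m = num_constraints(n, k)
    return m**3 * (k * math.log(n)) / n**k

for k in (6, 7, 8):
    print(k, [domination_ratio(n, k) for n in (10, 10**3, 10**6)])
```

For $k = 7$ the ratio decays roughly like $\log n / n$, while for $k = 6$ it grows like $\log n$. The latter only shows that this sufficient condition is not met for $k = 6$; it does not by itself say that the approximation formula fails there.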

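The asymptotic volume. We now compute the asymptotic volume of $\mathrm{SCP}_{n,k}$ for $k \ge 7$. For $N = n^k$ and $m = k \binom{n+1}{2} - k + 1$, we have
\[
\operatorname{vol}(\mathrm{SCP}_{n,k}) \approx \left( \frac{N+1}{4\pi} \right)^{m/2} \left( \frac{\det(AA^\top)}{\det(BB^\top)} \right)^{1/2} e^{\phi(P^\star)},
\]
where $\left( \frac{\det(AA^\top)}{\det(BB^\top)} \right)^{1/2} = n^{m(k-1)} = \left( \frac{N}{n} \right)^m$ since $B = \frac{1}{n^{k-1}} A$, and
\[
\begin{aligned}
e^{\phi(P^\star)} &= \left( \frac{N+1}{2e} \right)^{-\frac{N(N+1)}{2}} \left( n^{N(k-1)} \right)^{-\frac{N+1}{2}} \Gamma_N\left( \frac{N+1}{2} \right) \\
&= \left( \frac{2en}{N(N+1)} \right)^{\frac{N(N+1)}{2}} \Gamma_N\left( \frac{N+1}{2} \right)
\end{aligned}
\]
since $P^\star = \frac{I_{n^k}}{n^{k-1}}$. That is,
\[
\operatorname{vol}(\mathrm{SCP}_{n,k}) \approx \left( \frac{N+1}{4\pi} \right)^{\frac{m}{2}} \left( \frac{N}{n} \right)^m \left( \frac{2en}{N(N+1)} \right)^{\frac{N(N+1)}{2}} \Gamma_N\left( \frac{N+1}{2} \right).
\]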
Further, using the definition of the multivariate gamma function we obtain

N +1
  π  N(N−1) + 1 ⌊ N ⌋ NY
−1
4 2 2
ΓN = j!.
2 2
j=0
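The volume formula above can be evaluated in log-space. The sketch below is a minimal illustration, assuming the standard definition $\Gamma_N(a) = \pi^{N(N-1)/4} \prod_{j=1}^{N} \Gamma\!\left(a - \frac{j-1}{2}\right)$ of the multivariate gamma function; the helper names (`log_multigamma`, `phi`, `log_vol_scp`) are ours. The formula is asymptotic, so the output for small $n$ should be read as an illustration of the expression rather than as an accurate volume.

```python
import math


def log_multigamma(dim: int, a: float) -> float:
    """log Gamma_dim(a) = dim(dim-1)/4 * log(pi) + sum_{j=1..dim} lgamma(a - (j-1)/2)."""
    total = dim * (dim - 1) / 4.0 * math.log(math.pi)
    for j in range(1, dim + 1):
        total += math.lgamma(a - (j - 1) / 2.0)
    return total


def phi(dim: int, log_det: float) -> float:
    """Entropy function phi(P) of an N x N matrix P, given log det(P)."""
    half = (dim + 1) / 2.0
    return (log_multigamma(dim, half)
            - dim * half * math.log((dim + 1) / (2.0 * math.e))
            + half * log_det)


def log_vol_scp(n: int, k: int) -> float:
    """Logarithm of the approximate volume formula for SCP_{n,k}."""
    big_n = n ** k
    m = k * math.comb(n + 1, 2) - k + 1
    # P* = I / n^(k-1), so log det(P*) = -N (k-1) log n.
    log_det_p_star = -big_n * (k - 1) * math.log(n)
    return (m / 2.0 * math.log((big_n + 1) / (4.0 * math.pi))
            + m * math.log(big_n / n)
            + phi(big_n, log_det_p_star))


if __name__ == "__main__":
    print(log_vol_scp(2, 7))
```

Working with logarithms avoids overflow, since the individual factors are far outside floating-point range even for moderate $N$.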

7 Proof of the Main Technical Result


Our main results in Section 2 follow from a single result, which we prove in this section. Although
this result is strictly stronger than any of the results in Section 2, we have moved it here because
it is much more technical and complicated to state. The result we prove here is the direct analog
of the main result of [BH10].

We first recall all notation from Section 2. Given $A_1, A_2, \ldots, A_m \in \mathrm{Sym}(N)$ and $b \in \mathbb{R}^m$, we define a spectrahedron $S$ by:
\[
S := \left\{P \in \mathrm{PSD}(N) : \operatorname{tr}(A_k P) = b_k \text{ for } k \in [m]\right\}.
\]
We assume that $S$ is compact, that the constraints $\operatorname{tr}(A_k P) = b_k$ are linearly independent, that $m < \binom{N+1}{2} = \dim(\mathrm{PSD}(N))$, and that $S$ is of dimension exactly $\binom{N+1}{2} - m$. Let $P^\star \in S$ be the point which maximizes the function
\[
\phi(P) = \log \Gamma_N\!\left(\frac{N+1}{2}\right) - \frac{N(N+1)}{2} \log\!\left(\frac{N+1}{2e}\right) + \frac{N+1}{2} \log\det(P)
= \mathrm{const}(N) + \frac{N+1}{2} \log\det(P)
\]
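over $S$. Let $A$ and $B$ be linear operators from $\mathrm{Sym}(N)$ to $\mathbb{R}^m$, defined via
\[
AX := (\operatorname{tr}(A_1 X), \ldots, \operatorname{tr}(A_m X))
\]
and
\[
BX := (\operatorname{tr}(Z_1 X), \ldots, \operatorname{tr}(Z_m X)),
\]
where $Z_k := \sqrt{P^\star} A_k \sqrt{P^\star}$ for all $k \in [m]$.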

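Theorem 7.1 (Main technical result). Let $S$ be a spectrahedron as defined above. Consider the quadratic form $q : \mathbb{R}^m \to \mathbb{R}$ defined by
\[
q(t) := \frac{1}{N+1} \operatorname{tr}\left[\left(\sum_{k=1}^m t_k Z_k\right)^2\right].
\]
Suppose that for some $\lambda > 0$ we have
\[
q(t) \ge \lambda \|t\|_2^2 \quad \text{for all } t \in \mathbb{R}^m, \tag{A1}
\]
and that for some $\theta > 0$ we have
\[
\frac{2}{N+1} \left\|(x^\top Z_k x)_{k=1}^m\right\|_2 \le \theta \quad \text{for all } \|x\|_2 = 1. \tag{A2}
\]
Then there exists an absolute constant $\gamma$ (we can choose $\gamma = 10^5$) such that the following holds: Fix $0 < \epsilon \le \frac{1}{2}$ and suppose that
\[
\lambda \ge \gamma \theta^2 \epsilon^{-2} m \left(m + \log(\epsilon^{-1})\right)^2 \log(N \epsilon^{-1}). \tag{A3}
\]
Then the number
\[
\left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{\det(AA^\top)}{\det(BB^\top)}\right)^{1/2} e^{\phi(P^\star)}
\]
approximates $\operatorname{vol}(S)$ within relative error $\epsilon$.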

The remainder of this section is devoted to proving this result.

Setup. Let $X$ be a random variable distributed according to the (maximum entropy) Wishart distribution with expectation given by a positive definite matrix $P^\star$. That is,
\[
X = \sqrt{P^\star}\, \frac{GG^\top}{N+1}\, \sqrt{P^\star},
\]
where $G \in \mathbb{R}^{N \times (N+1)}$ is a random matrix whose entries $G_{ij} \sim \mathcal{N}(0,1)$ are drawn independently and identically from the standard normal distribution.
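The claim that $X$ has expectation $P^\star$ can be checked empirically. The following sketch is a simple Monte Carlo illustration using only the standard library; it assumes a diagonal $P^\star$ (so that $\sqrt{P^\star}$ is trivial to form), and the function names `sample_wishart_diag` and `empirical_mean` are ours.

```python
import math
import random


def sample_wishart_diag(p_diag, rng):
    """Draw X = sqrt(P) G G^T sqrt(P) / (N+1) for a diagonal P = diag(p_diag)."""
    n = len(p_diag)
    # G is an N x (N+1) matrix of independent standard normal entries.
    g = [[rng.gauss(0.0, 1.0) for _ in range(n + 1)] for _ in range(n)]
    roots = [math.sqrt(p) for p in p_diag]
    return [[roots[i] * roots[j]
             * sum(g[i][c] * g[j][c] for c in range(n + 1)) / (n + 1)
             for j in range(n)] for i in range(n)]


def empirical_mean(p_diag, samples=20000, seed=0):
    """Entrywise average of `samples` independent draws of X."""
    rng = random.Random(seed)
    n = len(p_diag)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(samples):
        x = sample_wishart_diag(p_diag, rng)
        for i in range(n):
            for j in range(n):
                acc[i][j] += x[i][j] / samples
    return acc


if __name__ == "__main__":
    for row in empirical_mean([0.5, 0.3, 0.2]):
        print([round(v, 3) for v in row])
```

The empirical mean should be close to $\operatorname{diag}(0.5, 0.3, 0.2)$, since $\mathbb{E}[GG^\top] = (N+1) I_N$.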
Given $N \times N$ real symmetric matrices $A_1, \ldots, A_m$, we further define
\[
Y := AX := (\operatorname{tr}(A_1 X), \ldots, \operatorname{tr}(A_m X)) \in \mathbb{R}^m,
\]
and we note that
\[
b = AP^\star = (\operatorname{tr}(A_1 P^\star), \ldots, \operatorname{tr}(A_m P^\star)) \in \mathbb{R}^m.
\]
We further define the matrix
\[
Z(t) := \sum_{k=1}^m t_k Z_k = \sum_{k=1}^m t_k \sqrt{P^\star} A_k \sqrt{P^\star},
\]
and with this, $q(t)$ can be defined via
\[
q(t) := \frac{1}{N+1} \operatorname{tr}(Z^2(t)) = \frac{1}{N+1} \sum_{j=1}^N \lambda_j^2(Z(t))
\]
for any $t \in \mathbb{R}^m$, where $\lambda_j(M)$ denotes the $j$-th largest eigenvalue of $M$. We show how this form relates to $B$ in Lemma 7.5. Note further that we have slightly overloaded the symbol $\lambda$: $\lambda$ itself denotes the above constant, and $\lambda_j(M)$ denotes the $j$-th largest eigenvalue of $M$. We have done this mainly to retain consistency of notation between our computations and those of Barvinok and Hartigan in [BH10].

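The proof. We now define $\sigma := 4m + 10\log\frac{1}{\epsilon}$, and we further define
\[
R_1 := \left\{t : \|t\|_2 \le \frac{1}{2\theta},\ q(t) > \sigma\right\}, \quad
R_2 := \{t : q(t) \le \sigma\}, \quad
R_3 := \left\{t : \|t\|_2 \ge \frac{1}{2\theta}\right\}.
\]
Note that for $t$ with $\|t\|_2 \ge \frac{1}{2\theta}$ and for $\gamma > 40$, we have
\[
q(t) \ge \frac{\lambda}{4\theta^2} \ge \frac{\gamma \epsilon^{-2} m \left(m + \log\frac{1}{\epsilon}\right)^2 \log(N\epsilon^{-1})}{4} \ge \frac{\gamma \sigma m \left(m + \log\frac{1}{\epsilon}\right) \log(N\epsilon^{-1})}{40} > \sigma.
\]
Thus, the sets $R_1, R_2, R_3$ are disjoint for large enough $\gamma$.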
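We assume the following bound, which we prove in Section 7.2.1 (for the $R_1$ bound), Section 7.2.2 (for the $R_2$ bound), and Section 7.3 (for the $R_3$ bound):
\begin{align*}
\left|\int_{\mathbb{R}^m} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt - \int_{\mathbb{R}^m} e^{-q(t)}\,dt\right|
&\le \left|\int_{R_1} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt\right| \\
&\quad + \left|\int_{R_2} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt - \int_{\mathbb{R}^m} e^{-q(t)}\,dt\right| \\
&\quad + \left|\int_{R_3} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt\right| \\
&\le \left(\epsilon^3 + \left(\frac{2\epsilon}{3} + \epsilon^5\right) + \frac{\epsilon}{100}\right) \int_{\mathbb{R}^m} e^{-q(t)}\,dt \\
&\le \epsilon \int_{\mathbb{R}^m} e^{-q(t)}\,dt,
\end{align*}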

where $\phi_Y$ is the characteristic function of the random variable $Y$, the first inequality follows from the triangle inequality, and the last inequality follows from the fact that $\epsilon \le \frac{1}{2}$. Assuming this bound, we now complete the proof of the main result Theorem 7.1. By the characteristic function inversion formula (e.g., see Section 29 of [Bil95]), the density of $Y$ at $b$ is equal to
\[
\frac{1}{(2\pi)^m} \int_{\mathbb{R}^m} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt.
\]
By Corollary 4.6 and since $A$ is a linear map, the density of $Y = AX$ at $b$ is also equal to
\[
\frac{\operatorname{vol}(S)\, e^{-\phi(P^\star)}}{\det(AA^\top)^{1/2}}.
\]
Further, using Lemma 7.5 below, we compute the multivariate Gaussian integral
\[
\int_{\mathbb{R}^m} e^{-q(t)}\,dt = \int_{\mathbb{R}^m} e^{-\frac{1}{2} t^\top \left(\frac{2}{N+1} BB^\top\right) t}\,dt = \sqrt{\frac{(2\pi)^m}{\det\left(\frac{2}{N+1} BB^\top\right)}} = \frac{((N+1)\pi)^{m/2}}{\det(BB^\top)^{1/2}}.
\]
Combining everything and rearranging then implies
\[
\left|\frac{\operatorname{vol}(S)}{\left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{\det(AA^\top)}{\det(BB^\top)}\right)^{1/2} e^{\phi(P^\star)}} - 1\right| \le \epsilon.
\]
That is,
\[
\left(\frac{N+1}{4\pi}\right)^{m/2} \left(\frac{\det(AA^\top)}{\det(BB^\top)}\right)^{1/2} e^{\phi(P^\star)} \ \text{approximates } \operatorname{vol}(S) \text{ within relative error } \epsilon,
\]
which is precisely the statement of Theorem 7.1. The remainder of this section will now be spent proving the bound which we assumed above.
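The final step of the assumed bound is the elementary inequality $\epsilon^3 + \frac{2\epsilon}{3} + \epsilon^5 + \frac{\epsilon}{100} \le \epsilon$ for $0 < \epsilon \le \frac{1}{2}$. The short sketch below checks it on a grid; the function names are ours and only illustrative.

```python
def error_budget(eps: float) -> float:
    """Sum of the R1, R2 and R3 error coefficients from the assumed bound."""
    r1 = eps ** 3                      # contribution of the region R1
    r2 = 2.0 * eps / 3.0 + eps ** 5    # contribution of the region R2
    r3 = eps / 100.0                   # contribution of the region R3
    return r1 + r2 + r3


def budget_holds(eps: float) -> bool:
    """True when the three contributions together stay below eps."""
    return error_budget(eps) <= eps


if __name__ == "__main__":
    print(all(budget_holds(i / 1000.0) for i in range(1, 501)))
    print(error_budget(0.5))
```

At $\epsilon = \frac{1}{2}$ the budget is nearly exhausted, which is why the restriction $\epsilon \le \frac{1}{2}$ appears throughout the proof.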
Remark 7.2. This technique of bounding the characteristic function is precisely what was used in
[BH10] in the polytope case. See [BR22] for another interesting use of these types of integral expres-
sions involving real symmetric matrices, where similar computations are utilized on the algebraic
problem of solving a system of real quadratic equations.

7.1 A few small results



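For a symmetric matrix $Z \in \mathrm{Sym}(N)$, we denote by $|Z|$ the matrix $\sqrt{Z^2}$.
Lemma 7.3. For $Z(t)$ defined as above and any $t \in \mathbb{R}^m$, we have
\[
\lambda_{\max}(|Z(t)|) \le \frac{N+1}{2}\, \theta \|t\|_2.
\]
Proof. Since $Z(t)$ is symmetric, we have
\begin{align*}
\lambda_{\max}(Z(t)) &= \sup_{\|x\|_2 = 1} x^\top \left(\sum_{k=1}^m t_k Z_k\right) x = \sup_{\|x\|_2 = 1} \sum_{k=1}^m t_k\, x^\top Z_k x \\
&\le \sup_{\|x\|_2 = 1} \left|\sum_{k=1}^m t_k\, x^\top Z_k x\right| \le \|t\|_2 \cdot \sup_{\|x\|_2 = 1} \left\|(x^\top Z_k x)_{k=1}^m\right\|_2.
\end{align*}
The same holds for $\lambda_{\max}(-Z(t))$, and so the bound in fact holds for $\lambda_{\max}(|Z(t)|)$. Applying the $\theta$ bound of (A2) then gives the result.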

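Corollary 7.4. For $Z(t)$ defined as above and any $t \in \mathbb{R}^m$, we have
\[
\operatorname{tr}(|Z^p(t)|) \le \left(\frac{N+1}{2}\, \theta \|t\|_2\right)^{p-2} \operatorname{tr}(Z^2(t)).
\]
Proof. For any $p \ge 2$, we have
\[
\operatorname{tr}(|Z^p(t)|) = \sum_{j=1}^N \lambda_j^p(|Z(t)|) \le \lambda_{\max}(|Z(t)|)^{p-2} \sum_{j=1}^N \lambda_j^2(|Z(t)|) = \lambda_{\max}(|Z(t)|)^{p-2} \operatorname{tr}(Z^2(t)).
\]
Applying the previous lemma then gives the result.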

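Lemma 7.5. For any $t \in \mathbb{R}^m$, we have
\[
q(t) = \frac{1}{N+1}\, t^\top BB^\top t.
\]
Proof. Letting $\operatorname{vec}$ denote the standard vectorization, we can write
\[
BX = \begin{bmatrix} | & & | \\ \operatorname{vec}(Z_1) & \cdots & \operatorname{vec}(Z_m) \\ | & & | \end{bmatrix}^\top \operatorname{vec}(X),
\]
which implies
\[
B^\top t = \operatorname{vec}^{-1}\left(\begin{bmatrix} | & & | \\ \operatorname{vec}(Z_1) & \cdots & \operatorname{vec}(Z_m) \\ | & & | \end{bmatrix} t\right) = \sum_{k=1}^m t_k Z_k = Z(t).
\]
Therefore,
\[
\frac{1}{N+1}\, t^\top BB^\top t = \frac{1}{N+1} \operatorname{tr}(Z^2(t)) = q(t).
\]

We now state a few results from [BH10] which we will need here.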

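Lemma 7.6 ([BH10], Lemma 6.2). Let $q : \mathbb{R}^m \to \mathbb{R}$ be a positive definite quadratic form, and let $\omega > 0$ be a positive real number.

1. If $\omega \ge 3$, then
\[
\int_{t:\ q(t) \ge \omega m} e^{-q(t)}\,dt \le e^{-\omega m/2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]

2. If $q(t) \ge \lambda \|t\|_2^2$ for all $t \in \mathbb{R}^m$, then for any $a \in \mathbb{R}^m$ we have
\[
\int_{t:\ |\langle a,t\rangle| > \omega \|a\|_2} e^{-q(t)}\,dt \le e^{-\lambda\omega^2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]

Lemma 7.7 ([BH10], Lemma 6.3). For any $\rho \ge 0$ and any $k > m$ where $t \in \mathbb{R}^m$, we have
\[
\int_{t:\ \|t\|_2 \ge \rho/\theta} \left(1 + \theta^2 \|t\|_2^2\right)^{-k/2} dt \le \frac{2\pi^{m/2}}{\Gamma(m/2) \cdot \theta^m (k - m)} \left(1 + \rho^2\right)^{(m-k)/2}.
\]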

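We actually need a slight modification of part (2) of Lemma 7.6 for our purposes, which we prove now.
Corollary 7.8. Let $q : \mathbb{R}^m \to \mathbb{R}$ be a positive definite quadratic form. If $q(t) \ge \lambda \|t\|_2^2$ for all $t \in \mathbb{R}^m$, then for every positive real number $\omega \ge \sqrt{\frac{3m}{\lambda}}$,
\[
\int_{\|t\|_2 \ge \omega} e^{-q(t)}\,dt \le e^{-\lambda\omega^2/2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt
\]
holds.
Proof. For $t \in \mathbb{R}^m$ with $\|t\|_2 \ge \omega$, we have $q(t) \ge \lambda\omega^2$. Moreover, the condition $\omega \ge \sqrt{\frac{3m}{\lambda}}$ implies $\lambda\omega^2 \ge 3m$. Hence,
\[
\int_{\|t\|_2 \ge \omega} e^{-q(t)}\,dt \le \int_{q(t) \ge \lambda\omega^2} e^{-q(t)}\,dt \le e^{-\lambda\omega^2/2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt,
\]
where the second inequality follows from Lemma 7.6 (1).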

7.2 The characteristic function for small t


In this and the following section, we will bound the characteristic function of the random variable $Y$ for various input values. To do this, we apply the appropriate transformation to the characteristic function of the maximum entropy Wishart random variable $X$. The expectation of $X$ is $P^\star$, and thus its characteristic function is given by
\[
\phi_X(M) = \det\left[I - \frac{2i \sqrt{P^\star} M \sqrt{P^\star}}{N+1}\right]^{-\frac{N+1}{2}}.
\]
Note that there is a potential ambiguity in this definition coming from applying the square root to a complex number, and we briefly address this now. Considering the determinant expression above as a function on the space of complex symmetric matrices, the expression is non-zero in an open neighborhood of the subspace of real symmetric matrices. (The eigenvalues of the input matrix will all be near the line $\Re(z) = 1$.) Thus the square root of the determinant can be defined analytically in an open neighborhood up to choice of square root, and we make the choice which gives $\phi_X(0) = 1$. Because all eigenvalues of the matrix in the expression for $\phi_X(M)$ are contained in the open right half-plane whenever $M$ is real symmetric, the value of $\phi_X$ is also given by applying the square root to the eigenvalues individually, using the principal branch. That is, we have
\[
\phi_X(M) = \prod_{j=1}^N \left(\left[1 - 2i \cdot \lambda_j\!\left(\frac{\sqrt{P^\star} M \sqrt{P^\star}}{N+1}\right)\right]^{-\frac{1}{2}}\right)^{N+1}.
\]
We now use this expression for the characteristic function of $X$ to prove a nice expression for the characteristic function of $Y$.
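Lemma 7.9. On the set of all real $t \in \mathbb{R}^m$ such that $\|t\|_2 \le \frac{1}{2\theta}$, the characteristic function of $Y$ can be expressed as
\[
\phi_Y(t) = \exp\left(i\langle b,t\rangle - q(t) - i f(t) + g(t)\right)
\]
where
\[
f(t) = \frac{4}{3(N+1)^2} \cdot \operatorname{tr}(Z^3(t)) \quad\text{and}\quad |g(t)| \le \frac{4}{(N+1)^3} \cdot \operatorname{tr}(Z^4(t)).
\]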

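Proof. Using the above discussion, the expression for the characteristic function of a random $N \times N$ matrix $X$ distributed according to the maximum entropy Wishart distribution with expectation $P^\star$ is given by
\[
\log\phi_X(M) = (N+1) \sum_{j=1}^N -\frac{1}{2} \log\left[1 - 2i \cdot \lambda_j\!\left(\frac{\sqrt{P^\star} M \sqrt{P^\star}}{N+1}\right)\right],
\]
where here we choose the principal branch of $\log$ as discussed above.

We now write down the characteristic function for the random variable $Y = AX$, where $A$ acts by $X \mapsto (\operatorname{tr}(A_i X))_{i=1}^m$ as defined above. Using the standard formula for the characteristic function under the action of a linear operator, we have
\[
\log\phi_Y(t) = \log\phi_X(A^\top t) = -\frac{N+1}{2} \sum_{j=1}^N \log\left[1 - 2i \cdot \lambda_j\!\left(\frac{\sqrt{P^\star} (A^\top t) \sqrt{P^\star}}{N+1}\right)\right].
\]
Note that the above expression is syntactically valid, since we can view $A^\top$ as a map from $\mathbb{R}^m$ to $N \times N$ real symmetric matrices. Letting $\operatorname{vec}$ denote the standard vectorization, we can write
\[
Y = AX = \begin{bmatrix} | & | & & | \\ \operatorname{vec}(A_1) & \operatorname{vec}(A_2) & \cdots & \operatorname{vec}(A_m) \\ | & | & & | \end{bmatrix}^\top \operatorname{vec}(X),
\]
which implies
\[
A^\top t = \operatorname{vec}^{-1}\left(\begin{bmatrix} | & | & & | \\ \operatorname{vec}(A_1) & \operatorname{vec}(A_2) & \cdots & \operatorname{vec}(A_m) \\ | & | & & | \end{bmatrix} t\right) = \sum_{k=1}^m t_k A_k.
\]
Therefore we have
\[
\log\phi_Y(t) = -\frac{N+1}{2} \sum_{j=1}^N \log\left(1 - \frac{2i}{N+1} \lambda_j(Z(t))\right).
\]
Note that since $\|t\|_2 \le \frac{1}{2\theta}$ by assumption, we have $\lambda_{\max}(|Z(t)|) \le \frac{N+1}{4}$ by Lemma 7.3. Now recall the Taylor approximation $\log(1+\xi) \approx \sum_{i=1}^n (-1)^{i-1} \xi^i / i$. Using the error theorem for Taylor's approximation, for every $\xi \in \mathbb{C}$ with $|\xi| \le \frac{1}{2}$ there exists $\tilde\xi$ with $|\tilde\xi| \le \frac{1}{2}$ such that
\[
\log(1+\xi) - \xi + \frac{\xi^2}{2} - \frac{\xi^3}{3} = \frac{\xi^4}{4(1+\tilde\xi)},
\]
which implies
\[
\log(1+\xi) = \xi - \frac{\xi^2}{2} + \frac{\xi^3}{3} + z_0 \cdot \xi^4
\]
for some $|z_0| \le \frac{1}{2}$. Therefore for any fixed $j$, we have
\begin{align*}
\log\left(1 - \frac{2i}{N+1} \lambda_j(Z(t))\right) &= -\frac{2i}{N+1} \lambda_j(Z(t)) + \frac{2}{(N+1)^2} \lambda_j^2(Z(t)) \\
&\quad + \frac{8i}{3(N+1)^3} \lambda_j^3(Z(t)) + \hat g_j(t) \cdot \frac{16}{(N+1)^4} \lambda_j^4(Z(t)),
\end{align*}
where $|\hat g_j(t)| \le \frac{1}{2}$. Since $\sum_{j=1}^N \lambda_j^p(Z(t)) = \operatorname{tr}(Z^p(t))$, we have
\begin{align*}
\log\phi_Y(t) &= i \cdot \operatorname{tr}(Z(t)) - \frac{1}{N+1} \cdot \operatorname{tr}(Z^2(t)) \\
&\quad - \frac{4i}{3(N+1)^2} \cdot \operatorname{tr}(Z^3(t)) + \hat g(t) \cdot \frac{8}{(N+1)^3} \cdot \operatorname{tr}(Z^4(t)),
\end{align*}
where $|\hat g(t)| \le \frac{1}{2}$. Note that this relies on the fact that $\lambda_j^4(Z(t)) \ge 0$ for all $j$. The fact that
\[
\operatorname{tr}(Z(t)) = \sum_{k=1}^m t_k \operatorname{tr}\left(\sqrt{P^\star} A_k \sqrt{P^\star}\right) = \sum_{k=1}^m t_k b_k = \langle b, t\rangle
\]
then implies the result.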
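The remainder estimate $|z_0| \le \frac{1}{2}$ used in the proof can be checked numerically for the purely imaginary arguments $\xi = -\frac{2i}{N+1}\lambda_j(Z(t))$ that actually occur, which satisfy $|\xi| \le \frac{1}{2}$. The sketch below is a simple grid check with hypothetical helper names, not part of the argument itself.

```python
import cmath


def remainder_ratio(s: float) -> float:
    """|log(1+xi) - (xi - xi^2/2 + xi^3/3)| / |xi|^4 for xi = -i*s (principal branch)."""
    xi = complex(0.0, -s)
    cubic = xi - xi ** 2 / 2 + xi ** 3 / 3
    return abs(cmath.log(1 + xi) - cubic) / abs(xi) ** 4


def worst_ratio(steps: int = 200) -> float:
    """Largest remainder ratio over a grid of s with 0 < |s| <= 1/2."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max(max(remainder_ratio(s), remainder_ratio(-s)) for s in grid)


if __name__ == "__main__":
    print(worst_ratio())
```

On this grid the ratio stays well below $\frac{1}{2}$, consistent with the bound on $|z_0|$.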


Corollary 7.10. For real $t \in \mathbb{R}^m$ such that $\|t\|_2 \le \frac{1}{2\theta}$, the characteristic function of $Y$ can be bounded by
\[
|\phi_Y(t)| \le e^{-3q(t)/4}.
\]
Proof. By the previous lemma and Corollary 7.4, we have
\[
|\phi_Y(t)| = \exp\left(-q(t) + \Re[g(t)]\right)
\]
where $\Re[g(t)]$ denotes the real part of $g(t)$ and
\[
|g(t)| \le \frac{4}{(N+1)^3} \operatorname{tr}(Z^4(t)) \le \frac{4}{(N+1)^3} \left(\frac{N+1}{2}\, \theta \|t\|_2\right)^2 \operatorname{tr}(Z^2(t)) \le \frac{1}{4(N+1)} \operatorname{tr}(Z^2(t)) = \frac{q(t)}{4}.
\]
This gives the desired bound.

7.2.1 For large q(t)


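Defining $\sigma := 4m + 10\log\frac{1}{\epsilon}$ as above, we consider the subcase where $q(t) > \sigma$ and $\|t\|_2 \le \frac{1}{2\theta}$. By Corollary 7.10 and part (1) of Lemma 7.6, we have
\begin{align*}
\left|\int_{\substack{\|t\|_2 \le \frac{1}{2\theta} \\ q(t) > \sigma}} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt\right|
&\le \int_{3q(t)/4 > 3\sigma/4} e^{-3q(t)/4}\,dt \\
&\le e^{-3\sigma/8} \int_{\mathbb{R}^m} e^{-3q(t)/4}\,dt \\
&\le e^{-3m/2 - \frac{15}{4}\log(1/\epsilon)} \int_{\mathbb{R}^m} e^{-3q(t)/4}\,dt \\
&\le e^{-3m/2} \epsilon^3 \int_{\mathbb{R}^m} e^{-3q(t)/4}\,dt.
\end{align*}
Since $q$ is a quadratic form, we further have
\[
\int_{\mathbb{R}^m} e^{-3q(t)/4}\,dt = \int_{\mathbb{R}^m} e^{-q\left(t\sqrt{3/4}\right)}\,dt = \left(\frac{4}{3}\right)^{m/2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]
Since $e^{-3m/2} \left(\frac{4}{3}\right)^{m/2} = \exp\left(-(m/2) \cdot \left(3 - \log\frac{4}{3}\right)\right) \le 1$, this implies
\[
\left|\int_{\substack{\|t\|_2 \le \frac{1}{2\theta} \\ q(t) > \sigma}} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt\right| \le \epsilon^3 \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]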

7.2.2 For small q(t)
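Defining $\sigma := 4m + 10\log\frac{1}{\epsilon}$ as above, we now consider the case where $q(t) \le \sigma$. We first show that this implies $\|t\|_2 \le \frac{1}{2\theta}$ for $\gamma \ge 40$, meaning that we may consider this as a subcase of the $\|t\|_2 \le \frac{1}{2\theta}$ case. Since $q(t) \le \sigma$, we have that $\|t\|_2^2 \le \frac{\sigma}{\lambda}$ by the assumption given in (A1). Then by the assumption given in (A3), we have
\[
\|t\|_2^2 \le \frac{\sigma}{\lambda} \le \frac{4m + 10\log(\epsilon^{-1})}{\gamma\theta^2\epsilon^{-2} m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1})} \le \frac{10\left(m + \log(\epsilon^{-1})\right)}{40\theta^2 \left(m + \log(\epsilon^{-1})\right)} \le \frac{1}{4\theta^2}.
\]
With this, Lemma 7.9 implies
\[
\left|\int_{q(t) \le \sigma} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt - \int_{q(t) \le \sigma} e^{-q(t)}\,dt\right| \le \int_{q(t) \le \sigma} e^{-q(t)} \left|e^{-if(t) + g(t)} - 1\right| dt.
\]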

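And by Corollary 7.4, we further have
\[
|g(t)| \le \frac{4}{(N+1)^3} \cdot \operatorname{tr}(Z^4(t)) \le \frac{4}{(N+1)^3} \left(\frac{N+1}{2}\, \theta \|t\|_2\right)^2 \operatorname{tr}(Z^2(t)) = \theta^2 \|t\|_2^2 \cdot q(t) \le \frac{\theta^2\sigma^2}{\lambda}.
\]
For $\gamma \ge 500$, our assumption on $\lambda$ given in (A3) then implies
\[
|g(t)| \le \frac{\theta^2\sigma^2}{\lambda} \le \frac{\epsilon^2 \left(4m + 10\log\epsilon^{-1}\right)^2}{\gamma m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1})} \le \frac{100\epsilon^2}{\gamma} \le \frac{\epsilon}{10},
\]
since $\epsilon \le \frac{1}{2}$. Now define
\[
T := \{t : q(t) \le \sigma\} \quad\text{and}\quad B := \left\{t : \|t\|_2 \le \frac{\epsilon}{10\sigma\theta}\right\}.
\]
Thus for $t \in T$, we have that
\[
\left|e^{-if(t) + g(t)} - 1\right| \le e^{|g(t)|} + 1 \le e^{\epsilon/10} + 1 \le 3.
\]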
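Note that for $\gamma \ge 30000$,
\[
\frac{\epsilon^2}{100\sigma^2\theta^2} \ge \frac{\gamma m \left(m + \log\epsilon^{-1}\right)^2 \log(N\epsilon^{-1})}{100\sigma^2\lambda} \ge \frac{\gamma m \log(N\epsilon^{-1})}{10^4 \lambda} \ge \frac{3m}{\lambda}
\]
holds. Hence, by Corollary 7.8, we have
\[
\int_{\mathbb{R}^m \setminus B} e^{-q(t)}\,dt \le e^{-\frac{\lambda\epsilon^2}{200\sigma^2\theta^2}} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]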

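For $\gamma \ge 100000$, our assumption on $\lambda$ given in (A3) then implies
\[
-\frac{\lambda\epsilon^2}{200\sigma^2\theta^2} \le -\frac{\gamma m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1})}{200 \left(4m + 10\log(\epsilon^{-1})\right)^2} \le -\frac{\gamma \log(N\epsilon^{-1})}{20000} \le \log(\epsilon^5 N^{-5}) \le \log\frac{\epsilon}{16},
\]
since $\epsilon \le \frac{1}{2}$. Combining the above two expressions gives
\[
\int_{\mathbb{R}^m \setminus B} e^{-q(t)}\,dt \le \frac{\epsilon}{16} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]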

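For $t \in T \cap B$, Lemma 7.9 and Corollary 7.4 then imply
\[
|f(t)| \le \frac{4}{3(N+1)^2} \cdot \operatorname{tr}(|Z^3(t)|) \le \frac{4}{3(N+1)^2} \cdot \frac{N+1}{2}\, \theta \|t\|_2 \cdot \operatorname{tr}(Z^2(t)) \le \frac{2\theta\|t\|_2}{3} \cdot q(t) \le \frac{\epsilon}{15}
\]
and
\[
|g(t)| \le \frac{4}{(N+1)^3} \cdot \operatorname{tr}(Z^4(t)) \le \theta^2 \|t\|_2^2 \cdot q(t) \le \left(\frac{\epsilon}{10\sigma}\right)^2 \cdot \sigma \le \frac{\epsilon^2}{100}.
\]
For $|x| < 1$, we have that $|\partial_x e^x| \le e$. Thus for $t \in T \cap B$, we have that
\[
\left|e^{-if(t) + g(t)} - 1\right| \le \left|e^0 - 1\right| + e \cdot |-if(t) + g(t)| \le 3\left(\frac{\epsilon}{15} + \frac{\epsilon^2}{100}\right) \le \frac{\epsilon}{3},
\]
since $\epsilon \le \frac{1}{2}$. Combining this with the above expression gives
\begin{align*}
\left|\int_{q(t) \le \sigma} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt - \int_{q(t) \le \sigma} e^{-q(t)}\,dt\right|
&\le \int_{q(t) \le \sigma} e^{-q(t)} \left|e^{-if(t) + g(t)} - 1\right| dt \\
&= \int_{T \cap (\mathbb{R}^m \setminus B)} e^{-q(t)} \left|e^{-if(t) + g(t)} - 1\right| dt + \int_{T \cap B} e^{-q(t)} \left|e^{-if(t) + g(t)} - 1\right| dt \\
&\le \frac{3\epsilon}{16} \int_{\mathbb{R}^m} e^{-q(t)}\,dt + \frac{\epsilon}{3} \int_{T \cap B} e^{-q(t)}\,dt \\
&\le \frac{2\epsilon}{3} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\end{align*}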

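By part (1) of Lemma 7.6, we also have
\[
\int_{q(t) \ge \sigma} e^{-q(t)}\,dt \le e^{-\sigma/2} \int_{\mathbb{R}^m} e^{-q(t)}\,dt = e^{-2m} \epsilon^5 \int_{\mathbb{R}^m} e^{-q(t)}\,dt \le \epsilon^5 \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]
Combining these gives
\[
\left|\int_{q(t) \le \sigma} e^{-i\langle b,t\rangle} \phi_Y(t)\,dt - \int_{\mathbb{R}^m} e^{-q(t)}\,dt\right| \le \left(\frac{2\epsilon}{3} + \epsilon^5\right) \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]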

7.3 The characteristic function for large t


Lemma 7.11. For all real $t \in \mathbb{R}^m$, the characteristic function of $Y$ can be bounded by
\[
|\phi_Y(t)| \le \left(1 + \theta^2 \|t\|_2^2\right)^{-\frac{\lambda}{\theta^2}}.
\]

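Proof. Using the expressions in the proof of Lemma 7.9 above, we have
\[
|\phi_Y(t)| = \left|\phi_Y(t)^2\right|^{\frac{1}{2}} = \left|\det\left(I - \frac{2i}{N+1} Z(t)\right)^{N+1}\right|^{-\frac{1}{2}} = \left|\det\left(\left(I - \frac{2i}{N+1} Z(t)\right)^2\right)\right|^{-\frac{N+1}{4}}.
\]
Note here that the absolute value takes care of the fact that we may need to take a square root of a complex number in the computation of $\phi_Y(t)$. With this, we then further compute
\[
|\phi_Y(t)| = \left|\det\left(\left(I - \frac{2i}{N+1} Z(t)\right)^2\right)\right|^{-\frac{N+1}{4}} = \det\left(I + \frac{4}{(N+1)^2} Z^2(t)\right)^{-\frac{N+1}{4}}.
\]
Now, denoting
\[
\xi_j := \frac{4}{(N+1)^2} \lambda_j^2(Z(t)),
\]
we have
\[
\sum_{j=1}^N \xi_j = \frac{4}{N+1}\, q(t) \ge \frac{4\lambda}{N+1} \|t\|_2^2
\]
by the assumption given in (A1). By Lemma 7.3,
\[
\frac{4}{(N+1)^2} \lambda_{\max}^2(Z(t)) \le \frac{4}{(N+1)^2} \left(\frac{N+1}{2}\, \theta \|t\|_2\right)^2 = \theta^2 \|t\|_2^2.
\]
We now want to minimize $\prod_{j=1}^N (1 + \xi_j)$, a log-concave function, over the polytope given by
\[
\sum_{j=1}^N \xi_j \ge \frac{4\lambda}{N+1} \|t\|_2^2 \quad\text{and}\quad 0 \le \xi_j \le \theta^2 \|t\|_2^2.
\]
By log-concavity, the minimum must occur at an extreme point of this polytope, thus at least $N$ inequalities above must actually be equalities. In particular, this means that all but at most one value of $\xi_j$ is equal to $0$ or $\theta^2 \|t\|_2^2$. Given such a minimum, if $N_0$ is the number of values of $\xi_j$ that equal $\theta^2 \|t\|_2^2$, then $N_0 \ge \left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor$. If $N_0 \ge \left\lceil \frac{4\lambda}{\theta^2 (N+1)} \right\rceil$, then we have
\[
\prod_{j=1}^N (1 + \xi_j) \ge \prod_{j=1}^{N_0} \left(1 + \theta^2 \|t\|_2^2\right) \ge \left(1 + \theta^2 \|t\|_2^2\right)^{\frac{4\lambda}{\theta^2 (N+1)}}.
\]
Otherwise $N_0 = \left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor$. The one potentially non-extreme value of $\xi_j$, call it $\xi_{j_0}$, is then such that
\[
\frac{4\lambda}{N+1} \|t\|_2^2 \le \sum_{j=1}^N \xi_j = \left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor \cdot \theta^2 \|t\|_2^2 + \xi_{j_0},
\]
which implies $\frac{\xi_{j_0}}{\theta^2 \|t\|_2^2} \ge \frac{4\lambda}{\theta^2 (N+1)} - \left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor =: \alpha_0$. Thus we have
\[
\prod_{j=1}^N (1 + \xi_j) \ge \left(1 + \theta^2 \|t\|_2^2\right)^{\left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor} \left(1 + \alpha_0\, \theta^2 \|t\|_2^2\right).
\]
Now for $\alpha \in [0,1]$ and $r > 0$, the function
\[
f(\alpha) := 1 + \alpha r - (1 + r)^\alpha
\]
is concave, and $f(0) = f(1) = 0$. Therefore $f(\alpha) \ge 0$ for $\alpha \in [0,1]$, which in turn implies
\[
\prod_{j=1}^N (1 + \xi_j) \ge \left(1 + \theta^2 \|t\|_2^2\right)^{\left\lfloor \frac{4\lambda}{\theta^2 (N+1)} \right\rfloor + \alpha_0} = \left(1 + \theta^2 \|t\|_2^2\right)^{\frac{4\lambda}{\theta^2 (N+1)}}.
\]
So in any case, the above inequality holds. Rearranging then finally gives
\[
|\phi_Y(t)| \le \left(\left(1 + \theta^2 \|t\|_2^2\right)^{\frac{4\lambda}{\theta^2 (N+1)}}\right)^{-\frac{N+1}{4}} = \left(1 + \theta^2 \|t\|_2^2\right)^{-\frac{\lambda}{\theta^2}}.
\]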
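The only analytic ingredient in the last step is the Bernoulli-type inequality $1 + \alpha r \ge (1+r)^\alpha$ for $\alpha \in [0,1]$ and $r > 0$. The sketch below checks it on a grid; the helper names `bernoulli_gap` and `min_gap` are ours, and a grid check is of course only an illustration, not a proof.

```python
def bernoulli_gap(alpha: float, r: float) -> float:
    """f(alpha) = 1 + alpha*r - (1+r)**alpha, claimed non-negative on [0, 1]."""
    return 1.0 + alpha * r - (1.0 + r) ** alpha


def min_gap(r_values, steps: int = 200) -> float:
    """Smallest value of f over a uniform grid of alpha in [0, 1] and the given r."""
    return min(bernoulli_gap(i / steps, r)
               for r in r_values
               for i in range(steps + 1))


if __name__ == "__main__":
    print(min_gap([0.01, 0.5, 1.0, 10.0, 1000.0]))
```

The minimum is attained at the endpoints $\alpha \in \{0, 1\}$, where the gap vanishes, matching $f(0) = f(1) = 0$.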

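Proposition 7.12. The integral of $\phi_Y(t)$ over the region where $\|t\|_2 \ge \frac{1}{2\theta}$ can be bounded by
\[
\int_{t:\ \|t\|_2 \ge \frac{1}{2\theta}} |\phi_Y(t)|\,dt \le \frac{\epsilon\, ((N+1)\pi)^{m/2}}{100 \sqrt{\det(BB^\top)}} = \frac{\epsilon}{100} \int_{\mathbb{R}^m} e^{-q(t)}\,dt.
\]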

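Proof. By Lemma 7.7 and Lemma 7.11 we have
\begin{align*}
\int_{t:\ \|t\|_2 \ge 1/2\theta} |\phi_Y(t)|\,dt &\le \int_{t:\ \|t\|_2 \ge 1/2\theta} \left(1 + \theta^2 \|t\|_2^2\right)^{-\lambda/\theta^2} dt \\
&\le \frac{2\pi^{m/2}}{\Gamma\!\left(\frac{m}{2}\right) \cdot \theta^m \left(\frac{2\lambda}{\theta^2} - m\right)} \left(\frac{5}{4}\right)^{\frac{m}{2} - \frac{\lambda}{\theta^2}} \\
&= \frac{1}{\Gamma\!\left(\frac{m}{2}\right) \cdot \left(\frac{\lambda}{\theta^2} - \frac{m}{2}\right)} \left(\frac{5\pi}{4\theta^2}\right)^{m/2} \left(\frac{5}{4}\right)^{-\frac{\lambda}{\theta^2}}.
\end{align*}
By Lemma 7.5 and Lemma 7.3 we also have
\begin{align*}
\lambda_{\max}(BB^\top) &= \sup_{\|t\|_2 = 1} t^\top BB^\top t = \sup_{\|t\|_2 = 1} (N+1)\, q(t) = \sup_{\|t\|_2 = 1} \sum_{j=1}^N \lambda_j^2(Z(t)) \\
&\le \sup_{\|t\|_2 = 1} \sum_{j=1}^N \left(\frac{N+1}{2}\, \theta \|t\|_2\right)^2 = \frac{N(N+1)^2}{4}\, \theta^2,
\end{align*}
which implies
\[
\frac{\det(BB^\top)}{(N+1)^m} \le \left(\frac{N(N+1)^2\theta^2}{4(N+1)}\right)^m \le \left(\frac{N^2\theta^2}{2}\right)^m.
\]
Now recall from the assumption given in (A3) that
\[
\frac{\lambda}{\theta^2} \ge \gamma\epsilon^{-2} m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1}).
\]
Since $\epsilon \le \frac{1}{2}$, choosing $\gamma \ge 2$ implies
\begin{align*}
\left(\frac{5}{4}\right)^{\frac{\lambda}{\theta^2}} &\ge \left(\frac{N}{\epsilon}\right)^{\gamma\epsilon^{-2} m \left(m + \log(\epsilon^{-1})\right)^2 \log\left(\frac{5}{4}\right)} \\
&\ge (2N)^m = \left(\frac{8}{\theta^2}\right)^{m/2} \left(\frac{N^2\theta^2}{2}\right)^{m/2} \\
&\ge \left(\frac{8}{\theta^2}\right)^{m/2} \left(\frac{\det(BB^\top)}{(N+1)^m}\right)^{1/2}.
\end{align*}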
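For $\gamma \ge 200$, we then further have
\[
\frac{\lambda}{\theta^2} - \frac{m}{2} \ge \frac{m}{2\epsilon^2} \left(2\gamma\log(2) - \epsilon^2\right) \ge \frac{100m}{\epsilon^2} \ge \frac{200m}{\epsilon},
\]
which implies
\[
\Gamma\!\left(\frac{m}{2}\right) \cdot \left(\frac{\lambda}{\theta^2} - \frac{m}{2}\right) \ge \Gamma\!\left(\frac{m}{2}\right) \cdot \frac{200m}{\epsilon} \ge \frac{100m}{\epsilon},
\]
since $\Gamma(x) \ge \frac{1}{2}$ for $x > 0$. Combining everything then gives
\begin{align*}
\int_{t:\ \|t\|_2 \ge 1/2\theta} |\phi_Y(t)|\,dt &\le \frac{1}{\Gamma\!\left(\frac{m}{2}\right) \cdot \left(\frac{\lambda}{\theta^2} - \frac{m}{2}\right)} \left(\frac{5\pi}{4\theta^2}\right)^{m/2} \left(\frac{5}{4}\right)^{-\lambda/\theta^2} \\
&\le \frac{\epsilon}{100m} \left(\frac{5\pi}{4\theta^2}\right)^{m/2} \left(\frac{\theta^2}{8}\right)^{m/2} \left(\frac{(N+1)^m}{\det(BB^\top)}\right)^{1/2} \\
&\le \frac{\epsilon}{100} \left(\frac{5}{4 \cdot 8}\right)^{m/2} \left(\frac{(\pi(N+1))^m}{\det(BB^\top)}\right)^{1/2} \\
&\le \frac{\epsilon}{100} \left(\frac{(\pi(N+1))^m}{\det(BB^\top)}\right)^{1/2}.
\end{align*}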

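This finishes the proof of the first claim. For the second claim, suppose that $\mu_1, \mu_2, \ldots, \mu_m \ge 0$ are the eigenvalues of $BB^\top$. Then, by diagonalizing $BB^\top$ with an orthogonal matrix,
\begin{align*}
\int_{\mathbb{R}^m} e^{-q(t)}\,dt &= \int_{\mathbb{R}^m} e^{-\left(\sum_{i=1}^m \mu_i t_i^2\right)/(N+1)}\,dt \\
&= \prod_{i=1}^m \int_{\mathbb{R}} e^{-\mu_i t^2/(N+1)}\,dt \\
&= \prod_{i=1}^m \frac{\sqrt{\pi(N+1)}}{\sqrt{\mu_i}} \\
&= \frac{(\pi(N+1))^{m/2}}{\det(BB^\top)^{1/2}}
\end{align*}
holds. This finishes the proof.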
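The one-dimensional identity $\int_{\mathbb{R}} e^{-\mu t^2/(N+1)}\,dt = \sqrt{\pi(N+1)/\mu}$ underlying the product formula can be confirmed numerically. The sketch below uses a plain trapezoid rule on a truncated window; the function names and the window width are our own illustrative choices.

```python
import math


def gaussian_integral_numeric(mu: float, big_n: int, points: int = 20001) -> float:
    """Trapezoid rule for the integral of exp(-mu * t^2 / (N+1)) over the real line."""
    scale = math.sqrt((big_n + 1) / mu)
    half_width = 12.0 * scale  # the integrand is at most e^-144 outside this window
    step = 2.0 * half_width / (points - 1)
    total = 0.0
    for i in range(points):
        t = -half_width + i * step
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * math.exp(-mu * t * t / (big_n + 1))
    return total * step


def gaussian_integral_closed(mu: float, big_n: int) -> float:
    """Closed form sqrt(pi * (N+1) / mu) used in the proof."""
    return math.sqrt(math.pi * (big_n + 1) / mu)


if __name__ == "__main__":
    for mu, big_n in [(0.5, 3), (2.0, 10), (7.0, 100)]:
        print(gaussian_integral_numeric(mu, big_n), gaussian_integral_closed(mu, big_n))
```

Note that the closed form requires $\mu > 0$, which holds for every eigenvalue of $BB^\top$ when the constraints are linearly independent.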

8 Proofs of the Main Results


In this section, we prove the main results of Section 2 as corollaries of Theorem 7.1. We first recall
the spectrahedron notation given in Section 2, and then we recall the conditions of Theorem 7.1
given in Section 7.
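Given $A_1, A_2, \ldots, A_m \in \mathrm{Sym}(N)$ and $b \in \mathbb{R}^m$, we define a spectrahedron $S$ by:
\[
S := \left\{P \in \mathrm{PSD}(N) : \operatorname{tr}(A_k P) = b_k \text{ for } k \in [m]\right\}.
\]
We assume that $S$ is compact, that the constraints $\operatorname{tr}(A_k P) = b_k$ are linearly independent, that $m < \binom{N+1}{2} = \dim(\mathrm{PSD}(N))$, and that $S$ is of dimension exactly $\binom{N+1}{2} - m$. Let $P^\star \in S$ be the point which maximizes the function
\[
\phi(P) = \log \Gamma_N\!\left(\frac{N+1}{2}\right) - \frac{N(N+1)}{2} \log\!\left(\frac{N+1}{2e}\right) + \frac{N+1}{2} \log\det(P)
= \mathrm{const}(N) + \frac{N+1}{2} \log\det(P)
\]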
over S. We let A and B be linear operators from Sym(N ) to Rm , defined via

AX := (tr(A1 X), . . . , tr(Am X))

and
BX := (tr(Z1 X), . . . , tr(Zm X)).
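where $Z_k := \sqrt{P^\star} A_k \sqrt{P^\star}$ for all $k \in [m]$.
In the statement of Theorem 7.1, we consider the quadratic form $q : \mathbb{R}^m \to \mathbb{R}$ defined by
\[
q(t) := \frac{1}{N+1} \operatorname{tr}\left[\left(\sum_{k=1}^m t_k Z_k\right)^2\right].
\]
We suppose that for some $\lambda > 0$ we have
\[
q(t) \ge \lambda \|t\|_2^2 \quad \text{for all } t \in \mathbb{R}^m, \tag{A1}
\]
and that for some $\theta > 0$ we have
\[
\frac{2}{N+1} \left\|(x^\top Z_k x)_{k=1}^m\right\|_2 \le \theta \quad \text{for all } \|x\|_2 = 1. \tag{A2}
\]
Given $0 < \epsilon \le \frac{1}{2}$, we further suppose that
\[
\lambda \ge \gamma\theta^2\epsilon^{-2} m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1}), \tag{A3}
\]
where $\gamma = 10^5$ is an absolute constant. Under these conditions, Theorem 7.1 gives an approximate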
volume formula.

8.1 Simplifying Theorem 7.1


We now demonstrate a way to simplify the above conditions of Theorem 7.1, which we will use in
the proofs of the main results.
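By Lemma 7.5, we have that $q(t) = \frac{1}{N+1}\, t^\top BB^\top t$, and so $\lambda$ in Condition (A1) can be optimally chosen to be the minimal eigenvalue of $\frac{BB^\top}{N+1}$. Since $B$ is a linear map taking matrices as input, we may write
\[
(x^\top Z_k x)_{k=1}^m = \left(\operatorname{tr}\left(\sqrt{P^\star} A_k \sqrt{P^\star}\, xx^\top\right)\right)_{k=1}^m = B(xx^\top).
\]
Since $\|x\|_2 = 1$ if and only if $\|xx^\top\|_F = 1$, where $\|X\|_F = \sqrt{\operatorname{tr}(X^\top X)}$ denotes the entrywise 2-norm of $X$, we can choose $\theta$ to be the maximal eigenvalue of $\frac{2\sqrt{BB^\top}}{N+1}$ (though this is non-optimal). With this, we can replace Condition (A3) above by
\[
\frac{\lambda_{\min}(BB^\top)}{\lambda_{\max}(BB^\top)} \ge \frac{4\gamma}{\epsilon^2 (N+1)}\, m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1}),
\]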
where λmin and λmax refer to the minimum and maximum eigenvalues respectively.
That is, we can replace Condition (A3) by a bound on the condition number of $BB^\top$. What is special about this is the fact that one can always change $A_k$ and $b_k$ defined above to enforce the condition number of $BB^\top$ to be 1, without changing the spectrahedron or the value of $P^\star$. We prove this formally in Lemma 8.1 below. Thus we can further replace Condition (A3) above by
\[
\epsilon^2 (N+1) \ge 4\gamma m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1}). \tag{A3$'$}
\]
That is, to obtain the volume approximation for a given $\epsilon$, it is enough to achieve the above bound comparing the size $N$ of the matrices under consideration to the number $m$ of linear constraints on the spectrahedron.
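Condition (A3$'$) is a purely numerical test on $(N, m, \epsilon)$, so it is easy to evaluate directly. The following sketch assumes $\gamma = 10^5$ as in Theorem 7.1 and natural logarithms; the function names `a3_prime_holds` and `smallest_power_of_two` are hypothetical helpers introduced only for illustration.

```python
import math

GAMMA = 10 ** 5  # absolute constant from Theorem 7.1


def a3_prime_holds(big_n: int, m: int, eps: float) -> bool:
    """Check eps^2 (N+1) >= 4*gamma*m*(m + log(1/eps))^2 * log(N/eps)."""
    lhs = eps ** 2 * (big_n + 1)
    rhs = 4 * GAMMA * m * (m + math.log(1 / eps)) ** 2 * math.log(big_n / eps)
    return lhs >= rhs


def smallest_power_of_two(m: int, eps: float, max_exp: int = 200):
    """Smallest exponent e <= max_exp with (A3') true for N = 2^e, or None."""
    for e in range(1, max_exp + 1):
        if a3_prime_holds(2 ** e, m, eps):
            return e
    return None


if __name__ == "__main__":
    print(smallest_power_of_two(m=1, eps=0.5))
    print(smallest_power_of_two(m=10, eps=0.1))
```

Because of the large constant $\gamma$, the condition is only met for very large $N$ relative to $m$, which reflects the asymptotic nature of the result.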
We now prove formally that we can assume the condition number of BB ⊤ to be 1.

Lemma 8.1. Let S be a spectrahedron as defined above, so that

S := {P ∈ PSD(N ) : tr(Ak P ) = bk , k ∈ [m]} ,

where Ak are linearly independent for k ∈ [m]. There exists a choice of A′k and b′k which defines
the same spectrahedron S, so that B ′ (B ′ )⊤ = Im (where B ′ is defined analogously with respect to A′
as B is to A). Further, the matrix P ⋆ and the volume approximation formula for S are unchanged
by replacing Ak and bk by A′k and b′k respectively.

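Proof. First note that since the entropy function $\phi$ given above is not dependent on the representation of $S$, we have that the optimizer $P^\star$ is also not dependent on the representation of $S$. Now, $S$ can be defined as the set of all positive semi-definite matrices satisfying the linear system given by
\[
b = AX = \begin{bmatrix} | & | & & | \\ \operatorname{vec}(A_1) & \operatorname{vec}(A_2) & \cdots & \operatorname{vec}(A_m) \\ | & | & & | \end{bmatrix}^\top \operatorname{vec}(X).
\]
Using the above notation, we defined $Z_k := \sqrt{P^\star} A_k \sqrt{P^\star}$. Considering the invertible linear map $\mathcal{L}_{P^\star} : A \mapsto \sqrt{P^\star} A \sqrt{P^\star}$ on real symmetric matrices, we can write
\[
Z_k = \sqrt{P^\star} A_k \sqrt{P^\star} = \operatorname{vec}^{-1}\left(L_{P^\star} \cdot \operatorname{vec}(A_k)\right)
\]
for some $N^2 \times N^2$ invertible matrix $L_{P^\star}$ which represents the linear map $\mathcal{L}_{P^\star}$. With this, we can write
\[
BX = \begin{bmatrix} | & | & & | \\ \operatorname{vec}(Z_1) & \operatorname{vec}(Z_2) & \cdots & \operatorname{vec}(Z_m) \\ | & | & & | \end{bmatrix}^\top \operatorname{vec}(X) = \left(A \cdot L_{P^\star}^\top\right) \operatorname{vec}(X).
\]
Now let $B = U \Sigma V^\top$ be the real singular value decomposition of $B$, where $\Sigma$ is an $m \times N^2$ rectangular diagonal matrix with non-negative entries and $U$ and $V$ are real orthogonal matrices of appropriate size. Further, since $m < N^2$ and the matrices $A_k$ are linearly independent for $k \in [m]$, we have that
\[
\Sigma = \begin{bmatrix} D & 0_{N^2 - m} \end{bmatrix},
\]
where $D$ is an $m \times m$ diagonal matrix with strictly positive entries. Now define $C := D^{-1} U^{-1}$ and
\[
A' := CA = CB \left(L_{P^\star}^\top\right)^{-1} \quad\text{and}\quad b' = Cb.
\]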

Note that since $A$ is a matrix whose rows are vectorizations of real symmetric matrices, we have that $A'$ is also a matrix whose rows are vectorizations of real symmetric matrices. Therefore $A'$ is the linear system corresponding to another spectrahedron given by
\[
S' := \left\{P \in \mathrm{PSD}(N) : \operatorname{tr}(A'_k P) = b'_k,\ k \in [m]\right\},
\]
where the matrices $A'_k$ correspond to the rows of $A'$. Since $C$ is invertible, we have in fact that $S' = S$, and therefore
\[
B' = A' \cdot L_{P^\star}^\top = CB \left(L_{P^\star}^\top\right)^{-1} L_{P^\star}^\top = CB = D^{-1} U^{-1} U \Sigma V^\top = \begin{bmatrix} I_m & 0_{N^2 - m} \end{bmatrix} V^\top,
\]
where $B'$ is defined analogously with respect to $A'$ as $B$ is to $A$. Thus $B'(B')^\top = I_m$. Finally, since
\[
\frac{\det(A'(A')^\top)}{\det(B'(B')^\top)} = \frac{\det(CAA^\top C^\top)}{\det(CBB^\top C^\top)} = \frac{\det(AA^\top)}{\det(BB^\top)},
\]
we have that the volume approximation formula remains unchanged by replacing $A_k$ and $b_k$ by $A'_k$ and $b'_k$ respectively.
With this lemma in hand, we can now prove the main results.

8.2 Proof of Theorem 2.1: Main approximation result


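We want to apply Theorem 7.1, where we replace Condition (A3) by Condition (A3$'$) via the discussion of Section 8.1. To this end, we first define
\[
\delta := 32\gamma \cdot \frac{m^3 \log N}{N},
\]
where $\gamma = 10^5$ as in Theorem 7.1. Recall the assumptions of Theorem 2.1: $\epsilon \le e^{-1}$ and Condition (1), which states
\[
\frac{\epsilon^2}{\log^3(\epsilon^{-1})} \ge 32\gamma \cdot \frac{m^3 \log N}{N}.
\]
Note that the extra factor of 32 here is due to the difference in the value of the constant $\gamma$. Thus we have
\[
\epsilon^2 \ge \delta \cdot \log^3(\epsilon^{-1}) \ge \frac{\delta}{8} \left(1 + \log(\epsilon^{-1})\right)^3 \ge \frac{\delta}{8} \left(1 + \frac{\log(\epsilon^{-1})}{m}\right)^2 \left(1 + \frac{\log(\epsilon^{-1})}{\log N}\right).
\]
Therefore
\[
\epsilon^2 \ge \frac{4\gamma m \left(m + \log(\epsilon^{-1})\right)^2 \left(\log N + \log(\epsilon^{-1})\right)}{N} \ge \frac{4\gamma m \left(m + \log(\epsilon^{-1})\right)^2 \log(N\epsilon^{-1})}{N+1},
\]
which is precisely Condition (A3$'$). This completes the proof.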
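The implication from Condition (1) to Condition (A3$'$) can be spot-checked numerically. The sketch below searches a small parameter grid for counterexamples; it assumes $\gamma = 10^5$, natural logarithms, and $\epsilon \le e^{-1}$, and the function names are ours. An empty result on a finite grid is only a sanity check, not a substitute for the argument above.

```python
import math

GAMMA = 10 ** 5  # absolute constant from Theorem 7.1


def condition_1(big_n: int, m: int, eps: float) -> bool:
    """Condition (1): eps^2 / log^3(1/eps) >= 32*gamma*m^3*log(N)/N."""
    return eps ** 2 / math.log(1 / eps) ** 3 >= 32 * GAMMA * m ** 3 * math.log(big_n) / big_n


def condition_a3_prime(big_n: int, m: int, eps: float) -> bool:
    """Condition (A3'): eps^2 (N+1) >= 4*gamma*m*(m + log(1/eps))^2 * log(N/eps)."""
    rhs = 4 * GAMMA * m * (m + math.log(1 / eps)) ** 2 * math.log(big_n / eps)
    return eps ** 2 * (big_n + 1) >= rhs


def implication_counterexamples(sizes, ms, epsilons):
    """Grid points where Condition (1) holds but (A3') fails (expected: none)."""
    return [(n, m, e) for n in sizes for m in ms for e in epsilons
            if condition_1(n, m, e) and not condition_a3_prime(n, m, e)]


if __name__ == "__main__":
    sizes = [10 ** 6, 10 ** 9, 10 ** 12, 10 ** 15, 10 ** 18]
    print(implication_counterexamples(sizes, [1, 2, 5, 20, 100],
                                      [math.exp(-1), 0.2, 0.05, 0.01]))
```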

8.3 Proof of Theorem 2.2: Main asymptotic result
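Recall the assumptions of Theorem 2.2: $\epsilon < e^{-1}$ and Condition (2), which states
\[
\lim_{n\to\infty} \frac{m_n^3 \log N_n}{N_n} = 0.
\]
Now recall that $m_n$ and $N_n$ are positive integers. By the assumption that $m_n < N_n^2$, this implies $N_n \ge 2$ and
\[
\lim_{n\to\infty} \frac{4\gamma m_n \left(m_n + \log(\epsilon^{-1})\right)^2 \log(N_n\epsilon^{-1})}{\epsilon^2 (N_n + 1)} \le \frac{8\gamma \left(1 + \log(\epsilon^{-1})\right)^3}{\epsilon^2} \lim_{n\to\infty} \frac{m_n^3 \log N_n}{N_n} = 0.
\]
That is, there exists $n_\epsilon$ such that for all $n \ge n_\epsilon$ we have that Condition (A3$'$) in Section 8.1 is satisfied for $S_n$. Therefore by Theorem 2.1 and Section 8.1, we have
\[
\left|\frac{\operatorname{vol}(S_n)}{\left(\frac{N_n+1}{4\pi}\right)^{m_n/2} \left(\frac{\det(A_n A_n^\top)}{\det(B_n B_n^\top)}\right)^{1/2} e^{\phi_n(P_n^\star)}} - 1\right| \le \epsilon
\]
for all $n \ge n_\epsilon$. This completes the proof.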

8.4 Proof of Corollary 2.5: Central sections of the spectraplex


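Recall in Corollary 2.5 that we define
\[
S_1 := \{P \in \mathrm{PSD}(N) : \operatorname{tr}(P) = 1\}
\]
as the standard spectraplex, and for $M \in \mathrm{Sym}(N)$ that is linearly independent of $I_N$ we define
\[
S_M := \left\{P \in \mathrm{PSD}(N) : \operatorname{tr}(P) = 1 \text{ and } \operatorname{tr}(PM) = \frac{1}{N} \operatorname{tr}(M)\right\},
\]
which we call a central section of $S_1$.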


Lemma 8.2. For any $M \in \mathrm{Sym}(N)$ that is not a scalar multiple of the identity matrix, $\frac{1}{N} I_N$ maximizes $\phi$ on $S_1$ and on $S_M$.

Proof. The gradient of $\phi$ satisfies $\nabla\phi(P) = \frac{N+1}{2} P^{-1}$. Since $S_1$ is orthogonal to the line spanned by $\frac{1}{N} I_N$, we have that a matrix $P^\star$ maximizes $\phi$ on $S_1$ if and only if $(P^\star)^{-1}$ (and thus $P^\star$) is a scalar multiple of $I_N$. This shows that $\frac{1}{N} I_N$ maximizes $\phi$ over $S_1$. Since $\frac{1}{N} I_N \in S_M \subset S_1$, it also maximizes $\phi$ over $S_M$.

With this, we prove Corollary 2.5 as follows. Let $\{S_{M_n}\}_{n=1}^\infty$ be any sequence of central sections of $S_1$, with $N_n \to \infty$. Thus we have
\[
\lim_{n\to\infty} \frac{m_n^3 \log N_n}{N_n} = \lim_{n\to\infty} \frac{8 \log N_n}{N_n} = 0.
\]
Thus for every $\epsilon \le e^{-1}$, there is some $n_\epsilon$ such that for $n \ge n_\epsilon$ we have
\[
\frac{\epsilon^2}{\log^3(\epsilon^{-1})} \ge 32\gamma \cdot \frac{m_n^3 \log N_n}{N_n},
\]
which is the condition required to apply Theorem 2.1. Applying Theorem 2.1 then completes the proof. For the asymptotic statement at the end of Corollary 2.5, one can also directly apply Theorem 2.2.
As a final comment, note that this result says: for large enough N , all central sections of the
spectraplex have volume close to the expected volume over all central sections. See [BKL19] for
further discussion on the volume of random spectrahedra.

Acknowledgements. The authors would like to thank Alexander Barvinok, Peter Bürgisser, and Akshay Ramachandran for very helpful discussions. The first author is supported by the ERC under the European Union's Horizon 2020 research and innovation programme (grant 787840). The second author was partially supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689). The third author would like to thank the Institute of Mathematical Sciences, Chennai, for hosting him in Spring 2022, when part of this work was done. He also gratefully acknowledges financial support from the Bogazici University Solidarity fund.

References
[Ali95] Farid Alizadeh. Interior point methods in semidefinite programming with applications to com-
binatorial optimization. SIAM Journal on Optimization, 5(1):13–51, 1995.
[AS17] Guillaume Aubrun and Stanislaw J Szarek. Alice and Bob meet Banach, volume 223. American
Mathematical Soc., 2017.
[BFG+ 18] Peter Bürgisser, Cole Franks, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson.
Efficient algorithms for tensor scaling, quantum marginals, and moment polytopes. In 2018
IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 883–897.
IEEE, 2018.
[BGO+ 18] Peter Bürgisser, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson. Alternating
minimization, scaling algorithms, and the null-cone problem from invariant theory. 2018.
[BH93] Ulrich Betke and Martin Henk. Approximating the volume of convex bodies. Discrete & Com-
putational Geometry, 10(1):15–21, 1993.
[BH09] Alexander Barvinok and JA Hartigan. Maximum entropy Edgeworth estimates of the number
of integer points in polytopes. arXiv preprint arXiv:0910.2497, 2009.
[BH10] Alexander Barvinok and JA Hartigan. Maximum entropy gaussian approximations for the num-
ber of integer points and volumes of polytopes. Advances in Applied Mathematics, 45(2):252–289,
2010.
[BH12] Alexander Barvinok and JA Hartigan. An asymptotic formula for the number of non-negative
integer matrices with prescribed row and column sums. Transactions of the American Mathe-
matical Society, 364(8):4323–4368, 2012.
[Bil95] Patrick Billingsley. Probability and Measure. Wiley series in probability and mathematical
statistics. Wiley, New York u.a., 3. ed. edition, 1995.
[BKL19] Paul Breiding, Khazhgali Kozhasov, and Antonio Lerario. Random spectrahedra. SIAM Journal
on Optimization, 29(4):2608–2624, 2019.
[BP14] David Benson-Putnins. Counting integer points in multi-index transportation polytopes. arXiv
preprint arXiv:1402.4715, 2014.
[BPT12] Grigoriy Blekherman, Pablo A Parrilo, and Rekha R Thomas. Semidefinite optimization and
convex algebraic geometry. SIAM, 2012.

[BR21] Alexander Barvinok and Mark Rudelson. A quick estimate for the volume of a polyhedron.
arXiv preprint arXiv:2112.06322, 2021.
[BR22] Alexander Barvinok and Mark Rudelson. When a system of real quadratic equations has a
solution. Advances in Mathematics, 403:108391, 2022.
[Brz13] Patryk Brzezinski. Volume estimates for sections of certain convex bodies. Mathematische
Nachrichten, 286(17-18):1726–1743, 2013.
[BV04] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press,
2004.
[CDWY18] Yuansi Chen, Raaz Dwivedi, Martin J Wainwright, and Bin Yu. Fast MCMC sampling algo-
rithms on polytopes. The Journal of Machine Learning Research, 19(1):2146–2231, 2018.
[CEF19] Apostolos Chalkis, Ioannis Z Emiris, and Vissarion Fisikopoulos. Practical volume estimation
by a new annealing schedule for cooling convex bodies. arXiv preprint arXiv:1905.05494, 2019.
[CF20] Apostolos Chalkis and Vissarion Fisikopoulos. volesti: Volume approximation and sampling for
convex polytopes in R. arXiv preprint arXiv:2007.01578, 2020.
[CFRT21] Apostolos Chalkis, Vissarion Fisikopoulos, Panagiotis Repouskos, and Elias Tsigaridas. Sam-
pling the feasible sets of SDPs and volume approximation. ACM Commun. Comput. Algebra,
54(3):114–118, mar 2021.
[Cho75] Man-Duen Choi. Completely positive linear maps on complex matrices. Linear algebra and its
applications, 10(3):285–290, 1975.
[CM07] E Rodney Canfield and Brendan D McKay. The asymptotic volume of the Birkhoff polytope.
arXiv preprint arXiv:0705.2422, 2007.
[Cou17] Benjamin Cousins. Efficient high-dimensional sampling and integration. PhD thesis, Georgia
Institute of Technology, 2017.
[CV16] Ben Cousins and Santosh Vempala. A practical volume algorithm. Mathematical Programming
Computation, 8(2):133–160, 2016.
[DF88] M. E. Dyer and A. M. Frieze. On the complexity of computing the volume of a polyhedron.
SIAM Journal on Computing, 17(5):967–974, 1988.
[DFK91] Martin Dyer, Alan Frieze, and Ravi Kannan. A random polynomial-time algorithm for approximating
the volume of convex bodies. J. ACM, 38(1):1–17, January 1991.
[EF14] Ioannis Z Emiris and Vissarion Fisikopoulos. Efficient random-walk methods for approximating
polytope volume. In Proceedings of the thirtieth annual symposium on Computational geometry,
pages 318–327, 2014.
[EF18] Ioannis Z Emiris and Vissarion Fisikopoulos. Practical polytope volume approximation. ACM
Transactions on Mathematical Software (TOMS), 44(4):1–21, 2018.
[Ele86] G. Elekes. A geometric inequality and the complexity of computing volume. Discrete & Computational
Geometry, 1(4):289–292, December 1986.
[Gül96] Osman Güler. Barrier functions in interior point methods. Mathematics of Operations Research,
21(4):860–885, 1996.
[Jay57a] Edwin T Jaynes. Information theory and statistical mechanics I. Physical review, 106(4):620,
1957.
[Jay57b] Edwin T Jaynes. Information theory and statistical mechanics II. Physical review, 108(2):171,
1957.
[KLS97] Ravi Kannan, László Lovász, and Miklós Simonovits. Random walks and an O*(n^5) volume
algorithm for convex bodies. Random Structures & Algorithms, 11(1):1–50, 1997.

[Kly02] Alexander Klyachko. Coherent states, entanglement, and geometric invariant theory. 2002.
[Kly04] Alexander Klyachko. Quantum marginal problem and representations of the symmetric group.
2004.
[LS93a] LJ Landau and RF Streater. On Birkhoff’s theorem for doubly stochastic completely positive
maps of matrix algebras. Linear algebra and its applications, 193:107–127, 1993.
[LS93b] László Lovász and Miklós Simonovits. Random walks in a convex body and an improved volume
algorithm. Random structures & algorithms, 4(4):359–412, 1993.
[LV06] László Lovász and Santosh Vempala. Simulated annealing in convex bodies and an O*(n^4)
volume algorithm. Journal of Computer and System Sciences, 72(2):392–417, 2006.
[LV18] Yin Tat Lee and Santosh S Vempala. Convergence rate of Riemannian Hamiltonian Monte Carlo
and faster polytope volume computation. In Proceedings of the 50th Annual ACM SIGACT
Symposium on Theory of Computing, pages 1115–1121, 2018.
[LV20] Jonathan Leake and Nisheeth K Vishnoi. On the computability of continuous maximum entropy
distributions with applications. In Proceedings of the 52nd Annual ACM SIGACT Symposium
on Theory of Computing, pages 930–943, 2020.
[MV19] Oren Mangoubi and Nisheeth K Vishnoi. Faster polytope rounding, sampling, and volume
computation via a sub-linear ball walk. In 2019 IEEE 60th Annual Symposium on Foundations
of Computer Science (FOCS), pages 1338–1357. IEEE, 2019.
[NN94] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming.
Society for Industrial and Applied Mathematics, 1994.
[NY76] Arkadi S Nemirovskii and David Berkovich Yudin. Informational complexity and efficient methods
for the solution of convex extremal problems. Matekon, 13(2):22–45, 1976.
[NY77] Arkadi S Nemirovski and David Berkovich Yudin. Optimization methods adaptive to significant
dimension of the problem. Avtomatika i Telemekhanika, (4):75–87, 1977.
[Ren01] James Renegar. A Mathematical View of Interior-Point Methods in Convex Optimization.
Society for Industrial and Applied Mathematics, 2001.
[Sha48] Claude Elwood Shannon. A mathematical theory of communication. The Bell system technical
journal, 27(3):379–423, 1948.
[Sho77] Naum Z Shor. Cut-off method with space extension in convex programming problems. Cybernetics,
13(1):94–96, 1977.
[SV14] Mohit Singh and Nisheeth K Vishnoi. Entropy, optimization and counting. In Proceedings of
the forty-sixth annual ACM symposium on Theory of computing, pages 50–59, 2014.
[Tod01] MJ Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.
[VBW98] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu. Determinant maximization with linear
matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19(2):499–533,
1998.
[Web96] Simon Webb. Central slices of the regular simplex. Geometriae Dedicata, 61(1):19–28, June
1996.
[WSV12] Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe. Handbook of semidefinite programming:
Theory, algorithms, and applications, volume 27. Springer Science & Business Media,
2012.
