Dissertation
By
Samir Chowdhury
2019
Dissertation Committee:
Facundo Mémoli, Advisor
Matthew Kahle
David Sivakoff
© Copyright by
Samir Chowdhury
2019
Abstract
Network data, which shows the relationships between entities in complex systems, is
becoming available at an ever-increasing rate. In particular, advances in data acquisition
and computational power have shifted the bottleneck in analyzing weighted and directed
network datasets towards the domain of available mathematical methods. Thus there is a
pressing need to develop mathematical foundations for analyzing such datasets.
In this thesis, we present methods for applying one of the flagship tools of topological
data analysis—persistent homology—to weighted, directed network datasets. We ground
these methods in a network distance dN that appeared in a restricted context in earlier
literature, and which is fully developed in this thesis. This development independently
provides metric methods for network data analysis, including methods that invoke optimal
transport.
In our framework, a network dataset is represented as a set of points X (equipped
with the minimalistic structure of a first countable topological space) and a (continuous)
edge weight function ωX : X × X → R. With this terminology, a finite network dataset
is viewed as a finite sample from some infinite underlying process—a compact network.
This perspective is especially appropriate for data streams that are so large that they are
“essentially infinite”, or are perhaps being generated continuously in time.
We show that the space of all compact networks is the completion of the space of all
finite networks. We develop the notion of isomorphism in this space, and explore a range
of geodesics that exist within it. We develop sampling theorems and explain
their use in obtaining probabilistic convergence guarantees. Several persistent homology
methods—notably including persistent path homology—are also developed. By virtue of
the sampling theorems, we are able to define these methods even for infinite networks.
Our theoretical contributions are complemented by software packages that we devel-
oped in the course of producing this thesis. We illustrate the theory and implementations
via experiments on simulated and real-world data.
In my family, I found boundless sources of love and encouragement. My grandparents,
aunts, and uncles provided me with all the support I could ask for. My mother gave me
insight from her own life as an academic, my father taught me the quadratic formula and
my first lessons in the sciences, and my brother lifted the weight of responsibility from my
shoulders so I could pursue my own path. It is to them that I dedicate this work.
Acknowledgments
My thesis advisor, Facundo Mémoli, has invested incredible amounts of time and en-
ergy into my education, and I can only hope to pay it forward. Even during the final stages
of writing this thesis, I was amazed at how his vision had materialized as connections be-
tween all the research that we had done together, and how neatly the different concepts
rolled out and interlocked, like the pieces of a puzzle. I am happy and grateful that I was
able to work under his guidance for the past five years.
I am lucky to have had many mentors over the years; special thanks go to Henry Adams,
Jose Perea, Chad Giusti, Matthew Kahle, and David Sivakoff, who have all contributed
crucially to the development of my career. I would like to thank Neil Falkner, whose
instruction was critical for my education, and also Boris Hasselblatt, Richard Weiss, and
Genevieve Walsh, who first led me into mathematics.
Many academics took the time to give me advice and direction over the years. From
these conversations, I have often carried away a new insight into a complex problem, or
a deeper realization about the academic world at large. Javier Arsuaga, Pablo Cámara,
Chao Chen, Paweł Dłotko, Moon Duchin, Greg Henselman, Kathryn Hess, Steve Hunts-
man, Sara Kališnik, Katherine Kinnaird, Sanjeevi Krishnan, Melissa McGuirl, Amit Patel,
Xaq Pitkow, Vanessa Robins, Manish Saggar, Santiago Segarra, Elchanan Solomon, Justin
Solomon, Mimi Tsuruga, Mariel Vázquez, Bei Wang, Sunny Xiao, Lori Ziegelmeier—
thank you all, you make this community a wonderful place to be in.
Our research group—and more broadly, the TGDA group at OSU—comprised a won-
derful community of grad students and postdocs who contributed to my way of thinking
about problems. Tom Needham and Ben Schweinhart were both exceptional in that re-
gard. Tamal Dey gave the seminar talk that first got me interested in applied topology, and
Anastasios Sidiropoulos got me thinking about directed graphs.
My friends in Columbus made sure that my time here was rich and rewarding. Thanks
to the 91 crew for the endless laughs, to Sunita and Sabrina for many therapeutic con-
versations, and to the Rahmans for making me part of their family. Thanks to Tia for
contributing to my career, in so many ways known and unknown. Danke schön to Natalie
for her warmth, wit, and wisdom. My fellow grad students were a constant source of sup-
port and entertainment; I was lucky to have Marissa as a study buddy, Katie for keeping
my harebrained ideas about the TAGGS seminar in check, and Evan and Kevin for many
welcome distractions. Osama, Hanbaek, and I were there for each other through all the
highs and lows. I could not ask for it to be any other way.
Vita
Publications
Research Publications
S. Chowdhury, F. Mémoli. “Persistent homology of directed networks”. 50th Asilomar
Conference on Signals, Systems and Computers, 2016.
Fields of Study
Table of Contents
Page
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.5.3 An application: Characterizing the diagrams of cycle networks . 41
1.6 The case of compact networks . . . . . . . . . . . . . . . . . . . . . . . 42
1.6.1 ε-systems and finite sampling . . . . . . . . . . . . . . . . . . . 42
1.6.2 Weak isomorphism and dN . . . . . . . . . . . . . . . . . . . . 44
1.6.3 An additional axiom coupling weight function with topology . . 47
1.6.4 Skeletons, motifs, and motif reconstruction . . . . . . . . . . . . 49
1.7 Diagrams of compact networks and convergence results . . . . . . . . . 53
1.8 Completeness, compactness, and geodesics . . . . . . . . . . . . . . . . 54
1.8.1 Completeness of (CN/≅w, dN) . . . . . . . . . . . . . . . . . . 54
1.8.2 Precompact families in CN/≅w . . . . . . . . . . . . . . . . . 55
1.8.3 Geodesics: existence and explicit examples . . . . . . . . . . . . 55
1.9 Measure networks and the dN,p distances . . . . . . . . . . . . . . . . . 61
1.9.1 The structure of measure networks . . . . . . . . . . . . . . . . 63
1.9.2 Couplings and the distortion functional . . . . . . . . . . . . . . 65
1.9.3 Interval representation and continuity of distortion . . . . . . . . 66
1.9.4 Optimality of couplings in the network setting . . . . . . . . . . 68
1.9.5 The Network Gromov-Wasserstein distance . . . . . . . . . . . . 69
1.9.6 The Network Gromov-Prokhorov distance . . . . . . . . . . . . 72
1.9.7 Lower bounds and measure network invariants . . . . . . . . . . 72
1.10 Computational aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
1.10.1 Software packages . . . . . . . . . . . . . . . . . . . . . . . . . 79
1.10.2 Simulated hippocampal networks . . . . . . . . . . . . . . . . . 79
1.10.3 Clustering SBMs and migration networks . . . . . . . . . . . . . 82
3. Persistent Homology on Networks . . . . . . . . . . . . . . . . . . . . . . . . 138
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
List of Tables
Table Page
1.1 Left: The five classes of SBM networks corresponding to the experiment
in §1.10.3. N refers to the number of communities, v refers to the vector
that was used to compute a table of means via G5 (v), and ni is the number
of nodes in each community. Right: G5 (v) for v = [0, 25, 50, 75, 100]. . . . 85
4.1 The first two columns contain sample 0-dimensional persistence intervals,
as produced by Javaplex. We have added the labels in column 3, and the
common δ-sinks in column 4. . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.4 Short 0-dimensional Dowker persistence intervals capture regions which
receive most of their incoming migrants from a single source. Each in-
terval [0, δ) corresponds to a 0-simplex which becomes subsumed into a
1-simplex at resolution δ. We list these 1-simplices in the second column,
and their δ sinks in the third column. The definition of a δ-sink enables us
to produce a lower bound on the migration into each sink, which we pro-
vide in the fourth column. We also list the true migration numbers in the
fifth column, and the reader can consult §4.5.2 for our explanation of the
error between the true migration and the lower bounds on migration. . . . . 208
List of Figures
Figure Page
1.1 The two networks on the left have different cardinalities, but computing
correspondences shows that dN (X, Y ) = 1. Similarly one computes dN (X, Z) =
0, and thus dN (Y, Z) = 1 by triangle inequality. On the other hand, the bi-
jection given by the arrows shows d̂N(Y, Z) = 1. Applying Proposition 12
then recovers dN (X, Y ) = 1. . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 The directed circle (~S1, ω~S1), the directed circle on 6 nodes (~S16, ω~S16), and
the directed circle with reversibility ρ, for some ρ ∈ [1, ∞). Traveling in a
clockwise direction is possible only in the directed circle with reversibility
ρ, but this incurs a penalty modulated by ρ. . . . . . . . . . . . . . . . . . 15
1.3 A cycle network on 6 nodes, along with its weight matrix. Note that the
weights are highly asymmetric. . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 A network SBM on 50 nodes, split into 5 communities, along with the
matrices of means and variances. The deepest blue corresponds to values
≈ 1, and the deepest yellow corresponds to values ≈ 29. . . . . . . . . . . 17
1.6 Computing the Dowker sink and source complexes of a network (X, ωX ).
Observe that the sink and source complexes are different in the range 1 ≤
δ < 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7 The first column contains illustrations of cycle networks G3 , G4 , G5 and
G6 . The second column contains the corresponding Dowker persistence
barcodes, in dimensions 0 and 1. Note that the persistent intervals in the
1-dimensional barcodes agree with the result in Theorem 40. The third col-
umn contains the Rips persistence barcodes of each of the cycle networks.
Note that for n = 3, 4, there are no persistent intervals in dimension 1. On
the other hand, for n = 6, there are two persistent intervals in dimension 1. 31
1.9 Dowker persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure
1.8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.10 Rips persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure
1.8. Note that the Rips diagrams indicate no persistent homology in di-
mensions higher than 0, in contrast with the Dowker diagrams in Figure
1.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.13 Working over Z/2Z coefficients, we find that Dgm^Ξ_1(X) and Dgm^D_1(Y) are
trivial, whereas Dgm^D_1(X) = Dgm^Ξ_1(Y) = {(1, 2)}. . . . . . . . . . . . 40
1.15 Relaxing the requirements on the maps of this “tripod structure” is a natural
way to weaken the notion of strong isomorphism. . . . . . . . . . . . . . . 45
1.16 Note that Remark 69 does not fully characterize weak isomorphism, even
for finite networks: All three networks above, with the given weight matri-
ces, are Type I weakly isomorphic since C maps surjectively onto A and
B. But there are no surjective, weight preserving maps A → B or B → A. . 46
1.17 Left: Z represents a terminal object in p(X), and f, g are weight preserving
surjections X → Z. Here ϕ ∈ Aut(Z) is such that g = ϕ ◦ f . Right:
Here we show more of the poset structure of p(X). In this case we have
X V Y . . . Z. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.19 Illustrations of the finite networks we consider in this paper. Notice that
the edge weights are asymmetric. The numbers in each node correspond to
probability masses; for each network, these masses sum to 1. . . . . . . . . 62
1.20 The dN,p distance between the two one-node networks is simply ½|α − α′|.
In Example 109 we give an explicit formula for computing dN,p between
an arbitrary network and a one-node network. . . . . . . . . . . . . . . . . 70
1.21 Networks at dN,p -distance zero which are not strongly isomorphic. . . . . . 71
1.22 Bottom right: Sample place cell spiking pattern matrix. The x-axis cor-
responds to the number of time steps, and the y-axis corresponds to the
number of place cells. Black dots represent spikes. Clockwise from bot-
tom middle: Sample distribution of place field centers in 4, 3, 0, 1, and
2-hole arenas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
1.25 Left: TLB dissimilarity matrix for SBM community networks in §1.10.3.
Classes 1 and 3 are similar, even though networks in Class 3 have twice
as many nodes as those in Class 1. Classes 2 and 5 are most dissimilar
because of the large difference in their edge weights. Class 4 has a different
number of communities than the others, and is dissimilar to Classes 1 and 3
even though all their edge weights are in comparable ranges. Right: TLB
dissimilarity matrix for two-community SBM networks in §1.10.3. The
near-zero values on the diagonal are a result of using the adaptive λ-search
described in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1.26 Result of applying the TLB to the migration networks in §1.10.3. Left:
Dissimilarity matrix. Nodes 1-5 correspond to female migration from 1960-
2000, and nodes 6-10 correspond to male migration from 1960-2000. Right:
Single linkage dendrogram. Notice that overall migration patterns change
in time, but within a time period, migration patterns are grouped according
to gender. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.1 The trace map erases data between pairs of nodes. . . . . . . . . . . . . . . 119
2.2 The out map applied to each node yields the greatest weight of an arrow
leaving the node, and the in map returns the greatest weight entering the
node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.1 Left: Lower bound matrix arising from matching local spectra on the
database of community networks. Right: Corresponding single linkage
dendrogram. The labels indicate the number of communities and the total
number of nodes. Results correspond to using local spectra as described in
Proposition 180. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.2 Left: Lower bound matrix arising from matching global spectra on the
database of community networks. Right: Corresponding single linkage
dendrogram. The labels indicate the number of communities and the total
number of nodes. Results correspond to using global spectra as signatures. 176
4.3 Single linkage dendrogram based on local spectrum lower bound of Propo-
sition 180 corresponding to hippocampal networks with place field radii
0.2L, 0.1L, and 0.05L (clockwise from top left). . . . . . . . . . . . . . . . 178
4.4 Sinkhorn computations for dN,2 (X , Y) are carried out in the “stable region”
for K, and the end result is rescaled to recover dN,2 (X, Y ). . . . . . . . . . 183
4.5 Left: The rows and columns of Mp are initially arranged so that the domain
and codomain vectors are in increasing and decreasing allow time, respec-
tively. If there are no domain (codomain) vectors having a particular allow
time, then the corresponding vertical (horizontal) strip is omitted. Right:
After converting to column echelon form, the domain vectors of Mp,G need
not be in the original ordering. But the codomain vectors are still arranged
in decreasing allow time. . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.9 Here we illustrate the representative nodes for one of the 1-dimensional
persistence intervals in Figure 4.6. This 1-cycle [PC,CH] + [CH,PL] +
[PL,WO] - [WO,FM] + [FM,PM] - [PM,PC] persists on the interval [0.75, 0.95).
At δ = 0.94, we observe that this 1-cycle has joined the homology equiv-
alence class of the shorter 1-cycle illustrated on the right. Unidirectional
arrows represent an asymmetric flow of investment. A full description of
the meaning of each arrow is provided in §4.5.1. . . . . . . . . . . . . . . . 195
4.13 An example of a 1-cycle becoming a 1-boundary due to a single mutual sink
r, as described in the interpretation of 1-cycles in §4.5.2. The figure on the
left shows a connected component of Dsiδ,S , consisting of [s1 , s2 ], [s2 , s3 ], [s3 , s4 ].
The arrows are meant to suggest that r will eventually become a δ-sink for
each of these 1-simplices, for some large enough δ. The progression of
these simplices for increasing values of δ are shown from left to right. In
the leftmost figure, r is not a δ-sink for any of the three 1-simplices. Note
that r has become a δ-sink for [s3 , s4 ] in the middle figure. Finally, in the
rightmost figure, r has become a δ-sink for each of the three 1-simplices. . . 201
4.14 0 and 1-dimensional Dowker persistence barcodes for U.S. migration data . 203
4.15 U.S. map with representative cycles of the persistence intervals that were
highlighted in Figure 4.14. The cycle on the left appears at δ1 = 0.87,
and the cycle on the right appears at δ2 = 0.90. The red lines indicate the
1-simplices that participate in each cycle. Each red line is decorated with
an arrowhead si → sj if and only if sj is a sink for the simplex [si , sj ].
The blue arrows point towards all possible alternative δ-sinks, and are in-
terpreted as follows: Tennessee is a 0.90-sink for the Kentucky-Georgia
simplex, West Virginia is a 0.90-sink for the Ohio-Florida simplex, and
Alabama is a 0.90-sink for the Georgia-Florida simplex. . . . . . . . . . . . 204
4.17 Top: Two cycles corresponding to the left endpoints of the (Djibouti-
Somalia-Uganda-Eritrea-Ethiopia) and (Kiribati-Papua New Guinea-Australia-
United Kingdom-Tuvalu) persistence intervals listed in Table 4.5. The
δ values are 0.73, 0.77, respectively. Bottom: Two cycles correspond-
ing to the left endpoints of the (China-Thailand-Philippines) and (China-
Indonesia-Malaysia) persistence intervals listed in Table 4.5. The δ values
are 0.77, 0.75, respectively. Meaning of arrows: In each cycle, an arrow
si → sj means that ωS (si , sj ) ≤ δ, i.e. that sj is a sink for the simplex
[si , sj ]. We can verify separately that for δ = 0.77, the Kiribati-Papua
New Guinea simplex has the Solomon Islands as a δ-sink, and that the
Philippines-Thailand simplex has Taiwan as a δ-sink. Similarly, the China-
Malaysia simplex has Singapore as a δ-sink, for δ = 0.75. . . . . . . . . . 209
Background definitions and conventions
A metric space (X, dX) is totally bounded if for any ε > 0, there exists a finite subset
S ⊆ X such that dX(x, S) < ε for all x ∈ X. Here dX(x, S) := min_{s∈S} dX(x, s).
Given a set X, a topology on X is a subset τX ⊆ pow(X) such that: (1) ∅ ∈ τX and
X ∈ τX; (2) any union of elements of τX belongs to τX; and (3) any finite intersection of
elements of τX belongs to τX. The elements of τX are referred to as the open sets of X.
An open cover of a topological space X is a collection of open sets {Ui ⊆ X : i ∈ I}
indexed by some set I such that each Ui is nonempty, and ∪_{i∈I} Ui = X.
A base or basis for τX is a subcollection of τX such that every open set in X can be
written as a union of elements in the subcollection. A local base at a point x ∈ X is a
collection of open sets containing x such that every open set containing x contains some
element in this collection.
There are always two topologies that one can place on a set X: the discrete topology
τX = pow(X), and the trivial topology τX = {∅, X}.
A point cloud is a discrete subset of d-dimensional Euclidean space for some d ∈ N.
Given two topological spaces X and Y , two continuous maps f, g : X → Y are
said to be homotopic if there exists a continuous map F : X × [0, 1] → Y such that
F |X×{0} = f and F |X×{1} = g. X and Y are said to be homotopy equivalent if there exist
maps f : X → Y and g : Y → X such that g ∘ f ≃ idX and f ∘ g ≃ idY. In this case, f
and g are said to be homotopy inverses.
The indicator function of a set S is denoted 1S. We denote measure spaces via the
triple (X, F, µ), where X is a set, F is a σ-field on X, and µ is the measure on F. Given a
measure space (X, F, µ), we write L0 = L0(µ) to denote the collection of F-measurable
functions f : X → R. For all p ∈ (0, ∞) and all f ∈ L0, we define ‖f‖p := (∫ |f|^p dµ)^{1/p}.
For p = ∞, ‖f‖∞ := inf{M ∈ [0, ∞] : µ(|f| > M) = 0}. Then for any p ∈ (0, ∞],
Lp = Lp(µ) := {f ∈ L0 : ‖f‖p < ∞}.
Given a measurable real-valued function f : X → R and t ∈ R, we will occasionally
write {f ≤ t} to denote the set {x ∈ X : f (x) ≤ t}.
Lebesgue measure on the reals will be denoted by λ. We write λI to denote the
Lebesgue measure on the unit interval I = [0, 1].
Suppose we have a measure space (X, F, µ), a measurable space (Y, G), and a measur-
able function f : X → Y . The pushforward or image measure of f is defined to be the
measure f∗ µ on G given by writing f∗ µ(A) := µ(f −1 [A]) for all A ∈ G.
A particular case where we deal with pushforward measures is the following: given a
product space X = X1 × X2 × . . . × Xn and a measure µ on X , the canonical projection
maps πi : X → Xi , for i = 1, . . . , n, define pushforward measures that we denote (πi )∗ µ.
If each Xi is itself a measure space with measure µi , then we say that µ has marginals µi ,
for i = 1, . . . , n, if (πi )∗ µ = µi for each i. We also consider projection maps of the form
(πi , πj , πk ) : X → Xi × Xj × Xk for i, j, k ∈ {1, . . . , n}, and denote the corresponding
pushforward by (πi , πj , πk )∗ µ. Notice that we can take further projections of the form
(πi , πj )ijk : Xi × Xj × Xk → Xi × Xj , and the images of these projections are precisely
those given by projections of the form (πi , πj ) : X → Xi × Xj .
Let the projection maps Xi × Xj × Xk → Xi × Xk and Xi × Xj × Xk → Xj be denoted
(πi, πk)^{ijk} and (πj)^{ijk}, respectively. Let B ⊆ Xj be measurable. Then

(πj)^{ijk}_*((πi, πj, πk)_*µ)(B) = (πi, πj, πk)_*µ(Xi × B × Xk) = µ(X1 × · · · × Xj−1 × B × Xj+1 × · · · × Xn) = (πj)_*µ(B).

Similarly, for measurable A ⊆ Xi and C ⊆ Xk,

(πi, πk)^{ijk}_*((πi, πj, πk)_*µ)(A × C) = (πi, πj, πk)_*µ(A × Xj × C) = µ(X1 × · · · × Xi−1 × A × Xi+1 × · · · × Xk−1 × C × Xk+1 × · · · × Xn) = (πi, πk)_*µ(A × C).
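These bookkeeping identities are easy to check numerically for finite product spaces. The following sketch (the array shapes and names are ours, not from the thesis) stores a measure on X1 × X2 × X3 as an array of masses and verifies that marginalizing in stages agrees with marginalizing directly:

```python
import numpy as np

# A measure mu on the finite product X1 x X2 x X3, stored as an array of masses.
rng = np.random.default_rng(0)
mu = rng.random((2, 3, 4))
mu /= mu.sum()

# Pushforward under the projection (pi_1, pi_3): sum out the X2 coordinate.
mu_13 = mu.sum(axis=1)

# Projecting mu_13 further onto X1 recovers the marginal (pi_1)_* mu directly.
assert np.allclose(mu_13.sum(axis=1), mu.sum(axis=(1, 2)))
```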
Chapter 1: Introduction
Let X be a set, and let ωX : X × X → R be any function. The pair (X, ωX ) (with some
additional constraints, cf. Definition 1) is what we refer to as a network, and is the central
object of study in this thesis. Networks arise in many guises, and in many contexts [90], so
it is necessary to explain the motivation behind the preceding definition. Perhaps the most
common type of network is an undirected graph G = (V, E) consisting of a vertex set V
and edge set E. Letting n be the number of elements in V, the adjacency matrix of G is
an n × n binary matrix with a 1 in entry (i, j) if {vi, vj} ∈ E, and 0 otherwise. In particular,
the adjacency matrix is symmetric. More generally, a directed graph relaxes the symmetry
condition by allowing for individual edges vi → vj and vj → vi . Finally, a weighted,
directed graph allows for real-valued weights on the edges. The pair (X, ωX ) encapsulates
this idea. In this sense, the networks we study in this thesis are weighted, directed graphs
with self-loops1 . We denote the collection of all networks (cf. Definition 1) by N .
In addition to graphs, N contains many other classes of objects, including metric
spaces, directed metric spaces, Riemannian manifolds, and Finsler manifolds. As will
be shown throughout this work, the perspective of viewing networks as generalized metric
spaces (as opposed to combinatorial objects) enables the import of many techniques—both
new and old—from the theory of metric spaces into network analysis. This ranges from
classical results of Gromov on reconstructions of metric spaces, to more modern tools such
as persistent homology and applied optimal transport.
The question which initially motivated this thesis is as follows. In an unpublished
manuscript first appearing in 2012, Grigor’yan, Lin, Muranov, and Yau defined a homol-
ogy theory for digraphs (i.e. directed graphs) called path homology that generalized the
standard notion of simplicial homology [62]. By this time, the theory of persistent homol-
ogy—a multiresolution version of simplicial homology used for data analysis—had already
become quite popular. So a natural question was: would it be possible to produce a per-
sistent version of path homology? More specifically, would it be possible to produce a
persistent path homology theory with satisfactory algorithms and implementations as well
as nice theoretical properties?
In this thesis, we provide a positive answer to the preceding question. More generally,
we develop a framework with machinery in place for extending many metric data analysis
¹While multigraphs (graphs allowing for multiple edges between a pair of vertices) and hypergraphs
(multiple nodes sharing a “hyperedge”) are often useful in practice, we do not consider them in this work.
techniques (including variants of persistent homology) to the setting of directed networks
while preserving nice theoretical properties. In particular, we generalize the two most com-
mon extant simplicial persistent homology methods—the Vietoris-Rips and Čech methods
[47]—and fit them to the directed network setting. Our contributions in these directions
have already been published [40, 37].
The crucial component of this framework is the examination of a (pseudo)metric dN on
the collection of all networks, which first appeared in a slightly restricted version in [22].
In this setting, dN was used to provide theoretical guarantees for the robustness of certain
hierarchical clustering methods on directed networks. This network distance is structurally
analogous to the Gromov-Hausdorff distance dGH on CM—the collection of all compact
metric spaces [17]. An important remark is that dN (or even dGH ) is NP-hard to compute.
With the development of various data analysis methods relying on dN , it became in-
creasingly necessary to understand and develop the core theoretical properties of dN itself.
Keeping this goal in mind, we provide a comprehensive analysis of dN in this thesis (see
also [35, 39, 34]). Because its structural counterpart for metric spaces (dGH ) is already well-
studied, we were interested in extending the desirable results for dGH to the setting of dN .
As we show here, a wide range of results does extend to the dN setting. The crux of this
development is in realizing that dGH , by nature of its definition, enforces enough structure
on CM that only certain abstract consequences of metric properties such as symmetry and
triangle inequality become necessary when proving results on CM. These consequences,
when assumed as properties of networks, are quite natural. Most importantly, they allow
us to prove strong results about dN . We expect that in general, when proving results about
dGH , verifying these assumptions (cf. Definitions 1, 2, and also 28) would allow one to
prove the results on the much broader setting of dN with little additional work.
As a natural extension of these ideas, we develop families of metrics dN,p for p ∈ [1, ∞]
that are Lp versions of dN (cf. [38]). These are structurally analogous to the Gromov-
Wasserstein distances developed in [85, 116, 86, 117]. We currently have ongoing work
that leverages the formulation and foundational work on dN,p .
When studying a metric, an important item to clarify is the “curvature” of the metric
[17]. Knowledge of curvature has important practical consequences; for example, it gives
theoretical guarantees on the existence of means [96]. Following the extensive study of
the (Alexandrov) curvature of the space of metric measure spaces in [117], we became
interested in testing similar ideas in the settings of dN and dGH . Interestingly, even the
restricted metric dGH does not admit any curvature bounds (in the Alexandrov sense). The
deeper reason behind this is the existence of “wild” geodesics in the space of compact
metric spaces. In [36], we produced explicit constructions of infinite families of these
wild-type geodesics in CM; these results are reproduced here.
The dN distance at the core of this work is a pseudometric, and it is of natural impor-
tance to understand its zero sets thoroughly. This is related to the classic question “Can
one hear the shape of a drum”: to understand the behavior of our methods, we need to
know which networks are perceived by dN to be the same. We provide a full treatment of
this question. We provide an independent definition of “weakly isomorphic networks” and
prove that these are precisely the networks at 0 dN -distance. Weakly isomorphic networks
essentially live on a fiber over a core subnetwork that we call a skeleton. In particular,
when comparing two networks having different sizes, one may traverse the fibers, pick out
two representatives having the same number of nodes, and compute a simplified (but still
NP-hard) distance d̂N to obtain the original distance dN.
Throughout this work, we develop network invariants that serve as polynomial-time
proxies for dN. The persistent homology methods are all examples of such invariants. One
family of invariants we consider, the motif sets, is based on the “curvature class” invariants
defined by Gromov for metric spaces [65]. In the case of compact metric spaces, the motif
sets form a full invariant. We are able to recover this result for a subcollection of CN
satisfying an additional topological condition that we call coherence, or more specifically,
Axiom A2 (cf. Definition 28). In other words (some readers may be more familiar with this
terminology), the map from weak isomorphism classes of (a subcollection of) networks to
motif sets is injective. The proof of this result is quite short for metric spaces, but requires a
long sequence of verifications for networks. The interesting part, however, is that the proof
is made possible via Axiom A2, which is an abstract consequence of the triangle inequality
(cf. Remark 74). In particular, any finite network satisfies Axiom A2, even if it violates the
triangle inequality by an arbitrary margin. Actually, to be more accurate, given any finite
network, one may traverse the fiber of weakly isomorphic networks down to its skeleton,
and this skeleton will satisfy Axiom A2.
In addition to the theoretical results outlined above, we present practical implemen-
tations on both real and simulated datasets. These implementations are available as the
PersNet, PPH, and GWnets software packages, and are written in a combination of
Matlab, Python, and C++.
Finally, we note that this thesis combines narratives and results from [35, 39, 34, 36,
38, 37, 40]. All of these papers have been developed jointly with Facundo Mémoli. While
each of these papers can be read in a self-contained manner, we have made the effort to
consolidate the landscape developed in those papers and distill it into this thesis.
which also contains important guarantees on when persistent homology can be meaning-
fully applied to infinite networks. In §1.8 we discuss the existence of convex-combination
and wild-type geodesics in N . Development of the dN,p distances is carried out in §1.9.
Finally, in §1.10, we discuss computational complexity, algorithms, and the results of im-
plementing our methods on one particular dataset.
Chapter 2 is devoted to proofs of the results about dN stated in §1, along with auxiliary
results. In particular, §2.6 discusses lower bounds for computing dN . Chapter 3 contains
proofs of statements involving persistent homology. Finally we address computational
aspects and additional experiments in Chapter 4.
Recall that a space is first countable if each point in the space has a countable local basis
(see [114, p. 7] for more details). First countability is a technical condition guaranteeing
that when the underlying topological space of a network is compact, it is also sequentially
compact. Notice that these conditions are automatically satisfied in the finite setting, as ωX
is then trivially continuous.
A further observation is that the “correct” restriction of N to work with is the collection
CN of compact networks, i.e. those networks in N whose underlying topological space is
compact (cf. Definition 2).
For data analysis purposes, we expect to only ever work with finite networks. However,
datasets are often viewed as being sampled from some “infinite” object (as is the case
for very large datasets), and compact networks turn out to be the appropriate model for
our purposes. In particular, whereas FN is not complete, we have the natural inclusion
FN ⊆ CN , and the latter is a complete pseudometric space (see §1.6).
Letting FM, CM, and M denote the spaces of finite, compact, and arbitrary metric
spaces, respectively, we also note the containments
FM ⊊ FN, CM ⊊ CN, and M ⊊ N.
We now proceed to define dN , starting with some auxiliary definitions.
Definition 3 (Correspondence). Let (X, ωX ), (Y, ωY ) ∈ N . A correspondence between X
and Y is a relation R ⊆ X × Y such that πX (R) = X and πY (R) = Y , where πX and πY
are the canonical projections of X × Y onto X and Y , respectively. The collection of all
correspondences between X and Y will be denoted R(X, Y ), abbreviated to R when the
context is clear.
Example 2 (1-point correspondence). Let X be a set, and let {p} be the set with one point.
Then there is a unique correspondence R = {(x, p) : x ∈ X} between X and {p}.
Example 3 (Diagonal correspondence). Let X = {x1 , . . . , xn } and Y = {y1 , . . . , yn } be
two enumerated sets with the same cardinality. A useful correspondence is the diagonal
correspondence, defined as ∆ := {(xi , yi ) : 1 ≤ i ≤ n} . When X and Y are infinite sets
with the same cardinality, and ϕ : X → Y is a given bijection, then we write the diagonal
correspondence as ∆ := {(x, ϕ(x)) : x ∈ X} .
Definition 4 (Distortion of a correspondence). Let (X, ωX), (Y, ωY) ∈ N and let R ∈
R(X, Y). The distortion of R is given by:

dis(R) := sup_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)|.
Definition 5 (The network distance). Let (X, ωX), (Y, ωY) ∈ N. The network distance
between X and Y is defined as:

dN(X, Y) := ½ inf_{R∈R(X,Y)} dis(R).

When the context is clear, we will often write dN(X, Y) to denote dN((X, ωX), (Y, ωY)).
We define the collection of optimal correspondences R^opt between X and Y to be the
collection {R ∈ R(X, Y) : dis(R) = 2dN(X, Y)}. This set is always nonempty when
X, Y ∈ FN, but may be empty in general.
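For intuition, dN can be evaluated by brute force on very small finite networks by enumerating every correspondence. The sketch below (function names are ours; the enumeration is exponential in card(X) · card(Y), so it is meant only for toy examples) computes the infimum directly from two weight matrices:

```python
import itertools

def distortion(R, wX, wY):
    """Distortion of a correspondence R, given as a list of index pairs (i, j)."""
    return max(abs(wX[i][ip] - wY[j][jp]) for (i, j) in R for (ip, jp) in R)

def dN_finite(wX, wY):
    """Brute-force network distance between finite networks with weight
    matrices wX (n x n) and wY (m x m): enumerate every subset of the product
    that projects onto both node sets, and halve the least distortion."""
    n, m = len(wX), len(wY)
    pairs = list(itertools.product(range(n), range(m)))
    best = float("inf")
    for mask in range(1, 2 ** len(pairs)):
        R = [pairs[k] for k in range(len(pairs)) if (mask >> k) & 1]
        if {i for i, _ in R} == set(range(n)) and {j for _, j in R} == set(range(m)):
            best = min(best, distortion(R, wX, wY))
    return best / 2.0
```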
Remark 5. We remark that when restricted to the special case of networks that are also
metric spaces, the network distance dN agrees with the Gromov-Hausdorff distance. De-
tails on the Gromov-Hausdorff distance can be found in [17].
Remark 6. The intuition behind the preceding definition of network distance may be better
understood by examining the case of a finite network. Given a finite set X and two edge
weight functions ωX, ω′X defined on it, we can use the ℓ∞ distance as a measure of network
similarity between (X, ωX) and (X, ω′X):

‖ωX − ω′X‖ℓ∞(X×X) := max_{x,x′∈X} |ωX(x, x′) − ω′X(x, x′)|.

A generalization of the ℓ∞ distance is required when dealing with networks having
different sizes: Given two sets X and Y , we need to decide how to match up points of X
with points of Y . Any such matching will yield a subset R ⊆ X × Y such that πX (R) =
X and πY (R) = Y , where πX and πY are the projection maps from X × Y to X and
Y , respectively. This is precisely a correspondence, as defined above. A valid notion of
network similarity may then be obtained as the distortion incurred by choosing an optimal
correspondence—this is precisely the idea behind the definition of the network distance
above.
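As a quick illustration of this special case, the ℓ∞ comparison of two weight functions on a common node set is a one-liner (a sketch; the function name is ours):

```python
import numpy as np

def ell_infinity(wX, wX2):
    """l-infinity distance between two weight functions on the same finite
    node set, each given as an n x n weight matrix."""
    return float(np.max(np.abs(np.asarray(wX) - np.asarray(wX2))))
```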
We will eventually verify that dN as defined above is a pseudometric (Theorem 72),
which will justify calling dN a “network distance”. Because dN is a pseudometric, it is
important to understand its zero sets. To this end, we first develop the notion of strong
isomorphism of networks. The definition follows below.
Definition 6 (Weight preserving maps). Let (X, ωX ), (Y, ωY ) ∈ N . A map ϕ : X → Y is
weight preserving if:
ωX(x, x′) = ωY(ϕ(x), ϕ(x′)) for all x, x′ ∈ X.
Definition 7 (Strong isomorphism). Let (X, ωX ), (Y, ωY ) ∈ N . To say (X, ωX ) and
(Y, ωY ) are strongly isomorphic means that there exists a weight preserving bijection ϕ :
X → Y. We will denote a strong isomorphism between networks by X ≅s Y. Note that
this notion is exactly the usual notion of isomorphism between weighted graphs.
Given two strongly isomorphic networks, i.e. networks (X, ωX ), (Y, ωY ) and a weight
preserving bijection ϕ : X → Y , it is easy to use the diagonal correspondence (Example
3) to verify that dN (X, Y ) = 0. However, it is easy to see that the reverse implication
is not true in general. Using the one-point correspondence (Example 2), one can see that
dN(N1(1), N2(1_{2×2})) = 0. Here 1_{n×n} denotes the all-ones matrix of size n × n for any
n ∈ N. However, these two networks are not strongly isomorphic, because they do not even
have the same cardinality. Thus to understand the zero sets of dN , we need to search for a
different, perhaps weaker notion of isomorphism. This will be further explored in Section
1.6.
Now we state another reformulation of dN which will be especially useful when proving
results about persistent homology.
Definition 8 (Distortion of a map between two networks). Given any (X, ωX), (Y, ωY) ∈
N and a map ϕ : (X, ωX) → (Y, ωY), the distortion of ϕ is defined as:

dis(ϕ) := sup_{x,x′∈X} |ωX(x, x′) − ωY(ϕ(x), ϕ(x′))|.

Given maps ϕ : (X, ωX) → (Y, ωY) and ψ : (Y, ωY) → (X, ωX), we define two co-distortion terms:

C_{X,Y}(ϕ, ψ) := sup_{(x,y)∈X×Y} |ωX(x, ψ(y)) − ωY(ϕ(x), y)|,
C_{Y,X}(ψ, ϕ) := sup_{(x,y)∈X×Y} |ωY(y, ϕ(x)) − ωX(ψ(y), x)|.
Example 9 (Networks with two nodes). Let (X, ωX), (Y, ωY) ∈ FN where X = {x1, x2}
and Y = {y1, y2}. Then we claim dN(X, Y) = d̂N(X, Y). Furthermore, if X = N2(α δ; β γ)
and Y = N2(α′ δ′; β′ γ′), then we have the explicit formula:

dN(X, Y) = ½ min(Γ1, Γ2), where

Γ1 = max(|α − α′|, |β − β′|, |δ − δ′|, |γ − γ′|),
Γ2 = max(|α − γ′|, |γ − α′|, |δ − β′|, |β − δ′|).
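With the row convention above (first row (α, δ), second row (β, γ)), the two-node formula translates directly into code; a minimal sketch with illustrative names:

```python
def dN_two_node(WX, WY):
    """Explicit formula of Example 9 for two-node networks.
    WX = [[alpha, delta], [beta, gamma]], and similarly for WY."""
    (a, d), (b, g) = WX
    (a2, d2), (b2, g2) = WY
    gamma1 = max(abs(a - a2), abs(b - b2), abs(d - d2), abs(g - g2))
    gamma2 = max(abs(a - g2), abs(g - a2), abs(d - b2), abs(b - d2))
    return 0.5 * min(gamma1, gamma2)
```

For small examples this agrees with the brute-force enumeration over correspondences sketched after Definition 5.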
Remark 10 (A three-node example where dN ≠ d̂N). Assume (X, ωX) and (Y, ωY) are
two networks with the same cardinality. Then dN(X, Y) ≤ d̂N(X, Y). The inequality
holds because each bijection induces a correspondence, and we are minimizing over all
correspondences to obtain dN. However, the inequality may be strict, as demonstrated by
the following example. Let X = {x1, x2, x3} and let Y = {y1, y2, y3}. Define
ωX(x1, x1) = ωX(x1, x3) = ωX(x3, x1) = ωX(x3, x3) = 1, ωX = 0 elsewhere, and define
ωY(y3, y3) = 1, ωY = 0 elsewhere. In terms of matrices, X = N3(ΣX) and Y = N3(ΣY), where

ΣX = (1 0 1; 0 0 0; 1 0 1)   and   ΣY = (0 0 0; 0 0 0; 0 0 1).

For any bijection ϕ : X → Y, write Γ(x, x′, y, y′) := |ωX(x, x′) − ωY(y, y′)|. Since at most
one of ϕ(x1), ϕ(x3) can equal y3, we obtain

max_{x,x′∈X} Γ(x, x′, ϕ(x), ϕ(x′)) ≥ max{Γ(x1, x1, ϕ(x1), ϕ(x1)), Γ(x3, x3, ϕ(x3), ϕ(x3))} = 1,

so that d̂N(X, Y) = ½. Next consider the correspondence R = {(x1, y3), (x3, y3), (x2, y1), (x2, y2)}.
Then max_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)| = 0. Thus dN(X, Y) = 0 < d̂N(X, Y).
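Using the illustrative dN_finite helper sketched after Definition 5, the computation in this remark can be confirmed numerically:

```python
SX = [[1, 0, 1],
      [0, 0, 0],
      [1, 0, 1]]
SY = [[0, 0, 0],
      [0, 0, 0],
      [0, 0, 1]]
# The correspondence R above has zero distortion, so dN(X, Y) = 0, even
# though every bijection X -> Y incurs distortion 1.
print(dN_finite(SX, SY))  # 0.0
```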
Example 11 (Networks with three nodes). Let (X, ωX ), (Y, ωY ) ∈ FN , where we write
X = {x1, x2, x3} and Y = {y1, y2, y3}. Because we do not necessarily have dN = d̂N
on three node networks by Remark 10, the computation of dN becomes more difficult than
in the two node case presented in Example 9. A certain reduction is still possible, which
we present next. Consider the following list L of matrices representing correspondences,
where a 1 in position (i, j) means that (xi , yj ) belongs to the correspondence.
[The nine 3 × 3 binary matrices of the list L are not recoverable here and are omitted.]
Proposition 12. Let X, Y ∈ N. Then:

dN(X, Y) = inf{d̂N(X′, Y′) : X′, Y′ ∈ N, X′ ≅^w_I X, Y′ ≅^w_I Y, and card(X′) = card(Y′)}.
The moral of the preceding proposition is that networks at 0 dN -distance live on a fiber
above their equivalence class (where equivalence is with respect to dN ), and the distance
between two networks can always be computed by computing dbN between two represen-
tatives having the same number of points. While this notion of picking representatives
with the same number of points may seem somewhat mysterious, we refer the reader to
Proposition 79 and Figure 1.18 for explicit details on how this process is carried out.
[Figure: a two-node network X and three-node networks Y and Z, with arrows indicating a bijection between the nodes of Y and Z.]

Figure 1.1: The two networks on the left have different cardinalities, but computing corre-
spondences shows that dN(X, Y) = 1. Similarly one computes dN(X, Z) = 0, and thus
dN(Y, Z) = 1 by the triangle inequality. On the other hand, the bijection given by the arrows
shows d̂N(Y, Z) = 1. Applying Proposition 12 then recovers dN(X, Y) = 1.
Remark 13 (Computational aspects of dN and d̂N). Even though d̂N has a simpler for-
mulation than dN, computing d̂N still turns out to be an NP-hard problem, as we discuss
in §1.10. Moreover, we show in Theorem 178 that computing dN is at least as hard as
computing d̂N.
Instead of trying to compute dN , we will focus on finding network invariants that can
be computed easily. Finding invariants and stability results guaranteeing their validity as
proxies for dN is an overarching goal of this work.
1.3 Network models: the cycle networks and the SBM networks
A dissimilarity network is a network (X, AX ) where AX is a map from X × X to R+ ,
and AX (x, x0 ) = 0 if and only if x = x0 . Neither symmetry nor triangle inequality is
assumed. We denote the collection of all such networks as FN dis , CN dis , and N dis for the
finite, compact, and general settings, respectively.
Example 14. Finite metric spaces and finite ultrametric spaces constitute obvious examples
of dissimilarity networks. Recall that, in an ultrametric space (X, dX ), we have the strong
triangle inequality dX(x, x′) ≤ max{dX(x, x″), dX(x″, x′)} for all x, x′, x″ ∈ X. More
interesting classes of dissimilarity networks arise by relaxing the symmetry and triangle
inequality conditions of metric spaces.
Given a dissimilarity network (X, AX), its reversibility is defined as ρX := sup_{x≠x′} AX(x, x′)/AX(x′, x).
Then (X, AX) is said to have finite reversibility if ρX < ∞. Notice that ρX ≥ 1, with equality if
and only if AX is symmetric.
Dissimilarity networks satisfying the symmetry condition, but not the triangle inequal-
ity, have a long history dating back to Fréchet’s thesis [55] and continuing with work by
Pitcher and Chittenden [101], Niemytzki [91], Galvin and Shore [57, 58], and many oth-
ers, as summarized in [67]. One of the interesting directions in this line of work was the
development of a “local triangle inequality” and related metrization theorems [91], which
has been continued more recently in [122].
Dissimilarity networks satisfying the triangle inequality, but not symmetry, include the
special class of objects called directed metric spaces, which we define below.
Definition 11. Let (X, AX) be a dissimilarity network. Given any x ∈ X and r ∈ R+, the
forward-open ball of radius r centered at x is B+(x, r) := {x′ ∈ X : AX(x, x′) < r}.
Directed metric spaces with finite reversibility were studied in [108], and constitute im-
portant examples of networks that are strictly non-metric. More specifically, the authors of
[108] extended notions of Hausdorff distance and Gromov-Hausdorff distance to the setting
of directed metric spaces with finite reversibility, and our network distance dN subsumes
this theory while extending it to even more general settings.
Remark 15 (Finsler metrics). An interesting class of directed metric spaces arises from
studying Finsler manifolds. A Finsler manifold (M, F ) is a smooth, connected manifold
M equipped with an asymmetric norm F (called a Finsler function) defined on each tangent
space of M [6]. A Finsler function induces a directed metric dF : M ×M → R+ as follows:
for each x, x0 ∈ M ,
Z b
0 0
dF (x, x ) := inf F (γ(t), γ̇(t)) dt : γ : [a, b] → M a smooth curve joining x and x .
a
Finsler metric spaces have received interest in the applied literature. In [104], the au-
thors prove that the metric of a Finsler space with reversible geodesics (i.e. such that the
reverse curve γ′(t) := γ(1 − t) of any geodesic γ : [0, 1] → M is also a geodesic) is a
weighted quasi-metric [104, p. 2]. Such objects have been shown to be essential in biological
sequence comparison [115].
Definition 13. We define the directed unit circle to be (~S1 , ω~S1 ) with the discrete topology.
The directed circles with finite reversibility
Now we define a family of directed circles parametrized by reversibility. Unlike the
construction in §1.3.1, these directed networks belong to the family CN dis . An illustration
is provided in Figure 1.2.
Recall from §1.3.1 that for α, β ∈ [0, 2π), we wrote d⃗(α, β) to denote the counter-
clockwise geodesic distance along the unit circle from e^{iα} to e^{iβ}. Fix ρ ≥ 1. For each
α, β ∈ [0, 2π), define

ω~S1,ρ(e^{iα}, e^{iβ}) := min(d⃗(α, β), ρ d⃗(β, α)).

Proposition 16. The function ω~S1,ρ is continuous with respect to the standard topology on ~S1 × ~S1.
Proof of Proposition 16. It suffices to show that the preimages of basic open sets under
ω~S1,ρ are open. Let (a, b) be an open interval in R, and let (e^{iα}, e^{iβ}) ∈ ω~S1,ρ^{−1}[(a, b)], where
α, β ∈ [0, 2π). There are three cases: (1) α < β, (2) β < α, or (3) α = β.

Suppose first that α < β. There are two subcases: either ω~S1,ρ(e^{iα}, e^{iβ}) = d⃗(α, β), or
ω~S1,ρ(e^{iα}, e^{iβ}) = ρ d⃗(β, α).

Fix r > 0 to be determined later, but small enough so that B(α, r) ∩ B(β, r) = ∅. Let
γ ∈ B(α, r) and δ ∈ B(β, r). Then d⃗(γ, δ) ∈ B(d⃗(α, β), 2r). Also,

|ρ d⃗(δ, γ) − ρ d⃗(β, α)| = ρ |d⃗(δ, γ) − d⃗(β, α)| < 2rρ.

Now r can be made arbitrarily small, so that for any γ ∈ B(α, r) and any δ ∈ B(β, r),
we have ω~S1,ρ(e^{iγ}, e^{iδ}) ∈ (a, b). It follows that (e^{iα}, e^{iβ}) is contained in an open set con-
tained inside ω~S1,ρ^{−1}[(a, b)]. An analogous proof shows this to be true for the β < α case.

Next suppose α = β, and note that then 0 = ω~S1,ρ(e^{iα}, e^{iβ}) ∈ (a, b). Fix 0 < r < b/(2ρ).
We need to show ω~S1,ρ(B(α, r), B(α, r)) ⊆ (a, b). Let γ, δ ∈ B(α, r). There are three
subcases. If γ = δ, then ω~S1,ρ(e^{iγ}, e^{iδ}) = 0 ∈ (a, b). If d⃗(γ, δ) < 2r, then
ω~S1,ρ(e^{iγ}, e^{iδ}) < 2r < b. Finally, suppose d⃗(γ, δ) ≥ 2r. Then we must have d⃗(δ, γ) < 2r,
so ω~S1,ρ(e^{iγ}, e^{iδ}) ≤ ρ d⃗(δ, γ) < 2rρ < b. Thus for any γ, δ ∈ B(α, r), we have
ω~S1,ρ(e^{iγ}, e^{iδ}) ∈ (a, b).

It follows that ω~S1,ρ^{−1}[(a, b)] is open. This proves the claim.
Definition 14. Let ρ ∈ [1, ∞). We define the directed unit circle with reversibility ρ to be
(~S1 , ω~S1,ρ ). This is a compact, asymmetric network in CN dis .
This asymmetric network provides us with concrete examples of ε-approximations
(Definition 22), for any ε > 0. To see this, fix any n ∈ N, and consider the directed
circle network on n nodes with reversibility ρ obtained by writing
~S1n := {e^{2πik/n} ∈ C : k ∈ {0, 1, . . . , n − 1}},
and defining ωn,ρ to be the restriction of ω~S1,ρ on this set. The pair (~S1n , ωn,ρ ) is the network
thus obtained. An illustration of ~S1 and ~S1n for n = 6 is provided in Figure 1.2.
Theorem 17. As n → ∞, the sequence of finite dissimilarity networks (~S1n, ωn,ρ) converges
to the dissimilarity network (~S1, ω~S1,ρ) in the sense of dN.
Proof of Theorem 17. Let ε > 0, and let n ∈ N be such that 2π/n < ε. It suffices to show
that dN ((~S1 , ω~S1,ρ ), (~S1n , ωn,ρ )) < ε. Define a correspondence between ~S1 and ~S1n as follows:
R := {(e^{iθ}, e^{2πik/n}) : θ ∈ (2πk/n − ε/ρ, 2πk/n + ε), k ∈ {0, 1, 2, . . . , n − 1}}.

Essentially this is the same as taking ε-balls around each e^{2πik/n}, except that the reversibility
parameter skews one side of the ε-ball. Next let 0 ≤ θ1 ≤ θ2 < 2π, and let j, k ∈
{0, 1, . . . , n − 1} be such that θ1 ∈ (2πj/n − ε/ρ, 2πj/n + ε) and θ2 ∈ (2πk/n − ε/ρ, 2πk/n + ε).
Suppose first that k = j. Then

min(d⃗(θ1, θ2), d⃗(θ2, θ1)) ≤ ε + ε/ρ, and max(d⃗(θ1, θ2), d⃗(θ2, θ1)) ≤ ρ(ε + ε/ρ) = ρε + ε.
As ε → 0, this quantity tends to zero, so ω~S1,ρ (eiθ1 , eiθ2 ) → 0. The other cases follow from
similar observations; the key idea is that as ε → 0, the ω~S1,ρ value between any two points
on a “skewed ε-ball” also tends to 0.
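The finite samples (~S1n, ωn,ρ) are straightforward to generate numerically; the sketch below assumes the formula ω~S1,ρ(e^{iα}, e^{iβ}) = min(d⃗(α, β), ρ d⃗(β, α)) used above, and the function name is ours:

```python
import numpy as np

def directed_circle_weights(n, rho):
    """Weight matrix of the n-node directed circle with reversibility rho:
    omega(e^{i a}, e^{i b}) = min(d(a, b), rho * d(b, a)), where d(a, b) is
    the counterclockwise arc length from e^{i a} to e^{i b}."""
    theta = 2.0 * np.pi * np.arange(n) / n
    fwd = (theta[None, :] - theta[:, None]) % (2.0 * np.pi)  # d(a, b)
    return np.minimum(fwd, rho * fwd.T)
```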
Remark 18. Finite reversibility is critical when defining directed circles on n nodes. With-
out this condition, correspondences as above lead to terms like the following:

max(|ω~S1(e^{iθ1}, e^{iθ2}) − ω~S1(e^{2πij/n}, e^{2πik/n})|, |ω~S1(e^{iθ2}, e^{iθ1}) − ω~S1(e^{2πik/n}, e^{2πij/n})|) ≈ 2π.

The problem here is that as ω~S1(e^{iθ1}, e^{iθ2}) → 0, we adversely have ω~S1(e^{iθ2}, e^{iθ1}) → 2π.
Indeed, one of our later results (cf. §1.8) shows that CN is complete. Because (~S1, ω~S1) ∉
CN, it follows that there cannot be a sequence of finite networks converging to (~S1, ω~S1).
[Figure: the directed unit circle, its 6-node discretization, and the directed circle with reversibility ρ.]

Figure 1.2: The directed circle (~S1, ω~S1), the directed circle on 6 nodes (~S16, ω~S16), and the di-
rected circle with reversibility ρ, for some ρ ∈ [1, ∞). Traveling in a clockwise direction is
possible only in the directed circle with reversibility ρ, but this incurs a penalty modulated
by ρ.
[Figure: the 6-node cycle network, drawn as the directed cycle x1 → x2 → · · · → x6 → x1 with each consecutive edge of weight 1, alongside its weight matrix:]

     x1  x2  x3  x4  x5  x6
x1    0   1   2   3   4   5
x2    5   0   1   2   3   4
x3    4   5   0   1   2   3
x4    3   4   5   0   1   2
x5    2   3   4   5   0   1
x6    1   2   3   4   5   0
Figure 1.3: A cycle network on 6 nodes, along with its weight matrix. Note that the weights
are highly asymmetric.
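As the matrix above suggests, the cycle network Gn is generated by the rule ωX(xi, xj) = (j − i) mod n. A short sketch (the function name is ours):

```python
import numpy as np

def cycle_network(n):
    """Weight matrix of the cycle network G_n: the weight from x_i to x_j is
    the number of directed hops i -> i+1 -> ... -> j, taken modulo n."""
    idx = np.arange(n)
    return (idx[None, :] - idx[:, None]) % n

print(cycle_network(6))  # reproduces the matrix in Figure 1.3
```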
Matrix of means µ:
 1     10    11    12    13
 20    1.75  14    15    16
 21    24    2.5   17    18
 22    25    27    3.25  19
 23    26    28    29    4

Matrix of variances σ²:
 1 3 3 3 3
 3 1 3 3 3
 3 3 1 3 3
 3 3 3 1 3
 3 3 3 3 1
Figure 1.4: A network SBM on 50 nodes, split into 5 communities, along with the matrices
of means and variances. The deepest blue corresponds to values ≈ 1, and the deepest
yellow corresponds to values ≈ 29.
[Diagram: the constructions R (Vietoris-Rips), Dsi (Dowker sink), Dso (Dowker source), and D send finite networks FN to filtrations F, which homology Hk sends to persistence diagrams Dgm; s and t denote the symmetrization and transposition maps.]
1.4.1 Background on persistent homology
Homology is a classical construction which assigns a k-dimensional signature to a topo-
logical space, where k ranges over the nonnegative integers. When the space is a simplicial
complex, the resulting homology theory is called simplicial homology. In practice, simpli-
cial homology is readily computable via matrix operations.
Datasets “in the wild” are typically discrete objects. More specifically, our measure-
ment and recording technologies are discrete, and therefore the datasets that we curate from
a process are necessarily discrete. A priori, these datasets are equipped with the uninterest-
ing discrete topology. However, there are several well-studied [47] methods for imposing
an artificial topology on a discrete dataset. Here are two examples.
Example 21 (Čech complexes). Let (X, dX) be a metric space. Given a scale parameter
δ ≥ 0, the Čech complex at scale δ is

Č^δ(X) := {σ ⊆ X finite, nonempty : ∩_{x∈σ} B(x, δ) ≠ ∅},

where B(x, δ) := {x′ ∈ X : dX(x, x′) ≤ δ}. The Čech complex of a metric space X at
scale δ thus coincides with the nerve simplicial complex when we take a cover of X by δ-balls:
Definition 15 (Nerve of a cover). Let X be a topological space, and let A = {Ai }i∈I be an
open cover of X indexed by I. The nerve of A is the simplicial complex N (A) := {σ ∈
pow(I) : σ is finite, nonempty, and ∩i∈σ Ai 6= ∅}.
Theorem 24 (Nerve theorem [68] Corollary 4G.3). Let X be a paracompact space (every
open cover admits a locally finite open refinement), and let A be an open cover such that
every nonempty, finite intersection of sets in A is contractible. Then X ≃ |N(A)|.
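For a finite cover of a finite set, the nerve of Definition 15 can be computed by direct enumeration. A minimal sketch (ours; exponential in the number of cover elements, for illustration only):

```python
import itertools

def nerve(cover):
    """Nerve of a finite cover, given as a list of Python sets: all nonempty
    index subsets whose cover elements have a common point."""
    idx = range(len(cover))
    return [sigma for k in range(1, len(cover) + 1)
            for sigma in itertools.combinations(idx, k)
            if set.intersection(*(cover[i] for i in sigma))]

# Example: three arcs covering a "circle" of six points; the nerve is the
# hollow triangle, since the triple intersection is empty.
print(nerve([{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]))
```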
Returning to the topic of imposing an artificial topology on a dataset via one of these
constructions, the following question comes to mind: what is the “correct” scale param-
eter to use when defining either the Vietoris-Rips or the Čech complexes? The theory
of persistent homology (PH) enables the user to bypass this consideration and instead
view the homological signatures at a range of scale parameters, along with information
about how signatures from one resolution “include” into the signatures at another resolu-
tion [56, 103, 49, 123]. The essential idea is to fix a method for “topologizing” a dataset
(e.g. the Vietoris-Rips or Čech constructions), choose a collection of scale parameters
0 ≤ δ0 < δ1 < . . . < δn (e.g. choose all the scales at which new simplices are added),
and then apply the (simplicial) homology functor with coefficients in a field. The nested
simplicial complexes and their inclusion maps

Kδ0(X) ↪ Kδ1(X) ↪ · · · ↪ Kδn(X)

then yield, upon applying homology, a sequence of vector spaces and linear maps

H•(Kδ0(X)) → H•(Kδ1(X)) → · · · → H•(Kδn(X)).

Here H• denotes homology in a given dimension • ranging over Z+. The entire col-
lection {Kδ(X) ↪ Kδ′(X)}δ≤δ′ is known as a simplicial filtration or a filtered simplicial
complex, and the collection of vector spaces with linear maps is a persistent vector space.
More precisely, we have the following definition:
Definition 16. A persistent vector space V is a family {V^δ → V^{δ′}}_{δ≤δ′∈R} of vector
spaces and linear maps ν_{δ,δ′} : V^δ → V^{δ′} such that: (1) ν_{δ,δ} is the identity map for any
δ ∈ R, and (2) ν_{δ,δ″} = ν_{δ′,δ″} ∘ ν_{δ,δ′} whenever δ ≤ δ′ ≤ δ″.
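To make the bookkeeping concrete, here is a toy sketch (our own encoding, not from the thesis) of a persistent vector space over three scales, with the functoriality condition checked by matrix composition:

```python
import numpy as np

# Linear maps V^{d0} -> V^{d1} -> V^{d2} for three scales d0 <= d1 <= d2.
nu_01 = np.array([[1.0, 0.0],
                  [0.0, 0.0]])   # one class dies between d0 and d1
nu_12 = np.array([[1.0, 0.0]])   # dimension drops from 2 to 1 between d1 and d2

# Condition (2) of Definition 16 forces nu_{d0,d2} to be the composite:
nu_02 = nu_12 @ nu_01
print(np.linalg.matrix_rank(nu_02))  # rank 1: one class persists across [d0, d2]
```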
A classification result in [21, §5.2] shows that at least in “nice” settings, a certain object
called a persistence diagram/barcode ([26]) is a full invariant of a persistent vector space.
When it is well-defined, a persistence diagram is essentially a list of the topological signa-
tures of the dataset along with the ranges of scale parameters along which each signature
persists. In typical data analysis use cases, the barcode or diagram is simply the output of
applying persistent homology to a dataset.
Persistent homology computations are matrix operations and hence computable, and
there are numerous software packages currently available for efficient PH computations.
PH computation is theoretically justified by certain stability theorems, one of which is the
following (an early version for Vietoris-Rips filtrations on finite metric spaces appeared in
[25]):
Theorem 25 (PH stability for metric spaces, [27] Theorem 5.2). Let X, Y be two totally
bounded metric spaces, and let k ∈ Z+ . Let Dgm•k denote the k-dimensional persistence
diagram of either the Vietoris-Rips or Čech filtration. Then we have:

dB(Dgm•_k(X), Dgm•_k(Y)) ≤ 2 dGH(X, Y).
Here dB is a certain metric on persistence diagrams called the bottleneck distance. It is
essentially a matching metric that can be computed via the Hungarian algorithm. Stability
results of this form show that PH outputs change in a controlled way when the input dataset
is perturbed. This provides the theoretical justification for using PH in data analysis.
Finally, we introduce another definition that will be of use to us: the interleaving dis-
tance dI is an extended pseudometric between persistent vector spaces (cf. §3.1.2). Even
when persistent vector spaces do not have well-defined persistence diagrams, we can refer
to the interleaving distance between the persistent vector spaces.
analogue of the directed Rips/flag/OT complex. In this way, our work complements the
contributions of [118].
We make a final remark to situate our work in the existing literature. In a 2018 update
to [118], Turner asks if any of the directed generalizations of the Vietoris-Rips complex
hold in the setting of infinite networks (more precisely, the question is about infinite “set-
function” pairs, which are just networks in our terminology). The answer is “yes”: as
we had already shown in [34], by the framework we develop for infinite networks, and in
particular by our notion of ε-systems, all of these directed generalizations of the Vietoris-
Rips and Čech complex constructions are well-defined for compact (in particular, infinite)
networks.
To any network (X, ωX), we may associate the Vietoris-Rips filtration {R^δ_X ↪ R^{δ′}_X}_{δ≤δ′},
where R^δ_X := {σ ⊆ X finite, nonempty : ωX(x, x′) ≤ δ for all x, x′ ∈ σ}. We denote the
k-dimensional persistent vector space associated to this filtration by PVec^R_k(X).

It is not at all clear that the corresponding persistence diagram Dgm^R_k(X) is well-
defined in general, although it is well-defined when (X, ωX) ∈ FN (the finite case is
easy to see). The fact that the persistence diagram is well-defined when (X, ωX) ∈ CN is
presented in Theorem 86, and is a consequence of the machinery we develop for dN.
The Vietoris-Rips persistence diagram, when defined, is stable to small perturbations
of the input data:
Proposition 26. Let (X, ωX), (Y, ωY) ∈ CN, and let k ∈ Z+. Then:

dI(PVec^R_k(X), PVec^R_k(Y)) ≤ 2 dN(X, Y).
We omit the proof because it is similar to that of Proposition 29, which we will prove in
detail. We also remark that we obtain stability for persistence diagrams with respect to the
bottleneck distance in Corollary 177. More specifically, Corollary 177 states that we have:
dB(Dgm^R_k(X), Dgm^R_k(Y)) ≤ 2 dN(X, Y).
Remark 27. The preceding result serves a dual purpose: (1) it shows that the Vietoris-Rips
persistence diagram is robust to noise in input data, and (2) it shows that instead of comput-
ing the network distance between two networks, one can compute the bottleneck distance
between their Vietoris-Rips persistence diagrams as a suitable proxy. The advantage to
computing bottleneck distance is that it can be done in polynomial time (see [52]), whereas
computing dN is NP-hard in general. We remind the reader that the problem of comput-
ing dN includes the problem of computing the Gromov-Hausdorff distance between finite
metric spaces, which is an NP-hard problem [105]. We remark that the idea of computing
Vietoris-Rips persistence diagrams to compare finite metric spaces first appeared in [25],
and moreover, that Proposition 26 is an extension of Theorem 3.1 in [25].
The Vietoris-Rips filtration in the setting of symmetric networks has been used in [71,
23, 60, 99], albeit without addressing stability results.
We now introduce a definition that will help us gauge the performance of various PH
methods on directed networks.
Definition 17 (Symmetrization and Transposition). Define the max-symmetrization map s : N → N by (X, ωX) ↦ (X, ω̂X), where for any network (X, ωX), we define ω̂X : X × X → R as follows:

ω̂X(x, x′) := max(ωX(x, x′), ωX(x′, x)), for x, x′ ∈ X.

Also define the transposition map t : N → N by (X, ωX) ↦ (X, ω⊤X), where for any (X, ωX) ∈ N, we define ω⊤X(x, x′) := ωX(x′, x) for x, x′ ∈ X. For convenience, we denote X⊤ := t(X) for any network X.
Remark 28 (Vietoris-Rips is insensitive to asymmetry). A critical weakness of the Vietoris-
Rips complex construction is that it is not sensitive to asymmetry. To see this, consider the
symmetrization map s defined in Definition 17, and let (X, ωX) ∈ FN. Now for any σ ∈ pow(X), we have max_{x,x′∈σ} ωX(x, x′) = max_{x,x′∈σ} ω̂X(x, x′). It follows that for each δ ≥ 0, the Rips complexes of (X, ωX) and (X, ω̂X) = s(X, ωX) are equal, i.e. R = R ◦ s. Thus the Rips persistence diagrams of the original and max-symmetrized networks are equal.
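To make Definition 17 and the observation above concrete, the following minimal numpy sketch (our own illustration, not the software accompanying this thesis) implements s and t on weight matrices and checks that the Rips filtration value max_{x,x′∈σ} ωX(x, x′) of every simplex is unchanged by max-symmetrization.

```python
import numpy as np
from itertools import combinations

def symmetrize(omega):
    """Max-symmetrization s: omega_hat(x, x') = max(omega(x, x'), omega(x', x))."""
    return np.maximum(omega, omega.T)

def transpose(omega):
    """Transposition t: omega_T(x, x') = omega(x', x)."""
    return omega.T.copy()

def rips_value(omega, sigma):
    """Rips filtration value of a simplex sigma: max weight over pairs in sigma."""
    return omega[np.ix_(sigma, sigma)].max()

# A hypothetical asymmetric 3-node network.
omega = np.array([[0., 1., 4.],
                  [6., 0., 2.],
                  [5., 3., 0.]])

for k in range(1, 4):
    for sigma in combinations(range(3), k):
        s = list(sigma)
        assert rips_value(omega, s) == rips_value(symmetrize(omega), s)
# Every simplex enters the Rips filtration at the same value, i.e. R = R o s.
```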
Given δ ∈ R, consider the relation R_{δ,X} := {(x, x′) ∈ X × X : ωX(x, x′) ≤ δ}. Then R_{δ,X} ⊆ X × X, and for any δ′ ≥ δ, we have R_{δ,X} ⊆ R_{δ′,X}. Using R_{δ,X}, we build a simplicial complex D^si_{δ,X} as follows:

D^si_{δ,X} := {σ = [x0, . . . , xn] : there exists x′ ∈ X such that (xi, x′) ∈ R_{δ,X} for each xi}.   (1.2)

If σ ∈ D^si_{δ,X}, it is clear that any face of σ also belongs to D^si_{δ,X}. We call D^si_{δ,X} the Dowker δ-sink simplicial complex associated to X, and refer to x′ as a δ-sink for σ (where σ and x′ should be clear from context).
Since (R_{δ,X})_δ is an increasing sequence of sets, it follows that (D^si_{δ,X})_δ is an increasing sequence of simplicial complexes. In particular, for δ′ ≥ δ, there is a natural inclusion map D^si_{δ,X} ↪ D^si_{δ′,X}. We write D^si_X to denote the filtration {D^si_{δ,X} ↪ D^si_{δ′,X}}_{δ≤δ′} associated to X. We call this the Dowker sink filtration on X. The corresponding persistent vector space is denoted PVec^si_k(X). When it is defined, we will denote the k-dimensional persistence diagram arising from this filtration by Dgm^si_k(X). Once again, we point the reader to Theorem 86, where we show that this diagram is well-defined when (X, ωX) ∈ CN. The case (X, ωX) ∈ FN is well-defined for easy reasons, and the reader may keep the finite case in mind for now.
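For a finite network, Equation (1.2) can be implemented directly by brute force; the sketch below is our own illustration on a hypothetical 3-node network, not the software accompanying this thesis.

```python
from itertools import combinations

def dowker_sink_complex(omega, delta):
    """All simplices of the Dowker delta-sink complex of a finite network.

    omega: square matrix (list of lists) with omega[x][xp] the weight.
    A subset sigma is a simplex iff some sink xp satisfies
    omega[x][xp] <= delta for every x in sigma (Equation (1.2))."""
    n = len(omega)
    return [sigma
            for k in range(1, n + 1)
            for sigma in combinations(range(n), k)
            if any(all(omega[x][xp] <= delta for x in sigma) for xp in range(n))]

# Hypothetical 3-node network (not taken from the text):
omega = [[0, 1, 4],
         [6, 0, 2],
         [5, 3, 0]]
print(dowker_sink_complex(omega, 2))
# [(0,), (1,), (2,), (0, 1), (1, 2)]: node 1 is a 2-sink for [0, 1],
# and node 2 is a 2-sink for [1, 2].
```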
D^si_{δ,X} = ∅ for δ < −1; {[a]} for −1 ≤ δ < 0; {[a], [b], [c]} for 0 ≤ δ < 1; and {[a], [b], [c], [ab], [bc], [ac], [abc]} for δ ≥ 1.

D^so_{δ,X} = ∅ for δ < −1; {[a]} for −1 ≤ δ < 0; {[a], [b], [c]} for 0 ≤ δ < 1; {[a], [b], [c], [ab], [ac]} for 1 ≤ δ < 2; and {[a], [b], [c], [ab], [bc], [ac], [abc]} for δ ≥ 2.

Figure 1.6: Computing the Dowker sink and source complexes of a network (X, ωX) on nodes a, b, c (drawing omitted; the complexes are listed above). Observe that the sink and source complexes are different in the range 1 ≤ δ < 2.
Practitioners of persistent homology might recall that there are two Dowker complexes
[59, p. 73]. One of these is the sink complex defined above. We define its dual below:
D^so_{δ,X} := {σ = [x0, . . . , xn] : there exists x′ ∈ X such that (x′, xi) ∈ R_{δ,X} for each xi}.   (1.3)
We call D^so_{δ,X} the Dowker δ-source simplicial complex associated to X. The filtration {D^so_{δ,X} ↪ D^so_{δ′,X}}_{δ≤δ′} associated to X is called the Dowker source filtration, denoted D^so_X. We denote the k-dimensional persistence diagram (when defined) arising from this filtration by Dgm^so_k(X). Notice that any construction using D^si_{δ,X} can also be repeated using D^so_{δ,X}, so we focus on the case of the sink complexes and restate results in terms of source complexes where necessary. A subtle point to note here is that each of these Dowker complexes can be used to construct a persistence diagram. A folklore result in the literature about persistent homology of metric spaces, known as Dowker duality, is that the two persistence diagrams arising this way are equal [27, Remark 4.8]:

Dgm^si_k(X) = Dgm^so_k(X) for any k ∈ Z+.
Thus it makes sense to talk about “the” Dowker diagram associated to X. In particular,
in §1.4.5 we describe a stronger result—a functorial Dowker theorem—from which the
duality follows easily in the general setting of networks.
The sink and source filtrations are not equal in general; this is illustrated in Figure 1.6.
As in the case of the Rips filtration, both the Dowker sink and source filtrations are
stable.
Once again, we obtain stability for persistence diagrams with respect to the bottleneck distance in Corollary 177. More specifically, Corollary 177 states that we have:

d_B(Dgm^D_k(X), Dgm^D_k(Y)) ≤ 2 d_N(X, Y).
Remark 30. The preceding result shows that the Dowker persistence diagram is robust
to noise in input data, and that the bottleneck distance between Dowker persistence dia-
grams arising from two networks can be used as a proxy for computing the actual network
distance. Note the analogy with Remark 27.
Both the Dowker and Rips filtrations are valid methods for computing persistent ho-
mology of networks, by virtue of their stability results (Propositions 26 and 29). However,
we present the Dowker filtration as an appropriate method for capturing directionality in-
formation in directed networks. In §1.4.6 we discuss this particular feature of the Dowker
filtration in full detail.
Remark 31 (Symmetric networks). In the setting of symmetric networks, the Dowker sink and source simplicial filtrations coincide, and so we automatically obtain Dgm^so_k(X) = Dgm^si_k(X) for any k ∈ Z+ and any (X, ωX) ∈ CN.
Remark 32 (The metric space setting and relation to witness complexes). When restricted
to the setting of metric spaces, the Dowker complex resembles a construction called the
witness complex [44]. In particular, a version of the Dowker complex for metric spaces,
constructed in terms of landmarks and witnesses, was discussed in [27], along with sta-
bility results. When restricted to the special networks that are pseudo-metric spaces, our
definitions and results agree with those presented in [27].
Given two sets X, Y and a nonempty relation R ⊆ X × Y, one may define two simplicial complexes ER and FR as follows. A finite subset σ ⊆ X belongs to ER whenever there
exists y ∈ Y such that (x, y) ∈ R for each x ∈ σ. Similarly a finite subset τ ⊆ Y belongs to
FR whenever there exists x ∈ X such that (x, y) ∈ R for each y ∈ τ . These constructions
can be traced back to [46], who proved the following result that we refer to as Dowker’s
theorem:
Theorem 33 (Dowker’s theorem; Theorem 1a, [46]). Let X, Y be two totally ordered sets,
let R ⊆ X × Y be a nonempty relation, and let ER , FR be as above. Then for each k ∈ Z+ ,
Hk(ER) ≅ Hk(FR).
There is also a strong form of Dowker’s theorem that Björner proves via the classical
nerve theorem [11, Theorems 10.6, 10.9]. Below we write |X| to denote the geometric
realization of a simplicial complex X.
Theorem 34 (The strong form of Dowker's theorem; Theorem 10.9, [11]). Under the assumptions of Theorem 33, we in fact have |ER| ≃ |FR|.
The Functorial Dowker Theorem is the following generalization of the strong form of
Dowker’s theorem: instead of a single nonempty relation R ⊆ X × Y , consider any pair
of nested, nonempty relations R ⊆ R0 ⊆ X × Y . Then there exist homotopy equivalences
between the geometric realizations of the corresponding complexes that commute with the
canonical inclusions, up to homotopy. We formalize this statement below.
Theorem 35 (The Functorial Dowker Theorem (FDT)). Let X, Y be two totally ordered
sets, let R ⊆ R0 ⊆ X × Y be two nonempty relations, and let ER , FR , ER0 , FR0 be their
associated simplicial complexes. Then there exist homotopy equivalences Γ|ER | : |FR | →
|ER | and Γ|ER0 | : |FR0 | → |ER0 | such that the following diagram commutes up to homotopy:
[The square diagram with horizontal maps |ι_F| : |F_R| → |F_{R′}| and |ι_E| : |E_R| → |E_{R′}|, and vertical maps Γ_{|E_R|} and Γ_{|E_{R′}|}, commutes up to homotopy.]

In other words, we have |ι_E| ◦ Γ_{|E_R|} ≃ Γ_{|E_{R′}|} ◦ |ι_F|, where ι_E, ι_F are the canonical inclusions.
Björner remarks that the nerve theorem and the strong form of Dowker's theorem are equivalent, in the sense that each one implies the other. We were not able to find an elementary proof of the strong form of Dowker's theorem in the existing literature. However, such an elementary proof is provided by our proof of Theorem 35 (given in Section 3.2.2), which we obtained by extending ideas in Dowker's original proof of Theorem 33.²
Whereas the Functorial Dowker Theorem and our elementary proof are of indepen-
dent interest, it has been suggested in [27, Remark 4.8] that such a functorial version of
Dowker’s theorem could also be proved using a functorial nerve theorem [28, Lemma 3.4].
Despite being an interesting possibility, we were not able to find a detailed proof of this
claim in the literature. In addition, Björner’s remark regarding the equivalence between the
nerve theorem and the strong form of Dowker’s theorem suggests the following question:
Question 1. Are the Functorial Nerve Theorem (FNT) of [28] and the Functorial Dowker
Theorem (FDT, Theorem 35) equivalent?
This question is of fundamental importance because the Nerve Theorem is a crucial tool
in the applied topology literature and its functorial generalizations are equally important in
persistent homology. In general, the answer is no, and moreover, one (of the FNT and FDT)
is not stronger than the other. The FNT of [28] is stated for paracompact spaces, which are
more general than the simplicial complexes of the FDT. However, the FNT of [28] is stated
for spaces with finitely-indexed covers, so the associated nerve complexes are necessarily
finite. All the complexes involved in the statement of the FDT are allowed to be infinite, so
the FDT is more general than the FNT in this sense.
To clarify these connections, we formulate a simplicial Functorial Nerve Theorem (The-
orem 38) and prove it via a finite formulation of the FDT (Theorem 36). In turn, we show
that the simplicial FNT implies the finite FDT, thus proving the equivalence of these for-
mulations (Theorem 39).
We begin with a weaker formulation of Theorem 35 and some simplicial Functorial
Nerve Theorems.
Theorem 36 (The finite FDT). Let X, Y be two totally ordered sets, and without loss of
generality, suppose X is finite. Let R ⊆ R0 ⊆ X × Y be two nonempty relations, and let
ER , FR , ER0 , FR0 be their associated simplicial complexes (as in Theorem 35). Then there
exist homotopy equivalences Γ|ER | : |FR | → |ER | and Γ|ER0 | : |FR0 | → |ER0 | that commute
up to homotopy with the canonical inclusions.
The finite FDT (Theorem 36) is an immediate consequence of the general FDT (Theo-
rem 35).
Recall the definition of the nerve complex. Let A = {Ai}_{i∈I} be a family of nonempty sets indexed by I. The nerve of A is the simplicial complex N(A) := {σ ∈ pow(I) : σ is finite, nonempty, and ∩_{i∈σ} Ai ≠ ∅}.
²A thread with ideas towards the proof of Theorem 34 was discussed in [1, last accessed 4.24.2017], but the proposed strategy was incomplete. We have inserted an addendum in [1] proposing a complete proof with a slightly different construction.
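As a quick illustration of the nerve construction (our own sketch, for a finite family of finite sets, and not part of the thesis software):

```python
from itertools import combinations

def nerve(family):
    """Nerve of a finite family of finite sets {i: A_i}: all nonempty subsets
    sigma of the index set such that the A_i, i in sigma, intersect."""
    idx = list(family)
    return [sigma
            for k in range(1, len(idx) + 1)
            for sigma in combinations(idx, k)
            if set.intersection(*(set(family[i]) for i in sigma))]

# Three arcs covering a discretized circle {0, ..., 5}; the nerve is the
# boundary of a triangle, reflecting the homotopy type of the circle.
cover = {0: [0, 1, 2], 1: [2, 3, 4], 2: [4, 5, 0]}
print(nerve(cover))  # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```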
Definition 18 (Covers of simplices and subcomplexes). Let Σ be a simplicial complex.
Then a collection of subcomplexes AΣ = {Σi }i∈I is said to be a cover of subcomplexes for
Σ if Σ = ∪i∈I Σi . Furthermore, AΣ is said to be a cover of simplices if each Σi ∈ AΣ has the
property that Σi = pow(V (Σi )). In this case, each Σi has precisely one top-dimensional
simplex, consisting of the vertex set V (Σi ).
We present two simplicial formulations of the Functorial Nerve Theorem that turn out
to be equivalent; the statements differ in that one is about covers of simplices and the other
is about covers of subcomplexes.
Theorem 37 (Functorial Nerve I). Let Σ ⊆ Σ′ be two simplicial complexes, and let A_Σ = {Σi}_{i∈I}, A_{Σ′} = {Σ′i}_{i∈I′} be finite covers of simplices for Σ and Σ′ such that I ⊆ I′ and Σi ⊆ Σ′i for each i ∈ I. In particular, card(I′) < ∞. Suppose that for each finite subset σ ⊆ I′, the intersection ∩_{i∈σ} Σ′i is either empty or contractible (and likewise for ∩_{i∈σ} Σi). Then |Σ| ≃ |N(A_Σ)| and |Σ′| ≃ |N(A_{Σ′})|, via maps that commute up to homotopy with the canonical inclusions.
Theorem 38 (Functorial Nerve II). The statement of Theorem 37 holds even if A_Σ and A_{Σ′} are covers of subcomplexes. Explicitly, the statement is as follows. Let Σ ⊆ Σ′ be two simplicial complexes, and let A_Σ = {Σi}_{i∈I}, A_{Σ′} = {Σ′i}_{i∈I′} be finite covers of subcomplexes for Σ and Σ′ such that I ⊆ I′ and Σi ⊆ Σ′i for each i ∈ I. In particular, card(I′) < ∞. Suppose that for each finite subset σ ⊆ I′, the intersection ∩_{i∈σ} Σ′i is either empty or contractible (and likewise for ∩_{i∈σ} Σi). Then |Σ| ≃ |N(A_Σ)| and |Σ′| ≃ |N(A_{Σ′})|, via maps that commute up to homotopy with the canonical inclusions.
Theorem 39 (Equivalence). The finite FDT, the FNT I, and the FNT II are all equivalent. Moreover, all of these results are implied by the FDT, as below:

[Diagram: FDT (Theorem 35) ⟹ finite FDT (Theorem 36) ⟺ FNT I (Theorem 37) ⟺ FNT II (Theorem 38).]
In this section, we consider the cycle networks from §1.3, for which the Dowker per-
sistence diagrams capture meaningful structure, whereas the Rips persistence diagrams do
not.
We then probe the question “What happens to the Dowker or Rips persistence diagram
of a network upon reversal of one (or more) edges?” Intuitively, if either of these persistence
diagrams captures asymmetry, we would see a change in the diagram after applying this
reversal operation to an edge.
To provide further evidence that Dowker persistence is sensitive to asymmetry, we
computed both the Rips and Dowker persistence diagrams, in dimensions 0 and 1, of cy-
cle networks Gn , for values of n between 3 and 6. Computations were carried out using
Javaplex in Matlab with Z2 coefficients. The results are presented in Figure 1.7. Based
on our computations, we were able to conjecture and prove the result in Theorem 40, which
gives a precise characterization of the 1-dimensional Dowker persistence diagram of a cy-
cle network Gn , for any n. Furthermore, the 1-dimensional Dowker persistence barcode
for any Gn contains only one persistent interval, which agrees with our intuition that there
is only one nontrivial loop in Gn . On the other hand, for large n, the 1-dimensional Rips
persistence barcodes contain more than one persistent interval. This can be seen in the
Rips persistence barcode of G6 , presented in Figure 1.7. Moreover, for n = 3, 4, the 1-
dimensional Rips persistence barcode does not contain any persistent interval at all. This
suggests that Dowker persistence diagrams/barcodes are an appropriate method for analyz-
ing cycle networks, and perhaps asymmetric networks in general.
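For concreteness, the weight matrix of a cycle network Gn (where, as in §1.3, ωGn(xi, xj) counts the hops from xi to xj along the directed cycle) can be generated as follows; this sketch illustrates the input to such computations and is not the Javaplex code itself.

```python
import numpy as np

def cycle_network(n):
    """Weight matrix of the cycle network G_n: omega(i, j) is the number of
    hops from node i to node j along the directed cycle 0 -> 1 -> ... -> 0."""
    i, j = np.indices((n, n))
    return (j - i) % n

print(cycle_network(4))
# [[0 1 2 3]
#  [3 0 1 2]
#  [2 3 0 1]
#  [1 2 3 0]]
# By Theorem 40, the 1-dimensional Dowker diagram of G_4 is {(1, 2)}.
```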
The following theorem contains the characterization result for 1-dimensional Dowker
persistence diagrams of cycle networks.
Theorem 40. Let Gn = (Xn, ωGn) be a cycle network for some n ∈ N, n ≥ 3. Then we obtain:

Dgm^D_1(Gn) = {(1, ⌈n/2⌉)} ⊆ R².

Thus Dgm^D_1(Gn) consists of precisely the point (1, ⌈n/2⌉) ∈ R² with multiplicity 1.
Remark 41. From our experimental results (cf. Figure 1.7), it appears that the 1-dimensional
Rips persistence diagram of a cycle network does not admit a characterization as simple as
that given by Theorem 40 for the 1-dimensional Dowker persistence diagram. Moreover,
the Rips complexes R^δ_{Gn}, δ ∈ R, n ∈ N correspond to certain types of independence com-
plexes that appear independently in the literature, and whose homotopy types remain open
[53, Question 5.3]. On a related note, we point the reader to [3] for a complete characteri-
zation of the homotopy types of Rips complexes of points on the circle (equipped with the
restriction of the arc length metric).
To elaborate on the connection to [53], we write H^k_n to denote the undirected graph with vertex set {1, . . . , n}, and edges given by pairs (i, j) where 1 ≤ i < j ≤ n and either j − i < k or (n + i) − j < k. Next we write Ind(H^k_n) to denote the independence complex of H^k_n, which is the simplicial complex consisting of subsets σ ⊆ {1, 2, . . . , n} such that no two elements of σ are connected by an edge in H^k_n. Then we have Ind(H^k_n) = R^{n−k}_{Gn} for each k, n ∈ N such that k < n. To gain intuition for this equality, fix a basepoint 1, and consider the values of j ∈ N for which the simplex [1, j] belongs to Ind(H^k_n) and to R^{n−k}_{Gn}, respectively. In either case, we have k + 1 ≤ j ≤ n − k + 1. Using the rotational symmetry of the points, one then obtains the remaining 1-simplices. Rips complexes are determined by their 1-skeleton, so this suffices to construct R^{n−k}_{Gn}. Analogously, Ind(H^k_n) is determined by the edges in H^k_n, and hence also by its 1-skeleton. In [53, Question 5.3], the author writes that the homotopy type of Ind(H^k_n) is still unsolved. Characterizing the persistence diagrams Dgm^R_k(Gn) thus seems to be a useful future step, both in providing a computational suggestion for the homotopy type of Ind(H^k_n), and also in providing a valuable example in the study of persistence of directed networks.
Remark 42. Theorem 40 has the following implication for data analysis: nontrivial 1-
dimensional homology in the Dowker persistence diagram of an asymmetric network sug-
gests the presence of directed cycles in the underlying data. Of course, it is not necessarily
true that nontrivial 1-dimensional persistence can occur only in the presence of a directed cycle.
Remark 43. Our motivation for studying cycle networks is that they constitute directed
analogues of circles, and we were interested in seeing if the 1-dimensional Dowker per-
sistence diagram would be able to capture this analogy. Theorem 40 shows that this is
indeed the case: we get a single nontrivial 1-dimensional persistence interval, which is
what we would expect when computing the persistent homology of a circle in the metric
space setting.
While we had initially proved Theorem 40 using elementary methods, Henry Adams
observed that Dowker complexes on cycle networks can be precisely related to nerve com-
plexes built over arcs on the circle. Using the techniques developed in [3] and [4], one
obtains the following results:
As a special case, we know that if n is odd, then Dgm^D_2(Gn) is trivial. If n is even, then Dgm^D_2(Gn) consists of the point (n/2, n/2 + 1) with multiplicity n/2 − 1.
Theorem 45 (Odd dimension). Fix n ∈ N, n ≥ 3. Then for l ∈ N, define M_l := {m ∈ N : nl/(l+1) < m < n(l+1)/(l+2)}. If M_l is empty, then Dgm^D_{2l+1}(Gn) is trivial. Otherwise, we have:

Dgm^D_{2l+1}(Gn) = {(a_l, ⌈n(l+1)/(l+2)⌉)},

where a_l := min{m ∈ M_l}. We use set notation (instead of multisets) to mean that the multiplicity is 1.
In particular, for l = 0, we have nl/(l+1) = 0 and n(l+1)/(l+2) = n/2 ≥ 3/2, so 1 ∈ M_0. Thus we have Dgm^D_1(Gn) = {(1, ⌈n/2⌉)}, and so Theorem 45 recovers Theorem 40 as a special case.
Remark 47 (Pair swaps and their effect). Let (X, ωX) ∈ CN, let z, z′ ∈ X, and let σ ∈ pow(X). Write S_X(z, z′) for the pair swap network obtained from X by exchanging the weights ωX(z, z′) and ωX(z′, z), and write ω^{z,z′}_X for its weight function. Then we have:

max_{x,x′∈σ} ω^{z,z′}_X(x, x′) = max_{x,x′∈σ} ωX(x, x′).

Using this observation, one then repeats the arguments used in the proof of Proposition 46 to show that:

Dgm^R_k(X) = Dgm^R_k(S_X(z, z′)), for each k ∈ Z+.
Figure 1.7: The first column contains illustrations of cycle networks G3 , G4 , G5 and G6 .
The second column contains the corresponding Dowker persistence barcodes, in dimen-
sions 0 and 1. Note that the persistent intervals in the 1-dimensional barcodes agree with
the result in Theorem 40. The third column contains the Rips persistence barcodes of each
of the cycle networks. Note that for n = 3, 4, there are no persistent intervals in dimension
1. On the other hand, for n = 6, there are two persistent intervals in dimension 1.
Figure 1.8: Three-node dissimilarity networks (X, ωX) and (Y, ωY) on nodes a, b, c; (Y, ωY) is obtained from (X, ωX) by swapping the weights between nodes a and c.
This encodes the intuitive fact that Rips persistence diagrams are blind to pair swaps. Moreover, successively applying the pair swap operation over all pairs produces the transpose of the original network, and so it follows that Dgm^R_k(X) = Dgm^R_k(X⊤).
On the other hand, k-dimensional Dowker persistence diagrams are not necessarily invariant to pair swaps when k ≥ 1. Indeed, Example 49 below constructs a network X for which there exist points z, z′ ∈ X such that

Dgm^D_1(X) ≠ Dgm^D_1(S_X(z, z′)).
However, 0-dimensional Dowker persistence diagrams are still invariant to pair swaps:

Proposition 48. Let (X, ωX) ∈ CN and let z, z′ be any two points in X. Then we have:

Dgm^D_0(X) = Dgm^D_0(S_X(z, z′)).
Example 49. Consider the three node dissimilarity networks (X, ωX ) and (Y, ωY ) in Fig-
ure 1.8. Note that (Y, ωY ) coincides with SX (a, c). We present both the Dowker and Rips
persistence barcodes obtained from these networks. Note that the Dowker persistence bar-
code is sensitive to the difference between (X, ωX ) and (Y, ωY ), whereas the Rips barcode
is blind to this difference.
To show how the Dowker complex is constructed, we also list the Dowker sink com-
plexes of the networks in Figure 1.8, and also the corresponding homology dimensions
across a range of resolutions. Note that when we write [a, b](a), we mean that a is a sink corresponding to the simplex [a, b].

Figure 1.9: Dowker persistence barcodes of networks (X, ωX) and (Y, ωY) from Figure 1.8.

Figure 1.10: Rips persistence barcodes of networks (X, ωX) and (Y, ωY) from Figure 1.8. Note that the Rips diagrams indicate no persistent homology in dimensions higher than 0, in contrast with the Dowker diagrams in Figure 1.9.
Note that for δ ∈ [3, 4), dim(H1(D^si_{δ,Y})) = 1, whereas dim(H1(D^si_{δ,X})) = 0 for each δ ∈ R.
Based on the discussion in Remark 47, Proposition 48, and Example 49, we conclude
the following:
Moral: Unlike Rips persistence diagrams, Dowker persistence diagrams are truly
sensitive to asymmetry.
Theorem 50. Recall the symmetrization and transposition maps s and t from Definition 17. Then:

1. R ◦ s = R,
2. R ◦ t = R,
3. D^si ◦ t = D^so.

Also, there exist (X, ωX), (Y, ωY) ∈ FN such that (D^si ◦ s)(X) ≠ D^si(X), and (D^so ◦ s)(Y) ≠ D^so(Y).

Proof. These follow from Example 49, Remark 28, and Proposition 46.
1.5 Persistent path homology of networks
To define PPH, we first summarize and condense some concepts that appeared in [62].
We also point the reader to Section 3.1.1 for the necessary background on chain complexes
and associated constructions.
The non-regular boundary map ∂^nr_p : Λp → Λ_{p−1} is defined by linearly extending

∂^nr_p([x0, . . . , xp]) := Σ^p_{i=0} (−1)^i [x0, . . . , x̂i, . . . , xp]

for each elementary p-path [x0, . . . , xp] ∈ Λp. Here x̂i denotes omission of xi from the sequence. The maps ∂^nr_• are referred to as the non-regular boundary maps. For p = −1, one defines ∂^nr_{−1} : Λ_{−1} → Λ_{−2} to be the zero map. Then ∂^nr_p ◦ ∂^nr_{p+1} = 0 for any integer p ≥ −1 [64, Lemma 2.2]. It follows that (Λp, ∂^nr_p)_{p∈Z+} is a chain complex.

For notational convenience, we will often drop the square brackets and commas and write paths of the form [a, b, c] as abc. We use this convention in the next example.
Example 52 (Paths on a double edge). We will soon explain the interaction between paths
on a set and the edges on a digraph. First consider a digraph on a vertex set Y = {a, b}
as in Figure 1.11. Notice that there is a legitimate “path” on this digraph of the form aba,
obtained by following the directions of the edges. But notice that applying ∂2nr to the 2-path
aba yields ∂2nr (aba) = ba − aa + ab, and aa is not a valid path on this particular digraph
(self-loops are disallowed). To handle situations like this, one needs to consider regular
paths, which are explained in the next section.
Figure 1.11: A digraph on vertex set Y = {a, b} with the double edge a → b and b → a.
One can further verify that ∂^nr_p(Ip) ⊆ I_{p−1} [64, Lemma 2.6], and so ∂^nr_p is well-defined on Λp/Ip. Since Rp ≅ Λp/Ip via a natural linear isomorphism, one can define ∂p : Rp → R_{p−1} as the pullback of ∂^nr_p via this isomorphism [64, Definition 2.7]. Then ∂p is referred to as the regular boundary map in dimension p, where p ∈ Z+. Now we obtain a new chain complex (Rp, ∂p)_{p∈Z+}.
Example 53 (Regular paths on a double edge). Consider again the digraph in Figure 1.11.
Applying the regular boundary map to the 2-path aba yields ∂2 (aba) = ba+ab. This exam-
ple illustrates the following general principle: Irregular paths arising from an application
of ∂• are treated as zeros.
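The two boundary maps are easy to prototype; the sketch below (our own illustration, with paths represented as tuples) reproduces the computations of Examples 52 and 53.

```python
from collections import defaultdict

def boundary_nr(path):
    """Non-regular boundary of an elementary p-path, e.g. ('a', 'b', 'a'):
    alternating sum of the faces obtained by omitting one vertex."""
    out = defaultdict(int)
    for i in range(len(path)):
        out[path[:i] + path[i + 1:]] += (-1) ** i
    return {f: c for f, c in out.items() if c != 0}

def boundary_regular(path):
    """Regular boundary: faces with a repeated consecutive vertex
    (irregular paths) are treated as zero."""
    return {f: c for f, c in boundary_nr(path).items()
            if all(f[i] != f[i + 1] for i in range(len(f) - 1))}

# Example 53 revisited: the 2-path aba on the double edge a <-> b.
print(boundary_nr(('a', 'b', 'a')))       # {('b','a'): 1, ('a','a'): -1, ('a','b'): 1}
print(boundary_regular(('a', 'b', 'a')))  # {('b','a'): 1, ('a','b'): 1}
```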
The K-linear span of the allowed elementary p-paths on (X, E) is denoted Ap = Ap(G) = Ap(X, E, K), and is called the space of allowed p-paths. One further defines A_{−1} := K and A_{−2} := {0}.

Figure 1.12: Two square digraphs: G_M with edges a → b, c → b, c → d, a → d, and G_N with edges w → x, x → y, w → z, z → y.
One further defines Ω_{−1} := A_{−1} ≅ K and Ω_{−2} := A_{−2} = {0}. Now it follows from the definitions that ∂p(Ωp) ⊆ Ω_{p−1} for any integer p ≥ −1. Thus we have a chain complex:

· · · −∂3→ Ω2 −∂2→ Ω1 −∂1→ Ω0 −∂0→ K −∂−1→ 0.
For each p ∈ Z+, the p-dimensional path homology groups of G = (X, E) are defined as:

H^Ξ_p(G) := ker(∂p|_{Ωp}) / im(∂_{p+1}|_{Ω_{p+1}}).
The crux of the Ω• construction lies in understanding Ω2(GN). Note that even though wxy, wzy ∉ Ω2(GN) (because ∂^{GN}_2(wxy) and ∂^{GN}_2(wzy) contain the term wy ∉ A1(GN)), we still have:

∂^{GN}_2(wxy − wzy) = wx + xy − wz − zy ∈ A1(GN),

so that wxy − wzy ∈ Ω2(GN). Elementary calculations show that dim(H^Ξ_1(GM)) = 1, and dim(H^Ξ_1(GN)) = 0. Thus path homology can successfully distinguish between these two squares.
To compare this with a simplicial approach, consider the directed clique complex homology studied in [102, 84, 118]. Given a digraph G = (X, E), the directed clique complex F_G is defined to be the ordered simplicial complex [88, p. 76] given by writing:

F_G := {(x0, x1, . . . , xk) : (xi, xj) ∈ E for all 0 ≤ i < j ≤ k}.

Here we use parentheses to denote ordered simplices. For the squares in Figure 1.12, we have:

F_{GM} = {a, b, c, d, ab, cb, cd, ad} and F_{GN} = {w, x, y, z, wx, xy, wz, zy},
Remark 55 (The challenge of finding a natural basis for Ω• ). The digraph GN in Example
54 is a minimal example showing that it is nontrivial to compute bases for the vector spaces
Ω• . Specifically, while it is trivial to read off bases for the allowed paths A• from a digraph,
one needs to consider linear combinations of allowed paths in a systematic manner to obtain
bases for the ∂-invariant paths.
Contrast this with the setting of simplicial homology: here the simplices themselves
form bases for the associated chain complex, so there is no need for an extra preprocessing
step. Thus when using PPH for asymmetric data, it is important to consider the trade-off
between greater sensitivity to asymmetry and increased computational cost.
We derive a procedure for systematically computing bases for Ω• in §4.4.
Definition 20. Let G = {G_δ ↪ G_{δ′}}_{δ≤δ′∈R} be a digraph filtration. Then for each p ∈ Z+, we define the p-dimensional persistent path homology of G to be the following persistent vector space:

PVec^Ξ_p(G) := {H^Ξ_p(G_δ) −(ι_{δ,δ′})#→ H^Ξ_p(G_{δ′})}_{δ≤δ′∈R}.

When it is defined, the diagram associated to PVec^Ξ_p(G) is denoted Dgm^Ξ_p(G).
Remark 57. While the preceding stability result is analogous to those for Vietoris-Rips and
Dowker persistence, the proofs in this setting require results on the homotopy of digraphs
that were recently developed in [63] (cf. Section 3.3.2).
Having defined PPH, we now answer some fundamental questions related to its char-
acterization. We show that PPH agrees with Čech/Dowker persistence on metric spaces in
dimension 1, but not necessarily in higher dimensions. We also show that in the asymmetric
case, PPH and Dowker agree in dimension 1 if a certain local condition is satisfied.
Example 58 (PPH vs Dowker for metric n-cubes). In the setting of metric spaces, PPH is generally different from Dowker persistence in dimensions ≥ 2. To see this, consider R^n equipped with the Euclidean distance for n ≥ 3. Define

□n := {(i1, i2, . . . , in) : ij ∈ {0, 1} for all 1 ≤ j ≤ n}.

Then G^δ_{□n} has no edges for δ < 1, and for δ = 1, it has precisely an edge between any two points of □n that differ in a single coordinate. But at δ = 1, G^δ_{□n} is homotopy equivalent to G^δ_{□n−1}: the homotopy equivalence is given by collapsing points that differ exactly on the nth coordinate (see Figure 1.14). Proceeding recursively, we see that G^δ_{□n} is contractible at δ = 1. However, D^si(□n) is not contractible at δ = 1. Moreover, an explicit verification for the n = 3 case shows that Dgm^D_2(□3) consists of the point (1, √2) with multiplicity 7. Thus Dgm^D_2(□3) ≠ Dgm^Ξ_2(□3).
Theorem 59. Let X = (X, AX) ∈ CN be a symmetric network, and fix K = Z/pZ for some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).
The preceding result shows that on metric spaces, PPH agrees with Dowker persistence
in dimension 1. The converse implication is not true: in §1.5.3, we provide a family of
highly asymmetric networks for which PPH agrees with Dowker persistence in dimension
1. On the other hand, the examples in Figure 1.13 show that equality in dimension 1 does
not necessarily hold for asymmetric networks. Moreover, it turns out that the four-point
configurations illustrated in Figure 1.13 can be used to give another partial characterization
of the networks for which PPH and Dowker persistence do agree in dimension 1. We
present this statement next.
     x1 x2 x3 x4          y1 y2 y3 y4
x1    0  1  2  2     y1    0  1  2  1
x2    2  0  2  2     y2    2  0  2  2
x3    2  1  0  2     y3    2  1  0  1
x4    1  2  1  0     y4    2  2  2  0

Figure 1.13: Two four-node networks X and Y with the weight matrices above. Working over Z/2Z coefficients, we find that Dgm^Ξ_1(X) and Dgm^D_1(Y) are trivial, whereas Dgm^D_1(X) = Dgm^Ξ_1(Y) = {(1, 2)}.
Definition 21 (Squares, triangles, and double edges). Let G be a finite digraph. Then we
define the following local configurations of edges between distinct nodes a, b, c, d:
• A double edge is a pair of edges (a, b), (b, a).
• A short square is a set of edges (a, b), (a, d), (c, b), (c, d) such that neither of (a, c),
(c, a), (b, d), (d, b) is an edge.
• A long square is a set of edges (a, b), (b, c), (a, d), (d, c) such that neither of (b, d),(a, c)
is an edge.
Finally, we define a network (X, AX) to be square-free if G^δ_X does not contain a four-point subset whose induced subgraph is a short or long square, for any δ ∈ R. An important observation is that to be a square, the subgraph induced by a four-point subset cannot just include one of the configurations described above; it must exclude diagonal edges as well.
Remark 60. We thank Guilherme Vituri for pointing out the need to exclude the b → d
edge in the definition of a long square.
Theorem 61. Let X = (X, AX) ∈ CN be a square-free network, and fix K = Z/pZ for some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).
Remark 62. The proofs of Theorems 59 and 61 both require an argument where simplices
are paired up—this requires us to use Z/pZ coefficients in both theorem statements.
Figure 1.14: Left: G^δ_{□3} is (digraph) homotopy equivalent to a point at δ = 1, as can be seen by collapsing points along the orange lines. Right: D^si_{δ,□3} becomes contractible at δ = √2, but has nontrivial homology in dimension 2 that persists across the interval [1, √2).
1.6 The case of compact networks
Having surveyed the constructions of persistent homology on networks, we return to
dN and discuss some more results.
Theorem 64 (∃ of refined ε-systems). Any compact network (X, ωX) has a refined ε-system U for any ε > 0. In particular, by picking a representative from each element of U, we get a finite network X′ such that dN(X, X′) < ε.
Remark 65. When considering a compact metric space (X, dX ), the preceding theorem
relates to the well-known notion of taking finite ε-nets in a metric space. Recall that for
ε > 0, a subset S ⊆ X is an ε-net if for any point x ∈ X, we have B(x, ε) ∩ S ≠ ∅. Such
an ε-net satisfies the nice property that dGH (X, S) < ε [17, 7.3.11]. In particular, one can
find a finite ε-net of (X, dX ) for any ε > 0 by compactness.
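For a finite sample with known distance matrix, a greedy farthest-point procedure produces such an ε-net; the following numpy sketch is our own illustration, not part of the thesis software.

```python
import numpy as np

def greedy_eps_net(D, eps):
    """Return indices of an eps-net of a finite metric space with distance
    matrix D: greedily add any point at distance >= eps from the net."""
    net = [0]
    while True:
        d_to_net = D[:, net].min(axis=1)   # distance of each point to the net
        far = int(d_to_net.argmax())
        if d_to_net[far] < eps:
            return net
        net.append(far)

# Hypothetical example: 200 random points on the unit circle.
theta = np.random.default_rng(0).uniform(0, 2 * np.pi, 200)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
net = greedy_eps_net(D, 0.5)
print(len(net))  # every point now lies within 0.5 of some net point
```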
We do not make quantitative estimates on the cardinality of the ε-approximation produced in Theorem 64. In the setting of compact metric spaces, the size of an ε-net relates to the rich theory of metric entropy developed by Kolmogorov and Tihomirov [51, Chapter 17].
The preceding result shows that refined ε-systems always exist; this result relies cru-
cially on the assumption that the network is compact. The proof of the theorem uses the
continuity of ωX : X × X → R and the compactness of X × X. In the setting of compact
subsets of Euclidean space or compact metric spaces, ε-systems are easy to construct: we
can just take a cover by ε-balls, and then extract a finite subcover by invoking compactness.
The strength of Theorem 64 lies in proving the existence of ε-systems even when symmetry
and triangle inequality (key requirements needed to guarantee the standard properties of ε-
balls) are not assumed. The next result shows that by sampling points from all the elements
of an ε-system, one obtains a finite, quantitatively good approximation to the underlying
network.
Theorem 66 (ε-systems and dN). Let (X, ωX) be a compact network, let ε > 0, and let U be an ε-system on X. Suppose X′ is any finite subset of X that has nonempty intersection with each element in U. Then there exists a correspondence R′ ∈ R(X, X′) such that dis(R′) < 4ε, and for each (x, x′) ∈ R′ we have {x, x′} ⊆ U for some U ∈ U. In particular, it follows that dN(X, X′) < 2ε.
The first statement in the preceding theorem asserts that we can choose a “well-behaved”
correspondence that associates to each point in X a point in X 0 that belongs to the same el-
ement in the ε-system. We omit the proof, as it follows essentially from the proof technique
for the next result.
By virtue of Theorem 64, one can always approximate a compact network up to any
given precision. The next theorem implies that a sampled network limits to the underlying
compact network as the sample gets more and more dense.
Theorem 67 (Limit of dense sampling). Let (X, ωX ) be a compact network, and let S =
{s1 , s2 , . . .} be a countable dense subset of X with a fixed enumeration. For each n ∈ N,
let Xn be the finite network with node set {s1 , . . . , sn } and weight function ωX |Xn ×Xn .
Then we have:
dN (X, Xn ) ↓ 0 as n → ∞.
We now briefly venture into measure networks, which are Polish spaces X equipped
with a Borel probability measure µX and an essentially bounded, measurable weight func-
tion ωX : X × X → R (more in §1.9.1). For such a network, it makes sense to ask about
an “optimal” ε-system, as in the next definition.
Definition 24. Let (X, ωX, µX) be a measure network. Let U be any ε-system on X. We define the minimal mass function by

m(U) := min{µX(U) : U ∈ U, µX(U) > 0}.

Note that m returns the minimal non-zero mass of an element in U.
Next let ε > 0. Define a function Mε : CN → (0, 1] as follows:
Mε (X) := sup {m(U) : U a refined ε-system on X} .
Since U covers X, we know that the total mass of U is 1. Thus the set of elements U with
positive mass is nonempty, and so m(U) is strictly positive. It follows that Mε (X) is strictly
positive. More is true when µX is fully supported on X: given any ε-system U on X and
any U ∈ U, we automatically have µX (U ) > 0. To see this, suppose µX (U ) = 0. Then
U ∩ supp(µX ) = ∅, which is a contradiction because supp(µX ) = X and U ∩ X 6= ∅ by
our convention for an open cover (i.e. that empty elements are excluded, see §).
In the preceding definition, for a given ε > 0, the function Mε (X) considers the collec-
tion of all refined ε-systems on X, and then maximizes the minimal mass of any element in
such an ε-system. For an example, consider the setting of Euclidean space Rd : ε-systems
can be constructed using ε-balls, and the mass of an ε-ball scales as εd . The functions in
Definition 24 are crucial to the next result, which shows that as we sample points from a
distribution on a network, the sampled subnetwork converges almost surely to the support
of the distribution.
Theorem 68 (Probabilistic network approximation). Let (X, ωX ) be a network equipped
with a Borel probability measure µX . For each i ∈ N, let xi : Ω → X be an independent
random variable defined on some probability space (Ω, F, P) with distribution µX . For
each n ∈ N, let Xn = {x1, x2, . . . , xn}. Let ε > 0. Then we have:

P({ω ∈ Ω : dN(supp(µX), Xn(ω)) ≥ ε}) ≤ (1 − M_{ε/2}(supp(µX)))^n / M_{ε/2}(supp(µX)),
where Xn (ω) is the subnetwork induced by {x1 (ω), . . . , xn (ω)}. In particular, the subnet-
work Xn converges almost surely to X in the dN -sense.
As noted before, the mass of an ε-ball in d-dimensional Euclidean space scales as ε^d. Thus in the setting of Euclidean space R^d, the quantity on the right would scale as ε^{−d}(1 − ε^d)^n.
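To make the bound concrete, one can solve for the number of samples n that drives the right-hand side below a prescribed failure probability η ∈ (0, 1). Writing M := M_{ε/2}(supp(µX)), a routine rearrangement (included here only for illustration) gives:

(1 − M)^n / M ≤ η  ⟺  n · log(1 − M) ≤ log(ηM)  ⟺  n ≥ log(ηM) / log(1 − M),

where the direction of the inequality flips because log(1 − M) < 0. For instance, M = 0.01 and η = 10^{−3} require n ≥ log(10^{−5}) / log(0.99) ≈ 1146 samples.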
Definition 25. Let (X, ωX) and (Y, ωY) ∈ N. We define X and Y to be Type I weakly isomorphic, denoted X ≅^w_I Y, if there exists a set Z and surjective maps ϕX : Z → X and ϕY : Z → Y such that ωX(ϕX(z), ϕX(z′)) = ωY(ϕY(z), ϕY(z′)) for each z, z′ ∈ Z.
Figure 1.15: Relaxing the requirements on the maps of this “tripod structure” is a natural
way to weaken the notion of strong isomorphism.
Notice that Type I weak isomorphism is in fact a relaxation of the notion of strong isomorphism. Indeed, if in addition to being surjective, we require the maps ϕX and ϕY to be injective, then the strong notion of isomorphism is recovered. In this case, the map ϕY ◦ ϕX^{−1} : X → Y would be a weight preserving bijection between the networks X and Y. The relaxation of strong isomorphism to a Type I weak isomorphism is illustrated in Figure 1.15. Also observe that the relaxation is strict. For example, the networks X = N1(1) and Y = N2(1_{2×2}) are weakly but not strongly isomorphic, via the map that sends both nodes of Y to the single node of X.
We remark that when dealing with infinite networks, it will turn out that an even weaker
notion of isomorphism is required. We define this weakening next.
Definition 26. Let (X, ωX) and (Y, ωY) ∈ N. We define X and Y to be Type II weakly isomorphic, denoted X ≅^w_{II} Y, if for each ε > 0, there exists a set Z_ε and surjective maps ϕ^ε_X : Z_ε → X and ϕ^ε_Y : Z_ε → Y such that

|ωX(ϕ^ε_X(z), ϕ^ε_X(z′)) − ωY(ϕ^ε_Y(z), ϕ^ε_Y(z′))| < ε for all z, z′ ∈ Z_ε.   (1.4)
Remark 69 (Type I isomorphism is stronger than Type II). Let (X, ωX), (Y, ωY) ∈ CN and suppose ϕ : X → Y is a surjective map such that ωX(x, x′) = ωY(ϕ(x), ϕ(x′)) for all x, x′ ∈ X. Then X and Y are Type I weakly isomorphic and hence Type II weakly isomorphic, i.e. X ≅^w_{II} Y. This result follows from Definition 25 by: (1) choosing Z = X, (2) letting ϕX be the identity map, and (3) letting ϕY = ϕ. The converse implication, i.e. that Type I weak isomorphism implies the existence of a surjective map as above, is not true: an example is shown in Figure 1.16.
Figure 1.16: Three networks A (on nodes x, y, z), B (on nodes u, v, w), and C (on nodes p, q, r, s), with weight matrices Ψ³_A(x, y, z), Ψ³_B(u, v, w), and Ψ⁴_C(p, q, r, s) as shown. Note that Remark 69 does not fully characterize weak isomorphism, even for finite networks: all three networks, with the given weight matrices, are Type I weakly isomorphic since C maps surjectively onto A and B. But there are no surjective, weight preserving maps A → B or B → A.
It is easy to see that strong isomorphism induces an equivalence relation on N. The same is true for both types of weak isomorphism, and we record this result in the following proposition.
Proposition 70. Weak isomorphism of Types I and II both induce equivalence relations on
N.
In the setting of FN , it is not difficult to show that the two types of weak isomorphism
coincide. This is the content of the next proposition. By virtue of this result, there is no
ambiguity in dropping the “Type I/II” modifier when saying that two finite networks are
weakly isomorphic.
Proposition 71. Let X, Y ∈ FN be finite networks. Then X and Y are Type I weakly
isomorphic if and only if they are Type II weakly isomorphic.
Type I weak isomorphisms will play a vital role in the content of this thesis, but for now,
we focus on Type II weak isomorphism. The next theorem justifies calling dN a network
distance, and shows that dN is compatible with Type II weak isomorphism.
Theorem 72. dN is a metric on N modulo Type II weak isomorphism.
The proof is in §2.1. For finite networks, we immediately obtain the following: the restriction of dN to FN yields a metric modulo Type I weak isomorphism.
The proof of Proposition 71 will follow from the proof of Theorem 72. In fact, an even
stronger result is true: weak isomorphism of Types I and II coincide for compact networks
as well.
Theorem 73 (Weak isomorphism in CN ). Let X, Y ∈ CN . Then X and Y are Type II
weakly isomorphic if and only if X and Y are Type I weakly isomorphic, i.e. there exists a
set V and surjections ϕX : V → X, ϕY : V → Y such that:
ωX (ϕX (v), ϕX (v 0 )) = ωY (ϕY (v), ϕY (v 0 )) for all v, v 0 ∈ V.
1.6.3 An additional axiom coupling weight function with topology
We now explore some additional constraints on the coupling between the topology on a
network and its weight function. Using these constraints, we are able to prove that weakly
isomorphic networks have, in a particular sense, a strongly isomorphic core. Moreover,
weak isomorphism on the whole is guaranteed by having a certain equality of substructures
called motifs. In particular, this generalizes an observation of Gromov about reconstruction via motif sets in metric spaces [65, 3.27½] to the setting of directed metric spaces.
First we present a definition that will be used later, and will also help us understand the
topological constraints we later impose.
Definition 27 (An equivalence relation and a quotient space). Let (X, ωX) ∈ N. Define the equivalence relation ∼ as follows: x ∼ x′ if and only if ωX(x, z) = ωX(x′, z) and ωX(z, x) = ωX(z, x′) for all z ∈ X. Then ωX descends to a well-defined weight function on the quotient set X/∼: given a ∼ x and a′ ∼ x′, we have ωX(a, a′) = ωX(x, a′) = ωX(x, x′), where the first equality holds because a ∼ x, and the second equality holds because a′ ∼ x′. Let σ : X → X/∼ denote the canonical quotient map. We equip X/∼ with the quotient topology, i.e. a set is open in X/∼ if and only if its preimage under σ is open in X. Then σ is a surjective, continuous map.
Recall that we often write xn → x to mean that a sequence (xn )n∈N in a topological
space X is converging to x ∈ X, i.e. any open set containing x contains all but finitely
many of the xn terms. We also often write “(xn )n∈N is eventually inside A ⊆ X” to mean
that xn ∈ A for all but finitely many n. Also recall that given a subspace Z ⊆ X equipped with the subspace topology, we say that a particular topological property (e.g. convergence or openness) holds relative Z, or rel Z, if it holds in the set Z equipped with the subspace topology. Throughout this section, we use the "relative" terminology extensively as
a bookkeeping device to keep track of the subspace with respect to which some topological
property holds.
Definition 28. Let (X, ωX ) ∈ N . We say that X has a coherent topology if the following
axioms are satisfied for any subnetwork Z of X equipped with the subspace topology:
A1 (Open sets in a first countable space) A set A ⊆ Z is open rel Z if and only if for any
sequence (xn )n∈N in Z converging rel Z to a point x ∈ A, there exists N ∈ N such
that xn ∈ A for all n ≥ N .
A2 (Topological triangle inequality) For any sequence (xn)n∈N in Z and any x ∈ Z, we have xn → x rel Z if and only if ωX(xn, z) → ωX(x, z) and ωX(z, xn) → ωX(z, x) for all z ∈ X.
Axiom A1 is a characterization of open sets in first countable spaces; we mention it
explicitly for easy reference. Axiom A2 gives a characterization of convergence (and hence
of the open sets, via A1) in terms of the given weight function. Note that A2 does not
discount the possibility of a sequence converging to non-unique limits, does not force a
space to be Hausdorff, and does not force convergent sequences to be Cauchy. The name
topological triangle inequality is explained in the next remark.
Remark 74 (The “topological triangle inequality”). Consider a metric space (X, dX ). One
key property of such a space is that whenever dX (x, x0 ) is small, we also have dX (x, •) ≈
dX (x0 , •) by the triangle inequality. Said differently, if we have a sequence (xn )n and
xn → x, then |dX (xn , z) − dX (x, z)| ≤ dX (x, xn ) → 0 for any z ∈ X. Conversely, if
|dX (xn , z) − dX (x, z)| → 0 for all z ∈ X, then by letting z = x, we immediately obtain
dX (xn , x) → 0.
Axiom A2 abstracts away this consequence of the triangle inequality into its network
formulation. However, there is more subtlety in the definition. First consider the relation
∼. Informally, if we relax the definition of ∼ and require “approximate equality” instead
of strict equality, we say that x ∼ε x0 if
ωX (x, x) ≈ ωX (x, x0 ) ≈ ωX (x0 , x) ≈ ωX (x0 , x0 ) (1.5)
0 0
ωX (x, z) ≈ ωX (x , z) and ωX (z, x) ≈ ωX (z, x ) for all z ∈ X. (1.6)
Here the ε decoration on ∼ is incorporated into the ≈ notation in the obvious way. As we
observed earlier, in a metric space, the triangle inequality ensures that (1.5) implies (1.6).
More generally, let x ∈ X, let Z be a small ε-ball containing x, and suppose (xn )n is a
sequence in Z. If Z is small, then (1.5) holds for any (xn , x) pair in Z and forces (1.6) to
hold, not just in Z, but in all of X. This type of local-to-global inference is a consequence
of the triangle inequality.
In a network (X, ωX ), A2 captures this type of local-to-global inference in a weak
sense. Suppose Z ⊆ X, {xn }n ⊆ Z, x ∈ Z, and xn → x rel Z. Note that (1.5) does not
force (1.6) to hold even in Z, and so we explicitly assume xn → x rel Z, which implicitly
assumes (1.6) restricted to Z.
By a fact about convergence in a relative topology, (xn )n in Z converges to x ∈ Z rel Z
if and only if it converges rel X. So xn → x rel Z automatically forces xn → x rel X. Thus
by A2, we know that (1.6) holds, not just in Z, but in all of X. Because A2 generalizes
the triangle inequality in some sense, and relies on properties of the subspace topology, we
interpret it as a topological triangle inequality.
Remark 75 (Heredity of coherence). An alternative formulation of a coherent topology—
without invoking the “any subnetwork Z of X” terminology—would be to say that X
satisfies A2, and that A2 is hereditary, meaning that any subspace also satisfies A2. Note
that first countability is hereditary, so any subspace of X automatically satisfies A1.
One of the reasons for discussing coherent topologies is that it enables us to prove that
isometric maps are continuous. This also justifies Axiom A2 as a fundamental property
that we should expect networks to have.
Proposition 76. Let (X, ωX ), (Y, ωY ) be networks with coherent topologies. Suppose f :
X → Y is a weight-preserving map and f (X) is a subnetwork of Y with the subspace
topology. Then f is continuous.
We use the name “coherent” because it was used in the context of describing the cou-
pling between a metric-like function and its topology as far back as in [101].
For instance, any metric space, equipped with its usual metric topology, is a metric network with a coherent topology. In general, for a topology on a finite network to be coherent, it needs to be coarser than the discrete topology. Consider the network N2(1_{2×2}) on node set {p, q}. If we assume that the constant sequence (p, p, . . .) converges to q in the sense of Axiom A2, then {q} cannot be open for Axiom A1 to be satisfied. However, the trivial topology {∅, {p, q}} is coherent. More generally, the discrete topology on the skeleton sk(X) of any finite network X (essentially X/∼, but defined more precisely in §1.6.4) is coherent.
The directed network (S⃗¹, ω_{S⃗¹,ρ}) with finite reversibility described in §1.3 is a compact, asymmetric network with a coherent topology.
Given n ∈ N, the n-motif set of a network is, roughly speaking, the collection of n×n weight matrices obtained from n-tuples of points in X, possibly with repetition. This is made precise next, after introducing some notation. For a sequence (xi)^n_{i=1} of nodes in a network X, we will denote the associated weight matrix by ((ωX(xi, xj)))^n_{i,j=1}. Entry (i, j) of this matrix is simply ωX(xi, xj).
Definition 29 (Motif set). For each n ∈ N and each (X, ωX) ∈ CN, define Ψ^n_X : X^n → R^{n×n} to be the map (x1, · · · , xn) ↦ ((ωX(xi, xj)))^n_{i,j=1}, where the (()) notation refers to the square matrix associated with the sequence. Note that Ψ^n_X is simply a map that sends each sequence of length n to its corresponding weight matrix. Let C(R^{n×n}) denote the closed subsets of R^{n×n}. Then let Mn : CN → C(R^{n×n}) denote the map defined by

Mn(X) := Ψ^n_X(X^n).

We refer to Mn(X) as the n-motif set of X. The interpretation is that Mn(X) is a bag containing all the motifs of X that one can form by looking at all subnetworks of size n (with repetitions). Notice that Mn(X) is indeed closed in R^{n×n}: each entry of Ψ^n_X is a continuous function of the tuple (being ωX evaluated at a pair of coordinates), so Mn(X) is the continuous image of the compact set X^n, hence compact and hence closed.
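For a finite network, the n-motif set can be enumerated by brute force over all n-tuples of nodes; the sketch below is our own illustration, not the thesis software.

```python
import numpy as np
from itertools import product

def motif_set(omega, n):
    """All n x n weight matrices Psi^n_X(x_1, ..., x_n), ranging over
    n-tuples of nodes (with repetition) of a finite network."""
    omega = np.asarray(omega)
    return {tuple(int(v) for v in omega[np.ix_(tup, tup)].ravel())
            for tup in product(range(omega.shape[0]), repeat=n)}

# Two-node network N_2((1 2; 3 4)):
print(sorted(motif_set([[1, 2], [3, 4]], 2)))
# [(1, 1, 1, 1), (1, 2, 3, 4), (4, 3, 2, 1), (4, 4, 4, 4)]
```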
It is easy to come up with examples of networks that share the same motif sets, but are
not strongly isomorphic. However, as we later show in Theorem 84, weak isomorphism of
compact, separable, and coherent networks is precisely characterized by equality of motif
sets. Another crucial object for this result is the notion of a skeleton, which we define next.
Definition 30 (Automorphisms). Let (X, ωX) ∈ CN. We define the automorphisms of X to be the collection Aut(X) := {ϕ : X → X : ϕ is a weight preserving bijection}.

Next we define a partial order ⊵ on p(X) as follows: for any (Y, ωY), (Z, ωZ) ∈ p(X), we set Y ⊵ Z if there exists a weight preserving surjection Y → Z. Then the set p(X) equipped with ⊵ is called the poset of weak isomorphism of X.
Definition 32 (Terminal networks in CN). Let (X, ωX) ∈ CN. A compact network Z ∈ p(X) is terminal if:

1. For each Y ∈ p(X), there exists a weight preserving surjection ϕ : Y → Z.

2. Any two weight preserving surjections f, g : Y → Z differ by an automorphism of Z: there exists ϕ ∈ Aut(Z) such that g = ϕ ◦ f.
Figure 1.17: Left: Z represents a terminal object in p(X), and f, g are weight preserving surjections X → Z. Here ϕ ∈ Aut(Z) is such that g = ϕ ◦ f. Right: more of the poset structure of p(X); in this case we have X ⊵ Y ⊵ · · · ⊵ Z.
One of our main results (Theorem 84) shows that two weakly isomorphic networks have
strongly isomorphic skeleta.
A terminal network captures the idea of a minimal substructure of a network. One may
ask if anything interesting can be said about superstructures of a network. This motivates
the following construction of a “blow-up” network. We provide an illustration in Figure
1.18.
Definition 33. Let (X, ωX) be any network. Let k = (kx)_{x∈X} be a choice of an index set kx for each node x ∈ X. Consider the network X[k] with node set ⋃_{x∈X} {(x, i) : i ∈ kx} and weights ω given as follows: for x, x′ ∈ X and for i ∈ kx, i′ ∈ k_{x′},

ω((x, i), (x′, i′)) := ωX(x, x′).

The topology on X[k] is given as follows: the open sets are of the form ⋃_{x∈U} {(x, i) : i ∈ kx}, where U is open in X. By construction, X[k] is first countable with respect to this topology. We will call any such X[k] a blow-up network of X.
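In matrix terms, a blow-up simply repeats rows and columns of the weight matrix according to the index sets; the following numpy sketch (our own illustration) reproduces the blow-up of Figure 1.18.

```python
import numpy as np

def blow_up(omega, counts):
    """Weight matrix of the blow-up X[k]: node x is replaced by counts[x]
    copies, and omega((x, i), (x', i')) = omega(x, x')."""
    omega = np.asarray(omega)
    return np.repeat(np.repeat(omega, counts, axis=0), counts, axis=1)

# Blowing up N_2((1 2; 3 4)) with two copies of each node (cf. Figure 1.18):
print(blow_up([[1, 2], [3, 4]], [2, 2]))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```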
Figure 1.18: Blowing up the two-node network N2((1 2; 3 4)) produces the four-node network N4((1 1 2 2; 1 1 2 2; 3 3 4 4; 3 3 4 4)); skeletonization reverses the blow-up.
Observe that sk(X) is compact because X is compact, and first countable by Proposition 80 and the fact that the image of a first countable space under an open, surjective, and continuous map is also first countable [114, p. 27]. Furthermore, ω_{sk(X)} is well defined by the definition of ∼.
The following proposition shows that skeletons inherit the property of coherence.
Proposition 81. Let (X, ωX ) be a compact network with a coherent topology. The quotient
topology on (sk(X), ωsk(X) ) is also coherent.
Proposition 82. Let (X, ωX ) be a compact network with a coherent topology. Then its
skeleton (sk(X), ωsk(X) ) is Hausdorff.
Theorem 83 (Skeletons are terminal). Let (X, ωX ) ∈ CN be such that the topology on X
is coherent. Then (sk(X), ωsk(X) ) ∈ CN is terminal in p(X).
Theorem 84. Suppose (X, ωX), (Y, ωY) are separable, compact networks with coherent topologies. Then the following are equivalent:

1. X ≅^w Y.

2. Mn(X) = Mn(Y) for each n ∈ N.

3. sk(X) ≅^s sk(Y).
Theorem 85 ([26], also [27] Theorem 2.3). Any q-tame persistent vector space V has
a well-defined persistence diagram Dgm(V). If U, V are ε-interleaved q-tame persistent
vector spaces, then dB (Dgm(U), Dgm(V)) ≤ ε.
As a consequence of developing the notion of ε-systems, we are able to prove the fol-
lowing:
Theorem 86. Let (X, ωX ) ∈ CN , k ∈ Z+ . Then the persistence vector spaces associated
to the Vietoris-Rips, Dowker, and PPH constructions are all q-tame.
The metric space analogue of Theorem 86 for VR and Čech complexes appeared in [27,
Proposition 5.1]; the same proof structure works in the setting of networks after applying
our results on approximation via ε-systems.
For this next result, we again refer the reader to §1.9.1 for additional details on measure
networks.
Theorem 87 (Convergence). Let (X, ωX ) be a measure network equipped with a Borel
probability measure µX . For each i ∈ N, let xi : Ω → X be an independent random
variable defined on some probability space (Ω, F, P) with distribution µX . For each n ∈ N,
let Xn = {x1 , x2 , . . . , xn }. Let ε > 0. Then we have:
P({ω ∈ Ω : dB(Dgm^•(supp(µX)), Dgm^•(Xn(ω))) ≥ ε}) ≤ (1 − M_{ε/4}(supp(µX)))^n / M_{ε/4}(supp(µX)),

where Xn(ω) is the subnetwork induced by {x1(ω), . . . , xn(ω)} and Dgm^• is any of the Vietoris-Rips, Dowker, or PPH diagrams. In particular, each of these three persistence diagrams of the subnetwork Xn converges almost surely to that of supp(µX) in the bottleneck distance.
[65, 17, 98]. Any sequence in such a precompact family has a subsequence converging to some limit point of the family. In the next section, we extend these results to the setting of networks. Namely, we show that there are many families of compact networks that are precompact under the metric topology induced by dN.
Definition 36 (Diameter for networks, [35]). For any network (X, ωX), define diam(X) := sup_{x,x′∈X} |ωX(x, x′)|. For compact networks, the supremum is attained and may be replaced by a maximum.
Remark 89. The preceding definition is an analogue of the definition of uniformly totally
bounded families of compact metric spaces [17, Definition 7.4.13], which is used in for-
mulating the precompactness result in the metric space setting. A family of compact metric
spaces is said to be uniformly totally bounded if there exists D ∈ R+ such that each space
has diameter bounded above by D, and for any ε > 0 there exists Nε ∈ N such that each
space in the family has an ε-net with cardinality bounded above by Nε . Recall that given
a metric space (X, dX) and ε > 0, a subset S ⊆ X is an ε-net if for any point x ∈ X, we have B(x, ε) ∩ S ≠ ∅. Such an ε-net satisfies the nice property that dGH(X, S) < ε [17, 7.3.11]. Thus an ε-net is an ε-approximation of the underlying metric space in the
Gromov-Hausdorff distance.
For instance, one may hope to use a geodesically convex region of (CN/≅^w, dN) to compute an "average" network from a
collection of networks. Such a result is of interest in statistical inference, e.g. when one
wishes to represent a noisy collection of networks by a single network. Similar results on
barycenters of geodesic spaces can be found in [61, 82]. We leave a treatment of this topic
from a probabilistic framework as future work, and only use this vignette to motivate the
results in this section.
We begin with some definitions.
Theorem 91 ([17], Theorem 2.4.16). Let (X, dX) be a complete metric space. If for any x, x′ ∈ X there exists a midpoint z such that dX(x, z) = dX(z, x′) = ½ dX(x, x′), then X is geodesic.
Theorem 92. The metric space (FN/≅^w, dN) is a geodesic space. More specifically, let [X], [Y] ∈ (FN/≅^w, dN). Then, for any R ∈ R^opt(X, Y), we can construct a geodesic γR : [0, 1] → FN/≅^w between [X] and [Y] as follows:

γR(0) := [(X, ωX)], γR(1) := [(Y, ωY)], and γR(t) := [(R, ω_{γR(t)})] for t ∈ (0, 1),

where ω_{γR(t)}((x, y), (x′, y′)) := (1 − t)·ωX(x, x′) + t·ωY(y, y′) for (x, y), (x′, y′) ∈ R.
A key step in the proof of the preceding theorem is to choose an optimal correspondence
between two finite networks. This may not be possible, in general, for compact networks.
However, using the additional results on precompactness and completeness of CN /∼ =w , we
are able to obtain the desired geodesic structure in Theorem 93. The proof is similar to
the one used by the authors of [73] to prove that the metric space of isometry classes of
compact metric spaces endowed with the Gromov-Hausdorff distance is geodesic.
Remark 94. Consider the collection of compact metric spaces endowed with the Gromov-Hausdorff distance. This collection can be viewed as a subspace of (CN/≅^w, dN). It is known (via a proof relying on Theorem 91) that this restricted metric space is geodesic [73].
Furthermore, it was proved in [36] that an optimal correspondence always exists in this set-
ting, and that such a correspondence can be used to construct explicit geodesics instead of
resorting to Theorem 91. The key technique used in [36] was to take a convergent sequence
of increasingly-optimal correspondences, use a result about compact metric spaces called
Blaschke’s theorem [17, Theorem 7.3.8] to show that the limiting object is closed, and then
use metric properties such as the Hausdorff distance to guarantee that this limiting object
is indeed a correspondence. A priori, such techniques cannot be readily adapted to the
network setting, and while one can obtain a convergent sequence of increasingly-optimal
correspondences, the obstruction lies in showing that the limiting object is indeed a cor-
respondence. Thus in our proof of Theorem 93, we resort to the indirect route of using
Theorem 91.
While the existence of geodesics is satisfactory, it turns out that in some sense, there are
“too many” geodesics in CN . This problem is already apparent in FM. As shown in [36],
FM contains both branching and non-unique geodesics. The simultaneous presence of
both these types of geodesics precludes the placement of curvature bounds (in the sense of
Alexandrov curvature, which is a commonly used notion of curvature in metric geometry)
on even FM. The issue lies in the definitions of dGH and dN: the ℓ∞ structure of these metrics
is what enables the existence of these exotic geodesics. We reproduce these results in the
next few sections. First we show a quick lemma.
Lemma 95. Let (Z, dZ) be a metric space. Let S, T ∈ R, with S < T, and let γ : [S, T] → Z be a curve such that

$$d_Z(\gamma(s), \gamma(t)) \le \frac{|s-t|}{|S-T|}\, d_Z(\gamma(S), \gamma(T)) \quad \text{for all } s, t \in [S, T].$$

Then, in fact,

$$d_Z(\gamma(s), \gamma(t)) = \frac{|s-t|}{|S-T|}\, d_Z(\gamma(S), \gamma(T)) \quad \text{for all } s, t \in [S, T].$$
Proof of Lemma 95. Suppose the inequality is strict for some pair s ≤ t in [S, T]. Then by the triangle inequality, we obtain:

$$d_Z(\gamma(S), \gamma(T)) \le d_Z(\gamma(S), \gamma(s)) + d_Z(\gamma(s), \gamma(t)) + d_Z(\gamma(t), \gamma(T)) < \frac{(s-S) + (t-s) + (T-t)}{|S-T|}\, d_Z(\gamma(S), \gamma(T)) = d_Z(\gamma(S), \gamma(T)),$$

a contradiction.
Deviant geodesics
For any n ∈ N, let ∆n denote the n-point discrete space, often called the n-point unit
simplex. Fix n ∈ N, n ≥ 2. We will construct an infinite family of deviant geodesics
between ∆1 and ∆n , named as such because they deviate from the straight-line geodesics
given by Theorem 92. As a preliminary step, we describe the straight-line geodesic between
∆1 and ∆n of the form given by Theorem 92. Let {p} and {x1 , . . . , xn } denote the under-
lying sets of ∆1 and ∆n . There is a unique correspondence R := {(p, x1 ), . . . , (p, xn )}
between these two sets. According to the setup in Theorem 92, the straight-line geodesic
between ∆1 and ∆n is then given by the metric spaces (R, dγR (t) ), for t ∈ (0, 1). Here
dγR(t)((p, xi), (p, xj)) = t · d∆n(xi, xj) = t for each t ∈ (0, 1) and each 1 ≤ i ≠ j ≤ n. This corresponds to the all-t matrix with 0s on the diagonal. Finally, we note that the unique correspondence R necessarily has distortion 1. Thus dGH(∆1, ∆n) = 1/2.
Now we give the parameters for the construction of a certain family of deviant geodesics
between ∆1 and ∆n. For any α ∈ (0, 1] and t ∈ [0, 1], define

$$f(\alpha, t) := \begin{cases} t\alpha & : 0 \le t \le \tfrac{1}{2},\\ \alpha - t\alpha & : \tfrac{1}{2} < t \le 1.\end{cases}$$

Fix α1, . . . , αm ∈ (0, 1]. For each 0 ≤ t ≤ 1, define the matrix δt := ((d^t_{ij}))_{i,j=1}^{n+m} by:

$$d^t_{ij} := \begin{cases} 0 & : i = j,\\ f(\alpha_i, t) & : j - i = n,\\ f(\alpha_j, t) & : i - j = n,\\ t & : \text{otherwise,} \end{cases} \qquad 1 \le i, j \le n + m.$$

This is a block matrix $\left(\begin{smallmatrix} A & B \\ B^T & C \end{smallmatrix}\right)$ where A is the n × n all-t matrix with 0s on the diagonal, C is an m × m all-t matrix with 0s on the diagonal, and B is the n × m all-t matrix with f(α1, t), f(α2, t), . . . , f(αm, t) on the diagonal.
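As a sanity check, δt can be generated and tested numerically; the sketch below is our illustration (function names ours).

```python
import numpy as np

def f(alpha, t):
    return t * alpha if t <= 0.5 else alpha - t * alpha

def delta(t, n, alphas):
    """Build the (n+m) x (n+m) matrix δ_t: all-t off the diagonal, except that
    the index pairs (i, i + n) carry the small value f(α_i, t)."""
    m = len(alphas)
    D = t * (1.0 - np.eye(n + m))
    for i, a in enumerate(alphas):
        D[i, n + i] = D[n + i, i] = f(a, t)
    return D

def is_pseudometric(D, tol=1e-12):
    """Check symmetry, zero diagonal, and all triangle inequalities."""
    k = D.shape[0]
    tri = all(D[i, j] <= D[i, h] + D[h, j] + tol
              for i in range(k) for j in range(k) for h in range(k))
    return np.allclose(D, D.T) and np.allclose(np.diag(D), 0) and tri

print(is_pseudometric(delta(0.3, n=3, alphas=[0.5, 1.0])))  # True
```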
We first claim that δt is the distance matrix of a pseudometric space. Symmetry is clear.
We now check the triangle inequality. In the cases 1 ≤ i, j, k ≤ n and n + 1 ≤ i, j, k ≤
n + m, the points xi , xj , xk form the vertices of an equilateral triangle with side length t.
Suppose 1 ≤ i, j ≤ n and n + 1 ≤ k ≤ n + m. Then the triple xi , xj , xk forms an isosceles
triangle with equal longest sides of length t, and a possibly shorter side of length f (αi , t)
(if |k − i| = n), f (αj , t) (if |k − j| = n), or just a third equal side with length t in the
remaining cases. The case 1 ≤ i ≤ n, n + 1 ≤ j, k ≤ n + m is similar. This verifies the
triangle inequality. Also note that δt is the distance matrix of a bona fide metric space for
t ∈ (0, 1). For t = 1, we identify the points xi and xi−n , for n + 1 ≤ i ≤ n + m, to obtain
∆n , and for t = 0, we identify all points together to obtain ∆1 . This allows us to define
geodesics between ∆1 and ∆n as follows. Let α⃗ denote the vector (α1, . . . , αm). We define a map γα⃗ : [0, 1] → M by letting γα⃗(t) be the metric space obtained from the distance matrix δt after the identifications described above. Given s, t ∈ [0, 1] and the correspondence diag that pairs each point of γα⃗(s) with the corresponding point of γα⃗(t), we check case-by-case that dis(diag) ≤ |t − s|. Thus for any s, t ∈ [0, 1], we have dGH(γα⃗(s), γα⃗(t)) ≤ (1/2)|t − s| = |t − s| · dGH(∆1, ∆n). It follows by Lemma 95 that γα⃗ is a geodesic between ∆1 and ∆n. Furthermore, since α⃗ ∈ (0, 1]^m was arbitrary, this holds for any such α⃗. Thus we have an infinite family of geodesics γα⃗ : [0, 1] → M from ∆1 to ∆n.
A priori, some of these geodesics may intersect at points other than the endpoints. By this we mean that there may exist t ∈ (0, 1) and α⃗ ≠ β⃗ ∈ (0, 1]^m such that [γα⃗(t)] = [γβ⃗(t)] in M/∼. This is related to the branching phenomena that we describe in the next section. For now, we give an infinite subfamily of geodesics that do not intersect each other anywhere except at the endpoints. Recall that the separation of a finite metric space (X, dX) is the smallest positive distance in X, which we denote by sep(X). If sep(X) < sep(Y) for two finite metric spaces X and Y, then dGH(X, Y) > 0.
Let ≺ denote the following relation on (0, 1]^m: for α⃗, β⃗ ∈ (0, 1]^m, set α⃗ ≺ β⃗ if αi < βi for each 1 ≤ i ≤ m. Next let α⃗, β⃗ ∈ (0, 1]^m be such that α⃗ ≺ β⃗. Then γβ⃗ is a geodesic from ∆1 to ∆n which is distinct (i.e. non-isometric) from γα⃗ everywhere except at its endpoints. This is because the condition α⃗ ≺ β⃗ guarantees that for each t ∈ (0, 1), sep(γα⃗(t)) < sep(γβ⃗(t)). Hence dGH(γα⃗(t), γβ⃗(t)) > 0 for all t ∈ (0, 1).
Finally, let α⃗ ∈ (0, 1)^m, and let 1⃗ denote the all-ones vector of length m. For η ∈ [0, 1], define β⃗(η) := (1 − η)α⃗ + η1⃗. Then by the observations about the relation ≺, {γβ⃗(η) : η ∈ [0, 1]} is an infinite family of geodesics from ∆1 to ∆n that do not intersect each other anywhere except at the endpoints.
Note that one could choose the diameter of ∆n to be arbitrarily small and still obtain
deviant geodesics via the construction above.
Branching geodesics
The structure of dGH permits branching geodesics, as we now illustrate. We use the
notation (a)+ for any a ∈ R to denote max(0, a). As above, fix n ∈ N, n ≥ 2, and consider
the straight-line geodesic between ∆1 and ∆n described at the beginning of Section 1.8.3.
Throughout this section, we denote this geodesic by γ : [0, 1] → M. We will construct an
infinite family of geodesics which branch off from γ. For convenience, we will overload
notation and write, for each t ∈ [0, 1], the distance matrix of γ(t) as γ(t). Recall from
above that γ(t) is a symmetric n × n matrix with the following form:
$$\gamma(t) = \begin{pmatrix} 0 & t & \cdots & t \\ t & 0 & \cdots & t \\ \vdots & \vdots & \ddots & \vdots \\ t & t & \cdots & 0 \end{pmatrix}.$$

Fix a strictly increasing sequence (ai)i∈N in (0, 1). For each t ∈ [0, 1], let γ^(a1)(t) denote the (n + 1) × (n + 1) matrix obtained from γ(t) by appending a new row and column whose entries all equal t, except that the entry pairing the new point with xn is (t − a1)+ and the diagonal entry is 0. For t > a1, we have dGH(γ(t), γ^(a1)(t)) > 0, because any correspondence between γ(t) and γ^(a1)(t) has distortion at least t − a1. Thus γ^(a1) branches off from γ at a1.
The construction of γ^(a1)(t) above is a special case of a one-point metric extension. Such a construction involves appending an extra row and column to the distance matrix of the starting space; explicit conditions for the entries of the new row and column are stated in [97, Lemma 5.1.22]. In particular, γ^(a1)(t) above satisfies these conditions.
Procedurally, the γ^(a1)(t) construction can be generalized as follows (a code sketch follows the list). Let (•) denote any finite subsequence of (ai)i∈N. We also allow (•) to be the empty subsequence. Let aj denote the terminal element in this subsequence. Then for any ak, k > j, we can construct γ^(•,ak) as follows:
1. Take the rightmost column of γ^(•)(t), replace the only 0 by (t − ak)+, and append a 0 at the bottom.
2. Append this column on the right to a copy of γ^(•)(t).
3. Append the transpose of another copy of this column to the bottom of the newly constructed matrix to make it symmetric.
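The following numpy sketch (our illustration; names ours) carries out steps 1-3 above to build γ^(•)(t) for a finite subsequence of branching times.

```python
import numpy as np

def branch_matrix(t, n, times):
    """Apply the one-point extension repeatedly: start from the n-point
    all-t matrix γ(t) and, for each branching time a in `times`, copy the
    rightmost column, replace its only 0 by (t - a)_+, pad with a 0, and
    append it as a new row and column."""
    G = t * (1.0 - np.eye(n))
    for a in times:
        col = G[:, -1].copy()
        col[-1] = max(t - a, 0.0)            # replace the only 0 by (t - a)_+
        G = np.block([[G, col[:, None]],
                      [col[None, :], np.zeros((1, 1))]])
    return G

print(branch_matrix(0.6, n=3, times=[0.25, 0.5]))
```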
The objects produced by this construction satisfy the one-point metric extension con-
ditions [97, Lemma 5.1.22], and hence are distance matrices of pseudometric spaces. By
taking the appropriate quotients, we obtain valid distance matrices. Symmetry is satisfied
by definition, and the triangle inequality is satisfied because any triple of points forms an
isosceles triangle with longest sides equal. We write Γ^(•)(t) to denote the matrix obtained from γ^(•)(t) after taking quotients. As an example, we obtain the following matrices after taking quotients for γ^(a1)(t) above, for 0 ≤ t ≤ a1 (left) and for a1 < t ≤ 1 (right):

$$\begin{pmatrix} 0 & t & \cdots & t \\ t & 0 & \cdots & t \\ \vdots & \vdots & \ddots & \vdots \\ t & t & \cdots & 0 \end{pmatrix} \qquad\qquad \begin{pmatrix} 0 & t & \cdots & t & t \\ t & 0 & \cdots & t & t \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ t & t & \cdots & 0 & (t - a_1) \\ t & t & \cdots & (t - a_1) & 0 \end{pmatrix}$$
Now let (a_{i_j})_{j=1}^{k} be any finite subsequence of (ai)i∈N. For notational convenience, we write (bi)i instead of (a_{i_j})_{j=1}^{k}. Γ^{(bi)i} is a curve in M; we need to check that it is moreover a geodesic.
Let s ≤ t ∈ [0, 1]. Then Γ^{(bi)i}(s) and Γ^{(bi)i}(t) are square matrices with n + p and n + q columns, respectively, for nonnegative integers p and q. It is possible that the matrix grows in size between s and t, so we have q ≥ p. Denote the underlying point set by {x1, x2, . . . , x_{n+p}, . . . , x_{n+q}}. Then define:

R := {(xi, xi) : 1 ≤ i ≤ n + p} ∪ B, where B := {(x_{n+p}, xj) : n + p < j ≤ n + q}.

Here B is possibly empty. Note that R is a correspondence between Γ^{(bi)i}(s) and Γ^{(bi)i}(t), and by direct calculation we have dis(R) ≤ |t − s|. Hence we have dGH(Γ^{(bi)i}(s), Γ^{(bi)i}(t)) ≤ (1/2)|t − s| for all s, t ∈ [0, 1], and since dGH(Γ^{(bi)i}(0), Γ^{(bi)i}(1)) = 1/2, it follows from Lemma 95 that Γ^{(bi)i} is a geodesic.
to this problem by endowing a network with a probability measure. The user adjusts the
measure to signify important network substructures and to smooth out the effect of outliers.
This approach was adopted in [70] to compare various real-world network datasets mod-
eled as metric measure (mm) spaces—metric spaces equipped with a probability measure.
This work was based in turn on the formulation of the Gromov-Wasserstein (GW) distance
between mm spaces presented in [85, 86].
Figure 1.19: Illustrations of the finite networks we consider in this paper. Notice that the
edge weights are asymmetric. The numbers in each node correspond to probability masses;
for each network, these masses sum to 1.
See Figure 1.19. Already in [70], it was observed that numerical computation of GW distances
between networks worked well for network comparison even when the underlying datasets
failed to be metric. This observation was further developed in [100], where the focus from
the outset was to define generalized discrepancies between matrices that are not necessarily
metric.
On the computational front, the authors of [100] directly attacked the nonconvex opti-
mization problem by considering an entropy-regularized form of the GW distance (ERGW)
following [111], and using a projected gradient descent algorithm based on results in
[10, 111]. This approach was also used (for a generalized GW distance) on graph-structured
datasets in [119]. It was pointed out in [119] that the gradient descent approach for the
ERGW problem occasionally requires a large amount of regularization to obtain conver-
gence, and that this could possibly lead to over-regularized solutions. A different approach,
developed in [85, 86], considers the use of lower bounds on the GW distance as opposed to
solving the full GW optimization problem. This is a practical approach for many use cases,
in which it may be sufficient to simply obtain lower bounds for the GW distance.
In the current section, we use the GW distance formulation to define and develop a met-
ric structure on the space of measure networks. Additionally, by following the approaches
used in [85, 86], we are able to produce quantitatively stable network invariants that yield polynomial-time lower bounds on this network GW distance.
Remark 97. Sturm has studied symmetric, L2 versions of measure networks (called gauged
measure spaces) in [117], and we point to his work as an excellent reference on the geom-
etry of such spaces. Our motivation comes from studying networks, hence the difference
in our naming conventions.
The information contained in a network should be preserved when we relabel the nodes
in a compatible way; we formalize this idea by the following notion of strong isomorphism
of measure networks.
Two measure networks (X, ωX, µX) and (Y, ωY, µY) are strongly isomorphic if there exists a bijection ϕ : X → Y (with ϕ and ϕ⁻¹ both measurable) such that:
• ωX(x, x′) = ωY(ϕ(x), ϕ(x′)) for all x, x′ ∈ X, and
• ϕ∗µX = µY.
Example 98. Networks with one or two nodes will be very instructive in providing exam-
ples and counterexamples, so we introduce them now with some special terminology.
• By N1 (a) we will refer to the network with one node X = {p}, a weight ωX (p, p) =
a, and the Dirac measure δp = 1p .
• By N2($\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right)$, α, β) we will mean a two-node network with node set X = {p, q}, and weights and measures given as follows:

ωX(p, p) = a, ωX(p, q) = b, ωX(q, p) = c, ωX(q, q) = d; µX({p}) = α, µX({q}) = β.
Notation. Even though µX takes sets as its argument, we will often omit the curly braces
and use µX (p, q, r) to mean µX ({p, q, r}).
We wish to define a notion of distance on Nm that is compatible with isomorphism. A
natural analog is the Gromov-Wasserstein distance defined between metric measure spaces
[85]. To adapt that definition for our needs, we first recall the definition of a measure
coupling.
1.9.2 Couplings and the distortion functional
Let (X, ωX , µX ), (Y, ωY , µY ) be two measure networks. A coupling between these two
networks is a probability measure µ on X × Y with marginals µX and µY , respectively.
Stated differently, couplings satisfy the following property:
µ(A × Y ) = µX (A) and µ(X × B) = µY (B), for all A ∈ Borel(X) and B ∈ Borel(Y ).
The collection of all couplings between (X, ωX , µX ) and (Y, ωY , µY ) will be denoted
C (µX , µY ), abbreviated to C when the context is clear.
In the case where we have a coupling µ between two measures ν, ν′ on the same network (X, ωX), the quantity µ(A × B) is interpreted as the amount of mass transported from A to B when interpolating between the two distributions ν and ν′. In this special case, a coupling is also referred to as a transport plan.
Here we also recall that the product σ-field on X ×Y , denoted Borel(X)⊗Borel(Y ), is
defined as the σ-field generated by the measurable rectangles A × B, where A ∈ Borel(X)
and B ∈ Borel(Y ). Because our spaces are all Polish, we always have Borel(X × Y ) =
Borel(X) ⊗ Borel(Y ) [13, Lemma 6.4.2].
The product measure µX ⊗ µY is defined on the measurable rectangles by writing

(µX ⊗ µY)(A × B) := µX(A) µY(B), A ∈ Borel(X), B ∈ Borel(Y).

By a consequence of Fubini's theorem and the π-λ theorem, the property above uniquely defines the product measure µX ⊗ µY among measures on Borel(X × Y).
Example 100 (1-point coupling). Let X be a set, and let Y = {p} be the set with one
point. Then for any probability measure µX on X there is a unique coupling µ = µX ⊗ δp
between µX and δp . To see this, first we check that µ as defined above is a coupling. Let
A ∈ Borel(X). Then µ(A × Y ) = µX (A)δp (Y ) = µX (A), and similarly µ(X × {p}) =
µX (X)δp ({p}) = δp ({p}). Thus µ ∈ C (X, Y ). For uniqueness, let ν be another coupling.
It suffices to show that ν agrees with µ on the measurable rectangles. Let A ∈ Borel(X), and observe that

ν(A × {p}) = ν(A × Y) = µX(A) = µX(A) δp({p}) = µ(A × {p}).

On the other hand, ν(A × ∅) ≤ ν(X × ∅) = (πY)∗ν(∅) = 0 = µX(A) δp(∅) = µ(A × ∅).
Thus ν satisfies the property ν(A × B) = µX (A)δp (B). Thus by uniqueness of the
product measure, ν = µX ⊗ δp . Finally, note that we can endow X and Y with weight
functions ωX and ωY , thus adapting this example to the case of networks.
Example 101 (Diagonal coupling). Let (X, ωX, µX) ∈ Nm. The diagonal coupling between µX and itself is defined by writing

$$\Delta(A \times B) := \int_{X \times X} 1_{A \times B}(x, x')\, d\mu_X(x)\, d\delta_x(x') \quad \text{for all } A, B \in \mathrm{Borel}(X).$$
Next let µ ∈ C(µX, µY), and consider the probability space (X × Y)² equipped with the product measure µ ⊗ µ. For each p ∈ [1, ∞) the p-distortion of µ is defined as:

$$\mathrm{dis}_p(\mu) := \left( \int_{X \times Y} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\mu(x, y)\, d\mu(x', y') \right)^{1/p} = \|\Omega_{X,Y}\|_{L^p(\mu \otimes \mu)}.$$
When the context is clear, we will often write ‖f‖p to denote ‖f‖_{Lp(µ⊗µ)}.
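For finite networks the distortion integral is a finite sum, so disp(µ) of any given coupling is directly computable; the sketch below is our illustration (names ours), evaluated here on the product coupling of Remark 108 below.

```python
import numpy as np

def dis_p(WX, WY, mu, p=2.0):
    """dis_p(mu)^p = sum over (x, y), (x', y') of
    |WX[x, x'] - WY[y, y']|^p * mu[x, y] * mu[x', y'];
    mu is an |X| x |Y| coupling matrix."""
    D = np.abs(WX[:, None, :, None] - WY[None, :, None, :]) ** p
    return float(np.einsum('xyuv,xy,uv->', D, mu, mu)) ** (1.0 / p)

WX = np.array([[0.0, 1.0], [2.0, 0.0]])
WY = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]])
mu = np.outer([0.5, 0.5], [1/3, 1/3, 1/3])   # product coupling
print(dis_p(WX, WY, mu))                     # upper-bounds 2 d_{N,2}(X, Y)
```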
In what follows, we follow the presentation in [117]. Since X is Polish, it can be viewed as a standard Borel space [113] and therefore as the pushforward of Lebesgue measure on the unit interval I. More specifically, let x1, x2, . . . denote the atoms of µX (if any), with masses c1, c2, . . ., and let µ′X denote the nonatomic part of µX. Let C0 = 0, write Ci = Σ_{j=1}^{i} cj for i ∈ N ∪ {∞}, I′ = [C∞, 1], and X′ = supp(µ′X). Now X′ is a standard Borel space equipped with a nonatomic measure, so by [113, Theorem 3.4.23], there is a Borel isomorphism ρ′ : I′ → X′ such that µ′X = ρ′∗λI′, where λI′ denotes Lebesgue measure restricted to I′. Define the representation map ρ : I → X as follows:

ρ(s) := xi for s ∈ [C_{i−1}, Ci), and ρ(s) := ρ′(s) for s ∈ I′.

The map ρ′ is not necessarily unique, and therefore neither is ρ. Any such map ρ is called a parametrization of X. In particular, we have µX = ρ∗λI.
The benefit of this construction is that it allows us to represent the underlying measur-
able space of a network via the unit interval I. Moreover, by taking the pullback of ωX via
ρ, we obtain a network (I, ρ∗ ωX , λI ). As we will see in the next section, this permits the
strategy of proving results over I and transporting them back to X using ρ.
Remark 102 (A 0-distortion coupling between a space and its interval representation).
Let (X, ωX , µX ) ∈ Nm , and let (I, ρ∗ ωX , λI ) be an interval representation of X for some
parametrization ρ. Consider the map (ρ, id) : I → X × I given by i 7→ (ρ(i), i). Define
µ := (ρ, id)∗ λI . Let A ∈ Borel(X) and B ∈ Borel(I). Then µ(A × I) = λI ({j ∈
I : ρ(j) ∈ A}) = µX (A). Also, µ(X × B) = λI ({j ∈ B : ρ(j) ∈ X}) = λI (B).
Thus µ is a coupling between µX and λI . Moreover, for any A ∈ Borel(X) and any
B ∈ Borel(I), if for each j ∈ B we have ρ(j) 6∈ A, then we have µ(A × B) = 0. In
particular, µ(A × B) = µ((A ∩ ρ(B)) × B). Also, given (x, i) ∈ X × I, we have that ρ(i) ≠ x implies (x, i) ∉ supp(µ).
Let 1 ≤ p < ∞. For convenience, define ωI := ρ∗ωX. An explicit computation of disp(µ) shows:

$$\mathrm{dis}_p(\mu)^p = \int_{X \times I} \int_{X \times I} |\omega_X(x, x') - \omega_I(i, i')|^p \, d\mu(x, i)\, d\mu(x', i') = \int_I \int_I |\omega_X(\rho(i), \rho(i')) - \omega_I(i, i')|^p \, d\lambda_I(i)\, d\lambda_I(i') = 0.$$

For p = ∞, we similarly have dis∞(µ) = 0, since |ωX(x, x′) − ωI(i, i′)| vanishes on supp(µ ⊗ µ).
1.9.4 Optimality of couplings in the network setting
We now collect some results about probability spaces. Let X be a Polish space. A
subset P ⊆ Prob(X) is said to be tight if for all ε > 0, there is a compact subset Kε ⊆ X
such that µX (X \ Kε ) ≤ ε for all µX ∈ P .
A sequence (µn)n∈N ∈ Prob(X)^N is said to converge narrowly to µX ∈ Prob(X) if

$$\lim_{n \to \infty} \int_X f \, d\mu_n = \int_X f \, d\mu_X \quad \text{for all } f \in C_b(X),$$

the set of bounded continuous real-valued functions on X.
1.9.5 The Network Gromov-Wasserstein distance
For each p ∈ [1, ∞], we define:

$$d_{\mathcal{N},p}(X, Y) := \frac{1}{2} \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \mathrm{dis}_p(\mu) \quad \text{for each } (X, \omega_X, \mu_X), (Y, \omega_Y, \mu_Y) \in \mathcal{N}_m.$$
Remark 108 (Boundedness of dN,p ). Recall from Example 99 that for any X, Y ∈ Nm ,
C (µX , µY ) always contains the product coupling, and is thus nonempty. A consequence
is that dN,p(X, Y) is bounded for any p ∈ [1, ∞]. Indeed, by taking the product coupling µ := µX ⊗ µY we have dN,p(X, Y) ≤ (1/2) disp(µ). Suppose first that p ∈ [1, ∞). Applying Minkowski's inequality, we obtain:

disp(µ) ≤ ‖ωX‖_{Lp(µX⊗µX)} + ‖ωY‖_{Lp(µY⊗µY)} < ∞.

The case p = ∞ is analogous, except that integrals are replaced by essential suprema as needed.
Example 109 (Easy examples of dN,p ). Let a, b ∈ R and consider the networks N1 (a) and
N1 (b). The unique coupling between the two networks is the product measure µ = δx ⊗ δy ,
where we understand x, y to be the nodes of the two networks. Then for any p ∈ [1, ∞],
we obtain:
disp (µ) = |ωN1 (a) (x, x) − ωN1 (b) (y, y)| = |a − b|.
Figure 1.20: The dN,p distance between the two one-node networks is simply 21 |α − α0 |. In
Example 109 we give an explicit formula for computing dN,p between an arbitrary network
and a one-node network.
By Example 100, the unique coupling between µX and the Dirac measure of the one-node network N1(a) is the product coupling, and so for p ∈ [1, ∞) we obtain dN,p(X, N1(a)) = (1/2)(∫_{X×X} |ωX(x, x′) − a|^p d(µX ⊗ µX)(x, x′))^{1/p}. For p = ∞, we have dN,p(X, N1(a)) = sup{(1/2)|ωX(x, x′) − a| : x, x′ ∈ supp(µX)}.
Remark 110. dN,p is not necessarily a metric modulo strong isomorphism. Let X = {x1, x2, x3} and Y = {y1, y2, y3} be the networks illustrated in Figure 1.21. Consider a coupling µ given (rows indexed by X, columns by Y) as:

     y1    y2    y3
x1   1/3   0     0
x2   1/6   0     0
x3   0     1/6   1/3

Let G denote the set of pairs with positive µ-measure. Given any two points (x, y), (x′, y′) ∈ G, we observe that |ωX(x, x′) − ωY(y, y′)| = 0. Thus for any p ∈ [1, ∞], disp(µ) = 0, and so dN,p(X, Y) = 0.
The definition of dN,p is sensible in the sense that it captures the notion of a distance:
Theorem 111. For each p ∈ [1, ∞], dN,p is a pseudometric on Nm .
Figure 1.21: Networks at dN,p -distance zero which are not strongly isomorphic.
By the next result, this infimum is actually attained. Hence we may write:

$$d_{\mathcal{N},p}(X, Y) = \frac{1}{2} \min_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \mathrm{dis}_p(\mu).$$
The next result stands out in contrast to the case for dN : whereas we do not have results
about optimality of dN , the following result comes relatively easily by virtue of Prokhorov’s
lemma.
Theorem 112. Let (X, ωX , µX ) and (Y, ωY , µY ) be two measure networks, and let p ∈
[1, ∞]. Then there exists an optimal coupling, i.e. a minimizer for disp (·) in C (µX , µY ).
It remains to discuss the precise pseudometric structure of dN,p. The following definition is a relaxation of strong isomorphism. We say that X, Y ∈ Nm are weakly isomorphic, denoted X ≅ʷ Y, if there exist a probability space (Z, µZ) and measurable maps f : Z → X and g : Z → Y such that:
• f∗µZ = µX, g∗µZ = µY, and
• ‖f*ωX − g*ωY‖∞ = 0.
Here f*ωX(z, z′) := ωX(f(z), f(z′)) denotes the pullback weight function, and similarly for g*ωY.
The set B := ωX⁻¹((a, b)) = {(x, x′) ∈ X × X : ωX(x, x′) ∈ (a, b)} is measurable because ωX is measurable. Because f is measurable, we know that (f, f) : Z × Z → X × X is measurable. Thus A := (f, f)⁻¹(B) is measurable. Now we write:

A = {(z, z′) ∈ Z² : (f(z), f(z′)) ∈ B} = {(z, z′) ∈ Z² : ωX(f(z), f(z′)) ∈ (a, b)} = (f*ωX)⁻¹((a, b)).

Thus f*ωX is measurable. Similarly, we verify that g*ωY is measurable.
Theorem 113 (Pseudometric structure of dN,p). Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm, and let p ∈ [1, ∞]. Then dN,p(X, Y) = 0 if and only if X ≅ʷ Y.
Remark 114. Theorem 113 is in the same spirit as related results for gauged measure
spaces [117] and for networks under dN, as discussed earlier. The “tripod structure” X ← Z → Y described above is much more difficult to obtain in the setting of dN.
In the next section we take a brief detour to study a Gromov-Prokhorov distance
between measure networks. While it is not the main focus of the current paper, it turns out
to be useful for the notion of interleaving stability that we define in §1.9.7.
Theorem 115. For each α ∈ [0, ∞), d^GP_{N,α} is a pseudometric on Nm.
Lemma 116 (Relation between Gromov-Prokhorov and Gromov-Wasserstein). Consider (X, ωX, µX), (Y, ωY, µY) ∈ Nm. We always have:

d^GP_{N,0}(X, Y) = dN,∞(X, Y).
Definition 42 (Lipschitz stability). Let p ∈ [1, ∞]. A Lipschitz-stable network invariant is
an invariant ιp : Nm → V for which there exists a Lipschitz constant L(ιp ) > 0 such that
dV (ιp (X), ιp (Y )) ≤ L(ιp )dN,p (X, Y ) for all X, Y ∈ Nm .
Definition 43 (Interleaving stability). Let p ∈ [1, ∞]. An interleaving-stable network
invariant is an R-parametrized invariant ιp : Nm × R → V for which there exists an
interleaving constant α ∈ R and a symmetric interleaving function ε : Nm × Nm → R such
that
ιp (X, t) ≤ ιp (Y, t+εXY )+αεXY ≤ ιp (X, t+2εXY )+2αεXY for all t ∈ R and X, Y ∈ Nm .
Here εXY := ε(X, Y ). In Example 117 below, we give some invariants that are interleaving
stable.
Example 117 (A map that ignores/emphasizes large edge weights). Let t ∈ R. For each p ∈ [1, ∞], the pth t-sublevel set map for the weight function, denoted sub^w_{p,t} : Nm → R+, is given as:

$$\mathrm{sub}^w_{p,t}(X, \omega_X, \mu_X) = \left( \int_{\{\omega_X \le t\}} |\omega_X(x, x')|^p \, d(\mu_X \otimes \mu_X)(x, x') \right)^{1/p} \quad \text{for } p \in [1, \infty),$$
$$\mathrm{sub}^w_{p,t}(X, \omega_X, \mu_X) = \sup\{|\omega_X(x, x')| : x, x' \in \mathrm{supp}(\mu_X),\ \omega_X(x, x') \le t\} \quad \text{for } p = \infty.$$
This map de-emphasizes large edge weights in a measure network. Analogously, one
can consider integrating over the set {ωX ≥ t}. In this case, the larger edge weights are
emphasized. The corresponding superlevel set invariant is denoted sup^w_{p,t}.
Theorem 118 (Interleaving stability of the sublevel/superlevel set weight invariants). Let p ∈ [1, ∞]. The sub^w_p invariant is interleaving-stable with interleaving constant α = 1 and interleaving function dN,∞. The sup^w_p invariant is interleaving-stable with interleaving constant α = −1 and interleaving function −dN,∞.
We now define a family of local invariants that incorporate data from the networks at a
much finer scale. Computing these local invariants amounts to solving an optimal transport
(OT) problem, which is a linear programming (LP) task.
Example 119 (A generalized eccentricity function). Let (X, ωX, µX) be a measure network, and let p ∈ [1, ∞]. Then consider the map ecc^out_{p,X} : X → R+ given by:

$$\mathrm{ecc}^{out}_{p,X}(s) := \left( \int_X |\omega_X(s, x)|^p \, d\mu_X(x) \right)^{1/p} = \|\omega_X(s, \cdot)\|_{L^p(\mu_X)}.$$
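For a finite network with weight matrix W and node masses m, this is a one-line computation; the sketch below is our illustration (function name ours) and evaluates ecc^out_{p,X} at every node simultaneously.

```python
import numpy as np

def ecc_out(W, m, p=2.0):
    """ecc^out_{p,X}(s) = (sum over x of |W[s, x]|^p * m[x])^{1/p}, per node s."""
    return (np.abs(W) ** p @ m) ** (1.0 / p)

W = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]])
m = np.array([0.5, 0.25, 0.25])
print(ecc_out(W, m))  # one eccentricity value per node
```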
Example 120 (A joint eccentricity function). Let (X, ωX, µX) and (Y, ωY, µY) be two measure networks, and let p ∈ [1, ∞]. Define the (outer) joint eccentricity function ecc^out_{p,X,Y} : X × Y → R+ of X and Y as follows: for each (s, t) ∈ X × Y,

$$\mathrm{ecc}^{out}_{p,X,Y}(s, t) := \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \|\omega_X(s, \cdot) - \omega_Y(t, \cdot)\|_{L^p(\mu)}.$$

One obtains the inner joint eccentricity function by using the term ωX(·, s) − ωY(·, t) above, and we denote this by ecc^in_{p,X,Y}.
Theorem 121 (Stability of local R-valued invariants). The eccentricity and joint eccentricity invariants are both Lipschitz stable, with Lipschitz constant 2. Formally, for any (X, ωX, µX), (Y, ωY, µY) ∈ Nm, we have:

$$\inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \|\mathrm{ecc}^{out}_{p,X} - \mathrm{ecc}^{out}_{p,Y}\|_{L^p(\mu)} \le 2 d_{\mathcal{N},p}(X, Y), \qquad \text{(eccentricity bound)}$$
$$\inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \|\mathrm{ecc}^{out}_{p,X,Y}\|_{L^p(\mu)} \le 2 d_{\mathcal{N},p}(X, Y). \qquad \text{(joint eccentricity bound)}$$

Moreover, the joint eccentricity invariant provides a stronger bound than the eccentricity bound, i.e.

$$\inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \|\mathrm{ecc}^{out}_{p,X} - \mathrm{ecc}^{out}_{p,Y}\|_{L^p(\mu)} \le \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \|\mathrm{ecc}^{out}_{p,X,Y}\|_{L^p(\mu)}.$$

Finally, the analogous bounds hold in the case of the inner eccentricity and inner joint eccentricity functions.
Remark 122. The analogous bounds in the setting of metric measure spaces were provided
in [85], where the eccentricity and joint eccentricity bounds were called the First and Third
Lower Bounds, respectively. The TLB later appeared in [107].
Having described the form of the local network invariants, we now leverage a particu-
larly useful fact about optimal transport over the real line. For probability measures over R,
the method for constructing an optimal coupling is known, and this gives a simple formula
for computing the OT cost in terms of the cumulative distribution functions of the measures
[120, Remark 2.19]. Later we obtain lower bounds based on distributions over R that can
be computed easily and remain stable with respect to the local invariants described above.
Remark 123. The structure of the joint eccentricity bound (i.e. the TLB) in Theorem 121 shows that a priori, it involves solving an ensemble of OT problems, one for each pair (x, y) ∈ X × Y, and a final OT problem once ecc^out_{p,X,Y} is computed.
Example 124 (Pushforward via ωX). Recall that given any (X, ωX, µX), the corresponding pushforward of µX ⊗ µX via ωX is given as follows: for any generator of Borel(R) of the form (a, b) ⊆ R,

(ωX)∗(µX ⊗ µX)((a, b)) = (µX ⊗ µX)({(x, x′) ∈ X × X : ωX(x, x′) ∈ (a, b)}).

Similarly, for each x ∈ X we may push µX forward via ωX(x, ·) and via ωX(·, x); here we write λ and ρ to refer to the “left” and “right” arguments, respectively, and set λX(x) := (ωX(x, ·))∗µX and ρX(x) := (ωX(·, x))∗µX. The corresponding distribution functions are defined as follows: for any t ∈ R,

$$F_{\omega_X(x,\cdot)}(t) := \mu_X(\{\omega_X(x, \cdot) \le t\}) = \int_X 1_{\{\omega_X(x,\cdot) \le t\}}(x')\, d\mu_X(x'),$$
$$F_{\omega_X(\cdot,x)}(t) := \mu_X(\{\omega_X(\cdot, x) \le t\}) = \int_X 1_{\{\omega_X(\cdot,x) \le t\}}(x')\, d\mu_X(x').$$

It is interesting to note that we get such a pair of distributions for each x ∈ X. Thus we can add yet another layer to this construction, via the maps Nm → pow(Prob(R)) defined by writing

X ↦ {λX(x) : x ∈ X} and X ↦ {ρX(x) : x ∈ X}.
Assume for now that we equip Prob(R) with the Wasserstein metric. Write 𝕏 := {λX(x)}_{x∈X}, let d𝕏 denote the Wasserstein metric restricted to 𝕏, and let µ𝕏 := (λX)∗µX. More specifically, for any A ∈ Borel(𝕏), we have µ𝕏(A) = µX({x ∈ X : λX(x) ∈ A}). This yields a metric measure space (𝕏, d𝕏, µ𝕏). So even though we do not start off with a metric space, the operation of passing into distributions over R forces a metric structure on (X, ωX, µX).
Next let (Y, ωY, µY) ∈ Nm, and suppose (𝕐, d𝕐, µ𝕐) is defined as above. Since 𝕏, 𝕐 ⊆ Prob(R), we know that µ𝕏, µ𝕐 are both distributions on Prob(R). Thus we can compare them via the p-Wasserstein distance as follows, for p ∈ [1, ∞):

$$d_{W,p}(\mu_{\mathbb{X}}, \mu_{\mathbb{Y}}) = \left( \inf_{\mu \in \mathcal{C}(\mu_{\mathbb{X}}, \mu_{\mathbb{Y}})} \int_{\mathrm{Prob}(\mathbb{R})^2} d_{W,p}(\lambda_X(x), \lambda_Y(y))^p \, d\mu(\lambda_X(x), \lambda_Y(y)) \right)^{1/p}.$$

By the change of variables formula, this quantity coincides with one that we show below to be a lower bound for 2dN,p(X, Y) (cf. Inequality (1.14) of Theorem 127).
Example 126 (Pushforward via eccentricity). Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm, and let (a, b) ∈ Borel(R). Recall the outer and inner eccentricity functions ecc^out_{p,X} and ecc^in_{p,X} from Example 119. These functions induce distributions as follows:

(ecc^out_{p,X})∗µX((a, b)) = µX({x ∈ X : ecc^out_{p,X}(x) ∈ (a, b)}),
(ecc^in_{p,X})∗µX((a, b)) = µX({x ∈ X : ecc^in_{p,X}(x) ∈ (a, b)}).

Next let µ ∈ C(µX, µY) and recall the joint outer/inner eccentricity functions ecc^out_{p,X,Y} and ecc^in_{p,X,Y} from Example 120. These functions induce distributions as below:

(ecc^out_{p,X,Y})∗µ((a, b)) = µ({(x, y) ∈ X × Y : ecc^out_{p,X,Y}(x, y) ∈ (a, b)}),
(ecc^in_{p,X,Y})∗µ((a, b)) = µ({(x, y) ∈ X × Y : ecc^in_{p,X,Y}(x, y) ∈ (a, b)}).
Theorem 127. Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm. Then we have the following inequalities, each expressing a form of quantitative stability, for p ∈ [1, ∞):

$$2d_{\mathcal{N},p}(X, Y) \ge \inf_{\mu \in \mathcal{C}(\mu_X \otimes \mu_X,\, \mu_Y \otimes \mu_Y)} \left( \int_{X^2 \times Y^2} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\mu(x, x', y, y') \right)^{1/p} \quad (1.7)$$
$$\ge \inf_{\nu \in \mathcal{C}(\nu_X, \nu_Y)} \left( \int_{\mathbb{R}^2} |a - b|^p \, d\nu(a, b) \right)^{1/p}. \quad (1.8)$$

$$2d_{\mathcal{N},p}(X, Y) \ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \left| \mathrm{ecc}^{out}_{p,X}(x) - \mathrm{ecc}^{out}_{p,Y}(y) \right|^p \, d\mu(x, y) \right)^{1/p} \quad (1.9)$$
$$\ge \inf_{\gamma \in \mathcal{C}((\mathrm{ecc}^{out}_{p,X})_*\mu_X,\, (\mathrm{ecc}^{out}_{p,Y})_*\mu_Y)} \left( \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \right)^{1/p}. \quad (1.10)$$

$$2d_{\mathcal{N},p}(X, Y) \ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \left| \mathrm{ecc}^{in}_{p,X}(x) - \mathrm{ecc}^{in}_{p,Y}(y) \right|^p \, d\mu(x, y) \right)^{1/p} \quad (1.11)$$
$$\ge \inf_{\gamma \in \mathcal{C}((\mathrm{ecc}^{in}_{p,X})_*\mu_X,\, (\mathrm{ecc}^{in}_{p,Y})_*\mu_Y)} \left( \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \right)^{1/p}. \quad (1.12)$$

$$2d_{\mathcal{N},p}(X, Y) \ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\mu_X, \mu_Y)} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\gamma(x', y') \, d\mu(x, y) \right)^{1/p} \quad (1.13)$$
$$\ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\lambda_X(x), \lambda_Y(y))} \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \, d\mu(x, y) \right)^{1/p}. \quad (1.14)$$

$$2d_{\mathcal{N},p}(X, Y) \ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\mu_X, \mu_Y)} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\gamma(x, y) \, d\mu(x', y') \right)^{1/p} \quad (1.15)$$
$$\ge \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\rho_X(x'), \rho_Y(y'))} \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \, d\mu(x', y') \right)^{1/p}. \quad (1.16)$$
Here recall that νX = (ωX)∗(µX ⊗ µX), νY = (ωY)∗(µY ⊗ µY), λX(x) = (ωX(x, ·))∗µX, λY(y) = (ωY(y, ·))∗µY, ρX(x′) = (ωX(·, x′))∗µX, and ρY(y′) = (ωY(·, y′))∗µY. Inequalities (1.7)-(1.8) appeared as the Second Lower Bound and its relaxation in [85]. Inequalities (1.9), (1.11), (1.13), and (1.15) are the eccentricity bounds in Theorem 121. Inequalities (1.10), (1.12), (1.14), and (1.16) are their relaxations. In the symmetric case, these outer/inner pairs of inequalities coincide; they appeared as the First and Third Lower Bounds and their relaxations in [85].
In Inequality (1.8), both νX and νY are probability distributions on R, and the right hand
side is precisely the p-Wasserstein distance between νX and νY . Analogous statements hold
for Inequalities (1.14) and (1.16).
To connect with the nomenclature introduced in [85], we give names to the inequalities
in Theorem 127: (1.7)-(1.8) are the SLB inequalities, (1.9)-(1.12) are the FLB inequalities,
and (1.13)-(1.16) are the TLB inequalities. These abbreviations stand for the Second, First, and Third Lower Bounds, respectively. Note that due to the asymmetry of the network setting, we get
twice as many FLB and TLB inequalities as we would for the metric measure setting.
Next we describe the formula for computing OT over R (see [120, Remark 2.19]). Let measure spaces (X, µX), (Y, µY) and measurable functions f : X → R, g : Y → R be given. Then let F, G : R → [0, 1] denote the cumulative distribution functions of f and g:

F(t) := µX({f ≤ t}) and G(t) := µY({g ≤ t}), t ∈ R.

Writing F⁻¹, G⁻¹ for the generalized inverses (quantile functions), we then have:

$$\inf_{\gamma \in \mathcal{C}(f_*\mu_X,\, g_*\mu_Y)} \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) = \int_0^1 |F^{-1}(u) - G^{-1}(u)|^p \, du.$$

These formulae are easily adapted to obtain closed form solutions for the lower bounds given by Inequalities (1.10) and (1.12) and for the inner OT problems in (1.14), (1.16) of Theorem 127.
simplicial filtration, after which any out-of-the-box persistent homology package can take
over (we use Javaplex for the latter).
Computing PPH, however, is a priori trickier. As we point out in Remark 55,
we are not supplied with a basis for path homology computations, and computing this
basis requires significant additional preprocessing, at least for a naive implementation of
PPH. One of our contributions is in showing that the basis computation and persistence
computation rely on the same matrix operations, so both steps can be combined and carried
out together.
different arenas. To further exemplify our methods, we repeated our analysis after comput-
ing the 1-dimensional Rips persistence diagrams from the hippocampal activity networks.
In our experiment, there were five arenas. The first was a square of side length L = 10,
with four circular “holes” or “forbidden zones” of radius 0.2L that the trajectory could
not intersect. The other four arenas were those obtained by removing the forbidden zones
one at a time. In what follows, we refer to the arenas of each type as 4-hole, 3-hole, 2-hole, 1-hole, and 0-hole arenas. For each arena, a random-walk trajectory of 5000 steps
was generated, where the animal could move along a square grid with 20 points in each
direction. The grid was obtained as a discretization of the box [0, L] × [0, L], and each step
had length 0.05L. The animal could move in each direction with equal probability. If one
or more of these moves took the animal outside the arena (a disallowed move), then the
probabilities were redistributed uniformly among the allowed moves. Each trajectory was
tested to ensure that it covered the entire arena, excluding the forbidden zones. Formally,
we write the time steps as a set T := {1, 2, . . . , 5000}, and denote the trajectory as a map
traj : T → [0, L]2 .
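For concreteness, a minimal simulation of such a trajectory follows; this sketch is our illustration only (the hole centers, start point, and random seed are hypothetical choices, and the code used for the thesis experiments may differ).

```python
import numpy as np

rng = np.random.default_rng(0)
L, STEP, N_STEPS = 10.0, 0.5, 5000          # side length, step 0.05*L, steps
R_HOLE = 0.2 * L
HOLES = [(2.5, 2.5), (2.5, 7.5), (7.5, 2.5), (7.5, 7.5)]  # hypothetical centers

def allowed(p):
    """A position is allowed if it lies in the box and outside every hole."""
    inside = 0.0 <= p[0] <= L and 0.0 <= p[1] <= L
    return inside and all(np.hypot(p[0] - cx, p[1] - cy) > R_HOLE
                          for cx, cy in HOLES)

moves = np.array([[STEP, 0.0], [-STEP, 0.0], [0.0, STEP], [0.0, -STEP]])
pos = np.array([0.0, 0.0])                   # allowed starting corner
traj = [pos]
for _ in range(N_STEPS - 1):
    # disallowed moves are dropped, redistributing probability uniformly
    options = [m for m in moves if allowed(pos + m)]
    pos = pos + options[rng.integers(len(options))]
    traj.append(pos)
traj = np.array(traj)                        # shape (5000, 2)
```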
For each of the five arenas, 20 trials were conducted, producing a total of 100 trials. For
each trial lk , an integer nk was chosen uniformly at random from the interval [150, 200].
Then nk place fields of radius 0.05L were scattered uniformly at random inside the cor-
responding arena for each lk . An illustration of the place field distribution is provided in
Figure 1.22. A spike on a place field was recorded whenever the trajectory would intersect
it. So for each 1 ≤ i ≤ nk , the spiking pattern of cell xi , corresponding to place field PFi ,
was recorded via a function ri : T → {0, 1} given by:
$$r_i(t) = \begin{cases} 1 & : \text{if } \mathrm{traj}(t) \text{ intersects } PF_i,\\ 0 & : \text{otherwise,} \end{cases} \qquad t \in T.$$
The matrix corresponding to ri is called the raster of cell xi . A sample raster is il-
lustrated in Figure 1.22. For each trial lk , the corresponding network (Xk , ωXk ) was
constructed as follows: Xk consisted of nk nodes representing place fields, and for each
1 ≤ i, j ≤ nk , the weight ωXk (xi , xj ) was given by:
$$\omega_{X_k}(x_i, x_j) := 1 - \frac{N_{i,j}(5)}{\sum_{i=1}^{n_k} N_{i,j}(5)},$$

where N_{i,j}(5) = card{(s, t) ∈ T² : t ∈ [2, 5000], t − s ∈ [1, 5], r_j(t) = 1, r_i(s) = 1}.
In words, N_{i,j}(5) counts the pairs of times (s, t), s < t, such that cell xj spikes (at a time t) after cell xi spikes (at a time s), and the delay between the two spikes is at most 5 time steps. The idea is that if cell xj frequently fires within a short span of time after cell xi fires, then place fields PFi and PFj are likely to be in close proximity to each other. The column sum of the matrix corresponding to ωXk is normalized to 1, and so ωXkᵀ can be interpreted as the transition matrix of a Markov process.
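The counts N_{i,j}(5) and the weights ωXk can be computed directly from the binary rasters; the following sketch is our illustration (the function name is ours), vectorized over the spike delay.

```python
import numpy as np

def spike_network(rasters, delay=5):
    """Given binary rasters of shape (n_cells, n_timesteps), return the weight
    matrix with entry (i, j) equal to 1 - N_ij / sum_i N_ij, where N_ij counts
    pairs s < t with t - s in [1, delay], r_i(s) = 1 and r_j(t) = 1."""
    n, T = rasters.shape
    N = np.zeros((n, n))
    for d in range(1, delay + 1):
        # entry (i, j) accumulates sum over s of r_i(s) * r_j(s + d)
        N += rasters[:, :T - d] @ rasters[:, d:].T
    colsum = N.sum(axis=0, keepdims=True)
    colsum[colsum == 0] = 1.0        # guard against empty columns
    return 1.0 - N / colsum
```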
Figure 1.22: Bottom right: Sample place cell spiking pattern matrix. The x-axis corre-
sponds to the number of time steps, and the y-axis corresponds to the number of place
cells. Black dots represent spikes. Clockwise from bottom middle: Sample distribution
of place field centers in 4, 3, 0, 1, and 2-hole arenas.
Next, we computed the 1-dimensional Dowker persistence diagrams of each of the 100
networks. Note that Dgm^D_1(ωX) = Dgm^D_1(ωXᵀ) by Proposition 46, so we are actually obtaining the 1-dimensional Dowker persistence diagrams of transition matrices of Markov
processes. We then computed a 100 × 100 matrix consisting of the bottleneck distances
between all the 1-dimensional persistence diagrams. The single linkage dendrogram gen-
erated from this bottleneck distance matrix is shown in Figure 1.23. The labels are in the
format env-<nh>-<nn>, where nh is the number of holes in the arena/environment, and
nn is the number of place fields. Note that with some exceptions, networks corresponding
to the same arena are clustered together. We conclude that the Dowker persistence diagram
succeeded in capturing the intrinsic differences between the five classes of networks arising
from the five different arenas, even when the networks had different sizes.
We then computed the Rips persistence diagrams of each network, and computed the
100 × 100 bottleneck distance matrix associated to the collection of 1-dimensional dia-
grams. The single linkage dendrogram generated from this matrix is given in Figure 1.24.
Notice that the Rips dendrogram does not do a satisfactory job of classifying arenas cor-
rectly.
Remark 128. We note that an alternative method of comparing the networks obtained from
our simulations would have been to compute the pairwise network distances, and plot the
results in a dendrogram. But dN is NP-hard to compute—this follows from the fact that
computing dN includes the problem of computing Gromov-Hausdorff distance between
finite metric spaces, which is NP-hard [105]. So instead, we are computing the bottleneck
distances between 1-dimensional Dowker persistence diagrams, as suggested by Remark
30.
Remark 129. It is possible to compare the current approach with the one taken in [43] on a
common dataset. We performed this comparison in [32] for a similar experiment, but with
a stochastic firing model for the place cells. Interestingly, it turns out that the network ap-
proach with Dowker persistence performs better than the approach in [43], as indicated by
computing 1-nearest neighbor classification error rates on the bottleneck distance matrices.
A possible explanation is that preprocessing the spiking data into a network automatically
incorporates a form of error correction, where the errors consist of stochastic firing between
cells that are non-adjacent. On the other hand, such errors are allowed to accumulate over
time in the approach taken in [43]. For an alternative error-correction approach, see [33].
Figure 1.23: Single linkage dendrogram corresponding to the distance matrix obtained by
computing bottleneck distances between 1-dimensional Dowker persistence diagrams of
our database of hippocampal networks (§1.10.2). Note that the 4, 3, and 2-hole arenas are
well separated into clusters at threshold 0.1.
Figure 1.24: Single linkage dendrogram corresponding to the distance matrix obtained by
computing bottleneck distances between 1-dimensional Rips persistence diagrams of our
database of hippocampal networks (§1.10.2). Notice that the hierarchical clustering fails to
capture the correct arena types.
Class #   N   v                         ni
1         5   [0, 25, 50, 75, 100]      10
2         5   [0, 50, 100, 150, 200]    10
3         5   [0, 25, 50, 75, 100]      20
4         2   [0, 100]                  25
5         5   [-100, -50, 0, 50, 100]   10

Sample cycle network of means, G5(v) for v = [0, 25, 50, 75, 100]:

$$G_5(v) = \begin{pmatrix} 0 & 25 & 50 & 75 & 100 \\ 100 & 0 & 25 & 50 & 75 \\ 75 & 100 & 0 & 25 & 50 \\ 50 & 75 & 100 & 0 & 25 \\ 25 & 50 & 75 & 100 & 0 \end{pmatrix}$$
Table 1.1: Left: The five classes of SBM networks corresponding to the experiment in
§1.10.3. N refers to the number of communities, v refers to the vector that was used to
compute a table of means via G5 (v), and ni is the number of nodes in each community.
Right: G5 (v) for v = [0, 25, 50, 75, 100].
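The table of means G_N(v) is circulant in v, and an SBM network of the kind used in §1.10.3 can be sampled around it. The sketch below is our illustration (names ours), with a Gaussian noise model chosen hypothetically; the exact sampling distribution used in the experiments is specified in §1.10.3.

```python
import numpy as np

def cycle_means(v):
    """G_N(v): row i of the N x N table of means is v cyclically shifted by i."""
    v = np.asarray(v, dtype=float)
    return np.stack([np.roll(v, i) for i in range(len(v))])

def sample_sbm(v, ni, sigma=5.0, seed=0):
    """Sample a weighted network: nodes u, w in communities i, j receive an
    edge weight drawn around the mean G[i, j] (Gaussian noise is our
    hypothetical choice, for illustration only)."""
    rng = np.random.default_rng(seed)
    G = cycle_means(v)
    comm = np.repeat(np.arange(len(v)), ni)   # community label of each node
    return rng.normal(G[np.ix_(comm, comm)], sigma)

W = sample_sbm([0, 25, 50, 75, 100], ni=10)   # a Class 1 network on 50 nodes
```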
Figure 1.25: Left: TLB dissimilarity matrix for SBM community networks in §1.10.3.
Classes 1 and 3 are similar, even though networks in Class 3 have twice as many nodes
as those in Class 1. Classes 2 and 5 are most dissimilar because of the large difference in
their edge weights. Class 4 has a different number of communities than the others, and is
dissimilar to Classes 1 and 3 even though all their edge weights are in comparable ranges.
Right: TLB dissimilarity matrix for two-community SBM networks in §1.10.3. The near-
zero values on the diagonal are a result of using the adaptive λ-search described in Chapter
4.
Class #   N   v         ni
1         2   [0, 0]    10
2         2   [0, 5]    10
3         2   [0, 10]   10
4         2   [0, 15]   10
5         2   [0, 20]   10
Figure 1.26: Result of applying the TLB to the migration networks in §1.10.3. Left: Dis-
similarity matrix. Nodes 1-5 correspond to female migration from 1960-2000, and nodes
6-10 correspond to male migration from 1960-2000. Right: Single linkage dendrogram.
Notice that overall migration patterns change in time, but within a time period, migration
patterns are grouped according to gender.
dendrogram suggests that between 1980 and 1990, both male and female populations had
quite similar migration patterns. Within these years, however, migration patterns were
more closely tied to gender. This effect is more pronounced between 1960 and 1970,
where we see somewhat greater divergence between migration patterns based on gender.
Male migration in 2000 is especially divergent, with the greatest dissimilarity to all the
other datasets.
The labels in the dissimilarity matrix are as follows: 1-5 correspond to “f-1960” through
“f-2000”, and 6-10 correspond to “m-1960” through “m-2000”. The color gradient in the
dissimilarity matrix suggests that within each gender, migration patterns change in a way
that is parametrized by time. This of course reflects the shifts in global technological and economic forces which make migration attractive and/or necessary over time.
Chapter 2: Metric structures of dN and dN,p
In this chapter, we supply the proofs of the results on the metric structure of N that
were stated in §1. Along the way, we occasionally state additional definitions and results.
Let ε > dN (X, Y ), and let R be a correspondence such that dis(R) < 2ε. We define maps
ϕ : X → Y and ψ : Y → X as follows: for each x ∈ X, set ϕ(x) = y for some y such
that (x, y) ∈ R. Similarly, for each y ∈ Y , set ψ(y) = x for some x such that (x, y) ∈ R.
Let x ∈ X, y ∈ Y . Then we have
|ωX (x, ψ(y)) − ωY (ϕ(x), y)| < 2ε and |ωX (ψ(y), x) − ωY (y, ϕ(x))| < 2ε.
Since x ∈ X, y ∈ Y were arbitrary, it follows that CX,Y (ϕ, ψ) ≤ 2ε and CY,X (ψ, ϕ) ≤ 2ε.
Also for any x, x′ ∈ X, we have (x, ϕ(x)), (x′, ϕ(x′)) ∈ R, and so

|ωX(x, x′) − ωY(ϕ(x), ϕ(x′))| ≤ dis(R) < 2ε.

Thus dis(ϕ) ≤ 2ε, and similarly dis(ψ) ≤ 2ε. This proves the “≥” case.
Next we wish to show the “≤” case. Suppose ϕ, ψ are given, and

(1/2) max(dis(ϕ), dis(ψ), CX,Y(ϕ, ψ), CY,X(ψ, ϕ)) < ε,

for some ε > 0.
Let RX = {(x, ϕ(x)) : x ∈ X} and let RY = {(ψ(y), y) : y ∈ Y }. Then R = RX ∪RY
is a correspondence. We wish to show that for any z = (a, b), z′ = (a′, b′) ∈ R,

|ωX(a, a′) − ωY(b, b′)| < 2ε.
To see this, let z, z′ ∈ R. Note that there are four cases: (1) z, z′ ∈ RX, (2) z, z′ ∈ RY, (3) z ∈ RX, z′ ∈ RY, and (4) z ∈ RY, z′ ∈ RX. In the first two cases, the desired inequality follows because dis(ϕ), dis(ψ) < 2ε. The inequality follows in cases (3) and (4) because CX,Y(ϕ, ψ) < 2ε and CY,X(ψ, ϕ) < 2ε, respectively. Thus dN(X, Y) ≤ ε.
Proof of Example 9. We start with some notation: for x, x′ ∈ X, y, y′ ∈ Y, let Γ(x, x′, y, y′) := |ωX(x, x′) − ωY(y, y′)|.
Let ϕ : X → Y be a bijection. Note that Rϕ := {(x, ϕ(x)) : x ∈ X} is a correspondence, and this holds for any bijection (actually any surjection) ϕ. Since we minimize over all correspondences for dN, we conclude dN(X, Y) ≤ d̂N(X, Y).
For the reverse inequality, we represent all the elements of R(X, Y) as 2-by-2 binary matrices R, where a 1 in position ij means (xi, yj) ∈ R. Denote the matrix representation of each R ∈ R(X, Y) by mat(R), and the collection of such matrices as mat(R). Then we have:

$$\mathrm{mat}(\mathcal{R}) = \left\{ \begin{pmatrix} 1 & a \\ b & 1 \end{pmatrix} : a, b \in \{0, 1\} \right\} \cup \left\{ \begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix} : a, b \in \{0, 1\} \right\}.$$

Let A = {(x1, y1), (x2, y2)} (in matrix notation, this is $\left(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\right)$) and let B = {(x1, y2), (x2, y1)} (in matrix notation, this is $\left(\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix}\right)$). Let R ∈ R(X, Y). Note that either A ⊆ R or B ⊆ R.
Suppose that A ⊆ R. Then we have:

$$\max_{(x,y),(x',y') \in A} \Gamma(x, x', y, y') \le \max_{(x,y),(x',y') \in R} \Gamma(x, x', y, y').$$

Let Ω(A) denote the quantity on the left hand side. A similar result holds in the case B ⊆ R:

$$\max_{(x,y),(x',y') \in B} \Gamma(x, x', y, y') \le \max_{(x,y),(x',y') \in R} \Gamma(x, x', y, y').$$

Let Ω(B) denote the quantity on the left hand side. Since either A ⊆ R or B ⊆ R, we have

$$\min\{\Omega(A), \Omega(B)\} \le \min_{R \in \mathcal{R}} \max_{(x,y),(x',y') \in R} \Gamma(x, x', y, y').$$
Proof of Proposition 12. We begin with an observation. Given X, Y ∈ N, let X′, Y′ ∈ N be such that X ≅ʷ_II X′, Y ≅ʷ_II Y′, and card(X′) = card(Y′). Then we have:

sup_{z,z′∈U} |ωU(z, z′) − ωV(ϕ(z), ϕ(z′))| = sup_{z,z′∈U} |ωU(z, z′) − ωV(z, z′)| = dis(R).

In particular,

inf_{ϕ : U → V bijection} dis(ϕ) ≤ dis(R).

So there exist networks U, V with the same node set (and thus the same cardinality) such that d̂N(U, V) ≤ (1/2) dis(R) < η. We have already shown that dN(X, Y) ≤ d̂N(U, V). Since η > dN(X, Y) was arbitrary, it follows that we have:

dN(X, Y) = inf{d̂N(X′, Y′) : X′ ≅ʷ_II X, Y′ ≅ʷ_II Y, and card(X′) = card(Y′)}.
Proof of Theorem 64. Once an ε-system has been found, the refinement can be produced
by standard methods. So we focus on proving the existence of an ε-system. The idea
is to find a cover of X by open sets G1 , . . . , Gq and representatives xi ∈ Gi for each
1 ≤ i ≤ q such that whenever we have (x, x0 ) ∈ Gi × Gj , we know by continuity of ωX
that |ωX (x, x0 ) − ωX (xi , xj )| < ε. Then we define a correspondence that associates each
x ∈ Gi to xi , for 1 ≤ i ≤ q. Such a correspondence has distortion bounded above by ε.
Let ε > 0. Let B be a base for the topology on X. Let {B(r, ε/4) : r ∈ R} be an open cover for R. Then by continuity of ωX, we get that

{ωX⁻¹[B(r, ε/4)] : r ∈ R}

is an open cover for X × X. Each open set in this cover can be written as a union of open rectangles U × V, for U, V ∈ B. Thus the following set is an open cover of X × X:

U := {U × V : U, V ∈ B, U × V ⊆ ωX⁻¹[B(r, ε/4)] for some r ∈ R}.
Claim 1. There exists a finite open cover G = {G1 , . . . , Gq } of X such that for any 1 ≤
i, j ≤ q, we have Gi × Gj ⊆ U × V for some U × V ∈ U .
Proof of Claim 1. The proof of the claim proceeds by a repeated application of the Tube Lemma [89, Lemma 26.8]. Since X × X is compact, we take a finite subcover U^f := {U1 × V1, . . . , Un × Vn} of U. For each x ∈ X, define:

U^f_x := {U × V ∈ U^f : x ∈ U},

and write

U^f_x = {U^x_{i_1} × V^x_{i_1}, . . . , U^x_{i_{m(x)}} × V^x_{i_{m(x)}}}.

Here m(x) is an integer depending on x, and {i_1, . . . , i_{m(x)}} is a subset of {1, . . . , n}. Since U^f is an open cover of X × X, we know that U^f_x is an open cover of {x} × X.
Next define:

A_x := ∩_{k=1}^{m(x)} U^x_{i_k}.
Then Ax is open and contains x. In the literature [89, p. 167], the set Ax × X is called a tube around {x} × X. Notice that Ax × X ⊆ ∪U^f_x. Since x was arbitrary in the preceding construction, we define U^f_x and Ax for each x ∈ X. Then note that {Ax : x ∈ X} is an open cover of X. Using compactness of X, we choose {s1, . . . , sp} ⊆ X, p ∈ N, such that {A_{s1}, . . . , A_{sp}} is a finite subcover of X.
Once again let x ∈ X, and let U^f_x and Ax be defined as above. Define the following:

B_x := {A_x × V^x_{i_k} : 1 ≤ k ≤ m(x)}.

Since x ∈ Ax and X ⊆ ∪_{k=1}^{m(x)} V^x_{i_k}, it follows that Bx is a cover of {x} × X. Furthermore, since {A_{s1}, . . . , A_{sp}} is a cover of X, it follows that the finite collection B_{s1}, . . . , B_{sp} is a cover of X × X.
Let z ∈ X. For each x ∈ X, since X ⊆ ∪_{k=1}^{m(x)} V^x_{i_k}, we pick V^x_{i_k} for some 1 ≤ k ≤ m(x) such that z ∈ V^x_{i_k}. Since x was arbitrary, such a choice exists for each x ∈ X. Therefore, we define:

Cz := {the chosen sets V^{s_i}_{i_k} containing z, one for each 1 ≤ i ≤ p}.

Since each B_{s_i} is finite and there are finitely many B_{s_i}, we know that Cz is a finite collection. Next define:

D_z := ∩_{V ∈ C_z} V.
Then Dz is open and contains z. Notice that X × Dz is a tube around X × {z}. Next, using the fact that {A_{s_i} : 1 ≤ i ≤ p} is an open cover of X, pick A_{s_{i(z)}} such that z ∈ A_{s_{i(z)}}. Here 1 ≤ i(z) ≤ p is some integer depending on z. Then define

G_z := D_z ∩ A_{s_{i(z)}}.

Then Gz is open and contains z. Since z was arbitrary, we define Gz for each z ∈ X. Then {Gz : z ∈ X} is an open cover of X, and we take a finite subcover:

G := {G1, . . . , Gq}, q ∈ N.
Note that the second containment holds by definition of A_{s_{i(w)}}. Since U^f_{s_{i(w)}} is a cover of {s_{i(w)}} × X, we choose V^{s_{i(w)}} to contain y. Then observe that A_{s_{i(w)}} × V^{s_{i(w)}} ∈ B_{s_{i(w)}}. Then V^{s_{i(w)}} ∈ Cy, and so we have:

Gy ⊆ Dy ⊆ V^{s_{i(w)}}.

Choosing a representative xi ∈ G̃i for each 1 ≤ i ≤ q, define:

R := {(x, xi) : x ∈ G̃i, 1 ≤ i ≤ q}.
Let (x, xi), (x′, xj) ∈ R. Then we have (x, x′), (xi, xj) ∈ G̃i × G̃j ⊆ Gi × Gj. By the preceding work, we know that Gi × Gj ⊆ U × V, for some U × V ∈ U. Therefore ωX(x, x′), ωX(xi, xj) ∈ B(r, ε/4) for some r ∈ R. It follows that:

|ωX(x, x′) − ωX(xi, xj)| ≤ |ωX(x, x′) − r| + |r − ωX(xi, xj)| < ε/4 + ε/4 = ε/2.

Since (x, xi), (x′, xj) ∈ R were arbitrary, we have dis(R) < ε/2. Hence dN(X, X′) < ε.
Proof of Theorem 67. The first part of this proof is similar to that of Theorem 64. Let ε > 0. Let B be a base for the topology on X. Then {ωX⁻¹[B(r, ε/8)] : r ∈ R} is an open cover for X × X. Each open set in this cover can be written as a union of open rectangles U × V, for U, V ∈ B. Thus the following set is an open cover of X × X:

U := {U × V : U, V ∈ B, U × V ⊆ ωX⁻¹[B(r, ε/8)] for some r ∈ R}.
By applying Claim 1 from the proof of Theorem 64, we obtain a finite open cover G =
{G1 , . . . , Gq } of X such that for any 1 ≤ i, j ≤ q, we have Gi × Gj ⊆ U × V for some
U × V ∈ U . For convenience, we assume that each Gi is nonempty.
Now let 1 ≤ i ≤ q. Then Gi ∩ S ≠ ∅, because S is dense in X. Choose p(i) ∈ N such that sp(i) ∈ Gi. We repeat this process for each 1 ≤ i ≤ q. Now define Xn to be the network with node set {s1, s2, . . . , sn} and weight function given by the appropriate restriction of ωX. Also define Sn to be the network with node set {sp(1), sp(2), . . . , sp(q)} and weight function given by the restriction of ωX.
Claim 2. Let A be a subset of X equipped with the weight function ωX |A×A . Then
dN (Sn , A) < ε/2.
Proof of Claim 2. We begin with G = {G1, . . . , Gq}. Notice that each Gi contains sp(i). To avoid ambiguity in our construction, we will need to ensure that Gi does not contain sp(j) for i ≠ j. So our first step is to obtain a cover of A by disjoint sets while ensuring that each sp(i) ∈ Sn belongs to exactly one element of the new cover. We define:

G̃1 := G1 \ {sp(j) : j ≠ 1}, and inductively G̃i := (Gi \ {sp(j) : j ≠ i}) \ (G̃1 ∪ · · · ∪ G̃_{i−1}) for 2 ≤ i ≤ q.

Notice that {G̃i : 1 ≤ i ≤ q} is a cover for A, and for each 1 ≤ i ≤ q, G̃i contains sp(j) if and only if i = j. Now we define a correspondence between A and Sn as follows:

R := {(x, sp(i)) : x ∈ A ∩ G̃i, 1 ≤ i ≤ q}.
94
Next let (x, sp(i)), (x′, sp(j)) ∈ R. Then we have (x, x′), (sp(i), sp(j)) ∈ G̃i × G̃j ⊆ Gi × Gj ⊆ U × V for some U × V ∈ U. Therefore ωX(x, x′) and ωX(sp(i), sp(j)) both belong to B(r, ε/8) for some r ∈ R, so |ωX(x, x′) − ωX(sp(i), sp(j))| < ε/4. Since the pair was arbitrary, dis(R) ≤ ε/4, and hence dN(Sn, A) < ε/2.
Next we proceed to Theorem 68. We first prove the following useful lemma:
Lemma 130. Assume the setup of (X, ωX ), µX , (Ω, F, P), and Xn for each n ∈ N as in
Theorem 68. Fix ε > 0, and let U = {U1 , . . . , Um } be a refined ε-system on supp(µX ). For
each 1 ≤ i ≤ m and each n ∈ N, define the following event:
$$A_i := \bigcap_{k=1}^{n} \{\omega \in \Omega : x_k(\omega) \notin U_i\} \subseteq \Omega.$$

Then we have

$$P\left( \bigcup_{k=1}^{m} A_k \right) \le \frac{1}{m(\mathcal{U})}\,(1 - m(\mathcal{U}))^n.$$
Proof of Lemma 130. Here we are considering the probability that at least one of the Ui has empty intersection with Xn. By independence, P(Ai) = (1 − µX(Ui))^n. Then we have:

$$P\left( \bigcup_{k=1}^{m} A_k \right) \le \sum_{k=1}^{m} P(A_k) = \sum_{k=1}^{m} (1 - \mu_X(U_k))^n \le m \cdot \max_{1 \le k \le m} (1 - \mu_X(U_k))^n \le \frac{(1 - m(\mathcal{U}))^n}{m(\mathcal{U})}.$$
Here the first inequality follows by subadditivity of measure, and the last inequality follows
because the total mass µX (supp(µX )) = 1 is an upper bound for m · m(U). Note also that
each U ∈ U has nonzero mass, by the observation in Definition 24.
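A quick numerical check of this bound is possible; the sketch below (our illustrative setup, with hypothetical masses) estimates the probability that some Ui is missed by n i.i.d. samples and compares it against the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
masses = np.array([0.2, 0.3, 0.5])     # masses of U_1, U_2, U_3; m(U) = 0.2
n, trials = 10, 20_000
# each trial draws n points; the bad event is that some U_i gets no sample
labels = rng.choice(3, size=(trials, n), p=masses)
miss = np.mean([len(set(row)) < 3 for row in labels])
bound = (1 - masses.min()) ** n / masses.min()
print(miss, bound)                      # empirical probability <= bound
```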
Proof of Theorem 68. By endowing supp(µX ) with the restriction of ωX to supp(µX ) ×
supp(µX ) it may itself be viewed as a network with full support, so for notational conve-
nience, we assume X = supp(µX ).
First observe that Mε/2(X) ∈ (0, 1]. Let r ∈ (0, Mε/2(X)), and let Ur be an ε/2-system on X such that m(Ur) ∈ (r, Mε/2(X)]. For convenience, write m := |Ur|, and also write Ur = {U1, . . . , Um}. For each 1 ≤ i ≤ m, define Ai as in the statement of Lemma 130. Then by Lemma 130, the probability that at least one Ui has empty intersection with Xn is bounded as P(∪_{k=1}^{m} A_k) ≤ (1/m(Ur))(1 − m(Ur))^n. On the other hand, if Ui has nonempty
intersection with Xn for each 1 ≤ i ≤ m, then by Theorem 66, we obtain dN (X, Xn ) < ε.
For each n ∈ N, define Bn := {ω ∈ Ω : dN(X, Xn(ω)) ≥ ε}. Then we have:

$$P(B_n) \le P\left( \bigcup_{k=1}^{m} A_k \right) \le \frac{(1 - m(\mathcal{U}_r))^n}{m(\mathcal{U}_r)}.$$
Since r ∈ (0, Mε/2(X)) was arbitrary, letting r approach Mε/2(X) shows that P(Bn) ≤ (1 − Mε/2(X))^n / Mε/2(X). We have by Definition 24 that Mε/2(X) is strictly positive. Thus the term on the right side of the inequality is an element of a convergent geometric series, so

$$\sum_{n=1}^{\infty} P(B_n) \le \frac{1}{M_{\varepsilon/2}(X)} \sum_{n=1}^{\infty} (1 - M_{\varepsilon/2}(X))^n < \infty.$$

By the Borel-Cantelli lemma, it follows that P(lim sup_n Bn) = 0; in other words, almost surely dN(X, Xn) < ε for all sufficiently large n.
|ωA(ϕA(p), ϕA(p′)) − ωB(ϕB(p), ϕB(p′))| < ε/2 for each p, p′ ∈ P, and
|ωB(ψB(s), ψB(s′)) − ωC(ψC(s), ψC(s′))| < ε/2 for each s, s′ ∈ S.

Then:

|ωA(ϕA(πP(p, s)), ϕA(πP(p′, s′))) − ωC(ψC(πS(p, s)), ψC(πS(p′, s′)))|
= |ωA(ϕA(p), ϕA(p′)) − ωC(ψC(s), ψC(s′))|
= |ωA(ϕA(p), ϕA(p′)) − ωB(ϕB(p), ϕB(p′)) + ωB(ϕB(p), ϕB(p′)) − ωC(ψC(s), ψC(s′))|
= |ωA(ϕA(p), ϕA(p′)) − ωB(ϕB(p), ϕB(p′)) + ωB(ψB(s), ψB(s′)) − ωC(ψC(s), ψC(s′))|
< ε/2 + ε/2 = ε.
Proof of Theorem 72. It is clear that dN (X, Y ) ≥ 0. To show dN (X, X) = 0, consider
the correspondence R = {(x, x) : x ∈ X}. Then for any (x, x), (x0 , x0 ) ∈ R, we have
|ωX (x, x0 ) − ωX (x, x0 )| = 0. Thus dis(R) = 0 and dN (X, X) = 0.
Next we show symmetry, i.e. dN (X, Y ) ≤ dN (Y, X) and dN (Y, X) ≤ dN (X, Y ). The
two cases are similar, so we just show the second inequality. Let η > dN (X, Y ). Let
R ∈ R(X, Y ) be such that dis(R) < 2η. Then define R̃ = {(y, x) : (x, y) ∈ R}. Note that
R̃ ∈ R(Y, X). We have:

dis(R̃) = sup_{(y,x),(y′,x′)∈R̃} |ωY(y, y′) − ωX(x, x′)| = sup_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)| = dis(R).

So dis(R) = dis(R̃). Then dN(Y, X) = (1/2) inf_{S∈R(Y,X)} dis(S) ≤ (1/2) dis(R̃) < η. This shows dN(Y, X) ≤ dN(X, Y). The reverse inequality follows by a similar argument.
Next we prove the triangle inequality. Let R ∈ R(X, Y), S ∈ R(Y, Z), and let

R ∘ S := {(x, z) ∈ X × Z : (x, y) ∈ R and (y, z) ∈ S for some y ∈ Y}.

First we claim that R ∘ S ∈ R(X, Z). This is equivalent to checking that for each x ∈ X,
there exists z such that (x, z) ∈ R ◦ S, and for each z ∈ Z, there exists x such that
(x, z) ∈ R ◦ S. The proofs of these two conditions are similar, so we just prove the former.
Let x ∈ X. Let y ∈ Y be such that (x, y) ∈ R. Then there exists z ∈ Z such that
(y, z) ∈ S. Then (x, z) ∈ R ◦ S.
Next we claim that dis(R ∘ S) ≤ dis(R) + dis(S). Let (x, z), (x′, z′) ∈ R ∘ S. Let y ∈ Y be such that (x, y) ∈ R and (y, z) ∈ S. Let y′ ∈ Y be such that (x′, y′) ∈ R, (y′, z′) ∈ S. Then we have:

|ωX(x, x′) − ωZ(z, z′)| ≤ |ωX(x, x′) − ωY(y, y′)| + |ωY(y, y′) − ωZ(z, z′)| ≤ dis(R) + dis(S).
This holds for any (x, z), (x0 , z 0 ) ∈ R ◦ S, and proves the claim.
Now let η1 > dN(X, Y), let η2 > dN(Y, Z), and let R ∈ R(X, Y), S ∈ R(Y, Z) be such that dis(R) < 2η1 and dis(S) < 2η2. Then we have:

dN(X, Z) ≤ (1/2) dis(R ∘ S) ≤ (1/2)(dis(R) + dis(S)) < η1 + η2.
This shows that dN (X, Z) ≤ dN (X, Y ) + dN (Y, Z), and proves the triangle inequality.
Finally, we claim that X ≅ʷ_II Y if and only if dN(X, Y) = 0. Suppose dN(X, Y) = 0.
Let ε > 0, and let R(ε) ∈ R(X, Y ) be such that dis(R(ε)) < ε. Then for any z =
(x, y), z 0 = (x0 , y 0 ) ∈ R(ε), we have |ωX (x, x0 ) − ωY (y, y 0 )| < ε. But this is equivalent
to writing |ωX (πX (z), πX (z 0 )) − ωY (πY (z), πY (z 0 ))| < ε, where πX : R(ε) → X and
πY : R(ε) → Y are the canonical projection maps. This holds for each ε > 0. Thus
X ≅ʷ_II Y.
Conversely, suppose X ≅ʷ_II Y, and for each ε > 0 let Z(ε) be a set with surjective maps φ^ε_X : Z(ε) → X, φ^ε_Y : Z(ε) → Y such that |ωX(φ^ε_X(z), φ^ε_X(z′)) − ωY(φ^ε_Y(z), φ^ε_Y(z′))| < ε for all z, z′ ∈ Z(ε). For each ε > 0, let R(ε) = {(φ^ε_X(z), φ^ε_Y(z)) : z ∈ Z(ε)}. Then R(ε) ∈ R(X, Y) for each ε > 0, and dis(R(ε)) = sup_{z,z′∈Z(ε)} |ωX(φ^ε_X(z), φ^ε_X(z′)) − ωY(φ^ε_Y(z), φ^ε_Y(z′))| < ε.
We conclude that dN (X, Y ) = 0. Thus dN is a metric modulo Type II weak isomor-
phism.
Proof of Theorem 73. By the definition of ≅ʷ_I, it is clear that if X ≅ʷ_I Y, then dN(X, Y) = 0, i.e. X ≅ʷ_II Y (cf. Theorem 72).
Conversely, suppose dN(X, Y) = 0. Our strategy is to obtain a set Z ⊆ X × Y with canonical projection maps πX : Z → X, πY : Z → Y and surjections ψX : X → πX(Z), ψY : Y → πY(Z), yielding the chain of Type I weak isomorphisms

X ≅ʷ_I πX(Z) ≅ʷ_I πY(Z) ≅ʷ_I Y.
Since Type I weak isomorphism is an equivalence relation (Proposition 70), it will follow
that X and Y are Type I weakly isomorphic.
By applying Theorem 64, we choose sequences of finite subnetworks {Xn ⊆ X : n ∈ N}
and {Yn ⊆ Y : n ∈ N} such that dN (Xn , X) < 1/n and dN (Yn , Y ) < 1/n for each n ∈ N.
By the triangle inequality, dN (Xn , Yn ) < 2/n for each n.
For each n ∈ N, let Tn ∈ R(Xn , X), Pn ∈ R(Y, Yn ) be such that dis(Tn ) < 2/n and
dis(Pn ) < 2/n. Define αn := 4/n − dis(Tn ) − dis(Pn ), and notice that αn → 0 as n → ∞.
Since dN (X, Y ) = 0 by assumption, for each n ∈ N we let Sn ∈ R(X, Y ) be such that
dis(Sn ) < αn . Then,
98
Then for each n ∈ N, we define Rn := Tn ◦ Sn ◦ Pn ∈ R(Xn , Yn ). By Remark 4, we
know that Rn has the following expression:
Next define:
\[ \mathcal{S} := \big\{ (\tilde{x}_n, \tilde{y}_n)_{n \in \mathbb{N}} \in (X \times Y)^{\mathbb{N}} : (\tilde{x}_n, \tilde{y}_n) \in S_n \text{ for each } n \in \mathbb{N} \big\}. \]
Since X, Y are first countable and compact, the product X × Y is also first countable
and compact, hence sequentially compact. Any sequence in a sequentially compact space
has a convergent subsequence, so for convenience, we replace each sequence in S by a
convergent subsequence. Next define:
\[ Z := \big\{ (x, y) \in X \times Y : (x, y) = \lim_{n} (\tilde{x}_n, \tilde{y}_n) \text{ for some } (\tilde{x}_n, \tilde{y}_n)_{n \in \mathbb{N}} \in \mathcal{S} \big\}. \]
Proof of Equation 2.1. We now prove Equation 2.1. Let z = (x, y), z′ = (x′, y′) ∈ Z, and let (x̃n, ỹn)n∈N, (x̃′n, ỹ′n)n∈N be elements of S that converge to (x, y), (x′, y′) respectively. We wish to show |ωX(x, x′) − ωY(y, y′)| = 0. Let ε > 0, and observe that:
\[ |\omega_X(x,x') - \omega_Y(y,y')| \leq |\omega_X(x,x') - \omega_X(\tilde{x}_n, \tilde{x}'_n)| + |\omega_X(\tilde{x}_n, \tilde{x}'_n) - \omega_Y(\tilde{y}_n, \tilde{y}'_n)| + |\omega_Y(\tilde{y}_n, \tilde{y}'_n) - \omega_Y(y,y')|. \]
Claim 5. Suppose we are given sequences (x̃n, ỹn)n∈N, (x̃′n, ỹ′n)n∈N in Z converging to (x, y) and (x′, y′) in Z, respectively. Then there exists N ∈ N such that for all n ≥ N, we have:
\[ |\omega_X(x, x') - \omega_X(\tilde{x}_n, \tilde{x}'_n)| < \varepsilon/4, \qquad |\omega_Y(\tilde{y}_n, \tilde{y}'_n) - \omega_Y(y, y')| < \varepsilon/4. \]
For such n, the bound above becomes
\[ |\omega_X(x, x') - \omega_Y(y, y')| \leq \varepsilon/4 + |\omega_X(\tilde{x}_n, \tilde{x}'_n) - \omega_Y(\tilde{y}_n, \tilde{y}'_n)| + \varepsilon/4. \]
Separately note that for each n ∈ N, having (x̃n , ỹn ), (x̃0n , ỹn0 ) ∈ Sn implies that there
exist (xn , yn ) and (x0n , yn0 ) ∈ Rn such that (xn , x̃n ), (x0n , x̃0n ) ∈ Tn and (ỹn , yn ), (ỹn0 , yn0 ) ∈
Pn. Thus we can bound the middle term above as follows:
\[ |\omega_X(\tilde{x}_n, \tilde{x}'_n) - \omega_Y(\tilde{y}_n, \tilde{y}'_n)| \leq \mathrm{dis}(T_n) + \mathrm{dis}(R_n) + \mathrm{dis}(P_n) < \tfrac{2}{n} + \tfrac{4}{n} + \tfrac{2}{n} = \tfrac{8}{n}, \]
which is smaller than ε/2 for all sufficiently large n.
Since ε > 0 was arbitrary, it follows that ωX (x, x0 ) = ωY (y, y 0 ). This proves Equation 2.1.
It remains to define surjective maps ψX : X → πX (Z), ψY : Y → πY (Z) and to verify
Equations 2.2 and 2.3. Both cases are similar, so we only show the details of constructing
ψX and verifying Equation 2.2.
Construction of ψX . Let x ∈ X. Suppose first that x ∈ πX (Z). Then we simply define
ψX (x) = x. We also make the following observation, to be used later: for each n ∈ N,
letting y ∈ Y be such that (x, y) ∈ Sn , there exists xn ∈ Xn and yn ∈ Yn such that
(xn , x) ∈ Tn and (y, yn ) ∈ Pn .
Next suppose x ∈ X \ πX (Z). For each n ∈ N, let xn ∈ Xn be such that (xn , x) ∈ Tn ,
and let x̃n ∈ X be such that (xn , x̃n ) ∈ Tn . Also for each n ∈ N, let ỹn ∈ Y be such that
(x̃n , ỹn ) ∈ Sn . Then for each n ∈ N, let yn ∈ Yn be such that (ỹn , yn ) ∈ Pn . Then by
sequential compactness of X × Y , the sequence (x̃n , ỹn )n∈N has a convergent subsequence
which belongs to S and converges to a point (x̃, ỹ) ∈ Z. In particular, we obtain a sequence
(x̃n )n∈N converging to a point x̃, such that (xn , x) and (xn , x̃n ) ∈ Tn for each n ∈ N. Define
ψX (x) = x̃.
Since x ∈ X was arbitrary, this construction defines ψX : X → πX (Z). Note that ψX
is simply the identity on πX (Z), hence is surjective.
Proof of Equation 2.2. Now we verify Equation 2.2. Let ε > 0. There are three cases to
check:
we choose N large enough so that for each n ≥ N , we have:
Since ε > 0 was arbitrary, Equation 2.2 follows. The construction of ψY and proof for
Equation 2.3 are similar. This concludes the proof of the theorem.
As a consequence of Theorem 73, we see that weak isomorphisms of Types I and II
coincide in the setting of CN . Thus we recover a desirable notion of equivalence in the
setting of compact networks.
Proposition 76. Let (X, ωX ), (Y, ωY ) be networks with coherent topologies. Suppose f :
X → Y is a weight-preserving map and f (X) is a subnetwork of Y with the subspace
topology. Then f is continuous.
Proof. Let V 0 be an open subset of Y , and write V := V 0 ∩ f (X). Then V is open rel
f (X). We need to show that U := f −1 (V 0 ) = f −1 (V ) is open. Let x ∈ U , and suppose
(xn )n is a sequence in X converging to x. Then f (xn ) → f (x) rel f (X). To see this, note
that
\[ \|\omega_Y(f(x_n), \cdot)|_{f(X)} - \omega_Y(f(x), \cdot)|_{f(X)}\| = \|\omega_Y(f(x_n), f(\cdot))|_X - \omega_Y(f(x), f(\cdot))|_X\| = \|\omega_X(x_n, \cdot) - \omega_X(x, \cdot)\|, \]
and the latter converges to 0 uniformly by Axiom A2 for X. Similarly, $\|\omega_Y(\cdot, f(x_n))|_{f(X)} - \omega_Y(\cdot, f(x))|_{f(X)}\|$ converges to 0 uniformly. Thus by Axiom A2 for f(X), we have f(xn) → f(x) rel f(X).
f (x) rel f (X). But then there must exist N ∈ N such that f (xn ) ∈ V for all n ≥ N . Then
xn ∈ U for all n ≥ N . Thus U is open rel X by A1. This concludes the proof.
Proposition 80. Suppose (X, ωX ) ∈ N has a coherent topology. Then the map σ : X →
X/ ∼ is an open map, i.e. it maps open sets to open sets.
Proof of Proposition 80. Let U ⊆ X be open. We need to show σ −1 (σ(U )) is open. For
convenience, define V := σ −1 (σ(U )). Let v ∈ V . Then σ(v) = [v] = [x] for some x ∈ U .
Let (vn)n∈N be any sequence in X such that vn → v rel X. We first show that vn → x rel X. We know $\omega_X(v_n, \cdot) \xrightarrow{\text{unif.}} \omega_X(v, \cdot)$ and $\omega_X(\cdot, v_n) \xrightarrow{\text{unif.}} \omega_X(\cdot, v)$ by Axiom A2.
But ωX (v, •) = ωX (x, •) and ωX (•, v) = ωX (•, x), because x ∼ v. By A2, we then have
vn → x rel X. But then there exists N ∈ N such that vn ∈ U ⊆ V for all n ≥ N . This
shows that any sequence (vn ) in X converging rel X to an arbitrary point v ∈ V must
eventually be in V . Thus V is open rel X, by Axiom A1. This concludes the proof.
The following lemma (Lemma 131) summarizes useful facts about weight-preserving maps and the relation ∼. Let f : X → Y be a weight-preserving surjection. Then:
1. f preserves the relation ∼, i.e. x ∼ x′ if and only if f(x) ∼ f(x′), for any x, x′ ∈ X.
2. f preserves weights between equivalence classes, i.e. ωX/∼([x], [x′]) = ωY/∼([f(x)], [f(x′)]) for any [x], [x′] ∈ X/∼.
Proof of Lemma 131. For the first assertion, let x ∼ x0 for some x, x0 ∈ X. We wish to
show f(x) ∼ f(x′). Let y ∈ Y, and write y = f(z) for some z ∈ X. Then,
\[ \omega_Y(f(x), y) = \omega_Y(f(x), f(z)) = \omega_X(x, z) = \omega_X(x', z) = \omega_Y(f(x'), f(z)) = \omega_Y(f(x'), y). \]
Similarly we have ωY (y, f (x)) = ωY (y, f (x0 )) for any y ∈ Y . Thus f (x) ∼ f (x0 ).
Conversely suppose f(x) ∼ f(x′). Let z ∈ X. Then,
\[ \omega_X(x, z) = \omega_Y(f(x), f(z)) = \omega_Y(f(x'), f(z)) = \omega_X(x', z), \]
and similarly we get ωX (z, x) = ωX (z, x0 ). Thus x ∼ x0 . This proves the first assertion.
The second assertion holds by definition:
ωY /∼ ([f (x)], [f (x0 )]) = ωY (f (x), f (x0 )) = ωX (x, x0 ) = ωX/∼ ([x], [x0 ]).
Proposition 81. Let (X, ωX ) be a compact network with a coherent topology. The quotient
topology on (sk(X), ωsk(X) ) is also coherent.
Proof of Proposition 81. Let Z be any subnetwork of sk(X). Axiom A1 holds for any first
countable space, and we have already shown that sk(X) is first countable. Any subspace
of a first countable space is first countable, so Z satisfies A1.
Next we verify Axiom A2. We begin with the “if” statement. Let [x] ∈ Z and let ([xn])n be some sequence in Z. Suppose we have
\[ \omega_{sk(X)}([x_n], [\cdot])|_Z \xrightarrow{\text{unif.}} \omega_{sk(X)}([x], [\cdot])|_Z, \qquad \omega_{sk(X)}([\cdot], [x_n])|_Z \xrightarrow{\text{unif.}} \omega_{sk(X)}([\cdot], [x])|_Z. \]
Then we also have the following:
\[ \omega_X(x_n, \cdot)|_{\sigma^{-1}(Z)} \xrightarrow{\text{unif.}} \omega_X(x, \cdot)|_{\sigma^{-1}(Z)}, \qquad \omega_X(\cdot, x_n)|_{\sigma^{-1}(Z)} \xrightarrow{\text{unif.}} \omega_X(\cdot, x)|_{\sigma^{-1}(Z)}. \]
Proposition 82. Let (X, ωX ) be a compact network with a coherent topology. Then its
skeleton (sk(X), ωsk(X) ) is Hausdorff.
Proof of Proposition 82. Let [x] ≠ [x′] ∈ sk(X). By first countability, we take a countable open neighborhood base {Un : n ∈ N} of [x] such that U1 ⊇ U2 ⊇ U3 ⊇ · · · (if necessary, we replace Un by $\cap_{i=1}^{n} U_i$). Similarly, we take a countable open neighborhood base {Vn : n ∈ N} of [x′] such that V1 ⊇ V2 ⊇ V3 ⊇ · · ·. To show that sk(X) is Hausdorff, it
suffices to show that there exists n ∈ N such that Un ∩ Vn = ∅.
Towards a contradiction, suppose Un ∩ Vn 6= ∅ for each n ∈ N. For each n ∈ N,
let [yn ] ∈ Un ∩ Vn . Any open set containing [x] contains UN for some N ∈ N, and thus
contains [yn ] for all n ≥ N . Thus [yn ] → [x] rel sk(X). Similarly, [yn ] → [x0 ] rel sk(X).
Because sk(X) has a coherent topology (Proposition 81) and thus satisfies Axiom A2, we
then have:
\[ \omega_{sk(X)}([x'], [\cdot]) = \operatorname{unif\,lim}_n \omega_{sk(X)}([y_n], [\cdot]) = \omega_{sk(X)}([x], [\cdot]), \]
\[ \omega_{sk(X)}([\cdot], [x']) = \operatorname{unif\,lim}_n \omega_{sk(X)}([\cdot], [y_n]) = \omega_{sk(X)}([\cdot], [x]). \]
But then [x] = [x′], a contradiction.
We are now ready to prove that skeletons are terminal, in the sense of Definition 32
(also recall Definitions 30 and 31).
Theorem 83 (Skeletons are terminal). Let (X, ωX ) ∈ CN be such that the topology on X
is coherent. Then (sk(X), ωsk(X) ) ∈ CN is terminal in p(X).
\[ \omega_Y(y, y') = \omega_Y(f(x_y), f(x_{y'})) = \omega_X(x_y, x_{y'}) = \omega_{sk(X)}([x_y], [x_{y'}]) = \omega_{sk(X)}(g(y), g(y')). \]
This proves that the skeleton satisfies the first condition for being terminal.
Next suppose g : Y → sk(X) and h : Y → sk(X) are two weight preserving surjec-
tions. We wish to show h = ψ ◦ g for some ψ ∈ Aut(sk(X)).
For each [x] ∈ sk(X), we use the surjectivity of g to pick yx ∈ Y such that g(yx ) = [x].
Then we define ψ : sk(X) → sk(X) by ψ([x]) = ψ(g(yx )) := h(yx ).
To see that ψ is surjective, let [x] ∈ sk(X). Since h is surjective, there exists yx0 ∈ Y
such that h(yx0 ) = [x]. Write [u] = g(yx0 ). We have already chosen yu such that g(yu ) = [u].
Since g preserves equivalence classes (Lemma 131), it follows that $y'_x \sim y_u$. Then,
\[ \psi([u]) = h(y_u) = h(y'_x) = [x], \]
where the second-to-last equality holds because h preserves equivalence classes (Lemma
131).
To see that ψ is injective, let [x], [x′] ∈ sk(X) be such that ψ([x]) = h(yx) = h(y_{x′}) = ψ([x′]). Since h preserves equivalence classes (Lemma 131), we have yx ∼ y_{x′}. Next, g(yx) = [x] and g(y_{x′}) = [x′] by the choices we made earlier. Since yx ∼ y_{x′} and g preserves equivalence classes, we have g(yx) ∼ g(y_{x′}). Thus [x] = [x′].
Next we wish to show that ψ preserves weights. Let [x], [x′] ∈ sk(X). Then,
\[ \omega_{sk(X)}(\psi([x]), \psi([x'])) = \omega_{sk(X)}(h(y_x), h(y_{x'})) = \omega_Y(y_x, y_{x'}) = \omega_{sk(X)}(g(y_x), g(y_{x'})) = \omega_{sk(X)}([x], [x']). \]
Finally, let y ∈ Y and write [x] := g(y). Since g preserves equivalence classes, y ∼ yx, and hence
\[ h(y) = h(y_x) = \psi([x]) = \psi(g(y)), \]
where the first equality holds because h preserves equivalence classes (Lemma 131). Thus for each y ∈ Y, we have h(y) = ψ(g(y)). This shows that the skeleton satisfies the second condition for being terminal. We conclude the proof.
\[ \omega_Y(f(x_n), f(x_m)) = \omega_Y\big(\lim_k f_{k,k}(x_n), \lim_k f_{k,k}(x_m)\big) = \lim_k \omega_Y(f_{k,k}(x_n), f_{k,k}(x_m)) = \omega_X(x_n, x_m). \]
In the second equality above, we used the fact that a sequence converges in the product
topology iff the components converge. Since xn , xm ∈ SX were arbitrary, this concludes
the proof.
\[ \lim_n \omega_Y(f(x_n), f(x'_n)) = \omega_Y(f(x), f(x')); \qquad \lim_n \omega_X(x_n, x'_n) = \omega_X(x, x'). \]
Let ε > 0. By the previous observation, fix N ∈ N such that for all n ≥ N , we have
|ωY (f (xn ), f (x0n )) − ωY (f (x), f (x0 ))| < ε and |ωX (xn , x0n ) − ωX (x, x0 )| < ε. Then,
\begin{align*} |\omega_X(x,x') - \omega_Y(f(x),f(x'))| &= |\omega_X(x,x') - \omega_X(x_n,x'_n) + \omega_X(x_n,x'_n) - \omega_Y(f(x),f(x'))| \\ &\leq |\omega_X(x,x') - \omega_X(x_n,x'_n)| + |\omega_Y(f(x_n),f(x'_n)) - \omega_Y(f(x),f(x'))| < 2\varepsilon. \end{align*}
Thus ωX (x, x0 ) = ωY (f (x), f (x0 )). Since x, x0 ∈ X were arbitrary, this concludes the
proof.
The next result generalizes the result that an isometric embedding of a compact metric
space into itself is automatically surjective [17, Theorem 1.6.14]. However, before present-
ing the theorem we first discuss an auxiliary construction that is used in its proof.
\[ \Gamma_A(x, x') := \max\Big( \sup_{a \in A} |\omega_X(x,a) - \omega_X(x',a)|,\; \sup_{a \in A} |\omega_X(a,x) - \omega_X(a,x')| \Big). \]
Then ΓA satisfies symmetry, triangle inequality, and ΓA (x, x) = 0 for all x ∈ X. Thus ΓA
is a pseudometric on X. Moreover, ΓA is a bona fide metric on sk(A). The construction is
“canonical” because it does not rely on any coupling between the topology of X and ωX :
even the continuity of ωX is not necessary for this construction.
Next, for any E ⊆ X and any y ∈ X, define ΓA (y, E) := inf y0 ∈E ΓA (y, y 0 ). Then
ΓA (•, E) behaves as a proxy for the “distance to a set” function, where the set is fixed to
be E.
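For a finite network, ΓA can be computed directly from the weight matrix. The following Python sketch is illustrative and not part of the source implementation; it assumes a finite network given as a NumPy matrix W with W[i, j] = ωX(xi, xj).

```python
import numpy as np

def gamma_A(W, A):
    """Canonical pseudometric Gamma_A(x, x') on a finite network.

    W : (n, n) array with W[i, j] = omega_X(x_i, x_j).
    A : iterable of indices representing the subset A of X.
    """
    A = list(A)
    n = W.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out_part = max(abs(W[i, a] - W[j, a]) for a in A)
            in_part = max(abs(W[a, i] - W[a, j]) for a in A)
            G[i, j] = max(out_part, in_part)
    return G

def gamma_A_to_set(W, A, y, E):
    """Gamma_A(y, E) = min over y' in E of Gamma_A(y, y')."""
    G = gamma_A(W, A)
    return min(G[y, yp] for yp in E)
```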
Theorem 134. Let (X, ωX ) be a compact network with a coherent, Hausdorff topology.
Suppose f : X → X is a weight-preserving map. Then f is surjective.
To see this claim, fix y ∈ E. Let v ∈ f (Z). Then v = f (y 0 ) for some y 0 ∈ Z, and
Γf (E) (f (y), v) = ΓE (y, y 0 ). To see the latter assertion, let u ∈ f (E); then u = f (y 00 ) for
some y 00 ∈ E. Because f is weight-preserving, we then have:
|ωX (f (y), u) − ωX (v, u)| = |ωX (f (y), f (y 00 )) − ωX (f (y 0 ), f (y 00 ))| = |ωX (y, y 00 ) − ωX (y 0 , y 00 )|,
|ωX (u, f (y)) − ωX (u, v)| = |ωX (f (y 00 ), f (y)) − ωX (f (y 00 ), f (y 0 ))| = |ωX (y 00 , y) − ωX (y 00 , y 0 )|.
The preceding equalities show that for each v ∈ f (Z), there exists y 0 ∈ Z such that
Γf (E) (f (y), v) = ΓE (y, y 0 ). Conversely, for any y 0 ∈ Z, we have Γf (E) (f (y), f (y 0 )) =
ΓE (y, y 0 ). It follows that Γf (E) (f (y), f (Z)) = ΓE (y, Z).
Claim 9. ΓX (x, Z) = 0.
To see this, assume towards a contradiction that ΓX(x, Z) = ε > 0 (ΓX is nonnegative by definition). Since f(Z) = Z, we have by the preceding claim that ΓX(x, Z) =
Γf (X) (f (x), Z) = . . . = Γf n (X) (f n (x), Z) for each n ∈ N. In particular, for any k ∈ N,
Here the first inequality follows because the left hand side includes an infimum over z ∈ Z,
and the second inequality holds because the right hand side includes a supremum over a
larger set.
Since $x_{n_k} \to z$ rel X, we have by Axiom A2 that
\[ \|\omega_X(x_{n_k}, \cdot) - \omega_X(z, \cdot)\| \to 0, \qquad \|\omega_X(\cdot, x_{n_k}) - \omega_X(\cdot, z)\| \to 0 \]
uniformly. In particular, for all sufficiently large k,
\[ \sup_{y \in X} |\omega_X(x_{n_k}, y) - \omega_X(z, y)| < \varepsilon, \qquad \sup_{y \in X_{n_k}} |\omega_X(y, x_{n_k}) - \omega_X(y, z)| < \varepsilon. \]
\[ \max\big( |\omega_X(x, x') - \omega_X(z_n, x')|,\; |\omega_X(x', x) - \omega_X(x', z_n)| \big) < 1/n, \quad \text{i.e.} \]
\[ \max\big( \|\omega_X(x, \cdot) - \omega_X(z_n, \cdot)\|,\; \|\omega_X(\cdot, x) - \omega_X(\cdot, z_n)\| \big) < 1/n. \]
Thus the sequence (zn )n converges to x, by Axiom A2. Hence any open set containing x
also contains infinitely many points of Z that are distinct from x. Thus x is a limit point of
the closed set Z, and so x ∈ Z. This is a contradiction.
Theorem 84. Suppose (X, ωX), (Y, ωY) are separable, compact networks with coherent topologies. Then the following are equivalent:
1. $X \cong^w Y$.
2. $M_n(X) = M_n(Y)$ for each n ∈ N.
3. $sk(X) \cong^s sk(Y)$.
Proof of Theorem 84. (2) follows from (1) by the stability of motif sets (Theorem 149). (1)
follows from (3) by the triangle inequality of dN . We need to show that (2) implies (3).
First observe that sk(X), being a continuous image of the separable space X, is sep-
arable, and likewise for sk(Y ). Let SX , SY denote countable dense subsets of sk(X) and
sk(Y ). Next, because dN (X, sk(X)) = 0, an application of Theorem 149 shows that
Mn (X) = Mn (sk(X)) for each n ∈ N. The analogous result holds for sk(Y ). Thus
Mn (sk(X)) = Mn (sk(Y )) for each n ∈ N. Since X and Y have coherent topologies, so
do sk(X) and sk(Y ), by Proposition 81. By Propositions 132 and 133, there exist weight-
preserving maps ϕ : sk(X) → sk(Y ) and ψ : sk(Y ) → sk(X). Define X (1) := ψ(sk(Y ))
and Y (1) := ϕ(sk(X)). Also define ϕ1 and ψ1 to be the restrictions of ϕ and ψ to X (1) and
Y (1) , respectively. Finally define X (2) := ψ1 (Y (1) ) and Y (2) := ϕ1 (X (1) ). Then we have
the following diagram.
Lemma 135. For each 1 ≤ i ≤ n − 1, let $R_i \in \mathcal{R}(X_i, X_{i+1})$ be a correspondence, and define
\[ R := R_1 \circ R_2 \circ \cdots \circ R_{n-1} := \big\{ (x_1, x_n) \in X_1 \times X_n \;\big|\; \exists\, (x_i)_{i=2}^{n-1},\; (x_i, x_{i+1}) \in R_i \text{ for all } i \big\}. \]
Then $\mathrm{dis}(R) \leq \sum_{i=1}^{n-1} \mathrm{dis}(R_i)$.
Proof. We proceed by induction, beginning with the base case n = 2. For convenience,
write X := X1 , Y := X2 , and Z := X3 . Let (x, z), (x0 , z 0 ) ∈ R1 ◦ R2 . Let y ∈ Y be such
that (x, y) ∈ R1 and (y, z) ∈ R2 . Let y 0 ∈ Y be such that (x0 , y 0 ) ∈ R1 , (y 0 , z 0 ) ∈ R2 . Then
we have:
\[ |\omega_X(x,x') - \omega_Z(z,z')| \leq |\omega_X(x,x') - \omega_Y(y,y')| + |\omega_Y(y,y') - \omega_Z(z,z')| \leq \mathrm{dis}(R_1) + \mathrm{dis}(R_2). \]
This holds for any (x, z), (x′, z′) ∈ R1 ◦ R2, and proves the base case.
Suppose that the result holds for n = N ∈ N. Write R′ = R1 ◦ · · · ◦ RN and R = R′ ◦ R_{N+1}. Since R′ is itself a correspondence, applying the base case yields:
\[ \mathrm{dis}(R) \leq \mathrm{dis}(R') + \mathrm{dis}(R_{N+1}) \leq \sum_{i=1}^{N+1} \mathrm{dis}(R_i), \]
where the second inequality uses the inductive hypothesis. This completes the induction.
Proof. Let ([Xi])i∈N be a Cauchy sequence in $\mathcal{FN}/\cong^w$. First we wish to show this sequence converges in $\mathcal{CN}/\cong^w$. Note that (Xi)i∈N is a Cauchy sequence in FN, since the distance between two equivalence classes is given by the distance between any representatives.
tives. To show (Xi )i converges, it suffices to show that a subsequence of (Xi )i converges,
so without loss of generality, suppose dN (Xi , Xi+1 ) < 2−i for each i. Then for each i,
there exists Ri ∈ R(Xi , Xi+1 ) such that dis(Ri ) ≤ 2−i+1 . Fix such a sequence (Ri )i∈N .
For j > i, define
Rij := Ri ◦ Ri+1 ◦ Ri+2 ◦ · · · ◦ Rj−1 .
By Lemma 135, dis(Rij) ≤ dis(Ri) + dis(Ri+1) + . . . + dis(Rj−1) ≤ 2^{−i+2}. Next define:
\[ X := \big\{ (x_i)_{i \in \mathbb{N}} \in \prod_{i \in \mathbb{N}} X_i : (x_i, x_{i+1}) \in R_i \text{ for each } i \big\}, \qquad \omega_X\big((x_i), (x'_i)\big) := \limsup_{j \to \infty} \omega_{X_j}(x_j, x'_j). \]
To see that ωX is well defined (i.e. finite), let (xj), (x′j) ∈ X. For each j we have:
\begin{align*} |\omega_{X_j}(x_j, x'_j)| &= |\omega_{X_j}(x_j, x'_j) - \omega_{X_{j-1}}(x_{j-1}, x'_{j-1}) + \omega_{X_{j-1}}(x_{j-1}, x'_{j-1}) - \cdots - \omega_{X_1}(x_1, x'_1) + \omega_{X_1}(x_1, x'_1)| \\ &\leq |\omega_{X_1}(x_1, x'_1)| + \mathrm{dis}(R_1) + \mathrm{dis}(R_2) + \cdots + \mathrm{dis}(R_{j-1}) \\ &\leq |\omega_{X_1}(x_1, x'_1)| + 2. \end{align*}
Hence
\[ |\omega_X((x_j), (x'_j))| = \limsup_{j \to \infty} |\omega_{X_j}(x_j, x'_j)| \leq |\omega_{X_1}(x_1, x'_1)| + 2 < \infty. \]
\[ U := X_1 \times X_2 \times \cdots \times \{x_N\} \times \{x_{N+1}\} \times X_{N+2} \times \cdots. \]
Since Xi has the discrete topology for each i ∈ N, it follows that {xN} and {x_{N+1}} are open. Hence U is an open neighborhood of $(x_i)_{i \in \mathbb{N}}$ that is disjoint from X. It follows that $\prod_{i \in \mathbb{N}} X_i \setminus X$ is open, hence X is closed and thus compact.
It remains to show that ωX is continuous. We will show that preimages of open sets in R under ωX are open. Let (a, b) ⊆ R, and suppose $\omega_X^{-1}[(a,b)]$ is nonempty (otherwise, there is nothing to show). Let $(x_i)_{i\in\mathbb{N}}, (x'_i)_{i\in\mathbb{N}} \in X \times X$ be such that $\omega_X((x_i), (x'_i)) \in (a, b)$. Choose r > 0 small enough that the r-ball around $\omega_X((x_i), (x'_i))$ is contained in (a, b), and choose N ∈ N such that $2^{-N+3} < r$. Let A, B be the basic open neighborhoods consisting of sequences in X agreeing with $(x_i)$ (resp. $(x'_i)$) in the first N coordinates, so that $(x_i)_{i\in\mathbb{N}} \in A$ and $(x'_i)_{i\in\mathbb{N}} \in B$. We wish to show that $A \times B \subseteq \omega_X^{-1}[(a,b)]$, so it suffices to show that ωX(A, B) ⊆ (a, b).
Let (zi )i∈N ∈ A and (zi0 )i∈N ∈ B. Notice that zi = xi and zi0 = x0i for each i ≤ N . So
for n ≤ N , we have |ωXn (zn , zn0 ) − ωXn (xn , x0n )| = 0.
Next let n ∈ N, and note that:
\begin{align*} |\omega_{X_{N+n}}(z_{N+n}, z'_{N+n}) - \omega_{X_{N+n}}(x_{N+n}, x'_{N+n})| &= |\omega_{X_{N+n}}(z_{N+n}, z'_{N+n}) - \omega_{X_N}(z_N, z'_N) + \omega_{X_N}(z_N, z'_N) - \omega_{X_{N+n}}(x_{N+n}, x'_{N+n})| \\ &= |\omega_{X_{N+n}}(z_{N+n}, z'_{N+n}) - \omega_{X_N}(z_N, z'_N) + \omega_{X_N}(x_N, x'_N) - \omega_{X_{N+n}}(x_{N+n}, x'_{N+n})| \\ &\leq \mathrm{dis}(R_{N,N+n}) + \mathrm{dis}(R_{N,N+n}) \leq 2^{-N+2} + 2^{-N+2} = 2^{-N+3} < r. \end{align*}
Here the second to last inequality follows from Lemma 135. The preceding calculation
holds for arbitrary n ∈ N. It follows that:
\[ \limsup_{i\to\infty} \omega_{X_i}(x_i, x'_i) - \limsup_{i\to\infty} \omega_{X_i}(z_i, z'_i) \leq \limsup_{i\to\infty} \big( \omega_{X_i}(x_i, x'_i) - \omega_{X_i}(z_i, z'_i) \big) < r, \]
and similarly $\limsup_{i\to\infty} \omega_{X_i}(z_i, z'_i) - \limsup_{i\to\infty} \omega_{X_i}(x_i, x'_i) < r$. Thus we have $\omega_X((z_i)_i, (z'_i)_i) \in (a, b)$. This proves continuity of ωX.
Next we claim that $X_i \xrightarrow{d_\mathcal{N}} X$ as i → ∞. Fix i ∈ N. We wish to construct a correspondence S ∈ R(Xi, X). Let y ∈ Xi. We write xi = y and pick x1, x2, . . . , x_{i−1}, x_{i+1}, . . . such that (xj, x_{j+1}) ∈ Rj for each j ∈ N. We denote this sequence by $(x_j)_{x_i = y}$, and note that by construction, it lies in X. Conversely, for any (xj) ∈ X, we simply pick its ith coordinate xi as a corresponding element in Xi. We define:
\[ S := A \cup B, \quad \text{where} \quad A := \{(y, (x_j)_{x_i = y}) : y \in X_i\}, \quad B := \{(x_i, (x_k)) : (x_k) \in X\}. \]
Then S ∈ R(Xi , X). We claim that dis(S) ≤ 2−i+2 . Let z = (y, (xk )), z 0 = (y 0 , (x0k )) ∈
B. Let n ∈ N, n ≥ i. Then we have:
\begin{align*} |\omega_{X_i}(y, y') - \omega_{X_n}(x_n, x'_n)| &= |\omega_{X_i}(y, y') - \omega_{X_{i+1}}(x_{i+1}, x'_{i+1}) + \omega_{X_{i+1}}(x_{i+1}, x'_{i+1}) - \cdots + \omega_{X_{n-1}}(x_{n-1}, x'_{n-1}) - \omega_{X_n}(x_n, x'_n)| \\ &\leq \mathrm{dis}(R_i) + \mathrm{dis}(R_{i+1}) + \cdots + \mathrm{dis}(R_{n-1}) \\ &\leq 2^{-i+1} + 2^{-i} + \cdots + 2^{-n+2} \leq 2^{-i+2}. \end{align*}
Similar inequalities hold for z, z′ ∈ A, and for z ∈ A, z′ ∈ B. Thus dis(S) ≤ 2^{−i+2}. It follows that dN(Xi, X) ≤ 2^{−i+1}. Thus the sequence ([Xi])i converges to $[X] \in \mathcal{CN}/\cong^w$.
Finally, we need to check that $(\mathcal{CN}/\cong^w, d_\mathcal{N})$ is complete. Let ([Yn])n be a Cauchy sequence in $\mathcal{CN}/\cong^w$. For each n, let $[X_n] \in \mathcal{FN}/\cong^w$ be such that dN([Xn], [Yn]) < 1/n. Let ε > 0. Then for sufficiently large m and n, we have:
\[ d_\mathcal{N}([X_m], [X_n]) \leq d_\mathcal{N}([X_m], [Y_m]) + d_\mathcal{N}([Y_m], [Y_n]) + d_\mathcal{N}([Y_n], [X_n]) < \tfrac{1}{m} + \varepsilon + \tfrac{1}{n}. \]
Thus ([Xn])n is a Cauchy sequence in $\mathcal{FN}/\cong^w$. By applying what we have shown above, this sequence converges to some $[X] \in \mathcal{CN}/\cong^w$. By applying the triangle inequality,
we see that the sequence ([Yn ])n also converges to [X]. This shows completeness, and
concludes the proof.
The result of Theorem 88 can be summarized as follows:
The limit of a convergent sequence of finite networks is a compact topological space with
a continuous weight function.
Remark 136. The technique of composed correspondences used in the preceding proof
can also be used to show that the collection of isometry classes of compact metric spaces
endowed with the Gromov-Hausdorff distance is a complete metric space. Standard proofs
of this fact [98, §10] do not use correspondences, relying instead on a method of endowing
metrics on disjoint unions of spaces and then computing Hausdorff distances.
Remark 137. In the proof of Theorem 88, note that the construction of the limit is depen-
dent upon the initial choice of optimal correspondences. However, all such limits obtained
from different choices of optimal correspondences belong to the same weak isomorphism
class.
diam(F) + 2(ε/2) ≤ D + ε. Thus the matrices in A have entries in [−D − ε, D + ε]. Let N ≥ 1 be such that
\[ \frac{2D + 2\varepsilon}{N} < \frac{\varepsilon}{4}, \]
and write the refinement of [−D − ε, D + ε] into N pieces as
\[ W := \Big\{ -D - \varepsilon + k \cdot \tfrac{2D + 2\varepsilon}{N} : 0 \leq k \leq N \Big\}. \]
Write $\mathcal{A} = \bigcup_{i=1}^{N(\varepsilon/2)} \mathcal{A}_i$, where each $\mathcal{A}_i$ consists of the i × i matrices of $\mathcal{A}$. For each i, define $\mathcal{G}_i$ to be the (finite) collection of i × i matrices with entries in W. Let $\mathcal{G} = \bigcup_{i=1}^{N(\varepsilon/2)} \mathcal{G}_i$ and note that this is a finite collection. Furthermore, for each $A_i \in \mathcal{A}_i$, there exists $G_i \in \mathcal{G}_i$ such that
\[ \|A_i - G_i\|_\infty < \frac{\varepsilon}{4}. \]
Taking the diagonal correspondence between $A_i$ and $G_i$, it follows that $d_\mathcal{N}(A_i, G_i) < \varepsilon/2$. Hence for any [F] ∈ F, there exist A ∈ 𝒜 and G ∈ 𝒢 such that $d_\mathcal{N}(F, G) \leq d_\mathcal{N}(F, A) + d_\mathcal{N}(A, G) < \varepsilon$.
2.5.3 Geodesics in $\mathcal{CN}/\cong^w$
We now prove our results about the geodesic structures of $\mathcal{FN}/\cong^w$ and $\mathcal{CN}/\cong^w$.
Proof of Theorem 92. Let $[X], [Y] \in \mathcal{FN}/\cong^w$. We will show the existence of a curve γ : [0, 1] → FN such that γ(0) = (X, ωX), γ(1) = (Y, ωY), and for all s, t ∈ [0, 1],
\[ d_\mathcal{N}(\gamma(s), \gamma(t)) \leq |t - s| \cdot d_\mathcal{N}(X, Y). \]
Note that this yields dN([γ(s)], [γ(t)]) = |t − s| · dN([X], [Y]) for all s, t ∈ [0, 1], which is what we need to show.
Let $R \in \mathcal{R}^{opt}(X, Y)$, i.e. let R be a correspondence such that dis(R) = 2dN(X, Y). For each t ∈ (0, 1) define $\gamma(t) := (R, \omega_{\gamma(t)})$, where
\[ \omega_{\gamma(t)}\big((x, y), (x', y')\big) := (1 - t) \cdot \omega_X(x, x') + t \cdot \omega_Y(y, y') \quad \text{for all } (x, y), (x', y') \in R. \]
Suppose for now that Claim 11 holds. We further claim that this implies, for all s, t ∈
[0, 1],
dN (γ(s), γ(t)) = |t − s| · dN (X, Y ).
To see this, assume towards a contradiction that there exist s0 < t0 such that
\[ d_\mathcal{N}(\gamma(s_0), \gamma(t_0)) < (t_0 - s_0) \cdot d_\mathcal{N}(X, Y). \]
Then the triangle inequality would yield $d_\mathcal{N}(X, Y) \leq d_\mathcal{N}(X, \gamma(s_0)) + d_\mathcal{N}(\gamma(s_0), \gamma(t_0)) + d_\mathcal{N}(\gamma(t_0), Y) < d_\mathcal{N}(X, Y)$, a contradiction.
Thus it suffices to show Claim 11. There are three cases: (i) s, t ∈ (0, 1), (ii) s = 0, t ∈
(0, 1), and (iii) s ∈ (0, 1), t = 1. The latter two cases are similar, so we just prove (i)
and (ii). For (i), fix s, t ∈ (0, 1). Notice that ∆ := diag(R × R) := {(r, r) : r ∈ R} is a correspondence in R(R, R). Then we obtain:
\begin{align*} 2\, d_\mathcal{N}(\gamma(s), \gamma(t)) \leq \mathrm{dis}(\Delta) &= \max_{(x,y),(x',y') \in R} \big| \omega_{\gamma(t)}((x,y),(x',y')) - \omega_{\gamma(s)}((x,y),(x',y')) \big| \\ &\leq 2|t - s| \cdot d_\mathcal{N}(X, Y). \end{align*}
For (ii), fix t ∈ (0, 1) and consider the correspondence $R_X := \{(x, (x, y)) : (x, y) \in R\} \in \mathcal{R}(X, R)$. Then:
\begin{align*} \mathrm{dis}(R_X) &= \max_{(x,(x,y)),(x',(x',y')) \in R_X} |\omega_X(x, x') - (1 - t) \cdot \omega_X(x, x') - t \cdot \omega_Y(y, y')| \\ &= t \cdot \max_{(x,y),(x',y') \in R} |\omega_X(x, x') - \omega_Y(y, y')| = t \cdot \mathrm{dis}(R) = 2t \cdot d_\mathcal{N}(X, Y). \end{align*}
Thus dN (X, γ(t)) ≤ t · dN (X, Y ). The proof for case (iii), i.e. that dN (γ(s), Y ) ≤
|1 − s| · dN (X, Y ), is similar. This proves Claim 11, and the result follows.
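For finite networks, the straight-line geodesic of Theorem 92 can be realized concretely once an optimal correspondence is in hand. A minimal illustrative sketch (assuming R is supplied, e.g. by exhaustive search over correspondences; not part of the source implementation):

```python
import numpy as np

def geodesic_point(WX, WY, R, t):
    """Weight matrix of gamma(t) = (R, (1 - t) * omega_X + t * omega_Y).

    WX, WY : weight matrices of X and Y.
    R      : list of pairs (i, j) forming a correspondence between
             the points of X and Y (optimal for a true geodesic).
    t      : parameter in [0, 1]; t = 0 recovers X and t = 1 recovers Y,
             up to weak isomorphism.
    """
    m = len(R)
    W = np.zeros((m, m))
    for a, (i, j) in enumerate(R):
        for b, (k, l) in enumerate(R):
            W[a, b] = (1 - t) * WX[i, k] + t * WY[j, l]
    return W
```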
Proof of Theorem 93. Let $[X], [Y] \in \mathcal{CN}/\cong^w$. It suffices to find a geodesic between X
and Y , because the distance between any two equivalence classes is given by the distance
between any two representatives, and hence we will obtain a geodesic between [X] and
[Y ].
Let (Xn)n, (Yn)n be sequences in FN such that dN(Xn, X) < 1/n and dN(Yn, Y) < 1/n
for each n. For each n, let Rn be an optimal correspondence between Xn and Yn, endowed with the weight function
\[ \omega_n\big((x, y), (a, b)\big) := \tfrac{1}{2}\,\omega_{X_n}(x, a) + \tfrac{1}{2}\,\omega_{Y_n}(y, b), \qquad (x, y), (a, b) \in R_n. \]
Fix N ∈ N with 2/N < ε and let n ≥ N. Since dN(XN, Xn) ≤ 1/N + 1/n ≤ 2/N, and likewise for YN, Yn, we may choose S ∈ R(XN, Xn) and T ∈ R(YN, Yn) with dis(S) ≤ 4/N and dis(T) ≤ 4/N, and let Q ∈ R(RN, Rn) be the induced correspondence pairing (x, y) ∈ RN with (x′, y′) ∈ Rn whenever (x, x′) ∈ S and (y, y′) ∈ T. Then:
\begin{align*} \mathrm{dis}(Q) &= \max_{\substack{(x,y,x',y'),\\(a,b,a',b') \in Q}} |\omega_N((x, y), (a, b)) - \omega_n((x', y'), (a', b'))| \\ &= \max_{\substack{(x,y,x',y'),\\(a,b,a',b') \in Q}} \big| \tfrac{1}{2}\omega_{X_N}(x, a) + \tfrac{1}{2}\omega_{Y_N}(y, b) - \tfrac{1}{2}\omega_{X_n}(x', a') - \tfrac{1}{2}\omega_{Y_n}(y', b') \big| \\ &\leq \tfrac{1}{2} \max_{(x,x'),(a,a') \in S} |\omega_{X_N}(x, a) - \omega_{X_n}(x', a')| + \tfrac{1}{2} \max_{(y,y'),(b,b') \in T} |\omega_{Y_N}(y, b) - \omega_{Y_n}(y', b')| \\ &= \tfrac{1}{2}\mathrm{dis}(S) + \tfrac{1}{2}\mathrm{dis}(T) \leq \tfrac{4}{N}. \end{align*}
Thus dN(RN, Rn) ≤ 2/N < ε. This shows that any Rn can be ε-approximated by a network
having up to N (ε) points. Thus {Rn } is uniformly approximable, hence precompact. Thus
Claim 12 and the result follow.
Remark 138 (Branching and deviant geodesics). It is important to note that there exist geodesics in $\mathcal{CN}/\cong^w$ that deviate from the straight-line form given by Theorem 92. Even
in the setting of compact metric spaces, there exist infinite families of branching and deviant
geodesics [36].
Then diam is an R-invariant. Observe that the maximum is achieved for (X, ωX ) ∈ CN
because X (hence X × X) is compact and ωX : X × X → R is continuous.
Example 140. Define the spectrum map spec : CN → pow(R) by $\mathrm{spec}(X) := \{\omega_X(x, x') : x, x' \in X\}$ for all (X, ωX) ∈ CN.
[Figure 2.1: The trace map erases data between pairs of nodes — a two-node network (X, ωX) on {p, q}, with self-weights α, β and cross-weights γ, δ, maps to (X, trX), which retains only α and β.]
The spectrum also has two local variants. Define the out-local spectrum of X by $x \mapsto \mathrm{spec}^{out}_X(x) := \{\omega_X(x, x') : x' \in X\}$. Notice that $\mathrm{spec}(X) = \bigcup_{x \in X} \mathrm{spec}^{out}_X(x)$ for any network X, thus justifying the claim that this construction localizes spec. Similarly, we define the in-spectrum of X as the map $x \mapsto \mathrm{spec}^{in}_X(x) := \{\omega_X(x', x) : x' \in X\}$. Notice that one still has $\mathrm{spec}(X) = \bigcup_{x \in X} \mathrm{spec}^{in}_X(x)$ for any network X. Finally, we observe that the two local versions of spec do not necessarily coincide in an asymmetric network.
The spectrum is closely related to the multisets used by Boutin and Kemper [15] to pro-
duce invariants of weighted undirected graphs. For an undirected graph G, they considered
the collection of all subgraphs with three nodes, along with the edge weights for each sub-
graph (compare to our notion of spectrum). Then they proved that the distribution of edge
weights of these subgraphs is an invariant when G belongs to a certain class of graphs.
Example 141. Define the trace map tr : CN → pow(R) by (X, ωX ) 7→ tr(X) :=
{ωX (x, x) : x ∈ X}. This also defines an associated map x 7→ trX (x) := ωX (x, x). An
example is provided in Figure 2.1: in this case, we have (X, trX ) = ({p, q}, (α, β)).
Example 142 (The out and in maps). Define out : CN → pow(R) and in : CN → pow(R) by
\[ \mathrm{out}(X) := \Big\{ \max_{x' \in X} |\omega_X(x, x')| : x \in X \Big\}, \qquad \mathrm{in}(X) := \Big\{ \max_{x' \in X} |\omega_X(x', x)| : x \in X \Big\} \quad \text{for all } (X, \omega_X) \in \mathcal{CN}. \]
For each x ∈ X, $\max_{x' \in X} |\omega_X(x, x')|$ and $\max_{x' \in X} |\omega_X(x', x)|$ are achieved because {x} × X and X × {x} are compact. We also define the associated maps outX and inX by writing, for any (X, ωX) ∈ CN and any x ∈ X,
\[ \mathrm{out}_X(x) := \max_{x' \in X} |\omega_X(x, x')|, \qquad \mathrm{in}_X(x) := \max_{x' \in X} |\omega_X(x', x)|. \]
[Figure 2.2: A three-node network X on {p, q, r} with directed edge weights. The out map applied to each node yields the greatest weight of an arrow leaving the node, and the in map returns the greatest weight entering the node.]
So the out map returns the maximum (absolute) value in each row, and the in map pulls out
the maximum (absolute) value in each column of the weight matrix. As in the preceding
example, we may use the Hausdorff distance to compare the images of networks under the
out and in maps.
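On a finite network the out and in maps are just row and column maxima of the absolute weight matrix. A small illustrative sketch (assuming the network is given as a NumPy matrix; not from the source):

```python
import numpy as np

def out_map(W):
    """out(X): the set of row maxima of |W| (greatest weight leaving each node)."""
    return set(np.abs(W).max(axis=1))

def in_map(W):
    """in(X): the set of column maxima of |W| (greatest weight entering each node)."""
    return set(np.abs(W).max(axis=0))
```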
Constructions similar to out and in have been used by Jon Kleinberg to study the prob-
lem of searching the World Wide Web for user-specified queries [77]. In Kleinberg’s model,
for a search query σ, hubs are pages that point to highly σ-relevant pages (compare to out• ),
and authorities are pages that are pointed to by pages that have a high σ-relevance (com-
pare to in• ). Good hubs determine good authorities, and good authorities turn out to be
good search results.
Example 143 (mout and min). Define the maps mout : CN → R and min : CN → R by
\[ m^{out}(X) := \min_{x \in X} \mathrm{out}_X(x), \qquad m^{in}(X) := \min_{x \in X} \mathrm{in}_X(x) \quad \text{for all } (X, \omega_X) \in \mathcal{CN}. \]
Then both min and mout are R-invariants. We take the minimum when defining mout, min
because for any network (X, ωX ), we have maxx∈X outX (x) = maxx∈X inX (x) = diam(X).
Also observe that the minima are achieved above because X is compact.
Proposition 144. The maps out, in, tr, spec, and spec• are pow(R)-invariants. Similarly,
diam, mout , and min are R-invariants.
Next we see that the motif sets defined in §1.6.4 are also invariants.
Definition 46 (Motif sets are metric space valued invariants). Our use of motif sets is
motivated by the following observation, which appeared in [87, Section 5]. For any n ∈ N,
let C(Rn×n ) denote the set of closed subsets of Rn×n . Under the Hausdorff distance induced
by the `∞ metric on Rn×n , this set becomes a valid metric space [17, Proposition 7.3.3].
The motif sets defined in Definition 29 define a metric space valued invariant as follows:
for each n ∈ N, let Mn : CN → C(Rn×n ) be the map X 7→ Mn (X). We call this the motif
set invariant. So for (X, ωX), (Y, ωY) ∈ CN, for each n ∈ N, we let (Z, dZ) = (R^{n×n}, ℓ^∞) and consider the following distance between the n-motif sets of X and Y:
\[ d_n(M_n(X), M_n(Y)) := d_{\mathcal{H}}^{(\mathbb{R}^{n \times n},\, \ell^\infty)}\big( M_n(X), M_n(Y) \big). \]
Since dH is a proper distance between closed subsets, dn (Mn (X), Mn (Y )) = 0 if and only
if Mn (X) = Mn (Y ).
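For a finite network, Mn(X) can be enumerated directly as the set of all n × n weight matrices indexed by n-tuples of points (for compact networks one takes the closure of this set). An illustrative sketch, assuming the convention that tuples may repeat points:

```python
import itertools
import numpy as np

def motif_set(W, n):
    """M_n(X) for a finite network: all n x n matrices
    (omega_X(x_i, x_j)) over n-tuples (with repetition) of points.
    Matrices are flattened to tuples so they can live in a set."""
    pts = range(W.shape[0])
    return {tuple(W[np.ix_(t, t)].ravel())
            for t in itertools.product(pts, repeat=n)}
```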
A V-valued invariant ι : CN → (V, dV) is quantitatively stable if there exists a constant L > 0 such that
\[ d_V(\iota(X), \iota(Y)) \leq L \cdot d_\mathcal{N}(X, Y) \]
for all networks X and Y. The least constant L such that the above holds for all X, Y ∈ CN
is the Lipschitz constant of ι and is denoted L(ι).
Note that by identifying a non-constant quantitatively stable V -valued invariant ι, we
immediately obtain a lower bound for the dN distance between any two compact networks
(X, ωX ) and (Y, ωY ). Furthermore, given a finite family ια : CN → V , α ∈ A, of non-
constant quantitatively stable invariants, we may obtain the following lower bound for the
distance between compact networks X and Y :
\[ d_\mathcal{N}(X, Y) \geq \Big( \max_{\alpha \in A} L(\iota_\alpha) \Big)^{-1} \max_{\alpha \in A} d_V(\iota_\alpha(X), \iota_\alpha(Y)). \]
It is often the case that computing dV (ι(X), ι(Y )) is substantially simpler than comput-
ing the dN distance between X and Y (which leads to a possibly NP-hard problem). The
invariants described in the previous section are quantitatively stable.
Proposition 145. The invariants diam, tr, out, in, mout , and min are quantitatively stable,
with Lipschitz constant L = 2.
Example 146. Proposition 145 provides simple lower bounds for the dN distance between
compact networks. One application is the following: for all networks X and Y , we have
$d_\mathcal{N}(X, Y) \geq \tfrac{1}{2}\,|\mathrm{diam}(X) - \mathrm{diam}(Y)|$. For example, consider the weight matrices
\[ \Sigma := \begin{pmatrix} 0 & 5 & 2 \\ 3 & 1 & 4 \\ 1 & 4 & 3 \end{pmatrix} \quad \text{and} \quad \Sigma' := \begin{pmatrix} 3 & 4 & 2 \\ 3 & 1 & 5 \\ 3 & 3 & 4 \end{pmatrix}. \]
[Figure 2.3: Two two-node networks, X on nodes {p, q} with weights in {1, 2} and Y on nodes {r, s} with weights in {1, 2, 3}; see Example 148.]
Let X = N3(Σ) and Y = N3(Σ′). By comparing the diagonals, we can easily see that $X \not\cong^s Y$, but let us see how the invariants we proposed can help. Note that diam(X) = diam(Y) = 5, so the lower bound provided by diameter (½|5 − 5| = 0) does not help in telling the networks apart. However, tr(X) = {0, 1, 3} and tr(Y) = {1, 3, 4}, and Proposition 145 then yields
\[ d_\mathcal{N}(X, Y) \geq \tfrac{1}{2}\, d_{\mathcal{H}}^{\mathbb{R}}(\{0, 1, 3\}, \{1, 3, 4\}) = \tfrac{1}{2}. \]
Consider now the out and in maps. Note that one has out(X) = {5, 4}, out(Y ) =
{4, 5}, in(X) = {3, 5, 4}, and in(Y ) = {3, 4, 5}. Then dRH (out(X), out(Y )) = 0, and
dRH (in(X), in(Y )) = 0. Thus in both cases, we obtain dN (X, Y ) ≥ 0. So in this particular
example, the out and in maps are not useful for obtaining a lower bound to dN (X, Y ) via
Proposition 145.
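These bounds are easy to verify numerically. The following sketch (illustrative; helper names are ours) computes the diameter and trace bounds for the matrices Σ and Σ′ above:

```python
import numpy as np

def hausdorff_R(A, B):
    """Hausdorff distance between two finite subsets of R."""
    d1 = max(min(abs(a - b) for b in B) for a in A)
    d2 = max(min(abs(a - b) for a in A) for b in B)
    return max(d1, d2)

Sigma = np.array([[0, 5, 2], [3, 1, 4], [1, 4, 3]])
SigmaP = np.array([[3, 4, 2], [3, 1, 5], [3, 3, 4]])

# diam gives no information: 0.5 * |5 - 5| = 0.
print(0.5 * abs(np.abs(Sigma).max() - np.abs(SigmaP).max()))          # 0.0
# The trace invariant separates the networks: the bound is 1/2.
print(0.5 * hausdorff_R(set(np.diag(Sigma)), set(np.diag(SigmaP))))   # 0.5
```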
Now we state a proposition regarding the stability of global and local spectrum invari-
ants. These will be of particular interest for computational purposes as we explain in §4.2.
Proposition 147. Let spec• refer to either the out or in version of local spectrum. Then, for all (X, ωX), (Y, ωY) ∈ CN we have
\[ d_\mathcal{N}(X, Y) \geq \frac{1}{2} \inf_{R \in \mathcal{R}} \sup_{(x, y) \in R} d_{\mathcal{H}}^{\mathbb{R}}\big( \mathrm{spec}^\bullet_X(x), \mathrm{spec}^\bullet_Y(y) \big) \geq \frac{1}{2}\, d_{\mathcal{H}}^{\mathbb{R}}\big( \mathrm{spec}(X), \mathrm{spec}(Y) \big). \]
As a corollary, we get L(spec•) = L(spec) = 2.
Example 148 (An application of Proposition 147). Consider the networks in Figure 2.3.
By Proposition 147, we may calculate a lower bound for dN (X, Y ) by simply computing
the Hausdorff distance between spec(X) and spec(Y ), and dividing by 2. In this exam-
ple, spec(X) = {1, 2} and spec(Y) = {1, 2, 3}. Thus dRH(spec(X), spec(Y)) = 1, and dN(X, Y) ≥ ½.
Computing the lower bound involving local spectra requires solving a bottleneck linear
assignment problem over the set of all correspondences between X and Y . This can be
solved in polynomial time; details are provided in §4.2. The second lower bound involves computing the Hausdorff distance on R between the (global) spectra of X and Y — a computation which can be carried out in (smaller) polynomial time as well.
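The global-spectrum bound is especially cheap, requiring only the sets of weights. A minimal illustrative sketch (the helper mirrors the one shown earlier):

```python
def hausdorff_R(A, B):
    d1 = max(min(abs(a - b) for b in B) for a in A)
    d2 = max(min(abs(a - b) for a in A) for b in B)
    return max(d1, d2)

def spec(W):
    """Global spectrum: the set of all weights appearing in W."""
    return {w for row in W for w in row}

# Example 148: spec(X) = {1, 2}, spec(Y) = {1, 2, 3} gives the bound 1/2.
print(0.5 * hausdorff_R({1, 2}, {1, 2, 3}))  # 0.5
```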
To conclude this section, we state a theorem asserting that motif sets form a family of
quantitatively stable invariants.
Theorem 149. For each n ∈ N, Mn is a stable invariant with L(Mn ) = 2.
Proof of Lemma 150. Observe that f (X) = {fX (x) : x ∈ X} = fX (X), so we need to
show
\[ d_{\mathcal{H}}^{\mathbb{R}}(f_X(X), f_Y(Y)) = \inf_{R \in \mathcal{R}(X, Y)} \sup_{(x, y) \in R} |f_X(x) - f_Y(y)|. \]
Let a ∈ X and let R ∈ R(X, Y). Then there exists b ∈ Y such that (a, b) ∈ R. Then we have:
\[ \inf_{b' \in Y} |f_X(a) - f_Y(b')| \leq |f_X(a) - f_Y(b)| \leq \sup_{(x, y) \in R} |f_X(x) - f_Y(y)|. \]
This holds for all R ∈ R(X, Y). So we have
\[ \sup_{a \in X} \inf_{b \in Y} |f_X(a) - f_Y(b)| \leq \inf_{R \in \mathcal{R}} \sup_{(x, y) \in R} |f_X(x) - f_Y(y)|. \]
By a symmetric argument,
\[ \sup_{b \in Y} \inf_{a \in X} |f_X(a) - f_Y(b)| \leq \inf_{R \in \mathcal{R}} \sup_{(x, y) \in R} |f_X(x) - f_Y(y)|. \]
Now we show the reverse inequality. Let x ∈ X, and let η > dRH (fX (X), fY (Y )). Then
there exists y ∈ Y such that |fX (x) − fY (y)| < η. Define ϕ(x) = y, and extend ϕ to all
of X in this way. Let y ∈ Y . Then there exists x ∈ X such that |fX (x) − fY (y)| < η.
Define ψ(y) = x, and extend ψ to all of Y in this way. Let R = {(x, ϕ(x)) : x ∈ X} ∪
{(ψ(y), y) : y ∈ Y }. Then for each (a, b) ∈ R, we have |fX (a) − fY (b)| < η. Thus we
have $\inf_{R \in \mathcal{R}} \sup_{(x,y) \in R} |f_X(x) - f_Y(y)| < \eta$. Since $\eta > d_{\mathcal{H}}^{\mathbb{R}}(f_X(X), f_Y(Y))$ was arbitrary, it follows that
\[ \inf_{R \in \mathcal{R}} \sup_{(x, y) \in R} |f_X(x) - f_Y(y)| \leq d_{\mathcal{H}}^{\mathbb{R}}(f_X(X), f_Y(Y)). \]
This proves the lemma.
Proof of Proposition 145. Let η > dN (X, Y ). We break this proof into three parts.
The diam case. Recall that diam is an R-valued invariant, so we wish to show | diam(X)−
diam(Y )| ≤ 2dN (X, Y ). Let R ∈ R(X, Y ) be such that for any (a, b), (a0 , b0 ) ∈ R, we
have |ωX (a, a0 ) − ωY (b, b0 )| < 2η.
Let x, x′ ∈ X be such that |ωX(x, x′)| = diam(X), and let y, y′ be such that (x, y), (x′, y′) ∈ R. Then we have:
\[ \mathrm{diam}(X) = |\omega_X(x, x')| < |\omega_Y(y, y')| + 2\eta \leq \mathrm{diam}(Y) + 2\eta. \]
Similarly, we get diam(Y) < diam(X) + 2η. It follows that |diam(X) − diam(Y)| < 2η. Since η > dN(X, Y) was arbitrary, it follows that:
\[ |\mathrm{diam}(X) - \mathrm{diam}(Y)| \leq 2\, d_\mathcal{N}(X, Y). \]
For tightness, consider the networks X = N1(1) and Y = N1(2). By direct computation, we have that dN(X, Y) = ½. On the other hand, diam(X) = 1 and diam(Y) = 2 so
that | diam(X) − diam(Y )| = 1 = 2dN (X, Y ).
The cases tr, out, and in. First we show L(tr) = 2. By Lemma 150, it suffices to show
\[ \inf_{R \in \mathcal{R}} \sup_{(a, b) \in R} |\mathrm{tr}_X(a) - \mathrm{tr}_Y(b)| \leq 2\, d_\mathcal{N}(X, Y). \]
Let R ∈ R(X, Y) be such that for any (a, b), (a′, b′) ∈ R, we have |ωX(a, a′) − ωY(b, b′)| < 2η. Then we obtain |ωX(a, a) − ωY(b, b)| < 2η. Thus |trX(a) − trY(b)| < 2η. Since (a, b) ∈ R was arbitrary, it follows that $\sup_{(a,b) \in R} |\mathrm{tr}_X(a) - \mathrm{tr}_Y(b)| < 2\eta$, and hence $\inf_{R \in \mathcal{R}} \sup_{(a,b) \in R} |\mathrm{tr}_X(a) - \mathrm{tr}_Y(b)| < 2\eta$. The result now follows because η > dN(X, Y) was arbitrary. The proofs for out and in are similar, so we just show the former. By Lemma 150, it suffices to show
\[ \inf_{R \in \mathcal{R}} \sup_{(x, y) \in R} |\mathrm{out}_X(x) - \mathrm{out}_Y(y)| \leq 2\, d_\mathcal{N}(X, Y). \]
Recall that outX (x) = maxx0 ∈X |ωX (x, x0 )|. Let R ∈ R(X, Y ) be such that |ωX (x, x0 )−
ωY (y, y 0 )| < 2η for any (x, y), (x0 , y 0 ) ∈ R. By triangle inequality, it follows that |ωX (x, x0 )| <
|ωY (y, y 0 )| + 2η. In particular, for (x0 , y 0 ) ∈ R such that |ωX (x, x0 )| = outX (x), we have
outX (x) < |ωY (y, y 0 )| + 2η. Hence outX (x) < outY (y) + 2η. Similarly, outY (y) <
outX (x) + 2η. Thus we have | outX (x) − outY (y)| < 2η. This holds for all (x, y) ∈ R, so
we have:
\[ \sup_{(x, y) \in R} |\mathrm{out}_X(x) - \mathrm{out}_Y(y)| < 2\eta. \]
The cases mout and min. The two cases are similar, so we just prove L(mout) = 2. Since mout is an R-invariant, we wish to show |mout(X) − mout(Y)| < 2η. It suffices to show:
\[ d_{\mathcal{H}}^{\mathbb{R}}(\mathrm{out}(X), \mathrm{out}(Y)) = \inf_{R \in \mathcal{R}(X, Y)} \sup_{(x, y) \in R} |\mathrm{out}_X(x) - \mathrm{out}_Y(y)| < 2\eta. \]
Here we have used Lemma 150 for the first equality above.
Let ε > dRH(out(X), out(Y)). Then for any x ∈ X, there exists y ∈ Y such that
\[ |\mathrm{out}_X(x) - \mathrm{out}_Y(y)| < \varepsilon. \]
In particular, choosing x ∈ X such that outX(x) = mout(X), we have
\[ m^{out}(Y) \leq \mathrm{out}_Y(y) < \varepsilon + \mathrm{out}_X(x) = \varepsilon + m^{out}(X) \]
for some y ∈ Y. Similarly, we obtain:
\[ m^{out}(X) < \varepsilon + m^{out}(Y). \]
Thus we have | mout (X)−mout (Y )| < ε. Since ε > dRH (out(X), out(Y )) was arbitrary,
we have:
| mout (X) − mout (Y )| ≤ dRH (out(X), out(Y )).
The inequality now follows by Lemma 150 and our proof in the case of the out map.
For tightness, note that |mout(N1(1)) − mout(N1(2))| = |1 − 2| = 1 = 2 · ½ = 2dN(N1(1), N1(2)). The same example works for the min case.
Proof of Proposition 147. (First inequality.) Let X, Y ∈ CN and let η > dN (X, Y ). Let
R ∈ R(X, Y ) be such that sup(x,y),(x0 ,y0 )∈R |ωX (x, x0 ) − ωY (y, y 0 )| < 2η. Let (x, y) ∈ R,
and let α ∈ specX (x). Then there exists x0 ∈ X such that ωX (x, x0 ) = α. Let y 0 ∈ Y be
such that (x0 , y 0 ) ∈ R. Let β = ωY (y, y 0 ). Note β ∈ specY (y). Also note that |α − β| < 2η.
By a symmetric argument, for each β ∈ specY (y), there exists α ∈ specX (x) such that
|α − β| < 2η. So dRH (specX (x), specY (y)) < 2η. This is true for any (x, y) ∈ R, and so
we have sup(x,y)∈R dRH (specX (x), specY (y)) ≤ 2η. Then we have:
dRH (spec(X), spec(Y )) ≤ inf sup dRH (specX (x), specY (y)).
R∈R (x,y)∈R
126
By a symmetric argument, we get $\sup_{(y_i) \in Y^n} \inf_{(x_i) \in X^n} \max_{i,j} |\omega_X(x_i, x_j) - \omega_Y(y_i, y_j)| \leq \mathrm{dis}(R)$. Thus dn(Mn(X), Mn(Y)) ≤ dis(R). This holds for any R ∈ R(X, Y). Thus we have
\[ d_n(M_n(X), M_n(Y)) \leq 2\, d_\mathcal{N}(X, Y). \]
For tightness, let X = N1(1) and let Y = N1(2). Then dN(X, Y) = ½, so we wish to
show dn (Mn (X), Mn (Y )) = 1 for each n ∈ N. Let n ∈ N. Let 1n×n denote the n × n
matrix with 1 in each entry. Then Mn (X) = {1n×n } and Mn (Y ) = {2 · 1n×n }. Thus
dn (Mn (X), Mn (Y )) = 1. Since n was arbitrary, we conclude that equality holds for each
n ∈ N.
Because $|\sigma_X^n - \sigma_Y^n|^p$ is continuous and hence bounded on the compact cube $I^2 \times I^2$, we know that $|\sigma_X^n - \sigma_Y^n|^p \in C_b(I^2 \times I^2)$.
We claim that $\mathrm{dis}^n_p$ is continuous. Since the narrow topology on Prob(I × I) is induced by a distance [5, Remark 5.1.1], it suffices to show sequential continuity. Let ν ∈ C(λI, λI), and let (νm)m∈N be a sequence in C(λI, λI) converging narrowly to ν.
Then we have
\[ \lim_{m \to \infty} \mathrm{dis}^n_p(\nu_m) = \lim_{m \to \infty} \Big( \int_{I^2 \times I^2} |\sigma^n_X - \sigma^n_Y|^p \, d\nu_m \otimes d\nu_m \Big)^{1/p} = \Big( \int_{I^2 \times I^2} |\sigma^n_X - \sigma^n_Y|^p \, d\nu \otimes d\nu \Big)^{1/p} = \mathrm{dis}^n_p(\nu). \]
Here the second equality follows from the definition of convergence in the narrow topology and the fact that the integrand is bounded and continuous. This shows sequential continuity (hence continuity) of $\mathrm{dis}^n_p$.
Finally, we show that $(\mathrm{dis}^n_p)_{n \in \mathbb{N}}$ converges to $\mathrm{dis}_p$ uniformly. Let µ ∈ C(λI, λI). Then,
\[ |\mathrm{dis}^n_p(\mu) - \mathrm{dis}_p(\mu)| \leq \|\sigma^n_X - \sigma_X\|_{L^p(\mu \otimes \mu)} + \|\sigma^n_Y - \sigma_Y\|_{L^p(\mu \otimes \mu)} \leq 2/n. \]
But µ ∈ C(λI, λI) was arbitrary. This shows that disp is the uniform limit of continuous functions, hence is continuous. Here the first and second inequalities followed from Minkowski's inequality.
Now suppose p = ∞. Let µ ∈ C (λI , λI ) be arbitrary. Recall that because we are
working over probability spaces, Jensen’s inequality can be used to show that for any 1 ≤
q ≤ r < ∞, we have disq (µ) ≤ disr (µ). Moreover, we have limq→∞ disq (µ) = dis∞ (µ).
The supremum of a family of continuous functions is lower semicontinuous. In our case,
dis∞ = sup{disq : q ∈ [1, ∞)}, and we have shown above that all the functions in this
family are continuous. Hence dis∞ is lower semicontinuous.
Proof of Theorem 111. Let (X, ωX, µX), (Y, ωY, µY), (Z, ωZ, µZ) ∈ Nm. It is clear that dN,p(X, Y) ≥ 0. To show dN,p(X, X) = 0, consider the diagonal coupling ∆ (see Example 101). For p ∈ [1, ∞), we have:
\begin{align*} \mathrm{dis}_p(\Delta) &= \Big( \int_{X \times X} \int_{X \times X} |\omega_X(x, x') - \omega_X(z, z')|^p \, d\Delta(x, z)\, d\Delta(x', z') \Big)^{1/p} \\ &= \Big( \int_X \int_X |\omega_X(x, x') - \omega_X(x, x')|^p \, d\mu_X(x)\, d\mu_X(x') \Big)^{1/p} = 0. \end{align*}
For p = ∞, we have:
\begin{align*} \mathrm{dis}_\infty(\Delta) &= \sup\{|\omega_X(x, x') - \omega_X(z, z')| : (x, z), (x', z') \in \mathrm{supp}(\Delta)\} \\ &= \sup\{|\omega_X(x, x') - \omega_X(x, x')| : x, x' \in \mathrm{supp}(\mu_X)\} = 0. \end{align*}
Thus dN,p(X, X) = 0 for any p ∈ [1, ∞]. For symmetry, notice that for any µ ∈ C(µX, µY), we can define $\tilde{\mu} \in \mathcal{C}(\mu_Y, \mu_X)$ by $\tilde{\mu}(y, x) = \mu(x, y)$. Then $\mathrm{dis}_p(\mu) = \mathrm{dis}_p(\tilde{\mu})$, and this will show dN,p(X, Y) = dN,p(Y, X).
Finally, we need to check the triangle inequality. Let ε > 0, and let µ12 ∈ C (µX , µY )
and µ23 ∈ C (µY , µZ ) be couplings such that 2dN,p (X, Y ) ≥ disp (µ12 )−ε and 2dN,p (Y, Z) ≥
disp (µ23 )−ε . Invoking Lemma 107, we obtain a probability measure µ ∈ Prob(X×Y ×Z)
with marginals µ12 , µ23 , and a marginal µ13 that is a coupling between µX and µZ . This
coupling is not necessarily optimal. For p ∈ [1, ∞) we have:
\begin{align*} 2 d_{\mathcal{N},p}(X, Z) &\leq \mathrm{dis}_p(\mu_{13}) = \Big( \int_{X \times Z} \int_{X \times Z} |\omega_X(x, x') - \omega_Z(z, z')|^p \, d\mu_{13}(x, z)\, d\mu_{13}(x', z') \Big)^{1/p} \\ &= \Big( \int_{X \times Y \times Z} \int_{X \times Y \times Z} |\omega_X(x, x') - \omega_Z(z, z')|^p \, d\mu(x, y, z)\, d\mu(x', y', z') \Big)^{1/p} \\ &= \|\omega_X - \omega_Y + \omega_Y - \omega_Z\|_{L^p(\mu \otimes \mu)} \\ &\leq \|\omega_X - \omega_Y\|_{L^p(\mu \otimes \mu)} + \|\omega_Y - \omega_Z\|_{L^p(\mu \otimes \mu)} \\ &= \Big( \int_{X \times Y} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\mu_{12}(x, y)\, d\mu_{12}(x', y') \Big)^{1/p} \\ &\quad + \Big( \int_{Y \times Z} \int_{Y \times Z} |\omega_Y(y, y') - \omega_Z(z, z')|^p \, d\mu_{23}(y, z)\, d\mu_{23}(y', z') \Big)^{1/p} \\ &\leq 2 d_{\mathcal{N},p}(X, Y) + 2 d_{\mathcal{N},p}(Y, Z) + 2\varepsilon. \end{align*}
The second inequality above follows from Minkowski’s inequality. Letting ε → 0 now
proves the triangle inequality in the case p ∈ [1, ∞).
For p = ∞ we have:
\[ 2 d_{\mathcal{N},\infty}(X, Z) \leq \mathrm{dis}_\infty(\mu_{13}) \leq \mathrm{dis}_\infty(\mu_{12}) + \mathrm{dis}_\infty(\mu_{23}) \leq 2 d_{\mathcal{N},\infty}(X, Y) + 2 d_{\mathcal{N},\infty}(Y, Z) + 2\varepsilon. \]
Letting ε → 0 now proves the triangle inequality in the case p = ∞. This concludes our
proof.
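For finite measure networks, the p-distortion of a coupling is a finite sum, so the quantities in this proof are directly computable. An illustrative sketch (a full computation of dN,p would additionally minimize over couplings, which we do not attempt here):

```python
import numpy as np

def dis_p(WX, WY, mu, p):
    """p-distortion of a coupling mu between finite measure networks.

    WX : (m, m) weights on X;  WY : (n, n) weights on Y;
    mu : (m, n) coupling matrix (row sums mu_X, column sums mu_Y).
    """
    m, n = mu.shape
    total = 0.0
    for x in range(m):
        for y in range(n):
            for xp in range(m):
                for yp in range(n):
                    total += (abs(WX[x, xp] - WY[y, yp]) ** p
                              * mu[x, y] * mu[xp, yp])
    return total ** (1.0 / p)

# Trivial coupling between two one-point networks with weights 1 and 2:
print(dis_p(np.array([[1.0]]), np.array([[2.0]]), np.array([[1.0]]), 2))  # 1.0
```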
Proof of Theorem 112. First suppose p ∈ [1, ∞). By the construction in Section 1.9.3,
we pass into interval representations of X and Y . As noted in Section 1.9.3, the choice
of parametrization is not necessarily unique, but this does not affect the argument. Let
(I, σX , λI ), (I, σY , λI ) denote these representations. By Lemma 106, the disp functional
is continuous on the space of couplings between these two networks. By Lemma 105, this
space of couplings is compact. Thus disp achieves its infimum.
Let µ ∈ C (λI , λI ) denote this minimizer of disp . By Remark 102, we can also take
couplings µX ∈ C (µX , λI ) and µY ∈ C (λI , µY ) which have zero distortion. By Lemma
107, we can glue together µX , µ, and µY to obtain a coupling ν ∈ C (µX , µY ). By the proof
of the triangle inequality in Theorem 111, we have:
disp (ν) ≤ disp (µX ) + disp (µ) + disp (µY ) = disp (µ) = 2dN,p ((I, σX , λI ), (I, σY , λI )).
Also by the triangle inequality, we have dN,p ((I, σX , λI ), (I, σY , λI )) ≤ dN,p (X, Y ). It
follows that ν ∈ C (µX , µY ) is optimal.
The case p = ∞ is analogous, because lower semicontinuity (Lemma 106) combined
with compactness (Lemma 105) is sufficient to guarantee that dis∞ achieves its infimum
on C (λI , λI ).
Proof of Theorem 113. Fix p ∈ [1, ∞). For the backwards direction, suppose there exist Z
and measurable maps f : Z → X and g : Z → Y such that the appropriate conditions are
satisfied. We first claim that dN,p ((X, ωX , µX ), (Z, f ∗ ωX , µZ )) = 0.
To see the claim, define µ ∈ Prob(X × Z) by µ := (f, id)∗µZ. Then,
\begin{align*} \int_{X \times Z} \int_{X \times Z} |\omega_X(x, x') - f^*\omega_X(z, z')|^p \, d\mu(x, z)\, d\mu(x', z') &= \int_{X \times Z} \int_{X \times Z} |\omega_X(x, x') - \omega_X(f(z), f(z'))|^p \, d\mu(x, z)\, d\mu(x', z') \\ &= \int_Z \int_Z |\omega_X(f(z), f(z')) - \omega_X(f(z), f(z'))|^p \, d\mu_Z(z)\, d\mu_Z(z') = 0. \end{align*}
This verifies the claim. Similarly we have dN,p((Y, ωY, µY), (Z, g∗ωY, µZ)) = 0. Using the diagonal coupling along with the assumption, we have dN,p((Z, f∗ωX, µZ), (Z, g∗ωY, µZ)) = 0. By the triangle inequality, we then have dN,p(X, Y) = 0.
For the forwards direction, let µ ∈ C (µX , µY ) be an optimal coupling with disp (µ) = 0
(Theorem 112). Define Z := X × Y , µZ := µ. Then the projection maps πX : Z → X
and πY : Z → Y are measurable. We also have (πX )∗ µ = µX and (πY )∗ µ = µY . Since
disp (µ) = 0, we also have k(πX )∗ ωX − (πY )∗ ωY k∞ = kωX − ωY k∞ = 0.
The p = ∞ case is proved analogously. This concludes the proof.
Proof of Theorem 115. Let (X, ωX, µX), (Y, ωY, µY), (Z, ωZ, µZ) ∈ Nm. The proofs that $d^{GP}_{\mathcal{N},\alpha}(X, Y) \geq 0$, $d^{GP}_{\mathcal{N},\alpha}(X, X) = 0$, and that $d^{GP}_{\mathcal{N},\alpha}(X, Y) = d^{GP}_{\mathcal{N},\alpha}(Y, X)$ are analogous to those used in Theorem 111. Hence we only check the triangle inequality. Let $\varepsilon_{XY} > 2d^{GP}_{\mathcal{N},\alpha}(X, Y)$, $\varepsilon_{YZ} > 2d^{GP}_{\mathcal{N},\alpha}(Y, Z)$, and let µXY, µYZ be couplings such that
\[ \mu_{XY}^{\otimes 2}\big( \{(x, y, x', y') : |\omega_X(x, x') - \omega_Y(y, y')| \geq \varepsilon_{XY}\} \big) \leq \alpha\, \varepsilon_{XY}, \]
\[ \mu_{YZ}^{\otimes 2}\big( \{(y, z, y', z') : |\omega_Y(y, y') - \omega_Z(z, z')| \geq \varepsilon_{YZ}\} \big) \leq \alpha\, \varepsilon_{YZ}. \]
Glue µXY and µYZ along Y (Lemma 107) to obtain µ ∈ Prob(X × Y × Z), and define A := {((x,y,z),(x′,y′,z′)) : |ωX(x,x′) − ωY(y,y′)| ≥ εXY}, B := {((x,y,z),(x′,y′,z′)) : |ωY(y,y′) − ωZ(z,z′)| ≥ εYZ}, and C := {((x,y,z),(x′,y′,z′)) : |ωX(x,x′) − ωZ(z,z′)| ≥ εXY + εYZ}.
To show this, it suffices to show C ⊆ A ∪ B, because then we have µ⊗2(C) ≤ µ⊗2(A) + µ⊗2(B) and consequently
\[ \mu_{XZ}^{\otimes 2}((\pi_X, \pi_Z)(C)) = \mu^{\otimes 2}(C) \leq \mu^{\otimes 2}(A) + \mu^{\otimes 2}(B) = \mu_{XY}^{\otimes 2}((\pi_X, \pi_Y)(A)) + \mu_{YZ}^{\otimes 2}((\pi_Y, \pi_Z)(B)) \leq \alpha(\varepsilon_{XY} + \varepsilon_{YZ}). \]
Let ((x, y, z), (x0 , y 0 , z 0 )) ∈ (X × Y × Z)2 \ (A ∪ B). Then we have
|ωX (x, x0 ) − ωY (y, y 0 )| < εXY and |ωY (y, y 0 ) − ωZ (z, z 0 )| < εY Z .
By the triangle inequality, we then have:
|ωX (x, x0 ) − ωZ (z, z 0 )| ≤ |ωX (x, x0 ) − ωY (y, y 0 )| + |ωY (y, y 0 ) − ωZ (z, z 0 )| < εXY + εY Z .
Thus ((x, y, z), (x0 , y 0 , z 0 )) ∈ (X × Y × Z)2 \ C. This shows C ⊆ A ∪ B.
The preceding work shows that $2d^{GP}_{\mathcal{N},\alpha}(X, Z) \leq \varepsilon_{XY} + \varepsilon_{YZ}$. Since $\varepsilon_{XY} > 2d^{GP}_{\mathcal{N},\alpha}(X, Y)$ and $\varepsilon_{YZ} > 2d^{GP}_{\mathcal{N},\alpha}(Y, Z)$ were arbitrary, it follows that $d^{GP}_{\mathcal{N},\alpha}(X, Z) \leq d^{GP}_{\mathcal{N},\alpha}(X, Y) + d^{GP}_{\mathcal{N},\alpha}(Y, Z)$.
Proof of Theorem 118. Let t0 ∈ R, and let (X, ωX, µX), (Y, ωY, µY) ∈ Nm. Via Lemma 116, write $\varepsilon := d^{GP}_{\mathcal{N},0}(X, Y) = d_{\mathcal{N},\infty}(X, Y)$. Using Theorem 112, let µ be an optimal coupling between µX and µY for which dN,∞(X, Y) is achieved.
For each t ∈ R, write A(X, t) := {(x, x0 ) ∈ X × X : ωX (x, x0 ) ≤ t} = {ωX ≤ t}.
Similarly write A(Y, t) := {ωY ≤ t} for each t ∈ R.
Let B := {(x, y, x0 , y 0 ) ∈ (X × Y )2 : |ωX (x, x0 ) − ωY (y, y 0 )| ≥ ε}. Also let G denote
the complement of B, i.e. G := {(x, y, x0 , y 0 ) ∈ (X × Y )2 : |ωX (x, x0 ) − ωY (y, y 0 )| < ε}.
In particular, notice that for any (x, y, x0 , y 0 ) ∈ G, we have ωX (x, x0 ) < ε + ωY (y, y 0 ).
By the definition of ε, we have µ⊗2 (B) = 0, and hence µ⊗2 (G) = 1.
In what follows, we will focus on the case p ∈ [1, ∞) and write out the integrals explicitly. An analogous proof holds for p = ∞. Write $H := G \cap (A(X, t_0) \times Y^2)$. We have:
\begin{align*} \mathrm{sub}^w_{p,t_0}(X) &= \Big( \int_{A(X,t_0)} |\omega_X(x, x')|^p \, d\mu_X^{\otimes 2}(x, x') \Big)^{1/p} \\ &= \Big( \int_{A(X,t_0) \times Y^2} |\omega_X(x, x')|^p \, d\mu^{\otimes 2}(x, y, x', y') \Big)^{1/p} \\ &= \Big( \int_{H} |\omega_X(x, x')|^p \, d\mu^{\otimes 2}(x, y, x', y') \Big)^{1/p} \\ &= \Big( \int_{(X \times Y)^2} \mathbf{1}_H\, |\omega_X(x, x') - \omega_Y(y, y') + \omega_Y(y, y')|^p \, d\mu^{\otimes 2}(x, y, x', y') \Big)^{1/p}. \end{align*}
For any (x, y, x′, y′) ∈ H, we have |ωX(x, x′) − ωY(y, y′)| < ε. Also we have ωY(y, y′) < ε + ωX(x, x′) ≤ ε + t0. From the latter, we know $G \cap (A(X, t_0) \times Y^2) \subseteq X^2 \times A(Y, t_0 + \varepsilon)$. So we continue the previous expression as below:
\begin{align} &\leq \Big( \int_{(X \times Y)^2} \mathbf{1}_H\, \varepsilon^p \, d\mu^{\otimes 2} \Big)^{1/p} + \Big( \int_{X^2 \times A(Y, t_0+\varepsilon)} |\omega_Y(y, y')|^p \, d\mu^{\otimes 2}(x, y, x', y') \Big)^{1/p} \tag{2.4} \\ &\leq \varepsilon + \Big( \int_{A(Y, t_0+\varepsilon)} |\omega_Y(y, y')|^p \, d\mu_Y^{\otimes 2}(y, y') \Big)^{1/p} = \mathrm{sub}^w_{p,t_0+\varepsilon}(Y) + \varepsilon. \nonumber \end{align}
Analogously, we have
\[ \mathrm{sub}^w_{p,t_0+\varepsilon}(Y) \leq \mathrm{sub}^w_{p,t_0+2\varepsilon}(X) + \varepsilon. \]
This yields interleaving for p ∈ [1, ∞). For p = ∞, we use the same arguments about G and B to obtain:
\begin{align*} \mathrm{sub}^w_{p,t_0}(X) &= \sup\{|\omega_X(x, x')| : x, x' \in \mathrm{supp}(\mu_X),\ \omega_X(x, x') \leq t_0\} \\ &\leq \sup\{|\omega_Y(y, y')| + \varepsilon : y, y' \in \mathrm{supp}(\mu_Y),\ \omega_Y(y, y') \leq t_0 + \varepsilon\} \\ &\leq \mathrm{sub}^w_{p,t_0+\varepsilon}(Y) + \varepsilon. \end{align*}
Similarly, for the supw invariant one obtains
\[ \mathrm{sup}^w_{p,t_0}(X) \leq \mathrm{sup}^w_{p,t_0-\varepsilon}(X) + \varepsilon \leq \mathrm{sup}^w_{p,t_0-2\varepsilon}(Y) + 2\varepsilon. \]
\[ \big| \|\varphi^{st}_{XY}\|_{L^p(\mu \otimes \mu)} - \|\psi^{st}_{XY}\|_{L^p(\mu \otimes \mu)} \big| \leq \|\varphi^{st}_{XY} - \psi^{st}_{XY}\|_{L^p(\mu \otimes \mu)}. \tag{2.5} \]
Next we observe:
\begin{align*} \|\varphi^{st}_{XY}\|_{L^p(\mu \otimes \mu)} &= \Big( \int_{X \times Y} \int_{X \times Y} \varphi^{st}_{XY}(x, y, x', y')^p \, d\mu(x, y)\, d\mu(x', y') \Big)^{1/p} \\ &= \Big( \int_{X \times Y} \int_{X \times Y} |\omega_X(s, x')|^p \, d\mu(x, y)\, d\mu(x', y') \Big)^{1/p} \\ &= \Big( \int_X |\omega_X(s, x')|^p \, d\mu_X(x') \Big)^{1/p} = \mathrm{ecc}^{out}_{p,X}(s). \end{align*}
Similarly, $\|\psi^{st}_{XY}\|_{L^p(\mu \otimes \mu)} = \mathrm{ecc}^{out}_{p,Y}(t)$.
For the right side of Inequality (2.6), we have:
\begin{align*} \|\varphi^{st}_{XY} - \psi^{st}_{XY}\|^p_{L^p(\mu \otimes \mu)} &= \int_{X \times Y} \int_{X \times Y} \big| \varphi^{st}_{XY}(x, y, x', y') - \psi^{st}_{XY}(x, y, x', y') \big|^p \, d\mu(x, y)\, d\mu(x', y') \\ &= \int_{X \times Y} \int_{X \times Y} |\omega_X(s, x') - \omega_Y(t, y')|^p \, d\mu(x, y)\, d\mu(x', y') \\ &= \int_{X \times Y} |\omega_X(s, x') - \omega_Y(t, y')|^p \, d\mu(x', y'). \end{align*}
The left hand side above is independent of the coupling µ, so we can infimize over C(µX, µY):
\[ \big| \mathrm{ecc}^{out}_{p,X}(s) - \mathrm{ecc}^{out}_{p,Y}(t) \big|^p \leq \inf_{\nu \in \mathcal{C}} \int_{X \times Y} |\omega_X(s, x') - \omega_Y(t, y')|^p \, d\nu(x', y') = \mathrm{ecc}^{out}_{p,X,Y}(s, t)^p. \]
Also observe:
\begin{align*} \Big( \int_{X \times Y} \|\varphi^{st}_{XY} - \psi^{st}_{XY}\|^p_{L^p(\mu \otimes \mu)} \, d\mu(s, t) \Big)^{1/p} &= \Big( \int_{X \times Y} \int_{X \times Y} |\omega_X(s, x') - \omega_Y(t, y')|^p \, d\mu(x', y')\, d\mu(s, t) \Big)^{1/p} \\ &= \mathrm{dis}_p(\mu) < 2\eta. \end{align*}
Thus we obtain:
\[ \Big( \int_{X \times Y} \big| \mathrm{ecc}^{out}_{p,X}(s) - \mathrm{ecc}^{out}_{p,Y}(t) \big|^p \, d\mu(s, t) \Big)^{1/p} \leq \Big( \int_{X \times Y} \mathrm{ecc}^{out}_{p,X,Y}(s, t)^p \, d\mu(s, t) \Big)^{1/p} < 2\eta. \]
This proves the p ∈ [1, ∞) case. The p = ∞ case follows by applying Minkowski's inequality to obtain Inequality (2.5), and working analogously from there. Finally, we remark that the same proof holds for the $\mathrm{ecc}^{in}_{p,X}$ and $\mathrm{ecc}^{in}_{p,X,Y}$ functions.
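On finite measure networks the eccentricity functions reduce to weighted power means, which makes the lower bounds above directly computable. An illustrative sketch (assuming the network is a NumPy weight matrix W with node measure vector mu):

```python
import numpy as np

def ecc_out_p(W, mu, s, p):
    """ecc^out_{p,X}(s) = ( sum_{x'} |omega_X(s, x')|^p mu_X(x') )^(1/p)."""
    return float(np.sum(np.abs(W[s, :]) ** p * mu) ** (1.0 / p))

def ecc_in_p(W, mu, s, p):
    """ecc^in_{p,X}(s), using incoming weights instead."""
    return float(np.sum(np.abs(W[:, s]) ** p * mu) ** (1.0 / p))
```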
The following lemma is a particular statement of the change of variables theorem that
we use later.
Lemma 151 (Change of variables). Let (X, FX , µX ) and (Y, FY , µY ) be two probability
spaces. Let f : X → R and g : Y → R be two measurable functions. Write f∗ µX and
g∗ µY to denote the pushforward distributions on R. Let T : X × Y → R2 be the map
(x, y) 7→ (f (x), g(y)) and let h : R2 → R+ be measurable. Next let µ ∈ C (µX , µY ). Then
T∗ µ ∈ C (f∗ µX , g∗ µY ), and the following inequality holds:
\[ \inf_{\nu \in \mathcal{C}(f_*\mu_X, g_*\mu_Y)} \Big( \int_{\mathbb{R}^2} h(a, b)\, d\nu(a, b) \Big)^{1/p} \leq \Big( \int_{X \times Y} h(T(x, y))\, d\mu(x, y) \Big)^{1/p}. \]
This is essentially the same as [86, Lemma 6.1] but stated for general probability spaces
instead of metric measure spaces. The form of the statement in [86, Lemma 6.1] is slightly
different, but it can be obtained from the statement presented above by using [120, Remark
2.19].
Proof of Lemma 151. First we check that T∗ µ ∈ C (f∗ µX , g∗ µY ). Let A ∈ Borel(R).
Then,
By Lemma 151, we know that $T_*\tau \in \mathcal{C}(\nu_X, \nu_Y)$. By the change of variables formula and Fubini's theorem,
\begin{align*} \Big( \int_{\mathbb{R}^2} |a - b|^p \, d(T_*\tau)(a, b) \Big)^{1/p} &= \Big( \int_{X^2 \times Y^2} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\tau(x, x', y, y') \Big)^{1/p} \\ &= \Big( \int_{X^2 \times Y^2} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d(\mu(x, y)\,\mu(x', y')) \Big)^{1/p} \\ &= \Big( \int_{X \times Y} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\mu(x, y)\, d\mu(x', y') \Big)^{1/p} < 2\eta. \end{align*}
We infimize over $\mathcal{C}(\mu_X^{\otimes 2}, \mu_Y^{\otimes 2})$, use the fact that η > dN,p(X, Y) was arbitrary, and apply Lemma 151 to obtain:
\begin{align*} 2 d_{\mathcal{N},p}(X, Y) &\geq \inf_{\mu \in \mathcal{C}(\mu_X^{\otimes 2}, \mu_Y^{\otimes 2})} \Big( \int_{X^2 \times Y^2} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\mu(x, x', y, y') \Big)^{1/p} \\ &\geq \inf_{\gamma \in \mathcal{C}(\nu_X, \nu_Y)} \Big( \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \Big)^{1/p}. \end{align*}
This yields Inequalities (1.7)-(1.8).
Next we consider the distributions induced by the eccout function. For convenience, write $e_X := (\mathrm{ecc}^{out}_{p,X})_*\mu_X$ and $e_Y := (\mathrm{ecc}^{out}_{p,Y})_*\mu_Y$. Now let $T : X \times Y \to \mathbb{R}^2$ be the map $(x, y) \mapsto (\mathrm{ecc}^{out}_{p,X}(x), \mathrm{ecc}^{out}_{p,Y}(y))$, and let $h : \mathbb{R}^2 \to \mathbb{R}$ be the map $(a, b) \mapsto |a - b|^p$. By the change of variables formula and Theorem 121, we know
\[ \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \Big( \int_{\mathbb{R}^2} |a - b|^p \, d(T_*\mu)(a, b) \Big)^{1/p} = \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \Big( \int_{X \times Y} \big| \mathrm{ecc}^{out}_{p,X}(x) - \mathrm{ecc}^{out}_{p,Y}(y) \big|^p \, d\mu(x, y) \Big)^{1/p} \leq 2 d_{\mathcal{N},p}(X, Y). \]
By Lemma 151, we know that $T_*\mu \in \mathcal{C}(e_X, e_Y)$ and also the following:
\[ \inf_{\gamma \in \mathcal{C}(e_X, e_Y)} \Big( \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b) \Big)^{1/p} \leq \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \Big( \int_{X \times Y} \big| \mathrm{ecc}^{out}_{p,X}(x) - \mathrm{ecc}^{out}_{p,Y}(y) \big|^p \, d\mu(x, y) \Big)^{1/p} \leq 2 d_{\mathcal{N},p}(X, Y). \]
This proves Inequalities (1.9)-(1.10). Inequalities (1.11)-(1.12) are proved analogously.
Finally we consider the distributions obtained as pushforwards of the joint eccentricity function, i.e. Inequalities (1.13)-(1.16). For each x ∈ X and y ∈ Y let $T^{xy} : X \times Y \to \mathbb{R}^2$ be the map $(x', y') \mapsto (\omega_X(x, x'), \omega_Y(y, y'))$, and let $h : \mathbb{R}^2 \to \mathbb{R}$ be the map $(a, b) \mapsto |a - b|^p$. Let γ ∈ C(µX, µY). By the change of variables formula, we have
\[ \int_{\mathbb{R}^2} |a - b|^p \, d(T^{xy}_*\gamma)(a, b) = \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\gamma(x', y'), \quad \text{and so} \]
\[ \int_{X \times Y} \int_{\mathbb{R}^2} |a - b|^p \, d(T^{xy}_*\gamma)(a, b)\, d\mu(x, y) = \int_{X \times Y} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\gamma(x', y')\, d\mu(x, y). \]
By Lemma 151, $T^{xy}_*\mu \in \mathcal{C}(\lambda_X(x), \lambda_Y(y))$. Applying Theorem 121 and Lemma 151, we have:
\begin{align*} 2 d_{\mathcal{N},p}(X, Y) &\geq \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \Big( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\mu_X, \mu_Y)} \int_{X \times Y} |\omega_X(x, x') - \omega_Y(y, y')|^p \, d\gamma(x', y')\, d\mu(x, y) \Big)^{1/p} \\ &\geq \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \Big( \int_{X \times Y} \inf_{\gamma \in \mathcal{C}(\lambda_X(x), \lambda_Y(y))} \int_{\mathbb{R}^2} |a - b|^p \, d\gamma(a, b)\, d\mu(x, y) \Big)^{1/p}. \end{align*}
Chapter 3: Persistent Homology on Networks
In this chapter, we give proofs and auxiliary results related to persistent homology.
where x̂i denotes omission of xi from the sequence.
We will write C = (Ck , ∂k )k∈Z+ to denote a chain complex, i.e. a sequence of vector
spaces with boundary maps such that ∂k−1 ◦ ∂k = 0. Given a chain complex C and any
k ∈ Z+ , the k-th homology of the chain complex C is denoted Hk (C) := ker(∂k )/ im(∂k+1 ).
The k-th Betti number of C is denoted βk (C) := dim(Hk (C)).
Given a simplicial map f between simplicial complexes, we write f∗ to denote the
induced chain map between the corresponding chain complexes [88, §1.12], and (fk )# to
denote the linear map on kth homology vector spaces induced for each k ∈ Z+ .
The operations of passing from simplicial complexes and simplicial maps to chain com-
plexes and induced chain maps, and then to homology vector spaces with induced linear
maps, will be referred to as passing to homology. Recall the following useful fact, often
referred to as functoriality of homology [88, Theorem 12.2]: given a composition g ◦ f of
simplicial maps, we have
The elements of PVec(R) contain only a finite number of vector spaces, up to isomor-
phism. By the classification results in [21, §5.2], it is possible to associate a full invariant,
called a persistence barcode or persistence diagram, to each element of PVec(R). This
barcode is a multiset of persistence intervals, and is represented as a set of lines over a
single axis. The barcode of a persistence vector space V is denoted Pers(V). The intervals
in Pers(V) can be represented as the persistence diagram of V, which is a multiset of points lying on or above the diagonal in $\mathbb{R}^2$, counted with multiplicity. More specifically,
\[ \mathrm{Dgm}(\mathcal{V}) := \big\{ (\delta_i, \delta_{j+1}) \in \mathbb{R}^2 : [\delta_i, \delta_{j+1}) \in \mathrm{Pers}(\mathcal{V}) \big\}, \]
where the multiplicity of $(\delta_i, \delta_{j+1}) \in \mathbb{R}^2$ is given by the multiplicity of $[\delta_i, \delta_{j+1}) \in \mathrm{Pers}(\mathcal{V})$.
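As a data structure, passing from a barcode to a persistence diagram amounts to recording interval endpoints with multiplicity. A minimal illustrative sketch:

```python
from collections import Counter

def barcode_to_diagram(barcode):
    """Persistence diagram as a multiset of (birth, death) points.

    barcode : iterable of pairs (b, d), each encoding an interval
              [b, d); repeated pairs encode multiplicity.
    """
    return Counter((b, d) for (b, d) in barcode)

print(barcode_to_diagram([(0, 1), (0, 1), (0.5, 2)]))
# Counter({(0, 1): 2, (0.5, 2): 1})
```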
Persistence diagrams can be compared using the bottleneck distance, which we denote
by dB . Details about this distance, as well as the other material related to persistent homol-
ogy, can be found in [26]. Numerous other formulations of the material presented above
can be found in [49, 123, 19, 47, 50, 9, 48].
Remark 152. Whenever we describe a persistence diagram as being trivial, we mean that
it does not have any off-diagonal points.
[Diagrams: the standard ε-interleaving conditions — squares in which the maps $\varphi_\delta : V^\delta \to U^{\delta+\varepsilon}$ and $\psi_\delta : U^\delta \to V^{\delta+\varepsilon}$ commute with the structure maps $\nu_{\delta,\delta'}$ and $\mu_{\delta,\delta'}$, and triangles expressing $\psi_{\delta+\varepsilon} \circ \varphi_\delta = \nu_{\delta,\delta+2\varepsilon}$ and $\varphi_{\delta+\varepsilon} \circ \psi_\delta = \mu_{\delta,\delta+2\varepsilon}$.]
Proposition 153 (Properties of contiguous maps). Let f, g : Σ → Ξ be two contiguous simplicial maps. Then,
1. The induced continuous maps |f|, |g| : |Σ| → |Ξ| on geometric realizations are homotopic.
2. The chain maps induced by f and g are chain homotopic, and as a result, the induced maps f# and g# for homology are equal [88, Theorem 12.5].
Lemma 154 (Stability Lemma, [45]). Let F, G be two filtered simplicial complexes written as
\[ \{F^\delta \xrightarrow{s_{\delta,\delta'}} F^{\delta'}\}_{\delta' \geq \delta \in \mathbb{R}} \quad \text{and} \quad \{G^\delta \xrightarrow{t_{\delta,\delta'}} G^{\delta'}\}_{\delta' \geq \delta \in \mathbb{R}}, \]
where $s_{\delta,\delta'}$ and $t_{\delta,\delta'}$ denote the natural inclusion maps. Suppose η ≥ 0 is such that there exist families of simplicial maps $\{\varphi_\delta : F^\delta \to G^{\delta+\eta}\}_{\delta \in \mathbb{R}}$ and $\{\psi_\delta : G^\delta \to F^{\delta+\eta}\}_{\delta \in \mathbb{R}}$ such that the following are satisfied for any δ ≤ δ′:
[Diagrams: the squares formed by $\varphi_\delta, \varphi_{\delta'}$ together with $s_{\delta,\delta'}$ and $t_{\delta+\eta,\delta'+\eta}$, and by $\psi_\delta, \psi_{\delta'}$ together with $t_{\delta,\delta'}$ and $s_{\delta+\eta,\delta'+\eta}$, commute up to contiguity; likewise the triangles formed by $\psi_{\delta+\eta} \circ \varphi_\delta$ with $s_{\delta,\delta+2\eta}$, and by $\varphi_{\delta+\eta} \circ \psi_\delta$ with $t_{\delta,\delta+2\eta}$.]
For each k ∈ Z+, let PVeck(F), PVeck(G) denote the k-dimensional persistent vector spaces associated to F and G. Then for each k ∈ Z+,
\[ d_B\big( \mathrm{Dgm}(\mathrm{PVec}_k(F)),\, \mathrm{Dgm}(\mathrm{PVec}_k(G)) \big) \leq \eta. \]
3.2 Simplicial constructions
Here $s_{\delta,\delta'}$ and $t_{\delta+\eta,\delta'+\eta}$ are the inclusion maps. We claim that $t_{\delta+\eta,\delta'+\eta} \circ \varphi_\delta$ and $\varphi_{\delta'} \circ s_{\delta,\delta'}$ are contiguous simplicial maps. To see this, let $\sigma \in D^{si}_{\delta,X}$. Since $s_{\delta,\delta'}$ is just the inclusion, it follows that $t_{\delta+\eta,\delta'+\eta}(\varphi_\delta(\sigma)) \cup \varphi_{\delta'}(s_{\delta,\delta'}(\sigma)) = \varphi_\delta(\sigma)$, which is a simplex in $D^{si}_{\delta+\eta,Y}$ because $\varphi_\delta$ is simplicial, and hence a simplex in $D^{si}_{\delta'+\eta,Y}$ because the inclusion $t_{\delta+\eta,\delta'+\eta}$ is simplicial. Thus $t_{\delta+\eta,\delta'+\eta} \circ \varphi_\delta$ and $\varphi_{\delta'} \circ s_{\delta,\delta'}$ are contiguous, and their induced linear maps for homology are equal. By a similar argument, we verify that $s_{\delta+\eta,\delta'+\eta} \circ \psi_\delta$ and $\psi_{\delta'} \circ t_{\delta,\delta'}$ are contiguous simplicial maps as well.
Next we check that the maps $\psi_{\delta+\eta} \circ \varphi_\delta$ and $s_{\delta,\delta+2\eta}$ in the triangle below are contiguous.
[Diagram: $D^{si}_{\delta,X} \xrightarrow{s_{\delta,\delta+2\eta}} D^{si}_{\delta+2\eta,X}$, with $\varphi_\delta : D^{si}_{\delta,X} \to D^{si}_{\delta+\eta,Y}$ and $\psi_{\delta+\eta} : D^{si}_{\delta+\eta,Y} \to D^{si}_{\delta+2\eta,X}$.]
Let xi ∈ σ. Note that for our fixed σ = [x₀, . . . , xₙ] ∈ $D^{si}_{\delta,X}$ and a sink x′ for σ, we have:
\begin{align*} |\omega_X(x_i, x') - \omega_X(\psi(\varphi(x_i)), \psi(\varphi(x')))| &\leq |\omega_X(x_i, x') - \omega_Y(\varphi(x_i), \varphi(x'))| \\ &\quad + |\omega_Y(\varphi(x_i), \varphi(x')) - \omega_X(\psi(\varphi(x_i)), \psi(\varphi(x')))| < 2\eta. \end{align*}
Thus we obtain $\omega_X(\psi(\varphi(x_i)), \psi(\varphi(x'))) < \omega_X(x_i, x') + 2\eta \leq \delta + 2\eta$.
Since this holds for any xi ∈ σ, it follows that $\psi_{\delta+\eta}(\varphi_\delta(\sigma)) \in D^{si}_{\delta+2\eta,X}$. We further claim that
\[ \tau := \sigma \cup \psi_{\delta+\eta}(\varphi_\delta(\sigma)) \in D^{si}_{\delta+2\eta,X}. \]
Let y = φ(x′). Then |ωX(xi, ψ(y)) − ωY(φ(xi), y)| < η. In particular,
\[ \omega_X(x_i, \psi(y)) < \omega_Y(\varphi(x_i), y) + \eta \leq (\omega_X(x_i, x') + \eta) + \eta \leq \delta + 2\eta, \]
so ψ(y) is a common sink for the vertices of τ. Since 0 ≤ i ≤ n were arbitrary, it follows that $\tau \in D^{si}_{\delta+2\eta,X}$. Thus $\psi_{\delta+\eta} \circ \varphi_\delta$ and $s_{\delta,\delta+2\eta}$
are contiguous. Similarly, we use the dis(ψ) and CY,X (ψ, ϕ) terms to verify that tδ,δ+2η and
ϕδ+η ◦ ψδ are contiguous.
Since η > 2dN (X, Y ) was arbitrary, the result now follows by an application of Lemma
154.
\[ H_k(D^{si}_{\delta,X}) \cong H_k(D^{so}_{\delta,X}). \]
In the persistent setting, Theorem 33 and Corollary 155 suggest the following question:
Given a network (X, ωX ) and a fixed dimension k ∈ Z+ , are the persistence di-
agrams of the Dowker sink and source filtrations of (X, ωX ) necessarily equal?
In what follows, we provide a positive answer to the question above. Our strategy is to
use the Functorial Dowker Theorem (Theorem 35), for which we will provide a complete
proof below. The Functorial Dowker Theorem implies equality between sink and source
persistence diagrams.
Thus we may call either of the diagrams above the k-dimensional Dowker diagram of X,
denoted DgmDk (X).
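For a finite network, the Dowker sink complex at a given scale can be enumerated directly from the relation R_{δ,X} = {(x, x′) : ωX(x, x′) ≤ δ}: a simplex is any set of points admitting a common “sink”. The following sketch is illustrative (truncated at a maximum dimension for tractability; swapping rows and columns of W gives the source complex):

```python
import itertools
import numpy as np

def dowker_sink_complex(W, delta, max_dim=2):
    """Simplices of the Dowker sink complex at scale delta.

    sigma is a simplex iff some common sink x' satisfies
    W[x, x'] <= delta for every x in sigma.
    """
    n = W.shape[0]
    simplices = []
    for k in range(1, max_dim + 2):  # simplices on k vertices
        for sigma in itertools.combinations(range(n), k):
            if any(all(W[x, xp] <= delta for x in sigma) for xp in range(n)):
                simplices.append(sigma)
    return simplices
```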
Before proving the corollary, we state an R-indexed variant of the Persistence Equiva-
lence Theorem [47]. This particular version follows from the isometry theorem [9], and we
refer the reader to [26, Chapter 5] for an expanded presentation of this material.
Theorem 157 (Persistence Equivalence Theorem). Consider two persistent vector spaces $\mathcal{U} = \{U^\delta \xrightarrow{\mu_{\delta,\delta'}} U^{\delta'}\}_{\delta \leq \delta' \in \mathbb{R}}$ and $\mathcal{V} = \{V^\delta \xrightarrow{\nu_{\delta,\delta'}} V^{\delta'}\}_{\delta \leq \delta' \in \mathbb{R}}$ with connecting maps $f_\delta : U^\delta \to V^\delta$.
[Diagram: the ladder $\cdots \to U^\delta \to U^{\delta'} \to U^{\delta''} \to \cdots$ over $\cdots \to V^\delta \to V^{\delta'} \to V^{\delta''} \to \cdots$, with vertical maps $f_\delta, f_{\delta'}, f_{\delta''}$.]
If the fδ are all isomorphisms and each square in the diagram above commutes, then:
Dgm(U) = Dgm(V).
Proof of Corollary 156. Let δ ≤ δ 0 ∈ R, and consider the relations Rδ,X ⊆ Rδ0 ,X ⊆
X × X. Suppose first that Rδ,X and Rδ0 ,X are both nonempty. By applying Theorem 35,
we obtain homotopy equivalences between the source and sink complexes that commute
with the canonical inclusions up to homotopy. Passing to the k-th homology level, we
obtain persistence vector spaces that satisfy the commutativity properties of Theorem 157.
The result follows from Theorem 157.
In the case where Rδ,X and Rδ0 ,X are both empty, there is nothing to show because all
the associated complexes are empty. Suppose Rδ,X is empty, and Rδ0 ,X is nonempty. Then
D^{si}_{δ,X} and D^{so}_{δ,X} are empty, so their inclusions into D^{si}_{δ′,X} and D^{so}_{δ′,X} induce zero maps upon
passing to homology. Thus the commutativity of Theorem 157 is satisfied, and the result
follows by Theorem 157.
The proof of the Functorial Dowker Theorem. It remains to prove Theorem 35. Because the proof involves numerous maps, we will adopt the notational convention of adding a subscript to a function to denote its codomain—e.g. we will write f_B to denote a function with codomain B.
First we recall the construction of a combinatorial barycentric subdivision (see [46, §2],
[81, §4.7], [7, Appendix A]).
Definition 47 (Barycentric subdivisions). For any simplicial complex Σ, one may construct
a new simplicial complex Σ(1) , called the first barycentric subdivision, as follows:
Σ^{(1)} := {[σ_1, σ_2, …, σ_p] : σ_1 ⊆ σ_2 ⊆ … ⊆ σ_p, each σ_i ∈ Σ}.

Note that the vertices of Σ^{(1)} are the simplices of Σ, and the simplices of Σ^{(1)} are nested sequences of simplices of Σ. Furthermore, note that given any two simplicial complexes Σ, Ξ and a simplicial map f : Σ → Ξ, there is a natural simplicial map f^{(1)} : Σ^{(1)} → Ξ^{(1)} defined as:

f^{(1)}([σ_1, …, σ_p]) := [f(σ_1), …, f(σ_p)],   σ_1 ⊆ σ_2 ⊆ … ⊆ σ_p, each σ_i ∈ Σ.

To see that this is simplicial, note that f(σ_i) ⊆ f(σ_j) whenever σ_i ⊆ σ_j. As a special case, observe that any inclusion map ι : Σ ↪ Ξ induces an inclusion map ι^{(1)} : Σ^{(1)} ↪ Ξ^{(1)}.
Given a simplex σ = [x_0, …, x_k] in a simplicial complex Σ, one defines the barycenter to be the point B(σ) := ∑_{i=0}^{k} x_i/(k+1) ∈ |Σ|. Then the spaces |Σ^{(1)}| and |Σ| can be identified via a homeomorphism E_{|Σ|} : |Σ^{(1)}| → |Σ| defined on vertices by E_{|Σ|}(σ) := B(σ) and extended linearly.
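As a concrete illustration of Definition 47, here is a minimal sketch (ours) that computes the first barycentric subdivision of a small complex; simplices are represented as frozensets and simplices of Σ^{(1)} as tuples of properly nested simplices, both illustrative choices.

from itertools import combinations

def barycentric_subdivision(complex_):
    # Simplices of Sigma^(1) are chains sigma_1 < sigma_2 < ... < sigma_p in Sigma.
    simplices = sorted(complex_, key=len)
    chains = set()

    def extend(chain):
        chains.add(tuple(chain))
        for tau in simplices:
            if chain[-1] < tau:          # proper face relation between frozensets
                extend(chain + [tau])

    for sigma in simplices:
        extend([sigma])
    return chains

# The full triangle on {0, 1, 2} has 7 simplices, so Sigma^(1) has 7 vertices
# and 6 top-dimensional simplices (chains vertex < edge < triangle).
triangle = [frozenset(s) for k in (1, 2, 3) for s in combinations(range(3), k)]
sd = barycentric_subdivision(triangle)
print(len([c for c in sd if len(c) == 1]), len([c for c in sd if len(c) == 3]))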
Details on the preceding list of definitions can be found in [88, §2.14-15, 2.19], [112,
§3.3-4], and also [7, Appendix A]. The next proposition follows from the discussions in
these references, and is a simple restatement of [7, Proposition A.1.5]. We provide a proof
in the appendix for completeness.
Proposition 158 (Simplicial approximation to E_•). Let Σ be a simplicial complex, and let Φ : Σ^{(1)} → Σ be a simplicial map such that Φ(σ) ∈ σ for each σ ∈ Σ. Then |Φ| ≃ E_{|Σ|}.
We now introduce some auxiliary constructions dating back to [46] that use the setup stated in Theorem 35. For any nonempty relation R ⊆ X × Y, one may define [46, §2] an associated map Φ_{E_R} : E_R^{(1)} → E_R as follows: first define Φ_{E_R} on vertices of E_R^{(1)} by Φ_{E_R}(σ) = s_σ, where s_σ is the least vertex of σ with respect to the total order. Next, for any simplex [σ_1, …, σ_k] of E_R^{(1)}, where σ_1 ⊆ … ⊆ σ_k, we have Φ_{E_R}(σ_i) = s_{σ_i} ∈ σ_k for all 1 ≤ i ≤ k. Thus [Φ_{E_R}(σ_1), …, Φ_{E_R}(σ_k)] = [s_{σ_1}, s_{σ_2}, …, s_{σ_k}] is a face of σ_k, hence a simplex of E_R. This defines Φ_{E_R} as a simplicial map E_R^{(1)} → E_R. This argument also shows that Φ_{E_R} is order-reversing: if σ ⊆ σ′, then Φ_{E_R}(σ) ≥ Φ_{E_R}(σ′).
Remark 159. Applying Proposition 158 to the setup above, one sees that |Φ_{E_R}| ≃ E_{|E_R|}. After passing to a second barycentric subdivision E_R^{(2)} (obtained by taking a barycentric subdivision of E_R^{(1)}) and obtaining a map Φ_{E_R^{(1)}} : E_R^{(2)} → E_R^{(1)}, one also has |Φ_{E_R^{(1)}}| ≃ E_{|E_R^{(1)}|}.
One also defines [46, §3] a simplicial map Ψ_{F_R} : E_R^{(1)} → F_R as follows. Given a vertex σ = [x_0, …, x_k] ∈ E_R^{(1)}, one defines Ψ_{F_R}(σ) = y_σ, for some y_σ ∈ Y such that (x_i, y_σ) ∈ R for each i. To see why this vertex map is simplicial, let σ^{(1)} = [σ_0, …, σ_k] be a simplex in E_R^{(1)}. Let x ∈ σ_0. Then, because σ_0 ⊆ σ_1 ⊆ … ⊆ σ_k, we automatically have that (x, Ψ_{F_R}(σ_i)) ∈ R for each i = 0, …, k. Thus Ψ_{F_R}(σ^{(1)}) is a simplex in F_R. This definition involves a choice of y_σ when writing Ψ_{F_R}(σ) = y_σ, but all the maps resulting from such choices are contiguous [46, §3].
The preceding map induces a simplicial map Ψ_{F_R^{(1)}} : E_R^{(2)} → F_R^{(1)} as follows. Given a vertex τ = [τ_0, …, τ_k] of E_R^{(2)}, define Ψ_{F_R^{(1)}}(τ) := [Ψ_{F_R}(τ_0), …, Ψ_{F_R}(τ_k)]. Since Ψ_{F_R} is simplicial, this is a simplex in F_R, i.e. a vertex in F_R^{(1)}. Thus we have a vertex map Ψ_{F_R^{(1)}} : E_R^{(2)} → F_R^{(1)}. To check that this map is simplicial, one argues as for Ψ_{F_R} : E_R^{(1)} → F_R above. Consider the following diagram:
[Diagram: the complexes E_R^{(2)}, E_R^{(1)}, E_R and F_R^{(2)}, F_R^{(1)}, F_R, together with their primed counterparts for R′, connected by the maps Φ_• and Ψ_• defined above and by the inclusions induced by R ⊆ R′.]
Items (1) and (3) appear in the proof of Dowker’s theorem [46, Lemmas 5, 6], and it is
easy to see that a symmetric argument shows Items (2) and (4). For completeness, we will
verify these items in this paper, but defer this verification to the end of the proof.
By passing to the geometric realization and applying Proposition 153 and Remark 159,
we obtain the following from Item (3) of Claim 13:
Replacing this term in the expression for Item (2) of Claim 13, we obtain:
Replacing this term in the expression for Item (1) of Claim 13, we obtain:
Writing ι_E : E_R ↪ E_{R′} and ι_F : F_R ↪ F_{R′} for the inclusions, we obtain:

|ι_F| ∘ E_{|F_R|} ≃ |ι_F| ∘ |Φ_{F_R}| ≃ |Φ_{F_{R′}}| ∘ |ι_F^{(1)}| ≃ E_{|F_{R′}|} ∘ |ι_F^{(1)}|,

and hence

E_{|F_{R′}|}^{−1} ∘ |ι_F| ∘ E_{|F_R|} ≃ |ι_F^{(1)}|.
This proves the theorem. It only remains to prove the various claims.
Proof of Claim 13. In proving Claim 13, we supply the proofs of Items (2) and (4). These
arguments are adapted from [46, Lemmas 1, 5, and 6], where the proofs of Items (1) and
(3) appeared.
For Item (2), let τ^{(2)} = [τ_0^{(1)}, …, τ_k^{(1)}] be a simplex in F_R^{(2)}, where τ_0^{(1)} ⊆ … ⊆ τ_k^{(1)}. But this is easy to see: letting y ∈ τ_0, we have (x_{τ_0}, y), …, (x_{τ_k}, y), (x′_{τ_0}, y), …, (x′_{τ_k}, y) ∈ R.
3.2.3 The equivalence between the finite FDT and the simplicial FNTs
In this section, we present our answer to Question 1. We present the proof of Theorem
39 over the course of the next few subsections.
Remark 160. By virtue of Theorem 39, we will write simplicial FNT to mean either of the
FNT I or FNT II.
Theorem 36 implies Theorem 37
Proof of Theorem 37. Let V, V′ denote the vertex sets of Σ, Σ′, respectively. We define the relations R ⊆ V × I and R′ ⊆ V′ × I′ as follows: (v, i) ∈ R ⟺ v ∈ Σ_i and (v′, i′) ∈ R′ ⟺ v′ ∈ Σ′_{i′}. Then R ⊆ R′, the set I′ is finite by assumption, and so we
are in the setting of the finite FDT (Theorem 36) (perhaps invoking the Axiom of Choice
to obtain the total order on V 0 ). It suffices to show that ER = Σ, ER0 = Σ0 , FR = N (AΣ ),
and FR0 = N (AΣ0 ), where ER , ER0 , FR , FR0 are as defined in Theorem 35.
First we claim that E_R = Σ. By the definitions of R and E_R, we have E_R = {σ ⊆
V : ∃i ∈ I, (v, i) ∈ R ∀ v ∈ σ} = {σ ⊆ V : ∃i ∈ I, v ∈ Σi ∀ v ∈ σ}. Let
σ ∈ ER , and let i ∈ I be such that v ∈ Σi for all v ∈ σ. Then σ ⊆ V (Σi ), and since
Σi = pow(V (Σi )) by the assumption about covers of simplices, we have σ ∈ Σi ⊆ Σ.
Thus ER ⊆ Σ. Conversely, let σ ∈ Σ. Then σ ∈ Σi for some i. Thus for all v ∈ σ, we
have (v, i) ∈ R. It follows that σ ∈ ER . This shows ER = Σ. The proof that ER0 = Σ0 is
analogous.
Next we claim that FR = N (AΣ ). By the definition of FR , we have FR = {τ ⊆ I :
∃v ∈ V, (v, i) ∈ R ∀ i ∈ τ }. Let τ ∈ FR , and let v ∈ V be such that (v, i) ∈ R for
each i ∈ τ . Then ∩i∈τ Σi 6= ∅, and so τ ∈ N (AΣ ). Conversely, let τ ∈ N (AΣ ). Then
∩_{i∈τ} Σ_i ≠ ∅, so there exists v ∈ V such that v ∈ Σ_i for each i ∈ τ. Thus τ ∈ F_R. This
shows FR = N (AΣ ). The case for R0 is analogous.
An application of Theorem 36 now completes the proof.
Now for each x ∈ V (ER0 ), define A0x := {τ ∈ FR0 : (x, y) ∈ R0 for all y ∈ τ }. Also
define A0 := {A0x : x ∈ V (ER0 )}. The same argument shows that A0 is a finite cover
of subcomplexes (in particular, a cover of simplices) for FR0 with all finite intersections
either empty or contractible, and that ER0 = N (A0 ). An application of Theorem 38 now
shows that |ER | ' |FR | and |ER0 | ' |FR0 |, via maps that commute up to homotopy with
the inclusions |ER | ,→ |ER0 | and |FR | ,→ |FR0 |.
1. There exists a homotopy equivalence ϕ : |Σ ∪ pow(U )| → |Σ| such that ϕ(x) and
id|Σ∪pow(U )| (x) belong to the same simplex of |Σ ∪ pow(U )| for each x ∈ |Σ ∪
pow(U )|. Furthermore, the homotopy inverse is given by the inclusion ι : |Σ| ,→
|Σ ∪ pow(U )|.
Proof of Lemma 164. The proof uses this fact: any continuous map of an n-sphere Sn into
a contractible space Y can be continuously extended to a mapping of the (n + 1)-disk Dn+1
into Y , where Dn+1 has Sn as its boundary [112, p. 27]. First we define ϕ. On |Σ|, define
ϕ to be the identity. Next let σ be a minimal simplex in | pow(U ) \ Σ|. By minimality,
the boundary of σ (denoted Bd(σ)) belongs to |Σ ∩ pow(U )|, and |Σ| in particular. Thus
ϕ is defined on Bd(σ), which is an n-sphere for some n ≥ 0. Furthermore, ϕ maps
Bd(σ) into the contractible space |Σ ∩ pow(U )|. Then we use the aforementioned fact to
extend ϕ continuously to all of σ so that ϕ maps σ into |Σ ∩ pow(U )|. Furthermore, both
id|Σ∪pow(U )| (σ) = σ and ϕ(σ) belong to the simplex | pow(U )|. By iterating this procedure,
we obtain a retraction ϕ : |Σ ∪ pow(U )| → |Σ| such that ϕ(x) and x belong to the same
simplex in |Σ ∪ pow(U )|, for each x ∈ |Σ ∪ pow(U )|. Thus ϕ is homotopic to id|Σ∪pow(U )|
by Lemma 162. Thus we have a homotopy equivalence:
For the second part of the proof, suppose that a homotopy equivalence ϕ : |Σ ∪
pow(U )| → |Σ| as above is provided. We need to extend ϕ to obtain ϕ0 . Define ϕ0 to be
equal to ϕ on |Σ∪pow(U )|, and equal to the identity on G := |Σ0 |\|Σ∪pow(U )|. Let σ be
a minimal simplex in | pow(U 0 )|\G. Then by minimality, Bd(σ) belongs to |Σ0 ∩pow(U 0 )|.
As before, we have ϕ0 mapping Bd(σ) into the contractible space |Σ0 ∩ pow(U 0 )|, and we
extend ϕ0 continuously to a map of σ into |Σ0 ∩ pow(U 0 )|. Once again, id|Σ0 ∪pow(U 0 )| (x)
and ϕ0 (x) belong to the same simplex | pow(U 0 )|, for all x ∈ σ. Iterating this procedure
gives a continuous map ϕ0 : |Σ0 ∪ pow(U 0 )| → |Σ0 |. This map is not necessarily a re-
traction, because there may be a simplex σ ∈ |Σ ∪ pow(U )| ∩ |Σ0 | on which ϕ0 is not the
identity. However, it still holds that ϕ0 is continuous, and that x, ϕ0 (x) get mapped to the
same simplex for each x ∈ |Σ0 ∪ pow(U 0 )|. Thus Lemma 162 still applies to show that ϕ0
is homotopic to id|Σ0 ∪pow(U 0 )| .
We write ι0 to denote the inclusion ι0 : |Σ0 | ,→ |Σ0 ∪ pow(U 0 )|. By the preceding work,
we have ι0 ◦ ϕ0 ' id|Σ0 ∪pow(U 0 )| . Next let x ∈ |Σ0 |. Then either x ∈ |Σ0 | ∩ |Σ ∪ pow(U )|, or
x ∈ G. In the first case, we know that ϕ0 (x) = ϕ(x) and id|Σ0 | (x) = id|Σ∪pow(U )| (x) belong
to the same simplex of |Σ ∪ pow(U )| by the assumption on ϕ. In the second case, we know
that ϕ0 (x) = x = id|Σ0 | (x). Thus for any x ∈ |Σ0 |, we know that ϕ0 (x) and id|Σ0 | (x) belong
to the same simplex in |Σ0 ∪ pow(U 0 )|. By Lemma 162, we then have ϕ0 ||Σ0 | ' id|Σ0 | . Thus
ϕ0 ◦ ι0 ' id|Σ0 | . This shows that ϕ0 is the necessary homotopy equivalence.
Now we present the proof of Theorem 38.
Notation. Let I be an ordered set. For any subset J ⊆ I, we write (J) to denote the
sequence (j1 , j2 , j3 , . . .), where the ordering is inherited from the ordering on I.
Proof of Theorem 38. The first step is to functorially deform AΣ and AΣ0 into covers of
simplices while still preserving all associated homotopy types. Then we will be able to
apply Theorem 37. We can assume by Lemma 161 that each subcomplex Σi is induced,
and likewise for each Σ0i . We start by fixing an enumeration I 0 = {l1 , l2 , . . .}. Thus I 0
becomes an ordered set.
Passing to covers of simplices. We now define some inductive constructions. In what follows, we will define complexes denoted Σ_•, Σ′_• obtained by "filling in" Σ and Σ′ while preserving homotopy equivalence, as well as covers of these larger complexes denoted Σ_{i,•}, Σ′_{i,•}. First define:

Σ_{(l_1)} := Σ ∪ pow(V(Σ_{l_1})) if l_1 ∈ I, and Σ_{(l_1)} := Σ otherwise;
Σ′_{(l_1)} := Σ′ ∪ pow(V(Σ′_{l_1})).
Now by induction, suppose Σ_{(l_1,…,l_n)} and Σ_{i,(l_1,…,l_n)} are defined for all i ∈ I. Also suppose Σ′_{(l_1,…,l_n)} and Σ′_{i,(l_1,…,l_n)} are defined for all i ∈ I′. Then we define:

Σ_{(l_1,…,l_n,l_{n+1})} := Σ_{(l_1,…,l_n)} ∪ pow(V(Σ_{l_{n+1}, (l_1,…,l_n)})) if l_{n+1} ∈ I, and Σ_{(l_1,…,l_n,l_{n+1})} := Σ_{(l_1,…,l_n)} otherwise;
Σ′_{(l_1,…,l_n,l_{n+1})} := Σ′_{(l_1,…,l_n)} ∪ pow(V(Σ′_{l_{n+1}, (l_1,…,l_n)})).
And for all i ∈ I′, we have

Σ′_{i,(l_1,…,l_{n+1})} := Σ′_{i,(l_1,…,l_n)} ∪ pow(V(Σ′_{i,(l_1,…,l_n)}) ∩ V(Σ′_{l_{n+1},(l_1,…,l_n)})).
Finally, for any n ≤ card(I 0 ), we define AΣ,(l1 ,...,ln ) := {Σi,(l1 ,...,ln ) : i ∈ I} and
AΣ0 ,(l1 ,...,ln ) := {Σ0i,(l1 ,...,ln ) : i ∈ I 0 }. We will show that these are covers of Σ(l1 ,l2 ,...,ln ) and
Σ0(l1 ,l2 ,...,ln ) , respectively.
The next step is to prove by induction that for any n ≤ card(I′), we have |Σ| ≃ |Σ_{(l_1,…,l_n)}| and |Σ′| ≃ |Σ′_{(l_1,…,l_n)}|, that N(A_Σ) = N(A_{Σ,(l_1,…,l_n)}) and N(A_{Σ′}) = N(A_{Σ′,(l_1,…,l_n)}), and that nonempty finite intersections of the new covers A_{Σ,(l_1,…,l_n)}, A_{Σ′,(l_1,…,l_n)} remain contractible. For the base case n = 0, we have Σ = Σ_{()}, Σ′ = Σ′_{()}. Thus the base
case is true by assumption. We present the inductive step next.
Claim 16. For this claim, let • denote l1 , . . . , ln , where 0 < n < card(I 0 ). Define l := ln+1 .
Suppose the following is true:
1. The collections AΣ,(•) and AΣ0 ,(•) are covers of Σ(•) and Σ0(•) .
2. The nerves of the coverings are unchanged: N (AΣ ) = N (AΣ,(•) ) and N (AΣ0 ) =
N (AΣ0 ,(•) ).
3. Each of the subcomplexes Σ_{i,(•)}, i ∈ I, and Σ′_{j,(•)}, j ∈ I′ is induced in Σ_{(•)} and Σ′_{(•)}, respectively.

4. Nonempty finite intersections of elements of A_{Σ,(•)} and of A_{Σ′,(•)} are contractible.

5. We have homotopy equivalences |Σ| ≃ |Σ_{(•)}| and |Σ′| ≃ |Σ′_{(•)}| via maps that commute with the canonical inclusions.
Then the preceding statements are true for Σ(•,l) , Σ0(•,l) , AΣ,(•,l) , and AΣ0 ,(•,l) as well.
Proof. For the first claim, we have Σ(•,l) = Σ(•) ∪ pow(V (Σl,(•) )) ⊆ ∪i∈I Σi,(•,l) . For the
inclusion, we used the inductive assumption that Σ(•) = ∪i∈I Σi,(•) . Similarly, Σ0(•,l) ⊆
∪i∈I 0 Σ0i,(•,l) .
For the second claim, let i ∈ I. Then V (Σi,(l1 ) ) = V (Σi ), and in particular, we have
V(Σ_{i,(•,l)}) = V(Σ_{i,(•)}) = V(Σ_i). Next observe that for any σ ⊆ I, the intersection ∩_{i∈σ} Σ_{i,(•,l)} is nonempty if and only if ∩_{i∈σ} Σ_{i,(•)} is nonempty. Thus N(A_Σ) = N(A_{Σ,(•)}) = N(A_{Σ,(•,l)}), and similarly N(A_{Σ′}) = N(A_{Σ′,(•)}) = N(A_{Σ′,(•,l)}).
For the third claim, again let i ∈ I. If l 6∈ I, then Σi,(•,l) = Σi,(•) , so we are done by the
inductive assumption. Suppose l ∈ I. Since Σi,(•) is induced by the inductive assumption,
we have:
Thus Σi,(•,l) is induced. The same argument holds for the I 0 case.
For the fourth claim, let σ ⊆ I, and suppose ∩i∈σ Σi,(•,l) is nonempty. By the previous
claim, each Σi,(•,l) is induced. Thus we write:
For convenience, define A := (∩i∈σ Σi,(•) ) and B := pow(∩i∈σ V (Σi,(•) ) ∩ V (Σl,(•) )). Then
|A| is contractible by inductive assumption, and |B| is a full simplex, hence contractible.
Also, A ∩ B has the form
and the latter is contractible by inductive assumption. Thus by Lemma 163, we have |A∪B|
contractible. This proves the claim for the case σ ⊆ I. The case τ ⊆ I 0 is similar.
Now we proceed to the final claim. Since Σl,(•) is induced, we have Σl,(•) = Σ(•) ∩
pow(V (Σl,(•) )). By the contractibility assumption, we know that |Σl,(•) | is contractible.
Also we know that |Σ0l,(•) | = |Σ0(•) ∩ pow(V (Σ0l,(•) ))| is contractible. By assumption we
also have V (Σl,(•) ) ⊆ V (Σ0l,(•) ). Thus by Lemma 164, we obtain homotopy equivalences
Φl : |Σ(•,l) | → |Σ(•) | and Φ0l : |Σ0(•,l) | → |Σ0(•) | such that Φ0l extends Φl . Furthermore,
the homotopy inverses of Φl and Φ0l are just the inclusions |Σ(•) | ,→ |Σ(•,l) | and |Σ0(•) | ,→
|Σ0(•,l) |.
Now let ι : |Σ(•) | → |Σ0(•) | and ιl : |Σ(•,l) | → |Σ0(•,l) | denote the canonical inclusions.
We wish to show the equality Φ0l ◦ ιl = ι ◦ Φl . Let x ∈ |Σ(•,l) |. Because Φ0l extends Φl (this
is why we needed the functorial gluing lemma), we have
Since x ∈ |Σ(•,l) | was arbitrary, the equality follows immediately. By the inductive as-
sumption, we already have homotopy equivalences |Σ(•) | → |Σ| and |Σ0(•) | → |Σ0 | that
commute with the canonical inclusions. Composing these maps with Φl and Φ0l completes
the proof of the claim.
By the preceding work, we replace the subcomplexes Σl , Σ0l by full simplices of the
form Σl,(•,l) , Σ0l,(•,l) . In this process, the nerves remain unchanged and the complexes Σ, Σ0
are replaced by homotopy equivalent complexes Σ(•,l) , Σ0(•,l) . Furthermore, this process is
functorial—the homotopy equivalences commute with the canonical inclusions Σ ,→ Σ(•,l)
and Σ0 ,→ Σ0(•,l) .
Repeating the inductive process in Claim 16 for all the finitely many l ∈ I yields
a simplicial complex Σ(I) along with a cover of simplices AΣ,(I) . We also perform the
same procedure for all l ∈ I′ ∖ I (this does not affect Σ_{(I)}) to obtain a simplicial complex Σ′_{(I′)} along with a cover of simplices A_{Σ′,(I′)}. Furthermore, Σ_{(I)} and Σ′_{(I′)} are related to Σ and Σ′ by a finite sequence of homotopy equivalences that commute with the canonical inclusions. Also, we have N(A_Σ) = N(A_{Σ,(I)}) and N(A_{Σ′}) = N(A_{Σ′,(I′)}). Thus we obtain the following picture:

By applying Theorem 37 to the block consisting of |Σ_{(I)}|, |Σ′_{(I′)}|, |N(A_{Σ,(I)})| and |N(A_{Σ′,(I′)})|, we obtain a square that commutes up to homotopy. Then by composing
the homotopy equivalences constructed above, we obtain a square consisting of |Σ|, |Σ0 |,
|N (AΣ )|, and |N (AΣ0 )| that commutes up to homotopy. Thus we obtain homotopy equiv-
alences |Σ| ' |N (AΣ )| and |Σ0 | ' |N (AΣ0 )| via maps that commute up to homotopy with
the canonical inclusions.
In the setting of metric spaces, the Čech complex coincides with the Dowker source and
sink complexes. We will be interested in the special case where the underlying metric space
is the circle. We write S^1 to denote the circle with unit circumference. Next, for any n ∈ N, we write X_n := {0, 1/n, 2/n, …, (n−1)/n} to denote the collection of n equally spaced points on S^1 with the restriction of the arc length metric on S^1. Also let G_n denote the n-node cycle
network with vertex set Xn (in contrast with Xn , here Gn is equipped with the asymmetric
weights defined in §1.3.2). The connection between Xn and Dowker complexes of the cycle
networks Gn is highlighted by the following observation:
Proposition 165. Let n ∈ N. Then for any δ ∈ [0, 1], we have Č(X_n, δ/2) = D^{si}_{nδ,G_n}.
The scaling factor arises because Gn has diameter ∼ n, whereas Xn ⊆ S 1 has diameter
∼ 1/2. This proposition provides a pedagogical step which helps us transport results from
the setting of [3] and [4] to that of the current paper.
Proof. For δ = 0, both the Čech and Dowker complexes consist of the n vertices, and are equal. Similarly for δ = 1, both Č(X_n, 1/2) and D^{si}_{n,G_n} are equal to the (n − 1)-simplex. Now suppose δ ∈ (0, 1). Let σ ∈ D^{si}_{nδ,G_n}. Then σ is of the form [k/n, (k+1)/n, …, ⌊k+nδ⌋/n] for some integer 0 ≤ k ≤ n − 1, where the nδ-sink is ⌊k+nδ⌋/n and all the numerators are taken modulo n. We claim that σ ∈ Č(X_n, δ/2). To see this, observe that d_{S^1}(k/n, ⌊k+nδ⌋/n) ≤ δ, and so B(k/n, δ/2) ∩ B(⌊k+nδ⌋/n, δ/2) ≠ ∅. In particular, the midpoint of the arc from k/n to ⌊k+nδ⌋/n lies in ∩_{i=0}^{⌊nδ⌋} B((k+i)/n, δ/2), and so σ ∈ Č(X_n, δ/2).

Now let σ ∈ Č(X_n, δ/2). Then σ is of the form [k/n, (k+1)/n, …, (k+j)/n] for some integer 0 ≤ k ≤ n − 1, where j is an integer such that j/n ≤ δ. In this case, we have σ = X_n ∩ (∩_{i=0}^{j} B((k+i)/n, δ)). Then in G_n, after applying the scaling factor n, we have σ ∈ D^{si}_{nδ,G_n}, with (k+j)/n as an nδ-sink in G_n. This shows equality of the two simplicial complexes.
Theorem 166 (Theorem 3.5, [4]). Fix n ∈ N, and let 0 ≤ k ≤ n − 2 be an integer. Then,

Č(X_n, k/(2n)) ≃ ∨^{n−k−1} S^{2l} if k/n = l/(l+1), or S^{2l+1} if l/(l+1) < k/n < (l+1)/(l+2),

for some l ∈ Z_+. Here ∨ denotes the wedge sum, and ≃ denotes homotopy equivalence.
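Theorem 166 can be read off mechanically for any given n and k; the following sketch (ours, using exact rational arithmetic) tabulates the homotopy type of Č(X_n, k/2n). The function name is an illustrative assumption.

from fractions import Fraction

def cech_homotopy_type(n, k):
    # Homotopy type of Cech(X_n, k/2n) per Theorem 166, for 0 <= k <= n - 2.
    ratio = Fraction(k, n)
    l = 0
    while True:
        if ratio == Fraction(l, l + 1):
            return "wedge of %d copies of S^%d" % (n - k - 1, 2 * l)
        if Fraction(l, l + 1) < ratio < Fraction(l + 1, l + 2):
            return "S^%d" % (2 * l + 1)
        l += 1

for k in range(5):
    print(k, cech_homotopy_type(6, k))   # k = 3 gives a wedge of two copies of S^2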
D^{si}_{k−ε,G_n} ⊆ D^{si}_{k,G_n} ⊆ D^{si}_{k+ε,G_n} induce zero maps upon passing to homology. It follows that Dgm^D_{2l}(G_n) consists of the point (nl/(l+1), nl/(l+1) + 1) with multiplicity n/(l+1) − 1.

If l ∈ N does not satisfy the condition described above, then there does not exist an integer 1 ≤ j ≤ n − 2 such that j/n = l/(l+1). So for each 1 ≤ j ≤ n − 2, D^{si}_{j,G_n} = Č(X_n, j/(2n)) has the homotopy type of an odd-dimensional sphere by Theorem 166, and thus does not contribute to Dgm^D_{2l}(G_n). If l satisfies the condition but k ≥ n − 1, then Č(X_n, k/(2n)) is just the (n − 1)-simplex, hence contractible.
Theorem 44 gives a characterization of the even dimensional Dowker persistence di-
agrams of cycle networks. The most interesting case occurs when considering the 2-
dimensional diagrams: we see that cycle networks of an even number of nodes have an
interesting barcode, even if the bars are all short-lived. For dimensions 4, 6, 8, and beyond,
there are fewer and fewer cycle networks with nontrivial barcodes (in the sense that only
cycle networks with number of nodes equal to a multiple of 4, 6, 8, and so on have nontriv-
ial barcodes). For a complete picture, it is necessary to look at odd-dimensional persistence
diagrams. This is made possible by the next set of constructions.
We have already recalled the definition of a Rips complex of a metric space. To facilitate
the assessment of the connection to [3], we temporarily adopt the notation VR(X, ε) to
denote the Vietoris-Rips complex of a metric space (X, dX ) at resolution ε > 0, i.e. the
simplicial complex {σ ⊆ X : diam(σ) ≤ ε}.
Theorem 167 (Theorem 9.3, Proposition 9.5, [3]). Let 0 < r < 1/2. Then there exists a map T_r : pow(S^1) → pow(S^1) and a map π_r : S^1 → S^1 such that there is an induced homotopy equivalence

VR(T_r(X), 2r/(1+2r)) --≃--> Č(X, r).

Next suppose X ⊆ S^1 and let 0 < r ≤ r′ < 1/2. Then there exists a map η : S^1 → S^1 such that the following diagram commutes:

VR(T_r(X), 2r/(1+2r)) --η--> VR(T_{r′}(X), 2r′/(1+2r′))
        ↓≃                              ↓≃
    Č(X, r)          ⊆          Č(X, r′)
Theorem 168. Consider the setup of Theorem 167. If Č(X, r) and Č(X, r0 ) are homotopy
equivalent, then the inclusion map between them is a homotopy equivalence.
Theorem 45 (Odd dimension). Fix n ∈ N, n ≥ 3. Then for l ∈ N, define M_l := {m ∈ N : nl/(l+1) < m < n(l+1)/(l+2)}. If M_l is empty, then Dgm^D_{2l+1}(G_n) is trivial. Otherwise, we have:

Dgm^D_{2l+1}(G_n) = {(a_l, ⌈n(l+1)/(l+2)⌉)},

where a_l := min{m ∈ M_l}. We use set notation (instead of multisets) to mean that the multiplicity is 1.
Proof of Theorem 45. By Proposition 165 and Theorem 166, we know that D^{si}_{k,G_n} = Č(X_n, k/(2n)) ≃ S^1 for integers 0 < k < n/2. Let b ∈ N be the greatest integer less than n/2. Then by Theorem 168, we know that each inclusion map in the following chain is a homotopy equivalence:

D^{si}_{1,G_n} ⊆ … ⊆ D^{si}_{b,G_n} = D^{si}_{⌈n/2⌉−1,G_n}.
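Theorem 45 makes the odd-dimensional Dowker diagrams of cycle networks computable by elementary arithmetic; a sketch (ours) that tabulates M_l and the resulting diagram point, with the function name as an illustrative assumption:

import math
from fractions import Fraction

def odd_dowker_diagram(n, l):
    # Dgm^D_{2l+1}(G_n) per Theorem 45: one point (a_l, ceil(n(l+1)/(l+2))), or trivial.
    lo = Fraction(n * l, l + 1)
    hi = Fraction(n * (l + 1), l + 2)
    M_l = [m for m in range(1, n) if lo < m < hi]
    if not M_l:
        return set()
    return {(min(M_l), math.ceil(hi))}

print(odd_dowker_diagram(6, 0))   # {(1, 3)}: matches Dgm^D_1(G_6) = {(1, ceil(6/2))}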
It remains to provide a proof of Theorem 168. For this, we need some additional ma-
chinery.
Cyclic maps and winding fractions. We introduce some more terms from [3], but for efficiency, we try to minimize the scope of the definitions to only what is needed for our purpose. Recall that we write S^1 to denote the circle with unit circumference. Thus we naturally identify any x ∈ S^1 with a point in [0, 1). We fix a choice of 0 ∈ S^1, and for any x, x′ ∈ S^1, the length of a clockwise arc from x to x′ is denoted by d⃗_{S^1}(x, x′). Then, for any finite subset X ⊆ S^1 and any r ∈ (0, 1/2), the directed Vietoris-Rips graph VR⃗(X, r) is defined to be the graph with vertex set X and edge set {(x, x′) : 0 < d⃗_{S^1}(x, x′) < r}. Next, let G⃗ be a Vietoris-Rips graph such that the vertices are enumerated as x_0, x_1, …, x_{n−1}, according to the clockwise order in which they appear. A cyclic map between G⃗ and a Vietoris-Rips graph H⃗ is a map of vertices f such that for each edge (x, x′) ∈ G⃗, we have either f(x) = f(x′) or (f(x), f(x′)) ∈ H⃗, and ∑_{i=0}^{n−1} d⃗_{S^1}(f(x_i), f(x_{i+1})) = 1. Here x_n := x_0.
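A minimal sketch (ours) of the directed Vietoris-Rips graph construction on points of S^1, with the clockwise arc length implemented as a difference modulo 1; the orientation convention and function names are illustrative assumptions.

def clockwise_arc(x, xp):
    # Clockwise arc length from x to xp on the circle of unit circumference.
    return (xp - x) % 1.0

def directed_vr_graph(points, r):
    # Edge set of VR->(X, r): ordered pairs with 0 < arc(x, x') < r.
    return {(x, xp) for x in points for xp in points
            if 0 < clockwise_arc(x, xp) < r}

X6 = [i / 6 for i in range(6)]            # six equally spaced points on S^1
print(sorted(directed_vr_graph(X6, 0.4))) # each point reaches its next two neighbors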
Next, the winding fraction of a Vietoris-Rips graph G⃗ with vertex set V(G⃗) is defined to be the infimum of numbers k/n such that there is an order-preserving map V(G⃗) → Z/nZ such that each edge is mapped to a pair of numbers at most k apart. A key property of the winding fraction, denoted wf, is that if there is a cyclic map between Vietoris-Rips graphs G⃗ → H⃗, then wf(G⃗) ≤ wf(H⃗).
Theorem 169 (Corollary 4.5, Proposition 4.9, [3]). Let X ⊆ S^1 be a finite set and let 0 < r < 1/2. Then,

VR(X, r) ≃ S^{2l+1} if l/(2l+1) < wf(VR⃗(X, r)) < (l+1)/(2l+3) for some l ∈ Z_+, and
VR(X, r) ≃ ∨^j S^{2l} if wf(VR⃗(X, r)) = l/(2l+1), for some j ∈ N.

Next let X′ ⊆ S^1 be another finite set, and let r ≤ r′ < 1/2. Suppose f : VR⃗(X, r) → VR⃗(X′, r′) is a cyclic map between Vietoris-Rips graphs and l/(2l+1) < wf(VR⃗(X, r)) ≤ wf(VR⃗(X′, r′)) < (l+1)/(2l+3). Then f induces a homotopy equivalence between VR(X, r) and VR(X′, r′).
We now have the ingredients for a proof of Theorem 168.

Proof of Theorem 168. Since the maps π_r and π_{r′} induce homotopy equivalences, it follows that

VR(T_r(X), 2r/(1+2r)) ≃ VR(T_{r′}(X), 2r′/(1+2r′)).

By the characterization result in Theorem 169, we know that there exists l ∈ Z_+ such that

l/(2l+1) < wf(VR⃗(T_r(X), 2r/(1+2r))) ≤ wf(VR⃗(T_{r′}(X), 2r′/(1+2r′))) < (l+1)/(2l+3).

The map η in Theorem 167 appears in [3, Proposition 9.5] through an explicit construction. Moreover, it is shown that η induces a cyclic map

VR⃗(T_r(X), 2r/(1+2r)) → VR⃗(T_{r′}(X), 2r′/(1+2r′)).

Thus by Theorem 169, η induces a homotopy equivalence between VR(T_r(X), 2r/(1+2r)) and VR(T_{r′}(X), 2r′/(1+2r′)). Finally, the commutativity of the diagram in Theorem 167 shows that the inclusion Č(X, r) ⊆ Č(X, r′) induces a homotopy equivalence.
Remark 170. The analogue of Theorem 168 for Vietoris–Rips complexes appears as Proposition 4.9 of [3]. We prove Theorem 168 by connecting Čech and Vietoris-Rips complexes using Proposition 9.5 of [3]. However, as remarked in §9 of [3], one could prove Theorem 168 directly using a parallel theory of winding fractions for Čech complexes.
notation means: either f(x) = f(x′), or (f(x), f(x′)) ∈ E_Y.
To extend path homology constructions to a persistent framework, we need to verify
the functoriality of path homology. As a first step, one must understand how digraph maps
transform into maps between vector spaces. Some of the material below can be found
in [63]; we contribute a statement and verification of the functoriality of path homology
(Proposition 172) that is central to the PPH framework (Definition 20).
Let X, Y be two sets, and let f : X → Y be a set map. For each dimension p ∈ Z_+, one defines a map (f_*)_p : Λ_p(X) → Λ_p(Y) to be the linearization of the following map on generators: for any generator [x_0, …, x_p] ∈ Λ_p(X),

(f_*)_p([x_0, …, x_p]) := [f(x_0), …, f(x_p)].
Note also that for any p ∈ Z_+ and any generator [x_0, …, x_p] ∈ Λ_p(X), we have:

(f_*)_{p−1} ∘ ∂_p^{nr}([x_0, …, x_p]) = ∑_{i=0}^{p} (−1)^i (f_*)_{p−1}([x_0, …, x̂_i, …, x_p])
  = ∑_{i=0}^{p} (−1)^i [f(x_0), …, \widehat{f(x_i)}, …, f(x_p)]
  = ∂_p^{nr} ∘ (f_*)_p([x_0, …, x_p]).
It follows that f∗ := ((f∗ )p )p∈Z+ is a chain map from (Λp (X), ∂pnr )p∈Z+ to (Λp (Y ), ∂pnr )p∈Z+ .
Let p ∈ Z+ . Note that (f∗ )p (Ip (X)) ⊆ Ip (Y ), so (f∗ )p descends to a map on quotients
which is well-defined. For convenience, we will abuse notation to denote the map on
quotients by (f∗ )p as well. Thus we obtain an induced map (f∗ )p : Rp (X) → Rp (Y ).
Since p ∈ Z+ was arbitrary, we get that f∗ is a chain map from (Rp (X), ∂p )p∈Z+ to
(Rp (Y ), ∂p )p∈Z+ . The operation of this chain map is as follows: for each p ∈ Z+ and
any generator [x0 , . . . , xp ] ∈ Rp (X),
(f_*)_p([x_0, …, x_p]) := [f(x_0), …, f(x_p)] if f(x_i), f(x_{i+1}) are distinct for each 0 ≤ i ≤ p − 1, and (f_*)_p([x_0, …, x_p]) := 0 otherwise.
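The action of (f_*)_p on R_p is easy to implement on generators. In the following sketch (ours), a chain is a dictionary from paths (tuples) to coefficients, and any generator whose image has a repeated consecutive vertex is sent to zero; the function name and representation are illustrative assumptions.

def push_forward(f, chain):
    # (f_*)_p on a chain in R_p(X): apply f vertexwise; kill any generator whose
    # image has two equal consecutive vertices (the quotient by I_p).
    image = {}
    for path, coeff in chain.items():
        fpath = tuple(f[x] for x in path)
        if any(fpath[i] == fpath[i + 1] for i in range(len(fpath) - 1)):
            continue
        image[fpath] = image.get(fpath, 0) + coeff
    return {p: c for p, c in image.items() if c != 0}

f = {"a": 0, "b": 1, "c": 1}
print(push_forward(f, {("a", "b", "c"): 1, ("b", "a", "b"): 2}))
# ('a','b','c') maps to (0, 1, 1) and is killed; ('b','a','b') maps to (1, 0, 1)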
Proposition 171 (Theorem 2.10, [63]). Let GX = (X, EX ), GY = (Y, EY ) be two di-
graphs, and let f : GX → GY be a digraph map. Let f∗ : R• (X) → R• (Y ) denote
the chain map induced by the underlying set map f : X → Y . Let (Ωp (GX ), ∂pGX )p∈Z+ ,
(Ωp (GY ), ∂pGY )p∈Z+ denote the chain complexes of the ∂-invariant paths associated to each
of these digraphs. Then (f∗ )p (Ωp (GX )) ⊆ Ωp (GY ) for each p ∈ Z+ , and the restriction of
f∗ to Ω• (GX ) is a chain map.
Henceforth, given two digraphs G, G0 and a digraph map f : G → G0 , we refer to the
chain map f∗ given by Proposition 171 as the chain map induced by the digraph map f .
Because f∗ is a chain map, we then obtain an induced linear map (f# )p : Hp (G) → Hp (G0 )
for each p ∈ Z+ .
The preceding concepts are necessary for developing the theory of path homology. We use this setup to state and prove the following result, which is used in defining PPH (Definition 20) and also for proving stability (Theorem 56).
Proposition 172 (Functoriality of path homology). Let G, G′, G″ be three digraphs.

1. Let id_G : G → G be the identity digraph map. Then ((id_G)_#)_p : H_p(G) → H_p(G) is the identity linear map for each p ∈ Z_+.

2. Let f : G → G′ and g : G′ → G″ be digraph maps. Then ((g ∘ f)_#)_p = (g_#)_p ∘ (f_#)_p for each p ∈ Z_+.

Proof. It follows that ((id_G)_*)_p is the identity linear map on Ω_p(G), and thus ((id_G)_#)_p is the identity
linear map on Hp (G). For the second claim, suppose first that pairs of consecutive elements
of g(f (x0 )), . . . , g(f (xp )) are all distinct. This implies that pairs of consecutive elements
of f (x0 ), . . . , f (xp ) are also all distinct, and we observe:
((g ◦ f )∗ )p ([x0 , . . . , xp ]) = [g(f (x0 )), . . . , g(f (xp ))] g(f (xi )), g(f (xi+1 )) distinct
= (g∗ )p ([f (x0 ), . . . , f (xp )]) because f (xi ), f (xi+1 ) distinct
= (g∗ )p (f∗ )p ([x0 , . . . , xp ]) .
Next suppose that for some 0 ≤ i < p, we have g(f (xi )) = g(f (xi+1 )). Then we obtain:
((g ◦ f )∗ )p ([x0 , . . . , xp ]) = 0 = (g∗ )p (f∗ )p ([x0 , . . . , xp ]) .
It follows that ((g◦f )∗ )p = (g∗ )p ◦(f∗ )p . The statement of the proposition now follows.
Remark 173. We thank Paul Ignacio for pointing out an error in a version of the preceding
proof that appeared in [40].
3.3.2 Homotopy of digraphs
The constructions of path homology are accompanied by a theory of homotopy devel-
oped in [63]. An illustrated example is provided in Figure 3.1.
Given digraphs G_X = (X, E_X) and G_Y = (Y, E_Y), their product digraph G_X × G_Y has vertex set and edge set:

X × Y := {(x, y) : x ∈ X, y ∈ Y}, and
E_{X×Y} := {((x, y), (x′, y′)) ∈ (X × Y)² : x = x′ and (y, y′) ∈ E_Y, or y = y′ and (x, x′) ∈ E_X}.
Next, the line digraphs I + and I − are defined to be the two-point digraphs with vertices
{0, 1} and edges (0, 1) and (1, 0), respectively. Two digraph maps f, g : GX → GY are
one-step homotopic if there exists a digraph map F : GX × I → GY , where I ∈ {I + , I − },
such that:
F |GX ×{0} = f and F |GX ×{1} = g.
This condition is equivalent to requiring:

f(x) → g(x) for all x ∈ X,  or  g(x) → f(x) for all x ∈ X,

where the arrow notation a → b means that either a = b or (a, b) ∈ E_Y.
Moreover, f and g are homotopic, denoted f ' g, if there is a finite sequence of digraph
maps f0 = f, f1 , . . . , fn = g : GX → GY such that fi , fi+1 are one-step homotopic for
each 0 ≤ i ≤ n − 1. The digraphs GX and GY are homotopy equivalent if there exist
digraph maps f : GX → GY and g : GY → GX such that g ◦ f ' idGX and f ◦ g ' idGY .
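The one-step homotopy condition is a pointwise check; in the sketch below (ours), digraph maps are dictionaries and edge sets are sets of pairs, both illustrative representations.

def step(a, b, edges):
    # a -> b in G_Y: either a == b or (a, b) is an edge of G_Y.
    return a == b or (a, b) in edges

def one_step_homotopic(f, g, vertices, edges_Y):
    # f, g : G_X -> G_Y are one-step homotopic iff f(x) -> g(x) for all x,
    # or g(x) -> f(x) for all x (the I^+ and I^- cases).
    return (all(step(f[x], g[x], edges_Y) for x in vertices)
            or all(step(g[x], f[x], edges_Y) for x in vertices))

edges_Y = {(0, 1)}
f = {"u": 0, "v": 0}
g = {"u": 0, "v": 1}
print(one_step_homotopic(f, g, ["u", "v"], edges_Y))   # True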
An example of digraph homotopy equivalence is illustrated in Figure 3.1. Informally,
the homotopy equivalence is given by “crushing” the orange arrows according to the direc-
tions they mark. This operation crushes the 4-tesseract to the 3-cube, to the 2-square, to the
line, and finally to the point.
The concept of homotopy yields the following theorem on path homology groups:
Theorem 174 (Theorem 3.3, [63]). Let G, G0 be two digraphs.
1. Let f, g : G → G0 be two homotopic digraph maps. Then these maps induce identical
maps on homology vector spaces. More precisely, the following maps are identical
for each p ∈ Z+ :
(f_#)_p : H_p(G) → H_p(G′)  and  (g_#)_p : H_p(G) → H_p(G′).
By combining the preceding claims and Theorem 174, we obtain the following, for
each p ∈ Z+ :
Thus PVecΞp (X ) and PVecΞp (Y) are η-interleaved, for each p ∈ Z+ . The result now
follows by an application of Lemma 154.
Lemma 175 (Proposition 2.9, [63]). Let G be a finite digraph. Then any v ∈ Ω_2(G) is a linear combination of the following three types of ∂-invariant 2-paths:

1. aba with edges (a, b), (b, a) (a double edge),

2. abc with edges (a, b), (b, c), (a, c) (a triangle), and

3. abc − adc with edges (a, b), (b, c), (a, d), (d, c), where a ≠ c and (a, c) is not an edge (a long square).
Lemma 176 (Parity lemma). Fix a simplicial complex K and a field Z/pZ for some prime p. Let w := ∑_{i∈I} b_i τ_i be a 2-chain in C_2(K), where I is a finite index set, each b_i ∈ Z/pZ, and each τ_i is a 2-simplex in K. Let σ be a 1-simplex contained in some τ_i such that σ does not appear in ∂_2^Δ(w). Define J_σ := {j ∈ I : σ a face of τ_j}. Then there exists n(σ) ∈ N such that:

w = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{j=1}^{n(σ)} (τ_j^+ + τ_j^−),

where σ is a face of each τ_j^+ and each τ_j^−.
Proof of Lemma 176. Since we are working over Z/pZ, we adopt the convention that b_i ∈ {0, 1, …, p − 1} for each i ∈ I. Then for each j ∈ J_σ, we know that ∂_2^Δ(τ_j) contributes either +σ or −σ with multiplicity b_j. Write w = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{j∈J_σ} b_j τ_j. Since σ is not a summand of ∂_2^Δ(w), it follows that ∑_{j∈J_σ} b_j = 0. Define n^+(σ) and n^−(σ) to be the total multiplicities with which the terms {b_j τ_j : j ∈ J_σ} contribute +σ and −σ, respectively, where the sums are taken over Z (not Z/pZ). Next define a finite sequence (τ_1^+, …, τ_{n^+(σ)}^+) by listing each τ_j contributing +σ, repeated with multiplicity b_j; here the indexing is of course taken over Z and not Z/pZ. Similarly we define a sequence (τ_1^−, …, τ_{n^−(σ)}^−). Then w = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)} τ_m^−.
The expression for ∂_2^Δ(w) contains +σ with multiplicity n^+(σ) and −σ with multiplicity n^−(σ), such that the total multiplicity is 0, i.e. is a multiple of p. Thus we have n^+(σ) − n^−(σ) ∈ pZ. There are two cases: either n^+(σ) ≥ n^−(σ) or n^+(σ) ≤ n^−(σ). Both cases are similar, so we consider the first. Let q be a nonnegative integer such that n^+(σ) = n^−(σ) + pq. We pad the τ^− sequence by defining τ_i^− := τ_{n^−(σ)}^− for i ∈ {n^−(σ) + 1, …, n^−(σ) + pq}. Then we have:

w = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)} τ_m^−
  = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)} τ_m^− + ∑_{m=n^−(σ)+1}^{n^−(σ)+pq} τ_m^−
  = ∑_{i∈I∖J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} (τ_m^+ + τ_m^−).
Theorem 61. Let X = (X, A_X) ∈ CN be a square-free network, and fix K = Z/pZ for some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).
Proof of Theorem 61. Let δ ∈ R. First we wish to find an isomorphism ϕ_δ : H_1^Ξ(G_X^δ) → H_1^Δ(D^{si}_{δ,X}). We begin with the basis B for Ω_1(G_X^δ). We claim that B is just the collection of allowed 1-paths in G_X^δ. To see this, let ab be an allowed 1-path. Then ∂_1(ab) = b − a, which is allowed because the vertices a and b are automatically allowed. Thus ab ∈ Ω_1(G_X^δ), and so B generates Ω_1(G_X^δ).
Whenever ab is an allowed 1-path, we have a directed edge (a, b) in GδX , and so
AX (a, b) ≤ δ by the definition of GδX . Thus the simplex [a, b] belongs to Dsiδ,X , with b
as a δ-sink. Hence [a, b] is a 1-chain in C_1(D^{si}_{δ,X}). Define a map ϕ̃_δ : Ω_1(G_X^δ) → C_1(D^{si}_{δ,X}) by setting ϕ̃_δ(ab) = [a, b] and extending linearly. The image of ϕ̃_δ restricted to B is linearly independent because any linear dependence relation would contradict the independence of B. Furthermore, ϕ̃_δ induces a map ϕ̃′_δ : ker(∂_1^Ξ) → ker(∂_1^Δ). We need to check that this descends to a map ϕ_δ : ker(∂_1^Ξ)/im(∂_2^Ξ) → ker(∂_1^Δ)/im(∂_2^Δ) on quotients. To see this, we use Lemma 176 to write

w = ∑_{i=0, i∉J}^{m} b_i τ_i + ∑_{j=1}^{n([x,y])} (τ_j^+ + τ_j^−),
where all the summands of w containing [x, y] as a face are paired in the latter term. Each
τ + + τ − summand has the following form: [x, y] is a face of both τ + and τ − , and both τ +
and τ − are Type I simplices. Fix 1 ≤ j ≤ n([x, y]). Then for some z, u ∈ X, τj+ = [x, y, z]
and τj− = [x, u, y] have the following arrangement:
[Figure: three local configurations of the simplices τ_j^+ = [x, y, z] and τ_j^− = [x, u, y] around the shared edge [x, y], with vertices u and z adjacent to x and y.]
Since (X, A_X) is square-free, we must have at least one of the edges (z, u) or (u, z) in G_X^δ. Suppose (z, u) is an edge. Because we have
Type II simplex in the expression for v, for some 0 ≤ i ≤ m. Write τi = [x, y], and let z be
a δ-sink for τi . Then [x, y, z] is a simplex in Dsiδ,X , and ∂2∆ ([x, y, z]) = [y, z] − [x, z] + [x, y].
Thus [x, y] is homologous to [x, z]−[y, z], each of which is a Type I simplex. This argument
shows that v is homologous to a 1-cycle v 0 of Type I.
Next let τ′ be a 1-simplex in the expression for v′. Write τ′ = [x, y]. If x is the δ-sink for τ′, then we replace the τ′ = [x, y] in the expression of v′ with −[y, x]. This does not change v′, since we have τ′ = [x, y] = −[y, x] in C_1(D^{si}_{δ,X}). After repeating this procedure for each element of v′, we obtain a rewritten expression for v′ in terms of elements [x, y] where y is the δ-sink for [x, y]. Let v′ = ∑_{i=0}^{n} b′_i [x_i, y_i] denote this new expression.

Finally, observe that for each [x_i, y_i] in the rewritten expression for v′, we also have (x_i, y_i) as an edge in G_X^δ. Thus ∑_{i=0}^{n} b′_i x_i y_i is a 1-cycle in H_1^Ξ(G_X^δ) that is mapped to v′ by ϕ_δ. It follows that ϕ_δ is surjective, and hence is an isomorphism.
To complete the proof, let δ ≤ δ′ ∈ R. Consider the inclusion maps ι_G : G_X^δ ↪ G_X^{δ′} and ι_D : D^{si}_{δ,X} ↪ D^{si}_{δ′,X}, and let (ι_G)_#, (ι_D)_# denote the induced maps at the respective
homology levels. Let v = ∑_{i=0}^{n} a_i x_i y_i be a 1-cycle in H_1^Ξ(G_X^δ). Then we have:

(ϕ_{δ′} ∘ (ι_G)_#)(∑_{i=0}^{n} a_i x_i y_i) = ϕ_{δ′}(∑_{i=0}^{n} a_i x_i y_i) = ∑_{i=0}^{n} a_i [x_i, y_i]
  = (ι_D)_#(∑_{i=0}^{n} a_i [x_i, y_i]) = ((ι_D)_# ∘ ϕ_δ)(∑_{i=0}^{n} a_i x_i y_i).
Thus the necessary commutativity relation holds, and the theorem follows by the Per-
sistence Equivalence Theorem.
Theorem 63. Let G_n be a cycle network for some integer n ≥ 3. Fix a field K = Z/pZ for some prime p. Then Dgm^Ξ_1(G_n) = {(1, ⌈n/2⌉)}.

Proof of Theorem 63. From [37], we know that Dgm^D_1(G_n) = {(1, ⌈n/2⌉)}. Thus by Theorem 61, it suffices to show that G_n is square-free. Suppose n ≥ 4, and let a, b, c, d be four nodes that appear in G_n in clockwise order. First let δ ∈ R be such that (a, b), (b, c), (a, d), (d, c) are edges in G_{G_n}^δ. Then ω_{G_n}(d, c) ≤ δ, and because a lies on the clockwise path from d to c, we automatically have ω_{G_n}(a, c) ≤ ω_{G_n}(d, c) ≤ δ. Hence (a, c) is an edge in G_{G_n}^δ, and so the subgraph induced by a, b, c, d is not a long square.

Next suppose δ ∈ R is such that (a, b), (c, b), (a, d), (c, d) are edges in G_{G_n}^δ. Since ω_{G_n}(c, b) ≤ δ and a lies on the clockwise path from c to b in G_n, we have ω_{G_n}(c, a) ≤ δ. Hence (c, a) is an edge in G_{G_n}^δ, and so the subgraph induced by a, b, c, d is not a short square.
Theorem 59. Let X = (X, A_X) ∈ CN be a symmetric network, and fix K = Z/pZ for some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).
Proof of Theorem 59. The proof is similar to that of Theorem 61; instead of repeating all details, we will show how the argument changes when the square-free assumption is replaced by the symmetry assumption. Let δ ∈ R, and consider the map ϕ̃′_δ : ker(∂_1^Ξ) → ker(∂_1^Δ) defined as in Theorem 61. As before, we need to check that this descends to a map ϕ_δ : ker(∂_1^Ξ)/im(∂_2^Ξ) → ker(∂_1^Δ)/im(∂_2^Δ) on quotients. For this we need to verify that ϕ̃′_δ(im(∂_2^Ξ)) ⊆ im(∂_2^Δ).
By Lemma 175, we know that any element of im(∂2Ξ ) is of the form ba+ab, bc−ac+ab,
or bc+ab−dc−ad. For the first two cases, we can repeat the argument used in Theorem 61.
The final case corresponds to the situation where we have a long square in GδX consisting
of edges (a, b), (b, c), (a, d), and (d, c). This gives the 2-chain abc − adc. Now by the
symmetry condition, we also have edges (c, d) and (c, b). Thus [a, b, c] is a 2-simplex in
D^{si}_{δ,X}, with b as a δ-sink, and [a, d, c] is a 2-simplex with d as a δ-sink. Hence [a, b, c] − [a, d, c] is a 2-chain in C_2(D^{si}_{δ,X}). Thus ϕ̃′_δ(bc + ab − dc − ad) = [b, c] + [a, b] − [d, c] − [a, d] belongs to im(∂_2^Δ). Thus we obtain a well-defined map ϕ_δ : H_1^Ξ(G_X^δ) → H_1^Δ(D^{si}_{δ,X}).
Next we need to check that ϕ_δ is injective. As in Theorem 61, let v ∈ ker(ϕ_δ). Then ϕ_δ(v) = ϕ_δ(∑_{i=0}^{k} a_i σ_i) = ∂_2^Δ(∑_{j=0}^{m} b_j τ_j), where the a_i, b_j terms belong to the field K, each σ_i is a 1-path in G_X^δ, and each τ_j is a 2-simplex in D^{si}_{δ,X}. We proceed by proving an
analogue of Claim 20 in the symmetric setting. Write w := ∑_{j=0}^{m} b_j τ_j. We need to show that w is homologous to a 2-cycle ∑_{k=0}^{n} b′_k τ′_k in C_2(D^{si}_{δ,X}), where each τ′_k is of the form shown below.

[Figure: the two local configurations of the simplices around the vertices x, u, z, and y.]
By the symmetry assumption, (z, y) and (u, y) are also edges in GδX , and so xuy, xzy
are both allowed 2-paths. Since τ − = [x, y, u] = −[x, u, y], we can replace τ + + τ − by
[x, z, y] − [x, u, y], where xzy − xuy is a square in GδX . Proceeding in this way, we replace
each summand of w containing [x, y] as a face. We repeat this argument for each choice of
τ = [x, y, z] in the expression for w.
Finally, we obtain an expression of w such that there exists v 0 ∈ Ω2 (GδX ) satisfying
ϕδ (v 0 ) = w. Then we have ∂2Ξ (v 0 ) = v, and so v = 0 in H1Ξ (GδX ). Thus ϕδ is injective.
We omit the remainder of the argument, because it is a repeat of the corresponding
part of the proof of Theorem 61. In summary, it turns out that ϕδ is surjective, hence
an isomorphism, and furthermore that it commutes with the linear maps induced by the
canonical inclusions. This concludes the proof.
Corollary 177 (Stability). Let (X, ω_X), (Y, ω_Y) ∈ CN and k ∈ Z_+. Then,

d_B(Dgm_k^•(X), Dgm_k^•(Y)) ≤ 2 d_N(X, Y),

where Dgm^• denotes each of the Vietoris-Rips, Dowker, or path persistence diagrams.
Proof. By Theorem 86, both the Rips and Dowker persistent vector spaces of X and Y
are q-tame. Thus they have well-defined persistence diagrams (Theorem 85), and we have
equality of dI and dB .
where Xn (ω) is the subnetwork induced by {x1 (ω), . . . , xn (ω)} and Dgm• is either of the
Vietoris-Rips, Dowker, or PPH diagrams. In particular, either of these three persistent vec-
tor spaces of the subnetwork Xn converges almost surely to that of supp(µX ) in bottleneck
distance.
Proof of Theorem 87. We can consider supp(µX ) as a network with full support by endow-
ing it with the restriction of ωX to supp(µX ) × supp(µX ), so for convenience, we assume
X = supp(µX ). Let ω ∈ Ω be such that dN (X, Xn (ω)) < ε/2. Then by Corollary 177, we
have that dB (Dgm• (X), Dgm• (Xn )) < ε. By applying Theorem 68, we then have:
We conclude the proof with an application of the Borel-Cantelli lemma, as in the proof of
Theorem 68.
Chapter 4: Algorithms, computation, and experiments
For convenience, we will write Ψ(X), Ψ(Y ) to mean (X, Ψ ◦ ωX ) and (Y, Ψ ◦ ωY )
respectively. We will also write:
Consider the problem of computing dN (Ψ(X), Ψ(Y )). First observe that for any R ∈ RB ,
we have dis(R) = disΨ (R). To see this, let R ∈ RB . Let (x, y), (x0 , y 0 ) ∈ R, and note that
x 6= x0 , y 6= y 0 . Then:
|Ψ(ωX (x, x0 ))−Ψ(ωY (y, y 0 ))| = |ωX (x, x0 )+C −ωY (y, y 0 )−C| = |ωX (x, x0 )−ωY (y, y 0 )|.
Since (x, y), (x0 , y 0 ) were arbitrary, it follows that dis(R) = disΨ (R). This holds for all
R ∈ RB .
On the other hand, let R ∈ RN . By a previous observation, we assume that there exist
x, x0 , y such that (x, y), (x0 , y) ∈ R. For such a pair, we have:
|Ψ(ω_X(x, x′)) − Ψ(ω_Y(y, y))| = |ω_X(x, x′) + C − 0| ≥ max_{S∈R(X,Y)} dis(S) + 1.
It follows that dis_Ψ(R) > dis_Ψ(S) for any S ∈ R_B. Hence:

d_N(Ψ(X), Ψ(Y)) = (1/2) min_{R∈R(X,Y)} dis_Ψ(R)
  = (1/2) min_{R∈R_B} dis_Ψ(R)
  = (1/2) min_{R∈R_B} dis(R)
  = (1/2) min_ϕ dis(ϕ), where ϕ ranges over bijections X → Y
  = d̂_N(X, Y).
It is known (see Remark 179 below) that computing dbN is NP-hard. But the preceding
calculation shows that dbN can be computed through dN , which, by assumption, is not NP-
hard. This is a contradiction. Hence dN is NP-hard.
Remark 179. We can be more precise about why computing d̂_N is a case of the QBAP. Let X = {x_1, …, x_n} and let Y = {y_1, …, y_n}. Let Π denote the set of all n × n permutation matrices. Note that any π ∈ Π can be written as π = ((π_{ij}))_{i,j=1}^{n}, where each π_{ij} ∈ {0, 1}. Then ∑_j π_{ij} = 1 for any i, and ∑_i π_{ij} = 1 for any j. Computing d̂_N now becomes:

d̂_N(X, Y) = (1/2) min_{π∈Π} max_{1≤i,k,j,l≤n} Γ_{ikjl} π_{ij} π_{kl},   where Γ_{ikjl} = |ω_X(x_i, x_k) − ω_Y(y_j, y_l)|.
This is just the QBAP, which is known to be NP-hard [18].
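For very small networks, d̂_N can be evaluated by brute force over all permutations, which makes the QBAP structure above concrete; a sketch (ours, exponential in n and intended only as an illustration):

from itertools import permutations

def dhat_N(omega_X, omega_Y):
    # (1/2) min over bijections pi of max_{i,k} |omega_X(i,k) - omega_Y(pi(i),pi(k))|.
    n = len(omega_X)
    best = float("inf")
    for pi in permutations(range(n)):     # n! candidates: only feasible for tiny n
        dis = max(abs(omega_X[i][k] - omega_Y[pi[i]][pi[k]])
                  for i in range(n) for k in range(n))
        best = min(best, dis)
    return best / 2

omega_X = [[0, 1], [2, 0]]
omega_Y = [[0, 2], [1, 0]]
print(dhat_N(omega_X, omega_Y))           # 0.0: swapping the nodes matches the weights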
4.2.1 An algorithm for computing minimum matchings
Lower bounds for dN involving the comparison of local spectra of two networks such
as those in Proposition 147 require computing the minimum of a functional J(R) :=
max(x,y)∈R C(x, y) where C : X × Y → R+ is a given cost function and R ranges in
R(X, Y ). This is an instance of a bottleneck linear assignment problem (or LBAP) [18].
We remark that the current instance differs from the standard formulation in that one is
now optimizing over correspondences and not over permutations. Hence the standard al-
gorithms need to be modified.
Assume n = card(X) and m = card(Y). In this section we adopt matrix notation and regard R as a matrix ((r_{i,j})) ∈ {0, 1}^{n×m}. The condition R ∈ R(X, Y) then requires that ∑_i r_{i,j} ≥ 1 for all j and ∑_j r_{i,j} ≥ 1 for all i. We denote by C = ((c_{i,j})) ∈ R_+^{n×m} the matrix representation of the cost function C described above. With the goal of identifying a suitable algorithm, the key observation is that the optimal value min_{R∈R} J(R) must coincide with a value realized in the matrix C.
An algorithm with complexity O(n²m²) is the one in Algorithm 1 (we give it in Matlab pseudo-code). The algorithm belongs to the family of thresholding algorithms for solving matching problems over permutations; see [18]. Notice that R is a binary matrix and that the procedure TestCorrespondence has complexity O(nm). In the worst case, the matrix C has nm distinct entries, and the while loop will need to exhaustively test them all, hence the claimed complexity of O(n²m²). Even though a more efficient version (with complexity O((nm) log(nm))) can be obtained by using a bisection strategy on the range of possible values contained in the matrix C (in a manner similar to what is described for the case of permutations in [18]), here for clarity we limit our presentation to the version detailed above; a sketch of the bisection variant appears after Algorithm 1.
Algorithm 1 MinMax matching
1: procedure MINMAXMATCH(C)
2:     v = sort(unique(C(:)));   % sorted candidate threshold values
3:     k = 1;
4:     done = false;             % initialize the loop flag
5:     while ∼done do
6:         c = v(k);
7:         R = (C <= c);         % threshold C to obtain a binary matrix
8:         done = TESTCORRESPONDENCE(R);
9:         k = k + 1;
10:    end while
11:    return c
12: end procedure
13: procedure TESTCORRESPONDENCE(R)
14:    done = prod(sum(R))*prod(sum(R')) > 0;  % every row and column contains a 1
15:    return done
16: end procedure
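For comparison, the following sketch (ours, in Python with NumPy) implements the bisection variant mentioned above: since the feasibility of a threshold is monotone, binary search over the sorted entries of C finds the optimal value with O(log(nm)) feasibility tests. The function names are illustrative assumptions.

import numpy as np

def test_correspondence(R):
    # R is a correspondence iff every row and every column contains a 1.
    return R.any(axis=0).all() and R.any(axis=1).all()

def minmax_match(C):
    # Smallest entry c of C such that the binary matrix (C <= c) is a correspondence.
    values = np.unique(C)                 # sorted candidate thresholds
    lo, hi = 0, len(values) - 1
    while lo < hi:                        # feasibility is monotone in the threshold
        mid = (lo + hi) // 2
        if test_correspondence(C <= values[mid]):
            hi = mid
        else:
            lo = mid + 1
    return values[lo]

C = np.array([[0.3, 0.9], [0.8, 0.2], [0.5, 0.7]])
print(minmax_match(C))                    # 0.5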
Figure 4.1: Left: Lower bound matrix arising from matching local spectra on the database
of community networks. Right: Corresponding single linkage dendrogram. The labels
indicate the number of communities and the total number of nodes. Results correspond to
using local spectra as described in Proposition 180.
Figure 4.2: Left: Lower bound matrix arising from matching global spectra on the database
of community networks. Right: Corresponding single linkage dendrogram. The labels
indicate the number of communities and the total number of nodes. Results correspond to
using global spectra as signatures.
This bound follows from Proposition 147 by the discussion at the beginning of §2.6.1.
The results are shown in the form of the lower bound matrix and its single linkage
dendrogram in Figure 4.1. Notice that the labels in the dendrogram permit ascertaining
the quality of the classification provided by the local spectra bound. With only very few
exceptions, networks with similar structure (same number of communities) were clustered
together regardless of their cardinality. Notice furthermore how networks with 4 and 5
communities merge together before merging with networks with 1 and 2 communities, and
vice versa. For comparison, we provide details about the performance of the global spectra
lower bound on the same database in Figure 4.2. The results are clearly inferior to those
produced by the local version, as predicted by the inequality in Proposition 147.
In this experiment, there were two environments: (1) a square of side length L, and (2)
a square of side length L, with a disk of radius 0.33L removed from the center. In what
follows, we refer to the environments of the second type as 1-hole environments, and those
of the first type as 0-hole environments. For each environment, a random-walk trajectory
of 5000 steps was generated, where the animal could move above, below, left, or right with
equal probability. If one or more of these moves took the animal outside the environment (a
disallowed move), then the probabilities were redistributed uniformly among the allowed
moves. The length of each step in the trajectory was 0.1L.
In the first set of 20 trials for each environment, 200 place fields of radius 0.1L were
scattered uniformly at random. In the next two sets, the place field radii were changed to
0.2L and 0.05L. This produced a total of 60 trials for each environment. For each trial,
the corresponding network (X, ωX ) was constructed as follows: X consisted of 200 place
cells, and for each 1 ≤ i, j ≤ 200, the weight ωX (xi , xj ) was given by:
ω_X(x_i, x_j) = 1 − (# times cell x_j spiked in a window of five time units after cell x_i spiked) / (# times cell x_j spiked).
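In code, the construction of ω_X from spike trains reads as follows (a sketch, ours; the binary raster representation and the window measured in time bins are illustrative assumptions):

import numpy as np

def spike_network(spikes, window=5):
    # spikes: (cells x time bins) binary array. Returns omega with
    # omega[i, j] = 1 - (# spikes of j preceded within `window` bins by a spike of i)
    #               / (# spikes of j).
    n_cells = spikes.shape[0]
    omega = np.ones((n_cells, n_cells))
    for i in range(n_cells):
        for j in range(n_cells):
            total_j = spikes[j].sum()
            if total_j == 0:
                continue                  # no spikes of j: leave the weight at 1
            hits = sum(spikes[i, max(0, t - window):t].any()
                       for t in np.flatnonzero(spikes[j]))
            omega[i, j] = 1 - hits / total_j
    return omega

rng = np.random.default_rng(0)
raster = (rng.random((4, 200)) < 0.1).astype(int)
print(np.round(spike_network(raster), 2))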
The results of applying the local spectra lower bound are shown in Figure 4.3. The
labels env-0, env-1 correspond to 0 and 1-hole environments, respectively.
As a final remark, we note that at least for this experiment, it appears that superior
results were obtained using Dowker persistent homology, as shown in §1.10.2.
Figure 4.3: Single linkage dendrogram based on local spectrum lower bound of Proposition
180 corresponding to hippocampal networks with place field radii 0.2L, 0.1L, and 0.05L
(clockwise from top left).
4.3.1 Numerical stability of entropic regularization
Let µX , µY be probability measures on sets X, Y with |X| = m, |Y | = n. For a general
m × n cost matrix M , one may consider the entropically regularized optimal transport
problem below, where λ ≥ 0 is a regularization parameter and H denotes entropy:
inf_{p∈C(μ_X,μ_Y)} ∑_{i,j} M_{ij} p_{ij} − (1/λ) H(p),   where H(p) = −∑_{i,j} p_{ij} log p_{ij}.
As shown in [42], the solution to this problem has the form diag(a) ∗ K ∗ diag(b), where K := e^{−λM} is a kernel matrix and a, b are nonnegative scaling vectors in R^m, R^n, respectively. Here ∗ denotes matrix multiplication, and exponentiation is performed elementwise. An approximation to this solution can be obtained by iteratively scaling K to have row and column sums equal to μ_X and μ_Y, respectively. More specifically, after initializing a = ones(m, 1), the updates are simply:

b ← μ_Y / (K′ ∗ a),   a ← μ_X / (K ∗ b),

where K′ denotes the transpose of K and the divisions are elementwise.
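In code, the iterations read as follows (a minimal dense sketch, ours, with no stabilization; it exhibits exactly the underflow issue discussed next when λ is large):

import numpy as np

def sinkhorn(M, mu_X, mu_Y, lam, n_iter=500):
    # Entropically regularized OT: returns approximately diag(a) K diag(b), K = exp(-lam M).
    K = np.exp(-lam * M)
    a = np.ones(len(mu_X))
    b = np.ones(len(mu_Y))
    for _ in range(n_iter):
        b = mu_Y / (K.T @ a)              # match column sums to mu_Y
        a = mu_X / (K @ b)                # match row sums to mu_X
    return a[:, None] * K * b[None, :]

M = np.array([[0.0, 1.0], [1.0, 0.0]])
mu = np.array([0.5, 0.5])
print(sinkhorn(M, mu, mu, lam=50.0).round(3))   # close to the diagonal coupling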
As pointed out in [106, 31, 30], using a large value of λ (corresponding to a small regularization) leads to numerical instability, where values of K can go below machine precision and entries of the scaling updates a, b can also blow up. For example, Matlab will interpret e^{−1000} as 0, which is a problem with even a moderate choice of λ = 200 and M_{ij} = 50. Theoretically, it is necessary to have K be a positive matrix for the Sinkhorn algorithm to converge to the correct output [109, 110].
In [106, 31, 30], the authors proposed a variety of clever methods for stabilizing the
structure of the algorithm. One such idea is to incorporate some amount of log-domain
computations to stabilize the iterations, i.e. during the iterations, large values of a and b
are occasionally “absorbed” into K. This leads to some cancellation with the small values
of K so that the resulting matrix K̃ is stabilized. The iterations then continue with the
stabilized kernel until another absorption step is required, which stabilizes K̃ even further.
Even with this strategy, some entries of K might be zero at initialization. Another strategy
described in the preceding works is to start with a conservative value of λ, obtain some
scaling updates that are used to stabilize K, and then gradually increase λ to the desired
value while further stabilizing the kernel.
As discussed in [30], many entries of the stabilized kernel obtained as above could be
below machine precision, but the entries corresponding to those on which the optimal plan
is supported are likely to be above the machine limit. Indeed, this sparsity may even be
leveraged for additional computational tricks.
The techniques for stabilizing the entropy regularized OT problem are not the focus
of our work, but because these considerations naturally arose in our computational experi-
ments, we describe some strategies we undertook that are complementary to the techniques
available in the current literature. In order to provide a perspective complementary to that
presented in [30], we impose the requirement that all entries of the kernel matrix remain
above machine precision.
Initializing in the log domain. A simple adaptation of the “log domain absorption” step
referred to above yields a “log initialization” method that works well in most cases for
initializing K to have values above machine precision. To explain this method, we first
present an algorithm (Algorithm 2) for the log domain absorption method. We follow the
presentation provided in [30], making notational changes as necessary.
Notice that in Algorithm 2, K might already have values below machine precision at
initialization. To circumvent this, we can add a preprocessing step that yields a stable
initialization of K. This is outlined in Algorithm 3. An important point to note about Al-
gorithm 3 is that the user needs to choose a function decideParam(α, β) which returns
a number γ between α and β, where α and β are as stated in the algorithm. This number γ
should be such that exp(−λβ + λγ) is above machine precision, but exp(−λα + λγ) is not
too large. The crux of Algorithm 3 is that by choosing large initial scaling vectors a, b and
immediately absorbing them into the log domain, the extreme values of M are canceled out
before exponentiation.
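A Python rendering of this initialization (Algorithm 3 below) is sketched here; it is our own illustration. decideParam is the user-supplied function from the pseudocode; here we pick γ = (α + β)/4 as a default so that the combined shift u_i + v_j = (α + β)/2 centers the exponents of K, an assumption rather than a prescription.

import numpy as np

def log_initialize(M, lam, decide_param=lambda a, b: (a + b) / 4):
    # Algorithm 3 sketch: choose gamma, absorb the exp(lam*gamma) scalings into
    # the kernel, and return K_ij = exp(lam * (-M_ij + u_i + v_j)) with u = v = gamma.
    alpha, beta = M.min(), M.max()
    gamma = decide_param(alpha, beta)     # default: u_i + v_j = (alpha + beta) / 2
    m, n = M.shape
    u = np.full(m, gamma)
    v = np.full(n, gamma)
    K = np.exp(lam * (-M + u[:, None] + v[None, :]))
    return K, u, v                        # continue Sinkhorn with a = 1_m, b = 1_n

M = np.linspace(0.0, 6.0, 12).reshape(3, 4)
K, u, v = log_initialize(M, lam=200.0)
print(np.isfinite(K).all() and K.min() > 0)   # exponents centered: no under/overflow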
A geometric trick in the p = 2 case. The preceding initialization method has its limita-
tions: depending on how far min(M ), max(M ) are spread apart, the log initialization step
might not be able to yield an initial kernel K that has all entries above machine precision
and below the machine limit. This suggests that it would be beneficial to normalize the cost
matrix M to control the spread of min(M ), max(M ). However, it is crucial to remember
Algorithm 3 Log domain initialization of K
procedure LOGINITIALIZE(M, λ, mX, mY)      ▷ M an m × n cost matrix; mX, mY prob. measures
    α ← min(M), β ← max(M)                 ▷ scan M for max and min values
    γ ← decideParam(α, β)                  ▷ decideParam is an independent function
    a ← exp(λγ)1_m, b ← exp(λγ)1_n
    u ← (1/λ) log(a), v ← (1/λ) log(b)
    K_{ij} ← exp(λ(−M_{ij} + u_i + v_j))   ▷ K is stably initialized
    a ← 1_m, b ← 1_n
    perform the rest of SINKHORNLOG as usual
end procedure
that OT problems arise in our setting when computing the TLB between two networks, so
any normalization would have to be theoretically justified.
It turns out that in the case p = 2, the particular geometry of (Nm , dN,p ) allows for an
elegant normalization scheme. In this particular case, it is possible to use a certain “cosine
rule formula” [117] to compute the TLB between rescaled versions of the original networks
X and Y , and then rescale the solution to get the TLB between X and Y . We describe this
method in detail below. In what follows, we always have p = 2 for dN,p unless specified
otherwise.
The caveat to this normalization scheme, pointed out to us by Justin Solomon, is that
scaling the network weights down in turn requires the λ values to be scaled up, which
once again leads to numerical instability. In practice, we have used the following ap-
proach. When running a computation over a database of networks which have widely
varying weights (such that any fixed choice of λ causes some of the initial kernels to have
values above/below machine precision), we rescale all the networks simultaneously using
the normalization described below. Then we proceed with the Sinkhorn algorithm, employ-
ing log domain absorption steps as needed (and some of the computations will indeed need
more of these absorption steps).
Let (X, ω_X, μ_X), (Y, ω_Y, μ_Y) ∈ N_m. Recall from Example 109 that d_{N,2}(X, N_1(0)) = (1/2) size_2(X). Define s := (1/2) size_2(X, ω_X, μ_X) and t := (1/2) size_2(Y, ω_Y, μ_Y). Notice also that for an optimal coupling μ ∈ C(μ_X, μ_Y), we have:

d_{N,2}(X, Y)² = (1/4) ∬ (|ω_X(x, x′)|² + |ω_Y(y, y′)|² − 2 ω_X(x, x′) ω_Y(y, y′)) dμ(x, y) dμ(x′, y′)
  = s² + t² − (1/2) ∬ ω_X(x, x′) ω_Y(y, y′) dμ(x, y) dμ(x′, y′),
where the first equality holds because |a − b|2 = |a|2 + |b|2 − 2ab for all a, b ∈ R, and
the last equality holds because ωX (x, x0 ), ωY (y, y 0 ) do not depend on µY , µX , respectively.
Sturm [117, Lemma 4.2] observed the following “cosine rule” structure. Define
0 ωX ωY
ωX := , ωY0 := . (4.1)
2s 2t
0
Then size2 (X, ωX ) = 2s 1
size2 (X, ωX ) = 1 = 2t1 size2 (Y, ωY ) = size2 (Y, ωY0 ). A geo-
A geometric fact about this construction is that $(X, \omega_X, \mu_X)$ and $(Y, \omega_Y, \mu_Y)$ lie on geodesic rays connecting $\mathcal{X} := (X, \omega_X', \mu_X)$ and $\mathcal{Y} := (Y, \omega_Y', \mu_Y)$ respectively to $\mathcal{N}_1(0)$. Actually, once $(X, \omega_X, \mu_X)$ and $(Y, \omega_Y, \mu_Y)$ are chosen, the geodesic rays are automatically defined to be given by their scalar multiples. Then we independently define $\mathcal{X}$ and $\mathcal{Y}$ to be representatives of the weak isomorphism classes of networks at $d_{\mathcal{N},2}$ distance $1/2$ from $\mathcal{N}_1(0)$ that lie on these geodesics. We illustrate a related situation in Figure 4.4, and refer the reader to [117] for further details. Implicitly using this geometric fact, we fix $\mathcal{X}, \mathcal{Y}$ as above and treat $(X, \omega_X, \mu_X)$ and $(Y, \omega_Y, \mu_Y)$ as the $2s$- and $2t$-scalings of $\mathcal{X}$ and $\mathcal{Y}$, respectively (i.e. such that Equation (4.1) is satisfied). Then we have:
$$d_{\mathcal{N},2}(X,Y)^2 = s^2 + t^2 - 2st \iint \omega_X'(x,x')\,\omega_Y'(y,y')\, d\mu(x,y)\, d\mu(x',y') = s^2 + t^2 - 2st\left(1 - 2\, d_{\mathcal{N},2}(\mathcal{X},\mathcal{Y})^2\right),$$
where the last equality holds because $\mathrm{size}_2(X, \omega_X') = 1 = \mathrm{size}_2(Y, \omega_Y')$.
Since $(X, \omega_X, \mu_X)$ and $(Y, \omega_Y, \mu_Y)$ were $2s$- and $2t$-scalings of $\mathcal{X}$ and $\mathcal{Y}$ for arbitrary $s, t > 0$, this shows in particular that computing the quantity $d_{\mathcal{N},2}(\mathcal{X}, \mathcal{Y})$ is more stable, because the entries in the corresponding kernel matrix $K$ are more likely to be above machine precision. Indeed, for larger values of $\lambda$, one can scale down $\omega_X, \omega_Y$ sufficiently via $\alpha$ and $\beta$ to ensure that $K$ is well-behaved. The cosine rule can then be used to recover $d_{\mathcal{N},2}(X, Y)$ in terms of $d_{\mathcal{N},2}(\mathcal{X}, \mathcal{Y})$.
Figure 4.4: Sinkhorn computations for $d_{\mathcal{N},2}(\mathcal{X}, \mathcal{Y})$ are carried out in the "stable region" for $K$, and the end result is rescaled to recover $d_{\mathcal{N},2}(X, Y)$. (The omitted drawing shows $X$ and $Y$ at distances $s$ and $t$ from $\mathcal{N}_1(0)$ along two geodesic rays, with rescaled representatives $X'$ and $Y'$ at distances $s'$ and $t'$ on the same rays.)
This geometric idea is illustrated in Figure 4.4. The spaces $(X, \omega_X)$, $(X, \omega_X')$, and $\mathcal{X}$ all live on a geodesic ray emanating from $\mathcal{N}_1(0)$, and likewise for $Y$. See [117] for more details about the geodesic structure of gauged measure spaces; the analogous results hold for $(\mathcal{N}_m, d_{\mathcal{N},2})$.
From the preceding observation, we have:
$$\frac{1}{2st}\left(d_{\mathcal{N},2}(X,Y)^2 - s^2 - t^2\right) = \frac{1}{2\sigma\tau}\left(d_{\mathcal{N},2}(\mathcal{X},\mathcal{Y})^2 - \sigma^2 - \tau^2\right)$$
$$d_{\mathcal{N},2}(X,Y)^2 - s^2 - t^2 = \alpha\beta\,\|\omega_X\|_\infty\|\omega_Y\|_\infty\left(d_{\mathcal{N},2}(\mathcal{X},\mathcal{Y})^2 - \frac{s^2}{\alpha^2\|\omega_X\|_\infty^2} - \frac{t^2}{\beta^2\|\omega_Y\|_\infty^2}\right).$$
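To spell out the passage from the first displayed line to the second: here $\sigma$ and $\tau$ denote the half-sizes of $\mathcal{X}$ and $\mathcal{Y}$, reading $\mathcal{X}$ and $\mathcal{Y}$ as the networks with weights $\omega_X/(\alpha\|\omega_X\|_\infty)$ and $\omega_Y/(\beta\|\omega_Y\|_\infty)$. Since $\mathrm{size}_2$ scales linearly in the weights,
$$\sigma = \frac{s}{\alpha\|\omega_X\|_\infty}, \qquad \tau = \frac{t}{\beta\|\omega_Y\|_\infty}, \qquad \text{so that} \qquad \frac{2st}{2\sigma\tau} = \alpha\beta\,\|\omega_X\|_\infty\|\omega_Y\|_\infty;$$
multiplying the first line through by $2st$ and substituting these expressions for $\sigma^2$ and $\tau^2$ yields the second line.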
The final step is summarized in the following lemma.

Lemma 181. Let $X, Y, \mathcal{X}, \mathcal{Y}, \alpha, \beta, s, t$ be as above. Then,
$$d_{\mathcal{N},2}(X,Y)^2 = \alpha\beta\,\|\omega_X\|_\infty\|\omega_Y\|_\infty\, d_{\mathcal{N},2}(\mathcal{X},\mathcal{Y})^2 - \frac{s^2\,\beta\|\omega_Y\|_\infty}{\alpha\|\omega_X\|_\infty} - \frac{t^2\,\alpha\|\omega_X\|_\infty}{\beta\|\omega_Y\|_\infty} + s^2 + t^2.$$
Remark 182. From the perspective of computations, the preceding lemma should be interpreted as follows. The quantities $s, t, \|\omega_X\|_\infty, \|\omega_Y\|_\infty$ are all easy to compute. One can either attempt to obtain a local minimizer for the $d_{\mathcal{N},2}(\mathcal{X}, \mathcal{Y})$-functional (e.g. following [111]), or obtain a TLB-type lower bound for $d_{\mathcal{N},2}(\mathcal{X}, \mathcal{Y})$, as in the current work. In either case, the output can be rescaled by the formula in the lemma to approximate (or lower-bound) $d_{\mathcal{N},2}(X,Y)^2$.
Remark 183. In our experiments, it was best to fix $\alpha = \beta = 1$. This cosine rule method works best when the networks in question have large sizes; if the networks are provided in normalized form with weights in a small range (e.g. in $[0,1]$), then it is better to transfer some of the computations to the log domain using Algorithms 2 and 3.
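For concreteness, the rescaling in Lemma 181 amounts to the following short sketch (function and argument names are ours; the body simply transcribes the formula in the lemma):

def rescale_lower_bound(d_scaled_sq, s, t, wX_inf, wY_inf, alpha=1.0, beta=1.0):
    """Sketch of Lemma 181: given (a lower bound on) the squared d_{N,2}
    distance between the rescaled networks, recover (a lower bound on)
    d_{N,2}(X,Y)^2.  s, t are the half-sizes of X and Y; wX_inf, wY_inf are
    the sup norms of the weight functions."""
    c = alpha * beta * wX_inf * wY_inf
    return (c * d_scaled_sq
            - (s ** 2) * (beta * wY_inf) / (alpha * wX_inf)
            - (t ** 2) * (alpha * wX_inf) / (beta * wY_inf)
            + s ** 2 + t ** 2)

Since the coefficient on the $d_{\mathcal{N},2}(\mathcal{X},\mathcal{Y})^2$ term is positive, plugging in a lower bound on the rescaled distance indeed yields a lower bound on $d_{\mathcal{N},2}(X,Y)^2$.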
Zeros on the diagonal for TLB(A, A) and adaptive λ-search. When comparing a net-
work to itself, we expect to get a TLB output of zero. The corresponding optimal coupling
should be the diagonal coupling. However, some care needs to be taken to achieve this
during computation, because entropic regularization produces “smoothened” couplings by
design, and the diagonal coupling is highly nonsmooth. We now briefly explain a simple
heuristic we used to achieve these nonsmooth couplings.
Suppose we are comparing a matrix A to itself via the TLB. The corresponding cost
matrix M should have zeros on the diagonal, which translates to a kernel matrix K with
1s on the diagonal. When M has all 0s on the diagonal and values strictly above 0 ev-
erywhere else, and the two marginals are equal, then the optimal coupling should be the
diagonal coupling: the optimal transport plan is to not transport anything at all. To achieve
this via Sinkhorn iterations, we noticed that the off-diagonal entries in each column of K
needed to be several orders of magnitude below 1. For our desired precision, three orders
of magnitude were sufficient. Since orders of separation in K are easily related to differ-
ences of values in M , we performed the following procedure: for each column j of M , we
computed the minimal difference between values in the column, and computed $\lambda_j$ so that after computing $K = e^{-\lambda_j M}$, the entries in column $j$ of $K$ would be separated by at least three orders of magnitude. Thus we obtained a pool of $\lambda$ values. From this list, we used a
binary search to pick the largest λ that would not cause entries of K to go below machine
precision.
While this approach naturally suggests using log domain initialization (as in Algorithm
3) to choose the largest possible λ, we did not use any log domain computations so that we
could independently observe the behavior of this simple heuristic. However, to ensure that
at least one λ in the pool would work without causing entries of K to go below machine
precision, we preprocessed the networks via the cosine rule normalization strategy used
above and inserted a moderate value of λ = 200 into the pool of λ values.
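A numpy sketch of the heuristic follows. The three-orders-of-magnitude target and the fallback value λ = 200 come from the discussion above; eps, the smallest usable positive float, is our stand-in for "machine precision".

import numpy as np

def adaptive_lambda(M, orders=3.0, eps=np.finfo(float).tiny):
    """Sketch of the adaptive lambda-search.  In each column of K = exp(-lam*M),
    two entries are separated by 10**orders exactly when the corresponding
    costs differ by orders*ln(10)/lam, so a column whose minimal gap is d_j
    contributes lam_j = orders*ln(10)/d_j to the pool."""
    pool = [200.0]                        # moderate fallback value (cf. above)
    for j in range(M.shape[1]):
        vals = np.unique(M[:, j])         # sorted distinct values in column j
        if vals.size > 1:
            d_j = np.diff(vals).min()     # minimal difference within column j
            pool.append(orders * np.log(10.0) / d_j)
    pool.sort()
    # Binary search for the largest lam in the pool such that no entry of
    # K = exp(-lam*M) falls below eps; feasibility is monotone in lam.
    lo, hi, best = 0, len(pool) - 1, pool[0]
    M_max = M.max()
    while lo <= hi:
        mid = (lo + hi) // 2
        if np.exp(-pool[mid] * M_max) >= eps:
            best, lo = pool[mid], mid + 1
        else:
            hi = mid - 1
    return best    # falls back to the smallest pool value if none is feasible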
We used this adaptive λ-search heuristic when computing the TLB for any pair of net-
works, not just the TLB of a network with itself (which was the motivation for this heuris-
tic). For an illustration of the operation of this method, see the TLB dissimilarity matrix
corresponding to Experiment 1.10.3 in Figure 1.25.
Algorithm 4 TLB computation with cosine rule normalization

procedure GETCOSINETLB(X, Y, mX, mY)    ▷ X is n × n, Y is m × m; mX, mY prob. measures
    A ← X./max(abs(X)), B ← Y./max(abs(Y))    ▷ ./ denotes elementwise division
    ρ ← GETTLB(A, B, mX, mY)
    s ← (1/2) GETSIZE(A, mX), t ← (1/2) GETSIZE(B, mY)
    perform scaling of ρ using s, t as in Lemma 181, save as π
    return π
end procedure

procedure GETSIZE(A, mA)    ▷ get the 2-size of a network
    σ ← sqrt(mA′ (A.^2) mA)    ▷ mA′ denotes the transpose of mA; .^ is elementwise power
    return σ
end procedure

procedure GETTLB(A, B, mA, mB)    ▷ get TLB over R with p = 2
    for 1 ≤ i ≤ n and 1 ≤ j ≤ m do
        vAout ← A(i, :), vBout ← B(j, :)    ▷ get both eccout and eccin
        vAin ← A(:, i), vBin ← B(:, j)
        Cout(i, j) ← COMPAREDISTRIBUTIONS(vAout, vBout, mA, mB)
        Cin(i, j) ← COMPAREDISTRIBUTIONS(vAin, vBin, mA, mB)
    end for
    perform Sinkhorn iterations for the OT problems with Cout, Cin as cost matrices
    store the results in tlbOut, tlbIn
    return max(tlbIn, tlbOut)    ▷ both are valid lower bounds for dN,2, so take the max
end procedure

procedure COMPAREDISTRIBUTIONS(vA, vB, mA, mB)
    γ ← the 2-Wasserstein distance, via Equation (1.17), between the pushforward
        distributions over R induced by vA and vB
    return γ
end procedure
The PPH setting is more complicated, for two reasons: (1) because of directionality, the number of $p$-paths on a vertex set is much larger than the number of $p$-simplices, for any $p \in \mathbb{N}$, and (2) one must first obtain bases for the $\partial$-invariant $p$-paths $\{\Omega_p : p \geq 2\}$. The first item is unavoidable, and even desirable: we capture the asymmetry in the data, thus retaining more information. For the second item, note that $\Omega_0$ and $\Omega_1$ are just the allowed 0- and 1-paths, so their bases can be read off from the network weight function. After obtaining compatible bases for the filtered chain complex $\{\Omega_\bullet^i \to \Omega_\bullet^{i+1}\}_{i \in \mathbb{N}}$, however, one can use the general persistent homology algorithm [49, 123, 29]. By compatible bases, we mean a set of bases $\{B_p^i \subseteq \Omega_p^i : 0 \leq p \leq D+1,\ i \in \mathbb{N}\}$ such that $B_p^i \subseteq B_p^{i+1}$ for each $i$, and relative to which the transformation matrices $M_p$ of $\partial_p$ are known. Here $D$ is the dimension up to which we compute persistence.
We now present a procedure for obtaining compatible bases for the $\partial$-invariant paths. Fix a network $(X, A_X)$. We write $R_p$ to denote $R_p(X, K)$, for each $p \in \mathbb{Z}_+$. Given a digraph filtration on $X$, we obtain a filtered vector space $\{A_\bullet^i \to A_\bullet^{i+1}\}_{i=1}^N$ and a filtered chain complex $\{\Omega_\bullet^i \to \Omega_\bullet^{i+1}\}_{i=1}^N$ for some $N \in \mathbb{N}$. For any $p$-path $v$, define its allow time as $\mathrm{at}(v) := \min\{k \geq 0 : v \in A_p^k\}$. Similarly define its entry time as $\mathrm{et}(v) := \min\{k \geq 0 : v \in \Omega_p^k\}$. The allow time and entry time coincide when $p = 0, 1$, but are not necessarily equal in general. In Figure 1.13, for example, we have $\mathrm{at}(x_4 x_1 x_2) = 1 < 2 = \mathrm{et}(x_4 x_1 x_2)$.
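For intuition, allow times are cheap to compute once the digraph filtration is encoded by edge entry times. The following sketch uses hypothetical names and assumes, as in a digraph filtration on a fixed vertex set, that all vertices are present from the first filtration index.

def allow_time(path, edge_time):
    """Sketch: allow time of an elementary p-path (x_0, ..., x_p).  edge_time
    maps each directed edge (a, b) to the first filtration index at which it
    appears.  A path is allowed once all of its consecutive edges are, so its
    allow time is the max of their entry indices; a 0-path is allowed from
    the start (index 1, by the assumption above)."""
    if len(path) < 2:
        return 1
    return max(edge_time[(a, b)] for a, b in zip(path, path[1:]))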
Now fix $p \geq 2$, and consider the map $\partial_p : R_p \to R_{p-1}$. Let $M_p$ denote the matrix representation of $\partial_p$, relative to an arbitrary choice of bases $B_p$ and $B_{p-1}$ for $R_p$ and $R_{p-1}$. For convenience, we write the bases as $B_p = \{v_i^p : 1 \leq i \leq \dim(R_p)\}$ and $B_{p-1} = \{v_i^{p-1} : 1 \leq i \leq \dim(R_{p-1})\}$, respectively. Each basis element has an allow time that can be computed efficiently, and the allow times belong to the set $\{1, 2, \ldots, N\}$. By performing row and column swaps as needed, we can arrange $M_p$ so that the basis vectors for the domain are in increasing allow time, and the basis vectors for the codomain are in decreasing allow time. This is illustrated in Figure 4.5.
A special feature of $M_p$ is that it is stratified into horizontal strips given by the allow times of the codomain basis vectors. For each $1 \leq i \leq N$, we define the height range $i$ as:
$$\mathrm{hr}(i) := \{1 \leq j \leq \dim(R_{p-1}) : \mathrm{at}(v_j^{p-1}) = i\}.$$
In words, $\mathrm{hr}(i)$ lists the codomain basis vectors that have allow time $i$. Next we transform $M_p$ into a column echelon form $M_{p,G}$, using left-to-right Gaussian elimination. In this form, all nonzero columns are to the left of any zero column, and the leading coefficient (the topmost nonzero element) of any column is strictly above the leading coefficient of the column on its right. The leading coefficients are usually called pivots. An illustration of $M_{p,G}$ is provided in Figure 4.5. To obtain this column echelon form, elementary column operations of the following form are used: for a scalar $k$ and columns $i < j$, replace column $j$ by $(\text{column } j) - k\,(\text{column } i)$.
The basis for the domain undergoes corresponding changes, i.e. we replace $v_j^p$ by $(v_j^p - k v_i^p)$ as necessary. We write the new basis $B_{p,G}$ for $R_p$ as $\{\hat{v}_i^p : 1 \leq i \leq \dim(R_p)\}$. Moreover, we can write this basis as a union $B_{p,G} = \bigcup_{i=1}^N B_{p,G}^i$, where each $B_{p,G}^i := \{\hat{v}_k^p : 1 \leq k \leq \dim(R_p),\ \mathrm{et}(\hat{v}_k^p) \leq i\}$. This follows easily from the column echelon form: for each basis vector $v$ of the domain, the corresponding column vector is $\partial_p(v)$, and $\mathrm{at}(\partial_p(v))$ can be read directly from the height of the column. Specifically, if the row index of the topmost nonzero entry of $\partial_p(v)$ belongs to $\mathrm{hr}(i)$, then $\mathrm{at}(\partial_p(v)) = i$, and if $\partial_p(v) = 0$, then $\mathrm{at}(\partial_p(v)) = 0$. Then we have $\mathrm{et}(v) = \max(\mathrm{at}(v), \mathrm{at}(\partial_p(v)))$.
Remark 184. In the Gaussian elimination step above, we only eliminate entries by adding paths that have already been allowed in the filtration. This means that for any operation of the form $v_j^p \leftarrow v_j^p - k v_i^p$, we must have $\mathrm{at}(v_i^p) \leq \mathrm{at}(v_j^p)$. Thus $\mathrm{at}(v_j^p - k v_i^p) = \mathrm{at}(v_j^p)$. It follows that the allow times of the domain basis vectors do not change as we pass from $M_p$ to $M_{p,G}$, i.e. $M_p$ and $M_{p,G}$ have the same number of domain basis vectors corresponding to any particular allow time.
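The elimination itself is the standard left-to-right column reduction; here is a sketch over the two-element field (the procedure works over an arbitrary field, but $\mathbb{Z}/2$ keeps the sketch short). Columns are only ever reduced by columns to their left, consistent with Remark 184.

import numpy as np

def column_reduce(Mp):
    """Sketch of left-to-right Gaussian elimination bringing a boundary matrix
    into column echelon form over Z/2.  Each column is repeatedly reduced by
    the earlier column owning its pivot (topmost nonzero row), so the pivots
    of the nonzero columns end up pairwise distinct."""
    M = np.asarray(Mp, dtype=np.int64) % 2
    pivot_owner = {}                  # pivot row index -> column index
    for j in range(M.shape[1]):
        while True:
            rows = np.flatnonzero(M[:, j])
            if rows.size == 0:        # zero column: no pivot
                break
            piv = rows[0]             # topmost nonzero entry of column j
            if piv not in pivot_owner:
                pivot_owner[piv] = j  # column j claims this pivot
                break
            M[:, j] = (M[:, j] + M[:, pivot_owner[piv]]) % 2
    return M, pivot_owner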
Now we repeat the same procedure for $\partial_{p+1} : R_{p+1} \to R_p$, taking care to use the basis $B_{p,G}$ for $R_p$. Because we never perform any row operations on $M_{p+1}$, the computations for $M_{p+1}$ do not affect $M_{p,G}$. We claim that for each $1 \leq i \leq N$ and each $p \geq 0$, $B_{p,G}^i$ is a basis for $\Omega_p^i$. The correctness of the procedure amounts to proving this claim. Assuming the claim for now, we obtain compatible bases for the chain complex $\{\Omega_\bullet^i \to \Omega_\bullet^{i+1}\}_{i=1}^N$. Applying the general persistence algorithm with respect to the bases we just found now yields the PPH diagram.
Correctness. Note that all paths become allowed eventually, so $\dim(\Omega_p^N) = \dim(R_p)$. We claim that $B_{p,G}^i$ is a basis for $\Omega_p^i$, for each $1 \leq i \leq N$. To see this, fix $1 \leq i \leq N$ and let $v \in B_{p,G}^i$. By the definition of $B_{p,G}^i$, $\mathrm{et}(v) \leq i$, so $v \in \Omega_p^i$. Each $B_{p,G}^i$ was obtained by performing linear operations on the basis $B_p$ of $R_p$, so it is a linearly independent collection of vectors in $\Omega_p^i$. Towards a contradiction, suppose $B_{p,G}^i$ does not span $\Omega_p^i$. Let $\tilde{u} \in \Omega_p^i$ be linearly independent from $B_{p,G}^i$, and let $\tilde{v} \in B_{p,G} \setminus B_{p,G}^i$ be linearly dependent on $\tilde{u}$ (such a $\tilde{v}$ exists because $B_{p,G}$ is a basis for $R_p$).
Consider the basis $B_p^{\tilde{u}}$ obtained from $B_{p,G}$ after replacing $\tilde{v}$ with $\tilde{u}$. Let $M_p^{\tilde{u}}$ denote the corresponding matrix, with the columns arranged in the following order from left to right: the first $|B_{p,G}^i|$ columns agree with those of $M_{p,G}$, the next column is $\partial_p(\tilde{u})$, and the remaining columns appear in the same order that they appear in $M_{p,G}$. Notice that $M_{p,G}$ differs from $M_p^{\tilde{u}}$ by a change of (domain) basis, i.e. a sequence of elementary column operations. Next perform another round of left-to-right Gaussian elimination to arrive at a column echelon form $M_p^u$, where $u$ is the domain basis vector obtained from $\tilde{u}$ after performing all the column operations. Let $B_p^u$ denote the corresponding domain basis. It is a standard theorem in linear algebra that the reduced column echelon form of a matrix is unique. Since $M_{p,G}$ and $M_p^u$ were obtained from $M_p$ via column operations, they both have the same unique reduced column echelon form, and it follows that they have the same pivot positions.
Now we arrive at the contradiction. Since $\tilde{v} \notin B_{p,G}^i$, we must have either $\mathrm{at}(\tilde{v}) > i$, or $\mathrm{at}(\partial_p(\tilde{v})) > i$. Suppose first that $\mathrm{at}(\tilde{v}) > i$. Since $\tilde{u} \in \Omega_p^i$, we must have $\mathrm{et}(\tilde{u}) \leq i$, and so $\mathrm{at}(\tilde{u}) \leq i$. By the way in which we sorted $M_p^{\tilde{u}}$, we know that $u$ is obtained by adding terms from $B_{p,G}^i$ to $\tilde{u}$. Each term in $B_{p,G}^i$ has allow time $\leq i$, so $\mathrm{at}(u) \leq i$ by Remark 184. But then $B_p^u$ has one more basis vector with allow time $\leq i$ than $B_p$, i.e. one fewer basis vector with allow time $> i$. This is a contradiction, because taking linear combinations of linearly independent vectors to arrive at $B_p^u$ can only increase the allow time. Next suppose that $\mathrm{at}(\partial_p(\tilde{v})) > i$. Then, because $M_{p,G}$ is already reduced, the column of $\tilde{v}$ has a pivot at a height that does not belong to $\mathrm{hr}(i)$. Now consider $\partial_p(u)$. Suppose first that $\partial_p(u) = 0$. Then the column of $u$ clearly does not have a pivot, and it does not affect the pivots of the columns to its right in $M_p^u$. Thus $M_p^u$ has one fewer pivot than $M_{p,G}$, which is a contradiction because both matrices have the same reduced column echelon form and hence the same pivot positions. Finally, suppose $\partial_p(u) \neq 0$. Since $u$ is obtained from $\tilde{u}$ by reduction, we also have $\mathrm{at}(\partial_p(u)) \leq \mathrm{at}(\partial_p(\tilde{u})) \leq i$. Thus $M_p^u$ has one more pivot at height range $i$ than $M_{p,G}$, which is again a contradiction. Thus $B_{p,G}^i$ spans $\Omega_p^i$. Since $1 \leq i \leq N$ was arbitrary, the result follows.
Data structure. Our work shows that left-to-right column reduction is sufficient to obtain compatible bases for the filtered chain complex $\{\Omega_\bullet^i \to \Omega_\bullet^{i+1}\}_{i=1}^N$. As shown in [123], this is precisely the operation needed in computing persistence intervals, so we can compute PPH with little more work. It is known that there are simple ways to optimize the left-to-right persistence computation [29, 8], but in this work we follow the classical treatment. Following [49, 123], our data structure is a linear array $T$ labeled by the elementary regular $p$-paths, $0 \leq p \leq D+1$, where $D$ is the dimension up to which homology is computed. For completeness, we show below how to modify the algorithms in [123] to obtain PPH.
Analysis. The running time for this procedure is the same as that of Gaussian elimination over fields, i.e. it is $O(m^3)$, where $m$ is the number of $D$-paths (if we compute persistence up to dimension $D-1$). This number is large: the number of regular $D$-paths over $n$ points is $n(n-1)^D$. Computing persistence also requires $O(m^3)$ running time. Thus, to compute PPH in dimension $D-1$ for a network on $n$ nodes, the worst case running time is $O(n^{3+3D})$.

Compare this with the problem of producing simplicial complexes from networks, and then computing simplicial persistent homology. For a network on $n$ nodes, assume that the simplicial filtration is such that every $D$-simplex on $n$ points eventually enters the filtration (see [37] for such filtrations). The number of $D$-simplices over $n$ points is $\binom{n}{D+1}$, which is of the same order as $n^{D+1}$. Thus computing simplicial persistent homology in dimension $D-1$ via such a filtration (using the general algorithm of [123]) still has complexity $O(n^{3+3D})$.
Figure 4.5: Left: The rows and columns of $M_p$ are initially arranged so that the domain and codomain vectors are in increasing and decreasing allow time, respectively. If there are no domain (codomain) vectors having a particular allow time, then the corresponding vertical (horizontal) strip is omitted. Right: After converting to column echelon form, the domain vectors of $M_{p,G}$ need not be in the original ordering. But the codomain vectors are still arranged in decreasing allow time.
Algorithm 5 Computing persistent path homology

procedure COMPUTEPPH(X, D + 1)    ▷ compute PPH of network X up to dimension D
    for p = 0, . . . , D do
        Pers_p ← ∅    ▷ store intervals here
        for j = 1, . . . , dim(R_{p+1}) do
            [u, i, et] ← BASISCHANGE(v_j^{p+1}, p + 1)
            if u = 0 then
                mark T_{p+1}[j]
            else
                T_p[i] ← (u, et)
                add (et(v_i^p), et) to Pers_p
            end if
        end for
        for j = 1, . . . , dim(R_p) do
            if T_p[j] is marked and empty then
                add (et(v_j^p), ∞) to Pers_p
            end if
        end for
    end for
    return Pers_0, . . . , Pers_D
end procedure
4.5 More experiments using Dowker persistence
Because our goal was to analyze the interdependence of industries and the flow of commodities across industrial sectors, we removed the diagonal as above to discount the commodities produced by each industry in its own type. Next we defined a network $(E, \omega_E)$, where $\omega_E$ was given by:
$$\omega_E(e_i, e_j) = f\!\left(\frac{\overline{\omega}_E(e_i, e_j)}{\sum_{e \in E} \overline{\omega}_E(e, e_j)}\right) \qquad \text{for each } 1 \leq i, j \leq 71,$$
where $\overline{\omega}_E$ denotes the raw make-table weight function.
Here $f(x) = 1 - x$ is a function used to convert the original similarity network into a dissimilarity network. The greater the dissimilarity, the weaker the investment, and vice versa. So if $\omega_E(e, e') = 0.85$, then sector $e$ is said to make an investment of 15% in sector $e'$, meaning that 15% of the commodities of type $e'$ produced externally (i.e. by industries other than $e'$) are produced by industry $e$. After this preprocessing step, we
computed the 0 and 1-dimensional Dowker persistence diagrams of the resulting network.
The corresponding barcodes are presented in Figure 4.6, and our interpretation is given
below.
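A numpy sketch of this preprocessing follows. Zeroing the diagonal of the output mirrors the migration networks of §4.5.2, where the weight function vanishes on the diagonal by definition; treating the economic network the same way is our assumption.

import numpy as np

def make_to_dissimilarity(W):
    """Sketch of the preprocessing above: W[i, j] is the raw make-table weight
    of industry i for commodity type j.  Remove the diagonal, column-normalize
    by total external production, and apply f(x) = 1 - x."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)          # discount commodities of an industry's own type
    col_totals = W.sum(axis=0)        # total external production of each type
    omega = 1.0 - W / np.where(col_totals == 0.0, 1.0, col_totals)
    np.fill_diagonal(omega, 0.0)      # assumption: omega_E(e, e) = 0
    return omega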
Dependent sectors
We open our discussion with the 0-dimensional Dowker persistence barcode presented
in Figure 4.6. Recall that Javaplex produces representative 0-cycles for each persistence
interval in this 0-dimensional barcode. Typically, these representative 0-cycles are given as
the boundary of a 1-simplex, so that we know which pair of sectors merges together into
a 1-simplex and converts the 0-cycle into a 0-boundary. We interpret the representative
sectors of the shortest 0-dimensional persistence intervals as pairs of sectors where one is
strongly dependent on the other. To justify this interpretation, we observe that the right endpoint of a 0-dimensional persistence interval corresponds to a resolution δ at which two industries e, e′ find a common δ-sink. Typically this sink is one of e or e′, although it is allowed to be a third industry e″. We suggest the following interpretation for being a common δ-sink: of all the commodities of type e″ produced by industries other than e″, over (1 − δ) · 100% is produced by each of the industries e, e′. Note that for δ < 0.50, this interpretation suggests that e″ is actually e′ (or e), and that over 50% of the commodities of type e′ (resp. e) produced by external industries are actually produced by e (resp. e′).
In Table 4.1 we list some sample representative 0-cycles produced by Javaplex. Note
that these cycles are representatives for bars that we can actually see on the 0-dimensional
barcode in Figure 4.6. We do not focus any more on finding dependent sectors and direct
investment relations, and point the reader to [20] where this topic has been covered in great
detail under the lens of hierarchical clustering (albeit with a slightly different dataset, the
“use” table instead of the make table). In the following subsection, we study some of the
persistent 1-dimensional intervals shown in Figure 4.6, specifically the two longest bars
that we have colored in red.
Figure 4.6: 0 and 1-dimensional Dowker persistence barcodes for US economic sector data,
obtained by the process described in §4.5.1. The long 1-dimensional persistence intervals
that are colored in red are examined in §4.5.1 and Figures 4.9 and 4.12.
Table 4.1: The first two columns contain sample 0-dimensional persistence intervals, as
produced by Javaplex. We have added the labels in column 3, and the common δ-sinks in
column 4.
Patterns of investment
Examining the representative cycles of the persistent 1-dimensional intervals in Figure
4.6 allows us to discover patterns of investment that would not otherwise be apparent from
the raw data. Javaplex produces representative nodes for each nontrivial persistence inter-
val, so we were able to directly obtain the industrial sectors involved in each cycle. Note
that for a persistence interval [δ0 , δ1 ), Javaplex produces a representative cycle that emerges
at resolution δ0 . As more 1-simplices enter the Dowker filtration at greater resolutions, the
homology equivalence class of this cycle may coincide with that of a shorter cycle, until
finally it becomes the trivial class at δ1 . We have illustrated some of the representative cy-
cles produced by Javaplex in Figures 4.9 and 4.12. To facilitate our analysis, we have also
added arrows in the figures according to the following rule: for each representative cycle at resolution δ, there is an arrow e_i → e_j if and only if ω_E(e_i, e_j) ≤ δ, i.e. if and only if e_j is a sink for the simplex [e_i, e_j] in $D^{si}_{\delta,E}$.
Consider the 1-dimensional persistence interval [0.75, 0.95), colored in red in Figure
4.6. The industries involved in a representative cycle for this interval at δ = 0.75 are:
Wood products (WO), Primary metals (PM), Fabricated metal products (FM), Petroleum
and coal products (PC), Chemical products (CH), and Plastics and rubber products (PL).
The entire cycle is illustrated in Figure 4.9. Starting at the bottom right, note that PC has an
arrow going towards CH, suggesting the dependence of the chemical industry on petroleum
and coal products. This makes sense because petroleum and coal products are the major
organic components used by chemical plants. Chemical products are a necessary ingredient
for the synthesis of plastics, which could explain the arrow (CH→PL). Plastic products are
commonly used to produce wood-plastic composites, which are low-cost alternatives to
products made entirely out of wood. This can explain the arrow PL→WO. Next consider
the arrows FM→WO and FM→PM. As a possible interpretation of these arrows, note that
fabricated metal frames and other components are frequently used in wood products, and
fabricated metal structures are used in the extraction of primary metals from ores. Also
note that the metal extraction industry is one of the largest consumers of energy. Since
energy is mostly produced from petroleum and coal products, this is a possible reason for
the arrow PC→PM.
We now consider the 1-dimensional persistence interval [0.81, 1) colored in red in Fig-
ure 4.6. The sectors involved in a representative cycle for this interval at δ = 0.81 are:
Petroleum and coal products (PC), Oil and gas (OG), Waste management (WM), State and
local general government (SLGG), Apparel and leather and allied products (AP), Textile
mills (TE), Plastics and rubber products (PL), and Chemical products (CH). The pattern of
investment in this cycle is illustrated in Figure 4.12, at resolutions δ = 0.81 and δ = 0.99.
We have already provided interpretations for the arrows OG→PC→CH→PL above. Con-
sider the arrow PL→TE. This likely reflects the widespread use of polyester and polyester
blends in production of fabrics. These fabrics are then cut and sewn to manufacture cloth-
ing, hence the arrow TE→AP. Also consider the arrow WM→OG: this suggests the role
of waste management services in the oil and gas industry, which makes sense because the
Figure 4.9: Here we illustrate the representative nodes for one of the 1-dimensional persistence intervals in Figure 4.6. This 1-cycle [PC,CH] + [CH,PL] + [PL,WO] − [WO,FM] + [FM,PM] − [PM,PC] persists on the interval [0.75, 0.95). At δ = 0.94, we observe that this 1-cycle has joined the homology equivalence class of the shorter 1-cycle illustrated on the right. Unidirectional arrows represent an asymmetric flow of investment. A full description of the meaning of each arrow is provided in §4.5.1.
waste management industry has a significant role in the treatment and disposal of hazardous
materials produced in the oil and gas industry. Finally, note that the arrows SLGG→WM
and SLGG→AP likely suggest the dependence of the waste management and apparel in-
dustries on state and local government support.
We note that there are numerous other 1-dimensional persistence intervals in Figure 4.6
that could be worth exploring, especially for economists who regularly analyze the make
tables in input-output accounts and are better prepared to interpret this type of data. The
results obtained in our analysis above suggest that viewing these tables as asymmetric net-
works and then computing their persistence diagrams is a reasonable method for uncovering
their hidden attributes.
Figure 4.12: Representative nodes for another 1-dimensional persistence interval in Figure 4.6. A full description of this cycle is provided in §4.5.1.
in our work. We begin with a set $S = \{s_1, \ldots, s_{52}\}$ of these 52 regions and a function $m : S \times S \to \mathbb{Z}_+$, where $m(s_i, s_j)$ represents the number of migrants moving from $s_i$ to $s_j$. We define a network $(S, \omega_S)$, with $\omega_S$ given by:
$$\omega_S(s_i, s_j) = f\!\left(\frac{m(s_i, s_j)}{\sum_{s_k \in S,\, k \neq j} m(s_k, s_j)}\right) \ \text{if } s_i \neq s_j, \qquad \omega_S(s_i, s_i) = 0, \qquad s_i, s_j \in S,$$
where $f(x) = 1 - x$. The purpose of $f$ is to convert similarity data into dissimilarity data. A large value of $\omega_S(s_i, s_j)$ means that few people move from $s_i$ to $s_j$. The diagonal is removed to ensure that we focus on migration patterns, not on the base population of each state.
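As a sketch, the construction above reads as follows in numpy (m is the 52 × 52 matrix of migrant counts, and influx(s_j) is the off-diagonal column sum):

import numpy as np

def migration_network(m):
    """Sketch of the network (S, omega_S) defined above."""
    m = np.asarray(m, dtype=float).copy()
    np.fill_diagonal(m, 0.0)          # ignore within-state moves
    influx = m.sum(axis=0)            # influx(s_j) is the j-th column sum
    omega = 1.0 - m / np.where(influx == 0.0, 1.0, influx)
    np.fill_diagonal(omega, 0.0)      # omega_S(s_i, s_i) = 0 by definition
    return omega, influx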
Interpretation of a δ-sink. Next we probe the meaning of a δ-sink in the migration context. For simplicity, we first discuss δ-sinks of 1-simplices. Let [s, s′] be a 1-simplex for s, s′ ∈ S, and let s″ ∈ S be a δ-sink for [s, s′]. Then, by unwrapping the definitions above, we see that s″ receives at least (1 − δ)(influx(s″)) migrants from each of s, s′. This suggests the following physical interpretation of the 1-simplex [s, s′]: in 2010, there were at least (1 − δ)(influx(s″)) residents in each of s and s′ who had a common goal of moving to s″ in 2011. There could be a variety of reasons for interregional migration (people might be moving for employment purposes, for better climate, and so on), but the important point here is that we have a quantitative estimate of residents of s and s′ with similar relocation preferences. On the other hand, letting r, r′ ∈ S be states such that [r, r′] ∉ $D^{si}_\delta$, the lack of a common δ-sink suggests that residents of r and r′ might have significantly different migration preferences. Following this line of thought, we hypothesize the following:

Residents of states that span a 1-simplex in $D^{si}_\delta$ are more similar to each other (in terms of migrational preferences) than residents of states that do not span a 1-simplex.
More generally, when n states form an n-simplex in $D^{si}_\delta$, we say that they exhibit coherence of preference at resolution δ. The idea is that the residents of these n states have a mutual preference for a particular attractor state, which acts as a δ-sink. Conversely, a collection of n states that do not form an n-simplex are said to exhibit incoherence of preference at resolution δ: residents of these states do not agree strongly enough on a common destination for migration.
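Checking these conditions directly on the weight matrix is straightforward; the following sketch (with hypothetical helper names) tests whether a tuple of states spans a simplex in the Dowker sink complex at resolution δ, and lists its δ-sinks.

import numpy as np

def delta_sinks(omega, simplex, delta):
    """Indices r with omega[s, r] <= delta for every vertex s of the simplex;
    the simplex belongs to the Dowker sink complex iff this set is nonempty."""
    rows = omega[np.asarray(simplex), :]
    return np.flatnonzero(rows.max(axis=0) <= delta)

def spans_simplex(omega, simplex, delta):
    return delta_sinks(omega, simplex, delta).size > 0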
Interpretation of a connected component. Now we try to understand the physical interpretation of a connected component in $D^{si}_\delta$. Recall that two states s, s′ ∈ S belong to a connected component in $D^{si}_\delta$ if and only if there exist s_1 = s, . . . , s_n = s′ ∈ S such that:
$$[s_i, s_{i+1}] \in D^{si}_\delta \quad \text{for each } 1 \leq i \leq n-1. \tag{4.2}$$
Let s_1, . . . , s_n ∈ S be such that Condition 4.2 above is satisfied. Note that this implies that there exists a δ-sink r_i for each [s_i, s_{i+1}], for 1 ≤ i ≤ n − 1. Let r_1, . . . , r_{n−1} be δ-sinks for [s_1, s_2], . . . , [s_{n−1}, s_n]. We can further verify, using the fact that ω_S vanishes on the diagonal, that the sinks r_1, . . . , r_{n−1} themselves belong to this connected component:

The vertex set of any connected component of $D^{si}_\delta$ contains a special subset of "attractor" or "sink" states at resolution δ.

So in 2010, for any i ∈ {1, . . . , n − 1}, there were at least (1 − δ)(influx(r_i)) people in s_i and in s_{i+1} who had a common goal of moving to r_i in 2011. Moreover, for any i, j ∈ {1, . . . , n}, i ≠ j, there were at least min_{1≤i≤n−1} (1 − δ)(influx(r_i)) people in each
of si and sj in 2010 who migrated elsewhere in 2011. From a different perspective, we are
able to distinguish all the states in a connected component that are significantly attractive
to migrants (the sinks/receivers), and we have quantitative estimates on the migrant flow
within this connected component into its sink/receiver states.
Consider the special case where each state in a connected component of n states, written as s_1, s_2, . . . , s_n, loses (1 − δ)(influx(r)) residents to a single state r ∈ S. By the preceding observations, r belongs to this connected component, and we can write r = s_1 (relabeling as needed). Then we observe that the n states {r, s_2, . . . , s_n} form an n-simplex, with r as a common sink. In this case, we have ω_S(s_i, r) ≤ δ for each 2 ≤ i ≤ n. Also note that if we write
$$v_n := \sum_{i=2}^{n} [s_i, s_{i+1}] \quad\text{and}\quad \gamma_n := \sum_{i=2}^{n} [s_i, s_{i+1}, r], \qquad \text{with } s_{n+1} := s_2,$$
then we can verify that $\partial_1^\delta(v_n) = 0$ and $\partial_2^\delta(\gamma_n) = v_n$. In other words, we obtain a 1-cycle that is automatically the boundary of a 2-chain, i.e. is trivial upon passing to homology.
In general, a connected component in $D^{si}_\delta$ might contain chains of states that form loops, i.e. states s_1, s_2, . . . , s_n such that:
$$[s_i, s_{i+1}] \in D^{si}_\delta \ \text{ for each } 1 \leq i \leq n-1, \quad\text{and}\quad [s_n, s_1] \in D^{si}_\delta. \tag{4.3}$$
Note that Condition 4.3 is of course more stringent than Condition 4.2. By writing such a loop in the form of $v_n^\delta$ above, we can verify that it forms a 1-cycle. Thus a connected component containing a loop will be detected in a 1-dimensional Dowker persistence diagram, unless the resolution at which the 1-cycle appears coincides with that at which it becomes a 1-boundary.
Interpretation of 1-cycles. The preceding discussion shows that it is necessary to determine not just 1-cycles, but also the 1-boundaries that they eventually form. Any 1-boundary arises as the image of $\partial_2^\delta$ applied to a linear combination of 2-simplices in $D^{si}_\delta$. Note that in this context, each 2-simplex is a triple of states [s_i, s_j, s_k] with a common sink r to which each of s_i, s_j, s_k has lost (1 − δ)(influx(r)) residents between 2010 and 2011. Alternatively, at least (1 − δ)(influx(r)) residents from each of s_i, s_j, s_k had a common preference of moving to r between 2010 and 2011. Next let {[s_1, s′_1, s″_1], [s_2, s′_2, s″_2], . . . , [s_n, s′_n, s″_n]} be a collection of 2-simplices in $D^{si}_\delta$, with sinks {r_1, . . . , r_n}. One way to consolidate the information they contain is to simply write them as a sum:
$$\tau_n^\delta := [s_1, s_1', s_1''] + [s_2, s_2', s_2''] + \cdots + [s_n, s_n', s_n''] \in C_2^\delta.$$
At this point we have a list of triples of states, and for each triple we have a quantitative estimate on the number of residents who have a preferred state for migration in common. Now we consider the following relaxation of this situation: for a fixed i ∈ {1, . . . , n} and some δ_0 < δ, it might be the case that r_i is no longer a mutual δ_0-sink for [s_i, s′_i, s″_i], or even that there is no δ_0-sink for [s_i, s′_i, s″_i]. However, there might still be δ_0-sinks u, u′, u″ for [s_i, s′_i], [s′_i, s″_i], [s″_i, s_i], respectively. In such a case, writing $z_n^\delta := \partial_2^\delta(\tau_n^\delta)$ for the associated 1-cycle, we see that $\tau_n^{\delta_0} \notin C_2^{\delta_0}$, but $z_n^{\delta_0} \in C_1^{\delta_0}$. Thus $0 \neq \langle z_n \rangle_{\delta_0} \in H_1(D^{si}_{\delta_0})$. Assuming that δ > δ_0 is the minimum resolution at which $\langle z_n \rangle_\delta = 0$, we then have a general description of the way in which persistent 1-cycles might arise.
A very special case of the preceding example occurs when we are able to choose a δ-sink r_i for each [s_i, s′_i, s″_i], i ∈ {1, . . . , n}, such that r_1 = r_2 = · · · = r_n. In this case, we say that $z_n^{\delta_0}$ becomes a 1-boundary due to a single mutual sink r_1. This situation is illustrated in Figure 4.13. Also note the interpretation of this special case: assuming that $z_n^\delta$ is a 1-boundary, we know that each of the states in the collection $\bigcup_{i=1}^n \{s_i, s_i', s_i''\}$ loses (1 − δ)(influx(r_1)) residents to r_1 between 2010 and 2011. This signals that r_1 is an especially strong attractor state.

We remark that none of the 1-cycles in the U.S. migration dataset that we analyzed exhibited the property of becoming a boundary due to a single mutual sink. However, we did find several examples of this special phenomenon in the global migration dataset studied in §4.5.3. One of these special sinks turns out to be Djibouti, which is a gateway from the Horn of Africa to the Middle East, and is both a destination and a port of transit for migrants moving between Asia and Africa.
Interpretation of barcodes in the context of migration data. Having suggested interpre-
tations of simplices, cycles, and boundaries, we now turn to the question of interpreting a
persistence barcode in the context of migration. Note that when computing persistence bar-
codes, Javaplex can return a representative cycle for each bar, with the caveat that we do not
have any control over which representative is returned. From the 1-dimensional Dowker
persistence barcode of a migration dataset, we can use the right endpoint of a bar to obtain
a 1-boundary, i.e. a list of triples of states along with quantitative estimates on how many
residents from each triple had a preferred migration destination in common. In the special
case where the 1-boundary forms due to a single mutual sink, we will have a further quan-
titative estimate on how many residents from each state in the 1-boundary migrated to the
mutual sink. The left endpoint of a bar in the 1-dimensional Dowker persistence barcode
corresponds to a special connected component with the structure of a 1-cycle. Notice that
all the connected components are listed in the 0-dimensional Dowker persistence diagram.
See §4.5.3 for some additional comments.
Interpretation of error between lower bounds and true migration. In each of Tables 4.2 and 4.3 (and Tables 4.4 and 4.5 in §4.5.3), we have provided lower bounds on migration flow between certain states, following the discussion above. More precisely, we do the following:
0-cycles Given a persistence interval [0, δ), δ ∈ R, and a representative 0-cycle, we find
the 1-simplex that converts the 0-cycle into a 0-boundary at resolution δ. We then
find a δ-sink for this 1-simplex, and estimate a lower bound on the migrant flow into
this δ-sink.
We also provide the true migration flows beside our lower bound estimates. However, in each of our analyses, we incur a certain error between our lower bound and the actual migration value. We now provide some interpretations for this error.

For the case of 0-cycles, note that all the networks we analyze are normalized to have edge weights in the interval [0, 1]. For efficiency, in order to produce a Dowker filtration, we compute $D^{si}_\delta$ for δ-values in the set
$$\texttt{delta} := \{0, 0.01, 0.02, \ldots, 1\}.$$
So whenever we have ω_S(s_i, s_j) ∉ delta for some states s_i, s_j ∈ S, the 1-simplex [s_i, s_j] is not detected until we compute $D^{si}_{\delta'}$, where δ′ is the smallest element in delta greater than ω_S(s_i, s_j). If s_j is a δ-sink in this case, then our predicted lower bound on the migration
flow si → sj will differ by up to (0.01)(influx(sj )) from the true value. The situation
described here best explains the error values in Table 4.4.
For the case of 1-cycles, we will study a simple motivating example. Suppose we have the following 1-simplices:
$$[s_1, s_2], [s_2, s_3], \ldots, [s_{n-1}, s_n], [s_n, s_1].$$
For each i ∈ {1, . . . , n}, let δ_i ∈ R denote the resolution at which the simplex [s_i, s_{i+1 (mod n)}] emerges.
emerges. For simplicity, suppose we have δ1 ≤ δ2 ≤ δ3 ≤ . . . ≤ δn , and also that s2 is
a δn -sink for [s1 , s2 ]. For our lower bound, we estimate that the migrant flow s1 → s2 is
at least (1 − δn )(influx(s2 )). A better lower bound would be (1 − δ1 )(influx(s2 )), but the
only δ-value that Javaplex gives us access to is δn . Because δ1 could be much smaller than
δn , it might be the case that our lower bound is much smaller than the true migration.
The preceding discussion suggests the following inference: if a 1-simplex [si , si+1 ]
exhibits a large error between the true migration into a δn -sink and the predicted lower
bound, then [si , si+1 ] likely emerged at a resolution proportionately smaller than δn . Thus
we can interpret the states si , si+1 as exhibiting relatively strong coherence of preference.
Conversely, 1-simplices that exhibit a smaller error likely emerged at a resolution closer to
δn —the states forming such 1-simplices exhibited incoherence of preference for a greater
range of resolutions. Note that even though we made some simplifying assumptions in our
choice of a 1-cycle, a similar analysis can be done for any 1-cycle.
Figure 4.13: An example of a 1-cycle becoming a 1-boundary due to a single mutual sink
r, as described in the interpretation of 1-cycles in §4.5.2. The figure on the left shows a
connected component of Dsiδ,S , consisting of [s1 , s2 ], [s2 , s3 ], [s3 , s4 ]. The arrows are meant
to suggest that r will eventually become a δ-sink for each of these 1-simplices, for some
large enough δ. The progression of these simplices for increasing values of δ is shown
from left to right. In the leftmost figure, r is not a δ-sink for any of the three 1-simplices.
Note that r has become a δ-sink for [s3 , s4 ] in the middle figure. Finally, in the rightmost
figure, r has become a δ-sink for each of the three 1-simplices.
Analysis of OH-KY-GA-FL cycle

1-simplex | 0.90-sinks | Estimated lower bound on migration | True migration
[FL,OH] | WV | (1 − 0.90)(influx(WV)) = 4597 | m(FL, WV) = 4964; m(OH, WV) = 7548
[FL,GA] | AL | (1 − 0.90)(influx(AL)) = 10684 | m(FL, AL) = 12635; m(GA, AL) = 18799
[FL,GA] | GA | (1 − 0.90)(influx(GA)) = 24913 | m(FL, GA) = 38658
[KY,OH] | KY | (1 − 0.90)(influx(KY)) = 9925 | m(OH, KY) = 12744
[GA,KY] | TN | (1 − 0.90)(influx(TN)) = 15446 | m(GA, TN) = 16898; m(KY, TN) = 16852

2-simplex | 0.94-sinks | Estimated lower bound on migration | True migration
[FL,OH,KY] | IN | (1 − 0.94)(influx(IN)) = 8640 | m(FL, IN) = 11472; m(OH, IN) = 11588; m(KY, IN) = 11071
[FL,OH,KY] | OH | (1 − 0.94)(influx(OH)) = 12363 | m(FL, OH) = 18191; m(KY, OH) = 19617
[FL,GA,KY] | TN | (1 − 0.94)(influx(TN)) = 9268 | m(FL, TN) = 10451; m(GA, TN) = 16898; m(KY, TN) = 16852

Table 4.2: Quantitative estimates on migrant flow, following the interpretation presented in §4.5.2. In each row, we list a simplex of the form [s_i, s_j] (resp. [s_i, s_j, s_l] for 2-simplices) and any possible δ-sinks s_k. We hypothesize that s_k receives at least (1 − δ)(influx(s_k)) migrants from each of s_i, s_j (resp. s_i, s_j, s_l); these lower bounds are presented in the third column. The fourth column contains the true migration numbers. Notice that the [FL,GA] simplex appears to show the greatest error between the lower bound and the true migration. Following the interpretation suggested earlier in §4.5.2, this indicates that Florida and Georgia appear to have strong coherence of preference, relative to the other pairs of states spanning 1-simplices in this table.
Analysis of WA-OR-CA-AZ-UT-ID cycle

1-simplex | 0.87-sinks | Estimated lower bound on migration | True migration
[CA,OR] | OR | (1 − 0.87)(influx(OR)) = 14273 | m(CA, OR) = 18165
[OR,WA] | OR | (1 − 0.87)(influx(OR)) = 14273 | m(WA, OR) = 29168
[AZ,UT] | UT | (1 − 0.87)(influx(UT)) = 9517 | m(AZ, UT) = 10577
[ID,UT] | ID | (1 − 0.87)(influx(ID)) = 7519 | m(UT, ID) = 7538
[ID,WA] | ID | (1 − 0.87)(influx(ID)) = 7519 | m(WA, ID) = 10895
[AZ,CA] | AZ | (1 − 0.87)(influx(AZ)) = 27566 | m(CA, AZ) = 35650

2-simplex | 0.92-sinks | Estimated lower bound on migration | True migration
[OR,WA,UT] | ID | (1 − 0.92)(influx(ID)) = 4627 | m(OR, ID) = 6236; m(WA, ID) = 10895; m(UT, ID) = 7538
[AZ,CA,ID] | UT | (1 − 0.92)(influx(UT)) = 5856 | m(AZ, UT) = 10577; m(CA, UT) = 8944; m(ID, UT) = 6059

Table 4.3: Quantitative estimates on migrant flow, following the interpretation presented in §4.5.2. The entries in this table follow the same rules as those of Table 4.2. Notice that the [OR,WA] and [AZ,CA] simplices show the greatest error between the lower bound and the true migration. Following the interpretation in §4.5.2, this suggests that these two pairs of states exhibit stronger coherence of preference than the other pairs of states forming 1-simplices in this table.
Figure 4.14: 0 and 1-dimensional Dowker persistence barcodes for U.S. migration data
Figure 4.15: U.S. map with representative cycles of the persistence intervals that were
highlighted in Figure 4.14. The cycle on the left appears at δ1 = 0.87, and the cycle on the
right appears at δ2 = 0.90. The red lines indicate the 1-simplices that participate in each
cycle. Each red line is decorated with an arrowhead si → sj if and only if sj is a sink for
the simplex [si , sj ]. The blue arrows point towards all possible alternative δ-sinks, and are
interpreted as follows: Tennessee is a 0.90-sink for the Kentucky-Georgia simplex, West
Virginia is a 0.90-sink for the Ohio-Florida simplex, and Alabama is a 0.90-sink for the
Georgia-Florida simplex.
To probe the sociological aspects of a 1-cycle, we recall our hypothesis that residents
of states that are not connected by a 1-simplex are less similar to each other than residents
of states that are connected as such. The West Coast cycle given above seems to follow
this hypothesis: It seems reasonable to think that residents of California would be quite
different from residents of Idaho or Utah, and possibly quite similar to those of Oregon.
Similarly, one would expect a large group of people from Ohio and Kentucky to be quite
similar, especially with Cincinnati being adjacent to the state border with Kentucky. The
Ohio-Florida simplex might be harder to justify, but given the very small population of
their mutual sink West Virginia, it might be the case that the similarity between Ohio and
Florida is being overrepresented.
Our analysis shows that by using Dowker persistence diagrams for exploratory analysis
of migration data, we can obtain meaningful lower bounds on the number of residents from
different states who share a common migration destination. An interesting extension of this
experiment would be to study the persistence barcodes of migration networks over a longer
range of years than the 2010-2011 range that we have used here: ideally, we would be able
to detect changing trends in migration from changes in the lower bounds that we obtain.
The network on the set of world regions is defined analogously to the U.S. migration network of §4.5.2, with f(x) = 1 − x. The 0 and 1-dimensional Dowker persistence barcodes that we obtain
from this network are provided in Figure 4.16. Some of the 0 and 1-dimensional persistence
intervals are tabulated in Tables 4.4 and 4.5.
We interpret connected components, simplices, cycles and boundaries for the Dowker
sink complexes constructed from the global migration data just as we did for the U.S.
migration data in §4.5.2.
We draw the reader’s attention to two interesting features in the persistence barcodes
of the global migration dataset. First, the 0-dimensional barcode contains many short bars
(e.g. bars of length less than 0.2). In contrast, the shortest bars in the 0-dimensional barcode
for the U.S. migration data had length greater than 0.65. In our interpretation, which we
explain more carefully below, this observation suggests that migration patterns in the U.S.
are relatively uniform, whereas global migration patterns can be more skewed. Second,
because there are many more 1-dimensional persistence intervals, it is easier to find a 1-
cycle that becomes a boundary due to a single mutual sink, i.e. due to an especially strong
“attractor” region.
For the first observation, consider a 0-dimensional persistence interval [0, δ), where δ
is assumed to be small. Formally, this interval represents the persistence of a 0-cycle that
emerges at resolution 0, and becomes a 0-boundary at resolution δ. One can further verify
the following: this interval represents the resolutions for which a 0-simplex [ci ], ci ∈ C
remains disconnected from other 0-simplices, and δ is a resolution at which ci forms a 1-
simplex with some cj ∈ C. Recall from §4.5.2 that this means the following: either there
exists a region ck 6∈ {ci , cj } which receives at least (1 − δ)(influx(ck )) migrants from each
of ci and cj , or ck = ci (or cj ) and ck receives over (1 − δ)(influx(ck )) migrants from cj
(resp. ci ). The first case cannot happen when δ < 0.5, because this would mean that ck
receives strictly over 50% of its migrant influx from each of ci and cj . Thus when δ < 0.5,
we know that ck = ci (or ck = cj ), and ck receives over 50% of its migrant influx from cj
(resp. ci ). For very small δ, we then know that most of the migrants into ck arrive from cj
(resp. ci ).
For convenience, let us assume that δ < 0.2, that ck = ci , and that ck receives over
80% of its migrant influx from cj . This might occur for a variety of reasons, some of which
are: (1) there might be war or political strife in cj and ck might be letting in refugees, (2)
ck might have achieved independence or administrative autonomy and some residents from
cj might be flocking to ck because they perceive it to be their homeland, and (3) cj might
be overwhelmingly populous in comparison to other neighboring regions of ck , so that the
contribution of cj to the migrant influx of ck dominates that of other regions.
Notice that neither of the first two reasons listed above is valid in the case of U.S. migration. The third reason is valid in the case of a few states, but nevertheless, the shortest 0-dimensional persistence interval in the U.S. migration dataset has length greater than
0.65. In other words, the minimal resolution at which a 1-simplex forms in the U.S. migra-
tion data is 0.65. This in turn means that there is no state in the U.S. which receives over
35% of its migrant influx from any single other state. Based on this reasoning, we interpret
the migration pattern of the U.S. as “diffuse” or “uniform”, and that of the world as a whole
as “skewed” or “biased”. This makes intuitive sense, because despite the heterogeneity of
the U.S. and differences in state laws and demographics, any resident can easily migrate
to any other state of their choice while maintaining similar legal rights, salary, and living
standards.
In Table 4.4, we list some short 0-dimensional persistence intervals for the global mi-
gration dataset. For each interval [0, δ), we also include the 1-simplex that emerges at δ,
the δ-sink associated to this 1-simplex, and our lower bound on the migrant influx into
this sink. Note that the error between the true migration numbers and our predicted lower
bounds is explained in §4.5.2. Also notice that many of the migration patterns provided
in Table 4.4 seem to fit with the suggestions we made earlier: (1) political turmoil in the
West Bank and Gaza (especially following the Gulf War) prompted many Palestinians to
enter Syria, (2) Greenland and Macao are both autonomous regions of Denmark and China,
respectively, and (3) India’s population far outstrips that of its neighbors, and its migrant
flow plays a dominating role in the migrant influx of its neighboring states.
For the second observation, recall from our discussion in §4.5.2 that whenever we have
a 1-cycle involving regions c1 , . . . , cn that becomes a 1-boundary at resolution δ ≥ 0 due
to a single mutual sink cn+1 , we know that cn+1 receives at least (1 − δ)(influx(cn+1 ))
migrants from each of c1 , . . . , cn . As such, cn+1 can be perceived to be an especially strong
attractor region. In Table 4.5 we list some 1-cycles persisting on an interval [δ0 , δ1 ), their
mutual δ1 -sinks, our lower bound on migration flow, and the true migration numbers. The
reader is again encouraged to check that the true migration agrees with the lower bounds
that we predicted. We remark that the first row of this table contains a notable example of a
strong attractor region: Djibouti. Djibouti is geographically located at a crossroads of Asia
and Africa, and is a major commercial hub due to its access to the Red Sea and the Indian
Ocean. As such, one would expect it to be a destination for many migrants in the Horn of
Africa, as well as a transit point for migrants moving between Africa and the Middle East.
The Oceania cycle listed in the fourth row of Table 4.5 can likely be discarded; the very small migrant influx of Samoa indicates that its attractiveness as a sink state is being overrepresented. The second row lists China as a strong attractor, which is reasonable given its economic growth between 1990 and 2000, and as a consequence, its attractiveness to foreign workers from neighboring countries. The third row lists Vietnam as a strong sink, and one reason could be that in the 1990s, many refugees who had been displaced due to the Vietnam War were returning to their homeland.
We also illustrate the emergence at δ0 for some of these cycles in Figure 4.17.
Figure 4.16: Dowker persistence barcodes for global migration dataset.
Table 4.4: Short 0-dimensional Dowker persistence intervals capture regions which receive
most of their incoming migrants from a single source. Each interval [0, δ) corresponds to
a 0-simplex which becomes subsumed into a 1-simplex at resolution δ. We list these 1-simplices in the second column, and their δ-sinks in the third column. The definition of
a δ-sink enables us to produce a lower bound on the migration into each sink, which we
provide in the fourth column. We also list the true migration numbers in the fifth column,
and the reader can consult §4.5.2 for our explanation of the error between the true migration
and the lower bounds on migration.
Figure 4.17: Top: Two cycles corresponding to the left endpoints of the (Djibouti-Somalia-Uganda-Eritrea-Ethiopia) and (Kiribati-Papua New Guinea-Australia-United Kingdom-Tuvalu) persistence intervals listed in Table 4.5. The δ values are 0.73 and 0.77, respectively. Bottom: Two cycles corresponding to the left endpoints of the (China-Thailand-Philippines) and (China-Indonesia-Malaysia) persistence intervals listed in Table 4.5. The δ values are 0.77 and 0.75, respectively. Meaning of arrows: In each cycle, an arrow s_i → s_j means that ω_S(s_i, s_j) ≤ δ, i.e. that s_j is a sink for the simplex [s_i, s_j]. We can verify separately that for δ = 0.77, the Kiribati-Papua New Guinea simplex has the Solomon Islands as a δ-sink, and that the Philippines-Thailand simplex has Taiwan as a δ-sink. Similarly, the China-Malaysia simplex has Singapore as a δ-sink, for δ = 0.75.
1-cycles with single mutual sink in global migration data

Interval [δ0, δ1) | Regions involved | Mutual δ1-sink(s) | Lower bound on migration | True migration
[0.73, 0.98) | Djibouti—Ethiopia—Eritrea—Uganda—Somalia | Djibouti | (0.02)(influx(DJI)) = 1738 | m(ERI, DJI) = 3259; m(ETH, DJI) = 25437; m(SOM, DJI) = 41968; m(UGA, DJI) = 1811
[0.77, 0.94) | China—Thailand—Philippines | China | (0.06)(influx(CHN)) = 12858 | m(THA, CHN) = 14829; m(PHL, CHN) = 17828
[0.75, 0.89) | China—Indonesia—Malaysia | Vietnam | (0.11)(influx(VNM)) = 4465 | m(CHN, VNM) = 8940; m(IDN, VNM) = 10529; m(MYS, VNM) = 4813
[0.86, 0.93) | American Samoa—New Zealand—Samoa—Australia | Samoa | (0.07)(influx(WSM)) = 397 | m(ASM, WSM) = 1920; m(NZL, WSM) = 1803; m(AUS, WSM) = 404
[0.77, 0.92) | Kiribati—Papua New Guinea—Australia—United Kingdom—Tuvalu | Not applicable | Not applicable | Not applicable

Table 4.5: Representative 1-cycles for several intervals in the 1-dimensional persistence barcode for the global migration dataset. Each of the first four cycles has the special property that it becomes a boundary due to a single sink at the right endpoint of its associated persistence interval. This permits us to obtain a lower bound on the migration into this sink from each of the regions in the cycle. The last row contains a cycle without this special property. The font colors of the persistence intervals correspond to the colors of the highlighted 1-dimensional bars in Figure 4.16.
Bibliography
[2] Emmanuel Abbe. Community detection and stochastic block models: recent devel-
opments. arXiv preprint arXiv:1703.10146, 2017.
[3] Michał Adamaszek and Henry Adams. The Vietoris–Rips complexes of a circle.
Pacific Journal of Mathematics, 290(1):1–40, 2017.
[4] Michal Adamaszek, Henry Adams, Florian Frick, Chris Peterson, and Corrine
Previte-Johnson. Nerve complexes of circular arcs. Discrete & Computational Ge-
ometry, 56(2):251–273, 2016.
[5] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric
spaces and in the space of probability measures. Springer Science & Business Me-
dia, 2008.
[6] David Bao, S-S Chern, and Zhongmin Shen. An introduction to Riemann-Finsler
geometry, volume 200. Springer Science & Business Media, 2012.
[7] Jonathan A Barmak. Algebraic topology of finite topological spaces and applica-
tions, volume 2032. Springer, 2011.
[8] Ulrich Bauer, Michael Kerber, and Jan Reininghaus. Clear and compress: Comput-
ing persistent homology in chunks. In Topological Methods in Data Analysis and
Visualization III, pages 103–117. Springer, 2014.
[9] Ulrich Bauer and Michael Lesnick. Induced matchings of barcodes and the alge-
braic stability of persistence. In Proceedings of the thirtieth annual symposium on
Computational geometry, page 355. ACM, 2014.
[10] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel
Peyré. Iterative Bregman projections for regularized transportation problems. SIAM
Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
211
[11] Anders Björner. Topological methods. Handbook of combinatorics, 2:1819–1872,
1995.
[12] Anders Björner, Bernhard Korte, and László Lovász. Homotopy properties of gree-
doids. Advances in Applied Mathematics, 6(4):447–494, 1985.
[13] Vladimir I Bogachev. Measure theory, volume 2. Springer Science & Business
Media, 2007.
[14] Vladimir I Bogachev. Measure theory, volume 1. Springer Science & Business
Media, 2007.
[15] Mireille Boutin and Gregor Kemper. Lossless representation of graphs using distri-
butions. arXiv preprint arXiv:0710.1870, 2007.
[16] Martin R Bridson and André Haefliger. Metric spaces of non-positive curvature,
volume 319. Springer Science & Business Media, 2011.
[17] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A Course in Metric Geometry,
volume 33 of AMS Graduate Studies in Math. American Mathematical Society,
2001.
[18] Rainer E Burkard, Mauro Dell’Amico, and Silvano Martello. Assignment Problems.
SIAM, 2009.
[19] Gunnar Carlsson and Vin De Silva. Zigzag persistence. Foundations of computa-
tional mathematics, 10(4):367–405, 2010.
[20] Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra. Ax-
iomatic construction of hierarchical clustering in asymmetric networks. In Acoustics,
Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on,
pages 5219–5223. IEEE, 2013.
[21] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J Guibas. Persis-
tence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–
187, 2005.
[22] Gunnar E. Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra.
Hierarchical quasi-clustering methods for asymmetric networks. In Proceedings of
the 31th International Conference on Machine Learning, ICML 2014, 2014.
212
[24] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas, and Steve Y
Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the
twenty-fifth annual symposium on Computational geometry, pages 237–246. ACM,
2009.
[25] Frédéric Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo Mémoli, and
Steve Y Oudot. Gromov-hausdorff stable signatures for shapes using persistence.
In Computer Graphics Forum, volume 28, pages 1393–1403. Wiley Online Library,
2009.
[26] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and
stability of persistence modules. Springer International Publishing, 2016.
[27] Frédéric Chazal, Vin De Silva, and Steve Oudot. Persistence stability for geometric
complexes. Geometriae Dedicata, 173(1):193–214, 2014.
[29] Chao Chen and Michael Kerber. Persistent homology computation with a twist.
In Proceedings 27th European Workshop on Computational Geometry, volume 11,
2011.
[31] Lénaı̈c Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard.
Scaling algorithms for unbalanced optimal transport problems. Math. Comp.,
87(314):2563–2609, 2018.
[32] Samir Chowdhury, Bowen Dai, and Facundo Mémoli. Topology of stimulus space
via directed network persistent homology. Cosyne Abstracts 2017.
[33] Samir Chowdhury, Bowen Dai, and Facundo Mémoli. The importance of forgetting:
Limiting memory improves recovery of topological characteristics from neural data.
PloS one, 13(9):e0202561, 2018.
[35] Samir Chowdhury and Facundo Mémoli. Distances and isomorphism between networks and the stability of network invariants. arXiv preprint arXiv:1708.04727, 2017.
[36] Samir Chowdhury and Facundo Mémoli. Explicit geodesics in Gromov-Hausdorff
space. Electronic Research Announcements in Mathematical Sciences, 2018.
[37] Samir Chowdhury and Facundo Mémoli. A functorial Dowker theorem and persistent homology of asymmetric networks. Journal of Applied and Computational Topology, 2(1–2):115–175, 2018.
[38] Samir Chowdhury and Facundo Mémoli. The Gromov-Wasserstein distance between networks and stable network invariants. arXiv preprint arXiv:1808.04337, 2018.
[39] Samir Chowdhury and Facundo Mémoli. The metric space of networks. arXiv
preprint arXiv:1804.02820, 2018.
[40] Samir Chowdhury and Facundo Mémoli. Persistent path homology of directed networks. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1152–1169. SIAM, 2018.
[41] Carina Curto and Vladimir Itskov. Cell groups reveal structure of stimulus space.
PLoS Computational Biology, 4(10), 2008.
[43] Yuri Dabaghian, Facundo Mémoli, L Frank, and Gunnar Carlsson. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology, 8(8), 2012.
[44] Vin De Silva and Gunnar Carlsson. Topological estimation using witness complexes. In Proceedings of the Symposium on Point-Based Graphics, pages 157–166, 2004.
[45] Tamal K Dey, Facundo Mémoli, and Yusu Wang. Multiscale mapper: Topological summarization via codomain covers. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 997–1013. SIAM, 2016.
[48] Herbert Edelsbrunner, Grzegorz Jabłoński, and Marian Mrozek. The persistent homology of a self-map. Foundations of Computational Mathematics, 15(5):1213–1244, 2015.
[49] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533, 2002.
[50] Herbert Edelsbrunner and Dmitriy Morozov. Persistent homology: theory and practice. 2014.
[52] Alon Efrat, Alon Itai, and Matthew J Katz. Geometry helps in bottleneck matching
and related problems. Algorithmica, 31(1):1–28, 2001.
[55] M Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo, 22(1):1–72, 1906.
[56] Patrizio Frosini. Measuring shapes by size functions. In Intelligent Robots and Computer Vision X: Algorithms and Techniques, pages 122–133. International Society for Optics and Photonics, 1992.
[57] Fred Galvin and Samuel Shore. Completeness in semimetric spaces. Pacific Journal
of Mathematics, 113(1):67–75, 1984.
[58] Fred Galvin and Samuel Shore. Distance functions and topologies. The American
Mathematical Monthly, 98(7):620–623, 1991.
[60] Chad Giusti, Eva Pastalkova, Carina Curto, and Vladimir Itskov. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences, 112(44):13455–13460, 2015.
[61] Thibaut Le Gouic and Jean-Michel Loubes. Existence and consistency of Wasserstein barycenters. arXiv preprint arXiv:1506.04153, 2015.
[62] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homologies
of path complexes and digraphs. arXiv preprint arXiv:1207.2834, 2012.
[63] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homotopy
theory for digraphs. Pure and Applied Mathematics Quarterly, 10(4), 2014.
[64] Alexander Grigor’yan, Yuri Muranov, and Shing-Tung Yau. Homologies of digraphs
and the Künneth formula. 2015.
[65] Misha Gromov. Metric structures for Riemannian and non-Riemannian spaces, volume 152 of Progress in Mathematics. Birkhäuser Boston Inc., Boston, MA, 1999.
[68] Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
[71] Danijela Horak, Slobodan Maletić, and Milan Rajković. Persistent homology of
complex networks. Journal of Statistical Mechanics: Theory and Experiment,
2009(03):P03034, 2009.
[72] Karen J Horowitz and Mark A Planting. Concepts and methods of the input-output
accounts. 2006.
[73] Alexandr Ivanov, Nadezhda Nikolaeva, and Alexey Tuzhilin. The Gromov-
Hausdorff metric on the space of compact metric spaces is strictly intrinsic. arXiv
preprint arXiv:1504.03830, 2015.
[74] Roy A Johnson. Atomic and nonatomic measures. Proceedings of the American
Mathematical Society, 25(3):650–655, 1970.
[75] Nigel J Kalton and Mikhail I Ostrovskii. Distances between Banach spaces. In
Forum Mathematicum, volume 11, pages 17–48. Walter de Gruyter, 1999.
[76] Arshi Khalid, Byung Sun Kim, Moo K Chung, Jong Chul Ye, and Daejong Jeon. Tracing the evolution of multi-scale functional networks in a mouse model of depression using persistent brain network homology. NeuroImage, 101:351–363, 2014.
[78] Dmitry Kozlov. Combinatorial algebraic topology, volume 21. Springer Science & Business Media, 2007.
[79] Janko Latschev. Vietoris-Rips complexes of metric spaces near a closed Riemannian manifold. Archiv der Mathematik, 77(6):522–528, 2001.
[80] Hyekyoung Lee, Moo K Chung, Hyejin Kang, Boong-Nyun Kim, and Dong Soo Lee. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2011, pages 302–309. Springer, 2011.
[81] Solomon Lefschetz. Algebraic topology, volume 27. American Mathematical Society, 1942.
[84] Paolo Masulli and Alessandro EP Villa. The topology of the directed clique complex
as a network invariant. SpringerPlus, 5(1):1–12, 2016.
[85] Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison.
2007.
[86] Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, pages 1–71, 2011. doi:10.1007/s10208-011-9093-5.
[91] VW Niemytzki. On the “third axiom of metric space”. Transactions of the American
Mathematical Society, 29(3):507–513, 1927.
[93] John O’Keefe and Jonathan Dostrovsky. The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1):171–175, 1971.
[94] Çaglar Özden, Christopher R Parsons, Maurice Schiff, and Terrie L Walmsley.
Where on earth is everybody? The evolution of global bilateral migration 1960–
2000. The World Bank Economic Review, 25(1):12–56, 2011.
[95] Panos M Pardalos and Henry Wolkowicz, editors. Quadratic assignment and related problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 16. American Mathematical Society, Providence, RI, 1994.
[96] Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127, 2006.
[98] Peter Petersen. Riemannian geometry, volume 171. Springer Science & Business
Media, 2006.
[99] Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Topological strata of weighted complex networks. PLoS ONE, 8(6):e66506, 2013.
[100] Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging
of kernel and distance matrices. In International Conference on Machine Learning,
pages 2664–2672, 2016.
[101] Arthur Dunn Pitcher and Edward Wilson Chittenden. On the foundations of the
calcul fonctionnel of Fréchet. Transactions of the American Mathematical Society,
19(1):66–78, 1918.
[102] Michael W Reimann, Max Nolte, Martina Scolamiero, Katharine Turner, Rodrigo Perin, Giuseppe Chindemi, Paweł Dłotko, Ran Levi, Kathryn Hess, and Henry Markram. Cliques of neurons bound into cavities provide a missing link between structure and function. Frontiers in Computational Neuroscience, 11:48, 2017.
[104] Sorin V Sabau, Kazuhiro Shibuya, and Hideo Shimada. Metric structures associated to Finsler metrics. arXiv preprint arXiv:1305.5880, 2013.
[105] Felix Schmiedl. Shape matching and mesh segmentation: mathematical analysis, algorithms and an application in automated manufacturing. PhD thesis, Technische Universität München, 2015.
[106] Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized
transport problems. arXiv preprint arXiv:1610.06519, 2016.
[107] Bernhard Schmitzer and Christoph Schnörr. Modelling convex shape priors and matching based on the Gromov-Wasserstein distance. Journal of Mathematical Imaging and Vision, 46(1):143–159, 2013.
[108] Yi-Bing Shen and Wei Zhao. Gromov pre-compactness theorems for nonreversible Finsler manifolds. Differential Geometry and its Applications, 28(5):565–581, 2010.
[109] Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.
[110] Richard Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967.
[111] Justin Solomon, Gabriel Peyré, Vladimir G Kim, and Suvrit Sra. Entropic metric
alignment for correspondence problems. ACM Transactions on Graphics (TOG),
35(4):72, 2016.
[112] Edwin H Spanier. Algebraic topology, volume 55. Springer Science & Business
Media, 1994.
[113] Sashi Mohan Srivastava. A course on Borel sets, volume 180. Springer Science &
Business Media, 2008.
[114] Lynn Arthur Steen and J Arthur Seebach. Counterexamples in topology, volume 18. Springer, 1978.
[115] Aleksandar Stojmirović and Yi-Kuo Yu. Geometric aspects of biological sequence
comparison. Journal of Computational Biology, 16(4):579–610, 2009.
[116] Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta Mathematica, 196(1):65–131, 2006.
[117] Karl-Theodor Sturm. The space of spaces: curvature bounds and gradient flows on
the space of metric measure spaces. arXiv preprint arXiv:1208.0434, 2012.
[118] Katharine Turner. Generalizations of the Rips filtration for quasi-metric spaces with
persistent homology stability results. arXiv preprint arXiv:1608.00365, 2016.
[119] Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, and Nicolas
Courty. Optimal transport for structured data. arXiv preprint arXiv:1805.09114,
2018.
[120] Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Society, 2003.
[121] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science &
Business Media, 2008.
[122] Pawel Waszkiewicz. The local triangle axiom in topology and domain theory. Ap-
plied General Topology, 4(1):47–70, 2013.
[123] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete
& Computational Geometry, 33(2):249–274, 2005.