
Metric and Topological Approaches to Network Data Analysis

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor


of Philosophy in the Graduate School of The Ohio State University

By

Samir Chowdhury, B.S., M.S.

Graduate Program in Mathematics

The Ohio State University

2019

Dissertation Committee:
Facundo Mémoli, Advisor
Matthew Kahle
David Sivakoff
© Copyright by
Samir Chowdhury
2019
Abstract

Network data, which shows the relationships between entities in complex systems, is
becoming available at an ever-increasing rate. In particular, advances in data acquisition
and computational power have shifted the bottleneck in analyzing weighted and directed
network datasets towards the domain of available mathematical methods. Thus there is a
pressing need to develop mathematical foundations for analyzing such datasets.
In this thesis, we present methods for applying one of the flagship tools of topological
data analysis—persistent homology—to weighted, directed network datasets. We ground
these methods in a network distance dN that had appeared in a restricted context in earlier
literature, and which is fully developed in this thesis. This development independently
provides metric methods for network data analysis, including methods drawn from
optimal transport.
In our framework, a network dataset is represented as a set of points X (equipped
with the minimalistic structure of a first countable topological space) and a (continuous)
edge weight function ωX : X × X → R. With this terminology, a finite network dataset
is viewed as a finite sample from some infinite underlying process—a compact network.
This perspective is especially appropriate for data streams that are so large that they are
“essentially infinite”, or are perhaps being generated continuously in time.
We show that the space of all compact networks is the completion of the space of all
finite networks. We develop the notion of isomorphism in this space, and explore a range
of different geodesics that exist in this space. We develop sampling theorems and explain
their use in obtaining probabilistic convergence guarantees. Several persistent homology
methods—notably including persistent path homology—are also developed. By virtue of
the sampling theorems, we are able to define these methods even for infinite networks.
Our theoretical contributions are complemented by software packages that we devel-
oped in the course of producing this thesis. We illustrate the theory and implementations
via experiments on simulated and real-world data.

ii
In my family, I found boundless sources of love and encouragement. My grandparents,
aunts, and uncles provided me with all the support I could ask for. My mother gave me
insight from her own life as an academic, my father taught me the quadratic formula and
my first lessons in the sciences, and my brother lifted the weight of responsibility from my
shoulders so I could pursue my own path. It is to them that I dedicate this work.

iii
Acknowledgments

My thesis advisor, Facundo Mémoli, has invested incredible amounts of time and en-
ergy into my education, and I can only hope to pay it forward. Even during the final stages
of writing this thesis, I was amazed at how his vision had materialized as connections be-
tween all the research that we had done together, and how neatly the different concepts
rolled out and interlocked, like the pieces of a puzzle. I am happy and grateful that I was
able to work under his guidance for the past five years.
I am lucky to have had many mentors over the years; special thanks go to Henry Adams,
Jose Perea, Chad Giusti, Matthew Kahle, and David Sivakoff, who have all contributed
crucially to the development of my career. I would like to thank Neil Falkner, whose
instruction was critical for my education, and also Boris Hasselblatt, Richard Weiss, and
Genevieve Walsh, who first led me into mathematics.
Many academics took the time to give me advice and direction over the years. From
these conversations, I have often carried away a new insight into a complex problem, or
a deeper realization about the academic world at large. Javier Arsuaga, Pablo Cámara,
Chao Chen, Paweł Dłotko, Moon Duchin, Greg Henselman, Kathryn Hess, Steve Hunts-
man, Sara Kališnik, Katherine Kinnaird, Sanjeevi Krishnan, Melissa McGuirl, Amit Patel,
Xaq Pitkow, Vanessa Robins, Manish Saggar, Santiago Segarra, Elchanan Solomon, Justin
Solomon, Mimi Tsuruga, Mariel Vázquez, Bei Wang, Sunny Xiao, Lori Ziegelmeier—
thank you all, you make this community a wonderful place to be in.
Our research group—and more broadly, the TGDA group at OSU—comprised a won-
derful community of grad students and postdocs who contributed to my way of thinking
about problems. Tom Needham and Ben Schweinhart were both exceptional in that re-
gard. Tamal Dey gave the seminar talk that first got me interested in applied topology, and
Anastasios Sidiropoulos got me thinking about directed graphs.
My friends in Columbus made sure that my time here was rich and rewarding. Thanks
to the 91 crew for the endless laughs, to Sunita and Sabrina for many therapeutic con-
versations, and to the Rahmans for making me part of their family. Thanks to Tia for
contributing to my career, in so many ways known and unknown. Danke schön to Natalie
for her warmth, wit, and wisdom. My fellow grad students were a constant source of sup-
port and entertainment; I was lucky to have Marissa as a study buddy, Katie for keeping
my harebrained ideas about the TAGGS seminar in check, and Evan and Kevin for many
welcome distractions. Osama, Hanbaek, and I were there for each other through all the
highs and lows. I could not ask for it to be any other way.

iv
Vita

2013-present . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . PhD student, Mathematics,
The Ohio State University
2016 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Mathematics,
The Ohio State University
2013 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. Engineering Science, Mathematics,
Tufts University

Publications

Research Publications

S. Chowdhury, B. Dai, F. Mémoli. “The importance of forgetting: Limiting memory
improves recovery of topological characteristics from neural data”. PLoS ONE, 2018.

S. Chowdhury, F. Mémoli. “A functorial Dowker theorem and persistent homology of
asymmetric networks”. Journal of Applied and Computational Topology, 2018.

S. Chowdhury, F. Mémoli. “Persistent path homology of directed networks”. Proceedings
of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018.

S. Chowdhury, F. Mémoli. “Explicit geodesics in Gromov-Hausdorff space”. Electronic
Research Announcements in Mathematical Sciences, 2018.

S. Chowdhury, F. Mémoli, Z. Smith. “Improved error bounds for tree representations of
metric spaces”. Advances in Neural Information Processing Systems, 2016.

Z. Smith, S. Chowdhury, F. Mémoli. “Hierarchical representations of network data with
optimal distortion bounds”. 50th Asilomar Conference on Signals, Systems and Computers,
2016.

v
S. Chowdhury, F. Mémoli. “Persistent homology of directed networks”. 50th Asilomar
Conference on Signals, Systems and Computers, 2016.

S. Chowdhury, F. Mémoli. “Distances between directed networks and applications”. IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

S. Chowdhury, F. Mémoli. “Metric structures on networks and applications”. 53rd Annual
Allerton Conference on Communication, Control, and Computing (Allerton), 2015.

Fields of Study

Major Field: Mathematics

vi
Table of Contents

Page

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Organization of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . 3


1.2 Networks and dN : Definition, reformulations, and first results . . . . . . 4
1.3 Network models: the cycle networks and the SBM networks . . . . . . . 10
1.3.1 The directed circles . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 The finite cycle networks . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 The network stochastic block model . . . . . . . . . . . . . . . . 16
1.4 Persistent homology on networks: Simplicial constructions . . . . . . . . 17
1.4.1 Background on persistent homology . . . . . . . . . . . . . . . 18
1.4.2 Related literature on persistent homology of directed networks . 20
1.4.3 The Vietoris-Rips filtration of a network . . . . . . . . . . . . . 21
1.4.4 The Dowker filtration of a network . . . . . . . . . . . . . . . . 22
1.4.5 A Functorial Dowker Theorem . . . . . . . . . . . . . . . . . . 24
1.4.6 Dowker persistence diagrams and asymmetry . . . . . . . . . . . 27
1.5 Persistent path homology of networks . . . . . . . . . . . . . . . . . . . 35
1.5.1 Path homology of digraphs . . . . . . . . . . . . . . . . . . . . 35
1.5.2 The persistent path homology of a network . . . . . . . . . . . . 38

vii
1.5.3 An application: Characterizing the diagrams of cycle networks . 41
1.6 The case of compact networks . . . . . . . . . . . . . . . . . . . . . . . 42
1.6.1 ε-systems and finite sampling . . . . . . . . . . . . . . . . . . . 42
1.6.2 Weak isomorphism and dN . . . . . . . . . . . . . . . . . . . . 44
1.6.3 An additional axiom coupling weight function with topology . . 47
1.6.4 Skeletons, motifs, and motif reconstruction . . . . . . . . . . . . 49
1.7 Diagrams of compact networks and convergence results . . . . . . . . . 53
1.8 Completeness, compactness, and geodesics . . . . . . . . . . . . . . . . 54
1.8.1 Completeness of (CN /≅w , dN ) . . . . . . . . . . . . . . . . . . . 54
1.8.2 Precompact families in CN /≅w . . . . . . . . . . . . . . . . . . 55
1.8.3 Geodesics: existence and explicit examples . . . . . . . . . . . . 55
1.9 Measure networks and the dN,p distances . . . . . . . . . . . . . . . . . 61
1.9.1 The structure of measure networks . . . . . . . . . . . . . . . . 63
1.9.2 Couplings and the distortion functional . . . . . . . . . . . . . . 65
1.9.3 Interval representation and continuity of distortion . . . . . . . . 66
1.9.4 Optimality of couplings in the network setting . . . . . . . . . . 68
1.9.5 The Network Gromov-Wasserstein distance . . . . . . . . . . . . 69
1.9.6 The Network Gromov-Prokhorov distance . . . . . . . . . . . . 72
1.9.7 Lower bounds and measure network invariants . . . . . . . . . . 72
1.10 Computational aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
1.10.1 Software packages . . . . . . . . . . . . . . . . . . . . . . . . . 79
1.10.2 Simulated hippocampal networks . . . . . . . . . . . . . . . . . 79
1.10.3 Clustering SBMs and migration networks . . . . . . . . . . . . . 82

2. Metric structures of dN and dN,p . . . . . . . . . . . . . . . . . . . . . . . . . 89

2.1 Proofs from §1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89


2.2 ε-systems and finite sampling . . . . . . . . . . . . . . . . . . . . . . . 91
2.3 Proofs from §1.6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.4 Skeletons and motif reconstruction . . . . . . . . . . . . . . . . . . . . 102
2.4.1 The skeleton of a compact network . . . . . . . . . . . . . . . . 102
2.4.2 Reconstruction via motifs and skeletons . . . . . . . . . . . . . . 106
2.5 Completeness, compactness, and geodesics . . . . . . . . . . . . . . . . 110
2.5.1 The completion of CN /≅w . . . . . . . . . . . . . . . . . . . . . 110
2.5.2 Precompact families in CN /≅w . . . . . . . . . . . . . . . . . . 114
2.5.3 Geodesics in CN /≅w . . . . . . . . . . . . . . . . . . . . . . . 115
2.6 Lower bounds on dN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
2.6.1 Quantitative stability of invariants of networks . . . . . . . . . . 121
2.6.2 Proofs involving lower bounds for dN . . . . . . . . . . . . . . . 123
2.7 Proofs from §1.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

viii
3. Persistent Homology on Networks . . . . . . . . . . . . . . . . . . . . . . . . 138

3.1 Background on persistence and interleaving . . . . . . . . . . . . . . . . 138


3.1.1 Homology, persistence, and tameness . . . . . . . . . . . . . . . 138
3.1.2 Interleaving distance and stability of persistence vector spaces. . 140
3.2 Simplicial constructions . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.2.1 Stability of Vietoris-Rips and Dowker constructions . . . . . . . 142
3.2.2 The Functorial Dowker Theorem and equivalence of diagrams . . 143
3.2.3 The equivalence between the finite FDT and the simplicial FNTs 149
3.2.4 Dowker persistence diagrams of cycle networks . . . . . . . . . 156
3.3 Persistent path homology . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3.3.1 Digraph maps and functoriality . . . . . . . . . . . . . . . . . . 160
3.3.2 Homotopy of digraphs . . . . . . . . . . . . . . . . . . . . . . . 163
3.3.3 The Persistent Path Homology of a Network . . . . . . . . . . . 164
3.3.4 PPH and Dowker persistence . . . . . . . . . . . . . . . . . . . 165
3.4 Diagrams of compact networks . . . . . . . . . . . . . . . . . . . . . . 170

4. Algorithms, computation, and experiments . . . . . . . . . . . . . . . . . . . . 172

4.1 The complexity of computing dN . . . . . . . . . . . . . . . . . . . . . 172


4.2 Computing lower bounds for dN . . . . . . . . . . . . . . . . . . . . . . 173
4.2.1 An algorithm for computing minimum matchings . . . . . . . . 174
4.2.2 Computational example: randomly generated networks . . . . . 174
4.2.3 Computational example: simulated hippocampal networks . . . . 176
4.3 Lower bounds for dN,p . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.3.1 Numerical stability of entropic regularization . . . . . . . . . . . 179
4.4 Complexity of PPH and algorithmic aspects . . . . . . . . . . . . . . . . 184
4.4.1 The modified algorithm . . . . . . . . . . . . . . . . . . . . . . 189
4.5 More experiments using Dowker persistence . . . . . . . . . . . . . . . 191
4.5.1 U.S. economy input-output accounts . . . . . . . . . . . . . . . 191
4.5.2 U.S. migration . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
4.5.3 Global migration . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

ix
List of Tables

Table Page

1.1 Left: The five classes of SBM networks corresponding to the experiment
in §1.10.3. N refers to the number of communities, v refers to the vector
that was used to compute a table of means via G5 (v), and ni is the number
of nodes in each community. Right: G5 (v) for v = [0, 25, 50, 75, 100]. . . . 85

1.2 Two-community SBM networks as described in §1.10.3. . . . . . . . . . . 87

4.1 The first two columns contain sample 0-dimensional persistence intervals,
as produced by Javaplex. We have added the labels in column 3, and the
common δ-sinks in column 4. . . . . . . . . . . . . . . . . . . . . . . . . . 193

4.2 Quantitative estimates on migrant flow, following the interpretation pre-


sented in §4.5.2. In each row, we list a simplex of the form [si , sj ] (resp.
[si , sj , sl ] for 2-simplices) and any possible δ-sinks sk . We hypothesize
that sk receives at least (1 − δ)(influx(sk )) migrants from each of si , sj
(resp. si , sj , sl )—these lower bounds are presented in the third column.
The fourth column contains the true migration numbers. Notice that the
[FL,GA] simplex appears to show the greatest error between the lower
bound and the true migration. Following the interpretation suggested ear-
lier in §4.5.2, this indicates that Florida and Georgia appear to have strong
coherence of preference, relative to the other pairs of states spanning 1-
simplices in this table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

4.3 Quantitative estimates on migrant flow, following the interpretation pre-


sented in §4.5.2. The entries in this table follow the same rules as those
of Table 4.2. Notice that the [OR,WA] and [AZ,CA] simplices show the
greatest error between the lower bound and the true migration. Following
the interpretation in §4.5.2, this suggests that these two pairs of states ex-
hibit stronger coherence of preference than the other pairs of states forming
1-simplices in this table. . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

x
4.4 Short 0-dimensional Dowker persistence intervals capture regions which
receive most of their incoming migrants from a single source. Each in-
terval [0, δ) corresponds to a 0-simplex which becomes subsumed into a
1-simplex at resolution δ. We list these 1-simplices in the second column,
and their δ sinks in the third column. The definition of a δ-sink enables us
to produce a lower bound on the migration into each sink, which we pro-
vide in the fourth column. We also list the true migration numbers in the
fifth column, and the reader can consult §4.5.2 for our explanation of the
error between the true migration and the lower bounds on migration. . . . . 208

4.5 Representative 1-cycles for several intervals in the 1-dimensional persis-


tence barcode for the global migration dataset. Each of the first four cycles
has the special property that it becomes a boundary due to a single sink
at the right endpoint of its associated persistence interval. This permits
us to obtain a lower bound on the migration into this sink from each of
the regions in the cycle. The last row contains a cycle without this spe-
cial property. The font colors of the persistence intervals correspond to the
colors of the highlighted 1-dimensional bars in Figure 4.16. . . . . . . . . . 210

xi
List of Figures

Figure Page

1.1 The two networks on the left have different cardinalities, but computing
correspondences shows that dN (X, Y ) = 1. Similarly one computes dN (X, Z) =
0, and thus dN (Y, Z) = 1 by triangle inequality. On the other hand, the bi-
jection given by the arrows shows dbN (Y, Z) = 1. Applying Proposition 12
then recovers dN (X, Y ) = 1. . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.2 The directed circle (~S1 , ω~S1 ), the directed circle on 6 nodes (~S16 , ω~S16 ), and
the directed circle with reversibility ρ, for some ρ ∈ [1, ∞). Traveling in a
clockwise direction is possible only in the directed circle with reversibility
ρ, but this incurs a penalty modulated by ρ. . . . . . . . . . . . . . . . . . 15

1.3 A cycle network on 6 nodes, along with its weight matrix. Note that the
weights are highly asymmetric. . . . . . . . . . . . . . . . . . . . . . . . . 16

1.4 A network SBM on 50 nodes, split into 5 communities, along with the
matrices of means and variances. The deepest blue corresponds to values
≈ 1, and the deepest yellow corresponds to values ≈ 29. . . . . . . . . . . 17

1.5 A schematic of some of the simplicial constructions on directed networks.


F is the collection of filtered simplicial complexes. Dgm is the collection
of persistence diagrams. We study the Rips (R) and Dowker (Dsi , Dso )
filtrations, each of which takes a network as input and produces a filtered
simplicial complex. s and t denote the network transformations of sym-
metrization (replacing a pair of weights between two nodes by the max-
imum weight) and transposition (swapping the weights between pairs of
nodes). R is insensitive to both s and t. But Dsi ◦ t = Dso , Dso ◦ t = Dsi ,
and in general, Dsi and Dso are not invariant under t (Theorem 50). . . . . . 17

1.6 Computing the Dowker sink and source complexes of a network (X, ωX ).
Observe that the sink and source complexes are different in the range 1 ≤
δ < 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

xii
1.7 The first column contains illustrations of cycle networks G3 , G4 , G5 and
G6 . The second column contains the corresponding Dowker persistence
barcodes, in dimensions 0 and 1. Note that the persistent intervals in the
1-dimensional barcodes agree with the result in Theorem 40. The third col-
umn contains the Rips persistence barcodes of each of the cycle networks.
Note that for n = 3, 4, there are no persistent intervals in dimension 1. On
the other hand, for n = 6, there are two persistent intervals in dimension 1. 31

1.8 (Y, ωY ) is the (a, c)-swap of (X, ωX ). . . . . . . . . . . . . . . . . . . . . 32

1.9 Dowker persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure
1.8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.10 Rips persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure
1.8. Note that the Rips diagrams indicate no persistent homology in di-
mensions higher than 0, in contrast with the Dowker diagrams in Figure
1.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.11 A two-node digraph on the vertex set Y = {a, b}. . . . . . . . . . . . . . . 36

1.12 Two types of square digraphs. . . . . . . . . . . . . . . . . . . . . . . . . 37

1.13 Working over Z/2Z coefficients, we find that DgmΞ1 (X ) and DgmD1 (Y) are
trivial, whereas DgmD1 (X ) = DgmΞ1 (Y) = {(1, 2)}. . . . . . . . . . . . . . 40

1.14 Left: Gδ3 is (digraph) homotopy equivalent to a point at δ = 1, as can be
seen by collapsing points along the orange lines. Right: Dsiδ,3 becomes
contractible at δ = 2, but has nontrivial homology in dimension 2 that
persists across the interval [1, √2). . . . . . . . . . . . . . . . . . . . . . . 41

1.15 Relaxing the requirements on the maps of this “tripod structure” is a natural
way to weaken the notion of strong isomorphism. . . . . . . . . . . . . . . 45

1.16 Note that Remark 69 does not fully characterize weak isomorphism, even
for finite networks: All three networks above, with the given weight matri-
ces, are Type I weakly isomorphic since C maps surjectively onto A and
B. But there are no surjective, weight preserving maps A → B or B → A. . 46

xiii
1.17 Left: Z represents a terminal object in p(X), and f, g are weight preserving
surjections X → Z. Here ϕ ∈ Aut(Z) is such that g = ϕ ◦ f . Right:
Here we show more of the poset structure of p(X). In this case we have
X ⪰ V ⪰ Y ⪰ . . . ⪰ Z. . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

1.18 Interpolating between the skeleton and blow-up constructions. . . . . . . . 52

1.19 Illustrations of the finite networks we consider in this paper. Notice that
the edge weights are asymmetric. The numbers in each node correspond to
probability masses; for each network, these masses sum to 1. . . . . . . . . 62

1.20 The dN,p distance between the two one-node networks is simply ½|α − α0 |.
In Example 109 we give an explicit formula for computing dN,p between
an arbitrary network and a one-node network. . . . . . . . . . . . . . . . . 70

1.21 Networks at dN,p -distance zero which are not strongly isomorphic. . . . . . 71

1.22 Bottom right: Sample place cell spiking pattern matrix. The x-axis cor-
responds to the number of time steps, and the y-axis corresponds to the
number of place cells. Black dots represent spikes. Clockwise from bot-
tom middle: Sample distribution of place field centers in 4, 3, 0, 1, and
2-hole arenas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

1.23 Single linkage dendrogram corresponding to the distance matrix obtained


by computing bottleneck distances between 1-dimensional Dowker persis-
tence diagrams of our database of hippocampal networks (§1.10.2). Note
that the 4, 3, and 2-hole arenas are well separated into clusters at threshold
0.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

1.24 Single linkage dendrogram corresponding to the distance matrix obtained


by computing bottleneck distances between 1-dimensional Rips persistence
diagrams of our database of hippocampal networks (§1.10.2). Notice that
the hierarchical clustering fails to capture the correct arena types. . . . . . . 84

xiv
1.25 Left: TLB dissimilarity matrix for SBM community networks in §1.10.3.
Classes 1 and 3 are similar, even though networks in Class 3 have twice
as many nodes as those in Class 1. Classes 2 and 5 are most dissimilar
because of the large difference in their edge weights. Class 4 has a different
number of communities than the others, and is dissimilar to Classes 1 and 3
even though all their edge weights are in comparable ranges. Right: TLB
dissimilarity matrix for two-community SBM networks in §1.10.3. The
near-zero values on the diagonal are a result of using the adaptive λ-search
described in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

1.26 Result of applying the TLB to the migration networks in §1.10.3. Left:
Dissimilarity matrix. Nodes 1-5 correspond to female migration from 1960-
2000, and nodes 6-10 correspond to male migration from 1960-2000. Right:
Single linkage dendrogram. Notice that overall migration patterns change
in time, but within a time period, migration patterns are grouped according
to gender. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

2.1 The trace map erases data between pairs of nodes. . . . . . . . . . . . . . . 119

2.2 The out map applied to each node yields the greatest weight of an arrow
leaving the node, and the in map returns the greatest weight entering the
node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

2.3 Lower-bounding dN by using global spectra (cf. Example 148). . . . . . . 122

3.1 Directed d-cubes that are all homotopy equivalent. . . . . . . . . . . . . . . 163

4.1 Left: Lower bound matrix arising from matching local spectra on the
database of community networks. Right: Corresponding single linkage
dendrogram. The labels indicate the number of communities and the total
number of nodes. Results correspond to using local spectra as described in
Proposition 180. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

4.2 Left: Lower bound matrix arising from matching global spectra on the
database of community networks. Right: Corresponding single linkage
dendrogram. The labels indicate the number of communities and the total
number of nodes. Results correspond to using global spectra as signatures. 176

4.3 Single linkage dendrogram based on local spectrum lower bound of Propo-
sition 180 corresponding to hippocampal networks with place field radii
0.2L, 0.1L, and 0.05L (clockwise from top left). . . . . . . . . . . . . . . . 178

xv
4.4 Sinkhorn computations for dN,2 (X , Y) are carried out in the “stable region”
for K, and the end result is rescaled to recover dN,2 (X, Y ). . . . . . . . . . 183

4.5 Left: The rows and columns of Mp are initially arranged so that the domain
and codomain vectors are in increasing and decreasing allow time, respec-
tively. If there are no domain (codomain) vectors having a particular allow
time, then the corresponding vertical (horizontal) strip is omitted. Right:
After converting to column echelon form, the domain vectors of Mp,G need
not be in the original ordering. But the codomain vectors are still arranged
in decreasing allow time. . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

4.6 0 and 1-dimensional Dowker persistence barcodes for US economic sector


data, obtained by the process described in §4.5.1. The long 1-dimensional
persistence intervals that are colored in red are examined in §4.5.1 and
Figures 4.9,4.12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

4.7 Investment patterns at δ = 0.75 . . . . . . . . . . . . . . . . . . . . . . . . 195

4.8 Investment patterns at δ = 0.94 . . . . . . . . . . . . . . . . . . . . . . . . 195

4.9 Here we illustrate the representative nodes for one of the 1-dimensional
persistence intervals in Figure 4.6. This 1-cycle [PC,CH] + [CH,PL] +
[PL,WO] - [WO,FM] + [FM,PM] - [PM,PC] persists on the interval [0.75, 0.95).
At δ = 0.94, we observe that this 1-cycle has joined the homology equiv-
alence class of the shorter 1-cycle illustrated on the right. Unidirectional
arrows represent an asymmetric flow of investment. A full description of
the meaning of each arrow is provided in §4.5.1. . . . . . . . . . . . . . . . 195

4.10 Investment patterns at δ = 0.81 . . . . . . . . . . . . . . . . . . . . . . . . 196

4.11 Investment patterns at δ = 0.99 . . . . . . . . . . . . . . . . . . . . . . . . 196

4.12 Representative nodes for another 1-dimensional persistence interval in Fig-


ure 4.6. A full description of this cycle is provided in §4.5.1. . . . . . . . . 196

xvi
4.13 An example of a 1-cycle becoming a 1-boundary due to a single mutual sink
r, as described in the interpretation of 1-cycles in §4.5.2. The figure on the
left shows a connected component of Dsiδ,S , consisting of [s1 , s2 ], [s2 , s3 ], [s3 , s4 ].
The arrows are meant to suggest that r will eventually become a δ-sink for
each of these 1-simplices, for some large enough δ. The progression of
these simplices for increasing values of δ are shown from left to right. In
the leftmost figure, r is not a δ-sink for any of the three 1-simplices. Note
that r has become a δ-sink for [s3 , s4 ] in the middle figure. Finally, in the
rightmost figure, r has become a δ-sink for each of the three 1-simplices. . . 201

4.14 0 and 1-dimensional Dowker persistence barcodes for U.S. migration data . 203

4.15 U.S. map with representative cycles of the persistence intervals that were
highlighted in Figure 4.14. The cycle on the left appears at δ1 = 0.87,
and the cycle on the right appears at δ2 = 0.90. The red lines indicate the
1-simplices that participate in each cycle. Each red line is decorated with
an arrowhead si → sj if and only if sj is a sink for the simplex [si , sj ].
The blue arrows point towards all possible alternative δ-sinks, and are in-
terpreted as follows: Tennessee is a 0.90-sink for the Kentucky-Georgia
simplex, West Virginia is a 0.90-sink for the Ohio-Florida simplex, and
Alabama is a 0.90-sink for the Georgia-Florida simplex. . . . . . . . . . . . 204

4.16 Dowker persistence barcodes for global migration dataset. . . . . . . . . . 208

4.17 Top: Two cycles corresponding to the left endpoints of the (Djibouti-
Somalia-Uganda-Eritrea-Ethiopia) and (Kiribati-Papua New Guinea-Australia-
United Kingdom-Tuvalu) persistence intervals listed in Table 4.5. The
δ values are 0.73, 0.77, respectively. Bottom: Two cycles correspond-
ing to the left endpoints of the (China-Thailand-Philippines) and (China-
Indonesia-Malaysia) persistence intervals listed in Table 4.5. The δ values
are 0.77, 0.75, respectively. Meaning of arrows: In each cycle, an arrow
si → sj means that ωS (si , sj ) ≤ δ, i.e. that sj is a sink for the simplex
[si , sj ]. We can verify separately that for δ = 0.77, the Kiribati-Papua
New Guinea simplex has the Solomon Islands as a δ-sink, and that the
Philippines-Thailand simplex has Taiwan as a δ-sink. Similarly, the China-
Malaysia simplex has Singapore as a δ-sink, for δ = 0.75. . . . . . . . . . 209

Background definitions and conventions

We write N to denote the natural numbers 1, 2, 3, . . ., Z+ to denote {0} ∪ N, and R+ to denote the nonnegative reals. The power set of a set S is denoted pow(S). The cardinality
of S is denoted |S| or card(S). The empty set is denoted ∅. The identity map on S is
denoted idS .
A simplicial complex K consists of a vertex set V (K) and a collection of nonempty
subsets σ ∈ pow(V (K)) such that whenever σ ∈ K and τ ⊆ σ, we also have τ ∈ K. This
is typically known as an abstract simplicial complex. Any such complex has a geometric
realization which can be realized as follows. For convenience, assume n := |V (K)| < ∞.
Enumerate V (K) as {v1 , v2 , . . . , vn }. Define f : V (K) → Rn by vi 7→ ei , the ith standard
basis vector in Rn . This is the vector of all zeros with a 1 in the ith coordinate. Given a
simplex σ ∈ K, the corresponding P convex hull of {f (v) : v ∈ σ},
Pgeometric simplex is the
i.e. the collection of all points v∈σ tv f (v) such that v∈σ tv = 1 and each tv ≥ 0.
The geometric realization of K, denoted |K|, is the collection of all geometric simplices
corresponding to simplices in K.
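For a finite vertex set, the downward-closure condition and the vertex map vi ↦ ei of the geometric realization are easy to make concrete. A minimal Python sketch (the function names are ours, not notation from this thesis):

```python
from itertools import combinations

def simplicial_closure(maximal_simplices):
    """Close a family of simplices under taking nonempty faces (subsets)."""
    K = set()
    for sigma in maximal_simplices:
        for k in range(1, len(sigma) + 1):
            for tau in combinations(sorted(sigma), k):
                K.add(tau)
    return K

def realize_vertex(i, n):
    """The map f sending vertex v_i to the standard basis vector e_i in R^n (0-indexed)."""
    return tuple(1.0 if j == i else 0.0 for j in range(n))

# The complex generated by one triangle and one extra edge:
K = simplicial_closure([(0, 1, 2), (2, 3)])
assert (0, 1) in K and (0, 1, 2) in K and (3,) in K
assert len(K) == 9  # 7 faces of the triangle, plus the vertex {3} and the edge {2,3}
```

Each simplex is stored as a sorted tuple of vertices, so faces are exactly the nonempty sub-tuples.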
A metric space is a set X and a metric function dX : X × X → [0, ∞) such that for all x, x′, x″ ∈ X, the following holds:

• (Nondegeneracy) dX(x, x′) = 0 if and only if x = x′.

• (Symmetry) dX(x, x′) = dX(x′, x).

• (Triangle inequality) dX(x, x″) ≤ dX(x, x′) + dX(x′, x″).

A metric space (X, dX) is totally bounded if for any ε > 0, there exists a finite subset S ⊆ X such that dX(x, S) < ε for all x ∈ X. Here dX(x, S) := min_{s∈S} dX(x, s).
Given a set X, a topology on X is a subset τX ⊆ pow(X) such that:

• Both ∅ and X belong to τX.

• Any arbitrary union of elements of τX belongs to τX (i.e. τX is closed under taking arbitrary unions).

• A finite intersection of elements of τX belongs to τX (i.e. τX is closed under taking finite intersections).

The elements of τX are referred to as the open sets of X.
An open cover of a topological space X is a collection of open sets {Ui ⊆ X : i ∈ I} indexed by some set I such that each Ui is nonempty, and ⋃_{i∈I} Ui = X.
A base or basis for τX is a subcollection of τX such that every open set in X can be
written as a union of elements in the subcollection. A local base at a point x ∈ X is a
collection of open sets containing x such that every open set containing x contains some
element in this collection.
There are always two topologies that one can place on a set X: the discrete topology
τX = pow(X), and the trivial topology τX = {∅, X}.
A point cloud is a discrete subset of d-dimensional Euclidean space for some d ∈ N.
Given two topological spaces X and Y , two continuous maps f, g : X → Y are
said to be homotopic if there exists a continuous map F : X × [0, 1] → Y such that
F |X×{0} = f and F |X×{1} = g. X and Y are said to be homotopy equivalent if there exist maps f : X → Y and g : Y → X such that g ◦ f ≃ idX and f ◦ g ≃ idY . In this case, f and g are said to be homotopy inverses.
The indicator function of a set S is denoted 1S. We denote measure spaces via the triple (X, F, µ), where X is a set, F is a σ-field on X, and µ is the measure on F. Given a measure space (X, F, µ), we write L0 = L0(µ) to denote the collection of F-measurable functions f : X → R. For all p ∈ (0, ∞) and all f ∈ L0, we define ‖f‖p := (∫ |f|^p dµ)^{1/p}. For p = ∞, ‖f‖∞ := inf{M ∈ [0, ∞] : µ(|f| > M) = 0}. Then for any p ∈ (0, ∞], Lp = Lp(µ) := {f ∈ L0 : ‖f‖p < ∞}.
Given a measurable real-valued function f : X → R and t ∈ R, we will occasionally
write {f ≤ t} to denote the set {x ∈ X : f (x) ≤ t}.
Lebesgue measure on the reals will be denoted by λ. We write λI to denote the
Lebesgue measure on the unit interval I = [0, 1].
Suppose we have a measure space (X, F, µ), a measurable space (Y, G), and a measur-
able function f : X → Y . The pushforward or image measure of f is defined to be the
measure f∗ µ on G given by writing f∗ µ(A) := µ(f −1 [A]) for all A ∈ G.
A particular case where we deal with pushforward measures is the following: given a
product space X = X1 × X2 × . . . × Xn and a measure µ on X , the canonical projection
maps πi : X → Xi , for i = 1, . . . , n, define pushforward measures that we denote (πi )∗ µ.
If each Xi is itself a measure space with measure µi , then we say that µ has marginals µi ,
for i = 1, . . . , n, if (πi )∗ µ = µi for each i. We also consider projection maps of the form
(πi , πj , πk ) : X → Xi × Xj × Xk for i, j, k ∈ {1, . . . , n}, and denote the corresponding
pushforward by (πi , πj , πk )∗ µ. Notice that we can take further projections of the form
(πi , πj )ijk : Xi × Xj × Xk → Xi × Xj , and the images of these projections are precisely
those given by projections of the form (πi , πj ) : X → Xi × Xj .

Remark 1. Let X = X1 × X2 × . . . × Xn be a product space with a measure µ as above, and suppose each Xi is equipped with a measure µi such that µ has marginals µi. Let i, j, k ∈ {1, . . . , n}. Then the measure (πi, πj, πk)∗µ on Xi × Xj × Xk has marginals (πi, πk)∗µ and (πj)∗µ on Xi × Xk and Xj, respectively. To see this, let the projections Xi × Xj × Xk → Xi × Xk and Xi × Xj × Xk → Xj be denoted (πi, πk)ijk and (πj)ijk, respectively. Let B ⊆ Xj be measurable. Then

((πj)ijk)∗ ((πi, πj, πk)∗µ)(B) = (πi, πj, πk)∗µ(Xi × B × Xk) = µ(X1 × . . . × Xj−1 × B × Xj+1 × . . . × Xn) = (πj)∗µ(B).

Next let A ⊆ Xi and C ⊆ Xk be measurable. Then

((πi, πk)ijk)∗ ((πi, πj, πk)∗µ)(A × C) = (πi, πj, πk)∗µ(A × Xj × C) = µ(X1 × . . . × Xi−1 × A × Xi+1 × . . . × Xk−1 × C × Xk+1 × . . . × Xn) = (πi, πk)∗µ(A × C).
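For finitely supported measures, pushforwards and marginals are just sums, and the consistency used in Remark 1 (a marginal of a pushforward agrees with the corresponding marginal of the original measure) can be checked directly. A small self-contained sketch (the setup and names are ours):

```python
from itertools import product

# A finitely supported measure mu on X1 x X2 x X3 (sizes 2, 3, 2), stored as a dict.
X1, X2, X3 = range(2), range(3), range(2)
mu = {(a, b, c): a + 2 * b + 3 * c + 1 for a, b, c in product(X1, X2, X3)}

def pushforward(mu, f):
    """Image measure f_* mu, defined by (f_* mu)(y) = mu(f^{-1}[y])."""
    out = {}
    for x, mass in mu.items():
        y = f(x)
        out[y] = out.get(y, 0) + mass
    return out

# Pushforward under (pi_1, pi_3); its marginal on X3 agrees with (pi_3)_* mu.
nu = pushforward(mu, lambda x: (x[0], x[2]))
assert pushforward(nu, lambda p: p[1]) == pushforward(mu, lambda x: x[2])
```

The same pattern verifies the marginals claimed in Remark 1 for any choice of coordinates.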

Chapter 1: Introduction

Let X be a set, and let ωX : X × X → R be any function. The pair (X, ωX ) (with some
additional constraints, cf. Definition 1) is what we refer to as a network, and is the central
object of study in this thesis. Networks arise in many guises, and in many contexts [90], so
it is necessary to explain the motivation behind the preceding definition. Perhaps the most
common type of network is an undirected graph G = (V, E) consisting of a vertex set V
and edge set E. Letting n be the number of elements in V , the adjacency matrix of G is
an n × n binary matrix with a 1 in entry (i, j) if there is an edge between vi and vj , and 0 otherwise. In particular,
the adjacency matrix is symmetric. More generally, a directed graph relaxes the symmetry
condition by allowing for individual edges vi → vj and vj → vi . Finally, a weighted,
directed graph allows for real-valued weights on the edges. The pair (X, ωX ) encapsulates
this idea. In this sense, the networks we study in this thesis are weighted, directed graphs
with self-loops1 . We denote the collection of all networks (cf. Definition 1) by N .
In addition to graphs, N contains many other classes of objects, including metric
spaces, directed metric spaces, Riemannian manifolds, and Finsler manifolds. As will
be shown throughout this work, the perspective of viewing networks as generalized metric
spaces (as opposed to combinatorial objects) enables the import of many techniques—both
new and old—from the theory of metric spaces into network analysis. This ranges from
classical results of Gromov on reconstructions of metric spaces, to more modern tools such
as persistent homology and applied optimal transport.
The question which initially motivated this thesis is as follows. In an unpublished
manuscript first appearing in 2012, Grigor’yan, Lin, Muranov, and Yau defined a homol-
ogy theory for digraphs (i.e. directed graphs) called path homology that generalized the
standard notion of simplicial homology [62]. By this time, the theory of persistent homol-
ogy—a multiresolution version of simplicial homology used for data analysis—had already
become quite popular. So a natural question was: would it be possible to produce a per-
sistent version of path homology? More specifically, would it be possible to produce a
persistent path homology theory with satisfactory algorithms and implementations as well
as nice theoretical properties?
In this thesis, we provide a positive answer to the preceding question. More generally,
we develop a framework with machinery in place for extending many metric data analysis
1 While multigraphs (graphs allowing for multiple edges between a pair of vertices) and hypergraphs (multiple nodes sharing a “hyperedge”) are often useful in practice, we do not consider them in this work.
techniques (including variants of persistent homology) to the setting of directed networks
while preserving nice theoretical properties. In particular, we generalize the two most com-
mon extant simplicial persistent homology methods—the Vietoris-Rips and Čech methods
[47]—and fit them to the directed network setting. Our contributions in these directions
have already been published [40, 37].
The crucial component of this framework is the examination of a (pseudo)metric dN on
the collection of all networks, which first appeared in a slightly restricted version in [22].
In this setting, dN was used to provide theoretical guarantees for the robustness of certain
hierarchical clustering methods on directed networks. This network distance is structurally
analogous to the Gromov-Hausdorff distance dGH on CM—the collection of all compact
metric spaces [17]. An important remark is that dN (or even dGH ) is NP-hard to compute.
With the development of various data analysis methods relying on dN , it became in-
creasingly necessary to understand and develop the core theoretical properties of dN itself.
Keeping this goal in mind, we provide a comprehensive analysis of dN in this thesis (see
also [35, 39, 34]). Because its structural counterpart for metric spaces (dGH ) is already well-
studied, we were interested in extending the desirable results for dGH to the setting of dN .
As we show here, a wide range of results does extend to the dN setting. The crux of this
development is in realizing that dGH , by nature of its definition, enforces enough structure
on CM that only certain abstract consequences of metric properties such as symmetry and
triangle inequality become necessary when proving results on CM. These consequences,
when assumed as properties of networks, are quite natural. Most importantly, they allow
us to prove strong results about dN . We expect that in general, when proving results about
dGH , verifying these assumptions (cf. Definitions 1, 2, and also 28) would allow one to
prove the results on the much broader setting of dN with little additional work.
As a natural extension of these ideas, we develop families of metrics dN,p for p ∈ [1, ∞]
that are Lp versions of dN (cf. [38]). These are structurally analogous to the Gromov-
Wasserstein distances developed in [85, 116, 86, 117]. We currently have ongoing work
that leverages the formulation and foundational work on dN,p .
When studying a metric, an important item to clarify is the “curvature” of the metric
[17]. Knowledge of curvature has important practical consequences; for example, it gives
theoretical guarantees on the existence of means [96]. Following the extensive study of
the (Alexandrov) curvature of the space of metric measure spaces in [117], we became
interested in testing similar ideas in the settings of dN and dGH . Interestingly, even the
restricted metric dGH does not admit any curvature bounds (in the Alexandrov sense). The
deeper reason behind this is the existence of “wild” geodesics in the space of compact
metric spaces. In [36], we produced explicit constructions of infinite families of these
wild-type geodesics in CM; these results are reproduced here.
The dN distance at the core of this work is a pseudometric, and it is of natural impor-
tance to understand its zero sets thoroughly. This is related to the classic question “Can
one hear the shape of a drum”: to understand the behavior of our methods, we need to
know which networks are perceived by dN to be the same. We provide a full treatment of

this question. We provide an independent definition of “weakly isomorphic networks” and
prove that these are precisely the networks at 0 dN -distance. Weakly isomorphic networks
essentially live on a fiber over a core subnetwork that we call a skeleton. In particular,
when comparing two networks having different sizes, one may traverse the fibers, pick out
two representatives having the same number of nodes, and compute a simplified (but still
NP-hard) distance dbN to obtain the original distance dN .
We continually develop network invariants that serve as polynomial-time proxies for
dN . The persistent homology methods are all examples of such invariants. One of the
other invariants we consider, the motif sets, are based on the “curvature class” invariants
defined by Gromov for metric spaces [65]. In the case of compact metric spaces, the motif
sets form a full invariant. We are able to recover this result for a subcollection of CN
satisfying an additional topological condition that we call coherence, or more specifically,
Axiom A2 (cf. Definition 28). In other words (some readers may be more familiar with this
terminology), the map from weak isomorphism classes of (a subcollection of) networks to
motif sets is injective. The proof of this result is quite short for metric spaces, but requires a
long sequence of verifications for networks. The interesting part, however, is that the proof
is made possible via Axiom A2, which is an abstract consequence of the triangle inequality
(cf. Remark 74). In particular, any finite network satisfies Axiom A2, even if it violates the
triangle inequality by an arbitrary margin. Actually, to be more accurate, given any finite
network, one may traverse the fiber of weakly isomorphic networks down to its skeleton,
and this skeleton will satisfy Axiom A2.
In addition to the theoretical results outlined above, we present practical implemen-
tations on both real and simulated datasets. These implementations are available as the
PersNet, PPH, and GWnets software packages, and are written in a combination of
Matlab, Python, and C++.
Finally, we note that this thesis combines narratives and results from [35, 39, 34, 36,
38, 37, 40]. All of these papers have been developed jointly with Facundo Mémoli. While
each of these papers can be read in a self-contained manner, we have made the effort to
consolidate the landscape developed in those papers and distill it into this thesis.

1.1 Organization of this thesis


Chapter 1 is written as an extended abstract that contains definitions, statements of
main results, and examples. It provides a reasonably comprehensive overview of the entire
thesis. Proofs of the main results, as well as auxiliary results, definitions, and examples,
are provided in the later chapters.
Chapter 1 is organized as follows. The first set of definitions involving dN is pre-
sented in §1.2. In §1.3 we present some network models that serve as important examples
throughout this work. Then we switch to persistent homology and persistent path homol-
ogy in §1.4-1.5. Afterwards we return to dN in §1.6. Results on convergence follow in §1.7,

which also contains important guarantees on when persistent homology can be meaning-
fully applied to infinite networks. In §1.8 we discuss the existence of convex-combination
and wild-type geodesics in N . Development of the dN,p distances is carried out in §1.9.
Finally, in §1.10, we discuss computational complexity, algorithms, and the results of im-
plementing our methods on one particular dataset.
Chapter 2 is devoted to proofs of the results about dN stated in §1, along with auxiliary
results. In particular, §2.6 discusses lower bounds for computing dN . Chapter 3 contains
proofs of statements involving persistent homology. Finally we address computational
aspects and additional experiments in Chapter 4.

1.2 Networks and dN : Definition, reformulations, and first results


Proofs from this section can be found in §2.1.
From a historical perspective, our definition of a network is based on a definition that
appeared in [20, 22]. Here a network was defined to be a pair (X, AX ), where X is a finite
set of nodes and AX : X × X → R+ is a metric function without the symmetry or triangle
inequality properties.
A natural generalization is to instead consider pairs (X, ωX ) where X is any arbitrary
set and ωX is any function from X × X to R. While dN can still be defined on such
networks, this definition is much less amenable to results. In such a setting, only ωX has
any structure, and we are essentially restricted to studying functions X × X → R. It turns
out that the following definition is much better for our purposes.

Definition 1 (Network). A network is a pair (X, ωX), where X is a first countable topological space and ωX : X × X → R is continuous with respect to the product topology. Here X and ωX are referred to as a node set and a weight function, respectively. The collection of all networks is denoted N . A subnetwork of (X, ωX) is any subset of X (equipped with the subspace topology) along with the appropriate restriction of ωX.

Recall that a space is first countable if each point in the space has a countable local basis
(see [114, p. 7] for more details). First countability is a technical condition guaranteeing
that when the underlying topological space of a network is compact, it is also sequentially
compact. Notice that these conditions are automatically satisfied in the finite setting, as ωX
is then trivially continuous.
A further observation is that the “correct” restriction of N to work with is the collection
of compact networks:

Definition 2 (Compact and finite networks). A compact network is a network (X, ωX) where X is compact. The collection of compact networks is denoted CN . This includes the collection of finite networks, which we denote by FN .

For data analysis purposes, we expect to only ever work with finite networks. However,
datasets are often viewed as being sampled from some “infinite” object (as is the case

for very large datasets), and compact networks turn out to be the appropriate model for
our purposes. In particular, whereas FN is not complete, we have the natural inclusion
FN ⊆ CN , and the latter is a complete pseudometric space (see §1.6).
Letting FM, CM, and M denote the spaces of finite, compact, and arbitrary metric
spaces, respectively, we also note the containments
FM ⊊ FN , CM ⊊ CN , and M ⊊ N .
We now proceed to define dN , starting with some auxiliary definitions.
Definition 3 (Correspondence). Let (X, ωX ), (Y, ωY ) ∈ N . A correspondence between X
and Y is a relation R ⊆ X × Y such that πX (R) = X and πY (R) = Y , where πX and πY
are the canonical projections of X × Y onto X and Y , respectively. The collection of all
correspondences between X and Y will be denoted R(X, Y ), abbreviated to R when the
context is clear.
Example 2 (1-point correspondence). Let X be a set, and let {p} be the set with one point.
Then there is a unique correspondence R = {(x, p) : x ∈ X} between X and {p}.
Example 3 (Diagonal correspondence). Let X = {x1 , . . . , xn } and Y = {y1 , . . . , yn } be
two enumerated sets with the same cardinality. A useful correspondence is the diagonal
correspondence, defined as ∆ := {(xi , yi ) : 1 ≤ i ≤ n} . When X and Y are infinite sets
with the same cardinality, and ϕ : X → Y is a given bijection, then we write the diagonal
correspondence as ∆ := {(x, ϕ(x)) : x ∈ X} .
Definition 4 (Distortion of a correspondence). Let (X, ωX), (Y, ωY) ∈ N and let R ∈ R(X, Y). The distortion of R is given by:

dis(R) := sup_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)|.

Remark 4 (Composition of correspondences). Let (X, ωX), (Y, ωY), (Z, ωZ) ∈ N , and let R ∈ R(X, Y), S ∈ R(Y, Z). Then we define:

R ◦ S := {(x, z) ∈ X × Z | ∃y, (x, y) ∈ R, (y, z) ∈ S}.

In the proof of Theorem 72, we verify that R ◦ S ∈ R(X, Z), and that dis(R ◦ S) ≤ dis(R) + dis(S).
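For finite networks, the distortion of a correspondence and the composition R ◦ S are directly computable, so the bound dis(R ◦ S) ≤ dis(R) + dis(S) can be spot-checked. A sketch (helper names are ours; weights are stored as nested lists indexed by node):

```python
def dis(R, wX, wY):
    """Distortion of a correspondence R: max |wX(x,x') - wY(y,y')| over pairs in R."""
    return max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)

def compose(R, S):
    """R ∘ S = {(x, z) : (x, y) in R and (y, z) in S for some y}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

# Three two-node networks:
wX = [[0, 2], [1, 0]]
wY = [[0, 3], [1, 0]]
wZ = [[0, 3], [2, 0]]
R = {(0, 0), (1, 1)}  # a correspondence between X and Y
S = {(0, 0), (1, 1)}  # a correspondence between Y and Z
assert dis(compose(R, S), wX, wZ) <= dis(R, wX, wY) + dis(S, wY, wZ)
```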
Following prior work in [22], we make the following definition.
Definition 5 (The first network distance). Let (X, ωX), (Y, ωY) ∈ N . We define the network distance between X and Y as follows:

dN((X, ωX), (Y, ωY)) := ½ inf_{R∈R} dis(R).

When the context is clear, we will often write dN(X, Y) to denote dN((X, ωX), (Y, ωY)). We define the collection of optimal correspondences R^opt between X and Y to be the collection {R ∈ R(X, Y) : dis(R) = 2dN(X, Y)}. This set is always nonempty when X, Y ∈ FN , but may be empty in general.

Remark 5. We remark that when restricted to the special case of networks that are also
metric spaces, the network distance dN agrees with the Gromov-Hausdorff distance. De-
tails on the Gromov-Hausdorff distance can be found in [17].
Remark 6. The intuition behind the preceding definition of network distance may be better understood by examining the case of a finite network. Given a finite set X and two edge weight functions ωX, ω′X defined on it, we can use the ℓ∞ distance as a measure of network similarity between (X, ωX) and (X, ω′X):

‖ωX − ω′X‖_{ℓ∞(X×X)} := max_{x,x′∈X} |ωX(x, x′) − ω′X(x, x′)|.

A generalization of the ℓ∞ distance is required when dealing with networks having different sizes: Given two sets X and Y , we need to decide how to match up points of X with points of Y . Any such matching will yield a subset R ⊆ X × Y such that πX(R) = X and πY(R) = Y , where πX and πY are the projection maps from X × Y to X and Y , respectively. This is precisely a correspondence, as defined above. A valid notion of network similarity may then be obtained as the distortion incurred by choosing an optimal correspondence—this is precisely the idea behind the definition of the network distance above.
We will eventually verify that dN as defined above is a pseudometric (Theorem 72),
which will justify calling dN a “network distance”. Because dN is a pseudometric, it is
important to understand its zero sets. To this end, we first develop the notion of strong
isomorphism of networks. The definition follows below.
Definition 6 (Weight preserving maps). Let (X, ωX ), (Y, ωY ) ∈ N . A map ϕ : X → Y is
weight preserving if:
ωX(x, x′) = ωY(ϕ(x), ϕ(x′)) for all x, x′ ∈ X.
Definition 7 (Strong isomorphism). Let (X, ωX ), (Y, ωY ) ∈ N . To say (X, ωX ) and
(Y, ωY ) are strongly isomorphic means that there exists a weight preserving bijection ϕ :
X → Y . We will denote a strong isomorphism between networks by X ∼ =s Y . Note that
this notion is exactly the usual notion of isomorphism between weighted graphs.
Given two strongly isomorphic networks, i.e. networks (X, ωX), (Y, ωY) and a weight preserving bijection ϕ : X → Y , one may use the diagonal correspondence (Example 3) to verify that dN(X, Y) = 0. However, the reverse implication is not true in general. Using the one-point correspondence (Example 2), one can see that dN(N1(1), N2(1_{2×2})) = 0. Here 1_{n×n} denotes the all-ones matrix of size n × n for any n ∈ N. However, these two networks are not strongly isomorphic, because they do not even have the same cardinality. Thus to understand the zero sets of dN, we need to search for a different, perhaps weaker notion of isomorphism. This will be further explored in Section 1.6.
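The claim dN(N1(1), N2(1_{2×2})) = 0 can be verified by brute force: for very small networks one can enumerate all correspondences directly. A sketch (helper names are ours; weights stored as nested lists):

```python
from itertools import product

def correspondences(n, m):
    """All correspondences between {0,...,n-1} and {0,...,m-1} (full projections)."""
    pairs = list(product(range(n), range(m)))
    for bits in product((0, 1), repeat=len(pairs)):
        R = [p for p, b in zip(pairs, bits) if b]
        if {x for x, _ in R} == set(range(n)) and {y for _, y in R} == set(range(m)):
            yield R

def dis(R, wX, wY):
    return max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)

def dN(wX, wY):
    return 0.5 * min(dis(R, wX, wY) for R in correspondences(len(wX), len(wY)))

# N1(1): one node with self-weight 1; N2(1_{2x2}): two nodes, all weights 1.
assert dN([[1]], [[1, 1], [1, 1]]) == 0.0
# Yet the two networks are not strongly isomorphic: their cardinalities differ.
```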
Now we state another reformulation of dN which will be especially useful when proving
results about persistent homology.

Definition 8 (Distortion of a map between two networks). Given any (X, ωX), (Y, ωY) ∈ N and a map ϕ : (X, ωX) → (Y, ωY), the distortion of ϕ is defined as:

dis(ϕ) := sup_{x,x′∈X} |ωX(x, x′) − ωY(ϕ(x), ϕ(x′))|.

Given maps ϕ : (X, ωX) → (Y, ωY) and ψ : (Y, ωY) → (X, ωX), we define two co-distortion terms:

CX,Y(ϕ, ψ) := sup_{(x,y)∈X×Y} |ωX(x, ψ(y)) − ωY(ϕ(x), y)|,

CY,X(ψ, ϕ) := sup_{(y,x)∈Y×X} |ωY(y, ϕ(x)) − ωX(ψ(y), x)|.

Proposition 7. Let (X, ωX), (Y, ωY) ∈ N . Then,

dN(X, Y) = ½ inf{sup(dis(ϕ), dis(ψ), CX,Y(ϕ, ψ), CY,X(ψ, ϕ)) : ϕ : X → Y, ψ : Y → X any maps}.
Remark 8. Proposition 7 is analogous to a result of Kalton and Ostrovskii [75, Theo-
rem 2.1] where—instead of dN —one has the Gromov-Hausdorff distance between metric
spaces. An important remark is that in the Kalton-Ostrovskii formulation, there is only one
co-distortion term. When Proposition 7 is applied to metric spaces, the two co-distortion
terms become equal by symmetry, and thus the Kalton-Ostrovskii formulation is recovered.
But a priori, the lack of symmetry in the network setting requires us to consider both terms.
We thank Pascal Wild for pointing this out to us in an early manuscript.

The second network distance


Even though the definition of dN is very general, in some restricted settings it may
be convenient to consider a network distance that is easier to formulate. For example, for computational purposes it suffices to assume that we are computing distances between finite networks. Also, a reduction in computational cost is obtained if we restrict ourselves to
computing distortions of bijections instead of general correspondences. The next definition
arises from such considerations.
Definition 9 (The second network distance). Let (X, ωX), (Y, ωY) ∈ N be such that card(X) = card(Y). Then define:

dbN(X, Y) := ½ inf_ϕ sup_{x,x′∈X} |ωX(x, x′) − ωY(ϕ(x), ϕ(x′))|,

where ϕ : X → Y ranges over all bijections from X to Y .
Notice that dbN (X, Y ) = 0 if and only if X ∼ =s Y . Also, dbN satisfies symmetry and
triangle inequality. It turns out via Example 9 that dN and dbN agree on networks over two
nodes. However, the two notions do not agree in general. In particular, a minimal example
where dN 6= dbN occurs for three node networks, as we show in Remark 10.

Example 9 (Networks with two nodes). Let (X, ωX), (Y, ωY) ∈ FN where X = {x1, x2} and Y = {y1, y2}. Then we claim dN(X, Y) = dbN(X, Y). Furthermore, if X = N2([α δ; β γ]) and Y = N2([α′ δ′; β′ γ′]) (rows separated by semicolons, so that ωX(x1, x2) = δ and ωX(x2, x1) = β), then we have the explicit formula:

dN(X, Y) = ½ min(Γ1, Γ2), where

Γ1 = max(|α − α′|, |β − β′|, |δ − δ′|, |γ − γ′|),
Γ2 = max(|α − γ′|, |γ − α′|, |δ − β′|, |β − δ′|).

Details for this calculation are in §2.1.
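The explicit two-node formula can be cross-checked against brute-force minimization over all correspondences. A sketch (helper names are ours; the check uses small random integer weights):

```python
from itertools import product
import random

def correspondences(n, m):
    """All correspondences between {0,...,n-1} and {0,...,m-1}."""
    pairs = list(product(range(n), range(m)))
    for bits in product((0, 1), repeat=len(pairs)):
        R = [p for p, b in zip(pairs, bits) if b]
        if {x for x, _ in R} == set(range(n)) and {y for _, y in R} == set(range(m)):
            yield R

def dN_brute(wX, wY):
    return 0.5 * min(max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)
                     for R in correspondences(len(wX), len(wY)))

def dN_two_nodes(wX, wY):
    """The closed-form expression from Example 9; wX = [[alpha, delta], [beta, gamma]]."""
    (a, d), (b, g) = wX
    (a2, d2), (b2, g2) = wY
    G1 = max(abs(a - a2), abs(b - b2), abs(d - d2), abs(g - g2))
    G2 = max(abs(a - g2), abs(g - a2), abs(d - b2), abs(b - d2))
    return 0.5 * min(G1, G2)

random.seed(0)
for _ in range(50):
    wX = [[random.randint(0, 5) for _ in range(2)] for _ in range(2)]
    wY = [[random.randint(0, 5) for _ in range(2)] for _ in range(2)]
    assert dN_brute(wX, wY) == dN_two_nodes(wX, wY)
```

Here Γ1 corresponds to the identity bijection and Γ2 to the swap x1 ↦ y2, x2 ↦ y1.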

Remark 10 (A three-node example where dN ≠ dbN). Assume (X, ωX) and (Y, ωY) are two networks with the same cardinality. Then

dN(X, Y) ≤ dbN(X, Y).

The inequality holds because each bijection induces a correspondence, and we are minimizing over all correspondences to obtain dN. However, the inequality may be strict, as demonstrated by the following example. Let X = {x1, x2, x3} and let Y = {y1, y2, y3}. Define ωX(x1, x1) = ωX(x3, x3) = ωX(x1, x3) = ωX(x3, x1) = 1, ωX = 0 elsewhere, and define ωY(y3, y3) = 1, ωY = 0 elsewhere. In terms of matrices, X = N3(ΣX) and Y = N3(ΣY), where

ΣX = [1 0 1; 0 0 0; 1 0 1]   and   ΣY = [0 0 0; 0 0 0; 0 0 1].

(Note the symmetric entry ωX(x3, x1) = 1: it allows x1 and x3 to be matched to y3 simultaneously at zero distortion below.)

Define Γ(x, x′, y, y′) := |ωX(x, x′) − ωY(y, y′)| for x, x′ ∈ X, y, y′ ∈ Y . Let ϕ be any bijection. Then we have:

max_{x,x′∈X} Γ(x, x′, ϕ(x), ϕ(x′)) = max{Γ(x1, x3, ϕ(x1), ϕ(x3)), Γ(x1, x1, ϕ(x1), ϕ(x1)), Γ(x3, x3, ϕ(x3), ϕ(x3)), Γ(ϕ⁻¹(y3), ϕ⁻¹(y3), y3, y3)} = 1.

So dbN(X, Y) = ½. On the other hand, consider the correspondence

R = {(x1, y3), (x2, y2), (x3, y3), (x2, y1)}.

Then max_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)| = 0. Thus dN(X, Y) = 0 < dbN(X, Y).
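This example is small enough to check exhaustively. The sketch below (helper names are ours) uses the weights ωX(x1, x1) = ωX(x1, x3) = ωX(x3, x1) = ωX(x3, x3) = 1 and ωY(y3, y3) = 1, all other weights 0, and recovers dbN = ½ while dN = 0:

```python
from itertools import product, permutations

wX = [[1, 0, 1],
      [0, 0, 0],
      [1, 0, 1]]
wY = [[0, 0, 0],
      [0, 0, 0],
      [0, 0, 1]]

def dis(R):
    return max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)

# The second network distance: minimize over bijections only.
dbN = 0.5 * min(dis(list(enumerate(p))) for p in permutations(range(3)))

# The first network distance: minimize over all correspondences.
pairs = list(product(range(3), repeat=2))
dN = 0.5 * min(
    dis(R)
    for bits in product((0, 1), repeat=9)
    for R in [[p for p, b in zip(pairs, bits) if b]]
    if {x for x, _ in R} == set(range(3)) and {y for _, y in R} == set(range(3))
)

assert (dN, dbN) == (0.0, 0.5)
```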

Example 11 (Networks with three nodes). Let (X, ωX ), (Y, ωY ) ∈ FN , where we write
X = {x1 , x2 , x3 } and Y = {y1 , y2 , y3 }. Because we do not necessarily have dN = dbN
on three node networks by Remark 10, the computation of dN becomes more difficult than
in the two node case presented in Example 9. A certain reduction is still possible, which
we present next. Consider the list L consisting of the matrix forms of the minimal correspondences between X and Y : the six permutation matrices, together with the nine matrices in which one node xi is matched to exactly two nodes yj , yk while both remaining nodes of X are matched to the remaining node of Y . Here a 1 in position (i, j) of a matrix means that (xi, yj) belongs to the correspondence.
Now let R ∈ R(X, Y) be any correspondence. Then R contains a correspondence S ∈ R(X, Y) whose matrix form is listed in L. Thus dis(R) ≥ dis(S), since the supremum defining dis(R) runs over a larger set. It follows that dN(X, Y) = min{½ dis(S) : S ∈ R(X, Y) with matrix form listed in L}.
For an example of this calculation, let S denote the correspondence

S := {(x1, y1), (x2, y2), (x3, y3)},

represented by the 3 × 3 identity matrix. Then dis(S) is the maximum among the following:

|ωX (x1 , x1 ) − ωY (y1 , y1 )| |ωX (x1 , x2 ) − ωY (y1 , y2 )| |ωX (x1 , x3 ) − ωY (y1 , y3 )|


|ωX (x2 , x1 ) − ωY (y2 , y1 )| |ωX (x2 , x2 ) − ωY (y2 , y2 )| |ωX (x2 , x3 ) − ωY (y2 , y3 )|
|ωX (x3 , x1 ) − ωY (y3 , y1 )| |ωX (x3 , x2 ) − ωY (y3 , y2 )| |ωX (x3 , x3 ) − ωY (y3 , y3 )|.
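The reduction to the list L can also be checked computationally. Among all correspondences between two three-point sets, the inclusion-minimal ones are the only candidates for minimizing distortion; our own enumeration finds fifteen of them (the six bijections plus nine non-bijective ones). A sketch (helper names are ours):

```python
from itertools import product

pairs = list(product(range(3), repeat=2))

def is_corr(R):
    return {x for x, _ in R} == set(range(3)) and {y for _, y in R} == set(range(3))

all_corrs = [R for bits in product((0, 1), repeat=9)
             for R in [frozenset(p for p, b in zip(pairs, bits) if b)]
             if is_corr(R)]

# Minimal correspondences: removing any single pair destroys the correspondence property.
minimal = [R for R in all_corrs if all(not is_corr(R - {p}) for p in R)]
assert len(minimal) == 15

# Minimizing distortion over the minimal correspondences suffices.
wX = [[0, 1, 2], [3, 0, 1], [2, 2, 0]]
wY = [[0, 2, 1], [1, 0, 3], [2, 1, 0]]

def dis(R):
    return max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)

assert min(map(dis, minimal)) == min(map(dis, all_corrs))
```

The second assertion holds for any weights, since every correspondence contains a minimal one and distortion is monotone under inclusion.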

The following proposition provides an explicit connection between dN and dbN. We have not defined the A ≅w_I B notation, but for now it can be interpreted as dN(A, B) = 0 (cf. Definition 25). An illustration is also provided in Figure 1.1.

Proposition 12. Let (X, ωX), (Y, ωY) ∈ N . Then,

dN(X, Y) = inf{dbN(X′, Y′) : X′, Y′ ∈ N , X′ ≅w_I X, Y′ ≅w_I Y, and card(X′) = card(Y′)}.

The moral of the preceding proposition is that networks at 0 dN -distance live on a fiber
above their equivalence class (where equivalence is with respect to dN ), and the distance
between two networks can always be computed by computing dbN between two represen-
tatives having the same number of points. While this notion of picking representatives
with the same number of points may seem somewhat mysterious, we refer the reader to
Proposition 79 and Figure 1.18 for explicit details on how this process is carried out.

[Figure 1.1: three small networks X, Y , and Z with labeled edge weights.]

Figure 1.1: The two networks on the left have different cardinalities, but computing corre-
spondences shows that dN (X, Y ) = 1. Similarly one computes dN (X, Z) = 0, and thus
dN (Y, Z) = 1 by triangle inequality. On the other hand, the bijection given by the arrows
shows dbN (Y, Z) = 1. Applying Proposition 12 then recovers dN (X, Y ) = 1.

Remark 13 (Computational aspects of dN and dbN ). Even though dbN has a simpler for-
mulation than dN , computing dbN still turns out to be an NP-hard problem, as we discuss
in §1.10. Moreover, we show in Theorem 178 that computing dN is at least as hard as
computing dbN .
Instead of trying to compute dN , we will focus on finding network invariants that can
be computed easily. Finding invariants and stability results guaranteeing their validity as
proxies for dN is an overarching goal of this work.

1.3 Network models: the cycle networks and the SBM networks

A dissimilarity network is a network (X, AX) where AX is a map from X × X to R+, and AX(x, x′) = 0 if and only if x = x′. Neither symmetry nor triangle inequality is assumed. We denote the collection of all such networks as FN dis, CN dis, and N dis for the finite, compact, and general settings, respectively.

Example 14. Finite metric spaces and finite ultrametric spaces constitute obvious examples of dissimilarity networks. Recall that, in an ultrametric space (X, dX), we have the strong triangle inequality dX(x, x′) ≤ max{dX(x, x″), dX(x″, x′)} for all x, x′, x″ ∈ X. More interesting classes of dissimilarity networks arise by relaxing the symmetry and triangle inequality conditions of metric spaces.

Definition 10 (Finite reversibility). The reversibility ρX of a dissimilarity network (X, AX) is defined to be the following quantity:

ρX := sup_{x≠x′∈X} AX(x, x′) / AX(x′, x).

(X, AX) is said to have finite reversibility if ρX < ∞. Notice that ρX ≥ 1, with equality if and only if AX is symmetric.
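For a finite dissimilarity network, ρX is a maximum of finitely many ratios (the off-diagonal values are strictly positive, so no division by zero occurs). A minimal sketch (the helper name is ours):

```python
def reversibility(A):
    """rho_X = max over x != x' of A(x, x') / A(x', x), for a finite dissimilarity network.

    Off-diagonal entries of A are assumed strictly positive (A(x, x') = 0 iff x = x').
    """
    n = len(A)
    return max(A[i][j] / A[j][i] for i in range(n) for j in range(n) if i != j)

symmetric = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
asymmetric = [[0, 4, 1], [2, 0, 1], [1, 1, 0]]
assert reversibility(symmetric) == 1.0   # rho_X = 1 iff A_X is symmetric
assert reversibility(asymmetric) == 2.0  # the ratio 4/2 dominates
```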

Dissimilarity networks satisfying the symmetry condition, but not the triangle inequality, have a long history dating back to Fréchet’s thesis [55] and continuing with work by Pitcher and Chittenden [101], Niemytzki [91], Galvin and Shore [57, 58], and many others, as summarized in [67]. One of the interesting directions in this line of work was the development of a “local triangle inequality” and related metrization theorems [91], which has been continued more recently in [122].
Dissimilarity networks satisfying the triangle inequality, but not symmetry, include the special class of objects called directed metric spaces, which we define below.

Definition 11. Let (X, AX) be a dissimilarity network. Given any x ∈ X and r ∈ R+, the forward-open ball of radius r centered at x is

B+(x, r) := {x′ ∈ X : AX(x, x′) < r}.

The forward-open topology induced by AX is the topology on X generated by the collection {B+(x, r) : x ∈ X, r > 0}. The idea of forward-open balls is prevalent in the study of Finsler geometry; see [6, p. 149] for details.

Definition 12 (Directed metric spaces). A directed metric space or quasi-metric space is a dissimilarity network (X, νX) such that X is equipped with the forward-open topology induced by νX and νX : X × X → R+ satisfies:

νX(x, x′′) ≤ νX(x, x′) + νX(x′, x′′) for all x, x′, x′′ ∈ X.

The function νX is called a directed metric or quasi-metric on X. Notice that compact directed metric spaces constitute a subfamily of CN dis.

Directed metric spaces with finite reversibility were studied in [108], and constitute important examples of networks that are strictly non-metric. More specifically, the authors of [108] extended notions of Hausdorff distance and Gromov-Hausdorff distance to the setting of directed metric spaces with finite reversibility, and our network distance dN subsumes this theory while extending it to even more general settings.

Remark 15 (Finsler metrics). An interesting class of directed metric spaces arises from studying Finsler manifolds. A Finsler manifold (M, F) is a smooth, connected manifold M equipped with an asymmetric norm F (called a Finsler function) defined on each tangent space of M [6]. A Finsler function induces a directed metric dF : M × M → R+ as follows: for each x, x′ ∈ M,

dF(x, x′) := inf { ∫_a^b F(γ(t), γ̇(t)) dt : γ : [a, b] → M a smooth curve joining x and x′ }.
Finsler metric spaces have received interest in the applied literature. In [104], the authors prove that the distance function of a Finsler metric space with reversible geodesics (i.e. the reverse curve γ′(t) := γ(1 − t) of any geodesic γ : [0, 1] → M is also a geodesic) is a weighted quasi-metric [104, p. 2]. Such objects have been shown to be essential in biological sequence comparison [115].

1.3.1 The directed circles


In this section, we explicitly construct an infinite network in N dis , and a family of
infinite networks in CN dis .

The general directed circle


First we construct an asymmetric network in N dis . To motivate this construction, recall
from the classification of topological 1-manifolds that any connected, closed topological
1-manifold is homeomorphic to the circle S1 . So as a first construction of a quasi-metric
space, it is reasonable to adopt S1 as our model and endow it with a quasi-metric weight
function.
First define the set ~S1 := {e^{iθ} ∈ C : θ ∈ [0, 2π)}. For any α, β ∈ [0, 2π), define ~d(α, β) := (β − α) mod 2π, with the convention ~d(α, β) ∈ [0, 2π). Then ~d(α, β) is the counterclockwise geodesic distance along the unit circle from e^{iα} to e^{iβ}. As such, it satisfies the triangle inequality and vanishes on a pair (e^{iθ1}, e^{iθ2}) if and only if θ1 = θ2. Next, for each e^{iθ1}, e^{iθ2} ∈ ~S1, define

ω~S1(e^{iθ1}, e^{iθ2}) := ~d(θ1, θ2).
To finish the construction, we specify ~S1 to have the discrete topology. Clearly this is
first countable and makes ω~S1 continuous, but the resulting network will not be compact.
Hence it is natural to ask if there exists a coarser topology that we can place on ~S1 .
We claim that a coarser topology does not work to make (~S1, ω~S1) fit the framework of N. To see why, let α ∈ [0, 2π). Suppose ω~S1 is continuous with respect to some topology on ~S1, to be determined. Fix 0 < ε ≪ 2π, and define V := ω~S1^{−1}[(−ε, ε)]. Then V is open in the product topology, and in particular contains (e^{iα}, e^{iα}). Since V is a union of open rectangles, there exists an open set U ⊆ ~S1 such that (e^{iα}, e^{iα}) ∈ U × U ⊆ V. Suppose towards a contradiction that U ≠ {e^{iα}}. Then there exists e^{iβ} ∈ U for some β ≠ α. Then ω~S1(e^{iα}, e^{iβ}) ∈ (0, ε). But by the definition of ω~S1, we must have ω~S1(e^{iβ}, e^{iα}) ∈ [2π − ε, 2π), which contradicts ω~S1(U, U) ⊆ (−ε, ε).

Definition 13. We define the directed unit circle to be (~S1 , ω~S1 ) with the discrete topology.
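The weight ω~S1 is straightforward to compute numerically. The sketch below (ours, not from the thesis) also illustrates the obstruction used in the argument above: for distinct points, the two directed weights always sum to 2π, so one of them being small forces the other to be nearly 2π.

```python
import math

def omega_S1(theta1, theta2):
    """Weight of the directed unit circle: the counterclockwise arc length
    from e^{i theta1} to e^{i theta2}, i.e. (theta2 - theta1) mod 2*pi,
    valued in [0, 2*pi)."""
    return (theta2 - theta1) % (2 * math.pi)

# Extreme asymmetry: weights in the two directions between distinct points
# sum to exactly 2*pi.
t1, t2 = 0.0, 0.1
assert math.isclose(omega_S1(t1, t2) + omega_S1(t2, t1), 2 * math.pi)
assert omega_S1(t1, t2) < 0.2 and omega_S1(t2, t1) > 2 * math.pi - 0.2
```

This is exactly the behavior that rules out continuity of ω~S1 in any topology coarser than the discrete one.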

The directed circles with finite reversibility
Now we define a family of directed circles parametrized by reversibility. Unlike the
construction in §1.3.1, these directed networks belong to the family CN dis . An illustration
is provided in Figure 1.2.
Recall from §1.3.1 that for α, β ∈ [0, 2π), we wrote ~d(α, β) to denote the counterclockwise geodesic distance along the unit circle from e^{iα} to e^{iβ}. Fix ρ ≥ 1. For each e^{iθ1}, e^{iθ2} ∈ ~S1, define

ω~S1,ρ(e^{iθ1}, e^{iθ2}) := min( ~d(θ1, θ2), ρ·~d(θ2, θ1) ).

In particular, ω~S1,ρ has reversibility ρ (cf. Definition 10).


Finally, we equip ~S1 with the standard subspace topology generated by the open balls
in C. In this case, ~S1 is compact and first countable. It remains to check that ω~S1,ρ is
continuous.

Proposition 16. ω~S1,ρ : ~S1 × ~S1 → R is continuous.

Proof of Proposition 16. It suffices to show that the preimages of basic open sets under ω~S1,ρ are open. Let (a, b) be an open interval in R, and let (e^{iα}, e^{iβ}) ∈ ω~S1,ρ^{−1}[(a, b)], where α, β ∈ [0, 2π). There are three cases: (1) α < β, (2) β < α, or (3) α = β.
Suppose first that α < β. There are two subcases: either ω~S1,ρ(e^{iα}, e^{iβ}) = ~d(α, β), or ω~S1,ρ(e^{iα}, e^{iβ}) = ρ·~d(β, α).
Fix r > 0 to be determined later, but small enough so that B(α, r) ∩ B(β, r) = ∅. Let γ ∈ B(α, r), δ ∈ B(β, r). Then ~d(γ, δ) ∈ B(~d(α, β), 2r). Also,

|ρ·~d(γ, δ) − ρ·~d(α, β)| = ρ·|~d(γ, δ) − ~d(α, β)| < 2rρ.

Now r can be made arbitrarily small, so that for any γ ∈ B(α, r) and any δ ∈ B(β, r), we have ω~S1,ρ(e^{iγ}, e^{iδ}) ∈ (a, b). It follows that (e^{iα}, e^{iβ}) is contained in an open set contained inside ω~S1,ρ^{−1}[(a, b)]. An analogous proof shows this to be true for the β < α case.
Next suppose α = β. Fix 0 < r < b/(2ρ). We need to show ω~S1,ρ(B(α, r), B(α, r)) ⊆ (a, b). Note that 0 ∈ (a, b). Let γ, δ ∈ B(α, r). There are three subcases. If γ = δ, then ω~S1,ρ(e^{iγ}, e^{iδ}) = 0 ∈ (a, b). If ~d(γ, δ) < 2r, then ω~S1,ρ(e^{iγ}, e^{iδ}) < 2r < b. Finally, suppose ~d(γ, δ) ≥ 2r. Then we must have ~d(δ, γ) < 2r, so ω~S1,ρ(e^{iγ}, e^{iδ}) ≤ ρ·~d(δ, γ) < 2rρ < b. Thus for any γ, δ ∈ B(α, r), we have ω~S1,ρ(e^{iγ}, e^{iδ}) ∈ (a, b).
It follows that ω~S1,ρ^{−1}[(a, b)] is open. This proves the claim.

We summarize the preceding observations in the following:

Definition 14. Let ρ ∈ [1, ∞). We define the directed unit circle with reversibility ρ to be
(~S1 , ω~S1,ρ ). This is a compact, asymmetric network in CN dis .
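A quick numerical sanity check (ours; the grid-based helper names are our own) that ω~S1,ρ attains reversibility ρ: on a finite grid of angles, the largest ratio of opposite directed weights equals ρ.

```python
import math

def omega_S1_rho(theta1, theta2, rho):
    """Weight of the directed circle with reversibility rho: the minimum of the
    counterclockwise distance and rho times the clockwise distance."""
    d = lambda a, b: (b - a) % (2 * math.pi)
    return min(d(theta1, theta2), rho * d(theta2, theta1))

def empirical_reversibility(rho, n=200):
    """Max of omega(a, b) / omega(b, a) over a uniform grid of n angles.

    For pairs far enough apart in the counterclockwise direction, the min in
    both directed weights is realized by the clockwise branch, and the ratio
    equals rho exactly; no pair exceeds rho.
    """
    thetas = [2 * math.pi * k / n for k in range(n)]
    return max(
        omega_S1_rho(a, b, rho) / omega_S1_rho(b, a, rho)
        for a in thetas for b in thetas if a != b
    )

assert math.isclose(empirical_reversibility(3.0), 3.0)
```

This matches the remark after the definition of ω~S1,ρ: the construction is calibrated so that its reversibility (Definition 10) is exactly ρ.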

This asymmetric network provides us with concrete examples of ε-approximations (Definition 22), for any ε > 0. To see this, fix any n ∈ N, and consider the directed circle network on n nodes with reversibility ρ obtained by writing

~S1n := {e^{2πik/n} ∈ C : k ∈ {0, 1, . . . , n − 1}},

and defining ωn,ρ to be the restriction of ω~S1,ρ to this set. The pair (~S1n, ωn,ρ) is the network thus obtained. An illustration of ~S1 and ~S1n for n = 6 is provided in Figure 1.2.

Theorem 17. As n → ∞, the sequence of finite dissimilarity networks (~S1n , ωn,ρ ) limits to
the dissimilarity network (~S1 , ω~S1,ρ ) in the sense of dN .

Proof of Theorem 17. Let ε > 0, and let n ∈ N be such that 2π/n < ε. It suffices to show that dN((~S1, ω~S1,ρ), (~S1n, ωn,ρ)) < ε. Define a correspondence between ~S1 and ~S1n as follows:

R := { (e^{iθ}, e^{2πik/n}) : θ ∈ (2πk/n − ε/ρ, 2πk/n + ε), k ∈ {0, 1, 2, . . . , n − 1} }.

Essentially this is the same as taking ε-balls around each e^{2πik/n}, except that the reversibility parameter skews one side of the ε-ball. Next let 0 ≤ θ1 ≤ θ2 < 2π, and let j, k ∈ {0, 1, . . . , n − 1} be such that θ1 ∈ (2πj/n − ε/ρ, 2πj/n + ε) and θ2 ∈ (2πk/n − ε/ρ, 2πk/n + ε). Suppose first that k = j. Then

min( ~d(θ1, θ2), ~d(θ2, θ1) ) ≤ ε + ε/ρ, and hence max( ω~S1,ρ(e^{iθ1}, e^{iθ2}), ω~S1,ρ(e^{iθ2}, e^{iθ1}) ) ≤ ρ(ε + ε/ρ) = ρε + ε.

As ε → 0, this quantity tends to zero, so ω~S1,ρ(e^{iθ1}, e^{iθ2}) → 0. The other cases follow from similar observations; the key idea is that as ε → 0, the ω~S1,ρ value between any two points on a “skewed ε-ball” also tends to 0.

Remark 18. Finite reversibility is critical when defining directed circles on n nodes. Without this condition, correspondences as above lead to terms like the following:

max( |ω~S1(e^{iθ1}, e^{iθ2}) − ω~S1(e^{2πik/n}, e^{2πik/n})|, |ω~S1(e^{iθ2}, e^{iθ1}) − ω~S1(e^{2πik/n}, e^{2πik/n})| ) ≈ 2π.

The problem here is that as ω~S1(e^{iθ1}, e^{iθ2}) → 0, we adversely have ω~S1(e^{iθ2}, e^{iθ1}) → 2π. Indeed, one of our later results (cf. §1.8) shows that CN is complete. Because (~S1, ω~S1) ∉ CN, it follows that there cannot be a sequence of finite networks converging to (~S1, ω~S1).

Figure 1.2: The directed circle (~S1, ω~S1), the directed circle on 6 nodes (~S16, ω~S16), and the directed circle with reversibility ρ, for some ρ ∈ [1, ∞). Traveling in a clockwise direction is possible only in the directed circle with reversibility ρ, but this incurs a penalty modulated by ρ.

Remark 19 (Directed circle with finite reversibility—forward-open topology version). Instead of using the subspace topology generated by the standard topology on C, we can also endow (~S1, ω~S1,ρ) with the forward-open topology generated by ω~S1,ρ. The open balls in this topology are precisely the open balls in the subspace topology induced by the standard topology, the only adjustment being the “center” of each ball. The directed metric space (~S1, ω~S1,ρ) equipped with the forward-open topology is another example of a compact, asymmetric network in CN dis.

1.3.2 The finite cycle networks


For each n ∈ N, let (Xn , En , WEn ) denote the weighted graph with vertex set Xn :=
{x1 , x2 , . . . , xn }, edge set En := {(x1 , x2 ), (x2 , x3 ), . . . , (xn−1 , xn ), (xn , x1 )}, and edge
weights WEn : En → R given by writing WEn (e) = 1 for each e ∈ En . Next let ωGn :
Xn × Xn → R denote the shortest path distance induced on Xn × Xn by WEn . Then we
write Gn := (Xn , ωGn ) to denote the network with node set Xn and weights given by ωGn .
Note that ωGn (x, x) = 0 for each x ∈ Xn . See Figure 1.3 for an example.
We say that Gn is the cycle network of length n. Cycle networks are highly asymmetric: for every consecutive pair of nodes (xi, xi+1) in Gn (with indices taken mod n, so that xn+1 = x1), we have ωGn(xi, xi+1) = 1, whereas ωGn(xi+1, xi) = diam(Gn) = n − 1, which is much larger than 1 when n is large.
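The weight matrix of Gn has a closed form: the shortest directed path from xi to xj winds counterclockwise, so its length is the index difference mod n. A Python sketch (ours; 0-indexed nodes, unlike the 1-indexed notation above):

```python
def cycle_network(n):
    """Weight matrix of the cycle network G_n: shortest-path distance in the
    directed cycle x_0 -> x_1 -> ... -> x_{n-1} -> x_0 with unit edge weights.

    The counterclockwise index difference mod n is the shortest directed
    path length, since all edges point the same way around the cycle.
    """
    return [[(j - i) % n for j in range(n)] for i in range(n)]

G6 = cycle_network(6)
# One step "forward" costs 1; the reverse trip must go all the way around.
assert all(G6[i][(i + 1) % 6] == 1 for i in range(6))
assert all(G6[(i + 1) % 6][i] == 5 for i in range(6))
assert max(max(row) for row in G6) == 5  # diam(G_6) = n - 1
```

The matrix produced for n = 6 agrees with the weight matrix shown in Figure 1.3.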

      x1  x2  x3  x4  x5  x6
x1     0   1   2   3   4   5
x2     5   0   1   2   3   4
x3     4   5   0   1   2   3
x4     3   4   5   0   1   2
x5     2   3   4   5   0   1
x6     1   2   3   4   5   0

Figure 1.3: A cycle network on 6 nodes, along with its weight matrix. Note that the weights are highly asymmetric.

1.3.3 The network stochastic block model

We now describe a generative model for random networks, based on the popular stochastic block model for sampling random graphs [2]. The network SBM we describe here is built from Gaussian distributions. However, the construction can easily be adjusted to work with other distributions.
Fix a number of communities N ∈ N. For 1 ≤ i, j ≤ N, fix a mean µij and a variance σij². This collection G := {N(µij, σij²) : 1 ≤ i, j ≤ N} of N² independent Gaussian distributions comprises the network SBM.
To sample a random network (X, ωX) of n nodes from this SBM, start by fixing ni ∈ N, 1 ≤ i ≤ N, such that Σi ni = n. For 1 ≤ i ≤ N, let Xi be a set with ni points. Define X := X1 ∪ · · · ∪ XN. Next sample each weight as ωX(x, x′) ∼ N(µij, σij²), where x ∈ Xi and x′ ∈ Xj. An illustration of this process is provided in Figure 1.4.
A minimalistic way to obtain a measure network from the preceding construction is to
equip the pair (X, ωX ) with the uniform measure µX that assigns a mass of 1/n to each
point.
The justification for defining a network SBM this way comes from the understanding of ε-systems that we develop in §1.6.1.
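The sampling procedure above can be sketched in a few lines of Python (ours, using only the standard library; the function name `sample_sbm` and its encoding of communities as index labels are our own conventions):

```python
import random

def sample_sbm(sizes, mu, sigma2, seed=0):
    """Sample a network (X, omega_X) from a network SBM (a sketch).

    sizes[i] is the number of nodes in community i; mu[i][j] and sigma2[i][j]
    parametrize the Gaussian N(mu_ij, sigma2_ij) from which the weight from a
    node in community i to a node in community j is drawn, independently for
    each ordered pair of nodes.
    """
    rng = random.Random(seed)
    labels = [i for i, sz in enumerate(sizes) for _ in range(sz)]
    n = len(labels)
    omega = [
        [rng.gauss(mu[labels[a]][labels[b]], sigma2[labels[a]][labels[b]] ** 0.5)
         for b in range(n)]
        for a in range(n)
    ]
    return labels, omega

labels, omega = sample_sbm([3, 2], mu=[[0.0, 10.0], [20.0, 1.0]],
                           sigma2=[[0.01, 0.01], [0.01, 0.01]])
assert len(omega) == 5 and all(len(row) == 5 for row in omega)
# Cross-community weights concentrate near the corresponding means.
assert 9.0 < omega[0][4] < 11.0 and 19.0 < omega[4][0] < 21.0
```

Equipping the sampled pair with the uniform measure assigning mass 1/n to each point, as described above, turns it into a measure network.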

µ =
  1     10    11    12    13
  20    1.75  14    15    16
  21    24    2.5   17    18
  22    25    27    3.25  19
  23    26    28    29    4

σ² =
  1  3  3  3  3
  3  1  3  3  3
  3  3  1  3  3
  3  3  3  1  3
  3  3  3  3  1

Figure 1.4: A network SBM on 50 nodes, split into 5 communities, along with the matrices of means and variances. The deepest blue corresponds to values ≈ 1, and the deepest yellow corresponds to values ≈ 29.

1.4 Persistent homology on networks: Simplicial constructions


Prior to developing the notion of persistent path homology, we first explored related ideas in the simplicial setting, especially with regard to capturing information from directed networks. We explain these contributions in the current section. An overview is presented in Figure 1.5.


Figure 1.5: A schematic of some of the simplicial constructions on directed networks.


F is the collection of filtered simplicial complexes. Dgm is the collection of persistence
diagrams. We study the Rips (R) and Dowker (Dsi , Dso ) filtrations, each of which takes a
network as input and produces a filtered simplicial complex. s and t denote the network
transformations of symmetrization (replacing a pair of weights between two nodes by the
maximum weight) and transposition (swapping the weights between pairs of nodes). R is
insensitive to both s and t. But Dsi ◦ t = Dso , Dso ◦ t = Dsi , and in general, Dsi and Dso
are not invariant under t (Theorem 50).

1.4.1 Background on persistent homology
Homology is a classical construction which assigns a k-dimensional signature to a topological space, where k ranges over the nonnegative integers. When the space is a simplicial complex, the resulting homology theory is called simplicial homology. In practice, simplicial homology is readily computable via matrix operations.
Datasets “in the wild” are typically discrete objects. More specifically, our measurement and recording technologies are discrete, and therefore the datasets that we curate from a process are necessarily discrete. A priori, these datasets are equipped with the uninteresting discrete topology. However, there are several well-studied [47] methods for imposing an artificial topology on a discrete dataset. Here are two examples.

Example 20 (Vietoris-Rips complexes). Let (X, dX) be a metric space. Given a scale parameter δ ≥ 0, the Vietoris-Rips complex at scale δ is

VRδ(X) := {σ ⊆ X : σ finite, nonempty, max_{x,x′∈σ} dX(x, x′) ≤ δ}.

Example 21 (Čech complexes). Let (X, dX) be a metric space. Given a scale parameter δ ≥ 0, the Čech complex at scale δ is

Čδ(X) := {σ ⊆ X : σ finite, nonempty, ∃p ∈ X such that max_{x∈σ} dX(x, p) ≤ δ}.
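For a small finite metric space, both complexes can be enumerated by brute force. The Python sketch below (ours, not from the thesis) implements Examples 20 and 21 literally; note that, following Example 21, the Čech centers p range over X itself.

```python
from itertools import combinations

def vietoris_rips(points, d, delta):
    """Simplices of VR_delta: nonempty subsets of diameter at most delta
    (brute force over all subsets; only for small finite metric spaces)."""
    return [s for k in range(1, len(points) + 1)
            for s in combinations(points, k)
            if all(d(x, y) <= delta for x, y in combinations(s, 2))]

def cech(points, d, delta):
    """Simplices of the Cech complex at scale delta: subsets admitting some
    p in X within distance delta of every member, as in Example 21."""
    return [s for k in range(1, len(points) + 1)
            for s in combinations(points, k)
            if any(all(d(x, p) <= delta for x in s) for p in points)]

# Three points on a line: 0, 1, 2 with d(x, y) = |x - y|.
pts, d = [0, 1, 2], lambda x, y: abs(x - y)
assert (0, 1, 2) not in vietoris_rips(pts, d, 1)   # diameter 2 > 1
assert (0, 1, 2) in cech(pts, d, 1)                # p = 1 is within 1 of all
assert (0, 1, 2) in vietoris_rips(pts, d, 2)
```

The example shows the standard containment phenomenon: at a given scale the Čech complex can contain simplices that the Vietoris-Rips complex only acquires at a larger scale.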

Both of these constructions are accompanied by interesting theorems. For Vietoris-Rips complexes, the following results are known (see [3] for more details).

Theorem 22 (Hausmann’s theorem, [69]). Let M be a compact Riemannian manifold. Then for sufficiently small r > 0, we have the homotopy equivalence M ≃ VRr(M).

Theorem 23 (Latschev’s theorem, [79]). Let M be a compact Riemannian manifold. Then for sufficiently small r, η > 0, we have M ≃ VRr(X) whenever X is a metric space with dGH(X, M) < η. In particular, X can be a sufficiently dense, finite sample of M.

The Čech complex of a metric space X at scale δ coincides with the nerve simplicial
complex when we take a cover of X by δ-balls:

Definition 15 (Nerve of a cover). Let X be a topological space, and let A = {Ai}i∈I be an open cover of X indexed by I. The nerve of A is the simplicial complex N(A) := {σ ∈ pow(I) : σ is finite, nonempty, and ∩i∈σ Ai ≠ ∅}.

Theorem 24 (Nerve theorem, [68] Corollary 4G.3). Let X be a paracompact space (every open cover admits a locally finite open refinement), and let A be an open cover such that every nonempty, finite intersection of sets in A is contractible. Then X ≃ |N(A)|.

Returning to the topic of imposing an artificial topology on a dataset via one of these constructions, the following question comes to mind: what is the “correct” scale parameter to use when defining either the Vietoris-Rips or the Čech complexes? The theory of persistent homology (PH) enables the user to bypass this consideration and instead view the homological signatures at a range of scale parameters, along with information about how signatures from one resolution “include” into the signatures at another resolution [56, 103, 49, 123]. The essential idea is to fix a method for “topologizing” a dataset (e.g. the Vietoris-Rips or Čech constructions), choose a collection of scale parameters 0 ≤ δ0 < δ1 < . . . < δn (e.g. choose all the scales at which new simplices are added), and then apply the (simplicial) homology functor with coefficients in a field. The nested simplicial complexes and their inclusion maps

· · · ↪ Kδi(X) ↪ Kδi+1(X) ↪ Kδi+2(X) ↪ · · ·

form a sequence of vector spaces with linear maps:

· · · → H•(Kδi(X)) → H•(Kδi+1(X)) → H•(Kδi+2(X)) → · · · .

Here H• denotes homology in a given dimension • ranging over Z+. The entire collection {Kδ(X) ↪ Kδ′(X)}δ≤δ′ is known as a simplicial filtration or a filtered simplicial complex, and the collection of vector spaces with linear maps is a persistent vector space. More precisely, we have the following definition:

Definition 16. A persistent vector space V is a family {ν_{δ,δ′} : V^δ → V^{δ′}}_{δ≤δ′ ∈ R} of vector spaces and linear maps such that: (1) ν_{δ,δ} is the identity map for any δ ∈ R, and (2) ν_{δ,δ′′} = ν_{δ′,δ′′} ∘ ν_{δ,δ′} whenever δ ≤ δ′ ≤ δ′′.

A classification result in [21, §5.2] shows that at least in “nice” settings, a certain object called a persistence diagram/barcode ([26]) is a full invariant of a persistent vector space. When it is well-defined, a persistence diagram is essentially a list of the topological signatures of the dataset along with the ranges of scale parameters along which each signature persists. In typical data analysis use cases, the barcode or diagram is simply the output of applying persistent homology to a dataset.
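For intuition about what a barcode records, the 0-dimensional case of a Vietoris-Rips filtration can be computed with a union-find structure rather than the general matrix algorithm. The sketch below (ours; all names are our own) uses the standard fact that, processing edges in increasing order as in Kruskal's algorithm, the finite death times of H0 classes are exactly the weights of a minimum spanning tree.

```python
def h0_barcode(points, d):
    """0-dimensional Vietoris-Rips barcode of a finite metric space (a sketch).

    Every vertex is born at scale 0; a class dies when its component merges
    into another one. Kruskal's algorithm over the sorted edges yields the
    finite death times, and one class survives to infinity.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (d(points[i], points[j]), i, j)
        for i in range(len(points)) for j in range(i + 1, len(points))
    )
    bars = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, w))           # a component dies at scale w
    bars.append((0.0, float("inf")))        # one component never dies
    return sorted(bars, key=lambda b: b[1])

# Two clusters on a line: {0, 1} and {10, 11}.
bars = h0_barcode([0, 1, 10, 11], lambda x, y: abs(x - y))
assert bars == [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, float("inf"))]
```

The long bar (0, 9) reflects the two-cluster structure: the clusters persist as separate components until the scale reaches the inter-cluster gap.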
Persistent homology computations are matrix operations and hence computable, and there are numerous software packages currently available for efficient PH computations. PH computation is theoretically justified by certain stability theorems, one of which is the following (an early version for Vietoris-Rips filtrations on finite metric spaces appeared in [25]):

Theorem 25 (PH stability for metric spaces, [27] Theorem 5.2). Let X, Y be two totally
bounded metric spaces, and let k ∈ Z+ . Let Dgm•k denote the k-dimensional persistence
diagram of either the Vietoris-Rips or Čech filtrations. Then we have:

dB (Dgm•k (X), Dgm•k (Y )) ≤ 2dGH (X, Y ).

Here dB is a certain metric on persistence diagrams called the bottleneck distance. It is
essentially a matching metric that can be computed via the Hungarian algorithm. Stability
results of this form show that PH outputs change in a controlled way when the input dataset
is perturbed. This provides the theoretical justification for using PH in data analysis.
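As an illustration of the matching definition of dB (not of the polynomial-time algorithm alluded to above), here is a brute-force Python sketch for very small diagrams; all names are ours. Points may be matched to each other at l-infinity cost, or to the diagonal at cost equal to half their persistence.

```python
from itertools import permutations

def bottleneck(D1, D2):
    """Bottleneck distance between two tiny persistence diagrams (brute force).

    Each diagram is padded with diagonal slots so unmatched points can be
    absorbed; we minimize, over all matchings, the maximum pair cost.
    Exhaustive search: only suitable for very small diagrams.
    """
    diag = lambda p: (p[1] - p[0]) / 2.0
    dist = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    n, m = len(D1), len(D2)
    P1 = list(D1) + ["diag"] * m
    P2 = list(D2) + ["diag"] * n
    best = float("inf")
    for perm in permutations(range(n + m)):
        cost = 0.0
        for i, j in enumerate(perm):
            p, q = P1[i], P2[j]
            if p == "diag" and q == "diag":
                continue
            elif p == "diag":
                cost = max(cost, diag(q))
            elif q == "diag":
                cost = max(cost, diag(p))
            else:
                cost = max(cost, dist(p, q))
        best = min(best, cost)
    return best

assert bottleneck([(0.0, 4.0)], [(0.0, 5.0)]) == 1.0
assert bottleneck([(0.0, 4.0)], []) == 2.0      # matched to the diagonal
```

Production implementations avoid the factorial search via binary search over candidate costs combined with bipartite matching; the sketch above is only meant to make the definition concrete.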
Finally, we introduce another definition that will be of use to us: the interleaving distance dI is an extended pseudometric between persistent vector spaces (cf. §3.1.2). Even when persistent vector spaces do not have well-defined persistence diagrams, we can refer to the interleaving distance between the persistent vector spaces.

1.4.2 Related literature on persistent homology of directed networks


The overview of persistent homology described in the previous section is limited to
methods that accept metric spaces as input datasets. Some of the extant “network” PH
literature considers graph datasets that actually satisfy metric properties [80, 76], and these
fit into the metric pipeline described above. Some more general approaches for obtaining
persistence diagrams from networks are followed in [71, 23, 60, 99]. In all these cases, the
networks are required to be symmetric.
It was pointed out in [27] that Vietoris-Rips and Čech complexes could be defined for dissimilarity spaces satisfying the symmetry property (but relaxing the other properties of a metric space). Another key contribution of this paper was in showing that persistence diagrams were well defined not just for finite metric spaces, but also totally bounded metric spaces (in particular, compact metric spaces). A notable contribution for finite directed networks was made in [118], where Turner considered several generalizations of Vietoris-Rips complexes to the directed setting and proved their stability with respect to the finite version of dN (while dN was referred to as a “correspondence distance” in [118], we adhere to the dN formulation that had already appeared in [22]). In particular, Turner considered ordered tuple complexes, which are morally quite different from extant simplicial constructions. These OT complexes, also known as directed Rips/flag complexes, had been used in the non-persistent setting in [102]. An efficient implementation of persistent homology using directed flag complexes has also been developed recently by Lütgehetmann [83].
As stated earlier, one of the primary goals motivating this thesis was to develop the notion of persistent path homology (PPH). Independently, we also studied a particular simplicial construction called the Dowker complex. This complex had already appeared in the symmetric setting in [27] with a different motivation, but in [37] we studied its behavior thoroughly in the setting of directed networks, where it proved to be quite powerful. In particular, we developed several experiments where Dowker persistence performed significantly better than its directed Vietoris-Rips counterpart. Regardless, PPH still appears to be truly sensitive to asymmetry in more ways than even the Dowker complex, and thus seems to be a natural candidate when studying directed networks.
Interestingly, based on our work in [37] and [40], the Dowker complex is a directed
analogue of the Čech complex, and there is evidence to suggest that PPH is the Čech

analogue of the directed Rips/flag/OT complex. In this way, our work complements the
contributions of [118].
We make a final remark to situate our work in the existing literature. In a 2018 update
to [118], Turner asks if any of the directed generalizations of the Vietoris-Rips complex
hold in the setting of infinite networks (more precisely, the question is about infinite “set-
function” pairs, which are just networks in our terminology). The answer is “yes”: as
we had already shown in [34], by the framework we develop for infinite networks, and in
particular by our notion of ε-systems, all of these directed generalizations of the Vietoris-
Rips and Čech complex constructions are well-defined for compact (in particular, infinite)
networks.

1.4.3 The Vietoris-Rips filtration of a network


Following the definition for metric spaces, we define the Vietoris-Rips complex for a network (X, ωX) ∈ N as follows:

RδX := {σ ∈ pow(X) : σ finite, max_{x,x′∈σ} ωX(x, x′) ≤ δ}.

To any network (X, ωX), we may associate the Vietoris-Rips filtration {RδX ↪ Rδ′X}δ≤δ′. We denote the k-dimensional persistent vector space associated to this filtration by PVecRk(X). It is not at all clear that the corresponding persistence diagram DgmRk(X) is well-defined in general, although it is well-defined when (X, ωX) ∈ FN (the finite case is easy to see). The fact that the persistence diagram is well-defined when (X, ωX) ∈ CN is presented in Theorem 86, and is a consequence of the machinery we develop for dN.
The Vietoris-Rips persistence diagram, when defined, is stable to small perturbations of the input data:

Proposition 26. Let (X, ωX), (Y, ωY) ∈ CN, and let k ∈ Z+. Then

dI(PVecRk(X), PVecRk(Y)) ≤ 2dN(X, Y).

We omit the proof because it is similar to that of Proposition 29, which we will prove in
detail. We also remark that we obtain stability for persistence diagrams with respect to the
bottleneck distance in Corollary 177. More specifically, Corollary 177 states that we have:

dB(DgmRk(X), DgmRk(Y)) ≤ 2dN(X, Y).

Remark 27. The preceding result serves a dual purpose: (1) it shows that the Vietoris-Rips persistence diagram is robust to noise in input data, and (2) it shows that instead of computing the network distance between two networks, one can compute the bottleneck distance between their Vietoris-Rips persistence diagrams as a suitable proxy. The advantage to computing bottleneck distance is that it can be done in polynomial time (see [52]), whereas computing dN is NP-hard in general. We remind the reader that the problem of computing dN includes the problem of computing the Gromov-Hausdorff distance between finite metric spaces, which is an NP-hard problem [105]. We remark that the idea of computing
Vietoris-Rips persistence diagrams to compare finite metric spaces first appeared in [25],
and moreover, that Proposition 26 is an extension of Theorem 3.1 in [25].
The Vietoris-Rips filtration in the setting of symmetric networks has been used in [71,
23, 60, 99], albeit without addressing stability results.
We now introduce a definition that will help us gauge the performance of various PH
methods on directed networks.
Definition 17 (Symmetrization and Transposition). Define the max-symmetrization map s : N → N by (X, ωX) ↦ (X, ω̂X), where for any network (X, ωX), we define ω̂X : X × X → R as follows:

ω̂X(x, x′) := max(ωX(x, x′), ωX(x′, x)), for x, x′ ∈ X.

Also define the transposition map t : N → N by (X, ωX) ↦ (X, ω⊤X), where for any (X, ωX) ∈ N, we define ω⊤X(x, x′) := ωX(x′, x) for x, x′ ∈ X. For convenience, we denote X⊤ := t(X) for any network X.

Remark 28 (Vietoris-Rips is insensitive to asymmetry). A critical weakness of the Vietoris-Rips complex construction is that it is not sensitive to asymmetry. To see this, consider the symmetrization map s defined in Definition 17, and let (X, ωX) ∈ FN. Now for any σ ∈ pow(X), we have max_{x,x′∈σ} ωX(x, x′) = max_{x,x′∈σ} ω̂X(x, x′). It follows that for each δ ≥ 0, the Rips complexes of (X, ωX) and (X, ω̂X) = s(X, ωX) are equal, i.e. R = R ∘ s. Thus the Rips persistence diagrams of the original and max-symmetrized networks are equal.
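Remark 28 can be verified computationally. The following Python sketch (ours, not the thesis's code) implements the Rips complex of a finite network together with the max-symmetrization map s, and checks R = R ∘ s at every relevant scale:

```python
from itertools import combinations

def rips_network(omega, delta):
    """Vietoris-Rips complex of a finite network at scale delta: subsets
    sigma whose weights over all ordered pairs in sigma are at most delta."""
    n = len(omega)
    return {s for k in range(1, n + 1) for s in combinations(range(n), k)
            if all(omega[i][j] <= delta for i in s for j in s)}

def symmetrize(omega):
    """The max-symmetrization map s of Definition 17."""
    n = len(omega)
    return [[max(omega[i][j], omega[j][i]) for j in range(n)] for i in range(n)]

# An asymmetric 3-node network.
omega = [[0, 1, 4], [3, 0, 1], [2, 5, 0]]
# R = R o s: the Rips complex cannot see the asymmetry, at any scale.
for delta in range(0, 7):
    assert rips_network(omega, delta) == rips_network(symmetrize(omega), delta)
```

The check succeeds because the max over ordered pairs in σ already incorporates both directions of every edge, which is precisely the argument of Remark 28.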

1.4.4 The Dowker filtration of a network


The Dowker complexes of a network, as defined below, comprise a natural generalization of the Čech complex for metric spaces. In the network setting, the lack of symmetry causes the Čech complex to decouple into a sink complex and a source complex. For historical reasons that we explain in §1.4.5, these complexes are called the Dowker complexes.
Given (X, ωX ) ∈ N , and for any δ ∈ R, consider the following relation on X:

Rδ,X := {(x, x′) : ωX(x, x′) ≤ δ}. (1.1)

Then Rδ,X ⊆ X × X, and for any δ′ ≥ δ, we have Rδ,X ⊆ Rδ′,X. Using Rδ,X, we build a simplicial complex Dsiδ,X as follows:

Dsiδ,X := {σ = [x0, . . . , xn] : there exists x′ ∈ X such that (xi, x′) ∈ Rδ,X for each xi}. (1.2)

If σ ∈ Dsiδ,X, it is clear that any face of σ also belongs to Dsiδ,X. We call Dsiδ,X the Dowker δ-sink simplicial complex associated to X, and refer to x′ as a δ-sink for σ (where σ and x′ should be clear from context).

Since Rδ,X is an increasing sequence of sets, it follows that Dsiδ,X is an increasing sequence of simplicial complexes. In particular, for δ′ ≥ δ, there is a natural inclusion map Dsiδ,X ↪ Dsiδ′,X. We write DsiX to denote the filtration {Dsiδ,X ↪ Dsiδ′,X}δ≤δ′ associated to X. We call this the Dowker sink filtration on X. The corresponding persistent vector space is denoted PVecsik(X). When it is defined, we will denote the k-dimensional persistence diagram arising from this filtration by Dgmsik(X). Once again, we point the reader to Theorem 86, where we show that this diagram is well-defined when (X, ωX) ∈ CN. The case (X, ωX) ∈ FN is well-defined for easy reasons, and the reader may keep the finite case in mind for now.
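In the finite setting, the sink complex (and its source counterpart, defined below) can be computed by brute force. The Python sketch that follows is our own illustration, not thesis code; it exhibits a small asymmetric network on which the two complexes differ, in the spirit of Figure 1.6.

```python
from itertools import combinations

def dowker(omega, delta, kind="sink"):
    """Dowker delta-sink (or delta-source) complex of a finite network.

    A subset sigma is a simplex of the sink complex when some node x' has
    omega(x, x') <= delta for every x in sigma; the source complex uses
    omega(x', x) <= delta instead. Brute force over all subsets.
    """
    n = len(omega)
    if kind == "sink":
        ok = lambda x, xp: omega[x][xp] <= delta
    else:
        ok = lambda x, xp: omega[xp][x] <= delta
    return {s for k in range(1, n + 1) for s in combinations(range(n), k)
            if any(all(ok(x, xp) for x in s) for xp in range(n))}

# Node 0 reaches everyone cheaply but is expensive to reach.
omega = [[0, 1, 1], [5, 0, 5], [5, 5, 0]]
# At delta = 1, node 0 is a source for {0, 1, 2}, but no node is a sink
# even for {1, 2}; hence the two complexes differ.
assert (0, 1, 2) in dowker(omega, 1, "source")
assert (1, 2) not in dowker(omega, 1, "sink")
assert dowker(omega, 1, "sink") != dowker(omega, 1, "source")
```

As the text notes, this asymmetry between sink and source complexes can only occur for directed networks; on symmetric networks the two coincide.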



Dsiδ,X =
  ∅ : δ < −1
  {[a]} : −1 ≤ δ < 0
  {[a], [b], [c]} : 0 ≤ δ < 1
  {[a], [b], [c], [ab], [bc], [ac], [abc]} : δ ≥ 1

Dsoδ,X =
  ∅ : δ < −1
  {[a]} : −1 ≤ δ < 0
  {[a], [b], [c]} : 0 ≤ δ < 1
  {[a], [b], [c], [ab], [ac]} : 1 ≤ δ < 2
  {[a], [b], [c], [ab], [bc], [ac], [abc]} : δ ≥ 2

Figure 1.6: Computing the Dowker sink and source complexes of a network (X, ωX). Observe that the sink and source complexes are different in the range 1 ≤ δ < 2.

Practitioners of persistent homology might recall that there are two Dowker complexes [59, p. 73]. One of these is the sink complex defined above. We define its dual below:

Dsoδ,X := {σ = [x0, . . . , xn] : there exists x′ ∈ X such that (x′, xi) ∈ Rδ,X for each xi}. (1.3)

We call Dsoδ,X the Dowker δ-source simplicial complex associated to X. The filtration {Dsoδ,X ↪ Dsoδ′,X}δ≤δ′ associated to X is called the Dowker source filtration, denoted DsoX. We denote the k-dimensional persistence diagram (when defined) arising from this filtration by Dgmsok(X). Notice that any construction using Dsiδ,X can also be repeated using Dsoδ,X, so we focus on the case of the sink complexes and restate results in terms of source complexes where necessary. A subtle point to note here is that each of these Dowker complexes can be used to construct a persistence diagram. A folklore result in the literature about persistent homology of metric spaces, known as Dowker duality, is that the two persistence diagrams arising this way are equal [27, Remark 4.8]:

Dgmsik(X) = Dgmsok(X) for any k ∈ Z+.

Thus it makes sense to talk about “the” Dowker diagram associated to X. In particular,
in §1.4.5 we describe a stronger result—a functorial Dowker theorem—from which the
duality follows easily in the general setting of networks.
The sink and source filtrations are not equal in general; this is illustrated in Figure 1.6.
As in the case of the Rips filtration, both the Dowker sink and source filtrations are
stable.

Proposition 29. Let (X, ωX ), (Y, ωY ) ∈ CN . Then

dI (PVec•k (X), PVec•k (Y )) ≤ 2dN (X, Y ).

Here PVec• refers to either of PVecsi and PVecso .

Once again, we obtain stability for persistence diagrams with respect to the bottleneck
distance in Corollary 177. More specifically, Corollary 177 states that we have:

dB (Dgm•k (X), Dgm•k (Y )) ≤ 2dN (X, Y ).

Remark 30. The preceding result shows that the Dowker persistence diagram is robust to noise in input data, and that the bottleneck distance between Dowker persistence diagrams arising from two networks can be used as a proxy for computing the actual network distance. Note the analogy with Remark 27.

Both the Dowker and Rips filtrations are valid methods for computing persistent homology of networks, by virtue of their stability results (Propositions 26 and 29). However, we present the Dowker filtration as an appropriate method for capturing directionality information in directed networks. In §1.4.6 we discuss this particular feature of the Dowker filtration in full detail.

Remark 31 (Symmetric networks). In the setting of symmetric networks, the Dowker sink and source simplicial filtrations coincide, and so we automatically obtain Dgmsok(X) = Dgmsik(X) for any k ∈ Z+ and any (X, ωX) ∈ CN.

Remark 32 (The metric space setting and relation to witness complexes). When restricted
to the setting of metric spaces, the Dowker complex resembles a construction called the
witness complex [44]. In particular, a version of the Dowker complex for metric spaces,
constructed in terms of landmarks and witnesses, was discussed in [27], along with sta-
bility results. When restricted to the special networks that are pseudo-metric spaces, our
definitions and results agree with those presented in [27].

1.4.5 A Functorial Dowker Theorem


We now abstract some of the definitions presented above. Let X, Y be two totally
ordered sets, and let R ⊆ X × Y be a nonempty relation. Then one defines two simplicial

complexes ER and FR as follows. A finite subset σ ⊆ X belongs to ER whenever there
exists y ∈ Y such that (x, y) ∈ R for each x ∈ σ. Similarly a finite subset τ ⊆ Y belongs to
FR whenever there exists x ∈ X such that (x, y) ∈ R for each y ∈ τ . These constructions
can be traced back to [46], who proved the following result that we refer to as Dowker’s
theorem:

Theorem 33 (Dowker’s theorem; Theorem 1a, [46]). Let X, Y be two totally ordered sets,
let R ⊆ X × Y be a nonempty relation, and let ER , FR be as above. Then for each k ∈ Z+ ,

H_k(E_R) ≅ H_k(F_R).
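For a finite relation, the complexes E_R and F_R can be enumerated directly. The following sketch (the helper names `dowker_complexes` and `euler_characteristic` are ours, not part of the thesis) builds both complexes for a small relation and compares their Euler characteristics, which must agree since Theorem 33 gives isomorphic homology in every degree:

```python
from itertools import combinations

def dowker_complexes(X, Y, R):
    """Build the Dowker complexes E_R and F_R of a finite relation R ⊆ X × Y.

    A finite subset sigma ⊆ X belongs to E_R when some y ∈ Y satisfies
    (x, y) ∈ R for every x in sigma; F_R is defined symmetrically.
    """
    E, F = set(), set()
    for y in Y:  # y witnesses every nonempty subset of {x : (x, y) ∈ R}
        fiber = [x for x in X if (x, y) in R]
        for r in range(1, len(fiber) + 1):
            E.update(frozenset(s) for s in combinations(fiber, r))
    for x in X:
        fiber = [y for y in Y if (x, y) in R]
        for r in range(1, len(fiber) + 1):
            F.update(frozenset(s) for s in combinations(fiber, r))
    return E, F

def euler_characteristic(K):
    # chi(K) = sum over simplices sigma of (-1)^dim(sigma)
    return sum((-1) ** (len(s) - 1) for s in K)

X, Y = range(3), range(2)
R = {(0, 0), (1, 0), (1, 1), (2, 1)}
E, F = dowker_complexes(X, Y, R)
print(euler_characteristic(E), euler_characteristic(F))  # → 1 1
```

Here E_R is a path on three vertices and F_R a single edge; both are contractible, consistent with the stronger homotopy-equivalence statement of Theorem 34 below.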

There is also a strong form of Dowker’s theorem that Björner proves via the classical
nerve theorem [11, Theorems 10.6, 10.9]. Below we write |X| to denote the geometric
realization of a simplicial complex X.

Theorem 34 (The strong form of Dowker's theorem; Theorem 10.9, [11]). Under the as-
sumptions of Theorem 33, we in fact have |E_R| ≃ |F_R|.

The Functorial Dowker Theorem is the following generalization of the strong form of
Dowker's theorem: instead of a single nonempty relation R ⊆ X × Y, consider any pair
of nested, nonempty relations R ⊆ R′ ⊆ X × Y. Then there exist homotopy equivalences
between the geometric realizations of the corresponding complexes that commute with the
canonical inclusions, up to homotopy. We formalize this statement below.

Theorem 35 (The Functorial Dowker Theorem (FDT)). Let X, Y be two totally ordered
sets, let R ⊆ R′ ⊆ X × Y be two nonempty relations, and let E_R, F_R, E_{R′}, F_{R′} be their
associated simplicial complexes. Then there exist homotopy equivalences Γ_{|E_R|} : |F_R| →
|E_R| and Γ_{|E_{R′}|} : |F_{R′}| → |E_{R′}| such that the following diagram commutes up to homotopy:

                  |ι_F|
    |F_R| ──────────────────→ |F_{R′}|
      │                          │
      │ Γ_{|E_R|}  ≃          ≃  │ Γ_{|E_{R′}|}
      ↓                          ↓
    |E_R| ──────────────────→ |E_{R′}|
                  |ι_E|

In other words, we have Γ_{|E_{R′}|} ◦ |ι_F| ≃ |ι_E| ◦ Γ_{|E_R|}, where ι_E : E_R ↪ E_{R′} and
ι_F : F_R ↪ F_{R′} are the canonical inclusions.

From Theorem 35 we automatically obtain Theorem 34 (the strong form of Dowker’s


theorem) as an immediate corollary. The strong form does not appear in Dowker’s original
paper [46], but Björner has given a proof using the nerve theorem [11, Theorems 10.6,
10.9]. Moreover, Björner writes in a remark following [11, Theorem 10.9] that the nerve
theorem and the strong form of Dowker’s theorem are equivalent, in the sense that one

implies the other. We were not able to find an elementary proof of the strong form of
Dowker's theorem in the existing literature. However, such an elementary proof is provided
by our proof of Theorem 35 (given in Section 3.2.2), which we obtained by extending ideas
in Dowker's original proof of Theorem 33.²
While the Functorial Dowker Theorem and our elementary proof are of independent
interest, it has been suggested in [27, Remark 4.8] that such a functorial version of
Dowker's theorem could also be proved using a functorial nerve theorem [28, Lemma 3.4].
Despite being an interesting possibility, we were not able to find a detailed proof of this
claim in the literature. In addition, Björner’s remark regarding the equivalence between the
nerve theorem and the strong form of Dowker’s theorem suggests the following question:

Question 1. Are the Functorial Nerve Theorem (FNT) of [28] and the Functorial Dowker
Theorem (FDT, Theorem 35) equivalent?

This question is of fundamental importance because the Nerve Theorem is a crucial tool
in the applied topology literature and its functorial generalizations are equally important in
persistent homology. In general, the answer is no, and moreover, one (of the FNT and FDT)
is not stronger than the other. The FNT of [28] is stated for paracompact spaces, which are
more general than the simplicial complexes of the FDT. However, the FNT of [28] is stated
for spaces with finitely-indexed covers, so the associated nerve complexes are necessarily
finite. All the complexes involved in the statement of the FDT are allowed to be infinite, so
the FDT is more general than the FNT in this sense.
To clarify these connections, we formulate a simplicial Functorial Nerve Theorem (The-
orem 38) and prove it via a finite formulation of the FDT (Theorem 36). In turn, we show
that the simplicial FNT implies the finite FDT, thus proving the equivalence of these for-
mulations (Theorem 39).
We begin with a weaker formulation of Theorem 35 and some simplicial Functorial
Nerve Theorems.

Theorem 36 (The finite FDT). Let X, Y be two totally ordered sets, and without loss of
generality, suppose X is finite. Let R ⊆ R′ ⊆ X × Y be two nonempty relations, and let
E_R, F_R, E_{R′}, F_{R′} be their associated simplicial complexes (as in Theorem 35). Then there
exist homotopy equivalences Γ_{|E_R|} : |F_R| → |E_R| and Γ_{|E_{R′}|} : |F_{R′}| → |E_{R′}| that commute
up to homotopy with the canonical inclusions.

The finite FDT (Theorem 36) is an immediate consequence of the general FDT (Theo-
rem 35).
Recall the definition of the nerve complex. Let A = {A_i}_{i∈I} be a family of nonempty
sets indexed by I. The nerve of A is the simplicial complex N(A) := {σ ∈ pow(I) :
σ is finite, nonempty, and ∩_{i∈σ} A_i ≠ ∅}.
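For a finite cover, the nerve can be computed by brute force over index subsets. A minimal sketch (the function name `nerve` and the sample cover are our choices): three sets with pairwise nonempty intersections but empty triple intersection, whose nerve is a hollow triangle.

```python
from itertools import combinations

def nerve(cover):
    """Nerve of a finite family {A_i}: the finite nonempty sigma ⊆ I
    with nonempty intersection of the corresponding sets."""
    index = list(cover)
    N = set()
    for r in range(1, len(index) + 1):
        for sigma in combinations(index, r):
            common = set.intersection(*(set(cover[i]) for i in sigma))
            if common:
                N.add(frozenset(sigma))
    return N

# Pairwise intersections are nonempty, the triple intersection is empty,
# so the nerve is the boundary of a triangle (a circle, up to homotopy).
cover = {0: {1, 2}, 1: {2, 3}, 2: {3, 1}}
N = nerve(cover)
print(sorted(map(sorted, N)))  # → [[0], [0, 1], [0, 2], [1], [1, 2], [2]]
```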
² A thread with ideas towards the proof of Theorem 34 was discussed in [1, last accessed 4.24.2017], but
the proposed strategy was incomplete. We have inserted an addendum in [1] proposing a complete proof with
a slightly different construction.

Definition 18 (Covers of simplices and subcomplexes). Let Σ be a simplicial complex.
Then a collection of subcomplexes AΣ = {Σi }i∈I is said to be a cover of subcomplexes for
Σ if Σ = ∪i∈I Σi . Furthermore, AΣ is said to be a cover of simplices if each Σi ∈ AΣ has the
property that Σi = pow(V (Σi )). In this case, each Σi has precisely one top-dimensional
simplex, consisting of the vertex set V (Σi ).

We present two simplicial formulations of the Functorial Nerve Theorem that turn out
to be equivalent; the statements differ in that one is about covers of simplices and the other
is about covers of subcomplexes.

Theorem 37 (Functorial Nerve I). Let Σ ⊆ Σ′ be two simplicial complexes, and let AΣ =
{Σ_i}_{i∈I}, AΣ′ = {Σ′_i}_{i∈I′} be finite covers of simplices for Σ and Σ′ such that I ⊆ I′ and
Σ_i ⊆ Σ′_i for each i ∈ I. In particular, card(I′) < ∞. Suppose that for each finite subset
σ ⊆ I′, the intersection ∩_{i∈σ} Σ′_i is either empty or contractible (and likewise for ∩_{i∈σ} Σ_i).
Then |Σ| ≃ |N(AΣ)| and |Σ′| ≃ |N(AΣ′)|, via maps that commute up to homotopy with
the canonical inclusions.

Theorem 38 (Functorial Nerve II). The statement of Theorem 37 holds even if AΣ and
AΣ′ are covers of subcomplexes. Explicitly, the statement is as follows. Let Σ ⊆ Σ′
be two simplicial complexes, and let AΣ = {Σ_i}_{i∈I}, AΣ′ = {Σ′_i}_{i∈I′} be finite covers of
subcomplexes for Σ and Σ′ such that I ⊆ I′ and Σ_i ⊆ Σ′_i for each i ∈ I. In particular,
card(I′) < ∞. Suppose that for each finite subset σ ⊆ I′, the intersection ∩_{i∈σ} Σ′_i is
either empty or contractible (and likewise for ∩_{i∈σ} Σ_i). Then |Σ| ≃ |N(AΣ)| and |Σ′| ≃
|N(AΣ′)|, via maps that commute up to homotopy with the canonical inclusions.

The following result summarizes our answer to Question 1.

Theorem 39 (Equivalence). The finite FDT, the FNT I, and the FNT II are all equivalent.
Moreover, all of these results are implied by the FDT, as below:

Theorem 35 ⟹ Theorem 36 ⟺ Theorem 37 ⟺ Theorem 38

1.4.6 Dowker persistence diagrams and asymmetry


From the very definition of the Rips complex at any given resolution, one can see that
the Rips complex is blind to asymmetry in the input data (Remark 28). In this section,
we argue that either of the Dowker source and sink complexes is sensitive to asymmetry.
Thus when analyzing datasets containing asymmetric information, one may wish to use
the Dowker filtration instead of the Rips filtration. In particular, this property suggests that
the Dowker persistence diagram is a stronger invariant for directed networks than the Rips
persistence diagram.

In this section, we consider the cycle networks from §1.3, for which the Dowker per-
sistence diagrams capture meaningful structure, whereas the Rips persistence diagrams do
not.
We then probe the question “What happens to the Dowker or Rips persistence diagram
of a network upon reversal of one (or more) edges?” Intuitively, if either of these persistence
diagrams captures asymmetry, we would see a change in the diagram after applying this
reversal operation to an edge.
To provide further evidence that Dowker persistence is sensitive to asymmetry, we
computed both the Rips and Dowker persistence diagrams, in dimensions 0 and 1, of cy-
cle networks Gn , for values of n between 3 and 6. Computations were carried out using
Javaplex in Matlab with Z2 coefficients. The results are presented in Figure 1.7. Based
on our computations, we were able to conjecture and prove the result in Theorem 40, which
gives a precise characterization of the 1-dimensional Dowker persistence diagram of a cy-
cle network Gn , for any n. Furthermore, the 1-dimensional Dowker persistence barcode
for any Gn contains only one persistent interval, which agrees with our intuition that there
is only one nontrivial loop in Gn . On the other hand, for large n, the 1-dimensional Rips
persistence barcodes contain more than one persistent interval. This can be seen in the
Rips persistence barcode of G6 , presented in Figure 1.7. Moreover, for n = 3, 4, the 1-
dimensional Rips persistence barcode does not contain any persistent interval at all. This
suggests that Dowker persistence diagrams/barcodes are an appropriate method for analyz-
ing cycle networks, and perhaps asymmetric networks in general.
The following theorem contains the characterization result for 1-dimensional Dowker
persistence diagrams of cycle networks.

Theorem 40. Let Gn = (Xn , ωGn ) be a cycle network for some n ∈ N, n ≥ 3. Then we
obtain:
Dgm^D_1(Gn) = {(1, ⌈n/2⌉)} ⊆ R².

Thus Dgm^D_1(Gn) consists of precisely the point (1, ⌈n/2⌉) ∈ R² with multiplicity 1.
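Theorem 40 can be probed computationally without a persistence package. The sketch below (all helper names are ours) enumerates the Dowker sink complex of Gn up to dimension 2 and computes β₁ over GF(2) from boundary ranks; consistently with the theorem, the single essential 1-cycle is present for δ = 1, 2 and gone at δ = ⌈n/2⌉ when n = 5, 6.

```python
from itertools import combinations

def cycle_weight(n):
    return lambda i, j: (j - i) % n  # clockwise hop count in G_n

def dowker_sink_simplices(n, delta, max_dim=2):
    """Simplices of D^si_delta for G_n, up to dimension max_dim: sigma is
    included when some sink p satisfies omega(x, p) <= delta for all x in sigma."""
    omega = cycle_weight(n)
    simplices = []
    for r in range(1, max_dim + 2):
        for sigma in combinations(range(n), r):
            if any(all(omega(x, p) <= delta for x in sigma) for p in range(n)):
                simplices.append(sigma)
    return simplices

def rank_gf2(rows):
    """Rank over GF(2) of vectors encoded as Python-int bitmasks."""
    basis = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)

def betti1(n, delta):
    """beta_1 over GF(2): nullity of d1 minus rank of d2."""
    S = dowker_sink_simplices(n, delta)
    edges = [s for s in S if len(s) == 2]
    tris = [s for s in S if len(s) == 3]
    eidx = {e: i for i, e in enumerate(edges)}
    d1 = [(1 << e[0]) ^ (1 << e[1]) for e in edges]
    d2 = [(1 << eidx[(a, b)]) ^ (1 << eidx[(a, c)]) ^ (1 << eidx[(b, c)])
          for (a, b, c) in tris]
    return (len(edges) - rank_gf2(d1)) - rank_gf2(d2)

# One essential 1-cycle, born at 1 and dying at ceil(n/2):
assert betti1(6, 1) == 1 and betti1(6, 2) == 1 and betti1(6, 3) == 0
assert betti1(5, 1) == 1 and betti1(5, 2) == 1 and betti1(5, 3) == 0
```

Only simplices up to dimension 2 are needed, since β₁ depends on the boundary maps ∂₁ and ∂₂ alone, and the Dowker complex is downward closed.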

Remark 41. From our experimental results (cf. Figure 1.7), it appears that the 1-dimensional
Rips persistence diagram of a cycle network does not admit a characterization as simple as
that given by Theorem 40 for the 1-dimensional Dowker persistence diagram. Moreover,
the Rips complexes R^δ_{Gn}, δ ∈ R, n ∈ N correspond to certain types of independence com-
plexes that appear independently in the literature, and whose homotopy types remain open
[53, Question 5.3]. On a related note, we point the reader to [3] for a complete characteri-
zation of the homotopy types of Rips complexes of points on the circle (equipped with the
restriction of the arc length metric).
To elaborate on the connection to [53], we write H_n^k to denote the undirected graph
with vertex set {1, . . . , n}, and edges given by pairs (i, j) where 1 ≤ i < j ≤ n and either
j − i < k or (n + i) − j < k. Next we write Ind(H_n^k) to denote the independence complex
of H_n^k, which is the simplicial complex consisting of subsets σ ⊆ {1, 2, . . . , n} such that
no two elements of σ are connected by an edge in H_n^k. Then we have Ind(H_n^k) = R^{n−k}_{Gn}
for each k, n ∈ N such that k < n. To gain intuition for this equality, fix a basepoint 1,
and consider the values of j ∈ N for which the simplex [1, j] belongs to Ind(H_n^k) and to
R^{n−k}_{Gn}, respectively. In either case, we have k + 1 ≤ j ≤ n − k + 1. Using the rotational
symmetry of the points, one then obtains the remaining 1-simplices. Rips complexes are
determined by their 1-skeleton, so this suffices to construct R^{n−k}_{Gn}. Analogously, Ind(H_n^k)
is determined by the edges in H_n^k, and hence also by its 1-skeleton. In [53, Question 5.3],
the author writes that the homotopy type of Ind(H_n^k) is still unsolved. Characterizing the
persistence diagrams Dgm^R_k(Gn) thus seems to be a useful future step, both in providing
a computational suggestion for the homotopy type of Ind(H_n^k), and also in providing a
valuable example in the study of persistence of directed networks.
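The equality Ind(H_n^k) = R^{n−k}_{Gn} can be checked mechanically: both complexes are determined by their 1-skeleta, so it suffices to compare edge sets. A small sketch (helper names are ours):

```python
def h_edges(n, k):
    """Edges of H_n^k on {1,...,n}: pairs (i, j), i < j, with
    j - i < k or (n + i) - j < k."""
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if j - i < k or (n + i) - j < k}

def independence_edges(n, k):
    """1-simplices of Ind(H_n^k): pairs NOT joined by an edge of H_n^k."""
    E = h_edges(n, k)
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if (i, j) not in E}

def rips_edges_cycle(n, delta):
    """1-simplices of the Rips complex of G_n at resolution delta, where
    omega(x_i, x_j) is the clockwise hop count from i to j."""
    omega = lambda i, j: (j - i) % n
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if max(omega(i, j), omega(j, i)) <= delta}

# Both complexes are clique complexes of their 1-skeleta, so comparing
# edge sets verifies Ind(H_n^k) = R^{n-k}_{G_n}.
for n in range(4, 9):
    for k in range(1, n):
        assert independence_edges(n, k) == rips_edges_cycle(n, n - k)
```

The loop re-derives the equality for all 4 ≤ n ≤ 8 and 1 ≤ k < n; the pair (i, j) lies in either edge set exactly when k ≤ j − i ≤ n − k.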

Remark 42. Theorem 40 has the following implication for data analysis: nontrivial 1-
dimensional homology in the Dowker persistence diagram of an asymmetric network sug-
gests the presence of directed cycles in the underlying data. Of course, it is not necessarily
true that nontrivial 1-dimensional persistence can occur only in the presence of a directed
cycle.

Remark 43. Our motivation for studying cycle networks is that they constitute directed
analogues of circles, and we were interested in seeing if the 1-dimensional Dowker per-
sistence diagram would be able to capture this analogy. Theorem 40 shows that this is
indeed the case: we get a single nontrivial 1-dimensional persistence interval, which is
what we would expect when computing the persistent homology of a circle in the metric
space setting.
While we had initially proved Theorem 40 using elementary methods, Henry Adams
observed that Dowker complexes on cycle networks can be precisely related to nerve com-
plexes built over arcs on the circle. Using the techniques developed in [3] and [4], one
obtains the following results:

Theorem 44 (Even dimension). Fix n ∈ N, n ≥ 3. If l ∈ N is such that n is divisible by
(l + 1), and k := nl/(l + 1) is such that 0 ≤ k ≤ n − 2, then Dgm^D_{2l}(Gn) consists of precisely the
point (nl/(l + 1), nl/(l + 1) + 1) with multiplicity n/(l + 1) − 1. If l or k do not satisfy the conditions above,
then Dgm^D_{2l}(Gn) is trivial.

As a special case, we know that if n is odd, then Dgm^D_2(Gn) is trivial. If n is even, then
Dgm^D_2(Gn) consists of the point (n/2, n/2 + 1) with multiplicity n/2 − 1.

Theorem 45 (Odd dimension). Fix n ∈ N, n ≥ 3. Then for l ∈ Z+, define M_l :=
{m ∈ N : nl/(l + 1) < m < n(l + 1)/(l + 2)}. If M_l is empty, then Dgm^D_{2l+1}(Gn) is trivial. Otherwise,
we have:

Dgm^D_{2l+1}(Gn) = {(a_l, ⌈n(l + 1)/(l + 2)⌉)},

where a_l := min{m ∈ M_l}. We use set notation (instead of multisets) to mean that the
multiplicity is 1.

In particular, for l = 0, we have nl/(l + 1) = 0 and n(l + 1)/(l + 2) = n/2 ≥ 3/2, so 1 ∈ M_0. Thus
we have Dgm^D_1(Gn) = {(1, ⌈n/2⌉)}, and so Theorem 45 recovers Theorem 40 as a special
case.

Sensitivity to network transformations


We now make a definition:
Definition 19 (Pair swaps). Let (X, ωX) be a network. For any z, z′ ∈ X, define the
(z, z′)-swap of (X, ωX) to be the network S_X(z, z′) := (X^{z,z′}, ω_X^{z,z′}) defined as follows:
X^{z,z′} := X, and for any x, x′ ∈ X^{z,z′},

ω_X^{z,z′}(x, x′) :=  ωX(x′, x)   if x = z, x′ = z′,
                     ωX(x′, x)   if x = z′, x′ = z,
                     ωX(x, x′)   otherwise.
We then pose the following question:


Given a network (X, ωX) and an (x, x′)-swap S_X(x, x′) for some x, x′ ∈ X,
how do the Rips or Dowker persistence diagrams of S_X(x, x′) differ from those
of (X, ωX)?
This situation is illustrated in Figure 1.8. Example 49 shows an example where the Dowker
persistence diagram captures the variation in a network that occurs after a pair swap,
whereas the Rips persistence diagram fails to capture this difference. Furthermore, Re-
mark 47 shows that Rips persistence diagrams always fail to do so.
We also consider the extreme situation where all the directions of the edges of a network
are reversed, i.e. the network obtained by applying the pair swap operation to each pair of
nodes. We would intuitively expect that the persistence diagrams would not change. The
following discussion shows that the Rips and Dowker persistence diagrams are invariant
under taking the transpose of a network.
Proposition 46. Recall the transposition map t and the shorthand notation X⊤ = t(X)
from Definition 17. Let k ∈ Z+. Then Dgm^si_k(X) = Dgm^so_k(X⊤), and therefore Dgm^D_k(X) =
Dgm^D_k(X⊤) by Theorem 35.

Remark 47 (Pair swaps and their effect). Let (X, ωX) ∈ CN, let z, z′ ∈ X, and let
σ ∈ pow(X). Then we have:

max_{x,x′∈σ} ω_X^{z,z′}(x, x′) = max_{x,x′∈σ} ωX(x, x′).

Using this observation, one then repeats the arguments used in the proof of Proposition 46
to show that:
Dgm^R_k(X) = Dgm^R_k(S_X(z, z′)), for each k ∈ Z+.

Figure 1.7: The first column contains illustrations of cycle networks G3 , G4 , G5 and G6 .
The second column contains the corresponding Dowker persistence barcodes, in dimen-
sions 0 and 1. Note that the persistent intervals in the 1-dimensional barcodes agree with
the result in Theorem 40. The third column contains the Rips persistence barcodes of each
of the cycle networks. Note that for n = 3, 4, there are no persistent intervals in dimension
1. On the other hand, for n = 6, there are two persistent intervals in dimension 1.

Figure 1.8: (Y, ωY ) is the (a, c)-swap of (X, ωX ).

This encodes the intuitive fact that Rips persistence diagrams are blind to pair swaps. More-
over, successively applying the pair swap operation over all pairs produces the transpose of
the original network, and so it follows that Dgm^R_k(X) = Dgm^R_k(X⊤).

On the other hand, k-dimensional Dowker persistence diagrams are not necessarily
invariant to pair swaps when k ≥ 1. Indeed, Example 49 below constructs a network X for
which there exist points z, z′ ∈ X such that

Dgm^D_1(X) ≠ Dgm^D_1(S_X(z, z′)).

However, 0-dimensional Dowker persistence diagrams are still invariant to pair swaps:

Proposition 48. Let (X, ωX) ∈ CN and let z, z′ be any two points in X. Then we have:

Dgm^D_0(X) = Dgm^D_0(S_X(z, z′)).

Example 49. Consider the three node dissimilarity networks (X, ωX ) and (Y, ωY ) in Fig-
ure 1.8. Note that (Y, ωY ) coincides with SX (a, c). We present both the Dowker and Rips
persistence barcodes obtained from these networks. Note that the Dowker persistence bar-
code is sensitive to the difference between (X, ωX ) and (Y, ωY ), whereas the Rips barcode
is blind to this difference.
To show how the Dowker complex is constructed, we also list the Dowker sink com-
plexes of the networks in Figure 1.8, and also the corresponding homology dimensions
across a range of resolutions. Note that when we write [a, b](a), we mean that a is a sink

Figure 1.9: Dowker persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure
1.8.

Figure 1.10: Rips persistence barcodes of networks (X, ωX ) and (Y, ωY ) from Figure 1.8.
Note that the Rips diagrams indicate no persistent homology in dimensions higher than 0,
in contrast with the Dowker diagrams in Figure 1.9.

corresponding to the simplex [a, b].

D^si_{0,X} = {[a], [b], [c]}                                                     dim(H1(D^si_{0,X})) = 0
D^si_{1,X} = {[a], [b], [c], [a, b](a)}                                          dim(H1(D^si_{1,X})) = 0
D^si_{2,X} = {[a], [b], [c], [a, b](a), [a, c](a), [b, c](a), [a, b, c](a)}      dim(H1(D^si_{2,X})) = 0
D^si_{3,X} = {[a], [b], [c], [a, b](a), [a, c](a), [b, c](a), [a, b, c](a)}      dim(H1(D^si_{3,X})) = 0

D^si_{0,Y} = {[a], [b], [c]}                                                     dim(H1(D^si_{0,Y})) = 0
D^si_{1,Y} = {[a], [b], [c], [a, b](a)}                                          dim(H1(D^si_{1,Y})) = 0
D^si_{2,Y} = {[a], [b], [c], [a, b](a), [a, c](c)}                               dim(H1(D^si_{2,Y})) = 0
D^si_{3,Y} = {[a], [b], [c], [a, b](a), [a, c](c), [b, c](b)}                    dim(H1(D^si_{3,Y})) = 1
D^si_{4,Y} = {[a], [b], [c], [a, b](a), [a, c](a), [b, c](a), [a, b, c](a)}      dim(H1(D^si_{4,Y})) = 0

Note that for δ ∈ [3, 4), dim(H1(D^si_{δ,Y})) = 1, whereas dim(H1(D^si_{δ,X})) = 0 for each
δ ∈ R.

Based on the discussion in Remark 47, Proposition 48, and Example 49, we conclude
the following:

Moral: Unlike Rips persistence diagrams, Dowker persistence diagrams are truly
sensitive to asymmetry.

We summarize some of these results:

Theorem 50. Recall the symmetrization and transposition maps s and t from Definition
17. Then:

1. R ◦ s = R,

2. D^so ◦ t = D^si, and

3. D^si ◦ t = D^so.

Also, there exist (X, ωX), (Y, ωY) ∈ FN such that (D^si ◦ s)(X) ≠ D^si(X), and (D^so ◦
s)(Y) ≠ D^so(Y).

Proof. These follow from Example 49, Remark 28, and Proposition 46.

1.5 Persistent path homology of networks
To define PPH, we first summarize and condense some concepts that appeared in [62].
We also point the reader to Section 3.1.1 for the necessary background on chain complexes
and associated constructions.

Remark 51 (Reconstructing networks from path filtrations). We make an observation to


explain the moral difference between path homology and the simplicial constructions we
considered above. In the setting of metric spaces with the VR filtration, it is always pos-
sible to recover full metric information from the filtered space. Specifically, given a pair
(x, x0 ), we can recover dX (x, x0 ) simply as the filtration value of the simplex [x, x0 ]. If the
metric space is geodesic, then dX (x, x0 ) can also be recovered from the Čech filtration—the
simplex [x, x0 ] is witnessed by their midpoint, so dX (x, x0 ) is twice the filtration value of
[x, x0 ] (thanks to F. Mémoli for this observation).
In the setting of networks, however, edge weight information cannot typically be re-
covered from filtration values. The issue lies in the equality [a, b] = −[b, a] for (oriented)
simplicial complexes. Even in the case of the VR filtration, we know that the filtration value
of [a, b] corresponds to either ωX (a, b) or ωX (b, a), but it is impossible to know which.
Path homology, in contrast, allows us to recover full metric information. The key idea
is that the equality [a, b] = −[b, a] is removed, and [a, b], [b, a] are linearly independent at
the chain complex level. Thus the filtration value of [a, b] recovers ωX (a, b), and that of
[b, a] recovers ωX (b, a).

1.5.1 Path homology of digraphs


Elementary paths on a set
Given a set X and any integer p ∈ Z+ , an elementary p-path over X is a sequence
[x0 , . . . , xp ] of p + 1 elements of X. For each p ∈ Z+ , the free vector space consisting
of all formal linear combinations of elementary p-paths over X with coefficients in K is
denoted Λ_p = Λ_p(X) = Λ_p(X, K). One also defines Λ_{−1} := K and Λ_{−2} := {0}. Next,
for any p ∈ Z+, one defines a linear map ∂^nr_p : Λ_p → Λ_{p−1} to be the linearization of the
following map on the generators of Λ_p:

∂^nr_p([x0, . . . , xp]) := Σ_{i=0}^{p} (−1)^i [x0, . . . , x̂i, . . . , xp],

for each elementary p-path [x0, . . . , xp] ∈ Λ_p. Here x̂i denotes omission of xi from the
sequence. The maps ∂^nr_• are referred to as the non-regular boundary maps. For p = −1,
one defines ∂^nr_{−1} : Λ_{−1} → Λ_{−2} to be the zero map. Then ∂^nr_p ◦ ∂^nr_{p+1} = 0 for any integer
p ≥ −1 [64, Lemma 2.2]. It follows that (Λ_p, ∂^nr_p)_{p∈Z+} is a chain complex.
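The non-regular boundary map and the identity ∂^nr_p ◦ ∂^nr_{p+1} = 0 are easy to check mechanically. A sketch with chains stored as dictionaries mapping tuples (paths) to integer coefficients (the helper name is ours):

```python
def boundary_nr(chain):
    """Non-regular boundary: the linearization of
    d[x0,...,xp] = sum_i (-1)^i [x0,...,(omit x_i),...,xp]."""
    out = {}
    for path, coeff in chain.items():
        for i in range(len(path)):
            face = path[:i] + path[i + 1:]
            out[face] = out.get(face, 0) + (-1) ** i * coeff
    return {p: c for p, c in out.items() if c != 0}

# d(aba) = ba - aa + ab; the irregular term aa survives here.
d = boundary_nr({("a", "b", "a"): 1})
# d ∘ d = 0 on any elementary path:
assert boundary_nr(boundary_nr({("w", "x", "y", "z"): 1})) == {}
```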
For notational convenience, we will often drop the square brackets and commas and
write paths of the form [a, b, c] as abc. We use this convention in the next example.

Example 52 (Paths on a double edge). We will soon explain the interaction between paths
on a set and the edges on a digraph. First consider a digraph on a vertex set Y = {a, b}
as in Figure 1.11. Notice that there is a legitimate “path” on this digraph of the form aba,
obtained by following the directions of the edges. But notice that applying ∂^nr_2 to the 2-path
aba yields ∂^nr_2(aba) = ba − aa + ab, and aa is not a valid path on this particular digraph
(self-loops are disallowed). To handle situations like this, one needs to consider regular
paths, which are explained in the next section.

(a ⇄ b)

Figure 1.11: A two-node digraph on the vertex set Y = {a, b}.

Regular paths on a set


For each p ∈ Z+, an elementary p-path [x0, . . . , xp] is called regular if xi ≠ xi+1 for
each 0 ≤ i ≤ p − 1, and irregular otherwise. Then for each p ∈ Z+, one defines:

R_p = R_p(X, K) := K[{[x0, . . . , xp] : [x0, . . . , xp] is regular}],
I_p = I_p(X, K) := K[{[x0, . . . , xp] : [x0, . . . , xp] is irregular}].

One can further verify that ∂^nr_p(I_p) ⊆ I_{p−1} [64, Lemma 2.6], and so ∂^nr_p is well-defined
on Λ_p/I_p. Since R_p ≅ Λ_p/I_p via a natural linear isomorphism, one can define ∂_p : R_p →
R_{p−1} as the pullback of ∂^nr_p via this isomorphism [64, Definition 2.7]. Then ∂_p is referred
to as the regular boundary map in dimension p, where p ∈ Z+. Now we obtain a new chain
complex (R_p, ∂_p)_{p∈Z+}.
Example 53 (Regular paths on a double edge). Consider again the digraph in Figure 1.11.
Applying the regular boundary map to the 2-path aba yields ∂2 (aba) = ba+ab. This exam-
ple illustrates the following general principle: Irregular paths arising from an application
of ∂• are treated as zeros.
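The principle from Example 53 — irregular faces produced by the boundary are treated as zero — can be encoded by filtering faces before accumulating coefficients. A sketch (the helper name is ours):

```python
def boundary_regular(chain):
    """Regular boundary map: as the non-regular boundary, but any face with a
    repeated consecutive vertex (an irregular path) is treated as zero."""
    out = {}
    for path, coeff in chain.items():
        for i in range(len(path)):
            face = path[:i] + path[i + 1:]
            if any(face[j] == face[j + 1] for j in range(len(face) - 1)):
                continue  # irregular faces vanish in R_p = Lambda_p / I_p
            out[face] = out.get(face, 0) + (-1) ** i * coeff
    return {p: c for p, c in out.items() if c != 0}

# d(aba) = ba + ab: the irregular face aa is dropped, as in Example 53.
assert boundary_regular({("a", "b", "a"): 1}) == {("b", "a"): 1, ("a", "b"): 1}
```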

Allowed paths on digraphs


We now expand on the notion of paths on a set to discuss paths on a digraph. We follow
the intuition developed in Examples 52 and 53.
Let G = (X, E) be a digraph (possibly infinite) without self-loops. For each p ∈ Z+ ,
one defines an elementary p-path [x0 , . . . , xp ] on X to be allowed if (xi , xi+1 ) ∈ E for each
0 ≤ i ≤ p − 1. For each p ∈ Z+ , the free vector space on the collection of allowed p-paths

on (X, E) is denoted Ap = Ap (G) = Ap (X, E, K), and is called the space of allowed
p-paths. One further defines A−1 := K and A−2 := {0}.

(G_M: vertices a, b, c, d with edges a → b, c → b, c → d, a → d.
G_N: vertices w, x, y, z with edges w → x, x → y, w → z, z → y.)
Figure 1.12: Two types of square digraphs.

∂-invariant paths and path homology


The allowed paths do not form a chain complex, because the image of an allowed path
under ∂ need not be allowed. This is rectified as follows. Given a digraph G = (X, E) and
any p ∈ Z+ , the space of ∂-invariant p-paths on G is defined to be the following subspace
of Ap (G):

Ωp = Ωp (G) = Ωp (X, E, K) := {c ∈ Ap : ∂p (c) ∈ Ap−1 } .

One further defines Ω_{−1} := A_{−1} ≅ K and Ω_{−2} := A_{−2} = {0}. Now it follows from
the definitions that ∂_p(Ω_p) ⊆ Ω_{p−1} for any integer p ≥ −1. Thus we have a chain
complex:

· · · —∂_3→ Ω_2 —∂_2→ Ω_1 —∂_1→ Ω_0 —∂_0→ K —∂_{−1}→ 0

For each p ∈ Z+ , the p-dimensional path homology groups of G = (X, E) are defined
as:

H^Ξ_p(G) = H^Ξ_p(X, E, K) := ker(∂_p)/ im(∂_{p+1}).

Example 54 (Paths on squares). We illustrate the construction of Ω• for the digraphs in


Figure 1.12.
For 0 ≤ p ≤ 2, we have the following vector spaces of ∂-invariant paths:
Ω0 (GM ) = K[{a, b, c, d}] Ω0 (GN ) = K[{w, x, y, z}]
Ω1 (GM ) = K[{ab, cb, cd, ad}] Ω1 (GN ) = K[{wx, xy, zy, wz}]
Ω2 (GM ) = {0} Ω2 (GN ) = K[{wxy − wzy}]

The crux of the Ω• construction lies in understanding Ω2(GN). Note that even though
∂^{GN}_2(wxy), ∂^{GN}_2(wzy) ∉ A1(GN) (because wy ∉ A1(GN)), we still have:

∂^{GN}_2(wxy − wzy) = xy − wy + wx − zy + wy − wz ∈ A1(GN).

Elementary calculations show that dim(H^Ξ_1(GM)) = 1, and dim(H^Ξ_1(GN)) = 0. Thus
path homology can successfully distinguish between these two squares.
To compare this with a simplicial approach, consider the directed clique complex ho-
mology studied in [102, 84, 118]. Given a digraph G = (X, E), the directed clique complex
is defined to be the ordered simplicial complex [88, p. 76] given by writing:

FG := X ∪ {(x0 , . . . , xp ) : (xi , xj ) ∈ E for all 0 ≤ i < j ≤ p} .

Here we use parentheses to denote ordered simplices. For the squares in Figure 1.12, we
have:

FGM = {a, b, c, d, ab, cb, cd, ad} and FGN = {w, x, y, z, wx, xy, wz, zy} ,

and so their simplicial homologies are equal.
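The dimension counts in Example 54 can be verified by linear algebra: Ω₂ is the kernel of the map recording, for each allowed 2-path, the coefficients of the non-allowed regular faces in its boundary. A sketch (helper names are ours; exact arithmetic via `fractions`):

```python
from fractions import Fraction

def rank(M):
    """Row-reduction rank over the rationals."""
    M = [row[:] for row in M]
    r, cols = 0, (len(M[0]) if M else 0)
    for col in range(cols):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def omega2_dimension(edges):
    """dim Omega_2: allowed 2-paths whose boundary has zero coefficient on
    every regular 1-path that is not an allowed edge."""
    A2 = [(a, b, c) for (a, b) in sorted(edges)
          for (b2, c) in sorted(edges) if b == b2]
    rows = {}
    for j, (a, b, c) in enumerate(A2):
        # d(abc) = bc - ac + ab; bc and ab are edges by construction, so only
        # the regular face (a, c) can obstruct membership in Omega_2.
        if a != c and (a, c) not in edges:
            rows.setdefault((a, c), {})[j] = Fraction(-1)
    M = [[row.get(j, Fraction(0)) for j in range(len(A2))]
         for row in rows.values()]
    return len(A2) - rank(M)

# Edge sets of the square digraphs G_M and G_N from Figure 1.12.
GM = {("a", "b"), ("c", "b"), ("c", "d"), ("a", "d")}
GN = {("w", "x"), ("x", "y"), ("w", "z"), ("z", "y")}
print(omega2_dimension(GM), omega2_dimension(GN))  # → 0 1
```

For G_N the single constraint row corresponds to the non-edge wy, and its kernel is spanned by wxy − wzy, matching Example 54.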

Remark 55 (The challenge of finding a natural basis for Ω• ). The digraph GN in Example
54 is a minimal example showing that it is nontrivial to compute bases for the vector spaces
Ω• . Specifically, while it is trivial to read off bases for the allowed paths A• from a digraph,
one needs to consider linear combinations of allowed paths in a systematic manner to obtain
bases for the ∂-invariant paths.
Contrast this with the setting of simplicial homology: here the simplices themselves
form bases for the associated chain complex, so there is no need for an extra preprocessing
step. Thus when using PPH for asymmetric data, it is important to consider the trade-off
between greater sensitivity to asymmetry and increased computational cost.
We derive a procedure for systematically computing bases for Ω• in §4.4.

1.5.2 The persistent path homology of a network


Let X = (X, ωX) ∈ N. For any δ ∈ R, the digraph G^δ_X = (X, E^δ_X) is defined as
follows:

E^δ_X := {(x, x′) ∈ X × X : x ≠ x′, ωX(x, x′) ≤ δ}.

Note that for any δ′ ≥ δ ∈ R, we have a natural inclusion map G^δ_X ↪ G^{δ′}_X. Thus we may
associate to X the digraph filtration {G^δ_X ↪ G^{δ′}_X}_{δ≤δ′∈R}.
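A sketch of the construction of E^δ_X from a weight dictionary, together with a check that the resulting digraphs are nested (the function name and sample weights are ours, chosen arbitrarily):

```python
def digraph_at(omega, delta):
    """Edge set E^delta_X = {(x, x') : x != x', omega(x, x') <= delta}."""
    return {(x, xp) for (x, xp), w in omega.items() if x != xp and w <= delta}

# A hypothetical 3-node network; the useful thresholds are the distinct weights.
omega = {("a", "b"): 1, ("b", "a"): 3, ("b", "c"): 2, ("c", "b"): 2,
         ("a", "c"): 4, ("c", "a"): 1}
thresholds = sorted(set(omega.values()))
graphs = [digraph_at(omega, d) for d in thresholds]
# The filtration is nested: delta <= delta' implies E^delta ⊆ E^delta'.
assert all(g1 <= g2 for g1, g2 in zip(graphs, graphs[1:]))
print([len(g) for g in graphs])  # → [2, 4, 5, 6]
```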
The functoriality of the path homology construction (Appendix 3.3.1, Proposition 172)
enables us to obtain a persistent vector space from a digraph filtration. Thus we make the
following definition:

Definition 20. Let G = {G^δ ↪ G^{δ′}}_{δ≤δ′∈R} be a digraph filtration. Then for each p ∈ Z+,
we define the p-dimensional persistent path homology of G to be the following persistent
vector space:

PVec^Ξ_p(G) := {H^Ξ_p(G^δ) —(ι_{δ,δ′})#—→ H^Ξ_p(G^{δ′})}_{δ≤δ′∈R}.

When it is defined, the diagram associated to PVec^Ξ_p(G) is denoted Dgm^Ξ_p(G).

In particular, by Theorem 86, the path persistence diagram in dimension p is defined


for any (X, ωX ) ∈ CN . We write DgmΞp (X) to denote this diagram.
Persistent path homology is stable to perturbations of input data, and hence amenable
to data analysis:

Theorem 56 (Stability). Let (X, ωX), (Y, ωY) ∈ CN. Let p ∈ Z+. Then,

d_I(PVec^Ξ_p(X), PVec^Ξ_p(Y)) ≤ 2 d_N(X, Y).

We note that by Corollary 177, we have:

d_B(Dgm^Ξ_k(X), Dgm^Ξ_k(Y)) ≤ 2 d_N(X, Y).

Remark 57. While the preceding stability result is analogous to those for Vietoris-Rips and
Dowker persistence, the proofs in this setting require results on the homotopy of digraphs
that were recently developed in [63] (cf. Section 3.3.2).

Having defined PPH, we now answer some fundamental questions related to its char-
acterization. We show that PPH agrees with Čech/Dowker persistence on metric spaces in
dimension 1, but not necessarily in higher dimensions. We also show that in the asymmetric
case, PPH and Dowker agree in dimension 1 if a certain local condition is satisfied.

Example 58 (PPH vs Dowker for metric n-cubes). In the setting of metric spaces, PPH is
generally different from Dowker persistence in dimensions ≥ 2. To see this, consider R^n
equipped with the Euclidean distance for n ≥ 3. Define

□_n := {(i1, i2, . . . , in) : ij ∈ {0, 1} ∀ 1 ≤ j ≤ n}.

Then G^δ_{□n} has no edges for δ < 1, and for δ = 1, it has precisely an edge between any two
points of □n that differ on a single coordinate. But at δ = 1, G^δ_{□n} is homotopy equivalent
to G^δ_{□n−1}: the homotopy equivalence is given by collapsing points that differ exactly on the
nth coordinate (see Figure 1.14). Proceeding recursively, we see that G^δ_{□n−1} is contractible
at δ = 1. However, D^si(□n) is not contractible at δ = 1. Moreover, an explicit verification
for the n = 3 case shows that Dgm^D_2(□3) consists of the point (1, √2) with multiplicity 7.
Thus Dgm^D_2(□3) ≠ Dgm^Ξ_2(□3).
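The claimed edge structure of G^δ_{□n} at δ = 1 is easy to verify for small n (a small sketch; the function name is ours). For 0/1 vectors, Euclidean distance at most 1 means the points differ in exactly one coordinate:

```python
from itertools import product

def cube_edges_at_one(n):
    """Edges of the digraph G^1 on the vertices of the Euclidean n-cube:
    ordered pairs of distinct 0/1 vectors at distance at most 1."""
    pts = list(product((0, 1), repeat=n))
    return {(p, q) for p in pts for q in pts
            if p != q and sum((a - b) ** 2 for a, b in zip(p, q)) <= 1}

E = cube_edges_at_one(3)
# 8 vertices, each with 3 neighbors; edges here are ordered pairs.
assert len(E) == 24
assert all(sum(a != b for a, b in zip(p, q)) == 1 for p, q in E)
```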

Theorem 59. Let X = (X, AX) ∈ CN be a symmetric network, and fix K = Z/pZ for
some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).

The preceding result shows that on metric spaces, PPH agrees with Dowker persistence
in dimension 1. The converse implication is not true: in §1.5.3, we provide a family of
highly asymmetric networks for which PPH agrees with Dowker persistence in dimension
1. On the other hand, the examples in Figure 1.13 show that equality in dimension 1 does
not necessarily hold for asymmetric networks. Moreover, it turns out that the four-point
configurations illustrated in Figure 1.13 can be used to give another partial characterization
of the networks for which PPH and Dowker persistence do agree in dimension 1. We
present this statement next.

ωX    x1  x2  x3  x4        ωY    y1  y2  y3  y4
x1     0   1   2   2        y1     0   1   2   1
x2     2   0   2   2        y2     2   0   2   2
x3     2   1   0   2        y3     2   1   0   1
x4     1   2   1   0        y4     2   2   2   0

Figure 1.13: Working over Z/2Z coefficients, we find that Dgm^Ξ_1(X) and Dgm^D_1(Y) are
trivial, whereas Dgm^D_1(X) = Dgm^Ξ_1(Y) = {(1, 2)}.

Definition 21 (Squares, triangles, and double edges). Let G be a finite digraph. Then we
define the following local configurations of edges between distinct nodes a, b, c, d:
• A double edge is a pair of edges (a, b), (b, a).

• A triangle is a set of edges (a, b), (b, c), (a, c).

• A short square is a set of edges (a, b), (a, d), (c, b), (c, d) such that neither of (a, c),
(c, a), (b, d), (d, b) is an edge.

• A long square is a set of edges (a, b), (b, c), (a, d), (d, c) such that neither of (b, d),(a, c)
is an edge.

(Pictured from left to right: a double edge between a and b; a triangle on a, b, c; a short
square on a, b, c, d; and a long square on a, b, c, d.)

Finally, we define a network (X, AX) to be square-free if GδX does not contain a four-point
subset whose induced subgraph is a short or long square, for any δ ∈ R. An important
observation is that to be a square, the subgraph induced by a four-point subset cannot just
include one of the configurations pictured above; it must exclude the indicated diagonal edges as well.
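These configurations can be detected mechanically. The following sketch (plain Python; the helper name and digraph encoding are our own) implements the short- and long-square tests of Definition 21 exactly as stated, including the required exclusions:

```python
from itertools import permutations

def has_square(edges, nodes):
    # Test every ordered 4-tuple (a, b, c, d) of distinct nodes against the
    # short- and long-square configurations of Definition 21.
    for a, b, c, d in permutations(nodes, 4):
        short = ({(a, b), (a, d), (c, b), (c, d)} <= edges
                 and not ({(a, c), (c, a), (b, d), (d, b)} & edges))
        long_ = ({(a, b), (b, c), (a, d), (d, c)} <= edges
                 and not ({(b, d), (a, c)} & edges))
        if short or long_:
            return True
    return False

square = {(0, 1), (0, 3), (2, 1), (2, 3)}        # a short square on 0, 1, 2, 3
print(has_square(square, range(4)))               # True
print(has_square(square | {(0, 2)}, range(4)))    # False: a diagonal edge is present
```

Since `permutations` ranges over all role assignments of a, b, c, d, every labeling of a four-point subset is tested.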
Remark 60. We thank Guilherme Vituri for pointing out the need to exclude the b → d
edge in the definition of a long square.

Theorem 61. Let X = (X, AX) ∈ CN be a square-free network, and fix K = Z/pZ for
some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).

Remark 62. The proofs of Theorems 59 and 61 both require an argument where simplices
are paired up—this requires us to use Z/pZ coefficients in both theorem statements.

Figure 1.14: Left: G^δ(Q_3) is (digraph) homotopy equivalent to a point at δ = 1, as can be seen
by collapsing points along the orange lines. Right: the Dowker sink complex D^si_δ(Q_3) becomes
contractible at δ = √2, but has nontrivial homology in dimension 2 that persists across the
interval [1, √2).

1.5.3 An application: Characterizing the diagrams of cycle networks


Notice that the cycle networks Gn defined in §1.3 are square-free. If x1, x2, . . . , xk ∈
Xn appear in Gn in this clockwise order, we write x1 ≺ x2 ≺ · · · ≺ xk. If a ≺ b ≺ c ≺
d ≺ a are four nodes on a cycle network, then for any δ ∈ R such that we have an edge
a → d, we automatically have an edge a → c. Thus the subgraph induced by {a, b, c, d}
cannot be either a long or a short square.
Cycle networks constitute an interesting family of examples with surprising connections
to existing literature [3, 4]. In particular, their Dowker persistence diagrams can be
fully characterized by results in [3], [4], and [37]. More specifically, given any n ≥ 3, we
know that Dgm^D_1(Gn) consists of the point (1, ⌈n/2⌉) with multiplicity 1. In this sense, a
cycle network is a directed analogue of the circle.
A natural test of whether PPH detects cyclic behavior in the expected way is to see if it can
be characterized for cycle networks. This is the content of the following theorem.
Theorem 63. Let Gn be a cycle network for some integer n ≥ 3. Fix a field K = Z/pZ for
some prime p. Then Dgm^Ξ_1(Gn) = {(1, ⌈n/2⌉)}.

1.6 The case of compact networks
Having surveyed the constructions of persistent homology on networks, we return to
dN and develop further properties of this network distance.

1.6.1 ε-systems and finite sampling


Proofs from this section, along with an auxiliary lemma, are provided in §2.2.
In this section, we develop the notion of ε-systems of networks. These are related to
ε-nets for metric spaces. The key idea is that, instead of replicating the notion of
an ε-net in the network setting (where even the notion of an open ball does not make sense),
we generalize a particular consequence of an ε-net to networks. This enables us
to obtain the important result that any compact network can be approximated up to
arbitrary precision by a finite network. This result in turn is instrumental for guaranteeing
that compact networks have well-defined persistence diagrams.
We start with a collection of statements about compact networks, and end with a result
about ε-systems in networks equipped with a Borel probability measure (see §1.9.1 for
more on these measure networks).

Definition 22 (ε-approximations). Let ε > 0. A network (X, ωX) ∈ N is said to be
ε-approximable by (Y, ωY) ∈ N if dN(X, Y) < ε. In this case, Y is said to be an
ε-approximation of X. Typically, we will be interested in the case where X is infinite and Y
is finite, i.e. in ε-approximating infinite networks by finite networks.

Definition 23 (ε-systems). Let ε > 0. For any network (X, ωX), an ε-system on X is a
finite open cover U = {U1, . . . , Un}, n ∈ N, of X such that for any 1 ≤ i, j ≤ n, we have
ωX(Ui, Uj) ⊆ B(rij, ε) for some rij ∈ R.
In some cases, we will be interested in the situation where X is a finite union of
connected components {X1, . . . , Xn}, n ∈ N. By a refined ε-system, we will mean an
ε-system such that each element of the ε-system is contained in precisely one connected
component of X.
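For a finite network, the defining condition is straightforward to check: ωX(Ui, Uj) ⊆ B(rij, ε) holds for some rij ∈ R precisely when the set of weights between Ui and Uj has spread strictly less than 2ε (take rij to be the midpoint). A minimal sketch (plain Python; the four-node example network is our own assumption, and we treat every subset as open):

```python
def is_eps_system(w, cover, eps):
    # Definition 23: for each pair (Ui, Uj), the weights between Ui and Uj
    # must fit inside some open ball B(r_ij, eps) in R, i.e. spread < 2*eps.
    for Ui in cover:
        for Uj in cover:
            vals = [w[u][v] for u in Ui for v in Uj]
            if max(vals) - min(vals) >= 2 * eps:
                return False
    return True

# A hypothetical 4-node network whose nodes pair up into near-duplicates.
w = [[0.0, 0.1, 1.0, 1.1],
     [0.1, 0.0, 1.1, 1.0],
     [1.0, 1.1, 0.0, 0.1],
     [1.1, 1.0, 0.1, 0.0]]

print(is_eps_system(w, [[0, 1], [2, 3]], eps=0.2))   # True
print(is_eps_system(w, [[0, 1], [2, 3]], eps=0.05))  # False
```

Singleton covers always satisfy the condition (each weight set is a single point), mirroring the fact that finite networks trivially admit ε-systems for every ε > 0.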

Theorem 64 (∃ of refined ε-systems). Any compact network (X, ωX ) has a refined ε-system
for any ε > 0. In particular, by picking a representative from each element of U, we get a
finite network X 0 such that dN (X, X 0 ) < ε.

Remark 65. When considering a compact metric space (X, dX ), the preceding theorem
relates to the well-known notion of taking finite ε-nets in a metric space. Recall that for
ε > 0, a subset S ⊆ X is an ε-net if for any point x ∈ X, we have B(x, ε) ∩ S 6= ∅. Such
an ε-net satisfies the nice property that dGH (X, S) < ε [17, 7.3.11]. In particular, one can
find a finite ε-net of (X, dX ) for any ε > 0 by compactness.
We do not make quantitative estimates on the cardinality of the ε-approximation produced
in Theorem 64. In the setting of compact metric spaces, the size of an ε-net relates
to the rich theory of metric entropy developed by Kolmogorov and Tihomirov [51, Chapter
17].
The preceding result shows that refined ε-systems always exist; this result relies cru-
cially on the assumption that the network is compact. The proof of the theorem uses the
continuity of ωX : X × X → R and the compactness of X × X. In the setting of compact
subsets of Euclidean space or compact metric spaces, ε-systems are easy to construct: we
can just take a cover by ε-balls, and then extract a finite subcover by invoking compactness.
The strength of Theorem 64 lies in proving the existence of ε-systems even when symmetry
and triangle inequality (key requirements needed to guarantee the standard properties of ε-
balls) are not assumed. The next result shows that by sampling points from all the elements
of an ε-system, one obtains a finite, quantitatively good approximation to the underlying
network.
Theorem 66 (ε-systems and dN). Let (X, ωX) be a compact network, let ε > 0, and let U
be an ε-system on X. Suppose X′ is any finite subset of X that has nonempty intersection
with each element in U. Then there exists a correspondence R′ ∈ R(X, X′) such that
dis(R′) < 4ε, and for each (x, x′) ∈ R′ we have x, x′ ∈ U for some U ∈ U. In
particular, it follows that

dN((X, ωX), (X′, ωX|X′×X′)) < 2ε.




The first statement in the preceding theorem asserts that we can choose a “well-behaved”
correspondence that associates to each point in X a point in X 0 that belongs to the same el-
ement in the ε-system. We omit the proof, as it follows essentially from the proof technique
for the next result.
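The kind of correspondence appearing in Theorem 66 is easy to exhibit in a toy case. The sketch below (plain Python; the grid sizes and the asymmetric weight ω(x, y) = y − x on [0, 1] are our own assumptions) pairs each point with a nearest point of a finite ε-grid and computes the distortion of the resulting correspondence:

```python
eps = 0.1
X = [i / 500 for i in range(501)]                 # dense finite stand-in for [0, 1]
Xp = [i * eps for i in range(int(1 / eps) + 1)]   # coarse eps-grid X'

def w(x, y):
    # An asymmetric weight function on [0, 1].
    return y - x

def nearest(x):
    return min(Xp, key=lambda g: abs(g - x))

# R pairs each point of X with a nearest grid point; every grid point is hit
# (each grid point is its own nearest neighbor), so R is a correspondence.
R = [(x, nearest(x)) for x in X]
dis = max(abs(w(x1, x2) - w(g1, g2)) for x1, g1 in R for x2, g2 in R)
print(dis <= 4 * eps)  # True: well within the bound of Theorem 66
```

Here the distortion is in fact at most ε, since each point moves by at most ε/2 and the two displacements add; the theorem's 4ε bound holds without symmetry or the triangle inequality.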
By virtue of Theorem 64, one can always approximate a compact network up to any
given precision. The next theorem implies that a sampled network limits to the underlying
compact network as the sample gets more and more dense.
Theorem 67 (Limit of dense sampling). Let (X, ωX ) be a compact network, and let S =
{s1 , s2 , . . .} be a countable dense subset of X with a fixed enumeration. For each n ∈ N,
let Xn be the finite network with node set {s1 , . . . , sn } and weight function ωX |Xn ×Xn .
Then we have:
dN (X, Xn ) ↓ 0 as n → ∞.
We now briefly venture into measure networks, which are Polish spaces X equipped
with a Borel probability measure µX and an essentially bounded, measurable weight func-
tion ωX : X × X → R (more in §1.9.1). For such a network, it makes sense to ask about
an “optimal” ε-system, as in the next definition.
Definition 24. Let (X, ωX , µX ) be a measure network. Let U be any ε-system on X. We
define the minimal mass function

m(U) := min {µX (U ) : U ∈ U, µX (U ) > 0} .

Note that m returns the minimal non-zero mass of an element in U.
Next let ε > 0. Define a function Mε : CN → (0, 1] as follows:
Mε (X) := sup {m(U) : U a refined ε-system on X} .
Since U covers X, we know that the total mass of U is 1. Thus the set of elements U with
positive mass is nonempty, and so m(U) is strictly positive. It follows that Mε(X) is strictly
positive. More is true when µX is fully supported on X: given any ε-system U on X and
any U ∈ U, we automatically have µX(U) > 0. To see this, suppose µX(U) = 0. Since U
is open, it would then follow that U ∩ supp(µX) = ∅, which is a contradiction because
supp(µX) = X and U is nonempty by our convention that open covers exclude empty elements.
In the preceding definition, for a given ε > 0, the function Mε (X) considers the collec-
tion of all refined ε-systems on X, and then maximizes the minimal mass of any element in
such an ε-system. For an example, consider the setting of Euclidean space Rd : ε-systems
can be constructed using ε-balls, and the mass of an ε-ball scales as εd . The functions in
Definition 24 are crucial to the next result, which shows that as we sample points from a
distribution on a network, the sampled subnetwork converges almost surely to the support
of the distribution.
Theorem 68 (Probabilistic network approximation). Let (X, ωX) be a network equipped
with a Borel probability measure µX. For each i ∈ N, let xi : Ω → X be an independent
random variable defined on some probability space (Ω, F, P) with distribution µX. For
each n ∈ N, let Xn = {x1, x2, . . . , xn}. Let ε > 0. Then we have:

P({ω ∈ Ω : dN(supp(µX), Xn(ω)) ≥ ε}) ≤ (1 − M_{ε/2}(supp(µX)))^n / M_{ε/2}(supp(µX)),

where Xn(ω) is the subnetwork induced by {x1(ω), . . . , xn(ω)}. In particular, the subnetwork
Xn converges almost surely to supp(µX) in the dN-sense.
As noted before, the mass of an ε-ball in d-dimensional Euclidean space scales as ε^d.
Thus in the setting of Euclidean space R^d, the quantity on the right would scale as
ε^{−d}(1 − ε^d)^n.
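The shape of this bound can be sanity-checked by simulation. In the sketch below (plain Python; the cover masses and sample sizes are our own assumptions), we sample from a toy distribution on four cover elements and compare the empirical frequency of "some cover element is missed," a stand-in for the event dN(supp(µX), Xn) ≥ ε, against the bound (1 − m)^n/m with m the minimal mass:

```python
import random

random.seed(0)
masses = [0.1, 0.2, 0.3, 0.4]   # masses of the cover elements; minimal mass m = 0.1
m = min(masses)
n, trials = 40, 20000

# Count trials in which at least one of the four cover elements receives no sample.
misses = sum(
    len(set(random.choices(range(4), weights=masses, k=n))) < 4
    for _ in range(trials)
)
empirical = misses / trials
bound = (1 - m) ** n / m
print(empirical <= bound)  # True: the tail bound holds with room to spare
```

With unequal masses the bound is loose, since it charges every element the minimal mass; for uniform masses it reduces to the union bound over the cover elements.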

1.6.2 Weak isomorphism and dN


Proofs from this section can be found in §2.3.
We now focus on the structure of the zero sets of dN . To proceed in this direction,
first notice that a strong isomorphism between two networks (X, ωX ) and (Y, ωY ), given
by a bijection f : X → Y , is equivalent to the following condition: there exists a set
Z and bijective maps ϕX : Z → X, ϕY : Z → Y such that ωX (ϕX (z), ϕX (z 0 )) =
ωY (ϕY (z), ϕY (z 0 )) for each z, z 0 ∈ Z. To see this, simply let Z = {(x, f (x)) : x ∈ X}
and let ϕX , ϕY be the projection maps on the first and second coordinates, respectively.
Based on this observation, we make the next definition.

Definition 25. Let (X, ωX) and (Y, ωY) ∈ N. We define X and Y to be Type I weakly
isomorphic, denoted X ≅ʷ_I Y, if there exists a set Z and surjective maps ϕX : Z → X and
ϕY : Z → Y such that ωX(ϕX(z), ϕX(z′)) = ωY(ϕY(z), ϕY(z′)) for each z, z′ ∈ Z.

Figure 1.15: Relaxing the requirements on the maps of this "tripod structure" is a natural
way to weaken the notion of strong isomorphism. Left: when the maps φX, φY from a
tripod Z are both injective and surjective, we recover strong isomorphism X ≅ˢ Y. Right:
when φX, φY are only surjective, we obtain Type I weak isomorphism X ≅ʷ_I Y.

Notice that Type I weak isomorphism is in fact a relaxation of the notion of strong
isomorphism. Indeed, if in addition to being surjective, we require the maps φX and φY
to be injective, then the strong notion of isomorphism is recovered. In this case, the map
φY ◦ φX⁻¹ : X → Y would be a weight preserving bijection between the networks X and Y.
The relaxation of strong isomorphism to a Type I weak isomorphism is illustrated in Figure
1.15. Also observe that the relaxation is strict. For example, the networks X = N1(1) and
Y = N2(1_{2×2}) are weakly but not strongly isomorphic via the map that sends both nodes
of Y to the single node of X.
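This example can be checked mechanically. In the minimal sketch below (plain Python; variable names are our own), the tripod of Definition 25 is the two-element set Z = {0, 1}, with φX collapsing both elements to the unique node of X and φY the identity:

```python
# X = N1(1): a single node with self-weight 1.
# Y = N2(1_{2x2}): two nodes with every weight equal to 1.
wX = [[1]]
wY = [[1, 1], [1, 1]]

# No strong isomorphism exists: |X| = 1 and |Y| = 2, so there is no bijection.

# Type I weak isomorphism: take Z = {0, 1}, phi_X sending both elements of Z
# to the unique node of X, and phi_Y the identity.
phi_X = [0, 0]
phi_Y = [0, 1]

tripod_ok = all(
    wX[phi_X[z1]][phi_X[z2]] == wY[phi_Y[z1]][phi_Y[z2]]
    for z1 in range(2) for z2 in range(2)
)
print(tripod_ok)  # True: the tripod witnesses the weak isomorphism
```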
We remark that when dealing with infinite networks, it will turn out that an even weaker
notion of isomorphism is required. We define this weakening next.

Definition 26. Let (X, ωX) and (Y, ωY) ∈ N. We define X and Y to be Type II weakly
isomorphic, denoted X ≅ʷ_II Y, if for each ε > 0, there exists a set Zε and surjective maps
φεX : Zε → X and φεY : Zε → Y such that

|ωX(φεX(z), φεX(z′)) − ωY(φεY(z), φεY(z′))| < ε for all z, z′ ∈ Zε.    (1.4)

Remark 69 (Type I isomorphism is stronger than Type II). Let (X, ωX), (Y, ωY) ∈ CN
and suppose ϕ : X → Y is a surjective map such that ωX(x, x′) = ωY(ϕ(x), ϕ(x′)) for
all x, x′ ∈ X. Then X and Y are Type I weakly isomorphic and hence Type II weakly
isomorphic, i.e. X ≅ʷ_II Y. This follows from Definition 25 by (1) choosing Z = X,
(2) letting φX be the identity map, and (3) letting φY = ϕ. The converse implication,
i.e. that Type I weak isomorphism implies the existence of a surjective map as above, is not
true: an example is shown in Figure 1.16.

Figure 1.16: Three networks A (nodes x, y, z), B (nodes u, v, w), and C (nodes p, q, r, s),
with weight matrices Ψ³A(x, y, z), Ψ³B(u, v, w), and Ψ⁴C(p, q, r, s). Note that Remark 69
does not fully characterize weak isomorphism, even for finite networks: all three networks
are Type I weakly isomorphic, since C maps surjectively onto A and B, but there are no
surjective, weight preserving maps A → B or B → A.

It is easy to see that strong isomorphism induces an equivalence relation on N. The same
is true for both types of weak isomorphism, and we record this result in the following
proposition.
Proposition 70. Weak isomorphism of Types I and II both induce equivalence relations on
N.
In the setting of FN , it is not difficult to show that the two types of weak isomorphism
coincide. This is the content of the next proposition. By virtue of this result, there is no
ambiguity in dropping the “Type I/II” modifier when saying that two finite networks are
weakly isomorphic.
Proposition 71. Let X, Y ∈ FN be finite networks. Then X and Y are Type I weakly
isomorphic if and only if they are Type II weakly isomorphic.
Type I weak isomorphisms will play a vital role in this work, but for now,
we focus on Type II weak isomorphism. The next theorem justifies calling dN a network
distance, and shows that dN is compatible with Type II weak isomorphism.
Theorem 72. dN is a metric on N modulo Type II weak isomorphism.
The proof is in §2.1. For finite networks, we immediately obtain the following corollary:
the restriction of dN to FN yields a metric modulo Type I weak isomorphism.
The proof of Proposition 71 will follow from the proof of Theorem 72. In fact, an even
stronger result is true: weak isomorphism of Types I and II coincide for compact networks
as well.
Theorem 73 (Weak isomorphism in CN). Let X, Y ∈ CN. Then X and Y are Type II
weakly isomorphic if and only if X and Y are Type I weakly isomorphic, i.e. there exists a
set V and surjections ϕX : V → X, ϕY : V → Y such that:

ωX(ϕX(v), ϕX(v′)) = ωY(ϕY(v), ϕY(v′)) for all v, v′ ∈ V.

1.6.3 An additional axiom coupling weight function with topology
We now explore some additional constraints on the coupling between the topology on a
network and its weight function. Using these constraints, we are able to prove that weakly
isomorphic networks have, in a particular sense, a strongly isomorphic core. Moreover,
weak isomorphism as a whole is guaranteed by a certain equality of substructures
called motifs. In particular, this generalizes an observation of Gromov about reconstruction
via motif sets in metric spaces [65, 3.27½] to the setting of directed metric spaces.
First we present a definition that will be used later, and will also help us understand the
topological constraints we later impose.

Definition 27 (An equivalence relation and a quotient space). Let (X, ωX) ∈ N. Define
the equivalence relation ∼ as follows:

x ∼ x′ iff ωX(x, z) = ωX(x′, z) and ωX(z, x) = ωX(z, x′) for all z ∈ X.

Next define σ : X → X/∼ to be the canonical map sending any x ∈ X to its equivalence
class [x] ∈ X/∼. Also define ωX/∼([x], [x′]) := ωX(x, x′) for [x], [x′] ∈ X/∼. To check
that this map is well-defined, let a, a′ ∈ X be such that a ∼ x and a′ ∼ x′. Then,

ωX(a, a′) = ωX(x, a′) = ωX(x, x′),

where the first equality holds because a ∼ x, and the second equality holds because a′ ∼ x′.
We equip X/∼ with the quotient topology, i.e. a set is open in X/∼ if and only if its
preimage under σ is open in X. Then σ is a surjective, continuous map.

Recall that we often write xn → x to mean that a sequence (xn )n∈N in a topological
space X is converging to x ∈ X, i.e. any open set containing x contains all but finitely
many of the xn terms. We also often write “(xn )n∈N is eventually inside A ⊆ X” to mean
that xn ∈ A for all but finitely many n. Also recall that given a subspace Z ⊆ X equipped
with the subspace topology, we say that a particular topological property (e.g. convergence
or openness) holds relative Z or rel Z if it holds in the set Z equipped with the sub-
space topology. Throughout this section, we use the “relative” terminology extensively as
a bookkeeping device to keep track of the subspace with respect to which some topological
property holds.

Definition 28. Let (X, ωX ) ∈ N . We say that X has a coherent topology if the following
axioms are satisfied for any subnetwork Z of X equipped with the subspace topology:

A1 (Open sets in a first countable space) A set A ⊆ Z is open rel Z if and only if for any
sequence (xn )n∈N in Z converging rel Z to a point x ∈ A, there exists N ∈ N such
that xn ∈ A for all n ≥ N .

A2 (Topological triangle inequality) A sequence (xn)n∈N in Z converges rel Z to a point
x ∈ Z if and only if ωX(xn, •)|Z → ωX(x, •)|Z and ωX(•, xn)|Z → ωX(•, x)|Z uniformly.

Axiom A1 is a characterization of open sets in first countable spaces; we mention it
explicitly for easy reference. Axiom A2 gives a characterization of convergence (and hence
of the open sets, via A1) in terms of the given weight function. Note that A2 does not
discount the possibility of a sequence converging to non-unique limits, does not force a
space to be Hausdorff, and does not force convergent sequences to be Cauchy. The name
topological triangle inequality is explained in the next remark.
Remark 74 (The “topological triangle inequality”). Consider a metric space (X, dX ). One
key property of such a space is that whenever dX (x, x0 ) is small, we also have dX (x, •) ≈
dX (x0 , •) by the triangle inequality. Said differently, if we have a sequence (xn )n and
xn → x, then |dX (xn , z) − dX (x, z)| ≤ dX (x, xn ) → 0 for any z ∈ X. Conversely, if
|dX (xn , z) − dX (x, z)| → 0 for all z ∈ X, then by letting z = x, we immediately obtain
dX (xn , x) → 0.
Axiom A2 abstracts away this consequence of the triangle inequality into its network
formulation. However, there is more subtlety in the definition. First consider the relation
∼. Informally, if we relax the definition of ∼ and require "approximate equality" instead
of strict equality, we say that x ∼ε x′ if

ωX(x, x) ≈ ωX(x, x′) ≈ ωX(x′, x) ≈ ωX(x′, x′)    (1.5)

ωX(x, z) ≈ ωX(x′, z) and ωX(z, x) ≈ ωX(z, x′) for all z ∈ X.    (1.6)

Here the ε decoration on ∼ is incorporated into the ≈ notation in the obvious way. As we
observed earlier, in a metric space, the triangle inequality ensures that (1.5) implies (1.6).
More generally, let x ∈ X, let Z be a small ε-ball containing x, and suppose (xn )n is a
sequence in Z. If Z is small, then (1.5) holds for any (xn , x) pair in Z and forces (1.6) to
hold, not just in Z, but in all of X. This type of local-to-global inference is a consequence
of the triangle inequality.
In a network (X, ωX ), A2 captures this type of local-to-global inference in a weak
sense. Suppose Z ⊆ X, {xn }n ⊆ Z, x ∈ Z, and xn → x rel Z. Note that (1.5) does not
force (1.6) to hold even in Z, and so we explicitly assume xn → x rel Z, which implicitly
assumes (1.6) restricted to Z.
By a fact about convergence in a relative topology, (xn )n in Z converges to x ∈ Z rel Z
if and only if it converges rel X. So xn → x rel Z automatically forces xn → x rel X. Thus
by A2, we know that (1.6) holds, not just in Z, but in all of X. Because A2 generalizes
the triangle inequality in some sense, and relies on properties of the subspace topology, we
interpret it as a topological triangle inequality.
Remark 75 (Heredity of coherence). An alternative formulation of a coherent topology—
without invoking the “any subnetwork Z of X” terminology—would be to say that X
satisfies A2, and that A2 is hereditary, meaning that any subspace also satisfies A2. Note
that first countability is hereditary, so any subspace of X automatically satisfies A1.
One of the reasons for discussing coherent topologies is that they enable us to prove that
weight preserving maps are continuous (Proposition 76). This also justifies Axiom A2 as a
fundamental property that we should expect networks to have.

Proposition 76. Let (X, ωX ), (Y, ωY ) be networks with coherent topologies. Suppose f :
X → Y is a weight-preserving map and f (X) is a subnetwork of Y with the subspace
topology. Then f is continuous.

The proof of this result is in §2.4.

Remark 77 (Relation to Kuratowski embedding). In the setting of a metric space (X, dX ),


the map X → Cb(X) given by x ↦ dX(x, •) is an isometry known as the Kuratowski
embedding. Here Cb(X) is the space of bounded, continuous functions on X equipped with
the uniform norm. Since this is an isometry, we know that xn → x in X iff dX(xn, •) →
dX(x, •) uniformly in Cb(X).
In the setting of a general network (X, ωX ), we do not start with a notion of convergence
of the form xn → x. However, by continuity of ωX , we are able to use the language of
convergence in Cb (X). The intuition behind Axiom A2 is to use convergence in Cb (X)
to induce a notion of convergence in X, with the appropriate adjustments needed for the
asymmetry of ωX .

We use the name "coherent" because it was used to describe the coupling between a
metric-like function and its topology as far back as [101].

Remark 78 (Examples of coherent topologies). Let (X, dX ) be a compact metric space.


Axioms A1-A2 hold in X by properties of the metric topology and the triangle inequality.
Let (Z, dZ ) denote a metric subspace equipped with the restriction of dX . Any subspace of
a first countable space is first countable, so Z is first countable and thus satisfies A1. Axiom
A2 holds for Z by the triangle inequality of dZ . Thus the metric topology on (X, dX ) is
coherent.
The network N2((α β; γ δ)), where α, β, γ, δ are all distinct, is a minimal example of an
asymmetric network with a coherent topology. In general, for a topology on a finite network to be
coherent, it needs to be coarser than the discrete topology. Consider the network N2((1 1; 1 1))
on node set {p, q}. If we assume that the constant sequence (p, p, . . .) converges to q in
the sense of Axiom A2, then {q} cannot be open for Axiom A1 to be satisfied. However,
the trivial topology {∅, {p, q}} is coherent. More generally, the discrete topology on the
skeleton sk(X) of any finite network X (essentially X/∼, but defined more precisely in
§1.6.4) is coherent.
The directed network with finite reversibility (S⃗¹, ωS⃗¹,ρ) described in §1.3 is a compact,
asymmetric network with a coherent topology.

1.6.4 Skeletons, motifs, and motif reconstruction


Proofs from this section can be found in §2.4.
In this section, we provide further details on the structure of the fiber of weakly isomorphic
networks. We begin by defining a motif set, which is the network analogue of
Gromov's curvature classes [65, 3.27]. Informally, for each n ∈ N, the n-motif set is the
collection of n×n weight matrices obtained from n-tuples of points in X, possibly with
repetition. This is made precise next, after introducing some notation. For a sequence
(xi)ⁿi=1 of nodes in a network X, we will denote the associated weight matrix by
((ωX(xi, xj)))ⁿi,j=1. Entry (i, j) of this matrix is simply ωX(xi, xj).
Definition 29 (Motif set). For each n ∈ N and each (X, ωX) ∈ CN, define Ψⁿ_X : Xⁿ →
R^{n×n} to be the map (x1, · · · , xn) ↦ ((ωX(xi, xj)))ⁿi,j=1, where the (()) notation refers to the
square matrix associated with the sequence. Note that Ψⁿ_X is simply a map that sends each
sequence of length n to its corresponding weight matrix. Let C(R^{n×n}) denote the closed
subsets of R^{n×n}. Then let Mn : CN → C(R^{n×n}) denote the map defined by

(X, ωX) ↦ {Ψⁿ_X(x1, . . . , xn) : x1, . . . , xn ∈ X}.

We refer to Mn(X) as the n-motif set of X. The interpretation is that Mn(X) is a bag
containing all the motifs of X that one can form by looking at all subnetworks of size n
(with repetitions). Notice that Mn(X) is closed in R^{n×n}: it is the image of the compact
set Xⁿ under the continuous map Ψⁿ_X, hence compact and therefore closed.
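For finite networks, motif sets can be enumerated exhaustively. The sketch below (plain Python; the matrix encoding and helper name are our own) computes motif sets for a two-node network and for one of its blow-ups, which is weakly isomorphic to it, and confirms that the motif sets coincide:

```python
from itertools import product

def motif_set(w, n):
    # All n x n weight matrices arising from n-tuples of nodes (with repetition),
    # encoded as tuples of tuples so they can be collected in a set.
    k = len(w)
    return {tuple(tuple(w[i][j] for j in t) for i in t)
            for t in product(range(k), repeat=n)}

A = [[1, 2], [3, 4]]
B = [[1, 1, 2, 2],
     [1, 1, 2, 2],
     [3, 3, 4, 4],
     [3, 3, 4, 4]]  # a blow-up of A: each node of A is doubled

print(motif_set(A, 2) == motif_set(B, 2))  # True
print(motif_set(A, 3) == motif_set(B, 3))  # True
```

A and B are not strongly isomorphic (their cardinalities differ), yet their motif sets agree in every size, illustrating the equivalence stated in Theorem 84.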
It is easy to come up with examples of networks that share the same motif sets, but are
not strongly isomorphic. However, as we later show in Theorem 84, weak isomorphism of
compact, separable, and coherent networks is precisely characterized by equality of motif
sets. Another crucial object for this result is the notion of a skeleton, which we define next.
Definition 30 (Automorphisms). Let (X, ωX ) ∈ CN . We define the automorphisms of X
to be the collection

Aut(X) := {ϕ : X → X : ϕ a weight preserving bijection} .

Definition 31 (Poset of weak isomorphism). Let (X, ωX ) ∈ CN . Define a set p(X) as


follows:

p(X) := {(Y, ωY ) ∈ CN : there exists a surjective, weight preserving map ϕ : X → Y } .

Next we define a partial order ⪯ on p(X) as follows: for any (Y, ωY), (Z, ωZ) ∈ p(X),

(Y, ωY) ⪯ (Z, ωZ) ⇐⇒ there exists a surjective, weight preserving map ϕ : Z → Y.

Then the set p(X) equipped with ⪯ is called the poset of weak isomorphism of X.
Definition 32 (Terminal networks in CN ). Let (X, ωX ) ∈ CN . A compact network Z ∈
p(X) is terminal if:
1. For each Y ∈ p(X), there exists a weight preserving surjection ϕ : Y → Z.

2. Let Y ∈ p(X). If f : Y → Z and g : Y → Z are weight preserving surjections, then


there exists ϕ ∈ Aut(Z) such that g = ϕ ◦ f .

Figure 1.17: Left: Z represents a terminal object in p(X), and f, g are weight preserving
surjections X → Z; here ϕ ∈ Aut(Z) is such that g = ϕ ◦ f. Right: more of the poset
structure of p(X); in this case we have X ⪰ V ⪰ Y ⪰ · · · ⪰ Z.

One of our main results (Theorem 84) shows that two weakly isomorphic networks have
strongly isomorphic skeleta.
A terminal network captures the idea of a minimal substructure of a network. One may
ask if anything interesting can be said about superstructures of a network. This motivates
the following construction of a “blow-up” network. We provide an illustration in Figure
1.18.

Definition 33. Let (X, ωX) be any network. Let k = (kx)x∈X be a choice of an index set
kx for each node x ∈ X. Consider the network X[k] with node set ⋃x∈X {(x, i) : i ∈ kx}
and weights ω given as follows: for x, x′ ∈ X and for i ∈ kx, i′ ∈ kx′,

ω((x, i), (x′, i′)) := ωX(x, x′).

The topology on X[k] is given as follows: the open sets are of the form ⋃x∈U {(x, i) : i ∈ kx},
where U is open in X. By construction, X[k] is first countable with respect to this topology.
We will call any such X[k] a blow-up network of X.

In a blow-up network of X, each node x ∈ X is replaced by another network, indexed
by kx. All internal weights of this replacement network are constant, and all outgoing
weights are preserved from the original network. If X is compact, then so is X[k].
We also observe that X is weakly isomorphic to any of its blow-ups Y = X[k]. To
see this, let Z = X[k], let φY : Z → Y be the map sending each (x, i) to (x, i), and let
φX : Z → X be the map sending each (x, i) to x. Then φX, φY are surjective, weight
preserving maps from Z onto X and Y respectively. By Remark 69, we obtain X ≅ʷ Y.
The construction of blow-up networks provides a different perspective on Proposition
12:

Figure 1.18: Interpolating between the skeleton and blow-up constructions. The two-node
network N2((1 2; 3 4)) on nodes q, r blows up to the four-node network
N4((1 1 2 2; 1 1 2 2; 3 3 4 4; 3 3 4 4)) on nodes (q, 1), (q, 2), (r, 1), (r, 2), and
skeletonizing the blow-up recovers the original network.

Proposition 79 (see also Proposition 12). Let (X, ωX), (Y, ωY) ∈ N. Then X ≅ʷ Y if
and only if there exist blow-ups X′, Y′ such that X′ ≅ˢ Y′.
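For finite networks, both constructions are easy to experiment with. The sketch below (plain Python; function names are our own) builds a blow-up and then skeletonizes it by merging nodes with identical out-rows and in-columns, i.e. the relation ∼ of Definition 27, recovering the original network as in Figure 1.18:

```python
def blow_up(w, k):
    # Replace node x by k[x] copies; every copy inherits all weights of x.
    idx = [x for x in range(len(w)) for _ in range(k[x])]
    return [[w[x][y] for y in idx] for x in idx]

def skeletonize(w):
    # Keep one representative per ~-class: nodes with identical
    # outgoing rows and incoming columns are identified.
    n = len(w)
    sig = [(tuple(w[i]), tuple(w[j][i] for j in range(n))) for i in range(n)]
    reps = []
    for i in range(n):
        if sig[i] not in [sig[r] for r in reps]:
            reps.append(i)
    return [[w[i][j] for j in reps] for i in reps]

A = [[1, 2], [3, 4]]
print(skeletonize(blow_up(A, [2, 2])) == A)  # True: as in Figure 1.18
```

One pass suffices here because the quotient weight function descends from the original one, so collapsing a ∼-class cannot create new equivalences among the remaining representatives.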
We now define the skeleton of a compact network. Observe that when X is compact,
X/ ∼ is the continuous image of a compact space and so is compact. In general, first
countability of a topological space is not preserved under a surjective continuous map, but
it is preserved when the surjective, continuous map is also open [114, p. 27]. The following
proposition gives a sufficient condition on X which will ensure that X/ ∼ is first countable.
Proposition 80. Suppose (X, ωX ) ∈ N has a coherent topology. Then the map σ : X →
X/ ∼ is an open map, i.e. it maps open sets to open sets.
Definition 34 (The skeleton of a compact network). Suppose (X, ωX) ∈ CN has a coherent
topology. The skeleton of X is defined to be (sk(X), ωsk(X)) ∈ CN, where sk(X) := X/∼
and

ωsk(X)([x], [x′]) := ωX(x, x′) for all [x], [x′] ∈ sk(X).

Observe that sk(X) is compact because X is compact, and first countable by Proposition
80 and the fact that the image of a first countable space under an open, surjective, and
continuous map is also first countable [114, p. 27]. Furthermore, ωsk(X) is well defined by
the definition of ∼.

The following proposition shows that skeletons inherit the property of coherence.

Proposition 81. Let (X, ωX ) be a compact network with a coherent topology. The quotient
topology on (sk(X), ωsk(X) ) is also coherent.

In addition to coherence, the skeleton has the following useful property.

Proposition 82. Let (X, ωX ) be a compact network with a coherent topology. Then its
skeleton (sk(X), ωsk(X) ) is Hausdorff.

Theorem 83 (Skeletons are terminal). Let (X, ωX ) ∈ CN be such that the topology on X
is coherent. Then (sk(X), ωsk(X) ) ∈ CN is terminal in p(X).

Recall that a topological space is separable if it contains a countable dense subset.

Theorem 84. Suppose (X, ωX), (Y, ωY) are separable, compact networks with coherent
topologies. Then the following are equivalent:

1. X ≅ʷ Y.

2. Mn(X) = Mn(Y) for all n ∈ N.

3. sk(X) ≅ˢ sk(Y).

1.7 Diagrams of compact networks and convergence results


Our aim in this work is to describe the convergence of persistent homology methods ap-
plied to network data. When dealing with finite networks, the vector spaces resulting from
applying a persistent homology method will necessarily be finite dimensional. However,
our setting is that of infinite (more specifically, compact) networks, and so we need addi-
tional machinery to ensure that our methods output well-defined persistent vector spaces.
The following definition and theorem are provided in full detail in [26].
Definition 35 (§2.1, [26]). A persistent vector space V = {ν_{δ,δ′} : V^δ → V^{δ′}}_{δ≤δ′∈R}
is q-tame if ν_{δ,δ′} has finite rank whenever δ < δ′.

Theorem 85 ([26], also [27] Theorem 2.3). Any q-tame persistent vector space V has
a well-defined persistence diagram Dgm(V). If U, V are ε-interleaved q-tame persistent
vector spaces, then dB (Dgm(U), Dgm(V)) ≤ ε.

As a consequence of developing the notion of ε-systems, we are able to prove the fol-
lowing:
Theorem 86. Let (X, ωX) ∈ CN and k ∈ Z+. Then the dimension-k persistent vector
spaces associated to the Vietoris-Rips, Dowker, and PPH constructions are all q-tame.
The metric space analogue of Theorem 86 for VR and Čech complexes appeared in [27,
Proposition 5.1]; the same proof structure works in the setting of networks after applying
our results on approximation via ε-systems.
For this next result, we again refer the reader to §1.9.1 for additional details on measure
networks.
Theorem 87 (Convergence). Let (X, ωX ) be a measure network equipped with a Borel
probability measure µX . For each i ∈ N, let xi : Ω → X be an independent random
variable defined on some probability space (Ω, F, P) with distribution µX . For each n ∈ N,
let Xn = {x1 , x2 , . . . , xn }. Let ε > 0. Then we have:

P ({ω ∈ Ω : dB (Dgm• (supp(µX )), Dgm• (Xn (ω))) ≥ ε}) ≤ (1 − Mε/4 (supp(µX )))n / Mε/4 (supp(µX )),

where Xn (ω) is the subnetwork induced by {x1 (ω), . . . , xn (ω)} and Dgm• is any of the Vietoris-Rips, Dowker, or PPH diagrams. In particular, each of these three persistent vector spaces of the subnetwork Xn converges almost surely to that of supp(µX ) in bottleneck distance.
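When (X, ωX , µX ) is a finite network stored as a weight matrix and a mass vector, the sampling procedure in Theorem 87 is straightforward to simulate. The following sketch (the helper name sample_subnetwork is ours; computing the diagrams themselves would additionally require a persistent homology library) draws n i.i.d. nodes with distribution µX and restricts ωX to them:

```python
import numpy as np

def sample_subnetwork(omega, mass, n, rng):
    """Draw n i.i.d. nodes with distribution mass (the law mu_X) and return
    the weight matrix of the induced subnetwork X_n."""
    idx = rng.choice(len(mass), size=n, p=mass)
    return omega[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
omega = np.array([[0.0, 1.0, 2.0],
                  [3.0, 0.0, 4.0],
                  [5.0, 6.0, 0.0]])
mass = np.array([0.5, 0.3, 0.2])
sub = sample_subnetwork(omega, mass, 4, rng)   # a 4 x 4 weight matrix
```

Note that repeated nodes are allowed, exactly as in the theorem: the x_i are independent draws, not a subset.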

1.8 Completeness, compactness, and geodesics


Proofs from this section are provided in §2.5.

1.8.1 Completeness of (CN /∼=w , dN )

The following important result further justifies working in CN :

Theorem 88. The completion of (FN /∼=w , dN ) is (CN /∼=w , dN ).
The result of Theorem 88 can be summarized as follows:
The limit of a convergent sequence of finite networks is a compact topological space with
a continuous weight function.
Completeness of CN /∼ =w gives us a first useful criterion for convergence of networks.
Ideally, we would also want a criterion for convergence along the lines of sequential com-
pactness. In the setting of compact metric spaces, Gromov’s Precompactness Theorem
implies that the topology induced by the Gromov-Hausdorff distance admits many pre-
compact families of compact metric spaces (i.e. collections whose closure is compact)

[65, 17, 98]. Any sequence in such a precompact family has a subsequence converging to
some limit point of the family. In the next section, we extend these results to the setting of
networks. Namely, we show that there are many families of compact networks that are
precompact under the metric topology induced by dN .

1.8.2 Precompact families in CN /∼=w
We begin this section with some definitions.

Definition 36 (Diameter for networks, [35]). For any network (X, ωX ), define diam(X) :=
supx,x0 ∈X |ωX (x, x0 )|. For compact networks, the sup is replaced by max.

Definition 37. A family F of weak isomorphism classes of compact networks is uniformly approximable if: (1) there exists D ≥ 0 such that for every [X] ∈ F, we have diam(X) ≤ D, and (2) for every ε > 0, there exists N (ε) ∈ N such that for each [X] ∈ F, there exists a finite network Y satisfying card(Y ) ≤ N (ε) and dN (X, Y ) < ε.

Remark 89. The preceding definition is an analogue of the definition of uniformly totally
bounded families of compact metric spaces [17, Definition 7.4.13], which is used in for-
mulating the precompactness result in the metric space setting. A family of compact metric
spaces is said to be uniformly totally bounded if there exists D ∈ R+ such that each space
has diameter bounded above by D, and for any ε > 0 there exists Nε ∈ N such that each
space in the family has an ε-net with cardinality bounded above by Nε . Recall that given
a metric space (X, dX ) and ε > 0, a subset S ⊆ X is an ε-net if for any point x ∈ X,
we have B(x, ε) ∩ S ≠ ∅. Such an ε-net satisfies the nice property that dGH (X, S) < ε
[17, 7.3.11]. Thus an ε-net is an ε-approximation of the underlying metric space in the
Gromov-Hausdorff distance.
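In the finite setting, an ε-net in the sense of Remark 89 can be extracted greedily by farthest-point traversal. This is a minimal sketch, assuming the space is given by its distance matrix; greedy_eps_net is a hypothetical helper name:

```python
import numpy as np

def greedy_eps_net(D, eps):
    """Farthest-point traversal: grow the net until every point has a net point
    strictly within eps, i.e. B(x, eps) meets the net for every x."""
    net = [0]
    while True:
        dist_to_net = D[:, net].min(axis=1)   # distance of each point to the net
        far = int(np.argmax(dist_to_net))
        if dist_to_net[far] < eps:
            return net
        net.append(far)
```

The returned indices S satisfy max_x min_{s∈S} dX (x, s) < ε, so dGH (X, S) < ε by the property quoted above.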

Theorem 90. Let F be a uniformly approximable family in CN /∼=w . Then F is precompact, i.e. any sequence in F contains a subsequence that converges in CN /∼=w .

1.8.3 Geodesics: existence and explicit examples


Thus far, we have motivated our discussion of compact networks by viewing them as
limiting objects of finite networks. By the results of the preceding section, we know that
(CN /∼ =w , dN ) is complete and obeys a well-behaved compactness criterion. In this section,
we prove that this metric space is also geodesic, i.e. any two compact networks can be
joined by a rectifiable curve with length equal to the distance between the two networks.
Geodesic spaces can have a variety of practical implications. For example, geodesic
spaces that are also complete and locally compact are proper (i.e. any closed, bounded
subset is compact), by virtue of the Hopf-Rinow theorem [17, §2.5.3]. Any probability
measure with finite second moment supported on such a space has a barycenter [92, Lemma
3.2], i.e. a “center of mass”. Conceivably, such a result can be applied to a compact,

geodesically convex region of (CN /∼ =w , dN ) to compute an “average” network from a
collection of networks. Such a result is of interest in statistical inference, e.g. when one
wishes to represent a noisy collection of networks by a single network. Similar results on
barycenters of geodesic spaces can be found in [61, 82]. We leave a treatment of this topic
from a probabilistic framework as future work, and only use this vignette to motivate the
results in this section.
We begin with some definitions.

Definition 38 (Curves and geodesics). A curve on N joining (X, ωX ) to (Y, ωY ) is any continuous map γ : [0, 1] → N such that γ(0) = (X, ωX ) and γ(1) = (Y, ωY ). We will write a curve on FN (resp. a curve on CN ) to mean that the image of γ is contained in FN (resp. CN ). Such a curve is called a geodesic [16, §I.1] between X and Y if for all s, t ∈ [0, 1] one has:

dN (γ(t), γ(s)) = |t − s| · dN (X, Y ).

A metric space is called a geodesic space if any two points can be connected by a geodesic.

The following theorem is a useful result about geodesics:

Theorem 91 ([17], Theorem 2.4.16). Let (X, dX ) be a complete metric space. If for any x, x′ ∈ X there exists a midpoint z such that dX (x, z) = dX (z, x′) = (1/2) dX (x, x′), then X is geodesic.

As a first step towards showing that CN /∼=w is geodesic, we show that the collection of finite networks forms a geodesic space.

Theorem 92. The metric space (FN /∼=w , dN ) is a geodesic space. More specifically, let [X], [Y ] ∈ (FN /∼=w , dN ). Then, for any R ∈ R opt (X, Y ), we can construct a geodesic γR : [0, 1] → FN /∼=w between [X] and [Y ] as follows:

γR (0) := [(X, ωX )], γR (1) := [(Y, ωY )], and γR (t) := [(R, ωγR (t) )] for t ∈ (0, 1),

where for each (x, y), (x′, y′) ∈ R and t ∈ (0, 1),

ωγR (t) ((x, y), (x′, y′)) := (1 − t) · ωX (x, x′) + t · ωY (y, y′).
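For finite networks, the straight-line geodesic of Theorem 92 is directly computable. The following is a sketch under the assumption that weight functions are stored as numpy arrays and the correspondence R is a list of index pairs; geodesic_point is our own helper name:

```python
import numpy as np

def geodesic_point(omega_X, omega_Y, R, t):
    """Weight matrix of gamma_R(t): linear interpolation of the two weight
    functions pulled back to the correspondence R (a list of index pairs)."""
    n = len(R)
    W = np.empty((n, n))
    for a, (i, j) in enumerate(R):
        for b, (k, l) in enumerate(R):
            W[a, b] = (1.0 - t) * omega_X[i, k] + t * omega_Y[j, l]
    return W
```

At t = 0 the matrix is the pullback of ωX to R (and similarly for ωY at t = 1), so passing to weak isomorphism classes recovers the endpoints [X] and [Y ].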

A key step in the proof of the preceding theorem is to choose an optimal correspondence
between two finite networks. This may not be possible, in general, for compact networks.
However, using the additional results on precompactness and completeness of CN /∼ =w , we
are able to obtain the desired geodesic structure in Theorem 93. The proof is similar to
the one used by the authors of [73] to prove that the metric space of isometry classes of
compact metric spaces endowed with the Gromov-Hausdorff distance is geodesic.

Theorem 93. The complete metric space (CN /∼=w , dN ) is geodesic.

Remark 94. Consider the collection of compact metric spaces endowed with the Gromov-
Hausdorff distance. This collection can be viewed as a subspace of (CN /∼ =w , dN ). It is
known (via a proof relying on Theorem 91) that this restricted metric space is geodesic [73].
Furthermore, it was proved in [36] that an optimal correspondence always exists in this set-
ting, and that such a correspondence can be used to construct explicit geodesics instead of
resorting to Theorem 91. The key technique used in [36] was to take a convergent sequence
of increasingly-optimal correspondences, use a result about compact metric spaces called
Blaschke’s theorem [17, Theorem 7.3.8] to show that the limiting object is closed, and then
use metric properties such as the Hausdorff distance to guarantee that this limiting object
is indeed a correspondence. A priori, such techniques cannot be readily adapted to the
network setting, and while one can obtain a convergent sequence of increasingly-optimal
correspondences, the obstruction lies in showing that the limiting object is indeed a cor-
respondence. Thus in our proof of Theorem 93, we resort to the indirect route of using
Theorem 91.

While the existence of geodesics is satisfactory, it turns out that in some sense, there are
“too many” geodesics in CN . This problem is already apparent in FM. As shown in [36],
FM contains both branching and non-unique geodesics. The simultaneous presence of
both these types of geodesics precludes the placement of curvature bounds (in the sense of
Alexandrov curvature, which is a commonly used notion of curvature in metric geometry)
on even FM. The issue lies in the definition of dGH /dN : the l∞ structure of these metrics
is what enables the existence of these exotic geodesics. We reproduce these results in the
next few sections. First we show a quick lemma.

Lemma 95. Let (Z, dZ ) be a metric space. Let S, T ∈ R, with S < T , and γ : [S, T ] → Z be a curve such that

dZ (γ(s), γ(t)) ≤ (|s − t| / |S − T |) · dZ (γ(S), γ(T )), for all s, t ∈ [S, T ].

Then, in fact,

dZ (γ(s), γ(t)) = (|s − t| / |S − T |) · dZ (γ(S), γ(T )), for all s, t ∈ [S, T ].

Proof of Lemma 95. Suppose the inequality is strict for some pair s, t with s ≤ t. Then by the triangle inequality, we obtain:

dZ (γ(S), γ(T )) ≤ dZ (γ(S), γ(s)) + dZ (γ(s), γ(t)) + dZ (γ(t), γ(T ))
              < (((s − S) + (t − s) + (T − t)) / (T − S)) · dZ (γ(S), γ(T )) = dZ (γ(S), γ(T )).

This is a contradiction. Similarly we get a contradiction for the case t < s. This proves the lemma.

Deviant geodesics
For any n ∈ N, let ∆n denote the n-point discrete space, often called the n-point unit
simplex. Fix n ∈ N, n ≥ 2. We will construct an infinite family of deviant geodesics
between ∆1 and ∆n , named as such because they deviate from the straight-line geodesics
given by Theorem 92. As a preliminary step, we describe the straight-line geodesic between
∆1 and ∆n of the form given by Theorem 92. Let {p} and {x1 , . . . , xn } denote the under-
lying sets of ∆1 and ∆n . There is a unique correspondence R := {(p, x1 ), . . . , (p, xn )}
between these two sets. According to the setup in Theorem 92, the straight-line geodesic
between ∆1 and ∆n is then given by the metric spaces (R, dγR (t) ), for t ∈ (0, 1). Here
dγR (t) ((p, xi ), (p, xj )) = t · d∆n (xi , xj ) = t for each t ∈ (0, 1) and each 1 ≤ i ≠ j ≤ n. This
corresponds to the all-t matrix with 0s on the diagonal. Finally, we note that the unique
correspondence R necessarily has distortion 1. Thus dGH (∆1 , ∆n ) = 1/2.
Now we give the parameters for the construction of a certain family of deviant geodesics
between ∆1 and ∆n . For any α ∈ (0, 1] and t ∈ [0, 1], define
f (α, t) := tα for 0 ≤ t ≤ 1/2, and f (α, t) := α − tα for 1/2 < t ≤ 1.

Next let m be a positive integer such that 1 ≤ m ≤ n, and fix a set

Xn+m := {x1 , x2 , x3 , . . . , xn+m }.

Fix α1 , . . . , αm ∈ (0, 1]. For each 0 ≤ t ≤ 1, define the (n + m) × (n + m) matrix δt = (dtij ) by:

For 1 ≤ i, j ≤ n + m, dtij := 0 if i = j, f (αi , t) if j − i = n, f (αj , t) if i − j = n, and t otherwise.

This is a block matrix

    [ A    B ]
    [ B^T  C ]

where A is the n × n all-t matrix with 0s on the diagonal, C is an m × m all-t matrix with 0s on the diagonal, and B is the n × m all-t matrix with f (α1 , t), f (α2 , t), . . . , f (αm , t) on the diagonal.
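For concreteness, δt can be assembled numerically, which also makes the symmetry and triangle-inequality claims below easy to spot-check. This is a sketch; the helper names f and deviant_point are ours:

```python
import numpy as np

def f(alpha, t):
    # the profile from the text: t*alpha on [0, 1/2], alpha - t*alpha afterwards
    return t * alpha if t <= 0.5 else alpha - t * alpha

def deviant_point(n, alphas, t):
    """Distance matrix delta_t on n + m points: all off-diagonal entries equal t,
    except the entries pairing x_i with x_{i+n}, which equal f(alpha_i, t)."""
    m = len(alphas)
    D = np.full((n + m, n + m), float(t))
    np.fill_diagonal(D, 0.0)
    for i, a in enumerate(alphas):
        D[i, i + n] = D[i + n, i] = f(a, t)
    return D
```

For t ∈ (0, 1) and each αi > 0, every off-diagonal entry is positive, matching the claim that δt is then the distance matrix of a bona fide metric space.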
We first claim that δt is the distance matrix of a pseudometric space. Symmetry is clear.
We now check the triangle inequality. In the cases 1 ≤ i, j, k ≤ n and n + 1 ≤ i, j, k ≤
n + m, the points xi , xj , xk form the vertices of an equilateral triangle with side length t.
Suppose 1 ≤ i, j ≤ n and n + 1 ≤ k ≤ n + m. Then the triple xi , xj , xk forms an isosceles
triangle with equal longest sides of length t, and a possibly shorter side of length f (αi , t)
(if |k − i| = n), f (αj , t) (if |k − j| = n), or just a third equal side with length t in the
remaining cases. The case 1 ≤ i ≤ n, n + 1 ≤ j, k ≤ n + m is similar. This verifies the
triangle inequality. Also note that δt is the distance matrix of a bona fide metric space for
t ∈ (0, 1). For t = 1, we identify the points xi and xi−n , for n + 1 ≤ i ≤ n + m, to obtain
∆n , and for t = 0, we identify all points together to obtain ∆1 . This allows us to define

geodesics between ∆1 and ∆n as follows. Let α⃗ denote the vector (α1 , . . . , αm ). We define a map γα⃗ : [0, 1] → M by writing:

γα⃗ (t) := (Xn+m , δt ), t ∈ [0, 1],

where we can take quotients at the endpoints as described above.


We now verify that these curves are indeed geodesics. There are three cases: s, t ∈ [0, 1/2]; s, t ∈ (1/2, 1]; and s ∈ [0, 1/2], t ∈ (1/2, 1]. By using the diagonal correspondence diag, we check case-by-case that dis(diag) ≤ |t − s|. Thus for any s, t ∈ [0, 1], we have dGH (γα⃗ (s), γα⃗ (t)) ≤ (1/2)|t − s| = |t − s| · dGH (∆1 , ∆n ). It follows by Lemma 95 that γα⃗ is a geodesic between ∆1 and ∆n . Furthermore, since α⃗ ∈ (0, 1]m was arbitrary, this holds for any such α⃗ . Thus we have an infinite family of geodesics γα⃗ : [0, 1] → M from ∆1 to ∆n .
A priori, some of these geodesics may intersect at points other than the endpoints. By this we mean that there may exist t ∈ (0, 1) and α⃗ ≠ β⃗ ∈ (0, 1]m such that [γα⃗ (t)] = [γβ⃗ (t)] in M/∼. This is related to the branching phenomena that we describe in the next section. For now, we give an infinite subfamily of geodesics that do not intersect each other anywhere except at the endpoints. Recall that the separation of a finite metric space (X, dX ) is the smallest positive distance in X, which we denote by sep(X). If sep(X) < sep(Y ) for two finite metric spaces X and Y , then dGH (X, Y ) > 0.
Let ≺ denote the following relation on (0, 1]m : for α⃗ , β⃗ ∈ (0, 1]m , set α⃗ ≺ β⃗ if αi < βi for each 1 ≤ i ≤ m. Next let α⃗ , β⃗ ∈ (0, 1]m be such that α⃗ ≺ β⃗ . Then γβ⃗ is a geodesic from ∆1 to ∆n which is distinct (i.e. non-isometric) from γα⃗ everywhere except at its endpoints. This is because the condition α⃗ ≺ β⃗ guarantees that for each t ∈ (0, 1), sep(γα⃗ (t)) < sep(γβ⃗ (t)). Hence dGH (γα⃗ (t), γβ⃗ (t)) > 0 for all t ∈ (0, 1).
Finally, let α⃗ ∈ (0, 1)m , and let 1⃗ denote the all-ones vector of length m. For η ∈ [0, 1], define β⃗(η) := (1 − η)α⃗ + η 1⃗. Then by the observations about the relation ≺, {γβ⃗(η) : η ∈ [0, 1]} is an infinite family of geodesics from ∆1 to ∆n that do not intersect each other anywhere except at the endpoints.
Note that one could choose the diameter of ∆n to be arbitrarily small and still obtain
deviant geodesics via the construction above.

Branching geodesics

The structure of dGH permits branching geodesics, as illustrated on the right. We use the
notation (a)+ for any a ∈ R to denote max(0, a). As above, fix n ∈ N, n ≥ 2, and consider
the straight-line geodesic between ∆1 and ∆n described at the beginning of Section 1.8.3.
Throughout this section, we denote this geodesic by γ : [0, 1] → M. We will construct an
infinite family of geodesics which branch off from γ. For convenience, we will overload
notation and write, for each t ∈ [0, 1], the distance matrix of γ(t) as γ(t). Recall from
above that γ(t) is a symmetric n × n matrix with the following form:

    [ 0  t  t  ...  t ]
    [    0  t  ...  t ]
    [          ...    ]
    [               0 ]

Fix an increasing sequence (ai )i∈N ∈ (0, 1)N . For each t ∈ [0, 1], let γ (a1 ) (t) denote the one-point extension of γ(t) obtained by appending an extra row and column whose entries all equal t, except that the entry between the new point and xn is (t − a1 )+ (and the new diagonal entry is 0). For t > a1 , we have dGH (γ(t), γ (a1 ) (t)) > 0, because any correspondence between γ(t), γ (a1 ) (t) has distortion at least t − a1 . Thus γ (a1 ) branches off from γ at a1 .
The construction of γ (a1 ) (t) above is a special case of a one-point metric extension.
Such a construction involves appending an extra row and column to the distance matrix of
the starting space; explicit conditions for the entries of the new row and column are stated
in [97, Lemma 5.1.22]. In particular, γ (a1 ) (t) above satisfies these conditions.
Procedurally, the γ (a1 ) (t) construction can be generalized as follows. Let (•) denote
any finite subsequence of (ai )i∈N . We also allow (•) to be the empty subsequence. Let aj
denote the terminal element in this subsequence. Then for any ak , k > j, we can construct
γ (•,ak ) as follows:
1. Take the rightmost column of γ (•) (t), replace the only 0 by (t − ak )+ , append a 0 at
the bottom.
2. Append this column on the right to a copy of γ (•) (t).
3. Append the transpose of another copy of this column to the bottom of the newly
constructed matrix to make it symmetric.
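The three steps above can be sketched in code as follows (a hypothetical helper; gamma_prev stands for the matrix γ (•) (t) and a_k for the new branching parameter):

```python
import numpy as np

def extend(gamma_prev, t, a_k):
    """Apply steps 1-3: append the modified rightmost column of gamma_prev
    and its transpose, yielding the next one-point extension."""
    n = gamma_prev.shape[0]
    col = gamma_prev[:, -1].astype(float).copy()
    col[-1] = max(t - a_k, 0.0)        # step 1: replace the lone 0 by (t - a_k)_+
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = gamma_prev           # step 2: append the column on the right
    out[:n, n] = col
    out[n, :n] = col                   # step 3: append its transpose at the bottom
    return out
```

Iterating extend over a finite subsequence of branching parameters produces exactly the matrices γ (•) (t) described above.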
The objects produced by this construction satisfy the one-point metric extension con-
ditions [97, Lemma 5.1.22], and hence are distance matrices of pseudometric spaces. By
taking the appropriate quotients, we obtain valid distance matrices. Symmetry is satisfied
by definition, and the triangle inequality is satisfied because any triple of points forms an
isosceles triangle with longest sides equal. We write Γ(•) (t) to denote the matrix obtained
from γ (•) (t) after taking quotients. As an example, we obtain the following matrices after
taking quotients for γ (a1 ) (t) above, for 0 ≤ t ≤ a1 (below left) and for a1 < t ≤ 1 (below
right):

    [ 0  t  ...  t ]        [ 0  t  ...  t      t     ]
    [    0  ...  t ]        [    0  ...  t      t     ]
    [        ...   ]        [        ...        ...   ]
    [            0 ]        [            0   t − a1   ]
                            [                    0    ]

Now let (ai1 , . . . , aik ) be any finite subsequence of (ai )i∈N . For notational convenience, we write (bi )i instead. Γ(bi )i is a curve in M; we need to check that it is moreover a geodesic.
Let s ≤ t ∈ [0, 1]. Then Γ(bi )i (s) and Γ(bi )i (t) are square matrices with n + p and
n + q columns, respectively, for nonnegative integers p and q. It is possible that the matrix
grows in size between s and t, so we have q ≥ p. Denote the underlying point set by
{x1 , x2 , . . . , xn+p , . . . , xn+q }. Then define:

A := {(xi , xi ) : 1 ≤ i ≤ n + p}, B := {(xn+p , xj ) : n + p < j ≤ n + q}, R := A ∪ B.

Here B is possibly empty. Note that R is a correspondence between Γ(bi )i (s) and
Γ(bi )i (t), and by direct calculation we have dis(R) ≤ |t − s|. Hence we have

dGH (Γ(bi )i (s), Γ(bi )i (t)) ≤ (1/2) · |t − s| = |t − s| · dGH (∆1 , ∆n ).

An application of Lemma 95 now shows that Γ(bi )i is a geodesic.


The finite subsequence (bi )i of (ai )i∈N was arbitrary. Thus we have an infinite family
of geodesics which branch off from γ. Since the increasing sequence (ai )i∈N ∈ (0, 1)N was
arbitrary, the branching could occur at arbitrarily many points along γ.
Remark 96. The existence of branching geodesics shows that (M/ ∼, dGH ) is not an
Alexandrov space with curvature bounded below [17, Chapter 10]. Moreover, the exis-
tence of deviant (i.e. non-unique) geodesics shows that (M/∼, dGH ) cannot have curvature
bounded from above, i.e. (M/∼, dGH ) is not a CAT(k) space for any k > 0 [16, Proposition
2.11].
Thus far, we have produced a comprehensive treatment of dN , CN , and persistent ho-
mology methods on CN that are stable with respect to dN . This concludes the original
objective stated at the beginning of §1.
We now deviate slightly and consider the case of measure networks, which comprise
the network analogue of metric measure spaces. There is some indirect connection be-
tween persistent homology methods and measure networks (cf. the convergence results in
§1.7), but currently there is no direct analogue to “Gromov-Hausdorff stability of persistent
homology” for measure networks. However, the notion of measure networks enables the
import of methods other than persistent homology to the setting of network data analysis.
Thus we devote the next section to laying the foundations for this topic.

1.9 Measure networks and the dN,p distances


The intuitive idea behind dN is to search for the best possible alignment of edges (ac-
cording to weights) while simultaneously aligning nodes. One crucial observation about
this setup is that dN is very sensitive to outliers: the l∞ structure of dN is sensitive to even
a single outlier. In other words, dN is not equipped with a method for handling the signif-
icance of nodes. Techniques based on optimal transport (OT) provide an elegant solution

to this problem by endowing a network with a probability measure. The user adjusts the
measure to signify important network substructures and to smooth out the effect of outliers.
This approach was adopted in [70] to compare various real-world network datasets mod-
eled as metric measure (mm) spaces—metric spaces equipped with a probability measure.
This work was based in turn on the formulation of the Gromov-Wasserstein (GW) distance
between mm spaces presented in [85, 86].

[Figure: two three-node networks X (left) and Y (right), with directed edge weights a, . . . , i on X and m, . . . , u on Y , and node masses µX (x1 ), µX (x2 ), µX (x3 ) and µY (y1 ), µY (y2 ), µY (y3 ).]

Figure 1.19: Illustrations of the finite networks we consider in this paper. Notice that the
edge weights are asymmetric. The numbers in each node correspond to probability masses;
for each network, these masses sum to 1.

An alternative definition of the GW distance due to Sturm (the transportation formulation) appeared in [116], although this formulation is less amenable to practical computations than the one in [85] (the distortion formulation). Both the transportation and distortion
formulations were studied carefully in [85, 86, 117]. It was further observed by Sturm in
[117] that the definition of the (distortion) GW distance can be extended to gauged measure
spaces of the form (X, dX , µX ). Here X is a Polish space, dX is a symmetric L2 function
on X × X (that does not necessarily satisfy the triangle inequality), and µX is a Borel prob-
ability measure on X. These results are particularly important in the context of the current
paper.
Exact computation of GW distances amounts to solving a nonconvex quadratic pro-
gram. Towards this end, the computational techniques presented in [85, 86] included both
readily-computable lower bounds and an alternate minimization scheme for reaching a lo-
cal minimum of the GW objective function. This alternate minimization scheme involved
solving successive linear optimization problems, and was used for the computations in [70].
From now on, we reserve the term network for network datasets that cannot necessarily
be represented as metric spaces, unless qualified otherwise. An illustration is provided in

Figure 1.19. Already in [70], it was observed that numerical computation of GW distances
between networks worked well for network comparison even when the underlying datasets
failed to be metric. This observation was further developed in [100], where the focus from
the outset was to define generalized discrepancies between matrices that are not necessarily
metric.
On the computational front, the authors of [100] directly attacked the nonconvex opti-
mization problem by considering an entropy-regularized form of the GW distance (ERGW)
following [111], and using a projected gradient descent algorithm based on results in
[10, 111]. This approach was also used (for a generalized GW distance) on graph-structured
datasets in [119]. It was pointed out in [119] that the gradient descent approach for the
ERGW problem occasionally requires a large amount of regularization to obtain conver-
gence, and that this could possibly lead to over-regularized solutions. A different approach,
developed in [85, 86], considers the use of lower bounds on the GW distance as opposed to
solving the full GW optimization problem. This is a practical approach for many use cases,
in which it may be sufficient to simply obtain lower bounds for the GW distance.
In the current section, we use the GW distance formulation to define and develop a met-
ric structure on the space of measure networks. Additionally, by following the approaches
used in [85, 86], we are able to produce quantitatively stable network invariants that pro-
duce polynomial-time lower bounds on this network GW distance.

1.9.1 The structure of measure networks


Let X be a Polish space with Borel σ-field denoted by writing Borel(X), and let µX be
a Borel probability measure on Borel(X). We will write Prob(X) to denote the collection
of Borel probability measures supported on X. For each 1 ≤ p < ∞, denote by Lp (µX )
the space of µX -measurable functions f : X → R such that |f |p is µX -integrable. For
p = ∞, denote by Lp (µX ) the space of essentially bounded µX -measurable functions, i.e.
functions that are bounded except on a set of measure zero. Formally, these spaces are
equivalence classes of functions, where functions are equivalent if they agree µX -a.e.
We write µX ⊗ µX (equivalently µX⊗2 ) to denote the product measure on Borel(X) ⊗ Borel(X) (equivalently Borel(X)⊗2 ). Next let ωX ∈ L∞ (µX⊗2 ). Then ωX is essentially bounded. Since µX is finite, it follows that |ωX |p is integrable for any 1 ≤ p < ∞.
By a measure network, we mean a triple (X, ωX , µX ). The naming convention arises
from the case when X is finite; in such a case, we can view the pair (X, ωX ) as a complete
directed graph with asymmetric real-valued edge weights. Accordingly, the points of X
are called nodes, pairs of nodes are called edges, and ωX is called the edge weight function
of X. The collection of all measure networks will be denoted Nm .

Remark 97. Sturm has studied symmetric, L2 versions of measure networks (called gauged
measure spaces) in [117], and we point to his work as an excellent reference on the geom-
etry of such spaces. Our motivation comes from studying networks, hence the difference
in our naming conventions.

The information contained in a network should be preserved when we relabel the nodes
in a compatible way; we formalize this idea by the following notion of strong isomorphism
of measure networks.

Definition 39 (Strong isomorphism). To say (X, ωX , µX ), (Y, ωY , µY ) ∈ Nm are strongly isomorphic means that there exists a Borel measurable bijection ϕ : supp(X) → supp(Y ) (with Borel measurable inverse ϕ−1 ) such that

• ωX (x, x′) = ωY (ϕ(x), ϕ(x′)) for all x, x′ ∈ supp(X), and

• ϕ∗ µX = µY .

We will denote a strong isomorphism between measure networks by X ∼=s Y .

Example 98. Networks with one or two nodes will be very instructive in providing exam-
ples and counterexamples, so we introduce them now with some special terminology.

• By N1 (a) we will refer to the network with one node X = {p}, a weight ωX (p, p) =
a, and the Dirac measure δp = 1p .

• By N2 (( ac db ) , α, β) we will mean a two-node network with node set X = {p, q}, and
weights and measures given as follows:

ωX (p, p) = a µX ({p}) = α
ωX (p, q) = b µX ({q}) = β
ωX (q, p) = c
ωX (q, q) = d

• Given a k-by-k matrix Σ ∈ Rk×k and a k × 1 vector v ∈ Rk+ with sum 1, we automatically obtain a network on k nodes that we denote as Nk (Σ, v). Notice that Nk (Σ, v) ∼=s Nℓ (Σ′, v′) if and only if k = ℓ and there exists a permutation matrix P of size k such that Σ′ = P Σ P T and P v = v′.
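For fully supported finite networks, the permutation characterization in the last bullet can be checked by brute force over all k! permutations. This is a sketch (the function name is ours, and the search is only feasible for small k):

```python
import itertools
import numpy as np

def strongly_isomorphic(S1, v1, S2, v2, tol=1e-9):
    """Brute-force the permutation characterization above; assumes every node
    carries positive mass, so supp(X) is all of X."""
    k = len(v1)
    if len(v2) != k:
        return False
    S1, S2 = np.asarray(S1), np.asarray(S2)
    v1, v2 = np.asarray(v1), np.asarray(v2)
    for perm in itertools.permutations(range(k)):
        p = list(perm)
        # S1[np.ix_(p, p)] relabels both rows and columns by the permutation
        if np.allclose(S2, S1[np.ix_(p, p)], atol=tol) and np.allclose(v2, v1[p], atol=tol):
            return True
    return False
```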

Notation. Even though µX takes sets as its argument, we will often omit the curly braces
and use µX (p, q, r) to mean µX ({p, q, r}).
We wish to define a notion of distance on Nm that is compatible with isomorphism. A
natural analog is the Gromov-Wasserstein distance defined between metric measure spaces
[85]. To adapt that definition for our needs, we first recall the definition of a measure
coupling.

1.9.2 Couplings and the distortion functional
Let (X, ωX , µX ), (Y, ωY , µY ) be two measure networks. A coupling between these two
networks is a probability measure µ on X × Y with marginals µX and µY , respectively.
Stated differently, couplings satisfy the following property:

µ(A × Y ) = µX (A) and µ(X × B) = µY (B), for all A ∈ Borel(X) and B ∈ Borel(Y ).

The collection of all couplings between (X, ωX , µX ) and (Y, ωY , µY ) will be denoted
C (µX , µY ), abbreviated to C when the context is clear.
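In the finite case, a coupling is simply a nonnegative matrix whose row sums are µX and whose column sums are µY , which is easy to verify numerically. A sketch with hypothetical helper names:

```python
import numpy as np

def is_coupling(mu, mu_X, mu_Y, tol=1e-12):
    """Check the two marginal conditions for a candidate coupling matrix mu."""
    mu = np.asarray(mu, dtype=float)
    return bool(np.all(mu >= 0)
                and np.allclose(mu.sum(axis=1), mu_X, atol=tol)
                and np.allclose(mu.sum(axis=0), mu_Y, atol=tol))

mu_X = np.array([0.5, 0.5])
mu_Y = np.array([0.25, 0.75])
product = np.outer(mu_X, mu_Y)   # the product coupling is always available
```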
In the case where we have a coupling µ between two measures ν, ν 0 on the same network
(X, ωX ), the quantity µ(A × B) is interpreted as the amount of mass transported from A
to B when interpolating between the two distributions ν and ν 0 . In this special case, a
coupling is also referred to as a transport plan.
Here we also recall that the product σ-field on X ×Y , denoted Borel(X)⊗Borel(Y ), is
defined as the σ-field generated by the measurable rectangles A × B, where A ∈ Borel(X)
and B ∈ Borel(Y ). Because our spaces are all Polish, we always have Borel(X × Y ) =
Borel(X) ⊗ Borel(Y ) [13, Lemma 6.4.2].
The product measure µX ⊗ µY is defined on the measurable rectangles by writing

µX ⊗ µY (A × B) := µX (A)µY (B), for all A ∈ Borel(X) and for all B ∈ Borel(Y ).

By a consequence of Fubini’s theorem and the π-λ theorem, the property above uniquely
defines the product measure µX ⊗ µY among measures on Borel(X × Y ).

Example 99 (Product coupling). Let (X, ωX , µX ), (Y, ωY , µY ) ∈ Nm . The set C (µX , µY ) is always nonempty, because the product measure µ := µX ⊗ µY is always a coupling between µX and µY .

Example 100 (1-point coupling). Let X be a set, and let Y = {p} be the set with one
point. Then for any probability measure µX on X there is a unique coupling µ = µX ⊗ δp
between µX and δp . To see this, first we check that µ as defined above is a coupling. Let
A ∈ Borel(X). Then µ(A × Y ) = µX (A)δp (Y ) = µX (A), and similarly µ(X × {p}) =
µX (X)δp ({p}) = δp ({p}). Thus µ ∈ C (X, Y ). For uniqueness, let ν be another coupling.
It suffices to show that ν agrees with µ on the measurable rectangles. Let A ∈ Borel(X),
and observe that

ν(A × {p}) = (πX )∗ ν(A) = µX (A) = µX (A)δp ({p}) = µ(A × {p}).

On the other hand, ν(A × ∅) ≤ ν(X × ∅) = (πY )∗ ν(∅) = 0 = µX (A)δp (∅) = µ(A × ∅).
Thus ν satisfies the property ν(A × B) = µX (A)δp (B). Thus by uniqueness of the
product measure, ν = µX ⊗ δp . Finally, note that we can endow X and Y with weight
functions ωX and ωY , thus adapting this example to the case of networks.

Example 101 (Diagonal coupling). Let (X, ωX , µX ) ∈ Nm . The diagonal coupling be-
tween µX and itself is defined by writing
∆(A × B) := ∫X×X 1A×B (x, x′) dµX (x) dδx (x′) for all A, B ∈ Borel(X).

To see that this is a coupling, let A ∈ Borel(X). Then,

∆(A × X) = ∫X×X 1A×X (x, x′) dµX (x) dδx (x′) = ∫X 1A (x) dµX (x) = µX (A),

and similarly ∆(X × A) = µX (A). Thus ∆ ∈ C (µX , µX ).


Now we turn to the notion of the distortion of a coupling. Let (X, ωX , µX ), (Y, ωY , µY ) be two measure networks. For convenience, we define the function

ΩX,Y : (X × Y )2 → R by writing (x, y, x′, y′) 7→ ωX (x, x′) − ωY (y, y′).

Next let µ ∈ C (µX , µY ), and consider the probability space (X × Y )2 equipped with the product measure µ ⊗ µ. For each p ∈ [1, ∞) the p-distortion of µ is defined as:

disp (µ) = (∫X×Y ∫X×Y |ωX (x, x′) − ωY (y, y′)|p dµ(x, y) dµ(x′, y′))^(1/p) = ‖ΩX,Y ‖Lp (µ⊗µ) .

For p = ∞, the ∞-distortion is defined as:

dis∞ (µ) := sup{|ωX (x, x′) − ωY (y, y′)| : (x, y), (x′, y′) ∈ supp(µ)}.

When the context is clear, we will often write ‖f ‖p to denote ‖f ‖Lp (µ⊗µ) .
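When X and Y are finite, the integrals defining the p-distortion reduce to sums over (X × Y )2 , which numpy broadcasting expresses compactly. A sketch; dis_p is our own name, and the weight matrices and coupling are assumed to be given as arrays:

```python
import numpy as np

def dis_p(omega_X, omega_Y, mu, p=2.0):
    """p-distortion of a coupling matrix mu between finite measure networks.
    Omega[x, y, x2, y2] = omega_X[x, x2] - omega_Y[y, y2]."""
    Omega = omega_X[:, None, :, None] - omega_Y[None, :, None, :]
    if np.isinf(p):
        # sup over the support of mu (x) mu
        support = (mu[:, :, None, None] * mu[None, None, :, :]) > 0
        return float(np.abs(Omega)[support].max())
    mass = mu[:, :, None, None] * mu[None, None, :, :]
    return float(((np.abs(Omega) ** p) * mass).sum() ** (1.0 / p))
```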

1.9.3 Interval representation and continuity of distortion


We now record some standard results about Polish spaces (see also [117, §1.3]). Recall
that for a measure space (X, F, µ), an atom is an element A ∈ F such that 0 < µ(A) < ∞
and for every B ∈ F such that B ⊆ A, we have µ(B) = 0 or µ(B) = µ(A). In our
network setting, the atoms are singletons. To see this, let (X, ωX , µX ) ∈ Nm . The under-
lying measurable space consists of the Polish space X and its Borel σ-field. Because the
topology on X is just the metric topology for a suitable metric, we can use standard tech-
niques involving intersections of elements of covers to show that any atom is necessarily a
singleton. Next, since µX is a finite measure, Borel(X) can have at most countably many
atoms. In particular, µX can be decomposed as the sum of a countable number of atomic
(Dirac) measures and a nonatomic measure [74]:

µX = ∑i=1..∞ ci δxi + µ′X , xi ∈ X, ci ∈ [0, 1] for each i ∈ N.

In what follows, we follow the presentation in [117]. Since X is Polish, it can be viewed
as a standard Borel space [113] and therefore as the pushforward of Lebesgue measure on
the unit interval I. More specifically, let C_0 = 0, write C_i = Σ_{j=1}^{i} c_j for i ∈ N ∪ {∞},
I′ = [C_∞, 1], and X′ = supp(µ′_X). Now X′ is a standard Borel space equipped with a
nonatomic measure, so by [113, Theorem 3.4.23], there is a Borel isomorphism ρ′ : I′ →
X′ such that µ′_X = ρ′_* λ_{I′}, where λ_{I′} denotes Lebesgue measure restricted to I′. Define the
representation map ρ : I → X as follows:

ρ([C_{i−1}, C_i)) := {x_i} for all i ∈ N,   ρ|_{[C_∞,1]} := ρ′.

The map ρ′ is not necessarily unique, and therefore neither is ρ. Any such map ρ is
called a parametrization of X. In particular, we have µX = ρ_* λ_I.
The benefit of this construction is that it allows us to represent the underlying measur-
able space of a network via the unit interval I. Moreover, by taking the pullback of ωX via
ρ, we obtain a network (I, ρ∗ ωX , λI ). As we will see in the next section, this permits the
strategy of proving results over I and transporting them back to X using ρ.

Remark 102 (A 0-distortion coupling between a space and its interval representation).
Let (X, ωX, µX) ∈ Nm, and let (I, ρ*ωX, λI) be an interval representation of X for some
parametrization ρ. Consider the map (ρ, id) : I → X × I given by i ↦ (ρ(i), i). Define
µ := (ρ, id)_* λI. Let A ∈ Borel(X) and B ∈ Borel(I). Then µ(A × I) = λI({j ∈
I : ρ(j) ∈ A}) = µX(A). Also, µ(X × B) = λI({j ∈ B : ρ(j) ∈ X}) = λI(B).
Thus µ is a coupling between µX and λI. Moreover, for any A ∈ Borel(X) and any
B ∈ Borel(I), if for each j ∈ B we have ρ(j) ∉ A, then we have µ(A × B) = 0. In
particular, µ(A × B) = µ((A ∩ ρ(B)) × B). Also, given (x, i) ∈ X × I, we have that
ρ(i) ≠ x implies (x, i) ∉ supp(µ).
Let 1 ≤ p < ∞. For convenience, define ωI := ρ*ωX. An explicit computation of
dis_p(µ) shows:

dis_p(µ)^p = ∫_{X×I} ∫_{X×I} |ωX(x, x′) − ωI(i, i′)|^p dµ(x, i) dµ(x′, i′)
           = ∫_I ∫_I |ωX(ρ(i), ρ(i′)) − ωI(i, i′)|^p dλI(i) dλI(i′)
           = 0.

For p = ∞, we have:

dis_∞(µ) = sup{|ωX(x, x′) − ωI(i, i′)| : (x, i), (x′, i′) ∈ supp(µ)}
         = sup{|ωX(ρ(i), ρ(i′)) − ωI(i, i′)| : i, i′ ∈ supp(λI)}
         = 0.
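For a fully atomic finite network the parametrization ρ can be written down directly: node i owns the block [C_{i−1}, C_i). A small sketch (our own helper, not from the thesis software), with a Monte Carlo check that ρ_*λ_I recovers µX:

```python
import numpy as np

def interval_parametrization(c):
    """rho: [0, 1) -> node index for a fully atomic finite network with
    masses c (summing to 1); rho maps [C_{i-1}, C_i) to node i."""
    C = np.concatenate([[0.0], np.cumsum(c)])
    def rho(t):
        # first partial sum strictly exceeding t identifies the block
        return int(np.searchsorted(C, t, side="right") - 1)
    return rho

# a 3-node network with masses 1/2, 1/3, 1/6
c = np.array([0.5, 1.0 / 3.0, 1.0 / 6.0])
rho = interval_parametrization(c)

# pushforward check: Lebesgue mass of rho^{-1}({i}) approximates c_i
rng = np.random.default_rng(0)
labels = np.array([rho(t) for t in rng.random(200_000)])
masses = np.array([(labels == i).mean() for i in range(3)])
assert np.allclose(masses, c, atol=1e-2)

# the pullback weight is omega_I(s, t) := omega_X(rho(s), rho(t)), so the
# coupling (rho, id)_* lambda_I has zero distortion by construction: every
# point (x, i) in its support satisfies x = rho(i).
```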

1.9.4 Optimality of couplings in the network setting
We now collect some results about probability spaces. Let X be a Polish space. A
subset P ⊆ Prob(X) is said to be tight if for all ε > 0, there is a compact subset Kε ⊆ X
such that µX (X \ Kε ) ≤ ε for all µX ∈ P .
A sequence (µn)_{n∈N} ∈ Prob(X)^N is said to converge narrowly to µX ∈ Prob(X) if

lim_{n→∞} ∫_X f dµn = ∫_X f dµX for all f ∈ Cb(X),

the space of continuous, bounded, real-valued functions on X. Narrow convergence is
induced by a distance [5, Remark 5.1.1], hence the convergent sequences in Prob(X) com-
pletely determine a topology on Prob(X). This topology on Prob(X) is called the narrow
topology. In some references [117], narrow convergence (resp. narrow topology) is called
weak convergence (resp. weak topology).
A further consequence of having a metric on Prob(X) [5, Remark 5.1.1] is that single-
tons are closed. This simple fact will be used below.
Theorem 103 (Prokhorov, [5] Theorem 5.1.3). Let X be a Polish space. Then P ⊆
Prob(X) is tight if and only if it is relatively compact, i.e. its closure is compact in
Prob(X).
Lemma 104 (Lemma 4.4, [121]). Let X, Y be two Polish spaces, and let PX ⊆ Prob(X),
PY ⊆ Prob(Y ) be tight in their respective spaces. Then the set C (PX , PY ) ⊆ Prob(X×Y )
of couplings with marginals in PX and PY is tight in Prob(X × Y ).
Lemma 105 (Compactness of couplings; Lemma 1.2, [117]). Let X, Y be two Polish
spaces. Let µX ∈ Prob(X), µY ∈ Prob(Y). Then C(µX, µY) is compact in Prob(X × Y).

Proof. The singletons {µX}, {µY} are closed and of course compact in Prob(X), Prob(Y).
Hence by Prokhorov's theorem, they are tight. Now consider C(µX, µY) ⊆ Prob(X × Y).
Since this set is the intersection of the preimages of the closed sets {µX} and {µY} under the
continuous projections onto the two marginals, it is closed. Furthermore, C(µX, µY) is tight
by Lemma 104. Then by another application of Prokhorov's theorem, it is compact.
The following lemma appeared for the L2 case in [117].
Lemma 106 (Continuity of the distortion functional on intervals). Let 1 ≤ p < ∞, and let
(I, σX , λI ), (I, σY , λI ) ∈ Nm . The distortion functional disp is continuous on C (λI , λI ) ⊆
Prob(I × I). For p = ∞, dis∞ is lower semicontinuous.
The next lemma is standard.
Lemma 107 (Gluing lemma, Lemma 1.4 in [117], also Lemma 7.6 in [120]). Let µ1 , µ2 , . . . , µk
be probability measures supported on Polish spaces X1 , . . . , Xk . For each i ∈ {1, . . . , k −
1}, let µi,i+1 ∈ C (µi , µi+1 ). Then there exists µ ∈ Prob(X1 × X2 × . . . × Xk ) with
marginals µi,i+1 on Xi × Xi+1 for each i ∈ {1, . . . , k − 1}.

1.9.5 The Network Gromov-Wasserstein distance

For each p ∈ [1, ∞], we define:

d_{N,p}(X, Y) := (1/2) inf_{µ∈C(µX,µY)} dis_p(µ) for each (X, ωX, µX), (Y, ωY, µY) ∈ Nm.

As we will see below, dN,p is a legitimate pseudometric on Nm. The structure of dN,p is
analogous to a formulation of the Gromov-Wasserstein distance between metric measure
spaces [86, 117].

Remark 108 (Boundedness of dN,p). Recall from Example 99 that for any X, Y ∈ Nm,
C(µX, µY) always contains the product coupling, and is thus nonempty. A consequence
is that dN,p(X, Y) is bounded for any p ∈ [1, ∞]. Indeed, by taking the product coupling
µ := µX ⊗ µY we have

d_{N,p}(X, Y) ≤ (1/2) dis_p(µ).

Suppose first that p ∈ [1, ∞). Applying Minkowski's inequality, we obtain:

dis_p(µ) = ‖ωX − ωY‖_{L^p(µ⊗µ)}
         ≤ ‖ωX‖_{L^p(µ⊗µ)} + ‖ωY‖_{L^p(µ⊗µ)}
         = (∫_{X×Y} ∫_{X×Y} |ωX(x, x′)|^p dµ(x, y) dµ(x′, y′))^{1/p}
           + (∫_{X×Y} ∫_{X×Y} |ωY(y, y′)|^p dµ(x, y) dµ(x′, y′))^{1/p}
         = (∫_X ∫_X |ωX(x, x′)|^p dµX(x) dµX(x′))^{1/p}
           + (∫_Y ∫_Y |ωY(y, y′)|^p dµY(y) dµY(y′))^{1/p}
         = ‖ωX‖_{L^p(µX⊗µX)} + ‖ωY‖_{L^p(µY⊗µY)} < ∞.

The case p = ∞ is analogous, except that integrals are replaced by essential
suprema as needed.

In some simple cases, we obtain explicit formulas for computing dN,p .

Example 109 (Easy examples of dN,p ). Let a, b ∈ R and consider the networks N1 (a) and
N1 (b). The unique coupling between the two networks is the product measure µ = δx ⊗ δy ,
where we understand x, y to be the nodes of the two networks. Then for any p ∈ [1, ∞],
we obtain:
disp (µ) = |ωN1 (a) (x, x) − ωN1 (b) (y, y)| = |a − b|.

Figure 1.20: The dN,p distance between the two one-node networks N1(α) and N1(α′) is simply (1/2)|α − α′|. In
Example 109 we give an explicit formula for computing dN,p between an arbitrary network
and a one-node network.

Thus dN,p(N1(a), N1(b)) = (1/2)|a − b|.


Let (X, ωX, µX) ∈ Nm be any network and let N1(a) = ({y}, a) be a network with one
node. Once again, there is a unique coupling µ = µX ⊗ δy between the two networks. For
any p ∈ [1, ∞), we obtain:

d_{N,p}(X, N1(a)) = (1/2) dis_p(µ) = (1/2) (∫_X ∫_X |ωX(x, x′) − a|^p dµX(x) dµX(x′))^{1/p}.

For p = ∞, we have d_{N,∞}(X, N1(a)) = sup{(1/2)|ωX(x, x′) − a| : x, x′ ∈ supp(µX)}.
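For a finite network stored as a weight matrix and a probability vector, the closed form above is a short computation; a sketch (names are ours, not from the thesis software):

```python
import numpy as np

def dnp_to_one_node(WX, pX, a, p=2.0):
    """d_{N,p} between a finite measure network (WX, pX) and the one-node
    network N1(a), via the closed form of Example 109."""
    WX, pX = np.asarray(WX, float), np.asarray(pX, float)
    diffs = np.abs(WX - a)
    if np.isinf(p):
        supp = pX > 0
        return 0.5 * diffs[np.ix_(supp, supp)].max()
    weights = np.outer(pX, pX)
    return 0.5 * np.sum(diffs ** p * weights) ** (1.0 / p)

# two one-node networks reduce to the first part of the example: (1/2)|3 - 1|
assert np.isclose(dnp_to_one_node([[3.0]], [1.0], 1.0, p=2.0), 1.0)
```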

Remark 110. dN,p is not necessarily a metric modulo strong isomorphism. Let X =
{x1, x2, x3} and Y = {y1, y2, y3}. Consider a coupling µ given as follows, with rows
indexed by x1, x2, x3 and columns by y1, y2, y3:

      y1    y2    y3
x1 ( 1/3    0     0 )
x2 ( 1/6    0     0 )
x3 (  0    1/6   1/3 )

Next equip X and Y with edge weights {e, f, g, h} as in Figure 1.21.
Comparing the edge weights, it is clear that X and Y are not strongly isomorphic.
However, dN,p(X, Y) = 0 for all p ∈ [1, ∞]. To see this, define:

G = {(x1, y1), (x2, y1), (x3, y2), (x3, y3)}.

Then G contains all the points with positive µ-measure. Given any two points (x, y), (x′, y′) ∈
G, we observe that |ωX(x, x′) − ωY(y, y′)| = 0. Thus for any p ∈ [1, ∞], dis_p(µ) = 0, and
so dN,p(X, Y) = 0.
The definition of dN,p is sensible in the sense that it captures the notion of a distance:
Theorem 111. For each p ∈ [1, ∞], dN,p is a pseudometric on Nm .


Figure 1.21: Networks at dN,p -distance zero which are not strongly isomorphic.

By the next result, this infimum is actually attained. Hence we may write:

d_{N,p}(X, Y) := (1/2) min_{µ∈C(µX,µY)} dis_p(µ).

Definition 40 (Optimal couplings). Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm, and let p ∈
[1, ∞]. A coupling µ ∈ C(µX, µY) is optimal if dis_p(µ) = 2 d_{N,p}(X, Y).

The next result stands out in contrast to the case of dN: whereas we do not have results
about optimality for dN, the following result comes relatively easily by virtue of Prokhorov's
theorem.

Theorem 112. Let (X, ωX , µX ) and (Y, ωY , µY ) be two measure networks, and let p ∈
[1, ∞]. Then there exists an optimal coupling, i.e. a minimizer for disp (·) in C (µX , µY ).

It remains to discuss the precise pseudometric structure of dN,p. The following defini-
tion is a relaxation of strong isomorphism.

Definition 41 (Weak isomorphism). (X, ωX, µX), (Y, ωY, µY) ∈ Nm are weakly isomor-
phic, denoted X ≅w Y, if there exists a Borel probability space (Z, µZ) with measurable
maps f : Z → X and g : Z → Y such that

• f_*µZ = µX, g_*µZ = µY, and

• ‖f*ωX − g*ωY‖∞ = 0.

Here f*ωX : Z × Z → R is the pullback weight function given by the map (z, z′) ↦
ωX(f(z), f(z′)). The map g*ωY is defined analogously. For the definition to make sense,
we need to check that f*ωX is measurable. Let (a, b) ∈ Borel(R). Then B := {ωX ∈
(a, b)} is measurable because ωX is measurable. Because f is measurable, we know that
(f, f) : Z × Z → X × X is measurable. Thus A := (f, f)^{-1}(B) is measurable. Now we
write:

A = {(z, z′) ∈ Z² : (f(z), f(z′)) ∈ B}
  = {(z, z′) ∈ Z² : ωX(f(z), f(z′)) ∈ (a, b)}
  = (f*ωX)^{-1}(a, b).

Thus f*ωX is measurable. Similarly, we verify that g*ωY is measurable.
Theorem 113 (Pseudometric structure of dN,p). Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm, and
let p ∈ [1, ∞]. Then dN,p(X, Y) = 0 if and only if X ≅w Y.
Remark 114. Theorem 113 is in the same spirit as related results for gauged measure
spaces [117] and for networks under dN, as discussed earlier. The "tripod structure" X ←
Z → Y described above is much more difficult to obtain in the setting of dN.

In the next section we take a brief diversion to study a Gromov-Prokhorov distance
between measure networks. While it is not the main focus of this chapter, it turns out
to be useful for the notion of interleaving stability that we define in §1.9.7.

1.9.6 The Network Gromov-Prokhorov distance

Let α ∈ [0, ∞). For any (X, ωX, µX), (Y, ωY, µY) ∈ Nm, we write C := C(µX, µY)
and define:

d^{GP}_{N,α}(X, Y) := (1/2) inf_{µ∈C} inf{ε > 0 :
    (µ ⊗ µ)({(x, y, x′, y′) ∈ (X × Y)² : |ωX(x, x′) − ωY(y, y′)| ≥ ε}) ≤ αε}.

Theorem 115. For each α ∈ [0, ∞), d^{GP}_{N,α} is a pseudometric on Nm.

Lemma 116 (Relation between Gromov-Prokhorov and Gromov-Wasserstein). Consider
(X, ωX, µX), (Y, ωY, µY) ∈ Nm. We always have:

d^{GP}_{N,0}(X, Y) = d_{N,∞}(X, Y).

1.9.7 Lower bounds and measure network invariants

Let (V, dV) denote a pseudometric space. By a (pseudo)metric-valued network invari-
ant, we mean a function ι : Nm → V such that X ≅ Y implies dV(ι(X), ι(Y)) = 0. We are
also interested in R-parametrized network invariants, which are functions ι : Nm × R → V
such that X ≅ Y implies dV(ι(X, t), ι(Y, t)) = 0 for each t ∈ R. This is a bona fide
generalization of the non-parametrized setting, because any map ι : Nm → V can be
viewed as being parametrized by a constant object {0}.

There are two notions of stability that we are interested in.

Definition 42 (Lipschitz stability). Let p ∈ [1, ∞]. A Lipschitz-stable network invariant is
an invariant ιp : Nm → V for which there exists a Lipschitz constant L(ιp ) > 0 such that
dV (ιp (X), ιp (Y )) ≤ L(ιp )dN,p (X, Y ) for all X, Y ∈ Nm .
Definition 43 (Interleaving stability). Let p ∈ [1, ∞]. An interleaving-stable network
invariant is an R-parametrized invariant ιp : Nm × R → V for which there exists an
interleaving constant α ∈ R and a symmetric interleaving function ε : Nm × Nm → R such
that
ιp (X, t) ≤ ιp (Y, t+εXY )+αεXY ≤ ιp (X, t+2εXY )+2αεXY for all t ∈ R and X, Y ∈ Nm .
Here εXY := ε(X, Y ). In Example 117 below, we give some invariants that are interleaving
stable.
Example 117 (A map that ignores/emphasizes large edge weights). Let t ∈ R. For each
p ∈ [1, ∞], the pth t-sublevel set map for the weight function, denoted sub^w_{p,t} : Nm → R+,
is given as:

sub^w_{p,t}(X, ωX, µX) = (∫_{{ωX ≤ t}} |ωX(x, x′)|^p d(µX ⊗ µX)(x, x′))^{1/p} for p ∈ [1, ∞),

sub^w_{∞,t}(X, ωX, µX) = sup{|ωX(x, x′)| : x, x′ ∈ supp(µX), ωX(x, x′) ≤ t} for p = ∞.

This map de-emphasizes large edge weights in a measure network. Analogously, one
can consider integrating over the set {ωX ≥ t}. In this case, the larger edge weights are
emphasized. The corresponding superlevel set invariant is denoted sup^w_{p,t}.
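For finite networks the sublevel invariant is a truncated weighted p-norm of the weight matrix; a sketch, assuming the network is stored as a matrix WX and a probability vector pX (helper name is ours):

```python
import numpy as np

def sublevel_weight(WX, pX, t, p=2.0):
    """pth t-sublevel set invariant sub^w_{p,t} for a finite measure network."""
    WX, pX = np.asarray(WX, float), np.asarray(pX, float)
    mask = WX <= t  # restrict to the sublevel set {omega_X <= t}
    if np.isinf(p):
        supp = pX > 0
        vals = np.abs(WX)[mask & np.outer(supp, supp)]
        return vals.max() if vals.size else 0.0
    weights = np.outer(pX, pX)
    return np.sum((np.abs(WX) ** p * weights)[mask]) ** (1.0 / p)
```

The superlevel version is obtained by replacing the mask with `WX >= t`.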

Theorem 118 (Interleaving stability of the sublevel/superlevel set weight invariants). Let
p ∈ [1, ∞]. The sub^w_p invariant is interleaving-stable with interleaving constant α = 1
and interleaving function dN,∞. The sup^w_p invariant is interleaving-stable with interleaving
constant α = −1 and interleaving function −dN,∞.
We now define a family of local invariants that incorporate data from the networks at a
much finer scale. Computing these local invariants amounts to solving an optimal transport
(OT) problem, which is a linear programming (LP) task.
Example 119 (A generalized eccentricity function). Let (X, ωX, µX) be a measure net-
work. Then consider the map ecc^out_{p,X} : X → R+ given by

ecc^out_{p,X}(s) := (∫_X |ωX(s, x)|^p dµX(x))^{1/p} = ‖ωX(s, ·)‖_{L^p(µX)}.

The p = ∞ version is defined analogously, with the integral replaced by a supremum
over the support. We can also replace ωX(s, ·) above with ωX(·, s) to obtain another map
ecc^in_{p,X}. In general, the two maps will not agree, due to the asymmetry of the network. This
invariant is an asymmetric generalization of the p-eccentricity function for metric measure
spaces [86, Definition 5.3].
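For finite networks, ecc^out_{p,X} and ecc^in_{p,X} are weighted p-norms of the rows and columns of the weight matrix; a sketch (names are ours):

```python
import numpy as np

def eccentricity(WX, pX, p=2.0, direction="out"):
    """Generalized p-eccentricity of every node of a finite measure network.

    direction='out' uses rows omega_X(s, .); direction='in' uses columns
    omega_X(., s). Returns one value per node.
    """
    WX, pX = np.asarray(WX, float), np.asarray(pX, float)
    M = WX if direction == "out" else WX.T
    if np.isinf(p):
        return np.abs(M[:, pX > 0]).max(axis=1)
    return (np.abs(M) ** p @ pX) ** (1.0 / p)
```

On an asymmetric weight matrix the outer and inner versions genuinely differ, reflecting the asymmetry of the network.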

Example 120 (A joint eccentricity function). Let (X, ωX, µX) and (Y, ωY, µY) be two
measure networks, and let p ∈ [1, ∞]. Define the (outer) joint eccentricity function
ecc^out_{p,X,Y} : X × Y → R+ of X and Y as follows: for each (s, t) ∈ X × Y,

ecc^out_{p,X,Y}(s, t) := inf_{µ∈C(µX,µY)} ‖ωX(s, ·) − ωY(t, ·)‖_{L^p(µ)}.

For p ∈ [1, ∞), this invariant has the following form:

ecc^out_{p,X,Y}(s, t) = inf_{µ∈C(µX,µY)} (∫_{X×Y} |ωX(s, x′) − ωY(t, y′)|^p dµ(x′, y′))^{1/p}.

One obtains the inner joint eccentricity function by using the term ωX(·, s) − ωY(·, t) above,
and we denote it by ecc^in_{p,X,Y}.
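On finite networks, each evaluation of ecc^out_{p,X,Y}(s, t) is a single discrete OT linear program. A sketch using scipy's generic LP solver for illustration (the thesis software may use a dedicated OT solver; names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def ot_cost(cost, pX, pY):
    """Discrete optimal transport: minimize <cost, mu> over couplings mu
    with row sums pX and column sums pY, solved as a linear program."""
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # row-sum (first marginal) constraints
    for j in range(m):
        A_eq[n + j, j::m] = 1.0           # column-sum (second marginal) constraints
    b_eq = np.concatenate([pX, pY])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

def joint_eccentricity(WX, pX, WY, pY, s, t, p=1.0):
    """Outer joint eccentricity ecc^out_{p,X,Y}(s, t) for finite networks:
    one OT problem with ground cost |omega_X(s, x') - omega_Y(t, y')|^p."""
    WX, WY = np.asarray(WX, float), np.asarray(WY, float)
    cost = np.abs(np.subtract.outer(WX[s, :], WY[t, :])) ** p
    return ot_cost(cost, np.asarray(pX, float), np.asarray(pY, float)) ** (1.0 / p)
```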

Theorem 121 (Stability of local R-valued invariants). The eccentricity and joint eccen-
tricity invariants are both Lipschitz stable, with Lipschitz constant 2. Formally, for any
(X, ωX, µX), (Y, ωY, µY) ∈ Nm, we have:

inf_{µ∈C(µX,µY)} ‖ecc^out_{p,X} − ecc^out_{p,Y}‖_{L^p(µ)} ≤ 2 d_{N,p}(X, Y),   (eccentricity bound)

inf_{µ∈C(µX,µY)} ‖ecc^out_{p,X,Y}‖_{L^p(µ)} ≤ 2 d_{N,p}(X, Y).   (joint eccentricity bound)

Moreover, the joint eccentricity invariant provides a stronger bound than the eccentric-
ity bound, i.e.

inf_{µ∈C(µX,µY)} ‖ecc^out_{p,X} − ecc^out_{p,Y}‖_{L^p(µ)} ≤ inf_{µ∈C(µX,µY)} ‖ecc^out_{p,X,Y}‖_{L^p(µ)} ≤ 2 d_{N,p}(X, Y).

Finally, the analogous bounds hold in the case of the inner eccentricity and inner joint
eccentricity functions.

Remark 122. The analogous bounds in the setting of metric measure spaces were provided
in [85], where the eccentricity and joint eccentricity bounds were called the First and Third
Lower Bounds, respectively. The Third Lower Bound (TLB) later appeared in [107].

Having described the form of the local network invariants, we now leverage a particu-
larly useful fact about optimal transport over the real line. For probability measures over R,
the method for constructing an optimal coupling is known, and this gives a simple formula
for computing the OT cost in terms of the cumulative distribution functions of the measures
[120, Remark 2.19]. Later we obtain lower bounds based on distributions over R that can
be computed easily and remain stable with respect to the local invariants described above.

Remark 123. The structure of the joint eccentricity bound (i.e. the TLB) in Theorem 121
shows that a priori, it involves solving an ensemble of OT problems, one for each pair
(x, y) ∈ X × Y, and a final OT problem once ecc^out_{p,X,Y} is computed.

Example 124 (Pushforward via ωX). Recall that given any (X, ωX, µX), the corresponding
pushforward of µX ⊗ µX via ωX is given as follows: for any generator of Borel(R) of the
form (a, b) ⊆ R,

(ωX)_*(µX ⊗ µX)(a, b) := (µX ⊗ µX)({ωX ∈ (a, b)})
                       = ∫_X ∫_X 1_{{ωX ∈ (a,b)}}(x, x′) dµX(x) dµX(x′).

For convenience, we define νX := (ωX)_*(µX^{⊗2}). This distribution is completely determined
by its cumulative distribution function, which we denote by F_{ωX}. This is a function R →
[0, 1] given by:

F_{ωX}(t) := (µX ⊗ µX)({ωX ≤ t}) = ∫_X ∫_X 1_{{ωX ≤ t}}(x, x′) dµX(x) dµX(x′).
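For a finite network, νX is simply the discrete distribution that places mass µX(x)µX(x′) at each weight value ωX(x, x′); a sketch (helper names are ours):

```python
import numpy as np

def weight_pushforward(WX, pX):
    """nu_X = (omega_X)_* (mu_X ⊗ mu_X) for a finite network, returned as
    (values, masses): an atom at each weight omega_X(x, x') with mass
    mu_X(x) mu_X(x')."""
    vals = np.asarray(WX, float).ravel()
    masses = np.outer(pX, pX).ravel()
    order = np.argsort(vals)
    return vals[order], masses[order]

def cdf(values, masses, t):
    """F_{omega_X}(t): total mass of atoms with value <= t."""
    return masses[values <= t].sum()
```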

The distribution-valued invariant above is a global invariant. The corresponding local
versions are below.

Example 125 (Pushforward via a single coordinate of ωX). Let (X, ωX, µX) and x ∈ X be
given. Then we can define local distribution-valued invariants as follows: for any generator
of Borel(R) of the form (a, b) ⊆ R,

(ωX(x, ·))_* µX(a, b) := µX({x′ ∈ X : ωX(x, x′) ∈ (a, b)}),
(ωX(·, x))_* µX(a, b) := µX({x′ ∈ X : ωX(x′, x) ∈ (a, b)}).

We adopt the following shorthand:

λX(x) := (ωX(x, ·))_* µX,   ρX(x) := (ωX(·, x))_* µX.

Here we write λ and ρ to refer to the "left" and "right" arguments, respectively. The corre-
sponding distribution functions are defined as follows: for any t ∈ R,

F_{ωX(x,·)}(t) := µX({ωX(x, ·) ≤ t}) = ∫_X 1_{{ωX(x,·) ≤ t}}(x′) dµX(x′),
F_{ωX(·,x)}(t) := µX({ωX(·, x) ≤ t}) = ∫_X 1_{{ωX(·,x) ≤ t}}(x′) dµX(x′).

It is interesting to note that we get such a pair of distributions for each x ∈ X. Thus we
can add yet another layer to this construction, via the maps Nm → pow(Prob(R)) defined
by writing

(X, ωX, µX) ↦ {λX(x) : x ∈ X}, and
(X, ωX, µX) ↦ {ρX(x) : x ∈ X} for each (X, ωX, µX) ∈ Nm.

Assume for now that we equip Prob(R) with the Wasserstein metric. Write 𝕏 := {λX(x) : x ∈ X},
let d𝕏 denote the Wasserstein metric on 𝕏, and let µ𝕏 := (λX)_* µX. More specifically, for any
A ∈ Borel(𝕏), we have µ𝕏(A) = µX({x ∈ X : λX(x) ∈ A}). This yields a metric mea-
sure space (𝕏, d𝕏, µ𝕏). So even though we do not start off with a metric space, the operation
of passing into distributions over R forces a metric structure on (X, ωX, µX).

Next let (Y, ωY, µY) ∈ Nm, and suppose (𝕐, d𝕐, µ𝕐) is defined as above. Since 𝕏, 𝕐 ⊆
Prob(R), we know that µ𝕏, µ𝕐 are both distributions on Prob(R). Thus we can compare
them via the p-Wasserstein distance as follows, for p ∈ [1, ∞):

d_{W,p}(µ𝕏, µ𝕐) = inf_{µ∈C(µ𝕏,µ𝕐)} (∫_{Prob(R)²} d_{W,p}(λX(x), λY(y))^p dµ(λX(x), λY(y)))^{1/p}.

By the change of variables formula, this quantity coincides with one that we show below
to be a lower bound for 2dN,p(X, Y) (cf. Inequality (1.14) of Theorem 127).

Example 126 (Pushforward via eccentricity). Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm, and let
(a, b) ∈ Borel(R). Recall the outer and inner eccentricity functions ecc^out_{p,X} and ecc^in_{p,X} from
Example 119. These functions induce distributions as follows:

(ecc^out_{p,X})_* µX(a, b) = µX({x ∈ X : ecc^out_{p,X}(x) ∈ (a, b)}),
(ecc^in_{p,X})_* µX(a, b) = µX({x ∈ X : ecc^in_{p,X}(x) ∈ (a, b)}).

Next let µ ∈ C(µX, µY) and recall the joint outer/inner eccentricity functions ecc^out_{p,X,Y} and
ecc^in_{p,X,Y} from Example 120. These functions induce distributions as below:

(ecc^out_{p,X,Y})_* µ(a, b) = µ({(x, y) ∈ X × Y : ecc^out_{p,X,Y}(x, y) ∈ (a, b)}),
(ecc^in_{p,X,Y})_* µ(a, b) = µ({(x, y) ∈ X × Y : ecc^in_{p,X,Y}(x, y) ∈ (a, b)}).

Theorem 127 (Stability of the ωX and eccentricity-pushforward distributions). Suppose
(X, ωX, µX), (Y, ωY, µY) ∈ Nm. Then we have the following statements about Lipschitz
stability, for p ∈ [1, ∞):

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µX⊗µX, µY⊗µY)} (∫_{X²×Y²} |ωX(x, x′) − ωY(y, y′)|^p dµ(x, x′, y, y′))^{1/p}   (1.7)
             ≥ inf_{ν∈C(νX, νY)} (∫_{R²} |a − b|^p dν(a, b))^{1/p}.   (1.8)

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} |ecc^out_{p,X}(x) − ecc^out_{p,Y}(y)|^p dµ(x, y))^{1/p}   (1.9)
             ≥ inf_{γ∈C((ecc^out_{p,X})_*µX, (ecc^out_{p,Y})_*µY)} (∫_{R²} |a − b|^p dγ(a, b))^{1/p}.   (1.10)

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} |ecc^in_{p,X}(x) − ecc^in_{p,Y}(y)|^p dµ(x, y))^{1/p}   (1.11)
             ≥ inf_{γ∈C((ecc^in_{p,X})_*µX, (ecc^in_{p,Y})_*µY)} (∫_{R²} |a − b|^p dγ(a, b))^{1/p}.   (1.12)

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} inf_{γ∈C(µX, µY)} ∫_{X×Y} |ωX(x, x′) − ωY(y, y′)|^p dγ(x′, y′) dµ(x, y))^{1/p}   (1.13)
             ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} inf_{γ∈C(λX(x), λY(y))} ∫_{R²} |a − b|^p dγ(a, b) dµ(x, y))^{1/p}.   (1.14)

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} inf_{γ∈C(µX, µY)} ∫_{X×Y} |ωX(x, x′) − ωY(y, y′)|^p dγ(x, y) dµ(x′, y′))^{1/p}   (1.15)
             ≥ inf_{µ∈C(µX, µY)} (∫_{X×Y} inf_{γ∈C(ρX(x′), ρY(y′))} ∫_{R²} |a − b|^p dγ(a, b) dµ(x′, y′))^{1/p}.   (1.16)

Here recall that νX = (ωX)_*(µX^{⊗2}), νY = (ωY)_*(µY^{⊗2}), λX(x) = (ωX(x, ·))_*µX, λY(y) =
(ωY(y, ·))_*µY, ρX(x) = (ωX(·, x))_*µX, and ρY(y) = (ωY(·, y))_*µY. Inequalities (1.7)-(1.8)
appeared as the Second Lower Bound and its relaxation in [85]. Inequalities (1.9), (1.11),
(1.13), and (1.15) are the eccentricity bounds in Theorem 121. Inequalities (1.10), (1.12),
(1.14), and (1.16) are their relaxations. In the symmetric case, these outer/inner pairs of
inequalities coincide; they appeared as the First and Third Lower Bounds and their relax-
ations in [85].

In Inequality (1.8), both νX and νY are probability distributions on R, and the right hand
side is precisely the p-Wasserstein distance between νX and νY. Analogous statements hold
for Inequalities (1.14) and (1.16).

To connect with the nomenclature introduced in [85], we give names to the inequalities
in Theorem 127: (1.7)-(1.8) are the SLB inequalities, (1.9)-(1.12) are the FLB inequalities,
and (1.13)-(1.16) are the TLB inequalities. Here FLB, SLB, and TLB abbreviate First, Second,
and Third Lower Bound, respectively. Note that due to the asymmetry of the network setting, we get
twice as many FLB and TLB inequalities as we would for the metric measure setting.
Next we describe the formula for computing OT over R (see [120, Remark 2.19]). Let
measure spaces (X, µX), (Y, µY) and measurable functions f : X → R, g : Y → R be
given. Then let F, G : R → [0, 1] denote the cumulative distribution functions of f and g:

F(t) := µX(f ≤ t),   G(t) := µY(g ≤ t).

The generalized inverses F^{-1} : [0, 1] → R, G^{-1} : [0, 1] → R are given as:

F^{-1}(t) := inf{u ∈ R : F(u) ≥ t},   G^{-1}(t) := inf{u ∈ R : G(u) ≥ t}.

Then for p ≥ 1, we have:

inf_{µ∈C(f_*µX, g_*µY)} ∫_{R×R} |a − b|^p dµ(a, b) = ∫_0^1 |F^{-1}(t) − G^{-1}(t)|^p dt.   (1.17)

For p = 1, we even have:

inf_{µ∈C(f_*µX, g_*µY)} ∫_{R×R} |a − b| dµ(a, b) = ∫_R |F(t) − G(t)| dt.   (1.18)

These formulae are easily adapted to obtain closed form solutions for the lower bounds
given by Inequalities (1.10) and (1.12) and for the inner OT problems in (1.14), (1.16) of
Theorem 127.
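A sketch of formula (1.17) for discrete distributions on R, where both quantile functions are piecewise constant (helper names are ours, not from the thesis software):

```python
import numpy as np

def quantile(vals, mass, t):
    """Generalized inverse F^{-1}(t) = inf{u : F(u) >= t} of a discrete
    distribution with sorted atoms `vals` and masses `mass`."""
    cum = np.cumsum(mass)
    idx = np.searchsorted(cum, t, side="left")
    return vals[np.minimum(idx, len(vals) - 1)]

def wasserstein_1d(vals_a, mass_a, vals_b, mass_b, p=2.0):
    """W_p between two discrete distributions on R via formula (1.17):
    integrate |F^{-1}(t) - G^{-1}(t)|^p over [0, 1]."""
    ia, ib = np.argsort(vals_a), np.argsort(vals_b)
    va, ma = np.asarray(vals_a, float)[ia], np.asarray(mass_a, float)[ia]
    vb, mb = np.asarray(vals_b, float)[ib], np.asarray(mass_b, float)[ib]
    # both quantile functions are piecewise constant; merge their jump points
    cuts = np.unique(np.clip(np.concatenate([[0.0], np.cumsum(ma), np.cumsum(mb)]), 0.0, 1.0))
    widths = np.diff(cuts)
    mids = cuts[:-1] + widths / 2.0
    diffs = np.abs(quantile(va, ma, mids) - quantile(vb, mb, mids))
    return np.sum(diffs ** p * widths) ** (1.0 / p)
```

For instance, the relaxed SLB (1.8) between finite networks (WX, pX) and (WY, pY) can be bounded as dN,p ≥ (1/2)·wasserstein_1d(WX.ravel(), np.outer(pX, pX).ravel(), WY.ravel(), np.outer(pY, pY).ravel(), p), since νX and νY are exactly these discrete distributions of weight values.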

1.10 Computational aspects


In this section, we present two sets of experiments. One of these uses Dowker per-
sistence, the other uses the lower bounds on dN,p . Further experiments and details about
algorithms are presented in Chapter 4.
Throughout this thesis, we have presented numerous quantities that can potentially be
computed. It turns out that dN is mostly intractable—it is NP-hard, and is difficult to
compute for networks having more than four or five nodes. The dN,p distances are still
NP-hard, but because they relax the purely combinatorial structure of dN , they can be
approximately computed via quadratic optimization approaches. For both the dN and dN,p
distances, we provide invariants and associated lower bounds that are computed as linear
programs, hence become much more computationally tractable.
On the side of persistent homology computations, it turns out that Dowker (and also
Vietoris-Rips) persistence is remarkably easy to implement—the only task is to compute the
simplicial filtration, after which any out-of-the-box persistent homology package can take
over (we use Javaplex for the latter).
Computing PPH, however, is a priori more involved. As we point out in Remark 55,
we are not supplied with a basis for path homology computations, and computing this
basis requires significant additional preprocessing, at least for a naive implementation of
PPH. One of our contributions is in showing that the basis computation and persistence
computation rely on the same matrix operations, so both steps can be combined and carried
out together.

1.10.1 Software packages

For dN, related lower bounds, and Vietoris-Rips/Dowker persistence computations, we
developed the PersNet software package jointly with Facundo Mémoli—see
https://github.com/fmemoli/PersNet.
For computations related to dN,p, we released the GWnets package—see
https://github.com/samirchowdhury/GWnets.
For PPH computation (up to dimension 1), we have implementations in C++, Python
2.7, and Matlab—see https://github.com/samirchowdhury. The C++ implementation
is the fastest of the three.

1.10.2 Simulated hippocampal networks


In the neuroscience literature, it has been shown that as an animal explores a given
environment or arena, specific “place cells” in the hippocampus show increased activity at
specific spatial regions, called “place fields” [93]. Each place cell shows a spike in activity
when the animal enters the place field linked to this place cell, accompanied by a drop in
activity as the animal moves far away from this place field. To understand how the brain
processes this data, a natural question to ask is the following: Is the time series data of the
place cell activity, referred to as “spike trains”, enough to detect the structure of the arena?
Approaches based on homology [41] and persistent homology [43] have shown positive
results in this direction. In [43], the authors simulated the trajectory of a rat in an arena
containing “holes.” A simplicial complex was then built as follows: whenever n + 1 place
cells with overlapping place fields fired together, an n-simplex was added. This yielded a
filtered simplicial complex indexed by a time parameter. By computing persistence, it was
then shown that the number of persistent bars in the 1-dimensional barcode of this filtered
simplicial complex would accurately represent the number of holes in the arena.
We repeated this experiment with the following change in methodology: we simulated
the movement of an animal, and corresponding hippocampal activity, in arenas with a vari-
ety of obstacles. We then induced a directed network from each set of hippocampal activity
data, and computed the associated 1-dimensional Dowker persistence diagrams. We were
interested in seeing if the bottleneck distances between diagrams arising from similar are-
nas would differ significantly from the bottleneck distance between diagrams arising from

different arenas. To further exemplify our methods, we repeated our analysis after comput-
ing the 1-dimensional Rips persistence diagrams from the hippocampal activity networks.
In our experiment, there were five arenas. The first was a square of side length L = 10,
with four circular “holes” or “forbidden zones” of radius 0.2L that the trajectory could
not intersect. The other four arenas were those obtained by removing the forbidden zones
one at a time. In what follows, we refer to the arenas of each type as 4-hole, 3-hole, 2-
hole, 1-hole, and 0-hole arenas. For each arena, a random-walk trajectory of 5000 steps
was generated, where the animal could move along a square grid with 20 points in each
direction. The grid was obtained as a discretization of the box [0, L] × [0, L], and each step
had length 0.05L. The animal could move in each direction with equal probability. If one
or more of these moves took the animal outside the arena (a disallowed move), then the
probabilities were redistributed uniformly among the allowed moves. Each trajectory was
tested to ensure that it covered the entire arena, excluding the forbidden zones. Formally,
we write the time steps as a set T := {1, 2, . . . , 5000}, and denote the trajectory as a map
traj : T → [0, L]2 .
For each of the five arenas, 20 trials were conducted, producing a total of 100 trials. For
each trial lk , an integer nk was chosen uniformly at random from the interval [150, 200].
Then nk place fields of radius 0.05L were scattered uniformly at random inside the cor-
responding arena for each lk . An illustration of the place field distribution is provided in
Figure 1.22. A spike on a place field was recorded whenever the trajectory would intersect
it. So for each 1 ≤ i ≤ nk , the spiking pattern of cell xi , corresponding to place field PFi ,
was recorded via a function ri : T → {0, 1} given by:
ri(t) = 1 if traj(t) intersects PFi, and ri(t) = 0 otherwise, for each t ∈ T.

The matrix corresponding to ri is called the raster of cell xi. A sample raster is
illustrated in Figure 1.22. For each trial lk, the corresponding network (Xk, ωXk) was
constructed as follows: Xk consisted of nk nodes representing place fields, and for each
1 ≤ i, j ≤ nk, the weight ωXk(xi, xj) was given by:

ωXk(xi, xj) := 1 − Ni,j(5) / Σ_{i=1}^{nk} Ni,j(5),

where Ni,j(5) = card({(s, t) ∈ T² : t ∈ [2, 5000], t − s ∈ [1, 5], rj(t) = 1, ri(s) = 1}).

In words, Ni,j(5) counts the pairs of times (s, t), s < t, such that cell xj spikes (at a
time t) after cell xi spikes (at a time s), and the delay between the two spikes is fewer than
5 time steps. The idea is that if cell xj frequently fires within a short span of time after
cell xi fires, then place fields PFi and PFj are likely to be in close proximity to each other.
The column sum of the matrix corresponding to ωXk is normalized to 1, and so ωXk⊤ can be
interpreted as the transition matrix of a Markov process.
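The counts N_{i,j}(5) and the weights ωXk can be assembled from the raster matrix with a few matrix products; a sketch (our own reimplementation, not the thesis code):

```python
import numpy as np

def raster_to_network(R, delay=5):
    """Directed network from a binary raster R of shape (n_cells, T):
    N[i, j] counts pairs s < t with t - s in [1, delay], r_i(s) = 1 and
    r_j(t) = 1; the weight matrix is 1 minus the column-normalized counts."""
    R = np.asarray(R, float)
    n, T = R.shape
    N = np.zeros((n, n))
    for d in range(1, min(delay + 1, T)):
        # lag-d pairs: cell i spikes at time s, cell j spikes at time s + d
        N += R[:, :T - d] @ R[:, d:].T
    colsums = N.sum(axis=0)
    colsums[colsums == 0] = 1.0  # guard against cells that never co-fire
    return 1.0 - N / colsums
```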


Figure 1.22: Bottom right: Sample place cell spiking pattern matrix. The x-axis corre-
sponds to the number of time steps, and the y-axis corresponds to the number of place
cells. Black dots represent spikes. Clockwise from bottom middle: Sample distribution
of place field centers in 4, 3, 0, 1, and 2-hole arenas.

Next, we computed the 1-dimensional Dowker persistence diagrams of each of the 100
networks. Note that Dgm^D_1(ωX) = Dgm^D_1(ωX⊤) by Proposition 46, so we are actually
obtaining the 1-dimensional Dowker persistence diagrams of transition matrices of Markov
processes. We then computed a 100 × 100 matrix consisting of the bottleneck distances
between all the 1-dimensional persistence diagrams. The single linkage dendrogram gen-
erated from this bottleneck distance matrix is shown in Figure 1.23. The labels are in the
format env-<nh>-<nn>, where nh is the number of holes in the arena/environment, and
nn is the number of place fields. Note that with some exceptions, networks corresponding
to the same arena are clustered together. We conclude that the Dowker persistence diagram
succeeded in capturing the intrinsic differences between the five classes of networks arising
from the five different arenas, even when the networks had different sizes.
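Given the precomputed 100 × 100 bottleneck distance matrix, the single linkage dendrogram step can be reproduced with scipy; a sketch (the distance matrix itself comes from a persistence package such as Javaplex, which is not shown here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def single_linkage_clusters(D, threshold):
    """Single linkage clustering from a precomputed (e.g. bottleneck)
    distance matrix D, cutting the dendrogram at the given threshold."""
    condensed = squareform(np.asarray(D, float), checks=False)
    Z = linkage(condensed, method="single")
    return fcluster(Z, t=threshold, criterion="distance")
```

With the threshold 0.1 used in Figure 1.23, trials whose diagrams lie within bottleneck distance 0.1 of a common chain are grouped together.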
We then computed the Rips persistence diagrams of each network, and computed the
100 × 100 bottleneck distance matrix associated to the collection of 1-dimensional dia-
grams. The single linkage dendrogram generated from this matrix is given in Figure 1.24.
Notice that the Rips dendrogram does not do a satisfactory job of classifying arenas cor-
rectly.

Remark 128. We note that an alternative method of comparing the networks obtained from
our simulations would have been to compute the pairwise network distances, and plot the
results in a dendrogram. But dN is NP-hard to compute—this follows from the fact that
computing dN includes the problem of computing Gromov-Hausdorff distance between
finite metric spaces, which is NP-hard [105]. So instead, we are computing the bottleneck
distances between 1-dimensional Dowker persistence diagrams, as suggested by Remark
30.

Remark 129. It is possible to compare the current approach with the one taken in [43] on a
common dataset. We performed this comparison in [32] for a similar experiment, but with
a stochastic firing model for the place cells. Interestingly, it turns out that the network ap-
proach with Dowker persistence performs better than the approach in [43], as indicated by
computing 1-nearest neighbor classification error rates on the bottleneck distance matrices.
A possible explanation is that preprocessing the spiking data into a network automatically
incorporates a form of error correction, where the errors consist of stochastic firing between
cells that are non-adjacent. On the other hand, such errors are allowed to accumulate over
time in the approach taken in [43]. For an alternative error-correction approach, see [33].

1.10.3 Clustering SBMs and migration networks


We now describe the specifics of an experiment on clustering a collection of network
SBMs and migration networks. We perform clustering with respect to the TLB, i.e. the
bounds given by Inequalities (1.14) and (1.16).

Figure 1.23: Single linkage dendrogram corresponding to the distance matrix obtained by
computing bottleneck distances between 1-dimensional Dowker persistence diagrams of
our database of hippocampal networks (§1.10.2). Note that the 4, 3, and 2-hole arenas are
well separated into clusters at threshold 0.1.

Figure 1.24: Single linkage dendrogram corresponding to the distance matrix obtained by
computing bottleneck distances between 1-dimensional Rips persistence diagrams of our
database of hippocampal networks (§1.10.2). Notice that the hierarchical clustering fails to
capture the correct arena types.
Class #   N   v                        ni
1         5   [0, 25, 50, 75, 100]     10
2         5   [0, 50, 100, 150, 200]   10
3         5   [0, 25, 50, 75, 100]     20
4         2   [0, 100]                 25
5         5   [-100, -50, 0, 50, 100]  10

Sample cycle network of means, G5(v) for v = [0, 25, 50, 75, 100]:

  0  25  50  75 100
100   0  25  50  75
 75 100   0  25  50
 50  75 100   0  25
 25  50  75 100   0

Table 1.1: Left: The five classes of SBM networks corresponding to the experiment in
§1.10.3. N refers to the number of communities, v refers to the vector that was used to
compute a table of means via G5 (v), and ni is the number of nodes in each community.
Right: G5 (v) for v = [0, 25, 50, 75, 100].

Experiment: SBMs from cycle networks.


Let N ∈ N, and let v = [v1 , . . . , vN ] be an N × 1 vector. Define the right-shift operator
ρ by ρ([v1 , . . . , vN ]) = [vN , v1 , . . . , vN −1 ]. The cycle network GN (v) is defined to be the
N -node network whose weight matrix is given by [v T , ρ(v)T , (ρ2 (v))T , . . . , (ρN −1 (v))T ]T .
This is analogous to the cycle networks defined earlier in §1.3.
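Under the definition above, the weight matrix can be produced directly; a minimal sketch (the function name cycle_network is ours, weight matrices are lists of lists, indices are zero-based):

```python
def cycle_network(v):
    # Row i of the weight matrix is the i-th right-shift of v,
    # i.e. entry (i, j) equals v[(j - i) mod N].
    N = len(v)
    return [[v[(j - i) % N] for j in range(N)] for i in range(N)]
```

For v = [0, 25, 50, 75, 100] this reproduces the matrix G5(v) shown in Table 1.1.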
In our network SBM generation procedure, we started with a vector of means v and generated GN(v) for certain choices of N. This gave us the N² choices of means to be used for each network SBM. To keep the experiment simple, we fixed the matrix of variances to
be the N × N matrix whose entries are all 5s. We made 5 choices of v, and sampled 10
networks for each choice. The objective was then to see how well the TLB could split the
collection of 50 networks into 5 classes corresponding to the 5 different community struc-
tures. The different parameters used in our experiments are listed in Table 1.1. The TLB
was computed essentially according to the scheme presented in Algorithm 4, except that
instead of comparing the 2-Wasserstein distance between pushforward distributions over
R, we used Sinkhorn iterations to approximate the solution to each “inner” OT problem as
described in Theorem 127, Inequalities (1.13) and (1.15).
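The thesis's Algorithm 4 and the exact Sinkhorn setup are not shown in this chunk; the following is only a generic sketch of Sinkhorn iteration for an entropically regularized inner OT problem, with illustrative values for the regularization parameter lam and the iteration count:

```python
import math

def sinkhorn(cost, mu, nu, lam=10.0, iters=200):
    # Entropically regularized OT: alternately rescale rows and columns
    # of the kernel K = exp(-lam * cost) so the plan matches marginals.
    K = [[math.exp(-lam * c) for c in row] for row in cost]
    u = [1.0] * len(mu)
    v = [1.0] * len(nu)
    for _ in range(iters):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(len(nu)))
             for i in range(len(mu))]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(len(mu)))
             for j in range(len(nu))]
    # transport plan and its (approximate) transport cost
    P = [[u[i] * K[i][j] * v[j] for j in range(len(nu))]
         for i in range(len(mu))]
    return sum(P[i][j] * cost[i][j]
               for i in range(len(mu)) for j in range(len(nu)))
```

For large lam the returned value approximates the unregularized OT cost.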
In this experiment, we were interested in understanding the behavior of the TLB on
different community structures. Class 1 is our reference; compared to this reference, class
2 differs in its edge weights, class 3 differs in the number of nodes in each community, class
4 differs in the number of communities, and class 5 differs by having a larger proportion
of negative edge weights. The TLB results in Figure 1.25 show that classes 1 and 3 are
treated as being very similar, whereas the other classes are all mutually well-separated.
One interesting suggestion arising from this experiment is that the TLB can be used for
network simplification: given a family of networks which are all at low TLB distance to
each other, it may be reasonable to retain only the smallest network in the family as the
“minimal representative” network.


Figure 1.25: Left: TLB dissimilarity matrix for SBM community networks in §1.10.3.
Classes 1 and 3 are similar, even though networks in Class 3 have twice as many nodes
as those in Class 1. Classes 2 and 5 are most dissimilar because of the large difference in
their edge weights. Class 4 has a different number of communities than the others, and is
dissimilar to Classes 1 and 3 even though all their edge weights are in comparable ranges.
Right: TLB dissimilarity matrix for two-community SBM networks in §1.10.3. The near-
zero values on the diagonal are a result of using the adaptive λ-search described in Chapter
4.

Class # N v ni
1 2 [0,0] 10
2 2 [0,5] 10
3 2 [0,10] 10
4 2 [0,15] 10
5 2 [0,20] 10

Table 1.2: Two-community SBM networks as described in §1.10.3.

Experiment: Two-community SBMs with sliding means


Having understood the interaction of the TLB with network community structure, we
next investigated how the TLB behaves with respect to edge weights. In our second exper-
iment, we used a 2 × 1 means vector v, and varied v as [0, 0], [0, 5], . . . , [0, 20] (see Table
1.2). The SBM means were then given by G2 (v) for the various choices of v. The variances
were fixed to be the all 5s matrix. The edge weight histograms of the resulting SBM net-
works then looked like samples from two Gaussian distributions, with one of the Gaussians
sliding away from the other. Finally, we normalized each network by its largest weight in
absolute value, so that its normalized edge weights were in [−1, 1].
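A hedged sketch of this generation and normalization step follows; the function name, seed handling, and default parameters are illustrative choices, not the thesis's code:

```python
import random

def sample_normalized_sbm(v, n_per_comm=10, var=5.0, seed=0):
    # Means matrix G_N(v): the cycle network of means for vector v.
    N = len(v)
    means = [[v[(j - i) % N] for j in range(N)] for i in range(N)]
    # Community label for each node (n_per_comm nodes per community).
    comm = [c for c in range(N) for _ in range(n_per_comm)]
    rng = random.Random(seed)
    sd = var ** 0.5
    # Gaussian edge weights drawn blockwise from the means matrix.
    W = [[rng.gauss(means[ci][cj], sd) for cj in comm] for ci in comm]
    # Normalize by the largest weight in absolute value, into [-1, 1].
    top = max(abs(w) for row in W for w in row)
    return [[w / top for w in row] for row in W]
```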
The purpose of this experiment was to test the performance of the TLB on SBMs com-
ing from a mixture of Gaussians. Note that normalization ensures that simpler invariants
such as the size invariant would likely fail in this setting. The TLB still performs rea-
sonably well in this setting, as illustrated by the dissimilarity matrix in Figure 1.25. The
computations were carried out as in the setting of §1.10.3.

Experiment: Real migration networks


For an experiment involving real-world networks, we compared global bilateral migra-
tion networks produced by the World Bank [66, 94]. The data consists of 10 networks, each
having 225 nodes corresponding to countries/administrative regions. The (i, j)-th entry in
each network is the number of people living in region i who were born in region j. The 10
networks comprise such data for male and female populations in 1960, 1970, 1980, 1990,
and 2000. When extracting the data, we removed the entries corresponding to refugee pop-
ulations, the Channel Islands, the Isle of Man, Serbia, Montenegro, and Kosovo, because
the data corresponding to these regions was incomplete/inconsistent across the database.
The TLB computations were carried out as in Algorithm 4. In particular, we used
Equation (1.17) to obtain the Wasserstein distance between pushforward distributions over
R. The result of applying the TLB to this dataset is illustrated in Figure 1.26. To better
understand the dissimilarity matrix, we also computed its single linkage dendrogram. The


Figure 1.26: Result of applying the TLB to the migration networks in §1.10.3. Left: Dis-
similarity matrix. Nodes 1-5 correspond to female migration from 1960-2000, and nodes
6-10 correspond to male migration from 1960-2000. Right: Single linkage dendrogram.
Notice that overall migration patterns change in time, but within a time period, migration
patterns are grouped according to gender.

dendrogram suggests that between 1980 and 1990, both male and female populations had
quite similar migration patterns. Within these years, however, migration patterns were
more closely tied to gender. This effect is more pronounced between 1960 and 1970,
where we see somewhat greater divergence between migration patterns based on gender.
Male migration in 2000 is especially divergent, with the greatest dissimilarity to all the
other datasets.
The labels in the dissimilarity matrix are as follows: 1-5 correspond to “f-1960” through
“f-2000”, and 6-10 correspond to “m-1960” through “m-2000”. The color gradient in the
dissimilarity matrix suggests that within each gender, migration patterns change in a way
that is parametrized by time. This of course reflects the shifts in global technological and
economical forces which make migration attractive and/or necessary with time.
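Equation (1.17) is not restated in this chunk; for empirical distributions on R with equally many, equally weighted samples, the p-Wasserstein distance reduces to matching sorted values, which is presumably the closed form being invoked. A minimal sketch under that assumption:

```python
def wasserstein_1d(xs, ys, p=2):
    # p-Wasserstein distance between two empirical distributions on R
    # with the same number of uniformly weighted samples: the optimal
    # coupling matches sorted values to sorted values.
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1 / p)
```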

Chapter 2: Metric structures of dN and dN,p

In this chapter, we supply the proofs of the results on the metric structure of N that
were stated in §1. Along the way, we occasionally state additional definitions and results.

2.1 Proofs from §1.2


Proof of Proposition 7. First we show that:

dN(X, Y) ≥ (1/2) inf{max(dis(ϕ), dis(ψ), CX,Y(ϕ, ψ), CY,X(ψ, ϕ)) : ϕ : X → Y, ψ : Y → X any maps}.

Let ε > dN (X, Y ), and let R be a correspondence such that dis(R) < 2ε. We define maps
ϕ : X → Y and ψ : Y → X as follows: for each x ∈ X, set ϕ(x) = y for some y such
that (x, y) ∈ R. Similarly, for each y ∈ Y , set ψ(y) = x for some x such that (x, y) ∈ R.
Let x ∈ X, y ∈ Y . Then we have

|ωX (x, ψ(y)) − ωY (ϕ(x), y)| < 2ε and |ωX (ψ(y), x) − ωY (y, ϕ(x))| < 2ε.

Since x ∈ X, y ∈ Y were arbitrary, it follows that CX,Y (ϕ, ψ) ≤ 2ε and CY,X (ψ, ϕ) ≤ 2ε.
Also for any x, x0 ∈ X, we have (x, ϕ(x)), (x0 , ϕ(x0 )) ∈ R, and so

|ωX (x, x0 ) − ωY (ϕ(x), ϕ(x0 ))| < 2ε.

Thus dis(ϕ) ≤ 2ε, and similarly dis(ψ) ≤ 2ε. This proves the “≥” case.
Next we wish to show the “≤” case. Suppose ϕ, ψ are given, and
(1/2) max(dis(ϕ), dis(ψ), CX,Y(ϕ, ψ), CY,X(ψ, ϕ)) < ε,
for some ε > 0.
Let RX = {(x, ϕ(x)) : x ∈ X} and let RY = {(ψ(y), y) : y ∈ Y }. Then R = RX ∪RY
is a correspondence. We wish to show that for any z = (a, b), z 0 = (a0 , b0 ) ∈ R,

|ωX (a, a0 ) − ωY (b, b0 )| < 2ε.

This will show that dis(R) ≤ 2ε, and so dN (X, Y ) ≤ ε.

To see this, let z, z 0 ∈ R. Note that there are four cases: (1) z, z 0 ∈ RX , (2) z, z 0 ∈ RY ,
(3) z ∈ RX , z 0 ∈ RY , and (4) z ∈ RY , z 0 ∈ RX . In the first two cases, the desired
inequality follows because dis(ϕ), dis(ψ) < 2ε. The inequality follows in cases (3) and (4)
because CX,Y (ϕ, ψ) < 2ε and CY,X (ψ, ϕ) < 2ε, respectively. Thus dN (X, Y ) ≤ ε.
Proof of Example 9. We start with some notation: for x, x0 ∈ X, y, y 0 ∈ Y , let
Γ(x, x0 , y, y 0 ) = |ωX (x, x0 ) − ωY (y, y 0 )|.
Let ϕ : X → Y be a bijection. Note that Rϕ := {(x, ϕ(x)) : x ∈ X} is a correspon-
dence, and this holds for any bijection (actually any surjection) ϕ. Since we minimize over
all correspondences for dN , we conclude dN (X, Y ) ≤ dbN (X, Y ).
For the reverse inequality, we represent all the elements of R(X, Y ) as 2-by-2 binary
matrices R, where a 1 in position ij means (xi , yj ) ∈ R. Denote the matrix representation
of each R ∈ R(X, Y ) by mat(R), and the collection of such matrices as mat(R). Then we
have:
mat(R) = {[ 1 a ; b 1 ] : a, b ∈ {0, 1}} ∪ {[ a 1 ; 1 b ] : a, b ∈ {0, 1}}, where a semicolon separates the rows of a 2 × 2 matrix.
Let A = {(x1, y1), (x2, y2)} (in matrix notation, this is [ 1 0 ; 0 1 ]) and let B = {(x1, y2), (x2, y1)} (in matrix notation, this is [ 0 1 ; 1 0 ]). Let R ∈ R(X, Y). Note that either A ⊆ R or B ⊆ R.
Suppose that A ⊆ R. Then we have:
max_{(x,y),(x′,y′)∈A} Γ(x, x′, y, y′) ≤ max_{(x,y),(x′,y′)∈R} Γ(x, x′, y, y′)

Let Ω(A) denote the quantity on the left hand side. A similar result holds in the case
B ⊆ R:
max_{(x,y),(x′,y′)∈B} Γ(x, x′, y, y′) ≤ max_{(x,y),(x′,y′)∈R} Γ(x, x′, y, y′)

Let Ω(B) denote the quantity on the left hand side. Since either A ⊆ R or B ⊆ R, we have
min{Ω(A), Ω(B)} ≤ min_{R∈R} max_{(x,y),(x′,y′)∈R} Γ(x, x′, y, y′)

We may identify A with the bijection given by x1 ↦ y1 and x2 ↦ y2. Similarly we may identify B with the bijection sending x1 ↦ y2, x2 ↦ y1. Thus we have
min_{ϕ} max_{x,x′∈X} Γ(x, x′, ϕ(x), ϕ(x′)) ≤ min_{R∈R} max_{(x,y),(x′,y′)∈R} Γ(x, x′, y, y′).

So we have dbN (X, Y ) ≤ dN (X, Y ). Thus dbN = dN .

Next, let {p, q} and {p0 , q 0 } denote the vertex sets of X and Y . Consider the bijection
ϕ given by p 7→ p0 , q 7→ q 0 and the bijection ψ given by p 7→ q 0 , q 7→ p0 . Note that the
weight matrix is determined by setting ωX (p, p) = α, ωX (p, q) = δ, ωX (q, p) = β, and
ωX (q, q) = γ, and similarly for Y . Then we get
dis(ϕ) = max(|α − α′|, |β − β′|, |γ − γ′|, |δ − δ′|)
dis(ψ) = max(|α − γ′|, |γ − α′|, |δ − β′|, |β − δ′|).
The formula follows immediately.
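The two-node formula just derived can be sanity-checked numerically. Below is an illustrative brute-force computation of dN for small finite networks given as weight matrices, enumerating every correspondence directly; it is exponential in the number of node pairs and is only a toy check (recall from Remark 128 that dN is NP-hard to compute in general):

```python
from itertools import product

def dN(wX, wY):
    # Brute-force network distance between finite networks (weight
    # matrices as lists of lists): minimize the maximum distortion
    # over all correspondences R, then halve.
    X, Y = range(len(wX)), range(len(wY))
    pairs = list(product(X, Y))
    best = float("inf")
    # enumerate every nonempty subset of X x Y via bitmasks
    for mask in range(1, 1 << len(pairs)):
        R = [pairs[k] for k in range(len(pairs)) if mask >> k & 1]
        if {x for x, _ in R} != set(X) or {y for _, y in R} != set(Y):
            continue  # not a correspondence: some node is uncovered
        dis = max(abs(wX[x][xp] - wY[y][yp])
                  for (x, y) in R for (xp, yp) in R)
        best = min(best, dis)
    return best / 2.0
```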

Proof of Proposition 12. We begin with an observation. Given X, Y ∈ N, let X′, Y′ ∈ N be such that X ∼=w_II X′, Y ∼=w_II Y′, and card(X′) = card(Y′). Then we have:

dN(X, Y) ≤ dN(X, X′) + dN(X′, Y′) + dN(Y′, Y) = dN(X′, Y′) ≤ dbN(X′, Y′),

where the last inequality follows from Remark 10.


Next let η > dN (X, Y ), and let R ∈ R(X, Y ) be such that dis(R) < 2η. We wish
to find networks X 0 and Y 0 such that dbN (X 0 , Y 0 ) < η. Write Z = X × Y , and write
f : Z → X and g : Z → Y to denote the (surjective) projection maps (x, y) 7→ x and
(x, y) 7→ y. Notice that we may write R = {(f (z), g(z)) : z ∈ R ⊆ Z} . In particular, by
the definition of a correspondence, the restrictions of f, g to R are still surjective.
Define two weight functions f ∗ ω, g ∗ ω : R × R → R by f ∗ ω(z, z 0 ) = ωX (f (z), f (z 0 ))
and g ∗ ω(z, z 0 ) = ωY (g(z), g(z 0 )). Let (U, ωU ) = (R, f ∗ ω) and let (V, ωV ) = (R, g ∗ ω).
Note that dN (X, U ) = 0 by Remark 69, because card(U ) ≥ card(X) and for all z, z 0 ∈ U ,
we have ωU (z, z 0 ) = f ∗ ω(z, z 0 ) = ωX (f (z), f (z 0 )) for the surjective map f . Similarly
dN (Y, V ) = 0.
Next let ϕ : U → V be the bijection z 7→ z. Then we have:

sup_{z,z′∈U} |ωU(z, z′) − ωV(ϕ(z), ϕ(z′))| = sup_{z,z′∈U} |ωU(z, z′) − ωV(z, z′)|
= sup_{z,z′∈R} |ωX(f(z), f(z′)) − ωY(g(z), g(z′))|
= sup_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)|
= dis(R).

In particular, inf_{ϕ:U→V bijection} dis(ϕ) ≤ dis(R).

So there exist networks U, V with the same node set (and thus the same cardinality) such that dbN(U, V) ≤ (1/2) dis(R) < η. We have already shown that dN(X, Y) ≤ dbN(U, V).
Since η > dN (X, Y ) was arbitrary, it follows that we have:
dN(X, Y) = inf { dbN(X′, Y′) : X′ ∼=w_II X, Y′ ∼=w_II Y, and card(X′) = card(Y′) }.

2.2 ε-systems and finite sampling


In this section, we present proofs of results stated in §1.6.1.

Proof of Theorem 64. Once an ε-system has been found, the refinement can be produced
by standard methods. So we focus on proving the existence of an ε-system. The idea
is to find a cover of X by open sets G1 , . . . , Gq and representatives xi ∈ Gi for each
1 ≤ i ≤ q such that whenever we have (x, x0 ) ∈ Gi × Gj , we know by continuity of ωX

that |ωX (x, x0 ) − ωX (xi , xj )| < ε. Then we define a correspondence that associates each
x ∈ Gi to xi , for 1 ≤ i ≤ q. Such a correspondence has distortion bounded above by ε.
Let ε > 0. Let B be a base for the topology on X.
Let {B(r, ε/4) : r ∈ R} be an open cover for R. Then by continuity of ωX, we get that

{ωX⁻¹[B(r, ε/4)] : r ∈ R}

is an open cover for X × X. Each open set in this cover can be written as a union of open rectangles U × V, for U, V ∈ B. Thus the following set is an open cover of X × X:

U := {U × V : U, V ∈ B, U × V ⊆ ωX⁻¹[B(r, ε/4)], r ∈ R}.

Claim 1. There exists a finite open cover G = {G1 , . . . , Gq } of X such that for any 1 ≤
i, j ≤ q, we have Gi × Gj ⊆ U × V for some U × V ∈ U .
Proof of Claim 1. The proof of the claim proceeds by a repeated application of the Tube
Lemma [89, Lemma 26.8]. Since X × X is compact, we take a finite subcover:

U f := {U1 × V1 , . . . , Un × Vn }, for some n ∈ N.

Let x ∈ X. Then we define:

U^f_x := {U × V ∈ U^f : x ∈ U},

and write

U^f_x = {U^x_{i_1} × V^x_{i_1}, . . . , U^x_{i_{m(x)}} × V^x_{i_{m(x)}}}.

Here m(x) is an integer depending on x, and i1 , . . . , im(x) is a subset of {1, . . . , n}.
Since U f is an open cover of X × X, we know that Uxf is an open cover of {x} × X.
Next define:
A_x := ∩_{k=1}^{m(x)} U^x_{i_k}.

Then Ax is open and contains x. In the literature [89, p. 167], the set Ax × X is called a
tube around {x} × X. Notice that Ax × X ⊆ Uxf . Since x was arbitrary in the preceding
construction, we define Uxf and Ax for each x ∈ X. Then note that {Ax : x ∈ X} is an
open cover of X. Using compactness of X, we choose {s_1, . . . , s_p} ⊆ X, p ∈ N, such that {A_{s_1}, . . . , A_{s_p}} is a finite subcover of X.
Once again let x ∈ X, and let U^f_x and A_x be defined as above. Define the following:

B_x := {A_x × V^x_{i_k} : 1 ≤ k ≤ m(x)}.

Since x ∈ A_x and X ⊆ ∪_{k=1}^{m(x)} V^x_{i_k}, it follows that B_x is a cover of {x} × X. Furthermore, since {A_{s_1}, . . . , A_{s_p}} is a cover of X, it follows that the finite collection B_{s_1} ∪ . . . ∪ B_{s_p} is a cover of X × X.

Let z ∈ X. Since X ⊆ ∪_{k=1}^{m(x)} V^x_{i_k}, we pick V^x_{i_k} for 1 ≤ k ≤ m(x) such that z ∈ V^x_{i_k}. Since x was arbitrary, such a choice exists for each x ∈ X. Therefore, we define:

C_z := {V ∈ B : z ∈ V, A_{s_i} × V ∈ B_{s_i} for some 1 ≤ i ≤ p}.

Since each Bsi is finite and there are finitely many Bsi , we know that Cz is a finite
collection. Next define:

D_z := ∩_{V∈C_z} V.

Then Dz is open and contains z. Notice that X × Dz is a tube around X × {z}. Next, using
the fact that {Asi : 1 ≤ i ≤ p} is an open cover of X, pick Asi(z) such that z ∈ Asi(z) . Here
1 ≤ i(z) ≤ p is some integer depending on z. Then define

Gz := Dz ∩ Asi(z) .

Then Gz is open and contains z. Since z was arbitrary, we define Gz for each z ∈ X. Then
{Gz : z ∈ X} is an open cover of X, and we take a finite subcover:

G := {G1 , . . . , Gq }, q ∈ N.

Finally, we need to show that for any choice of 1 ≤ i, j ≤ q, we have G_i × G_j ⊆ U × V for some U × V ∈ U. Let 1 ≤ i, j ≤ q. Note that we can write G_i = G_w and G_j = G_y for some w, y ∈ X. By the definition of G_w, we then have the following for some index i(w) depending on w:

G_w ⊆ A_{s_{i(w)}} ⊆ U^{s_{i(w)}} for some U^{s_{i(w)}} × V^{s_{i(w)}} ∈ U^f_{s_{i(w)}}, 1 ≤ i(w) ≤ p.

Note that the second containment holds by definition of A_{s_{i(w)}}. Since U^f_{s_{i(w)}} is a cover of {s_{i(w)}} × X, we choose V^{s_{i(w)}} to contain y. Then observe that A_{s_{i(w)}} × V^{s_{i(w)}} ∈ B_{s_{i(w)}}. Then V^{s_{i(w)}} ∈ C_y, and so we have:

G_y ⊆ D_y ⊆ V^{s_{i(w)}}.

It follows that G_i × G_j = G_w × G_y ⊆ U^{s_{i(w)}} × V^{s_{i(w)}} ∈ U. □


Now we fix G = {G_1, . . . , G_q} as in Claim 1. Before defining X′, we perform a disjointification step. Define:

G̃_1 := G_1, G̃_2 := G_2 \ G̃_1, G̃_3 := G_3 \ (G̃_1 ∪ G̃_2), . . . , G̃_q := G_q \ ∪_{k=1}^{q−1} G̃_k.

Finally we define X′ as follows: pick a representative x_i ∈ G̃_i for each 1 ≤ i ≤ q. Let X′ = {x_i : 1 ≤ i ≤ q}. Define a correspondence between X and X′ as follows:

R := {(x, x_i) : x ∈ G̃_i, 1 ≤ i ≤ q}.

Let (x, x_i), (x′, x_j) ∈ R. Then we have (x, x′), (x_i, x_j) ∈ G̃_i × G̃_j ⊆ G_i × G_j. By the preceding work, we know that G_i × G_j ⊆ U × V, for some U × V ∈ U. Therefore ωX(x, x′), ωX(x_i, x_j) ∈ B(r, ε/4) for some r ∈ R. It follows that:

|ωX(x, x′) − ωX(x_i, x_j)| < ε/2.

Since (x, x_i), (x′, x_j) ∈ R were arbitrary, we have dis(R) < ε/2. Hence dN(X, X′) < ε.
Proof of Theorem 67. The first part of this proof is similar to that of Theorem 64. Let ε > 0. Let B be a base for the topology on X. Then {ωX⁻¹[B(r, ε/8)] : r ∈ R} is an open cover for X × X. Each open set in this cover can be written as a union of open rectangles U × V, for U, V ∈ B. Thus the following set is an open cover of X × X:

U := {U × V : U, V ∈ B, U × V ⊆ ωX⁻¹[B(r, ε/8)], r ∈ R}.

By applying Claim 1 from the proof of Theorem 64, we obtain a finite open cover G =
{G1 , . . . , Gq } of X such that for any 1 ≤ i, j ≤ q, we have Gi × Gj ⊆ U × V for some
U × V ∈ U . For convenience, we assume that each Gi is nonempty.
Now let 1 ≤ i ≤ q. Then Gi ∩ S 6= ∅, because S is dense in X. Choose p(i) ∈ N such
that sp(i) ∈ Gi . We repeat this process for each 1 ≤ i ≤ q, and then define

n := max {p(1), p(2), . . . , p(q)} .

Now define X_n to be the network with node set {s_1, s_2, . . . , s_n} and weight function given by the appropriate restriction of ωX. Also define S_n to be the network with node set {s_{p(1)}, s_{p(2)}, . . . , s_{p(q)}} and weight function given by the restriction of ωX.
Claim 2. Let A be a subset of X equipped with the weight function ωX |A×A . Then
dN (Sn , A) < ε/2.
Proof of Claim 2. We begin with G = {G1 , . . . , Gq }. Notice that each Gi contains sp(i) .
To avoid ambiguity in our construction, we will need to ensure that Gi does not contain
sp(j) for i 6= j. So our first step is to obtain a cover of A by disjoint sets while ensuring that
each sp(i) ∈ Sn belongs to exactly one element of the new cover. We define:

G*_1 := G_1 \ S_n, G*_2 := G_2 \ S_n, G*_3 := G_3 \ S_n, . . . , G*_q := G_q \ S_n, and

G̃_1 := G*_1 ∪ {s_p(1)}, G̃_2 := (G*_2 \ G̃_1) ∪ {s_p(2)}, G̃_3 := (G*_3 \ (G̃_1 ∪ G̃_2)) ∪ {s_p(3)}, . . . , G̃_q := (G*_q \ ∪_{k=1}^{q−1} G̃_k) ∪ {s_p(q)}.

Notice that {G̃_i : 1 ≤ i ≤ q} is a cover for A, and for each 1 ≤ i ≤ q, G̃_i contains s_p(j) if and only if i = j. Now we define a correspondence between A and S_n as follows:

R := {(x, s_p(i)) : x ∈ A ∩ G̃_i, 1 ≤ i ≤ q}.

Next let (x, s_p(i)), (x′, s_p(j)) ∈ R. Then we have (x, x′), (s_p(i), s_p(j)) ∈ G̃_i × G̃_j ⊆ G_i × G_j ⊆ U × V for some U × V ∈ U. Therefore ωX(x, x′) and ωX(s_p(i), s_p(j)) both belong to B(r, ε/8) for some r ∈ R. Thus we have:

|ωX(x, x′) − ωX(s_p(i), s_p(j))| < ε/4.

It follows that dis(R) < ε/4, and so dN(A, S_n) < ε/2. □


Finally, we note that dN (X, Xn ) ≤ dN (X, Sn ) + dN (Sn , Xn ) < ε/2 + ε/2 = ε, by
Claim 2. Since ε > 0 was arbitrary, it follows that dN (X, Xn ) → 0.
For the final statement in the theorem, let m ≥ n and observe that Sn ⊆ Xn ⊆ Xm .
Thus whenever we have dN (X, Xn ) < ε, we also have dN (X, Xm ) < ε. It follows that:

dN (X, Xm ) ≤ dN (X, Xn ) for any m, n ∈ N, m ≥ n.

Next we proceed to Theorem 68. We first prove the following useful lemma:

Lemma 130. Assume the setup of (X, ωX ), µX , (Ω, F, P), and Xn for each n ∈ N as in
Theorem 68. Fix ε > 0, and let U = {U1 , . . . , Um } be a refined ε-system on supp(µX ). For
each 1 ≤ i ≤ m and each n ∈ N, define the following event:
A_i := ∩_{k=1}^{n} {ω ∈ Ω : x_k(ω) ∉ U_i} ⊆ Ω.

Then we have P(∪_{k=1}^{m} A_k) ≤ (1/m(U)) · (1 − m(U))^n.

Proof of Lemma 130. Here we are considering the probability that at least one of the Ui
has empty intersection with Xn . By independence, P(Ai ) = (1 − µX (Ui ))n . Then we have:
P(∪_{k=1}^{m} A_k) ≤ Σ_{k=1}^{m} P(A_k) = Σ_{k=1}^{m} (1 − µX(U_k))^n ≤ m · max_{1≤k≤m} (1 − µX(U_k))^n ≤ (1 − m(U))^n / m(U).

Here the first inequality follows by subadditivity of measure, and the last inequality follows
because the total mass µX (supp(µX )) = 1 is an upper bound for m · m(U). Note also that
each U ∈ U has nonzero mass, by the observation in Definition 24.
Proof of Theorem 68. By endowing supp(µX ) with the restriction of ωX to supp(µX ) ×
supp(µX ) it may itself be viewed as a network with full support, so for notational conve-
nience, we assume X = supp(µX ).
First observe that Mε/2 (X) ∈ (0, 1]. Let r ∈ (0, Mε/2 (X)), and let Ur be an ε/2-system
on X such that m(Ur ) ∈ (r, Mε/2 (X)]. For convenience, write m := |Ur |, and also write
Ur = {U1 , . . . , Um }. For each 1 ≤ i ≤ m, define Ai as in the statement of Lemma 130.
Then by Lemma 130, the probability that at least one U_i has empty intersection with X_n is bounded as P(∪_{k=1}^{m} A_k) ≤ (1/m(U_r)) · (1 − m(U_r))^n. On the other hand, if U_i has nonempty

intersection with Xn for each 1 ≤ i ≤ m, then by Theorem 66, we obtain dN (X, Xn ) < ε.
For each n ∈ N, define B_n := {ω ∈ Ω : dN(X, X_n(ω)) ≥ ε}. Then we have:

P(B_n) ≤ P(∪_{k=1}^{m} A_k) ≤ (1 − m(U_r))^n / m(U_r).

Since r ∈ (0, M_{ε/2}(X)) was arbitrary, letting r approach M_{ε/2}(X) shows that P(B_n) ≤ (1 − M_{ε/2}(X))^n / M_{ε/2}(X). We have by Definition 24 that M_{ε/2}(X) is strictly positive. Thus the term on the right side of the inequality is an element of a convergent geometric series, so

Σ_{n=1}^{∞} P(B_n) ≤ (1/M_{ε/2}(X)) · Σ_{n=1}^{∞} (1 − M_{ε/2}(X))^n < ∞.

By the Borel-Cantelli lemma, we have P(lim supn→∞ Bn ) = 0. The result follows.

2.3 Proofs from §1.6.2


Proof of Proposition 70. The case for Type I weak isomorphism is similar to that of Type
II, so we omit it. For Type II weak isomorphism, the reflexive and symmetric properties
are easy to see, so we only provide details for verifying transitivity. Let A, B, C ∈ N be
such that A ∼=w_II B and B ∼=w_II C. Let ε > 0, and let P, S be sets with surjective maps ϕA : P → A, ϕB : P → B, ψB : S → B, ψC : S → C such that:

|ωA(ϕA(p), ϕA(p′)) − ωB(ϕB(p), ϕB(p′))| < ε/2 for each p, p′ ∈ P, and
|ωB(ψB(s), ψB(s′)) − ωC(ψC(s), ψC(s′))| < ε/2 for each s, s′ ∈ S.

Next define T := {(p, s) ∈ P × S : ϕB (p) = ψB (s)}.


Claim 3. The projection maps πP : T → P and πS : T → S are surjective.
Proof. Let p ∈ P . Then ϕB (p) ∈ B, and since ψB : S → B is surjective, there exists
s ∈ S such that ψB (s) = ϕB (p). Thus (p, s) ∈ T , and πP (p, s) = p. This suffices to show
that πP : T → P is a surjection. The case for πS : T → S is similar. □
It follows from the preceding claim that ϕA ◦ πP : T → A and ψC ◦ πS : T → C are
surjective. Next let (p, s), (p0 , s0 ) ∈ T . Then,

|ωA (ϕA (πP (p, s)), ϕA (πP (p0 , s0 ))) − ωC (ψC (πS (p, s)), ψC (πS (p0 , s0 )))|
= |ωA (ϕA (p), ϕA (p0 )) − ωC (ψC (s), ψC (s0 ))|
= |ωA (ϕA (p), ϕA (p0 )) − ωB (ϕB (p), ϕB (p0 )) + ωB (ϕB (p), ϕB (p0 )) − ωC (ψC (s), ψC (s0 ))|
= |ωA (ϕA (p), ϕA (p0 )) − ωB (ϕB (p), ϕB (p0 )) + ωB (ψB (s), ψB (s0 )) − ωC (ψC (s), ψC (s0 ))|
< ε/2 + ε/2 = ε.

Since ε > 0 was arbitrary, it follows that A ∼=w_II C. □

Proof of Theorem 72. It is clear that dN (X, Y ) ≥ 0. To show dN (X, X) = 0, consider
the correspondence R = {(x, x) : x ∈ X}. Then for any (x, x), (x0 , x0 ) ∈ R, we have
|ωX (x, x0 ) − ωX (x, x0 )| = 0. Thus dis(R) = 0 and dN (X, X) = 0.
Next we show symmetry, i.e. dN (X, Y ) ≤ dN (Y, X) and dN (Y, X) ≤ dN (X, Y ). The
two cases are similar, so we just show the second inequality. Let η > dN (X, Y ). Let
R ∈ R(X, Y ) be such that dis(R) < 2η. Then define R̃ = {(y, x) : (x, y) ∈ R}. Note that
R̃ ∈ R(Y, X). We have:

dis(R̃) = sup_{(y,x),(y′,x′)∈R̃} |ωY(y, y′) − ωX(x, x′)|
= sup_{(x,y),(x′,y′)∈R} |ωY(y, y′) − ωX(x, x′)|
= sup_{(x,y),(x′,y′)∈R} |ωX(x, x′) − ωY(y, y′)| = dis(R).

So dis(R) = dis(R̃). Then dN(Y, X) = (1/2) inf_{S∈R(Y,X)} dis(S) ≤ (1/2) dis(R̃) < η. This shows dN(Y, X) ≤ dN(X, Y). The reverse inequality follows by a similar argument.
Next we prove the triangle inequality. Let R ∈ R(X, Y ), S ∈ R(Y, Z), and let

R ◦ S = {(x, z) ∈ X × Z | ∃y, (x, y) ∈ R, (y, z) ∈ S}

First we claim that R ◦ S ∈ R(X, Z). This is equivalent to checking that for each x ∈ X,
there exists z such that (x, z) ∈ R ◦ S, and for each z ∈ Z, there exists x such that
(x, z) ∈ R ◦ S. The proofs of these two conditions are similar, so we just prove the former.
Let x ∈ X. Let y ∈ Y be such that (x, y) ∈ R. Then there exists z ∈ Z such that
(y, z) ∈ S. Then (x, z) ∈ R ◦ S.
Next we claim that dis(R◦S) ≤ dis(R)+dis(S). Let (x, z), (x0 , z 0 ) ∈ R◦S. Let y ∈ Y
be such that (x, y) ∈ R and (y, z) ∈ S. Let y 0 ∈ Y be such that (x0 , y 0 ) ∈ R, (y 0 , z 0 ) ∈ S.
Then we have:

|ωX(x, x′) − ωZ(z, z′)| = |ωX(x, x′) − ωY(y, y′) + ωY(y, y′) − ωZ(z, z′)|
≤ |ωX(x, x′) − ωY(y, y′)| + |ωY(y, y′) − ωZ(z, z′)|
≤ dis(R) + dis(S).

This holds for any (x, z), (x0 , z 0 ) ∈ R ◦ S, and proves the claim.
Now let η1 > dN(X, Y), let η2 > dN(Y, Z), and let R ∈ R(X, Y), S ∈ R(Y, Z) be
such that dis(R) < 2η1 and dis(S) < 2η2 . Then we have:

dN(X, Z) ≤ (1/2) dis(R ◦ S) ≤ (1/2) dis(R) + (1/2) dis(S) < η1 + η2.

This shows that dN (X, Z) ≤ dN (X, Y ) + dN (Y, Z), and proves the triangle inequality.
Finally, we claim that X ∼=w_II Y if and only if dN(X, Y) = 0. Suppose dN(X, Y) = 0. Let ε > 0, and let R(ε) ∈ R(X, Y) be such that dis(R(ε)) < ε. Then for any z = (x, y), z′ = (x′, y′) ∈ R(ε), we have |ωX(x, x′) − ωY(y, y′)| < ε. But this is equivalent to writing |ωX(πX(z), πX(z′)) − ωY(πY(z), πY(z′))| < ε, where πX : R(ε) → X and πY : R(ε) → Y are the canonical projection maps. This holds for each ε > 0. Thus X ∼=w_II Y.
Conversely, suppose X ∼=w_II Y, and for each ε > 0 let Z(ε) be a set with surjective maps φ^ε_X : Z(ε) → X, φ^ε_Y : Z(ε) → Y such that |ωX(φ^ε_X(z), φ^ε_X(z′)) − ωY(φ^ε_Y(z), φ^ε_Y(z′))| < ε for all z, z′ ∈ Z(ε). For each ε > 0, let R(ε) = {(φ^ε_X(z), φ^ε_Y(z)) : z ∈ Z(ε)}. Then R(ε) ∈ R(X, Y) for each ε > 0, and dis(R(ε)) = sup_{z,z′∈Z(ε)} |ωX(φ^ε_X(z), φ^ε_X(z′)) − ωY(φ^ε_Y(z), φ^ε_Y(z′))| < ε.
We conclude that dN (X, Y ) = 0. Thus dN is a metric modulo Type II weak isomor-
phism.
Proof of Theorem 73. By the definition of ∼=w_I, it is clear that if X ∼=w_I Y, then dN(X, Y) = 0, i.e. X ∼=w_II Y (cf. Theorem 72).
Conversely, suppose dN (X, Y ) = 0. Our strategy is to obtain a set Z ⊆ X × Y
with canonical projection maps πX : Z → X, πY : Z → Y and surjections ψX : X →
πX (Z), ψY : Y → πY (Z) as in the following diagram:

[Diagram: the projections πX : Z → X and πY : Z → Y, together with the surjections ψX, ψY and the identity maps on X and Y, yield]

X ∼=w_I πX(Z) ∼=w_I πY(Z) ∼=w_I Y

Furthermore, we will require:

ωX(πX(z), πX(z′)) = ωY(πY(z), πY(z′)) for all z, z′ ∈ Z, (2.1)
ωX(x, x′) = ωX(ψX(x), ψX(x′)) for all x, x′ ∈ X, (2.2)
ωY(y, y′) = ωY(ψY(y), ψY(y′)) for all y, y′ ∈ Y. (2.3)

As a consequence, we will obtain a chain of Type I weak isomorphisms

X ∼=w_I πX(Z) ∼=w_I πY(Z) ∼=w_I Y.

Since Type I weak isomorphism is an equivalence relation (Proposition 70), it will follow
that X and Y are Type I weakly isomorphic.
By applying Theorem 64, we choose sequences of finite subnetworks {Xn ⊆ X : n ∈ N}
and {Yn ⊆ Y : n ∈ N} such that dN (Xn , X) < 1/n and dN (Yn , Y ) < 1/n for each n ∈ N.
By the triangle inequality, dN (Xn , Yn ) < 2/n for each n.
For each n ∈ N, let Tn ∈ R(Xn , X), Pn ∈ R(Y, Yn ) be such that dis(Tn ) < 2/n and
dis(Pn ) < 2/n. Define αn := 4/n − dis(Tn ) − dis(Pn ), and notice that αn → 0 as n → ∞.
Since dN (X, Y ) = 0 by assumption, for each n ∈ N we let Sn ∈ R(X, Y ) be such that
dis(Sn ) < αn . Then,

dis(Tn ◦ Sn ◦ Pn ) ≤ dis(Tn ) + dis(Sn ) + dis(Pn ) < 4/n. (cf. Remark 4)

Then for each n ∈ N, we define Rn := Tn ◦ Sn ◦ Pn ∈ R(Xn , Yn ). By Remark 4, we
know that Rn has the following expression:

Rn = {(xn, yn) ∈ Xn × Yn : there exist x̃ ∈ X, ỹ ∈ Y such that (xn, x̃) ∈ Tn, (x̃, ỹ) ∈ Sn, (ỹ, yn) ∈ Pn}.

Next define:

S := {(x̃n, ỹn)_{n∈N} ∈ (X × Y)^N : (x̃n, ỹn) ∈ Sn for each n ∈ N}.

Since X, Y are first countable and compact, the product X × Y is also first countable
and compact, hence sequentially compact. Any sequence in a sequentially compact space
has a convergent subsequence, so for convenience, we replace each sequence in S by a
convergent subsequence. Next define:

Z := {(x, y) ∈ X × Y : (x, y) a limit point of some (x̃n , ỹn )n∈N ∈ S} .

Claim 4. Z is a closed subspace of X × Y . Hence it is compact and sequentially compact.


The second statement in the claim follows from the first: assuming that Z is a closed
subspace of the compact space X × Y , we obtain that Z is compact. Any subspace of a first
countable space is first countable, so Z is also first countable. Next, observe that πX (Z)
equipped with the subspace topology is compact, because it is a continuous image of a
compact space. It is also first countable because it is a subspace of the first countable space
X. Furthermore, the restriction of ωX to πX (Z) is continuous. Thus πX (Z) equipped with
the restriction of ωX is a compact network, and by similar reasoning, we get that πY (Z)
equipped with the restriction of ωY is also a compact network.
Proof of Claim 4. We will show that Z ⊆ X × Y contains all its limit points. Let (x, y) ∈
X × Y be a limit point of Z. Let {Un ⊆ X × Y : n ∈ N, (x, y) ∈ Un } be a countable neighborhood base of (x, y). For each n ∈ N, the finite intersection Vn := ∩ni=1 Ui is an open
neighborhood of (x, y), and thus contains a point (xn , yn ) ∈ Z that is distinct from (x, y)
(by the definition of a limit point). Pick such an (xn , yn ) for each n ∈ N. Then (xn , yn )n∈N
is a sequence in Z converging to (x, y) such that (xn , yn ) ∈ Vn for each n ∈ N.
For each n ∈ N, note that because (xn , yn ) ∈ Z and Vn is an open neighborhood of
(xn , yn ), there exists a sequence in S converging to (xn , yn ) for which all but finitely many
terms are contained in Vn . So for each n ∈ N, let (x̃n , ỹn ) ∈ Sn be such that (x̃n , ỹn ) ∈ Vn .
Then the sequence (x̃n , ỹn )n∈N ∈ S converges to (x, y). Thus (x, y) ∈ Z. Since (x, y) was
an arbitrary limit point of Z, it follows that Z is closed. 

Proof of Equation 2.1. Let z = (x, y), z 0 = (x0 , y 0 ) ∈ Z, and
let (x̃n , ỹn )n∈N , (x̃0n , ỹn0 )n∈N be elements of S that converge to (x, y), (x0 , y 0 ) respectively.

We wish to show |ωX (x, x0 ) − ωY (y, y 0 )| = 0. Let ε > 0, and observe that:

|ωX (x, x0 ) − ωY (y, y 0 )|


= |ωX (x, x0 ) − ωX (x̃n , x̃0n ) + ωX (x̃n , x̃0n ) − ωY (ỹn , ỹn0 ) + ωY (ỹn , ỹn0 ) − ωY (y, y 0 )|
≤ |ωX (x, x0 ) − ωX (x̃n , x̃0n )| + |ωX (x̃n , x̃0n ) − ωY (ỹn , ỹn0 )| + |ωY (ỹn , ỹn0 ) − ωY (y, y 0 )|.

Claim 5. Suppose we are given sequences (x̃n , ỹn )n∈N , (x̃0n , ỹn0 )n∈N in Z converging to
(x, y) and (x0 , y 0 ) in Z, respectively. Then there exists N ∈ N such that for all n ≥ N , we
have:

|ωX (x, x0 ) − ωX (x̃n , x̃0n )| < ε/4, |ωY (ỹn , ỹn0 ) − ωY (y, y 0 )| < ε/4.

Proof of Claim 5. Write a := ωX (x, x0 ), b := ωY (y, y 0 ). Since ωX , ωY are continuous, we
know that ωX−1 [B(a, ε/4)] and ωY−1 [B(b, ε/4)] are open neighborhoods of (x, x0 ) and (y, y 0 ).
Since each open set in the product space X × X is a union of open rectangles of the form
A × A0 for A, A0 open subsets of X, we choose an open set A × A0 ⊆ ωX−1 [B(a, ε/4)] such
that (x, x0 ) ∈ A × A0 . Similarly, we choose an open set B × B 0 ⊆ ωY−1 [B(b, ε/4)] such
that (y, y 0 ) ∈ B × B 0 . Then A × B, A0 × B 0 are open neighborhoods of (x, y), (x0 , y 0 )
respectively. Since (x̃n , ỹn )n∈N and (x̃0n , ỹn0 )n∈N converge to (x, y) and (x0 , y 0 ), respectively,
we choose N ∈ N such that for all n ≥ N , we have (x̃n , ỹn ) ∈ A × B and (x̃0n , ỹn0 ) ∈
A0 × B 0 . The claim now follows. 
Now choose N ∈ N such that the property in Claim 5 is satisfied, as well as the addi-
tional property that 8/N < ε/4. Then for any n ≥ N , we have:

|ωX (x, x0 ) − ωY (y, y 0 )| ≤ ε/4 + |ωX (x̃n , x̃0n ) − ωY (ỹn , ỹn0 )| + ε/4.

Separately note that for each n ∈ N, having (x̃n , ỹn ), (x̃0n , ỹn0 ) ∈ Sn implies that there
exist (xn , yn ) and (x0n , yn0 ) ∈ Rn such that (xn , x̃n ), (x0n , x̃0n ) ∈ Tn and (ỹn , yn ), (ỹn0 , yn0 ) ∈
Pn . Thus we can bound the middle term above as follows:

|ωX (x̃n , x̃0n ) − ωY (ỹn , ỹn0 )|


= |ωX (x̃n , x̃0n ) − ωX (xn , x0n ) + ωX (xn , x0n ) − ωY (yn , yn0 ) + ωY (yn , yn0 ) − ωY (ỹn , ỹn0 )|
≤ |ωX (x̃n , x̃0n ) − ωX (xn , x0n )| + |ωX (xn , x0n ) − ωY (yn , yn0 )| + |ωY (yn , yn0 ) − ωY (ỹn , ỹn0 )|
≤ dis(Tn ) + dis(Rn ) + dis(Pn ) < 8/n ≤ 8/N < ε/4.

The preceding calculations show that:

|ωX (x, x0 ) − ωY (y, y 0 )| < ε.

Since ε > 0 was arbitrary, it follows that ωX (x, x0 ) = ωY (y, y 0 ). This proves Equation 2.1.
It remains to define surjective maps ψX : X → πX (Z), ψY : Y → πY (Z) and to verify
Equations 2.2 and 2.3. Both cases are similar, so we only show the details of constructing
ψX and verifying Equation 2.2.

Construction of ψX . Let x ∈ X. Suppose first that x ∈ πX (Z). Then we simply define
ψX (x) = x. We also make the following observation, to be used later: for each n ∈ N,
letting y ∈ Y be such that (x, y) ∈ Sn , there exist xn ∈ Xn and yn ∈ Yn such that
(xn , x) ∈ Tn and (y, yn ) ∈ Pn .
Next suppose x ∈ X \ πX (Z). For each n ∈ N, let xn ∈ Xn be such that (xn , x) ∈ Tn ,
and let x̃n ∈ X be such that (xn , x̃n ) ∈ Tn . Also for each n ∈ N, let ỹn ∈ Y be such that
(x̃n , ỹn ) ∈ Sn . Then for each n ∈ N, let yn ∈ Yn be such that (ỹn , yn ) ∈ Pn . Then by
sequential compactness of X × Y , the sequence (x̃n , ỹn )n∈N has a convergent subsequence
which belongs to S and converges to a point (x̃, ỹ) ∈ Z. In particular, we obtain a sequence
(x̃n )n∈N converging to a point x̃, such that (xn , x) and (xn , x̃n ) ∈ Tn for each n ∈ N. Define
ψX (x) = x̃.
Since x ∈ X was arbitrary, this construction defines ψX : X → πX (Z). Note that ψX
is simply the identity on πX (Z), hence is surjective.
Proof of Equation 2.2. Now we verify Equation 2.2. Let ε > 0. There are three cases to
check:

Case 1: x, x0 ∈ πX (Z) In this case, we have:

|ωX (x, x0 ) − ωX (ψX (x), ψX (x0 ))| = |ωX (x, x0 ) − ωX (x, x0 )| = 0.

Case 2: x, x0 ∈ X \ πX (Z) By continuity of ωX , we obtain an open neighborhood U := ωX−1 [B(ωX (ψX (x), ψX (x0 )), ε/2)] of (ψX (x), ψX (x0 )). By the definition of ψX on X \ πX (Z),
we obtain sequences (x̃n , ỹn )n∈N and (x̃0n , ỹn0 )n∈N in S converging to (ψX (x), ỹ) and
(ψX (x0 ), ỹ 0 ) for some ỹ, ỹ 0 ∈ Y . By applying Claim 5, we obtain N ∈ N such that
for all n ≥ N , we have (x̃n , x̃0n ) ∈ U . Note that we also obtain sequences (xn )n∈N
and (x0n )n∈N such that (xn , x), (xn , x̃n ) ∈ Tn and (x0n , x0 ), (x0n , x̃0n ) ∈ Tn . Choose N
large enough so that it satisfies the property above and also that 4/N < ε/2. Then
for any n ≥ N ,

|ωX (x, x0 ) − ωX (ψX (x), ψX (x0 ))|


= |ωX (x, x0 ) − ωX (xn , x0n ) + ωX (xn , x0n ) − ωX (x̃n , x̃0n ) + ωX (x̃n , x̃0n ) − ωX (ψX (x), ψX (x0 ))|
≤ dis(Tn ) + dis(Tn ) + ε/2 < 4/n + ε/2 ≤ 4/N + ε/2 < ε.

Case 3: x ∈ πX (Z), x0 ∈ X \ πX (Z) By the definition of ψX on X \ πX (Z), we obtain:


(1) a sequence (x̃0n )n∈N converging to ψX (x0 ), and (2) another sequence (x0n )n∈N such
that (x0n , x0 ) and (x0n , x̃0n ) both belong to Tn , for each n ∈ N. By the definition of ψX
on πX (Z), we obtain a sequence (xn )n∈N such that (xn , x) ∈ Tn for each n ∈ N.
Let U := ωX−1 [B(ωX (x, ψX (x0 )), ε/2)]. Since (x̃0n )n∈N converges to ψX (x0 ), we
know that all but finitely many terms of the sequence (x, x̃0n )n∈N belong to U . So

we choose N large enough so that for each n ≥ N , we have:

|ωX (x, x0 ) − ωX (x, ψX (x0 ))|


= |ωX (x, x0 ) − ωX (xn , x0n ) + ωX (xn , x0n ) − ωX (x, x̃0n ) + ωX (x, x̃0n ) − ωX (x, ψX (x0 ))|
≤ dis(Tn ) + dis(Tn ) + ε/2 < 4/n + ε/2 ≤ 4/N + ε/2 < ε.

Since ε > 0 was arbitrary, Equation 2.2 follows. The construction of ψY and proof for
Equation 2.3 are similar. This concludes the proof of the theorem.
As a consequence of Theorem 73, we see that weak isomorphisms of Types I and II
coincide in the setting of CN . Thus we recover a desirable notion of equivalence in the
setting of compact networks.

2.4 Skeletons and motif reconstruction


In this section, we prove the results stated in §1.6.4. We also state and prove certain
auxiliary results and a definition, cf. Propositions 132, 133, Definition 44, and Theorem
134.

Proposition 76. Let (X, ωX ), (Y, ωY ) be networks with coherent topologies. Suppose f :
X → Y is a weight-preserving map and f (X) is a subnetwork of Y with the subspace
topology. Then f is continuous.

Proof. Let V 0 be an open subset of Y , and write V := V 0 ∩ f (X). Then V is open rel
f (X). We need to show that U := f −1 (V 0 ) = f −1 (V ) is open. Let x ∈ U , and suppose
(xn )n is a sequence in X converging to x. Then f (xn ) → f (x) rel f (X). To see this, note
that

kωY (f (xn ), •)|f (X) − ωY (f (x), •)|f (X) k = kωY (f (xn ), f (•))|X − ωY (f (x), f (•))|X k
= kωX (xn , •) − ωX (x, •)k,

and the latter converges to 0 uniformly by Axiom A2 for X. Similarly, kωY (•, f (xn ))|f (X) − ωY (•, f (x))|f (X) k converges to 0 uniformly. Thus by Axiom A2 for f (X), we have f (xn ) →
f (x) rel f (X). But then there must exist N ∈ N such that f (xn ) ∈ V for all n ≥ N . Then
xn ∈ U for all n ≥ N . Thus U is open rel X by A1. This concludes the proof.

2.4.1 The skeleton of a compact network


We now prove that the skeleton of a compact network is terminal in the sense of Defi-
nition 32.

Proposition 80. Suppose (X, ωX ) ∈ N has a coherent topology. Then the map σ : X →
X/ ∼ is an open map, i.e. it maps open sets to open sets.

Proof of Proposition 80. Let U ⊆ X be open. We need to show σ −1 (σ(U )) is open. For
convenience, define V := σ −1 (σ(U )). Let v ∈ V . Then σ(v) = [v] = [x] for some x ∈ U .
Let (vn )n∈N be any sequence in X such that vn → v rel X. We first show that vn → x
rel X. We know ωX (vn , •) −→ ωX (v, •) uniformly and ωX (•, vn ) −→ ωX (•, v) uniformly by Axiom A2.
But ωX (v, •) = ωX (x, •) and ωX (•, v) = ωX (•, x), because x ∼ v. By A2, we then have
vn → x rel X. But then there exists N ∈ N such that vn ∈ U ⊆ V for all n ≥ N . This
shows that any sequence (vn ) in X converging rel X to an arbitrary point v ∈ V must
eventually be in V . Thus V is open rel X, by Axiom A1. This concludes the proof.
The following lemma summarizes useful facts about weight preserving maps and the
relation ∼.

Lemma 131. Let (X, ωX ), (Y, ωY ) ∈ N , and let f : X → Y be a weight preserving


surjection. Then,

1. f preserves equivalence classes of ∼, i.e. x ∼ x0 for x, x0 ∈ X iff f (x) ∼ f (x0 ).

2. f preserves weights between equivalence classes, i.e. ωX/∼ ([x], [x0 ]) = ωY /∼ ([f (x)], [f (x0 )])
for any [x], [x0 ] ∈ X/ ∼.

Proof of Lemma 131. For the first assertion, let x ∼ x0 for some x, x0 ∈ X. We wish to
show f (x) ∼ f (x0 ). Let y ∈ Y , and write y = f (z) for some z ∈ X. Then,

ωY (f (x), y) = ωY (f (x), f (z)) = ωX (x, z) = ωX (x0 , z) = ωY (f (x0 ), f (z)) = ωY (f (x0 ), y).

Similarly we have ωY (y, f (x)) = ωY (y, f (x0 )) for any y ∈ Y . Thus f (x) ∼ f (x0 ).
Conversely suppose f (x) ∼ f (x0 ). Let z ∈ X. Then,

ωX (x, z) = ωY (f (x), f (z)) = ωY (f (x0 ), f (z)) = ωX (x0 , z),

and similarly we get ωX (z, x) = ωX (z, x0 ). Thus x ∼ x0 . This proves the first assertion.
The second assertion holds by definition:

ωY /∼ ([f (x)], [f (x0 )]) = ωY (f (x), f (x0 )) = ωX (x, x0 ) = ωX/∼ ([x], [x0 ]).
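In the finite setting, the relation ∼ and the induced quotient weights can be computed directly from a weight matrix: two nodes are identified exactly when their rows and columns agree. The following sketch is illustrative only (plain Python with hypothetical names; it is not code from the thesis):

```python
# Sketch (illustrative, not from the thesis): the skeleton of a finite
# network given by a weight matrix w. Nodes i, j are equivalent iff
# w(i, z) = w(j, z) and w(z, i) = w(z, j) for every node z, i.e. iff
# their rows and columns of w coincide.

def skeleton(w):
    n = len(w)
    # in/out weight profile of node i; equal profiles mean i ~ j
    sig = lambda i: (tuple(w[i]), tuple(w[j][i] for j in range(n)))
    reps = []  # one representative per equivalence class, in node order
    for i in range(n):
        if all(sig(i) != sig(r) for r in reps):
            reps.append(i)
    # the quotient weight is well defined on classes, so restrict w to reps
    return [[w[i][j] for j in reps] for i in reps]
```

For example, duplicating a node leaves the skeleton unchanged: `skeleton([[0, 1, 0], [1, 0, 1], [0, 1, 0]])` returns `[[0, 1], [1, 0]]`, since the first and third nodes are equivalent.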

Proposition 81. Let (X, ωX ) be a compact network with a coherent topology. The quotient
topology on (sk(X), ωsk(X) ) is also coherent.

Proof of Proposition 81. Let Z be any subnetwork of sk(X). Axiom A1 holds for any first
countable space, and we have already shown that sk(X) is first countable. Any subspace
of a first countable space is first countable, so Z satisfies A1.
Next we verify Axiom A2. We begin with the “if” statement. Let [x] ∈ Z and let
([xn ])n be some sequence in Z. Suppose we have
ωsk(X) ([xn ], [•])|Z −→ ωsk(X) ([x], [•])|Z uniformly, ωsk(X) ([•], [xn ])|Z −→ ωsk(X) ([•], [x])|Z uniformly.

Then we also have the following:
ωX (xn , •)|σ−1 (Z) −→ ωX (x, •)|σ−1 (Z) uniformly, ωX (•, xn )|σ−1 (Z) −→ ωX (•, x)|σ−1 (Z) uniformly.

Since X is coherent and σ −1 (Z) is a subnetwork, it follows by Axiom A2 that xn → x rel


σ −1 (Z).
Let V ⊆ Z be an open set rel Z containing [x]. We wish to show [xn ] → [x] rel Z, so
it suffices to show that V contains all but finitely many of the [xn ] terms. Since Z has the
subspace topology, we know that V = Z ∩ V 0 for some open set V 0 ⊆ sk(X) = σ(X).
Write U 0 := σ −1 (V 0 ). By continuity of σ, U 0 is open. Write U := σ −1 (Z) ∩ U 0 . Then U is
open rel σ −1 (Z). Since xn → x rel σ −1 (Z), all but finitely many of the xn terms belong to
U . Thus all but finitely many of the [xn ] terms belong to V . Thus [xn ] → [x] rel Z.
Now we show the “only if” statement. First we invoke the Axiom of Choice to pick a
representative from each equivalence class of X/ ∼. We denote this collection of represen-
tatives by Y and give it the subspace topology. Define τ := σ|Y . Then τ : Y → sk(X) is a
bijection given by x 7→ [x]. By the discussion following Definition 28, we know that Y is
coherent.
Let ([xn ])n be a sequence in Z converging rel Z to some [x] ∈ Z. First we show
xn → x rel Y . Let A ⊆ Y be an open set rel Y containing x. Then τ (A) is an open set rel
τ (Y ) = sk(X) containing [x] (Proposition 80). In particular, τ (A) ∩ Z is open rel Z. Thus
([xn ])n is eventually inside τ (A) ∩ Z, in particular τ (A), by the definition of convergence
rel Z. Because τ is a bijection, we have that (xn )n = (τ −1 ([xn ]))n is eventually inside A.
Thus any open set rel Y containing x also contains all but finitely many terms of (xn )n . It
follows by the definition of convergence that xn → x rel Y .
Since Y is coherent, it follows by Axiom A2 that we have ωX (xn , •)|Y −→ ωX (x, •)|Y uniformly and ωX (•, xn )|Y −→ ωX (•, x)|Y uniformly. By the definition of ∼, we then have:

ωsk(X) ([xn ], [•]) = ωX (xn , •)|Y −→ ωX (x, •)|Y = ωsk(X) ([x], [•]) uniformly.

Similarly we have ωsk(X) ([•], [xn ]) −→ ωsk(X) ([•], [x]) uniformly. This shows the “only if” statement.
This verifies Axiom A2 for Z. Since Z ⊆ sk(X) was arbitrary, this concludes the
proof.

Proposition 82. Let (X, ωX ) be a compact network with a coherent topology. Then its
skeleton (sk(X), ωsk(X) ) is Hausdorff.

Proof of Proposition 82. Let [x] 6= [x0 ] ∈ sk(X). By first countability, we take a count-
able open neighborhood base {Un : n ∈ N} of [x] such that U1 ⊇ U2 ⊇ U3 . . . (if neces-
sary, we replace Un by ∩ni=1 Ui ). Similarly, we take a countable open neighborhood base
{Vn : n ∈ N} of [x0 ] such that V1 ⊇ V2 ⊇ V3 . . .. To show that sk(X) is Hausdorff, it
suffices to show that there exists n ∈ N such that Un ∩ Vn = ∅.

Towards a contradiction, suppose Un ∩ Vn 6= ∅ for each n ∈ N. For each n ∈ N,
let [yn ] ∈ Un ∩ Vn . Any open set containing [x] contains UN for some N ∈ N, and thus
contains [yn ] for all n ≥ N . Thus [yn ] → [x] rel sk(X). Similarly, [yn ] → [x0 ] rel sk(X).
Because sk(X) has a coherent topology (Proposition 81) and thus satisfies Axiom A2, we
then have:

ωsk(X) ([x0 ], [•]) = unif limn ωsk(X) ([yn ], [•]) = ωsk(X) ([x], [•]),
ωsk(X) ([•], [x0 ]) = unif limn ωsk(X) ([•], [yn ]) = ωsk(X) ([•], [x]).

But then x ∼ x0 and so [x] = [x0 ], a contradiction.

We are now ready to prove that skeletons are terminal, in the sense of Definition 32
(also recall Definitions 30 and 31).

Theorem 83 (Skeletons are terminal). Let (X, ωX ) ∈ CN be such that the topology on X
is coherent. Then (sk(X), ωsk(X) ) ∈ CN is terminal in p(X).

Proof of Theorem 83. Let Y ∈ p(X). Let f : X → Y be a weight preserving surjection.


We first prove that there exists a weight preserving surjection g : Y → sk(X).
Since f is surjective, for each y ∈ Y we can write y = f (xy ) for some xy ∈ X. Then
define g : Y → sk(X) by g(y) := [xy ].
To see that g is surjective, let [x] ∈ sk(X). Write y = f (x). Then there exists xy ∈ X
such that f (xy ) = y and g(y) = [xy ]. Since f preserves equivalence classes (Lemma 131)
and f (xy ) = f (x), we have x ∼ xy . Thus [xy ] = [x], and so g(y) = [x].
To see that g preserves weights, let y, y 0 ∈ Y . Then,

ωY (y, y 0 ) = ωY (f (xy ), f (xy0 )) = ωX (xy , xy0 ) = ωsk(X) ([xy ], [xy0 ]) = ωsk(X) (g(y), g(y 0 )).

This proves that the skeleton satisfies the first condition for being terminal.
Next suppose g : Y → sk(X) and h : Y → sk(X) are two weight preserving surjec-
tions. We wish to show h = ψ ◦ g for some ψ ∈ Aut(sk(X)).
For each [x] ∈ sk(X), we use the surjectivity of g to pick yx ∈ Y such that g(yx ) = [x].
Then we define ψ : sk(X) → sk(X) by ψ([x]) = ψ(g(yx )) := h(yx ).
To see that ψ is surjective, let [x] ∈ sk(X). Since h is surjective, there exists yx0 ∈ Y
such that h(yx0 ) = [x]. Write [u] = g(yx0 ). We have already chosen yu such that g(yu ) = [u].
Since g preserves equivalence classes (Lemma 131), it follows that yx0 ∼ yu . Then,

ψ([u]) = ψ(g(yu )) = h(yu ) = h(yx0 ) = [x],

where the second-to-last equality holds because h preserves equivalence classes (Lemma
131).
To see that ψ is injective, let [x], [x0 ] ∈ sk(X) be such that ψ([x]) = h(yx ) = h(yx0 ) =
ψ([x0 ]). Since h preserves equivalence classes (Lemma 131), we have yx ∼ yx0 . Next,

g(yx ) = [x] and g(yx0 ) = [x0 ] by the choices we made earlier. Since yx ∼ yx0 and g
preserves equivalence classes (Lemma 131), we have g(yx ) ∼ g(yx0 ). Thus [x] = [x0 ].
Next we wish to show that ψ preserves weights. Let [x], [x0 ] ∈ sk(X). Then,

ωsk(X) (ψ([x]), ψ([x0 ])) = ωsk(X) (h(yx ), h(yx0 )) = ωY (yx , yx0 ) = ωsk(X) (g(yx ), g(yx0 ))
= ωsk(X) ([x], [x0 ]).

Thus ψ is a bijective, weight preserving automorphism of sk(X). Finally we wish to


show that h = ψ ◦ g. Let y ∈ Y , and write g(y) = [x] for some x ∈ X. Since g preserves
equivalence classes (Lemma 131), we have y ∼ yx , where g(yx ) = [x]. Then,

ψ(g(y)) = ψ([x]) = ψ(g(yx )) = h(yx ) = h(y),

where the last equality holds because h preserves equivalence classes (Lemma 131). Thus
for each y ∈ Y , we have h(y) = ψ(g(y)). This shows that the skeleton satisfies the second
condition for being terminal. We conclude the proof.

2.4.2 Reconstruction via motifs and skeletons


Our goal in this section is to prove that weak isomorphism, equality of motif sets,
and strong isomorphism between skeleta are equivalent in the setting of compact networks
with coherent topologies. However, we need to preface this theorem by proving some
preparatory results.
Proposition 132. Let (X, ωX ), (Y, ωY ) be compact networks such that Mn (X) = Mn (Y )
for all n ∈ N. Suppose X contains a countable subset SX . Then there exists a weight-
preserving map f : SX → Y .
Proof of Proposition 132. We proceed via a diagonal argument. Write SX = {x1 , x2 , . . . , xn , . . .}.
For each n ∈ N, let fn : SX → Y be a map that preserves weights on {x1 , . . . , xn }. Such
a map exists by the assumption that Mn (X) = Mn (Y ).
Since Y is first countable and compact, hence sequentially compact, the sequence
(fn (x1 ))n has a convergent subsequence; we write this as (f1,n (x1 ))n . Since fk is weight-
preserving on {x1 , x2 } for k ≥ 2, we know that f1,n is weight-preserving on {x1 , x2 } for
n ≥ 2. Using sequential compactness again, we have that (f1,n (x2 ))n has a convergent
subsequence (f2,n (x2 ))n . This sequence converges at both x1 and x2 , and f2,n is weight-
preserving on {x1 , x2 } for n ≥ 2. Proceeding in this way, we obtain the diagonal sequence
(fn,n )n which converges pointwise on SX . Furthermore, for any n ∈ N, fk,k is weight-
preserving on {x1 , . . . , xn } for k ≥ n.
Next define f : SX → Y by setting f (x) := limn fn,n (x) for each x ∈ SX . It remains
to show that f is weight-preserving. Let xn , xm ∈ SX , and let k ≥ max(m, n). Then
ωX (xn , xm ) = ωY (fk,k (xn ), fk,k (xm )). Using (sequential) continuity of ωY , we then have:

ωY (f (xn ), f (xm )) = ωY (limk fk,k (xn ), limk fk,k (xm )) = limk ωY (fk,k (xn ), fk,k (xm )) = ωX (xn , xm ).

In the second equality above, we used the fact that a sequence converges in the product
topology iff the components converge. Since xn , xm ∈ SX were arbitrary, this concludes
the proof.

Proposition 133. Let (X, ωX ), (Y, ωY ) be compact networks. Suppose f : SX → Y is a


weight-preserving function defined on a countable dense subset SX ⊆ X. Then f extends
to a weight-preserving map on X.

Proof of Proposition 133. Let x ∈ X \ SX . By first countability, we take a countable


neighborhood base {Un : n ∈ N} of x such that U1 ⊇ U2 ⊇ U3 . . . (if necessary, we replace
Un by ∩ni=1 Ui ). For each n ∈ N, let xn ∈ Un ∩ SX . Then xn → x. To see this, let U be any
open set containing x. Then Un ⊆ U for some n ∈ N, and so xk ∈ Un ⊆ U for all k ≥ n.
Because Y is compact and first countable, hence sequentially compact, the sequence
(f (xn ))n has a convergent subsequence; let y be its limit. Define f (x) = y. Extend f to all
of X this way.
We need to verify that f is weight-preserving. Let x, x0 ∈ X. Invoking the definition
of f , let (xn )n , (x0n )n be sequences in SX converging to x, x0 such that f (xn ) → f (x) and
f (x0n ) → f (x0 ). By sequential continuity and the standard result that a sequence converges
in the product topology iff the components converge, we have

limn ωY (f (xn ), f (x0n )) = ωY (f (x), f (x0 )); limn ωX (xn , x0n ) = ωX (x, x0 ).

Let ε > 0. By the previous observation, fix N ∈ N such that for all n ≥ N , we have
|ωY (f (xn ), f (x0n )) − ωY (f (x), f (x0 ))| < ε and |ωX (xn , x0n ) − ωX (x, x0 )| < ε. Then,

|ωX (x, x0 ) − ωY (f (x), f (x0 ))| = |ωX (x, x0 ) − ωX (xn , x0n ) + ωX (xn , x0n ) − ωY (f (x), f (x0 ))|
≤ |ωX (x, x0 ) − ωX (xn , x0n )| + |ωY (f (xn ), f (x0n )) − ωY (f (x), f (x0 ))| < 2ε,

where we used that ωX (xn , x0n ) = ωY (f (xn ), f (x0n )), since f preserves weights on SX .

Thus ωX (x, x0 ) = ωY (f (x), f (x0 )). Since x, x0 ∈ X were arbitrary, this concludes the
proof.
The next result generalizes the result that an isometric embedding of a compact metric
space into itself is automatically surjective [17, Theorem 1.6.14]. However, before present-
ing the theorem we first discuss an auxiliary construction that is used in its proof.

Definition 44 (The canonical pseudometric of a network). Let (X, ωX ) be any network.


For any subset A ⊆ X, define ΓA : X × X → R+ by

ΓA (x, x0 ) := max( supa∈A |ωX (x, a) − ωX (x0 , a)|, supa∈A |ωX (a, x) − ωX (a, x0 )| ).

Then ΓA satisfies symmetry, triangle inequality, and ΓA (x, x) = 0 for all x ∈ X. Thus ΓA
is a pseudometric on X. Moreover, ΓA is a bona fide metric on sk(A). The construction is
“canonical” because it does not rely on any coupling between the topology of X and ωX :
even the continuity of ωX is not necessary for this construction.

Next, for any E ⊆ X and any y ∈ X, define ΓA (y, E) := inf y0 ∈E ΓA (y, y 0 ). Then
ΓA (•, E) behaves as a proxy for the “distance to a set” function, where the set is fixed to
be E.
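In the finite setting, both ΓA and ΓA (•, E) can be evaluated directly from the definition. The sketch below is illustrative only (plain Python; the function names are hypothetical, not from the thesis):

```python
# Sketch (illustrative): the canonical pseudometric Gamma_A of a finite
# network with weight matrix w, and the induced "distance to a set".

def gamma(w, A, x, xp):
    """Gamma_A(x, x'): worst out-/in-weight discrepancy witnessed by A."""
    out_diff = max(abs(w[x][a] - w[xp][a]) for a in A)
    in_diff = max(abs(w[a][x] - w[a][xp]) for a in A)
    return max(out_diff, in_diff)

def gamma_to_set(w, A, y, E):
    """Gamma_A(y, E) = inf over y' in E of Gamma_A(y, y')."""
    return min(gamma(w, A, y, yp) for yp in E)

w = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]  # a small 3-node example network
A = [0, 1, 2]
```

On small examples one can verify directly that `gamma` is symmetric, vanishes on the diagonal, and satisfies the triangle inequality, as asserted in Definition 44.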

Theorem 134. Let (X, ωX ) be a compact network with a coherent, Hausdorff topology.
Suppose f : X → X is a weight-preserving map. Then f is surjective.

Proof of Theorem 134. Towards a contradiction, suppose f (X) 6= X. By Proposition 76,


f is continuous. Define X0 := X, and Xn := f (Xn−1 ) for each n ∈ N. The continuous
image of a compact space is compact, and compact subspaces of a Hausdorff space are
closed. Thus we obtain a decreasing sequence of nonempty compact sets X0 ⊇ X1 ⊇
X2 ⊇ . . .. Then Z := ∩n∈N Xn is nonempty and compact, hence closed.
We now break the proof up into several claims.
Claim 6. f (Z) = Z.
To see this, first note that f (∩n∈N Xn ) ⊆ ∩n∈N f (Xn ) ⊆ Z. Next let v ∈ Z. For each
n ∈ N, let un ∈ Xn be such that f (un ) = v. Since singletons in a Hausdorff space are
closed, we know that {v} is closed. By continuity, it follows that f −1 ({v}) is closed.
By sequential compactness, the sequence (un )n has a convergent subsequence that con-
verges to some limit u. Since each un ∈ f −1 ({v}) and a closed set contains its limit points,
we then have u ∈ f −1 ({v}). Thus f (u) = v, and v ∈ f (Z). Hence Z = f (Z). This proves
the claim.
Let x ∈ X0 \ X1 . Define x0 := x, and for each n ∈ N, define xn := f (xn−1 ).
Then (xn )n is a sequence in the sequentially compact space X, and so it has a convergent
subsequence (xnk )k . Let z be the limit of this subsequence.
Claim 7. z ∈ Z.
To see this, suppose towards a contradiction that z 6∈ Z. Then there exists N ∈ N such
that z 6∈ XN . Since XN is closed, we have that X \ XN is open. By the definition of
convergence, X \ XN contains all but finitely many terms of the sequence (xnk )k . But each
xnk belongs to Xnk , which is a subset of XN for sufficiently large k. Thus infinitely many
terms of the sequence (xnk )k belong to XN , a contradiction. Hence z ∈ Z.
Now we invoke the Γ• construction as in Definition 44.
Claim 8. For any E ⊆ X and any y ∈ E,

ΓE (y, Z) = Γf (E) (f (y), f (Z)).

To see this claim, fix y ∈ E. Let v ∈ f (Z). Then v = f (y 0 ) for some y 0 ∈ Z, and
Γf (E) (f (y), v) = ΓE (y, y 0 ). To see the latter assertion, let u ∈ f (E); then u = f (y 00 ) for
some y 00 ∈ E. Because f is weight-preserving, we then have:

|ωX (f (y), u) − ωX (v, u)| = |ωX (f (y), f (y 00 )) − ωX (f (y 0 ), f (y 00 ))| = |ωX (y, y 00 ) − ωX (y 0 , y 00 )|,
|ωX (u, f (y)) − ωX (u, v)| = |ωX (f (y 00 ), f (y)) − ωX (f (y 00 ), f (y 0 ))| = |ωX (y 00 , y) − ωX (y 00 , y 0 )|.

The preceding equalities show that for each v ∈ f (Z), there exists y 0 ∈ Z such that
Γf (E) (f (y), v) = ΓE (y, y 0 ). Conversely, for any y 0 ∈ Z, we have Γf (E) (f (y), f (y 0 )) =
ΓE (y, y 0 ). It follows that Γf (E) (f (y), f (Z)) = ΓE (y, Z).
Claim 9. ΓX (x, Z) = 0.
To see this, assume towards a contradiction that ΓX (x, Z) = ε > 0 (ΓX is nonnegative by definition). Since f (Z) = Z, we have by the preceding claim that ΓX (x, Z) =
Γf (X) (f (x), Z) = . . . = Γf n (X) (f n (x), Z) for each n ∈ N. In particular, for any k ∈ N,

ε = Γf nk (X) (f nk (x), Z) ≤ Γf nk (X) (f nk (x), z) ≤ ΓX (f nk (x), z).

Here the first inequality follows because the left hand side includes an infimum over z ∈ Z,
and the second inequality holds because the right hand side includes a supremum over a
larger set.
Since xnk → z rel X, we have by Axiom A2 that
kωX (xnk , •) − ωX (z, •)k −→ 0, kωX (•, xnk ) − ωX (•, z)k −→ 0.

Thus for large enough k, we have:

supy∈X |ωX (xnk , y) − ωX (z, y)| < ε, supy∈X |ωX (y, xnk ) − ωX (y, z)| < ε.

Thus ΓX (f nk (x), z) < ε, which is a contradiction. This proves the claim.


Recall that by assumption, x 6∈ Z. For each n ∈ N, let zn ∈ Z be such that ΓX (x, zn ) <
1/n. Then for each x0 ∈ X, we have

max (|ωX (x, x0 ) − ωX (zn , x0 )|, |ωX (x0 , x) − ωX (x0 , zn )|) < 1/n, i.e.
max (kωX (x, •) − ωX (zn , •)k, kωX (•, x) − ωX (•, zn )k) < 1/n.

Thus the sequence (zn )n converges to x, by Axiom A2. Hence any open set containing x
also contains infinitely many points of Z that are distinct from x. Thus x is a limit point of
the closed set Z, and so x ∈ Z. This is a contradiction.

Theorem 84. Suppose (X, ωX ), (Y, ωY ) are separable, compact networks with coherent
topologies. Then the following are equivalent:

1. X ∼=w Y .

2. Mn (X) = Mn (Y ) for all n ∈ N.

3. sk(X) ∼=s sk(Y ).

Proof of Theorem 84. (2) follows from (1) by the stability of motif sets (Theorem 149). (1)
follows from (3) by the triangle inequality of dN . We need to show that (2) implies (3).
First observe that sk(X), being a continuous image of the separable space X, is sep-
arable, and likewise for sk(Y ). Let SX , SY denote countable dense subsets of sk(X) and
sk(Y ). Next, because dN (X, sk(X)) = 0, an application of Theorem 149 shows that
Mn (X) = Mn (sk(X)) for each n ∈ N. The analogous result holds for sk(Y ). Thus
Mn (sk(X)) = Mn (sk(Y )) for each n ∈ N. Since X and Y have coherent topologies, so
do sk(X) and sk(Y ), by Proposition 81. By Propositions 132 and 133, there exist weight-
preserving maps ϕ : sk(X) → sk(Y ) and ψ : sk(Y ) → sk(X). Define X (1) := ψ(sk(Y ))
and Y (1) := ϕ(sk(X)). Also define ϕ1 and ψ1 to be the restrictions of ϕ and ψ to X (1) and
Y (1) , respectively. Finally define X (2) := ψ1 (Y (1) ) and Y (2) := ϕ1 (X (1) ). Then we have
the following diagram.

sk(X) ⊇ X (1) ⊇ X (2)
sk(Y ) ⊇ Y (1) ⊇ Y (2)

with ϕ, ϕ1 mapping the top row to the bottom (ϕ(sk(X)) = Y (1) , ϕ1 (X (1) ) = Y (2) ) and ψ, ψ1 the bottom row to the top (ψ(sk(Y )) = X (1) , ψ1 (Y (1) ) = X (2) ).

Now ψ ◦ ϕ is a weight-preserving map from sk(X) into itself. Furthermore, it is con-


tinuous by Proposition 76. Since sk(X) is Hausdorff (Proposition 82), an application of
Theorem 134 now shows that ψ ◦ ϕ : sk(X) → sk(X) is surjective. It follows from Def-
inition 32 that ψ ◦ ϕ is an automorphism of sk(X), hence a bijection. It follows that ϕ is
injective. The dual argument for ϕ ◦ ψ shows that ψ is also injective.
Since ψ ◦ ϕ(sk(X)) = X (2) = sk(X) and X (2) ⊆ X (1) ⊆ sk(X), we must have X (1) =
sk(X). Similarly, Y (1) = sk(Y ). Thus ϕ : sk(X) → sk(Y ) and ψ : sk(Y ) → sk(X) are weight-preserving bijections. In particular, we have sk(X) ∼=s sk(Y ). This concludes the
proof.
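For finite networks, the motif sets appearing in condition (2) can be enumerated by brute force. The sketch below assumes the finite-matrix reading of Mn (X) as the set of n × n weight matrices induced by n-tuples of points, with repetition allowed; it is illustrative only (plain Python), not code from the thesis:

```python
# Sketch (illustrative): motif sets of a finite network. M_n(X) is read
# here as the set of n x n matrices (w(x_i, x_j))_{i,j} over all n-tuples
# of nodes, with repetitions allowed.
from itertools import product

def motif_set(w, n):
    nodes = range(len(w))
    return {
        tuple(tuple(w[a][b] for b in tup) for a in tup)
        for tup in product(nodes, repeat=n)
    }
```

Consistent with the theorem, a finite network and the same network with a duplicated node (which share a skeleton) have identical motif sets in every dimension, while changing a weight changes the motif sets.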

2.5 Completeness, compactness, and geodesics

2.5.1 The completion of CN /∼=w

A very natural question regarding CN /∼=w is whether it is complete. This indeed turns out to be the case, and its proof is the content of the current section.
Lemma 135. Let X1 , . . . , Xn ∈ FN , and for each i = 1, . . . , n−1, let Ri ∈ R(Xi , Xi+1 ).
Define

R := R1 ◦ R2 ◦ · · · ◦ Rn−1
= {(x1 , xn ) ∈ X1 × Xn | ∃ (xi )n−1 i=2 such that (xi , xi+1 ) ∈ Ri for all i}.

Then dis(R) ≤ Σ_{i=1}^{n−1} dis(Ri ).

Proof. We proceed by induction on the number of correspondences, beginning with the base case of two correspondences. For convenience, write X := X1 , Y := X2 , and Z := X3 . Let (x, z), (x0 , z 0 ) ∈ R1 ◦ R2 . Let y ∈ Y be such
that (x, y) ∈ R1 and (y, z) ∈ R2 . Let y 0 ∈ Y be such that (x0 , y 0 ) ∈ R1 , (y 0 , z 0 ) ∈ R2 . Then
we have:

|ωX (x, x0 ) − ωZ (z, z 0 )| = |ωX (x, x0 ) − ωY (y, y 0 ) + ωY (y, y 0 ) − ωZ (z, z 0 )|


≤ |ωX (x, x0 ) − ωY (y, y 0 )| + |ωY (y, y 0 ) − ωZ (z, z 0 )|
≤ dis(R1 ) + dis(R2 ).

This holds for any (x, z), (x0 , z 0 ) ∈ R1 ◦ R2 , and proves the claim.
Suppose now that the result holds for compositions of N correspondences, N ∈ N. Write R0 = R1 ◦ · · · ◦ RN and R = R0 ◦ RN +1 . Since R0 is itself a correspondence, applying the base case yields:

dis(R) ≤ dis(R0 ) + dis(RN +1 )


N
X
≤ dis(Ri ) + dis(RN +1 ) by induction
i=1
N
X +1
= dis(Ri ).
i=1

This proves the lemma.
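For finite networks, Lemma 135 is easy to check numerically: encode each correspondence as a set of index pairs, compose, and compare distortions. A minimal sketch (plain Python; the example data and helper names are illustrative, not from the thesis):

```python
# Sketch (illustrative): distortion of a correspondence between finite
# networks (weight matrices), and subadditivity under composition.

def dis(R, wX, wY):
    """dis(R) = max over (x,y), (x',y') in R of |wX(x,x') - wY(y,y')|."""
    return max(abs(wX[x][xp] - wY[y][yp]) for (x, y) in R for (xp, yp) in R)

def compose(R, S):
    """For R in R(X, Y) and S in R(Y, Z), return R o S in R(X, Z)."""
    return {(x, z) for (x, y) in R for (yy, z) in S if y == yy}

wX = [[0, 1], [1, 0]]
wY = [[0, 2], [2, 0]]
wZ = [[0, 4], [4, 0]]
R1 = {(0, 0), (1, 1)}  # a correspondence between the nodes of X and Y
R2 = {(0, 0), (1, 1)}  # a correspondence between the nodes of Y and Z

# dis(R1 o R2) <= dis(R1) + dis(R2), as in the base case of the lemma
assert dis(compose(R1, R2), wX, wZ) <= dis(R1, wX, wY) + dis(R2, wY, wZ)
```

On this example the bound is attained: the composite correspondence has distortion 3 = 1 + 2.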

Theorem 88. The completion of (FN /∼=w , dN ) is (CN /∼=w , dN ).

Proof. Let ([Xi ])i∈N be a Cauchy sequence in FN /∼=w . First we wish to show this sequence converges in CN /∼=w . Note that (Xi )i∈N is a Cauchy sequence in FN , since the distance between two equivalence classes is given by the distance between any representatives. To show (Xi )i converges, it suffices to show that a subsequence of (Xi )i converges, so without loss of generality, suppose dN (Xi , Xi+1 ) < 2−i for each i. Then for each i,
there exists Ri ∈ R(Xi , Xi+1 ) such that dis(Ri ) ≤ 2−i+1 . Fix such a sequence (Ri )i∈N .
For j > i, define
Rij := Ri ◦ Ri+1 ◦ Ri+2 ◦ · · · ◦ Rj−1 .
By Lemma 135, dis(Rij ) ≤ dis(Ri ) + dis(Ri+1 ) + . . . + dis(Rj−1 ) ≤ 2−i+2 . Next define:

X := {(xj ) : (xj , xj+1 ) ∈ Rj for all j ∈ N} .

To see X 6= ∅, let x1 ∈ X1 , and use the (nonempty) correspondences to pick a sequence


(x1 , x2 , x3 , . . .). By construction, (xi ) ∈ X.
Define ωX ((xj ), (x0j )) = lim supj→∞ ωXj (xj , x0j ). We claim that ωX is bounded, and
thus is a real-valued weight function. To see this, let (xj ), (x0j ) ∈ X. Let j ∈ N. Then we

have:

|ωXj (xj , x0j )| = |ωXj (xj , x0j ) − ωXj−1 (xj−1 , x0j−1 ) + ωXj−1 (xj−1 , x0j−1 ) − . . .
− ωX1 (x1 , x01 ) + ωX1 (x1 , x01 )|
≤ |ωX1 (x1 , x01 )| + dis(R1 ) + dis(R2 ) + . . . + dis(Rj−1 )
≤ |ωX1 (x1 , x01 )| + 2

But j was arbitrary. Thus we obtain:

|ωX ((xj ), (x0j ))| = |lim supj→∞ ωXj (xj , x0j )| ≤ |ωX1 (x1 , x01 )| + 2 < ∞.

Claim 10. (X, ωX ) ∈ CN . More specifically, X is a first countable compact topological


space, and ωX is continuous with respect to the product topology on X × X.
Proof of Claim 10. We equip ∏i∈N Xi with the product topology. First note that the countable product ∏i∈N Xi of first countable spaces is first countable. Any subspace of a first countable space is first countable, so X ⊆ ∏i∈N Xi is first countable. By Tychonoff's theorem, ∏i∈N Xi is compact. So to show that X is compact, we only need to show that it is closed.
If X = ∏i∈N Xi , we would automatically know that X is compact. Suppose not, and let (xi )i∈N ∈ (∏i∈N Xi ) \ X. Then there exists N ∈ N such that (xN , xN +1 ) 6∈ RN . Define:

U := X1 × X2 × . . . × {xN } × {xN +1 } × XN +2 × . . . .

Since Xi has the discrete topology for each i ∈ N, it follows that {xN } and {xN +1 } are open. Hence U is an open neighborhood of (xi )i∈N and is disjoint from X. It follows that (∏i∈N Xi ) \ X is open, hence X is closed and thus compact.
It remains to show that ωX is continuous. We will show that preimages of open sets in R under ωX are open. Let (a, b) ⊆ R, and suppose ωX−1 [(a, b)] is nonempty (otherwise, there is nothing to show). Let (xi )i∈N , (x0i )i∈N ∈ X × X be such that

α := ωX ((xi )i , (x0i )i ) ∈ (a, b).

Write r0 := min(|α − a|, |b − α|), and define r := 12 r0 .


Let N ∈ N be such that 2−N +3 < r. Consider the following open sets:
Y
U := {x1 } × {x2 } × . . . × {xN } × XN +1 × XN +2 × . . . ⊆ Xi ,
i∈N
Y
V := {x01 } × {x02 } × ... × {x0N } × XN +1 × XN +2 × . . . ⊆ Xi .
i∈N

Next write A := X ∩ U and B := X ∩ V . Then A and B are open with respect to


the subspace topology on X. Thus A × B is open in X × X. Note that (xi )i∈N ∈ A

112
−1
and (x0i )i∈N ∈ B. We wish to show that A × B ⊆ ωX [(a, b)], so it suffices to show that
ωX (A, B) ⊆ (a, b).
Let (zi )i∈N ∈ A and (zi0 )i∈N ∈ B. Notice that zi = xi and zi0 = x0i for each i ≤ N . So
for n ≤ N , we have |ωXn (zn , zn0 ) − ωXn (xn , x0n )| = 0.
Next let n ∈ N, and note that:

|ωXN+n(zN+n, z0N+n) − ωXN+n(xN+n, x0N+n)|
= |ωXN+n(zN+n, z0N+n) − ωXN(zN, z0N) + ωXN(zN, z0N) − ωXN+n(xN+n, x0N+n)|
= |ωXN+n(zN+n, z0N+n) − ωXN(zN, z0N) + ωXN(xN, x0N) − ωXN+n(xN+n, x0N+n)|
≤ dis(RN,N+n) + dis(RN,N+n) ≤ 2^{−N+2} + 2^{−N+2} = 2^{−N+3} < r.

Here the second-to-last inequality follows from Lemma 135. The preceding calculation holds for arbitrary n ∈ N. It follows that:

lim supi→∞ ωXi(xi, x0i) − lim supi→∞ ωXi(zi, z0i) ≤ lim supi→∞ (ωXi(xi, x0i) − ωXi(zi, z0i)) < r,

and similarly lim supi→∞ ωXi(zi, z0i) − lim supi→∞ ωXi(xi, x0i) < r. Thus we have ωX((zi)i, (z0i)i) ∈ (a, b). This proves continuity of ωX.
Next we claim that Xi → X in the distance dN as i → ∞. Fix i ∈ N. We wish to construct a correspondence S ∈ R(Xi, X). Let y ∈ Xi. We write xi = y and pick x1, x2, . . . , xi−1, xi+1, . . . such that (xj, xj+1) ∈ Rj for each j ∈ N. We denote this sequence by (xj)xi=y, and note that by construction, it lies in X. Conversely, for any (xj) ∈ X, we simply pick its ith coordinate xi as a corresponding element in Xi. We define:

S := A ∪ B, where
A := {(y, (xj)xi=y) : y ∈ Xi},
B := {(xi, (xk)) : (xk) ∈ X}.

Then S ∈ R(Xi, X). We claim that dis(S) ≤ 2^{−i+2}. Let z = (y, (xk)), z0 = (y0, (x0k)) ∈ B. Let n ∈ N, n ≥ i. Then we have:

|ωXi(y, y0) − ωXn(xn, x0n)| = |ωXi(y, y0) − ωXi+1(xi+1, x0i+1) + ωXi+1(xi+1, x0i+1) − . . . + ωXn−1(xn−1, x0n−1) − ωXn(xn, x0n)|
≤ dis(Ri) + dis(Ri+1) + . . . + dis(Rn−1)
≤ 2^{−i+1} + 2^{−i} + . . . + 2^{−n+2}
≤ 2^{−i+2}.

This holds for arbitrary n ≥ i. It follows that we have:

|ωXi(y, y0) − ωX((xk), (x0k))| ≤ 2^{−i+2}.

Similar inequalities hold for z, z0 ∈ A, and for z ∈ A, z0 ∈ B. Thus dis(S) ≤ 2^{−i+2}. It follows that dN(Xi, X) ≤ 2^{−i+1}. Thus the sequence ([Xi])i converges to [X] ∈ CN/∼=w.
Finally, we need to check that (CN/∼=w, dN) is complete. Let ([Yn])n be a Cauchy sequence in CN/∼=w. For each n, let [Xn] ∈ FN/∼=w be such that dN([Xn], [Yn]) < 1/n. Let ε > 0. Then for sufficiently large m and n, we have:

dN([Xn], [Xm]) ≤ dN([Xn], [Yn]) + dN([Yn], [Ym]) + dN([Ym], [Xm]) < ε.

Thus ([Xn])n is a Cauchy sequence in FN/∼=w. By applying what we have shown above, this sequence converges to some [X] ∈ CN/∼=w. By applying the triangle inequality, we see that the sequence ([Yn])n also converges to [X]. This shows completeness, and concludes the proof.
The result of Theorem 88 can be summarized as follows:

The limit of a convergent sequence of finite networks is a compact topological space with
a continuous weight function.

Remark 136. The technique of composed correspondences used in the preceding proof
can also be used to show that the collection of isometry classes of compact metric spaces
endowed with the Gromov-Hausdorff distance is a complete metric space. Standard proofs
of this fact [98, §10] do not use correspondences, relying instead on a method of endowing
metrics on disjoint unions of spaces and then computing Hausdorff distances.
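The composition technique can also be illustrated concretely on finite networks. The following Python sketch is our own illustration (all function names are ours, not from the text): finite networks are stored as weight matrices, correspondences as sets of index pairs, and we check the subadditivity of distortion under composition (Lemma 135) that drives the proof above.

```python
def distortion(R, WX, WY):
    # dis(R) = max over (x, y), (x', y') in R of |omega_X(x, x') - omega_Y(y, y')|
    return max(abs(WX[x][xp] - WY[y][yp]) for (x, y) in R for (xp, yp) in R)

def compose(R, S):
    # R o S = {(x, z) : (x, y) in R and (y, z) in S for some y}
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

# Three two-node networks given by their weight matrices.
WX = [[0.0, 1.0], [1.0, 0.0]]
WY = [[0.0, 2.0], [2.0, 0.0]]
WZ = [[0.0, 2.5], [2.5, 0.0]]
R = {(0, 0), (1, 1)}   # correspondence between X and Y
S = {(0, 0), (1, 1)}   # correspondence between Y and Z

lhs = distortion(compose(R, S), WX, WZ)
rhs = distortion(R, WX, WY) + distortion(S, WY, WZ)
print(lhs, rhs)        # 1.5 1.5 -- dis(R o S) <= dis(R) + dis(S)
```

Here the bound happens to be attained; in general the composed distortion can be strictly smaller than the sum.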

Remark 137. In the proof of Theorem 88, note that the construction of the limit is depen-
dent upon the initial choice of optimal correspondences. However, all such limits obtained
from different choices of optimal correspondences belong to the same weak isomorphism
class.

2.5.2 Precompact families in CN/∼=w

We now prove Theorem 90. Our proof is modeled on the proof of an analogous result for compact metric spaces proposed by Gromov [65]. We use one fact proved in a different section (Proposition 145): for compact networks X, Y such that dN(X, Y) < ε, we have diam(X) ≤ diam(Y) + 2ε.
Proof of Theorem 90. Let D ≥ 0 be such that diam(X) ≤ D for each [X] ∈ F. It suffices
to prove that F is totally bounded, because Theorem 88 gives completeness, and these two
properties together imply precompactness. Let ε > 0. We need to find a finite family
G ⊆ CN/∼=w such that for every [F] ∈ F, there exists [G] ∈ G with dN(F, G) < ε. Define:

A := {A ∈ FN : card(A) ≤ N(ε/2), dN(A, F) < ε/2 for some [F] ∈ F}.

Each element of A is an n × n matrix, where 1 ≤ n ≤ N(ε/2). For each A ∈ A, there exists [F] ∈ F with dN(A, F) < ε/2, and by Proposition 145, we have diam(A) ≤ diam(F) + 2(ε/2) ≤ D + ε. Thus the matrices in A have entries in [−D − ε, D + ε]. Let N ≫ 1 be such that:

(2D + 2ε)/N < ε/4,

and write the refinement of [−D − ε, D + ε] into N pieces as:

W := {−D − ε + k(2D + 2ε)/N : 0 ≤ k ≤ N}.

Write A = ⊔_{i=1}^{N(ε/2)} Ai, where each Ai consists of the i × i matrices of A. For each i define:

Gi := {(Gpq)1≤p,q≤i : Gpq ∈ W}, the i × i matrices with entries in W.

Let G = ⊔_{i=1}^{N(ε/2)} Gi and note that this is a finite collection. Furthermore, for each Ai ∈ Ai, there exists Gi ∈ Gi such that

‖Ai − Gi‖∞ < ε/4.

Taking the diagonal correspondence between Ai and Gi, it follows that dN(Ai, Gi) < ε/2. Hence for any [F] ∈ F, there exists A ∈ A and G ∈ G such that

dN(F, G) ≤ dN(F, A) + dN(A, G) < ε/2 + ε/2 = ε.

This shows that F is totally bounded, and concludes the proof.

2.5.3 Geodesics in CN/∼=w

We now prove our results about the geodesic structures of FN/∼=w and CN/∼=w.
Proof of Theorem 92. Let [X], [Y] ∈ FN/∼=w. We will show the existence of a curve
γ : [0, 1] → FN such that γ(0) = (X, ωX ), γ(1) = (Y, ωY ), and for all s, t ∈ [0, 1],

dN (γ(s), γ(t)) = |t − s| · dN (X, Y ).

Note that this yields dN ([γ(s)], [γ(t)]) = |t − s| · dN ([X], [Y ]) for all s, t ∈ [0, 1], which is
what we need to show.
Let R ∈ R^opt(X, Y), i.e., let R be a correspondence such that dis(R) = 2dN(X, Y). For each t ∈ (0, 1) define γ(t) := (R, ωγ(t)), where

ωγ(t)((x, y), (x0, y0)) := (1 − t) · ωX(x, x0) + t · ωY(y, y0) for all (x, y), (x0, y0) ∈ R.

Also define γ(0) := (X, ωX) and γ(1) := (Y, ωY).

Claim 11. For any s, t ∈ [0, 1],

dN(γ(s), γ(t)) ≤ |t − s| · dN(X, Y).
Suppose for now that Claim 11 holds. We further claim that this implies, for all s, t ∈ [0, 1],

dN(γ(s), γ(t)) = |t − s| · dN(X, Y).

To see this, assume towards a contradiction that there exist s0 < t0 such that:

dN(γ(s0), γ(t0)) < |t0 − s0| · dN(X, Y).

Then dN(X, Y) ≤ dN(X, γ(s0)) + dN(γ(s0), γ(t0)) + dN(γ(t0), Y)
< |s0 − 0| · dN(X, Y) + |t0 − s0| · dN(X, Y) + |1 − t0| · dN(X, Y)
= dN(X, Y), a contradiction.

Thus it suffices to show Claim 11. There are three cases: (i) s, t ∈ (0, 1), (ii) s = 0, t ∈ (0, 1), and (iii) s ∈ (0, 1), t = 1. The latter two cases are similar, so we just prove (i) and (ii). For (i), fix s, t ∈ (0, 1). Notice that ∆ := diag(R × R) = {(r, r) : r ∈ R} is a correspondence in R(R, R). Then we obtain:

dis(∆) = max_{(a,a),(b,b)∈∆} |ωγ(t)(a, b) − ωγ(s)(a, b)|
= max_{(x,y),(x0,y0)∈R} |ωγ(t)((x, y), (x0, y0)) − ωγ(s)((x, y), (x0, y0))|
= max_{(x,y),(x0,y0)∈R} |(1 − t)ωX(x, x0) + t · ωY(y, y0) − (1 − s)ωX(x, x0) − s · ωY(y, y0)|
= max_{(x,y),(x0,y0)∈R} |(s − t)ωX(x, x0) − (s − t)ωY(y, y0)|
= |t − s| · max_{(x,y),(x0,y0)∈R} |ωX(x, x0) − ωY(y, y0)|
≤ 2|t − s| · dN(X, Y).

Finally dN(γ(t), γ(s)) ≤ (1/2) dis(∆) ≤ |t − s| · dN(X, Y).


For (ii), fix s = 0, t ∈ (0, 1). Define RX := {(x, (x, y)) : (x, y) ∈ R}. Then RX is a correspondence in R(X, R), and

dis(RX) = max_{(x,(x,y)),(x0,(x0,y0))∈RX} |ωX(x, x0) − (1 − t) · ωX(x, x0) − t · ωY(y, y0)|
= max_{(x,(x,y)),(x0,(x0,y0))∈RX} t · |ωX(x, x0) − ωY(y, y0)|
= t · dis(R) = 2t · dN(X, Y).

Thus dN(X, γ(t)) ≤ t · dN(X, Y). The proof for case (iii), i.e. that dN(γ(s), Y) ≤ |1 − s| · dN(X, Y), is similar. This proves Claim 11, and the result follows.

Proof of Theorem 93. Let [X], [Y ] ∈ CN /∼=w . It suffices to find a geodesic between X
and Y , because the distance between any two equivalence classes is given by the distance
between any two representatives, and hence we will obtain a geodesic between [X] and
[Y ].

Let (Xn)n, (Yn)n be sequences in FN such that dN(Xn, X) < 1/n and dN(Yn, Y) < 1/n for each n. For each n, let Rn be an optimal correspondence between Xn and Yn, endowed with the weight function

ωn((x, y), (a, b)) = (1/2) ωXn(x, a) + (1/2) ωYn(y, b).

By the proof of Theorem 92, the network (Rn, ωn) is a midpoint of Xn and Yn.


Claim 12. The collection {Rn : n ∈ N} is precompact.

Assume for now that Claim 12 is true. Then (Rn)n admits a subsequence, which we again denote by (Rn)n, converging to some R ∈ CN. Then we obtain:

dN(X, R) ≤ dN(X, Xn) + dN(Xn, Rn) + dN(Rn, R)
= dN(X, Xn) + (1/2) dN(Xn, Yn) + dN(Rn, R) → (1/2) dN(X, Y).

Similarly dN(R, Y) ≤ (1/2) dN(X, Y). Furthermore, equality holds in both inequalities, because otherwise the triangle inequality would give dN(X, Y) < dN(X, Y), a contradiction. Thus R is a midpoint of X and Y, and moreover, [R] is a midpoint of [X] and [Y]. The result now follows by an application of Theorem 91.
It remains to prove Claim 12. By Theorem 90, it suffices to show that {Rn} is uniformly approximable.
Since dN(Xn, X) → 0 and dN(Yn, Y) → 0, we can choose D > 0 large enough so that diam(Xn) ≤ D/2 and diam(Yn) ≤ D/2 for all n. Then diam(Rn) ≤ D for all n.
Let ε > 0. Fix N large enough so that 1/N < ε/2, and write N(ε) = max_{n≤N} card(Rn). We wish to show that every Rn is ε-approximable by a finite network with cardinality up to N(ε). For any n ≤ N, we know Rn approximates itself, and card(Rn) ≤ N(ε). Next let n > N. It will suffice to show that Rn is ε-approximable by RN.
Let S, T be optimal correspondences between Xn, XN and Yn, YN respectively. Note that dN(XN, Xn) ≤ dN(XN, X) + dN(X, Xn) ≤ 1/N + 1/N = 2/N, and similarly dN(YN, Yn) ≤ 2/N. Thus dis(S) ≤ 4/N and dis(T) ≤ 4/N. Next write

Q := {(x, y, x0, y0) ∈ RN × Rn : (x, x0) ∈ S, (y, y0) ∈ T}.

Observe that since S and T are correspondences, Q is a correspondence between RN and Rn. Next we calculate dis(Q):

dis(Q) = max_{(x,y,x0,y0),(a,b,a0,b0)∈Q} |ωN((x, y), (a, b)) − ωn((x0, y0), (a0, b0))|
= max_{(x,y,x0,y0),(a,b,a0,b0)∈Q} |(1/2)ωXN(x, a) + (1/2)ωYN(y, b) − (1/2)ωXn(x0, a0) − (1/2)ωYn(y0, b0)|
≤ (1/2) max_{(x,x0),(a,a0)∈S} |ωXN(x, a) − ωXn(x0, a0)| + (1/2) max_{(y,y0),(b,b0)∈T} |ωYN(y, b) − ωYn(y0, b0)|
= (1/2) dis(S) + (1/2) dis(T) ≤ 4/N.

Thus dN(RN, Rn) ≤ 2/N < ε. This shows that any Rn can be ε-approximated by a network having up to N(ε) points. Thus {Rn} is uniformly approximable, hence precompact. Thus Claim 12 and the result follow.
Remark 138 (Branching and deviant geodesics). It is important to note that there exist geodesics in CN/∼=w that deviate from the straight-line form given by Theorem 92. Even in the setting of compact metric spaces, there exist infinite families of branching and deviant geodesics [36].

2.6 Lower bounds on dN


At this point, we have computed dN between several examples of networks, as in Ex-
ample 9 and Remark 10. We also asserted in Remark 13 that dN is in general difficult to
compute. The solution we propose is to compute quantitatively stable invariants of net-
works, and compare the invariants instead of comparing the networks directly. In this sec-
tion, we restrict our attention to computing invariants of compact networks, which satisfy
the useful property that the images of the weight functions are compact.
Intuitively, the invariants that we associate to two strongly isomorphic networks should
be the same. We define an R-invariant of networks to be a map ι : CN → R such that
for any X, Y ∈ CN, if X ∼=s Y then ι(X) = ι(Y). Since R is a metric space, any R-invariant is an example of a pseudometric space valued invariant, which we define next. Recall that a pseudometric space (V, dV) satisfies all the axioms of a metric space, except that we allow dV(v, v0) = 0 even if v ≠ v0.
Definition 45. Let (V, dV ) be any metric or pseudometric space. A V -valued invariant is
any map ι : CN → V such that ι(X, ωX ) = ι(Y, ωY ) whenever X ∼ =s Y .
Recall that pow(R), the nonempty elements of the power set of R, is a pseudometric
space when endowed with the Hausdorff distance [17, Proposition 7.3.3].
In what follows, we will construct several maps and claim that they are pseudomet-
ric space valued invariants; this claim will be substantiated in Proposition 144. We will
eventually prove that our proposed invariants are quantitatively stable. This notion is made
precise in §2.6.1.
Example 139. Define the diameter map to be the map

diam : CN → R given by (X, ωX) ↦ max_{x,x0∈X} |ωX(x, x0)|.

Then diam is an R-invariant. Observe that the maximum is achieved for (X, ωX ) ∈ CN
because X (hence X × X) is compact and ωX : X × X → R is continuous.
Example 140. Define the spectrum map

spec : CN → pow(R) by (X, ωX) ↦ {ωX(x, x0) : x, x0 ∈ X}.

Figure 2.1: The trace map erases data between pairs of nodes: the two-node network (X, ωX), with weights α, β, γ, δ, is sent to (X, trX), which retains only the self-weights α and β.

The spectrum also has two local variants. Define the out-local spectrum of X by x ↦ spec^out_X(x) := {ωX(x, x0) : x0 ∈ X}. Notice that spec(X) = ⋃_{x∈X} spec^out_X(x) for any network X, thus justifying the claim that this construction localizes spec. Similarly, we define the in-local spectrum of X as the map x ↦ spec^in_X(x) := {ωX(x0, x) : x0 ∈ X}. Notice that one still has spec(X) = ⋃_{x∈X} spec^in_X(x) for any network X. Finally, we observe that the two local versions of spec do not necessarily coincide in an asymmetric network.
The spectrum is closely related to the multisets used by Boutin and Kemper [15] to pro-
duce invariants of weighted undirected graphs. For an undirected graph G, they considered
the collection of all subgraphs with three nodes, along with the edge weights for each sub-
graph (compare to our notion of spectrum). Then they proved that the distribution of edge
weights of these subgraphs is an invariant when G belongs to a certain class of graphs.
Example 141. Define the trace map tr : CN → pow(R) by (X, ωX) ↦ tr(X) := {ωX(x, x) : x ∈ X}. This also defines an associated map x ↦ trX(x) := ωX(x, x). An example is provided in Figure 2.1: in this case, we have (X, trX) = ({p, q}, (α, β)).

Example 142 (The out and in maps). Let (X, ωX) ∈ CN. Define out : CN → pow(R) and in : CN → pow(R) by

out(X) := { max_{x0∈X} |ωX(x, x0)| : x ∈ X } for all (X, ωX) ∈ CN,
in(X) := { max_{x0∈X} |ωX(x0, x)| : x ∈ X } for all (X, ωX) ∈ CN.

For each x ∈ X, max_{x0∈X} |ωX(x, x0)| and max_{x0∈X} |ωX(x0, x)| are achieved because {x} × X and X × {x} are compact. We also define the associated maps outX and inX by writing, for any (X, ωX) ∈ CN and any x ∈ X,

outX(x) := max_{x0∈X} |ωX(x, x0)|,   inX(x) := max_{x0∈X} |ωX(x0, x)|.

To see how these maps operate on a network, let X = {p, q, r} and consider the weight matrix

    ⎡1 2 3⎤
Σ = ⎢0 0 4⎥.
    ⎣0 0 5⎦

The network corresponding to this matrix is shown in Figure 2.2.

Figure 2.2: The out map applied to each node yields the greatest weight of an arrow leaving the node, and the in map returns the greatest weight entering the node.

We ascertain the following directly from the matrix:

outX(p) = 3,   inX(p) = 1
outX(q) = 4,   inX(q) = 2
outX(r) = 5,   inX(r) = 5.

So the out map returns the maximum (absolute) value in each row, and the in map pulls out
the maximum (absolute) value in each column of the weight matrix. As in the preceding
example, we may use the Hausdorff distance to compare the images of networks under the
out and in maps.
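For a finite network stored as a weight matrix, outX and inX are just row-wise and column-wise maxima of absolute values. The following Python sketch is our own illustration (function names are ours, not from the text); it reproduces the table above for Σ, with nodes ordered p, q, r.

```python
def out_map(W):
    # out_X(x) = max_{x'} |omega_X(x, x')|: maximum absolute value in each row
    return [max(abs(v) for v in row) for row in W]

def in_map(W):
    # in_X(x) = max_{x'} |omega_X(x', x)|: maximum absolute value in each column
    return [max(abs(row[j]) for row in W) for j in range(len(W))]

Sigma = [[1, 2, 3],
         [0, 0, 4],
         [0, 0, 5]]   # the weight matrix of Example 142

print(out_map(Sigma))  # [3, 4, 5]
print(in_map(Sigma))   # [1, 2, 5]
```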
Constructions similar to out and in have been used by Jon Kleinberg to study the prob-
lem of searching the World Wide Web for user-specified queries [77]. In Kleinberg’s model,
for a search query σ, hubs are pages that point to highly σ-relevant pages (compare to out• ),
and authorities are pages that are pointed to by pages that have a high σ-relevance (com-
pare to in• ). Good hubs determine good authorities, and good authorities turn out to be
good search results.

Example 143 (mout and min). Define the maps mout : CN → R and min : CN → R by

mout((X, ωX)) := min_{x∈X} outX(x) for all (X, ωX) ∈ CN,
min((X, ωX)) := min_{x∈X} inX(x) for all (X, ωX) ∈ CN.

Then both min and mout are R-invariants. We take the minimum when defining mout and min because for any network (X, ωX), we have max_{x∈X} outX(x) = max_{x∈X} inX(x) = diam(X). Also observe that the minima are achieved above because X is compact.
Proposition 144. The maps out, in, tr, spec, and spec• are pow(R)-invariants. Similarly,
diam, mout , and min are R-invariants.

Next we see that the motif sets defined in §1.6.4 are also invariants.

Definition 46 (Motif sets are metric space valued invariants). Our use of motif sets is motivated by the following observation, which appeared in [87, Section 5]. For any n ∈ N, let C(R^{n×n}) denote the set of closed subsets of R^{n×n}. Under the Hausdorff distance induced by the ℓ∞ metric on R^{n×n}, this set becomes a valid metric space [17, Proposition 7.3.3]. The motif sets defined in Definition 29 define a metric space valued invariant as follows: for each n ∈ N, let Mn : CN → C(R^{n×n}) be the map X ↦ Mn(X). We call this the motif set invariant. So for (X, ωX), (Y, ωY) ∈ CN and each n ∈ N, we let (Z, dZ) = (R^{n×n}, ℓ∞) and consider the following distance between the n-motif sets of X and Y:

dn(Mn(X), Mn(Y)) := dZH(Mn(X), Mn(Y)).

Since dH is a proper distance between closed subsets, dn(Mn(X), Mn(Y)) = 0 if and only if Mn(X) = Mn(Y).
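For a finite network, the n-motif set can be enumerated directly and the Hausdorff distance under ℓ∞ computed by brute force. The following sketch is our own illustration (function names are ours; the enumeration has card(X)^n terms, so this is feasible only for tiny examples). It checks the one-node networks N1(1) and N1(2), which reappear as the tightness example in Theorem 149 below.

```python
import itertools

def motif_set(W, n):
    # M_n(X): all n x n matrices (omega(x_i, x_j)) over n-tuples of nodes,
    # flattened row-major into tuples so they can be stored in a set
    pts = range(len(W))
    return {tuple(W[i][j] for i in t for j in t)
            for t in itertools.product(pts, repeat=n)}

def linf(a, b):
    # l-infinity distance between two flattened matrices
    return max(abs(u - v) for u, v in zip(a, b))

def hausdorff(A, B):
    # Hausdorff distance between finite sets of matrices under l-infinity
    return max(max(min(linf(a, b) for b in B) for a in A),
               max(min(linf(a, b) for a in A) for b in B))

X = [[1]]   # the one-node network N1(1)
Y = [[2]]   # the one-node network N1(2)
d2 = hausdorff(motif_set(X, 2), motif_set(Y, 2))
print(d2)   # 1 -- consistent with d_N(X, Y) = 1/2 and L(M_n) = 2
```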

2.6.1 Quantitative stability of invariants of networks


Let (V, dV) be a given pseudometric space. The V-valued invariant ι : CN → V is said to be quantitatively stable if there exists a constant L > 0 such that

dV(ι(X), ι(Y)) ≤ L · dN(X, Y)

for all networks X and Y. The least constant L such that the above holds for all X, Y ∈ CN is the Lipschitz constant of ι and is denoted L(ι).
Note that by identifying a non-constant quantitatively stable V-valued invariant ι, we immediately obtain a lower bound for the dN distance between any two compact networks (X, ωX) and (Y, ωY). Furthermore, given a finite family ια : CN → V, α ∈ A, of non-constant quantitatively stable invariants, we may obtain the following lower bound for the distance between compact networks X and Y:

dN(X, Y) ≥ max_{α∈A} L(ια)^{−1} · dV(ια(X), ια(Y)).

It is often the case that computing dV (ι(X), ι(Y )) is substantially simpler than comput-
ing the dN distance between X and Y (which leads to a possibly NP-hard problem). The
invariants described in the previous section are quantitatively stable.
Proposition 145. The invariants diam, tr, out, in, mout , and min are quantitatively stable,
with Lipschitz constant L = 2.
Example 146. Proposition 145 provides simple lower bounds for the dN distance between compact networks. One application is the following: for all networks X and Y, we have dN(X, Y) ≥ (1/2)|diam(X) − diam(Y)|. For example, consider the weight matrices

     ⎡0 5 2⎤           ⎡3 4 2⎤
Σ := ⎢3 1 4⎥  and Σ0 := ⎢3 1 5⎥.
     ⎣1 4 3⎦           ⎣3 3 4⎦
Figure 2.3: Lower-bounding dN by using global spectra (cf. Example 148).

Let X = N3 (Σ) and Y = N3 (Σ0 ). By comparing the diagonals, we can easily see that
X ∼6=s Y , but let us see how the invariants we proposed can help. Note that diam(X) =
diam(Y ) = 5, so the lower bound provided by diameter ( 12 |5 − 5| = 0) does not help
in telling the networks apart. However, tr(X) = {0, 1, 3} and tr(Y ) = {3, 1, 4}, and
Proposition 145 then yields
1 1
dN (X, Y ) ≥ dRH ({0, 1, 3}, {1, 3, 4}) = .
2 2
Consider now the out and in maps. Note that one has out(X) = {5, 4}, out(Y ) =
{4, 5}, in(X) = {3, 5, 4}, and in(Y ) = {3, 4, 5}. Then dRH (out(X), out(Y )) = 0, and
dRH (in(X), in(Y )) = 0. Thus in both cases, we obtain dN (X, Y ) ≥ 0. So in this particular
example, the out and in maps are not useful for obtaining a lower bound to dN (X, Y ) via
Proposition 145.

Now we state a proposition regarding the stability of global and local spectrum invari-
ants. These will be of particular interest for computational purposes as we explain in §4.2.

Proposition 147. Let spec• refer to either the out or in version of the local spectrum. Then, for all (X, ωX), (Y, ωY) ∈ CN we have

dN(X, Y) ≥ (1/2) inf_{R∈R} sup_{(x,y)∈R} dRH(spec•X(x), spec•Y(y))
≥ (1/2) dRH(spec(X), spec(Y)).

As a corollary, we get L(spec•) = L(spec) = 2.

Example 148 (An application of Proposition 147). Consider the networks in Figure 2.3.
By Proposition 147, we may calculate a lower bound for dN (X, Y ) by simply computing
the Hausdorff distance between spec(X) and spec(Y ), and dividing by 2. In this exam-
ple, spec(X) = {1, 2} and spec(Y ) = {1, 2, 3}. Thus dRH (spec(X), spec(Y )) = 1, and
dN(X, Y) ≥ 1/2.

Computing the lower bound involving local spectra requires solving a bottleneck linear
assignment problem over the set of all correspondences between X and Y . This can be
solved in polynomial time; details are provided in §4.2. The second lower bound stipu-
lates computing the Hausdorff distance on R between the (global) spectra of X and Y – a
computation which can be carried out in (smaller) polynomial time as well.
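The global-spectrum bound, in particular, is immediate to evaluate on finite networks. The sketch below is our own illustration; the weight matrices are chosen only so that spec(X) = {1, 2} and spec(Y) = {1, 2, 3}, matching Example 148 (they are not claimed to be the exact weights of Figure 2.3).

```python
def spec(W):
    # Global spectrum: the set of all weight values appearing in the network
    return {v for row in W for v in row}

def hausdorff_R(A, B):
    # Hausdorff distance between finite subsets of the real line
    return max(max(min(abs(a - b) for b in B) for a in A),
               max(min(abs(a - b) for a in A) for b in B))

X = [[1, 2], [2, 1]]   # spec(X) = {1, 2}
Y = [[1, 2], [3, 1]]   # spec(Y) = {1, 2, 3}

# Proposition 147: d_N(X, Y) >= (1/2) * d_H(spec(X), spec(Y))
print(hausdorff_R(spec(X), spec(Y)) / 2)   # 0.5
```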
To conclude this section, we state a theorem asserting that motif sets form a family of
quantitatively stable invariants.
Theorem 149. For each n ∈ N, Mn is a stable invariant with L(Mn ) = 2.

2.6.2 Proofs involving lower bounds for dN


Proof of Proposition 144. All of these cases are easy to check, so we will just record the proof for spec. Suppose (X, ωX) and (Y, ωY) are strongly isomorphic via ϕ. Let x ∈ X and let α ∈ specX(x). Then there exists x0 ∈ X such that α = ωX(x, x0). But since ωX(x, x0) = ωY(ϕ(x), ϕ(x0)), we also have α ∈ specY(ϕ(x)). Thus specX(x) ⊆ specY(ϕ(x)). The reverse containment is similar. Thus for any x ∈ X, specX(x) = specY(ϕ(x)). Since spec(X) = ⋃_{x∈X} specX(x), it follows that spec(X) = spec(Y).
Lemma 150. Let (X, ωX), (Y, ωY) ∈ CN. Let f represent any of the maps tr, out, and in, and let fX (resp. fY) represent the corresponding map trX, outX, inX (resp. trY, outY, inY). Then we obtain:

dRH(f(X), f(Y)) = inf_{R∈R(X,Y)} sup_{(x,y)∈R} |fX(x) − fY(y)|.

Proof of Lemma 150. Observe that f (X) = {fX (x) : x ∈ X} = fX (X), so we need to
show
dRH (fX (X), fY (Y )) = inf sup fX (x) − fY (y) .
R∈R(X,Y ) (x,y)∈R

Recall that by the definition of Hausdorff distance on R, we have



dRH (fX (X), fY (Y )) = max sup inf |fX (x) − fY (y)|, sup inf |fX (x) − fY (y)| .
x∈X y∈Y y∈Y x∈X

Let a ∈ X and let R ∈ R(X, Y). Then there exists b ∈ Y such that (a, b) ∈ R. Then we have:

|fX(a) − fY(b)| ≤ sup_{(x,y)∈R} |fX(x) − fY(y)|, and so
inf_{b∈Y} |fX(a) − fY(b)| ≤ sup_{(x,y)∈R} |fX(x) − fY(y)|.

This holds for all a ∈ X. Then,

sup_{a∈X} inf_{b∈Y} |fX(a) − fY(b)| ≤ sup_{(x,y)∈R} |fX(x) − fY(y)|.

This holds for all R ∈ R(X, Y). So we have

sup_{a∈X} inf_{b∈Y} |fX(a) − fY(b)| ≤ inf_{R∈R} sup_{(x,y)∈R} |fX(x) − fY(y)|.

By a similar argument, we also have

sup_{b∈Y} inf_{a∈X} |fX(a) − fY(b)| ≤ inf_{R∈R} sup_{(x,y)∈R} |fX(x) − fY(y)|.

Thus dRH(fX(X), fY(Y)) ≤ inf_{R∈R} sup_{(x,y)∈R} |fX(x) − fY(y)|.

Now we show the reverse inequality. Let x ∈ X, and let η > dRH(fX(X), fY(Y)). Then there exists y ∈ Y such that |fX(x) − fY(y)| < η. Define ϕ(x) = y, and extend ϕ to all of X in this way. Let y ∈ Y. Then there exists x ∈ X such that |fX(x) − fY(y)| < η. Define ψ(y) = x, and extend ψ to all of Y in this way. Let R = {(x, ϕ(x)) : x ∈ X} ∪ {(ψ(y), y) : y ∈ Y}. Then R is a correspondence, and for each (a, b) ∈ R we have |fX(a) − fY(b)| < η. Thus we have inf_{R∈R} sup_{(x,y)∈R} |fX(x) − fY(y)| ≤ η. Since η > dRH(fX(X), fY(Y)) was arbitrary, it follows that

inf_{R∈R(X,Y)} sup_{(x,y)∈R} |fX(x) − fY(y)| ≤ dRH(fX(X), fY(Y)).

Proof of Proposition 145. Let η > dN(X, Y). We break this proof into three parts.
The diam case. Recall that diam is an R-valued invariant, so we wish to show |diam(X) − diam(Y)| ≤ 2dN(X, Y). Let R ∈ R(X, Y) be such that for any (a, b), (a0, b0) ∈ R, we have |ωX(a, a0) − ωY(b, b0)| < 2η.
Let x, x0 ∈ X be such that |ωX(x, x0)| = diam(X), and let y, y0 be such that (x, y), (x0, y0) ∈ R. Then we have:

|ωX(x, x0) − ωY(y, y0)| < 2η
|ωX(x, x0)| ≤ |ωX(x, x0) − ωY(y, y0)| + |ωY(y, y0)| < 2η + diam(Y).

Thus diam(X) < diam(Y) + 2η. Similarly, we get diam(Y) < diam(X) + 2η. It follows that |diam(X) − diam(Y)| < 2η. Since η > dN(X, Y) was arbitrary, it follows that:

|diam(X) − diam(Y)| ≤ 2dN(X, Y).

For tightness, consider the networks X = N1(1) and Y = N1(2). By direct computation, we have dN(X, Y) = 1/2. On the other hand, diam(X) = 1 and diam(Y) = 2, so that |diam(X) − diam(Y)| = 1 = 2dN(X, Y).
The cases tr, out, and in. First we show L(tr) = 2. By Lemma 150, it suffices to show

inf_{R∈R(X,Y)} sup_{(x,y)∈R} |trX(x) − trY(y)| < 2η.

Let R ∈ R(X, Y) be such that for any (a, b), (a0, b0) ∈ R, we have |ωX(a, a0) − ωY(b, b0)| < 2η. Then we obtain |ωX(a, a) − ωY(b, b)| < 2η. Thus |trX(a) − trY(b)| < 2η. Since (a, b) ∈ R was arbitrary, it follows that sup_{(a,b)∈R} |trX(a) − trY(b)| < 2η. It follows that inf_{R∈R} sup_{(a,b)∈R} |trX(a) − trY(b)| < 2η. The result now follows because η > dN(X, Y) was arbitrary. The proofs for out and in are similar, so we just show the former. By Lemma 150, it suffices to show

inf_{R∈R(X,Y)} sup_{(x,y)∈R} |outX(x) − outY(y)| < 2η.

Recall that outX(x) = max_{x0∈X} |ωX(x, x0)|. Let R ∈ R(X, Y) be such that |ωX(x, x0) − ωY(y, y0)| < 2η for any (x, y), (x0, y0) ∈ R. By the triangle inequality, it follows that |ωX(x, x0)| < |ωY(y, y0)| + 2η. In particular, for (x0, y0) ∈ R such that |ωX(x, x0)| = outX(x), we have outX(x) < |ωY(y, y0)| + 2η. Hence outX(x) < outY(y) + 2η. Similarly, outY(y) < outX(x) + 2η. Thus we have |outX(x) − outY(y)| < 2η. This holds for all (x, y) ∈ R, so we have:

sup_{(x,y)∈R} |outX(x) − outY(y)| < 2η.

Minimizing over all correspondences, we get:

inf_{R∈R} sup_{(a,b)∈R} |outX(a) − outY(b)| < 2η.

The result follows because η > dN(X, Y) was arbitrary.


Finally, we need to show that our bounds for the Lipschitz constant are tight. Let X = N1(1) and let Y = N1(2). Then dN(X, Y) = 1/2. We also have dRH(tr(X), tr(Y)) = |1 − 2| = 1, and similarly dRH(out(X), out(Y)) = dRH(in(X), in(Y)) = 1.

The cases mout and min . The two cases are similar, so let’s just prove L(mout ) = 2. Since
mout is an R-invariant, we wish to show | mout (X) − mout (Y )| < 2η. It suffices to show:

| mout (X) − mout (Y )| ≤ dRH (out(X), out(Y )),

because we have already shown

dRH (out(X), out(Y )) = inf sup | outX (x) − outY (y)| < 2η.
R∈R(X,Y ) (x,y)∈R

Here we have used Lemma 150 for the first equality above.
Let ε > dRH (out(X), out(Y )). Then for any x ∈ X, there exists y ∈ Y such that:

| outX (x) − outY (y)| < ε.

Let a ∈ X be such that mout (X) = outX (a). Then we have:

| outX (a) − outY (y)| < ε,

for some y ∈ Y . In particular, we have:

mout (Y ) ≤ outY (y) < ε + outX (a) = ε + mout (X).

Similarly, we obtain:
mout (X) < ε + mout (Y ).
Thus we have | mout (X)−mout (Y )| < ε. Since ε > dRH (out(X), out(Y )) was arbitrary,
we have:
| mout (X) − mout (Y )| ≤ dRH (out(X), out(Y )).
The inequality now follows by Lemma 150 and our proof in the case of the out map.
For tightness, note that |mout(N1(1)) − mout(N1(2))| = |1 − 2| = 1 = 2 · (1/2) = 2dN(N1(1), N1(2)). The same example works for the min case.
Proof of Proposition 147. (First inequality.) Let X, Y ∈ CN and let η > dN (X, Y ). Let
R ∈ R(X, Y ) be such that sup(x,y),(x0 ,y0 )∈R |ωX (x, x0 ) − ωY (y, y 0 )| < 2η. Let (x, y) ∈ R,
and let α ∈ specX (x). Then there exists x0 ∈ X such that ωX (x, x0 ) = α. Let y 0 ∈ Y be
such that (x0 , y 0 ) ∈ R. Let β = ωY (y, y 0 ). Note β ∈ specY (y). Also note that |α − β| < 2η.
By a symmetric argument, for each β ∈ specY (y), there exists α ∈ specX (x) such that
|α − β| < 2η. So dRH (specX (x), specY (y)) < 2η. This is true for any (x, y) ∈ R, and so
we have sup(x,y)∈R dRH (specX (x), specY (y)) ≤ 2η. Then we have:

inf_{R∈R} sup_{(x,y)∈R} dRH(specX(x), specY(y)) ≤ 2η.

Since η > dN (X, Y ) was arbitrary, the first inequality follows.


(Second inequality.) Let R ∈ R(X, Y ). Let η(R) = sup(x,y)∈R dRH (specX (x), specY (y)).
Let α ∈ spec(X). Then α ∈ specX (x) for some x ∈ X. Let y ∈ Y such that (x, y) ∈ R.
Then there exists β ∈ specY (y) such that |α − β| ≤ dRH (specX (x), specY (y)), and in par-
ticular, |α − β| ≤ η(R). In other words, for each α ∈ spec(X), there exists β ∈ spec(Y )
such that |α − β| ≤ η(R). By a symmetric argument, for each β ∈ spec(Y ), there exists
α ∈ spec(X) such that |α − β| ≤ η(R). Thus dRH (spec(X), spec(Y )) ≤ η(R). This holds
for any R ∈ R. Thus we have

dRH(spec(X), spec(Y)) ≤ inf_{R∈R} sup_{(x,y)∈R} dRH(specX(x), specY(y)).

This proves the second inequality.


Proof of Theorem 149. Let n ∈ N. We wish to show dn(Mn(X), Mn(Y)) ≤ 2dN(X, Y). Let R ∈ R(X, Y). Let (xi) ∈ X^n, and let (yi) ∈ Y^n be such that for each i, we have (xi, yi) ∈ R. Then for all i, j ∈ {1, . . . , n}, |ωX(xi, xj) − ωY(yi, yj)| ≤ dis(R). Thus inf_{(yi)∈Y^n} max_{i,j} |ωX(xi, xj) − ωY(yi, yj)| ≤ dis(R). This is true for any (xi) ∈ X^n. Thus we get:

sup_{(xi)∈X^n} inf_{(yi)∈Y^n} max_{i,j} |ωX(xi, xj) − ωY(yi, yj)| ≤ dis(R).
By a symmetric argument, we get sup_{(yi)∈Y^n} inf_{(xi)∈X^n} max_{i,j} |ωX(xi, xj) − ωY(yi, yj)| ≤ dis(R). Thus dn(Mn(X), Mn(Y)) ≤ dis(R). This holds for any R ∈ R(X, Y). Thus we have

dn(Mn(X), Mn(Y)) ≤ inf_{R∈R(X,Y)} dis(R) = 2dN(X, Y).

For tightness, let X = N1(1) and let Y = N1(2). Then dN(X, Y) = 1/2, so we wish to show dn(Mn(X), Mn(Y)) = 1 for each n ∈ N. Let n ∈ N, and let 1n×n denote the n × n matrix with 1 in each entry. Then Mn(X) = {1n×n} and Mn(Y) = {2 · 1n×n}. Thus dn(Mn(X), Mn(Y)) = 1. Since n was arbitrary, we conclude that equality holds for each n ∈ N.

2.7 Proofs from §1.9


Proof of Lemma 106. First suppose p ∈ [1, ∞). We will construct a sequence of continuous functionals that converges uniformly to disp. Since the uniform limit of continuous functions is continuous, this will show that disp is continuous.
Continuous functions are dense in Lp(λI⊗2) (see e.g. [14]), so for each n ∈ N, we may pick continuous functions σ^n_X, σ^n_Y such that

‖σX − σ^n_X‖_{Lp(λI⊗2)} ≤ 1/n,   ‖σY − σ^n_Y‖_{Lp(λI⊗2)} ≤ 1/n.

For each n ∈ N, define the functional dis^n_p : C(λI, λI) → R+ as follows:

dis^n_p(ν) := ( ∫_{I×I} ∫_{I×I} |σ^n_X(i, i0) − σ^n_Y(j, j0)|^p dν(i, j) dν(i0, j0) )^{1/p}.

Because |σ^n_X − σ^n_Y|^p is continuous and hence bounded on the compact cube I^2 × I^2, we know that |σ^n_X − σ^n_Y|^p ∈ Cb(I^2 × I^2).
We claim that dis^n_p is continuous. Since the narrow topology on Prob(I × I) is induced by a distance [5, Remark 5.1.1], it suffices to show sequential continuity. Let ν ∈ C(λI, λI), and let (νm)m∈N be a sequence in C(λI, λI) converging narrowly to ν. Then we have

lim_{m→∞} dis^n_p(νm) = lim_{m→∞} ( ∫_{I^2×I^2} |σ^n_X − σ^n_Y|^p dνm ⊗ dνm )^{1/p}
= ( ∫_{I^2×I^2} |σ^n_X − σ^n_Y|^p dν ⊗ dν )^{1/p}
= dis^n_p(ν).

Here the second equality follows from the definition of convergence in the narrow topology and the fact that the integrand is bounded and continuous. This shows sequential continuity (hence continuity) of dis^n_p.

Finally, we show that (dis_p^n)_{n∈N} converges to dis_p uniformly. Let µ ∈ C(λ_I, λ_I). Then,

|dis_p(µ) − dis_p^n(µ)| = | ‖σ_X − σ_Y‖_{L^p(µ^{⊗2})} − ‖σ_X^n − σ_Y^n‖_{L^p(µ^{⊗2})} |
 ≤ ‖σ_X − σ_Y − σ_X^n + σ_Y^n‖_{L^p(µ^{⊗2})}
 ≤ ‖σ_X − σ_X^n‖_{L^p(µ^{⊗2})} + ‖σ_Y − σ_Y^n‖_{L^p(µ^{⊗2})}
 = ( ∫_{I×I} ∫_{I×I} |σ_X(i, i′) − σ_X^n(i, i′)|^p dµ(i, j) dµ(i′, j′) )^{1/p}
   + ( ∫_{I×I} ∫_{I×I} |σ_Y(j, j′) − σ_Y^n(j, j′)|^p dµ(i, j) dµ(i′, j′) )^{1/p}
 = ( ∫_I ∫_I |σ_X(i, i′) − σ_X^n(i, i′)|^p dλ_I(i) dλ_I(i′) )^{1/p}
   + ( ∫_I ∫_I |σ_Y(j, j′) − σ_Y^n(j, j′)|^p dλ_I(j) dλ_I(j′) )^{1/p}
 = ‖σ_X − σ_X^n‖_{L^p(λ_I^{⊗2})} + ‖σ_Y − σ_Y^n‖_{L^p(λ_I^{⊗2})}
 ≤ 2/n.

Here the first and second inequalities follow from Minkowski's inequality. But µ ∈ C(λ_I, λ_I) was arbitrary. This shows that dis_p is the uniform limit of continuous functions, hence is continuous.
Now suppose p = ∞. Let µ ∈ C (λI , λI ) be arbitrary. Recall that because we are
working over probability spaces, Jensen’s inequality can be used to show that for any 1 ≤
q ≤ r < ∞, we have disq (µ) ≤ disr (µ). Moreover, we have limq→∞ disq (µ) = dis∞ (µ).
The supremum of a family of continuous functions is lower semicontinuous. In our case,
dis∞ = sup{disq : q ∈ [1, ∞)}, and we have shown above that all the functions in this
family are continuous. Hence dis∞ is lower semicontinuous.
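For finite networks, the distortion functional dis_p of a coupling has a direct matrix expression. A hedged sketch of this discretization (the function name and the array representation are ours; weight matrices stand in for the ω's, and a nonnegative matrix summing to 1 stands in for the coupling):

```python
import numpy as np

def dis_p(wX, wY, mu, p=2.0):
    """p-distortion of a coupling mu (an |X| x |Y| matrix with nonnegative
    entries summing to 1) between finite networks with weight matrices wX, wY:
    ( sum over x, y, x', y' of |wX[x,x'] - wY[y,y']|^p mu[x,y] mu[x',y'] )^(1/p)."""
    # Broadcast to shape (x, y, x', y') and integrate against mu ⊗ mu.
    diff = np.abs(wX[:, None, :, None] - wY[None, :, None, :]) ** p
    return float(np.einsum('xyab,xy,ab->', diff, mu, mu)) ** (1.0 / p)

# One-node networks N_1(1), N_1(2) under the unique coupling: distortion 1.
print(dis_p(np.array([[1.0]]), np.array([[2.0]]), np.array([[1.0]])))
```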
Proof of Theorem 111. Let (X, ω_X, µ_X), (Y, ω_Y, µ_Y), (Z, ω_Z, µ_Z) ∈ N_m. It is clear that d_{N,p}(X, Y) ≥ 0. To show d_{N,p}(X, X) = 0, consider the diagonal coupling ∆ (see Example 101). For p ∈ [1, ∞), we have:

dis_p(∆) = ( ∫_{X×X} ∫_{X×X} |ω_X(x, x′) − ω_X(z, z′)|^p d∆(x, z) d∆(x′, z′) )^{1/p}
 = ( ∫_X ∫_X |ω_X(x, x′) − ω_X(x, x′)|^p dµ_X(x) dµ_X(x′) )^{1/p}
 = 0.

For p = ∞, we have:

dis_p(∆) = sup{|ω_X(x, x′) − ω_X(z, z′)| : (x, z), (x′, z′) ∈ supp(∆)}
 = sup{|ω_X(x, x′) − ω_X(x, x′)| : x, x′ ∈ supp(µ_X)}
 = 0.
Thus d_{N,p}(X, X) = 0 for any p ∈ [1, ∞]. For symmetry, notice that for any µ ∈ C(µ_X, µ_Y), we can define µ̃ ∈ C(µ_Y, µ_X) by µ̃(y, x) = µ(x, y). Then dis_p(µ) = dis_p(µ̃), and this will show d_{N,p}(X, Y) = d_{N,p}(Y, X).
Finally, we need to check the triangle inequality. Let ε > 0, and let µ_{12} ∈ C(µ_X, µ_Y) and µ_{23} ∈ C(µ_Y, µ_Z) be couplings such that 2d_{N,p}(X, Y) ≥ dis_p(µ_{12}) − ε and 2d_{N,p}(Y, Z) ≥ dis_p(µ_{23}) − ε. Invoking Lemma 107, we obtain a probability measure µ ∈ Prob(X × Y × Z) with marginals µ_{12}, µ_{23}, and a marginal µ_{13} that is a coupling between µ_X and µ_Z. This coupling is not necessarily optimal. For p ∈ [1, ∞) we have:

2d_{N,p}(X, Z) ≤ dis_p(µ_{13})
 = ( ∫_{X×Z} ∫_{X×Z} |ω_X(x, x′) − ω_Z(z, z′)|^p dµ_{13}(x, z) dµ_{13}(x′, z′) )^{1/p}
 = ( ∫_{X×Y×Z} ∫_{X×Y×Z} |ω_X(x, x′) − ω_Z(z, z′)|^p dµ(x, y, z) dµ(x′, y′, z′) )^{1/p}
 = ‖ω_X − ω_Y + ω_Y − ω_Z‖_{L^p(µ⊗µ)}
 ≤ ‖ω_X − ω_Y‖_{L^p(µ⊗µ)} + ‖ω_Y − ω_Z‖_{L^p(µ⊗µ)}
 = ( ∫_{X×Y} ∫_{X×Y} |ω_X(x, x′) − ω_Y(y, y′)|^p dµ_{12}(x, y) dµ_{12}(x′, y′) )^{1/p}
   + ( ∫_{Y×Z} ∫_{Y×Z} |ω_Y(y, y′) − ω_Z(z, z′)|^p dµ_{23}(y, z) dµ_{23}(y′, z′) )^{1/p}
 ≤ 2d_{N,p}(X, Y) + 2d_{N,p}(Y, Z) + 2ε.

The second inequality above follows from Minkowski's inequality. Letting ε → 0 now proves the triangle inequality in the case p ∈ [1, ∞).
For p = ∞ we have:

2d_{N,p}(X, Z) ≤ dis_p(µ_{13})
 = sup{|ω_X(x, x′) − ω_Z(z, z′)| : (x, z), (x′, z′) ∈ supp(µ_{13})}
 = sup{|ω_X(x, x′) − ω_Y(y, y′) + ω_Y(y, y′) − ω_Z(z, z′)| : (x, y, z), (x′, y′, z′) ∈ supp(µ)}
 ≤ sup{|ω_X(x, x′) − ω_Y(y, y′)| + |ω_Y(y, y′) − ω_Z(z, z′)| : (x, y), (x′, y′) ∈ supp(µ_{12}), (y, z), (y′, z′) ∈ supp(µ_{23})}
 ≤ dis_p(µ_{12}) + dis_p(µ_{23})
 ≤ 2d_{N,p}(X, Y) + 2d_{N,p}(Y, Z) + 2ε.

Letting ε → 0 now proves the triangle inequality in the case p = ∞. This concludes our proof.
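The gluing of Lemma 107 used in this proof is concrete in the finite setting: condition on the middle coordinate. A sketch under the assumption that measures are represented as numpy arrays (division is only taken where µ_Y is positive; all names are ours):

```python
import numpy as np

def glue(mu12, mu23, muY):
    """Glue couplings mu12 in C(muX, muY) and mu23 in C(muY, muZ) along Y:
    mu[x, y, z] = mu12[x, y] * mu23[y, z] / muY[y] wherever muY[y] > 0."""
    inv = np.where(muY > 0, 1.0 / np.where(muY > 0, muY, 1.0), 0.0)
    return mu12[:, :, None] * mu23[None, :, :] * inv[None, :, None]

muX, muY, muZ = np.array([0.5, 0.5]), np.array([0.3, 0.7]), np.array([1.0])
mu12 = np.outer(muX, muY)   # product coupling in C(muX, muY)
mu23 = np.outer(muY, muZ)   # product coupling in C(muY, muZ)
mu = glue(mu12, mu23, muY)  # probability measure on X x Y x Z
mu13 = mu.sum(axis=1)       # marginal coupling in C(muX, muZ)
```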
Proof of Theorem 112. First suppose p ∈ [1, ∞). By the construction in Section 1.9.3, we pass to interval representations of X and Y. As noted in Section 1.9.3, the choice of parametrization is not necessarily unique, but this does not affect the argument. Let (I, σ_X, λ_I), (I, σ_Y, λ_I) denote these representations. By Lemma 106, the dis_p functional is continuous on the space of couplings between these two networks. By Lemma 105, this space of couplings is compact. Thus dis_p achieves its infimum.
Let µ ∈ C(λ_I, λ_I) denote this minimizer of dis_p. By Remark 102, we can also take couplings ν_X ∈ C(µ_X, λ_I) and ν_Y ∈ C(λ_I, µ_Y) which have zero distortion. By Lemma 107, we can glue together ν_X, µ, and ν_Y to obtain a coupling ν ∈ C(µ_X, µ_Y). By the proof of the triangle inequality in Theorem 111, we have:

dis_p(ν) ≤ dis_p(ν_X) + dis_p(µ) + dis_p(ν_Y) = dis_p(µ) = 2d_{N,p}((I, σ_X, λ_I), (I, σ_Y, λ_I)).

Also by the triangle inequality, we have d_{N,p}((I, σ_X, λ_I), (I, σ_Y, λ_I)) ≤ d_{N,p}(X, Y). It follows that ν ∈ C(µ_X, µ_Y) is optimal.
The case p = ∞ is analogous, because lower semicontinuity (Lemma 106) combined with compactness (Lemma 105) is sufficient to guarantee that dis_∞ achieves its infimum on C(λ_I, λ_I).

Proof of Theorem 113. Fix p ∈ [1, ∞). For the backwards direction, suppose there exist Z and measurable maps f : Z → X and g : Z → Y such that the appropriate conditions are satisfied. We first claim that d_{N,p}((X, ω_X, µ_X), (Z, f^*ω_X, µ_Z)) = 0.
To see the claim, define µ ∈ Prob(X × Z) by µ := (f, id)_* µ_Z. Then,

∫_{X×Z} ∫_{X×Z} |ω_X(x, x′) − f^*ω_X(z, z′)|^p dµ(x, z) dµ(x′, z′)
 = ∫_{X×Z} ∫_{X×Z} |ω_X(x, x′) − ω_X(f(z), f(z′))|^p dµ(x, z) dµ(x′, z′)
 = ∫_Z ∫_Z |ω_X(f(z), f(z′)) − ω_X(f(z), f(z′))|^p dµ_Z(z) dµ_Z(z′) = 0.

This verifies the claim. Similarly we have d_{N,p}((Y, ω_Y, µ_Y), (Z, g^*ω_Y, µ_Z)) = 0. Using the diagonal coupling along with the assumption, we have d_{N,p}((Z, f^*ω_X, µ_Z), (Z, g^*ω_Y, µ_Z)) = 0. By the triangle inequality, we then have d_{N,p}(X, Y) = 0.
For the forwards direction, let µ ∈ C(µ_X, µ_Y) be an optimal coupling with dis_p(µ) = 0 (Theorem 112). Define Z := X × Y, µ_Z := µ. Then the projection maps π_X : Z → X and π_Y : Z → Y are measurable. We also have (π_X)_* µ = µ_X and (π_Y)_* µ = µ_Y. Since dis_p(µ) = 0, we also have ‖π_X^* ω_X − π_Y^* ω_Y‖_∞ = ‖ω_X − ω_Y‖_∞ = 0.
The p = ∞ case is proved analogously. This concludes the proof.

Proof of Theorem 115. Let (X, ω_X, µ_X), (Y, ω_Y, µ_Y), (Z, ω_Z, µ_Z) ∈ N_m. The proofs that d^{GP}_{N,α}(X, Y) ≥ 0, d^{GP}_{N,α}(X, X) = 0, and that d^{GP}_{N,α}(X, Y) = d^{GP}_{N,α}(Y, X) are analogous to those used in Theorem 111. Hence we only check the triangle inequality. Let ε_{XY} > 2d^{GP}_{N,α}(X, Y), ε_{YZ} > 2d^{GP}_{N,α}(Y, Z), and let µ_{XY}, µ_{YZ} be couplings such that

µ_{XY}^{⊗2}({(x, y, x′, y′) : |ω_X(x, x′) − ω_Y(y, y′)| ≥ ε_{XY}}) ≤ α ε_{XY},
µ_{YZ}^{⊗2}({(y, z, y′, z′) : |ω_Y(y, y′) − ω_Z(z, z′)| ≥ ε_{YZ}}) ≤ α ε_{YZ}.

For convenience, define:

A := {((x, y, z), (x′, y′, z′)) ∈ (X × Y × Z)^2 : |ω_X(x, x′) − ω_Y(y, y′)| ≥ ε_{XY}},
B := {((x, y, z), (x′, y′, z′)) ∈ (X × Y × Z)^2 : |ω_Y(y, y′) − ω_Z(z, z′)| ≥ ε_{YZ}},
C := {((x, y, z), (x′, y′, z′)) ∈ (X × Y × Z)^2 : |ω_X(x, x′) − ω_Z(z, z′)| ≥ ε_{XY} + ε_{YZ}}.

Next let µ denote the probability measure obtained from gluing µ_{XY} and µ_{YZ} (cf. Lemma 107). This has marginals µ_{XY}, µ_{YZ}, and a marginal µ_{XZ} ∈ C(µ_X, µ_Z). We need to show:

µ_{XZ}^{⊗2}((π_X, π_Z)(C)) ≤ α(ε_{XY} + ε_{YZ}).

To show this, it suffices to show C ⊆ A ∪ B, because then we have µ^{⊗2}(C) ≤ µ^{⊗2}(A) + µ^{⊗2}(B) and consequently

µ_{XZ}^{⊗2}((π_X, π_Z)(C)) = µ^{⊗2}(C) ≤ µ^{⊗2}(A) + µ^{⊗2}(B) = µ_{XY}^{⊗2}((π_X, π_Y)(A)) + µ_{YZ}^{⊗2}((π_Y, π_Z)(B)) ≤ α(ε_{XY} + ε_{YZ}).

Let ((x, y, z), (x′, y′, z′)) ∈ (X × Y × Z)^2 \ (A ∪ B). Then we have

|ω_X(x, x′) − ω_Y(y, y′)| < ε_{XY} and |ω_Y(y, y′) − ω_Z(z, z′)| < ε_{YZ}.

By the triangle inequality, we then have:

|ω_X(x, x′) − ω_Z(z, z′)| ≤ |ω_X(x, x′) − ω_Y(y, y′)| + |ω_Y(y, y′) − ω_Z(z, z′)| < ε_{XY} + ε_{YZ}.

Thus ((x, y, z), (x′, y′, z′)) ∈ (X × Y × Z)^2 \ C. This shows C ⊆ A ∪ B.
The preceding work shows that 2d^{GP}_{N,α}(X, Z) ≤ ε_{XY} + ε_{YZ}. Since ε_{XY} > 2d^{GP}_{N,α}(X, Y) and ε_{YZ} > 2d^{GP}_{N,α}(Y, Z) were arbitrary, it follows that d^{GP}_{N,α}(X, Z) ≤ d^{GP}_{N,α}(X, Y) + d^{GP}_{N,α}(Y, Z).

Proof of Lemma 116. Write C := C(X, Y). When α = 0 in the d^{GP}_{N,α} formulation, we have:

2d^{GP}_{N,0}(X, Y) = inf_{µ∈C} inf{ε > 0 : µ^{⊗2}({(x, y, x′, y′) ∈ (X × Y)^2 : |ω_X(x, x′) − ω_Y(y, y′)| ≥ ε}) = 0}
 = inf_{µ∈C} inf{ε > 0 : µ^{⊗2}({(x, y, x′, y′) ∈ (X × Y)^2 : |ω_X(x, x′) − ω_Y(y, y′)| < ε}) = 1}
 = inf_{µ∈C} sup{|ω_X(x, x′) − ω_Y(y, y′)| : (x, y), (x′, y′) ∈ supp(µ)}
 = 2d_{N,∞}(X, Y).
Proof of Theorem 118. Let t_0 ∈ R, and let (X, ω_X, µ_X), (Y, ω_Y, µ_Y) ∈ N_m. Via Lemma 116, write ε := d^{GP}_{N,0}(X, Y) = d_{N,∞}(X, Y). Using Theorem 112, let µ be an optimal coupling between µ_X and µ_Y for which d_{N,∞}(X, Y) is achieved.
For each t ∈ R, write A(X, t) := {(x, x′) ∈ X × X : ω_X(x, x′) ≤ t} = {ω_X ≤ t}. Similarly write A(Y, t) := {ω_Y ≤ t} for each t ∈ R.
Let B := {(x, y, x′, y′) ∈ (X × Y)^2 : |ω_X(x, x′) − ω_Y(y, y′)| ≥ ε}. Also let G denote the complement of B, i.e. G := {(x, y, x′, y′) ∈ (X × Y)^2 : |ω_X(x, x′) − ω_Y(y, y′)| < ε}. In particular, notice that for any (x, y, x′, y′) ∈ G, we have ω_X(x, x′) < ε + ω_Y(y, y′).
By the definition of ε, we have µ^{⊗2}(B) = 0, and hence µ^{⊗2}(G) = 1.
In what follows, we will focus on the case p ∈ [1, ∞) and write out the integrals explicitly. An analogous proof holds for p = ∞. We have:

sub^w_{p,t_0}(X) = ( ∫_{A(X,t_0)} |ω_X(x, x′)|^p dµ_X^{⊗2}(x, x′) )^{1/p}
 = ( ∫_{A(X,t_0)×Y^2} |ω_X(x, x′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}
 = ( ∫_{G∩(A(X,t_0)×Y^2)} |ω_X(x, x′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}
 = ( ∫_{(X×Y)^2} 1_{G∩(A(X,t_0)×Y^2)} |ω_X(x, x′) − ω_Y(y, y′) + ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}.

For convenience, write H := G ∩ (A(X, t_0) × Y^2). Using Minkowski's inequality, we have:

 ≤ ( ∫_{(X×Y)^2} 1_H |ω_X(x, x′) − ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}
   + ( ∫_{(X×Y)^2} 1_H |ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}.

For any (x, y, x′, y′) ∈ H, we have |ω_X(x, x′) − ω_Y(y, y′)| < ε. Also we have ω_Y(y, y′) < ε + ω_X(x, x′) ≤ ε + t_0. From the latter, we know G ∩ (A(X, t_0) × Y^2) ⊆ X^2 × A(Y, t_0 + ε). So we continue the previous expression as below:

 ≤ ( ∫_{(X×Y)^2} 1_H |ε|^p dµ^{⊗2} )^{1/p} + ( ∫_{X^2×A(Y,t_0+ε)} |ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}    (2.4)
 ≤ ε + ( ∫_{A(Y,t_0+ε)} |ω_Y(y, y′)|^p dµ_Y^{⊗2}(y, y′) )^{1/p}
 = sub^w_{p,t_0+ε}(Y) + ε.

Analogously, we have

sub^w_{p,t_0+ε}(Y) ≤ sub^w_{p,t_0+2ε}(X) + ε.

This yields interleaving for p ∈ [1, ∞). For p = ∞, we use the same arguments about G and B to obtain:

sub^w_{p,t_0}(X) = sup{|ω_X(x, x′)| : x, x′ ∈ supp(µ_X), ω_X(x, x′) ≤ t_0}
 ≤ sup{|ω_Y(y, y′)| + ε : y, y′ ∈ supp(µ_Y), ω_Y(y, y′) ≤ t_0 + ε}
 ≤ sub^w_{p,t_0+ε}(Y) + ε.

Thus we have interleaving for all p ∈ [1, ∞].
The case for the sup^w_p invariant is similar, except in step (2.4) above. In this case, we note that for any (x, y, x′, y′) ∈ H, we have ω_Y(y, y′) > ω_X(x, x′) − ε ≥ t_0 − ε. Thus we have H = G ∩ (A(X, t_0) × Y^2) ⊆ X^2 × A(Y, t_0 − ε), and so:

( ∫_{(X×Y)^2} 1_H |ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p} ≤ ( ∫_{X^2×A(Y,t_0−ε)} |ω_Y(y, y′)|^p dµ^{⊗2}(x, y, x′, y′) )^{1/p}.

It follows that we have:

sup^w_{p,t_0}(X) ≤ sup^w_{p,t_0−ε}(Y) + ε ≤ sup^w_{p,t_0−2ε}(X) + 2ε.

The p = ∞ case is proved analogously.
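The invariant sub^w_{p,t_0} is directly computable for a finite network. A sketch, with a weight matrix `w` and a probability vector `mu` standing in for (ω_X, µ_X) (names are ours):

```python
import numpy as np

def sub_w(w, mu, t0, p=2.0):
    """sub^w_{p,t0}(X) = ( sum over pairs (x, x') with w(x,x') <= t0 of
    |w(x,x')|^p mu(x) mu(x') )^(1/p) for a finite network."""
    mask = (w <= t0)
    pair_mass = np.outer(mu, mu)          # the product measure mu ⊗ mu
    return float(np.sum(mask * np.abs(w) ** p * pair_mass)) ** (1.0 / p)
```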


Proof of Theorem 121. Let (X, ω_X, µ_X), (Y, ω_Y, µ_Y) ∈ N_m. For each s ∈ X and t ∈ Y, define ϕ^{st}_{XY} : (X × Y)^2 → R by writing ϕ^{st}_{XY}(x, y, x′, y′) := ω_X(s, x′), and define ψ^{st}_{XY} : (X × Y)^2 → R by writing ψ^{st}_{XY}(x, y, x′, y′) := ω_Y(t, y′). For convenience, we write C := C(µ_X, µ_Y).
First let p ∈ [1, ∞). Let η > d_{N,p}(X, Y), and let µ ∈ C(µ_X, µ_Y) be a coupling such that dis_p(µ) < 2η. Then by applying Minkowski's inequality, we obtain

‖ϕ^{st}_{XY}‖_{L^p(µ⊗µ)} − ‖ψ^{st}_{XY}‖_{L^p(µ⊗µ)} ≤ ‖ϕ^{st}_{XY} − ψ^{st}_{XY}‖_{L^p(µ⊗µ)}.    (2.5)

In particular, because x ↦ x^p is increasing on R_+, we also have

|‖ϕ^{st}_{XY}‖_{L^p(µ⊗µ)} − ‖ψ^{st}_{XY}‖_{L^p(µ⊗µ)}|^p ≤ ‖ϕ^{st}_{XY} − ψ^{st}_{XY}‖^p_{L^p(µ⊗µ)}.    (2.6)

Next we observe:

‖ϕ^{st}_{XY}‖_{L^p(µ⊗µ)} = ( ∫_{X×Y} ∫_{X×Y} |ϕ^{st}_{XY}(x, y, x′, y′)|^p dµ(x, y) dµ(x′, y′) )^{1/p}
 = ( ∫_{X×Y} ∫_{X×Y} |ω_X(s, x′)|^p dµ(x, y) dµ(x′, y′) )^{1/p}
 = ( ∫_X |ω_X(s, x′)|^p dµ_X(x′) )^{1/p} = ecc^{out}_{p,X}(s).

Similarly, ‖ψ^{st}_{XY}‖_{L^p(µ⊗µ)} = ecc^{out}_{p,Y}(t).

For the right side of Inequality (2.6), we have:

‖ϕ^{st}_{XY} − ψ^{st}_{XY}‖^p_{L^p(µ⊗µ)} = ∫_{X×Y} ∫_{X×Y} |ϕ^{st}_{XY}(x, y, x′, y′) − ψ^{st}_{XY}(x, y, x′, y′)|^p dµ(x, y) dµ(x′, y′)
 = ∫_{X×Y} ∫_{X×Y} |ω_X(s, x′) − ω_Y(t, y′)|^p dµ(x, y) dµ(x′, y′)
 = ∫_{X×Y} |ω_X(s, x′) − ω_Y(t, y′)|^p dµ(x′, y′).

Putting all these observations together with Inequality (2.6), we have:

|ecc^{out}_{p,X}(s) − ecc^{out}_{p,Y}(t)|^p ≤ ∫_{X×Y} |ω_X(s, x′) − ω_Y(t, y′)|^p dµ(x′, y′).

The left hand side above is independent of the coupling µ, so we can infimize over C(µ_X, µ_Y):

|ecc^{out}_{p,X}(s) − ecc^{out}_{p,Y}(t)|^p ≤ inf_{ν∈C} ∫_{X×Y} |ω_X(s, x′) − ω_Y(t, y′)|^p dν(x′, y′) = ecc^{out}_{p,X,Y}(s, t)^p.

Also observe:

( ∫_{X×Y} ‖ϕ^{st}_{XY} − ψ^{st}_{XY}‖^p_{L^p(µ⊗µ)} dµ(s, t) )^{1/p}
 = ( ∫_{X×Y} ∫_{X×Y} |ω_X(s, x′) − ω_Y(t, y′)|^p dµ(x′, y′) dµ(s, t) )^{1/p}
 = dis_p(µ) < 2η.

Thus we obtain:

( ∫_{X×Y} |ecc^{out}_{p,X}(s) − ecc^{out}_{p,Y}(t)|^p dµ(s, t) )^{1/p} ≤ ( ∫_{X×Y} ecc^{out}_{p,X,Y}(s, t)^p dµ(s, t) )^{1/p} < 2η.

Since η > d_{N,p}(X, Y) was arbitrary, it follows that

inf_{µ∈C} ( ∫_{X×Y} |ecc^{out}_{p,X}(s) − ecc^{out}_{p,Y}(t)|^p dµ(s, t) )^{1/p} ≤ inf_{µ∈C} ( ∫_{X×Y} ecc^{out}_{p,X,Y}(s, t)^p dµ(s, t) )^{1/p}    (2.7)
 ≤ 2d_{N,p}(X, Y).

This proves the p ∈ [1, ∞) case. The p = ∞ case follows by applying Minkowski's inequality to obtain Inequality (2.5), and working analogously from there. Finally, we remark that the same proof holds for the ecc^{in}_{p,X} and ecc^{in}_{p,X,Y} functions.
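The out-eccentricity function appearing throughout this proof has a one-line discrete analogue. A sketch (our discretization; `w` is a weight matrix and `mu` a probability vector over the nodes):

```python
import numpy as np

def ecc_out(w, mu, p=2.0):
    """ecc^out_{p,X}(s) = ( sum over x' of |w(s, x')|^p mu(x') )^(1/p),
    computed for every node s at once."""
    return (np.abs(w) ** p @ mu) ** (1.0 / p)
```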

The following lemma is a particular statement of the change of variables theorem that we use later.

Lemma 151 (Change of variables). Let (X, F_X, µ_X) and (Y, F_Y, µ_Y) be two probability spaces. Let f : X → R and g : Y → R be two measurable functions. Write f_*µ_X and g_*µ_Y to denote the pushforward distributions on R. Let T : X × Y → R^2 be the map (x, y) ↦ (f(x), g(y)) and let h : R^2 → R_+ be measurable. Next let µ ∈ C(µ_X, µ_Y). Then T_*µ ∈ C(f_*µ_X, g_*µ_Y), and the following inequality holds:

inf_{ν∈C(f_*µ_X, g_*µ_Y)} ( ∫_{R^2} h(a, b) dν(a, b) )^{1/p} ≤ ( ∫_{X×Y} h(T(x, y)) dµ(x, y) )^{1/p}.

This is essentially the same as [86, Lemma 6.1], but stated for general probability spaces instead of metric measure spaces. The form of the statement in [86, Lemma 6.1] is slightly different, but it can be obtained from the statement presented above by using [120, Remark 2.19].
Proof of Lemma 151. First we check that T_*µ ∈ C(f_*µ_X, g_*µ_Y). Let A ∈ Borel(R). Then,

T_*µ(A × R) = µ({(x, y) ∈ X × Y : T(x, y) ∈ A × R}) = µ({(x, y) ∈ X × Y : f(x) ∈ A}) = f_*µ_X(A).

Similarly we check T_*µ(R × A) = g_*µ_Y(A).
Next we check the inequality. By the change of variables formula, we have:

( ∫_{R^2} h(a, b) d(T_*µ)(a, b) )^{1/p} = ( ∫_{X×Y} h(T(x, y)) dµ(x, y) )^{1/p}.

We have already verified that T∗ µ ∈ C (f∗ µX , g∗ µY ). The inequality is obtained by in-


fimizing the left hand side over all possible couplings ν ∈ C (f∗ µX , g∗ µY ). This does not
affect the right hand side, which is independent of such couplings.
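For finite spaces, the pushforward T_*µ of Lemma 151 just re-bins the coupling mass by value pairs (f(x), g(y)). An illustrative sketch (the representation and names are ours):

```python
import numpy as np
from collections import defaultdict

def pushforward_coupling(mu, f_vals, g_vals):
    """T_* mu for T(x, y) = (f(x), g(y)): the mass mu[x, y] accumulates
    at the value pair (f_vals[x], g_vals[y])."""
    out = defaultdict(float)
    for x in range(mu.shape[0]):
        for y in range(mu.shape[1]):
            if mu[x, y] > 0:
                out[(f_vals[x], g_vals[y])] += mu[x, y]
    return dict(out)
```

By construction the marginals of the result are the pushforwards f_*µ_X and g_*µ_Y, which is the first claim of the lemma in this discrete setting.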
Proof of Theorem 127. Consider the probability spaces X × X and Y × Y, equipped with the product measures µ_X ⊗ µ_X and µ_Y ⊗ µ_Y. For convenience, we define the shorthand notation ν_X := (ω_X)_*(µ_X ⊗ µ_X) and ν_Y := (ω_Y)_*(µ_Y ⊗ µ_Y). Let T : (X × X) × (Y × Y) → R^2 be the map (x, x′, y, y′) ↦ (ω_X(x, x′), ω_Y(y, y′)). Also let h : R^2 → R be the map (a, b) ↦ |a − b|^p.
Let η > d_{N,p}(X, Y), and let µ ∈ C(µ_X, µ_Y) be a coupling such that dis_p(µ) < 2η. Also let τ be the measure on X × X × Y × Y defined by writing τ(A × A′ × B × B′) := µ(A × B) µ(A′ × B′) for A, A′ ∈ Borel(X) and B, B′ ∈ Borel(Y). Then τ ∈ C(µ_X^{⊗2}, µ_Y^{⊗2}).

By Lemma 151, we know that T_*τ ∈ C(ν_X, ν_Y). By the change of variables formula and Fubini's theorem,

( ∫_{R^2} |a − b|^p d(T_*τ)(a, b) )^{1/p} = ( ∫_{X^2×Y^2} |ω_X(x, x′) − ω_Y(y, y′)|^p dτ(x, x′, y, y′) )^{1/p}
 = ( ∫_{X^2×Y^2} |ω_X(x, x′) − ω_Y(y, y′)|^p d(µ(x, y) µ(x′, y′)) )^{1/p}
 = ( ∫_{X×Y} ∫_{X×Y} |ω_X(x, x′) − ω_Y(y, y′)|^p dµ(x, y) dµ(x′, y′) )^{1/p}
 < 2η.

We infimize over C(µ_X^{⊗2}, µ_Y^{⊗2}), use the fact that η > d_{N,p}(X, Y) was arbitrary, and apply Lemma 151 to obtain:

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µ_X^{⊗2}, µ_Y^{⊗2})} ( ∫_{X^2×Y^2} |ω_X(x, x′) − ω_Y(y, y′)|^p dµ(x, x′, y, y′) )^{1/p}
 ≥ inf_{γ∈C(ν_X, ν_Y)} ( ∫_{R^2} |a − b|^p dγ(a, b) )^{1/p}.

This yields Inequalities (1.7)-(1.8).
Next we consider the distributions induced by the ecc^{out} function. For convenience, write e_X := (ecc^{out}_{p,X})_*µ_X and e_Y := (ecc^{out}_{p,Y})_*µ_Y. Now let T : X × Y → R^2 be the map (x, y) ↦ (ecc^{out}_{p,X}(x), ecc^{out}_{p,Y}(y)), and let h : R^2 → R be the map (a, b) ↦ |a − b|^p. By the change of variables formula and Theorem 121, we know

inf_{µ∈C(µ_X,µ_Y)} ( ∫_{R^2} |a − b|^p d(T_*µ)(a, b) )^{1/p} = inf_{µ∈C(µ_X,µ_Y)} ( ∫_{X×Y} |ecc^{out}_{p,X}(x) − ecc^{out}_{p,Y}(y)|^p dµ(x, y) )^{1/p} ≤ 2d_{N,p}(X, Y).

By Lemma 151, we know that T_*µ ∈ C(e_X, e_Y) and also the following:

inf_{γ∈C(e_X, e_Y)} ( ∫_{R^2} |a − b|^p dγ(a, b) )^{1/p} ≤ inf_{µ∈C(µ_X,µ_Y)} ( ∫_{X×Y} |ecc^{out}_{p,X}(x) − ecc^{out}_{p,Y}(y)|^p dµ(x, y) )^{1/p} ≤ 2d_{N,p}(X, Y).

This proves Inequalities (1.9)-(1.10). Inequalities (1.11)-(1.12) are proved analogously.
Finally we consider the distributions obtained as pushforwards of the joint eccentricity function, i.e. Inequalities (1.13)-(1.16). For each x ∈ X and y ∈ Y let T^{xy} : X × Y → R^2 be the map (x′, y′) ↦ (ω_X(x, x′), ω_Y(y, y′)), and let h : R^2 → R be the map (a, b) ↦ |a − b|^p. Let γ ∈ C(µ_X, µ_Y). By the change of variables formula, we have

∫_{R^2} |a − b|^p d(T^{xy}_*γ)(a, b) = ∫_{X×Y} |ω_X(x, x′) − ω_Y(y, y′)|^p dγ(x′, y′), and so

∫_{X×Y} ∫_{R^2} |a − b|^p d(T^{xy}_*γ)(a, b) dµ(x, y) = ∫_{X×Y} ∫_{X×Y} |ω_X(x, x′) − ω_Y(y, y′)|^p dγ(x′, y′) dµ(x, y).

By Lemma 151, T^{xy}_*γ ∈ C(λ_X(x), λ_Y(y)). Applying Theorem 121 and Lemma 151, we have:

2d_{N,p}(X, Y) ≥ inf_{µ∈C(µ_X,µ_Y)} ( ∫_{X×Y} inf_{γ∈C(µ_X,µ_Y)} ∫_{X×Y} |ω_X(x, x′) − ω_Y(y, y′)|^p dγ(x′, y′) dµ(x, y) )^{1/p}
 ≥ inf_{µ∈C(µ_X,µ_Y)} ( ∫_{X×Y} inf_{γ∈C(λ_X(x),λ_Y(y))} ∫_{R^2} |a − b|^p dγ(a, b) dµ(x, y) )^{1/p}.

This verifies Inequalities (1.13)-(1.14). Inequalities (1.15)-(1.16) are proved analogously.
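The pushforward distributions ν_X, ν_Y compared in this theorem live on the real line, where optimal transport between equally weighted empirical measures reduces to sorting. A hedged sketch of the resulting lower bound, assuming uniform node weights so that every weight-matrix entry carries equal mass (all names are ours):

```python
import numpy as np

def weight_distribution(w):
    """All entries of the weight matrix, as an empirical distribution on R
    (uniform node weights, so every pair carries equal mass)."""
    return np.sort(w.ravel())

def wasserstein_p(a, b, p=2.0):
    """W_p between two empirical distributions with the same number of
    equally weighted atoms: match sorted values (the 1-d optimal coupling)."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b)) ** p)) ** (1.0 / p)

wX = np.array([[0.0, 1.0], [1.0, 0.0]])
wY = np.array([[0.0, 3.0], [3.0, 0.0]])
# By the theorem, 2 d_{N,p}(X, Y) is bounded below by this quantity.
lower_bound = wasserstein_p(weight_distribution(wX), weight_distribution(wY), p=1.0)
```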

Chapter 3: Persistent Homology on Networks

In this chapter, we give proofs and auxiliary results related to persistent homology.

3.1 Background on persistence and interleaving

3.1.1 Homology, persistence, and tameness


We assume that the reader is familiar with terms and concepts related to simplicial homology, and refer to [88] for details. Here we describe our choices of notation. Whenever we have a simplicial complex over a set X and a k-simplex {x_0, x_1, . . . , x_k}, k ∈ Z_+, we will assume that the simplex is oriented by the ordering x_0 < x_1 < . . . < x_k. We will write [x_0, x_1, . . . , x_k] to denote the equivalence class of the even permutations of this chosen ordering, and −[x_0, x_1, . . . , x_k] to denote the equivalence class of the odd permutations of this ordering. Given a simplicial complex Σ, we will denote its geometric realization by |Σ|. The weak topology on |Σ| is defined by requiring that a subset A ⊆ |Σ| is closed if and only if A ∩ |σ| is closed in |σ| for each σ ∈ Σ. A simplicial map f : Σ → Ξ between two simplicial complexes induces a map |f| : |Σ| → |Ξ| between the geometric realizations, defined as |f|(∑_{v∈Σ} a_v v) := ∑_{v∈Σ} a_v f(v). These induced maps satisfy the usual composition identity: given simplicial maps f : Σ → Ξ and g : Ξ → Υ, we have |g ∘ f| = |g| ∘ |f|. To see this, observe the following:

|g ∘ f|(∑_{v∈Σ} a_v v) = ∑_{v∈Σ} a_v g(f(v)) = |g|(∑_{v∈Σ} a_v f(v)) = |g| ∘ |f|(∑_{v∈Σ} a_v v).    (3.1)

A filtration of a simplicial complex Σ (also called a filtered simplicial complex) is defined to be a nested sequence {Σ^δ ⊆ Σ^{δ′}}_{δ≤δ′∈R} of simplicial complexes satisfying the condition that there exist δ_I, δ_F ∈ R such that Σ^δ = ∅ for all δ ≤ δ_I, and Σ^δ = Σ for all δ ≥ δ_F.
Fix a field K. Given a finite simplicial complex Σ and a dimension k ∈ Z_+, we will denote a k-chain in Σ as ∑_i a_i σ_i, where each a_i ∈ K and each σ_i ∈ Σ is a k-simplex. We write C_k(Σ) or just C_k to denote the K-vector space of all k-chains. We will write ∂_k to denote the associated boundary map ∂_k : C_k → C_{k−1}:

∂_k[x_0, . . . , x_k] := ∑_i (−1)^i [x_0, . . . , x̂_i, . . . , x_k],

where x̂_i denotes omission of x_i from the sequence.
We will write C = (Ck , ∂k )k∈Z+ to denote a chain complex, i.e. a sequence of vector
spaces with boundary maps such that ∂k−1 ◦ ∂k = 0. Given a chain complex C and any
k ∈ Z+ , the k-th homology of the chain complex C is denoted Hk (C) := ker(∂k )/ im(∂k+1 ).
The k-th Betti number of C is denoted βk (C) := dim(Hk (C)).
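These definitions can be exercised on a tiny example. A sketch computing Betti numbers of the boundary of a triangle, using floating-point matrix rank (adequate for very small complexes; helper names are ours, and the sign convention matches the boundary map defined above):

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary map from k-chains to (k-1)-chains, with
    d[x0,...,xk] = sum_i (-1)^i [x0,...,x̂i,...,xk]."""
    index = {s: j for j, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for col, s in enumerate(k_simplices):
        for i in range(len(s)):
            D[index[s[:i] + s[i + 1:]], col] = (-1) ** i
    return D

# Boundary of a triangle (a circle): three vertices, three edges, no 2-simplex.
V = [(0,), (1,), (2,)]
E = [(0, 1), (0, 2), (1, 2)]
rank_d1 = np.linalg.matrix_rank(boundary_matrix(E, V))
betti0 = len(V) - rank_d1   # dim ker(d0) - rank(d1), with d0 = 0
betti1 = len(E) - rank_d1   # dim ker(d1) - rank(d2), with d2 = 0
```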
Given a simplicial map f between simplicial complexes, we write f∗ to denote the
induced chain map between the corresponding chain complexes [88, §1.12], and (fk )# to
denote the linear map on kth homology vector spaces induced for each k ∈ Z+ .
The operations of passing from simplicial complexes and simplicial maps to chain com-
plexes and induced chain maps, and then to homology vector spaces with induced linear
maps, will be referred to as passing to homology. Recall the following useful fact, often
referred to as functoriality of homology [88, Theorem 12.2]: given a composition g ◦ f of
simplicial maps, we have

(gk ◦ fk )# = (gk )# ◦ (fk )# for each k ∈ Z+ . (3.2)


A persistence vector space is defined to be a family of vector spaces {U^δ → U^{δ′}}_{δ≤δ′∈R}, with structure maps µ_{δ,δ′} : U^δ → U^{δ′}, such that: (1) µ_{δ,δ} is the identity for each δ ∈ R, and (2) µ_{δ,δ″} = µ_{δ′,δ″} ∘ µ_{δ,δ′} for each δ ≤ δ′ ≤ δ″ ∈ R. The persistence vector spaces that we consider in this work also satisfy the following conditions: (1) dim(U^δ) < ∞ at each δ ∈ R, (2) there exist δ_I, δ_F ∈ R such that all maps µ_{δ,δ′} are isomorphisms for δ, δ′ ≥ δ_F and for δ, δ′ ≤ δ_I, and (3) there are only finitely many values of δ ∈ R such that U^{δ−ε} ≇ U^δ for each ε > 0. Here δ is referred to as a resolution parameter, and such a persistence vector space is described as being R-indexed. The collection of all such persistence vector spaces is denoted PVec(R).
Observe that by fixing k ∈ Z_+ and passing to the kth homology vector space at each step Σ^δ of a filtered simplicial complex (Σ^δ)_{δ∈R}, the functoriality of homology gives us the kth persistence vector space associated to (Σ^δ)_{δ∈R}, denoted

H_k(Σ) := {H_k(C^δ) → H_k(C^{δ′})}_{δ≤δ′∈R}, with structure maps (ι_{δ,δ′})_#.

The elements of PVec(R) contain only a finite number of vector spaces, up to isomorphism. By the classification results in [21, §5.2], it is possible to associate a full invariant, called a persistence barcode or persistence diagram, to each element of PVec(R). This barcode is a multiset of persistence intervals, and is represented as a set of lines over a single axis. The barcode of a persistence vector space V is denoted Pers(V). The intervals in Pers(V) can be represented as the persistence diagram of V, which is a multiset of points lying on or above the diagonal in R^2, counted with multiplicity. More specifically,

Dgm(V) := {(δ_i, δ_{j+1}) ∈ R^2 : [δ_i, δ_{j+1}) ∈ Pers(V)},

where the multiplicity of (δ_i, δ_{j+1}) ∈ R^2 is given by the multiplicity of [δ_i, δ_{j+1}) ∈ Pers(V).

Persistence diagrams can be compared using the bottleneck distance, which we denote
by dB . Details about this distance, as well as the other material related to persistent homol-
ogy, can be found in [26]. Numerous other formulations of the material presented above
can be found in [49, 123, 19, 47, 50, 9, 48].

Remark 152. Whenever we describe a persistence diagram as being trivial, we mean that
it does not have any off-diagonal points.

3.1.2 Interleaving distance and stability of persistence vector spaces.


In what follows, we will consider R-indexed persistence vector spaces PVec(R).
Given ε ≥ 0, two R-indexed persistence vector spaces V = {V^δ → V^{δ′}}_{δ≤δ′} (structure maps ν_{δ,δ′}) and U = {U^δ → U^{δ′}}_{δ≤δ′} (structure maps µ_{δ,δ′}) are said to be ε-interleaved [24, 9] if there exist two families of linear maps

{ϕ_δ : V^δ → U^{δ+ε}}_{δ∈R},    {ψ_δ : U^δ → V^{δ+ε}}_{δ∈R}

such that the following diagrams commute for all δ ≤ δ′ ∈ R; written as equations, the commutativity conditions are:

µ_{δ+ε,δ′+ε} ∘ ϕ_δ = ϕ_{δ′} ∘ ν_{δ,δ′},    ν_{δ+ε,δ′+ε} ∘ ψ_δ = ψ_{δ′} ∘ µ_{δ,δ′},
ψ_{δ+ε} ∘ ϕ_δ = ν_{δ,δ+2ε},    ϕ_{δ+ε} ∘ ψ_δ = µ_{δ,δ+2ε}.

The purpose of introducing ε-interleavings is to define a pseudometric on the collection


of persistence vector spaces. The interleaving distance between two R-indexed persistence
vector spaces V, U is given by:

dI (U, V) := inf{ε ≥ 0 : U and V are ε-interleaved}.

This definition induces an extended pseudometric on the collection of persistence vector


spaces [24, 9, 26].
Before stating the following lemma, recall that two simplicial maps f, g : Σ → Ξ are
contiguous if for any simplex σ ∈ Σ, f (σ) ∪ g(σ) is a simplex of Ξ. Contiguous maps
satisfy the following useful properties:

Proposition 153 (Properties of contiguous maps). Let f, g : Σ → Ξ be two contiguous
simplicial maps. Then,

1. |f |, |g| : |Σ| → |Ξ| are homotopic [112, §3.5], and

2. The chain maps induced by f and g are chain homotopic, and as a result, the induced
maps f# and g# for homology are equal [88, Theorem 12.5].
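Contiguity is a finite, directly checkable condition. A sketch, with complexes represented as sets of frozensets of vertices (our choice of representation):

```python
def is_simplicial(f, Sigma, Xi):
    """f maps vertices of Sigma to vertices of Xi; f is simplicial if the
    image of every simplex of Sigma is a simplex of Xi."""
    return all(frozenset(f[v] for v in s) in Xi for s in Sigma)

def contiguous(f, g, Sigma, Xi):
    """f, g are contiguous if f(s) ∪ g(s) is a simplex of Xi for every s."""
    return all(frozenset(f[v] for v in s) | frozenset(g[v] for v in s) in Xi
               for s in Sigma)

# Two maps of an edge into a full triangle: every union of images is a face.
Sigma = {frozenset([0]), frozenset([1]), frozenset([0, 1])}
Xi = {frozenset(s) for s in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}
f = {0: 0, 1: 1}
g = {0: 0, 1: 2}
```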

Lemma 154 (Stability Lemma, [45]). Let F, G be two filtered simplicial complexes written as

{F^δ → F^{δ′}}_{δ′≥δ∈R} and {G^δ → G^{δ′}}_{δ′≥δ∈R},

where s_{δ,δ′} and t_{δ,δ′} denote the natural inclusion maps. Suppose η ≥ 0 is such that there exist families of simplicial maps {ϕ_δ : F^δ → G^{δ+η}}_{δ∈R} and {ψ_δ : G^δ → F^{δ+η}}_{δ∈R} such that the following are satisfied for any δ ≤ δ′:

1. t_{δ+η,δ′+η} ∘ ϕ_δ and ϕ_{δ′} ∘ s_{δ,δ′} are contiguous
2. s_{δ+η,δ′+η} ∘ ψ_δ and ψ_{δ′} ∘ t_{δ,δ′} are contiguous
3. ψ_{δ+η} ∘ ϕ_δ and s_{δ,δ+2η} are contiguous
4. ϕ_{δ+η} ∘ ψ_δ and t_{δ,δ+2η} are contiguous.

These conditions amount to requiring that the simplicial analogues of the interleaving diagrams above commute up to contiguity.
For each k ∈ Z_+, let PVec_k(F), PVec_k(G) denote the k-dimensional persistent vector spaces associated to F and G. Then for each k ∈ Z_+,

d_I(PVec_k(F), PVec_k(G)) ≤ η.

3.2 Simplicial constructions

3.2.1 Stability of Vietoris-Rips and Dowker constructions


Proof of Proposition 29. Both cases are similar, so we just prove the result for PVec^{si}. Let η > 2d_N(X, Y). Then by Proposition 7, there exist maps ϕ : X → Y, ψ : Y → X such that

max(dis(ϕ), dis(ψ), C_{X,Y}(ϕ, ψ), C_{Y,X}(ψ, ϕ)) < η.

First we check that ϕ, ψ induce simplicial maps ϕ_δ : D^{si}_{δ,X} → D^{si}_{δ+η,Y} and ψ_δ : D^{si}_{δ,Y} → D^{si}_{δ+η,X} for each δ ∈ R.
Let δ′ ≥ δ ∈ R. Let σ = [x_0, . . . , x_n] ∈ D^{si}_{δ,X}. Then there exists x′ ∈ X such that ω_X(x_i, x′) ≤ δ for each 0 ≤ i ≤ n. Fix such an x′. Since dis(ϕ) < η, we have the following for each i:

|ω_X(x_i, x′) − ω_Y(ϕ(x_i), ϕ(x′))| < η.

So ω_Y(ϕ(x_i), ϕ(x′)) < ω_X(x_i, x′) + η ≤ δ + η for each 0 ≤ i ≤ n. Thus ϕ_δ(σ) := {ϕ(x_0), . . . , ϕ(x_n)} is a simplex in D^{si}_{δ+η,Y}. Thus the map on simplices ϕ_δ induced by ϕ is simplicial for each δ ∈ R.
Similarly we check that the map ψ_δ on simplices induced by ψ is simplicial. Now to prove the result, it will suffice to check the contiguity conditions in the statement of Lemma 154. Consider the maps t_{δ+η,δ′+η} ∘ ϕ_δ and ϕ_{δ′} ∘ s_{δ,δ′} from D^{si}_{δ,X} to D^{si}_{δ′+η,Y}, where s_{δ,δ′} and t_{δ+η,δ′+η} are the inclusion maps.
We claim that t_{δ+η,δ′+η} ∘ ϕ_δ and ϕ_{δ′} ∘ s_{δ,δ′} are contiguous simplicial maps. To see this, let σ ∈ D^{si}_{δ,X}. Since s_{δ,δ′} is just the inclusion, it follows that t_{δ+η,δ′+η}(ϕ_δ(σ)) ∪ ϕ_{δ′}(s_{δ,δ′}(σ)) = ϕ_δ(σ), which is a simplex in D^{si}_{δ+η,Y} because ϕ_δ is simplicial, and hence a simplex in D^{si}_{δ′+η,Y} because the inclusion t_{δ+η,δ′+η} is simplicial. Thus t_{δ+η,δ′+η} ∘ ϕ_δ and ϕ_{δ′} ∘ s_{δ,δ′} are contiguous, and their induced linear maps for homology are equal. By a similar argument, we verify that s_{δ+η,δ′+η} ∘ ψ_δ and ψ_{δ′} ∘ t_{δ,δ′} are contiguous simplicial maps as well.
Next we check that the maps ψ_{δ+η} ∘ ϕ_δ and s_{δ,δ+2η}, both from D^{si}_{δ,X} to D^{si}_{δ+2η,X} (the former factoring through D^{si}_{δ+η,Y}), are contiguous.
Let x_i ∈ σ. Note that for our fixed σ = [x_0, . . . , x_n] ∈ D^{si}_{δ,X} and x′, we have:

|ω_X(x_i, x′) − ω_X(ψ(ϕ(x_i)), ψ(ϕ(x′)))| ≤ |ω_X(x_i, x′) − ω_Y(ϕ(x_i), ϕ(x′))| + |ω_Y(ϕ(x_i), ϕ(x′)) − ω_X(ψ(ϕ(x_i)), ψ(ϕ(x′)))| < 2η.

Thus we obtain ω_X(ψ(ϕ(x_i)), ψ(ϕ(x′))) < ω_X(x_i, x′) + 2η ≤ δ + 2η.

Since this holds for any x_i ∈ σ, it follows that ψ_{δ+η}(ϕ_δ(σ)) ∈ D^{si}_{δ+2η,X}. We further claim that

τ := σ ∪ ψ_{δ+η}(ϕ_δ(σ)) = {x_0, x_1, . . . , x_n, ψ(ϕ(x_0)), . . . , ψ(ϕ(x_n))}

is a simplex in D^{si}_{δ+2η,X}. Let 0 ≤ i ≤ n. It suffices to show that ω_X(x_i, ψ(ϕ(x′))) ≤ δ + 2η. Notice that from the reformulation of d_N (Proposition 7), we have

C_{X,Y}(ϕ, ψ) = max_{(x,y)∈X×Y} |ω_X(x, ψ(y)) − ω_Y(ϕ(x), y)| < η.

Let y = ϕ(x′). Then |ω_X(x_i, ψ(y)) − ω_Y(ϕ(x_i), y)| < η. In particular,

ω_X(x_i, ψ(ϕ(x′))) < ω_Y(ϕ(x_i), ϕ(x′)) + η ≤ ω_X(x_i, x′) + 2η ≤ δ + 2η.

Since 0 ≤ i ≤ n were arbitrary, it follows that τ ∈ D^{si}_{δ+2η,X}. Thus ψ_{δ+η} ∘ ϕ_δ and s_{δ,δ+2η} are contiguous. Similarly, we use the dis(ψ) and C_{Y,X}(ψ, ϕ) terms to verify that t_{δ,δ+2η} and ϕ_{δ+η} ∘ ψ_δ are contiguous.
Since η > 2d_N(X, Y) was arbitrary, the result now follows by an application of Lemma 154.
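The Dowker sink complexes driving this proof are easy to enumerate for a small finite network: a subset σ is a simplex of D^si_{δ,X} iff some sink x′ satisfies ω_X(x, x′) ≤ δ for every x ∈ σ. A brute-force sketch (our own enumeration, exponential in the number of nodes and meant only for tiny examples):

```python
from itertools import combinations

def dowker_sink(w, delta):
    """Simplices of the Dowker sink complex at resolution delta: nonempty
    sigma ⊆ X admitting a sink x' with w[x][x'] <= delta for all x in sigma."""
    n = len(w)
    simplices = []
    for k in range(1, n + 1):
        for sigma in combinations(range(n), k):
            if any(all(w[x][xp] <= delta for x in sigma) for xp in range(n)):
                simplices.append(sigma)
    return simplices
```

Increasing δ can only add simplices, which is the filtration property used when passing to persistence.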

3.2.2 The Functorial Dowker Theorem and equivalence of diagrams


Let (X, ω_X) ∈ CN, and let δ ∈ R be such that R_{δ,X} is nonempty. By applying Dowker's theorem (Theorem 33) to the setting Y = X, we have H_k(D^{si}_{δ,X}) ≅ H_k(D^{so}_{δ,X}) for any k ∈ Z_+. We still have this equality in the case where R_{δ,X} is empty, because then D^{si}_{δ,X} and D^{so}_{δ,X} are both empty. Thus we obtain:

Corollary 155. Let (X, ω_X) ∈ CN, δ ∈ R, and k ∈ Z_+. Then,

H_k(D^{si}_{δ,X}) ≅ H_k(D^{so}_{δ,X}).
In the persistent setting, Theorem 33 and Corollary 155 suggest the following question:

Given a network (X, ω_X) and a fixed dimension k ∈ Z_+, are the persistence diagrams of the Dowker sink and source filtrations of (X, ω_X) necessarily equal?

In what follows, we provide a positive answer to the question above. Our strategy is to
use the Functorial Dowker Theorem (Theorem 35), for which we will provide a complete
proof below. The Functorial Dowker Theorem implies equality between sink and source
persistence diagrams.

Corollary 156 (Dowker duality). Let (X, ω_X) ∈ FN, and let k ∈ Z_+. Then,

Dgm^{si}_k(X) = Dgm^{so}_k(X).

Thus we may call either of the diagrams above the k-dimensional Dowker diagram of X, denoted Dgm^D_k(X).

Before proving the corollary, we state an R-indexed variant of the Persistence Equiva-
lence Theorem [47]. This particular version follows from the isometry theorem [9], and we
refer the reader to [26, Chapter 5] for an expanded presentation of this material.

Theorem 157 (Persistence Equivalence Theorem). Consider two persistent vector spaces U = {U^δ → U^{δ′}}_{δ≤δ′∈R} (structure maps µ_{δ,δ′}) and V = {V^δ → V^{δ′}}_{δ≤δ′∈R} (structure maps ν_{δ,δ′}) with connecting maps f_δ : U^δ → V^δ, forming a ladder of squares between the two families.
If the f_δ are all isomorphisms and each square in this ladder commutes, i.e. f_{δ′} ∘ µ_{δ,δ′} = ν_{δ,δ′} ∘ f_δ for all δ ≤ δ′, then:

Dgm(U) = Dgm(V).

Proof of Corollary 156. Let δ ≤ δ′ ∈ R, and consider the relations R_{δ,X} ⊆ R_{δ′,X} ⊆ X × X. Suppose first that R_{δ,X} and R_{δ′,X} are both nonempty. By applying Theorem 35, we obtain homotopy equivalences between the source and sink complexes that commute with the canonical inclusions up to homotopy. Passing to the k-th homology level, we obtain persistence vector spaces that satisfy the commutativity properties of Theorem 157. The result follows from Theorem 157.
In the case where R_{δ,X} and R_{δ′,X} are both empty, there is nothing to show because all the associated complexes are empty. Suppose R_{δ,X} is empty, and R_{δ′,X} is nonempty. Then D^{si}_{δ,X} and D^{so}_{δ,X} are empty, so their inclusions into D^{si}_{δ′,X} and D^{so}_{δ′,X} induce zero maps upon passing to homology. Thus the commutativity of Theorem 157 is satisfied, and the result follows by Theorem 157.

The proof of the Functorial Dowker Theorem. It remains to prove Theorem 35. Because
the proof involves numerous maps, we will adopt the notational convention of adding a
subscript to a function to denote its codomain—e.g. we will write fB to denote a function
with codomain B.
First we recall the construction of a combinatorial barycentric subdivision (see [46, §2],
[81, §4.7], [7, Appendix A]).
Definition 47 (Barycentric subdivisions). For any simplicial complex Σ, one may construct a new simplicial complex Σ^{(1)}, called the first barycentric subdivision, as follows:

Σ^{(1)} := {[σ_1, σ_2, ..., σ_p] : σ_1 ⊆ σ_2 ⊆ ... ⊆ σ_p, each σ_i ∈ Σ}.

Note that the vertices of Σ^{(1)} are the simplices of Σ, and the simplices of Σ^{(1)} are nested sequences of simplices of Σ. Furthermore, note that given any two simplicial complexes Σ, Ξ and a simplicial map f : Σ → Ξ, there is a natural simplicial map f^{(1)} : Σ^{(1)} → Ξ^{(1)} defined as:

f^{(1)}([σ_1, ..., σ_p]) := [f(σ_1), ..., f(σ_p)],   σ_1 ⊆ σ_2 ⊆ ... ⊆ σ_p, each σ_i ∈ Σ.

To see that this is simplicial, note that f(σ_i) ⊆ f(σ_j) whenever σ_i ⊆ σ_j. As a special case, observe that any inclusion map ι : Σ ↪ Ξ induces an inclusion map ι^{(1)} : Σ^{(1)} ↪ Ξ^{(1)}.
Given a simplex σ = [x_0, ..., x_k] in a simplicial complex Σ, one defines its barycenter to be the point B(σ) := Σ_{i=0}^{k} x_i/(k+1) ∈ |Σ|. Then the spaces |Σ^{(1)}| and |Σ| can be identified via a homeomorphism E_{|Σ|} : |Σ^{(1)}| → |Σ|, defined on vertices by E_{|Σ|}(σ) := B(σ) and extended linearly.
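The construction in Definition 47 is directly computable for small complexes. The following Python sketch (our own illustration, not part of the thesis) enumerates Σ^{(1)} as the set of chains of simplices of Σ under proper inclusion:

```python
from itertools import combinations

def closure(maximal_simplices):
    """Build a simplicial complex (set of frozensets) from its maximal simplices."""
    complex_ = set()
    for sigma in maximal_simplices:
        for k in range(1, len(sigma) + 1):
            for face in combinations(sorted(sigma), k):
                complex_.add(frozenset(face))
    return complex_

def barycentric_subdivision(complex_):
    """First barycentric subdivision: simplices are chains s1 ⊂ s2 ⊂ ... ⊂ sp."""
    simplices = sorted(complex_, key=lambda s: (len(s), sorted(s)))
    result = {(s,) for s in simplices}  # length-1 chains are the vertices of Σ^(1)
    frontier = list(result)
    while frontier:
        new_frontier = []
        for chain in frontier:
            top = chain[-1]
            for s in simplices:
                if len(s) > len(top) and top < s:  # proper inclusion extends the chain
                    ext = chain + (s,)
                    result.add(ext)
                    new_frontier.append(ext)
        frontier = new_frontier
    return result
```

For the full triangle pow({0, 1, 2}) this yields 7 vertices, 12 edges, and 6 triangles, the familiar picture of the subdivided 2-simplex.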
Details on the preceding list of definitions can be found in [88, §2.14-15, 2.19], [112, §3.3-4], and also [7, Appendix A]. The next proposition follows from the discussions in these references, and is a simple restatement of [7, Proposition A.1.5]. We provide a proof in the appendix for completeness.

Proposition 158 (Simplicial approximation to E_•). Let Σ be a simplicial complex, and let Φ : Σ^{(1)} → Σ be a simplicial map such that Φ(σ) ∈ σ for each σ ∈ Σ. Then |Φ| ≃ E_{|Σ|}.
We now introduce some auxiliary constructions, dating back to [46], that use the setup stated in Theorem 35. For any nonempty relation R ⊆ X × Y, one may define [46, §2] an associated map Φ_{E_R} : E_R^{(1)} → E_R as follows: first define Φ_{E_R} on vertices of E_R^{(1)} by Φ_{E_R}(σ) = s_σ, where s_σ is the least vertex of σ with respect to the total order. Next, for any simplex [σ_1, ..., σ_k] of E_R^{(1)}, where σ_1 ⊆ ... ⊆ σ_k, we have Φ_{E_R}(σ_i) = s_{σ_i} ∈ σ_k for all 1 ≤ i ≤ k. Thus [Φ_{E_R}(σ_1), ..., Φ_{E_R}(σ_k)] = [s_{σ_1}, s_{σ_2}, ..., s_{σ_k}] is a face of σ_k, hence a simplex of E_R. This defines Φ_{E_R} as a simplicial map E_R^{(1)} → E_R. The argument also shows that Φ_{E_R} is order-reversing: if σ ⊆ σ', then Φ_{E_R}(σ) ≥ Φ_{E_R}(σ').
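Concretely (an illustration of ours, with integer vertices so that `min` realizes the total order), the vertex map Φ sends a chain to the set of least vertices of its members, which lands inside the largest member:

```python
def phi(chain):
    """Least-vertex map Φ_{E_R} on a simplex of the subdivision.

    `chain` is a tuple of frozensets sigma_1 ⊆ ... ⊆ sigma_k (a simplex of
    E_R^(1)); the image is a face of sigma_k, hence a simplex of E_R.
    """
    image = frozenset(min(sigma) for sigma in chain)
    assert image <= chain[-1]  # sanity check: the image is a face of the top simplex
    return image
```

Note the order-reversing property on vertices: enlarging a simplex can only decrease its least vertex.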
Remark 159. Applying Proposition 158 to the setup above, one sees that |Φ_{E_R}| ≃ E_{|E_R|}. After passing to a second barycentric subdivision E_R^{(2)} (obtained by taking a barycentric subdivision of E_R^{(1)}) and obtaining a map Φ_{E_R^{(1)}} : E_R^{(2)} → E_R^{(1)}, one also has |Φ_{E_R^{(1)}}| ≃ E_{|E_R^{(1)}|}.

One also defines [46, §3] a simplicial map Ψ_{F_R} : E_R^{(1)} → F_R as follows. Given a vertex σ = [x_0, ..., x_k] ∈ E_R^{(1)}, one defines Ψ_{F_R}(σ) = y_σ, for some y_σ ∈ Y such that (x_i, y_σ) ∈ R for each i. To see why this vertex map is simplicial, let σ^{(1)} = [σ_0, ..., σ_k] be a simplex in E_R^{(1)}, and let x ∈ σ_0. Then, because σ_0 ⊆ σ_1 ⊆ ... ⊆ σ_k, we automatically have (x, Ψ_{F_R}(σ_i)) ∈ R for each i = 0, ..., k. Thus Ψ_{F_R}(σ^{(1)}) is a simplex in F_R. This definition involves a choice of y_σ when writing Ψ_{F_R}(σ) = y_σ, but all the maps resulting from such choices are contiguous [46, §3].
The preceding map induces a simplicial map Ψ_{F_R^{(1)}} : E_R^{(2)} → F_R^{(1)} as follows. Given a vertex τ^{(1)} = [τ_0, ..., τ_k] ∈ E_R^{(2)}, i.e. a simplex in E_R^{(1)}, we define Ψ_{F_R^{(1)}}(τ^{(1)}) := [Ψ_{F_R}(τ_0), ..., Ψ_{F_R}(τ_k)]. Since Ψ_{F_R} is simplicial, this is a simplex in F_R, i.e. a vertex in F_R^{(1)}. Thus we have a vertex map Ψ_{F_R^{(1)}} : E_R^{(2)} → F_R^{(1)}. To check that this map is simplicial, let τ^{(2)} = [τ^{(1)}_0, ..., τ^{(1)}_p] be a simplex in E_R^{(2)}. Then τ^{(1)}_0 ⊆ τ^{(1)}_1 ⊆ ... ⊆ τ^{(1)}_p, and because Ψ_{F_R} is simplicial, we automatically have

Ψ_{F_R}(τ^{(1)}_0) ⊆ Ψ_{F_R}(τ^{(1)}_1) ⊆ ... ⊆ Ψ_{F_R}(τ^{(1)}_p).

Thus Ψ_{F_R^{(1)}}(τ^{(2)}) is a simplex in F_R^{(1)}.

Proof of Theorem 35. We write F_R^{(2)} to denote the barycentric subdivision of F_R^{(1)}, and obtain simplicial maps Φ_{F_R^{(1)}} : F_R^{(2)} → F_R^{(1)}, Φ_{F_R} : F_R^{(1)} → F_R, Ψ_{E_R^{(1)}} : F_R^{(2)} → E_R^{(1)}, and Ψ_{E_R} : F_R^{(1)} → E_R as above. Together with the maps constructed earlier, these assemble into two subdivision towers,

E_R^{(2)} ──Φ_{E_R^{(1)}}──> E_R^{(1)} ──Φ_{E_R}──> E_R
F_R^{(2)} ──Φ_{F_R^{(1)}}──> F_R^{(1)} ──Φ_{F_R}──> F_R

interleaved by the diagonal maps Ψ_{F_R^{(1)}} : E_R^{(2)} → F_R^{(1)}, Ψ_{F_R} : E_R^{(1)} → F_R, Ψ_{E_R^{(1)}} : F_R^{(2)} → E_R^{(1)}, and Ψ_{E_R} : F_R^{(1)} → E_R. The same picture holds with R' in place of R, together with the canonical inclusions from the R-complexes into the R'-complexes.

We proceed by claiming contiguity of the following compositions.
Claim 13. More specifically:

1. Φ_{E_R} ∘ Φ_{E_R^{(1)}} and Ψ_{E_R} ∘ Ψ_{F_R^{(1)}} are contiguous.

2. Φ_{F_R} ∘ Φ_{F_R^{(1)}} and Ψ_{F_R} ∘ Ψ_{E_R^{(1)}} are contiguous.

3. Ψ_{E_R} ∘ Φ_{F_R^{(1)}} and Φ_{E_R} ∘ Ψ_{E_R^{(1)}} are contiguous.

4. Ψ_{F_R} ∘ Φ_{E_R^{(1)}} and Φ_{F_R} ∘ Ψ_{F_R^{(1)}} are contiguous.

Items (1) and (3) appear in the proof of Dowker’s theorem [46, Lemmas 5, 6], and it is
easy to see that a symmetric argument shows Items (2) and (4). For completeness, we will
verify these items in this paper, but defer this verification to the end of the proof.
By passing to geometric realizations and applying Proposition 153 and Remark 159, we obtain the following from Item (3) of Claim 13:

|Ψ_{E_R}| ∘ |Φ_{F_R^{(1)}}| ≃ |Φ_{E_R}| ∘ |Ψ_{E_R^{(1)}}|,
|Ψ_{E_R}| ∘ E_{|F_R^{(1)}|} ≃ E_{|E_R|} ∘ |Ψ_{E_R^{(1)}}|,   (Remark 159)
E_{|E_R|}^{-1} ∘ |Ψ_{E_R}| ∘ E_{|F_R^{(1)}|} ≃ |Ψ_{E_R^{(1)}}|.   (E_• is a homeomorphism, hence invertible)

Replacing this term in the expression for Item (2) of Claim 13, we obtain:

|Ψ_{F_R}| ∘ |Ψ_{E_R^{(1)}}| ≃ |Φ_{F_R}| ∘ |Φ_{F_R^{(1)}}| ≃ E_{|F_R|} ∘ E_{|F_R^{(1)}|},
|Ψ_{F_R}| ∘ E_{|E_R|}^{-1} ∘ |Ψ_{E_R}| ∘ E_{|F_R^{(1)}|} ≃ E_{|F_R|} ∘ E_{|F_R^{(1)}|},
|Ψ_{F_R}| ∘ E_{|E_R|}^{-1} ∘ |Ψ_{E_R}| ∘ E_{|F_R|}^{-1} ≃ id_{|F_R|}.

Similarly, we obtain the following from Item (4) of Claim 13:

|Ψ_{F_R}| ∘ |Φ_{E_R^{(1)}}| ≃ |Φ_{F_R}| ∘ |Ψ_{F_R^{(1)}}|,   so   E_{|F_R|}^{-1} ∘ |Ψ_{F_R}| ∘ E_{|E_R^{(1)}|} ≃ |Ψ_{F_R^{(1)}}|.

Replacing this term in the expression for Item (1) of Claim 13, we obtain:

|Ψ_{E_R}| ∘ |Ψ_{F_R^{(1)}}| ≃ |Φ_{E_R}| ∘ |Φ_{E_R^{(1)}}| ≃ E_{|E_R|} ∘ E_{|E_R^{(1)}|},
|Ψ_{E_R}| ∘ E_{|F_R|}^{-1} ∘ |Ψ_{F_R}| ∘ E_{|E_R^{(1)}|} ≃ E_{|E_R|} ∘ E_{|E_R^{(1)}|},
|Ψ_{E_R}| ∘ E_{|F_R|}^{-1} ∘ |Ψ_{F_R}| ∘ E_{|E_R|}^{-1} ≃ id_{|E_R|}.

Define Γ_{|E_R|} : |F_R| → |E_R| by Γ_{|E_R|} := |Ψ_{E_R}| ∘ E_{|F_R|}^{-1}. Then Γ_{|E_R|} is a homotopy equivalence, with homotopy inverse given by |Ψ_{F_R}| ∘ E_{|E_R|}^{-1}. This shows that |F_R| ≃ |E_R| for any nonempty relation R ⊆ X × Y.
Next we need to show that Γ_{|E_R|} commutes with the canonical inclusions up to homotopy. Consider the diagram formed by the maps Φ_{F_R} : F_R^{(1)} → F_R and Ψ_{E_R} : F_R^{(1)} → E_R, their primed counterparts Φ_{F_{R'}} and Ψ_{E_{R'}}, and the canonical inclusions ι_{F^{(1)}} : F_R^{(1)} ↪ F_{R'}^{(1)}, ι_F : F_R ↪ F_{R'}, and ι_E : E_R ↪ E_{R'} (cf. Definition 47).

Claim 14. ι_E ∘ Ψ_{E_R} and Ψ_{E_{R'}} ∘ ι_{F^{(1)}} are contiguous.

Claim 15. ι_F ∘ Φ_{F_R} and Φ_{F_{R'}} ∘ ι_{F^{(1)}} are contiguous.
Suppose Claim 15 is true. Then, upon passing to geometric realizations, we have:

|ι_F| ∘ E_{|F_R|} ≃ |ι_F| ∘ |Φ_{F_R}| ≃ |Φ_{F_{R'}}| ∘ |ι_{F^{(1)}}| ≃ E_{|F_{R'}|} ∘ |ι_{F^{(1)}}|,
E_{|F_{R'}|}^{-1} ∘ |ι_F| ∘ E_{|F_R|} ≃ |ι_{F^{(1)}}|.

Suppose also that Claim 14 is true. Then we have:

|Ψ_{E_{R'}}| ∘ |ι_{F^{(1)}}| ≃ |ι_E| ∘ |Ψ_{E_R}|,
|Ψ_{E_{R'}}| ∘ E_{|F_{R'}|}^{-1} ∘ |ι_F| ∘ E_{|F_R|} ≃ |ι_E| ∘ |Ψ_{E_R}|,
|Ψ_{E_{R'}}| ∘ E_{|F_{R'}|}^{-1} ∘ |ι_F| ≃ |ι_E| ∘ |Ψ_{E_R}| ∘ E_{|F_R|}^{-1},   i.e.
Γ_{|E_{R'}|} ∘ |ι_F| ≃ |ι_E| ∘ Γ_{|E_R|}.

This proves the theorem. It only remains to prove the various claims.
Proof of Claim 13. In proving Claim 13, we supply the proofs of Items (2) and (4). These arguments are adapted from [46, Lemmas 1, 5, and 6], where the proofs of Items (1) and (3) appeared.
For Item (2), let τ^{(2)} = [τ^{(1)}_0, ..., τ^{(1)}_k] be a simplex in F_R^{(2)}, where τ^{(1)}_0 ⊆ ... ⊆ τ^{(1)}_k is a chain of simplices in F_R^{(1)}. By the order-reversing property of the map Φ_{F_R^{(1)}}, we have Φ_{F_R^{(1)}}(τ^{(1)}_0) ⊇ Φ_{F_R^{(1)}}(τ^{(1)}_i) for each i = 0, ..., k. Define x := Ψ_{E_R}(Φ_{F_R^{(1)}}(τ^{(1)}_0)). Then (x, y) ∈ R for each y ∈ Φ_{F_R^{(1)}}(τ^{(1)}_0). But we also have (x, Φ_{F_R}(Φ_{F_R^{(1)}}(τ^{(1)}_i))) ∈ R for each i = 0, ..., k, because Φ_{F_R}(Φ_{F_R^{(1)}}(τ^{(1)}_i)) ∈ Φ_{F_R^{(1)}}(τ^{(1)}_i) ⊆ Φ_{F_R^{(1)}}(τ^{(1)}_0) for each i = 0, ..., k.
Next let 0 ≤ i ≤ k. For each τ ∈ τ^{(1)}_i, we have Ψ_{E_R}(τ) ∈ Ψ_{E_R^{(1)}}(τ^{(1)}_i) (by the definition of Ψ_{E_R^{(1)}}). Because Φ_{F_R^{(1)}}(τ^{(1)}_0) ∈ τ^{(1)}_0 ⊆ τ^{(1)}_i, we then have x = Ψ_{E_R}(Φ_{F_R^{(1)}}(τ^{(1)}_0)) ∈ Ψ_{E_R^{(1)}}(τ^{(1)}_i), which is a vertex of E_R^{(1)}, or alternatively a simplex of E_R. But then, by definition of Ψ_{F_R}, we have that (x, Ψ_{F_R}(Ψ_{E_R^{(1)}}(τ^{(1)}_i))) ∈ R. This holds for each 0 ≤ i ≤ k. Since τ^{(2)} was arbitrary, this shows that Φ_{F_R} ∘ Φ_{F_R^{(1)}} and Ψ_{F_R} ∘ Ψ_{E_R^{(1)}} are contiguous.
For Item (4), let σ^{(2)} = [σ^{(1)}_0, ..., σ^{(1)}_k] be a simplex in E_R^{(2)}, and let 0 ≤ i ≤ k. Then σ^{(1)}_0 ⊆ ... ⊆ σ^{(1)}_k, and Φ_{E_R^{(1)}}(σ^{(1)}_i) ∈ σ^{(1)}_i ⊆ σ^{(1)}_k. So Ψ_{F_R}(Φ_{E_R^{(1)}}(σ^{(1)}_i)) ∈ Ψ_{F_R^{(1)}}(σ^{(1)}_k). On the other hand, we have Ψ_{F_R^{(1)}}(σ^{(1)}_i) ⊆ Ψ_{F_R^{(1)}}(σ^{(1)}_k). Then Φ_{F_R}(Ψ_{F_R^{(1)}}(σ^{(1)}_i)) ∈ Ψ_{F_R^{(1)}}(σ^{(1)}_i) ⊆ Ψ_{F_R^{(1)}}(σ^{(1)}_k). Since i was arbitrary, this shows that Ψ_{F_R} ∘ Φ_{E_R^{(1)}} and Φ_{F_R} ∘ Ψ_{F_R^{(1)}} both map the vertices of σ^{(2)} into the simplex Ψ_{F_R^{(1)}}(σ^{(1)}_k), hence are contiguous. This concludes the proof of the claim. □
Proof of Claim 14. Let τ^{(1)} = [τ_0, τ_1, ..., τ_k] ∈ F_R^{(1)}, where τ_0 ⊆ τ_1 ⊆ ... ⊆ τ_k is a chain of simplices in F_R. Then ι_{F^{(1)}}(τ^{(1)}) = τ^{(1)}, and Ψ_{E_{R'}}(τ^{(1)}) = [x_{τ_0}, ..., x_{τ_k}] for some choice of x_{τ_i} terms. Also we have ι_E ∘ Ψ_{E_R}(τ^{(1)}) = [x'_{τ_0}, ..., x'_{τ_k}] for some other choice of x'_{τ_i} terms. For contiguity, we need to show that

[x_{τ_0}, ..., x_{τ_k}, x'_{τ_0}, ..., x'_{τ_k}] ∈ E_{R'}.

But this is easy to see: letting y ∈ τ_0, we have (x_{τ_0}, y), ..., (x_{τ_k}, y), (x'_{τ_0}, y), ..., (x'_{τ_k}, y) ∈ R'. Since τ^{(1)} was arbitrary, it follows that we have contiguity. □

Proof of Claim 15. Let τ^{(1)} = [τ_0, τ_1, ..., τ_k] ∈ F_R^{(1)}, where τ_0 ⊆ τ_1 ⊆ ... ⊆ τ_k is a chain of simplices in F_R. Then Φ_{F_R}(τ_i) ∈ τ_k for each 0 ≤ i ≤ k. Thus ι_F ∘ Φ_{F_R}(τ^{(1)}) is a face of τ_k. Similarly, Φ_{F_{R'}} ∘ ι_{F^{(1)}}(τ^{(1)}) is also a face of τ_k. Since τ^{(1)} was an arbitrary simplex of F_R^{(1)}, it follows that ι_F ∘ Φ_{F_R} and Φ_{F_{R'}} ∘ ι_{F^{(1)}} are contiguous. □
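Contiguity, used throughout the claims above, is a finite combinatorial condition: simplicial maps f, g : Σ → Ξ are contiguous when f(σ) ∪ g(σ) spans a simplex of Ξ for every simplex σ of Σ. A brute-force checker (our own sketch, with complexes encoded as sets of frozensets and vertex maps as dicts):

```python
def simplex_image(f, sigma):
    """Image of a simplex under a vertex map f (dict: vertex -> vertex)."""
    return frozenset(f[v] for v in sigma)

def are_contiguous(f, g, source, target):
    """Check contiguity: f(σ) ∪ g(σ) must be a simplex of `target` for every σ.

    `source` and `target` are simplicial complexes given as sets of frozensets.
    """
    return all(simplex_image(f, s) | simplex_image(g, s) in target
               for s in source)
```

For instance, two vertex maps collapsing an edge in different ways are contiguous exactly when the target still contains the edge spanned by their combined images.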

3.2.3 The equivalence between the finite FDT and the simplicial FNTs

In this section, we present our answer to Question 1; the proof of Theorem 39 unfolds over the course of the next few subsections.

Remark 160. By virtue of Theorem 39, we will write simplicial FNT to mean either FNT I or FNT II.

Theorem 36 implies Theorem 37

Proof of Theorem 37. Let V, V' denote the vertex sets of Σ, Σ', respectively. We define the relations R ⊆ V × I and R' ⊆ V' × I' as follows: (v, i) ∈ R ⟺ v ∈ Σ_i, and (v', i') ∈ R' ⟺ v' ∈ Σ'_{i'}. Then R ⊆ R', the set I' is finite by assumption, and so we are in the setting of the finite FDT (Theorem 36), perhaps after invoking the Axiom of Choice to obtain the total order on V'. It suffices to show that E_R = Σ, E_{R'} = Σ', F_R = N(A_Σ), and F_{R'} = N(A_{Σ'}), where E_R, E_{R'}, F_R, F_{R'} are as defined in Theorem 35.
First we claim that E_R = Σ. By the definitions of R and E_R, we have E_R = {σ ⊆ V : ∃i ∈ I, (v, i) ∈ R ∀ v ∈ σ} = {σ ⊆ V : ∃i ∈ I, v ∈ Σ_i ∀ v ∈ σ}. Let σ ∈ E_R, and let i ∈ I be such that v ∈ Σ_i for all v ∈ σ. Then σ ⊆ V(Σ_i), and since Σ_i = pow(V(Σ_i)) by the assumption about covers of simplices, we have σ ∈ Σ_i ⊆ Σ. Thus E_R ⊆ Σ. Conversely, let σ ∈ Σ. Then σ ∈ Σ_i for some i, so for all v ∈ σ we have (v, i) ∈ R. It follows that σ ∈ E_R. This shows E_R = Σ. The proof that E_{R'} = Σ' is analogous.
Next we claim that F_R = N(A_Σ). By the definition of F_R, we have F_R = {τ ⊆ I : ∃v ∈ V, (v, i) ∈ R ∀ i ∈ τ}. Let τ ∈ F_R, and let v ∈ V be such that (v, i) ∈ R for each i ∈ τ. Then ∩_{i∈τ} Σ_i ≠ ∅, and so τ ∈ N(A_Σ). Conversely, let τ ∈ N(A_Σ). Then ∩_{i∈τ} Σ_i ≠ ∅, so there exists v ∈ V such that v ∈ Σ_i for each i ∈ τ. Thus τ ∈ F_R. This shows F_R = N(A_Σ). The case for R' is analogous.
An application of Theorem 36 now completes the proof.

Theorem 38 implies Theorem 36

Proof. Let X and Y be two sets, and suppose X is finite. Let R ⊆ R' ⊆ X × Y be two relations. Consider the simplicial complexes E_R, F_R, E_{R'}, F_{R'} as defined in Theorem 35. Let V_R := V(E_R). For each x ∈ V_R, define A_x := {τ ∈ F_R : (x, y) ∈ R for all y ∈ τ}. Then A_x is a subcomplex of F_R. Furthermore, ∪_{x∈V_R} A_x = F_R: given τ ∈ F_R, there exists x ∈ X such that (x, y) ∈ R for all y ∈ τ, and so τ ∈ A_x.
Let A := {A_x : x ∈ V_R}. We have seen that A is a cover of subcomplexes for F_R. It is finite because the indexing set V_R is a subset of X, which is finite by assumption. Next we claim that N(A) = E_R. Let σ ∈ E_R. Then there exists y ∈ Y such that (x, y) ∈ R for all x ∈ σ. Thus ∩_{x∈σ} A_x ≠ ∅, and so σ ∈ N(A). Conversely, let σ ∈ N(A). Then ∩_{x∈σ} A_x ≠ ∅, and so there exists y ∈ Y such that (x, y) ∈ R for all x ∈ σ. Thus σ ∈ E_R.
Next we check that nonempty finite intersections of elements of A are contractible. Let σ ∈ N(A) = E_R, and let V_σ := ∩_{x∈σ} V(A_x) ⊆ V(F_R). We claim that ∩_{x∈σ} A_x = pow(V_σ), i.e. that the intersection is a full simplex in F_R, hence contractible. The inclusion ∩_{x∈σ} A_x ⊆ pow(V_σ) is clear, so we show the reverse inclusion. Let τ ∈ pow(V_σ), and let y ∈ τ. Then y ∈ V_σ, so y is a vertex of A_x for each x ∈ σ, i.e. (x, y) ∈ R for each x ∈ σ. This holds for each y ∈ τ, so it follows that τ ∈ ∩_{x∈σ} A_x. Thus ∩_{x∈σ} A_x = pow(V_σ). We remark that this also shows that A is a cover of simplices for F_R.
Now for each x ∈ V(E_{R'}), define A'_x := {τ ∈ F_{R'} : (x, y) ∈ R' for all y ∈ τ}, and set A' := {A'_x : x ∈ V(E_{R'})}. The same argument shows that A' is a finite cover of subcomplexes (in particular, a cover of simplices) for F_{R'} with all finite intersections either empty or contractible, and that E_{R'} = N(A'). An application of Theorem 38 now shows that |E_R| ≃ |F_R| and |E_{R'}| ≃ |F_{R'}|, via maps that commute up to homotopy with the inclusions |E_R| ↪ |E_{R'}| and |F_R| ↪ |F_{R'}|.
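The combinatorics in this proof are easy to experiment with. The sketch below (our own illustration, using brute-force enumeration, so only suitable for very small relations) builds E_R and F_R from a finite relation and checks that the nerve of the cover {A_x} recovers E_R:

```python
from itertools import combinations

def dowker_complexes(R, X, Y):
    """Dowker complexes of a finite relation R ⊆ X × Y.

    E_R: nonempty subsets of X admitting a common witness y ∈ Y;
    F_R: nonempty subsets of Y admitting a common witness x ∈ X.
    """
    E = {frozenset(s) for k in range(1, len(X) + 1)
         for s in combinations(sorted(X), k)
         if any(all((x, y) in R for x in s) for y in Y)}
    F = {frozenset(t) for k in range(1, len(Y) + 1)
         for t in combinations(sorted(Y), k)
         if any(all((x, y) in R for y in t) for x in X)}
    return E, F

def nerve_of_dowker_cover(R, E, F):
    """Nerve of the cover {A_x}, where A_x = {τ ∈ F_R : (x, y) ∈ R for all y ∈ τ}."""
    vertices = sorted({x for s in E for x in s})
    A = {x: {t for t in F if all((x, y) in R for y in t)} for x in vertices}
    return {frozenset(s) for k in range(1, len(vertices) + 1)
            for s in combinations(vertices, k)
            if set.intersection(*(A[x] for x in s))}
```

On any small example, the nerve computed this way coincides with E_R, as the proof asserts.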

Theorem 37 implies Theorem 38

We lead with some remarks about the ideas involved in this proof. Theorem 38 is a functorial statement in the sense that it concerns an arbitrary inclusion Σ ⊆ Σ'. Restricting the statement to just Σ would yield a non-functorial statement, a proof of which (via a non-functorial analogue of Theorem 37) can be obtained using techniques presented in [12] (see also [78, Theorem 15.24]). We strengthen these techniques to our functorial setting and thus obtain a proof of Theorem 38 via Theorem 37.
We first present a lemma related to barycentric subdivisions and several lemmas about
gluings and homotopy equivalences. These will be used in proving Theorem 38.
Definition 48 (Induced subcomplex). Let Σ be a simplicial complex, and let ∆ be a subcomplex. Then ∆ is an induced subcomplex if ∆ = Σ ∩ pow(V(∆)).

Lemma 161. Let Σ be a simplicial complex, and let ∆ be a subcomplex. Then ∆^{(1)} is an induced subcomplex of Σ^{(1)}, i.e. ∆^{(1)} = Σ^{(1)} ∩ pow(V(∆^{(1)})).

Proof. Let σ be a simplex of ∆^{(1)}. Then σ belongs to Σ^{(1)}, and also to the full simplex pow(V(∆^{(1)})). Thus ∆^{(1)} ⊆ Σ^{(1)} ∩ pow(V(∆^{(1)})). Conversely, let σ ∈ Σ^{(1)} ∩ pow(V(∆^{(1)})). Since σ ∈ Σ^{(1)}, we can write σ = [τ_0, ..., τ_k], where τ_0 ⊆ ... ⊆ τ_k. Since σ ∈ pow(V(∆^{(1)})) and the vertices of ∆^{(1)} are simplices of ∆, each τ_i is a simplex of ∆. Thus σ ∈ ∆^{(1)}, and the equality follows.
Lemma 162 (Carrier Lemma, [12] §4). Let X be a topological space, and let Σ be a simplicial complex. Also let f, g : X → |Σ| be any two continuous maps such that f(x), g(x) belong to the same simplex of |Σ| for every x ∈ X. Then f ≃ g.

Lemma 163 (Gluing Lemma, see Lemmas 4.2, 4.7, 4.9, [12]). Let Σ be a simplicial complex, and let U ⊆ V(Σ). Suppose |Σ ∩ pow(U)| is contractible. Then there exists a homotopy equivalence ϕ : |Σ ∪ pow(U)| → |Σ|.

The Gluing and Carrier Lemmas presented above are classical. We provide full details for the Gluing Lemma inside the proof of the following functorial generalization of Lemma 163.
Lemma 164 (Functorial Gluing Lemma). Let Σ ⊆ Σ' be two simplicial complexes. Also let U ⊆ V(Σ) and U' ⊆ V(Σ') be such that U ⊆ U'. Suppose |Σ ∩ pow(U)| and |Σ' ∩ pow(U')| are contractible. Then:

1. There exists a homotopy equivalence ϕ : |Σ ∪ pow(U)| → |Σ| such that ϕ(x) and id_{|Σ∪pow(U)|}(x) belong to the same simplex of |Σ ∪ pow(U)| for each x ∈ |Σ ∪ pow(U)|. Furthermore, the homotopy inverse is given by the inclusion ι : |Σ| ↪ |Σ ∪ pow(U)|.

2. Given a homotopy equivalence ϕ : |Σ ∪ pow(U)| → |Σ| as above, there exists a homotopy equivalence ϕ' : |Σ' ∪ pow(U')| → |Σ'| such that ϕ'|_{|Σ∪pow(U)|} = ϕ, and such that ϕ'(x) and id_{|Σ'∪pow(U')|}(x) belong to the same simplex of |Σ' ∪ pow(U')| for each x ∈ |Σ' ∪ pow(U')|. Furthermore, the homotopy inverse is given by the inclusion ι' : |Σ'| ↪ |Σ' ∪ pow(U')|.

Proof of Lemma 164. The proof uses the following fact: any continuous map of an n-sphere S^n into a contractible space Y can be continuously extended to a map of the (n+1)-disk D^{n+1} into Y, where D^{n+1} has S^n as its boundary [112, p. 27]. First we define ϕ. On |Σ|, define ϕ to be the identity. Next let σ be a minimal simplex in |pow(U) \ Σ|. By minimality, the boundary of σ (denoted Bd(σ)) belongs to |Σ ∩ pow(U)|, and to |Σ| in particular. Thus ϕ is defined on Bd(σ), which is an n-sphere for some n ≥ 0. Furthermore, ϕ maps Bd(σ) into the contractible space |Σ ∩ pow(U)|. We then use the aforementioned fact to extend ϕ continuously to all of σ, so that ϕ maps σ into |Σ ∩ pow(U)|. Furthermore, both id_{|Σ∪pow(U)|}(σ) = σ and ϕ(σ) belong to the simplex |pow(U)|. By iterating this procedure, we obtain a retraction ϕ : |Σ ∪ pow(U)| → |Σ| such that ϕ(x) and x belong to the same simplex of |Σ ∪ pow(U)| for each x ∈ |Σ ∪ pow(U)|. Thus ϕ is homotopic to id_{|Σ∪pow(U)|} by Lemma 162, and we have a homotopy equivalence:

id_{|Σ|} = ϕ ∘ ι,   ι ∘ ϕ ≃ id_{|Σ∪pow(U)|}.   (here ι := ι_{|Σ| ↪ |Σ∪pow(U)|})

For the second part of the proof, suppose that a homotopy equivalence ϕ : |Σ ∪ pow(U)| → |Σ| as above is provided. We need to extend ϕ to obtain ϕ'. Define ϕ' to be equal to ϕ on |Σ ∪ pow(U)|, and equal to the identity on G := |Σ'| \ |Σ ∪ pow(U)|. Let σ be a minimal simplex in |pow(U')| \ G. Then by minimality, Bd(σ) belongs to |Σ' ∩ pow(U')|. As before, ϕ' maps Bd(σ) into the contractible space |Σ' ∩ pow(U')|, and we extend ϕ' continuously to a map of σ into |Σ' ∩ pow(U')|. Once again, id_{|Σ'∪pow(U')|}(x) and ϕ'(x) belong to the same simplex |pow(U')| for all x ∈ σ. Iterating this procedure gives a continuous map ϕ' : |Σ' ∪ pow(U')| → |Σ'|. This map is not necessarily a retraction, because there may be a simplex σ ∈ |Σ ∪ pow(U)| ∩ |Σ'| on which ϕ' is not the identity. However, it still holds that ϕ' is continuous, and that x and ϕ'(x) belong to the same simplex for each x ∈ |Σ' ∪ pow(U')|. Thus Lemma 162 still applies to show that ϕ' is homotopic to id_{|Σ'∪pow(U')|}.
We write ι' to denote the inclusion ι' : |Σ'| ↪ |Σ' ∪ pow(U')|. By the preceding work, we have ι' ∘ ϕ' ≃ id_{|Σ'∪pow(U')|}. Next let x ∈ |Σ'|. Then either x ∈ |Σ'| ∩ |Σ ∪ pow(U)|, or x ∈ G. In the first case, we know that ϕ'(x) = ϕ(x) and id_{|Σ'|}(x) = id_{|Σ∪pow(U)|}(x) belong to the same simplex of |Σ ∪ pow(U)| by the assumption on ϕ. In the second case, we know that ϕ'(x) = x = id_{|Σ'|}(x). Thus for any x ∈ |Σ'|, the points ϕ'(x) and id_{|Σ'|}(x) belong
to the same simplex of |Σ' ∪ pow(U')|. By Lemma 162, we then have ϕ'|_{|Σ'|} ≃ id_{|Σ'|}, i.e. ϕ' ∘ ι' ≃ id_{|Σ'|}. This shows that ϕ' is the required homotopy equivalence.
Now we present the proof of Theorem 38.
Notation. Let I be an ordered set. For any subset J ⊆ I, we write (J) to denote the
sequence (j1 , j2 , j3 , . . .), where the ordering is inherited from the ordering on I.

Proof of Theorem 38. The first step is to functorially deform A_Σ and A_{Σ'} into covers of simplices while still preserving all associated homotopy types; then we will be able to apply Theorem 37. We can assume, by Lemma 161, that each subcomplex Σ_i is induced, and likewise for each Σ'_i. We start by fixing an enumeration I' = {l_1, l_2, ...}, so that I' becomes an ordered set.
Passing to covers of simplices. We now define some inductive constructions. In what follows, we will define complexes denoted Σ_•, Σ'_•, obtained by "filling in" Σ and Σ' while preserving homotopy equivalence, as well as covers of these larger complexes denoted Σ_{?,•}, Σ'_{?,•}. First define:

Σ_{(l_1)} := Σ ∪ pow(V(Σ_{l_1})) if l_1 ∈ I, and Σ_{(l_1)} := Σ otherwise;
Σ'_{(l_1)} := Σ' ∪ pow(V(Σ'_{l_1})).

Next, for all i ∈ I, define:

Σ_{i,(l_1)} := Σ_i ∪ pow(V(Σ_i) ∩ V(Σ_{l_1})) if l_1 ∈ I, and Σ_{i,(l_1)} := Σ_i otherwise.

And for all i ∈ I', define:

Σ'_{i,(l_1)} := Σ'_i ∪ pow(V(Σ'_i) ∩ V(Σ'_{l_1})).

Now by induction, suppose Σ_{(l_1,...,l_n)} and Σ_{i,(l_1,...,l_n)} are defined for all i ∈ I, and that Σ'_{(l_1,...,l_n)} and Σ'_{i,(l_1,...,l_n)} are defined for all i ∈ I'. Then we define:

Σ_{(l_1,...,l_n,l_{n+1})} := Σ_{(l_1,...,l_n)} ∪ pow(V(Σ_{l_{n+1},(l_1,...,l_n)})) if l_{n+1} ∈ I, and Σ_{(l_1,...,l_n,l_{n+1})} := Σ_{(l_1,...,l_n)} otherwise;
Σ'_{(l_1,...,l_n,l_{n+1})} := Σ'_{(l_1,...,l_n)} ∪ pow(V(Σ'_{l_{n+1},(l_1,...,l_n)})).

For all i ∈ I, we set:

Σ_{i,(l_1,...,l_{n+1})} := Σ_{i,(l_1,...,l_n)} ∪ pow(V(Σ_{i,(l_1,...,l_n)}) ∩ V(Σ_{l_{n+1},(l_1,...,l_n)})) if l_{n+1} ∈ I, and Σ_{i,(l_1,...,l_{n+1})} := Σ_{i,(l_1,...,l_n)} otherwise.

And for all i ∈ I', we set:

Σ'_{i,(l_1,...,l_{n+1})} := Σ'_{i,(l_1,...,l_n)} ∪ pow(V(Σ'_{i,(l_1,...,l_n)}) ∩ V(Σ'_{l_{n+1},(l_1,...,l_n)})).

Finally, for any n ≤ card(I'), we define A_{Σ,(l_1,...,l_n)} := {Σ_{i,(l_1,...,l_n)} : i ∈ I} and A_{Σ',(l_1,...,l_n)} := {Σ'_{i,(l_1,...,l_n)} : i ∈ I'}. We will show that these are covers of Σ_{(l_1,...,l_n)} and Σ'_{(l_1,...,l_n)}, respectively.
The next step is to prove by induction that for any n ≤ card(I'), we have |Σ| ≃ |Σ_{(l_1,...,l_n)}| and |Σ'| ≃ |Σ'_{(l_1,...,l_n)}|, that N(A_Σ) = N(A_{Σ,(l_1,...,l_n)}) and N(A_{Σ'}) = N(A_{Σ',(l_1,...,l_n)}), and that nonempty finite intersections of the new covers A_{Σ,(l_1,...,l_n)}, A_{Σ',(l_1,...,l_n)} remain contractible. For the base case n = 0, we have Σ = Σ_{()} and Σ' = Σ'_{()}, so the base case holds by assumption. We present the inductive step next.
Claim 16. For this claim, let • denote l_1, ..., l_n, where 0 < n < card(I'), and define l := l_{n+1}. Suppose the following statements are true:

1. The collections A_{Σ,(•)} and A_{Σ',(•)} are covers of Σ_{(•)} and Σ'_{(•)}.

2. The nerves of the covers are unchanged: N(A_Σ) = N(A_{Σ,(•)}) and N(A_{Σ'}) = N(A_{Σ',(•)}).

3. Each of the subcomplexes Σ_{i,(•)}, i ∈ I, and Σ'_{j,(•)}, j ∈ I', is induced in Σ_{(•)} and Σ'_{(•)}, respectively.

4. Let σ ⊆ I. If ∩_{i∈σ} Σ_{i,(•)} is nonempty, then it is contractible. Similarly, let τ ⊆ I'. If ∩_{i∈τ} Σ'_{i,(•)} is nonempty, then it is contractible.

5. We have homotopy equivalences |Σ| ≃ |Σ_{(•)}| and |Σ'| ≃ |Σ'_{(•)}| via maps that commute with the canonical inclusions.

Then the preceding statements are true for Σ_{(•,l)}, Σ'_{(•,l)}, A_{Σ,(•,l)}, and A_{Σ',(•,l)} as well.
Proof. For the first claim, we have Σ_{(•,l)} = Σ_{(•)} ∪ pow(V(Σ_{l,(•)})) ⊆ ∪_{i∈I} Σ_{i,(•,l)}. For the inclusion, we used the inductive assumption that Σ_{(•)} = ∪_{i∈I} Σ_{i,(•)}. Similarly, Σ'_{(•,l)} ⊆ ∪_{i∈I'} Σ'_{i,(•,l)}.
For the second claim, let i ∈ I. Then V(Σ_{i,(l_1)}) = V(Σ_i), and in particular we have V(Σ_{i,(•,l)}) = V(Σ_{i,(•)}) = V(Σ_i). Next observe that for any σ ⊆ I,

∩_{i∈σ} Σ_i ≠ ∅ ⟺ ∩_{i∈σ} V(Σ_i) ≠ ∅ ⟺ ∩_{i∈σ} V(Σ_{i,(•,l)}) ≠ ∅ ⟺ ∩_{i∈σ} Σ_{i,(•,l)} ≠ ∅.

Thus N(A_Σ) = N(A_{Σ,(•)}) = N(A_{Σ,(•,l)}), and similarly N(A_{Σ'}) = N(A_{Σ',(•)}) = N(A_{Σ',(•,l)}).
For the third claim, again let i ∈ I. If l ∉ I, then Σ_{i,(•,l)} = Σ_{i,(•)}, so we are done by the inductive assumption. So suppose l ∈ I. Since Σ_{i,(•)} is induced by the inductive assumption, we have:

Σ_{i,(•,l)} = Σ_{i,(•)} ∪ pow(V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)}))
           = (Σ_{(•)} ∩ pow(V(Σ_{i,(•)}))) ∪ (pow(V(Σ_{i,(•)})) ∩ pow(V(Σ_{l,(•)})))
           = (Σ_{(•)} ∪ pow(V(Σ_{l,(•)}))) ∩ pow(V(Σ_{i,(•)}))
           = Σ_{(•,l)} ∩ pow(V(Σ_{i,(•)})) = Σ_{(•,l)} ∩ pow(V(Σ_{i,(•,l)})).

Thus Σ_{i,(•,l)} is induced. The same argument holds for the I' case.
For the fourth claim, let σ ⊆ I, and suppose ∩_{i∈σ} Σ_{i,(•,l)} is nonempty. By the previous claim, each Σ_{i,(•,l)} is induced. Thus we write:

∩_{i∈σ} Σ_{i,(•,l)} = Σ_{(•,l)} ∩ pow(∩_{i∈σ} V(Σ_{i,(•,l)}))
                   = Σ_{(•,l)} ∩ pow(∩_{i∈σ} V(Σ_{i,(•)}))
                   = (Σ_{(•)} ∪ pow(V(Σ_{l,(•)}))) ∩ pow(∩_{i∈σ} V(Σ_{i,(•)}))
                   = (∩_{i∈σ} (Σ_{(•)} ∩ pow(V(Σ_{i,(•)})))) ∪ pow(∩_{i∈σ} V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)}))
                   = (∩_{i∈σ} Σ_{i,(•)}) ∪ pow(∩_{i∈σ} V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)})).

For convenience, define A := ∩_{i∈σ} Σ_{i,(•)} and B := pow(∩_{i∈σ} V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)})). Then |A| is contractible by the inductive assumption, and |B| is a full simplex, hence contractible. Also, A ∩ B has the form

(∩_{i∈σ} (Σ_{(•)} ∩ pow(V(Σ_{i,(•)})))) ∩ pow(∩_{i∈σ} V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)}))
  = Σ_{(•)} ∩ pow(∩_{i∈σ} V(Σ_{i,(•)}) ∩ V(Σ_{l,(•)}))
  = ∩_{i∈σ} Σ_{i,(•)} ∩ Σ_{l,(•)},

and the latter is contractible by the inductive assumption. Thus |A ∪ B| is contractible by Lemma 163. This proves the claim for the case σ ⊆ I; the case τ ⊆ I' is similar.
Now we proceed to the final claim. Since Σ_{l,(•)} is induced, we have Σ_{l,(•)} = Σ_{(•)} ∩ pow(V(Σ_{l,(•)})). By the contractibility assumption, we know that |Σ_{l,(•)}| is contractible. We also know that |Σ'_{l,(•)}| = |Σ'_{(•)} ∩ pow(V(Σ'_{l,(•)}))| is contractible, and by assumption V(Σ_{l,(•)}) ⊆ V(Σ'_{l,(•)}). Thus by Lemma 164, we obtain homotopy equivalences Φ_l : |Σ_{(•,l)}| → |Σ_{(•)}| and Φ'_l : |Σ'_{(•,l)}| → |Σ'_{(•)}| such that Φ'_l extends Φ_l. Furthermore, the homotopy inverses of Φ_l and Φ'_l are simply the inclusions |Σ_{(•)}| ↪ |Σ_{(•,l)}| and |Σ'_{(•)}| ↪ |Σ'_{(•,l)}|.
Now let ι : |Σ_{(•)}| → |Σ'_{(•)}| and ι_l : |Σ_{(•,l)}| → |Σ'_{(•,l)}| denote the canonical inclusions. We wish to show the equality Φ'_l ∘ ι_l = ι ∘ Φ_l. Let x ∈ |Σ_{(•,l)}|. Because Φ'_l extends Φ_l (this is why we needed the functorial gluing lemma), we have

Φ'_l(ι_l(x)) = Φ'_l(x) = Φ_l(x) = ι(Φ_l(x)).

Since x ∈ |Σ_{(•,l)}| was arbitrary, the equality follows immediately. By the inductive assumption, we already have homotopy equivalences |Σ_{(•)}| → |Σ| and |Σ'_{(•)}| → |Σ'| that commute with the canonical inclusions. Composing these maps with Φ_l and Φ'_l completes the proof of the claim. □
By the preceding work, we replace the subcomplexes Σ_l, Σ'_l by full simplices of the form Σ_{l,(•,l)}, Σ'_{l,(•,l)}. In this process the nerves remain unchanged, and the complexes Σ, Σ' are replaced by homotopy equivalent complexes Σ_{(•,l)}, Σ'_{(•,l)}. Furthermore, this process is functorial: the homotopy equivalences commute with the canonical inclusions Σ ↪ Σ_{(•,l)} and Σ' ↪ Σ'_{(•,l)}.
Repeating the inductive process in Claim 16 for all of the finitely many l ∈ I yields a simplicial complex Σ_{(I)} along with a cover of simplices A_{Σ,(I)}. We also perform the same procedure for all l ∈ I' \ I (this does not affect Σ_{(I)}) to obtain a simplicial complex Σ'_{(I')} along with a cover of simplices A_{Σ',(I')}. Furthermore, Σ_{(I)} and Σ'_{(I')} are related to Σ and Σ' by a finite sequence of homotopy equivalences that commute with the canonical inclusions. Also, we have N(A_Σ) = N(A_{Σ,(I)}) and N(A_{Σ'}) = N(A_{Σ',(I')}). Thus we obtain the following picture:

|Σ|  ≃ |Σ_{(l_1)}|  ≃ ··· ≃ |Σ_{(I)}|   ≃ |N(A_{Σ,(I)})|   ≃ |N(A_Σ)|
 │ι      │ι_{(l_1)}           │ι_{(I)}      │ι_{N,(I)}        │ι_N
 v       v                    v             v                 v
|Σ'| ≃ |Σ'_{(l_1)}| ≃ ··· ≃ |Σ'_{(I')}| ≃ |N(A_{Σ',(I')})| ≃ |N(A_{Σ'})|

By applying Theorem 37 to the block consisting of |Σ_{(I)}|, |Σ'_{(I')}|, |N(A_{Σ,(I)})|, and |N(A_{Σ',(I')})|, we obtain a square that commutes up to homotopy. Then, by composing the homotopy equivalences constructed above, we obtain a square consisting of |Σ|, |Σ'|, |N(A_Σ)|, and |N(A_{Σ'})| that commutes up to homotopy. Thus we obtain homotopy equivalences |Σ| ≃ |N(A_Σ)| and |Σ'| ≃ |N(A_{Σ'})| via maps that commute up to homotopy with the canonical inclusions.

3.2.4 Dowker persistence diagrams of cycle networks

The contents of this section rely on results in [3] and [4]. We introduce some minimalistic versions of definitions from the referenced papers for use in this section; the reader should refer to those papers for the original definitions.
Given a metric space (M, d_M) and m ∈ M, we write B(m, ε) to denote the closed ε-ball centered at m, for any ε > 0. For a subset X ⊆ M and some ε > 0, the Čech complex of X at resolution ε is defined to be the following simplicial complex:

Č(X, ε) := {σ ⊆ X : ∩_{x∈σ} B(x, ε) ≠ ∅}.

In the setting of metric spaces, the Čech complex coincides with the Dowker source and sink complexes. We will be interested in the special case where the underlying metric space
is the circle. We write S^1 to denote the circle with unit circumference. Next, for any n ∈ N, we write X_n := {0, 1/n, 2/n, ..., (n−1)/n} to denote the collection of n equally spaced points on S^1, endowed with the restriction of the arc length metric on S^1. Also let G_n denote the n-node cycle network with vertex set X_n (in contrast with X_n, here G_n is equipped with the asymmetric weights defined in §1.3.2). The connection between X_n and the Dowker complexes of the cycle networks G_n is highlighted by the following observation:

Proposition 165. Let n ∈ N. Then for any δ ∈ [0, 1], we have Č(X_n, δ/2) = D^si_{nδ,G_n}.

The scaling factor arises because G_n has diameter ∼ n, whereas X_n ⊆ S^1 has diameter ∼ 1/2. This proposition provides a pedagogical stepping stone that helps us transport results from the setting of [3] and [4] to that of the current paper.
Proof. For δ = 0, both the Čech and Dowker complexes consist of the n vertices, and are equal. Similarly, for δ = 1, both Č(X_n, 1/2) and D^si_{n,G_n} are equal to the (n−1)-simplex.
Now suppose δ ∈ (0, 1). Let σ ∈ D^si_{nδ,G_n}. Then σ is of the form [k/n, (k+1)/n, ..., ⌊k+nδ⌋/n] for some integer 0 ≤ k ≤ n−1, where the nδ-sink is ⌊k+nδ⌋/n and all the numerators are taken modulo n. We claim that σ ∈ Č(X_n, δ/2). To see this, observe that d_{S^1}(k/n, ⌊k+nδ⌋/n) ≤ δ, and so B(k/n, δ/2) ∩ B(⌊k+nδ⌋/n, δ/2) ≠ ∅. Then we have ∩_{i=0}^{⌊nδ⌋} B((k+i)/n, δ/2) ≠ ∅, and so σ ∈ Č(X_n, δ/2).
Now let σ ∈ Č(X_n, δ/2). Then σ is of the form [k/n, (k+1)/n, ..., (k+j)/n] for some integer 0 ≤ k ≤ n−1, where j is an integer such that j/n ≤ δ. In this case, we have σ = X_n ∩ (∩_{i=0}^{j} B((k+i)/n, δ)). Then in G_n, after applying the scaling factor n, we have σ ∈ D^si_{nδ,G_n}, with (k+j)/n as an nδ-sink in G_n. This shows equality of the two simplicial complexes.
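Proposition 165 can be verified by brute force on small examples. In the sketch below (our own illustration; it assumes the cycle-network weights ω(i, j) = (j − i) mod n for the weights of §1.3.2, and uses exact rational arithmetic), closed arcs of length less than the full circle intersect iff some arc endpoint lies in every arc, so only finitely many candidate points need testing:

```python
from fractions import Fraction
from itertools import combinations

def dowker_sink_cycle(n, t):
    """D^si_{t,G_n} on vertices {0,...,n-1}, assuming weights ω(i,j) = (j-i) mod n."""
    pts = range(n)
    return {frozenset(s) for k in range(1, n + 1) for s in combinations(pts, k)
            if any(all((p - x) % n <= t for x in s) for p in pts)}

def cech_circle(n, r):
    """Čech complex of n equally spaced points on the circle of circumference 1.

    Closed arcs of length 2r < 1 have a common point iff some arc endpoint
    lies in every arc, so only those candidate points are tested.
    """
    pts = [Fraction(k, n) for k in range(n)]

    def dist(a, b):  # arc-length metric on the unit-circumference circle
        f = abs(a - b) % 1
        return min(f, 1 - f)

    candidates = [(p + s * r) % 1 for p in pts for s in (1, -1)]
    return {frozenset(s) for k in range(1, n + 1) for s in combinations(pts, k)
            if any(all(dist(x, c) <= r for x in s) for c in candidates)}
```

Relabeling k/n ↦ k, one can check Č(X_n, δ/2) = D^si_{nδ,G_n} for small n and various δ ∈ (0, 1).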

Theorem 166 (Theorem 3.5, [4]). Fix n ∈ N, and let 0 ≤ k ≤ n − 2 be an integer. Then,

Č(X_n, k/(2n)) ≃ ∨^{n−k−1} S^{2l}   if k/n = l/(l+1),
Č(X_n, k/(2n)) ≃ S^{2l+1}            if l/(l+1) < k/n < (l+1)/(l+2),

for some l ∈ Z₊. Here ∨ denotes the wedge sum, and ≃ denotes homotopy equivalence.

Theorem 44 (Even dimension). Fix n ∈ N, n ≥ 3. If l ∈ N is such that n is divisible by (l+1), and k := nl/(l+1) is such that 0 ≤ k ≤ n − 2, then Dgm^D_{2l}(G_n) consists of precisely the point (nl/(l+1), nl/(l+1) + 1) with multiplicity n/(l+1) − 1. If l or k do not satisfy the conditions above, then Dgm^D_{2l}(G_n) is trivial.

Proof of Theorem 44. Let l ∈ N be such that (l+1) divides n and 0 ≤ k ≤ n − 2. Then D^si_{k,G_n} = Č(X_n, k/(2n)) has the homotopy type of a wedge sum of (n − k − 1) copies of S^{2l}, by Theorem 166; here the equality of complexes follows from Proposition 165. Notice that n − k − 1 = n/(l+1) − 1. Furthermore, by another application of Theorem 166, it is always possible to choose ε > 0 small enough that D^si_{k−ε,G_n} = Č(X_n, (k−ε)/(2n)) and D^si_{k+ε,G_n} = Č(X_n, (k+ε)/(2n)) have the homotopy types of odd-dimensional spheres. Thus the inclusions D^si_{k−ε,G_n} ⊆ D^si_{k,G_n} ⊆ D^si_{k+ε,G_n} induce zero maps upon passing to homology. It follows that Dgm^D_{2l}(G_n) consists of the point (nl/(l+1), nl/(l+1) + 1) with multiplicity n/(l+1) − 1.
If l ∈ N does not satisfy the condition described above, then there does not exist an integer 1 ≤ j ≤ n − 2 such that j/n = l/(l+1). So for each 1 ≤ j ≤ n − 2, D^si_{j,G_n} = Č(X_n, j/(2n)) has the homotopy type of an odd-dimensional sphere by Theorem 166, and thus does not contribute to Dgm^D_{2l}(G_n). If l satisfies the condition but k ≥ n − 1, then Č(X_n, k/(2n)) is just the (n−1)-simplex, hence contractible.
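The content of Theorem 44 is purely arithmetic once n and l are given. A small sketch (our own, simply transcribing the formulas in the statement of the theorem):

```python
def dowker_diagram_even(n, l):
    """Dgm^D_{2l}(G_n) per Theorem 44, as a dict {(birth, death): multiplicity}."""
    if n % (l + 1) != 0:
        return {}
    k = n * l // (l + 1)  # birth time nl/(l+1), an integer in this case
    if k > n - 2:
        return {}
    return {(k, k + 1): n // (l + 1) - 1}
```

For example, the 6-node cycle network has a 2-dimensional diagram consisting of (3, 4) with multiplicity 2, while a 5-node cycle has a trivial 2-dimensional diagram.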
Theorem 44 gives a characterization of the even-dimensional Dowker persistence diagrams of cycle networks. The most interesting case occurs when considering the 2-dimensional diagrams: we see that cycle networks with an even number of nodes have an interesting barcode, even if the bars are all short-lived. For dimensions 4, 6, 8, and beyond, there are fewer and fewer cycle networks with nontrivial barcodes (in the sense that only cycle networks with number of nodes equal to a multiple of 4, 6, 8, and so on have nontrivial barcodes). For a complete picture, it is necessary to look at the odd-dimensional persistence diagrams; this is made possible by the next set of constructions.
We have already recalled the definition of a Rips complex of a metric space. To facilitate
comparison with [3], we temporarily adopt the notation VR(X, ε) to
denote the Vietoris-Rips complex of a metric space (X, d_X) at resolution ε > 0, i.e. the
simplicial complex {σ ⊆ X : diam(σ) ≤ ε}.
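The diameter condition above is entirely combinatorial, so it can be evaluated directly for a small finite metric space. The following is a minimal illustrative sketch (ours, not code from this thesis); the function name `vietoris_rips` and the matrix representation `dist` are our assumptions.

```python
from itertools import combinations

def vietoris_rips(dist, eps, max_dim=2):
    """Simplices of VR(X, eps) up to dimension max_dim, where X = {0, ..., n-1}
    and dist is a symmetric pairwise distance matrix. A subset sigma is included
    iff diam(sigma) = max pairwise distance within sigma is <= eps."""
    n = len(dist)
    simplices = []
    for k in range(1, max_dim + 2):  # a k-vertex simplex has dimension k - 1
        for sigma in combinations(range(n), k):
            if all(dist[i][j] <= eps for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Four collinear points 0, 1, 2, 3 with distance |i - j|
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
print(vietoris_rips(dist, 1.0, max_dim=1))
# → [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3)]
```

At resolution 1 only consecutive points form edges; raising eps to 2 would add the triangles (0, 1, 2) and (1, 2, 3).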

Theorem 167 (Theorem 9.3, Proposition 9.5, [3]). Let 0 < r < 1/2. Then there exist a
map T_r : pow(S^1) → pow(S^1) and a map π_r : S^1 → S^1 such that there is an induced
homotopy equivalence

    π_r : VR(T_r(X), 2r/(1+2r)) --≃--> Č(X, r).

Next suppose X ⊆ S^1 and let 0 < r ≤ r' < 1/2. Then there exists a map η : S^1 → S^1 such
that the following diagram commutes:

    VR(T_r(X), 2r/(1+2r)) ----η----> VR(T_{r'}(X), 2r'/(1+2r'))
            |                                |
       π_r, ≃                           π_{r'}, ≃
            ↓                                ↓
        Č(X, r)    ⊂------------------>  Č(X, r')

Theorem 168. Consider the setup of Theorem 167. If Č(X, r) and Č(X, r0 ) are homotopy
equivalent, then the inclusion map between them is a homotopy equivalence.

Before providing the proof, we show how it implies Theorem 45.

Theorem 45 (Odd dimension). Fix n ∈ N, n ≥ 3. Then for l ∈ N, define
M_l := {m ∈ N : nl/(l+1) < m < n(l+1)/(l+2)}. If M_l is empty, then Dgm^D_{2l+1}(G_n) is trivial. Otherwise,
we have:

    Dgm^D_{2l+1}(G_n) = {(a_l, ⌈n(l+1)/(l+2)⌉)},

where a_l := min{m ∈ M_l}. We use set notation (instead of multisets) to mean that the
multiplicity is 1.
Proof of Theorem 45. By Proposition 165 and Theorem 166, we know that D^si_{k,G_n} = Č(X_n, k/(2n)) ≃
S^1 for integers 0 < k < n/2. Let b ∈ N be the greatest integer less than n/2. Then
by Theorem 168, we know that each inclusion map in the following chain is a homotopy
equivalence:

    D^si_{1,G_n} ⊆ . . . ⊆ D^si_{b,G_n} = D^si_{⌈n/2⌉−,G_n}.

It follows that Dgm^D_1(G_n) = {(1, ⌈n/2⌉)}. The notation in the last equality means that
D^si_{b,G_n} = D^si_{δ,G_n} for all δ ∈ [b, b + 1), where b + 1 = ⌈n/2⌉.
In the more general case, let l ∈ N and let M_l be as in the statement of the result.
Suppose first that M_l is empty. Then by Proposition 165 and Theorem 166, we know
that D^si_{k,G_n} has the homotopy type of a wedge of even-dimensional spheres or an odd-
dimensional sphere of dimension strictly different from (2l + 1), for any choice of integer
k. Thus Dgm^D_{2l+1}(G_n) is trivial.
Next suppose M_l is nonempty. By another application of Proposition 165 and Theorem
166, we know that D^si_{k,G_n} = Č(X_n, k/(2n)) ≃ S^{2l+1} for integers nl/(l+1) < k < n(l+1)/(l+2). Write
a_l := min{m ∈ M_l} and b_l := max{m ∈ M_l}. Then by Theorem 168, we know that each
inclusion map in the following chain is a homotopy equivalence:

    D^si_{a_l,G_n} ⊆ . . . ⊆ D^si_{b_l,G_n} = D^si_{⌈n(l+1)/(l+2)⌉−,G_n}.

It follows that Dgm^D_{2l+1}(G_n) = {(a_l, ⌈n(l+1)/(l+2)⌉)}.

It remains to provide a proof of Theorem 168. For this, we need some additional ma-
chinery.
Cyclic maps and winding fractions. We introduce some more terms from [3], but for ef-
ficiency, we minimize the scope of the definitions to only what is needed for our
purpose. Recall that we write S^1 to denote the circle with unit circumference. Thus we
naturally identify any x ∈ S^1 with a point in [0, 1). We fix a choice of 0 ∈ S^1, and for any
x, x' ∈ S^1, the length of a clockwise arc from x to x' is denoted by d^→_{S^1}(x, x'). Then, for any
finite subset X ⊆ S^1 and any r ∈ (0, 1/2), the directed Vietoris-Rips graph VR^→(X, r) is
defined to be the graph with vertex set X and edge set {(x, x') : 0 < d^→_{S^1}(x, x') < r}. Next,
let G^→ be a directed Vietoris-Rips graph whose vertices are enumerated as x_0, x_1, . . . , x_{n−1},
according to the clockwise order in which they appear. A cyclic map between G^→ and a
directed Vietoris-Rips graph H^→ is a map of vertices f such that for each edge (x, x') ∈ G^→, we
have either f(x) = f(x'), or (f(x), f(x')) ∈ H^→, and ∑_{i=0}^{n−1} d^→_{S^1}(f(x_i), f(x_{i+1})) = 1. Here
x_n := x_0.
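To make the construction concrete, here is a small sketch (ours, not from [3]) of the directed Vietoris-Rips graph for points of S^1 represented as numbers in [0, 1), with the clockwise arc length computed modulo 1; the function names are our assumptions.

```python
def clockwise_arc(x, y):
    """Length of the clockwise arc from x to y on the circle of unit circumference,
    with points represented as numbers in [0, 1)."""
    return (y - x) % 1.0

def directed_vr_graph(points, r):
    """Edge set of the directed Vietoris-Rips graph:
    pairs (x, x') with 0 < clockwise_arc(x, x') < r."""
    return [(x, y) for x in points for y in points if 0 < clockwise_arc(x, y) < r]

# Four equally spaced points: each reaches only its clockwise neighbor at r = 0.3
X = [0.0, 0.25, 0.5, 0.75]
print(directed_vr_graph(X, 0.3))
# → [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 0.0)]
```

Note the wrap-around edge (0.75, 0.0): the clockwise arc from 0.75 back to 0 has length 0.25.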

Next, the winding fraction of a directed Vietoris-Rips graph G^→ with vertex set V(G^→) is defined
to be the infimum of numbers k/n such that there is an order-preserving map V(G^→) → Z/nZ
such that each edge is mapped to a pair of numbers at most k apart. A key property of the
winding fraction, denoted wf, is that if there is a cyclic map between directed Vietoris-Rips
graphs G^→ → H^→, then wf(G^→) ≤ wf(H^→).
Theorem 169 (Corollary 4.5, Proposition 4.9, [3]). Let X ⊆ S^1 be a finite set and let
0 < r < 1/2. Then,

    VR(X, r) ≃ S^{2l+1}    if l/(2l+1) < wf(VR^→(X, r)) < (l+1)/(2l+3) for some l ∈ Z_+,
    VR(X, r) ≃ ⋁_j S^{2l}   if wf(VR^→(X, r)) = l/(2l+1), for some j ∈ N.

Next let X' ⊆ S^1 be another finite set, and let r ≤ r' < 1/2. Suppose f : VR^→(X, r) →
VR^→(X', r') is a cyclic map between directed Vietoris-Rips graphs and l/(2l+1) < wf(VR^→(X, r)) ≤
wf(VR^→(X', r')) < (l+1)/(2l+3). Then f induces a homotopy equivalence between VR(X, r) and
VR(X', r').
We now have the ingredients for a proof of Theorem 168.
Proof of Theorem 168. Since the maps π_r and π_{r'} induce homotopy equivalences, it follows
that

    VR(T_r(X), 2r/(1+2r)) ≃ VR(T_{r'}(X), 2r'/(1+2r')).

By the characterization result in Theorem 169, we know that there exists l ∈ Z_+ such that

    l/(2l+1) < wf(VR^→(T_r(X), 2r/(1+2r))) ≤ wf(VR^→(T_{r'}(X), 2r'/(1+2r'))) < (l+1)/(2l+3).

The map η in Theorem 167 appears in [3, Proposition 9.5] through an explicit construction.
Moreover, it is shown that η induces a cyclic map

    VR^→(T_r(X), 2r/(1+2r)) → VR^→(T_{r'}(X), 2r'/(1+2r')).

Thus by Theorem 169, η induces a homotopy equivalence between VR(T_r(X), 2r/(1+2r)) and
VR(T_{r'}(X), 2r'/(1+2r')). Finally, the commutativity of the diagram in Theorem 167 shows that
the inclusion Č(X, r) ⊆ Č(X, r') induces a homotopy equivalence.
Remark 170. The analogue of Theorem 168 for Čech complexes appears as Proposition
4.9 of [3] for Vietoris–Rips complexes. We prove Theorem 168 by connecting Čech and
Vietoris-Rips complexes using Proposition 9.5 of [3]. However, as remarked in §9 of [3],
one could prove Theorem 168 directly using a parallel theory of winding fractions for Čech
complexes.

3.3 Persistent path homology

3.3.1 Digraph maps and functoriality


A digraph map between two digraphs G_X = (X, E_X) and G_Y = (Y, E_Y) is a map
f : X → Y such that for any edge (x, x') ∈ E_X, we have:

    either f(x) = f(x'), or (f(x), f(x')) ∈ E_Y.
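The condition above is easy to check mechanically. Below is a small illustrative sketch (the function name and data representation are ours): a vertex map is given as a dictionary, and edges as sets of ordered pairs.

```python
def is_digraph_map(f, edges_X, edges_Y):
    """Check the digraph map condition: for every edge (x, x') of G_X,
    either f(x) = f(x') or (f(x), f(x')) is an edge of G_Y."""
    return all(f[x] == f[y] or (f[x], f[y]) in edges_Y for (x, y) in edges_X)

edges_X = {('a', 'b'), ('b', 'c')}
edges_Y = {(0, 1)}
print(is_digraph_map({'a': 0, 'b': 0, 'c': 1}, edges_X, edges_Y))
# → True: the edge (a, b) is collapsed, and (b, c) maps onto the edge (0, 1)
print(is_digraph_map({'a': 0, 'b': 1, 'c': 0}, edges_X, edges_Y))
# → False: (b, c) would need the edge (1, 0), which G_Y lacks
```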
To extend path homology constructions to a persistent framework, we need to verify
the functoriality of path homology. As a first step, one must understand how digraph maps
transform into maps between vector spaces. Some of the material below can be found
in [63]; we contribute a statement and verification of the functoriality of path homology
(Proposition 172) that is central to the PPH framework (Definition 20).
Let X, Y be two sets, and let f : X → Y be a set map. For each dimension p ∈ Z+ ,
one defines a map (f∗ )p : Λp (X) → Λp (Y ) to be the linearization of the following map on
generators: for any generator [x0 , . . . , xp ] ∈ Λp (X),

(f∗ )p ([x0 , . . . , xp ]) := [f (x0 ), f (x1 ), . . . , f (xp )].

Note also that for any p ∈ Z_+ and any generator [x_0, . . . , x_p] ∈ Λ_p(X), we have:

    (f_*)_{p−1} ∘ ∂_p^{nr}([x_0, . . . , x_p]) = ∑_{i=0}^p (−1)^i (f_*)_{p−1}([x_0, . . . , \widehat{x_i}, . . . , x_p])
                                     = ∑_{i=0}^p (−1)^i [f(x_0), . . . , \widehat{f(x_i)}, . . . , f(x_p)]
                                     = ∂_p^{nr} ∘ (f_*)_p([x_0, . . . , x_p]).

It follows that f∗ := ((f∗ )p )p∈Z+ is a chain map from (Λp (X), ∂pnr )p∈Z+ to (Λp (Y ), ∂pnr )p∈Z+ .
Let p ∈ Z+ . Note that (f∗ )p (Ip (X)) ⊆ Ip (Y ), so (f∗ )p descends to a map on quotients

(fe∗ )p : Λp (X)/Ip (X) → Λp (Y )/Ip (Y )

which is well-defined. For convenience, we will abuse notation to denote the map on
quotients by (f∗ )p as well. Thus we obtain an induced map (f∗ )p : Rp (X) → Rp (Y ).
Since p ∈ Z+ was arbitrary, we get that f∗ is a chain map from (Rp (X), ∂p )p∈Z+ to
(Rp (Y ), ∂p )p∈Z+ . The operation of this chain map is as follows: for each p ∈ Z+ and
any generator [x0 , . . . , xp ] ∈ Rp (X),

    (f_*)_p([x_0, . . . , x_p]) := [f(x_0), . . . , f(x_p)]  if f(x_i), f(x_{i+1}) are distinct for each 0 ≤ i ≤ p − 1,
    (f_*)_p([x_0, . . . , x_p]) := 0                         otherwise.

We refer to f∗ as the chain map induced by the set map f : X → Y .
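The collapsing behavior of this chain map on a single generator can be illustrated as follows (an informal sketch, ours; `None` stands in for the zero element of R_p(Y)).

```python
def induced_on_generator(f, path):
    """Image of a generator [x_0, ..., x_p] of R_p(X) under the induced map (f_*)_p:
    apply f vertex-wise; if two consecutive images coincide, the image path is
    irregular and the generator is sent to 0 (represented here by None)."""
    image = [f[x] for x in path]
    if any(image[i] == image[i + 1] for i in range(len(image) - 1)):
        return None  # collapsed: the generator maps to 0
    return image

f = {'a': 0, 'b': 1, 'c': 1}
print(induced_on_generator(f, ['a', 'b']))       # → [0, 1]
print(induced_on_generator(f, ['a', 'b', 'c']))  # → None, since f(b) = f(c)
```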


Now given two digraphs GX = (X, EX ), GY = (Y, EY ) and a digraph map f :
GX → GY , one may use the underlying set map f : X → Y to induce a chain map
f∗ : R• (X) → R• (Y ). As one could hope, the restriction of the chain map f∗ to the chain
complex of ∂-invariant paths on GX maps into the chain complex of ∂-invariant paths on
GY , and moreover, is a chain map. We state this result as a proposition below, and provide
a reference for the proof.

Proposition 171 (Theorem 2.10, [63]). Let GX = (X, EX ), GY = (Y, EY ) be two di-
graphs, and let f : GX → GY be a digraph map. Let f∗ : R• (X) → R• (Y ) denote
the chain map induced by the underlying set map f : X → Y . Let (Ωp (GX ), ∂pGX )p∈Z+ ,
(Ωp (GY ), ∂pGY )p∈Z+ denote the chain complexes of the ∂-invariant paths associated to each
of these digraphs. Then (f∗ )p (Ωp (GX )) ⊆ Ωp (GY ) for each p ∈ Z+ , and the restriction of
f∗ to Ω• (GX ) is a chain map.
Henceforth, given two digraphs G, G0 and a digraph map f : G → G0 , we refer to the
chain map f∗ given by Proposition 171 as the chain map induced by the digraph map f .
Because f∗ is a chain map, we then obtain an induced linear map (f# )p : Hp (G) → Hp (G0 )
for each p ∈ Z+ .
The preceding concepts are necessary for developing the theory of path homology. We
use this setup to state and prove the following result, which is used in defining PPH (Defi-
nition 20) and also in proving stability (Theorem 56).
Proposition 172 (Functoriality of path homology). Let G, G0 , G00 be three digraphs.
1. Let idG : G → G be the identity digraph map. Then (idG# )p : Hp (G) → Hp (G) is
the identity linear map for each p ∈ Z+ .

2. Let f : G → G0 , g : G0 → G00 be digraph maps. Then ((g ◦ f )# )p = (g# )p ◦ (f# )p


for any p ∈ Z+ .
Proof. Let p ∈ Z+ . In each case, it suffices to verify the operations on generators of
Ωp (G). Let [x0 , . . . , xp ] ∈ Ωp (G). We will write idG∗ to denote the chain map induced by
the digraph map idG . First note that

(idG∗ )p ([x0 , . . . , xp ]) = [idG (x0 ), . . . , idG (xp )] = [x0 , . . . , xp ].

It follows that (idG∗ )p is the identity linear map on Ωp (G), and thus (idG# )p is the identity
linear map on Hp (G). For the second claim, suppose first that pairs of consecutive elements
of g(f (x0 )), . . . , g(f (xp )) are all distinct. This implies that pairs of consecutive elements
of f (x0 ), . . . , f (xp ) are also all distinct, and we observe:

    ((g ∘ f)_*)_p([x_0, . . . , x_p]) = [g(f(x_0)), . . . , g(f(x_p))]     (consecutive g(f(x_i)), g(f(x_{i+1})) distinct)
                                   = (g_*)_p([f(x_0), . . . , f(x_p)])   (consecutive f(x_i), f(x_{i+1}) distinct)
                                   = (g_*)_p((f_*)_p([x_0, . . . , x_p])).

Next suppose that for some 0 ≤ i < p, we have g(f (xi )) = g(f (xi+1 )). Then we obtain:

((g ◦ f )∗ )p ([x0 , . . . , xp ]) = 0 = (g∗ )p (f∗ )p ([x0 , . . . , xp ]) .

It follows that ((g◦f )∗ )p = (g∗ )p ◦(f∗ )p . The statement of the proposition now follows.
Remark 173. We thank Paul Ignacio for pointing out an error in a version of the preceding
proof that appeared in [40].

3.3.2 Homotopy of digraphs
The constructions of path homology are accompanied by a theory of homotopy devel-
oped in [63]. An illustrated example is provided in Figure 3.1.

Figure 3.1: Directed d-cubes that are all homotopy equivalent.

Let G_X = (X, E_X), G_Y = (Y, E_Y) be two digraphs. The product digraph G_X × G_Y =
(X × Y, E_{X×Y}) is defined as follows:

    X × Y := {(x, y) : x ∈ X, y ∈ Y}, and
    E_{X×Y} := {((x, y), (x', y')) ∈ (X × Y)^2 : x = x' and (y, y') ∈ E_Y,
                or y = y' and (x, x') ∈ E_X}.
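A minimal sketch of this construction (our names, not from [63]): an edge of the product steps in exactly one coordinate, along an edge of the corresponding factor.

```python
def product_digraph(VX, EX, VY, EY):
    """Vertex and edge sets of the product digraph G_X x G_Y: an edge changes
    exactly one coordinate along an edge of the corresponding factor digraph."""
    V = [(x, y) for x in VX for y in VY]
    E = [((x, y), (x2, y2)) for (x, y) in V for (x2, y2) in V
         if (x == x2 and (y, y2) in EY) or (y == y2 and (x, x2) in EX)]
    return V, E

# Product of two copies of the one-edge digraph 0 -> 1: the directed square
V, E = product_digraph([0, 1], {(0, 1)}, [0, 1], {(0, 1)})
print(sorted(E))
# → [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
```

Iterating the product with the one-edge digraph produces the directed d-cubes of the kind illustrated in Figure 3.1.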

Next, the line digraphs I^+ and I^− are defined to be the two-point digraphs with vertices
{0, 1} and edges (0, 1) and (1, 0), respectively. Two digraph maps f, g : G_X → G_Y are
one-step homotopic if there exists a digraph map F : G_X × I → G_Y, where I ∈ {I^+, I^−},
such that:

    F|_{G_X×{0}} = f and F|_{G_X×{1}} = g.

This condition is equivalent to requiring that:

    for each x ∈ X, either f(x) = g(x) or (f(x), g(x)) ∈ E_Y; or else,
    for each x ∈ X, either g(x) = f(x) or (g(x), f(x)) ∈ E_Y.

Moreover, f and g are homotopic, denoted f ≃ g, if there is a finite sequence of digraph
maps f_0 = f, f_1, . . . , f_n = g : G_X → G_Y such that f_i, f_{i+1} are one-step homotopic for
each 0 ≤ i ≤ n − 1. The digraphs G_X and G_Y are homotopy equivalent if there exist
digraph maps f : G_X → G_Y and g : G_Y → G_X such that g ∘ f ≃ id_{G_X} and f ∘ g ≃ id_{G_Y}.
An example of digraph homotopy equivalence is illustrated in Figure 3.1. Informally,
the homotopy equivalence is given by “crushing” the orange arrows in the directions they
indicate. This operation successively crushes the 4-dimensional tesseract to the 3-cube, then
to the 2-square, to the line segment, and finally to the point.
The concept of homotopy yields the following theorem on path homology groups:

Theorem 174 (Theorem 3.3, [63]). Let G, G' be two digraphs.

1. Let f, g : G → G' be two homotopic digraph maps. Then these maps induce identical
maps on homology vector spaces. More precisely, the following maps are identical
for each p ∈ Z_+:

    (f_#)_p : H_p(G) → H_p(G')   and   (g_#)_p : H_p(G) → H_p(G').

2. If G and G' are homotopy equivalent, then H_p(G) ≅ H_p(G') for each p ∈ Z_+.

3.3.3 The Persistent Path Homology of a Network


Proof of Theorem 56. Let η > 2dN (X , Y). By virtue of Proposition 7, we obtain maps
ϕ : X → Y and ψ : Y → X such that dis(ϕ) < η, dis(ψ) < η, CX,Y (ϕ, ψ) < η, and
CY,X (ψ, ϕ) < η.
Claim 17. For each δ ∈ R, the map ϕ induces a digraph map ϕ_δ : G^δ_X → G^{δ+η}_Y given by
x ↦ ϕ(x), and the map ψ induces a digraph map ψ_δ : G^δ_Y → G^{δ+η}_X given by y ↦ ψ(y).
Proof. Let δ ∈ R, and let (x, x') ∈ E^δ_X. Then A_X(x, x') ≤ δ. Because dis(ϕ) < η, we
have A_Y(ϕ(x), ϕ(x')) < δ + η. Thus (ϕ(x), ϕ(x')) ∈ E^{δ+η}_Y, and so ϕ_δ is a digraph map.
Similarly, ψ_δ is a digraph map. Since δ ∈ R was arbitrary, the claim now follows. □
Claim 18. Let δ ≤ δ' ∈ R, and let s_{δ,δ'}, t_{δ+η,δ'+η} denote the digraph inclusion maps
G^δ_X ↪ G^{δ'}_X and G^{δ+η}_Y ↪ G^{δ'+η}_Y, respectively. Then ϕ_{δ'} ∘ s_{δ,δ'} and t_{δ+η,δ'+η} ∘ ϕ_δ are one-step
homotopic.

Proof. Let x ∈ X. We wish to show ϕ_{δ'}(s_{δ,δ'}(x)) = t_{δ+η,δ'+η}(ϕ_δ(x)). But notice that:
ϕ_{δ'}(s_{δ,δ'}(x)) = ϕ_{δ'}(x) = ϕ(x),
where the first equality holds because s_{δ,δ'} is the inclusion map and the second is by
definition of ϕ_{δ'}. Similarly, t_{δ+η,δ'+η}(ϕ_δ(x)) = t_{δ+η,δ'+η}(ϕ(x)) = ϕ(x). Thus we
obtain ϕ_{δ'}(s_{δ,δ'}(x)) = t_{δ+η,δ'+η}(ϕ_δ(x)). Since x was arbitrary, it follows that ϕ_{δ'} ∘ s_{δ,δ'} and
t_{δ+η,δ'+η} ∘ ϕ_δ are one-step homotopic (indeed, equal). □
Claim 19. Let δ ∈ R, and let s_{δ,δ+2η} denote the digraph inclusion map G^δ_X ↪ G^{δ+2η}_X. Then
s_{δ,δ+2η} and ψ_{δ+η} ∘ ϕ_δ are one-step homotopic.

Proof. Recall that C_{X,Y}(ϕ, ψ) < η, which means that for any x ∈ X, y ∈ Y, we have:

    |A_X(x, ψ(y)) − A_Y(ϕ(x), y)| < η.

Let x ∈ X, and let y = ϕ(x). Notice that s_{δ,δ+2η}(x) = x and ψ_{δ+η}(ϕ_δ(x)) = ψ(ϕ(x)).
Also note:

    A_X(x, ψ(ϕ(x))) ≤ η + A_Y(ϕ(x), ϕ(x)) ≤ δ + 2η.

Thus (s_{δ,δ+2η}(x), ψ_{δ+η}(ϕ_δ(x))) = (x, ψ(ϕ(x))) is an edge in G^{δ+2η}_X (or the two points
coincide), and this holds for any x ∈ X. The claim follows. □

By combining the preceding claims and Theorem 174, we obtain the following, for
each p ∈ Z_+:

    ((s_{δ,δ+2η})_#)_p = ((ψ_{δ+η} ∘ ϕ_δ)_#)_p,   ((ϕ_{δ'} ∘ s_{δ,δ'})_#)_p = ((t_{δ+η,δ'+η} ∘ ϕ_δ)_#)_p.

By invoking functoriality of path homology (Proposition 172), we obtain:

    ((s_{δ,δ+2η})_#)_p = ((ψ_{δ+η})_#)_p ∘ ((ϕ_δ)_#)_p,   ((ϕ_{δ'})_#)_p ∘ ((s_{δ,δ'})_#)_p = ((t_{δ+η,δ'+η})_#)_p ∘ ((ϕ_δ)_#)_p.

By similar arguments, we also obtain, for each p ∈ Z_+:

    ((t_{δ,δ+2η})_#)_p = ((ϕ_{δ+η})_#)_p ∘ ((ψ_δ)_#)_p,   ((ψ_{δ'})_#)_p ∘ ((t_{δ,δ'})_#)_p = ((s_{δ+η,δ'+η})_#)_p ∘ ((ψ_δ)_#)_p.

Thus PVec^Ξ_p(X) and PVec^Ξ_p(Y) are η-interleaved, for each p ∈ Z_+. The result now
follows by an application of Lemma 154.

3.3.4 PPH and Dowker persistence


Definition 49 (Type I and II Dowker simplices). Let (X, A_X) ∈ FN, fix δ ∈ R, and let σ
be a simplex in D^si_{δ,X}. Then we define σ to be a Type I simplex if some x ∈ σ is a δ-sink
for σ. Otherwise, σ is a Type II simplex. Notice that if σ is a Type II simplex, then there
exists x ∉ σ such that x is a δ-sink for σ.
We define analogous notions at the chain complex level: a chain in C_•(D^si_{δ,X}) is of
Type I if each simplex in its expression is a Type I simplex. Otherwise, the chain is of
Type II.
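Given the weight matrix of a finite network, this classification can be carried out directly. The sketch below (names, matrix representation, and the example matrix are ours) enumerates δ-sinks and checks whether one lies inside the simplex.

```python
def sink_candidates(A, sigma, delta):
    """All delta-sinks for sigma: nodes x with A[x'][x] <= delta for every x' in sigma."""
    return [x for x in range(len(A)) if all(A[xp][x] <= delta for xp in sigma)]

def simplex_type(A, sigma, delta):
    """'I' if some vertex of sigma is itself a delta-sink for sigma, else 'II'
    (assuming sigma does belong to the Dowker sink complex at parameter delta)."""
    return 'I' if any(x in sigma for x in sink_candidates(A, sigma, delta)) else 'II'

# Hypothetical 3-node weight matrix: at delta = 1, the simplex (0, 1) is present
# only because node 2 (outside the simplex) is a 1-sink for it
A = [[0, 3, 1],
     [3, 0, 1],
     [9, 9, 0]]
print(simplex_type(A, (0, 1), 1))  # → 'II'
print(simplex_type(A, (0, 2), 1))  # → 'I': vertex 2 is a 1-sink for (0, 2)
```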

Lemma 175 (Proposition 2.9, [63]). Let G be a finite digraph. Then any v ∈ Ω2 (G) is a
linear combination of the following three types of ∂-invariant 2-paths:

1. aba with edges (a, b), (b, a) (a double edge),

2. abc with edges (a, b), (b, c), (a, c) (a triangle), and

3. abc − adc with edges (a, b), (b, c), (a, d), (d, c), where a 6= c and (a, c) is not an edge
(a long square).

Lemma 176 (Parity lemma). Fix a simplicial complex K and a field Z/pZ for some prime
p. Let w := ∑_{i∈I} b_i τ_i be a 2-chain in C_2(K), where I is a finite index set, each b_i ∈ Z/pZ,
and each τ_i is a 2-simplex in K. Let σ be a 1-simplex contained in some τ_i such that σ does
not appear in ∂_2^∆(w). Define J_σ := {j ∈ I : σ a face of τ_j}. Then there exists n(σ) ∈ N
such that:

    w = ∑_{i∈I\J_σ} b_i τ_i + ∑_{j=1}^{n(σ)} (τ_j^+ + τ_j^−),

where ∂_2^∆(τ_j^+ + τ_j^−) is independent of σ for each 1 ≤ j ≤ n(σ).

Proof of Lemma 176. Since we are working over Z/pZ, we adopt the convention that b_i ∈
{0, 1, . . . , p − 1} for each i ∈ I. Then for each j ∈ J_σ, we know that ∂_2^∆(τ_j) contributes
either +σ or −σ with multiplicity b_j. Write w = ∑_{i∈I\J_σ} b_i τ_i + ∑_{j∈J_σ} b_j τ_j.
Since σ is not a summand of ∂_2^∆(w), the signed sum of the coefficients b_j over J_σ vanishes
in Z/pZ. Define:

    J_σ^+ := {j ∈ J_σ : τ_j contributes +σ},   J_σ^− := {j ∈ J_σ : τ_j contributes −σ}.

Then w = ∑_{i∈I\J_σ} b_i τ_i + ∑_{j∈J_σ^+} b_j τ_j + ∑_{j∈J_σ^−} b_j τ_j.
Also define k := |J_σ^+|, and enumerate J_σ^+ as {j_1, . . . , j_k}. Write n^+(σ) := ∑_{m=1}^k b_{j_m},
where the sum is taken over Z (not Z/pZ). Next define a finite sequence (τ_1^+, . . . , τ_{n^+(σ)}^+)
as follows:

    τ_i^+ := τ_{j_1} for i ∈ {1, . . . , b_{j_1}},
    τ_i^+ := τ_{j_2} for i ∈ {b_{j_1} + 1, . . . , b_{j_1} + b_{j_2}},   . . . ,
    τ_i^+ := τ_{j_k} for i ∈ {∑_{m=1}^{k−1} b_{j_m} + 1, . . . , ∑_{m=1}^k b_{j_m}}.

Here the indexing element i is of course taken over Z and not Z/pZ. Similarly we define a
sequence (τ_1^−, . . . , τ_{n^−(σ)}^−). Then w = ∑_{i∈I\J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)} τ_m^−.
The expression for ∂_2^∆(w) contains +σ with multiplicity n^+(σ) and −σ with multi-
plicity n^−(σ), such that the total multiplicity is 0, i.e. is a multiple of p. Thus we have
n^+(σ) − n^−(σ) ∈ pZ. There are two cases: either n^+(σ) ≥ n^−(σ) or n^+(σ) ≤ n^−(σ).
Both cases are similar, so we consider the first. Let q be a nonnegative integer such
that n^+(σ) = n^−(σ) + pq. We pad the τ^− sequence by defining τ_i^− := τ_{n^−(σ)}^− for
i ∈ {n^−(σ) + 1, . . . , n^−(σ) + pq}; this changes w only by pq · τ_{n^−(σ)}^− = 0 over Z/pZ. Then we have:

    w = ∑_{i∈I\J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)} τ_m^−
      = ∑_{i∈I\J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} τ_m^+ + ∑_{m=1}^{n^−(σ)+pq} τ_m^−
      = ∑_{i∈I\J_σ} b_i τ_i + ∑_{m=1}^{n^+(σ)} (τ_m^+ + τ_m^−).

Theorem 61. Let X = (X, A_X) ∈ CN be a square-free network, and fix K = Z/pZ for
some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).

Proof of Theorem 61. Let δ ∈ R. First we wish to find an isomorphism ϕ_δ : H_1^Ξ(G^δ_X) →
H_1^∆(D^si_{δ,X}). We begin with the basis B for Ω_1(G^δ_X). We claim that B is just the collection of
allowed 1-paths in G^δ_X. To see this, let ab be an allowed 1-path. Then ∂_1(ab) = b − a, which
is allowed because the vertices a and b are automatically allowed. Thus ab ∈ Ω_1(G^δ_X), and
so B generates Ω_1(G^δ_X).
Whenever ab is an allowed 1-path, we have a directed edge (a, b) in G^δ_X, and so
A_X(a, b) ≤ δ by the definition of G^δ_X. Thus the simplex [a, b] belongs to D^si_{δ,X}, with b
as a δ-sink. Hence [a, b] is a 1-chain in C_1(D^si_{δ,X}). Define a map ϕ̃_δ : Ω_1(G^δ_X) → C_1(D^si_{δ,X})
by setting ϕ̃_δ(ab) = [a, b] and extending linearly. The image of ϕ̃_δ restricted to B is linearly
independent because any linear dependence relation would contradict the independence of
B. Furthermore, ϕ̃_δ induces a map ϕ̃'_δ : ker(∂_1^Ξ) → ker(∂_1^∆). We need to check that this
descends to a map ϕ_δ : ker(∂_1^Ξ)/im(∂_2^Ξ) → ker(∂_1^∆)/im(∂_2^∆) on quotients. To see this, we
need to verify that ϕ̃'_δ(im(∂_2^Ξ)) ⊆ im(∂_2^∆).


By Lemma 175, we have a complete characterization of Ω_2(G^δ_X). Thus we know that
any element of im(∂_2^Ξ) is of the form ba + ab, bc − ac + ab, or bc + ab − dc − ad. In the first
case, we have ϕ̃'_δ(ba + ab) = [b, a] + [a, b] = [b, a] − [b, a] = 0 ∈ im(∂_2^∆). The next case
corresponds to the situation where we have abc ∈ Ω_2(G^δ_X) with edges (a, b), (b, c), (a, c)
in G^δ_X. In this case, [a, b, c] is a 2-simplex in D^si_{δ,X}, with c as a δ-sink. Thus [b, c] − [a, c] +
[a, b] = ϕ̃'_δ(bc − ac + ab) belongs to im(∂_2^∆).
The final case cannot occur because G^δ_X is square-free. It follows that ϕ̃'_δ(im(∂_2^Ξ)) ⊆
im(∂_2^∆), and so we obtain a well-defined map ϕ_δ : H_1^Ξ(G^δ_X) → H_1^∆(D^si_{δ,X}).
Next we check that ϕ_δ is injective. Let v = ∑_{i=0}^k a_i σ_i ∈ ker(ϕ_δ), where the a_i terms
belong to the field K and each σ_i is a 1-path in G^δ_X. Then ϕ_δ(v) = ϕ_δ(∑_{i=0}^k a_i σ_i) =
∂_2^∆(∑_{j=0}^m b_j τ_j), where the b_j terms belong to K and each τ_j is a 2-simplex in D^si_{δ,X}.

Claim 20. w := ∑_{j=0}^m b_j τ_j is homologous to a 2-chain ∑_{k=0}^n b'_k τ'_k in C_2(D^si_{δ,X}), where each
τ'_k is of the form [a, b, c] and abc is a triangle in G^δ_X.

Suppose the claim is true. Then we immediately see that v ∈ im(∂_2^Ξ), so the class of v
is trivial in H_1^Ξ(G^δ_X). This shows that ϕ_δ is injective.
Proof of Claim 20. Let us now prove the claim. Suppose τj is a Type II simplex, for some
0 ≤ j ≤ m. Write τj = [u, x, y]. Then there exists z ∈ X such that z is a δ-sink for τj . But
then [u, x, y, z] ∈ Dsiδ,X , and ∂3∆ ([u, x, y, z]) = [x, y, z]−[u, y, z]+[u, x, z]−[u, x, y]. Since
∂2∆ ◦ ∂3∆ = 0, it follows that [u, x, y] is homologous to [x, y, z] − [u, y, z] + [u, x, z], each
of which is a Type I simplex. Using this argument, we first replace all Type II simplices in
w by Type I simplices.
Next let τ be a Type I simplex in the rewritten expression for w. By taking a permutation
and appending a (−1) coefficient if needed, we can write τ = [x, y, z], where z is the δ-sink
for τ . Thus (x, z), (y, z) are edges in GδX . If (x, y) or (y, x) is also an edge, then xyz is
a triangle, and we are done. Suppose that neither is an edge, i.e. neither of xy, yx is in
Ω1 (GδX ). Then, since xy is not a summand of v, we know that [x, y] is not a summand
of ϕδ (v). Thus we are in the setting of Lemma 176, because ∂2∆ (w) = ϕδ (v). Define
J := {0 ≤ j ≤ m : [x, y] a face of τj }. By applying Lemma 176, we can rewrite w:

    w = ∑_{i∉J, 0≤i≤m} b_i τ_i + ∑_{j=1}^{n([x,y])} (τ_j^+ + τ_j^−),

where all the summands of w containing [x, y] as a face are paired in the latter term. Each
τ^+ + τ^− summand has the following form: [x, y] is a face of both τ^+ and τ^−, and both τ^+
and τ^− are Type I simplices. Fix 1 ≤ j ≤ n([x, y]). Then for some z, u ∈ X, τ_j^+ = [x, y, z]
and τ_j^− = [x, u, y] have the following arrangement:

    [Figure: three digraphs on the vertices x, y, u, z, showing the faces of τ_j^+ = [x, y, z] and τ_j^− = [x, u, y] around the common face [x, y], together with the two possible diagonal edges (z, u) and (u, z).]

Since (X, A_X) is square-free, we must have at least one of the edges (z, u) or (u, z) in
G^δ_X. Suppose (z, u) is an edge. Because we have

    ∂_3^∆([x, y, z, u]) = [y, z, u] − [x, z, u] + [x, y, u] − [x, y, z],

it follows that [x, y, z] − [x, y, u] = [x, y, z] + [x, u, y] = τ_j^+ + τ_j^− is homologous to
[y, z, u] − [x, z, u], where yzu and xzu are both triangles in G^δ_X.
For the other case, suppose (u, z) is an edge. Because we have ∂_3^∆([x, y, u, z]) =
[y, u, z] − [x, u, z] + [x, y, z] − [x, y, u], we again know that τ_j^+ + τ_j^− is homologous to
[x, u, z] − [y, u, z], where xuz and yuz are both triangles in G^δ_X.
We can repeat this argument to replace all summands of w containing [x, y] as a face.
Since τ = [x, y, z] was arbitrary, this proves the claim. 
It remains to verify that ϕ_δ is surjective. Let v = ∑_{i=0}^m a_i τ_i be a 1-cycle in C_1(D^si_{δ,X}).
First we wish to show that v is homologous to a 1-cycle v' = ∑_{i=0}^n b_i τ'_i of Type I. Let τ_i be a
Type II simplex in the expression for v, for some 0 ≤ i ≤ m. Write τ_i = [x, y], and let z be
a δ-sink for τ_i. Then [x, y, z] is a simplex in D^si_{δ,X}, and ∂_2^∆([x, y, z]) = [y, z] − [x, z] + [x, y].
Thus [x, y] is homologous to [x, z] − [y, z], each of which is a Type I simplex. This argument
shows that v is homologous to a 1-cycle v' of Type I.
Next let τ' be a 1-simplex in the expression for v'. Write τ' = [x, y]. If x is the δ-sink
for τ', then we replace the τ' = [x, y] in the expression of v' with −[y, x]. This does not
change v', since we have τ' = [x, y] = −[y, x] in C_1(D^si_{δ,X}). After repeating this procedure
for each element of v', we obtain a rewritten expression for v' in terms of elements [x, y]
where y is the δ-sink for [x, y]. Let v' = ∑_{i=0}^n b'_i [x_i, y_i] denote this new expression.
Finally, observe that for each [x_i, y_i] in the rewritten expression for v', we also have
(x_i, y_i) as an edge in G^δ_X. Thus ∑_{i=0}^n b'_i x_i y_i is a 1-cycle in H_1^Ξ(G^δ_X) that is mapped to v'
by ϕ_δ. It follows that ϕ_δ is surjective, and hence is an isomorphism.
To complete the proof, let δ ≤ δ' ∈ R. Consider the inclusion maps ι_G : G^δ_X ↪ G^{δ'}_X
and ι_D : D^si_{δ,X} ↪ D^si_{δ',X}, and let (ι_G)_#, (ι_D)_# denote the induced maps at the respective
homology levels. Let v = ∑_{i=0}^n a_i x_i y_i be a 1-cycle in H_1^Ξ(G^δ_X). Then we have:

    (ϕ_{δ'} ∘ (ι_G)_#)(∑_{i=0}^n a_i x_i y_i) = ϕ_{δ'}(∑_{i=0}^n a_i x_i y_i) = ∑_{i=0}^n a_i [x_i, y_i]
        = (ι_D)_#(∑_{i=0}^n a_i [x_i, y_i]) = ((ι_D)_# ∘ ϕ_δ)(∑_{i=0}^n a_i x_i y_i).

Thus the necessary commutativity relation holds, and the theorem follows by the Per-
sistence Equivalence Theorem.
Theorem 63. Let G_n be a cycle network for some integer n ≥ 3. Fix a field K = Z/pZ for
some prime p. Then Dgm^Ξ_1(G_n) = {(1, ⌈n/2⌉)}.
Proof of Theorem 63. From [37], we know that Dgm^D_1(G_n) = {(1, ⌈n/2⌉)}. Thus by The-
orem 61, it suffices to show that G_n is square-free. Suppose n ≥ 4, and let a, b, c, d be four
nodes that appear in G_n in clockwise order. First let δ ∈ R be such that (a, b), (b, c), (a, d), (d, c)
are edges in G^δ_{G_n}. Then ω_{G_n}(d, c) ≤ δ, and because a lies on the clockwise arc from d to c,
we automatically have ω_{G_n}(a, c) ≤ δ. Hence (a, c) is an edge in G^δ_{G_n}, and so the subgraph in-
duced by a, b, c, d is not a long square.
Next suppose δ ∈ R is such that (a, b), (c, b), (a, d), (c, d) are edges in G^δ_{G_n}. Since
ω_{G_n}(c, b) ≤ δ and a lies on the clockwise arc from c to b, we have ω_{G_n}(c, a) ≤ δ. Hence (c, a) is an edge in
G^δ_{G_n}, and so the subgraph induced by a, b, c, d is not a short square.
Theorem 59. Let X = (X, A_X) ∈ CN be a symmetric network, and fix K = Z/pZ for
some prime p. Then Dgm^Ξ_1(X) = Dgm^D_1(X).

Proof of Theorem 59. The proof is similar to that of Theorem 61; instead of repeating all
details, we will show how the argument changes when the square-free assumption is re-
placed by the symmetry assumption. Let δ ∈ R, and consider the map ϕ̃'_δ : ker(∂_1^Ξ) →
ker(∂_1^∆) defined as in Theorem 61. As before, we need to check that this descends to a map
ϕ_δ : ker(∂_1^Ξ)/im(∂_2^Ξ) → ker(∂_1^∆)/im(∂_2^∆) on quotients. For this we need to verify that
ϕ̃'_δ(im(∂_2^Ξ)) ⊆ im(∂_2^∆).
By Lemma 175, we know that any element of im(∂_2^Ξ) is of the form ba + ab, bc − ac + ab,
or bc + ab − dc − ad. For the first two cases, we can repeat the argument used in Theorem 61.
The final case corresponds to the situation where we have a long square in G^δ_X consisting
of edges (a, b), (b, c), (a, d), and (d, c). This gives the 2-chain abc − adc. Now by the
symmetry condition, we also have edges (c, d) and (c, b). Thus [a, b, c] is a 2-simplex in
D^si_{δ,X}, with b as a δ-sink, and [a, d, c] is a 2-simplex with d as a δ-sink. Hence [a, b, c] −
[a, d, c] is a 2-chain in C_2(D^si_{δ,X}). Thus ϕ̃'_δ(bc + ab − dc − ad) = [b, c] + [a, b] − [d, c] − [a, d]
belongs to im(∂_2^∆). Thus we obtain a well-defined map ϕ_δ : H_1^Ξ(G^δ_X) → H_1^∆(D^si_{δ,X}).
Next we need to check that ϕ_δ is injective. As in Theorem 61, let v ∈ ker(ϕ_δ). Then
ϕ_δ(v) = ϕ_δ(∑_{i=0}^k a_i σ_i) = ∂_2^∆(∑_{j=0}^m b_j τ_j), where the a_i, b_j terms belong to the field K,
each σ_i is a 1-path in G^δ_X, and each τ_j is a 2-simplex in D^si_{δ,X}. We proceed by proving an
analogue of Claim 20 in the symmetric setting. Write w := ∑_{j=0}^m b_j τ_j. We need to show
that w is homologous to a 2-chain ∑_{k=0}^n b'_k τ'_k in C_2(D^si_{δ,X}), where each τ'_k is of the form
[a, b, c] and abc is either a triangle or part of a square in G^δ_X.


As in the proof of Claim 20, we first replace all Type II simplices in w by Type I
simplices. Next let τ = [x, y, z] be a Type I simplex in w, and suppose z is the δ-sink for τ ,
but neither of (x, y), (y, x) is an edge. As in the proof of Theorem 61, we apply Lemma 176
to separate the summands of w containing [x, y] as a face into pairs of the form (τ + + τ − ).
Writing τ + = [x, z, y] and τ − = [x, y, u], we obtain the following arrangement:
    [Figure: two digraphs on the vertices x, y, u, z, showing the faces of τ^+ = [x, z, y] and τ^− = [x, y, u] around the common face [x, y].]

By the symmetry assumption, (z, y) and (u, y) are also edges in GδX , and so xuy, xzy
are both allowed 2-paths. Since τ − = [x, y, u] = −[x, u, y], we can replace τ + + τ − by
[x, z, y] − [x, u, y], where xzy − xuy is a square in GδX . Proceeding in this way, we replace
each summand of w containing [x, y] as a face. We repeat this argument for each choice of
τ = [x, y, z] in the expression for w.
Finally, we obtain an expression of w such that there exists v 0 ∈ Ω2 (GδX ) satisfying
ϕδ (v 0 ) = w. Then we have ∂2Ξ (v 0 ) = v, and so v = 0 in H1Ξ (GδX ). Thus ϕδ is injective.
We omit the remainder of the argument, because it is a repeat of the corresponding
part of the proof of Theorem 61. In summary, it turns out that ϕδ is surjective, hence
an isomorphism, and furthermore that it commutes with the linear maps induced by the
canonical inclusions. This concludes the proof.

3.4 Diagrams of compact networks


We now prove the well-definedness of persistence diagrams arising from compact net-
works, as well as convergence properties.
Theorem 86. Let (X, ωX ) ∈ CN , k ∈ Z+ . Then the persistence vector spaces associated
to the Vietoris-Rips, Dowker, and PPH constructions are all q-tame.
Proof of Theorem 86. All the cases are similar, so we just prove the case of PVec^D_k(X).
For convenience, write PVec^D_k(X) = {V^δ --ν_{δ,δ'}--> V^{δ'}}_{δ≤δ'∈R}. Let δ < δ'. We need to show
ν_{δ,δ'} has finite rank. Write ε := (δ' − δ)/2. Let U be an ε/4-system on X (this requires
Theorem 64). Then by Theorem 66 we pick a finite subset X' ⊆ X such that d_N(X, X') <
ε/2. By stability results, we have that PVec^D_k(X') and PVec^D_k(X) are ε-interleaved. For
convenience, write PVec^D_k(X') = {U^δ --µ_{δ,δ'}--> U^{δ'}}_{δ≤δ'∈R}. Then the map ν_{δ,δ'} : V^δ → V^{δ'}
factorizes through U^{δ+ε} via interleaving maps V^δ → U^{δ+ε} → V^{δ+2ε} = V^{δ'}. Since U^{δ+ε} is
finite-dimensional, it follows that ν_{δ,δ'} has finite rank. This concludes the proof.

Corollary 177 (Stability). Let (X, ω_X), (Y, ω_Y) ∈ CN, k ∈ Z_+. Then,

    d_B(Dgm^•_k(X), Dgm^•_k(Y)) ≤ 2 d_N(X, Y),

where Dgm^• denotes each of the Vietoris-Rips, Dowker, or path persistence diagrams.

Proof. By Theorem 86, the Rips, Dowker, and PPH persistent vector spaces of X and Y
are q-tame. Thus they have well-defined persistence diagrams (Theorem 85), for which the
interleaving and bottleneck distances agree, i.e. d_I = d_B. Since the corresponding persistent
vector spaces of X and Y are η-interleaved for every η > 2d_N(X, Y), the bound follows.

Theorem 87 (Convergence). Let (X, ω_X) be a measure network equipped with a Borel
probability measure µ_X. For each i ∈ N, let x_i : Ω → X be an independent random
variable defined on some probability space (Ω, F, P) with distribution µ_X. For each n ∈ N,
let X_n = {x_1, x_2, . . . , x_n}. Let ε > 0. Then we have:

    P{ω ∈ Ω : d_B(Dgm^•(supp(µ_X)), Dgm^•(X_n(ω))) ≥ ε} ≤ (1 − M_{ε/4}(supp(µ_X)))^n / M_{ε/4}(supp(µ_X)),

where X_n(ω) is the subnetwork induced by {x_1(ω), . . . , x_n(ω)} and Dgm^• is any of the
Vietoris-Rips, Dowker, or PPH diagrams. In particular, each of these three persistence
diagrams of the subnetwork X_n converges almost surely to that of supp(µ_X) in bottleneck
distance.

Proof of Theorem 87. We can consider supp(µ_X) as a network with full support by endow-
ing it with the restriction of ω_X to supp(µ_X) × supp(µ_X), so for convenience, we assume
X = supp(µ_X). Let ω ∈ Ω be such that d_N(X, X_n(ω)) < ε/2. Then by Corollary 177, we
have that d_B(Dgm^•(X), Dgm^•(X_n(ω))) < ε. By applying Theorem 68, we then have:

    P{ω ∈ Ω : d_B(Dgm^•(X), Dgm^•(X_n(ω))) ≥ ε} ≤ P{ω ∈ Ω : d_N(X, X_n(ω)) ≥ ε/2}
        ≤ (1 − M_{ε/4}(supp(µ_X)))^n / M_{ε/4}(supp(µ_X)).

We conclude the proof with an application of the Borel-Cantelli lemma, as in the proof of
Theorem 68.

Chapter 4: Algorithms, computation, and experiments

4.1 The complexity of computing dN


By Remark 10 and Proposition 12 we know that in the setting of finite networks, it
is possible to obtain an upper bound on dN, in the case card(X) = card(Y), by using
d̂N. Solving for d̂N(X, Y) reduces to minimizing the function
f(ϕ) := max_{x,x′∈X} |ωX(x, x′) − ωY(ϕ(x), ϕ(x′))| over all bijections ϕ from X to Y. However,
this is an instance of an NP-hard problem known as the quadratic bottleneck assignment
problem [95]. The structure of the optimization problem induced by dN is very similar
to that of d̂N, so it seems plausible that computing dN would be NP-hard as well. This
intuition is confirmed in Theorem 178. We remark that similar results were obtained for
the Gromov-Hausdorff distance by F. Schmiedl in his PhD thesis [105].

Theorem 178. Computing dN is NP-hard.

Proof. To obtain a contradiction, assume that dN is not NP-hard. Let X, Y ∈ FN_dis
be such that card(X) = card(Y). We write R(X, Y) = RB ⊔ RN, where RB consists of
correspondences for which the projections πX, πY are injective, thus inducing a bijection
between X and Y, and RN = R(X, Y) \ RB. Note that for any R ∈ RN, there exist
x, x′, y such that (x, y), (x′, y) ∈ R, or there exist x, y, y′ such that (x, y), (x, y′) ∈ R.
Define Ψ : R → R by:

Ψ(ζ) = ζ + C if ζ ≠ 0, and Ψ(ζ) = 0 if ζ = 0, where C = max_{R∈R(X,Y)} dis(R) + 1.

For convenience, we will write Ψ(X), Ψ(Y) to mean (X, Ψ ∘ ωX) and (Y, Ψ ∘ ωY)
respectively. We will also write:

dis_Ψ(R) := max_{(x,y),(x′,y′)∈R} |Ψ(ωX(x, x′)) − Ψ(ωY(y, y′))|.

Consider the problem of computing dN(Ψ(X), Ψ(Y)). First observe that for any R ∈ RB,
we have dis(R) = dis_Ψ(R). To see this, let R ∈ RB. Let (x, y), (x′, y′) ∈ R, and note that
x ≠ x′, y ≠ y′. Then:

|Ψ(ωX(x, x′)) − Ψ(ωY(y, y′))| = |ωX(x, x′) + C − ωY(y, y′) − C| = |ωX(x, x′) − ωY(y, y′)|.

Since (x, y), (x′, y′) were arbitrary, it follows that dis(R) = dis_Ψ(R). This holds for all
R ∈ RB.
On the other hand, let R ∈ RN. By a previous observation, we assume that there exist
x, x′, y such that (x, y), (x′, y) ∈ R. For such a pair, we have:

|Ψ(ωX(x, x′)) − Ψ(ωY(y, y))| = |ωX(x, x′) + C − 0| ≥ max_{S∈R(X,Y)} dis(S) + 1.

It follows that dis_Ψ(R) > dis_Ψ(S) for any S ∈ RB. Hence:

dN(Ψ(X), Ψ(Y)) = (1/2) min_{R∈R(X,Y)} dis_Ψ(R)
               = (1/2) min_{R∈RB} dis_Ψ(R)
               = (1/2) min_{R∈RB} dis(R)
               = (1/2) min_ϕ dis(ϕ), where ϕ ranges over bijections X → Y
               = d̂N(X, Y).

It is known (see Remark 179 below) that computing d̂N is NP-hard. But the preceding
calculation shows that d̂N can be computed through dN, which, by assumption, is not NP-hard.
This is a contradiction. Hence dN is NP-hard.
Remark 179. We can be more precise about why computing d̂N is a case of the QBAP. Let
X = {x1, . . . , xn} and let Y = {y1, . . . , yn}. Let Π denote the set of all n × n permutation
matrices. Note that any π ∈ Π can be written as π = ((π_ij))_{i,j=1}^n, where each π_ij ∈ {0, 1}.
Then Σ_j π_ij = 1 for any i, and Σ_i π_ij = 1 for any j. Computing d̂N now becomes:

d̂N(X, Y) = (1/2) min_{π∈Π} max_{1≤i,k,j,l≤n} Γ_{ikjl} π_ij π_kl, where Γ_{ikjl} = |ωX(xi, xk) − ωY(yj, yl)|.

This is just the QBAP, which is known to be NP-hard [18].
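To make the objective concrete, the QBAP formulation above can be evaluated by brute force over all n! permutations for very small networks (our own illustrative sketch; the factorial blow-up is exactly the point of the hardness discussion):

```python
from itertools import permutations

def dhat_N(wX, wY):
    # Brute-force the QBAP objective: over all bijections pi, compute the
    # bottleneck distortion max_{i,k} |wX[i][k] - wY[pi(i)][pi(k)]|,
    # then take half the minimum.
    n = len(wX)
    best = float("inf")
    for pi in permutations(range(n)):
        dis = max(abs(wX[i][k] - wY[pi[i]][pi[k]])
                  for i in range(n) for k in range(n))
        best = min(best, dis)
    return best / 2
```

For example, two 2-node networks whose nonzero weights are 1 and 3 are at d̂N-distance |1 − 3|/2 = 1 under either bijection.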

4.2 Computing lower bounds for dN


In this section we first discuss some algorithmic details on how to compute the lower
bounds for dN involving local spectra and then present some computational examples. All
networks in this section are assumed to be finite. Our software and datasets are available
on https://github.com/fmemoli/PersNet as part of the PersNet software package.

4.2.1 An algorithm for computing minimum matchings
Lower bounds for dN involving the comparison of local spectra of two networks, such
as those in Proposition 147, require computing the minimum of a functional
J(R) := max_{(x,y)∈R} C(x, y), where C : X × Y → R+ is a given cost function and R ranges
over R(X, Y). This is an instance of a linear bottleneck assignment problem (LBAP) [18].
We remark that the current instance differs from the standard formulation in that one is
now optimizing over correspondences and not over permutations. Hence the standard
algorithms need to be modified.
Assume n = card(X) and m = card(Y). In this section we adopt matrix notation and
regard R as a matrix ((r_{i,j})) ∈ {0, 1}^{n×m}. The condition R ∈ R(X, Y) then requires that
Σ_i r_{i,j} ≥ 1 for all j and Σ_j r_{i,j} ≥ 1 for all i. We denote by C = ((c_{i,j})) ∈ R_+^{n×m} the matrix
representation of the cost function C described above. With the goal of identifying a suitable
algorithm, the key observation is that the optimal value min_{R∈R} J(R) must coincide
with a value realized in the matrix C.
An algorithm with complexity O(n²m²) is given as Algorithm 1 (in Matlab pseudo-code).
The algorithm belongs to the family of thresholding algorithms for
solving matching problems over permutations; see [18]. Notice that R is a binary matrix
and that the procedure TestCorrespondence has complexity O(nm). In the worst case, the
matrix C has nm distinct entries, and the while loop will need to exhaustively test them
all, hence the claimed complexity of O(n²m²). Even though a more efficient version
(with complexity O(nm log(nm))) can be obtained by using a bisection strategy
on the range of possible values contained in the matrix C (in a manner similar to what is
described for the case of permutations in [18]), for clarity we limit our presentation to
the version in Algorithm 1.

4.2.2 Computational example: randomly generated networks


As a first application of our ideas we generated a database of weighted directed net-
works with different numbers of “communities” and different total cardinalities using the
software provided by [54]. Using this software, we generated 35 random networks as fol-
lows: 5 networks with 5 communities and 200 nodes each (class c5-n200), 5 networks
with 5 communities and 100 nodes each (class c5-n100), 5 networks with 4 communities
and 128 nodes each (class c4-n128), 5 networks with 2 communities and 20 nodes each
(class c2-n20), 5 networks with 1 community and 50 nodes each (class c1-n50), and
10 networks with 1 community and 128 nodes each (class c1-n128). In order to make
the comparison more realistic, as a preprocessing step we divided all the weights in each
network by the diameter of the network. In this manner, discriminating between networks
requires differentiating their structure and not just the scale of the weights. Note that the
(random) weights produced by the software [54] are all non-negative.

Algorithm 1 MinMax matching
1: procedure MinMaxMatch(C)
2:     v = sort(unique(C(:)));
3:     k = 1; done = false;
4:     while ~done do
5:         c = v(k);
6:         R = (C <= c);
7:         done = TestCorrespondence(R);
8:         k = k + 1;
9:     end while
10:    return c
11: end procedure
12: procedure TestCorrespondence(R)
13:    done = prod(sum(R))*prod(sum(R')) > 0;
14:    return done
15: end procedure

Figure 4.1: Left: Lower bound matrix arising from matching local spectra on the database
of community networks. Right: Corresponding single linkage dendrogram. The labels
indicate the number of communities and the total number of nodes. Results correspond to
using local spectra as described in Proposition 180.

Figure 4.2: Left: Lower bound matrix arising from matching global spectra on the database
of community networks. Right: Corresponding single linkage dendrogram. The labels
indicate the number of communities and the total number of nodes. Results correspond to
using global spectra as signatures.

Using a Matlab implementation of Algorithm 1 we computed a 35 × 35 matrix of values


corresponding to a lower bound based simultaneously on both the in and out local spectra.
This strengthening of Proposition 147 is stated below.
Proposition 180. For all X, Y ∈ FN,

dN(X, Y) ≥ (1/2) min_{R∈R} max_{(x,y)∈R} C(x, y), where

C(x, y) = max{ d_H^R(spec_X^in(x), spec_Y^in(y)), d_H^R(spec_X^out(x), spec_Y^out(y)) }.

This bound follows from Proposition 147 by the discussion at the beginning of §2.6.1.
The results are shown in the form of the lower bound matrix and its single linkage
dendrogram in Figure 4.1. Notice that the labels in the dendrogram permit ascertaining
the quality of the classification provided by the local spectra bound. With only very few
exceptions, networks with similar structure (same number of communities) were clustered
together regardless of their cardinality. Notice furthermore how networks with 4 and 5
communities merge together before merging with networks with 1 and 2 communities, and
vice versa. For comparison, we provide details about the performance of the global spectra
lower bound on the same database in Figure 4.2. The results are clearly inferior to those
produced by the local version, as predicted by the inequality in Proposition 147.

4.2.3 Computational example: simulated hippocampal networks


To compare with persistent homology methods, we repeated the experiment described
in §1.10.2 on a smaller scale using local spectra lower bounds.

In this experiment, there were two environments: (1) a square of side length L, and (2)
a square of side length L, with a disk of radius 0.33L removed from the center. In what
follows, we refer to the environments of the second type as 1-hole environments, and those
of the first type as 0-hole environments. For each environment, a random-walk trajectory
of 5000 steps was generated, where the animal could move above, below, left, or right with
equal probability. If one or more of these moves took the animal outside the environment (a
disallowed move), then the probabilities were redistributed uniformly among the allowed
moves. The length of each step in the trajectory was 0.1L.
In the first set of 20 trials for each environment, 200 place fields of radius 0.1L were
scattered uniformly at random. In the next two sets, the place field radii were changed to
0.2L and 0.05L. This produced a total of 60 trials for each environment. For each trial,
the corresponding network (X, ωX ) was constructed as follows: X consisted of 200 place
cells, and for each 1 ≤ i, j ≤ 200, the weight ωX(xi, xj) was given by:

ωX(xi, xj) = 1 − (# times cell xj spiked in a window of five time units after cell xi spiked) / (# times cell xj spiked).
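A sketch of this construction from raw spike times follows (our own illustration: the function name and the exact windowing convention are assumptions, since the simulation pipeline works with its own binned spike trains):

```python
def spike_network(spikes, window=5.0):
    # spikes[i] is the (sorted) list of spike times of cell x_i.
    # Following the formula above, `hits` counts spikes of cell x_j
    # landing within `window` time units after some spike of cell x_i.
    n = len(spikes)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hits = sum(1 for t in spikes[j]
                       if any(0 < t - s <= window for s in spikes[i]))
            w[i][j] = 1.0 - hits / len(spikes[j])
    return w
```

Note the asymmetry: w[i][j] and w[j][i] generally differ, which is what makes the resulting network a genuinely directed one.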

The results of applying the local spectra lower bound are shown in Figure 4.3. The
labels env-0, env-1 correspond to 0 and 1-hole environments, respectively.
As a final remark, we note that at least for this experiment, it appears that superior
results were obtained using Dowker persistent homology, as shown in §1.10.2.

Figure 4.3: Single linkage dendrogram based on local spectrum lower bound of Proposition
180 corresponding to hippocampal networks with place field radii 0.2L, 0.1L, and 0.05L
(clockwise from top left).

4.3 Lower bounds for dN,p


In this section, we discuss the computation of a lower bound obtained by taking the
maximum of the outer and inner joint eccentricity bounds in Theorem 127, specifically the
bounds given by Inequalities (1.14) and (1.16). In what follows, we write TLB to
mean the maximum of these two bounds. Each TLB is obtained by repeatedly invoking
Equation (1.17) and finally solving a linear program. To speed up computation of the
linear program, we use entropic regularization, as used in [42] and further developed in
[10, 111, 100, 119]. As a model on which to run our computations, we develop the notion
of a stochastic block model (SBM) network, which is based on the popular generative model
for random graphs [2].
We first explain the notion of entropic regularization and associated difficulties with
numerical stability.

4.3.1 Numerical stability of entropic regularization
Let µX, µY be probability measures on sets X, Y with |X| = m, |Y| = n. For a general
m × n cost matrix M, one may consider the entropically regularized optimal transport
problem below, where λ ≥ 0 is a regularization parameter and H denotes entropy:

inf_{p∈C(µX,µY)} Σ_{i,j} M_ij p_ij − (1/λ) H(p), where H(p) = − Σ_{i,j} p_ij log p_ij.

As shown in [42], the solution to this problem has the form diag(a) ∗ K ∗ diag(b), where
K := e^{−λM} is a kernel matrix and a, b are nonnegative scaling vectors in R^m, R^n, respectively.
Here ∗ denotes matrix multiplication, and exponentiation is performed elementwise.
An approximation to this solution can be obtained by iteratively scaling K to have row and
column sums equal to µX and µY, respectively. More specifically, the updates are simply:

a = ones(m, 1); % initialize
b ← µY ./ (K′ ∗ a),    a ← µX ./ (K ∗ b).
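These updates translate to the following minimal, unstabilized sketch (pure Python, fixed iteration count; the function name and defaults are our own):

```python
import math

def sinkhorn(M, mu_x, mu_y, lam=50.0, iters=200):
    # Entropically regularized OT: returns diag(a) * K * diag(b) with
    # K = exp(-lam * M), after alternately matching column sums to mu_y
    # and row sums to mu_x.
    m, n = len(M), len(M[0])
    K = [[math.exp(-lam * M[i][j]) for j in range(n)] for i in range(m)]
    a, b = [1.0] * m, [1.0] * n
    for _ in range(iters):
        b = [mu_y[j] / sum(K[i][j] * a[i] for i in range(m)) for j in range(n)]
        a = [mu_x[i] / sum(K[i][j] * b[j] for j in range(n)) for i in range(m)]
    return [[a[i] * K[i][j] * b[j] for j in range(n)] for i in range(m)]
```

For a symmetric cost with zeros on the diagonal and uniform marginals, the recovered coupling concentrates near the diagonal as λ grows, foreshadowing the numerical issues discussed next.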
As pointed out in [106, 31, 30], using a large value of λ (corresponding to a small
regularization) leads to numerical instability, where values of K can go below machine
precision and entries of the scaling updates a, b can also blow up. For example, Matlab
will interpret e−1000 as 0, which is a problem with even a moderate choice of λ = 200 and
Mij = 50. Theoretically, it is necessary to have K be a positive matrix for the Sinkhorn
algorithm to converge to the correct output [109, 110].
In [106, 31, 30], the authors proposed a variety of clever methods for stabilizing the
structure of the algorithm. One such idea is to incorporate some amount of log-domain
computations to stabilize the iterations, i.e. during the iterations, large values of a and b
are occasionally “absorbed” into K. This leads to some cancellation with the small values
of K so that the resulting matrix K̃ is stabilized. The iterations then continue with the
stabilized kernel until another absorption step is required, which stabilizes K̃ even further.
Even with this strategy, some entries of K might be zero at initialization. Another strategy
described in the preceding works is to start with a conservative value of λ, obtain some
scaling updates that are used to stabilize K, and then gradually increase λ to the desired
value while further stabilizing the kernel.
As discussed in [30], many entries of the stabilized kernel obtained as above could be
below machine precision, but the entries corresponding to those on which the optimal plan
is supported are likely to be above the machine limit. Indeed, this sparsity may even be
leveraged for additional computational tricks.
The techniques for stabilizing the entropy regularized OT problem are not the focus
of our work, but because these considerations naturally arose in our computational experi-
ments, we describe some strategies we undertook that are complementary to the techniques
available in the current literature. In order to provide a perspective complementary to that
presented in [30], we impose the requirement that all entries of the kernel matrix remain
above machine precision.

Initializing in the log domain. A simple adaptation of the “log domain absorption” step
referred to above yields a “log initialization” method that works well in most cases for
initializing K to have values above machine precision. To explain this method, we first
present an algorithm (Algorithm 2) for the log domain absorption method. We follow the
presentation provided in [30], making notational changes as necessary.

Algorithm 2 Sinkhorn with partial log domain steps

procedure SinkhornLog(M, λ, mX, mY) ▷ M an m × n cost matrix; mX, mY prob. measures
    a ← 1m, b ← 1n ▷ scaling updates
    u ← 0m, v ← 0n ▷ log domain storage of large a, b
    Kij ← exp(λ(−Mij + ui + vj)) ▷ initialize kernel
    while stopping criterion not met do
        b ← mY./(K′ ∗ a)
        a ← mX./(K ∗ b)
        if max(max(a), max(b)) > threshold then
            u ← u + (1/λ) log(a) ▷ store a, b in u, v
            v ← v + (1/λ) log(b)
            Kij ← exp(λ(−Mij + ui + vj)) ▷ absorb a, b into K
            a ← 1m, b ← 1n ▷ after absorption, reset a, b
        end if
    end while
    return diag(a) ∗ K ∗ diag(b)
end procedure

Notice that in Algorithm 2, K might already have values below machine precision at
initialization. To circumvent this, we can add a preprocessing step that yields a stable
initialization of K. This is outlined in Algorithm 3. An important point to note about Al-
gorithm 3 is that the user needs to choose a function decideParam(α, β) which returns
a number γ between α and β, where α and β are as stated in the algorithm. This number γ
should be such that exp(−λβ + λγ) is above machine precision, but exp(−λα + λγ) is not
too large. The crux of Algorithm 3 is that by choosing large initial scaling vectors a, b and
immediately absorbing them into the log domain, the extreme values of M are canceled out
before exponentiation.
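As a toy version of this idea, one can shift the cost matrix before exponentiating. The sketch below is our own simplification of Algorithm 3: it folds the contributions of a and b into a single shift γ and uses the midpoint of the cost range as one possible choice of decideParam (in the full algorithm, γ must of course be stored in the log-domain vectors u, v so that the final answer can be recovered):

```python
import math

def log_initialize(M, lam):
    # Shift costs by gamma = (min + max)/2 before exponentiating, so that
    # neither end of the kernel's value range under- or overflows.
    alpha = min(min(row) for row in M)
    beta = max(max(row) for row in M)
    gamma = 0.5 * (alpha + beta)          # one choice of decideParam
    K = [[math.exp(lam * (gamma - Mij)) for Mij in row] for row in M]
    return K, gamma
```

With λ = 200 and costs spread over [0, 6], the unshifted entry e^{−1200} underflows to 0 in double precision, while every entry of the shifted kernel stays positive and finite.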
A geometric trick in the p = 2 case. The preceding initialization method has its limita-
tions: depending on how far min(M ), max(M ) are spread apart, the log initialization step
might not be able to yield an initial kernel K that has all entries above machine precision
and below the machine limit. This suggests that it would be beneficial to normalize the cost
matrix M to control the spread of min(M ), max(M ). However, it is crucial to remember

Algorithm 3 Log domain initialization of K
procedure LogInitialize(M, λ, mX, mY) ▷ M an m × n cost matrix; mX, mY prob. measures
    α ← min(M), β ← max(M) ▷ scan M for max and min values
    γ ← decideParam(α, β) ▷ decideParam is an independent function
    a ← exp(λγ)1m, b ← exp(λγ)1n
    u ← (1/λ) log(a), v ← (1/λ) log(b)
    Kij ← exp(λ(−Mij + ui + vj)) ▷ K is stably initialized
    a ← 1m, b ← 1n
    perform rest of SinkhornLog as usual
end procedure

that OT problems arise in our setting when computing the TLB between two networks, so
any normalization would have to be theoretically justified.
It turns out that in the case p = 2, the particular geometry of (Nm , dN,p ) allows for an
elegant normalization scheme. In this particular case, it is possible to use a certain “cosine
rule formula” [117] to compute the TLB between rescaled versions of the original networks
X and Y , and then rescale the solution to get the TLB between X and Y . We describe this
method in detail below. In what follows, we always have p = 2 for dN,p unless specified
otherwise.
The caveat to this normalization scheme, pointed out to us by Justin Solomon, is that
scaling the network weights down in turn requires the λ values to be scaled up, which
once again leads to numerical instability. In practice, we have used the following ap-
proach. When running a computation over a database of networks which have widely
varying weights (such that any fixed choice of λ causes some of the initial kernels to have
values above/below machine precision), we rescale all the networks simultaneously using
the normalization described below. Then we proceed with the Sinkhorn algorithm, employ-
ing log domain absorption steps as needed (and some of the computations will indeed need
more of these absorption steps).
Let (X, ωX, µX), (Y, ωY, µY) ∈ Nm. Recall from Example 109 that dN,2(X, N1(0)) =
(1/2) size2(X). Define s := (1/2) size2(X, ωX, µX) and t := (1/2) size2(Y, ωY, µY). Notice also
that for an optimal coupling µ ∈ C(µX, µY), we have:

dN,2(X, Y)² = (1/4) ∫∫ |ωX(x, x′)|² + |ωY(y, y′)|² − 2 ωX(x, x′) ωY(y, y′) dµ(x, y) dµ(x′, y′)
            = s² + t² − (1/2) ∫∫ ωX(x, x′) ωY(y, y′) dµ(x, y) dµ(x′, y′),

where the first equality holds because |a − b|2 = |a|2 + |b|2 − 2ab for all a, b ∈ R, and
the last equality holds because ωX (x, x0 ), ωY (y, y 0 ) do not depend on µY , µX , respectively.

Sturm [117, Lemma 4.2] observed the following "cosine rule" structure. Define

ω′X := ωX / (2s),    ω′Y := ωY / (2t).    (4.1)

Then size2(X, ω′X) = (1/2s) size2(X, ωX) = 1 = (1/2t) size2(Y, ωY) = size2(Y, ω′Y). A geometric
fact about this construction is that (X, ωX, µX), (Y, ωY, µY) lie on geodesic rays
connecting 𝒳 := (X, ω′X, µX) and 𝒴 := (Y, ω′Y, µY) respectively to N1(0). Actually, once
(X, ωX, µX) and (Y, ωY, µY) are chosen, the geodesic rays are automatically defined to be
given by their scalar multiples. Then we independently define 𝒳 and 𝒴 to be representatives
of the weak isomorphism class of networks at dN,2 distance 1/2 from N1(0) that lie
on these geodesics. We illustrate a related situation in Figure 4.4, and refer the reader to
[117] for further details. Implicitly using this geometric fact, we fix 𝒳, 𝒴 as above and
treat (X, ωX, µX), (Y, ωY, µY) as 2s- and 2t-scalings of 𝒳 and 𝒴, respectively (i.e. such
that Equation (4.1) is satisfied). Then we have:

4dN,2 ((X, ωX , µY ), (Y, ωX , µY ))2 − 4s2 − 4t2


Z Z
0 2 2
= 4s2 |ωX (x, x0 )| + 4t2 |ωY0 (y, y 0 )| − 4s2 − 4t2
0
− 8stωX (x, x0 )ωY0 (y, y 0 ) dµ(x, y) dµ(x0 , y 0 )
0 2
= 4s2 size2 (X, ωX ) + 4t2 size2 (Y, ωY0 )2 − 4s2 − 4t2
Z Z
0
− 2st |ωX (x, x0 )| |ωY0 (y, y 0 )| dµ(x, y) dµ(x0 , y 0 )
Z Z
0
= −8st |ωX (x, x0 )| |ωY0 (y, y 0 )| dµ(x, y) dµ(x0 , y 0 ),

0
where the last equality holds because size2 (X, ωX ) = 1 = size2 (Y, ωY0 ).
Since (X, ωX, µX), (Y, ωY, µY) were 2s- and 2t-scalings of 𝒳 and 𝒴 for arbitrary s, t > 0,
this shows in particular that the quantity

(1/2st) ( dN,2((X, ωX, µX), (Y, ωY, µY))² − s² − t² )    (cosine rule)

depends only on the reference networks 𝒳 and 𝒴, and is independent of s and t.


We leverage the preceding observations to produce a scaling method as follows. Suppose
(X, ωX, µX), (Y, ωY, µY) are given. Define s := (1/2) size2(X, ωX, µX) and t := (1/2) size2(Y, ωY, µY).
For α, β ≥ 1, define

ω̄X := ωX / (α‖ωX‖∞),    ω̄Y := ωY / (β‖ωY‖∞).

Also define 𝒳 := (X, ω̄X, µX), 𝒴 := (Y, ω̄Y, µY). Then we have

σ := (1/2) size2(𝒳) = s / (α‖ωX‖∞),    τ := (1/2) size2(𝒴) = t / (β‖ωY‖∞).

Each weight in the support of 𝒳, 𝒴 is in the range [−1, 1]. Forming a TLB cost matrix
from these rescaled networks and performing Sinkhorn iterations with the rescaled weights
is more stable because the entries in the corresponding kernel matrix K are more likely to
be above machine precision. Indeed, for larger values of λ, one can scale down ωX, ωY
sufficiently via α and β to ensure that K is well-behaved. The cosine rule can then be used
to recover dN,2(X, Y) in terms of dN,2(𝒳, 𝒴).

Figure 4.4: Sinkhorn computations for dN,2(𝒳, 𝒴) are carried out in the "stable region" for
K, and the end result is rescaled to recover dN,2(X, Y).

This geometric idea is illustrated in Figure 4.4. The spaces (X, ωX), (X, ω′X), and 𝒳 all
live on a geodesic ray emanating from N1(0), and likewise for Y. See [117] for more
details about the geodesic structure of gauged measure spaces; the analogous results hold
for (Nm, dN,2).
From the preceding observation, we have:

(1/2st) ( dN,2(X, Y)² − s² − t² ) = (1/2στ) ( dN,2(𝒳, 𝒴)² − σ² − τ² ),

that is,

dN,2(X, Y)² − s² − t² = αβ‖ωX‖∞‖ωY‖∞ ( dN,2(𝒳, 𝒴)² − s²/(α²‖ωX‖²∞) − t²/(β²‖ωY‖²∞) ).
The final step is summarized in the following lemma.

Lemma 181. Let X, Y, 𝒳, 𝒴, α, β, s, t be as above. Then,

dN,2(X, Y)² = αβ‖ωX‖∞‖ωY‖∞ dN,2(𝒳, 𝒴)² − s²β‖ωY‖∞/(α‖ωX‖∞) − t²α‖ωX‖∞/(β‖ωY‖∞) + s² + t².
Remark 182. From the perspective of computations, the preceding lemma should be in-
terpreted as follows. The quantities s, t, kωX k∞ , kωY k∞ are all easy to compute. One can
either attempt to obtain a local minimizer for the dN,2 (X , Y)-functional (e.g. following
[111]), or to obtain a TLB-type lower bound for dN,2 (X , Y), as in the current work. In
either case, the output can be rescaled by the formula in the lemma to approximate (or
lower-bound) dN,2 (X, Y )2 .
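Concretely, the rescaling in Lemma 181 is a one-liner (our own sketch; rho stands for the computed value of dN,2(𝒳, 𝒴), or a lower bound for it, and wx_inf, wy_inf denote ‖ωX‖∞, ‖ωY‖∞):

```python
def rescale_tlb(rho, s, t, wx_inf, wy_inf, alpha=1.0, beta=1.0):
    # Lemma 181: recover d_{N,2}(X, Y)^2 from rho, the (lower bound on
    # the) distance computed between the rescaled networks.
    a, b = alpha * wx_inf, beta * wy_inf
    return a * b * rho ** 2 - (s ** 2) * b / a - (t ** 2) * a / b + s ** 2 + t ** 2
```

Sanity check: two one-node networks with self-weights 4 and 2 rescale (with α = β = 1) to the same network, so ρ = 0, and the formula returns dN,2(X, Y)² = 1, matching dN,2 = |4 − 2|/2 directly (here s = 2, t = 1).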

Remark 183. In our experiments, it was best to fix α, β = 1. This cosine rule method
works best when the networks in question have large sizes; if the networks are provided in
normalized form with weights in a small range (e.g. in [0,1]), then it is better to transfer
some of the computations to the log domain using Algorithms 2 and 3.
Zeros on the diagonal for TLB(A, A) and adaptive λ-search. When comparing a net-
work to itself, we expect to get a TLB output of zero. The corresponding optimal coupling
should be the diagonal coupling. However, some care needs to be taken to achieve this
during computation, because entropic regularization produces “smoothened” couplings by
design, and the diagonal coupling is highly nonsmooth. We now briefly explain a simple
heuristic we used to achieve these nonsmooth couplings.
Suppose we are comparing a matrix A to itself via the TLB. The corresponding cost
matrix M should have zeros on the diagonal, which translates to a kernel matrix K with
1s on the diagonal. When M has all 0s on the diagonal and values strictly above 0 ev-
erywhere else, and the two marginals are equal, then the optimal coupling should be the
diagonal coupling: the optimal transport plan is to not transport anything at all. To achieve
this via Sinkhorn iterations, we noticed that the off-diagonal entries in each column of K
needed to be several orders of magnitude below 1. For our desired precision, three orders
of magnitude were sufficient. Since orders of separation in K are easily related to differ-
ences of values in M , we performed the following procedure: for each column j of M , we
computed the minimal difference between values in the column, and computed λj so that
after computing K = e−λj M , the entries in column j of K would be separated by at least
three orders of magnitude. Thus we obtained a pool of λ values. From this list, we used a
binary search to pick the largest λ that would not cause entries of K to go below machine
precision.
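The per-column λ computation can be sketched as follows (our own code; it uses the fact that for two costs x < y in a column, the kernel entries exp(−λx) and exp(−λy) are separated by d orders of magnitude exactly when λ(y − x) ≥ d · ln 10):

```python
import math
import sys

def lambda_pool(M, sep_orders=3.0):
    # For each column j of the cost matrix, the smallest lambda that
    # separates that column's kernel entries exp(-lam * M[i][j]) by
    # `sep_orders` orders of magnitude: lam = sep_orders * ln(10) / gap,
    # where gap is the minimal difference between distinct column values.
    pool = []
    for j in range(len(M[0])):
        col = sorted({M[i][j] for i in range(len(M))})
        gaps = [hi - lo for lo, hi in zip(col, col[1:])]
        if gaps:
            pool.append(sep_orders * math.log(10) / min(gaps))
    return pool

def largest_safe_lambda(pool, max_cost):
    # Largest lambda in the pool keeping exp(-lam * max_cost) above the
    # smallest positive normal double (a stand-in for the binary search).
    limit = -math.log(sys.float_info.min) / max_cost
    safe = [lam for lam in pool if lam <= limit]
    return max(safe) if safe else None
```

The actual experiments used a binary search over the pool; the machine-precision cutoff above is one simple way to phrase the same feasibility test.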
While this approach naturally suggests using log domain initialization (as in Algorithm
3) to choose the largest possible λ, we did not use any log domain computations so that we
could independently observe the behavior of this simple heuristic. However, to ensure that
at least one λ in the pool would work without causing entries of K to go below machine
precision, we preprocessed the networks via the cosine rule normalization strategy used
above and inserted a moderate value of λ = 200 into the pool of λ values.
We used this adaptive λ-search heuristic when computing the TLB for any pair of net-
works, not just the TLB of a network with itself (which was the motivation for this heuris-
tic). For an illustration of the operation of this method, see the TLB dissimilarity matrix
corresponding to Experiment 1.10.3 in Figure 1.25.

4.4 Complexity of PPH and algorithmic aspects


The origin of a general persistent homology algorithm for simplicial complexes can
be traced back to [49] for Z/2Z coefficients, and to [123] for arbitrary field coefficients.
Here it was observed that the persistence algorithm has the same running time as Gaussian
elimination over fields, i.e. O(m3 ) in the worst case, where m is the number of simplices.

Algorithm 4 TLB computation with cosine rule normalization
procedure GetCosineTLB(X, Y, mX, mY) ▷ X is n × n, Y is m × m; mX, mY prob. measures
    A ← X./max(abs(X)), B ← Y./max(abs(Y)) ▷ ./ denotes elementwise division
    ρ ← GetTLB(A, B, mX, mY)
    s ← (1/2) GetSize(A, mX), t ← (1/2) GetSize(B, mY)
    perform scaling of ρ using s, t as in Lemma 181, save as π
    return π
end procedure
procedure GetSize(A, mA) ▷ get the 2-size of a network
    σ ← sqrt(mA′ ∗ (A.^2) ∗ mA)
    return σ
end procedure
procedure GetTLB(A, B, mA, mB) ▷ get TLB over R with p = 2
    for 1 ≤ i ≤ n and 1 ≤ j ≤ m do
        vAout ← A(i, :), vBout ← B(j, :) ▷ get both eccout and eccin
        vAin ← A(:, i), vBin ← B(:, j)
        Cout(i, j) ← CompareDistributions(vAout, vBout, mA, mB)
        Cin(i, j) ← CompareDistributions(vAin, vBin, mA, mB)
    end for
    perform Sinkhorn iterations for the OT problems with Cout, Cin as cost matrices
    store results in tlbOut, tlbIn
    return max(tlbIn, tlbOut) ▷ both are valid lower bounds for dN,2, so take the max
end procedure
procedure CompareDistributions(vA, vB, mA, mB)
    γ ← 2-Wasserstein distance, via Equation (1.17), between the pushforward
        distributions over R induced by vA and vB
    return γ
end procedure

The PPH setting is more complicated, for two reasons: (1) because of directionality,
the number of p-paths on a vertex set is much larger than the number of p-simplices, for any
p ∈ N, and (2) one must first obtain bases for the ∂-invariant p-paths {Ωp : p ≥ 2}. The
first item is unavoidable, and even desirable: we capture the asymmetry in the data, thus
retaining more information. For the second item, note that Ω0 and Ω1 are just the allowed
0- and 1-paths, so their bases can be read off from the network weight function. After
obtaining compatible bases for the filtered chain complex {Ω_•^i → Ω_•^{i+1}}_{i∈N}, however, one
can use the general persistent homology algorithm [49, 123, 29]. By compatible bases, we
mean a set of bases {B_p^i ⊆ Ω_p^i : 0 ≤ p ≤ D + 1, i ∈ N} such that B_p^i ⊆ B_p^{i+1} for each
i, and relative to which the transformation matrices Mp of ∂p are known. Here D is the
dimension up to which we compute persistence.
We now present a procedure for obtaining compatible bases for the ∂-invariant paths.
Fix a network (X, AX). We write Rp to denote Rp(X, K), for each p ∈ Z+. Given a
digraph filtration on X, we obtain a filtered vector space {A_•^i → A_•^{i+1}}_{i=1}^N and a filtered
chain complex {Ω_•^i → Ω_•^{i+1}}_{i=1}^N for some N ∈ N. For any p-path v, define its allow time
as at(v) := min{k ≥ 0 : v ∈ A_p^k}. Similarly, define its entry time as et(v) := min{k ≥
0 : v ∈ Ω_p^k}. The allow time and entry time coincide when p = 0, 1, but are not necessarily
equal in general. In Figure 1.13, for example, we have at(x4x1x2) = 1 < 2 = et(x4x1x2).
Now fix p ≥ 2, and consider the map ∂p : Rp → Rp−1. Let Mp denote the matrix
representation of ∂p, relative to an arbitrary choice of bases Bp and Bp−1 for Rp and Rp−1.
For convenience, we write the bases as Bp = {v_i^p : 1 ≤ i ≤ dim(Rp)} and Bp−1 = {v_i^{p−1} :
1 ≤ i ≤ dim(Rp−1)}, respectively. Each basis element has an allow time that can be
computed efficiently, and the allow times belong to the set {1, 2, . . . , N}. By performing row
and column swaps as needed, we can arrange Mp so that the basis vectors for the domain
are in increasing allow time, and the basis vectors for the codomain are in decreasing allow
time. This is illustrated in Figure 4.5.
A special feature of Mp is that it is stratified into horizontal strips given by the allow
times of the codomain basis vectors. For each 1 ≤ i ≤ N , we define the height range i as:

hr(i) := {1 ≤ j ≤ dim(Rp−1 ) : at(vjp−1 ) = i}.

In words, hr(i) lists the codomain basis vectors that have allow time i. Next we trans-
form Mp into a column echelon form Mp,G , using left-to-right Gaussian elimination. In this
form, all nonzero columns are to the left of any zero column, and the leading coefficient
(the topmost nonzero element) of any column is strictly above the leading coefficient of
the column on its right. The leading coefficients are usually called pivots. An illustra-
tion of Mp,G is provided in Figure 4.5. To obtain this column echelon form, the following
elementary column operations are used:

1. swap columns i and j,

2. replace column j by (col j − k(col i)), where k ∈ K.

The basis for the domain undergoes corresponding changes, i.e. we replace v_j^p by (v_j^p −
k v_i^p) as necessary. We write the new basis Bp,G for Rp as {v̂_i^p : 1 ≤ i ≤ dim(Rp)}.
Moreover, we can write this basis as a union Bp,G = ∪_{i=1}^N B_{p,G}^i, where each B_{p,G}^i := {v̂_k^p :
1 ≤ k ≤ dim(Rp), et(v̂_k^p) ≤ i}. This follows easily from the column echelon form: for
each basis vector v of the domain, the corresponding column vector is ∂p(v), and at(∂p(v))
can be read directly from the height of the column. Specifically, if the row index of the
topmost nonzero entry of ∂p(v) belongs to hr(i), then at(∂p(v)) = i, and if ∂p(v) = 0, then
at(∂p(v)) = 0. Then we have et(v) = max(at(v), at(∂p(v))).
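The left-to-right elimination and the entry-time computation can be sketched over Z/2Z as follows (our own simplified variant: instead of a full column echelon form it only makes the topmost nonzero rows of the columns distinct, which already suffices to read off at(∂p(v)) and hence et(v); columns are assumed sorted by increasing allow time and rows by decreasing allow time):

```python
def column_reduce(cols, at_dom, at_cod):
    # cols[j]: set of row indices of the nonzero entries of the j-th
    # column of the boundary matrix, over the field Z/2Z.
    # at_dom[j]: allow time of the j-th domain basis vector.
    # at_cod[r]: allow time of the codomain basis vector for row r.
    pivot_owner = {}                       # topmost row -> owning column
    entry = []
    for j in range(len(cols)):
        col = set(cols[j])
        while col:
            piv = min(col)                 # topmost nonzero entry
            if piv not in pivot_owner:
                pivot_owner[piv] = j
                break
            col ^= cols[pivot_owner[piv]]  # subtract owning column (mod 2)
        cols[j] = col
        bd_at = at_cod[min(col)] if col else 0
        entry.append(max(at_dom[j], bd_at))  # et(v) = max(at(v), at(bd v))
    return cols, entry
```

Because a column is only ever reduced by columns to its left, every elimination step adds a path whose allow time is no larger, exactly as required by Remark 184 below.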

Remark 184. In the Gaussian elimination step above, we only eliminate entries by adding
paths that have already been allowed in the filtration. This means that for any operation of
the form v_j^p ← v_j^p − k·v_i^p, we must have at(v_i^p) ≤ at(v_j^p). Thus at(v_j^p − k·v_i^p) = at(v_j^p).
It follows that the allow times of the domain basis vectors do not change as we pass from
M_p to M_{p,G}, i.e. M_p and M_{p,G} have the same number of domain basis vectors
corresponding to any particular allow time.

Now we repeat the same procedure for ∂_{p+1} : R_{p+1} → R_p, taking care to use the basis
B_{p,G} for R_p. Because we never perform any row operations on M_{p+1}, the computations
for M_{p+1} do not affect M_{p,G}. We claim that for each 1 ≤ i ≤ N and each p ≥ 0, B_{p,G}^i
is a basis for Ω_p^i. The correctness of the procedure amounts to proving this claim. Assuming
the claim for now, we obtain compatible bases for the chain complex {Ω_•^i → Ω_•^{i+1}}_{i=1}^N.
Applying the general persistence algorithm with respect to the bases we just found now
yields the PPH diagram.
Correctness Note that all paths become allowed eventually, so dim(Ω_p^N) = dim(R_p). We
claim that B_{p,G}^i is a basis for Ω_p^i, for each 1 ≤ i ≤ N. To see this, fix 1 ≤ i ≤ N and let
v ∈ B_{p,G}^i. By the definition of B_{p,G}^i, et(v) ≤ i, so v ∈ Ω_p^i. Each B_{p,G}^i was obtained by
performing linear operations on the basis B_p of R_p, so it is a linearly independent collection
of vectors in Ω_p^i. Towards a contradiction, suppose B_{p,G}^i does not span Ω_p^i. Let ũ ∈ Ω_p^i be
linearly independent from B_{p,G}^i, and let ṽ ∈ B_{p,G} \ B_{p,G}^i be linearly dependent on ũ (such
a ṽ exists because B_{p,G} is a basis for R_p).
Consider the basis B_p^ũ obtained from B_{p,G} after replacing ṽ with ũ. Let M_p^ũ denote
the corresponding matrix, with the columns arranged in the following order from left to
right: the first |B_{p,G}^i| columns agree with those of M_{p,G}, the next column is ∂_p(ũ), and the
remaining columns appear in the same order that they appear in M_{p,G}. Notice that M_{p,G}
differs from M_p^ũ by a change of (domain) basis, i.e. a sequence of elementary column
operations. Next perform another round of left-to-right Gaussian elimination to arrive at
a column echelon form M_p^u, where u is the domain basis vector obtained from ũ after
performing all the column operations. Let B_p^u denote the corresponding domain basis. It
is a standard theorem in linear algebra that the reduced column echelon form of a matrix
is unique. Since M_{p,G} and M_p^u were obtained from M_p via column operations, they both
have the same unique reduced column echelon form, and it follows that they have the same
pivot positions.

Now we arrive at the contradiction. Since ṽ ∉ B_{p,G}^i, we must have either at(ṽ) > i, or
at(∂_p(ṽ)) > i. Suppose first that at(ṽ) > i. Since ũ ∈ Ω_p^i, we must have et(ũ) ≤ i, and
so at(ũ) ≤ i. By the way in which we sorted M_p^ũ, we know that u is obtained by adding
terms from B_{p,G}^i to ũ. Each term in B_{p,G}^i has allow time ≤ i, so at(u) ≤ i by Remark 184.
But then B_p^u has one more basis vector with allow time ≤ i than B_p, i.e. one fewer basis
vector with allow time > i. This is a contradiction, because taking linear combinations
of linearly independent vectors to arrive at B_p^u can only increase the allow time. Next
suppose that at(∂_p(ṽ)) > i. Then, because M_{p,G} is already reduced, the column of ṽ has
a pivot at a height that does not belong to hr(i). Now consider ∂_p(u). Suppose first that
∂_p(u) = 0. Then the column of u clearly does not have a pivot, and it does not affect the
pivots of the columns to its right in M_p^u. Thus M_p^u has one fewer pivot than M_{p,G}, which
is a contradiction because both matrices have the same reduced column echelon form and
hence the same pivot positions. Finally, suppose ∂_p(u) ≠ 0. Since u is obtained from ũ by
reduction, we also have at(∂_p(u)) ≤ at(∂_p(ũ)) ≤ i. Thus M_p^u has one more pivot at height
range i than M_{p,G}, which is again a contradiction. Thus B_{p,G}^i spans Ω_p^i. Since 1 ≤ i ≤ N
was arbitrary, the result follows.
Data structure Our work shows that left-to-right column reduction is sufficient to obtain
compatible bases for the filtered chain complex {Ω_•^i → Ω_•^{i+1}}_{i=1}^N. As shown in [123],
this is precisely the operation needed in computing persistence intervals, so we can compute
PPH with little more work. It is known that there are simple ways to optimize the left-to-right
persistence computation [29, 8], but in this paper we follow the classical treatment.
Following [49, 123], our data structure is a linear array T labeled by the elementary regular
p-paths, 0 ≤ p ≤ D + 1, where D is the dimension up to which homology is computed.
For completeness, we show below how to modify the algorithms in [123] to obtain PPH.
Analysis The running time for this procedure is the same as that of Gaussian elimination
over fields, i.e. it is O(m^3), where m is the number of D-paths (if we compute persistence
up to dimension D − 1). This number is large: the number of regular D-paths over n
points is n(n − 1)^D. Computing persistence also requires O(m^3) running time. Thus, to
compute PPH in dimension D − 1 for a network on n nodes, the worst case running time
is O(n^{3+3D}).
Compare this with the problem of producing simplicial complexes from networks, and
then computing simplicial persistent homology. For a network on n nodes, assume that the
simplicial filtration is such that every D-simplex on n points eventually enters the filtration
(see [37] for such filtrations). The number of D-simplices over n points is \binom{n}{D+1},
which is of the same order as n^{D+1}. Thus computing simplicial persistent homology in
dimension D − 1 via such a filtration (using the general algorithm of [123]) still has
complexity O(n^{3+3D}).

[Figure 4.5 near here: schematic of M_p (left) and its column echelon form M_{p,G} (right),
with strips labeled by allow times at = 0, 1, . . . , N along the basis for R_p (columns) and
the basis for R_{p−1} (rows).]

Figure 4.5: Left: The rows and columns of M_p are initially arranged so that the domain
and codomain vectors are in increasing and decreasing allow time, respectively. If there
are no domain (codomain) vectors having a particular allow time, then the corresponding
vertical (horizontal) strip is omitted. Right: After converting to column echelon form, the
domain vectors of M_{p,G} need not be in the original ordering. But the codomain vectors are
still arranged in decreasing allow time.

4.4.1 The modified algorithm

By our observations in the preceding section, computing bases for the filtered chain
complex {Ω_•^i → Ω_•^{i+1}}_{i=1}^N can be done simultaneously while performing the column
reduction operations needed for persistence, and this does not cause any additional overhead.
For notational convenience, we use a collection T_0, . . . , T_{D+1} of linear arrays, where each
T_p contains a slot for each elementary regular p-path. Specifically, for each v_j^p ∈ B_p
(the chosen basis for R_p), T_p contains a slot labeled (v_j^p, et(v_j^p), at(v_j^p)) which can store
a linked list of (p − 1)-paths and an integer corresponding to an entry time. We sort each
T_p according to increasing allow time and relabel B_p if needed so that v_j^p is the label of
T_p[j]. Thus it makes sense to talk about the index of v_j^p in T_p: we define index(v_j^p) = j,
and T_p[index(v_j^p)] is labeled by (v_j^p, et(v_j^p), at(v_j^p)). Note that if v, v′ ∈ B_p are such
that index(v) ≤ index(v′), then at(v) ≤ at(v′).
Below we present a modified version of the algorithm in [123] that computes PPH.
We make one last remark, based on an observation in [123]: because of the relation
∂_p ∘ ∂_{p+1} = 0, a pivot column of the reduced boundary matrix M_{p,G} corresponds to a zero
row in M_{p+1,G}. Thus whenever we compute ∂_p(v) in our algorithm, we can immediately
remove the summands that correspond to pivot terms in M_{p−1,G}. This is done in Algorithm 6.

Algorithm 5 Computing persistent path homology
 1: procedure ComputePPH(X, D + 1)            ▷ Compute PPH of network X up to dimension D
 2:    for p = 0, . . . , D do
 3:        Pers_p = ∅;                        ▷ Store intervals here
 4:        for j = 1, . . . , dim(R_{p+1}) do
 5:            [u, i, et] = BasisChange(v_j^{p+1}, p + 1);
 6:            if u = 0 then Mark T_{p+1}[j];
 7:            else
 8:                T_p[i] ← (u, et);
 9:                Add (et(v_i^p), et) to Pers_p;
10:            end if
11:        end for
12:        for j = 1, . . . , dim(R_p) do
13:            if T_p[j] marked and empty then
14:                Add (et(v_j^p), ∞) to Pers_p;
15:            end if
16:        end for
17:    end for
18:    return Pers_0, . . . , Pers_D;
19: end procedure

Algorithm 6 Left-to-right column reduction
 1: procedure BasisChange(v, dim)             ▷ Find pivot or zero columns
 2:    p ← dim; u ← ∂_p(v); Remove unmarked (pivot) terms from u;
 3:    while u ≠ 0 do
 4:        σ ← argmax{index(τ) : τ is a summand of u};
 5:        i ← index(σ);
 6:        et ← max(at(v), at(σ));
 7:        if T_{p−1}[i] is unoccupied then break;
 8:        end if
 9:        Let a, b be the coefficients of σ in T_{p−1}[i] and u, respectively;
10:        u ← u − (b/a)·T_{p−1}[i];          ▷ Column reduction step: eliminates σ from u
11:    end while
12:    return u, i, et;
13: end procedure
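The two listings above can be mirrored in a short executable sketch. The dictionary-of-coefficients representation, the function name, and the bookkeeping below are our own simplifications (allow times and birth times are passed in as arrays rather than read off slot labels), not the thesis implementation; coefficients are exact rationals.

```python
from fractions import Fraction

def reduce_and_pair(columns, at_col, at_row, birth_of_row):
    """Sketch in the spirit of Algorithms 5-6.  columns[j] is a dict
    {row: coeff} for the boundary of the j-th domain basis path; rows
    are assumed sorted by increasing allow time, so the pivot is the
    summand of largest index (Algorithm 6, line 4).  Returns the finite
    intervals and the indices of unpaired (marked) columns."""
    stored = {}                          # pivot row -> reduced column
    intervals, marked = [], []
    for j, col in enumerate(columns):
        u = {r: Fraction(c) for r, c in col.items()}
        while u:
            piv = max(u)                 # largest-index summand of u
            if piv not in stored:
                break                    # slot unoccupied: u is a new pivot column
            other = stored[piv]
            k = u[piv] / other[piv]      # column reduction step
            for r, c in other.items():
                u[r] = u.get(r, Fraction(0)) - k * c
                if u[r] == 0:
                    del u[r]
        if not u:
            marked.append(j)             # candidate infinite interval
        else:
            stored[piv] = u
            death = max(at_col[j], at_row[piv])   # entry time, as in line 6
            intervals.append((birth_of_row[piv], death))
    return intervals, marked
```

For example, feeding in the three edge boundaries of a triangle whose edges enter at times 1, 1, 2 (vertices at time 0) pairs two edges with vertices and leaves the last edge marked as a 1-cycle.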

4.5 More experiments using Dowker persistence

4.5.1 U.S. economy input-output accounts


Each year, the U.S. Department of Commerce Bureau of Economic Analysis (www.bea.gov)
releases the U.S. input-output accounts, which show the commodity inputs used
by each industry to produce its output, and the total commodity output of each industry.
Economists use this data to answer two questions: (1) what is the total output of the US
economy, and (2) what is the process by which this output is produced and distributed [72].
One of the core data types in these accounts is a “make” table, which shows the pro-
duction of commodities by industries. The industries are labeled according to the North
American Industry Classification System (NAICS), and the commodities (i.e. goods or ser-
vices) produced by each industry are also labeled according to the NAICS. This make table
can be viewed as a network (E, m) consisting of a set of NAICS labels E, and a function
m : E × E → Z+ . Note that the same labels are used to denote industries and commodi-
ties. After fixing an enumeration (e1 , . . . , en ) of E, the entry m(ei , ej ) corresponds to the
dollar value (in millions) of commodity type ej produced by industry ei . For example, if e
corresponds to the economic sector “Farms,” then m(e, e) is the dollar value (in millions)
of farming commodities produced by the farming industry.
In our next example, we analyze make table data from the U.S. Department of Com-
merce for the year 2011. Specifically, we begin with a set E of 71 economic sectors, and
view it as a network by the process described above. We remark that a complementary data
set, the “use” table data for 2011, has been analyzed thoroughly via hierarchical clustering
methods in [20].
Our analysis is motivated by question (2) described above, i.e. what is the process by
which commodities are produced and distributed across economic sectors. Note that one
can simply read off values from the make table that show direct investment between com-
modities and industries, i.e. which commodities are being produced by which industry. A
more interesting question is to find patterns of indirect investment, e.g. chains (ei , ej , ek )
where industry ei produces commodities of type ej , and industry ej in turn produces com-
modities of type ek . Note that if ei does not produce any commodities of type ek , then this
indirect influence of ei on ek is not immediately apparent from the make table. One can
manually infer this kind of indirect influence by tracing values across a make table, but this
process can become cumbersome when there are large numbers of economic sectors, and
when one wants to find chains of greater length.
To automate this process so that finding flows of investment can be used for exploratory
data analysis, we take the viewpoint of using persistent homology. Beginning with the
71 × 71 make table matrix, we first obtained a matrix ω̄_E defined by:

    ω̄_E(e_i, e_j) := m(e_i, e_j) if i ≠ j, and ω̄_E(e_i, e_i) := 0,    for each 1 ≤ i, j ≤ 71.

Because our goal was to analyze the interdependence of industries and the flow of
commodities across industrial sectors, we removed the diagonal as above to discount the
commodities produced by each industry in its own type. Next we defined a network (E, ω_E),
where ω_E was given by:

    ω_E(e_i, e_j) = f( ω̄_E(e_i, e_j) / Σ_{e∈E} ω̄_E(e, e_j) ),    for each 1 ≤ i, j ≤ 71.

Here f(x) = 1 − x is a function used to convert the original similarity network into
a dissimilarity network. The greater the dissimilarity, the weaker the investment, and vice
versa. So if ω_E(e, e′) = 0.85, then sector e is said to make an investment of 15% in
sector e′, meaning that 15% of the commodities of type e′ produced externally (i.e. by
industries other than e′) are produced by industry e. After this preprocessing step, we
computed the 0 and 1-dimensional Dowker persistence diagrams of the resulting network.
The corresponding barcodes are presented in Figure 4.6, and our interpretation is given
below.
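The preprocessing pipeline just described (zero the diagonal, normalize each commodity column by its off-diagonal column sum, apply f(x) = 1 − x) can be sketched as follows. The function name and the zero-column guard are ours; this is an illustrative fragment, not the original analysis code.

```python
def dissimilarity_network(make_table):
    """Turn a make table (entry (i, j) = dollar value of commodity j
    produced by industry i) into the dissimilarity weights omega_E.
    Steps: zero the diagonal, divide each column by its column sum,
    then apply f(x) = 1 - x."""
    n = len(make_table)
    # Zero the diagonal to discount own-type production.
    w = [[0.0 if i == j else float(make_table[i][j]) for j in range(n)]
         for i in range(n)]
    # Off-diagonal column sums; guard against all-zero columns
    # (an assumption of ours -- the thesis data has none).
    col_sums = [sum(w[i][j] for i in range(n)) or 1.0 for j in range(n)]
    return [[1.0 - w[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
```

On a toy 2 × 2 table each off-diagonal entry is the whole external production of its commodity, so the off-diagonal weights become f(1) = 0 and the diagonal becomes f(0) = 1, matching the formula above applied literally.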

Dependent sectors
We open our discussion with the 0-dimensional Dowker persistence barcode presented
in Figure 4.6. Recall that Javaplex produces representative 0-cycles for each persistence
interval in this 0-dimensional barcode. Typically, these representative 0-cycles are given as
the boundary of a 1-simplex, so that we know which pair of sectors merges together into
a 1-simplex and converts the 0-cycle into a 0-boundary. We interpret the representative
sectors of the shortest 0-dimensional persistence intervals as pairs of sectors where one is
strongly dependent on the other. To justify this interpretation, we observe that the right
endpoint of a 0-dimensional persistence interval corresponds to a resolution δ at which two
industries e, e′ find a common δ-sink. Typically this sink is one of e or e′, although this
sink is allowed to be a third industry e″. We suggest the following interpretation for being
a common δ-sink: of all the commodities of type e″ produced by industries other than e″,
over (1 − δ)·100% is produced by each of the industries e, e′. Note that for δ < 0.50, this
interpretation suggests that e″ is actually e′ (or e), and that over 50% of the commodities
of type e′ produced by external industries (i.e. by industries other than e′) are actually
produced by e (resp. e′).
In Table 4.1 we list some sample representative 0-cycles produced by Javaplex. Note
that these cycles are representatives for bars that we can actually see on the 0-dimensional
barcode in Figure 4.6. We do not focus any more on finding dependent sectors and direct
investment relations, and point the reader to [20] where this topic has been covered in great
detail under the lens of hierarchical clustering (albeit with a slightly different dataset, the
“use” table instead of the make table). In the following subsection, we study some of the
persistent 1-dimensional intervals shown in Figure 4.6, specifically the two longest bars
that we have colored in red.

Figure 4.6: 0 and 1-dimensional Dowker persistence barcodes for US economic sector data,
obtained by the process described in §4.5.1. The long 1-dimensional persistence intervals
that are colored in red are examined in §4.5.1 and Figures 4.9,4.12.

Sample 0-dimensional persistence intervals from economy data

Interval [0, δ)   Representative 0-cycle   Labels                                                  δ-sink
[0.0, 0.1)        [47] + [70]              47 = Funds, trusts, and other financial vehicles;       47
                                           70 = State and local general government
[0.0, 0.4)        [45] + [44]              45 = Securities, commodity contracts, and investments;  45
                                           44 = Federal Reserve banks, credit intermediation,
                                           and related activities
[0.0, 0.07)       [24] + [3]               3 = Oil and gas extraction;                             24
                                           24 = Petroleum and coal products

Table 4.1: The first two columns contain sample 0-dimensional persistence intervals, as
produced by Javaplex. We have added the labels in column 3, and the common δ-sinks in
column 4.

Patterns of investment
Examining the representative cycles of the persistent 1-dimensional intervals in Figure
4.6 allows us to discover patterns of investment that would not otherwise be apparent from
the raw data. Javaplex produces representative nodes for each nontrivial persistence inter-
val, so we were able to directly obtain the industrial sectors involved in each cycle. Note
that for a persistence interval [δ0 , δ1 ), Javaplex produces a representative cycle that emerges
at resolution δ0 . As more 1-simplices enter the Dowker filtration at greater resolutions, the
homology equivalence class of this cycle may coincide with that of a shorter cycle, until
finally it becomes the trivial class at δ_1. We have illustrated some of the representative
cycles produced by Javaplex in Figures 4.9 and 4.12. To facilitate our analysis, we have also
added arrows in the figures according to the following rule: for each representative cycle at
resolution δ, there is an arrow e_i → e_j if and only if ω_E(e_i, e_j) ≤ δ, i.e. if and only if e_j
is a sink for the simplex [e_i, e_j] in Dsi_{δ,E}.
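The arrow-drawing rule can be expressed directly in code. This is a sketch with names of our own choosing, and the ω values in the usage example below are invented for illustration, not taken from the economy dataset.

```python
def arrows(omega, nodes, delta):
    """Directed edges drawn in the cycle figures: e_i -> e_j whenever
    omega(e_i, e_j) <= delta, i.e. e_j is a sink for [e_i, e_j].
    omega is a dict keyed by ordered pairs; missing pairs are treated
    as maximally dissimilar (weight 1.0)."""
    return [(a, b) for a in nodes for b in nodes
            if a != b and omega.get((a, b), 1.0) <= delta]
```

For instance, with omega = {('PC', 'CH'): 0.7, ('CH', 'PL'): 0.72, ('PL', 'CH'): 0.9} and δ = 0.75, only the arrows PC → CH and CH → PL are drawn.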
Consider the 1-dimensional persistence interval [0.75, 0.95), colored in red in Figure
4.6. The industries involved in a representative cycle for this interval at δ = 0.75 are:
Wood products (WO), Primary metals (PM), Fabricated metal products (FM), Petroleum
and coal products (PC), Chemical products (CH), and Plastics and rubber products (PL).
The entire cycle is illustrated in Figure 4.9. Starting at the bottom right, note that PC has an
arrow going towards CH, suggesting the dependence of the chemical industry on petroleum
and coal products. This makes sense because petroleum and coal products are the major
organic components used by chemical plants. Chemical products are a necessary ingredient
for the synthesis of plastics, which could explain the arrow (CH→PL). Plastic products are
commonly used to produce wood-plastic composites, which are low-cost alternatives to
products made entirely out of wood. This can explain the arrow PL→WO. Next consider
the arrows FM→WO and FM→PM. As a possible interpretation of these arrows, note that
fabricated metal frames and other components are frequently used in wood products, and
fabricated metal structures are used in the extraction of primary metals from ores. Also
note that the metal extraction industry is one of the largest consumers of energy. Since
energy is mostly produced from petroleum and coal products, this is a possible reason for
the arrow PC→PM.
We now consider the 1-dimensional persistence interval [0.81, 1) colored in red in Fig-
ure 4.6. The sectors involved in a representative cycle for this interval at δ = 0.81 are:
Petroleum and coal products (PC), Oil and gas (OG), Waste management (WM), State and
local general government (SLGG), Apparel and leather and allied products (AP), Textile
mills (TE), Plastics and rubber products (PL), and Chemical products (CH). The pattern of
investment in this cycle is illustrated in Figure 4.12, at resolutions δ = 0.81 and δ = 0.99.
We have already provided interpretations for the arrows OG→PC→CH→PL above. Con-
sider the arrow PL→TE. This likely reflects the widespread use of polyester and polyester
blends in production of fabrics. These fabrics are then cut and sewn to manufacture cloth-
ing, hence the arrow TE→AP. Also consider the arrow WM→OG: this suggests the role
of waste management services in the oil and gas industry, which makes sense because the

[Figure 4.9 near here: two cycle diagrams on the sectors WO, FM, PM, PC, CH, PL.]

Figure 4.7: Investment patterns at δ = 0.75.    Figure 4.8: Investment patterns at δ = 0.94.

Figure 4.9: Here we illustrate the representative nodes for one of the 1-dimensional
persistence intervals in Figure 4.6. This 1-cycle [PC,CH] + [CH,PL] + [PL,WO] − [WO,FM] +
[FM,PM] − [PM,PC] persists on the interval [0.75, 0.95). At δ = 0.94, we observe that this
1-cycle has joined the homology equivalence class of the shorter 1-cycle illustrated on the
right. Unidirectional arrows represent an asymmetric flow of investment. A full description
of the meaning of each arrow is provided in §4.5.1.

waste management industry has a significant role in the treatment and disposal of hazardous
materials produced in the oil and gas industry. Finally, note that the arrows SLGG→WM
and SLGG→AP likely suggest the dependence of the waste management and apparel in-
dustries on state and local government support.
We note that there are numerous other 1-dimensional persistence intervals in Figure 4.6
that could be worth exploring, especially for economists who regularly analyze the make
tables in input-output accounts and are better prepared to interpret this type of data. The
results obtained in our analysis above suggest that viewing these tables as asymmetric net-
works and then computing their persistence diagrams is a reasonable method for uncovering
their hidden attributes.

4.5.2 U.S. migration


The U.S. Census Bureau (www.census.gov) publishes data on the annual migration
of individuals between the 50 states of the U.S., the District of Columbia, and Puerto Rico.
An individual is said to have migrated from State A to State B in year Y if their residence in
year Y is in State B, and in the previous year, their residence was in State A. In this section,
we define a network structure that encapsulates migration flows within the U.S., and study
this network via its Dowker persistence diagrams. We use migration flow data from 2011

[Figure 4.12 near here: two cycle diagrams on the sectors OG, WM, PC, SLGG, CH, AP,
TE, PL.]

Figure 4.10: Investment patterns at δ = 0.81.    Figure 4.11: Investment patterns at δ = 0.99.

Figure 4.12: Representative nodes for another 1-dimensional persistence interval in Figure
4.6. A full description of this cycle is provided in §4.5.1.

in our work. We begin with a set S = {s_1, . . . , s_52} of these 52 regions and a function
m : S × S → Z_+, where m(s_i, s_j) represents the number of migrants moving from s_i to
s_j. We define a network (S, ω_S), with ω_S given by:

    ω_S(s_i, s_j) = f( m(s_i, s_j) / Σ_{s∈S, s≠s_j} m(s, s_j) )  if s_i ≠ s_j,    ω_S(s_i, s_i) = 0,    for s_i, s_j ∈ S,

where f(x) = 1 − x. The purpose of f is to convert similarity data into dissimilarity data.
A large value of ω_S(s_i, s_j) means that few people move from s_i to s_j. The diagonal is
removed to ensure that we focus on migration patterns, not on the base population of each
state.

The Dowker complex of migration

Because we will eventually compute the Dowker persistence diagram of the network
(S, ω_S), we first suggest interpretations of some aspects of the Dowker complex that we
construct as an intermediate step. Let δ ≥ 0, and write Dsi_δ := Dsi_{δ,S}, C_2^δ := C_2(Dsi_δ).
We proceed by defining, for any state s ∈ S, the migrant influx of s:

    influx(s) := Σ_{s′∈S, s′≠s} m(s′, s).
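The influx and the δ-sink condition used throughout this section can be sketched in a few lines. The function names are ours, and the toy numbers in the usage example are invented for illustration.

```python
def influx(m, s, regions):
    """Total migrants received by region s from all other regions,
    following the definition of migrant influx above."""
    return sum(m.get((s2, s), 0) for s2 in regions if s2 != s)

def is_delta_sink(omega, sink, simplex, delta):
    """sink is a delta-sink for the simplex [s_1, ..., s_k] iff
    omega(s_i, sink) <= delta for every vertex s_i; missing pairs are
    treated as maximally dissimilar (weight 1.0)."""
    return all(omega.get((s, sink), 1.0) <= delta for s in simplex)
```

In the toy example below, C receives 80 migrants from A and 60 from B, so influx(C) = 140; then ω_S(A, C) = 1 − 80/140 ≈ 0.43 and ω_S(B, C) = 1 − 60/140 ≈ 0.57, so C is a δ-sink for [A, B] at δ = 0.6 but not at δ = 0.5.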

Interpretation of a δ-sink Next we probe the meaning of a δ-sink in the migration context.
For simplicity, we first discuss δ-sinks of 1-simplices. Let [s, s′] be a 1-simplex for s, s′ ∈
S, and let s″ ∈ S be a δ-sink for [s, s′]. Then, by unwrapping the definitions above, we
see that s″ receives at least (1 − δ)·influx(s″) migrants from each of s, s′. This suggests
the following physical interpretation of the 1-simplex [s, s′]: in 2010, there were at least
(1 − δ)·influx(s″) residents in each of s and s′ who had a common goal of moving to s″
in 2011. There could be a variety of reasons for interregional migration—people might be
moving for employment purposes, for better climate, and so on—but the important point
here is that we have a quantitative estimate of residents of s and s′ with similar relocation
preferences. On the other hand, letting r, r′ ∈ S be states such that [r, r′] ∉ Dsi_δ, the lack
of a common δ-sink suggests that residents of r and r′ might have significantly different
migration preferences. Following this line of thought, we hypothesize the following:

Residents of states that span a 1-simplex in Dsiδ are more similar to each other (in terms of
migrational preferences) than residents of states that do not span a 1-simplex.

More generally, when n states form an n-simplex in Dsiδ , we say that they exhibit co-
herence of preference at resolution δ. The idea is that the residents of these n states have
a mutual preference for a particular attractor state, which acts as a δ-sink. Conversely, a
collection of n states that do not form an n-simplex are said to exhibit incoherence of pref-
erence at resolution δ—residents of these states do not agree strongly enough on a common
destination for migration.
Interpretation of a connected component Now we try to understand the physical
interpretation of a connected component in Dsi_δ. Recall that two states s, s′ ∈ S belong to a
connected component in Dsi_δ if and only if there exist s_1 = s, . . . , s_n = s′ ∈ S such that:

    [s_1, s_2], [s_2, s_3], [s_3, s_4], . . . , [s_{n−1}, s_n] ∈ Dsi_δ.    (4.2)

Let s_1, . . . , s_n ∈ S be such that Condition 4.2 above is satisfied. Note that this implies that
there exists a δ-sink r_i for each [s_i, s_{i+1}], for 1 ≤ i ≤ n − 1. Let r_1, . . . , r_{n−1} be δ-sinks
for [s_1, s_2], . . . , [s_{n−1}, s_n]. We can further verify, using the fact that ω_S vanishes on the
diagonal, that the sinks r_1, . . . , r_{n−1} themselves belong to this connected component:

    [s_1, r_1], [r_1, s_2], [s_2, r_2], [r_2, s_3], . . . , [r_{n−1}, s_n] ∈ Dsi_δ.

The moral of the preceding observations can be summarized as follows:

The vertex set of any connected component of Dsiδ contains a special subset of “attractor”
or “sink” states at resolution δ.

So in 2010, for any i ∈ {1, . . . , n − 1}, there were at least (1 − δ)·influx(r_i) people
in s_i and in s_{i+1} who had a common goal of moving to r_i in 2011. Moreover, for any
i, j ∈ {1, . . . , n}, i ≠ j, there were at least min_{1≤k≤n−1} (1 − δ)·influx(r_k) people in each

of si and sj in 2010 who migrated elsewhere in 2011. From a different perspective, we are
able to distinguish all the states in a connected component that are significantly attractive
to migrants (the sinks/receivers), and we have quantitative estimates on the migrant flow
within this connected component into its sink/receiver states.
Consider the special case where each state in a connected component of n states, written
as s_1, s_2, . . . , s_n, loses (1 − δ)·influx(r) residents to a single state r ∈ S. By the preceding
observations, r belongs to this connected component, and we can write r = s_1 (relabeling
as needed). Then we observe that the n states {r, s_2, . . . , s_n} form an n-simplex, with r as
a common sink. In this case, we have ω_S(s_i, r) ≤ δ for each 2 ≤ i ≤ n. Also note that if
we write

    v_n^δ := [r, s_2] + [s_2, s_3] + [s_3, s_4] + . . . + [s_{n−1}, s_n] + [s_n, r] ∈ C_1^δ

    γ_n^δ := [r, s_2, s_3] + [r, s_3, s_4] + . . . + [r, s_{n−1}, s_n] ∈ C_2^δ,

then we can verify that ∂_1^δ(v_n) = 0 and ∂_2^δ(γ_n) = v_n. In other words, we obtain a 1-cycle
that is automatically the boundary of a 2-chain, i.e. is trivial upon passing to homology.
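The identity ∂_2^δ(γ_n) = v_n^δ can be checked mechanically for small n. Below is a sketch for n = 4 (the representation and names are ours), labeling the common sink r as vertex 0 and writing [s_n, r] as −[0, 3] so that every simplex carries increasing vertex labels.

```python
from collections import Counter

def boundary_2(chain):
    """Simplicial boundary of a 2-chain.  Chains are Counters mapping
    oriented simplices (tuples with increasing vertex labels) to
    integer coefficients; d[a,b,c] = [b,c] - [a,c] + [a,b]."""
    out = Counter()
    for (a, b, c), k in chain.items():
        out[(b, c)] += k
        out[(a, c)] -= k
        out[(a, b)] += k
    # Drop zero coefficients (avoid Counter's unary +, which also
    # discards negative entries).
    return Counter({s: v for s, v in out.items() if v != 0})

# gamma_4 with the common sink labelled 0, and v_4 with [s_4, r] = -[0, 3]:
gamma = Counter({(0, 1, 2): 1, (0, 2, 3): 1})
v = Counter({(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): -1})
```

Here boundary_2(gamma) equals v: the interior edges [0, 2] cancel in pairs, leaving exactly the loop, which matches the telescoping computation implicit in the claim above.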
In general, a connected component in Dsi_δ might contain chains of states that form loops,
i.e. states s_1, s_2, . . . , s_n such that:

    [s_1, s_2], [s_2, s_3], [s_3, s_4], . . . , [s_{n−1}, s_n], [s_n, s_1] ∈ Dsi_δ.    (4.3)

Note that Condition 4.3 is of course more stringent than Condition 4.2. By writing such
a loop in the form of v_n^δ above, we can verify that it forms a 1-cycle. Thus a connected
component containing a loop will be detected in a 1-dimensional Dowker persistence
diagram, unless the resolution at which the 1-cycle appears coincides with that at which it
becomes a 1-boundary.
Interpretation of 1-cycles The preceding discussion shows that it is necessary to determine
not just 1-cycles, but also the 1-boundaries that they eventually form. Any 1-boundary
arises as the image of ∂_2^δ applied to a linear combination of 2-simplices in Dsi_δ. Note that
in this context, each 2-simplex is a triple of states [s_i, s_j, s_k] with a common sink r to which
each of s_i, s_j, s_k has lost (1 − δ)·influx(r) residents between 2010 and 2011. Alternatively,
at least (1 − δ)·influx(r) residents from each of s_i, s_j, s_k had a common preference of
moving to r between 2010 and 2011. Next let {[s_1, s′_1, s″_1], [s_2, s′_2, s″_2], . . . , [s_n, s′_n, s″_n]}
be a collection of 2-simplices in Dsi_δ, with sinks {r_1, . . . , r_n}. One way to consolidate the
information they contain is to simply write them as a sum:

    τ_n^δ := [s_1, s′_1, s″_1] + [s_2, s′_2, s″_2] + . . . + [s_n, s′_n, s″_n] ∈ C_2^δ.

Notice that applying the boundary map to τ_n yields:

    z_n^δ := ∂_2^δ(τ_n) = Σ_{i=1}^n ( [s′_i, s″_i] − [s_i, s″_i] + [s_i, s′_i] ).
At this point we have a list of triples of states, and for each triple we have a quantitative
estimate on the number of residents who have a preferred state for migration in common.
Now we consider the following relaxation of this situation: for a fixed i ∈ {1, . . . , n} and
some δ_0 < δ, it might be the case that r_i is no longer a mutual δ_0-sink for [s_i, s′_i, s″_i], or even
that there is no δ_0-sink for [s_i, s′_i, s″_i]. However, there might still be δ_0-sinks u, u′, u″ for
[s_i, s′_i], [s′_i, s″_i], [s″_i, s_i], respectively. In such a case, we see that τ_n^{δ_0} ∉ C_2^{δ_0}, but
z_n^{δ_0} ∈ C_1^{δ_0}. Thus 0 ≠ ⟨z_n⟩_{δ_0} ∈ H_1(Dsi_{δ_0}). Assuming that δ > δ_0 is the minimum
resolution at which ⟨z_n⟩_δ = 0, we then have a general description of the way in which
persistent 1-cycles might arise.
A very special case of the preceding example occurs when we are able to choose a
δ-sink r_i for each [s_i, s′_i, s″_i], i ∈ {1, . . . , n}, such that r_1 = r_2 = . . . = r_n. In this
case, we say that z_n^{δ_0} becomes a 1-boundary due to a single mutual sink r_1. This situation
is illustrated in Figure 4.13. Also note the interpretation of this special case: assuming
that z_n^δ is a 1-boundary, we know that each of the states in the collection ∪_{i=1}^n {s_i, s′_i, s″_i}
loses (1 − δ)·influx(r_1) residents to r_1 between 2010 and 2011. This signals that r_1 is an
especially strong attractor state.
We remark that none of the 1-cycles in the U.S. migration data set that we analyzed
exhibited the property of becoming a boundary due to a single mutual sink. However,
we did find several examples of this special phenomenon in the global migration dataset
studied in §4.5.3. One of these special sinks turns out to be Djibouti, which is a gateway
from the Horn of Africa to the Middle East, and is both a destination and a port of transit
for migrants moving between Asia and Africa.
Interpretation of barcodes in the context of migration data. Having suggested interpre-
tations of simplices, cycles, and boundaries, we now turn to the question of interpreting a
persistence barcode in the context of migration. Note that when computing persistence bar-
codes, Javaplex can return a representative cycle for each bar, with the caveat that we do not
have any control over which representative is returned. From the 1-dimensional Dowker
persistence barcode of a migration dataset, we can use the right endpoint of a bar to obtain
a 1-boundary, i.e. a list of triples of states along with quantitative estimates on how many
residents from each triple had a preferred migration destination in common. In the special
case where the 1-boundary forms due to a single mutual sink, we will have a further quan-
titative estimate on how many residents from each state in the 1-boundary migrated to the
mutual sink. The left endpoint of a bar in the 1-dimensional Dowker persistence barcode
corresponds to a special connected component with the structure of a 1-cycle. Notice that
all the connected components are listed in the 0-dimensional Dowker persistence diagram.
See §4.5.3 for some additional comments.
Interpretation of error between lower bounds and true migration In each of Tables
4.2 and 4.3 (and Tables 4.4, 4.5 in §4.5.3), we have provided lower bounds on migration
flow between certain states, following the discussion above. More precisely, we do the
following:

0-cycles Given a persistence interval [0, δ), δ ∈ R, and a representative 0-cycle, we find
the 1-simplex that converts the 0-cycle into a 0-boundary at resolution δ. We then
find a δ-sink for this 1-simplex, and estimate a lower bound on the migrant flow into
this δ-sink.

1-cycles Given a persistence interval [δ_0, δ_1), δ_0, δ_1 ∈ R, and a representative 1-cycle, we
find a δ_0-sink for each 1-simplex in the 1-cycle, and estimate a lower bound on the
migrant flow into this δ_0-sink from its associated 1-simplex.

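The two procedures above can be summarized in a short sketch. In the following Python fragment, `find_delta_sinks` and `influx` are hypothetical helpers standing in for the δ-sink search and the migrant-influx lookup described in the text; this illustrates the bookkeeping only, not the thesis implementation.

```python
# Sketch of the two estimation procedures (helper names are hypothetical).

def estimate_from_0_cycle(delta, closing_simplex, find_delta_sinks, influx):
    """Interval [0, delta): the 0-cycle becomes a boundary when the 1-simplex
    `closing_simplex` = [s_i, s_j] appears at resolution delta. Return the
    lower bound (1 - delta) * influx(sink) for each delta-sink."""
    return {s: (1.0 - delta) * influx(s)
            for s in find_delta_sinks(closing_simplex, delta)}

def estimate_from_1_cycle(delta0, cycle, find_delta_sinks, influx):
    """Interval [delta0, delta1): bound the migrant flow into a delta0-sink
    of each 1-simplex in the representative cycle."""
    bounds = {}
    for simplex in cycle:
        for s in find_delta_sinks(simplex, delta0):
            bounds[(tuple(simplex), s)] = (1.0 - delta0) * influx(s)
    return bounds
```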
We also provide the true migration flows beside our lower bound estimates. However,
in each of our analyses, we incur a certain error between our lower bound and the actual
migration value. We now provide some interpretations for this error.
For the case of 0-cycles, note that all the networks we analyze are normalized to have
edge weights in the interval [0, 1]. For efficiency, in order to produce a Dowker filtration,
we compute the Dowker sink complex D^si_δ for δ-values in the set

delta := {0.01, 0.02, 0.03, . . . , 1.00}.

So whenever we have ωS(si, sj) ∉ delta for some states si, sj ∈ S, the 1-simplex [si, sj]
is not detected until we compute D^si_δ′, where δ′ is the smallest element in delta greater than
ωS(si, sj). If sj is a δ′-sink in this case, then our predicted lower bound on the migration
flow si → sj will differ by up to (0.01)(influx(sj)) from the true value. The situation
described here best explains the error values in Table 4.4.
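The discretization error just described can be made concrete in a few lines. In the sketch below, `omega_ij` and `influx_sj` are made-up illustrative values; only the grid matches the text.

```python
# Sketch of the grid discretization and its effect on the lower bound.

delta_grid = [round(0.01 * k, 2) for k in range(1, 101)]  # {0.01, ..., 1.00}

def detection_resolution(omega_ij):
    """Smallest grid value >= omega_ij: the resolution at which the
    1-simplex [s_i, s_j] is first detected on the grid."""
    return min(d for d in delta_grid if d >= omega_ij)

def lower_bound(delta, influx_sink):
    """Estimated lower bound (1 - delta) * influx on migrants into the sink."""
    return (1.0 - delta) * influx_sink

omega_ij, influx_sj = 0.873, 100_000          # invented numbers
exact = lower_bound(omega_ij, influx_sj)      # ideal bound at omega itself
detected = lower_bound(detection_resolution(omega_ij), influx_sj)
# The grid overshoots omega_ij by less than 0.01, so the computed bound
# loses at most 0.01 * influx(s_j) relative to the ideal one:
assert exact - detected <= 0.01 * influx_sj
```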
For the case of 1-cycles, we will study a simple motivating example. Suppose we have
the following 1-simplices:

[s1 , s2 ], [s2 , s3 ], . . . , [sn−1 , sn ], [sn , s1 ], for s1 , . . . , sn ∈ S.

For each i ∈ {1, . . . , n}, let δi ∈ R denote the resolution at which the simplex [si, s(i+1 mod n)]
emerges. For simplicity, suppose we have δ1 ≤ δ2 ≤ δ3 ≤ . . . ≤ δn , and also that s2 is
a δn -sink for [s1 , s2 ]. For our lower bound, we estimate that the migrant flow s1 → s2 is
at least (1 − δn )(influx(s2 )). A better lower bound would be (1 − δ1 )(influx(s2 )), but the
only δ-value that Javaplex gives us access to is δn . Because δ1 could be much smaller than
δn , it might be the case that our lower bound is much smaller than the true migration.
The preceding discussion suggests the following inference: if a 1-simplex [si , si+1 ]
exhibits a large error between the true migration into a δn -sink and the predicted lower
bound, then [si , si+1 ] likely emerged at a resolution proportionately smaller than δn . Thus
we can interpret the states si , si+1 as exhibiting relatively strong coherence of preference.
Conversely, 1-simplices that exhibit a smaller error likely emerged at a resolution closer to
δn —the states forming such 1-simplices exhibited incoherence of preference for a greater
range of resolutions. Note that even though we made some simplifying assumptions in our
choice of a 1-cycle, a similar analysis can be done for any 1-cycle.
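A toy computation (with invented numbers, not taken from the data) makes the gap explicit:

```python
# Toy illustration of the gap discussed above: the cycle's 1-simplices emerge
# at resolutions delta_1 <= ... <= delta_n, but Javaplex only exposes the
# birth resolution delta_n of the whole cycle.
influx_s2 = 50_000                    # invented influx of the sink s2
delta_1, delta_n = 0.70, 0.90         # [s1, s2] actually emerged at delta_1

reported_bound = (1 - delta_n) * influx_s2  # what the available output yields
sharper_bound = (1 - delta_1) * influx_s2   # uses delta_1, which we cannot see

# A large gap between the true migration and `reported_bound` therefore
# suggests delta_1 << delta_n, i.e. strong coherence of preference for s1, s2.
assert sharper_bound > reported_bound
```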

Figure 4.13: An example of a 1-cycle becoming a 1-boundary due to a single mutual sink
r, as described in the interpretation of 1-cycles in §4.5.2. The figure on the left shows a
connected component of D^si_{δ,S}, consisting of [s1, s2], [s2, s3], [s3, s4]. The arrows are meant
to suggest that r will eventually become a δ-sink for each of these 1-simplices, for some
large enough δ. The progression of these simplices for increasing values of δ is shown
from left to right. In the leftmost figure, r is not a δ-sink for any of the three 1-simplices.
Note that r has become a δ-sink for [s3, s4] in the middle figure. Finally, in the rightmost
figure, r has become a δ-sink for each of the three 1-simplices.

Analysis of U.S. migration


We computed the 0 and 1-dimensional Dowker persistence barcodes of the network
(S, ωS ) obtained in §4.5.2. The result is shown in Figure 4.14. As particular cases, we
study the two 1-dimensional bars that are highlighted in red in Figure 4.14. We obtain
representative cycles for each of these bars, and superimpose them on a map of the U.S. in
Figure 4.15. In the remainder of this section, we will discuss the cycles presented in Figure
4.15.
The (OH-KY-GA-FL) cycle. As a representative 1-cycle for the 1-dimensional interval
[0.90, 0.94), Javaplex returns the 1-cycle [FL,OH] + [FL,GA] + [KY,OH] + [GA,KY]. This
Ohio-Kentucky-Georgia-Florida cycle first emerges in D^si_{0.90}, and becomes a 1-boundary in
D^si_{0.94}. In Table 4.2, we provide the 0.90-sinks for each of the 1-simplices, the 0.94-sinks
for some 2-simplices that convert this cycle into a boundary, our estimates on the number
of migrants to each sink, and also the true numbers of migrants.
The (WA-OR-CA-AZ-UT-ID) cycle. As a representative 1-cycle for the 1-dimensional
interval [0.87, 0.92), Javaplex returns the 1-cycle [CA,OR] + [OR,WA] + [AZ,UT] +
[ID,UT] + [ID,WA] + [AZ,CA]. This Washington-Oregon-California-Arizona-Utah-Idaho
cycle first emerges in D^si_{0.87}, and becomes a 1-boundary in D^si_{0.92}. In Table 4.3, we provide
the 0.87-sinks for each of the 1-simplices, the 0.92-sinks that convert this 1-cycle into a
boundary, our estimates on the number of migrants to each sink, and the true migration
numbers.

Analysis of OH-KY-GA-FL cycle
1-simplex 0.90-sinks Estimated lower bound on migration True migration
[FL,OH] WV (1 − 0.90)(influx(WV)) = 4597 m(FL, WV) = 4964
m(OH, WV) = 7548
[FL,GA] AL (1 − 0.90)(influx(AL)) = 10684 m(FL, AL) = 12635
m(GA, AL) = 18799
GA (1 − 0.90)(influx(GA)) = 24913 m(FL, GA) = 38658
[KY,OH] KY (1 − 0.90)(influx(KY)) = 9925 m(OH, KY) = 12744
[GA,KY] TN (1 − 0.90)(influx(TN)) = 15446 m(GA, TN) = 16898
m(KY, TN) = 16852
2-simplex 0.94-sinks Estimated lower bound on migration True migration
[FL,OH,KY] IN (1 − 0.94)(influx(IN)) = 8640 m(FL, IN) = 11472
m(OH, IN) = 11588
m(KY, IN) = 11071
OH (1 − 0.94)(influx(OH)) = 12363 m(FL, OH) = 18191
m(KY, OH) = 19617
[FL,GA,KY] TN (1 − 0.94)(influx(TN)) = 9268 m(FL, TN) = 10451
m(GA, TN) = 16898
m(KY, TN) = 16852

Table 4.2: Quantitative estimates on migrant flow, following the interpretation presented in
§4.5.2. In each row, we list a simplex of the form [si , sj ] (resp. [si , sj , sl ] for 2-simplices)
and any possible δ-sinks sk . We hypothesize that sk receives at least (1 − δ)(influx(sk ))
migrants from each of si , sj (resp. si , sj , sl )—these lower bounds are presented in the third
column. The fourth column contains the true migration numbers. Notice that the [FL,GA]
simplex appears to show the greatest error between the lower bound and the true migra-
tion. Following the interpretation suggested earlier in §4.5.2, this indicates that Florida and
Georgia appear to have strong coherence of preference, relative to the other pairs of states
spanning 1-simplices in this table.

Analysis of WA-OR-CA-AZ-UT-ID cycle
1-simplex 0.87-sinks Estimated lower bound on migration True migration
[CA,OR] OR (1 − 0.87)(influx(OR)) = 14273 m(CA, OR) = 18165
[OR,WA] OR (1 − 0.87)(influx(OR)) = 14273 m(WA, OR) = 29168
[AZ,UT] UT (1 − 0.87)(influx(UT)) = 9517 m(AZ, UT) = 10577
[ID,UT] ID (1 − 0.87)(influx(ID)) = 7519 m(UT, ID) = 7538
[ID,WA] ID (1 − 0.87)(influx(ID)) = 7519 m(WA, ID) = 10895
[AZ,CA] AZ (1 − 0.87)(influx(AZ)) = 27566 m(CA, AZ) = 35650
2-simplex 0.92-sinks Estimated lower bound on migration True migration
[OR,WA,UT] ID (1 − 0.92)(influx(ID)) = 4627 m(OR, ID) = 6236
m(WA, ID) = 10895
m(UT, ID) = 7538
[AZ,CA,ID] UT (1 − 0.92)(influx(UT)) = 5856 m(AZ, UT) = 10577
m(CA, UT) = 8944
m(ID, UT) = 6059

Table 4.3: Quantitative estimates on migrant flow, following the interpretation presented in
§4.5.2. The entries in this table follow the same rules as those of Table 4.2. Notice that
the [OR,WA] and [AZ,CA] simplices show the greatest error between the lower bound and
the true migration. Following the interpretation in §4.5.2, this suggests that these two pairs
of states exhibit stronger coherence of preference than the other pairs of states forming
1-simplices in this table.

Figure 4.14: 0 and 1-dimensional Dowker persistence barcodes for U.S. migration data.

Figure 4.15: U.S. map with representative cycles of the persistence intervals that were
highlighted in Figure 4.14. The cycle on the left appears at δ1 = 0.87, and the cycle on the
right appears at δ2 = 0.90. The red lines indicate the 1-simplices that participate in each
cycle. Each red line is decorated with an arrowhead si → sj if and only if sj is a sink for
the simplex [si , sj ]. The blue arrows point towards all possible alternative δ-sinks, and are
interpreted as follows: Tennessee is a 0.90-sink for the Kentucky-Georgia simplex, West
Virginia is a 0.90-sink for the Ohio-Florida simplex, and Alabama is a 0.90-sink for the
Georgia-Florida simplex.

To probe the sociological aspects of a 1-cycle, we recall our hypothesis that residents
of states that are not connected by a 1-simplex are less similar to each other than residents
of states that are connected as such. The West Coast cycle given above seems to follow
this hypothesis: It seems reasonable to think that residents of California would be quite
different from residents of Idaho or Utah, and possibly quite similar to those of Oregon.
Similarly, one would expect a large group of people from Ohio and Kentucky to be quite
similar, especially with Cincinnati being adjacent to the state border with Kentucky. The
Ohio-Florida simplex might be harder to justify, but given the very small population of
their mutual sink West Virginia, it might be the case that the similarity between Ohio and
Florida is being overrepresented.
Our analysis shows that by using Dowker persistence diagrams for exploratory analysis
of migration data, we can obtain meaningful lower bounds on the number of residents from
different states who share a common migration destination. An interesting extension of this
experiment would be to study the persistence barcodes of migration networks over a longer
range of years than the 2010-2011 range that we have used here: ideally, we would be able
to detect changing trends in migration from changes in the lower bounds that we obtain.

4.5.3 Global migration


For our next example, we study data from the World Bank Global Bilateral Migration
Database [94] on global bilateral migration between 1990 and 2000. This dataset is
available at http://databank.worldbank.org/. As in the case of the U.S. migration
database, we begin with a set C = {c1 , c2 , . . . , c231 } of 231 global regions and a function
m : C × C → Z+ , where m(ci , cj ) represents the number of migrants moving from region
ci to region cj. We define a network (C, ωC), with ωC given by

ωC(ci, cj) = f( m(ci, cj) / Σ_{ck ∈ C, k ≠ j} m(ck, cj) )  if ci ≠ cj,    ωC(ci, ci) = 0,    for ci, cj ∈ C,

where f (x) = 1 − x. The 0 and 1-dimensional Dowker persistence barcodes that we obtain
from this network are provided in Figure 4.16. Some of the 0 and 1-dimensional persistence
intervals are tabulated in Tables 4.4 and 4.5.
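Assuming the raw data is available as a dictionary `m` of (source, target) migrant counts, the construction of (C, ωC) can be sketched as follows. The treatment of zero-influx regions is a convention of this sketch only (the text does not specify it), added so the sketch is total.

```python
# A minimal sketch of the network (C, omega_C) built from migrant counts.

def build_network(m, regions):
    """omega(c_i, c_j) = 1 - m(c_i, c_j) / influx(c_j) for i != j, and 0 on
    the diagonal. Regions with zero influx get weight 1 off the diagonal
    (a convention for this sketch; such columns carry no sink information)."""
    influx = {cj: sum(m.get((ci, cj), 0) for ci in regions if ci != cj)
              for cj in regions}
    omega = {}
    for ci in regions:
        for cj in regions:
            if ci == cj:
                omega[(ci, cj)] = 0.0
            elif influx[cj] == 0:
                omega[(ci, cj)] = 1.0
            else:
                omega[(ci, cj)] = 1.0 - m.get((ci, cj), 0) / influx[cj]
    return omega
```

Small weights ω(ci, cj) thus correspond to cj receiving a large share of its incoming migrants from ci, which is what drives the Dowker sink filtration.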
We interpret connected components, simplices, cycles and boundaries for the Dowker
sink complexes constructed from the global migration data just as we did for the U.S.
migration data in §4.5.2.
We draw the reader’s attention to two interesting features in the persistence barcodes
of the global migration dataset. First, the 0-dimensional barcode contains many short bars
(e.g. bars of length less than 0.2). In contrast, the shortest bars in the 0-dimensional barcode
for the U.S. migration data had length greater than 0.65. In our interpretation, which we
explain more carefully below, this observation suggests that migration patterns in the U.S.
are relatively uniform, whereas global migration patterns can be more skewed. Second,

because there are many more 1-dimensional persistence intervals, it is easier to find a 1-
cycle that becomes a boundary due to a single mutual sink, i.e. due to an especially strong
“attractor” region.
For the first observation, consider a 0-dimensional persistence interval [0, δ), where δ
is assumed to be small. Formally, this interval represents the persistence of a 0-cycle that
emerges at resolution 0, and becomes a 0-boundary at resolution δ. One can further verify
the following: this interval represents the resolutions for which a 0-simplex [ci ], ci ∈ C
remains disconnected from other 0-simplices, and δ is a resolution at which ci forms a 1-
simplex with some cj ∈ C. Recall from §4.5.2 that this means the following: either there
exists a region ck ∉ {ci, cj} which receives at least (1 − δ)(influx(ck)) migrants from each
of ci and cj , or ck = ci (or cj ) and ck receives over (1 − δ)(influx(ck )) migrants from cj
(resp. ci ). The first case cannot happen when δ < 0.5, because this would mean that ck
receives strictly over 50% of its migrant influx from each of ci and cj . Thus when δ < 0.5,
we know that ck = ci (or ck = cj ), and ck receives over 50% of its migrant influx from cj
(resp. ci ). For very small δ, we then know that most of the migrants into ck arrive from cj
(resp. ci ).
For convenience, let us assume that δ < 0.2, that ck = ci , and that ck receives over
80% of its migrant influx from cj . This might occur for a variety of reasons, some of which
are: (1) there might be war or political strife in cj and ck might be letting in refugees, (2)
ck might have achieved independence or administrative autonomy and some residents from
cj might be flocking to ck because they perceive it to be their homeland, and (3) cj might
be overwhelmingly populous in comparison to other neighboring regions of ck , so that the
contribution of cj to the migrant influx of ck dominates that of other regions.
Notice that neither of the first two reasons listed above are valid in the case of U.S.
migration. The third reason is valid in the case of a few states, but nevertheless, the short-
est 0-dimensional persistence interval in the U.S. migration dataset has length greater than
0.65. In other words, the minimal resolution at which a 1-simplex forms in the U.S. migra-
tion data is 0.65. This in turn means that there is no state in the U.S. which receives over
35% of its migrant influx from any single other state. Based on this reasoning, we interpret
the migration pattern of the U.S. as “diffuse” or “uniform”, and that of the world as a whole
as “skewed” or “biased”. This makes intuitive sense, because despite the heterogeneity of
the U.S. and differences in state laws and demographics, any resident can easily migrate
to any other state of their choice while maintaining similar legal rights, salary, and living
standards.
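The reasoning above can be checked mechanically from the raw counts. The sketch below (with the same hypothetical dictionary `m` of migrant counts as before) computes the largest share of any region's influx contributed by a single source; for the U.S. data, a minimal 0-dimensional death time of 0.65 certifies that this quantity is at most 1 − 0.65 = 0.35.

```python
# Sketch: largest single-source share of any region's migrant influx.
# If some share exceeded s, the 1-simplex joining source and destination
# would appear at resolution 1 - s, producing a 0-dimensional death time
# below 1 - s.

def max_single_source_share(m, regions):
    best = 0.0
    for cj in regions:
        influx = sum(m.get((ci, cj), 0) for ci in regions if ci != cj)
        if influx == 0:
            continue  # no incoming migrants: nothing to measure
        share = max(m.get((ci, cj), 0) / influx for ci in regions if ci != cj)
        best = max(best, share)
    return best
```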
In Table 4.4, we list some short 0-dimensional persistence intervals for the global mi-
gration dataset. For each interval [0, δ), we also include the 1-simplex that emerges at δ,
the δ-sink associated to this 1-simplex, and our lower bound on the migrant influx into
this sink. Note that the error between the true migration numbers and our predicted lower
bounds is explained in §4.5.2. Also notice that many of the migration patterns provided
in Table 4.4 seem to fit with the suggestions we made earlier: (1) political turmoil in the
West Bank and Gaza (especially following the Gulf War) prompted many Palestinians to

enter Syria, (2) Greenland and Macao are both autonomous regions of Denmark and China,
respectively, and (3) India’s population far outstrips that of its neighbors, and its migrant
flow plays a dominating role in the migrant influx of its neighboring states.
For the second observation, recall from our discussion in §4.5.2 that whenever we have
a 1-cycle involving regions c1 , . . . , cn that becomes a 1-boundary at resolution δ ≥ 0 due
to a single mutual sink cn+1 , we know that cn+1 receives at least (1 − δ)(influx(cn+1 ))
migrants from each of c1 , . . . , cn . As such, cn+1 can be perceived to be an especially strong
attractor region. In Table 4.5 we list some 1-cycles persisting on an interval [δ0 , δ1 ), their
mutual δ1 -sinks, our lower bound on migration flow, and the true migration numbers. The
reader is again encouraged to check that the true migration agrees with the lower bounds
that we predicted. We remark that the first row of this table contains a notable example of a
strong attractor region: Djibouti. Djibouti is geographically located at a crossroads of Asia
and Africa, and is a major commercial hub due to its access to the Red Sea and the Indian
Ocean. As such, one would expect it to be a destination for many migrants in the Horn of
Africa, as well as a transit point for migrants moving between Africa and the Middle East.
The Oceania cycle listed in the fourth row of Table 4.5 can likely be discarded; the
very small migrant influx of Samoa indicates that its attractiveness as a sink state is being
overrepresented. The second row lists China as a strong attractor, which is reasonable given
its economic growth between 1990 and 2000, and as a consequence, its attractiveness to
foreign workers from neighboring countries. The third row lists Vietnam as a strong sink,
and one reason could be that in the 1990s, many refugees who had been displaced due to the
Vietnam War were returning to their homeland.
We also illustrate the emergence at δ0 for some of these cycles in Figure 4.17.

Figure 4.16: Dowker persistence barcodes for global migration dataset.

Short bars in 0-dimensional global migration barcode


Interval [0, δ) 1-simplex δ-sink(s) Lower bound on migration True migration
[0.0,0.03) [India, Sri Lanka] Sri Lanka (0.97)(influx(LKA)) = 383034 m(IND, LKA) = 384789
[0.0,0.04) [India, Bangladesh] Bangladesh (0.96)(influx(BGD)) = 927270 m(IND, BGD) = 936151
[0.0,0.05) [India, Nepal] Nepal (0.95)(influx(NPL)) = 927270 m(IND, BGD) = 936151
[0.0,0.05) [India, Pakistan] Pakistan (0.95)(influx(PAK)) = 2508882 m(IND, PAK) = 2512906
[0.0,0.06) [India, Bhutan] Bhutan (0.94)(influx(BTN)) = 30206 m(IND, BTN) = 30431
[0.0,0.06) [Denmark, Greenland] Greenland (0.94)(influx(GRL)) = 6792 m(DNK, GRL) = 6808
[0.0,0.10) [Greece, Albania] Albania (0.90)(influx(ALB)) = 67008 m(GRC, ALB) = 67508
[0.0,0.11) [Timor-Leste, Indonesia] Timor-Leste (0.89)(influx(TLS)) = 8246 m(IDN, TLS) = 8334
[0.0,0.15) [West Bank and Gaza, Syria] Syria (0.85)(influx(SYR)) = 455515 m(PSE, SYR) = 458611
[0.0,0.16) [Macao, China] Macao (0.84)(influx(MAC)) = 201840 m(CHN, MAC) = 203877

Table 4.4: Short 0-dimensional Dowker persistence intervals capture regions which receive
most of their incoming migrants from a single source. Each interval [0, δ) corresponds to
a 0-simplex which becomes subsumed into a 1-simplex at resolution δ. We list these 1-
simplices in the second column, and their δ sinks in the third column. The definition of
a δ-sink enables us to produce a lower bound on the migration into each sink, which we
provide in the fourth column. We also list the true migration numbers in the fifth column,
and the reader can consult §4.5.2 for our explanation of the error between the true migration
and the lower bounds on migration.

Figure 4.17: Top: Two cycles corresponding to the left endpoints of the (Djibouti-Somalia-
Uganda-Eritrea-Ethiopia) and (Kiribati-Papua New Guinea-Australia-United Kingdom-
Tuvalu) persistence intervals listed in Table 4.5. The δ values are 0.73, 0.77, respec-
tively. Bottom: Two cycles corresponding to the left endpoints of the (China-Thailand-
Philippines) and (China-Indonesia-Malaysia) persistence intervals listed in Table 4.5. The
δ values are 0.77, 0.75, respectively. Meaning of arrows: In each cycle, an arrow si → sj
means that ωS (si , sj ) ≤ δ, i.e. that sj is a sink for the simplex [si , sj ]. We can verify sep-
arately that for δ = 0.77, the Kiribati-Papua New Guinea simplex has the Solomon Islands
as a δ-sink, and that the Philippines-Thailand simplex has Taiwan as a δ-sink. Similarly,
the China-Malaysia simplex has Singapore as a δ-sink, for δ = 0.75.

1-cycles with single mutual sink in global migration data
Interval [δ0 , δ1 ) Regions involved Mutual δ1 -sink(s) Lower bound on migration True migration
[0.73,0.98) Djibouti—Ethiopia—Eritrea—Uganda—Somalia Djibouti (0.02)(influx(DJI)) = 1738 m(ERI, DJI) = 3259
m(ETH, DJI) = 25437
m(SOM, DJI) = 41968
m(UGA, DJI) = 1811
[0.77,0.94) China—Thailand—Philippines China (0.06)(influx(CHN)) = 12858 m(THA, CHN) = 14829
m(PHL, CHN) = 17828
[0.75,0.89) China—Indonesia—Malaysia Vietnam (0.11)(influx(VNM)) = 4465 m(CHN, VNM) = 8940
m(IDN, VNM) = 10529
m(MYS, VNM) = 4813
[0.86,0.93) American Samoa—New Zealand—Samoa—Australia Samoa (0.07)(influx(WSM)) = 397 m(ASM, WSM) = 1920
m(NZL, WSM) = 1803
m(AUS, WSM) = 404
[0.77,0.92) Kiribati—Papua New Guinea—Australia—United Kingdom—Tuvalu Not applicable Not applicable Not applicable

Table 4.5: Representative 1-cycles for several intervals in the 1-dimensional persistence
barcode for the global migration dataset. Each of the first four cycles has the special prop-
erty that it becomes a boundary due to a single sink at the right endpoint of its associated
persistence interval. This permits us to obtain a lower bound on the migration into this
sink from each of the regions in the cycle. The last row contains a cycle without this spe-
cial property. The font colors of the persistence intervals correspond to the colors of the
highlighted 1-dimensional bars in Figure 4.16.

Bibliography

[1] Dowker’s theorem. https://ncatlab.org/nlab/show/Dowker%27s+theorem. Accessed: 2017-04-24.

[2] Emmanuel Abbe. Community detection and stochastic block models: recent devel-
opments. arXiv preprint arXiv:1703.10146, 2017.

[3] Michał Adamaszek and Henry Adams. The Vietoris–Rips complexes of a circle.
Pacific Journal of Mathematics, 290(1):1–40, 2017.

[4] Michal Adamaszek, Henry Adams, Florian Frick, Chris Peterson, and Corrine
Previte-Johnson. Nerve complexes of circular arcs. Discrete & Computational Ge-
ometry, 56(2):251–273, 2016.

[5] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric
spaces and in the space of probability measures. Springer Science & Business Me-
dia, 2008.

[6] David Bao, S-S Chern, and Zhongmin Shen. An introduction to Riemann-Finsler
geometry, volume 200. Springer Science & Business Media, 2012.

[7] Jonathan A Barmak. Algebraic topology of finite topological spaces and applica-
tions, volume 2032. Springer, 2011.

[8] Ulrich Bauer, Michael Kerber, and Jan Reininghaus. Clear and compress: Comput-
ing persistent homology in chunks. In Topological Methods in Data Analysis and
Visualization III, pages 103–117. Springer, 2014.

[9] Ulrich Bauer and Michael Lesnick. Induced matchings of barcodes and the alge-
braic stability of persistence. In Proceedings of the thirtieth annual symposium on
Computational geometry, page 355. ACM, 2014.

[10] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel
Peyré. Iterative Bregman projections for regularized transportation problems. SIAM
Journal on Scientific Computing, 37(2):A1111–A1138, 2015.

[11] Anders Björner. Topological methods. Handbook of combinatorics, 2:1819–1872,
1995.

[12] Anders Björner, Bernhard Korte, and László Lovász. Homotopy properties of gree-
doids. Advances in Applied Mathematics, 6(4):447–494, 1985.

[13] Vladimir I Bogachev. Measure theory, volume 2. Springer Science & Business
Media, 2007.

[14] Vladimir I Bogachev. Measure theory, volume 1. Springer Science & Business
Media, 2007.

[15] Mireille Boutin and Gregor Kemper. Lossless representation of graphs using distri-
butions. arXiv preprint arXiv:0710.1870, 2007.

[16] Martin R Bridson and André Haefliger. Metric spaces of non-positive curvature,
volume 319. Springer Science & Business Media, 2011.

[17] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A Course in Metric Geometry,
volume 33 of AMS Graduate Studies in Math. American Mathematical Society,
2001.

[18] Rainer E Burkard, Mauro Dell’Amico, and Silvano Martello. Assignment Problems.
SIAM, 2009.

[19] Gunnar Carlsson and Vin De Silva. Zigzag persistence. Foundations of computa-
tional mathematics, 10(4):367–405, 2010.

[20] Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra. Ax-
iomatic construction of hierarchical clustering in asymmetric networks. In Acoustics,
Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on,
pages 5219–5223. IEEE, 2013.

[21] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J Guibas. Persis-
tence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–
187, 2005.

[22] Gunnar E. Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra.
Hierarchical quasi-clustering methods for asymmetric networks. In Proceedings of
the 31th International Conference on Machine Learning, ICML 2014, 2014.

[23] CJ Carstens and KJ Horadam. Persistent homology of collaboration networks. Mathematical Problems in Engineering, 2013, 2013.

[24] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J Guibas, and Steve Y
Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the
twenty-fifth annual symposium on Computational geometry, pages 237–246. ACM,
2009.

[25] Frédéric Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo Mémoli, and
Steve Y Oudot. Gromov-hausdorff stable signatures for shapes using persistence.
In Computer Graphics Forum, volume 28, pages 1393–1403. Wiley Online Library,
2009.

[26] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and
stability of persistence modules. Springer International Publishing, 2016.

[27] Frédéric Chazal, Vin De Silva, and Steve Oudot. Persistence stability for geometric
complexes. Geometriae Dedicata, 173(1):193–214, 2014.

[28] Frédéric Chazal and Steve Y Oudot. Towards persistence-based reconstruction in Euclidean spaces. In Proceedings of the twenty-fourth annual Symposium on Computational Geometry, pages 232–241. ACM, 2008.

[29] Chao Chen and Michael Kerber. Persistent homology computation with a twist.
In Proceedings 27th European Workshop on Computational Geometry, volume 11,
2011.

[30] Lénaïc Chizat. Transport optimal de mesures positives: modèles, méthodes numériques, applications. PhD thesis, Université Paris-Dauphine, 2017.

[31] Lénaı̈c Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard.
Scaling algorithms for unbalanced optimal transport problems. Math. Comp.,
87(314):2563–2609, 2018.

[32] Samir Chowdhury, Bowen Dai, and Facundo Mémoli. Topology of stimulus space
via directed network persistent homology. Cosyne Abstracts 2017.

[33] Samir Chowdhury, Bowen Dai, and Facundo Mémoli. The importance of forgetting:
Limiting memory improves recovery of topological characteristics from neural data.
PloS one, 13(9):e0202561, 2018.

[34] Samir Chowdhury and Facundo Mémoli. Convergence of hierarchical clustering and persistent homology methods on directed networks. arXiv preprint arXiv:1711.04211, 2017.

[35] Samir Chowdhury and Facundo Mémoli. Distances and isomorphism between net-
works and the stability of network invariants. arXiv preprint arXiv:1708.04727,
2017.

[36] Samir Chowdhury and Facundo Mémoli. Explicit geodesics in Gromov-Hausdorff
space. Electronic Research Announcements in Mathematical Sciences, 2018.

[37] Samir Chowdhury and Facundo Mémoli. A functorial Dowker theorem and per-
sistent homology of asymmetric networks. Journal of Applied and Computational
Topology, 2(1-2):115–175, 2018.

[38] Samir Chowdhury and Facundo Mémoli. The Gromov-Wasserstein distance be-
tween networks and stable network invariants. arXiv preprint arXiv:1808.04337,
2018.

[39] Samir Chowdhury and Facundo Mémoli. The metric space of networks. arXiv
preprint arXiv:1804.02820, 2018.

[40] Samir Chowdhury and Facundo Mémoli. Persistent path homology of directed net-
works. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Dis-
crete Algorithms, pages 1152–1169. SIAM, 2018.

[41] Carina Curto and Vladimir Itskov. Cell groups reveal structure of stimulus space.
PLoS Computational Biology, 4(10), 2008.

[42] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in neural information processing systems, pages 2292–2300, 2013.

[43] Yuri Dabaghian, Facundo Mémoli, L Frank, and Gunnar Carlsson. A topological
paradigm for hippocampal spatial map formation using persistent homology. PLoS
Comput Biol, 8(8), 2012.

[44] Vin De Silva and Gunnar Carlsson. Topological estimation using witness complexes.
Proc. Sympos. Point-Based Graphics, pages 157–166, 2004.

[45] Tamal K Dey, Facundo Mémoli, and Yusu Wang. Multiscale mapper: Topological
summarization via codomain covers. In Proceedings of the twenty-seventh annual
ACM-SIAM Symposium on Discrete Algorithms, pages 997–1013. SIAM, 2016.

[46] Clifford H Dowker. Homology groups of relations. Annals of Mathematics, pages 84–95, 1952.

[47] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Soc., 2010.

[48] Herbert Edelsbrunner, Grzegorz Jabłoński, and Marian Mrozek. The persistent ho-
mology of a self-map. Foundations of Computational Mathematics, 15(5):1213–
1244, 2015.

[49] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persis-
tence and simplification. Discrete and Computational Geometry, 28(4):511–533,
2002.

[50] Herbert Edelsbrunner and Dmitriy Morozov. Persistent homology: theory and prac-
tice. 2014.

[51] Gerald A Edgar. Classics on fractals. 1993.

[52] Alon Efrat, Alon Itai, and Matthew J Katz. Geometry helps in bottleneck matching
and related problems. Algorithmica, 31(1):1–28, 2001.

[53] Alexander Engström. Complexes of directed trees and independence complexes. Discrete Mathematics, 309(10):3299–3309, 2009.

[54] Santo Fortunato. Benchmark graphs to test community detection algorithms. https://sites.google.com/site/santofortunato/inthepress2.

[55] M Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del
Circolo Matematico di Palermo (1884-1940), 22(1):1–72, 1906.

[56] Patrizio Frosini. Measuring shapes by size functions. In Intelligent Robots and Com-
puter Vision X: Algorithms and Techniques, pages 122–133. International Society for
Optics and Photonics, 1992.

[57] Fred Galvin and Samuel Shore. Completeness in semimetric spaces. Pacific Journal
of Mathematics, 113(1):67–75, 1984.

[58] Fred Galvin and Samuel Shore. Distance functions and topologies. The American
Mathematical Monthly, 98(7):620–623, 1991.

[59] Robert Ghrist. Elementary applied topology. Createspace, 2014.

[60] Chad Giusti, Eva Pastalkova, Carina Curto, and Vladimir Itskov. Clique topology
reveals intrinsic geometric structure in neural correlations. Proceedings of the Na-
tional Academy of Sciences, 112(44):13455–13460, 2015.

[61] Thibaut Le Gouic and Jean-Michel Loubes. Existence and consistency of Wasserstein barycenters. arXiv preprint arXiv:1506.04153, 2015.

[62] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homologies
of path complexes and digraphs. arXiv preprint arXiv:1207.2834, 2012.

[63] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homotopy
theory for digraphs. Pure and Applied Mathematics Quarterly, 10(4), 2014.

[64] Alexander Grigor’yan, Yuri Muranov, and Shing-Tung Yau. Homologies of digraphs
and the Künneth formula. 2015.

[65] Misha Gromov. Metric structures for Riemannian and non-Riemannian spaces, vol-
ume 152 of Progress in Mathematics. Birkhäuser Boston Inc., Boston, MA, 1999.

[66] World Bank Group. Global bilateral migration database. https://datacatalog.worldbank.org/dataset/global-bilateral-migration-database. Accessed: October 3, 2018.

[67] Gary Gruenhage. Generalized metric spaces. Handbook of set-theoretic topology, pages 423–501, 1984.

[68] Allen Hatcher. Algebraic topology. 2002. Cambridge UP, Cambridge, 606(9), 2002.

[69] Jean-Claude Hausmann. On the Vietoris-Rips complexes and a cohomology theory for metric spaces. Ann. Math. Studies, 138:175–188, 1995.

[70] Reigo Hendrikson. Using Gromov-Wasserstein distance to explore sets of networks. Master’s thesis, University of Tartu, 2016.

[71] Danijela Horak, Slobodan Maletić, and Milan Rajković. Persistent homology of
complex networks. Journal of Statistical Mechanics: Theory and Experiment,
2009(03):P03034, 2009.

[72] Karen J Horowitz and Mark A Planting. Concepts and methods of the input-output
accounts. 2006.

[73] Alexandr Ivanov, Nadezhda Nikolaeva, and Alexey Tuzhilin. The Gromov-
Hausdorff metric on the space of compact metric spaces is strictly intrinsic. arXiv
preprint arXiv:1504.03830, 2015.

[74] Roy A Johnson. Atomic and nonatomic measures. Proceedings of the American
Mathematical Society, 25(3):650–655, 1970.

[75] Nigel J Kalton and Mikhail I Ostrovskii. Distances between Banach spaces. In
Forum Mathematicum, volume 11, pages 17–48. Walter de Gruyter, 1999.

[76] Arshi Khalid, Byung Sun Kim, Moo K Chung, Jong Chul Ye, and Daejong Jeon.
Tracing the evolution of multi-scale functional networks in a mouse model of depres-
sion using persistent brain network homology. NeuroImage, 101:351–363, 2014.

[77] Jon M Kleinberg. Authoritative sources in a hyperlinked environment. Journal of
the ACM (JACM), 46(5):604–632, 1999.

[78] Dimitry Kozlov. Combinatorial algebraic topology, volume 21. Springer Science &
Business Media, 2007.

[79] Janko Latschev. Vietoris-Rips complexes of metric spaces near a closed Riemannian
manifold. Archiv der Mathematik, 77(6):522–528, 2001.

[80] Hyekyoung Lee, Moo K Chung, Hyejin Kang, Boong-Nyun Kim, and Dong Soo
Lee. Computing the shape of brain networks using graph filtration and
Gromov-Hausdorff metric. In Medical Image Computing and Computer-Assisted
Intervention–MICCAI 2011, pages 302–309. Springer, 2011.

[81] Solomon Lefschetz. Algebraic topology, volume 27. American Mathematical Soc.,
1942.

[82] Laurentiu Leustean, Adriana Nicolae, and Alexandru Zaharescu. Barycenters in
uniformly convex geodesic spaces. arXiv preprint arXiv:1609.02589, 2016.

[83] Daniel Lütgehetmann. Flagser. Software available at
https://github.com/luetge/flagser/, 2018.

[84] Paolo Masulli and Alessandro EP Villa. The topology of the directed clique complex
as a network invariant. SpringerPlus, 5(1):1–12, 2016.

[85] Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison.
2007.

[86] Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to ob-
ject matching. Foundations of Computational Mathematics, pages 1–71, 2011.
10.1007/s10208-011-9093-5.

[87] Facundo Mémoli. Some properties of Gromov–Hausdorff distances. Discrete &
Computational Geometry, pages 1–25, 2012. 10.1007/s00454-012-9406-8.

[88] James R Munkres. Elements of algebraic topology, volume 7. Addison-Wesley,
Reading, 1984.

[89] James R Munkres. Topology. Prentice Hall, 2000.

[90] Mark Newman. Networks. Oxford University Press, 2018.

[91] VW Niemytzki. On the “third axiom of metric space”. Transactions of the American
Mathematical Society, 29(3):507–513, 1927.

[92] Shin-ichi Ohta. Barycenters in Alexandrov spaces of curvature bounded below.
2012.

[93] John O'Keefe and Jonathan Dostrovsky. The hippocampus as a spatial map: prelimi-
nary evidence from unit activity in the freely-moving rat. Brain Research, 34(1):171–
175, 1971.

[94] Çaglar Özden, Christopher R Parsons, Maurice Schiff, and Terrie L Walmsley.
Where on earth is everybody? The evolution of global bilateral migration 1960–
2000. The World Bank Economic Review, 25(1):12–56, 2011.

[95] Panos M Pardalos and Henry Wolkowicz, editors. Quadratic assignment and related
problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Sci-
ence, 16. American Mathematical Society, Providence, RI, 1994.

[96] Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for ge-
ometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127,
2006.

[97] Vladimir Pestov. Dynamics of infinite-dimensional groups: the Ramsey-Dvoretzky-
Milman phenomenon, volume 40. American Mathematical Soc., 2006.

[98] Peter Petersen. Riemannian geometry, volume 171. Springer Science & Business
Media, 2006.

[99] Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Topo-
logical strata of weighted complex networks. PLoS ONE, 8(6):e66506, 2013.

[100] Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging
of kernel and distance matrices. In International Conference on Machine Learning,
pages 2664–2672, 2016.

[101] Arthur Dunn Pitcher and Edward Wilson Chittenden. On the foundations of the
calcul fonctionnel of Fréchet. Transactions of the American Mathematical Society,
19(1):66–78, 1918.

[102] Michael W Reimann, Max Nolte, Martina Scolamiero, Katharine Turner, Rodrigo
Perin, Giuseppe Chindemi, Paweł Dłotko, Ran Levi, Kathryn Hess, and Henry
Markram. Cliques of neurons bound into cavities provide a missing link between
structure and function. Frontiers in computational neuroscience, 11:48, 2017.

[103] Vanessa Robins. Towards computing homology from finite approximations. In
Topology Proceedings, volume 24, pages 503–532, 1999.

[104] Sorin V Sabau, Kazuhiro Shibuya, and Hideo Shimada. Metric structures associated
to Finsler metrics. arXiv preprint arXiv:1305.5880, 2013.

[105] Felix Schmiedl. Shape matching and mesh segmentation: mathematical analysis,
algorithms and an application in automated manufacturing. PhD thesis, München,
Technische Universität München, Diss., 2015, 2015.

[106] Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized
transport problems. arXiv preprint arXiv:1610.06519, 2016.

[107] Bernhard Schmitzer and Christoph Schnörr. Modelling convex shape priors and
matching based on the Gromov-Wasserstein distance. Journal of mathematical
imaging and vision, 46(1):143–159, 2013.

[108] Yi-Bing Shen and Wei Zhao. Gromov pre-compactness theorems for nonreversible
Finsler manifolds. Differential Geometry and its Applications, 28(5):565–581, 2010.

[109] Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly
stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.

[110] Richard Sinkhorn. Diagonal equivalence to matrices with prescribed row and col-
umn sums. The American Mathematical Monthly, 74(4):402–405, 1967.

[111] Justin Solomon, Gabriel Peyré, Vladimir G Kim, and Suvrit Sra. Entropic metric
alignment for correspondence problems. ACM Transactions on Graphics (TOG),
35(4):72, 2016.

[112] Edwin H Spanier. Algebraic topology, volume 55. Springer Science & Business
Media, 1994.

[113] Sashi Mohan Srivastava. A course on Borel sets, volume 180. Springer Science &
Business Media, 2008.

[114] Lynn Arthur Steen and J Arthur Seebach. Counterexamples in topology, volume 18.
Springer, 1978.

[115] Aleksandar Stojmirović and Yi-Kuo Yu. Geometric aspects of biological sequence
comparison. Journal of Computational Biology, 16(4):579–610, 2009.

[116] Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta mathematica,
196(1):65–131, 2006.

[117] Karl-Theodor Sturm. The space of spaces: curvature bounds and gradient flows on
the space of metric measure spaces. arXiv preprint arXiv:1208.0434, 2012.

[118] Katharine Turner. Generalizations of the Rips filtration for quasi-metric spaces with
persistent homology stability results. arXiv preprint arXiv:1608.00365, 2016.

[119] Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, and Nicolas
Courty. Optimal transport for structured data. arXiv preprint arXiv:1805.09114,
2018.

[120] Cédric Villani. Topics in optimal transportation. Number 58. American Mathemat-
ical Soc., 2003.

[121] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science &
Business Media, 2008.

[122] Pawel Waszkiewicz. The local triangle axiom in topology and domain theory. Ap-
plied General Topology, 4(1):47–70, 2013.

[123] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete
& Computational Geometry, 33(2):249–274, 2005.
