
Hypergraph Co-Optimal Transport: Metric and Categorical Properties

Samir Chowdhury (Stanford University), Tom Needham (Florida State University), Ethan Semrad (Florida State University), Bei Wang (University of Utah), Youjia Zhou (University of Utah)

arXiv:2112.03904v1 [math.MG] 7 Dec 2021

December 8, 2021

Abstract

Hypergraphs capture multi-way relationships in data, and they have consequently seen a number of
applications in higher-order network analysis, computer vision, geometry processing, and machine
learning. In this paper, we develop the theoretical foundations for studying the space of hypergraphs
using ingredients from optimal transport. By enriching a hypergraph with probability measures
on its nodes and hyperedges, as well as relational information capturing local and global structure,
we obtain a general and robust framework for studying the collection of all hypergraphs. First,
we introduce a hypergraph distance based on the co-optimal transport framework of Redko et al.
and study its theoretical properties. Second, we formalize common methods for transforming a
hypergraph into a graph as maps from the space of hypergraphs to the space of graphs and study their
functorial properties and Lipschitz bounds. Finally, we demonstrate the versatility of our Hypergraph
Co-Optimal Transport (HyperCOT) framework through various examples.

1 Introduction

The study of hypergraphs is motivated by higher-order network analysis. Graphs are commonly used to encode data
from complex systems in cyber-security, biology, sociology, and physical infrastructure, where nodes represent entities
and edges represent pairwise relations between entities. However, real-world data can be rich in multi-way relationships.
In a co-authorship network, relations are defined by articles written by multiple authors. In a protein-protein interaction
network, relations arise from protein complexes – collections of proteins that interact with each other and are essential
in regulatory processes and cellular functions. These multi-way relationships are better captured by hypergraphs; a
hypergraph consists of a set of nodes and a set of subsets of the node set, whose elements are called hyperedges. This
generalizes the notion of a graph, which is simply a hypergraph whose hyperedges each contain exactly two nodes.
Hypergraphs also arise from applications in computer vision, geometry processing, pattern recognition, and machine
learning, where establishing correspondences between two feature sets is a fundamental issue. The correspondence
problem is traditionally formulated as a graph matching problem [42, 19]: each graph is constructed with nodes
representing features and edges representing pairwise relationships between features. For example, in geometry
processing, absolute locations of features are not as relevant as their pairwise relations, e.g., when imposing invariance
over translations and rotations. However, pairwise relations are not sufficient to incorporate the higher-order relations
among features in establishing correspondences – this can be addressed by a hypergraph matching problem [7, 23].
Recent years have seen the extension of the Gromov-Wasserstein (GW) framework—originally developed as a tool
for comparing metric measure spaces [27, 28]—to probabilistic graph matching tasks [36, 20, 46, 45, 9, 43, 10].
The numerous benefits of this approach include computability via gradient descent [31, 17] or backpropagation [44],
state-of-the-art performance in tasks such as graph partitioning [45, 12], and an underlying theoretical Riemannian
framework [38, 11]. These successes motivate the development of a GW framework for hypergraphs, which is the goal
of this paper. Our contributions include:

Figure 1: HyperCOT provides a framework for multi-resolution matching of data; see Sect. 4 for details. Left: Multi-
resolution graphs of a TOSCA Centaur and Wolf, inspired by [13]. Right: A single call to HyperCOT simultaneously
produces low- and high-resolution correspondences where each informs the other. Color transfer from the Centaur to
the Wolf demonstrates the matching.

• We extend the co-optimal transport framework of Redko et al. [32] to define an optimal transport-based
distance between hypergraphs. To put this on firm theoretical footing, we introduce a vastly generalized notion
of a hypergraph, called a measure hypernetwork.
• We establish fundamental properties of our distance. In particular, we show that it is a pseudometric on the
space of measure hypernetworks (Theorem 12), we characterize the pairs of non-isomorphic hypernetworks
whose distance is zero (Proposition 14), and we show that our distance descends to a complete, geodesic
metric on the space of measure hypernetworks modulo the distance-0 equivalence relation (Theorem 16).
• We study the categorical properties of hypergraphs. In particular, we study certain common transformations
taking hypergraphs to graphs (e.g., their incidence graphs, line graphs, and clique expansions) as functors from
the space of hypernetworks to the space of networks with respect to various categorical structures. We also
study their Lipschitz properties with respect to our hypernetwork distance and the GW distance on the space
of networks. We show that the incidence graph functor gives an alternative characterization of our distance as
a variant of GW distance on a subspace of measure networks which we refer to as bipartite (Theorem 25).
• We illustrate our open source computational framework¹ for hypergraph matching and comparison. An
example application is shown in Fig. 1; details of this experiment are provided in Sect. 4.

1.1 Related Work

Hypergraph metrics, similarity and dissimilarity measures. Although there is an abundance of data
that can be modeled as hypergraphs, metrics between hypergraphs have not been explored extensively in the literature.
Karonski and Palka [21] considered hypergraphs defined over the same set of nodes as clusterings and defined the
Marczewski–Steinhaus (MS) distance between the clusterings. The MS distance is a modification of a generalization
of the Hausdorff metric for sets [41]. Karonski and Palka also discussed a distance between hypergraphs generated by
of the Hausdorff metric for sets [41]. Karonski and Palka also discussed a distance between hypergraphs generated by
arborescences with the same set of terminal vertices. In comparison with the MS distance, our hypergraph distance
does not require hypergraphs to be defined on the same set of nodes; and more importantly, our distance comes with
nice metric properties, encodes more structural information of the hypergraph, and naturally gives rise to meaningful
hypergraph matching. Lee et al. [23] generalized the edit distance from graphs to hypergraphs. However, such a
distance is impractical as the graph edit distance is NP-hard to compute [47], and APX-hard to approximate [26].
¹ Hypergraph Co-Optimal Transport: https://github.com/samirchowdhury/HyperCOT


Smaniotto and Pelillo [34] recently defined distances between attributed hypergraphs using their maximal common
subhypergraph; however, computing the maximum similarity subhypergraph isomorphism between the two hypergraphs
is NP-complete. In comparison, our hypergraph distance may be approximated efficiently by following the optimization
routine of co-optimal transport.
Another relevant topic is the study of hypergraph similarity or dissimilarity measures (e.g., [40]). The first type of
approach [40] transforms a hypergraph into a graph and applies standard graph similarity or dissimilarity measures.
As tensors [22] are becoming increasingly popular in modeling higher-order interactions, the second type of approach
considers a hypergraph as a tensor [2] and studies the distance between a pair of tensor representations [40]. The work in
[16] describes a principled framework for hypergraph matching and proposes to use this framework to go beyond
isometry-invariant shape matching and obtain invariance under similarity, affine, and projective transformations.
However, this framework constructs hypergraphs with a fixed size for each hyperedge, and is not suited for cases
where the input data is an arbitrary hypergraph. In contrast, our hypergraph distance can accept any pair of arbitrary
hypergraphs as input.
Optimal transport and Gromov-Wasserstein distances. Optimal transport metrics are typically used to
compare probability distributions on a common metric space, and are widely perceived as a powerful framework
for quantifying uncertainty within geometric measurements [30, 35]. Gromov-Wasserstein distances, in contrast, are
used to obtain correspondences across different spaces, which in turn permits comparison between spaces that are
not comparable a priori [27, 28]. GW distances essentially require only square matrices encoding pairwise relational
information in the spaces to be compared [9], and are thus well-suited for comparing graphs via adjacency, shortest path
distance, or spectral representations [46, 43, 12]. However, hypergraphs typically encode more than pairwise relations;
our strategy for extending the GW framework to hypergraphs is to work directly with multi-way relations (modeled
as rectangular matrices encoding node-hyperedge relations in finite settings) and to leverage the recently-developed
co-optimal transport framework of [32]. Using this framework, we are able to simultaneously infer node-node and
hyperedge-hyperedge correspondences when comparing hypergraphs.

2 Theoretical Formulation of Hypernetwork Distance


In this section, we develop a general theory of abstract hypernetworks which extends the co-optimal transport framework
of Redko et al. [32]. Proofs of all theoretical results are deferred to Appendix A. We begin with some background
exposition in Section 2.1.

2.1 Graphs, Measure Networks and Gromov-Wasserstein Distance

Recall that a graph is a pair (V, E), where V is a set of nodes and E is a set of 2-element subsets of V , each of which is
called an edge. In [9], a far-reaching generalization of the notion of a graph is introduced.
Definition 1. A measure network is a triple N = (X, µ, ω), where X is a Polish space, µ is a Borel probability measure
on X and ω : X × X → R is a bounded, measurable network function. Let N denote the collection of all measure
networks.

A graph (V, E) with a probability distribution µ on V defines a measure network by, for example, taking ω to be a
binary adjacency function. However, this definition encompasses a huge variety of other structures; any metric measure
space (metric space endowed with a Borel probability measure) is, in particular, a measure network. As was also
observed in, e.g., [38, 31], measure networks give a general setting for defining GW distances (which were originally
defined only for metric measure spaces [27]).
Definition 2. Given two probability spaces (X, µ) and (Y, ν), a coupling π is a joint probability measure on X × Y
satisfying π(A × Y ) = µ(A) and π(X × B) = ν(B) for each measurable A ⊆ X and B ⊆ Y . The collection of
couplings between µ, ν is denoted C(µ, ν).
Definition 3. For N = (X, µ, ω), N′ = (X′, µ′, ω′) ∈ N, the p-distortion for p ∈ [1, ∞) is the functional dis^N_p = dis_{N,N′,p} : C(µ, µ′) → R defined by

$$\mathrm{dis}^N_p(\pi) = \left( \int_{X \times X'} \int_{X \times X'} |\omega(x, y) - \omega'(x', y')|^p \, \pi(dx \times dx')\, \pi(dy \times dy') \right)^{1/p}. \tag{1}$$

The Gromov-Wasserstein (GW) p-distance is then defined to be

$$d_{\mathcal{N},p}(N, N') = \inf_{\pi \in C(\mu, \mu')} \mathrm{dis}^N_p(\pi). \tag{2}$$

Gromov-Wasserstein p-distance is known to define a pseudometric on N and the distance-0 equivalence classes are
well-understood [9]. An important feature of this metric for practical applications is that an optimal coupling realizing
(2) provides a soft matching between the points in the measure networks.
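For intuition, GW couplings between small finite networks can be computed with off-the-shelf solvers. The following is a minimal sketch in Python, assuming the POT library [17]; the `square_loss` option corresponds to the p = 2 case, and the solver in general returns a (possibly local) optimum of (2).

```python
import numpy as np
import ot  # POT: Python Optimal Transport [17]

# Two graphs as measure networks: binary adjacency functions, uniform measures.
omega = np.array([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])            # triangle
omega_prime = np.array([[0., 1., 0., 1.],
                        [1., 0., 1., 0.],
                        [0., 1., 0., 1.],
                        [1., 0., 1., 0.]])  # 4-cycle
mu = np.full(3, 1 / 3)
mu_prime = np.full(4, 1 / 4)

# A coupling (approximately) realizing (2) for p = 2; log['gw_dist'] stores the
# squared distortion of the returned coupling.
pi, log = ot.gromov.gromov_wasserstein(omega, omega_prime, mu, mu_prime,
                                       loss_fun='square_loss', log=True)
print(pi)                        # soft matching between the two node sets
print(np.sqrt(log['gw_dist']))   # dis_2(pi), an upper bound for d_{N,2}
```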

2.2 Hypergraphs, Measure Hypernetworks and Hypernetwork Distance

Recall that a hypergraph is a pair (X, Y ), where X is a set of nodes and Y is a collection of subsets of X, each of which
is called a hyperedge. Figure 2 shows some simple hypergraphs and different hypergraph visualization techniques.

Figure 2: (a) A hypergraph visualized as a Venn diagram; nodes are black, hyperedges are colored convex hulls. (b)
Replacing each hyperedge in the Venn diagram with its own node representation, we visualize the same hypergraph as
an incidence graph. (c) Hybrid Venn diagram and incidence graph visualization. (d)-(e) An example of a morphism
between hypergraphs (see Section 3).

Inspired by the definition of a measure network given above, we now give a generalization of the notion of a hypergraph.
Definition 4. A measure hypernetwork is a quintuple H = (X, µ, Y, ν, ω), where (X, µ) and (Y, ν) are Borel-measured
Polish spaces and ω : X × Y → R is a bounded, measurable hypernetwork function. We denote the collection of all
hypernetworks by H.
Example 5 (Hypergraphs). Our motivating examples of hypernetworks arise from combinatorial hypergraphs. Let
(X, Y) be a hypergraph, where X is a finite set of nodes and Y ⊆ P(X) is a set of hyperedges. For example, we may
endow X and Y with uniform measures µ and ν, respectively; i.e., µ(x) = 1/|X| for any x ∈ X and ν(y) = 1/|Y|
for any y ∈ Y. We may define a hypernetwork function ω : X × Y → R by the incidence relation:

$$\omega(x, y) := \begin{cases} 1 & \text{if } x \in y \\ 0 & \text{otherwise.} \end{cases}$$

Alternatively, ω may be defined by a shortest path relation: (a) if x ∈ y, then ω(x, y) = 0; (b) if x ∉ y, then ω(x, y)
is the length of the shortest hyperedge path—a sequence of hyperedges with nontrivial sequential pairwise overlap,
with length equal to the sum of the overlaps—from any hyperedge y 0 containing x to hyperedge y. This method
for transforming a hypergraph into a measure hypernetwork assumes that the hypergraph is connected: any two
hyperedges are joined by a hyperedge path. For the example hypergraph shown in Fig. 2c, we have X = {1, 2, 3, 4, 5},
Y = {a, b, c, d}, µ(x) = 1/5 for each x ∈ X, and ν(y) = 1/4 for each y ∈ Y . If ω encodes the incidence relation or
the shortest path relation, we have, respectively,
   
$$\omega = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad \text{or} \quad \omega = \begin{pmatrix} 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix},$$

where the matrices encode the function ω by indexing rows by X and columns by Y.
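For concreteness, a few lines of Python (plain NumPy; the node/hyperedge memberships are read off from Fig. 2c) reproduce the uniform measures and the incidence-relation ω above:

```python
import numpy as np

# Hypergraph of Fig. 2c: nodes 1-5, hyperedges a-d given as node subsets.
nodes = [1, 2, 3, 4, 5]
hyperedges = {'a': {1, 2}, 'b': {1, 3, 4}, 'c': {2, 4, 5}, 'd': {3, 4}}

mu = np.full(len(nodes), 1 / len(nodes))            # mu(x) = 1/|X| = 1/5
nu = np.full(len(hyperedges), 1 / len(hyperedges))  # nu(y) = 1/|Y| = 1/4
omega = np.array([[1. if x in hyperedges[y] else 0.
                   for y in sorted(hyperedges)] for x in nodes])
print(omega)  # the 5 x 4 incidence matrix displayed above
```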
Remark 6. The constructions in Example 5 involve several choices: a measure on the nodes, a measure on the
hyperedges and a hypernetwork function. There are many more ways to model a hypergraph as a measure hypernetwork—
some are described in detail in Section 4.
Example 7 (Measure Networks). Any measure network (X, µ, ω) defines a measure hypernetwork (X, µ, X, µ, ω);
this association defines an injection N → H.
Example 8 (Data Matrices). The main motivation for introducing the co-optimal transport framework in [32] was to
compare data matrices; i.e. rectangular matrices used to store vector-valued datasets. Such an object is represented in
our formalism by taking X to be a set of samples, Y to be a set of features (in the machine learning sense), µ and ν to
be some data-dependent distributions and defining ω(x, y) to be the value of feature y on sample x.


Example 9 (Infinite Examples). Hypergraphs and data matrices involve finite sets, while we are working in a generality
that allows infinite sets. One motivation for doing so is that it is required to get a complete metric space (see Theorem
16 below). More concretely, there are many interesting instances of infinite measure hypernetworks. For example,
a hypernetwork H = (X, µ, Y, ν, ω) can encode a machine learning problem, where X is a data space (say, Rn ), µ
is a distribution of data in X, Y is a parameter space (say, Rm ), ν is a distribution representing constraints on the
parameters, and ω : X × Y → R is considered as a parametric family of machine learning models. As another
example, suppose that (X, µ) and (Y, ν) are bounded, measured subsets of a common ambient metric space (Z, dZ)
and define ω : X × Y → R by ω(x, y) = dZ(x, y); this is the sort of setup that arises in a metric construction of Sturm,
related to the usual embedding formulation of Gromov-Hausdorff distance [37]. See [29] for other examples along
these lines.
We are now prepared to define our distance between measure hypernetworks, which is a direct extension of the
co-optimal transport framework of Redko et al. introduced in [32].
Definition 10. Let H = (X, µ, Y, ν, ω), H′ = (X′, µ′, Y′, ν′, ω′) ∈ H. The p-th co-optimal transport distortion for p ∈ [1, ∞) is the functional dis_p = dis_{H,H′,p} : C(µ, µ′) × C(ν, ν′) → R defined by

$$\mathrm{dis}_p(\pi, \xi) = \left( \int_{X \times X'} \int_{Y \times Y'} |\omega(x, y) - \omega'(x', y')|^p \, \xi(dy \times dy')\, \pi(dx \times dx') \right)^{1/p}. \tag{3}$$

The hypernetwork p-distance is then defined to be

$$d_{\mathcal{H},p}(H, H') = \inf_{\pi \in C(\mu, \mu')} \; \inf_{\xi \in C(\nu, \nu')} \mathrm{dis}_p(\pi, \xi). \tag{4}$$

Similar to the GW setting, an important feature of this metric is that its practical computation involves determining
optimal couplings π and ξ of both X, X 0 and Y, Y 0 , respectively. In the hypergraph setting, these couplings correspond
to soft matchings of both node sets and hyperedge sets. See also Appendix B for a matrix formulation following [32].
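In finite settings, a simple block-coordinate descent scheme approximates (4) for p = 2: with ξ fixed, the optimal π solves a linear OT problem whose cost matrix is a contraction of |ω − ω′|² against ξ, and symmetrically for ξ (cf. [32]). The sketch below (Python with NumPy and POT's exact solver `ot.emd`; the function names and iteration count are ours) is illustrative only; the released HyperCOT code should be preferred in practice.

```python
import numpy as np
import ot

def hypernetwork_distortion2(omega, omega_p, pi, xi):
    """Squared 2-distortion dis_2(pi, xi)^2 from (3), finite case."""
    diff = omega[:, None, :, None] - omega_p[None, :, None, :]
    return np.einsum('ikjl,ik,jl->', diff ** 2, pi, xi)

def hypercot_bcd(omega, mu, nu, omega_p, mu_p, nu_p, n_iter=50):
    """Alternately solve exact linear OT in pi and in xi (cf. [32])."""
    pi, xi = np.outer(mu, mu_p), np.outer(nu, nu_p)  # product initialization
    for _ in range(n_iter):
        # Cost for pi with xi fixed: M[i,i'] = sum_{j,j'} (w_ij - w'_i'j')^2 xi_jj'
        M = ((omega ** 2) @ xi.sum(1))[:, None] \
            + ((omega_p ** 2) @ xi.sum(0))[None, :] \
            - 2 * omega @ xi @ omega_p.T
        pi = ot.emd(mu, mu_p, M)
        # Symmetric cost for xi with pi fixed.
        C = ((omega ** 2).T @ pi.sum(1))[:, None] \
            + ((omega_p ** 2).T @ pi.sum(0))[None, :] \
            - 2 * omega.T @ pi @ omega_p
        xi = ot.emd(nu, nu_p, C)
    return pi, xi, np.sqrt(hypernetwork_distortion2(omega, omega_p, pi, xi))
```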

2.3 Properties of Hypernetwork Distance

We now describe some basic theoretical properties of the hypernetwork distance dH,p .
Definition 11. Measure hypernetworks H and H 0 are strongly isomorphic if there exist bijective measure-preserving
maps φ : X → X 0 and ψ : Y → Y 0 with Borel-measurable inverses such that ω 0 (φ(x), ψ(y)) = ω(x, y) for all
(x, y) ∈ X × Y .
The proof of the following theorem adapts techniques of [9] in the GW setting.
Theorem 12. The hypernetwork p-distance is a pseudometric on H. Moreover, if H and H 0 are strongly isomorphic,
then dH,p (H, H 0 ) = 0.
Measure hypernetworks H and H 0 are said to be weakly isomorphic if dH,p (H, H 0 ) = 0. The weak isomorphism class
of a hypernetwork H is denoted [H]. Let [H] denote the space of weak isomorphism classes of measure hypernetworks.
Corollary 13. Hypernetwork distance dH,p induces a metric on [H].
The following proposition gives a more geometric characterization of weak isomorphism (cf. [38, Proposition 5.6] and
[9, Theorem 2.4] in the context of GW distance).
Proposition 14. Measure hypernetworks H = (X, µ, Y, ν, ω) and H′ = (X′, µ′, Y′, ν′, ω′) are weakly isomorphic
if and only if there exist measure spaces (ZX, µZ), (ZY, νZ) and measure-preserving maps φX : ZX → X, φX′ :
ZX → X′, ψY : ZY → Y, and ψY′ : ZY → Y′ such that ω(φX(zX), ψY(zY)) = ω′(φX′(zX), ψY′(zY)) for
(µZ ⊗ νZ)-almost every (zX, zY).
We define a basic weak isomorphism of measure hypernetworks H and H 0 to be a pair of measure-preserving maps
φ : X → X 0 and ψ : Y → Y 0 such that ω 0 ◦ (φ × ψ) = ω.
Example 15 (Node and Edge Collapses as Basic Weak Isomorphisms). A standard way to simplify a hypergraph is
via node collapse and hyperedge collapse [48]. As illustrated in Fig. 3, node collapse combines nodes that belong to
exactly the same set of hyperedges into a single “super-node” (visualized by concentric ring glyph), while hyperedge
collapse merges hyperedges that share exactly the same set of nodes into a “super-hyperedge” (visualized by a pie-chart
glyph). This operation can be defined in generality for finite hypernetworks. Let H = (X, µ, Y, ν, ω) be a finite (i.e.
|X|, |Y | < ∞) measured hypernetwork such that µ and ν have full support. Suppose that there exist x1 , x2 ∈ X such
that ω(x1, y) = ω(x2, y) for every y ∈ Y. Define a new hypernetwork H′ = (X′, µ′, Y′, ν′, ω′) with X′ = X \ {x2},

$$\mu'(x) = \begin{cases} \mu(x) & x \neq x_1 \\ \mu(x_1) + \mu(x_2) & x = x_1, \end{cases}$$


Y′ = Y, ν′ = ν and ω′ = ω|_{X′×Y′}. Then the map φ : X → X′ sending x ↦ x for x ≠ x2 and x2 ↦ x1, together
with the identity map ψ : Y → Y′ define a basic weak isomorphism H → H′. This basic weak isomorphism is called a
node collapse. The formal definition of a hyperedge collapse is entirely similar.
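In matrix terms, a node collapse deletes a duplicate row of ω and transfers its mass; a short sketch under the finiteness and full-support assumptions above (hyperedge collapse is the same operation applied to the columns of ω and to ν):

```python
import numpy as np

def node_collapse(omega, mu, i, j):
    """Collapse node j into node i; rows i and j of omega must coincide."""
    assert np.allclose(omega[i], omega[j]), "nodes must lie in the same hyperedges"
    mu_new = np.delete(mu, j)
    mu_new[i if i < j else i - 1] += mu[j]  # node i absorbs the mass of node j
    return np.delete(omega, j, axis=0), mu_new
```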


Figure 3: Node and hyperedge collapses. Left: nodes 1 and 2 merge into a super-node 3. Right: hyperedges a and b
merge into a super-hyperedge c.

We conclude this section with a result on some basic geometric properties of the metric space ([H], dH,p ). The proof is
based on adapting arguments given by Sturm in [38].
Theorem 16. The metric space ([H], dH,p ) is complete and geodesic.

3 From Hypergraphs to Graphs: Graphification Functors


A common technique in the field of hypergraph analysis is to transform a hypergraph into a traditional graph, which has
a more tractable structure. In this section, we formalize this graphification process in the general language of measure
networks and measure hypernetworks.

3.1 Transformations of Hypergraphs to Graphs

Let H = (X, Y ) be a hypergraph with nodes X = {x1 , · · · , xn } and hyperedges Y = {y1 , · · · , ym }, where yi ⊆ X.
There are several transformations from a hypergraph to a graph: we focus on the bipartite incidence graph, the line
graph [3], and the clique expansion [49], shown in Fig. 4.



Figure 4: A hypergraph H (a) with its incidence graph B(H) (b), line graph L(H) (c), dual hypergraph H∗ (d), and
clique expansion Q(H) (e).

A hypergraph H may be represented by a bipartite incidence graph B(H) as follows: the sets X and Y are the partitions
of B(H), and x and y are connected by an edge if and only if node x is contained in hyperedge y in H. The line graph
L(H) of H is a graph whose node set corresponds to the set of hyperedges of H; two nodes are adjacent in L(H) if
their corresponding hyperedges have a nonempty intersection in H. The clique expansion Q(H) of H consists of node
set X, and there is an edge (xi , xj ) in Q(H) if there exists some hyperedge y ∈ Y such that xi , xj ∈ y. That is, Q(H)
is a graph constructed from H by replacing each hyperedge with a clique among its nodes.
The line graph and clique expansion are related through the concept of a dual hypergraph H∗ , which swaps the roles of
nodes and hyperedges by considering the transpose of the incidence matrix of H: the clique expansion is the line graph
of the dual, Q(H) = L(H∗ ).
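All three transformations, in their unweighted combinatorial form, are short matrix computations on the 0/1 incidence matrix; a sketch in NumPy (function names ours):

```python
import numpy as np

def incidence_graph_adj(M):
    """Adjacency matrix of B(H) on X ⊔ Y from the |X| x |Y| incidence matrix M."""
    n, m = M.shape
    return np.block([[np.zeros((n, n)), M],
                     [M.T, np.zeros((m, m))]])

def line_graph_adj(M):
    """Adjacency of L(H): hyperedges are adjacent iff they share a node."""
    A = (M.T @ M > 0).astype(float)
    np.fill_diagonal(A, 0)  # no self-loops
    return A

def clique_expansion_adj(M):
    """Adjacency of Q(H), via the duality Q(H) = L(H*): the dual hypergraph
    has incidence matrix M.T."""
    return line_graph_adj(M.T)
```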
Finally, given H, the edges in B(H), L(H), and Q(H) may be weighted based on, e.g., the similarities or dissimilarities
between nodes and hyperedges of H. The particular weighting scheme is typically application-dependent or ad hoc. In


the following subsection, we develop principled models for these various transformations in the setting of measure
hypernetworks.

3.2 Categories of Hypernetworks and Networks

Let h denote a collection of measure hypernetwork morphisms; that is,

h = {h(H, H′) | H = (X, µ, Y, ν, ω), H′ = (X′, µ′, Y′, ν′, ω′) ∈ H},

where each h(H, H′) is a collection of pairs (φ, ψ) of functions φ : X → X′ and ψ : Y → Y′. Each choice of
morphism set defines a category of measure hypernetworks Hh whose objects are elements of H and whose morphisms
belong to h. Similarly, for each choice of morphism set n between measure networks,

n = {n(N, N′) | N = (X, µ, ω), N′ = (X′, µ′, ω′) ∈ N},

where each n(N, N′) is a collection of functions φ : X → X′, we obtain a category of measure networks, denoted Nn.
Example 17 (Expansive Categories). For (combinatorial) hypergraphs (X, Y ), the usual categorical structure is
obtained by declaring morphisms to be pairs of maps which preserve hyperedge containment [15]. That is, for
hypergraphs (X, Y ) and (X 0 , Y 0 ), a morphism is a pair φ : X → X 0 and ψ : Y → Y 0 such that for all x ∈ X
and y ∈ Y with x ∈ y, we have φ(x) ∈ ψ(y). When X = X 0 and φ is the identity map, this can be viewed as a
combinatorial description of a map of covers from the topological data analysis literature [13]. We generalize this
and define an expansive morphism between arbitrary measure hypernetworks H and H 0 to be a pair φ : X → X 0 and
ψ : Y → Y 0 such that ω(x, y) ≤ ω 0 (φ(x), ψ(y)). Observe that if H is obtained from a combinatorial hypergraph and
ω is a binary incidence function, then the expansive condition agrees with the morphism structure defined above. The
set of all expansive morphisms is denoted e(H, H 0 ) and the associated morphism set on H is denoted e. The expansive
category of hypernetworks is then He .
We can similarly define an expansive category of measure networks. An expansive morphism between measure networks
N = (X, µ, ω) and N 0 = (X 0 , µ0 , ω 0 ) is a map φ : X → X 0 such that ω(x, y) ≤ ω 0 (φ(x), φ(y)). This generalizes the
usual categorical structure on the space of combinatorial graphs, where morphisms are required to take edges to edges.
Recycling notation, we denote by e = {e(N, N 0 ) | N, N 0 ∈ N } the set of expansive measure network morphisms and
we define the expansive category of measure networks to be Ne .
Example 18 (Measure-Preserving Categories). A measure-preserving morphism between measure hypernetworks H
and H 0 is a pair φ : X → X 0 and ψ : Y → Y 0 of measure-preserving maps. Denote the set of measure-preserving
maps from H to H 0 as m(H, H 0 ) and let m be the associated morphism set. Similarly, a measure-preserving morphism
of measure networks N and N 0 is a measure-preserving map φ : X → X 0 . By abuse of notation, we use m(N, N 0 ) for
the set of measure-preserving morphisms from N to N 0 and m for the associated morphism set. The measure-preserving
category of hypernetworks (respectively, networks) is Hm (respectively, Nm ).
Generalizing Section 3.1, a measure hypernetwork H = (X, µ, Y, ν, ω) has a dual hypernetwork H∗ = (Y, ν, X, µ, ω∗ ),
where ω∗ (y, x) := ω(x, y).
Proposition 19 (cf. [15], Proposition 3.1). The dualization map H 7→ H∗ is a covariant endofunctor of Hh for
h ∈ {e, m} and an isometry of the pseudometric space (H, dH,p ).

3.3 Graphification Functors

We now formalize the transformations of Section 3.1 as maps H → N . The desiderata for such a map are that it should
translate the algebraic (i.e., morphism) structure of the space of hypernetworks to the space of networks and that it
should preserve similarity (i.e., metric distance) between objects. To this end, we define:
Definition 20. For categories Hh and Nn, an (h, n)-graphification is a functor F : Hh → Nn which is Lipschitz with
respect to dH,p and dN,p.

Binary incidence. As a first example of a graphification, we define a map B : H → N by B(X, µ, Y, ν, ω) =
(Z, µZ, ωZ), where Z = X ⊔ Y, endowed with the disjoint union topology,

$$\mu_Z(U) = \frac{1}{2}\left( \mu(U \cap X) + \nu(U \cap Y) \right) \tag{5}$$

for all Borel sets U ⊂ Z, and

$$\omega_Z(w, z) = \begin{cases} \omega(w, z) & \text{if } w \in X \text{ and } z \in Y \\ \omega_*(w, z) & \text{if } z \in X \text{ and } w \in Y \\ 0 & \text{otherwise.} \end{cases}$$


The map B generalizes the transformation from a hypergraph to its bipartite incidence graph (Section 3.1). As such, we
refer to B : H → N as the incidence map.
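For finite hypernetworks, the incidence map is again a block-matrix construction; a sketch matching (5) and the definition of ωZ above:

```python
import numpy as np

def incidence_map(omega, mu, nu):
    """B(H) = (Z, mu_Z, omega_Z) with Z = X ⊔ Y, each part carrying half the mass."""
    n, m = omega.shape
    omega_Z = np.block([[np.zeros((n, n)), omega],
                        [omega.T, np.zeros((m, m))]])
    mu_Z = 0.5 * np.concatenate([mu, nu])  # the finite-space form of (5)
    return omega_Z, mu_Z
```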
Proposition 21. The incidence map is an (h, n)-graphification for (h, n) ∈ {(e, e), (m, m)}, with Lipschitz constant
equal to 1.
Example 22. The incidence functor does not induce a bi-Lipschitz equivalence of H and N . Indeed, there are
simple instances of H, H 0 ∈ H which are not weakly isomorphic (i.e., for which dH,p (H, H 0 ) > 0), but for which
dN ,p (B(H), B(H 0 )) = 0. For example, let |X| = |Y | = 6, µ and ν uniform and consider ω, ω 0 described by block
matrices

$$\omega = \begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix} \quad \text{and} \quad \omega' = \begin{pmatrix} A & 0 \\ 0 & A^T \end{pmatrix}, \quad \text{where } A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},$$
with rows and columns indexed by X and Y , respectively. Set H = (X, µ, Y, ν, ω) and H 0 = (X, µ, Y, ν, ω 0 ). Using
[32, Proposition 1], we can show that dH,p (H, H 0 ) > 0. Indeed, since the networks are of the same size and measures
are uniform, dH,p (H, H 0 ) = 0 would imply the existence of permutation matrices P and Q such that P ω 0 Q = ω. This
is impossible, since row/column permutations will preserve the number of rows containing exactly two ones. On the
other hand, B takes these hypernetworks to networks described by block matrices
$$\omega_Z = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & A \\ A^T & 0 & 0 & 0 \\ 0 & A^T & 0 & 0 \end{pmatrix} \quad \text{and} \quad \omega_{Z'} = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & A^T \\ A^T & 0 & 0 & 0 \\ 0 & A & 0 & 0 \end{pmatrix},$$
respectively, where rows and columns are both indexed by X ⊔ Y and probability measures are uniform. There is a
permutation matrix P such that P ω_{Z′} P^T = ω_Z, which implies the existence of a strong isomorphism between B(H)
and B(H′). Thus dN,p(B(H), B(H′)) = 0.

While Example 22 shows that the bipartite map is not an isomorphism, the proof of Proposition 21 suggests an alternate
characterization of dH,p , which we will now explain.
Definition 23. A measure network N = (Z, µ, ω) is called bipartite if there exists a topological disconnection
Z = X ⊔ Y such that µ(X) = µ(Y) = 1/2 and for all x1, x2 ∈ X and y1, y2 ∈ Y we have ω(x1, x2) = ω(y1, y2) = 0
and ω(x1, y1) = ω(y1, x1). A labeled bipartite network is a bipartite network with a fixed decomposition, written as
N = (X, Y, µ, ω). The collection of labeled bipartite measure networks is denoted B.

We now define a variant of the network GW distance for labeled bipartite networks N = (X, Y, µ, ω) and N 0 =
(X 0 , Y 0 , µ0 , ω 0 ) which respects labels. A labeled coupling of µ and µ0 is a coupling π ∈ C(µ, µ0 ) such that supp(π) ⊂
(X × X 0 ) ∪ (Y × Y 0 ). Let CB ((X, Y, µ), (X 0 , Y 0 , µ0 )) denote the set of all labeled couplings.
Definition 24. The labeled Gromov-Wasserstein p-distance dB,p : B × B → R is

$$d_{\mathcal{B},p}(N, N') = \inf_{\pi \in C_{\mathcal{B}}((X,Y,\mu),(X',Y',\mu'))} \mathrm{dis}^N_p(\pi).$$

While we have a general Lipschitz bound dN ,p (B(H), B(H 0 )) ≤ dH,p (H, H 0 ), the following result shows that dH,p is
equivalent to our variant of GW distance.
Theorem 25. The bipartite incidence map is an isometry of (H, dH,p ) and (B, dB,p ).

Line graph. We now introduce a simple model for the line graph transformation of Section 3.1 which is general
enough to be defined on the full space of hypernetworks H. The tale of this subsection is more cautionary: we will
show that our naive definition fails to have the desired properties of a graphification, indicating that some care must be
taken when designing principled graph simplifications of hypergraphs.
We define the line graph map L : H → N by L(X, µ, Y, ν, ω) = (Y, ν, ω_ℓ), where

$$\omega_\ell(y_1, y_2) := \int_X \omega(x, y_1)\, \omega(x, y_2)\, \mu(dx).$$

If X and Y are finite and we represent ω as a matrix ω ∈ R^{|X|×|Y|} with rows indexed by X and columns indexed by
Y, then ω_ℓ is given as a matrix by

$$\omega_\ell = \omega^T D_\mu \omega, \qquad D_\mu := \mathrm{diag}(\mu). \tag{6}$$


Observe that if ω is a binary incidence function associated to a combinatorial hypergraph, then ω_ℓ(y1, y2) takes a
weighted count of nodes in the intersection of hyperedges y1 and y2, modeling a weighted line graph construction, as in
Section 3.1.
We now provide some examples which show that this naive definition of a line graph map does not have the desired
properties of a valid graphification.
Example 26 (The Line Graph Map is Not Lipschitz). For α ≥ 1, set Hα = (X, µ, Y, ν, ωα), where |X| = |Y| = 2,
µ, ν are uniform and ωα is described by the matrix ωα = diag(α, α). The line graph map takes Hα to L(Hα) =
(Y, ν, (ωα)_ℓ), with (ωα)_ℓ = diag(α²/2, α²/2).
Now consider dH,p(Hα, H1) and dN,p(L(Hα), L(H1)) for α ≥ 1 and p ∈ [1, ∞). One can show that dH,p(Hα, H1) is
realized by optimal couplings π = ξ = diag(1/2, 1/2) and that this coupling is also optimal for dN,p(L(Hα), L(H1)).
One can then directly compute

$$d_{\mathcal{H},p}(H_\alpha, H_1) = \frac{1}{2^{1/p}}(\alpha - 1) \quad \text{and} \quad d_{\mathcal{N},p}(L(H_\alpha), L(H_1)) = \frac{1}{2^{1/p}}\left| \frac{\alpha^2}{2} - \frac{1}{2} \right|.$$

Taking α arbitrarily large shows that the naive line graph map L : H → N is not Lipschitz.
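These closed forms are easy to verify numerically; a quick check with the fixed diagonal couplings (plain NumPy, p = 2, α = 10):

```python
import numpy as np

alpha, p = 10.0, 2
pi = np.diag([0.5, 0.5])                 # the optimal couplings from above
w_alpha, w_one = np.diag([alpha, alpha]), np.eye(2)

def distortion(w, w2, pi, xi, p):        # finite form of the distortions (3)/(1)
    d = np.abs(w[:, None, :, None] - w2[None, :, None, :]) ** p
    return np.einsum('ikjl,ik,jl->', d, pi, xi) ** (1 / p)

print(distortion(w_alpha, w_one, pi, pi, p))             # (alpha-1)/2^{1/p} ~ 6.364
print(distortion(w_alpha**2 / 2, w_one / 2, pi, pi, p))  # (alpha^2-1)/2^{1+1/p} ~ 35.0
```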
The line graph map obviously defines a functor of measure-preserving categories L : Hm → Nm via L(φ, ψ) := ψ.
However, functoriality fails for expansive categories.
Example 27 (The Line Graph Map is Not Functorial for Expansive Categories). Let |X| = |Y | = 1, µ, ν uniform and
ω = [1] (in matrix form), and let X 0 = {x01 , x02 }, |Y 0 | = 1, µ0 , ν 0 uniform and ω 0 = [1, 0]T . Let φ : X → X 0 be the
map taking the unique element of X to x01 and let ψ : Y → Y 0 be the unique function between singleton sets. Then
(φ, ψ) defines an expansive morphism between H = (X, µ, Y, ν, ω) and H 0 = (X 0 , µ0 , Y 0 , ν 0 , ω 0 ). However, the line
graph map takes H to L(H) = (Y, ν, ω_ℓ), where ω_ℓ = [1], and H′ to L(H′) = (Y′, ν′, ω′_ℓ), where ω′_ℓ = [1/2]. Then
ψ : Y → Y′ does not define an expansive morphism L(H) → L(H′).

Modified line graph. Inspired by the matrix formula (6) for the naive line graph map, we introduce a modified
map with the desired Lipschitz property. In this subsection, we make the simplifying assumptions that all measure
hypernetworks are finite and all measures are fully supported, and we restrict our attention to the p = 2 metrics.
The definition of the modified line graph map below is based on the following observation: the map taking an m × m
matrix X to X^T X is not Lipschitz continuous with respect to the Frobenius norm ‖·‖_F, and this is essentially the
reason for the failure of Lipschitzness of the naive line graph map; on the other hand, we have the following lemma.
Lemma 28 ([1], [4] Section 5). Let X, Y ∈ R^{m×m}. Then

$$\| (X^T X)^{\frac{1}{2}} - (Y^T Y)^{\frac{1}{2}} \|_F \le \sqrt{2}\, \| X - Y \|_F,$$

where a ½-exponent denotes the matrix square root of a positive semi-definite matrix.
This motivates a modified line graph map which looks more like the map X ↦ (X^T X)^{1/2}. It turns out that some extra
normalizations are needed to get the Lipschitz condition:
Definition 29. For a finite hypernetwork H = (X, µ, Y, ν, ω), consider ω as a matrix in R^{|X|×|Y|} and the measures as
column vectors. Let Dµ = diag(µ) and Dν = diag(ν). The modified line graph map takes H to L̂(H) := (Y, ν, ω_ℓ̂), where

$$\omega_{\hat\ell} := D_\nu^{-\frac{1}{2}} \left( D_\nu^{\frac{1}{2}}\, \omega^T D_\mu\, \omega\, D_\nu^{\frac{1}{2}} \right)^{\frac{1}{2}} D_\nu^{-\frac{1}{2}}.$$

For example, if H ∈ H has |X| = m, |Y| = n and µ, ν uniform, then ω_ℓ̂ = √(n/m) (ω^T ω)^{1/2}.
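A sketch of ω_ℓ̂ for finite, fully supported measures (Python; `scipy.linalg.sqrtm` computes the matrix square root, and the small imaginary parts it can return for PSD inputs are discarded):

```python
import numpy as np
from scipy.linalg import sqrtm

def modified_line_graph(omega, mu, nu):
    """omega_lhat from Definition 29; omega is |X| x |Y|, mu and nu vectors."""
    s = np.sqrt(nu)                                    # diagonal of D_nu^{1/2}
    inner = (omega * s).T @ np.diag(mu) @ (omega * s)  # D_nu^{1/2} w^T D_mu w D_nu^{1/2}
    return np.real(sqrtm(inner)) / np.outer(s, s)      # conjugate by D_nu^{-1/2}

# The modified clique expansion discussed below is then
# modified_line_graph(omega.T, nu, mu), i.e., the same map applied to the dual.
```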
Proposition 30. The modified line graph map is an (m, m)-graphification on the space of finite measure hypernetworks,
with Lipschitz constant √2 (with respect to dH,2 and dN,2).

Clique expansion. Following Section 3.1, we can define the clique expansion of a measure hypernetwork H to be
the line graph of its dual H∗. Since dualization defines a functorial isometry of H (Proposition 19), the clique expansion
automatically inherits the categorical/Lipschitz properties of the line graph map. It follows that Q(H) := L(H∗) is
not a valid graphification, but that Q̂(H) := L̂(H∗) is (for measure-preserving categories).

4 Computational Examples
In this section, we demonstrate our computational framework on hypergraph matching tasks. We begin with toy
examples in Example 32, then we discuss a slightly larger Example 33 involving a dataset of 40 hypergraphs from a


generative model. We end this section by describing an application of our HyperCOT framework in Example 34 for
multi-resolution matching of geometric objects represented by meshes. In this section, experiments are described at
a high level; details on implementation can be found in Appendix B and details on experimental parameters can be
found in Appendix C. Before describing the experiments, we expand on Example 5 and give a refined method for
generating a measure hypernetwork from a combinatorial hypergraph in Example 31. We discuss an additional example
on hypergraph simplification via the lens of hypernetwork distances in Appendix D.
Example 31 (Encoding Degree Information in Hypergraphs). Given a hypergraph (X, Y), we define a hypernetwork
(X, µ, Y, ν, ω) by encoding degree information and the Jaccard indices among hyperedges as follows. µ encodes the
normalized node degree, that is, µ(x) = deg(x) / Σ_{x′} deg(x′), where deg(x) is the number of hyperedges containing x.
ν(y) is proportional to the sum of the node degrees over y, that is, ν(y) = ν̂(y) / Σ_{y′} ν̂(y′), where ν̂(y) = Σ_{x∈y} deg(x). ω(x, y)
is defined based on a weighted line graph L(H) of H (different from the line graph functors defined above), where each
edge in L(H) is weighted by the multiplicative inverse of the Jaccard index of its endpoints. Then ω(x, y) captures the
Jaccard-weighted shortest path relation with respect to the weighted line graph. Using the same hypergraph H shown in Fig. 2c,
we have: µ(1) = µ(2) = µ(3) = 2/10, µ(4) = 3/10, and µ(5) = 1/10; ν(a) = 4/22, ν(b) = 7/22, ν(c) = 6/22,
and ν(d) = 5/22; and

$$\omega = \begin{pmatrix} 0 & 0 & 4 & 2 \\ 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \\ 4 & 0 & 0 & 0 \\ 4 & 5 & 0 & 4 \end{pmatrix}.$$
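The two measures are one-liners from the incidence matrix (the Jaccard-weighted ω requires shortest paths in the weighted line graph and is omitted here); a sketch:

```python
import numpy as np

def degree_measures(omega):
    """mu and nu of Example 31 from a 0/1 incidence matrix omega."""
    deg = omega.sum(axis=1)   # deg(x) = number of hyperedges containing x
    mu = deg / deg.sum()      # normalized node degree
    nu_hat = omega.T @ deg    # nu_hat(y) = sum of deg(x) over x in y
    return mu, nu_hat / nu_hat.sum()

# On the incidence matrix of Fig. 2c this returns
# mu = (2, 2, 2, 3, 1)/10 and nu = (4, 7, 6, 5)/22, matching the values above.
```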
Example 32 (Hypergraph Matching Toy Examples). Fig. 5 shows two toy examples of matching between nodes and
hyperedges of hypergraphs, computed simultaneously via the co-optimal transport distance dH,2 . Both examples utilize
the method described in Example 31 that encodes degree information and the Jaccard indices among hyperedges.


Figure 5: Two toy examples of hypergraph matching.

The coupling matrices after running the co-optimal transport optimization are shown below. The first example in (a)-(b)
has coupling matrices ξ¹ for hyperedge coupling and π¹ for node coupling. The second example in (c)-(d) has coupling
matrices ξ² and π².

$$\xi^1 = \begin{pmatrix} 0.158 & 0 & 0 \\ 0 & 0.211 & 0 \\ 0.115 & 0 & 0.254 \\ 0 & 0.062 & 0.201 \end{pmatrix} \qquad \xi^2 = \begin{pmatrix} 0.096 & 0.086 & 0 \\ 0 & 0.282 & 0.036 \\ 0.272 & 0 & 0 \\ 0 & 0 & 0.227 \end{pmatrix}$$

$$\pi^1 = \begin{pmatrix} 0.111 & 0 & 0 & 0 & 0 \\ 0 & 0.111 & 0 & 0 & 0 \\ 0 & 0 & 0.143 & 0 & 0.079 \\ 0 & 0.032 & 0 & 0.286 & 0.016 \\ 0.032 & 0 & 0 & 0 & 0.190 \end{pmatrix} \qquad \pi^2 = \begin{pmatrix} 0.144 & 0 & 0.022 & 0.033 & 0 \\ 0.078 & 0.111 & 0 & 0 & 0.011 \\ 0 & 0 & 0.2 & 0 & 0 \\ 0 & 0 & 0 & 0.3 & 0 \\ 0 & 0 & 0 & 0 & 0.1 \end{pmatrix}$$
In Fig. 5a, we obtain a hypergraph matching H1 → H2 , where the visual encodings capture information from the
hyperedge coupling matrix ξ 1 . Colored nodes (with white rim) and colored convex hulls in the hybrid visualization for


H1 represent the four hyperedges of H1 . The co-optimal transport results in a color transfer such that colored nodes in
H2 capture information from the hyperedge coupling matrix ξ 1 via pie charts. For instance, the 3rd column of ξ 1 shows
that hyperedges c (red) and d (purple) in H1 are both matched to the hyperedge c in H2 with non-zero probabilities.
Therefore node c in H2 is visualized by a pie chart in both red and purple. The convex hull associated with hyperedge
c ∈ H2 , on the other hand, carries the color of the hyperedge c ∈ H1 with the highest coupling probability. Similarly in
Fig. 5b, node coupling between H1 and H2 is visualized via a color transfer and pie charts. For example, both node 2
and node 4 in H1 are matched with node 2 in H2 with non-zero probability, most likely due to the symmetry of these
hypergraphs.
Example 33 (Hypergraph Matching). We generated a dataset of 40 hypergraphs based on four different parameter
settings of a hypergraph generative model presented in [14], and applied HyperCOT to obtain all pairwise distances
over the dataset. The generative model, known as Subset Sampling, essentially proceeds as follows: for each node, one
draws a positive integer k from a distribution p. Then one inserts k new hyperedges containing the selected node, where
the size of each new hyperedge is controlled by a separate distribution (cf. [14] for further details). In our experiment,
we kept all other parameters fixed and varied p ∼ Pois(λ) for λ ∈ {5, 10, 15, 20}. These simulated hypergraphs had
184 (s.d. 3) nodes and 690 (s.d. 36) hyperedges on average. The matrix of pairwise distances is plotted in Figure
6 along with a dendrogram computed using Ward's linkage. The results suggest that, despite one misclassification, our
method is overall able to neatly separate the four classes.


Figure 6: Pairwise hypernetwork distances across the dataset of 40 simulated hypergraphs in Example 33. Row/column
colors indicate ground truth class labels.

Example 34 (Multi-resolution Matching). Figure 1 illustrates a case where hypergraphs are not provided, but are
generated from data. We take two meshes from the TOSCA dataset [6] and retain only the graph structure. For each
graph, we first obtain an overlapping cover. Note that although methods such as Mapper [33] are commonly used to
obtain such covers, we utilize a purely intrinsic approach based on heat kernels that may be of independent interest
(cf. Appendix C). We then build its nerve graph as follows: each cover element is a node, and edges connect cover
elements that share data points. Iterating this construction produces a progressive simplification of the data. At each
stage, a hypergraph is constructed by taking the cover elements to be hyperedges. We then apply HyperCOT to obtain
correspondences. As the underlying optimization is carried out via block coordinate descent on nodes and hyperedges,
information is shared across multiple resolutions. The quality of the correspondences can be understood via color
transfer: the Centaur node indices are mapped linearly to a colormap and we color each Wolf node by the color of the
node from which it received the largest mass transfer. Note that the colors overall tend to match across semantically
similar regions. Our procedure can be viewed as an empirical demonstration of a construction known as the Multiscale
Mapper [13].
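The nerve-graph step described above is straightforward once a cover is in hand; a sketch (the heat-kernel cover construction itself is described in Appendix C):

```python
import itertools
import numpy as np

def nerve_graph(cover):
    """Adjacency matrix of the nerve: one node per cover element, an edge
    whenever two cover elements share data points."""
    k = len(cover)  # cover: list of sets of data-point indices
    A = np.zeros((k, k))
    for i, j in itertools.combinations(range(k), 2):
        if cover[i] & cover[j]:
            A[i, j] = A[j, i] = 1.0
    return A
```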


5 Conclusion
We view this work as the first in a landscape of new research directions in hypergraph data analysis. In analogy
with the measure network setting, we expect that future work could further develop the geometry of hypergraph space,
and utilize this study to obtain methods for carrying out geometric statistics (e.g. Fréchet means) on this space. From
the applied perspective, we expect that future work could focus on applications to machine learning problems as well
as scalability and deployment into deep learning pipelines. Finally, the study of morphisms between network and
hypernetwork categories initiated in this work suggests new questions about obtaining families of Lipschitz maps and
further understanding existing graphification methods from a categorical perspective.

References
[1] Huzihiro Araki and Shigeru Yamagami. An inequality for Hilbert-Schmidt norm. Communications in Mathematical
Physics, 81(1):89–96, 1981.
[2] Anirban Banerjee, Arnab Char, and Bibhash Mondal. Spectra of general hypergraphs. Linear Algebra and its
Applications, 1:14–30, 2017.
[3] Jean-Claude Bermond, Marie-Claude Heydemann, and Dominique Sotteau. Line graphs of hypergraphs I. Discrete
Mathematics, 18(3):235–241, 1977.
[4] Rajendra Bhatia. Matrix factorizations and their perturbations. Linear Algebra and its applications, 197:245–276,
1994.
[5] Salomon Bochner. Harmonic analysis and the theory of probability. University of California press, 2020.
[6] Alexander M Bronstein, Michael M Bronstein, and Ron Kimmel. Numerical geometry of non-rigid shapes.
Springer Science & Business Media, 2008.
[7] Michael Chertok and Yosi Keller. Efficient high order matching. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32(12):2205–15, 2010.
[8] Samir Chowdhury and Facundo Mémoli. Explicit geodesics in Gromov-Hausdorff space. Electronic Research
Announcements, 25:48, 2018.
[9] Samir Chowdhury and Facundo Mémoli. The Gromov–Wasserstein distance between networks and stable network
invariants. Information and Inference: A Journal of the IMA, 8(4):757–787, 2019.
[10] Samir Chowdhury, David Miller, and Tom Needham. Quantized Gromov-Wasserstein. arXiv preprint
arXiv:2104.02013, 2021.
[11] Samir Chowdhury and Tom Needham. Gromov-Wasserstein averaging in a Riemannian framework. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 842–843, 2020.
[12] Samir Chowdhury and Tom Needham. Generalized spectral clustering via Gromov-Wasserstein learning. In
International Conference on Artificial Intelligence and Statistics, pages 712–720. PMLR, 2021.
[13] Tamal K Dey, Facundo Mémoli, and Yusu Wang. Multiscale mapper: Topological summarization via codomain
covers. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 997–1013.
SIAM, 2016.
[14] Manh Tuan Do, Se-eun Yoon, Bryan Hooi, and Kijung Shin. Structural patterns and generative models of
real-world hypergraphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, pages 176–186, 2020.
[15] W Dörfler and DA Waller. A category-theoretical approach to hypergraphs. Archiv der Mathematik, 34(1):185–192,
1980.
[16] Olivier Duchenne, Francis Bach, In-So Kweon, and Jean Ponce. A tensor-based algorithm for high-order graph
matching. IEEE transactions on pattern analysis and machine intelligence, 33(12):2383–2395, 2011.
[17] Rémi Flamary and Nicolas Courty. POT: Python Optimal Transport library, 2017. URL: https://github.com/rflamary/POT.
[18] Michael Garland and Paul S Heckbert. Surface simplification using quadric error metrics. In Proceedings of the
24th annual conference on Computer graphics and interactive techniques, pages 209–216, 1997.
[19] Steven Gold and Anand Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions
on pattern analysis and machine intelligence, 18(4):377–388, 1996.


[20] Reigo Hendrikson. Using Gromov-Wasserstein distance to explore sets of networks. Master’s thesis, University
of Tartu, 2016.
[21] M. Karoński and Z. Palka. On Marczewski–Steinhaus type distance between hypergraphs. Applicationes
Mathematicae, 16(1):47–57, 1977.
[22] Tamara Gibson Kolda. Multilinear operators for higher-order decompositions. Technical Report SAND2006-2081,
Sandia National Laboratories, 2006.
[23] Jungmin Lee, Minsu Cho, and Kyoung Mu Lee. Hyper-graph matching via reweighted random walks. Conference
on Computer Vision and Pattern Recognition (CVPR), 2011.
[24] Mingzhe Li, Sourabh Palande, Lin Yan, and Bei Wang. Sketching merge trees for scientific data visualization.
arXiv preprint arXiv:2101.03196, 2021.
[25] Arthur Liberzon, Chet Birger, Helga Thorvaldsdóttir, Mahmoud Ghandi, Jill P Mesirov, and Pablo Tamayo. The
molecular signatures database hallmark gene set collection. Cell systems, 1(6):417–425, 2015.
[26] Chih-Long Lin. Hardness of approximating graph transformation problem. In Ding-Zhu Du and Xiang-Sun
Zhang, editors, Algorithms and Computation. ISAAC 1994. Lecture Notes in Computer Science, volume 834.
Springer, Berlin, Heidelberg, 1994.
[27] Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison. Eurographics Symposium on
Point-Based Graphics, pages 81–90, 2007.
[28] Facundo Mémoli. Gromov–Wasserstein distances and the metric approach to object matching. Foundations of
Computational Mathematics, 11(4):417–487, 2011.
[29] Ali Mutlu and Utku Gürdal. Bipolar metric spaces and some fixed point theorems. J. Nonlinear Sci. Appl,
9(9):5362–5373, 2016.
[30] Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data science. Founda-
tions and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[31] Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging of kernel and distance matrices.
In International Conference on Machine Learning, pages 2664–2672, 2016.
[32] Ievgen Redko, Titouan Vayer, Rémi Flamary, and Nicolas Courty. Co-optimal transport. In H. Larochelle,
M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems,
volume 33, pages 17559–17570. Curran Associates, Inc., 2020.
[33] Gurjeet Singh, Facundo Mémoli, Gunnar E Carlsson, et al. Topological methods for the analysis of high
dimensional data sets and 3d object recognition. PBG@ Eurographics, 2, 2007.
[34] Sebastiano Smaniotto and Marcello Pelillo. Two metrics for attributed hypergraphs. Pattern Recognition Letters,
149:143–149, 2021.
[35] Justin Solomon. Optimal transport on discrete domains. AMS Short Course on Discrete Differential Geometry,
2018.
[36] Justin Solomon, Gabriel Peyré, Vladimir G Kim, and Suvrit Sra. Entropic metric alignment for correspondence
problems. ACM Transactions on Graphics (TOG), 35(4):1–13, 2016.
[37] Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta mathematica, 196(1):65–131, 2006.
[38] Karl-Theodor Sturm. The space of spaces: curvature bounds and gradient flows on the space of metric measure
spaces. arXiv preprint arXiv:1208.0434, 2012.
[39] Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A.
Gillette, Amanda Paulovich, Scott L. Pomeroy, Todd R. Golub, Eric S. Lander, and Jill P. Mesirov. Gene set
enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings
of the National Academy of Sciences, 102(43):15545–15550, 2005.
[40] Amit Surana, Can Chen, and Indika Rajapakse. Hypergraph dissimilarity measures. arXiv preprint
arXiv:2106.08206, 2021.
[41] S. M. Ulam. Some ideas and prospects in biomathematics. Annual Review of Biophysics and Bioengineering,
1:227–292, 1972.
[42] Shinji Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE transactions on
pattern analysis and machine intelligence, 10(5):695–703, 1988.
[43] Titouan Vayer, Nicolas Courty, Romain Tavenard, and Rémi Flamary. Optimal transport for structured data with
application on graphs. In International Conference on Machine Learning, pages 6275–6284, 2019.


[44] Hongteng Xu. Gromov-Wasserstein factorization models for graph clustering. In AAAI, pages 6478–6485, 2020.
[45] Hongteng Xu, Dixin Luo, and Lawrence Carin. Scalable Gromov-Wasserstein learning for graph partitioning and
matching. In Advances in Neural Information Processing Systems, pages 3046–3056, 2019.
[46] Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin. Gromov-Wasserstein learning for graph matching
and node embedding. In International Conference on Machine Learning, pages 6932–6941, 2019.
[47] Zhiping Zeng, Anthony K.H. Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. Comparing stars: On
approximating graph edit distance. Proceedings of the VLDB Endowment, 2(1):25–36, 2009.
[48] Youjia Zhou, Archit Rathore, Emilie Purvine, and Bei Wang. Topological simplifications of hypergraphs. arXiv
preprint arXiv:2104.11214, 2021.
[49] Jason Zien, Martine Schlag, and Pak K. Chan. Multi-level spectral hypergraph partitioning with arbitrary vertex
sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18:1389–1399, 1999.

A Proofs
A.1 Proofs from Section 2

We begin with a technical result.


Lemma 35. The infimum in (4) is realized.

Proof. As X and X 0 are Polish spaces and µ and µ0 are Borel probability measures, we have that C(µ, µ0 ) is compact in
Prob(X × X 0 ), by [9, Lemma 10]. It follows that C(µ, µ0 ) × C(ν, ν 0 ) is compact in Prob(X × X 0 ) × Prob(Y × Y 0 ).
Now we must show that the distortion functional is continuous on C(µ, µ′) × C(ν, ν′) for 1 ≤ p < ∞. Since we are
working in Polish spaces with finite measures, we have that bounded continuous functions are dense. So for every
n ∈ N, we can choose continuous bounded functions ωn ∈ L^p(µ ⊗ ν) and ω′n ∈ L^p(µ′ ⊗ ν′) such that

$$\|\omega - \omega_n\|_{L^p(\mu \otimes \nu)} \le \frac{1}{n} \quad \text{and} \quad \|\omega' - \omega'_n\|_{L^p(\mu' \otimes \nu')} \le \frac{1}{n}.$$

Then for each n ∈ N we can define an n-distortion functional dis^n_p : C(µ, µ′) × C(ν, ν′) → R by dis^n_p(π, ξ) := ‖ωn − ω′n‖_{L^p(π ⊗ ξ)}.
Now let us show that the n-distortion functional is continuous. Let (π, ξ) ∈ C(µ, µ′) × C(ν, ν′) and let (πm)_{m∈N} and
(ξm)_{m∈N} be sequences in C(µ, µ′) and C(ν, ν′) that converge to π and ξ with respect to the weak topology. Then

$$\lim_{m \to \infty} \mathrm{dis}^n_p(\pi_m, \xi_m) = \lim_{m \to \infty} \left( \int_{Y \times Y'} \int_{X \times X'} |\omega_n(x, y) - \omega'_n(x', y')|^p\, \pi_m(dx \times dx')\, \xi_m(dy \times dy') \right)^{1/p} = \mathrm{dis}^n_p(\pi, \xi),$$

since ωn and ω′n are bounded and continuous; so dis^n_p is sequentially continuous. As C(µ, µ′) × C(ν, ν′) ⊆ Prob(X × X′ × Y × Y′), it is a metrizable
space. Therefore, as dis^n_p is sequentially continuous on a metrizable space, it is also continuous.
Now let us see that dis^n_p converges to dis_p uniformly. Let (π, ξ) ∈ C(µ, µ′) × C(ν, ν′) and observe

$$|\mathrm{dis}_p(\pi, \xi) - \mathrm{dis}^n_p(\pi, \xi)| = \left| \|\omega - \omega'\|_{L^p(\pi \otimes \xi)} - \|\omega_n - \omega'_n\|_{L^p(\pi \otimes \xi)} \right| \le \|(\omega - \omega_n) - (\omega' - \omega'_n)\|_{L^p(\pi \otimes \xi)} \le \|\omega - \omega_n\|_{L^p(\mu \otimes \nu)} + \|\omega' - \omega'_n\|_{L^p(\mu' \otimes \nu')} \le \frac{2}{n}.$$

Since dis_p is the uniform limit of continuous functions, it is also continuous. It follows that the infimum in (4) is
realized by a minimal coupling.


Let us introduce some notation. Let φ : X → Y be a measurable map between measurable spaces and let µ be a
measure on X. Throughout the rest of the proofs, we use φ∗ µ to denote the pushforward measure on Y .

Proof of Theorem 12. Let H, H′ ∈ H. For any (π, ξ) ∈ C(µ, µ′) × C(ν, ν′), we can define (π̄, ξ̄) := (f∗π, g∗ξ), with
f : X × X′ → X′ × X defined by f(x, x′) = (x′, x) and g : Y × Y′ → Y′ × Y defined by g(y, y′) = (y′, y). Then
dis_{H,H′,p}(π, ξ) = dis_{H′,H,p}(π̄, ξ̄), and it follows that dH,p(H, H′) = dH,p(H′, H).
For the triangle inequality, let (π12, ξ12) and (π23, ξ23) be minimal couplings for dH,p(H, H′) and dH,p(H′, H″),
respectively. Using the gluing lemma (see, e.g., [38, Lemma 1.4]), we can create a new coupling (π13, ξ13) from
H to H″ as the composition of the couplings (π12, ξ12) and (π23, ξ23). Then

$$\begin{aligned} d_{\mathcal{H},p}(H, H'') &\le \mathrm{dis}_p(\pi_{13}, \xi_{13}) = \|\omega - \omega''\|_{L^p} = \|\omega - \omega' + \omega' - \omega''\|_{L^p} \\ &\le \|\omega - \omega'\|_{L^p} + \|\omega' - \omega''\|_{L^p} = d_{\mathcal{H},p}(H, H') + d_{\mathcal{H},p}(H', H''), \end{aligned}$$

where the L^p norms are taken with respect to the glued couplings.
Finally, if H and H′ are strongly isomorphic, we have bijective measure-preserving maps φ : X → X′ and ψ : Y → Y′
with Borel-measurable inverses such that ω′(φ(x), ψ(y)) = ω(x, y) for all (x, y) ∈ X × Y. Defining the couplings
π = (idX × φ)∗µ and ξ = (idY × ψ)∗ν, we easily see that dis_p(π, ξ) = 0, and hence dH,p(H, H′) = 0.

Proof of Proposition 14. For the forward direction, if dH,p(H, H′) = 0 with optimal coupling (π, ξ) ∈ C(µ, µ′) ×
C(ν, ν′), we can define ZX := X × X′, µZ := π, ZY := Y × Y′ and νZ := ξ. Then we can take the coordinate
projections as our measurable maps φX : ZX → X, φX′ : ZX → X′, ψY : ZY → Y, and ψY′ : ZY → Y′. Each
of these projections is measure-preserving in the desired way, and as dH,p(H, H′) = 0, we have that
ω(φX(zX), ψY(zY)) = ω′(φX′(zX), ψY′(zY)) almost everywhere.
For the reverse direction, we can define couplings π := (φX × φX′)∗µZ and ξ := (ψY × ψY′)∗νZ, which satisfy

$$\mathrm{dis}_p(\pi, \xi)^p = \int_{Y \times Y'} \int_{X \times X'} |\omega - \omega'|^p\, \pi(dx \times dx')\, \xi(dy \times dy') = \int_{Z_Y} \int_{Z_X} |\omega(\phi_X(z_X), \psi_Y(z_Y)) - \omega'(\phi_{X'}(z_X), \psi_{Y'}(z_Y))|^p\, \mu_Z(dz_X)\, \nu_Z(dz_Y) = 0.$$

Proof of Theorem 16. We first prove completeness by adapting the proof of [38, Theorem 5.8]. Let Hn =
(Xn, µn, Yn, νn, ωn) be a Cauchy sequence of measure hypernetworks. Then we can choose couplings πn ∈
C(µn, µn+1) and ξn ∈ C(νn, νn+1) such that dis_p(πn, ξn) → 0. We then apply the generalized gluing lemma
[38, Lemma 1.4] to π1, …, πN−1 (respectively, to ξ1, …, ξN−1) to construct a measure ΠN on ∏_{n=1}^N Xn (respectively,
ΞN on ∏_{n=1}^N Yn). We now define

$$\overline{X} := \prod_{n=1}^\infty X_n, \quad \overline{\mu} := \varprojlim \Pi_N, \quad \overline{Y} := \prod_{n=1}^\infty Y_n, \quad \overline{\nu} := \varprojlim \Xi_N,$$

where $\varprojlim$ denotes the projective limit. Then X̄, as a countable product of Polish spaces, is also a Polish space. Moreover,
µ̄ is a well-defined probability measure on X̄ (e.g., [5]), and similar statements hold for Ȳ, ν̄. Next we define
ΩN : X̄ × Ȳ → R by

$$\Omega_N((x_i)_{i=1}^\infty, (y_i)_{i=1}^\infty) = \omega_N(x_N, y_N).$$

The sequence (ΩN)_{N=1}^∞ is a Cauchy sequence in the Banach space L^p(X̄ × Ȳ, µ̄ ⊗ ν̄), since

$$\|\Omega_N - \Omega_{N+1}\|_{L^p(\overline{\mu} \otimes \overline{\nu})} = \mathrm{dis}_p(\pi_N, \xi_N) \to 0,$$

and we define ω̄ to be its limit. Setting H̄ := (X̄, µ̄, Ȳ, ν̄, ω̄), we have H̄ ∈ H and

$$d_{\mathcal{H},p}(H_n, \overline{H}) \le \|\omega_n - \overline{\omega}\|_{L^p(\overline{\mu} \otimes \overline{\nu})} \to 0.$$
We show that $([\mathcal{H}], d_{\mathcal{H},p})$ is a geodesic space by constructing an explicit geodesic between any two weak isomorphism classes of hypernetworks $[H]$ and $[H']$, following the main idea of the proof of [38, Theorem 3.1]. Let $\pi \in C(\mu, \mu')$ and $\xi \in C(\nu, \nu')$ be optimal couplings (which exist, by Lemma 35). Consider the path $\gamma : [0, 1] \to \mathcal{H}$ defined by
$$\gamma(t) = (X \times X', \pi, Y \times Y', \xi, \omega_t), \tag{7}$$
where
$$\omega_t((x, x'), (y, y')) = (1 - t)\,\omega(x, y) + t\,\omega'(x', y').$$
Observe that $\gamma(0) \in [H]$ and $\gamma(1) \in [H']$; indeed, the coordinate projections $\phi : X \times X' \to X$ and $\psi : Y \times Y' \to Y$ define a basic weak isomorphism $\gamma(0) \to H$, and a similar construction works for $\gamma(1) \to H'$. To show that $\gamma$ defines a geodesic, it suffices to show that for all $0 \le s \le t \le 1$,
$$d_{\mathcal{H},p}(\gamma(s), \gamma(t)) \le (t - s)\, d_{\mathcal{H},p}(H, H') \tag{8}$$
(see, e.g., [8, Lemma 1.3]). One can show by an elementary computation (sketched below) that
$$\mathrm{dis}_{\gamma(s),\gamma(t),p}(\pi_\Delta, \xi_\Delta)^p \le (t - s)^p\, \mathrm{dis}_{H,H',p}(\pi, \xi)^p$$
for the couplings $\pi_\Delta, \xi_\Delta$ induced by $\pi$ and $\xi$ via the diagonal embeddings—indeed, this is more or less the statement that straight lines are geodesics in $L^p$ spaces—and (8) follows.
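For completeness, here is one way to carry out the elementary computation; this is a sketch using the couplings $\pi_\Delta$ and $\xi_\Delta$ obtained by pushing $\pi$ and $\xi$ forward along the diagonal embeddings (our notation, chosen for this sketch). Since $\omega_s - \omega_t = (t - s)(\omega - \omega')$ pointwise on $(X \times X') \times (Y \times Y')$, we have
$$\mathrm{dis}_{\gamma(s),\gamma(t),p}(\pi_\Delta, \xi_\Delta)^p = \int\!\!\int |\omega_s - \omega_t|^p \, d\pi\, d\xi = (t - s)^p \int\!\!\int |\omega(x, y) - \omega'(x', y')|^p \, d\pi\, d\xi = (t - s)^p\, \mathrm{dis}_{H,H',p}(\pi, \xi)^p.$$
Taking $p$-th roots and using the optimality of $(\pi, \xi)$ yields (8).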

A.2 Proofs from Section 3

Proof of Proposition 19. For combinatorial hypergraphs, functoriality of dualization with respect to expansive mor-
phisms is [15, Proposition 3.1]. This result easily extends to the setting of measure hypernetworks and to measure-
preserving morphisms.
To show the isometry statements, note that a coupling $\pi \in C(\mu, \mu')$ between measure spaces $(X, \mu)$ and $(X', \mu')$ has a dual $\pi_* \in C(\mu', \mu)$ defined by
$$\pi_* = \phi_*\pi,$$
where $\phi : X \times X' \to X' \times X$ is the map which switches coordinates. It follows that if $\pi \in C(\mu, \mu')$ and $\xi \in C(\nu, \nu')$ are optimal couplings between hypernetworks $H$ and $H'$, then their respective dual couplings $\pi_*$ and $\xi_*$ are optimal couplings of $H_*, H'_*$ realizing the same hypernetwork distance.

Proof of Proposition 21. We first establish functoriality. Let $(\phi, \psi) \in h(H, H')$ and define $B(\phi, \psi) : B(H) \to B(H')$ by
$$B(\phi, \psi)(z) = \begin{cases} \phi(z) & \text{if } z \in X \\ \psi(z) & \text{if } z \in Y. \end{cases}$$
It is immediate that if $\phi$ and $\psi$ are measure-preserving maps, then $B(\phi, \psi)$ is as well. Moreover, if $(\phi, \psi)$ is an expansive morphism, then for any $w, z \in Z$,
$$\omega_Z(w, z) = \begin{cases} \omega(w, z) & \text{if } w \in X \text{ and } z \in Y \\ \omega(z, w) & \text{if } z \in X \text{ and } w \in Y \\ 0 & \text{otherwise} \end{cases}
\;\le\; \begin{cases} \omega'(\phi(w), \psi(z)) & \text{if } w \in X \text{ and } z \in Y \\ \omega'(\phi(z), \psi(w)) & \text{if } z \in X \text{ and } w \in Y \\ 0 & \text{otherwise} \end{cases}
\;=\; \omega_{Z'}(B(\phi, \psi)(w), B(\phi, \psi)(z)),$$
so that $B(\phi, \psi)$ is also expansive.
It remains to show that $B$ is 1-Lipschitz. Let $H, H' \in \mathcal{H}$ and let $\pi \in C(\mu, \mu')$ and $\xi \in C(\nu, \nu')$. We define $\rho \in C(\mu_Z, \mu_{Z'})$ by
$$\rho(U) = \frac{1}{2}\big(\pi(U \cap (X \times X')) + \xi(U \cap (Y \times Y'))\big) \tag{9}$$
for any Borel set $U \subset Z \times Z'$. Then $\rho$ is a probability measure on
$$Z \times Z' = (X \sqcup Y) \times (X' \sqcup Y') = (X \times X') \cup (X \times Y') \cup (Y \times X') \cup (Y \times Y').$$
By definition, we have
$$\mathrm{supp}(\rho) \subset (X \times X') \cup (Y \times Y'). \tag{10}$$

Then
$$\mathrm{dis}^{\mathcal{N}}_{B(H),B(H'),p}(\rho)^p = \int_{Z\times Z'}\int_{Z\times Z'} |\omega_Z(w, z) - \omega_{Z'}(w', z')|^p \,\rho(dw\times dw')\,\rho(dz\times dz') = \sum_{A,B} \int_A \int_B |\omega_Z(w, z) - \omega_{Z'}(w', z')|^p \,\rho(dw\times dw')\,\rho(dz\times dz'),$$
where we sum over pairs
$$(A, B) \in \{X\times X',\, X\times Y',\, Y\times X',\, Y\times Y'\} \times \{X\times X',\, X\times Y',\, Y\times X',\, Y\times Y'\}.$$
By (10), the only potentially nonzero terms in this sum belong to
$$\{X\times X',\, Y\times Y'\} \times \{X\times X',\, Y\times Y'\}.$$
For $(w, w')$ and $(z, z')$ both belonging to $X \times X'$ or both to $Y \times Y'$, we have $\omega_Z(w, z) = \omega_{Z'}(w', z') = 0$. Therefore
$$\begin{aligned}
\sum_{A,B} \int_A \int_B |\omega_Z(w, z) - \omega_{Z'}(w', z')|^p \,\rho(dw\times dw')\,\rho(dz\times dz')
&= \left( \int_{X\times X'}\int_{Y\times Y'} + \int_{Y\times Y'}\int_{X\times X'} \right) |\omega_Z(w, z) - \omega_{Z'}(w', z')|^p \,\rho(dw\times dw')\,\rho(dz\times dz') \\
&= \int_{Y\times Y'}\int_{X\times X'} |\omega(x, y) - \omega'(x', y')|^p \,\pi(dx\times dx')\,\xi(dy\times dy') \\
&= \mathrm{dis}_{H,H',p}(\pi, \xi)^p,
\end{aligned}$$
where the second equality follows from the definition of $\rho$. We conclude that
$$d_{\mathcal{N},p}(B(H), B(H')) \le d_{\mathcal{H},p}(H, H').$$

Proof of Theorem 25. The bipartite incidence functor is a bijection $B : \mathcal{H} \to \mathcal{B}$, with inverse
$$(X, Y, \mu, \omega) \mapsto (X, 2\cdot\mu|_X, Y, 2\cdot\mu|_Y, \omega|_{X\times Y}),$$
where $\mu|_X$ is the measure on $X$ defined by $\mu|_X(U) = \mu(U)$ for any Borel $U \subset X \subset X \sqcup Y$, $\mu|_Y$ is defined similarly, and $\omega|_{X\times Y}$ is the restriction of $\omega$ to pairs in $X \times Y \subset (X \sqcup Y) \times (X \sqcup Y)$.
Now let $H = (X, \mu, Y, \nu, \omega),\ H' = (X', \mu', Y', \nu', \omega') \in \mathcal{H}$. The proof of Proposition 21 shows that $d_{\mathcal{B},p}(B(H), B(H')) \le d_{\mathcal{H},p}(H, H')$: given $\pi \in C(\mu, \mu')$ and $\xi \in C(\nu, \nu')$, the coupling $\rho$ defined in (9) is an element of $C_{\mathcal{B}}((X, Y, \mu_Z), (X', Y', \mu_{Z'}))$ (with $\mu_Z$ and $\mu_{Z'}$ as in (9)), so the proof of 1-Lipschitzness goes through.
To prove the reverse inequality, let $\rho$ be an arbitrary coupling in $C_{\mathcal{B}}((X, Y, \mu_Z), (X', Y', \mu_{Z'}))$ and define
$$\pi(U) = 2\cdot\rho(U \cap (X \times X')) \quad\text{and}\quad \xi(U) = 2\cdot\rho(U \cap (Y \times Y')).$$
Then $\pi \in C(\mu, \mu')$ and $\xi \in C(\nu, \nu')$, and computations similar to those in the proof of Proposition 21 can be used to show that
$$\mathrm{dis}_{B(H),B(H'),p}(\rho) = \mathrm{dis}_{H,H',p}(\pi, \xi).$$
It follows that $d_{\mathcal{B},p}(B(H), B(H')) \ge d_{\mathcal{H},p}(H, H')$.

Proof of Proposition 30 and related technical results. The proof that the modified line graph map is Lipschitz will require some technical lemmas.
For $\mu \in \mathbb{R}^m$ and $\nu \in \mathbb{R}^n$, define a norm $\|\cdot\|_{L^2(\mu\otimes\nu)}$ on $\mathbb{R}^{m\times n}$ by
$$\|A\|^2_{L^2(\mu\otimes\nu)} = \sum_{j=1}^n \sum_{i=1}^m a_{ij}^2\,\mu_i\,\nu_j.$$
The following lemma says that co-optimal transport distances can always be expressed in terms of this $L^2$ norm. We remark that the idea of the lemma is implemented in [11] in the setting of GW distance, for the purpose of computing Fréchet means of ensembles of networks (see also [24]).

Lemma 36. Let $H, H'$ be fully supported finite measure hypernetworks. There exist finite measure hypernetworks $\bar H = (\bar X, \bar\mu, \bar Y, \bar\nu, \bar\omega) \in [H]$ and $\bar H' = (\bar X, \bar\mu, \bar Y, \bar\nu, \bar\omega') \in [H']$ (i.e., with the same underlying measure spaces, but different hypernetwork functions) such that $\bar\mu$ and $\bar\nu$ have full support and
$$d_{\mathcal{H},2}(H, H') = d_{\mathcal{H},2}(\bar H, \bar H') = \|\bar\omega - \bar\omega'\|_{L^2(\bar\mu\otimes\bar\nu)}. \tag{11}$$

Proof. Let $\pi$ and $\xi$ be optimal couplings realizing $d_{\mathcal{H},2}(H, H')$. Define
$$\bar X = \mathrm{supp}(\pi), \qquad \bar\mu = \pi|_{\bar X}, \qquad \bar Y = \mathrm{supp}(\xi), \qquad \bar\nu = \xi|_{\bar Y},$$
and define hypernetwork functions by
$$\bar\omega = \omega \circ (p_X \times p_Y) \quad\text{and}\quad \bar\omega' = \omega' \circ (p_{X'} \times p_{Y'}),$$
where $p_Z$ denotes coordinate projection for $Z \in \{X, X', Y, Y'\}$. The coordinate projections are measure-preserving, due to the marginal constraints of $\pi$ and $\xi$, and the pairs $(p_X, p_Y)$ and $(p_{X'}, p_{Y'})$ therefore define basic weak isomorphisms. Moreover, (11) follows from writing out the definition of $d_{\mathcal{H},2}$ for finite measure hypernetworks.

For $A \in \mathbb{R}^{m\times n}$, let $A_U \in \mathbb{R}^{(m+n)\times(m+n)}$ denote the upper-right block matrix
$$A_U = \begin{pmatrix} 0_{m\times m} & A \\ 0_{n\times m} & 0_{n\times n} \end{pmatrix}.$$
Similarly, for $B \in \mathbb{R}^{n\times n}$, let $B_L \in \mathbb{R}^{(m+n)\times(m+n)}$ denote the lower-right block matrix
$$B_L = \begin{pmatrix} 0_{m\times m} & 0_{m\times n} \\ 0_{n\times m} & B \end{pmatrix}.$$
Finally, let $D = \mathrm{diag}(\mu, \nu) \in \mathbb{R}^{(m+n)\times(m+n)}$ be the diagonal matrix whose entries are the concatenated entries of $\mu$ and $\nu$. The proof of the following lemma is an elementary computation.
Lemma 37. With $A, B, \mu, \nu$, and $D$ as above,
$$\|A\|_{L^2(\mu\otimes\nu)} = \|D^{1/2} A_U D^{1/2}\|_F \quad\text{and}\quad \|B\|_{L^2(\nu\otimes\nu)} = \|D^{1/2} B_L D^{1/2}\|_F.$$
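As a quick numerical sanity check of the first identity in Lemma 37 (our own illustration, not part of the original text), one may run the following NumPy snippet:

import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
mu, nu = rng.random(m) + 0.1, rng.random(n) + 0.1   # entrywise-positive weights
A = rng.standard_normal((m, n))

# Left-hand side: ||A||^2_{L^2(mu (x) nu)} = sum_ij a_ij^2 mu_i nu_j.
lhs = np.sqrt(np.sum(A**2 * np.outer(mu, nu)))

# Right-hand side: Frobenius norm of D^{1/2} A_U D^{1/2}.
A_U = np.zeros((m + n, m + n))
A_U[:m, m:] = A                                      # upper-right block
D_half = np.diag(np.sqrt(np.concatenate([mu, nu])))
rhs = np.linalg.norm(D_half @ A_U @ D_half, "fro")

assert np.isclose(lhs, rhs)                          # first identity of Lemma 37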

Proof of Proposition 30. We already remarked on the obvious measure-preserving functoriality above. It remains to
show that $\hat L$ is $\sqrt{2}$-Lipschitz. Let $H, H'$ be fully supported finite measure hypernetworks and set $D = \mathrm{diag}(\mu, \nu)$. By Lemma 36, we can assume without loss of generality (replacing measure hypernetworks with weakly isomorphic copies as needed) that $d_{\mathcal{H},2}(H, H') = \|\omega - \omega'\|_{L^2(\mu\otimes\nu)}$. Then
$$\begin{aligned}
2\, d_{\mathcal{H},2}(H, H')^2 &= 2\, \|\omega - \omega'\|^2_{L^2(\mu\otimes\nu)} \\
&= 2\, \|D^{1/2}\omega_U D^{1/2} - D^{1/2}\omega'_U D^{1/2}\|^2_F & (12)\\
&\ge \big\|\big(D^{1/2}\omega_U^T D\,\omega_U D^{1/2}\big)^{1/2} - \big(D^{1/2}(\omega'_U)^T D\,\omega'_U D^{1/2}\big)^{1/2}\big\|^2_F & (13)\\
&= \big\|D^{1/2} D^{-1/2}\big(D^{1/2}\omega_U^T D\,\omega_U D^{1/2}\big)^{1/2} D^{-1/2} D^{1/2} - D^{1/2} D^{-1/2}\big(D^{1/2}(\omega'_U)^T D\,\omega'_U D^{1/2}\big)^{1/2} D^{-1/2} D^{1/2}\big\|^2_F \\
&= \|D^{1/2}(\omega_{\hat\ell})_L D^{1/2} - D^{1/2}(\omega'_{\hat\ell})_L D^{1/2}\|^2_F & (14)\\
&= \|\omega_{\hat\ell} - \omega'_{\hat\ell}\|^2_{L^2(\nu\otimes\nu)} & (15)\\
&\ge d_{\mathcal{N},2}(\hat L(H), \hat L(H'))^2, & (16)
\end{aligned}$$
where (12) follows from Lemma 37, (13) follows from Lemma 28, (14) follows from a cumbersome but elementary matrix computation, (15) is another application of Lemma 37, and (16) follows because the optimal coupling of $Y$ and $Y'$ realizing $d_{\mathcal{H},2}(H, H')$ is not necessarily optimal for $d_{\mathcal{N},2}(\hat L(H), \hat L(H'))$.

B Matrix Formulations
For machine learning applications, it is useful to translate our hypernetwork distance into matrix form. As the metric is
based on co-optimal transport of Redko et al., the exposition here closely follows [32].
In this section, we will only deal with finite hypernetworks, and thus will switch from the quintuple notation
(X, µ, Y, ν, ω) to the more compact matrix representation (W, m, n), where W is the |X| × |Y | matrix representation
of ω and m, n are vector representations of µ, ν respectively.
Let $(W, m, n)$ and $(W', m', n')$ be two hypernetworks, where $W \in \mathbb{R}^{n\times d}$ and $W' \in \mathbb{R}^{n'\times d'}$. Here we allow $d \ne d'$ and $n \ne n'$, and we also assume $m, n, m', n'$ are all entrywise positive. We consider the problem of obtaining a soft alignment of the rows of $W$ with the rows of $W'$, and likewise for the columns, while respecting $m, m'$ and $n, n'$, respectively. For fully supported probability vectors $a \in \mathbb{R}^p$, $b \in \mathbb{R}^q$, define the collection of measure couplings $\Pi(a, b) := \{P \in \mathbb{R}^{p\times q}_{\ge 0} : P\mathbf{1}_q = a,\ P^T\mathbf{1}_p = b\}$. Then consider the following optimization problem:
$$\min_{P\in\Pi(m,m')} \min_{Q\in\Pi(n,n')} \sum_{ijkl} |W_{ik} - W'_{jl}|^2\, Q_{kl} P_{ij} \tag{17}$$
$$= \min_{P\in\Pi(m,m')} \min_{Q\in\Pi(n,n')} \langle W^{\odot 2} n, m\rangle + \langle (W')^{\odot 2} n', m'\rangle - 2\,\langle W Q (W')^T, P\rangle, \tag{18}$$
where $\odot 2$ denotes the elementwise square and $\langle\cdot,\cdot\rangle$ denotes the matrix Frobenius inner product. Equation (18) amounts to a bilinear optimization problem that was studied in [32] in the context of obtaining correspondences between the samples (rows) and features (columns) of general data matrices. One may obtain a solution via blockwise coordinate descent; a sketch is given below. Define $E_Q(P) = \langle W Q (W')^T, P\rangle$ and $E_P(Q) = \langle Q, W^T P W'\rangle$, and observe that the gradients are given by $\nabla E_Q(P) = W Q (W')^T$ and $\nabla E_P(Q) = W^T P W'$. These gradients can in turn be computed in $O(ndd' + nd'n')$ and $O(dnn' + dn'd')$ time. Assuming $n = n'$ and $d = d'$, the total complexity is $O(nd^2 + n^2 d)$. Note that the GW problem occurs as the special case of Equation (18) where $W \in \mathbb{R}^{n\times n}$, $W' \in \mathbb{R}^{n'\times n'}$, $m = n$, $m' = n'$, and we impose the constraint $P = Q$.
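To make the blockwise coordinate descent concrete, here is a minimal NumPy sketch that alternates exact linear OT solves for P and Q using the emd solver from the POT library; the function name coot_bcd, the initialization, and the stopping rule are our own simplified choices for illustration, not specifics taken from [32].

import numpy as np
import ot  # Python Optimal Transport; ot.emd solves the linear OT problem

def coot_bcd(W1, W2, m1, m2, n1, n2, n_iter=50, tol=1e-9):
    """Blockwise coordinate descent for problem (17)-(18).
    W1 is n x d with row weights m1 and column weights n1 (likewise W2)."""
    P = np.outer(m1, m2)   # product couplings as initialization
    Q = np.outer(n1, n2)
    prev = np.inf
    for _ in range(n_iter):
        # With Q fixed, (17) is linear in P; expanding the square and using
        # the marginals of Q gives the cost matrix
        # M_P[i,j] = sum_{k,l} |W1[i,k] - W2[j,l]|^2 Q[k,l].
        M_P = (W1**2 @ n1)[:, None] + (W2**2 @ n2)[None, :] - 2 * W1 @ Q @ W2.T
        P = ot.emd(m1, m2, M_P)
        # Symmetric update for the column coupling Q, with P fixed.
        M_Q = ((W1**2).T @ m1)[:, None] + ((W2**2).T @ m2)[None, :] - 2 * W1.T @ P @ W2
        Q = ot.emd(n1, n2, M_Q)
        obj = np.sum(M_Q * Q)  # current value of the objective (17)
        if abs(prev - obj) < tol:
            break
        prev = obj
    return P, Q

# Example usage: align two random hypernetworks with uniform weights.
rng = np.random.default_rng(0)
W1, W2 = rng.random((6, 4)), rng.random((8, 5))
unif = lambda k: np.ones(k) / k
P, Q = coot_bcd(W1, W2, unif(6), unif(8), unif(4), unif(5))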

C Modeling Choices
In this section, we describe the modeling choices made for our Hypergraph Matching (Example 33) and Multi-resolution
Matching (Example 34) experiments. First we recall some of the numerous choices available when modeling a
hypergraph as a hypernetwork (cf. Section 4). Here we suppose that we are given a hypergraph (X, Y ) and are tasked
with obtaining a hypernetwork (X, µ, Y, ν, ω).
First consider $\mu$, and let $x \in X$. Two simple choices for $\mu(x)$ are as follows:

Uniform: $\mu(x) := \frac{1}{|X|}$.

Normalized node degree: $\mu(x) := \frac{\deg(x)}{\sum_{x' \in X} \deg(x')}$.

Next consider $\nu$, and let $y \in Y$. Two choices for $\nu(y)$ are as follows:

Uniform: $\nu(y) := \frac{1}{|Y|}$.

Normalized sum of node degrees: $\nu(y) := \frac{\hat\nu(y)}{\sum_{y' \in Y} \hat\nu(y')}$, where $\hat\nu(y) := \sum_{x \in y} \deg(x)$.

These degree-based choices are illustrated in the sketch below.
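The following snippet (our own illustration, on hypothetical toy data) computes the normalized node degree measure $\mu$ and the normalized sum-of-degrees measure $\nu$ from a hypergraph given as a list of hyperedges:

from collections import Counter

# Toy hypergraph: hyperedges are frozensets of nodes (hypothetical data).
Y = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({3, 4})]
X = sorted(set().union(*Y))

# deg(x) = number of hyperedges containing x.
deg = Counter(x for y in Y for x in y)

# Normalized node degree for mu.
total_deg = sum(deg.values())
mu = {x: deg[x] / total_deg for x in X}

# Normalized sum of node degrees for nu.
nu_hat = {y: sum(deg[x] for x in y) for y in Y}
total = sum(nu_hat.values())
nu = {y: nu_hat[y] / total for y in Y}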

Describing choices for $\omega$ requires some more setup. First recall the line graph construction in Section 3.1: a line graph $L(H)$ (different from the general line graph functors described above) is constructed from $H = (X, Y)$ by taking the node set of $L(H)$ to be $Y$ and adding an edge $(y, y')$ whenever $y \cap y' \ne \emptyset$. Note that $L(H)$ may be unweighted or weighted. In the case of a weighted line graph, there are two simple choices:

Intersection size: edge $(y, y')$ receives weight $\frac{1}{|y \cap y'|}$.

Jaccard index: edge $(y, y')$ receives weight $\frac{|y \cup y'|}{|y \cap y'|}$, i.e., the inverse of the Jaccard index.

Taking inverses above ensures that similar hyperedges with large overlaps are connected by a low-weight edge in the line graph. Finally, one defines $\omega(x, y) := \min\{d_{L(H)}(y', y) : x \in y'\}$; a sketch of this construction is given below.
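The following NetworkX sketch (our own illustration on hypothetical toy data) assembles $\omega$ from the Jaccard-weighted line graph via shortest-path distances:

import itertools
import networkx as nx

# Toy hypergraph: hyperedges are frozensets of nodes (hypothetical data).
Y = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 5})]
X = sorted(set().union(*Y))

# Weighted line graph: nodes are hyperedges; intersecting hyperedges are
# joined by an edge weighted by the inverse Jaccard index |y u y'| / |y n y'|.
L = nx.Graph()
L.add_nodes_from(range(len(Y)))
for i, j in itertools.combinations(range(len(Y)), 2):
    inter = len(Y[i] & Y[j])
    if inter > 0:
        L.add_edge(i, j, weight=len(Y[i] | Y[j]) / inter)

# Shortest-path distances d_{L(H)} between hyperedges in the line graph.
d = dict(nx.all_pairs_dijkstra_path_length(L, weight="weight"))

# omega(x, y) = min over hyperedges y' containing x of d_{L(H)}(y', y).
omega = {
    (x, j): min(d[i][j] for i in range(len(Y)) if x in Y[i])
    for x in X for j in range(len(Y))
}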

For both our experiments in Example 33 and Example 34, we set µ to be uniform and ν to be the normalized sum
of node degrees. For Example 33, we obtained ω using the Jaccard index-weighted line graph. For Example 34, we
obtained ω using the intersection size-weighted line graph.
Further details on multi-resolution matching. We now elaborate on the experimental setup used in Example 34. The crucial step is to obtain an overlapping cover of a graph, which is enough to yield a hypergraph structure. While there may be numerous approaches for doing so (cf. the Mapper algorithm and related literature [33]), we carry out the following procedure; a sketch is given below. Fix a graph $G$ with vertex set $V(G)$. Initialize a set of visited nodes $U = \emptyset$ as well as a cover $Y = \emptyset$, and fix a node $x \in V(G) \setminus U$. Let $L$ denote the normalized graph Laplacian of $G$, and compute its eigendecomposition $\Phi\Lambda\Phi^T$. Next note that for a given $t > 0$, we may compute the graph heat kernel $K^t := \exp(-tL)$ as $K^t = \Phi\exp(-t\Lambda)\Phi^T$. Given a Dirac delta vector $\delta_x$, the matrix-vector product $v := K^t\delta_x$ can be interpreted as the diffusion of a unit mass of heat out from $x$ within time $t$, and essentially has the form of a Gaussian centered at $x$. Following the idea of full width at half maximum (FWHM), we mark the node set $\{v \ge \max(v)/2\}$ as visited, setting $U \leftarrow U \cup \{v \ge \max(v)/2\}$, and we add the larger node set $\{v \ge \max(v)/4\}$ to $Y$ as a new cover element. We then select another $x \in V(G) \setminus U$ and iterate the procedure until $U = V(G)$. Note that the use of $U$ prevents us from sampling graph nodes too densely, and the extra $1/2$ multiplicative factor in the threshold for $Y$ allows us to obtain overlaps between cover elements. For $t$, we used $t = 20$ and $t = 5$ for the first iterations on the Centaur and Wolf, respectively, and $t = 5$ for the second iterations on both models. The reason for initially using a larger value of $t$ for the Centaur is that it has roughly 16,000 nodes, whereas the Wolf has roughly 4,000 nodes; these choices ensured that both models were simplified to just under 1,000 nodes in the first iteration.
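A minimal NumPy/NetworkX sketch of this cover-extraction procedure follows (our own illustration; it uses a dense eigendecomposition and visits candidate seed nodes in a fixed order rather than an arbitrary selection, so it is only meant for small graphs):

import numpy as np
import networkx as nx

def heat_kernel_cover(G, t=5.0):
    """Extract an overlapping cover of G via FWHM-style heat diffusion."""
    nodes = list(G.nodes())
    L = nx.normalized_laplacian_matrix(G, nodelist=nodes).toarray()
    lam, Phi = np.linalg.eigh(L)                  # eigendecomposition of L
    Kt = Phi @ np.diag(np.exp(-t * lam)) @ Phi.T  # heat kernel exp(-tL)
    visited, cover = set(), []
    for i, x in enumerate(nodes):                 # simplified seed selection
        if x in visited:
            continue
        v = Kt[:, i]                              # diffusion of delta_x
        half = {nodes[j] for j in np.flatnonzero(v >= v.max() / 2)}
        quarter = {nodes[j] for j in np.flatnonzero(v >= v.max() / 4)}
        visited |= half                           # mark FWHM region as visited
        cover.append(quarter)                     # larger region gives overlaps
    return cover                                  # hyperedges of the new hypergraph

# Example usage on a small mesh-like graph:
cover = heat_kernel_cover(nx.grid_2d_graph(10, 10), t=2.0)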

D Hypergraph Simplification via the Lens of Hypernetwork Distance


In this section, we give an example that captures changes in hypernetwork distances as we apply multi-scale hypergraph
simplification following the framework of Zhou et al. [48]. Visualizing large hypergraphs is a challenging task. To
reduce visual clutter, we may apply node collapse and edge collapse to reduce the size of the hypergraph (see Example 15).
Zhou et al. [48] relaxed these notions by allowing nodes to be combined if they belong to almost the same set of
hyperedges, and hyperedges to be merged if they share almost the same set of nodes; these operations are referred to as
node simplification and hyperedge simplification, respectively. We focus on hyperedge simplification in this example.
To enable hyperedge simplification, we first convert a hypergraph $H$ into a weighted line graph $L(H)$ (note that this is not one of the general line graph functors described in Section 3, but rather an ad hoc version designed for this particular application). Each edge in $L(H)$ is weighted by the multiplicative inverse of the Jaccard index of its endpoints. We compute a minimum spanning tree of $L(H)$, denoted $T_{L(H)}$, and then perform a topological simplification of $H$ based on $T_{L(H)}$. In particular, we sort the edges of $T_{L(H)}$ in order of increasing weight, obtaining $\{e_1, \ldots, e_k\}$. We then simplify the hypergraph $H$ across multiple scales by merging the hyperedges connected by $e_i \in T_{L(H)}$ for each $i$, as $i$ increases from $1$ to $k$; a sketch of this loop is given below.
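A minimal sketch of this multi-scale simplification loop (our own illustration of the procedure just described; it reuses the Jaccard-weighted line graph construction from Appendix C, tracks merges with a union-find structure, and yields one simplified hypergraph per MST edge):

import itertools
import networkx as nx

def simplify_hyperedges(Y):
    """Yield successively simplified hyperedge lists H_1, H_2, ...
    by merging along MST edges of the Jaccard-weighted line graph."""
    L = nx.Graph()
    L.add_nodes_from(range(len(Y)))
    for i, j in itertools.combinations(range(len(Y)), 2):
        inter = len(Y[i] & Y[j])
        if inter > 0:
            L.add_edge(i, j, weight=len(Y[i] | Y[j]) / inter)
    mst_edges = sorted(nx.minimum_spanning_edges(L, data=True),
                       key=lambda e: e[2]["weight"])
    parent = list(range(len(Y)))      # union-find over hyperedge indices
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j, _ in mst_edges:
        parent[find(i)] = find(j)     # merge the two hyperedges
        groups = {}
        for k in range(len(Y)):
            groups.setdefault(find(k), set()).update(Y[k])
        yield list(groups.values())   # simplified hypergraph at this scale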
Fig. 7 gives an example hypergraph $H = H_0$ whose nodes represent genes and whose hyperedges represent pathways from the Hallmarks collection within the Molecular Signatures Database (MSigDB) [25, 39]. We apply multi-scale hyperedge simplification and obtain a sequence of simplified hypergraphs $H_1, \ldots, H_8$. We use the parameter settings of Example 31 and model each $H_i$ ($0 \le i \le 8$) as a hypernetwork. We then compute the hypernetwork distances $d_{\mathcal{H},2}(H_0, H_i)$ between $H_0$ and $H_i$ (for $1 \le i \le 8$) and report them in the diagram of Fig. 7. As expected, $d_{\mathcal{H},2}(H_0, H_i)$ generally increases as we simplify the hypergraph across multiple scales (except at $H_2$), which aligns with intuition. Moreover, $d_{\mathcal{H},2}(H_0, H_i)$ changes only gradually at $H_2$ and $H_3$, indicating a small amount of information loss. This example provides initial evidence that the hypergraph distance $d_{\mathcal{H},2}$ may be used to quantify information loss during hypergraph simplification, thus supporting parameter tuning. See [18] for related work on error metrics for mesh simplification.

[Figure 7 panels: the original hypergraph H0 and its successive simplifications H1 through H8.]
Figure 7: Multi-scale simplification of a hypergraph via hyperedge simplification, together with changes in hypernetwork
distances. The plot of changes in hypernetwork distance may loosely be interpreted as a measure of information loss
during simplification.
