
Network mutual information measures for graph similarity

Helcio Felippe,1 Federico Battiston,1 and Alec Kirkley2,3,4

1 Department of Network and Data Science, Central European University, Vienna
2 Institute of Data Science, University of Hong Kong, Hong Kong
3 Department of Urban Planning and Design, University of Hong Kong, Hong Kong
4 Urban Systems Institute, University of Hong Kong, Hong Kong
(Dated: May 9, 2024)
A wide range of tasks in exploratory network analysis and machine learning, such as clustering network populations or identifying anomalies in temporal graph streams, require a measure of the similarity between two graphs. To provide a meaningful data summary for downstream scientific analyses, the graph similarity measures used in these unsupervised settings must be principled, interpretable, and capable of distinguishing meaningful overlapping network structure from statistical noise at different scales of interest. Here we derive a family of graph mutual information measures that satisfy these criteria and are constructed using only fundamental information theoretic principles. Our measures capture the information shared among networks according to different encodings of their structural information, with our mesoscale mutual information measure allowing for network comparison under any specified network coarse-graining. We test our measures in a range of applications on real and synthetic network data, finding that they effectively highlight intuitive aspects of network similarity across scales in a variety of systems.

I. INTRODUCTION

Network similarity and distance measures are widely applied across science and engineering disciplines for understanding the shared structure among multiple graphs [1–6]. Common graph-level analysis tasks such as network population clustering [7], network regression [8], and network classification [9] require as input a network similarity measure and are highly sensitive to this choice, leading to the construction of a vast number of graph similarity measures to accommodate different application needs [10–12]. In the unsupervised setting, the goal of performing graph similarity or distance calculations is often to identify some meaningful summary of a set of graphs or network layers—for example, clusters of graphs or a subset of anomalous graphs [13, 14]. In this case, it is essential that the graph similarity or distance measure used is principled, interpretable, and capable of distinguishing meaningful shared structure among the input graphs from statistical noise. Existing measures based on collections of network summary statistics or feature embeddings [15, 16] are often hard to interpret, and there are no clear criteria for which features to include in the similarity/distance calculation. Methods based on graph spectra [17, 18] have a clear connection to community structure and random walk dynamics on graphs, but are also challenging to interpret given the localization trade-off between graph space and frequency space [19]. Additionally, many of these methods that do not rely on label supervision do not explicitly aim to capture statistically meaningful structural distinctions between graphs, meaning that they can reflect a high amount of similarity between two graphs that is purely due to constraints imposed by global properties such as graph density or degree distributions. In this context, information theory can provide a principled framework for deciding the extent to which two networks are similar by allowing us to compute a mutual information between two graphs that quantifies the amount of information they share under a particular encoding scheme. Mutual information measures resulting from a minimum description length (MDL) approach have been used to compare partitions of objects in a diverse range of applications [20–24], becoming the standard measure for comparing partitions of networks in the community detection setting [25].

An often overlooked aspect of graph similarity measures is what scale is highlighted in the network comparison calculation. Most existing graph similarity measures can be broadly categorized based on the structural level—micro, meso, or macro—at which they highlight network similarity [16]. For unsupervised tasks involving graph similarity measures, any of the three network scales may be of interest. For example, when comparing snapshots of a longitudinal social network to examine its temporal evolution, different analyses may require a network similarity measure that focuses on micro-scale structural overlap among individuals' ego networks, a measure that captures the evolving meso-scale community structure of the network, or a measure that tracks macro-scale summary statistics of the network such as its total density. Existing graph similarity measures have largely been constructed to highlight a particular scale of interest—for example, individual edge overlap at the microscale is the focus of traditional graph edit distances [26], and global path structure in the network is highlighted by betweenness or closeness centrality-based network similarity measures [15]. Spectral distance measures can simultaneously highlight different scales within the network due to the presence of high and low frequency eigenmodes [17, 18], but it is unclear exactly to what extent each of the three scales contributes to the computed similarity. There is currently a lack of graph similarity measures that have a clear dependence on the scale of interest, which is critical for interpretability and flexibility in applications.
[Figure 1: panel annotations — (a) NMI = 0.06, with E1 = 12, E2 = 13, E12 = 6; (b) DC-NMI = 0.18, with k1(i) = 2, k2(i) = 2, k12(i) = 1; (c) MesoNMI = 0.94, with B = 2, E12^(b) = 12; (d) MesoNMI = 0.84, with B = 4, E12^(b) = 11.]

Figure 1. Family of proposed network mutual information measures for graph similarity. (a) Standard normalized mutual information (NMI, Eq. 10) between networks G1 and G2, with shared node labels indicated by node positions. Due to little overlap in the edge positions, this NMI measure returns a score NMI(G1; G2) ≈ 0. (b) Degree-corrected normalized mutual information (DC-NMI, Eq. 16) between graphs G1 and G2. Due to little overlap in node neighborhoods, we also see a low value for DC-NMI(G1; G2). (c) Mesoscale normalized mutual information (MesoNMI, Eq. 30) between networks G1 and G2 with respect to the indicated node partition b into B = 2 groups. Since the mesoscale structure of these networks is quite similar, as indicated by the edge overlap E12^(b) = 12, we have a high similarity value MesoNMI^(b)(G1; G2) ≈ 1. (d) MesoNMI between the two networks but with respect to a different partition b with B = 4 groups. Here we still see a relatively high MesoNMI value, indicating substantial shared structure at this smaller scale. For reference, the Jaccard index among the edge sets in this case is |G1 ∩ G2|/|G1 ∪ G2| = 0.315—a much higher value than the NMI in panel (a)—indicating that the edge overlap is not much different than expected based on the network densities.

In [27], an information theoretic measure for comparing networks is presented that is based on identifying shared substructures (e.g. stars and cliques) that minimize the description length of a pair of networks. This approach is quite elegant and flexible but is limited in practice to subgraph similarities and can become computationally burdensome with the inclusion of larger structural vocabularies. It also does not allow for the comparison of graphs at the meso- or macro-scale while ignoring edge-level details, as is possible with the mesoscale mutual information measure presented in this paper. A graph mutual information measure is also proposed in [28], but this measure requires a continuous embedding of the input graphs which may produce distortions of its topological structure. It also does not consider the problem of adjusting the scale at which the graph structure is compared. In [29], an MDL approach for clustering populations of networks is presented, which considers encoding sample graphs based on their cluster's representative "mode" network. The encoding for a cluster of networks resembles the encoding used here for the standard (non-degree-corrected) conditional entropy between graphs, but it does not accommodate the comparison of networks in a symmetric pairwise manner, and also does not address the issue of scale.

In this paper we construct a family of mutual information measures for computing graph similarity at different scales (see Fig. 1). We do this by considering multiple encodings of graphs that exploit different aspects of shared network structure—edges, node neighborhoods, and mixing patterns among arbitrary partitions of nodes in a network—and using these encodings to quantify the amount of shared information between two input graphs. Our measures are principled, interpretable, fast to compute, and naturally highlight the significant shared structure between two graphs beyond the overlap expected due to global structural constraints such as edge density and degree distributions. To validate our methods we apply them in a range of experiments, finding that they intuitively capture structural perturbations in synthetic networks and identify meaningful shared structure across layers in multilayer networks.
II. MEASURES

To measure the similarity between two graphs G1 and G2, we can appeal to information theory and compute the amount of information shared between G1 and G2, also known as their mutual information [30]. Due to nice properties such as non-negativity, boundedness, and symmetry, mutual information measures and their normalized variants have been widely used across unsupervised learning tasks within and outside of network science [22, 23, 31].

A key component in the development of a mutual information measure is the specification of an encoding scheme, which specifies how one will represent data configurations in a codebook that is used to communicate the data to a receiver. Encoding schemes that focus on different structural properties in a graph will naturally tend to result in different estimates of the information shared between the graphs, thus different values of the mutual information. Here we describe three encoding schemes that aim to highlight shared structure among graphs at different scales, and discuss how to use them to form three mutual information measures that capture different aspects of similarity among networks.

A. Network mutual information

Let G1 and G2 be graphs on the same set of N labelled nodes. We will focus our attention on the case where these graphs are undirected with no self- or multi-edges, but discuss later on how our measures can be straightforwardly extended when we relax these assumptions. For convenience we will represent G1 and G2 with sets of undirected, unweighted edges of sizes E1 and E2 (the number of unique edges in G1 and G2) respectively. In principle, our measure can handle any two graphs G1 and G2 that have the same number of nodes N—regardless of whether these node labels are aligned—but in the case of unaligned node labels one must perform graph alignment prior to using the algorithm in order to obtain meaningful results. Therefore, our measures will primarily be of interest for comparing networks generated from cross-sectional studies with identical constraints across subjects (e.g. brain networks among a set of patients [32]), longitudinal studies (e.g. multiple observations of a social network among the same set of students [33]), and experimental studies with repeated measurements [34]. We will let 𝒢 be the set of all \binom{N}{2} possible edges on these N nodes, so that G1, G2 ⊆ 𝒢.

Analogous to constructing a mutual information among labellings [20], we will derive our network mutual information measure by first considering the information required to transmit the network G2 to a receiver using a particular encoding scheme. Then we will see by how much we can decrease our information cost if we transmit G1 and exploit the graphs' shared structure to transmit G2. The information savings we incur is the mutual information between G1 and G2 under the specified encoding.

We will assume that the receiver knows the number of nodes N and edges E2 = |G2| in the second graph. (Transmitting this quantity will require comparatively negligible information, so we can ignore it anyway.) There are \binom{N}{2} possible unique undirected edges in 𝒢 that one can construct using N nodes, so there are \binom{\binom{N}{2}}{E_2} possible networks G2 consistent with the constraints known by the receiver, among which we must specify a single one to transmit the graph structure G2. The base-two logarithm of this quantity is therefore the approximate number of bits we will need to encode all these possibilities using binary strings if we choose a simple encoding that does not involve transmitting any intermediate summary information about G2 to the receiver. (We will describe two possibilities for such an encoding later on.) We call this quantity of information the entropy H(G2) of the graph G2 under this simple encoding scheme, and it is given by

    H(G_2) = \log \binom{\binom{N}{2}}{E_2}.    (1)

For simplicity of presentation, we will assume all logarithms are base-two for the remainder of the paper.

We can simplify Eq. (1) into a more recognizable form using the Stirling approximation \log x! \approx x \log x - x/\ln 2, giving

    H(G_2) \approx \binom{N}{2} H_b(p_2),    (2)

where p_2 = E_2 / \binom{N}{2} is the fraction of all possible edges occupied in the graph and

    H_b(p) = -p \log p - (1 - p) \log(1 - p)    (3)

is the binary entropy function. Equation (2) tells us that it takes approximately Hb(p2) bits of information to specify the existence or non-existence of an edge for each of the \binom{N}{2} possible edge slots, given that the receiver knows there will be E2 total edges.
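To make the encoding concrete, the following minimal sketch (our own illustration, not code from the paper) evaluates Eq. (1) exactly via log-factorials and compares it against the binary entropy approximation of Eq. (2); all function names here are our own.

```python
from math import lgamma, log

def log2_binom(n, k):
    # log2 of the binomial coefficient C(n, k), computed stably via log-gamma
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

def binary_entropy(p):
    # Hb(p) of Eq. (3), in bits
    return 0.0 if p in (0.0, 1.0) else -p * log(p, 2) - (1 - p) * log(1 - p, 2)

N, E2 = 1000, 5000
slots = N * (N - 1) // 2                      # C(N, 2) possible edge slots
exact = log2_binom(slots, E2)                 # H(G2), Eq. (1)
approx = slots * binary_entropy(E2 / slots)   # Stirling approximation, Eq. (2)
print(exact, approx)  # the two values agree closely for large sparse graphs
```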
Now, consider the case where the receiver already knows G1 prior to us transmitting G2. In this case, the information required to transmit G2 should be reduced, since we can exploit the information shared between G1 and G2 to reduce the size of our encoding. More specifically, this reduction of information is possible when the receiver knows both G1 and how G2 differs from G1, since this additional constraint further reduces the number of possibilities for G2 that need to be encoded. In the standard formulation of mutual information between labellings, the discrepancies between the labellings are encoded in a contingency table, which counts the number of instances in which an object is classified into one cluster under the first labelling and another cluster in the second labelling. Typically one ignores the amount of information required to transmit the contingency table between two labellings to the receiver, as its information content vanishes in the limit of large labellings [20].

We can consider the labelling associated with the edge set Gi as a length-\binom{N}{2} binary vector whose indices represent all possible edges in 𝒢, and that contains a 1 for all entries corresponding to the edges that are present in Gi. (Equivalently, one can just consider flattening the upper triangle of the adjacency matrix representation of Gi into a vector.) In this case, the contingency table comparing the labels in G1 and G2 takes a particularly simple form. For any possible edge (i, j) with 1 ≤ i, j ≤ N, we can either have that:

1. (i, j) ∈ G1 and (i, j) ∈ G2. We call this subset of edges true positives to indicate that they are contained in both sets G1 and G2. The set of true positives is given by G1 ∩ G2, whose size we denote with E12 = |G1 ∩ G2|.

2. (i, j) ∈ G1 and (i, j) ∉ G2. We call this subset of edges false negatives to indicate that they are contained in the set G1 but not the set G2. The set of false negatives is given by G1 \ G2, whose size is E1 − E12.

3. (i, j) ∉ G1 and (i, j) ∈ G2. We call this subset of edges false positives to indicate that they are not contained in the set G1 but are contained in the set G2. The set of false positives is given by G2 \ G1, whose size is E2 − E12.

4. (i, j) ∉ G1 and (i, j) ∉ G2. We call this subset of edges true negatives to indicate that they are not contained in either of G1 or G2. The set of true negatives is given by 𝒢 − (G1 ∪ G2), whose size is \binom{N}{2} − E1 − E2 + E12.

These true/false positives/negatives are the four values of the contingency table between the binary labellings associated with G1 and G2.

Given that the receiver knows G1, E2, and the contingency table—or, equivalently, the number of true positives E12 since this is the only linearly independent unknown—we can compute the logarithm of the number of possible configurations that G2 can take under these constraints as the conditional entropy H(G2|G1). This is given by

    H(G_2|G_1) = \log \left[ \binom{E_1}{E_{12}} \binom{\binom{N}{2} - E_1}{E_2 - E_{12}} \right].    (4)

The first term in the product counts the number of ways to choose the E12 true positives from the set of edges in G1, and the second term counts the number of ways to choose the E2 − E12 false positives from the remaining edges not contained in G1. The two actions together, which fully specify G2, can be taken independently and so the total number of combinations available is given by the product of the two binomial coefficients.

We can now quantify the amount of information shared between the graphs G1 and G2 as the reduction in the entropy of G2 that results from knowing G1 and the contingency table. We call this the mutual information between the graphs, and it can be represented mathematically as

    MI(G_1, G_2) = H(G_2) - H(G_2|G_1).    (5)

Grouping terms and applying Stirling's approximation, we arrive at a simple, manifestly symmetric form for the graph mutual information:

    MI(G_1, G_2) \approx \binom{N}{2} \left[ H_b(p_1) + H_b(p_2) - H_s(P_{12}) \right],    (6)

where

    P_{12} = \{ p_{12},\; p_1 - p_{12},\; p_2 - p_{12},\; 1 - p_1 - p_2 + p_{12} \}    (7)

is the (normalized) contingency table between the labellings corresponding to G1 and G2—encoding the fraction of all \binom{N}{2} possible edge slots that are true positives, false negatives, false positives, and true negatives respectively—and Hs is the Shannon entropy

    H_s(p) = -\sum_i p_i \log p_i.    (8)

Typically, mutual information measures are written in terms of bits per symbol rather than total bits (the units of Eq. (6)). Dividing out the prefactor of \binom{N}{2}, we get the more familiar form of the mutual information

    I(G_1; G_2) = H_b(p_1) + H_b(p_2) - H_s(P_{12}),    (9)

which just corresponds to the Shannon mutual information [30] between the binary vectors encoding the edge positions in G1 and G2.

The graph mutual information of Eq. (9) takes the form of the standard Shannon mutual information, and is therefore bounded in the interval 0 ≤ I(G1; G2) ≤ min{Hb(p1), Hb(p2)}, allowing for the normalized expression

    NMI(G_1; G_2) = 2 \times \frac{I(G_1; G_2)}{H_b(p_1) + H_b(p_2)}.    (10)

Equation (10) maps the graph mutual information to the interval [0, 1] to allow for easier interpretation across systems of different sizes. (There are many other options for normalizing the mutual information, which have their own benefits and drawbacks [22].)
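As a concrete illustration of Eqs. (7)-(10), the sketch below (a minimal implementation of our own, assuming each graph is given as a set of canonically ordered node pairs on N aligned, labelled nodes) computes the graph NMI directly from the three edge-set sizes.

```python
from math import log2

def shannon_entropy(probs):
    # Hs of Eq. (8), in bits; empty contingency-table cells contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)

def graph_nmi(G1, G2, N):
    """NMI of Eq. (10); G1, G2 are sets of edges (i, j) with i < j."""
    slots = N * (N - 1) // 2                   # C(N, 2) possible edge slots
    p1, p2 = len(G1) / slots, len(G2) / slots
    p12 = len(G1 & G2) / slots                 # true-positive fraction
    P12 = [p12, p1 - p12, p2 - p12, 1 - p1 - p2 + p12]   # Eq. (7)
    Hb1 = shannon_entropy([p1, 1 - p1])
    Hb2 = shannon_entropy([p2, 1 - p2])
    I = Hb1 + Hb2 - shannon_entropy(P12)       # Eq. (9)
    return 2 * I / (Hb1 + Hb2)                 # Eq. (10)

# Toy example: two graphs on 4 nodes sharing two of their three edges
print(graph_nmi({(0, 1), (1, 2), (2, 3)}, {(0, 1), (1, 2), (0, 3)}, N=4))
```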
We note that, as with any mutual information measure, Eq. (9) is invariant to label permutations in the binary representation of the graphs: I(G1; G2) = I(Ḡ1; G2) = I(G1; Ḡ2) = I(Ḡ1; Ḡ2), where Ḡi = 𝒢 − Gi is the graph complement of Gi. This follows intuitively, since specifying the positions of edges is equivalent to specifying the positions of non-edges in terms of information content. In practice, however, this symmetry will rarely ever have an effect on results, since we are nearly always in the sparse regime where p1, p2 < 0.5, so the mutual information is monotonic in the overlap p12 for fixed p1, p2.

Computing the graph mutual information measure in Eq. (9) is very fast in practice, only requiring us to find the sizes of the edge sets G1, G2, and G1 ∩ G2. The total complexity of these calculations is equal to the complexity of constructing the sets themselves, so poses no additional computational burden.

We can adapt the mutual information of Eq. (9) to networks with directed and/or self-edges by simply changing the constant \binom{N}{2} for the size of 𝒢 from which we must pick the edges in our graph. For directed graphs with self-edges, directed graphs without self-edges, and undirected graphs with self-edges, we can set this constant to be N^2, N(N − 1), and \binom{N}{2} + N respectively. One can also adapt our network mutual information framework to multigraphs by using the mesoscale mutual information formulation we present in Sec. II C and putting each node in their own group in the input partition.

One can also derive from Eq. (9) a variation of information measure [35] between graphs, thus

    VI(G_1; G_2) = H_b(p_1) + H_b(p_2) - 2 \times I(G_1; G_2).    (11)

Equation (11) has an advantage over the graph mutual information for various tasks such as network embedding [36] due to its pseudometric property, but we will not explore its applications here.

B. Degree-corrected network mutual information

The graph mutual information measure presented in Sec. II A is based on an encoding scheme in which the receiver's knowledge of the overlap ("true positive" count) E12 between the graphs G1 and G2 constrains the number of possibilities for G2 once G1 is known. This allows for a reduction in the information required to specify G2 after G1 is sent, giving a mutual information measure. However, one can also consider modifying the encoding process to exploit other shared structure between the networks prior to the transmission of G2, which results in a mutual information that captures a different notion of structural similarity between the graphs.

One such modification is "degree-correction", analogous to the degree correction of the stochastic block model (SBM) for community detection [37]. In this case, instead of specifying the global edge overlap E12 between G1 and G2, we can specify how much overlap there is between the individual neighborhoods ∂i^(1) and ∂i^(2) of each node i in G1 and G2, respectively. In order to do this, a two-step process is required: firstly, we must list the degree sequence ki = {ki(1), ..., ki(N)} of each graph i. Secondly, we must specify how the total edge overlap E12 is distributed among the N node neighborhoods. These two steps have information content that scales like O(E1 + E2), which for sparse graphs can be neglected when we normalize by \binom{N}{2} to get the bits per symbol value of the mutual information. We therefore ignore this intermediate information cost, which we will see leads to a nice clean expression for the degree-corrected graph mutual information.

Given knowledge of the degrees k2 and how the E12 overlapping edges are distributed across the node neighborhoods (i.e. the rows of G2's adjacency matrix representation), we can compute the conditional entropy between G1 and G2 as

    H_{deg}(G_2|G_1) = \sum_{i=1}^{N} \log \left[ \binom{k_1(i)}{k_{12}(i)} \binom{N - k_1(i)}{k_2(i) - k_{12}(i)} \right],    (12)

where k12(i) = |∂i^(1) ∩ ∂i^(2)| is the number of true positives (e.g. overlapping edges with G1) attached to node i. The first binomial coefficient in the summand counts the number of possible configurations for the overlapping edges (i.e. those shared with G1) attached to node i in G2. Meanwhile, the second term counts the number of possible configurations for the non-overlapping edges (i.e. those not shared with G1) attached to node i in G2.

The analogous entropy expression for G2 if the receiver does not have knowledge of any of the shared structure with G1—but does know the degrees k2, which are independent of G1—is given by

    H_{deg}(G_2) = \sum_{i=1}^{N} \log \binom{N}{k_2(i)}.    (13)

This is just the amount of information required to specify the nodes attached to i given its degree. Technically both Eqs. (12) and (13) are only upper bounds on the conditional entropy and entropy for undirected graphs under this encoding scheme, since only one edge direction must be known to specify each edge. They are, however, exact for directed graphs in which the degree sequences ki can be chosen to be either the in- or out-degrees.

By comparing Eqs. (13) and (12) with Eqs. (1) and (4), we can immediately identify the mutual information value for this degree-corrected scheme as the average of node-level mutual information values, thus

    I_{deg}(G_1; G_2) = \frac{1}{N} \sum_{i=1}^{N} \left[ H_b(p_1(i)) + H_b(p_2(i)) - H_s(P_{12}(i)) \right],    (14)

where p1(i) = k1(i)/(N − 1), p2(i) = k2(i)/(N − 1), and

    P_{12}(i) = \{ p_{12}(i),\; p_1(i) - p_{12}(i),\; p_2(i) - p_{12}(i),\; 1 - p_1(i) - p_2(i) + p_{12}(i) \},    (15)
with p12(i) = k12(i)/(N − 1). We normalize by N − 1 for graphs without self-edges since a node can connect to at most N − 1 nodes excluding itself. For networks with self-edges, one can change the normalization to N to account for the possibility of a node that is completely connected to all other nodes including itself.

A degree-corrected NMI measure can be constructed for Eq. (14) by noting that (1/2)[Hb(p1(i)) + Hb(p2(i))] is an upper bound for each summand Hb(p1(i)) + Hb(p2(i)) − Hs(P12(i)) in Eq. (14), each of which is a mutual information measure at the node neighborhood level. We can therefore replace the summand in Eq. (14) with this upper bound to obtain an upper bound on the total degree-corrected mutual information. The resulting normalized mutual information measure is given by

    DC\text{-}NMI(G_1; G_2) = 2 \times \frac{I_{deg}(G_1; G_2)}{\frac{1}{N} \sum_i \left[ H_b(p_1(i)) + H_b(p_2(i)) \right]},    (16)

which will be bounded to [0, 1]. As before, we can also construct a degree-corrected variation of information measure for networks, thus

    VI_{deg}(G_1; G_2) = \frac{1}{N} \sum_{i=1}^{N} \left[ H_b(p_1(i)) + H_b(p_2(i)) \right] - 2 \times I_{deg}(G_1; G_2).    (17)

One can show that the above expression is also a pseudometric like the usual variation of information.

Similar to the non-degree-corrected mutual information measure of Eq. (10), the degree-corrected mutual information measures presented here can be computed with a time complexity that is linear in the size of the edge sets G1 and G2 being compared. Once the set intersection G1 ∩ G2 has been computed, we just need to iterate through these shared edges again and increment k12(i) whenever node i is present in the edge.
In contrast with the measures of Sec. II A that are unaffected by where edges differ between the two graphs, the degree-corrected measures presented here will ascribe higher similarity to graphs whose edge discrepancies are concentrated on relatively few nodes. In this sense, the DC-NMI of Eq. (16) focuses on node-level structural similarity, while the NMI of Eq. (10) focuses on edge-level similarity. We will see from experiments in Sec. III that this distinction becomes important for graphs with heterogeneous degrees, since differences in network structure can naturally become concentrated on high degree nodes, particularly in the case of random edge rewiring.

C. Mesoscale network mutual information

While the measures of sections II A and II B will capture the small-scale information shared between two graphs G1 and G2 (e.g. the overlap of specific edges and node neighborhoods), they will fail to capture mesoscale structure such as communities or core-periphery structure unless the exact positions of the edges within these larger-scale structures happen to overlap. Indeed, if G1 and G2 are both sparse networks generated from an SBM [38], there is a very low probability that they share a substantial fraction of edges despite having very similar mesoscale divisions into communities, since each community is itself a sparse random graph whose edge positions are uncorrelated. In this case, the measures we have discussed will likely ascribe little similarity to the two graphs despite their identical mesoscale community structure. To obtain meaningful graph similarity results at different scales of interest—particularly at the mesoscale where community structure is present—it is therefore important to consider network mutual information formulations that take larger-scale structure into account while ignoring small-scale details. Here we present one possible formulation of such a mutual information measure between graphs, which we call the mesoscale graph mutual information.

Consider the same problem setting as in Sec. II A, where we have undirected, unweighted, node-aligned graphs G1 and G2 with N nodes and E1, E2 edges, respectively. Assume again that N, E1, E2 are already known by the receiver. This time we will always allow G1 and G2 to potentially have self- or multi-edges, since this will allow for easier computations, but in principle the calculation can be extended to graphs without self- or multi-edges with more detailed combinatorics. If G1 and G2 share mesoscale structure (e.g. in the form of groups of highly connected nodes), but not necessarily microscale structure (in the form of overlapping edges), then the encoding schemes of sections II A and II B will be very inefficient, since there is little shared information at the microscale we can exploit to reduce the entropy of our transmission. However, if we instead only aim for a mesoscale description of each network—such as the edge counts within and between communities in a given partition of the graphs—then we can formulate a mutual information measure that captures the shared mesoscale information between G1 and G2 while ignoring their microscale differences.

Here we consider partitioning the nodes in both graphs G1 and G2 with the same (non-overlapping) partition b with B groups. We will denote with n_r = \sum_{i=1}^{N} \delta_{b_i, r} the number of nodes with community label r in the partition b. The partition b can be thought of as a specific coarse-graining of the networks, and tuning B allows us to interpolate between the microscale (B ∼ O(N)) and the macroscale (B ∼ O(1)) to capture similarities at the scale of interest. A reasonable choice for b once the scale B is chosen is a community partition of one of the networks into B groups, since this represents a meaningful coarse-graining of the network at this scale. However, in principle any choice for b is possible, and we will show in Sec. III that often one can choose meaningful coarse-grainings b based on node metadata relevant to a particular application. We will let b be known to the receiver—its corresponding entropy term would vanish in our final mutual information expression anyway.
Under the node partition b, each network Gi can be described by a coarse-grained representation G̃i^(b), defined as the multiset of Ei elements where element (r, s) with r ≤ s has a multiplicity mi(r, s) equal to the number of edges in Gi containing one node in group r and one node in group s. (This is equivalent to the mixing matrix representation of community-community ties used in the microcanonical SBM [39].) Defining the scale of the full network Gi to be order O(1), the representation G̃i^(b) captures the aggregate structure of Gi at a scale of order O(B^{-1}). In the extreme case with B = N (the partition b puts each node into its own group), we have G̃i^(b) = Gi and we are capturing network similarity at the scale O(N^{-1}). The measure we present in the case B = N can thus be used as a mutual information between multigraphs G1 and G2.

The mesoscale mutual information measure I_meso^(b)(G1; G2) that we will derive aims to capture the amount of shared information at the scale O(B^{-1}) between the graphs G1 and G2 by computing the mutual information between the multisets G̃1^(b) and G̃2^(b) for a chosen partition b. (As discussed, b can be derived by using network structure itself or by using external metadata.) Instead of formulating the mutual information from the perspective of conditional entropy, as is done in Sec. II A, we will motivate the mesoscale mutual information from a joint transmission process in which we transmit both G̃1^(b) and G̃2^(b)—first individually, then together using their shared information. Formulations of mutual information measures using conditional and joint entropies are equivalent, but in our case the latter allows for a more straightforward exposition.

We can first consider transmitting each individual multiset, G̃i^(b), separately to a receiver. Using the same calculation procedure as with the entropy measures above, we can find these individual entropies to be

    H(\tilde{G}_i^{(b)}) = \log \left(\!\!\binom{\binom{B}{2} + B}{E_i}\!\!\right),    (18)

where

    \left(\!\!\binom{n}{k}\!\!\right) = \binom{n + k - 1}{k}    (19)

is the multiset coefficient. In Eq. (18), the multiset coefficient counts the number of multisets G̃i^(b) with Ei elements that one can construct from a set of objects with cardinality \binom{B}{2} + B, that is, the number of independent combinations (r, s) with 1 ≤ r ≤ s ≤ B. The logarithm of this quantity is thus the entropy of our encoding for specifying G̃i^(b) given the constraints known by the receiver, and transmitting the two multisets individually therefore requires H(G̃1^(b)) + H(G̃2^(b)) bits of information.

An important property of the multiset coefficient that we will use when constructing our measure is that it is subadditive when transformed by a logarithm to get an entropy. In other words, for any k, l we have

    \log \left(\!\!\binom{n}{k}\!\!\right) + \log \left(\!\!\binom{n}{l}\!\!\right) \geq \log \left(\!\!\binom{n}{k+l}\!\!\right).    (20)

To prove this, define X^(n,k) as the set of non-negative integer vectors of length n whose values sum to k. The multiset coefficient counts the number of unique vectors in X^(n,k), i.e. |X^(n,k)| equals the multiset coefficient for (n, k). We can construct a map f : X^(n,k) × X^(n,l) → X^(n,k+l) given by f(x, y) = x + y. The map f is surjective, since any z ∈ X^(n,k+l) has at least one pair (x, y) ∈ X^(n,k) × X^(n,l) such that f(x, y) = z. Therefore, the product of the multiset coefficients for (n, k) and (n, l) equals |X^(n,k) × X^(n,l)| ≥ |X^(n,k+l)|, which is the multiset coefficient for (n, k + l), and taking the logarithm of both sides gives the result in Eq. (20).
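The subadditivity in Eq. (20) is easy to spot-check numerically; the snippet below (our own sanity check, not from the paper) verifies the exponentiated form of the inequality for a few values of n, k, and l.

```python
from math import comb

def multiset_coef(n, k):
    # multiset coefficient of Eq. (19): C(n + k - 1, k)
    return comb(n + k - 1, k)

# Eq. (20) in exponentiated form: ((n,k)) * ((n,l)) >= ((n,k+l))
for n in (3, 10, 55):
    for k in (0, 2, 7):
        for l in (1, 5, 20):
            assert multiset_coef(n, k) * multiset_coef(n, l) >= multiset_coef(n, k + l)
```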
Now we can consider transmitting G̃1^(b) and G̃2^(b) together, exploiting the shared information between them to reduce the total entropy of the transmission. Using the generalization of the set intersection to multisets [40], we can define the true positives E12^(b) in this context as

    E_{12}^{(b)} = |\tilde{G}_1^{(b)} \cap \tilde{G}_2^{(b)}| = \sum_{r \leq s} \min\left( m_1(r, s),\, m_2(r, s) \right),    (21)

where the index r iterates over 1, ..., B, and the index s iterates over r, ..., B. Equation (21) tells us the total number of common group pairs (r, s) (allowing duplicates) between the multisets G̃1^(b) and G̃2^(b). The information required to transmit E12^(b) is of O(log(E1 + E2)) and can be ignored. With E12^(b) known, there are

    \left(\!\!\binom{\binom{B}{2} + B}{E_{12}^{(b)}}\!\!\right)

ways to distribute the overlapping pairs among G̃1^(b) and G̃2^(b). Once the overlapping pairs are known, we must specify Ei − E12^(b) remaining pairs for each of the multisets G̃i^(b). Putting this all together gives a total joint information of

    H' = \log \left[ \left(\!\!\binom{\binom{B}{2} + B}{E_{12}^{(b)}}\!\!\right) \left(\!\!\binom{\binom{B}{2} + B}{E_1 - E_{12}^{(b)}}\!\!\right) \left(\!\!\binom{\binom{B}{2} + B}{E_2 - E_{12}^{(b)}}\!\!\right) \right].    (22)

In principle, with the joint entropy of Eq. (22) one can construct a mesoscale mutual information measure of H(G̃1^(b)) + H(G̃2^(b)) − H'. While this has some desirable properties, it is not strictly increasing with the overlap E12^(b), due to low overlap values placing substantial constraints on the multisets G̃1^(b) and G̃2^(b) in some cases, providing an efficient reduction in the entropy and a high mutual information. Intuitively, one would expect that a higher overlap E12^(b) among the multisets G̃1^(b) and G̃2^(b) should result in a higher similarity value, and according to this criterion the mutual information H(G̃1^(b)) + H(G̃2^(b)) − H' is unsuitable as a candidate for a similarity measure. Although the graph mutual information of Eq. (9) technically also has a similar limitation due to invariance under graph complements, as discussed in Sec. II A this is irrelevant in the sparse regime where Ei < \binom{N}{2}/2.
Instead of Eq. (22), it turns out that a lower bound on the joint entropy can be used to produce a useful mutual information similarity measure that does exhibit monotonicity with the overlap E12^(b) as desired. Using the subadditivity of the log-multiset coefficients in Eq. (20), we have the bound

    H' \geq \log \left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2 - E_{12}^{(b)}}\!\!\right).    (23)

Using this bound as the joint entropy

    H(\tilde{G}_1^{(b)}, \tilde{G}_2^{(b)}) = \log \left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2 - E_{12}^{(b)}}\!\!\right),    (24)

we can now compute the mesoscale mutual information of graphs G1 and G2 with respect to partition b as

    I_{meso}^{(b)}(G_1; G_2) = H(\tilde{G}_1^{(b)}) + H(\tilde{G}_2^{(b)}) - H(\tilde{G}_1^{(b)}, \tilde{G}_2^{(b)}).    (25)

For fixed E1, E2, we have that for any E12^(b) ∈ [1, min(E1, E2)] the mesoscale mutual information of Eq. (25) satisfies

    I_{meso}^{(b)}(E_{12}^{(b)}) - I_{meso}^{(b)}(E_{12}^{(b)} - 1) = \log \frac{\left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2 - E_{12}^{(b)} + 1}\!\!\right)}{\left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2 - E_{12}^{(b)}}\!\!\right)} \geq 0,    (26)

since the multiset coefficient of Eq. (19) is a strictly increasing function of k. Here we considered the mutual information as a function of only the overlap E12^(b), since this is the only remaining free parameter when E1, E2 are fixed. Equation (26) implies that, when all else is constant, a greater overlap E12^(b) among G̃1^(b) and G̃2^(b) gives a greater mesoscale mutual information I_meso^(b)(G1; G2), which is consistent with what we expect from a similarity measure.

From the previous argument, we can also see that

    I_{meso}^{(b)} \geq I_{meso}^{(b)}(E_{12}^{(b)} = 0) = \log \frac{\left(\!\!\binom{\binom{B}{2} + B}{E_1}\!\!\right) \left(\!\!\binom{\binom{B}{2} + B}{E_2}\!\!\right)}{\left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2}\!\!\right)} \geq 0,    (27)

or in other words, the mesoscale mutual information is bounded below by 0. This follows from the subadditivity of the logarithm of the multiset coefficient.

We can also use the inequality of Eq. (26) to establish an upper bound for the mesoscale mutual information. Without loss of generality, label the graphs Gi so that E1 ≤ E2. We then have that the maximum overlap is E12^(b) = E1, and so

    I_{meso}^{(b)} \leq I_{meso}^{(b)}(E_{12}^{(b)} = E_1) = \log \frac{\left(\!\!\binom{\binom{B}{2} + B}{E_1}\!\!\right) \left(\!\!\binom{\binom{B}{2} + B}{E_2}\!\!\right)}{\left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2 - E_1}\!\!\right)} = H(\tilde{G}_1^{(b)}) \leq \frac{H(\tilde{G}_1^{(b)}) + H(\tilde{G}_2^{(b)})}{2},    (28)

where the last inequality uses the fact that the multiset coefficient is an increasing function of k and that E1 ≤ E2.

Using the bounds of Eqs. (27) and (28), we can construct a normalized mesoscale mutual information NMI_meso^(b) as follows:

    NMI_{meso}^{(b)}(G_1; G_2) = 2 \times \frac{I_{meso}^{(b)}(G_1; G_2)}{H(\tilde{G}_1^{(b)}) + H(\tilde{G}_2^{(b)})},    (29)

which mirrors the form of the graph NMI of Eq. (10). In practice, it is also useful to consider an alternative normalization with a tighter lower bound, thus

    MesoNMI^{(b)}(G_1; G_2) = \frac{I_{meso}^{(b)}(G_1; G_2) - I_{meso}^{(b)}(E_{12}^{(b)} = 0)}{\frac{H(\tilde{G}_1^{(b)}) + H(\tilde{G}_2^{(b)})}{2} - I_{meso}^{(b)}(E_{12}^{(b)} = 0)},    (30)

where

    I_{meso}^{(b)}(E_{12}^{(b)} = 0) = \log \frac{\left(\!\!\binom{\binom{B}{2} + B}{E_1}\!\!\right) \left(\!\!\binom{\binom{B}{2} + B}{E_2}\!\!\right)}{\left(\!\!\binom{\binom{B}{2} + B}{E_1 + E_2}\!\!\right)}.    (31)

Both Eqs. (29) and (30) will fall in the range [0, 1], with NMI_meso^(b)(G1; G2) = 1 if and only if G̃1^(b) = G̃2^(b). In the experiments of Sec. III, we will use the mesoscale NMI measure defined in Eq. (30).

The mesoscale mutual information above requires the user to choose a common partition b of the networks G1 and G2, which effectively sets the scale of interest for comparing the similarity of the two graphs. One can choose the partition b in a number of ways, a meaningful choice being a community partition of either G1 or G2 that decomposes the graph into densely connected groups of nodes with sparser connections between groups. In principle, one can also maximize or minimize I_meso^(b)(G1; G2) over all possible partitions b with B groups, which can identify the divisions for which the networks are most/least similar at the chosen scale. Similarly, one could sample over partitions b at a given scale B and examine the full distribution of values I_meso^(b)(G1; G2) to get a multifaceted assessment of the similarity between G1 and G2 at the chosen scale.
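Putting Eqs. (18), (21), (24), and (31) together, the MesoNMI of Eq. (30) can be sketched as below (our own minimal implementation, assuming edges are given as node pairs and the partition b maps each node to a group label in 0, ..., B − 1):

```python
from collections import Counter
from math import lgamma, log

def log2_multiset(n, k):
    # log2 of the multiset coefficient ((n, k)) = C(n + k - 1, k), Eq. (19)
    return (lgamma(n + k) - lgamma(k + 1) - lgamma(n)) / log(2)

def meso_nmi(G1, G2, b, B):
    """MesoNMI of Eq. (30) with respect to the node partition b."""
    n_pairs = B * (B - 1) // 2 + B   # number of group pairs (r, s) with r <= s
    # coarse-grained multisets of mixing counts m_i(r, s), as in Eq. (18)
    m1 = Counter(tuple(sorted((b[u], b[v]))) for u, v in G1)
    m2 = Counter(tuple(sorted((b[u], b[v]))) for u, v in G2)
    E1, E2 = sum(m1.values()), sum(m2.values())
    E12 = sum((m1 & m2).values())    # multiset intersection, Eq. (21)
    H1 = log2_multiset(n_pairs, E1)
    H2 = log2_multiset(n_pairs, E2)
    I = H1 + H2 - log2_multiset(n_pairs, E1 + E2 - E12)  # Eqs. (24)-(25)
    I0 = H1 + H2 - log2_multiset(n_pairs, E1 + E2)       # Eq. (31)
    return (I - I0) / ((H1 + H2) / 2 - I0)               # Eq. (30)
```

When G1 = G2 the numerator and denominator coincide and the measure returns 1, matching the stated property of Eq. (30).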
[Figure 2: graph similarity versus attack fraction ϵ for Erdős-Rényi (ER, left panels) and Barabási-Albert (BA, right panels) networks; (a) node attacks, (b) edge attacks. Curves show NMI, DC-NMI, Jaccard, and MesoNMI with B ∈ {1, 10, 100}.]

Figure 2. Graph similarity measures for networks under node and edge attacks. (a) Graph similarity as a function of the fraction of nodes attacked ϵ for random networks, where nodes are attacked in decreasing order of degree. Graph similarity is measured with the NMI, DC-NMI, and MesoNMI for B ∈ {1, 10, 100} to capture multiple scales of similarity. The Jaccard index |G1 ∩ G2|/|G1 ∪ G2| is included for comparison. The MesoNMI partitions b are computed with a standard stochastic block model (SBM) with fixed group sizes B ∈ {1, 10, 100} on the initial (un-attacked) graph. Simulations are averaged over 10 realizations of the initial graph from the Erdős-Rényi model (ER, top left panel) and Barabási-Albert model (BA, top right panel) with N = 1000 nodes and average degree ⟨k⟩ = 10 (error bars indicate three standard errors and are vanishingly small). (b) Graph similarity as a function of the fraction of edges randomly rewired, for the same synthetic networks. Subtle differences in the decay rates of different similarity measures are reflective of intuitive properties of these measures, as discussed in Sec. III.

Finally, as we do in the experiments in Sec. III, one can use node metadata to define the partition b by grouping nodes with similar characteristics.

Figure 1 shows the application of all measures presented in this section to a pair of example networks.

III. RESULTS

In this section we apply our graph similarity measures in a variety of experiments with synthetic and empirical networks. First, we illustrate the differences between similarity measures by attacking synthetic networks at different scales. We find that the different encoding schemes indeed produce graph similarity measures that highlight shared structure at different scales and are affected differently by these structural perturbations. Then, we apply our measures to a case study with an empirical multilayer network of global trade patterns, finding that our measures capture meaningful shared structure among the network layers representing the movement of different goods.

A. Similarity scores among perturbations of synthetic networks

We examine the extent to which each of our similarity measures captures different structural deviations by perturbing synthetic reference networks with different noise (or "attack") strategies. In each simulation, we first generate a reference network from a random graph model—here we used random networks generated from the Erdős-Rényi (ER) model [41, 42], the Barabási-Albert (BA) model [43], and the stochastic block model (SBM) with equally sized groups and mixing levels that only depend on whether the nodes are in the same group or different groups [38, 44]. (This is also known as the planted partition model.) These different reference graphs allow us to examine the effects of global network structure and degree heterogeneity on the similarity scores. We then perturb the reference graph by attacking it with one of two types of moves: (1) node attacks (Fig. 2a), which take all the edges incident to a node and rewire them uniformly at random to neighbors not currently connected to the attacked node; and (2) edge attacks (Fig. 2b), which take an edge (i, j) and place it between a different pair of nodes (i′, j′), chosen uniformly at random from all nodes except i, j. We run our simulations by attacking nodes in decreasing order of degree and edges in a random order, measuring the extent to which the original network has been perturbed with an attack fraction ϵ indicating the fraction of nodes or edges that have been attacked.
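A schematic of the two perturbation moves is sketched below, assuming networkx for graph handling (the helper names and exact bookkeeping are our own; the full experiments also sweep the attack fraction ϵ and average over realizations):

```python
import random
import networkx as nx

def node_attack(G, frac, seed=0):
    """Rewire all edges of the top-degree nodes, as in the node attacks of Fig. 2a."""
    rng, H = random.Random(seed), G.copy()
    targets = sorted(H.nodes, key=H.degree, reverse=True)
    for i in targets[: int(frac * H.number_of_nodes())]:
        for j in list(H.neighbors(i)):
            H.remove_edge(i, j)
            # reconnect i to a uniformly random node it is not yet adjacent to
            H.add_edge(i, rng.choice([u for u in H if u != i and not H.has_edge(i, u)]))
    return H

def edge_attack(G, frac, seed=0):
    """Move a fraction of edges to random new positions, as in Fig. 2b."""
    rng, H = random.Random(seed), G.copy()
    for i, j in rng.sample(list(H.edges), int(frac * H.number_of_edges())):
        H.remove_edge(i, j)
        u, v = rng.sample([n for n in H if n not in (i, j)], 2)
        H.add_edge(u, v)  # in a simple graph this may coincide with an existing edge
    return H

G0 = nx.erdos_renyi_graph(1000, 10 / 999, seed=0)  # ER reference with <k> ~ 10
```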
Figures 2a and 2b show graph similarity values for a number of measures versus the fraction ϵ of node and edge attacks, respectively. We include the Jaccard similarity |G1 ∩ G2|/|G1 ∪ G2| for reference. In these experiments, the partitions b required to compute the MesoNMI measures from Eq. (30) are computed by fitting an SBM to the original (unperturbed) network with the indicated number of groups B. Each curve is the average over 10 simulations, each starting from a different initial graph. All networks in the experiment were of size N = 1000 and had an average degree of ⟨k⟩ = 10. Unless stated otherwise, error bars indicate three standard errors. The reference networks were generated from an ER model (left panels) and a BA one (right panels), with the goal of capturing the effects of degree heterogeneity on how each similarity measure is penalized under node and edge attacks.

In all panels, we see steeper decays in the similarity values of the microscale measures—NMI, DC-NMI, and Jaccard index—than the MesoNMI measures (except at B = 1, which is trivially equal to 1 for all ϵ since the number of edges is constant throughout the attack process). We can also see a greater (albeit still modest) differentiation between the microscale measures for the node attacks than the edge attacks. The standard NMI measure and Jaccard index are unchanged between the two attack strategies, since they will be penalized equally no matter how edges are replaced. However, the DC-NMI is less penalized by the node attacks than the NMI or the Jaccard index. This is because the rewiring of a hub node i primarily impacts the DC-NMI (Eq. 16) via the change in this node's new neighborhood overlap p12(i) = 0, while i's old and new neighbors j may see only slight changes to their values p12(j), as these nodes often will have other neighbors that are unchanged after the attack. In contrast, all the edges (i, j) replaced due to the hub attack—which may constitute a substantial fraction of all edges in the network—will penalize equally the total edge overlap |G1 ∩ G2| considered by the NMI and Jaccard measures.

We can also see from Fig. 2a that degree heterogeneity has a substantial effect on the sensitivity of the similarity measures for the node attacks, since attacking nodes in decreasing order of degree results in more edges being rewired at a given attack fraction ϵ for heterogeneous (BA) than homogeneous (ER) degree distributions. However, this effect is not present in Fig. 2b for the edge attacks, due to all edges providing a roughly equal contribution to each similarity measure.

The MesoNMI measures in all panels present fairly uniform patterns, with the decay in similarity becoming more severe as we increase the number of groups B in the partition b of the reference network. This is consistent with the attacks being performed at the microscale (e.g. on nodes and edges) rather than at the mesoscale: under the random rewiring of individual edges, the mixing structure within and between groups of the node partition remains less affected, since these perturbations may produce new edges that run between the same pairs of groups. This effect is more pronounced as we decrease the number of groups B, since with few groups it is more likely that rewiring an edge (i, j) to a new edge (i′, j′) will result in the same pair of groups bi, bj appearing on the ends of the edge. This robustness to microscale attacks also manifests in less sensitivity to the degree heterogeneity of the original graph, as can be seen in the top right panel.

[Figure 3: (a) MesoNMI versus number of groups B (log scale) for attack fractions ϵ ∈ {0, 0.25, 0.5, 0.75, 1}, SBM with µ = 0.5; (b) MesoNMI versus mixing parameter µ for the same attack fractions, with B = 2.]

Figure 3. Mesoscale mutual information between stochastic block model (SBM) networks. Edge attack simulations were performed on networks generated from an SBM with two groups of 500 nodes, average degree of 10, and mixing level µ ∈ [0, 1] fixing the fraction of edges running between nodes of the same group identity. (a) MesoNMI values for different edge attack fractions ϵ (indicated by curves of shades of gray) as a function of the number of groups B of the node partition b used for the MesoNMI calculation, for SBM networks with no mixing preference (µ = 0.5, equivalent to Erdős-Rényi random graphs). The MesoNMI is more sensitive to edge-level attacks as the scale of interest for comparison gets smaller (i.e. the number of groups B gets larger). (b) MesoNMI as a function of the mixing level µ of the initial graph being attacked, with the partition b used for the MesoNMI calculations being fixed as the initial graph's planted partition into B = 2 groups. As the mixing level moves away from µ = 0.5, we see a stronger dependence of the MesoNMI on the attack level ϵ, due to the rewiring of inter-community (µ = 0) or intra-community (µ = 1) edges to produce an equitable mixture of these two edge types in expectation.

NMI (a) DC-NMI (b) (c)


1.0
1 1.0 NMI 0.95 0.72 0.75 0.77 0.91 0.64
Layer

DC-NMI 0.95 0.73 0.76 0.78 0.90 0.65 0.9


0.5

Number of clusters
102
MesoNMI 0.72 0.73 0.92 0.89 0.78 0.72
(B = 2) 0.8
364 0.0 MesoNMI 0.75 0.76 0.92 0.91 0.81 0.71
(B = 6) NMI
1 1.0
MesoNMI 0.7 DC-NMI
0.77 0.78 0.89 0.91 0.84 0.70
(B = 50) 101 MesoNMI (B = 2)
Layer

MesoNMI (B = 6)
0.5 MesoNMI 0.91 0.90 0.78 0.81 0.84 0.64
(B = 194) 0.6 MesoNMI (B = 50)
MesoNMI (B = 194)
JSD 0.64 0.65 0.72 0.71 0.70 0.64 JSD
364 0.0 0.5
1 364 1 364 I MI o(2) 6) 50
) 4) 10−2 10−1 100
Layer Layer NM -N s so( so( so(
19 JSD
DC Me Me Me Linkage distance
Me
MesoNMI(B = 2) JSD

Figure 4. Comparison of graph similarity values among layers of the FAO trade network. (a) Pairwise similarity
matrices among layers of the FAO trade network [7], each layer representing the global trade patterns among countries for a
particular good. The MesoNMI was computed with respect to a partition of the country nodes in each layer according to
a Global North-South dichotomy [45]. The network Jensen Shannon divergence (JSD) measure of [7] is transformed into a
similarity measure using 1 − JSD and included for comparison. All matrices indicate a similar block structure to the layer
similarities, and as the network scale of interest increases (NMI to DC-NMI, to MesoNMI to JSD) we find systematically higher
similarity values, with the MesoNMI having the greatest discriminative power. (b) Rank-biased overlap (RBO) [46] between
the pairwise distances calculated using each pair of similarity measures. For example, the (NMI, DC-NMI) entry of this matrix
is the RBO between the entires of the top two panels in (a). As the scale of interest decreases, we find greater RBO between
the corresponding pairwise distance matrices. (c) Number of clusters versus the corresponding Ward linkage distance for a
hierarchical clustering of the layers [7]. There are discrepancies in the hierarchical cluster structure inferred using the measures,
with measures operating at similar scales having similar linkage patterns.

Figure 3 shows an edge attack experiment, except we use an initial graph drawn from an SBM with two equal groups and a tunable mixing level µ fixing the fraction of edges that run within groups. The value µ = 0.5 corresponds to no mixing preference between the two groups (i.e. the network is an ER random graph), while µ = 0 corresponds to completely disassortative structure in which all edges are between groups, and µ = 1 corresponds to a completely assortative structure in which all edges are among nodes that have the same group affiliation. As before, we fix the number of nodes to N = 1000 and the average degree to ⟨k⟩ = 10. For the MesoNMI measures, we calculate the partitions b in the same way as in Fig. 2, computing the SBM-optimal partition on the reference network with the desired value of B being fixed. Although the reference SBMs actually only have two groups in their planted structure, setting B to different values for the attack experiments allows us to examine the similarity of the perturbed networks with this reference graph at different scales using the MesoNMI measure.

In Fig. 3a we plot the MesoNMI versus the number of groups B for different attack fractions ϵ (shades of gray) for an SBM with no mixing structure (µ = 0.5, i.e. the reference network is an ER random graph). Simulations for the first interval B ∈ [1, 10] were averaged over 100 random initial networks, while for the tail end we used 10 simulations in total. We find (as expected) that the MesoNMI decreases monotonically as a function of ϵ for all values of B, with ϵ = 0 trivially providing no change to the similarity scores. We also find, as in Fig. 2, that for a fixed value of the attack fraction ϵ the MesoNMI values decrease monotonically as a function of B, indicating that the microscale edge attacks are not felt as intensely by the measures that aggregate structural information at larger scales.

In Fig. 3b we plot the MesoNMI versus the mixing parameter µ used to generate the reference SBM graph, also at different attack fractions ϵ. In this case, the partition b used to compute the MesoNMI is the original planted partition of the reference network into two groups (simulations were averaged over 50 random initial networks). We find that, for a given attack fraction ϵ, the sensitivity in the MesoNMI is strongly dependent on the mixing level µ, with values decreasing as we move away from µ = 0.5. This is because edge attacks will affect a highly (dis)assortative partition more than one with little mixing preference, since edges are more likely to be rewired to new group affiliations if they are highly concentrated among nodes with certain pairs of group affiliations. We also observe an interesting phase transition-like behavior in the MesoNMI at roughly µ = 0.25 and µ = 0.75, which may have qualitative similarities with the detectability transition of the SBM [47], in which community structure is suddenly statistically indistinguishable from random edge placement at a certain level of mixing.
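For reference, a two-group planted-partition graph of the kind attacked here can be generated as in the sketch below (the parameter mapping is our own reading of the setup: the mixing level µ is the expected fraction of within-group edges and ⟨k⟩ = 10):

```python
import networkx as nx

def two_group_sbm(n=1000, avg_deg=10, mu=0.5, seed=0):
    # expected within-group degree mu*<k> and between-group degree (1 - mu)*<k>
    p_in = mu * avg_deg / (n / 2 - 1)
    p_out = (1 - mu) * avg_deg / (n / 2)
    return nx.stochastic_block_model([n // 2, n // 2],
                                     [[p_in, p_out], [p_out, p_in]], seed=seed)

G = two_group_sbm(mu=0.25)  # partly disassortative reference graph
```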
12

edge placement at a certain level of mixing. ever, as the network scale of interest increases (NMI to
These experiments altogether verify our intuition DC-NMI, to MesoNMI to JSD) we can observe increas-
about how each measure values certain deviations in net- ing similarity values, reflecting greater overall similarity
work structure, as well as confirm that our mesoscale mu- among the networks when viewing them at larger scales.
tual information measure is truly capturing variation at The MesoNMI measures find a much greater variability
coarser scales than the NMI and DC-NMI measures. in the similarity values across network layers, as indicated
by the range of shades present in the heatmap, while the
NMI, DC-NMI, and JSD measures infer a tighter range
B. FAO trade network case study of similarity scores among the layers.
We then compare these similarity matrices to examine
We applied the NMI (Eq. 10), DC-NMI (Eq. 16), MesoNMI (Eq. 30), and the network JSD [7]—which was converted to a similarity measure 1 − JSD to facilitate a direct comparison with our measures—to compare all pairs of layers in the FAO trade network and perform a hierarchical clustering analysis as in [7]. For the MesoNMI, we used partitions b reflecting meaningful divisions of the countries at multiple scales (see the sketch below): (1) a partition into B = 2 groups representing the Global North and Global South in accordance with the United Nations' Finance Center for South-South Cooperation [45]; (2) a partition into B = 6 groups representing the continents in the dataset; (3) a partition into B = 50 equally sized groups of countries after ordering by GDP—this is to facilitate an analysis at an intermediate network scale; and (4) a partition of the network into B = 194 groups each of size one—this is to compare and contrast the multigraph-based encoding of the MesoNMI with the graph-based encoding of the standard NMI, which can give substantially different results in practice.
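The four partitions can be constructed as in the sketch below, where the lookups north_south, continent, and gdp_of, as well as the set keep from the preprocessing step, are assumed to be available; these names are placeholders for illustration.

    import numpy as np

    countries = sorted(keep)
    N = len(countries)                           # 194 after filtering

    b2 = [north_south[c] for c in countries]     # B = 2: Global North/South
    b6 = [continent[c] for c in countries]       # B = 6: continents

    # B = 50: equally sized groups of countries after ordering by GDP
    gdp_order = np.argsort([gdp_of[c] for c in countries])
    b50 = np.empty(N, dtype=int)
    for g, idx in enumerate(np.array_split(gdp_order, 50)):
        b50[idx] = g                             # one group label per GDP bin

    b194 = list(range(N))                        # B = 194: one group per country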
In Fig. 4 we show the results of these experiments. In particular, panel 4a shows the pairwise similarity matrices for a few of the measures, which convey a similar block structure for the network layer comparisons. However, as the network scale of interest increases (from NMI to DC-NMI, to MesoNMI, to JSD) we observe increasing similarity values, reflecting greater overall similarity among the networks when viewing them at larger scales. The MesoNMI measures find much greater variability in the similarity values across network layers, as indicated by the range of shades present in the heatmap, while the NMI, DC-NMI, and JSD measures infer a tighter range of similarity scores among the layers.

We then compare these similarity matrices to examine how similar the ranks of their entries are across pairs of measures. We use the rank-biased overlap (RBO) [46], a measure indicating the overlap in the rankings of values from two lists. The RBO provides an alternative to the Spearman rank correlation that more heavily weights the contributions from the highest-ranked items in both lists, which in our case allows us to favor pairs of measures that assign high similarity scores to common pairs of layers. In Fig. 4b we plot the results of applying the RBO to the flattened similarity matrices of each pair of measures studied. We observe a very clear trend in which the RBO of a pair of measures is inversely related to the difference in their scales of interest. For example, the NMI and DC-NMI results have a decreasing RBO with the MesoNMI results as we move from B = 194 to B = 50, B = 6, and B = 2, and the similarity among the MesoNMI results exhibits the same trend. Meanwhile, the highest RBO between any of our measures and the JSD is 0.72 (for MesoNMI(B = 2)), which is in turn equal to the lowest of the RBO values among any of our measures (NMI and MesoNMI(B = 2)). This may reflect the fact that the JSD captures both small- and large-scale similarities among the networks, while our measures target particular scales of interest. In Fig. S1a of the Supplementary Information (SI) we repeat the experiment of Fig. 4b using the Spearman rank correlation instead of the RBO, finding the same qualitative results.
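For readers less familiar with the RBO, a minimal sketch of its truncated form (without the extrapolation term of [46]) applied to flattened similarity matrices is given below; the persistence parameter p and the precomputed matrices S_nmi and S_jsd are assumptions for illustration, as these implementation details are not stated here.

    import numpy as np

    def rbo(ranked_a, ranked_b, p=0.9):
        """Truncated rank-biased overlap of two ranked lists [46]."""
        seen_a, seen_b, score = set(), set(), 0.0
        for d in range(1, min(len(ranked_a), len(ranked_b)) + 1):
            seen_a.add(ranked_a[d - 1])
            seen_b.add(ranked_b[d - 1])
            score += p ** (d - 1) * len(seen_a & seen_b) / d
        return (1 - p) * score

    iu = np.triu_indices_from(S_nmi, k=1)        # flatten distinct layer pairs
    rank_nmi = list(np.argsort(-S_nmi[iu]))      # pair indices, most similar first
    rank_jsd = list(np.argsort(-S_jsd[iu]))
    print(rbo(rank_nmi, rank_jsd))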
We then examine the behavior of all measures in a hierarchical clustering context. We consider the rows of the similarity matrices in Fig. 4a as node embeddings that reflect the positions of each layer relative to the other layers. We then cluster the layers hierarchically using the Ward linkage criterion, as done in [7] for the JSD distance measure. In this formulation, the pair of clusters whose layers have the least discrepancy in their similarity patterns with other clusters is iteratively merged until there is only one remaining cluster of layers—the full multilayer network. We show the linkage results from this experiment in Fig. 4c. We can see that, as in Fig. 4b, the measures are ordered roughly by their scale of interest, with measures operating at similar scales having similar linkage patterns. Measures operating at small scales (e.g. NMI and DC-NMI) have a relatively high linkage threshold at which they begin to merge clusters, while measures operating at coarser scales (e.g. JSD and MesoNMI(B = 2)) have a relatively low such threshold. At large numbers of clusters, this pattern reflects the increasing similarity scores seen across scales in Fig. 4a. However, many of the linkage patterns for the coarser measures cross over those for the small-scale measures at roughly 10 clusters. This indicates that these measures will give qualitatively distinct clustering patterns beyond those resulting simply from a systematic difference in the magnitude of their similarity scores.
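A minimal sketch of this clustering step, assuming a precomputed layer-by-layer similarity matrix S whose rows serve as the layer embeddings (the variable name is an assumption), is:

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    # Rows of S are treated as observations; Ward linkage merges the layers
    # whose embeddings are closest, producing linkage patterns as in Fig. 4c.
    Z = linkage(S, method="ward")
    dendrogram(Z)
    plt.xlabel("layer")
    plt.ylabel("linkage distance")
    plt.show()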
To further examine the discrepancies in the clustering structure induced by each similarity measure, we test the extent to which the clusters obtained through hierarchical clustering reflect similarities in the products being traded in each layer. We assign a product type to each layer using the following 12 categories: {Proteins, Grains, Dairy, Fruits, Vegetables, Sweets, Drinks, Spices, Animals, Raw Materials, Tobacco, Other} [29]. We call these the "ground truth" product types for simplicity, but emphasize that, since we are performing unsupervised learning, we are not trying to exactly capture the distinctions among product types with our method (a task for which supervised techniques are better suited).
We compare the clustering outputs from each measure with these ground truth categories using two different measures. First, we examine the extent to which each measure finds greater similarity among product layers within the same category than across categories. To do this we compute the ratio ⟨Sim(between)⟩/⟨Sim(within)⟩ of the average similarity score among layers in different ground truth categories to the average similarity score among layers in the same ground truth category. A lower ratio indicates that the measure better discriminates among the ground truth categories, since it finds higher similarities among within-category layers than among between-category layers. The results for this experiment are shown in the left column of Table 1. We can see that the NMI and DC-NMI tend to best discriminate among the product categories, while the JSD is the poorest discriminator among the categories, with similarity scores that are nearly equal between and within categories. We also compute the adjusted mutual information (AMI) [22] between the partition of the layers induced by each similarity measure and the ground truth layer partition according to product category. Since there are many strategies for determining where to cut each measure's corresponding clustering dendrogram to find its associated partition of the layers, we cut each measure's dendrogram at 12 clusters—the same as the number of clusters in the ground truth partition—to facilitate a fair comparison across measures. We plot the results in the right column of Table 1. We find that the NMI and DC-NMI find clusters that are the most similar to the ground truth according to the AMI, consistent with their capability to distinguish these categories in the previous experiment. Similarly, we find poor performance for the JSD and MesoNMI measures on this dataset. Altogether, these results indicate that the similarity among product networks within the same category manifests at the microscale rather than the meso- or macro-scales. In Fig. S1b of the SI we compute the AMI between the inferred clusters and the ground truth categories for all levels of each measure's hierarchical linkage dendrogram, finding roughly the same ordering of the measures as for the cut at 12 clusters. In Fig. S2 we also plot the distributions of similarity scores within and between ground truth product groups to visualize the full distributions of scores rather than just the averages reported in Table 1.
                    ⟨Sim(between)⟩/⟨Sim(within)⟩    AMI
  NMI                          0.810               0.175
  DC-NMI                       0.805               0.223
  MesoNMI(B = 2)               0.935               0.031
  MesoNMI(B = 6)               0.929               0.055
  MesoNMI(B = 50)              0.922               0.041
  MesoNMI(B = 194)             0.864               0.138
  JSD                          0.982               0.040

Table 1. Correspondence between similarity scores and product types of FAO trade network layers. The layers of the FAO trade network were each assigned to one of 12 "ground truth" product categories following the classification in [29]. The clusters for each method were compared with the ground truth categories using the ratio of the average between-category similarity to the average within-category similarity, with lower values indicating more tightly knit categories according to the similarity method of interest. Then, the similarity matrices computed in Fig. 4 were clustered using Ward linkage as in [7], with the number of clusters fixed at 12 to ensure a fair comparison across methods. The adjusted mutual information (AMI) [22] between the ground truth layer partition and the inferred layer partition was computed for each method. Both tests indicate that the micro-scale measures capture more of the similarity among layers in the same category than the measures focusing on the meso- and macro-scales, suggesting that scattered individual edges may carry more information about shared global trade patterns than larger-scale network structure.
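Both scores reported in Table 1 can be sketched as follows, assuming S is a layer-by-layer similarity matrix and cats holds each layer's ground truth product category (both variable names are assumptions):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.metrics import adjusted_mutual_info_score

    cats = np.asarray(cats)
    iu, ju = np.triu_indices_from(S, k=1)            # all distinct layer pairs
    same = cats[iu] == cats[ju]
    pair_sims = S[iu, ju]
    ratio = pair_sims[~same].mean() / pair_sims[same].mean()  # <Sim(between)>/<Sim(within)>

    # AMI between the 12-cluster Ward partition and the ground truth categories
    labels = fcluster(linkage(S, method="ward"), t=12, criterion="maxclust")
    ami = adjusted_mutual_info_score(cats, labels)
    print(ratio, ami)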
In Figures S3 and S4 we apply our measures to two additional multilayer network datasets representing collaboration patterns among scientists within different fields of physics and routes for different airline companies [50].

IV. CONCLUSION

In this paper we have proposed a family of mutual information measures for computing the similarity between a pair of networks. We adapt the encodings used to construct these measures to accommodate structural similarity at multiple scales within the network. By applying the proposed measures in a range of tasks, we find that they consistently capture meaningful notions of network similarity at the desired scale under perturbations to synthetic networks, and that they capture heterogeneity among the layers in real multilayer network datasets arising in the study of global trade, scientific collaboration, and transportation.
There are a number of ways in which our measures can be extended in future work. The MesoNMI measure we propose does not consider the impact of degree heterogeneity on similarity, and so could in principle be extended to a degree-corrected version, as done with the standard graph NMI measure. One can also apply new encodings under our framework to capture similarity with respect to the presence of complex local structures such as motifs or other subgraphs while allowing for networks of different sizes, similar to the calculations performed in [27]. Lastly, by adapting the combinatorial calculations to a new set of constraints, the methods presented here can in principle be extended to more general discrete structures, such as hypergraphs or simplicial complexes, to account for higher-order interactions [51].

Taken together, our work contributes to the growing literature on graph similarity through the introduction of principled and interpretable measures based on information theory, which allow us to capture network similarity in a controlled and efficient way at different scales.

ACKNOWLEDGMENTS

A.K. was supported by an HKU Urban Systems Fellowship Grant and the Hong Kong Research Grants Council under Grant no. ECS–27302523. F.B. acknowledges support from the Air Force Office of Scientific Research under award number FA8655-22-1-7025.

V. CODE AVAILABILITY

Code implementing the measures presented in this paper is available at https://github.com/hfelippe/network-MI.

VI. DATA AVAILABILITY

The data used in this paper is available at https://github.com/hfelippe/network-MI.

VII. AUTHOR CONTRIBUTIONS

H.F.: conceptualization, software, validation, formal analysis, investigation, data curation, writing—original draft, visualization; F.B.: conceptualization, writing—review and editing, supervision, funding acquisition; A.K.: conceptualization, methodology, software, writing—original draft, writing—review and editing, supervision. All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

VIII. COMPETING INTERESTS

The authors declare no competing interests.

[1] R. Sharan and T. Ideker, Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, 427 (2006).
[2] N. Nikolova and J. Jaworska, Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22, 1006 (2003).
[3] J. Wang and Y. Dong, Measurement of text similarity: a survey. Information 11, 421 (2020).
[4] Z. Zeng, A. K. Tung, J. Wang, J. Feng, and L. Zhou, Comparing stars: on approximating graph edit distance. Proceedings of the VLDB Endowment 2, 25 (2009).
[5] X. Guo, J. Hu, J. Chen, F. Deng, and T. L. Lam, Semantic histogram based graph matching for real-time multi-robot global localization in large scale environment. IEEE Robotics and Automation Letters 6, 8349 (2021).
[6] N. M. Kriege, F. D. Johansson, and C. Morris, A survey on graph kernels. Applied Network Science 5, 1 (2020).
[7] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, Structural reducibility of multilayer networks. Nature Communications 6, 6864 (2015).
[8] S. Ok, A graph similarity for deep learning. In Advances in Neural Information Processing Systems 33, 1 (2020).
[9] N. Attar and S. Aliakbary, Classification of complex networks based on similarity of topological network features. Chaos 27, 091102 (2017).
[10] S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, Graph kernels. The Journal of Machine Learning Research 11, 1201 (2010).
[11] P. Wills and F. G. Meyer, Metrics for graph comparison: a practitioner's guide. PLOS One 15, e0228728 (2020).
[12] H. Hartle, B. Klein, S. McCabe, A. Daniels, G. St-Onge, C. Murphy, and L. Hébert-Dufresne, Network comparison and the within-ensemble graph distance. Proceedings of the Royal Society A 476, 20190744 (2020).
[13] I. Kyosev, I. Paun, Y. Moshfeghi, and N. Ntarmos, Measuring distances among graphs en route to graph clustering. In 2020 IEEE International Conference on Big Data (Big Data), 3632 (2020).
[14] S. Ranshous, S. Shen, D. Koutra, S. Harenberg, C. Faloutsos, and N. F. Samatova, Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics 7, 223 (2015).
[15] M. Roy, S. Schmid, and G. Tredan, Modeling and measuring graph similarity: the case for centrality distance. In Proceedings of the 10th ACM International Workshop on Foundations of Mobile Computing, 47 (2014).
[16] S. Soundarajan, T. Eliassi-Rad, and B. Gallagher, A guide to selecting a network similarity method. In Proceedings of the 2014 SIAM International Conference on Data Mining, 1037 (2014).
[17] N. N. W. B. Apolloni, An introduction to spectral distances in networks. In Neural Nets WIRN10: Proceedings of the 20th Italian Workshop on Neural Nets 226, 227 (2011).
[18] R. C. Wilson and P. Zhu, A study of graph spectra for comparing graphs and trees. Pattern Recognition 41, 2833 (2008).
[19] D. K. Hammond, P. Vandergheynst, and R. Gribonval, Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30, 129 (2011).
[20] M. E. Newman, G. T. Cantwell, and J.-G. Young, Improved mutual information measure for clustering, classification, and community detection. Physical Review E 101, 042304 (2020).
[21] A. F. McDaid, D. Greene, and N. Hurley, Normalized mutual information to evaluate overlapping community finding algorithms. arXiv:1110.2515 (2011).
[22] N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. The Journal of Machine Learning Research 11, 2837 (2010).
[23] A. Kirkley, Spatial regionalization based on optimal information compression. Communications Physics 5, 249 (2022).
[24] A. Kirkley, Inference of dynamic hypergraph representations in temporal interaction data. Physical Review E 109, 054306 (2024).
[25] S. Fortunato and D. Hric, Community detection in networks: a user guide. Physics Reports 659, 1 (2016).
[26] X. Gao, B. Xiao, D. Tao, and X. Li, A survey of graph edit distance. Pattern Analysis and Applications 13, 113 (2010).
[27] C. Coupette and J. Vreeken, Graph similarity description: how are these graphs similar? In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 185 (2021).
[28] F. Escolano, E. R. Hancock, M. A. Lozano, and M. Curado, The mutual information between graphs. Pattern Recognition Letters 87, 12 (2017).
[29] A. Kirkley, A. Rojas, M. Rosvall, and J.-G. Young, Compressing network populations with modal networks reveals structural diversity. Communications Physics 6, 148 (2023).
[30] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, Hoboken (2012).
[31] A. Lancichinetti, S. Fortunato, and F. Radicchi, Benchmark graphs for testing community detection algorithms. Physical Review E 78, 046110 (2008).
[32] O. Sporns, Networks of the Brain. MIT Press, Cambridge (2010).
[33] N. Eagle and A. Pentland, Reality mining: sensing complex social systems. Personal and Ubiquitous Computing 10, 255 (2006).
[34] M. E. Newman, Network structure from rich but noisy data. Nature Physics 14, 542 (2018).
[35] M. Meilă, Comparing clusterings by the variation of information. In Learning Theory and Kernel Machines, Springer, New York (2003).
[36] B. H. Good, Y.-A. de Montjoye, and A. Clauset, Performance of modularity maximization in practical contexts. Physical Review E 81, 046106 (2010).
[37] T. P. Peixoto, Bayesian stochastic blockmodeling. In Advances in Network Clustering and Blockmodeling, Wiley, New York (2019).
[38] P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: first steps. Social Networks 5, 109 (1983).
[39] T. P. Peixoto, Entropy of stochastic blockmodel ensembles. Physical Review E 85, 056122 (2012).
[40] J. L. Hein, Discrete Mathematics. Jones & Bartlett Learning (2003).
[41] P. Erdős and A. Rényi, On random graphs. Publicationes Mathematicae 6, 290 (1959).
[42] E. N. Gilbert, Random graphs. Annals of Mathematical Statistics 30, 1191 (1959).
[43] A.-L. Barabási and R. Albert, Emergence of scaling in random networks. Science 286, 509 (1999).
[44] B. Karrer and M. E. J. Newman, Stochastic blockmodels and community structure in networks. Physical Review E 83, 016107 (2011).
[45] Finance Center for South-South Cooperation, Global South Countries. http://www.fc-ssc.org/en/partnership_program/south_south_countries. Accessed 7 May 2024.
[46] W. Webber, A. Moffat, and J. Zobel, A similarity measure for indefinite rankings. ACM Transactions on Information Systems 28, 1 (2010).
[47] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E 84, 066106 (2011).
[48] M. Tantardini, F. Ieva, L. Tajoli, and C. Piccardi, Comparing methods for comparing networks. Scientific Reports 9, 17557 (2019).
[49] World Bank, World development indicators. https://databank.worldbank.org/source/world-development-indicators. Accessed 7 May 2024.
[50] V. Nicosia and V. Latora, Measuring and modeling correlations in multiplex networks. Physical Review E 92, 032805 (2015).
[51] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri, Networks beyond pairwise interactions: Structure and dynamics. Physics Reports 874, 1 (2020).
Supplementary Information

Note S1. ROBUSTNESS CHECKS FOR FAO TRADE NETWORK EXPERIMENTS

Here we run a number of additional tests to confirm the findings in Sec. III for the FAO trade network. Fig. S1
repeats key experiments in Sec. III with different evaluation strategies, and Fig. S2 plots the distributions of layer
similarity scores for products in the same ground truth category and products in different ground truth categories.
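The Spearman variant of the ranking comparison in Fig. S1a amounts to rank-correlating the flattened upper triangles of two measures' similarity matrices; a minimal sketch (with S_a and S_b assumed to be precomputed layer-by-layer matrices) is:

    import numpy as np
    from scipy.stats import spearmanr

    iu = np.triu_indices_from(S_a, k=1)    # distinct layer pairs only
    rho, _ = spearmanr(S_a[iu], S_b[iu])
    print(rho)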

[Figure S1: (a) heatmap of pairwise Spearman rank correlations between the layer similarity scores of each measure; the correlation values shown in the panel are:

                   NMI   DC-NMI  M(B=2)  M(B=6)  M(B=50)  M(B=194)  JSD
  NMI               -     0.99    0.75    0.79    0.81     0.97     0.41
  DC-NMI           0.99    -      0.78    0.82    0.84     0.97     0.46
  MesoNMI(B=2)     0.75   0.78     -      0.99    0.98     0.86     0.72
  MesoNMI(B=6)     0.79   0.82    0.99     -      0.98     0.89     0.69
  MesoNMI(B=50)    0.81   0.84    0.98    0.98     -       0.91     0.66
  MesoNMI(B=194)   0.97   0.97    0.86    0.89    0.91      -       0.47
  JSD              0.41   0.46    0.72    0.69    0.66     0.47      -

(b) adjusted mutual information between the inferred clusters and the ground truth categories versus the number of clusters (1 to 364) for each measure and for the ground truth partition, with a dashed line marking 12 clusters.]

Figure S1. Spearman correlation and AMI for FAO layer similarity scores. (a) The same test as in Fig. 4b was repeated using the Spearman rank correlation instead of the rank-biased overlap (RBO). The same qualitative results can be observed. (b) The AMI scores in Table 1 were computed using all levels of each measure's cluster hierarchy. The dashed line indicates 12 clusters, the level at which the measures are compared in Table 1. We can observe that all measures more or less maintain the same ordering regardless of the level at which we cut the dendrogram.

[Figure S2: boxplots of product-layer similarity scores, with one panel per measure (NMI, DC-NMI, MesoNMI(B = 2), MesoNMI(B = 6), MesoNMI(B = 50), and JSD); each panel compares the "Within" and "Between" category score distributions on a vertical "Product similarity" axis ranging from 0 to 1.]

Figure S2. Distributions of similarity scores among product layers of the FAO trade network, within and between ground truth product categories. Boxplots of the similarity scores for the FAO trade network layers used to compute ⟨Sim(between)⟩/⟨Sim(within)⟩ in Table 1. Upper and lower whiskers indicate the 10th and 90th percentiles of the score distribution. As seen in Fig. 4a, we observe the greatest variance in the scores of the MesoNMI measures, and as observed in Table 1 we see the greatest relative discrepancy between the within-category and between-category distributions for the NMI and DC-NMI measures.
Note S2. SCIENTIFIC COLLABORATION AND CONTINENTAL AIRPORT NETWORKS

To further test our graph similarity measures on empirical data, we applied them to both a network of scientific
collaboration across different fields of physics and a network of continental airports [50].
The American Physical Society (APS) scientific collaboration network consists of authors (nodes) that have pub-
lished at least one journal paper together (edges) in any of the ten highest-level categories (layers) in the Physics
and Astronomy Classification Scheme (PACS). Figure S3 shows the NMI and DC-NMI scores among these different
PACS networks.

[Figure S3: (a) matrix of pairwise similarity scores between the ten PACS fields (General, Particles, Nuclear, Atomic, Classical, Gases and Plasmas, Condensed Matter I, Condensed Matter II, Interdisciplinary, Astronomy), with the NMI and DC-NMI scores occupying the two triangles of the matrix; (b) number of clusters versus linkage distance for the NMI and DC-NMI hierarchical clusterings; (c) NMI and (d) DC-NMI dendrograms over the ten fields, with linkage distance on the vertical axis.]

Figure S3. Similarity between fields of physics according to the NMI and DC-NMI measures. (a) Similarity
scores between different fields of physics defined according to their PACS number. (b) Hierarchical clustering applied to the
NMI and DC-NMI distance matrices in panel (a). The rapid decay in linkage distance of the DC-NMI measure indicates a
higher similarity at the author (node) level between fields of physics (network layers). (c) NMI and (d) DC-NMI dendrograms
for hierarchical clustering with respect to similarity value. Both measures extract clusters that are consistent with topical
overlap among the fields.
The OpenFlights continental airport network consists of airports (nodes) that have at least one flight between them
(edges) operated by the same airline company (layers). We test the similarity between airlines from each of the six
continents using the NMI and DC-NMI measures (see Fig. S4).

[Figure S4: fraction of clusters versus linkage distance for the NMI (left) and DC-NMI (right) hierarchical clusterings, with one curve per continent (Africa, Asia, Europe, North America, Oceania, South America).]

Figure S4. Similarity score cluster dendrograms among continental airports. Hierarchical clustering was applied to the NMI and DC-NMI distance matrices obtained from the pairwise computation of similarity between airline networks within each continent. We can see that in both cases Africa has the greatest cluster differentiation at the small scale, while the ordering of the linkage patterns for the NMI and DC-NMI measures differs for the other continents.
