Topological Data Analysis For Network Resilience
SN Oper. Res. Forum (2021) 2: 29
https://doi.org/10.1007/s43069-021-00070-3
ORIGINAL RESEARCH
Received: 15 March 2021 / Accepted: 19 May 2021 / Published online: 1 June 2021
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021
Abstract
Developing accurate metrics to evaluate the resilience of large-scale networks, e.g.,
critical infrastructures, plays a pivotal role in secure operation of these networks. In
this paper, we propose a novel framework to study the resilience of a network. To
this end, we leverage the tools from Topological Data Analysis (TDA) and Persistent
Homology (PH). The combined deployment of TDA and PH tools lets us characterize
network topology based only on the underlying weighted graph and compare it with
a base network, e.g., the fully connected network as the most resilient structure.
By utilizing an abstract network to build our arguments and results, we present a
step-by-step method that leverages the fundamental theories of TDA to study and
improve a network’s resilience. By creating a weighted graph, where weights
represent a meaningful attribute of the underlying network, we utilize the
Vietoris–Rips complex and filtration to create persistence diagrams. This allows us to
extract topological information to study network resilience. Further, we show how
the use of Wasserstein distances can provide detailed information about the critical
edges (e.g., roads in transportation networks, or power distribution lines in power
networks) in the network, and how adding or removing certain edges affects the level
of resilience of the network, by presenting a novel metric to quantify the resilience of
a network. We evaluate the effectiveness of the proposed method using a case study
that compares a base network with networks that include different edges using our
resilience metric.
This article is part of the Topical collection on Optimization, Control, and Machine Learning for
Interdependent Networks.
* M. Hadi Amini
1 Introduction
In this section, we first provide the preliminary definitions that are required to
understand our proposed approach to quantify network resilience using tools from
algebraic topology, namely simplicial and persistent homology. More details of these
definitions are provided in [15] and [16]. Further, a brief introduction with some
state-of-the-art results can be found in [17].
2.1 Simplicial Complexes
These simplicial complexes will act as the “backbone” of the upcoming discussion
of persistent homology on graph structures. The central idea is that, as the
“radius” of the network filtration grows, the homology tracks new connections,
and therefore higher-order simplices, as they form. The formation, or deformation, of
2.3 Persistent Homology
Persistent homology provides us with the tools to describe the number of structures
of a given dimension within a topological space given a simplicial complex $K$; or, in
our case, the complexes that make up a filtration
$K_1 \subset K_2 \subset \dots \subset K_N = K$. Given a dimension $d$, a $d$-chain is a
formal sum of $d$-dimensional simplices in $K$. If $\sigma_1, \dots, \sigma_m$ denote the
$d$-dimensional simplices, then we write a $d$-chain as
$$\alpha = \sum_{i=1}^{m} a_i \sigma_i,$$
where the coefficients $a_i$ take values in $\mathbb{Z}_2 = \{0, 1\}$. We define the addition
of two chains $\alpha = \sum_i a_i \sigma_i$ and $\beta = \sum_i b_i \sigma_i$ as
$$\alpha + \beta = \sum_{i=1}^{m} (a_i + b_i) \sigma_i,$$
where the addition of scalars is done mod 2. Then, the collection of the $d$-dimensional
chains forms a vector space, commonly denoted as $C_d(K)$.
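As a small illustration of chain addition mod 2, the sketch below stores a chain as the set of simplices whose coefficient is 1, so that adding two chains becomes a symmetric difference. The encoding of simplices as `frozenset`s of vertex labels and the name `add_chains` are assumptions made for this example, not notation from the paper.

```python
# A d-chain over Z_2 is stored as the set of simplices with coefficient 1.
# Simplices are frozensets of vertex labels (an encoding assumed for this sketch).

def add_chains(alpha, beta):
    """Add two chains with coefficients mod 2 (set symmetric difference)."""
    return alpha ^ beta


e01 = frozenset({0, 1})
e12 = frozenset({1, 2})
e02 = frozenset({0, 2})

alpha = {e01, e12}
beta = {e12, e02}

# e12 appears in both chains, so its coefficient is 1 + 1 = 0 mod 2.
total = add_chains(alpha, beta)
```

Here `total` contains only the edges {0, 1} and {0, 2}, and adding any chain to itself gives the empty (zero) chain.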
Next, we may define cycles and boundaries. The boundary of a $d$-dimensional
simplex is given by the sum of its $(d-1)$-dimensional faces, that is
$$\partial_d(\sigma) = \sum_{\tau < \sigma,\ \dim(\tau) = d-1} \tau \qquad (3)$$
and it is extended linearly to chains. We then define a cycle as a chain with zero
boundary, i.e., $\partial_d(\alpha) = 0$, and a boundary as a chain that is the image under
$\partial_{d+1}$ of some $(d+1)$-chain. The collections of
cycles and boundaries each form a group under addition. The group of $d$-dimensional
cycles is denoted as $Z_d(K)$ and the group of $d$-dimensional boundaries is denoted as
$B_d(K)$.
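The boundary operator can be sketched in a few lines of Python. As before, a chain is stored as a set of simplices and a simplex as a `frozenset` of vertices; both conventions, and the function names, are assumptions of this sketch rather than part of the paper.

```python
from itertools import combinations


def boundary_of_simplex(sigma):
    """Return the (d-1)-dimensional faces of a d-simplex as a Z_2 chain."""
    d = len(sigma) - 1
    if d == 0:
        return set()  # a vertex has an empty boundary
    return {frozenset(face) for face in combinations(sorted(sigma), d)}


def boundary(chain):
    """Extend the boundary linearly over Z_2: faces hit twice cancel out."""
    result = set()
    for sigma in chain:
        result ^= boundary_of_simplex(sigma)
    return result


triangle = frozenset({0, 1, 2})
edges = boundary({triangle})   # the three edges of the triangle
cycle_check = boundary(edges)  # every vertex is hit twice, so this is empty
```

The three edges returned for the triangle form a chain whose own boundary is empty, so that chain is a cycle.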
We are now ready to define the homology group on $K$. A fundamental lemma
in homology is the fact that $\partial_d \partial_{d+1} = 0$. That is, each boundary is indeed
a cycle. Therefore, $B_d(K)$ is a subgroup of $Z_d(K)$, and we may define
the homology group as the quotient group $H_d(K) = Z_d(K) / B_d(K)$. The elements of
$H_d(K)$ are called homology classes and are denoted as $[\alpha]$. Elements assigned to the
same homology class, which means they differ by a boundary, are called homologous,
and therefore the choice of $\alpha$ to represent $[\alpha]$ is by no means unique.
Now, equipped with simplicial complex filtrations and homology
groups, persistent homology aims to analyze how the homology changes
Here, a homology class $[\alpha]$ in $H_d(K_i)$ is mapped to its representative in
$H_d(K_{i+1})$. The index $i$ is said to be the birth of the homology class if the class is
not in the image of the map $H_d(K_{i-1}) \to H_d(K_i)$. The death of the same class is $j$
if $[\alpha] = 0$ in $H_d(K_j)$ but $[\alpha] \neq 0$ in $H_d(K_{j-1})$. The idea of birth and
death of homology classes is at the center of our considerations (see Fig. 4).
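For dimension 0, birth and death can be illustrated without any TDA library: every node is a connected component born at threshold 0, and a component dies at the weight of the edge that merges it into another one. The sketch below uses a union-find structure and indexes the filtration by edge weight rather than by the integer index $i$; that choice, the function name, and the toy graph are all assumptions of this example.

```python
def h0_persistence(num_nodes, weighted_edges):
    """Birth-death pairs of connected components (H_0) for a weight filtration.

    Every node is born at threshold 0. When an edge of weight w joins two
    different components, one of them dies at w.
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            pairs.append((0.0, float(w)))

    # Components that are never merged persist forever.
    survivors = len({find(x) for x in range(num_nodes)})
    pairs.extend((0.0, float("inf")) for _ in range(survivors))
    return pairs


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3.
toy_edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 5.0)]
pairs = h0_persistence(4, toy_edges)
```

The edge of weight 3.0 joins two nodes that are already connected, so it kills no component; in a full computation it would instead give birth to a 1-dimensional class.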
The homology classes give a very useful toolbox for describing the topological
features of a given dimension within the data. The number of these features is
summarized by the so-called Betti numbers $\beta_i$, defined as the
rank of the corresponding homology group, $\beta_i = \operatorname{rk}(H_i(K))$. $\beta_0$
counts the number of 0-dimensional holes in the data, i.e., the number of connected
components; $\beta_1$ counts the number of 1-dimensional holes (loops); $\beta_2$ counts the
number of voids; and so on. We will be mostly interested in $\beta_0$ and $\beta_1$ because of
their direct meaning to our application.
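Betti numbers of a small complex can be computed directly from the ranks of the boundary maps, using the identity $\beta_d = \dim C_d - \operatorname{rk}\,\partial_d - \operatorname{rk}\,\partial_{d+1}$ over $\mathbb{Z}_2$. The following is a minimal sketch under the assumption that the complex is given as a list of vertex tuples that is closed under taking faces; the function names are illustrative only.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)


def betti_numbers(simplices, max_dim):
    """Betti numbers beta_0..beta_max_dim over Z_2.

    `simplices` must be closed under taking faces (assumed, not checked).
    """
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(frozenset(s))

    def boundary_rank(d):
        # Rank of the boundary map from d-chains to (d-1)-chains.
        if d == 0 or d not in by_dim:
            return 0
        index = {face: i for i, face in enumerate(by_dim[d - 1])}
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(sorted(s), d):
                mask |= 1 << index[frozenset(face)]
            rows.append(mask)
        return rank_gf2(rows)

    return [
        len(by_dim.get(d, [])) - boundary_rank(d) - boundary_rank(d + 1)
        for d in range(max_dim + 1)
    ]


hollow = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]
filled = hollow + [(0, 1, 2)]
hollow_betti = betti_numbers(hollow, 1)  # one component, one loop
filled_betti = betti_numbers(filled, 1)  # one component, loop filled in
```

The hollow triangle has one connected component and one loop, while adding the 2-simplex fills the loop and leaves only the component.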
The persistence diagrams and barcodes give a useful visual representation of the
information encoded in the persistent homology. We can display the birth-death
relation of the different homology classes in the persistence diagram. For each
homology class, we collect the ordered pair (i, j), where i is the birth of the homol-
ogy class and j is its death. An example of such a diagram is given in Fig. 5. The
points near the diagonal i = j correspond to homology classes that were born and
died off very quickly. In practice, these classes correspond to noise in the data, espe-
cially those closer to the origin. Those points away from the diagonal correspond to
topological features in the data that persist over time and are significant.
2.5 Comparison Metrics
Given two persistence diagrams $D_1$ and $D_2$, we can define the bottleneck distance as
$$W_\infty(D_1, D_2) = \inf_{\gamma} \sup_{x \in D_1} \| x - \gamma(x) \|_\infty \qquad (5)$$
Here, $\| \cdot \|_\infty$ denotes the usual $L_\infty$ norm, i.e., the largest absolute
coordinate difference between two points, and $\gamma$ ranges over the mappings between
points of the two persistence diagrams, over which the expression is minimized. To some
readers, these distances will resemble the minimization of $L_p$ norms from $L_p$ theory,
and others will recognize them as variants of the Wasserstein metric from optimal
transport theory. These metrics capture different geometric variations between the two
persistence diagrams. The bottleneck distance is determined by the largest single
variation, while the r-Wasserstein distance penalizes large variations between points
while being “kinder” to smaller ones. In general, we want to evaluate both.
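For very small diagrams, both distances can be evaluated by brute force over all matchings. The sketch below assumes the common convention that any point may also be matched to its projection on the diagonal, uses the $L_\infty$ ground norm for both distances, and only handles points with finite death values. These conventions and all function names are assumptions of this example; a practical computation would rely on a dedicated TDA library instead.

```python
from itertools import permutations


def _diag(p):
    """Closest point to p on the diagonal in the L-infinity norm."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)


def _linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def diagram_distances(d1, d2, r=2):
    """Brute-force (bottleneck, r-Wasserstein) distances for tiny diagrams.

    Each diagram is a list of finite (birth, death) pairs. The cost grows
    factorially with the number of points, so this is for illustration only.
    """
    # Augment each diagram with the diagonal projections of the other one,
    # so that every point may be matched to the diagonal.
    a = [(p, False) for p in d1] + [(_diag(q), True) for q in d2]
    b = [(q, False) for q in d2] + [(_diag(p), True) for p in d1]

    best_bottleneck = float("inf")
    best_wasserstein = float("inf")
    for perm in permutations(range(len(b))):
        costs = []
        for i, j in enumerate(perm):
            (p, p_on_diag), (q, q_on_diag) = a[i], b[j]
            # Matching two diagonal points is free.
            costs.append(0.0 if (p_on_diag and q_on_diag) else _linf(p, q))
        best_bottleneck = min(best_bottleneck, max(costs, default=0.0))
        best_wasserstein = min(best_wasserstein, sum(c ** r for c in costs))
    return best_bottleneck, best_wasserstein ** (1.0 / r)


# Hypothetical diagrams with a single point each.
bottleneck, wasserstein = diagram_distances([(0.0, 4.0)], [(0.0, 5.0)])
```

In this toy case the cheapest matching pairs the two points directly, so both distances equal the shift of 1 in the death coordinate; matching a lone point to the diagonal costs half its persistence.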
We aim to evaluate the resilience of an arbitrary network. Using the tools of
Topological Data Analysis (TDA), outlined in Section 2, we propose an algorithm for
effectively quantifying and evaluating network resilience. To use the tools
of TDA, one usually starts by embedding the nodes as a point cloud in $\mathbb{R}^n$. The
reason for using an embedding is to properly define the distance between two points,
which is needed for the Vietoris–Rips complex. We hypothesize that different ways of
embedding the nodes could encode different information about the network. One
of the main reasons to focus on $\mathbb{R}^n$ is its relation to real life. For example, when
dealing with power lines, an embedding in $\mathbb{R}^3$ would be appropriate since the
length of cables (the edges of the network) is tied to the amount of electricity that can
be transmitted.
The other option, which is the one we propose here, is to use a version of network
filtration to evaluate an arbitrary network. We do this to remove the
need to choose a space in which to embed the data. As mentioned earlier,
the choice of space could change the outcome, so our approach provides a novel
attempt at bypassing this problem. As illustrated in Fig. 6, we start with a fully
connected graph $G$ and evaluate its weighted adjacency matrix $(w_{uv})$, taking the
convention that $w_{uv} = \infty$ if there is no edge between $u$ and $v$. We can then remove
edges in the graph according to the weight thresholds to create a network filtration.
This filtration is represented by a sequence of adjacency matrices. Having this
filtration, we can then impose the Vietoris–Rips complex on the filtration and evaluate
its persistent homology. After every edge removal, the 2-Wasserstein and bottleneck
distances are computed, comparing the subgraph to the complete graph, as shown in
Algorithm 1. We emphasize the significance of the persistence diagrams and Betti
numbers in this context. Here, $\beta_0$ plays a particularly important role, as it gives
the number of connected components in the network. That is, a change
in $\beta_0$ tells us when a network has “split” into two or more connected components.
In terms of resilience, this split means that communication has ceased
between different sets of components. In the case of a power grid, it can mean a
cutoff of power flow; in the case of a social network, it can mean a cutoff of
messages. $\beta_1$ can also play a significant role in other particular networks. Changes
in $\beta_1$ indicate the appearance and death of holes in the data. In a WiFi network, this
change can be interpreted as a cutoff in internet coverage across an area.
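The role of $\beta_0$ during edge removal can be sketched without a persistent homology library: remove edges one at a time from a complete graph, mark each removed edge with infinite weight, and count the connected components that remain. This is only a simplified illustration and not Algorithm 1 itself, since it omits the Vietoris–Rips construction and the Wasserstein and bottleneck distances; the unit weights, the removal order, and the function names are assumptions.

```python
import math
from itertools import combinations


def count_components(num_nodes, weights):
    """beta_0 of the graph formed by the finite-weight edges (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if math.isfinite(w):
            parent[find(u)] = find(v)
    return len({find(x) for x in range(num_nodes)})


def beta0_after_removals(num_nodes, weights, removal_order):
    """Return beta_0 after each successive edge removal."""
    current = dict(weights)
    history = []
    for edge in removal_order:
        current[edge] = math.inf  # convention: a missing edge has infinite weight
        history.append(count_components(num_nodes, current))
    return history


# Hypothetical complete graph on 4 nodes with unit weights.
n = 4
weights = {edge: 1.0 for edge in combinations(range(n), 2)}

# Remove the three edges incident to node 3, one at a time.
history = beta0_after_removals(n, weights, [(0, 3), (1, 3), (2, 3)])
```

The count stays at one component until the last edge incident to node 3 is removed, at which point $\beta_0$ jumps to 2 and node 3 becomes an island.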
4 Simulation
Unlike the cited work, our simulation is run over an arbitrary network, with the
intention of building a framework that can be applied in any field. We begin
by laying out the network’s structure and how the experiment will be conducted.
Then, we finish the section with the results and how to interpret them.
4.1 Simulation Structure
4.2 Results
The subgraph generated after the removal of 20 edges (roughly 55% of the total
number of edges) was chosen arbitrarily for comparison and is depicted in Fig. 8.
Persistence diagrams, detailed in Section 2, help us measure the topological
resilience of our network; H0 tracks the connected components and H1 the
1-dimensional holes. Comparing the persistence diagrams in Figs. 9 and
10, we see that there are more connected components born in the complete graph that
disappear as the distance allowed for the VR-Complex increases, evidenced by the
series of blue (H0) points along birth 0. We interpret this as the complete graph
becoming connected at lower weight thresholds, whereas the subgraph becomes connected
at a much higher threshold. We can interpret these diagrams as the complete graph
showing more “connectedness,” implying a higher resilience within the topological
structure.
Fig. 8 Subgraph
The results shown in Fig. 11 and Table 1 are used to compare the level of
resilience of a network with that of the most resilient design, the complete graph. We
began by taking the Wasserstein distance from each subgraph’s PD to the complete
graph’s PD, which generates a raw distance factor. Then, by taking the difference
between the distance after removing n edges and after removing n + 1 edges, we can
plot the change in distance, resulting in Fig. 11. As expected and confirmed by
Fig. 11, the more edges we remove, the less resilient the network becomes. An equally
interesting question is: what is the difference in resilience between the fully
connected graph and a given subgraph? We use the data of Table 1 at this stage. By
taking the inverse of the Wasserstein distance and multiplying by 100 to express it as
a percentage, we obtain a measurable quantity that represents how much of the complete
graph’s resilience the network design retains. For example, after removing 9 edges, we
observe no difference in resilience, since the network retains 100% of the resilience
provided by the complete graph. This is explained by the level of redundancy with
respect to connections. A fully connected graph has many redundant connections
between nodes. Removing nine edges only turns what was previously a one-step
path between two nodes into a two-step path, so seeing no change at
this point is somewhat expected. The first change in the proposed resilience metric
happens after removing ten edges. Further, the subgraph generated after the removal
of the 10th edge shows no difference in resilience from
the subgraph after removing the 11th edge. These results can be further justified by
considering the number of edges we have removed. By the time we have removed
20 edges, we have removed more than half of the total number of edges we began
with. In fact, if we refer back to Fig. 8, we notice that it only takes the removal of 3
edges from node three to isolate it as an island. If our network were a power line
network where each node represents a city, we would have essentially cut off power
to an entire city. Therefore, retaining only 8% of the original resilience after removing
20 edges is a realistic value. Based on these results, the method described above
allows us to quantitatively measure the topological resilience of a network and to
compare different designs of a network against each other (see Table 1).
Table 1
Edges removed    Resilience retained
1                100%
2                100%
3                100%
4                100%
5                100%
6                100%
7                100%
8                100%
9                100%
10               31%
11               31%
12               15%
13               15%
14               15%
15               8%
16               8%
17               8%
18               8%
19               8%
20               8%
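The percentage described in the text (the inverse of the Wasserstein distance multiplied by 100) can be written as a one-line function. The text does not say how a zero distance, or a distance below 1, is handled, so the sketch below adopts one possible reading in which the value is capped at 100%. That cap, the function name, and the sample distances are assumptions of this example and are not the values behind the table.

```python
def resilience_percent(distance):
    """Percentage of resilience retained for a given Wasserstein distance.

    Assumed reading: 100 / distance, capped at 100 so that a distance of 0
    (identical persistence diagrams) corresponds to 100%.
    """
    if distance < 0:
        raise ValueError("a distance cannot be negative")
    if distance <= 1.0:
        return 100.0
    return 100.0 / distance


# Hypothetical distances, chosen only to show the shape of the metric.
sample = [resilience_percent(d) for d in (0.0, 2.0, 4.0, 10.0)]
```

Under this reading the metric stays at 100% while the diagrams are nearly identical and then decays as the distance to the complete graph's diagram grows.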
node, we count this distance as infinity, which automatically tells us we have
effectively separated parts of the network. Then, we choose a random edge to remove,
usually an edge belonging to the node with the smallest degree. Then, by recomputing
the sum of the number of edges between every pair of nodes, storing this value,
and repeating the process for every other edge, we can compare the impact on the
network of removing the edges one by one and calculating the resilience
metric. By choosing to remove the edge that leads to the largest change in the
distance between every pair of nodes, we have effectively chosen to remove the edge
that causes the most deterioration of our network’s resilience. We can continue this
process until we reach the point where there is no path between any two
nodes. Theoretically, this would lead to the most efficient removal of edges, but it
comes at the cost of additional computation time while analyzing the impact of each
edge on network resilience. This approach could be very useful for protecting critical
networks against entities with malicious intent, by clearly
identifying which set of edges is most crucial to proper function.
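One step of the greedy procedure described above can be sketched as follows: compute the sum of hop distances over all node pairs with breadth-first search, remove each edge in turn, and pick the edge whose removal increases that sum the most. The sketch assumes an undirected, initially connected graph with nodes labelled 0 to n-1; the function names and the toy graph are illustrative only.

```python
import math
from collections import deque


def total_hop_distance(num_nodes, edges):
    """Sum of shortest-path hop counts over all unordered node pairs.

    Returns infinity if some pair of nodes is not connected.
    """
    adj = {u: set() for u in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    total = 0
    for source in range(num_nodes):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) < num_nodes:
            return math.inf  # at least one node is unreachable
        total += sum(dist.values())
    return total // 2  # every pair was counted from both endpoints


def most_damaging_edge(num_nodes, edges):
    """Edge whose removal increases the total hop distance the most."""
    baseline = total_hop_distance(num_nodes, edges)
    best_edge, best_increase = None, -1.0
    for edge in edges:
        remaining = [e for e in edges if e != edge]
        increase = total_hop_distance(num_nodes, remaining) - baseline
        if increase > best_increase:
            best_edge, best_increase = edge, increase
    return best_edge, best_increase


# Hypothetical toy graph: a triangle 0-1-2 with node 3 attached to node 2.
toy = [(0, 1), (1, 2), (0, 2), (2, 3)]
edge, increase = most_damaging_edge(4, toy)
```

In the toy graph the pendant edge (2, 3) is a bridge, so removing it disconnects node 3 and the increase is infinite. Repeating this step after each removal gives the greedy sequence, at a cost of one all-pairs computation per candidate edge.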
Acknowledgements This material is based upon Luiz Manella Pereira’s work supported by the U.S.
Department of Homeland Security under Grant Award Number, 2017-ST-062-000002. The views
and conclusions contained in this document are those of the authors and should not be interpreted as
necessarily representing the official policies, either expressed or implied, of the U.S. Department of
Homeland Security.
Funding Information Author Luiz Manella Pereira received funding for this study from the U.S.
Department of Homeland Security via Grant Award Number 2017-ST-062-000002.
Data Availability In our study we generated our own networks as we were unable to obtain data related
to critical infrastructures. This type of information is usually protected because it pertains to national
security or is owned and protected by institutions (e.g., overhead power line architectures). The
design and steps on how to reproduce the graphs used in this study are explained in detail throughout
the sections above.
Declarations
Conflict of Interest The authors declare that they have no conflict of interest.
References
1. Berizzi A (2004) The italian 2003 blackout. In IEEE Power Engineering Society General
Meeting, 2004. IEEE, pp. 1673–1679
2. Little RG (2002) Controlling cascading failure: Understanding the vulnerabilities of
interconnected infrastructures. J Urban Tech 9(1):109–123
3. Van Eeten M, Nieuwenhuijs A, Luiijf E, Klaver M, Cruz E (2011) The state and the threat of
cascading failure across critical infrastructures: the implications of empirical evidence from media
incident reports. Publ Admin 89(2):381–400
4. Gao J, Barzel B, Barabási A-L (2016) Universal resilience patterns in complex networks. Nature
530(7590):307–312
5. Han SY, DeLaurentis D (2013) Development interdependency modeling for system-of-systems
(sos) using bayesian networks: Sos management strategy planning. Proc Comp Sci 16:698–707
6. Reggiani A (2013) Network resilience for transport security: Some methodological
considerations. Trans Pol 28:63–68
7. Li MZ, Ryerson MS, Balakrishnan H (2019) Topological data analysis for aviation applications.
Trans Res Part E: Logistics Trans Rev 128:149–174
8. Cang Z, Munch E, Wei GW (2020) Evolutionary homology on coupled dynamical systems with
applications to protein flexibility analysis. J Appl Comput Topol 4(4):481–507
9. Nicolau M, Levine AJ, Carlsson G (2011) Topology based data analysis identifies a subgroup
of breast cancers with a unique mutational profile and excellent survival. Proc Natl Acad Sci
108(17):7265–7270
10. Saggar M, Sporns O, Gonzalez-Castillo J, Bandettini PA, Carlsson G, Glover G, Reiss AL
(2018) Towards a new approach to reveal dynamical organization of the brain using topological
data analysis. Nat Comm 9(1):1–14
11. Mileyko Y, Mukherjee S, Harer J (2011) Probability measures on the space of persistence
diagrams. Inv Prob 27(12):124007
12. Dufresne E, Edwards P, Harrington H, Hauenstein J (2019) Sampling real algebraic varieties for
topological data analysis. 2019 18th IEEE International Conference On Machine Learning And
Applications (ICMLA)
13. Chazal F, Glisse M, Labruére C, Michel B (2013) Optimal rates of convergence for persistence
diagrams in topological data analysis. arXiv preprint arXiv:1305.6239
14. Islambekov U, Dey AK, Gel YR, Poor HV (2018) Role of local geometry in robustness of
power grid networks. In 2018 IEEE Global Conference on Signal and Information Processing
(GlobalSIP), IEEE, pp. 885–889
15. Edelsbrunner H, Harer JL (2010) Computational topology: An introduction. American Mathematical
Soc, Providence, RI, p xii+241
16. Zomorodian AJ (2005) Topology for computing, vol. 16. Cambridge University Press,
Cambridge, p xiv+243
17. Chazal F, Michel B (2017) An introduction to topological data analysis: fundamental and practical
aspects for data scientists
18. Zomorodian A (2010) Fast construction of the vietoris-rips complex. Comp Graph 34(3):263–271