
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, VOL. 11, NO. 2, MARCH/APRIL 2024

A Graph Machine Learning Framework to Compute Zero Forcing Sets in Graphs

Obaid Ullah Ahmad, Mudassir Shabbir, Waseem Abbas, Member, IEEE, and Xenofon Koutsoukos, Fellow, IEEE

Abstract—This article studies the problem of computing zero-forcing sets (ZFS) in graphs and provides a machine-learning solution. Zero forcing is a vertex coloring process to color the entire vertex set from a small subset of initially colored vertices constituting a ZFS. Such sets have several applications in network science and networked control systems. However, computing a minimum ZFS is an NP-hard problem, and popular heuristics encounter scalability issues. We investigate the greedy heuristic for this problem and propose a combination of random selection and the greedy algorithm, called the random-greedy algorithm, which offers an efficient solution to the ZFS problem. Moreover, we enhance this approach by incorporating a data-driven solution based on graph convolutional networks (GCNs), leveraging a random selection process. Our machine-learning architecture, designed to imitate the greedy algorithm, achieves significant speed improvements, surpassing the computational efficiency of the greedy algorithm by several orders of magnitude. We perform thorough numerical evaluations to demonstrate that the proposed approach is considerably efficient, scalable to graphs about ten times larger than those used in training, and generalizable to several different families of synthetic and real-world graphs with comparable and sometimes better results in terms of the size of the ZFS. We also curate a comprehensive database comprising synthetic and real-world graph datasets, including approximate and optimal ZFS solutions. This database serves as a benchmark for training machine-learning models and provides valuable resources for further research and evaluation in this problem domain. Our findings showcase the effectiveness of the proposed machine-learning solution and advance the state-of-the-art in solving the ZFS problem.

Index Terms—Zero-forcing set, graph convolutional network, network controllability, leader selection problem.

Manuscript received 15 June 2023; revised 27 October 2023; accepted 22 November 2023. Date of publication 4 December 2023; date of current version 23 February 2024. This work was supported by the National Science Foundation under Grants 2325416 and 2325417. Recommended for acceptance by Prof. X. Li. (Corresponding author: Obaid Ullah Ahmad.)
Obaid Ullah Ahmad is with the Electrical Engineering Department, The University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: [email protected]).
Waseem Abbas is with the Systems Engineering Department, The University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: [email protected]).
Mudassir Shabbir and Xenofon Koutsoukos are with the Computer Science Department, Vanderbilt University, Nashville, TN 37235 USA (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TNSE.2023.3337750

I. INTRODUCTION

Dynamic coloring in graphs is a process of coloring vertices iteratively according to some pre-defined rules and conditions. Based on the coloring rules, several variants of such colorings have been considered, with many applications in network science and engineering, such as air traffic flow management [1], nucleic acid sequence design in biochemical networks [2], channel assignment in wireless networks [3], and community detection in social networks [4]. In addition, graph coloring also serves as an effective tool in solving other significant graph theory problems, for instance, graph partitioning [5] and clique computation [6].

Zero forcing is a dynamic coloring of vertices, which are initially colored either black or white. A black vertex can change the color of its white neighbor to black under some conditions (i.e., the black vertex has exactly one white neighbor). The goal is to select the minimum number of vertices in a graph which, if colored black initially, will render all vertices black at the end of the coloring process. Such a subset of initial black vertices is called a zero forcing set (ZFS) of the graph (as explained in Section II-A). Zero forcing has applications in modeling various physical phenomena, including logic circuit analysis, disease spread analysis, and information spread in social networks [7], [8], [9], [10]. In particular, the ZFS is an important notion in studying network controllability, which is a central phenomenon in the control of networked systems. Network controllability means manipulating a network of agents as desired by injecting external control inputs through a subset of agents called leaders. The crucial problem is determining the minimum number of leader agents that make the network completely controllable. It turns out that a ZFS in a network characterizes a leader set for the network's complete controllability [10], [11] (as discussed in Section II-B).

Unfortunately, finding a minimum ZFS for a given graph is a combinatorial optimization problem shown to be NP-hard [12]. The algorithms that compute optimal solutions take exponential time and are not scalable to huge networks. Algorithms computing approximate solutions, such as greedy, generally provide reasonable results; however, they also incur scalability problems as network sizes grow. Additionally, degenerate cases exist for which the greedy solution can be arbitrarily bad [13]. We discuss the exact and approximate solutions in Section III.

This work aims to find a computationally fast and accurate solution for the ZFS problem using random selection, machine learning (ML), and data-driven approaches. Recently, ML-based solutions have found applications in solving computationally hard problems. Gama et al. propose a GNN-based distributed solution to solve the flocking and multi-agent path planning problems [14]. They propose a novel framework using

graph signal processing (GSP) to learn the controllers for these problems. Additionally, there has been an increasing interest in utilizing these data-driven and graph machine learning (GML) approaches to solve hard combinatorial optimization problems [15], [16]. Cappart et al. provide a comprehensive review of recent works involving the use of machine learning to solve combinatorial optimization problems [16]. By exploiting the solution patterns of known instances and relating them to the underlying network structure, we can train ML models that provide approximately optimal solutions to large networks in a fraction of the time taken by the best-known heuristic algorithms. Further, when combined with heuristic components, the learning models could compute solutions with much better approximations. Nevertheless, designing efficient and stable machine learning architectures for combinatorial problems has inherent challenges. Some significant challenges include the availability of datasets for learning, designing appropriate ML architectures, and scalability and transferability [16], [17], [18] (as discussed in detail in Section III-C).

Our approach relies on combining data-driven and algorithmic insights to find near-optimal ZFS. We study the structural aspects of the problem to design a graph convolutional network (GCN). We also generate a huge database using synthetic and real-world graph datasets to train our GCN model. As a result, we achieve comparable results to the greedy algorithm in a fraction of the time for huge networks. Our main contributions are:
- We propose a GCN-based architecture using insights from the greedy algorithm. The key aspect is to design a GCN capable of learning to imitate the steps of the greedy algorithm much more efficiently. By employing this domain knowledge in the GCN design process, we achieve a scalable, generalizable, and time-efficient solution, as explained in Section VI-D. We analyze the proposed GNN in detail and discuss various model parameters to obtain superior solutions in terms of time complexity and ZFS size.
- We study the greedy algorithm for the ZFS problem in detail and uncover some valuable insights into its underlying structure. In particular, we develop a hypothesis pertaining to the random selection of ZFS vertices in the initial iterations of the greedy algorithm and empirically validate it across several distinct datasets. To the best of our knowledge, existing research has not explored the integration of random and greedy vertex selection strategies as a means of designing effective heuristics for combinatorial optimization problems such as ZFS.
- We generate a huge database for the ZFS problem. We compute optimal solutions for synthetic graphs and greedy solutions for three synthetic and four real-world graph datasets.
- Using the dataset with the optimal solutions, we show that for the ZFS problem, optimal data is not vital for training our proposed architecture to achieve satisfactory results. Instead, training can be done using approximate solutions. See Section VII for further details.
- We thoroughly evaluate the proposed solution on an extensive collection of graphs and show that our model is highly scalable. Furthermore, we train the GCN on smaller graphs, evaluate it on graphs ten times the size of those in the training set, and observe a remarkable difference in computation time from the greedy algorithm.

ZFS computation is a hard problem, as explained in Section III. There are ways to find an approximate solution that are generally practicable but can have impractical time complexity for huge graphs. ML methods offer great potential to provide time-efficient solutions to challenging combinatorial optimization problems. However, in the literature, these solutions are not trivial. For instance, Joshi et al. show that for the travelling salesman problem (TSP), the trained model's learning is usually limited to a certain scale, and the model's performance drops drastically when evaluated on larger networks [19]. Researchers are trying to generalize machine learning solutions to more extensive networks for the TSP [20], [21], [22]. Bello et al. introduce a neural network-based framework for solving combinatorial optimization, with a focus on the TSP. It trains a recurrent neural network to predict city permutations, optimizing parameters using reinforcement learning with negative tour lengths as rewards [20]. Additionally, collecting optimal data for training is also a massive problem for hard combinatorial optimization problems [23]. We have devised a way to use machine learning without needing an optimally labeled dataset.

The article is organized as follows: Section II introduces the notations and definitions; Section III discusses known ways to compute ZFS and the potential and challenges of ML; Section IV describes the datasets; Section V discusses the structural aspects of the problem. In the last three sections, we propose a GCN-based architecture for the ZFS problem and evaluate several of its aspects, including generalizability and scalability. Finally, Section IX concludes the article.

II. PRELIMINARIES

An undirected graph G = (V, E) models a multiagent network. Vertex set V and edge set E ⊂ V × V represent agents and interactions between them, respectively. The edge between vertices u and v is denoted by an unordered pair (u, v). The neighborhood of u is the set N_u = {v ∈ V : (u, v) ∈ E} and the degree of u is deg(u) = |N_u|. The distance between vertices u and v, denoted by d(u, v), is the number of edges in the shortest path between them. Next, we define the zero forcing process and the respective terms, and then its applications.

A. Zero Forcing Process

Definition (Zero Forcing (ZF) Process): Consider a graph G = (V, E), such that each v ∈ V is colored either black or white initially. The ZF process is to iteratively change the colors of vertices using the following rule until no further changes are possible.

Color change rule: If v ∈ V is colored black and has exactly one white neighbor u, change the color of u to black.

We say that v infected u if the color of white vertex u is changed to black by black vertex v.

Fig. 1. V′ = {v6, v7, v8} is the input set. After the ZF process, D(V′) = V, as indicated by the black vertices. Hence, V′ is a ZFS.

Definition (Derived Set): Consider a graph G = (V, E) with V′ ⊆ V the set of initial black vertices. Then, the set of black vertices obtained at the end of the ZF process is the derived set, denoted by D(G, V′), or simply D(V′) when the context is clear. The cardinality of the derived set for an input set V′, i.e., |D(V′)|, is the span of V′.

The set of initial black vertices V′ is also referred to as the input set. For a given input set V′, the derived set D(V′) is unique [24]. Now, we define the zero forcing set.

Definition (Zero Forcing Set (ZFS)): For a graph G = (V, E), V′ ⊆ V is a ZFS if and only if D(G, V′) = V. We denote a ZFS of G by Z(G) and its size by ζ(G). The minimum zero forcing set, denoted as Z*(G), is a ZFS of minimum size.

Fig. 1 illustrates zero forcing and derived sets.
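The ZF process and the derived set can be computed directly from the definitions in polynomial time (consistent with the fact, noted later, that verifying a ZFS is easy even though finding a minimum one is NP-hard). The following minimal Python sketch (our own illustration, not code from the article) iterates the color change rule to a fixed point and then tests the ZFS condition D(G, V′) = V:

def derived_set(adj, initial_black):
    """Run the ZF process from a set of initially black vertices.

    adj: dict mapping each vertex to the set of its neighbors.
    Returns the derived set D(V'), i.e., every vertex that ends up black.
    """
    black = set(initial_black)
    changed = True
    while changed:
        changed = False
        for v in list(black):
            # Color change rule: a black vertex with exactly one
            # white neighbor forces that neighbor to black.
            white_nbrs = [u for u in adj[v] if u not in black]
            if len(white_nbrs) == 1:
                black.add(white_nbrs[0])
                changed = True
    return black

def is_zfs(adj, subset):
    """A subset is a ZFS iff its derived set is the whole vertex set."""
    return derived_set(adj, subset) == set(adj)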
and the allowable multiplicities of a matrix for the graph are also
B. Applications of ZFS to Network Control

The zero forcing problem has many real-world applications. Here, we particularly discuss an application of the ZFS in network controllability. We consider a network G = (V, E) with |V| agents, denoted by V = {v1, v2, ..., v_|V|}, of which m are leaders, represented by VL = {ℓ1, ℓ2, ..., ℓm} ⊆ V. We consider the following linear time-invariant system on G:

ẋ(t) = M x(t) + B u(t).   (1)

Here, x(t) ∈ R^|V| is the state vector and u(t) ∈ R^m is the external input injected into the system through the m leaders. M ∈ M(G) is the system matrix, where M(G) is a family of symmetric matrices associated with G defined as

M(G) = {M ∈ R^(|V|×|V|) : M = M^T, and for i ≠ j, M_ij ≠ 0 ⇔ (i, j) ∈ E(G)}.

The matrix B ∈ R^(|V|×m) in (1) is the input matrix, such that [B]_ij = 1 if v_i = ℓ_j, and 0 otherwise. We note that the input matrix B is defined by the selection of leader agents. Moreover, M(G) contains a broad class of system matrices defined on graphs, including the adjacency, Laplacian, and signless Laplacian matrices.

The system (1) is controllable if there exists an input u(t) that can drive the system from an arbitrary initial state x(t0) to any desired state x(tf) in a finite amount of time. If the system is controllable for given system and input matrices, we say that (M, B) is a controllable pair. Moreover, (M, B) is a controllable pair if and only if the controllability matrix C(M, B) ∈ R^(|V|×|V|m) is full rank, i.e., rank(C(M, B)) = |V|. The controllability matrix is defined as

C(M, B) = [ B   MB   M^2 B   ···   M^(|V|−1) B ].

Definition (Strong Structural Controllability (SSC)): A graph G = (V, E) with a given set of leaders VL ⊆ V (and the corresponding B matrix) is strong structurally controllable if and only if (M, B) is a controllable pair for all M ∈ M(G).
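For a fixed pair (M, B), the Kalman rank condition above is simple to check numerically. The sketch below (our own illustration) builds C(M, B) and tests its rank; note that SSC quantifies over all M ∈ M(G), so a single rank check only certifies one member of the family:

import numpy as np

def controllability_matrix(M, B):
    """Stack [B, MB, M^2 B, ..., M^(n-1) B] column-wise."""
    n = M.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(M @ blocks[-1])
    return np.hstack(blocks)

def is_controllable_pair(M, B):
    """(M, B) is a controllable pair iff rank C(M, B) = |V|."""
    return np.linalg.matrix_rank(controllability_matrix(M, B)) == M.shape[0]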
We are interested in finding a minimum-sized leader set VL that renders the network strong structurally controllable and refer to this as the minimum leader selection problem (LSP). The LSP is computationally challenging and known to be NP-hard [12], [25], [26], [27]. The LSP is equivalent to the combinatorial optimization problem, called the minimum zero forcing set (ZFS) problem, of finding a minimum ZFS Z*(G) of a given graph G. In [10], the authors show that a leader set VL renders the network strong structurally controllable if and only if VL is a zero forcing set of the network graph G. Thus, a minimum ZFS of G is also a solution of the LSP.

The ZFS problem is directly related to many other graph-theoretic problems, including forts in graphs, the art gallery problem, and k-power propagation in hypergraphs. With respect to zero forcing, a fort is a set of vertices that is not forced by some initial set of colored vertices, and a zero forcing set is exactly a set of vertices hitting every fort [28]. The number of distinct paths, the number of vertices in the paths, and the allowable multiplicities of a matrix for the graph have also been shown to relate to partial zero-forcing sets [29]. Similarly, there are many other problems that can be indirectly solved using the zero-forcing set problem [27], [30], [31], [32].

III. RELATED WORK

The computation of the minimum ZFS Z*(G) is an NP-hard problem [12]. Therefore, many researchers have looked into the computation of approximate solutions for this problem. In this section, we mention some of the approaches for the exact computation of the minimum-sized ZFS. We also discuss a few state-of-the-art algorithms to find approximate solutions, their shortcomings, and how machine learning can be utilized for combinatorial optimization problems such as the ZFS problem, along with the major challenges involved.

A. Exact ZFS Computation

In general, the computation of ζ(G) is an NP-hard problem [12]. Consequently, finding an optimal Z(G) in polynomial time, in general, is out of reach. A widely used algorithm for the exact computation of ζ(G) and a minimum Z(G) is a combinatorial algorithm called the wavefront algorithm [8], [33]. However, the algorithm has an exponential time complexity, and in the worst case, it is the same as enumerating all possible subsets of vertices [8]. Some improvements based on integer programming formulations, branch-and-bound techniques, Boolean satisfiability models, and bilevel mixed-integer linear program formulations have been proposed to solve the minimum ZFS problem [8], [34], [35]. However, the performance of such algorithms depends on several factors, including the existence of specific subgraphs and the graph density. Though these methods

offer improvements, their time complexity remains exponential in general. Hence, many researchers have proposed approximate algorithms to find a Z(G) in polynomial time. In the following subsection, we discuss a few approaches to compute approximate solutions for the minimum ZFS problem.

B. Approximate ZFS Computation

Various heuristics have been proposed to compute Z(G) in large graphs in feasible times. The most notable is the greedy heuristic, which typically computes a small-sized Z(G). The main idea is to iteratively construct a Z(G) by adding to the solution the vertex that maximizes the size of the derived set in that iteration. Once a solution is obtained by this iterative process, redundant vertices are removed to obtain a minimal Z(G). The greedy heuristic generally performs well. Moreover, though the greedy heuristic is faster than optimal algorithms and it can be verified in linear time whether a given set is a Z(G) [36], it still takes significant time to compute a solution for huge graphs, as we demonstrate in our numerical evaluations (Section VI-D). Utilizing the idea of potential games, [13] presents another heuristic based on log-linear learning (LLL) that returns solutions comparable (in terms of ζ(G)) to the greedy algorithm. However, the quality of the solution is a function of the number of iterations. Numerical evaluations show fast convergence; however, the evaluation is confined to graphs of small sizes (50 vertices). Some other heuristics are also discussed in [8], [34].

As mentioned above, the best-known heuristic to find an approximate solution for the ZFS problem is the greedy algorithm. However, it is not scalable and has a huge time complexity for large graphs. For instance, in one of our numerical evaluations, it takes the greedy algorithm about 156 minutes to find Z(G) of a graph with 990 vertices. Similarly, [13], [34] show that there exist graphs for which the greedy solution can be arbitrarily bad; that is, the difference between the optimal and greedy solutions can be arbitrarily large. This limits the applicability of the greedy algorithm for real-world network problems.

C. Graph Machine Learning – Proposed Approach for ZFS Computation

Graph datasets present several new challenges for traditional machine learning, like input size variations, non-regular neighborhood structure, and vertex permutations. There are two main approaches for graph-structured data. Graph embeddings transform the input graphs to a fixed-sized low-dimensional intermediate representation that is then used in a classifier. In contrast, Graph Neural Networks (GNNs) are trained in an end-to-end manner, and graph structural information is used to construct the layout of the neural network.

GNNs generalize deep neural networks (DNNs) for graph-structured data. The main objective is to achieve vector representations of vertices/graphs in a low-dimensional space that encodes the structural relationships in the graph. A GNN is then trained in an end-to-end manner against some appropriately designed loss function using stochastic optimization methods to adapt to the given data samples while optimizing the learning model parameters. GNNs exploit the patterns and invariances in the given data distribution, which help solve many problems on graph-structured data like toxicity prediction in chemical compounds, community detection in social networks [37], etc. GNNs have also been applied to challenging combinatorial optimization problems on graphs with some success. Satisfiability, Travelling Salesman, Knapsack, Minimum Vertex Cover, and Maximum Cut are a few of the problems that have been solved using GNNs [21], [22], [38]. This is an exciting avenue for complicated combinatorial problems that would otherwise either take an exceptional time even for an approximate solution or could not be computed with limited computational resources. This work shows that a GNN-based approach can be used to compute Z(G) with reliable accuracy and efficiency. The sizes of Z(G) returned by our GNN solution are comparable to or smaller than those of existing techniques while taking only a fraction of the runtime, as we thoroughly compare and evaluate in the later sections. However, we note that in general, several challenges need to be addressed to design an efficient GNN solver for a hard combinatorial optimization problem like the minimum Z(G).

1) Limited Datasets for Training: Constructing a dataset of the required size suitable for training a machine learning model is usually the most critical and challenging step in designing a data-driven solution for a combinatorial optimization problem. This is also true for the minimum ZFS problem, for which there is no known labeled dataset available for learning. Generating a large enough dataset with optimal solutions for hard combinatorial problems, such as computing a minimum Z(G), requires due diligence along with sufficient computational resources [16]. We generate a huge new labeled database consisting of several synthetic and real-world graph datasets (Section IV).

2) Scaling and Generalizability: A related issue of scaling also plagues many proposed solutions. Even when there are some datasets available for some graph problems, the instances in these datasets are of very small sizes. Consequently, the machine learning models that are trained on these instances can only reliably be tested on similar-size problems. An important problem that some of the previous works have tried to address is generalizability, i.e., training a model on small instances that performs well on large instances. Such works report limited success so far [17]. Along with being scalable and generalizable to larger problem sizes, the learned models should also extrapolate and transfer well to different families of graphs. Designing a machine learning solution that performs well outside the support of the training distribution is a crucial research challenge [18].

In this article, we present an approach to find the minimum Z(G) that is a combination of both data-driven and heuristic methods (Section VI). We perform several experiments where we are able to illustrate that our approach is capable of both scaling and generalizing, since we exploit the intricate details of the greedy algorithm in the design of our method. As the major part of the solution comes from the machine learning model, our architecture is also much faster and computationally inexpensive compared to the greedy heuristic, without compromising on the quality of the solution.

3) GNN Architectures: GNN architectures for combinatorial optimization problems are designed to maximally exploit the structural characteristics of the problem during the learning phase. Thus, to learn the problem structure and solution patterns from the data, the architecture is adjusted for the particular problem to accommodate its unique characteristics, for instance, vertex penalties and related constraints [15]. This also requires devising proper loss functions for the GNN. Unfortunately, these adjustments are not straightforward and do not typically translate from one problem to another. As a result, designing the most appropriate GNN architecture with a suitable loss function for a combinatorial problem is a tricky affair. Li et al. proposed a novel approach to solve combinatorial optimization problems that leverages both deep learning and classic heuristics [39]. However, a very recent work of Böther et al. shows that for problems like MIS, which involve guided tree searches, the GNNs do not necessarily learn meaningful representations, and they propose to use reinforcement learning inspired by classical solvers for combinatorial optimization problems [17]. Hence, we look deeper into the greedy heuristics for the ZFS to provide a scalable GNN-based solution. Our machine learning model, inspired by the greedy solution, solves this optimization problem in a greedy fashion. We introduce a novel loss function for the task and refine the intermediate solution obtained from the GNN model using a heuristic algorithm (Section VI).

In this work, we provide a GNN-based solution to the ZFS problem that is capable of effectively handling all these challenges and can provide results comparable to those of the greedy algorithm. We use a Graph Convolutional Network (GCN) to imitate the steps of the greedy algorithm and iteratively complete the Z(G) much faster than the greedy algorithm. The empirical results show that approximate solutions can be used to train our proposed network and that optimal solutions are not required for training for this problem. Additionally, the model trained on approximate (greedy) solutions performs better than the greedy algorithm itself in many cases. This provides empirical evidence that our proposed architecture can be used to generate a new database for the ZFS problem.

Next, we present some insights about the greedy algorithm for the ZFS problem and our proposed data-driven architecture to find an approximate Z(G). First, we present details of a huge database prepared as a part of this work. Then, we present interesting observations about the greedy algorithm. Next, we introduce a new GCN-based architecture, motivated by the observations from the greedy algorithm, to find the Z(G) for a given graph. Lastly, we demonstrate the capabilities of the proposed architecture in terms of scalability and generalizability.

IV. DATASETS

We employ a data-driven model in our proposed approach that requires sufficient labelled data for training. As mentioned in the previous section, the zero-forcing set problem belongs to the class of NP-hard combinatorial optimization problems, and considerable computational resources are required to find its solution. The algorithms that provide the optimal Z(G) for a given graph G take exponential time, and their applicability is limited to graphs of small sizes. The greedy heuristics, on the other hand, take a polynomial amount of time and perform fairly well on large random graphs. The learning-based solutions work comparatively very fast and are proclaimed to perform well on combinatorial optimization problems since they may discover useful patterns in the data that are usually hard to specify by hand [21], [39]. To use these learning-based algorithms, a decent-sized labeled dataset with optimal Z(G) is required. Unfortunately, there is no dataset for the minimum ZFS problem. In this work, besides proposing a learning-based solution, we also create a huge database that can be used by any learning-based algorithm to find minimal Z(G). This database will be publicly available for the research community (https://tinyurl.com/ZFS-datasets).

In our database, we include synthetic as well as real-world graph datasets. The computation of optimal Z(G) is infeasible on datasets with huge graphs. As the greedy algorithm finds solutions that are fairly close to the optimal values, we compute greedy solutions Zgr(G), of sizes denoted by ζgr(G), for all the graphs. The total time to find the greedy solutions for all the datasets is about a couple of months on a machine with a Xeon(R) Gold 6238R CPU @ 2.20 GHz and an A40 PCIe GPU. We use the same machine for all our experiments. In addition to the greedy solutions, we also generate optimal solutions for a small subset of synthetic graphs for a thorough study of our proposed architecture, presented in Section VII. Following are the details of the graph datasets used in our experiments.

A. Synthetic Datasets

We generate three new labelled datasets of synthetic graphs. The total time to find the greedy solutions for these graph datasets is more than a month. The details of the parameters used to generate these datasets are presented below, and the general statistics, including the average ζgr(G), are mentioned in Table I.

TABLE I. DATASETS DETAILS.

-Large ER graphs: We generate a dataset of 1500 random Erdős-Rényi (ER) undirected graphs using the networkx python library, with the number of vertices ranging between 500 and 1000. The density parameter p of the ER graphs is varied between 0.013 and 0.125. The average number of vertices in a graph is 745, and the average number of edges is 14748.
-Small ER graphs: We synthesize 979 ER graphs of sizes between 30 and 70, where the average graph size is 65. The graphs in this dataset are relatively more dense, with p varying

from 0.2 to 0.7. Only for these small graphs, we also compute optimal solutions using the wavefront algorithm [8]. In fact, the graphs in this dataset are selected from a pool so that the greedy algorithm performs badly on them compared to the optimal solution. The average difference in the size of the optimal solutions ζop(G) and greedy solutions ζgr(G) is 6.1 vertices.
-Scale-free graphs: This dataset contains 500 sparse undirected scale-free graphs synthesized using the same python module. The average degree of graphs in this dataset is 2.72, with the number of vertices ranging from 50 to 1000.
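Datasets of this kind can be generated in a few lines with networkx. The sketch below is our own reconstruction using the parameter ranges quoted above; the article does not name the exact scale-free generator, so the Barabási-Albert generator (with m = 1 for sparsity) is an assumption:

import random
import networkx as nx

def make_er_dataset(n_graphs, n_range=(500, 1000), p_range=(0.013, 0.125), seed=0):
    """Random Erdős-Rényi graphs with sizes and densities as described above."""
    rng = random.Random(seed)
    graphs = []
    for _ in range(n_graphs):
        n = rng.randint(*n_range)
        p = rng.uniform(*p_range)
        graphs.append(nx.gnp_random_graph(n, p, seed=rng.randint(0, 2**31)))
    return graphs

def make_scale_free_dataset(n_graphs, n_range=(50, 1000), seed=0):
    """Sparse scale-free graphs; the generator choice is an assumption."""
    rng = random.Random(seed)
    # m = 1 keeps the graphs sparse (average degree close to 2).
    return [nx.barabasi_albert_graph(rng.randint(*n_range), 1, seed=rng.randint(0, 2**31))
            for _ in range(n_graphs)]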
B. Real-World Datasets

We also include four standard real-world graph datasets in our database.
-IMDB-BINARY & IMDB-MULTI are both movie collaboration datasets where the nodes are actors/actresses and an edge is formed if two actors appear in the same movie [40].
-REDDIT-BINARY is a social network dataset consisting of graphs that represent online Reddit discussions [41]. A node is a user, and an edge represents a reply to a comment.
-COLLAB is a scientific collaboration dataset in which graphs have researchers as nodes and their collaborations as edges [42].

Among these real-world datasets, REDDIT-BINARY is the only one that contains graphs with more than 500 nodes. For this dataset, we only take the graphs that have fewer than 1200 nodes, since it becomes computationally very expensive to find the greedy solutions for larger graphs. The total time for computing greedy solutions for REDDIT-BINARY is approximately 6 days. The graphs in the other datasets are processed within a day. Further statistics about these datasets are provided in Table I.

These datasets are vital for the numerical analysis of the ZFS problem. In the next section, we provide some intriguing observations about the greedy algorithm using this database.

V. INSIGHTS INTO GREEDY HEURISTICS FOR ZFS COMPUTATION

One approach to compute an approximate Z(G) is the greedy approach, in which we iteratively construct a Z(G) by adding the vertex that maximizes the span of the intermediate solution, as mentioned in Algorithm 1. However, the selection of a vertex will be arbitrarily random if two or more vertices yield the same span. In the initial few iterations, we observe that the greedy algorithm is likely to have multiple potential vertices to add to the intermediate solution if the degree of all the vertices is large enough. During these iterations, the construction of the intermediate Z(G) solution by the greedy algorithm might be only as effective as random selection. Based on this observation, we propose the following conjecture.

Algorithm 1: Greedy Algorithm for ZFS.
Input: G = (V, E)
Output: Zero forcing set Z
1: Initialize: Z ← {}
2: while D(Z) ≠ V do
3:   v* ← argmax_{vi ∈ V\Z} |D(Z ∪ {vi})|
4:   Z ← Z ∪ {v*}
5: end while
6: for all vi ∈ Z do  // removing redundancies
7:   if D(Z \ {vi}) = V then
8:     Z ← Z \ {vi}
9:   end if
10: end for

Conjecture 1: If half of the vertices of Zgr(G) are picked randomly and the rest of the approximate solution is completed by the greedy algorithm, then the result is fairly the same as a solution obtained completely by the greedy algorithm.

We define this combination of random selection and greedy heuristics as the Random-Greedy algorithm (see the Python sketch below).
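For concreteness, the following is a minimal Python rendering of Algorithm 1 and of the random-greedy variant (our own naming, reusing the derived_set helper sketched in Section II-A). It favors clarity over speed; an efficient implementation would avoid recomputing derived sets from scratch in every iteration:

import random

def greedy_zfs(adj):
    """Algorithm 1: greedy construction followed by redundancy removal."""
    vertices = set(adj)
    Z = set()
    while derived_set(adj, Z) != vertices:
        # Add the vertex that maximizes the span of the intermediate solution.
        Z.add(max(vertices - Z, key=lambda v: len(derived_set(adj, Z | {v}))))
    for v in list(Z):  # removing redundancies
        if derived_set(adj, Z - {v}) == vertices:
            Z.remove(v)
    return Z

def random_greedy_zfs(adj, tau, zeta_gr):
    """Random-greedy: seed tau * zeta_gr random vertices, then finish greedily."""
    vertices = set(adj)
    Z = set(random.sample(sorted(adj), int(tau * zeta_gr)))
    while derived_set(adj, Z) != vertices:
        Z.add(max(vertices - Z, key=lambda v: len(derived_set(adj, Z | {v}))))
    for v in list(Z):  # redundancy check, as in Algorithm 1
        if derived_set(adj, Z - {v}) == vertices:
            Z.remove(v)
    return Z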
We empirically validate Conjecture 1 utilizing the database presented in Section IV. We perform experiments on all four real-world datasets and on the large ER and scale-free graph datasets. Only a subset of the large ER graphs dataset (300 graphs) is selected, due to the computation overhead for these large graphs. The details of the experimental setup and the results are given in the following subsections.

A. Experimental Setup

For all these synthetic and real-world datasets, we compute a greedy solution, represented as Zgr(G) for a graph G = (V, E), whose size is represented as ζgr(G). The greedy algorithm has two dominant parts: first, greedily populating Z(G) until it is a complete solution, and then a greedy redundancy check to remove any extra nodes. The redundancy check is a separate part, and it is applied in all our experiments where the greedy algorithm is used to complete the solution.

To corroborate our conjecture, we introduce random selection into the greedy algorithm. We provide an intermediate solution computed by randomly selecting a vertex set whose size is a fraction of ζgr(G) and then pass this partially formed solution to the greedy algorithm to obtain a Z(G) primarily based on random selection. We represent the solution obtained by the random-greedy algorithm as Zrdm(G) and its size as ζrdm(G). For comparison, we already have the greedy solution and ζgr(G) for each graph; we only fix the randomness fraction ratio τ to obtain the intermediate solution.

The performance of this approach is described using the percentage deviation between the size of the Z(G) given by the greedy solution, ζgr(G), and the size of the Z(G) returned by this partial random selection, ζrdm(G). We report the average of the deviation over all the graph instances in the test sets of the different datasets. Formally,

Dev(ζrdm) = (1/|K|) · Σ_{i=1}^{|K|} [(ζrdm(Gi) − ζgr(Gi)) / ζgr(Gi)] × 100,

where K is the test set of a dataset. We vary the fraction ratio τ for the random selection and observe the average deviation for each of the datasets.

Fig. 2. Results for synthetic and real-world datasets when a certain percentage of vertices is selected randomly and the greedy algorithm is used to complete the Z(G).

Fig. 3. Sizes of the Z(G) computed by the greedy and the random selection for the graphs in the REDDIT-BINARY dataset with τ = 0.7.

B. Results and Discussion

The results obtained from this experiment align with Conjecture 1. In this experiment, we begin by randomly selecting a subset of vertices from a given graph G as part of the Zrdm(G) solution. The size of this subset is determined by multiplying a user-defined parameter, τ, by the size of the ZFS obtained from the greedy algorithm, denoted as ζgr(G). Subsequently, we feed this partial random solution to the greedy algorithm to acquire a complete solution. Finally, a redundancy check is performed to eliminate any redundant vertices from the solution Zrdm(G).

To evaluate the impact of the random selection, we vary the percentage of the random selection and plot the average deviation Dev(ζrdm) from the sizes of the ZFS obtained by the greedy algorithm. Fig. 2 shows the deviation of ζrdm from ζgr for all the datasets under consideration. Notably, we observe that on average, the size of Zrdm(G) remains reasonably close to the size of Zgr(G), even when a considerable portion of Z(G) is selected randomly. For instance, even if we pick 0.95 × ζgr(G) vertices randomly and then complete the rest of the solution using the greedy algorithm, the solution Zrdm(G) merely has 1% more vertices on average than the purely greedy solution Zgr(G). This finding substantiates Conjecture 1. The increasing trend for the Large ER graphs dataset in Fig. 2 arises because the graphs in that dataset are relatively sparser, and a slight difference in the size of Z(G) results in a larger deviation from the greedy algorithm. Fig. 3 compares the sizes of Z(G) for graphs in the REDDIT-BINARY dataset with τ = 0.7. It can be clearly observed that the results from the random selection are usually as small in size as the results of the greedy algorithm.

As explained above, the main reason behind this conjecture is that for the first several iterations, the greedy algorithm picks the vertices effectively at random, but the later iterations are important in choosing the vertices that usually increase the size of the derived set significantly. Consider the example in Fig. 4. The optimal Z(G) size for this graph is three. If we pick two vertices randomly and the third using the greedy approach, we can still end up with a Z(G) of size three. This is primarily because, for the first iteration, the derived set has only one vertex (itself), no matter which vertex is selected. Assume vertex v7 is selected randomly (greedy can select this vertex as well). Now, whichever vertex is picked next can increase the size of the derived set by at most two. If v4 is picked next by a random process, then the last vertex picked by the greedy algorithm will be v8, which will complete the Z(G). Similarly, if v5 and v3 are the first two randomly selected vertices, v1 will be picked by the greedy algorithm to maximally increase and complete the Z(G). This example illustrates that not only can the first iterations of the greedy algorithm be random, but there can also be multiple optimal solutions of Z(G) for a given graph. Hence, a random process can arguably replace the greedy process for the initial iterations. However, the random selection can result in a sub-optimal solution slightly larger than the original solution. The redundancy check, in those cases, removes the extra vertices from the solution and brings it closer to the optimal Z(G) in size.

Fig. 4. Two instances of ZFS solutions. The initial two vertices (v4 and v7 in ZFS 1, and v3 and v5 in ZFS 2) are selected randomly, whereas the third vertex in each solution (v8 in ZFS 1 and v1 in ZFS 2) is selected by the greedy algorithm maximizing the span.

To further illustrate this point, we plot the size of the derived set after each iteration of the greedy algorithm for a few graphs in Fig. 5. These graphs are selected so that the difference between the size of the graph and ζgr(G) is significant. We plot the size of the derived set after the addition of each vertex by the greedy algorithm. It can be easily observed that there comes a point where, by the addition of a very small subset of nodes, the span of the solution sees a drastic jump. For instance, for the graph from the REDDIT-BINARY dataset, we see that for the addition
of the last 20 vertices, the size of the derived set grows by about 120 vertices. Similar behavior is observed for graphs of other datasets. This behavior reflects the fact that only a fraction of the iterations of the greedy heuristics, especially the later ones, are important.

Fig. 5. Plots of the size of the derived set against each greedy algorithm iteration.

In a nutshell, even if 95% of Z(G) is populated by random vertex selection and the rest is completed by the greedy algorithm, it can be seen that for all kinds of synthetic and real-world graphs, the average deviation is no more than 3%. Hence, random selection along with greedy heuristics can reduce the Z(G) computation time manifold without a significant loss in performance. In the next section, we replace the greedy heuristics with a faster learning-based algorithm to further reduce the computation time.

VI. GRAPH CONVOLUTIONAL NETWORK (GCN) ARCHITECTURE FOR ZFS

Previously, we have shown that a sophisticated algorithm is required only for the last few iterations of the Z(G) computation. In this section, we design a Graph Neural Network (GNN) based architecture to replace the greedy algorithm that can significantly reduce the time complexity compared to the greedy algorithm and can still generate a small-sized Z(G). The proposed GNN is based on the Graph Convolutional Network (GCN) [43], which uses a message-passing network where information is communicated along the neighboring vertices within the graph. It works by iteratively aggregating information from neighboring vertices to update node representations, allowing the network to learn meaningful features from graph-structured data. It leverages the graph's connectivity patterns to improve its understanding of the relationships between vertices.

Overview of the Approach: Given a network G, our goal is to find a binary labelling of each vertex in G such that label 1 represents that the vertex is included in a Z(G), while label 0 means the converse. A machine learning approach to this problem involves training a model to generate this discrete labeling: a model f(G) that takes an input graph G and outputs a binary labeling of vertices. Generally, these models generate a probability map P : V → [0, 1] indicating how likely each vertex is to be included in a Z(G). However, a straightforward application of this approach, in which a vertex is included in a solution set based on the probability assignment, does not always render a correct solution to the combinatorial optimization problem [39]. Having outputs that are not necessarily valid solutions is a general issue in data-driven approaches for combinatorial optimization problems, mainly because the training searches the parameter space without any hard constraints. To deal with this issue, we train a separate regression-based model that predicts the size ζop(G) of an optimal solution using basic network properties. We obtain an intermediate solution Ŝ^x(G) by selecting x = Trand × ζ̂(G) vertices randomly from the vertex set V, where ζ̂(G) ∈ Z is the predicted value of ζop(G) obtained by the regression model, and Trand is a hyper-parameter for the percentage of the solution selected by the random process. For simplicity, we drop the argument of Ŝ^x(G). We then pass the input graph G and this intermediate solution Ŝ^x, as part of the vertex features, to the GCN to get the output probabilities for each vertex. These probabilities are sorted, and the vertex with the maximum probability that is not already part of the intermediate solution is added to Ŝ^x to form Ŝ^(x+1). We find the derived set D(G, Ŝ^(x+1)) and check whether Ŝ^(x+1) is a zero forcing set. If it is not a Z(G), this partial solution is passed iteratively through the GCN, and a vertex is added to it in each iteration based on the GCN output probabilities until it becomes a Z(G), denoted by Ŝ. Fig. 6 illustrates this scheme; the iterative ZFS test is represented by the diamond-shaped block (named ZFS check) in Fig. 6. In the end, a redundancy check is applied to remove from the solution any extra vertices that might have been added as a result of the initial random selection or by the GCN, and we obtain a final solution Zgcn(G) of size ζgcn(G). Hence, the solution obtained from the proposed approach is essentially a ZFS.

In the following subsections, we briefly explain the main parts of the proposed architecture.
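Put together, the pipeline is a short loop. The sketch below is our own paraphrase of the scheme in Fig. 6 under stated assumptions: regressor, gcn_probs, and graph_features are hypothetical stand-ins for the trained Random Forest, the trained GCN, and the regression features, while derived_set is the helper sketched in Section II-A:

import random

def gcn_zfs(adj, regressor, gcn_probs, t_rand=0.5):
    """End-to-end sketch: random seed set -> GCN completion -> redundancy check."""
    vertices = set(adj)
    zeta_hat = int(regressor.predict([graph_features(adj)])[0])  # estimate of zeta(G)
    S = set(random.sample(sorted(adj), int(t_rand * zeta_hat)))  # random selection
    while derived_set(adj, S) != vertices:                       # ZFS check
        probs = gcn_probs(adj, S)                                # per-vertex probabilities
        S.add(max(vertices - S, key=lambda v: probs[v]))         # most probable new vertex
    for v in list(S):                                            # redundancy check
        if derived_set(adj, S - {v}) == vertices:
            S.remove(v)
    return S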
A. Regression Model

The regression model is a simple Random Forest model from the python module sklearn. The random forest is a meta-estimator that fits a number of decision trees on several samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting [44]. The number of estimating trees used in our experiments is 500, and the objective function for training the regression model is the squared error. The graph features used for training are the number of vertices in the graph G, the number of edges, and the five maximum and six minimum vertex degrees. This trained model predicts ζ̂(G), an estimate of ζ(G). This model is computationally inexpensive but provides a good prediction with a very low error, as mentioned in Section VI-D.
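A minimal version of this estimator takes a few lines with scikit-learn. The feature extractor below is our own guess at the described features (vertex count, edge count, the five largest and six smallest degrees); the article's exact feature code is not shown:

from sklearn.ensemble import RandomForestRegressor

def graph_features(adj):
    """|V|, |E|, the 5 largest and the 6 smallest vertex degrees."""
    degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
    n_edges = sum(degrees) // 2
    return [len(adj), n_edges] + degrees[:5] + degrees[-6:]

# 500 trees with the squared-error objective, as described above.
regressor = RandomForestRegressor(n_estimators=500, criterion="squared_error")
# Training (schematic): features from each graph, greedy/optimal ZFS sizes as labels.
# regressor.fit([graph_features(adj) for adj in train_graphs], train_zfs_sizes)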

Fig. 6. Algorithm overview. First, a regression model trained on ζ(G) provides an estimate ζ̂(G) of ζ(G). An intermediate solution Ŝ^x is formed by randomly selecting Trand × ζ̂(G) vertices. This intermediate solution is then completed iteratively using a multi-layered graph convolutional network f(G; θ) that provides a set of probabilities for each vertex in the input graph G, until a ZFS Ŝ is formed. Lastly, a redundancy check is performed that removes any redundant vertices.

B. Random Selection

As explained above, for the initial iterations, random selection can be as effective as greedy selection. Thus, we randomly populate the intermediate solution with a fraction of the vertices. This fraction is computed using ζ̂(G) and the threshold Trand. For our experiments, we set Trand = 0.5. The output of this Random Selection process is a set of vertices Ŝ^x generated by randomly selecting x = 0.5 × ζ̂(G) vertices.

C. GCN

In our combinatorial optimization problem, the goal of the desired machine learning model is to learn a function f(G; θ) : G → {0, 1}*, where G is the set of all unlabeled graphs and {0, 1}* is the space of all 0,1-vectors of arbitrary length, depending on the size of the graphs in G. Being a data-driven approach, we use labeled data to train this supervised machine learning model. We assume that our data is in the form of pairs (Gi, Si), where Gi is an undirected graph with Ni vertices and Si ∈ {0, 1}^Ni is a binary representation of an optimal zero forcing set of Gi. As per standards, we let our approximation function f̂(Gi; θ), a machine learning network, take values from [0, 1]* instead of {0, 1}* to facilitate a differentiable loss function. The network f̂(Gi; θ) is parameterized by a learnable parameter θ and is trained to predict Si for a given Gi. The parameters (weights) of the network are updated by gradient descent while optimizing a loss function L(Si, f̂(Gi; θ)), discussed in detail in Section VI-C1. The network f̂(Gi; θ) is trained to imitate an iteration of the greedy algorithm. For training, we randomly remove a vertex from the ground truth (Zgr(G) in our case) and train the network to learn to predict that missing vertex.

We initialize the first layer with the one-hot encoding of each vertex degree. We also concatenate to these vertex features a binary value indicating whether or not a vertex is part of the intermediate solution Ŝ^x. The features of a vertex v for each subsequent layer are denoted by H_v^(k+1), where the superscript (k + 1) represents the layer index, and the subscript denotes the vertex. H_v^(k+1) is computed by layer-wise convolutions with previous layers, first aggregating the features of the neighbors of v (including v itself). This so-called message-passing step precedes the convolution with the weight matrices θ^k, as explained in (2). We get the updated feature vector by adding a bias parameter and applying a non-linearity to the result. Instead of using the adjacency matrix A of the input graph to aggregate neighborhood features, we use the symmetrically normalized matrix Γ^(−1/2) A Γ^(−1/2), where Γ represents the diagonal matrix of the respective vertex degrees. Formally,

H^(k+1) = σ(H^k θ_b^k + Γ^(−1/2) A Γ^(−1/2) H^k θ_w^k),   (2)

where σ is any nonlinear activation function, and θ_b^k ∈ R^(C^k × 1) and θ_w^k ∈ R^(C^k × C^(k+1)) are the trainable weights used for the convolutions. C^k denotes the number of features per vertex at the kth layer.
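Equation (2) translates directly into a few lines of PyTorch. The following is a minimal sketch of one such layer (our own code, with dense matrices for clarity; a practical implementation would use sparse operations, and we widen the bias-like term θ_b^k to a full linear map so the shapes line up, so the paper's exact parameterization may differ):

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One convolution of (2): H' = sigma(H W_b + D^(-1/2) A D^(-1/2) H W_w)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_b = nn.Linear(in_features, out_features, bias=False)  # theta_b^k
        self.w_w = nn.Linear(in_features, out_features, bias=False)  # theta_w^k

    def forward(self, H, A):
        deg = A.sum(dim=1).clamp(min=1)                          # vertex degrees
        d_inv_sqrt = deg.pow(-0.5)
        A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # Gamma^-1/2 A Gamma^-1/2
        return torch.relu(self.w_b(H) + A_norm @ self.w_w(H))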
The standard loss function used in a GCN penalizes the wrong prediction of the label of each vertex and aggregates the total loss over all the vertices. However, this may not be an ideal loss to minimize for the combinatorial optimization problem considered in this work. Therefore, we design a custom loss function suitable for learning effectively.

1) The Loss Function: A differentiable loss function is used in supervised learning to estimate the "distance" between the predicted label and the ground truth and to find the appropriate direction for updating the neural network's weights using optimization methods like gradient descent. The loss function minimized during the training phase is conventionally the binary cross-entropy loss. For a training example (Gi, Si), the binary cross-entropy is defined as

L_CE(Si, f̂(Gi; θ)) = − Σ_{j=1}^{N} ( S_ij log(f̂_j(Gi; θ)) + (1 − S_ij) log(1 − f̂_j(Gi; θ)) ),

where S_ij is the jth element of Si and f̂_j(Gi; θ) is the jth element of f̂(Gi; θ) (representing the jth vertex). This standard loss penalizes the wrong prediction of the label of each vertex and aggregates the total loss over all the vertices. In addition to this loss, we add another term that further penalizes the vertices that are already part of the intermediate solution. This term forces the model to learn to imitate the iterative steps of the greedy algorithm. We update our loss function as follows:

L(Si, f(Gi; θ)) = α L_CE(Si, f(Gi; θ)) + (1 − α) L_sel(Ŝ_i^x, f(Gi; θ)),

where L_sel(Ŝ_i^x, f(Gi; θ)) is the summation of the probabilities corresponding to the x vertices included in Ŝ_i^x(G). This function contains a hyper-parameter α that signifies the relative importance of penalizing the already selected vertices and can be used to tune the network. We set it to 0.5.
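As a sketch, the combined loss is a one-liner in PyTorch. Here probs are the per-vertex outputs of f̂(Gi; θ), target is Si, and sel_mask marks the vertices of Ŝ_i^x (our own naming, assuming L_sel is simply the sum of the probabilities of the already-selected vertices, as described above):

import torch.nn.functional as F

def zfs_loss(probs, target, sel_mask, alpha=0.5):
    """alpha * binary cross-entropy + (1 - alpha) * penalty on selected vertices.

    probs: per-vertex probabilities in (0, 1); target: 0/1 float labels;
    sel_mask: 0/1 indicator of vertices already in the partial solution.
    """
    l_ce = F.binary_cross_entropy(probs, target, reduction="sum")
    l_sel = (probs * sel_mask).sum()  # discourage re-picking chosen vertices
    return alpha * l_ce + (1.0 - alpha) * l_sel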

D. Evaluation of the Proposed Approach


To evaluate our proposed GCN based approach, we run ex-
periments on all the available datasets mentioned in Section IV.
We combine all the synthetic, and the real-world graph datasets
and randomly split them into 90-10% train-test sets. Both the
regression and the GCN models are trained on this 90% train
set. The regression is trained to predict ζ̂(G) whereas the GCN
will predict the vertex that will be added to the partial solution
Ŝ x by the greedy algotithm in xth iteration.
The hyper-parameters of both the regression and GCN models
are same throughout all the following sets of experiments. The
Random Forest regression model has 500 estimators with a
squared error loss function. This regression model is trained
only once and is used for all the rest of the experiments of this
article. Its mean absolute error and mean squared error for the
test set are 3.42 and 6.87 respectively. On the other hand, the
GCN model is trained separately for all the different sets of Fig. 7. Plot (a) shows the sizes of the ZFS computed by the greedy as well
experiments. However, its hyper-parameters are kept the same. as the GCN-based solution for the graphs in the test set. Plot (b) shows the
respective time taken to compute the solutions.
The main architecture of the GCN is adapted from Li et al. [39]. There are 20 hidden layers, with 500 vertex features in the first layer and 64 in all subsequent layers. For the first layer, we use the one-hot encoded degree of each vertex, where a maximum degree of 499 is supported for training and testing. We also concatenate a binary indicator to inform the model whether or not a vertex is a part of Ŝ^x. Each model is trained for 300 epochs with the custom loss function defined in Section VI-C1.
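For concreteness, the first-layer input could be assembled as in the following sketch. The helper name and the exact feature-width bookkeeping (one-hot degrees up to 499 plus the membership flag) are our assumptions.

```python
import torch
import torch.nn.functional as F

def initial_features(degrees, selected_mask, max_degree=499):
    """First-layer vertex features: one-hot encoded degree (clipped at
    max_degree) concatenated with a binary flag marking membership in the
    intermediate solution S^x.

    degrees:       (n,) long tensor of vertex degrees
    selected_mask: (n,) boolean membership indicator
    """
    deg = degrees.clamp(max=max_degree)
    one_hot = F.one_hot(deg, num_classes=max_degree + 1).float()
    flag = selected_mask.float().unsqueeze(1)
    return torch.cat([one_hot, flag], dim=1)  # (n, max_degree + 2)
```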
We evaluate our architecture on two main metrics: accuracy and time complexity. The accuracy is defined as the percentage deviation between the size of the Z(G) computed by the greedy algorithm, ζgr(G), and the size of the Z(G) returned by the GCN, ζgcn(G). We report the average deviation over all the graph instances in the test sets of the different datasets. Formally, the average deviation of the GCN solutions from the greedy solutions is
Dev(ζgcn) = (1/|K|) Σ_{i=1}^{|K|} [ (ζgcn(Gi) − ζgr(Gi)) / ζgr(Gi) ] × 100,

where K is the set of graphs in the test set.
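In code, this metric amounts to the following straightforward sketch (plain Python; the paired lists of per-graph solution sizes are illustrative inputs):

```python
def average_deviation(gcn_sizes, greedy_sizes):
    """Average percentage deviation Dev of GCN solution sizes from greedy
    solution sizes over the paired test-set graphs in K."""
    deviations = [
        (z_gcn - z_gr) / z_gr * 100.0
        for z_gcn, z_gr in zip(gcn_sizes, greedy_sizes)
    ]
    return sum(deviations) / len(deviations)
```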
Fig. 7 compares ζgr(G) and ζgcn(G) for the graphs in the test dataset, together with the respective times to compute them. We have sorted the results on the basis of the size of the greedy solutions and the time taken by the greedy algorithm to compute the solution.
Despite having both sparse and dense graphs in the test set, with an average ζgr(G) of 105.78, the average deviation Dev(ζgcn) on this test set is 0.73%. This essentially means that the Z(G) found by the proposed architecture, Zgcn(G), on average has 0.76% more vertices than Zgr(G). However, for a single graph with 690 vertices, the GCN was able to find a Z(G) containing 11 vertices fewer (4.14% deviation) than the greedy solution. On the other hand, there is a graph with 880 vertices for which the greedy solution had 28 vertices fewer (5.9% deviation) than the GCN solution. The total time taken by the greedy algorithm to find solutions for all the graphs in this test set is about 99 hrs, whereas the proposed approach found all the solutions in approximately 8.2 hrs, most of which is spent on the redundancy check.

Evidently, the proposed architecture can find the Z(G) 12 times faster than the greedy algorithm while being, on average, only 1% larger in size. In the next section, we evaluate the learning capabilities of our architecture from sub-optimal solutions.

VII. GENERALIZABILITY OVER SUB-OPTIMAL SOLUTIONS

Being a data-driven approach, the proposed GCN-based architecture requires optimal solutions for training. Unfortunately, these are usually computationally hard to compute for graph instances large enough for efficient learning. Thus, forming an optimal dataset is nearly impossible for NP-hard combinatorial problems. It is only natural to look for approximate solutions that are not very far from the optimal solutions and can be found in polynomial time. As verified by the numerical evaluations, our proposed architecture is capable of performing just as well whether it is trained on optimal or sub-optimal data. To validate this hypothesis, we use the small ER graphs dataset (explained in Section IV-A) for GCN training and testing, since it is the only dataset for which the optimal solutions are available.

Fig. 8. Comparison of sub-optimal and optimal dataset training.
As far as the experimental setup to validate our hypothesis is concerned, only the training and testing sets are changed; the rest is the same as in Section VI-D. We randomly split the small ER graphs dataset into 70-30% train-test sets so that we have enough data to evaluate. Since we randomly remove a vertex from the greedy/optimal solution to generate each training sample (see the sketch following this paragraph), we have enough samples for training. We first train a GCN model on the optimal Z(G) and use it to find the Z(G) by our proposed approach. We call the output set Zgcn_op(G), with size ζgcn_op(G). Similarly, we train another GCN model using the greedy solutions as the ground truth. Since this model is trained on sub-optimal data, the Z(G) obtained from our architecture is expected to be slightly different from the one where the GCN is trained on the optimal data. The output from this GCN model is referred to as Zgcn_gr(G), with size ζgcn_gr(G). We plot the sizes of the greedy solutions ζgr(G), the optimal solutions ζop(G), the greedy-gcn solutions ζgcn_gr(G), and the optimal-gcn solutions ζgcn_op(G) in Fig. 8.
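One plausible reading of this sampling step is sketched below; the representation of solutions as vertex sets and the helper name are our assumptions.

```python
import random

def make_training_sample(graph, solution):
    """Build one imitation-learning sample from a full (greedy or optimal)
    ZFS: hold out a uniformly random solution vertex as the prediction
    target, and present the remaining vertices as the intermediate
    solution S^x that the GCN sees as input."""
    target = random.choice(sorted(solution))  # vertex the model should predict
    partial = set(solution) - {target}        # intermediate solution S^x
    return graph, partial, target
```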
It can be observed from Fig. 8 that the solutions from both GCNs are fairly close to each other. In fact, the average deviations of the two models differ by only 0.67%, and both perform better than the greedy algorithm. The model trained on the greedy solutions has an average deviation Dev(ζgcn_gr) of −5.13% from the greedy solutions, whereas the model trained on the optimal solutions has an average deviation Dev(ζgcn_op) of −5.8%. This shows that both GCN architectures give a solution that has, on average, 2 vertices fewer than the greedy solution, while the difference between the two GCN solutions is negligible.
This experimental setup validates that our proposed approach can learn equally well from the greedy and the optimal solutions. Moreover, we are able to show that GNNs can perform better than the greedy algorithm even when trained on solutions from greedy heuristics. Hence, with our proposed approach, optimal data is not necessary for the ZFS problem, and approximate solutions can be used for training. This result paves the way for other combinatorial problems to be solved using GNNs without preparing an optimal dataset for training. In the next section, we validate the scalability of our approach and compare our results on huge graphs with the greedy algorithm.

Fig. 9. Plot (a) shows the sizes of the Z(G) computed by the greedy as well as the GCN-based solutions for the graphs in the test set of larger graphs. Plot (b) shows the respective time taken to compute the solutions.

VIII. SCALABILITY OVER LARGE GRAPHS

As scalability can be a huge challenge for data-driven approaches to hard combinatorial problems, we also perform an extensive time-complexity analysis on large graphs. It is important to show that a method that is much faster than the state-of-the-art does not compromise on results as the size of the input increases. Hence, we train our GCN model on the small graphs, test it on larger graphs (almost 10 times the size of the training graphs), and observe the performance. In this experimental setup, we combine all the available datasets and sort the graphs by their size. We split this sorted mixture such that the smallest 70% of the graphs are in the train set and the largest 15% are in the test set. The graph sizes in the training set range from 6 to 110 vertices. It takes the greedy algorithm less than 10 secs to find the solution for a graph in this size range. The graphs in the test set range from 500 to 1194 vertices. Using the rest of the experimental setup of Section VI-D, we evaluate the proposed approach on the accuracy and time-complexity metrics. The sizes of the ZFS, ζgr(G) and ζgcn(G), and the time taken for the ZFS computation of each large graph in the test set are plotted in Fig. 9.
Interestingly, our proposed approach turns out to give a solution that is nearly the same in size as the greedy solution on these large graphs while taking only a fraction of the time. The average deviation for these large test graphs is 0.745%, with a maximum of 3.4% (19 vertices fewer than the greedy solution) and a minimum of −9.2% (29 vertices more than the greedy solution). The average size of the greedy solution in the test set is 449.3 vertices. This indicates that, on average, there are about 3.5 more vertices in the Z(G) obtained by the GCN than in the greedy solution. However, the GCN takes a total time of about 2.5 days (primarily spent on the redundancy check) while the greedy algorithm takes about 48 days to compute the Z(G) for the 1832 graphs in the test set. This essentially means that the GCN architecture is less than 1% away from the greedy solution while being about 19 times faster on large graphs.

We expect the time difference between the GCN and the greedy algorithm to grow exponentially for larger graphs. However, validating this hypothesis for significantly larger graphs becomes impractical, primarily because the greedy algorithm's computational demands become prohibitively high on graphs with more than 2500 vertices. As shown above, the GCN is computationally much faster than the greedy algorithm on large graphs while preserving the solution quality. Even though random selection and greedy heuristics can reduce the time, the GCN is still significantly faster than the greedy algorithm, especially for larger graphs. To substantiate this, we conduct an empirical study focusing on computational efficiency and solution quality. Our experimentation involves varying the randomness threshold τ of the initially selected vertices from 35% to 75%. To get a range of graph sizes, we select the REDDIT-BINARY graphs and a part of the Large ER graphs dataset to obtain graphs with the number of vertices ranging between 6 and 1194. We compare the quality of the solutions using Dev(ζrdm) and Dev(ζgcn). Fig. 10 compares the random-greedy with the proposed GCN-based approach.
We observe that the random-greedy algorithm consumes significantly more time than our GCN-based approach, even when allowing for up to 75% random selection. This underscores the computational advantages of our method, especially for larger graphs. Fig. 10(a) demonstrates that increasing the randomness threshold τ slightly deteriorates the solution quality for both approaches. This trade-off is expected, but our approach consistently maintains a competitive solution quality. Additionally, the time taken by the random selection and the greedy heuristics decreases as the threshold τ increases, since fewer iterations of the greedy algorithm are needed to complete the solution. Conversely, our GCN-based approach incurs a slight increase in time as τ rises, due to the overhead cost of the redundancy checks, which are performed in a greedy fashion. In sum, our empirical findings collectively support the argument that even when random selection and greedy heuristics are employed, the random-greedy approach lags significantly behind the computational efficiency of our GCN-based method, particularly as graph size increases.

Fig. 10. Comparison between random-greedy and the proposed GCN-based solution. Plot (a) shows the deviation from the greedy solution with varying randomness threshold τ, and plot (b) presents the respective time taken to compute the solution on the whole dataset.

IX. CONCLUSION

Minimum ZFS is a hard combinatorial optimization problem with several applications across various domains. We presented a novel graph convolutional network to compute a small-sized ZFS. The proposed solution utilizes data-driven and algorithmic insights to compute ZFS in large graphs effectively. Through extensive experiments, we showed that the proposed approach is computationally efficient, scalable to much larger graphs, generalizable to different graph families, and able to learn from sub-optimal datasets. Thus, our approach is not inhibited by the requirement to have optimal datasets consisting of large enough instances to train the graph learning models for combinatorial optimization problems. We also contributed towards future data-driven algorithms for the minimum ZFS problem through several synthetic and real-world benchmark datasets using greedy solutions as the ground truth. In addition, a small graphs dataset containing hard instances is also annotated with optimal solutions. In the future, we aim to extend our approach to solving other combinatorial optimization problems, including minimum dominating sets in graphs.

REFERENCES

[1] N. Barnier and P. Brisset, "Graph coloring for air traffic flow management," Ann. Operations Res., vol. 130, no. 1, pp. 163–178, 2004.
[2] Y. Peng, B. Choi, B. He, S. Zhou, R. Xu, and X. Yu, "VColor: A practical vertex-cut based approach for coloring large graphs," in Proc. IEEE 32nd Int. Conf. Data Eng., 2016, pp. 97–108.
[3] B. Balasundaram and S. Butenko, "Graph domination, coloring and cliques in telecommunications," in Handbook of Optimization in Telecommunications. Berlin, Germany: Springer, 2006, pp. 865–890.
[4] F. Moradi, T. Olovsson, and P. Tsigas, "A local seed selection algorithm for overlapping community detection," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, 2014, pp. 1–8.
[5] N. Armenatzoglou, H. Pham, V. Ntranos, D. Papadias, and C. Shahabi, "Real-time multi-criteria social graph partitioning: A game theoretic approach," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2015, pp. 1617–1628.
[6] P. R. Östergård, "A fast algorithm for the maximum clique problem," Discrete Appl. Math., vol. 120, no. 1–3, pp. 197–207, 2002.
[7] D. Burgarth, D. D'Alessandro, L. Hogben, S. Severini, and M. Young, "Zero forcing, linear and quantum controllability for systems evolving on networks," IEEE Trans. Autom. Control, vol. 58, no. 9, pp. 2349–2354, Sep. 2013.
[8] B. Brimkov, C. C. Fast, and I. V. Hicks, "Computational approaches for zero forcing and related problems," Eur. J. Oper. Res., vol. 273, no. 3, pp. 889–903, 2019.
[9] P. A. Dreyer Jr. and F. S. Roberts, "Irreversible k-threshold processes: Graph-theoretical threshold models of the spread of disease and of opinion," Discrete Appl. Math., vol. 157, no. 7, pp. 1615–1627, 2009.
[10] N. Monshizadeh, S. Zhang, and M. K. Camlibel, "Zero forcing sets and controllability of dynamical systems defined on graphs," IEEE Trans. Autom. Control, vol. 59, no. 9, pp. 2562–2567, Sep. 2014.
[11] S. S. Mousavi, M. Haeri, and M. Mesbahi, "On the structural and strong structural controllability of undirected networks," IEEE Trans. Autom. Control, vol. 63, no. 7, pp. 2234–2241, Jul. 2018.
[12] A. Aazami, "Hardness results and approximation algorithms for some problems on graphs," Ph.D. dissertation, University of Waterloo, Waterloo, ON, Canada, 2008.
[13] W. Abbas, M. Shabbir, Y. Yazıcıoğlu, and X. Koutsoukos, "Leader selection for strong structural controllability in networks using zero forcing sets," in Proc. Amer. Control Conf., 2022, pp. 1444–1449.
[14] F. Gama, E. Tolstaya, and A. Ribeiro, "Graph neural networks for decentralized controllers," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2021, pp. 5260–5264.
[15] M. Boffa, Z. B. Houidi, J. Krolikowski, and D. Rossi, "Neural combinatorial optimization beyond the TSP: Existing architectures under-represent graph structure," in Proc. AAAI Workshop Graphs More Complex Structures Learn. Reasoning, 2022.
[16] Q. Cappart, D. Chételat, E. Khalil, A. Lodi, C. Morris, and P. Veličković, "Combinatorial optimization and reasoning with graph neural networks," J. Mach. Learn. Res., vol. 24, no. 130, pp. 1–61, 2023.
[17] M. Böther, O. Kißig, M. Taraz, S. Cohen, K. Seidel, and T. Friedrich, "What's wrong with deep learning in tree search for combinatorial optimization," in Proc. Int. Conf. Learn. Representations, 2022, pp. 1–25.
[18] K. Xu, M. Zhang, J. Li, S. S. Du, K. Kawarabayashi, and S. Jegelka, "How neural networks extrapolate: From feedforward to graph neural networks," in Proc. Int. Conf. Learn. Representations, 2020, pp. 1–52.
[19] C. K. Joshi, T. Laurent, and X. Bresson, "An efficient graph convolutional network technique for the travelling salesman problem," 2019, arXiv:1906.01227.
[20] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, "Neural combinatorial optimization with reinforcement learning," 2016, arXiv:1611.09940.
[21] Q. Cappart, D. Chételat, E. B. Khalil, A. Lodi, C. Morris, and P. Velickovic, "Combinatorial optimization and reasoning with graph neural networks," J. Mach. Learn. Res., vol. 24, no. 130, pp. 1–61, 2023.
[22] W. Kool, H. van Hoof, and M. Welling, "Attention, learn to solve routing problems!," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–25.
[23] Q. Wang and C. Tang, "Deep reinforcement learning for transportation network combinatorial optimization: A survey," Knowl.-Based Syst., vol. 233, 2021, Art. no. 107526.
[24] AIM Minimum Rank Special Graphs Work Group, "Zero forcing sets and the minimum rank of graphs," Linear Algebra Appl., vol. 428, no. 7, pp. 1628–1648, 2008.
[25] M. Trefois and J.-C. Delvenne, "Zero forcing number, constrained matchings and strong structural controllability," Linear Algebra Appl., vol. 484, pp. 199–218, 2015.
[26] A. Chapman and M. Mesbahi, "On strong structural controllability of networked systems: A constrained matching approach," in Proc. Amer. Control Conf., 2013, pp. 6126–6131.
[27] S. Fallat, K. Meagher, and B. Yang, "On the complexity of the positive semidefinite zero forcing number," Linear Algebra Appl., vol. 491, pp. 101–122, 2016.
[28] I. V. Hicks et al., "Computational and theoretical challenges for computing the minimum rank of a graph," INFORMS J. Comput., vol. 34, pp. 2868–2872, 2022.
[29] D. Ferrero et al., "Rigid linkages and partial zero forcing," Electron. J. Combinatorics, 2019, Art. no. P2.43.
[30] F. H. Kenter and J. C.-H. Lin, "On the error of a priori sampling: Zero forcing sets and propagation time," Linear Algebra Appl., vol. 576, pp. 124–141, 2019.
[31] J. C.-H. Lin, P. Oblak, and H. Šmigoc, "The strong spectral property for graphs," Linear Algebra Appl., vol. 598, pp. 68–91, 2020.
[32] D. Burgarth and V. Giovannetti, "Full control by locally induced relaxation," Phys. Rev. Lett., vol. 99, no. 10, 2007, Art. no. 100501.
[33] S. Butler et al., "Minimum rank library. Sage programs for calculating bounds on the minimum rank of a graph, and for computing zero forcing parameters," 2014. Accessed: Mar. 24, 2022. [Online]. Available: https://github.com/jasongrout/minimum_rank
[34] B. Brimkov, D. Mikesell, and I. V. Hicks, "Improved computational approaches and heuristics for zero forcing," INFORMS J. Comput., vol. 33, pp. 1259–1684, 2021.
[35] A. Agra, J. O. Cerdeira, and C. Requejo, "A computational comparison of compact MILP formulations for the zero forcing number," Discrete Appl. Math., vol. 269, pp. 169–183, 2019.
[36] A. Weber, G. Reissig, and F. Svaricek, "A linear time algorithm to verify strong structural controllability," in Proc. IEEE 53rd Conf. Decis. Control, 2014, pp. 5574–5580.
[37] Y. Jiang, Y. Rong, H. Cheng, X. Huang, K. Zhao, and J. Huang, "Query driven-graph neural networks for community search: From non-attributed, attributed, to interactive attributed," Proc. VLDB Endowment, vol. 15, no. 6, pp. 1243–1255, 2022.
[38] O. Vinyals, M. Fortunato, and N. Jaitly, "Pointer networks," in Proc. 28th Int. Conf. Neural Inf. Process. Syst., vol. 2, 2015, pp. 2692–2700.
[39] Z. Li, Q. Chen, and V. Koltun, "Combinatorial optimization with graph convolutional networks and guided tree search," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 537–546.
[40] P. Yanardag and S. Vishwanathan, "Deep graph kernels," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2015, pp. 1365–1374.
[41] W. L. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 1025–1035.
[42] C. Morris, N. M. Kriege, F. Bause, K. Kersting, P. Mutzel, and M. Neumann, "TUDataset: A collection of benchmark datasets for learning with graphs," 2020, arXiv:2007.08663.
[43] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Representations, 2016.
[44] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.

Obaid Ullah Ahmad received the B.S. degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2019, and the M.S. degree in computer science from Information Technology University, Lahore, in 2021. He is currently working toward the Ph.D. degree in electrical engineering at the University of Texas at Dallas, Richardson, TX, USA, where he is a Research Assistant with the Control, Intelligence, Resilience in Networks and Systems Lab. His current research interests include control-based approaches for graph machine learning, network optimization, and multi-robot systems.

Mudassir Shabbir received the Ph.D. degree from the Division of Computer Science, Rutgers University, New Brunswick, NJ, USA, in 2014. He is currently an Associate Professor with the Department of Computer Science, Information Technology University, Lahore, Pakistan, and a Research Assistant Professor with Vanderbilt University, Nashville, TN, USA. He was with the Lahore University of Management Sciences, Pakistan; Los Alamos National Labs, NM, USA; Bloomberg L.P., New York, NY, USA; and Rutgers University. He was a Rutgers Honors Fellow from 2011 to 2012. His main research interests include algorithmic and discrete geometry; he has developed new methods for the characterization and computation of succinct representations of large data sets, with applications in non-parametric statistical analysis. He also works on graph machine learning and resilient network systems.
Waseem Abbas (Member, IEEE) received the M.Sc. and Ph.D. degrees in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2010 and 2013, respectively. He is currently an Assistant Professor with the System Engineering Department, University of Texas at Dallas, Richardson, TX, USA. He was a Research Assistant Professor with Vanderbilt University, Nashville, TN, USA, and was a Fulbright Scholar from 2009 to 2013. His research interests include control of networked systems, resilience and robustness in networks, distributed optimization, and graph-theoretic methods in complex networks.

Xenofon Koutsoukos (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 2000. He is currently the Thomas R. Walters Professor and Chair of the Department of Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA. He is also a Senior Research Scientist with the Institute for Software Integrated Systems (ISIS) and holds a secondary appointment with the Department of Electrical and Computer Engineering. He was a Member of Research Staff with the Xerox Palo Alto Research Center (PARC) during 2000-2002. He has coauthored more than 350 journal and conference papers, and he is a co-inventor of four U.S. patents. His research interests include cyber-physical systems with emphasis on learning-enabled systems, security and resilience, diagnosis and fault tolerance, distributed algorithms, formal methods, and adaptive resource management. Dr. Koutsoukos was the recipient of the NSF CAREER Award in 2004, the Excellence in Teaching Award in 2009 from the Vanderbilt University School of Engineering, and the 2011 NASA Aeronautics Research Mission Directorate (ARMD) Associate Administrator (AA) Award in Technology and Innovation.