
Graph Q-Learning

for Combinatorial Optimization

Victoria M. Dax, Stanford University
Jiachen Li, Stanford University
Kevin Leahy, MIT Lincoln Laboratory
Mykel J. Kochenderfer, Stanford University

Abstract

Graph-structured data is ubiquitous throughout natural and social sciences, and Graph Neural Networks (GNNs) have recently been shown to be effective at
solving prediction and inference problems on graph data. In this paper, we propose
and demonstrate that GNNs can be applied to solve Combinatorial Optimization
(CO) problems. CO concerns optimizing a function over a discrete solution space
that is often intractably large. To learn to solve CO problems, we formulate the
optimization process as a sequential decision making problem, where the return
is related to how close the candidate solution is to optimality. We use a GNN to
learn a policy to iteratively build increasingly promising candidate solutions. We
present preliminary evidence that GNNs trained through Q-Learning can solve CO
problems with performance approaching state-of-the-art heuristic-based solvers,
using only a fraction of the parameters and training time.

1 Introduction

Many important real-world problems, from social networks to chemical systems, are naturally represented as graphs. Over the past five years, graph neural networks (GNNs) have emerged as a powerful approach for machine learning on such structured data [1], [2] by combining the representation learning capabilities of deep networks with bespoke adaptations based on graph properties. These approaches have been effective in node classification [3], relation prediction [4], and graph classification [5]. More recently, the community has been exploring other applications [6] for GNNs and analyzing their theoretical limitations and mathematical properties [7].
Many of these successes have focused on prediction or inference problems. In this work, we demonstrate that GNNs can be applied to solve a different class of problems. Combinatorial Optimization (CO) problems are often encountered across diverse fields and are difficult to solve exactly. CO
problems require optimizing a function over a combinatorial space, possibly subject to constraints;
examples include finding the minimum spanning tree (MST) in a graph or determining if there exists
a variable assignment that satisfies a given Boolean formula (k-SAT). The ubiquity of CO across
domains and applications led to the development of heuristic-based solvers in the early days of
computer engineering that remain state-of-the-art today [8].
Motivated by deep learning’s success at learning representations that outperform hand-engineered
features, we explore whether GNNs can learn to outperform heuristic-based CO solvers when trained
via reinforcement learning (RL). Our specific contributions are:

36th Conference on Neural Information Processing Systems (NeurIPS 2022).


1. Representing instances of combinatorial optimization problems as graphs and formulating the search for an optimized solution as a Markov decision process (MDP). To the best of our knowledge, this work is the first to solve FJSP using a deep learning-based method.
2. Showing how GNNs can be used to solve CO problems and that our formulation generalizes to a form of meta-learning.
3. Demonstrating empirically that we match the performance of other algorithms and baseline heuristics with a fraction of the parameters and training time.

2 Literature Review
The current research landscape in graph-based learning consists of theoretical improvements to the architecture [9], [10], explorations of its limitations [11], and applications to a variety of domains through supervised learning.
Graph neural networks have only very recently transitioned into use in the context of RL. However, rather than end-to-end, GNNs are typically employed in a modularized fashion: Li et al. [12] use a GNN to infer a graph from sequential data streams. Xiao et al. [13] employ a graph attention network (GAT) in a multi-agent RL context, but for optimal reward balancing to speed up the learning process. Liu et al. [14] use a GNN for game abstraction to simplify complex multiplayer games. Ma et al. [15] leverage the node update to mimic k-logit reasoning in two-player zero-sum games.
Very few papers explore GNNs in the context of CO [16]. These works focus on the travelling salesperson problem only, tailor their implementations to that task specifically, and use GNNs as a search heuristic to prune the solution space rather than training a solver directly.

3 Preliminaries
Combinatorial optimization (CO) refers to optimizing an objective function whose domain is
a discrete but combinatorially large configuration space, making the space of possible solutions
typically too large to search exhaustively. Examples of well-known combinatorial optimization
problems include the Travelling Salesman Problem (TSP), Minimum Spanning Tree (MST), and
Boolean Satisfiability (SAT). While some instances of CO problems can be solved exactly through branch-and-bound, many are NP-hard, and we generally must resort to specialized heuristics that rule out large parts of the search space, or to approximation algorithms. Formally, a combinatorial optimization problem A is defined by the tuple (I, f, m, g), where I is a set of instances, x ∈ I is an instance, f(x) is the finite set of feasible solutions y, m(x, y) denotes the measure of a feasible solution y ∈ f(x), and g is the goal function, usually max or min. The goal is then to find, for some instance x, an optimal solution, that is, a feasible solution y with

m(x, y) = g{m(x, y′) | y′ ∈ f(x)}.    (1)
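To make the (I, f, m, g) formalism concrete, the sketch below instantiates it for a toy three-city travelling salesperson instance; the brute-force enumeration and all names are illustrative and not part of the paper.

```python
from itertools import permutations

# Instance x in I: symmetric distance matrix of a toy 3-city TSP.
x = [[0, 2, 9],
     [2, 0, 6],
     [9, 6, 0]]

def f(x):
    """Feasible solutions y: all tours starting at city 0 and visiting every city once."""
    return [(0,) + p for p in permutations(range(1, len(x)))]

def m(x, y):
    """Measure of a feasible solution y: total length of the closed tour."""
    return sum(x[y[i]][y[(i + 1) % len(y)]] for i in range(len(y)))

g = min  # goal function

# Optimal solution per Eq. (1): a feasible y whose measure attains g over f(x).
y_star = g(f(x), key=lambda y: m(x, y))
print(y_star, m(x, y_star))  # (0, 1, 2) with tour length 2 + 6 + 9 = 17
```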

In graph-based learning, a GNN layer can be viewed as a message-passing step [17], where each node updates its state by aggregating messages flowing from its direct neighbors. A graph is a tuple of nodes and edges G = (V, E). The one-hop neighborhood of node u is N_u = {v ∈ V | (v, u) ∈ E}. A node feature matrix X ∈ R^(|V|×k) gives the k features of node u as x_u; we omit edge- and graph-level features for clarity. A (message-passing) GNN layer over this graph is then executed as

h_u = ψ(AGG{ϕ(x_u, x_v) | v ∈ N_u}),    (2)

where ϕ : R^k × R^k → R^k is a message function, ψ : R^k → R^k is a readout function, and AGG is a permutation-invariant aggregation function (such as Σ or max). Both ϕ and ψ can be realised as MLPs, but many special cases exist, such as attentional GNNs [18].
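As a concrete illustration of Eq. (2), the following minimal numpy sketch implements one message-passing layer with sum aggregation; the tanh nonlinearities and single weight matrices stand in for the MLPs ϕ and ψ and are assumptions made for illustration.

```python
import numpy as np

def gnn_layer(X, edges, W_msg, W_out):
    """X: (|V|, k) node features; edges: list of directed (v, u) pairs, v -> u."""
    n, k = X.shape
    agg = np.zeros((n, k))
    for v, u in edges:
        # phi: message function R^k x R^k -> R^k, here one weight matrix + tanh
        msg = np.tanh(np.concatenate([X[u], X[v]]) @ W_msg)
        agg[u] += msg                      # permutation-invariant AGG (sum)
    # psi: readout function R^k -> R^k, applied to each aggregated node state
    return np.tanh(agg @ W_out)

# Example: 3 nodes with k = 4 features and a small directed edge list.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
edges = [(0, 1), (2, 1), (1, 0)]
W_msg, W_out = rng.normal(size=(8, 4)), rng.normal(size=(4, 4))
print(gnn_layer(X, edges, W_msg, W_out).shape)  # (3, 4)
```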
Reinforcement Learning (RL) considers an agent learning how to select actions in an environment to maximize its long-term cumulative reward, i.e., the return, in a sequential decision-making process [19]. The environment is modeled as a Markov Decision Process (MDP), defined by the tuple (S, A, p, r, γ) with S = {s} the set of states, A = {a} the set of actions, p(s′ | s, a) the state transition distribution, r : A × S → R a bounded reward function, and γ the discount factor. RL aims to find a policy π : S → A that maps a state s to an optimal action a, where optimality is defined as maximizing the expected return.

Model-free RL refers to learning a policy or Q-function through a form of trial and error, without explicitly modeling the transition probability distribution p or the reward function r. Deep Q-learning [20] is a simple algorithm but can suffer from overestimation bias and catastrophic forgetting. Several other methods have been introduced to reduce overestimation bias. Weighted double Q-learning [21], for example, combines the single Q-learning estimate with the double Q-learning estimate, which on its own may lead to underestimation bias.
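For reference, the tabular one-step Q-learning update that deep Q-learning approximates with a neural network can be sketched as follows; the toy table sizes and learning rate are illustrative assumptions, not values from this paper.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=1.0):
    """Q: (|S|, |A|) table. Moves Q[s, a] toward the target r + gamma * max_a' Q[s', a']."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((4, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2, done=False)
print(Q[0, 1])  # 0.1
```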

4 Method
We frame combinatorial optimization problems as sequential decision-making processes, where the return is related to how close a candidate solution is to optimality. We use a GNN to learn a policy that sequentially builds increasingly promising candidate solutions. We aim to solve general classes of CO problems, but in this work we focus on the flexible job shop scheduling problem (FJSP).

4.1 Problem Definition

The classical job shop scheduling problem (JSSP) [22] defines n jobs, each consisting of a set of operations o_i with varying processing times that must be processed in a specific order given by a set of precedence constraints. Each operation is assigned to one of m machines, and the goal is to find a processing schedule that minimizes the makespan, i.e., the total length of the schedule from the start of the first job to the end of the last. The FJSP [23] is a generalization of the classical JSSP that allows an operation to be processed on one machine out of a set of alternative machines. The FJSP is an NP-hard problem consisting of two sub-problems, the assignment problem and the scheduling problem; the scheduling problem by itself reduces to the classical JSSP. Viable solutions are evaluated in terms of the makespan, defined as the time difference between the start and finish of the sequence of jobs.
In this paper, proposing a solution to an instance of the FJSP is treated as a sequential decision-making
process, which iteratively takes a scheduling action to assign an operation to a compatible machine at
each state until all operations are scheduled. The proposed workflow is shown in Figure 1.

Figure 1: Training cycle.

4.2 Combinatorial Optimization as MDPs

By reformulating the FJSP as a sequential decision-making problem, more specifically a Markov Decision Process (MDP), we can use traditional RL methods to find viable solutions to a given FJSP instance. The initial state s_0 defines the problem setting, i.e., how many machines are available, how many operations need to be processed, and in what sequence. At each time step t, an action a_t assigns an unassigned operation to the end of a compatible machine's queue. Consequently, a state s_t defines the current resource allocation and processing sequence. An episode ends when all operations of all jobs have been assigned to a machine, and the resulting assignment is the proposed solution to the problem instance. If the agent assigns operations with unmet upstream requirements first, these operations cannot be executed and render a machine idle until their requirements are met. Gridlock refers to the state in which all machines are idle because of such assignments.

During a rollout, previously assigned jobs can start processing and the agent will receive positive unit
rewards for completed operations, which are then removed from the queue. To encourage solving for
a shorter makespan, a negative reward of −0.1 is accrued at each time step.
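A minimal sketch of this reward signal, using a hypothetical helper name not taken from the authors' code: each operation completed at a step contributes +1, and every step incurs the −0.1 makespan penalty.

```python
# +1 per operation completed at this step, -0.1 time penalty per step
# to encourage schedules with a shorter makespan.
def step_reward(num_completed_ops: int, step_penalty: float = -0.1) -> float:
    return 1.0 * num_completed_ops + step_penalty

print(step_reward(2))  # 1.9: two operations finished this step
print(step_reward(0))  # -0.1: no completions, only the makespan penalty
```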
An action-value function Q(s_t, a_t) gives the expected return of taking action a_t in state s_t. The next section defines the architecture of a heterogeneous graph neural network that we subsequently train to approximate Q.

4.3 Heterogeneous Graph Neural Networks

We employ a disjunctive graph G = {O, M, E_j, E_q, C} to model the current state of an FJSP instance. Here, O is the set of operations independent of the jobs to which they belong, M is the set of available machines, E_j and E_q are sets of directed edges that denote the sequence of operations within each job and within each machine queue, respectively, and C is the set of conjunctive, undirected edges that assign operations to machines.
Edges in C, represented by dashed edges in Figure 2a, specify operation-machine compatibility, and their assigned weights define a speed-up or slow-down of the default processing time. The feature space of an operation comprises i) the time required to finish the operation, ii) its completion percentage, iii) the number of downstream dependencies, iv) a one-hot encoding of its current state, i.e., whether it is scheduled, being processed, or completed, and v) the remaining time. The machine features comprise the number of queued operations and their minimal expected run time.
To encode this state representation into a meaningful action-value function Q, we use a heterogeneous graph neural network composed of graph convolution layers [3], [24]. Our network architecture consists of two fully connected layers that embed both node types into the same dimensionality, followed by a set of convolutional layers, one per edge type, without weight sharing. The resulting intermediate node embeddings are summed. By looping over this same heterogeneous layer k times, each node embedding incorporates state information from nodes within a k-step radius. A dot-product readout layer is then used for edge prediction over C. The edge with the highest score defines the new operation-to-machine assignment, shown by solid blue arrows in Figure 2b.
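The following torch sketch illustrates this architecture: shared-dimension embeddings for the two node types, one linear "convolution" per edge type without weight sharing, summed updates looped k times, and a dot-product readout over candidate edges in C. Layer widths, the tanh nonlinearities, and all names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HeteroQNet(nn.Module):
    def __init__(self, d_op, d_mach, d=16, k=2, n_edge_types=3):
        super().__init__()
        self.embed_op = nn.Linear(d_op, d)      # embed both node types
        self.embed_mach = nn.Linear(d_mach, d)  # into the same dimension d
        # one convolution per edge type, no weight sharing
        self.convs = nn.ModuleList([nn.Linear(2 * d, d) for _ in range(n_edge_types)])
        self.k = k

    def forward(self, x_op, x_mach, edge_lists):
        # node ordering: operations first, then machines
        h = torch.cat([self.embed_op(x_op), self.embed_mach(x_mach)], dim=0)
        for _ in range(self.k):                    # k rounds of message passing
            out = torch.zeros_like(h)
            for conv, (src, dst) in zip(self.convs, edge_lists):
                msg = torch.tanh(conv(torch.cat([h[dst], h[src]], dim=-1)))
                out = out.index_add(0, dst, msg)   # sum intermediate embeddings
            h = torch.tanh(out)
        return h

    def q_values(self, h, cand_src, cand_dst):
        # dot-product readout: one score per candidate operation-machine edge in C
        return (h[cand_src] * h[cand_dst]).sum(dim=-1)

# usage sketch: 4 operations with 5 features, 2 machines with 2 features
x_op, x_mach = torch.randn(4, 5), torch.randn(2, 2)
edges = [
    (torch.tensor([0, 1]), torch.tensor([1, 2])),  # E_j: job precedence
    (torch.tensor([2]), torch.tensor([3])),        # E_q: queue order
    (torch.tensor([4, 5]), torch.tensor([0, 1])),  # C: machines 4, 5 -> operations 0, 1
]
net = HeteroQNet(d_op=5, d_mach=2)
h = net(x_op, x_mach, edges)
q = net.q_values(h, torch.tensor([0, 1]), torch.tensor([4, 5]))
print(h.shape, q.shape)  # torch.Size([6, 16]) torch.Size([2])
```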

Figure 2: Graphical representation of a sample FJSP instance: (a) the initial state at t = 0 and (b) the state s_{t=5} after 5 actions have been taken.

5 Experimental Results

5.1 Implementation details

Random FJSP samples are created by initializing m machines and n · ñ_o operation nodes, which are then queued randomly into n jobs. Here, ñ_o denotes the average number of operations per job. To create C, we fully connect each machine to each operation to indicate possible assignments, and then randomly drop a fraction p of these edges. The weights of the remaining edges indicate the relative speed-up or slow-down for their endpoint operations. Each operation is assigned a baseline runtime; the actual runtime of operation o_i on machine w_j is obtained by adjusting this baseline by the corresponding edge weight.
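A hedged sketch of this sampling procedure follows; the runtime and speed-up ranges, the uniform distributions, and all names are assumptions for illustration, not the authors' generator.

```python
import numpy as np

def sample_fjsp(m, n, avg_ops, p_drop, seed=0):
    """Sample a random FJSP instance with m machines and n jobs."""
    rng = np.random.default_rng(seed)
    n_ops = n * avg_ops
    job_of_op = rng.integers(0, n, size=n_ops)         # queue operations randomly into n jobs
    base_runtime = rng.uniform(1.0, 10.0, size=n_ops)  # baseline runtime per operation
    compat = rng.random((n_ops, m)) > p_drop           # C: keep each op-machine edge w.p. 1 - p
    speed = rng.uniform(0.5, 1.5, size=(n_ops, m))     # edge weights: relative speed-up/slow-down
    runtime = base_runtime[:, None] * speed            # actual runtime of o_i on machine w_j
    return job_of_op, compat, runtime

job_of_op, compat, runtime = sample_fjsp(m=15, n=25, avg_ops=3, p_drop=0.3)
print(compat.shape, runtime.shape)  # (75, 15) (75, 15)
```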

For the following experiments, we set the number of HGNN iterations to k = 2 and the dimension of the machine and operation embeddings to 16. In each epoch, we sample 128 trajectories, which are stored in a replay buffer of size 5000, and run 64 training iterations with a batch size of 32 state transitions. The discount factor is set to γ = 1.0 and the exploration constant to ϵ = 0.1. The network is updated using the Adam optimizer with a learning rate of 8 × 10−5.
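For convenience, the reported hyperparameters can be collected into a single configuration, as sketched below; the key names are illustrative and not taken from the authors' code.

```python
# Training hyperparameters as reported above (key names are illustrative).
config = {
    "hgnn_iterations": 2,          # k
    "embedding_dim": 16,           # machine and operation embeddings
    "trajectories_per_epoch": 128,
    "replay_buffer_size": 5000,
    "train_iters_per_epoch": 64,
    "batch_size": 32,
    "discount_gamma": 1.0,
    "exploration_epsilon": 0.1,
    "optimizer": "Adam",
    "learning_rate": 8e-5,
}
```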

5.2 Baselines

To evaluate our method, we benchmark against simulated annealing, a probabilistic approximation method, and a state-of-the-art meta-heuristic introduced by Nouri et al. [25]. To the best of our knowledge, this work is the first to solve FJSP using a deep learning-based method. We also benchmark our scheduling performance against recent deep learning methods designed for the simpler scheduling problem, the JSSP [26].
To compare performance across methods, we evaluate the optimality gap

ϵ = C_min / C* − 1,    (3)

where C_min is the makespan of a candidate solution and C* is the optimal makespan. This metric can also be referred to as the relative error. Throughout these experiments, we use Google's OR-Tools solver [27] to solve for C*.
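A small helper implementing Eq. (3), with illustrative names and values:

```python
# Optimality gap / relative error of a candidate makespan c_min against the optimum c_star.
def optimality_gap(c_min: float, c_star: float) -> float:
    return c_min / c_star - 1.0

print(round(optimality_gap(106.2, 100.0), 3))  # 0.062, i.e. a 6.2% relative error
```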

5.3 Results

Figure 3 shows the learning curves for our Q-learner trained on sample problems of size 25 × 15.
Around epoch 150, we find sudden jumps in the success rate and training rewards. These jumps
happen when the learnt solver transitions from gridlocking itself to producing feasible solutions for
the specified problem instances. The optimality gap can only be evaluated for feasible solutions,
therefore the relative error curve starts around that same epoch and then decreases sharply as the
solver learns to improve on its general strategy.

Figure 3: Training performance summary (25 × 15). The panels plot reward, loss, success rate, and relative error against training epoch for both training and evaluation rollouts.

After the Q-learner has been trained, it can be used to solve problems of any size. As such, this formulation can be interpreted as a type of meta-learning, enabled by a graphical representation of the problem space that is not limited to a fixed problem size. We ran the same solver on problems of multiple sizes and evaluated 128 samples in each case. Table 1 summarizes our results. We find that the meta-heuristic is a strong baseline that solves instances optimally up to the largest problem size, 100 × 20, while FIFO performs poorly from the start. The results reported for DQL come from the same network, evaluated on problems of different sizes. The 100 × 20 case is an exception: the relative error was initially 29%, but after 100 epochs of fine-tuning on larger problems it was reduced to 6.2%.
For further reference, we also report optimality gaps for ScheduleNet [26], a similar deep RL approach for the classic scheduling problem, JSSP. Our network architecture defines only 960 independent weights, while ScheduleNet defines 6022 for the actor alone; because ScheduleNet adopts PPO, it additionally requires a trained critic. We therefore match or exceed the performance that an equivalent deep RL approach achieves on a simpler problem, while using less than a sixth of its parameters.

Table 1: Optimality gaps for different FJSP sizes.


Method            15×15    15×25    30×20    50×15    50×20    100×20
FIFO              –        0.7647   0.69     0.857    –        1.235
Meta-Heuristic    –        0.0      0.0      0.0      0.0      0.022
DQL               0.01     0.011    0.0      0.052    0.04     0.062
ScheduleNet       0.153    0.194    0.187    0.138    0.135    0.066

In Figure 4, we compare the runtime performance for different sizes of FJSP. While the meta-heuristic's runtime appears to grow polynomially with problem size, the runtime of DQL remains nearly constant.

Figure 4: Runtimes per sample [s] for different FJSP sizes, comparing DQL, the meta-heuristic, and FIFO.

6 Conclusion
In this work, we demonstrate how graph neural networks can be used to efficiently solve large, complex combinatorial optimization problems. By framing CO instances as graph sequences, we can use reinforcement learning to find promising solutions. These solutions are approximate, but while keeping the relative error low, our method scales much better in runtime than a more accurate meta-heuristic. We believe these are promising initial results toward matching the performance of state-of-the-art heuristic-based solvers.

Acknowledgments and Disclosure of Funding


DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This ma-
terial is based upon work supported by the Under Secretary of Defense for Research and Engineering
under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommen-
dations expressed in this material are those of the author(s) and do not necessarily reflect the views of
the Under Secretary of Defense for Research and Engineering. © 2022 Massachusetts Institute of
Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in
this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of
this work other than as specifically authorized by the U.S. Government may violate any copyrights
that exist in this work.

References
[1] M. Gori, G. Monfardini, and F. Scarselli, “A new model for learning in graph domains,” in IEEE
International Joint Conference on Neural Networks, vol. 2, 2005, pp. 729–734.
[2] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network
model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
[3] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in
International Conference on Learning Representations, 2017.
[4] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional
neural networks for web-scale recommender systems,” in ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2018, pp. 974–983.
[5] C. Morris, M. Ritzert, M. Fey, et al., “Weisfeiler and leman go neural: Higher-order graph neural
networks,” in AAAI Conference on Artificial Intelligence (AAAI), 2019.
[6] J. Zhao, Y. Dong, M. Ding, E. Kharlamov, and J. Tang, “Adaptive diffusion in graph neural networks,” in
Advances in Neural Information Processing Systems (NeurIPS), 2021.
[7] M. Fereydounian, H. Hassani, J. Dadashkarimi, and A. Karbasi, “The exact class of graph functions
generated by graph neural networks,” arXiv preprint arXiv:2202.08833, 2022.
[8] T. Koch, T. Achterberg, E. Andersen, et al., “MIPLIB 2010,” Mathematical Programming Computation,
vol. 3, pp. 103–163, Jun. 2011.
[9] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka, “Representation learning on
graphs with jumping knowledge networks,” in International Conference on Machine Learning (ICML),
PMLR, 2018, pp. 5453–5462.
[10] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” In International
Conference on Learning Representations, 2022.
[11] U. Alon and E. Yahav, “On the bottleneck of graph neural networks and its practical implications,” in
International Conference on Learning Representations, 2021.
[12] S. Li, J. K. Gupta, P. Morales, R. Allen, and M. J. Kochenderfer, “Deep implicit coordination graphs for
multi-agent reinforcement learning,” in AAMAS, 2021.
[13] B. Xiao, B. Ramasubramanian, and R. Poovendran, “Agent-temporal attention for reward redistribution
in episodic multi-agent reinforcement learning,” CoRR, vol. abs/2201.04612, 2022. arXiv: 2201.04612.
[14] L. Liu, W. Wang, Y. Hu, J. Hao, X. Chen, and Y. Gao, “Multi-agent game abstraction via graph attention
neural network,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7211–7218,
Apr. 2020.
[15] X. Ma, D. Isele, J. K. Gupta, K. Fujimura, and M. J. Kochenderfer, “Recursive reasoning graph for
multi-agent reinforcement learning,” Proceedings of the AAAI Conference on Artificial Intelligence,
vol. 36, no. 7, pp. 7664–7671, Jun. 2022.
[16] N. Mazyavkina, S. Sviridov, S. Ivanov, and E. Burnaev, “Reinforcement learning for combinatorial
optimization: A survey,” Computers & Operations Research, vol. 134, p. 105400, 2021.
[17] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum
chemistry,” in International Conference on Machine Learning (ICML), PMLR, 2017, pp. 1263–1272.
[18] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,”
in International Conference on Learning Representations, 2018.
[19] M. J. Kochenderfer, T. A. Wheeler, and K. H. Wray, Algorithms for decision making. MIT press, 2022.
[20] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in
AAAI Conference on Artificial Intelligence, vol. 30, 2016.
[21] Z. Zhang, Z. Pan, and M. J. Kochenderfer, “Weighted double Q-learning,” in International Joint Confer-
ence on Artificial Intelligence (IJCAI), 2017, pp. 3455–3461.
[22] D. Applegate and W. Cook, “A computational study of the job-shop scheduling problem,” ORSA Journal
on Computing, vol. 3, no. 2, pp. 149–156, 1991.
[23] I. A. Chaudhry and A. A. Khan, “A research survey: Review of flexible job shop scheduling techniques,”
International Transactions in Operational Research, vol. 23, no. 3, pp. 551–591, 2016.
[24] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances
in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, et al., Eds., vol. 30,
Curran Associates, Inc., 2017.

[25] H. E. Nouri, O. Belkahla Driss, and K. Ghedira, “Solving the flexible job shop problem by hybrid
metaheuristics-based multiagent model,” International Journal of Industrial Engineering, vol. 1, pp. 1–
14, May 2017.
[26] J. Park, S. Bakhtiyarov, and J. Park, “ScheduleNet: Learn to solve multi-agent scheduling problems with
reinforcement learning,” in International Conference on Learning Representations, 2022.
[27] Google AI, Google OR-Tools, version 9.3, Mar. 2022.
