
Cluster Computing (2024) 27:589–605
https://doi.org/10.1007/s10586-022-03957-w

Online-learning task scheduling with GNN-RL scheduler in collaborative edge computing

Chengfeng Jian1 · Zhuoyang Pan1 · Lukun Bao1 · Meiyu Zhang1

Received: 9 June 2022 / Revised: 12 December 2022 / Accepted: 19 December 2022 / Published online: 30 January 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract

With the development of collaborative edge computing (CEC), the manufacturing market is gradually moving toward large-scale, multi-scenario, and dynamic directions. The existing scheduling strategies based on machine learning or deep learning are only applicable to specific scenarios, which makes it difficult to meet the requirements of dynamic real-time scheduling across multiple scenarios. Digital twin technology provides a new solution for real-time scheduling in multiple scenarios. In this paper, a digital twin-oriented multi-scene real-time scheduler (GNN-RL) is proposed. This scheduler converts task sequences into node trees and sets up two learning layers. The first layer is an online-learning representation layer, which uses a GNN to learn the node features of the embedded structure in real time, so that large instances can be handled without additional training. The second layer is an online-learning policy layer, which introduces imitation learning to map into optimal scheduling behavior policies adapted to multiple scenarios. Finally, our approach is validated in several scenarios in 3D digital twin factories, such as computation-intensive, delay-sensitive, and task-urgent scenarios. Since the scheduler proposed in this paper learns general features of the embedding graph rather than instance-specific features, it has good generalization and scalability, outperforming other scheduling rules and schedulers on various benchmarks.

Keywords Task scheduling · Graph neural network · Collaborative edge computing · Digital twin

Chengfeng Jian, Zhuoyang Pan, Lukun Bao and Meiyu Zhang have contributed equally to this work.

Corresponding author: Chengfeng Jian ([email protected])
Zhuoyang Pan ([email protected])
Lukun Bao ([email protected])
Meiyu Zhang ([email protected])

1 Computer Science and Technology College, Zhejiang University of Technology, Hangzhou 310023, China

1 Introduction

Industry 4.0 is a new trend in the development of the global manufacturing industry [1]. To cope with the new technological revolution and industrial change, artificial intelligence, cloud computing, and other technologies have been deeply integrated with traditional industries to provide technical support for the realization of intelligent manufacturing. The emergence of edge computing [2] has shown the possibility of combining edge computing with artificial intelligence.

However, with the development of edge computing and intelligent technology, many problems have emerged [3]: the contradiction between the resource demands of artificial intelligence algorithms and the limited resources of edge nodes, and the contradiction between the diversified demands of intelligent tasks and the single capability of edge node equipment. At the same time, due to the uneven geographical distribution and the differing numbers of devices at the edge, many communication, collaboration, and transmission problems that do not exist in the traditional cloud center are introduced [4]. With the rapid development of the Internet of Things, the free flow of data has become the general trend, and data security is also an urgent problem to be solved in CEC.

Cloud-edge computing models can take advantage of both cloud and edge computing, with powerful computing power and real-time response.
However, most of the cloud-edge collaborations only consider the task scheduling between cloud and edge, ignoring the scheduling between edge devices. Collaborative edge computing (CEC) [5] is a new computing paradigm that combines cloud-edge collaborative scheduling and edge-edge collaborative scheduling, considering not only the collaborative scheduling between the cloud and edge layers but also the scheduling between edge devices. Collaborative edge computing can combine the massive computing power of the cloud with the advantage of closer proximity to users at the edge to handle computation-intensive and delay-sensitive applications; through the cooperation of cloud and edge computing, many applications can be completed at the edge [6], which reduces the pressure on node computing and provides global optimal scheduling capability.

In the past market environment, the scheduling problem could be regarded as an NP-hard problem, and there are three traditional methods to solve it: exact algorithms, approximation algorithms, and heuristic algorithms. Exact algorithms such as mathematical programming [7] and branch-and-bound algorithms [8] are based on enumeration or branching combined with integer programming formulations, and may be prohibitive for large instances. Polynomial-time approximation algorithms, on the other hand, are desirable, but may suffer from weak optimality guarantees or poor empirical performance, and some problems are not even approximable. Swarm intelligence optimization algorithms, such as simulated annealing [9], genetic algorithms [10], and ant colony algorithms [11], generate high-quality solutions within reasonable computing time. They are usually fast and efficient, but they lack theoretical guarantees and may also require a great deal of problem-specific research and repeated trial and error by algorithm designers.

In recent years, with the popularity of deep learning, more and more researchers have tried to use deep learning methods to solve NP-hard problems. Supervised learning [12] enables a trained network to replicate existing scheduling behaviour by learning the parameters of a neural network. Reinforcement learning (RL) [13, 14] is also widely used to solve scheduling problems by mapping the current state of the manufacturing system to scheduling operations, taking into account the sequential nature of the problem. Due to the unique advantages of RL in solving scheduling problems, there has been a considerable number of research results. However, most deep learning methods, including DRL, are trained for specific instances and scale very poorly. When the plant environment changes, the models need to be trained from scratch, which is costly, and the learned scheduling rules may not adapt to new scenarios.

The digital twin is one of the most promising enabling technologies to realize intelligent manufacturing and Industry 4.0. Data acquisition, data communication, and other advanced technologies have triggered extensive interaction between physical space and virtual space. The importance of the digital twin (DT), which is characterized by the integration of cyber-physical information, has received more and more attention from academia and industry. The digital twin also creates higher requirements for real-time scheduling, as its model predictions have to be reflected in the physical space in both directions in real time.

Finally, in the context of Industry 4.0, a highly dynamic and large-scale market environment has become the consensus. In the previous CEC development environment, real-time scheduling approaches varied widely, and none was a realistic solution. There is an urgent need for a real-time dynamic scheduling approach to accommodate large-scale markets. Mainstream papers nowadays only address one of these points for improvement and do not consider both aspects [15].

In order to solve the above problems, this paper proposes a learning task scheduler for digital twin driven, multi-scene collaborative edge computing. Specifically, the work done in this paper is as follows:

(1) We propose the Online-learning Task Scheduling System in the collaborative edge computing environment, as shown in Fig. 1, which accepts task orders in the cloud, trains the scheduler, and deploys the scheduler at the edge to schedule the tasks.

(2) We design a learning task scheduler for digital twins. First, the problem instance is represented as a graph. The GNN transforms the graph into a set of node embeddings so as to summarize the structural information of the target problem. Secondly, node embedding is executed to generate scheduling operations. The introduction of reinforcement learning optimizes model training. Since the GNN enables the scheduling strategy to handle graphs of different sizes, the trained strategy can be used to generate scheduling operations for problem instances of different sizes.

(3) We build a three-dimensional digital twin platform to guide production scheduling. The experimental results show that the method in this paper has good computing power and generalization performance, can be extended to large-scale examples without additional training, and can meet the current dynamic and uncertain manufacturing market environment.

The rest of the paper is organized as follows. Section 2 summarizes the work related to resource scheduling in a collaborative edge computing environment. Section 3 describes the online task scheduling system. Section 4 details the principle of the Graph Neural Network-Reinforcement Learning scheduler (GNN-RL scheduler). Section 5 applies the scheduler on a real assembly line and analyzes the results. Section 6 summarizes our work and discusses the direction of future work.
Fig. 1 Online-learning Task Scheduling System

2 Related work

The explosive growth of data has brought many problems. Although centralized cloud services have powerful computing power and flexible resource allocation strategies, the distance between highly dispersed enterprise manufacturing resources and the cloud is long, and cloud computing services cannot guarantee low-latency applications. As a result, cloud services need to sink and migrate toward the edge.

The concept of the cloud-edge collaborative environment was put forward against the background of the rapid development and application of edge computing, a new information technology. Its core idea is to make use of the advantages of the CEC model to meet the requirements of a personalized market and global optimization. Due to the complexity of task computing, storage, network, energy consumption, security, and other factors, it is difficult to develop a reasonable task scheduling strategy.

As an NP-hard problem, the resource scheduling problem is usually solved by heuristic methods and dynamic programming (e.g., Dinh et al.). However, the drawbacks of traditional heuristic or approximate algorithms make them no longer suitable for the dynamic scheduling requirements in intelligent manufacturing scenarios. In recent years, researchers have begun to use machine learning or deep learning to optimize resource scheduling strategies.

Li et al. [16] proposed an RL-based optimization framework to solve the resource allocation problem in wireless MEC. The framework optimizes the offloading decision and resource allocation by optimizing the total delay cost and energy consumption of all UEs. Yang et al. [17] proposed a resource allocation strategy for multi-user URLLC edge computing networks based on deep reinforcement learning. Wang et al. [18] solve scheduling problems by learning heuristics. However, the common limitation of these methods is that they all rely on custom features to obtain satisfactory results, can only learn instance-specific characteristics to obtain the scheduling scheme, and cannot adapt to dynamic scenarios.

With the proposal of the concept of the digital twin, more and more researchers have tried to use digital twin methods for resource and task scheduling. Yu et al. [19] expound the two major trends in future disease prediction and the medical field, integrating digital twins and promoting precision medicine. Bellavista et al. [20] proposed application-driven digital twin network middleware to realize the dynamic management of
network resources in edge industrial environments. Tao et al. [21] focused on the application of DT in industry and comprehensively reviewed the latest progress and key technologies of DTs.

In fact, previous work did not make effective use of the combinatorial structure of graph problems when solving combinatorial optimization problems, and ignored a premise: the optimization problems that appear in real manufacturing scenarios are often the same, while only the data differ. Table 1 summarizes schemes that use graph structure to solve combinatorial optimization problems.

Table 1 Summary of graph neural networks in solving combinatorial optimization problems

Graph structure        | Attention mechanism | Learning method    | Problems solved                                                                                                           | Reference
Undirected, weighted   | No                  | Q-learning         | Minimum Vertex Cover, Maximum Cut and TSP                                                                                 | Khalil et al., 2017 [22]
Undirected, unweighted | Yes                 | Reinforce          | TSP, Vehicle Routing Problem, Orienteering Problem, Prize Collecting TSP                                                  | Kool et al., 2019 [23]
Directed, unweighted   | No                  | Reinforce          | Data processing cluster scheduling                                                                                        | Mao et al., 2019 [24]
Undirected, weighted   | No                  | Behavioral cloning | Four NP-hard problem benchmarks: set covering, combinatorial auction, capacitated facility location and maximum independent set instances | Gasse et al., 2019 [25]
Directed, weighted     | Yes                 | Imitation learning | Multi-robot coordination under temporal and spatial constraints                                                           | Wang et al., 2020 [26]
Directed, weighted     | No                  | Reinforce          | Link scheduling in device-to-device (D2D) networks                                                                        | Lee et al., 2021 [28]

Khalil et al. [22] proposed a unique combination of reinforcement learning and graph embedding to solve combinatorial optimization problems. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, whose actions are determined by the output of a graph embedding network that captures the current state of the solution. Kool et al. [23] proposed a model based on attention layers, which has advantages over the pointer network, and showed how to train the model using a simple baseline based on a deterministic greedy rollout. Mao et al. [24] used reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction to achieve a high-level goal, such as minimizing average job completion time. Gasse et al. [25] proposed a new graph convolutional neural network model for learning branch-and-bound variable selection strategies by using the bipartite variable-constraint graph representation that mixed-integer linear programs naturally have. Wang et al. [26] address the scheduling problem of integrating advanced robots in manufacturing by generating high-quality solutions based on graph attention networks. Li et al. [27] describe the workflow scheduling problem as a DAG and propose a weighted double deep Q-network-based reinforcement learning algorithm (WDDQN-RL) for scheduling multiple workflows. To solve the link scheduling problem in device-to-device (D2D) networks, Lee et al. [28] use a graph embedding-based link scheduling method and reduce the computational complexity by means of a K-nearest-neighbor graph representation.

Considering the particularities of the cloud-edge collaborative environment, our method uses a graph neural network for task allocation and scheduling. Compared with an ordinary neural network, the graph attention network introduces an attention mechanism that assigns different weight coefficients to tasks to determine their order; the tasks are then expressed as a node-embedded graph, and an RL policy is used to learn the schedule. Our approach offers the possibility of generalizing to unseen instances.

3 Modeling and stating

This paper is mainly based on the research of task scheduling strategies in the collaborative edge computing environment. The collaborative edge computing model is divided into a cloud service layer, an edge node layer, and a terminal layer. Collaborative edge computing can be applied in scenarios such as content distribution networks, the industrial Internet, smart homes, and smart transportation. Among them, the cloud service layer provides high-performance, highly reliable, and scalable resources to support users' diverse on-demand services. A control flow (CF) is generated between the cloud service layer and the edge service layer; the edge server where tasks are scheduled to be executed is determined according to the scheduling strategy, and tasks executed between the cloud and the edge
generate a data flow (DF). The terminal layer is composed of mobile devices, computers, smart home appliances, smart terminals, and other equipment owned by users; tasks can be preprocessed locally, or a task request (request data, RD) can be sent to the edge server, and the task execution result (answer data, AD) is returned from the local edge server to the client.

Figure 1 shows the Online-learning Task Scheduling System. As shown in the figure, plant managers can schedule tasks in the CEC environment by operating the Graph Neural Network-Reinforcement Learning scheduler (GNN-RL scheduler). When plant managers determine the task sequences to be executed through scheduling, they first enter the task sequences into the GNN-RL scheduler and also store the task scheduling information in the cloud center. The data stored in the cloud center are centrally coordinated and scheduled by the cloud, and the cloud center offloads tasks to edge nodes or even end devices based on data characteristics.

The task sequence first goes through an offloading action. There are many studies on offloading operations, and since this is not the focus of this paper, we do not use the latest cutting-edge approach but rather a classical game theory-based approach to perform edge offloading. We consider the computational volume and task complexity. The offloading is divided into two steps: the first step offloads tasks with high computational volume and high task complexity to the edge, where they are scheduled uniformly through the GNN-RL scheduler; the second step offloads the remaining tasks with low computational volume and low task complexity directly to the local end machine.

Offloading to the edge requires scheduling through the GNN-RL scheduler. First, the data input to the scheduler is converted into a task embedding graph by the representation learning layer (GNN), which is still partially fed from the cloud center. Second, the task embedding graph learns the basic policies through the policy learning layer to adapt to the multi-scenario task scheduling problem, and this part of the computational work is assigned to the edge nodes for completion.

A distributed task offloading algorithm based on game theory is designed for the task offloading policy problem. The offloading policy is obtained under the optimal resource allocation, then the optimal resource allocation is performed under the given offloading policy, and the two processes are iterated against each other until convergence to maximize the system utility.

4 GNN-RL scheduler

4.1 Scheduler process

Combinatorial optimization problems such as the traveling salesman problem (TSP) actually have a consistent and unified graph structure, but many deep learning models do not use this precondition when learning heuristics for combinatorial problems; such work attempts to learn from a large number of examples and generalize to new examples. Our method takes advantage of the prerequisite that this type of problem usually has the same graph structure and learns an algorithm for the graph problem by combining a graph attention network with reinforcement learning. Based on this combination of graph neural network and reinforcement learning, we propose a GNN-RL scheduler to learn combinatorial optimization problems. As shown in Fig. 2, the GNN-RL scheduler process is as follows.

While obtaining optimal solutions to large-scale scheduling problems is computationally difficult, it is feasible to use exact methods to optimally solve smaller-scale problems. Furthermore, we can use these exact methods to automatically generate application-specific examples for training imitation learning algorithms.

First, we construct the task embedding graph. We construct feasible solutions by continuously adding nodes based on the graph structure and maintain them so that they satisfy the graph constraints of the problem. The task sequence is transformed into a task embedding graph through the GNN, and the characteristic information of each node is retained. This information can be expanded and updated. At the same time, the graph structure can clearly express the connections between the nodes, which facilitates task processing.

Our goal is to learn a policy for arranging tasks and agents along the decision process. To achieve imitation learning through expert demonstration, we define an evaluation function that calculates the total discounted payoff of taking an action at step t.

Secondly, reinforcement learning is used for policy learning. We use reinforcement learning to learn the policy parameters over the task embedding graph. The method is set up so that the policy directly optimizes the objective function of the original problem instance. The main advantage of this method is that it can process delayed rewards, which represent the remaining increase in the obtained objective function value, in a data-efficient manner; the graph embedding is updated according to the partial solution to reflect the new benefit of each node to the final objective value.
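To make the graph-construction step concrete, the sketch below turns a list of operations into a task graph with precedence edges (between consecutive operations of the same job) and disjunctive edges (between operations that compete for the same machine). The data fields and the helper build_task_graph are illustrative assumptions, not the authors' exact implementation.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Operation:
    op_id: int
    job_id: int
    machine_id: int
    proc_time: float

@dataclass
class TaskGraph:
    nodes: list                                       # Operation objects
    prec_edges: list = field(default_factory=list)    # (u, v): u precedes v within one job
    disj_edges: list = field(default_factory=list)    # (u, v): u and v share a machine

def build_task_graph(ops):
    """Assemble precedence (same-job) and disjunctive (same-machine) edges."""
    g = TaskGraph(nodes=ops)
    by_job, by_machine = {}, {}
    for op in ops:
        by_job.setdefault(op.job_id, []).append(op)
        by_machine.setdefault(op.machine_id, []).append(op)
    for job_ops in by_job.values():                   # consecutive steps of one job
        job_ops.sort(key=lambda o: o.op_id)
        for a, b in zip(job_ops, job_ops[1:]):
            g.prec_edges.append((a.op_id, b.op_id))
    for mach_ops in by_machine.values():              # all pairs competing for a machine
        for a, b in combinations(mach_ops, 2):
            g.disj_edges.append((a.op_id, b.op_id))
    return g
```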
Fig. 2 GNN-RL scheduler

Finally, the policy is updated through the reward. The input of reinforcement learning is always changing: whenever an action is taken, it affects the input of the next decision, whereas the input of supervised learning is independent and identically distributed. Therefore, a reinforcement learning agent can make a trade-off between exploration and exploitation, choose the largest return, and update it through the reward.

4.2 Task embedding graph

In this paper, we use the parsing graph, which is a classical representation of scheduling. Since the parsing graph only contains static information about the workshop tasks, such as processing time, process flow, and machine usage, and not the dynamic state information of the nodes, the dynamic characteristics of the nodes need to be merged into the node states to represent the states of the nodes during scheduling. At the same time, the virtual nodes at the beginning and the end are ignored, because these two nodes have no practical significance in the scheduling process. The added node information is shown below:

(1) Operation status: the binary codes 100, 010, and 110 indicate that the operation is not yet scheduled, is being processed by a machine, and has been completed, respectively.
(2) Completion time: the completion time of the operation.
(3) Completion: the ratio of the available completion time to the total completion time.
(4) Waiting time: the time that the operation waits after it is ready.
(5) Remaining time: the time remaining for the operation.

As shown in Fig. 3, graph node embedding is a method of representing nodes in a continuous vector space that considers node/edge features and retains the different types of relationship information from the graph. A node embedding can be considered a feature vector that contains enough information about the target node and its relationships in the graph, and it can easily be used to perform various downstream tasks. Taking into account the graph characteristics of the combinatorial problem, we use the graph neural network (GNN) to extract the task embedding graph.

The task embedding graph we propose maintains four differentiable functions: the precedent node updater f_p(·; θ1), the succedent node updater f_s(·; θ2), the disjunctive node updater f_d(·; θ3), and the target node updater f_n(·; θ4). The calculation process of the proposed task embedding layer is as follows:

h_v^{(k)} = f_n^{(k)}\Big( \mathrm{ReLU}\big(f_p^{(k)}\big(\sum_{i \in N_p(v)} h_i^{(k-1)}\big)\big) \,\|\, \mathrm{ReLU}\big(f_s^{(k)}\big(\sum_{i \in N_s(v)} h_i^{(k-1)}\big)\big) \,\|\, \mathrm{ReLU}\big(f_d^{(k)}\big(\sum_{i \in N_d(v)} h_i^{(k-1)}\big)\big) \,\|\, \mathrm{ReLU}\big(\sum_{i \in V} h_i^{(k-1)}\big) \,\|\, h_v^{(k-1)} \,\|\, h_v^{(0)} \Big)    (1)

where ReLU(x) = max(0, x) is applied position-wise and || is the vector concatenation operator. N_p(v) and N_s(v) denote the precedent and succedent node sets of node v, respectively; N_d(v) is the disjunctive neighbourhood of node v, i.e., the set of nodes connected to v by a disjunctive edge. h_v^{(k)} is the k-th updated embedding of node v, and h_v^{(0)} is the initial node feature x_v by definition.
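A minimal PyTorch sketch of the update in Eq. (1) is given below. The four updaters are modeled as plain linear layers, neighbour sets are passed as index lists, and the per-node loop is kept for clarity rather than efficiency; all names and shapes are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class TaskEmbeddingLayer(nn.Module):
    """One update h_v^(k-1) -> h_v^(k) in the spirit of Eq. (1)."""
    def __init__(self, dim):
        super().__init__()
        self.f_p = nn.Linear(dim, dim)       # precedent-node updater
        self.f_s = nn.Linear(dim, dim)       # succedent-node updater
        self.f_d = nn.Linear(dim, dim)       # disjunctive-node updater
        self.f_n = nn.Linear(6 * dim, dim)   # target-node updater over the concatenation

    def forward(self, h, h0, prec, succ, disj):
        # h, h0: [num_nodes, dim]; prec/succ/disj: per-node lists of neighbour indices
        def pooled(nbrs, v):
            return h[nbrs[v]].sum(0) if len(nbrs[v]) > 0 else torch.zeros(h.size(1))
        graph_sum = h.sum(0)                              # graph-level embedding
        out = []
        for v in range(h.size(0)):
            parts = torch.cat([
                torch.relu(self.f_p(pooled(prec, v))),    # precedent context
                torch.relu(self.f_s(pooled(succ, v))),    # succedent context
                torch.relu(self.f_d(pooled(disj, v))),    # disjunctive context
                torch.relu(graph_sum),                    # whole job-shop status
                h[v],                                     # previous embedding h_v^(k-1)
                h0[v],                                    # initial node feature h_v^(0)
            ])
            out.append(self.f_n(parts))
        return torch.stack(out)                           # updated embeddings h^(k)
```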
Fig. 3 Graph node embedding process

The proposed embedding layer generates the updated node embedding h_v^{(k)} while utilizing six different types of inputs that account for the various relationships among the nodes. The inputs are as follows:

The precedent node embedding \sum_{i \in N_p(v)} h_i^{(k-1)}. The precedent node embedding is used as the input of the precedent node updater f_p(·; θ1), which generates an intermediate embedding that contextualises the relationship between the target node v and the precedent nodes in N_p(v).

The succedent node embedding \sum_{i \in N_s(v)} h_i^{(k-1)}. The succedent node embedding is used as the input of the succedent node updater f_s(·; θ2) to generate an intermediate node embedding capturing the relationships between the target node v and the succeeding nodes in N_s(v).

The disjunctive node embedding \sum_{i \in N_d(v)} h_i^{(k-1)}. The disjunctive node embedding is used as the input of the disjunctive node updater f_d(·; θ3) to generate an intermediate node embedding capturing the relationships between the target node v and the disjunctive nodes in N_d(v).

The graph-level embedding \sum_{i \in V} h_i^{(k-1)}. The graph-level embedding is used to consider the entire status of the job shop.

The previous embedding h_v^{(k-1)}. The previous embedding is used to differentiate the specific node v from the other nodes in the graph.

The initial node embedding h_v^{(0)}. The initial node feature is used to consider the initial input feature of the target node.

Using the previous embedding h_v^{(k-1)} and all the intermediate node embeddings described above, the target node updater f_n(·; θ4) computes the updated node embedding h_v^{(k)}.

4.3 Reinforcement learning

The previous subsection performs node embedding based on the graph neural network, and this section performs the selection of optimal scheduling actions based on the node embeddings. It is assumed that node embeddings are effective for determining the optimal scheduling action; the graph neural network parameters are trained to ensure that the computed node embeddings produce good scheduling results.

In the proposed MDP formulation, the action a_s is defined as the assignment of an available job to the target machine, which corresponds to selecting a node (operation) for the target machine to process. The state g_s contains the information about the machines available at the s-th transition. To derive the scheduling policy, π(a_s | g_s^k) is introduced. It generates the probability distribution over the available operations that the target machine can select by using the softmax function as follows:

π(a_s | g_s^k) := \frac{\exp\big(f_l(h_v^k; θ5)\big)}{\sum_{u \in A_{g_s}} \exp\big(f_l(h_u^k; θ5)\big)}    (2)

where f_l(·; θ5) denotes the differentiable function that maps a node embedding to the logit value of that node, v takes the value of any feasible node at step s, and A_{g_s} is the set of processable nodes at state g_s. A feasible scheduling action can then be sampled from this stochastic policy.

To train the scheduling policy π(a_s | g_s^k), Proximal Policy Optimization [29], a reinforcement learning algorithm based on policy gradients, is used, and the following equation is a functional form of the approximate MDP state value V(g_s^k):

v_π ≈ \hat{v}_π(g_s^k) := f_v\big(\sum_{i \in V} h_i^k; θ6\big)    (3)

where \sum_{i \in V} h_i^k is the graph-level embedding and f_v(·; θ6) is a differentiable function.
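As an illustration of Eqs. (2) and (3), the following PyTorch sketch adds a logit head f_l and a value head f_v on top of the node embeddings and restricts the softmax to the feasible set A_{g_s}. The module and variable names are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PolicyValueHeads(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.logit_head = nn.Linear(dim, 1)   # f_l(.; theta_5): node embedding -> logit
        self.value_head = nn.Linear(dim, 1)   # f_v(.; theta_6): graph embedding -> state value

    def forward(self, h, feasible):
        # h: [num_nodes, dim]; feasible: bool mask of processable nodes A_{g_s}
        logits = self.logit_head(h).squeeze(-1)
        logits = logits.masked_fill(~feasible, float("-inf"))  # keep probability on A_{g_s} only
        probs = torch.softmax(logits, dim=0)                   # Eq. (2)
        value = self.value_head(h.sum(0))                      # Eq. (3), graph-level sum
        return probs, value

# usage: sample a feasible action from the stochastic policy
# probs, value = heads(node_embeddings, feasible_mask)
# action = torch.multinomial(probs, 1).item()
```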
The goal of this paper is to learn a scheduling policy that minimizes the maximum job completion time C_max. To train the scheduling policy to minimize C_max, the following reward function is designed:

R(g_t, a_t, g_{t+1}) = -w_t    (4)

where w_t denotes the number of jobs waiting at moment t. The waiting-job reward design is sufficient to achieve the minimum completion time: it has been shown that minimizing the cumulative total number of waiting jobs is closely related to minimizing the completion time. The proposed reward representation facilitates training the model by reinforcement learning. The reward for waiting jobs is well defined in each transition; therefore, reinforcement learning receives meaningful reward signals throughout training.

In this paper, the proposed model is trained using proximal policy optimization, a recent variant of the policy gradient method. The policy gradient approach optimizes the policy by updating the parameter set Θ in the direction of the gradient of the objective function ∇L(Θ):

Θ_new = Θ_old + η ∇_Θ L(Θ)    (5)

where L(Θ) = \hat{E}_{s \sim d^π}\big[\sum_{a \in A} π_Θ(a \mid s) A^π(s, a)\big] and η is the learning rate.

When updating parameters using the original policy gradient method, small changes in the parameter space often result in large changes in the policy space, leading to unstable training. This property is particularly important during task scheduling because the state of the job-shop schedule cannot be restored to its previous state, and a single operation will significantly affect the performance of the schedule. Therefore, to overcome this drawback of the policy gradient approach, a surrogate objective is used: PPO [30] updates the parameters of the GNN and the policy only when the current representation and the local policy module improve the scheduling performance. Specifically, the clipped surrogate objective is defined as:

L_s^{CLIP} = \hat{E}_s\big[\min\big(r_s(Θ) \hat{A}_s, \mathrm{clip}(r_s(Θ), 1-ε, 1+ε) \hat{A}_s\big)\big]    (6)

r_s(Θ) = \frac{π(a_s \mid g_s; Θ_new)}{π(a_s \mid g_s; Θ_old)}    (7)

Θ = \{θ1, θ2, θ3, θ4, θ5, θ6\}    (8)

where the function clip(r_s(Θ), 1-ε, 1+ε) restricts r_s(Θ) to the range [1-ε, 1+ε], and \hat{A}_s is the generalized advantage estimate defined as follows:

\hat{A}_s = δ_s + (γλ) δ_{s+1} + \dots + (γλ)^{T-s-1} δ_{T-1}    (9)

δ_s = r_s + γ V(g_{s+1}; Θ) - V(g_s; Θ)    (10)

where T is the termination step of the episode.

In addition, the value-function error, the entropy bonus term, and the objective function are combined to improve the training performance. When the estimate of the value function is accurate, the advantage function can indicate the correct direction in which to update the parameters, and the entropy bonus encourages full exploration during training. Thus, the combined objective function L_s^{total}(Θ) is:

L_s^{total}(Θ) = \hat{E}_s\big[L_s^{CLIP}(Θ) - α \big(V(g_s; Θ) - V_s^{targ}\big)^2 + β S_s(π_Θ)\big]    (11)

S_s(π_Θ) := - \sum_{a \in A_{g_s}} \log\big(π_Θ(a)\big) π_Θ(a)    (12)

V_s^{targ} := \sum_{i=s}^{T} r_i    (13)

where S_s(π_Θ) denotes the entropy of the current policy at step s and V_s^{targ} is the sum of realized rewards. The PPO algorithm maximizes the above objective function by updating the set of parameters Θ in the direction suggested by ∇_Θ L_s^{total}(Θ). By iteratively updating the parameter set, the trained actor policy π_Θ(a_s | g_s) maximizes the expected cumulative reward V(g_s; Θ).
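Equations (6)-(13) correspond to a clipped PPO objective with generalized advantage estimation and an entropy bonus. The sketch below illustrates that computation under those assumptions; the hyper-parameter values and function names are illustrative and not taken from the paper.

```python
import torch

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates (Eqs. 9-10) and reward-to-go targets (Eq. 13)."""
    T = len(rewards)
    adv, ret = [0.0] * T, [0.0] * T
    running_adv, running_ret = 0.0, 0.0
    for s in reversed(range(T)):
        next_v = values[s + 1] if s + 1 < T else 0.0
        delta = rewards[s] + gamma * next_v - values[s]        # Eq. (10)
        running_adv = delta + gamma * lam * running_adv        # Eq. (9)
        running_ret = rewards[s] + running_ret                 # Eq. (13): undiscounted sum
        adv[s], ret[s] = running_adv, running_ret
    return torch.tensor(adv), torch.tensor(ret)

def ppo_loss(new_logp, old_logp, adv, values, v_targ, entropy,
             eps=0.2, alpha=0.5, beta=0.01):
    ratio = torch.exp(new_logp - old_logp)                     # Eq. (7)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    l_clip = torch.min(ratio * adv, clipped * adv).mean()      # Eq. (6)
    v_err = ((values - v_targ) ** 2).mean()                    # value-function error
    total = l_clip - alpha * v_err + beta * entropy.mean()     # Eq. (11), to be maximized
    return -total                                              # negate for gradient descent
```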
5 Evaluation

In this section, we evaluate the performance and feasibility of our scheduler in resource scheduling. Our experimental data come from the manufacturing service scheduling of various patch cords in a real factory.

5.1 Experimental setup

The IDE is PyCharm, using the PyTorch deep learning platform. The experimental environment is a Windows 10 64-bit operating system with an Intel Core i5-7500 CPU and 8 GB of memory.

In this paper, we randomly generate scheduling data for plants of different sizes (machine counts ranging from 10 to 1000): small plants (about 10 machines), medium plants (about 100 machines), large plants (about 500 machines), and mega plants (about 1000 machines). For the task range of each problem size, small plants are assigned 100 to 200 tasks at a time, medium plants 1,000 to 2,000 tasks, large plants 5,000 to 10,000 tasks, and mega plants 10,000 to 20,000 tasks. It is important to note that the assignment of tasks is sequentially constrained. For convenience, the minimum number of tasks is 50, and the duration of each task is drawn from a uniform distribution on the interval [1, 10], which is consistent with the distribution common in real manufacturing environments.

To train the model, 10,000 small-plant scheduling problems are set up in this paper, and the training set is obtained by running Gurobi, a commercial solver widely used for integer linear programming. Meanwhile, 1,000 small-scale plant test datasets are set up to verify the validity of the model. To verify the generalization ability of the model, 1,000 problem instances of medium-sized plants, large plants, and mega plants are set up, respectively.

The trained GNN-RL scheduler can solve large-scale scheduling problems without the need for instance-by-instance training. To verify the effectiveness of the scheduler proposed in this paper, a series of classical scheduling policies, including FIFO, LIFO, and SPT, are listed in Table 2 as the priority dispatch rules (PDRs).

Table 2 Basic strategies

FIFO   | Process the first-in jobs first.
LIFO   | The last job is processed first.
SPT    | The job with the shortest processing time is processed first.
LPT    | The job with the longest processing time is processed first.
LOR    | The job with the least remaining operations is processed first.
MOR    | Process the job with the most remaining operations first.
LQNO   | The job with the fewest waiting jobs in the next operation is processed first.
MQNO   | The job with the most waiting jobs in the next operation is processed first.
Random | Assign tasks randomly.

PDRs were selected as the baseline algorithms to validate the learning scheduler proposed in this paper. PDRs are widely used in various scheduling problems as heuristic rules that assign tasks to machines according to their priorities while considering the current state of the system. PDRs are favored by various real-life scheduling industries because of their simplicity, ease of understanding, and low computational complexity.

Obviously, swarm intelligence optimization algorithms such as particle swarm optimization and ant colony algorithms possess strong scheduling performance. However, PDRs are used for comparison, considering that such heuristic algorithms require complex optimization or search to find the best mixture of various PDRs, which is not the focus of this paper. The focus of this paper is to explore the potential of schedulers in deriving effective dynamic scheduling rules rather than finding the best-performing algorithm, so PDRs are chosen as the baseline algorithms in this section.

PDRs need to continuously utilize predetermined, identical state features to perform the next scheduling action. The main advantage of the reinforcement learning-based scheduling approach is the adaptive use of various state features to determine the optimal scheduling action, dynamically extracting the most relevant features from the current state and mapping these features to the optimal action. The GNN-based state representation module further enhances flexibility and adaptability by considering the relationship between the machine and the task. As a result, the GNN-RL scheduler no longer requires additional training after pre-training.

The daily work of the factory we selected involves the scheduling of various patch cord manufacturing services. Figure 4 shows the 3D simulation system of the factory in which we conduct the simulation. Table 3 is an example of specific patch cord manufacturing, including the step name, step code, and the time required for 100 services. The production of a patch cord follows a required order, and for similar steps (such as A01 and A02) only one of them needs to be chosen.

Our proposed GNN-RL scheduler can perform task resource scheduling without additional instance training (conventional RL requires additional training when the input data change). In order to test the generalization performance of the scheduler, we measured the scheduling performance on different unseen instance sets. As far as we know, no learning-based scheduler can schedule a new instance that differs from the trained instances in terms of size, constraints, and processing time. Therefore, we compare the generalization performance of our model with some basic scheduling strategies on the benchmark instance sets shown in Table 4.

5.2 Scheduler calculation performance

In order to avoid the influence of chance factors, our results are the average of 10 experiments. We evaluate the scheduling performance of the scheduler on an instance by calculating the relative scheduling error defined as follows:

error = \frac{T_{max} - T_{max}^{*}}{T_{max}^{*}}    (14)

where T_max is the completion time derived from the scheduling solution and T_max^* is the optimal maximum completion time.
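As a small illustration, the metric of Eq. (14), and its average over the repeated experiments mentioned above, can be computed as follows; the function and variable names are ours, not the authors'.

```python
def relative_error(makespan, optimal_makespan):
    """Eq. (14): error = (Tmax - Tmax*) / Tmax*."""
    return (makespan - optimal_makespan) / optimal_makespan

def mean_relative_error(schedule_makespans, optimal_makespans):
    """Average the relative error over repeated experiments (10 runs in the paper)."""
    errors = [relative_error(t, t_opt)
              for t, t_opt in zip(schedule_makespans, optimal_makespans)]
    return sum(errors) / len(errors)

# example: a solution with makespan 573 against an optimal makespan of 500
# gives a relative error of 0.146, i.e. 14.6%
```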
We tested the performance of the GNN-RL scheduler along multiple dimensions.

Fig. 4 3D simulation factory

Table 3 Transfer line factory

Name                    | Step Code | Estimated Execution Time (min)
Peeling 1               | A01       | 14
Peeling 2               | A02       | 14
Pull core 1             | B01       | 15
Pull core 2             | B02       | 15
Welding 1               | C01       | 50
Forming 1               | D01       | 83
Forming 2               | D02       | 69
Assembly shell          | E01       | 63
Assembly shrapnel       | F01       | 67
Screw                   | G01       | 57
Test 1                  | H01       | 47
Test 2                  | H02       | 47
Full picking feed point | I01       | 36

Table 4 Benchmark scheduling instances

Benchmark instances | Source
ORB 01-10           | Applegate and Cook 1991
SWV 01-20           | Storer, Wu, and Vaccari 1992
FT 06/10/20         | Fisher and Thompson 1963
LA 01-40            | Lawrence 1984
ABZ 5-9             | Adams, Balas, Zawack 1988
YN 1-4              | Yamada and Nakano 1992
TA 01-80            | Taillard 1993

The performance of the trained GNN-RL scheduler is verified on 1,000 test instances. As shown in Fig. 5, the relative scheduling errors of GNN-RL, FIFO, SPT, and MOR are low and are distinguished from the other PDRs by different colors in the figures.

The proposed scheduler outperforms all PDRs on the test instances, and the trained GNN-RL scheduler has a relative scheduling error of 14.7% on the test instances. Among the several PDRs tested, only MOR achieves performance comparable to GNN-RL. The relative scheduling errors of the other PDRs are 20% and above. This result indicates that the proposed scheduler provides a better scheduling scheme than all PDRs.
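For reference, the priority dispatch rules of Table 2 amount to selecting the next job from the ready queue by a simple key. A minimal sketch of three of them (FIFO, SPT, and MOR), assuming jobs carry arrival_time, proc_time, and remaining_ops fields, is shown below.

```python
def fifo(ready_jobs):
    """FIFO: process the job that arrived first."""
    return min(ready_jobs, key=lambda j: j["arrival_time"])

def spt(ready_jobs):
    """SPT: process the job with the shortest processing time."""
    return min(ready_jobs, key=lambda j: j["proc_time"])

def mor(ready_jobs):
    """MOR: process the job with the most remaining operations."""
    return max(ready_jobs, key=lambda j: j["remaining_ops"])

# each rule returns the next job to dispatch from the set of ready jobs;
# the chosen job is then assigned to the requesting machine
```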

Fig. 5 Performance evaluation of test distribution

In fact, the data used to generate the training and test instances are not enough to cover all real-world situations. In the following experiments, scheduling problems much larger than the training instances, with different distributions of processing time and machine order, are used to verify the scalability of the GNN-RL scheduler.

Since the approach in this paper can run at different problem sizes, and the computational effort of embedding node characteristics usually increases with the size of the scheduling instances, it is necessary to analyze the scheduler's computational complexity.
To quantify this relationship, two factors are considered:
the number of machines m and the number of tasks n, and
the impact of each factor on the computational complexity
is investigated. Two experiments are performed in this
section: fixed m and varying n, and fixed n and varying m.
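A minimal timing harness for these two experiments might look as follows; scheduler.schedule_instance and generate_instances stand in for the trained scheduler and the instance generator and are assumptions, not the authors' API.

```python
import time

def measure(scheduler, instances):
    """Return the average wall-clock scheduling time over a set of instances."""
    start = time.perf_counter()
    for inst in instances:
        scheduler.schedule_instance(inst)   # assumed API of the trained GNN-RL scheduler
    return (time.perf_counter() - start) / len(instances)

# experiment 1: fixed number of machines (m = 50), increasing number of tasks n
# times_vs_n = {n: measure(scheduler, generate_instances(m=50, n=n)) for n in (100, 500, 1000, 2000)}
# experiment 2: fixed number of tasks (n = 1000), increasing number of machines m
# times_vs_m = {m: measure(scheduler, generate_instances(m=m, n=1000)) for m in (10, 50, 100, 500)}
```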
Fig. 6 Computational complexity analysis

Figure 6 shows the results of the computational complexity analysis: the trend of increasing computation time with an increasing number of tasks for a fixed number of machines (m = 50), and the trend of increasing computation time with the number of machines for a fixed number of tasks (n = 1000). According to the fitted curves, the computation time appears to be proportional to the square of the number of jobs and proportional to the number of machines.
GNN-RL schedulers require more computation time
than PDRs because GNN-RL needs to compute node
embedding through GNN before performing scheduling
operations, while PDRs do not require any prior compu-
tation. The computation time interval usually increases
with the number of tasks and machines. The computation
time can be effectively reduced by parallelization when
computing node embedding. Therefore, in practical appli-
cations, the computation time gap between PDRs and
GNN-RL scheduler can be further reduced, maintaining a balance between performance and computational complexity.

5.3 Scheduler generalization verification

In this section, a comparison of the scheduler's generalization capability and a verification of the scheduler's expansion capability are performed.

Three aspects of the scheduler's generalization capability are verified: the proportion of problems solved, the relative scheduling error, and the generalization comparison with other scheduling algorithms. The proportion of problems solved is compared using four sets of problem instances of different sizes to verify that the scheduler, trained on small-scale instances, can effectively migrate to large-scale instances; the relative scheduling error is analyzed successively on small-scale problem instances and various types of benchmark problems; and finally, the generalization capability is compared with other scheduling algorithms.

First, to demonstrate the scalability of the scheduler proposed in this paper, its generalization capability is evaluated by the metric of the proportion of problems solved at different sizes. FIFO, SPT, and MOR, which showed superior performance in the experiments of Sect. 5.2, are selected for comparison with GNN-RL. Among them, the GNN-RL scheduler requires prior training using training data of small-scale problems, while the PDR strategies require no preprocessing operation.

Fig. 7 Proportion of problems solved at different problem scales

As shown in Fig. 7, experiments are conducted using 1000 problem instances of four scale sizes (small factory, medium-sized factory, large factory, and mega factory), where the number of machines in the experiments is fixed and the task volume is randomly generated within a reasonable interval for each problem size. It can be seen from Fig. 7 that the GNN-RL scheduler finds more feasible solutions than the PDRs for all scale instances except the small-plant problem, where some of the PDRs solve a higher percentage of problems than the scheduler in this paper.

When dealing with small-scale problems, both the GNN-RL scheduler and the PDRs show notable scheduling results, with problem resolution ratios almost always above 80%. However, when dealing with medium-scale problems, the overall scheduling performance of the PDRs drops sharply due to the increase in problem size. As shown in Fig. 7, the problem solution ratio of most of the PDRs drops to about 60%, but GNN-RL still maintains a correct rate of over 80%. As the problem size increases further, the correct rate of the PDRs declines to between 20% and 40%, and they are no longer able to cope with large-scale and super-scale problems, but the GNN-RL scheduler still maintains a correct rate close to 80%. The scheduling performance of PDRs decreases sharply as the number of tasks increases, whereas the GNN-RL scheduler exhibits consistently high performance. Considering that the method in this paper uses only small-scale training data during the training process, it is clear from the experimental results that the proposed scheduler can be extended to large instances without additional training.

Next, the average relative scheduling errors of the trained GNN-RL scheduler and the PDRs for small-scale problems are investigated, as well as the scheduling effects on other benchmark problems that are completely different from the training data.

Since PDRs have lower scheduling accuracy (10% to 60%) on medium-sized plants, large plants, and very large plants, it is clear from the analysis that their relative scheduling error at these problem sizes is clearly larger than that of the GNN-RL scheduler. However, the scheduling correctness of PDRs at small problem sizes is close to that of the GNN-RL scheduler. To further investigate the scheduling performance of the scheduler at small problem sizes, the relative scheduling errors of both on small problems are compared. Setting three sets of machine numbers (5, 10, 15) and four sets of task numbers (50, 100, 150, 200), the relative average scheduling errors of the trained GNN-RL scheduler compared to the PDRs are shown
in Fig. 8. Compared with the PDRs, the proposed GNN-RL scheduler achieves stronger performance in almost all cases. However, the relative scheduling errors of some PDRs (FIFO and MOR) are slightly lower than those of GNN-RL at the 10 × 50 and 10 × 200 problem sizes. In the overall analysis, the GNN-RL scheduler also has superior scheduling capability on small-scale problems.

Fig. 8 Average relative error for small-scale problems

In addition to the comparison of generalization ability on random data sets, the comparison on benchmark problems different from the training data sets better illustrates the superior generalization ability of the scheduler. As shown in Table 4, the numbers of machines and jobs for the benchmark scheduling problems are in the intervals [5, 20] and [6, 100], respectively. Each set of benchmark test instances follows a different distribution to generate scheduling instances.

Figure 9 shows the average relative scheduling error of the GNN-RL scheduler compared to the PDRs for different benchmark problems. The GNN-RL scheduler produces better solutions, with average relative errors ranging from 17.8% to 24.9%. Experiments on these unseen benchmark instances confirm that the GNN-RL scheduler is relatively robust to changes in problem distribution and size compared to the PDRs, which schedule operations according to predefined rules without considering the structure of the shop-floor scheduling instances. As a result, the quality of the scheduling solutions of PDRs may fluctuate with the properties of particular workshop scheduling instances, and some PDRs operate well in general but experience significant degradation in scheduling performance on certain benchmark problems. In contrast, the GNN-RL scheduler reliably generates high-quality scheduling solutions for all types of problem instances.

Finally, a generalization comparison with other scheduling algorithms is performed. Task scheduling strategies have been studied extensively; for example, metaheuristic algorithms are applicable to a wide range of scenarios for solving scheduling problems and have practical significance. Swarm intelligence optimization algorithms (PSO, ACO, BA) all seek optimal solutions by simulating swarm behavior in nature, but these methods usually converge too early and fall into local optima, requiring a balance between local search and global search. The use of reinforcement learning to solve shop-floor scheduling problems is also widespread, but most such methods cannot be extended to new instances of shop-floor scheduling problems without additional training.

In this section, the GNN-RL scheduler is compared with some existing scheduling methods to verify the generalization capability of the scheduler, using Gurobi as a benchmark. The evaluation metric is defined as the normalized maximum completion time (Normalized
Fig. 9 Average relative error of different benchmark examples

Makespan), using the results found by the exact method Gurobi as the standard.

Since both the GNN-RL scheduler and DQN require additional training, to facilitate the examination of the generalization ability of the method in this paper, the training data of small plants are selected to train the models, and their effectiveness is validated on medium-sized plant, large plant, and mega plant instances, while the meta-heuristic algorithms do not need to be pre-trained.

As shown in Fig. 10, the GNN-RL scheduler proposed in this paper gives the lowest maximum completion time at all three problem sizes (medium-sized plant, large plant, and mega plant). As can be seen from the figure, as the problem size increases, the metaheuristic algorithms converge prematurely and fall into locally optimal solutions, failing to find the optimal scheduling method for large-scale problems; the maximum completion time keeps increasing, and they cannot be effectively extended to other instances. When using such algorithms for scheduling, it is usually necessary to adjust the parameters according to the specific scenario or to find a suitable optimization method. Therefore, the meta-heuristic algorithms cannot handle dynamic scene changes.

The GNN-RL scheduler and DQN are both trained with data from mini-factories, and it can be seen from Fig. 10 that, compared to DQN, the GNN-RL scheduler has a stronger expansion capability and can be easily extended to other instances, with significantly better performance than the DQN learning model. It can be concluded that the node embedding scheme learned from the small-plant scheduling instances can be extended to solve large-plant scheduling instances.

6 Conclusion

Personalized markets and customization have exacerbated the dynamics and uncertainty of the manufacturing environment. Operational optimization using real-time responsive and globally optimized cloud-edge collaborative scheduling solutions to support custom manufacturing has become a hot research topic in manufacturing. At the same time, digital twins, as a solution for Industry 4.0, place high demands on the real-time nature of manufacturing. Finally, the current approaches only target algorithm performance while ignoring the large training costs caused by switching scenarios.

Based on these challenges, a digital twin-oriented scheduling strategy is proposed to dynamically learn the characteristics of the scheduling problem. Compared with traditional scheduling methods, this method has the advantages of high accuracy, short response time, and dynamic learning. First, we use a GNN to learn task node features and embed these nodes into a specific spatial structure of the task graph. Subsequently, the embedding structure of the graph is passed to the node-embedding scheduling module, and reinforcement learning is used to train the parameters. Since the proposed GNN-RL scheduler learns general features of the task-embedded graph rather than instance-specific features, the scheduler is generic and
can generate high-quality task scheduling solutions for different cloud-edge collaboration scenarios without additional training. Moreover, based on the reproducibility of the graph structure, our proposed scheduler can be generalized to large, unseen instances.

Fig. 10 Generalization comparison of different scheduling algorithms

There are many unconsidered factors in this approach, and the task scheduling problem in real CEC environments needs further research and improvement. In the future, we will continue to dig deeper into the digital twin to improve the accuracy of scheduling.

7 Statement

This is an original paper. The submission is approved by all the authors. If accepted, the work described in this paper will not be published elsewhere. The study is not split up into several parts to increase the quantity of submissions and submitted to various journals or to one journal over time. No data have been fabricated or manipulated (including images) to support our conclusions. No data, text, or theories by others are presented as if they were our own. The data used to support the findings of this study are available from the corresponding author upon request. This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Acknowledgements This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61672461 and 62073293.

References

1. Afrin, M., Jin, J., Rahman, A., Tian, Y.-C., Kulkarni, A.: Multi-objective resource allocation for edge cloud based robotic workflow in smart factory. Fut. Gen. Comput. Syst. 97, 119–130 (2019)
2. Wang, X., Han, Y., Leung, V.C., Niyato, D., Yan, X., Chen, X.: Convergence of edge computing and deep learning: a comprehensive survey. IEEE Commun. Surv. Tutor. 22(2), 869–904 (2020)
3. Li, Y., Wang, X., Gan, X., Jin, H., Fu, L., Wang, X.: Learning-aided computation offloading for trusted collaborative mobile edge computing. IEEE Trans. Mob. Comput. 19(12), 2833–2849 (2019)
4. Gao, H., Huang, W., Duan, Y.: The cloud-edge-based dynamic reconfiguration to service workflow for mobile ecommerce environments: a QoS prediction perspective. ACM Trans. Internet Technol. 21(1), 1–23 (2021)
5. He, C., Wang, R., Wu, D., Zhang, H., Tan, Z.: Qos-aware hybrid cloudlet placement over joint fiber and wireless backhaul access network. Opt. Switch. Netw. 45, 100678 (2022)
6. Leng, J., Chen, Z., Sha, W., Ye, S., Liu, Q., Chen, X.: Cloud-edge orchestration-based bi-level autonomous process control for mass individualization of rapid printed circuit boards prototyping services. J. Manuf. Syst. 63, 143–161 (2022)
7. Manne, A.S.: On the job-shop scheduling problem. Oper. Res. 8(2), 219–223 (1960)
8. Lomnicki, Z.A.: A "branch-and-bound" algorithm for the exact solution of the three-machine scheduling problem. J. Oper. Res. Soc. 16(1), 89–100 (1965)
9. Krishna, K., Ganeshan, K., Ram, D.J.: Distributed simulated annealing algorithms for job shop scheduling. IEEE Trans. Syst. Man Cybern. 25(7), 1102–1109 (1995)
10. Gupta, A.K., Sivakumar, A.I.: Job shop scheduling techniques in semiconductor manufacturing. Int. J. Adv. Manuf. Technol. 27(11), 1163–1169 (2006)
11. Muteeh, A., Sardaraz, M., Tahir, M.: Mrlba: multi-resource load balancing algorithm for cloud computing using ant colony optimization. Clust. Comput. 24(4), 3135–3145 (2021)
12. Kim, Y.-J.: A supervised-learning-based strategy for optimal demand response of an HVAC system in a multi-zone office building. IEEE Trans. Smart Grid 11(5), 4212–4226 (2020)
13. Qi, Q., Zhang, L., Wang, J., Sun, H., Zhuang, Z., Liao, J., Yu, F.R.: Scalable parallel task scheduling for autonomous driving using multi-task deep reinforcement learning. IEEE Trans. Vehicul. Technol. 69(11), 13861–13874 (2020)
14. Grondman, I., Busoniu, L., Lopes, G.A.D., Babuska, R.: A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans. Syst. Man Cybern. C 42(6), 1291–1307 (2012)
15. Park, J., Chun, J., Kim, S.H., Kim, Y., Park, J.: Learning to schedule job-shop problems: representation and policy learning using graph neural network and reinforcement learning. Int. J. Prod. Res. 59(11), 3360–3377 (2021)
16. Li, J., Gao, H., Lv, T., Lu, Y.: Deep reinforcement learning based computation offloading and resource allocation for MEC. In: 2018 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6 (2018). IEEE
17. Yang, T., Hu, Y., Gursoy, M.C., Schmeink, A., Mathar, R.: Deep reinforcement learning based resource allocation in low latency edge computing networks. In: 2018 15th International Symposium on Wireless Communication Systems (ISWCS), pp. 1–5 (2018). IEEE
18. Wang, Y.-C., Usher, J.M.: Application of reinforcement learning for agent-based production scheduling. Eng. Appl. Artif. Intell. 18(1), 73–82 (2005)
19. Yu, Z., Wang, K., Wan, Z., Xie, S., Lv, Z.: Popular deep learning algorithms for disease prediction: a review. Clust. Comput. 1, 1–21 (2022)
20. Bellavista, P., Giannelli, C., Mamei, M., Mendula, M., Picone, M.: Application-driven network-aware digital twin management in industrial edge environments. IEEE Trans. Ind. Inf. 17(11), 7791–7801 (2021)
21. Tao, F., Zhang, H., Liu, A., Nee, A.Y.C.: Digital twin in industry: state-of-the-art. IEEE Trans. Ind. Inf. 15(4), 2405–2415 (2019)
22. Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs. Adv. Neural Inf. Process. Syst. 30, 1 (2017)
23. Kool, W., Van Hoof, H., Welling, M.: Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475 (2018)
24. Mao, H., Schwarzkopf, M., Venkatakrishnan, S., Meng, Z., Alizadeh, M.: Learning scheduling algorithms for data processing clusters, pp. 270–288. ACM (2019)
25. Gasse, M., Chételat, D., Ferroni, N., Charlin, L., Lodi, A.: Exact combinatorial optimization with graph convolutional neural networks. Adv. Neural Inf. Process. Syst. 32, 1 (2019)
26. Wang, Z., Gombolay, M.: Learning scheduling policies for multi-robot coordination with graph attention networks. IEEE Robot. Autom. Lett. 5(3), 4509–4516 (2020)
27. Li, H., Huang, J., Wang, B., Fan, Y.: Weighted double deep q-network based reinforcement learning for bi-objective multi-workflow scheduling in the cloud. Clust. Comput. 25(2), 751–768 (2022)
28. Lee, M., Yu, G., Li, G.Y.: Graph embedding-based wireless link scheduling with few training samples. IEEE Trans. Wirel. Commun. 20(4), 2282–2294 (2021)
29. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR
30. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. ArXiv abs/1707.06347 (2017)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Chengfeng Jian was born in Zhejiang Province, China, in June 1973. He received his Ph.D. from Zhejiang University. His research interests include cloud computing and machine vision. He is a professor in the Department of Computer Science and Technology, Zhejiang University of Technology.

Zhuoyang Pan is from Zhejiang Province, China. He received his B.S. from Zhejiang University of Technology. His research interests include cloud computing and edge computing. He is a master's student in the Department of Computer Science and Technology, Zhejiang University of Technology.
Lukun Bao is from Zhejiang Province, China. He received his B.S. from Zhejiang University of Technology. His research interests include cloud computing and edge computing. He is a master's student in the Department of Computer Science and Technology, Zhejiang University of Technology.

Meiyu Zhang was born in Zhejiang Province, China, in June 1965. She received her M.S. from Zhejiang University. Her research interests include big data and cloud computing. She is a professor in the Department of Computer Science and Technology, Zhejiang University of Technology.
