Intelligent Decision-Making of Load Balancing Using DRL and PPSO in Cloud Environment
ABSTRACT Machine learning and parallel processing are widely used to enhance computing power and to extract knowledge from large volumes of data. To deal with the problems of complexity and high dimensionality, machine learning algorithms such as Deep Reinforcement Learning (DRL) are used, while parallel processing algorithms such as Parallel Particle Swarm Optimization (PPSO) are employed to speed up the operation and reduce the time needed to train the neural network. Because a large number of tasks arrive in the cloud environment, load balancing is an important issue. To solve this problem, the datacenter controller, acting as an agent, must make intelligent decisions to handle a large number of tasks within a minimum time period. In this work, we propose an effective scheduling algorithm named Deep Reinforcement Learning with Parallel Particle Swarm Optimization (DRLPPSO) to solve the load balancing problem and optimize its various parameters with greater accuracy and high speed. Our experimental results show that our proposed scheduling algorithm increases the reward by 15.7%, 12%, and 13.1% when the task set is 2000, and improves the reward by 17.5%, 12.6%, and 15.3% when the task set is 4000, as compared to the Modified Particle Swarm Optimization (MPSO), Asynchronous Advantage Actor-Critic (A3C), and Deep Q-Network (DQN) techniques.
INDEX TERMS Load balancing, deep reinforcement learning, neural network, parallel PSO.
A physical machine consumes electric power for the tasks it executes; if a server carries many workloads, it consumes more energy [4]–[6]. To overcome this problem, a better scheduling algorithm is required, one that can balance the load among the VMs and execute all the incoming tasks with less execution time and less energy consumption in the datacenter.

A suitable scheduling technique can be utilized to improve each load balancing parameter, fulfill Quality of Service (QoS) requirements, and further improve system performance. Various parameters, such as makespan time, energy consumption, resource usage, and cost, are considered to improve the performance of the cloud network. Basically, there are two important types of scheduling: task scheduling and resource (or VM) scheduling. Task scheduling is responsible for optimizing the makespan time or execution time of all incoming tasks within a physical machine [3]. Resource scheduling is responsible for optimizing resource utilization, resource selection for a given task, and energy consumption [7]. In recent times, most researchers have focused on hybrid scheduling algorithms as well as parallel computing techniques to solve this optimization problem. A hybrid scheduling algorithm, for example, combines various meta-heuristic algorithms with machine learning techniques to build an effective scheduler [8]–[10]. A hybrid scheduling algorithm that combines Ant Colony Optimization (ACO) and Deep Reinforcement Learning (DRL) has been proposed in [8] to increase system performance. In [9], a method was proposed for handling the continuous nature of the cloud environment and improving the convergence rate by combining the policy gradient algorithm with particle swarm optimization-based parameter exploration (PG-PSOPE). To improve performance, maintain load balance, and increase throughput, a hybrid technique known as QMPSO was developed in [10], which combines modified PSO with a Q-learning algorithm. The Deep Q-Network (DQN) method is broadly utilized in Deep Reinforcement Learning (DRL) to achieve the maximum reward [11]–[13]. In [11], a joint optimization of task offloading and bandwidth allocation was formulated for multi-user mobile edge computing, with the objective of minimizing the overall cost, including the total energy consumption and the delay in finishing the tasks. To solve the resource allocation issue in the mobile edge computing environment, a smart resource allocation algorithm, known as Deep Reinforcement Learning based Resource Allocation (DRLRA), was proposed in [12] to minimize average service time and balance resource allocation. To achieve efficient real-time task scheduling in the cloud environment, a double deep Q-network task scheduling (DDQN-TS) method was proposed in [13] to reduce the task response time while ensuring a high task completion rate. Similarly, compared to various population-based methods, Particle Swarm Optimization (PSO) is more efficient, simple, and easy to apply, with fewer parameters that require adjustment. It also gives higher performance than some other population-based methods, but it suffers from premature convergence as well as an inability to escape local optima [14], [15]. Two different strategies were proposed in [14] to solve the traditional PSO problem: a dimensional learning strategy (DLS) is used for finding the personal best value of each particle, and the two-swarm learning PSO (TSLPSO) algorithm guides the local search of the particles and finds the optimal value from the global search. In [15], a load balancing method based on Modified Particle Swarm Optimization (LBMPSO) was proposed, which uses a global-best-based inertia weight to avoid the local optimum problem. In general, PSO searches for the optimal outcome of the population with the assistance of each particle's individual value, fitness value, and global best value. Parallel computing has been used to achieve the desired accuracy and to accelerate neural network training for large training data sets [16]–[19]. In [16], a parallelization strategy for convolutional neural network (CNN) training was proposed based on two major techniques to maximize the overlap of computation and communication. In the first technique, all the gradients of the parameters are divided into two large chunks, which reduces the communication time. To reduce communication costs further, the second technique replicates the gradient calculation in a few fully-connected layers. In [17], a parallel processing scheme was proposed that saves time in artificial neural network (ANN) training. In [18], a parallel algorithm called Split and Conquer, which uses the Reluplex procedure and an iterative-deepening strategy, was proposed for solving Verification of Neural Network (VNN) formulas. In [19], a parallel deep neural network architecture with an embedded organization mechanism was proposed, which enforces diversity among the deep neural networks used as base models. The Parallel Particle Swarm Optimization (PPSO) algorithm is an example of a parallel computing process that minimizes the processing time of heavy computations [20]–[22]. In [20], a novel way to implement PSO on a Graphics Processing Unit (GPU) was proposed, where the PSO algorithm is executed in parallel on the GPU to improve system performance and speed. In [21], a technique was proposed to solve the highway alignment optimization problem by using an integrated model of parallel processing and a particle swarm optimization algorithm; this method is used to increase processing speed. In [22], two parallel PSO algorithms on a hierarchical GPU were developed and applied to Max-CSPs to improve the GPU's resolution effectiveness and reduce execution time: a parallel GPU-PSO for Max-CSPs (GPU-PSO) and a GPU distributed PSO for Max-CSPs (GPU-DPSO). With the help of parallel techniques, a large training dataset can be divided into a number of subparts, which speeds up processing [23]; by applying the back propagation learning algorithm, the network adjusts its weights and achieves better results. The authors in [24] introduced a unique technique that uses variable-length particles to search for optimal topologies in deep convolutional neural networks.
As we know, in a cloud environment, incoming tasks are placed into an appropriate virtual machine so that resources can be allocated for execution. However, when the appropriate resources are not available on the server or the server is heavily loaded, these incoming tasks take too much time to complete, and a loaded server consumes more energy. To overcome this problem, a parallel scheduling method that depends on both task and resource scheduling, named Deep Reinforcement Learning with Parallel Particle Swarm Optimization (DRLPPSO), is proposed in this paper. In this work, we use both DRL and Parallel PSO techniques to optimize the solutions.

In a cloud environment, an appropriate decision must be made to allocate a huge number of incoming tasks to suitable resources. This decision can be made with the help of the DRL algorithm. We consider systems that learn to manage resources directly from experience and train our model using artificial neural networks. Deep neural networks offer an extraordinary capacity to deal with complex control problems in high-dimensional and continuous environments [25]. In a cloud environment, task allocation and energy consumption problems are formulated as Markov Decision Processes (MDP). Multiple replay memories are utilized in the DQN method to reduce execution time, allocation time, task transfer time, and energy consumption [26], [27]. According to [28], energy consumption in datacenters has two distinct characteristics: (i) servers use more energy when they are heavily loaded; and (ii) servers still use a lot of power when they are idle. Consequently, server consolidation and load balancing can be used to increase the overall system reward rate and accuracy, which allows users to receive more benefits. In our proposed algorithm, an agent, such as the datacenter controller, checks the status of a VM to determine whether it is overloaded or underloaded. If the VM is overloaded, the agent takes the necessary action to migrate some tasks from the overloaded VMs to the underloaded ones. This also helps to reduce the variance between the targeted load and the present load of the VMs within a single server.

Due to its promising results, Parallel PSO (PPSO) was chosen as the metaheuristic optimization technique in this paper. It enhances system performance through parallelization and improved speed on large-scale analytical test problems [29]–[31]. In [29], a coarse decomposition scheme was chosen in which the algorithm performs only the fitness evaluations concurrently on a parallel machine. To find the global best result by applying Parallel PSO, a model based on a master-slave architecture was proposed in [30]. With the help of the PPSO algorithm in [31], VMs are selected so as to minimize the total execution time. To reduce the search time and improve the fitness function, PPSO divides the swarm into sub-swarms [32]–[34]; in our approach, each sub-swarm contains a DRL algorithm to find the reward, and these sub-swarms run in parallel to minimize processing time. As a result, it is appropriate for handling several requests from various users in parallel. In [32], a parallel swarms oriented particle swarm optimization (PSO-PSO) with multi-stage and single-stage evolution was suggested. Individual sub-swarms evolve independently in parallel in multi-stage evolution, while in single-stage evolution the sub-swarms exchange information to find the global best. The two intertwined stages of evolution show superior performance on test functions, particularly those with higher dimensions. The PSO-PSO version of the technique is appealing because it does not introduce any new parameters to boost convergence performance. In [33], a parallel particle swarm optimization algorithm comprising two phases was proposed to solve the convergence and local optimum problems. The first is the multi-evolutionary phase, in which the swarm is divided into k sub-swarms; each sub-swarm evolves independently, and each particle adjusts its position depending on its own experience as well as the swarm's best. This process is repeated for a specific number of iterations. The swarms are then combined to form a single evolutionary phase, in which the swarm best of each sub-swarm is compared to determine the global best. In [34], PPSO was employed to find the ideal virtual machine selection in a cloud environment to lower the cost of service based on turnaround time, waiting time, and CPU utilization.

The primary contributions of this paper are as follows:
• We propose a hybrid method that combines the PPSO and DRL techniques, which aims to maximize the reward by reducing makespan time and energy consumption while maintaining high accuracy, and to speed up the handling of the continuously arriving workload in a cloud environment.
• We formulate the mathematical model of our method and describe the detailed working of our DRLPPSO algorithm.
• We evaluated the proposed method on different task sets. The experimental outcomes show the effectiveness of the proposed method in comparison with current state-of-the-art strategies.

The remainder of the paper is organized as follows: Related work is discussed in Section II; the cloud framework model and optimization objectives are discussed in Section III; Section IV presents the DRLPPSO scheduling algorithm; experimental results are discussed in Section V. The paper is concluded in Section VI.

II. RELATED WORK
There are several existing scheduling policies that are static or dynamic and that are single-objective, bi-objective, or multi-objective. These policies have focused on maintaining machine load balance as well as on various load balancing factors such as makespan time, energy consumption, and resource utilization. To address the aforementioned problem, researchers have developed several machine learning and population-based optimization scheduling algorithms for the cloud environment. To improve the service quality and decrease the cost, a technique that relies on the Deep Q-network (DQN) algorithm was suggested in [35] to reduce energy consumption and makespan time by modifying the reward weight. In [36], a two-stage methodology was suggested
for job scheduling and resource allocation. A heterogeneous distributed deep learning (HDDL) method has been employed to manage job scheduling, and a deep Q-network (DQN) was used to address resource allocation. Each algorithm has been utilized to reduce the amount of energy required and the time it takes to complete a task. In [37], a task scheduling technique based on a deep reinforcement learning algorithm was suggested to optimize makespan and resource consumption. In [38], a task scheduling method based on a deep reinforcement learning architecture (RLTS) was presented to manage the complexity and high dimensionality of the environment by minimizing task execution time. In [39], foresighted job scheduling based on Q-learning was presented to reduce reaction time and makespan while increasing resource effectiveness. To address the job scheduling problem, a deep reinforcement learning based algorithm was proposed in [40] to help application providers dispatch jobs to limited resources under QoS requirement constraints. In [41], the DQN method was proposed, which follows the DRL approach to reduce energy consumption and average response time while increasing the success rate. In [42], a modified PSO task scheduling algorithm (MPSO) was proposed, which uses a modified inertia weight method to minimize transmission time, execution time, and energy consumption in a cloud environment. To solve the resource allocation problem in cloud datacenters, a method based on Actor-Critic Deep Reinforcement Learning was proposed in [43]; this method seeks to reduce energy consumption and improve QoS.

All the above methods have two common objectives, i.e., makespan time and energy consumption. However, none of them defines a suitable action to handle the extra task that can increase the reward in a dynamic environment. Therefore, in this paper, we define a suitable action that can handle the extra task and give the maximum reward as compared to the above methods. Also, in this paper, we show that our proposed method has better accuracy and faster processing speed than the others. A comparison between various scheduling algorithms with their advantages and disadvantages is presented in Table 1.

TABLE 1. Advantages and disadvantages of each scheduling algorithm.

III. CLOUD FRAMEWORK MODEL AND OPTIMIZATION OBJECTIVES
A basic framework of a cloud network is shown in Fig. 1, which contains two layers: a task layer and a datacenter layer. These two layers store the information required to achieve the objective. In the task layer, all needed information about incoming tasks, such as the expected completion time (ECT), task length (T_leng), and task file size (T_fs), is stored in the task queue to find the best VMs from the server. Similarly, the datacenter layer holds all of the necessary information, such as storage units, processing units, data transfer capacity, and processing speed, for both servers and virtual machines. This information is helpful to the datacenter controller, which applies the best scheduling algorithm to accomplish the objective and obtain the optimum result. Our objectives include reducing task completion time and energy consumption. Based on each objective, we derive the reward function that helps us achieve the best optimization of the system. Table 2 shows the terms and meanings used in the proposed algorithm.

A. TASK COMPLETING TIME
Suppose a datacenter in the cloud network consists of a set of servers with various resource configurations, denoted S_s, where s = {1, 2, . . . , m}. A number of incoming tasks enter the datacenter, denoted T_w, where w = {1, 2, . . . , n}, and each server contains a number of VMs, denoted VM_v, where v = {1, 2, . . . , o}, which are the basic units of server resources; the condition for execution of the tasks is w > v. Each task has a length T_leng, expressed in millions of instructions (MI), and the speed of VM processing is measured in millions of instructions per second (MIPS). To accomplish our objective, we compute the processing rate of a VM as in (1), which depends on the properties of the VM.

DV_{w,v} = 1, if T_w is allocated to VM_v; 0, otherwise    (5)
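To make the task and VM model above concrete, the following minimal Python sketch encodes a task, a VM, the allocation indicator of Eq. (5), and an expected completion time computed under the usual assumption ECT = task length (MI) / VM speed (MIPS); the class and field names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Task:
    length_mi: float   # task length T_leng in millions of instructions (MI)
    file_size: float   # task file size T_fs

@dataclass
class VM:
    mips: float        # VM processing speed in millions of instructions per second

def allocation_indicator(allocation: dict, w: int, v: int) -> int:
    """Decision variable DV_{w,v} of Eq. (5): 1 if task w is allocated to VM v, else 0."""
    return 1 if allocation.get(w) == v else 0

def expected_completion_time(task: Task, vm: VM) -> float:
    """Assumed definition of ECT: instructions divided by processing speed."""
    return task.length_mi / vm.mips

# Example: a 2000 MI task on a 1000 MIPS VM has an ECT of about 2.0 seconds.
print(expected_completion_time(Task(length_mi=2000, file_size=300), VM(mips=1000)))
```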
If the load is below the threshold value, then its status is underloaded; otherwise, the VM status is balanced. The threshold value is represented in (10) and the VM status in (11), where b_avg is the average bound, 1 < b_avg < 2.

TH_value = (VM_avgld × b_avg) − VM_ld    (10)

VM_ld status = Underload, if VM_ld < |TH_value|; Overload, if VM_ld > |TH_value|; Balanced, if VM_ld = |TH_value|    (11)

D. ENERGY CONSUMPTION
The energy consumption (E_cons) of the server S_s at time t depends on the number of active and idle VMs. Both active and idle VMs depend on CPU utilization. The energy consumption of an active VM (E_active) is calculated as the energy consumption of task execution at a particular VM (E_texe) plus the energy consumption of task transfer between two VMs (E_tt), as represented in (14). E_texe depends on the load of the VM, the CPU utilization of the VM, and the weight of the server (S_wt), as represented in (12). The energy consumption of task transfer between two VMs (E_tt) depends on the tasks transferred between the two VMs, their bandwidth, and the CPU utilization; it is calculated as in (13), where x and y are the two VMs. According to [44], an idle machine consumes about two thirds of the energy of CPU utilization, which is represented in (15).

E_texe = VM_ld × CPU_util × S_wt    (12)
E_tt = (TT_{x,y} / BW_{x,y}) × CPU_util    (13)
E_active = E_texe + E_tt    (14)
E_idle = (2/3) × CPU_util    (15)

Finally, the energy consumption of a server is the sum of the energy consumption of all active and idle VMs, as represented in (16).

E_cons = Σ_{v=1}^{o} (E_active + E_idle)    (16)

IV. DRLPPSO SCHEDULING ALGORITHM
In this section, we describe the essential idea of the proposed DRLPPSO scheduling algorithm, in which an agent obtains an individual reward by taking an appropriate action in each state of the environment using the DRL algorithm. Using the PPSO algorithm, after obtaining their individual best rewards, the servers then try to obtain the global best reward with minimum processing time by exchanging information with their neighbors. We first select the suitable action by means of a state-action value function and then obtain the personal best reward for each server, which optimizes the load and its parameters. After obtaining the personal best reward, each server can find the global best value by sharing its information with the other servers at high speed. This section contains sub-sections describing the reward function, DRL, PSO, and PPSO.

A. REWARD FUNCTION
The load balancing problem is described as a Markov Decision Process (MDP) due to the continuous nature of tasks in cloud computing. In our proposed method, the datacenter controller is represented as an agent and the datacenter is represented as the environment, where the agent takes action by allocating incoming tasks to a suitable VM in each cycle. The MDP has four important variables, which are described below.

1) STATE SPACE
Within this space, the ideal move is made by the agent based on the current VM information, such as the number of VMs and their available MIPS, CPU, memory, and bandwidth. Our state space can be characterized as:

S = {VM_v, VM_mips, VM_cpu, VM_mem, VM_bw}

2) ACTION SPACE
In the action space, each task is allocated a VM for execution. Each action carries various pieces of information, such as the number of tasks, the task length, and the file size. Our action space can be expressed as:

A = {T_w, T_leng, T_fs}

3) TRANSITION FUNCTION AND ACTION SELECTION
When an agent takes an action in the current state, it reaches a new state. Each time, the agent tries to take an appropriate action to reach an optimal state that gives the highest reward. The transition function is represented as P(s_t, a_t, r_t, s_{t+1}); it is the probability of reaching the next state s_{t+1} and getting reward r_t after executing the selected action a_t in the current state s_t. Due to the dynamic nature of cloud networks, tasks vary with respect to time, length, and file size; therefore, the load of the VMs also changes. Some of the VMs are overloaded and some are underloaded, which is determined by (11). To balance the load among VMs, the agent takes the action that transfers the extra task from an overloaded to an underloaded VM at high speed. The task transfer speed can be calculated using (17); increasing the task transfer speed minimizes the completion time of the tasks. The selected action a_t at iteration t can be calculated using (18).

T_ts = Σ_{w=1}^{n} Σ_{x=1}^{o} Σ_{y=1}^{o} (1 − DV_{w,x} DV_{w,y}) (ETTR_{x,y} / BW_{x,y})    (17)

where ETTR_{x,y} is the extra task transferred from VM_x to VM_y, BW_{x,y} is the required bandwidth between the two VMs VM_x and VM_y, DV_{w,x} = 1 indicates that VM_x is overloaded, and DV_{w,y} = 1 indicates that the extra task has been transferred from VM_x to VM_y.

a_t = max_a (T_ts)    (18)
4) REWARD
An agent gets a reward for making specific moves in various states, and it attempts to choose the state with the higher reward to maximize its accuracy. In our model, the reward function is characterized by the minimization of the completion time of the task on a specific VM and the energy consumption. The reward is represented in (19).

r_t = {min CT_{w,v} + E_cons}    (19)

B. DEEP REINFORCEMENT LEARNING
Recently, a number of machine learning algorithms, such as RL and DRL algorithms, have been applied to computing platforms to optimize the respective parameters. These learning algorithms acquire knowledge from the environment by choosing actions, which helps maximize the reward by optimizing the factors in the environment. The RL algorithm considered here is the model-free Q-learning algorithm, which uses a state-action value function Q_π(s_t, a_t) to represent the value of selecting an action a_t in the current state s_t while following a policy π. This function is stored in a Q-table to obtain the reward. The state-action value function follows the Bellman condition [45], [46] and is represented in (20).

Q_π(s_t, a_t) = r_t + γ max Q_π(s_{t+1}, a_{t+1})    (20)

To increase the algorithm's performance, the learning rate β is added, as represented in (21).

Q_π(s_t, a_t) = Q_π(s_t, a_t) + β(r_t + γ max Q_π(s_{t+1}, a_{t+1}) − Q_π(s_t, a_t))    (21)

This procedure is carried out iteratively until the terminal condition is reached. Each time, these Q-values, or state-action value functions, are stored in the Q-table. This reveals the drawback of the Q-learning algorithm: as the number of actions grows, the complexity of the computation also grows, which diminishes the performance of the system. Thus, the Q-learning algorithm in RL does not offer data efficiency, learning efficiency, or stability. To overcome this challenge, we use DQN, a modified version of standard Q-learning that employs experience replay, target networks, and exploration and exploitation techniques. This strategy makes our proposed algorithm more suitable for training large neural networks with faster convergence speeds, as demonstrated in Fig. 2. Two neural networks are shown in Fig. 2, the target Q-network and the evaluated Q-network, and they have the same network structure. Both are used in the training process for choosing N experiences from the experience replay memory D. We set θ as the parameter of the evaluated Q-network and θ′ as the parameter of the target Q-network. At each iteration t, the state s_t, action a_t, and parameter θ are used to generate the state-action value function Q(s_t, a_t; θ) with its reward. This serves as an input to the target Q-network to acquire the highest state-action value function of all actions in the target Q-network. In this paper, we use the ε-greedy strategy: with probability ε a random action is selected; otherwise, we choose the action by using (18). After getting the reward, we compare the state-action value function of the evaluated Q-network with the state-action value function of the target Q-network to obtain more accuracy.

The goal of the proposed algorithm is to get the evaluated Q-network as near to the target Q-network as possible to achieve better outcomes. As a result, we train our neural network to calculate the loss function (which decreases the difference between the evaluated and target Q-networks), and the neural network's parameters are updated using backpropagation and gradient descent [47]. The loss function can be reduced to update the parameters of the Q-networks. Equation (22) is used to construct the loss function, and (23) is used to establish the target state-action value function.

L(θ) = E[(target_t − Q(s_t, a_t; θ))^2]    (22)

target_t = r_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; θ′)    (23)

where target_t is the target state-action value function, evaluated by the action performed on the target Q-network with parameters θ′. All the parameters θ of the evaluated Q-network are updated at each iteration t. However, the parameters θ′ of the target Q-network are fixed and updated only at stationary stages. As a result, the target Q-network is updated more slowly than the evaluated Q-network. The pseudo code of the DRL algorithm is described in Algorithm 1. The datacenter controller uses this algorithm to gather the vital network parameters, for example, the individual load, the overall load, and the completion time, which are refreshed as the states of the environment change. Following the initial transition function (s_t, a_t, r_t, s_{t+1}), the network parameters are utilized to develop the loss function and then fine-tuned. Each time the load is assigned, an action a_t is chosen using (18), followed by the reward obtained on reaching the state s_{t+1}. This decision is forwarded to the controller, which is in charge of allocating resources for subsequent tasks.

FIGURE 2. Structure of neural network.
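The following TensorFlow sketch illustrates how the loss of Eq. (22) and the target of Eq. (23) can be computed with an evaluated Q-network (parameters θ) and a slowly updated target Q-network (parameters θ′); the layer sizes, optimizer, and hyperparameter values are assumptions for illustration rather than the paper's exact configuration.

```python
import tensorflow as tf

def build_q_network(state_dim: int, num_actions: int) -> tf.keras.Model:
    # Small fully connected Q-network; the layer sizes are illustrative.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions),
    ])

q_eval = build_q_network(state_dim=5, num_actions=10)    # parameters theta
q_target = build_q_network(state_dim=5, num_actions=10)  # parameters theta'
q_target.set_weights(q_eval.get_weights())
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
gamma = 0.9  # discount factor (assumed value)

def dqn_update(states, actions, rewards, next_states):
    """One gradient step on the loss of Eq. (22) using the target of Eq. (23)."""
    # target_t = r_t + gamma * max_a' Q(s_{t+1}, a'; theta')
    targets = rewards + gamma * tf.reduce_max(q_target(next_states), axis=1)
    with tf.GradientTape() as tape:
        q_values = q_eval(states)                                  # Q(s_t, .; theta)
        idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
        chosen_q = tf.gather_nd(q_values, idx)                     # Q(s_t, a_t; theta)
        loss = tf.reduce_mean(tf.square(targets - chosen_q))       # Eq. (22)
    grads = tape.gradient(loss, q_eval.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_eval.trainable_variables))
    return loss

# Example call with a dummy mini-batch of 4 transitions.
s = tf.random.uniform((4, 5)); s2 = tf.random.uniform((4, 5))
a = tf.constant([0, 3, 1, 2], dtype=tf.int32)
r = tf.constant([1.0, 0.5, 0.2, 0.9])
print(float(dqn_update(s, a, r, s2)))
# Every T steps the target network is refreshed, mirroring step 13 of Algorithm 1:
# q_target.set_weights(q_eval.get_weights())
```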
Algorithm 1 DRL
Input: learning rate; discount factor; exploration factor; replay memory capacity; information of each task, VM, and server
Output: reward of each server
1. Initialize replay memory D to capacity N
2. Initialize the evaluated Q-network with parameters θ
3. Initialize the target Q-network with parameters θ′ = θ
4. Begin
5. For each episode e do
6. Initialize the state s_t with the load
7. For each task in the task queue do
8. With probability ε, choose a random action a_t; otherwise, select a_t according to (18). End if
9. Applying action a_t, calculate the total reward r_t using (19)
10. Move to the new state s_{t+1}
11. Store the transition (s_t, a_t, r_t, s_{t+1}) in memory D
12. Execute Algorithm 1.1 for evaluated Q-network training
13. Every T steps, update the target Q-network: θ′ = θ
14. End For
15. End For
16. Return reward
17. End

Algorithm 1.1 Evaluated Q-Network Training
1. Sample a random mini-batch of transitions (s_u, a_u, r_u, s_{u+1}) from memory D
2. If the episode terminates at step u + 1, set target_u = r_u; else set target_u = r_u + γ max_{a_{u+1}} Q(s_{u+1}, a_{u+1}; θ′)
3. End if
4. Perform a gradient descent step on the loss function (target_u − Q(s_u, a_u; θ))^2
5. Repeat until the least loss value is achieved, updating the parameters

Each particle searches for the global best, and the next incoming task is allocated to the best particle. After this, each particle updates its position and velocity according to (24) and (25). Each term used in PSO, with its meaning, is explained in Table 3.

X_i(t + 1) = X_i(t) + V_i(t + 1)    (24)

V_i(t + 1) = ω V_i(t) + c_1 r_1 (P_i(t) − X_i(t)) + c_2 r_2 (G_best(t) − X_i(t))    (25)

TABLE 3. Terms and meaning.

D. PARALLEL PARTICLE SWARM OPTIMIZATION
The PPSO algorithm is a suitable method for solving an optimization problem with less processing time. The PPSO algorithm leads to enhanced throughput due to parallelization and improved speed, even if the environment contains a large population. The working principle of PPSO is the same as that of PSO; the difference is that in PPSO the main swarm is divided into a number of sub-swarms, where each sub-swarm works as a single PSO and the sub-swarms run in parallel to reduce the processing time for obtaining the best result.

The PPSO algorithm has two phases: a multi-evolutionary phase and a single evolutionary phase. The swarm, or datacenter, is randomly divided into k sub-swarms, or servers, during the multi-evolutionary phase. Each sub-swarm consists of a number of particles, or virtual machines (VMs), each of which can be evaluated independently of the swarm. After determining the optimal value, each particle uses (24) and (25) to update its position and velocity based on its own and the swarm's best experiences. This process is repeated for a specific number of iterations. The sub-swarms are then combined to form a single evolutionary phase. The global best (G_best) is determined by comparing the swarm best (S_best) of each sub-swarm. For minimization problems, (26) represents the G_best in a given iteration.
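As a companion to Eqs. (24) and (25), the sketch below performs one PSO velocity and position update in NumPy; the inertia weight ω and the acceleration coefficients c1 and c2 are illustrative values, since the paper's exact settings are not given here.

```python
import numpy as np

def pso_step(positions, velocities, personal_best, global_best,
             omega=0.7, c1=1.5, c2=1.5):
    """One PSO iteration implementing Eqs. (24)-(25) for a whole swarm at once."""
    r1 = np.random.rand(*positions.shape)
    r2 = np.random.rand(*positions.shape)
    velocities = (omega * velocities
                  + c1 * r1 * (personal_best - positions)   # cognitive component
                  + c2 * r2 * (global_best - positions))    # social component, Eq. (25)
    positions = positions + velocities                      # Eq. (24)
    return positions, velocities

# Example: 20 particles in 5 dimensions, one update toward an assumed global best.
pos = np.random.rand(20, 5)
vel = np.zeros((20, 5))
pos, vel = pso_step(pos, vel, personal_best=pos.copy(), global_best=pos[0])
```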
X_ij(t + 1) = X_ij(t) + V_ij(t + 1)    (28)

This process is repeated for a specific number of iterations. In comparison with the original PSO, the parallel version of the method is simple to implement and delivers better assignments. Table 4 presents the terms and meanings used in PPSO.

TABLE 4. Terms and meanings.

Our proposed algorithm is compared with existing algorithms such as MPSO, A3C, and DQN. From the simulation results, it is clearly shown that our model gives better rewards than the existing algorithms. It also handles the machine load and minimizes the load balancing parameters. The results also show the high accuracy and the speedup of our system performance. All tests were conducted in Google Colab with the Python environment and TensorFlow. In this environment, we use the PPSO technique for training the neural network to optimize its load. All the simulation results are shown in Figs. 4 to 9.
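The multi-evolutionary and single evolutionary phases described above can be sketched with Python's multiprocessing module as follows: each sub-swarm evolves independently in its own process, and the swarm bests are then compared to obtain the global best. The quadratic objective, iteration counts, and swarm sizes are placeholders for illustration, not the paper's workload.

```python
import numpy as np
from multiprocessing import Pool

def fitness(x):
    return float(np.sum(x ** 2))   # placeholder minimization objective

def evolve_subswarm(seed, particles=20, dim=5, iters=50):
    """Multi-evolutionary phase: evolve one sub-swarm independently and return its S_best."""
    rng = np.random.default_rng(seed)
    pos = rng.random((particles, dim))
    vel = np.zeros((particles, dim))
    pbest = pos.copy()
    for _ in range(iters):
        sbest = min(pbest, key=fitness)                      # swarm best of this sub-swarm
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (sbest - pos)  # Eq. (25)
        pos = pos + vel                                       # Eq. (24)
        for i in range(particles):
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i]
    return min(pbest, key=fitness)

if __name__ == "__main__":
    k = 4                                                    # number of sub-swarms (servers)
    with Pool(processes=k) as pool:
        swarm_bests = pool.map(evolve_subswarm, range(k))    # sub-swarms run in parallel
    # Single evolutionary phase: compare the S_best of every sub-swarm for the global best.
    g_best = min(swarm_bests, key=fitness)
    print(fitness(g_best))
```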
TABLE 5. Task properties.

TABLE 6. Server and VM properties.

... and 42.5%. Finally, an approximate 40% reward is obtained. Table 8 provides detailed information about the various scheduling algorithms in terms of energy consumption, completion time, and reward percentage. From the experiment, we find that our proposed algorithm shows 15.7%, 12%, and 13.1% better rewards as compared to the MPSO, A3C, and DQN scheduling algorithms.

TABLE 8. Various scheduling algorithms with reward percentage for the 2000 task set.
According to Amdahl's law, speedup is calculated as the ratio of the execution time on one CPU in sequential mode to the execution time on all CPUs in parallel. The speedup of our proposed algorithm is shown in (29).

S_u = E_s / E_p    (29)

where S_u is the speedup, E_s is the execution time on one CPU in sequential mode, and E_p is the execution time on all CPUs in parallel mode. However, in this paper we deal with parallel processing, so we follow Gustafson's law, which extends Amdahl's law to scaled problem sizes. According to Gustafson's law, the speedup is represented in (30).

S_u = 1 + (C − 1) × p    (30)

where C is the number of CPUs and p is the parallel fraction, which lies between 0.2 and 0.99.

FIGURE 9. Speedup value.
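A brief Python sketch of both speedup estimates in Eqs. (29)-(30) follows; the timing and CPU-count values are made-up illustrations, not measurements from the paper.

```python
def amdahl_speedup(seq_time: float, par_time: float) -> float:
    """Eq. (29): ratio of sequential execution time to parallel execution time."""
    return seq_time / par_time

def gustafson_speedup(num_cpus: int, parallel_fraction: float) -> float:
    """Eq. (30): S_u = 1 + (C - 1) * p, with p assumed to lie between 0.2 and 0.99."""
    return 1 + (num_cpus - 1) * parallel_fraction

# Illustrative values: 120 s sequential vs. 40 s parallel, and 4 CPUs with p = 0.9.
print(amdahl_speedup(120.0, 40.0))   # 3.0
print(gustafson_speedup(4, 0.9))     # 3.7
```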
In this paper, we proposed a scheduling algorithm named Deep Reinforcement Learning with Parallel PSO (DRLPPSO). This algorithm is based on both the DRL learning algorithm and the Parallel PSO algorithm. Through the DRL learning algorithm, we train our neural network to get the best reward, and by using PPSO, the overall processing time of all the incoming load is reduced. This scheduling algorithm is proposed to achieve improvements in various load balancing parameters within a minimum time period as compared to other popular existing scheduling algorithms in a cloud environment. Our simulation experiments were performed with the help of Google Colab with the Python environment and TensorFlow. In the simulation, we carried out three different experiments that examined the reward percentage, the accuracy, and the speedup. When compared to the MPSO, A3C, and DQN scheduling algorithms, the DRLPPSO scheduling algorithm improves rewards by 15.7%, 12%, and 13.1% when the task set is 2000, and by 17.5%, 12.6%, and 15.3% when the task set is 4000. This shows that our proposed algorithm gives better rewards even if a large number of tasks arrive at the datacenter. The second experiment shows the accuracy value, which is approximately 0.838889. The final experiment presents the speedup, which is compared between DRLPPSO and the existing PSO algorithm.

In the future, we will compare our proposed algorithm with other meta-heuristic algorithms such as Parallel PSO (PPSO), the Genetic Algorithm (GA), and Ant Colony Optimization (ACO). Also, to improve resource allocation and resource management concepts, a hybrid algorithm that combines swarm optimization and machine learning algorithms has been proposed; this algorithm will provide real-time analytics on the complex and dynamic cloud network.
[10] U. K. Jena, P. K. Das, and M. R. Kabat, "Hybridization of meta-heuristic algorithm for load balancing in cloud computing environment," J. King Saud Univ. Comput. Inf. Sci., vol. 36, no. 6, pp. 2332-2342, Jun. 2020.
[11] L. Huang, X. Feng, C. Zhang, L. Qian, and Y. Wu, "Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing," Digit. Commun. Netw., vol. 5, no. 1, pp. 10-17, Feb. 2018.
[12] J. Wang, L. Zhao, J. Liu, and N. Kato, "Smart resource allocation for mobile edge computing: A deep reinforcement learning approach," IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1529-1541, Jul. 2021.
[13] Z. Tong, F. Ye, B. Liu, J. Cai, and J. Mei, "DDQN-TS: A novel bi-objective intelligent scheduling algorithm in the cloud environment," Neurocomputing, vol. 455, pp. 419-430, Sep. 2021.
[14] G. Xu, Q. Cui, X. Shi, H. Ge, Z.-H. Zhan, H. P. Lee, Y. Liang, R. Tai, and C. Wu, "Particle swarm optimization based on dimensional learning strategy," Swarm Evol. Comput., vol. 45, pp. 33-51, Mar. 2019.
[15] A. Pradhan and S. K. Bisoy, "A novel load balancing technique for cloud computing platform based on PSO," J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 7, pp. 3988-3995, Jul. 2020.
[16] S. Lee, D. Jha, A. Agrawal, A. Choudhary, and W.-K. Liao, "Parallel deep convolutional neural network training by exploiting the overlapping of computation and communication," in Proc. IEEE 24th Int. Conf. High Perform. Comput. (HiPC), Dec. 2017, pp. 183-192.
[17] M. Sharif and O. Gursoy, "Parallel computing for artificial neural network training using Java native socket programming," Periodicals Eng. Natural Sci., vol. 6, no. 1, pp. 1-10, 2018.
[18] H. Wu, A. Ozdemir, A. Zelji, A. Irfan, K. Julian, D. Gopinath, S. Fouladi, G. Katz, C. Pasareanu, and C. Barrett, "Parallelization techniques for verifying neural networks," 2020, arXiv:2004.08440.
[19] P. S. Mashhadi, S. Nowaczyk, and S. Pashami, "Parallel orthogonal deep neural network," Neural Netw., vol. 140, pp. 167-183, Aug. 2021.
[20] Y. Tan and Y. Zhou, "Parallel particle swarm optimization algorithm based on graphic processing units," in Handbook of Swarm Intelligence (Adaptation, Learning, and Optimization). Berlin, Germany: Springer, 2011, pp. 133-154.
[21] S. F. Kazemi and Y. Shafahi, "An integrated model of parallel processing and PSO algorithm for solving optimum highway alignment problem," in Proc. 27th Conf. Model. Simul., 2013, pp. 551-555.
[22] N. Dali and S. Bouamama, "GPU-PSO: Parallel particle swarm optimization approaches on graphical processing unit for constraint reasoning: Case of Max-CSPs," Procedia Comput. Sci., vol. 60, pp. 1070-1080, Jan. 2015.
[23] P. Chanthini and K. Shyamala, "A survey on parallelization of neural network using MPI and Open MP," Indian J. Sci. Technol., vol. 9, no. 19, pp. 1-7, May 2016.
[24] F. E. Fernandes and G. G. Yen, "Particle swarm optimization of deep neural networks architectures for image classification," Swarm Evol. Comput., vol. 49, pp. 62-74, Sep. 2019.
[25] M. Cheng, J. Li, and S. Nazarian, "DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers," in Proc. 23rd Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2018, pp. 129-134.
[26] X. Xiong, K. Zheng, L. Lei, and L. Hou, "Resource allocation based on deep reinforcement learning in IoT edge computing," IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 1133-1146, Jun. 2020.
[27] S. Sheng, P. Chen, Z. Chen, L. Wu, and Y. Yao, "Deep reinforcement learning-based task scheduling in IoT edge computing," Sensors, vol. 21, no. 5, p. 1666, Feb. 2021.
[28] L. A. Barroso, J. Clidaras, and U. Holzle, "The datacenter as a computer: An introduction to the design of warehouse-scale machines," Synth. Lectures Comput. Archit., vol. 8, no. 3, pp. 1-154, 2013.
[29] J. F. Schutte, J. A. Reinbolt, B. J. Fregly, R. T. Haftka, and A. D. George, "Parallel global optimization with the particle swarm algorithm," Int. J. Numer. Methods Eng., vol. 61, no. 13, pp. 2296-2315, Dec. 2004.
[30] M. F. Abulkhair, E. S. Alkayal, and N. R. Jennings, "Automated negotiation using parallel particle swarm optimization for cloud computing applications," in Proc. Int. Conf. Comput. Appl. (ICCA), Sep. 2017, pp. 26-35.
[31] A. Abdelaziz, M. Anastasiadou, and M. Castelli, "A parallel particle swarm optimisation for selecting optimal virtual machine on cloud environment," Appl. Sci., vol. 10, no. 18, p. 6538, Sep. 2020.
[32] T. Gonsalves and A. Egashira, "Parallel swarms oriented particle swarm optimization," Appl. Comput. Intell. Soft Comput., vol. 2013, Jan. 2013, Art. no. 756719.
[33] M. Vidhya and N. Sadhasivam, "Parallel particle swarm optimization for task scheduling in cloud computing," Int. J. Innov. Res. Sci., Eng. Technol., vol. 4, no. 6, pp. 136-140, 2015.
[34] A. Abdelaziz, M. Elhoseny, A. S. Salama, and A. M. Riad, "A machine learning model for improving healthcare services on cloud computing environment," Measurement, vol. 119, pp. 117-128, Apr. 2018.
[35] Z. Peng, J. Lin, D. Cui, Q. Li, and J. He, "A multi-objective trade-off framework for cloud resource scheduling based on the deep Q-network algorithm," Cluster Comput., vol. 23, no. 4, pp. 2753-2767, Dec. 2020.
[36] J. Lin, D. Cui, Z. Peng, Q. Li, and J. He, "A two-stage framework for the multi-user multi-data center job scheduling and resource allocation," IEEE Access, vol. 8, pp. 197863-197874, 2020.
[37] H. Che, Z. Bai, R. Zuo, and H. Li, "A deep reinforcement learning approach to the optimization of data center task scheduling," Complexity, vol. 2020, pp. 1-12, Aug. 2020.
[38] T. Dong, F. Xue, C. Xiao, and J. Li, "Task scheduling based on deep reinforcement learning in a cloud manufacturing environment," Concurrency Comput., Pract. Exp., vol. 32, no. 11, pp. 1-12, Jun. 2020.
[39] S. Mostafavi and V. Hakami, "A stochastic approximation approach for foresighted task scheduling in cloud computing," Wireless Pers. Commun., vol. 114, no. 1, pp. 901-925, Sep. 2020.
[40] Y. Wei, L. Pan, S. Liu, L. Wu, and X. Meng, "DRL-scheduling: An intelligent QoS-aware job scheduling framework for applications in clouds," IEEE Access, vol. 6, pp. 55112-55125, 2018.
[41] J. Yan, Y. Huang, A. Gupta, A. Gupta, C. Liu, J. Li, and L. Cheng, "Energy-aware systems for real-time job scheduling in cloud data centers: A deep reinforcement learning approach," Comput. Electr. Eng., vol. 99, Apr. 2022, Art. no. 107688.
[42] Z. Zhou, J. Chang, Z. Hu, J. Yu, and F. Li, "A modified PSO algorithm for task scheduling optimization in cloud computing," Concurrency Comput., Pract. Exp., vol. 30, no. 24, pp. 1-11, 2018.
[43] Z. Chen, J. Hu, G. Min, C. Luo, and T. El-Ghazawi, "Adaptive and efficient resource allocation in cloud datacenters using actor-critic deep reinforcement learning," IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 8, pp. 1911-1923, Aug. 2022.
[44] T. Renugadevi, K. Geetha, N. Prabaharan, and P. Siano, "Carbon-efficient virtual machine placement based on dynamic voltage frequency scaling in Geo-distributed cloud data centers," Appl. Sci., vol. 10, no. 8, p. 2701, Apr. 2020.
[45] V. Divya and S. R. Leena, "Intelligent deep reinforcement learning based resource allocation in fog network," in Proc. 26th Int. Conf. High Perform. Comput., Data Anal. Workshop (HiPCW), Dec. 2019, pp. 18-22.
[46] F. Fu, Z. Zhang, F. R. Yu, and Q. Yan, "An actor-critic reinforcement learning-based resource management in mobile edge computing systems," Int. J. Mach. Learn. Cybern., vol. 11, no. 8, pp. 1875-1889, Aug. 2020.
[47] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, May 2015.

ARABINDA PRADHAN (Graduate Student Member, IEEE) received the M.Tech. (CS) degree from SOA University, in 2011. He is currently pursuing the Ph.D. degree with C. V. Raman Global University. He is also working as an Assistant Professor with the Department of CSE, GIET, Baniatangi, Bhubaneswar, India. He has contributed more than ten research papers to national and international journals and conferences and holds one patent. His research interests include cloud computing and machine learning. He has reviewed articles for journals published by Elsevier, Wiley, and BEEI.

SUKANT KISHORO BISOY received the master's degree in computer science and engineering from Visvesvaraya Technological University (VTU), Belgaum, India, in 2003, and the Ph.D. degree from Siksha 'O' Anusandhan University, India, in 2017. He is currently working as an Associate Professor with the Department of Computer Science and Engineering, C. V. Raman Global University, India. He has been involved in organizing many conferences, workshops, and FDPs. He has several publications in national and international conferences and journals and has given invited talks in many workshops. His current research interests include wireless sensor networks, neuro-robotics, machine learning, cloud computing, and SDN. He is a reviewer for IEEE Transactions, Elsevier, and Springer journals and conferences.
SANDEEP KAUTISH received the bachelor's, master's, and Ph.D. degrees in computer science, with the Ph.D. on intelligent systems in social networks, and the PG Diploma degree in management. He is currently working as a Professor and the Director-Campus with LBEF Campus, Kathmandu, Nepal, running in academic collaboration with the Asia Pacific University of Technology and Innovation, Malaysia (301-350 in QS Asian University Rankings). He is an academician by choice, backed by more than 18 years of work experience in academics, including over eight years in academic administration at various institutions in India and abroad. He has meritorious academic records throughout his academic career. He has over 60 publications to his credit, and his research work has been published in reputed journals from Springer, Elsevier, Taylor & Francis, Hindawi, and IGI Global with high impact factors and SCI/SCIE/Scopus/WoS indexing. His research papers can be found in IEEE Transactions on Industrial Informatics (Impact Factor: 10.215), Computer Standards & Interfaces (SCI, Elsevier), Journal of Ambient Intelligence and Humanized Computing (SCIE, Springer), and Wireless Personal Communications (SCIE, Springer). He has authored/edited more than 12 books with reputed publishers, such as Springer, Elsevier, Wiley, De Gruyter, Bentham Science, and IGI Global. He was invited as a Keynote Speaker at VIT Vellore, in 2019, for an international virtual conference held at the VIT Vellore Campus. He filed one patent in the field of solar energy equipment using artificial intelligence, in 2019. His research interests include healthcare analytics, business analytics, machine learning, data mining, information systems, and decision support systems. He is an Editorial Member/Reviewer of various reputed SCI/SCIE journals, such as IEEE Access (IEEE), Journal of Intelligent Manufacturing (Springer), Computer Communications (Elsevier), Multimedia Tools & Applications (Springer), Computational Intelligence (Wiley), and Australasian Journal of Information Systems (AJIS). He is a recognized academician as a Session Chair/Ph.D. Thesis Supervisor and an External Examiner at various international universities of repute, such as the University of Kufa, University of Babylon, Polytechnic University of the Philippines (PUP), University of Madras, Anna University Chennai, Savitribai Phule Pune University, and M. S. University, Tirunelveli, among other technical universities.

MUHAMMED BASHEER JASSER (Member, IEEE) received the master's and Ph.D. degrees in software engineering from the University Putra Malaysia (UPM). He is currently a Senior Lecturer and the Program Leader of the B.Sc. degree (Hons.) in information technology with the Department of Computing and Information Systems, School of Engineering and Technology, Sunway University. He was granted the Malaysian Technical Cooperation Program Scholarship (MTCP) from the Ministry of Higher Education (Malaysia) for his postgraduate studies. He is working on several fundamental and industrial research projects in the area of artificial intelligence and software engineering funded by several companies and universities, and several postgraduate students are working under his supervision on these projects. His major research interests include optimization algorithms, evolutionary computation, model-driven software engineering, formal specification, verification and theorem proving, artificial intelligence, and machine learning. He is a member of several professional academic bodies, including IEEE, the Institute of Electronics, Information and Communication Engineers (IEICE), and the Formal Methods Europe Organization.

ALI WAGDY MOHAMED received the B.Sc., M.Sc., and Ph.D. degrees from Cairo University, Egypt, in 2000, 2004, and 2010, respectively. He was an Associate Professor of statistics at the Wireless Intelligent Networks Center (WINC), Faculty of Engineering and Applied Sciences, Nile University (2019-2021). He is currently an Associate Professor at the Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University. He is also an Associate Professor with the Mathematics and Actuarial Science Department, School of Sciences and Engineering, The American University in Cairo, Cairo, Egypt. He has supervised two Ph.D. students and one master's student. He has published more than 80 articles in reputed and high-impact journals, such as Information Sciences, Swarm and Evolutionary Computation, Computers & Industrial Engineering, Intelligent Manufacturing, Soft Computing, and International Journal of Machine Learning and Cybernetics. His research interests include mathematical and statistical modeling, stochastic and deterministic optimization, swarm intelligence, and evolutionary computation. Additionally, he is interested in real-world problems, such as industrial, transportation, manufacturing, education, and capital investment problems. He was appointed as a member of the Education and Scientific Research Policy Council of the Academy of Scientific Research (2021-2024). Recently, he was recognized among the top 2% of scientists according to the Stanford University reports of 2019 and 2020. He serves as a reviewer for more than 80 internationally accredited top-tier journals and was awarded the Publons Peer Review Award 2018 for placing in the top 1% of reviewers worldwide in his field. He is an Associate Editor of the Swarm and Evolutionary Computation journal (Elsevier) and an Editor of more than ten journals in information sciences, applied mathematics, engineering, system science, and operations research. He has presented and participated in more than five international conferences and has served on the reviewer committees of 35 different conferences sponsored by Springer and IEEE. He obtained Rank 3 in the CEC 2017 competition on single-objective bound-constrained real-parameter numerical optimization in the Proceedings of the IEEE Congress on Evolutionary Computation, IEEE CEC 2017, San Sebastián, Spain. He also obtained Rank 3 and Rank 2 in the CEC 2018 competitions on single-objective bound-constrained real-parameter numerical optimization and on large-scale global optimization, in the Proceedings of the IEEE Congress on Evolutionary Computation, IEEE CEC 2018, Sao Paulo, Brazil. He obtained Rank 2 in the CEC 2020 competition on single-objective bound-constrained real-parameter numerical optimization in the Proceedings of the IEEE Congress on Evolutionary Computation, IEEE CEC 2020, U.K., and Rank 1 in the CEC 2021 competition on single-objective bound-constrained real-parameter numerical optimization in the Proceedings of the IEEE Congress on Evolutionary Computation, IEEE CEC 2021, Poland.