Modeling An Optimized Approach For Load Balancing in Cloud
October 1, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3024113
ABSTRACT Despite significant infrastructure improvements, cloud computing still faces numerous challenges in terms of load balancing. Several techniques have been applied in the literature to improve load balancing efficiency. Recent research has shown that load balancing techniques based on metaheuristics provide better solutions for proper scheduling and allocation of resources in the cloud. However, most of the existing approaches consider only a single or a few QoS metrics and ignore many important factors. The performance efficiency of these approaches is further enhanced by merging them with machine learning techniques. Such approaches combine the relative benefits of a load balancing algorithm with powerful machine learning models such as Support Vector Machines (SVM). In the cloud, data exists in huge volume and variety that requires extensive computation for its accessibility, and hence performance efficiency is a major concern. To address such concerns, we propose a load balancing algorithm, namely Data Files Type Formatting (DFTF), that utilizes a modified version of Cat Swarm Optimization (CSO) along with SVM. First, the proposed system classifies data in the cloud from diverse sources into various types, such as text, images, video, and audio, using one-to-many SVM classifiers. Then, the data is input to the modified load balancing algorithm CSO, which efficiently distributes the load on VMs. Simulation results compared to existing approaches showed improved performance in terms of throughput (7%), response time (8.2%), migration time (13%), energy consumption (8.5%), optimization time (9.7%), overhead time (6.2%), SLA violation (8.9%), and average execution time (9%). These results outperformed some of the existing baselines used in this research, such as CBSMKC, FSALB, PSO-BOOST, IACSO-SVM, CSO-DA, and GA-ACO.
INDEX TERMS Classification, cloud, SVM, load balancing, metaheuristics, virtual machine.
I. INTRODUCTION
Over the years, an increase in online applications has resulted in huge volumes of data accumulated daily. Generally, the data is classified into different types, such as audio, video, image, and text. Despite the significant evolution of cloud computing to handle such diverse data, it still faces numerous challenges in real-time processing and load balancing of the resources employed to process massive volumes of data.
In the past few years, several load balancing approaches have been developed for cloud computing, such as [1]–[5]. For instance, in [1], the authors applied the Bin-packing algorithm for multi-capacity Bin-packing to improve task waiting time and the degree of imbalance on cloud resources. In a similar work [2], the authors used the Bin-packing algorithm for cost-aware and fragmentation-enabled consolidation of tasks to achieve minimum energy consumption. In [3], the authors used a dynamic clustering algorithm to improve throughput and execution time. A study by [4] applied a dynamic real clustering algorithm for geographical load balancing in the cloud, resulting in better throughput and response time. In [5], the authors applied adaptive load balancing to achieve optimal resource provisioning, resulting in better resource utilization and throughput.
The associate editor coordinating the review of this manuscript and approving it for publication was Adnan Shahid.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
173208 VOLUME 8, 2020
M. Junaid et al.: Modeling an Optimized Approach for Load Balancing in Cloud
However, most of the traditional load balancing approaches suffer from high computational cost, energy consumption, several overheads, scalability, and deadline constraints.
In recent years, the research trend has shifted towards metaheuristics-based approaches for load balancing, as these techniques are better at addressing flexibility, multimodal optimization, efficient randomization, and discontinuous problems through intensification (exploitation) and diversification (exploration). The authors in [6] presented a metaheuristic approach for load balancing using modified PSO in which they minimized task overhead and maximized resource utilization over varying VMs and tasks. In [7], the authors combined ACO and PSO in a hybrid metaheuristic load balancer, ACOPS, which uses historical information to predict the future workload of the VMs in the cloud. This approach helps in reducing computational time while keeping optimum load balancing among VMs and tasks. It also helps in finding local and global best positions in the solutions with fast convergence, and hence performs better than many heuristic approaches. Similarly, in [8], the authors employed SVM with the ACO metaheuristic in cloud load balancing to achieve better throughput, SLA, migration, overhead, and optimization, but the work lacks other critical factors such as energy consumption, response time, and execution time. Similarly, most of the existing metaheuristic techniques cover either one or a few optimization parameters but ignore other critical factors that can play a pivotal role in achieving multi-factor optimization [9]. Moreover, the issues faced in the cloud due to load balancing can be further minimized by combining metaheuristics and data mining techniques to solve complex optimization problems more efficiently [10]–[13]. Nowadays, Cloud Data Mining (CDM) is gaining popularity, in which machine learning models, such as supervised machine learning, are integrated with cloud load balancing approaches, resulting in new efficient algorithms [14], [15]. Similarly, the classification of multiple file types in the cloud can achieve improved load balancing with increased accuracy due to the pre-assignment and categorization of data for virtual machines with different resources. For instance, audio classification exists in various forms, such as noise, speech, silence, and music, and can achieve performance efficiency using deep learning algorithms such as Convolutional Neural Networks (CNN) [16]. Similarly, video datasets need proper categorization and automatic classification for quick retrieval and indexing. This helps in understanding the semantic gap to minimize computational complexity. Integrated metaheuristic algorithms such as ACO, ABC, and PSO are used in several ways to attain more accuracy in video classification using classifiers such as SVM, KNN, and NN [17]–[19]. It has been observed that a huge increase in text documents is making the extraction process quite complex. Text clustering is used in text mining to categorize text documents but cannot perform text feature selection [20]. Metaheuristic algorithms and classifiers such as GA, HS, PSO, NN, and SVM are widely used for text classification in high-dimensional space, providing high accuracy [21]–[24]. Image classification involves the selection of image feature subsets from a large feature space. Selecting optimum features is a complex process in image classification for load balancing that is solved by several hybrid metaheuristic techniques such as CSO, GA, ACO, and PSO combined with SVM, NN, and K-NN [25]–[28]. Despite several advantages, the aforementioned approaches have certain deficiencies, and therefore there is still a need for multi-factor optimal solutions for load balancing.
Our proposed work focuses on the development of a new load balancing algorithm named Data Files Type Formatting (DFTF) that combines SVM (a machine learning classifier) with a modified Cat Swarm Optimization (CSO) algorithm (a scheduling algorithm). The proposed DFTF algorithm considers multi-factor QoS metrics, such as energy consumption, response time, SLA violations, migration time, optimization time, execution time, throughput time, and overhead time, as performance evaluation measures. In this work, SVM is applied using one-to-many classification to generate the data class over a set of file formats, such as audio, video, text, and images, in the cloud environment. The classification process can easily reduce such complexities by performing offline preprocessing and making the data available in processed form [8]. This refined form of data, when applied to load balancing, can significantly improve scheduling using QoS parameters [29].
The original CSO is more suitable for a small population size with a minimum number of iterations and hence does not provide good solutions when the processing involves a large number of complex tasks [30], [31]. This drawback eventually leads CSO to fall into a local optimum, which takes more iterations to explore the solution space and hence makes CSO computationally complex [32]. Therefore, we have modified the original CSO by introducing a new grouping phase that divides the data files into four groups (audio, video, image, and text) taken from SVM, keeping in view the properties associated with each group. In doing so, the population of cats in the sub-groups is sorted and, later in the stage, the best fitness value of the cat in the local best solution is selected. Thereby, integrating the two approaches, SVM and CSO, addresses their individual limitations and reinforces their combined benefits in a single model. Earlier data type approaches such as AWS and PostgreSQL focus only on data type classification but not on file format types [33]. However, the proposed approach uses file format types for classification in the cloud environment and then feeds the resultant data class into the modified CSO algorithm for load balancing. This combination has outperformed some state-of-the-art metaheuristic load balancing algorithms used in this study, such as CBS-MKC [34], FSALB [35], PSO-BOOST [36], IACSO-SVM [37], CSO-DA [38], and GA-ACO [39].
The main objective of this research is to propose a new optimized metaheuristic algorithm, DFTF, in a cloud that performs classification and load balancing effectively.
This optimization model addresses the limitations of the earlier load balancing approaches through its multi-factor approach. Further, the contributions of this paper include:
• A new algorithm, DFTF, is developed based on SVM and modified CSO that provides better-optimized load balancing in the cloud environment.
• The classification of data file formats into audio, video, image, and text is performed in a cloud environment and shows improved classification performance in confusion-matrix measures such as accuracy, precision, recall, and F-measure over state-of-the-art classifiers, helping to decrease computational complexity later in the scheduling phase.
• The proposed DFTF model has provided improved results for energy consumption, response time, SLA violations, migration time, optimization time, execution time, throughput time, and overhead time as performance evaluation measures.

The rest of this paper is organized as follows. A literature review is presented in Section 2, the proposed methodology is discussed in Section 3, the experimental setup is described in Section 4, results and analysis are discussed in Section 5, and conclusions are presented in Section 6.

II. LITERATURE REVIEW
Load balancing algorithms are classified as dynamic, static, or hybrid, depending on the machine state. They are also known as allocation and scheduling algorithms based on the features used during load balancing. Further, they are categorized as Cloud Data mining load balancing, VM load balancing, CPU-based load balancing, Task-based load balancing, Server-based load balancing, Network-based load balancing, and Standard Cloud load balancing based on their combination. Numerous studies have discussed the limitations of load balancing algorithms in order to propose more effective methods. These studies lack discussion of some essential QoS metrics, such as migration duration, migration expense, service quality breach, task failure rate, algorithmic efficiency, percentage of load balancing measures, and level of balance. A load balancing algorithm must improve responsiveness, implementation cost, implementation time, throughput, fault sensitivity, migration duration, makespan, resource throughput, and usability. At the same time, energy consumption, carbon pollution, relocation costs, energy efficiency, and SLA need more consideration [40]–[42]. It has been observed that algorithm complexity is not given much consideration, although it reduces the efficiency of load balancing. The studies also concluded that several issues remain a huge challenge in load balancing that can be addressed in the future by implementing an adequate, effective, and robust load balancing algorithm. A decline in these dimensions leads to poor QoS at Cloud Service Centers and a decreased economy for the CSP. However, keeping QoS and economics in consideration, delivering optimized multi-factor QoS-based solutions is becoming a major challenge for CSPs. Some of these challenges, which mainly revolve around our developed QoS metrics, are discussed in the sections below.
Cloud computing is making significant contributions in extracting useful information when combined with data mining techniques. This combination, called cloud data mining (CDM), has made information retrieval from a huge volume and variety of data easy with the help of load balancing approaches. Similarly, several real-time cloud data mining frameworks, algorithms, and services are available that provide information through a number of applications [43]. These real-time cloud data mining and load balancing applications include VM task classification for load balancing, feature extraction, anomaly and intrusion detection, open shop scheduling, attribute importance, spatial classification, data analysis and satellite imagery [44], spectral and statistical data analysis [45], gene expression data mining and bioinformatics [46], geo-spatial analysis and geo-informatics, large-scale mining in big data and web mining [47], machine-learning applications [48], high-dimensional data mining [49], highly diversified and dense data mining in rule mining [46], security of data in the cloud, clustering, datacenter resource optimization in the cloud, noise removal, the reactive power problem, face recognition, biomedical image processing, teaching-based learning, manufacturing design, the water resource problem, and routing optimization.
This research is mainly focused on classification, specifically the combination of a classifier with a load balancer in the cloud. So, most of the presented information revolves around classification and load balancing. In classification, data is assigned to appropriate classes using supervised machine learning techniques. Classification comprises many techniques, such as SVM, decision trees, Bayesian classifiers, NN, belief networks, and rule-based classifiers. Cloud data mining classification techniques include K-NN, Point data, NB, GA, and their hybrids such as NB with SVM, GA with SVM, ACO with SVM, ACO with NN, and GA with K-NN. These combinations help in obtaining the highest accuracy in classification and further reduce computational complexity in load balancing over a number of applications.
In [50], the authors proposed a hybrid metaheuristic algorithm called WOA-AEFS to solve the resource scheduling problem in cloud computing. This study has two scheduling approaches that consider makespan and cost. The algorithm outperformed other metaheuristic algorithms such as the original BAT and PSO but did not consider factors such as execution time and performance efficiency. The Extended BAT Algorithm (EBA) suggested by [51] modifies three benchmark functions, namely Ackley, Hyperellipsoid, and Rosenbrock, resulting in better performance in searching for optimum solutions, fitness function, and convergence rate. The algorithm outperformed other metaheuristics, such as MMBO and MBO-FS, in the same cloud, but computational complexity is a major concern. A study by [52] suggested the Gravitational Search Algorithm (GSA) for load balancing in the cloud where it proved strong convergence
over a set of iterations. The drawback of this algorithm is that it is computation-intensive, which affects its scalability. A study by [53] proposed hybrid IRRO-CSO, which is inspired by the perception of information flow in raven social behavior among members searching for food, while CSO is based on chicken behavior during the search for food. The algorithm has been validated against CEC 2017 benchmark functions, showing better performance than BAT, PSO, and CSO. However, the performance needs to be checked on large real-time datasets to see whether the execution time improves.
In [82], a modified Heterogeneous Earliest Finish Time (MHEFT) algorithm is proposed for dynamic load balancing in the cloud. The algorithm works well with a smaller number of tasks and did not consider other QoS metrics for performance evaluation. CEGA is a genetic-inspired balancing algorithm designed to meet deadline constraints while reducing the execution time of the tasks [83]. Results of CEGA have shown better performance on the same workflows, but the algorithm suffers from exponential time complexity. A variation of CSO called Improved Cat Swarm Optimization (ICSO) is proposed by [79]. Here, the first modification improves the tracing mode by changing the position and velocity equations. Similarly, the other modification makes changes in such a way that local optima are prevented. However, the algorithm is only tested on benchmark functions with fewer tasks, and the number of QoS metrics is not considered. In a study by [84], the authors developed a hybrid metaheuristic algorithm, HBMMO, for workflow scheduling in cloud computing to improve throughput in the cloud. The algorithm takes a multi-objective approach, such as quantization of execution cost, throughput, and makespan. Despite the many factors discussed in the study, energy consumption is not considered. In [85], Weighted Wavelet SVM (WW-SVM) is suggested for estimating load sequences in a data center using a cloud computing environment. For parameter selection and optimization, PSO is used to make a final prediction. The proposed algorithm outperforms other baselines in terms of execution time, throughput, and error prediction. The algorithm considers only prediction and accuracy, while simple multi-objectives are not discussed.
An approach that improves SLA violations for load balancing in the cloud, with the objective of minimum resource wastage, is discussed by [86]. The algorithm used optimum resources with a lower failure rate in fulfilling tasks while maintaining low energy usage and the least SLA violations (up to 17%). The algorithm did not consider execution time or numerous scientific evaluations. Sharma et al. proposed SLA agile-based VM to reduce response time [87]. This research used a ghost VM to reduce the VM creation time by approximately 12%. Static workloads are used, and performance metrics are not discussed. In [88], comprehensive comparisons of SLA violations in a cloud environment are performed for five metaheuristic algorithms in which QoS constraints and penalty costs are considered. Experiments on benchmark functions have shown the better performance of each metaheuristic in best, worst, average, and SD scenarios. However, multi-objective QoS metrics are not considered.
A dynamic VM migration algorithm, MMA, suitable for High-Performance Computing (HPC), is suggested by [89]. The MMA algorithm minimizes the load on overloaded machines and reduces communication costs. However, saturation can cause performance degradation and computational complexity. A study by [90] discussed a VM migration strategy that provides better scalability than other strategies. In this approach, a composite scoring function is used to find the host that has workload handling capability. At this stage, migration takes place and the load is transferred to it. However, high computation cost and overhead remain, and a multi-objective approach is lacking. An improvement in reduced overhead using a metaheuristic automatic power-aware algorithm in VANETs is presented by [91]. This research uses PSO to achieve energy-efficient communication. However, a performance degradation of 8% is observed and the computation cost is high. The Biological Inspired Self Organized Autonomous Routing Protocol (BIOSARP) is proposed by [92] for the earliest searching of a neighbor using an optimal decision by ACO. This helps in reducing overload to a significant extent, but the algorithm has a relatively higher overhead cost.
A study by [93] discussed load balancing in multi-core clusters using frequent data mining in the cloud. In their work, SDFEM is proposed, which provides high mining performance in large, complex, real-time data analytic applications. They used a hybrid approach of OpenMP and MPI and tested their implementation on 12-core shared memory nodes. The results have shown a remarkable increase in performance that is much faster and more reliable. However, this combination has some complexities, especially computational and memory complexities. In [49], the authors suggested a pattern mining load balancing technique for high-dimensional data, PaMPa-HD, based on Map-reduce. The algorithm performed well in terms of robustness and execution time due to its inherent properties of better mining patterns and the least amount of transactions. However, there are a large number of items per transaction. The authors in [94] presented the metaheuristic EELBF Firefly load balancing algorithm in the cloud, which focuses on throughput and response time. This algorithm, besides finding relational mining models, provides better energy consumption by balancing the workload across multiple VMs (considering less loaded and highly loaded VMs). The algorithm has been implemented in CloudSim 4.0 and compared with ACO, HBB, and WRRLB, and overall better performance is observed. A study by [95] presented an energy-efficient load balancing algorithm that uses the combination of the BWM and TOPSIS methodologies for a multi-objective mining approach. The selection of the most appropriate cloud scheduling solutions is performed in two steps, in which initially a decision criterion is defined, followed by BWM for assigning weights, and then TOPSIS is applied
to measure the performance of each alternative. Experiments have shown that the proposed algorithm has attained better results in terms of makespan, energy consumption, and VM utilization over other baselines. However, this study did not consider large-scale datacenters, which would demonstrate the scalability and reliability of the proposed solution over a larger number of tasks. In [96], the authors discussed a PSO-based load balancing algorithm used for resource allocation in cloud computing. The algorithm finds task initiation overload on VMs through optimized migration transfers to other VMs. As a result, the algorithm achieved reduced execution time and transfer time. However, this algorithm considers only a few tasks based on a single factor, thereby not addressing scalability issues. A study by [97] presented a new adaptive integrated approach based on best-worst decision making and the ranking method called VIKOR, which is used to define task priorities. This algorithm uses a compromise approach in which group benefits are maximized over individual losses. The algorithm provides better reliability by keeping all VMs in the process during runtime. Further, the algorithm achieves better throughput, reduced makespan, improved waiting time, more virtual machine (VM) utilization, and lower VM usage cost when compared with other baselines. However, a maximum of 1000 tasks is considered for various QoS performance metrics, which means that scalability may be an issue when tasks are significantly increased along with VMs.
It is observed from the studies presented here that no comprehensive multi-factor approach has been adopted that optimizes the QoS metrics without affecting solution quality.

III. PROPOSED METHODOLOGY
We combined SVM and CSO to build a hybrid model called DFTF with the objective of improving load balancing and performance in the cloud environment. The architecture of the proposed DFTF approach is shown in Figure 1. This architecture is divided into two main modules: ‘Data Classification based on SVM’ and ‘Load Balancing using CSO’. The input to the data classification module is the collection of diverse data in the form of video, text, audio, and images, which are stored in the cloud environment. The classification module takes the input data randomly and then performs the classification on these data using polynomial SVM. The output of this algorithm is in the form of the partitioned data class. The second module performs load balancing using Cat Swarm Optimization (CSO). The performance analysis of the proposed model is then performed to achieve efficient load balancing by considering parameters such as execution time, number of migrations, optimization time, throughput time, and overhead time. The tools used in this research are CloudSim 4.0 and the Java environment.
Algorithm 1 describes the process of the proposed model called DFTF. In this algorithm, Lines 1 to 11 perform data categorization, which first classifies the type of data, then classifies the type of VMs using SVM, and assigns it to the particular class. Lines 12 to 32 perform load balancing using CSO and then output the scheduled data.

A. DATA CLASSIFICATION BASED ON SUPPORT VECTOR MACHINE
We collected the data from different cloud sources and then preprocessed the data to transform it as per our model requirements. The data collected from the cloud comprises video, audio, text, and images. These data sets are diverse and of different sizes. In the proposed model, the SVM classifier first determines the type of data (audio, video, text, image) based on features and then classifies the data by assigning it to a particular class.
We have divided the VMs into four types of sets, namely AudioVM, VideoVM, TextVM, and ImageVM, based on input data. Each set of VMs has different processing and storage resources in a cloud environment. More precisely, each machine (VM) is assigned a task based on task requirements. For example, video tasks require 1000 floating-point operations and 16GB memory, audio tasks require 800 floating-point operations and 12GB memory, image tasks require 800 floating-point operations and 8GB memory, and textual tasks require 400 floating-point operations and 4GB memory. After that, the SVM classifier identifies the set of VM types, such as VideoVM, AudioVM, TextVM, and ImageVM, based on the requirements, size, and features of the tasks. Here the respective VM is assigned to each task. Hence, SVM intends to classify data and match it to the most suitable class type and VM type.
For video data classification, we extracted feature vectors of sequences of 40 frames from four different video classes, giving a 40 × 4096 matrix for each video in which each row holds the features of one frame (one frame per row), and we classified videos between these four classes. We preprocess a new video to limit its number of frames and then extract features from this video to classify it. Assume that we have four video classes (ci, i = 1, .., 4). Each video has 40 (n = 1, .., 40) frames, and from each frame we extracted 4096 features ([1×4096]). Since each frame has enough information to predict the video class (ci), we used 40 frames from each video as training/test samples, which creates an input matrix of [160 × 4096] dimensions, with 160 samples and 4096 features per sample. Additionally, we have created an output vector [160 × 1] that contains the label of each class ci = i, where i = 1, . . . , 4.
For audio data classification, four feature sets of audio are evaluated for identifying five kinds of audio classes: classical music, popular music, crowd noise, speech, and simple noise. The feature sets include low-level signal properties, mel-frequency spectral coefficients [98], and two new sets based on perceptual models of hearing. For image classification, we have considered 256 × 256 pixels (total 65,536 pixels). We used each pixel as a feature in the SVM classifier. For text classification, there are text documents of about 6GB, which are extracted in the form of unstructured text. We performed stemming and stop word removal and extracted the words in the form of features. We then used these features for text data classification using SVM.
SVM works on the principle of linear classification with a special type of rule that generates classes with effective performance and is based on the quality of classification. The kernel trick can be used to construct a special kind of non-linear method using SVM. Two classic kernel functions are used in SVMs: the radial basis function kernel and the polynomial kernel. Here, $u_i$ denotes a support vector, $\alpha_i$ the Lagrange multiplier, and $u_j$ the label of the membership class (+1, −1), where n = 1, 2, 3, . . . , N.
Equation (1) shows the polynomial function,
$$\mathrm{POLY}(u, v) = \left(u^{k} v + 1\right)^{s}, \qquad (1)$$
where ‘s’ is the polynomial degree.
The polynomial kernel function is used with SVMs and other kernel models, representing the similarity between features over the polynomials of the original variables. A polynomial kernel is defined as:
$$K(x, x_i) = 1 + \sum (x \cdot x_i)^{d}. \qquad (2)$$
Here, d = 1, which corresponds to the linear kernel.
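To make Equation (1) concrete, the following Python sketch evaluates a polynomial kernel for two small vectors. It assumes that the product of u and v in Equation (1) denotes the ordinary inner product, which is the common textbook form; the function name `poly_kernel` is hypothetical and is not taken from the implementation used in this work.

```python
def poly_kernel(u, v, degree):
    """Polynomial kernel (u . v + 1) ** degree for equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    inner = sum(a * b for a, b in zip(u, v))  # assumed inner product u . v
    return (inner + 1) ** degree


u = [1.0, 2.0]
v = [3.0, 4.0]
linear = poly_kernel(u, v, 1)     # degree 1: inner product plus a constant
quadratic = poly_kernel(u, v, 2)  # degree 2: squares the same quantity
print(linear, quadratic)
```

With degree 1 the kernel is the inner product shifted by one, which matches the linear case noted in the text; higher degrees let the SVM separate classes that are not linearly separable in the original feature space.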
Algorithm 1 DFTF
Input: video, text, audio, image, N, number of virtual machines (VM), number of cats, max_iterations, SMP (seeking memory pool) = 5 to 10, MR is the Mixture ratio, Sz: Size of the population, maximum_iter
Output: Data class, Scheduled data
1: for data classification do
2:   for each P(u, v) do
3:     Evaluate = SVM
4:     for each Classification accuracy ≠ 100 do
5:       Evaluate data accuracy
6:       if max_iterations ≠ N then
7:         perform data categorization and VM categorization
8:       end if
9:     end for
10:   end for
11: end for
12: for load balancing do
13:   Create N cats and divide them into four groups G, that is, G = {audio, video, text, image}.
14:   Randomly initialize velocities for each cat belonging to group G.
15:   Evaluate initial fitness function Fi
16:   Csz = Create Cpop(sz, Fi) // Create cat population
      // Distribute the cats in seeking or tracing mode
17:   while k ≤ maximum_iter do
18:     for each i = 1 to Sz do
19:       if C[i] = Seekm then
20:         Sol = Apply Seekm(Cj)
21:       else
22:         Sol = Apply Tracem(Cj)
23:       end if
24:       Fbest = Sol best
25:       if C(F, k) detected then
26:         Csz = create Cpop[sz, F]
27:       else
28:         Csz = reset Cpop[Csz]
29:       end if
30:     end for
31:   end while
32:   return F (schedule data) // return best solution
33: Exit.

The output of the classification phase is in the form of classified tasks, thus reducing the computational cost by avoiding feature learning, feature extraction, data conversion, data transformation, and data classification at the scheduling phase.

B. LOAD BALANCING USING CAT SWARM OPTIMIZATION
We developed the network of VMs in the form of an undirected weighted graph as shown in Figure 2. The VMs’ network can be represented as an undirected graph G = (V, E), where V represents the virtual machine (VM) or node and E represents the undirected edge having a probability weight that shows the overload and underload intensity between two nodes.
FIGURE 2. The network of Virtual Machines (VMs).
After the data classification, load balancing is performed using CSO. In the load balancing phase, these data are called tasks. Let us assume that:
$\mathrm{VideoVM} = \{VM_1, VM_2, \ldots, VM_n\}$,
$\mathrm{AudioVM} = \{VM_1, VM_2, \ldots, VM_n\}$,
$\mathrm{TextVM} = \{VM_1, VM_2, \ldots, VM_n\}$,
$\mathrm{ImageVM} = \{VM_1, VM_2, \ldots, VM_n\}$
be the sets of virtual machines for video, audio, text, and image, respectively. Each set of machines is responsible for executing one task. Each task is executed for a period of maximum iterations and is evaluated using computational cost in the form of time. The mapping of tasks onto virtual machines is computed using the SVM, where each machine is assigned a task based on the requirements, size, and features of the tasks.
Once the number of tasks and VMs are selected, the scheduling process is initiated. Initially, N instances are created and split into G groups. CSO considers the behavior of the cats in two modes: seeking mode and tracing mode. Swarm algorithms are widely accepted as they adapt the best-obtained solutions for searching the most similar neighbors (nodes). So, in this method, the cat behavior is considered for searching a solution space. Every cat has its own position in d dimensions, with a different velocity for each dimension. Every cat is evaluated using the fitness function; if the fitness values are not equal, the probability is computed using equation (3), and by default the probability value is set to 1. We used a Boolean flag variable to identify whether the cat is in seeking mode or tracing mode. The tracing mode is considered in terms of its fitness function, where the position of the cat is changed according to the fitness function. The fitness function of CSO can be obtained with the help of equation (3).
$$P_i = \frac{FS_i - FS_{max}}{FS_{max} - FS_{min}}, \quad \text{where } 0 < i < j, \qquad (3)$$
where Pi shows the probability, value associated with the TABLE 2. Parameter settings used for DFTF.
position of ith Cat. FS i is the fitness of ith cat, FS max rep-
resents the maximum fitness value and FS min represents the
minimum fitness value achieved so far.
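As a minimal sketch of equation (3), the function below computes the probability of every cat from a list of fitness values and falls back to 1 when all fitness values are equal, as the text describes. The function name is hypothetical. As printed, the numerator is never positive, so the values fall in [-1, 0]; the optional `absolute` flag is an added assumption that maps them to [0, 1].

```python
def selection_probabilities(fitness, absolute=False):
    """Apply equation (3) to each fitness value in the population."""
    fs_max, fs_min = max(fitness), min(fitness)
    if fs_max == fs_min:
        # All cats are equally fit: the probability defaults to 1.
        return [1.0] * len(fitness)
    spread = fs_max - fs_min
    probs = [(fs - fs_max) / spread for fs in fitness]
    # Assumption: optionally take magnitudes so the result lies in [0, 1].
    return [abs(p) for p in probs] if absolute else probs


print(selection_probabilities([2.0, 4.0, 6.0]))
print(selection_probabilities([2.0, 4.0, 6.0], absolute=True))
```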
The tracing mode of the CSO method describes the movement of cats, which is based on their outstanding hunting skills. In tracing mode, cats move according to their velocities in each dimension and then update their positions accordingly. The updated velocities and positions of the cats are calculated using equations (4) and (5).
These equations are:
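As a minimal sketch of one tracing-mode step, the code below assumes the standard CSO form: the velocity in each dimension is pulled toward the best cat by a random factor, clamped, and then added to the position. The constant `c1`, the bound `v_max`, and the function name are illustrative assumptions; the modified update used in DFTF may differ.

```python
import random


def trace_step(position, velocity, best_position, c1=2.0, v_max=1.0, rng=None):
    """One assumed tracing-mode update for a single cat."""
    rng = rng or random
    new_position, new_velocity = [], []
    for x, v, x_best in zip(position, velocity, best_position):
        r1 = rng.random()                        # uniform random factor in [0, 1)
        v_new = v + r1 * c1 * (x_best - x)       # pull velocity toward the best cat
        v_new = max(-v_max, min(v_max, v_new))   # keep velocity within the bound
        new_velocity.append(v_new)
        new_position.append(x + v_new)           # move the cat by its new velocity
    return new_position, new_velocity


pos, vel = trace_step([0.0, 0.0], [0.0, 0.0], [10.0, -10.0], rng=random.Random(0))
print(pos, vel)
```

A cat that already sits on the best position with zero velocity does not move, which is a quick sanity check for any variant of this update.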
TABLE 5. Statistics about training and test sets.

Further, the dataset files are split into training and testing sets with a ratio of 70:30, where 70% of the data files form the training set and 30% form the testing set, as shown in Table 5.

CloudSim 4.0 [100] is, compared to other simulators, widely used for conducting and implementing cloud-related research work. The simulator provides on-demand resources in virtualized form and has several advantages such as flexibility, performance, and ease of use. A data center is configured with the region, architecture, operating system, VM, memory, storage, data transfer cost, and the number of physical hardware units. In our case, we set 500 and 1000 VMs, respectively, during experiments, along with 4096 and 8192 MB of RAM and 2 TB of memory. All simulations are performed on a desktop PC running the MS Windows 10 operating system with an Intel quad-core i7 2.6 GHz processor, 12 GB of RAM, and a 1 TB HDD.

The algorithms used in this research include CBS-MKC, FSALB, PSO-BOOST, IACSO-SVM, CSO-DA, GA-ACO, and the proposed DFTF. DFTF is compared on eight metrics: energy consumption, response time, SLA violations, number of migrations, execution time, overhead time, throughput time, and optimization time. Evaluations are performed on a total of 60,000 tasks. All algorithms are implemented in CloudSim 4.0 as described in their respective research papers, with the same configuration and environmental settings to make the results reliable. Further, the results are statistically verified using the Student t-test to check their reliability and to rule out that the values occurred by chance.

research include CBS-MKC, FSALB, PSO-BOOST, IACSO-SVM, CSO-DA, and GA-ACO.

CBS-MKC used credit-based scheduling with an emphasis on task categorization but lacks a multi-factor approach. FSALB largely focused on reducing communication delays experienced by machine learning users and hence improved response time, but it also lacks a multi-factor approach. PSO-BOOST considered deadline constraints with a limited number of tasks and VMs and has shown improvements on few parameters. IACSO-SVM worked on the classification accuracy of a limited number of tasks and datasets. CSO-DA emphasized response time, number of migrations, and execution time on fewer tasks and VMs. GA-ACO improved completion time, response time, and throughput under limited resources. However, the proposed algorithm DFTF not only addresses these limitations but also adopts a multi-factor approach to solving them.

Similarly, there are deep learning approaches that produce better results than traditional algorithms, but they take more time to train on a large number of datasets. Therefore, as per the requirements of our proposed work, we have selected One-vs-Many SVM, which outperformed other classifiers in the first stage in terms of accuracy.

A. ACCURACY OF DFTF
Validation methods such as accuracy, precision, recall, and F-measure are used to check the accuracy of DFTF. Classifiers such as ACO-SVM [102], Bayes Net [103], J48 [103], and Multiclass [104] are used for comparative analysis as shown in Table 5. The results of DFTF are presented as a comparative analysis against the other algorithms, in which DFTF has shown better performance in all validation methods.

TABLE 6. Comparative analysis of DFTF with classification techniques.

The results of the classification algorithms are validated using the classification validation measures Accuracy, Precision, Recall, and F-Measure [105], reported in Table 6. The results of these classifiers range over [0, 1], with 1 indicating fully accurate classification. The closer the value is to 1, the higher the accuracy of the classifier. From Figure 3, DFTF has attained better accuracy than the other classifiers.
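The validation measures named in this section (accuracy, precision, recall, and F-measure) can be sketched as follows for one class treated as positive against all others. The function names and the small label lists are illustrative assumptions, not data from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_scores(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical labels for six files, for illustration only.
truth = ["text", "image", "video", "audio", "text", "image"]
guess = ["text", "image", "video", "text", "text", "video"]
print(accuracy(truth, guess), binary_scores(truth, guess, "text"))
```

All four measures lie in [0, 1], and a value closer to 1 indicates a better classifier, matching the interpretation given in the text.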
FIGURE 4. Performance of DFTF and Baselines on Energy Consumption on VMs (5-2000).

FIGURE 5. Performance of DFTF and Baselines on Response Time on VMs (5-2000).

where,
$T_{total}$: Total CPU time
$T_{vm_i}$: Time taken by the $i$th VM.

The optimization time, also known as convergence time, of all baselines is plotted in Figure 8. Only two algorithms, FSALB and CBS-MKC, show an exponential increase in optimization time, resulting in comparatively more unstable behavior. Algorithms such as IACSO-SVM, GA-ACO, and CSO-DA deviate slightly in optimizing the tasks because they get trapped in local optima, whereas PSO-Boost and DFTF optimize quite fast and produce better results compared with the other baselines. Overall, FSALB took the most time to optimize, at 22%, followed by CBS-MKC with 17%, IACSO-SVM with 14%, CSO-DA with 12%, GA-ACO with 12%, PSO-Boost with 12%, and only 11% optimization time taken by DFTF. Cats move on a global scale to find the global best position, which prevents them from falling into local optima, so they tend to optimize the solution quite fast.

G. EVALUATION OF DFTF ON EXECUTION TIME
Execution time is calculated using the following proposed equation:

$$E_t = \sum_{k=1}^{N} T_t(T_k) \tag{11}$$

where,
$T_t(T_k)$: Total time for executing the $k$th task

The execution time of all baselines is shown in Figure 9. Here, DFTF initially performed extremely well and took the least time, and then started to rise when the number of VMs reached 50 because DFTF initially converges slowly. No huge improvement is observed in DFTF, but overall, comparatively better performance can be seen. Algorithms like CBS-MKC and GA-ACO deviate a lot from the very start and therefore take more execution time in almost all runs. FSALB remains quite good with every increase in VMs, whereas PSO-Boost and DFTF execute quite fast and produce better results compared with the other baselines. Overall, CBS-MKC took the most execution time, at 19%, followed by GA-ACO with 17%, FSALB with 16%, CSO-DA with 14%, IACSO-SVM with 13%, PSO-Boost with 12%, and only 9% execution time taken by DFTF.

This shows that the classification method using SVM plays an effective role in shortening the task execution time of DFTF and further establishes the stronger scheduling ability of the algorithm.

H. EVALUATION OF DFTF ON THROUGHPUT TIME
Throughput time is calculated using the following proposed equation:

$$TTP = \sum_{k=1}^{N} \frac{T_k}{T_{p_k}} \tag{12}$$

where,
$T_k$: $k$th task
$T_{p_k}$: Time period for completing the $k$th task

The throughput time of all baselines is shown in Figure 10. Two algorithms, CBS-MKC and IACSO-SVM, initially took more time in providing throughput, which increased further up to 100 VMs because these algorithms could not optimize quickly. CSO-DA started better but surged after the addition of 500 VMs because more tasks add complexity to it.

However, FSALB, GA-ACO, and DFTF have shown good throughput performance. Overall, IACSO-SVM took the most throughput time, at 19%, followed by CSO-DA with 18%, CBS-MKC with 16%, PSO-Boost with 13%, FSALB and GA-ACO with 12% each, and only 10% throughput time taken by DFTF. The stronger robustness of DFTF has resulted in generating solutions in minimum throughput time.

I. EVALUATION OF DFTF ON OVERHEAD TIME
Overhead time is calculated using the following proposed equation:

$$OHT = \sum_{i=1}^{N} \left( Tot_t(T_i) - E_t(T_i) \right) \tag{13}$$
where,
$Tot_t(T_i)$: Total time required for executing the $i$th task
$E_t(T_i)$: Execution time of the $i$th task

The overhead time of all baselines is shown in Figure 11. Two algorithms, CSO-DA and PSO-BOOST, started with huge overhead time, which stabilizes at 50 VMs, but instability is again observed after 500 VMs and increases after every run. This is mainly because of their computational complexity. Overall, on average, CSO-DA took the most overhead time, at 18%, followed by CBS-MKC and PSO-BOOST with 17% each, IACSO-SVM with 15%, PSO-Boost with 13%, FSALB and GA-ACO with 13% each, and only 7% overhead time taken by DFTF.

The minimal computational complexity, fewer iterations in finding global optima, low communication cost, low overhead, and better convergence have made DFTF a better choice over other baselines.

FIGURE 10. Performance of DFTF and Baselines on Throughput Time on VMs (5-2000).

J. STATISTICAL ANALYSIS
We checked the resulting values of all parameters and found that their distribution is normal. In that case, a parametric test involving two variables is needed, because we take one baseline at a time and compare it with DFTF. In statistics, the suitable test for two variables with a normal distribution is the Student t-test. Table 7 lists the values of the mean, standard deviation (SD), p-value, and t-value. The significance level is set to p < 0.05 [106]. At this stage, we define the hypotheses as follows:
H0: DFTF and other baselines have no difference.
H1: A significant difference exists between DFTF and other baselines.
We can see that the p-values in all cases are less than the significance level of 0.05, which means that a significant difference exists between the values of DFTF and the other baselines. So, we reject the null hypothesis and accept the alternate hypothesis. Similarly, we can say that a significant difference exists in terms of energy consumption, response time, SLA violations, migration time, optimization time, execution time, throughput time, and overhead time.

K. RANKING BASELINES
Table 8 shows eight Quality of Service (QoS) metrics used in this study against seven baselines.

FIGURE 12. Comparative analysis of all baselines on various parameters.

It can be observed that certain baselines perform better in one scenario and average or worst in another, but the proposed DFTF performed best among them, followed by PSO-BOOST, GA-ACO, FSALB, IACSO-SVM, CSO-DA, and CBS-MKC, respectively. Figure 12 shows the averaged performance of all baselines in terms of energy efficiency, response time, SLA violations, migration time, and optimization time over varying tasks and VMs. It shows that, overall, DFTF has outperformed the other baselines on all five performance metrics.
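The Student t-test used in the statistical analysis can be sketched as follows. This is a minimal illustration that assumes an independent two-sample test with pooled (equal) variance; the paper does not state which variant was used, and the function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return the t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled sample variance, assuming both groups share one variance.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof


# Hypothetical measurements for DFTF and one baseline, not results from the paper.
dftf = [1.0, 2.0, 3.0, 4.0, 5.0]
baseline = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = student_t(dftf, baseline)

# 2.306 is the two-tailed 5% critical value for 8 degrees of freedom.
reject_h0 = abs(t_value) > 2.306
print(t_value, dof, reject_h0)
```

H0 is rejected only when the magnitude of the t-value exceeds the critical value for the chosen significance level, which is equivalent to the p < 0.05 criterion in the text.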
VI. CONCLUSION
File type format classification has made significant contributions to cloud computing. We have proposed a DFTF approach that achieves better results in load balancing. In the conducted study, DFTF is developed in two steps. In the first step, file type classification is done for various formats such as video, audio, text, and images in a cloud environment, resulting in an appropriate data class. In our case, we have used four data classes into which the appropriate file format falls. A total of 60,000 datasets/data files are collected from different sources and placed in the cloud for classification. Classification is performed using the SVM one-to-many classification approach, which provides the best accuracy among classifiers such as Multiclass, J48, Bayes Net, and ACO-SVM. In the second step, the resultant data class is fed into a CSO, which performs load balancing in an efficient manner. In CSO, we have introduced the grouping phase, which divides the data files into four groups: audio, video, image, and text. The offline preprocessing in the cloud for classification helps in reducing the computational complexity and increases the efficiency of load balancing. Furthermore, the validation of DFTF is established through QoS evaluation metrics in terms of energy consumption, response time, SLA violations, migration time, execution time, throughput time, overhead time, and optimization time. Due to its hybrid nature, DFTF takes the relative advantages of SVM and ICSO, which helps in achieving better performance compared with baselines such as CBS-MKC, FSALB, PSO-BOOST, IACSO-SVM, CSO-DA, and GA-ACO.

The proposed approach is a multi-factor approach that ultimately saves time, cost, and valuable resources. It also improves scalability and robustness in the cloud environment. In the future, we will perform load balancing in the cloud by considering other sensitive parameters like deadline constraints, priority-based scheduling, and task migrations using deep learning approaches.

REFERENCES
[1] M. Sheikhalishahi, R. M. Wallace, L. Grandinetti, J. L. Vazquez-Poletti, and F. Guerriero, "A multi-dimensional job scheduling," Future Gener. Comput. Syst., vol. 54, pp. 123–131, Jan. 2016.
[2] T. Carli, S. Henriot, J. Cohen, and J. Tomasik, "A packing problem approach to energy-aware load distribution in Clouds," Sustain. Comput., Inform. Syst., vol. 9, pp. 30–32, Mar. 2016.
[3] J. Zhao, K. Yang, X. Wei, Y. Ding, L. Hu, and G. Xu, "A heuristic clustering-based task deployment approach for load balancing using bayes theorem in cloud environment," IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 2, pp. 305–316, Feb. 2016.
[4] A. Nadjaran Toosi, C. Qu, M. D. de Assunção, and R. Buyya, "Renewable-aware geographical load balancing of Web applications for sustainable data centers," J. Netw. Comput. Appl., vol. 83, pp. 155–168, Apr. 2017.
[5] S. S. Patil and A. N. Gopal, "Dynamic load balancing using periodically load collection with past experience policy on linux cluster system," Amer. J. Math. Comput., vol. 2, no. 2, pp. 60–75, 2017.
[6] S. Mohanty, P. K. Patra, M. Ray, and S. Mohapatra, "A novel meta-heuristic approach for load balancing in cloud computing," Int. J. Knowl.-Based Organizations, vol. 8, no. 1, pp. 29–49, Jan. 2018.
[7] K.-M. Cho, P.-W. Tsai, C.-W. Tsai, and C.-S. Yang, "A hybrid meta-heuristic algorithm for VM scheduling with load balancing in cloud computing," Neural Comput. Appl., vol. 26, no. 6, pp. 1297–1309, Aug. 2015.
[8] M. Junaid, A. Sohail, A. Ahmed, A. Baz, I. A. Khan, and H. Alhakami, "A hybrid model for load balancing in cloud using file type formatting," IEEE Access, vol. 8, pp. 118135–118155, 2020.
[9] L. Heilig, R. Buyya, and S. Voß, "Location-aware brokering for consumers in multi-cloud computing environments," J. Netw. Comput. Appl., vol. 95, pp. 79–93, Oct. 2017.
[10] A. Kaur, B. Kaur, and D. Singh, "Comparative analysis of metaheuristics based load balancing optimization in cloud environment," in Smart and Innovative Trends in Next Generation Computing Technologies (Communications in Computer and Information Science), vol. 827. Singapore: Springer, 2018, pp. 30–46.
[11] P. Kumar and R. Kumar, "Issues and challenges of load balancing techniques in cloud computing: A survey," ACM Comput. Surv., vol. 51, no. 6, p. 120, 2019.
[12] W. Gai, C. Qu, J. Liu, and J. Zhang, "A novel hybrid meta-heuristic algorithm for optimization problems," Syst. Sci. Control Eng., vol. 6, no. 3, pp. 64–73, Sep. 2018.
[13] P. Kaur and P. D. Kaur, "Efficient and enhanced load balancing algorithms in cloud computing," Int. J. Grid Distrib. Comput., vol. 8, no. 2, pp. 9–14, Apr. 2015.
[14] C. Gomez, A. Shami, and X. Wang, "Machine learning aided scheme for load balancing in dense IoT networks," Sensors, vol. 18, no. 11, p. 3779, Nov. 2018.
[15] X. Sui, D. Liu, L. Li, H. Wang, and H. Yang, "Virtual machine scheduling strategy based on machine learning algorithms for load balancing," EURASIP J. Wireless Commun. Netw., vol. 2019, no. 1, p. 160, Dec. 2019.
[16] B. Tang, Y. Li, X. Li, L. Xu, Y. Yan, and Q. Yang, "Deep CNN framework for environmental sound classification using weighting filters," in Proc. IEEE Int. Conf. Mechatronics Autom. (ICMA), Tianjin, China, Aug. 2019, pp. 2297–2302.
[17] A. Zakaria, R. Rizal, and O. Dwi, "Particle swarm optimization and support vector machine for vehicle type classification in video stream," Int. J. Comput. Appl., vol. 182, no. 18, pp. 9–13, Sep. 2018.
[18] Y. F. Huang and S. H. Wang, "Movie genre classification using SVM with audio and video features," in Active Media Technology (Lecture Notes in Computer Science), vol. 7669, R. Huang, A. A. Ghorbani, G. Pasi, T. Yamaguchi, N. Y. Yen, and B. Jin, Eds. Berlin, Germany: Springer, 2012.
[19] K. M. Salama and A. M. Abdelbar, "Learning neural network structures with ant colony algorithms," Swarm Intell., vol. 9, no. 4, pp. 229–265, Dec. 2015.
[20] L. Jiao and L. Feng, "Text classification based on ant colony optimization," in Proc. 3rd Int. Conf. Inf. Comput., Jun. 2010, pp. 229–232.
[21] Q. Wang, R. Peng, J. Wang, Y. Xie, and Y. Zhou, "Research on text classification method of LDA-SVM based on PSO optimization," in Proc. Chin. Autom. Congr. (CAC), Hangzhou, China, Nov. 2019, pp. 1974–1978.
[22] P. Adriana, L. Veronica, P. R. Pasquale, and I. Sidhu, "A genetic algorithm for text classification rule induction," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. Berlin, Germany: Springer, 2008, pp. 188–203.
[23] H. Hasanpour, R. Ghavamizadeh Meibodi, and K. Navi, "Improving rule-based classification using harmony search," PeerJ Comput. Sci., vol. 5, p. e188, Nov. 2019.
[24] F. Yigit and O. K. Baykan, "A new feature selection method for text categorization based on information gain and particle swarm optimization," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Intell. Syst., Nov. 2014, pp. 523–529.
[25] H. Peng, C. Ying, S. Tan, B. Hu, and Z. Sun, "An improved feature selection algorithm based on ant colony optimization," IEEE Access, vol. 6, pp. 69203–69209, 2018.
[26] C. López-Franco, L. Villavicencio, N. Arana-Daniel, and A. Y. Alanis, "Image classification using PSO-SVM and an RGB-D sensor," Math. Problems Eng., vol. 2014, Jul. 2014, Art. no. 695910.
[27] C. Sukawattanavijit, J. Chen, and H. Zhang, "GA-SVM algorithm for improving land-cover classification using SAR and optical remote sensing data," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 3, pp. 284–288, Mar. 2017.
[28] V. Pallavi and V. Vaithiyanathan, "Combined artificial neural network and genetic algorithm for cloud classification," Int. J. Eng. Res. Technol., vol. 5, pp. 787–794, Apr. 2013.
[29] Data Preprocessing for Machine Learning: Options and Recommendations. Accessed: May 17, 2020. [Online]. Available: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt2
[30] S. C. Chu, P. W. Tsai, and J. S. Pan, "Cat swarm optimization," in PRICAI 2006: Trends in Artificial Intelligence (Lecture Notes in Computer Science), vol. 4099, Q. Yang and G. Webb, Eds. Berlin, Germany: Springer, 2006.
[31] A. M. Ahmed, T. A. Rashid, and S. A. M. Saeed, "Cat swarm optimization algorithm: A survey and performance evaluation," Comput. Intell. Neurosci., vol. 2020, Jan. 2020, Art. no. 4854895.
[32] C. E. Klein, L. dos Santos Coelho, Â. M. O. Sant'Anna, R. Z. Freire, and V. C. Mariani, "Improved cat swarm optimization approach applied to reliability-redundancy problem," in Proc. 22nd Eur. Symp. Artif. Neural Netw. (ESANN), Bruges, Belgium, Apr. 2014, pp. 1–6.
[33] (Jan. 1, 2016). Using a PostgreSQL Database as a Source for AWS DMS. Accessed: May 9, 2020. [Online]. Available: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
[34] S. Vrajesh and M. Bala, "An improved task allocation strategy in cloud using modified K-means clustering technique," Egyptian Inform. J., vol. 4, pp. 1–8, 2020.
[35] K. Sekaran, M. S. Khan, R. Patan, A. H. Gandomi, P. V. Krishna, and S. Kallam, "Improving the response time of M-Learning and cloud computing environments using a dominant firefly approach," IEEE Access, vol. 7, pp. 30203–30212, 2019.
[36] M. Kumar and S. C. Sharma, "PSO-based novel resource scheduling technique to improve QoS parameters in cloud computing," Neural Comput. Appl., vol. 194, pp. 1–24, Jun. 2019.
[37] S. Rongali and R. Yalavarthi, "An improved ant colony optimization for parameter optimization using support vector machine," Int. J. Eng. Adv. Technol. (IJEAT), vol. 6, no. 3, pp. 198–204, 2017.
[38] A. Pourghaffari and M. Barar, "Workflow scheduling in cloud computing environment using hybrid CSO-DA," Int. J. Nonlinear Anal. Appl., vol. 10, no. 2, pp. 177–188, 2019.
[39] A. M. Senthil Kumar and M. Venkatesan, "Multi-objective task scheduling using hybrid genetic-ant colony optimization algorithm in cloud environment," Wireless Pers. Commun., vol. 107, no. 4, pp. 1835–1848, Aug. 2019.
[40] A. Thakur and M. S. Goraya, "A taxonomic survey on load balancing in cloud," J. Netw. Comput. Appl., vol. 98, pp. 43–57, Nov. 2017.
[41] S. Kumar Mishra, S. Bibhudatta, and P. P. Parida, "Load balancing in cloud computing: A big picture," J. King Saud Univ.-Comput. Inf. Sci., vol. 32, no. 2, pp. 149–158, 2020.
[42] R. Shaikh and M. Sasikumar, "Data classification for achieving security in cloud computing," Procedia Comput. Sci., vol. 45, pp. 493–498, 2015.
[43] H. B. Barua and K. C. Mondal, "A comprehensive survey on cloud data mining (CDM) frameworks and algorithms," ACM Comput. Surv., vol. 52, no. 5, pp. 1–62, Oct. 2019.
[44] H. Song and J. G. Lee, "RP-DBSCAN: A superfast parallel DBSCAN algorithm based on random partitioning," in Proc. Int. Conf. Manage. Data, Houston, TX, 2018, pp. 1173–1187.
[45] R. Jin, C. Kou, R. Liu, and Y. Li, "Efficient parallel spectral clustering algorithm design for large data sets under cloud computing environment," J. Cloud Comput., Adv., Syst. Appl., vol. 2, no. 1, p. 18, 2013.
[46] J. Chen, K. Li, Z. Tang, K. Bilal, S. Yu, C. Weng, and K. Li, "A parallel random forest algorithm for big data in a spark cloud computing environment," IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 4, pp. 919–933, Apr. 2017.
[47] F. Ozgur Catak and M. Erdal Balaban, "CloudSVM: Training an SVM classifier in cloud computing systems," in Proc. Joint Int. Conf. Pervasive Comput. Netw. World. Berlin, Germany: Springer, 2012, pp. 57–68.
[48] B. Apexa Kamdar and M. Jay Jagani, "A survey: Classification of huge cloud datasets with efficient map-reduce policy," Int. J. Eng. Trends Technol. (IJETT), vol. 18, no. 2, pp. 103–107, 2014.
[49] D. Apiletti, E. Baralis, T. Cerquitelli, P. Garza, P. Michiardi, and F. Pulvirenti, "PaMPa-HD: A parallel MapReduce-based frequent pattern miner for high-dimensional data," in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, Nov. 2015, pp. 839–846.
[50] I. Strumberger, N. Bacanin, M. Tuba, and E. Tuba, "Resource scheduling in cloud computing based on a hybridized whale optimization algorithm," Appl. Sci., vol. 9, no. 22, p. 4893, Nov. 2019.
[51] D. Pebrianti, A. Nurnajmin, B. Luhur, A. Nor Rul Hasma, Z. Zainah, and I. Riyanto, "Extended bat algorithm (EBA) as an improved searching optimization algorithm," in Proc. 10th Nat. Tech. Seminar Underwater Syst. Technol. (NUSYS). Singapore: Springer, 2018.
[52] D. Chaudhary and B. Kumar, "Cloudy GSA for load scheduling in cloud computing," Appl. Soft Comput., vol. 71, pp. 861–871, Oct. 2018.
[53] S. Torabi and F. Safi-Esfahani, "A dynamic task scheduling framework based on chicken swarm and improved raven roosting optimization methods in cloud computing," J. Supercomput., vol. 74, no. 6, pp. 2581–2626, Jun. 2018.
[54] A. Al-Hamodi, S. Lu, and Y. Al-Salhi, "An enhanced frequent pattern growth based on MapReduce for mining association rules," Int. J. Data Mining Knowl. Manage. Process, vol. 6, no. 2, pp. 19–28, 2016.
[55] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Syst. J., vol. 12, no. 2, pp. 1688–1699, Jun. 2018.
[56] D. Soni, A. Mishra, and H. Gupta, "An efficient cloud data mining (CDM) algorithm for frequent pattern mining in cloud computing environment," Lect. Notes Softw. Eng., vol. 4, no. 3, pp. 234–237, 2016.
[57] M. S. Sudheer and M. Vamsi Krishna, "Dynamic PSO for task scheduling optimization in cloud computing," Int. J. Recent Technol. Eng., vol. 8, no. 2S11, pp. 3559–3589, 2019.
[58] K. Mangayarkkarasi and M. Chidambaram, "An intelligent service recommendation model for service usage pattern discovery in secure cloud computing environment," J. Theor. Appl. Inf. Technol., vol. 95, no. 12, pp. 3500–3512, 2017.
[59] A. Bouzidi, M. E. Riffi, and M. Barkatou, "Cat swarm optimization for solving the open shop scheduling problem," J. Ind. Eng. Int., vol. 15, no. 2, pp. 367–378, Jun. 2019.
[60] M. Kumar and S. C. Sharma, "Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing," in Proc. 7th Int. Conf. Adv. Comput. Commun. (ICACC), Cochin, India, Aug. 2017, pp. 322–329.
[61] L. Zhou and X. Wang, "Research of the FP-growth algorithm based on cloud environments," J. Software, vol. 9, no. 3, pp. 676–683, 2014.
[62] A. K. Maurya and A. K. Tripathi, "Deadline-constrained algorithms for scheduling of bag-of-tasks and workflows in cloud computing environments," in Proc. 2nd Int. Conf. High Perform. Compilation, Comput. Commun. (HP3C), Hong Kong, Mar. 2018, pp. 6–10.
[63] R. Rautray and R. C. Balabantaray, "Cat swarm optimization based evolutionary framework for multi document summarization," Phys. A, Stat. Mech. Appl., vol. 477, pp. 174–186, Jul. 2017.
[64] M. Meyer, J. Beutel, and L. Thiele, "Unsupervised feature learning for audio analysis," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Workshop Track, Toulon, France, 2017, pp. 1–4.
[65] G. Danlami, A. S. Ismail, A. Zainal, Z. Zakaria, A. Abraham, and N. M. Dankolo, "Cloud customers service selection scheme based on improved conventional cat swarm optimization," Neural Comput. Appl., vol. 6, pp. 1–22, 2020.
[66] P. Bajare, M. Bhoyate, Y. Bhujbal, E. Monika, and V. Shinde, "k-nearest neighbor classification over encrypted cloud data," IOSR J. Comput. Eng. (IOSR-JCE), pp. 45–48, 2015.
[67] D. Gabi, A. S. Ismail, A. Zainal, Z. Zakaria, and A. Abraham, "Orthogonal taguchi-based cat algorithm for solving task scheduling problem in cloud computing," Neural Comput. Appl., vol. 30, no. 6, pp. 1845–1863, Sep. 2018.
[68] K. Liu and J. Boehm, "Classification of big point cloud data using cloud computing," ISPRS-Int. Arch. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 40, no. 3, p. 553, 2015.
[69] L. Zuo, L. Shu, S. Dong, C. Zhu, and T. Hara, "A multi-objective optimization scheduling method based on the ant colony algorithm in cloud computing," IEEE Access, vol. 3, pp. 2687–2699, 2015.
[70] K.-C. Lin, K.-Y. Zhang, Y.-H. Huang, J. C. Hung, and N. Yen, "Feature selection based on an improved cat swarm optimization algorithm for big data classification," J. Supercomput., vol. 72, no. 8, pp. 3210–3221, Aug. 2016.
[71] R. Latif, H. Abbas, S. Latif, and A. Masood, "EVFDT: An enhanced very fast decision tree algorithm for detecting distributed denial of service attack in cloud-assisted wireless body area network," Mobile Inf. Syst., vol. 2015, pp. 1–13, 2015, Art. no. 260594.
[72] D. Gabi, A. S. Ismail, and N. M. Dankolo, "Minimized makespan based improved cat swarm optimization for efficient task scheduling in cloud datacenter," in Proc. 3rd High Perform. Comput. Cluster Technol. Conf. (HPCCT), New York, NY, USA, Jun. 2019, pp. 16–20.
[73] C. Bae, N. Wahid, Y. Y. Chung, and W. C. Yeh, "Effective audio classification algorithm swarm-based optimization," Int. J. Innov. Comput., Inf. Control, vol. 10, no. 1, pp. 151–167, 2014.
[74] B. Panchal and R. K. Kapoor, "Performance enhancement of cloud computing with clustering," Int. J. Eng. Adv. Technol., vol. 14, no. 6, pp. 37–40, 2014.
[75] D. Gabi, A. S. Ismail, A. Zainal, Z. Zakaria, and A. Al-Khasawneh, "Hybrid cat swarm optimization and simulated annealing for dynamic task scheduling on cloud computing environment," J. Inf. Commun. Technol., vol. 17, no. 3, pp. 435–467, Jun. 2018.
[76] D. Danilo, G. Fenu, M. Marras, and D. R. Recupero, "Bridging learning analytics and cognitive computing for big data classification in micro-learning video collections," Comput. Hum. Behav., vol. 92, pp. 468–477, 2019.
[83] J. Meena, M. Kumar, and M. Vardhan, "Cost effective genetic algorithm for workflow scheduling in cloud under deadline constraint," IEEE Access, vol. 4, pp. 5065–5082, 2016.
[84] A. Nazia and D. Huifang, "A hybrid metaheuristic for multi-objective scientific workflow scheduling in a cloud environment," Appl. Sci., vol. 8, no. 4, p. 538, 2018.
[85] W. Zhong, Y. Zhuang, J. Sun, and J. Gu, "A load prediction model for cloud computing using PSO-based weighted wavelet support vector machine," Int. J. Speech Technol., vol. 48, no. 11, pp. 4072–4083, Nov. 2018.
[86] M. Ashouraei, S. N. Khezr, R. Benlamri, and N. J. Navimipour, "A new SLA-aware load balancing method in the cloud using an improved parallel task scheduling algorithm," in Proc. IEEE 6th Int. Conf. Future Internet Things Cloud (FiCloud), Barcelona, Spain, Aug. 2018, pp. 71–76.
[87] N. Sharma and S. Maurya, "SLA-based agile VM management in cloud & datacenter," in Proc. Int. Conf. Mach. Learn., Big Data, Cloud Parallel Comput. (COMITCon), Faridabad, India, Feb. 2019, pp. 252–257.
[88] A. Kumar and S. Bawa, "A comparative review of meta-heuristic approaches to optimize the SLA violation costs for dynamic execution of cloud services," Soft Comput., vol. 24, no. 6, pp. 3909–3922, Mar. 2020.
[89] X. Song, Y. Ma, and D. Teng, "A load balancing scheme using federate migration based on virtual machines for cloud simulations," Math. Problems Eng., vol. 2015, Mar. 2015, Art. no. 506432.
[90] J. Rouzaud-Cornabas, "A distributed and collaborative dynamic load balancer for virtual machine," in Proc. Eur. Conf. Parallel Process. (ECPP), Ischia, Italy, 2010, pp. 641–648.
[91] T. Jamal and A. Enrique, "Metaheuristics for energy-efficient data routing in vehicular networks," Int. J. Metaheuristics, vol. 4, no. 1, pp. 27–56, 2015.
[92] K. Saleem and N. Fisal, "Enhanced ant colony algorithm for self-optimized data assured routing in wireless sensor networks," in Proc. 18th IEEE Int. Conf. Netw. (ICON), Dec. 2012, pp. 422–427.
[93] L. Vu and G. Alaghband, "A load balancing parallel method for frequent pattern mining on multi-core cluster," in Proc. Symp. High Perform. Comput., San Diego, CA, USA, 2015, pp. 49–58.
[94] N. Susila, "An efficient load balancing approach for energy aware cloud environment," Ph.D. dissertation, Dept. Inf. Commun. Eng., Anna Univ., Chennai, India, 2017.
[95] R. Khorsand and M. Ramezanpour, "An energy-efficient task-scheduling algorithm based on a multi-criteria decision-making method in cloud computing," Int. J. Commun. Syst., vol. 33, no. 9, p. e4379, 2020.
[96] R. M. Alguliyev, Y. N. Imamverdiyev, and F. J. Abdullayeva, "PSO-based load balancing method in cloud computing," Autom. Control Comput. Sci., vol. 53, no. 1, pp. 45–55, Jan. 2019.
[97] E. Rafieyan, R. Khorsand, and M. Ramezanpour, "An adaptive scheduling approach based on integrated best-worst and VIKOR for cloud computing," Comput. Ind. Eng., vol. 140, Feb. 2020, Art. no. 106272.
[98] I. Mierswa and K. Morik, "Automatic feature extraction for classifying audio data," Mach. Learn., vol. 58, nos. 2–3, pp. 127–149, Feb. 2005.
[99] (2010). UCI Machine Learning Repository. Accessed: May 20, 2020. [Online]. Available: http://archive.ics.uci.edu/ml
[100] R. N. Calheiros, R. Ranjan, C. A. F. De Rose, and R. Buyya, "CloudSim: A novel framework for modeling and simulation of cloud computing infrastructures and services," 2009, arXiv:0903.2525. [Online]. Avail-
[77] C. Sauvanaud, G. Silvestre, M. Kaaniche, and K. Kanoun, ‘‘Data able: https://arxiv.org/abs/0903.2525
stream clustering for online anomaly detection in cloud applications,’’ [101] D. Kai-Bo, C. R. Jagath, and M. N. Nguyen, ‘‘One-versus-one and one-
in Proc. 11th Eur. Dependable Comput. Conf. (EDCC), Sep. 2015, versus-all multiclass SVM-RFE for gene selection in cancer classifica-
pp. 120–131. tion,’’ in Proc. Eur. Conf. Evol. Comput., Mach. Learn. Data Mining
[78] L. Tu and Y. Chen, ‘‘Stream data clustering based on grid density and Bioinf. Berlin, Germany: Springer, 2007, pp. 47–56.
attraction,’’ ACM Trans. Knowl. Discovery Data, vol. 3, no. 3, pp. 1–27, [102] R. Karthika and P. Visalakshi, ‘‘A hybrid ACO based feature selection
Jul. 2009. method for email spam classification,’’ WSEAS Trans. Comput., vol. 14,
[79] Y. Kumar and P. K. Singh, ‘‘Improved cat swarm optimization algo- pp. 171–177, 2015.
rithm for solving global optimization problems and its application to [103] B. Çiğşar, D. Ünal, ‘‘Comparison of data mining classification algo-
clustering,’’ Int. J. Speech Technol., vol. 48, no. 9, pp. 2681–2697, rithms determining the default risk,’’ Sci. Program., vol. 2019, Feb. 2019,
Sep. 2018. Art. no. 8706505.
[80] P. Bisht and K. Singh, ‘‘Big data mining: Analysis of genetic K-means [104] P. Kanu, J. Vala, and P. Jaymit, ‘‘Comparison of various classification
algorithm for big data clustering,’’ Int. J. Adv. Res. Comput. Sci. Softw. algorithms on iris datasets using WEKA,’’ Int. J. Advance Eng. Res.
Eng., vol. 6, no. 7, pp. 223–228, 2016. Develop., vol. 1, pp. 1–7, Feb. 2014.
[81] J. Zgraja and M. Woniak, ‘‘Drifted data stream clustering based on [105] B. Desgraupes, ‘‘Clustering indices,’’ Ouest-Lab Modal’X, Univ. Paris,
ClusTree algorithm,’’ in Proc. Int. Conf. Hybrid Artif. Intell. Syst. Cham, Paris, France, Tech. Rep., 2013, pp. 1–34.
Switzerland: Springer, 2018, pp. 338–349. [106] M. Ghobaei-Arani, A. A. Rahmanian, A. Rahmanian, A. Souri, and
[82] K. Dubey, M. Kumar, and S. C. Sharma, ‘‘Modified HEFT algorithm for A. M. Rahmani, ‘‘A moth-flame optimization algorithm for Web service
task scheduling in cloud environment,’’ Procedia Comput. Sci., vol. 125, composition in cloud computing: Simulation and verification,’’ Softw.,
pp. 725–732, 2018. Pract. Exper., vol. 48, no. 10, pp. 1865–1892, 2018.
MUHAMMAD JUNAID is currently pursuing the Ph.D. degree with Iqra University, Islamabad, Pakistan. His research interests include cloud computing, blended learning, machine learning, swarm intelligence, and information security.

OSMAN KHALID received the master’s degree from the Center for Advanced Studies in Engineering and the Ph.D. degree from North Dakota State University, USA. He is currently an Assistant Professor with COMSATS University Islamabad, Abbottabad. His research interests include recommender systems, network routing protocols, the Internet of Things, and fog computing.

RAO NAVEED BIN RAIS received the M.S. and Ph.D. degrees in computer engineering (networks and distributed systems) from the University of Nice Sophia Antipolis, France, in 2007 and 2011, respectively. He has experience of more than 15 years in teaching, research, and industrial development. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, College of Engineering and Information Technology, Ajman University, United Arab Emirates. His research interests include network protocols and architectures, information-centric and software-defined networks, network virtualization, machine learning, internet naming, and addressing issues.

SYED SAJID HUSSAIN received the Master of Science degree in computer science from COMSATS University Islamabad (CUI), Pakistan, in 2007, and the Ph.D. degree from the FernUniversität in Hagen, Germany, in 2013. He is currently an Assistant Professor with CUI, Abbottabad. His research interests include collaborative computing, human-computer interaction, and distributed systems.