Cost Effective Genetic Algorithm For Workflow Scheduling in Cloud Under Deadline Constraint

This paper presents a Cost Effective Genetic Algorithm (CEGA) for workflow scheduling in cloud computing, focusing on minimizing execution costs while adhering to deadline constraints. The proposed algorithm addresses challenges such as virtual machine performance variation and acquisition delays, which are often overlooked in existing research. Performance evaluations demonstrate that CEGA outperforms current state-of-the-art algorithms on various scientific workflows.


Received May 31, 2016, accepted June 28, 2016, date of publication August 11, 2016, date of current version September 28, 2016.


Digital Object Identifier 10.1109/ACCESS.2016.2593903

Cost Effective Genetic Algorithm for Workflow Scheduling in Cloud Under Deadline Constraint
JASRAJ MEENA, MALAY KUMAR, AND MANU VARDHAN
Department of Computer Science and Engineering, National Institute of Technology, Raipur 492010, India
Corresponding author: J. Meena ([email protected])

ABSTRACT Cloud computing is becoming an increasingly admired paradigm that delivers high-performance computing resources over the Internet to solve large-scale scientific problems, but it still has various challenges that need to be addressed to execute scientific workflows. Existing research has mainly focused on minimizing finishing time (makespan) or cost while meeting quality of service requirements. However, most of it does not consider essential characteristics of the cloud and major issues, such as virtual machine (VM) performance variation and acquisition delay. In this paper, we propose a meta-heuristic cost effective genetic algorithm that minimizes the execution cost of the workflow while meeting the deadline in a cloud computing environment. We develop novel schemes for encoding, population initialization, crossover, and mutation operators of the genetic algorithm. Our proposal considers all the essential characteristics of the cloud as well as VM performance variation and acquisition delay. Performance evaluation on some well-known scientific workflows of different sizes, such as Montage, LIGO, CyberShake, and Epigenomics, shows that our proposed algorithm performs better than the current state-of-the-art algorithms.

INDEX TERMS Cloud computing, scientific workflows, resource provisioning, scheduling, quality of
service (QoS).

I. INTRODUCTION
Workflows have been widely used to model large-scale scientific and engineering applications in several domains such as Earth Science, Astronomy, Physics, and Bio-informatics [1], [2]. A workflow is represented as a Directed Acyclic Graph (DAG) that has nodes and edges. The nodes represent the computational tasks and the edges represent the data/control dependencies of large-scale scientific applications [2]. The size of the workflow may be small or vast depending on the type of scientific application. To run these workflows of varying size, scientists need a high-performance computing environment such as cluster computing, grid computing, or, most recently, cloud computing. Many existing projects are designed to execute large scientific workflow applications on grid computing, for example Pegasus [3], GrADS [4], and ASKALON [5].

Nowadays, cloud computing is becoming an increasingly admired paradigm that delivers high-performance computing resources over the internet to solve large-scale scientific applications. So, a novel approach needs to be developed that uses the opportunities and challenges of the cloud and gives a cost-effective schedule. Further, the cloud provides computing as a utility service, and its services are categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [6], [7]. An IaaS cloud provides hardware resources in the form of virtual resources such as computation, memory, storage, and networking. A PaaS cloud offers an environment for users to develop and deploy their applications, and a SaaS cloud provides web applications/software over the internet, running on cloud infrastructure. PaaS and SaaS are not feasible for large-scale scientific workflow applications because they provide only an environment to design, develop and test web-based applications [8]. This proposal focuses on IaaS clouds, which provide several cost and performance benefits compared to cluster and grid computing. First, an IaaS cloud provides on-demand resource provisioning, and these resources are controlled by service consumers.

Second, a cloud service provider allows the end user to procure and release computing resources as per their demands.

2169-3536 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 4, 2016
J. Meena et al.: CEGA for Workflow Scheduling in Cloud

Therefore, scientific applications can grow or shrink their resource pool as per the needs of their applications. The cloud allocates only the required computing resources from the resource pool, so that overall resource utilization can be increased while reducing the total execution cost [6]. Third, cloud service providers charge service consumers using a pay-per-use price model, in which they pay only for the computing resources they have used. Most service providers charge the user for the whole time interval even if only a fraction of the last time interval is used. For example, if the user uses computing resources for 61 minutes and the time interval is 60 minutes, then the user has to pay for two time intervals. In this research, we use a pricing model similar to that provided by Amazon [9], Google [10] and CloudSigma [11].

Although cloud computing has various benefits, it still has some issues that need to be addressed. First, the performance of VMs can vary due to virtualization of hardware resources, the multi-tenancy of cloud infrastructure and heterogeneous computing resources. As per the report of [12], the overall CPU performance of a VM can vary by up to 24 percent in the Amazon public cloud. In addition, [13] reported that in a typical cloud environment the performance variation can be up to 30% in execution time and up to 65% in data transfer time. Due to this variation in the performance of VMs, the overall deadline of a scientific workflow can be missed or its budget can be exceeded. So, performance variations have a significant impact on the scheduling of workflows in a cloud computing environment. Our proposed algorithm considers VM performance variation to meet the user-defined deadline constraint. Second, when a VM is leased, it takes time to initialize properly (acquisition delay) and, similarly, whenever computing resources are released, they take time to shut down (termination delay). A longer acquisition time increases the total execution time, and a longer shutdown time increases the overall cost of the workflow. Thus, these delays affect the overall performance and cost of scientific workflows. In our proposed work, we assume an average acquisition delay for each type of VM, which accounts for the overall delay in the execution of workflows. We do not focus on termination delay because it does not affect the deadline constraint. Third, if any computing resource such as a VM fails during the execution of the workflow, the total execution time will increase. But the cloud environment provides 99.9 percent availability [14] of computing resources, so performance variation is a more serious problem than VM failure. So, we do not focus on this issue for the execution of scientific workflows; our primary focus is to make the cloud computing environment cost effective.

The above discussion on the benefits and issues of the cloud dictates the development of a novel cost-effective workflow scheduling algorithm for large-scale scientific workflows. So, we propose a Cost Effective Genetic Algorithm (CEGA) to schedule each task of the workflow to a suitable resource.

Further, workflow scheduling in the cloud has two main steps. In the first step, the set of computing resources is selected from the cloud to execute the computational tasks, and the selected resources are provisioned. In the second step, a schedule is generated and each task is mapped to a suitable resource, so that the overall execution cost is minimized while meeting the deadline constraint. Furthermore, most of the previous work on workflow scheduling in cluster and grid environments focuses mainly on the resource planning phase, that is, the scheduling of tasks on suitable resources. The reason is that cluster and grid computing provide only a static pool of computing resources whose configuration is known in advance. Most researchers in the grid computing area focused on the minimization of execution time (makespan), while cloud researchers focus, besides execution time, on execution cost, energy consumption, a high degree of security, and fault-tolerant workflow scheduling while satisfying the user's QoS requirements.

Our research work is based on the meta-heuristic Genetic Algorithms (GAs). This model was introduced by John Holland in 1975 [15] and is inspired by evolution. It encodes a potential solution to a particular problem on a single chromosome and applies genetic operators like crossover and mutation to improve that solution. A GA is often viewed as a function optimizer, and it has been applied to a broad range of problems such as pattern recognition, image processing, and data mining, among others. In this paper, we propose a Cost Effective Genetic Algorithm (CEGA) workflow scheduling algorithm to schedule each task of the workflow to the cheapest resource in the cloud. CEGA considers all characteristics of the cloud, such as on-demand resource provisioning, elasticity and the pay-per-use price model, and it also addresses issues such as VM performance variation and acquisition delay. To achieve this, novel schemes are presented for encoding, population initialization, and the genetic operators. Our proposal not only minimizes the overall execution cost of the workflow but also improves adherence to the hard deadline constraint in the cloud computing environment.

The remainder of this paper is organized as follows. Section 2 discusses related work, followed by the architecture of workflow scheduling, including the application model and cloud resource model, in Section 3. The problem is defined in Section 4. The proposed meta-heuristic CEGA scheduling algorithm is explained in Section 5. Section 6 presents the performance evaluation, and Section 7 concludes this research work and discusses future work.

II. RELATED WORK
Multitask workflow scheduling on parallel and distributed systems has been studied extensively over the years, and it is an NP-hard problem [16], [17]. Therefore, it is very difficult to produce an optimal solution within polynomial time. However, various heuristic and meta-heuristic algorithms have been proposed that provide approximate or near-optimal solutions for parallel and distributed paradigms such as grid and cluster computing [18]–[25]. Yet very few of the


FIGURE 1. Cost aware workflow scheduling in cloud under QoS constraint.

proposals have focused on cloud computing to meet the QoS requirements of the user. A detailed review of cost-aware challenges for workflow scheduling is presented in [26] and [27], which also discuss several scheduling algorithms related to the minimization of execution cost under QoS constraints. In this section, we give a brief literature survey on cost-aware workflow scheduling with QoS constraints in the cloud computing environment.

We classify cost-aware workflow scheduling algorithms in the cloud under QoS constraints into two major categories, heuristic-based and meta-heuristic-based scheduling algorithms, as shown in Fig. 1. The heuristic-based algorithms are further classified into two subcategories: the first is based on minimizing the execution cost of the workflow on an unlimited number of computing resources, and the second is based on minimizing the number of computing resources required to execute the workflow. Furthermore, the minimization of cost can be categorised based on executing a single workflow or multiple workflows (workflow ensembles).

Among heuristic-based workflow scheduling algorithms that execute multiple workflows, Mao and Humphrey initially proposed a Scaling-Consolidation-Scheduling (SCS) algorithm to execute workflow ensembles on the cloud [28]. Their proposal is based on instance consolidation and bundling of tasks. They focused on heterogeneous types of VMs and on major issues such as performance variation and acquisition delay of VMs. However, their proposal does not consider the data transfer time between tasks. This time is significant, and it affects the total execution cost and makespan (finish time) of scientific workflows, especially for data-intensive applications.

Another work on executing multiple workflows is presented in [29]. The authors proposed three algorithms to execute multiple workflows in the cloud: two dynamic algorithms, DPDS (Dynamic Provisioning Dynamic Scheduling) and WA-DPDS (Workflow-Aware DPDS), and one static algorithm, SPSS (Static Provisioning Static Scheduling) [29]. Their proposal tries to maximize the number of workflows executed while satisfying the QoS constraints (such as deadline and budget), and it also addresses the issues considered by Mao and Humphrey. The algorithms presented in [28] and [29] were designed for workflow ensembles, not for single workflow instances.

Further, heuristic-based workflow scheduling algorithms to execute a single workflow instance in a cloud computing environment are presented in [30]–[33]. The works in [30]–[32] are based on the partial critical paths (PCPs) of the workflow. In [30], the authors proposed the IaaS Cloud Partial Critical Path (IC-PCP) static algorithm, which first estimates the latest finish time for each task and then determines each partial critical path (PCP) associated with the exit node of the workflow. It then assigns each task of the partial critical path to the cheapest VM instance that can execute it before its latest finish time. If it does not find any active VM instance that meets the finish time constraint of the tasks, it procures a new cheapest VM instance from the cloud that completes all the tasks before their latest finish times, and the PCP is assigned to it.


This process is repeated until all the tasks of the workflow are scheduled.

In [31], the authors try to meet the soft deadline of the workflow by replicating tasks. They proposed the Enhanced IaaS Cloud Partial Critical Path (EIPR) workflow scheduling algorithm, which uses the idle time of provisioned resources and the budget surplus to replicate tasks. Their proposal considers VM performance variation and a variable budget to meet the soft deadline of the workflow. In addition, further algorithms to execute a single workflow, which are robust and fault tolerant in nature, are presented in [32]. The works in [30]–[32] focus on all the characteristics of the cloud. However, [30] does not consider the issues of the cloud (VM performance variation and acquisition delay), whereas the work proposed in [32] resolves these major issues. The performance of the proposal in [32] is estimated based on the priority order of these policies: a) Robust-Cost-Time (RCT), b) Robust-Time-Cost (RTC), and c) Weighted policies that allow users to define their own function. We compare our proposed work with the robust scheduling algorithm only for the parameters that cater for performance variation and the acquisition delays of VMs. Furthermore, we have selected the RCT and RTC resource selection policies as our baseline algorithms.

Another recent heuristic algorithm to execute a single workflow in clouds is presented in [33]. The authors proposed a dynamic cost-effective workflow scheduling algorithm, Just in Time (JIT-C). It first ascertains whether the user-defined deadline is achievable; if it is not, the user is prompted for the deadline. Otherwise, a feasible solution is determined just before the tasks are ready to execute. The JIT-C workflow scheduling algorithm considers all the characteristics of the cloud and issues such as performance variation and acquisition delay.

Another category of heuristic-based workflow scheduling algorithms in the cloud, which try to minimize the number of computing resources required to execute a single workflow while meeting the QoS constraints, is presented in [34] and [35]. The authors of [34] propose the Partitioned Balanced Time Scheduling (PBTS) algorithm, which estimates the minimum number of computing resources required to execute the scientific workflow under a deadline constraint. Their proposal is an improvement over the BTS (Balanced Time Scheduling) algorithm [36]. The BTS algorithm is mainly for grid environments and estimates the minimum number of computing resources required to execute a scientific workflow within a user-specified finish time. The BTS algorithm [36] is based on the "simple idea that a tasks can be delayed as long as its time constraint is satisfied". PBTS is mainly for cloud computing environments, and it does not consider heterogeneous types of computing resources in the cloud; it assumes only VMs of the same type.

The paper [35] addresses resource instance and hour minimization for deadline-constrained workflows. Initially, the authors define lower and upper bounds for the number of VM instances needed to guarantee the satisfaction of the workflow's end-to-end deadline. They then developed a heuristic, minimal slack time and minimal distance (MSMD), to minimize the number of VM instances and then schedule tasks to the allocated VM instances so that the execution time is reduced. Further, they use the IHM (instance hour minimization) algorithm to reduce the instance hours needed by VMs to complete the application's execution. However, like the work in [34], their algorithm does not consider cloud characteristics and issues.

The second category of workflow scheduling algorithms in the cloud is based on meta-heuristic algorithms, presented in [37]–[40]; these algorithms execute only a single workflow in the cloud environment. The proposals in [37] and [38] are based on the Particle Swarm Optimization (PSO) meta-heuristic approach. In [37], the authors try to minimize the execution cost of the workflow while balancing the load on the available resources. Their proposal considers a fixed set of VMs in the resource pool for allocation to tasks and does not consider the elastic nature of the cloud. In [38], a PSO-based scheduling algorithm is presented that gives promising results, but there is still room for enhancement. Their approach encodes particles according to the index of the resources, which represents the position of the particle. However, these indexes do not carry much information about the resource, so moving particles in different dimensions towards the individual best and global best may not lead to a better solution. If the deadline is hard, it is difficult to find a feasible solution using PSO.

Further, the research works in [39] and [40] are based on multi-objective genetic algorithms. In [39], the authors proposed deadline-constrained scheduling for cost optimization based on a dynamic objective strategy, but it still uses an encoding scheme similar to that used in [38]. This proposal also uses the resource index as an integer to apply the crossover and mutation operators, and these indexes do not contain much information about the resource. They propose a dynamic objective strategy that switches from the execution cost function to execution time when there is no feasible solution, so that it minimizes the execution time, not the execution cost. Furthermore, the authors of [40] focus on the essential characteristics of the cloud but do not consider major issues such as VM performance variation and acquisition delay.

Various heuristic and meta-heuristic algorithms have been presented in the literature to execute workflows in the cloud under QoS constraints. However, the heuristic algorithms presented in the literature exhibit good performance only when the number of tasks in the workflow is very small. As the number of tasks increases, the performance of these heuristics degrades, and most of the heuristic algorithms are not suitable when the number of tasks is very large. Meanwhile, the meta-heuristics for minimization of cost while meeting the deadline constraint [37]–[40] do not encode the operators correctly or fail to incorporate the essential characteristics and major issues effectively. So, we propose a meta-heuristic based


FIGURE 2. (a) Simple DAG workflow, (b) ExeTime Matrix, (c) TransferTime matrix.

Genetic Algorithm. Our proposal, the Cost Effective Genetic Algorithm (CEGA), incorporates all the essential characteristics and major issues (such as performance variation and acquisition delay) of the cloud for executing scientific workflows and tries to minimize the overall cost of the workflow while meeting the deadline constraint. We consider the IC-PCP [30], RCT [32], RTC [32], and PSO [38] algorithms from the literature as baselines to evaluate our proposed approach.

III. ARCHITECTURE FOR WORKFLOW SCHEDULING IN CLOUD
In this section, we introduce the workflow scheduling models, namely the application model and the cloud resource model.

A. APPLICATION MODEL
A deadline-constrained scientific workflow is represented by a DAG $W = (T, E)$, where $T = (t_1, t_2, \ldots, t_m)$ is the set of tasks and $E$ is the set of edges. An edge $e_{ij} = (t_i, t_j)$, with $t_i, t_j \in T$ and $t_i \neq t_j$, represents the data and control dependence from task $t_i$ to task $t_j$. The task $t_i$ is said to be the parent of task $t_j$, and task $t_j$ is the child of task $t_i$. This parent-child relation means that task $t_j$ can execute only after task $t_i$ completes. The sets of parents and children of a task $t_i$ are represented by $pred(t_i)$ and $succs(t_i)$. Each DAG has only one entry task $t_{entry}$ and one exit task $t_{exit}$. The entry task $t_{entry}$ is the task that does not have any parents, and the exit task $t_{exit}$ is the task that does not have any children. If there is more than one entry or exit task in the workflow, we insert a dummy task (with zero execution time and communication time) so that the graph has only one entry task and one exit task. A simple DAG is shown in Fig. 2(a), with the execution times on different virtual machines shown in Fig. 2(b) and the data transfer times between tasks in Fig. 2(c). Each scientific workflow $W$ has an associated user-specified deadline $D$, which is the time limit to complete the execution of the workflow in the cloud computing environment. In the next section, we discuss the cloud resource model.

B. CLOUD RESOURCE MODEL
Our cloud model consists of an IaaS service provider, which delivers high-performance computing resources in the form of Virtual Machines (VMs) over the internet to execute large-scale scientific workflows. These VMs are selected from the set of VMs $\overrightarrow{VM} = \{vm_1, vm_2, \ldots, vm_k\}$, and each VM has a configuration comprising CPU type, memory size, and cost per time interval. The cost of a VM depends on its configuration; for example, a fast VM is more costly than a slower VM. Each VM of type $VM_k$ is defined as $(ET_{t_i}^{VM_k}, C_v)$, where $ET_{t_i}^{VM_k}$ is the estimated execution time of task $t_i$ on a VM of type $VM_k$, defined in Eq. (1), and $C_v$ is the cost of VM type $VM_k$ per time interval. The execution time of a task on different types of VMs is estimated as the size of the task $Size(t_i)$ divided by the processing capacity of the virtual machine $PC(VM_k)$, in terms of Floating point Operations Per Second (FLOPS).

$$ET_{t_i}^{VM_k} = \frac{Size(t_i)}{PC(VM_k)} \tag{1}$$

However, if task $t_i$ is executed on a VM of type $VM_k$ and $t_j$ is executed on a VM of a different type $VM_v$, then the data transfer time is represented as $TT(e_{ij})$. It is estimated as the size of the output data file $Data(t_i, out)$ to be transferred from task $t_i$ to task $t_j$ divided by the average bandwidth $\beta$, as shown in Eq. (2). The value of $TT(e_{ij})$ is zero when both tasks $t_i$ and $t_j$ are executed on the same VM.

$$TT(e_{ij}) = \frac{Data(t_i, out)}{\beta} \tag{2}$$

Further, our cloud model is similar to the IaaS services provided by Amazon, such as the computation service Amazon Elastic Compute Cloud (EC2) [9] and the storage service Amazon Elastic Block Store (EBS) [41], which is used to send and receive the intermediate input/output files. We assume that all the storage and computation services are in the same datacenter, so the average bandwidth to transfer data between the shared storage service and the VMs is roughly equal. It is also assumed that a client has no limit on the number of VM instances leased for a large-scale scientific workflow. Also, when a VM is leased, it requires an initial boot time (acquisition delay) for its proper initialization before it is made available


to the end user, and when it is released it also takes some time to shut down properly (termination delay). Cloud service providers charge users only for the amount of resources they have used. Most service providers charge the user for a whole time interval even if only a fraction of the last time interval is used. So, we use a per-minute time interval to calculate the execution cost of each leased virtual machine. However, the internal data transfer cost is assumed to be zero, because internal data transfer is free of cost in most cloud datacenters. The next section defines the problem of workflow scheduling in a cloud computing environment.

IV. PROBLEM DEFINITION
Workflow scheduling in the cloud may have several objectives. In this paper, we propose a meta-heuristic optimization approach, the Cost Effective Genetic Algorithm (CEGA) workflow scheduling algorithm. The goal of this proposal is to find a feasible schedule to execute a workflow in a cloud computing environment such that the overall execution cost is minimized while meeting the deadline constraint. The schedule generated by this algorithm is bounded by the deadline, so we assume that the deadline specified by the user is achievable. We define a schedule $S = \{VM_{set}, Map, TEC, TET\}$ in terms of a set of computing resources (virtual machines) $VM_{set}$, a mapping $Map$ of each task to a suitable virtual machine, the total execution cost (TEC), and the total execution time (TET) of the workflow. $VM_{set} = \{VM_1, VM_2, \ldots, VM_k\}$ is the set of VMs that need to be leased. Each VM in the resource pool is represented as $VM\_Pool = \{VM_{vm}^{type(vm_k)}, LST_k, LET_k\}$, where $VM_{vm}^{type(vm_k)}$ is the type of $vm_k$, with its lease start time $LST_k$ and lease end time $LET_k$. $Map$ corresponds to the mapping of each task of the workflow to a suitable virtual machine, with its start time $ST(t_i)$ and finish time $FT(t_i)$, and it is represented by the four-tuple $Map_{t_i}^{vm_k} = (t_i, vm_k, ST(t_i), FT(t_i))$. Our workflow scheduling problem in the cloud is based on the model of Rodriguez and Buyya's Particle Swarm Optimization (PSO) proposal [38].

Further, each provisioned VM must transfer all the output data files obtained by executing the tasks scheduled on it to the local storage of the VMs on which the corresponding child tasks are scheduled, before it is de-provisioned. Cloud service providers offer different types of storage service, such as Amazon Elastic Block Store (EBS) [41], that persist independently of the VM lifetime. So, the VMs on which the child tasks are executed need not be active during the output data file transfer. Furthermore, due to the performance variation of the virtual machines and other delays (acquisition and termination delay), the deadline of the workflow can be missed. Therefore, we use a PerVar parameter to record the performance variation of VMs, which is randomly generated from 0 to 24%. We also assume that the average time to acquire a VM from the cloud is 60 seconds for each type of VM (the acquisition delay is 60 seconds). Based on the above-defined terms, the problem of workflow scheduling is defined as follows:

Find a feasible schedule $S$ for a given workflow $W$ such that the total execution cost (TEC) is minimized while the total execution time (TET) does not exceed the deadline $D$ of the workflow. It is shown in Eq. (3).

$$\begin{aligned} &\text{Minimize } TEC = \sum_{k=1}^{|VM\_Pool|} C_{type(vm_k)} \cdot \left\lceil \frac{LET_k - LST_k}{timeinterval} \right\rceil \\ &\text{subject to } TET \le D, \text{ where } TET = \max_{t_i \in W} \{FT(t_i)\} \end{aligned} \tag{3}$$

$C_{type(vm_k)}$ is the cost of VM type $vm_k$ per time interval.

V. THE PROPOSED COST EFFECTIVE GENETIC ALGORITHM
Genetic Algorithms (GAs) are meta-heuristic algorithms inspired by the evolutionary ideas of natural selection and genetics [15]. GAs are applied to optimization problems, by means of initialization, selection, and genetic operators, in many science and engineering domains such as pattern recognition, image processing, and data mining, among others. Due to the different characteristics of the cloud, a GA's existing genetic operators (binary encoding, real-valued encoding) cannot be applied directly to the cloud workflow scheduling problem. By considering all the characteristics of the cloud workflow scheduling problem, we present a whole set of GA operations, including encoding, initialization of the population, mutation, and crossover. We propose the Cost Effective Genetic Algorithm (CEGA) with the deadline constraint. We use the constraint handling strategy proposed in [42]: if two schedules are both feasible, we select the one with the minimum execution cost. If one schedule is feasible and the other is not, we choose the feasible one. If both schedules are infeasible, we select the one whose TET is closer to the deadline, that is, the one with the smaller difference between TET and the deadline. To achieve this, we use the notation shown in Table 1, which is defined in the next section.

A. BASIC DEFINITIONS
A DAG-based workflow application is represented as $W = (T, E)$, where $T = (t_1, t_2, \ldots, t_m)$ is the set of tasks and $E$ is the set of edges, with a user-specified deadline $D$. We define the following terms:

Predecessors $pred(t_i)$ and Successors $succs(t_i)$: For task $t_i$, its predecessor and successor task sets are defined below:

$$pred(t_i) = \{t_j \mid t_j \in T \wedge (t_j, t_i) \in E\} \tag{4}$$

$$succs(t_i) = \{t_j \mid t_j \in T \wedge (t_i, t_j) \in E\} \tag{5}$$

Task's Earliest Start Time $EST(t_i)$ and Task's Earliest Finish Time $EFT(t_i)$: A task's earliest start time is the time at which $t_i$ can start its execution on the fastest VM, once all of its predecessors $t_p \in pred(t_i)$ have been executed and all the dependencies (output files) have been transferred from $t_p$ to $t_i$, and the task's earliest finish time is the time when task $t_i$ completes

its execution. These are recursively defined as follows:
\[ EST(t_i) = \begin{cases} 0 & \text{if } t_i = t_{entry} \\ \max_{t_p \in pred(t_i)} \{ EST(t_p) + MET(t_p) + TT(e_{pi}) \} & \text{otherwise} \end{cases} \tag{6} \]
\[ EFT(t_i) = EST(t_i) + MET(t_i) \tag{7} \]
Minimum Execution Time of Task $MET(t_i)$: The minimum execution time $MET(t_i)$ is the execution time of task $t_i$ on the VM type $VM_k \in VM_{set}$ that has the minimum execution time among all types of VMs available in the cloud. It is defined as follows:
\[ MET(t_i) = \min_{VM_k \in VM_{set}} \{ ET_{t_i}^{VM_k} \} \tag{8} \]
Minimum Execution Time of Workflow $MET\_W$: The minimum execution time of the workflow is defined as follows:
\[ MET\_W = \max_{t_i \in W} \{ EFT(t_i) \} \tag{9} \]
Task Topological Level $Lev(t_i)$: The topological level $Lev(t_i)$ of task $t_i$ in the DAG-based workflow application is defined as:
\[ Lev(t_i) = \begin{cases} 0 & \text{if } t_i = t_{entry} \\ \max_{t_p \in pred(t_i)} Lev(t_p) + 1 & \text{otherwise} \end{cases} \tag{10} \]
Task's Start Time $ST(t_i)$ and Task's Finish Time $FT(t_i)$: The estimated start time and estimated finish time of a task $t_i$ are its estimated start and finish times on a VM of type $VM_k$, once all the predecessors of $t_i$ have been scheduled. They are defined as follows, where $PerVar$ is the amount of variation in the performance of the VM of type $VM_k$:
\[ ST(t_i) = \begin{cases} acq\_delay & \text{if } t_i = t_{entry} \\ \max \{ Avail(VM_k),\ \max_{t_p \in pred(t_i)} ( FT(t_p) + TT(e_{pi}) ) \} & \text{otherwise} \end{cases} \tag{11} \]
\[ FT(t_i) = ST(t_i) + ET_{t_i}^{VM_k} / (1 - PerVar) \tag{12} \]
VM's Available Time $Avail(VM_k)$: This is the time at which the VM of type $VM_k$ is ready to execute a new task. If $t_i$ is the last task assigned to the VM of type $VM_k$, then $Avail(VM_k)$ is defined as follows:
\[ Avail(VM_k) = ST(t_i) + ET_{t_i}^{VM_k} / (1 - PerVar) \tag{13} \]
Lease Start Time $LST_{VM_k}$ and Lease End Time $LET_{VM_k}$: The lease start time $LST_{VM_k}$ of a VM of type $VM_k$ is the time at which $VM_k$ is ready to execute tasks, and the lease end time $LET_{VM_k}$ is the time at which $VM_k$ is de-provisioned from the resource pool (set of active resources).

TABLE 1. Notation used in CEGA algorithm.

B. ENCODING
There exist three main groups for representing the chromosome of the workflow scheduling problem: TasktoIndex, TasktoVM mapping, and VMtoType. In TasktoIndex, we assign an integer index to each task according to its order of execution. In the TasktoVM mapping, each task is assigned to the appropriate VM, and in VMtoType, the VM of type $VM_k$ is selected from the resource pool. The encoding of the chromosome in the proposed CEGA algorithm is similar to

the meta-heuristic algorithm presented in [40]. Initially, we find the order of tasks according to the dependency constraints of the workflow and then assign an index (from 1 to the number of tasks $m$) to each task of the workflow.

FIGURE 3. Encoding of Chromosome for a schedule given in Fig. 2 DAG.

A schedule is composed of three strings: the first (TasktoIndex) contains the index $l$ of task $t_i$ according to the task order, the second (TasktoVM) assigns a suitable VM, and the third (VMtoType) determines the type of VM that is allocated for execution of the task. The length of these vectors equals the number of tasks in the workflow. A sample schedule of the DAG (Fig. 2) with its encoding is shown in Fig. 3. The first row shows the position of tasks (TasktoIndex), according to the index $l$ of task $t_i$ in the task order. The second row denotes the VM mapping (TasktoVM) of each task $t_i$ as per the index $l$ in the first row, and the third row shows the type of VM (VMtoType) used for task $t_i$. For example, task $t_3$, which is assigned at index 5, will be scheduled to VM 1 of type 3. The encoding of the proposed CEGA is based on the meta-heuristic algorithm presented in [40], but we have developed novel schemes for population initialization and for the crossover and mutation operators of the genetic algorithm. Moreover, their proposal does not consider cloud issues such as the performance variation of VMs and the acquisition delay.

C. INITIAL POPULATION
In this Cost Effective Genetic Algorithm (CEGA), the initial population is generated using the following two steps:
First Step- This step generates $n$ ordered lists $O = \{O_1, O_2, O_3, \ldots, O_n\}$ of tasks, where $n$ equals the population size of the CEGA, and each ordered list $O_i$ contains all tasks $T = \{t_1, t_2, t_3, \ldots, t_m\}$ of the workflow, where $m$ is the total number of tasks. We use the Task_Order method, shown in Fig. 4, to assign an order to tasks. Its input is an empty ordered list $O_i$ and the set of tasks $T$. Initially, it selects an empty ordered list $O_i \in O$. Then it selects a task $t_i \in T$ and appends it to the end of $O_i$ if all the predecessors of $t_i$ have already been added to $O_i$ or if $pred(t_i)$ is empty, and removes $t_i$ from the set of tasks, $T = T - \{t_i\}$. Otherwise, it first adds the predecessors of task $t_i$ to $O_i$. This process is repeated until all tasks $t_i \in T$ have been added to $O_i$. At this point, the ordered list $O_i$ contains all tasks and follows the dependency constraints of the workflow. The same procedure is repeated while there is any empty ordered list $O_i \in O$. After completion of this method, $n$ ordered lists $O = \{O_1, O_2, O_3, \ldots, O_n\}$ have been generated, each of which contains all tasks $T = \{t_1, t_2, t_3, \ldots, t_m\}$ and follows the dependency constraints.
Second Step- We then use the CheapesttaskVMmap method of the Just in Time (JIT-C) algorithm [33] to map each task of the ordered list $O_i$ to the cheapest virtual machine. JIT-C, proposed by Sahni and Vidyarthi [33], is a dynamic workflow scheduling algorithm that assigns tasks just before they are ready to execute. The CheapesttaskVMmap method takes a task $t_i$ as input and returns the cheapest VM of type $VM_k$. We use this procedure to assign each task $t_i$ of the ordered list $O_i$ to the cheapest VM of type $VM_k$, and the same procedure is repeated for each ordered list $O_i \in O$. After completion of Step 1 and Step 2, the initial population of the proposed CEGA workflow scheduling algorithm has been initialized.

D. FITNESS EVALUATION
In the proposed CEGA algorithm, the fitness of a schedule is related to the minimization of the overall execution cost while meeting the deadline constraint, as given in Eq. (3). The Total Execution Time (TET) and Total Execution Cost (TEC) of the workflow are estimated by scheduling each task of the workflow to the cheapest virtual machine. We use the Fitness Evaluation method shown in Fig. 5 to schedule each task of the workflow. The input of the Fitness Evaluation method is the initial set of VMs of different types available in the cloud, the encoding of the chromosome (TasktoIndex, TasktoVM, and VMtoType), the total number of tasks in the workflow ($m$), and the total number of VM types ($k$). We then initialize the VM_Pool, which contains the VMs that will be acquired for scheduling tasks, as empty; the exeTime and transferTime matrices as empty; the mapping (Map) of tasks to VMs as empty; and TET and TEC as zero. Further, we compute the execution time matrix exeTime[m][k] of tasks on the available VMs using Eq. (1) and the data transfer time matrix transferTime[m][m] from task $t_i$ to $t_j$ using Eq. (2). Furthermore, we select each task $t_i$ from the index $x$ of TasktoIndex in the chromosome, together with its mapping TasktoVM on a VM of type $VM_k$ (VMtoType), and estimate its start time $ST(t_i)$ and finish time $FT(t_i)$ using Eq. (11) and Eq. (12), respectively.
Then, we schedule the task $t_i$ on the VM of type $VM_k$ with its start time $ST(t_i)$ and finish time $FT(t_i)$, and update Map with the task $t_i$, the VM of type $VM_k$, the start time $ST(t_i)$ and the finish time $FT(t_i)$. Then, we estimate the lease start time $LST_{VM_k}$ and lease end time $LET_{VM_k}$ of the VMs leased from the cloud to execute tasks. If the VM is not present in the VM_Pool, $LST_{VM_k}$ is estimated by subtracting acq_delay from $ST(t_i)$, and the VM_Pool is updated. Otherwise, there is no need to update the lease start time $LST_{VM_k}$. The lease end time $LET_{VM_k}$ is set to the finish time $FT(t_i)$ of task $t_i$. The same procedure is repeated while there is any task $t_i$ that is not yet scheduled.


FIGURE 4. Task_Order Method Flowchart.
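Fig. 4 is not reproduced in this text. As a rough illustration of the Task_Order procedure described for the First Step of the initial population, the Python sketch below builds dependency-respecting ordered lists. The names `task_order` and `initial_orders`, the `pred` dictionary format, and the use of a shuffled task pool to obtain different orders are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random

def task_order(tasks, pred, rng=None):
    """Return one ordered list in which every task follows its predecessors.

    tasks: iterable of task ids; pred: dict task -> list of predecessors
    (hypothetical input format). Picking tasks in random order is an assumption.
    """
    rng = rng or random.Random()
    pool = list(tasks)
    rng.shuffle(pool)                      # select tasks in a random order
    ordered, placed = [], set()
    for start in pool:
        stack = [start]
        while stack:
            t = stack[-1]
            if t in placed:                # already in the ordered list
                stack.pop()
                continue
            missing = [p for p in pred.get(t, ()) if p not in placed]
            if missing:
                stack.extend(missing)      # add the predecessors first
            else:
                ordered.append(t)          # all predecessors present: append t
                placed.add(t)
                stack.pop()
    return ordered

def initial_orders(tasks, pred, n, seed=0):
    """Generate n ordered lists, one per individual of the population."""
    rng = random.Random(seed)
    return [task_order(tasks, pred, rng) for _ in range(n)]
```

An explicit stack is used instead of recursion so that long dependency chains in large workflows do not hit Python's recursion limit.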

FIGURE 5. Fitness Evaluation Method.
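Fig. 5 is likewise not reproduced here. The sketch below illustrates one possible reading of the Fitness Evaluation loop, using Eq. (11) to Eq. (13) for start, finish and availability times. Several points are assumptions rather than the authors' method: the function name `evaluate`, the dictionary-based inputs, the mapping `vm_type` from a VM to its type, treating a VM that is not yet in the pool as available at `acq_delay`, and above all the cost line, which stands in for Eq. (3) (not part of this section) by charging each VM per started billing period.

```python
import math

def evaluate(order, task_to_vm, vm_type, exe_time, transfer_time, pred,
             price, per_var=0.0, acq_delay=60.0, billing_period=600.0):
    """Estimate (TET, TEC, ST, FT) for one encoded schedule.

    exe_time[t][type] and transfer_time[(parent, child)] play the role of the
    exeTime and transferTime matrices; price[type] is an assumed cost per
    billing period.
    """
    avail, lst, let, st, ft = {}, {}, {}, {}, {}
    for t in order:                                   # tasks in TasktoIndex order
        vm = task_to_vm[t]
        ready = max((ft[p] + transfer_time.get((p, t), 0.0)
                     for p in pred.get(t, ())), default=0.0)
        st[t] = max(avail.get(vm, acq_delay), ready)  # Eq. (11)
        ft[t] = st[t] + exe_time[t][vm_type[vm]] / (1.0 - per_var)  # Eq. (12)
        if vm not in lst:                             # VM newly added to the pool
            lst[vm] = st[t] - acq_delay
        let[vm] = ft[t]                               # lease ends with last task
        avail[vm] = ft[t]                             # Eq. (13)
    tet = max(ft.values())
    # Assumed cost model: number of started billing periods times unit price.
    tec = sum(math.ceil((let[v] - lst[v]) / billing_period) * price[vm_type[v]]
              for v in lst)
    return tet, tec, st, ft
```

The default values of 60 seconds for the acquisition delay and 600 seconds for the billing period follow the experimental setup reported later in the paper.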

Finally, the TET and TEC are estimated using Eq. (3). We now have complete information to define a schedule S (as discussed in Section 4) of the workflow, with its set of leased VMs, its task-to-VM mapping, and the TET and TEC of the workflow.

E. SELECTION
In CEGA, we use a tournament-based selection approach [15] to select chromosomes for the next generation. In this CEGA proposal, the chromosome that satisfies the deadline constraint and has the minimum Total Execution Cost (TEC) is selected.
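The pairwise rule applied inside each tournament (a feasible schedule beats an infeasible one, the cheaper of two feasible schedules wins, and of two infeasible schedules the one closer to the deadline wins) can be sketched as follows. The function names `prefer` and `tournament`, the representation of a schedule as a `(TET, TEC)` pair, and the tournament size of 2 are assumptions for illustration; the paper does not state the tournament size.

```python
import random

def prefer(a, b, deadline):
    """Return the preferred of two schedules, each given as a (TET, TEC) pair."""
    a_ok, b_ok = a[0] <= deadline, b[0] <= deadline
    if a_ok and b_ok:                      # both feasible: lower cost wins
        return a if a[1] <= b[1] else b
    if a_ok or b_ok:                       # exactly one feasible: it wins
        return a if a_ok else b
    return a if a[0] <= b[0] else b        # both infeasible: closer to deadline

def tournament(population, deadline, rng, size=2):
    """Pick `size` random schedules and return the preferred one."""
    contenders = rng.sample(population, size)
    winner = contenders[0]
    for candidate in contenders[1:]:
        winner = prefer(winner, candidate, deadline)
    return winner
```

For two infeasible schedules both TET values exceed the deadline, so the smaller TET is the one closer to it.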

We use the constraint-handling strategy proposed in [42]. If two schedules are feasible, we select the schedule with the minimum execution cost. If one schedule is feasible and the other is not, we ignore the infeasible schedule and choose the feasible one. If neither schedule is feasible, we select the schedule that is closer to the deadline.

F. GENETIC OPERATORS
1) CROSSOVER
We use a two-point crossover, shown in Fig. 6, to generate a new child schedule from the existing parent schedules (P1 and P2). To perform the crossover operation, we consider two parent schedules P1 and P2 with a crossover rate of 100%. We randomly choose a value between 0 and 1 and compare it with the crossover rate (1). If it is less than or equal to the crossover rate, crossover is performed; otherwise the operation ends. To perform crossover, we randomly select two integers $i$ and $j$ such that $1 \le i < j \le m$ (where $m$ is the total number of tasks), which denote task indices in the parent schedule P1.
Further, we select all tasks at TasktoIndex $l$, where $l \in \{i, \ldots, j\}$, of parent schedule P1 and put these tasks into a sub-schedule in the order in which they are executed in parent schedule P2, with their mapping (TasktoVM) and their type $VM_k$ (VMtoType). Thus, the new sub-schedule contains $(j - i + 1)$ tasks with their mapping on VMs (TasktoVM) and type $VM_k$ (VMtoType) as in parent schedule P2, and follows the dependency constraints. Furthermore, we copy these tasks from the sub-schedule to the new child schedule at TasktoIndex $l$, with their mapping on VMs (TasktoVM) and type $VM_k$ (VMtoType), and then copy all the remaining tasks into the child schedule at TasktoIndex $l$, where $l \in \{1, \ldots, i-1, j+1, \ldots, m\}$, with their order of execution, mapping (TasktoVM) on VMs and type $VM_k$ (VMtoType) as per the parent schedule P1. Finally, the new child schedule is returned.

FIGURE 6. Crossover Operator.

FIGURE 7. Crossover Operator Example.

For example, as shown in Fig. 7, to perform the crossover we choose two random values $i = 3$ and $j = 6$ in parent P1 (the tasks at TasktoIndex $i = 3$ to $j = 6$ are $t_4, t_5, t_3, t_6$). We then select these tasks ($t_4, t_5, t_3, t_6$) from parent schedule P2 (their execution order in P2 is $t_3, t_4, t_6, t_5$, with mapping (TasktoVM) on VMs (1, 4, 3, 1) and types (VMtoType) (2, 2, 2, 1)) and put them into the sub-schedule. So, the sub-schedule contains tasks ($t_4, t_5, t_3, t_6$) with their VM mapping (1, 4, 3, 1) and type $VM_k$ (2, 2, 2, 1). These tasks, mapped on VMs (1, 4, 3, 1) with type $VM_k$ (2, 2, 2, 1), are then put from the sub-schedule into child schedule C1 at index $i = 3$ to $j = 6$. Next, the remaining tasks ($t_1, t_2, t_7, t_8$ at index 1, 2, 7, 8) are copied from parent schedule P1 to child schedule C1 with their VM mapping (1, 2, 4, 2) and type $VM_k$ (1, 2, 1, 1). Child schedule C2 is generated in a similar way.

FIGURE 8. Mutation Operator.
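The two-point crossover described above can be sketched in Python as follows. Representing a schedule as a list of `(task, vm, vm_type)` genes in execution order, the function names, and returning a copy of P1 when no crossover is performed are assumptions made for illustration.

```python
import random

def two_point_crossover(p1, p2, i, j):
    """Build a child from parents p1 and p2 (lists of (task, vm, vm_type) genes).

    i, j are 1-based positions in p1 with 1 <= i < j <= len(p1).
    """
    segment = {gene[0] for gene in p1[i - 1:j]}        # tasks at positions i..j of P1
    sub_schedule = [g for g in p2 if g[0] in segment]  # same tasks, P2 order and mapping
    return p1[:i - 1] + sub_schedule + p1[j:]          # rest copied from P1

def crossover(p1, p2, rng, rate=1.0):
    """Apply crossover with probability `rate`, choosing i < j at random."""
    if rng.random() > rate:
        return list(p1)                                # assumed: child is a copy of P1
    i, j = sorted(rng.sample(range(1, len(p1) + 1), 2))
    return two_point_crossover(p1, p2, i, j)
```

Because the segment is a contiguous slice of a dependency-respecting order and is re-sorted according to another dependency-respecting order, the child keeps every task after its predecessors.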
2) MUTATION
In this proposed work, mutation is performed using the topological level of each task in order to maintain the dependency constraints of the workflow, as shown in Fig. 8. For a new schedule, we first choose a random value between 0 and 1 and check whether it is less than or equal to the mutation rate (we assume that the mutation rate is at most 30%). If it is, mutation is performed: two integers $i$ and $j$ are chosen randomly such that $1 \le i < j \le m$, and if lev(TasktoIndex[i]) == lev(TasktoIndex[j]), the tasks at TasktoIndex[i] and TasktoIndex[j] are swapped. Further, if there is any task $t_l$ at index $q$, where $i < q < j$, between the task at TasktoIndex[i] and the task at TasktoIndex[j] such that lev($t_l$) < lev(TasktoIndex[i]), then tasks $t_l$ and TasktoIndex[j] are swapped; otherwise we discard it. Finally, the new schedule is returned.
An example is shown in Fig. 9, where we select two random indices ($i = 4$ and $j = 6$) for which lev(TasktoIndex[4]) == lev(TasktoIndex[6]). So, we swap the task at TasktoIndex[4] and the task at TasktoIndex[6]. Further, there is a task $t_3$ at index 5 between the task at TasktoIndex[4] and the task at TasktoIndex[6] whose lev($t_3$) < lev(TasktoIndex[4]), so we swap task $t_3$ with the task at TasktoIndex[4].

FIGURE 9. Mutation Operator Example.

In CEGA, we replace a schedule in the population with the new schedule if the new schedule is better in terms of minimizing the execution cost while meeting the deadline constraint. We also keep the best schedule in a variable and update it after every generation if a new schedule is better than the existing one.

VI. PERFORMANCE EVALUATION
In this section, we present the details of the experiments conducted to evaluate the performance of the proposed Cost Effective Genetic Algorithm (CEGA).

A. EXPERIMENTAL WORKFLOWS
The performance of CEGA is assessed on four different scientific workflows: Montage, LIGO, Epigenomics,

and CyberShake. The Montage scientific workflow [43] is an astronomy application that is used to generate custom mosaics of the sky from a set of input images in the "Flexible Image Transport System (FITS)" format. Most of its tasks are characterized as I/O intensive, and they do not require much computation power.

FIGURE 10. The structure of the small size workflows. (a) Montage (b) CyberShake (c) LIGO (d) Epigenomics.

The "Laser Interferometer Gravitational Wave Observatory (LIGO)" scientific workflow [44] is used to identify the gravitational waves produced by various events in the universe. Its tasks require high computing capacity with large memory. CyberShake is used by the "Southern California Earthquake Center" to characterize earthquake hazards in a particular region by generating synthetic seismograms. Its tasks can be categorized as I/O intensive, and they also have significant memory and CPU requirements [1]. Epigenomics is used in biology: the USC Epigenome Center is presently involved in mapping the epigenetic state of human cells on a genome-wide scale. The Epigenomics tasks are highly CPU intensive and less I/O intensive. The authors in [2] present the complete description of each workflow. Further, each of these scientific workflows has a different structure and composition, with characteristics such as pipeline, data distribution, data aggregation and data redistribution. The small-size workflow for each type of application is shown in Fig. 10. To evaluate the performance of workflow scheduling algorithms and workflow scheduling systems, [45] developed a workflow generator that creates synthetic workflows of arbitrary size, similar to real-world scientific workflows. These are presented in the DAX (Directed Acyclic Graph in XML) format in [46]. We have considered three sizes of workflow in our experiments: small (approx. 50 tasks), medium (approx. 100 tasks) and large (approx. 1000 tasks).

B. BASELINE ALGORITHMS
We used IC-PCP [30], RTC [32], RCT [32], and PSO [38], mentioned in Section 2, as the baseline algorithms and compare their results with the proposed Cost Effective Genetic Algorithm (CEGA) for the execution of a single workflow in a cloud computing environment. The IC-PCP workflow scheduling algorithm addresses most of the characteristics of the cloud, such as on-demand resource provisioning and the pay-as-you-go pricing model. It accounts for heterogeneous types of VMs and for data transfer time between tasks in addition to computation time. IC-PCP does not account for the performance variation of the virtual machines or for their booting time (acquisition delay).
The RTC and RCT algorithms proposed by Poola et al. [32] are robust and fault tolerant in nature. They propose a robust scheduling algorithm that allocates each task of the workflow to heterogeneous computing resources in a cloud computing environment. Poola et al. [32] proposed policies based on the priority of execution time (RTC) and execution cost (RCT).
Rodriguez and Buyya proposed a deadline-based resource provisioning and scheduling meta-heuristic optimization algorithm based on Particle Swarm Optimization (PSO) [38]. Their approach minimizes execution cost while meeting the deadline of the workflow. Their proposal represents a particle as a workflow and its tasks, and encodes particles according to the index of the resources, which represents the position of the particle. However, these indexes do not carry much information about the resource, so particles moving in different dimensions toward the individual best and global best may not lead to a better solution. If the deadline is hard, it is difficult to find a feasible solution using PSO.

TABLE 2. VMs types with description used in this experiment.

C. EXPERIMENTAL SETUP
Cloud service providers offer different types of virtual machines to the end user. We assume only five types of VMs with various specifications and computing capacities, similar to the setup in [33]. Further, these VMs are based on the performance analysis offered by Amazon's Elastic Compute Cloud (EC2) [47], as shown in Table 2. "An ECU is the equivalent CPU power of a 1.0-1.2 GHz 2007 Opteron or Xeon processor". The processing time of each task of the workflow on the several types of VMs is estimated from their processing capacity. Further, the performance variation of VMs in a single data centre is modelled following the work presented in [12], similar to [33], [38]. The performance of VMs varies (by up to 24 percent) according to a normal distribution with a mean of 12 percent and a standard deviation of 10 percent. In addition, data transfer time varies (by up to 19 percent) according to a normal distribution with a mean of 9.5 percent and a standard deviation of 5 percent. We assume that our pricing model is similar to the current pricing models of Amazon [9], Google [10], and CloudSigma [11].
The average bandwidth for transferring intermediate input/output files between VMs is similar to the average bandwidth

provided by Amazon Elastic Block Store (EBS) (20kbps) [41]. The billing period in our experiment is 10 minutes, and the estimated acquisition delay is 60 seconds. We take the structure of the workflows and the execution times of tasks from the DAX (Directed Acyclic Graph in XML) descriptions [46], which are known in advance. We have considered three sizes of scientific workflow in our experiments: small (approx. 50 tasks), medium (approx. 100 tasks) and large (approx. 1000 tasks). We conducted each experiment 20 times, and the results obtained for the large workflows are shown in Fig. 11. Since the accuracy of the obtained results is not 100 percent, a variation of ±5 percent is acceptable in the simulation. To evaluate the performance of the CEGA, we define the deadline of each type of workflow. For this, we assign all tasks sequentially to the fastest virtual machine and calculate the minimum execution time of the workflow $MET\_W$. $MET\_W$ is the lower bound for the makespan of executing a workflow.
To set the deadline of the workflow, we consider two types of deadline constraint: hard and soft. The deadline is set according to the following rule:
\[ D = MET\_W \times (1 + \beta) \tag{14} \]
where $\beta$ is the deadline constraint and $MET\_W$ is the minimum execution time of the workflow estimated using Eq. (9). The deadline constraint is defined as follows: for a hard deadline $0 \le \beta < 1.2$ and for a soft deadline $1.2 \le \beta \le 3.2$. The value of $\beta$ varies with a step length of 0.4 in our experiment.
In our proposed CEGA, the population size is 400, the maximum number of generations is 500, the crossover rate is 100 percent and the mutation rate is 30 percent. We initially generate 400 solutions using Task_Order and the JIT-C algorithm and then evaluate these initial solutions. We use the selection, crossover, mutation, and fitness evaluation methods discussed in Section 5 in every generation of the CEGA to find the feasible solution that minimizes the total execution cost while meeting the deadline constraint.

D. RESULTS AND ANALYSIS
1) DEADLINE CONSTRAINT EVALUATION
The ability of the proposed CEGA algorithm to meet the deadline of the workflow is compared with that of the baseline algorithms in Table 3. Under the hard-deadline constraint, the proposed CEGA algorithm outperforms IC-PCP by a hit rate of 92% for the Montage workflow, by a hit rate of 88.5% for the LIGO and CyberShake workflows, and by a hit rate of 83.5% for the Epigenomics workflow. The results in Table 3 indicate that, for the hard-deadline constraint, the IC-PCP algorithm fails to meet the deadline for each type of workflow, while the CEGA algorithm performs better than IC-PCP.
When compared to the RTC algorithm under the hard-deadline constraint, CEGA outperforms RTC by a hit rate of 10.5% for the Montage workflow; by a hit rate of 9% for the LIGO and CyberShake workflows; and by a hit rate of 10.5% for the Epigenomics workflow.
When compared to the RCT algorithm under the hard-deadline constraint, CEGA outperforms RCT by a hit rate of 45% for the Montage workflow; by a hit rate of 35% for the LIGO workflow; by a hit rate of 47.5% for the CyberShake workflow; and by a hit rate of 45.5% for the Epigenomics workflow.
When compared to the PSO algorithm under the hard-deadline constraint, CEGA outperforms PSO by a hit rate of 3.5% for the Montage workflow; by a hit rate of 4% for the LIGO and CyberShake workflows; and by a hit rate of 3.5% for the Epigenomics workflow.
Further, the results (Table 3) show that IC-PCP slightly improves its performance under the soft-deadline constraint. Under the soft-deadline constraint, the proposed CEGA algorithm outperforms IC-PCP by a hit rate of 73% for the Montage workflow, 70.5% for the LIGO workflow, 66.5% for the CyberShake workflow, and 62% for the Epigenomics workflow. Likewise, under the soft-deadline constraint, CEGA outperforms RCT by a hit rate of 48% for the Montage workflow; 43.5% for the LIGO workflow; 51% for the CyberShake workflow; and 46.5% for the Epigenomics workflow. The proposed CEGA algorithm gives a 100% hit rate under the soft-deadline constraint. The same hit rate under the soft-deadline constraint is also seen for RTC and PSO.
As per the discussion above, the results in Table 3 show that the IC-PCP algorithm performs poorly for both hard and soft deadline constraints, because the algorithm does not capture the dynamic nature of the cloud: it ignores performance variation. Another issue not considered by the algorithm is the acquisition delay, which has a significant impact on the makespan (execution time) of the workflow. In contrast, the proposed CEGA algorithm considers the performance variation (PerVar) of VMs as well as their acquisition delay, so that we can estimate the start time and finish time of a task properly and ensure that each task is executed before its deadline.
The RCT and RTC algorithms are based on resource selection policies, with priorities given on the basis of robustness-cost-time or robustness-time-cost. These algorithms can tolerate VM performance variation only up to a certain degree. In contrast, in the proposed CEGA algorithm the performance variation (PerVar) is the amount of variation in the performance of a VM of type $VM_k$ and is generated randomly.
Further, the PSO approach encodes particles according to the index of the resources that represent the position of the particle. However, these indexes do not carry much information about the resource, so particles move in different

dimensions toward the individual best and global best, which may not lead to a better solution. If the deadline is hard, it is difficult for PSO to find a feasible solution. In contrast, CEGA encodes the chromosome according to TasktoIndex, TasktoVM mapping, and VMtoType. In TasktoIndex, we assign an integer index to each task according to its order of execution. In the TasktoVM mapping, each task is assigned to the appropriate VM, and in VMtoType, the VM of type $VM_k$ is selected from the resource pool. This encoding of the chromosome provides sufficient information about the resource, which helps to meet the hard deadline constraint. Thus, CEGA outperforms IC-PCP, RTC, RCT, and PSO in terms of meeting the hard deadline constraint (hit rate). CEGA also outperforms IC-PCP and RCT in terms of meeting the soft deadline constraint (hit rate), and gives the same hit rate as RTC and PSO.

FIGURE 11. Cost and Makespan of scheduling large Montage Scientific workflows with deadline constraint.

TABLE 3. Percentage of deadline meet (hit rate) for each workflow with deadline constraint.

2) MAKESPAN AND COST EVALUATION
Since the proposed algorithm is expected to give a cost-effective, feasible schedule under deadline constraints, a holistic comparison with the baseline algorithms has been made for average cost and average makespan. Fig. 11 shows the average execution cost (in $) and average makespan (finish time in seconds) for each type of scientific workflow. The X-axis shows the value of the deadline constraint: if the value is less than or equal to 1.2, it is a hard deadline, otherwise a soft deadline.
It is observed from Fig. 11 that the IC-PCP algorithm gives the longest execution time for each type of scientific workflow, while its cost of executing the workflows is the cheapest under both deadline constraints. Since its execution time under each deadline constraint is long, it fails to meet the deadline of the workflows. Because the IC-PCP algorithm violates the deadline constraints, it fails to fulfil our objective, which is to schedule a workflow at minimum cost while meeting the deadline constraint. We also compare the performance of the proposed algorithm with the other three algorithms RTC, RCT and PSO. It is seen (from Fig. 11) that, among RCT, RTC, PSO and CEGA, the proposed algorithm CEGA gives the cost-effective schedule for each workflow, with an average hit rate of 88% for the hard deadline and 100% for the soft deadline.
Further, it is seen that for the Montage workflow under deadline constraints the proposed CEGA algorithm gives on average a 28% lower makespan than RCT, an 11% lower makespan than PSO, and a 22% higher makespan than RTC.
For the LIGO workflow under deadline constraints, the proposed CEGA algorithm gives on average a 22% lower makespan than RCT, a 9% lower makespan than PSO and a 43% higher makespan than RTC.
For the CyberShake workflow under deadline constraints, the proposed CEGA algorithm gives on average a 27% lower makespan than RCT, a 17% lower makespan than PSO and a 6% higher makespan than RTC.
For the Epigenomics workflow under deadline constraints, the proposed CEGA algorithm gives on average a 38% lower makespan than RCT, a 20% lower makespan than PSO and a 47% higher makespan than RTC. However, the RTC algorithm, compared to CEGA, generates the most expensive schedule with the minimum execution time.
For the Montage workflow under deadline constraints, the proposed CEGA algorithm gives on average a 30% lower execution cost than RCT and a 21% lower execution cost than PSO. For the LIGO workflow, it gives on average a 14% lower execution cost than RCT and a 9% lower execution cost than PSO. For the CyberShake workflow, it gives on average a 25% lower execution cost than RCT and a 28% lower execution cost than PSO. For the Epigenomics workflow, it gives on average an 11% lower execution cost than RCT and a 9% lower execution cost than PSO.
Thus, from the experimental results for execution cost, makespan and deadline constraint, we conclude that CEGA outperforms the baselines in terms of meeting the deadline (hit rate) of the workflow at a reduced execution cost. At the hard deadline, CEGA gives the highest

VOLUME 4, 2016 5079


J. Meena et al.: CEGA for Workflow Scheduling in Cloud

percentage of deadlines met (hit rate) for all types of scientific workflows, at lower cost.
Further, when the deadline constraint is relaxed, the CEGA algorithm reduces both the
execution time and the execution cost.

3) COMPLEXITY ANALYSIS
The computational complexity of the Cost Effective Genetic Algorithm (CEGA) is calculated
from its initialization, selection, crossover, mutation and fitness evaluation operations.
We first calculate the time complexity of each operation and then combine them to obtain
the overall complexity of the proposed CEGA algorithm. Suppose the user has submitted a DAG
workflow $W = (T, E)$, where $T = (t_1, t_2, \ldots, t_m)$ is the set of tasks and $E$ is
the set of edges. The total number of tasks is $m$ and the total number of edges is $e$.
The maximum number of edges in workflow $W$ is $(m - 1)(m - 2)/2 \approx O(m^2)$. The
population size of the proposed CEGA algorithm is $n$, and the cloud offers $k$ different
types of virtual machines.

The initial population is generated in two steps. The first step, Task_order, has
complexity $O(m)$ for each individual. The second step is the procedure CheapesttaskVMmap,
which maps each task of the Task_order to a suitable resource and has complexity
$O(m + k)$ for each individual. The time complexity of initializing one individual is
therefore $O(m) + O(m + k) \approx O(m + k)$, so the initial population has a total time
complexity of $O((m + k)n)$. The time complexity of the selection operation is $O(n)$,
where $n$ is the size of the initial population. The time complexity of the crossover
operation is $O(m^2)$, where $m$ is the number of tasks. The time complexity of the
mutation operation is $O(m)$. Thus, the total time complexity of selection, crossover and
mutation over $g$ generations is on the order of
$O(gn) + O(gm^2) + O(gm) \approx O(gm^2)$, where $n < m^2$. The fitness evaluation method
has time complexity $O(m^2)$ for each individual; for a population of size $n$ over $g$
generations, it has time complexity $O(ngm^2)$. The total time complexity of selection,
crossover, mutation and fitness evaluation is $O(gm^2) + O(ngm^2) \approx O(ngm^2)$.

Thus the overall complexity of the CEGA algorithm is
$O((m + k)n) + O(gm^2 n) \approx O((m + k + gm^2)n) \approx O(ngm^2)$. With proper data
structures, many of the calculations in the population initialization procedure can be
optimized, so the dominating factor is the fitness evaluation method, with complexity
$O(ngm^2)$. IC-PCP, RCT and RTC, being heuristic-based, run much faster than the proposed
CEGA (meta-heuristic) algorithm. They have polynomial time complexity, whereas the running
time of CEGA additionally scales with the population size $n$ and the number of generations
$g$, which puts it on a par with PSO. This higher time complexity is justified by the
better schedules that CEGA achieves compared with IC-PCP, RCT, RTC and PSO.

VII. CONCLUSION AND FUTURE WORK
Cloud computing delivers high-performance computing resources over the internet to solve
large-scale scientific workflows. To execute these large-scale scientific applications, the
cloud must make appropriate provisioning and scheduling decisions so that the total
execution cost is minimized while the deadline constraint is met. Toward this, a
cost-effective meta-heuristic, the Cost Effective Genetic Algorithm (CEGA), has been
proposed. The CEGA algorithm considers the characteristics of the cloud, such as
heterogeneity, on-demand resource provisioning and the pay-as-you-go price model, as well
as major issues such as VM performance variation and booting time. To achieve this, we
develop novel schemes for the encoding, population initialization, crossover and mutation
operators of the genetic algorithm. The simulation experiments conducted on four scientific
workflows show that, in comparison to state-of-the-art algorithms such as IC-PCP, RCT, RTC
and PSO, the proposed CEGA algorithm exhibits the highest hit rate for the deadline
constraint. The algorithm also has lower execution time than IC-PCP, RCT and PSO, and lower
execution cost than RTC, RCT and PSO under the deadline constraint.

In the future, we would like to consider the shutdown time (termination delay) of VMs,
because it affects the overall execution cost of the workflow. Further, we will consider
VMs deployed in different regions to execute the tasks of the workflow, together with the
data transfer costs between different data centers. Finally, our goal is to implement the
algorithm in a real cloud computing environment where a workflow engine is running, so
that it can be used for deploying applications.

CONFLICT OF INTEREST
We declare that we do not have any conflict of interest regarding this research work.
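As a rough illustration of the complexity analysis in Section VI-3, the sketch below tallies the per-phase operation counts implied by the stated bounds and reports which phase dominates. It is only a sketch under stated assumptions: the function names `cega_operation_counts` and `dominant_phase` are hypothetical, every hidden constant in the big-O terms is taken to be 1, and the sample values of m, k, n and g are invented for illustration rather than taken from the experiments.

```python
def cega_operation_counts(m, k, n, g):
    """Estimate per-phase operation counts from the big-O terms.

    m: number of tasks, k: number of VM types,
    n: population size, g: number of generations.
    Constants hidden by the O-notation are assumed to be 1.
    """
    return {
        "initialization": n * (m + k),  # O((m + k) n)
        "selection": g * n,             # O(g n)
        "crossover": g * m ** 2,        # O(g m^2)
        "mutation": g * m,              # O(g m)
        "fitness": n * g * m ** 2,      # O(n g m^2)
    }


def dominant_phase(counts):
    """Return the name of the phase with the largest estimated count."""
    return max(counts, key=counts.get)


# Hypothetical sizes: 100 tasks, 6 VM types, 50 individuals, 200 generations.
counts = cega_operation_counts(m=100, k=6, n=50, g=200)
total = sum(counts.values())
for phase, count in counts.items():
    print(f"{phase:>14}: {count:>12,d}  ({count / total:.2%} of total)")
print("dominant phase:", dominant_phase(counts))
```

With these sample sizes the fitness term is far larger than the others, which agrees with the statement that fitness evaluation is the dominating factor; real running times would also depend on the constants that this sketch ignores.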

5080 VOLUME 4, 2016


J. Meena et al.: CEGA for Workflow Scheduling in Cloud

[8] J. Meena, M. Kumar, and M. Vardhan, ‘‘Efficient utilization of commodity [32] D. Poola, S. K. Garg, R. Buyya, Y. Yang, and K. Ramamohanarao,
computers in academic institutes: A cloud computing approach,’’ Int. J. ‘‘Robust scheduling of scientific workflows with deadline and budget con-
Comput., Elect., Autom., Control Inf. Eng., vol. 9, no. 2, pp. 498–503, straints in clouds,’’ in Proc. Int. Conf. Adv. Inf. Netw. Appl. (AINA), 2014,
2015. pp. 858–865.
[9] Amazon EC2 Pricing, accessed on Feb. 19, 2016. [Online]. Available: [33] J. Sahni and D. Vidyarthi, ‘‘A Cost-effective deadline-constrained dynamic
https://aws.amazon.com/ec2/pricing/ scheduling algorithm for scientific workflows in a cloud environment,’’
[10] Google Cloud Platform, accessed on Feb. 24, 2016. [Online]. Available: IEEE Trans. Cloud Comput., to be published.
https://cloud.google.com/compute [34] E.-K. Byun, Y.-S. Kee, J.-S. Kim, and S. Maeng, ‘‘Cost optimized provi-
[11] CloudSigma, accessed on Feb. 25, 2016. [Online]. Available: sioning of elastic resources for application workflows,’’ Future Generat.
https://www.cloudsigma.com/pricing Comput. Syst., vol. 27, no. 8, pp. 1011–1026, 2011.
[12] J. Schad, J. Dittrich, and J.-A. Quiané-Ruiz, ‘‘Runtime measurements in [35] H. Wu, X. Hua, Z. Li, and S. Ren, ‘‘Resource and instance hour minimiza-
the cloud: Observing, analyzing, and reducing variance,’’ Proc. VLDB tion for deadline constrained DAG applications using computer clouds,’’
Endowment, vol. 3, nos. 1–2, pp. 460–471, 2010. IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 3, pp. 885–899, Mar. 2016.
[13] K. R. Jackson et al., ‘‘Performance analysis of high performance comput- [36] E.-K. Byun, Y.-S. Kee, J.-S. Kim, E. Deelman, and S. Maeng,
ing applications on the Amazon Web services cloud,’’ in Proc. 2nd IEEE ‘‘BTS: Resource capacity estimate for time-targeted science workflows,’’
Int. Conf. Cloud Comput. Technol. Sci., Nov./Dec. 2010, pp. 159–168. J. Parallel Distrib. Comput., vol. 71, no. 6, pp. 848–862, 2011.
[14] Accessed on Feb. 6, 2016. [Online]. Available: http://www. [37] S. Pandey, L. Wu, S. M. Guru, and R. Buyya, ‘‘A particle swarm
cloudharmony.com/status optimization-based heuristic for scheduling workflow applications in cloud
[15] D. Whitley, ‘‘A genetic algorithm tutorial,’’ Statist. Comput., vol. 4, no. 2, computing environments,’’ in Proc. 24th IEEE Int. Conf. Adv. Inf. Netw.
pp. 65–85, Jun. 1994. Appl., Apr. 2010, pp. 400–407.
[38] M. A. Rodriguez and R. Buyya, ‘‘Deadline based resource provisioning
[16] O. H. Ibarra and C. E. Kim, ‘‘Heuristic algorithms for scheduling indepen-
and scheduling algorithm for scientific workflows on clouds,’’ IEEE Trans.
dent tasks on nonidentical processors,’’ J. ACM, vol. 24, no. 2, pp. 280–289,
Cloud Comput., vol. 2, no. 2, pp. 222–235, Apr./Jul. 2014.
1977.
[39] Z.-G. Chen, K.-J. Du, Z.-H. Zhan, and J. Zhang, ‘‘Deadline con-
[17] M. R. Garey and D. S. Johnson, Computers and Intractability; A Guide
strained cloud computing resources scheduling for cost optimization based
to the Theory of NP-Completeness. New York, NY, USA: Freeman,
on dynamic objective genetic algorithm,’’ in Proc. IEEE Congr. Evol.
1979.
Comput. (CEC), May 2015, pp. 708–714.
[18] J. Yu, R. Buyya, and C. K. Tham, ‘‘Cost-based scheduling of scientific [40] Z. Zhu, G. Zhang, M. Li, and X. Liu, ‘‘Evolutionary multi-objective
workflow application on utility grids,’’ in Proc. 1st Int. Conf. e-Sci. Grid workflow scheduling in cloud,’’ IEEE Trans. Parallel Distrib. Syst., vol. 27,
Comput. e-Sci., 2005, pp. 140–147. no. 5, pp. 1344–1357, May 2016.
[19] A. Afzal, J. Darlington, and A. S. McGough, ‘‘QoS-constrained stochas- [41] Amazon Elastic Block Store (Amazon EBS), accessed on Feb. 2016.
tic workflow scheduling in enterprise and scientific grids,’’ in Proc. 7th [Online]. Available: http://aws.amazon.com/ebs/
IEEE/ACM Int. Conf. Grid Comput., Sep. 2006, pp. 1–8. [42] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, ‘‘A fast and elitist
[20] R. Duan, R. Prodan, and T. Fahringer, ‘‘Performance and cost optimization multiobjective genetic algorithm: NSGA-II,’’ IEEE Trans. Evol. Comput.,
for multiple large-scale grid workflow applications,’’ in Proc. ACM/IEEE vol. 6, no. 2, pp. 182–197, Apr. 2002.
Conf. Supercomput., Nov. 2007, pp. 1–12. [43] Montage: An Astronomical Image Mosaic Engine, accessed on
[21] R. Garg and A. K. Singh, ‘‘Multi-objective workflow grid scheduling Feb. 5, 2016. [Online]. Available: http://montage.ipac.caltech.edu
using ε-fuzzy dominance sort based discrete particle swarm [44] D. A. Brown, P. R. Brady, A. Dietz, J. Cao, B. Johnson, and J. McNabb,
optimization,’’ J. Supercomput., vol. 68, no. 2, pp. 709–732, ‘‘A case study on the use of workflow technologies for scientific analysis:
2014. Gravitational wave data analysis,’’ in Workflows for e-Science. London,
[22] R. Prodan and M. Wieczorek, ‘‘Bi-criteria scheduling of scientific grid U.K.: Springer, 2007, pp. 39–59.
workflows,’’ IEEE Trans. Autom. Sci. Eng., vol. 7, no. 2, pp. 364–376, [45] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, and K. Vahi,
Apr. 2010. ‘‘Characterization of scientific workflows,’’ in Proc. 3rd IEEE Workshop
[23] Y. Yuan, X. Li, Q. Wang, and X. Zhu, ‘‘Deadline division-based heuristic Workflows Support Large-Scale Sci., Nov. 2008, pp. 1–10.
for cost optimization in workflow scheduling,’’ Inf. Sci., vol. 179, no. 15, [46] Workflow Generator, accessed on Feb. 28, 2016. [Online]. Available:
pp. 2562–2575, 2009. https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator.tle
[24] J. Yu and R. Buyya, ‘‘Scheduling scientific workflow applications with [47] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and
deadline and budget constraints using genetic algorithms,’’ Sci. Program., D. Epema, ‘‘A performance analysis of EC2 cloud computing services for
vol. 14, nos. 3–4, pp. 217–230, 2006. scientific computing,’’ in Cloud Computing. Berlin, Germany: Springer,
[25] W.-N. Chen and J. Z. J. Zhang, ‘‘An ant colony optimization approach 2010, pp. 115–131.
to a grid workflow scheduling problem with various QoS requirements,’’
IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 1, pp. 29–43,
Jan. 2009.
[26] S. Smanchat and K. Viriyapant, ‘‘Taxonomies of workflow scheduling
problem and techniques in the cloud,’’ Future Generat. Comput. Syst.,
vol. 52, pp. 1–12, Nov. 2015.
[27] E. N. Alkhanak, S. P. Lee, and S. U. R. Khan, ‘‘Cost-aware challenges
for workflow scheduling approaches in cloud computing environments:
Taxonomy and opportunities,’’ Future Generat. Comput. Syst., vol. 50,
pp. 3–21, Sep. 2015.
JASRAJ MEENA received the B.Tech. degree (Hons.) in information technology from Rajasthan
Technical University, Kota, India, in 2010, and the M.Tech. degree in computer engineering
from the National Institute of Technology, Kurukshetra, India, in 2012. He is currently
pursuing the Ph.D. degree in computer science and engineering from the National Institute
of Technology, Raipur, India. He has been an Assistant Professor with the Department of
Computer Science and Engineering, National Institute of Technology, Raipur, since 2013. His
current research interests include parallel and distributed systems, cloud computing, and
green computing.


MALAY KUMAR received the M.Tech. degree in computer engineering from the National Institute
of Technology, Kurukshetra, India, in 2012. He is currently pursuing the Ph.D. degree in
computer science and engineering from the National Institute of Technology, Raipur, India.
His current research interests include scheduling, distributed systems, and cloud computing.

MANU VARDHAN received the M.Tech. degree in computer science from BITS Pilani, Pilani,
India, in 2009, and the Ph.D. degree in computer science and engineering from the Motilal
Nehru National Institute of Technology Allahabad, India, in 2014. He has been an Assistant
Professor with the Department of Computer Science and Engineering, National Institute of
Technology, Raipur, since 2013. He has authored over 25 research papers in national and
international conferences and journals. His current research interests include distributed
systems and cloud computing.
