Cost Effective Genetic Algorithm For Workflow Scheduling in Cloud Under Deadline Constraint

This paper presents a Cost Effective Genetic Algorithm (CEGA) for workflow scheduling in cloud computing, focusing on minimizing execution costs while adhering to deadline constraints. The proposed algorithm addresses challenges such as virtual machine performance variation and acquisition delays, which are often overlooked in existing research. Performance evaluations demonstrate that CEGA outperforms current state-of-the-art algorithms on various scientific workflows.

Received May 31, 2016, accepted June 28, 2016, date of publication August 11, 2016, date of current version September 28, 2016.

Digital Object Identifier 10.1109/ACCESS.2016.2593903

Cost Effective Genetic Algorithm for Workflow Scheduling in Cloud Under Deadline Constraint
JASRAJ MEENA, MALAY KUMAR, AND MANU VARDHAN
Department of Computer Science and Engineering, National Institute of Technology, Raipur 492010, India
Corresponding author: J. Meena ([email protected])

ABSTRACT Cloud computing is becoming an increasingly popular paradigm that delivers high-performance computing resources over the Internet to solve large-scale scientific problems, but it still poses various challenges that must be addressed in order to execute scientific workflows. Existing research has mainly focused on minimizing finishing time (makespan) or cost while meeting quality of service requirements. However, most existing approaches do not consider the essential characteristics of the cloud or major issues such as virtual machine (VM) performance variation and acquisition delay. In this paper, we propose a meta-heuristic cost effective genetic algorithm that minimizes the execution cost of the workflow while meeting the deadline in a cloud computing environment. We develop novel schemes for the encoding, population initialization, crossover, and mutation operators of the genetic algorithm. Our proposal considers all the essential characteristics of the cloud as well as VM performance variation and acquisition delay. Performance evaluation on well-known scientific workflows of different sizes, such as Montage, LIGO, CyberShake, and Epigenomics, shows that our proposed algorithm performs better than the current state-of-the-art algorithms.

INDEX TERMS Cloud computing, scientific workflows, resource provisioning, scheduling, quality of
service (QoS).

I. INTRODUCTION
Workflows are widely used to model large-scale scientific and engineering applications in domains such as Earth Science, Astronomy, Physics, and Bioinformatics [1], [2]. A workflow is represented as a Directed Acyclic Graph (DAG) consisting of nodes and edges. The nodes represent the computational tasks, and the edges represent the data/control dependencies of large-scale scientific applications [2]. The size of a workflow may be small or very large, depending on the type of scientific application. To run workflows of varying size, scientists need a high-performance computing environment such as cluster computing, grid computing, or, most recently, cloud computing. Many existing projects, for example Pegasus [3], GrADS [4], and ASKALON [5], are designed to execute large scientific workflow applications on grid computing.
Nowadays, cloud computing is becoming an increasingly popular paradigm that delivers high-performance computing resources over the Internet to run large-scale scientific applications. A novel approach therefore needs to be developed that exploits the opportunities of the cloud, addresses its challenges, and produces a cost-effective schedule. Further, the cloud provides computing as a utility service, and its services are categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [6], [7]. An IaaS cloud provides hardware resources in the form of virtual resources such as computation, memory, storage, and networking. A PaaS cloud offers an environment in which users develop and deploy their applications, and a SaaS cloud provides web applications/software over the Internet, running on cloud infrastructure. PaaS and SaaS are not feasible for large-scale scientific workflow applications because they provide only an environment to design, develop, and test web-based applications [8]. This proposal focuses on IaaS clouds, which provide several cost and performance benefits compared to cluster and grid computing. First, they provide on-demand resource provisioning, and these resources are controlled by service consumers.
Second, a cloud service provider allows the end user to procure and release computing resources on demand.
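The DAG representation of a workflow described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five task names, the dictionary layout (task mapped to its set of parent tasks), and the helper functions are hypothetical and are not taken from the paper. The sketch uses the standard-library graphlib module (Python 3.9 or later) to derive an order in which tasks can run so that every task follows the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical five-task workflow (not from the paper): each key is a task,
# and its value is the set of parent tasks it depends on (the DAG edges).
workflow = {
    "t1": set(),
    "t2": {"t1"},
    "t3": {"t1"},
    "t4": {"t2", "t3"},
    "t5": {"t4"},
}


def execution_order(dag):
    """Return one valid order in which tasks can run (parents before children)."""
    return list(TopologicalSorter(dag).static_order())


def entry_tasks(dag):
    """Tasks with no parents, i.e. the tasks that can start immediately."""
    return sorted(task for task, parents in dag.items() if not parents)


def exit_tasks(dag):
    """Tasks that no other task depends on, i.e. the final tasks."""
    tasks_with_children = set().union(*dag.values())
    return sorted(task for task in dag if task not in tasks_with_children)


order = execution_order(workflow)
print(order)
print(entry_tasks(workflow), exit_tasks(workflow))
```

If the dependencies contained a cycle, TopologicalSorter would raise graphlib.CycleError, which gives a simple check that the input really is acyclic.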

2169-3536 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 4, 2016 5065
J. Meena et al.: CEGA for Workflow Scheduling in Cloud

Therefore, scientific applications can grow or shrink their resource pool as needed. The cloud allocates only the required computing resources from the resource pool, so overall resource utilization can be increased while the total execution cost is reduced [6]. Third, cloud service providers charge service consumers under a pay-per-use pricing model, in which consumers pay only for the computing resources they have used. Most service providers charge the user for a whole time interval even if only a fraction of the last interval is used. For example, if the user uses computing resources for 61 minutes and the time interval is 60 minutes, the user has to pay for two time intervals. In this research, we use a pricing model similar to those provided by Amazon [9], Google [10], and CloudSigma [11].
Although cloud computing has various benefits, it still has some issues that need to be addressed. First, the performance of VMs can vary due to virtualization of hardware resources, the multi-tenancy of cloud infrastructure, and heterogeneous computing resources. According to [12], the overall CPU performance of a VM can vary by up to 24 percent in the Amazon public cloud. Moreover, [13] reported that in a typical cloud environment the performance variation can be up to 30% in execution time and up to 65% in data transfer time. Because of this variation in VM performance, a scientific workflow can miss its deadline or exceed its budget. Performance variation therefore has a significant impact on the scheduling of workflows in a cloud computing environment. Our proposed algorithm considers VM performance variation in order to meet the user-defined deadline constraint. Second, when a VM is leased, it takes time to initialize properly (acquisition delay); similarly, when computing resources are released, they take time to shut down (termination delay). A longer resource acquisition time increases the total execution time, and a longer shutdown time increases the overall cost of the workflow. Thus, these delays affect the overall performance and cost of scientific workflows. In our proposed work, we assume an average acquisition delay for each type of VM, which accounts for the overall delay in the execution of workflows. We do not focus on termination delay because it does not affect the deadline constraint. Third, if a computing resource such as a VM fails during the execution of a workflow, the total execution time increases. However, the cloud environment provides 99.9 percent availability [14] of computing resources, so performance variation is a more serious problem than VM failure. We therefore do not focus on VM failure in the execution of scientific workflows; our primary focus is to make the cloud computing environment cost effective.
The above discussion of the benefits and issues of the cloud motivates the development of a novel cost-effective workflow scheduling algorithm for large-scale scientific workflows. We therefore propose a Cost Effective Genetic Algorithm (CEGA) to schedule each task of the workflow on a suitable resource. Further, workflow scheduling in the cloud consists of two main steps. In the first step, a set of computing resources is selected from the cloud to execute the computational tasks, and the selected resources are then provisioned. In the second step, a schedule is generated and each task is mapped to a suitable resource, so that the overall execution cost is minimized while the deadline constraint is met. Furthermore, most previous work on workflow scheduling in clusters and grids focuses mainly on the resource planning phase, that is, the scheduling of tasks on suitable resources. The reason is that cluster and grid computing provide only a static pool of computing resources whose configuration is known in advance. Most researchers in the grid computing area have focused on minimizing execution time (makespan), whereas cloud researchers consider, besides execution time, execution cost, energy consumption, a high degree of security, and fault-tolerant workflow scheduling that satisfies the user's QoS requirements.
Our research work is based on the meta-heuristic Genetic Algorithm (GA), a model that was introduced by John Holland in 1975 [15] and is inspired by evolution. A GA encodes a potential solution to a particular problem on a single chromosome and applies genetic operators such as crossover and mutation to improve that solution. GAs are often viewed as function optimizers, and they have been applied to a broad range of problems such as Pattern Recognition, Image Processing, and Data Mining, among others. In this paper, we propose a Cost Effective Genetic Algorithm (CEGA) workflow scheduling algorithm that schedules each task of the workflow on the cheapest resource in the cloud. CEGA considers all characteristics of the cloud, such as on-demand resource provisioning, elasticity, and the pay-per-use pricing model, and it also addresses issues such as VM performance variation and acquisition delay. To achieve this, novel schemes are presented for encoding, population initialization, and the genetic operators. Our proposal not only minimizes the overall execution cost of the workflow but also meets the hard deadline constraint in a cloud computing environment.
The remainder of this paper is organized as follows. Section 2 discusses related work, followed by the architecture of workflow scheduling, including the application model and cloud resource model, in Section 3. The problem is defined in Section 4. The proposed meta-heuristic CEGA scheduling algorithm is explained in Section 5. Section 6 presents the performance evaluation, and Section 7 concludes this research work and discusses future work.

II. RELATED WORK
Multitask workflow scheduling on parallel and distributed systems has been studied extensively over the years, and it is an NP-hard problem [16], [17]. Therefore, it is very difficult to produce an optimal solution in polynomial time. However, various heuristic and meta-heuristic algorithms have been proposed that provide approximate or near-optimal solutions for parallel and distributed paradigms such as grid and cluster computing [18]–[25]. Although, very few of the