Research On Cloud Computing Resources Provisioning Based On Reinforcement Learning
1. Introduction
The concept of cloud computing vividly reflects the characteristics of information services in the Internet age; meanwhile, the pursuit of the cloud computing vision also brings new challenges to information technology. As a significant field of applied research, the data center is driving a series of technology innovations to deliver the key features of cloud computing, such as on-demand service, elastic extension, and massive data storage. The data center widely adopts virtualization technology to decouple physical resources from applications. Applications use the Virtual Machine (VM) as a packaging unit and share the underlying physical resources with one another; hence the resource scheduling entities are fine-grained VMs instead of coarse-grained physical servers. Virtualization technology provides convenience for the data center, but VM resource provisioning brings new challenges to the efficient management of the data center infrastructure.
As one of the core issues in cloud computing, resource management aims to shield the underlying resource heterogeneity and complexity by adopting virtualization technology, which makes the massive distributed resources form a unified, giant resource pool. Efficient resource provisioning and use can then be guaranteed by applying resource management methods and techniques rationally. However, achieving effective management of cloud computing resources faces a number of new challenges, which mainly appear as three types of imbalance.
(i) First, Imbalance in the Needs of Applications. A cloud computing application exhibits various workload behaviors, from control-intensive applications (such as search, sort, and analysis) to data-intensive ones (image processing, simulation, modeling, data mining, etc.). In addition, it also includes computationally intensive applications (iterative methods, numerical methods, financial modeling, etc.). The throughput of these various applications depends heavily on
Figure 1: Architecture of the cloud computing resource provisioning framework, showing user queues 1 and 2, user job scheduling, user job transmit, the Performance Monitor Agents (PMA), the Resource Management Agents (RMA), and the VMs.
Users Job Transmit (UJT). The execution results of queue 2 are transmitted to the corresponding user in light of the transmission strategy.
where 𝜌𝑖𝑗 = 𝜆𝑖𝑗/𝜇𝑖𝑗 and the Probability Density Function (PDF) of JQT is
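As an illustration only, if the 𝑗th VM of the 𝑖th VMC were modeled as an M/M/1 queue with arrival rate 𝜆𝑖𝑗 and service rate 𝜇𝑖𝑗 (an assumption made here for concreteness, not a claim about the model used above), the queueing-time density would take the standard form

f_JQT(t) = (1 − 𝜌𝑖𝑗) 𝛿(t) + 𝜆𝑖𝑗 (1 − 𝜌𝑖𝑗) e^(−𝜇𝑖𝑗(1 − 𝜌𝑖𝑗)t),    t ≥ 0,

where the atom at t = 0 accounts for jobs that find the VM idle.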
JQT ≤ SLA_JQT,
JET ≤ SLA_JET,          (7)
JTT ≤ SLA_JTT.

As long as the job meets the segment SLA constraint at each phase of its execution, the total response time satisfies the global SLA constraint. Moreover, the introduction of segment SLAs improves the QoS of the cloud computing platform effectively. For instance, when a JQT violates its SLA because of a resource shortage, I/O deadlock, or conflicts, a higher priority is given to the job execution in the upcoming phases, guaranteeing the resources for the job and reducing the corresponding JET and JTT, respectively, so that the overall SLA of the user job still satisfies the QoS constraints. An example of a medical image analysis application in a cloud computing environment is shown in Figure 2.

UUTC = Total cost / 𝑇tot.          (8)

Physically, UUTC is the ratio of the operation cost to the actual execution time. It optimizes the constraints in terms of resource utilization rate and improves the optimized function. For a user’s job, the resource provisioning optimization problem in the cloud computing platform can therefore be stated as

Maximize UUTC over {∀ user job}
subject to JQT ≤ SLA_JQT,
           JET ≤ SLA_JET,          (9)
           JTT ≤ SLA_JTT.

As shown in (9), the RMA makes a decision by obtaining the performance index at every observation moment on the basis of the PMA. Therefore, this is a sequential decision-making problem. To solve it, we propose a scheme in this study that employs reinforcement learning, described in detail in Section 4.
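A minimal sketch (not the authors’ implementation) of how the segment-SLA feasibility check in (7) and the UUTC objective in (8)-(9) could be evaluated; the field names and the assumption that 𝑇tot is the sum of the three phase times are illustrative.

from dataclasses import dataclass

@dataclass
class JobRecord:
    total_cost: float   # operation cost accumulated for the job (dollars); illustrative field
    jqt: float          # job queueing time
    jet: float          # job execution time
    jtt: float          # job transfer time

@dataclass
class SegmentSLA:
    sla_jqt: float
    sla_jet: float
    sla_jtt: float

def meets_segment_sla(job: JobRecord, sla: SegmentSLA) -> bool:
    # Constraints of (7)/(9): every phase must respect its segment SLA.
    return job.jqt <= sla.sla_jqt and job.jet <= sla.sla_jet and job.jtt <= sla.sla_jtt

def uutc(job: JobRecord) -> float:
    # Equation (8): UUTC = total cost / T_tot, here assuming T_tot = JQT + JET + JTT.
    return job.total_cost / (job.jqt + job.jet + job.jtt)

def best_feasible(jobs, sla):
    # Spirit of (9): among SLA-feasible outcomes, prefer the largest UUTC,
    # that is, the highest resource utilization per unit time.
    feasible = [j for j in jobs if meets_segment_sla(j, sla)]
    return max(feasible, key=uutc) if feasible else None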
Figure 3: Comparison results of VCPU resource provisioning and SLA conflict detection between the basic 𝑄 learning scheme and the utilization scheme. (a) Comparison of VCPU number between the basic 𝑄 learning scheme and the utilization scheme under various job numbers. (b) SLA conflict detection between the basic 𝑄 learning scheme and the utilization scheme under various job numbers.

that the acquired relation can be approximately formulated with the regression function. The pseudocode of the 𝑄-value offline learning algorithm is illustrated in Algorithm 2. Although the 𝑄 value table resulting from offline learning is rather large, a data index can be used to accelerate the search and thus improve search efficiency.

(1) Divide state space
(2) Repeat
(3) for each state space partition do
(4) set upper and low bound of CPU, memory and bandwidth
(5) Obtain running state of cloud computing platform
(6) Obtain performance index
(7) for each resources do
(8) using Algorithm 1
(9) end for
(10) end for
(11) Update 𝑄 value table

Algorithm 2: Offline reinforcement learning algorithm.

(1) Obtain running state of cloud computing platform
(2) Look up 𝑄 value table, configure VM resources
(3) Obtain performance index
(4) use belief library
(5) set upper and low bound of VCPU, memory and bandwidth
(6) action space compact
(7) for each resources do
(8) using Algorithm 1
(9) end for
(10) Update 𝑄 value table

Algorithm 3: Online reinforcement learning algorithm.
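For concreteness, the following is a compact sketch of the tabular 𝑄-value update on which Algorithms 2 and 3 both rely. The state encoding, the three-action set (remove, keep, or add one VCPU), and the learning parameters are illustrative assumptions, and the reward would in practice be derived from the PMA-reported performance index rather than computed here.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate (assumed)
ACTIONS = (-1, 0, +1)                   # remove / keep / add one VCPU (illustrative action set)

q_table = defaultdict(float)            # Q[(state, action)] -> estimated value

def choose_action(state, actions=ACTIONS):
    # epsilon-greedy selection over the (possibly pruned) action set
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state):
    # One-step Q-learning update shared by the offline and online phases
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])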
4.3.2. Belief Libraries and Simple Action Space. Set up a VCPU belief library whose rules are similar to the way the belief library was built in [17]. Based on the established belief library, the action space can be simplified correspondingly. For example, when provisioning resources, if the VCPU utilization approaches the lower bound, the increase action that raises VCPU resources should be removed from the action space; otherwise, the decrease action that reduces VCPU resources should be removed. The establishment of the belief library and the simplification of the action space effectively avoid blindness in 𝑄 learning action selection and thus improve the convergence speed (a small sketch of this pruning is given after Section 4.3.3).

4.3.3. Online Learning. The offline learning environment can only simulate part of the real operating environment. The acquired function is a viable strategy for resource provisioning only when the job arrival rate meets the SLA constraint; even then, it may still be suboptimal. However, this initial strategy sets an upper boundary for resource provisioning, and under its guidance we can learn online to further improve resource utilization. The real-time resource utilization rate acquired by the PMA guides RMA learning. The pseudocode of the 𝑄-value online learning algorithm is illustrated in Algorithm 3.
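A minimal sketch of the action-space simplification described in Section 4.3.2, assuming hypothetical utilization thresholds for the belief rule (the bounds 0.2 and 0.8 are placeholders, not values from the paper):

def prune_actions(vcpu_util, lower=0.2, upper=0.8, actions=(-1, 0, +1)):
    # Belief-library rule: near the lower utilization bound, adding VCPUs is
    # unnecessary, so drop the "increase" action; near the upper bound, drop
    # the "decrease" action. This shrinks the action space before selection.
    if vcpu_util <= lower:
        return tuple(a for a in actions if a <= 0)   # no increase
    if vcpu_util >= upper:
        return tuple(a for a in actions if a >= 0)   # no decrease
    return actions

With this pruning, the 𝜀-greedy selection in the previous sketch simply receives prune_actions(utilization) instead of the full action set.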
Figure 4: Comparison results of VCPU resource provisioning and SLA conflict detection between the improved 𝑄 learning scheme, the basic 𝑄 learning scheme, and the utilization scheme. (a) Comparison of VCPU number between the improved 𝑄 learning scheme, the basic 𝑄 learning scheme, and the utilization scheme under various job numbers. (b) SLA conflict detection between the improved 𝑄 learning scheme, the basic 𝑄 learning scheme, and the utilization scheme under various job numbers.
5. Experimental Results
To evaluate the efficiency of our approach, experiments have been performed in both a simulated and a real cloud computing environment.

5.1. Simulation Experiment Results. Using MATLAB R2012a by MathWorks, Inc., we developed a discrete event simulator of the cloud server farm to validate the efficiency of the resource provisioning solution and compared the performance of the alternative schemes in our simulations.

We evaluated the performance of the improved 𝑄 learning strategy in Figure 4 and compared it with the utilization rate scheme and the basic 𝑄 learning scheme under the same experimental conditions as in Section 4.2. From the experimental results in Figure 4, on one hand, we can see that the basic 𝑄 learning scheme is still likely to make wrong decisions, resulting in SLA conflicts and frequent VCPU resource provisions. On the other hand, the improved 𝑄 learning scheme can provision the VCPU resources in real time based on the changes in arrival rates; more importantly, apart from avoiding SLA conflicts, the number of VCPU resources it uses is lower than that of the utilization rate scheme most of the time. In other words, while avoiding SLA conflicts, the improved 𝑄 learning method improves the utilization rate of resources.

Next, we compare the improved 𝑄 learning scheme with the prevalent resource provisioning schemes: (1) the proposed resources provisioning scheme, denoted by the improved 𝑄 learning scheme, in which the cloud computing resources were optimally scheduled to the VMs by the improved 𝑄 learning algorithm; (2) the utilization resources provisioning scheme, denoted by the utilization scheme, in which the cloud computing resources were optimally scheduled to the VMs by resource utilization; (3) the genetic algorithm resources provisioning scheme, denoted by the GA scheme, in which the cloud computing resources were optimally scheduled to the VMs by a genetic algorithm; (4) the nonlinear programming resources provisioning scheme, denoted by the nonlinear programming scheme, in which the cloud computing resources were optimally scheduled to the VMs by nonlinear programming.

As seen from the experimental results in Figure 5(a), the number of VCPU resources in the GA scheme and the nonlinear programming scheme may change frequently due to the objective function optimization. Nevertheless, the improved 𝑄 learning scheme and the utilization scheme demonstrate the same performances as those in Figure 4(a), respectively. With pricing settings similar to Table 1, Figure 5(b) shows that the total cost of the improved 𝑄 learning strategy designed in this paper is lower than that of the compared schemes.

We ran the simulation program 1000 times at various arrival rates, and the average results of the various performance indicators are shown in Table 2. As the results show, the UUTC at different arrival rates is better than that of the compared algorithms. In light of the definition of UUTC, the numerator aims to maximize the current operation cost, while the denominator aims to minimize the current execution time. The ratio between the two indicates the utilization rate of cloud computing resources per unit time. Accordingly, the greater the value, the higher the corresponding utilization rate of the cloud computing resource.
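As a rough sketch of the batch evaluation behind Table 2: the run_simulation stub below is a hypothetical stand-in for the discrete event simulator, and its returned indicator values are placeholders, not results from the paper.

import random
from statistics import mean

def run_simulation(scheme, arrival_rate):
    # Hypothetical stub standing in for the discrete event simulator; it returns
    # randomly perturbed indicators only so that the averaging loop below runs.
    base = {"total_running_time": 300.0, "total_cost": 6000.0, "mean_vcpu": 5.0, "uutc": 3.0}
    return {k: v * random.uniform(0.9, 1.1) for k, v in base.items()}

def average_indicators(scheme, arrival_rate, runs=1000):
    # Average the per-run indicators, as reported for each scheme in Table 2.
    results = [run_simulation(scheme, arrival_rate) for _ in range(runs)]
    return {key: mean(r[key] for r in results) for key in results[0]}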
Table 2: Comparison results of various resources allocation schemes.

Arrival rate     Scheme            Total running time   Total cost   Mean VCPU numbers            UUTC
(jobs per min.)                    (min.)               (dollars)    (numbers per arrival rate)   (dollars per min.)
30               RPSUT             364.61               7270         7.36                         2.70
30               RPSGA             381.68               3930         7.36                         1.39
30               RPSNP             368.36               4353         7.36                         1.60
30               Proposed scheme   368.36               5204         4.76                         2.96
40               RPSUT             247.20               7433         8.20                         3.66
40               RPSGA             273.12               5788         8.20                         2.58
40               RPSNP             278.21               5349         8.20                         2.34
40               Proposed scheme   250.30               3727         3.87                         3.84
50               RPSUT             216.14               8688         8.18                         4.91
50               RPSGA             244.06               8100         8.18                         4.05
50               RPSNP             232.41               6376         8.18                         3.35
50               Proposed scheme   218.63               5448         4.45                         5.59
60               RPSUT             148.39               9540         8.78                         7.32
60               RPSGA             172.75               8718         8.78                         5.74
60               RPSNP             176.30               7569         8.78                         4.88
60               Proposed scheme   149.85               6578         4.43                         9.90
70               RPSUT             147.48               9611         9.96                         6.54
70               RPSGA             175.78               10025        9.96                         5.72
70               RPSNP             179.37               9878         9.96                         5.529
70               Proposed scheme   147.87               7820         7.69                         6.87
Figure 5: Comparison results of VCPU resource provisioning and total cost between the improved 𝑄 learning scheme, the GA provisioning scheme, the utilization scheme, and the nonlinear programming provisioning scheme. (a) Comparison of VCPU number between the improved 𝑄 learning scheme, the GA provisioning scheme, the utilization scheme, and the nonlinear programming provisioning scheme under various job numbers. (b) Comparison of total cost between the improved 𝑄 learning scheme, the GA provisioning scheme, the utilization scheme, and the nonlinear programming provisioning scheme under various job arrival rates.
Table 3: Comparison results of average throughput under various warehouses (Bops).

Warehouse   Max         Utilization scheme   Proposed scheme
1           32103.43    32847.88             33767.96
2           63252.47    59510.64             59951.13
3           87267.56    82380.99             83919.59
4           105719.64   104081.37            99212.12
5           115648.76   117200.12            116671.41
6           119057.39   121530.59            122071.97
7           123759.19   126187.14            124289.81
8           120844.70   122457.18            122758.30

Figure 6: SPECjbb2005 throughput in default settings (scores versus number of warehouses; the legend distinguishes warehouses included and not included in the score calculation).
Abbreviations
UI: Users interface
UJQ: Users job queue
UJS: Users Job Scheduling
UJT: Users Job Transmit
VM: Virtual Machine
VMC: Virtual Machine Cluster
VMCA: Virtual Machine Cluster Agent
PMA: Performance Monitor Agent
RMA: Resource Management Agent
JRT: Job Response Time
JQT: Job Queueing Time
JET: Job Execution Time
JTT: Job Transfer Time
QoS: Quality of Service
SSLA: Segmentation SLA
UUTC: Utility Unit Time Cost
MDP: Markov decision process
PDF: Probability Density Function
ML: Machine learning
RL: Reinforcement learning
𝜆: Job arrival rate of the cloud computing platform
𝜆𝑖𝑗: Job arrival rate of the 𝑗th VM in the 𝑖th VMC
𝜇𝑖𝑗: Job service rate of the 𝑗th VM in the 𝑖th VMC
𝐷𝑖𝑗: The executed result size of the user job
𝐵𝑖𝑗: The allocated bandwidth resources of the user job.
Conflict of Interests
The authors declare that there is no conflict of interests
regarding the publication of this paper.
Acknowledgment
The work presented in this paper was supported by the National Natural Science Foundation of China (nos. 61272382 and 61402183).
References
[1] H.-S. Wu, C.-J. Wang, and J.-Y. Xie, “TeraScaler ELB-an
algorithm of prediction-based elastic load balancing resource
management in cloud computing,” in Proceedings of the 27th
International Conference on Advanced Information Networking
and Applications Workshops (WAINA ’13), pp. 649–654, IEEE,
Barcelona, Spain, March 2013.
[2] Y. Gao, H. Guan, Z. W. Qi, T. Song, F. Huan, and L. Liu,
“Service level agreement based energy-efficient resource
management in cloud data centers,” Computers and
Electrical Engineering, vol. 40, pp. 1621–1633, 2013.
[3] V. Suresh, P. Ezhilchelvan, and P. Watson, “Scalable and
responsive event processing in the cloud,” Philosophical
Transactions of the Royal Society A, vol. 371, no. 1983, Article
ID 20120095, 2013.
[4] X. Nan, Y. He, and L. Guan, “Queueing model based
resource optimization for multimedia cloud,” Journal of
Visual Communication and Image Representation, vol. 25,
no. 5, pp. 928–942, 2014.
[5] H. Nguyen, T. N. Minh, and N. Thoai, “Tool-driven
strategies for resource provisioning of single-tier web
applications in clouds,” in Proceedings of the 5th
International Conference on Ubiquitous and Future
Networks (ICUFN ’13), pp. 795–799, July 2013.
[6] S. Yu, Y. Tian, S. Guo, and D. O. Wu, “Can we beat DDoS
attacks in clouds?” IEEE Transactions on Parallel and
Distributed Systems, vol. 25, no. 9, pp. 2245–2254, 2014.
[7] K. Salah and R. Boutaba, “Estimating service response time
for elastic cloud applications,” in Proceedings of the 1st
IEEE International Conference on Cloud Networking
(Cloud Net ’12), pp. 12–16, Jussieu, Paris, November 2012.
[8] H. Khazaei, J. Misic, and V. B. Misic, “Performance
analysis of cloud computing centers using M/G/m/m+r
queuing systems,” IEEE Transactions on Parallel and
Distributed Systems, vol. 23, no. 5, pp. 936–943, 2012.
[9] H. Wada, J. Suzuki, Y. Yamano, and K. Oba, “Evolutionary
deployment optimization for service-oriented clouds,” Software: Practice and Experience, vol. 41, no. 5, pp. 469–493,
2011.
[10] M. Bourguiba, K. Haddadou, I. E. Korbi, and G. Pujolle,
“Improving network I/O virtualization for cloud computing,”
IEEE Transactions on Parallel and Distributed Systems, vol.
25, no. 3, pp. 673–681, 2014.
[11] Z. Luo and Z. Qian, “Burstiness-aware server consolidation
via queuing theory approach in a computing cloud,” in
Proceedings of the 27th IEEE International Symposium on
Parallel & Distributed Processing (IPDPS ’13), pp. 332–341,
IEEE, Cambridge, Mass, USA, May 2013.
[12] F.-C. Jiang, C.-T. Yang, C.-H. Hsu, and Y.-J. Chiang,
“Optimization technique on logistic economy for cloud
computing using finite-source queuing systems,” in
Proceedings of the 4th IEEE International Conference on
Cloud Computing Technology and Science (CloudCom ’12),
pp. 827–832, December 2012.
[13] G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford University Press, 3rd edition, 2010.
[14] Y. Wu, C. Wu, B. Li, X. Qiu, and F. C. M. Lau, “CloudMedia:
when cloud on demand meets video on demand,” in
Proceedings of the 31st International Conference on
Distributed Computing Systems (ICDCS ’11), pp. 268–277,
IEEE, Minneapolis, Minn, USA, July 2011.
[15] J. Zheng and E. Regentova, “QoS-based dynamic channel
allocation for GSM/GPRS networks,” in Network and Parallel
Computing, vol. 3779 of Lecture Notes in Computer Science,
pp. 285–294, Springer, Berlin, Germany, 2005.
[16] A. Wolke and G. Meixner, “Twospot: a cloud platform for
scaling out web applications dynamically,” in Proceedings of the 3rd European Conference on Service Wave, pp. 13–24, Ghent,
Belgium, 2010.
[17] X. Bu, J. Rao, and C.-Z. Xu, “A reinforcement learning
approach to online web systems auto-configuration,” in
Proceedings of the 29th IEEE International Conference on
Distributed Computing Systems Workshops (ICDCS ’09), pp.
2–11, Montreal, Canada, June 2009.
[18] http://www.spec.org/jbb2005/.