
2016 International Conference on Cloud Computing Research and Innovations

Machine Learning with Sensitivity Analysis to Determine
Key Factors Contributing to Energy Consumption in
Cloud Data Centers

Yong Wee Foo 1,2, Cindy Goh 1, Yun Li 1

1 School of Engineering, University of Glasgow, Glasgow, U.K.
2 School of Engineering, Nanyang Polytechnic, Singapore
2 [email protected]

Abstract—Machine learning (ML) approach to modeling and predicting real-world dynamic system behaviours has received widespread research interest. While the ML capability in approximating any nonlinear or complex system is promising, it is often a black-box approach, which lacks the physical meanings of the actual system structure and its parameters, as well as their impacts on the system. This paper establishes a model to provide an explanation of how system parameters affect its output(s), as such knowledge would lead to potentially useful, interesting and novel information. The paper builds on our previous work in ML, and combines an evolutionary artificial neural network with sensitivity analysis to extract and validate key factors affecting the cloud data center energy performance. This provides an opportunity for software analysts to design and develop energy-aware applications and for Hadoop administrators to optimize the Hadoop infrastructure by having Big Data partitioned in bigger chunks and shortening the time to complete MapReduce jobs.

Keywords—machine learning, artificial neural networks, sensitivity analysis, cloud computing, energy efficiency, genetic algorithm

I. INTRODUCTION

The accelerated growth in cloud computing is expected to drive energy consumption of cloud data centers to new highs. An effective engineering ratio that measures data center energy efficiency is the Power Usage Effectiveness (PUE). This term records baseline data and traces energy efficiency movements. It is expressed as a ratio as shown in Eq. 1, with the overall energy efficiency improving as the value decreases towards 1.

    PUE = Σ(P_mech + P_elect + P_compute + P_others) / Σ(P_compute)    (1)

where the numerator denotes the sum of all power consumed by the cloud data center, including the mechanical facility (chillers and computer room air-con, or CRAC), the electrical facility (switchgear, UPS, battery backup), the ICT computing infrastructure (servers, storage, networks and telecommunications equipment) plus any other devices (lightings, printers, personal computers, VoIP phones, fax machines, etc.) expended to support the cloud data center operations, and the denominator denotes the sum of all power consumed by the ICT computing infrastructure only, which produces useful IT work.

Based on the Uptime Institute's 2014 Data Center Industry Survey, the average data center PUE has only improved slightly, from 1.89 in 2011 to 1.7 in 2014 [1]. This is a gain of 11.2%, compared to a gain of 32.3% (from a PUE of 2.50 to 1.89) between 2007 and 2011. The survey also reported that 77% of the participating industry cited that management has set a target PUE, with more than half expecting to lower the PUE to 1.5 or better. This scenario presents a tremendous opportunity for researchers, engineers and technologists to further improve cloud data center energy efficiency.

Recently, applying data-driven ML techniques to lower cloud data center energy consumption has been a hot research topic. Gao [2] proposed improving the cloud data center Power Usage Effectiveness (PUE) using an ML technique based on artificial neural networks (NN). The feed-forward NN takes in 19 input variables, contains 5 hidden layers with 50 nodes per layer, and outputs 1 variable. The NN model achieved a high predictive accuracy, with a mean absolute error of 0.004 and a standard deviation of 0.005 on the test dataset. The model was validated with a 'live' experiment conducted by simulating an actual increase in process water supply temperature to the server floor by 3°F (or ~1.7°C), resulting in an expected decrease of ~0.005 PUE as predicted by the model.

Chen [3] suggested a spatially-aware Virtual Machine (VM) workload placement method, called SpAWM, to optimize the consumption of power and cooling in cloud data centers. SpAWM adopts the ML approach using a neural network and reinforcement learning (RL). Developed from the Markov Decision Process (MDP), the central idea in RL is to learn the optimal action a_t via a trial-and-error process, after taking into account every state s_t visited by the system. For every state-action pair, the RL keeps an associated Q value. If the selected action in a state is positive, the feedback increases the Q value; else, if the feedback is negative, the Q value is decreased. In RL, the new Q value is updated via Eq. 2,

    Q(s_t, a_t) = α{r_t + λ[Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]}    (2)

where Q(s_{t+1}, a_{t+1}) denotes the Q value of the next state t+1, Q(s_t, a_t) denotes the Q value of the current state t, α is the learning rate, λ is the discount factor and r_t is the immediate reward received at state t. The number of Q values can grow exponentially if the state-action space is large, hence NN modeling is utilized for the cloud data center environment to capture the relationship between resource utilization (state space), workload assignments (actions) and thermal distribution

978-1-5090-3951-7/16 $31.00 © 2016 IEEE 107


DOI 10.1109/ICCCRI.2016.24
(reward function). In the paper, the NN RL is designed to optimize the objective function reward R, which is expressed in Eq. 3,

    R = T_th^in − max{T_i^in},  i ∈ [1, n]    (3)

where T_th^in is the safe threshold for the server inlet temperature and max{T_i^in} is the maximum observed inlet temperature over the i-th servers; their difference results in R, the adjustable temperature margin for the CRAC. Therefore, the higher the R, the higher the savings in cooling energy. During the experiment, the environment is constantly monitored while VM workloads are being distributed in an energy-efficient manner by SpAWM to maintain a high R. The ML approach employs a backpropagation feed-forward NN with 6 input variables, 1 hidden layer of 20 nodes, and 6 outputs. The input variables are the server resource utilization states and the outputs are the inlet temperatures to be predicted. The experiment is based on data collected from 6 blade server enclosures from each rack, up to a total of 10 racks. Results from the simulated workload showed that the NN RL is able to accurately predict the server inlet temperature, enabling SpAWM with the energy-aware capability to optimize VM placement to servers. The trial-and-error nature of RL suggests there would be an initial penalty in the form of wasted energy due to the 'exploratory' nature of the algorithm. That is, VM workloads may be assigned to less optimal servers in order to 'explore' the potential solution space in search of better 'rewards'. Premature convergence or convergence to a local minimum (sub-optimal convergence) could also dampen energy savings, as tuning the learning rate and discount factor to avoid such is an expensive process.

Tarutani et al. [4] applied an ML technique using regression models to predict the cloud data center temperature distribution. The power consumption is then reduced via pro-active control of the server load and tuning of the CRAC settings. The inputs to the model consist of the power consumption of servers, server intake air temperature, UPS power consumption, the current temperature distribution and its temporal changes, CRAC outlet air temperature, and CRAC air volume. The inputs are denoted by x(t) = (x_1(t), x_2(t), …, x_N(t)), where N is the number of sensor inputs or cloud data center operation parameters at time-step t. The output is the predicted temperature distribution in the cloud data center at time-step t, as denoted by y(t) = (y_1(t), y_2(t), …, y_M(t)), where M is the number of temperature sensors. The predicted temperature ŷ_k(t + Δt) at time-step t + Δt is given by Eq. 4,

    ŷ_k(t + Δt) = y_k(t) + F_k(x(t))    (4)

where y_k(t) is the temperature obtained at time-step t by the k-th sensor and F_k(x(t)) is the function for predicting the temperature of the k-th sensor given the input variables at time-step t. The sum of squared errors is given by Eq. 5,

    e_k = Σ_{t=1}^{n} (ŷ_k(t + Δt) − y_k(t + Δt))²    (5)

where n is the size of the training dataset. However, the large number of input variables has compelled a reduction in the data dimensionality. Working with the transformed data, whereby the input dataset is significantly reduced, improves the learning process. A data compression technique using Principal Component Analysis (PCA) is then applied in addition to the regression model. Since there are correlations among the input variables, PCA basically compresses the data by expressing it in terms of the patterns between the inputs. The components of x(t) are reduced to values denoted by p(t) = (p_1(t), p_2(t), …, p_C(t)), where C ≪ N is the number of feature values. With this, Eq. (4) can be rewritten as:

    ŷ_k(t + Δt) = y_k(t) + F_k'(p(t))    (6)

Finally, Tarutani et al. compared the prediction model with a random forest method that utilizes decision trees as weak learners to avoid overfitting and to increase the accuracy and speed of the prediction. The results show that the number of input features selected affects the predictive accuracy of both the linear regression method and the random forest method. A higher number of feature inputs leads to lower predictive accuracy. As the number of sensors in the data center grows, which is inevitable, feature selection to reduce the number of inputs would become a challenge.

In this paper, we apply the ML approach combining an evolutionary NN and SA to model the energy consumption of a dynamic cloud data center. Our approach employs a Genetic Algorithm (GA) for feature selection utilizing SA as a guide. The feature subset, along with the NN architecture, is represented by a structurally-inclusive encoding scheme in the form of a chromosome matrix. A population of chromosomes is maintained through the genetic processes of crossover and mutation. The chromosome's fitness is evaluated at every generation to determine its survival in the next generation. The algorithm "prunes" away connections between the neurons to deemphasize a particular input neuron's contribution to the NN's output should such an input feature cause the chromosome to have a weak fitness. The eventual NN is a network with reduced complexity. This leads to a model with better generalization at a lower computational cost. A complex NN has a high computational cost and the tendency to overfit. The proposed evolutionary NN combined with SA helps to extract the key factors impacting the cloud data center energy performance. This information provides insights for better decision-making and management of the cloud data center energy consumption. The rest of the paper is organized as follows: Section II explains the evolutionary NN, Section III describes the data collection and SA approach, the experiments and results are discussed in Section IV, and Section V concludes with recommendations for future work.

II. EVOLUTIONARY NEURAL NETWORK

The machine learning approach to reducing energy consumption in cloud data centers is a viable solution. To appreciate the interest in this hot topic relating to ML applications for energy-efficient management in cloud computing environments, one may refer to the survey papers by Demirci [5], Tantar [6] and Zhan [9]. An evolutionary NN as an ML approach to modeling and predicting a non-linear dynamic system such as the cloud data center is a powerful and promising approach. However, one area that has not been adequately addressed is that of establishing the impacts

of the inputs to a FFNN; in particular, our interest is to discover which input features have the most impact on the cloud data center energy performance. Our approach is to apply an evolutionary algorithm, directed by SA, to detect the 'weightier' input features that contribute positively to the NN's fitness. As NNs are black-box models, the technique itself prevents any easy analysis of the relationships between the inputs and outputs. In NN modeling, every computational iteration ends up with new and different connection weights. The results under these different weight settings can be nearly or totally the same. This is because each starting weight matrix is different and, during training, the number of degrees of freedom is very high. The numerical input-output relationships can be satisfied by separate groups of neurons, weights and connections. This is the shortcoming of a black-box modeling approach. Hence, establishing the impacts of the inputs to a FFNN is non-trivial.

A. Multi-Layer Feed-Forward Neural Network

The NN architecture used for modeling the cloud data center is a multi-layer feed-forward neural network (FFNN). It has three layers, namely the input layer, the hidden layer and the output layer. The input layer has a total of 12 input nodes and 1 bias node. The input nodes represent 12 energy-related variables of the Hadoop cluster. The hidden layer has a maximum of 20 hidden nodes and 1 bias node. The output layer has 1 output node. Table 2 summarizes the description of the NN inputs and output, and Fig. 1 depicts the NN architecture. During the evolution process, the number of NN connections, the connection weights and the number of hidden layer neurons are determined. The GA explores the solution space in search of the fittest, or optimal, NN structure.

[Fig. 1: a feed-forward network whose 12 input nodes (File size, Duration, CPU Util %, Memory Util %, System load, Network BW, Map Byte Read, Map File Byte Write, Reduce File Byte Read, Reduce File Byte Write, Reduce Shuffle Byte, Instruction No.) and a bias node feed a hidden layer, which feeds one output node: Energy Consumption (kWh).]

Fig. 1. NN Architecture with Input Features

B. Genetic Algorithm for NN Optimization

Table 1 depicts a chromosome matrix example which is encoded to represent the problem in the GA solution space. The chromosome matrix is an individual, known also as the genotype, which has a corresponding mapping to its phenotype, as shown in Fig. 2. A population of these individuals is maintained, with each chromosome representing a possible solution in the GA search space. The optimal solution will be the fittest individual over many cycles of genetic evolution. The optimum NN structure is the phenotype mapping of the fittest genotype.

TABLE 1 A CHROMOSOME MATRIX EXAMPLE

               Hidden node 1   Hidden node 2   Hidden node 3   Hidden node 4
Input node 1        0              -0.348           0               0.492
Input node 2        0               0               0.492           0.214
Input node 3        0.628           0               0.914           0
Bias node          -0.583          -0.569           0.239          -0.921
Output node         0.023          -0.345           0.295           0.148

Embodied in the chromosome matrix in Table 1 are the weights and connection characteristics. In this example, the input weights (denoted by the values in the first 3 rows) represent the corresponding links between the input nodes and the hidden nodes. The output weights (denoted by the values in the last row) represent the corresponding links between the output node and the hidden nodes. A 'zero' value represents no connectivity between the corresponding nodes. For instance, the input weights in the matrix positions (1,1), (1,3), (2,1), (2,2), (3,2) and (3,4) are zeroes, which means that there are no connections between the corresponding input nodes and hidden nodes. If one of the output weights is zero, that connection, or column, can be ignored as it will not affect the output in any way. If all the inputs for a hidden node are zero, regardless of the output weight, that connection will also be ignored as there is no neuron activation from that hidden node to the output node. Another matrix with the redundant columns removed is stored to avoid recalculating the matrix. The matrix with the redundant columns is still kept, as its dimension is required for crossover and any residual values might be important for the crossover.

The NN input layer comprises n features denoted as (X_1, X_2, …, X_n). During the NN learning process, the evaluation of the intensity of the stimulation (excitatory or inhibitory) from the neurons of the preceding layer is expressed in Eq. 7,

    a_j = Σ_{i=1}^{n} X_i W_ij    (7)

[Fig. 2: the phenotype network mapped from the Table 1 genotype, showing active and disabled connections from the input and bias nodes through the hidden nodes to the output node.]

Fig. 2. NN Phenotype Mapping from its Corresponding Genotype
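The genotype decoding and column-pruning rules above can be sketched in a few lines of Python. This is an illustrative sketch using the Table 1 weights, not the authors' implementation; the text does not say whether the bias row counts toward the "all inputs zero" rule, so it is assumed here that only the input-node rows count.

```python
import math

# Genotype from Table 1: rows are input nodes 1-3, bias and output;
# columns are hidden nodes 1-4. A zero weight means "no connection".
INPUT_W = [
    [0.0,   -0.348, 0.0,   0.492],   # input node 1
    [0.0,    0.0,   0.492, 0.214],   # input node 2
    [0.628,  0.0,   0.914, 0.0],     # input node 3
]
BIAS_W   = [-0.583, -0.569, 0.239, -0.921]   # bias-to-hidden weights
OUTPUT_W = [0.023,  -0.345, 0.295,  0.148]   # hidden-to-output weights

def active_hidden_nodes(input_w, output_w):
    """A hidden node (column) is redundant when its output weight is zero,
    or when every input weight feeding it is zero, so no activation can
    reach the output through it."""
    return [
        out != 0 and any(row[h] != 0 for row in input_w)
        for h, out in enumerate(output_w)
    ]

def forward(x, input_w=INPUT_W, bias_w=BIAS_W, output_w=OUTPUT_W):
    """Phenotype forward pass: weighted sum per Eq. 7, sigmoid per Eq. 8."""
    keep = active_hidden_nodes(input_w, output_w)
    y = 0.0
    for h in range(len(output_w)):
        if not keep[h]:
            continue  # pruned column: contributes nothing to the output
        a = sum(x[i] * input_w[i][h] for i in range(len(x))) + bias_w[h]
        y += output_w[h] / (1.0 + math.exp(-a))  # output weight * sigmoid(a)
    return y

print(active_hidden_nodes(INPUT_W, OUTPUT_W))   # [True, True, True, True]
```

In the evolutionary search, columns reported inactive by active_hidden_nodes would be dropped from the working copy of the matrix, while the full matrix is retained for crossover, as described above.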
TABLE 2 NN INPUT FEATURE SUBSET REPRESENTING THE HADOOP CLUSTER

Category                Metric                            Unit             Description                            Method of collection
Input
  System                1. CPU utilization                %                % CPU time on MapReduce process        Ganglia
                        2. System load                    %                % system load on MapReduce process     Ganglia
                        3. Memory use                     %                % memory use for MapReduce process     Ganglia
  IO                    4. Map file byte read             Gigabyte         Data read by Map from local disk       Hadoop built-in counters
                        5. Reduce file byte read          Gigabyte         Data read by Reduce from local disk    Hadoop built-in counters
                        6. Map file byte write            Gigabyte         Data written by Map to local disk      Hadoop built-in counters
                        7. Reduce file byte write         Gigabyte         Data written by Reduce to local disk   Hadoop built-in counters
  Network Transfer      8. Reduce shuffle bytes           Gigabyte         Data transferred from Map to Reduce    Hadoop built-in counters
                        9. Network bandwidth              Gigabit per sec  Data transmitted and received          Ganglia
  Job Profile           10. No. of MapReduce instructions Number           Job's instruction number               Ganglia
                        11. File size                     Gigabyte         Size of MapReduce jobs                 Hadoop built-in counters
                        12. Job completion duration       Hour             Time taken to finish a MapReduce job   Hadoop built-in counters
Output
  Energy                1. Energy consumption             kWh              Energy consumed by Hadoop cluster      SNMP on iPDU
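As a concrete illustration of how one observation of the Table 2 metrics could be assembled into a training pair for the 12-input, 1-output NN (the field names and numeric values here are hypothetical; the paper does not prescribe a code-level schema):

```python
# One hypothetical observation of the 12 Table 2 inputs, in the units
# listed in Table 2 (%, GB, Gbit/s, instruction count, hours).
sample = {
    "cpu_utilization": 62.0, "system_load": 48.0, "memory_use": 71.0,  # System, % (Ganglia)
    "map_file_byte_read": 12.4, "reduce_file_byte_read": 6.1,          # IO, GB (Hadoop counters)
    "map_file_byte_write": 9.8, "reduce_file_byte_write": 5.2,
    "reduce_shuffle_bytes": 7.5,       # GB shuffled from Map to Reduce
    "network_bandwidth": 0.6,          # Gbit/s transmitted and received
    "mapreduce_instructions": 3.2e9,   # job's instruction number
    "file_size": 25.0,                 # GB, size of the MapReduce job
    "job_duration": 1.4,               # hours to finish the job
}
target_kwh = 3.7  # energy consumed by the cluster, read via SNMP on the iPDU

# A fixed ordering of the 12 features into the NN input vector.
FEATURE_ORDER = [
    "file_size", "job_duration", "cpu_utilization", "memory_use",
    "system_load", "network_bandwidth", "map_file_byte_read",
    "map_file_byte_write", "reduce_file_byte_read",
    "reduce_file_byte_write", "reduce_shuffle_bytes",
    "mapreduce_instructions",
]
x = [sample[name] for name in FEATURE_ORDER]   # feeds the 12 input nodes
```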

where a_j is the activation of the j-th downstream neuron, X_i is the output value of the i-th neuron in the previous layer and W_ij is the connection weight between the i-th neuron of the previous layer and the j-th neuron of the current layer. The activation function implemented is the sigmoid function shown in Eq. 8.

    f(a_j) = 1 / (1 + e^(−a_j))    (8)

The objective function to minimize (or to maximize, in the case of the fitness function) during the NN training is the mean squared error (MSE) given by Eq. 9,

    MSE = (1/m) Σ_{i=1}^{m} (Y − Ŷ)²    (9)

where Y is the target at the output of the NN, Ŷ is the actual value calculated by the NN and m is the number of samples. This fitness indicates how good the chromosome is in comparison with the other solutions in the population. The chromosomes compete for survival. Thus, the higher the fitness value, the higher the chances of survival, reproduction and representation in the subsequent generation.

The GA optimization of the NN starts with an initial set of random potential solutions, expressed as a population of chromosomes. In order to create the subsequent generation, chromosomes from the previous generation are merged using the crossover operation or modified using the mutation operator. These processes populate the subsequent generation with new chromosomes, also known as offspring. Fitter chromosomes are selected and weaker chromosomes are rejected to keep the population size constant and to maintain the overall health of the population at a progressing level. After repeating the process for several generations, the best chromosome will emerge, representing the optimum or a suboptimal solution to the problem.

III. DATA COLLECTION AND SENSITIVITY ANALYSES

A. Experiment Setup and Data Collection

A Hadoop cluster with the Hadoop Distributed File System (HDFS) and MapReduce stack (Facebook Apache Hadoop version 0.20.1) is set up to perform the experiments. The software stack is installed over 6 x HP ProLiant DL360p and DL380p Gen8 servers, consisting of 120 cores housed within a single rack. Each server is equipped with 64 GB memory and dual-socket 6-core Intel(R) Xeon(R) CPU E5-2667 @ 2.90GHz with hyper-threading technology. The Hadoop cluster comprises 1 x namenode, 1 x secondary namenode and 4 x datanodes. All nodes are installed with CentOS 6.5 and run on bare-metal hardware without a hypervisor or virtualization. A top-of-rack (TOR) Gigabit Ethernet switch connects the nodes at 1 Gigabit per second. A mixture of MapReduce jobs, in the form of the WordCount application and the Sort application, is executed during the experiments. The Hadoop MapReduce counters, such as the Map file byte read and the Reduce file byte write, are extracted using the built-in Hadoop web admin user interfaces (UIs). The counters can be accessed via the HDFS namenode admin at port 50070 and the MapReduce Job tracker admin at port 50030. The other counters, such as CPU and memory utilization and network IO, are collected using Ganglia, an open-source monitoring system. The power consumption data is collected using the Raritan intelligent Power Distribution Unit (iPDU), through which the servers' power supplies are connected. The data collected from the Hadoop cluster is used to train and calibrate the NN models. The details of the testbed setup and evolutionary NN training are described in our earlier work in [7][8].

B. Sensitivity Analysis

Various authors have explored various SA techniques for determining which inputs in an NN are significant [11][12][13][14][15]. In [10], a series of seven different SA methods were reviewed. Amongst these methods, two are of

particular interest. They are: the Partial Derivatives (PaD) method, which consists of calculating the partial derivatives of the output with respect to the input variables, and the 'weights' method, which is a technique for partitioning the connection weights to determine the relative importance of the various inputs.

1) Partial Derivatives Method

The PaD method [16][17] allows contribution analysis of the inputs by providing a profile of the variation of the output for small changes of one input variable. For instance, in a network with n_i inputs, one hidden layer with n_h neurons, one output (n_o = 1) and the logistic-sigmoid activation function, the partial derivative d_ji of the output with respect to the i-th input at the j-th sample (with j = 1, …, N and N the total number of samples) is given by the following relationship in Eq. 10,

    d_ji = S_j Σ_{h=1}^{n_h} w_ho I_hj (1 − I_hj) w_ih    (10)

where S_j is the derivative of the output neuron with respect to its input, I_hj is the response of the h-th hidden neuron, w_ho is the weight between the h-th hidden neuron and the output neuron, and w_ih is the weight between the i-th input neuron and the h-th hidden neuron. If the partial derivative is negative, it indicates a negative impact; that is, as the input variable increases, the output variable decreases. On the contrary, if the partial derivative is positive, it indicates a positive impact; that is, as the input variable increases, the output variable increases. The relative contribution of an input to the NN output can be obtained via the sum of squared derivatives (SSD) value, as expressed in Eq. 11. The input variable that has the highest SSD value has the highest impact on the output variable.

    SSD_i = Σ_{j=1}^{N} (d_ji)²    (11)

2) 'Weights' Method

The 'weights' method [18][19] is a technique for segregating the connection weights to accord relative importance to the inputs. The connection weights between the hidden nodes and the output node are associated with the connection weights of the input nodes to those hidden nodes. For example, in an NN with n input nodes, m hidden nodes and 1 output node, the 'weights' method is implemented using the following relationships in Eq. 12,

    Q_ih = |w_ih · w_ho| / Σ_i |w_ih · w_ho|    (for i = 1,..,n; h = 1,..,m)    (12)

where w_ih is the connection weight between input node i and hidden node h, w_ho is the connection weight between hidden node h and output node o, and Q_ih is the weighted influence of the individual hidden node h on the output given its association with all its inputs i, and in Eq. 13,

    RI_i (%) = (Σ_h Q_ih / Σ_h Σ_i Q_ih) × 100    (for i = 1,..,n; h = 1,..,m)    (13)

where RI_i is the relative importance of input node i in percentage terms.

IV. EXPERIMENTS AND RESULTS

Though both the PaD method and the 'weights' method are able to provide ML with explanatory capability [10], the 'weights' method is adopted in our implementation as it combines more readily with our GA chromosome matrix encoding. Table 3 illustrates, by means of the chromosome matrix described in Table 1, our GA with SA approach to determine the relative importance of the input features on the output. By applying Eq. 12 and Eq. 13, it can be seen that input node 1 has the highest relative importance at 42.4%, followed by input node 3 at 41.3% and input node 2 at 16.3%.

TABLE 3 EXAMPLE CHROMOSOME MATRIX FOR SA USING 'WEIGHTS' METHOD

               Hidden node 1   Hidden node 2   Hidden node 3   Hidden node 4   Sum impact of input node at output, or RI (%)
Input node 1   Q11 = 0         Q21 = 1.0       Q31 = 0         Q41 = 0.697     1.697, or 42.4%
Input node 2   Q12 = 0         Q22 = 0         Q32 = 0.350     Q42 = 0.303     0.653, or 16.3%
Input node 3   Q13 = 1.0       Q23 = 0         Q33 = 0.650     Q43 = 0         1.650, or 41.3%

A. Actual Data

Integrating the 'weights' method into our evolutionary NN approach, the RI of the input variables from the Hadoop dataset is plotted for further analysis. In the experiment, we set the GA population size to 100 chromosomes for 100 generations. At each generation, the chromosomes fight for survival by evolving towards an optimal solution through the processes of selection, crossover and mutation. The fittest chromosome from each generation is selected to compute the inputs' RI. The data is collected and averaged over 100 generations. The process is repeated for 20 runs and the results are shown in Fig. 3 and Table 5.

[Fig. 3: a bar chart titled 'Sensitivity Analysis - Key Factors Determining the Energy Consumption of a Hadoop Cluster'; y-axis: Relative Importance (%); bars, in descending order: 10.0, 9.2, 9.2, 9.2, 8.5, 8.3, 8.2, 8.1, 7.7, 7.3, 7.2, 7.1 across the twelve input features.]

Fig. 3. Relative Importance of Input Variable on the Energy Consumption

From Fig. 3, it is observed that 'Job duration' ranks the highest, with an RI of 10.0%. The next three variables of importance are 'Reduce shuffle bytes', 'Reduce file byte write' and 'Number
of MapReduce instructions', with an RI of 9.2% each. The 'Reduce shuffle bytes' counter keeps track of the total bytes being shuffled, which is an indicator of a high presence of network-intensive activities. At the shuffle stage of the MapReduce process, files are shuffled from mappers to reducers. Depending on the nature of the task, as in the case of the Sort tasks, shuffle activities could be intensive. The variables 'Memory use', 'System load' and 'CPU utilization' are ranked 7th, 8th and 11th respectively. This may seem a surprise at first, and it shall be explained with Fig. 4 in the later section. Another observation in Fig. 3 is that the input variable 'File size' is ranked 5th. MapReduce is a distributed and parallel processing framework where Big Data files are split into chunks of HDFS blocks. The larger the files, the more chunks to be distributed into HDFS blocks residing at the various datanodes. The mapper and reducer daemons in the datanodes perform the MapReduce tasks on the data as assigned by the Job tracker. The input variable 'File size', in this case, has a fairly high RI as it is a key factor contributing to the energy consumption. The variable 'Network bandwidth' is the least in relative importance. This counter keeps track of the total number of bytes sent and received in the Hadoop cluster.

In Fig. 4, the input variables are grouped by categories, with the breakdown shown in Table 4. It is observed that the 'I/O' category has the most impact on energy consumption, followed by 'Job profile'. The 'System' category, which includes utilization of CPU and memory, and OS processes, is ranked third. The least in relative importance is the 'Network transfer' category. Earlier in Fig. 3, it was noted that the input variables 'Memory use', 'System load' and 'CPU utilization' are ranked 7th, 8th and 11th respectively. However, in the category chart in Fig. 4, 'System' taken as a whole is ranked closely behind 'Job profile'.

[Fig. 4: a bar chart titled 'Sensitivity Analysis - By Input Categories'; x-axis: Relative Importance (%); bars: I/O 32.5, Job Profile 27.6, System (CPU, Memory, Proc. etc.) 23.5, Network Transfer 16.4.]

Fig. 4. Relative Importance of Input Features Grouped by Categories

Fig. 5 shows the RI plots for each of the twelve input variables of the NN. The RI is calculated from the fittest chromosome structure of each generation and traced over 100 generations or until convergence is reached, i.e. the optimal solution is found.

TABLE 4: INPUT VARIABLES GROUPED BY CATEGORIES

Category            Input Variables
System              CPU Utilization; Memory Use; System Load
I/O                 Map file byte Read; Map file byte Write; Reduce file byte Read; Reduce file byte Write
Network Transfer    Network Bandwidth; Reduce Shuffle Bytes
Job Profile         Job completion duration; File size; Number of MapReduce Instructions

[Fig. 5: twelve RI-versus-generation traces, one per input: File Size, Job Duration, CPU Utilization, Memory Use, System Load, Network Bandwidth, Map file byte read, Map file byte write, Reduce file byte read, Reduce file byte write, Reduce shuffle byte, Instruction Number.]

Fig. 5. Relative Importance of Inputs in Sensitivity Analysis of Neural Networks using 'Weighted' Method Combined with Genetic Algorithm
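The 'weights' computation of Eq. 12 and Eq. 13 can be checked against the worked example: applying it to the Table 1 weight matrix reproduces the Q values and RI percentages reported in Table 3. This is a minimal sketch for verification, not the experiment code:

```python
# Input-to-hidden weights from Table 1 (rows: inputs 1-3, cols: hidden 1-4).
W_IH = [
    [0.0,   -0.348, 0.0,   0.492],
    [0.0,    0.0,   0.492, 0.214],
    [0.628,  0.0,   0.914, 0.0],
]
W_HO = [0.023, -0.345, 0.295, 0.148]   # hidden-to-output weights

def relative_importance(w_ih, w_ho):
    """Eq. 12: Q_ih = |w_ih*w_ho| / sum_i |w_ih*w_ho| within each hidden node;
    Eq. 13: RI_i = 100 * sum_h Q_ih / sum_h sum_i Q_ih."""
    n, m = len(w_ih), len(w_ho)
    q = [[0.0] * m for _ in range(n)]
    for h in range(m):
        col = [abs(w_ih[i][h] * w_ho[h]) for i in range(n)]
        total = sum(col)
        for i in range(n):
            q[i][h] = col[i] / total if total else 0.0   # Eq. 12
    grand = sum(sum(row) for row in q)
    return [100.0 * sum(row) / grand for row in q]       # Eq. 13

print([round(ri, 1) for ri in relative_importance(W_IH, W_HO)])
# [42.4, 16.3, 41.3], matching Table 3
```

Because each hidden node's Q values are normalized to sum to 1, the denominator in Eq. 13 equals the number of hidden nodes with nonzero weights, here 4.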
TABLE 5: COMPUTATION OF RI (%) FOR 20 RUNS
(Columns: Input Variables 1–12; rows: runs 1–20 with average and standard deviation.)

Run    V1    V2    V3    V4    V5    V6    V7    V8    V9    V10   V11   V12
 1     7.6   16.4  4.4   7.5   13.9  4.0   6.1   9.1   5.9   10.2  8.4   6.4
 2     10.7  9.5   5.8   12.0  6.8   7.0   8.4   9.3   5.8   9.1   7.2   8.5
 3     7.9   4.5   7.7   11.3  7.6   7.0   10.3  7.4   6.2   7.9   8.8   13.5
 4     9.9   5.3   12.3  7.9   7.6   5.8   8.3   9.0   8.3   6.6   6.9   12.2
 5     8.5   7.1   7.2   5.3   5.9   12.1  6.6   6.5   9.8   4.1   12.4  14.5
 6     10.8  7.2   8.3   7.7   6.7   6.3   7.1   6.3   9.2   13.5  8.5   8.5
 7     8.0   8.0   7.1   10.5  7.5   6.9   11.0  12.1  7.3   9.5   5.5   6.5
 8     11.4  9.6   6.2   7.2   10.5  9.7   7.6   8.8   6.8   6.1   8.6   7.6
 9     7.3   9.2   6.5   8.4   9.7   6.3   10.2  5.8   10.1  6.3   12.4  7.7
10     11.4  8.5   4.1   7.1   7.9   2.6   6.8   10.1  6.5   15.5  12.3  7.3
11     7.0   12.0  7.6   3.1   8.9   12.8  6.4   6.4   10.9  8.5   7.5   8.9
12     7.9   8.3   7.9   7.2   7.9   8.1   8.4   5.6   6.7   11.0  13.6  7.4
13     5.4   10.3  5.4   9.5   11.3  6.1   4.3   7.8   7.8   9.9   15.4  6.8
14     4.1   10.3  6.8   7.7   6.9   7.8   8.5   3.5   8.6   12.9  10.9  12.2
15     8.6   14.7  6.5   12.9  10.9  5.1   8.7   5.5   8.2   7.3   4.6   7.1
16     6.9   12.2  7.1   9.4   6.7   8.3   6.7   8.7   9.2   9.0   8.3   7.5
17     8.1   10.6  8.3   10.4  5.8   7.7   7.7   7.2   10.6  9.2   5.9   8.6
18     8.6   12.3  7.7   8.5   6.2   9.2   5.6   3.0   9.7   8.7   9.6   11.0
19     10.5  13.1  7.1   3.8   6.1   5.3   6.6   4.8   9.6   9.3   9.7   14.1
20     9.2   10.1  10.9  6.3   7.7   4.5   8.1   8.6   8.6   9.7   8.3   8.0
Ave    8.5   10.0  7.2   8.2   8.1   7.1   7.7   7.3   8.3   9.2   9.2   9.2
Stdev  1.8   1.7   1.7   1.6   1.5   1.8   1.6   1.6   1.7   1.7   2.0   1.7
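The per-run RI values in Table 5 are produced by the ‘weights’ method applied to each evolved network. A minimal sketch of Garson's weight-based RI computation for a single-hidden-layer network, averaged over 20 runs; the random weights stand in for the GA-evolved networks, and the layer sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_hidden, n_runs = 12, 8, 20

def garson_ri(w_ih, w_ho):
    """Relative importance (%) of each input via Garson's 'weights' method.

    w_ih: (n_hidden, n_inputs) input-to-hidden weight matrix
    w_ho: (n_hidden,) hidden-to-output weight vector
    """
    # Share of each input within every hidden node (row-normalised |weights|)
    share = np.abs(w_ih) / np.abs(w_ih).sum(axis=1, keepdims=True)
    # Weight each hidden node's shares by its output connection, sum over nodes
    r = (share * np.abs(w_ho)[:, None]).sum(axis=0)
    return 100.0 * r / r.sum()          # normalise so inputs sum to 100%

# Random weights stand in for the 20 GA-evolved networks of Table 5.
ri_runs = np.array([
    garson_ri(rng.normal(size=(n_hidden, n_inputs)),
              rng.normal(size=n_hidden))
    for _ in range(n_runs)
])

ave = ri_runs.mean(axis=0)              # 'Ave' row of Table 5
stdev = ri_runs.std(axis=0, ddof=1)     # 'Stdev' row of Table 5
print(np.round(ave, 1))
```

Because each run's RI values are normalised to 100%, the averages across runs are directly comparable, which is what makes the Ave row of Table 5 a meaningful ranking.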
V. CONCLUSION AND FUTURE WORK

The ability to give meaningful insights into black-box NN models can explain how the various system parameters affect a model's output. This benefit is of interest because useful and novel information can be derived to improve system characteristics and performance. In this paper, we have presented an approach combining evolutionary NNs with sensitivity analysis. It was shown that the ‘weights’ method integrates seamlessly with our GA to determine key factors contributing to energy consumption in cloud data centers. The results show that I/O activities contribute most significantly to energy consumption. Armed with this insight, data centers can reduce energy consumption by minimizing access to storage or by utilizing more efficient storage infrastructures and components. The other aspect revealed in our experiment is that job-profile attributes, such as file size, the time taken to complete a task and the resources assigned to the job, have a relatively high impact on energy consumption. As jobs may be bound by a service-level agreement (SLA), adequate resources are usually assigned to complete the tasks within a certain time. Hence, this knowledge provides an opportunity for software analysts to design and develop energy-aware applications that optimize resource allocation. Another potential energy saving could be realized by partitioning the Big Data into bigger chunks and placing them closest to the mappers and reducers in the Hadoop HDFS. Doing so would shorten the time to complete the MapReduce jobs and thus save energy.

Although our experiment combining GA and SA has managed to shed light on the key factors contributing to energy consumption, further investigation is still needed. For future developments, we plan to take advantage of the ML black-box approach with SA to develop a ‘grey box’ that preserves the physical significance of the system parameters while at the same time modeling the nonlinear complexities of the cloud data center, allowing for greater energy savings and a better understanding of the system.
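The suggestion to use bigger chunks can be illustrated through Hadoop's default behavior of launching roughly one map task per HDFS block: for a fixed file size, larger blocks mean fewer map tasks and less per-task startup and scheduling overhead. A sketch under that assumption (the file and block sizes below are hypothetical):

```python
GB = 1024 ** 3
MB = 1024 ** 2

def num_map_tasks(file_size: int, block_size: int) -> int:
    """Map-task count assuming one task per HDFS block (default input split)."""
    return -(-file_size // block_size)   # ceiling division on integers

# Same 10 GB input file, three hypothetical block sizes.
file_size = 10 * GB
for block in (64 * MB, 128 * MB, 256 * MB):
    print(f"{block // MB} MB blocks -> {num_map_tasks(file_size, block)} map tasks")
```

Quadrupling the block size from 64 MB to 256 MB cuts the task count from 160 to 40 here, which is the overhead reduction the partitioning suggestion aims at; actual savings would also depend on data locality to the mappers and reducers.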

