Sensitivity Analysis
Abstract—The machine learning (ML) approach to modeling and predicting real-world dynamic system behaviours has received widespread research interest. While the ML capability to approximate any nonlinear or complex system is promising, it is often a black-box approach that lacks the physical meaning of the actual system structure and its parameters, as well as their impacts on the system. This paper establishes a model that explains how system parameters affect its output(s), as such knowledge can lead to potentially useful, interesting and novel information. The paper builds on our previous work in ML and combines evolutionary artificial neural networks with sensitivity analysis to extract and validate the key factors affecting cloud data center energy performance. This provides an opportunity for software analysts to design and develop energy-aware applications, and for Hadoop administrators to optimize the Hadoop infrastructure by partitioning Big Data in bigger chunks and shortening the time to complete MapReduce jobs.

Keywords—machine learning, artificial neural networks, sensitivity analysis, cloud computing, energy efficiency, genetic algorithm

I. INTRODUCTION

The accelerated growth in cloud computing is expected to drive energy consumption of cloud data centers to new highs. An effective engineering ratio that measures data center energy efficiency is the Power Usage Effectiveness (PUE). This metric records baseline data and traces energy efficiency movements. It is expressed as the ratio shown in Eq. 1, with overall energy efficiency improving as the value decreases towards 1.

PUE = Σ(P_mechanical + P_electrical + P_ICT + P_others) / Σ P_ICT    (1)

where the numerator denotes the sum of all power consumed by the cloud data center, including the mechanical facility (chillers and computer room air-conditioning, or CRAC), the electrical facility (switchgear, UPS, battery backup), the ICT computing infrastructure (servers, storage, networks and telecommunications equipment), plus any other devices (lighting, printers, personal computers, VoIP phones, fax machines, etc.) expended to support cloud data center operations, and the denominator denotes the sum of all power consumed by the ICT computing infrastructure alone, which produces the useful IT work.

Based on the Uptime Institute's 2014 Data Center Industry Survey, the average data center PUE has improved only slightly, from 1.89 in 2011 to 1.7 in 2014 [1]. This is a gain of 11.2%, compared to a gain of 32.3% from a PUE of 2.50 to 1.89 between 2007 and 2011. The survey also reported that 77% of the participating industry cited that management has set a target PUE, with more than half expecting to lower the PUE to 1.5 or better. This scenario presents a tremendous opportunity for researchers, engineers and technologists to further improve cloud data center energy efficiency.

Recently, applying data-driven ML techniques to lower cloud data center energy consumption has been a hot research topic. Gao [2] proposed improving cloud data center Power Usage Effectiveness (PUE) using an ML technique based on artificial neural networks (NN). The feed-forward NN takes in 19 input variables, contains 5 hidden layers with 50 nodes per layer, and outputs 1 variable. The NN model achieved a high predictive accuracy, with a mean absolute error of 0.004 and a standard deviation of 0.005 on the test dataset. The model was validated with a 'live' experiment conducted by simulating an actual increase in process water supply temperature to the server floor by 3°F (or ~1.7°C), resulting in an expected decrease of ~0.005 PUE as predicted by the model.

Chen [3] suggested a spatially-aware Virtual Machine (VM) workload placement method, called SpAWM, to optimize the consumption of power and cooling in the cloud data center. SpAWM adopts the ML approach using neural networks and reinforcement learning (RL). Developed from the Markov Decision Process (MDP), the central idea in RL is to learn the optimal action a_t via a trial-and-error process, after taking into account every state s_t visited by the system. For every state-action pair, the RL keeps an associated Q value. If the selected action in a state is positive, the feedback increases the Q value; if the feedback is negative, the Q value is decreased. In RL, the new Q value is updated via Eq. 2,

Q(s_t, a_t) = α{r_t + λ[Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]}    (2)

where Q(s_{t+1}, a_{t+1}) denotes the Q value of the next state t+1, Q(s_t, a_t) denotes the Q value of the current state t, α is the learning rate, λ is the discount factor and r_t is the immediate reward received at state t. The number of Q values can grow exponentially if the state-action space is large; hence NN modeling is utilized for the cloud data center environment to capture the relationship between resource utilization (state space), workload assignments (actions) and thermal distribution.
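Reading the update in Eq. 2 as an increment applied on top of the current Q value (the usual temporal-difference form), a single update step can be sketched as follows. The toy Q table, states, actions and reward below are illustrative assumptions, not quantities from SpAWM.

```python
# Sketch of the temporal-difference Q update of Eq. 2, read as
# Q(s,a) <- Q(s,a) + alpha * { r + lam * [ Q(s',a') - Q(s,a) ] }.
# The Q table and reward below are toy values for illustration only.

def td_update(q, s, a, r, s_next, a_next, alpha=0.5, lam=0.9):
    """Move the Q value of pair (s, a) toward the immediate reward plus
    the discounted difference against the next state-action pair."""
    td_term = r + lam * (q[(s_next, a_next)] - q[(s, a)])
    q[(s, a)] += alpha * td_term
    return q[(s, a)]

q = {(0, 0): 0.0, (1, 0): 1.0}                 # toy Q table
new_q = td_update(q, s=0, a=0, r=2.0, s_next=1, a_next=0)
print(new_q)  # 0 + 0.5*(2.0 + 0.9*(1.0 - 0.0)) = 1.45
```

A positive reward raises the stored Q value, a negative one lowers it, matching the feedback behaviour described above.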
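Returning to the PUE ratio of Eq. 1, the metric itself is a one-line calculation over the facility's power categories. The load figures below are made-up illustrative values, not measurements from [1] or from our testbed.

```python
# Sketch of the PUE ratio of Eq. 1: total facility power over ICT power.
# All numbers below are hypothetical, purely to show the ratio.

def pue(mechanical_kw, electrical_kw, ict_kw, other_kw):
    """Power Usage Effectiveness: all power consumed by the data center
    divided by the power consumed by the ICT infrastructure alone."""
    total = mechanical_kw + electrical_kw + ict_kw + other_kw
    return total / ict_kw

# A hypothetical facility: 350 kW chillers/CRAC, 150 kW switchgear/UPS,
# 500 kW servers/storage/network, 20 kW lighting and office devices.
print(round(pue(350.0, 150.0, 500.0, 20.0), 2))  # 1020/500 = 2.04
```

As the non-ICT overheads shrink, the ratio approaches the ideal value of 1.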
of the inputs to a FFNN; in particular, our interest is to discover which input features have the most impact on the cloud data center energy performance. Our approach is to apply an evolutionary algorithm, directed by SA, to detect the 'weightier' input features that contribute positively to the NN's fitness. As NNs are black-box models, the technique itself prevents any easy analysis of the relationships between the inputs and outputs. In NN modeling, every computational iteration ends up with new and different connection weights. The results under these different weight settings can be nearly or totally the same. This is because each starting weight matrix is different and, during training, the number of degrees of freedom is very high. The numerical input-output relationships can be satisfied by separate groups of neurons, weights and connections. This is the shortcoming of a black-box modeling approach. Hence, establishing the impacts of the inputs to a FFNN is non-trivial.

A. Multi-Layer Feed-Forward Neural Network

The NN architecture used for modeling the cloud data center is a multi-layer feed-forward neural network (FFNN). It has three layers, namely the input layer, the hidden layer and the output layer. The input layer has a total of 12 input nodes and 1 bias node. The input nodes represent 12 energy-related variables of the Hadoop cluster. The hidden layer has a maximum of 20 hidden nodes and 1 bias node. The output layer has 1 output node. Table 2 summarizes the description of the NN inputs and output, and Fig. 1 depicts the NN architecture. During the evolution process, the number of NN connections, the connection weights and the number of hidden layer neurons are determined. The GA explores the solution space in search of the fittest, or optimal, NN structure, as shown in Fig. 2. A population of these individuals is maintained, with each chromosome representing a possible solution in the GA search space. The optimal solution will be the fittest individual over many cycles of genetic evolution. The optimum NN structure is the phenotype mapping of the fittest genotype.

[Fig. 1: FFNN architecture — input layer nodes (File size, Duration, CPU Util %, Memory Util %, System load, Network BW, Map Byte Read, ...), hidden layer, and a single output node, Energy Consumption (kWh)]

TABLE 1 A CHROMOSOME MATRIX EXAMPLE

             | Hidden node 1 | Hidden node 2 | Hidden node 3 | Hidden node 4
Input node 1 |  0            | -0.348        |  0            |  0.492
Input node 2 |  0            |  0            |  0.492        |  0.214
Input node 3 |  0.628        |  0            |  0.914        |  0
Bias node    | -0.583        | -0.569        |  0.239        | -0.921
Output node  |  0.023        | -0.345        |  0.295        |  0.148

Embodied in the chromosome matrix in Table 1 are the weights and connection characteristics. In this example, the input weights (the values in the first 3 rows) represent the corresponding links between the input nodes and the hidden nodes. The output weights (the values in the last row) represent the corresponding links between the output node and the hidden nodes. A zero value represents no connectivity between the corresponding nodes. For instance, the input weights in matrix positions (1,1), (1,3), (2,1), (2,2), (3,2) and (3,4) are zero, meaning that there is no connection between the corresponding input nodes and hidden nodes. If one of the output weights is zero, that connection, or column, can be ignored, as it will not affect the output in any way. If all the input weights for a hidden node are zero then, regardless of the output weight, that connection will also be ignored, as there is no neuron activation from that hidden node to the output node. Another matrix with the redundant columns removed is stored, to save the computation of recalculating the matrix. The matrix with the redundant columns is still kept, as its dimension is required for crossover and any residual values might be important for the crossover.

The NN input layer comprises n features, denoted as (X_1, X_2, ..., X_n). During the NN learning process, the evaluation of the intensity of the stimulation (excitatory or inhibitory) from the neurons of the preceding layer is expressed in Eq. 7,

a_j = Σ_{i=1}^{n} W_ij X_i    (7)
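A minimal sketch of how such a chromosome could be decoded and evaluated, using the example weights of Table 1: the pruning rule follows the redundant-column description above, and the forward pass applies the Eq. 7 weighted sum with the sigmoid of Eq. 8. The input vector is an arbitrary illustrative value, not cluster data.

```python
import numpy as np

# Decode the example chromosome of Table 1 into a one-hidden-layer FFNN.
# A hidden-node column is redundant when its output weight is zero or all
# of its input weights are zero; in this example every column survives.

W_in = np.array([[0.0,   -0.348, 0.0,   0.492],   # input node 1 -> hidden 1..4
                 [0.0,    0.0,   0.492, 0.214],   # input node 2 -> hidden 1..4
                 [0.628,  0.0,   0.914, 0.0]])    # input node 3 -> hidden 1..4
bias  = np.array([-0.583, -0.569, 0.239, -0.921])
W_out = np.array([0.023, -0.345, 0.295, 0.148])

# Prune redundant hidden-node columns before evaluation.
active = (W_out != 0) & (np.abs(W_in).sum(axis=0) != 0)

def forward(x):
    """Weighted sum per hidden node (Eq. 7), sigmoid squashing (Eq. 8),
    then the linear combination through the output weights."""
    a = x @ W_in[:, active] + bias[active]
    h = 1.0 / (1.0 + np.exp(-a))
    return float(h @ W_out[active])

print(int(active.sum()))                         # 4 hidden nodes stay active
print(round(forward(np.array([1.0, 1.0, 1.0])), 3))  # ~0.227
```

With a zeroed output weight (say, setting W_out[0] to 0), hidden node 1's column would drop out of the evaluation without changing the matrix kept for crossover.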
TABLE 2 NN INPUT FEATURE SUBSET REPRESENTING THE HADOOP CLUSTER

Category         | Feature                           | Unit            | Description                          | Source
...              | ...                               | ...             | ...                                  | ...
                 | 7. Reduce file byte write         | Gigabyte        | Data written by Reduce to local disk |
Network Transfer | 8. Reduce shuffle bytes           | Gigabyte        | Data transferred from Map to Reduce  | Hadoop built-in counters
                 | 9. Network bandwidth              | Gigabit per sec | Data transmitted and received        | Ganglia
                 | 10. No. of MapReduce instructions | Number          | Job's instruction number             | Ganglia
Job Profile      | 11. File size                     | Gigabyte        | Size of MapReduce jobs               | Hadoop built-in counters
                 | 12. Job completion duration       | Hour            | Time taken to finish a MapReduce job | Hadoop built-in counters
Energy Output    | 1. Energy consumption             | kWh             | Energy consumed by Hadoop cluster    | SNMP on iPDU

where a_j is the activation of the jth downstream neuron, X_i is the output value of the ith neuron at the previous layer and W_ij is the connection weight between the ith neuron of the previous layer and the jth neuron of the current layer. The activation function implemented is the sigmoid function shown in Eq. 8,

f(a_j) = 1 / (1 + e^(−a_j))    (8)

The objective function to minimize (or to maximize, in the case of the fitness function) during the NN training is the mean squared error (MSE) given by Eq. 9,

MSE = (1/m) Σ_{i=1}^{m} (Y_i − Ŷ_i)²    (9)

where Y is the target at the output of the NN, Ŷ is the actual value calculated by the NN, and m is the number of samples. This fitness indicates how good the chromosome is in comparison with the other solutions in the population. The chromosomes compete for survival. Thus, the higher the fitness value, the higher the chances of survival, reproduction and representation in the subsequent generation.

The GA optimization of the NN starts with an initial set of random potential solutions, expressed as a population of chromosomes. To create the subsequent generation, chromosomes from the previous generation are merged using the crossover operation or modified using the mutation operator. These processes populate the subsequent generation with new chromosomes, also known as offspring. Fitter chromosomes are selected and weaker chromosomes are rejected, to keep the population size constant and to maintain the overall health of the population at a progressing level. After repeating the process for several generations, the best chromosome emerges, representing the optimum or a suboptimal solution to the problem.

III. DATA COLLECTION AND SENSITIVITY ANALYSES

A. Experiment Setup and Data Collection

A Hadoop cluster with the Hadoop Distributed File System (HDFS) and MapReduce stack (Facebook Apache Hadoop version 0.20.1) is set up to perform the experiments. The software stack is installed over 6 x HP ProLiant DL360p and DL380p Gen8 servers, consisting of 120 cores housed within a single rack. Each server is equipped with 64 GB memory and dual-socket 6-core Intel(R) Xeon(R) CPU E5-2667 @ 2.90GHz with hyper-threading technology. The Hadoop cluster comprises 1 x namenode, 1 x secondary namenode and 4 x datanodes. All nodes are installed with CentOS 6.5 and run on bare-metal hardware without a hypervisor or virtualization. A top-of-rack (TOR) Gigabit Ethernet switch connects the nodes at 1 Gigabit per second. A mixture of MapReduce jobs, in the form of the WordCount and Sort applications, is executed during the experiments. The Hadoop MapReduce counters, such as the Map file bytes read and the Reduce file bytes written, are extracted using the built-in Hadoop web admin user interfaces (UIs). The counters can be accessed via the HDFS namenode admin at port 50070 and the MapReduce JobTracker admin at port 50030. The other counters, such as CPU and memory utilization and network IO, are collected using Ganglia, an open source monitoring system. The power consumption data is collected using the Raritan intelligent Power Distribution Unit (iPDU), into which the servers' power supplies are connected. The data collected from the Hadoop cluster are used to train and calibrate the NN models. The details of the testbed setup and evolutionary NN training are described in our earlier work in [7][8].

B. Sensitivity Analysis

Various authors have explored various SA techniques for determining which NN inputs are significant [11][12][13][14][15]. In [10], a series of seven different SA methods was reviewed. Amongst these methods, two are of
particular interest. They are the Partial Derivatives (PaD) method, which consists of calculating the partial derivatives of the output with respect to the input variables, and the 'weights' method, which is a technique for partitioning the connection weights to determine the relative importance of the various inputs.

1) Partial Derivatives Method

The PaD method [16][17] allows contribution analysis of the inputs by providing a profile of the variation of the output for small changes of one input variable. For instance, in a network with n_i inputs, one hidden layer with n_h neurons, one output (n_o = 1), and using the logistic-sigmoid activation function, the partial derivative of the output y_j with respect to input x_j (with j = 1, ..., N and N the total number of samples) is given by the relationship in Eq. 10,

d_j = S_j Σ_{h=1}^{n_h} w_ho I_h (1 − I_h) w_ih    (10)

where S_j is the derivative of the output neuron with respect to its input, I_h is the response of the hth hidden neuron, w_ho is the weight between the hth hidden neuron and the output neuron, and w_ih is the weight between the ith input neuron and the hth hidden neuron.

2) 'Weights' Method

For a network with n input nodes, m hidden nodes and 1 output node, the 'weights' method is implemented using the relationships in Eq. 11 and Eq. 12,

RI_i = ( Σ_{h=1}^{m} Q_ih / Σ_{i=1}^{n} Σ_{h=1}^{m} Q_ih ) × 100    (11)

Q_ih = |w_ih · w_ho| / Σ_{i=1}^{n} |w_ih · w_ho|    (for i = 1, ..., n; h = 1, ..., m)    (12)

where RI_i is the relative importance of input node i in percentage terms.

IV. EXPERIMENTS AND RESULTS

Though both the PaD method and the 'weights' method are able to provide ML with explanatory capability [10], the 'weights' method is adopted in our implementation as it combines more readily with our GA chromosome matrix encoding. Table 3 illustrates, by means of the chromosome matrix described in Table 1, our GA with SA approach to determine the relative importance of the input features on the output. By applying Eq. 11 and Eq. 12, it can be seen that input node 1 has the highest relative importance at 42.4%, followed by input node 3 at 41.3% and input node 2 at 16.3%.

TABLE 3 EXAMPLE CHROMOSOME MATRIX FOR SA USING 'WEIGHTS' METHOD

             | Hidden node 1 | Hidden node 2 | Hidden node 3 | Hidden node 4 | Sum impact of input node at output, or RI (%)
Input node 1 | Q11 = 0       | Q21 = 1.0     | Q31 = 0       | Q41 = 0.697   | 1.697, or 42.4%
Input node 2 | Q12 = 0       | Q22 = 0       | Q32 = 0.350   | Q42 = 0.303   | 0.653, or 16.3%
Input node 3 | Q13 = 1.0     | Q23 = 0       | Q33 = 0.650   | Q43 = 0       | 1.650, or 41.3%
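The 'weights' computation of Eq. 11 and Eq. 12 can be sketched directly from the Table 1 example, excluding the bias weights as Table 3 does. The matrix layout below mirrors Table 1; reproducing the Table 3 percentages confirms the partitioning.

```python
import numpy as np

# 'Weights' method (Eq. 11 and Eq. 12) applied to the input and output
# weights of the example chromosome matrix in Table 1 (bias excluded).

W_in = np.array([[0.0,   -0.348, 0.0,   0.492],   # input node 1 -> hidden 1..4
                 [0.0,    0.0,   0.492, 0.214],   # input node 2 -> hidden 1..4
                 [0.628,  0.0,   0.914, 0.0]])    # input node 3 -> hidden 1..4
W_out = np.array([0.023, -0.345, 0.295, 0.148])

# Eq. 12: per hidden node h, the share of |w_ih * w_ho| carried by input i.
contrib = np.abs(W_in * W_out)                    # |w_ih * w_ho|
col_sum = contrib.sum(axis=0)
Q = np.divide(contrib, col_sum,
              out=np.zeros_like(contrib), where=col_sum != 0)

# Eq. 11: relative importance of input i as a percentage of all Q values.
RI = 100.0 * Q.sum(axis=1) / Q.sum()
print(np.round(RI, 1))  # matches Table 3: [42.4 16.3 41.3]
```

Because each column of Q sums to 1, the RI percentages always total 100%, which is what makes the partitioning directly comparable across input features.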
of MapReduce instructions', with RI at 9.2% each. The 'Reduce shuffle bytes' counter keeps track of the total bytes being shuffled, which is an indicator of a high presence of network-

[Fig. 5 chart: "Sensitivity Analysis - By Input Categories"; legend: File Size, Job Duration, CPU Utilization, Memory Use, System Load, Network Bandwidth, Map file byte read, Map file byte write, Reduce file byte read, Reduce file byte write, Reduce shuffle byte, Instruction Number]

Fig. 5. Relative Importance of Inputs in Sensitivity Analysis of Neural Networks using 'Weighted' Method Combined with Genetic Algorithm
TABLE 5: COMPUTATION OF RI (%) FOR 20 RUNS

Run | Input Variable 1 | Input Variable 2 | ... | Input Variable 12

REFERENCES

[1] Uptime Institute, "2014 Data Center Industry Survey."
[2] J. Gao, "Machine learning applications for data center optimization," Research at Google, 2014.