Data Processing Paper
Abstract: Predicting electricity power consumption is an important task that provides intelligence to utilities and helps them improve their systems' performance in terms of productivity and effectiveness. Machine learning models are among the most accurate models used in prediction. The goal of our study is to predict the electricity power consumption every 10 minutes and every hour, and to determine which approach is the most successful. To this end, we will compare different types of machine learning models that have recently gained popularity: feedforward neural network with backpropagation algorithm, random forest, decision tree, and support vector machine for regression (SVR) with radial basis function kernel. The parameters of the compared models are optimized with the grid search method in order to obtain the best performance. The dataset used in this comparative study covers three different power distribution networks of Tetouan city, which is located in northern Morocco. The historical data were taken from the Supervisory Control and Data Acquisition (SCADA) system every 10 minutes for the period between 2017-01-01 and 2017-12-31. The results indicate that the random forest model achieved smaller prediction errors than its counterparts.

Keywords: Energy Prediction, Artificial Neural Networks, Random Forest, Decision Tree, Support Vector Regression, Linear Regression

I. INTRODUCTION

Over time, the increasing consumption of electric power and its related problems, such as electrical forecasting, have attracted growing attention from researchers. Electricity companies have also been spending a lot of money and effort to control and manage their electric power effectively. Therefore, it becomes necessary to consider all new and available methods and to choose the one best suited to the nature and quality of the services provided by energy companies. One question that needs to be considered is how to know the energy produced and consumed in order to balance production and consumption, to decrease the cost of production, and to control future planning.

Knowing the real production and consumption of power is the first step in building a good electrical system. To save resources and reduce costs, power utilities are required to balance produced power and customers' consumption. The relationship is described by the following equation:

$(C + \varepsilon) = P$

where $P$ is the produced power, $C$ is the customers' consumption, and $\varepsilon$ is the energy lost in transportation, a real number. According to the value of $\varepsilon$, we distinguish between the three following cases:

• If $\varepsilon$ is a large positive number, a quantity of energy is produced but not used. In general, this excess energy is lost. The open problem so far is that the extra power must either be stored or the production reduced.
• If $\varepsilon$ is a large negative number, consumption is larger than production. In this case, a blackout can occur, so new energy resources are required to handle the situation.
• If $\varepsilon$ is positive and near 0, the electrical system is stable and production and consumption are in harmony.

The goal of power companies is to keep the value of $\varepsilon$ as in the third case. They have to balance production and consumption. To this end, they need a robust system that provides accurate predictions.

Electric power consumption has increased in the last decade due to economic development and population growth. Specifically, the annual electricity power consumption is increasing in the industrial and domestic sectors. Many factors determine energy consumption; we cite here weather, population, price of electricity and consumer behavior. The variability of those factors makes energy prediction more difficult.

Considering this complexity, research has concentrated on finding the most accurate models to predict the real demand for energy. Many algorithms and models have therefore been proposed for power prediction. In general, those algorithms are classified into three categories: statistical, engineering, and artificial intelligence. In the literature, several researchers use artificial intelligence algorithms, especially machine learning algorithms, to build a prediction model.

Pınar Tüfekci [1] examined machine learning methods to predict the full load electrical power output of a base load operated combined cycle power plant. Evangelia Xypolytou et al. [2] studied short-term electricity consumption forecasting with artificial neural networks in a case study of office buildings.
Muhammad Waseem Ahmad et al. [3] compared a feed-forward back-propagation artificial neural network with random forest to predict the power consumption of a hotel in Madrid, Spain. M. Erdem Günay [4] predicted the annual gross electricity demand of Turkey with artificial neural networks. Murat Kankal et al. [5] studied the performance of an artificial neural network for modeling electricity energy demand in Turkey. K. P. Amber et al. [6] compared five artificial intelligence techniques to predict the electricity power consumption of a building located in London. Fazil Kaytez et al. [7] compared regression analysis, neural networks and least squares support vector machines for predicting the electricity energy consumption of Turkey. Henrique Pombeiro et al. [8] compared linear regression, fuzzy modeling and neural network models to predict electricity consumption in an institutional building. Subodh Paudel et al. [9] predicted the energy consumption of a low energy building based on support vector machine. Hamid R. Khosravani et al. [10] compared neural network prediction models for the energy consumption of a bioclimatic building.

The models cited above may be appropriate for some cases, but they cannot be generalized.

In this study, we will compare well-known machine learning algorithms in order to increase the efficiency and revenues of electrical generation and distribution network companies, and to assist them in planning their capacity and operations so that all consumers are reliably supplied with the required energy. The availability of historical data allows us to use supervised models such as decision tree, support vector machine for regression, artificial neural network (feedforward neural network with backpropagation algorithm) and random forest. Those models will also be compared to the linear regression method. The comparative study is based on historical energy consumption data of Tetouan city for the period between 2017-01-01 and 2017-12-31. The historical data cover the Quads, Boussafou and Smir distribution networks. They were taken from the SCADA system of the regional distribution company of drinking water and electricity (AMENDIS). This data is exclusive and has not been used before our research. Because prediction models depend on their input variables to obtain the best output, we will include the weather data recorded over the same period as the power consumption. Moreover, we will use the attributes of date and time as independent variables, study the impact of those factors on the prediction, and determine the importance of each factor for power consumption.

II. OVERVIEW OF MACHINE LEARNING ALGORITHMS

In this section, we will provide a brief description of five different machine learning methods: linear regression, decision tree, random forest, support vector machine for regression with radial basis function kernel, and artificial neural network. Here, we use the common notations to describe the algorithms.

A. Linear Regression

Linear regression is considered one of the simplest approaches and can be used as a baseline performance measure. It is based on a linear relation between the dependent and independent variables [11]. It is defined by this equation:

$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$ (1)

where $x_1, x_2, \dots, x_n$ are the available inputs and $w_0, w_1, \dots, w_n$ are the functional weights.

B. Decision Tree

The decision tree is a commonly used machine learning method [12]. It can be used for classification or regression, and it utilizes a tree structure to separate a set of data into several predefined classes, giving the characterization, generalization and classification of given datasets [13].

Its goal is to predict the value of the target variable by learning simple decision rules deduced from the data, and it shows how the target variable can be forecasted from the set of predictor variables.

There are many types of decision tree generation such as ID3 [12], C4.5 [14] and classification and regression trees (CART) [15]. In this work, we implemented CART using scikit-learn [16]. CART is a nonparametric procedure for predicting a continuous dependent variable, which utilizes a binary tree to divide the predictor space into subsets recursively [17]. In our study, we found that this model performed better than the other members of its family.

C. Random Forest

Random forest is an ensemble approach which combines the performance of numerous decision trees to predict the value of a variable. It was proposed to improve the accuracy of the decision tree, and it is constructed differently for regression and classification.

Random forest builds a number of regression trees and the result is their average. Once the $K$ trees $T_1, \dots, T_K$ are grown, the regression predictor is defined as:

$\hat{f}(x) = \frac{1}{K} \sum_{k=1}^{K} T_k(x)$ (2)

A new training set is built for each tree by randomly sampling the original training data. The samples that are not selected are known as out-of-bag samples [18]. Random features are selected in each split node of a decision tree instead of all
features. This process is repeated in order to create a random forest [19]. The prediction of the random forest is the aggregation of the predictions of the individual trees, and this aggregated prediction performs better than the prediction of any individual tree [20]. Moreover, random forest provides an estimate of the relevant features and of how each feature affects the prediction [21]. In sum, simplicity, speed, interpretability, accuracy and ease of use are the most important properties of random forest.

D. Support Vector Machine

The support vector machine (SVM) was introduced in the late 1960s, but it did not receive significant consideration until recent years. SVM is a supervised method for classifying multidimensional data; it was originally invented as a linear classifier and later extended to a non-linear classifier. More recently, it has been used to solve regression problems [22]. This variant is based on the concept of support vectors and is called support vector regression (SVR).

It is defined as:

$y = f(x) = w^{T} \varphi(x) + b$ (3)

where $\varphi$ is any nonlinear function that maps the input to the output:

$\varphi : x \mapsto \varphi(x)$

The best solution is found by minimizing the following function:

$\frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})$ (4)

... output, and there are no connections back from the output layer to the hidden layer or from the hidden layer to the input layer. On their way from the input layer to the hidden layer, the values are multiplied by the connection weights and processed through the transfer function. The values are then forwarded from the hidden layers to the output layer in the same way. Mathematically, it can be described by the following equation:

$y_o = f_o\left( \sum_{k=1}^{N_L} w_{k}\, f_k\left( \sum_{j=1}^{N_{L-1}} w_{kj}\, f_j\left( \sum_{i=1}^{N_{L-2}} w_{ji}\, f_i(\dots) + b_j \right) + b_k \right) + b_o \right)$ (5)

where $y_o$, $f$, $L$, $N_L$, $w$, $b_k$ represent the neural network output, the activation function, the number of hidden layers, the number of neurons in the hidden layer, the weight of connections, and the bias of the neuron, respectively. The transfer function is called the activation function, and there are many types. The Sigmoid [25], Hyperbolic Tangent (Tanh) [26], Rectified Linear Unit (ReLU) [27], Exponential Linear Unit (ELU) [28], Scaled Exponential Linear Unit (SELU) [29] and Swish [30] activation functions were used in this work.

Feedforward neural networks learn through different types of learning rules, but backpropagation is the most widely used algorithm. To reduce errors, the learning rule is combined with an optimization algorithm to find the best parameters: the predicted output value is compared with the real value, and the error is fed back in order to adjust the weights of the connections. This step is repeated until the error reaches its minimum or the maximum number of epochs is reached.
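The layer-by-layer computation described above can be illustrated with a small forward pass. This is only a minimal sketch in plain Python, not the network used in the study: the weights, biases and layer sizes are made-up values, the function names (`relu`, `identity`, `dense`, `forward`) are hypothetical, and the training step (backpropagation) is deliberately left out.

```python
def relu(z):
    """Rectified Linear Unit activation."""
    return max(0.0, z)


def identity(z):
    """Linear output activation, a common choice for regression."""
    return z


def dense(inputs, weights, biases, activation):
    """Compute activation(sum_i w[j][i] * x[i] + b[j]) for every neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Propagate x through the layers in order, from input to output only."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical toy network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 4.0]], [0.5], identity),
]
prediction = forward([2.0, 1.0], layers)
print(prediction)  # [4.5]
```

Each layer is a list of weight rows (one per neuron), a list of biases and an activation function, so deeper networks only require more entries in `layers`.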
Fig 1. Power consumption for the year 2017 at each hour for the three distributions
There are similarities and differences between the three distributions. The increase in power consumption during the summer is common to all of them, because of the hot weather and the vacation period (visitors increase the population). The difference is that the power consumption of the Quads and Smir distributions decreases in November and December compared to the power consumption of the Boussafou distribution in the same months.

TABLE I. WEATHER PROPERTIES AND COEFFICIENT OF CORRELATION BETWEEN THE INPUT VARIABLES AND THE OUTPUT VARIABLE
Variable         Count   Mean     STD*      Min       Max       Correlation
Quads            52560   32330    7133.05   13895.7   52204.4    1
Temperature      52560   18.81    5.82      3.247     40.010     0.440221
Humidity         52560   68.26    15.55     11.34     94.8      -0.287421
Wind Speed       52560   1.96     2.34      0.05      6.483      0.167444
Diffuse flows    52560   75.03    124.21    0.011     936.00     0.080274
Global Diffuse   52560   182.67   264.41    0.004     1163.00    0.187965
*STD: Standard deviation

Table I and Figure 2 show the correlation of power consumption with the calendar and weather attributes. We also applied feature selection to the data in order to determine the importance of the predictive variables and to discard the unimportant features. There are many ways to do this; one of them is random forest, which identifies the true predictors among a large number of candidates [31]. Figure 4 shows that all variables are important, but hour and temperature are the most valuable.

... and the opposite happened on weekends. Our dataset is aggregated and does not distinguish specific types of buildings, so it only shows the general effect of working days and weekends. Figure 3 shows the consumption for each day of the week; less electric power is consumed on Sunday than on the other days.

As mentioned in the case study section, we obtained the data from different sources. After extracting, transforming and loading the data from these sources, we normalized them, because the performance of some models, such as the neural network, depends on normalization. We transformed the data into values between zero and one using Min-Max normalization, which is one of the most widely used techniques. This normalization is achieved by:

$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$ (6)

We used grid search to find the best parameters for the algorithms. The random forest algorithm depends on several hyperparameters. Selecting appropriate values for these parameters is an important step toward the most accurate result, and there is no fixed rule to follow. The most important parameters are the number of trees in the forest, the number of features to consider at each split, the maximum depth of each tree in the forest, the minimum number of samples required to split a node, and the minimum number of samples required at a leaf node. Those parameters are optimized by cross-validation and the grid search method. A range of values was selected for each parameter, and the model was trained on them. The number of trees was tested on the set of values {10, 20, 30, 50, 100, 200, 300}, the number of features on {1, 2, 3, 4, 5, 6, 7, 8, 9}, and so on. The grid search method reports the score for each parameter value, from which the best values are chosen. The best values obtained by the grid search method were 30, 7, None, 2 and 1 for the number of trees, the number of features, the max depth of the tree, the min samples split and the min sample leaf parameters, respectively.
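The Min-Max normalization of Eq. (6) and the exhaustive search over parameter values described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the study's implementation: a small k-nearest-neighbour regressor stands in for the random forest so that the example stays self-contained, the data are synthetic, and all function names are hypothetical.

```python
import itertools
import math
import random


def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1], as in Eq. (6)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def knn_predict(train_x, train_y, x, k):
    """Stand-in regressor: mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_rmse(xs, ys, k, n_folds=5):
    """Average RMSE of the stand-in model over n_folds cross-validation folds."""
    fold_scores = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        sq_err = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in test]
        fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / n_folds


def grid_search(xs, ys, param_grid):
    """Try every combination in param_grid and keep the lowest CV error."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_rmse(xs, ys, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data (assumption): a daily load cycle plus Gaussian noise.
random.seed(0)
hours = [i % 24 for i in range(240)]
load = [30000 + 8000 * math.sin(2 * math.pi * h / 24) + random.gauss(0, 500)
        for h in hours]

x_norm = min_max_normalize(hours)
y_norm = min_max_normalize(load)
best_params, best_score = grid_search(x_norm, y_norm, {"k": [1, 3, 5, 9]})
print(best_params, round(best_score, 4))
```

With a library such as scikit-learn, the same idea would apply to the random forest parameters listed above; the loop over `itertools.product` is what makes the search exhaustive, so its cost grows with the product of the grid sizes.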
TABLE II. RMSE AND MAE COMPARISON OF ALGORITHMS IN THE 3 DISTRIBUTIONS FOR 10-MINUTE POWER CONSUMPTION
Quads Distribution Smir Distribution Boussafou Distribution Aggregated Distribution
Algorithm RMSE MAE RMSE MAE RMSE MAE RMSE MAE
Train Test Train Test Train Test Train Test Train Test Train Test Train Test Train Test
RF 671.7 3174.7 472.8 2663.5 214.1 2336.9 135.6 1939.6 594.5 3227.8 420.5 2475.9 482.3 4481.1 318.5 3595.3
DT 840.2 4613.9 550.7 3962.3 306.7 2849.8 179.3 2396.3 611.7 3543.7 405.9 2759.5 790.6 5957.3 490.0 4835.5
SVR 4092.3 3898.7 3192.3 3046.0 4205.5 5584.3 3298.6 4680.8 3334.4 3981.3 2671.6 3066.6 10821.8 9647.2 8505.6 7692.8
FFNN 2562.2 3203.6 1945.7 2601.2 3815.8 4877.8 2976.9 4007.1 2731.5 3745.6 2119.5 2965.2 6487.6 7045.9 4985.02 5583.9
LR 4404.2 3925.5 3522.1 3112.2 4068.4 4949.9 3213.4 4033.4 3142.8 5785.7 2504.1 4647.2 10687.4 10152.5 8450.9 8110.3
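For reference, the two error measures reported in the table, root mean square error (RMSE) and mean absolute error (MAE), can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match the ones used in the study; the function names and sample values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more strongly."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up values: the errors are -1 and 7.
actual = [10.0, 20.0]
predicted = [11.0, 13.0]
print(rmse(actual, predicted))  # 5.0
print(mae(actual, predicted))   # 4.0
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same data, which is why the RMSE columns are consistently larger than the MAE columns.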
Fig 5. Actual vs predicted electricity power consumption of comparative models for the Quads distribution of every 10 minutes
Fig 6. Actual vs predicted forecast of comparative models for the aggregation power consumption every 10 minutes
Power utilities need predictions for different time periods such as hours, days, weeks, months and sometimes years for decision making and planning. In this work, we predicted consumption for 10-minute and one-hour periods, and the same approach can be applied to other time periods. For the hourly prediction, we removed the parameters that became ineffective, such as minutes. As the values of power consumption and the parameters changed, we needed to optimize the compared algorithms again with the same method (grid search) over the same sets of parameters. Table III shows the optimized parameters of the compared models, which clearly differ from one distribution to another. Table IV shows the results of the models for each distribution and for the aggregation of the three distributions. The results show that random forest again achieves the best performance.
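Moving from 10-minute to hourly data can be sketched as follows. The text does not state how the six readings of each hour were combined, so summing them is an assumption made here for illustration; the function name and the sample readings are also hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_hourly(readings):
    """Sum (timestamp, value) readings that fall within the same clock hour."""
    totals = defaultdict(float)
    for timestamp, value in readings:
        # Dropping minutes and seconds maps each reading to the start of its hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour_start] += value
    return dict(sorted(totals.items()))


# Made-up data: two hours of 10-minute readings, 100.0 each.
start = datetime(2017, 1, 1, 0, 0)
readings = [(start + timedelta(minutes=10 * i), 100.0) for i in range(12)]

hourly = aggregate_hourly(readings)
for hour_start, total in hourly.items():
    print(hour_start.isoformat(), total)
```

After this step the minute attribute is constant within each row, which is consistent with removing it from the inputs for the hourly models.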
TABLE III. OPTIMIZED PARAMETERS OF THE COMPARATIVE MODELS FOR HOURLY POWER CONSUMPTION USING GRID SEARCH

RF
  Quads Distribution: Num of features = 3, min samples split = 2, Num of Trees = 50, max depth of the tree = None, min sample leaf = 1
  Smir Distribution: Num of features = 7, min samples split = 3, Num of Trees = 10, max depth of the tree = None, min sample leaf = 1
  Boussafou Distribution: Num of features = 7, min samples split = 3, Num of Trees = 10, max depth of the tree = None, min sample leaf = 10
  Aggregated Distribution: Num of features = 5, min samples split = 2, Num of Trees = 100, max depth of the tree = None, min sample leaf = 1

DT
  Quads Distribution: Num of features = 5, min samples split = 2, max depth of the tree = None, min sample leaf = 10
  Smir Distribution: Num of features = 7, min samples split = 3, max depth of the tree = None, min sample leaf = 10
  Boussafou Distribution: Num of features = 9, min samples split = 2, max depth of the tree = None, min sample leaf = 10
  Aggregated Distribution: Num of features = 9, min samples split = 3, max depth of the tree = None, min sample leaf = 3

SVR
  Quads Distribution: C = 10, gamma = 0.01
  Smir Distribution: C = 1, gamma = 0.01
  Boussafou Distribution: C = 1000, gamma = 0.01
  Aggregated Distribution: C = 1, gamma = 0.01

FFNN
  Quads Distribution: Activation = ReLU, optimizer = SGD, batch size = 100, layers = one, neurons = 25, number of epochs = 100, learning rate = 0.001
  Smir Distribution: Activation = SELU, optimizer = Adam, batch size = 350, layers = one, neurons = 4, number of epochs = 100, learning rate = 0.001
  Boussafou Distribution: Activation = SELU, optimizer = Adam, batch size = 250, layers = one, neurons = 8, number of epochs = 100, learning rate = 0.001
  Aggregated Distribution: Activation = SELU, optimizer = Adam, batch size = 250, layers = one, neurons = 4, number of epochs = 100, learning rate = 0.001
TABLE IV. RMSE AND MAE COMPARISON OF ALGORITHMS IN THE 3 DISTRIBUTION NETWORKS FOR HOURLY POWER CONSUMPTION
Quads Distribution Smir Distribution Boussafou Distribution Aggregated Distribution
Model RMSE MAE RMSE MAE RMSE MAE RMSE MAE
Train Test Train Test Train Test Train Test Train Test Train Test Train Test Train Test
RF 3185.8 21109.7 2286.7 15442.0 3602.2 14700.9 2342.3 11955.2 5669.8 19504.1 4079.2 15777.6 4960.9 28769.3 3493.7 24033.3
DT 8218.56 26706.6 5879.6 23216.4 7364.0 16301.5 4716.7 13392.9 5766.6 20272.6 4091.4 16332.3 7487.5 38016.7 4724.6 30354.9
SVR 24954.1 26746.2 19707.4 21291.8 24094.4 29986.9 18886.2 24206.8 19320.6 23827.4 15435.9 18262.8 62601.5 56235.9 49188.3 44758.6
FFNN 19166.6 21127.3 14511.1 15622.3 20115.4 19845.6 15182.1 15235.3 20679.2 21873.3 17081.5 17161.9 46393.3 49175.1 36238.2 38693.8
LR 25961.7 23455.2 20798.6 18643.9 24018.5 29528.9 18985.4 24096.9 18528.9 34486.8 14767.1 27776.1 62840.1 59939.8 49712.8 47962.1
Fig 7. Actual vs predicted electricity power consumption of comparative models for the Boussafou distribution network every 10 minutes
Fig 8. Actual vs predicted electricity power consumption of comparative models for the hourly aggregation of the three distributions