
Paper Pengolahan Data

This document compares machine learning algorithms for predicting power consumption using a case study from Tetouan city in Morocco. It summarizes that predicting electricity consumption can help utilities improve their systems. The goal is to predict consumption every 10 minutes or hourly to determine the most accurate approach. Random forest, neural networks, decision trees, and support vector regression models will be compared using data from 2017. The results will indicate which model achieves the smallest prediction errors.

Uploaded by

rhbnha

Comparison of Machine Learning Algorithms for the Power Consumption Prediction
- Case Study of Tetouan city -
Abdulwahed Salam, Abdelaaziz El Hibaoui
Faculty of Sciences
Abdelmalek Essaadi University
Tetouan, Morocco
[email protected], [email protected]

Abstract: Predicting electricity power consumption is an important task which provides intelligence to utilities and helps them to improve their systems' performance in terms of productivity and effectiveness. Machine learning models are among the most accurate models used in prediction. The goal of our study is to predict the electricity power consumption every 10 minutes and every hour, and to determine which approach is the most successful. To this end, we compare different types of machine learning models that have recently gained popularity: feedforward neural network with backpropagation algorithm, random forest, decision tree, and support vector machine for regression (SVR) with radial basis function kernel. The parameters of the compared models are optimized with the grid search method in order to obtain their best performance. The dataset used in this comparative study covers three different power distribution networks of Tetouan city, which is located in north Morocco. The historical data was taken from the Supervisory Control and Data Acquisition (SCADA) system every 10 minutes for the period between 2017-01-01 and 2017-12-31. The results indicate that the random forest model achieved smaller prediction errors than its counterparts.

Keywords: Energy Prediction, Artificial Neural Networks, Random Forest, Decision Tree, Support Vector Regression, Linear Regression

I. INTRODUCTION

Over time, researchers have paid growing attention to the increasing consumption of electric power and to related tasks such as electrical forecasting. Electricity companies have also been spending a lot of money and effort to control and manage their electric power effectively. It has therefore become necessary to survey the new and available methods and to choose the ones best suited to the nature and quality of the services provided by energy companies. One of the questions to be considered is how to know the energy produced and consumed in order to balance production and consumption, to decrease the cost of production, and to control future planning.

Knowing the real production and consumption of power is the first step in building a good electrical system. To save resources and reduce costs, power utilities are required to balance produced power and customers' consumption. The relationship is described by the following equation:

$$P_{\text{produced}} - (P_{\text{consumed}} + \varepsilon) = \delta$$

where $\varepsilon$ is the energy lost in transportation and $\delta$ is a real number. According to the value of $\delta$ we distinguish between the three following cases:

- If $\delta$ is a large positive number, a quantity of energy is produced but not used. In general, this excess energy is lost. To date, the extra power has to be stored or the production has to be reduced.
- If $\delta$ is a large negative number, the consumption is larger than the production. In this case a blackout can occur, so new energy resources are required to handle the situation.
- If $\delta$ is positive and near 0, the electrical system is stable and production and consumption are in harmony.

The goal of power companies is to keep the value of $\delta$ as in the third case. They have to balance production and consumption. To this end, they need a strong system that provides accurate prediction.

Electric power consumption has increased in the last decade due to economic development and population growth. Specifically, the annual electricity consumption is increasing in the industrial and domestic sectors. Many factors determine the energy consumption; we cite here weather, population, price of electricity and consumer behavior. The variability of those factors makes energy prediction more difficult.

Considering this complexity, research has concentrated on finding the most accurate prediction models for the real demand of energy consumption. Many algorithms and models have been proposed to offer solutions to power prediction. In general, those algorithms are classified into three categories: statistical, engineering, and artificial intelligence. In the literature, several researchers use artificial intelligence algorithms, especially machine learning algorithms, to build prediction models.

Pınar Tüfekci [1] examined machine learning methods to predict the full load electrical power output of a base load operated combined cycle power plant. Evangelia Xypolytou et al. [2] studied short-term electricity consumption forecasting with artificial neural networks in a case study of office buildings.
Muhammad Waseem Ahmad et al. [3] compared a feed-forward back-propagation artificial neural network with random forest to predict the power consumption of a hotel in Madrid, Spain. M. Erdem Günay [4] predicted the annual gross electricity demand of Turkey with artificial neural networks. Murat Kankal et al. [5] studied the performance of an artificial neural network for modeling electricity energy demand in Turkey. K. P. Amber et al. [6] compared five artificial intelligence techniques to predict the electricity consumption of a building located in London. Fazil Kaytez et al. [7] compared regression analysis, neural networks and least squares support vector machines for predicting the electricity energy consumption of Turkey. Henrique Pombeiro et al. [8] compared linear regression, fuzzy modeling and neural network models to predict electricity consumption in an institutional building. Subodh Paudel et al. [9] predicted the energy consumption of a low energy building based on support vector machines. Hamid R. Khosravani et al. [10] compared prediction models for the energy consumption of a bioclimatic building based on neural networks. The models cited above may be appropriate for some cases, but they cannot be generalized.

In this study, we compare well-known machine learning algorithms in order to increase the efficiency and revenues of electrical generation and distribution companies, and to assist them in planning their capacity and operations so that all consumers are reliably supplied with the required energy. The availability of historical data allows us to use supervised models such as decision tree, support vector machine for regression, artificial neural network (feedforward neural network with backpropagation algorithm) and random forest. Those models are also compared to the linear regression method. The comparative study is based on historical energy consumption data of Tetouan city for the period between 2017-01-01 and 2017-12-31. The historical data covers the Quads, Boussafou and Smir distribution networks. It was taken from the SCADA system of the regional distribution company of drinking water and electricity (AMENDIS). This data is exclusive and has not been used before our research. Because prediction models depend on the input variables to obtain the best output, we include weather data taken for the same period as the power consumption. Moreover, we use the attributes of date and time as independent variables, study the impact of those factors on the prediction, and determine the importance of each factor for power consumption.

The rest of this paper is organized as follows: Section II gives a technical overview of the machine learning algorithms under comparison. Section III presents the case study and the description of the datasets. Section IV is devoted to the methodology used. Experiments and their results are covered in Section V. Finally, Section VI concludes the paper and outlines some perspectives.

II. OVERVIEW OF MACHINE LEARNING ALGORITHMS

In this section, we provide a brief description of five different machine learning methods: linear regression, decision tree, random forest, support vector machine for regression with radial basis function kernel, and artificial neural network. Here, we use the common notations to describe the algorithms.

A. Linear Regression

Linear regression is considered one of the simplest approaches and it can be used as a baseline performance measure. It is based on a linear relation between the dependent and independent variables [11]. It is defined by this equation:

$$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n \quad (1)$$

where $x_1, x_2, \dots, x_n$ are the available inputs and $w_0, w_1, \dots, w_n$ are the functional weights.

B. Decision Tree

The decision tree is a commonly used machine learning method [12]. It can be used for classification or regression, and it utilizes a tree structure to separate a set of data into several predefined classes, giving the characterization, generalization and classification of given datasets [13].

Its goal is to predict the value of the target variable by learning simple decision rules deduced from the data, and it shows how the target variable can be forecast from the set of predictor variables.

There are many types of decision tree generation such as ID3 [12], C4.5 [14] and classification and regression trees (CART) [15]. In this work, we used CART as implemented in scikit-learn [16]. CART is a nonparametric procedure for predicting a continuous dependent variable which utilizes a binary tree to divide the predictor space into subsets recursively [17]. In our study, we found that this model performed better than the other members of its family.

C. Random Forest

Random forest is an ensemble approach which combines the performance of numerous decision trees to predict the value of a variable. It was proposed to improve the accuracy of the decision tree and it is constructed differently for regression and classification.

A random forest builds a number $K$ of regression trees and the result is their average. Once $K$ trees are grown, the regression predictor is defined as:

$$f(x) = \frac{1}{K} \sum_{k=1}^{K} T_k(x) \quad (2)$$

$$x = \{x_1, x_2, \dots, x_p\}$$

where $x$ is a p-dimensional vector of inputs and $T_k(x)$ refers to a decision tree.

A new training set is built for each tree by randomly sampling the training data with replacement. The samples that are not selected are known as out-of-bag samples [18]. Random features are selected in each split node of a decision tree instead of all
features. This process is repeated in order to create a random forest [19]. The prediction of the random forest is the aggregation of the predictions of the individual trees, and this aggregated prediction performs better than the prediction of an individual tree [20]. Moreover, random forest provides an estimate of the important features and of how each feature affects the prediction [21]. In sum, simplicity, speed, interpretability, accuracy and ease of use are the most important properties of random forest.

D. Support Vector Machine

Support vector machine (SVM) was introduced in the late 1960s but did not receive significant consideration until recent years. SVM is a supervised method for classifying multidimensional data; it was originally invented as a linear classifier and was later extended to a non-linear classifier. More recently, it has been used to solve regression problems [22]; this variant is based on the concept of support vectors and is called support vector regression (SVR).

It is defined as:

$$y = f(x) = w^{T} \varphi(x) + b \quad (3)$$

where $\varphi$ is any nonlinear function that maps the input to the output:

$$\varphi: x \mapsto \varphi(x)$$

The best solution is found by minimizing the following function:

$$\frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*}) \quad (4)$$

$$\text{s.t.} \begin{cases} y_i - w^{T} \varphi(x_i) - b \le \varepsilon + \xi_i \\ w^{T} \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \xi_i^{*} \ge 0 \end{cases}$$

where $C$ is a constant that controls the penalty factor and is used to balance smoothness and data fitting, $\xi$ and $\xi^{*}$ are slack variables used to optimize the problem, and $\varepsilon$ is the parameter of the loss function which is used to estimate the accuracy of prediction. One advantage of SVR is that it finds a unique solution minimizing the convex function, and this solution depends on the choice of $C$ and $\varepsilon$ [23][24].

E. Artificial Neural Network

An artificial neural network imitates the work of the brain. It is a technology that is currently widely used due to its ability to solve complex problems. The artificial neural network is also the most common method for nonlinear regression and classification problems. Many types of networks are available in the literature. Here, we concentrate on the most used one, namely the feedforward neural network. It learns through training, not through programming, and it acquires knowledge by identifying the relationships in the data.

A feedforward neural network consists of at least three layers: an input layer, which receives the data and passes it to the hidden layers; the hidden layers, which connect the input layer to the output layer; and the output layer, which in our case (regression) consists of a single output. There are no connections back from the output layer to the hidden layer or from the hidden layer to the input layer. The values passing from the input layer to the hidden layer are multiplied by the connection weights and processed through a transfer function. The values are forwarded from the hidden layers to the output layer in the same way. Mathematically, it can be described by the following equation:

$$y_o = f_o \left\{ \sum_{i=1}^{N_h} w_i f_i \left[ \sum_{j=1}^{N_{h-1}} w_{ij} f_j \left( \sum_{k=1}^{N_{h-2}} w_{jk} f_k(\dots) + b_j \right) + b_i \right] + b_o \right\} \quad (5)$$

where $y_o$, $f$, $h$, $N_h$, $w$ and $b_i$ represent the neural network output, the activation function, the number of hidden layers, the number of neurons in the hidden layer, the weight of connections, and the bias of the neuron respectively. The transfer function is called the activation function and there are many types. Sigmoid [25], Hyperbolic Tangent (Tanh) [26], Rectified Linear Unit (ReLU) [27], Exponential Linear Unit (ELU) [28], Scaled Exponential Linear Unit (SELU) [29] and Swish [30] activation functions were used in this work.

Feedforward neural networks can learn through different types of learning rules, but backpropagation is the most used algorithm. To reduce errors, the learning rule is used with an optimization algorithm to find the best parameters: the predicted output value is compared with the real value and the error is fed back in order to adjust the weights of the connections. This step is repeated until the error reaches its minimum or the maximum number of epochs is reached.

III. CASE STUDY

Tetouan is a city located in the north of Morocco which occupies an area of around 10375 km². Its population is about 550,374 inhabitants according to the last census of 2014, and is increasing rapidly, by approximately 1.96% annually. Since it is located along the Mediterranean Sea, its weather is mild and rainy in the winter, and hot and dry during the summer months. The power consumption data was collected from the Supervisory Control and Data Acquisition (SCADA) system of Amendis, a public service operator which has been in charge of the distribution of drinking water and electricity since 2002. The purpose of the electricity distribution network is to serve low and medium voltage consumers in the Tetouan region. For this purpose, Amendis ensures the delivery and distribution of electrical energy from the point of delivery to the end user, the customer. The energy which is distributed comes from the National Office of Electricity and Drinking Water. After the high voltage (63 kV) is transformed to medium voltage (20 kV), the energy can be transported and distributed. The distribution network is powered by 3 source stations, namely Quads, Smir and Boussafou.

The data used in this study is the historical power consumption data, which was collected every 10 minutes for the period between 2017-01-01 00:00:00 and 2017-12-31 23:50:00. It is a unique dataset, and it does not have any
missing data. It consists of the date, the time and the consumption of the three distribution networks. Figure 1 shows the consumption of the three distribution networks for the whole year of 2017 at each hour.

Fig 1. Power consumption for the year of 2017 at each hour for the three distribution networks
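The sampling scheme of the consumption data (one reading every 10 minutes for the whole of 2017) can be checked with a short script. The following is only a minimal sketch, assuming the readings are loaded with pandas; the names `EXPECTED_INDEX` and `missing_timestamps` are hypothetical and do not come from the paper.

```python
import pandas as pd

# Expected sampling grid: one reading every 10 minutes over the whole of 2017.
# 365 days * 24 hours * 6 readings per hour = 52560 timestamps.
EXPECTED_INDEX = pd.date_range(
    "2017-01-01 00:00:00", "2017-12-31 23:50:00", freq="10min"
)


def missing_timestamps(observed):
    """Return the expected timestamps that do not appear in `observed`."""
    return EXPECTED_INDEX.difference(pd.DatetimeIndex(observed))
```

An empty result from `missing_timestamps` would be consistent with a dataset that has no missing readings.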

There are similarities and differences between the three distributions. The increase of power consumption in the summer is similar across them and is due to the hot weather and vacation time (visitors increase the population). The difference is the reduced power consumption of the Quads and Smir distributions in November and December compared to the power consumption of the Boussafou distribution in the same months.

Different attributes of date and time are used as inputs for the prediction models. Month, day of month, hour, day of year, week of year, day of week, quarter and minute are the independent variables, and their correlation to the dependent variable is shown in Figure 2.

Fig 2. Correlation relationship between power consumption of Quads distribution network and calendar variables

Due to people's behavior, the power consumption on working days differs from that on weekends. Usually, on working days the consumption decreases in households and increases in factories, commercial and public establishments, and the opposite happens on weekends. Our dataset is aggregated data and is not broken down by type of building, so it only shows the general effect of working days and weekends. Figure 3 shows the consumption by day of the week; slightly less electricity is consumed on Sunday than on the other days.

Fig 3. Box plot comparison of electricity consumption among week days

Without doubt, many factors affect the power consumption, such as weather, income, population and electricity price. One of the factors used in this work is weather, and its data was gathered from sensors. Those sensors are located both at the airport in the center of the city and at the Faculty of Sciences. This data was collected every 5 minutes in the period between 2017-01-01 and 2017-12-31. We resampled it to 10-minute intervals, like the power consumption data, by taking the average of two readings. The weather features
used in our study are: temperature, humidity, wind speed, diffuse flows and general diffuse flows.

Table I shows the weather properties and the correlation between the power consumption of the Quads distribution and the corresponding weather over the full period of the dataset.

TABLE I. WEATHER PROPERTIES AND COEFFICIENT OF CORRELATION BETWEEN THE INPUT VARIABLES AND THE OUTPUT VARIABLE

                 Count   Mean     STD*     Min       Max       Correlation
Quads            52560   32330    7133.05  13895.7   52204.4    1
Temperature      52560   18.81    5.82     3.247     40.010     0.440221
Humidity         52560   68.26    15.55    11.34     94.8      -0.287421
Wind Speed       52560   1.96     2.34     0.05      6.483      0.167444
Diffuse flows    52560   75.03    124.21   0.011     936.00     0.080274
Global Diffuse   52560   182.67   264.41   0.004     1163.00    0.187965
*STD: Standard deviation

Table I and Figure 2 show the correlation of the power consumption with the calendar and weather attributes. We also applied feature selection to the data in order to determine the importance of the predictive variables and to get rid of unimportant features. There are many ways to do this; one of them is random forest, which identifies the true predictors among a large number of candidates [31]. Figure 4 shows that all variables are important, but hour and temperature are the most valuable.

Fig 4. Variable importance for Quads distribution dataset for 10-minute consumption

IV. METHODOLOGY

In this study, we used five well-known types of prediction algorithms. Despite their advantages and accuracy, these models require a careful selection of their parameters in order to achieve the best performance. We used the grid search method to find the best parameters for the models. It is classified as an exhaustive method for finding the best parameter values. The grid search method is recommended to be used along with cross-validation in order to obtain the best values [33]. It first requires a set of candidate values for each parameter. Then, the method shows the score for each parameter value so that the best one can be selected. This method is applicable when the required maximum of the parameters is known [32]. In this work, the calculations were implemented using Python 3.6, with the base algorithms from Keras [34] and scikit-learn [16].

As mentioned in the case study section, we obtained the data from different sources. After extracting, transforming and loading the data from these sources, we normalized it, because the performance of some models, such as the neural network, depends on normalization. We transformed the data into values between zero and one by using Min-Max normalization, which is one of the most widely used techniques. This normalization is achieved by:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (6)$$

We used grid search to find the best parameters for the algorithms. The random forest algorithm depends on several hyperparameters. Selecting appropriate values for these parameters is an important step in obtaining the most accurate result, and there is no fixed rule to follow. The most important parameters are the number of trees in the forest, the number of features to consider at each split, the maximum depth of each tree in the forest, the minimum number of samples required to split a node and the minimum number of samples required at a leaf node. Those parameters are optimized by cross-validation and the grid search method. A range of values was selected for each parameter and the model was trained on them. The number of trees was tested on the set of values {10, 20, 30, 50, 100, 200, 300}, the number of features on {1, 2, 3, 4, 5, 6, 7, 8, 9}, and so on. The grid search method shows the score for each parameter value so that the best values can be chosen. The best values obtained by the grid search method were 30, 7, None, 2 and 1 for the number of trees, the number of features, the max depth of the tree, the min samples split and the min sample leaf parameters respectively.

For the decision tree, different parameters need to be set and examined in order to compare the result with the other algorithms. The most important parameters, which were tested in sets by grid search, are: the depth of the tree, the minimum number of samples required to split an internal node, the minimum number of samples required to be at a leaf node and the number of features to consider when looking for the best split. The best values obtained by grid search were None, 10, 10 and 9 for these parameters respectively.

Support vector regression is characterized by the use of kernel functions. The radial basis function is used in this work because of its lower error compared to the polynomial, linear and sigmoid kernels, as reported by [35]. Cost and gamma are the parameters that need to be optimized. To assess the accuracy of support vector regression, we tested the model on different parameter combinations. The values of the cost parameter were in the set {1, 10, 100, 1000} and the values of gamma were in the set {0.01, 0.001, 0.0001}. The best performance was obtained when the cost equals 10 and gamma equals 0.01.
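The combination of Min-Max normalization and grid search with cross-validation described above can be sketched with scikit-learn as follows. This is an illustrative sketch rather than the authors' code: the function name `tune_svr`, the 3-fold cross-validation and the mean squared error scoring are assumptions, and only the input features (not the target) are scaled here. The candidate values for cost and gamma are the ones listed in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def tune_svr(X, y, cv=3):
    """Grid-search an RBF-kernel SVR over the cost and gamma sets from the text.

    The number of folds `cv` is an assumption; the source does not state it.
    """
    pipeline = Pipeline([
        ("scale", MinMaxScaler()),   # maps each feature to [0, 1], as in Eq. (6)
        ("svr", SVR(kernel="rbf")),
    ])
    param_grid = {
        "svr__C": [1, 10, 100, 1000],
        "svr__gamma": [0.01, 0.001, 0.0001],
    }
    # Scoring choice (negative MSE) is an assumption made for this sketch.
    search = GridSearchCV(pipeline, param_grid, cv=cv,
                          scoring="neg_mean_squared_error")
    search.fit(np.asarray(X), np.asarray(y))
    return search
```

After fitting, `search.best_params_` holds the selected combination and `search.cv_results_` the score of every candidate, which corresponds to the "score for each parameter value" mentioned above. Placing the scaler inside the pipeline means it is refitted on each training fold, which avoids leaking validation data into the normalization.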
The main challenge with a feedforward neural network is how to define the number of hidden layers and the number of neurons in each hidden layer. Choosing a suitable activation function is another challenge. Many studies have tried to settle these choices, but no real rule defines them. In this work, these factors are optimized by grid search. The number of hidden layers and the number of neurons are selected by grid search. For one hidden layer we specified the number of neurons by the following function [36]:

$$N_h = [\dots] + 1 \quad (7)$$

where N is the number of input data.

With one hidden layer, we also tested the neural network on {4, 6, 8, 9, 10, 11, 12, 13, 16, 18, 20, 25, 30} neurons according to [37].

Moreover, the grid search tested the accuracy of other numbers of neurons in a deep network with two hidden layers, which consist of 30 and 20 neurons respectively [37]. The model of nine hidden layers consists of 200, 160, 120, 80, 60, 40, 30, 20 neurons. For this model, each layer was also examined by the grid search. Another factor optimized by grid search is the activation function. Sigmoid, Tanh, ELU, ReLU, SELU and Swish activation functions were considered in the grid because of their variety and popularity. The optimization algorithms considered in the grid search are stochastic gradient descent (SGD) [38] and Adam [39]. The grid search selected one hidden layer with 10 neurons, the SELU activation and Adam as the preferred optimizer. We manually set the number of epochs to 100. The initial learning rate is set to 0.001. The weights are initialized with Glorot uniform initialization [26]. We used no dropout and 0.9 momentum.

V. EXPERIMENTS AND RESULT

The original dataset was collected every 10 minutes, and our study examines the prediction of power consumption over 10-minute and one-hour periods to support the utilities' decision making. All independent inputs are used in the experiment according to their effect, which is explained and shown in the analysis of parameters.

We applied performance criteria to evaluate the models. We used two different measures, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), which are defined as:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_{p,i} - y_{a,i})^{2}}{n}} \quad (8)$$

$$MAE = \frac{\sum_{i=1}^{n} \lvert y_{p,i} - y_{a,i} \rvert}{n} \quad (9)$$

where $y_p$ is the predicted values, $y_a$ is the actual values and $\bar{y}$ is the average.

In order to evaluate the five models on the datasets (10-minute consumption and hourly consumption), each dataset is divided into a training set and a test set. Each algorithm is trained on 75% of the data and the remaining 25% is used for testing. The test set is usually used to judge the models, but we also report results on the training set to show the ability of learning. For comparison purposes, we compared the median of 9 runs of each model on each dataset. All the parameters optimized by grid search above were for the 10-minute power consumption dataset.

The experimental results for the prediction of the 10-minute period are presented in Table II. The two performance criteria show that the random forest model achieved the best results for the four examinations. It is also noticed that the feedforward neural network achieved a result close to the random forest in the Quads distribution network.
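The two error measures in Eq. (8) and Eq. (9) translate directly into a few lines of NumPy. The sketch below is illustrative only; the function names `rmse` and `mae` are hypothetical, and the arguments follow the order (predicted, actual).

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root Mean Squared Error: square root of the mean squared difference."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_pred, y_true):
    """Mean Absolute Error: mean of the absolute differences."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))
```

Because RMSE squares the differences before averaging, it penalizes large individual errors more heavily than MAE, which is why the RMSE values reported for a model are never smaller than its MAE values.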

TABLE II. RMSE AND MAE COMPARISON OF ALGORITHMS IN 3 DISTRIBUTIONS FOR 10 MINUTES POWER CONSUMPTION

Quads Distribution
Algorithm   RMSE Train   RMSE Test   MAE Train   MAE Test
RF          671.7        3174.7      472.8       2663.5
DT          840.2        4613.9      550.7       3962.3
SVR         4092.3       3898.7      3192.3      3046.0
FFNN        2562.2       3203.6      1945.7      2601.2
LR          4404.2       3925.5      3522.1      3112.2

Smir Distribution
Algorithm   RMSE Train   RMSE Test   MAE Train   MAE Test
RF          214.1        2336.9      135.6       1939.6
DT          306.7        2849.8      179.3       2396.3
SVR         4205.5       5584.3      3298.6      4680.8
FFNN        3815.8       4877.8      2976.9      4007.1
LR          4068.4       4949.9      3213.4      4033.4

Boussafou Distribution
Algorithm   RMSE Train   RMSE Test   MAE Train   MAE Test
RF          594.5        3227.8      420.5       2475.9
DT          611.7        3543.7      405.9       2759.5
SVR         3334.4       3981.3      2671.6      3066.6
FFNN        2731.5       3745.6      2119.5      2965.2
LR          3142.8       5785.7      2504.1      4647.2

Aggregated Distribution
Algorithm   RMSE Train   RMSE Test   MAE Train   MAE Test
RF          482.3        4481.1      318.5       3595.3
DT          790.6        5957.3      490.0       4835.5
SVR         10821.8      9647.2      8505.6      7692.8
FFNN        6487.6       7045.9      4985.02     5583.9
LR          10687.4      10152.5     8450.9      8110.3
Fig 5. Actual vs predicted electricity power consumption of comparative models for the Quads distribution of every 10 minutes

Fig 6. Actual vs predicted forecast of comparative models for the aggregation power consumption every 10 minutes
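The 10-minute comparison summarized in Table II can be sketched as a single evaluation loop. The code below is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the 75/25 split is done by random shuffling (the source does not say how the split was made), only the inputs are Min-Max scaled, and scikit-learn's `MLPRegressor` with ReLU stands in for the Keras network with SELU, since `MLPRegressor` offers no SELU activation. The hyperparameters of the random forest, decision tree and SVR are the 10-minute values reported in the methodology.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(seed=0):
    """Five regressors; needs at least 9 input features (max_features=9 for DT)."""
    return {
        "RF": RandomForestRegressor(n_estimators=30, max_features=7, max_depth=None,
                                    min_samples_split=2, min_samples_leaf=1,
                                    random_state=seed),
        "DT": DecisionTreeRegressor(max_depth=None, min_samples_split=10,
                                    min_samples_leaf=10, max_features=9,
                                    random_state=seed),
        "SVR": SVR(kernel="rbf", C=10, gamma=0.01),
        # Stand-in for the Keras FFNN: one hidden layer of 10 neurons, Adam,
        # learning rate 0.001, 100 epochs; ReLU replaces SELU (an assumption).
        "FFNN": MLPRegressor(hidden_layer_sizes=(10,), activation="relu",
                             solver="adam", learning_rate_init=0.001,
                             max_iter=100, random_state=seed),
        "LR": LinearRegression(),
    }


def compare_models(X, y, seed=0):
    """Train on 75% of the data, then report RMSE and MAE on both splits."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    scaler = MinMaxScaler().fit(X_train)  # fit the scaler on training data only
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        row = {}
        for split, X_s, y_s in (("train", X_train, y_train), ("test", X_test, y_test)):
            pred = model.predict(X_s)
            row["rmse_" + split] = float(np.sqrt(mean_squared_error(y_s, pred)))
            row["mae_" + split] = float(mean_absolute_error(y_s, pred))
        results[name] = row
    return results
```

Reporting the training and test errors side by side, as in Table II, makes overfitting visible: a large gap between the two columns for a model indicates that it memorizes the training set more than it generalizes.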

Power utilities need predictions for different time periods such as hours, days, weeks, months and sometimes years for decision making and planning. In this work, we predicted 10-minute and one-hour periods, and the approach can be applied to other time periods. For the hourly prediction, we removed the parameters that became ineffective, such as minutes. As the values of the power consumption and the parameters changed, we needed to optimize the compared algorithms again with the same optimization method (grid search) over the same sets of parameters. Table III shows the optimized parameters of the compared models, which clearly differ from one distribution to another. Table IV shows the results of the models in each distribution and in the aggregation of the three distributions. The results show that the random forest still performs best.
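The preparation of the hourly dataset described above can be illustrated with pandas. This is a sketch under stated assumptions: the function names are hypothetical, and the hourly value is taken as the sum of the six 10-minute readings, which the source does not state explicitly (a mean would be the other natural choice). The calendar attributes are the ones listed in the case study, with the minute attribute dropped for the hourly data.

```python
import pandas as pd


def calendar_features(index):
    """Build the date and time attributes from a DatetimeIndex."""
    return pd.DataFrame({
        "month": index.month,
        "day_of_month": index.day,
        "hour": index.hour,
        "day_of_year": index.dayofyear,
        "week_of_year": index.isocalendar().week.astype("int64").to_numpy(),
        "day_of_week": index.dayofweek,
        "quarter": index.quarter,
        "minute": index.minute,
    }, index=index)


def to_hourly(consumption):
    """Aggregate a 10-minute consumption Series to hourly values.

    Summing the six readings of each hour is an assumption of this sketch.
    """
    hourly = consumption.resample("60min").sum()
    # The minute attribute is constant (0) at hourly resolution, so it is dropped.
    features = calendar_features(hourly.index).drop(columns="minute")
    return hourly, features
```

For example, a Series indexed by `pd.date_range("2017-01-01", periods=288, freq="10min")` covers two days and yields 48 hourly rows with seven calendar attributes each.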

TABLE III. OPTIMIZING COMPARATIVE MODEL PARAMETERS FOR EVERY HOUR POWER CONSUMPTION BY USING GRID SEARCH

Model   Parameter               Quads    Smir     Boussafou   Aggregated
RF      Num of features         3        7        7           5
RF      Min samples split       2        3        3           2
RF      Num of trees            50       10       10          100
RF      Max depth of the tree   None     None     None        None
RF      Min sample leaf         1        1        10          1
DT      Num of features         5        7        9           9
DT      Min samples split       2        3        2           3
DT      Max depth of the tree   None     None     None        None
DT      Min sample leaf         10       10       10          3
SVR     C                       10       1        1000        1
SVR     gamma                   0.01     0.01     0.01        0.01
FFNN    Activation              ReLU     SELU     SELU        SELU
FFNN    Optimizer               SGD      Adam     Adam        Adam
FFNN    Batch size              100      350      250         250
FFNN    Layers                  one      one      one         one
FFNN    Neurons                 25       4        8           4
FFNN    Number of epochs        100      100      100         100
FFNN    Learning rate           0.001    0.001    0.001       0.001

TABLE IV. RMSE AND MAE COMPARISON OF ALGORITHMS IN THE THREE DISTRIBUTION NETWORKS FOR THE HOURLY POWER CONSUMPTION

Distribution  Model  RMSE Train  RMSE Test  MAE Train  MAE Test
Quads         RF     3185.8      21109.7    2286.7     15442.0
Quads         DT     8218.56     26706.6    5879.6     23216.4
Quads         SVR    24954.1     26746.2    19707.4    21291.8
Quads         FFNN   19166.6     21127.3    14511.1    15622.3
Quads         LR     25961.7     23455.2    20798.6    18643.9
Smir          RF     3602.2      14700.9    2342.3     11955.2
Smir          DT     7364.0      16301.5    4716.7     13392.9
Smir          SVR    24094.4     29986.9    18886.2    24206.8
Smir          FFNN   20115.4     19845.6    15182.1    15235.3
Smir          LR     24018.5     29528.9    18985.4    24096.9
Boussafou     RF     5669.8      19504.1    4079.2     15777.6
Boussafou     DT     5766.6      20272.6    4091.4     16332.3
Boussafou     SVR    19320.6     23827.4    15435.9    18262.8
Boussafou     FFNN   20679.2     21873.3    17081.5    17161.9
Boussafou     LR     18528.9     34486.8    14767.1    27776.1
Aggregated    RF     4960.9      28769.3    3493.7     24033.3
Aggregated    DT     7487.5      38016.7    4724.6     30354.9
Aggregated    SVR    62601.5     56235.9    49188.3    44758.6
Aggregated    FFNN   46393.3     49175.1    36238.2    38693.8
Aggregated    LR     62840.1     59939.8    49712.8    47962.1
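
To make the two error measures concrete, the following minimal sketch computes the root mean squared error and the mean absolute error on a few made-up values, then picks the model with the lowest test root mean squared error for each distribution using the test figures copied from Table IV. The helper names `rmse` and `mae` and the toy `actual`/`predicted` lists are illustrative assumptions, not part of the original study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy values (assumption), only to show how the two metrics are evaluated.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(round(rmse(actual, predicted), 3), round(mae(actual, predicted), 3))

# Test-set root mean squared errors copied from Table IV.
test_rmse = {
    "Quads": {"RF": 21109.7, "DT": 26706.6, "SVR": 26746.2, "FFNN": 21127.3, "LR": 23455.2},
    "Smir": {"RF": 14700.9, "DT": 16301.5, "SVR": 29986.9, "FFNN": 19845.6, "LR": 29528.9},
    "Boussafou": {"RF": 19504.1, "DT": 20272.6, "SVR": 23827.4, "FFNN": 21873.3, "LR": 34486.8},
    "Aggregated": {"RF": 28769.3, "DT": 38016.7, "SVR": 56235.9, "FFNN": 49175.1, "LR": 59939.8},
}

# For each distribution, keep the model with the smallest test error.
best = {network: min(scores, key=scores.get) for network, scores in test_rmse.items()}
print(best)
```

With the values from the table, the random forest has the lowest test error in all four cases, although its margin over the feedforward neural network on the Quads distribution is small (21109.7 versus 21127.3).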
Fig 7. Actual vs predicted electricity power consumption of comparative models for the Boussafou distribution network every 10 minutes

Fig 8. Actual vs predicted electricity power consumption of comparative models for the hourly aggregation of the three distributions

VI. CONCLUSION

Accurate prediction of power consumption is a necessary part of electricity management for sustainable, productive and effective systems. In this paper, linear regression, decision tree, random forest, feedforward neural network and support vector regression models were used to predict the power consumption of three distribution networks in Tetouan city. The results of these algorithms were compared to determine which one gives the best performance in terms of energy forecasting. The dataset used in this work is exclusive and has not been used before; it serves to predict power consumption over 10-minute and one-hour periods. Calendar and weather predictive variables were included. Hour and temperature were shown to be the most prominent predictive variables. We optimized the comparative models by grid search to find the best parameters of each model. The results indicate that the random forest model outperformed the other models for the prediction of electricity power consumption in Tetouan city. As a perspective of this work, we hope to apply the same study to other Moroccan power suppliers and distribution companies, including those providing renewable energy. Likewise, we plan to carry out a financial study and measure the economic impact.

ACKNOWLEDGMENT

We would like to thank the Amendis company for supplying the power consumption data, and the physics department of the Faculty of Science of Tetouan for the weather data. Likewise, we thank the Google Colaboratory, Keras, Scikit-learn and TensorFlow development teams for their facilities.
“Comparative assessment of low-complexity models to predict Energy Build., vol. 49, pp. 591–603, Jun. 2012.
electricity consumption in an institutional building: Linear regression [24] V. Rodriguez-Galiano, M. Sanchez-Castillo, M. Chica-Olmo, and M.
vs. fuzzy modeling vs. neural networks,” Energy Build., vol. 146, pp. Chica-Rivas, “Machine learning predictive models for mineral
141–151, Jul. 2017. prospectivity: An evaluation of neural networks, random forest,
[9] S. Paudel et al., “A relevant data selection method for energy regression trees and support vector machines,” Ore Geol. Rev., vol.
consumption prediction of low energy building based on support 71, pp. 804–818, Dec. 2015.
vector machine,” Energy Build., vol. 138, pp. 240–256, Mar. 2017. [25] A. A. Minai and R. D. Williams, “On the derivatives of the sigmoid,”
[10] H. Khosravani et al., “A Comparison of Energy Consumption Neural Networks, vol. 6, no. 6, pp. 845–853, Jan. 1993.
Prediction Models Based on Neural Networks of a Bioclimatic [26] X. Glorot and Y. Bengio, “Understanding the difficulty of training
Building,” Energies, vol. 9, no. 1, p. 57, Jan. 2016. deep feedforward neural networks.” pp. 249–256, 31-Mar-2010.
[11] N. Fumo and M. A. Rafe Biswas, “Regression analysis for prediction [27] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical Evaluation of
of residential energy consumption,” Renew. Sustain. Energy Rev., vol. Rectified Activations in Convolutional Network,” May 2015.
47, pp. 332–343, Jul. 2015.
[28] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate
[12] J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. Deep Network Learning by Exponential Linear Units (ELUs),” Nov.
1, pp. 81–106, Mar. 1986. 2015.
[13] Z. Yu, F. Haghighat, B. C. M. Fung, and H. Yoshino, “A decision tree [29] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-
method for building energy demand modeling,” Energy Build., vol. Normalizing Neural Networks,” Jun. 2017.
42, no. 10, pp. 1637–1646, Oct. 2010.
[30] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for Activation
[14] S. L. Salzberg, “C4.5: Programs for Machine Learning by J. Ross Functions,” Oct. 2017.
Quinlan. Morgan Kaufmann Publishers, Inc., 1993,” Mach. Learn.,
[31] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, “Variable selection
vol. 16, no. 3, pp. 235–240, Sep. 1994.
using random forests,” Pattern Recognit. Lett., vol. 31, no. 14, pp.
[15] L. Breiman, Classification and Regression Trees, 1st Editio. 2225–2236, Oct. 2010.
Routledge, 2017.
[32] M. Ataei and M. Osanloo, “Using a Combination of Genetic
[16] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Algorithm and the Grid Search Method to Determine Optimum Cutoff
Mach. Learn. Res., vol. 12, no. Oct, pp. 2825–2830, 2011. Grades of Multiple Metal Deposits,” Int. J. Surf. Mining, Reclam.
[17] M. A. Razi and K. Athappilly, “A comparative predictive analysis of Environ., vol. 18, no. 1, pp. 60–78, Jan. 2004.
neural networks (NNs), nonlinear regression and classification and [33] C.-J. Lin, “A Practical Guide to Support Vector Classification
regression tree (CART) models,” Expert Syst. Appl., vol. 29, no. 1, pp. Motivation and Outline,” 2003.
65–74, Jul. 2005.
[34] F. Chollet and others, “Keras.” 2015.
[18] R. Jiang, W. Tang, X. Wu, and W. Fu, “A random forest approach to
[35] R. Zuo and E. J. M. Carranza, “Support vector machine: A tool for
the detection of epistatic interactions in case-control studies,” BMC
mapping mineral prospectivity,” Comput. Geosci., vol. 37, no. 12, pp.
Bioinformatics, vol. 10, no. Suppl 1, p. S65, Jan. 2009.
1967–1975, Dec. 2011.
[19] L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–
[36] K. G. Sheela and S. N. Deepa, “Review on Methods to Fix Number of
32, 2001.
Hidden Neurons in Neural Networks,” Math. Probl. Eng., vol. 2013,
[20] M. J. Kane, N. Price, M. Scotch, and P. Rabinowitz, “Comparison of pp. 1–11, Jun. 2013.
ARIMA and Random Forest time series models for prediction of avian
[37] S. Karsoliya, “Approximating Number of Hidden layer neurons in
influenza H5N1 outbreaks,” BMC Bioinformatics, vol. 15, no. 1, p.
Multiple Hidden Layer BPNN Architecture,” Int. J. Eng. Trends
276, Aug. 2014.
Technol., 2012.
[21] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson, “Random
[38] T. Zhang and Tong, “Solving large scale linear prediction problems
Forests for land cover classification,” Pattern Recognit. Lett., vol. 27,
using stochastic gradient descent algorithms,” in Twenty-first
no. 4, pp. 294–300, Mar. 2006.
international conference on Machine learning - ICML ’04, 2004, p.
[22] H. Drucker, H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and 116.
V. Vapnik, “Support Vector Regression Machines,” Adv. NEURAL
[39] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic
Inf. Process. Syst. 9, vol. 9, pp. 155--161, 1997.
Optimization,” 3rd Int. Conf. Learn. Represent. San Diego, 2015,
[23] R. E. Edwards, J. New, and L. E. Parker, “Predicting future hourly Dec. 2014.
residential electrical consumption: A machine learning case study,”
