Article
Predicting Heating Load in Energy-Efficient Buildings
Through Machine Learning Techniques
Hossein Moayedi 1,2, * , Dieu Tien Bui 3,4, * , Anastasios Dounis 5 , Zongjie Lyu 6 and
Loke Kok Foong 7
1 Department for Management of Science and Technology Development, Ton Duc Thang University,
Ho Chi Minh City 758307, Vietnam
2 Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City 758307, Vietnam
3 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
4 Geographic Information System Group, Department of Business and IT, University of South-Eastern
Norway, N-3800 Bø i Telemark, Norway
5 University of West Attica, Dept. of Industrial Design and Production Engineering, Campus 2, 250 Thivon &
P. Ralli, 12244 Egaleo, Greece; [email protected]
6 State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, Xi’an University of Technology,
Xi’an 710048, China; [email protected]
7 School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor,
Malaysia; [email protected]
* Correspondence: [email protected] (H.M.); [email protected] (D.T.B.);
Tel.: +84-(47)96677678 (H.M.)
Received: 5 September 2019; Accepted: 11 October 2019; Published: 15 October 2019
Abstract: The heating load calculation is the first step of the iterative heating, ventilation, and air
conditioning (HVAC) design procedure. In this study, we employed six machine learning techniques,
namely multi-layer perceptron regressor (MLPr), lazy locally weighted learning (LLWL), alternating
model tree (AMT), random forest (RF), ElasticNet (ENet), and radial basis function regression (RBFr)
for the problem of designing energy-efficient buildings. After that, these approaches were used to
specify a relationship among the parameters of input and output in terms of the energy performance
of buildings. The calculated outcomes for datasets from each of the above-mentioned models were
analyzed based on various known statistical indexes like root relative squared error (RRSE), root mean
squared error (RMSE), mean absolute error (MAE), correlation coefficient (R²), and relative absolute
error (RAE). It was found that, among the discussed machine learning-based solutions of MLPr,
LLWL, AMT, RF, ENet, and RBFr, the RF was the most appropriate predictive network. The RF
network outcomes gave an R², MAE, RMSE, RAE, and RRSE for the training dataset of 0.9997, 0.19,
0.2399, 2.078, and 2.3795, respectively, and for the testing dataset of 0.9989, 0.3385, 0.4649, 3.6813,
and 4.5995, respectively. These results show the superiority of the presented RF model in early
estimation of heating load in energy-efficient buildings.
Keywords: energy-efficient buildings; smart buildings; machine learning; random forest; optimization
1. Introduction
In recent decades, artificial intelligence-based methods have been widely applied by scientists
in different fields of study, particularly in energy systems engineering (such as in Nguyen et al. [1]
and Najafi et al. [2]). In this regard, machine learning-based techniques are considered a proper
alternative for forecasting the energy demand of buildings.
Consequently, an appropriate inspection of the energy performance of buildings and optimal
design of the heating, ventilation, and air-conditioning (HVAC) system will help push forward
sustainable energy consumption. The world's energy consumption remains high and, even though
many countries have taken reasonable measures, it is expected to increase in the future. Many believe
that this is because of the rapid expansion of the economy and rising living standards. Currently,
energy required for buildings accounts for almost 40% of all energy use in Europe [3]. Some reports
have indicated that buildings account for about 39% of the whole energy demand in countries such as
the United States, and for about 27.5% of nationally consumed energy in China. As a novel idea, most
recently, intelligent predictive
tools have been utilized for the field of energy consumption calculation. In fact, the problem of
heating load calculation in energy-efficient buildings is an established concern. For realizing the best
artificial intelligence (AI) model to meet this goal, this study provides and compares six well-known
models that are widely used by researchers [4–8]. Similar to other research in the fields of science and
technology, AI techniques have widespread application in putting forward reasonable evaluations
in many engineering problems [9–17], including the energy consumption of buildings. Among the numerous types
of artificial intelligence-based solutions, artificial neural network (ANN) is known as a recognized
method that is largely employed for many prediction-based examples [18–22]. Similar studies are
performed in regard to hybrid metaheuristic optimization approaches [23–29]. Also, in the field of
energy management, neural networks have emerged as one of the effective prediction tools [30–33].
Zemella et al. [34] investigated the design optimization of energy efficient buildings by employing
several evolutionary neural networks. The methods were applied to drive the design of a typical facade
module (i.e., a component that plays a key role in the definition of the energy performance of buildings) for an office
building. Chou and Bui [35] employed various data mining-based solutions in order to predict the
energy performance of buildings and to facilitate early designs of energy conserving buildings. These
techniques include support vector regression (SVR), ANN, regression and classification tree, ensemble
inference model, general linear regression, and chi-squared automatic interaction detector. Yu et al. [22]
studied the challenges and advances in data mining applications for communities concerning
energy-efficient buildings. Hidayat et al. [36] employed a neural network model in an energy-efficient
building to achieve proper smart lighting control. Kheiri [20] reviewed different techniques of
optimization applied to the energy-efficient building. Malik and Kim [37] investigated smart buildings
and their efficient energy consumption. In this regard, various prediction-learning algorithms including
a particle-based hybrid optimization algorithm were employed and their performances were evaluated.
Ngo [18] explored the capacity of machine learning for the early prediction of cooling loads in office
buildings. His study successfully achieved this objective by providing some neural network-based
equations. Pino-Mejías et al. [38] employed both linear regression and neural network models to
predict three quantities associated with buildings: energy consumption and the cooling and heating
energy demands. The results of their studies proved that the neural network was superior to the other models.
Deb et al. [39] explored the potential of neural network-based solutions in forecasting the diurnal
cooling energy load; this study used recorded data of the five days before the day of the experiment
to estimate the energy consumption; the outcomes demonstrated that the ANN approach is very
effective. Moreover, Li et al. [40] performed a comparative analysis between different machine learning
techniques such as radial basis function neural network (RBFNN), general regression neural network
(GRNN), traditional backpropagation neural network (BPNN), and support vector machine (SVM) in
predicting the hourly cooling load of a normal residential building.
There are few studies (e.g., Kolokotroni et al. [41] and Nguyen et al. [42]) on the application of
machine learning-based modeling to the prediction of heating load. Nevertheless, using machine
learning paradigms to optimize the answers determined by the best artificial intelligence-based
models is the chief aim of the present study. To help engineers obtain an optimized design of
energy-efficient buildings without any further experiments, this knowledge gap should be addressed.
Hence, the basic purpose of this work is to estimate the amount of heating load in energy-efficient
buildings by various new machine learning-based approaches. In the following, several machine
learning techniques such as multi-layer perceptron regressor (MLPr), lazy locally weighted learning
(LLWL), alternating model tree (AMT), random forest (RF), ElasticNet (ENet), and radial basis function
regression (RBFr) are employed to estimate the amount of heating load (HL) in energy-efficient buildings.
2. Database Collection
The required initial dataset was obtained from Tsanas and Xifara [43]. The obtained records
include eight inputs (i.e., conditional factors) and a separate output of heating load (i.e., the response
factor or dependent output). Based on the main conditional design factors of a residential building, the
inputs were X1 (Relative Compactness), X2 (Surface Area), X3 (Wall Area), X4 (Roof Area), X5 (Overall
Height), X6 (Orientation), X7 (Glazing Area), and finally, X8 (Glazing Area Distribution). The heating
load of the suggested building was to be forecast from these inputs; in this study, the heating loads,
as the main outputs, are referred to simply as heating load. The characteristics of the analyzed building
and the fundamental assumptions are properly detailed in [43]. A total of 768 buildings were modelled
considering twelve distinct building forms, five distribution scenarios, four orientations, and four
glazing areas. The obtained data were analyzed using the Ecotect software. A graphical view of this
process is illustrated in Figure 1.
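As a rough illustration of how these records can be organized for modelling (this is not part of the original study), the following Python sketch loads a hypothetical CSV export of the Tsanas and Xifara data; the file name and column labels are assumptions.

```python
# Hedged sketch: load the eight conditional factors and the heating load
# output from an assumed CSV export of the Tsanas and Xifara [43] records.
import pandas as pd

data = pd.read_csv("ENB2012_data.csv")  # hypothetical file name
X = data[["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]]  # inputs
y = data["Y1"]                                              # heating load

print(X.shape)                     # expected (768, 8) for the 768 buildings
print(y.min(), y.max(), y.mean())  # should match Table 1: 6.0, 43.1, 22.3
```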
Figure 1. Graphical view of data preparation.
Table 1. Minimum, maximum, and average values of the input (X1–X8) and output (Y1) parameters.

Used label   X1    X2     X3     X4     X5   X6   X7   X8   Y1
Minimum      0.6   514.5  245.0  110.3  3.5  2.0  0.0  0.0  6.0
Maximum      1.0   808.5  416.5  220.5  7.0  5.0  0.4  5.0  43.1
Average      0.8   671.7  318.5  176.6  5.3  3.5  0.2  2.8  22.3
Figure 2. Schematic view of some of the input data layers (X1–X8 as shown in Table 1) in predicting
heating load. (a) X1 (Relative Compactness); (b) X2 (Surface Area); (c) X3 (Wall Area); (d) X4 (Roof Area);
(e) X5 (Overall Height); (f) X6 (Orientation); (g) X7 (Glazing Area); (h) X8 (Glazing Area Distribution).
Figure 3. Schematic view of some of the output data layers (i.e., heating load) versus dataset number.
3. Model Development

An acceptable predictive approach that is utilized with different artificial intelligence-based
systems, like the MLPr, LLWL, AMT, RF, ENet, and RBFr models, to predict heating load in
energy-efficient buildings requires several steps, after which the best-fit model is selected. Firstly,
the initial database should be separated into training (80% of the whole dataset) and testing (20% of
the whole dataset) datasets. In the current study, and because of the size of the testing dataset, the
predictability of the generated networks is considered to be a proof of their validation. Therefore, a
great enough percentage of the dataset is considered for the testing dataset to be reliable for testing
the trained network. Secondly, in order to obtain the best predictive network, appropriate machine
learning-based solutions have to be introduced. Lastly, the outcome of the trained network should be
validated and verified for randomly selected testing datasets. The dataset utilized in this work is
generated from some of the most influential input layers, such as surface area, roof area, relative
compactness, wall area, glazing area, glazing area distribution, overall height, and orientation, which
are the effective parameters influencing the heating load value in energy-efficient buildings. Note that
the employed dataset was obtained from a recent study conducted by Tsanas and Xifara [43].

All six machine learning analyses provided in the current study were performed using the
Waikato Environment for Knowledge Analysis (WEKA). WEKA is a Java-based open-source machine
learning software that was developed at the University of Waikato, New Zealand. Each of the proposed
techniques was run with optimized settings, as explained in this section.
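Although the analyses in this study were run in WEKA, the overall workflow can be sketched in Python with scikit-learn as a stand-in; the random seed and the choice of RandomForestRegressor below are illustrative assumptions, not the exact WEKA configuration.

```python
# Hedged sketch of the three-step workflow: split 80/20, fit a candidate
# model, then validate on the held-out testing dataset.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)   # 80% training, 20% testing

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("training R2:", r2_score(y_train, model.predict(X_train)))
print("testing  R2:", r2_score(y_test, model.predict(X_test)))
```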
3.1. Multi-Layer Perceptron Regressor (MLPr)

The MLP is a widely used and well-known predictive network. Accordingly, the MLPr aims to
find the best potential regression fit over a set of data samples (shown here in terms of S). The MLPr
divides S into training and testing databases. An MLP involves several layers of computational nodes.
Similar to many previous MLPr-based studies, a single hidden layer was used, because even with a
single hidden layer, increasing the number of nodes in the hidden layer can achieve an excellent rate
of prediction. Figure 4 shows a common MLP structure. The optimum number of neurons in the
hidden layer was obtained after a series of trial-and-error processes (i.e., a sensitivity analysis), as
shown in Figure 5. Noteworthily, only one hidden layer was selected, since the accuracy of a single
hidden layer was found to be high enough not to make the MLP structure more complicated.
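A hedged Python sketch of this trial-and-error search follows, with scikit-learn's MLPRegressor standing in for WEKA's MLPr; the solver settings are assumptions.

```python
# Sensitivity analysis over the number of nodes in a single hidden layer,
# mirroring Figure 5 (tanh activation approximates the Tansig of Eq. (2)).
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neural_network import MLPRegressor

for nodes in range(1, 11):
    mlp = MLPRegressor(hidden_layer_sizes=(nodes,), activation="tanh",
                       max_iter=2000, random_state=0).fit(X_train, y_train)
    pred = mlp.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(nodes, round(r2_score(y_test, pred), 4), round(rmse, 4))
```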
Figure 4. Multi-layer perceptron regressor (MLPr) neural network typical architecture.

Each node generates a local output. In addition, it sends the local output to the subsequent layer
(the next nodes in a further hidden layer) until reaching the output nodes, i.e., the nodes placed in the
output layer. Equation (1) shows the normal operation carried out, considering a dataset of N groups
of records, by the jth neuron to compute the predicted output:

O_j = F\left( \sum_{n=1}^{N} I_n W_{nj} + b_j \right), (1)

where I symbolizes the input, b denotes the bias of the node, W is the weighting factor, and F
signifies the activation function. Tansig (i.e., the tangent sigmoid activation function) is employed
(Equation (2)). Note that we can have several types of activation functions (e.g., (i) sigmoid or logistic;
(ii) Tanh, the hyperbolic tangent; (iii) ReLU, rectified linear units) and that their performances are best
suited to different purposes. In the specific case of the sigmoid, this function (i) is real-valued and
differentiable (i.e., gradients can be found); (ii) has analytic tractability for the differentiation operation;
and (iii) is an acceptable mathematical representation of biological neuronal behavior.

Tansig(x) = \frac{2}{1 + e^{-2x}} - 1 (2)

Figure 5. Sensitivity analysis based on number of neurons in a single hidden layer: (a) R² and
(b) RMSE versus the number of nodes in the hidden layer (training, testing, and average series).
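To make Equations (1) and (2) concrete, the short numeric sketch below evaluates one neuron's output; the input, weight, and bias values are arbitrary illustrations.

```python
# Worked example of Equations (1) and (2) for a single neuron j.
import numpy as np

def tansig(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # Equation (2)

I = np.array([0.9, -0.4, 0.2])    # example inputs (assumed values)
W = np.array([0.5, 0.8, -0.3])    # example weights W_nj (assumed values)
b = 0.1                           # example bias b_j

O_j = tansig(np.sum(I * W) + b)   # Equation (1) with F = Tansig
print(O_j)                        # a value in (-1, 1)
```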
3.2. Lazy Locally Weighted Learning (LLWL)

Similar to the K-star technique (i.e., an instance-based classifier), locally weighted learning
(LWL) [44] is one of the common types of lazy learning-based solutions. Lazy learning approaches
provide valuable training algorithms and representations for learning about complex phenomena
during autonomous adaptive control of complex systems. Commonly, there are disadvantages in
employing such methods: lazy learners create a considerable delay during the network simulation.
More explanations about this model are provided by Atkeson et al. [44].

The key options we have in LLWL include the number of decimal places (numDecimalPlaces),
batch size (batchSize), KNN (following the k-nearest neighbors algorithm), nearest neighbor search
algorithm (nearestNeighborSearchAlgorithm), and weighting kernel (weightingKernel). More
explanations are provided below for each of the above influential parameters.

numDecimalPlaces—The number of decimal places. This number will be implemented for the
output of numbers in the model.

batchSize—The chosen number of cases to process if batch estimation is being completed. A normal
value of the batch size is 100. In this example we also consider it to be constant, as it did not have a
significant impact on the outputs.

KNN—The number of neighbors that are employed to set the width of the weighting function
(noting that KNN <= 0 means all neighbors are considered).

nearestNeighborSearchAlgorithm—The potential nearest neighbor search algorithm to be applied
(the default algorithm, which was also selected in our study, was LinearNN).

weightingKernel—The number that determines the weighting function (0 = Linear;
1 = Epanechnikov; 2 = Tricube; 3 = Inverse; 4 = Gaussian; and 5 = Constant; default 0 = Linear).

A good example of the k-nearest neighbors algorithm is shown in Figure 6. The test sample
(red dot) should be classified either as a blue square or as a green triangle. If k = 3 (i.e., depicted by the
solid line circle) it is assigned to the green triangles, as there are two triangles and only one square
within the inner (i.e., continuous line) circle. If k = 5 (dashed line circle) it is assigned to the blue
squares (three blue squares vs. two green triangles inside the outer circle). The variation of the
correlation coefficient (R²) versus the number of used KNN neighbors is shown in Figure 7. It can be
seen that changing the KNN could significantly enhance the correlation coefficient. For the cases of
KNN = −1, KNN = 2, KNN = 4, KNN = 6, KNN = 8, and KNN = 10, the training correlation coefficients
were 0.9025, 0.9579, 0.9861, 0.9916, 0.9937, and 0.9943, respectively. In the case of our study we
proposed KNN = −1, as it considers all neighbors.
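A minimal sketch of this lazy, locally weighted prediction idea is given below (it is not WEKA's LWL implementation); the linear kernel corresponds to weightingKernel = 0, and k <= 0 plays the role of KNN = −1, i.e., all neighbors are used.

```python
# Hedged sketch: predict a query point by kernel-weighting its k nearest
# neighbours (k <= 0 means all neighbours, as with KNN = -1 in the text).
import numpy as np

def lwl_predict(X_train, y_train, x_query, k=-1):
    d = np.linalg.norm(X_train - x_query, axis=1)   # distances to the query
    idx = np.argsort(d) if k <= 0 else np.argsort(d)[:k]
    w = 1.0 - d[idx] / (d[idx].max() + 1e-12)       # linear weighting kernel
    return np.average(y_train[idx], weights=w + 1e-12)

rng = np.random.default_rng(0)
Xt = rng.random((20, 2))
yt = Xt.sum(axis=1)                                 # toy training data
print(lwl_predict(Xt, yt, np.array([0.5, 0.5])))    # approximately 1.0
```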
Figure 6. Example of k-nearest neighbors (KNN) regression/classification.
Figure 7. Variation of the correlation coefficient (R²) versus number of used KNN neighbors in lazy
locally weighted learning (LLWL) technique.
Table 2. Evaluation metrics calculated for the alternating model tree (AMT) method, varied based on
the number of iterations.

                                  Number of Iterations
Evaluation metrics                10      20      30      40      50
Correlation coefficient           0.9984  0.9971  0.9974  0.9975  0.9972
Mean absolute error               0.4349  0.7527  0.7051  0.6464  0.6666
Root mean squared error           0.5752  0.9566  0.8936  0.8495  0.8995
Relative absolute error (%)       4.75    7.94    7.43    6.82    7.0341
Root relative squared error (%)   5.69    8.94    8.35    7.93    8.4062
Figure 8. Variation of the correlation coefficient (R²) versus number of iterations, in alternating model
tree (AMT) technique.
helps the solution to be more stable when there is a high correlation between the features. For the
number of the models (i.e., the set length of the lambda sequence to be generated), the value of 100
was used.
For the number of decimal places (i.e., the number of decimal places to be used for the
output of numbers in the model), as usual, the value of 2 was selected. The batch size was considered
to be 100. The values of alpha and epsilon were set to be 0.001 and 0.0001, respectively. Along with the
above-mentioned structure, a unique linear regression equation can also be found from the ENet, as
shown in Equation (4):

HL = −0.049 × X2 + 0.100 × X3 − 0.075 × X4 + 0.144 × X5 − 0.003 × X6 + 0.051 × X7 +
0.161 × X8 + 35.597. (4)
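As a hedged cross-check of this stage, scikit-learn's ElasticNet can be fitted to the same records; its alpha/l1_ratio parameterization differs from WEKA's alpha and epsilon, so the coefficients recovered below need not reproduce Equation (4) exactly.

```python
# Hedged sketch: fit an elastic-net linear model and inspect the per-feature
# coefficients and intercept for comparison with Equation (4).
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=0.001, l1_ratio=0.5, max_iter=10000)
enet.fit(X_train, y_train)
print(dict(zip(X_train.columns, enet.coef_.round(3))))
print(round(enet.intercept_, 3))   # compare with the 35.597 offset
```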
3.6. Radial Basis Function Regression (RBFr)

The radial basis function network (RBFr) has a unique structure, as explained in Figure 9.
Equation (5) illustrates the basis function of this network [55]. For solving the issue, radial basis
function regression can be used by fitting a collection of kernels to the dataset. In addition, this method
attends to the position of noisy samples.

O_i = K\left( \frac{\|x - x_i\|}{\tau_i} \right) (5)

O_i stands for the output of the neuron, and x_i shows the center of kernel K. In addition, the term τ_i
stands for the width of the ith RBF unit.
Figure 9. Typical architecture of radial basis function regression (RBFr) neural network.
The RBFr model utilizes a batch algorithm for predicting the number of developed kernels. This
prediction is performed by batch algorithms. The specific expectation function that is utilized in the
RBFr model is as below:

F(x) = \sum_{i=1}^{z} k(\|x - x_i\|)\, \varphi_i. (6)

\|x\| stands to symbolize the Euclidean norm on x. k(\|x - x_i\|), i = 1, 2, . . . , z, stands as a group of
z non-linear and constant RBF basis functions. In addition, the term φ_i shows the coefficient of regression.
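The numeric sketch below instantiates Equations (5) and (6) with a Gaussian kernel; the centres, widths, and regression coefficients are arbitrary assumed values.

```python
# Hedged sketch of Equations (5) and (6): Gaussian kernel outputs O_i are
# combined with regression coefficients phi_i to form F(x).
import numpy as np

def rbf_predict(x, centres, taus, phis):
    r = np.linalg.norm(centres - x, axis=1) / taus  # ||x - x_i|| / tau_i
    O = np.exp(-r ** 2)                             # Equation (5), Gaussian K
    return np.dot(O, phis)                          # Equation (6)

centres = np.array([[0.0, 0.0], [1.0, 1.0]])        # z = 2 assumed centres
taus = np.array([1.0, 0.5])                         # assumed widths tau_i
phis = np.array([2.0, -1.0])                        # assumed coefficients
print(rbf_predict(np.array([0.5, 0.5]), centres, taus, phis))
```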
R^2 = 1 - \frac{\sum_{i=1}^{s} \left( Y_{i_{predicted}} - Y_{i_{observed}} \right)^2}{\sum_{i=1}^{s} \left( Y_{i_{observed}} - \overline{Y}_{observed} \right)^2}, (7)

MAE = \frac{1}{N} \sum_{i=1}^{s} \left| Y_{i_{observed}} - Y_{i_{predicted}} \right|, (8)

RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{s} \left[ Y_{i_{observed}} - Y_{i_{predicted}} \right]^2 }, (9)

RAE = \frac{\sum_{i=1}^{s} \left| Y_{i_{predicted}} - Y_{i_{observed}} \right|}{\sum_{i=1}^{s} \left| Y_{i_{observed}} - \overline{Y}_{observed} \right|}, (10)

RRSE = \sqrt{ \frac{\sum_{i=1}^{s} \left( Y_{i_{predicted}} - Y_{i_{observed}} \right)^2}{\sum_{i=1}^{s} \left( Y_{i_{observed}} - \overline{Y}_{observed} \right)^2} }, (11)
where Y_{i_{observed}} and Y_{i_{predicted}}, represented in Equations (7) to (11), are the actual and estimated values of
heating load in energy-efficient buildings, respectively. The term s in the above equations stands for
the number of instances and \overline{Y}_{observed} denotes the mean of the real amounts of the heating load. The WEKA
software environment was employed to run the machine learning models.
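A direct transcription of Equations (7) to (11) into Python is given below for reproducing the reported indexes; RAE and RRSE are returned as percentages to match Tables 3 and 4.

```python
# Statistical indexes of Equations (7)-(11); y_obs and y_pred are NumPy
# arrays of observed and predicted heating loads.
import numpy as np

def evaluate(y_obs, y_pred):
    mean_obs = y_obs.mean()
    r2 = 1 - np.sum((y_pred - y_obs) ** 2) / np.sum((y_obs - mean_obs) ** 2)
    mae = np.mean(np.abs(y_obs - y_pred))
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    rae = 100 * np.sum(np.abs(y_pred - y_obs)) / np.sum(np.abs(y_obs - mean_obs))
    rrse = 100 * np.sqrt(np.sum((y_pred - y_obs) ** 2)
                         / np.sum((y_obs - mean_obs) ** 2))
    return r2, mae, rmse, rae, rrse
```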
compared to the other techniques. The weakest estimation results were from the ENet solution, where
the R², MAE, RMSE, RAE (%), and RRSE (%) were 0.8915, 3.2332, 4.5678, 35.3566, and 45.2993 for the
training dataset, respectively, and 0.896, 3.2585, 4.4683, 35.4392, and 44.2052 for the testing dataset,
respectively. According to the total scoring, the best performance in Table 4 was found to be that
of RF. Right after the RF model, the next best estimation network was obtained for the AMT technique.
The R², MAE, RMSE, RAE (%), and RRSE (%) for the AMT training dataset were 0.9985, 0.4096, 0.5449,
4.4788, and 5.4036, respectively. The R², MAE, RMSE, RAE (%), and RRSE (%) for the AMT testing
dataset were 0.9981, 0.4869, 0.6236, 5.2956, and 6.1693, respectively.
Table 3. The performance of selected machine learning techniques in the prediction of heating load
through several statistical indexes (training dataset).

                        Network Results                              Ranking the Predicted Models     Total
Proposed Models         R²      MAE     RMSE    RAE (%)  RRSE (%)    R²  MAE  RMSE  RAE  RRSE         Ranking Score  Rank
lazy.LWL                0.903   3.2838  4.3335  35.9104  42.9757     2   1    2     1    2            8              5
Alternating Model Tree  0.9985  0.4096  0.5449  4.4788   5.4036      5   5    5     5    5            25             2
Random Forest           0.9997  0.19    0.2399  2.078    2.3795      6   6    6     6    6            30             1
ElasticNet              0.8915  3.2332  4.5678  35.3566  45.2993     1   2    1     2    1            7              6
MLP Regressor           0.9915  0.9795  1.3156  10.7117  13.0465     4   4    4     4    4            20             3
RBF Regressor           0.9647  1.8226  2.6555  19.9307  26.3348     3   3    3     3    3            15             4
Table 4. The performance of selected machine learning techniques in the prediction of heating load
through several statistical indexes (testing dataset).

                        Network Results                              Ranking the Predicted Models     Total
Proposed Models         R²      MAE     RMSE    RAE (%)  RRSE (%)    R²  MAE  RMSE  RAE  RRSE         Ranking Score  Rank
lazy.LWL                0.9049  3.2345  4.2752  35.1778  42.2953     2   2    2     2    2            10             5
Alternating Model Tree  0.9981  0.4869  0.6236  5.2956   6.1693      5   5    5     5    5            25             2
Random Forest           0.9989  0.3385  0.4649  3.6813   4.5995      6   6    6     6    6            30             1
ElasticNet              0.896   3.2585  4.4683  35.4392  44.2052     1   1    1     1    1            5              6
MLP Regressor           0.9868  1.12    1.6267  12.1811  16.0934     4   4    4     4    4            20             3
RBF Regressor           0.9693  1.9109  2.4647  20.7827  24.3837     3   3    3     3    3            15             4
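The total ranking score used in Tables 3 and 4 can be reproduced with the short sketch below: for each index every model receives a score from 1 (worst) to 6 (best), and the five scores are summed.

```python
# Hedged sketch of the total-scoring scheme of Tables 3 and 4
# (training-set values copied from Table 3).
import numpy as np

results = {
    "lazy.LWL":               (0.9030, 3.2838, 4.3335, 35.9104, 42.9757),
    "Alternating Model Tree": (0.9985, 0.4096, 0.5449,  4.4788,  5.4036),
    "Random Forest":          (0.9997, 0.1900, 0.2399,  2.0780,  2.3795),
    "ElasticNet":             (0.8915, 3.2332, 4.5678, 35.3566, 45.2993),
    "MLP Regressor":          (0.9915, 0.9795, 1.3156, 10.7117, 13.0465),
    "RBF Regressor":          (0.9647, 1.8226, 2.6555, 19.9307, 26.3348),
}
vals = np.array(list(results.values()))
totals = np.zeros(len(results), dtype=int)
for col in range(vals.shape[1]):
    better = vals[:, col] if col == 0 else -vals[:, col]  # R2: higher is better
    score = np.empty(len(results), dtype=int)
    score[np.argsort(better)] = np.arange(1, len(results) + 1)
    totals += score                      # 1 = worst ... 6 = best per index
print(dict(zip(results, totals)))        # Random Forest should total 30
```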
The results of network reliability based on the R² performance of all proposed models, for both
training and testing, are provided in Figures 10 and 11. As stated earlier, the RF models could
provide a more reliable predictive network with higher accuracy when compared to the other proposed
techniques. The results of network output for the proposed RF are illustrated in Figures 10d and 11d.
Given the provided information, the predictive network of RF proved to be slightly better than the other
proposed techniques and was superior in making a better regression relationship between the estimated
and actual values.
Figure 10. The network outputs for the training dataset. (a) MLPr; (b) LLWL; (c) AMT; (d) RF; (e) ENet;
(f) RBFr.
Figure 11. The network outputs for the testing dataset. (a) MLPr; (b) LLWL; (c) AMT; (d) RF; (e) ENet;
(f) RBFr.
5. Conclusions
In the current study, several predictive networks were introduced and evaluated. The study
aimed to assess and compare several of the most well-known machine learning-based techniques in
order to introduce the most reliable predictive method for the early estimation of heating load in
energy-efficient residential building systems. Machine learning-based solutions, namely the MLPr,
LLWL, AMT, RF, ENet, and RBFr models, were employed to estimate the heating load. The results of
the best model from the
proposed techniques were presented. Based on the presented outcomes, it may be said that, except
for the ENet model, almost all models (i.e., MLPr, LLWL, AMT, RF, and RBFr) have good prediction
output results in estimating heating load in energy-efficient building systems. In this regard, the
RF machine learning technique could be suggested as the most reliable and accurate among other
predictive techniques provided in the present work. The RF predictive model also learned better than
the other models, with respect to both the training and validation stages. The
values of R2 , MAE, RMSE, RAE (%), and RRSE (%) in the RF model training dataset, were 0.9997, 0.19,
0.2399, 2.078, and 2.3795, respectively. The values of R2 , MAE, RMSE, RAE (%), and RRSE (%) in the
AMT model training dataset were 0.9985, 0.4096, 0.5449, 4.4788, and 5.4036, respectively. Validated
testing datasets from the selected techniques also showed appropriate accuracy as R2 , MAE, RMSE,
RAE (%), and RRSE (%) in the testing output of the RF model were found to be 0.9989, 0.3385, 0.4649,
3.6813, and 4.5995, respectively; R2 , MAE, RMSE, RAE (%), and RRSE (%) in the testing output of
the AMT model were found to be 0.9981, 0.4869, 0.6236, 5.2956, and 6.1693, respectively. The worst
validation was found for the ENet technique with R2 , MAE, RMSE, RAE (%), and RRSE (%) equal to
0.896, 3.2585, 4.4683, 35.4392, and 44.2052, respectively.
Author Contributions: H.M., D.T.B., A.D. wrote the manuscript, discussion and analyzed the data. H.M. and
A.D., Z.L., and L.K.F. edited, restructured, and professionally optimized the manuscript.
Funding: This study was funded by the Ton Duc Thang University and University of South-Eastern Norway.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Nguyen, T.N.; Tran, T.P.; Hung, H.D.; Voznak, M.; Tran, P.T.; Minh, T.; Thanh-Long, N. Hybrid TSR-PSR
alternate energy harvesting relay network over Rician fading channels: Outage probability and SER analysis.
Sensors 2018, 18, 3839. [CrossRef]
2. Najafi, B.; Ardabili, S.F.; Mosavi, A.; Shamshirband, S.; Rabczuk, T. An intelligent artificial neural
network-response surface methodology method for accessing the optimum biodiesel and diesel fuel
blending conditions in a diesel engine from the viewpoint of exergy and energy analysis. Energies 2018, 11,
860. [CrossRef]
3. Nazir, R.; Ghareh, S.; Mosallanezhad, M.; Moayedi, H. The influence of rainfall intensity on soil loss mass
from cellular confined slopes. Measurement 2016, 81, 13–25. [CrossRef]
4. Mojaddadi, H.; Pradhan, B.; Nampak, H.; Ahmad, N.; Ghazali, A.H.B. Ensemble machine-learning-based
geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomat. Nat.
Hazards Risk 2017, 8, 1080–1102. [CrossRef]
5. Rizeei, H.M.; Pradhan, B.; Saharkhiz, M.A. Allocation of emergency response centres in response to pluvial
flooding-prone demand points using integrated multiple layer perceptron and maximum coverage location
problem models. Int. J. Disaster Risk Reduct. 2019, 101205. [CrossRef]
6. Rizeei, H.M.; Pradhan, B.; Saharkhiz, M.A. Urban object extraction using Dempster Shafer feature-based
image analysis from worldview-3 satellite imagery. Int. J. Remote Sens. 2019, 40, 1092–1119. [CrossRef]
7. Mezaal, M.; Pradhan, B.; Rizeei, H. Improving landslide detection from airborne laser scanning data using
optimized Dempster–Shafer. Remote Sens. 2018, 10, 1029. [CrossRef]
8. Aal-shamkhi, A.D.S.; Mojaddadi, H.; Pradhan, B.; Abdullahi, S. Extraction and modeling of urban sprawl
development in Karbala City using VHR satellite imagery. In Spatial Modeling and Assessment of Urban Form;
Springer: Cham, Switzerland, 2017; pp. 281–296.
9. Gao, W.; Wang, W.; Dimitrov, D.; Wang, Y. Nano properties analysis via fourth multiplicative ABC indicator
calculating. Arab. J. Chem. 2018, 11, 793–801. [CrossRef]
10. Aksoy, H.S.; Gör, M.; İnal, E. A new design chart for estimating friction angle between soil and pile materials.
Geomech. Eng. 2016, 10, 315–324. [CrossRef]
11. Gao, W.; Dimitrov, D.; Abdo, H. Tight independent set neighborhood union condition for fractional critical
deleted graphs and ID deleted graphs. Discret. Contin. Dyn. Syst. S 2018, 12, 711–721. [CrossRef]
Appl. Sci. 2019, 9, 4338 16 of 17
12. Bui, D.T.; Moayedi, H.; Gör, M.; Jaafari, A.; Foong, L.K. Predicting slope stability failure through machine
learning paradigms. ISPRS Int. Geo-Inf. 2019, 8, 395. [CrossRef]
13. Gao, W.; Guirao, J.L.G.; Abdel-Aty, M.; Xi, W. An independent set degree condition for fractional critical
deleted graphs. Discret. Contin. Dyn. Syst. S 2019, 12, 877–886. [CrossRef]
14. Moayedi, H.; Bui, D.T.; Gör, M.; Pradhan, B.; Jaafari, A. The feasibility of three prediction techniques of the
artificial neural network, adaptive neuro-fuzzy inference system, and hybrid particle swarm optimization
for assessing the safety factor of cohesive slopes. ISPRS Int. Geo-Inf. 2019, 8, 391. [CrossRef]
15. Gao, W.; Guirao, J.L.G.; Basavanagoud, B.; Wu, J. Partial multi-dividing ontology learning algorithm. Inf. Sci.
2018, 467, 35–58. [CrossRef]
16. Ince, R.; Gör, M.; Alyamaç, K.E.; Eren, M.E. Multi-fractal scaling law for split strength of concrete cubes. Mag.
Concr. Res. 2016, 68, 141–150. [CrossRef]
17. Gao, W.; Wu, H.; Siddiqui, M.K.; Baig, A.Q. Study of biological networks using graph theory. Saudi J. Biol.
Sci. 2018, 25, 1212–1219. [CrossRef]
18. Ngo, N.T. Early predicting cooling loads for energy-efficient design in office buildings by machine learning.
Energy Build. 2019, 182, 264–273. [CrossRef]
19. Jafarinejad, T.; Erfani, A.; Fathi, A.; Shafii, M.B. Bi-level energy-efficient occupancy profile optimization
integrated with demand-driven control strategy: University building energy saving. Sustain. Cities Soc. 2019,
48, 101539. [CrossRef]
20. Kheiri, F. A review on optimization methods applied in energy-efficient building geometry and envelope
design. Renew. Sustain. Energy Rev. 2018, 92, 897–920. [CrossRef]
21. Wang, W.; Chen, J.Y.; Huang, G.S.; Lu, Y.J. Energy efficient HVAC control for an IPS-enabled large space in
commercial buildings through dynamic spatial occupancy distribution. Appl. Energy 2017, 207, 305–323.
[CrossRef]
22. Yu, Z.; Haghighat, F.; Fung, B.C.M. Advances and challenges in building engineering and data mining
applications for energy-efficient communities. Sustain. Cities Soc. 2016, 25, 33–38. [CrossRef]
23. Zhang, H.; Yuan, C.; Yang, G.; Wu, L.; Peng, C.; Ye, W.; Shen, Y.; Moayedi, H. A novel constitutive modelling
approach measured under simulated freeze–thaw cycles for the rock failure. Eng. Comput. 2019. [CrossRef]
24. Bui, D.T.; Moayedi, H.; Anastasios, D.; Foong, L.K. Predicting heating and cooling loads in energy-efficient
buildings using two hybrid intelligent models. Appl. Sci. 2019, 9, 3543.
25. Moayedi, H.; Nguyen, H.; Rashid, A.S.A. Comparison of dragonfly algorithm and Harris hawks optimization
evolutionary data mining techniques for the assessment of bearing capacity of footings over two-layer
foundation soils. Eng. Comput. 2019. [CrossRef]
26. Moayedi, H.; Aghel, B.; Abdullahi, M.A.M.; Nguyen, H.; Rashid, A.S.A. Applications of rice husk ash as
green and sustainable biomass. J. Clean. Prod. 2019, 237, 117851. [CrossRef]
27. Huang, X.X.; Moayedi, H.; Gong, S.; Gao, W. Application of metaheuristic algorithms for pressure analysis of
crude oil pipeline. Energy Sources Part A Recovery Util. Environ. Effects 2019. [CrossRef]
28. Gao, W.; Alsarraf, J.; Moayedi, H.; Shahsavar, A.; Nguyen, H. Comprehensive preference learning and feature
validity for designing energy-efficient residential buildings using machine learning paradigms. Appl. Soft
Comput. 2019, 84, 105748. [CrossRef]
29. Bui, D.T.; Moayedi, H.; Kalantar, B.; Osouli, A.; Pradhan, B.; Nguyen, H.; Rashid, A.S.A. Harris hawks
optimization: A novel swarm intelligence technique for spatial assessment of landslide susceptibility. Sensors
2019, 19, 3590. [CrossRef]
30. Biswas, M.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural
network approach. Energy 2016, 117, 84–92. [CrossRef]
31. Fan, C.; Wang, J.; Gang, W.; Li, S. Assessment of deep recurrent neural network-based strategies for short-term
building energy predictions. Appl. Energy 2019, 236, 700–710. [CrossRef]
32. Ince, R.; Gör, M.; Eren, M.E.; Alyamaç, K.E. The effect of size on the splitting strength of cubic concrete
members. Strain 2015, 51, 135–146. [CrossRef]
33. Sayin, E.; Yön, B.; Calayir, Y.; Gör, M. Construction failures of masonry and adobe buildings during the 2011
Van earthquakes in Turkey. Struct. Eng. Mech. 2014, 51, 503–518. [CrossRef]
34. Zemella, G.; de March, D.; Borrotti, M.; Poli, I. Optimised design of energy efficient building facades via
evolutionary neural networks. Energy Build. 2011, 43, 3297–3302. [CrossRef]
Appl. Sci. 2019, 9, 4338 17 of 17
35. Chou, J.S.; Bui, D.K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building
design. Energy Build. 2014, 82, 437–446. [CrossRef]
36. Hidayat, I.; Utami, S.S. Activity based smart lighting control for energy efficient building by neural network
model. In Astechnova 2017 International Energy Conference; Sunarno, I., Sasmito, A.P., Hong, L.P., Eds.; EDP
Sciences: Les Ulis, France, 2018; Volume 43.
37. Malik, S.; Kim, D. Prediction-learning algorithm for efficient energy consumption in smart buildings based
on particle regeneration and velocity boost in particle swarm optimization neural networks. Energies 2018,
11, 1289. [CrossRef]
38. Pino-Mejías, R.; Pérez-Fargallo, A.; Rubio-Bellido, C.; Pulido-Arcas, J.A. Comparison of linear regression and
artificial neural networks models to predict heating and cooling energy demand, energy consumption and
CO2 emissions. Energy 2017, 118, 24–36. [CrossRef]
39. Deb, C.; Eang, L.S.; Yang, J.; Santamouris, M. Forecasting diurnal cooling energy load for institutional
buildings using artificial neural networks. Energy Build. 2016, 121, 284–297. [CrossRef]
40. Li, Q.; Meng, Q.; Cai, J.; Yoshino, H.; Mochida, A. Predicting hourly cooling load in the building: A
comparison of support vector machine and different artificial neural networks. Energy Convers. Manag. 2009,
50, 90–96. [CrossRef]
41. Kolokotroni, M.; Davies, M.; Croxford, B.; Bhuiyan, S.; Mavrogianni, A. A validated methodology for the
prediction of heating and cooling energy demand for buildings within the Urban Heat Island: Case-study of
London. Sol. Energy 2010, 84, 2246–2255. [CrossRef]
42. Nguyen, H.; Moayedi, H.; Foong, L.K.; Al Najjar, H.A.H.; Jusoh, W.A.W.; Rashid, A.S.A.; Jamali, J. Optimizing
ANN models with PSO for predicting short building seismic response. Eng. Comput. 2019, 35, 1–15.
[CrossRef]
43. Tsanas, A.; Xifara, A. Accurate quantitative estimation of energy performance of residential buildings using
statistical machine learning tools. Energy Build. 2012, 49, 560–567. [CrossRef]
44. Atkeson, C.G.; Moore, A.W.; Schaal, S. Locally weighted learning for control. In Lazy Learning; Springer:
Dordrecht, The Netherlands, 1997; pp. 75–113.
45. Frank, E.; Mayo, M.; Kramer, S. Alternating model trees. In Proceedings of the 30th Annual ACM Symposium
on Applied Computing, Salamanca, Spain, 13–17 April 2015; pp. 871–878.
46. Hamilton, C.R. Hourly Solar Radiation Forecasting through Neural Networks and Model Trees. Ph.D. Thesis,
University of Georgia, Athens, GA, USA, 2016.
47. Rodrigues, É.O.; Pinheiro, V.; Liatsis, P.; Conci, A. Machine learning in the prediction of cardiac epicardial
and mediastinal fat volumes. Comput. Biol. Med. 2017, 89, 520–529. [CrossRef] [PubMed]
48. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
49. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth
International Group: Belmont, CA, USA, 1984.
50. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach.
Intell. 1998, 20, 832–844.
51. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification
and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43,
1947–1958. [CrossRef] [PubMed]
52. Diaz-Uriarte, R.; de Andres, S.A. Gene selection and classification of microarray data using random forest.
BMC Bioinform. 2006, 7, 3. [CrossRef]
53. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T. Random forests for classification in ecology.
Ecology 2007, 88, 2783–2792. [CrossRef]
54. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat.
Methodol.) 2005, 67, 301–320. [CrossRef]
55. Buhmann, M.D. Radial Basis Functions: Theory and Implementations; Cambridge University Press: Cambridge,
UK, 2003; Volume 12.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).