Article
Predicting Heating Load in Energy-Efficient Buildings
Through Machine Learning Techniques
Hossein Moayedi 1,2, * , Dieu Tien Bui 3,4, * , Anastasios Dounis 5 , Zongjie Lyu 6 and
Loke Kok Foong 7
1 Department for Management of Science and Technology Development, Ton Duc Thang University,
Ho Chi Minh City 758307, Vietnam
2 Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City 758307, Vietnam
3 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
4 Geographic Information System Group, Department of Business and IT, University of South-Eastern
Norway, N-3800 Bø i Telemark, Norway
5 University of West Attica, Dept. of Industrial Design and Production Engineering, Campus 2, 250 Thivon &
P. Ralli, 12244 Egaleo, Greece; [email protected]
6 State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, Xi’an University of Technology,
Xi’an 710048, China; [email protected]
7 School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor,
Malaysia; [email protected]
* Correspondence: [email protected] (H.M.); [email protected] (D.T.B.);
Tel.: +84-(47)96677678 (H.M.)
Received: 5 September 2019; Accepted: 11 October 2019; Published: 15 October 2019
Abstract: The heating load calculation is the first step of the iterative heating, ventilation, and air
conditioning (HVAC) design procedure. In this study, we employed six machine learning techniques,
namely multi-layer perceptron regressor (MLPr), lazy locally weighted learning (LLWL), alternating
model tree (AMT), random forest (RF), ElasticNet (ENet), and radial basis function regression (RBFr)
for the problem of designing energy-efficient buildings. After that, these approaches were used to
specify a relationship among the parameters of input and output in terms of the energy performance
of buildings. The calculated outcomes for datasets from each of the above-mentioned models were
analyzed based on various known statistical indexes like root relative squared error (RRSE), root mean
squared error (RMSE), mean absolute error (MAE), correlation coefficient (R²), and relative absolute
error (RAE). It was found that, among the discussed machine learning-based solutions of MLPr,
LLWL, AMT, RF, ENet, and RBFr, the RF was the most appropriate predictive network. The RF
network outcomes gave an R², MAE, RMSE, RAE, and RRSE for the training dataset of 0.9997, 0.19,
0.2399, 2.078, and 2.3795, respectively, and for the testing dataset of 0.9989, 0.3385, 0.4649, 3.6813,
and 4.5995, respectively. These results show the superiority of the presented RF model in early
estimation of heating load in energy-efficient buildings.
Keywords: energy-efficient buildings; smart buildings; machine learning; random forest; optimization
1. Introduction
In recent decades, artificial intelligence-based methods have been widely applied by scientists
in different fields of study, particularly in energy systems engineering (such as in Nguyen et al. [1]
and Najafi et al. [2]). In this regard, machine learning-based techniques are considered a proper
alternative for forecasting the energy demand of buildings.
Consequently, an appropriate inspection of the energy performance of buildings and optimal
design of the heating, ventilation, and air-conditioning (HVAC) system will help push forward
sustainable energy consumption. The world's energy consumption remains high and, even though
many countries have taken reasonable measures, it is expected to increase in the future. Many believe
that this is because of the rapid expansion of the economy and rising living standards. Currently,
energy required for buildings accounts for almost 40% of all energy use in Europe [3]. Some reports
have indicated that buildings account for about 39% of the whole energy demand in countries such as
the United States, and for about 27.5% of nationally consumed energy in China. As a novel idea, most
recently, intelligent predictive
tools have been utilized for the field of energy consumption calculation. In fact, the problem of
heating load calculation in energy-efficient buildings is an established concern. For realizing the best
artificial intelligence (AI) model to meet this goal, this study provides and compares six well-known
models that are widely used by researchers [4–8]. Similar to other research in the fields of science and
technology, AI techniques have widespread application in putting forward reasonable evaluations
in many engineering problems [9–17], including the energy consumption of buildings. Among the numerous types
of artificial intelligence-based solutions, artificial neural network (ANN) is known as a recognized
method that is largely employed for many prediction-based examples [18–22]. Similar studies are
performed in regard to hybrid metaheuristic optimization approaches [23–29]. Also, in the field of
energy management, neural networks have emerged as one of the effective prediction tools [30–33].
Zemella et al. [34] investigated the design optimization of energy efficient buildings by employing
several evolutionary neural networks. The methods were applied to drive the design of a typical facade
module (i.e., a component that plays a key role in the definition of the energy performance of buildings) for an office
building. Chou and Bui [35] employed various data mining-based solutions in order to predict the
energy performance of buildings and to facilitate early designs of energy conserving buildings. These
techniques include support vector regression (SVR), ANN, regression and classification tree, ensemble
inference model, general linear regression, and chi-squared automatic interaction detector. Yu et al. [22]
studied the challenges and advances in data mining applications for communities concerning
energy-efficient buildings. Hidayat et al. [36] employed a neural network model in an energy-efficient
building to achieve proper smart lighting control. Kheiri [20] reviewed different techniques of
optimization applied to the energy-efficient building. Malik and Kim [37] investigated smart buildings
and their efficient energy consumption. In this regard, various prediction-learning algorithms including
a particle-based hybrid optimization algorithm were employed and their performances were evaluated.
Ngo [18] explored the capacity of machine learning for the early prediction of cooling loads in office
buildings. His study successfully achieved this objective by providing some neural network-based
equations. Pino-Mejías et al. [38] employed both linear regression and neural network models to
predict three quantities associated with buildings: energy consumption and the cooling and heating
energy demands. The results of their studies proved that the neural network was superior to the other models.
Deb et al. [39] explored the potential of neural network-based solutions in forecasting the diurnal
cooling energy load; this study used recorded data of the five days before the day of the experiment
to estimate the energy consumption; the outcomes demonstrated that the ANN approach is very
effective. Moreover, Li et al. [40] performed a comparative analysis between different machine learning
techniques such as radial basis function neural network (RBFNN), general regression neural network
(GRNN), traditional backpropagation neural network (BPNN), and support vector machine (SVM) in
predicting the hourly cooling load of a normal residential building.
There are few studies (e.g., Kolokotroni et al. [41] and Nguyen et al. [42]) on the application of
machine learning-based modeling to the prediction of heating load. Nevertheless, using machine
learning paradigms to optimize the answers determined by the best artificial intelligence-based
models is the chief aim of the present study. To help engineers obtain an optimized design of
energy-efficient buildings without any further experiments, this knowledge gap should be addressed.
Hence, the basic purpose of this work is to estimate the amount of heating load in energy-efficient
buildings by various new machine learning-based approaches. In the following, several machine
learning techniques such as multi-layer perceptron regressor (MLPr), lazy locally weighted learning
(LLWL), alternating model tree (AMT), random forest (RF), ElasticNet (ENet), and radial basis function
regression (RBFr) are employed to estimate the amount of heating load (HL) in energy-efficient buildings.
2. Database Collection
The required initial dataset was obtained from Tsanas and Xifara [43]. The obtained records
include eight inputs (i.e., conditional factors) and a separate output of heating load (i.e., the response
factor or dependent output). Based on the main conditional design factors of a residential building, the
inputs were X1 (Relative Compactness), X2 (Surface Area), X3 (Wall Area), X4 (Roof Area), X5 (Overall
Height), X6 (Orientation), X7 (Glazing Area), and finally, X8 (Glazing Area Distribution). The heating
load of the suggested building was to be forecast from these inputs; in this study, the heating loads,
as the main outputs, are referred to simply as heating load. The characteristics of the analyzed building
and the fundamental assumptions are properly detailed in [43]. A total of 768 buildings were modelled
considering twelve distinct building forms, five distribution scenarios, four orientations, and four
glazing areas. The obtained data were analyzed using the Ecotect software. A graphical view of this
process is illustrated in Figure 1.
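As a rough illustration of how these records can be organized for modelling (this is not part of the original study), the following Python sketch loads a hypothetical CSV export of the Tsanas and Xifara data; the file name and column labels are assumptions.

```python
# Hedged sketch: load the eight conditional factors and the heating load
# output from an assumed CSV export of the Tsanas and Xifara [43] records.
import pandas as pd

data = pd.read_csv("ENB2012_data.csv")  # hypothetical file name
X = data[["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]]  # inputs
y = data["Y1"]                                              # heating load

print(X.shape)                     # expected (768, 8) for the 768 buildings
print(y.min(), y.max(), y.mean())  # should match Table 1: 6.0, 43.1, 22.3
```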
Figure 1. Graphical view of data preparation.
Table 1. Minimum, maximum, and average values of the input (X1–X8) and output (Y1) parameters.

Used label   X1    X2     X3     X4     X5   X6   X7   X8   Y1
Minimum      0.6   514.5  245.0  110.3  3.5  2.0  0.0  0.0  6.0
Maximum      1.0   808.5  416.5  220.5  7.0  5.0  0.4  5.0  43.1
Average      0.8   671.7  318.5  176.6  5.3  3.5  0.2  2.8  22.3
Figure 2. Schematic view of some of the input data layers (X1–X8 as shown in Table 1) in predicting
heating load. (a) X1 (Relative Compactness); (b) X2 (Surface Area); (c) X3 (Wall Area); (d) X4 (Roof Area);
(e) X5 (Overall Height); (f) X6 (Orientation); (g) X7 (Glazing Area); (h) X8 (Glazing Area Distribution).
Figure 3. Schematic view of some of the output data layers (i.e., heating load) versus dataset number.
3. Model Development

An acceptable predictive approach that is utilized with different artificial intelligence-based
systems, like the MLPr, LLWL, AMT, RF, ENet, and RBFr models, to predict heating load in
energy-efficient buildings requires several steps, after which the best-fit model is selected. Firstly,
the initial database should be separated into training (80% of the whole dataset) and testing (20% of
the whole dataset) datasets. In the current study, and because of the size of the testing dataset, the
predictability of the generated networks is considered to be a proof of their validation. Therefore, a
great enough percentage of the dataset is considered for the testing dataset to be reliable for testing
the trained network. Secondly, in order to obtain the best predictive network, appropriate machine
learning-based solutions have to be introduced. Lastly, the outcome of the trained network should be
validated and verified for randomly selected testing datasets. The dataset utilized in this work is
generated from some of the most influential input layers, such as surface area, roof area, relative
compactness, wall area, glazing area, glazing area distribution, overall height, and orientation, which
are the effective parameters influencing the heating load value in energy-efficient buildings. Note that
the employed dataset was obtained from a recent study conducted by Tsanas and Xifara [43].

All six machine learning analyses provided in the current study were performed using the
Waikato Environment for Knowledge Analysis (WEKA). WEKA is a Java-based open-source machine
learning software that was developed at the University of Waikato, New Zealand. Each of the proposed
techniques was run with optimized settings, as explained in this section.
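Although the analyses in this study were run in WEKA, the overall workflow can be sketched in Python with scikit-learn as a stand-in; the random seed and the choice of RandomForestRegressor below are illustrative assumptions, not the exact WEKA configuration.

```python
# Hedged sketch of the three-step workflow: split 80/20, fit a candidate
# model, then validate on the held-out testing dataset.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)   # 80% training, 20% testing

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("training R2:", r2_score(y_train, model.predict(X_train)))
print("testing  R2:", r2_score(y_test, model.predict(X_test)))
```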
3.1. Multi-Layer Perceptron Regressor (MLPr)

The MLP is a widely used and well-known predictive network. Accordingly, the MLPr aims to
find the best potential regression fit over a set of data samples (shown here in terms of S). The MLPr
divides S into training and testing databases. An MLP involves several layers of computational nodes.
Similar to many previous MLPr-based studies, a single hidden layer was used, because even with a
single hidden layer, increasing the number of nodes in the hidden layer can achieve an excellent rate
of prediction. Figure 4 shows a common MLP structure. The optimum number of neurons in the
hidden layer was obtained after a series of trial-and-error processes (i.e., a sensitivity analysis), as
shown in Figure 5. Noteworthily, only one hidden layer was selected, since the accuracy of a single
hidden layer was found to be high enough not to make the MLP structure more complicated.
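A hedged Python sketch of this trial-and-error search follows, with scikit-learn's MLPRegressor standing in for WEKA's MLPr; the solver settings are assumptions.

```python
# Sensitivity analysis over the number of nodes in a single hidden layer,
# mirroring Figure 5 (tanh activation approximates the Tansig of Eq. (2)).
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neural_network import MLPRegressor

for nodes in range(1, 11):
    mlp = MLPRegressor(hidden_layer_sizes=(nodes,), activation="tanh",
                       max_iter=2000, random_state=0).fit(X_train, y_train)
    pred = mlp.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(nodes, round(r2_score(y_test, pred), 4), round(rmse, 4))
```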
Figure 4. Multi-layer perceptron regressor (MLPr) neural network typical architecture.

Each node generates a local output. In addition, it sends the local output to the subsequent layer
(the next nodes in a further hidden layer) until reaching the output nodes, i.e., the nodes placed in the
output layer. Equation (1) shows the normal operation carried out, considering a dataset of N groups
of records, by the jth neuron to compute the predicted output:

O_j = F\left( \sum_{n=1}^{N} I_n W_{nj} + b_j \right), (1)

where I symbolizes the input, b denotes the bias of the node, W is the weighting factor, and F
signifies the activation function. Tansig (i.e., the tangent sigmoid activation function) is employed
(Equation (2)). Note that we can have several types of activation functions (e.g., (i) sigmoid or logistic;
(ii) Tanh, the hyperbolic tangent; (iii) ReLU, rectified linear units) and that their performances are best
suited to different purposes. In the specific case of the sigmoid, this function (i) is real-valued and
differentiable (i.e., gradients can be found); (ii) has analytic tractability for the differentiation operation;
and (iii) is an acceptable mathematical representation of biological neuronal behavior.

Tansig(x) = \frac{2}{1 + e^{-2x}} - 1 (2)

Figure 5. Sensitivity analysis based on number of neurons in a single hidden layer: (a) R² and
(b) RMSE versus the number of nodes in the hidden layer (training, testing, and average series).
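To make Equations (1) and (2) concrete, the short numeric sketch below evaluates one neuron's output; the input, weight, and bias values are arbitrary illustrations.

```python
# Worked example of Equations (1) and (2) for a single neuron j.
import numpy as np

def tansig(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # Equation (2)

I = np.array([0.9, -0.4, 0.2])    # example inputs (assumed values)
W = np.array([0.5, 0.8, -0.3])    # example weights W_nj (assumed values)
b = 0.1                           # example bias b_j

O_j = tansig(np.sum(I * W) + b)   # Equation (1) with F = Tansig
print(O_j)                        # a value in (-1, 1)
```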
3.2. Lazy Locally Weighted Learning (LLWL)

Similar to the K-star technique (i.e., an instance-based classifier), locally weighted learning
(LWL) [44] is one of the common types of lazy learning-based solutions. Lazy learning approaches
provide valuable training algorithms and representations for learning about complex phenomena
during autonomous adaptive control of complex systems. Commonly, there are disadvantages in
employing such methods: lazy learners create a considerable delay during the network simulation.
More explanations about this model are provided by Atkeson et al. [44].

The key options we have in LLWL include the number of decimal places (numDecimalPlaces),
batch size (batchSize), KNN (following the k-nearest neighbors algorithm), nearest neighbor search
algorithm (nearestNeighborSearchAlgorithm), and weighting kernel (weightingKernel). More
explanations are provided below for each of the above influential parameters.

numDecimalPlaces—The number of decimal places. This number will be implemented for the
output of numbers in the model.

batchSize—The chosen number of cases to process if batch estimation is being completed. A normal
value of the batch size is 100. In this example we also consider it to be constant, as it did not have a
significant impact on the outputs.

KNN—The number of neighbors that are employed to set the width of the weighting function
(noting that KNN <= 0 means all neighbors are considered).

nearestNeighborSearchAlgorithm—The potential nearest neighbor search algorithm to be applied
(the default algorithm, which was also selected in our study, was LinearNN).

weightingKernel—The number that determines the weighting function (0 = Linear;
1 = Epanechnikov; 2 = Tricube; 3 = Inverse; 4 = Gaussian; and 5 = Constant; default 0 = Linear).

A good example of the k-nearest neighbors algorithm is shown in Figure 6. The test sample
(red dot) should be classified either as a blue square or as a green triangle. If k = 3 (i.e., depicted by the
solid line circle) it is assigned to the green triangles, as there are two triangles and only one square
within the inner (i.e., continuous line) circle. If k = 5 (dashed line circle) it is assigned to the blue
squares (three blue squares vs. two green triangles inside the outer circle). The variation of the
correlation coefficient (R²) versus the number of used KNN neighbors is shown in Figure 7. It can be
seen that changing the KNN could significantly enhance the correlation coefficient. For the cases of
KNN = −1, KNN = 2, KNN = 4, KNN = 6, KNN = 8, and KNN = 10, the training correlation coefficients
were 0.9025, 0.9579, 0.9861, 0.9916, 0.9937, and 0.9943, respectively. In the case of our study we
proposed KNN = −1, as it considers all neighbors.
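A minimal sketch of this lazy, locally weighted prediction idea is given below (it is not WEKA's LWL implementation); the linear kernel corresponds to weightingKernel = 0, and k <= 0 plays the role of KNN = −1, i.e., all neighbors are used.

```python
# Hedged sketch: predict a query point by kernel-weighting its k nearest
# neighbours (k <= 0 means all neighbours, as with KNN = -1 in the text).
import numpy as np

def lwl_predict(X_train, y_train, x_query, k=-1):
    d = np.linalg.norm(X_train - x_query, axis=1)   # distances to the query
    idx = np.argsort(d) if k <= 0 else np.argsort(d)[:k]
    w = 1.0 - d[idx] / (d[idx].max() + 1e-12)       # linear weighting kernel
    return np.average(y_train[idx], weights=w + 1e-12)

rng = np.random.default_rng(0)
Xt = rng.random((20, 2))
yt = Xt.sum(axis=1)                                 # toy training data
print(lwl_predict(Xt, yt, np.array([0.5, 0.5])))    # approximately 1.0
```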
Figure 6. Example of k-nearest neighbors (KNN) regression/classification.
Figure 7. Variation of the correlation coefficient (R²) versus number of used KNN neighbors in lazy
locally weighted learning (LLWL) technique.
Table 2. Evaluation metrics calculated for the alternating model tree (AMT) method, varied based on
the number of iterations.

                                  Number of Iterations
Evaluation metrics                10      20      30      40      50
Correlation coefficient           0.9984  0.9971  0.9974  0.9975  0.9972
Mean absolute error               0.4349  0.7527  0.7051  0.6464  0.6666
Root mean squared error           0.5752  0.9566  0.8936  0.8495  0.8995
Relative absolute error (%)       4.75    7.94    7.43    6.82    7.0341
Root relative squared error (%)   5.69    8.94    8.35    7.93    8.4062
Figure 8. Variation of the correlation coefficient (R²) versus number of iterations, in alternating model
tree (AMT) technique.
helps the solution to be more stable when there is a high correlation between the features. For the
number of the models (i.e., the set length of the lambda sequence to be generated), the value of 100
was used.
For the number of decimal places (i.e., the number of decimal places to be used for the
output of numbers in the model), as usual, the value of 2 was selected. The batch size was considered
to be 100. The values of alpha and epsilon were set to be 0.001 and 0.0001, respectively. Along with the
above-mentioned structure, a unique linear regression equation can also be found from the ENet, as
shown in Equation (4):

HL = −0.049 × X2 + 0.100 × X3 − 0.075 × X4 + 0.144 × X5 − 0.003 × X6 + 0.051 × X7 +
0.161 × X8 + 35.597. (4)
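As a hedged cross-check of this stage, scikit-learn's ElasticNet can be fitted to the same records; its alpha/l1_ratio parameterization differs from WEKA's alpha and epsilon, so the coefficients recovered below need not reproduce Equation (4) exactly.

```python
# Hedged sketch: fit an elastic-net linear model and inspect the per-feature
# coefficients and intercept for comparison with Equation (4).
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=0.001, l1_ratio=0.5, max_iter=10000)
enet.fit(X_train, y_train)
print(dict(zip(X_train.columns, enet.coef_.round(3))))
print(round(enet.intercept_, 3))   # compare with the 35.597 offset
```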
3.6. Radial Basis Function Regression (RBFr)

The radial basis function network (RBFr) has a unique structure, as explained in Figure 9.
Equation (5) illustrates the basis function of this network [55]. For solving the issue, radial basis
function regression can be used by fitting a collection of kernels to the dataset. In addition, this method
attends to the position of noisy samples.

O_i = K\left( \frac{\|x - x_i\|}{\tau_i} \right) (5)

O_i stands for the output of the neuron, and x_i shows the center of kernel K. In addition, the term τ_i
stands for the width of the ith RBF unit.
Figure 9. Typical architecture of radial basis function regression (RBFr) neural network.
The RBFr model utilizes a batch algorithm for predicting the number of developed kernels. This
prediction is performed by batch algorithms. The specific expectation function that is utilized in the
RBFr model is as below:

F(x) = \sum_{i=1}^{z} k(\|x - x_i\|)\, \varphi_i. (6)

\|x\| stands to symbolize the Euclidean norm on x. k(\|x - x_i\|), i = 1, 2, . . . , z, stands as a group of
z non-linear and constant RBF basis functions. In addition, the term φ_i shows the coefficient of regression.
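The numeric sketch below instantiates Equations (5) and (6) with a Gaussian kernel; the centres, widths, and regression coefficients are arbitrary assumed values.

```python
# Hedged sketch of Equations (5) and (6): Gaussian kernel outputs O_i are
# combined with regression coefficients phi_i to form F(x).
import numpy as np

def rbf_predict(x, centres, taus, phis):
    r = np.linalg.norm(centres - x, axis=1) / taus  # ||x - x_i|| / tau_i
    O = np.exp(-r ** 2)                             # Equation (5), Gaussian K
    return np.dot(O, phis)                          # Equation (6)

centres = np.array([[0.0, 0.0], [1.0, 1.0]])        # z = 2 assumed centres
taus = np.array([1.0, 0.5])                         # assumed widths tau_i
phis = np.array([2.0, -1.0])                        # assumed coefficients
print(rbf_predict(np.array([0.5, 0.5]), centres, taus, phis))
```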
R^2 = 1 - \frac{\sum_{i=1}^{s} \left( Y_{i_{predicted}} - Y_{i_{observed}} \right)^2}{\sum_{i=1}^{s} \left( Y_{i_{observed}} - \overline{Y}_{observed} \right)^2}, (7)

MAE = \frac{1}{N} \sum_{i=1}^{s} \left| Y_{i_{observed}} - Y_{i_{predicted}} \right|, (8)

RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{s} \left[ Y_{i_{observed}} - Y_{i_{predicted}} \right]^2 }, (9)

RAE = \frac{\sum_{i=1}^{s} \left| Y_{i_{predicted}} - Y_{i_{observed}} \right|}{\sum_{i=1}^{s} \left| Y_{i_{observed}} - \overline{Y}_{observed} \right|}, (10)

RRSE = \sqrt{ \frac{\sum_{i=1}^{s} \left( Y_{i_{predicted}} - Y_{i_{observed}} \right)^2}{\sum_{i=1}^{s} \left( Y_{i_{observed}} - \overline{Y}_{observed} \right)^2} }, (11)
where Y_{i_{observed}} and Y_{i_{predicted}}, represented in Equations (7) to (11), are the actual and estimated values of
heating load in energy-efficient buildings, respectively. The term s in the above equations stands for
the number of instances and \overline{Y}_{observed} denotes the mean of the real amounts of the heating load. The WEKA
software environment was employed to run the machine learning models.
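A direct transcription of Equations (7) to (11) into Python is given below for reproducing the reported indexes; RAE and RRSE are returned as percentages to match Tables 3 and 4.

```python
# Statistical indexes of Equations (7)-(11); y_obs and y_pred are NumPy
# arrays of observed and predicted heating loads.
import numpy as np

def evaluate(y_obs, y_pred):
    mean_obs = y_obs.mean()
    r2 = 1 - np.sum((y_pred - y_obs) ** 2) / np.sum((y_obs - mean_obs) ** 2)
    mae = np.mean(np.abs(y_obs - y_pred))
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    rae = 100 * np.sum(np.abs(y_pred - y_obs)) / np.sum(np.abs(y_obs - mean_obs))
    rrse = 100 * np.sqrt(np.sum((y_pred - y_obs) ** 2)
                         / np.sum((y_obs - mean_obs) ** 2))
    return r2, mae, rmse, rae, rrse
```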
compared to the other techniques. The weakest estimation results were from the ENet solution, where
the R², MAE, RMSE, RAE (%), and RRSE (%) were 0.8915, 3.2332, 4.5678, 35.3566, and 45.2993 for the
training dataset, respectively, and 0.896, 3.2585, 4.4683, 35.4392, and 44.2052 for the testing dataset,
respectively. According to the total scoring, the best performance in Table 4 was found to be that
of RF. Right after the RF model, the next best estimation network was obtained for the AMT technique.
The R², MAE, RMSE, RAE (%), and RRSE (%) for the AMT training dataset were 0.9985, 0.4096, 0.5449,
4.4788, and 5.4036, respectively. The R², MAE, RMSE, RAE (%), and RRSE (%) for the AMT testing
dataset were 0.9981, 0.4869, 0.6236, 5.2956, and 6.1693, respectively.
Table 3. The performance of selected machine learning techniques in the prediction of heating load
through several statistical indexes (training dataset).

                        Network Results                              Ranking the Predicted Models     Total
Proposed Models         R²      MAE     RMSE    RAE (%)  RRSE (%)    R²  MAE  RMSE  RAE  RRSE         Ranking Score  Rank
lazy.LWL                0.903   3.2838  4.3335  35.9104  42.9757     2   1    2     1    2            8              5
Alternating Model Tree  0.9985  0.4096  0.5449  4.4788   5.4036      5   5    5     5    5            25             2
Random Forest           0.9997  0.19    0.2399  2.078    2.3795      6   6    6     6    6            30             1
ElasticNet              0.8915  3.2332  4.5678  35.3566  45.2993     1   2    1     2    1            7              6
MLP Regressor           0.9915  0.9795  1.3156  10.7117  13.0465     4   4    4     4    4            20             3
RBF Regressor           0.9647  1.8226  2.6555  19.9307  26.3348     3   3    3     3    3            15             4
Table 4. The performance of selected machine learning techniques in the prediction of heating load
through several statistical indexes (testing dataset).

                        Network Results                              Ranking the Predicted Models     Total
Proposed Models         R²      MAE     RMSE    RAE (%)  RRSE (%)    R²  MAE  RMSE  RAE  RRSE         Ranking Score  Rank
lazy.LWL                0.9049  3.2345  4.2752  35.1778  42.2953     2   2    2     2    2            10             5
Alternating Model Tree  0.9981  0.4869  0.6236  5.2956   6.1693      5   5    5     5    5            25             2
Random Forest           0.9989  0.3385  0.4649  3.6813   4.5995      6   6    6     6    6            30             1
ElasticNet              0.896   3.2585  4.4683  35.4392  44.2052     1   1    1     1    1            5              6
MLP Regressor           0.9868  1.12    1.6267  12.1811  16.0934     4   4    4     4    4            20             3
RBF Regressor           0.9693  1.9109  2.4647  20.7827  24.3837     3   3    3     3    3            15             4
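The total ranking score used in Tables 3 and 4 can be reproduced with the short sketch below: for each index every model receives a score from 1 (worst) to 6 (best), and the five scores are summed.

```python
# Hedged sketch of the total-scoring scheme of Tables 3 and 4
# (training-set values copied from Table 3).
import numpy as np

results = {
    "lazy.LWL":               (0.9030, 3.2838, 4.3335, 35.9104, 42.9757),
    "Alternating Model Tree": (0.9985, 0.4096, 0.5449,  4.4788,  5.4036),
    "Random Forest":          (0.9997, 0.1900, 0.2399,  2.0780,  2.3795),
    "ElasticNet":             (0.8915, 3.2332, 4.5678, 35.3566, 45.2993),
    "MLP Regressor":          (0.9915, 0.9795, 1.3156, 10.7117, 13.0465),
    "RBF Regressor":          (0.9647, 1.8226, 2.6555, 19.9307, 26.3348),
}
vals = np.array(list(results.values()))
totals = np.zeros(len(results), dtype=int)
for col in range(vals.shape[1]):
    better = vals[:, col] if col == 0 else -vals[:, col]  # R2: higher is better
    score = np.empty(len(results), dtype=int)
    score[np.argsort(better)] = np.arange(1, len(results) + 1)
    totals += score                      # 1 = worst ... 6 = best per index
print(dict(zip(results, totals)))        # Random Forest should total 30
```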
The results of network reliability based on the R² performance of all proposed models, for both
training and testing, are provided in Figures 10 and 11. As stated earlier, the RF models could
provide a more reliable predictive network with higher accuracy when compared to the other proposed
techniques. The results of network output for the proposed RF are illustrated in Figures 10d and 11d.
Given the provided information, the predictive network of RF proved to be slightly better than the other
proposed techniques and was superior in making a better regression relationship between the estimated
and actual values.
Figure 10. The network outputs for the training dataset. (a) MLPr; (b) LLWL; (c) AMT; (d) RF; (e) ENet;
(f) RBFr.
Figure 11. The network outputs for the testing dataset. (a) MLPr; (b) LLWL; (c) AMT; (d) RF; (e) ENet;
(f) RBFr.
5. Conclusions
In the current study, several predictive networks were introduced and evaluated. The study
aimed to assess and compare several of the most well-known machine learning-based techniques in
order to introduce the most reliable predictive method for the early estimation of heating load in
energy-efficient residential building systems. Machine learning-based solutions, namely the MLPr,
LLWL, AMT, RF, ENet, and RBFr models, were employed to estimate the heating load. The results of
the best model from the
proposed techniques were presented. Based on the presented outcomes, it may be said that, except
for the ENet model, almost all models (i.e., MLPr, LLWL, AMT, RF, and RBFr) have good prediction
output results in estimating heating load in energy-efficient building systems. In this regard, the
RF machine learning technique could be suggested as the most reliable and accurate among other
predictive techniques provided in the present work. The RF predictive model also learned better than
the other models, with respect to both the training and validation stages. The
values of R2 , MAE, RMSE, RAE (%), and RRSE (%) in the RF model training dataset, were 0.9997, 0.19,
0.2399, 2.078, and 2.3795, respectively. The values of R2 , MAE, RMSE, RAE (%), and RRSE (%) in the
AMT model training dataset were 0.9985, 0.4096, 0.5449, 4.4788, and 5.4036, respectively. Validated
testing datasets from the selected techniques also showed appropriate accuracy as R2 , MAE, RMSE,
RAE (%), and RRSE (%) in the testing output of the RF model were found to be 0.9989, 0.3385, 0.4649,
3.6813, and 4.5995, respectively; R2 , MAE, RMSE, RAE (%), and RRSE (%) in the testing output of
the AMT model were found to be 0.9981, 0.4869, 0.6236, 5.2956, and 6.1693, respectively. The worst
validation was found for the ENet technique with R2 , MAE, RMSE, RAE (%), and RRSE (%) equal to
0.896, 3.2585, 4.4683, 35.4392, and 44.2052, respectively.
Author Contributions: H.M., D.T.B., A.D. wrote the manuscript, discussion and analyzed the data. H.M. and
A.D., Z.L., and L.K.F. edited, restructured, and professionally optimized the manuscript.
Funding: This study was funded by the Ton Duc Thang University and University of South-Eastern Norway.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Nguyen, T.N.; Tran, T.P.; Hung, H.D.; Voznak, M.; Tran, P.T.; Minh, T.; Thanh-Long, N. Hybrid TSR-PSR
alternate energy harvesting relay network over Rician fading channels: Outage probability and SER analysis.
Sensors 2018, 18, 3839. [CrossRef]
2. Najafi, B.; Ardabili, S.F.; Mosavi, A.; Shamshirband, S.; Rabczuk, T. An intelligent artificial neural
network-response surface methodology method for accessing the optimum biodiesel and diesel fuel
blending conditions in a diesel engine from the viewpoint of exergy and energy analysis. Energies 2018, 11,
860. [CrossRef]
3. Nazir, R.; Ghareh, S.; Mosallanezhad, M.; Moayedi, H. The influence of rainfall intensity on soil loss mass
from cellular confined slopes. Measurement 2016, 81, 13–25. [CrossRef]
4. Mojaddadi, H.; Pradhan, B.; Nampak, H.; Ahmad, N.; Ghazali, A.H.B. Ensemble machine-learning-based
geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomat. Nat.
Hazards Risk 2017, 8, 1080–1102. [CrossRef]
5. Rizeei, H.M.; Pradhan, B.; Saharkhiz, M.A. Allocation of emergency response centres in response to pluvial
flooding-prone demand points using integrated multiple layer perceptron and maximum coverage location
problem models. Int. J. Disaster Risk Reduct. 2019, 101205. [CrossRef]
6. Rizeei, H.M.; Pradhan, B.; Saharkhiz, M.A. Urban object extraction using Dempster Shafer feature-based
image analysis from worldview-3 satellite imagery. Int. J. Remote Sens. 2019, 40, 1092–1119. [CrossRef]
7. Mezaal, M.; Pradhan, B.; Rizeei, H. Improving landslide detection from airborne laser scanning data using
optimized Dempster–Shafer. Remote Sens. 2018, 10, 1029. [CrossRef]
8. Aal-shamkhi, A.D.S.; Mojaddadi, H.; Pradhan, B.; Abdullahi, S. Extraction and modeling of urban sprawl
development in Karbala City using VHR satellite imagery. In Spatial Modeling and Assessment of Urban Form;
Springer: Cham, Switzerland, 2017; pp. 281–296.
9. Gao, W.; Wang, W.; Dimitrov, D.; Wang, Y. Nano properties analysis via fourth multiplicative ABC indicator
calculating. Arab. J. Chem. 2018, 11, 793–801. [CrossRef]
10. Aksoy, H.S.; Gör, M.; İnal, E. A new design chart for estimating friction angle between soil and pile materials.
Geomech. Eng. 2016, 10, 315–324. [CrossRef]
11. Gao, W.; Dimitrov, D.; Abdo, H. Tight independent set neighborhood union condition for fractional critical
deleted graphs and ID deleted graphs. Discret. Contin. Dyn. Syst. S 2018, 12, 711–721. [CrossRef]
Appl. Sci. 2019, 9, 4338 16 of 17
12. Bui, D.T.; Moayedi, H.; Gör, M.; Jaafari, A.; Foong, L.K. Predicting slope stability failure through machine
learning paradigms. ISPRS Int. Geo-Inf. 2019, 8, 395. [CrossRef]
13. Gao, W.; Guirao, J.L.G.; Abdel-Aty, M.; Xi, W. An independent set degree condition for fractional critical
deleted graphs. Discret. Contin. Dyn. Syst. S 2019, 12, 877–886. [CrossRef]
14. Moayedi, H.; Bui, D.T.; Gör, M.; Pradhan, B.; Jaafari, A. The feasibility of three prediction techniques of the
artificial neural network, adaptive neuro-fuzzy inference system, and hybrid particle swarm optimization
for assessing the safety factor of cohesive slopes. ISPRS Int. Geo-Inf. 2019, 8, 391. [CrossRef]
15. Gao, W.; Guirao, J.L.G.; Basavanagoud, B.; Wu, J. Partial multi-dividing ontology learning algorithm. Inf. Sci.
2018, 467, 35–58. [CrossRef]
16. Ince, R.; Gör, M.; Alyamaç, K.E.; Eren, M.E. Multi-fractal scaling law for split strength of concrete cubes. Mag.
Concr. Res. 2016, 68, 141–150. [CrossRef]
17. Gao, W.; Wu, H.; Siddiqui, M.K.; Baig, A.Q. Study of biological networks using graph theory. Saudi J. Biol.
Sci. 2018, 25, 1212–1219. [CrossRef]
18. Ngo, N.T. Early predicting cooling loads for energy-efficient design in office buildings by machine learning.
Energy Build. 2019, 182, 264–273. [CrossRef]
19. Jafarinejad, T.; Erfani, A.; Fathi, A.; Shafii, M.B. Bi-level energy-efficient occupancy profile optimization
integrated with demand-driven control strategy: University building energy saving. Sustain. Cities Soc. 2019,
48, 101539. [CrossRef]
20. Kheiri, F. A review on optimization methods applied in energy-efficient building geometry and envelope
design. Renew. Sustain. Energy Rev. 2018, 92, 897–920. [CrossRef]
21. Wang, W.; Chen, J.Y.; Huang, G.S.; Lu, Y.J. Energy efficient HVAC control for an IPS-enabled large space in
commercial buildings through dynamic spatial occupancy distribution. Appl. Energy 2017, 207, 305–323.
[CrossRef]
22. Yu, Z.; Haghighat, F.; Fung, B.C.M. Advances and challenges in building engineering and data mining
applications for energy-efficient communities. Sustain. Cities Soc. 2016, 25, 33–38. [CrossRef]
23. Zhang, H.; Yuan, C.; Yang, G.; Wu, L.; Peng, C.; Ye, W.; Shen, Y.; Moayedi, H. A novel constitutive modelling
approach measured under simulated freeze–thaw cycles for the rock failure. Eng. Comput. 2019. [CrossRef]
24. Bui, D.T.; Moayedi, H.; Anastasios, D.; Foong, L.K. Predicting heating and cooling loads in energy-efficient
buildings using two hybrid intelligent models. Appl. Sci. 2019, 9, 3543.
25. Moayedi, H.; Nguyen, H.; Rashid, A.S.A. Comparison of dragonfly algorithm and Harris hawks optimization
evolutionary data mining techniques for the assessment of bearing capacity of footings over two-layer
foundation soils. Eng. Comput. 2019. [CrossRef]
26. Moayedi, H.; Aghel, B.; Abdullahi, M.A.M.; Nguyen, H.; Rashid, A.S.A. Applications of rice husk ash as
green and sustainable biomass. J. Clean. Prod. 2019, 237, 117851. [CrossRef]
27. Huang, X.X.; Moayedi, H.; Gong, S.; Gao, W. Application of metaheuristic algorithms for pressure analysis of
crude oil pipeline. Energy Sources Part A Recovery Util. Environ. Effects 2019. [CrossRef]
28. Gao, W.; Alsarraf, J.; Moayedi, H.; Shahsavar, A.; Nguyen, H. Comprehensive preference learning and feature
validity for designing energy-efficient residential buildings using machine learning paradigms. Appl. Soft
Comput. 2019, 84, 105748. [CrossRef]
29. Bui, D.T.; Moayedi, H.; Kalantar, B.; Osouli, A.; Pradhan, B.; Nguyen, H.; Rashid, A.S.A. Harris hawks
optimization: A novel swarm intelligence technique for spatial assessment of landslide susceptibility. Sensors
2019, 19, 3590. [CrossRef]
30. Biswas, M.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural
network approach. Energy 2016, 117, 84–92. [CrossRef]
31. Fan, C.; Wang, J.; Gang, W.; Li, S. Assessment of deep recurrent neural network-based strategies for short-term
building energy predictions. Appl. Energy 2019, 236, 700–710. [CrossRef]
32. Ince, R.; Gör, M.; Eren, M.E.; Alyamaç, K.E. The effect of size on the splitting strength of cubic concrete
members. Strain 2015, 51, 135–146. [CrossRef]
33. Sayin, E.; Yön, B.; Calayir, Y.; Gör, M. Construction failures of masonry and adobe buildings during the 2011
Van earthquakes in Turkey. Struct. Eng. Mech. 2014, 51, 503–518. [CrossRef]
34. Zemella, G.; de March, D.; Borrotti, M.; Poli, I. Optimised design of energy efficient building facades via
evolutionary neural networks. Energy Build. 2011, 43, 3297–3302. [CrossRef]
Appl. Sci. 2019, 9, 4338 17 of 17
35. Chou, J.S.; Bui, D.K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building
design. Energy Build. 2014, 82, 437–446. [CrossRef]
36. Hidayat, I.; Utami, S.S. Activity based smart lighting control for energy efficient building by neural network
model. In Astechnova 2017 International Energy Conference; Sunarno, I., Sasmito, A.P., Hong, L.P., Eds.; EDP
Sciences: Les Ulis, France, 2018; Volume 43.
37. Malik, S.; Kim, D. Prediction-learning algorithm for efficient energy consumption in smart buildings based
on particle regeneration and velocity boost in particle swarm optimization neural networks. Energies 2018,
11, 1289. [CrossRef]
38. Pino-Mejías, R.; Pérez-Fargallo, A.; Rubio-Bellido, C.; Pulido-Arcas, J.A. Comparison of linear regression and
artificial neural networks models to predict heating and cooling energy demand, energy consumption and
CO2 emissions. Energy 2017, 118, 24–36. [CrossRef]
39. Deb, C.; Eang, L.S.; Yang, J.; Santamouris, M. Forecasting diurnal cooling energy load for institutional
buildings using artificial neural networks. Energy Build. 2016, 121, 284–297. [CrossRef]
40. Li, Q.; Meng, Q.; Cai, J.; Yoshino, H.; Mochida, A. Predicting hourly cooling load in the building: A
comparison of support vector machine and different artificial neural networks. Energy Convers. Manag. 2009,
50, 90–96. [CrossRef]
41. Kolokotroni, M.; Davies, M.; Croxford, B.; Bhuiyan, S.; Mavrogianni, A. A validated methodology for the
prediction of heating and cooling energy demand for buildings within the Urban Heat Island: Case-study of
London. Sol. Energy 2010, 84, 2246–2255. [CrossRef]
42. Nguyen, H.; Moayedi, H.; Foong, L.K.; Al Najjar, H.A.H.; Jusoh, W.A.W.; Rashid, A.S.A.; Jamali, J. Optimizing
ANN models with PSO for predicting short building seismic response. Eng. Comput. 2019, 35, 1–15.
[CrossRef]
43. Tsanas, A.; Xifara, A. Accurate quantitative estimation of energy performance of residential buildings using
statistical machine learning tools. Energy Build. 2012, 49, 560–567. [CrossRef]
44. Atkeson, C.G.; Moore, A.W.; Schaal, S. Locally weighted learning for control. In Lazy Learning; Springer:
Dordrecht, The Netherlands, 1997; pp. 75–113.
45. Frank, E.; Mayo, M.; Kramer, S. Alternating model trees. In Proceedings of the 30th Annual ACM Symposium
on Applied Computing, Salamanca, Spain, 13–17 April 2015; pp. 871–878.
46. Hamilton, C.R. Hourly Solar Radiation Forecasting through Neural Networks and Model Trees. Ph.D. Thesis,
University of Georgia, Athens, GA, USA, 2016.
47. Rodrigues, É.O.; Pinheiro, V.; Liatsis, P.; Conci, A. Machine learning in the prediction of cardiac epicardial
and mediastinal fat volumes. Comput. Biol. Med. 2017, 89, 520–529. [CrossRef] [PubMed]
48. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
49. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth
International Group: Belmont, CA, USA, 1984.
50. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach.
Intell. 1998, 20, 832–844.
51. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification
and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43,
1947–1958. [CrossRef] [PubMed]
52. Diaz-Uriarte, R.; de Andres, S.A. Gene selection and classification of microarray data using random forest.
BMC Bioinform. 2006, 7, 3. [CrossRef]
53. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T. Random forests for classification in ecology.
Ecology 2007, 88, 2783–2792. [CrossRef]
54. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat.
Methodol.) 2005, 67, 301–320. [CrossRef]
55. Buhmann, M.D. Radial Basis Functions: Theory and Implementations; Cambridge University Press: Cambridge,
UK, 2003; Volume 12.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).