Application of Machine Learning

This document discusses applying machine learning techniques to estimate energy use intensities (EUIs) of bank branch buildings in Brazil. It collected data on 48,000 simulated bank branch models to train predictive models. A sensitivity analysis found lighting power density and weather most influenced EUI. Support vector machines achieved the most accurate EUI predictions with a mean absolute error of 3.16 kWh/m2.year and root mean squared error of 4.45 kWh/m2.year. Due to non-linear relationships, machine learning improved benchmarking model accuracy over traditional methods.

Energy & Buildings 249 (2021) 111219

Contents lists available at ScienceDirect

Energy & Buildings


journal homepage: www.elsevier.com/locate/enb

Application of machine learning to estimate building energy use intensities
R.K. Veiga a,⇑, A.C. Veloso b, A.P. Melo a, R. Lamberts a
a Laboratory for Energy Efficiency in Buildings, Federal University of Santa Catarina, Brazil
b Comfort and Energy Efficiency in the Built Environment Laboratory, Federal University of Minas Gerais, Brazil

Article info

Article history:
Received 13 March 2021
Revised 23 May 2021
Accepted 20 June 2021
Available online 24 June 2021

Keywords:
Energy modeling
Benchmarking
Machine learning techniques

Abstract

Information on building energy consumption and its characteristics is essential for carrying out benchmarking processes. However, a lack of data currently acts as a major barrier in this regard. To address this issue, this study demonstrates the application of machine learning to estimate the energy use intensities of bank branch buildings located in Brazil. First, data on the bank branch typology were collected. Then, the archetype model and its fixed and variable inputs were defined to generate 48,000 samples, which were simulated using the EnergyPlus program. A Sobol sensitivity analysis showed that the lighting power density, followed by the weather variable, were the most influential variables when estimating the energy consumption of the bank branches. Finally, machine learning techniques were compared to train the predictive model. The Support Vector Machine achieved MAE and RMSE values of 3.16 and 4.45 kWh/m².year, respectively, representing the most accurate model for benchmarking purposes. Due to the non-linearity among the variable parameters, optimizing sophisticated machine learning techniques significantly improved the model accuracy. The results are of great value, since the model developed can be used in future benchmarking throughout the country. The methodology showed high accuracy and could be extended to other typologies.
© 2021 Elsevier B.V. All rights reserved.
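The two accuracy metrics quoted in the abstract, MAE and RMSE, can be illustrated with a minimal sketch. The function names and the EUI values below are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute prediction errors, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error; penalizes large errors more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical EUI values in kWh/m2.year (illustrative only).
simulated_eui = [150.0, 182.5, 210.0, 166.0]
predicted_eui = [153.0, 180.0, 204.0, 170.0]

mae = mean_absolute_error(simulated_eui, predicted_eui)
rmse = root_mean_squared_error(simulated_eui, predicted_eui)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f} kWh/m2.year")
```

Because RMSE squares the errors before averaging, it is never smaller than MAE for the same predictions, which matches the ordering of the two values reported in the abstract.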

1. Introduction

A growing number of countries have been encouraging the exchange of building energy data used to build baseline models, i.e., building energy benchmarks for a variety of typologies [1], aimed at promoting energy-efficiency investments and reducing building energy consumption [2,3]. The benchmarking process consists of a comparison between whole-building energy use and a sample of similar buildings [4]. Comparing building energy performance with benchmarks can provide owners and tenants with important information on building energy behavior and encourage them to use less energy and improve energy performance [5]. On a large scale, energy benchmarking enables scientific assessment of building stock performance, helping decision makers to better understand the energy profiles of cities and deploy carbon reduction strategies [1,6].

In developed countries, such as the United States of America, researchers have recently started to understand the role of information in the path to establishing an efficient building marketplace and have already developed mandatory energy disclosure policies [7]. Since 1997, the Department of Energy, in partnership with the Efficiency Valuation Organization (EVO), has published the International Performance Measurement and Verification Protocol (IPMVP) [8]. The IPMVP establishes key principles for verifying energy efficiency, water efficiency, and renewable energy projects (including new and existing buildings), ensuring that an energy efficiency project achieves its goals of improving efficiency and increasing energy savings. According to the measurement and verification analysis, IPMVP provides four options to determine the savings: Option A - Key Parameter Measurement; Option B - All Parameter Measurement; Option C - Whole Facility; and Option D - Calibrated Simulation. Pulsipher and Kaiser (2010) [9] adopted the IPMVP to measure the energy conservation of a residential building, presenting a description of the baseline model construction, preliminary program evaluation, and recommendations. Marchio and Ginestet (2010) [10] also applied the IPMVP methodology, reporting the accuracy and energy savings of four different calculation options.

⇑ Corresponding author: Rua Capitão Romualdo de Barros, n° 705, Florianópolis, CEP: 88.040-600, Brazil.
E-mail address: [email protected] (R.K. Veiga).

https://doi.org/10.1016/j.enbuild.2021.111219
0378-7788/© 2021 Elsevier B.V. All rights reserved.
R.K. Veiga, A.C. Veloso, A.P. Melo et al. Energy & Buildings 249 (2021) 111219
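As a rough illustration of the benchmarking idea described in the Introduction (comparing a building's energy use with a sample of similar buildings), the sketch below computes an energy use intensity and the share of peers with a lower value. The function names and all numbers are hypothetical; this is not the benchmarking procedure developed in the paper.

```python
def energy_use_intensity(annual_kwh, floor_area_m2):
    """EUI in kWh/m2.year: annual energy consumption divided by floor area."""
    return annual_kwh / floor_area_m2


def share_of_peers_with_lower_eui(building_eui, peer_euis):
    """Fraction of peer buildings that use less energy per square metre."""
    lower = sum(1 for eui in peer_euis if eui < building_eui)
    return lower / len(peer_euis)


# Hypothetical peer sample of similar buildings (kWh/m2.year).
peer_euis = [120.0, 140.0, 155.0, 170.0, 190.0, 230.0]

# Hypothetical building: 96,000 kWh per year over 600 m2.
eui = energy_use_intensity(annual_kwh=96_000, floor_area_m2=600)
rank = share_of_peers_with_lower_eui(eui, peer_euis)
print(f"EUI = {eui:.1f} kWh/m2.year; {rank:.0%} of peers perform better")
```

A lower share means the building performs better than most of its peers; in practice the peer sample would come from a reference model or database for the same typology.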

According to Borgstein, Lamberts and Hensen (2016) [11], the benchmarking process can be divided into three stages: (i) develop the most appropriate reference model for a chosen typology; (ii) evaluate the energy performance of the actual building taking into account the parameters considered in the previous stage; and (iii) compare the actual building performance with the reference model performance.

Benchmarking methodologies can be classified as white-box, black-box or grey-box [12] according to the classification of models to predict building energy performance [13]. The white-box, or engineering, methods use physical principles to determine the building performance and thus require a large number of inputs and a detailed knowledge of thermodynamics. The black-box methods, also known as bottom-up statistical methods, use historical data to simply correlate the energy performance output with a set of influencing inputs. To be sufficiently accurate, black-box methods require high levels of statistical significance. The grey-box methods are a fusion of white-box and black-box methods, aimed at combining the advantages of each one [14,15].

According to Li, Han and Xu (2014) [12], the following black-box methods have been widely applied to predict building energy performance: multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN). The MLR technique is simple to apply, easy to interpret, and in some cases can outperform non-linear models when the sample is small. However, it often presents large variance and might not be able to represent the data [16], especially when the data are non-linear. Even so, this technique has been broadly adopted in benchmarking for commercial building energy [17-26]. MLR was used as a baseline to validate predictive models by Ostergaard, Jensen and Maagaard (2018) [27], who compared six metamodeling techniques to predict building energy performance resulting from simulations. Regarding the capacity to solve non-linear problems, ANN and SVM have been widely applied to predict the energy performance of commercial buildings [14,28-40]. A review on the applications of these two techniques was carried out by Ahmad et al. (2014) [41]. In terms of forecasting accuracy, the authors observed that both techniques have advantages and disadvantages that analysts need to be aware of before selecting one of these approaches.

Besides the benchmarking methodologies, sensitivity analysis should be applied to understand the importance of input features to evaluate building energy performance and to reduce the model complexity [42]. Sensitivity analysis can be classified as local (one-at-a-time) or global. The local approach represents the influence of an input around a specific point in the input hyperspace through the first-order effect. On the other hand, the global approach represents the total-order effect and covers a wide range in the input hyperspace. In non-linear problems, local analysis is limited to the chosen point of the hyperspace and may not fully represent the model behavior [43,44].

Since the thermal behavior of buildings and their systems is usually non-linear [45,46], many studies have applied global sensitivity analysis to understand the energy performance of commercial buildings [47-51], where it represents a valuable tool for feature selection [42].

Information on building energy consumption and its characteristics is essential for carrying out benchmarking processes. However, a lack of information currently presents an obstacle to understanding the risks and benefits of implementing energy efficiency measures [52,53]. When the amount of information available to develop a representative dataset is limited, it is necessary to fall back on parametric building simulations to fill this gap [54].

Borgstein and Lamberts (2014) [17] presented the most extensive Brazilian benchmarking study related to the bank branch typology. Their source database held monthly energy bills, location and area records of distinct buildings. On account of the poorly detailed data available, it was necessary to define a parametric dataset with building energy simulations [17]. The benefit of using a parametric sample is that a wide range of building data can be covered. To build the sample, the range of parameters must be carefully selected; otherwise, the validation of the dataset will be compromised [54].

Despite the tendency towards the spreading of disclosure policies across the world and their potential to promote energy savings, Brazil has still shown a low level of interest in benchmarking. Brazilian public policies to encourage green building practices through financial and administrative incentives are behind those of both the USA and India [55]. As concluded by Borgstein and Lamberts (2014), there are no studies on a national scale that describe the characteristics and energy consumption of buildings [17].

User privacy and financial and technical issues limit the amount of information available [54,56], and this lack of data is the major barrier to carrying out benchmarking in Brazil. Moreover, the reference model, also known as the archetype, usually applied to large-scale demands, is prohibitively expensive and time consuming [57]. To address this issue, the purpose of this study is to demonstrate the application of machine learning to estimate building energy use intensities of bank branches, gathering a new and more robust dataset related to a bank branch archetype and adding new features and more sophisticated machine learning techniques to a previously described benchmarking model [17]. Defining an appropriate set of input variables is crucial to achieving significant model accuracy, while optimizing different hyperparameters may also improve the accuracy. Firstly, a data collection regarding the bank branch typology was completed, defining the most important characteristics in relation to energy consumption. Due to the lack of available data in the previous study by Borgstein and Lamberts (2014) [17], an extended list of building characteristics and their respective ranges of values was cross-referenced to build up and calibrate the archetype. Then, the archetype model and its fixed and variable inputs were defined to generate 48,000 samples that were simulated using EnergyPlus, with the programming routines written in the R language. In order to understand the influence of the input variables on the output (EUI), a Sobol sensitivity analysis was performed for the sample previously generated. Finally, machine learning techniques were compared to train the predictive model and to identify the most accurate model for benchmarking purposes. The benchmark model was validated to quantify the degree of imperfection and guarantee the model accuracy outside the sample.

The outline of the paper is as follows. In Section 2, background information on the data collection is presented. In Section 3, the dataset structure of the factors used in the construction of energy baseline models is outlined, and in Section 4, the predictive model optimization is developed. A case study is presented in Section 5, followed by the conclusion in Section 6.

2. Data collection

2.1. Previous study

The earliest study related to building energy consumption in Brazil was carried out in 2005, when the potential to implement energy efficiency measures on the national market was evaluated for the first time. The survey, administered by the Brazilian Ministry of Mines and Energy (MME) and executed by the National Program of Electric Energy Conservation (Procel), assessed commercial, residential, public and industrial buildings through ownership and consumption habits [58].

Seeking to understand the difference between energy consumption estimated in the early design stage and in the operation phase, in 2013 the Energy Committee of the Brazilian Council for Sustainable Construction (CBCS) started the Operational Energy Performance project. In the following year, the development of a benchmark targeting bank branches was launched as a pilot project. The data for this benchmarking study resulted from systematic data collection, carried out from 2013 to 2014. A total of 7980 bank branches were included and information on the floor area, energy consumption and location was gathered [17].

The pilot project was used by CBCS as a reference in 2015, when benchmarking for the corporate office typology was performed in a partnership with the British Embassy and the National Energy Conservation Program (Procel). Two years later, supported by the United Nations Development Program (UNDP), the previous benchmarking project was extended to cover public administrative buildings at municipal and federal levels [59].

At the end of 2018, a project named META, administered by MME, published a more extensive report with useful information to implement benchmarking. The main objective of META was to develop a consolidated database to estimate energy consumption. In the research, 15 different typologies of commercial and public buildings were analyzed to identify building characteristics, such as lighting and appliance load levels, operation schedules and others [60]. In 2019, the Energy Committee of CBCS and Eletrobras, Latin America’s largest company in the electric sector, attempted to facilitate the development of benchmarks and energy performance indicators by adding energy consumption records to the data previously collected by META. This research also covered 15 different typologies, among them bank branches [61].

2.2. Sources

The archetype development was divided into two steps: segmentation of the building stock into groups holding similar characteristics (i.e., building typology) and detailed characterization of the predefined groups according to pattern recognition (e.g., thermal properties) [62].

In this paper, the bank branch typology was adopted as the segmentation group and its characterization was defined according to the most important characteristics in relation to energy consumption, based on data found in the literature [48,63-65].

The main report of the Commercial Buildings Energy Consumption Survey (CBECS), conducted by the U.S. Energy Information Administration (2012), consists of an extended list of building characteristics and their respective ranges of values [63]. This was one of the main reports that identified the characteristics of the building stock segmentation adopted, which could potentially affect the thermal performance of buildings. Furthermore, the literature review indicated the following characteristics to be considered in the development of archetypes for commercial typologies: weather conditions, geometry, envelope, occupancy schedules, internal loads and air conditioning system [48,64,65]. Due to a lack of available data, the data gathered from different sources were cross-referenced to build up and calibrate the archetype, as follows:

1. Geometry – Building floor areas and layout patterns were collected from the CBCS database, which contains data on 7980 Brazilian bank branches, and cross-referenced with the model developed by Borgstein and Lamberts (2014) [17]. Data were complemented by analyzing pictures taken from Google Earth®, using Google Street View (examples are given in Fig. 1), and new characteristics, such as height, number of floors, position and size of openings, and boundary conditions (i.e., whether it is a stand-alone building or not), were introduced into the archetype;
2. Envelope – Floor, ceiling, roof, wall and window materials and thermal properties were gathered from interviews with building designers. Wall and roof thicknesses were extracted from audited projects, while the absorptance and thermal transmittance values were defined according to the standard ABNT 15220–3, titled “Thermal performance in buildings – Part 3: Brazilian bioclimatic zones and building guidelines for low-cost houses” [66];
3. Internal loads and HVAC – Artificial lighting data were collected from in-loco audits and matched with the Brazilian labelling standard for commercial buildings (INI-C) [67], while the people load followed the ASHRAE Standard 55 [68]. Information related to occupancy and electric equipment loads was extracted from the preceding benchmark model [17]. The HVAC system characteristics were mainly defined by interviews with bank branch managers. When information was missing or mismatched, supplier catalogues were consulted to identify properties such as COP and cooling capacity. Regarding the air change rate, the mandatory guidelines from “Air Conditioning Installations – Part 3: Inside Air Quality” (ABNT NBR 16401–3) [69], for artificially conditioned environments, were considered;
4. Schedules – Appliance, air conditioning, lighting and occupancy schedules were compiled from in-loco audits and surveys carried out with bank branch managers. In addition, the directions set by the Brazilian Federation of Banks (FEBRABAN) [70] for the working time of bank branches were also taken into account;
5. Weather conditions – The building locations were obtained from the CBCS database, and the weather variables, e.g., annual mean dry-bulb temperatures, were calculated using INMET weather files, downloaded from the weather One Building organization [71].

Fig. 1. Examples of Brazilian bank branches. Source: Google Street View.

3. Dataset structure

3.1. Archetype: Fixed parameters

The data collected from CBCS and the analysis using Google Street View were consistent with the model proposed by Borgstein and Lamberts (2014) [17]. Thus, a two-story building with three thermal zones per floor was adopted. The model was detailed in the EnergyPlus software, version 9.2.0 [72].

The ground floor is composed of two thermal zones occupied by clients and employees (Cashier 1 and ATM Client side) and one thermal zone with equipment (ATM Back). The upper floor is also composed of two occupied thermal zones (Cashier 2 and Administration) and one thermal zone with equipment (Server). The ground contact was modeled through the object Ground Domain Slab, using the finite difference method [73]. On the upper floor the roof is exposed to the outdoor environment. Fig. 2 shows all thermal zones of the model, and the floor area for each one is given as a percentage of the total building floor area.

Fig. 2. Archetype thermal zones and floor area (as a percentage of total building floor area).

The CBCS database revealed a non-uniform relation between building floor area and energy use intensity (EUI): larger building floor areas had a smaller EUI. It was also noted that the floor area varies considerably and the occurrence of records seems to be higher for areas within the range of 150 m² and 1.

Interviews with building designers revealed that in general the frontal facade of bank branches in Brazil is composed of windows and doors. Thus, in the model, tempered glass (U = 5.6 W/m².K) was placed on the frontal facade of each story to represent both the windows and doors, but no natural ventilation was considered. For security purposes, the Brazilian standard NBR 7199 “The Use of Glasses in Constructions” requires the adoption of glass with at least tempered-glass resistance in windows and doors lower than 1.10 m [74]. For safety reasons, this type of glass material also tends to be used in commercial buildings. Due to the variety of window films used in commercial buildings, the solar heat gain coefficient (SHGC) was not fixed. Blinds or other shading solutions were not often found in the data collection and therefore they were not included in the archetype.

The opening hours of bank branches are the same throughout the country. The financial services offered by these establishments must be open from 10:00 to 16:00, as defined by the Brazilian Central Bank [70]. However, on-site audits and surveys suggested that, in general, the working time exceeds the stipulated range, especially in employee zones and ATM service areas. Thus, the occupation schedules were based on a working time average. On weekdays, the employee zones (Cashiers 1/2 and Administration) are occupied from 8:30 to 19:00, while the ATM Clients thermal zone is open for clients from 6:00 to 22:00. Both ATM Back and Server thermal zones have no occupancy. These zones are used specifically to host equipment and therefore their appliance schedule is 24/7. The HVAC system schedules follow the occupation schedules in the occupied zones, except for the employee thermal zones, where the HVAC system is activated one hour before opening time. In ATM Back and Server thermal zones, the artificial cooling system is always on. HVAC heating and cooling set points were adopted as 21 °C and 24 °C, respectively.

The internal load of appliances in the thermal zones Cashiers 1/2 and Administration is mainly related to employee work stations, while in ATM Clients it corresponds to a totem machine. According to Yuventi and Mehdizadeh (2013) [75], data centers with optimized floor usage to maximize the number of servers inside the facility can present a power density > 1000 W/m². Since on-site audits verified that servers of Brazilian bank branches are not highly optimized, the electric equipment load for the Server thermal zone was set as 750 W/m². ATM machines are placed in the ATM Clients thermal zone; nevertheless, the internal load generated by these machines is released at the ATM Back. The electric power of each ATM machine corresponds to 142 W, as defined by Borgstein and Lamberts (2014) [17]. In the cited article, the same value for people density was considered for all thermal zones, although in-loco audits suggested different values regarding the presence of customers and employees. Thermal zones where a higher number of customers is expected also tend to present higher people density. According to the ASHRAE Standard 55 [68], a metabolic rate of 117 W/m² (typing activity) was considered in the Cashiers 1/2 and Administration thermal zones and 198 W/m² in the ATM Clients thermal zone (walking about). Table 1

shows the internal load densities for all thermal zones in the proposed archetype.

Table 1
Internal load densities for the thermal zones.

Building floor   Thermal zone     People density (people/m²)   Appliances (W/m²)
Ground           Cashier 1        0.12                         4.12
Ground           ATM Clients      0.231                        2.31
Ground           ATM Back         0                            Variable
Roof             Cashier 2        0.12                         4.12
Roof             Administration   0.042                        4.29
Roof             Server           0                            750

3.2. Archetype: Variable parameters

According to the definition of the archetype, some parameters were varied and considered as input parameters for the predictive models. The input parameters of the predictive model and their ranges considered for this study are described in Table 2.

Table 2
Description of variables considered in the predictive model.

Variable                             Values                                             Unit     Type
Area                                 200 ~ 1200                                         m²       Numerical
Azimuth angle                        0 ~ 360                                            °        Numerical
Boundary conditions                  “adiabatic” (1) or “outdoors” (2)                  –        Categorical
ATM density                          5 ~ 20                                             m²/ATM   Numerical
Lighting power density               10 ~ 24                                            W/m²     Numerical
Envelope materials                   “heavy” (1) or “light” (2)                         –        Categorical
Solar heat gain coefficient (SHGC)   0.3 ~ 0.7                                          –        Numerical
HVAC system                          “split” (1) or “vrf” (2)                           –        Categorical
Coefficient of performance (COP)     3 ~ 6                                              W/W      Numerical
Air change calculation method        “smart” (1), “min” (2), “inter” (3) or “max” (4)   –        Categorical
Cooling degree hours (CDH) or        CDH: 1957 ~ 84723                                  °C       Numerical
annual mean dry-bulb                 mDBT: 10.8 ~ 28.2
temperature (mDBT)

The area range was mainly defined by the distribution of the areas in the CBCS database. Records with an area of < 200 m² and > 1200 m² were excluded from the database beforehand, since some of the buildings were actually small bank offices or bank headquarters, adding extra uncertainty to the model.

Two main patterns of bank branches were identified through analysis carried out using Google Street View: stand-alone buildings and agencies located in commercial buildings. Thus, the boundary conditions variable was defined to cover these two types. This variable assumes two possible values, “adiabatic” and “outdoors”, which, respectively, represent a bank branch placed inside a building and a bank branch as a stand-alone building. When placed inside a building, the walls of the bank branch were set as adiabatic, except for the frontal facade walls, which are always considered exposed to an outdoor environment. This approach was adopted because internal spaces in commercial buildings tend to be artificially conditioned, which reduces the heat exchange between internal spaces.

Although the internal loads generated by ATM equipment were set in the ATM Back thermal zone, the ATM density variable was calculated in relation to the ATM Client floor area. Thus, the calculations of the internal load density for the ATMs were performed using Equation (1), where D_ATM is the ATM density and the constant values of 21.7/3.3 and 142 correspond, respectively, to the floor area ratio and the ATM electric power (in W). Since in-loco audits did not reveal a pattern for internal loads associated with lights, the lighting power density range was defined according to the requirements to achieve the lowest and highest efficiency levels given in the labeling standard INI-C [67].

\[ \mathrm{ATM}_{IL} = \frac{1}{D_{ATM}} \cdot \frac{21.7}{3.3} \cdot 142 \qquad (1) \]

The values for heavy and light envelope materials, respectively, refer to high and low thermal inertia envelopes commonly used in the Brazilian construction context. The heavy envelope corresponds to a combination of concrete walls with 10 cm thickness (solar absorptance of 0.7) and a roof composed of fiber cement tiles (solar absorptance of 0.7), an air gap (thermal resistance of 0.21 m².K/W) and a slab with 10 cm thickness. The light (or insulated) envelope has walls built of hollow bricks (9 cm × 19 cm × 19 cm) with a 2.5 cm layer of plaster (solar absorptance of 0.3) on both sides plus a roof with the following four layers: fiber cement tiles (solar absorptance of 0.3), air gap (thermal resistance of 0.21 m².K/W), 5 cm of glass wool and a concrete slab with 10 cm thickness. On cross-referencing the data collected from Google Street View and interviews with building designers, windows coated with a variety of film layers were noted. Therefore, a SHGC range of 0.3 to 0.7 should cover all the possible film layers found.

The two HVAC system options considered in this paper were the most predominant technologies found in the data collection. The value “split” refers to the high-wall air conditioning technology, which is very common in Brazilian buildings, while “vrf” stands for the variable refrigerant flow technology, which is more expensive and efficient. The COP variable varies from 3 to 6, intending to cover the efficiencies of HVAC systems of both technologies. In this study, COP values are fixed. However, it is important to mention that in year-round operation, COP values vary considerably.

Due to a lack of information regarding the air change rate, four methods of air change calculation were adopted in this study. The first method, called “smart”, was based on legislation published by the Brazilian Ministry of Health [76] and considers an outdoor air flow rate of 27 (m³/h)/person. Compared to the following methods, the “smart” method is a more efficient and expensive alternative, since it requires an automatic system with CO2 sensors. The other three methods (“min”, “inter” and “max”) are described in NBR 16401–3 “Air Conditioning Installations - Part 3: Inside air quality”, which defines three levels of outdoor air flow (minimum, intermediate and maximum) in order to achieve significant reductions in allergic sicknesses [69]. In these three methods a constant outdoor air flow rate (Q_ef) is calculated for the whole simulation based on the maximum occupancy of the zone (Equation (2)). Before the simulation starts, a calculation using Equation (2) is performed for each zone. The values of F_p (air flow per person) and F_a (air flow per occupied area) are extracted from Table 3. The zone floor area (A) is calculated by multiplying the total floor area, taken as an input of the predictive model, by the corresponding zone floor area ratio. To find the maximum number of people in the zone (P), the zone floor area must be multiplied by the corresponding people density of the zone.

\[ Q_{ef} = P \cdot F_p + A \cdot F_a \qquad (2) \]

Two variables were analyzed to be used in the weather characterization, cooling degree hours (CDH) and annual mean dry-bulb temperature (mDBT). CDH is a wet-bulb temperature-based index calculated using Equation (3) [77], where CDH, t_bal and t_out correspond, respectively, to the cooling degree hours, the balance point temperature and the outdoor air temperature. Many studies have adopted CDH to represent outdoor temperatures and predict cooling energy demand [78]. However, a recent study conducted to

Table 3
Fp and Fa values (adapted from NBR 16401–3 [69]).

Level | Fp – Air flow per person (L/s.person) | Fa – Air flow per occupied area (L/s.m2)
Minimum | 3.8 | 0.3
Intermediate | 4.8 | 0.4
Maximum | 5.7 | 0.5
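
As an illustration of Equation (2), the sketch below evaluates the outdoor air flow rate of a single zone using the Fp and Fa values from Table 3. The function name, the dictionary keys (borrowed from the “min”, “inter” and “max” labels in Table 2) and the example zone of 20 people in 100 m2 are assumptions made for illustration only; they are not taken from the authors' script.

```python
# Table 3 values: air flow per person (L/s.person) and per occupied
# area (L/s.m2) for the minimum, intermediate and maximum levels.
FP = {"min": 3.8, "inter": 4.8, "max": 5.7}
FA = {"min": 0.3, "inter": 0.4, "max": 0.5}


def outdoor_air_flow(people, zone_area, level):
    """Equation (2): Q_ef = P * F_p + A * F_a, returned in L/s."""
    return people * FP[level] + zone_area * FA[level]


# Hypothetical zone (illustrative numbers, not from the paper).
q_min = outdoor_air_flow(people=20, zone_area=100.0, level="min")
q_max = outdoor_air_flow(people=20, zone_area=100.0, level="max")
print(q_min, q_max)
```

For this hypothetical zone the per-person term dominates the per-area term at every level, which is why the maximum occupancy drives the result.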

Fig. 3. Linear regression models for mDBT and CDH.
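
The cooling degree hours index in Equation (3) sums only the positive part of the difference between the outdoor and balance point temperatures. A minimal sketch is given below; the six-hour temperature series and the balance point of 15 °C are hypothetical values chosen for the example, since the balance point adopted in the study is not stated in this section.

```python
def cooling_degree_hours(hourly_temps, t_bal):
    """Equation (3): sum over hours of max(t_out - t_bal, 0)."""
    return sum(max(t_out - t_bal, 0.0) for t_out in hourly_temps)


# Hypothetical hourly outdoor temperatures (degrees C).
temps = [12.0, 15.0, 18.0, 21.0, 19.0, 14.0]
cdh = cooling_degree_hours(temps, t_bal=15.0)
print(cdh)
```

Hours at or below the balance point contribute nothing, so the index grows only with the warm part of the year.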

compare cooling degree-day 10 °C and heating degree-day 18 °C showed that mDBT could effectively replace the aforementioned indexes registered by INMET automatic stations at the 411 Brazilian weather locations [79].

$$CDH(t_{bal}) = \sum_{hours} (t_{out} - t_{bal})^{+} \qquad (3)$$

In order to choose between these two weather variables, two simple linear regression models were developed to predict the EUI. The first linear model (left side of Fig. 3) considered only mDBT as the weather predictor, while the second model (right side of Fig. 3) considered only CDH. The model using CDH as a predictor showed a higher determination coefficient (0.214) than the model using mDBT (0.073), which indicates that CDH may behave as a better predictor in this type of analysis. The linear regression slope (angle of the red line in the graph) is slightly higher for CDH than it is for mDBT and this leads to the same conclusion. Therefore, CDH was adopted as the weather predictor.

3.3. Sampling method

Given the predictive model input variables, 48,000 samples were generated through a Monte Carlo sampling-based method, developed by Saltelli (2002) [80]. Saltelli’s sequence has the ability to fill the hyperspace uniformly with a relatively low number of observations. Due to the non-linear nature of the thermal behavior of buildings, space filling is a major concern in this experiment [81].

The archetype and the input variables were added into a programming script (github.com/rodolfoksveiga/cbcs), the main function of which is to edit the archetype simulation model according to the values of the variables and return a new model with the required characteristics. All of the samples were generated and simulated using EnergyPlus and the programming routines were written in R language. The simulation outputs were processed and the energy consumption was calculated as the sum of the outputs described in Table 4. Finally, the input variables were attached to the corresponding outputs.

Table 4
EnergyPlus outputs.

Source | EnergyPlus output
General | Fans Electric Energy
General | Zone Ventilation Fan
General | Lights
General | Exterior Lights
General | Electric Equipment
Split | Coil Cooling
Split | Heating
VRF | Heat Pump Heating
VRF | Crankcase Heater
VRF | Cooling
VRF | Defrost

Fig. 4 shows the building energy consumption per floor area (EUI) obtained with the simulated dataset on the x-axis and the energy end use on the y-axis. As previously mentioned, the server and the appliance consumptions per area were fixed; therefore, these end uses presented no variance in the dataset. The end uses ATM and Lights are dependent on input variables “atm” and “lights”, respectively. Considering the median values for the EUI, represented by a vertical black line inside each box, the sum of Lights and Server loads accounts for 106.8 kWh/m2.year, corresponding to around 70% of the EUI.

Unlike the other energy end uses in Fig. 4, for HVAC (HVAC Heating, HVAC Cooling and Fans) the EUI is not strictly related to a specific input variable. The wide distribution of the EUI values for HVAC Heating, HVAC Cooling and Fans suggests that these end uses are influenced by different input variables. Despite accounting for only an average of 40.5 kWh/m2.year, there are significant opportunities to implement energy efficiency through the HVAC system, since many input variables can be optimized to reduce the EUI.

Fig. 5 explains the correlations between the input variables and the HVAC end uses according to the Pearson correlation coefficient. Absolute values for the Pearson coefficient lower than 0.35 indicate a weak correlation, values between 0.36 and 0.67 are considered moderate and values higher than 0.68 demonstrate a strong correlation [82]. As expected, two input variables (“hvac” and “cdh”) are highly correlated to at least one of the HVAC end uses and one input variable (“cop”) presented a moderate correlation.

The “hvac” variable presented a strong negative correlation with Fans (-0.87), and thus higher values for “hvac” indicate less energy consumption. This correlation can be explained by the fact that the values of “hvac” are treated as numeric (“split” is equal to 1 and “vrf” is equal to 2) as shown in Table 2. Thus, the correlation value of -0.87 indicates that the energy consumption of fans is higher for “split” than for “vrf”, consistent with the expected behavior of these two technologies. The variable “cdh” showed a strong positive correlation with HVAC Cooling (+0.77) and moderate negative correlation with HVAC Heating (-0.42). When CDH is higher the energy needed to cool the building tends to increase, while the opposite trend is expected in relation to heating. Besides these two strongly correlated variables, “cop” presented moderate negative correlation (-0.41) with HVAC Cooling and near to zero correlation with HVAC Heating. Increasing the COP may be an effective alternative in hot weather, where the cooling load is predominant. Furthermore, the following variables presented correlations between weak and moderate: “boundaries” (+0.23 with Fans) and “envelope” (-0.18 with Fans), respectively.

3.4. Sensitivity analysis

In order to understand the influence of the input variables on the output (EUI), taking into account the higher-order effect between inputs, Sobol sensitivity analysis [83] was performed for the sample previously generated. The first-order and total-order effects of each variable were estimated through a Python library named SALib, which stands for sensitivity analysis library [84]. The values of first-order (Si) and higher-order (Sij) effects and the value of total-order (STi) effects are estimated through Equations (4) and (5) [85], respectively. On the left side of Equation (4), the first-order effect of the input variable i (Si) is the contribution of the effect of i to the output variable Di(Y), standardized by the variance of the output variable Var(Y). On the right side, the second-order effect of i related to the input variable j, (Sij), is the contribution of the effect of i when influenced by j, standardized by Var(Y). In Equation (5), STi corresponds to the total effect of i, which is the sum of the first-order effect of i (Si), all of the second-order effects of i (Sij) when j varies among all the other input variables, and so on.

$$S_i = \frac{D_i(Y)}{Var(Y)}; \quad S_{ij} = \frac{D_{ij}(Y)}{Var(Y)} \qquad (4)$$

$$S_{Ti} = S_i + \sum_{j \neq i} S_{ij} + \sum_{j \neq i,\, k \neq i,\, j < k} S_{ijk} + \cdots = \sum_{l \in \#i} S_l \qquad (5)$$

Fig. 6 illustrates the first-order and total-order effects for each input variable, considering their influence over the EUI. The lighting power density (“lights” variable) followed by the weather variable (“cdh”) were found to be the most influential variables when estimating the energy consumption of the bank branches. According to the total-order effects, the HVAC system (“hvac”), COP (“cop”) and building floor area (“area”) are the third, fourth and fifth most important variables for estimating the EUI, respectively. The first-order and total-order effects of the solar heat gain coefficient (“shgc”) and the azimuth angle (“azimuth”) are close to zero, hence they do not have a significant influence on the output variable.

Since the total-order effect is the sum of first and higher-order effects (Equation (5)), an input variable presenting equal or nearly equal first-order and total-order effects must have higher-order effects close to zero, which means it directly influences the output and is not influenced by other input variables. The larger the difference between the first-order and total-order effects, the higher the influence from other variables and the smaller the direct influence on the output will be. The input variables “lights”, “area” and “atm”

Fig. 4. EUI by energy end use.
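
To make the first-order and total-order effects of Equations (4) and (5) more concrete, the sketch below estimates both indexes for a toy function with a hand-written Monte Carlo estimator. This is not the SALib routine used in the study: it uses plain pseudo-random sampling instead of a Saltelli sequence, and the toy model, the function names and the sample size are all assumptions made for illustration. In the toy model the first input acts alone, while the second and third inputs influence the output only through their interaction.

```python
import random


def sobol_indices(model, n_inputs, n_samples, seed=0):
    """Estimate first-order (S_i) and total-order (S_Ti) effects.

    Uses two independent sample matrices A and B and, for each input i,
    a matrix AB_i equal to A with column i taken from B.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in A]
    y_b = [model(row) for row in B]

    # Output variance Var(Y), estimated from both sample sets.
    all_y = y_a + y_b
    mean = sum(all_y) / len(all_y)
    var = sum((y - mean) ** 2 for y in all_y) / len(all_y)

    first, total = [], []
    for i in range(n_inputs):
        y_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        d_i = sum(yb * (yab - ya)
                  for yb, yab, ya in zip(y_b, y_ab, y_a)) / n_samples
        d_ti = sum((ya - yab) ** 2
                   for ya, yab in zip(y_a, y_ab)) / (2 * n_samples)
        first.append(d_i / var)
        total.append(d_ti / var)
    return first, total


def toy_model(x):
    # Hypothetical output: x[0] acts directly, x[1] and x[2] only interact.
    return 4.0 * x[0] + 24.0 * (x[1] - 0.5) * (x[2] - 0.5)


first, total = sobol_indices(toy_model, n_inputs=3, n_samples=2000)
print([round(s, 2) for s in first])
print([round(s, 2) for s in total])
```

For the first input the two indexes should be close to each other, which is the pattern described for variables that influence the output directly. For the interacting inputs the total-order effect should be clearly larger than the first-order effect.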


Fig. 5. Correlation between input variables and HVAC system.

presented the first-order effects very close to total-order effects, suggesting they are not influenced by other variables as they are mainly related to the internal loads of the building. As shown in Fig. 4, the internal loads represent a large percentage of the EUI and they do not change during the simulation. A constant power density (W/m2) is defined prior to the simulation and this value is integrated over time (one year of simulation) and summed with the EUI value for the building.

Information that serves no purpose, or noise, can distract the model from information that actually matters, that is, the signal [86]. Since noise does not increase the model accuracy and may even reduce it, a careful selection of features was performed to avoid insignificant or noisy variables that are not able to describe the energy consumption of buildings. The dashed black vertical lines in Fig. 6, where the sensitivity index value is equal to 0.01, 0.025 or 0.05, represent three criteria used to identify and remove insignificant variables from the dataset [87]. According to these criteria, the four following predictive models were optimized and compared: a) all variables considered (no threshold); b) “azimuth” and “shgc” removed from the sample (threshold equal to 0.01); c) “azimuth”, “shgc” and “afn” left out (threshold equal to 0.025); d) “azimuth”, “shgc”, “afn”, “envelope” and “atm” left out (threshold equal to 0.05).

4. Predictive model optimization

4.1. Data pre-process

Before any other data pre-process, the data was divided into two partitions, one for training and another for testing the predictive model. The training per testing ratio varies depending on the particularities of the problem. In this research, 80% of the data was used for training and 20% for testing, which is a reasonable split for large samples [36]. The test partition can be seen as a small portion of the dataset that has never been seen by the model. It helps to avoid bias and to ensure that the model will perform as well on future observations as it does on the test partition [88].

Most statistical analysis techniques are based on the assumption that the sampled variables are normally distributed [89]. Thus, improving the normality of the variables can increase the model accuracy, especially for distributions with a high level of non-normality [90]. Centering and scaling transformations were conducted in order to boost the normality of the data for the input

Fig. 6. Sobol’s first-order and total-order effects.
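
Centering and scaling, together with the skewness and kurtosis indexes used to describe a distribution, can be sketched in a few lines. The study used the Caret package in R for this step; the Python functions below are only an illustrative stand-in, and the six EUI values are hypothetical.

```python
import math


def standardize(values):
    """Center on the mean and scale by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def central_moment(values, order):
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment (0 for a normal distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical EUI values in kWh/m2.year (illustrative only).
eui = [120.0, 135.0, 150.0, 155.0, 160.0, 210.0]
z = standardize(eui)

# A linear transform leaves both shape indexes unchanged.
print(round(skewness(eui), 3), round(skewness(z), 3))
print(round(kurtosis(eui), 3), round(kurtosis(z), 3))
```

In practice the mean and standard deviation would be computed on the training partition and reused for the test partition, so that no information from the test data reaches the model.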


variables, since the dataset was sampled through a Saltelli sequence, resulting in distributions close to uniform. Basically, centering subtracts the mean of a given variable from its values, while in scaling the values for a given variable are divided by its standard deviation [91].

The EUI in the dataset was estimated by building energy simulations. Fig. 7 shows the relative probability (y-axis) of finding a given value of EUI (x-axis) in the dataset and its skewness and kurtosis (before and after centering and scaling). These indexes describe the shape of a distribution and are often used to compare a given distribution with the normal distribution [92]. The normal distribution presents kurtosis equal to 3 and skewness equal to 0 [93,94]. The kurtosis value estimated for the EUI distribution (3.228) suggests a leptokurtic distribution, which represents a distribution with a higher peak and longer tails than the normal distribution. The skewness value (0.465) being greater than zero indicates a positively skewed distribution, i.e., the majority of values are lower than the mean. Due to its positively skewed distribution, centering and scaling techniques were also applied to the output variable.

Categorical variables cannot be described on a numerical scale. Since machine learning deals with numbers, transforming each category of these variables into an independent binary variable, also known as a dummy variable, is a workaround for inputting categorical variables into machine learning models. Each dummy variable has two possible values, which represent the presence or absence of a quality, and may represent deterministic effects on the output variable [91]. The categorical variables in Table 2 were all transformed into dummy variables.

All data pre-processing was carried out using Caret, an open-source package developed in R language [95].

4.2. Sample modeling techniques

ANN and SVM techniques were selected to train the predictive model over a grid of hyperparameters, described in Table 5. After the training, only the most accurate model was selected for benchmarking purposes. Besides these sophisticated techniques, an MLR model was used as a baseline in this research, due to its reliability and simplicity to fit, since no hyperparameters are required and there are no risks of overfitting [16].

The concept of ANNs, a non-linear regression technique, is based on the human brain, shaped in neuron layers. Information flows from the input to the output layer, interacting with every hidden layer in between and calibrating the neuron weights to find the most accurate model. In the node interactions, analogous to synaptic connections, activation functions determine whether or not the information should be processed and passed to the next layer. Thus, ANNs can be used to solve non-linear problems [14,96].

SVM was originally a classification technique and was later adapted to solve regression problems through support vector regression. The SVM technique focuses on finding support vectors, which are the records on the edges of each classification cluster, and estimating thresholds (hyperplanes) and margins, in order to distinctly classify the observations according to given classification labels. Kernel functions are used to systematically find hyperplanes in higher-order dimensions and, hence, solve non-linear problems [37].

In order to reduce the risk of overfitting, the aforementioned optimization was performed by a 10-fold cross-validation. The K-fold cross-validation technique splits the training partition into K folds or subsets. In each training cycle K minus one folds are used to train the model, while the fold left out is used to validate the model. The cycle is repeated until every fold has been used to test the model and the most accurate model is selected according to a predefined performance index [97].

The training framework and the evaluation were supported by the Caret package [95], which standardizes the syntax of several machine learning functions from other packages. The ANN was trained in Caret through the model called “brnn”, which stands for Bayesian regularized neural network. This ANN is restricted to only one hidden layer, and the number of neurons in this layer was varied from 10 to 40. Because it increases the number of weights, a large number of neurons tends to increase the capacity of the model to deal with complex data; however, it may also lead to overfitting [98].

The SVM was trained using two different Caret models, “svmPoly” and “svmRadial”, which consider as the kernel function the polynomial function and radial basis function, respectively. When considering the polynomial function, it is necessary to set the function’s degree, which was varied from one (linear model) to five. Both polynomial and radial kernels permit the configuration of the regularization constant, which controls how sensitive the model is to records within the margins, hence, it determines the margin width. Extremely low values for the regularization constant make the model insensitive to local patterns in the training set and may lead to underfitting, while excessively high values make the model fit the noise in the training set too closely and may lead to overfitting [99]. The value of the regularization constant was varied from 0.25 to 128.

4.3. Performance evaluation

According to previous literature reviews [100,101], metrics based on absolute error calculations, such as mean absolute error (MAE), Equation (6), and root mean square error (RMSE), Equation (7), are the most popular approaches used to evaluate forecast models. While MAE is conservative to outliers, RMSE penalizes these observations, which is a drawback, since subsamples of a dataset can have different concentrations of outliers.

The scale dependency of these metrics could represent a concern in this study if the forecast objects had different magnitudes. However, it becomes an advantage in the case of building energy benchmarking, where the magnitude of the forecasted output is extremely relevant and the error itself provides a measure of energy consumption. Therefore, the MAE was used as a decision criterion, while the RMSE was used to identify a high concentration of outliers.

$$MAE = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \hat{y}_j \right| \qquad (6)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2} \qquad (7)$$

Table 6 shows the performance statistics for the optimized predictive models and the preceding benchmark model. Each row in the table corresponds to the record of the most accurate model for a given machine learning technique and importance threshold (number of input variables). In order to evaluate the preceding benchmark model, the whole dataset was predicted using the previous model, defined by Equation (8), and the predictions were compared with the simulated outputs.

$$EUI = 136.5 + 0.001984 \cdot CDH \qquad (8)$$

It can be observed that the ANN and SVM outperformed the MLR for all thresholds. The difference in terms of performance of the models developed adopting the thresholds 0 and 0.01 was considered insignificant. As indicated by the sensitivity analysis in section 3.4, the input variables “azimuth” and “shgc” do not provide any extra useful information for the models and removing them

Fig. 7. EUI distribution.
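
The two error metrics in Equations (6) and (7) can be sketched as follows. The simulated and predicted values are hypothetical and were chosen so that one observation has a much larger error than the others, which shows how the RMSE reacts to an outlier while the MAE stays closer to the typical error.

```python
import math


def mae(actual, predicted):
    """Equation (6): mean absolute error."""
    return sum(abs(y - y_hat)
               for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Equation (7): root mean square error."""
    return math.sqrt(sum((y - y_hat) ** 2
                         for y, y_hat in zip(actual, predicted)) / len(actual))


# Hypothetical EUI values in kWh/m2.year (illustrative only).
simulated = [100.0, 150.0, 200.0, 250.0]
predicted = [102.0, 148.0, 203.0, 240.0]

print(mae(simulated, predicted))
print(rmse(simulated, predicted))
```

Both metrics are expressed in the unit of the output, so they can be read directly as an energy use intensity error.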

may remove noise and yield simplicity. When the threshold 0.025 was adopted, only one additional variable was removed, but the model accuracy reduced considerably. On comparing the performance of the MLR, ANN and SVM models for the thresholds 0 and 0.025, the MAE increased 3.8%, 17.6% and 17.9%, respectively. Since “azimuth” and “shgc” are indeed insignificant for the dataset considered, they were removed from this benchmark. Thus, the most accurate model considering the threshold of 0.01, highlighted in bold in Table 6, was identified and selected.

Table 5
Grid of hyperparameters.

Sample modeling technique | Hyperparameter | Possible values
SVM | Kernel | Polynomial or Radial
SVM | Regularization constant | 0.25 ~ 128
SVM | Degree (polynomial) | 1 ~ 5
ANN | Neurons in the hidden layer | 10 ~ 40

Table 6
Descriptive statistics for the most accurate predictive models.

Sample modeling technique | Importance threshold | Number of input variables | MAE (kWh/m2.year) | RMSE (kWh/m2.year) | R2
Linear Regression, Borgstein and Lamberts [17] | – | 1 (CDH) | 53.5 | 65.4 | 0.21
Multiple Linear Regression | 0 | 11 | 6.34 | 8.40 | 0.90
Multiple Linear Regression | 0.01 | 9 | 6.36 | 8.42 | 0.90
Multiple Linear Regression | 0.025 | 8 | 6.59 | 8.69 | 0.90
Multiple Linear Regression | 0.05 | 6 | 8.51 | 10.95 | 0.84
Artificial Neural Networks | 0 | 11 | 3.10 | 4.34 | 0.97
Artificial Neural Networks | 0.01 | 9 | 3.26 | 4.57 | 0.97
Artificial Neural Networks | 0.025 | 8 | 3.76 | 5.12 | 0.96
Artificial Neural Networks | 0.05 | 6 | 7.36 | 9.46 | 0.88
Support Vector Machine | 0 | 11 | 3.12 | 4.36 | 0.97
Support Vector Machine | 0.01 | 9 | 3.16 | 4.45 | 0.97
Support Vector Machine | 0.025 | 8 | 3.80 | 5.23 | 0.96
Support Vector Machine | 0.05 | 6 | 7.33 | 9.56 | 0.88

The adoption of nine variables facilitates the use of the model, due to a lower number of input variables, and maintains a similar level of model accuracy when compared to the model considering all 11 variables. The SVM achieved MAE and RMSE values of 3.16 and 4.45 kWh/m2.year, respectively, and was the best-performing model when considering nine input variables. Therefore, it was adopted as the final benchmark model. To achieve this accuracy, the kernel function of the most accurate model was set as radial, while the optimal value for the regularization constant was 64.

A comparison between the models developed in this research and the preceding benchmark evidenced the need to implement an appropriate set of input variables. The MAE of the preceding model exceeds those of the least and most accurate models generated in this work by factors of 6.3 and 17.3, respectively. Furthermore, the optimization of different machine learning techniques is crucial to obtain better accuracy. The most accurate MLR was 2 times less accurate than the most accurate ANN or SVM.

Fig. 8 shows the errors between simulated and predicted data. The colors in both plots represent three CDH ranges: low (green), intermediate (orange) and high (purple). On the left plot, the correlation between the simulated EUI (x-axis) and the EUI predicted by the most accurate model (y-axis) is described by a scatter plot. The diagonal black line crossing the plot represents the function y = x. Thus, any record lying on this line is a perfect prediction, i.e., the predicted value is exactly the same as the simulated value. In the left plot, there is a predominance of purple points above the diagonal line and green points below it, indicating a positive correlation between CDH and EUI, as expected. The concentration of points around the diagonal line confirms the high degree of association between simulated and predicted outputs. On the right plot, the x-axis and y-axis represent, respectively, the values of the prediction errors, calculated by subtracting the predicted from the simulated output, and the relative probability. This plot does not show a relevant trend; however, the probability of obtaining negative errors is slightly higher than the probability of obtaining positive errors for intermediate CDH, suggesting that predictions of buildings in mild climates tend to present lower EUI values than the simulated data. The errors for mild climates are also larger than the errors for hot and cold climates.

Fig. 8. Performance of the most accurate model.

5. Case study

A benchmark model represents a simplification of reality; therefore, validation is necessary to quantify the degree of imperfectness and guarantee the model accuracy outside the sample [102], especially when dealing with white-box models, which require a large number of input parameters [12]. When the validation does not present satisfactory results, additional calibration is required to reduce the discrepancies between simulation and real data and implement meaningful energy conservation measures [103].

Since the benchmark in this study was trained with a simulation-based archetype, the most accurate model was validated with real data. The benchmark was used to predict the energy consumption of three Brazilian bank branches. Cases A, B and C in Fig. 9 correspond to bank branches located in Curitiba (PR), Belo Horizonte (MG) and João Pessoa (PB), respectively. All of the input data was collected from on-site audits, while the annual energy consumption was calculated from the sum of monthly energy bills.

Fig. 9. Validation cases.

Table 7 shows the input variables and their corresponding values. The geometries of these cases are similar, that is, they are all stand-alone buildings with around 600 m2 of building floor area. Although Case B has only three facades exposed to the sun, the

Table 7
Inputs of the validation cases.

Input parameter | Case A | Case B | Case C
Area | 574 | 638 | 652
Boundary conditions | “outdoors” | “outdoors” | “outdoors”
ATM density | 8.2 | 7.9 | 5.6
Lighting power density | 19.5 | 20.4 | 6.6
Envelope materials | “light” | “heavy” | “light”
HVAC system | “split” | “vrf” | “vrf”
COP | 3 | 4.3 | 4.1
Air change calculation method | “max” | “min” | “smart”
CDH | 9481 | 23,883 | 65,930
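
The categorical inputs in Table 7 have to be converted into dummy variables before they can be passed to the predictive model. The sketch below encodes the categorical inputs of Case A, creating one binary variable per category. The short variable names follow the labels quoted in the text (“boundaries”, “envelope”, “hvac”), while the assumption that “afn” denotes the air change calculation method, and the naming of the dummy columns, are made here for illustration only.

```python
# Categories of each categorical variable, as listed in Table 2.
LEVELS = {
    "boundaries": ["adiabatic", "outdoors"],
    "envelope": ["heavy", "light"],
    "hvac": ["split", "vrf"],
    "afn": ["smart", "min", "inter", "max"],  # assumed short name
}


def to_dummies(record):
    """Return one binary (dummy) variable per category of each variable."""
    encoded = {}
    for name, levels in LEVELS.items():
        for level in levels:
            encoded[f"{name}_{level}"] = 1 if record[name] == level else 0
    return encoded


# Categorical inputs of Case A (Table 7).
case_a = {"boundaries": "outdoors", "envelope": "light",
          "hvac": "split", "afn": "max"}
dummies = to_dummies(case_a)
print(dummies)
```

Exactly one dummy variable per original variable is equal to 1, so the encoded record carries the same information as the original labels without imposing an order on the categories.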

Fig. 10. Energy consumption of the validation cases.
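
For reference, the preceding benchmark in Equation (8) can be evaluated directly for the CDH values of the three validation cases listed in Table 7. The function and variable names below are chosen for illustration; the measured EUI values appear only in Fig. 10 and are therefore not reproduced here.

```python
def preceding_benchmark_eui(cdh):
    """Equation (8): EUI = 136.5 + 0.001984 * CDH, in kWh/m2.year."""
    return 136.5 + 0.001984 * cdh


# CDH values of the validation cases (Table 7).
CDH = {"A": 9481, "B": 23883, "C": 65930}
eui_linear = {case: preceding_benchmark_eui(value)
              for case, value in CDH.items()}

for case, value in eui_linear.items():
    print(case, round(value, 1))
```

Because CDH is the only predictor, the three estimates differ solely through the weather term, and every other building characteristic is absorbed by the intercept of 136.5 kWh/m2.year.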

variable boundary condition was considered “outdoors”. The case studies covered the range of the most important variables for the EUI, according to the previously performed sensitivity analysis. CDH values were considered for three different climates within the wide diversity of Brazilian climates. According to Köppen’s climate classification for Brazil [104], Cases A, B and C correspond to a humid subtropical zone without a dry season (Cfb), a humid subtropical zone with dry winter (Cwb) and a tropical zone with monsoon periods (Am), respectively. Because of the low efficiency of HVAC systems in Brazilian buildings, the COP was low for both “vrf” and “split”.

Due to a lack of data, the case studies were the only audits containing reliable values for all of the input variables required to perform the predictions. Thus, even though in Case C the lighting power density thresholds were extrapolated, they were still considered in this analysis. Since the lighting power density does not interact with the variables, as verified in the sensitivity analysis, and has a particular linear behavior, its extrapolation did not result in unacceptable errors. Nevertheless, this approach is not recommended, because the predictive model may not be able to describe the behavior of highly correlated variables outside this range. To avoid resampling and extra computational effort in order to obtain wider ranges for the variables, the use of an incremental sampling-based algorithm is suggested [105].

Fig. 10 shows the values for the real and predicted energy consumption using the current predictive model and the preceding benchmark model [17], defined in Equation (8). The equation refers to a linear model which is dependent only on the CDH, hence the interactions of the other parameters that influence the EUI are condensed into the intercept coefficient, including the lighting power density, which has a greater influence than the CDH.

The results show relative differences between the real and predicted values for the EUI of Cases A, B and C of 0.7%, 5.4% and 3.9%, respectively. The preceding benchmark presented a very accurate result for Case B, with a relative difference of 0.9%. However, it underestimated Case A by 15.9% and overestimated Case C by 99.2%. Fig. 5 shows a negative correlation between CDH and the energy consumption associated with heating (-0.42) and a positive correlation between CDH and the energy consumption for cooling (+0.77). Shifting from low to high CDH, the energy consumption for cooling of the buildings tends to increase while consumption for heating tends to reduce. The opposite interactions between heating and CDH and cooling and CDH constitute non-linear behavior, which could not be predicted by the linear model.

Adding new features to the previous study [17], such as the COP and HVAC system, which are strongly correlated to the EUI, significantly increased the model accuracy. The additional features also brought multiple non-linear patterns into the model. Thus, adopting a more sophisticated sample modeling technique was advantageous and benefited the model accuracy.

6. Conclusions

An application of machine learning to estimate building energy use intensities of bank branches in Brazil was developed in this study. In order to improve the accuracy of a previously described benchmarking model [17], information was gathered from different sources and a new and more robust archetype was obtained and used to build a parametric dataset with building energy simulations. Based on the results presented in this paper the following conclusions can be drawn:

• Data on the geometry, envelope materials, internal loads, occupancy and other characteristics of Brazilian bank branches are essential for the development of a suitable archetype for this typology;
• The values for the sensitivity indexes were used to conduct the selection of features;
• Sensitivity analysis revealed that the lighting power density is the variable with the strongest influence when estimating the EUI of Brazilian bank branches, followed by CDH;
• Among the machine learning techniques optimized, SVM and ANN were the most accurate models, with their MAE values diverging by only 0.1 kWh/m2.year;
• Adding new features to the previous study [17], such as the COP and HVAC system, which are strongly correlated to the EUI, significantly increased the model accuracy;

 The additional features also brought multiple non-linear pat- [20] P. Bohdanowicz, I. Martinac, Determinants and benchmarking of resource
consumption in hotels – Case study of Hilton International and Scandic in
terns into the model. Thus, adopting a more sophisticated sam-
Europe, Energy Build. 39 (2007) 82–95.
ple modeling technique was advantageous and benefited the [21] W.-S. Lee, Benchmarking the energy efficiency of government buildings with
model accuracy. data envelopment analysis, Energy Build. 40 (2008) 891–895.
[22] W. Chung, Y. Van Hui, A study of energy efficiency of private office buildings
in Hong Kong, Energy Build. 41 (2009) 696–701.
The methodology proposed herein demonstrates that defining [23] W.-S. Lee, K.-P. Lee, Benchmarking the performance of building energy
an appropriate set of input variables is crucial to achieving signif- management using data envelopment analysis, Appl. Therm. Eng. 29 (2009)
icant model accuracy, while optimizing different hyperparameters 3269–3273.
[24] A. Sabapathy, S.KV. Ragavan, M. Vijendra, A.G. Nataraja, Energy efficiency
may also improve the accuracy. benchmarks and the performance of LEED rated buildings for Information
A novel approach to building a benchmark model from building energy simulation using a limited amount of information was proposed. The results of this study are of great value, since the benchmark developed can be used in future benchmarking throughout Brazil. The methodology showed high accuracy and could be extended to other typologies.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The work reported in this paper was supported by the National Council for Scientific and Technological Development (CNPq).
[34] De Wilde, Pieter, Carlos Martinez-Ortiz, Darren Pearson, Ian Beynon, Martin
Beck, and Nigel Barlow. Building simulation approaches for the training of
[1] S. Papadopoulos, B. Bonczak, C.E. Kontokosta, Pattern recognition in building
automated data analysis tools in building energy management. Advanced
energy performance over time using energy benchmarking data, Appl. Energy
Engineering Informatics, v. 27, p. 457-465, 2013.
221 (2018) 576–586.
[35] A.P. Melo, D. Cóstola, R. Lamberts, J.L.M. Hensen, Development of surrogate
[2] T. Nikolaou, D. Kolokotsa, G. Stavrakakis, Review on methodologies for energy
models using artificial neural network for building shell energy labelling,
benchmarking, rating and classification of buildings, Adv. Build. Energy Res. 5
Energy Policy 69 (2014) 457–466.
(2011) 53–70.
[36] Y. Zhang, Z. O’Neill, B. Dong, G. Augenbroe, Comparisons of inverse modeling
[3] M. Bourdeau, X. Qiang Zhai, E. Nefzaoui, X. Guo, P. Chatellier, Modeling and
approaches for predicting building energy performance, Build. Environ. 86
forecasting building energy consumption: A review of data-driven
(2015) 177–190.
techniques, Sustain. Cities Soc. 48 (2019) 101533.
[37] C. Deb, L.S. Eang, J. Yang, M. Santamouris, Forecasting diurnal cooling energy
[4] Kinney, Satkartar, and Mary Ann Piette. Development of a California
load for institutional buildings using Artificial Neural Networks, Energy Build.
commercial building benchmarking database. (2002).
121 (2016) 284–297.
[5] S. Wang, C. Yan, F.u. Xiao, Quantitative energy performance assessment
[38] A.P. Melo, R.S. Versage, G. Sawaya, R. Lamberts, A novel surrogate model to
methods for existing buildings, Energy Build. 55 (2012) 873–888.
support building energy labelling system: A new approach to assess cooling
[6] H. Li, X. Li, Benchmarking energy performance for cooling in large commercial
energy demand in commercial buildings, Energy Build. 131 (2016) 233–247.
buildings, Energy Build. 176 (2018) 179–193.
[39] C.E. Kontokosta, C. Tull, A data-driven predictive model of city-scale energy
[7] K. Palmer, M. Walls, Using information to close the energy efficiency gap: a
use in buildings, Appl. Energy 197 (2017) 303–317.
review of benchmarking and disclosure ordinances, Energ. Effi. 10 (2017)
[40] C. Robinson, B. Dilkina, J. Hubbs, W. Zhang, S. Guhathakurta, M.A. Brown, R.M.
673–691.
Pendyala, Machine learning approaches for estimating commercial building
[8] IPMVP. International Performance Measurement and Verification Protocol,
energy consumption, Appl. Energy 208 (2017) 889–904.
2021. Available at: http://www.evoworld.org. Accessed in: May, 2021.
[41] H. Deng, D. Fannon, M.J. Eckelman, Predictive modeling for US commercial
[9] M.J. Kaiser, A.G. Pulsipher, Preliminary assessment of the Louisiana Home
building energy use: A comparison of existing statistical and machine
Energy Rebate Offer program using IPMVP guidelines, Appl. Energy 87 (2)
learning algorithms using CBECS microdata, Energy Build. 163 (2018) 34–43.
(2010) 691–702.
[42] A.S. Ahmad, M.Y. Hassan, M.P. Abdullah, H.A. Rahman, F. Hussin, H. Abdullah,
[10] S. Ginestet, D. Marchio, Retro and on-going commissioning tool applied to an
R. Saidur, A review on applications of ANN and SVM for building electrical
existing building: Operability and results of IPMVP, Energy 35 (4) (2010)
energy consumption forecasting, Renew. Sustain. Energy Rev. 33 (2014) 102–
1717–1723.
109.
[11] https://doi.org/10.1016/j.energy.2009.12.024
[43] W. Tian, A review of sensitivity analysis methods in building energy analysis,
[12] E.H. Borgstein, R. Lamberts, J.L.M. Hensen, Evaluating energy performance in
Renew. Sustain. Energy Rev. 20 (2013) 411–419.
non-domestic buildings: A review, Energy Build. 128 (2016) 734–755.
[44] Mara, Thierry A., and Stefano Tarantola. Application of global sensitivity
[13] L.i. Zhengwei, Y. Han, X.u. Peng, Methods for benchmarking building energy
analysis of model output to building thermal simulations. In Building
consumption against its past or intended performance: An overview, Appl.
Simulation, vol. 1, no. 4, pp. 290-302. Springer Berlin Heidelberg, 2008.
Energy 124 (2014) 325–334.
[45] Iooss, Bertrand, and Paul Lemaître. A review on global sensitivity analysis
[14] A. Foucquier, S. Robert, F. Suard, L. Stéphan, A. Jay, State of the art in building
methods. In Uncertainty management in simulation-optimization of complex
modelling and energy performances prediction: A review, Renew. Sustain.
systems, pp. 101-122. Springer, Boston, MA, 2015.
Energy Rev. 23 (2013) 272–288.
[46] H. Huang, L. Chen, H.u. Eric, A neural network-based multi-zone modelling
[15] H.-X. Zhao, F. Magoulès, A review on the prediction of building energy
approach for predictive control system design in commercial buildings,
consumption, Renew. Sustain. Energy Rev. 16 (2012) 3586–3592.
Energy Build. 97 (2015) 86–97.
[16] M.L. Chalal, M. Benachir, M. White, R. Shrahily, Energy planning and
[47] X. Li, J. Wen, E.-W. Bai, Developing a whole building cooling energy
forecasting approaches for supporting physical improvement strategies in
forecasting model for on-line operation optimization using proactive
the building sector: A review, Renew. Sustain. Energy Rev. 64 (2016) 761–
system identification, Appl. Energy 164 (2016) 69–88.
776.
[48] Arababadi, Reza. Energy use in the eu building stock-case study: UK. (2012).
[17] T. Hastie, R. Tibshirani, J. Friedman, The Elements Of Statistical Learning: Data
[49] D. Daly, P. Cooper, Z. Ma, Understanding the risks and uncertainties
Mining, Inference, And Prediction, Springer Science & Business Media, 2009.
introduced by common assumptions in energy simulations for Australian
[18] E.H. Borgstein, R. Lamberts, Developing energy consumption benchmarks for
commercial buildings, Energy Build. 75 (2014) 382–393.
buildings: Bank branches in Brazil, Energy Build. 82 (2014) 82–91.
[50] H.E. Mechri, A. Capozzoli, V. Corrado, Use of the ANOVA approach for
[19] W. Chung, Y.V. Hui, Y. Miu Lam, Benchmarking the energy efficiency of
sensitive building energy design, Appl. Energy 87 (2010) 3073–3083.
commercial buildings, Appl. Energy 83 (2006) 1–14.

13
R.K. Veiga, A.C. Veloso, A.P. Melo et al. Energy & Buildings 249 (2021) 111219

[51] De Wilde, Pieter, and Wei Tian. Predicting the performance of an office under [77] Brasil. Ministério da Saúde. Agência Nacional de Vigilância Sanitária
climate change: A study of metrics, sensitivity and zonal resolution. Energy (ANVISA). Resolução – RE n.° 9, de 16 de janeiro de 2003. Determina a
and Buildings, v. 42, p. 1674-1684, 2010. publicação de Orientação Técnica elaborada por Grupo Técnico Assessor,
[52] Ruiz, Roberto, Stéphane Bertagnolio, and Vincent Lemort. Global sensitivity sobre Padrões Referenciais de Qualidade do Ar Interno, em ambientes
analysis applied to total energy use in buildings. (2012). climatizados artificialmente de uso público e coletivo. 2003. Available at: <
[53] D. Hsu, How much information disclosure of building energy performance is https://www.saude.mg.gov.br/index.php?
necessary?, Energy Policy 64 (2014) 263–272 option=com_gmg&controller=document&id=899>. Accessed in: January,
[54] S. Papadopoulos, C.E. Kontokosta, Grading buildings on energy performance 2021.
using city benchmarking data, Appl. Energy 233 (2019) 244–253. [78] Versage, R. S. 2015. Metamodelo para estimar a carga térmica de edificações
[55] L. Pérez-Lombard, J. Ortiz, R. González, I.R. Maestre, A review of condicionadas artificialmente. Thesis. Federal University of Santa Catarina.
benchmarking, rating and labelling concepts within the framework of 201 pages (in Portuguese).
building energy certification schemes, Energy Build. 41 (2009) 272–278. [79] Z.U.H.A.L. Oktay, C. Coskun, I. Dincer, A new approach for predicting cooling
[56] D. Zhao, A.B. Miotto, M. Syal, J. Chen, Framework for Benchmarking green degree-hours and energy requirements in buildings, Energy 36 (2011) 4855–
building movement: A case of Brazil, Sustain. Cities Soc. 48 (2019) 101545. 4863.
[57] Ali, Usman, Mohammad Haris Shamsi, Cathal Hoare, Eleni Mangina, and [80] L. Mazzaferro, R.MS. Machado, A.P. Melo, R. Lamberts, Do we need building
James O’Donnell. A data-driven approach for multi-scale building archetypes performance data to propose a climatic zoning for building energy efficiency
development. Energy and Buildings, v. 202 p. 109364, 2019. regulations?, Energy Build. 225 (2020) 110303.
[58] Monteiro, Claudia Sousa, André Pina, Carlos Cerezo, Christoph Reinhart, and [81] A. Saltelli, Making best use of model evaluations to compute sensitivity
Paulo Ferrão. The use of multi-detail building archetypes in urban energy indices, Comput. Phys. Commun. 145 (2002) 280–297.
modelling. Energy Procedia, v. 111, p. 817-825, 2017. [82] S.S. Garud, I.A. Karimi, M. Kraft, Design of computer experiments: A review,
[59] P. Correia, R. Souza, Procel – Pesquisa de posse de equipamentos e hábitos de Comput. Chem. Eng. 106 (2017) 71–95.
uso, Ano base 2005 – Classe Comercial alta tensão – Relatório Brasil (Survey [83] R. Taylor, Interpretation of the correlation coefficient: a basic review, Journal
on Ownership on Ownership of Equipment and Use Habit, Base Year 2005 – of diagnostic medical sonography 6 (1990) 35–39.
Commercial Class, High Voltage), Eletrobras, Rio de Janeiro, 2008 (in [84] I.M. Sobol’, On sensitivity estimation for nonlinear mathematical models,
Portuguese). Matematicheskoe modelirovanie 2 (1990) 112–118.
[60] CBCS. Conselho Brasileiro de Construção Sustentável, 2021. Available at: < [85] J. Herman, W. Usher, SALib: an open-source Python library for sensitivity
http://www.cbcs.org.br/benchmarkingenergia>. Accessed in: January, 2021. analysis, J. Open-Source Software 2 (2017) 97.
[61] META. Projeto de Assistência Técnica dos Setores de Energia e Mineral, 2013. [86] T. Homma, A. Saltelli, Importance measures in global sensitivity analysis of
Available at: < https://www.epe.gov.br/pt/publicacoes-dados-abertos/ nonlinear models, Reliab. Eng. Syst. Saf. 52 (1996) 1–17.
publicacoes/projeto-de-assistencia-tecnica-dos-setores-de-energia-e- [87] N. Silver, The signal and the noise: the art and science of prediction, Penguin
mineral-projeto-meta>. Accessed in: January, 2021. UK, 2012.
[62] CBCS. Conselho Brasileiro de Construção Sustentável, 2021. Available at: < [88] Lever, Jake, Martin Krzywinski, and Naomi Altman. Points of significance:
http://www.cbcs.org.br/website/comite-tematico/atividades-em- model selection and overfitting. (2016): 703.
andamento.asp?cctCode=AD7C7A37-2F51-4451-9202-FC0FF6407BCA >. [89] K. Amasyali, N.M. El-Gohary, A review of data-driven building energy
Accessed in: January, 2021. consumption prediction studies, Renew. Sustain. Energy Rev. 81 (2018)
[63] C.F. Reinhart, C.C. Davila, Urban building energy modeling–A review of a 1192–1205.
nascent field, Build. Environ. 97 (2016) 196–202. [90] R.M. Sakia, The Box-Cox transformation technique: a review, J. Royal Stat. Soc.
[64] Commercial buildings energy consumption survey (CBECS). International Series D (The Statistician) 41 (2) (1992) 169–178.
Energy Agency (2015). [91] J. Osborne, Improving your data transformations: Applying the Box-Cox
[65] Y. Ye, K. Hinkelman, J. Zhang, W. Zuo, G. Wang, A methodology to create transformation, Pract. Assess. Res. Eval. 15 (2010) 12.
prototypical building energy models for existing buildings: A case study on [92] M. Kuhn, K. Johnson, Applied Predictive Modeling, Vol. 26, Springer, New
US religious worship buildings, Energy Build. 194 (2019) 351–365. York, 2013.
[66] I. Korolija, L. Marjanovic-Halburd, Y. Zhang, V.I. Hanby, UK office buildings [93] D.N. Joanes, C.A. Gill, Comparing measures of sample skewness and kurtosis, J.
archetypal model as methodological approach in development of regression Royal Stat. Soc.: Series D (The Statistician) 47 (1) (1998) 183–189.
models for predicting building energy consumption from heating and cooling [94] K.P. Balanda, H.L. MacGillivray, Kurtosis: a critical review, Am. Stat. 42 (2)
demands, Energy Build. 60 (2013) 152–162. (1988) 111–119.
[67] ABNT, Desempenho térmico de edificações, Parte 3: Zoneamento bioclimático [95] D’agostino, Ralph B., Albert Belanger, and Ralph B. D’Agostino Jr. A suggestion
brasileiro e diretrizes construtivas para habitações unifamiliares de interesse for using powerful and informative tests of normality. The American
social, NBR 15.220-3, Associação Brasileira de Normas Técnicas, 2003. Statistician 44, no. 4 (1990): 316-321.
[68] CB3E, Proposta de método para a avaliação da eficiência energética com em [96] M. Kuhn, Building predictive models in R using the caret package, J. Stat.
energia primária de edificações comerciais, de serviços e públicas, INI-C, Softw. 28 (5) (2008) 1–26.
Centro Brasileiro de Eficiência Energética em Edificações, Florianópolis, 2017. [97] G. James, D. Witten, T. Hastie, R. Tibshirani (Eds.), An introduction to
[69] ASHRAE, ANSI/ASHRAE Standard 55 – 2013: Thermal Environmental statistical learning, Vol. 112, springer, New York, 2013.
Conditions for Human Occupancy, American Society of Heating, [98] S. Arlot, A. Celisse, A survey of cross-validation procedures for model
Refrigerating and Air-conditioning Engineers, Atlanta, GA, 2013. selection, Stat. Surveys 4 (2010) 40–79.
[70] ABNT, Instalações de ar-condicionado – Sistemas centrais e unitários, Parte 3: [99] Sarle, Warren S. Neural networks and statistical models. (1994).
Qualidade do ar interior, NBR 16.401-3, Associação Brasileira de Normas [100] T. Hastie, S. Rosset, R. Tibshirani, J. Zhu, The entire regularization path for the
Técnicas, 2008. support vector machine, J. Machine Learn. Res. 5 (2004) 1391–1415.
[71] Banco Central do Brasil. Altera e consolida as normas que dispõem sobre o [101] Shcherbakov, Maxim Vladimirovich, Adriaan Brebels, Nataliya Lvovna
horário de funcionamento das instituições financeiras e demais instituições Shcherbakova, Anton Pavlovich Tyukov, Timur Alexandrovich Janovsky, and
autorizadas a funcionar pelo Banco Central do Brasil, bem como acerca dos Valeriy Anatol’evich Kamaev. A survey of forecast error measures. World
dias úteis para fins de operações praticadas no mercado financeiro. Resolução Applied Sciences Journal, v. 24, p. 171-176, 2013.
n° 2932 de 28 de fevereiro de 2002. (accessed 01.26.21) (in Portuguese). [102] Botchkarev, Alexei. Performance metrics (error measures) in machine
[72] Climate One Building, c2020. Available at: < http://climate.onebuilding.org/>. learning regression, forecasting and prognostics: Properties and typology.
Accessed in: January, 2021. arXiv preprint arXiv:1809.03006 (2018).
[73] EnergyPlus, c2020. Available at: https://energyplus.net/. Accessed in: [103] M. Manfren, N. Aste, R. Moshksar, Calibration and uncertainty analysis for
January, 2021. computer models–a meta-model-based approach for integrated building
[74] DOE, Input Output Reference, EnergyPlus Version 9.0.1 Documentation, U.S. energy simulation, Appl. Energy 103 (2013) 627–641.
Department of Energy, 2018. [104] A. Cacabelos, P. Eguía, L. Febrero, E. Granada, Development of a new multi-
[75] ABNT, Vidros na construção civil – Projeto, execução e aplicações, NBR 7199, stage building energy model calibration methodology and validation in a
Associação Brasileira de Normas Técnicas, 2016. public library, Energy Build. 146 (2017) 182–199.
[76] J. Yuventi, R. Mehdizadeh, A critical analysis of Power Usage Effectiveness and [105] Alvares, Clayton Alcarde, José Luiz Stape, Paulo Cesar Sentelhas, JL de M.
its use in communicating data center energy consumption, Energy Build. 64 Gonçalves, and Gerd Sparovek. Köppen’s climate classification map for Brazil.
(2013) 90–94. Meteorologische Zeitschrift, v. 22, p. 711-728, 2013.
