Predicting Energy Consumption Using Stacked LSTM Snapshot Ensemble
Abstract: The ability to make accurate energy predictions while considering all related energy factors allows production plants, regulatory bodies, and governments to meet energy demand and assess the effects of energy-saving initiatives. When energy consumption falls within normal parameters, it will be possible to use the developed model to predict energy consumption and develop improvements and mitigating measures for energy consumption. The objective of this model is to accurately predict energy consumption without data limitations and provide results that are easily interpretable. The proposed model is an implementation of the stacked Long Short-Term Memory (LSTM) snapshot ensemble combined with the Fast Fourier Transform (FFT) and a meta-learner. Hebrail and Berard's Individual Household Electric-Power Consumption (IHEPC) dataset, combined with weather data, is used to analyse the model's accuracy in predicting energy consumption. The model is trained, and the results measured using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and coefficient of determination (R2) metrics are 0.020, 0.013, 0.017, and 0.999, respectively. The stacked LSTM snapshot ensemble performs better than the compared models in terms of prediction accuracy and minimized errors. The results of this study show that both the prediction accuracy and the stability of the model are high. High accuracy demonstrates strong predictive ability, and together with high stability it points to good interpretability, a property not typically accounted for in prediction models; this study shows that it can be inferred.
Key words: Artificial Intelligence (AI); Deep Learning (DL); energy consumption; snapshot ensemble; prediction
© The author(s) 2024. The articles published in this open access journal are distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
academic researchers in the energy sector, energy efficiency is a concerning topic that is critical for achieving the low-carbon economy target (green economy)[1].

Many governments appreciate the benefits of efficiently using energy. Efficient energy use affects the capacity of a building to acquire a green building certificate, which is based on the green building rating systems intended to minimize greenhouse effects and carbon emissions. In this regard, predicting energy use is critical for planning, conservation, and management. Also, with increased demand, the call for better energy consumption planning comprises improved consumption measurement and distribution planning. The capacity to optimize and predict energy consumption can aid with energy distribution to support the increased energy demand[1].

Various studies have been conducted to further develop better energy utilization and execution in structures[2–4]. To create applications for smart buildings, buildings must use smart devices, and artificial intelligence and engineering-based methods are typically used to predict energy consumption. Engineering methods and models use principles like thermodynamic equations to forecast energy consumption. For energy performance evaluation, these models and methods frequently necessitate expertise in customizing them to meet specific requirements and programming the thermal parameters. Engineering models and techniques require in-depth information about the building envelope, the thermal properties of the windows and construction layers, and the ventilation, heating, and air-conditioning systems used to predict energy consumption accurately.

Artificial Intelligence (AI) and Machine-Learning (ML) techniques[5] refer to models that predict future energy consumption from historical data. The ability of models and algorithms to learn the relationship between future and historical data is one of the benefits of using AI and ML techniques. The prediction models that are developed are fed historical data, in contrast to engineering techniques that use comprehensive building data. Additionally, users are not required to have a comprehensive understanding of the thermodynamic behaviour of a building.

AI models have been developed to predict energy performance, as suggested by some studies. For instance, Song et al.[6] created an evolutionary model to forecast smart building energy consumption. In 2017, Wang and Srinivasan[5] examined ensemble-based AI models to predict building energy use. Other studies have used models from Deep Learning (DL) to conduct research in the creation of a system for managing energy. Jahani et al.[7] incorporated a numerical moment matching technique with a genetic algorithm to create a tool for predicting residential building energy use.

It is essential that national energy efficiency policies be developed and proposed by evaluating trends in electricity consumption and energy structures[8]. The foundation for minimizing energy costs and maximizing energy performance is the capacity to forecast energy consumption beyond buildings[9]. ML and AI are already being used in the building and energy domain[10–14], where models use historical data to forecast energy consumption and generate new insights. For example, based on the temperature of the skin, machine-learning models can predict thermal demand[10]. Using an optimization model and machine learning to identify energy data patterns, Chou et al.[15] evaluated time series energy data in 2016.

Artificial Neural Networks (ANN), linear regression models, and Support Vector Regression (SVR) models are among the most widely used machine-learning models[5]. Ganguly et al.'s research[14] predicted energy consumption in a historical art gallery using an ANN model. In 2019, Seyedzadeh et al.[16] analysed the performance of ML models in predicting building cooling and heating loads using SVR, ANN, random forest, Gaussian process, and Gradient-Boosted Regression Trees (GBRT). They concluded that GBRT shows the best performance based on root mean square error values. The researchers also concluded that the ANN model evaluates complex datasets best. Additionally, the ANN model computes significantly faster than the other ML models in their study[16].

ML, a subset of artificial intelligence, allows the machine or system to learn without human intervention. Several DL techniques are used in prediction, including Recurrent Neural Networks (RNN), Deep Neural Networks (DNN), and Deep Belief Networks (DBN). These powerful tools help with acquiring robust modelling and prediction performance. RNN is characterized by taking the output of a previous step and feeding it to the current or
next step as input. Its most important feature is its hidden state that retains information about the sequence. However, it becomes difficult to train when the network contains large datasets and many layers. Long Short-Term Memory (LSTM) is a DL technique based on RNN that provides the added advantage of successfully training to overcome the problems found with RNN[16].

Although ML models can produce significant and demonstrated prediction accuracy improvements in many cases, research has mainly focused on improving accuracy without dealing with the interpretability of the results. Currently, expert systems, primarily developed using linguistic fuzzy logic systems, give users systems modelling capabilities and good interpretability[12]. However, the systems and models often depend on individual expertise and regularly do not produce accurate predictions. Thus, to meet the requirements of interpretability and high accuracy, it is proposed to combine popular techniques, expert models, and other methods. Despite the application of DL models in energy and the meeting of the accuracy requirement, there is still a need to improve performance in energy consumption and production applications.

ML is a rich field with many models that can be and have been implemented in energy consumption prediction, such as ANN, SVR, linear regression models[5], and RNN models[16], to name just a few. Every prediction model proposed and built solves a problem, but each of these machine-learning models has its own problems. The biggest issue is with the training data. ANN suffers from the lack of energy training data, SVR is not suitable for large datasets, and linear regression is prone to overfitting and noise. Overfitting is where a model works well with training data but very poorly with new data. Another problem is when a model works well with the training data, but if there is less training data than new data, the model underperforms; this is a problem with SVR. In linear regression, linearity is quite important, which leaves nonlinear data at a disadvantage. All these data issues greatly impact the accuracy of the prediction model. RNN[16] models also have an issue with training large datasets and layers. Data issues aside, some models may manage to achieve accurate results, but those results cannot be interpreted. Modelling capabilities and good interpretability[12] are essential for any prediction model, and most of these available models focus primarily on result accuracy and fail to consider result interpretability. This created a need to conceptualize a model that would solve the issue of accuracy in an energy prediction model without dataset limitations and produce results that were easily interpretable.

To solve this issue of accuracy and good interpretability while not being data constrained in an energy prediction model, the present study proposes applying a stacked LSTM snapshot ensemble. LSTM is a DL technique based on RNN, which might make this seem like just another LSTM model. However, the proposed model implements LSTM snapshots, which combine the characteristics of RNN algorithms, LSTM algorithms, and snapshots. The advantages of the stacked LSTM snapshot include treating inputs as connected time series; solving the vanishing gradient problem; and being able to store snapshots of different data slices, depending on the length of their sequence. The model can work with, and train on, data with different sequence lengths, which is a problem for most prediction models. Implementation of the Fast Fourier Transform (FFT) makes this possible and allows the proposed model to work with seasonal pattern series. Meta-learning is then applied to the collected base model snapshots, and a final estimate of the energy prediction is determined. This is where accuracy is achieved, because the final estimate is a mean of all the collected snapshots for a given data input. In addition, we improve the viability of the data by adding weather data. Climate change and unpredictable weather events imply that the weather component is evolving and thus has new implications for energy consumption and production. The proposed model ensures that every predicted consumption instance is checked for errors and accuracy. The model is found to have the best accuracy relative to the compared models and can be used for accurate energy consumption prediction. With additional performance evaluation, it can be used by energy companies for consumption analysis to improve their service delivery.

This paper discusses works related to various DL models, including LSTM and ensemble learning models, in Section 2. The proposed model is described in-depth in Section 3, and in Section 4, the data and related visualizations are presented. Section 5 delves
into the developed model, data preparation, settings, feature selection, and result evaluation. Section 6 discusses the model training and related losses. Section 7 compares the stacked snapshot LSTM ensemble to other DL models, and Section 8 concludes this discussion of the study.

2 Related Work

One of the main components of AI is DL, which is defined as a set of layered knowledge-acquiring computer algorithms used by computers and machines to learn without the need for explicit programming[17]. Additionally, DL provides AI with layered algorithms that machines can use to automatically learn and improve actions based on previous experiences. While AI is characterized by acquiring and applying knowledge, DL is the acquisition of skill and knowledge[17]. When using AI, the goal is to increase the probability of success rather than focusing primarily on accuracy. However, when using DL, the goal is mainly to increase the accuracy of an action, regardless of success. AI can be compared to smart computer software, while DL can be likened to the processes and techniques a machine uses with data in learning. As indicated earlier, DL algorithms are specifically designed to help machines learn. Typically, the DL process involves finding relevant data to identify patterns. After identifying a pattern, the machine can predict outcomes for new data using historical data. There are three ways machines learn: supervised learning, reinforcement learning, and unsupervised learning[17]. Predicting energy demand frequently makes use of neural networks and DL, particularly LSTM and ensemble learning models.

Through the ANN models, researchers have concluded that building efficiency and rising energy demands are crucial to sustainability[18]. Their review aims to determine the general patterns in using ANNs to estimate the energy consumption of a structure. Focusing on the feed-forward neural network, they discovered a few gaps, principally in application, because ANN is better suited to time series information, yet this is seen in only 14% of the cases they cover. They discovered that 6% of the applications are for general regression and radial basis neural networks. It is determined from their findings that energy management, optimization, and conservation forecasting strategies are not as suitable for day-to-day operations as the ANN predictive models[18].

Using 30-minute Short-Term Load Forecasting (STLF) resolutions, the researchers compare the performance of several ANN models with numerous hidden layers and activation functions[19]. The models use 1–10 hidden layers and different activation functions, which comprise the parametric rectified linear unit, rectified linear unit, exponential linear unit, leaky rectified linear unit, and scaled exponential linear unit[19]. Using electrical consumption data from five specific buildings collected over 2 years and two performance metrics—the Coefficient of Variation of the Root Mean Square Error (CVRMSE) and Mean Absolute Percentage Error (MAPE)—they discovered that the model with five hidden layers has an average superior performance relative to other tested models designed for STLF[19]. Although the researchers produced a standard model for predicting energy consumption, it is possible to create a more precise prediction model by including the input variables, which can show a building's energy consumption characteristics. Additionally, the target's forecasting performance can be anticipated to rise due to hyperparameter tuning in the Scaled Exponential Linear Unit (SELU) prediction models[20].

The SVR, LSTM, and predictive model combining SVR and LSTM contain 240 samples with 24-hour load profiles[21]. The goal is to perform short-term microgrid load forecasting. Each hour's load quantity is chosen as the output variable, and the input variables are used as an input sample. The majority of the data (70%) in each network is used for training the model, while the remainder (30%) is used in testing. The short-term load prediction in the microgrid is tested without considering climate data. Instead, it focuses on the application conditions that electricity generators and consumers of the microgrid would encounter at any given time. The researchers' take on the outcomes of the various DL methods is presented in Table 1. Short-term load prediction in the microgrid is more accurate and efficient using the model[21].

Long Short-Term Memory (LSTM)[22] is used to improve the planning capacity of utility companies by improving their ability to predict energy load consumption, which can help in deciding whether a new energy plant or transmission lines are needed, or in choosing between different fuel sources during production. The researchers show that the model is determined to be highly accurate; the MAPE obtained
is 6.54 within a confidence interval of 2.25%. Model training takes 30 minutes. For a 5-year forecast, annual offline training is required, making the computational time a benefit. The LSTM-RNN model is suitable for predicting future locational marginal electricity prices[22].

LSTM techniques are used in an attempt to provide credible advice for energy resource allocation, energy saving, and improving power systems[23]. Over five months, experimental data were collected at a minute resolution between March 2018 and July 2018. The experiment demonstrates that time as a variable accurately reflects the periodicity. It is found that LSTM shows better performance than forecast methods such as the Back Propagation Neural Network (BPNN), AutoRegressive Moving Average model (ARMA), and AutoRegressive Fractional Integrated Moving Average model (ARFIMA). For long-term time series predictions, the LSTM's Root Mean Square Error (RMSE) is 19.7% lower than the BPNN value,
54.85% lower than the ARMA value, and 69.59% lower than the ARFIMA value[23], showing excellent energy forecasting potential[23].

Ensemble learning[24] is used to determine multistep forecasting for time series data. Three techniques, namely gradient-boosted trees, the decision trees algorithm, and weighted least squares, compute the weight of the ensembles. Through this, it is possible to produce the dynamic or static ensemble model using a two-weight updating strategy technique. The prediction problem is then decomposed into prediction subproblems, where each subproblem value is used in the forecasting horizon to obtain the ensemble member predictions. The researchers determined that their approach is scalable because DL algorithms based on Apache Spark, a big data engine, can solve the subproblems. The data fed to the ensemble models is 10-year electrical data measured at 10-minute intervals. The researchers showed that the static and dynamic ensembles perform better than the individual members. The dynamic model Mean Relative Error (MRE) value is 2%, the highest accuracy level obtained. It is also a promising result for forecasting large time series[24].

The deep ensemble learning study on smart electric grids is based on probabilistic load prediction. It is postulated that accurate load predictions are important in decisions involving benefits and costs for electrical grids[25]. The Least Absolute Shrinkage and Selection Operator (LASSO) model evaluates energy consumption data from 400 small and medium businesses and 800 consumers. The individual residential data consumption features show higher volatility and diversity than the small and medium businesses data, notwithstanding the seasonality and regularity of the aggregated load profiles. When conducting the probabilistic load forecasting on the 800 consumers, the data are classified using one-hour and one-day intervals. The DNNs used in the ensemble models are randomly chosen, with a total of 7 DNNs with 512 hidden layer nodes, and the randomized numbers between 1 and 4 in the hidden layers. The ensemble forecasts are refined using the LASSO-based quantile combination model.

3 Proposed Model

3.1 Dataset

The electrical energy consumption prediction models are validated with the help of Hebrail and Berard's IHEPC dataset from the UCI Machine Learning Repository. Between December 2006 and November 2010, 2 075 259 measurements from households were included in the dataset. About 1.25% of the rows have missing measurement values[30], but aside from that, the dataset contains the calendar timestamps. The 12 attributes are date, time, global active power, global reactive power, voltage, global intensity, and sub-metering 1 through 3, which are illustrated in Fig. 1[30]. To select the data to use for the model training, different sequence lengths are identified. This process cannot be handled randomly because random sampling would eliminate the possibility of catching the seasonality in the data. The study therefore uses FFT to extract the right sequence lengths from any given time series. Applying FFT ensures that the sample sequence lengths capture the different seasonalities, patterns, and other time-dependent effects in the entire time series.

3.2 Applying the proposed stacked LSTM snapshot ensemble

Improving energy management services necessitates accurate energy consumption predictions in residential and commercial buildings. However, it is challenging to make accurate predictions about energy consumption due to the unpredictability of noisy data[31]. Complex variables cannot be correlated or evolved using conventional prediction methods. The two-layer ensemble is fed with energy consumption data from the IHEPC along with weather data, allowing for multiple sequence lengths in the proposed model, which addresses these issues based on the snapshots. After that, the model is trained, and a base estimate is made. The base estimate has a lot of output, and although the patterns are similar, the different models learn differently. The meta-learner makes it possible to select the appropriate sequences from weighted snapshots, effectively preventing random distribution[32].

The stacked ensemble LSTM DL algorithm, an advanced RNN that takes the place of the original cell neurons, is the tool used for regression. The RNN algorithm's unique characteristics are passed down to the DL algorithm, allowing the inputs to be considered connected time series. Also, the LSTM cells' intricate structure can solve problems with the vanishing gradient limitation[33]. Input, cell status, forget, and output gates are the four essential components of an LSTM cell.
[Table fragment: dataset attribute descriptions]
7 | Global reactive power | Household global minute-averaged reactive power (in kilowatt)
10 | Sub-metering 1 | An oven and a microwave, hot plates being not electric but gas powered (in watt-hour of active energy)
11 | Sub-metering 2 | This variable corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator, and a light (in watt-hour of active energy)
12 | Sub-metering 3 | This variable corresponds to an electric water heater and an air conditioner (in watt-hour of active energy)
13 | Temperature | The measured temperature in degrees Celsius or Fahrenheit
[Figure: structure of the LSTM cell, with input $x_t$, previous hidden state $h_{t-1}$, previous cell state $C_{t-1}$, and outputs $h_t$ and $C_t$.]

$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)$
$\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)$
$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$
$f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)$
$o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)$
$h_t = o_t \times \tanh(C_t)$
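A minimal NumPy sketch of one cell step following these equations; the feature and hidden sizes here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step implementing the gate equations above."""
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev + p["b_i"])    # input gate
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev + p["b_f"])    # forget gate
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev + p["b_o"])    # output gate
    c_hat = np.tanh(p["U_c"] @ x_t + p["W_c"] @ h_prev + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t

# Illustrative sizes: 4 input features, hidden size 8.
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
p = {f"{m}_{g}": rng.normal(scale=0.1, size=(n_h, n_in if m == "U" else n_h))
     for m in ("U", "W") for g in ("i", "f", "o", "c")}
p.update({f"b_{g}": np.zeros(n_h) for g in ("i", "f", "o", "c")})
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
```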
The gates are computed with the assistance of the sigmoid function. The equation for f_t represents the forget gate, which is responsible for removing information no longer needed by a cell. The equation that solves for o_t is the output gate and is responsible for establishing the results of the cell. One LSTM layer can have multiple timestamps. So, a timestamp receives data from a previous timestamp and new information. The new information goes directly to the input gate, while the previous timestamp information passes through the forget gate to select only the needed information for the cell passed to the input gate. The input gate computes the new information, the selected information from the previous gate, and uses the sigmoid function. The results are passed to the output gate, which decides what to output.

Because of their design, LSTMs can only handle sequences of equal length for each epoch. This follows from the optimization process's requirements for the matrix operations. In some data, however, sequences of varying lengths cannot be avoided, so padding is used. This makes it possible to train models with different sequence lengths. However, the patterns they learn are related but different. An FFT must be used to select the data's sequence lengths to accomplish this[36, 37]. The FFT makes it possible to select sequences that distinguish between distinct periods in a given time series. When using FFT to select energy consumption sequences from time series data, it will be possible to capture the series' seasonal patterns and other time-dependent effects, making the chosen sequences work better[36, 37].

The proposed model will feed the LSTM through various data slices, and diversity will rise. As a result, there will be n snapshots stored for any given LSTM, provided that a set S = {s1, s2, ..., sn} contains sequences of various lengths. The process is repeated with a different data slice following the first data slice's training of the LSTM, and the cycle continues. Snapshots of the various sequences are saved for each data slice. The mean is derived as the base forecast, and the collected snapshot estimates are combined from this point. In this case, meta-learning is used to acquire the mean function. The weight matrices from the first snapshot are used as the second sequence length for the current data slice because there are two LSTMs. Meta-learning is used to combine all of the base model snapshots, resulting in the identification of the final estimate forecast. If 20 sequence lengths are used, for instance, 20 snapshots are stored for the first LSTM and used as the second LSTM's sequence length. The base estimate is compiled from the snapshots obtained from the second LSTM and given to the meta-learner for final estimation[10, 36, 37]. Figure 3 shows the stacked LSTM snapshot ensemble used in the present study.

Figure 3 is a diagram of the proposed model, which received data from the UCI Machine Learning Repository. The data are taken as data slices and passed through the FFT, which selects the best sample for the model, that is, sequences of different lengths, making it possible to include data with different time-dependent effects, including seasonalities and patterns. The varying-length sequences are then recorded as the input data ready to pass through the LSTM layers. The output of the first layer is used as the input of the second layer. The output of the second layer is then passed through the LSTM snapshots. The number of different-length sequences that are input determines the number of snapshots taken. So, if the input data has 10 varying sequence lengths, 10 snapshots are taken.
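The FFT-based extraction of sequence lengths described above can be sketched as follows; taking the k strongest spectral peaks as candidate lengths is an assumption, since the exact selection rule is not spelled out here.

```python
import numpy as np

def fft_sequence_lengths(series, k=5):
    """Candidate sequence lengths (in samples) from the dominant
    periodicities of a 1-D series, found via the FFT."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                            # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0)      # cycles per sample
    order = np.argsort(spectrum[1:])[::-1] + 1  # skip the zero frequency
    periods = np.round(1.0 / freqs[order[:k]]).astype(int)
    return sorted({int(per) for per in periods if 1 < per < x.size})

# Demo: hourly-like data with daily (24) and weekly (168) cycles plus noise.
t = np.arange(24 * 7 * 20)
demo = (np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)
        + 0.1 * np.random.default_rng(1).normal(size=t.size))
print(fft_sequence_lengths(demo))   # candidate lengths near 24 and 168
```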
[Fig. 3: the proposed model. Data slices pass through the FFT to form the input data (varying sequence lengths), which feeds a two-layer LSTM; LSTM snapshots 1–3 produce the base estimate, which is passed to the meta-learner.]
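As a rough sketch of the flow in Fig. 3, assuming Keras, one base model per data slice, and a plain mean standing in for the learned meta-learner:

```python
import numpy as np
from tensorflow.keras import layers, models

def make_base_model(seq_len, n_features):
    """Two-layer LSTM base model, mirroring the Fig. 3 pipeline."""
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(64, return_sequences=True),   # first LSTM layer
        layers.LSTM(64),                          # second LSTM layer
        layers.Dense(1),
    ])

def collect_snapshots(data_slices, epochs=5):
    """Train on each data slice (one sequence length per slice) and keep
    a snapshot prediction from every trained model."""
    snaps = []
    for X, y in data_slices:                      # X: (samples, seq_len, features)
        model = make_base_model(X.shape[1], X.shape[2])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X, y, epochs=epochs, batch_size=50, verbose=0)
        snaps.append(float(model.predict(X[-1:], verbose=0).ravel()[0]))
    return np.array(snaps)

def base_estimate(snapshots):
    # Stand-in meta-learner: the mean of the collected snapshot estimates.
    return snapshots.mean()
```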
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over 30 min for mean, against the 30-min index (0–70 000).]
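Series like those in the panels can be produced with pandas resampling; the file path and column names below are assumptions based on the IHEPC attribute names.

```python
import pandas as pd

# Load the raw minute-level data (hypothetical path; '?' marks missing values).
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)
df["dt"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"]).set_index("dt").astype(float)

# Mean-resample over 30-minute windows, as in the panels above; use
# "h", "D", "W", or "MS" for the hourly, daily, weekly, and monthly views.
half_hourly = df.resample("30min").mean()
print(half_hourly["Global_active_power"].head())
```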
state that there are demands in energy consumption in the morning hours when people wake up and start using electrical appliances, such as showers, toasters, coffee makers, and kettles. However, the surge in energy consumption increases faster over shorter durations during winter. Energy consumption rises and starts stabilizing at a certain time of the day as people leave their homes. It is during these periods that peaks are noted in the sub-metering data. During winter, increased demand is denoted starting at 15:00. This trend can be attributed to children returning from school and adults returning from work. As people return home, electrical appliances, such as televisions,
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over hour for mean, against the hourly index (0–35 000).]
dishwashers, microwaves, and air conditioners, are turned on as people warm and light their houses and prepare dinner. Consumption falls as people start going to bed. During summer, the surge is not as evident as in winter because when people return home, it is still light outside, and their houses are warmer. There is an increased use of refrigerators and colder beverages when the weather is warmer, but energy consumption in the evenings is lower in the summer. There are notable peaks in Sub-meterings 2 and 3 as appliances such as air conditioners, refrigerators, and washing machines are used more often, increasing consumption.
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over day for mean, against the index over days (0–1400).]
4.3 Rolling average

The rolling average is used to determine a trend's direction. It adds up the data points of the energy consumption over the defined period and divides the total by the number of data points to determine the average. From the rolling curves obtained, it is possible to observe trends in the data, where the global active power and global intensity show highs in the months starting in March 2008 that peak in July 2008, after which there is a decreasing trend. This trend can be interpreted as increased energy consumption in the months leading up to July 2008 and reduced consumption after July 2008, as observed in the data.
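The rolling average just described is a windowed mean; a minimal pandas sketch follows, where the 365-day window is an assumption standing in for the 12-month periods used in Figs. 12–14.

```python
import numpy as np
import pandas as pd

def rolling_trend(series: pd.Series, window: int) -> pd.Series:
    """Add up the points in each window and divide by their count,
    i.e., a rolling mean that exposes the trend direction."""
    return series.rolling(window=window, min_periods=1).mean()

# Demo: a daily series standing in for global active power.
idx = pd.date_range("2007-01-01", periods=1400, freq="D")
rng = np.random.default_rng(2)
daily = pd.Series(1.0 + 0.3 * np.sin(2 * np.pi * idx.dayofyear / 365.0)
                  + 0.1 * rng.normal(size=idx.size), index=idx)
trend = rolling_trend(daily, window=365)   # roughly a 12-month window
print(trend.tail())
```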
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over week for mean, against the index over weeks (0–200).]
The global reactive rolling mean shown in Fig. 12 indicates that the trend is opposite to what is observed and shown in Figs. 13 and 14.

4.4 Autocorrelation

It should be noted that after two lags, seen in Figs. 15−17, the lines get inside the confidence interval (light blue area). The lag is caused by the 12–13 months used in defining a season in the data. Once the algorithm detects the extended season, there is an autocorrection in the lag, where the lines fall within the confidence level. The benefit of having the extended 12–13 months and resultant lags is to show the ability of the algorithm to adapt to unconventional data and still produce accurate results.
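The lag behaviour described here corresponds to an autocorrelation plot; a sketch using statsmodels, with a synthetic monthly series whose season spans roughly 12–13 months:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Synthetic monthly series with a roughly 12.5-month seasonal cycle.
idx = pd.date_range("2007-01-01", periods=48, freq="MS")
rng = np.random.default_rng(3)
monthly = pd.Series(1.0 + 0.4 * np.sin(2 * np.pi * np.arange(idx.size) / 12.5)
                    + 0.05 * rng.normal(size=idx.size), index=idx)

# Bars outside the shaded band are significant; as described above, the
# lines fall inside the confidence interval after the first lags.
plot_acf(monthly, lags=17)
plt.show()
```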
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over month for mean, against the index over months (0–40).]
[Figure: time series of global active power (kW), global intensity (A), and global reactive power (kW) from Jan. 2007 to Jul. 2010.]
core CPU with four efficiency and four performance cores, and a 16-core neural engine. From the dataset, 70% of the data are used in training, and 30% are used for testing and validation. The technique used for model validation is a train/test split. Validation occurs between the training and test stages. Based on the new dataset obtained, the LSTM settings used are normalization set to between 0 and 1, a batch size of 50, an epoch number of 100, and 4 LSTM layers.

5.2 Data preparation

The dataset contained 2 075 259 rows and 7 columns. The date and time fields are parsed to the date/time column and converted to the index column during importation. The outliers in the data, noisy data, are cleaned by filling the null values and noise with the mean values in their respective fields. The data are successfully integrated after cleaning. From the dataset containing 2 075 259 rows, 25 979 rows contain null values, which are filled with the mean value. This is about 1.25% of the rows.
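Under these settings (Min-Max normalization to the range 0–1, batch size 50, 100 epochs, 4 LSTM layers, and a 70/30 split), the preparation and training loop could look like the sketch below. The file name, column handling, window length, and layer widths are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, models

# Data preparation (Section 5.2): parse date/time into the index,
# then fill null values with the column means.
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)          # hypothetical path
df["dt"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"]).set_index("dt").astype(float)
df = df.fillna(df.mean())            # the 25 979 null rows get mean values

# Normalization to [0, 1] and a 70/30 train/test split.
scaled = MinMaxScaler().fit_transform(df.values)
split = int(0.7 * len(scaled))
train, test = scaled[:split], scaled[split:]

def to_windows(a, seq_len=60):
    """Sliding windows over the series; the target is the next value of
    global active power (column 0)."""
    X = np.stack([a[i:i + seq_len] for i in range(len(a) - seq_len)])
    return X, a[seq_len:, 0]

X_train, y_train = to_windows(train)

# Four stacked LSTM layers, batch size 50, 100 epochs.
model = models.Sequential([
    layers.Input(shape=X_train.shape[1:]),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, batch_size=50, epochs=100, verbose=0)
```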
[Figure] Fig. 12 Rolling mean for global active power over 12 month period.
[Figure: rolling mean for global intensity over 12 month period.]
[Figure] Fig. 14 Rolling mean for global reactive power over 12 month period.
[Figure: two autocorrelation plots, correlation (−1.00 to 1.00) against lag (0–17.5).]
[Figure: correlation heatmap, negative (blue) to positive (red) correlation between two variables. Against the columns (global active power, global reactive power, voltage, global intensity, Sub-metering 1, Sub-metering 2, Sub-metering 3), the global active power row reads 1.000, 0.250, −0.400, 1.000, 0.480, 0.430, 0.640, and the global reactive power row reads 0.250, 1.000, −0.110, 0.270, 0.120, 0.140, 0.090.]
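The matrix behind such a heatmap is a direct pandas computation; a small self-contained sketch with synthetic stand-in columns:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for three of the dataset variables.
rng = np.random.default_rng(4)
v = rng.normal(size=1000)
demo = pd.DataFrame({
    "Global_active_power": v,
    "Global_intensity": v + 0.01 * rng.normal(size=1000),  # near 1.0 correlation
    "Voltage": -0.4 * v + rng.normal(size=1000),           # negative correlation
})
corr = demo.corr()        # pairwise Pearson correlations
print(corr.round(3))      # compare with the heatmap rows above
```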
[Figure: train and test loss curves (loss 0–0.008) against the time step for the first 750 hours.]

A good fit curve can be recognized by the test and training loss that reduce to the point of stability and with a minimal gap between the final loss values. The model's loss is almost always lower on the training data than on the test data.
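The four evaluation metrics used throughout (RMSE, MAE, MAPE, and R2) can be computed as below; MAPE is expressed as a fraction here, matching the 0.017 reported later, and the demo arrays are placeholders.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    mape = np.mean(np.abs((y_true - y_pred) / y_true))  # fraction, not percent
    r2 = r2_score(y_true, y_pred)
    return rmse, mae, mape, r2

# Placeholder arrays standing in for test targets and model predictions.
y_true = np.array([1.00, 1.20, 0.90, 1.10])
y_pred = np.array([1.01, 1.19, 0.92, 1.08])
print(evaluate(y_true, y_pred))
```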
Network (CNN-ESN)[43], two-stream deep network STLF (namely STLF-Net)[44], residual GRU[45], ESN-CNN[46], Region-based CNN (R-CNN) with Meta-Learning LSTM (ML-LSTM)[20], standard LSTM with LSTM-based Sequence-to-Sequence (S2S) architecture[47], Multiplicative LSTM (M-LSTM)[48], Deep-Broad Network (DB-Net)[49], explainable autoencoder[50], residual GRU-based hybrid model[45], hybrid DL network[51], multi-headed attention model[52], and Conventional LSTM-based hybrid architecture Network (CL-Net)[53]. Compared to all other models, the stacked LSTM ensemble reveals lower error rates of 0.020, 0.013, 0.017, and 0.999 for RMSE, MAE, MAPE, and R2, respectively. In an examination based on RMSE, MAE, and MAPE, Kim and Cho[38] determined that the linear regression model's performance at an hourly resolution is 0.6570, 0.5022, and 83.74, respectively. Rajabi and Estebsari[39] determined that the ANN model's performance has RMSE and MAE values of 1.15 and 1.08, respectively. The structure uses recurrence plots to encode time series data into images, and the model performs better than CNN, SVM, and ANN. Khan et al.'s[40] work found that the LSTM autoencoder hybrid CNN model has RMSE and MAE values of 0.67 and 0.47, respectively. The model performs best with daily predictions as opposed to hourly predictions of household electricity consumption. Using the CNN-LSTM DL algorithm, Kim and Cho[38] developed a model to predict residential energy consumption. Their analysis reveals that the RMSE, MAE, and MAPE have values of 0.595, 0.3317, and 32.82, respectively. Ullah et al.[41] created a CNN multilayer bidirectional LSTM network based model for predicting residential energy consumption. They discovered that the RMSE, MAE, and MAPE values are 0.565, 0.346, and 29.10, respectively.

Using the CNN-LSTM autoencoder, Kim and Cho[38] created a model that could anticipate residential energy consumption. According to their research, the RMSE and MAE metrics have model errors of 0.47 and 0.31, respectively. Sajjad et al.'s[42] study utilizing the hybrid model of CNN and GRU to forecast energy consumption shows that the error values are 0.47 (RMSE) and 0.33 (MAE). Khan et al.[43] wanted to use DL algorithms to improve energy harvesting and selected the CNN-ESN model. Their analysis reveals error values of 0.0472 (RMSE) and 0.0266 (MAE)[43]. In 2022, Abdel-Basset et al.[44] used the STLF-Net model for DL analysis and short-term load prediction in residential buildings. The MAPE, RMSE, and MAE are found to be 36.24, 0.4386, and 0.2674, respectively. Khan et al.[45] forecasted energy demand and supply using a residual GRU model. They discovered that the RMSE and MAE are 0.4186 and 0.2635, respectively. The ESN-CNN DL model is utilized by Khan et al. in 2022[46] to enhance energy prediction. The study's error values are 0.2153 (RMSE) and 0.1137 (MAE). Alsharekh et al.[20] improved short-term load prediction using the hybrid model and R-CNN. The RMSE, MAE, MAPE, and R2 values they discovered are 0.0325, 0.0144, 1.024, and 0.9841, respectively.

Based on its consumptive nature, the S2S model is studied and evaluated using the standard LSTM and an LSTM model based on sequence/no sequence. Findings show that LSTM performs better at hourly resolution but not at a per minute resolution[47]. The RMSE is 0.625. Kim and Cho[38] developed the CNN-LSTM model, which uses a hybrid connection between the LSTM and CNN networks. The CNN network in the model extracts intricate features from variables that impact consumption. The LSTM algorithm is utilized for modelling temporal information. The RMSE, MAE, and MAPE values are 0.595, 0.3317, and 32.83, respectively. The explainable autoencoder DL model is used to forecast consumption for 15, 30, 45, and 60 minutes in another model with sample data. The researchers utilize a t-SNE algorithm to explain and visualize the estimated results. The MAE value produced by their model is 0.3953[50]. Khan et al.[49] published works that utilize a hybrid connection of bidirectional LSTM and CNN networks along with the DB-Net algorithm to forecast consumption. The model's error values are 0.1272 (RMSE) and 0.0916 (MAE). Ullah et al.[48] utilized conventional ML and DL sequential models for energy consumption predictions. Based on error metrics, their investigations reveal that the M-LSTM model has superior prediction ability over the DL and ML algorithms. The M-LSTM model's error values are 0.3296 (RMSE) and 0.3086 (MAE), based on an hourly resolution. Haq et al.[51] predicted energy consumption by residential and commercial users using a novel hybrid DL model. The model acquires RMSE and MAE values of 0.324 and 0.311, respectively. The RNN model incorporating multi-headed attention is created by Bu and Cho[52] to forecast energy consumption and selectively determine spatiotemporal
characteristics. The MSE value is 0.2662, but the model provides no other error metrics. Khan et al.[45] created a hybrid model with Residual GRU (R-GRU) and dilated convolutions. The RMSE and MAE error metric scores are 0.4186 and 0.2635, respectively, when this model is used to predict energy generation and consumption. Khan et al.[53] modelled the CL-Net architecture using the ConvLSTM hybrid to assess the model's accuracy in predicting energy consumption. Their testing of the model results in an RMSE score of 0.122 and an MAE score of 0.088. In this comparison, most of the models perform better than this one. The RMSE, MAE, MAPE, and R2 of the proposed model are 0.020, 0.013, 0.017, and 0.999, respectively. The developed model has the lowest error scores of any model, indicating that it accurately predicts energy consumption. The performance comparison of different prediction models is summarized in Table 2.

Table 2 Prediction model comparison.
Model | RMSE | MAE | MAPE | R2
Linear regression[38] | 0.6570 | 0.5022 | 83.740 | –
ANN[39] | 1.1500 | 1.0800 | – | –
CNN[40] | 0.6700 | 0.4700 | – | –
CNN-LSTM[38] | 0.5950 | 0.3317 | 32.830 | –
CNN-BDLSTM[41] | 0.5650 | 0.3460 | 29.100 | –
CNN-LSTM autoencoder[38] | 0.4700 | 0.3100 | – | –
CNN-GRU[42] | 0.4700 | 0.3300 | – | –
CNN-ESN[43] | 0.0472 | 0.0266 | – | –
STLF-Net[44] | 0.4386 | 0.2674 | 36.240 | –
ESN-CNN[46] | 0.2153 | 0.1137 | – | –
R-CNN with ML-LSTM[20] | 0.0325 | 0.0144 | 1.024 | 0.9841
Standard LSTM and LSTM-based S2S architecture[47] | 0.6250 | – | – | –
M-LSTM[48] | 0.3296 | 0.3086 | – | –
DB-Net[49] | 0.1272 | 0.0916 | – | –
Explainable autoencoder[50] | – | 0.3953 | – | –
Residual GRU-based hybrid model[45] | 0.4186 | 0.2635 | – | –
Hybrid DL network[51] | 0.3240 | 0.3110 | – | –
Multi-headed attention model[52] | 0.2662 | – | – | –
CL-Net architecture[53] | 0.1220 | 0.0880 | – | –
Proposed model | 0.0200 | 0.0130 | 0.0170 | 0.9990

8 Conclusion

Using the developed model, it is possible to accurately predict energy consumption. Compared to other studies that lack dimension reduction algorithms[54] to allow for seasonality observation, the stacked snapshot LSTM ensemble shows that it is possible to investigate seasonality attributed to energy consumption. Another advantage of using the model is that it is easy to train and validate. Furthermore, it supports big data and could dynamically support the model weights used without many adjustments to the dataset. The model is designed to be simple and functional, such that it could provide a relatively inexpensive method of evaluating big energy datasets. Finally, the model includes an algorithm that trains the LSTM model sequentially. This allows the model to learn different patterns. The advantage of this feature is that the estimates provided by the final model are very robust and accurate due to the high levels of generalization. The model's accuracy and stability are measured using RMSE, MAE, MAPE, and R2 as 0.020, 0.013, 0.017, and 0.999, respectively. The RMSE of 0.020 demonstrates the model's high level of stability, while the MAPE of 0.017 is very close to 0, signalling high-level accuracy. The R2 result of 0.999 is nearly 1, which shows the model's good performance. Accuracy, stability, and performance give the model consistent results, but if the results are out of the ordinary, the model allows humans to understand the causes of those results. This refers to interpretability, which permits a human to explain the cause and effect of such anomalous results.

Despite the model's advantages, there are some limitations: the model is not tested in real time to determine its robustness. Therefore, the model should be tested in the future to determine its dynamic performance in a real-time environment. In addition, in the present study, the model's accuracy and stability results are used to infer interpretability. Future studies should determine an independent way to measure a model's interpretability.

Based on these findings, the energy sector could use such a model, as it provides high-value insights, value-addition, and service improvements based on its effective use of big data regarding energy consumption. Because energy data reliably and in real time reflect economic activity trends of populations, businesses, and the community, by virtue of these technological advantages and data resources, this can be regarded as an important data resource for energy companies to develop data platforms with high-level
accuracy and performance algorithms that can integrate data from multiple industries, facilitating the transformation and upgrading of governments and organizations.

For energy companies, the application of the stacked LSTM snapshot ensemble and other DL models to energy consumption is still in the early stages of development. The data points have numerous compound values that must be discovered and mined by internal and external businesses to obtain additional insights and trends. Energy generation companies need to pay close attention to how big data about energy consumption work and how businesses and multinational corporations use them by working closely with these businesses to learn more from them. Businesses are conducting energy research to speed up the sharing of energy data. Companies that produce energy need to pay close attention to their key dynamics, intensify learning, and use big data and learning applications. Additionally, they should investigate cooperative endeavours and share mechanisms for collective improvement.

Translating the findings from the stacked LSTM snapshot ensemble energy consumption prediction model into the analysis of the energy usage dataset for reporting and determining faults, optimization, and forecasted maintenance in households and businesses is the future direction of this study.

Acknowledgment

The authors gratefully acknowledge the support and invaluable guidance provided by the Faculty of Computing and Information Technology (FCIT), King Abdulaziz University (KAU), Jeddah, Kingdom of Saudi Arabia.

References
[1] M. Cellura, F. Guarino, S. Longo, and G. Tumminia, Climate change and the building sector: Modelling and energy implications to an office building in southern Europe, Energy Sustain. Dev., vol. 45, pp. 46–65, 2018.
[2] T. Zhang, D. Wang, H. Liu, Y. Liu, and H. Wu, Numerical investigation on building envelope optimization for low-energy buildings in low latitudes of China, Build. Simul., vol. 13, no. 2, pp. 257–269, 2020.
[3] A. D. Pham, N. T. Ngo, T. T. Ha Truong, N. T. Huynh, and N. S. Truong, Predicting energy consumption in multiple buildings using machine learning for improving energy efficiency and sustainability, J. Cleaner Prod., vol. 260, p. 121082, 2020.
[4] A. Raza, T. N. Malik, M. F. N. Khan, and S. Ali, Energy management in residential buildings using energy hub approach, Build. Simul., vol. 13, no. 2, pp. 363–386, 2020.
[5] Z. Wang and R. S. Srinivasan, A review of artificial intelligence based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models, Renew. Sustain. Energy Rev., vol. 75, pp. 796–808, 2017.
[6] H. Song, A. K. Qin, and F. D. Salim, Evolutionary model construction for electricity consumption prediction, Neural Comput. Appl., vol. 32, no. 16, pp. 12155–12172, 2020.
[7] E. Jahani, K. Cetin, and I. H. Cho, City-scale single family residential building energy consumption prediction using genetic algorithm-based numerical moment matching technique, Build. Environ., vol. 172, p. 106667, 2020.
[8] J. Xu, W. Gao, and X. Huo, Analysis on energy consumption of rural building based on survey in northern China, Energy Sustain. Dev., vol. 47, pp. 34–38, 2018.
[9] A. Zeng, H. Ho, and Y. Yu, Prediction of building electricity usage using Gaussian Process Regression, J. Build. Eng., vol. 28, p. 101054, 2020.
[10] C. Dai, H. Zhang, E. Arens, and Z. Lian, Machine learning approaches to predict thermal demands using skin temperatures: Steady-state conditions, Build. Environ., vol. 114, pp. 1–10, 2017.
[11] C. Xu, H. Chen, J. Wang, Y. Guo, and Y. Yuan, Improving prediction performance for indoor temperature in public buildings based on a novel deep learning method, Build. Environ., vol. 148, pp. 128–135, 2019.
[12] D. K. Bui, T. N. Nguyen, T. D. Ngo, and H. Nguyen-Xuan, An artificial neural network (ANN) expert system enhanced with the electromagnetism-based firefly algorithm (EFA) for predicting the energy consumption in buildings, Energy, vol. 190, p. 116370, 2020.
[13] M. A. Jallal, A. González-Vidal, A. F. Skarmeta, S. Chabaa, and A. Zeroual, A hybrid neuro-fuzzy inference system-based algorithm for time series forecasting applied to energy consumption prediction, Appl. Energy, vol. 268, p. 114977, 2020.
[14] S. Ganguly, A. Ahmed, and F. Wang, Optimised building energy and indoor microclimatic predictions using knowledge-based system identification in a historical art gallery, Neural Comput. Appl., vol. 32, no. 8, pp. 3349–3366, 2020.
[15] J. S. Chou, N. T. Ngo, W. K. Chong, and G. E. Gibson Jr., Big data analytics and cloud computing for sustainable building energy efficiency, in Start-Up Creation: The Smart Eco-Efficient Built Environment, F. Pacheco-Torgal, E. Rasmussen, C. G. Granqvist, V. Ivanov, A. Kaklauskas, and S. Makonin, eds. Cambridge, UK: Woodhead Publishing, 2016, pp. 397–412.
[16] S. Seyedzadeh, F. Pour Rahimian, P. Rastogi, and I. Glesk, Tuning machine learning models for prediction of building energy loads, Sustain. Cities Soc., vol. 47, p. 101484, 2019.
[17] S. Badillo, B. Banfai, F. Birzele, I. I. Davydov, L. Hutchinson, T. Kam-Thong, J. Siebourg-Polster, B. Steiert, and J. D. Zhang, An introduction to machine learning, Clin. Pharmacol. Ther., vol. 107, no. 4, pp. 871–885, 2020.
for short-term load forecasting in residential buildings, J. King Saud Univ.—Comput. Inf. Sci., vol. 34, no. 7, pp. 4296–4311, 2022.
[45] S. U. Khan, I. U. Haq, Z. A. Khan, N. Khan, M. Y. Lee, and S. W. Baik, Atrous convolutions and residual GRU based architecture for matching power demand with supply, Sensors, vol. 21, no. 21, p. 7191, 2021.
[46] Z. A. Khan, T. Hussain, I. U. Haq, F. U. M. Ullah, and S. W. Baik, Towards efficient and effective renewable energy prediction via deep learning, Energy Rep., vol. 8, pp. 10230–10243, 2022.
[47] D. L. Marino, K. Amarasinghe, and M. Manic, Building energy load forecasting using deep neural networks, in Proc. 42nd Annu. Conf. IEEE Industrial Electronics Society, Florence, Italy, 2016, pp. 7046–7051.
[48] F. U. M. Ullah, N. Khan, T. Hussain, M. Y. Lee, and S. W. Baik, Diving deep into short-term electricity load forecasting: Comparative analysis and a novel framework, Mathematics, vol. 9, no. 6, p. 611, 2021.
[49] N. Khan, I. U. Haq, S. U. Khan, S. Rho, M. Y. Lee, and S. W. Baik, DB-Net: A novel dilated CNN based multi-step forecasting model for power consumption in integrated local energy systems, Int. J. Electr. Power Energy Syst., vol. 133, p. 107023, 2021.
[50] J. Y. Kim and S. B. Cho, Electric energy consumption prediction by deep learning with state explainable autoencoder, Energies, vol. 12, no. 4, p. 739, 2019.
[51] I. U. Haq, A. Ullah, S. U. Khan, N. Khan, M. Y. Lee, S. Rho, and S. W. Baik, Sequential learning-based energy consumption prediction model for residential and commercial sectors, Mathematics, vol. 9, no. 6, p. 605, 2021.
[52] S. J. Bu and S. B. Cho, Time series forecasting with multi-headed attention-based deep learning for residential energy consumption, Energies, vol. 13, no. 18, p. 4722, 2020.
[53] N. Khan, I. U. Haq, F. U. M. Ullah, S. U. Khan, and M. Y. Lee, CL-Net: ConvLSTM-based hybrid architecture for batteries' state of health and power consumption forecasting, Mathematics, vol. 9, no. 24, p. 3326, 2021.
[54] U. Ugurlu, I. Oksuz, and O. Tas, Electricity price forecasting using recurrent neural networks, Energies, vol. 11, no. 5, p. 1255, 2018.