Predicting Energy Consumption Using Stacked LSTM Snapshot Ensemble
Abstract: The ability to make accurate energy predictions while considering all related energy factors allows production plants, regulatory bodies, and governments to meet energy demand and assess the effects of energy-saving initiatives. When energy consumption falls within normal parameters, it will be possible to use the developed model to predict energy consumption and develop improvements and mitigating measures for energy consumption. The objective of this model is to accurately predict energy consumption without data limitations and provide results that are easily interpretable. The proposed model is an implementation of the stacked Long Short-Term Memory (LSTM) snapshot ensemble combined with the Fast Fourier Transform (FFT) and a meta-learner. Hebrail and Berard's Individual Household Electric-Power Consumption (IHEPC) dataset, combined with weather data, is used to analyse the model's accuracy in predicting energy consumption. The model is trained, and the results measured using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and coefficient of determination (R2) metrics are 0.020, 0.013, 0.017, and 0.999, respectively. The stacked LSTM snapshot ensemble performs better than the compared models in terms of prediction accuracy and minimized errors. The results of this study show that both the prediction accuracy and the stability of the model are high. High accuracy demonstrates strong predictive ability, and together with high stability it points to good interpretability, a property not typically accounted for in prediction models; this study shows that it can be inferred.
Key words: Artificial Intelligence (AI); Deep Learning (DL); energy consumption; snapshot ensemble; prediction
© The author(s) 2024. The articles published in this open access journal are distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
academic researchers in the energy sector, energy efficiency is a concerning topic that is critical for achieving the low-carbon economy target (green economy)[1].

Many governments appreciate the benefits of efficiently using energy. Efficient energy use affects the capacity of a building to acquire a green building certificate, which is based on the green building rating systems intended to minimize greenhouse effects and carbon emissions. In this regard, predicting energy use is critical for planning, conservation, and management. Also, with increased demand, the call for better energy consumption planning comprises improved consumption measurement and distribution planning. The capacity to optimize and predict energy consumption can aid with energy distribution to support the increased energy demand[1].

Various studies have been conducted to further develop better energy utilization and execution in structures[2–4]. To create applications for smart buildings, buildings must use smart devices, and artificial intelligence and engineering-based methods are typically used to predict energy consumption. Engineering methods and models use principles like thermodynamic equations to forecast energy consumption. For energy performance evaluation, these models and methods frequently necessitate expertise in customizing them to meet specific requirements and programming the thermal parameters. Engineering models and techniques require in-depth information about the building envelope, the thermal properties of the windows and construction layers, and the ventilation, heating, and air-conditioning systems used to predict energy consumption accurately.

Artificial Intelligence (AI) and Machine-Learning (ML) techniques[5] refer to models that predict future energy consumption from historical data. The ability of models and algorithms to learn the relationship between future and historical data is one of the benefits of using AI and ML techniques. The prediction models that are developed are fed historical data, in contrast to engineering techniques that use comprehensive building data. Additionally, users are not required to have a comprehensive understanding of the thermodynamic behaviour of a building.

AI models have been developed to predict energy performance, as suggested by some studies. For instance, Song et al.[6] created an evolutionary model to forecast smart building energy consumption. In 2017, Wang and Srinivasan[5] examined ensemble-based AI models to predict building energy use. Other studies have used models from Deep Learning (DL) to conduct research in the creation of a system for managing energy. Jahani et al.[7] incorporated a numerical moment matching technique with a genetic algorithm to create a tool for predicting residential building energy use.

It is essential that national energy efficiency policies be developed and proposed by evaluating trends in electricity consumption and energy structures[8]. The foundation for minimizing energy costs and maximizing energy performance is the capacity to forecast energy consumption beyond buildings[9]. ML and AI are already being used in the building and energy domain[10–14], where models use historical data to forecast energy consumption and generate new insights. For example, based on the temperature of the skin, machine-learning models can predict thermal demand[10]. Using an optimization model and machine learning to identify energy data patterns, Chou et al.[15] evaluated time series energy data in 2016.

Artificial Neural Networks (ANN), linear regression models, and Support Vector Regression (SVR) models are among the most widely used machine-learning models[5]. Ganguly et al.'s research[14] predicted energy consumption in a historical art gallery using an ANN model. In 2019, Seyedzadeh et al.[16] analysed the performance of ML models in predicting building cooling and heating loads using SVR, ANN, random forest, Gaussian process, and Gradient-Boosted Regression Trees (GBRT). They concluded that GBRT shows the best performance based on root mean square error values. The researchers also concluded that the ANN model evaluates complex datasets best. Additionally, the ANN model computes significantly faster than the other ML models in their study[16].

ML, a subset of artificial intelligence, allows the machine or system to learn without human intervention. Several DL techniques are used in prediction, including Recurrent Neural Networks (RNN), Deep Neural Networks (DNN), and Deep Belief Networks (DBN). These powerful tools help with acquiring robust modelling and prediction performance. RNN is characterized by taking the output of a previous step and feeding it to the current or
next step as input. Its most important feature is its hidden state that retains information about the sequence. However, it becomes difficult to train when the network contains large datasets and many layers. Long Short-Term Memory (LSTM) is a DL technique based on RNN that provides the added advantage of successfully training to overcome the problems found with RNN[16].

Although ML models can produce significant and demonstrated prediction accuracy improvements in many cases, research has mainly focused on improving accuracy without dealing with the interpretability of the results. Currently, expert systems, primarily developed using linguistic fuzzy logic systems, give users systems modelling capabilities and good interpretability[12]. However, the systems and models often depend on individual expertise and regularly do not produce accurate predictions. Thus, to meet the requirements of interpretability and high accuracy, it is proposed to combine popular techniques, expert models, and other methods. Despite the application of DL models in energy and the meeting of the accuracy requirement, there is still a need to improve performance in energy consumption and production applications.

ML is a rich field with many models that can be and have been implemented in energy consumption prediction, such as ANN, SVR, linear regression models[5], and RNN models[16], to name just a few. Every prediction model proposed and built solves a problem, but each of these machine-learning models has its own problems. The biggest issue is with the training data. ANN suffers from the lack of energy training data, SVR is not suitable for large datasets, and linear regression is prone to overfitting and noise. Overfitting is where a model works well with training data but very poorly with new data. Another problem is when a model works well with the training data, but if there is less training data than new data, the model underperforms; this is a problem with SVR. In linear regression, linearity is quite important, which leaves nonlinear data at a disadvantage. All these data issues greatly impact the accuracy of the prediction model. RNN[16] models also have an issue with training large datasets and layers. Data issues aside, some models may manage to achieve accurate results, but those results cannot be interpreted. Modelling capabilities and good interpretability[12] are essential for any prediction model, and most of these available models focus primarily on result accuracy and fail to consider result interpretability. This created a need to conceptualize a model that would solve the issue of accuracy in an energy prediction model without dataset limitations and produce results that were easily interpretable.

To solve this issue of accuracy and good interpretability while not being data constrained in an energy prediction model, the present study proposes applying a stacked LSTM snapshot ensemble. LSTM is a DL technique based on RNN, which might make this seem like just another LSTM model. However, the proposed model implements LSTM snapshots, which combine the characteristics of RNN algorithms, LSTM algorithms, and snapshots. The advantages of the stacked LSTM snapshot include treating inputs as connected time series; solving the vanishing gradient problem; and being able to store snapshots of different data slices, depending on the length of their sequence. The model can work with, and train on, data with different sequence lengths, which is a problem for most prediction models. Implementation of the Fast Fourier Transform (FFT) makes this possible and allows the proposed model to work with seasonal pattern series. Meta-learning is then applied to the collected base model snapshots, and a final estimate of the energy prediction is determined. This is where accuracy is achieved, because the final estimate is a mean of all the collected snapshots for a given data input. In addition, we improve the viability of the data by adding weather data. Climate change and unpredictable weather events imply that the weather component is evolving and thus has new implications for energy consumption and production. The proposed model ensures that every predicted consumption instance is checked for errors and accuracy. The model is found to have the best accuracy relative to the compared models and can be used for accurate energy consumption prediction. With additional performance evaluation, it can be used by energy companies for consumption analysis to improve their service delivery.

This paper discusses works related to various DL models, including LSTM and ensemble learning models, in Section 2. The proposed model is described in-depth in Section 3, and in Section 4, the data and related visualizations are presented. Section 5 delves
into the developed model, data preparation, settings, feature selection, and result evaluation. Section 6 discusses the model training and related losses. Section 7 compares the stacked snapshot LSTM ensemble to other DL models, and Section 8 concludes this discussion of the study.

2 Related Work

One of the main components of AI is DL, which is defined as a set of layered knowledge-acquiring computer algorithms used by computers and machines to learn without the need for explicit programming[17]. Additionally, DL provides AI with layered algorithms that machines can use to automatically learn and improve actions based on previous experiences. While AI is characterized by acquiring and applying knowledge, DL is the acquisition of skill and knowledge[17]. When using AI, the goal is to increase the probability of success rather than focusing primarily on accuracy. However, when using DL, the goal is mainly to increase the accuracy of an action, regardless of success. AI can be compared to smart computer software, while DL can be likened to the processes and techniques a machine uses with data in learning. As indicated earlier, DL algorithms are specifically designed to help machines learn. Typically, the DL process involves finding relevant data to identify patterns. After identifying a pattern, the machine can predict outcomes for new data using historical data. There are three ways machines learn: supervised learning, reinforcement learning, and unsupervised learning[17]. Predicting energy demand frequently makes use of neural networks and DL, particularly LSTM and ensemble learning models.

Through the ANN models, researchers have concluded that building efficiency and rising energy demands are crucial to sustainability[18]. Their review aims to determine the general patterns in using ANNs to estimate the energy consumption of a structure. Focusing on the feed-forward neural network, they discovered a few gaps, principally in application, because ANN is better suited to time series information, yet this is seen in only 14% of the cases they cover. They discovered that 6% of the applications are for general regression and radial basis neural networks. It is determined from their findings that energy management, optimization, and conservation forecasting strategies are not as suitable for day-to-day operations as the ANN predictive models[18].

Using 30-minute Short-Term Load Forecasting (STLF) resolutions, the researchers compare the performance of several ANN models with numerous hidden layers and activation functions[19]. The models use 1–10 hidden layers and different activation functions, which comprise the parametric rectified linear unit, rectified linear unit, exponential linear unit, leaky rectified linear unit, and scaled exponential linear unit[19]. Using electrical consumption data from five specific buildings collected over 2 years and two performance metrics—the Coefficient of Variation of the Root Mean Square Error (CVRMSE) and Mean Absolute Percentage Error (MAPE)—they discovered that the model with five hidden layers has an average superior performance relative to other tested models designed for STLF[19]. Although the researchers produced a standard model for predicting energy consumption, it is possible to create a more precise prediction model by including the input variables, which can show a building's energy consumption characteristics. Additionally, the target's forecasting performance can be anticipated to rise due to hyperparameter tuning in the Scaled Exponential Linear Unit (SELU) prediction models[20].

The SVR, LSTM, and predictive model combining SVR and LSTM contain 240 samples with 24-hour load profiles[21]. The goal is to perform short-term microgrid load forecasting. Each hour's load quantity is chosen as the output variable, and the input variables are used as an input sample. The majority of the data (70%) in each network is used for training the model, while the remainder (30%) is used in testing. The short-term load prediction in the microgrid is tested without considering climate data. Instead, it focuses on the application conditions that electricity generators and consumers of the microgrid would encounter at any given time. The researchers' take on the outcomes of the various DL methods is presented in Table 1. Short-term load prediction in the microgrid is more accurate and efficient using the model[21].

Long Short-Term Memory (LSTM)[22] is used to improve the planning capacity of utility companies by improving their ability to predict energy load consumption, which can help in deciding whether a new energy plant or transmission lines are needed, or in choosing between different fuel sources during production. The researchers show that the model is determined to be highly accurate; the MAPE obtained
is 6.54 within a confidence interval of 2.25%. Model training takes 30 minutes. For a 5-year forecast, annual offline training is required, making the computational time a benefit. The LSTM-RNN model is suitable for predicting future locational marginal electricity prices[22].

LSTM techniques are used in an attempt to provide credible advice for energy resource allocation, energy saving, and improving power systems[23]. Over five months, experimental data were collected at a minute resolution between March 2018 and July 2018. The experiment demonstrates that time as a variable accurately reflects the periodicity. It is found that LSTM shows better performance than forecast methods such as the Back Propagation Neural Network (BPNN), AutoRegressive Moving Average model (ARMA), and AutoRegressive Fractional Integrated Moving Average model (ARFIMA). For long-term time series predictions, the LSTM's Root Mean Square Error (RMSE) is 19.7% lower than the BPNN value,
54.85% lower than the ARMA value, and 69.59% lower than the ARFIMA value[23], showing excellent energy forecasting potential[23].

Ensemble learning[24] is used to determine multistep forecasting for time series data. Three techniques, namely gradient-boosted trees, the decision trees algorithm, and weighted least squares, compute the weight of the ensembles. Through this, it is possible to produce the dynamic or static ensemble model using a two-weight updating strategy technique. The prediction problem is then decomposed into prediction subproblems, where each subproblem value is used in the forecasting horizon to obtain the ensemble member predictions. The researchers determined that their approach is scalable because DL algorithms based on Apache Spark, a big data engine, can solve the subproblems. The data fed to the ensemble models is 10-year electrical data measured at 10-minute intervals. The researchers showed that the static and dynamic ensembles perform better than the individual members. The dynamic model Mean Relative Error (MRE) value is 2%, the highest accuracy level obtained. It is also a promising result for forecasting large time series[24].

The deep ensemble learning study on smart electric grids is based on probabilistic load prediction. It is postulated that accurate load predictions are important in decisions involving benefits and costs for electrical grids[25]. The Least Absolute Shrinkage and Selection Operator (LASSO) model evaluates energy consumption data from 400 small and medium businesses and 800 consumers. The individual residential data consumption features show higher volatility and diversity than the small and medium businesses data, notwithstanding the seasonality and regularity of the aggregated load profiles. When conducting the probabilistic load forecasting on the 800 consumers, the data are classified using one-hour and one-day intervals. The DNNs used in the ensemble models are randomly chosen, with a total of 7 DNNs with 512 hidden layer nodes, and the randomized numbers between 1 and 4 in the hidden layers. The ensemble forecasts are refined using the LASSO-based quantile combination model.

3 Proposed Model

3.1 Dataset

The electrical energy consumption prediction models are validated with the help of Hebrail and Berard's IHEPC dataset from the UCI Machine Learning Repository. Between December 2006 and November 2010, 2 075 259 measurements from households were included in the dataset. About 1.25% of the rows have missing measurement values[30], but aside from that, the dataset contains the calendar timestamps. The 12 attributes are date, time, global active power, global reactive power, voltage, global intensity, and sub-metering 1 through 3, which are illustrated in Fig. 1[30]. To select the data to use for the model training, different sequence lengths are identified. This process cannot be handled randomly because random sampling would eliminate the possibility of catching the seasonality in the data. The study therefore uses FFT to extract the right sequence lengths from any given time series. Applying FFT ensures that the sample sequence lengths capture the different seasonalities, patterns, and other time-dependent effects in the entire time series.

3.2 Applying the proposed stacked LSTM snapshot ensemble

Improving energy management services necessitates accurate energy consumption predictions in residential and commercial buildings. However, it is challenging to make accurate predictions about energy consumption due to the unpredictability of noisy data[31]. Complex variables cannot be correlated or evolved using conventional prediction methods. The two-layer ensemble is fed with energy consumption data from the IHEPC along with weather data, allowing for multiple sequence lengths in the proposed model, which addresses these issues based on the snapshots. After that, the model is trained, and a base estimate is made. The base estimate has a lot of output, and although the patterns are similar, the different models learn differently. The meta-learner makes it possible to select the appropriate sequences from weighted snapshots, effectively preventing random distribution[32].

The stacked ensemble LSTM DL algorithm, an advanced RNN that takes the place of the original cell neurons, is the tool used for regression. The RNN algorithm's unique characteristics are passed down to the DL algorithm, allowing the inputs to be considered connected time series. Also, the LSTM cells' intricate structure can solve problems with the vanishing gradient limitation[33]. Input, cell status, forget, and output gates are the four essential components of an LSTM cell.
[Table fragment: dataset attribute descriptions]
7 | Global reactive power | Household global minute-averaged reactive power (in kilowatt)
10 | Sub-metering 1 | An oven and a microwave, hot plates being not electric but gas powered (in watt-hour of active energy)
11 | Sub-metering 2 | This variable corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator, and a light (in watt-hour of active energy)
12 | Sub-metering 3 | This variable corresponds to an electric water heater and an air conditioner (in watt-hour of active energy)
13 | Temperature | The measured temperature in degrees Celsius or Fahrenheit
[Figure: structure of the LSTM cell, with input $x_t$, previous hidden state $h_{t-1}$, previous cell state $C_{t-1}$, and outputs $h_t$ and $C_t$.]

$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)$
$\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)$
$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$
$f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)$
$o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)$
$h_t = o_t \times \tanh(C_t)$
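A minimal NumPy sketch of one cell step following these equations; the feature and hidden sizes here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step implementing the gate equations above."""
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev + p["b_i"])    # input gate
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev + p["b_f"])    # forget gate
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev + p["b_o"])    # output gate
    c_hat = np.tanh(p["U_c"] @ x_t + p["W_c"] @ h_prev + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t

# Illustrative sizes: 4 input features, hidden size 8.
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
p = {f"{m}_{g}": rng.normal(scale=0.1, size=(n_h, n_in if m == "U" else n_h))
     for m in ("U", "W") for g in ("i", "f", "o", "c")}
p.update({f"b_{g}": np.zeros(n_h) for g in ("i", "f", "o", "c")})
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
```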
The gates are computed with the assistance of the sigmoid function. The equation for f_t represents the forget gate, which is responsible for removing information no longer needed by a cell. The equation that solves for o_t is the output gate and is responsible for establishing the results of the cell. One LSTM layer can have multiple timestamps. So, a timestamp receives data from a previous timestamp and new information. The new information goes directly to the input gate, while the previous timestamp information passes through the forget gate to select only the needed information for the cell passed to the input gate. The input gate computes the new information, the selected information from the previous gate, and uses the sigmoid function. The results are passed to the output gate, which decides what to output.

Because of their design, LSTMs can only handle sequences of equal length for each epoch. This follows from the optimization process's requirements for the matrix operations. In some data, however, sequences of varying lengths cannot be avoided, so padding is used. This makes it possible to train models with different sequence lengths. However, the patterns they learn are related but different. An FFT must be used to select the data's sequence lengths to accomplish this[36, 37]. The FFT makes it possible to select sequences that distinguish between distinct periods in a given time series. When using FFT to select energy consumption sequences from time series data, it will be possible to capture the series' seasonal patterns and other time-dependent effects, making the chosen sequences work better[36, 37].

The proposed model will feed the LSTM through various data slices, and diversity will rise. As a result, there will be n snapshots stored for any given LSTM, provided that a set S = {s1, s2, ..., sn} contains sequences of various lengths. The process is repeated with a different data slice following the first data slice's training of the LSTM, and the cycle continues. Snapshots of the various sequences are saved for each data slice. The mean is derived as the base forecast, and the collected snapshot estimates are combined from this point. In this case, meta-learning is used to acquire the mean function. The weight matrices from the first snapshot are used as the second sequence length for the current data slice because there are two LSTMs. Meta-learning is used to combine all of the base model snapshots, resulting in the identification of the final estimate forecast. If 20 sequence lengths are used, for instance, 20 snapshots are stored for the first LSTM and used as the second LSTM's sequence length. The base estimate is compiled from the snapshots obtained from the second LSTM and given to the meta-learner for final estimation[10, 36, 37]. Figure 3 shows the stacked LSTM snapshot ensemble used in the present study.

Figure 3 is a diagram of the proposed model, which received data from the UCI Machine Learning Repository. The data are taken as data slices and passed through the FFT, which selects the best sample for the model, that is, sequences of different lengths, making it possible to include data with different time-dependent effects, including seasonalities and patterns. The varying-length sequences are then recorded as the input data ready to pass through the LSTM layers. The output of the first layer is used as the input of the second layer. The output of the second layer is then passed through the LSTM snapshots. The number of different-length sequences that are input determines the number of snapshots taken. So, if the input data has 10 varying sequence lengths, 10 snapshots are taken.
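The FFT-based extraction of sequence lengths described above can be sketched as follows; taking the k strongest spectral peaks as candidate lengths is an assumption, since the exact selection rule is not spelled out here.

```python
import numpy as np

def fft_sequence_lengths(series, k=5):
    """Candidate sequence lengths (in samples) from the dominant
    periodicities of a 1-D series, found via the FFT."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                            # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0)      # cycles per sample
    order = np.argsort(spectrum[1:])[::-1] + 1  # skip the zero frequency
    periods = np.round(1.0 / freqs[order[:k]]).astype(int)
    return sorted({int(per) for per in periods if 1 < per < x.size})

# Demo: hourly-like data with daily (24) and weekly (168) cycles plus noise.
t = np.arange(24 * 7 * 20)
demo = (np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)
        + 0.1 * np.random.default_rng(1).normal(size=t.size))
print(fft_sequence_lengths(demo))   # candidate lengths near 24 and 168
```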
[Fig. 3: the proposed model. Data slices pass through the FFT to form the input data (varying sequence lengths), which feeds a two-layer LSTM; LSTM snapshots 1–3 produce the base estimate, which is passed to the meta-learner.]
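As a rough sketch of the flow in Fig. 3, assuming Keras, one base model per data slice, and a plain mean standing in for the learned meta-learner:

```python
import numpy as np
from tensorflow.keras import layers, models

def make_base_model(seq_len, n_features):
    """Two-layer LSTM base model, mirroring the Fig. 3 pipeline."""
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(64, return_sequences=True),   # first LSTM layer
        layers.LSTM(64),                          # second LSTM layer
        layers.Dense(1),
    ])

def collect_snapshots(data_slices, epochs=5):
    """Train on each data slice (one sequence length per slice) and keep
    a snapshot prediction from every trained model."""
    snaps = []
    for X, y in data_slices:                      # X: (samples, seq_len, features)
        model = make_base_model(X.shape[1], X.shape[2])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X, y, epochs=epochs, batch_size=50, verbose=0)
        snaps.append(float(model.predict(X[-1:], verbose=0).ravel()[0]))
    return np.array(snaps)

def base_estimate(snapshots):
    # Stand-in meta-learner: the mean of the collected snapshot estimates.
    return snapshots.mean()
```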
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over 30 min for mean, against the 30-min index (0–70 000).]
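Series like those in the panels can be produced with pandas resampling; the file path and column names below are assumptions based on the IHEPC attribute names.

```python
import pandas as pd

# Load the raw minute-level data (hypothetical path; '?' marks missing values).
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)
df["dt"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"]).set_index("dt").astype(float)

# Mean-resample over 30-minute windows, as in the panels above; use
# "h", "D", "W", or "MS" for the hourly, daily, weekly, and monthly views.
half_hourly = df.resample("30min").mean()
print(half_hourly["Global_active_power"].head())
```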
state that there are demands in energy consumption in the morning hours when people wake up and start using electrical appliances, such as showers, toasters, coffee makers, and kettles. However, the surge in energy consumption increases faster over shorter durations during winter. Energy consumption rises and starts stabilizing at a certain time of the day as people leave their homes. It is during these periods that peaks are noted in the sub-metering data. During winter, increased demand is denoted starting at 15:00. This trend can be attributed to children returning from school and adults returning from work. As people return home, electrical appliances, such as televisions,
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over hour for mean, against the hourly index (0–35 000).]
dishwashers, microwaves, and air conditioners, are turned on as people warm and light their houses and prepare dinner. Consumption falls as people start going to bed. During summer, the surge is not as evident as in winter because when people return home, it is still light outside, and their houses are warmer. There is an increased use of refrigerators and colder beverages when the weather is warmer, but energy consumption in the evenings is lower in the summer. There are notable peaks in Sub-meterings 2 and 3 as appliances such as air conditioners, refrigerators, and washing machines are used more often, increasing consumption.
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over day for mean, against the index over days (0–1400).]
4.3 Rolling average

The rolling average is used to determine a trend's direction. It adds up the data points of the energy consumption over the defined period and divides the total by the number of data points to determine the average. From the rolling curves obtained, it is possible to observe trends in the data, where the global active power and global intensity show highs in the months starting in March 2008 that peak in July 2008, after which there is a decreasing trend. This trend can be interpreted as increased energy consumption in the months leading up to July 2008 and reduced consumption after July 2008, as observed in the data.
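The rolling average just described is a windowed mean; a minimal pandas sketch follows, where the 365-day window is an assumption standing in for the 12-month periods used in Figs. 12–14.

```python
import numpy as np
import pandas as pd

def rolling_trend(series: pd.Series, window: int) -> pd.Series:
    """Add up the points in each window and divide by their count,
    i.e., a rolling mean that exposes the trend direction."""
    return series.rolling(window=window, min_periods=1).mean()

# Demo: a daily series standing in for global active power.
idx = pd.date_range("2007-01-01", periods=1400, freq="D")
rng = np.random.default_rng(2)
daily = pd.Series(1.0 + 0.3 * np.sin(2 * np.pi * idx.dayofyear / 365.0)
                  + 0.1 * rng.normal(size=idx.size), index=idx)
trend = rolling_trend(daily, window=365)   # roughly a 12-month window
print(trend.tail())
```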
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over week for mean, against the index over weeks (0–200).]
The global reactive rolling mean shown in Fig. 12 indicates that the trend is opposite to what is observed and shown in Figs. 13 and 14.

4.4 Autocorrelation

It should be noted that after two lags, seen in Figs. 15−17, the lines get inside the confidence interval (light blue area). The lag is caused by the 12–13 months used in defining a season in the data. Once the algorithm detects the extended season, there is an autocorrection in the lag, where the lines fall within the confidence level. The benefit of having the extended 12–13 months and resultant lags is to show the ability of the algorithm to adapt to unconventional data and still produce accurate results.
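The lag behaviour described here corresponds to an autocorrelation plot; a sketch using statsmodels, with a synthetic monthly series whose season spans roughly 12–13 months:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Synthetic monthly series with a roughly 12.5-month seasonal cycle.
idx = pd.date_range("2007-01-01", periods=48, freq="MS")
rng = np.random.default_rng(3)
monthly = pd.Series(1.0 + 0.4 * np.sin(2 * np.pi * np.arange(idx.size) / 12.5)
                    + 0.05 * rng.normal(size=idx.size), index=idx)

# Bars outside the shaded band are significant; as described above, the
# lines fall inside the confidence interval after the first lags.
plot_acf(monthly, lags=17)
plt.show()
```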
[Figure: panels (a)–(f) showing global active power (kW), global reactive power (kW), global intensity (A), and Sub-meterings 1–3 (W·h), each resampled over month for mean, against the index over months (0–40).]
[Figure: time series of global active power (kW), global intensity (A), and global reactive power (kW) from Jan. 2007 to Jul. 2010.]
core CPU with four efficiency and four performance cores, and a 16-core neural engine. From the dataset, 70% of the data are used in training, and 30% are used for testing and validation. The technique used for model validation is a train/test split. Validation occurs between the training and test stages. Based on the new dataset obtained, the LSTM settings used are normalization set to between 0 and 1, a batch size of 50, an epoch number of 100, and 4 LSTM layers.

5.2 Data preparation

The dataset contained 2 075 259 rows and 7 columns. The date and time fields are parsed to the date/time column and converted to the index column during importation. The outliers in the data, noisy data, are cleaned by filling the null values and noise with the mean values in their respective fields. The data are successfully integrated after cleaning. From the dataset containing 2 075 259 rows, 25 979 rows contain null values, which are filled with the mean value. This is about 1.25% of the rows.
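Under these settings (Min-Max normalization to the range 0–1, batch size 50, 100 epochs, 4 LSTM layers, and a 70/30 split), the preparation and training loop could look like the sketch below. The file name, column handling, window length, and layer widths are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, models

# Data preparation (Section 5.2): parse date/time into the index,
# then fill null values with the column means.
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)          # hypothetical path
df["dt"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"]).set_index("dt").astype(float)
df = df.fillna(df.mean())            # the 25 979 null rows get mean values

# Normalization to [0, 1] and a 70/30 train/test split.
scaled = MinMaxScaler().fit_transform(df.values)
split = int(0.7 * len(scaled))
train, test = scaled[:split], scaled[split:]

def to_windows(a, seq_len=60):
    """Sliding windows over the series; the target is the next value of
    global active power (column 0)."""
    X = np.stack([a[i:i + seq_len] for i in range(len(a) - seq_len)])
    return X, a[seq_len:, 0]

X_train, y_train = to_windows(train)

# Four stacked LSTM layers, batch size 50, 100 epochs.
model = models.Sequential([
    layers.Input(shape=X_train.shape[1:]),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, batch_size=50, epochs=100, verbose=0)
```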
[Figure] Fig. 12 Rolling mean for global active power over 12 month period.
[Figure: rolling mean for global intensity over 12 month period.]
[Figure] Fig. 14 Rolling mean for global reactive power over 12 month period.
[Figure: two autocorrelation plots, correlation (−1.00 to 1.00) against lag (0–17.5).]
[Figure: correlation heatmap, negative (blue) to positive (red) correlation between two variables. Against the columns (global active power, global reactive power, voltage, global intensity, Sub-metering 1, Sub-metering 2, Sub-metering 3), the global active power row reads 1.000, 0.250, −0.400, 1.000, 0.480, 0.430, 0.640, and the global reactive power row reads 0.250, 1.000, −0.110, 0.270, 0.120, 0.140, 0.090.]
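The matrix behind such a heatmap is a direct pandas computation; a small self-contained sketch with synthetic stand-in columns:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for three of the dataset variables.
rng = np.random.default_rng(4)
v = rng.normal(size=1000)
demo = pd.DataFrame({
    "Global_active_power": v,
    "Global_intensity": v + 0.01 * rng.normal(size=1000),  # near 1.0 correlation
    "Voltage": -0.4 * v + rng.normal(size=1000),           # negative correlation
})
corr = demo.corr()        # pairwise Pearson correlations
print(corr.round(3))      # compare with the heatmap rows above
```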
[Figure: train and test loss curves (loss 0–0.008) against the time step for the first 750 hours.]

A good fit curve can be recognized by the test and training loss that reduce to the point of stability and with a minimal gap between the final loss values. The model's loss is almost always lower on the training data than on the test data.
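The four evaluation metrics used throughout (RMSE, MAE, MAPE, and R2) can be computed as below; MAPE is expressed as a fraction here, matching the 0.017 reported later, and the demo arrays are placeholders.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    mape = np.mean(np.abs((y_true - y_pred) / y_true))  # fraction, not percent
    r2 = r2_score(y_true, y_pred)
    return rmse, mae, mape, r2

# Placeholder arrays standing in for test targets and model predictions.
y_true = np.array([1.00, 1.20, 0.90, 1.10])
y_pred = np.array([1.01, 1.19, 0.92, 1.08])
print(evaluate(y_true, y_pred))
```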
Network (CNN-ESN)[43], two-stream deep network STLF (namely STLF-Net)[44], residual GRU[45], ESN-CNN[46], Region-based CNN (R-CNN) with Meta-Learning LSTM (ML-LSTM)[20], standard LSTM with LSTM-based Sequence-to-Sequence (S2S) architecture[47], Multiplicative LSTM (M-LSTM)[48], Deep-Broad Network (DB-Net)[49], explainable autoencoder[50], residual GRU-based hybrid model[45], hybrid DL network[51], multi-headed attention model[52], and Conventional LSTM-based hybrid architecture Network (CL-Net)[53]. Compared to all other models, the stacked LSTM ensemble reveals lower error rates of 0.020, 0.013, 0.017, and 0.999 for RMSE, MAE, MAPE, and R2, respectively. In an examination based on RMSE, MAE, and MAPE, Kim and Cho[38] determined that the linear regression model's performance at an hourly resolution is 0.6570, 0.5022, and 83.74, respectively. Rajabi and Estebsari[39] determined that the ANN model's performance has RMSE and MAE values of 1.15 and 1.08, respectively. The structure uses recurrence plots to encode time series data into images, and the model performs better than CNN, SVM, and ANN. Khan et al.'s[40] work found that the LSTM autoencoder hybrid CNN model has RMSE and MAE values of 0.67 and 0.47, respectively. The model performs best with daily predictions as opposed to hourly predictions of household electricity consumption. Using the CNN-LSTM DL algorithm, Kim and Cho[38] developed a model to predict residential energy consumption. Their analysis reveals that the RMSE, MAE, and MAPE have values of 0.595, 0.3317, and 32.82, respectively. Ullah et al.[41] created a CNN multilayer bidirectional LSTM network based model for predicting residential energy consumption. They discovered that the RMSE, MAE, and MAPE values are 0.565, 0.346, and 29.10, respectively.

Using the CNN-LSTM autoencoder, Kim and Cho[38] created a model that could anticipate residential energy consumption. According to their research, the RMSE and MAE metrics have model errors of 0.47 and 0.31, respectively. Sajjad et al.'s[42] study utilizing the hybrid model of CNN and GRU to forecast energy consumption shows that the error values are 0.47 (RMSE) and 0.33 (MAE). Khan et al.[43] wanted to use DL algorithms to improve energy harvesting and selected the CNN-ESN model. Their analysis reveals error values of 0.0472 (RMSE) and 0.0266 (MAE)[43]. In 2022, Abdel-Basset et al.[44] used the STLF-Net model for DL analysis and short-term load prediction in residential buildings. The MAPE, RMSE, and MAE are found to be 36.24, 0.4386, and 0.2674, respectively. Khan et al.[45] forecasted energy demand and supply using a residual GRU model. They discovered that the RMSE and MAE are 0.4186 and 0.2635, respectively. The ESN-CNN DL model is utilized by Khan et al. in 2022[46] to enhance energy prediction. The study's error values are 0.2153 (RMSE) and 0.1137 (MAE). Alsharekh et al.[20] improved short-term load prediction using the hybrid model and R-CNN. The RMSE, MAE, MAPE, and R2 values they discovered are 0.0325, 0.0144, 1.024, and 0.9841, respectively.

Based on its consumptive nature, the S2S model is studied and evaluated using the standard LSTM and an LSTM model based on sequence/no sequence. Findings show that LSTM performs better at hourly resolution but not at a per minute resolution[47]. The RMSE is 0.625. Kim and Cho[38] developed the CNN-LSTM model, which uses a hybrid connection between the LSTM and CNN networks. The CNN network in the model extracts intricate features from variables that impact consumption. The LSTM algorithm is utilized for modelling temporal information. The RMSE, MAE, and MAPE values are 0.595, 0.3317, and 32.83, respectively. The explainable autoencoder DL model is used to forecast consumption for 15, 30, 45, and 60 minutes in another model with sample data. The researchers utilize a t-SNE algorithm to explain and visualize the estimated results. The MAE value produced by their model is 0.3953[50]. Khan et al.[49] published works that utilize a hybrid connection of bidirectional LSTM and CNN networks along with the DB-Net algorithm to forecast consumption. The model's error values are 0.1272 (RMSE) and 0.0916 (MAE). Ullah et al.[48] utilized conventional ML and DL sequential models for energy consumption predictions. Based on error metrics, their investigations reveal that the M-LSTM model has superior prediction ability over the DL and ML algorithms. The M-LSTM model's error values are 0.3296 (RMSE) and 0.3086 (MAE), based on an hourly resolution. Haq et al.[51] predicted energy consumption by residential and commercial users using a novel hybrid DL model. The model acquires RMSE and MAE values of 0.324 and 0.311, respectively. The RNN model incorporating multi-headed attention is created by Bu and Cho[52] to forecast energy consumption and selectively determine spatiotemporal
characteristics. The MSE value is 0.2662, but the model provides no other error metrics. Khan et al.[45] created a hybrid model with Residual GRU (R-GRU) and dilated convolutions. The RMSE and MAE error metric scores are 0.4186 and 0.2635, respectively, when this model is used to predict energy generation and consumption. Khan et al.[53] modelled the CL-Net architecture using the ConvLSTM hybrid to assess the model's accuracy in predicting energy consumption. Their testing of the model results in an RMSE score of 0.122 and an MAE score of 0.088. In this comparison, most of the models perform better than this one. The RMSE, MAE, MAPE, and R2 of the proposed model are 0.020, 0.013, 0.017, and 0.999, respectively. The developed model has the lowest error scores of any model, indicating that it accurately predicts energy consumption. The performance comparison of different prediction models is summarized in Table 2.

Table 2 Prediction model comparison.
Model | RMSE | MAE | MAPE | R2
Linear regression[38] | 0.6570 | 0.5022 | 83.740 | –
ANN[39] | 1.1500 | 1.0800 | – | –
CNN[40] | 0.6700 | 0.4700 | – | –
CNN-LSTM[38] | 0.5950 | 0.3317 | 32.830 | –
CNN-BDLSTM[41] | 0.5650 | 0.3460 | 29.100 | –
CNN-LSTM autoencoder[38] | 0.4700 | 0.3100 | – | –
CNN-GRU[42] | 0.4700 | 0.3300 | – | –
CNN-ESN[43] | 0.0472 | 0.0266 | – | –
STLF-Net[44] | 0.4386 | 0.2674 | 36.240 | –
ESN-CNN[46] | 0.2153 | 0.1137 | – | –
R-CNN with ML-LSTM[20] | 0.0325 | 0.0144 | 1.024 | 0.9841
Standard LSTM and LSTM-based S2S architecture[47] | 0.6250 | – | – | –
M-LSTM[48] | 0.3296 | 0.3086 | – | –
DB-Net[49] | 0.1272 | 0.0916 | – | –
Explainable autoencoder[50] | – | 0.3953 | – | –
Residual GRU-based hybrid model[45] | 0.4186 | 0.2635 | – | –
Hybrid DL network[51] | 0.3240 | 0.3110 | – | –
Multi-headed attention model[52] | 0.2662 | – | – | –
CL-Net architecture[53] | 0.1220 | 0.0880 | – | –
Proposed model | 0.0200 | 0.0130 | 0.0170 | 0.9990

8 Conclusion

Using the developed model, it is possible to accurately predict energy consumption. Compared to other studies that lack dimension reduction algorithms[54] to allow for seasonality observation, the stacked snapshot LSTM ensemble shows that it is possible to investigate seasonality attributed to energy consumption. Another advantage of using the model is that it is easy to train and validate. Furthermore, it supports big data and could dynamically support the model weights used without many adjustments to the dataset. The model is designed to be simple and functional, such that it could provide a relatively inexpensive method of evaluating big energy datasets. Finally, the model includes an algorithm that trains the LSTM model sequentially. This allows the model to learn different patterns. The advantage of this feature is that the estimates provided by the final model are very robust and accurate due to the high levels of generalization. The model's accuracy and stability are measured using RMSE, MAE, MAPE, and R2 as 0.020, 0.013, 0.017, and 0.999, respectively. The RMSE of 0.020 demonstrates the model's high level of stability, while the MAPE of 0.017 is very close to 0, signalling high-level accuracy. The R2 result of 0.999 is nearly 1, which shows the model's good performance. Accuracy, stability, and performance give the model consistent results, but if the results are out of the ordinary, the model allows humans to understand the causes of those results. This refers to interpretability, which permits a human to explain the cause and effect of such anomalous results.

Despite the model's advantages, there are some limitations: the model is not tested in real time to determine its robustness. Therefore, the model should be tested in the future to determine its dynamic performance in a real-time environment. In addition, in the present study, the model's accuracy and stability results are used to infer interpretability. Future studies should determine an independent way to measure a model's interpretability.

Based on these findings, the energy sector could use such a model, as it provides high-value insights, value-addition, and service improvements based on its effective use of big data regarding energy consumption. Because energy data reliably and in real time reflect economic activity trends of populations, businesses, and the community, by virtue of these technological advantages and data resources, this can be regarded as an important data resource for energy companies to develop data platforms with high-level
accuracy and performance algorithms that can integrate data from multiple industries, facilitating the transformation and upgrading of governments and organizations.

For energy companies, the application of the stacked LSTM snapshot ensemble and other DL models to energy consumption is still in the early stages of development. The data points have numerous compound values that must be discovered and mined by internal and external businesses to obtain additional insights and trends. Energy generation companies need to pay close attention to how big data about energy consumption work and how businesses and multinational corporations use them by working closely with these businesses to learn more from them. Businesses are conducting energy research to speed up the sharing of energy data. Companies that produce energy need to pay close attention to their key dynamics, intensify learning, and use big data and learning applications. Additionally, they should investigate cooperative endeavours and share mechanisms for collective improvement.

Translating the findings from the stacked LSTM snapshot ensemble energy consumption prediction model into the analysis of the energy usage dataset for reporting and determining faults, optimization, and forecasted maintenance in households and businesses is the future direction of this study.

Acknowledgment

The authors gratefully acknowledge the support and invaluable guidance provided by the Faculty of Computing and Information Technology (FCIT), King Abdulaziz University (KAU), Jeddah, Kingdom of Saudi Arabia.

References
[1] M. Cellura, F. Guarino, S. Longo, and G. Tumminia, Climate change and the building sector: Modelling and energy implications to an office building in southern Europe, Energy Sustain. Dev., vol. 45, pp. 46–65, 2018.
[2] T. Zhang, D. Wang, H. Liu, Y. Liu, and H. Wu, Numerical investigation on building envelope optimization for low-energy buildings in low latitudes of China, Build. Simul., vol. 13, no. 2, pp. 257–269, 2020.
[3] A. D. Pham, N. T. Ngo, T. T. Ha Truong, N. T. Huynh, and N. S. Truong, Predicting energy consumption in multiple buildings using machine learning for improving energy efficiency and sustainability, J. Cleaner Prod., vol. 260, p. 121082, 2020.
[4] A. Raza, T. N. Malik, M. F. N. Khan, and S. Ali, Energy management in residential buildings using energy hub approach, Build. Simul., vol. 13, no. 2, pp. 363–386, 2020.
[5] Z. Wang and R. S. Srinivasan, A review of artificial intelligence based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models, Renew. Sustain. Energy Rev., vol. 75, pp. 796–808, 2017.
[6] H. Song, A. K. Qin, and F. D. Salim, Evolutionary model construction for electricity consumption prediction, Neural Comput. Appl., vol. 32, no. 16, pp. 12155–12172, 2020.
[7] E. Jahani, K. Cetin, and I. H. Cho, City-scale single family residential building energy consumption prediction using genetic algorithm-based numerical moment matching technique, Build. Environ., vol. 172, p. 106667, 2020.
[8] J. Xu, W. Gao, and X. Huo, Analysis on energy consumption of rural building based on survey in northern China, Energy Sustain. Dev., vol. 47, pp. 34–38, 2018.
[9] A. Zeng, H. Ho, and Y. Yu, Prediction of building electricity usage using Gaussian Process Regression, J. Build. Eng., vol. 28, p. 101054, 2020.
[10] C. Dai, H. Zhang, E. Arens, and Z. Lian, Machine learning approaches to predict thermal demands using skin temperatures: Steady-state conditions, Build. Environ., vol. 114, pp. 1–10, 2017.
[11] C. Xu, H. Chen, J. Wang, Y. Guo, and Y. Yuan, Improving prediction performance for indoor temperature in public buildings based on a novel deep learning method, Build. Environ., vol. 148, pp. 128–135, 2019.
[12] D. K. Bui, T. N. Nguyen, T. D. Ngo, and H. Nguyen-Xuan, An artificial neural network (ANN) expert system enhanced with the electromagnetism-based firefly algorithm (EFA) for predicting the energy consumption in buildings, Energy, vol. 190, p. 116370, 2020.
[13] M. A. Jallal, A. González-Vidal, A. F. Skarmeta, S. Chabaa, and A. Zeroual, A hybrid neuro-fuzzy inference system-based algorithm for time series forecasting applied to energy consumption prediction, Appl. Energy, vol. 268, p. 114977, 2020.
[14] S. Ganguly, A. Ahmed, and F. Wang, Optimised building energy and indoor microclimatic predictions using knowledge-based system identification in a historical art gallery, Neural Comput. Appl., vol. 32, no. 8, pp. 3349–3366, 2020.
[15] J. S. Chou, N. T. Ngo, W. K. Chong, and G. E. Gibson Jr., Big data analytics and cloud computing for sustainable building energy efficiency, in Start-Up Creation: The Smart Eco-Efficient Built Environment, F. Pacheco-Torgal, E. Rasmussen, C. G. Granqvist, V. Ivanov, A. Kaklauskas, and S. Makonin, eds. Cambridge, UK: Woodhead Publishing, 2016, pp. 397–412.
[16] S. Seyedzadeh, F. Pour Rahimian, P. Rastogi, and I. Glesk, Tuning machine learning models for prediction of building energy loads, Sustain. Cities Soc., vol. 47, p. 101484, 2019.
[17] S. Badillo, B. Banfai, F. Birzele, I. I. Davydov, L. Hutchinson, T. Kam-Thong, J. Siebourg-Polster, B. Steiert, and J. D. Zhang, An introduction to machine learning, Clin. Pharmacol. Ther., vol. 107, no. 4, pp. 871–885, 2020.
for short-term load forecasting in residential buildings, J. King Saud Univ.—Comput. Inf. Sci., vol. 34, no. 7, pp. 4296–4311, 2022.
[45] S. U. Khan, I. U. Haq, Z. A. Khan, N. Khan, M. Y. Lee, and S. W. Baik, Atrous convolutions and residual GRU based architecture for matching power demand with supply, Sensors, vol. 21, no. 21, p. 7191, 2021.
[46] Z. A. Khan, T. Hussain, I. U. Haq, F. U. M. Ullah, and S. W. Baik, Towards efficient and effective renewable energy prediction via deep learning, Energy Rep., vol. 8, pp. 10230–10243, 2022.
[47] D. L. Marino, K. Amarasinghe, and M. Manic, Building energy load forecasting using deep neural networks, in Proc. 42nd Annu. Conf. IEEE Industrial Electronics Society, Florence, Italy, 2016, pp. 7046–7051.
[48] F. U. M. Ullah, N. Khan, T. Hussain, M. Y. Lee, and S. W. Baik, Diving deep into short-term electricity load forecasting: Comparative analysis and a novel framework, Mathematics, vol. 9, no. 6, p. 611, 2021.
[49] N. Khan, I. U. Haq, S. U. Khan, S. Rho, M. Y. Lee, and S. W. Baik, DB-Net: A novel dilated CNN based multi-step forecasting model for power consumption in integrated local energy systems, Int. J. Electr. Power Energy Syst., vol. 133, p. 107023, 2021.
[50] J. Y. Kim and S. B. Cho, Electric energy consumption prediction by deep learning with state explainable autoencoder, Energies, vol. 12, no. 4, p. 739, 2019.
[51] I. U. Haq, A. Ullah, S. U. Khan, N. Khan, M. Y. Lee, S. Rho, and S. W. Baik, Sequential learning-based energy consumption prediction model for residential and commercial sectors, Mathematics, vol. 9, no. 6, p. 605, 2021.
[52] S. J. Bu and S. B. Cho, Time series forecasting with multi-headed attention-based deep learning for residential energy consumption, Energies, vol. 13, no. 18, p. 4722, 2020.
[53] N. Khan, I. U. Haq, F. U. M. Ullah, S. U. Khan, and M. Y. Lee, CL-Net: ConvLSTM-based hybrid architecture for batteries' state of health and power consumption forecasting, Mathematics, vol. 9, no. 24, p. 3326, 2021.
[54] U. Ugurlu, I. Oksuz, and O. Tas, Electricity price forecasting using recurrent neural networks, Energies, vol. 11, no. 5, p. 1255, 2018.