A deep LSTM-CNN based on self-attention mechanism with input data reduction for short-term load forecasting
DOI: 10.1049/gtd2.12763
ORIGINAL RESEARCH
Shiyan Yi1 Haichun Liu2 Tao Chen3,4 Jianwen Zhang5 Yibo Fan1
1 The State Key Laboratory of ASIC and System, Fudan University, Shanghai, China
2 Department of Automation, Shanghai Jiao Tong University, Shanghai, China
3 Department of Economics, Department of Statistics and Actuarial Science, Big Data Research Lab, University of Waterloo, Waterloo, Canada
4 Senior Research Fellow, Harvard University, Cambridge, Massachusetts, USA
5 The Key Laboratory of Control of Power Transmission and Conversion of Ministry of Education, Department of Electrical Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai, China
Correspondence
Jianwen Zhang, The Key Laboratory of Control of Power Transmission and Conversion of Ministry of Education, Department of Electrical Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai 200240, China.
Email: [email protected]
Yibo Fan, The State Key Laboratory of ASIC and System, Fudan University, Shanghai 201203, China.
Email: [email protected]

Funding information
Fudan-ZTE Joint Lab; National Natural Science Foundation of China, Grant/Award Numbers: 52277193, 62031009; Pioneering Project of Academy for Engineering and Technology Fudan University, Grant/Award Number: gyy2021-001; Fudan University-CIOMP Joint Fund, Grant/Award Number: FC2019-001; CCF-Alibaba Innovative Research Fund For Young Scholars; Alibaba Innovative Research (AIR) Program; Shanghai Committee of Science and Technology, Grant/Award Number: 19DZ1205403

Abstract
Numerous studies on short-term load forecasting (STLF) have used feature extraction methods to increase the model's accuracy by incorporating multidimensional features containing time, weather and distance information. However, less attention has been paid to the input data size and output dimensions in STLF. To address these two issues, an STLF model is proposed based on output dimensions using only load data. First, the load data's long-term behavior (trend and seasonality) is extracted through the long short-term memory network (LSTM), followed by convolution to obtain the load data's non-stationarity. Then, using the self-attention mechanism (SAM), the crucial input load information is emphasized in the forecasting process. The calculation example shows that the proposed algorithm outperforms LSTM, LSTM-based SAM, and CNN-GRU-based SAM by more than 10% in eight different buildings, demonstrating its suitability for forecasting with only load data. Additionally, compared to earlier research utilizing two well-known public data sets, the MAPE is optimized by 2.2% and 5%, respectively. Also, the method has good prediction accuracy for a wide variety of time granularities and load aggregation levels, so it can be applied to various load forecasting scenarios and has good reference significance for load forecasting instrumentation.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the
original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2023 The Authors. IET Generation, Transmission & Distribution published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
TABLE 1 Related works on STLF

Category | Work | Method | Description | Advantage | Disadvantage
DNN | X. Kong [21] | DBN | Improved DBN for STLF | Can abstract nonlinear interactions layer by layer | Cannot extract temporal features
DNN | K. Chen [22] | ResNet | An end-to-end model | |
DNN | W. Kong [23] | LSTM | Extracts temporal features | More accurate than other DNNs | The structure is too simple
Feature extraction | Kim [6] | CNN-LSTM | Accurate but many parameters | Handles raw data at a higher degree of abstraction | Needs many input features such as temperature, humidity, and date information; the SD information may not be suitable for STLF
Feature extraction | Yao [24] | CNN-GRU | GRU can reduce parameters | |
Feature extraction | H. Chen [25] | SD-FOA | Each factor's weight assigned by an optimization algorithm | |
Feature extraction | Zheng [26] | SD-XGBoost | Each factor's weight assigned by an optimization algorithm | |
Feature extraction | Park [27] | SD-RL | Uses RL to choose similar days (SD) | |
Load modeling | W. Kong [28] | LSTM | Can learn residential behavior | Appliance-level forecasting techniques provide finer granularity and more accuracy | Needs appliances' load
Load modeling | Ji [29] | HSMM | Can adjust to different situations | | Necessitates complex dynamical models
Data decomposition | Liu and Zhao [30] | K-means EMD | Uses K-means to classify IMFs and residuals | Can handle low-dimensional data | Requires a massive amount of calculations and time
Data decomposition | Deng [31] | EEMD | Reduces mode mixing in EMD | |
Model selection | Feng [32] | Q-learning | Selects the most appropriate model in the model pool | Able to handle complex situations | Model pool affects accuracy; calculations lead to tremendous complexity
Model selection | Li [33] | Meta-learning | Selects the most appropriate model in the model pool | |
load data generated by distributed smart meters can represent meteorological and calendar information. Effective mining of real-time load data can produce accurate load forecasts without the need for meteorological and calendar data. This can effectively reduce the required data dimension and avoid the transmission and storage of large amounts of data, meeting the requirements of the independent operation and autonomous architecture of the microgrid, the improved overall performance of the microgrid, and the friendliness of the system dispatch.

Conventional STLF approaches often relate to statistical models, such as the autoregressive moving average (ARMA) [12], autoregressive integrated moving average (ARIMA) [13], and exponential smoothing model [14]. These approaches are straightforward to implement and interpretable, but their ability to anticipate nonlinear series is low, so they cannot effectively handle the STLF problem. In comparison to conventional STLF approaches, machine learning has better nonlinear fitting capabilities. The back-propagation neural network [15] and support vector regression (SVR) [16] are commonly employed machine learning time-series prediction algorithms. The back-propagation neural network has excellent parameter fitting skills, but owing to its simple structure, it has low generalization abilities and is prone to the local optimum. As SVR is based on the learning criterion of structural reduction, it has a high degree of generalizability. SVR performs well with small data sample sizes, but it fails to converge with high sample sizes. Machine learning can learn nonlinear correlations between sequences better than conventional methods, but it typically has drawbacks including weak generalization capabilities, trouble establishing hyper-parameters, and prediction accuracy that frequently depends on the quality and quantity of input data. Deep learning superimposes multilayer neural networks, which may realize multilayer nonlinear mapping. With their nonlinear mapping and adaptable capacities, systems based on deep learning provide a viable solution for load forecasting [17–20]. Table 1 shows the related works on STLF. X. Kong [21] proposed an improved deep belief network (DBN) method from the three aspects of model structure, parameter optimization, and data selection for better use in load forecasting. In [22], a prediction method of a novel end-to-end neural network model using deep residual neural networks (ResNets) was proposed.

Deep learning can abstract nonlinear interactions layer by layer, which can increase the accuracy of STLF in comparison to machine learning. However, a deep neural network (DNN), like DBN and ResNet, is insensitive to temporal correlation, making it difficult to obtain better prediction outcomes. Considering energy consumption is a typical time series, the model's output should relate to the past input. LSTM, a variant of the recurrent neural network (RNN), is highly competitive in various fields and can capture the temporal dependencies of the time series. In [23], LSTM was employed to predict loads for individual users, and its superiority to traditional methods was demonstrated. In consideration of the temporal characteristics of the load data, LSTM was used as the basis for the model. There are numerous STLF methods based on LSTM. The most common method is feature extraction. CNN is the most popular feature extraction method in STLF, which can extract high-dimensional correlations of a feature set [34–36]. Kim [6] proposed a CNN-LSTM neural network, which can improve prediction accuracy by analyzing features that influence the load. In [24], LSTM is replaced by a gated recurrent unit (GRU) to reduce the number of parameters. Similar-day selection is another feature extraction method in STLF used to discover the similar forecasting day
17518695, 2023, 7, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/gtd2.12763 by Cochrane Japan, Wiley Online Library on [09/12/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1540 YI ET AL.
among historical days based on multiple characteristics, including temperature, wind speed, and national holidays [7, 8, 37]. By weighting features manually, traditional similar-day selection cannot identify the main factors effectively. A fruit fly optimization algorithm (FOA) was developed to weight features in an intelligent manner in order to solve this problem [25]. Similarly, [26] proposed an XGBoost algorithm for evaluating feature importance. Unlike most similar-day selection models that utilize feature-weight learning, reinforcement learning (RL) is used to select similar days directly [27]. Feature extraction mostly uses CNN and similar-day selection to handle raw data at a higher degree of abstraction. It can automatically extract internal features from data to reduce the error caused by human feature determination. Modeling from the overall level can also be an efficient way of improving prediction performance [38, 39]. W. Kong et al. incorporated residential electricity consumption data (dryers, washing machines, dishwashers, heat pumps, TVs, and wall-hung boilers) to determine residents' behaviors, which could explain some of the forecasting variability [28]. Ji et al. [29] proposed a hidden semi-Markov model that provides a framework for dynamic statistical models of various household appliances to describe the demands for household appliances. The load modeling method is appliance-level forecasting, which predicts the total load by simulating the load on the appliances. Compared to general forecasting approaches, appliance-level forecasting techniques provide finer granularity and more accuracy.

However, these approaches need electrical, meteorological, and calendar features. Due to the tiny size of the microgrid region, it is difficult to obtain fine-grained data [10, 11]. The majority of the present literature on electrical, meteorological, and calendar data presents data gathered in test sites. These testing regions are uncommon worldwide. In addition, the trials were developed expressly for microgrid research and collected data by distributing questionnaires to consumers and installing measuring tools. Obtaining adequate data for load forecasting by adapting measuring equipment is costly and subject to large-scale communication challenges in practice. Furthermore, access to fine-grained data is complex and limited for most microgrids. Consequently, the focus of the study is on how to conduct cost-effective and efficient load forecasting using the currently available electric load data. For predicting with small-dimensional data (electric data), the primary available hybrid forecasting methods that may be employed include data decomposition, integrated learning, and other methods. Researchers have used empirical mode decomposition (EMD) to decompose complex nonlinear time series into intrinsic mode functions (IMFs) of different frequencies with excellent smoothness and regularity [40]. In contrast to wavelet decomposition, in EMD, there is no need to predefine any base functions. In [30], the EMD results were classified using K-means to reduce the number of calculations. In order to solve mode mixing, Deng proposed the ensemble empirical mode decomposition (EEMD) algorithm [31]. The EMD approach can handle low-dimensional data without requiring any a priori information, which can mitigate the impact of complicated environmental factors on STLF. As every IMF after decomposition must be applied to the prediction model, the amount of calculation is several times larger than that of other methods. The model selection approach uses optimization algorithms to select relevant and effective models from model pools in order to manage various circumstances. In [32], Q-learning worked well when it was used in dynamic model selection. This algorithm enables the first choice to be deterministic load forecasting, followed by a second choice of probabilistic load forecasting. Meta-learning was also used to determine the accuracy of the candidate load forecast models [33]. The main disadvantage of this method is that the model pool plays a significant role, and every model's accuracy needs to be calculated, causing massive calculations and complexity.

In order to answer the challenge of attaining low-cost and efficient load forecasting utilizing accessible electric load data, after extracting long-term power consumption behavior using LSTM, we employ CNN to mine the data's nonlinear properties in depth. Although studies have investigated CNN for feature extraction from input data, few studies have explored CNN performance in the output dimension. Yin [41] posited that CNN has a strong nonlinear mapping ability and that using CNN in load forecasting can improve the model's accuracy. In [42], CNN was used to enhance the embedding instead of pooling after Bi-LSTM. Wang et al. demonstrated that the order of CNNs and LSTMs had a significant impact on the accuracy of their method [43]. Jiang et al. presented a method that utilizes LSTMs to monitor current random usage and CNNs to track long-term regular patterns [44]. Hence, this paper employs an innovative approach by focusing on output dimensions to reduce the input data size while keeping accuracy in mind. Compared to single models, hybrid models incorporate the strengths of several different models [43, 45]. In recent years, the self-attention mechanism (SAM) has become a hot spot in load forecasting. Zhou's work [46] and the work of IBM Watson [47] both used SAM after an LSTM neural network for one-dimensional inputs in natural language processing. SAM is also useful for load forecasting [48]. Using SAM, Zhao highlighted the important information by relating different probabilities [49].

An innovative method of enhancing LSTM output using CNN is presented in this paper, which focuses on the output dimension, called LSTM-CNN-based SAM. This paper makes the following main contributions:

1. We are the first to employ LSTM-CNN in STLF. The revolutionary LSTM-CNN-based SAM model can reduce the size of input data without compromising forecasting accuracy. This research presents, in consideration of the output dimension, a hybrid framework for forecasting that uses solely load data.
2. We innovatively use convolution kernels to extract user randomness, resolve non-stationarity characteristics, and circumvent SAM's local dependence issue.
3. We conduct exhaustive experiments at various aggregation levels and time periods to demonstrate the superiority of the model. Compared to the benchmarks, the proposed strategy is superior by more than 10%. In addition, the MAPE is optimized by 2.2% and 5%, respectively, compared to earlier research.
The LSTM-CNN approach is composed of a series of LSTM and CNN connections. Load data are typically separated into trend, seasonal, and residual categories [50], using LSTM to

Last, the output is decided. We use a sigmoid layer named "the output layer" to decide which part of the cell state should be output. Then, we multiply it with the cell states after tanh to
FIGURE 5 The frequency domain and time domain characteristics of LSTM layer’s output, CNN layer’s output, and real load
When these two conditions are met, the sequence is considered stationary. For statistical characteristics of nonstationary time series (the expectation and variance vary over time), the data distribution changes continuously, and the value of the data at the current time is not related to that a long time ago [53]. The prediction is made by determining the relationship between the past and the present based on the premise of stationary time series. Owing to the continually shifting distribution of time series, the relationship between the past and present for nonstationary sequences changes with time, making prediction challenging. Finally, inspired by the behavior of nonstationary time series, this study resolves the nonstationary time-series problem by employing a convolution kernel. In contrast to fully connected networks, convolution has local connectivity. That is, it only extracts the connections between input elements within the filter size m [42]. This procedure can forecast current load data using historical data, which can reduce the dimension of the input data.

The role of the convolution operation on nonstationarity is then investigated. It is evident from Figure 4 that load data are typically nonstationary, having time-varying variances and expectations. The model's input is a sequential load in time T, that is, L = {l₁, l₂, l₃, …, lₜ, …, l_T}.

For forward propagation, the output is proportional to the filter size m, which is less than the time T:

l_cnnout = σ(∑_m (l_cnnin ⊗ W_cnn + b_cnn)),  (8)

where l_cnnout is the output of the CNN layer; l_cnnin is the input of the CNN layer; W_cnn is the weight matrix, and b_cnn is the bias vector; σ is the sigmoid activation function.

For backward propagation, the work [53] provides a novel optimization strategy to address the non-stationarity of the time series. It is argued that for a nonstationary time series, the parameters should not be updated with gradient values that are far off from the present time. It provides a time frame k, where k ≪ T, for updating the gradient:

θᵢ = θᵢ₋₁ − αgᵢ,  (9)

where θᵢ is the parameter at iteration step i, θᵢ₋₁ is the parameter at iteration step i − 1, α is the learning rate, and gᵢ is the calculated gradient of parameter θ.

The gradient calculation proceeds as follows both before and after joining the convolution operation.

1) Before joining, gradients are equal at this stage because the output of the LSTM layer is directly coupled to the input of the SAM layer:

δ_LSTM = δ_SAM,  (10)

where δ_LSTM is the gradient of the LSTM layer's output, and δ_SAM is the gradient of the SAM layer.

2) After joining,

δ_LSTM = δ_SAM ∗ ∂l_cnnout/∂l_cnnin,  (11)

δ_LSTM = δ_SAM ∗ ∂σ(∑_m (l_cnnin ⊗ W_cnn + b_cnn))/∂l_cnnin,  (12)

δ_LSTM = δ_SAM ∗ ROT180(W_cnn) ⊙ σ′(l_cnnin),  (13)

where σ′(l_cnnin) represents the sigmoid's gradient at value l_cnnin.
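The forward rule in Eq. (8) and the flipped-kernel backward rule in Eq. (13) can be illustrated with a minimal NumPy sketch. This is only a single-filter, stride-1 sketch under assumed simplifications (the function names and scalar bias are illustrative, not the authors' implementation); it also exhibits the local-connectivity property: each output depends only on the m inputs inside its window.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_forward(l_in, w, b):
    """Forward pass in the spirit of Eq. (8): slide a filter of size m
    over the load sequence; each output uses only the m inputs in its
    window (local connectivity), unlike a fully connected layer."""
    m, T = len(w), len(l_in)
    out = np.empty(T - m + 1)
    for t in range(T - m + 1):
        out[t] = sigmoid(np.dot(l_in[t:t + m], w) + b)
    return out

def conv1d_backward(l_in, w, b, delta_out):
    """Backward pass in the spirit of Eqs. (11)-(13): the gradient
    w.r.t. each input is the upstream gradient convolved with the
    flipped (ROT180) kernel, scaled by the sigmoid derivative."""
    m, T = len(w), len(l_in)
    delta_in = np.zeros(T)
    for t in range(T - m + 1):
        s = sigmoid(np.dot(l_in[t:t + m], w) + b)
        # chain rule: upstream grad * sigma'(z) * w, spread over the window
        delta_in[t:t + m] += delta_out[t] * s * (1.0 - s) * w
    return delta_in
```

Perturbing an input outside a given window leaves that window's output unchanged, which is exactly the local dependence within filter size m described above.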
fully connected layer with a sigmoid activation function. With two layers of dropout LSTMs, the output dimension remains [256, 1, 128], where the hidden size is 128. The data are made four-dimensional [1, 256, 128, 1], followed by two-dimensional convolution with a kernel size of 8 and output dimensions of [256, 1, 128]. We use the SAM to obtain outstanding features, and then we transform the output dimension to the predicted dimension using the linear fully connected layer, where the output size is 1. The proposed LSTM-CNN-SAM architecture is shown in Table 2. The number of parameters of the model is 52,416.

All models are implemented on the Nvidia RTX 2080 Ti GPU using Tensorflow 1.1.0 as the back end in the Python 3.6 environment [57]. Moreover, the models are trained by the Adam optimizer with default parameters to minimize the mean square error (MSE) [58]:

L = (1/n) ∑ᵢ₌₀ⁿ (ŷᵢ − yᵢ)².  (18)

3 CASE STUDIES

3.1 Data preprocessing and evaluation index

Missing data pose a serious problem in load forecasting. Deleting or filling is one way to deal with missing data and outliers caused by objective factors. Considering that direct elimination results in discontinuity, we apply a judgment condition to select the filling method by comparing whether less or more than 10% of 24 hours' data are missing.

To reduce the effect of dimension and effectively improve accuracy, we use standardization methods such as min-max standardization and Z-score standardization. While the min-max standardization is near zero when the data are concentrated, the Z-score standardization avoids this problem by keeping the values in a normal distribution, showing how feature changes impact load forecasting. In this case, raw data are processed using the Z-score standardization method. The formula is z = (x − μ)/σ, where μ and σ are the mean and standard deviation of the raw data.
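A minimal sketch of this preprocessing stage follows. The 10% threshold per 24-h window comes from the text; the specific fill methods (linear interpolation for sparse gaps, window-mean fill otherwise) are assumptions for illustration, since the excerpt does not name them.

```python
import numpy as np

def fill_missing(day, threshold=0.10):
    """Assumed filling rule: if less than `threshold` of a 24-h window
    is missing, linearly interpolate the gaps; otherwise fall back to a
    coarser fill (here, the window mean)."""
    day = np.array(day, dtype=float)
    mask = np.isnan(day)
    if mask.mean() < threshold:
        idx = np.arange(len(day))
        day[mask] = np.interp(idx[mask], idx[~mask], day[~mask])
    else:
        day[mask] = np.nanmean(day)
    return day

def z_score(x):
    """Z-score standardization: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```

After standardization, the series has zero mean and unit variance, so features of very different magnitudes contribute comparably to training.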
TABLE 3 Comparison of performance to benchmarks. The experiments are performed in eight different buildings (B1 to B8)
of parameters. The LSTM-CNN-based SAM model provides a greater performance improvement in B1–B8 because even a kernel of the CNN can increase the model's generalization ability and resolve the sequence's nonstationarity. The connection with SAM enables the determination of local and long-term sequence relationships.

Additionally, as illustrated in the figures, our proposed technique offers the highest accuracy and stability. Experiments show that the proposed LSTM-CNN-based SAM model outperforms existing power forecasting methods and is a competitive method for power consumption prediction. The predictions are closer to the actual values, even at peaks or valleys. In general, the experimental results indicate that the proposed LSTM-CNN-based SAM model achieves superior performance to the other methods for power forecasting, and it is a competitive method for power consumption prediction.

3.3 Performance on the different aggregations and time horizons

3.3.1 Performance on the industrial data set

This paper first validates the proposed method's performance on an industrial data set, which includes the consumption data of one factory in an actual area in China. It covers 1 July 2020 to
FIGURE 8 Load forecasting results. The closer a curve is to the original values (the black line in the graph), the better the prediction is
TABLE 4 Performance on the industrial data set

Model | MAPE↓ | MAE↓ | RMSE↓
LSTM | 0.196 | 83.866 | 111.850
LSTM-based SAM | 0.202 | 91.362 | 124.304
CNN-GRU-based SAM | 0.209 | 94.659 | 123.279
LSTM-CNN-based SAM | 0.164 | 70.154 | 94.030

there is a significant amount of noise and outliers. As a result, the relative indicators MAE and RMSE are subject to increasing values. The aggregation of plant load is greater than that of individual consumers. Because the plant operates on a schedule, holidays have a more significant impact on the load. As seen in the table, it outperforms the other three models by more than 15% on the indicators MAE and MAPE, which are prone to noise and outliers. As a result, the suggested model retains superior predictive ability on the actual data set.

has a 0.022 lower MAPE, a 1.696 lower MAE, and a 2.541 lower RMSE than the model selection method in [32], implying that the proposed method has a higher degree of precision. The literature makes use of RL to pick models. This is the first study to propose this concept for load prediction, which is novel. However, the proposed model is better than the method in [32], which uses machine learning models in the model pool; these have been confirmed to have poorer performance than the deep learning method.

FIGURE 10 MAPE of the models. The horizontal line in the box plot represents the mean value, and the scattered points represent the convergence. The more scattered the points are, the more unstable the model is
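The three indicators compared in Table 4 can be computed as below; MAPE is written as a fraction to match the 0.164-style values in the table (an illustrative sketch; the paper may equivalently scale MAPE by 100).

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (lower is better), as a fraction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mae(y_true, y_pred):
    """Mean absolute error; expressed in the load's physical units, so it
    grows with the aggregation level (plant vs. household)."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large misses such as peaks."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(diff ** 2))
```

Because MAE and RMSE are absolute, they rise with plant-level aggregation and with noise/outliers, which is why the text reads them alongside the scale-free MAPE.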
TABLE 6 Performance on the residential data set

Model | MAPE↓
LSTM | 1.095
LSTM-based SAM | 1.180
CNN-GRU-based SAM | 1.356
LSTM-CNN-based SAM | 0.351
LSTM with weather data input [23] | 0.401

TABLE 7 Generalization capability

Model | MAPE↓ | MAE↓ | RMSE↓
LSTM | 0.070 | 2.991 | 3.904
LSTM-based SAM | 0.067 | 2.884 | 3.826
CNN-GRU-based SAM | 0.079 | 3.370 | 4.323
LSTM-CNN-based SAM_B2 | 0.027 | 1.144 | 1.589
LSTM-CNN-based SAM_B1_B2 | 0.040 | 1.748 | 2.460

in particular, has a higher degree of randomness because the turning on and off of electrical appliances is entirely random, and the total amount of residential electricity consumption is small. Thus the use of individual high-powered electrical appliances can cause significant load fluctuations. W. Kong's work used multilayer LSTM with weather data for residential load forecasting. The results, given in Table 6, show the performance of the methods. Compared with W. Kong's work, the suggested method has a 0.05 lower MAPE, suggesting that the proposed method is superior.

3.4 Generalization capability of the proposed model

We verify the model's generalization capability by using training sets and test sets with different data distributions. The first 80% of the B1 building load data in the UTD data set is used as the training set and verification set at a ratio of 0.6/0.2, and the last 20% of the B2 building load data in the UTD data set is used as the test set. Experimental results are compared to test results, which use B2 building load data to divide the training and test sets. The experimental results are shown in Table 7. While the experimental results are not as good as those of the LSTM-CNN-based SAM that uses B2 directly as the test set, they still exceed the performance of the other three benchmarks. As a result, the model has a greater capacity for generalization.

4 CONCLUSION

An LSTM-CNN-based SAM model is proposed in this paper. The model is suitable for load forecasting at different aggregation levels, including residential, building, and industrial levels, and different time horizons, including hourly, half-hourly, and 15-min intervals. The essential characteristic of this model is its ability to reduce data inputs and improve load forecasting accuracy. We develop an output-dimensioned load forecasting model. For the extraction of time features, non-stationarity, and long-term dependency, we use the LSTM, CNN, and SAM components, respectively, based on the characteristics of load data. First, the model is compared with LSTM, LSTM-based SAM, and CNN-GRU-based SAM at three different aggregation levels and time horizons obtained from two widely known public data sets and one real area in China. The proposed method increases the performance of the fundamental load forecasting models by more than 10%, and by 2.2% and 5% compared to earlier research (Feng's work [32] and Kong's work [23]). After that, we conduct experiments to verify the model's generalization ability. We train the model using different data sets and then test its results with the same data set. The results demonstrate that the model is capable of generalizing well. In the future, we intend to use the proposed method to tackle nonstationary forecasting issues like wind power forecasting and stock forecasting.

AUTHOR CONTRIBUTIONS
Shiyan Yi: Conceptualization, investigation, methodology, software, visualization, writing - original draft, writing - review and editing. Haichun Liu: Conceptualization, formal analysis, validation, writing - review and editing. Tao Chen: Validation, writing - review and editing. Jianwen Zhang: Conceptualization, data curation, funding acquisition, supervision, validation, writing - review and editing. Yibo Fan: Funding acquisition, project administration, resources, supervision, writing - review and editing.

ACKNOWLEDGEMENTS
This work was supported in part by the National Natural Science Foundation of China (Grant No. 52277193), in part by the Shanghai Committee of Science and Technology (19DZ1205403), in part by the National Natural Science Foundation of China (Grant No. 62031009), in part by the Alibaba Innovative Research (AIR) Program, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), in part by the Fudan-ZTE Joint Lab, in part by the Pioneering Project of Academy for Engineering and Technology Fudan University (gyy2021-001), and in part by the CCF-Alibaba Innovative Research Fund For Young Scholars.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the Australian Government Department of the Environment and Energy at https://data.gov.au/data/dataset/smart-grid-smart-city-customer-trial-data, and in IEEE Dataport at https://doi.org/10.21227/jdw5-z996.

ORCID
Shiyan Yi https://orcid.org/0000-0002-0744-358X
REFERENCES
1. Zhang, G., Guo, J.: A novel ensemble method for hourly residential electricity consumption forecasting by imaging time series. Energy 203, 117858 (2020)
2. Wang, Y., Chen, Q., Hong, T., Kang, C.: Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Trans. Smart Grid 10(3), 3125–3148 (2018)
3. Lu, W., Sun, Y., Li, B., Chang, L.: Research and application of "source-network-load-storage" coordination optimization technology based on cloud platform. In: 2021 4th International Conference on Energy, Electrical and Power Engineering (CEEPE), pp. 813–818. IEEE, Piscataway (2021)
4. Da Silva, P.G., Ilić, D., Karnouskos, S.: The impact of smart grid prosumer grouping on forecasting accuracy and its benefits for local electricity market trading. IEEE Trans. Smart Grid 5(1), 402–410 (2013)
5. Douglas, A.P., Breipohl, A.M., Lee, F.N., Adapa, R.: Risk due to load forecast uncertainty in short term power system planning. IEEE Trans. Power Syst. 13(4), 1493–1499 (1998)
6. Kim, T.-Y., Cho, S.-B.: Predicting residential energy consumption using cnn-lstm neural networks. Energy 182, 72–81 (2019)
21. Kong, X., Li, C., Zheng, F., Wang, C.: Improved deep belief network for short-term load forecasting considering demand-side management. IEEE Trans. Power Syst. 35(2), 1531–1538 (2019)
22. Chen, K., Chen, K., Wang, Q., He, Z., Hu, J., He, J.: Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 10(4), 3943–3952 (2018)
23. Kong, W., Dong, Z.Y., Jia, Y., Hill, D.J., Xu, Y., Zhang, Y.: Short-term residential load forecasting based on lstm recurrent neural network. IEEE Trans. Smart Grid 10(1), 841–851 (2017)
24. Yao, C., Yang, P., Liu, Z.: Load forecasting method based on cnn-gru hybrid neural network. Power Syst. Technol. 44, 3416–3424 (2020)
25. Hongchuan, C., Xu, C., Guoqi, S., Xiaobin, W., Yunfeng, C., Xuefeng, S., Hui, S., Lingyan, Z.: Similar day short-term load forecasting based on intelligent optimization method. Power Syst. Protect. Control 49, 121–127 (2021)
26. Zheng, H., Yuan, J., Chen, L.: Short-term load forecasting using emd-lstm neural networks with a xgboost algorithm for feature importance evaluation. Energies 10(8), 1168 (2017)
27. Park, R.-J., Song, K.-B., Kwon, B.-S.: Short-term load forecasting algorithm using a similar day selection method based on reinforcement learning.
7. Chen, Y., Luh, P.B., Guan, C., Zhao, Y., Michel, L.D., Coolbeth, M.A., Energies 13(10), 2640 (2020)
Friedland, P.B., Rourke, S.J.: Short-term load forecasting: Similar day-based 28. Kong, W., Dong, Z.Y., Hill, D.J., Luo, F., Xu, Y.: Short-term residential load
wavelet neural networks. IEEE Trans. Power Syst. 25(1), 322–330 (2009) forecasting based on resident behaviour learning. IEEE Trans. Power Syst.
8. Barman, M., Choudhury, N.D., Sutradhar, S.: A regional hybrid goa-svm 33(1), 1087–1088 (2017)
model based on similar day approach for short-term load forecasting in 29. Ji, Y., Buechler, E., Rajagopal, R.: Data-driven load modeling and forecast-
assam, india. Energy 145, 710–720 (2018) ing of residential appliances. IEEE Trans. Smart Grid 11(3), 2652–2661
9. Massaoudi, M., Refaat, S.S., Chihi, I., Trabelsi, M., Oueslati, F.S., Abu-Rub, (2019)
H.: A novel stacked generalization ensemble-based hybrid lgbm-xgb-mlp 30. Yahui, L., Qian, Z.: Ultra-short-term power load forecasting method
model for short-term load forecasting. Energy 214, 118874 (2021) based on cluster empirical mode decomposition of cnn-lstm. Power Syst.
10. Cai, L., Gu, J., Jin, Z.: Two-layer transfer-learning-based architecture for Technol. 44, 1–8 (2021)
short-term load forecasting. IEEE Trans. Ind. Inf. 16(3), 1722–1732 31. Deng, D., Li, J., Zhang, Z., Teng, Y., Huang, Q.: Short-term electric load
(2019) forecasting based on eemd-gru-mlr. Power Syst. Technol. 44(2), 593–602
11. Li, J., Deng, D., Zhao, J., Cai, D., Hu, W., Zhang, M., Huang, Q.: A novel (2020)
hybrid short-term load forecasting method of smart grid using mlr and 32. Feng, C., Sun, M., Zhang, J.: Reinforced deterministic and probabilis-
lstm neural network. IEEE Trans. Ind. Inf. 17(4), 2443–2452 (2020) tic load forecasting via q-learning dynamic model selection. IEEE Trans.
12. Vu, D.H., Muttaqi, K.M., Agalgaonkar, A.P., Bouzerdoum, A.: Short-term Smart Grid 11(2), 1377–1386 (2019)
electricity demand forecasting using autoregressive based time varying 33. Li, Y., Zhang, S., Hu, R., Lu, N.: A meta-learning based distribution system
model incorporating representative data adjustment. Appl. Energy 205, load forecasting model selection framework. Appl. Energy 294, 116991
790–801 (2017) (2021)
13. Fang, T., Lahdelma, R.: Evaluation of a multiple linear regression model 34. Somu, N., MR, G.R., Ramamritham, K.: A deep learning framework for
and sarima model in forecasting heat demand for district heating system. building energy consumption forecast. Renew. Sustain. Energy Rev. 137,
Appl. Energy 179, 544–552 (2016) 110591 (2021)
14. Mi, J., Fan, L., Duan, X., Qiu, Y.: Short-term power load forecasting 35. Eskandari, H., Imani, M., Moghaddam, M.P.: Convolutional and recurrent
method based on improved exponential smoothing grey model. Math. neural network based model for short-term load forecasting. Electr. Power
Prob. Eng. 2018, 3894723 (2018) Syst. Res. 195, 107173 (2021)
15. Wang, D., Luo, H., Grunder, O., Lin, Y., Guo, H.: Multi-step ahead 36. Tian, C., Ma, J., Zhang, C., Zhan, P.: A deep neural network model for
electricity price forecasting using a hybrid model based on two-layer short-term load forecast based on long short-term memory network and
decomposition technique and bp neural network optimized by firefly convolutional neural network. Energies 11(12), 3493 (2018)
algorithm. Appl. Energy 190, 390–407 (2017) 37. Senjyu, T., Takara, H., Uezato, K., Funabashi, T.: One-hour-ahead load
16. Chen, Y., Xu, P., Chu, Y., Li, W., Wu, Y., Ni, L., Bao, Y., Wang, K.: Short- forecasting using neural network. IEEE Trans. Power Syst. 17(1), 113–118
term electrical load forecasting using the support vector regression (svr) (2002)
model to calculate the demand response baseline for office buildings. Appl. 38. Ziel, F.: Modeling public holidays in load forecasting: a german case study.
Energy 195, 659–670 (2017) J. Mod. Power Syst. Clean Energy 6(2), 191–207 (2018)
17. Cai, M., Pipattanasomporn, M., Rahman, S.: Day-ahead building-level load 39. Amjady, N.: Short-term hourly load forecasting using time-series modeling
forecasts using deep learning vs. traditional time-series techniques. Appl. with peak load estimation capability. IEEE Trans. Power Syst. 16(3), 498–
Energy 236, 1078–1088 (2019) 505 (2001)
18. Bouktif, S., Fiaz, A., Ouni, A., Serhani, M.A.: Optimal deep learning lstm 40. Boudraa, A.-O., Cexus, J.-C.: Emd-based signal filtering. IEEE Trans.
model for electric load forecasting using feature selection and genetic Instrum. Meas. 56(6), 2196–2202 (2007)
algorithm: Comparison with machine learning approaches. Energies 11(7), 41. Yin, L., Xie, J.: Multi-temporal-spatial-scale temporal convolution net-
1636 (2018) work for short-term load forecasting of power systems. Appl. Energy 283,
19. Marino, D.L., Amarasinghe, K., Manic, M.: Building energy load fore- 116328 (2021)
casting using deep neural networks. In: IECON 2016-42nd Annual 42. Tan, M., Santos, C.d., Xiang, B., Zhou, B.: Lstm-based deep learning
Conference of the IEEE Industrial Electronics Society, pp. 7046–7051. models for non-factoid answer selection. arXiv preprint arXiv:1511.04108
IEEE, Piscataway (2016) (2015)
20. Shi, H., Xu, M., Li, R.: Deep learning for household load forecasting- novel 43. Wang, K., Qi, X., Liu, H.: Photovoltaic power forecasting based
pooling deep rnn. IEEE Trans. Smart Grid 9(5), 5271–5280 (2017) lstm-convolutional network. Energy 189, 116225 (2019)
17518695, 2023, 7, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/gtd2.12763 by Cochrane Japan, Wiley Online Library on [09/12/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1552 YI ET AL.
44. Jiang, L., Wang, X., Li, W., Wang, L., Yin, X., Jia, L.: Hybrid multitask short-term load forecasting. Int. J. Electr. Power Energy Syst. 109, 470–479
multi-information fusion deep learning for household short-term load (2019)
forecasting. IEEE Trans. Smart Grid 12(6), 5362–5372 (2021) 55. Gao, C., Zhang, N., Li, Y., Bian, F., Wan, H.: Self-attention-based time-
45. Aslam, S., Herodotou, H., Mohsin, S.M., Javaid, N., Ashraf, N., Aslam, S.: variant neural networks for multi-step time series forecasting. Neural
A survey on deep learning methods for power load and renewable energy Comput. Appl. 34(11), 8737–8754 (2022)
forecasting in smart microgrids. Renew. Sustain. Energy Rev. 144, 110992 56. Wu, S., Wang, G., Tang, P., Chen, F., Shi, L.: Convolution with even-
(2021) sized kernels and symmetric padding. Advances in Neural Information
46. Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., Xu, B.: Attention-based Processing Systems, vol. 32. MIT Press, Cambridge, MA (2019)
bidirectional long short-term memory networks for relation classifica- 57. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin,
tion. In: Proceedings of the 54th Annual Meeting of the Association for M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: A system for
Computational Linguistics, vol. 2, pp. 207–212. ACM, New York (2016) large-scale machine learning. In: 12th USENIX symposium on operating
47. Lin, Z., Feng, M., Santos, C.N.d., Yu, M., Xiang, B., Zhou, B., systems design and implementation (OSDI 16), pp. 265–283. USENIX
Bengio, Y.: A structured self-attentive sentence embedding. arXiv preprint Association, Berkeley, CA (2016)
arXiv:1703.03130 (2017) 58. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv
48. Zang, H., Xu, R., Cheng, L., Ding, T., Liu, L., Wei, Z., Sun, G.: Residen- preprint arXiv:1412.6980 (2014)
tial load forecasting based on lstm fusing self-attention mechanism with 59. Zhang, J., Feng, C.: Short-term load forecasting data with hierarchical
pooling. Energy 229, 120682 (2021) advanced metering infrastructure and weather features (2019). Available:
49. Zhao, B., Wang, Z., Ji, W., Gao, X., Li, X.: A short-term power load fore- https://doi.org/10.21227/jdw5-z996
casting method based on attention mechanism of cnn-gru. Power Syst. 60. Ausgrid: Smart-grid smart-city customer trial data (2015). Avail-
Technol. 43(12), 4370–4376 (2019) able: https://data.gov.au/data/dataset/smart-grid-smart-city-customer-
50. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice. trial-data
OTexts, Melbourne, Australia (2018)
51. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Com-
put. 9(8), 1735–1780 (1997)
52. Pan, Q., Hu, W., Chen, N.: Two birds with one stone: Series saliency for
How to cite this article: Yi, S., Liu, H., Chen, T.,
accurate and interpretable multivariate time series forecasting. In: Interna-
tional Joint Conference on Artificial Intelligence (IJCAI). Springer, Cham Zhang, J., Fan, Y.: A deep LSTM-CNN based on
(2021) self-attention mechanism with input data reduction for
53. Zhang, Y., Wang, Y., Luo, G.: A new optimization algorithm for non- short-term load forecasting. IET Gener. Transm.
stationary time series prediction based on recurrent neural networks. Distrib. 17, 1538–1552 (2023).
Future Gener. Comput. Syst. 102, 738–745 (2020)
https://doi.org/10.1049/gtd2.12763
54. Wang, S., Wang, X., Wang, S., Wang, D.: Bi-directional long short-term
memory method based on attention mechanism and rolling update for