Short-Term Residential Load Forecasting Based On LSTM Recurrent Neural Network
Abstract—As the power system is facing a transition toward a more intelligent, flexible, and interactive system with higher penetration of renewable energy generation, load forecasting, especially short-term load forecasting for individual electric customers, plays an increasingly essential role in future grid planning and operation. Unlike aggregated residential load at a large scale, forecasting the electric load of a single energy user is fairly challenging due to the high volatility and uncertainty involved. In this paper, we propose a long short-term memory (LSTM) recurrent neural network-based framework, built on one of the latest and most popular deep learning techniques, to tackle this tricky issue. The proposed framework is tested on a publicly available set of real residential smart meter data, on which its performance is comprehensively compared to various benchmarks, including the state of the art in the field of load forecasting. As a result, the proposed LSTM approach outperforms the other listed rival algorithms in the task of short-term load forecasting for individual residential households.

Index Terms—Short-term load forecasting, recurrent neural network, deep learning, residential load forecasting.

Manuscript received March 12, 2017; revised July 6, 2017; accepted September 12, 2017. Date of publication September 18, 2017; date of current version December 19, 2018. Paper no. TSG-00351-2017. (Corresponding author: Zhao Yang Dong.)
W. Kong, Z. Y. Dong, and Y. Zhang are with the School of EE&T, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]; [email protected]; [email protected]).
Y. Jia is with the Hong Kong Polytechnic University, Hong Kong.
D. J. Hill is with the School of EIE, University of Sydney, Sydney, NSW 2006, Australia, and also with the University of Hong Kong, Hong Kong (e-mail: [email protected]).
Y. Xu is with the Nanyang Technological University, Singapore (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2017.2753802

I. INTRODUCTION

Load forecasting has been an essential task throughout the development of the modern power system. Long-term load forecasting aims to assist in power system infrastructure planning, while mid-term and short-term load forecasting are essentially useful for system operations.

The modern power system is now moving towards a more sustainable one, whereas the increasing penetration of renewables, electric vehicles (EVs) and time-varying load demand in the distribution grids inevitably adds an extra layer of system complexity and uncertainty. It is evident that system stability is undergoing unprecedented challenges due to the intermittency of renewable generation and the complex nature of utility-customer interactions and dynamic behaviours. To this end, accurate short-term electric load forecasting at the level of residential customers can significantly facilitate power system operations. With effective load forecasting, peak load shaving can be achieved by flexibly engaging energy storage (ES) systems or intelligent demand response (DR) technologies. From the utilities' point of view, if accurate load forecasts for individual customers are available, these electricity suppliers can rely on such information to target the groups of customers with the highest potential to participate in DR programmes in the event of power deficiency. In this case, the utilities would be more confident about the demand reduction in incentive-based DR programmes, which is important for providing load-balancing reserve and/or hedging market costs.

The ongoing worldwide expansion of smart meter infrastructure (SMI) has laid the groundwork to drive conventional power systems towards smart grids. This massive deployment has also created opportunities for short-term load forecasting for individual customers.

Regarding short-term load forecasting methodologies, many approaches have been reported in the literature to address this problem. However, very few of them have confronted individual customers directly. Customer-wise forecasts were conventionally considered trivial due to the volatile nature of individual loads. Therefore, the issues of short-term household load forecasting remain open.

Recently, deep learning has become one of the most active technologies in many research areas. As opposed to shallow learning, deep learning usually refers to stacking multiple layers of neural networks and relying on stochastic optimisation to perform machine learning tasks. A varying number of layers can provide different levels of abstraction to improve the learning ability and task performance [1]. In particular, the long short-term memory (LSTM) recurrent neural network (RNN), which was originally introduced by Hochreiter and Schmidhuber [2], has received enormous attention in the realm of sequence learning. Effective applications based on LSTM networks have been reported in many areas such as natural language translation [3], image captioning [4]–[6] and speech recognition [7]. Most of them handle data classification, whereas applications on regression tasks are relatively limited.

In this paper, we aim to make contributions to addressing the issues of short-term residential load forecasting. First, an exploratory customer-wise data analysis is conducted to
compare the difference between the load at the system level and the loads of individual households. A density-based clustering technique is employed to evaluate the inconsistency in the residential load profiles and to indicate the difficulty of the load forecasting problem for the residential sector. A justification of why the LSTM is suitable for this task is given. Then, a residential load forecasting framework based on the LSTM is described. In addition, a conservative empirical optimisation-based predictor is developed to serve as one of the many benchmarks. In the case study, we demonstrate that although the empirical predictor occasionally outperforms other machine learning based predictors in some extremely volatile cases, the LSTM can capture the subtle temporal consumption patterns persisting in single-meter load profiles and produces the best forecasts for the majority of cases. Finally, on the aggregation level, we also demonstrate that aggregating individual forecasts is generally more accurate than directly forecasting the aggregated load with our proposed approach.

The rest of the paper is organised as follows. Section II provides the background of the load forecasting community. Section III conducts an explorative data analysis to visualise the challenge of the single-meter load forecasting problem. Section IV introduces the LSTM framework for short-term residential load forecasting. Section V introduces the other benchmarks, including two empirical predictors. The testing dataset and experimental results are given in Section VI, while Section VII concludes the paper.

II. LITERATURE STUDY

There have been many research works in the area of short-term load forecasting. Cao et al. [8] adopted the autoregressive integrated moving average (ARIMA) model and a similar-day method for intraday load forecasting. The mechanism of their similar-day method is to group the targeted day with meteorologically similar days in the history and predict the load based on the average demand of those days. It was demonstrated that ARIMA performs better on ordinary days, while the similar-day method wins on unordinary days. Radial basis function (RBF) neural networks have been used to address short-term load forecasting in [9] and [10]. Yun et al. [9] combine the RBF neural network with the adaptive neural fuzzy inference system (ANFIS) to adjust the prediction by taking account of the real-time electricity price. Li et al. [10] addressed the short-term day-ahead forecasting problem via a grid method combined with back propagation (BP) and RBF neural networks. The grid method is similar to the similar-day method in [8], but instead of grouping days with similar meteorological measurements, it groups load profiles according to the location, nature and size of the loads. For each group, they trained a BP network and an RBF network to predict the day-ahead load demand. Qingle and Min [11] also proposed a neural network based predictor for very short term load forecasting. It takes only the load values of the current and previous time steps as the input to predict the load value at the coming time step. Zhang et al. [12] used an ensemble of extreme learning machines (ELMs) to learn and forecast the total load of the Australian national energy market. The proposed methodology not only made use of the supreme ELM learning efficiency for self-adaptive learning but also used the ensemble structure to mitigate the instability of the forecasts. Recently, the k-nearest neighbour (KNN) algorithm has also seen some successful examples in load forecasting [13], [14], and its dominant advantage is its efficiency. Ghofrani et al. [15] proposed a dedicated input selection scheme to work with a hybrid forecasting framework using wavelet transformation and a Bayesian neural network. To the best of our knowledge, this approach achieved the most accurate forecasting performance for system-level load forecasting, with a mean absolute percentage error (MAPE) as small as 0.419% on average.

However, all of the above methods focus on learning and forecasting load at the system or substation level. In order to support future smart grid applications, effective load forecasting techniques for individual electricity users are gaining increasing interest. Zhang et al. [16] developed a big data architecture that combines load clustering based on smart meter data and a decision tree to select the corresponding load forecasting model for prediction. Quilumba et al. [17] adopted a similar methodology based on smart meter data clustering and neural networks to make intraday load forecasts at the system level. Stephen et al. [18] clustered and labelled the daily historical data of individual households. The individual households were deemed as label sequences, which were further fit to Markov chains. Then the day-ahead label can be sampled, and cluster means at each time point were used for the day-ahead prediction. These works all showed that forecasting errors could be reduced by effectively grouping different customers based on their daily profiles. However, they all only reported the aggregated load forecasting error at the system or community level, where individual customer prediction errors could be offset by the diversity of different end users.

In the existing literature, Chaouch's work [19] and the work of Ghofrani et al. [20] are the first two examples that focus on load forecasting for individual users. A functional time series forecasting approach was proposed, and the daily median absolute errors were reported in [19], but the more commonly used metric of mean absolute percentage error (MAPE) was not reported. It is thereby unsuitable to serve as a benchmark for experimental comparisons. A Kalman filter estimator was used, and a MAPE ranging from 18% to 30% was obtained in [20]. Very recently, deep learning based methods have started to emerge in the load forecasting community. Ryu et al. [21] showed that the load forecasting accuracy for industrial customers could be improved by using deep neural networks. In fact, industrial electricity consumption patterns are much more regular than residential ones, so that a much more accurate result of about 8.85% MAPE on average is achieved compared to the results in [20]. Mocanu et al. [22] employed a factored conditional restricted Boltzmann machine, one of the deep learning methods, for single-meter residential load forecasting and observed a notable improvement compared to the performance of shallow artificial neural networks and support vector machines. Similarly, Marino et al. [23] made the first attempt towards the same load forecasting
issue by using LSTM and demonstrated comparable results to those in [22]. However, the effectiveness of these two pioneering works was only verified on the metric of root mean square error (RMSE) instead of the more common metric of MAPE, which makes it hard to contrast them with other works. Also, the similar work of [23] lacked a justifiable discussion of the reason for choosing the LSTM. Moreover, all the different approaches in [19], [20], [22], and [23] were only verified on a single residential household, so the aggregated effect of the customer-wise load forecasts cannot be evaluated.

III. EXPLORATORY DATA ANALYSIS AND PROBLEM IDENTIFICATION

The main challenge of the single-household load forecasting problem is its diversity and volatility. Thanks to the Smart Grid Smart City (SGSC) project initiated by the Australian Government [24], electrical load data for thousands of individual households are available.

A. Introduction of the Dataset

The SGSC was Australia's first commercial-scale smart grid project, initiated as early as 2010. During the 4-year project development, SGSC gathered smart meter data for about 10,000 different customers in New South Wales. These rich data are the key to enabling further study of many smart grid technologies in a real-world context. Apparently, short-term load forecasting for individual households is one line of smart grid research that can make use of this project.

Since we focus on load forecasting for a group of general individual families, it is unrealistic and time-consuming to go through every customer in the SGSC dataset. Therefore, a subset of the SGSC dataset is used for the proof of concept. The selection criterion is those households which possess a hot water system, which leads to a reasonably sized testing subset including 69 customers.

B. Exploratory Data Analysis

Unlike the electric load at the system level, individual residential loads lack the obvious consistent patterns that favour forecasting accuracy. Generally, the diversity at the aggregated level smooths the daily load profiles, which makes the substation load relatively more predictable, while the electricity consumption of a single customer depends more on the underlying human behaviour.

In an individual household, the daily routines and lifestyle of the residents and the types of major appliances possessed may have a more direct impact on the following short-term load profile. For example, some households may have somewhat fixed routines to switch on the laundry dryer after using their washing machines, which would imply a high possibility of major electricity consumption in the subsequent one or two hours.

Reference [18] also noted this phenomenon and used Practice Theory [25] to explain the root causes of energy usage at the residential level. The overall residential energy consumption is produced by the usage of appliances through an ensemble effect of 'practices', which can be interpreted as a series of doings governed by the diverse motives and intentions of people. Based on the characteristics of their underlying motives, these practices can be classified into four categories, namely 'Practical Understanding', 'Rules', 'Teleo-affective' and 'General Understanding' [18]. 'Practical Understanding' practices correspond to those routine activities that the actors know when to do and what to do, like taking a shower or doing laundry. 'Rules' practices refer to those operations that are constrained by the technical limits of the system. For example, the dishwasher has a pre-programmed washing procedure, and one can hardly control the duration and energy of the washing process. 'Teleo-affective' practices relate to goal-oriented behaviours, such as making a cup of coffee and watching TV. 'General Understanding' may refer to those practices with a greater degree of persistence and regular occurrence over time, such as religion related activities. All these four types of practices from every resident in a household constitute the interaction and engagement of various electric appliances and present themselves as the daily electricity consumption. The diversity and complexity of human beings render the loads in different houses extremely dissimilar. Some dwellers may have more regular lifestyles so that their energy usage is easier to forecast.

In order to justify the above observation, we use a density-based clustering technique, known as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [26], to evaluate the consistency in daily power profiles. The most obvious benefits of using DBSCAN for consistency analysis compared to other clustering techniques are that it does not require the number of clusters to be specified, and it has the notion of outliers. According to Practice Theory and [18], electricity consumption patterns will repeat with noise, so DBSCAN is an ideal clustering technique to identify outliers from a set of daily consumption profiles. If the result has fewer outliers, then the consistency is better.

In our case, each daily profile consists of 48 half-hourly readings, which can be viewed as a 48-dimension sample. DBSCAN requires three pre-determined parameters: the distance eps, the minimum number of samples in a cluster minSam, and the definition of the distance measure. In this paper, we set eps equal to 10% of the average daily energy consumption, minSam = 2, and use the Euclidean distance as the distance measure. We apply DBSCAN to the aggregated load and each of the 69 households over three months from 01/06/2013 to 31/08/2013, during which all 69 individual customers had complete data.
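The consistency check described above can be sketched in a few lines with scikit-learn. This is an illustrative sketch rather than the authors' code: the array name daily_profiles, the data loading and the 92-day placeholder data are assumptions, while the eps rule, min_samples = 2 and the Euclidean metric follow the settings stated in this section.

```python
# Illustrative sketch of the DBSCAN consistency check; not the authors' code.
# `daily_profiles` is assumed to be a (num_days, 48) array of half-hourly readings.
import numpy as np
from sklearn.cluster import DBSCAN

def count_outlier_days(daily_profiles):
    """Cluster daily profiles and count the days DBSCAN labels as noise (outliers)."""
    eps = 0.1 * daily_profiles.sum(axis=1).mean()   # 10% of average daily consumption
    labels = DBSCAN(eps=eps, min_samples=2, metric="euclidean").fit_predict(daily_profiles)
    return int(np.sum(labels == -1))                # DBSCAN marks noise points with -1

# Placeholder data for one customer over the 92-day window (01/06/2013 to 31/08/2013).
rng = np.random.default_rng(0)
daily_profiles = rng.random((92, 48))
print("outlier days:", count_outlier_days(daily_profiles))
```

Fewer outlier days indicate a more consistent customer; running the same routine on the aggregated profiles and on each of the 69 households supports the comparison discussed next.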
First of all, the aggregated daily profiles of all 69 households with the clustering results are shown in Fig. 1. It is clear that the daily profiles form only one cluster and have no outliers. In other words, the aggregated load is rather consistent, and its daily pattern is fairly obvious.

For the individual customers, the number of outliers (daily profiles that do not belong to any cluster) varies greatly from one household to another. This corresponds to the different lifestyles that each household may possess. The outlier distribution is shown in Fig. 2. Among all customers, we visualise the daily profiles of two opposite cases. Fig. 3 shows
s_t = g_t ⊙ i_t + s_{t−1} ⊙ f_t    (5)
h_t = φ(s_t) ⊙ o_t    (6)

where W_gx, W_gh, W_ix, W_ih, W_fx, W_fh, W_ox and W_oh are the weight matrices for the corresponding inputs of the network activation functions; ⊙ stands for element-wise multiplication; σ represents the sigmoid activation function, while φ represents the tanh function. The LSTM block structure at a single time step is illustrated in Fig. 5, and the un-rolled LSTM architecture connecting information of subsequent time points is shown in Fig. 6. The weights of the un-rolled LSTM are duplicated for an arbitrary number of time steps.

To train a simple one-layer LSTM recurrent network, one should specify the hyperparameter of the hidden output dimension n. In this case, the hidden output at a given time step h_t ∈ R^n is an n-dimensional vector, and so is s_t. The common initialisation for h_t and s_t is zero initialisation, i.e., h_0 = 0 and s_0 = 0. There are three sigmoid functions, with outputs ranging from 0 to 1, in an LSTM block, serving as the "soft" switches that decide which signals should pass the gates. If a gate is 0, then the signal is blocked by the gate. The decisions of the forget gate f, the input gate i and the output gate o all depend on the current input x_t and the previous output h_{t−1}. The signal of the input gate controls what to preserve in the internal state, while the forget gate controls what to forget from the previous state s_{t−1}. With the internal state updated, the output gate decides which part of the internal state s_t should be passed on as the LSTM output h_t. This process then repeats for the next time step. All the weights and biases are learnt by minimising the differences between the LSTM outputs and the actual training samples. Through this un-rolled structure, information of the current time step can be stored and maintained to affect the LSTM outputs of future time steps.
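For illustration, the single-step update of one LSTM block can be sketched in NumPy as below. The gate equations (1)-(4) appear on an earlier page and are not reproduced here, so the gate definitions in this sketch assume the standard form implied by the weight matrices listed above; the bias terms, dimensions and random initial weights are illustrative assumptions, and the actual model in this paper is built with Keras.

```python
# Sketch of one LSTM block update consistent with (5)-(6); gate equations assume
# the standard form implied by W_gx, W_gh, W_ix, W_ih, W_fx, W_fh, W_ox, W_oh.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, W, b):
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])   # block input
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])   # input gate
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])   # forget gate
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])   # output gate
    s_t = g_t * i_t + s_prev * f_t                             # internal state, eq. (5)
    h_t = np.tanh(s_t) * o_t                                   # hidden output, eq. (6)
    return h_t, s_t

n, d = 20, 58                       # hidden dimension and input width (illustrative)
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((n, d if k.endswith("x") else n))
     for k in ("gx", "gh", "ix", "ih", "fx", "fh", "ox", "oh")}
b = {k: np.zeros(n) for k in ("g", "i", "f", "o")}
h, s = np.zeros(n), np.zeros(n)     # zero initialisation, h_0 = s_0 = 0
h, s = lstm_step(rng.standard_normal(d), h, s, W, b)
```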
B. The LSTM Based Forecasting Framework

The short-term LSTM based forecasting framework is given in Fig. 7. The framework starts with data preprocessing for the inputs. In this study, the following input features are used:
1. the sequence of energy consumptions for the past K time steps, E = {e_{t−K}, ..., e_{t−2}, e_{t−1}} ∈ R^K;
2. the incremental sequence of the time-of-day indices for the past K time steps, I ∈ R^K, where each element of I ranges from 1 to 48;
3. the corresponding day-of-week indices for the past K time steps, D, each of which ranges from 0 to 6;
4. the corresponding binary holiday marks, H, each of which can either be 0 or 1.

Since the LSTM is sensitive to data scale, the four input vectors are scaled to the range (0, 1) according to each feature's nature. We adopt min-max normalisation for E, while the vectors I, D and H are encoded by a one-hot encoder. The one-hot encoder maps an original element from a categorical feature vector with cardinality M into a new vector with M elements, where only the corresponding new element is one while the rest of the new elements are zeroes.
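A sketch of this preprocessing step is given below. The look-back length K, the array names and the placeholder index values are illustrative assumptions; only the scaling choices (min-max for E, one-hot for I, D and H) follow the text.

```python
# Sketch of the input preprocessing: min-max scaling of E and one-hot encoding of I, D, H.
import numpy as np

def min_max_scale(e):
    """Scale the consumption sequence to the (0, 1) range."""
    return (e - e.min()) / (e.max() - e.min() + 1e-9)

def one_hot(indices, cardinality):
    """Map each categorical index (0-based) to a one-hot row vector."""
    encoded = np.zeros((len(indices), cardinality))
    encoded[np.arange(len(indices)), indices] = 1.0
    return encoded

K = 12                                          # look-back window (illustrative)
E = np.random.rand(K)                           # past K half-hourly consumptions
I = (np.arange(K) % 48) + 1                     # time-of-day indices in 1..48
D = np.full(K, 2)                               # day-of-week indices in 0..6
H = np.zeros(K, dtype=int)                      # holiday marks, 0 or 1

# One row per past time step: scaled consumption followed by the encoded indices.
X = np.hstack([min_max_scale(E)[:, None], one_hot(I - 1, 48), one_hot(D, 7), one_hot(H, 2)])
```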
Fig. 7. The LSTM based forecasting framework.

Once the four vectors are scaled and encoded, the input for the LSTM layer is a matrix formed by the concatenation of the four, i.e.:

X = [E^T, I^T, D^T, H^T]

Each row of the input matrix contains the scaled features for the corresponding time step, which is fed to the corresponding LSTM block in the LSTM layer. Due to the sequential nature of the output of an LSTM layer, an arbitrary number of LSTM layers can be stacked to form a deep learning network.

The outputs of the top LSTM layer are fed to a conventional feedforward neural network which maps the intermediate LSTM outputs to a single value, i.e., the energy consumption forecast of the target time interval.

The frameworks for all customers are built on a desktop PC with a 3.4 GHz Intel i7 processor and 8 GB of memory using the Keras library [32] with the Theano backend [33]. After running some initial tests, the computationally efficient Adam optimiser [34] showed slightly better results than other candidates, including the classic stochastic gradient descent (SGD), Adagrad [35], Adadelta [36] and RMSProp [37]. Therefore, Adam is used for the training of the proposed forecasting framework with its recommended default parameters (learning rate, momentum and decay).
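A minimal Keras sketch of such a stacked-LSTM regression model is shown below. The layer sizes, the number of LSTM layers, the loss function and the training settings are illustrative assumptions; the hyper-parameters actually used in the paper are those summarised in Table I.

```python
# Illustrative Keras sketch of a stacked-LSTM regressor; sizes and loss are assumptions.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

K, num_features = 12, 58                        # look-back steps, per-step feature width

model = Sequential([
    LSTM(20, return_sequences=True, input_shape=(K, num_features)),  # bottom LSTM layer
    LSTM(20),                                                        # top LSTM layer
    Dense(1),                                                        # consumption forecast
])
model.compile(optimizer="adam", loss="mean_absolute_percentage_error")

# Placeholder training data: (samples, K, num_features) inputs and scalar targets.
X_train = np.random.rand(256, K, num_features)
y_train = np.random.rand(256)
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
```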
V. THE BENCHMARKS

Due to the highly volatile nature of single-user power consumption, the forecasting accuracy reported in the existing literature is usually low, even when state-of-the-art approaches are applied. Therefore, in addition to some machine learning methods, we also investigate some empirical methods to serve as benchmarks in the present work.

A. Empirical Approaches

1) Empirical Means: We start with one of the most naïve predictors – the empirical mean predictor. It simply outputs a forecast value based on the statistical mean given the customer ID, the time of the day and the day type (weekday or weekend).

2) Empirical MAPE Minimisation: We further develop a more sophisticated optimisation based empirical predictor to enrich the benchmarks. Similar to the empirical mean predictor, the empirical MAPE minimisation predictor is based on the statistical energy consumption distribution, resembling the models used in [38]. For example, given the customer ID, the time of the day, the day type and the desired precision level (e.g., one watt-hour), an empirical probability mass function (pmf) can be derived from historical data, as shown by the histogram in Fig. 8. With the empirical pmf ready, the MAPE expectation of any forecast can be calculated. Let p denote an arbitrary predicted load value within the empirical range and p_i denote the i-th possible discretised value, also within the empirical range. The MAPE expectation of choosing an arbitrary predicted load value p as the forecast is

E_{id,t,h}(p) = Σ_i ( |p − p_i| / p_i ) · pmf_{id,t,h}(p_i)    (7)

where pmf_{id,t,h}(·) is the empirical probability mass function derived by specifying the customer id, the time of day t and the day type h. By minimising the MAPE expectation, the load forecast p̂_t at time t when the day type is h can be obtained as:

p̂_t = argmin_p E_{id,t,h}(p)    (8)

Fig. 8. The MAPE minimisation forecast and the empirical mean forecast.

The empirical MAPE minimisation forecast for Customer 8804804 at 12:00 on a weekday is illustrated in Fig. 8. The red line indicates the MAPE expectation throughout the empirical range for this customer in that interval. The minimal MAPE expectation point is more conservative than the empirical mean of the data.
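The empirical predictors of (7)-(8) can be sketched as follows. The discretisation step, the placeholder history values and the function name are illustrative assumptions; the sketch simply searches the empirical range for the value that minimises the expected MAPE and contrasts it with the empirical mean.

```python
# Sketch of the empirical mean and empirical MAPE-minimisation predictors, eqs. (7)-(8).
import numpy as np

def empirical_mape_forecast(history, step=1.0):
    """Return argmin_p sum_i |p - p_i| / p_i * pmf(p_i), searched over the empirical range."""
    values, counts = np.unique(np.round(history / step) * step, return_counts=True)
    pmf = counts / counts.sum()
    candidates = np.arange(values.min(), values.max() + step, step)
    expected_mape = [np.sum(np.abs(p - values) / values * pmf) for p in candidates]
    return float(candidates[int(np.argmin(expected_mape))])

# Placeholder half-hourly readings (Wh) for one customer, time of day and day type.
history = np.array([120.0, 150.0, 130.0, 900.0, 140.0, 135.0, 125.0])
print("empirical mean forecast: ", history.mean())
print("MAPE-minimising forecast:", empirical_mape_forecast(history))
```

With an occasional large reading in the history, the MAPE-minimising value stays close to the bulk of the distribution while the mean is pulled upwards, which is why the optimisation based predictor is the more conservative of the two.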
B. Some State-of-the-Art Machine Learning Approaches

Due to many promising cases of neural network based short-term load forecasting on the grid level [9]–[11], we should also benchmark the results of our proposed framework against those from the conventional backpropagation neural network (BPNN). Two scenarios of BPNN are considered.
One of the scenarios only features the energy consumption of the corresponding time interval in the past D days, which is denoted as BPNN-D hereafter. The other looks back at the energy consumption of the past K time steps, as in our proposed LSTM based framework, and is denoted as BPNN-T.

Reasonable performance based on k-nearest neighbour (KNN) regression [13], [14], superior forecasting accuracy based on the extreme learning machine (ELM) [12] and the most accurate results from a sophisticated input selection scheme combined with a hybrid forecasting framework (IS-HF) [15] have been demonstrated in the literature. Therefore, they are also included as our benchmark candidates.

TABLE I
HYPER-PARAMETER SUMMARY

TABLE II
LOAD FORECASTING MAPE SUMMARY
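For concreteness, the two BPNN input scenarios described above can be sketched as follows; the series name, the values of D and K, and the 48-step day length are illustrative assumptions.

```python
# Sketch of the two benchmark input scenarios: BPNN-D and BPNN-T feature windows.
import numpy as np

def bpnn_d_features(series, t, D, steps_per_day=48):
    """BPNN-D: consumption at the same time interval on each of the past D days."""
    return np.array([series[t - d * steps_per_day] for d in range(1, D + 1)])

def bpnn_t_features(series, t, K):
    """BPNN-T: consumption at the past K consecutive time steps, as in the LSTM framework."""
    return series[t - K:t]

series = np.random.rand(48 * 30)      # thirty days of half-hourly readings (placeholder)
t = 48 * 10                           # index of the interval to forecast
print(bpnn_d_features(series, t, D=3))
print(bpnn_t_features(series, t, K=12))
```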
forecast follows the peaks much better than the BPNN-T. The BPNN-T always forecasts peaks after the peaks have already happened.
In order to better show how the LSTM forecasts load at this
residential level, we also visualise all the internal states of the
LSTM in the middle row of Fig. 10. Each curve represents
the dynamic tracking of the internal state for a hidden node
in the network. The summation of all internal states is shown
as the third row of Fig. 10. From the summation of the LSTM
states, we can visualise five cycles, which correspond to the
five-day period (240 time steps). The internal states of the
LSTM are the key for the algorithm to develop a long-term
temporal relationship and help it outperform the BPNN.

Fig. 11. MAPE versus number of outliers for LSTM and BPNN-T.
on substation level is elaborated. Unlike the load at the system or substation level, where daily consumption patterns always exist, the energy consumption of a single household is usually volatile. A density based clustering technique is employed to evaluate and compare the inconsistency between the aggregated load and individual loads. According to Practice Theory, the lifestyles of the residents will be reflected in the energy consumption as repetitive patterns, no matter how inconsistent they are. Therefore, this paper proposes an LSTM recurrent neural network based load forecasting framework for the extremely challenging task of individual residential load forecasting, because the LSTM has been proven to learn long-term temporal connections.

Multiple benchmarks are comprehensively tested and compared to the proposed LSTM load forecasting framework on a real-world dataset. It turns out that many load forecasting approaches which are successful for grid or substation load forecasting struggle with single-meter load forecasting problems. The proposed LSTM framework generally achieves the best forecasting performance on the dataset.

Moreover, although individual load forecasting is far from accurate, aggregating all individual forecasts yields a better forecast at the aggregation level, compared to the conventional strategy of directly forecasting the aggregated load.

The inconsistency in daily consumption profiles generally affects the predictability of the customers. The higher the inconsistency is, the more the LSTM can contribute to the forecasting improvement compared to the simple back propagation neural network.

As for future work, methodologies for parameter tuning can be developed to further enhance the forecasting accuracy for different types of customers, especially those households with very high volatility.

REFERENCES

[1] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[2] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[3] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, 2014, pp. 3104–3112.
[4] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 3156–3164.
[5] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 3128–3137.
[6] J. Mao et al. (2014). Deep Captioning With Multimodal Recurrent Neural Networks (m-RNN). [Online]. Available: https://arxiv.org/abs/1412.6632
[7] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, Beijing, China, 2014, pp. 1764–1772.
[8] X. Cao, S. Dong, Z. Wu, and Y. Jing, "A data-driven hybrid optimization model for short-term residential load forecasting," in Proc. IEEE Int. Conf. Comput. Inf. Technol. Ubiquitous Comput. Commun. Dependable Auton. Secure Comput. Pervasive Intell. Comput. (CIT/IUCC/DASC/PICOM), Liverpool, U.K., 2015, pp. 283–287.
[9] Z. Yun et al., "RBF neural network and ANFIS-based short-term load forecasting approach in real-time price environment," IEEE Trans. Power Syst., vol. 23, no. 3, pp. 853–858, Aug. 2008.
[10] H. Li, Y. Zhao, Z. Zhang, and X. Hu, "Short-term load forecasting based on the grid method and the time series fuzzy load forecasting method," in Proc. Int. Conf. Renew. Power Gener. (RPG), Beijing, China, 2015, pp. 1–6.
[11] P. Qingle and Z. Min, "Very short-term load forecasting based on neural network and rough set," in Proc. Int. Conf. Intell. Comput. Technol. Autom. (ICICTA), Changsha, China, 2010, pp. 1132–1135.
[12] R. Zhang, Z. Y. Dong, Y. Xu, K. Meng, and K. P. Wong, "Short-term load forecasting of Australian national electricity market by an ensemble model of extreme learning machine," IET Gener. Transm. Distrib., vol. 7, no. 4, pp. 391–397, Apr. 2013.
[13] R. Zhang, Y. Xu, Z. Y. Dong, W. Kong, and K. P. Wong, "A composite k-nearest neighbor model for day-ahead load forecasting with limited temperature forecasts," presented at the IEEE Gen. Meeting, Boston, MA, USA, 2016, pp. 1–5.
[14] F. H. Al-Qahtani and S. F. Crone, "Multivariate k-nearest neighbour regression for time series data—A novel algorithm for forecasting UK electricity demand," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Dallas, TX, USA, 2013, pp. 1–8.
[15] M. Ghofrani, M. Ghayekhloo, A. Arabali, and A. Ghayekhloo, "A hybrid short-term load forecasting with a new input selection framework," Energy, vol. 81, pp. 777–786, Mar. 2015.
[16] P. Zhang, X. Wu, X. Wang, and S. Bi, "Short-term load forecasting based on big data technologies," CSEE J. Power Energy Syst., vol. 1, no. 3, pp. 59–67, Sep. 2015.
[17] F. L. Quilumba, W.-J. Lee, H. Huang, D. Y. Wang, and R. L. Szabados, "Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities," IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 911–918, Mar. 2015.
[18] B. Stephen, X. Tang, P. R. Harvey, S. Galloway, and K. I. Jennett, "Incorporating practice theory in sub-profile models for short term aggregated residential load forecasting," IEEE Trans. Smart Grid, vol. 8, no. 4, pp. 1591–1598, Jul. 2017.
[19] M. Chaouch, "Clustering-based improvement of nonparametric functional time series forecasting: Application to intra-day household-level load curves," IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 411–419, Jan. 2014.
[20] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, and M. S. Fadali, "Smart meter based short-term load forecasting for residential customers," in Proc. North Amer. Power Symp. (NAPS), Boston, MA, USA, 2011, pp. 1–5.
[21] S. Ryu, J. Noh, and H. Kim, "Deep neural network based demand side short term load forecasting," in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Sydney, NSW, Australia, 2016, pp. 308–313.
[22] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, "Deep learning for estimating building energy consumption," Sustain. Energy Grids Netw., vol. 6, pp. 91–99, Jun. 2016.
[23] D. L. Marino, K. Amarasinghe, and M. Manic, "Building energy load forecasting using deep neural networks," in Proc. 42nd Annu. Conf. IEEE Ind. Electron. Soc. (IECON), Florence, Italy, 2016, pp. 7046–7051.
[24] Smart Grid, Smart City, Australian Government, Canberra, ACT, Australia, 2014. [Online]. Available: http://www.industry.gov.au/ENERGY/PROGRAMMES/SMARTGRIDSMARTCITY/Pages/default.aspx
[25] E. Shove, M. Pantzar, and M. Watson, The Dynamics of Social Practice: Everyday Life and How It Changes. London, U.K.: SAGE, 2012.
[26] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. KDD, 1996, pp. 226–231.
[27] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990.
[28] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994.
[29] F. K. John and C. K. Stefan, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Networks. New York, NY, USA: IEEE Press, 2001, p. 464.
[30] F. A. Gers, J. A. Schmidhuber, and F. A. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Comput., vol. 12, no. 10, pp. 2451–2471, 2000.
[31] Z. C. Lipton, J. Berkowitz, and C. Elkan. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. [Online]. Available: https://arxiv.org/abs/1506.00019
[32] F. Chollet. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[33] R. Al-Rfou et al. (2016). Theano: A Python Framework for Fast Computation of Mathematical Expressions. [Online]. Available: https://arxiv.org/abs/1605.02688
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014. [Online]. Available: https://arxiv.org/abs/1412.6980
[35] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Feb. 2011.
[36] M. D. Zeiler. (2012). ADADELTA: An Adaptive Learning Rate Method. [Online]. Available: https://arxiv.org/abs/1212.5701
[37] G. Hinton, N. Srivastava, and K. Swersky, "RMSProp: Divide the gradient by a running average of its recent magnitude," in COURSERA: Neural Networks for Machine Learning, vol. 4, 2012. [Online]. Available: https://www.coursera.org/learn/neural-networks/lecture/YQHki/rmsprop-divide-the-gradient-by-a-running-average-of-its-recent-magnitude
[38] S. M. Mousavi and H. A. Abyaneh, "Effect of load models on probabilistic characterization of aggregated load patterns," IEEE Trans. Power Syst., vol. 26, no. 2, pp. 811–819, May 2011.
[39] A.-R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14–22, Jan. 2012.
[40] G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[41] A. Marinescu, C. Harris, I. Dusparic, S. Clarke, and V. Cahill, "Residential electrical demand forecasting in very small scale: An evaluation of forecasting methods," in Proc. 2nd Int. Workshop Softw. Eng. Challenges Smart Grid (SE4SG), San Francisco, CA, USA, 2013, pp. 25–32.
[42] M. Rowe, T. Yunusov, S. Haben, W. Holderbaum, and B. Potter, "The real-time optimisation of DNO owned storage devices on the LV network for peak reduction," Energies, vol. 7, pp. 3537–3560, May 2014.

Weicong Kong (S'14) received the B.E. and M.E. degrees from the South China University of Technology, Guangzhou, China, in 2008 and 2011, respectively, the M.Sc. degree from the University of Strathclyde, Glasgow, U.K., in 2009, and the Ph.D. degree from the University of Sydney, Australia, in 2017. He was an Electrical Engineer with Shenzhen Power Supply Company from 2011 to 2014, in charge of the development of distribution automation systems, SCADA and AMI. He is currently a Post-Doctoral Research Fellow with the University of New South Wales. His research interests include data analytics and deep learning in energy engineering, including non-intrusive load monitoring, load forecasting, demand response, renewable energy integration, and the energy market.

Zhao Yang Dong (M'99–SM'06–F'17) received the Ph.D. degree from the University of Sydney, Australia, in 1999. He is currently with the University of NSW, Sydney, Australia. He was with the University of Sydney, and an Ausgrid Chair and the Director of the Ausgrid Centre of Excellence for Intelligent Electricity Networks, University of Newcastle, Australia. He also worked as a System Planning Manager with Transend Networks (currently TASNetworks), Australia. His research interests include smart grid, power system planning, power system security, load modeling, renewable energy systems, electricity market, and computational intelligence. He is an Editor of the IEEE TRANSACTIONS ON SMART GRID, the IEEE PES Letters, and IET Renewable Power Generation.

Youwei Jia (S'11–M'15) received the B.Eng. degree from Sichuan University, China, in 2011 and the Ph.D. degree from Hong Kong Polytechnic University, Hong Kong, in 2015. He is currently a Post-Doctoral Fellow with Hong Kong Polytechnic University. His research interests include power system security analysis, cascading failures, complex networks, and artificial intelligence applications in power engineering.

David J. Hill (S'72–M'76–SM'91–F'93–LF'14) received the Ph.D. degree in electrical engineering from the University of Newcastle, Australia, in 1976. He holds the Chair of Electrical Engineering in the Department of Electrical and Electronic Engineering, University of Hong Kong. He is also a Part-Time Professor with the University of Sydney, Australia.
From 2005 to 2010, he was an Australian Research Council Federation Fellow with the Australian National University. Since 1994, he has held various positions with the University of Sydney, Australia, including the Chair of Electrical Engineering until 2002 and again from 2010 to 2013, along with an ARC Professorial Fellowship. He has also held academic and substantial visiting positions with the University of Melbourne, the University of California at Berkeley, the University of Newcastle, Australia, Lund University, Sweden, the University of Munich, the City University of Hong Kong, and Hong Kong Polytechnic University. His general research interests are in control systems, complex networks, power systems, and stability analysis. His work is currently mainly on control and planning of future energy networks and basic stability and control questions for dynamic networks.
Prof. Hill is a Fellow of the Society for Industrial and Applied Mathematics, USA, the Australian Academy of Science, the Australian Academy of Technological Sciences and Engineering, and the Hong Kong Academy of Engineering Sciences. He is also a Foreign Member of the Royal Swedish Academy of Engineering Sciences.

Yan Xu (S'10–M'13) received the B.E. and M.E. degrees from the South China University of Technology, Guangzhou, China, in 2008 and 2011, respectively, and the Ph.D. degree from the University of Newcastle, Australia, in 2013. He is currently the Nanyang Assistant Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He was previously with the School of Electrical and Information Engineering, University of Sydney, Australia. His research interests include power system stability, control, and optimization, microgrids, and data analytics for smart grid applications.

Yuan Zhang (S'16) received the B.E. and M.E. degrees from Xi'an Jiaotong University, Xi'an, China, in 2010 and 2013, respectively. She is currently pursuing the Ph.D. degree with the University of New South Wales, Australia. She was previously with the University of Sydney, Australia, and the National University of Singapore, Singapore. Her research interests include electricity market, home energy management, statistical methods, and machine learning and their applications in power engineering.