
IEEE TRANSACTIONS ON SMART GRID, VOL. 10, NO. 1, JANUARY 2019 841

Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network
Weicong Kong, Student Member, IEEE, Zhao Yang Dong, Fellow, IEEE, Youwei Jia, Member, IEEE,
David J. Hill, Life Fellow, IEEE, Yan Xu, Member, IEEE, and Yuan Zhang, Student Member, IEEE

Abstract—As the power system is transitioning toward a more intelligent, flexible, and interactive system with higher penetration of renewable energy generation, load forecasting, especially short-term load forecasting for individual electric customers, plays an increasingly essential role in future grid planning and operation. Unlike aggregated residential load at a large scale, forecasting the load of a single energy user is fairly challenging due to the high volatility and uncertainty involved. In this paper, we propose a framework based on the long short-term memory (LSTM) recurrent neural network, one of the latest and most popular deep learning techniques, to tackle this problem. The proposed framework is tested on a publicly available set of real residential smart meter data, on which its performance is comprehensively compared to various benchmarks, including the state of the art in the field of load forecasting. The proposed LSTM approach outperforms the other listed rival algorithms in the task of short-term load forecasting for individual residential households.

Index Terms—Short-term load forecasting, recurrent neural network, deep learning, residential load forecasting.

Manuscript received March 12, 2017; revised July 6, 2017; accepted September 12, 2017. Date of publication September 18, 2017; date of current version December 19, 2018. Paper no. TSG-00351-2017. (Corresponding author: Zhao Yang Dong.)
W. Kong, Z. Y. Dong, and Y. Zhang are with the School of EE&T, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]; [email protected]; [email protected]).
Y. Jia is with the Hong Kong Polytechnic University, Hong Kong.
D. J. Hill is with the School of EIE, University of Sydney, Sydney, NSW 2006, Australia, and also with the University of Hong Kong, Hong Kong (e-mail: [email protected]).
Y. Xu is with the Nanyang Technological University, Singapore (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2017.2753802
1949-3053 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

LOAD forecasting has been an essential task throughout the development of the modern power system. Long-term load forecasting aims to assist in power system infrastructure planning, while mid-term and short-term load forecasting are essentially useful for system operations.

The modern power system is moving towards a more sustainable one, whereas the increasing penetration of renewables, electric vehicles (EVs) and time-varying load demand in the distribution grids inevitably adds an extra layer of system complexity and uncertainty. It is evident that system stability is undergoing unprecedented challenges due to the intermittency of renewable generation and the complex nature of utility-customer interactions and dynamic behaviours. To this end, accurate short-term electric load forecasting at the level of residential customers can significantly facilitate power system operations. With effective load forecasting, peak load shaving can be achieved by flexibly engaging energy storage (ES) systems or intelligent demand response (DR) technologies. From the utilities' point of view, if accurate load forecasts for individual customers are available, electricity suppliers can rely on such information to target the groups of customers with the highest potential to participate in DR programmes in the event of power deficiency. In this case, the utilities would be more confident about the demand reduction in incentive-based DR programmes, which is important for providing load-balancing reserve and/or hedging market costs.

The ongoing worldwide expansion of smart meter infrastructure (SMI) has laid the groundwork to drive conventional power systems towards smart grids. This massive deployment has also created opportunities for short-term load forecasting for individual customers.

Regarding short-term load forecasting methodologies, many approaches have been reported in the literature. However, very few of them have confronted individual customers directly. Customer-wise forecasts were conventionally considered trivial due to the volatile nature of individual loads. Therefore, the issues in short-term household load forecasting remain open.

Recently, deep learning has become one of the most active technologies in many research areas. As opposed to shallow learning, deep learning usually refers to stacking multiple layers of neural networks and relying on stochastic optimisation to perform machine learning tasks. A varying number of layers can provide different levels of abstraction to improve the learning ability and task performance [1]. In particular, the long short-term memory (LSTM) recurrent neural network (RNN), originally introduced by Hochreiter and Schmidhuber [2], has received enormous attention in the realm of sequence learning. Effective applications based on LSTM networks have been reported in many areas such as natural language translation [3], image captioning [4]–[6] and speech recognition [7]. Most of them handle data classification, whereas applications to regression tasks are relatively limited.

This paper aims to contribute to addressing the issues of short-term residential load forecasting. First, an exploratory customer-wise data analysis is conducted to
compare the difference between load at the system level and loads in individual households. A density-based clustering technique is employed to evaluate the inconsistency in residential load profiles and to indicate the difficulty of the load forecasting problem for the residential sector. A justification of why LSTM is suitable for this task is given. Then, a residential load forecasting framework based on the LSTM is described. In addition, a conservative empirical optimisation-based predictor is developed to serve as one of the many benchmarks. In the case study, we demonstrate that although the empirical predictor occasionally outperforms other machine-learning-based predictors in some extremely volatile cases, the LSTM can capture the subtle temporal consumption patterns persisting in single-meter load profiles and produces the best forecasts in the majority of cases. Finally, at the aggregation level, we also demonstrate that aggregating individual forecasts is generally more accurate than directly forecasting the aggregated loads with our proposed approach.

The rest of the paper is organised as follows. Section II provides the background of the load forecasting community. Section III conducts an exploratory data analysis to visualise the challenge of the single-meter load forecasting problem. Section IV introduces the LSTM framework for short-term residential load forecasting. Section V introduces other benchmarks, including two empirical predictors. The testing dataset and experimental results are given in Section VI, while Section VII concludes the paper.

II. LITERATURE STUDY

There have been many research works in the area of short-term load forecasting. Cao et al. [8] adopted an autoregressive integrated moving average (ARIMA) model and a similar-day method for intraday load forecasting. The mechanism of their similar-day method is to group the targeted day with meteorologically similar days in the history and predict the load based on the average demand of those days. It was demonstrated that on ordinary days ARIMA performs better, while the similar-day method wins on unordinary days. Radial basis function (RBF) neural networks have been used to address short-term load forecasting in [9] and [10]. Yun et al. combine the RBF neural network with the adaptive neural fuzzy inference system (ANFIS) to adjust the prediction by taking account of the real-time electricity price. Li et al. [10] addressed the short-term day-ahead forecasting problem via a grid method combined with back propagation (BP) and RBF neural networks. The grid method is similar to the similar-day method in [8], but instead of grouping days with similar meteorological measurements, it groups load profiles according to the location, nature and size of the loads. For each group, they trained a BP network and an RBF network to predict the day-ahead load demand. Qingle and Min [11] also proposed a neural-network-based predictor for very short term load forecasting. It takes only the load values of the current and previous time steps as the input to predict the load value at the coming time step. Zhang et al. [12] used an ensemble of extreme learning machines (ELMs) to learn and forecast the total load of the Australian national energy market. The proposed methodology not only made use of the supreme ELM learning efficiency for self-adaptive learning but also used the ensemble structure to mitigate the instability of the forecasts. Recently, the k-nearest neighbour (KNN) algorithm has also seen some successful examples in load forecasting [13], [14]; its dominant advantage is its efficiency. Ghofrani et al. [15] proposed a dedicated input selection scheme to work with a hybrid forecasting framework using wavelet transformation and a Bayesian neural network. To the best of our knowledge, this approach achieved the most accurate forecasting performance on system-level load forecasting, with a mean absolute percentage error (MAPE) score as small as 0.419% on average.

However, all of the above methods focus on learning and forecasting load at the system or substation level. In order to support future smart grid applications, effective load forecasting techniques for electricity users are gaining increasing interest. Zhang et al. [16] developed a big data architecture that combines load clustering based on smart meter data with a decision tree to select the corresponding load forecasting model for prediction. Quilumba et al. [17] adopted a similar methodology based on smart meter data clustering and neural networks to make intraday load forecasts at the system level. Stephen et al. [18] clustered and labelled daily historical data of individual households. The individual households were represented as label sequences, which were further fit to Markov chains. The day-ahead label can then be sampled, and cluster means at each time point were used for the day-ahead prediction. These works all showed that forecasting errors can be reduced by effectively grouping different customers based on their daily profiles. However, they all only reported the aggregated load forecasting error at the system or community level, where individual customer prediction errors can be offset by the diversity of different end users.

In the existing literature, Chaouch's work [19] and the work of Ghofrani et al. [20] are the first two examples that focus on load forecasting for individual users. A functional time series forecasting approach was proposed, and the daily median absolute errors were reported in [19], but the more commonly used metric of MAPE was not reported; it is thereby unsuitable to serve as a benchmark for experimental comparisons. A Kalman filter estimator was used, and a MAPE ranging from 18% to 30% was obtained in [20]. Very recently, deep learning based methods have started to emerge in the load forecasting community. Ryu et al. [21] showed that the load forecasting accuracy for industrial customers can be improved by using a deep neural network. In fact, industrial electricity consumption patterns are much more regular than residential ones, so a much more accurate result of about 8.85% MAPE on average was achieved compared to the results in [20]. Mocanu et al. [22] employed a factored conditional restricted Boltzmann machine, one of the deep learning methods, for single-meter residential load forecasting and observed notable improvement compared to the performance of shallow artificial neural networks and support vector machines. Similarly, Marino et al. [23]
made the first attempt to address the same load forecasting issue using LSTM and demonstrated results comparable to those in [22]. However, the effectiveness of the two pioneering works was only verified on the metric of root mean square error (RMSE) instead of the more common metric of MAPE, which makes them hard to contrast with other works. Also, the similar work of [23] lacked a justified discussion of the reason for choosing LSTM. Moreover, all the different approaches in [19], [20], [22], and [23] were only verified on a single residential household, so the aggregated effect of the customer-wise load forecasts cannot be evaluated.

III. EXPLORATORY DATA ANALYSIS AND PROBLEM IDENTIFICATION

The main challenges of the single-household load-forecasting problem are diversity and volatility. Thanks to the Smart Grid Smart City (SGSC) project initiated by the Australian Government [24], electrical load data for thousands of individual households are available.

A. Introduction of the Dataset

The SGSC was Australia's first commercial-scale smart grid project, initiated as early as 2010. During the four-year project development, SGSC gathered smart meter data for about 10,000 different customers in New South Wales. These rich data are the key to enabling further study of many smart grid technologies in a real-world context. Short-term load forecasting for individual households is one line of smart grid research that can make use of this project.

Since we focus on load forecasting for a group of general individual families, it is unrealistic and time-consuming to go through every customer in the SGSC dataset. Therefore, a subset of the SGSC dataset is used for the proof of concept. The selection criterion is those households which possess a hot water system, which leads to a reasonably sized testing subset of 69 customers.

B. Exploratory Data Analysis

Unlike electric load at the system level, individual residential loads lack the obvious consistent patterns that favour forecasting accuracy. Generally, the diversity at the aggregated level smooths the daily load profiles, which makes substation load relatively more predictable, while the electricity consumption of a single customer is more dependent on the underlying human behaviour.

In an individual household, the daily routines and lifestyle of the residents and the types of major appliances possessed may have a more direct impact on the following short-term load profile. For example, some households may have somewhat fixed routines of switching on the laundry dryer after using their washing machines, which would imply a high possibility of major electricity consumption in the subsequent one or two hours.

Reference [18] also noted this phenomenon and used Practice Theory [25] to explain the root causes of energy usage at the residential level. The overall residential energy consumption is produced by the usage of appliances through an ensemble effect of 'practices', which can be interpreted as a series of doings governed by the diverse motives and intentions of people. Based on the characteristics of their underlying motives, these practices can be classified into four categories, namely 'Practical Understanding', 'Rules', 'Teleo-affective' and 'General Understanding' [18]. 'Practical Understanding' practices correspond to routine activities for which the actors know when and what to do, like taking a shower or doing laundry. 'Rules' practices refer to operations that are constrained by the technical limits of the system; for example, a dishwasher has a pre-programmed washing procedure, and one can hardly control the duration and energy of the washing process. 'Teleo-affective' practices relate to goal-oriented behaviours, such as making a cup of coffee and watching TV. 'General Understanding' refers to practices with a greater degree of persistence and regular occurrence over time, such as religion-related activities. All four types of practices from every resident in a household constitute the interaction and engagement of various electric appliances and present themselves as the daily electricity consumption. The diversity and complexity of human beings render the loads in different houses extremely dissimilar. Some dwellers may have more regular lifestyles, so that their energy usage is easier to forecast.

In order to justify the above observation, we use a density-based clustering technique known as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [26] to evaluate the consistency of daily power profiles. The most obvious benefits of using DBSCAN for consistency analysis compared to other clustering techniques are that it requires no pre-set number of clusters and that it has a notion of outliers. According to Practice Theory and [18], electricity consumption patterns repeat with noise, so DBSCAN is an ideal clustering technique to identify outliers in a set of daily consumption profiles. The fewer outliers the result has, the better the consistency.

In our case, each daily profile consists of 48 half-hourly readings, which can be viewed as a 48-dimensional sample. DBSCAN requires three pre-determined parameters: the distance eps, the minimum number of samples in a cluster minSam, and the definition of the distance measure. In this paper, we set eps equal to 10% of the average daily energy consumption, minSam = 2, and use the Euclidean distance as the distance measure. We apply DBSCAN to the aggregated load and to each of the 69 households over the three months from 01/06/2013 to 31/08/2013, during which all 69 individual customers had complete data.

First of all, the aggregated daily profiles of all 69 households with the clustering results are shown in Fig. 1. It is clear that the daily profiles form only one cluster and have no outliers. In other words, the aggregated load is rather consistent, and its daily pattern is fairly obvious.

For the individual customers, the number of outliers (daily profiles that do not belong to any cluster) varies greatly from one household to another. This corresponds with the different lifestyles that each household may possess. The outlier distribution is shown in Fig. 2. Among all customers, we visualise the daily profiles of two opposite cases. Fig. 3 shows
a case that presents fairly consistent daily profiles. Only one daily profile is identified as an outlier; apart from that, all daily profiles are grouped into two major clusters and two minor clusters. The two dominant daily patterns are visible from the figure.

Fig. 1. Aggregated daily profiles with one cluster and no outliers.
Fig. 2. The distribution of the number of outliers.
Fig. 3. A case (Customer 8342852) with only ONE outlier, two major clusters and two minor clusters.

Fig. 4 shows a completely opposite case. This customer had no consistency at all during the three-month period. None of the daily profiles can form even one cluster with any others. Comparing Fig. 3 and Fig. 4 to Fig. 1, we can clearly see the difference between the individual loads and the aggregated load. Even in the consistent case, there exist two prominent patterns. Successful forecasting approaches that focus on features such as time of the day and day of the week for loads at the substation level may no longer be suitable for single-meter load forecasting.

Fig. 4. A case (Customer 8282282) with no clusters at all.

In this regard, a learning algorithm capable of abstracting previous observations into hidden knowledge of residents' behaviour and establishing a correlation between that abstraction and the forecast target is the key to better forecasting performance. For example, as shown in Fig. 3, the algorithm should ideally learn that this customer tends to maintain an energy consumption of about 2.6 kWh for about two and a half hours in the morning, regardless of what the time of day is. The LSTM recurrent network is one of the ideal candidates due to its proven capability of learning temporal correlations, as evidenced by its performance in language translation and speech recognition [3], [7]. Such temporal correlations exist widely in the power consumption patterns of a single family because they are based on residents' behaviours, which are hard to learn. In the case of residential load forecasting, the LSTM network is expected to abstract residents' states from the pattern of the input consumption profile, maintain the memory of those states, and finally make a prediction based on the learnt information.

IV. THE FORECASTING FRAMEWORK BASED ON LSTM

A. The LSTM Model

Recurrent neural networks (RNNs) are fundamentally different from traditional feedforward neural networks. They are sequence-based models able to establish temporal correlations between previous information and the current circumstances. For time series problems, this means that the decision an RNN makes at time step t−1 can affect the decision it reaches at the later time step t. This characteristic of RNNs is ideal for the load forecasting problems of individual households, since residents' intrinsic daily routines may be one of the most important factors in the energy consumption at later time intervals.

RNNs are trained by backpropagation through time (BPTT) [27]. However, learning long-range dependencies with RNNs is difficult due to the problems of gradient vanishing and exploding [28], [29]. Gradient vanishing in an RNN refers to the norm of the gradient for long-term components decreasing exponentially fast to zero, limiting the model's ability to learn long-term temporal correlations, while gradient exploding refers to the opposite event. In order to overcome these issues, the long
short-term memory (LSTM) architecture was first introduced by Hochreiter and Schmidhuber [2], who included a memory cell, and was further improved by Gers et al. [30] with an extra forget gate. It has been the most successful RNN architecture and has received huge popularity in many subsequent applications.

Fig. 5. The structure of an LSTM block.
Fig. 6. The un-rolled LSTM sequential architecture.

Lipton et al. [31] have given a detailed review of the overall structure and the latest developments of the LSTM. To briefly introduce the concept of the LSTM, we adopt a similar naming convention as in [31]. Let {x_1, x_2, ..., x_T} denote a typical input sequence for an LSTM, where x_t ∈ R^k represents a k-dimensional vector of real values at the t-th time step. In order to establish temporal connections, the LSTM defines and maintains an internal memory cell state throughout its whole life cycle, which is the most important element of the LSTM structure. The memory cell state s_{t−1} interacts with the intermediate output h_{t−1} and the subsequent input x_t to determine which elements of the internal state vector should be updated, maintained or erased based on the outputs of the previous time step and the inputs of the present time step. In addition to the internal state, the LSTM structure also defines an input node g_t, input gate i_t, forget gate f_t and output gate o_t. The formulations of all nodes in an LSTM structure are given by (1) to (6):

f_t = σ(W_fx x_t + W_fh h_{t−1} + b_f)    (1)
i_t = σ(W_ix x_t + W_ih h_{t−1} + b_i)    (2)
g_t = φ(W_gx x_t + W_gh h_{t−1} + b_g)    (3)
o_t = σ(W_ox x_t + W_oh h_{t−1} + b_o)    (4)
s_t = g_t ⊙ i_t + s_{t−1} ⊙ f_t           (5)
h_t = φ(s_t) ⊙ o_t                        (6)
LSTM outputs and the actual training samples. Through this
where Wgx , Wgh , Wix , Wih , Wfx , Wfh , Wox and Woh are weight un-rolled structure, information of the current time step can
matrices for the corresponding inputs of the network activa- be stored and maintained to affect the LSTM output of the
tion functions;  stands for an element-wise multiplication; σ future time steps.
represents the sigmoid activation function, while φ represents
the tanh function. The LSTM block structure at a single
time step is illustrated as Fig. 5, and the un-rolled LSTM B. The LSTM Based Forecasting Framework
architecture connecting information of the subsequent time The short-term LSTM based forecasting framework is given
point is shown by Fig. 6. The weights of the un-rolled LSTM as Fig. 7. The framework starts from data preprocessing for
are duplicated for an arbitrary number of time steps. the inputs. In this study, the following input features are used:
To train a simple one-layer LSTM recurrent network, one 1. the sequence of energy consumptions for the past K time
should specify the hyperparameter of the hidden output dimen- steps E = {et−K , . . . , et−2 , et−1 } ∈ RK ;
sion n. In this case, the hidden output at a given time step 2. the incremental sequence of the time day indices for the
ht ∈ Rn , which is a n-dimensionla vector, so as to st . The past K time steps I ∈ RK , where the range for each element
common initialization for ht and st is zero initialisation, i.e., of I is (1 to 48);
h0 = 0 and h0 = 0. There are three sigmoid functions, with 3. the corresponding day of week indices for the past K
output range from 0 to 1, in an LSTM block serving as the time steps D, each of which ranges from 0 to 6;
“soft” switches to decide which signals should pass the gates. 4. the corresponding binary holiday marks H, each of which
If the gate is 0, then the signal is blocked by the gate. The deci- can either be 0 or 1.
sions for the forget gate f , the input gate i and the output gate Since the LSTM is sensitive to data scale, the four input vec-
o are all dependent on the current input xt and the previous tors are scaled to the range of (0, 1) according to the feature’s
output ht−1 . The signal of the input gate controls what to nature. We adopt the min-max normalisation for E, while the
preserve in the internal state, while the forget gate controls vectors I, D, H are encoded by one hot encoder. The one hot
what to forget from the previous state st−1 . With the internal encoder maps an original element from the categorical feature
state updated, the output gate decides which the internal state vector with M cardinality into a new vector with M elements,
st should pass as the LSTM output ht . This process then con- where only the corresponding new element is one while the
tinues to repeat for the next time step. All the weights and rest of new elements are zeroes.

Authorized licensed use limited to: University of Chinese Academy of SciencesCAS. Downloaded on September 11,2024 at 09:04:25 UTC from IEEE Xplore. Restrictions apply.
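This preprocessing step can be sketched as follows. The function name is illustrative, and for brevity the min-max statistics are computed over the window itself; in practice the normalisation constants would come from the training data.

```python
import numpy as np

def preprocess(E, I, D, H):
    """Build the per-step feature matrix from the four input vectors.

    E: past-K energy readings; I: time-of-day indices (1..48);
    D: day-of-week indices (0..6); H: binary holiday marks (0/1).
    Returns a (K, 1 + 48 + 7 + 2) matrix, one row per time step.
    """
    E = np.asarray(E, dtype=float)
    # Min-max normalisation of the consumption sequence to [0, 1].
    e_scaled = (E - E.min()) / (E.max() - E.min())

    def one_hot(v, M, offset=0):
        # Map each categorical value to an M-element indicator vector.
        out = np.zeros((len(v), M))
        out[np.arange(len(v)), np.asarray(v) - offset] = 1.0
        return out

    return np.hstack([e_scaled[:, None],
                      one_hot(I, 48, offset=1),  # 48 half-hour slots
                      one_hot(D, 7),             # 7 days of the week
                      one_hot(H, 2)])            # holiday / non-holiday

# Toy window of K = 3 half-hourly readings late on a Friday.
X = preprocess(E=[0.2, 1.4, 0.8], I=[46, 47, 48], D=[4, 4, 4], H=[0, 0, 0])
```

Each row of X is then fed to the LSTM block of the corresponding time step, as described in the text.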
846 IEEE TRANSACTIONS ON SMART GRID, VOL. 10, NO. 1, JANUARY 2019

Fig. 8. The MAPE minimisation forecast and the empirical mean forecast.

A. Empirical Approaches
1) Empirical Means: We start with one of the most naïve
predictors – the empirical mean predictor. It simply outputs
a forecasting value based on the statistical mean given the
customer ID, the time of the day and the day type (weekday
or weekend).
2) Empirical MAPE Minimisation: We further develop
a more sophisticated optimisation based empirical predictor
to enrich the benchmarks. Similar to the empirical mean
predictor, the empirical MAPE minimisation predictor is based
Fig. 7. The LSTM based forecasting framework.
on statistical energy consumption distribution, resembling the
models used in [38]. For example, given the customer ID,
the time of the day, the day type and the desired precision
Once the four vectors E, I, D and H are scaled, the input for the LSTM layer is a matrix of the concatenation of the four, i.e.:

    X = [E^T, I^T, D^T, H^T]

Each row of the input matrix contains the scaled features for the corresponding time step, which is fed to the corresponding LSTM block in the LSTM layer. Due to the sequential nature of the output of an LSTM layer, an arbitrary number of LSTM layers can be stacked to form a deep learning network.

The outputs of the top LSTM layer are fed to a conventional feedforward neural network, which maps the intermediate LSTM outputs to a single value, i.e., the energy consumption forecast of the target time interval.

All frameworks for all customers are built on a desktop PC with a 3.4 GHz Intel i7 processor and 8 GB of memory, using the Keras library [32] with the Theano backend [33]. After running some initial tests, the computationally efficient Adam optimiser [34] showed slightly better results than the other candidates, including the classic stochastic gradient descent (SGD), Adagrad [35], Adadelta [36] and RMSProp [37]. Therefore, Adam is used for the training of the proposed forecasting framework with the recommended default parameters (learning rate, momentum and decay).

V. THE BENCHMARKS

Due to the highly volatile nature of single-user power consumption, the forecasting accuracy reported in the existing literature is usually low, even when state-of-the-art approaches are applied. Therefore, in addition to some machine learning methods, we also investigate some empirical methods to serve as the benchmarks in the present work.

A. Empirical Benchmarks

If the load data are discretised to a certain level (e.g., one watt-hour), an empirical probability mass function (pmf) can be derived from historical data, as shown by the histogram in Fig. 8. With the empirical pmf ready, the MAPE expectation of any forecast can be calculated. Let p denote an arbitrary predicted load value within the empirical range and p_i denote the i-th possible discretised value, also within the empirical range. The MAPE expectation of choosing an arbitrary predicted load value p as the forecast is

    E_{id,t,h}(p) = Σ_i pmf_{id,t,h}(p_i) · |(p − p_i) / p_i|        (7)

where pmf_{id,t,h}(·) is the empirical probability mass function derived by specifying the customer id, the time of day t and the day type h. By minimising the MAPE expectation, the load forecast p̂ at time t when the day type is h can be obtained as:

    p̂_t = argmin_p E_{id,t,h}(p)        (8)

The empirical MAPE minimisation forecast for Customer 8804804 at 12:00 on a weekday is illustrated in Fig. 8. The red line indicates the MAPE expectation throughout the empirical range for this customer in the interval. The minimal MAPE expectation point is more conservative than the empirical mean of the data.

B. Some State-of-the-Art Machine Learning Approaches

Due to many promising cases of neural network based short-term load forecasting on the grid level [9]–[11], we also benchmark the results of our proposed framework against those from the conventional backpropagation neural network (BPNN). Two scenarios of BPNN are considered.
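The empirical MAPE-minimisation predictor of (7) and (8) can be sketched in a few lines of plain Python. The pmf values below are hypothetical toy numbers, not the SGSC data; the brute-force scan over the discretised support is one simple way to realise the argmin:

```python
def mape_expectation(p, support, probs):
    """MAPE expectation of forecasting p, per Eq. (7):
    sum_i pmf(p_i) * |(p - p_i) / p_i|."""
    return sum(q * abs((p - pi) / pi) for pi, q in zip(support, probs))

def mape_min_forecast(support, probs):
    """Eq. (8): the candidate with the lowest MAPE expectation.
    The search is restricted to the empirical support."""
    return min(support, key=lambda p: mape_expectation(p, support, probs))

# Toy empirical pmf for one (customer, time-of-day, day-type) slot:
# the load was 0.5 kWh most of the time, occasionally much higher.
support = [0.5, 1.0, 4.0]   # discretised load values (kWh)
probs   = [0.6, 0.3, 0.1]   # empirical probabilities

p_hat = mape_min_forecast(support, probs)
mean  = sum(pi * q for pi, q in zip(support, probs))
# The minimiser is more conservative (smaller) than the empirical mean,
# because MAPE heavily penalises over-forecasting small actual loads.
print(p_hat, mean)
```

This reproduces the behaviour noted for Fig. 8: the MAPE-expectation minimiser sits below the empirical mean of the distribution.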

Authorized licensed use limited to: University of Chinese Academy of SciencesCAS. Downloaded on September 11,2024 at 09:04:25 UTC from IEEE Xplore. Restrictions apply.
KONG et al.: SHORT-TERM RESIDENTIAL LOAD FORECASTING BASED ON LSTM RNN 847

TABLE I. Hyper-Parameter Summary

TABLE II. Load Forecasting MAPE Summary

One of the scenarios only uses the energy consumption of the corresponding time interval in the past D days, which is denoted as BPNN-D hereafter. The other looks back at the energy consumption of the past K time steps, as in our proposed LSTM based framework, and is denoted as BPNN-T.

Reasonable performances based on k-nearest neighbour (KNN) regression [13], [14], superior forecasting accuracy based on the extreme learning machine (ELM) [12], and the most accurate results from a sophisticated input selection scheme combined with a hybrid forecasting framework (IS-HF) [15] have been demonstrated. Therefore, these methods are also included as our benchmark candidates.
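The two BPNN input schemes just described can be sketched as follows. The toy half-hourly series and helper names are illustrative, not from the paper's code; each value encodes its (day, interval) position so the windows are easy to check:

```python
def bpnn_t_features(load, t, K):
    """BPNN-T: the past K consecutive time steps before target t
    (the same look-back window the LSTM framework uses)."""
    return load[t - K:t]

def bpnn_d_features(load, t, D, steps_per_day=48):
    """BPNN-D: the same time-of-day interval on each of the past D days."""
    return [load[t - d * steps_per_day] for d in range(1, D + 1)]

# Toy series: value = day * 100 + interval, 48 half-hour intervals per day.
load = [day * 100 + slot for day in range(5) for slot in range(48)]
t = 4 * 48 + 10                       # interval 10 of day 4

print(bpnn_t_features(load, t, 3))    # intervals 7..9 of day 4
print(bpnn_d_features(load, t, 3))    # interval 10 of days 3, 2, 1
```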

VI. CASE STUDY

A. The Test Settings
Due to the incompleteness of the data, we pick a 3-month period from 01-Jun-2013 to 31-Aug-2013, in which all selected customers have complete 30-minute interval energy consumption data. This period also covers the whole winter season of New South Wales, Australia, so that seasonal factors can reasonably be ignored in this case study.

The period spans 92 days. The data are split into three subsets for different purposes, namely a training set (from 01-Jun-2013 to 05-Aug-2013), a validating set (from 06-Aug-2013 to 22-Aug-2013) and a testing set (from 23-Aug-2013 to 31-Aug-2013). In other words, the data split is roughly 0.7/0.2/0.1. The training set is used to train the forecasting models, the validating set is used to select the best-performing models, and the testing set is used for the final result evaluation. In this case, there are nine days in total for evaluation. Considering that the SGSC data are recorded at 30-minute intervals, there are 29,808 time points in total to forecast for all customers in the group. For the empirical approaches, the training and validating sets are combined to derive the statistics for each user.

In general, hyper-parameter tuning is a big topic and is essential to obtain the best forecasting performance. However, since we are forecasting individually, tuning 69 models for each of the candidate methods is very time-consuming for this proof-of-concept paper. In this work, we only focus on identifying the candidate method(s) with the best overall performance. Therefore, some rules of thumb for hyper-parameter selection are adopted. For deep learning models, previous work suggests that the performance of the networks is relatively insensitive to the particular combination of the number of layers and the layer size [39]. This is also confirmed in [23], which likewise adopts the LSTM technique. According to the consistent finding in [40], multiple layers always work better than a single layer, and the number of hidden nodes should be sufficiently large. In this case, we configure both the LSTM and the BPNN with two hidden layers and 20 hidden nodes on each hidden layer to allow horizontal comparisons. The number of neighbours for the KNN algorithm is set to 20 with a uniform weighting scheme. As for the parameter setting of ELM, it is typically a single-layer feedforward network; due to ELM's advantageous training efficiency, an exhaustive search strategy is employed to find the optimal number of hidden neurons. For the IS-HF, we strictly follow the structure and parameters introduced in [15]. For the neural network parameters in IS-HF, we use one hidden layer with six neurons. The parameters of all models are summarised in Table I. The total number of epochs for both LSTM and BPNN is set to 150. Due to the random initialisation of ELM, there is no concept of an epoch in the original ELM framework; therefore, the training of ELM is repeated 150 times to find the best model to perform forecasting.

B. Experimental Results and Discussion

Multiple scenarios with different time horizons of {2, 6, 12} look-back time steps are tested for each machine learning method. For the BPNN-D, scenarios up to 3 days in the past are studied. For each scenario, the mean absolute percentage error (MAPE) is calculated over the test time intervals of all customers. The testing forecasting MAPEs are given in Table II.

1) Performance for Individual Households: The second column of Table II shows the comparison of the average MAPE of all 29,808 individual forecasts. Generally, the forecasts for each time interval of each household are not as accurate as the forecasts for substation loads. This level of accuracy is anticipated, because it has been reported that forecasting accuracy tends to drop significantly as the level of aggregation decreases [41], from 1.97% MAPE at the national level and 5.15% at the university campus level to 13.8% at the village level. Reference [42] also provides a range of MAPE between 10% and 35% for a specific case of a residential feeder.
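The chronological split and the resulting evaluation workload can be sketched with the standard library alone. The dates follow the description above; the 69-customer count is the paper's selected group:

```python
from datetime import date, timedelta

# Chronological boundaries of the SGSC subset used in the case study.
start, end = date(2013, 6, 1), date(2013, 8, 31)
train_end  = date(2013, 8, 5)    # training: 01-Jun .. 05-Aug
val_end    = date(2013, 8, 22)   # validating: 06-Aug .. 22-Aug

days  = [start + timedelta(d) for d in range((end - start).days + 1)]
train = [d for d in days if d <= train_end]
val   = [d for d in days if train_end < d <= val_end]
test  = [d for d in days if d > val_end]

intervals_per_day = 48           # 30-minute resolution
customers = 69                   # size of the selected group
test_points = len(test) * intervals_per_day * customers
print(len(days), len(train), len(val), len(test), test_points)
```

The 66/17/9-day split over 92 days gives the stated 0.7/0.2/0.1 proportions, and nine test days at 48 intervals for 69 customers yields the 29,808 forecast points evaluated in Table II.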


In our case, as we attempt to forecast the most granular level of loads, the accuracies achieved in Table II for the various methods are reasonable.

The results can be divided into three tiers according to the range of the average MAPE, namely below 50%, below 100%, and above. In the top tier, only three methods achieve an average MAPE below 50%: our proposed LSTM forecasting framework, the proposed statistics-based empirical MAPE minimisation predictor, and the BPNN considering the consumption of the time intervals prior to the target time step (BPNN-T). The LSTM framework is generally the best, followed by the MAPE minimisation predictor. The best score, 44.06%, is achieved by the proposed LSTM method with 12 look-back steps. The average of the three LSTM scenarios is 44.25%, which is better than that of the BPNN-T approaches in the same tier (an average of 49.38% over the three scenarios) by about five percentage points.

KNN, BPNN-D and IS-HF fall into the second tier of forecasting performance. The KNN achieves MAPEs from 71% to 81%, also using the energy consumption of the previous time intervals as inputs. The BPNN using the energy consumption of the same time intervals of previous days achieves MAPEs from 74% to 80%, comparable to the KNN approaches. The IS-HF, which is reported to achieve supreme accuracy on the system level, reaches only 96.76% MAPE for the individual customers in this test case. The performance gap between the top tier and the second is obvious.

ELM, another state-of-the-art machine learning approach that achieves superior accuracy on the system level, completely fails for the task of single-meter load forecasting. Its MAPEs range from 122% to 136%, only marginally better than the most naïve approach, the empirical mean predictor.

The statistics of the best methods for all 69 households are shown in Fig. 9. The LSTM performs best for 35 of the 69 households, which accounts for over 50% of the test subjects. The empirical MAPE minimisation approach is the best predictor for 23 customers, while the BPNN-T and the KNN are the best for 9 and 2 households respectively. The rest of the benchmarks do not lead the forecasting performance for any household.

Fig. 9. The statistics of the best approaches.

2) Performance for Aggregated Loads: Instead of evaluating forecasts for individual customers, many research works only report the forecasting error on the aggregated level [16]–[18]; that is, all the forecasts for individual customers are aggregated and compared to the aggregated actual loads. As shown in the third column of Table II, all the results fall within the range of residential forecasting MAPE of 10% to 35% suggested by [42]. The LSTM and the BPNN-T in the top tier are still in the dominant position, outperforming all the other benchmarks by noticeable margins, with MAPEs ranging from 8.18% to 8.64%. The 69 customers effectively form a group of village size. By aggregating the individual forecasts from the proposed LSTM framework, we achieve much better forecasts than the MAPE scores of 13.8% [41] and 11.13% (123 households) [18] reported at the same aggregation level. The MAPE minimisation approach, one of the top-tier predictors for individual load forecasting, becomes the worst in the aggregated scenario.

3) Aggregating Forecasts Versus Forecasting the Aggregate: In addition to aggregating the forecasts, aggregated load forecasting can also be carried out by the conventional strategy of forecasting the aggregated load directly. We apply all the top- and second-tier approaches and compare their results in the fourth column of Table II.

It should be noted that the performance of the empirical mean predictor is identical either way, but the other forecasting approaches demonstrate different accuracy under the two strategies. For the two top-tier approaches, the strategy of aggregating the forecasts generally provides more accurate forecasting performance than forecasting the aggregate: aggregating the forecasts improves the LSTM and the BPNN-T by about 0.49 and 1.08 percentage points respectively.

In addition, for the LSTM and the BPNN-T approaches in the first tier, the LSTM once again outperforms the BPNN-T when directly forecasting the aggregated loads. However, due to the difference between the load characteristics on the aggregation level and on the individual residential level, the performance advantage of the LSTM on aggregated loads is weakened. Recall that the LSTM is designed to track and learn the temporal relationships of energy consumption. Such relationships may stem from humans' daily routines, such as the patterns shown in Fig. 3. When this kind of pattern is less prominent in the load at the aggregation level, the advantage of using the LSTM is not as large as when applying it to individual loads.

In summary, the LSTM and the BPNN-T approaches are the two best methods for the task of individual residential load forecasting. More specifically, the LSTM performs much better for single-meter load forecasting than the BPNN-T, owing to its capability of establishing temporal relationships and to the nature of the load at the granular level. The LSTM's performance edge at the aggregation level is not as significant as at the granular level. However, load forecasting at the aggregation level can be further improved by aggregating the granular forecasts.

C. Tracking the Internal State of the LSTM

Recall that customer 8342852, as shown in Fig. 3, demonstrates two major daily consumption patterns. For this particular customer, the LSTM achieves a MAPE of 36.52%, while the second-best BPNN-T only reaches 58.97%. The comparison is shown in the first row of Fig. 10. It can be seen that the LSTM


Fig. 10. Forecast comparison and LSTM internal state visualisation.

forecast follows the peaks much better than the BPNN-T; the BPNN-T always forecasts peaks after the peaks have already happened.

In order to better show how the LSTM forecasts load at this residential level, we also visualise all the internal states of the LSTM in the middle row of Fig. 10. Each curve represents the dynamic tracking of the internal state of one hidden node in the network. The summation of all internal states is shown in the third row of Fig. 10. From the summation of the LSTM states, we can discern five cycles, which correspond to the five-day period (240 time steps). The internal states of the LSTM are the key for the algorithm to develop long-term temporal relationships and help it outperform the BPNN.

Fig. 11. MAPE versus number of outliers for LSTM and BPNN-T.

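This kind of cell-state tracking can be reproduced in miniature with a plain NumPy LSTM cell. The gate equations are the standard LSTM formulation [2], [30]; the weights and inputs here are random stand-ins, not the trained model's:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate order in the stacked weights: i, f, o, g."""
    n = h.size
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[0:n]))       # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:4*n])             # candidate cell update
    c = f * c + i * g                   # new internal (cell) state
    h = o * np.tanh(c)                  # new hidden output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid, T = 4, 20, 240             # 4 features, 20 nodes, 5 days
W = rng.normal(0, 0.1, (4 * n_hid, n_in))
U = rng.normal(0, 0.1, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
states = []                             # internal states over time
for t in range(T):
    x = rng.random(n_in)                # stand-in for a scaled E,I,D,H row
    h, c = lstm_step(x, h, c, W, U, b)
    states.append(c.copy())
states = np.array(states)               # (240, 20): one curve per node
```

Plotting each of the 20 columns of `states` against time gives per-node trajectories of the kind shown in the middle row of Fig. 10, and summing over the columns gives the aggregate in the bottom row.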
D. The Load Consistency and the Forecasting Accuracy

Every household has a different level of lifestyle consistency, which leads to a different difficulty of the forecasting task. In Section III, we introduced the DBSCAN clustering algorithm and used the number of outliers as an indicator of the lifestyle inconsistency of each household. Fig. 11 shows the results of both the LSTM and the BPNN-T for all customers. The blue dots represent the results achieved by the BPNN-T, while the red dots represent the MAPE of the LSTM. Two linear regression curves are fitted, one to each group of results. It turns out that as the inconsistency of lifestyle grows, the energy consumption becomes more difficult to forecast for both approaches. The LSTM approach generally improves the forecasting performance. However, when the inconsistency is relatively small, the improvement is also marginal; the improvement of the LSTM tends to become more obvious as the inconsistency gets larger.

VII. CONCLUSION

This paper addresses the short-term load forecasting problem for individual residential households. First, the difference between the electric load on such a granular level and the load


on the substation level is elaborated. Unlike the load on the system or substation level, where daily consumption patterns always exist, the energy consumption of a single household is usually volatile. A density-based clustering technique is employed to evaluate and compare the inconsistency between the aggregated load and individual loads. According to Practice Theory, the lifestyles of the residents are reflected in the energy consumption as repetitive patterns, no matter how inconsistent they are. Therefore, this paper proposes an LSTM recurrent neural network based load forecasting framework for the extremely challenging task of individual residential load forecasting, because the LSTM has been proven to learn long-term temporal connections.

Multiple benchmarks are comprehensively tested and compared to the proposed LSTM load forecasting framework on a real-world dataset. It turns out that many load forecasting approaches that are successful for grid or substation load forecasting struggle with single-meter load forecasting problems. The proposed LSTM framework achieves generally the best forecasting performance on the dataset.

Moreover, although individual load forecasting is far from accurate, aggregating all the individual forecasts yields a better forecast at the aggregation level than the conventional strategy of directly forecasting the aggregated load.

The inconsistency in daily consumption profiles generally affects the predictability of the customers. The higher the inconsistency is, the more the LSTM can contribute to the forecasting improvement compared to the simple backpropagation neural network.

As for future work, methodologies for parameter tuning can be developed to further enhance the forecasting accuracy for different types of customers, especially those households with very high volatility.

REFERENCES

[1] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[2] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[3] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, 2014, pp. 3104–3112.
[4] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 3156–3164.
[5] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 3128–3137.
[6] J. Mao et al. (2014). Deep Captioning With Multimodal Recurrent Neural Networks (m-RNN). [Online]. Available: https://arxiv.org/abs/1412.6632
[7] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, Beijing, China, 2014, pp. 1764–1772.
[8] X. Cao, S. Dong, Z. Wu, and Y. Jing, "A data-driven hybrid optimization model for short-term residential load forecasting," in Proc. IEEE Int. Conf. Comput. Inf. Technol. Ubiquitous Comput. Commun. Dependable Auton. Secure Comput. Pervasive Intell. Comput. (CIT/IUCC/DASC/PICOM), Liverpool, U.K., 2015, pp. 283–287.
[9] Z. Yun et al., "RBF neural network and ANFIS-based short-term load forecasting approach in real-time price environment," IEEE Trans. Power Syst., vol. 23, no. 3, pp. 853–858, Aug. 2008.
[10] H. Li, Y. Zhao, Z. Zhang, and X. Hu, "Short-term load forecasting based on the grid method and the time series fuzzy load forecasting method," in Proc. Int. Conf. Renew. Power Gener. (RPG), Beijing, China, 2015, pp. 1–6.
[11] P. Qingle and Z. Min, "Very short-term load forecasting based on neural network and rough set," in Proc. Int. Conf. Intell. Comput. Technol. Autom. (ICICTA), Changsha, China, 2010, pp. 1132–1135.
[12] R. Zhang, Z. Y. Dong, Y. Xu, K. Meng, and K. P. Wong, "Short-term load forecasting of Australian national electricity market by an ensemble model of extreme learning machine," IET Gener. Transm. Distrib., vol. 7, no. 4, pp. 391–397, Apr. 2013.
[13] R. Zhang, Y. Xu, Z. Y. Dong, W. Kong, and K. P. Wong, "A composite k-nearest neighbor model for day-ahead load forecasting with limited temperature forecasts," presented at the IEEE Gen. Meeting, Boston, MA, USA, 2016, pp. 1–5.
[14] F. H. Al-Qahtani and S. F. Crone, "Multivariate k-nearest neighbour regression for time series data—A novel algorithm for forecasting UK electricity demand," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Dallas, TX, USA, 2013, pp. 1–8.
[15] M. Ghofrani, M. Ghayekhloo, A. Arabali, and A. Ghayekhloo, "A hybrid short-term load forecasting with a new input selection framework," Energy, vol. 81, pp. 777–786, Mar. 2015.
[16] P. Zhang, X. Wu, X. Wang, and S. Bi, "Short-term load forecasting based on big data technologies," CSEE J. Power Energy Syst., vol. 1, no. 3, pp. 59–67, Sep. 2015.
[17] F. L. Quilumba, W.-J. Lee, H. Huang, D. Y. Wang, and R. L. Szabados, "Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities," IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 911–918, Mar. 2015.
[18] B. Stephen, X. Tang, P. R. Harvey, S. Galloway, and K. I. Jennett, "Incorporating practice theory in sub-profile models for short term aggregated residential load forecasting," IEEE Trans. Smart Grid, vol. 8, no. 4, pp. 1591–1598, Jul. 2017.
[19] M. Chaouch, "Clustering-based improvement of nonparametric functional time series forecasting: Application to intra-day household-level load curves," IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 411–419, Jan. 2014.
[20] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, and M. S. Fadali, "Smart meter based short-term load forecasting for residential customers," in Proc. North Amer. Power Symp. (NAPS), Boston, MA, USA, 2011, pp. 1–5.
[21] S. Ryu, J. Noh, and H. Kim, "Deep neural network based demand side short term load forecasting," in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Sydney, NSW, Australia, 2016, pp. 308–313.
[22] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, "Deep learning for estimating building energy consumption," Sustain. Energy Grids Netw., vol. 6, pp. 91–99, Jun. 2016.
[23] D. L. Marino, K. Amarasinghe, and M. Manic, "Building energy load forecasting using deep neural networks," in Proc. 42nd Annu. Conf. IEEE Ind. Electron. Soc. (IECON), Florence, Italy, 2016, pp. 7046–7051.
[24] Smart Grid, Smart City, Australian Government, Canberra, ACT, Australia, 2014. [Online]. Available: http://www.industry.gov.au/ENERGY/PROGRAMMES/SMARTGRIDSMARTCITY/Pages/default.aspx
[25] E. Shove, M. Pantzar, and M. Watson, The Dynamics of Social Practice: Everyday Life and How It Changes. London, U.K.: SAGE, 2012.
[26] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. KDD, 1996, pp. 226–231.
[27] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990.
[28] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994.
[29] F. K. John and C. K. Stefan, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Networks. New York, NY, USA: IEEE Press, 2001, p. 464.
[30] F. A. Gers, J. A. Schmidhuber, and F. A. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Comput., vol. 12, no. 10, pp. 2451–2471, 2000.
[31] Z. C. Lipton, J. Berkowitz, and C. Elkan. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. [Online]. Available: https://arxiv.org/abs/1506.00019


[32] F. Chollet. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[33] R. Al-Rfou et al. (2016). Theano: A Python Framework for Fast Computation of Mathematical Expressions. [Online]. Available: https://arxiv.org/abs/1605.02688
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014. [Online]. Available: https://arxiv.org/abs/1412.6980
[35] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Feb. 2011.
[36] M. D. Zeiler. (2012). ADADELTA: An Adaptive Learning Rate Method. [Online]. Available: https://arxiv.org/abs/1212.5701
[37] G. Hinton, N. Srivastava, and K. Swersky, "RMSProp: Divide the gradient by a running average of its recent magnitude," in COURSERA: Neural Networks for Machine Learning, vol. 4, 2012. [Online]. Available: https://www.coursera.org/learn/neural-networks/lecture/YQHki/rmsprop-divide-the-gradient-by-a-running-average-of-its-recent-magnitude
[38] S. M. Mousavi and H. A. Abyaneh, "Effect of load models on probabilistic characterization of aggregated load patterns," IEEE Trans. Power Syst., vol. 26, no. 2, pp. 811–819, May 2011.
[39] A.-R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14–22, Jan. 2012.
[40] G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[41] A. Marinescu, C. Harris, I. Dusparic, S. Clarke, and V. Cahill, "Residential electrical demand forecasting in very small scale: An evaluation of forecasting methods," in Proc. 2nd Int. Workshop Softw. Eng. Challenges Smart Grid (SE4SG), San Francisco, CA, USA, 2013, pp. 25–32.
[42] M. Rowe, T. Yunusov, S. Haben, W. Holderbaum, and B. Potter, "The real-time optimisation of DNO owned storage devices on the LV network for peak reduction," Energies, vol. 7, pp. 3537–3560, May 2014.

Weicong Kong (S'14) received the B.E. and M.E. degrees from the South China University of Technology, Guangzhou, China, in 2008 and 2011, respectively, the M.Sc. degree from the University of Strathclyde, Glasgow, U.K., in 2009, and the Ph.D. degree from the University of Sydney, Australia, in 2017. He was an Electrical Engineer with Shenzhen Power Supply Company from 2011 to 2014, in charge of the development of distribution automation systems, SCADA and AMI. He is currently a Post-Doctoral Research Fellow with the University of New South Wales. His research interests include data analytics and deep learning in energy engineering, including non-intrusive load monitoring, load forecasting, demand response, renewable energy integration, and the energy market.

Zhao Yang Dong (M'99–SM'06–F'17) received the Ph.D. degree from the University of Sydney, Australia, in 1999. He is currently with the University of NSW, Sydney, Australia. He was with the University of Sydney, and was an Ausgrid Chair and the Director of the Ausgrid Centre of Excellence for Intelligent Electricity Networks, University of Newcastle, Australia. He also worked as a System Planning Manager with Transend Networks (currently TASNetworks), Australia. His research interests include smart grid, power system planning, power system security, load modeling, renewable energy systems, electricity market, and computational intelligence. He is an Editor of the IEEE TRANSACTIONS ON SMART GRID, the IEEE PES Letters, and IET Renewable Power Generation.

Youwei Jia (S'11–M'15) received the B.Eng. degree from Sichuan University, China, in 2011 and the Ph.D. degree from Hong Kong Polytechnic University, Hong Kong, in 2015. He is currently a Post-Doctoral Fellow with Hong Kong Polytechnic University. His research interests include power system security analysis, cascading failures, complex networks, and artificial intelligence applications in power engineering.

David J. Hill (S'72–M'76–SM'91–F'93–LF'14) received the Ph.D. degree in electrical engineering from the University of Newcastle, Australia, in 1976. He holds the Chair of Electrical Engineering in the Department of Electrical and Electronic Engineering, University of Hong Kong. He is also a Part-Time Professor with the University of Sydney, Australia. From 2005 to 2010, he was an Australian Research Council Federation Fellow with the Australian National University. Since 1994, he has held various positions with the University of Sydney, Australia, including the Chair of Electrical Engineering until 2002 and again from 2010 to 2013, along with an ARC Professorial Fellowship. He has also held academic and substantial visiting positions with the University of Melbourne, the University of California at Berkeley, the University of Newcastle, Australia, Lund University, Sweden, the University of Munich, the City University of Hong Kong, and Hong Kong Polytechnic University. His general research interests are in control systems, complex networks, power systems, and stability analysis. His work is currently mainly on control and planning of future energy networks and basic stability and control questions for dynamic networks. Prof. Hill is a fellow of the Society for Industrial and Applied Mathematics, USA, the Australian Academy of Science, the Australian Academy of Technological Sciences and Engineering, and the Hong Kong Academy of Engineering Sciences. He is also a Foreign Member of the Royal Swedish Academy of Engineering Sciences.

Yan Xu (S'10–M'13) received the B.E. and M.E. degrees from the South China University of Technology, Guangzhou, China, in 2008 and 2011, respectively, and the Ph.D. degree from the University of Newcastle, Australia, in 2013. He is currently the Nanyang Assistant Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He was previously with the School of Electrical and Information Engineering, University of Sydney, Australia. His research interests include power system stability, control, and optimization, microgrids, and data analytics for smart grid applications.

Yuan Zhang (S'16) received the B.E. and M.E. degrees from Xi'an Jiaotong University, Xi'an, China, in 2010 and 2013, respectively. She is currently pursuing the Ph.D. degree with the University of New South Wales, Australia. She was previously with the University of Sydney, Australia, and the National University of Singapore, Singapore. Her research interests include the electricity market, home energy management, statistical methods, and machine learning and their applications in power engineering.
