Open Comput. Sci. 2018; 8:38–50

Research Article

Peter Laurinec* and Mária Lucká

Clustering-based forecasting method for individual consumers electricity load using time series representations

https://doi.org/10.1515/comp-2018-0006
Received Feb 26, 2018; accepted May 22, 2018
Abstract: This paper presents a new method for forecasting the load of individual electricity consumers using smart grid data and clustering. The data from all consumers are used for clustering to create more suitable training sets for forecasting methods. Before clustering, time series are efficiently preprocessed by normalisation and the computation of various model-based time series representation methods. Final centroid-based forecasts are scaled by saved normalisation parameters to create the forecast for every consumer. Our method is compared with the approach that creates forecasts for every consumer separately. Evaluation and experiments were conducted on three smart meter datasets from residences of Ireland and Australia, and factories of Slovakia. The achieved results showed that our clustering-based method improves forecasting accuracy mainly for residential consumers. We can also state that a time series representation and clustering setting can be found with which our forecasting method performs more accurately than the fully disaggregated approach. Our method is also more scalable since it is necessary to train the model only on clusters and not for every consumer separately.

Keywords: clustering, time series data mining, electricity load forecasting, smart grid

*Corresponding Author: Peter Laurinec: Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovičova 2, 842 16 Bratislava, Slovak Republic; Email: [email protected]
Mária Lucká: Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovičova 2, 842 16 Bratislava, Slovak Republic; Email: [email protected]
Open Access. © 2018 Peter Laurinec and Mária Lucká, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License

1 Introduction

Accurate decision making based on data-driven technologies and processes is in high demand nowadays. A large amount of data is stored in order to improve knowledge discovery and to support decision making. This also happens in the energy industry, especially through the deployment of smart grids. The smart grid consists of consumers (also producers) of electricity load, where every consumer is equipped with a smart meter that sends data reflecting actual electricity consumption (or production), usually in 15 or 30 minute intervals. Analysis of the whole smart grid, consisting of thousands or even millions of consumers, is important for many reasons. The most important among them are predicting and avoiding disturbances and blackouts, sustainable (environmental) usage of energy and other economic factors. The economic factors include questions about what amount of energy must be produced or saved, and also sold or bought respectively. To make these decisions responsibly, accurate forecasting of future electricity load values is essential. Practitioners or salespersons can be interested in forecasting a global aggregated consumption, or an aggregated one in a small area, or even the consumption of individual end-consumers. For some companies or producers of electricity, disaggregated end-consumer consumption forecasting is the most important task. However, this is a very difficult task because every consumer behaves differently and often unpredictably (because of a high rate of random effects). For these reasons, the end-consumers' time series of electricity consumption are very noisy and often irregular. This is the reason why classical forecasting methods often fail, and developing new robust methods becomes important.

We propose a new clustering-based forecasting method that uses smart grid data gathered from all consumers and shares data consumption profiles to improve the accuracy of the forecast calculated for disaggregated (individual) end-consumer electricity consumption. Representatives of clusters are used as training data for forecasting methods to overcome noise and irregularities in
disaggregated time series (avoiding also overfitting). Our method decreases the maximum forecasting errors as well.

This paper is structured as follows: Section 2 contains an introduction describing related works and contributions. Section 3 describes our approach containing the methods used for time series processing, cluster analysis, and forecasting. Section 4 presents a description of the used datasets and the evaluation of performed experiments, and Section 5 concludes the paper.

2 Related work

There are only a few research papers that are focused on individual (disaggregated) consumer consumption forecasting in a smart grid. This can be due to its difficulty and the possibility of unstable results. To the best of our knowledge, there are no papers describing methods based on clustering the load profiles of all consumers to improve the accuracy of individual forecasts.

Wijaya et al. [1] use a correlation-based feature selection method for forecasting individual and also aggregate residential electricity load. They use linear regression, multilayer perceptron and support vector regression as forecasting methods. Forecasting aggregate electricity load is enhanced by clustering, but for forecasting individual consumers the clustering is not used. Support vector regression and linear regression achieved the best results of the forecast for an individual residential consumer, but the error rate remains high, around 45% of the NMAE measure (Normalised Mean Absolute Error). Ghofrani et al. [2] use spectral analysis and Kalman filtering as the method for residential customers. They evaluated three forecasting horizons: 15 minutes, 30 minutes and 1 hour. The 1-hour horizon forecasts had a 30% error rate of the MAPE measure (Mean Absolute Percentage Error) on average. Koskova et al. [3] use ensembles of variable time series analysis and regression methods to produce forecasts of aggregate loads via ZIP code. The median-based weighted ensemble learning method significantly outperforms individual forecasting models.

Using clustering for improving aggregated (global) load forecasting is a more widely used method than using results of clustering for individual consumers load forecasting. Misiti [4] used discrete wavelet transform as the preprocessing of signals from consumers. The transformed data were then clustered and next optimally reclustered to a final load time series. The optimised reclustering procedure was controlled by the forecasting performance based on the cross-prediction dissimilarity index. Bandyopadhyay et al. [5] proposed context-based forecasting with clustering for improving the forecast accuracy of aggregate electricity consumption. They compared results from the proposed method with the completely aggregated method, completely disaggregated method, and K-means clustering. In our previous works [6–8], we have focused on time series representations and various forecasting methods to improve the forecasting accuracy of an aggregated load in combination with K-means clustering. We concluded that the model-based representation methods are the best for extracting consumers' patterns. We have shown that clustering based on different types of representations gives more accurate results than clustering using only the original load time series.

2.1 Contributions

To the best of our knowledge, the combination of the results of clustering all consumers in a smart grid with forecasting the consumption of individual consumers has not been explored and evaluated until now. We propose a centroid-based method for forecasting that increases forecasting accuracy and decreases computational load as well. The time needed for calculations is significantly reduced from N trained models, where N is the number of consumers, to K trained models, where K is the number of centroids (clusters created). An additional bonus is the information describing typical profiles of consumption created by clustering, which can be used for further analysis.

3 Proposed method

The description of our proposed clustering-based (or centroid-based) method for disaggregated end-consumer forecasting of electricity load can be better illustrated via a special example. We will suppose that N is the number of consumers, the length of the training set is 21 days (3 weeks), whereby in every day we will consider 24 × 2 = 48 measurements, and we will execute one hour ahead forecasts.

1. Starting with iteration iter = 0.
2. Creation of a time series of the length of three weeks for each consumer.
3. Normalisation of each time series by z-score (keeping the mean and the standard deviation in memory for every time series).
4. Computation of representations of each time series.
5. K-means or K-medoids clustering of the representations, with the optimal number of clusters computed automatically.
6. Extraction of the K centroids and their use as the training set for any forecasting method.
7. Denormalisation of the K forecasts using the stored means and standard deviations to produce N forecasts.
8. iter = iter + 1. If iter is divisible by 24 (iter mod 24 = 0), steps 4 and 5 are performed; otherwise they are skipped and the stored centroids are used.
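As an illustration only, steps 2 to 7 of the procedure above can be sketched in plain Python. This is a simplified sketch under several assumptions that are not taken from the paper: all function names are hypothetical, the representation is the median daily profile, K-means is seeded with the first K points rather than with K-means++, the number of clusters is passed in by hand instead of being chosen by a validity index, and the seasonal naïve method stands in for "any forecasting method". The sliding window and the periodic reclustering of step 8 are omitted.

```python
from statistics import mean, median, stdev


def zscore(series):
    """Step 3: return the normalised series plus the stored mean and std."""
    mu = mean(series)
    sigma = stdev(series) or 1.0  # assumption: flat series fall back to 1.0
    return [(x - mu) / sigma for x in series], mu, sigma


def median_daily_profile(series, freq):
    """Step 4: median of every within-day position over all days."""
    return [median(series[k::freq]) for k in range(freq)]


def kmeans(points, k, n_iter=20):
    """Step 5: plain Lloyd iterations, seeded with the first k points."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [mean(col) for col in zip(*members)]
    return labels


def seasonal_naive(series, season, horizon):
    """Forecast each step with the value observed one season earlier."""
    start = len(series) - season
    return [series[start + h] for h in range(horizon)]


def centroid_based_forecasts(loads, freq, k, horizon):
    """Return one forecast per consumer, trained on only k centroid series."""
    normalised, params = [], []
    for series in loads:
        z, mu, sigma = zscore(series)
        normalised.append(z)
        params.append((mu, sigma))
    labels = kmeans([median_daily_profile(z, freq) for z in normalised], k)

    # Step 6: average the normalised series of each cluster and forecast it.
    cluster_forecast = {}
    for c in set(labels):
        members = [z for z, lab in zip(normalised, labels) if lab == c]
        centroid_series = [mean(vals) for vals in zip(*members)]
        cluster_forecast[c] = seasonal_naive(centroid_series, 7 * freq, horizon)

    # Step 7: rescale the cluster forecast with each consumer's own mean/std.
    return [
        [f * sigma + mu for f in cluster_forecast[lab]]
        for lab, (mu, sigma) in zip(labels, params)
    ]
```

With three weeks of half-hourly data per consumer, a call could look like `centroid_based_forecasts(loads, freq=48, k=20, horizon=2)`, where `k=20` is an arbitrary illustrative value. Only K models are fitted, while the stored means and standard deviations turn them into N consumer-level forecasts.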
The new batch of data is created by the sliding window approach, in which the window is moved each time by two values, that is, by one hour, because the forecasts are performed one hour (two values) ahead. Our proposed method is compared with a typical approach that trains N models and produces N forecasts. A detailed description of the methods used in the procedure above follows.

3.1 Normalisation and representation of time series

The first necessary step is the normalisation of the time series of electricity consumption by the z-score, because we want to cluster similar patterns and not the time series according to the amount of energy consumption. We also use the results of normalisation for scaling (denormalisation) of the clustering-based forecasts. Z-score normalisation is defined as

$$\hat{x}_i = \frac{x_i - \mu}{\sigma}, \tag{1}$$

where $\hat{x}_i$ is a normalised value, $x_i$ is an original value, $i = 1, \dots, n$, where $n$ is the length of the time series, $\mu$ is the sample mean and $\sigma$ is the sample standard deviation. Z-score denormalisation is defined according to the previous equation as

$$x_i = \hat{x}_i \sigma + \mu. \tag{2}$$

In Figure 1, the denormalisation procedure with z-score on electricity consumption data is shown.

Figure 1: The z-score denormalisation procedure visualisation.

The next step is the computation of the time series representation, which is an input to the clustering algorithm. The modification of the time series to its representation is performed by a suitable transformation. The main reason for using representations of time series is to make work with time series more effective and easier, depending on the application. Using time series representations is appropriate because the dimensionality reduction also leads to a reduction of memory requirements and to a decrease of computational complexity. This implicitly removes noise and emphasises the essential characteristics of the data. As we have shown in our previous work, model-based representations are highly appropriate for seasonal time series [6]. As a model, Multiple Linear Regression (MLR), L1 regression (L1) or Generalized Additive Model (GAM) is used for the extraction of regression coefficients of two seasonalities (daily and weekly). Formally, the model for the MLR and L1 methods can be written as follows:

$$x_t = \beta_{d1} u_{td1} + \dots + \beta_{d\mathit{freq}} u_{td\mathit{freq}} + \beta_{w1} u_{tw1} + \dots + \beta_{w6} u_{tw6} + \varepsilon_t, \tag{3}$$

for $t = 1, \dots, n$, where $x_t$ is the $t$-th electricity consumption, $\beta_{d1}, \dots, \beta_{d\mathit{freq}}$ are regression coefficients for the daily season, $\mathit{freq}$ is the length of the one day period ($\mathit{freq} = 48$ in our case), and $\beta_{w1}, \dots, \beta_{w6}$ are regression coefficients for the weekly season. There are just six weekly regression coefficients, not seven, in order to prevent singularity of the model. The $u_{td1}, \dots, u_{td\mathit{freq}}, u_{tw1}, \dots, u_{tw6}$ are independent binary (dummy) variables representing the sequence numbers in the regression model. They are equal to 1 when they point to the $j$-th value of the season, $j = 1, 2, \dots, \mathit{freq}$ in the case of the daily season and $j = 1, 2, \dots, 6$ in the case of the weekly season. The $\varepsilon_t$ are random errors having the normal distribution $N(0, \sigma^2)$ that are mutually independent for different $t$. The most widespread methods for obtaining an estimate of the vector $\beta = (\beta_{d1}, \dots, \beta_{d\mathit{freq}}, \beta_{w1}, \dots, \beta_{w6})$ are the Ordinary Least Squares method [9] in the case of MLR and the Frisch-Newton interior point method [10] in the case of L1.

A possible extension of linear models is the Generalized Additive Model [11]. The difference compared to multiple linear regression is that the variables (predictors) are modelled by using smoothing functions. The GAM model can be written as follows:

$$E(x_i) = \beta_0 + f_d(u_{id}) + f_w(u_{iw}), \quad i = 1, \dots, n, \tag{4}$$
where $f_d$ is a cubic regression spline and $u_d$ is the vector of type $(1, 2, \dots, \mathit{freq}, 1, \dots, \mathit{freq}, \dots)$. The $f_w$ is a P-spline and $u_w$ is the vector of type $(1, 1, \dots, 1, 2, \dots, 2, \dots, 7, \dots, 1, \dots)$. The parameters of the model are estimated by penalized iteratively re-weighted least squares (P-IRLS).

Another model-based representation that was used in our experiments is a simple median daily profile. A point of the representation $\mathit{repr}_k$, $k = 1, \dots, \mathit{freq}$, is calculated as follows:

$$\mathit{repr}_k = \operatorname{median}\big(x_k, x_{k+\mathit{freq}}, \dots, x_{k+\mathit{freq}\times(d-1)}\big),$$

where $d$ is the number of days in the data set.

In Figure 2, the transformation of a time series of the length of three weeks (48 × 21 = 1008) to the four different model-based representations is shown.

Figure 2: The comparison of four model-based time series representations on a randomly picked Australian consumer. [Panels: "Original Time Series" (Normalized Load against Length) and "Time Series Representations" (Representation value against Length, with the daily and weekly periods marked; methods MLR, L1, GAM and Profile).]

3.2 Clustering

For clustering consumers, we used the centroid-based clustering method K-means with the centroid initialisation K-means++ [12], and K-medoids [13].

The advantage of K-means++ over conventional K-means lies in the careful seeding of initial centroids, which improves the speed and accuracy of clustering. It works as follows. Let $d(x)$ denote the shortest Euclidean distance from a data point $x$ to the closest centroid. Choose an initial centroid $K_1$ uniformly at random from the set $X$, where $X$ is the dataset of size $N \times n$. Choose the next centre $K_i = \hat{x} \in X$ with probability $\frac{d(\hat{x})^2}{\sum_{x \in X} d(x)^2}$. Repeat the previous step until all $K$ centres have been chosen. Each object from the dataset is then assigned to the centroid $K_i$ that is closest to it, and afterwards new centroids are calculated. The last two steps are repeated until the assignment to clusters does not change.

For K-medoids computation, the Partition Around Medoids (PAM) algorithm was used [13].

In each special iteration (iter mod 24 = 0) of the batch processing, we have automatically determined the optimal number of clusters K according to the internal validation measure, the Davies-Bouldin index [14].

After the clusters of consumers are computed, the original normalised time series are averaged according to their corresponding centroids. The forecasts calculated from the centroids are then denormalised according to Eq. 2 to create forecasts for every consumer.

Figure 3 presents the clustering results of time series preprocessed by the MLR model-based representation using the Slovak dataset. The corresponding centroid-based time series are shown in Figure 4.

Figure 3: 20 clusters of Slovak consumers represented by regression coefficients created by K-means. Centroids are displayed by the red line. Blue dashed line splits daily (48) and weekly (6) seasonal regression coefficients. [Panels 1 to 20: Regression Coefficients against Length.]

Figure 4: Final centroid-based time series based on Figure 3 are the input to forecasting methods. [Panels 1 to 20: Normalized Load against Time.]

3.3 Forecasting methods

Five different forecasting methods were implemented to observe the performance of the clustering-based forecasting method.

SNAIVE

The seasonal naïve method is appropriate only for seasonal time series data. All forecasts are simply set to the values of the last observations from the previous season. In our case, it means that the forecasts of all future values are set to be equal to the last observed values from the previous week.

MLR

The Multiple Linear Regression method was also used for forecasting purposes, but the model differed from the one defined in Eq. 3. In this scenario, the daily and weekly attributes interacted with each other, so instead of ds + w6 attributes we have ds × w6 interactions.

RF

Random Forests is an ensemble learning method that constructs a large number of trees and outputs the mean prediction of the individual regression trees [15]. For adaptation to a trend change, the dependent variable (time series of electricity consumption) was detrended by STL decomposition [16] in order to improve forecasting accuracy (in Figure 4, time series no. 4, 7, 10, 12, 16, 18 and 20 show a trend change). From the extracted trend part of the time series, future values were forecasted by the automatic ARIMA procedure [17] and added to the forecast from the RF model, which predicts the aggregated seasonal and remainder part of the time series. As attributes (independent variables) of the model, double seasonal Fourier terms were created. The Fourier signals are well suited for modelling seasonal time series because they consist of periodic trigonometric functions. The daily seasonal Fourier term has ds pairs of terms

$$\left( \sin\left(\frac{2\pi j t}{48}\right), \cos\left(\frac{2\pi j t}{48}\right) \right)_{j=1}^{ds} \tag{5}$$

and the weekly seasonal Fourier term has ws pairs of terms

$$\left( \sin\left(\frac{2\pi j t}{7}\right), \cos\left(\frac{2\pi j t}{7}\right) \right)_{j=1}^{ws}, \tag{6}$$

where $t = (1, \dots, n)$. As we have experimentally verified, the best setting of the number of Fourier term pairs was ds = 2 and ws = 2. The weekly seasonal component (part) from the STL decomposition [16] with a one day lag was also used as an attribute of the model. The hyperparameters for RF were set as follows: the number of trees was set to 1100, the minimum size of terminal nodes to 3 and the number of variables randomly sampled at each split to 3.

CTREE

Conditional inference trees is a statistical approach to recursive partitioning, which takes into account the distributional properties of the measurements [18]. CTREE performs multiple test procedures that are applied to determine whether no significant association between any of the covariates and the response can be stated and the recursion needs to stop. The conditional inference trees method was used with the version of the model with double-seasonal Fourier terms as defined in Eq. 5 and 6, with the same number of term pairs as the RF model.

ES

Triple Exponential Smoothing is a forecasting method applied to a seasonal time series, whereby past observations are not weighted equally, but the weights decrease exponentially with time [19]. In order to adapt a model to various patterns in electricity consumption data, three different models were fitted each time and the best of them was picked to produce a forecast. These models were a full additive model with a trend component, an additive model with a damped trend and an additive model without a trend. The best model among them was chosen according to the best value of the Akaike Information Criterion (AIC) [19].

We remark that only the SNAIVE and CTREE forecasting methods were used in both approaches. The first approach was clustering-based (snaive-clust and ctree-clust) and the second approach is the simple benchmark that is applied to every consumer individually (snaive-disagg and ctree-disagg). These two methods were chosen for the benchmark because they are fast to compute (train), which matters since they must be applied to every consumer separately.

4 Evaluation and experiments

The source code of all the implemented methods is available online (https://github.com/PetoLau/ClusterForecast). Time series representation methods are available in the TSrepr package (https://cran.r-project.org/package=TSrepr) that enables fast computing [20].

In the next sections, two different versions of experiments will be described. The first one is the original from our conference paper [21] and the second one is an extended evaluation on some different settings of the used methods.

4.1 Smart grid data

To evaluate the performance of our clustering-based forecasting method, we used three different datasets consisting of a large number of variable patterns that were gathered from smart meters. This measurement data includes Irish, Slovak, and Australian electricity load data.

The Irish data was collected by the Irish Commission for Energy Regulation (CER) and is available from the Irish Social Science Data Archive (http://www.ucd.ie/issda/data/commissionforenergyregulationcer/). This data contains three different types of customers: residential, SMEs and others. The largest group is residential, where after removing consumers with missing data, we have 3639 residential consumers left. The frequency of data measurements was thirty minutes, so during a day, 48 measurements were performed.

The Slovak data was collected within the project "International Centre of Excellence for Research of Intelligent and Secure Information-Communication Technologies and Systems". These measurements were obtained
from Slovak enterprises and factories. After removing consumers with missing data, with zero consumption, or with a maximal consumption higher than 42 kW, the customer base comprised 3607 consumers.
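The filtering rule above could look like the following sketch. The function name is hypothetical, and two readings of the text are assumed rather than confirmed by the paper: "missing data" is taken to mean any missing reading (None or NaN), and "zero consumption" is taken to mean that every reading is zero.

```python
import math


def keep_consumer(series, max_kw=42.0):
    """Return True if a consumer's load series passes the three filters."""
    # Assumption: a single missing reading disqualifies the consumer.
    if any(x is None or math.isnan(x) for x in series):
        return False
    # Assumption: "zero consumption" means the whole series is zero.
    if all(x == 0 for x in series):
        return False
    # Drop consumers whose maximal load exceeds the threshold (42 kW here).
    return max(series) <= max_kw
```

Applied as `[s for s in all_series if keep_consumer(s)]`, this keeps only the consumers that would enter clustering and forecasting.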
The Slovak dataset was measured every fifteen minutes, giving 96 measurements per day. The measurements were aggregated to half-hourly values in order to make them comparable with the Irish dataset.
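A minimal sketch of this aggregation follows. The text does not say whether each pair of 15-minute readings was averaged or summed; averaging suits power readings in kW and summing suits energy readings, so both are offered here as an assumption. The function name and the `how` parameter are hypothetical.

```python
def aggregate_to_half_hourly(quarter_hourly, how="mean"):
    """Combine consecutive pairs of 15-minute readings into 30-minute values."""
    if len(quarter_hourly) % 2:
        raise ValueError("expected an even number of 15-minute readings")
    pairs = zip(quarter_hourly[0::2], quarter_hourly[1::2])
    if how == "mean":
        return [(a + b) / 2 for a, b in pairs]
    if how == "sum":
        return [a + b for a, b in pairs]
    raise ValueError("how must be 'mean' or 'sum'")
```

One day of 96 readings becomes 48 values, the same daily length as in the Irish data.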
The Australian data was collected by the Ausgrid company (https://www.ausgrid.com.au/Common/About-us/Corporate-information/Data-to-share/Solar-home-electricity-data.aspx). The half-hour electricity data is for 300 homes with rooftop solar systems that are measured by a gross meter that records the total amount of solar power generated every 30 minutes. The data has been sourced from 300 randomly selected solar customers in Ausgrid's electricity network area that were billed on a domestic tariff and had a gross metered solar system installed for the whole of the period from 1 July 2010 to 30 June 2013.

Figure 5: The map of Australian residential consumers and their corresponding nearest meteorological station. These stations are: Sydney, Bankstown, and Williamtown.
The Ausgrid residential dataset was also merged with temperature data from three meteorological stations that were downloaded from the Weather Underground API (https://www.wunderground.com/). The temperature feature was added to the CTREE method model (ctree-disagg and ctree-t-clust). The ctree-t-clust model used temperature data from the station that had the highest occurrence in the selected cluster. The map of consumers and their nearest meteorological station is shown in Figure 5.

From each of the mentioned three datasets, a three-week testing set was chosen to investigate the performance of the implemented methods. For the Irish residential testing dataset, the data measurements from 1.2.2010 to 21.2.2010 were chosen. For the Slovak factories testing dataset, the data measurements from 10.2.2014 to 2.3.2014 were chosen. For the Ausgrid residential testing dataset, the data measurements from 24.7.2010 to 14.8.2010 were chosen. Moreover, we had an additional 21 days (three weeks) in front of every first day of the mentioned periods. Those days were used for clustering and training the forecasting methods.

The accuracy of the electricity load forecast was measured by MAE (Mean Absolute Error) [22]. MAE is defined as

$$\frac{1}{n} \sum_{t=1}^{n} \left| x_t - \overline{x}_t \right|, \tag{7}$$

where $x_t$ is the real consumption, $\overline{x}_t$ is the forecasted load and $n$ is the length of the data.

4.2 Original results of experiments

This section is based on our original conference paper [21]. It contains experiments on the Irish and Slovak datasets. The K-means clustering method was used, with the optimal number of clusters ranging from 25 to 28. As a time series representation method, the MLR model-based method was used. The forecasting methods SNAIVE, RF, MLR, and ES were used.

As mentioned above, we have calculated one hour ahead forecasts (i.e., short-term load forecasting). For every consumer in the dataset, 504 forecasts were performed during the three-week testing period. The statistics of the results of the MAE measure are shown in Table 1. The statistics, namely the mean, median and maximum of errors, were computed for every consumer separately.

Table 1: Statistics of MAE (in kW) of 5 forecasting methods evaluated on Irish and Slovak datasets. The cells contain the average values over all consumers. disagg represents completely disaggregated forecasting, clust represents the clustering-based forecasting method described in Section 3. Underlined values in italic represent the best results among the 5 methods.

MAE             Ireland Residential            Slovak Factories
                Mean     Median   Max          Mean     Median   Max
snaive-disagg   0.3807   0.1928   3.014        2.6903   1.769    16.1599
snaive-clust    0.3373   0.235    2.6605       2.7873   2.1479   14.0958
mlr-clust       0.3403   0.2394   2.6453       2.9326   2.3109   14.0306
rf-clust        0.3394   0.2425   2.675        2.7639   2.0765   14.4476
es-clust        0.3359   0.2387   2.6629       2.6752   2.0283   14.1695

On the Irish residential dataset, on average, the lowest MAE was achieved by using our clustering-based forecasting method combined with exponential smoothing (es-clust). However, with respect to the median of errors, the completely disaggregated method snaive-disagg performed best (Table 1). The lowest average maximal errors were achieved by the mlr-clust method. On the Slovak dataset, on average, the lowest MAE was again achieved by using our clustering-based forecasting method combined with exponential smoothing (es-clust). However, with respect to the median of errors, the completely disaggregated method snaive-disagg performed best. The lowest average maximal errors were again achieved by mlr-clust. We can see an obvious pattern in the results of the performed experiments. The best method on average is our clustering-based method combined with exponential smoothing (es-clust). The completely disaggregated approach in combination with the seasonal naïve forecasting method performed best with respect to the medians of errors. And finally, with respect to the maximal errors, our clustering-based method combined with multiple linear regression (mlr-clust) had the best results. So, our clustering-based method outperformed the benchmark in terms of average and maximal error.

The reasons why our clustering-based method was mostly not better than the benchmark with respect to the medians of errors are clearer from the following visualisations. In Figure 6, boxplots of hourly errors of forecasts by the implemented methods for the Irish dataset are shown, while in Figure 7, boxplots for the Slovak dataset are shown. The reason for this situation is the fact that many zero values of errors occur. We can see that the minimums and the lower quartiles of the boxplots often lie on a zero value, and the median of errors is also zero for some hours. This pushes the median very low when forecasts are produced with the seasonal naïve method for every consumer separately, because the consumption often does not change during the week. Therefore the use of sophisticated machine learning methods for completely disaggregated electricity consumption time series is problematic. However, for time series created as a result of clustering, it is highly recommended to use machine learning methods that, for example, incorporate a time series trend into the model.

Figure 6: Boxplots of average hourly MAE (in kW) for 5 forecasting methods on the Irish dataset. DisAgg represents completely disaggregated forecasting, Clust represents clustering-based forecasting method described in Section 3. [Axes: MAE against Hour (1 to 24); methods ES_Clust, MLR_Clust, RF_Clust, SNAIVE_Clust, SNAIVE_DisAgg.]

Figure 7: Boxplots of average hourly MAE (in kW) for 5 forecasting methods on the Slovak dataset. DisAgg represents completely disaggregated forecasting, Clust represents clustering-based forecasting method described in Section 3. [Axes: MAE against Hour (1 to 24); methods ES_Clust, MLR_Clust, RF_Clust, SNAIVE_Clust, SNAIVE_DisAgg.]

4.3 Extended evaluation

In order to push the median of forecasting errors lower than in the completely disaggregated approach, new experiments with other time series representation methods and clustering settings were performed. Results of experiments only from the Irish and Ausgrid residential datasets are shown here, because on the Slovak factory dataset the results were not satisfactory. This implies that our clustering-based method is appropriate only for residential consumers data.

Results of experiments are shown in Tables 2 and 3. These tables also contain information about what type of clustering method was used (K-means or K-medoids), what type of time series representation method was used (L1,
Table 2: Statistics of MAE (in kW) of 7 forecasting methods evaluated on the Ausgrid dataset. The cells contain the average values over all consumers. disagg represents completely disaggregated forecasting, clust represents the clustering-based forecasting method described in Section 3. Kmed represents usage of K-medoids and Km represents K-means. Numbers next to the clustering and representation method abbreviations represent the range of the number of clusters. Underlined values in italic represent the best results among all forecasting methods.

MAE, Ausgrid Residential

                Kmed GAM 30-33         Km GAM 20-23           Km L1 20-23            Km L1 27-30
Method          mean   median max      mean   median max      mean   median max      mean   median max
snaive-disagg   0.268  0.146  1.730    0.268  0.146  1.730    0.268  0.146  1.730    0.268  0.146  1.730
ctree-disagg    0.234  0.143  1.587    0.234  0.143  1.587    0.234  0.143  1.587    0.234  0.143  1.587
snaive-clust    0.232  0.143  1.547    0.232  0.144  1.527    0.232  0.140  1.546    0.248  0.181  1.466
mlr-clust       0.226  0.142  1.524    0.227  0.142  1.512    0.226  0.138  1.531    0.243  0.180  1.444
es-clust        0.226  0.140  1.544    0.227  0.141  1.532    0.224  0.137  1.550    0.239  0.175  1.452
ctree-t-clust   0.226  0.140  1.535    0.227  0.141  1.520    0.225  0.137  1.541    0.241  0.176  1.455
ctree-clust     0.225  0.141  1.527    0.227  0.142  1.513    0.226  0.138  1.531    0.242  0.178  1.443

Table 3: Statistics of MAE (in kW) of 6 forecasting methods evaluated on the Ireland dataset. The cells contain the average values over all consumers. disagg represents completely disaggregated forecasting, clust represents the clustering-based forecasting method described in Section 3. Kmed represents usage of K-medoids and Km represents K-means. Numbers next to the clustering and representation method abbreviations represent the range of the number of clusters. Underlined values in italic represent the best results among all forecasting methods.

MAE, Ireland Residential

                Kmed L1 37-40          Km GAM 37-40           Kmed GAM 40-43         Kmed Profile 40-43
Method          mean   median max      mean   median max      mean   median max      mean   median max
snaive-disagg   0.381  0.193  3.014    0.381  0.193  3.014    0.381  0.193  3.014    0.381  0.193  3.014
ctree-disagg    0.324  0.195  2.637    0.324  0.195  2.637    0.324  0.195  2.637    0.324  0.195  2.637
snaive-clust    0.305  0.165  2.860    0.333  0.234  2.651    0.307  0.169  2.848    0.304  0.164  2.862
mlr-clust       0.305  0.167  2.847    0.335  0.238  2.637    0.307  0.171  2.832    0.304  0.166  2.849
es-clust        0.304  0.164  2.869    0.332  0.234  2.655    0.306  0.168  2.854    0.304  0.163  2.871
ctree-clust     0.305  0.166  2.854    0.333  0.236  2.643    0.306  0.170  2.840    0.304  0.165  2.856


Table 4: Counts of Ausgrid residential consumers with better forecasting accuracy with our method than with the benchmark approach. DisAgg represents completely disaggregated forecasting, Clust represents the clustering-based forecasting method described in Section 3. Kmed represents usage of K-medoids and Km represents K-means. Sig represents the count of significantly better results, and Mean represents the count of better results on average. Underlined values in italic represent the best results among all 4 approaches.

Ausgrid Residential

                               Kmed GAM   Km GAM   Km L1   Km L1
Number of clusters             30-33      20-23    20-23   27-30
snaive DisAgg-Clust Sig        139        143      144     86
snaive DisAgg-Clust Mean       251        255      247     219
ctree DisAgg-Clust Sig         75         74       85      24
ctree DisAgg-Clust Mean        205        199      205     133
snaive-es DisAgg-Clust Sig     142        147      155     103
snaive-es DisAgg-Clust Mean    265        258      257     241
ctree-es DisAgg-Clust Sig      77         75       78      27
ctree-es DisAgg-Clust Mean     204        198      205     146

GAM or median daily profile) and the range of the optimal number of clusters.

On the Ausgrid dataset (Table 2), on average, the lowest MAE was achieved by our clustering-based method with K-means, L1 representation and a range of the number of clusters of 20–23, and with K-medoids, GAM representation and a range of the number of clusters of 30–33. On the median, the lowest MAE was again achieved by our clustering-based method with K-means, L1 representation and a range of the number of clusters of 20–23. However, on the maximum, the lowest MAE was achieved by our clustering-based method with K-means, L1 representation and a range of the number of clusters of 27–30.

On the Irish dataset (Table 3), on average and also on the median, the lowest MAE was achieved by our clustering-based method with K-medoids, median daily profile representation and a range of the number of clusters of 40–43. However, on the maximum, the lowest MAE was achieved by our clustering-based method with K-means, GAM representation and a range of the number of clusters of 37–40.

The most important remark is that our clustering-based forecasting method achieved better forecasting accuracy than the completely disaggregated approach on every occasion and metric (including the median here). It implies that we can find a combination of a proper time series representation and clustering setting that improves forecasting accuracy for individual consumers.

Another important question is how much better our method is, in other words, for how many consumers the forecasting accuracy of our method was better than in the benchmark case. We counted how many consumers had better forecasting accuracy on average with our method than with the benchmark approach. We also counted how many consumers had significantly better forecasting accuracy with our method than with the benchmark approach, based on the p-value of the Wilcoxon rank sum test (significance in our case means a p-value less than 0.05). These results are shown in Tables 4 and 5.

On the Ausgrid dataset (Table 4), the snaive-clust method was better on average than snaive-disagg in 80% of all cases. The ctree-t-clust method was better on average than ctree-disagg in 68.3% of all cases. The es-clust method (the best performing forecasting method on average and on the median) was better on average than snaive-disagg in 88.3% of all cases. The es-clust method was better on average than ctree-disagg in 68.3% of all cases.

On the Irish dataset (Table 5), the snaive-clust method was better on average than snaive-disagg in 93.9% of all cases. The ctree-clust method was better on average than ctree-disagg in 80% of all cases. The es-clust method (the best performing forecasting method on average and on the median) was better on average than snaive-disagg in 93.7% of all cases. The es-clust method was better on average than ctree-disagg in 79.9% of all cases.

These numbers are really high and imply the efficiency of our clustering-based forecasting method. The counts of significantly better forecasting results are not as high as the counts of better results on average. However, these numbers are still high enough to be satisfactory.

In Figure 8, the clustered Ausgrid dataset time series preprocessed by the L1 model-based representation are shown for comparison with the enterprise time series data shown in Figure 3.

In Figure 9, boxplots of hourly errors of forecasts by the implemented methods for the Ausgrid dataset


Table 5: Counts of Irish residential consumers with better forecasting accuracy with our method than with the benchmark approach. DisAgg represents completely disaggregated forecasting, Clust represents the clustering-based forecasting method described in Section 3. Kmed represents usage of K-medoids and Km represents K-means. Sig represents the count of significantly better results, and Mean represents the count of better results on average. Underlined values in italic represent the best results among all 4 approaches.

Ireland Residential

                               Kmed L1   Km GAM   Kmed GAM   Kmed Profile
Number of clusters             37-40     37-40    40-43      40-43
snaive DisAgg-Clust Sig        2626      1468     2548       2657
snaive DisAgg-Clust Mean       3402      3168     3388       3417
ctree DisAgg-Clust Sig         1554      200      1477       1626
ctree DisAgg-Clust Mean        2889      1586     2861       2912
snaive-es DisAgg-Clust Sig     2645      1487     2546       2668
snaive-es DisAgg-Clust Mean    3394      3174     3384       3408
ctree-es DisAgg-Clust Sig      1578      223      1463       1594
ctree-es DisAgg-Clust Mean     2881      1714     2839       2906
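
The Sig and Mean counts in Tables 4 and 5 can be reproduced in spirit with the following sketch. It is not the authors' code: it assumes a one-sided Wilcoxon rank sum test with the usual normal approximation (average ranks for ties, no tie or continuity correction), and the function names and error values are hypothetical. The text only states that the test and a 0.05 threshold were used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_less(x, y):
    """One-sided p-value for 'x tends to be smaller than y'."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


def count_improvements(errors_by_consumer, alpha=0.05):
    """Count consumers where the clustering-based errors are lower on
    average (Mean) and significantly lower (Sig) than the benchmark."""
    sig = better_mean = 0
    for clust_err, disagg_err in errors_by_consumer.values():
        if sum(clust_err) / len(clust_err) < sum(disagg_err) / len(disagg_err):
            better_mean += 1
        if rank_sum_p_less(clust_err, disagg_err) < alpha:
            sig += 1
    return {"Sig": sig, "Mean": better_mean}


# Hypothetical daily MAE values (kW): (clustering-based, disaggregated).
errors = {
    "c1": ([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]),
    "c2": ([1.0, 3.0, 5.0], [2.0, 4.0, 6.0]),
    "c3": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
}
counts = count_improvements(errors)
```

For these toy values, consumer c1 is better both on average and significantly, c2 only on average, and c3 on neither, which mirrors why the Sig rows are always smaller than the Mean rows.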

[Figure 8 plot: 20 panels (clusters 1–20); x-axis Length, y-axis Regression Coefficients]

Figure 8: 20 clusters of Ausgrid consumers represented by L1 regression coefficients, created by K-means. Centroids are displayed by the red line. The blue dashed line splits the daily (48) and weekly (6) seasonal regression coefficients.
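
The way a single cluster-level forecast is turned into a forecast for each consumer (z-score normalisation, a forecast made on the centroid, and rescaling by the stored mean and standard deviation, as summarised in the Conclusion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the centroid is taken as the pointwise mean of the normalised series in the cluster, the population standard deviation is used, and a seasonal naïve forecast stands in for the forecasting model. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def z_normalise(series):
    """Return the z-scored series together with its mean and std. dev."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        sigma = 1.0  # constant series: avoid division by zero
    return [(v - mu) / sigma for v in series], mu, sigma


def cluster_centroid(normalised_series):
    """Pointwise mean of the normalised series (assumed centroid)."""
    return [mean(values) for values in zip(*normalised_series)]


def seasonal_naive(history, season, horizon):
    """Stand-in forecasting model: repeat the last full season."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def forecast_cluster(consumers, season, horizon):
    """One model per cluster; forecasts are rescaled per consumer.

    `consumers` maps a consumer id to its load series (kW); all
    consumers are assumed to belong to the same cluster.
    """
    normalised, params = [], {}
    for cid, series in consumers.items():
        z, mu, sigma = z_normalise(series)
        normalised.append(z)
        params[cid] = (mu, sigma)  # stored normalisation parameters
    centroid = cluster_centroid(normalised)
    centroid_forecast = seasonal_naive(centroid, season, horizon)
    return {
        cid: [f * sigma + mu for f in centroid_forecast]
        for cid, (mu, sigma) in params.items()
    }


# Two hypothetical consumers with the same shape but different scale.
consumers = {
    "a": [1.0, 3.0] * 4,
    "b": [10.0, 30.0] * 4,
}
forecasts = forecast_cluster(consumers, season=2, horizon=2)
```

Both consumers share one centroid model here, yet each receives a forecast on its own scale, which is the reason only K models need to be trained instead of N.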


are shown. We can see that, again, in the hours when the consumption is low (early morning), the fully disaggregated approach performed better than our method. However, during residential peak hours (breakfast and evening time), our method is evidently more accurate than the benchmark approach.

[Figure 9 plot: x-axis Hour (1–24), y-axis MAE; methods SNAIVE_DisAgg, CTREE_DisAgg, SNAIVE_Clust, MLR_Clust, ES_Clust, CTREE-tem_Clust, CTREE_Clust]

Figure 9: Boxplots of average hourly MAE (in kW) for 7 forecasting methods on the Ausgrid dataset. DisAgg represents completely disaggregated forecasting, Clust represents the clustering-based forecasting method described in Section 3.

5 Conclusion

In this paper, we have proposed a clustering-based forecasting method for disaggregated end-consumer electricity load using all data from consumers in a smart grid. The clustering procedure consists of the preprocessing of time series from consumers by z-score normalisation and the computation of model-based representations of time series. K-means or K-medoids clustering is then computed, and the optimal number of clusters is found by the DB-index. Then the centroids of clusters are extracted and used as training sets for the implemented forecasting methods. Centroid-based forecasts are scaled by the stored normalisation parameters (mean and standard deviation) to create a forecast for every consumer. The method was evaluated on three real datasets that consist of a large number of consumption patterns and compared with the approach that trains a forecasting model for every consumer separately. We showed that our clustering-based method decreases the forecasting error in terms of the average, median and maximum on the Irish and Ausgrid datasets. On the Slovak dataset, our clustering-based method decreases the forecasting error in terms of the average and maximum. However, the error rates did not decrease with respect to the median because of the nature of enterprise smart meter data.

We can say that a time series representation and clustering setting can be found that benefits our forecasting method and improves the forecasting accuracy. We showed that this statement is valid especially for residential consumers. We also showed that our robust model-based representations are highly appropriate for smart meter time series data. In addition to forecasting accuracy, our method is also interesting for its improved scalability compared with a fully disaggregated approach. Our method needs to train only K models (in our case about 20–40) instead of N models (thousands), which leads to a huge decrease in the computational load.

Acknowledgement: This work was partially supported by the Slovak Research and Development Agency, Grant No. APVV-16-0484 and No. APVV-16-0213.

References

[1] Wijaya T. K., Humeau S. F. R. J., Vasirani M., Aberer K., Individual, aggregate, and cluster-based aggregate forecasting of residential demand, EPFL, Tech. Rep., 2014
[2] Ghofrani M., Hassanzadeh M., Etezadi-Amoli M., Fadali M. S., Smart meter based short-term load forecasting for residential customers, 43rd North American Power Symposium (NAPS 2011), 2011, 13–17
[3] Kosková G., Laurinec P., Rozinajová V., Ezzeddine A. B., Lucká M., Lacko P., Vrablecová P., Návrat P., Incremental ensemble learning for electricity load forecasting, Acta Polytechnica Hungarica, 2015, 13(2), 97–117
[4] Misiti M., Misiti Y., Oppenheim G., Poggi J. M., Optimized clusters for disaggregated electricity load forecasting, Revstat, 2010, 8(2), 105–124
[5] Bandyopadhyay S., Ganu T., Khadilkar H., Arya V., Individual and aggregate electrical load forecasting: One for all and all for one, In: Proceedings of the 2015 ACM 6th International Conference on Future Energy Systems, 2015, 121–130
[6] Laurinec P., Lucká M., Comparison of representations of time series for clustering smart meter data, In: Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016, 2016, 458–463
[7] Laurinec P., Lóderer M., Vrablecová P., Lucká M., Rozinajová V., Ezzeddine A. B., Adaptive time series forecasting of energy consumption using optimized cluster analysis, In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), IEEE, 2016, 398–405
[8] Laurinec P., Lucká M., Usefulness of unsupervised ensemble learning methods for time series forecasting of aggregated or clustered load, In: Appice A., Loglisci C., Manco G., Masciari E., Ras Z. W. (Eds.), New Frontiers in Mining Complex Patterns, Cham, Springer International Publishing, 2018, 122–137
[9] Friedman J., Hastie T., Tibshirani R., The elements of statistical learning, volume 1, Springer series in statistics, New York, 2001


[10] Koenker R., Quantile regression, Number 38, Cambridge University Press, 2005
[11] Wood S., Generalized Additive Models: An Introduction with R, Chapman and Hall/CRC, 2006
[12] Arthur D., Vassilvitskii S., K-means++: The advantages of careful seeding, In: SODA ’07 Proceedings of the 8th annual ACM-SIAM Symposium on Discrete Algorithms, 2007, 1027–1035
[13] Kaufman L., Rousseeuw P. J., Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Statistics, Wiley, 2009
[14] Davies D. L., Bouldin D. W., A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1979, 1(2), 224–227
[15] Breiman L., Random forests, Machine Learning, 2001, 45(1), 5–32
[16] Cleveland R. B., Cleveland W. S., McRae J. E., Terpenning I., STL: A seasonal-trend decomposition procedure based on Loess, Journal of Official Statistics, 1990, 6(1), 3–73
[17] Hyndman R., Khandakar Y., Automatic time series forecasting: The forecast package for R, Journal of Statistical Software, Articles, 2008, 27(3), 1–22
[18] Strasser H., Weber Ch., On the asymptotic theory of permutation statistics, SFB Adaptive Information Systems and Modelling in Economics and Management Science, 1999
[19] Hyndman R. J., Koehler A. B., Snyder R. D., Grose S., A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting, 2002, 18(3), 439–454
[20] Laurinec P., TSrepr R package: Time series representations, Journal of Open Source Software, 2018, 3(23), 577
[21] Laurinec P., Lucká M., New clustering-based forecasting method for disaggregated end-consumer electricity load using smart grid data, In: 2017 IEEE 14th International Scientific Conference on Informatics, 2017, 210–215
[22] Hyndman R. J., Koehler A. B., Another look at measures of forecast accuracy, International Journal of Forecasting, 2006, 22(4), 679–688
