Research Article

P. Laurinec and M. Lucká

Clustering-Based Forecasting Method for Individual Consumers
disaggregated time series (avoiding also overfitting). Our method decreases the maximum forecasting errors as well.

This paper is structured as follows: Section 2 contains an introduction describing related works and contributions. Section 3 describes our approach, containing the methods used for time series processing, cluster analysis, and forecasting. Section 4 presents a description of the used datasets and the evaluation of the performed experiments, and Section 5 concludes the paper.

2 Related work

There are only a few research papers focused on individual (disaggregated) consumer consumption forecasting in a smart grid. This can be due to its difficulty and the possibility of unstable results. To the best of our knowledge, there are no papers describing methods based on clustering the load profiles of all consumers to improve the accuracy of individual forecasts.

Wijaya et al. [1] use a correlation-based feature selection method for forecasting individual as well as aggregate residential electricity load. They use linear regression, multilayer perceptron, and support vector regression as forecasting methods. Forecasting aggregate electricity load is enhanced by clustering, but clustering is not used for forecasting individual consumers. Support vector regression and linear regression achieved the best results of the forecast for an individual residential consumer, but the error rate remains high, around 45% of the NMAE measure (Normalised Mean Absolute Error). Ghofrani et al. [2] use spectral analysis and Kalman filtering as the method for residential customers. They evaluated three forecasting horizons: 15 minutes, 30 minutes, and 1 hour. The 1-hour horizon forecasts had a 30% error rate of the MAPE measure (Mean Absolute Percentage Error) on average. Koskova et al. [3] use ensembles of variable time series analysis and regression methods to produce forecasts of aggregate loads via ZIP code. Their median-based weighted ensemble learning method significantly outperforms individual forecasting models.

Using clustering to improve aggregated (global) load forecasting is more widely explored than using the results of clustering for individual consumer load forecasting. Misiti [4] used the discrete wavelet transform as preprocessing of signals from consumers. The transformed data were then clustered and subsequently optimally reclustered to a final load time series. The optimised reclustering procedure was controlled by the forecasting performance based on the cross-prediction dissimilarity index. Bandyopadhyay et al. [5] proposed context-based forecasting with clustering for improving the forecast accuracy of aggregate electricity consumption. They compared results from the proposed method with the completely aggregated method, the completely disaggregated method, and K-means clustering. In our previous works [6–8], we have focused on time series representations and various forecasting methods to improve the forecasting accuracy of an aggregated load in combination with K-means clustering. We concluded that model-based representation methods are the best for extracting consumers' patterns. We have proved that clustering based on different types of representations gives more accurate results than clustering using only the original load time series.

2.1 Contributions

To the best of our knowledge, until now the combination of the results of clustering all consumers in a smart grid and forecasting the consumption of individual consumers has not been explored and evaluated. We propose a centroid-based method for forecasting that increases forecasting accuracy and decreases computational load as well. The time needed for calculations is significantly reduced from N trained models, where N is the number of consumers, to K trained models, where K is the number of centroids (clusters created). An additional benefit is the information describing typical consumption profiles created by clustering, which can be used for further analysis.

3 Proposed method

The description of our proposed clustering-based (or centroid-based) method for disaggregated end-consumer forecasting of electricity load is best illustrated by a concrete example. We will suppose that N is the number of consumers, the length of the training set is 21 days (3 weeks), whereby for every day we consider 24 × 2 = 48 measurements, and we execute one-hour-ahead forecasts.

1. Start with iteration iter = 0.
2. Create a time series of the length of three weeks for each consumer.
3. Normalise each time series by z-score (keeping the mean and the standard deviation in memory for every time series).
4. Compute a representation of each time series (a minimal sketch of steps 2–4 follows this list).
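A minimal R sketch of steps 2–4 (R matches the paper's published source code) under the assumption that the raw measurements are stored in an N × 1008 numeric matrix named data, and that repr_fun stands for any representation method of Section 3.1; both names are illustrative placeholders:

    # Steps 2-4: one row per consumer, 48 x 21 = 1008 half-hourly measurements.
    norm_params <- t(apply(data, 1, function(x) c(mean = mean(x), sd = sd(x))))
    # z-score normalisation; the stored mean and sd are kept for later denormalisation.
    data_norm <- t(apply(data, 1, function(x) (x - mean(x)) / sd(x)))
    # Representation of each normalised series (one row per consumer).
    data_repr <- t(apply(data_norm, 1, repr_fun))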
where f_d is a cubic regression spline and u_d is a vector of the form (1, 2, …, freq, 1, …, freq, …). The f_w is a P-spline and u_w is a vector of the form (1, 1, …, 1, 2, 2, …, 2, …, 7, …, 1, …). The parameters of the model are estimated by penalised iteratively re-weighted least squares (P-IRLS).

Another model-based representation used in our experiments is a simple median daily profile. The points of the representation repr_k, k = 1, …, freq, are calculated as follows:

repr_k = median(x_k, x_{k+freq}, …, x_{k+freq×(d−1)}),

where d is the number of days in the data set.
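The median daily profile is straightforward to compute; a small R sketch, assuming x is one normalised series of length freq × d (the function name is illustrative):

    # Median daily profile: one median per daily position k = 1, ..., freq.
    median_profile <- function(x, freq = 48) {
      d <- length(x) / freq
      sapply(1:freq, function(k) median(x[k + freq * (0:(d - 1))]))
    }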
In Figure 2, the transformation of a time series of the length of three weeks (48 × 21 = 1008) into the four different model-based representations is shown.

Figure 2: The comparison of four model-based time series representations on a randomly picked Australian consumer.
3.2 Clustering

For clustering consumers, we used the centroid-based clustering method K-means with K-means++ centroid initialisation [12], and K-medoids [13]. The advantage over conventional K-means lies in the careful seeding of the initial centroids, which improves both the speed and the accuracy of the clustering. It works as follows. Let d(x) denote the shortest Euclidean distance from a data point x to the closest centroid. Let us choose an initial centroid K_1 uniformly at random from the set X, where X is the dataset of size N × n. Choose the next centre K_i, K_i = x̂ ∈ X, with probability

d(x̂)² / ∑_{x∈X} d(x)².

Repeat the previous step until we have chosen all K centres. Each object from the dataset is then assigned to the centroid K_i that is closest to it, and afterwards new centroids are calculated. The last two steps are repeated until the assignment to clusters no longer changes.
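A compact R sketch of the K-means++ seeding, as an illustration of [12] rather than the exact implementation used:

    # K-means++ seeding: next centre sampled with probability d(x)^2 / sum(d(x)^2).
    kmeanspp_seed <- function(X, K) {
      centres <- X[sample(nrow(X), 1), , drop = FALSE]  # first centre uniformly at random
      for (i in 2:K) {
        # Squared Euclidean distance of every point to its closest chosen centre.
        d2 <- apply(X, 1, function(x) min(colSums((t(centres) - x)^2)))
        centres <- rbind(centres, X[sample(nrow(X), 1, prob = d2), ])
      }
      centres
    }

The seeds can then be passed directly to stats::kmeans(X, centers = kmeanspp_seed(X, K)).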
For the K-medoids computation, the Partitioning Around Medoids (PAM) algorithm was used [13].

In each special iteration (iter mod 24 = 0) of the batch processing, we automatically determined the optimal number of clusters K according to the internal validation rate, the Davies-Bouldin index [14].

After the clusters of consumers are computed, the original normalised time series are averaged according to their corresponding centroids. The forecasts calculated from the centroids are then denormalised according to Eq. 2 to create forecasts for every consumer.

Figure 3 presents the clustering results of time series preprocessed by the MLR model-based representation using the Slovak dataset. The corresponding centroid-based time series are shown in Figure 4.
Figure 3: 20 clusters of Slovak consumers represented by regression coefficients created by K-means. Centroids are displayed by the red line. The blue dashed line splits the daily (48) and weekly (6) seasonal regression coefficients.
Figure 4: The final centroid-based time series based on Figure 3; these are the input to the forecasting methods.
3.3 Forecasting methods

Five different forecasting methods were implemented to observe the performance of the clustering-based forecasting method.

SNAIVE

The seasonal naïve method is appropriate only for seasonal time series data. All forecasts are simply set to the values of the last observations from the previous season. In our case, it means that the forecasts of all future values are set to be equal to the last observed values from the previous week.
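Because of its simplicity, SNAIVE reduces to indexing; a one-line R sketch, assuming x holds the half-hourly history of one consumer or centroid:

    # Seasonal naive forecast: repeat values from the previous week (period 48 * 7 = 336).
    snaive_forecast <- function(x, h = 2) tail(x, 336)[1:h]  # h = 2 is a one-hour-ahead forecast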
MLR

The Multiple Linear Regression method was also used for forecasting purposes, but the model was different from the one defined in Eq. 3. In this scenario, the daily and weekly attributes interact with each other, so instead of ds + ws attributes we have ds × ws interactions (with ds = 48 daily and ws = 6 weekly seasonal dummy variables, cf. Figure 3).
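Since Eq. 3 itself is not reproduced here, the following R sketch of the interaction model is an illustrative assumption: x is one centroid-based series of length 1008 (three full weeks), and R's formula interface expands the factor interaction into the dummy variables (dropping reference levels automatically):

    # MLR with interacting daily and weekly seasonal dummies (ds x ws interaction terms).
    train <- data.frame(load = x,
                        daily = factor(rep_len(1:48, 1008)),           # position within a day
                        weekly = factor(rep(rep(1:7, each = 48), 3)))  # day of the week
    fit <- lm(load ~ 0 + daily:weekly, data = train)
    newdata <- data.frame(daily = factor(1:2, levels = 1:48),
                          weekly = factor(1, levels = 1:7))
    pred <- predict(fit, newdata = newdata)  # one-hour-ahead forecast (2 values)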
RF

Random Forests is an ensemble learning method that constructs a large number of trees and outputs the mean prediction of the individual regression trees [15]. For adaptation to a trend change, the dependent variable (the time series of electricity consumption) was detrended by STL decomposition [16] in order to improve forecasting accuracy (in Figure 4, time series no. 4, 7, 10, 12, 16, 18 and 20 show the trend change). From the extracted trend part of the time series, future values were forecasted by the automatic ARIMA procedure [17] and added to the forecast from the RF model that predicts the aggregated seasonal and remainder part of the time series. As attributes (independent variables) of the model, double seasonal Fourier terms were created. Fourier signals are well suited to modelling seasonal time series because they consist of periodic trigonometric functions. The daily seasonal Fourier term has ds pairs of terms

( sin(2πjt/48), cos(2πjt/48) )_{j=1}^{ds},   (5)

and the weekly seasonal Fourier term has ws pairs of terms

( sin(2πjt/7), cos(2πjt/7) )_{j=1}^{ws},   (6)

where t = (1, …, n). As we experimentally verified, the best setting of the number of Fourier term pairs was ds = 2 and ws = 2. The weekly seasonal component from the STL decomposition [16] with a one-day lag was also used as an attribute of the model. The hyperparameters for RF were set as follows: the number of trees to 1100, the minimum size of terminal nodes to 3, and the number of variables randomly sampled at each split to 3.
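The Fourier attributes of Eq. 5 and 6 can be generated as in the sketch below; with the randomForest package, the stated hyperparameters would map to ntree = 1100, nodesize = 3 and mtry = 3 (an assumption, since the text does not name the package used):

    # Double seasonal Fourier terms (Eq. 5 and 6) with ds = ws = 2 pairs.
    fourier_terms <- function(n, ds = 2, ws = 2) {
      t <- 1:n
      daily <- sapply(1:ds, function(j) cbind(sin(2 * pi * j * t / 48),
                                              cos(2 * pi * j * t / 48)))
      weekly <- sapply(1:ws, function(j) cbind(sin(2 * pi * j * t / 7),
                                               cos(2 * pi * j * t / 7)))
      matrix(c(daily, weekly), nrow = n)  # n x (2 * ds + 2 * ws) attribute matrix
    }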
CTREE

Conditional inference trees (CTREE) is a statistical approach to recursive partitioning which takes into account the distributional properties of the measurements [18]. CTREE performs multiple test procedures that are applied to determine whether no significant association between any of the covariates and the response can be stated, in which case the recursion stops. The conditional inference trees method was used with the version of the model with double seasonal Fourier terms as defined in Eq. 5 and 6, with the same number of term pairs as in the RF model.
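A hedged sketch of the CTREE fit on the same attributes, assuming the party package (an R implementation of conditional inference trees [18]) and the fourier_terms() helper from the RF sketch above:

    # CTREE with double seasonal Fourier attributes; x is one centroid-based series.
    library(party)
    attrs <- as.data.frame(fourier_terms(length(x)))
    fit <- ctree(load ~ ., data = cbind(data.frame(load = x), attrs))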
ES

Triple Exponential Smoothing is a forecasting method applied to seasonal time series, whereby past observations are not weighted equally, but the weights decrease exponentially with time [19]. In order to adapt the model to various patterns in electricity consumption data, three different models were fitted each time and the best of them was picked to produce a forecast. These models were a full additive model with a trend component, an additive model with a damped trend, and an additive model without a trend. The best model among them was chosen according to the best value of the Akaike Information Criterion (AIC) [19].
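A sketch of the ES model selection logic, assuming the forecast package and a seasonal ts object y; note that forecast::ets restricts seasonal periods to at most 24, so half-hourly data would need aggregation or a different implementation (the paper does not name the one used):

    # Fit the three candidate additive models and keep the one with the lowest AIC.
    library(forecast)
    candidates <- list(ets(y, model = "AAA", damped = FALSE),  # additive with trend
                       ets(y, model = "AAA", damped = TRUE),   # additive with damped trend
                       ets(y, model = "ANA"))                  # additive without trend
    best <- candidates[[which.min(sapply(candidates, function(m) m$aic))]]
    fc <- forecast(best, h = 2)  # one-hour-ahead forecast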
We remark that only the SNAIVE and CTREE forecasting methods were used in both approaches: the first approach is clustering-based (snaive-clust and ctree-clust) and the second is the simple benchmark approach applied to every consumer separately (snaive-disagg and ctree-disagg). This is because these two methods are fast to compute (train), which matters when they must be applied to every consumer separately.

4 Evaluation and experiments

The source code of all the implemented methods is available online (https://github.com/PetoLau/ClusterForecast). The time series representation methods are available in the TSrepr package (https://cran.r-project.org/package=TSrepr), which enables fast computation [20].

In the next sections, two different versions of experiments will be described. The first one is the original from our conference paper [21], and the second one is an extended evaluation with different settings of the used methods.
4.1 Smart grid data

To evaluate the performance of our clustering-based forecasting method, we used three different datasets consisting of a large number of variable patterns gathered from smart meters. The measurement data include Irish, Slovak, and Australian electricity load data.

The Irish data was collected by the Irish Commission for Energy Regulation (CER) and is available from the Irish Social Science Data Archive (http://www.ucd.ie/issda/data/commissionforenergyregulationcer/). This data contains three different types of customers: residential, SMEs, and others. The largest group are residential customers; after removing consumers with missing data, 3639 residential consumers are left. The frequency of data measurements was thirty minutes, so 48 measurements were performed during a day.

The Slovak data was collected within the project "International Centre of Excellence for Research of Intelligent and Secure Information-Communication Technologies and Systems". These measurements were obtained from Slovak enterprises.
Table 1: Statistics of the MAE (in kW) of 5 forecasting methods evaluated on the Irish and Slovak datasets. Cells contain the average values over all consumers; disagg represents completely disaggregated forecasting, clust represents the clustering-based forecasting method described in Section 3. Underlined values in italics represent the best results among the 5 methods.
Table 2: Statistics of the MAE (in kW) of 7 forecasting methods evaluated on the Ausgrid dataset. Cells contain the average values over all consumers; disagg represents completely disaggregated forecasting, clust represents the clustering-based forecasting method described in Section 3. Kmed represents the usage of K-medoids and Km represents K-means. Numbers at the clustering and representation method abbreviations represent the range of the number of clusters. Underlined values in italics represent the best results among all forecasting methods.
Table 4: Counts of Ausgrid residential consumers with better forecasting accuracy with our method than with the benchmark approach. DisAgg represents completely disaggregated forecasting, Clust represents the clustering-based forecasting method described in Section 3. Kmed represents the usage of K-medoids and Km represents K-means. Sig represents the count of significantly better results, and Mean represents the count of better results on average. Underlined values in italics represent the best results among all 4 approaches.

Ausgrid Residential
                               Kmed GAM   Km GAM   Km L1   Km L1
Number of clusters             30-33      20-23    20-23   27-30
snaive DisAgg-Clust Sig        139        143      144     86
snaive DisAgg-Clust Mean       251        255      247     219
ctree DisAgg-Clust Sig         75         74       85      24
ctree DisAgg-Clust Mean        205        199      205     133
snaive-es DisAgg-Clust Sig     142        147      155     103
snaive-es DisAgg-Clust Mean    265        258      257     241
ctree-es DisAgg-Clust Sig      77         75       78      27
ctree-es DisAgg-Clust Mean     204        198      205     146
GAM or median daily profile) and the range of an optimal number of clusters.

On the Ausgrid dataset (Table 2), on average, the lowest MAE was achieved by our clustering-based method with K-means, the L1 representation, and a range of the number of clusters of 20-23, and with K-medoids, the GAM representation, and a range of the number of clusters of 30-33. On the median, the lowest MAE was again achieved by our clustering-based method with K-means, the L1 representation, and a range of the number of clusters of 20-23. However, on the maximum, the lowest MAE was achieved by our clustering-based method with K-means, the L1 representation, and a range of the number of clusters of 27-30.

On the Irish dataset (Table 3), on average and also on the median, the lowest MAE was achieved by our clustering-based method with K-medoids, the median daily profile representation, and a range of the number of clusters of 40-43. However, on the maximum, the lowest MAE was achieved by our clustering-based method with K-means, the GAM representation, and a range of the number of clusters of 37-40.

The most important remark is that our clustering-based forecasting method achieved better forecasting accuracy results than the completely disaggregated approach in every instance and metric (including the median here). It implies that we can find a combination of a proper time series representation and clustering setting to improve the forecasting accuracy for individual consumers.

An important question is also how much better our method is; in other words, for how many consumers the forecasting accuracy of our method was better than in the benchmark case. We counted how many consumers had better forecasting accuracy on average with our method than with the benchmark approach. We also counted how many consumers had significantly better forecasting accuracy with our method than with the benchmark approach, based on the p-value of the Wilcoxon rank sum test (significance in our case means a p-value of less than 0.05). These results are shown in Tables 4 and 5.

On the Ausgrid dataset (Table 4), the snaive-clust method was better on average than snaive-disagg in 80% of all cases. The ctree-t-clust method was better on average than ctree-disagg in 68.3% of all cases. The es-clust method (on average and on the median the best performing forecasting method) was better on average than snaive-disagg in 88.3% of all cases. The es-clust method was better on average than ctree-disagg in 68.3% of all cases.

On the Irish dataset (Table 5), the snaive-clust method was better on average than snaive-disagg in 93.9% of all cases. The ctree-clust method was better on average than ctree-disagg in 80% of all cases. The es-clust method (on average and on the median the best performing forecasting method) was better on average than snaive-disagg in 93.7% of all cases. The es-clust method was better on average than ctree-disagg in 79.9% of all cases.

These proportions are high and imply the efficiency of our clustering-based forecasting method. The counts of significantly better forecasting results are not as high as the results on average; however, they are still high enough to be satisfactory.

In Figure 8, the clustered Ausgrid dataset time series preprocessed by the L1 model-based representation are shown for comparison of the differences with the enterprise time series data shown in Figure 3. In Figure 9, boxplots of the hourly errors of forecasts by the implemented methods for the Ausgrid dataset are shown.
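The counting of better and significantly better consumers can be reproduced with base R, assuming err_clust and err_disagg are (hypothetical) matrices of per-consumer forecasting errors, one row per consumer:

    # "Mean": better on average; "Sig": one-sided Wilcoxon rank sum test, p < 0.05.
    p_vals <- sapply(seq_len(nrow(err_clust)), function(i)
      wilcox.test(err_clust[i, ], err_disagg[i, ], alternative = "less")$p.value)
    sig_count <- sum(p_vals < 0.05)
    mean_count <- sum(rowMeans(err_clust) < rowMeans(err_disagg))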
Table 5: Counts of Irish residential consumers with better forecasting accuracy with our method than with the benchmark approach. DisAgg represents completely disaggregated forecasting, Clust represents the clustering-based forecasting method described in Section 3. Kmed represents the usage of K-medoids and Km represents K-means. Sig represents the count of significantly better results, and Mean represents the count of better results on average. Underlined values in italics represent the best results among all 4 approaches.

Ireland Residential
                               Kmed L1   Km GAM   Kmed GAM   Kmed Profile
Number of clusters             37-40     37-40    40-43      40-43
snaive DisAgg-Clust Sig        2626      1468     2548       2657
snaive DisAgg-Clust Mean       3402      3168     3388       3417
ctree DisAgg-Clust Sig         1554      200      1477       1626
ctree DisAgg-Clust Mean        2889      1586     2861       2912
snaive-es DisAgg-Clust Sig     2645      1487     2546       2668
snaive-es DisAgg-Clust Mean    3394      3174     3384       3408
ctree-es DisAgg-Clust Sig      1578      223      1463       1594
ctree-es DisAgg-Clust Mean     2881      1714     2839       2906
Figure 8: 20 clusters of Ausgrid consumers represented by L1 regression coefficients created by K-means. Centroids are displayed by the red line. The blue dashed line splits the daily (48) and weekly (6) seasonal regression coefficients.
[Figure 9: Boxplots of the hourly errors (MAE) of the implemented forecasting methods, including ES_Clust, CTREE-tem_Clust, and CTREE_Clust, on the Ausgrid dataset.]

method benefit and will improve the forecasting accuracy. We proved that this statement is valid especially for residential consumers. Also, we proved that our robust model-
References

[10] Koenker R., Quantile regression, Number 38, Cambridge University Press, 2005
[11] Wood S., Generalized Additive Models: An Introduction with R, Chapman and Hall/CRC, 2006
[12] Arthur D., Vassilvitskii S., K-means++: The advantages of careful seeding, In: SODA '07 Proceedings of the 18th annual ACM-SIAM Symposium on Discrete Algorithms, 2007, 1027–1035
[13] Kaufman L., Rousseeuw P. J., Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Statistics, Wiley, 2009
[14] Davies D. L., Bouldin D. W., A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1979, 1(2), 224–227
[15] Breiman L., Random forests, Machine Learning, 2001, 45(1), 5–32
[16] Cleveland R. B., Cleveland W. S., McRae J. E., Terpenning I., STL: A seasonal-trend decomposition procedure based on Loess, Journal of Official Statistics, 1990, 6(1), 3–73
[17] Hyndman R. J., Khandakar Y., Automatic time series forecasting: The forecast package for R, Journal of Statistical Software, 2008, 27(3), 1–22
[18] Strasser H., Weber Ch., On the asymptotic theory of permutation statistics, SFB Adaptive Information Systems and Modelling in Economics and Management Science, 1999
[19] Hyndman R. J., Koehler A. B., Snyder R. D., Grose S., A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting, 2002, 18(3), 439–454
[20] Laurinec P., TSrepr R package: Time series representations, Journal of Open Source Software, 2018, 3(23), 577
[21] Laurinec P., Lucká M., New clustering-based forecasting method for disaggregated end-consumer electricity load using smart grid data, In: 2017 IEEE 14th International Scientific Conference on Informatics, 2017, 210–215
[22] Hyndman R. J., Koehler A. B., Another look at measures of forecast accuracy, International Journal of Forecasting, 2006, 22(4), 679–688