
2206.03186v1

This paper critiques traditional time series aggregation (TSA) methods used in optimization models, arguing that a-priori approaches can lead to significant inaccuracies in output error despite minimizing input error. It proposes a basis-oriented TSA framework that demonstrates improved computational efficiency and accuracy by focusing on the optimal basis for aggregated models. The findings suggest that a-posteriori methods should replace one-size-fits-all TSA methods to enhance optimization results.


JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Time series aggregation for optimization: One-size-fits-all?
Sonja Wogrin, Senior Member, IEEE

arXiv:2206.03186v1 [math.OC] 7 Jun 2022

Abstract: One of the fundamental problems of optimization models that use different time series as data input is the trade-off between model accuracy and computational tractability. To overcome the computational intractability of these full optimization models, the dimension of input data and model size is commonly reduced through time series aggregation (TSA) methods. However, traditional TSA methods often apply a one-size-fits-all approach based on the common belief that the clusters that best approximate the input data also lead to the aggregated model that best approximates the full model, while the metric that really matters, the resulting output error in optimization results, is not well addressed. In this paper, we challenge this belief and show that output-error-based TSA methods with theoretical underpinnings have unprecedented potential for computational efficiency and accuracy.

Index Terms: time series aggregation, optimization.

I. INTRODUCTION

The vast majority of traditional time series aggregation (TSA) frameworks focus on best approximating the original data with aggregated or clustered data (i.e., on reducing the difference between cluster centroids and actual data, which we will refer to as the input error), completely separating the realm of data from the realm of optimization. Such traditional a-priori methods are based on the common belief that the clusters that best approximate the data also lead to the aggregated model that best approximates the full model (i.e., minimize the output error, the difference between full and aggregated model results), which is not necessarily true, as we will show through an illustrative case. Examples of such a-priori methods and applications, including k-medoids [1] or k-means, can be found in a recent literature review [2]. Some a-priori TSA methods keep additional information about the original time series that is important for optimization model results, such as adding periods with extreme events, e.g. [3], [4], [5]. While this might improve model outcomes, the choice of extreme days is still made with respect to input data only. As pointed out by [6], the correct extreme periods cannot be known in advance because they depend on endogenous optimization outcomes, which leads us to a-posteriori$^{1}$ methods. Some examples of a-posteriori methods include [7], [8], [9]; however, they either contain some kind of heuristic component or are tailored to toy problems.

In this paper we show, first, that a-priori TSA methods are not a one-size-fits-all solution when used for optimization models, and that they should ultimately be replaced by a-posteriori methods. To that purpose, we define full and aggregated optimization models and apply a traditional k-means clustering technique to an illustrative example in Section II. Second, we show that when a-posteriori methods are based on theoretical underpinnings, as is the basis-oriented method proposed in Section III, they outperform a-priori methods by orders of magnitude. Section IV concludes the paper.

S. Wogrin, Institute of Electricity Economics and Energy Innovation, Graz University of Technology, Austria, e-mail: [email protected]
$^{1}$ A-posteriori methods employ preliminary optimizations to improve the aggregation process.

II. FULL AND AGGREGATED OPTIMIZATION MODELS

We consider the following generic formulation of a full optimization problem and its corresponding aggregated optimization problem.

Full problem:
$$\min f(x, TS) \quad \text{s.t.} \quad g(x, TS) \le 0$$

Aggregated problem:
$$\min f(\overline{x}, \overline{TS}) \quad \text{s.t.} \quad g(\overline{x}, \overline{TS}) \le 0,$$

where $x$ are the decision variables, $TS$ represents the original time series used as data, and $f$ and $g$ are the objective function and constraints. The number of variables is proportional to the cardinality of the time series, $|x| \sim |TS|$, and hence there is a large number of variables $x$ and a large number of constraints $g$ in the full problem, which often leads to computational complexity and intractability. Through a TSA process, often carried out by clustering algorithms, the original $TS$ are transformed into the aggregated $\overline{TS}$, where $|\overline{TS}| \ll |TS|$. Correspondingly $|\overline{x}| \ll |x|$, and the number of constraints shrinks likewise, which leads to a significant reduction in the computational burden of the aggregated optimization model with respect to the full optimization model. In general, there is no guarantee with respect to the quality of aggregated versus full model results.

A. Economic dispatch problem

The full economic dispatch (ED) optimization problem minimizes overall power system cost over a time horizon of hours $h$ by determining the optimal production of generating units $g$, each of which has its corresponding variable operating cost $C_g$ and upper and lower bounds, while meeting system demand $D_h$ at each hour$^{2}$. The lower bound $\underline{P}_g$ depends on technical characteristics of the generator itself, but the upper bound $\overline{P}_{g,h}$ also depends on the temporal index. Indeed, $\overline{P}_{g,h}$ can be obtained as the product of the installed generator capacity $\overline{P}_g$ and its capacity factor$^{3}$ $CF_{g,h}$. The TS of this problem are system demand and capacity factors. In an aggregated ED problem, we consider only representative hours $r$, each of which has a weight $W_r$, with $|r| \ll |h|$. The cardinality $|r|$, the corresponding weights $W_r$, and the aggregated data $D_r$ and $\overline{P}_{g,r}$ most likely stem from a data aggregation/clustering procedure. A stylized formulation of the full and aggregated ED is given below.

$^{2}$ If it is not possible to meet demand with existing generators, then there is non-supplied energy at a high cost. We have not modeled this explicitly for simplicity, but a fictitious generator with high operating cost and infinite upper bound can represent non-supplied energy in the presented formulation.

Full ED:
$$\min \sum_{g,h} C_g \, p_{g,h}$$
$$\text{s.t.} \quad \sum_{g} p_{g,h} = D_h \quad \forall h$$
$$\underline{P}_g \le p_{g,h} \le \overline{P}_{g,h} \quad \forall g, h$$

Aggregated ED:
$$\min \sum_{g,r} C_g \, p_{g,r} \, W_r$$
$$\text{s.t.} \quad \sum_{g} p_{g,r} = D_r \quad \forall r$$
$$\underline{P}_g \le p_{g,r} \le \overline{P}_{g,r} \quad \forall g, r$$
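As a rough illustration of the two ED formulations, the following Python sketch solves each period by merit order (filling demand from the cheapest unit upward, which is optimal here because the demand balance is the only constraint coupling the units) and compares the total cost of a full model with that of an aggregated one. Everything specific in it is an assumption made for illustration: the three units (including a fictitious non-supplied-energy unit in the spirit of footnote 2), the four sample hours, and the function names `dispatch_cost` and `total_cost` do not come from the paper.

```python
# Minimal sketch with hypothetical data; not the paper's implementation.
from typing import Sequence, Tuple

# (variable cost C_g, lower bound, installed capacity) -- all values are assumptions
UNITS = [
    (0.0, 0.0, 100.0),      # wind
    (50.0, 0.0, 50.0),      # thermal
    (1000.0, 0.0, 1000.0),  # fictitious unit standing in for non-supplied energy
]


def dispatch_cost(demand: float, wind_cf: float) -> float:
    """Cost of one period: fill demand from the cheapest to the most expensive unit."""
    # Upper bound = installed capacity * capacity factor (factor 1 for non-wind units).
    bounds = [(c, lo, cap * (wind_cf if i == 0 else 1.0))
              for i, (c, lo, cap) in enumerate(UNITS)]
    cost = sum(c * lo for c, lo, _ in bounds)
    residual = demand - sum(lo for _, lo, _ in bounds)
    if residual < 0:
        raise ValueError("lower bounds exceed demand")
    for c, lo, up in sorted(bounds):  # ascending variable cost
        extra = min(up - lo, residual)
        cost += c * extra
        residual -= extra
    if residual > 1e-9:
        raise ValueError("demand cannot be met within the unit bounds")
    return cost


def total_cost(periods: Sequence[Tuple[float, float]], weights: Sequence[float]) -> float:
    """Weighted sum of period costs; weights of 1 correspond to the full model."""
    return sum(w * dispatch_cost(d, cf) for (d, cf), w in zip(periods, weights))


# Hypothetical hours: (demand, wind capacity factor)
hours = [(40.0, 0.8), (80.0, 0.2), (60.0, 0.3), (20.0, 0.6)]
full = total_cost(hours, [1.0] * len(hours))

# Aggregated model with a single representative hour: the centroid of all hours,
# weighted by the number of hours it represents.
centroid = (sum(d for d, _ in hours) / len(hours),
            sum(cf for _, cf in hours) / len(hours))
aggregated = total_cost([centroid], [float(len(hours))])

print(f"full cost: {full:.1f}, aggregated cost: {aggregated:.1f}")
```

With these made-up numbers the single-centroid model underestimates the cost considerably, because averaging hides the hour in which the expensive units are needed.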

B. Economic dispatch and k-means clustering

We consider a numerical example of the ED with one wind unit and one thermal unit over a time horizon of one year ($h = 1, \ldots, 8760$). The model data are hourly time series of demand and wind production factors (the latter affect $\overline{P}_{g,h}$). To obtain aggregated data for the aggregated ED model, we use what is probably the most frequently used input-error-based a-priori TSA method, i.e., k-means clustering. In order to run k-means, the user has to specify the desired number of clusters, which can range from 1 to 8760. However, the number of clusters is often chosen by trial and error. In this example, we choose 3 clusters; this seemingly small and arbitrary number will become relevant later on.
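The clustering step just described can be sketched in a few lines of plain Python. This is a minimal version of Lloyd's algorithm for k-means on (demand, wind capacity factor) pairs. The six sample points, the choice of initial centroids, and the function names are hypothetical, and the MSE is taken here as the mean squared Euclidean distance to the assigned centroid, which is an assumption about how the input error is measured.

```python
# Minimal k-means sketch (Lloyd's algorithm) on hypothetical data.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (demand, wind capacity factor), both normalized


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], init_idx: Sequence[int],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels); initial centroids are the points at init_idx."""
    centroids = [points[i] for i in init_idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda r: sq_dist(p, centroids[r]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for r in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == r]
            if members:  # an empty cluster keeps its previous centroid
                centroids[r] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


# Hypothetical "hours"; a real run would use all 8760 demand/wind pairs.
hours = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.9), (0.5, 0.5), (0.5, 0.6)]
centroids, labels = kmeans(hours, init_idx=[0, 2, 4])

# Cluster weights W_r: number of hours represented by each centroid.
weights = [labels.count(r) for r in range(len(centroids))]
# Input error: mean squared distance between each hour and its centroid.
mse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(hours, labels)) / len(hours)

print(centroids, weights, mse)
```

The centroids and weights produced this way would then serve as $D_r$, $\overline{P}_{g,r}$ and $W_r$ in the aggregated ED.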
When applying k-means to these data and requesting 3 clusters ($|r| = 3$, $|h| = 8760$), we obtain the TSA results shown in Figure 1a, where each wind-demand TS pair is plotted as an x and color-coded according to the cluster it has been assigned to. The mean squared error (MSE) between the original hourly TS and the cluster centroids is 0.0167, which is the minimum error in the input space that can be obtained for 3 clusters. The 3 cluster centroids for demand and wind capacity factors (i.e., $D_r$ and $\overline{P}_{g,r}$), as well as the corresponding cluster weights $W_r$, which are also a result of the TSA procedure, are then used as data input in the aggregated economic dispatch model. Running the aggregated economic dispatch model with 3 weighted representative periods using k-means clustered data yields an overall error of 91%$^{4}$ in total system costs between the full and the aggregated model results. This shows how inefficient an a-priori TSA method can be when used in aggregated optimization models. Note that in order to achieve a 0% output error using k-means, all 8760 clusters would have to be used.

[Figure 1, panels (a) and (b): scatter plots of Wind (vertical axis, 0 to 1) against Demand (horizontal axis, 0 to 1); panel (a) is titled "kmeans clustering".]
Fig. 1: K-means (a) and basis-oriented (b) clusters on full time series data.

$^{3}$ For thermal generators, the capacity factor is 1, but for generators that belong to variable renewable energy sources, e.g. wind and solar, the capacity factor depends on solar irradiation or wind speeds, which vary over time.
$^{4}$ Calculated as the difference between the objective function value of the full model and the objective function value of the aggregated model.

III. BASIS-ORIENTED TIME SERIES AGGREGATION FOR AGGREGATED OPTIMIZATION MODELS

When solving aggregated optimization models, what really matters is how well aggregated model results approximate full model outputs, i.e., the output error, and not the input error. Therefore, in this section we propose an innovative a-posteriori framework of basis-oriented TSA for aggregated optimization models to overcome such inefficiencies. Finally, we apply this new methodology to the same ED problem as before.

A. Basis-oriented time series aggregation framework

We consider a linear program (LP) that depends on TS, such as the ED from Section II-A. For the sake of simplicity, we analyze each time step individually and assume there are no time-period-linking constraints$^{5}$. In particular, we focus on the optimal solution for this single time step and its corresponding basis $B$ (in the simplex framework). In the remainder of this section, we use the concrete example of the ED to introduce the idea of basis-oriented clustering. We say hour, but it could be a generic time step as well.

$^{5}$ If there are no time-period-linking constraints in the optimization problem, then analyzing time steps separately does not incur an error. In any case, the issue of complicating constraints will be a topic of future research.
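To make the notion of a per-hour basis concrete for the ED, the following sketch labels each hour by its marginal unit, which is used here as a stand-in for the optimal basis, averages the hours that share a label, and compares full and aggregated costs. The unit data, the six sample hours, and all function names are hypothetical. Lower bounds are assumed to be zero, and degenerate hours (demand exactly at a capacity limit) are not treated specially. Note that labeling the hours requires solving each of them, so this only illustrates the idea and is not an efficient procedure.

```python
# Sketch of basis-oriented aggregation on hypothetical data (not the paper's code).
from typing import Dict, List, Tuple

# (name, variable cost, installed capacity), listed in merit order -- assumptions
UNITS = [
    ("wind", 0.0, 100.0),
    ("thermal", 50.0, 50.0),
    ("non-supplied", 1000.0, float("inf")),  # fictitious unit, infinite upper bound
]


def solve_hour(demand: float, wind_cf: float) -> Tuple[float, str]:
    """Return (cost, marginal unit) of the single-hour ED with zero lower bounds."""
    residual, cost = demand, 0.0
    for name, unit_cost, cap in UNITS:
        upper = cap * wind_cf if name == "wind" else cap
        output = min(upper, residual)
        cost += unit_cost * output
        residual -= output
        if residual <= 0:
            return cost, name  # this unit covers the last unit of demand
    raise ValueError("demand cannot be met")  # unreachable with an infinite last unit


def aggregate_by_basis(hours: List[Tuple[float, float]]) -> List[Tuple[float, float, int]]:
    """Group hours by marginal unit; return (mean demand, mean wind factor, weight)."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for demand, cf in hours:
        _, basis = solve_hour(demand, cf)
        groups.setdefault(basis, []).append((demand, cf))
    reps = []
    for members in groups.values():
        n = len(members)
        reps.append((sum(d for d, _ in members) / n,
                     sum(cf for _, cf in members) / n,
                     n))
    return reps


# Hypothetical hours: (demand, wind capacity factor)
hours = [(40.0, 0.8), (80.0, 0.2), (60.0, 0.3), (20.0, 0.6), (70.0, 0.4), (90.0, 0.1)]
full_cost = sum(solve_hour(d, cf)[0] for d, cf in hours)

reps = aggregate_by_basis(hours)
aggregated_cost = sum(w * solve_hour(d, cf)[0] for d, cf, w in reps)

print(len(reps), full_cost, aggregated_cost)
```

For this toy data the two totals coincide up to floating-point rounding, which is the property the remainder of this section establishes for hours sharing a basis. Grouping hours across different marginal units would generally not preserve the cost.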
Consider two hours with different TS data whose optimal ED solutions belong to the same basis $B$. The TS data affect the right-hand-side (RHS) vector of the constraints. An expected value (or centroid) of these data yields an optimal solution that also has the same optimal basis $B$.

Theorem. For each $i = 1, \ldots, I$ consider the following LPs $(E_i)$: $\min c^T x$ s.t. $Ax = b_i$, where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b_i \in \mathbb{R}^m$, and the LPs differ only in the RHS values $b_i$. If $B$ is an optimal basis for every $(E_i)$, then $B$ is also an optimal basis for the problem $(E)$: $\min c^T x$ s.t. $Ax = E(b_i)$, where $E(b_i) = \sum_i \frac{b_i}{I}$.

Proof. We prove this by contradiction. Assume that $B$ is not the optimal basis for problem $(E)$: $\min c^T x$ s.t. $Ax = E(b_i)$. Instead, let $\tilde{B}$ ($\neq B$) and $\tilde{N}$ be the optimal basis and the non-basis matrices of $(E)$. Under this assumption, it follows that $c_{\tilde{B}}^T x_{\tilde{B}} < c_B^T x_B$ for this problem. In $(E)$ we obtain that $Ax = \tilde{B} x_{\tilde{B}} + \tilde{N} x_{\tilde{N}} = E(b_i)$. By definition $x_{\tilde{N}} = 0$, so we further simplify $Ax = \tilde{B} x_{\tilde{B}} = E(b_i)$, and it follows that $x_{\tilde{B}} = \tilde{B}^{-1} E(b_i)$. We now substitute this optimal solution into the objective function of $(E)$, which yields

$$c^T x = c_{\tilde{B}}^T x_{\tilde{B}} = c_{\tilde{B}}^T \tilde{B}^{-1} E(b_i) = c_{\tilde{B}}^T \tilde{B}^{-1} \Big( \sum_i \frac{b_i}{I} \Big) = \frac{1}{I} c_{\tilde{B}}^T \tilde{B}^{-1} b_1 + \ldots + \frac{1}{I} c_{\tilde{B}}^T \tilde{B}^{-1} b_I.$$

Since we know that $B$ is an optimal basis for each problem $(E_i)$, we can say that

$$\frac{1}{I} c_{\tilde{B}}^T \tilde{B}^{-1} b_1 + \ldots + \frac{1}{I} c_{\tilde{B}}^T \tilde{B}^{-1} b_I \ge \frac{1}{I} c_B^T B^{-1} b_1 + \ldots + \frac{1}{I} c_B^T B^{-1} b_I = c_B^T x_B.$$

This would imply that $B$ is an optimal basis for $(E)$, which is a contradiction.

This result has significant ramifications with respect to clustering TS data. Imagine two hours with different data, i.e. $b_1$ and $b_2$, but an identical optimal basis $B$ for the underlying LP. Then the full optimization model (with individually represented hours) and the aggregated optimization model, in which we only have one expected hour (i.e. the cluster centroid or expected value $\frac{b_1 + b_2}{2}$), yield the same objective function value and the same expected results for the variables. Hence, those two hours can be merged without losing any accuracy in the final aggregated model. In other words, if hours are aggregated within their basis and represented by the cluster centroid, then the aggregated model results will be exactly the same as the full hourly model results and have zero output error in expectation. This shows, first, that TSA for optimization purposes must be based on the impact of the aggregation on the optimization output error and not on similarity of input data as in traditional methods; and second, that basis-oriented TSA is a promising way of clustering for aggregated optimization problems.

B. Economic dispatch and basis-oriented clustering

Applying basis-oriented clustering to the above-mentioned ED example shows that there are only 3 different bases. Therefore, we only require 3 clusters. In Figure 1b, we have color-coded each hour according to the basis (and corresponding cluster) it belongs to: blue (the wind generator is the marginal generator), black (thermal is on the margin), and red (hours with non-supplied energy). The MSE in the input space is 0.0385, which is 2.3 times larger than the MSE obtained by k-means, so the input error is higher with the obtained clusters. However, if we cluster all hours within their basis, then the aggregated optimization results are exact! Solving the aggregated optimization model with only 3 representative periods that have been clustered according to their bases yields an error of 0%. Apart from being theoretically exact, basis-oriented clustering also establishes the maximum number of clusters necessary to obtain exact optimization results, which is something no current TSA method can offer.

IV. CONCLUSION

The takeaways from this paper are as follows. First, a-priori TSA methods are fundamentally flawed when used in optimization models and therefore should be abandoned and replaced by a-posteriori methods. As we have shown by counter-example, the lowest input error (as indicated by MSE) does not translate into the lowest (or even a low) output error when approximating full optimization model results. Second, basis-oriented TSA can achieve a tremendous reduction in input data of several orders of magnitude (3 out of 8760 hours) while replicating full model results exactly. This confirms that picking clusters intelligently can outperform traditional one-size-fits-all a-priori TSA methods (such as k-means) even if those use most of the original data. Finally, more research is required to develop a theoretical framework for the most efficient a-posteriori TSA methods. The basis-oriented TSA proposed here could be a starting point. In future research, we plan to explore basis-oriented TSA further to see if it can be extended to large-scale, realistic problems with, e.g., time-linking and discrete constraints.

ACKNOWLEDGMENT

I want to thank D. Cardona-Vasquez, D. Di Tondo, S. Pineda and J.M. Morales for their comments.

REFERENCES

[1] H. Teichgraeber, A. R. Brandt, Clustering methods to find representative periods for the optimization of energy systems: An initial framework and comparison, Applied Energy 239 (2019) 1283–1293.
[2] M. Hoffmann, L. Kotzur, D. Stolten, M. Robinius, A review on time series aggregation methods for energy system models, Energies 13 (3) (2020) 641.
[3] F. Domínguez-Muñoz, J. M. Cejudo-López, A. Carrillo-Andrés, M. Gallardo-Salazar, Selection of typical demand days for CHP optimization, Energy and Buildings 43 (11) (2011) 3036–3043.
[4] F. J. De Sisternes, M. D. Webster, O. J. De Sisternes, Optimal selection of sample weeks for approximating the net load in generation planning problems (2013).
[5] F. D. Munoz, A. D. Mills, Endogenous assessment of the capacity value of solar PV in generation investment planning studies, IEEE Transactions on Sustainable Energy 6 (4) (2015) 1574–1585.
[6] I. J. Scott, P. M. Carvalho, A. Botterud, C. A. Silva, Clustering representative days for power systems generation expansion planning: Capturing the effects of variable renewables and energy storage, Applied Energy 253 (2019) 113603.
[7] A. Pöstges, C. Weber, Time series aggregation–a new methodological approach using the "peak-load-pricing" model, Utilities Policy 59 (2019).
[8] M. Sun, F. Teng, X. Zhang, G. Strbac, D. Pudjianto, Data-driven representative day selection for investment decisions: A cost-oriented approach, IEEE Transactions on Power Systems 34 (4) (2019) 2925–2936.
[9] C. Li, A. J. Conejo, J. D. Siirola, I. E. Grossmann, On representative day selection for capacity expansion planning of power systems under extreme operating conditions, International Journal of Electrical Power & Energy Systems 137 (2022) 107697.
