Abstract
We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well-performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its application to electricity price forecasting (EPF) tasks across a broad range of years and markets. We observe state-of-the-art performance, significantly improving the forecast accuracy by nearly 20% over the original NBEATS model, and by up to 5% over other well-established statistical and machine learning methods specialized for these tasks. Additionally, the proposed neural network has an interpretable configuration that can structurally decompose time series, visualizing the relative impact of trend and seasonal components and revealing the modeled processes' interactions with exogenous factors. To assist related work, we make the code available in a dedicated repository.
Keywords: Deep Learning, NBEATS and NBEATSx models, Interpretable neural
network, Time series decomposition, Fourier series, Electricity price forecasting
1. Introduction
In the last decade, significant progress has been made in the application of deep learning to forecasting tasks, with models such as the exponential smoothing recurrent neural network (ESRNN; Smyl 2020) and the neural basis expansion analysis (NBEATS; Oreshkin et al. 2020) outperforming classical statistical approaches in the recent M4 competition (Makridakis et al., 2020). Despite this success, we still identify two possible improvements, namely the integration of time-dependent exogenous variables as inputs and the interpretability of the neural network outputs.
Neural networks have proven powerful and flexible, yet there are several situations where our understanding of the model's predictions can be as crucial as their accuracy; this opacity constitutes a barrier to their wider adoption. The interpretability of an algorithm's outputs is critical because it encourages trust in its predictions, improves our knowledge of the modeled processes, and provides insights that can improve the method itself.
Additionally, the absence of time-dependent covariates makes these powerful models unsuitable for many applications. For instance, Electricity Price Forecasting (EPF) is a task where covariate features are fundamental to obtaining accurate predictions. For this reason, we chose this challenging application as a test ground for our proposed forecasting methods.
In this work, we address the two mentioned limitations: first, by extending the neural basis expansion analysis to incorporate temporal and static exogenous variables; and second, by further exploring the interpretable configuration of NBEATS and showing its use as a time-series signal decomposition tool. We refer to the new method as NBEATSx. The main contributions of this paper include: (i) the NBEATSx method itself, which integrates exogenous variables into the neural basis expansion analysis; (ii) an interpretable configuration that structurally decomposes the forecasts into trend, seasonal, and exogenous components; and (iii) a comprehensive empirical evaluation on electricity price forecasting tasks across a broad range of years and markets, in which NBEATSx achieves state-of-the-art accuracy.
The remainder of the paper is structured as follows. Section 2 reviews relevant literature on the developments and applications of deep learning to sequence modeling and current approaches to EPF. Section 3 introduces mathematical notation and describes the NBEATSx model. Section 4 explores our model's application to time series decomposition and forecasting over a broad range of electricity markets and time periods. Finally, Section 5 discusses possible directions for future research, wraps up the results, and concludes the paper.
2. Literature Review
2.1. Deep Learning and Sequence Modeling
The Deep Learning (DL) methodology has demonstrated significant utility in solving sequence modeling problems, with applications to natural language processing, audio signal processing, and computer vision. This subsection summarizes the critical DL developments in sequence modeling that are building blocks of the NBEATS and ESRNN architectures.
For a long time, sequence modeling with neural networks and Recurrent Neural Networks (RNNs; Elman 1990) were treated as synonyms. The hidden internal activations of RNNs, propagated through time, provide these models with the ability to encode the observed past of the sequence. This explains their great popularity in building different variants of Sequence-to-Sequence (Seq2Seq) models applied to natural language processing (Graves, 2013) and machine translation (Sutskever et al., 2014). Most progress on RNNs was made possible by architectural innovations and novel training techniques that made their optimization easier.
The adoption of convolutions and skip-connections within recurrent structures was an important precursor to new advancements in sequence modeling, as deeper representations endowed the models with longer effective memory. Examples of such precursors can be found in WaveNet for audio generation and machine translation (van den Oord et al., 2016), as well as the Dilated Recurrent Neural Network (DilRNN; Chang et al. 2017) and the Temporal Convolutional Network (TCN; Bai et al. 2018).
Nowadays, Seq2Seq models and their derivatives can learn complex nonlinear temporal dependencies efficiently; their use in the time series analysis domain has been a great success. Seq2Seq models have recently shown better forecasting performance than classical statistical methods, while greatly simplifying the forecasting systems into single-box models, such as the Multi Quantile Convolutional Neural Network (MQCNN; Wen et al. 2017), the Exponential Smoothing Recurrent Neural Network (ESRNN; Smyl 2020), and the Neural Basis Expansion Analysis (NBEATS; Oreshkin et al. 2020). For quite a while, academia resisted broadly adopting these new methods (Makridakis et al., 2018), although their evident success in challenges such as the M4 competition has motivated their wider adoption by the forecasting research community (Benidis et al., 2020).
2.2. Electricity Price Forecasting
Electricity price forecasting has attracted considerable attention from researchers and practitioners over the last two decades, turning the field into a prolific subject on which to test novel forecasting ideas and trading strategies (Chitsaz et al., 2018; Gianfreda et al., 2020; Uniejewski & Weron, 2021).
Out of the numerous approaches to EPF developed over the last two decades, two classes of models are of particular importance when predicting day-ahead prices: statistical (also called econometric or technical analysis), in most cases based on linear regression, and computational intelligence (also referred to as artificial intelligence, non-linear, or machine learning), with neural networks being the fundamental building block. Among the latter, many of the recently proposed methods utilize deep learning (Wang et al. 2017; Lago et al. 2018a; Marcjasz 2020) or are hybrid solutions that typically comprise data decomposition, feature selection, clustering, forecast averaging, and/or heuristic optimization to estimate the model (hyper)parameters (Nazar et al., 2018; Li & Becker, 2021).
Unfortunately, as argued by Lago et al. (2021a), the majority of neural network EPF research suffers from test periods that are too short and limited to a single market, a lack of well-performing and established benchmark methods, and/or incomplete descriptions of the pipeline and training methodology, resulting in poor reproducibility of the results. To address these shortcomings, our models are compared across two-year out-of-sample periods from five power markets, using two highly competitive benchmarks recommended in previous studies: the Lasso Estimated Auto-Regressive (LEAR) model and a (relatively) parsimonious Deep Neural Network (DNN).
3. NBEATSx Model
As a general overview, the NBEATSx framework decomposes the objective signal by performing separate local nonlinear projections of the target data onto basis functions across its different blocks. Figure 1 depicts the general architecture of the model. Each block consists of a Fully Connected Neural Network (FCNN; Rosenblatt 1961), which learns expansion coefficients for the backcast and forecast elements. The backcast model is used to clean the inputs of subsequent blocks, while the forecasts are summed to compose the final prediction. The blocks are grouped in stacks. Each of the potentially multiple stacks specializes in a different variant of basis functions.
To continue the description of NBEATSx, we introduce the following notation: the objective signal is represented by the vector $y$; the inputs to the model are the backcast window vector $y^{back}$ of length $L$ and the forecast window vector $y^{for}$ of length $H$, where $L$ denotes the length of the lags available as classic autoregressive features and $H$ is the forecast horizon treated as the objective. While the original NBEATS only admits as regressor the backcast period of the target variable $y^{back}$, NBEATSx incorporates covariates in its analysis, denoted by the matrix $X$. Figure 1 shows an example where the target variable is the hourly electricity price, the backcast vector has a length $L$ of 96 hours, and the forecast horizon $H$ is 72 hours; in this example, the covariate matrix $X$ is composed of wind power production and electricity load. For the EPF comparative analysis of Section 4.3.6, the horizon considered is $H = 24$, corresponding to day-ahead predictions, while the backcast input length $L = 168$ corresponds to a week of lagged values.
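To make the notation concrete, the following sketch assembles a single backcast/forecast sample from an hourly series; the lengths match the EPF setup above, while the variable names and random placeholder data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

L, H, N_x = 168, 24, 2                # backcast length, horizon, n. of covariates
y = np.random.randn(10_000)           # placeholder hourly price series
X = np.random.randn(10_000, N_x)      # placeholder covariates, aligned with y

t = 5_000                             # forecast creation time (an hour index)
y_back = y[t - L:t]                   # autoregressive inputs, shape (L,)
y_for = y[t:t + H]                    # forecasting objective, shape (H,)
X_win = X[t - L:t + H]                # covariates span both windows, (L + H, N_x)
```

Because day-ahead forecasts of electricity load and renewable generation are published in advance, the covariates are available over the forecast window as well as the backcast window.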
[Figure 1: schematic of the NBEATSx architecture. Inputs y, X1, and X2 feed a sequence of stacks (e.g., an FC stack, a seasonality stack, and an exogenous stack), each composed of blocks whose FCNN produces forecast and backcast expansion coefficients; each stack passes a residual to the next, and the global forecast (model output) is the sum of the stack forecasts.]
Figure 1: Building blocks of the NBEATSx are structured as a system of multilayer fully connected networks with ReLU-based nonlinearities. Blocks overlap using the doubly residual stacking principle for the backcast $\hat{y}^{back}_{s,b}$ and forecast $\hat{y}^{for}_{s,b}$ outputs of the b-th block within the s-th stack. The final predictions $\hat{y}^{for}$ are composed by aggregating the outputs of the stacks.
For its predictions, the NBEATS model only receives a local vector of inputs corresponding
to the backcast period, making the computations exceptionally fast. The model can still
represent longer time dependencies through its local inputs from the exogenous variables;
for example, it can learn long seasonal effects from calendar variables.
Overall, as shown in Figure 1, NBEATSx is composed of $S$ stacks of $B$ blocks each. The input $y^{back}$ of the first block consists of $L$ lags of the target time series $y$ and the exogenous matrix $X$, while the inputs of each subsequent block include residual connections with the backcast output of the previous block. We describe the blocks, stacks, and model predictions in detail in the next subsections.
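The doubly residual stacking just described can be summarized by a minimal sketch of the forward pass over the flattened list of blocks (illustrative names, not the authors' code):

```python
import torch

def nbeatsx_forward(blocks, y_back, X):
    """Doubly residual stacking: backcasts clean the inputs of subsequent
    blocks, while block forecasts are summed into the final prediction."""
    residuals = y_back
    forecast = 0.0
    for block in blocks:  # blocks from all stacks, in order
        backcast, block_forecast = block(residuals, X)
        residuals = residuals - backcast      # residual connection on the inputs
        forecast = forecast + block_forecast  # additive forecast composition
    return forecast
```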
3.1. Blocks
For a given s-th stack and b-th block within it, the NBEATSx model performs two transformations, depicted in the blue rectangle of Figure 1. The first transformation, defined in Equation (1), takes the input data $(y^{back}_{s,b-1}, X_{s,b-1})$ and applies a Fully Connected Neural Network (FCNN; Rosenblatt 1961) to learn hidden units $h_{s,b} \in \mathbb{R}^{N_h}$ that are linearly adapted into the forecast $\theta^{for}_{s,b} \in \mathbb{R}^{N_s}$ and backcast $\theta^{back}_{s,b} \in \mathbb{R}^{N_s}$ expansion coefficients, with $N_s$ the dimension of the stack basis:

$$h_{s,b} = \mathrm{FCNN}_{s,b}\left(y^{back}_{s,b-1}, X_{s,b-1}\right), \qquad \theta^{back}_{s,b} = \mathrm{LINEAR}^{back}(h_{s,b}), \quad \theta^{for}_{s,b} = \mathrm{LINEAR}^{for}(h_{s,b}) \qquad (1)$$

The second transformation, defined in Equation (2), consists of a basis expansion operation between the learnt coefficients and the block's basis vectors $V^{back}_{s,b} \in \mathbb{R}^{L \times N_s}$ and $V^{for}_{s,b} \in \mathbb{R}^{H \times N_s}$; this transformation results in the backcast $\hat{y}^{back}_{s,b}$ and forecast $\hat{y}^{for}_{s,b}$ components:

$$\hat{y}^{back}_{s,b} = V^{back}_{s,b}\, \theta^{back}_{s,b}, \qquad \hat{y}^{for}_{s,b} = V^{for}_{s,b}\, \theta^{for}_{s,b} \qquad (2)$$
The additive generation of the forecast implies a very intuitive decomposition of the
prediction components when the bases within the blocks are interpretable.
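A compact PyTorch-style sketch of a single block implementing Equations (1) and (2) follows; the FCNN depth and layer sizes here are illustrative assumptions rather than the configuration used in the experiments.

```python
import torch
import torch.nn as nn

class NBEATSxBlock(nn.Module):
    """One block: Equation (1) learns expansion coefficients, Equation (2)
    expands them on the basis matrices V_back (L x N_s) and V_for (H x N_s)."""

    def __init__(self, L, H, N_x, N_h, V_back, V_for):
        super().__init__()
        N_s = V_for.shape[1]
        self.fcnn = nn.Sequential(                      # Equation (1): hidden units
            nn.Linear(L + (L + H) * N_x, N_h), nn.ReLU(),
            nn.Linear(N_h, N_h), nn.ReLU(),
        )
        self.linear_back = nn.Linear(N_h, N_s)          # backcast coefficients
        self.linear_for = nn.Linear(N_h, N_s)           # forecast coefficients
        self.register_buffer("V_back", V_back)          # basis tensors
        self.register_buffer("V_for", V_for)

    def forward(self, y_back, X):
        # y_back: (batch, L); X: (batch, L + H, N_x)
        h = self.fcnn(torch.cat([y_back, X.flatten(1)], dim=-1))
        backcast = self.linear_back(h) @ self.V_back.T  # Equation (2)
        forecast = self.linear_for(h) @ self.V_for.T
        return backcast, forecast
```

For the interpretable stacks of Section 3.4, V_for and V_back can be fixed to the trend or harmonic matrices defined there; in the generic configuration they are learned instead.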
3.4. NBEATSx Configurations
The original neural basis expansion analysis method proposed two configurations, based on the assumptions encoded in the learning algorithm by selecting the basis vectors $V^{back}_{s,b}$ and $V^{for}_{s,b}$ used in the blocks from Equation (2). A mindful selection of restrictions to the basis allows the model to output an interpretable decomposition of the forecasts, while allowing the basis to be freely determined can produce more flexible forecasts by effectively removing any constraints on the form of the basis functions.
In this subsection, we present both interpretable and generic configurations, explaining
in particular how we propose to include the covariates in each case. We limit ourselves to
the analysis of the forecast basis, as the backcast basis analysis is almost identical, only
differing by its extension over time. We show an example in Appendix A.1.
3.4.1. Interpretable Configuration
The interpretable configuration uses trend, seasonality, and exogenous bases:

$$\hat{y}^{trend}_{s,b} = \sum_{i=0}^{N_{pol}} t^{i}\, \theta^{trend}_{s,b,i} \equiv \mathbf{T}\, \theta^{trend}_{s,b} \qquad (5)$$

$$\hat{y}^{seas}_{s,b} = \sum_{i=0}^{\lfloor H/2-1 \rfloor} \cos\!\left(2\pi i \tfrac{t}{N_{hr}}\right) \theta^{seas}_{s,b,i} + \sin\!\left(2\pi i \tfrac{t}{N_{hr}}\right) \theta^{seas}_{s,b,i+\lfloor H/2 \rfloor} \equiv \mathbf{S}\, \theta^{seas}_{s,b} \qquad (6)$$

$$\hat{y}^{exog}_{s,b} = \sum_{i=1}^{N_x} X_i\, \theta^{exog}_{s,b,i} \equiv \mathbf{X}\, \theta^{exog}_{s,b} \qquad (7)$$

where the time vector $t = [0, 1, 2, \dots, H-2, H-1]^{\top}/H$ is defined discretely. When the basis $V^{for}_{s,b}$ is $\mathbf{T} = [1, t, \dots, t^{N_{pol}}] \in \mathbb{R}^{H \times (N_{pol}+1)}$, where $N_{pol}$ is the maximum polynomial degree, the coefficients are those of a polynomial trend model. When the bases $V^{for}_{s,b}$ are harmonic, $\mathbf{S} = [1, \cos(2\pi \tfrac{t}{N_{hr}}), \dots, \cos(2\pi \lfloor H/2-1 \rfloor \tfrac{t}{N_{hr}}), \sin(2\pi \tfrac{t}{N_{hr}}), \dots, \sin(2\pi \lfloor H/2-1 \rfloor \tfrac{t}{N_{hr}})] \in \mathbb{R}^{H \times (H-1)}$, the coefficient vector $\theta^{for}_{s,b}$ can be interpreted as Fourier transform coefficients; the hyperparameter $N_{hr}$ controls the harmonic oscillations. The exogenous basis expansion can be thought of as a time-varying local regression when the basis is the matrix $\mathbf{X} = [X_1, \dots, X_{N_x}] \in \mathbb{R}^{H \times N_x}$, where $N_x$ is the number of exogenous variables. The resulting models can flexibly reflect common structural assumptions, in particular using the interpretable bases, as well as their combinations.
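As a worked example, the trend and harmonic matrices above can be built as follows; this is a sketch assuming the day-ahead setting H = 24 with Npol = 2 and Nhr = 1:

```python
import numpy as np

H, N_pol, N_hr = 24, 2, 1
t = np.arange(H) / H                               # discrete time vector

# Trend basis T in R^{H x (N_pol + 1)}: slowly varying polynomials of time.
T = np.stack([t ** i for i in range(N_pol + 1)], axis=1)

# Harmonic basis S in R^{H x (H - 1)}: a constant column plus paired cosines
# and sines; N_hr stretches the oscillations over the forecast horizon.
freqs = np.arange(1, H // 2)                       # i = 1, ..., floor(H/2) - 1
S = np.concatenate([np.ones((H, 1)),
                    np.cos(2 * np.pi * np.outer(t, freqs) / N_hr),
                    np.sin(2 * np.pi * np.outer(t, freqs) / N_hr)], axis=1)
assert T.shape == (H, N_pol + 1) and S.shape == (H, H - 1)
```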
In this paper, we propose including one more type of stack to specifically represent the exogenous variable basis, as described in Equation (7) and depicted in Figure 1. In the original NBEATS framework (Oreshkin et al., 2020), the interpretable configuration usually consists of a trend stack followed by a seasonality stack, each containing three blocks. Our NBEATSx extension of this configuration consists of three stacks, one for each type of factor (trend, seasonal, exogenous). We refer to the interpretable configuration and its enhanced version as the NBEATS-I and NBEATSx-I models, respectively.
3.4.2. Generic Configuration
In the generic configuration, the basis is freely determined from the data, which enables NBEATSx to effectively behave like a classic Fully Connected Neural Network (FCNN). The output layer of the FCNN inside each block has H neurons that correspond to the forecast horizon, each producing the forecast for one particular time point of the forecast period. This can be understood as the basis vectors being learned during optimization, allowing the waveform of the basis of each stack to be freely determined in a data-driven fashion. Compared to the interpretable counterpart described in Section 3.4.1, the constraints on the form of the basis functions are removed. This affords the generic variant more flexibility and power at representing complex data, but it can also lead to less interpretable outcomes and a potentially escalated risk of overfitting.
For the NBEATSx model with the generic configuration, we propose a new type of exogenous block that learns a context vector $C_{s,b}$ from the time-dependent covariates with an encoder convolutional sub-structure:

$$\hat{y}^{exog}_{s,b} = \sum_{i=1}^{N_c} C_{s,b,i}\, \theta^{for}_{s,b,i} \equiv C_{s,b}\, \theta^{for}_{s,b} \quad \text{with} \quad C_{s,b} = \mathrm{TCN}(X) \qquad (9)$$
In the previous equation, a Temporal Convolutional Network (TCN; Bai et al. 2018) is employed as the encoder, but any neural network with a sequential structure compatible with the backcast and forecast branches of the model could be used instead. For example, WaveNet (van den Oord et al., 2016) can be an effective alternative to RNNs, as it is also able to capture long-term dependencies and interactions of covariates by stacking multiple layers, while dilations help keep the models computationally tractable. In addition, convolutions have a very convenient interpretation as weighted moving-average signal filters. The final linear projection and the additive composition of the predictions can be interpreted as a decoder.
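A minimal sketch of such an encoder, built from stacked dilated causal convolutions in the spirit of the TCN, is given below; the channel sizes and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ExogenousEncoder(nn.Module):
    """Convolutional encoder of Equation (9): dilated causal 1-d convolutions
    map the covariates X in R^{(L+H) x N_x} to a context matrix C."""

    def __init__(self, n_x, n_channels=32, n_layers=3, kernel_size=3):
        super().__init__()
        layers, c_in = [], n_x
        for i in range(n_layers):
            d = 2 ** i                                  # dilations grow the receptive field
            layers += [nn.Conv1d(c_in, n_channels, kernel_size,
                                 dilation=d, padding=d * (kernel_size - 1)),
                       nn.ReLU()]
            c_in = n_channels
        self.net = nn.Sequential(*layers)

    def forward(self, X):                               # X: (batch, L + H, N_x)
        C = self.net(X.transpose(1, 2))                 # -> (batch, channels, time)
        return C[..., :X.shape[1]]                      # trim padding (causality)
```

The context C can then be linearly projected with the coefficients of Equation (9), and analogously for the backcast branch.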
The original NBEATS configuration includes only one generic stack with dozens of blocks,
while our proposed model includes both the generic and exogenous stacks, with the order
determined via data-driven hyperparameter tuning. We refer to this configuration as the
NBEATSx-G model.
3.4.3. Exogenous Variables
We distinguish the exogenous variables by whether they reflect static or time-dependent aspects of the modeled data. The static exogenous variables carry time-invariant information. When the model is built with common parameters to forecast multiple time series, these variables allow sharing information within groups of time series with similar static variable levels. Examples of static variables include identifiers of regions and groups of products, among others.
As for the time-dependent exogenous covariates, we discern two subtypes. First, we
consider seasonal covariates from the natural frequencies in the data. These variables are
useful for NBEATSx to identify seasonal patterns and special events inside and outside the
window lookback periods. Examples of these are the trends and harmonic functions from
Equation (5) and Equation (6). Second, we identify domain-specific temporal covariates
unique to each problem. The EPF setting typically includes day-ahead forecasts of electricity
load and production levels from renewable energy sources.
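As an illustration of the seasonal subtype, calendar covariates can be generated as harmonic functions of the natural daily and weekly frequencies of hourly data; the following sketch (with illustrative dates and names) shows one way to build them:

```python
import numpy as np
import pandas as pd

# Hourly timestamps for one week (placeholder range).
idx = pd.date_range("2017-01-01", periods=24 * 7, freq="h")
hour, dow = idx.hour.values, idx.dayofweek.values

calendar_X = np.column_stack([
    np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),  # daily cycle
    np.sin(2 * np.pi * dow / 7),   np.cos(2 * np.pi * dow / 7),    # weekly cycle
])
```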
4. Empirical Evaluation
4.1. Electricity Price Forecasting Datasets
To evaluate our method's forecasting capabilities, we consider short-term electricity price forecasting tasks, where the objective is to predict day-ahead prices. Five major power markets¹ are used in the empirical evaluation, all comprised of hourly observations of the prices and two influential temporal exogenous variables, extending for 2,184 days (312 weeks, six years). From the six years of available data for each market, we hold out two years to test the forecasting performance of the algorithms. The length and diversity of the test sets allow us to obtain accurate and highly comprehensive measurements of the robustness and generalization capabilities of the models.
Table 1 summarizes the key characteristics of each market. The Nord Pool electricity market (NP), which corresponds to the Nordic countries' exchange, contains the hourly prices and day-ahead forecasts of load and wind generation. The second dataset is the Pennsylvania-New Jersey-Maryland market in the United States (PJM), which contains hourly zonal prices in the Commonwealth Edison (COMED) zone and two day-ahead load forecasts, at the system and COMED zonal levels. The remaining three markets are obtained from the integrated European Power Exchange (EPEX). The Belgian (EPEX-BE) and French (EPEX-FR) markets share the French day-ahead generation forecast as a covariate, since it is known to be one of the best predictors of Belgian prices (Lago et al., 2018b). Finally, the German market (EPEX-DE) contains the hourly prices, day-ahead load forecasts, and the country-level wind and solar generation day-ahead forecasts.
Figure 2 displays the NP electricity price time series and its corresponding covariate variables to illustrate the datasets. The NP market is the least volatile among the considered markets, since most of its power comes from hydroelectric generation, a renewable source: its volatility is negligible and zero-price spikes are rare.
¹ For the sake of reproducibility, we only consider datasets that are openly accessible in the EPFtoolbox library https://github.com/jeslago/epftoolbox (Lago et al., 2021a).
Table 1: Datasets used in our empirical study. For the five day-ahead electricity markets considered, we
report the test period dates and two influential covariate variables.
The PJM market is transitioning from coal generation to natural gas and some renewable sources; zero-price spikes are rare, but the system exhibits higher volatility than NP. In the EPEX-BE and EPEX-FR markets, negative prices and spikes are more frequent, and as time passes, these markets show increasing signs of integration. Finally, the EPEX-DE market shows few price spikes, but the most frequent negative and zero price events, due in great part to the impact of renewable sources.
The exogenous covariates are normalized following best practices drawn from the EPF
literature (Uniejewski et al., 2018). Preprocessing the inputs of neural networks is essential
to accelerate and stabilize the optimization (LeCun et al., 1998).
Figure 2: The top panel shows the day-ahead electricity price time series for the NordPool (NP) market.
The second and third panels show the day-ahead forecast for the system load and wind generation. The
training data is composed of the first four years of each dataset. The validation set is the year that follows
the training data (between the first and second dotted lines). For the held-out test set, the last two years
of each dataset are used (marked by the second dotted line). During evaluation, we recalibrate the model
updating the training set to incorporate all available data before each daily prediction. The recalibration
uses an early stopping set of 42 weeks randomly chosen from the updated training set (a sample selection is
marked with blue rectangles in the top panel).
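A schematic sketch of this rolling recalibration follows, with a user-supplied fit-and-predict routine and illustrative array shapes (both assumptions, not the authors' API):

```python
import numpy as np

def recalibrate_and_predict(fit_predict, y, X, first_test_day, n_test_days,
                            n_es_weeks=42, seed=0):
    """Daily recalibration sketch: before each prediction the training set is
    updated with all available data, and a random early-stopping set of 42
    weeks is held out.  `y` has shape (T, 24) and `X` shape (T, 24, N_x)."""
    rng = np.random.default_rng(seed)
    predictions = []
    for day in range(first_test_day, first_test_day + n_test_days):
        days = np.arange(day)                           # all days before `day`
        es_weeks = rng.choice(day // 7, size=n_es_weeks, replace=False)
        es_mask = np.isin(days // 7, es_weeks)          # early-stopping hold-out
        y_hat = fit_predict(y[days[~es_mask]], X[days[~es_mask]],  # train split
                            y[days[es_mask]], X[days[es_mask]],    # early stopping
                            X[day])                                # next 24 hours
        predictions.append(y_hat)
    return np.vstack(predictions)
```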
[Figure 3: side-by-side decompositions, (a) NBEATS and (b) NBEATSx, of the NP day-ahead price for December 18, 2017 (hours 00:00 to 23:00); rows show the price (true, level, forecast) in EUR/MWh, trend, seasonality, exogenous (NBEATSx only), and residual components.]
Figure 3: Time series signal decomposition for NP electricity price day-ahead forecasts using interpretable
variants of NBEATS and NBEATSx. The top row of graphs shows the original signal and the level, the latter
is defined as the last available observation before the forecast. The second row shows the polynomial
trend components, the third and fourth rows display the complex seasonality modeled by nonlinear Fourier
projections and the exogenous effects of the electricity load on the price, respectively. The bottom row
graphs show the unexplained variation of the signal. The use of electricity load and production forecasts
turns out to be fundamental for accurate price forecasting.
4.3. Comparative Analysis
4.3.1. Evaluation Metrics
To ensure the comparability of our results with the existing literature, we opted to follow
the widely accepted practice of evaluating the accuracy of point forecasts with the following
metrics: mean absolute error (MAE), relative mean absolute error (rMAE)², symmetric mean absolute percentage error (sMAPE), and root mean squared error (RMSE), defined as:

$$\mathrm{MAE} = \frac{1}{24 N_d} \sum_{d=1}^{N_d} \sum_{h=1}^{24} \lvert y_{d,h} - \hat{y}_{d,h} \rvert \qquad \mathrm{rMAE} = \frac{\sum_{d=1}^{N_d} \sum_{h=1}^{24} \lvert y_{d,h} - \hat{y}_{d,h} \rvert}{\sum_{d=1}^{N_d} \sum_{h=1}^{24} \lvert y_{d,h} - \hat{y}^{naive}_{d,h} \rvert}$$

$$\mathrm{sMAPE} = \frac{200}{24 N_d} \sum_{d=1}^{N_d} \sum_{h=1}^{24} \frac{\lvert y_{d,h} - \hat{y}_{d,h} \rvert}{\lvert y_{d,h} \rvert + \lvert \hat{y}_{d,h} \rvert} \qquad \mathrm{RMSE} = \sqrt{\frac{1}{24 N_d} \sum_{d=1}^{N_d} \sum_{h=1}^{24} \left( y_{d,h} - \hat{y}_{d,h} \right)^2}$$

where $y_{d,h}$ and $\hat{y}_{d,h}$ are the actual value and the forecast of the time series at day $d$ and hour $h$; for our experiments, given the two years of each test set, $N_d = 728$.
While regression-based models are estimated by minimizing squared errors, to train
neural networks we minimize absolute errors (see Section 4.3.3 below). Hence, both the
MAE and RMSE are highly relevant in our context. Since they are not easily comparable
across datasets – and given the popularity of such errors in forecasting practice (Makridakis
et al., 2020) – we have additionally computed a percentage and a relative measure. The
sMAPE is used as an alternative to MAPE, which in the presence of values close to zero
may degenerate (Hyndman & Koehler, 2006). The rMAE is calculated instead of a scaled
measure used in the M4 competition for reasons explained in Sec. 5.4.2. of Lago et al.
(2021a).
² The naïve forecast method in EPF corresponds to a similar-day rule, where the forecast for a Monday, Saturday, and Sunday equals the value of the series observed on the same weekday of the previous week, while the forecast for Tuesday, Wednesday, Thursday, and Friday is the value observed on the previous day.
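A direct implementation of the four metrics and of the similar-day naive benchmark from footnote 2 could look as follows; the array layout and the assumption that day 0 is a Monday are illustrative choices.

```python
import numpy as np

def epf_metrics(y, y_hat, y_naive):
    """Point-forecast accuracy measures defined above; all inputs are arrays
    of shape (N_d, 24): actual values, forecasts, and the naive benchmark."""
    abs_err = np.abs(y - y_hat)
    mae = abs_err.mean()
    rmae = abs_err.sum() / np.abs(y - y_naive).sum()
    smape = (200.0 * abs_err / (np.abs(y) + np.abs(y_hat))).mean()
    rmse = np.sqrt(((y - y_hat) ** 2).mean())
    return mae, rmae, smape, rmse

def naive_similar_day(y):
    """Similar-day rule from footnote 2: Mondays, Saturdays, and Sundays repeat
    the same weekday of the previous week; other days repeat the previous day.
    Assumes row d of `y` is day d and day 0 is a Monday (an assumption here)."""
    y_naive = np.empty_like(y)
    for d in range(7, len(y)):
        dow = d % 7                                    # 0 = Monday, 5 = Sat, 6 = Sun
        y_naive[d] = y[d - 7] if dow in (0, 5, 6) else y[d - 1]
    y_naive[:7] = y[:7]                                # no history for the first week
    return y_naive
```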
To assess whether the difference in forecasting accuracy between two models A and B is statistically significant, we use the Giacomini-White (GW) test of conditional predictive ability (Giacomini & White, 2006). Its null hypothesis is the equal conditional predictive ability of the forecasts of models A and B, conditioned on the information available at that moment³ in time, $\mathcal{F}_{d-1}$:

$$H_0: \; \mathbb{E}\left[\, \lVert y_d - \hat{y}^{A}_d \rVert_1 - \lVert y_d - \hat{y}^{B}_d \rVert_1 \,\middle|\, \mathcal{F}_{d-1} \right] \equiv \mathbb{E}\left[\, \Delta^{A,B}_d \,\middle|\, \mathcal{F}_{d-1} \right] = 0 \qquad (10)$$

³ In practice, the available information $\mathcal{F}_{d-1}$ is replaced with a constant and lags of the loss difference $\Delta^{A,B}_d$, and the test is performed using a linear regression with a Wald-like test. When the conditional information considered is only the constant variable, one recovers the original Diebold-Mariano (DM) test.
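Following the footnote above, a minimal sketch of the one-step-ahead GW statistic with a constant and one lag of the loss differential as conditioning information is shown below; this is the moment-based form of the test, equivalent to the regression formulation.

```python
import numpy as np
from scipy import stats

def gw_test(loss_a, loss_b):
    """One-step-ahead Giacomini-White test of Equation (10).  `loss_a` and
    `loss_b` are the daily L1 losses of models A and B."""
    d = loss_a - loss_b                                 # loss differential Delta_d
    h = np.column_stack([np.ones(len(d) - 1), d[:-1]])  # instruments: constant + lag
    z = h * d[1:, None]                                 # instrumented differentials
    n, q = z.shape
    omega = z.T @ z / n                                 # second-moment matrix of z
    stat = n * z.mean(0) @ np.linalg.solve(omega, z.mean(0))
    return stat, stats.chi2.sf(stat, df=q)              # statistic, asymptotic p-value
```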
Table 2: Hyperparameters of NBEATSx networks. They are common to all presented datasets. We list the
typical values we considered in our experiments. The configuration that performed best on the validation
set was selected automatically.
4.3.5. Ensembling
In many recent forecasting competitions, and particularly in the M4 competition, most of
the top-performing models were ensembles (Atiya, 2020). It has been shown that in practice,
combining a diverse group of models can be a powerful form of regularization to reduce the
variance of predictions (Breiman, 1996; Nowotarski et al., 2014; Hubicka et al., 2019).
The techniques used by the forecasting community to induce diversity in the models are plentiful. The original NBEATS model obtained its diversity from three sources: training with different loss functions, varying the size of the input windows, and bagging models with different random initializations (Oreshkin et al., 2020). They used the median as the aggregation function for 180 different models. Interestingly, the original model did not rely on regularization such as L2 or dropout, as Oreshkin et al. (2020) found it to be good for the individual models but detrimental to the ensemble.
In our case, we ensemble the NBEATSx model using two sources of diversity. The first is a data augmentation technique controlled by the sampling frequency of the windows used during training, as defined in the data parameters from Table 2. The second source of diversity is whether we randomly select the early stopping set or instead use the last 42 weeks preceding the test set. Combining the data augmentation and early stopping options, we obtain four models that we ensemble using the arithmetic mean as the aggregation function. This technique is also used by the DNN benchmark (Lago et al., 2018a, 2021a).
[Figure 4: Giacomini-White test grids for the five markets (NP, PJM, BE, FR, and DE; ensemble models, MAE), comparing AR1, ESRNN, NBEATS, ARx1, LEARx, DNN, NBEATSx-G, and NBEATSx-I; the p-value color scale spans 0.00 to 0.10.]
Figure 4: Results of the Giacomini-White test for the day-ahead predictions with mean absolute error (MAE), applied to pairs of the ensembled models on the five electricity market datasets. Each grid represents one market. Each cell in a grid is plotted black unless the predictions of the model corresponding to its column outperform the predictions of the model corresponding to its row. The color scale reflects the significance of the difference in MAE, with solid green representing the lowest p-values.
5. Conclusions
We have presented NBEATSx, a new method for univariate time series forecasting with exogenous variables, which extends the well-performing neural basis expansion analysis. The resulting neural-based method has several valuable properties that make it suitable for a wide range of forecasting tasks. The network is fast to optimize, as it is mainly composed of fully-connected layers. It can produce interpretable results, and it achieves state-of-the-art performance on forecasting tasks where the consideration of exogenous variables is fundamental.
We demonstrated the utility of the proposed method using a set of benchmark datasets from the electricity price forecasting domain, but it can be straightforwardly applied to forecasting problems in other domains. Qualitative evaluation shows that the interpretable configuration of NBEATSx can provide valuable insights to the analyst, as it explains the variation of the time series by separating it into trend, seasonality, and exogenous components, in a fashion analogous to classic time series decomposition. Regarding the quantitative forecasting performance, we observed no significant differences between ESRNN and NBEATS without exogenous variables. At the same time, NBEATSx improves over NBEATS by nearly 20%, and by up to 5% over the LEAR and DNN models specialized for electricity price forecasting tasks. Finally, we found no significant trade-offs between the accuracy and interpretability of the NBEATSx-G and NBEATSx-I predictions.
The neural basis expansion analysis is a very flexible method capable of producing accurate and interpretable forecasts, yet there is still room for improvement. For instance, the harmonic functions could be augmented towards wavelets, or the convolutional encoder that generates the covariate basis could be replaced with smoothing alternatives such as splines. Additionally, one could extend the current non-interpretable method by regularizing its outputs with smoothness constraints.
Acknowledgements
This work was partially supported by the Defense Advanced Research Projects Agency (award FA8750-17-2-0130), the National Science Foundation (grant 2038612), the Space Technology Research Institutes grant from NASA's Space Technology Research Grants Program, the U.S. Department of Homeland Security (award 18DN-ARI-00031), the Ministry of Education and Science (MEiN, Poland; grant 0219/DIA/2019/48), the National Science Center (NCN, Poland; grant 2018/30/A/HS4/00444), and Nixtla. Kin G. Olivares and Cristian Challu want to thank Stefania La Vattiata, Max Mergenthaler and Federico Garza for their support.
References
Atiya, A. F. (2020). Why does forecast combination work so well? International Journal of Forecasting,
36 , 197–200. URL: https://www.sciencedirect.com/science/article/pii/S0169207019300779.
doi:https://doi.org/10.1016/j.ijforecast.2019.03.010. M4 Competition.
Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent
networks for sequence modeling. Computing Research Repository, abs/1803.01271 . URL: http://arxiv.
org/abs/1803.01271. arXiv:1803.01271.
Benidis, K., Rangapuram, S. S., Flunkert, V., Wang, B., Maddix, D., Turkmen, C., Gasthaus, J., Bohlke-Schneider, M., Salinas, D., Stella, L., Callot, L., & Januschowski, T. (2020). Neural forecasting: Introduction and literature overview. Computing Research Repository, abs/2004.10240. arXiv:2004.10240.
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization.
In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural
Information Processing Systems (pp. 2546–2554). Curran Associates, Inc. volume 24. URL: https:
//proceedings.neurips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24 , 123–140. URL: https://doi.org/10.
1023/A:1018054314350. doi:10.1023/A:1018054314350.
Chang, S., Zhang, Y., Han, W., Yu, M., Guo, X., Tan, W., Cui, X., Witbrock, M., Hasegawa-Johnson,
M. A., & Huang, T. S. (2017). Dilated recurrent neural networks. In I. Guyon, U. V. Luxburg, S. Bengio,
H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing
Systems. Curran Associates, Inc. volume 30. URL: https://proceedings.neurips.cc/paper/2017/
file/32bb90e8976aab5298d5da10fe66f21d-Paper.pdf.
Chitsaz, H., Zamani-Dehkordi, P., Zareipour, H., & Parikh, P. (2018). Electricity price forecasting for
operational scheduling of behind-the-meter storage systems. IEEE Transactions on Smart Grid , 9 ,
6612–6622. doi:10.1109/TSG.2017.2717282.
Diebold, F., & Mariano, R. (1995). Comparing predictive accuracy. Journal of Business & Economic
Statistics, 13 , 253–265. URL: https://www.sas.upenn.edu/~fdiebold/papers/paper68/pa.dm.pdf.
doi:10.1080/07350015.1995.10524599.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14 , 179–211. URL: https://
onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1.
Giacomini, R., & White, H. (2006). Tests of conditional predictive ability. Econometrica, 74 ,
1545–1578. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0262.2006.00718.
x. doi:https://doi.org/10.1111/j.1468-0262.2006.00718.x.
Gianfreda, A., Ravazzolo, F., & Rossini, L. (2020). Comparing the forecasting performances of linear
models for electricity prices with high RES penetration. International Journal of Forecasting, 36 , 974–
986. URL: https://www.sciencedirect.com/science/article/pii/S0169207019302596. doi:https:
//doi.org/10.1016/j.ijforecast.2019.11.002.
Graves, A. (2013). Generating sequences with recurrent neural networks. Computing Research Repository,
abs/1308.0850 . URL: http://arxiv.org/abs/1308.0850. arXiv:1308.0850.
Hubicka, K., Marcjasz, G., & Weron, R. (2019). A note on averaging day-ahead electricity price forecasts
across calibration windows. IEEE Transactions on Sustainable Energy, 10(1), 321–323. doi:10.1109/
TSTE.2018.2869557.
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International
Journal of Forecasting, 22 , 679 – 688. URL: http://www.sciencedirect.com/science/article/pii/
S0169207006000239. doi:https://doi.org/10.1016/j.ijforecast.2006.03.001.
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2017). On large-batch training
for deep learning: Generalization gap and sharp minima. URL: http://arxiv.org/abs/1609.04836
published as a conference paper at the 5th International Conference for Learning Representations (ICLR),
Toulon, France, 2017.
Kingma, D. P., & Ba, J. (2014). ADAM: A method for stochastic optimization. URL: http://arxiv.
org/abs/1412.6980 published as a conference paper at the 3rd International Conference for Learning
Representations (ICLR), San Diego, 2015.
Koopmans, L. H. (1995). The spectral analysis of time series. Elsevier.
Lago, J., De Ridder, F., & De Schutter, B. (2018a). Forecasting spot electricity prices: Deep learn-
ing approaches and empirical comparison of traditional algorithms. Applied Energy, 221 , 386 –
405. URL: http://www.sciencedirect.com/science/article/pii/S030626191830196X. doi:https:
//doi.org/10.1016/j.apenergy.2018.02.069.
Lago, J., De Ridder, F., Vrancx, P., & De Schutter, B. (2018b). Forecasting day-ahead electricity
prices in Europe: The importance of considering market integration. Applied Energy, 211 , 890–
903. URL: https://www.sciencedirect.com/science/article/pii/S0306261917316999. doi:https:
//doi.org/10.1016/j.apenergy.2017.11.098.
Lago, J., Marcjasz, G., De Schutter, B., & Weron, R. (2021a). Forecasting day-ahead electric-
ity prices: A review of state-of-the-art algorithms, best practices and an open-access bench-
mark. Applied Energy, 293 , 116983. URL: https://www.sciencedirect.com/science/article/pii/
S0306261921004529. doi:https://doi.org/10.1016/j.apenergy.2021.116983.
Lago, J., Marcjasz, G., Schutter, B. D., & Weron, R. (2021b). Erratum to ’Forecasting day-ahead electricity
prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark’ [Appl. Energy
293 (2021) 116983] . WORking papers in Management Science (WORMS) WORMS/21/12 Department
of Operations Research and Business Intelligence, Wroclaw University of Science and Technology. URL:
https://ideas.repec.org/p/ahh/wpaper/worms2112.html.
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K. R. (1998). Efficient backprop. In Neural Networks: Tricks
of the Trade (pp. 9–50). Berlin, Heidelberg: Springer Berlin Heidelberg. URL: https://doi.org/10.
1007/3-540-49430-8_2. doi:10.1007/3-540-49430-8_2.
Li, W., & Becker, D. (2021). Day-ahead electricity price prediction applying hybrid models of lstm-based
deep learning methods and feature selection algorithms under consideration of market coupling. Energy,
237 , 121543.
Livera, A. M. D., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex sea-
sonal patterns using exponential smoothing. Journal of the American Statistical Association, 106 ,
1513–1527. URL: https://doi.org/10.1198/jasa.2011.tm09771. doi:10.1198/jasa.2011.tm09771.
arXiv:https://doi.org/10.1198/jasa.2011.tm09771.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). Statistical and machine learning forecasting
methods: Concerns and ways forward. PLoS One, 13(3), e0194889. URL: https://journals.plos.
org/plosone/article?id=10.1371/journal.pone.0194889.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 competition: 100,000
time series and 61 forecasting methods. International Journal of Forecasting, 36 , 54–
74. URL: https://www.sciencedirect.com/science/article/pii/S0169207019301128. doi:https:
//doi.org/10.1016/j.ijforecast.2019.04.014. M4 Competition.
Marcjasz, G. (2020). Forecasting electricity prices using deep neural networks: A robust hyper-parameter selection scheme. Energies, 13 (18), 4605.
Mayer, K., & Trück, S. (2018). Electricity markets around the world. Journal of Commodity Markets, 9 ,
77–100. URL: https://doi.org/10.1016/j.jcomm.2018.02.001.
Narajewski, M., & Ziel, F. (2020). Econometric modelling and forecasting of intraday electricity prices.
Journal of Commodity Markets, 19 , 100107. doi:10.1016/j.jcomm.2019.100107.
Nazar, M. S., Fard, A. E., Heidari, A., Shafie-khah, M., & Catalão, J. P. (2018). Hybrid model using
three-stage algorithm for simultaneous load and price forecasting. Electric Power Systems Research, 165 ,
214–228. doi:10.1016/j.epsr.2018.09.004.
Nowotarski, J., Raviv, E., Trück, S., & Weron, R. (2014). An empirical comparison of alternative schemes
for combining electricity spot price forecasts. Energy Economics, 46 , 395–412. URL: https://ideas.
repec.org/a/eee/eneeco/v46y2014icp395-412.html. doi:10.1016/j.eneco.2014.07.0.
Nowotarski, J., & Weron, R. (2018). Recent advances in electricity price forecasting: A review of probabilistic
forecasting. Renewable and Sustainable Energy Reviews, 81 , 1548–1568. doi:https://doi.org/10.1016/
j.rser.2017.05.234.
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior,
A. W., & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. CoRR, abs/1609.03499 .
URL: http://arxiv.org/abs/1609.03499. arXiv:1609.03499.
Oreshkin, B. N., Carpov, D., Chapados, N., & Bengio, Y. (2020). N-BEATS: neural basis expansion analysis
for interpretable time series forecasting. In 8th International Conference on Learning Representations,
ICLR 2020 . URL: https://openreview.net/forum?id=r1ecqn4YwB.
Paszke et al. (2019). Pytorch: An imperative style, high-performance Deep Learning library. In H. Wallach,
H. Larochelle, A. Beygelzimer, F. d Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Infor-
mation Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc. URL: http://papers.neurips.
cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
Rosenblatt, F. (1961). Principles of neurodynamics. perceptrons and the theory of brain mechanisms. Tech-
nical Report Cornell Aeronautical Lab Inc Buffalo NY.
Smyl, S. (2020). A hybrid method of exponential smoothing and recurrent neural networks for time series
forecasting. International Journal of Forecasting, 36 , 75–85. URL: https://www.sciencedirect.com/
science/article/pii/S0169207019301153. doi:https://doi.org/10.1016/j.ijforecast.2019.03.
017. M4 Competition.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence learning with neural networks. In
Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural
Information Processing Systems. Curran Associates, Inc. volume 27.
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. URL: https://
arxiv.org/abs/physics/0004057 in The 37th annual Allerton Conf. on Communication, Control, and
Computing, pp 368–377.
Uniejewski, B., Nowotarski, J., & Weron, R. (2016). Automated variable selection and shrinkage for day-
ahead electricity price forecasting. Energies, 9(8),621 . doi:10.3390/en9080621.
Uniejewski, B., & Weron, R. (2021). Regularized quantile regression averaging for probabilistic electricity
price forecasting. Energy Economics, 95 , 105121. URL: https://www.sciencedirect.com/science/
article/pii/S0140988321000268. doi:https://doi.org/10.1016/j.eneco.2021.105121.
Uniejewski, B., Weron, R., & Ziel, F. (2018). Variance stabilizing transformations for electricity spot price
forecasting. IEEE Transactions on Power Systems, 33 , 2219–2229. doi:10.1109/TPWRS.2017.2734563.
Wang, L., Zhang, Z., & Chen, J. (2017). Short-term electricity price forecasting with stacked denoising
autoencoders. IEEE Transactions on Power Systems, 32 , 2673–2681. doi:10.1109/TPWRS.2016.2628873.
Wen, R., Torkkola, K., Narayanaswamy, B., & Madeka, D. (2017). A Multi-horizon Quantile Recurrent
Forecaster. In 31st Conference on Neural Information Processing Systems NIPS 2017, Time Series
Workshop. URL: https://arxiv.org/abs/1711.11053. arXiv:1711.11053.
Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-art with a look into the future.
International Journal of Forecasting, 30 , 1030–1081. URL: https://www.sciencedirect.com/science/
article/pii/S0169207014001083. doi:https://doi.org/10.1016/j.ijforecast.2014.08.008.
Yao, Y., Rosasco, L., & Andrea, C. (2007). On early stopping in gradient descent learning. Constructive
Approximation, 26(2), 289–315. URL: https://doi.org/10.1007/s00365-006-0663-2.
Ziel, F., & Steinert, R. (2018). Probabilistic mid- and long-term electricity price forecasting. Renewable and
Sustainable Energy Reviews, 94 , 251–266. URL: https://arxiv.org/abs/1703.10806.
Appendix A. Appendix
Appendix A.1. Forecast and Backcast Basis
[Figure A.1: backcast and forecast basis functions plotted over their time indexes for (a) the trend basis and (b) the harmonic basis.]
Figure A.1: Examples of the polynomial and harmonic bases included in the interpretable configuration of the neural basis expansion analysis. The slowly varying bases allow NBEATS to model trends and seasonalities.
As discussed in Section 3.4, the interpretable configuration of the NBEATSx method performs basis projections onto polynomial functions for the trends, harmonic functions for the seasonalities, and the exogenous variables. As shown in Figure A.1, both the forecast and the backcast components of the model rely on similar basis functions; the only difference lies in the span of their time indexes. For this work, in the EPF application of NBEATS, the backcast horizon corresponds to 168 hours while the forecast horizon corresponds to 24.
Appendix A.2. Training and validation curves
[Figure A.2: MAE learning curves over 2,000 optimization iterations for NBEATSx and NBEATS on (a) the train set and (b) the validation set.]
Figure A.2: Training and validation Mean Absolute Error (MAE) curves on the NP market. We show the curves for NBEATSx-G with exogenous variables and NBEATS without exogenous variables as a function of the optimization iterations. Each of the four curves per model corresponds to a different random seed used for initialization.
To study the effects of exogenous variables on the NBEATS model, we performed diagnostics of the model training procedure. Figure A.2 shows the train and validation mean absolute error (MAE) for the NBEATS and NBEATSx models as training progresses. The curves correspond to the hyperparameter optimization phase described in Section 4.3.4. The models trained with and without exogenous variables display a considerable difference in their train and validation errors, as observed in the two separate clusters of trajectories. The exogenous variables, in this case the electricity load and production forecasts, significantly improve the neural basis expansion analysis.
Appendix A.3. Computational Time
Table A1: Computational time performance in seconds for the top four most accurate models on the day-ahead electricity price forecasting task in the NP market, averaged over the four elements of the ensembles (time performance for the rest of the markets is almost identical).
We measured the computational time of the four most accurate algorithms with two metrics: the recalibration of the ensemble models selected from the hyperparameter optimization, and the computation of the predictions. For these experiments, we used a GeForce RTX 2080 GPU for the neural network models and an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz for LEAR.
The training time of the recalibration phase of NBEATSx remains efficient: the two NBEATSx configurations still train in 75 and 81 seconds, an increase of roughly 30 seconds over the relatively simple DNN. The computational time of the prediction remains within milliseconds. Finally, the hyperparameter optimization scales linearly with the time of the recalibration phase and the number of evaluation steps of the optimization; in the case of NBEATSx-G, a hyperparameter search of 1,000 steps takes approximately two days⁴.
Appendix A.4. Best Single Models
Table A2 shows that the best NBEATSx models yield improvements of 14.8% on average across all the evaluation metrics when compared to their NBEATS counterpart without exogenous covariates, and improvements of 23.9% when compared to ESRNN without time-dependent covariates. A perhaps more remarkable result is the statistically significant improvement of forecast accuracy over the LEAR and DNN benchmarks, ranging from 0.75% to 7.2% across all metrics and markets, with the exception of EPEX-BE. Compared to DNN, the RMSE improved on average by 4.9%, the MAE by 3.2%, the rMAE by 3.0%, and the sMAPE by 1.7%. When comparing the best NBEATSx models against the best DNN on individual markets, NBEATSx improved by 3.18% on the Nord Pool (NP), 2.03% on the Pennsylvania-New Jersey-Maryland (PJM), 2.65% on the French (EPEX-FR), and 5.24% on the German (EPEX-DE) power markets. The positive difference in performance of 0.53% for the Belgian (EPEX-BE) market was not statistically significant.
Figure A.3 provides a graphical representation of the GW test for the six best models across the five markets for the MAE evaluation metric. The models included in the significance tests are the same as in Table A2: LEAR, DNN, ESRNN, NBEATS, and our proposed methods, NBEATSx-G and NBEATSx-I. The p-value of each individual comparison shows whether the improvement in performance (measured by MAE or RMSE) of the x-axis model over the y-axis model is statistically significant. Both the NBEATSx-G and NBEATSx-I models outperformed the LEAR and DNN models in all markets, with the exception of Belgium. Moreover, no benchmark model outperformed NBEATSx-I or NBEATSx-G on any market.
⁴ For comparability we use 1,000 steps (Lago et al., 2021a); restricting to 300 steps yields similar results.
Table A2: Forecast accuracy measures of day-ahead electricity prices for the best single model out of the four models described in Subsection 4.3.5. The ESRNN and NBEATS are the original implementations and do not include time-dependent covariates. The reported metrics are mean absolute error (MAE), relative mean absolute error (rMAE), symmetric mean absolute percentage error (sMAPE), and root mean squared error (RMSE). The smallest errors in each row are highlighted in bold.
* The LEARx results for EPEX-DE differ from Lago et al. (2021a); the values presented there are revised in Lago et al. (2021b).
[Figure A.3: Giacomini-White test grids for the five markets (single models, MAE), comparing AR1, ESRNN, NBEATS, ARx1, LEARx, DNN, NBEATSx-G, and NBEATSx-I; the p-value color scale spans 0.00 to 0.10.]
Figure A.3: Results of the Giacomini-White test for the day-ahead predictions with mean absolute error (MAE), applied to pairs of the single models on the five electricity market datasets. Each grid represents one market. Each cell in a grid is plotted black unless the predictions of the model corresponding to its column outperform the predictions of the model corresponding to its row. The color scale reflects the significance of the difference in MAE, with solid green representing the lowest p-values.
Appendix A.5. Comments on Hyperparameter Optimization
In this section, we summarize observations and key empirical findings from the extensive hyperparameter optimization on the space defined by Table 2, carried out for the four models composing each dataset ensemble. These observations about the regularities of the optimally selected hyperparameters are important for creating a more efficient and informed hyperparameter space, and may guide future experiments with the NBEATSx architecture.
Interpretable configuration observations:
1. Among quadratic, cubic, and fourth-degree polynomials, Npol ∈ {2, 3, 4}, the most common basis selected for the day-ahead EPF task was quadratic, Npol = 2. As shown in Figure 3, the combination of a quadratic trend and harmonics already describes the electricity price average daily profiles successfully. Linear trends were omitted from exploration as they proved fairly restrictive. In experiments on longer forecast horizons (H > 24), beyond the scope of this paper, we observed that more trend flexibility tended to be beneficial.
2. We did not observe preferences in the harmonic basis spectrum controlled by Nhr ∈ {1, 2}, the hyperparameter that controls the number of oscillations of the basis in the forecast horizon. We believe this is due to the flexibility of the harmonic basis S ∈ R^{H×(H−1)}, which already covers a broad spectrum of frequencies. Our intuition dictates that Nhr = 1 is a good setting unless there is an apparent mismatch between the time-series frequency and the number of recorded observations, as in a Nyquist-frequency under-sampling or over-sampling phenomenon (Koopmans, 1995). This, however, is beyond the scope of this paper.